Author: @Web3_Mario
Summary: Recently, while looking for new project directions, I came across a technology stack I had not encountered before during product design, so I did some research and organized my notes to share with everyone. In short, zkTLS is a new technology that combines Zero-Knowledge Proofs (ZKP) with TLS (Transport Layer Security) and is mainly used in Web3 to let an on-chain virtual machine verify the authenticity of off-chain HTTPS data without trusting any third party. Here, authenticity covers three aspects: the data indeed comes from a given HTTPS resource, the returned data has not been tampered with, and the freshness of the data can be guaranteed. Through this cryptographic mechanism, on-chain smart contracts gain reliable access to off-chain Web2 HTTPS resources, breaking down data silos.
What is the TLS protocol?
To appreciate the value of zkTLS, it helps to briefly review the TLS protocol. TLS (Transport Layer Security) provides encryption, authentication, and data integrity for network communications, ensuring secure data transmission between a client (such as a browser) and a server (such as a website). Even those outside network development will have noticed that some URLs begin with HTTPS while others begin with HTTP. When accessing the latter, mainstream browsers warn that the connection is insecure; with the former, you may occasionally see prompts like 'Your connection is not private' when there is an HTTPS certificate error. Behind all of these prompts is the TLS protocol.
Specifically, HTTPS is the HTTP protocol layered over TLS: TLS guarantees the privacy and integrity of the transmission and makes the server's identity verifiable. HTTP, by contrast, is a plaintext protocol with no way to authenticate the server, which leads to several security issues:
1. The information you transmit to the server may be monitored by third parties, leading to privacy leaks;
2. You cannot verify the authenticity of the server, i.e., whether your request has been hijacked by other malicious nodes and returned malicious information;
3. You cannot verify the integrity of the returned information, i.e., whether data has been lost or corrupted in transit due to network issues.
The TLS protocol was designed to solve these problems. A side note: some readers may know the SSL protocol; TLS was in fact developed from SSL (TLS 1.0 is essentially SSL 3.1). It was renamed due to commercial and trademark issues, but it is the same lineage, so in some contexts the two terms are used interchangeably.
The main idea of the TLS protocol to solve the above problems is:
1. Encrypted communication: using symmetric encryption (AES, ChaCha20) to protect data and prevent eavesdropping.
2. Identity authentication: verify the server's identity through digital certificates (such as X.509 certificates) issued by trusted third-party Certificate Authorities (CAs) to prevent man-in-the-middle (MITM) attacks.
3. Data integrity: Use HMAC (Hash Message Authentication Code) or AEAD (Authenticated Encryption) to ensure data has not been tampered with.
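The data-integrity idea in point 3 can be sketched with Python's standard library: the two sides share a secret key, the sender attaches an HMAC tag, and the receiver detects any modification. (The key name and message below are made up for illustration.)

```python
import hmac
import hashlib

def mac_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison so tampering is detected without timing leaks."""
    return hmac.compare_digest(mac_tag(key, message), tag)

key = b"session-key-derived-during-handshake"  # hypothetical session key
msg = b'HTTP/1.1 200 OK\r\n\r\n{"price": 97000}'
tag = mac_tag(key, msg)

assert verify(key, msg, tag)                    # intact message passes
assert not verify(key, msg + b"x", tag)         # any change is detected
```

Modern TLS 1.3 cipher suites fold this integrity check into AEAD modes such as AES-GCM rather than using a separate HMAC, but the guarantee is the same.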
Let’s briefly walk through the technical details of HTTPS data interaction over TLS. It is divided into two stages: a handshake stage, where the client and server negotiate security parameters and establish an encrypted session, and a data transmission stage, where communication is encrypted with the session key. The whole process can be summarized in four steps:
1. Client sends ClientHello:
The client (such as a browser) sends a ClientHello message to the server, which includes:
Supported TLS versions (such as TLS 1.3)
Supported encryption algorithms (Cipher Suites, such as AES-GCM, ChaCha20)
Random number (Client Random) (used for key generation)
Key sharing parameters (such as ECDHE public key)
SNI (Server Name Indication) (optional, used to support multi-domain HTTPS)
Its purpose is to inform the server of the client's encryption capabilities and prepare security parameters.
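The fields listed above can be pictured as a simple structure. This is purely illustrative (field names and placeholder values are my own; the real wire format is defined in RFC 8446):

```python
from dataclasses import dataclass, field
from typing import List, Optional
import os

@dataclass
class ClientHello:
    # Illustrative field names only; not the actual TLS wire encoding.
    tls_versions: List[str]
    cipher_suites: List[str]
    sni: Optional[str] = None
    client_random: bytes = field(default_factory=lambda: os.urandom(32))
    key_share: bytes = b"<ECDHE public key placeholder>"  # placeholder bytes

hello = ClientHello(
    tls_versions=["TLS 1.3"],
    cipher_suites=["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"],
    sni="api.example.com",  # hypothetical host
)
assert len(hello.client_random) == 32  # TLS uses a 32-byte Client Random
```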
2. Server sends ServerHello:
Server responds with ServerHello message, which includes:
Selected encryption algorithms
Server random number (Server Random)
Server's certificate (X.509 certificate)
Server's key sharing parameters (such as ECDHE public key)
Finished message (used to confirm handshake completion)
Its purpose is to let the client know the server's identity and confirm the security parameters.
3. Client verifies server:
The client performs the following operations:
Verify server certificate: ensure the certificate is issued by a trusted CA (Certificate Authority) and validate whether the certificate has expired or been revoked;
Calculate shared key: combine the client's own ECDHE private key with the server's ECDHE public key to derive the session key, which is used for subsequent symmetric encryption (such as AES-GCM).
Send Finished message: prove the integrity of handshake data to prevent man-in-the-middle attacks (MITM).
Its purpose is to ensure server trustworthiness and generate a session key.
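The key-agreement step can be sketched with a toy finite-field Diffie-Hellman exchange standing in for ECDHE. This is illustrative only: the group below is far too small to be secure, and real TLS 1.3 uses elliptic-curve groups such as X25519 plus a proper key schedule.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (2**127 - 1 is a Mersenne prime).
# NOT secure; for illustration of the math only.
P = 2**127 - 1
G = 3

client_secret = secrets.randbelow(P - 2) + 1
server_secret = secrets.randbelow(P - 2) + 1

client_pub = pow(G, client_secret, P)   # sent in the ClientHello key share
server_pub = pow(G, server_secret, P)   # sent in the ServerHello key share

# Each side combines its own private value with the peer's public value
# and arrives at the same shared secret.
client_shared = pow(server_pub, client_secret, P)
server_shared = pow(client_pub, server_secret, P)
assert client_shared == server_shared

# The session key also mixes in both handshake randoms.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)
session_key = hashlib.sha256(
    client_shared.to_bytes(16, "big") + client_random + server_random
).digest()
```

Note that an eavesdropper sees only `client_pub` and `server_pub`; without one of the private values, it cannot compute the shared secret.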
4. Start encrypted communication:
The client and server now communicate using the negotiated session key.
Using symmetric encryption (such as AES-GCM, ChaCha20) to encrypt data, improving speed and security.
Data integrity protection: using AEAD (such as AES-GCM) to prevent tampering.
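The "encrypt plus integrity-protect" pattern of this stage can be sketched as a toy AEAD-style seal/open pair. To stay standard-library-only, the keystream here is derived by hashing (a stand-in for a real cipher like AES-GCM or ChaCha20-Poly1305; do not use this construction in practice):

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n keystream bytes by hashing key||nonce||counter (toy cipher)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append an HMAC tag (encrypt-then-MAC)."""
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return ct + tag

def open_sealed(key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    """Reject tampered ciphertexts, otherwise decrypt."""
    ct, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("tampered ciphertext")
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))

key, nonce = b"k" * 32, b"n" * 12
msg = b"GET /price/BTC HTTP/1.1"
assert open_sealed(key, nonce, seal(key, nonce, msg)) == msg
```

Real AEAD modes provide the same two guarantees (confidentiality and tamper detection) in a single, carefully analyzed primitive.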
After these four steps, the problems of the HTTP protocol are effectively resolved. However, this technology, ubiquitous in the Web2 world, creates difficulties for Web3 application development, especially when on-chain smart contracts need off-chain data. To keep all data traceable and thereby guarantee the security of the consensus mechanism, on-chain virtual machines deliberately do not expose the ability to call external data sources.
However, over successive iterations, developers found that DApps still needed off-chain data, which led to the emergence of various Oracle projects such as Chainlink and Pyth. They act as a relay bridge between on-chain and off-chain data to break this data-silo problem. To ensure the availability of the relayed data, these Oracles typically rely on a PoS consensus mechanism, making the cost of misbehavior for relay nodes higher than the benefit and thereby economically ensuring they do not feed incorrect information to the chain. For instance, if a smart contract needs the weighted price of BTC on centralized exchanges like Binance or Coinbase, it relies on these Oracles to aggregate the data off-chain and write it into an on-chain smart contract for use.
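The aggregation idea behind such Oracle feeds can be sketched very simply: take the median of the values reported by independent nodes, so a single dishonest relay cannot move the published price. (Node names and prices below are made up.)

```python
from statistics import median

def aggregate_price(reports):
    """Median of node-reported prices: robust to a minority of outliers.
    A real Oracle network adds staking, slashing, and signature checks."""
    if not reports:
        raise ValueError("no reports")
    return median(reports.values())

# Two honest nodes and one outlier (e.g. a faulty or malicious relay).
reports = {"node-a": 97010.0, "node-b": 96990.0, "node-c": 250000.0}
print(aggregate_price(reports))  # → 97010.0: the outlier is ignored
```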
What problems does zkTLS solve?
However, people have found that this Oracle-based data retrieval solution has two problems:
1. Cost is too high: to guarantee that the data an Oracle pushes on-chain is genuine and untampered, it must be secured by a PoS consensus mechanism. But the security of PoS rests on the amount of staked funds, which carries maintenance costs. Moreover, PoS consensus involves a great deal of redundant data interaction: a data set must be repeatedly transmitted, computed, and aggregated across the network to reach consensus, which drives up the cost of using the data. As a result, Oracle projects typically maintain only the most mainstream feeds for free, such as the prices of BTC and other major assets, while bespoke needs require payment. This hinders application innovation, especially for long-tail and customized demands.
2. Efficiency is too low: PoS consensus takes time to finalize, so on-chain data lags behind reality. This is unfavorable for high-frequency access scenarios, where there can be a significant delay between the data available on-chain and the real off-chain data.
To address these issues, zkTLS technology was born. Its main idea is to introduce Zero-Knowledge Proof (ZKP) algorithms, allowing on-chain smart contracts to directly verify that the data provided by a node indeed originates from accessing a specific HTTPS resource and has not been tampered with, thus avoiding the high usage costs that traditional Oracles incur through consensus algorithms.
Some may ask: why not simply enable Web2 API calls inside the on-chain VM environment? The answer is that this is not feasible. The on-chain data environment is kept closed precisely to guarantee that all data is traceable: during consensus, every node must apply a unified, objective evaluation logic to judge the correctness of a given piece of data or execution result. Only then can the majority of honest nodes, in a completely trustless environment, rely on their redundant copies of the data to judge the authenticity of results. Web2 data makes such a unified evaluation logic hard to construct: because of network latency and similar factors, different nodes accessing the same Web2 HTTPS resource may receive different results, complicating consensus, especially for high-frequency data. Another critical issue is privacy: the security of the TLS protocol underlying HTTPS depends on the client-generated random number (Client Random) and key-sharing parameters used to negotiate encryption keys with the server. The on-chain environment, however, is open and transparent; if smart contracts maintained these random numbers and key-sharing parameters, that critical material would be exposed, destroying data privacy.
zkTLS takes another approach: it replaces the high cost of traditional, consensus-based Oracles with cryptographic guarantees, much as ZK-Rollups optimize on OP-Rollups in the L2 space. Concretely, an off-chain relay node requests a given HTTPS resource, then uses ZKP to generate proofs covering the CA certificate, the timeliness of the response, and the HMAC- or AEAD-based data-integrity checks. The necessary verification data and algorithms are maintained on-chain, allowing smart contracts to verify the authenticity, timeliness, and provenance of the data without exposing any critical information. The specific algorithmic details are beyond the scope of this article; interested readers can dig deeper.
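The relay/verify flow can be pictured at a very high level. In this sketch a plain SHA-256 commitment plus a freshness check stands in for the actual zero-knowledge proof; real zkTLS protocols prove facts about the TLS transcript (certificate chain, HMAC/AEAD integrity) inside a ZKP circuit without revealing the session secrets. All names and the URL are hypothetical.

```python
import hashlib
import json
import time

def relay_fetch(url: str, body: bytes) -> dict:
    """Off-chain relay: fetch the HTTPS resource and commit to what it saw.
    In real zkTLS, a ZK proof replaces this bare hash commitment."""
    return {
        "url": url,
        "timestamp": int(time.time()),
        "commitment": hashlib.sha256(body).hexdigest(),
        "claimed_value": json.loads(body)["price"],
    }

def onchain_verify(proof: dict, revealed_body: bytes, max_age: int = 60) -> bool:
    """On-chain side: check the commitment and freshness.
    No multi-node consensus round is needed for this verification."""
    fresh = int(time.time()) - proof["timestamp"] <= max_age
    matches = hashlib.sha256(revealed_body).hexdigest() == proof["commitment"]
    return fresh and matches

body = b'{"price": 97000}'
proof = relay_fetch("https://api.example.com/btc", body)  # hypothetical URL
assert onchain_verify(proof, body)
assert not onchain_verify(proof, b'{"price": 1}')  # altered data fails
```

The point of the sketch is the shape of the trust model: verification is a local cryptographic check rather than an economic game among staked relay nodes.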
The greatest benefit of this technical solution is that it lowers the cost of making Web2 HTTPS resources available on-chain. This has sparked many new demands, especially lowering the cost of bringing long-tail asset prices on-chain and leveraging authoritative Web2 websites for on-chain KYC, thereby optimizing the technical architecture of DID and Web3 games. Of course, zkTLS also disrupts existing Web3 businesses, particularly today's mainstream Oracle projects. To cope with this impact, industry leaders such as Chainlink and Pyth are actively pursuing related research, trying to maintain their dominant positions through the technology transition, while new business models are also emerging, such as shifting from subscription-based to usage-based charging, compute-as-a-service, and so on. As with most ZK projects, the difficulty lies in reducing the cost of proof computation enough to make it commercially viable.
In summary, when designing products, readers may want to keep an eye on the development of zkTLS and integrate this technology stack where appropriate, which may open up new directions in business innovation and technical architecture.