
Zero-knowledge machine learning (zkML) is the combination of cryptographic magic and artificial intelligence. Imagine proving that a machine learning model produced a specific result without revealing the inputs, the model's internal workings, or even the output itself. That's the vision of zkML - and various cutting-edge methods are already competing to make it a reality.

In this article, we will delve into three leading zkML paradigms - JOLT (adapted for machine learning operations with dedicated precompiles: JOLTx), EZKL (based on Halo2), and DeepProve (based on GKR) - compare how they work and how they perform, and explain why JOLT's lookup-centric approach is set to snowball through the industry.

What is zkML?

Zero-knowledge machine learning (zkML) is an emerging field that combines zero-knowledge proofs (ZKPs) with machine learning (ML) to achieve verifiable and privacy-preserving ML computation. zkML allows a prover to prove that an ML model executed correctly without revealing sensitive inputs or requiring verifiers to rerun the computation. It can also be used to conceal the model weights, or the model itself.

This is critical for applications like decentralized finance, privacy-preserving AI, and secure off-chain computation. By ensuring the privacy and trustlessness of ML inference, zkML paves the way for transparent and scalable AI applications in blockchain, Web3, and beyond.

The technical foundation of zkML

JOLT-based zkML (JOLTx) - JOLT is a zkVM, a zero-knowledge virtual machine capable of proving the execution of arbitrary programs (JOLT paper, JOLT blog).

JOLT targets the RISC-V instruction set, which means you can compile any program (written in high-level languages like Rust or C++) into RISC-V assembly language, and then generate a proof that the assembly code runs correctly.

JOLT introduces a new front-end based on the concept of the 'lookup singularity': rather than imposing heavy algebraic constraints on each operation, it converts CPU instructions into lookups in a massive predefined table of valid instruction results. Each computation step (like addition, multiplication, or even bitwise operations) is validated against this table through fast lookup arguments - Lasso and, more recently, Shout.

JOLT's circuits only need to perform lookups in these tables, significantly simplifying the proof generation process. For example, 64-bit 'or' and 'and' operations (which are costly under regular arithmetic constraints) can be completed with just one table lookup in JOLT.

The design ensures that the prover's main work for each CPU instruction is merely committing to a small number of field elements (about 6 per step) and proving the correctness of these lookups. This makes JOLT a general and scalable method: any machine learning model can be viewed as a program and proved with minimal custom circuit design.
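To make the 'virtual table' idea concrete, here is a toy Python sketch of a decomposable lookup: the full 64-bit AND table (which would have $2^{128}$ entries) is never materialized; the answer is instead assembled from a small byte-level subtable, mirroring how Lasso-style arguments decompose big tables. This is an illustration of the concept only - not the actual Lasso/Shout protocol - and every name in it is made up.

```python
# Toy model of JOLT's 'one lookup per instruction' idea. The virtual
# 2^128-entry table for 64-bit AND is never stored; each query is answered
# by 8 lookups into a single 2^16-entry byte-level subtable.
AND_SUBTABLE = {(a, b): a & b for a in range(256) for b in range(256)}

def lookup_and_64(a: int, b: int) -> int:
    """Answer a virtual 64-bit AND lookup via byte-wise subtable lookups."""
    out = 0
    for i in range(8):
        ai = (a >> (8 * i)) & 0xFF                 # i-th byte of each operand
        bi = (b >> (8 * i)) & 0xFF
        out |= AND_SUBTABLE[(ai, bi)] << (8 * i)   # one small-table lookup
    return out

assert lookup_and_64(0xF0F0F0F0F0F0F0F0, 0xFF00FF00FF00FF00) == 0xF000F000F000F000
```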

We plan to contribute dedicated precompiles for machine learning operations to the JOLT stack. Nonlinearities can be handled efficiently through lookups, while other common operations can be handled through sum-check and techniques unique to the JOLT zkVM.

We call this extension JOLTx, but it really just adds a few tools and dedicated precompiles on top of JOLT. We expect to release it in a few months, so stay tuned!

What other excellent zkML projects are there?

EZKL (based on Halo2) - EZKL adopts a more traditional SNARK circuit approach built on Halo2.

EZKL does not simulate a CPU but operates at the computational-graph layer of machine learning models. Developers export neural networks (or any computation graph) as ONNX files, and the EZKL toolkit compiles them into a set of polynomial constraints (an arithmetic circuit).

Each layer of the neural network - such as convolution, matrix multiplication, or activation functions - is transformed into constraints solvable by the Halo2 prover. To handle operations that are not naturally polynomial (like ReLU activations or big-integer arithmetic), Halo2 also uses lookup arguments, but in a more limited way.

For example, large tables (like all $2^{64}$ possibilities of a 64-bit operation) must be split or 'chunked' into smaller tables (like 16-bit chunks), requiring multiple lookups plus recomposition constraints to simulate the original operation. This chunking increases the overhead and complexity of the circuit.
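The sketch below illustrates that chunking pattern on the simplest case, a range check: since a $2^{64}$-entry table cannot be materialized, a 64-bit value is checked against a $2^{16}$-entry table four times, plus a recomposition constraint tying the chunks back to the original value. This is hypothetical illustrative code, not a real Halo2 gadget.

```python
# Halo2-style 'chunking': emulate a 2^64-size range check with a small table.
CHUNK_BITS = 16
RANGE_TABLE = set(range(1 << CHUNK_BITS))   # the 2^16-entry in-circuit table

def range_check_u64(x: int) -> bool:
    """Check 0 <= x < 2^64 via four 16-bit chunk lookups plus recomposition."""
    chunks = [(x >> (CHUNK_BITS * i)) & 0xFFFF for i in range(4)]
    for c in chunks:
        if c not in RANGE_TABLE:            # one lookup per chunk
            return False
    # recomposition constraint: the chunks must reassemble exactly to x
    return sum(c << (CHUNK_BITS * i) for i, c in enumerate(chunks)) == x

assert range_check_u64(2**64 - 1) and not range_check_u64(2**64)
```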

Thus, EZKL's proof generation involves creating many such constraints and using Halo2's proving algorithm (typically with KZG or Halo-style commitments) to generate proofs. The advantage of the EZKL approach is its model awareness - it can optimize specifically for neural-network layers and even prune or quantize weights to improve efficiency.

However, each new model or layer type may require custom constraint writing or at least re-generating circuits, and provers must handle large constraint systems, which can be slow for large models.

DeepProve (based on GKR) - Lagrange's DeepProve takes a different path, using an interactive proof protocol called GKR (Goldwasser-Kalai-Rothblum).

Essentially, GKR treats the entire computation (such as the forward pass of a machine learning model) as a layered arithmetic circuit and proves its correctness through the sum-check protocol rather than cumbersome polynomial constraint systems. DeepProve's workflow extracts the model (also via ONNX) and then automatically generates the computation sequence corresponding to each layer of the neural network.

It does not convert the model into a static SNARK circuit; instead, it runs the GKR protocol directly over this layered computation, producing a proof tailored to that model.

The advantage of GKR is that its prover complexity scales linearly with circuit size (O(n)), with only a small constant factor slowdown compared to normal execution. In fact, for certain tasks (like large matrix multiplication), modern GKR-based systems can be less than 10 times slower than ordinary execution. DeepProve combines this with modern polynomial commitment techniques to output concise proofs after verification rounds, effectively creating zkSNARKs for neural network inference.
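To make the sum-check idea tangible, here is a minimal, self-contained round-by-round sum-check in Python for a multilinear polynomial over a toy prime field. In real GKR this runs once per circuit layer, and the verifier's final direct evaluation of g is replaced by a polynomial commitment; the field choice and names here are ours, purely for illustration.

```python
# Minimal sum-check: prover convinces verifier that the sum of g over {0,1}^n
# equals `claim`, using n rounds of degree-1 univariate polynomials.
import random

P = 2**61 - 1  # toy prime field, chosen arbitrarily

def partial_sum(g, fixed, n):
    """Sum g(fixed, x) over all boolean x filling the remaining slots."""
    rem = n - len(fixed)
    return sum(g(fixed + [(m >> j) & 1 for j in range(rem)])
               for m in range(1 << rem)) % P

def run_sumcheck(g, n):
    claim = partial_sum(g, [], n)           # the value the prover asserts
    rs = []
    for _ in range(n):
        at0 = partial_sum(g, rs + [0], n)   # prover's round poly at X=0
        at1 = partial_sum(g, rs + [1], n)   # ... and at X=1 (degree 1)
        assert (at0 + at1) % P == claim     # verifier's consistency check
        r = random.randrange(P)             # verifier's random challenge
        claim = (at0 * (1 - r) + at1 * r) % P  # round poly evaluated at r
        rs.append(r)
    assert g(rs) == claim                   # final spot-check of g itself
    return True

# g(x0,x1,x2) = (x0 + 2*x1) * x2 is multilinear, so the protocol applies.
print(run_sumcheck(lambda x: ((x[0] + 2 * x[1]) * x[2]) % P, 3))  # True
```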

One disadvantage of GKR is that it is best suited for structured computations (like the static layers of neural networks) and involves more complex cryptographic protocol logic, but its advantage lies in the raw speed of deep computation proofs.

Advantages and disadvantages

Each method has its unique advantages and potential drawbacks.

JOLTx (lookup-based precompiled zkVM)

Advantages: Extremely flexible (it can prove any code, not just neural networks) and benefits from JOLT's 'lookup singularity' optimization, making even bit-level operations inexpensive.

It does not require custom circuits for each model - just compile and run - greatly enhancing the developer experience and reducing the chances of bugs.

Using Lasso for lookups means prover costs scale primarily with the number of operations performed rather than with their complexity, giving JOLT a consistent cost model.

Disadvantages: As a general-purpose virtual machine, it may incur some overhead for each instruction; for ultra-large models with millions of simple operations, dedicated approaches like GKR can achieve lower absolute proof times through streaming computation.

Moreover, JOLT is relatively new - it relies on a novel lookup argument and complex ISA-level tables, cutting-edge techniques that will take time to mature. That said, even JOLT's current prototype outperforms previous zkVMs in efficiency.

EZKL (Halo2 / PLONK)

Advantages: It is built on a widely used SNARK framework, meaning it benefits from existing tooling, audits, and on-chain verifier support (Halo2 proofs can be verified using Ethereum-friendly cryptographic techniques).

EZKL is quite easy for data scientists to use: you can use PyTorch or TensorFlow models, export them to ONNX, and obtain a proof that the model inference was completed correctly.
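As a concrete example of that first step, the snippet below exports a small PyTorch model to ONNX with the standard torch.onnx.export API; the resulting model.onnx file is what the EZKL toolchain compiles into constraints. The TinyMLP model is a made-up toy, and the EZKL-side steps (settings, compile, setup, prove, verify) differ across versions, so we reference them only in a comment rather than guess at exact commands.

```python
# Step 1 of the EZKL workflow: get your model into ONNX form.
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.net(x)

model = TinyMLP().eval()
dummy = torch.randn(1, 4)                      # fixes the input tensor shape
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])
# From here, the EZKL toolkit takes model.onnx through its own
# settings/compile/setup/prove/verify pipeline (see the EZKL docs).
```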

It has already achieved practical integrations (from DeFi risk models to game AI, which we will discuss below), indicating its ability to prove genuine machine learning tasks.

Disadvantages: As models grow, performance may become a bottleneck. Traditional SNARK circuits often incur huge overheads - historically, the workload of provers was a million times greater than merely running the model.

Halo2's approach attempts to optimize, but operations like large matrix multiplication or nonlinear activations still translate into many constraints. The need to chunk large lookups (like 32-bit arithmetic or nonlinear functions) adds extra constraints and proof time.

Essentially, EZKL may struggle with very large networks (in terms of proof time and memory), sometimes requiring circuit splitting or special techniques to fit practical constraints. It is a good general-purpose SNARK method, but not the fastest when scaling.

DeepProve (GKR)

Advantages: Provides extremely fast proof generation speed for deep models. By avoiding the overhead of encoding each multiplication as a polynomial constraint, GKR allows the prover to perform almost only regular numerical calculations and then add a thin layer of cryptographic verification. The DeepProve team reports that for equivalent neural networks, GKR's proof speed is 54 to 158 times faster than EZKL.

In fact, the larger the model, the greater GKR's advantage: DeepProve's linear scaling stays manageable as model complexity grows, while the costs of circuit-based methods continue to balloon.

Disadvantages: This approach is mainly applicable to circuit-like computations (which, fortunately, covers most feedforward machine learning). If your workload contains a lot of conditional logic or irregular operations, its flexibility diminishes - those operations are easier to handle in virtual machines like JOLT.

Moreover, making GKR proofs zero-knowledge and succinct yields proofs that are larger than classical SNARK proofs, and verification, while much faster than rerunning the entire model, is not instantaneous.

DeepProve's verification time for a CNN proof is about 0.5 seconds, which is excellent for a large model, but traditional SNARK verifiers finish in milliseconds.

Therefore, DeepProve focuses on performance, which may come at the cost of proof protocol complexity and somewhat heavier verification tasks than Halo2 proofs. It is a powerful method for scaling zkML, especially in server or cloud environments, although it might be less suited for lightweight clients or on-chain verification before further optimization.

Performance and efficiency comparison

In terms of raw performance, each zkML method has different priorities. Proof generation time is often a key metric; currently, the GKR-based prover of DeepProve ranks first in speed - benchmarks show that under the same model, its proof generation speed is 50 to 150 times faster than EZKL.

This leap stems from GKR's near-linear time algorithm, which sidesteps the cumbersome polynomial algebraic operations of SNARK circuits. In fact, neural network inference that might take hours to prove with EZKL can be completed in just a few minutes with DeepProve. Moreover, as the model size increases (more layers, more parameters), this gap widens further, as GKR's single-operation overhead remains low while Halo2's overhead continues to grow.

JOLT's performance goals are equally ambitious - it aims to be an order of magnitude faster than existing SNARK frameworks. The a16z team has demonstrated that Lasso (the lookup engine in JOLT) runs 10 times faster than Halo2's lookup mechanism and plans to boost the speed to 40 times.

This means that operations that were once bottlenecks (like those pesky bitwise operations or large field operations) become much cheaper. JOLT essentially trades computation for table lookups, and because of Lasso, the cost of looking up values in the massive virtual table is low, and it does not increase with the size of the table (no, they are not actually storing $2^{128}$ rows of data in memory!) - it primarily grows with the number of lookups executed.

Thus, if your machine learning model performs a million ReLU operations, the prover's cost grows with those million operations, but each operation is just a quick table check. Early results indicate that JOLT's prover commits to only a handful of field elements per instruction step, incurring minimal overhead. In short, JOLT minimizes the work done per operation by using precomputed knowledge (lookup tables) to skip complex on-the-fly math, dramatically reducing traditional SNARK overhead.

The combination of EZKL with Halo2, while slower, has not stagnated; it benefits from Halo2 optimizations like custom gates and partial lookups. For medium-sized models, EZKL is entirely usable and has proven superior to some early alternatives: in a 2024 benchmark, it was about 3 times faster than STARK-based methods on Cairo and 66 times faster than a RISC-V STARK.

This indicates that carefully optimized SNARK circuits can outperform simpler implementation methods in virtual machines. The downside is that to achieve these speeds, EZKL and Halo2 must meticulously tune everything, and they still inherit the fundamental costs of polynomial commitments, FFTs, and proof algorithms.

In contrast, the newer approaches of JOLT and DeepProve avoid most of the overhead from FFTs and high-degree polynomials - JOLT limits itself to lookups (plus a new argument for handling those lookups efficiently), while DeepProve uses sum-check (which operates over multilinear polynomials and requires only lightweight hashing-based commitments).

In classic SNARKs, a lot of time is spent on operations like large FFTs or multi-scalar multiplications. GKR largely avoids FFTs by operating over Boolean hypercubes and multilinear extensions, while JOLT avoids the large FFTs that lookups would otherwise require by never materializing its gigantic tables in the first place.
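A concrete piece behind that claim is the multilinear extension (MLE): a table of $2^k$ values is viewed as a polynomial over the field and can be evaluated at any point in $O(2^k)$ field operations, with no FFT anywhere. A minimal sketch, with a toy prime field of our choosing:

```python
# Evaluate the multilinear extension of a 2^k-entry table at an arbitrary
# field point by folding one variable at a time - linear work, no FFT.
P = 2**61 - 1  # toy prime field

def mle_eval(values, point):
    vals = [v % P for v in values]
    for r in point:                      # fold out one variable per round
        half = len(vals) // 2
        vals = [(vals[i] * (1 - r) + vals[half + i] * r) % P
                for i in range(half)]
    return vals[0]

# On boolean inputs the MLE agrees with the table itself (the first
# coordinate is the most significant index bit in this folding order).
table = [3, 1, 4, 1, 5, 9, 2, 6]         # 2^3 values
assert mle_eval(table, [0, 1, 1]) == table[0b011]
```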

In terms of proof size and verification, there are trade-offs. JOLT and EZKL (Halo2) ultimately generate standard SNARK proofs (typically a few KB) that can be verified quickly (a few pairings or polynomial evaluations).

DeepProve's approach is similar to STARKs and may generate larger proofs (tens of KB to hundreds of KB, depending on the model). Verification, while much faster than rerunning the model, may involve more steps than validating concise KZG proofs.

The DeepProve team emphasizes that their verification speed is 671 times faster than simply recalculating an MLP using a GPU to check it, reducing verification time to about 0.5 seconds for fairly complex models.

This is a powerful result indicating that even if the proofs are larger, verifying them is much easier than performing the original AI computations (this is especially important if the verifier is a smart contract or a lightweight client with limited computing power). Halo2's proof sizes are smaller, and verification speeds are faster, but if the initial proof speeds are too slow for your application, this difference becomes moot.

An important aspect of efficiency is how these frameworks handle special operations. For example, machine learning inference often involves nonlinear steps (like ArgMax, ReLU, or Sigmoid). EZKL can handle ReLU by enforcing constraints with a boolean selector (which can itself be enforced through lookups or other constraints).

In contrast, JOLT can implement ReLU directly in the program (as a few CPU instructions with a sign-dependent branch) and prove those branches cheaply through lookups - essentially leveraging the CPU's comparison capabilities.

DeepProve can accommodate piecewise-linear functions by incorporating them into the arithmetic circuits that GKR verifies, but highly nonlinear functions (like Sigmoid) may require polynomial or lookup approximations. Overall, JOLT's philosophy is to make all peculiarities look normal by executing actual code (including conditional statements, loops, etc.) and using lookup arguments to cover the logic of any operation.
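The toy snippet below contrasts those three treatments of ReLU on quantized integers: a zkVM-style branch, a table lookup over the whole quantized domain, and a Halo2-style selector constraint. All names are invented for illustration; none of this is actual JOLT, EZKL, or DeepProve code.

```python
# (a) zkVM style (JOLT): ReLU is just a compare-and-branch in the program;
#     the proof covers the branch actually taken via lookups.
def relu_vm(x: int) -> int:
    return x if x > 0 else 0

# (b) Pure lookup style: tabulate ReLU over the whole (small) quantized
#     domain, then prove the (input, output) pair is a member of the table.
RELU_TABLE = {x: max(x, 0) for x in range(-128, 128)}  # 8-bit quantized
def relu_lookup_check(x: int, y: int) -> bool:
    return RELU_TABLE[x] == y

# (c) Constraint style (Halo2-like): a boolean selector s with y = s*x and
#     s*(s-1) = 0; in a real circuit the sign condition on s would itself
#     be enforced with a range/bit decomposition.
def relu_constraint_check(x: int, y: int, s: int) -> bool:
    return s * (s - 1) == 0 and y == s * x and (s == 1) == (x > 0)

assert relu_vm(-5) == 0 and relu_lookup_check(7, 7) and relu_constraint_check(-5, 0, 0)
```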

The philosophy of Halo2 is to limit everything to polynomial equations (which can be cumbersome for complex computations), while GKR's philosophy is to break calculations down into recursively verifiable sums and products. The efficiency each method has in handling these issues will reflect in proof times: JOLT might excel at controlling logic, GKR might excel at large linear algebra operations, while Halo2 sits in between, balancing both but with higher overhead.

To verify Kinic proofs, we use the Internet Computer blockchain, as it handles larger-scale proof verification faster than other L1 blockchains. We are also not restricted by precompile constraints - on other chains, putting JOLTx verification on-chain would require complex precompile work.

Scalability for large models

When we talk about scalability in zkML, we may refer to two things: handling large models (multi-layered, lots of parameters) and handling diverse models (different architectures or even dynamic models). We first consider large-scale deployment - assume you want to validate inference on a large CNN or a Transformer with millions of parameters.

DeepProve is designed for this scenario, and its advantage grows with the size of the model. For instance, if a small model (like a tiny MLP) proves 50 times faster with DeepProve than with EZKL, a larger model might be over 100 times faster.

The team notes that as we scale to models with millions of parameters, DeepProve's speed will outpace other solutions. They plan to further enhance this speed through parallel and distributed proof techniques. GKR is easy to parallelize because many subcomputations (like all neurons in a certain layer) can be verified in bulk.

Moreover, GKR does not require a gigantic monolithic 'proving key' that grows with circuit size - it can operate in a streaming fashion and thus use less memory, which makes DeepProve promising for cloud-scale, and eventually even on-chain, verification of large neural networks.

EZKL (Halo2) can handle fairly large networks, but there are limitations. Building a single giant circuit for large models can require substantial memory and time (in some cases, tens of GB of RAM and hours of computation). The EZKL team has been exploring improvement methods such as circuit splitting and aggregation (proving different parts of the model separately and then merging proofs) and quantization strategies to reduce arithmetic complexity.
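A rough sketch of that splitting-and-aggregation idea follows: prove each segment of the model separately and bind segments together by committing to the intermediate activations. The hash 'commitments' and proof objects below are placeholders for the real recursion/aggregation machinery, and every name is hypothetical.

```python
# Toy 'circuit splitting': per-segment proofs chained by activation commitments.
import hashlib, json

def commit(tensor) -> str:
    return hashlib.sha256(json.dumps(tensor).encode()).hexdigest()

def prove_segment(segment_fn, x):
    """Pretend-prove one segment, binding committed input to committed output."""
    y = segment_fn(x)
    return y, {"in": commit(x), "out": commit(y)}   # placeholder proof object

def verify_chain(proofs):
    """Adjacent segments must agree on the intermediate commitment."""
    return all(p["in"] == q["out"] for q, p in zip(proofs, proofs[1:]))

segments = [lambda v: [2 * t for t in v], lambda v: [max(t, 0) for t in v]]
x, proofs = [1, -2, 3], []
for seg in segments:
    x, pf = prove_segment(seg, x)
    proofs.append(pf)
print(verify_chain(proofs))  # True: the pieces line up end to end
```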

Nonetheless, general-purpose SNARK algorithms like Halo2 will still face challenges beyond a certain scale without specific optimizations. It may be best suited for medium to small models or offline situations where proofs are infrequent (or if powerful hardware is available).

The advantage is that once a proof is generated, verifying it is easy even for large models - this is very useful for on-chain scenarios, as smart contracts might verify proofs for models with 100 million parameters, which is absolutely infeasible to recompute on-chain.

JOLT is in an interesting position regarding scalability. On one hand, it is a SNARK-based method, so its workload must also scale roughly linearly with the number of operations executed (just like Halo2 and GKR).

On the other hand, due to the adoption of lookup-based techniques, the constant for each operation in JOLT is very low. If we consider 'large-scale deployment,' such as running a model with 10 million operations, JOLT must perform these 10 million lookups and corresponding commitments, which is heavy, but in essence, it is no more burdensome than GKR or even a simple circuit - it remains linear.

The question is: how well can JOLT optimize and parallelize it? Since JOLT treats each instruction as a step, if the proof system supports split tracing, it can achieve proof parallelization by executing multiple instructions in parallel (similar to multiple CPU cores).

Current research indicates that they are first focusing on single-core performance (with a per-step cost of about 6 field elements), but since this method is a virtual machine, it is conceivable that in the future, different parts of the program trace could be distributed among provers (this is merely speculation, but considering that lookup arguments can be combined, it is not impossible).

Even without fancy distributions, JOLT can leverage existing polynomial commitment techniques - for example, universal setups that support circuits of arbitrary size, thus eliminating the need for new trusted setups for larger models.

In handling different ML architectures, JOLT excels: regardless of whether your model is CNN, RNN with loops, decision tree ensembles, or some new hybrid model, as long as you can run it on a CPU, JOLT can prove it, making it highly scalable in a development sense - you don't need to redesign the proof method for each new model type.

DeepProve and other GKR-based methods are currently tailored to typical deep neural networks (matrix operation layers and element functions). They scale excellently with the depth and width of the network, but if you impose very irregular workloads (like a dynamic decision to skip layers or models with data-dependent loops), the framework may need adjustments or may lose some efficiency.

However, most large-scale deployed machine learning models (vision, natural language processing, etc.) have structured architectures, so this is not an issue.

One might ask: which method is best suited for real-time or device-side use rather than cloud scale? Real-time means that even for smaller models, we want the proof latency to be very low.

JOLT's approach is a SNARK that might allow proving smaller models in a few seconds or less on a decent device, especially when the technology matures. EZKL on mobile CPUs may be slower (Halo2 proofs are not yet fully suited for mobile devices, although efforts are being made to accelerate it).

DeepProve can effectively leverage GPUs - if there is a GPU on the device, it might actually prove a small model very quickly (GKR favors parallel hardware), but DeepProve on a CPU may not be optimized for real-time scenarios as well as JOLT.

Thus, scalability is not just about handling larger models - it's also about effectively handling the right sizes in the right environment. JOLT aims to be a universal mainstay across environments, making it a strong candidate for cloud and edge deployments in the long run.

JOLTx: Redefining zkML capabilities through sum-check and lookups

Given so many innovations, why do we emphasize that JOLT-based zkML is the best choice for the future? The answer lies in its versatility, performance, and practical application - this combination is hard to surpass.

First, JOLT introduces a brand new paradigm for building SNARKs. JOLT no longer embeds high-level programs gate by gate into circuits; instead, it realizes the vision of circuits that only perform lookup operations, meaning complexity is handled in the precomputation phase (defining those massive instruction tables), while the online phase is very simple: just prove that you performed a valid lookup.

It's like turning every complex operation in a circuit into an 'O(1)' step, significantly reducing the prover's overhead for tasks that are traditionally cumbersome in SNARKs (like bitwise operations or arbitrary branching logic).

For machine learning models, which often mix linear algebra (which SNARK can handle) and nonlinear decisions or data transformations (which SNARK handles poorly), JOLT provides a balanced solution - linear algebra still has to be done step by step, but each arithmetic operation is simple, and any nonlinear decisions (like 'if neuron > 0 then...' in ReLU) are also simple, as the VM can simply branch, and the proof of the correct branch is just a lookup check.

Secondly, JOLT is fast, and its speed is continuously improving. Research behind it indicates that its prover speed can be immediately increased by over 10 times compared to mainstream SNARK tools, and hints at achieving up to 40 times with optimization.

It also inherits many of SNARKs' earlier improvements: it adopts modern polynomial commitment schemes and can leverage existing cryptographic libraries, and its core Lasso argument is built for efficiency, having demonstrated performance superior to the older lookup arguments that were the bottleneck in systems like Halo2.

For real-time or local machine learning, this means that generating proofs on smartphones or laptops suddenly becomes less crazy. If your model inference typically takes 100 milliseconds, JOLT could take only a few seconds to prove this - that’s a big deal!

In contrast, proofs in old methods might take minutes or hours, making them impractical outside of server farms. JOLT's efficiency gains bring zkML closer to the realm of interactive use.

For example, we can envision a browser extension that uses JOLT to prove in real-time, 'I ran a visual model on the image you just uploaded, and it contains no NSFW content,' before allowing the publication of that image.

Or a car's onboard computer could prove to an insurance server that it really used a validated driving model while driving autonomously. These scenarios require quick turnaround, and JOLT's speed makes that possible.

Another important factor is the developer and user experience. JOLT allows developers to use the languages and models they are familiar with (through RISC-V compilation and upcoming ONNX conversion), and you don’t need to understand the complexities of SNARK circuits to use it.

This is crucial for the machine learning field: most machine learning engineers are not cryptography experts, nor should they be. With JOLT, they can write or compile existing code and obtain proofs. This approach is reminiscent of the early days of GPUs - initially, only graphics experts wrote GPU code, but eventually, general frameworks allowed any programmer to use GPU acceleration.

In this sense, JOLT is like a 'GPU for zero-knowledge proofs': a dedicated engine accessible via standard toolchains, greatly lowering the barrier to adoption. We will likely see some libraries packaging common machine learning tasks (like model inference, proof of model accuracy, etc.) on top of JOLT for easy plug-and-play.

JOLT's auditability is another subtle advantage. Since it fundamentally proves execution of a standard ISA, it is easier to reason about and audit than custom circuits. You can rely on the correctness of the well-defined RISC-V specification and lookup tables without having to verify thousands of handwritten constraints, giving proofs about critical machine learning models higher reliability.

For instance, if a model is used for court or medical decision-making, having a clear audit trail ('this program executed correctly') is more reassuring than 'this custom circuit, understood only by a few experts, has been satisfied.' If needed, auditors can even step through the virtual machine execution trace, something that is not possible in monolithic circuit proofs.

JOLT-based zkML perfectly combines theoretical elegance with practical implementation - it has the performance breakthroughs needed for scalability and the flexibility required for widespread application, turning zero-knowledge proofs into a developer-friendly, high-speed utility.

While methods based on Halo2 and GKR have paved the way, demonstrating their possibilities (and will continue to apply with their specific advantages), JOLT aims to unify and elevate the zkML space, like the leap from hand assembly to high-level programming - once a universal efficient solution emerges, it can empower the entire ecosystem to thrive.

For anyone looking to deploy verifiable machine learning in practice, JOLT offers a clear and highly attractive path forward: fast proofs for any model, anytime and anywhere. The future of zkML also belongs to meticulously designed zkVMs and their precompiled code!
