
Ever since the original "The DAO" contract, reentrancy vulnerabilities have plagued smart contracts on various blockchains, with a string of high-profile exploits draining hundreds of millions, if not billions, of dollars from smart contracts.
These vulnerabilities are hard to catch through testing because they only manifest when different parts of the smart contract code execute in a particular order, and the number of possible orderings is so vast that naive testing is infeasible.
In previous blog posts, we showed how formal verification with TLA+ (a general-purpose language and toolset for exhaustively checking software designs) can effectively eliminate reentrancy vulnerabilities in smart contracts (called "canisters") on the Internet Computer.
These blog posts are as follows:
Using TLA+ to eliminate smart contract bugs: an introduction to TLA+, with a ckBTC case study showing how model checking finds reentrancy bugs and helps fix them.
Using TLA+ models to keep bugs out: an overview of DFINITY's open-source TLA+ models (NNS, ckBTC, and others), with real-world examples for developers.
Tutorial: creating Internet Computer canisters with TLA+: a step-by-step technical guide to modeling Internet Computer canisters with TLA+ and PlusCal and checking their properties.
While we have found TLA+ very effective at eliminating reentrancy bugs, its use has a limitation: a TLA+ model corresponds to a snapshot of the code at a particular point in time. As the code evolves, the model can fall out of date unnoticed, allowing reentrancy bugs to creep back into the codebase.
In this article, we present a solution to this problem: an automated framework for testing the consistency between Rust canisters and their TLA+ specifications.
TLA+ and synchronization challenges
In the TLA+ approach, we do not test the code directly; instead, we build a simplified model of it. We then instruct the TLA+ tools (specifically, the TLC model checker) to systematically explore all possible execution orderings of the model (up to a certain bound) and to verify that user-provided properties hold regardless of the ordering.
For example, in the Network Nervous System (NNS) governance model, we can check that the voting power of neurons is correctly derived from their staked ICP. TLC can then either detect a reentrancy vulnerability that would allow an attacker to obtain more voting power than they are entitled to, or tell us that no such vulnerability exists.
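To make this concrete, such a property might look like the following sketch, written as a plain Rust predicate rather than in TLA+; the struct and field names are hypothetical and do not match the actual governance code, and bonuses such as dissolve delay are ignored for simplicity.

```rust
/// Illustrative only: a property in the spirit of the one checked for NNS governance.
struct Neuron {
    staked_icp_e8s: u64,
    voting_power: u64,
}

/// "Good" states are those in which no neuron's voting power exceeds its stake.
fn voting_power_invariant(neurons: &[Neuron]) -> bool {
    neurons.iter().all(|n| n.voting_power <= n.staked_icp_e8s)
}
```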
A major advantage of this approach is that TLA+ models are simpler than the code, which makes it feasible to analyze a model fully automatically, without manual proof effort. For the same reason, companies such as Microsoft and AWS also use TLA+ to verify parts of their critical infrastructure.
However, the verification guarantees apply only to the model, not to the code itself. While the consistency between model and code can be checked manually, this becomes tedious and error-prone as the code evolves: every code change may require re-checking whether the model still faithfully captures the code.
As a result, models and code tend to drift apart over time, and the strong guarantees obtained when the models were created and checked gradually erode.
This is exactly the problem we addressed in recent work. We developed a framework for automatically testing the correspondence between Rust canister code and its TLA+ model, and applied it to our governance canister. Our CI (continuous integration) pipeline now continuously tests whether the governance code deviates from the model.
This gives us, and the broader ICP community, added confidence that the NNS can withstand attempts by attackers to take it over through reentrancy vulnerabilities. Furthermore, the framework is not specific to governance: it can be applied to other Internet Computer canisters, including community-built ones.
How it works
TLA+ models describe systems as transition systems, where a so-called transition predicate describes how the system moves from one state to another. If the predicate is true for two states s1 and s2, the system can transition from s1 to s2. If the predicate is also true for s2 and some state s3, the system can move from s1 to s2 and then to s3, and so on.
Given the transition predicate and another predicate specifying which states are valid initial states, the model checker can construct all valid sequences of states (also called traces). Users can additionally provide a predicate separating "good" states from "bad" ones, and the model checker raises an alarm whenever some valid sequence of states reaches a bad state.
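As a minimal sketch of these ingredients, here is a toy transition system written in Rust rather than TLA+ syntax; the state, predicates, and bound are all made up for illustration.

```rust
/// Toy illustration of the transition-system view. In an actual TLA+ model,
/// these would be TLA+ predicates; here they are ordinary Rust functions.
#[derive(Clone, PartialEq, Debug)]
struct State {
    counter: u64,
}

/// Initial-state predicate: which states the system may start in.
fn init(s: &State) -> bool {
    s.counter == 0
}

/// Transition predicate: true iff the system may step from `s1` to `s2`.
fn next(s1: &State, s2: &State) -> bool {
    s2.counter == s1.counter + 1 || s2.counter == 0 // increment or reset
}

/// Invariant separating "good" states from "bad" ones.
fn invariant(s: &State) -> bool {
    s.counter <= 1_000
}
```

A model checker such as TLC explores the states reachable from the initial states via the transition predicate and reports any reachable state that violates the invariant.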
What does this have to do with our canisters? Canisters change their state while processing messages (user messages, messages from other canisters, or system-generated "heartbeat" and "timer" messages) in so-called message handlers.
Since the ICP guarantees that message handler executions are atomic, we can model canisters in TLA+ by defining the transition predicate as "some message handler of the canister in question has executed", and by capturing in TLA+ predicates the operations we believe each message handler performs.
The question then is: do our TLA+ predicates correctly capture what the message handlers actually do?
To answer it, we built a tool that reuses existing canister tests to collect state pairs from their executions. Each collected pair describes the canister state at the start and at the end of a message handler execution, or at least the part of the state relevant to the TLA+ model (which parts are relevant is up to the user).
We can then evaluate the transition predicates of the TLA+ model and ask: do the observed state pairs actually satisfy them? If not, we know the code did something the TLA+ model does not account for, and an alarm is raised (resulting in a CI failure).
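Conceptually, the check performed by the framework looks like the sketch below. The types and the `satisfies_transition` function are hypothetical stand-ins, not the actual API from the IC repository; in the real framework, evaluating the transition predicate is delegated to the TLA+ tooling.

```rust
/// The part of the canister state the TLA+ model cares about, already
/// translated into model-level values (hypothetical fields for illustration).
#[derive(Clone, Debug)]
struct TlaState {
    total_stake_e8s: u64,
    locks_held: u64,
}

/// A pair of states recorded at the start and end of one handler execution.
#[derive(Debug)]
struct StatePair {
    start: TlaState,
    end: TlaState,
}

/// Stand-in for the real check, which asks the TLA+ tooling whether the
/// model's transition predicate holds for the observed pair.
fn satisfies_transition(pair: &StatePair) -> bool {
    let _ = pair;
    true
}

/// Fails the test (and hence CI) if any observed pair falls outside the model.
fn check_collected_pairs(pairs: &[StatePair]) {
    for pair in pairs {
        assert!(
            satisfies_transition(pair),
            "the code performed a transition the TLA+ model does not allow: {:?}",
            pair
        );
    }
}
```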
As mentioned above, this is especially valuable as the code keeps evolving: changes that would otherwise silently put the TLA+ model out of sync with the code now have a good chance of being detected automatically.
How much work is required?
Using the tool involves three main steps:
Write the TLA+ model itself.
Instrument the Rust canister code to record the start and end of each message handler and to describe how the canister state maps to TLA+ values. This instrumentation comes in two kinds: code scattered throughout the production code, and code kept in separate modules. We focused on keeping the first kind small; for governance, for example, fewer than 100 lines were added to the production code. We did not try to minimize the second kind, but it still amounts to fewer than 1,000 lines. (A simplified sketch of this step and the next follows after this list.)
Instrument the tests to call functions that check whether the recorded state pairs satisfy the transition predicates. For pure-Rust tests (i.e., tests that do not build actual Wasm canisters and install them on a replica), this takes just one line of code per test. Wasm-based tests require some extra work to expose and retrieve the recorded states through queries, but each test still needs only a single line.
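To illustrate the second and third steps, here is a heavily simplified sketch of what the instrumentation could look like. All names (the state projection, the recording of state pairs, and the `check_tla_transitions` call in the test) are hypothetical and do not match the actual framework in the IC repository.

```rust
use std::cell::RefCell;

/// Model-level view of the canister state (hypothetical field).
#[derive(Clone, Debug)]
struct TlaState {
    total_stake_e8s: u64,
}

/// Heavily simplified stand-in for the governance canister state.
struct Governance {
    neuron_stakes_e8s: Vec<u64>,
}

thread_local! {
    /// (start, end) state pairs recorded during test runs.
    static STATE_PAIRS: RefCell<Vec<(TlaState, TlaState)>> = RefCell::new(Vec::new());
}

/// Step 2a: a user-defined projection from canister state to TLA+ values.
fn to_tla_state(gov: &Governance) -> TlaState {
    TlaState {
        total_stake_e8s: gov.neuron_stakes_e8s.iter().sum(),
    }
}

/// Step 2b: an instrumented message handler; the only additions to the
/// production code are the two snapshots around the original logic.
fn increase_stake(gov: &mut Governance, neuron: usize, amount_e8s: u64) {
    let start = to_tla_state(gov);
    gov.neuron_stakes_e8s[neuron] += amount_e8s; // original handler logic
    let end = to_tla_state(gov);
    STATE_PAIRS.with(|p| p.borrow_mut().push((start, end)));
}

/// Stand-in for the framework call that evaluates the recorded pairs
/// against the TLA+ model's transition predicates.
fn check_tla_transitions() {
    STATE_PAIRS.with(|p| assert!(!p.borrow().is_empty(), "no state pairs recorded"));
}

/// Step 3: an existing pure-Rust test, extended with a single extra line.
#[test]
fn stake_increase_works() {
    let mut gov = Governance { neuron_stakes_e8s: vec![100] };
    increase_stake(&mut gov, 0, 50);
    assert_eq!(gov.neuron_stakes_e8s[0], 150);
    check_tla_transitions(); // the one added line per test
}
```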
More detailed technical instructions on setting up the tools can be found in the IC GitHub repository.
Conclusion
The tool aims to provide a cheap and effective way to check the correspondence between TLA+ models and Rust canister code. However, it can only detect deviations in behavior that your existing tests actually exercise; deviations that occur only in untested scenarios will go unnoticed.
The guarantees obtained this way are therefore much weaker than those of some other verification methods, in particular code-level verification, which provides very strong guarantees.
However, code-level verification is also far more labor-intensive than writing a TLA+ model, model checking it automatically, and instrumenting the code with the provided tooling. We believe this approach strikes a good balance between the effort required and the benefits gained.
Moreover, it synergizes with testing: the higher your test coverage, the more confident you can be that your code stays in sync with the TLA+ model, and thus remains free of reentrancy bugs.
If you have already started writing TLA+ models for your own canisters, you can get the tooling (including documentation) for linking models to code from the IC repository.
Otherwise, if you are just getting started with TLA+ for your own canisters, here are some useful resources:
Our blog post on how and why TLA+ helps prevent reentrancy vulnerabilities;
Technical tutorials and videos on modeling Internet Computer canisters with TLA+ (Part 1, Part 2);
Free TLA+ resources: the Learning TLA+ book, the Specifying Systems book, and the TLA+ video course;
The documentation page on TLA+ for the Internet Computer.
