These are my first-draft notes from studying the @megaeth_labs white paper; take a look if you are interested. Ethereum remains the de facto innovation frontier of blockchain and is well worth following. My understanding is limited, so if there are any mistakes or omissions, please point them out. Thank you.
1. Three target properties and current limitations:
High transaction throughput
opBNB leads the pack with an extremely high gas rate of 100 MGas/s, yet this is still very low compared to what web2 servers can do. 100 MGas/s is equivalent to roughly 650 Uniswap swaps or 3,700 ERC-20 transfers per second, while a modern server can execute more than 1 million transactions per second.
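Converting a gas rate into per-second transaction counts is simple arithmetic. A quick sketch, where the per-operation gas costs are my own approximations chosen to be consistent with the figures above, not numbers from the white paper:

```python
# Rough per-op gas costs (assumed, picked to match the ~650 / ~3,700 figures)
GAS_RATE = 100_000_000   # 100 MGas/s (opBNB)
SWAP_GAS = 154_000       # approx. gas for one Uniswap swap
TRANSFER_GAS = 27_000    # approx. gas for one ERC-20 transfer

swaps_per_sec = GAS_RATE // SWAP_GAS          # 649
transfers_per_sec = GAS_RATE // TRANSFER_GAS  # 3703
print(swaps_per_sec, transfers_per_sec)
```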
Abundant compute capacity
Complex applications cannot run on-chain, mainly because of limited compute. Computing the nth Fibonacci number in an EVM contract costs about 5.5 billion gas, which at 100 MGas/s would take 55 seconds of the entire opBNB chain's capacity. A C program does the same work in about 30 ms on a single core, a gap of roughly 1,833x.
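The 1,833x figure is just the ratio of the two execution times, using the gas cost and timings quoted above:

```python
EVM_GAS_COST = 5_500_000_000  # gas to compute fib(n) in the EVM (from the text)
GAS_RATE = 100_000_000        # opBNB, gas per second

evm_seconds = EVM_GAS_COST / GAS_RATE  # 55.0 s using the whole chain
c_seconds = 0.030                      # ~30 ms for the same work in C

speedup = evm_seconds / c_seconds      # ~1833x
print(evm_seconds, round(speedup))
```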
And, most uniquely, millisecond-level response times even under heavy load
Except for Arbitrum, the other mainstream L2s in the table need block times of more than 1 s to update chain state. That is not workable for applications that need high update rates and tight feedback loops: autonomous worlds and on-chain games need response times under 100 ms, and high-frequency trading needs ~10 ms to place or cancel orders; otherwise these simply cannot be built.
2. How to break through the performance limits
Current blockchain architecture (L1)
Every blockchain consists of two basic components: consensus and execution.
Consensus determines the order of user transactions, while execution processes these transactions in the established order to update the blockchain state. In most L1 blockchains, each node performs the same tasks without specialization. Each node participates in a distributed protocol, reaches consensus, and then executes transactions locally. Each L1 must decide how much it can increase the hardware requirements of ordinary user-operated nodes without compromising the basic properties of the blockchain, such as security and censorship resistance.
Therefore the hardware requirements of full nodes matter a great deal: they directly affect security and censorship resistance.
The new paradigm of layer 2
L2 blockchains are heterogeneous by nature: different L2 nodes are specialized to perform specific tasks more efficiently.
MegaETH goes a step further and decouples transaction execution from full nodes. Specifically, MegaETH has three roles: sequencers, provers, and full nodes.
The first key: a powerful centralized sequencer
Sequencers are responsible for ordering and executing transactions. MegaETH differs in that there is only one active sequencer at any given time, which eliminates consensus overhead during normal execution. Most full nodes receive state diffs from the sequencer over the p2p network and apply those diffs directly to update their local state; they do not re-execute transactions, instead verifying blocks indirectly through proofs supplied by provers. Power users (bridge operators, market makers) can still execute every transaction themselves to get finality as fast as possible, but keeping up with the sequencer requires beefier hardware. Finally, provers use a stateless validation scheme to verify blocks asynchronously and out of order.
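A full node applying state diffs instead of re-executing can be pictured as a plain key-value merge. This is only a toy sketch of the idea; the state layout and field names here are hypothetical, not MegaETH's actual formats:

```python
from typing import Dict, Tuple

# State maps (address, field) -> value; a diff lists only the entries a
# block changed and their new values. Applying a diff is an overwrite,
# far cheaper than re-executing every transaction in the block.
State = Dict[Tuple[str, str], int]
Diff = Dict[Tuple[str, str], int]

def apply_diff(state: State, diff: Diff) -> None:
    state.update(diff)

state: State = {("0xalice", "balance"): 100, ("0xbob", "balance"): 50}
# The block contained a 10-unit transfer from alice to bob; the full node
# never sees the transfer itself, only its net effect on state.
apply_diff(state, {("0xalice", "balance"): 90, ("0xbob", "balance"): 60})
```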
Node specialization matters. Block production becomes more centralized, but the blockchain as a whole becomes more decentralized: sequencers need high-end servers, while full nodes can run on very cheap hardware.
Beyond the powerful centralized server, there is a lot of non-trivial engineering.
Relying on powerful servers alone is not enough: in MegaETH's experiments, Reth reaches only about 1,000 TPS, roughly 100 MGas/s. The bottleneck is updating the MPT (Ethereum's state trie) on every block, which costs about 10x more compute than executing the transactions themselves.
So many hard problems remain beyond raw hardware.
3. Design of MegaETH
Measure, then build: measure first to find where the real performance bottlenecks are, then design a new system that solves all of them at once.
Strive to design systems up to the limits of the hardware; prefer clean-sheet designs that approach theoretical limits over incremental ones.
Below are some challenges encountered during the design process, and their solutions.
Transaction execution
Let's start with the sequencer. Many people blame the EVM for L2s' poor performance and low TPS, but that is not accurate: in MegaETH's tests, the EVM can reach 14,000 TPS, which is already quite high.
However, this is not enough for a real-time blockchain. Traditional EVM implementations have three inefficiencies:
High state access latency: reading blockchain state is slow because it lives on disk and a single lookup can require multiple reads.
Solution: equip the sequencer node with enough RAM to hold the entire blockchain state (Ethereum's state is currently about 100 GB). Keeping state in memory eliminates SSD read latency and dramatically speeds up state access.
Lack of parallel execution: transactions are executed sequentially to preserve state consistency and prevent double spending, which makes parallelism hard.
Solution: parallel execution engines exist, but even with one, the speedup achievable in production is fundamentally bounded by the parallelism available in the workload. Measurements show that Ethereum's median parallelism has recently been below 2. The core issue is that transactions have many dependencies and often read and write the same state, which causes conflicts under parallel execution. That is the problem that actually needs solving.
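Why low workload parallelism caps the speedup can be seen with a toy scheduler: transactions whose read/write sets overlap must serialize, so the number of conflict-free batches bounds how much runs concurrently. The declared-read/write transaction format here is invented for illustration, not how any real engine represents transactions:

```python
# Each tx declares the state keys it reads and writes (made-up format).
# Two txs conflict if one writes a key the other touches; conflicting txs
# must run in order, so conflicts bound the usable parallelism.
def conflicts(a, b):
    return bool(a["writes"] & (b["reads"] | b["writes"]) or
                b["writes"] & (a["reads"] | a["writes"]))

def parallel_batches(txs):
    """Greedily pack txs into batches of mutually non-conflicting txs."""
    batches = []
    for tx in txs:
        for batch in batches:
            if not any(conflicts(tx, other) for other in batch):
                batch.append(tx)
                break
        else:
            batches.append([tx])
    return batches

txs = [
    {"reads": {"A"}, "writes": {"B"}},
    {"reads": {"B"}, "writes": {"C"}},  # reads B, which tx 0 writes: conflict
    {"reads": {"X"}, "writes": {"Y"}},  # fully independent
]
print(len(parallel_batches(txs)))  # 2 batches for 3 txs: parallelism ~1.5
```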
Interpreter overhead: the extra cost the virtual machine or interpreter adds when executing smart contracts.
Solution: a relatively high percentage of opcode execution is already handled by native Rust code, so there is limited headroom to gain from compilation; the maximum speedup may be only about 2x.
Beyond these three problems common to all high-performance blockchains, a 10 ms-class real-time blockchain faces two more challenges. First, blocks must be produced at a consistently high frequency, e.g. one block every 10 ms. Second, the parallel execution engine must support transaction priorities, so that critical transactions avoid queuing delays even at peak congestion.
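The priority requirement can be sketched with a heap-based queue where priority, not arrival order, decides what executes next. The class, priority values, and transaction labels here are all invented for illustration:

```python
import heapq
import itertools

class PriorityTxQueue:
    """Pop the highest-priority pending tx; FIFO among equal priorities."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves arrival order

    def push(self, tx, priority):
        # Negate priority: heapq is a min-heap, we want highest-first.
        heapq.heappush(self._heap, (-priority, next(self._seq), tx))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = PriorityTxQueue()
q.push("ordinary transfer", priority=1)
q.push("oracle price update", priority=10)  # critical: jumps the queue
q.push("nft mint", priority=1)
print(q.pop())  # "oracle price update"
```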
State Synchronization
State synchronization is the process of keeping full nodes in sync with the sequencer, and it is one of the most challenging aspects of high-performance blockchain design.
At 100,000 transactions per second, pure transfers and Uniswap swaps would require about 152.6 Mbps and 476.1 Mbps of bandwidth respectively, far beyond the 100 Mbps of a typical full node. Worse, a full node cannot dedicate its whole link to sync; the bandwidth actually available for synchronization may be only about 25 Mbps, a huge gap from what is required.
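The bandwidth figures follow directly from per-transaction data size. Back-solving from the numbers above gives roughly 191 bytes per transfer and 595 bytes per swap; those sizes are my assumption to match the text, not measured values:

```python
def sync_bandwidth_mbps(tx_per_sec: int, bytes_per_tx: float) -> float:
    """Bandwidth needed to stream per-tx data at a given rate."""
    return tx_per_sec * bytes_per_tx * 8 / 1e6  # bytes -> bits -> Mbps

# Per-tx sizes back-solved from the article's figures (assumed)
transfer_mbps = sync_bandwidth_mbps(100_000, 190.75)  # 152.6 Mbps
swap_mbps = sync_bandwidth_mbps(100_000, 595.125)     # 476.1 Mbps
print(transfer_mbps, swap_mbps)
```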
Updating the state root
This part is conceptually involved. In the MPT, updating the state root requires reading and writing many internal and leaf nodes along each modified path. Taking 100,000 transfers per second and counting only reads, about 6 million non-cached database reads are needed. Even assuming each read is served by a single disk I/O, 6 million IOPS far exceeds what any consumer SSD can deliver today, and that is before write operations are counted.
A common optimization to reduce disk I/O is to group multiple trie nodes of a subtree together and store them in a single 4 KB disk page. Even so, achievable IOPS still falls roughly 6x short of what is required.
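The IOPS gap is again straightforward arithmetic. The per-transfer read count and the SSD IOPS figure below are my assumptions, chosen only to be consistent with the 6 million reads/s and roughly 6x shortfall cited above:

```python
TX_PER_SEC = 100_000
READS_PER_TX = 60              # non-cached trie reads per transfer (assumed)
CONSUMER_SSD_IOPS = 1_000_000  # optimistic random-read IOPS (assumed)

required_iops = TX_PER_SEC * READS_PER_TX  # 6,000,000 reads/s
gap = required_iops / CONSUMER_SSD_IOPS    # ~6x over budget, reads only
print(required_iops, gap)
```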
Block Gas Limit
For the security and reliability of the blockchain, a reasonable block gas limit must still be set.
Infrastructure
Finally, users do not interact with sequencer nodes directly, and most people do not run a full node at home. Instead, users submit transactions through third-party RPC nodes and rely on dApps or block explorers, such as the web frontend of http://etherscan.io/, to confirm transaction results.
Therefore, a blockchain's real-world user experience depends heavily on its supporting infrastructure, such as RPC nodes and indexers. No matter how fast a real-time blockchain runs, it won't matter if the RPC nodes cannot efficiently handle heavy read traffic at peak hours or propagate transactions to the sequencer quickly, or if the indexers cannot update application views fast enough to keep up.
Scaling the blockchain with a principled approach
The team is committed to a holistic and principled approach to R&D: by conducting deep performance analysis early on, they stay focused on solving the problems that deliver real benefits to users. The key is to be holistic, in-depth, and user-focused.
4. Expected Application Types
• Games
• Decentralized physical infrastructure (DePIN) requiring real-time computation
• Autonomous World Engine
• Decentralized VPN network
• Cross-border payments
• High-frequency trading leveraging ultra-low latency (an on-chain Binance?)
There is a lot of room for imagination on the application side. I have listened to many related Spaces, and I think people's thinking still has not gone far enough.