Technology, economics, and efficiency: three obstacles that cannot be avoided.
Written by: Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z
Compiled by: Saoirse, Foresight News
In blockchains, maximal extractable value (MEV) refers to the maximum value that can be earned by deciding which transactions to include in a block, which to exclude, and how to order them. MEV is present on most blockchains and has been widely discussed in the industry.
Note: This article assumes the reader has a basic understanding of MEV. Some readers may first want to read our introductory MEV explainer.
Many researchers observing MEV have asked an obvious question: can cryptography solve this problem? One proposed solution is the encrypted mempool: users broadcast encrypted transactions, which are decrypted and revealed only after ordering is complete. The consensus protocol must then choose the transaction order 'blindly', which would seem to prevent anyone from profiting from MEV opportunities during the ordering phase.
Unfortunately, for both practical and theoretical reasons, encrypted mempools are unlikely to provide a universal solution to the MEV problem. This article outlines the difficulties and explores feasible design directions for encrypted mempools.
How encrypted mempools work
There have been many proposals for encrypted mempools, but the general framework is as follows:
1. Users broadcast encrypted transactions.
2. The encrypted transactions are committed on chain (in some proposals, they must first undergo a verifiable random shuffle).
3. Once the block containing these transactions is finalized, the transactions are decrypted.
4. Finally, the transactions are executed.
Step 3 (transaction decryption) raises key questions: who is responsible for decryption, and what happens if decryption fails? A simple idea is to have users decrypt their own transactions (in which case encryption is not even necessary; a hiding commitment suffices). However, this approach is vulnerable: attackers can carry out speculative MEV.
In speculative MEV, an attacker guesses that a certain encrypted transaction contains an MEV opportunity, then encrypts their own transaction and tries to get it inserted at a favorable position (for example, immediately before or after the target transaction). If the transactions end up in the expected order, the attacker decrypts and extracts MEV with their own transaction; if not, they refuse to decrypt, and their transaction is not included in the final chain.
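The self-decryption variant and the attack on it can be sketched with a plain hash commitment. This is a minimal illustration, not any specific protocol: the function names, the SHA-256 commitment, and the rule that unrevealed transactions are simply skipped are all assumptions made for the example.

```python
import hashlib
import secrets


def commit(tx: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, salt) for a salted hash commitment to tx."""
    salt = secrets.token_bytes(32)
    return hashlib.sha256(salt + tx).digest(), salt


def verify_reveal(commitment: bytes, salt: bytes, tx: bytes) -> bool:
    """Check that (salt, tx) opens the commitment."""
    return hashlib.sha256(salt + tx).digest() == commitment


def execute_block(ordered, reveals):
    """Execute, in order, only the transactions with a valid reveal (assumed rule)."""
    executed = []
    for c in ordered:
        opening = reveals.get(c)
        if opening is not None and verify_reveal(c, *opening):
            executed.append(opening[1])
    return executed


VICTIM_TX = b"victim: swap 100 ETH"
ATTACK_TX = b"attacker: front-run swap"
victim_c, victim_salt = commit(VICTIM_TX)
attacker_c, attacker_salt = commit(ATTACK_TX)

# Case 1: the attacker landed ahead of the victim, so it reveals and profits.
good_order = [attacker_c, victim_c]
reveals = {victim_c: (victim_salt, VICTIM_TX),
           attacker_c: (attacker_salt, ATTACK_TX)}
print(execute_block(good_order, reveals))

# Case 2: the attacker landed behind the victim, so it withholds its reveal.
bad_order = [victim_c, attacker_c]
reveals_withheld = {victim_c: (victim_salt, VICTIM_TX)}
print(execute_block(bad_order, reveals_withheld))
```

In the second case the attacker loses nothing by staying silent, which is why the attempt is a free option unless failing to reveal is penalized.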
Users who fail to decrypt could perhaps be penalized, but such a mechanism is extremely difficult to implement. The penalty must be the same for every encrypted transaction (after all, encrypted transactions are indistinguishable), and it must be severe enough to deter speculative MEV even against high-value targets. This would lock up a large amount of capital, and that capital would have to remain anonymous (to avoid revealing the link between transactions and users). Worse, honest users who fail to decrypt because of software bugs or network failures would also lose funds.
Therefore, most proposals require that encrypted transactions be guaranteed to be decryptable at some future point, even if the submitting user is offline or refuses to cooperate. There are several ways to achieve this:
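A back-of-the-envelope model shows why the penalty has to be so large. It is an assumption introduced here, not part of any proposal: the attacker lands in a favorable position with probability p and earns V, and otherwise withholds decryption and forfeits a bond B, so the attempt is unprofitable only when p*V - (1 - p)*B <= 0.

```python
def expected_profit(mev_value: float, success_prob: float, bond: float) -> float:
    """Attacker's expected profit from one speculative attempt (toy model)."""
    return success_prob * mev_value - (1 - success_prob) * bond


def min_deterrent_bond(mev_value: float, success_prob: float) -> float:
    """Smallest bond B that makes the expected profit non-positive."""
    if not 0 <= success_prob < 1:
        raise ValueError("success_prob must be in [0, 1)")
    return success_prob * mev_value / (1 - success_prob)


# Hypothetical numbers: a 1,000,000-unit MEV opportunity and a coin-flip
# chance of landing in the right position.
bond = min_deterrent_bond(1_000_000, 0.5)
print(bond, expected_profit(1_000_000, 0.5, bond))
```

Because encrypted transactions are indistinguishable, a bond sized for the most valuable target would have to be posted with every transaction, including small ones.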
Trusted Execution Environments (TEEs): Users can encrypt transactions to a key held securely inside a trusted execution environment (TEE). In the most basic versions, the TEE is used only to decrypt transactions after a specific point in time (which requires the TEE to have some notion of time). More complex schemes make the TEE responsible for both decrypting transactions and building blocks, ordering transactions by arrival time, fees, or other criteria. Compared with other encrypted mempool schemes, the advantage of TEEs is that they can operate directly on plaintext transactions, which lets them reduce wasted on-chain space by filtering out transactions that would revert. The downside is the reliance on the trustworthiness of the hardware.
Secret sharing and threshold encryption: In this scheme, users encrypt transactions to a key that is held jointly by a specific committee (usually a subset of validators). Decryption requires a threshold condition to be met (for example, two-thirds of the committee members must agree).
With threshold decryption, the locus of trust shifts from hardware to a committee. Proponents argue that since most protocols already assume an 'honest majority' of validators in their consensus mechanism, we can make a similar assumption that a majority of validators will remain honest and not decrypt transactions early.
However, there is a key distinction: these two trust assumptions are not the same. Consensus failures, such as forks, are publicly visible (a 'weak' trust assumption), whereas a malicious committee that privately decrypts transactions early leaves no public evidence, so the attack can be neither detected nor punished (a 'strong' trust assumption). Therefore, although the security assumptions of the consensus mechanism and of the decryption committee look the same on the surface, in practice the assumption that 'the committee will not collude' is much less credible.
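The threshold idea can be illustrated with Shamir secret sharing, in which any t of n shares recover a key and fewer than t reveal nothing about it. This is only a toy sketch: the 127-bit prime field, the integer key, and the 4-of-6 committee are assumptions for the example, and real threshold decryption schemes avoid ever reconstructing the key in one place.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus (toy choice)


def split(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split secret into n shares so that any t of them can reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def poly(x: int) -> int:
        # Horner evaluation of the random degree-(t-1) polynomial at x.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


decryption_key = 123456789          # stand-in for the committee's key
shares = split(decryption_key, n=6, t=4)   # 4 of 6 is a two-thirds threshold
print(reconstruct(shares[:4]) == decryption_key)
```

Nothing in this construction records which members combined their shares or when, which is the detectability gap described above.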
Time-lock and delay encryption: As an alternative to threshold encryption, delay encryption has users encrypt transactions to a public key whose corresponding private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that encapsulates a secret which can be revealed only after a preset amount of time, specifically by performing a long chain of non-parallelizable computations. Anyone can solve the puzzle to obtain the key and decrypt the transactions, but only after completing a slow, inherently sequential computation that is long enough to ensure transactions cannot be decrypted before finalization. The strongest form of this primitive, delay encryption, generates such puzzles publicly; it can also be approximated by a trusted committee using time-lock encryption, although at that point its advantages over threshold encryption are debatable.
Whether using delay encryption or having a trusted committee perform the computation, such schemes face several practical challenges. First, because the delay depends on a computation, it is hard to control the exact time of decryption. Second, these schemes rely on some entity running high-performance hardware to solve the puzzles efficiently; although anyone can take on this role, it is unclear how to incentivize participation. Finally, in such designs every broadcast transaction is eventually decrypted, including those never included in a block. In contrast, threshold (or witness-encryption) schemes can decrypt only the transactions that are successfully included.
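One classic way to build a time-lock puzzle is repeated squaring modulo an RSA-style modulus, in the style of Rivest, Shamir, and Wagner. The sketch below uses that construction as an assumed example; the source does not name a specific puzzle. The tiny primes and the additive masking of the key are for illustration only and are not secure.

```python
# Toy primes (both Mersenne primes); a real puzzle would use a modulus of
# roughly 2048 bits whose factorization is unknown to the solver.
P, Q = 2**31 - 1, 2**61 - 1


def make_puzzle(key: int, t: int) -> tuple[int, int, int]:
    """Lock `key` behind t sequential squarings; requires 0 <= key < P*Q."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    e = pow(2, t, phi)        # shortcut available only to whoever knows phi
    mask = pow(2, e, n)       # equals 2^(2^t) mod n by Euler's theorem
    return n, t, (key + mask) % n


def solve_puzzle(n: int, t: int, locked: int) -> int:
    """Recover the key the slow way: t squarings that must run in sequence."""
    x = 2
    for _ in range(t):
        x = x * x % n
    return (locked - x) % n


n, t, locked = make_puzzle(key=424242, t=10_000)
print(solve_puzzle(n, t, locked))
```

The delay is measured in squarings rather than seconds, so faster hardware opens the puzzle sooner; this is the timing-accuracy problem noted above.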
Witness encryption: The last and most advanced cryptographic approach uses 'witness encryption'. In theory, witness encryption works as follows: information is encrypted so that only someone who knows a 'witness' for a specific NP relation can decrypt it. For example, information can be encrypted so that only someone who can solve a particular Sudoku puzzle, or supply the preimage of a particular hash value, can decrypt it.
(Note: An NP relation is a correspondence between a 'problem' and an 'answer' that can be verified quickly.)
As with SNARKs, this can in principle be done for any NP relation. Witness encryption can be thought of as encrypting data so that only an entity able to produce a SNARK proving that a specific condition holds can decrypt it. In the context of an encrypted mempool, a typical condition is that a transaction can be decrypted only after its block is finalized.
This is a theoretical primitive with great potential. In fact, it is a generalization of which the committee-based and delay-based approaches are special cases. Unfortunately, we currently have no practical witness encryption schemes. Moreover, even if such schemes existed, it is hard to say whether they would have an advantage over committee-based approaches on proof-of-stake chains. Even if witness encryption is configured so that a transaction can be decrypted only once it has been ordered in a finalized block, a malicious committee can still privately simulate the consensus protocol to forge finality for the transaction, then use this private chain as the 'witness' to decrypt it. In that case, threshold decryption by the same committee achieves equal security and is much simpler to operate.
In proof-of-work consensus protocols, however, the advantages of witness encryption are more pronounced. Even a completely malicious committee cannot privately mine multiple new blocks on top of the current chain head to forge finality.
Technical challenges faced by encrypted mempools
Several practical challenges limit the ability of encrypted mempools to prevent MEV. More generally, keeping information confidential is hard in itself. Encryption is not yet widely deployed in Web3, but decades of experience deploying it on the web (TLS/HTTPS) and in private communications (from PGP to modern encrypted messaging platforms such as Signal and WhatsApp) have made the difficulties clear: encryption is a tool for protecting confidentiality, but it cannot guarantee it.
First, some entities may have direct access to the plaintext of user transactions. In typical scenarios, users do not encrypt transactions themselves but delegate this task to a wallet provider. The wallet provider can therefore see the transaction plaintext and could use or sell this information to extract MEV. The security of encryption always depends on every entity that can access the keys. The scope of key control defines the boundary of security.
Beyond that, the biggest problem is metadata, namely the unencrypted data surrounding the encrypted payload (the transaction). Searchers can use this metadata to infer the intent of a transaction and carry out speculative MEV. Searchers do not need to fully understand the transaction, nor do they need to guess correctly every time. For example, it is enough to judge with reasonable probability that a transaction is a buy order on a specific decentralized exchange (DEX).
Metadata falls into two broad categories: classic challenges inherent to encryption, and issues unique to encrypted mempools.
Transaction size: Encryption by itself cannot hide the size of the plaintext (notably, the formal definition of semantic security explicitly excludes hiding plaintext size). This is a common attack vector in encrypted communications; in a typical case, an eavesdropper can infer in real time what is being played on Netflix from the size of each packet in the encrypted video stream. In an encrypted mempool, specific types of transactions may have distinctive sizes and thus leak information.
Broadcast time: Encryption likewise cannot hide timing information (another classic attack vector). In the Web3 context, certain senders (for example, those carrying out structured sell-offs) may submit transactions at fixed intervals. Transaction timing may also correlate with other information, such as activity on external exchanges or news events. A more subtle use of timing information is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs): a sequencer can insert a transaction of its own created as late as possible, taking advantage of the latest CEX price information, while excluding all other transactions broadcast after a certain point in time (even encrypted ones), so that only its own transaction benefits from the latest prices.
Source IP address: searchers can infer the identity of the transaction sender by monitoring peer-to-peer networks and tracing source IP addresses. This issue was identified in the early days of Bitcoin (over a decade ago). If a specific sender has a fixed behavioral pattern, this can be very valuable to searchers. For example, knowing the identity of the sender allows them to associate encrypted transactions with previously decrypted historical transactions.
Transaction sender and fee/gas information: Transaction fees are a type of metadata unique to encrypted mempools. In Ethereum, a traditional transaction includes the on-chain sender address (used to pay fees), the maximum gas budget, and the per-unit gas price the sender is willing to pay. Like source network addresses, sender addresses can be used to link multiple transactions to real-world entities, and the gas budget can hint at the intent of a transaction. For example, interacting with a specific DEX may require a recognizable, fixed amount of gas.
Sophisticated searchers may combine several of the metadata types above to predict transaction content.
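How a searcher might combine such signals can be sketched as a simple rule-based matcher. Everything below is hypothetical: the size and gas ranges are invented for illustration and were not measured on any chain, and a real searcher would more likely use statistical models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObservedTx:
    """Unencrypted metadata visible around an encrypted payload."""
    size_bytes: int
    gas_limit: int
    sender: str


# Invented fingerprints (inclusive ranges), for illustration only.
FINGERPRINTS = {
    "dex_swap": {"size_bytes": (350, 420), "gas_limit": (150_000, 200_000)},
    "token_transfer": {"size_bytes": (100, 140), "gas_limit": (21_000, 65_000)},
}


def guess_intent(tx: ObservedTx, known_traders: set) -> Optional[tuple]:
    """Return (label, confidence) if the metadata matches a fingerprint."""
    for label, fp in FINGERPRINTS.items():
        lo_s, hi_s = fp["size_bytes"]
        lo_g, hi_g = fp["gas_limit"]
        if lo_s <= tx.size_bytes <= hi_s and lo_g <= tx.gas_limit <= hi_g:
            # A sender linked to past decrypted trades raises confidence.
            confidence = "high" if tx.sender in known_traders else "medium"
            return label, confidence
    return None


tx = ObservedTx(size_bytes=388, gas_limit=180_000, sender="0xabc")
print(guess_intent(tx, known_traders={"0xabc"}))
```

The guess only needs to be right often enough for the attack to pay off on average, which is why partial leakage is already sufficient.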
In theory, all of this information can be hidden, but at a cost in performance and complexity. For example, padding transactions to a standard length hides their size but wastes bandwidth and on-chain space; adding a delay before sending hides timing but increases latency; submitting transactions through an anonymity network such as Tor hides IP addresses but brings new challenges of its own.
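The size-hiding trade-off can be made concrete with a small padding routine applied to the plaintext before encryption. This is a sketch under assumed parameters: the 256-byte bucket and the 4-byte length prefix are choices made for the example, not part of any proposal.

```python
def pad_to_bucket(payload: bytes, bucket: int = 256) -> bytes:
    """Length-prefix the payload and zero-pad it to a multiple of `bucket` bytes."""
    framed = len(payload).to_bytes(4, "big") + payload
    padded_len = -(-len(framed) // bucket) * bucket  # ceiling division
    return framed.ljust(padded_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the 4-byte length prefix."""
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]


small = pad_to_bucket(b"x" * 10)
large = pad_to_bucket(b"x" * 200)
# Both land in the same 256-byte bucket, so their sizes are indistinguishable.
print(len(small), len(large), len(small) - 10)
```

Larger buckets hide more but waste more bytes per transaction, which is the bandwidth and block-space cost described above.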
The hardest metadata to hide is transaction fee information. Encrypted fee data creates a series of problems for block builders. The first is spam: if fee data is encrypted, anyone can broadcast malformed encrypted transactions that get ordered but cannot pay fees and cannot be executed after decryption, with no one to hold accountable. This may be solvable with SNARKs proving that the transaction is well-formed and sufficiently funded, but that would significantly increase overhead.
The second is the efficiency of block construction and fee auctions. Builders rely on fee information to build profit-maximizing blocks and to determine the current market price of on-chain resources. Encrypting fee data would disrupt this process. One solution is to set a fixed fee for each block, but this is economically inefficient and may give rise to a secondary market for transaction inclusion, which defeats the purpose of the encrypted mempool. Another is to run the fee auction through secure multi-party computation or trusted hardware, but both approaches are extremely costly.
Finally, a secure encrypted mempool increases system overhead in several ways: encryption adds latency, computational load, and bandwidth consumption; it is unclear how it would integrate with important future goals such as sharding or parallel execution; it may introduce new points of failure for liveness (e.g., the decryption committee in threshold schemes or the delay function solver); and design and implementation complexity rise significantly.
Many of the issues faced by encrypted mempools resemble those faced by blockchains designed for transaction privacy (such as Zcash and Monero). If there is a silver lining, it is that solving the challenges of using encryption for MEV mitigation would also pave the way for transaction privacy.
Economic challenges faced by encrypted mempools
Encrypted mempools also face economic challenges. Unlike the technical challenges, which can be gradually mitigated with enough engineering effort, these are fundamental limitations that are extremely difficult to solve.
The core issue of MEV stems from the information asymmetry between those who create transactions (users) and those who extract MEV from them (searchers and block builders). Users typically do not know how much extractable value their transactions contain, so even with a perfect encrypted mempool they may be induced to disclose their decryption keys in exchange for a reward that is lower than the actual MEV value, a phenomenon known as 'incentivized decryption.'
This scenario is not hard to imagine, because similar mechanisms such as MEV-Share already exist. MEV-Share is an order flow auction that lets users selectively submit transaction information to a pool, where searchers compete for the right to exploit the MEV opportunities in those transactions. After extracting the MEV, the winning bidder returns part of the profit (i.e., the bid amount or a certain percentage) to the user.
This model can be adapted directly to encrypted mempools: users would disclose their decryption keys (or partial information) in order to participate. However, most users are unaware of the opportunity cost of participating in such mechanisms; they see only the immediate reward and are happy to disclose the information. Similar cases exist in traditional finance: for example, the zero-commission trading platform Robinhood profits by selling user order flow to third parties through 'payment for order flow'.
Another possible scenario is that large builders force users to disclose transaction content (or related information), citing censorship requirements. Censorship resistance is an important and controversial topic in Web3. If large validators or builders are legally bound (for example, by regulations of the U.S. Office of Foreign Assets Control (OFAC)) to enforce a censorship list, they may refuse to process any encrypted transactions. Technically, users might be able to prove with zero-knowledge proofs that their encrypted transactions comply with the censorship requirements, but this would add cost and complexity. Even if the blockchain has strong censorship resistance (guaranteeing that encrypted transactions are included), builders may still place known plaintext transactions at the front of the block and encrypted transactions at the end. Transactions that need execution priority may therefore ultimately be forced to disclose their content to builders.
Other efficiency-related challenges
Encrypted mempools increase system overhead in several obvious ways. Users need to encrypt transactions, and the system must decrypt them in some manner, which increases computational cost and may also increase transaction size. As mentioned earlier, handling metadata further exacerbates these overheads. However, there are also efficiency costs that are less obvious. In finance, a market is considered efficient if prices reflect all available information; delays and information asymmetries make markets inefficient. Encrypted mempools inevitably introduce both.
One direct consequence of this inefficiency is increased price uncertainty, which results from the additional delay introduced by the encrypted mempool. More transactions may therefore fail because they exceed their price slippage tolerance, wasting on-chain space.
Similarly, this price uncertainty may give rise to speculative MEV transactions that attempt to profit from on-chain arbitrage. Notably, encrypted mempools may make such opportunities more prevalent: because of execution delays, the current state of decentralized exchanges (DEXs) becomes less certain, which likely reduces market efficiency and creates price discrepancies between trading venues. Such speculative MEV transactions also waste block space, since they often abort once they find no arbitrage opportunity.
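The link between delay and failed transactions can be illustrated with a toy price model. The model is an assumption introduced here, not a claim from the article: the price move over the delay is treated as a driftless Gaussian whose standard deviation grows with the square root of time, and a trade fails when the adverse move exceeds the user's slippage tolerance. The volatility figure is hypothetical.

```python
import math


def slippage_failure_prob(tolerance: float, vol_per_sqrt_sec: float,
                          delay_sec: float) -> float:
    """P(adverse relative price move > tolerance) after `delay_sec` seconds."""
    if delay_sec <= 0:
        return 0.0
    sigma = vol_per_sqrt_sec * math.sqrt(delay_sec)
    # One-sided Gaussian tail: P(Z > k) = 0.5 * erfc(k / sqrt(2)).
    return 0.5 * math.erfc(tolerance / (sigma * math.sqrt(2)))


# Hypothetical parameters: 0.5% tolerance, 0.05% volatility per sqrt-second.
for delay in (1, 12, 60):
    print(delay, slippage_failure_prob(0.005, 0.0005, delay))
```

Under this model the failure probability rises monotonically with the delay between signing a transaction and executing it, whatever the specific parameters.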
Summary
This article has outlined the challenges faced by encrypted mempools so that attention can also be directed toward developing other solutions. Even so, encrypted mempools may still become part of the solution to MEV.
One feasible idea is a hybrid design: some transactions are ordered 'blindly' through an encrypted mempool, while others use a different ordering scheme. This may suit specific types of transactions, such as buy and sell orders from large market participants who are able to encrypt and pad their transactions carefully and are willing to pay higher costs to avoid MEV. It is also of practical value for highly sensitive transactions, such as fixes for contracts with security vulnerabilities.
However, because of technical limitations, high engineering complexity, and performance overhead, the encrypted mempool is unlikely to become the 'universal solution' to MEV that people hope for. The community needs to develop other solutions, including MEV auctions, application-layer defenses, and shorter finality times. MEV will remain a challenge for some time, and in-depth research is needed to find the right balance among the various solutions and address its negative effects.