Written by: Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z

Compiled by: Saoirse, Foresight News

In blockchains, maximal extractable value (MEV) is the maximum value that can be earned by choosing which transactions to include in a block, which to exclude, and in what order to place them. MEV exists on most blockchains and has long been a widely discussed topic in the industry.

Note: This article assumes that readers have a basic understanding of MEV. Readers new to the topic may want to read our MEV explainer first.

Many researchers observing MEV have asked an obvious question: can cryptography solve this problem? One proposal is the encrypted memory pool (encrypted mempool): users broadcast encrypted transactions, which are decrypted only after ordering is complete. The consensus protocol must then choose the transaction order 'blindly', which seems to prevent anyone from profiting from MEV opportunities during the ordering phase.

Unfortunately, from both a practical and a theoretical perspective, the encrypted memory pool is unlikely to provide a universal solution to the MEV problem. This article explains the difficulties involved and explores feasible design directions for the encrypted memory pool.

How the encrypted memory pool works

There are many proposals for encrypted memory pools, but the general framework is as follows:

  1. Users broadcast encrypted transactions.

  2. The encrypted transactions are committed on-chain (in some proposals, they must first undergo a verifiable random shuffle).

  3. Once the block containing these transactions is finalized, the transactions are decrypted.

  4. Finally, these transactions are executed.
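The four steps above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: `toy_encrypt` is a hypothetical XOR keystream cipher that is not secure, a single shared key stands in for whatever decryption mechanism a real proposal would use, and `run_pipeline` and `shuffle_seed` are names invented for this illustration.

```python
import hashlib
import random


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def run_pipeline(plaintexts, key, shuffle_seed):
    # Step 1: users broadcast ciphertexts only.
    mempool = [toy_encrypt(key, tx) for tx in plaintexts]
    # Step 2: the proposer fixes an order without seeing any content
    # (modeled here as a seeded shuffle of the ciphertexts).
    block = list(mempool)
    random.Random(shuffle_seed).shuffle(block)
    # Step 3: only after the block is finalized is the key used to decrypt.
    decrypted = [toy_encrypt(key, c) for c in block]
    # Step 4: the transactions would now be executed in block order.
    return block, decrypted


txs = [b"swap 10 ETH for DAI", b"transfer 5 DAI", b"mint NFT"]
block, executed = run_pipeline(txs, key=b"demo-key", shuffle_seed=7)
```

The ordering is decided while only `block` (ciphertexts) is visible; the plaintexts in `executed` appear after the order is already fixed.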

Step 3 (transaction decryption) raises key questions: who is responsible for decryption, and what happens if decryption fails? A simple idea is to let users decrypt their own transactions (in which case encryption is not even necessary; a hiding commitment is sufficient). However, this approach has a vulnerability: attackers can carry out speculative MEV.

In speculative MEV, an attacker guesses that a particular encrypted transaction contains an MEV opportunity, then encrypts their own transaction and tries to get it into a favorable position (e.g., immediately before or after the target transaction). If the transactions end up ordered as hoped, the attacker decrypts and extracts MEV through their own transaction; if not, they refuse to decrypt, and their transaction is not included in the final chain.
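A minimal sketch of this free option, assuming a plain hash-based commit-and-reveal scheme in which users open their own commitments. The functions `commit` and `settle` and the example transactions are hypothetical names chosen for illustration, not part of any specific proposal.

```python
import hashlib


def commit(tx: bytes, nonce: bytes) -> bytes:
    """Hiding commitment to a transaction (the nonce must be random in practice)."""
    return hashlib.sha256(nonce + tx).digest()


def settle(ordered_commitments, openings):
    """Execute, in order, only the commitments whose owners revealed a valid opening."""
    executed = []
    for c in ordered_commitments:
        opening = openings.get(c)
        if opening is not None and commit(*opening) == c:
            executed.append(opening[0])
    return executed


victim = (b"victim: buy 100 TOKEN", b"nonce-1")
attacker = (b"attacker: buy TOKEN first", b"nonce-2")
c_victim, c_attacker = commit(*victim), commit(*attacker)

# Favorable order (attacker lands first): the attacker reveals and front-runs.
favorable = settle([c_attacker, c_victim],
                   {c_victim: victim, c_attacker: attacker})

# Unfavorable order: the attacker simply withholds the opening and loses nothing.
unfavorable = settle([c_victim, c_attacker], {c_victim: victim})
```

Because the attacker chooses whether to reveal after seeing the order, the attempt costs nothing when it fails.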

Penalties could be imposed on users who fail to decrypt, but such a mechanism is extremely hard to implement. The penalty must be the same for all encrypted transactions (since they are indistinguishable once encrypted), and it must be severe enough to deter speculative MEV even against high-value targets. This would lock up a large amount of capital, and those funds would need to remain anonymous (to avoid revealing the link between transactions and users). Worse, honest users who cannot decrypt because of a software bug or network failure would also lose their funds.
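To see why the penalty has to be large, consider a deliberately simple model that is an assumption of this sketch and not taken from any proposal: the attacker gets a favorable ordering with probability `p_favorable`, earns `mev_value` when that happens, and otherwise withholds decryption and forfeits a uniform `bond`.

```python
def expected_attack_profit(mev_value: float, bond: float, p_favorable: float) -> float:
    """Expected profit of one speculative attempt under the toy model."""
    return p_favorable * mev_value - (1 - p_favorable) * bond


def min_deterrent_bond(mev_value: float, p_favorable: float) -> float:
    """Smallest bond at which the expected profit of speculating drops to zero."""
    if not 0 <= p_favorable < 1:
        raise ValueError("p_favorable must be in [0, 1)")
    return p_favorable * mev_value / (1 - p_favorable)


# Hypothetical numbers: a 100,000-unit opportunity and a 50% chance of
# landing in the right position.
bond_needed = min_deterrent_bond(100_000, 0.5)
profit_with_small_bond = expected_attack_profit(100_000, 1_000, 0.5)
```

In this model the bond is sized by the most valuable target an attacker might chase, yet every user must post it, including those sending small transactions.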

Therefore, most proposals require that encrypted transactions be guaranteed to become decryptable at some point in the future, even if the originating user is offline or refuses to cooperate. This can be achieved in several ways:

Trusted Execution Environments (TEEs): Users can encrypt transactions to a key held securely by a trusted execution environment (TEE). In basic versions, the TEE is used only to decrypt transactions after a specific point in time (which requires the TEE to have a notion of time). More complex schemes have the TEE decrypt transactions and build the block, ordering transactions by arrival time, fees, and other criteria. Compared with other encrypted memory pool schemes, the advantage of a TEE is that it operates directly on plaintext transactions, so it can reduce on-chain bloat by filtering out transactions that would revert. The drawback of this approach is its reliance on hardware trust.

Secret-sharing and threshold encryption: In this scheme, users encrypt transactions to a key that is jointly held by a committee (usually a subset of validators). Decryption requires a threshold to be met (e.g., two-thirds of the committee members cooperating).
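The threshold idea can be illustrated with Shamir secret sharing, sketched below under stated assumptions: the field modulus `PRIME`, the committee size of 6, and the threshold of 4 (two-thirds) are choices made for this example. Real threshold decryption schemes typically combine partial decryptions without ever reconstructing the key in one place; this sketch reconstructs the secret only to keep the mechanics visible.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as the field modulus


def split(secret: int, n: int, t: int):
    """Split secret into n shares such that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def combine(shares) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shares = split(key, n=6, t=4)   # 6 committee members, threshold 4
recovered = combine(shares[:4])  # any 4 shares are enough
```

Any four members can recover the key, while three or fewer learn nothing about it from their shares alone.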

With threshold decryption, trust shifts from hardware to the committee. Supporters argue that since most protocols already assume an honest majority of validators for consensus, we can similarly assume that a majority of validators will remain honest and will not decrypt transactions early.

However, there is a key distinction: these two trust assumptions are not the same. Consensus failures such as forks are publicly visible (a 'weak trust assumption'), while a malicious committee that decrypts transactions in secret leaves no public evidence; such an attack can be neither detected nor punished (a 'strong trust assumption'). Therefore, even though the security assumptions of the consensus mechanism and of the decryption committee look aligned on the surface, in practice the assumption that 'the committee will not collude' is much less credible.

Time-lock and delay encryption: As an alternative to threshold encryption, delay encryption has users encrypt transactions to a public key whose corresponding private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that encapsulates a secret that cannot be revealed until a preset amount of time has passed; more precisely, solving it requires repeatedly executing a chain of computations that cannot be parallelized. Anyone can solve the puzzle to obtain the key and decrypt the transactions, but only after completing a slow, inherently sequential computation that is designed to take long enough that transactions cannot be decrypted before finalization. In the strongest form of this primitive, delay encryption, such puzzles are generated publicly; it can also be approximated by a trusted committee using time-lock encryption, although at that point its advantages over threshold encryption are debatable.
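One classic way to build such a puzzle is repeated squaring modulo an RSA-style modulus, sketched below. The article does not name a construction, so this choice is an assumption of the example, as are the tiny parameters: the two Mersenne primes `P` and `Q` are publicly known and far too small, and a real puzzle would use secret, randomly generated primes and a much larger number of squarings.

```python
# Toy parameters (assumptions for illustration, not secure).
P = 2**61 - 1  # Mersenne prime
Q = 2**89 - 1  # Mersenne prime


def make_puzzle(secret: int, t: int, base: int = 2):
    """Creator side: knowing phi(N) gives a shortcut, so creation is fast."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    exponent = pow(2, t, phi)        # reduce 2^t modulo phi(N)
    mask = pow(base, exponent, n)    # equals base^(2^t) mod N
    return n, base, t, (secret + mask) % n


def solve_puzzle(n: int, base: int, t: int, masked: int) -> int:
    """Solver side: without phi(N), t squarings must be done one after another."""
    x = base % n
    for _ in range(t):
        x = x * x % n                # each step depends on the previous one
    return (masked - x) % n


puzzle = make_puzzle(secret=42, t=10_000)
opened = solve_puzzle(*puzzle)
```

Here the hidden value `42` stands in for a decryption key. The delay comes from the loop in `solve_puzzle`: the squarings cannot be spread across parallel machines, so wall-clock time is tied to `t` and to the speed of a single processor.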

Whether the puzzles come from delay encryption or from a trusted committee, such schemes face several practical challenges. First, because the delay is fundamentally a computation, it is hard to guarantee precise decryption timing. Second, these schemes rely on some entity running high-performance hardware to solve the puzzles efficiently; anyone can take on this role, but it is unclear how to incentivize participation. Lastly, in such designs every broadcast transaction is eventually decrypted, including those that are never included in a finalized block. In contrast, schemes based on threshold encryption (or witness encryption) can decrypt only the transactions that were actually included.

Witness Encryption: The last and most advanced cryptographic approach uses 'witness encryption'. In theory, witness encryption lets information be encrypted so that only someone who knows a 'witness' for a specific NP relation can decrypt it. For example, information can be encrypted so that only someone who can solve a particular Sudoku puzzle, or who can provide the preimage of a particular hash value, can decrypt it.

(Note: An NP relation pairs a 'problem' with 'answers that can be verified quickly.')
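The hash-preimage example can be written down as an NP relation: a fast check that a claimed witness really answers the problem. The sketch below shows only that check. It does not implement witness encryption, and the function name `is_valid_witness` and the sample values are hypothetical.

```python
import hashlib


def is_valid_witness(statement: bytes, witness: bytes) -> bool:
    """NP relation for hash preimages: verification takes a single hash."""
    return hashlib.sha256(witness).digest() == statement


# The 'problem' is public; finding a witness by brute force is assumed infeasible.
statement = hashlib.sha256(b"correct horse battery staple").digest()

accepted = is_valid_witness(statement, b"correct horse battery staple")
rejected = is_valid_witness(statement, b"wrong guess")
```

A witness encryption scheme would tie decryption to knowing an input for which this check succeeds.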

SNARKs can likewise be constructed for any NP relation. One could say that witness encryption encrypts data so that it can be decrypted only by an entity able to produce a SNARK proving that a specific condition is met. In an encrypted memory pool, a typical condition of this kind is that transactions can be decrypted only after the block has been finalized.

This is a highly promising theoretical primitive. In fact, it is a generalization, of which committee-based and delay-based approaches are special cases. Unfortunately, no practical witness encryption scheme currently exists. Moreover, even if one did, it is hard to say whether it would have any advantage over committee-based approaches on proof-of-stake chains. Even if witness encryption is set up so that decryption is possible only after the transactions have been ordered in a finalized block, a malicious committee could still simulate the consensus protocol in private to fabricate finality for the transactions, then use this private chain as the 'witness' to decrypt them. At that point, threshold decryption by the same committee would achieve the same level of security while being much simpler to operate.

In proof-of-work consensus protocols, however, the advantage of witness encryption is clearer: even a completely malicious committee cannot privately mine multiple new blocks on top of the current chain head to fabricate finality.

Technical challenges faced by the encrypted memory pool

Several practical challenges limit the ability of the encrypted memory pool to prevent MEV. In general, keeping information confidential is inherently difficult. Encryption is not yet widely used in the Web3 space, but decades of experience deploying it on the web (such as TLS/HTTPS) and in private communications (from PGP to modern encrypted messaging platforms like Signal and WhatsApp) have made the difficulties clear: encryption is a tool for protecting confidentiality, but it cannot guarantee it.

First, certain entities may directly access the plaintext information of user transactions. In a typical scenario, users usually do not encrypt transactions themselves but delegate this task to wallet service providers. As a result, wallet service providers can access the transaction plaintext and may even utilize or sell this information to extract MEV. The security of encryption always depends on all entities that can access the keys. The extent of key control defines the boundary of security.

Moreover, the biggest issue lies in metadata: the unencrypted data surrounding the encrypted payload (the transaction). Searchers can use this metadata to infer the intent of a transaction and carry out speculative MEV. Note that searchers do not need to fully understand the transaction content, nor do they need to guess correctly every time. For example, being able to infer with reasonable probability that a transaction is a buy order on a specific decentralized exchange (DEX) is enough to launch an attack.

We can divide metadata into two categories: classic problems inherent to encryption, and problems unique to the encrypted memory pool.

  • Transaction size: Encryption itself cannot hide the size of plaintext (notably, the formal definition of semantic security explicitly excludes the hiding of plaintext size). This is a common attack vector in encrypted communication; a typical case is that even after encryption, an eavesdropper can still determine the content being played on Netflix in real time by the size of each packet in the video stream. In the encrypted memory pool, specific types of transactions may have unique sizes, thereby leaking information.

  • Broadcast time: Encryption cannot hide timing either (another classic attack vector). In Web3, certain senders (for example, in structured sell-offs) may submit transactions at regular intervals. Timing may also correlate with other information, such as activity on external exchanges or news events. A subtler use of timing is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs): a sequencer can exploit the latest CEX prices by inserting its own transaction as late as possible while excluding every other transaction broadcast after a certain point in time (even encrypted ones), so that only its transaction benefits from the freshest price.

  • Source IP address: Searchers can infer the identity of transaction senders by monitoring the peer-to-peer network and tracing source IP addresses. This issue was identified early in Bitcoin's history (over a decade ago). If a particular sender has a consistent behavioral pattern, this information is highly valuable to searchers. For example, knowing the sender's identity lets them link encrypted transactions to historical transactions that have already been decrypted.

  • Transaction sender and fee/gas information: Fee information is a type of metadata unique to encrypted memory pools. In Ethereum, a traditional transaction includes the on-chain sender address (which pays the fees), a maximum gas budget, and the gas price the sender is willing to pay. Like the source network address, the sender address can be used to link multiple transactions to a real-world entity; the gas budget can hint at the intent of the transaction. For example, interacting with a specific DEX may require a recognizable, fixed amount of gas.

Sophisticated searchers may combine several of the metadata types above to predict transaction content.
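A rough sketch of how such a combination might work follows. Every number and name in it is invented for illustration: the size and gas ranges in `FINGERPRINTS`, the address in `KNOWN_TRADERS`, and the labels do not describe any real protocol or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    size_bytes: int   # ciphertext length, visible despite encryption
    gas_limit: int    # fee field left in the clear
    sender: str       # fee-paying address left in the clear


# Hypothetical fingerprints: (size range, gas-limit range) per transaction type.
FINGERPRINTS = {
    "dex_swap": (range(300, 400), range(150_000, 200_000)),
    "token_transfer": (range(100, 150), range(21_000, 70_000)),
}
# Hypothetical address previously linked to large trades.
KNOWN_TRADERS = {"0xabc"}


def guess_intent(meta: Metadata) -> str:
    """Guess the transaction type from unencrypted metadata alone."""
    for label, (sizes, gas_limits) in FINGERPRINTS.items():
        if meta.size_bytes in sizes and meta.gas_limit in gas_limits:
            if label == "dex_swap" and meta.sender in KNOWN_TRADERS:
                return "likely_large_dex_swap"
            return label
    return "unknown"


guess = guess_intent(Metadata(size_bytes=356, gas_limit=180_000, sender="0xabc"))
```

The guess never touches the encrypted payload, and it only needs to be right often enough for speculation to pay.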

In theory, all of this information can be hidden, but at a cost in performance and complexity. For example, padding transactions to a standard length hides their size but wastes bandwidth and on-chain space; adding a delay before sending hides timing but increases latency; submitting transactions through an anonymity network such as Tor hides IP addresses but introduces new challenges of its own.
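The size-versus-bandwidth trade-off of padding can be made concrete with a small sketch. The bucket sizes in `BUCKETS` and the 4-byte length prefix are assumptions of this example; padding would be applied to the plaintext before encryption.

```python
BUCKETS = (256, 1024, 4096)  # hypothetical standard sizes, in bytes


def pad(payload: bytes) -> bytes:
    """Length-prefix the payload and zero-pad it to the smallest fitting bucket."""
    framed = len(payload).to_bytes(4, "big") + payload
    for size in BUCKETS:
        if len(framed) <= size:
            return framed + b"\x00" * (size - len(framed))
    raise ValueError("payload too large for any bucket")


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the length prefix."""
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]


transfer = b"t" * 110   # stand-in for a small transaction
swap = b"s" * 230       # stand-in for a larger transaction

same_size = len(pad(transfer)) == len(pad(swap))
wasted_bytes = len(pad(transfer)) - len(transfer)
```

Both transactions now occupy 256 bytes, so their size no longer distinguishes them, but the smaller one carries 146 bytes of overhead that still consume bandwidth and block space.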

The hardest metadata to hide is fee information. Encrypting fee data creates a series of problems for block builders. The first is spam: if fee data is encrypted, anyone can broadcast malformed encrypted transactions that get ordered but cannot pay fees, so they fail to execute after decryption and no one can be held accountable. This may be solvable with SNARKs that prove a transaction is well-formed and sufficiently funded, but doing so would significantly increase overhead.

The second problem is the efficiency of block building and fee auctions. Builders rely on fee information to construct profit-maximizing blocks and to determine the current market price of on-chain resources. Encrypting fee data disrupts this process. One solution is to set a fixed fee for each block, but this is economically inefficient and may create a secondary market for transaction inclusion, which defeats the purpose of the encrypted memory pool. Another solution is to run the fee auction through secure multi-party computation or trusted hardware, but both approaches are extremely costly.

Finally, a secure encrypted memory pool adds system overhead in several ways. Encryption increases latency, computational load, and bandwidth consumption. It is currently unclear how it would combine with important future goals such as sharding or parallel execution. It may introduce new points of failure for liveness (such as the decryption committee in threshold schemes or the solvers of delay functions). Design and implementation complexity also increase significantly.

Many of the problems faced by the encrypted memory pool are similar to those faced by blockchains that aim to provide transaction privacy (such as Zcash and Monero). If there is a silver lining, it is that solving the challenges of using encryption for MEV mitigation would also clear obstacles for transaction privacy.

Economic challenges faced by the encrypted memory pool

Lastly, the encrypted memory pool also faces economic challenges. Unlike technical challenges, which can be gradually alleviated through sufficient engineering investment, these economic challenges represent fundamental limitations that are extremely difficult to resolve.

The core issue of MEV stems from the information asymmetry between those who create transactions (users) and those who extract MEV (searchers and block builders). Users are often unaware of how much extractable value their transactions contain; thus, even with a perfect encrypted memory pool, they may be induced to leak their decryption keys in exchange for a reward below the actual MEV value, a phenomenon referred to as 'incentivized decryption.'

Such scenarios are not hard to imagine, as similar mechanisms such as MEV-Share already exist. MEV-Share is an order flow auction mechanism that lets users selectively submit transaction information to a pool, where searchers compete for the right to exploit the MEV opportunity in that transaction. After extracting the MEV, the winning bidder returns a portion of the proceeds (i.e., the bid amount or a percentage of it) to the user.
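The arithmetic of such an auction can be sketched as follows. This is a simplified model rather than the actual MEV-Share implementation: the function `settle_auction`, the integer `refund_percent`, and all amounts are hypothetical, and where the rest of the bid goes is left out.

```python
def settle_auction(bids: dict, mev_value: int, refund_percent: int):
    """Pick the highest bidder and split the proceeds (toy model).

    Returns (winner, user_refund, searcher_profit).
    """
    winner = max(bids, key=bids.get)
    bid = bids[winner]
    user_refund = bid * refund_percent // 100  # share of the bid paid back to the user
    searcher_profit = mev_value - bid          # what the winner keeps after paying the bid
    return winner, user_refund, searcher_profit


# Hypothetical numbers: the transaction carries 100 units of MEV.
winner, user_refund, searcher_profit = settle_auction(
    bids={"searcher_a": 40, "searcher_b": 60},
    mev_value=100,
    refund_percent=90,
)
value_given_up = 100 - user_refund
```

In this example the user receives 54 units and gives up the remaining 46 without necessarily knowing the transaction was worth 100, which is the information asymmetry described above.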

This model adapts directly to the encrypted memory pool: to participate, users must disclose their decryption keys (or partial information). However, most users do not understand the opportunity cost of participating in such mechanisms; they see only the immediate reward and are willing to leak information. Similar cases exist in traditional finance: for instance, the commission-free trading platform Robinhood profits by selling its users' order flow to third parties, a practice known as 'payment for order flow.'

Another possible scenario is that large builders force users to disclose transaction contents (or related information) on the grounds of censorship compliance. Censorship resistance is an important and controversial topic in the Web3 space; however, if large validators or builders are legally required (for example, by the regulations of the U.S. Office of Foreign Assets Control, OFAC) to enforce a censorship list, they may refuse to process any encrypted transaction. Technically, users could use zero-knowledge proofs to show that their encrypted transactions comply with the censorship requirements, but this adds cost and complexity. Even if the blockchain has strong censorship resistance (guaranteeing that encrypted transactions are included), builders may still place known plaintext transactions at the front of the block and encrypted transactions at the end. Therefore, users who need execution priority may ultimately be forced to disclose their transaction contents to builders.

Other efficiency-related challenges

An encrypted memory pool increases system overhead in several obvious ways. Users need to encrypt transactions, and the system must decrypt them in some way, which increases computational costs and may also increase transaction size. As mentioned above, handling metadata exacerbates these overheads further. However, there are also efficiency costs that are less obvious. In finance, a market is considered efficient if prices reflect all available information; delays and information asymmetry make markets inefficient, and the encrypted memory pool inevitably introduces both.

Such inefficiencies lead to a direct consequence: increased price uncertainty, which is a direct product of the additional delays introduced by the encrypted memory pool. As a result, the number of transactions that fail due to exceeding the price slippage tolerance may increase, further wasting on-chain space.
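The slippage failure can be illustrated with a toy constant-product pool. The model is an assumption of this sketch (no fees, integer arithmetic, hypothetical reserves and trade sizes) and does not describe any specific DEX.

```python
def swap_out(x_reserve: int, y_reserve: int, dx: int) -> int:
    """Output amount of a fee-less constant-product (x * y = k) swap."""
    return y_reserve * dx // (x_reserve + dx)


def execute(x_reserve: int, y_reserve: int, dx: int, min_out: int):
    """Return the output amount, or None if the slippage check fails."""
    out = swap_out(x_reserve, y_reserve, dx)
    return out if out >= min_out else None


# The user quotes against the pool state visible at signing time
# and accepts at most 1% slippage.
quote = swap_out(1_000_000, 1_000_000, 10_000)
min_out = quote * 99 // 100

# No delay: the swap executes against the quoted state.
immediate = execute(1_000_000, 1_000_000, 10_000, min_out)

# With delay: another 20,000-unit trade in the same direction lands first.
other_out = swap_out(1_000_000, 1_000_000, 20_000)
delayed = execute(1_000_000 + 20_000, 1_000_000 - other_out, 10_000, min_out)
```

The delayed swap falls below `min_out` and fails, yet it still occupies space in the block.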

Likewise, this price uncertainty may give rise to speculative MEV transactions that attempt to profit from on-chain arbitrage. The encrypted memory pool may make such opportunities more common: because of execution delays, the current state of decentralized exchanges (DEXs) becomes less certain, which is likely to reduce market efficiency and create price discrepancies between trading venues. Such speculative MEV transactions also waste block space, since they often abort when no arbitrage opportunity is found.

Summary

This article set out to outline the challenges facing the encrypted memory pool so that people can redirect their efforts toward other solutions. Even so, the encrypted memory pool may still become part of the approach to mitigating MEV.

One feasible idea is a hybrid design: some transactions are ordered 'blindly' through the encrypted memory pool, while others use alternative ordering schemes. A hybrid design may suit specific types of transactions, such as buy and sell orders from large market participants who can carefully encrypt or pad their transactions and are willing to pay higher costs to avoid MEV. It also makes practical sense for highly sensitive transactions, such as security fixes for vulnerable contracts.

However, due to technical limitations, high engineering complexity, and performance overhead, the encrypted memory pool is unlikely to become the 'universal solution' for MEV that people hope for. The community needs to develop other solutions, including MEV auctions, application-layer defense mechanisms, and shorter finality times. MEV will remain a challenge for the foreseeable future, and in-depth research is needed to find the right balance among these solutions in addressing its negative impacts.