Original title: Possible futures for the Ethereum protocol, part 2: The Surge
Author: Vitalik, founder of Ethereum, compiled by: Deng Tong, Golden Finance
Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc and Georgios Konstantopoulos
In the beginning, there were two expansion strategies in the Ethereum roadmap. One of them is "sharding": each node only needs to verify and store a small part of the transactions, instead of verifying and storing all the transactions in the chain. This is also how any other peer-to-peer network (e.g. BitTorrent) works, so we can certainly make blockchains work the same way. The other is layer 2 protocols: networks that sit on top of Ethereum, allowing them to fully benefit from its security while keeping most data and computation off the main chain. "Layer 2 protocols" referred to state channels in 2015, Plasma in 2017, and rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a lot of on-chain data bandwidth. Fortunately, by 2019, sharding research had solved the problem of verifying "data availability" at scale. As a result, the two paths converged and we got a rollup-centric roadmap, which is still Ethereum's scaling strategy today.
The Surge, 2023 roadmap edition.
The rollup-centric roadmap proposes a simple division of labor: Ethereum L1 focuses on becoming a strong and decentralized base layer, while L2 is tasked with helping the ecosystem scale. This is a recurring pattern across society: the court system (L1) is not designed to be super fast and efficient, but to protect contracts and property rights, while entrepreneurs (L2) need to build on top of this solid base layer and take humans to (metaphorically and literally) Mars.
This year, the rollup-centric roadmap has had important successes: Ethereum L1 data bandwidth has increased significantly via EIP-4844 blobs, and multiple EVM rollups are now at stage 1. A very heterogeneous and diverse implementation of sharding, where each L2 acts as a "shard" with its own internal rules and logic, is now a reality. But as we've seen, taking this path has some unique challenges of its own. So now our task is to complete the rollup-centric roadmap and address these issues while retaining the robustness and decentralization that make Ethereum L1 special.
Surge: Key Goals
100,000+ TPS on L1+L2
Maintain decentralization and robustness of L1
At least some L2s fully inherit the core properties of Ethereum (trustlessness, openness, censorship resistance)
Maximum interoperability between L2s. Ethereum should feel like one ecosystem, not 34 different blockchains.
The Scalability Trilemma
The scalability trilemma is an idea proposed in 2017 that posits that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost of running a node), scalability (more specifically: processing a large number of transactions), and security (more specifically: an attacker would need to compromise a majority of the nodes in the entire network to make a single transaction fail).
It's worth noting that the trilemma is not a theorem, and the post introducing the trilemma does not come with a mathematical proof. It gives a heuristic mathematical argument: if a decentralization-friendly node (e.g. a consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction will only be seen by 1/k of the nodes, meaning an attacker only needs to compromise a few nodes to push through a bad transaction, or (ii) your nodes will have to become powerful machines, and your chain is not decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard: it requires somehow thinking outside the box that the argument implies.
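To restate that heuristic compactly (my own restatement in symbols; the argument above gives it only in prose):

```latex
% Heuristic trilemma argument, restated. n = number of nodes, N = tx/sec that one
% decentralization-friendly node can verify, and the chain processes k*N tx/sec.
\begin{align*}
\text{total verification work per second} &= kN \\
\text{total work the node set can do per second} &\approx nN \\
\Rightarrow \ \text{nodes checking each transaction} &\approx \frac{nN}{kN} = \frac{n}{k}
\end{align*}
% So either each transaction is checked by only ~n/k nodes (compromising that many
% nodes lets an attacker push a bad transaction through), or each node must do
% ~k times more work than a consumer machine can handle.
```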
For years, some high-performance chains have often claimed that they solved the trilemma without doing anything clever at the infrastructure level, usually by using software engineering tricks to optimize nodes. This was always misleading, and running a node in such chains was always much harder than in Ethereum. This post explores many of the subtleties of why this is the case (and why L1 client software engineering alone cannot scale Ethereum itself).
However, the combination of data availability sampling and SNARKs does solve the trilemma: it allows a client to verify that some amount of data is available, and that some number of computational steps were executed correctly, while downloading only a small fraction of that data and running a much smaller computation. SNARKs are trustless. Data availability sampling has a subtle few-of-N trust model, but it preserves the fundamental property that non-scalable chains have: even a 51% attack cannot force the network to accept bad blocks.
Another approach to solving the trilemma is the Plasma architecture, which uses clever techniques to push the responsibility of watching over data availability onto users in an incentive-compatible way. Back in 2017-2019, when all we had to scale computation with was fraud proofs, Plasma was very limited in what it could do securely, but the mainstreaming of SNARKs has made the Plasma architecture viable for a much wider range of use cases than before.
Further progress on data availability sampling
What problem are we solving?
Since the Dencun upgrade went live on March 13, 2024, the Ethereum blockchain has had three "blobs" of ~125 kB per 12-second slot, or ~375 kB of data availability bandwidth per slot. Assuming transaction data is posted directly to the chain, and an ERC20 transfer is ~180 bytes, the maximum TPS for rollups on Ethereum is:
375000 / 12 / 180 = 173.6 TPS
If we add Ethereum's calldata (theoretical maximum: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS from blobs.
This is a significant improvement over Ethereum L1, but it's not enough. We want more scalability. Our medium-term goal is 16MB per slot, which, when combined with improvements in rollup data compression, will give us about 58,000 TPS.
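For concreteness, here is a small sketch reproducing the blob-based arithmetic above (and the ~7,407 TPS figure that appears in the data compression section below), assuming 12-second slots and ~180 bytes per ERC20 transfer:

```python
# Back-of-the-envelope rollup throughput estimates, reproducing the blob-based
# numbers in the text. Assumptions: 12-second slots, ~180 bytes per ERC20 transfer.
SLOT_SECONDS = 12
BYTES_PER_TRANSFER = 180
BLOB_BYTES = 125_000  # ~125 kB per blob

def rollup_tps(data_bytes_per_slot: int) -> float:
    """TPS supported by a given amount of data bandwidth per slot."""
    return data_bytes_per_slot / SLOT_SECONDS / BYTES_PER_TRANSFER

print(rollup_tps(3 * BLOB_BYTES))                              # ~173.6 TPS (3 blobs, post-Dencun)
print(rollup_tps(8 * BLOB_BYTES), rollup_tps(16 * BLOB_BYTES)) # ~463-926 TPS (PeerDAS target of 8-16 blobs)
print(rollup_tps(16_000_000))                                  # ~7407 TPS (16 MB per slot target)
```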
What is it and how does it work?
PeerDAS is a relatively simple implementation of "1D sampling". Each blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast "shares" of the polynomial, where each share consists of 16 evaluations at 16 adjacent coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with the currently proposed parameters: any 64 of the 128 possible samples) can recover the blob.
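As a scaled-down toy illustration of why any half of the evaluations suffice to recover a blob (this is plain Lagrange interpolation over a tiny prime field, not the real KZG/Reed-Solomon machinery or the real parameters):

```python
# Toy version: a "blob" is the evaluation of a low-degree polynomial, so any
# "degree + 1" evaluations recover it. Real blobs use degree < 4096 over a 253-bit
# prime field; here we use degree < 4, 8 evaluation points, and p = 97.
P = 97

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# A toy "blob": 4 field elements = evaluations of a degree-<4 polynomial at x = 0..3.
blob = [17, 42, 5, 88]
extended = [lagrange_eval(list(enumerate(blob)), x) for x in range(8)]  # extend to 8 points (2x redundancy)

# Any 4 of the 8 evaluations are enough to recover the original blob (here: points 1, 3, 6, 7).
sample = [(x, extended[x]) for x in (1, 3, 6, 7)]
recovered = [lagrange_eval(sample, x) for x in range(4)]
assert recovered == blob
```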
PeerDAS works by having each client listen to a small number of subnets, where the i-th subnet broadcasts the i-th sample of any blob, and additionally requests the desired blobs on other subnets by asking peers in the global p2p network (who may listen to different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without the additional requesting peer layer. The current proposal is for nodes participating in proof-of-stake to use SubnetDAS, and for other nodes (i.e. "clients") to use PeerDAS.
In theory, we could extend 1D sampling quite far: if we increase the blob count max to 256 (thus, a target of 128), then we'd hit our 16MB target, and data availability sampling would only cost each node 16 samples * 128 blobs * 512 bytes per sample per blob = 1MB of data bandwidth per slot. This is just within our tolerance: it's doable, but it means bandwidth-constrained clients can't sample. We could optimize this by reducing the number of blobs and increasing blob size, but that would make reconstruction more expensive.
So ultimately we want to go a step further and do 2D sampling, which does this by randomly sampling not just within a blob, but between blobs as well. The linear property of the KZG commitment is used to "extend" the set of blobs in a block by a list of new "virtual blobs" that redundantly encode the same information.
2D sampling. Source: a16z
Crucially, computing the extension of the commitments does not require having the blobs, so the scheme is fundamentally friendly to distributed block building. The node that actually builds a block only needs the blob KZG commitments, and can rely on DAS to verify the availability of the blobs themselves. 1D DAS is also inherently friendly to distributed block building.
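A toy illustration of the linearity that makes this possible, using a simple discrete-log commitment g^m mod p as a stand-in for KZG (an assumption purely for illustration; real KZG commits to whole polynomials on an elliptic curve, but has the same linearity):

```python
# Toy sketch of the homomorphic property used for 2D extension: commitments to
# linear combinations of blobs can be computed from the commitments alone.
p = 2**127 - 1          # a prime modulus (toy parameter)
g = 5

def commit(m: int) -> int:
    return pow(g, m, p)

blob_a, blob_b = 123456789, 987654321          # pretend each "blob" is one field element
com_a, com_b = commit(blob_a), commit(blob_b)

# A "virtual blob" that is a known linear combination of the real blobs...
virtual_blob = 3 * blob_a + 7 * blob_b

# ...has a commitment that anyone can derive from com_a and com_b alone,
# without ever seeing blob_a or blob_b:
com_virtual = pow(com_a, 3, p) * pow(com_b, 7, p) % p
assert com_virtual == commit(virtual_blob)
```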
How does it relate to existing research?
Original paper introducing data availability (2018): https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding
Follow-up paper: https://arxiv.org/abs/1809.09044
Paradigm explainer post on DAS: https://www.paradigm.xyz/2022/08/das
2D data availability with KZG commitments: https://ethresear.ch/t/2d-data-availability-with-kate-commitments/8081
PeerDAS on ethresear.ch: https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541 and paper: https://eprint.iacr.org/2024/1362
EIP-7594: https://eips.ethereum.org/EIPS/eip-7594
SubnetDAS on ethresear.ch
Nuances of recoverability in 2D sampling
What else needs to be done, and what are the tradeoffs?
The next step is to complete the implementation and rollout of PeerDAS. From there, it is incremental work to keep increasing the blob count on PeerDAS while carefully watching the network and improving the software to ensure safety. In parallel, we want more academic work on formalizing PeerDAS and other versions of DAS, and on their interaction with issues such as the safety of the fork choice rule.
Going forward, we need more work to figure out the ideal version of 2D DAS and prove its safety properties. We also hope to eventually migrate away from KZG to a quantum-resistant, trusted-setup-free alternative. Currently, we do not know of any candidate that is friendly to distributed block building. Even the expensive "brute force" technique of using recursive STARKs to generate validity proofs for reconstructing rows and columns is not enough, because while a STARK is technically O(log(n) * log(log(n))) hashes in size (with STIR), in practice a STARK is almost as big as an entire blob.
In the long run, I think the realistic options are to:
implement the ideal 2D DAS;
stick with 1D DAS, sacrificing sampling bandwidth efficiency and accepting a lower data cap for simplicity and robustness;
(hard pivot) abandon DA, and fully embrace Plasma as the primary layer 2 architecture we focus on.
We can look at these in terms of a range of tradeoffs:
Note that this choice still exists even if we decide to scale execution directly on L1. This is because if L1 is to handle a lot of TPS, L1 blocks will become very large and clients will need an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) on L1 as well.
How does it interact with the rest of the roadmap?
The need for 2D DAS is reduced, or at least delayed, if data compression is implemented (see below), and further reduced if Plasma becomes widely used. DAS also poses challenges to distributed block building protocols and mechanisms: while DAS is in theory friendly to distributed reconstruction, in practice it needs to be combined with inclusion list proposals and the fork choice mechanics around them.
Data Compression
What problem are we solving?
Each transaction in a Rollup takes up a lot of on-chain data space: an ERC20 transfer takes about 180 bytes. This limits the scalability of layer 2 protocols even with ideal data availability sampling. At 16 MB per slot, we get:
16000000 / 12 / 180 = 7407 TPS
What if, in addition to solving the numerator, we could also solve the denominator, and make each transaction in the Rollup take up fewer bytes on-chain?
What is it and how does it work?
I think the best explanation is this image from two years ago:
The simplest gain is zero-byte compression: replace each long sequence of zero bytes with two bytes representing the number of zero bytes. Going a step further, we exploit specific properties of transactions:
Signature aggregation - we switch from ECDSA signatures to BLS signatures, which have the property that many signatures can be combined into a single signature that proves the validity of all the original signatures. This is not done on L1 because the computational cost of verification (even with aggregation) is higher, but in a data-scarce environment like L2, it arguably makes sense. ERC-4337's aggregation capabilities provide one way to do this.
Replace addresses with pointers - If an address has been used before, we can replace a 20-byte address with a 4-byte pointer to the historical location. This is necessary to achieve maximum benefits, although it takes effort to implement, as it requires (at least part of) the history of the blockchain to effectively be part of the state.
Custom serialization of transaction values - Most transaction values have only a few digits, eg. 0.25 ETH is represented as 250,000,000,000,000,000 wei. Gas max-basefees and priority fees work similarly. As a result, we can represent most monetary values very compactly using a custom decimal floating point format, or even a dictionary for particularly common values.
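A minimal sketch of two of these ideas, the zero-byte compression described above and a compact decimal-float encoding for values; the exact formats and helper names are illustrative assumptions, not any rollup's actual wire format:

```python
# Illustrative sketches of two of the compression ideas above.

def compress_zero_bytes(data: bytes) -> bytes:
    """Replace each run of zero bytes with a 0x00 marker followed by the run length (1-255)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def encode_value_decimal_float(wei: int) -> bytes:
    """Encode a wei amount as (mantissa, decimal exponent): 0.25 ETH -> (25, 16) in 2 bytes."""
    exponent = 0
    while wei > 0 and wei % 10 == 0:
        wei //= 10
        exponent += 1
    assert wei < 256 and exponent < 256, "illustration only handles small mantissas"
    return bytes([wei, exponent])

# ERC20 transfer calldata: selector + zero-padded address + zero-padded amount.
tx_calldata = bytes.fromhex("a9059cbb") + bytes(12) + bytes.fromhex("11" * 20) + bytes(31) + b"\x01"
print(len(tx_calldata), "->", len(compress_zero_bytes(tx_calldata)))   # 68 -> 29 bytes
print(encode_value_decimal_float(250_000_000_000_000_000))             # 0.25 ETH -> b'\x19\x10'
```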
How does it connect to existing research?
Explorations from sequence.xyz: https://sequence.xyz/blog/compressing-calldata
Calldata Optimized Contracts for L2, from ScopeLift: https://github.com/ScopeLift/l2-optimizoooors
Strategy of validity-proof rollups (aka ZK rollups) publishing state diffs instead of transactions
BLS Wallet - BLS aggregation via ERC-4337: https://github.com/getwax/bls-wallet
What else needs to be done and what are the trade-offs?
The main work left to do is to put the above scheme into practice. The main tradeoffs are:
Switching to BLS signatures requires significant effort and reduces compatibility with trusted hardware chips that can improve security. ZK-SNARK wrappers for other signature schemes can be used instead.
Dynamic compression (e.g. replacing addresses with pointers) complicates client code.
Publishing state diffs to the chain instead of transactions reduces auditability and breaks much existing software (e.g. block explorers).
How does it interact with other parts of the roadmap?
The adoption of ERC-4337, and eventually enshrining parts of it in the L2 EVM, could greatly accelerate the deployment of aggregation techniques. Enshrining parts of ERC-4337 on L1 could accelerate its deployment on L2s.
Plasma in General
What problem are we solving?
Even with 16MB blobs and data compression, 58,000 TPS isn’t necessarily enough to completely take over consumer payments, decentralized social, or other high-bandwidth domains, especially if we start to consider privacy, which can drop scalability by 3-8x. For high-volume, low-value applications, one option today is validium, which keeps data off-chain and has an interesting security model where operators can’t steal users’ funds, but they can disappear and temporarily or permanently freeze all users’ funds. But we can do better.
What is it and how does it work?
Plasma is a scaling solution that involves operators publishing blocks off-chain and putting the Merkle roots of those blocks on-chain (unlike Rollups, which put the entire block on-chain). For each block, the operator sends each user a Merkle branch proving what did or did not happen to that user's assets. Users can withdraw assets by providing a Merkle branch. Importantly, that branch does not have to be rooted in the latest state - so even if data availability fails, users can still recover their assets by withdrawing the latest state available. If a user submits an invalid branch (e.g., withdrawing assets they have already sent to someone else, or the operator creates assets out of thin air themselves), an on-chain challenge mechanism can adjudicate who the asset correctly belongs to.
Plasma Cash chain graph. A transaction that spends coin i is put into the i-th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4, and George owns coin 6.
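A minimal sketch of the Merkle-branch verification that a user (or the on-chain exit game) performs; the leaf encoding, hash choice, and ordering convention here are illustrative assumptions rather than any specific Plasma implementation:

```python
# Toy Merkle-branch check: given an on-chain root, a leaf (e.g. "coin i was sent
# to Alice in this block"), and a branch of sibling hashes, verify inclusion.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_branch(root: bytes, leaf: bytes, index: int, branch: list[bytes]) -> bool:
    """Walk from the leaf up to the root, hashing with siblings left/right by index bit."""
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Build a tiny 4-leaf tree for coins 0..3 and check a branch for coin 2.
leaves = [h(f"coin {i}: owner".encode()) for i in range(4)]
level1 = [h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])]
root = h(level1[0] + level1[1])

branch_for_coin_2 = [leaves[3], level1[0]]
assert verify_branch(root, b"coin 2: owner", index=2, branch=branch_for_coin_2)
```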
Early versions of Plasma could only handle the payments use case and could not be effectively generalized further. However, if we require that every root be verified with a SNARK, then Plasma becomes much more powerful. Each challenge game can be greatly simplified because we eliminate most possible paths for the operator to cheat. New paths are also opened up, enabling Plasma technology to scale to a wider range of asset classes. Finally, in the case where the operator does not cheat, users can withdraw their funds immediately, without having to wait for a week-long challenge period.
One way (not the only way) to make an EVM Plasma chain: use a ZK-SNARK to build a parallel UTXO tree that reflects the balance changes made by the EVM, and that defines a unique mapping of what is "the same coin" at different points in history. A Plasma construction can then be built on top of that.
An important insight is that the Plasma system does not need to be perfect. Even if you can only secure a subset of assets (e.g. just tokens that haven't moved in the past week), you've already greatly improved on the status quo of the hyper-scalable EVM, which is a validium.
Another class of constructions is hybrid Plasma/rollup designs, such as Intmax. These constructions put a very small amount of data per user on-chain (e.g. 5 bytes), and by doing so get properties somewhere in between Plasma and rollups: in the Intmax case you get very high levels of scalability and privacy, with a theoretical upper limit of about 16,000,000 / 12 / 5 = 266,667 TPS even in a 16 MB world.
What are the connections to existing research?
Original Plasma paper: https://plasma.io/plasma-deprecated.pdf
Plasma cash: https://ethresear.ch/t/plasma-cash-plasma-with-much-less-per-user-data-checking/1298
Plasma cash flow: https://hackmd.io/DgzmJIRjSzCYvl4lUjZXNQ?view#-Exit
Intmax (2023): https://eprint.iacr.org/2023/1082
What else needs to be done, and what are the trade-offs?
The main remaining task is to put Plasma systems into production. As mentioned above, "Plasma vs. validium" is not a binary: any validium can improve its security at least a little by adding Plasma features to its exit mechanism. The remaining research is partly about getting the best possible properties for the EVM (in terms of trust requirements, worst-case L1 gas costs, and DoS vulnerability), and partly about alternative application-specific constructions. Additionally, Plasma's greater conceptual complexity relative to rollups needs to be addressed directly, both through research and by building better general frameworks.
The main drawback of Plasma designs is that they are more operator-dependent and harder to make "based", although hybrid Plasma/rollup designs can often avoid this weakness.
How does this interact with the rest of the roadmap?
The more efficient a Plasma solution is, the less pressure there is on L1 to have high-performance data availability features. Moving activity to L2 also reduces MEV pressure on L1.
Mature L2 Proof Systems
What problem are we solving?
Today, most rollups are not actually trustless; there is a security council that has the ability to overrule the behavior of the (optimistic or validity) proof system. In some cases, the proof system doesn't even exist at all, or if it does, it only has an "advisory" function. The furthest along are (i) some application-specific rollups, such as Fuel, which are trustless, and (ii) Optimism and Arbitrum, two full EVM rollups that have achieved a partial-trustlessness milestone called "stage 1" as of this writing. The reason rollups haven't gone further is concern about bugs in the code. We need trustless rollups, so we need to tackle this problem head-on.
What is it and how does it work?
First, let's review the "stage" system, originally introduced in an earlier post. There are more detailed requirements, but to summarize:
Stage 0: Users must be able to run a node and sync the chain. This is fine even if validation is fully trusted/centralized.
Stage 1: There must be a (trustless) proof system that ensures that only valid transactions are accepted. A security council that can overturn the proof system is allowed to exist, but only with a 75% voting threshold. Additionally, a quorum-blocking portion of the council (i.e. 26%+) must be outside of the primary companies building the rollup. Less powerful upgrade mechanisms (e.g. DAOs) are allowed, but there must be a long enough delay that if a malicious upgrade is approved, users can exit their funds before the upgrade goes live.
Stage 2: There must be a (trustless) proof system to ensure that only valid transactions are accepted. The council is only allowed to intervene if there is a provable bug in the code, e.g. if two redundant proof systems disagree with each other, or if one proof system accepts two different post-state roots for the same block (or doesn't accept anything for a long enough period of time, e.g. a week). Upgrade mechanisms are allowed, but they must have a very long delay.
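As a rough summary, the criteria above can be encoded as a checklist; this is a simplified paraphrase for illustration, not the full formal requirements:

```python
# Simplified checklist version of the stage criteria above. The fields and thresholds
# paraphrase this section's summary, not the complete requirements.
from dataclasses import dataclass

@dataclass
class RollupProperties:
    can_run_node_and_sync: bool
    has_trustless_proof_system: bool
    council_override_threshold: float     # fraction of council votes needed to overturn proofs
    council_outsider_fraction: float      # fraction of council outside the core companies
    council_limited_to_provable_bugs: bool
    upgrade_delay_allows_exit: bool

def stage(r: RollupProperties) -> int:
    if not r.can_run_node_and_sync:
        return -1  # does not even reach stage 0
    meets_stage_1 = (r.has_trustless_proof_system
                     and r.council_override_threshold >= 0.75
                     and r.council_outsider_fraction >= 0.26
                     and r.upgrade_delay_allows_exit)
    if meets_stage_1 and r.council_limited_to_provable_bugs:
        return 2
    return 1 if meets_stage_1 else 0
```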
Our goal is to reach stage 2. The main challenge in reaching stage 2 is gaining enough confidence that the proof system is actually trustworthy enough. There are two main ways to do this:
Formal Verification: We can use modern mathematical and computational techniques to prove that an (optimistic or validity) proof system only accepts blocks that satisfy the EVM specification. These techniques have been around for decades, but recent advances (e.g. Lean 4) have made them much more practical, and advances in AI-assisted proving may accelerate this trend further.
Multiple Provers: Make multiple proof systems, and put money into 2-of-3 (or larger) multisigs between those proof systems and a security council (and/or other gadgets with trust assumptions, such as a TEE). If the proving systems agree, the council has no power. If they disagree, the council can only choose one of them, and cannot unilaterally impose its own answer.
A stylized diagram of multiple provers, combining an optimistic proof system, a validity proof system, and a security council.
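A sketch of the adjudication logic the diagram describes, as a simplified off-chain model; real systems implement this in contracts, and the exact rules shown here are illustrative, not fixed by any standard:

```python
# Simplified model of 2-of-3 multi-prover logic: if the two proof systems agree,
# their answer is final and the council is powerless; if they disagree, the council
# may only pick one of the two submitted answers, never impose a third one.
from typing import Optional

def finalize_state_root(optimistic_root: str,
                        validity_root: str,
                        council_choice: Optional[str] = None) -> str:
    if optimistic_root == validity_root:
        return optimistic_root                      # council has no power on agreement
    if council_choice in (optimistic_root, validity_root):
        return council_choice                       # council only breaks the tie
    raise ValueError("disagreement: council must choose one of the submitted roots")

assert finalize_state_root("0xabc", "0xabc") == "0xabc"
assert finalize_state_root("0xabc", "0xdef", council_choice="0xdef") == "0xdef"
```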
What are the connections with existing research?
EVM K Semantics (formal verification work since 2017): https://github.com/runtimeverification/evm-semantics
Presentation on the multi-prover idea (2022): https://www.youtube.com/watch?v=6hfVzCWT6YI
Taiko's plan to use multi-proofs: https://docs.taiko.xyz/core-concepts/multi-proofs/
What else needs to be done, and what are the trade-offs?
For formal verification, there is a lot. We need to create a formally verified version of the entire SNARK prover for the EVM. This is an extremely complex project, although we have already started. There is a trick that can significantly simplify the task: we can make a formally verified SNARK prover for a minimal virtual machine, e.g. RISC-V or Cairo, and then write an implementation of the EVM in that minimal VM (and formally prove its equivalence to some other EVM specification).
For multi-proofs, there are two main remaining pieces. First, we need to have enough confidence in at least two different proof systems that they are each reasonably secure, and that if they crash, they crash for different and unrelated reasons (so they don't crash at the same time). Second, we need to get a very high level of assurance in the underlying logic of the merged proof system. This is a small snippet of code. There are ways to make it very small — just store the funds in a secure multisig contract whose signers are contracts that represent individual proof systems — but this comes at the expense of high on-chain gas costs. Some balance needs to be found between efficiency and security.
How does it interact with the rest of the roadmap?
Moving activity to L2 reduces MEV pressure on L1.
Cross-L2 interoperability improvements
What problem are we solving?
A big challenge with the L2 ecosystem today is that it is difficult for users to navigate. Furthermore, the simplest approaches often reintroduce trust assumptions: centralized bridges, RPC clients, and so on. If we're serious about the idea of L2s being part of Ethereum, we need to make using the L2 ecosystem feel like using a single unified Ethereum ecosystem.
A pathologically bad (even dangerous: I personally lost $100 here due to a bad chain choice) example of cross-L2 UX. While this is not Polymarket's fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending tokens from L1 to L2, or from one L2 to another, should feel just like sending tokens within the same L1.
What is it and how does it work?
There are many categories of cross-L2 interoperability improvements. Generally, the way to ask these questions is to note that in theory, Ethereum with rollups at its core is the same as L1 doing sharding, and then ask where the current Ethereum L2 version falls short of that ideal in practice. Here are a few:
Chain-specific addresses: The chain (L1, Optimism, Arbitrum...) should be part of the address. Once implemented, the cross-L2 send flow should be as simple as putting the address into the "send" field, at which point the wallet can figure out how to do the send in the background (including using a bridge protocol); a wallet-side sketch of this follows the list below.
Chain-specific payment requests: Making messages of the form "send me X tokens of type Y on chain Z" should be simple and standardized. There are two main use cases for this: (i) payments, either person-to-person or person-to-merchant, and (ii) dapps requesting funds, e.g. the Polymarket example above.
Cross-chain swaps and gas payments: There should be a standardized, open protocol for expressing cross-chain operations, such as "I send 1 ETH on Optimism to someone who sends 0.9999 ETH on Arbitrum", and "I send 0.0001 ETH on Optimism to anyone who includes this transaction on Arbitrum". ERC-7683 is an attempt at the former, and RIP-7755 is an attempt at the latter, though both are more general than these specific use cases.
Light clients: Users should be able to actually verify which chain they are interacting with, rather than just trusting the RPC provider. a16z crypto's Helios does this for Ethereum itself, but we need to extend this trustlessness to L2s. ERC-3668 (CCIP-read) is one strategy to achieve this.
How a light client updates its view of the Ethereum header chain. Once you have the header chain, you can use Merkle proofs to verify any state object. Once you have the correct L1 state object, you can use Merkle proofs (and possibly signatures if you want to check pre-confirmations) to verify any state object on L2. Helios already does the former. Expanding to the latter is a standardization challenge.
Keystore wallets: Today, if you want to update the keys that control a smart contract wallet, you have to do it on all N chains that wallet exists on. Keystore wallets are a technique that allows the keys to live in one place (either on L1, or potentially later on an L2), and then be read from any L2 that has a copy of the wallet. This means updates only need to happen once. To be efficient, keystore wallets require L2s to have a standardized way to costlessly read L1; two proposals for this are L1SLOAD and REMOTESTATICCALL.
Stylized diagram of how the Keystore wallet works.
A more radical "shared token bridge" idea: imagine a world where all L2s are validity-proof rollups that commit to Ethereum every slot. Even in this world, "natively" moving assets from one L2 to another requires withdrawals and deposits, which cost a large amount of L1 gas. One way to solve this is to create a shared minimal rollup whose only function is to maintain which L2 holds how much of each type of token, and allow those balances to be collectively updated through a series of cross-L2 send operations initiated by any of the L2s. This would allow cross-L2 transfers to happen without paying L1 gas for each transfer, and without liquidity-provider-based techniques such as ERC-7683.
Synchronous composability: allow synchronous calls between a specific L2 and L1, or between multiple L2s. This could help improve the financial efficiency of DeFi protocols. The former can be done without any cross-L2 coordination; the latter requires shared sequencing. Based rollups are automatically friendly to all of these techniques.
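As a concrete illustration of the chain-specific address item above, here is a sketch of the kind of parsing a wallet might do. The "shortName:address" format follows the ERC-3770 style, but the registry contents and routing logic are illustrative assumptions, not part of any standard:

```python
# Sketch of wallet-side handling of a chain-specific address of the form
# "<chainShortName>:<address>" (ERC-3770 style).
CHAIN_REGISTRY = {"eth": 1, "oeth": 10, "arb1": 42161}   # short names -> chain IDs (example subset)

def parse_chain_specific_address(s: str) -> tuple[int, str]:
    short_name, _, address = s.partition(":")
    if not address or short_name not in CHAIN_REGISTRY:
        raise ValueError(f"unrecognized chain-specific address: {s}")
    if not (address.startswith("0x") and len(address) == 42):
        raise ValueError(f"malformed address: {address}")
    return CHAIN_REGISTRY[short_name], address

def send(token: str, amount: int, destination: str) -> None:
    chain_id, address = parse_chain_specific_address(destination)
    # A wallet would now decide, behind the scenes, whether this is a same-chain
    # transfer or needs to route through a bridge protocol to chain_id.
    print(f"send {amount} {token} to {address} on chain {chain_id}")

send("USDC", 100, "arb1:0x" + "ab" * 20)
```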
What are the connections with existing research?
Chain-specific address: ERC-3770: https://eips.ethereum.org/EIPS/eip-3770
ERC-7683: https://eips.ethereum.org/EIPS/eip-7683
RIP-7755: https://github.com/wilsoncusack/RIPs/blob/cross-l2-call-standard/RIPS/rip-7755.md
Rolling Keystore Wallet Design: https://hackmd.io/@haichen/keystore
Helios:https://github.com/a16z/helios
ERC-3668 (sometimes called CCIP-read)
Justin Drake's proposal for "based (shared) preconfirmations"
L1SLOAD (RIP-7728): https://ethereum-magicians.org/t/rip-7728-l1sload-precompile/20388
Remote calls in Optimism: https://github.com/ethereum-optimism/ecosystem-contributions/issues/76
AggLayer, which includes the idea of a shared token bridge: https://github.com/AggLayer
What else needs to be done, and what are the trade-offs?
Many of the examples above face the standards dilemma of when to standardize and which layers to standardize. If you standardize too early, you risk inferior solutions. If you standardize too late, you risk unnecessary fragmentation. In some cases, there are both short-term solutions that are less performant but easier to implement, and long-term solutions that are "eventually correct" but will take quite a while to implement.
What's unique about this section is that these tasks are not just technical problems: they are also (perhaps primarily!) social problems. They require L2s, wallets, and L1 to cooperate. Our ability to handle this successfully is a test of our ability to come together as a community.
How does it interact with the rest of the roadmap?
Most of these proposals are "higher layer" constructs, and therefore do not have much impact on L1 considerations. One exception is shared sequencing, which has a big impact on MEV.
Scaling Execution on L1
What problem are we solving?
If L2 becomes very scalable and successful, but L1 is still only able to process a very small number of transactions, there are a number of risks that could arise for Ethereum:
The economics of the ETH asset become more risky, affecting the long-term security of the network.
Many L2s benefit from being closely tied to the highly developed financial ecosystem on L1, and if this ecosystem weakens significantly, the incentive to become an L2 (rather than an independent L1) will weaken.
It will take a long time for L2 to have exactly the same security guarantees as L1.
If an L2 fails (e.g. due to a malicious or disappearing operator), users will still need to go through L1 to recover their assets. Therefore, L1 needs to be strong enough to at least occasionally handle the highly complex and messy wind-down of an L2.
For these reasons, it is valuable to continue to scale L1 itself and ensure that it can continue to adapt to more and more uses.
What is it and how does it work?
The simplest way to scale is to simply increase the gas limit. However, this runs the risk of centralizing L1, thereby weakening another important property that makes Ethereum L1 so powerful: its credibility as a strong base layer. There is ongoing debate about how far the simple gas limit increase is sustainable, and this will also change depending on the implementation of other techniques to make larger blocks easier to verify (e.g. history expiration, statelessness, L1 EVM validity proofs). Another important thing that needs to be continuously improved is the efficiency of Ethereum client software, which is more optimized today than it was five years ago. An effective L1 gas limit increase strategy will involve speeding up these verification techniques.
Another scaling strategy involves identifying specific functions and types of computations that can be made cheaper without compromising the decentralization of the network or its security properties. Examples of this include:
EOF - A new EVM bytecode format that is more friendly to static analysis and allows for faster implementations. Given these efficiencies, the EOF bytecode can be given a lower gas cost.
Multi-dimensional Gas Pricing - Establishing separate base fees and limits for compute, data, and storage could increase the average capacity of Ethereum L1 without increasing its maximum capacity (and thus without creating new security risks); a sketch of this idea follows the list below.
Reducing Gas Costs for Specific Opcodes and Precompiles - Historically, we've had several rounds of gas cost increases for certain underpriced operations to avoid denial-of-service attacks. Something we've done less of, but could do much more of, is reducing the gas cost of overpriced operations. For example, addition is much cheaper than multiplication, but the ADD and MUL opcodes currently cost the same. We could make ADD cheaper, and even simpler opcodes like PUSH cheaper. EOF opcodes as a whole are also currently somewhat overpriced.
EVM-MAX and SIMD : EVM-MAX (“Modular Arithmetic Extensions”) is a proposal to allow more efficient native big-number modular math as a separate module of the EVM. Values computed by EVM-MAX computations can only be accessed by other EVM-MAX opcodes unless intentionally exported; this allows for more room to store these values in an optimized format. SIMD ("Single Instruction Multiple Data") is a proposal that allows the same instruction to be executed efficiently on arrays of values. Together, the two can create a powerful coprocessor with the EVM that can be used to implement cryptographic operations more efficiently. This is particularly useful for privacy protocols and L2 proof systems, so it will help with L1 and L2 scaling.
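To make the multi-dimensional gas pricing idea above concrete, here is a sketch of per-resource EIP-1559-style basefee adjustment; the specific resources, targets, and the 1/8 adjustment rate are illustrative assumptions, not a specification:

```python
# Sketch of multi-dimensional gas pricing: each resource (compute, data, storage)
# gets its own target, limit, and basefee, updated EIP-1559-style per block.
ADJUSTMENT_QUOTIENT = 8

resources = {
    #  name      : [target_per_block, limit_per_block, basefee_in_wei]
    "compute_gas": [15_000_000, 30_000_000, 10 * 10**9],
    "blob_data"  : [6 * 125_000, 12 * 125_000, 1],
    "storage"    : [50_000, 100_000, 100 * 10**9],
}

def update_basefees(usage: dict[str, int]) -> None:
    """After each block, move every resource's basefee toward its own target independently."""
    for name, (target, limit, basefee) in resources.items():
        used = min(usage.get(name, 0), limit)
        # Same shape as EIP-1559: basefee grows or shrinks by up to 1/QUOTIENT per block.
        delta = basefee * (used - target) // target // ADJUSTMENT_QUOTIENT
        resources[name][2] = max(basefee + delta, 1)

# A block that is compute-heavy but data-light raises only the compute basefee.
update_basefees({"compute_gas": 30_000_000, "blob_data": 125_000, "storage": 50_000})
print({name: vals[2] for name, vals in resources.items()})
```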
These improvements will be discussed in more detail in a future article on Splurge.
Finally, the third strategy is native rollup (or "built-in rollup"): essentially, creating many copies of the EVM that run in parallel, resulting in a model equivalent to what rollup can provide, but more natively integrated into the protocol.
What are the connections to existing research?
Polynya's Ethereum L1 expansion roadmap: https://polynya.mirror.xyz/epju72rsymfB-JK52_uYI7HuhJ-W_zM735NdP7alkAQ
Multidimensional Gas Pricing: https://vitalik.eth.limo/general/2024/05/09/multidim.html
EIP-7706: https://eips.ethereum.org/EIPS/eip-7706
EOF: https://evmobjectformat.org/
EVM-MAX: https://ethereum-magicians.org/t/eip-6601-evm-modular-arithmetic-extensions-evmmax/13168
SIMD: https://x.com/BanklessHQ/status/1831319419739361321
Justin Drake on scaling with SNARKs and native rollups: https://www.reddit.com/r/ethereum/comments/1f81ntr/comment/llmfi28/
What else needs to be done and what are the tradeoffs?
There are three strategies for L1 scaling, which can be pursued separately or in parallel:
Make L1 easier to verify with improved techniques (e.g. client code, stateless clients, history expiration), then raise the gas limit
Reduce the cost of specific operations, increasing average capacity without increasing worst-case risk
Native rollups (i.e. "create N parallel copies of the EVM", though perhaps giving developers a lot of flexibility in the parameters of how the copies are deployed)
It's worth understanding that these are different techniques with different tradeoffs. For example, native rollups have many of the same weaknesses as regular rollups when it comes to composability: you can't send a single transaction that synchronously performs operations across several of them, like you can with contracts on the same L1 (or L2). Raising the gas limit takes away other benefits that could be achieved by making L1 easier to verify, such as increasing the fraction of users who run validating nodes and increasing the number of solo stakers. Making specific operations in the EVM cheaper (depending on how it is done) may increase the overall complexity of the EVM.
One of the big questions that any L1 scaling roadmap needs to answer is: what is the ultimate vision for L1 and L2? Obviously, it would be absurd for everything to happen on L1: potential use cases are hundreds of thousands of transactions per second, which would make L1 completely unverifiable (unless we go the native rollups route). But we do need some guiding principles so that we can ensure that we don't create a situation where we increase the gas limit 10x, severely harming the decentralization of Ethereum L1, and find that instead of 99% of activity being on L2, we've just entered a world where 90% of activity is on L2, so the results look pretty much the same, except for a mostly irreversible loss of the specialness of Ethereum L1.
A proposed view on the "division of labor" between L1 and L2
How does it interact with the rest of the roadmap?
Getting more users onto L1 means improving not just scale, but other aspects of L1 as well. It means more MEV will remain on L1 (rather than becoming just an L2 problem), so there will be greater urgency to handle it explicitly. It greatly increases the value of fast slot times on L1. And it depends heavily on verification of L1 ("the Verge") going well.