Original title: Possible futures of the Ethereum protocol, part 1: The Merge
Author: Vitalik Buterin, co-founder of Ethereum
Originally, "The Merge" referred to the most important event in the Ethereum protocol's history since launch: the long-awaited and hard-won transition from proof of work (PoW) to proof of stake (PoS). Ethereum's proof of stake has now been running stably for nearly two years, and it has performed very well in terms of stability, performance, and avoiding centralization risks. However, there are still some important areas where proof of stake can improve.
The roadmap I drew in 2023 divided this into several parts: improvements to technical features such as stability, performance, and accessibility for small validators, plus economic changes to address centralization risks. The former became part of "The Merge", while the latter became part of "The Scourge".
This post will focus on "The Merge": what else can be improved in the technical design of proof of stake, and what are the paths to achieving those improvements?
This is not an exhaustive list of things that could be done to PoS; rather, it is a list of ideas that are being actively considered.
Single Slot Finality (SSF) and Democratizing Staking
What problem are we solving?
Currently, Ethereum takes 2-3 epochs (about 15 minutes) to finalize a block, and requires 32 ETH to become a staker.
This was originally a compromise to balance three goals:
Maximize the number of validators participating in staking (which directly means minimizing the minimum amount of ETH required to stake)
Minimize the finalization time
Minimize the overhead of running a node
These three goals conflict with each other: in order to achieve "economic finality" (i.e., an attacker must destroy a large amount of ETH to revert a finalized block), every validator needs to sign two messages each time the chain finalizes. Therefore, if there are many validators, either it takes a long time to process all the signatures, or very powerful nodes are needed to process them all at once.
Note that this all depends on a key goal of Ethereum: ensuring that even a successful attack is costly to the attacker. This is what the term "economic finality" means. If we did not have this goal, then we could solve the problem by randomly selecting a committee (as Algorand does) to finalize each slot. The problem with this approach is that if an attacker does control 51% of the validators, they can attack (by reverting finalized blocks, censoring, or delaying finality) at very low cost: only the fraction of their nodes that happened to be in the committee could be detected as participating in the attack and punished, whether through slashing or a minority soft fork. This means the attacker could attack the chain over and over again. So if we want economic finality, a naive committee-based approach does not work, and at first glance it seems that we really do need full validator participation.
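To make the conflict concrete, here is a back-of-envelope sketch in Python (all figures are illustrative, not protocol constants) of how the per-second signature load on a node scales with validator count and finalization time:

```python
# Back-of-envelope arithmetic for the tradeoff above: every validator signs
# two messages per finalization, so finalizing faster or with more validators
# multiplies the per-second signature load on nodes. Numbers are illustrative.
def signatures_per_second(n_validators: int, seconds_to_finality: float) -> float:
    return 2 * n_validators / seconds_to_finality

# 1,000,000 validators finalizing within one 12-second slot: ~167,000 sig/s
print(signatures_per_second(1_000_000, 12))
# The same set finalizing over ~15 minutes (today's two-epoch delay): ~2,200 sig/s
print(signatures_per_second(1_000_000, 15 * 60))
```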
Ideally, we want to retain economic finality while improving on the status quo in two areas:
1. Finalize blocks in a single slot (ideally keeping or even reducing the current slot length of 12 seconds), instead of in 15 minutes
2. Allow validators to stake with 1 ETH (down from 32 ETH)
The first goal is motivated by two aims, both of which can be seen as "aligning Ethereum's properties with those of (more centralized) performance-focused L1 chains".
First, it ensures that all Ethereum users actually benefit from the higher level of security achieved through the finality mechanism. Today, most users do not enjoy this guarantee because they are unwilling to wait 15 minutes; with single-slot finality, users would see their transactions finalized almost as soon as they are confirmed. Second, it simplifies the protocol and the surrounding infrastructure if users and applications do not have to worry about the possibility of a chain rollback (except in the relatively rare case of an inactivity leak).
The second goal is motivated by a desire to support solo stakers. Poll after poll has repeatedly shown that the primary factor preventing more people from solo staking is the 32 ETH minimum. Lowering the minimum to 1 ETH would address this issue to the point where other issues become the primary factor limiting solo staking.
There is a challenge: both faster finality and more democratized staking conflict with the goal of minimizing overhead. Indeed, this is the entire reason why we did not adopt single-slot finality from the start. However, recent research has suggested some possible solutions to this problem.
What is SSF and how does it work?
Single-slot finality involves using a consensus algorithm that finalizes blocks within one slot. This is not in itself a difficult goal to achieve: many algorithms (such as Tendermint consensus) already implement this with optimal properties. One desirable property unique to Ethereum, which Tendermint does not support, is the "inactivity leak", which allows the chain to continue running and eventually recover even if more than 1/3 of validators are offline. Fortunately, this desire has been addressed: there are proposals to modify Tendermint-style consensus to accommodate the inactivity leak.
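To illustrate the inactivity-leak property in the roughest possible terms, here is a toy model (the leak rate and mechanics below are simplified stand-ins, not the actual protocol):

```python
# Toy model of the inactivity leak: while finality is stalled, inactive
# validators bleed stake each epoch until active validators regain a 2/3
# majority and the chain can finalize again. Leak rate is hypothetical.
def epochs_until_recovery(active: float, inactive: float, leak_rate: float = 0.01) -> int:
    epochs = 0
    while active / (active + inactive) < 2 / 3:
        inactive *= (1 - leak_rate)  # inactive stake leaks away each epoch
        epochs += 1
    return epochs

# With half the stake offline, recovery takes ~70 epochs in this toy model.
print(epochs_until_recovery(active=50.0, inactive=50.0))
```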
Leading Single-Slot Finality Proposals
The hardest part of the problem is figuring out how to make single-slot finality work for very high validator counts without incurring extremely high node operator overhead. To this end, there are several leading solutions:
Option 1: Brute force - improve signature aggregation protocols so that signatures from millions of validators can be processed within each slot. Horn is one of the designs proposed for a better aggregation protocol.
Option 2: Orbit committees - a new mechanism that allows randomly selected medium-sized committees to be responsible for finalizing the chain, but in a way that preserves the attack-cost properties we seek.
One way to think about Orbit SSF is that it opens up a space of compromise options along an axis from x=0 (Algorand-style committees, no economic finality) to x=1 (today's Ethereum), with intermediate points where Ethereum still has enough economic finality to be extremely secure, but at the same time we gain the efficiency advantage of only needing a moderately sized random sample of validators to participate in each slot.
Orbit exploits pre-existing heterogeneity in validator deposit sizes to get as much economic finality as possible while still giving solo validators a role. Additionally, Orbit uses slow committee rotation to ensure a high overlap between adjacent quorums, ensuring that its economic finality still applies across committee rotation boundaries.
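As a rough illustration of the sampling idea only (this is not the Orbit specification, and the cap, sizes, and balances below are hypothetical), the sketch samples a medium-sized committee with probability weighted by validator balance, so large deposits contribute stake to almost every committee while solo validators are still regularly included:

```python
# Toy sketch of stake-weighted committee sampling in the spirit of Orbit.
# NOT the Orbit spec: real designs also use slow committee rotation and
# careful weighting to preserve economic finality across rotation boundaries.
import random

def sample_committee(balances: dict[str, float], committee_size: int) -> list[str]:
    validators = list(balances)
    weights = [balances[v] for v in validators]
    # Sample with probability proportional to balance (with replacement,
    # which is fine for a sketch); heavier validators appear more often.
    return random.choices(validators, weights=weights, k=committee_size)

def committee_stake(balances: dict[str, float], committee: list[str]) -> float:
    # Economic finality in this toy model scales with the slashable stake
    # actually present in the sampled committee.
    return sum(balances[v] for v in set(committee))

balances = {f"v{i}": (2048.0 if i < 10 else 1.0) for i in range(10_000)}
committee = sample_committee(balances, committee_size=512)
print(committee_stake(balances, committee))
```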
Option 3: Two-Tier Staking - A mechanism where stakers are divided into two classes, one with a higher deposit requirement and one with a lower deposit requirement. Only the tier with the higher deposit requirement would directly participate in providing economic finality. There are various proposals (e.g., see the Rainbow Staking article) to specify what rights and responsibilities the tier with the lower deposit requirement has. Common ideas include:
Delegating staking rights to higher-tier stakers
Randomly sampling lower-tier stakers to attest and finalize each block
Generating inclusion lists
What is the connection to existing research?
What’s left to do? What are the tradeoffs?
There are four main possible paths to choose from (we can also take a hybrid path):
1. Maintain the status quo
2. Orbit SSF
3. Brute force SSF
4. SSF with a two-tier staking mechanism
(1) means doing nothing and leaving things as they are, but this leaves Ethereum's security experience and staking centralization properties worse than they need to be.
(2) avoids "high tech" and solves the problem by cleverly rethinking protocol assumptions: we relax the "economic finality" requirement so that attacks are still required to be expensive, but we accept that the cost of an attack may be perhaps 10x lower than today (e.g., $2.5 billion instead of $25 billion). It is widely believed that Ethereum today has far more economic finality than it needs, and that its main security risks lie elsewhere, so this is arguably an acceptable sacrifice.
The main work is to verify that the Orbit mechanism is secure and has the properties we want, and then to fully formalize and implement it. In addition, EIP-7251 (increase max effective balance) allows validators to voluntarily consolidate their balances, which immediately reduces chain verification overhead and serves as an effective initial stage for a rollout of Orbit.
(3) avoids the clever rethinking and instead brute-forces the problem with high tech. Doing so requires aggregating a very large number of signatures (1 million+) in a very short time (5-10 seconds).
(4) avoids both the clever rethinking and the high tech, but it does create a two-tier staking system that still carries centralization risks. The risk depends heavily on the specific rights granted to the lower staking tier. For example:
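For intuition on why consolidation helps, here is a tiny illustration (the stake amount is an arbitrary example): EIP-7251 raises the maximum effective balance from 32 ETH to 2048 ETH, so the same stake can be deployed across far fewer validators, shrinking the number of signatures the chain must process:

```python
# Why EIP-7251 reduces overhead: the same stake deployed at a higher maximum
# effective balance needs far fewer validator slots, hence fewer signatures
# to aggregate and verify each epoch.
OLD_MAX_EFFECTIVE_BALANCE = 32    # ETH, pre-EIP-7251
NEW_MAX_EFFECTIVE_BALANCE = 2048  # ETH, post-EIP-7251

def validators_needed(total_stake_eth: int, max_balance: int) -> int:
    return -(-total_stake_eth // max_balance)  # ceiling division

print(validators_needed(1_000_000, OLD_MAX_EFFECTIVE_BALANCE))  # 31250
print(validators_needed(1_000_000, NEW_MAX_EFFECTIVE_BALANCE))  # 489
```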
If lower tier stakers are required to delegate their attestation power to higher tier stakers, then delegation may become centralized and we end up with two highly centralized staking tiers.
If a random sampling of lower tiers is required to approve each block, then an attacker can block finality with only a tiny amount of ETH spent.
If lower-tier stakers can only produce inclusion lists, then the attestation tier may remain centralized, at which point a 51% attack on the attestation tier can censor the inclusion lists themselves.
Multiple strategies can be combined, for example:
1+2: Add Orbit, but don’t do single-slot finality
1+3: Use brute-force techniques to reduce the minimum deposit without doing single-slot finality. The amount of aggregation required is 64x less than in the pure (3) case (validators attest once per 32-slot epoch instead of signing twice within a single slot), so the problem becomes easier.
2+3: Do Orbit SSF with conservative parameters (e.g., 128k validator committee instead of 8k or 32k), and use brute force techniques to make it super efficient.
1+4: Add Rainbow staking, but don't do single-slot finality
How does SSF interact with other parts of the roadmap?
Single-slot finality reduces the risk of certain types of multi-block MEV attacks, among other benefits. Furthermore, in a single-slot finality world, attester-proposer separation (APS) designs and other in-protocol block production pipelines would need to be designed differently.
The weakness of brute force strategies is that they make it much harder to reduce the slot time.
Single secret leader election (SSLE)
What problem are we solving?
Today, it is known in advance which validator will propose the next block. This creates a security vulnerability: an attacker can monitor the network, determine which validators correspond to which IP addresses, and launch a DoS attack on the validators when they are about to propose a block.
What is SSLE and how does it work?
The best way to solve the DoS problem is to hide which validator will produce the next block, at least until the block is actually produced. Note that this is easy if we drop the "single" requirement: one solution is to let anyone create the next block, but require the RANDAO reveal to be less than 2^256 / N. On average, only one validator will be able to meet this requirement - but sometimes there will be two or more, and sometimes there will be none. Combining the "secret" requirement with the "single" requirement has always been the hard problem.
The single secret leader election protocol solves this problem by using cryptography to create a "blinded" validator ID for each validator, and then giving many proposers the opportunity to shuffle and re-blind the pool of blinded IDs (similar to how a mixnet works). In each slot, a random blinded ID is selected. Only the owner of that blinded ID can generate a valid proof to propose a block, but no one else knows which validator that blinded ID corresponds to.
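As a sketch of the "secret but not single" variant just described (the names, threshold, and hash stand-in are illustrative, not the actual protocol):

```python
# Toy eligibility check for the "anyone whose RANDAO reveal is small enough
# may propose" approach. In expectation exactly one validator qualifies per
# slot, but sometimes zero or several do - which is the "single" problem.
import hashlib

N_VALIDATORS = 1_000_000
THRESHOLD = 2**256 // N_VALIDATORS

def may_propose(randao_reveal: bytes) -> bool:
    digest = hashlib.sha256(randao_reveal).digest()
    return int.from_bytes(digest, "big") < THRESHOLD
```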
Whisk SSLE Protocol
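To give a flavor of how a blinded ID can be shuffled and re-blinded while remaining recognizable to its owner, here is a toy model of a Whisk-style re-randomizable tracker, using modular exponentiation as a stand-in for a real elliptic-curve group (insecure, for illustration only):

```python
# Toy Whisk-style tracker: the pair (g^r, g^(r*k)) commits to a secret k.
# Anyone can re-randomize it by raising both elements to a random power z,
# yet only the owner of k can recognize (and prove ownership of) it.
# Modular arithmetic stands in for an elliptic-curve group; insecure toy.
import secrets

P = 2**255 - 19  # a prime modulus (illustrative choice, not a real spec)
G = 5

def make_tracker(k: int) -> tuple[int, int]:
    r = secrets.randbelow(P - 2) + 1
    return (pow(G, r, P), pow(G, r * k, P))

def rerandomize(tracker: tuple[int, int]) -> tuple[int, int]:
    z = secrets.randbelow(P - 2) + 1
    a, b = tracker
    return (pow(a, z, P), pow(b, z, P))

def owns(tracker: tuple[int, int], k: int) -> bool:
    a, b = tracker
    return pow(a, k, P) == b

k = 123456789                    # a validator's secret
t = make_tracker(k)
t = rerandomize(rerandomize(t))  # shuffled and re-blinded by two proposers
assert owns(t, k)                # still recognizable only to the owner of k
```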
What is the connection with existing research?
What’s left to do? What are the tradeoffs?
Really, all that's left is to find and implement a protocol that is simple enough that we are comfortable deploying it on mainnet. We care a great deal about keeping Ethereum a fairly simple protocol, and we do not want complexity to increase further. The SSLE implementations we have seen add hundreds of lines of code to the specification and introduce new assumptions from complicated cryptography. Finding a sufficiently efficient quantum-resistant SSLE implementation is also an open problem.
It may eventually come down to this: the "marginal additional complexity" of SSLE will only drop low enough if we take the plunge and introduce mechanisms for performing general-purpose zero-knowledge proofs in the Ethereum L1 protocol for other reasons (e.g., state trees, ZK-EVM).
Another option is to not bother with SSLE at all, and instead use out-of-protocol mitigations (e.g. at the p2p layer) to address the DoS problem.
How does it interact with the rest of the roadmap?
If we add an attester-proposer separation (APS) mechanism, such as execution tickets, then execution blocks (i.e., blocks containing Ethereum transactions) will not need SSLE, because we can rely on specialized block builders. However, for consensus blocks (i.e., blocks containing protocol messages such as attestations, perhaps pieces of inclusion lists, etc.), we would still benefit from SSLE.
Faster Transaction Confirmation
What Problem Are We Solving?
It would be valuable to further reduce Ethereum's transaction confirmation time, from 12 seconds down to, say, 4 seconds. Doing so would significantly improve the user experience of the L1 and of rollups, while making DeFi protocols more efficient. It would also make it easier for L2s to decentralize, because it would allow a large class of L2 applications to work on based rollups, reducing the need for L2s to build their own committee-based decentralized sequencing.
What is it and how does it work?
There are roughly two techniques here:
1. Reduce the slot time, e.g., to 8 seconds or 4 seconds. This does not necessarily mean 4-second finality: finality inherently requires three rounds of communication, so we could make each round of communication a separate block, which after 4 seconds would receive at least a preliminary confirmation.
2. Allow proposers to publish preconfirmations during the course of a slot. In the extreme case, a proposer could include the transactions they see into their block in real time, immediately publishing a preconfirmation message for each transaction ("My first transaction is 0x1234...", "My second transaction is 0x5678..."). The case where a proposer publishes two conflicting confirmations can be handled in two ways: (i) slash the proposer, or (ii) have attesters vote on which one came earlier.
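As a sketch of what a per-transaction preconfirmation and its conflict check might look like (all message fields and names here are hypothetical, not a proposed standard):

```python
# Hypothetical preconfirmation message: the proposer signs "in slot s, the
# transaction at index i is tx". Two validly signed messages for the same
# (slot, index) with different transactions are proof of equivocation, which
# could be slashed or adjudicated by attesters as described above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Preconfirmation:
    slot: int
    index: int        # position of the transaction within the block
    tx_hash: bytes
    signature: bytes  # proposer's signature over (slot, index, tx_hash)

def is_equivocation(a: Preconfirmation, b: Preconfirmation) -> bool:
    # Signature verification is omitted in this sketch.
    return a.slot == b.slot and a.index == b.index and a.tx_hash != b.tx_hash
```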
What are the connections with existing research?
What’s left to do? What are the tradeoffs?
It is not clear that reducing slot times is feasible: even today, stakers in many regions of the world struggle to get attestations included fast enough. Attempting 4-second slot times risks centralizing the validator set and making it impractical to be a validator outside of a few privileged regions, due to latency.
The weakness of the proposer-preconfirmation approach is that it greatly improves the average-case inclusion time but not the worst case: if the current proposer is well-behaved, your transaction is preconfirmed in 0.5 seconds instead of (on average) 6 seconds, but if the current proposer is offline or misbehaving, you still have to wait a full 12 seconds for the next slot to start and a new proposer to arrive.
In addition, there is an open question of how to incentivize preconfirmations. Proposers have an incentive to maximize their optionality for as long as possible. If attesters sign off on the timeliness of preconfirmations, then transaction senders could make part of their fee conditional on an immediate preconfirmation, but this places an additional burden on attesters and may make it harder for attesters to continue acting as a neutral "dumb pipe".
On the other hand, if we do not attempt this and keep confirmation times at 12 seconds (or longer), the ecosystem will place more weight on layer-2 preconfirmation mechanisms, and cross-layer-2 interactions will take longer.
How does it interact with the rest of the roadmap?
Proposer-based preconfirmations realistically depend on an attester-proposer separation (APS) mechanism, such as execution tickets. Otherwise, the pressure to provide real-time preconfirmations may place too centralizing a burden on regular validators.
Other Research Areas
51% Attack Recovery
It is often assumed that in the event of a 51% attack (including attacks that cannot be cryptographically proven, such as censorship), the community will work together to implement a minority soft fork, ensuring that the good guys win and the bad guys are inactivity-leaked or slashed. However, this level of over-reliance on the social layer is arguably unhealthy. We can try to reduce the reliance on the social layer by making the recovery process as automated as possible.
Full automation is impossible, because if it were possible, it would amount to a consensus algorithm with >50% fault tolerance, and we already know the (very strict) mathematically provable limitations of such algorithms. But we can achieve partial automation: for example, a client could automatically refuse to accept a chain as finalized, or even as the head of the fork choice, if that chain censors transactions the client has seen for long enough. A key goal is to ensure that an attacker at least cannot achieve a quick, clean victory.
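A minimal sketch of what such partial automation could look like inside a client, assuming a hypothetical censorship-timeout parameter and simplified data structures:

```python
# Sketch of automated censorship detection: the client tracks when it first
# saw each valid pending transaction, and refuses to treat a chain as
# finalized (or as the fork-choice head) if that chain has excluded some
# transaction for too long. The timeout and structures are hypothetical.
CENSORSHIP_TIMEOUT_SLOTS = 128  # hypothetical parameter

def chain_acceptable(included_txs: set[bytes],
                     first_seen_slot: dict[bytes, int],
                     current_slot: int) -> bool:
    for tx_hash, seen_at in first_seen_slot.items():
        if tx_hash not in included_txs and current_slot - seen_at > CENSORSHIP_TIMEOUT_SLOTS:
            return False  # the chain appears to be censoring this transaction
    return True
```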
Raising the Quorum Threshold
Today, a block is finalized as long as 67% of stakers support it. Some argue this is too aggressive. In Ethereum's entire history there has been only one (very brief) failure of finality. If this ratio were raised to 80%, the number of additional non-finality episodes would be relatively low, but Ethereum would gain security: in particular, many more contentious situations would end in a temporary halt of finality. That seems a healthier outcome than the "wrong side" winning immediately, whether the wrong side is an attacker or a buggy client.
This also answers the question "what is the point of solo stakers?". Today, most stakers already stake through pools, and it seems very unlikely that solo stakers will ever reach 51% of staked ETH. However, getting solo stakers to a quorum-blocking minority seems achievable if we work at it, especially with an 80% quorum (since a quorum-blocking minority would then only need 21%). As long as solo stakers do not join a 51% attack (whether a finality-reversion or censorship attack), such an attack would not achieve a "clean victory", and solo stakers would be motivated to help organize a minority soft fork.
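The arithmetic behind the quorum-blocking minority is simple; a quick check (thresholds as in the text):

```python
# The stake share needed to block finality is just over (1 - quorum threshold).
def blocking_minority(threshold: float) -> float:
    return 1.0 - threshold

print(blocking_minority(0.67))  # ~0.33 -> 34% of stake blocks finality today
print(blocking_minority(0.80))  # 0.20 -> 21% would suffice at an 80% quorum
```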
Quantum Resistance
Metaculus currently believes that quantum computers could start breaking cryptography sometime in the 2030s, albeit with a large margin of error:
Quantum computing experts, such as Scott Aaronson, have also recently begun to take the possibility of quantum computers actually working in the medium term much more seriously. This has implications across the entire Ethereum roadmap: it means that every part of the Ethereum protocol that currently relies on elliptic curves will need a hash-based or otherwise quantum-resistant replacement. In particular, it means we cannot assume we will be able to lean on the excellent performance of BLS aggregation to process signatures from large validator sets forever. This justifies conservatism in the performance assumptions of proof-of-stake designs, and is also a reason to be more proactive in developing quantum-resistant alternatives.
Special thanks to Justin Drake, Hsiao-wei Wang, @antonttc, and Francesco for their feedback and reviews.