Source: Geek Web3
Summary:
· With EIP-4844, the data throughput of the Ethereum network, and with it the storage pressure on nodes, keeps increasing, and the growing storage demand poses serious challenges for Ethereum nodes. To relieve this pressure, some Ethereum clients allow operators to delete locally stored historical data, so full nodes running different clients no longer store data consistently.
· To ensure that all Ethereum clients behave consistently, EIP-4444 and EIP-4844 standardize how historical data is deleted, and this behavior is expected to become the standard configuration of Ethereum nodes.
· As a result, anyone who wants to replay historical data to reconstruct the latest Layer1 or Layer2 state will have to rely on centralized service providers outside the Ethereum protocol, which has prompted the exploration of more decentralized storage solutions aligned with Ethereum.
· The Ethereum Portal Network is a lightweight, decentralized P2P network for all types of Ethereum data, including historical data. It is designed for resource-constrained devices and provides Ethereum JSON-RPC services. The History Network and the Beacon Network are nearly ready.
· EthStorage is an incentivized, modular storage network for EIP-4844 BLOB data. To store BLOBs, users call a storage contract on L1, pay the storage fee in ETH, and record the hash of each BLOB on-chain. Over time, the storage fee is gradually distributed to storage providers who submit proofs that they are storing the BLOBs off-chain.
· Currently, the EthStorage testnet is running on the Ethereum Sepolia testnet, and several community participants have successfully proved their local storage status. Future plans include developing a decentralized Ethereum state network, implementing storage proofs for dynamically sized data, and accessing the EthStorage network in a decentralized manner directly from the browser.
Acknowledgements: Thanks to Piper Merriam from the Ethereum Foundation, Karthik Raju from Polychain, and Qiang Zhu from EthStorage for their feedback on this article.
Background:
On October 22, 2023, Péter Szilágyi, the team lead of the well-known Go-Ethereum (Geth) client, expressed his concerns about Ethereum's data storage on Twitter. He pointed out that while Geth retains all historical data, other Ethereum clients such as Nethermind and Besu can be configured to delete some of it (such as historical blocks). As a result, some client nodes behave inconsistently with others, which is unfair to Geth operators. The topic immediately sparked a heated discussion about storage in the Ethereum roadmap.
Storage Challenges
Why do Nethermind and Besu allow client operators to prune local historical data? What underlying problem does this decision reflect?
From our perspective, there are two main reasons:
The first reason is the ever-increasing storage requirement of Ethereum clients. The pie chart below shows the storage breakdown of a new Geth node at block height 18,779,761, as of December 13, 2023.
As shown in the figure:
· Total storage size: 925.39 GB
· Historical data (blocks and transaction receipts): about 628.69 GB
· State data in the Merkle Patricia Trie (MPT): about 269.74 GB
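As a quick check on the breakdown above, the shares can be computed directly. This is plain arithmetic on the chart's numbers; the "other" value does not appear in the chart and is simply whatever remains after subtracting history and state.

```python
# Figures from the pie chart: a new Geth node at block 18,779,761 (December 13, 2023).
TOTAL_GB = 925.39
HISTORY_GB = 628.69   # historical blocks and transaction receipts
STATE_GB = 269.74     # state data held in the Merkle Patricia Trie


def share(part: float, total: float) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Derived, not from the chart: everything that is neither history nor MPT state.
other_gb = round(TOTAL_GB - HISTORY_GB - STATE_GB, 2)

print(f"history: {share(HISTORY_GB, TOTAL_GB)}%")  # history: 67.9%
print(f"state: {share(STATE_GB, TOTAL_GB)}%")      # state: 29.1%
print(f"other: {other_gb} GB")                     # other: 26.96 GB
```

Historical data alone accounts for roughly two thirds of the node's disk usage, which is why pruning it is so attractive to operators.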
The second reason is that Ethereum nodes lack in-protocol incentives or penalties for storing historical blocks. Although the protocol expects nodes to store all historical data, it provides no mechanism to reward storage or punish violations. Nodes that store historical data and serve it to others do so out of altruism rather than in response to any incentive.
In practice, client operators can delete or modify historical data without any penalty. In contrast, validator nodes must maintain and update the complete state locally to avoid being penalized for proposing or voting for invalid blocks.
It is therefore not surprising that some node operators choose to delete historical data once storage costs become a significant burden. Without historical data, a node can cut its storage footprint from about 1 TB to about 300 GB.
Chart: Running a Nethermind node configured without historical blocks currently saves about 460 GB of storage
The upcoming Ethereum Data Availability (DA) upgrades will intensify these storage challenges. The road to fully scaling Ethereum DA begins with EIP-4844 in the Dencun upgrade, which introduces fixed-size binary large objects (BLOBs) and an independent fee model (blobGasPrice). Each BLOB is 128 KB, and once EIP-4844 is live, each block can contain at most 6 BLOBs. To scale data throughput further, Ethereum plans to adopt 1D Reed-Solomon erasure coding, initially allowing 32 BLOBs per block and eventually reaching the order of 256 BLOBs per block at full scale.
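The independent BLOB fee model mentioned above can be sketched from the EIP-4844 specification: blob gas used above a per-block target accumulates as "excess blob gas", and the blob base fee (what the text calls blobGasPrice) grows exponentially with that excess. The constants are the values EIP-4844 specified at launch and should be read as a snapshot; the helper signatures are simplified here to take plain integers instead of block headers.

```python
# Constants as specified in EIP-4844 (a snapshot; later forks may retune them).
GAS_PER_BLOB = 2**17                            # 131072 blob gas per BLOB
TARGET_BLOB_GAS_PER_BLOCK = 3 * GAS_PER_BLOB    # target of 3 BLOBs per block
MAX_BLOB_GAS_PER_BLOCK = 6 * GAS_PER_BLOB       # hard cap of 6 BLOBs per block
MIN_BASE_FEE_PER_BLOB_GAS = 1                   # wei
BLOB_BASE_FEE_UPDATE_FRACTION = 3338477


def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Integer approximation of factor * e ** (numerator / denominator) via a Taylor series."""
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator


def calc_excess_blob_gas(parent_excess: int, parent_blob_gas_used: int) -> int:
    """Blob gas above target carries over to the next block; below target it drains to zero."""
    if parent_excess + parent_blob_gas_used < TARGET_BLOB_GAS_PER_BLOCK:
        return 0
    return parent_excess + parent_blob_gas_used - TARGET_BLOB_GAS_PER_BLOCK


def blob_base_fee(excess_blob_gas: int) -> int:
    """Blob base fee in wei for a block with the given excess blob gas."""
    return fake_exponential(MIN_BASE_FEE_PER_BLOB_GAS, excess_blob_gas, BLOB_BASE_FEE_UPDATE_FRACTION)


# Usage: 20 consecutive full blocks (6 BLOBs each) push the blob base fee up exponentially.
excess = 0
for _ in range(20):
    excess = calc_excess_blob_gas(excess, MAX_BLOB_GAS_PER_BLOCK)
print(excess, blob_base_fee(excess))
```

Because this fee is priced separately from execution gas, BLOB demand does not compete directly with ordinary transactions for block space.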
If Ethereum DA runs at full capacity (256 BLOBs per block), the network is expected to receive approximately 80 TB of DA data per year, far beyond the storage capacity of most nodes.
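The roughly 80 TB figure follows from the parameters above. This sketch assumes a block in every 12-second slot and counts raw BLOB bytes only, ignoring any erasure-coding expansion; the 6 and 32 BLOB rows are included for comparison.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365
SLOT_SECONDS = 12           # assumes one block in every 12-second slot, no missed slots
BLOB_BYTES = 4096 * 32      # 128 KiB: 4096 field elements of 32 bytes each
TIB = 1024 ** 4


def da_bytes_per_year(blobs_per_block: int) -> int:
    """Raw BLOB bytes produced per year, before any erasure-coding expansion."""
    blocks_per_year = SECONDS_PER_YEAR // SLOT_SECONDS
    return blocks_per_year * blobs_per_block * BLOB_BYTES


for blobs in (6, 32, 256):
    print(f"{blobs:>3} BLOBs/block -> {da_bytes_per_year(blobs) / TIB:.1f} TiB/year")
# prints 1.9, 10.0 and 80.2 TiB/year respectively
```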
Ethereum Storage Roadmap and Its Consequences
Vitalik's Ethereum roadmap tweet: the Purge is mainly about storage.
Rising storage costs have drawn the attention of researchers in the Ethereum ecosystem. To address the issue and keep all clients consistent, researchers are working on proposals that explicitly define how Ethereum clients delete historical data. The two main proposals are:
· EIP-4444: Bound historical data in execution clients. This proposal allows clients to prune blocks that are more than one year old. Assuming an average block size of 100K, historical block data is capped at about 250GB (100K * (3600 * 24 * 365) / 12, assuming a 12-second block time).
· EIP-4844: Shard blob transactions. BLOB data older than 18 days is discarded. This is more aggressive than EIP-4444 and limits historical BLOB data to about 100GB ((18 * 3600 * 24) * 128K * 6 / 12, assuming a 12-second block time).
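The two bounds above can be reproduced as follows. The 100K average block size and the 6 x 128K BLOBs per block come from the text; treating "K" as KiB and printing both binary (GiB) and decimal (GB) units are choices made for this sketch, which is why the printed values land close to, but not exactly on, the rounded figures in the text.

```python
SLOT_SECONDS = 12   # assumed block time, as in the text
KIB = 1024
GIB = 1024 ** 3


def retained_bytes(window_seconds: int, bytes_per_block: int) -> int:
    """Upper bound on data kept when a node prunes everything older than the window."""
    return (window_seconds // SLOT_SECONDS) * bytes_per_block


# EIP-4444: keep one year of blocks at an assumed average of 100 KiB per block.
eip4444 = retained_bytes(3600 * 24 * 365, 100 * KIB)
# EIP-4844: keep 18 days of BLOBs at the maximum of 6 x 128 KiB per block.
eip4844 = retained_bytes(18 * 3600 * 24, 6 * 128 * KIB)

print(f"EIP-4444: {eip4444 / GIB:.1f} GiB ({eip4444 / 1e9:.1f} GB)")  # 250.6 GiB (269.1 GB)
print(f"EIP-4844: {eip4844 / GIB:.1f} GiB ({eip4844 / 1e9:.1f} GB)")  # 94.9 GiB (101.9 GB)
```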
What are the consequences of all clients deleting historical data? One major problem is that new nodes can no longer reach the latest state through "full sync", the synchronization mode that replays every block from the genesis block to the latest block. Instead, they must use "snap sync" or "state sync" to download the latest state directly. This approach is already implemented in Geth, where it is the default sync mode.
Pruning Ethereum mainnet history also causes problems for Ethereum L2s: newly joined Layer2 nodes cannot reach the latest state by replaying all Layer2 history. Moreover, since L1 nodes do not maintain L2 state, an L2 cannot use "snap sync" to derive its latest state directly from Layer1 blocks, which breaks an important assumption Layer2s rely on to inherit Ethereum's security.
The likely fallback is to rely on third-party services such as Infura, Etherscan, or the L2 projects themselves to store Layer2 historical data or state snapshots, a centralized arrangement sustained by indirect incentives outside the protocol.
The core questions we want to explore are:
Can we find a better decentralized solution for storage and access?
Is it possible to find a solution that gives nodes direct incentives and is guaranteed by the Ethereum network itself (for example, by L1 contracts)?
On the basis of all this, can we provide a fully decentralized, in-protocol direct incentive solution for the Ethereum storage route?
Solution
Solution 1: Ethereum Portal Network
The Ethereum Portal Network is a lightweight, decentralized network for accessing the Ethereum protocol. It provides Ethereum JSON-RPC methods such as eth_call and eth_getBlockByNumber, translating each JSON-RPC request into P2P requests against a distributed hash table (DHT), similar to the IPFS network. Unlike IPFS, which accepts any type of data and is therefore vulnerable to junk data, the Portal network hosts only Ethereum data, such as historical block headers and transaction data; this restriction is enforced by the light client verification built into the Portal network.
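The DHT idea can be sketched as Kademlia-style content routing: content is addressed by an ID, nodes live in the same ID space, and each node stores only content that falls within a radius of its own ID under an XOR distance metric. This is a simplified local model rather than a Portal client; the SHA-256 content-ID derivation, the radius, the content-key string, and all class and function names are assumptions, and the light client verification of returned data is omitted.

```python
import hashlib
from typing import Optional


def content_id(content_key: bytes) -> int:
    # Assumption: a content ID is the SHA-256 hash of the content key, read as a 256-bit integer.
    return int.from_bytes(hashlib.sha256(content_key).digest(), "big")


def distance(a: int, b: int) -> int:
    # Kademlia-style XOR metric between two 256-bit IDs.
    return a ^ b


class PortalNode:
    """Hypothetical toy node: stores only content whose ID falls within its radius."""

    def __init__(self, node_id: int, radius: int) -> None:
        self.node_id = node_id
        self.radius = radius
        self.store: dict[int, bytes] = {}

    def offer(self, cid: int, value: bytes) -> bool:
        if distance(self.node_id, cid) <= self.radius:
            self.store[cid] = value
            return True
        return False


def closest(nodes: list, cid: int, k: int = 3) -> list:
    # A real client finds these nodes through iterative network lookups; here we simply sort.
    return sorted(nodes, key=lambda n: distance(n.node_id, cid))[:k]


def put(nodes: list, key: bytes, value: bytes) -> int:
    """Offer content to the k closest nodes; return how many accepted it."""
    cid = content_id(key)
    return sum(n.offer(cid, value) for n in closest(nodes, cid))


def get(nodes: list, key: bytes) -> Optional[bytes]:
    cid = content_id(key)
    for n in closest(nodes, cid):
        if cid in n.store:
            return n.store[cid]
    return None


# Usage: 64 nodes, each willing to store content in half of the ID space.
nodes = [PortalNode(content_id(f"node-{i}".encode()), radius=2**255) for i in range(64)]
key = b"block_header:0x1234"  # hypothetical content key, not the real Portal encoding
copies = put(nodes, key, b"<header bytes>")
print(copies, get(nodes, key))
```

The radius is what lets a device with only a few MB of storage participate: it simply advertises a small radius and is responsible for a correspondingly small slice of the data.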
An important feature of the Portal network is its lightweight design and compatibility with resource-constrained devices. A node can run with only a few MB of storage and little memory, which promotes decentralization: even mobile phones or Raspberry Pi devices could join the network and help address Ethereum's DA problem.
The development of the Portal network follows Ethereum's philosophy of client diversity, with clients written in Rust, JavaScript, and Nim. The Beacon Network and the History Network are already available, while the State Network is under active development. Note that the Portal network does not provide direct incentives for data storage.
Illustration: the Portal network Rust client (Trin) running with a 100 MB storage limit
Solution 2: EthStorage Network
The EthStorage Network is a decentralized, incentivized storage network dedicated to storing EIP-4844 BLOBs, funded by the Ethereum Foundation's Ecosystem Support Program (ESP).
· Minimal Trust: Unlike existing solutions that require centralized data bridges, EthStorage relies only on Ethereum's consensus and a 1-of-m trust model over permissionless EthStorage storage nodes. To store a BLOB, the user signs a transaction that carries the BLOB and calls the storage contract's put(key, blob_idx) method, and the contract records the BLOB hash on-chain. Storage providers then download the BLOB directly from the Ethereum DA network and store it, which sidesteps the data bridge problem.
· Storage Costs Aligned with Incentives: When put() is called, the transaction must include a storage fee (via msg.value), which is deposited into the contract. The fee is then paid out to storage nodes gradually over time, as long as the proofs they submit from off-chain are verified successfully. Ethereum's existing model pays for storage once, to the block producer (proposer); fees paid over time instead follow a discounted cash flow model, under the assumption that storage costs fall relative to the ETH price over time. This key innovation of EthStorage aligns fees with each storage node's actual storage contribution.
· Proof of Storage: Inspired by data availability sampling, EthStorage's proof of storage samples BLOBs that have been stored over a period of time. To verify these samples on-chain efficiently, EthStorage makes full use of smart contracts and recent advances in SNARK technology.
· Permissionless Operation: Any storage node in EthStorage can get paid, as long as it stores the data and regularly submits storage proofs on-chain.
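The fee flow described in the list above (a prepaid fee held by the contract and released to storage nodes over time against verified proofs) can be modeled with a toy Python class. Everything here is hypothetical: the class and method names, the geometric decay used to express the discounted cash flow idea, and the SHA-256 stand-in for the on-chain BLOB hash. The article does not specify EthStorage's actual fee formula.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyStorageContract:
    """Off-chain toy model for a single stored BLOB; not EthStorage's actual contract."""
    cost_per_period: float   # hypothetical cost (ETH) to store one BLOB for the first period
    decay: float             # hypothetical per-period factor (0 < decay < 1) by which cost falls
    blob_hashes: dict[bytes, bytes] = field(default_factory=dict)
    escrow: float = 0.0

    def required_fee(self) -> float:
        # Prepay every future period's cost: c + c*d + c*d**2 + ... = c / (1 - d).
        return self.cost_per_period / (1 - self.decay)

    def put(self, key: bytes, blob: bytes, msg_value: float) -> None:
        # Simplification: the real put(key, blob_idx) refers to a BLOB carried by the transaction;
        # here the BLOB bytes are passed in directly and SHA-256 stands in for the BLOB hash.
        if msg_value < self.required_fee():
            raise ValueError("storage fee too low")
        self.escrow += msg_value
        self.blob_hashes[key] = hashlib.sha256(blob).digest()

    def payout(self, period: int, proof_ok: bool) -> float:
        """Release this period's share of the fee, but only against a verified proof."""
        if not proof_ok:
            return 0.0
        amount = self.cost_per_period * self.decay ** period
        self.escrow -= amount
        return amount


# Usage: prepay once, then get paid period by period while proofs keep verifying.
contract = ToyStorageContract(cost_per_period=0.001, decay=0.99)
fee = contract.required_fee()
contract.put(b"my-key", bytes(128 * 1024), msg_value=fee)
paid = sum(contract.payout(t, proof_ok=True) for t in range(100))
print(f"fee={fee:.4f} ETH, paid={paid:.4f} ETH, escrow left={contract.escrow:.4f} ETH")
```

Geometric decay is just one simple way to encode the assumption that storage costs fall relative to ETH over time: with decay = 0.99, the prepaid fee is 100 times the first period's cost, and a node that stops proving stops earning while the rest stays in escrow.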
From a modular blockchain perspective, EthStorage acts as a storage L2 for Ethereum that charges storage fees rather than transaction fees. By indexing BLOB hashes on-chain, EthStorage serves as a modular storage layer for Ethereum that improves storage scalability and reduces costs, targeting a reduction of about 1000x.
On the development side, EthStorage has been integrated with EIP-4844 on the Ethereum Sepolia testnet. We have stress-tested EthStorage together with Sepolia, including writing hundreds of GB of BLOBs to EthStorage. More than 100 community participants have joined the network and successfully proved their local storage.
The main advantage of the EthStorage network is that it provides decentralized, direct incentives on top of Ethereum, which to our knowledge is a first. Its limitation is that it is designed specifically for fixed-size BLOBs.
Dashboard of the EthStorage testnet running on Ethereum Sepolia
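The sampling-based proof of storage described earlier can be illustrated with a generic Merkle spot-check: the verifier keeps only a commitment (here, a Merkle root), derives random chunk indices from public randomness, and checks the chunks the storage node returns. This is a stand-in, not EthStorage's scheme, which per the text relies on smart contracts and SNARKs for efficient on-chain verification; the SHA-256 tree, the 4 KiB chunk size, the seed string, and all function names are assumptions. A real design must also stop a node from fetching the data only when challenged, which this sketch does not address.

```python
import hashlib

CHUNK_BYTES = 4096  # hypothetical chunk size; a 128 KiB BLOB splits into 32 chunks


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blob: bytes) -> list[list[bytes]]:
    """Merkle tree over fixed-size chunks; levels[0] are the leaves, levels[-1] == [root]."""
    leaves = [h(blob[i:i + CHUNK_BYTES]) for i in range(0, len(blob), CHUNK_BYTES)]
    assert leaves and (len(leaves) & (len(leaves) - 1)) == 0, "sketch assumes a power-of-two chunk count"
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def challenge(seed: bytes, n_chunks: int, k: int) -> list[int]:
    # Derive k chunk indices from public randomness the prover cannot predict in advance.
    return [int.from_bytes(h(seed + j.to_bytes(4, "big")), "big") % n_chunks for j in range(k)]


def prove(blob: bytes, levels: list[list[bytes]], index: int) -> tuple[bytes, list[bytes]]:
    """Return the challenged chunk plus the sibling hashes on its path to the root."""
    chunk = blob[index * CHUNK_BYTES:(index + 1) * CHUNK_BYTES]
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling at this level
        index //= 2
    return chunk, path


def verify(root: bytes, index: int, chunk: bytes, path: list[bytes]) -> bool:
    node = h(chunk)
    for sibling in path:
        # An even index means the current node is a left child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# Usage: the verifier holds only the root; the storage node must answer random challenges.
blob = b"".join(h(i.to_bytes(4, "big")) for i in range(4096))  # 128 KiB stand-in BLOB
levels = build_tree(blob)
root = levels[-1][0]
indices = challenge(b"hypothetical block hash", len(levels[0]), k=8)
print(all(verify(root, i, *prove(blob, levels, i)) for i in indices))
```

Fixed-size BLOBs keep this simple, since every BLOB splits into the same number of equal chunks; dynamically sized data would need a more careful commitment and sampling scheme.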
Looking Ahead
Although Ethereum storage has received little attention, it is of great significance to the Ethereum ecosystem. As the Ethereum network grows rapidly, storing Ethereum data and keeping it accessible has become a key challenge. The Portal Network and the EthStorage Network are both at an early stage, and there are several important long-term directions to focus on:
Decentralized, low-latency access to Ethereum state data: Accessing Ethereum state in a decentralized and verifiable way is critical but challenging. In a traditional DHT network, looking up account information usually requires a separate query for each internal trie node along the path, and those trie nodes are stored on different P2P peers, which often adds considerable latency. The key is to exploit the structure of the state trie to speed up access, and the Ethereum Portal Network's upcoming State Network is designed to solve this problem.
Integration of the Portal Network with the EthStorage Network: The Portal Network can be extended seamlessly to support BLOB data, and the EthStorage team has partially implemented this feature. The next step is to unify the two networks into a decentralized JSON-RPC network that gives contracts programmable access to BLOBs. Combining application logic in contracts with the scalable BLOB storage provided by EthStorage can enable new dApps on Ethereum, such as dynamic decentralized websites (e.g., decentralized Twitter, YouTube, or Wikipedia).
Decentralized access from browsers: Just as the ipfs:// protocol provides access to data on the IPFS network, web3 needs an Ethereum-native access protocol that browsers can use directly to unlock the potential of Ethereum's rich data. This data spans a wide range, from token ownership and account balances to NFT images and dynamic decentralized websites, all enabled by smart contracts and future Ethereum storage capabilities. The web3:// protocol defined by ERC-4804/6860 is being actively developed and promoted toward this goal.
Advanced storage proofs for dynamically sized data: Beyond fixed-size BLOBs, it is also important to explore advanced storage proofs for dynamically sized data (such as historical blocks or even state objects). Developing such algorithms would make storage solutions more adaptable.
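The latency problem raised in the first of these directions can be put into rough numbers. This is a back-of-envelope sketch: the account count, the number of DHT hops per lookup, and the per-hop latency are hypothetical inputs, and the depth formula is an approximation for a random hexary trie, not a measurement of the Portal State Network.

```python
import math


def trie_lookups(num_keys: int, branching: int = 16) -> int:
    """Rough number of trie nodes on the path to one key in a hexary trie.

    Random keys typically diverge after about log_16(N) nibbles, so a lookup walks
    roughly that many branch nodes plus the final leaf.
    """
    return math.ceil(math.log(num_keys, branching)) + 1


def naive_dht_latency_ms(num_keys: int, hops_per_lookup: int, rtt_ms: float) -> float:
    # Each trie node lives on a different peer and needs its own DHT lookup. The lookups run
    # one after another, because the hash of the next node is only known once the current
    # node has arrived.
    return trie_lookups(num_keys) * hops_per_lookup * rtt_ms


# Hypothetical inputs: 250M accounts, 3 DHT hops per lookup, 100 ms per hop.
print(trie_lookups(250_000_000))                    # 8
print(naive_dht_latency_ms(250_000_000, 3, 100.0))  # 2400.0
```

The sequential dependency is the point: per-hop delays add up rather than overlap, so any design that lets a client fetch the account, or a large part of its proof path, in fewer round trips cuts latency directly.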
We hope that these efforts will help us collectively contribute to the Ethereum roadmap and lay the foundation for decentralized storage in the future Ethereum ecosystem.