Author: 0xNatalie Source: ChainFeeds
Ethereum's state data growth problem and its solutions
As the Ethereum network has grown in popularity and application demand has increased, its historical and state data have begun to grow rapidly. Ethereum has tackled this problem step by step: first full nodes, then light clients, and most recently the Dencun upgrade, which introduced temporary blob data that consensus nodes delete automatically after a fixed period. Further measures, such as state expiry, which would automatically set aside long-unused state, are still at the proposal stage.
One of Ethereum's long-term goals is to reduce the load on any single node by implementing sharding, which spreads data across multiple shards instead of requiring every node to store all of it. EIP-4844 (proto-danksharding), implemented in the Dencun upgrade, is an important step toward full sharding. It introduces "blobs", a temporary data type that lets rollups post more data to the Ethereum mainnet at lower cost. To keep data growth in check, consensus-layer nodes are only required to keep blob data for about 18 days, after which it is deleted.
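The "about 18 days" figure follows directly from consensus-layer constants. The sketch below reproduces the arithmetic using constants from the Deneb consensus specification and EIP-4844 (`MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096`, 32 slots per epoch, 12 seconds per slot, 4096 field elements of 32 bytes per blob). The helper `blob_still_served` is a hypothetical simplification that ignores the fork-epoch lower bound in the real networking rules.

```python
# Constants from the Deneb (Dencun) consensus specification.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096

# Blob size from EIP-4844: 4096 field elements of 32 bytes each.
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32


def blob_retention_seconds() -> int:
    """Minimum time consensus nodes must keep blob sidecars available."""
    return MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT


def blob_bytes() -> int:
    """Size of a single blob in bytes."""
    return FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT


def blob_still_served(blob_slot: int, current_slot: int) -> bool:
    """Hypothetical helper: is a blob from blob_slot still inside the required window?

    Simplified: ignores the rule that the window never starts before the Deneb fork epoch.
    """
    blob_epoch = blob_slot // SLOTS_PER_EPOCH
    current_epoch = current_slot // SLOTS_PER_EPOCH
    return current_epoch - blob_epoch <= MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS


if __name__ == "__main__":
    seconds = blob_retention_seconds()
    print(f"retention: {seconds} s = {seconds / 86400:.1f} days")
    print(f"blob size: {blob_bytes() // 1024} KiB")
```

Once that window passes, nodes may prune the blob, which is exactly the gap that long-term archives such as EWM aim to fill.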
In addition to Ethereum's own improvements, projects such as Celestia, Avail, and EigenDA are building solutions to the data problem. They provide effective short-term data availability (DA), improving the real-time operation and scalability of blockchains. However, they do not serve applications that need long-term access to historical data, such as dApps that rely on long-term storage of user authentication data or dApps that need historical data to train artificial intelligence models.
To address long-term data storage in the Ethereum ecosystem, projects such as EthStorage, Pinax, and Covalent have proposed solutions. EthStorage provides long-term DA for rollups, so that their data remains accessible and usable over time. Pinax, The Graph, and StreamingFast have jointly developed solutions for the long-term storage and retrieval of blobs. Covalent's Ethereum Wayback Machine (EWM) is not only a long-term storage solution but also a complete system for querying and analyzing the stored data.
As artificial intelligence becomes a mainstream trend in global technology, its combination with blockchain is also widely seen as a future direction. This has increased demand for access to and analysis of historical data, and this is where EWM shows its particular strengths. EWM archives and processes Ethereum's historical data, letting users retrieve complex data structures and analyze the internal state of smart contracts, transaction results, event logs, and more.
Introduction to the Ethereum Wayback Machine (EWM)
The Ethereum Wayback Machine (EWM) borrows the concept of the Wayback Machine to preserve Ethereum's historical data and keep it accessible and verifiable. The Wayback Machine is a digital archive created by the Internet Archive to record and preserve the history of the Internet. It lets users view archived versions of a website at different points in the past, helping people understand how the site's content has changed over time.
Historical data is the reason blockchains exist in the first place: it underpins both their technical architecture and their economic model. Blockchains were designed to provide a public, unchangeable historical record. Bitcoin, for example, was designed as an unalterable, decentralized ledger that records every transaction to ensure transparency and security. Demand for historical data spans a wide range of scenarios, yet there is currently no efficient, verifiable way to store it. As a long-term DA solution, EWM can store data permanently, including blob data, and addresses the accessibility problems that state expiry and data sharding create for historical data. EWM focuses on archiving Ethereum's historical data, keeping it accessible over the long term, and supporting queries over complex data structures. The next section looks in detail at how EWM achieves this through its data processing pipeline.
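Why is a blockchain's history "unchangeable", and how can an archive prove it has not been altered? The core idea is a hash chain: each record commits to the hash of the record before it, so editing any past entry breaks every later link. The sketch below is a deliberately minimal illustration using SHA-256 over JSON; it is not Ethereum's actual block format, which uses Keccak-256 over RLP-encoded headers and Merkle-Patricia tries.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash a record together with its predecessor's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def build_chain(payloads: list) -> list:
    """Link payloads into a hash chain, oldest first."""
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        h = record_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited payload or broken link fails verification."""
    prev = GENESIS_PREV
    for rec in chain:
        if rec["prev"] != prev or record_hash(prev, rec["payload"]) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


if __name__ == "__main__":
    history = build_chain([{"from": "alice", "to": "bob", "value": 5},
                           {"from": "bob", "to": "carol", "value": 2}])
    print(verify_chain(history))
    history[0]["payload"]["value"] = 500  # tamper with an old record
    print(verify_chain(history))
```

An archive that stores the data together with these commitments lets anyone re-check the history later instead of trusting the archive operator.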
EWM's data processing flow: extraction, refinement and indexing
Covalent is a platform that provides access to and query services for blockchain data. It captures and indexes blockchain data and stores it across multiple nodes in its network, which gives reliable storage and fast access. Covalent processes data through the Ethereum Wayback Machine (EWM) to keep blockchain history continuously accessible. The EWM data processing flow has three key steps: extraction and export, refinement, and indexing and query.
Extraction and Export: The first step extracts historical transaction data directly from the blockchain network. It is performed by specialized operators called Block Specimen Producers (BSPs), whose main task is to create and store "block specimens": raw snapshots of blockchain data. These specimens serve as the canonical representation of the blockchain's historical state, so preserving their integrity and accuracy is critical. Once created, block specimens are uploaded to distributed storage (built on IPFS), and proofs of them are published and verified through the ProofChain contract. This secures the data and also signals to other participants that it has been safely stored.
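The publish-and-verify pattern described above can be sketched as follows, under simplifying assumptions. A BSP serializes a specimen canonically, derives a content hash (a stand-in for an IPFS CID, which in reality is a multihash-encoded identifier rather than a raw SHA-256 hex string), and records that hash in a registry. `ToyProofRegistry` is a hypothetical in-memory stand-in for the ProofChain contract, not its actual interface.

```python
import hashlib
import json
from dataclasses import dataclass, field


def specimen_digest(specimen: dict) -> str:
    """Content hash of a canonically serialized block specimen (stand-in for an IPFS CID)."""
    canonical = json.dumps(specimen, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


@dataclass
class ToyProofRegistry:
    """Hypothetical in-memory stand-in for the on-chain ProofChain contract."""
    proofs: dict = field(default_factory=dict)

    def publish(self, chain_id: int, block_height: int, digest: str) -> None:
        """Record the specimen hash for a block; reject a conflicting second proof."""
        key = (chain_id, block_height)
        if key in self.proofs and self.proofs[key] != digest:
            raise ValueError("conflicting specimen proof for this block")
        self.proofs[key] = digest

    def verify(self, chain_id: int, block_height: int, specimen: dict) -> bool:
        """Check a specimen fetched from storage against the published proof."""
        return self.proofs.get((chain_id, block_height)) == specimen_digest(specimen)


if __name__ == "__main__":
    specimen = {"block_height": 100,
                "transactions": [{"from": "alice", "to": "bob", "value": 1}]}
    registry = ToyProofRegistry()
    registry.publish(1, 100, specimen_digest(specimen))
    print(registry.verify(1, 100, specimen))
```

Canonical serialization (sorted keys, fixed separators) matters here: two producers holding the same data must derive the same hash, or their proofs would appear to conflict.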
Refining: After extraction, the data is refined by Block Result Producers (BRPs), which convert the raw data into a more useful form. Traditional ways of accessing blockchain data usually expose only limited information and make complex data structures hard to query. By re-executing and transforming the data, BRPs can provide more detailed information, such as a contract's internal state and a transaction's execution path. Because BRPs pre-process and store the results, queries and analyses no longer need to re-run a full node each time, which speeds up queries and lowers storage and compute costs. At this point, the original "block specimen" has been converted into a "block result" that is easier to query and analyze. This step not only improves the performance of the Covalent network but also opens up further ways to query and analyze the data.
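To make "re-executing and transforming" concrete, here is a toy refinement step. It assumes a hypothetical specimen containing only simple value transfers and a pre-state of account balances; real BRPs replay full EVM execution, which this sketch does not attempt. The output, a per-transaction status plus the post-state, is the kind of derived information that is expensive to recompute at query time and cheap to look up once stored.

```python
def refine(specimen: dict, pre_state: dict) -> dict:
    """Re-execute a toy specimen's transfers and emit a queryable 'block result'.

    Hypothetical data model: each transaction is {"from", "to", "value"};
    a transfer the sender cannot afford is marked "reverted" and changes nothing.
    """
    state = dict(pre_state)  # copy so the caller's pre-state is untouched
    tx_results = []
    for i, tx in enumerate(specimen["transactions"]):
        sender, recipient, value = tx["from"], tx["to"], tx["value"]
        ok = state.get(sender, 0) >= value
        if ok:
            state[sender] -= value
            state[recipient] = state.get(recipient, 0) + value
        tx_results.append({
            "tx_index": i,
            "from": sender,
            "to": recipient,
            "value": value,
            "status": "success" if ok else "reverted",
        })
    return {
        "block_height": specimen["block_height"],
        "transactions": tx_results,
        "post_state": state,
    }


if __name__ == "__main__":
    specimen = {"block_height": 7, "transactions": [
        {"from": "alice", "to": "bob", "value": 7},
        {"from": "alice", "to": "carol", "value": 5},  # alice can no longer afford this
    ]}
    result = refine(specimen, {"alice": 10})
    print([tx["status"] for tx in result["transactions"]], result["post_state"])
```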
Index and Query: Finally, Query Operators organize the processed data and store it where it is easy to find. When API users make requests, data is pulled from distributed storage so that both historical and real-time data can be used to answer API queries. This lets users efficiently access and use the blockchain data stored in the Covalent network.
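One common way to make block results "easy to find" is an inverted index from address to the transactions that touch it. The sketch below assumes block results shaped like `{"block_height": ..., "transactions": [{"tx_index", "from", "to"}, ...]}`; this layout and both function names are hypothetical and not Covalent's actual storage format.

```python
from collections import defaultdict
from typing import Optional


def build_address_index(block_results: list) -> dict:
    """Map each address to the (block_height, tx_index) pairs that involve it."""
    index = defaultdict(list)
    for result in block_results:
        height = result["block_height"]
        for tx in result["transactions"]:
            # A set so a self-transfer is indexed only once for that address.
            for addr in {tx["from"], tx["to"]}:
                index[addr].append((height, tx["tx_index"]))
    return dict(index)


def history_for(index: dict, address: str,
                from_block: int = 0, to_block: Optional[int] = None) -> list:
    """Return an address's transactions within an inclusive block range."""
    return [
        (height, tx_index)
        for height, tx_index in index.get(address, [])
        if height >= from_block and (to_block is None or height <= to_block)
    ]


if __name__ == "__main__":
    results = [
        {"block_height": 1, "transactions": [{"tx_index": 0, "from": "a", "to": "b"}]},
        {"block_height": 2, "transactions": [{"tx_index": 0, "from": "b", "to": "c"}]},
    ]
    idx = build_address_index(results)
    print(history_for(idx, "b"))
```

Building the index once, when block results are produced, is what lets a query such as "all activity for this address in blocks X to Y" avoid scanning the whole history.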
Covalent provides a unified GoldRush API for retrieving historical data from multiple blockchains (such as Ethereum, Polygon, and Solana). The API gives developers a one-stop data solution: a single call can return an account's ERC20 token balances and NFT data, which makes it easy to build cryptocurrency and NFT wallets (such as Rainbow and Zerion) and greatly simplifies development. Accessing DA data through the API costs credits, and requests are divided into classes (Class A, Class B, Class C, and so on), each with its own credit cost. This revenue supports the operator network.
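The sketch below shows roughly what a single balances call and its credit accounting could look like on the client side. Several parts are assumptions to check against the current GoldRush documentation: the base URL and the `balances_v2` path are modeled on Covalent's long-standing public API, the response field names (`data.items`, `contract_ticker_symbol`, `contract_decimals`, `balance`) are assumed, and the per-class credit costs are purely hypothetical placeholders. No network request is made here; sending the request with an API key is left to whatever HTTP client you use.

```python
from urllib.parse import quote

# Assumed base URL; verify against the current GoldRush API reference.
BASE_URL = "https://api.covalenthq.com/v1"

# Hypothetical credit costs per request class; real rates are set by Covalent's pricing.
CREDIT_COST = {"A": 1, "B": 5, "C": 10}


def balances_url(chain_name: str, address: str) -> str:
    """Build the (assumed) token-balances endpoint for one wallet on one chain."""
    return f"{BASE_URL}/{quote(chain_name)}/address/{quote(address)}/balances_v2/"


def parse_balances(payload: dict) -> list:
    """Turn an (assumed-shape) balances response into (symbol, human-readable amount) pairs."""
    rows = []
    for item in payload["data"]["items"]:
        decimals = item.get("contract_decimals") or 0
        amount = int(item["balance"]) / 10 ** decimals  # balances arrive as integer strings
        rows.append((item["contract_ticker_symbol"], amount))
    return rows


def credits_needed(calls_per_class: dict) -> int:
    """Total credits for a batch of calls, e.g. {"A": 10, "B": 2}."""
    return sum(CREDIT_COST[cls] * n for cls, n in calls_per_class.items())


if __name__ == "__main__":
    print(balances_url("eth-mainnet", "0xabc"))
    sample = {"data": {"items": [{"contract_ticker_symbol": "USDC",
                                  "contract_decimals": 6, "balance": "250000000"}]}}
    print(parse_balances(sample), credits_needed({"A": 10, "B": 2}))
```

Converting integer balance strings with the token's decimals on the client keeps the raw on-chain value intact in the response while still giving wallets a readable amount.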
Future Outlook
With the rapid development of AI, the trend of combining AI with blockchain is becoming increasingly clear. Blockchains give AI an unalterable, distributed, and verifiable data source, which increases data transparency and trust and makes AI models more accurate and reliable in analysis and decision-making. By analyzing on-chain data, AI can optimize algorithms and predict trends, and then execute complex tasks and transactions directly, significantly improving the efficiency of dApps and reducing their costs. Through EWM, AI models can access a wide range of structured on-chain Web3 datasets that are complete and verifiable. As a bridge between AI models and blockchains, EWM makes data retrieval and use much easier for AI developers.
There are already some AI projects that have integrated Covalent:
SmartWhales: A platform that uses AI to optimize copy-trading strategies. Copy trading relies on historical data analysis to identify successful trading patterns and strategies. Covalent provides a comprehensive, detailed blockchain dataset, which SmartWhales uses to analyze past trading behavior and outcomes, identify which strategies perform well under specific market conditions, and recommend them to users.
BotFi: A DeFi trading bot. By integrating Covalent's data, it analyzes market trends, automates trading strategies, and buys and sells automatically as the market changes.
Laika AI: A platform that uses AI for comprehensive on-chain analysis. Laika AI feeds structured blockchain data from Covalent into its AI models, helping users perform complex on-chain data analysis.
Entendre Finance: Automated DeFi asset management with real-time insights and predictive analysis. Its AI uses Covalent's structured data to simplify and automate asset management, such as monitoring and managing digital asset holdings and automatically executing specific trading strategies.
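As an illustration of the kind of historical analysis copy trading depends on, the sketch below ranks wallets by win rate and then total profit over their past closed trades, optionally restricted to one market condition. It is a hypothetical example of the general idea, not SmartWhales' actual method: the trade record fields (`wallet`, `pnl`, `regime`), the `min_trades` threshold, and the ranking rule are all assumptions.

```python
from collections import defaultdict
from typing import Optional


def rank_traders(trades: list, min_trades: int = 3,
                 regime: Optional[str] = None) -> list:
    """Rank wallets by (win rate, total PnL) over historical closed trades.

    Hypothetical trade record: {"wallet": str, "pnl": float, "regime": str}.
    If `regime` is given (e.g. "bull"), only trades from that market condition count.
    Wallets with fewer than `min_trades` qualifying trades are excluded as too noisy.
    """
    stats = defaultdict(lambda: {"n": 0, "wins": 0, "pnl": 0.0})
    for t in trades:
        if regime is not None and t.get("regime") != regime:
            continue
        s = stats[t["wallet"]]
        s["n"] += 1
        if t["pnl"] > 0:
            s["wins"] += 1
        s["pnl"] += t["pnl"]
    ranked = [
        (wallet, s["wins"] / s["n"], s["pnl"])
        for wallet, s in stats.items()
        if s["n"] >= min_trades
    ]
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked


if __name__ == "__main__":
    history = [
        {"wallet": "w1", "pnl": 10, "regime": "bull"},
        {"wallet": "w1", "pnl": -5, "regime": "bear"},
        {"wallet": "w1", "pnl": 20, "regime": "bull"},
        {"wallet": "w2", "pnl": 5, "regime": "bull"},
        {"wallet": "w2", "pnl": 5, "regime": "bull"},
        {"wallet": "w2", "pnl": 5, "regime": "bear"},
    ]
    print(rank_traders(history))
```

Requiring a minimum number of trades is a simple guard against recommending a wallet whose record rests on one or two lucky trades.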
EWM also continues to evolve as needs change. Covalent engineer Pranay Valson has said that EWM will extend its protocol specification to support other blockchains such as Polygon and Arbitrum, and will integrate BSP forks into Ethereum clients such as Nethermind and Besu for wider compatibility and adoption. In addition, when processing blob transactions from the beacon chain, EWM will use KZG commitments to make data storage and retrieval more efficient and to reduce storage costs.
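For context on the KZG point: under EIP-4844, a blob transaction on the execution layer does not carry the blob itself, only a "versioned hash" of the blob's 48-byte KZG commitment, computed as the version byte `0x01` followed by the last 31 bytes of the commitment's SHA-256 hash. The sketch below uses that rule to key a hypothetical blob archive, so a blob can be looked up from the hash seen in a transaction. The `archive_blob` and `lookup_blob` helpers are illustrative assumptions, not EWM's design, and they do not check the blob against its commitment; that requires KZG proof verification with a dedicated library.

```python
import hashlib
from typing import Optional

VERSIONED_HASH_VERSION_KZG = b"\x01"  # version byte defined in EIP-4844


def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """EIP-4844 rule: 0x01 || sha256(commitment)[1:]."""
    if len(commitment) != 48:
        raise ValueError("KZG commitment must be 48 bytes")
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]


def archive_blob(archive: dict, commitment: bytes, blob: bytes) -> bytes:
    """Hypothetical archive write: store a blob keyed by its versioned hash."""
    versioned_hash = kzg_to_versioned_hash(commitment)
    archive[versioned_hash] = {"commitment": commitment, "blob": blob}
    return versioned_hash


def lookup_blob(archive: dict, versioned_hash: bytes) -> Optional[bytes]:
    """Hypothetical archive read, as used when resolving a tx's blob_versioned_hashes.

    Re-checks that the stored commitment matches the requested hash. It does NOT
    verify the blob against the commitment (that needs a KZG proof check).
    """
    entry = archive.get(versioned_hash)
    if entry is None:
        return None
    if kzg_to_versioned_hash(entry["commitment"]) != versioned_hash:
        raise ValueError("stored commitment does not match versioned hash")
    return entry["blob"]


if __name__ == "__main__":
    store = {}
    vh = archive_blob(store, bytes(48), b"example blob bytes")
    print(vh.hex(), lookup_blob(store, vh))
```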