But what if, one day, Bob's mining car breaks down? He left it at the repair shop and forgot his pickaxe inside. It was too late to go back to the processing plant, but they still had work to do. Using only Alice's mining car and the single pickaxe inside it, can they still mine 2 portions of ore per minute?
In this analogy, the four steps of mining are threads and the mining car is the core; the mine holds the data that the smart contract needs to process, and the pickaxe is the execution unit. The program consists of two interdependent threads: thread 2 cannot run until thread 1 has finished. The amount of ore harvested represents the program's performance: the higher the performance, the more Bob and Alice earn from mining. Think of the mine as memory from which you can fetch a unit of data (gold), so picking a unit of ore in thread 1 is like reading a unit of data from memory.
Now let's see what happens when Bob's mining car breaks down. Bob and Alice have to share one car. At first this is not a problem, but once the mining equipment is upgraded to raise mining efficiency, things change: the amount of ore the car can carry becomes the bottleneck of overall efficiency. No matter how efficient the mining machine is, the amount of ore that can be sent for grinding and processing is capped by the "maximum ore the mining car can hold". This is also the essence of Solana's parallel VM: core sharing.
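The bottleneck argument can be written as a tiny throughput model. This is a hypothetical sketch: the stage names and rates below are invented for illustration and are not figures from the article. It only shows that the end-to-end rate of a chain of stages is capped by its slowest stage, however much the other stages are upgraded.

```python
from typing import Dict, Tuple


def bottleneck(stage_rates: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its rate.

    In a chain of stages, end-to-end throughput can never exceed the rate
    of the slowest stage, however fast the other stages are.
    """
    slowest = min(stage_rates, key=lambda name: stage_rates[name])
    return slowest, stage_rates[slowest]


# Illustrative rates in units of ore per 15-minute round (assumed values).
two_cars = {"mining": 2, "hauling": 2}
# One shared car after the mining machine upgrade: mining is 5x faster,
# but the single car can only haul half as much as two cars did.
shared_car_upgraded_mining = {"mining": 10, "hauling": 1}

print(bottleneck(two_cars))                    # ('mining', 2)
print(bottleneck(shared_car_upgraded_mining))  # ('hauling', 1)
```

Upgrading "mining" further would not change the second result; only raising the "hauling" rate (the shared core) would.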
Core Sharing
The final design element of Solana is "pipelining". Pipelining applies when data must pass through a series of steps and different hardware is responsible for each step. The key idea is to take work that would otherwise run serially and overlap it: while one stage processes a batch, the previous stage is already working on the next one, so each pipeline stage handles a different batch of transactions at the same time. The faster the hardware (the mining car's loading capacity), the higher the throughput of this parallelization. Today, however, Solana's hardware requirements leave its node operators with only one choice, data centers, which brings efficiency but departs from the original intent of blockchain.
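The gain from pipelining can be made concrete with the standard timing model for a linear pipeline of identical batches: once the first batch has filled the pipeline, one batch completes every time the slowest stage finishes. The stage names below loosely follow the stages of Solana's transaction processing unit (fetch, signature verification, banking, broadcast), but the per-batch times are invented for illustration.

```python
from typing import Sequence


def serial_time(stage_times: Sequence[float], batches: int) -> float:
    """Every batch passes through all stages before the next batch starts."""
    return batches * sum(stage_times)


def pipelined_time(stage_times: Sequence[float], batches: int) -> float:
    """Each stage runs on its own hardware and works on a different batch.

    The first batch takes sum(stage_times) to fill the pipeline; after that,
    one batch completes every max(stage_times), because the slowest stage
    sets the pace for everything behind it.
    """
    if batches == 0:
        return 0.0
    return sum(stage_times) + (batches - 1) * max(stage_times)


# Assumed per-batch times for each stage (illustrative only).
stages = {"fetch": 1, "sigverify": 2, "banking": 3, "broadcast": 1}
times = list(stages.values())
print(serial_time(times, 10))     # 70
print(pipelined_time(times, 10))  # 34
```

The model also shows why faster hardware matters: throughput is governed by `max(stage_times)`, so speeding up the slowest stage raises the rate of the whole pipeline.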
Data dependencies are not split (memory resource sharing)
After the mining car was upgraded, mining could not keep up, so the car often ran less than full. Bob therefore paid a high price for a mining machine to speed up mining (an execution unit upgrade). They can now produce 10 units of ore in the same 15 minutes, but because grinding is still done by hand, the extra ore produced per unit of time cannot be turned into more gold; it just piles up in the warehouse. This shows what happens when memory access is the limiting factor on a program's execution speed. It does not matter how fast we process data (that is, how fast the core runs); we are limited by how fast we can fetch it. Slow I/O hurts badly because I/O is the slowest part of a computer, which makes asynchronous data reads critical. Even if Bob's machine can mine 10 units of ore in 15 minutes, contention for memory access will still hold them to 2 units every 15 minutes.

Existing parallel blockchains tackle this problem from one of two camps: pessimistic execution and optimistic execution. The former requires the dependencies between data states to be declared before any data is read or written, which forces developers to make static, upfront assumptions about control dependencies; in smart contract programming, those assumptions often diverge from reality. The latter makes no assumptions about, and places no restrictions on, data writes, and rolls back when a conflict occurs. Take Monad's optimistic execution scheme as an example: in reality, most of the workload is transaction execution, and far fewer transactions can actually run in parallel than imagined. The following figure shows the sources of Ethereum's daily gas fee consumption; you will find that different types of transactions are far from evenly distributed.
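Pessimistic execution can be illustrated with a scheduler that only sees the read and write sets each transaction declares up front. This is a simplified sketch in the spirit of Solana's requirement that transactions list the accounts they touch; the `Tx` type, the greedy batching rule and the example transactions are assumptions made for illustration, not any chain's actual scheduler.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tx:
    """A transaction that declares, before execution, the state keys it touches."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions conflict if either writes a key the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def schedule_pessimistic(txs: List[Tx]) -> List[List[str]]:
    """Greedily pack transactions into batches with no internal conflicts.

    Transactions in one batch may run in parallel; batches run in order.
    A transaction is always placed after the last batch holding a
    conflicting earlier transaction, so the outcome matches serial order
    and nothing ever needs to be rolled back.
    """
    batches: List[List[Tx]] = []
    for tx in txs:
        earliest = 0
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                earliest = i + 1
        if earliest == len(batches):
            batches.append([tx])
        else:
            batches[earliest].append(tx)
    return [[tx.name for tx in batch] for batch in batches]


txs = [
    Tx("t1", writes=frozenset({"alice"})),
    Tx("t2", writes=frozenset({"bob"})),
    Tx("t3", reads=frozenset({"alice"}), writes=frozenset({"carol"})),
    Tx("t4", writes=frozenset({"dave"})),
]
plan = schedule_pessimistic(txs)
print(plan)  # [['t1', 't2', 't4'], ['t3']]
```

The catch the article points to is the input: the schedule is only as good as the declared sets, which developers must state statically before the contract runs.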
Optimistic parallel execution worked in the Web2 era because a large share of Web2 requests are reads rather than writes; in apps like Taobao and Douyin, you rarely get the chance to modify any state. Web3 is the opposite. Most smart contract calls exist precisely to modify state (updating the ledger), which causes far more rollbacks than expected and can leave the chain unusable.
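The rollback problem can be seen in a toy model of optimistic execution. This is a simplified validate-and-retry scheme assumed for illustration, not Monad's actual algorithm: every pending transaction runs in parallel against the same snapshot, then validation walks them in block order and rolls back any transaction that read stale data or conflicts with an earlier transaction that was itself rolled back.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical transaction model: (read set, write set) of state keys.
Txn = Tuple[FrozenSet[str], FrozenSet[str]]


def conflict(a: Txn, b: Txn) -> bool:
    """True if either transaction writes a key the other reads or writes."""
    (ra, wa), (rb, wb) = a, b
    return bool(wa & (rb | wb) or wb & ra)


def optimistic_rounds(txs: List[Txn]) -> Tuple[int, int]:
    """Simulate optimistic parallel execution; return (rounds, rollbacks).

    The first pending transaction always commits, so every round makes
    progress and the loop terminates.
    """
    pending = list(txs)
    rounds = rollbacks = 0
    while pending:
        rounds += 1
        committed_writes: Set[str] = set()
        retry: List[Txn] = []
        for tx in pending:
            reads, writes = tx
            # Read a value that an earlier transaction in this round overwrote.
            stale = bool(reads & committed_writes)
            # Must stay ordered after an earlier transaction being retried.
            blocked = any(conflict(tx, r) for r in retry)
            if stale or blocked:
                retry.append(tx)
                rollbacks += 1
            else:
                committed_writes |= writes
        pending = retry
    return rounds, rollbacks


# Five swaps against one popular pool versus five independent transfers.
hot = [(frozenset({"pool"}), frozenset({"pool"}))] * 5
cold = [(frozenset({f"acct{i}"}), frozenset({f"acct{i}"})) for i in range(5)]
print(optimistic_rounds(hot))   # (5, 10)
print(optimistic_rounds(cold))  # (1, 0)
```

When every transaction updates the same state, the "parallel" run degenerates into serial execution plus wasted work: n rounds and n(n-1)/2 rollbacks for n transactions.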

So the conclusion is that Monad can indeed achieve parallelism, but its degree of parallelism (concurrency) has a theoretical upper limit of roughly 2 to 3 times, not the advertised 100K. Second, this limit cannot be raised by adding virtual machines; more cores do not translate into more processing power. Third, there is the familiar problem of state growth: because it does not shard data, Monad does not really address the node requirements created by the growth of on-chain state. Its node requirements, listed below, already exceed what a home computer can handle. After its mainnet launch, if the data is not sharded, Monad may well go the way of Solana. The last and most important point is that optimistic execution is not suited to parallelism in the blockchain field.
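One standard way to reason about why adding VMs cannot lift such a ceiling is Amdahl's law: if only a fraction p of a block's work can run in parallel, the speedup on n workers is 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) however large n gets. The value p = 0.6 below is an assumed, illustrative figure, not a measured Monad number; it simply shows how a 2 to 3 times ceiling arises.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    p is the fraction of the work that can run in parallel and n the number
    of workers (here, parallel VMs). As n grows, the speedup approaches
    1 / (1 - p), so extra workers stop helping.
    """
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= p <= 1 and workers >= 1")
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / workers)


# Assumed: 60% of the work is conflict-free, so the ceiling is 1 / 0.4 = 2.5x.
for n in (1, 2, 4, 16, 256):
    print(n, round(amdahl_speedup(0.6, n), 2))
```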

After mining for a while, Bob asked himself: "Why should I wait for Alice to come back before grinding? While she grinds, I can load the car. Loading and grinding take exactly the same time, so we will never have to wait for the grinder to become free. Before Alice finishes grinding, I will already be driving back to mine more, so both of us stay 100% busy." This brilliant idea restored their twofold efficiency, and they did not even need a second mining car. Crucially, Bob redesigned the program, that is, the order in which the threads execute, so that no thread ever gets stuck waiting for a shared resource within the core (such as the mining car or the pickaxe).
This is parallelism done right. When the state of smart contracts is split, accessing shared resources neither forces threads to queue nor lets the data I/O pipeline cap how fast transactions can complete atomically.
The PREDA model exposes the access structure of the contract state to the execution layer when the contract code executes, so the execution layer can schedule work sensibly and avoid rolling back execution results altogether. This parallel mode is also called asynchronous parallelism.
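Knowing the state access structure before execution means the execution layer can route work instead of guessing and rolling back. The sketch below is a hypothetical illustration, not PREDA's scheduler: each transaction is tagged with the address scope it touches, and a deterministic hash sends it to the engine that owns that scope. `crc32` is used rather than Python's built-in `hash`, because `hash` on strings is salted per process and would not give every node the same mapping.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def engine_for(address: str, num_engines: int) -> int:
    """Deterministically map an address-scoped state partition to one engine."""
    return zlib.crc32(address.encode()) % num_engines


def route(txs: List[Tuple[str, str]], num_engines: int) -> Dict[int, List[str]]:
    """Route each (tx_id, scope_address) to the engine owning that scope.

    Transactions on different engines touch disjoint state, so engines never
    conflict and nothing needs a rollback; transactions that share a scope
    land on the same engine and keep their original order.
    """
    queues: Dict[int, List[str]] = defaultdict(list)
    for tx_id, address in txs:
        queues[engine_for(address, num_engines)].append(tx_id)
    return dict(queues)


plan = route([("t1", "alice"), ("t2", "bob"), ("t3", "alice")], num_engines=4)
print(plan)
```

A transaction that needs state from two scopes, such as a transfer between two addresses, does not fit this single-scope routing; that is the case the article's asynchronous functional relay addresses.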
Asynchronous parallelism

Because the parallelism is fully asynchronous, adding threads yields a linear improvement. This is unlike the earlier example, where the mining car's capacity was upgraded but the car ran half-empty because the mining equipment lagged behind. PREDA's parallel execution environment is fundamentally different from those of Monad and Solana, much as GPUs differ from multi-core CPUs: the processing efficiency of a shared core does not become a bottleneck for parallelism, and there is no data-dependency problem during I/O reads and writes. More importantly, the degree of parallelism in PREDA's model grows as threads are added, much like on a GPU. In blockchain terms, adding threads (VMs) lowers the hardware requirements for each full node, improving performance while preserving decentralization.


Finally, to reach the end state of parallel blockchains, the industry lacks not only architectural design but also programming languages that can express parallel semantics.
Semantic expression of parallel programming languages
Just as Nvidia's GPUs need CUDA, parallel blockchains need a new programming language: PREDA. Today's smart contract developers cannot express parallel semantics, so they cannot make effective use of the support provided by the underlying multi-chain architecture (data sharding, execution sharding, or both), and general smart contract transactions cannot be parallelized effectively. All of these systems rely on conventional smart contract languages such as Solidity, Move, and Rust. These languages lack the ability to express parallel semantics: unlike CUDA in high-performance computing, or the parallel programming models used for big data, they cannot describe the control flow and data flow between parallel units.
Without parallel programming models and languages suited to smart contracts, applications and algorithms cannot be restructured from serial to parallel. As a result, they cannot take advantage of an underlying blockchain that has parallel execution capabilities, and neither application execution efficiency nor the overall throughput of the blockchain system improves.
The distributed programming model proposed by PREDA divides the contract state into fine-grained parts through programmable contract scopes, and decomposes the transaction execution flow and distributes it across multiple parallel execution engines through functional relay semantics.
Programmable contract scopes also let the model define how the contract state is partitioned, so developers can optimize for the application's access patterns. With asynchronous functional relay, the transaction execution flow moves to whichever execution engine holds the state it needs next and continues there: the execution moves, not the data.
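The relay idea can be modeled with per-engine message queues. This is a hypothetical Python sketch, not PREDA syntax: the `Cluster`, `relay`, `transfer` and `credit` names are invented for illustration, and a sequential loop stands in for engines that would really run concurrently. A transfer debits the sender on the engine that owns the sender's balance, then relays a `credit` call to the engine that owns the receiver, so each step touches only local state.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List, Tuple


class Engine:
    """One execution engine; it alone owns the balances of addresses mapped to it."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.inbox: Deque[Tuple[str, tuple]] = deque()


class Cluster:
    """Hypothetical model of asynchronous relay across state-partitioned engines."""

    def __init__(self, num_engines: int) -> None:
        self.engines: List[Engine] = [Engine() for _ in range(num_engines)]
        self.handlers = {"transfer": self.transfer, "credit": self.credit}

    def owner(self, address: str) -> Engine:
        # Deterministic mapping from an address scope to the engine that owns it.
        return self.engines[zlib.crc32(address.encode()) % len(self.engines)]

    def relay(self, address: str, op: str, *args) -> None:
        """Queue `op` on the engine owning `address`: the call moves, the data stays."""
        self.owner(address).inbox.append((op, (address, *args)))

    def run(self) -> None:
        """Process relayed calls until every inbox is empty."""
        busy = True
        while busy:
            busy = False
            for engine in self.engines:
                while engine.inbox:
                    op, args = engine.inbox.popleft()
                    self.handlers[op](engine, *args)
                    busy = True

    def mint(self, address: str, amount: int) -> None:
        engine = self.owner(address)
        engine.balances[address] = engine.balances.get(address, 0) + amount

    def balance(self, address: str) -> int:
        return self.owner(address).balances.get(address, 0)

    # Contract functions: each reads and writes only its own engine's state.
    def transfer(self, engine: Engine, sender: str, receiver: str, amount: int) -> None:
        if engine.balances.get(sender, 0) < amount:
            return  # insufficient funds: debit nothing, relay nothing
        engine.balances[sender] -= amount
        self.relay(receiver, "credit", amount)

    def credit(self, engine: Engine, receiver: str, amount: int) -> None:
        engine.balances[receiver] = engine.balances.get(receiver, 0) + amount


cluster = Cluster(num_engines=4)
cluster.mint("alice", 100)
cluster.relay("alice", "transfer", "bob", 30)
cluster.relay("alice", "transfer", "carol", 200)  # exceeds alice's balance
cluster.run()
print(cluster.balance("alice"), cluster.balance("bob"), cluster.balance("carol"))  # 70 30 0
```

The design choice this illustrates is that no engine ever reads another engine's balances; cross-partition work is expressed as a follow-up call sent to where the data lives. The sketch deliberately ignores failure handling between the debit and the credit.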
This model partitions contract state across engines and distributes transaction traffic among them without developers having to care about the details of the underlying multi-chain system. Experimental results show that the PREDA model can achieve up to an 18x throughput increase on 256 execution engines, which is close to the theoretical parallel limit. Techniques such as partition counters and exchangeable (commutative) instructions improve parallelism further.
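Partition counters and commutative instructions can be illustrated together. In this hypothetical sketch (our reading of the technique, not PREDA's implementation), a global counter is split into one partial value per execution engine. Because addition commutes, each engine can apply its increments locally in any order without coordinating with the others; only reading the total requires combining the partials.

```python
import random
from typing import List


class PartitionedCounter:
    """A counter split into one partial value per execution engine."""

    def __init__(self, num_engines: int) -> None:
        self.partials: List[int] = [0] * num_engines

    def add(self, engine_id: int, amount: int) -> None:
        # Commutative update: touches only this engine's partial, no locking.
        self.partials[engine_id] += amount

    def value(self) -> int:
        # Only a read has to combine all partitions.
        return sum(self.partials)


ops = [(i % 4, i) for i in range(100)]  # (engine_id, amount), illustrative
a = PartitionedCounter(4)
for engine_id, amount in ops:
    a.add(engine_id, amount)

shuffled = ops[:]
random.Random(0).shuffle(shuffled)  # apply the same increments in another order
b = PartitionedCounter(4)
for engine_id, amount in shuffled:
    b.add(engine_id, amount)

print(a.value(), b.value())  # 4950 4950
```

Since the result does not depend on the order of increments, updates to a shared total such as a token's supply no longer force transactions on different engines to serialize.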
Conclusion
Blockchain systems traditionally use a single sequential execution engine (such as the EVM) to process all transactions, which limits scalability. Multi-chain systems run parallel execution engines, but each engine still processes all transactions of the smart contracts it hosts, so scalability cannot be achieved at the contract level. This article discussed core sharing as the essence of the deterministic parallelism represented by Solana, and why the optimistic parallelism represented by Monad, with its risk of frequent rollbacks, cannot run stably in real blockchain application scenarios. It also introduced PREDA's parallel execution engine. The PREDA team proposed a novel programming model that scales a single smart contract by partitioning its state and distributing transaction traffic across execution engines. It introduces programmable contract scopes to define how the contract state is partitioned, with each scope running on a dedicated execution engine. Asynchronous functional relay decomposes the transaction execution flow and moves it across execution engines when the required state resides elsewhere.

This decouples transaction logic from contract state partitions, enabling inherent parallelism without data-movement overhead. Its parallel model not only splits state at the smart contract level and decouples dependencies at the data publishing level, but also provides a multi-threaded execution engine cluster architecture similar to Move. More importantly, it introduces a new programming model, PREDA, which may be the last piece of the blockchain parallelism puzzle.