Original: https://a16zcrypto.com/why-blockchain-performance-is-hard-to-measure/
By Joseph Bonneau
Performance and scalability are much-discussed challenges in the crypto space, relevant to both layer 1 projects (standalone blockchains) and layer 2 solutions such as rollups and off-chain channels. However, we have no standardized metrics or benchmarks. Numbers are often reported inconsistently and incompletely, making accurate comparisons of projects difficult and often obscuring what is most important in practice.
We need a more granular and thorough approach to measuring and comparing performance—one that breaks down performance into components and compares them with trade-offs along multiple axes. In this post, I define basic terms, outline challenges, and provide guidelines and key principles to keep in mind when evaluating blockchain performance.
Scalability and Performance
First, let's define two terms, scalability and performance, which have standard computer science meanings and are often misused in the blockchain context. Performance measures what the system is currently capable of achieving. As we'll discuss below, performance metrics might include transactions per second or median transaction confirmation times. Scalability, on the other hand, measures a system's ability to increase performance by adding resources.
This distinction matters: many ways to improve performance do not improve scalability at all. A simple example is using a more efficient digital signature scheme, such as BLS signatures, which are roughly half the size of Schnorr or ECDSA signatures. If Bitcoin switched from ECDSA to BLS, the number of transactions per block might increase by 20-30%, improving performance overnight. But we can only do this once: there is no even more space-efficient signature scheme to switch to next (BLS signatures can also be aggregated to save more space, but that is another one-time trick).
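The arithmetic behind such a one-shot gain can be sketched in a few lines. All the constants below are illustrative (the block size, the per-transaction overhead, and the assumption of one signature per transaction are my own simplifications, not Bitcoin's actual parameters); the point is only that the gain is a fixed percentage, realized once.

```python
BLOCK_BYTES = 1_000_000   # hypothetical block size limit
TX_OVERHEAD = 60          # hypothetical non-signature bytes per transaction

def txs_per_block(sig_bytes):
    """How many transactions fit in one block for a given signature size."""
    return BLOCK_BYTES // (TX_OVERHEAD + sig_bytes)

ecdsa = txs_per_block(72)  # ECDSA signatures are ~70-72 bytes DER-encoded
bls = txs_per_block(48)    # BLS signatures are 48 bytes
print(f"ECDSA: {ecdsa} tx/block, BLS: {bls} tx/block, "
      f"one-time gain: {100 * (bls - ecdsa) / ecdsa:.0f}%")
```

With these made-up numbers the gain is in the 20-30% range the text mentions; the exact figure depends entirely on what fraction of a transaction is signature data.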
Many other one-shot tricks (such as Segregated Witness) are possible in blockchains, but you need a scalable architecture to achieve continuous performance improvement, where adding more resources improves performance over time. This is also conventional wisdom in many other computer systems, such as building web servers. With a few common tricks, you can build a very fast server; but ultimately, you need a multi-server architecture that keeps adding additional servers to meet growing demand.
Understanding this distinction also helps avoid a common category error found in statements like "Blockchain X is highly scalable, it can process Y transactions per second!" The second claim may be impressive, but it is a performance metric, not a scalability metric; it says nothing about the ability to increase performance by adding resources.
Scalability inherently requires exploiting parallelism. In the blockchain space, layer 1 scaling seems to require sharding or what looks like sharding. The basic concept of sharding — breaking state into chunks so that different validators can process it independently — fits nicely with the definition of scalability. Layer 2 has more options that allow for the addition of parallel processing — including off-chain channels, rollup servers, and sidechains.
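The performance-versus-scalability distinction can be made concrete with a toy model (entirely my own invented numbers): a fixed-capacity chain where every validator processes everything, versus an idealized sharded design that gains a shard's worth of throughput for each additional group of validators.

```python
def single_chain_tps(n_validators, cap=1000):
    """Every validator processes every transaction: capacity is fixed,
    so adding validators improves nothing. Good performance, no scalability."""
    return cap

def sharded_tps(n_validators, per_shard_tps=1000, validators_per_shard=100):
    """State is split into shards processed independently in parallel,
    so throughput grows with the validator count."""
    return (n_validators // validators_per_shard) * per_shard_tps

for n in (100, 1_000, 10_000):
    print(f"{n:>6} validators: single chain {single_chain_tps(n)} tps, "
          f"sharded {sharded_tps(n)} tps")
```

Real sharded systems scale far less cleanly than this (cross-shard transactions, committee overheads), but the shape of the comparison is the point: one curve is flat, the other grows with added resources.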
Latency vs. Throughput
Traditionally, blockchain system performance is evaluated along two dimensions, latency and throughput: latency measures how quickly an individual transaction can be confirmed, while throughput measures the aggregate rate of transactions over time. These axes apply to layer 1 and layer 2 systems, as well as to many other types of computer systems, such as database query engines and web servers.
Unfortunately, both latency and throughput are difficult to measure and compare. Also, individual users don't really care about throughput (it's a system-wide measure). What they really care about is latency and transaction fees — more specifically, their transactions getting confirmed as quickly as possible and as cheaply as possible. While many other computer systems are also evaluated on a cost/performance basis, transaction fees represent a new performance axis for blockchain systems that does not exist in traditional computer systems.
Challenges of Measuring Latency
Latency seems simple at first: how long does it take for a transaction to be confirmed? But there are always several different ways to answer this question.
First, we can measure the delay between different time points and get different results. For example, do we start measuring latency when the user hits the local "submit" button, or when the transaction hits the mempool? Do we stop the clock when a transaction is in a proposed block, or when a block is confirmed by one or six subsequent blocks?
The most common approach measures from a validator's perspective: from the time a client first broadcasts a transaction to the time it is reasonably "confirmed" (in the sense that a real-world merchant would consider the payment received and release merchandise). Of course, different merchants may apply different acceptance criteria, and even a single merchant may vary its criteria with the transaction amount.
The validator-centric approach ignores a few things that are important in practice. First, it ignores latency on the peer-to-peer network (how long does it take for a client to broadcast a transaction until a majority of nodes hear it?) and client-side latency (how long does it take for the transaction to be prepared on the client's local machine?). For simple transactions like signing an Ethereum payment, client-side latency can be very small and predictable, but for more complex cases like proving that shielded Zcash transactions are correct, it can be significant.
Even if we standardize which window of time we are measuring, latency is almost never a single number. No cryptocurrency system has ever offered fixed transaction latency. The basic rule of thumb to remember is:
Latency is a distribution, not a number.
The network research community has long understood this. Special emphasis is placed on the "long tail" of the distribution, since high latency in even 0.1% of transactions (or web server queries) can severely impact end users.
In a blockchain, confirmation delays can vary for a number of reasons:
Batching: Most systems batch transactions in some fashion, such as into blocks on most layer 1 systems. This leads to variable latency, since some transactions must wait until the batch fills up; others get lucky by joining the batch last and face no additional delay.
Variable congestion: Most systems suffer from congestion, meaning that (at least some of the time) more transactions are broadcast than the system can immediately handle. Congestion levels can vary because transactions are broadcast at unpredictable times (often abstracted as a Poisson process), because demand fluctuates over the course of a day or week, or in response to external events such as a popular NFT launch.
Consensus-layer variance: Confirming transactions at layer 1 typically requires a distributed set of nodes to reach consensus on a block, which can add variable latency independent of congestion. Proof-of-work systems find blocks at unpredictable times (also naturally abstracted as a Poisson process). Proof-of-stake systems can also add various delays (for example, if not enough nodes are online to form a committee in a round, or if a view change is needed in response to a crashed leader).
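Even the first two effects alone produce a skewed latency distribution. Here is a toy simulation (my own sketch, with invented parameters): transactions arrive as a Poisson process, blocks are found at exponentially distributed intervals as in proof of work, and every pending transaction is included in the next block (i.e., no congestion at all). Confirmation latency is then just the wait for the next block, yet it still has a long tail.

```python
import random
import statistics

random.seed(1)

BLOCK_INTERVAL = 12.0   # mean seconds between blocks (exponential, PoW-style)
ARRIVAL_RATE = 5.0      # mean transactions broadcast per second (Poisson)

def simulate(n_blocks=5000):
    """Return per-transaction confirmation latencies, assuming every
    pending transaction fits into the very next block."""
    latencies = []
    now = 0.0
    for _ in range(n_blocks):
        confirm = now + random.expovariate(1.0 / BLOCK_INTERVAL)
        t = now + random.expovariate(ARRIVAL_RATE)
        while t < confirm:               # transactions arriving before the block
            latencies.append(confirm - t)
            t += random.expovariate(ARRIVAL_RATE)
        now = confirm
    return latencies

lat = simulate()
qs = statistics.quantiles(lat, n=100)
print(f"median {qs[49]:.1f}s  p95 {qs[94]:.1f}s  p99 {qs[98]:.1f}s  max {max(lat):.1f}s")
```

Running this shows the p99 latency several times larger than the median, before adding congestion or consensus delays at all.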
For these reasons, a good guideline is:
Statements about latency should present a distribution (or histogram) of confirmation times, not a single number like mean or median.
While summary statistics such as means, medians, or percentiles give part of the picture, accurately assessing a system requires considering the entire distribution. In some applications, mean latency provides good insight if the latency distribution is relatively simple (e.g., Gaussian). But in cryptocurrencies it almost never is: typically, there is a long tail of slow confirmation times.
Payment channel networks such as the Lightning Network are a good example. As classic L2 scaling solutions, these networks provide very fast payment confirmations in most cases, but sometimes they require channel resets, which can add orders of magnitude to latency.
Even if we have good statistics on the exact latency distribution, it will likely vary over time as the system and demand on the system change. It is also not always clear how to compare latency distributions between competing systems. For example, consider one system that confirms transactions with latency uniformly distributed between 1 and 2 minutes (mean and median of 90 seconds), and a competing system that confirms 95% of transactions in exactly 1 minute and the other 5% in exactly 11 minutes (mean of 90 seconds, median of 60 seconds). Which system is better? The answer is probably that some applications prefer the former and some the latter.
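The two hypothetical systems in this example can be checked numerically; the sampling below is my own construction of the distributions described above.

```python
import random
import statistics

random.seed(0)

# System A: latency uniform between 60s and 120s
a = [random.uniform(60, 120) for _ in range(100_000)]
# System B: 95% of transactions at 60s, 5% at 660s (11 minutes)
b = [60.0 if random.random() < 0.95 else 660.0 for _ in range(100_000)]

print(f"A: mean {statistics.mean(a):.0f}s, median {statistics.median(a):.0f}s, "
      f"max {max(a):.0f}s")
print(f"B: mean {statistics.mean(b):.0f}s, median {statistics.median(b):.0f}s, "
      f"max {max(b):.0f}s")
```

Both means come out near 90 seconds, yet B's worst case is more than five times A's: identical summary statistics, very different user experiences.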
Finally, it is important to note that in most systems not all transactions have the same priority. Users can pay more to get higher inclusion priority, so in addition to all of the above, the latency depends on the transaction fee paid. In short:
Latency is complicated. The more data reported, the better. Ideally, complete latency distributions should be measured under varying congestion conditions. Breaking latency down into components (local latency, network latency, batching latency, consensus latency) is also helpful.
Challenges of Measuring Throughput
Throughput also seems simple at first glance: how many transactions per second can a system process? Two main difficulties arise: what exactly is a "transaction," and are we measuring what a system does today or what it might be capable of doing?
While "transactions per second" (tps) is the de facto measure of blockchain performance, transactions are problematic as a unit of measure. For any system offering general programmability ("smart contracts"), or even limited features like Bitcoin's multi-input/multi-output transactions or multisig verification options, the fundamental issue is:
Not all transactions are created equal.
This is clearly true in Ethereum, where transactions can include arbitrary code and arbitrarily modify state. The concept of gas in Ethereum is used to quantify (and charge fees for) the total work a transaction does, but gas is highly specific to the EVM execution environment. There is no straightforward way to compare the total work done by a set of EVM transactions to a set of Solana transactions executed in a BPF environment, and comparing either to a set of Bitcoin transactions is equally fraught.
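A small numeric sketch (the chains and all figures here are hypothetical, invented for illustration) shows why raw tps can invert the comparison you actually care about: weighting transactions by an execution-cost unit like gas can rank two systems in the opposite order from counting transactions.

```python
# Hypothetical chain X: many lightweight transfers; 21,000 gas is the
# familiar cost of a simple Ethereum payment, reused here for flavor.
chain_x = {"tps": 4000, "avg_gas_per_tx": 21_000}
# Hypothetical chain Y: fewer, heavier contract calls.
chain_y = {"tps": 300, "avg_gas_per_tx": 400_000}

for name, c in (("X", chain_x), ("Y", chain_y)):
    gas_per_sec = c["tps"] * c["avg_gas_per_tx"]
    print(f"chain {name}: {c['tps']:>4} tps = {gas_per_sec:>11,} gas/s")
```

With these numbers X "wins" on tps while Y executes more gas-weighted work per second; and since gas only makes sense inside the EVM, even the second comparison breaks down once a non-EVM chain enters the picture.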
Blockchains that separate the transaction layer into a consensus layer and an execution layer can make this cleaner. At the (pure) consensus layer, throughput can be measured in bytes added to the chain per unit of time. The execution layer is always more complex.
A simpler execution layer, such as a rollup server that supports only payment transactions, avoids the difficulty of quantifying computation. Even in this case, though, payments can vary in their number of inputs and outputs. Payment-channel transactions can vary in the number of "hops" required, which affects throughput. And rollup-server throughput can depend on the extent to which a batch of transactions can be "netted" down to a smaller set of summarized changes.
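The netting idea can be illustrated with a minimal sketch (my own construction, not any specific rollup's algorithm): collapse a batch of payments into net balance changes, so the size of the summarized update depends on how much the payments cancel out.

```python
from collections import defaultdict

def net_batch(payments):
    """Collapse (sender, receiver, amount) payments into net balance deltas.
    Accounts whose payments fully cancel need no on-chain update at all."""
    deltas = defaultdict(int)
    for sender, receiver, amount in payments:
        deltas[sender] -= amount
        deltas[receiver] += amount
    return {acct: d for acct, d in deltas.items() if d != 0}

batch = [("alice", "bob", 5), ("bob", "carol", 5), ("carol", "alice", 3)]
print(net_batch(batch))  # -> {'alice': -2, 'carol': 2}
```

Three payments net down to two balance updates here (bob's flows cancel entirely); a batch of circular payments could net to zero updates, while a batch with no overlap nets to nothing at all, which is exactly why measured "throughput" depends on the workload.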
Another challenge with throughput is moving beyond empirically measuring today's performance to assessing theoretical capacity, which introduces all sorts of modeling problems. First, we must settle on a realistic transaction workload for the execution layer. Second, real systems almost never achieve theoretical capacity, blockchain systems especially. For robustness, we want node implementations to be heterogeneous and diverse in practice (rather than all clients running a single software implementation), which makes accurate simulation of blockchain throughput even harder.
Overall:
Throughput claims require careful explanation of the transaction workload and the validator population (its size, implementations, and network connectivity). In the absence of any clear standard, historical workloads from a popular network like Ethereum can suffice.
The Latency-Throughput Tradeoff
Latency and throughput typically trade off against each other. As Lefteris Kokoris-Kogias has noted, this tradeoff is often not smooth, with latency increasing dramatically as system load approaches maximum throughput.
Zero-knowledge rollup systems provide a natural example of the throughput/latency tradeoff. Larger batches of transactions increase proving times and thus latency. But the on-chain footprint, in both proof size and verification cost, is amortized over more transactions with larger batches, increasing throughput.
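A toy cost model makes the tradeoff concrete (every constant below is invented; real proving times are not linear in batch size and real calldata costs vary, so this is a shape-of-the-curve sketch only):

```python
FIXED_VERIFY_GAS = 500_000     # on-chain proof verification cost, per batch
CALLDATA_GAS_PER_TX = 2_000    # per-transaction on-chain data cost
PROVE_SECONDS_PER_TX = 0.05    # assumed prover time per transaction

for batch_size in (10, 100, 1000):
    # the fixed verification cost is amortized across the batch...
    gas_per_tx = FIXED_VERIFY_GAS / batch_size + CALLDATA_GAS_PER_TX
    # ...but every transaction waits for the whole batch to be proven
    latency = batch_size * PROVE_SECONDS_PER_TX
    print(f"batch {batch_size:>4}: {gas_per_tx:>8.0f} gas/tx, "
          f"~{latency:.1f}s added proving latency")
```

Per-transaction cost falls toward the calldata floor as batches grow, while latency grows without bound: the operator must pick a point on this curve.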
Transaction Fees
End users, understandably, care more about the tradeoff between latency and fees than between latency and throughput. Users have no direct reason to care about throughput at all, only that their transactions are confirmed quickly with the lowest fees possible (some users weight fees more heavily, others latency). At a high level, fees are determined by several factors:
- How much market demand is there to make transactions?
- What is the total throughput achieved by the system?
- How much total revenue does the system pay to validators or miners?
- How much of this revenue is based on transaction fees versus inflation rewards?
The first two factors are essentially supply and demand curves that lead to a market-clearing price (though it has been claimed that miners can act as a cartel to raise fees above this point). All else being equal, more throughput should lead to lower fees, but there is much more going on.
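The supply-and-demand point can be sketched with a stylized auction (entirely my own toy model; real fee mechanisms like EIP-1559 work differently): with throughput as fixed supply and a list of fee bids as demand, the clearing fee is set by the marginal bid that still fits.

```python
def clearing_fee(bids, capacity):
    """bids: fee each pending transaction offers; capacity: txs per block.
    Returns the lowest bid that still gets included."""
    ranked = sorted(bids, reverse=True)
    if len(ranked) <= capacity:
        return 0                   # no congestion: everything gets in
    return ranked[capacity - 1]    # marginal included bid sets the price

bids = [50, 40, 30, 20, 10, 5]
print(clearing_fee(bids, capacity=3))  # -> 30
print(clearing_fee(bids, capacity=5))  # -> 10
```

Holding demand fixed, raising capacity from 3 to 5 drops the clearing fee from 30 to 10, which is the "more throughput should mean lower fees" intuition, all else being equal.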
The third and fourth points above are fundamental questions of blockchain system design, yet we lack good principles for either. We have some understanding of the pros and cons of giving miners income from inflation rewards versus transaction fees. However, despite many economic analyses of blockchain consensus protocols, we still have no widely accepted model for how much revenue needs to flow to validators. Today most systems are built on an educated guess about how much revenue is enough to keep validators behaving honestly without choking off practical use of the system. In simplified models, it can be shown that the cost of mounting a 51% attack scales with validator rewards.
Raising the cost of attacks is a good thing, but we also don't know how much security is "enough." Imagine you are deciding between two amusement parks, and one of them claims to spend 50% less on ride maintenance than the other. Is it a good idea to go to that park? Maybe it is more efficient and gets equivalent safety for less money. Perhaps the other park is spending more than needed to keep the rides safe, for no benefit. But it could also be that the first park is simply dangerous. Blockchain systems are analogous. Controlling for throughput, a blockchain might have lower fees because it rewards (and therefore attracts or incentivizes) less validation. We don't have good tools today to assess whether that is fine or whether it leaves the system vulnerable to attack. Overall:
Comparing fees between different systems can be misleading. Although transaction fees are important to users, they are affected by many factors besides the system design itself. Throughput is a better metric for analyzing the overall system.
Conclusion
Assessing performance fairly and accurately is difficult. The same goes for measuring a car's performance: just as with blockchains, different people care about different things. With cars, some users care about top speed or acceleration, others about fuel economy, and still others about towing capacity. None of these is trivial to evaluate. In the United States, for example, the Environmental Protection Agency maintains detailed guidelines both on how fuel economy is measured and on how it must be presented to users at dealerships.
The blockchain space is still far from this level of standardization. In some areas we may get there eventually, with standardized workloads for measuring a system's throughput or standardized charts for presenting latency distributions. For now, the best approach for evaluators and builders is to collect and publish as much data as possible, with the evaluation methodology described in enough detail that it can be reproduced and compared with other systems.