Original title: "Exploring circle STARKs" Author: Vitalik Buterin, co-founder of Ethereum Translator: Chris, Techub News
The premise for understanding this article is that you already understand the basic principles of SNARKs and STARKs. If you are not familiar with this, it is recommended that you read the first few parts of this article to understand the basics.
In recent years, the trend in STARKs protocol design has been to move towards using smaller fields. The earliest STARKs production implementations used 256-bit fields, that is, modular operations on large numbers (such as 21888...95617), which made these protocols compatible with elliptic curve-based signatures. However, this design is relatively inefficient. In general, there is no practical use for processing and calculating these large numbers, and a lot of computing power is wasted, such as processing X (representing a number) and processing four times X, processing X only takes one-ninth of the computing time. To address this, STARKs began to move to smaller fields: first Goldilocks, then Mersenne31 and BabyBear.
This shift has led to faster proofs, with Starkware being able to prove 620,000 Poseidon2 hashes per second on an M3 laptop, for example. This means that as long as we are willing to trust Poseidon2 as a hash function, the challenges of developing an efficient ZK-EVM can be solved. So how do these techniques work? How are these proofs built on smaller fields? How do these protocols compare to solutions like Binius? This post explores those details, focusing specifically on a scheme called Circle STARKs (implemented in Starkware’s stwo, Polygon’s plonky3, and my own version in Python) that has the unique property of being compatible with Mersenne31 fields.
Common Problems with Small Mathematical Fields
When creating hash-based proofs (or any kind of proof), a very important trick is to be able to indirectly verify properties of the polynomial by proving the evaluation of the polynomial at a random point. This approach can greatly simplify the proof process, because evaluation at a random point is often much easier than processing the entire polynomial.
For example, suppose a proof system requires you to generate a commitment to a polynomial, A, that A^3 (x) + x - A (\omega*x) = x^N (a very common type of proof in ZK-SNARK protocols). The protocol may require you to choose a random coordinate ? and prove that A (r) + r - A (\omega*r) = r^N. Then, working backwards from A (r) = c, you prove that Q = \frac {A - c}{X - r} is a polynomial.
If you know the details or internal mechanisms of the protocol in advance, you may find ways to bypass or crack these protocols. The specific operations or methods to achieve this may be mentioned next. For example, to satisfy the A (\omega * r) equation, you can set A (r) to 0 and then let A be a straight line passing through the two points.
To prevent these attacks, we need to choose r after the attacker has provided A (Fiat-Shamir is a technique used to enhance the security of protocols by making certain parameters a hash of some kind to avoid attacks. The parameters chosen need to be from a large enough set to ensure that the attacker cannot predict or guess them, thereby improving the security of the system.
In elliptic curve-based protocols and 2019-era STARKs, this problem is simple: all our math is done on 256-bit numbers, so we can just choose r to be a random 256-bit number and be done with it. However, in STARKs on smaller fields, we run into a problem: there are only about 2 billion possible values of r to choose from, so an attacker who wants to forge a proof only needs to try 2 billion times, which is a lot of work, but still completely doable for a determined attacker!
There are two solutions to this problem:
Performing multiple random checks is the simplest and most effective method: instead of checking at one coordinate, repeat the check at four random coordinates. This is theoretically possible, but there are efficiency issues. If you are dealing with a polynomial with a degree less than a certain value and operating on a field of a certain size, an attacker can actually construct malicious polynomials that look normal at these positions. Therefore, whether they can successfully break the protocol is a probabilistic problem, so it is necessary to increase the number of rounds of checks to enhance the overall security and ensure that it can effectively defend against attackers.
This leads to another solution: extended fields, which are similar to complex numbers, but are based on finite fields. We introduce a new value, denoted as α\alphaα, and declare that it satisfies certain relations, such as α2=some value\alpha^2 = \text {some value}α2=some value. In this way, we have created a new mathematical structure that enables more complex operations on finite fields. In this extended field, the calculation of multiplication becomes a multiplication with a new value α\alphaα. We can now operate on pairs of values in the extended field, not just single numbers. For example, if we are working on a field like Mersenne or BabyBear, such an extension allows us to have a wider choice of values, thus increasing security. To further increase the size of the field, we can repeatedly apply the same technique. Since we are already working with α\alphaα, we need to define a new value, which in Mersenne31 manifests itself as choosing α\alphaα such that α2=some value\alpha^2 = \text {some value}α2=some value.
Code (you can improve it with Karatsuba)
By extending the domain, we now have enough values to choose from to meet our security needs. If we are dealing with polynomials of degree less than ddd, we can provide 104 bits of security per round. This means we have enough security. If we need to reach a higher security level, such as 128 bits, we can add some additional computational work to the protocol to enhance security.
Extended fields are only actually used in the FRI (Fast Reed-Solomon Interactive) protocol and other scenarios that require random linear combinations. Most of the math is still done on the base fields, which are usually fields modulo ppp or qqq. At the same time, almost all of the data hashed is done on the base fields, so only four bytes are needed per value. This allows the efficiency of small fields to be exploited while allowing larger fields to be used when security enhancements are needed.
Regular FRI
When building a SNARK or STARK, the first step is usually arithmetization. This is the transformation of an arbitrary computational problem into an equation where some of the variables and coefficients are polynomials. Specifically, this equation will usually look like P (X,Y,Z)=0P (X, Y, Z) = 0P (X,Y,Z)=0, where P is a polynomial, X, Y, and Z are given parameters, and the solver needs to provide values for X and Y. Once you have such an equation, the solution to it corresponds to the solution to the underlying computational problem.
To prove that you have a solution, you need to show that the values you came up with are indeed sensible polynomials (not fractions, or look like one polynomial in some places but a different polynomial in others, etc.), and that these polynomials have a certain maximum degree. To do this, we use a trick of random linear combinations, applied step by step:
Suppose you have an evaluation of a polynomial A, and you want to prove that its degree is less than 2^{20}
Consider the polynomials B(x^2) = A(x) + A(-x), C(x^2) = \frac {A(x) - A(-x)}{x}
D is a random linear combination of B and C, i.e. D=B+rCD = B + rCD=B+rC, where r is a random coefficient.
Essentially, what happens is that B isolates the even coefficients of A, and ? isolates the odd coefficients. Given B and C, you can recover the original polynomial A: A (x) = B (x^2) + xC (x^2). If the degree of A is indeed less than 2^{20}, then the degrees of B and C will be the degree of A and the degree of A minus 1, respectively. This is because the combination of even and odd terms does not increase the degree. Since D is a random linear combination of B and C, the degree of D should also be the degree of A, which allows you to verify whether the degree of A meets the requirements through the degree of D.
First, FRI simplifies the verification process by reducing the problem of proving that a polynomial has degree d to the problem of proving that a polynomial has degree d/2. This process can be repeated many times, reducing the problem by half each time.
FRI works by repeating this simplification process. For example, if you start by proving that the polynomial is of degree d, through a series of steps you will eventually prove that the polynomial is of degree d/2. After each simplification, the degree of the resulting polynomial is gradually reduced. If the output of a stage is not the expected polynomial degree, then the check will fail for that round. If someone tries to push a polynomial that is not of degree d through this technique, then in the second round the output will have a certain probability of not being the expected degree, and in the third round it will be more likely to be not the expected degree, and the final check will fail. This design can effectively detect and reject non-conforming inputs. If a dataset is equal to a polynomial of degree d in most positions, it is possible that the dataset will pass FRI verification. However, in order to construct such a dataset, the attacker needs to know the actual polynomial, so even a slightly flawed proof shows that the prover was able to generate a "real" proof.
Let's look at what's happening here in more detail, and the properties required to make this all work. At each step, we reduce the degree of the polynomial by half, and we also reduce the set of points (the set of points we care about) by half. The former is the key to making the FRI (Fast Reed-Solomon Interactive) protocol work. The latter makes the algorithm run extremely fast: since each round is half the size of the previous round, the total computational cost is O (N) instead of O (N*log (N)).
To achieve this gradual reduction of the domain, a two-to-one mapping is used, that is, each point is mapped to one of two points. This mapping allows us to reduce the size of the dataset by half. An important advantage of this two-to-one mapping is that it is reproducible. That is, when we apply this mapping, the resulting set still retains the same properties. Suppose we start with a multiplicative subgroup. This subgroup is a set S where every element x has its multiple 2x in the set. If we square the set S (i.e. map every element x in the set to x^2), the new set S^2 will also retain the same properties. This operation allows us to continue to reduce the size of the data set until we are left with only one value. While in theory we could reduce the data set to just one value, in practice we often stop before reaching a smaller set. You can think of this operation as stretching a line (e.g., a line segment or arc) along the circumference of a circle so that it rotates around the circumference twice. For example, if a point is at x degrees on the circumference, after this operation it will move to 2x degrees. For every point on the circumference from 0 to 179 degrees, there is a corresponding point from 180 to 359 degrees, and these two points will coincide. This means that when you map a point from x degrees to 2x degrees, it will coincide with a point at x+180 degrees. This process is repeatable. That is, you can apply this mapping operation multiple times, each time moving the points on the circumference to their new positions.
For the mapping technique to be effective, the size of the original multiplicative subgroup needs to have a large power of 2 as a factor. BabyBear is a system with a modulus such that its largest multiplicative subgroup contains all non-zero values, so the subgroup size is 2k−1 (where k is the number of bits in the modulus). A subgroup of this size is well suited for the above technique because it allows the degree of a polynomial to be efficiently reduced by repeatedly applying the mapping operation. In BabyBear, you can choose a subgroup of size 2^k (or just use the entire set), then apply the FRI method to gradually reduce the degree of the polynomial to 15, and check the degree of the polynomial at the end. This method exploits the properties of the modulus and the size of the multiplicative subgroup to make the computation very efficient. Mersenne31 is another system where the modulus is a value such that the size of the multiplicative subgroup is 2^31 - 1. In this case, the size of the multiplicative subgroup has only one power of 2 as a factor, which means that it can only be divided by 2 once. The above techniques are no longer applicable to further processing, that is, it is not possible to use FFTs for efficient polynomial degree reduction as in BabyBear.
The Mersenne31 domain is well suited for arithmetic operations in existing 32-bit CPU/GPU operations. Because of the nature of the modulus (e.g. 2^{31} - 1), many operations can be performed using efficient bit manipulation: if the result of adding two numbers exceeds the modulus, the result can be reduced to a suitable range by shifting the modulus. Shifting is a very efficient operation. In multiplication operations, special CPU instructions (usually called high-bit shift instructions) can be used to process the result. These instructions can efficiently calculate the high-bit part of the multiplication, thereby improving the efficiency of the operation. In the Mersenne31 domain, arithmetic operations are about 1.3 times faster than in the BabyBear domain due to the above properties. Mersenne31 provides higher computational efficiency. If FRI can be implemented in the Mersenne31 domain, this will significantly improve computational efficiency, making the application of FRI more efficient.
Circle FRI
This is the cleverness of Circle STARKs. Given a prime number p, you can find a group of size p that has a similar two-to-one property. This group consists of all points that meet certain specific conditions, for example, the set of points where x^2 mod p is equal to a certain value.
These points follow an addition rule that may look familiar to you if you’ve done any trigonometry or complex number multiplication recently.
(x_1, y_1) + (x_2, y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1)
The double form is:
2 * (x, y) = (2x^2 - 1, 2xy)
Now, let's focus on the odd-numbered points on this circle:
This mapping actually reduces the size of the above set by half each time, and what's happening here is that each x in a sense represents two points: (x, y) and (x, -y). And (x → 2x^2 - 1) is the point doubling rule from above. So we're converting the x coordinates of two opposite points on the circle to the x coordinates of the doubled points.
For example, if we take the second value from the right on the circle, 2, and apply the mapping 2 → 2 (2^2) - 1 = 7, we get 7. If we go back to the original circle, (2, 11) is the third point from the right, so after doubling it, we get the sixth point from the right, which is (7, 13).
This could have been done in two dimensions, but operating in one dimension makes the process more efficient.
Circle FFTs
A closely related algorithm to FRI is the FFT, which converts n evaluations of a polynomial of degree less than n into n coefficients of the polynomial. The FFT process is similar to the FRI, except that instead of generating random linear combinations of f_0 and f_1 at each step, the FFT recursively performs half-sized FFTs on them and takes the output of the FFT (f_0) as the even coefficients and the output of the FFT (f_1) as the odd coefficients.
Circle groups also support FFTs, which are constructed in a similar manner to the FRI. However, a key difference is that the objects operated on by the Circle FFT (and Circle FRI) are not strictly polynomials. Instead, they're what's known in mathematics as a Riemann-Roch space: in this case, the polynomials are modulo the circle ( x^2 + y^2 - 1 = 0 ). That is, we treat any multiple of x^2 + y^2 - 1 as zero. Another way to think of it is that we only allow one power of y: wherever a y^2 term appears, we replace it with 1 - x^2 .
This also means that the coefficients output by the Circle FFT are not monomials like regular FRI (e.g., if the regular FRI output is [6, 2, 8, 3], then we know that this means P (x) = 3x^3 + 8x^2 + 2x + 6 ). In contrast, the coefficients of the Circle FFT are a basis specific to the Circle FFT: {1, y, x, xy, 2x^2 - 1, 2x^2y - y, 2x^3 - x, 2x^3y - xy, 8x^4 - 8x^2 + 1...}
The good news is that as a developer you can almost completely ignore this. STARKs never require you to know the coefficients. Instead, you just store the polynomial as a set of evaluations over a specific domain. The only place you need to use an FFT is to do a (similar) low-degree expansion of the Riemann-Roch space: given N values, generate k*N values that are all over the same polynomial. In this case, you can use the FFT to generate the coefficients, append (k-1)n zeros to those coefficients, and then use the inverse FFT to get a larger set of evaluations.
The Circle FFT is not the only special FFT type. Elliptic curve FFTs are more powerful because they work over any finite field (prime fields, binary fields, etc.). However, the ECFFT is more complicated and less efficient, so since we can use the Circle FFT over p = 2^{31}-1, we choose to use that.
From here we will get into some of the more arcane details of implementing Circle STARKs differently than regular STARKs.
Quotienting
A common operation in STARK protocols is to take a quotient of specific points, which can be chosen deliberately or randomly. For example, if you want to prove that P (x) = y, you can do it by following these steps:
Compute the quotient: Given a polynomial P (x) and a constant y, compute the quotient Q = {P - y}/{X - x}, where X is the chosen point.
Prove the polynomial: Prove that Q is a polynomial, not a fractional value. In this way, it is proved that P (x) = y is true.
In addition, in the DEEP-FRI protocol, the random selection of evaluation points is to reduce the number of Merkle branches, thereby improving the security and efficiency of the FRI protocol.
When dealing with the STARK protocol of circle groups, since there is no linear function that can pass through a single point, different techniques need to be used to replace the traditional quotient calculation method. This usually requires designing new algorithms using the unique geometric properties of circle groups.
In a circle group, you can construct a tangent function that is tangent to a point (P_x, P_y), but this function will have multiplicity 2 through that point, that is, for a polynomial to be a multiple of the linear function, it must satisfy a stricter condition than just being zero at that point. Therefore, you can't prove the evaluation result at just one point. So how do we deal with this?
We have to accept this challenge and prove it by evaluating at two points, thereby adding a dummy point that we don't need to care about.
Line function: ax + by + c. If you turn this into an equation and force it to equal 0, then you might recognize it as a line in what's called standard form in high school math.
If we have a polynomial P that equals y_1 at x_1 and y_2 at x_2, we can choose an interpolation function L that equals y_1 at x_1 and y_2 at x_2. This can be simply expressed as L = \frac {y_2 - y_1}{x_2 - x_1} \cdot (x - x_1) + y_1.
We then show that P is equal to y_1 at x_1 and y_2 at x_2 by subtracting L (so that P - L is zero at both points) and then showing that the quotient Q is a polynomial by dividing L (which is a linear function between x_2 - x_1).
Vanishing polynomials
In STARK, the polynomial equation you are trying to prove typically looks like C (P (x), P ({next}(x))) = Z (x) ⋅ H (x) , where Z (x) is a polynomial that is equal to zero over the entire original evaluation domain. In a regular STARK, this function is simply x^n - 1. In a circular STARK, the corresponding function is: Z_1 (x,y) = y Z_2 (x,y) = x Z_{n+1}(x,y) = (2 * Z_n (x,y)^2) - 1 Note that you can deduce the vanishing polynomial from the folding function: in a regular STARK, you repeatedly use x → x^2, while in a circular STARK, you repeatedly use x → 2x^2 - 1. However, you do things differently for the first round, because the first round is special.
Reverse bit order
In STARKs, polynomial evaluations are typically not arranged in a “natural” order (e.g. P(1),P(ω),P(ω2),…,P(ωn−1), but in what I call reverse bit order:
P(\omega^{\frac {3n}{8}})
If we set n=16, and only care about which powers of ω we evaluate at, then the list looks like this:
{0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}
This ordering has a key property that values that were grouped together early in the FRI evaluation process appear to be next to each other in the ordering. For example, the first step of FRI groups x and -x. In the case of n = 16, ω^8 = -1, which means that P (ω^i) ) and ( P (-ω^i) = P (ω^{i+8}) \. As we can see, these are exactly the pairs that are next to each other. The second step of FRI groups P (ω^i) , P (ω^{i+4}) , P (ω^{i+8}) , and P (ω^{i+12}) . These are exactly the groups of four that we saw, and so on. Doing this makes FRI more space-efficient, because it allows you to provide a Merkle proof for the values that are folded together (or for all 2^k values if you fold them k rounds at a time).
In circle STARKs, the folding structure is slightly different: in the first step, we pair (x, y) with (x, -y); in the second step, we pair x with -x; in the subsequent step, we pair p with q, choosing p and q such that Q^i (p) = -Q^i (q), where Q^i is the mapping x → 2x^2-1 repeated i times. If we think of the points in terms of their position on a circle, then at each step this looks like the first point is paired with the last point, the second point is paired with the second-to-last point, and so on.
To adjust the reverse bit order to reflect this folded structure, we need to flip every bit except the last. We keep the last bit and use it to decide whether to flip the other bits.
A folded reverse order of size 16 is as follows:
{0, 15, 8, 7, 4, 11, 12, 3, 2, 13, 10, 5, 6, 9, 14, 1}
If you look at the circle in the previous section, you will find that the points 0, 15, 8, and 7 (starting from the right and going counterclockwise) are (x, y), (x, -y), (-x, -y), and (-x, -y). (-x, y) , which is exactly what we need.
Efficiency
In Circle STARKs (and 31-bit prime STARKs in general), these protocols are very efficient. A computation proven in Circle STARK typically involves several types of computation:
1. Raw arithmetic: used for business logic, such as counting.
2. Raw arithmetic: used for cryptography, such as hash functions like Poseidon.
3. Lookup parameters: a general efficient way to perform various computations by reading values from a table.
The key measure of efficiency is whether you are fully utilizing the space in the computation trace to do useful work, or whether you are leaving a lot of free space. In SNARKs with large fields, there tends to be a lot of free space: business logic and lookup tables often involve computations on small numbers (which tend to be smaller than N, where N is the total number of computation steps, so in practice smaller than 2^{25}), but you still pay the cost of using fields of size 2^{256} bits. Here, the field size is 2^{31}, so the wasted space isn't that large. Low-complexity hashes designed for SNARKs (e.g. Poseidon) fully utilize every bit of every number in the trace in any field.
Binius is a better solution than Circle STARKs because it allows you to mix fields of different sizes, allowing for more efficient bit packing. Binius also provides the option of doing 32-bit additions without the overhead of lookup tables. However, these advantages come at the (in my opinion) cost of more conceptual complexity, whereas Circle STARKs (and regular STARKs based on BabyBear) are much simpler in concept.
Conclusion: My thoughts on Circle STARKs
Circle STARKs are no more complex for developers than STARKs. In implementation, the three issues mentioned above are essentially the differences from regular FRI. Although the math behind the polynomials that Circle FRI operates on is quite complex, and it takes some time to understand and appreciate that math, that complexity is actually hidden from developers.
Understanding Circle FRI and Circle FFTs can also be a gateway to understanding other special FFTs: most notably binary domain FFTs, such as those used by Binius and the previous LibSTARK, but also more complex constructions such as elliptic curve FFTs, which use a few-to-many mapping that combines well with elliptic curve point operations.
Combined with Mersenne31, BabyBear, and binary domain technologies like Binius, we do feel like we are approaching the efficiency limits of the STARKs base layer. At this point, I expect the following optimization directions for STARKs to be focused on:
Maximize efficiency of hash functions and signatures: Optimize basic cryptographic primitives such as hash functions and digital signatures to the most efficient form so that they can achieve the best performance when used in STARK proofs. This means making dedicated optimizations for these primitives to reduce the amount of computation and improve efficiency.
Perform recursive construction to enable more parallelization: Recursive construction refers to recursively applying the STARK proof process to different levels to increase parallel processing capabilities. In this way, computing resources can be used more efficiently and the proof process can be accelerated.
Arithmetically process the virtual machine to make it more efficient and simpler for developers to create and run computing tasks. This includes optimizing the virtual machine for use in STARK proofs and improving the developer experience when building and testing smart contracts or other computing tasks.