### Overview

In this series of posts, we will look at zero-knowledge proofs (ZKPs): a family of probabilistic protocols, first described by Goldwasser, Micali and Rackoff in 1985, which has garnered increased popularity with the rise of distributed ledger technology (DLT). We will start by introducing some of the theory behind this groundbreaking work and showing how to implement the different components of its intricate machinery. Eventually, this will result in an end-to-end zero-knowledge proof for a toy problem, following the Pinocchio protocol and implemented in Python. The main goal is to provide a gentle introduction to the topic, breaking down the terse mathematics involved in ZKPs and discussing some of the intuition behind them. Along the way, we will also touch on some of the exciting applications enabled by this technology.

So, what is a ZKP and why should you care?

A ZKP allows a prover, let’s call her Peggy, to demonstrate beyond any reasonable doubt to a verifier, let’s call him Victor, that she knows some secret without revealing what the secret is. For example, Peggy might want to prove to Victor that she knows the factorization of a very large non-prime number without revealing the factors; or that she knows the solution to a given Sudoku puzzle without revealing it. More generally, ZKPs can be used as the building blocks for verifiable computation: a method of offloading computation from a weak client to a more computationally powerful worker, which enables the client to cryptographically verify the correctness of the computation that was carried out by the worker with a much smaller computational effort compared to executing the original program. This is an extremely powerful paradigm, which affects online privacy, scalability of DLT and mobile phone applications, as well as the security of cloud computing, among many other applications.

Another very promising application of ZKPs lies in the field of machine learning. In this context, the technology can be used to prove possession of specific data without the need for full disclosure, as well as to enable 3rd parties to verify that specific data has been used in the process of model training or prediction. OpenMined is a pioneering project that is actively working on multiple cryptographic primitives to create an ecosystem for private and secure machine learning.

But what is a ZKP, really?

To oversimplify: represented on a computer, a ZKP is nothing more than a sequence of numbers, carefully computed by Peggy, together with a bunch of boolean checks that Victor can run in order to verify the proof of correctness for the computation. A zero-knowledge protocol is thus the mechanism used for deriving these numbers and defining the verification checks.

### Non-interactiveness and succinctness

There is a growing body of research in the field of zero-knowledge proofs, and an increasing number of protocols have been suggested since the original publication of Goldwasser, Micali and Rackoff.

One of the main differentiators between these protocols is interactiveness. In an interactive zero-knowledge protocol, the prover and the verifier exchange multiple messages with each other, until the verifier is convinced that the prover knows some secret. Each communication round can be seen as a challenge presented by Victor, for which Peggy must come up with the correct answer; the answer is then verified by Victor and accepted only if it passes all the boolean checks specified by the protocol. The probability of Peggy not knowing the secret and still being able to successfully complete $N$ rounds decreases exponentially with $N$. Thus, the back-and-forth communication can continue for arbitrarily many rounds, until Victor is convinced beyond any reasonable doubt that Peggy knows the secret, despite the fact that he doesn't learn anything about it.
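To get a feel for this exponential soundness amplification, here is a toy simulation (our own illustration, not a real protocol): we assume a cheating prover who does not know the secret can pass any single challenge round with probability 1/2, as is the case in many classic interactive protocols.

```python
import random

def cheating_prover_passes(num_rounds, per_round_prob=0.5, seed=0):
    """Simulate Victor's challenge rounds against a prover who, lacking
    the secret, answers each challenge correctly only with some fixed
    probability. Returns True if the cheater survives every round."""
    rng = random.Random(seed)
    return all(rng.random() < per_round_prob for _ in range(num_rounds))

# The probability of fooling Victor in all N rounds is (1/2)**N:
for n in (1, 10, 20, 40):
    print(f"N = {n:2d}: P(successful cheat) = {0.5 ** n:.3e}")
```

Already at $N = 40$ the cheating probability drops below one in a trillion, which is why Victor can simply keep issuing challenges until he is satisfied.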

In contrast, in non-interactive zero-knowledge protocols there is no repeated communication between the prover and the verifier. Instead, there is only a single "round", which can be carried out asynchronously. Using publicly available data, the contents of which will be discussed in subsequent sections and posts, Peggy generates a proof, which she publishes in a place accessible to Victor (e.g. on a distributed ledger). Following this, Victor can verify the proof at any point in time to complete the "round". Note that even though Peggy produces only a single proof, as opposed to multiple ones in the interactive version, the verifier can still be certain that, except with negligible probability, she does indeed know the secret she is claiming. Consequently, one advantage of the non-interactive approach is that it is verifier-independent. Peggy's proof is universal for the problem instance at hand (e.g. a given Sudoku puzzle) and any verifier, not only Victor, can later verify Peggy's claim that she has a solution.

Another dimension along which zero-knowledge protocols differ is succinctness, which deals with the size of the proof generated by Peggy. Succinctness plays an important role for ZKPs published on distributed ledgers, for two reasons: storage space and verification time. Storage can be extremely expensive, e.g. at current prices for Bitcoin, Ethereum, NEO and EOS it costs $\$3.83$, $\$2.86$, $\$19.38$ and $\$3.00$ to store 1KB of data on each of those ledgers, respectively, and at peak levels the cost can be much higher! The effects of variable pricing and expensive storage can be offset by using distributed ledgers which offer cheap and constant pricing, such as Factom ($\$0.001$ per 1KB). Nonetheless, it is still desirable to keep the size of the proof as small as possible. In addition, proofs need to be verifiable quickly, otherwise the gain of offloading a computation to a 3rd party diminishes. Therefore, for on-chain applications it's crucial that the generated proofs are as short as possible, ideally with a size independent of the problem instance, and that verification times are constant.

A natural question to ask is: are there cases in which interactive proofs are preferable? It turns out that many non-interactive protocols suffer from an annoying problem, commonly referred to as "trusted setup", which does not exist for their interactive counterparts. The trusted setup is the process which generates part of the public data used by Peggy when computing her proof. We will discuss the setup in greater detail later, but simply speaking it relies on a party or a group of parties publishing some numbers out in the public, with these numbers derived from a source of randomness, commonly referred to as "toxic waste". Anyone having access to this source of randomness can also produce fake proofs that will be accepted by a verifier following the protocol. This is why it is crucial that the "toxic waste" is destroyed by the parties generating the public numbers, which in turn means those parties need to be trusted. This makes the applicability of some non-interactive zero-knowledge protocols tricky in practice.

In many cases, however, we don't benefit greatly from non-interactivity or succinctness. Non-interactivity is only useful if we want to allow multiple independent verifiers to verify a given proof without each one having to individually query the prover. Succinctness is necessary only if the medium used for storing the proofs is very expensive and/or if we need very short verification times. However, in general storage is cheap, with distributed ledgers being the standout exception, and verification time can also be flexible unless there are hard constraints, such as the ones enforced by inter-block time on public blockchains. This means that for B2B applications, where two entities are communicating over a secure channel such as HTTPS, we can live with interactive proofs. The companies can put on the masks of Peggy or Victor depending on the use case, with the verifier being responsible for carrying out the trusted setup. Since the interaction is 1-on-1 by nature, the verifier has no interest in faking proofs; moreover, it's not necessary to store the proofs on-chain, and the verifier can afford to spend more time carrying out the verification. Such applications can therefore work well in an interactive setting.

Following this small detour, we go back to the main subject of interest for this series: proofs which are non-interactive and succinct. In particular, we are interested in zero-knowledge Succinct Non-interactive ARguments of Knowledge, a.k.a. zk-SNARKs. zk-SNARKs are the most widely used zero-knowledge protocols, with the anonymous cryptocurrency Zcash and the smart-contract platform Ethereum among the notable early adopters.
### QAP: from computation to polynomials

Before we can more rigorously describe zk-SNARK protocols, we have to cover some background. One complication with zero-knowledge protocols in general is that in order for them to be applicable the original problem has to be completely transformed. Consider the scenario in which Peggy wants to prove to Victor that she knows a solution to the equation

$$x^2 - 4 = 0$$

without revealing what this solution is. In this case, the solutions are trivial and in reality one doesn't need a zero-knowledge proof for such problems, as anyone would be able to derive the answer. Nonetheless, for the sake of simplicity, this is the example that we will be looking at during this series, keeping in mind that instead of a second-degree polynomial this could just as easily be a polynomial of degree 1 million.

So, how do we go about encoding this problem? A detailed description of the process is available in Vitalik Buterin's excellent post on Quadratic Arithmetic Programs (QAPs), which we highly recommend as additional reading. Here we provide a shorter summary of the transformation.

We start by representing the problem in code, e.g. in Python we can define the following simple function encoding the computation:

```python
def f(x):
    y = x ** 2 - 4
```

Remember that the goal is for Peggy to evaluate the above function at some secret value $x$ known only to her, with the evaluation leading to $y$ being set to $0$. Clearly, we have at least two problems:

- we need to find a way to allow Victor to get a "peek" at the value of $y$ at the end of the function execution, without him also learning the value of $x$
- since Peggy is the one running the program, in the current formulation of the code Victor has no control over what code she runs exactly, e.g. she could cheat by just running the code $y = 0$, which would satisfy the condition

Both of these problems appear very difficult to solve. Unsurprisingly, the solution involves a highly non-trivial reformulation of the above problem, namely its conversion to a QAP. This is a three-step process: code flattening, conversion to a rank-1 constraint system (R1CS) and finally formulation of the QAP.

The goal of code flattening is to convert the original code, which may contain arbitrarily complex statements and expressions, into a sequence of statements that are of two forms:

- $x = y$ (where $y$ can be a variable or a number)
- $x = y \; (op) \; z$ (where $op \in (+, -, \ast, /)$ and $y$ and $z$ can be variables, numbers or themselves sub-expressions)

You can think of these statements as gates of an arithmetic circuit, one gate per statement: for our example program, a multiplication gate computing $x \ast x$ and a subtraction gate computing $out\_1 - 4$. The corresponding flattened code is:

```python
def f(x):
    out_1 = x * x
    y = out_1 - 4
```

The next step is to convert the above to a R1CS. A R1CS is a list of triplets of vectors $(\vec{a_i}, \vec{b_i}, \vec{c_i})$ and its solution is a vector $\vec{s}$, such that:

$$\langle \vec{a_i}, \vec{s} \rangle \ast \langle \vec{b_i}, \vec{s} \rangle - \langle \vec{c_i}, \vec{s} \rangle = 0 \quad \textrm{for} \quad \forall i$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product of two vectors. The triplets can be interpreted as defining constraints, which need to be satisfied by finding $\vec{s}$ such that the equations above hold.

There is a natural way to transform the flattened version of our code to a R1CS. First we define a vector which will hold the state of our program, i.e. all variables which are used in the program. Each component of this vector will be one variable and the solution $\vec{s}$ will be an assignment to each variable. The variable vector for our program is:

$$\vec{s} = \begin{pmatrix} 1 \\ x \\ out\_1 \\ y \end{pmatrix}$$

Note that the inclusion of $1$ as the first component is done in order to encode constants, as we will see shortly.
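To make the state vector concrete, here is a small helper of our own (the name `witness` is our choice, borrowed from standard zk-SNARK terminology) that runs the flattened program for a given $x$ and assembles the full assignment $\vec{s} = (1, x, out\_1, y)$:

```python
def witness(x):
    """Run the flattened program and collect the assignment to every
    program variable, i.e. the solution vector s = (1, x, out_1, y)."""
    out_1 = x * x      # gate 1: out_1 = x * x
    y = out_1 - 4      # gate 2: y = out_1 - 4
    return [1, x, out_1, y]

print(witness(2))   # x = 2 solves x**2 - 4 = 0, so y ends up 0
```

For a satisfying secret such as $x = 2$ the last component is $0$; for any other input it is not, which is exactly the condition the constraints below will capture.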
Having defined this vector, we continue by processing each line of the flattened code sequentially and defining suitable $\vec{a_i}$, $\vec{b_i}$ and $\vec{c_i}$ to encode the computation contained in the current line. For example, to encode the first line in the function body we set:

$$\vec{a_1} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad \vec{b_1} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad \vec{c_1} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

To encode the second line we use:

$$\vec{a_2} = \begin{pmatrix} -4 \\ 0 \\ 1 \\ 0 \end{pmatrix} \quad \vec{b_2} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \vec{c_2} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

Let's check the expansion of the encodings for the two lines, while also noting how, in order to represent the constant $-4$ in the second line, we made use of the $1$ in the first component of $\vec{s}$ when defining $\vec{a_2}$:

Line 1:

$$\langle \vec{a_1}, \vec{s} \rangle \ast \langle \vec{b_1}, \vec{s} \rangle - \langle \vec{c_1}, \vec{s} \rangle = \sum_{i=1}^{n} a_{1,i} s_i \ast \sum_{i=1}^{n} b_{1,i} s_i - \sum_{i=1}^{n} c_{1,i} s_i = x \ast x - out\_1 = 0$$

Line 2:

$$\langle \vec{a_2}, \vec{s} \rangle \ast \langle \vec{b_2}, \vec{s} \rangle - \langle \vec{c_2}, \vec{s} \rangle = \sum_{i=1}^{n} a_{2,i} s_i \ast \sum_{i=1}^{n} b_{2,i} s_i - \sum_{i=1}^{n} c_{2,i} s_i = out\_1 - 4 - y = 0$$

With the above, we have successfully transformed our Python program from computer code to a set of constraints! The logic of the code has been "transferred" to the triplets of constraint vectors $(\vec{a_i}, \vec{b_i}, \vec{c_i})$. Running the program is equivalent to finding an assignment of $\vec{s}$ which satisfies the R1CS equations $\langle \vec{a_i}, \vec{s} \rangle \ast \langle \vec{b_i}, \vec{s} \rangle - \langle \vec{c_i}, \vec{s} \rangle = 0$ for all $i$.

To complete the transformation to a QAP, we need to switch from vectors to polynomials. We start by defining polynomials $A_i(x)$, $B_i(x)$ and $C_i(x)$ for $i \in \{1, \ldots, N\}$, where $N$ is the number of elements in any of the constraint vectors (in our case 4). We construct these polynomials by requiring that $A_i(n) = a_{n,i}$ (and similarly for $B_i(n)$ and $C_i(n)$) and doing a Lagrange interpolation based on these points. For our example we require that:

$$A_1(1) = 0 \quad A_1(2) = -4 \quad A_2(1) = 1 \quad A_2(2) = 0$$

and similarly for the remaining $A_i$, as well as for the $B_i$ and $C_i$. Doing a Lagrange interpolation, we arrive at the polynomials:

$$A_1(x) = -4x + 4 \quad A_2(x) = -x + 2 \quad A_3(x) = x - 1 \quad A_4(x) = 0$$

$$B_1(x) = x - 1 \quad B_2(x) = -x + 2 \quad B_3(x) = B_4(x) = 0$$

$$C_1(x) = C_2(x) = 0 \quad C_3(x) = -x + 2 \quad C_4(x) = x - 1$$

With these definitions we can now express the R1CS from above as one equation:

$$A(x) \ast B(x) - C(x) = H(x) \ast Z(x)$$

where:

$$\vec{A}(x) = \begin{pmatrix} A_1(x) \\ A_2(x) \\ A_3(x) \\ A_4(x) \end{pmatrix} \quad \vec{B}(x) = \begin{pmatrix} B_1(x) \\ B_2(x) \\ B_3(x) \\ B_4(x) \end{pmatrix} \quad \vec{C}(x) = \begin{pmatrix} C_1(x) \\ C_2(x) \\ C_3(x) \\ C_4(x) \end{pmatrix}$$

$$A(x) = \langle \vec{A}(x), \vec{s} \rangle \quad B(x) = \langle \vec{B}(x), \vec{s} \rangle \quad C(x) = \langle \vec{C}(x), \vec{s} \rangle \quad Z(x) = (x - 1)(x - 2)$$

and $H(x)$ needs to be such that $A(x) \ast B(x) - C(x) = H(x) \ast Z(x)$ holds for $\forall x$.

Still here? Great! At this point, you might be wondering:

- why on Earth did we just do all of this
- what is $Z(x)$
- what is $H(x)$ and who should come up with it

To answer these questions, we first note that, by definition:

$$\vec{A}(1) = \vec{a_1} \quad \vec{A}(2) = \vec{a_2}$$

and similarly for $\vec{B}(x)$ and $\vec{C}(x)$. Together with the definitions of $A(x)$, $B(x)$ and $C(x)$, this implies that the original R1CS system is equivalent to:

$$A(x) \ast B(x) - C(x) = 0 \quad \textrm{for} \quad x \in \{1,2\}$$

The only way for this to hold is if the polynomial $A(x) \ast B(x) - C(x)$ is divisible without remainder by $(x-1)(x-2)$, i.e.:

$$\exists H(x): A(x) \ast B(x) - C(x) = H(x)(x-1)(x-2)$$

This clarifies what $H(x)$ and $Z(x)$ are; but who computes $H(x)$? It must be the prover, because by computing $H(x)$ as $(A(x) \ast B(x) - C(x)) / Z(x)$ they show that they know a satisfying assignment $\vec{s}$ to the underlying R1CS. Thus, informally, we can think of a QAP as a 4-tuple $(\vec{A}(x), \vec{B}(x), \vec{C}(x), Z(x))$ and of $\vec{s}$ as the "solution" to the QAP.

We still haven't answered the most important question: why do we need all of this? Remember that by expressing our problem as a Python program we ran into a couple of big issues:

- there is no way to verify that Peggy runs the correct program
- it is not clear how to allow Victor to observe the value of $y$ at the end of the execution, without revealing any other information

With the computation expressed as a R1CS, if Victor sees a valid assignment $\vec{s}$, he can be certain that Peggy ran the program corresponding to the R1CS! This already solves one of our main problems.
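The whole chain can be sanity-checked numerically. The sketch below (our own, using numpy's `polyfit`/`polydiv` helpers over ordinary rationals; a real protocol would work over a finite field) re-derives the polynomials above from the constraint vectors and confirms that $A(x) \ast B(x) - C(x)$ is divisible by $Z(x)$ for a satisfying assignment:

```python
import numpy as np

# Constraint vectors from the two flattened statements.
a = np.array([[0, 1, 0, 0], [-4, 0, 1, 0]])
b = np.array([[0, 1, 0, 0], [1, 0, 0, 0]])
c = np.array([[0, 0, 1, 0], [0, 0, 0, 1]])

# A satisfying assignment s = (1, x, out_1, y) for the secret x = 2.
s = np.array([1, 2, 4, 0])

# R1CS check: <a_i, s> * <b_i, s> - <c_i, s> == 0 for every constraint i.
assert all((a[i] @ s) * (b[i] @ s) - c[i] @ s == 0 for i in range(2))

# Lagrange interpolation: column i holds the values that A_i (resp. B_i,
# C_i) must take at the points x = 1 and x = 2.
xs = [1, 2]
A = [np.polyfit(xs, a[:, i], 1) for i in range(4)]
B = [np.polyfit(xs, b[:, i], 1) for i in range(4)]
C = [np.polyfit(xs, c[:, i], 1) for i in range(4)]

# A(x) = <A(x), s> etc., as numpy polynomial coefficient arrays.
Ax = sum(si * Ai for si, Ai in zip(s, A))
Bx = sum(si * Bi for si, Bi in zip(s, B))
Cx = sum(si * Ci for si, Ci in zip(s, C))

# A(x) * B(x) - C(x) must be divisible by Z(x) = (x - 1)(x - 2).
P = np.polysub(np.polymul(Ax, Bx), Cx)
Z = np.poly([1, 2])                 # coefficients of (x - 1)(x - 2)
H, remainder = np.polydiv(P, Z)
assert np.allclose(remainder, 0)    # divides exactly: s solves the QAP
```

Replacing `s` with an assignment that does not come from an actual run of the program makes the final remainder non-zero, which is precisely how the QAP "detects" cheating.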
The reason we transform the R1CS further to obtain a QAP is that mathematically it's much more fruitful to reason about polynomials than about R1CSs. In the next section, we leverage our freshly minted problem reformulation to describe one technique central to the construction of zk-SNARKs, which exploits the structure of polynomials.

### Evaluation of polynomials at a hidden point

One nifty trick that we can do with polynomials is to evaluate them at a point that is not known to us, a.k.a. blind evaluation of polynomials. Wait, what?! Surely, this must be impossible. Given a polynomial:

$$P(x) = a_0 + a_1 x + a_2 x ^ 2 + \cdots + a_n x ^ n$$

you must know $x$ if you want to evaluate the polynomial. Indeed that is the case, but it turns out that if we have access to some additional information, we can evaluate a polynomial without knowing the point of evaluation! Imagine for a second that we have access to some function $f(x)$, which has the following properties:

1. one-way: given $x$ it is easy to compute $y = f(x)$, but given $y$ it is hard to find $x$ s.t. $y = f(x)$
2. linearity: $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$
3. $x \ne y \implies f(x) \ne f(y)$

Let's see how such a function can help us do blind evaluation. To make things easier to visualize, assume that Victor wants to ask Peggy to do a blind evaluation of $P$, which is known to both of them, in such a way that he can verify that Peggy evaluated the polynomial at the point in question. Assume also that $f$ exists and is known to both Peggy and Victor.

First, Victor chooses a point $x_0$ at which he wants the polynomial to be evaluated. He then feeds it to $f$ to get a new point, which we denote $z$, and sends $z$ to Peggy. Note that because of the 1st property of $f$, there is no way for Peggy to find $x_0$ from $z$. Furthermore, because of the 3rd property, she cannot come up with $x_0' \ne x_0$ s.t. $f(x_0') = z$.
Together, these two facts imply that once Peggy receives $z$ from Victor, there is no way for her to recover the original $x_0$.

Consider the case when $P(x) = a_1 x$. Given $z = f(x_0)$, Peggy can trivially compute $P(z) = a_1 z$. Note that because of the linearity of $f$:

$$P(z) = P(f(x_0)) = a_1 f(x_0) = f(a_1 x_0) = f(P(x_0))$$

This is exactly what we need: by computing $P(z)$, Peggy is effectively computing $f(P(x_0))$, without knowing anything about $x_0$! Victor can now look at $P(z)$ computed by Peggy and compare it with $f(P(x_0))$ computed locally, since he knows $f$, $P$ and $x_0$. If the two results match, he can be certain that she evaluated $P$ at $x_0$, since evaluation at any $x_0' \ne x_0$ would have produced a different result.

In order to generalize the above approach to any polynomial, the only thing that needs to change is for Victor to send more information to Peggy. In particular, he sends her:

$$f(1), f(x_0), f(x_0^2), \cdots, f(x_0^d)$$

where $d$ is the degree of the polynomial that needs to be evaluated. With this information, Peggy can now compute:

$$f(P(x_0)) = f(a_0 + a_1 x_0 + \cdots + a_d x_0^d) = f(a_0) + f(a_1 x_0) + \cdots + f(a_d x_0^d) = a_0 f(1) + a_1 f(x_0) + \cdots + a_d f(x_0^d)$$
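In practice, a function with these properties is obtained from exponentiation in a finite group (or scalar multiplication on an elliptic curve), where the linearity shows up multiplicatively in the exponent: $f(\alpha x + \beta y) = f(x)^\alpha \ast f(y)^\beta$. Below is a toy sketch of the full exchange with $f(x) = g^x \bmod p$; the parameters, the example polynomial and the variable names are all our own illustrative choices, and the numbers are far too small to be secure.

```python
# Toy blind evaluation with f(x) = g^x mod p. Tiny, INSECURE parameters,
# purely for illustration; real protocols use elliptic-curve groups.
p = 1000003        # a prime; by Fermat, exponents can be reduced mod p - 1
g = 5              # assumed group element; any g coprime to p works here

def f(x):
    return pow(g, x, p)

# The polynomial P(x) = 3 + 2x + x^2, known to both Peggy and Victor.
coeffs = [3, 2, 1]  # a_0, a_1, a_2

# --- Victor: pick a secret x_0 and publish f(1), f(x_0), f(x_0^2) ---
x0 = 123456
hidden = [f(pow(x0, k, p - 1)) for k in range(len(coeffs))]

# --- Peggy: compute f(P(x_0)) from the hidden powers alone ---
# Additive linearity becomes multiplicative:
#   f(sum a_k * x_0^k) = prod f(x_0^k) ** a_k  (mod p)
proof = 1
for a_k, h in zip(coeffs, hidden):
    proof = proof * pow(h, a_k, p) % p

# --- Victor: evaluate P(x_0) locally and check the proof matches ---
expected = sum(a_k * pow(x0, k, p - 1) for k, a_k in enumerate(coeffs)) % (p - 1)
assert proof == f(expected)
```

Note that for this $f$ the 3rd property only holds once exponents are reduced modulo the group order $p - 1$, a detail that real instantiations of blind evaluation take care of.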