Merge pull request #210 from 0xPolygon/km/move-prover
Move prover docs to CDK
EmpieichO authored Feb 8, 2024
2 parents 2bae063 + f1b24cd commit 4549fb6
Showing 22 changed files with 378 additions and 512 deletions.
docs/cdk/architecture/type-1-prover/intro-t1-prover.md (30 additions, 0 deletions)
The Polygon Type 1 Prover is a ZK-EVM proving component used to generate proofs for your ZK-EVM chain. It was developed in collaboration with the Toposware team.

## Get started

If you want to get up and running quickly, follow the [Type 1 Prover deployment guide](../../how-to/deploy-t1-prover.md).

!!! warning
    Throughout this section, we refer to ZK-EVM chains in a general sense. This should not be confused with Polygon's zkEVM product, which is a specific example of a ZK-EVM.

## Type definitions

The emergence of various ZK-EVMs ignited the debate over how 'equivalent' a given ZK-EVM is to the Ethereum virtual machine (EVM).

Vitalik Buterin has since introduced some calibration to EVM-equivalence in his article, "[The different types of ZK-EVMs](https://vitalik.eth.limo/general/2022/08/04/zkevm.html)". He distinguishes five types of ZK-EVMs, a classification that boils down to the inevitable trade-off between Ethereum-equivalence and the efficiency of the zero-knowledge proving scheme involved. For brevity, we refer to this proving scheme as the zk-prover or, simply, the prover.

The types, as outlined by Vitalik, are as follows:

- **Type 1** ZK-EVMs strive for full Ethereum-equivalence. They change nothing in the Ethereum stack except adding a zk-prover, and can therefore verify Ethereum and environments that are exactly like Ethereum.
- **Type 2** ZK-EVMs aim for full EVM-equivalence rather than Ethereum-equivalence. They make some minor changes to the Ethereum stack, but none at the application layer. As a result, they are fully compatible with almost all Ethereum apps and offer the same UX as Ethereum.
- **Type 2.5** ZK-EVMs aim for EVM-equivalence but change gas costs. They achieve faster proof generation but introduce a few incompatibilities.
- **Type 3** ZK-EVMs seek EVM-equivalence but make a few minor changes at the application layer. They achieve faster proof generation at the cost of compatibility with a small number of Ethereum apps.
- **Type 4** ZK-EVMs are high-level-language-equivalent ZK-EVMs. They take smart contract code written in Solidity, Vyper, or another high-level language, compile it for a specialized virtual machine, and prove execution there. Type 4 ZK-EVMs attain the fastest proof generation times.

The figure below gives a visual summary of the types, contrasting compatibility with performance.

![Figure: ZK-EVM types](../../../img/cdk/zkevm-types-vitalik.png)

Ultimately, choosing which type of ZK-EVM to develop involves a trade-off between EVM-equivalence and performance.

The challenge this poses for developers who favor exact Ethereum-equivalence is to devise ingenious designs and clever techniques for faster zk-provers. Vitalik mentions one mitigation strategy to improve proof generation times: cleverly engineered, massively parallelized provers.
The Polygon Type 1 Prover is designed for efficient STARK proving and verification of Ethereum transactions. It achieves efficiency by restricting the Algebraic Intermediate Representation (AIR) to constraints of degree 3.

The execution trace needed to generate a STARK proof can be viewed as a large matrix, where the columns are registers and each row is a snapshot of the registers at a given point in time.

From the initial register values on the first row to the final values on the last, the validity of each internal state transition is enforced through a set of dedicated constraints. Unfortunately, generating the execution trace for a given transaction introduces considerable overhead for the prover.
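
As a toy illustration of this trace-and-constrain idea, here is a minimal sketch (not the prover's actual AIR; all names are invented for the example): a two-register trace whose row transitions must satisfy a low-degree constraint.

```rust
/// One row of a toy execution trace: a snapshot of two "registers".
#[derive(Clone, Copy)]
struct TraceRow {
    acc: u64,
    step: u64,
}

/// Toy transition constraint (degree <= 3 in the register values):
/// the next accumulator must equal acc + step^2, and step increments by 1.
fn transition_ok(curr: TraceRow, next: TraceRow) -> bool {
    next.acc == curr.acc + curr.step * curr.step && next.step == curr.step + 1
}

fn main() {
    // Build the trace by repeatedly applying the state-transition rule.
    let mut trace = vec![TraceRow { acc: 0, step: 0 }];
    for _ in 0..7 {
        let last = *trace.last().unwrap();
        trace.push(TraceRow {
            acc: last.acc + last.step * last.step,
            step: last.step + 1,
        });
    }

    // Validity means every pair of consecutive rows satisfies the constraint.
    let valid = trace.windows(2).all(|w| transition_ok(w[0], w[1]));
    println!("rows = {}, valid = {}", trace.len(), valid);
}
```

In the real prover a row holds many field-element columns and the constraints are enforced as polynomial identities over the whole trace rather than checked row by row, but the shape of the argument is the same.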

A naïve design strategy would be to use a single table dedicated to the entire EVM execution. Such a table would have thousands of columns, and although it would be a highly sparse matrix, the prover would treat it as fully dense.

## Modular design strategy

Since most of the operations involved in the EVM can be independently executed, the execution trace is split into separate STARK modules, where each is responsible for ensuring integrity of its own computations.

These STARK modules are:

- **Arithmetic module** handles binary operations including ordinary addition, multiplication, subtraction and division, comparison operations such as 'less than' and 'greater than', as well as ternary operations like modular operations.
- **Keccak module** is responsible for computing a Keccak permutation.
- **KeccakSponge module** is dedicated to the sponge construction's 'absorbing' and 'squeezing' functions.
- **Logic module** specializes in performing bitwise logic operations such as AND, OR, or XOR.
In addition to the constraints of each module, this design requires an additional argument to ensure that the values shared between modules are consistent.

For this reason, this design utilizes _Cross-table lookups_ (CTLs), based on a [logUp argument](https://eprint.iacr.org/2022/1530.pdf) designed by Ulrich Haböck, to cheaply add copy-constraints in the overall system.
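
As a rough sketch of the idea (our notation, not the paper's exact formulation): if the values $a_1, \dots, a_n$ used in one table must all appear in a lookup table $t_1, \dots, t_m$, the prover commits to multiplicities $m_j$ and the following identity is checked at a random challenge $\alpha$:

$$
\sum_{i=1}^{n} \frac{1}{\alpha - a_i} = \sum_{j=1}^{m} \frac{m_j}{\alpha - t_j}
$$

Because both sides are plain sums, they can be accumulated with a running-sum column, which is what makes copy-constraints between STARK tables cheap compared to product-based lookup arguments.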

The Polygon Type 1 Prover uses a central component dubbed the **CPU** to orchestrate the entire flow of data that occurs among the STARK modules during execution of EVM transactions. The CPU dispatches instructions and inputs to specific STARK modules, as well as fetches their corresponding outputs.

Note here that “dispatching” and “fetching” mean that the initial values and the final values resulting from a given operation are copied, via the CTLs, to and from the targeted STARK module.

## Prover primitives


We now look at the cryptographic primitives used to engineer the Polygon Type 1 Prover, which is a custom-built prover capable of tracing, proving, and verifying the execution of the EVM through all state changes.

The proving and verification process is made possible by zero-knowledge (ZK) technology, in particular a combination of STARK[^1] and SNARK[^2] schemes, used for proving and verification respectively.

### STARK for proving

The Polygon Type 1 Prover implements a STARK proving scheme, a robust cryptographic technique with fast proving time.

Such a scheme has a proving component, called the STARK prover, and a verifying component called the STARK verifier. A proof produced by the STARK prover is referred to as a STARK proof.

The process begins with constructing a detailed record of all the operations performed when transactions are executed. The record, called the `execution trace`, is then passed to a STARK prover, which in turn generates a STARK proof attesting to correct computation of transactions.

Although STARK proofs are relatively big, they are compressed through a series of recursive SNARK proofs, each more compact than the last. In this way the final transaction proof becomes significantly more succinct than the initial one, which greatly accelerates verification.

Ultimately, this SNARK proof can stand alone or be combined with preceding blocks of proofs, resulting in a single validity proof that validates the entire blockchain back from genesis.
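
The overall flow can be pictured with the conceptual sketch below. The types and function names (`prove_stark`, `wrap_in_snark`, `aggregate_pair`) are hypothetical placeholders for illustration and are not the prover's actual API; only the shape of the pipeline (large STARK proofs wrapped into compact SNARK proofs that are then folded pairwise) reflects the text above.

```rust
// Conceptual sketch only: the types and function names below are
// hypothetical placeholders, not the prover's actual interfaces.

struct StarkProof { bytes: Vec<u8> }
struct SnarkProof { bytes: Vec<u8> }

/// Prove one batch of transactions with a STARK prover (placeholder).
fn prove_stark(_txs: &[&str]) -> StarkProof {
    StarkProof { bytes: vec![0u8; 100_000] } // STARK proofs are relatively large
}

/// Wrap a STARK proof in a SNARK proof that attests to its validity (placeholder).
fn wrap_in_snark(stark: &StarkProof) -> SnarkProof {
    SnarkProof { bytes: vec![0u8; stark.bytes.len() / 20] } // much more compact
}

/// Recursively merge two SNARK proofs into one (placeholder).
fn aggregate_pair(a: SnarkProof, _b: SnarkProof) -> SnarkProof {
    SnarkProof { bytes: a.bytes } // size stays roughly constant
}

fn main() {
    let batches = vec![vec!["tx1", "tx2"], vec!["tx3"], vec!["tx4", "tx5"]];

    // One STARK proof per batch, each wrapped into a compact SNARK proof.
    let mut proofs: Vec<SnarkProof> =
        batches.iter().map(|b| wrap_in_snark(&prove_stark(b))).collect();

    // Fold proofs pairwise until a single validity proof remains.
    while proofs.len() > 1 {
        let a = proofs.remove(0);
        let b = proofs.remove(0);
        proofs.push(aggregate_pair(a, b));
    }
    println!("final validity proof: {} bytes", proofs[0].bytes.len());
}
```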

### Plonky2 SNARK for verification

The Polygon Type 1 Prover implements a SNARK called [Plonky2](https://github.com/0xPolygonZero/plonky2), which is designed for fast recursive proof composition. Although its arithmetization is based on [TurboPLONK](https://docs.zkproof.org/pages/standards/accepted-workshop3/proposal-turbo_plonk.pdf), it replaces the polynomial commitment scheme of [PLONK](https://eprint.iacr.org/2019/953) with a scheme based on [FRI](https://drops.dagstuhl.de/storage/00lipics/lipics-vol107-icalp2018/LIPIcs.ICALP.2018.14/LIPIcs.ICALP.2018.14.pdf). This allows encoding the witness in 64-bit words, represented as field elements of a low-characteristic field.

The field used, denoted by $\mathbb{F}_p$, is called Goldilocks. It is a prime field where the prime $p$ is of the form $p = 2^{64} - 2^{32} + 1$.
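
A quick sanity check of this choice, as a minimal sketch (the naive `u128` reduction below is for illustration only; real implementations use a specialised reduction that exploits the shape of $p$):

```rust
/// The Goldilocks prime: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Naive multiplication in the Goldilocks field, reducing the 128-bit
/// product with a plain modulo (for illustration only).
fn mul_mod_p(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (P as u128)) as u64
}

fn main() {
    // p really is 2^64 - 2^32 + 1.
    assert_eq!(P as u128, (1u128 << 64) - (1u128 << 32) + 1);
    // The identity that makes reduction cheap: 2^64 ≡ 2^32 - 1 (mod p).
    assert_eq!((1u128 << 64) % (P as u128), (1u128 << 32) - 1);
    // The product of two 64-bit words fits in a u128, so the naive path works.
    println!("{}", mul_mod_p(u64::MAX, 2));
}
```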

Since SNARKs are succinct, a Plonky2 proof is published as the validity proof that attests to the integrity of a number of aggregated STARK proofs. This results in reduced verification costs.

This innovative approach holds the promise of a succinct, verifiable chain state, marking a significant milestone in the quest for blockchain verifiability, scalability, and integrity. It is the very innovation that plays a central role in the Polygon Type 1 Prover.

!!! info "Further reading"

    - The STARK modules, which are also referred to as **STARK tables**, are documented in the GitHub repo [here](https://github.com/0xPolygonZero/plonky2/tree/main/evm/spec/tables).
    - We have documented [the CPU component](t1-cpu-component.md), while the CPU logic documentation can be found in the [repo](https://github.com/0xPolygonZero/plonky2/blob/main/evm/spec/cpulogic.tex).
    - To complete the STARK framework, read more about the [cross-table lookups (CTLs) and the CTL protocol](t1-ctl-protocol.md) and [range-checks](t1-rangechecks.md).
    - Details on **Merkle Patricia tries** and how they are used in the Polygon Type 1 Prover can be found [here](https://github.com/0xPolygonZero/plonky2/blob/main/evm/spec/mpts.tex). Included are outlines of the prover's internal memory, data encoding and hashing, and the prover input format.

[^1]: STARK is short for Scalable Transparent Argument of Knowledge.
[^2]: SNARK is short for Succinct Non-interactive Argument of Knowledge.
The CPU is the central component of the Polygon Type 1 Prover. Like any central processing unit, it reads instructions, executes them, and modifies the state (registers and the memory) accordingly.

Other, more complex instructions, such as Keccak hashing, are delegated to specialized STARK tables.

This section briefly presents the CPU and its columns; further details on the CPU logic can be found [here](https://github.com/0xPolygonZero/plonky2/blob/main/evm/spec/cpulogic.tex).


## CPU flow

CPU execution can be decomposed into two distinct phases: CPU cycles and padding.

The first phase makes up the bulk of the execution, since padding comes in only at the end.

### CPU cycles

In each row, the CPU reads code at a given program counter (PC) address, executes it, and writes outputs to memory. The code could be kernel code or any context-based code.
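
As a toy illustration of this fetch-execute-write loop (a minimal sketch with an invented three-opcode instruction set, not the actual CPU table, whose state lives in field-element columns subject to constraints):

```rust
/// A toy CPU loop: read the opcode at the program counter, execute it,
/// and update registers/memory. Purely illustrative; the real CPU's
/// state transitions are expressed as constraints over trace columns.
fn main() {
    // "Code": 0 = halt, 1 = increment accumulator, 2 = store accumulator.
    let code: [u8; 6] = [1, 1, 2, 1, 2, 0];
    let mut memory = vec![0u64; 8];
    let (mut pc, mut acc, mut clock) = (0usize, 0u64, 0u64);

    loop {
        let opcode = code[pc]; // read code at the current PC
        match opcode {
            0 => break,                           // halt
            1 => acc += 1,                        // simple arithmetic
            2 => memory[pc % memory.len()] = acc, // write an output to memory
            _ => unreachable!(),
        }
        pc += 1; // advance the program counter
        clock += 1; // one CPU row per cycle
    }
    println!("halted at pc={pc}, acc={acc}, after {clock} cycles, mem={memory:?}");
}
```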

Subsequent contexts are created when executing user code.

Syscalls, which are specific instructions written in the kernel, may be executed in a non-zero user context. They don't change the context but the code context, which is where the instructions are read from.

### Padding

At the end of any execution, the length of the CPU trace is padded to the next power of two.

Execution halts when the program counter reaches the special halting label in the kernel; padding then follows.

Special constraints ensure that every row after execution halts is a padded row and that execution does not automatically resume; that is, execution cannot resume without further instructions.
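
A minimal sketch of the padding step (illustrative only): the trace length is rounded up to the next power of two so that the columns can be interpolated over a suitably sized subgroup.

```rust
/// Pad a trace column to the next power of two by repeating a padding row.
/// Illustrative only; the real prover pads whole rows under dedicated
/// constraints that prevent execution from resuming.
fn pad_to_power_of_two<T: Clone>(mut column: Vec<T>, padding: T) -> Vec<T> {
    let target = column.len().next_power_of_two();
    column.resize(target, padding);
    column
}

fn main() {
    let column = vec![1u64, 2, 3, 4, 5]; // 5 rows of execution
    let padded = pad_to_power_of_two(column, 0);
    assert_eq!(padded.len(), 8); // padded up to the next power of two
    println!("{padded:?}");
}
```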

## CPU columns


We now look at the CPU columns as they relate to the operations being executed, and at how some of the constraints are checked.

These are the register columns, operation flags, memory columns, and general columns.

### Registers

- $\texttt{context}$: Indicates the current context at any given time. So, $\texttt{context}\ 0$ is for the kernel, while any context specified with a positive integer indicates a user context. A user context is incremented by $1$ at every call.
- $\texttt{code_context}$: Indicates the context in which the executed code resides.
- $\texttt{clock}$: Monotonic counter which starts at 0 and is incremented by 1 at each row. It is used to enforce correct ordering of memory accesses.
- $\texttt{opcode_bits}$: These are 8 boolean columns, indicating the bit decomposition of the opcode being read at the current PC (see the illustrative sketch after this list).
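
For orientation, the registers listed above can be pictured as one row of the CPU trace, as in the illustrative struct below. This is only a sketch: the actual implementation stores each register as a column of field elements, and the CPU table has more columns than shown here.

```rust
/// Illustrative view of one CPU trace row's registers; the real CPU
/// table stores these as field-element columns and has more of them.
struct CpuRegisters {
    /// Current context: 0 for the kernel, positive for user contexts.
    context: u64,
    /// Context in which the executed code resides.
    code_context: u64,
    /// Monotonic counter, incremented by 1 at each row.
    clock: u64,
    /// Bit decomposition of the opcode read at the current PC (bit 0 = LSB).
    opcode_bits: [bool; 8],
}

fn main() {
    // A kernel-mode row executing opcode 0x15 (ISZERO): 0b0001_0101.
    let row = CpuRegisters {
        context: 0,
        code_context: 0,
        clock: 42,
        opcode_bits: [true, false, true, false, true, false, false, false],
    };
    let opcode: u8 = row
        .opcode_bits
        .iter()
        .enumerate()
        .map(|(i, &b)| (b as u8) << i)
        .sum();
    println!("context={}, opcode=0x{opcode:02x}", row.context);
}
```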

### Operation flags

Operation flags are boolean flags indicating whether an operation is executed or not.

Some operations share a single flag and are distinguished by the opcode bits. For example, `EQ` (0x14) and `ISZERO` (0x15) share the combined flag $\texttt{eq_iszero}$; the low opcode bit then distinguishes them, as in the filter below:

$$
\texttt{eq_iszero * opcode_bits[0]}
$$


### Memory columns

The CPU interacts with the EVM memory via its memory channels.

A full memory channel is composed of several columns (see the illustrative sketch below).

The last memory channel is a partial channel. It doesn't have its own $\texttt{value}$ columns but shares them with the first full memory channel. This allows saving eight columns.
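
The sketch below gives a hypothetical picture of what a full memory channel might carry for one CPU row. The field names and layout are assumptions for illustration only, not the prover's actual column set; the eight value limbs echo the eight shared value columns mentioned above.

```rust
/// Hypothetical sketch of a full memory channel's contents for one CPU
/// row; names and layout are assumptions, not the actual column set.
struct MemoryChannel {
    used: bool,            // whether the channel is active this row
    is_read: bool,         // read vs. write
    context: u64,          // memory address: context ...
    segment: u64,          // ... segment ...
    virtual_addr: u64,     // ... and virtual offset
    value_limbs: [u64; 8], // a 256-bit word split into eight 32-bit limbs
}

fn main() {
    // A write of the value 1 to address (context 0, segment 0, offset 4).
    let channel = MemoryChannel {
        used: true,
        is_read: false,
        context: 0,
        segment: 0,
        virtual_addr: 4,
        value_limbs: [1, 0, 0, 0, 0, 0, 0, 0],
    };
    println!(
        "used={}, is_read={}, addr=({}, {}, {}), limb0={}",
        channel.used,
        channel.is_read,
        channel.context,
        channel.segment,
        channel.virtual_addr,
        channel.value_limbs[0]
    );
}
```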

### General columns

There are eight shared general columns, used differently depending on the instruction.

The `popping-only` instruction uses the $\text{Stack}$ columns to check if the Stack is empty after the instruction, while the `pushing-only` instruction uses the $\text{Stack}$ columns to check if the Stack is empty before the instruction.

$\texttt{stack_len_bounds_aux}$ is used to check that the Stack doesn't overflow in user mode. The last four columns are used to prevent conflicts with other general columns.
See the $\text{Stack Handling}$ subsection of this [document](https://github.com/0xPolygonZero/plonky2/blob/main/evm/spec/cpulogic.tex) for more details.