0xeigenlabs / eigen-zkvm

A Rust zkVM with a Modular Proof System

License: Apache License 2.0

Rust 78.19% Solidity 0.53% Shell 0.26% TypeScript 0.23% JavaScript 0.39% Circom 20.40% Makefile 0.01%
plonk stark zkp zkvm circom snarkjs rust

eigen-zkvm's Introduction

eigen-zkvm

eigen-zkvm is a zkVM built on a layered proof system. It allows developers to write zero-knowledge applications and prove them with the layered proof system, achieving no trusted setup, constant on-chain proof size, and low gas cost, and finally generating a Solidity verifier.

  • zkit: a universal command line tool for STARK, PLONK, and Groth16.

  • Circom 2.x support;

  • Proof composition: proof aggregation and recursion on STARK;

  • Proof recursion with SNARK on STARK;

  • Solidity verifier generation;

  • GPU acceleration for proving (not open-sourced);

  • WASM-friendly single proving and verifying; Node.js/JavaScript prover and verifier (plonkjs);

  • Eigen zkVM: based on the RISC-V ISA.

How the layered proof system works

[Figure: mixed-proof-system diagram]

Tutorial

  • Generate universal setup key
zkit setup -p 13 -s setup_2^13.key

For powers from 20 to 26, you can download the setup keys directly from the universal-setup hub.

  • Single proof

test_plonk_verifier.sh

test_plonk_verifier.sh poseidon

  • Snark aggregation proof

test_aggregation.sh

  • Stark aggregation proof

stark_aggregation.sh yes BN128

stark_aggregation.sh yes BLS12381

  • Stark proof and recursive stark proving

starky

  • Layered proof

starkjs

Applications

eigen-zkvm's People

Contributors

bc-a, captainlee1024, cyl19970726, eigmax, hanerkoca, ibmp33, jangoccc, leonardoalt, lvella, omahs, sdzkp, stormltf, succinctpaul, thabokani, vuittont60, wangeguo, weber-h2o2, xiaolou86


eigen-zkvm's Issues

Support ark-g16 as final proof

Now the input for the final snark proof is Circom circuits, and the output should be the proof and a Solidity verifier.

We can run a benchmark based on https://github.com/arkworks-rs/circom-compat/blob/master/tests/groth16.rs#L12 to figure out how fast this can be for us.

The plonk proving time is about 10 minutes on a 16-core server, hence our expectation that switching to ark-g16, as a more efficient approach, brings the time cost below 3 minutes.

If the test result goes as expected, we will discuss how to import ark-groth16. Currently, we don't depend much on the arkworks libraries, and I prefer to build a simple service in eigen-prover as one step of the proving pipeline.

`zkit compile` optimization

When compiling the bottom-layer stark_verifier.circom, the cost is much higher than with circom.

../target/debug/zkit compile -p goldilocks -i circuits/circuit.circom -l node_modules/pil-stark/circuits.gl --O2=full -o /tmp/circ2
template instances: 34

non-linear constraints: 514852
linear constraints: 270120
public inputs: 1
public outputs: 0
private inputs: 7406
private outputs: 0
wires: 1188497
labels: 1715330
Written successfully: /tmp/circ2/circuit.r1cs
Written successfully: /tmp/circ2/circuit.sym
Written successfully: /tmp/circ2/circuit_js/circuit.wasm
time cost: 14752.481025962

Lower the log level

  1. wasmer outputs too many INFO-level logs;
  2. change all debug-level logs to trace.

Panic on a server with more than 40 cores

run test_single.sh

error:

thread '<unnamed>' panicked at 'received 4 elements to spawn 16 threads', /root/.cargo/git/checkouts/bellman-7a75f3b44e91e034/416f79d/src/multicore.rs:122:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
(the same panic is printed by several threads at once, so the original output is interleaved)

thread 'main' panicked at 'must run: Any { .. }', /root/.cargo/git/checkouts/bellman-7a75f3b44e91e034/416f79d/src/multicore.rs:85:12

Keep output log verbosity small

Currently some logging uses println! directly and some uses log::info, which makes stdout messy. This issue aims to:

  1. change all println! calls to log::debug;
  2. change all log::info calls in starky to log::debug;
  3. for necessary logging, like timing, add a feature that can be used to enable the output (see the sketch below).
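
A minimal sketch of point 3, assuming a hypothetical cargo feature named `timing`; the macro name and Cargo.toml snippet are illustrative, not existing code in this repo.

// Cargo.toml (assumed):
// [features]
// timing = []

#[macro_export]
macro_rules! timed {
    ($label:expr, $body:expr) => {{
        #[cfg(feature = "timing")]
        let start = std::time::Instant::now();
        let result = $body;
        #[cfg(feature = "timing")]
        log::debug!("{} took {:?}", $label, start.elapsed());
        result
    }};
}

fn main() {
    // With `--features timing` the elapsed time is logged at debug level;
    // without it the timing code compiles away entirely.
    let sum: u64 = timed!("sum 0..1e6", (0..1_000_000u64).sum());
    println!("{sum}");
}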

Optimize memory usage

StarkContext takes a huge amount of memory because we define its elements over FieldExtension, but this is not a must.

  • Use the base field as much as we can: taking ctx.cm1_n for instance, cm1_n can use Goldilocks elements. Hence we need to figure out the proper element type for each member of StarkContext.
  • Drop the const and cm PolsArray after use.
  • Use mmap instead of vec![] to allocate memory: use memmap2 and set up MmapOptions::map_anon() to write arbitrarily sized data (see the sketch below).
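
A minimal sketch of the mmap point, assuming memmap2 as the backend; the buffer size is illustrative and the layout is not the actual StarkContext layout.

use memmap2::MmapOptions;

fn main() -> std::io::Result<()> {
    // Anonymous mapping instead of vec![0u8; len]: the kernel allocates pages
    // lazily on first touch, so a large logical buffer that is only partially
    // written does not cost physical memory up front.
    let len = 1usize << 30; // 1 GiB of logical space
    let mut buf = MmapOptions::new().len(len).map_anon()?;
    buf[0] = 1;           // MmapMut dereferences to &mut [u8]
    buf[len - 1] = 2;
    println!("first = {}, last = {}", buf[0], buf[len - 1]);
    Ok(())
}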

Abstract MerkleHash

Currently, load and save are exactly the same across MerkleHashGL, MerkleHashBN128, and MerkleHashBLS12381; we can abstract MerkleHash by adding a LinearHash trait (see the sketch below).
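
A minimal sketch of the proposed abstraction; the trait shape and method names are hypothetical, not the existing starky API.

use std::io::{Read, Result, Write};

// The hash-specific part each backend (GL, BN128, BLS12381) would implement.
pub trait LinearHash {
    type Digest;
    fn hash(&self, input: &[u64]) -> Self::Digest;
}

// Generic Merkle tree whose load/save logic is written once for every hash.
pub struct MerkleTree<H: LinearHash> {
    pub hasher: H,
    pub nodes: Vec<u8>, // serialized node data, same layout for all backends
}

impl<H: LinearHash> MerkleTree<H> {
    pub fn save<W: Write>(&self, mut w: W) -> Result<()> {
        w.write_all(&(self.nodes.len() as u64).to_le_bytes())?;
        w.write_all(&self.nodes)
    }

    pub fn load<R: Read>(hasher: H, mut r: R) -> Result<Self> {
        let mut len = [0u8; 8];
        r.read_exact(&mut len)?;
        let mut nodes = vec![0u8; u64::from_le_bytes(len) as usize];
        r.read_exact(&mut nodes)?;
        Ok(Self { hasher, nodes })
    }
}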

Unify field implementation

  • [ ] Currently, bn128 and bls12381 are implemented via ff_ce in starky, instead of using pairing_ce::{bn254, bls381} as we do in plonky. Hence we should reuse pairing_ce.

Pass serde object as input to revm

The powdr runtime provides get_data to receive data as circuit inputs. We need to pass the whole transaction/block and the new State as public inputs to the revm circuit.
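
A minimal sketch of the serialization side only; the field names and the use of serde_json are assumptions for illustration, and the actual get_data call on the powdr side is not reproduced here.

use serde::{Deserialize, Serialize};

// Illustrative shape, not the actual revm/zkEVM types.
#[derive(Serialize, Deserialize)]
struct BlockInput {
    block_number: u64,
    transactions: Vec<Vec<u8>>, // e.g. RLP-encoded transactions
    new_state_root: [u8; 32],
}

// The host serializes the whole object once; the resulting bytes are what
// would be fed to the runtime's input channel and deserialized in the circuit.
fn encode_input(input: &BlockInput) -> Vec<u8> {
    serde_json::to_vec(input).expect("serializing BlockInput cannot fail")
}

fn main() {
    let bytes = encode_input(&BlockInput {
        block_number: 1,
        transactions: vec![vec![0xde, 0xad]],
        new_state_root: [0u8; 32],
    });
    println!("{} input bytes", bytes.len());
}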

Aggregation proof verify failed

When I run ./test/test_aggregation.sh, an error is raised in the verify step.

[2023-03-05T23:28:09Z INFO  plonky::aggregation] individual_inputs: [
        [
            Fr(0x115cc0f5e7d690413df64c6b9662e9cf2a3617f2743245519e19607a4417189a),
        ],
        [
            Fr(0x20a3af0435914ccd84b806164531b0cd36e37d4efb93efab76913a93e1f30996),
        ],
        [
            Fr(0x0427b43899bdfc36d3d4f26c018dd73f5437ea8e5f533fc122441881d5d0b737),
        ],
    ]
execute error: Proof is invalid

Tidy up the test scripts

Currently the test scripts are too complicated because they mix the proving pipeline with different Snark frameworks.

We have two options:

  1. Compose the pipeline with an RPC server, which is now being designed and followed here;
  2. Tidy up the scripts and test files' layout.
    This issue focuses on point 2.

Profiling `stark_gen`

Some functions in stark_gen are very slow on zkEVM.

  • interpolate: large buffer (tmpbuf) allocation;
  • interpolate_bit_reverse
  • interpolate_prepare
  • interpolate_prepare_block
  • bit_reverse

Remember to increase the MIN_STACK_SIZE, which is 2 MB by default (see the sketch below).
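
A minimal sketch of working around the default, assuming the stack-hungry path can be moved onto a dedicated thread; the 64 MB figure and run_stark_gen are illustrative.

use std::thread;

fn run_stark_gen() {
    // ... the stack-hungry interpolate / bit_reverse recursion would run here ...
}

fn main() {
    let handle = thread::Builder::new()
        .name("stark_gen".into())
        .stack_size(64 * 1024 * 1024) // raise the stack well above the 2 MB default
        .spawn(run_stark_gen)
        .expect("failed to spawn prover thread");
    handle.join().expect("prover thread panicked");
}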

Compile error in algebraic project

These errors were probably introduced by our last change #183 (a possible workaround sketch follows the error output):

error[E0277]: `?` couldn't convert the error to `anyhow::Error`
   --> algebraic/src/witness/witness_calculator.rs:244:37
    |
244 |         writer.write_all(&prime_buf)?;
    |                                     ^ the trait `From<std::io::Error>` is not implemented for `anyhow::Error`
    |
    = note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
    = help: the following other types implement trait `FromResidual<R>`:
              <std::result::Result<T, F> as FromResidual<Yeet<E>>>
              <std::result::Result<T, F> as FromResidual<std::result::Result<Infallible, E>>>
    = note: required for `std::result::Result<(), anyhow::Error>` to implement `FromResidual<std::result::Result<Infallible, std::io::Error>>`
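
One possible workaround, as a minimal sketch; it assumes the failure is in the `?` conversion itself (for example two different error/Result types in scope) and sidesteps it by converting the std::io::Error explicitly.

use std::io::Write;

fn write_prime<W: Write>(writer: &mut W, prime_buf: &[u8]) -> anyhow::Result<()> {
    // Convert the io::Error by hand instead of relying on `From` via `?`.
    writer
        .write_all(prime_buf)
        .map_err(|e| anyhow::anyhow!("failed to write prime: {e}"))?;
    Ok(())
}

fn main() -> anyhow::Result<()> {
    let mut out = Vec::new();
    write_prime(&mut out, &[0x01, 0x00, 0x00, 0xf0])?;
    println!("wrote {} bytes", out.len());
    Ok(())
}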

Compile dsl failed

  Compiling dsl_compile v0.1.0 (/home/dell/Projects/eigen-zkvm/dsl_compile)
error[E0063]: missing field `json_substitutions` in initializer of `BuildConfig`
  --> dsl_compile/src/execution_user.rs:28:24
   |
28 |     let build_config = BuildConfig {
   |                        ^^^^^^^^^^^ missing `json_substitutions`

For more information about this error, try `rustc --explain E0063`.

The `cargo bench -- merklehash` fails due to `gen_rand_goldfields`

When we run the benchmark on a large server, it raises a panic.

pub fn gen_rand_goldfields<F: FieldExtension>(k: usize) -> Vec<F> {
    let mut num_threads = rayon::current_num_threads();  // given it's 128, and k maybe 4.
    let mut parts = vec![F::one(); 1 << k];
    rayon::scope(|scope| {
        for out in parts.chunks_mut(num_threads) {
            scope.spawn(move |_| {
                let mut rng = ::rand::thread_rng();
                for i in 0..num_threads {
                    out[i] = <F as rand::Rand>::rand(&mut rng)
                }
            })
        }
    });
    parts
}
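
The panic happens because parts has 1 << k elements but the chunks are sized by num_threads: when 1 << k is smaller than num_threads (e.g. 16 elements on a 128-thread server), the single chunk is shorter than num_threads and the inner loop indexes past its end. A minimal sketch of a fix, keeping the same structure but sizing chunks from the data and iterating over each chunk's real length:

pub fn gen_rand_goldfields<F: FieldExtension>(k: usize) -> Vec<F> {
    let num_threads = rayon::current_num_threads();
    let mut parts = vec![F::one(); 1 << k];
    // Ceil-divide so at most num_threads chunks are spawned and no chunk
    // is ever indexed beyond its own length.
    let chunk_size = ((1usize << k) + num_threads - 1) / num_threads;
    rayon::scope(|scope| {
        for out in parts.chunks_mut(chunk_size.max(1)) {
            scope.spawn(move |_| {
                let mut rng = ::rand::thread_rng();
                for x in out.iter_mut() {
                    *x = <F as rand::Rand>::rand(&mut rng);
                }
            })
        }
    });
    parts
}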

Failed to generate proof over bls12381

When running stark_aggregation.sh yes bls12381, I got the error below:

+ echo '>>> groth16 scheme <<< '
>>> groth16 scheme <<< 
+ '[' true = true ']'
+ echo '1. generate groth16 zkey'
1. generate groth16 zkey
+ /home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs g16s /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/fibonacci.final.r1cs '/home/dell/Projects/eigen-zkvm/test/../keys/setup_2^22.bls12-381.ptau' /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/g16.zkey
[ERROR] snarkJS: r1cs curve does not match powers of tau ceremony curve
+ echo '2. groth16 fullprove'
2. groth16 fullprove
+ /home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs g16f /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/final_input.zkin.json /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/fibonacci.final_js/fibonacci.final.wasm /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/g16.zkey /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/proof.json /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/public.json
[ERROR] snarkJS: Error: /home/dell/Projects/eigen-zkvm/test/aggregation/fibonacci.final/g16.zkey: Invalid File format
    at Object.readBinFile (/home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/@iden3/binfileutils/build/main.cjs:36:35)
    at async groth16Prove$1 (/home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs:5704:50)
    at async groth16FullProve$1 (/home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs:6119:12)
    at async Object.groth16FullProve [as action] (/home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs:12996:36)
    at async clProcessor (/home/dell/Projects/eigen-zkvm/test/aggregation/node_modules/snarkjs/build/cli.cjs:481:27)

This is raised because Num2Bits_strict uses a 254-bit array, so we should change it to 384 bits.

Re-running compile and prove on eigen-zkvm/test/../starkjs/circuits/fibonacci.final.circom with

/home/dell/Projects/eigen-zkvm/test/../target/release/eigen-zkit compile -p bls12381 -i ../starkjs/circuits/fibonacci.final.circom -l ../starkjs/node_modules/pil-stark/circuits.bn128 -l ../starkjs/node_modules/circomlib/circuits --O2=full -o aggregation/fibonacci.final

bash -x snark_verifier.sh groth16 false bls12381 

we got another error:

ERROR:  4 Error in template VerifyEvaluations_111 line: 1178
Error in template StarkVerifier_522 line: 2644
Error in template Main_626 line: 3056

[ERROR] snarkJS: Error: Error: Assert Failed. Error in template VerifyEvaluations_111 line: 1178
Error in template StarkVerifier_522 line: 2644
Error in template Main_626 line: 3056

"deps" field inside starky::types::Expression

Expressions in the JSON representation of a PIL might have a "deps" field containing arrays of integers, but the serde struct starky::types::Expression doesn't have a corresponding deps: Option<Vec<u64>> field, I assume because the field is unused by starky.

Would you be opposed to adding the field deps: Option<Vec<u64>> to Expression, so that nothing is lost if a JSON file is deserialized and then serialized back? A sketch follows.
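
A minimal sketch of the proposed change; the surrounding fields are elided and the skip_serializing_if choice is an assumption, not the current starky layout.

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Expression {
    // ... existing fields elided ...
    // Optional so that PIL files without "deps" still deserialize, and the
    // field round-trips without loss when it is present.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub deps: Option<Vec<u64>>,
}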

Optimization: code block evaluation by the interpreter is very expensive

The IR looks like this (https://github.com/0xEigenLabs/eigen-zkvm/blob/main/starky/src/interpreter.rs#L90):

         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (0 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1826 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1826 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (1 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1827 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1827 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1828 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1828 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (2 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1829 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1829 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1830 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1830 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (3 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1831 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1831 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1832 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1832 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (4 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1833 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1833 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1834 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1834 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (5 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1835 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1835 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1836 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1836 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (6 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1837 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1837 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1838 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1838 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (7 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1839 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1839 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1840 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1840 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (8 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1841 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1841 + ((i + 0)%16777216) * 0) dim=1
         write (addr (tmp) (1842 + ((i + 0)%16777216) * 0) dim=1)
         Add addr (tmp) (1842 + ((i + 0)%16777216) * 0) dim=1 addr (cm1_2ns) (9 + ((i + 0)%16777216) * 215) dim=1
         write (addr (tmp) (1843 + ((i + 0)%16777216) * 0) dim=1)
         Mul addr (challenge) (5 + ((i + 0)%16777216) * 0) dim=1 addr (tmp) (1843 + ((i + 0)%16777216) * 0) dim=1
...

Each line starting with 'Mul' or 'Add' represents a calculation, followed by a write instruction that writes the output of the previous instruction back to memory. This is slow.

Furthermore, we have to iterate over i ∈ 0..2^23, which takes hours to finish the whole arithmetization.

We can use a JIT to optimize this part via https://github.com/bytecodealliance/wasmtime/tree/main/cranelift; for a JIT demo, refer to https://github.com/bytecodealliance/cranelift-jit-demo. A rough sketch of the underlying idea follows.
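
A rough sketch of the direction only, using plain boxed closures instead of cranelift, with simplified stand-in types: the point is that decoding each IR step happens once at "compile" time, not once per row of the 2^23-row loop.

type F = u64; // stand-in for the Goldilocks element (wrapping ops stand in for field arithmetic)

#[derive(Clone, Copy)]
enum Op {
    // Indices into a flat register file; the real IR addresses (challenge,
    // cm1_2ns, tmp, ...) would be resolved to offsets at compile time.
    Mul { a: usize, b: usize, dst: usize },
    Add { a: usize, b: usize, dst: usize },
}

// Compile once: each IR step becomes one callable unit, so the hot loop over
// i only performs the arithmetic, never the decoding.
fn compile(ops: &[Op]) -> Vec<Box<dyn Fn(&mut [F]) + Sync>> {
    ops.iter()
        .map(|op| -> Box<dyn Fn(&mut [F]) + Sync> {
            match *op {
                Op::Mul { a, b, dst } => Box::new(move |r: &mut [F]| r[dst] = r[a].wrapping_mul(r[b])),
                Op::Add { a, b, dst } => Box::new(move |r: &mut [F]| r[dst] = r[a].wrapping_add(r[b])),
            }
        })
        .collect()
}

fn main() {
    let program = compile(&[
        Op::Mul { a: 0, b: 1, dst: 2 },
        Op::Add { a: 2, b: 1, dst: 3 },
    ]);
    let mut regs = vec![3u64, 5, 0, 0];
    for step in &program {
        step(&mut regs);
    }
    assert_eq!(regs[3], 20); // (3 * 5) + 5
}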

Parallel base sum computation

When we evaluate a polynomial p(a_0, ..., a_d) at some point x, we usually use a function of the form:

a_0 + x * ( a_1 + x * ( ...  x * a_d ... ))

I think we can convert it to the classical prefix sum problem: given an input array a[0..n], we output b[0..n], where b[i] = sum of a[j] for j < i.

i.e.

out[0] = 0;
for j from 1 to n do
    out[j] = out[j-1] + f(in[j-1]);

TBD

Reference: https://developer.nvidia.cn/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
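
A minimal sequential sketch of the recurrence above, as a reference point; a parallel version (e.g. the Blelloch scan from the GPU Gems chapter) would split the input into blocks, scan each block, then scan the block totals.

fn exclusive_scan<F>(input: &[u64], f: F) -> Vec<u64>
where
    F: Fn(u64) -> u64,
{
    // out[0] = 0; out[j] = out[j-1] + f(in[j-1]), matching the recurrence above.
    let mut out = vec![0u64; input.len() + 1];
    for j in 1..=input.len() {
        out[j] = out[j - 1] + f(input[j - 1]);
    }
    out
}

fn main() {
    // With f = identity this yields the running sums 0, 1, 3, 6, 10.
    assert_eq!(exclusive_scan(&[1, 2, 3, 4], |x| x), vec![0, 1, 3, 6, 10]);
}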
