a16z / jolt

The simplest and most extensible zkVM. Fast and fully open source from a16z crypto and friends. ⚡

Home Page: https://jolt.a16zcrypto.com

License: MIT License


jolt's Introduction

Jolt

imgs/jolt_alpha.png

Just One Lookup Table.

Jolt is a zkVM (zero-knowledge virtual machine) for RISC-V, built to be the simplest, fastest, and most extensible general-purpose zkVM. This repository currently contains an implementation of Jolt for the RISC-V 32-bit Base Integer instruction set (RV32I). Contributors are welcome!

The Jolt paper was written by Arasu Arun, Srinath Setty, and Justin Thaler.


Quickstart

Note

Jolt is in alpha and is not suitable for production use yet.

For developers looking to build using Jolt, check out the Quickstart guide.

For developers looking to contribute to Jolt, follow the instructions below.

Installation

You will need Rust nightly.

If you have rustup installed, you don't need to do anything: it will automatically install the right toolchain and any additional targets on the first cargo invocation.

Clone this repo:

git clone git@github.com:a16z/jolt.git

To check that rustup has picked the right version of Rust, run rustup show inside the cloned repository:

cd jolt && rustup show

Build

This repository is a Cargo workspace, and each member crate can be built individually, e.g.

cargo build -p jolt-core

For faster incremental builds, use the build-fast profile:

cargo build --profile build-fast -p jolt-core

Test

Unit and end-to-end tests for jolt-core can be run using the following command:

cargo test -p jolt-core

Examples in the examples directory can be run with, e.g.

cargo run --release -p sha2-chain

Performance profiling

Jolt uses tracing_chrome for performance profiling.

To generate a trace, run:

cargo run --profile build-fast -p jolt-core trace --name sha3 --format chrome

Where --name can be sha2, sha3, sha2-chain, or fibonacci. The corresponding guest programs can be found in the examples directory. The benchmark inputs are provided in bench.rs.

The above command will output a JSON file, e.g. trace-1712455107389520.json, which can be viewed in Perfetto.
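
For orientation, here is a minimal sketch of how a tracing_chrome pipeline is typically wired up (this follows the tracing_chrome crate's documented usage; jolt-core's actual setup may differ in its details):

use tracing_chrome::ChromeLayerBuilder;
use tracing_subscriber::prelude::*;

fn main() {
    // The guard flushes the Chrome-format JSON trace to disk when dropped.
    let (chrome_layer, _guard) = ChromeLayerBuilder::new().build();
    tracing_subscriber::registry().with(chrome_layer).init();

    // Any span entered from here on shows up as a slice in Perfetto.
    let span = tracing::info_span!("prove");
    let _enter = span.enter();
    // ... traced work ...
}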

Acknowledgements

This repository started as a fork of https://github.com/arkworks-rs/spartan. Original Spartan code by Srinath Setty.

Disclaimer

This code is being provided as is. No guarantee, representation or warranty is being made, express or implied, as to the safety or correctness of the code. It has not been audited and as such there can be no assurance it will work as intended, and users may experience delays, failures, errors, omissions or loss of transmitted information. Nothing in this repo should be construed as investment advice or legal advice for any particular facts or circumstances and is not meant to replace competent counsel. It is strongly advised for you to contact a reputable attorney in your jurisdiction for any questions or concerns with respect thereto. a16z is not liable for any use of the foregoing, and users should proceed with caution and use at their own risk. See a16z.com/disclosures for more info.


jolt's Issues

Proof recursion

Proof recursion will eventually be needed for on-chain Lasso proofs.
There are multiple verifications being done within Lasso and only some will need to be 'made into circuits' for this to work.

Why?

  • Reducing proof size.
  • Better memory usage in some zkVM settings.

I want to open this ticket up early as it should influence design decisions such as hash selection.
Also, if there is already a plan for this I would love to learn about it!

Quarks grand product argument

Jolt currently uses the grand product argument from Thaler13, which has roughly log^2(n)-sized proofs for products of n values.

Section 6 of the Quarks paper reduces this proof size to close to log(n), with a modest increase in prover commitment costs.
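
For context (our reading of the two schemes): Thaler13 proves the product via a $\log(n)$-depth binary tree of multiplications, running one sumcheck per layer with $O(\log n)$ messages each, for $O(\log^2 n)$ total. Quarks instead has the prover commit to a multilinear polynomial $f$ encoding all partial products, subject to $f(1, x) = f(x, 0) \cdot f(x, 1)$ for all $x \in \{0,1\}^{\log n}$, which is checked with a single sumcheck, giving an $O(\log n)$-sized proof in exchange for committing to an extra vector of length $2n$.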

Parallelize compute_chunks_operands

For a 64 core machine at a cycle count of ~16M, Jolt spends ~1.8% of its time in a segment called compute_chunks_operands here.

This segment allocates and computes C different chunks for each instruction. For example, for the EQ instruction we split the input operands X, Y into 4 chunks of 8 bits each (WORD_SIZE / C). We can then compute EQ over each chunk individually.

Idea for acceleration: split chunks_x and chunks_y into mutable slices, iterate over each in parallel, and write the values directly to the slice indexes, as in the sketch below.
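
A minimal sketch of that idea using Rayon (the constants and layout are assumptions based on the description above, not the actual jolt-core code):

use rayon::prelude::*;

const WORD_SIZE: usize = 32;
const C: usize = 4;

// Decompose each (x, y) operand pair into C chunks of WORD_SIZE / C bits,
// one parallel task per instruction, each writing into its own output slot.
fn compute_chunks(operands: &[(u32, u32)]) -> (Vec<[u8; C]>, Vec<[u8; C]>) {
    let bits = WORD_SIZE / C; // 8 bits per chunk
    let mask = (1u32 << bits) - 1;
    operands
        .par_iter()
        .map(|&(x, y)| {
            let mut cx = [0u8; C];
            let mut cy = [0u8; C];
            for i in 0..C {
                let shift = bits * (C - 1 - i); // most-significant chunk first
                cx[i] = ((x >> shift) & mask) as u8;
                cy[i] = ((y >> shift) & mask) as u8;
            }
            (cx, cy)
        })
        .unzip()
}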

It may be helpful to review the tracing strategy for performance testing.

Indices explanation.

Hi,

I was looking into the benchmarks and subtables examples.
Thank you for the great work 🙏

Are the indices the actual lookups?
If so, a brief explanation of how they work would be helpful.
For example, using And(2^128, 2^4), how would one see that the specific value [1010] is in the table?

i.e. in a zkVM you could take [0,1] and [1,0] as input, AND them, and get [0,0]. Using the Lasso lib, how would you check that [0,0] is in the table?

https://github.com/a16z/Lasso/blob/b0db15bb83c81912ba514b614e60b4d3f5f42e7e/src/benches/bench.rs#L13

It is also used here as x_indices and y_indices with no explanation of what these things are.

https://github.com/a16z/Lasso/blob/b0db15bb83c81912ba514b614e60b4d3f5f42e7e/src/subtables/and.rs#L126

Improve Benchmarking

benches/dimensions.rs was intended to help understand the performance of various Subtables and parts of the prover / verifier workflow. Unfortunately Criterion is quite bad at long-running benchmarks (see), leading src/bench.rs to be the main source of benchmarks.

sudo cargo flamegraph gives some idea of the work, but requires export RAYON_NUM_THREADS=1 to be useful and does not give a good idea of system-level parallelized performance.

Additionally, bench.rs does not have proper granularity to understand where time is spent.

The `combine_lookups` function

This issue is rather a question. The JoltInstruction trait contains a function called combine_lookups (here). I understand this is meant to represent the polynomial g from the Lasso and Jolt preprints (e.g. Jolt preprint, definition 2.6) and it receives the lookups T_i(r_j) of suitable chunks in suitable subtables.

  • Am I correct in thinking that, even though this function can contain any Rust code, it should strictly perform a polynomial evaluation on its input? Although this is perhaps not meant to be an extensible library at the moment, it would seem natural to enforce this on the code level.
  • In particular, combine_lookups should not e.g. use the instruction's input data, such as self.0 (in the implementors that have such a field) directly, correct?
  • I understand the degree function in the JoltInstruction trait should return the degree of the polynomial g computed by combine_lookups. But is this degree variable-wise, or monomial-wise? In other words, should g(x_1, x_2) = x_1 * x_2 return 1 or 2? I'm guessing the latter, since the Jolt paper stipulates that g should be a multilinear polynomial and hence have variable-wise degree <= 1.

Thank you for sharing this excellent library!

Parallelize RAM Timestamp Count Initialization

For a 64 core machine at a cycle count of ~16M, Jolt spends ~3.5% of its time in a segment called memory_trace_processing here. This segment allocates and computes the offline memory checking (a,v,t) polynomials used for the combined registers and RAM. Some additional details can be found in the wiki.

Currently this ~300 line segment takes ~3.5% of end-to-end time because it is computed completely serially, with no use of additional CPU cores. We should parallelize this to get up to a NUM_CPU× speedup (see the sketch after the lists below).

The goal is to fill out the following.
Trace-length sized

  • a_ram
  • v_read
  • v_write_rd
  • v_write_ram
  • t_read
  • t_write_ram

Memory-sized

  • v_final
  • t_final
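
A sketch of the coarse-grained version of this (the types and field names are stand-ins, not the real jolt-core structs): the trace-length-sized vectors above are independent of each other, so they can be built on separate Rayon tasks while each build is itself data-parallel over the trace.

use rayon::prelude::*;

// Stand-in for Jolt's real trace row type.
struct TraceStep {
    ram_address: u64,
    value_read: u64,
}

// Build two of the trace-length-sized vectors concurrently; rayon::join
// forks the two builders, and par_iter splits each pass across all cores.
// Note: the timestamp counters (t_read, t_final) have read-after-write
// dependencies per address, so they need a more careful chunking strategy
// than shown here.
fn build_polys(trace: &[TraceStep]) -> (Vec<u64>, Vec<u64>) {
    rayon::join(
        || trace.par_iter().map(|s| s.ram_address).collect(),
        || trace.par_iter().map(|s| s.value_read).collect(),
    )
}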

It may be helpful to review the tracing strategy for performance testing.

Build error due to incompatible version of packages related to clap

When building, the following error occurs:
package clap_derive v4.4.2 cannot be built because it requires rustc 1.70.0 or newer, while the currently active rustc version is 1.66.0-nightly

A solution could be to update the version of the (nightly) rust-toolchain used.

ReadWriteMemory::get_polys_r1cs by reference

ReadWriteMemory::get_polys_r1cs() is extremely slow (here) because it clones all of the underlying vectors not once but twice. On a ~16M cycle count trace on a 64 core machine it takes 2% of end-to-end time.

First: remove the double clones from self.v_read and self.v_write_rd. The double clones come from a clone in DensePolynomial::evals() and then another in .flat_map(...).collect().

Second: refactor get_polys_r1cs() to return lifetimed references to the underlying vectors via DensePolynomial::evals_ref() rather than cloning / reallocating at all.
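
A minimal sketch of that second step (DensePolynomial here is a stand-in illustrating the signature change, not the actual jolt-core type):

// Stand-in type illustrating the borrow-instead-of-clone refactor.
struct DensePolynomial<F> {
    evals: Vec<F>,
}

impl<F> DensePolynomial<F> {
    // Returns a view of the evaluations; no allocation or copy.
    fn evals_ref(&self) -> &[F] {
        &self.evals
    }
}

struct ReadWriteMemory<F> {
    v_read: DensePolynomial<F>,
    v_write_rd: DensePolynomial<F>,
}

impl<F> ReadWriteMemory<F> {
    // The returned slices borrow from &self, so callers read the underlying
    // vectors in place rather than paying for two clones.
    fn get_polys_r1cs(&self) -> (&[F], &[F]) {
        (self.v_read.evals_ref(), self.v_write_rd.evals_ref())
    }
}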

It may be helpful to review the tracing strategy for performance testing.

RV32I M-Extension

More instructions: The Jolt codebase currently implements the RISC-V 32-bit Base Integer instruction set (RV32I), but the Jolt construction is highly flexible. Add support for the RISC-V “M” extension for integer multiplication and division, as described in the Jolt paper. Further, Jolt can trivially support the 64-bit variant RV64IM.

Lazy SubtableStrategy

Currently we call SubtableStrategy::materialize_subtables before doing any of the dereferencing work to create $E_i$ for all dimensions. For most of the existing tables this work is not in the critical path and the NUM_SUBTABLES * MEMORY_SIZE work is not a blocker.

Initially this strategy was used for Spark tables which can only be computed in O(memory_size) time if computed all at once, but we no longer have this restriction: Most of our SOS tables can compute a single T[k] evaluation in O(1) time without computing all other M evaluations.

We could explore a lazy subtable evaluation strategy where we only evaluate T[k] as needed and cache the results. This may result in speedups for tables where T[k] is expensive to compute.
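
A minimal sketch of the memoized variant (the evaluation function and entry type are placeholders):

use std::collections::HashMap;

// Evaluate T[k] on demand and memoize it, instead of materializing all M
// entries of every subtable up front.
struct LazySubtable {
    cache: HashMap<usize, u64>,
    evaluate: fn(usize) -> u64, // O(1) per-entry evaluation for SOS tables
}

impl LazySubtable {
    fn get(&mut self, k: usize) -> u64 {
        *self.cache.entry(k).or_insert_with(|| (self.evaluate)(k))
    }
}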

Compile Lasso/Jolt verifier to RISC-V

Understanding Jolt verifier complexity in terms of RISC-V instructions will help us understand recursive properties and on-chain proving complexity.

Better naming for 'r' params

Throughout surge we overload the name r to mean various different verifier (fiat-shamir) provided randomness.

We should standardize the naming of these different r parameters and clean up the references.

Benchmark + Improve EqPolynomial::evals

We use EqPolynomial::evals everywhere. On inspection of the Gantt chart or direct tracing logs, you can see how often this is used. Any improvements here would speed up the system significantly.

The best steps are:

  1. Write a benchmark (Criterion would be good)
  2. Speed up experimentally
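
A sketch of step 1 (the EqPolynomial import path and constructor are assumptions based on the Spartan-style API; adjust to the actual module layout):

use ark_bn254::Fr;
use ark_std::{test_rng, UniformRand};
use criterion::{criterion_group, criterion_main, Criterion};
use jolt_core::poly::eq_poly::EqPolynomial;

fn bench_eq_evals(c: &mut Criterion) {
    let mut rng = test_rng();
    // 20 variables => 2^20 evaluations of eq(r, x) over the boolean hypercube.
    let r: Vec<Fr> = (0..20).map(|_| Fr::rand(&mut rng)).collect();
    c.bench_function("EqPolynomial::evals 2^20", |b| {
        b.iter(|| EqPolynomial::new(r.clone()).evals())
    });
}

criterion_group!(benches, bench_eq_evals);
criterion_main!(benches);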

Reorg file structure

Some ideas:

  • Move all dense PCS code into its own directory
  • Move testing / benchmarking code into its own directory
  • Move subtable code into its own directory
  • Move memory checking code into its own directory

Build requires nightly rust

I cloned the repo and ran the first command in the README.
Turns out it requires the latest rust (nightly).

I used Rust installed via brew on my Mac, and it failed:

jon@Jonathans-MBP-3 Lasso % git log -1
commit 97ccfd9ac1ec5e261c2f0fd45d5b494cc324b3ac (HEAD -> master, origin/master, origin/HEAD)
Author: Michael Zhu <[email protected]>
Date:   Fri Aug 11 09:59:15 2023 -0400

    Update README.md

jon@Jonathans-MBP-3 Lasso % cargo --version
cargo 1.71.0 (cfd3bbd8f 2023-06-08)

jon@Jonathans-MBP-3 Lasso % cargo build --release
   Compiling ark-lasso v0.4.0 (/Users/jon/src/Lasso)
error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:1
  |
3 | #![feature(extend_one)]
  | ^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:4:1
  |
4 | #![feature(associated_type_defaults)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:6:1
  |
6 | #![feature(generic_const_exprs)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:12
  |
3 | #![feature(extend_one)]
  |            ^^^^^^^^^^

For more information about this error, try `rustc --explain E0554`.
error: could not compile `ark-lasso` (lib) due to 4 previous errors

The solution is to install nightly using rustup (I answered Yes to overwrite the brew install):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup default nightly
cargo build --release

Slow builds

Some preliminary investigation follows (mostly to demonstrate methods).

To find bloated libraries: cargo bloat --release --crates

You can find the source of these deps with: cargo tree -i <dependency>

Most of these can be traced back to cranelift-codegen <- wasmer <- circom-scotia. We don't use wasm in circom-scotia, so we can likely kill this dependency with proper feature gating (potentially upstream).

We can profile the whole build with cargo build -p jolt-core --timings --release. This reveals a ton of time spent building dependencies that we don't care about.

Improve errors for large M

M > 2^20 crashes Lasso.

This is likely due to OOM when constructing the dereferenced memory tuples (address, value, timestamp).

We can likely detect that this will happen ahead of time based on the parameters and provide a more useful error.
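
A sketch of what such a check could look like (the threshold and message are illustrative, taken from the numbers in this issue):

// Fail fast with an actionable message instead of OOMing mid-proof.
fn validate_memory_size(m: usize) -> Result<(), String> {
    // Empirical limit from this issue; the real bound should be derived
    // from available memory and the number of subtables.
    const MAX_M: usize = 1 << 20;
    if m > MAX_M {
        return Err(format!(
            "M = {m} exceeds {MAX_M} (2^20): materializing the (address, \
             value, timestamp) tuples would likely run out of memory. \
             Reduce M or increase c."
        ));
    }
    Ok(())
}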

SubtableStrategy 'M' const generic

Currently M (the size of memory) is a const generic parameter on the SubtableStrategy trait. M is only used for the bit packing of AndSubtableStrategy::combine_lookups. It will likely be used by some subset of the yet-to-be-created tables, but not all.

Should M be moved to a generic on the AndSubtableStrategy impl?

This would allow us to remove the M generic in many other places.
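
In sketch form, the proposal (trait and strategy names abbreviated from the ones above):

// M moves off the shared trait and onto the one strategy that needs it.
trait SubtableStrategy {
    fn combine_lookups(vals: &[u64]) -> u64;
}

struct AndSubtableStrategy<const M: usize>;

impl<const M: usize> SubtableStrategy for AndSubtableStrategy<M> {
    fn combine_lookups(vals: &[u64]) -> u64 {
        // Illustrative bit packing; the real logic lives in the existing
        // AndSubtableStrategy::combine_lookups.
        let chunk_bits = M.ilog2() as u64;
        vals.iter().fold(0u64, |acc, &v| (acc << chunk_bits) | v)
    }
}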

Multiple Dense PCS

Surge uses the dense multilinear polynomial commitment scheme from microsoft/Spartan.

We can swap this PCS for any other dense multilinear PCS for different performance characteristics. There are not many other dense multilinear PCSs written in Rust, but it's worth generalizing the interface.

Zeromorph polynomial commitment

Zeromorph paper: https://eprint.iacr.org/2023/917

This will make the Jolt proofs much shorter than when using the Hyrax commitment, and will also slightly speed up the prover (Hyrax commitments are big enough that serializing group elements actually has a time cost).

A Rust implementation of Zeromorph is here, so it just needs to be integrated into Jolt (specifically, ported to arkworks): https://github.com/lurk-lab/arecibo/blob/dev/src/provider/non_hiding_zeromorph.rs

A Solidity implementation of the Zeromorph verifier (plus sum-check) is here: https://github.com/Maddiaa0/honk-verifier

iai-callgrind benchmarks

iai-callgrind provides high-precision, high-granularity performance benchmarks of Rust code (using Valgrind under the hood).

It would be nice to have iai-callgrind benchmarks for hot paths in our code, including:

  • MSMs
  • polynomial binding
  • polynomial evaluation (as used in sumcheck)
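
A minimal sketch following iai-callgrind's library-benchmark API (the benchmarked function here is a placeholder, not one of the hot paths listed above):

use iai_callgrind::{library_benchmark, library_benchmark_group, main};
use std::hint::black_box;

fn polynomial_eval(coeffs: &[u64], x: u64) -> u64 {
    // Horner's rule; stands in for a real hot path like sumcheck evaluation.
    coeffs.iter().rev().fold(0, |acc, &c| acc.wrapping_mul(x).wrapping_add(c))
}

#[library_benchmark]
fn bench_poly_eval() -> u64 {
    black_box(polynomial_eval(black_box(&[1, 2, 3, 4]), black_box(7)))
}

library_benchmark_group!(name = bench_group; benchmarks = bench_poly_eval);
main!(library_benchmark_groups = bench_group);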

[Question] How could we use Lasso to solve real problems?

Hi, developers of Lasso, I recently got to know and learn about this impressive project.

Now that I have an idea of how to write a subtable, I'm excited to solve more complicated problems with the help of the subtables from src/subtables, like calculating ((a AND b) XOR b) OR a with the help of the AND, OR, and XOR subtables.

But I don't have a strong background in lookup arguments or Surge, so I don't even know where to start. If Lasso can solve real problems, could anybody help me with this? A doc or tutorial would help a lot. Thanks in advance.

Optimize / Parallelize InstructionLookups::polynomialize

For a 64 core machine at a cycle count of ~16M, Jolt spends ~4% of its time in a segment called InstructionLookups::polynomialize here. This segment allocates and computes the offline memory checking (a,v,t) polynomials and other inputs for the Jolt instruction execution.

Currently much of the time is spent serially. We should parallelize this to get a speedup as the core count increases.

Similar to #292.

It may be helpful to review the tracing strategy for performance testing.

Fix Github Workflows

Fix the GitHub workflows to run tests and potentially some linting, and add a badge to the README.

Optimize Sumcheck polynomial evaluations

Within a given round of sumcheck, we compute evaluations of the polynomial that results after the combine or g function is applied. The number of points we compute is one more than the degree of the resulting polynomial, as n+1 points uniquely define a degree-n polynomial. We can compute evaluations at any points, and so we've chosen (0, 1, 2, ..., degree).

Currently we compute these points naively, but there are lots of optimizations here, particularly for (0, 1).
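
One such optimization is standard: in round $j$ the verifier checks the round polynomial $s_j$ against the running claim $c_j$ via $s_j(0) + s_j(1) = c_j$, so once $s_j(0)$ is computed, the evaluation at 1 comes for free as $s_j(1) = c_j - s_j(0)$.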

See // TODO(#28) for details

Consider removing const generics

Surge makes extensive use of const generics to allow statically sized arrays throughout, rather than Vecs or slices.

In order to use this pattern through our generic SubtableStrategy traits, we require the use of feature(generic_const_exprs), which is highly unstable. Beyond the instability, we have strange artifacts on many of our impls, such as [(); S::NUM_SUBTABLES]: Sized. These tell the compiler that the const generic expression is bounded.

It's likely that the overhead of using slices and length asserts everywhere is minimal enough to be irrelevant. It's worth considering whether the benefit is worth the downside.

Remove dependencies on N as a u64 to allow big tables

Currently N, the size of the big table (N = M^c), is stored in various places as a u64. We'd like to explicitly support a big table size of N = 2^128.

We should be able to do this by just storing M (the subtable size) and c and never explicitly computing N.
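
In sketch form (names are placeholders): store M and c and work with $\log_2(N)$ instead of N itself.

struct TableParams {
    m: usize, // subtable size
    c: usize, // number of dimensions, N = M^c
}

impl TableParams {
    // log2(N) = c * log2(M); fits comfortably in a u32 even when N = 2^128.
    fn log2_n(&self) -> u32 {
        self.c as u32 * self.m.ilog2()
    }
}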

Consider removing support for Spark

Currently Surge can be used as a multi-dimensional sparse polynomial commitment scheme, known in the Lasso paper as "Spark". To support this, we generate and pass through C different log2(sparsity)-sized randomness vectors for evaluating the multilinear extension of the $\tilde{eq}_i(x,r)$ polynomial.

Currently we generate both eq_randomness: Vec<F> of size log-s and spark_randomness: [Vec<F>; C] with each vec of size log-m. For Surge we use only eq_randomness.

The SubtableStrategy::materialize_subtables and SubtableStrategy::evaluate_subtable_mle function signatures depend on r: &[Vec<F>; C], despite r being unused in all Subtable strategies besides Spark.

This means that we pass spark_randomness deep into prover and verifier workflows.

Spark is useful in its own right – support would be nice, but it carries a substantial readability / maintainability cost.

What PCS is being used with Lasso?

Hey @sragss @moodlezoup ,

The write-up says Lasso is using Hyrax. Hyrax is actually a scheme from this paper:
https://eprint.iacr.org/2017/1132.pdf

The same paper introduced a polynomial commitment scheme for multilinear polynomials: Hyrax-PCS.
I am not an expert on naming, but I am told the PCS in that paper is referred to as Hyrax-PCS 😉.

Write-up:
https://github.com/a16z/Lasso/blob/master/EngineeringOverview.md


DotProductProofLog - reading the code makes sense.
But is this Hyrax, or just an efficient way to commit to polynomial vectors using a Bulletproofs-style IPA?

I was directed to another version of Spartan when asking around about Hyrax-PCS:
https://github.com/microsoft/Spartan2/tree/hyrax_pc2/src/provider

[EDIT]
Re-read the Hyrax paper. This all makes sense now.

This issue can be removed.

Thanks!

Continuations via folding

Many zkVMs today use continuations, which means they break the execution of a computer program into chunks and prove each chunk independently before recursively aggregating the proofs into one. One major reason to do this is to control the prover's space requirements.

Continuations can be implemented especially efficiently when Jolt is combined with a homomorphic commitment scheme like HyperKZG or Zeromorph.

Still, implementing this performantly will be a major endeavor and will rely on some upcoming research.

Incorporate support for pre-compiles

Incorporating precompiles into Jolt is essential to achieving developer adoption, since all zkVMs, Jolt included, are simply too slow today without them.

The most natural approach is to have the prover "collect" all the inputs to the various invocations of the precompile together and apply (a data-parallel version of) the precompile to the collected inputs.

If the precompile consists of a hand-specified constraint system (which is how the current generation of zkVMs implements precompiles), we would have developers specify the constraint system in an appropriate constraint specification language, and then apply a back-end for that constraint system to prove satisfiability.

Today’s zkVMs all use AIR as the constraint system. We could reuse these precompiles, applying BabySpartan or SuperSpartan (our sum-check-based SNARKs for AIR and Plonkish constraint systems) to prove their satisfiability.

Long-term, the fastest precompiles will be special-purpose sum-check-based SNARKs that avoid constraint systems entirely. In this case, we’d probably have developers specify them by expressing the prover and verifier directly, in Rust.

Gluing Tables Together

We'd like to support the use of multiple SubtableStrategies in recursive fashion to allow a single proof of lookups over multiple large tables. The strategy for combining large tables is identical to that of the combination of Subtables described in Surge with the addition of some "selector bits" out front.

Binius commitment scheme

This is a major issue involving the following sub-issues.

  • Switch the finite field to GF[2^128].
      ◦ This will force changing all of the "subtables and combining functions" in the instruction lookups (Lasso).
      ◦ It will also require modest changes to the Lasso lookup argument (described in the Binius paper, Section 4.4).
  • RISC-V addition and multiplication will no longer be amenable to solution via lookups. We will instead use the gadgets (in this case, constraint systems) given for them in the Binius paper, Section 5.3. This involves the same changes/approach as supporting pre-compiles (Issue 210).
  • It also probably makes sense to handle XOR operations without lookups, since they will correspond to addition over GF[2^128].
  • Implement recursion/continuations. This will require pre-compiles for native field arithmetic over GF[2^128], and for Keccak or Grøstl hashing (see Binius paper, Appendix A).
