
Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.

Home Page: https://triton-vm.org

License: Apache License 2.0

Rust 99.85% Makefile 0.15% CSS 0.01%
rust cryptography stark zero-knowledge zk-starks

triton-vm's Introduction

Triton VM


Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system. It defines a Turing complete Instruction Set Architecture, as well as the corresponding arithmetization of the VM. The really cool thing about Triton VM is its efficient recursive verification of the STARKs produced when running Triton VM.

Getting Started

If you want to start writing programs for Triton VM, check out Triton TUI. If you want to generate or verify proofs of correct execution, take a look at the examples.

Recursive STARKs of Computational Integrity

Normally, when executing a machine – virtual or not – the flow of information can be regarded as follows. The tuple of (input, program) is given to the machine, which takes the program, evaluates it on the input, and produces some output.

If the – now almost definitely virtual – machine also has an associated STARK engine, one additional output is a proof of computational integrity.

Producing such a proof is easy only if input, program, and output correspond to one another, i.e., only if the output is indeed the result of evaluating the program on the input according to the rules defined by the virtual machine. Otherwise, producing a proof is next to impossible.

The routine that checks whether a proof is, in fact, a valid one, is called the Verifier. It takes as input a 4-tuple (input, program, output, proof) and evaluates to true if and only if that 4-tuple is consistent with the rules of the virtual machine.
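In pseudo-Rust, the Verifier's interface can be sketched as follows; the type and trait names are illustrative only and not Triton VM's actual API.

pub struct Program(pub Vec<u64>); // hypothetical encoding of a program
pub struct Proof(pub Vec<u64>);   // hypothetical encoding of a STARK proof

pub trait ComputationalIntegrityVerifier {
    /// Returns true iff `proof` attests that running `program` on `input`
    /// yields `output` according to the rules of the virtual machine.
    fn verify(&self, input: &[u64], program: &Program, output: &[u64], proof: &Proof) -> bool;
}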

Since the Verifier is a program taking some input and producing some output, the original virtual machine can be used to perform the computation.

The associated STARK engine will then produce a proof of computational integrity of verifying some other proof of computational integrity – recursion! Of course, the Verifier can be a subroutine in a larger program.

Triton VM is specifically designed to allow fast recursive verification.

Project Status

Triton VM is still under construction. We currently don't recommend using it in production.

Please note that the Instruction Set Architecture is not to be considered final. However, we don't currently foresee big changes.

Specification

The specification can be found online. Alternatively, you can self-host the mdBook by first installing the dependencies, then serving the mdBook.

cargo install mdbook
cargo install mdbook-katex
cargo install mdbook-linkcheck

mdbook serve --open

Potentially, ~/.cargo/bin needs to be added to the PATH.

Running the Code

The Rust implementation of Triton VM resides in triton-vm and can be found on crates.io.

Triton VM depends on the twenty-first cryptographic library.

For trying out the code, install Rust and run:

~ $ git clone https://github.com/TritonVM/triton-vm.git
~ $ cd triton-vm
~/triton-vm $ make test

For local development of both libraries, it is encouraged to follow GitHub's fork & pull workflow: fork and clone both repositories, place twenty-first next to triton-vm, and change the dependency to be path-local:

~ $ git clone git@github.com:you/triton-vm.git
~ $ git clone git@github.com:you/twenty-first.git
~ $ cd triton-vm
~/triton-vm $ ln -s ../twenty-first/twenty-first twenty-first
~/triton-vm $ sed -i '/^twenty-first =/ s/{.*}/{ path = "..\/twenty-first" }/' triton-vm/Cargo.toml 

triton-vm's People

Contributors

alexanderlemmens, aszepieniec, contrun, jan-ferdinand, sshine, sword-smith, ulrik-dk


triton-vm's Issues

Add Pseudo Instructions

It's nice to have easy access to common instruction patterns without having to write the exact pattern, but rather, have a tool translate some instruction to the pattern – pseudo instructions.

Please suggest any pseudo instructions you'd like to have access to or mention any you have already implemented so they can be added here. Also, feel free to criticize the ones in the list.

Since the language supporting pseudo instructions is not “pure” TritonVM assembly, we can also add other nice-to-haves.

(This list is edited to summarize the results of the discussion below.)

Pseudo Instructions and other Features

  • call_function
  • subtract
  • increment
  • decrement
  • if
  • skinz
  • neg
  • lte
  • gt
  • gte
  • neq
  • lnot
  • p=np
  • exp
  • inline comments
  • evaluate argument of push to be an integer in range [-p, p)

call_function

Adds the functionality of labels, syntactically marked by having a colon : as their last character. Any such label can be the jump target of instruction call_function. For example:

call foo
push 2
add

foo:
push 2
return

subtract

Given stack _ b a produces _ (b-a).

push -1
mul
add

increment

Increments the top-most value of the stack by 1.

push 1
add

decrement

Decrements the top-most value of the stack by 1.

push -1
add

if

alias for skiz

skinz

Skips the next instruction if top of the stack is non-zero. Inspired by @Sword-Smith.

push 0
eq
skiz

neg

Given stack _ a produces _ -a. Thanks @einar-triton.

push -1
mul

neq

Given stack _ b a produces _ (b != a).

eq
push 0
eq

p=np

alias for divine
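
To illustrate how a translation tool could perform these rewrites, here is a minimal Rust sketch (hypothetical, not Triton VM's actual assembler) that expands a few of the pseudo instructions listed above into plain instructions:

/// Hypothetical pre-assembler pass: expand pseudo instructions into the
/// instruction patterns listed above; anything else is passed through.
fn expand_pseudo(line: &str) -> Vec<String> {
    match line.trim() {
        "subtract" => vec!["push -1", "mul", "add"],
        "increment" => vec!["push 1", "add"],
        "decrement" => vec!["push -1", "add"],
        "neg" => vec!["push -1", "mul"],
        "neq" => vec!["eq", "push 0", "eq"],
        "skinz" => vec!["push 0", "eq", "skiz"],
        "if" => vec!["skiz"],
        "p=np" => vec!["divine"],
        other => vec![other],
    }
    .into_iter()
    .map(str::to_string)
    .collect()
}

fn main() {
    let program = ["push 5", "decrement", "neq"];
    for line in program {
        for instruction in expand_pseudo(line) {
            println!("{instruction}");
        }
    }
}

Labels and call_function would need a second pass that resolves label targets; the sketch only covers the purely local rewrites.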

(Originally issue number 17 in the internal issue tracker.)

Measure and reduce memory consumption

(Originally posted by @sshine.)

  • Create an overview of the most memory-wasteful areas of the code.
  • Avoid generating entire co-set domains
  • Avoid cloning table collection data

Ultimately, it would be neat to be able to do both padding and extending in-memory; padding is already done in-place, but extending isn't because of the shift from BFieldElement to XFieldElement. Completely shifting to XFieldElement for everything has the performance drawback that it inflates the proof with zeroes. (It's also a bit obscure.)

We might want to think of a way to pre-allocate space for XFieldElements in the original execution trace, even though its entries are technically BFieldElements.

(Originally issue number 33 in the internal issue tracker.)

Describe AIRs in terms of Table Elements

The polynomial presentation of AIR constraints is too dense. Best to explain them in words first, then formulaically in terms of table cells with variable indices, and only then in terms of polynomials.

(Originally issue number 4 in the internal issue tracker.)
(Originally opened by @aszepieniec.)

Add the following part of the AIR:

  • DNF in words for U32 Table's constraints
  • Polynomials for U32 Table's constraints
  • Polynomials for Hash Table's transition constraints

In the code we need to use:

  • Polynomials for U32 Table's constraints
  • Polynomials for Hash Table's transition constraints
  • the transition constraints of instruction groups in the Processor Table

Program Attestation

Here is an idea for program attestation in Triton VM that @aszepieniec and I have talked about. Not all details are completely fleshed out yet, and they are subject to change as our understanding develops. For other ideas, see also this presentation by @bobbinth, or the presentation's write-up.

The Construction

The suggestion in a nutshell: Triton VM requires the STARK prover to use the hash coprocessor to compute the hash digest of the program and (successfully) equate it to a dedicated part of the public input.

The Hash coprocessor now has a new Sponge mode, not accessible to programs being executed in Triton VM. An Evaluation Argument between the Program Table and the Hash Table establishes that the program has been loaded into the hash coprocessor correctly. The hash coprocessor's transition constraints establish the correct computation of the program's hash digest.

An Evaluation Argument (or possibly a Permutation Argument) between the Hash Table and the input establishes that the dedicated part of the input is equal to the program's hash digest.

Summarizing, the Prover can produce a valid proof if and only if the hash digest of the program that's actually being executed is supplied as part of the public input. This allows anyone to easily verify whether

  1. the program is indeed the one whose execution is being claimed, in case the program is public, and
  2. two triples of (input, program, output) share the program, in case the program is private.

Hiding Only Part of the Program

The construction allows for an immediate add-on: keeping one part of the program secret, while publishing another part. The key point is that the Sponge construction is always initialized to 0. By supplying the hash digest of the secret part of the program as part of the public input, hashing the secret part of the program first, and then comparing the two hash digests, the secret part of the program can be attested to. Continuing from there in just the same way, the total program can be attested to.

Function calls into the secret part of the code allow programmers to write a “secret standard library” which can always be prepended, and of which some parts are simply not executed in case they are not needed. Jumping from the public part of the code into the secret part can leak information about the size of various functions contained therein. Changing the “secret standard library” generates a new hash digest for it, making some comparisons impossible (unless the update is somehow proven).

While our construction is not as flexible as the MAST approach, having the option to hide some part is already pretty valuable.

Actionable Items

  • Flesh out the details of the construction
  • Flesh out the details of the add-on

(Originally issue number 40 in the internal issue tracker.)

Include consistency constraints when calculating quotients

(Originally posted by @sshine.)

  • Take consistency constraints into account when calculating quotients.
  • It seems that consistency constraints don't need challenges; remove.
  • Consider if boundary constraints need challenges; if not, remove.
    fn consistency_quotients(
        &self,
        fri_domain: &FriDomain<BWord>,
        codewords_transposed: &[Vec<XWord>],
        challenges: &AllChallenges,
    ) -> Vec<Vec<XWord>> {
        //
        let x_values: Vec<BFieldElement> = fri_domain.domain_values();

        let consistency_constraints = self.ext_consistency_constraints(challenges);
        let consistency_quotients = consistency_constraints
            .iter()
            .map(|constraint| {
                //
            })
            .collect();

        consistency_quotients
    }

(Originally issue number 34 in the internal issue tracker.)

Factor out Rescue Prime related functionality

Currently, certain things pertaining to Rescue Prime live in various files. Concretely:

  • number of round constants (hash_table.rs)
  • number of rounds (hash_table.rs)
  • rescue_xlix_round_constants_by_round_number (state.rs)

Factor above things out and move them to a new file rescue_prime.rs.

Thoughts about Yuncong's Attack

Yuncong's (@yczhangsjtu's) attack may apply whenever the correct sorting of memory tables is not enforced. It reads an inconsistent value from memory even though the memory table's AIR is satisfied and even though the permutation argument checks out. This is possible in general because jumps in memory address or clock cycle are not verified to be increasing jumps. Specifically, jumps that decrease one of these values are allowed (but shouldn't be), and thus it may be possible to rearrange rows between contiguous regions of the memory table.

Without loss of generality, we use mp to denote the memory address, mv to denote the value, and clk to denote the cycle counter. Bear in mind that the same problem (and solutions) apply to three memory structures in Triton: RAM, OpStack, and JumpStack.

Ferdinand's Intuition

  1. mp partitions the memory table; the induced regions should be contiguous.
  2. clk should increase monotonically within each region.

Intuitively, these rules are necessary and sufficient for integral memory access. However, it's not quite clear how to prove this.

Theorem

What do we even mean by integral memory access? Here is a suggestion:

An execution trace $T$ has integral memory access iff:

  • for all $i$ in $\{0, \dots, \mathrm{height}(T)\}$: if $T[i][\texttt{ci}] = \texttt{mem\_read}$, then there is a $j$ in $\{0, \dots, i-1\}$ for which all of the following hold:
    • $T[i+1][\texttt{st0}] = T[j][\texttt{st0}]$;
    • $T[j][\texttt{ci}] = \texttt{mem\_write}$;
    • for all $k$ in $\{j+1, \dots, i-1\}$: either $T[k][\texttt{ci}] \neq \texttt{mem\_write}$ or $T[k][\texttt{st0}] \neq T[j][\texttt{st0}]$.

Analogous propositions should be considered for OpStack and JumpStack.

Sequential Addressing

For the RAM specifically, the memory pointer can be any field element. As a result, the jumps can be arbitrarily big, and no sorting can be enforced.

A solution is to use a lookup table into a map that sends random memory addresses to sequential memory addresses and back. This memory address bijection (MAB) maps into the range {0, . . . , k} where k+1 is the number of unique memory addresses. The column ramp comes with a counterpart samp (s for sequential). A set equality argument establishes that the pairs (ramp, samp) that appear in the Ram Table constitute the same set as the pairs in the Mab Table. The Ram Table is then sorted by samp and this sorting can be verified because samp can never increase by more than one.

How do we establish that the Mab Table defines a bijection? In particular, it can't include two-to-one relations or one-to-two relations. This is an open question.

  • answer

Lazy ramp

In the current architecture, ramp and st1 are the same register. As a result, st1 takes all sorts of values and thus implicitly points to memory cells located there.

It is desirable to split ramp and st1 back into two separate registers. This split does not induce a change in the instruction set architecture: ramp only has to assume the value of st1 when the current instruction is mem_read or mem_write.

Solutions

Whenever there is a jump in clk, the difference between the next and the current values must be small but positive. There are various strategies to prove that this is the case.

Solution 1: U32 Lookups

Since we do not expect the cycle counter to rise above $2^{32}$, we can require that this difference is a u32 integer. A table lookup argument to the U32 Table can establish this fact.

Solution 2: Add Bit Columns

We add k columns whose values are constrained to bits, for some k – say, k = 32. Whenever there is a jump, the jump is exactly equal to the integer represented by the bit columns.
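
A minimal sketch of the decomposition the prover would fill in for this solution, assuming k = 32 (the helper below is hypothetical, not existing Triton VM code):

/// Hypothetical helper: decompose a clock-cycle jump into k = 32 bit columns.
/// The AIR would constrain each bit column to {0, 1} and, whenever a jump
/// occurs, require the jump to equal the integer represented by these bits.
fn clk_jump_to_bit_columns(clk_next: u64, clk_curr: u64) -> Option<[u64; 32]> {
    let jump = clk_next.checked_sub(clk_curr)?; // the jump must not be negative …
    if jump == 0 || jump >= 1 << 32 {
        return None; // … and must be positive and fit into 32 bits
    }
    let mut bits = [0u64; 32];
    for (i, bit) in bits.iter_mut().enumerate() {
        *bit = (jump >> i) & 1;
    }
    Some(bits)
}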

Solution 3: Add Dummy Rows

Add 1 column that indicates whether rows are dummy rows or not. Whenever there is a jump, insert dummy rows that increase clk by one in each row.

Solution 4: Build Difference

This solution can be seen as a hybrid between solutions 2 and 3. Whenever there is a jump, we add logarithmically many dummy rows where we build a difference by adding one or multiplying by two in every row. When the sequence of dummy rows ends, the value at the end is added to clk.

U32 Table Design explorations

There are multiple ways to achieve the current functionality of the U32 Table. Let's explore the effects of different designs on both TritonVM as a whole and the recursive verifier specifically.

Re-Design Exploration 1 – Combine Permutation Arguments

Currently, there are 5 Permutation Arguments between the Processor Table and the U32 Table. The running product for all these Permutation Arguments is only updated in rows where the indicator idc is 1.

By adding column current instruction ci to the U32 Table, we can get away with only one Permutation Argument, which is conditional not only on the indicator idc but also on ci.

Immediate upsides:

  • fewer columns in U32 Table
  • fewer columns in Processor Table

Immediate downsides:

  • AIR of slightly higher degree for U32 Table
  • AIR of slightly higher degree for Processor Table
  • no re-usability of different result columns for same input

Re-Design Exploration 2 – Combine Result Columns

The Processor conditions the legal relation of two consecutive rows based on the current instruction ci. If we add ci as a column to the U32 Coprocessor, we can do exactly the same thing. This allows us to combine currently existing columns LT, AND, XOR, REV into a single column OP_RESULT.

Immediate upsides:

  • fewer columns in U32 Table

Immediate downsides:

  • considerably more complex AIR for U32 Table

Re-Design Exploration 3 – Ditch U32 Table

Instead of using a dedicated Coprocessor for u32 operations, we can enhance and use the Processor directly. This requires decomposition into bits using helper variables, which can then be immediately combined into the desired result. In conjunction with re-design exploration 4, the negative impacts are less severe by a factor of 8.

Immediate upsides:

  • less high-level complexity of Triton VM

Immediate downsides:

  • many, many more columns in the Processor Table
  • somewhat more complex design of the Processor

Re-Design Exploration 4 – Add U8 Lookup Table

Currently, the U32 Coprocessor deals with one bit at a time. Having access to a static, pre-computed lookup table for bytes allows the U32 Coprocessor to deal with one byte at a time. Correct lookup is proven via some argument, and might require the introduction of a new argument type to Triton VM.

Immediate upsides:

  • faster U32 Coprocessor

Immediate downsides:

  • higher design complexity

(Originally issue number 26 in the internal issue tracker.)

Merge mergeable AIR constraints

Naïvely, you compute one quotient polynomial for every AIR constraint. However, if the support (i.e., the set of points where the constraints must evaluate to zero) is the same, it pays to first combine the AIR polynomials using random, verifier-supplied coefficients and only then compute a single quotient polynomial per group; see the formula after the list below. This speeds up both prover and verifier computations.
Specifically, the mergeable AIR constraints are:

  • all boundary constraints (since they all pertain to row 0 in their respective tables)
  • all transition constraints but separately for each table (unless the tables happen to share the same omicron)
  • all terminal constraints but separately for each table (ditto)
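
In formulas: for constraints $c_1, \dots, c_m$ that share the same zerofier $Z$ (i.e., the same support), the prover computes a single quotient from a random linear combination with verifier-supplied weights $w_1, \dots, w_m$ instead of $m$ separate quotients:

$$q(x) = \frac{\sum_{i=1}^{m} w_i \cdot c_i(x)}{Z(x)}$$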

Write Constraints for Extension Tables

Implement AIR for the table's extension columns, which mainly deal with Permutation and Evaluation Arguments.

  • program
  • instruction
  • processor
  • opStack
  • RAM
  • jumpStack
  • hash
  • u32
  • update specification

Blocked by #52.

Document recent changes to U32 Table

Document the following changes to the U32 Table:

  • include current instruction ci (see #16)
  • merge the 5 Permutation Arguments into 1 (see #16)
  • assert u32-ness of operands by counting rows, introducing 2 columns (see #22)

On the number of trace randomizers

A conversation from the previous issue tracker.

@Sword-Smith:
Adding a debug check when constructing the shifted quotient codewords for use in the non-linear combination reveals a problem. Number of randomizers: 1.

let interpolated = self.fri.domain.xinterpolate(terms.last().unwrap());
assert!(
    interpolated.degree() == -1 || interpolated.degree() == self.max_degree as isize,
    "The shifted quotient codeword with index {} must be of maximal degree {}. \
    Got {}. Predicted degree of unshifted codeword: {}. Shift = {}",
    i,
    self.max_degree,
    interpolated.degree(),
    qdb,
    shift
);

The interpolated shifted codeword has a degree of 111. Degree 127 was expected.

Error message:

thread '[…]prove_verify_test' panicked at 'The shifted quotient codeword with 
index 8 must be of maximal degree 127. Got 111. Predicted degree of unshifted 
codeword: 65. Actual degree of unshifted codeword: 49. Shift = 62', […]/stark.rs:499:17

@jan-ferdinand:
We (@Sword-Smith, @aszepieniec) believe to have identified and solved the problem.

The problem

Say you interpolate a trace without adding a randomizer, resulting in polynomial p_0(x). Checking transition constraints will check (powers and multiples of) p_0(x) - p_0(ο·x).

Now interpolate the same trace but with a single randomizer added, resulting in polynomial p_1(x) = p_0(x) + r_1(x). Checking transition constraints will now check (powers and multiples of) p_1(x) - p_1(ο·x).

Because the randomizer must not alter the trace values, the random part r_1(x) = p_1(x) - p_0(x) evaluates to 0 on all powers of omicron. Consequently, r_1(x) is invariant with respect to multiplication of its argument by omicron: r_1(x) - r_1(ο·x) = 0
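
Concretely, for a trace domain of size $n$ (the powers of omicron $o$) and a single randomizer, $r_1 = p_1 - p_0$ has degree at most $n$ and vanishes on the entire trace domain, so

$$r_1(x) = c \cdot (x^n - 1) \text{ for some constant } c, \qquad\text{hence}\qquad r_1(o \cdot x) = c \cdot (o^n x^n - 1) = c \cdot (x^n - 1) = r_1(x).$$

This matches the appendix below, where $r_1(x) = 138x^4 + 119 = 138 \cdot (x^4 - 1)$ over $\mathbb{F}_{257}$.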

The solution

Adding a second randomizer eradicates this phenomenon. The intuitive reason is that the random part r_2(x) of interpolant p_2(x) = p_0(x) + r_2(x) has roots not only at all powers of omicron, but also at an additional point; hence r_2 is generally not invariant under multiplying its argument by omicron.

Appendix

The sagemath-script below allows exploring the described phenomenon by playing with numbers, their interpolants, and so on.

Example Output

# current configuration
p = 257, o = 241, ω = 81, (v0, v1, v2, v3, r0, r1) = (0, 1, 2, 3, 182, 76)

# interpolants
ip_0(x) = 136*x^3 + 128*x^2 + 120*x + 130
ip_1(x) = 138*x^4 + 136*x^3 + 128*x^2 + 120*x + 249
ip_2(x) = 46*x^5 + 10*x^4 + 136*x^3 + 128*x^2 + 74*x + 120

# part of the interpolant added by the randomizers
r_0(x) = ip_0(x) - ip_0(x) = 0
r_1(x) = ip_1(x) - ip_0(x) = 138*x^4 + 119
r_2(x) = ip_2(x) - ip_0(x) = 46*x^5 + 10*x^4 + 211*x + 247

# check for invariance with respect to the argument's multiplication by omicron
ip_0(x) - ip_0(o*x) = 16*x^3 + 256*x^2 + 241*x
ip_1(x) - ip_1(o*x) = 16*x^3 + 256*x^2 + 241*x
ip_2(x) - ip_2(o*x) = 11*x^5 + 16*x^3 + 256*x^2 + 230*x

# check for invariance of random parts with respect to the argument's multiplication by omicron
r_0(x) - r_0(o*x) = 0
r_1(x) - r_1(o*x) = 0
r_2(x) - r_2(o*x) = 11*x^5 + 246*x

# Roots of random parts
roots of r_0: []
roots of r_1: [1, 16, 241, 256]
roots of r_2: [1, 16, 78, 241, 256]

Script

# configuration:
# p = 257
# |trace| = |<o>| = 4
# |fri evaluation domain| = |<ω>| = 64

# prepare field, ring, indeterminate
F = GF(257)
R = F['x']
x = R.gen()

# four values, two randomizers
v0, v1, v2, v3, r0, r1 = 0, 1, 2, 3, 182, 76

# get generator for evaluation domain
o = F(1).nth_root(4)
ω = F(1).nth_root(64)

# traces with zero, one, and two randomizers
trace_0 = [(o^0, v0), (o^1, v1), (o^2, v2), (o^3, v3)]
trace_1 = [(o^0, v0), (o^1, v1), (o^2, v2), (o^3, v3), (ω, r0)]
trace_2 = [(o^0, v0), (o^1, v1), (o^2, v2), (o^3, v3), (ω, r0), (ω^2, r1)]

# interpolate different traces
ip_0 = R.lagrange_polynomial(trace_0)
ip_1 = R.lagrange_polynomial(trace_1)
ip_2 = R.lagrange_polynomial(trace_2)

# get the random part of the polynomial
r_0 = ip_0(x) - ip_0(x)
r_1 = ip_1(x) - ip_0(x)
r_2 = ip_2(x) - ip_0(x)

print(f"# current configuration")
print(f"p = 257, o = {o}, ω = {ω}, (v0, v1, v2, v3, r0, r1) = {v0, v1, v2, v3, r0, r1}")
print()
print(f"# interpolants")
print(f"ip_0(x) = {ip_0(x)}")
print(f"ip_1(x) = {ip_1(x)}")
print(f"ip_2(x) = {ip_2(x)}")
print()
print(f"# part of the interpolant added by the randomizers")
print(f"r_0(x) = ip_0(x) - ip_0(x) = {r_0(x)}")
print(f"r_1(x) = ip_1(x) - ip_0(x) = {r_1(x)}")
print(f"r_2(x) = ip_2(x) - ip_0(x) = {r_2(x)}")
print()
print(f"# check for invariance with respect to the argument's multiplication by omicron")
print(f"ip_0(x) - ip_0(o*x) = {ip_0(x) - ip_0(o*x)}")
print(f"ip_1(x) - ip_1(o*x) = {ip_1(x) - ip_1(o*x)}")
print(f"ip_2(x) - ip_2(o*x) = {ip_2(x) - ip_2(o*x)}")
print()
print(f"# check for invariance of random parts with respect to the argument's multiplication by omicron")
print(f"r_0(x) - r_0(o*x) = {r_0(x) - r_0(o*x)}")
print(f"r_1(x) - r_1(o*x) = {r_1(x) - r_1(o*x)}")
print(f"r_2(x) - r_2(o*x) = {r_2(x) - r_2(o*x)}")
print()
print(f"# Roots of random parts")
print(f"roots of r_0: {sorted([a[0] for a in r_0.roots()])}")
print(f"roots of r_1: {sorted([a[0] for a in r_1.roots()])}")
print(f"roots of r_2: {sorted([a[0] for a in r_2.roots()])}")

Drop unnecessary row-compression columns

Some of the challenges coming from the verifier are weights with which to compress multiple columns into one column, over which the running product (or sum) is then computed. This compressed column is currently stored explicitly. However, since the compression is a linear operation, this explicit representation is not necessary.

Eliminate about half of the currently existing extension columns.

Use correct number of trace randomizers in low-degree extension

Per consistency check of the non-linear combination codeword, the verifier queries each polynomial (in value form, i.e., as a codeword) that goes into the non-linear combination in one place. Each check reduces the degrees of freedom of the queried polynomial. To achieve zero-knowledge, the number of trace randomizers needs to be at least the number of non-linear combination checks.

As per #33, the number of non-linear combination checks needs to be at least 2, which is a loose lower bound given realistic security levels.

  • Increase the number of trace randomizers for the base tables to 2·num_nonlin_checks
  • Increase the number of trace randomizers for the extension tables to 2·num_nonlin_checks

Rename “boundary x” to “initial x”

Currently, Triton VM specifies boundary constraints and terminal constraints. A contiguous region has two boundaries, one on either side. The word “terminal” is specific, and so is the word “initial.”

Rename “boundary constraints” to “initial constraints,” “boundary quotients” to “initial quotients,” and so on.

Reorder constraint types

For reasons unbeknown to me, the current canonical order for constraints and their associated quotients is

  1. boundary (or “initial,” see #53),
  2. transition,
  3. consistency,
  4. terminal.

To me, it feels more correct if 2. and 3. were switched. I'm open to arguments why the current (or yet another) order is superior.

Use instruction bucket's constraints in Processor Table

Even though the constraints for the various instruction groups (like grow_stack, keep_stack, etc.) are both specified and coded up, the relevant chunk of code is never actually used, i.e., the constraints are never checked. Change that.

Consistency of RAM Table

There is an attack whereby the malicious prover can read a zero from a given memory location when the last time he wrote to it, the written value wasn't zero. Here is an example trace that illustrates the attack:

clk  ramp  ramv
  0     0     0
  1     0     7
  2     1     0
  3     1     -
  4     2     0
  5     2     -
  6     0     0

The boundary constraints are satisfied, because they state that the first row should consist of all zeros. The hv6 column is omitted because it is not relevant for the attack, but there is a valid assignment to this column. The remaining transition constraints are

  1. if ramp changes then ramv is zero – this constraint is satisfied;
  2. If ramp does not change and ramv does change then clk increases by one – this constraint is also satisfied.

Therefore the above trace is a valid trace. But the problem is that the memory cell at location 0 is accessed again in cycle 6. At this point it should have the value that it was set to in cycle 1, but it has the value 0 instead.

We came up with the following solution: we disallow reads from uninitialized memory. Specifically:

  • Include the current instruction ci in the RAM table. This adds one column.
  • Include a new column init which contains only 0 or 1, adding a second column.
  • Whenever the ramp changes, init is set to zero.
  • Whenever ci is write_mem, init is set to one.
  • Whenever ci is read_mem, init has to be one.

Tasks

  • Verify solution
  • Update arithmetization
  • Implement in VM's simulate: add columns
  • Implement in RamTable: add constraints and update permutation argument.

Add cross-table initial constraints

In brainSTARK, every evaluation argument refers to data known to the verifier – either the public input, or the (public) program, or the output. Consequently, no random initialization of an evaluation argument is necessary, from which it follows that no cross-table initial constraint is necessary for evaluation arguments.

Because the initial version of Triton VM was a very direct translation of brainSTARK, cross-table initial constraints for evaluation arguments have not been programmed, independent of whether or not they refer to data known to the verifier.

Add cross-table initial constraints for evaluation arguments for

  • program ⇒ instruction table,
  • processor ⇒ hash coprocessor, and
  • processor ⇐ hash coprocessor.

Make U32 Table assert u32-ness of operands

Currently, the u32 coprocessor will just do its thing until both operands are 0. Crucially, it doesn't enforce that the operands correspond to 32-bit unsigned integers.

Suggested change

Introduce new columns num_bits_processed and num_bits_processed_minus_33_inv (working titles) to the U32 Table. Column num_bits_processed records the number of bits already processed, i.e., increments by 1 between any two consecutive rows, unless the indicator idc is set, in which case the value is reset to 0. Column num_bits_processed_minus_33_inv corresponds to the multiplicative inverse of (num_bits_processed - 33) if that inverse exists, and 0 otherwise.

Add the consistency constraint 1 - (num_bits_processed - 33)·num_bits_processed_minus_33_inv to the U32 Table. This enforces that the u32 coprocessor cannot process 33 bits. Since it only processes 1 bit at a time, this constraint enforces that operands are at most 32 bits wide.

Write recursive verifier

Write the recursive verifier. The checklist below is probably incomplete and in some places too coarse-grained. It will be updated and broken up into additional issues as subtasks become clearer.

Triton Assembly Verifier

  • decompress partial authentication paths (not TASM)
  • sample {indices, weights, challenges}
    • Fiat-Shamir
  • Merkle tree authentication path verification
  • quotient building
    • evaluate AIR (see also: #135)
    • divide by zerofier
  • non-linear combination
  • {base, extension, combination} tree root comparison
  • FRI verify
    • co-linear point compute
    • #5

sample weights

input: seed, num_weights

list_of_weights = []
for i in {0, …, num_weights-1}:
    digest = xlix(seed, i)
    weight = XFieldElement(digest)
    list_of_weights += [weight]
return list_of_weights
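
A Rust sketch of the same loop, generic over the hash function and the weight type; the parameters are stand-ins for the XLIX hash and the digest-to-XFieldElement conversion, not the actual API:

/// Hypothetical sketch of `sample weights`: derive `num_weights` weights from a
/// seed by hashing (seed, counter) and converting each digest into a weight.
fn sample_weights<D, W>(
    hash: impl Fn(&D, u64) -> D,        // stand-in for the XLIX hash
    digest_to_weight: impl Fn(&D) -> W, // stand-in for Digest -> XFieldElement
    seed: &D,
    num_weights: u64,
) -> Vec<W> {
    (0..num_weights)
        .map(|i| digest_to_weight(&hash(seed, i)))
        .collect()
}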

sample indices

See document on index sampling.

Revealed indices

For the recursive verifier, we need to do something different with revealed indices: We currently sort and deduplicate, which is going to be expensive/hard when coding this in TVM.

(Originally issue number 24 in the internal issue tracker.)

Checking low-degreeness of last FRI codeword in recursive verifier

Let's discuss ways to check the low-degreeness of the last codeword in FRI and analyze which of the approaches is the least complex one with regards to recursive proof verification.

Currently, we do FRI as depicted in the accompanying figure (with a measly 2 co-linearity computations).

Suggestion 1: Use current ISA & FRI

We use only currently available instructions to write a TritonVM method performing INTT and check the low-degreeness of the resulting polynomial.

Suggestion 2: Modify ISA

We create a dedicated instruction intt.

Suggestion 3: Modify FRI

We change FRI in the following way.

Split-and-fold continues beyond the last codeword, creating co-linearity-computation-collapsing codewords (“clcc codewords”). Each clcc-codeword can be sent in the clear, just like the last codeword can.¹

For rounds involving at least one clcc-codeword, co-linearity indices can coincide, collapsing a co-linearity computation – hence the name clcc. Whenever such a collapse occurs, the verifier needs to check whether the values at the corresponding indices are, in fact, the same. For example, in the picture below, a collapse happens between the pink and the turquoise co-linearity computations. The turquoise value was computed by the verifier, the pink part was sent in the clear by the prover, allowing for a meaningful comparison.

The last clcc-codeword is of length one, which means computing INTT is trivial – it is the identity operation. This constant polynomial is called (orange) c in the picture above. The only remaining check the verifier has to perform is between c and the last computed co-linearity.

Questions to answer for cost/benefit analysis:

  • How costly is arithmetization of INTT?
  • What are typical expected FRI-domain sizes?
  • How many co-linearity checks will be performed?
  • How many clcc-codewords will be created?
  • (At least) how many co-linearity checks will collapse in a given clcc-round?
  • If only the Merkle root of the clcc-codeword is sent: how long is one authentication path at that clcc-round?

Using all of the above:

  • How many extra elements are put in the proof due to clcc-codewords?
  • How many elements are not put in the proof stream if the last codeword and (some) clcc codewords are only committed to?
  • Which suggestion is the best?

(Originally issue number 15 in the internal issue tracker.)

Footnotes

  1. It might be better to build Merkle trees from the longer clcc-codewords. This is an open question.

Fix Test `run_tvm_mt_ap_verify_test`

Currently this test has hardcoded Merkle roots and Merkle siblings and (probably?) also hardcoded leaves. We want this test to be property-based: generate a random Merkle tree, choose a random leaf, and verify its authentication path. And perhaps also verify that the other validation tasks (with a different root, different authentication paths, etc.) actually fail.

Explore Lookup Arguments to Relate AETs

Currently, executing instructions like lt (and a few others) leads to an increase of the relevant coprocessor's AET, e.g., the U32 Table. Most importantly, this happens even if the instruction has been executed before with the same arguments. The reason is the way AETs are linked: Permutation Arguments do not allow for re-using the same value.

An alternative way of linking two AETs is by using Lookup Arguments. The idea is taken straight from Ariel's & Zac's plookup, then stuffed into AETs. The picture below gives an example of one such table thusly linked.

  • Explore the tradeoffs of introducing Lookup Arguments for linking AETs.
  • Optimize the argument to use fewer columns than the sketch above.

(Originally issue number 29 in the internal issue tracker.)

Remove explicit terminal values

Once all tables pad to the same height (#10), terminals do not need to be sent to the verifier explicitly. Instead, a cross-table terminal check akin to the currently existing cross-table initial check can be performed.

The main advantage of this approach is the removal of random initials while retaining zero knowledge. Because the terminal is never learned by the verifier, it cannot leak any information. Consequently, the running product (respectively, running evaluation) can always start with 1 (respectively, 0).

A minor advantage of this approach is a marginally decreased proof size.

Always output the culprit of max degree

The max degree dictates the FRI domain length, which is the main factor for the runtime of the prover. During development, we should output to standard out which transition quotient of which table dictates the max degree. This will help us get a feel for the most obvious target for optimization.

`read_mem` followed by `halt` leaves `st0` unconstrained

The consistency of a value read from RAM is guaranteed by the memory table. But since padded rows do not influence the running product¹ that is used to calculate the terminal value, and since padded rows cannot be distinguished from the row containing the first halt, the running product will not be affected by the row containing the first halt. This means that the row containing the first halt should not be included when deriving the RAM table from the processor table. If this row is not included, then its st0 value is not constrained if the previous instruction was a read_mem.

Since this st0 cannot be output, you could argue that this is not a problem. This st0 value will also not be part of any permutation argument/running product calculation, so the freedom that the prover has to set it shouldn't lead to soundness problems in the permutation proofs.

We have to avoid any validation logic that is conditioned on this st0 value though.

Footnotes

  1. The reason that padded rows do not update the running product is that doing so would be incompatible with a different number of padded rows in each table. And since the current idea is to have each table padded to the same height, to get a global omicron, we cannot let the padded rows count towards running products.

Pad all tables to the same height

A Verifier needs to compute (what we call) unit_distances from the extension tables' padded_heights. For that, they currently need to have access to the padded height of all tables, which (for now) is communicated via the proof stream.

At a runtime cost for general proving, moving to a single padded_height simplifies the Verifier. A simpler Verifier implies a faster runtime for the recursive Verifier. Thus, the proposed change should be implemented.

(Originally issue number 31 in the internal issue tracker.)

Improve VM Error Messaging

Bug-hunt 2aed426 identified two types of errors that can occur when running the VM.

  1. Rust produces a "failed to fill whole buffer" message. This is probably related to stdin/secret_in when you're trying to read too many or too few symbols. This kind of behavior should be caught and the programmer should be notified of it gracefully.
  2. The VM chokes gracefully because of an assert or assert_vector or similar instruction failing. When this happens, it would be nice if the error message supplied which instruction (instruction type and instruction pointer) caused it, possibly with a (jump) stack trace. Also, we will want to verify that certain programs that should fail gracefully do fail gracefully.

Investigate lower-than-allowed degrees after shifting of codewords

The debug output of the test triton_prove_verify indicates that some codewords do not correspond to maximal degree codewords even after they have been shifted by what we currently assume to be the correct shift. Gut feel says that this is a soundness error.

Investigate why above phenomenon happens and apply any needed fixes.

Remove initially prover-known fixpoint for updating a running evaluation

Currently, the registers in some row (or the relevant parts of it) are compressed into a single value by taking the verifier-randomized non-linear sum. For example, for the Evaluation Argument between the Program Table and the Instruction Table, using verifier supplied values $\alpha, \beta, \gamma$, this function is $\mathsf{compress}_{\alpha, \beta, \gamma} : \texttt{ip}, \texttt{ci}, \texttt{nia} \mapsto \alpha \cdot \texttt{ip} + \beta \cdot \texttt{ci} + \gamma \cdot \texttt{nia}$. Notably, we have $\mathsf{compress}_{\alpha, \beta, \gamma}(0, 0, 0) = 0$.

The running evaluation of an Evaluation Argument is initialized with 0. Using verifier-supplied $\epsilon$, the update rule is $\mathsf{update\_running\_eval}_\epsilon : \texttt{re}, \mathsf{compressed\_row} \mapsto \epsilon \cdot \texttt{re} + \mathsf{compressed\_row}$ where $\mathsf{compressed\_row}$ is the appropriate evaluation of $\mathsf{compress}$. Notably, we have $\mathsf{update\_running\_eval}_\epsilon(0, 0) = 0$.

This means that a prover can sneak in an arbitrary amount of all-zero rows before the first not-all-zero row is absorbed into the running evaluation. One way to overcome this is by altering the function $\mathsf{compress}$ to take additional verifier-supplied value $\delta$:
$$\mathsf{compress}_{\alpha, \beta, \gamma, \delta}: \texttt{ip}, \texttt{ci}, \texttt{nia} \mapsto \alpha \cdot \texttt{ip} + \beta \cdot \texttt{ci} + \gamma \cdot \texttt{nia} + \delta$$
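
With the extra term, an all-zero row no longer compresses to zero and therefore necessarily changes the running evaluation:
$$\mathsf{compress}_{\alpha, \beta, \gamma, \delta}(0, 0, 0) = \delta \qquad\text{and}\qquad \mathsf{update\_running\_eval}_\epsilon(0, \delta) = \delta,$$
which is nonzero except with negligible probability over the verifier's choice of $\delta$.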

Introduce dedicated column RAM pointer `ramp`

In TIP 0001, the prover computes the formal polynomial, whose degree depends on the number of changes of the RAM pointer. In the current design of Triton VM, the RAM pointer is equivalent to stack element st1, independent of the current instruction.

Introduce column ramp that equals st1 if the current instruction is read_mem or write_mem and doesn't change otherwise.

Identify optimal opcodes and instruction grouping

In order to keep the degree of the AIR polynomials low, we group instructions exhibiting the same behavior in certain aspects, allowing us to (de)select for those aspects. This is achieved by decomposing register “current instruction” ci using helper registers “instruction bucket” ib0 through ib5 (or some other, more suitable number of them).

For optimal AIRity, we need to identify

  • suitable groups of instructions, and
  • for all instructions: the opcode allowing easiest identification of the groups the instruction is part of, based on the opcode's decomposition into ib0 through ib5.

Current groups are described in arithmetization.md, section AIR. They might be improved upon and should definitely be subject of discussions.

Identifying the opcodes should (probably) happen programmatically. Part of this issue is therefore to write a script that computes an instruction's opcode based on its group membership, as sketched below.
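
A minimal Rust sketch of such a script, under the simplifying (and purely illustrative) assumption that each instruction group is assigned exactly one of the bits ib0 through ib5:

/// Hypothetical sketch: if every instruction group were assigned exactly one of
/// the bits ib0 through ib5, an instruction's opcode would simply be the OR of
/// the bits of the groups it belongs to. The real assignment is more involved.
fn opcode_from_groups(group_bits: &[usize]) -> u32 {
    group_bits.iter().fold(0, |opcode, &ib| opcode | (1 << ib))
}

fn main() {
    // membership in the groups assigned to ib0 and ib2 gives opcode 0b101 = 5
    assert_eq!(opcode_from_groups(&[0, 2]), 0b101);
    // membership in the groups assigned to ib1, ib3, and ib5 gives 0b101010 = 42
    assert_eq!(opcode_from_groups(&[1, 3, 5]), 0b101010);
}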

(Originally issue number 8 in the internal issue tracker.)

Replace instructions `and` and `xor` by `nand` or `nor`

The current set of binary instructions available in Triton VM, `and` and `xor`, is functionally complete, but not minimally so. Replacing them by either `nand` or `nor`

  • does not change completeness,
  • marginally decreases usability (`or` and `not` are not natively available anyway),
  • simplifies the instruction set architecture, and
  • simplifies the U32 Table.
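
For reference, the standard identities showing that `nand` alone recovers the current instructions (applied bitwise to u32 operands) are
$$\lnot a = \mathrm{nand}(a, a), \qquad a \wedge b = \lnot\,\mathrm{nand}(a, b), \qquad a \vee b = \mathrm{nand}(\lnot a, \lnot b), \qquad a \oplus b = (a \vee b) \wedge \mathrm{nand}(a, b);$$
the dual identities hold for `nor`.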

(Originally issue number 41 in the internal issue tracker.)

Make maximum number of trace randomizers (as) independent (as possible) from table's height

Currently, during column interpolation as part of low-degree extension, at most one trace randomizer per row in the table can be inserted. Given the current code, it is trivial to increase this to (expansion_factor - 1)-many trace randomizers per row. With high probability, the two domains can be mostly independent. For interpolation to work, they do have to remain disjoint.

  • Figure out the best trace randomizer domain.
  • Implement the change.
  • Adapt computation of a table's padded height.

Rename “running sum” to “running evaluation”

Currently, the columns dealing with evaluation arguments have “running sum” as part of their identifier. I feel that “running evaluation” is technically more accurate since both a multiplication and an addition are part of the performed operation.

Derive Ext*TableColumn indexing

(Originally posted by @sshine.)

Right now, each *TableColumn's .into() for usize hardcodes enumeration, and the corresponding Ext*TableColumn's .into() for usize does, too, starting with a constant that can go out of sync.

Instead, have two *Column that are simple enums (not sum types), derive their .into()s, and combine them in a third sum type that piggy-backs on the derived instances. This also makes property tests ("for all columns") possible.

The main change here is that deriving enum instances doesn't work for generic sum types, only for plain enum types (sum types without constructor arguments).
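
A minimal Rust sketch of the proposed layout (column names are placeholders; in practice, the number of base columns could be derived, e.g. with a crate like strum, rather than hardcoded):

enum BaseColumn {
    Clk,
    Ip,
    Ci,
}

enum ExtColumn {
    RunningEvaluation,
    RunningProduct,
}

// placeholder; deriving this from BaseColumn rules out going out of sync
const NUM_BASE_COLUMNS: usize = 3;

// the combining sum type piggy-backs on the two plain enums
enum Column {
    Base(BaseColumn),
    Ext(ExtColumn),
}

impl From<Column> for usize {
    fn from(column: Column) -> usize {
        match column {
            Column::Base(base) => base as usize,
            Column::Ext(ext) => NUM_BASE_COLUMNS + ext as usize,
        }
    }
}

fn main() {
    assert_eq!(usize::from(Column::Base(BaseColumn::Ci)), 2);
    assert_eq!(usize::from(Column::Ext(ExtColumn::RunningProduct)), 4);
}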

(Originally issue number 38 in the internal issue tracker.)

Repair U32Table's Permutation Arguments

Currently, none of the 5 running products of the Permutation Arguments between the U32 Table and the Processor Table match up. Repair them.

Loosely relates to #8.

(Originally issue number 37 in the internal issue tracker.)

Replace `assert_vector` by pseudo instruction `eq_vector`

Suggested by @aszepieniec.

By returning the result of the quintuple equality test instead of crashing the VM on inequality, a programmer can test whether a hash digest equals the Merkle root of any Merkle tree, for example the ones making up a Merkle Mountain Range. Other use cases might exist.

If desired, a crash can still be induced by calling already existing instruction assert on the result of eq_vector.

Incorrect padding row derivation in memory tables

Currently, all memory tables (i.e., RAM, OpStack, JumpStack) derive their next padding row directly from their current last row, incrementing the value of column cycle count clk by 1. However, since a memory table gets sorted by memory pointer (ramp, osp, jsp, respectively) first, clk second, this derivation of the padding row is generally incorrect: the highest value of clk and the highest value of the memory pointer do not have to appear in the same row.

The solution that I think is easiest to implement:
Derive the memory tables from the processor table after the processor table has been padded.
