0xpolygonmiden / miden-vm: STARK-based virtual machine. License: MIT License
In some situations it may be useful to get the absolute memory location of a local variable.
For example, let's say we have a procedure for multiplying two 256-bit integers (u256:mul). This procedure expects the memory addresses of operands a and b to be at the top of the stack, and after it performs the multiplication, it leaves the result on the top of the stack (we could have passed operands a and b via the stack as well, but for the purposes of this example, we pass them via memory).
Thus, to invoke u256:mul, the caller needs to put the absolute addresses of a and b onto the stack. But if a and b "live" in the local memory of the caller, it is not clear how to do that (since we don't have an instruction for accessing the value of fmp directly).
We could add assembly instructions to access fmp, and then the caller would be able to compute the absolute address, but I wonder whether having a more specialized instruction would be more convenient. I'm thinking of something like this (though open to other name/format suggestions):
push.env.locaddr.i
where i is the index of the local. This instruction would read the fmp, subtract i from it, and push the result onto the stack.
The only downside of this I can think of is that it would enable potential abuse of fmp, where a procedure could return the absolute address of one of its locals (which would be a logical error).
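The proposed semantics can be sketched with a minimal Python model (the fmp value of 1024 is made up, and `locaddr` is just an illustration of the proposal, not VM code):

```python
# Sketch of the proposed push.env.locaddr.i instruction: read fmp,
# subtract the local's index i, and push the result onto the stack.
def locaddr(fmp: int, i: int) -> int:
    """Absolute address of local variable i, given frame pointer fmp."""
    return fmp - i

stack = []
fmp = 1024                       # hypothetical frame pointer value
stack.append(locaddr(fmp, 2))    # push.env.locaddr.2
print(stack)                     # [1022]
```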
There are currently some inconsistencies in how we've implemented hashing operations. Specifically:
- The RPPERM operation, to compute permute(A, B, C), expects the stack to be arranged as [A, B, C, ...] where A is at the top of the stack.
- The rpperm and rphash assembly operations assume that the stack is arranged as [C, B, A, ...] where C is the capacity portion of the sponge and is located at the top of the stack. In the case of rphash, we assume that once RPPERM is executed, the word A will contain the result of the hash.

One option to address this is to simply fix the implementation of RPPERM in the VM. However, it might be a good idea to have the capacity portion of the sponge deep in the stack (the way it is currently arranged for RPPERM). This way, when we compute a linear hash of many elements, we don't need to touch the capacity elements (word C) as we update the outer portion of the sponge (words A and B).
At the same time, the arrangement [A, B, C, ...] is contrary to how we arrange the stack for most other operations and might be confusing. For example, as we compute a linear hash of many elements, we'd first need to populate word A and only then word B with new data - which would be awkward to do since A is at the top of the stack.
The other option is to modify how the Rp64_256 hash function actually works. Specifically, we can change it so that if the state is [A, B, C], then A is the capacity portion, and B is expected to contain the result of the hash after the permutation.
So, to summarize the options:
1. Fix the implementation of the RPPERM operation in the VM. This is the simplest option but would make linear hashes of many elements slightly less efficient.
2. Modify rphash to work with the current implementation of RPPERM. This accepts the awkwardness of the current stack arrangement of RPPERM and also would make computing linear hashes of many elements less efficient.
3. Modify the Rp64_256 hash function, and update RPPERM and related assembly instructions accordingly. The stack would be arranged as: [C, B, A, ...]. This is a more involved option, but perhaps the cleanest and most efficient.

As specified by the assembly operations specs, bitwise u32 operations should fail when given inputs >= 2^32.
Currently, the following operations do not fail correctly:
These errors should be fixed in the Bitwise helper in the processor crate.
Currently, the Hasher component of the processor always builds traces for hash computations from scratch. This happens even when computing hashes of the same values more than once.
We can improve on this by keeping track of the hashes which have already been computed, and just copying the sections of the trace with minimal modifications. Specifically, the only thing that needs to be updated when computing hash for previously hashed values is the row address column of the trace - everything else would remain the same.
This can also be done at a higher level. For example, we could keep track of sections of a trace used for Merkle path verification and then, if the same Merkle path was verified more than once, we can just copy the relevant sections of the trace (again, with minimal modifications).
As discussed in #44 and in #73 we should refactor i/o assembly operations to follow the what.where format and also add new operations for handling memory-based local variables. Overall, the new i/o instruction set should be as follows:
Pushing values onto the stack:
- push.xyz - similar to the current push but should allow pushing multiple values (up to 8). For example, the following should be valid: push.1, push.1.2, push.1.2.0x3 etc.
- pushw.0x... - this should push a single 32-byte value provided in hexadecimal format. Semantically, this should be equivalent to push.a.b.c.d where a is the first 8 bytes of the value, b is the next 8 bytes etc. (although, maybe we should reverse the order?).
- push.env.sdepth
- pushw.mem.addr
- pushw.local.idx where idx is the index of the local variable. Using this instruction outside of a procedure should be an error.
- push.adv.n where n can be up to 8.

Removing values from the stack:
- popw.mem.addr
- popw.local.idx where idx is the index of the local variable. Using this instruction outside of a procedure should be an error.

Overwriting values on the stack:
- loadw.mem.addr
- loadw.local.idx where idx is the index of the local variable. Using this instruction outside of a procedure should be an error.
- loadw.adv

Saving stack values without removing them:
- storew.mem.addr
- storew.local.idx where idx is the index of the local variable. Using this instruction outside of a procedure should be an error.

To enable handling locals, we'll also need to update procedure parsers. The new procedure declaration format should include an optional specifier for the number of memory-based locals the procedure can access. For example:
proc.foo.2 # needs access to 2 memory-based locals
proc.bar # same as proc.bar.0
If code within a procedure tries to access a local outside of the specified bounds, the assembler should return an error.
The VM has several components whose traces should be "stacked" so that they all share the same set of execution trace columns. These components are:
All of these components already implement fill_trace()
methods which should help with the process. The overall procedure for building a trace is as follows:
First, we need to determine the overall length of the execution trace (this would happen when we convert the Process struct into the ExecutionTrace here). This length should be computed as the max of the lengths of all segments of the trace. For the purposes of this issue, we care only about two segments: the stack trace and the auxiliary table trace.
The stack trace length can be retrieved via Stack::trace_length() (which should probably be renamed to trace_len() for consistency). All components of the auxiliary table also expose trace_len(). So, the length should be: max(stack_len, hasher_len + bitwise_len + memory_len). And then we need to pad this length to the next power of two.
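The length computation above can be sketched in a few lines of Python (the function name and the sample lengths are made up for illustration):

```python
def trace_length(stack_len, hasher_len, bitwise_len, memory_len):
    # the trace must cover the longest segment: either the stack trace,
    # or the stacked auxiliary table (hasher + bitwise + memory)
    n = max(stack_len, hasher_len + bitwise_len + memory_len)
    # pad to the next power of two
    return 1 << (n - 1).bit_length()

print(trace_length(100, 30, 20, 10))  # 128
```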
Once we have this length, we should allocate 18 vectors of this length, break them into fragments corresponding to the lengths of each auxiliary component, and use fill_trace() to fill the corresponding segments. We fill the fragments as follows:
- Hasher: values in column 0 are set to ZERO, and the remaining 17 columns contain the actual hasher trace.
- Bitwise: values in column 0 are set to ONE and values in column 1 are set to ZERO. The next 13 columns contain the actual bitwise processor trace. The remaining 3 columns are padded with ZERO.
- Memory: values in columns 0 and 1 are set to ONE, values in column 2 are set to ZERO. The next 14 columns contain the actual memory trace, and the last column is padded with ZERO.
- Padding: values in columns 0, 1, and 2 are set to ONE, and values in the remaining columns are set to ZERO.

Even though the above is described sequentially - we can fill all fragments in parallel.
Also, the Bitwise trace is currently missing two columns (so, currently in the code the number of bitwise columns is 11 rather than 13). These are selector columns which we need to identify the bitwise operation being performed.
Currently, the processor is unaware of the procedure context of an executing program. That is, the processor doesn't know which procedure is being executed, which procedure called the current procedure etc. This limits usefulness of debug info. Thus, it would be very useful if the processor maintained a "procedure stack".
To make the processor aware of which procedures are executing, we could introduce two new decorators. Something like:
- ProcStart - which the assembler would add at the start of each procedure.
- ProcEnd - which the assembler would add right before a procedure returns.

Then, the processor would push a procedure onto a stack when it sees the ProcStart decorator in the instruction stream, and remove a procedure from the stack when it sees the ProcEnd decorator. This should be very lightweight - so, I don't think we should worry about performance implications.
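The proposed mechanism amounts to a plain stack keyed off the two decorators. A minimal sketch (ProcStart/ProcEnd are the proposed decorator names, not an existing API; the procedure names are made up):

```python
# Model of the proposed "procedure stack": push on ProcStart, pop on ProcEnd.
proc_stack = []

def on_decorator(decorator, proc_name=None):
    if decorator == "ProcStart":
        proc_stack.append(proc_name)   # entering a procedure
    elif decorator == "ProcEnd":
        proc_stack.pop()               # returning from the current procedure

on_decorator("ProcStart", "foo")
on_decorator("ProcStart", "bar")   # foo called bar
on_decorator("ProcEnd")            # bar returned
print(proc_stack)                  # ['foo']
```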
As I learned from @dlubarov, it is possible to emulate bitwise AND, OR, XOR operations using just one of these operations and some field element operations. Specifically, assuming we have access to AND, the other two can be described as:
XOR(x, y) = x + y - 2 * AND(x, y)
OR(x, y) = x + y - AND(x, y)
Thus, we could eliminate 2 out of the following 3 operations: U32AND, U32OR, U32XOR. This wouldn't save us much in terms of trace length or constraint complexity, but it would allow us to remove 2 instructions from the instruction set if needed.
If we do go this route, it probably makes sense to keep the XOR operation as the one natively supported by the VM and emulate the other two through it.
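The identities can be checked directly over the integers (for u32 inputs the intermediate values stay well below the field modulus, so field arithmetic never wraps):

```python
def xor_from_and(x, y):
    # XOR(x, y) = x + y - 2 * AND(x, y)
    return x + y - 2 * (x & y)

def or_from_and(x, y):
    # OR(x, y) = x + y - AND(x, y)
    return x + y - (x & y)

# spot-check the identities against Python's native bitwise operators
for x, y in [(0b1100, 0b1010), (7, 0), (0xFFFFFFFF, 0x12345678)]:
    assert xor_from_and(x, y) == x ^ y
    assert or_from_and(x, y) == x | y
```

The identities follow from the per-bit fact that a + b = (a xor b) + 2 * (a and b).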
For most u32 operations the "safe" variants are much more expensive than the "unsafe" variants - frequently at least 5x or even 10x more expensive. This might make the safe variants much less desirable to use. One of the reasons for this is that asserting that a value is a u32 value is emulated by a sequence of 3 operations: U32SPLIT EQZ ASSERT. Thus, for instructions which work with 2 operands, asserting that the operands are 32-bit values requires 7 or 8 operations.
We could make safe operations much more efficient by introducing a U32ASSERT2 operation in the VM. This operation would check whether the top two items on the stack are u32 values and, if either one of them is not, return an error. Thus, we'd be able to do most safe operations at just 2x the cost of the unsafe ones.
From AIR constraint standpoint it shouldn't be too difficult to support this - i.e., this would require 4 16-bit range checks, which we need to support for other operations anyway (e.g. u32 mul).
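The intended behavior of the proposed operation is simple to state as a model (this is only a sketch of the proposal; the function name mirrors the proposed op):

```python
def u32assert2(stack):
    # fail if either of the top two stack items is not a valid u32 value;
    # otherwise leave the stack untouched
    if stack[-1] >= 2**32 or stack[-2] >= 2**32:
        raise ValueError("not a u32 value")

u32assert2([7, 5])   # ok: both values fit in 32 bits
```

This replaces two 3-op U32SPLIT EQZ ASSERT sequences (plus stack shuffling) with a single operation.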
The implementation of parse_u32div in the assembly u32 ops parser currently has 2 errors:
1. It does not reject a zero divisor provided as an immediate parameter (e.g. u32div.0), which means that the operation fails during execution instead of during compilation.
2. It asserts that the remainder of the U32DIV op should be zero, which means that all divisions with a non-zero remainder fail.

Current assembly instructions for memory access deal with words (4 field elements). This creates quite a bit of overhead for the use case when we want to read/write a single field element. The overhead is in both places: the VM itself and the assembly instruction sequences needed to accomplish single-element reads/writes.
Adding dedicated operations to read/write single elements at the VM level is possible, but we should probably do it at a later date, once we have a better picture of how frequently these operations are used. However, we could add instructions at the assembly level to help with this. The instructions I'm thinking of are:
- push.mem.addr - pushes the first element of the word at addr onto the stack.
- push.local.idx - same as above but for a local.
- pop.mem.addr - pops the element off the stack and saves it at addr.
- pop.local.idx - same as above but for a local.

Advice sets (src) are used to provide structured non-deterministic inputs to the VM. Currently, the only concretely implemented advice set is a Merkle tree. But anything that meets the requirements of an advice set interface could also work. The requirements of the interface are roughly as follows:
Some other options for advice sets include:
Now that support for standard library imports has landed in #121, we should start discussing what we want to include in the initial release of stdlib. Here are my preliminary thoughts on the library content:
- math
  - u64 - arithmetic and bitwise operations with 64-bit unsigned integers.
  - u128 - not sure if we need this, maybe skip for v0.1?
  - u256 - needed to support EVM. We should discuss if we want to try to support all operations in the original release.
- crypto
  - hashes
  - sigs (verification only)
    - horst - an instantiation of the HORST signature scheme using the default hash function of the VM. Could be a good way to test out the mtree.get instruction.
    - schnorr - an instantiation of a Schnorr signature using the default curve of the VM. Definitely delay till later.
- evm - EVM-specific built-ins not covered above. Probably defer to future releases.

Anything else I'm missing?
Given that there are already things on the horizon which can go into standard library (e.g., #77), I think it might be a good time to think about how to organize standard library.
I'm inclined to keep things as simple as possible (at least at this point) - so, a few assumptions:
The way I'm currently thinking about it is that the assembler will load the full standard library at instantiation time. For this, we'll need to build a procedure map (similar to the procedure map for local procedures) in the assembler constructor. Then, when a script is being parsed, the assembler can look up a given procedure in this map and inline it as it does with local procedures.
A few preliminary questions to consider:
- Should the standard library live in the assembly crate or in its own crate? I'm inclined to think in its own crate (e.g., stdlib).
- File extension for library modules: .ma (Miden assembly) - but open to other suggestions.
- Should we support use statements similar to Rust? Or should standard library procedures always be referred to by fully qualified names? E.g., exec.crypto:hashes:blake3 vs use std:crypto:hashes:blake3 and then in the code exec.blake3.
.We should add support for u32 operations in v0.2 release according to the specs described here.
This will also require updating VM instruction set to support u32 operations.
The way procedures are parsed and inlined causes extra noop operations to be added and can be optimized.
As noted in the specs, the rules for grouping operations inside of span blocks are such that an operation carrying an immediate value (e.g. push.2) cannot be the first operation in a group. In this case, the VM must insert a noop in front of it.
Currently, it seems that noop insertion is done during procedure parsing and then again during procedure inlining, causing extra unnecessary noop instructions to be added.
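The grouping rule can be modeled in a few lines of Python. This is only a sketch: the group size of 9 is inferred from the first example in this issue (where push(7) as the 10th operation starts a new group), and the first group of a span is assumed exempt, since the compiled examples begin with push operations without a leading noop:

```python
GROUP_SIZE = 9  # assumed: inferred from the examples in this issue

def insert_noops(ops):
    """ops: list of (name, carries_immediate) pairs for a single span."""
    out = []
    for name, has_imm in ops:
        # an immediate-carrying op cannot be the first op of a group;
        # the very first group of the span is assumed exempt
        if has_imm and len(out) > 0 and len(out) % GROUP_SIZE == 0:
            out.append("noop")
        out.append(name)
    return out

ops = [("pad", False)] * 4 + [("drop", False)] * 4 \
    + [("push(3)", True), ("push(7)", True), ("mul", False)]
print(insert_noops(ops))  # a noop lands right before push(7)
```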
Here are some (very contrived) examples to demonstrate:
Assembly:
proc.foo
push.0 push.0 push.0 push.0
drop drop drop drop
push.3 push.7 mul
end
begin
push.2 push.3 add
exec.foo
end
Compiled:
begin
span
push(2) push(3) add
pad pad pad pad
drop drop drop drop
push(3) noop push(7) mul
end
end
The noop
is added during the parsing of the procedure where push(7)
would be the 10th instruction and would therefore be the first instruction in a group. However, once the procedure is inlined, this noop
is no longer necessary, because at this point the second group starts with drop
. The batches in the compiled script seem to regroup the operations correctly such that the push(3)
and push(7)
from the procedure are no longer in different groups, as seen here:
Op Batches:
op_batches: [
OpBatch {
ops:
[Push(BaseElement(2)), Push(BaseElement(3)), Add, Pad, Pad, Pad, Pad, Drop, Drop, Drop, Drop, Push(BaseElement(3)), Noop, Push(BaseElement(7)), Mul],
groups:
[BaseElement(1016745240677138690), BaseElement(2), BaseElement(3), BaseElement(241055074062), BaseElement(3), BaseElement(7), BaseElement(0), BaseElement(0)]
}]
Assembly:
proc.foo.3
push.0 push.1
storew.local.2
add
push.0
loadw.local.2
mul
push.0 push.0 push.0 drop drop
end
begin
push.4 push.3 push.2 push.1
exec.foo
end
Compiled:
begin
span
push(4) push(3) push(2) pad incr
push(3) fmpupdate
pad pad incr
push(2) neg fmpadd storew
add
pad
noop push(2) neg fmpadd loadw
mul
pad pad pad drop drop
noop push(3) neg fmpupdate
end
end
In this example, the first noop
is added on line 9 during procedure parsing where the 1st operation of the second group of the procedure operations is push(2)
(the first VM instruction for loadw.local.2
). Once the procedure is inlined, the placement of this noop
is no longer necessary. However, once all operations are combined, the second noop
needs to be added, since, after the inlining, the push(2)
on line 12 would be the first operation in a group. Notice that without the first unnecessary noop
this second noop
would not be required.
Given that proptest may need different sampling for valid vs. invalid inputs, it makes sense to handle these separately.
As discussed, we've decided to handle error cases in manual tests and only test validity in the randomized tests.
This should be updated in the processor u32 tests here
Maybe invalid cases can be handled in manual tests, and the validity of operations over valid inputs can be tested using randomized tests?
Originally posted by @bobbinth in #75 (comment)
Hi there,
Is there a way to start contributing to Miden ? Excited about the project would love to get involved in some capacity
Best,
Hi there,
Is Miden ready to start developing on for developers or would it be advisable to wait for a later release ? Excited about the project but just wanted to get a sense of how stable the APIs are and what release version is targeted at new developers.
Thanks so much :)
Best,
One of the conditions we currently impose on program execution is that after a program completes, there must be no items left in the stack overflow table (i.e., stack depth must be 16). It is probably a good idea to have the assembler take care of this. Specifically, at the end of the main script body the assembler could include a call to a procedure which makes sure the extra items are dropped from the stack.
This procedure should probably be located in the standard library (maybe in std::sys
module?), and it should ensure the following:
Currently, we have a single Operation type which encodes all operations which can be executed on the VM. These operations fall into two main categories:
- Regular operations which affect the state of the VM (e.g., ADD, MUL etc.)
- Decorators (e.g., DEBUG, ADVICE). As opposed to regular operations, decorators do not advance the clock cycle of the VM, and do not affect the state of the deterministic VM components (e.g., stack, memory).

The idea behind decorators is that we can use this mechanism to provide "hints" to the VM for debugging and other purposes. However, the way decorators are currently implemented leads to a couple of undesired consequences.
First undesired consequence is that decorators are relatively "heavy". For example, the current size of an Operation
is 48 bytes (ideally, it would be just one byte, though this ideal might be difficult to achieve).
Second undesired consequence is that decorators can affect program hash. Specifically, decorators are placed into span blocks and as long as there is at least one regular operation in the same block, all is fine. But if there are no regular operations, we basically have a block which consists of noops, and there is no way to eliminate it. This should happen relatively rarely - but this edge case is possible, and thus we need to handle it.
To sum up: we want to support decorators, but we want to maintain the following properties:
- Decorators should not affect the hash of a program.
- Decorators should not increase the size of the Operation type.

One way to achieve this is to introduce a separate Decorator type, and have a separate data structure which specifies which decorator is to be executed at a given clock cycle (this would be somewhat similar to how we handle this in v0.1). There might be performance implications to this. Specifically, on each cycle we'd need to check if there are decorators to be executed before executing a regular operation. My hope is that these performance implications would be close to negligible.
However, this still doesn't address the second issue. Consider the following example:
while.true
<instructions>
end
debug
while.true
<instructions>
end
The debug
decorator here would have to be located in an empty span block, and this would mean that the hash of the program would be different from exactly the same program but without the debug
decorator.
One potential solution to this is to also put decorators into other blocks (not just span blocks). But this may add a considerable amount of complexity.
The u32div operation processor currently handles division by zero incorrectly. It panics, but instead should return ExecutionError::DivideByZero.
We should also update the relevant u32 tests to expect an error instead of a panic.
There seems to be an issue in one of the 2 following places:
- the implementation of op_u32split
- a lack of documentation about stack requirements for that op

According to the assembly ops specs here, which specify a stack input for u32cast of [a, ...] and a stack output of [b, ...], I expected the following minimal execution test to pass, but it fails with a stack underflow error.
#[test]
fn u32cast() {
let script = compile("begin u32cast end");
let inputs = build_inputs(&[1]);
let trace = execute(&script, &inputs).unwrap();
let last_state = trace.last_stack_state();
let expected_state = build_stack_state(&[1]);
assert_eq!(expected_state, last_state);
}
Essentially what's happening is that when the processor does a u32split
& attempts to set stack[1]
with the low 32-bit values, it panics, because the current (and required) stack depth is only 1. If I add an extra input to the stack then everything works fine, and that input isn't touched.
Relevant backtrace:
stack backtrace:
0: rust_begin_unwind
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
2: miden_processor::stack::Stack::set
at ./src/stack/mod.rs:99:9
3: miden_processor::operations::u32_ops::<impl miden_processor::Process>::op_u32split
at ./src/operations/u32_ops.rs:19:9
This seems to be a bug either in the way the stack depth is managed for operations that output more elements than they take as inputs or a bug in the check_depth
assertion in op_u32split
. What gave me pause was that in the u32split
unit test in the processor you're initializing the stack with 2 values, which suggests that maybe the current behaviour is intentional but wasn't documented yet (or is in a place I haven't seen).
What is supposed to be happening here?
As mentioned in #125, we should add a module for 64-bit unsigned integer operations to the standard library. Initially, it should probably include the following procedures:
- add, sub, mul, div, mod
- eq, eqz, lt, lte, gt, gte
- and, or, xor
- shl - might be difficult without #126
- shr - might be difficult without #126

Also, for most of these (excluding bitwise operations), we probably should do the _unsafe variants.
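As an illustration of what such a module would compute, here is a sketch of u64 addition over two 32-bit limbs with carry propagation (a plain Python model, not stdlib code; limb layout and wrapping behavior are assumptions):

```python
MASK32 = 2**32 - 1

def u64_add(a_limbs, b_limbs):
    """Add two u64 values given as (hi, lo) pairs of u32 limbs, wrapping at 2^64."""
    a_hi, a_lo = a_limbs
    b_hi, b_lo = b_limbs
    lo = a_lo + b_lo
    carry = lo >> 32                       # 0 or 1
    hi = (a_hi + b_hi + carry) & MASK32    # wrap the high limb
    return hi, lo & MASK32

a, b = 0xDEADBEEF_12345678, 0x00000001_FFFFFFFF
hi, lo = u64_add((a >> 32, a & MASK32), (b >> 32, b & MASK32))
assert (hi << 32 | lo) == (a + b) & (2**64 - 1)
```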
We need to implement the debug
operation according to the specs described here. Implementing this operation would require:
Miden VM is currently a stack machine, and while we are fairly far down the path of implementing it as such, it's worth listing pros and cons of different architectures, and taking another look at the alternatives.
I have the following architectures in my consideration set:
With all of the above, we assume that there is also random access memory and that instructions can move data between stack/registers and memory.
The sections below attempt to list pros and cons of each approach, though, the lists are probably not going to be exhaustive.
For example, the MPVERIFY instruction consumes 10 field elements as inputs and the MRUPDATE instruction takes 14 field elements as inputs.
The pseudo-stack machine inherits a lot of the pros and cons of the stack machine, but there are some differences.
- e.g., a DROPW instruction to drop the top 4 items of the stack.
- e.g., SWAPW3, even if we don't have all 16 items on the stack.
- The SWAP instructions make this a tiny bit better.

Here I'm assuming we have a set of general purpose registers (probably between 8 and 16), and instructions can reference any of these registers directly.
- This makes complex instructions (e.g. MPVERIFY) rather complicated.

Since memory access is rather cheap in a zkVM context, we can do away with registers entirely and work directly with memory. As with register machines, I'm assuming that each instruction can read values from one or two memory locations and write the result of the operation into a third memory location.
The VM currently does not support dynamic bitwise shifts. One of the reasons for this is that emulating shifts using other operations is relatively easy when we know statically how many bits to shift by. For example, u32shl is actually computed as PUSH(2^x) MUL U32SPLIT DROP, where x is the number of bits to shift by and 2^x is computed at compile time.
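The static-shift emulation is easy to check over the integers (for a < 2^32 and x <= 31 the product stays below the field modulus, so the field multiplication behaves like integer multiplication):

```python
MASK32 = 2**32 - 1

def u32shl_emulated(a, x):
    # PUSH(2^x) MUL: multiply by 2^x, with 2^x computed at compile time;
    # U32SPLIT DROP: split into high/low 32-bit halves and drop the high half
    return (a * (2 ** x)) & MASK32

assert u32shl_emulated(0x80000001, 1) == 0x00000002  # high bit shifted out
assert u32shl_emulated(3, 4) == 3 << 4
```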
One way to support dynamic shifts of u32 values (which can be used to support dynamic shifts of u256 values) is to compute 2^x in the VM, where x is a value on the stack. The operation could be called U32POW2 (or something similar), and could work as follows:
[x, ... ] -> [2^x, ...]
This, of course, can be computed via repeated multiplication by two, but that would be rather expensive, and we should find a better way of supporting this operation.
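One direction worth exploring is computing 2^x from the bit decomposition of x via square-and-multiply, which needs only about log2(x) multiplications instead of x doublings. A plain Python sketch of the arithmetic (how this would map onto AIR constraints is a separate question):

```python
def u32pow2(x):
    # square-and-multiply: ~log2(x) squarings instead of x doublings
    result, base = 1, 2
    while x > 0:
        if x & 1:
            result *= base   # include this power of base
        base *= base         # square the base each round
        x >>= 1
    return result

assert u32pow2(0) == 1
assert u32pow2(31) == 2**31
```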
One thing to note: even if we do have support for this operation, implementing u256
bitwise shifts is going to be far from trivial. So, other approaches to bitwise shifts may need to be explored.
As mentioned in #65, we might introduce a swapdw (swap double word) operation at the VM level at some point. It might be a good idea to have this operation supported in the assembly even before then. This operation would be useful for working with u256 values (as each u256 value is represented with 8 32-bit limbs).
Semantics of the operation would be as follows:
[a7, a6, a5, a4, a3, a2, a1, a0, b7, b6, b5, b4, b3, b2, b1, b0, ... ] ->
[b7, b6, b5, b4, b3, b2, b1, b0, a7, a6, a5, a4, a3, a2, a1, a0, ... ]
For now, we can implement this using our existing SWAPW
operations.
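One possible SWAPW-only sequence is SWAPW2 SWAPW SWAPW3 SWAPW, where SWAPWn swaps the top word with word n. This can be checked with a small word-level model (the sequence is one working choice, not necessarily the one the assembler would emit):

```python
def swapw(stack_words, n):
    # SWAPWn: swap word 0 (top of stack) with word n
    s = list(stack_words)
    s[0], s[n] = s[n], s[0]
    return s

# swapdw should map [W0, W1, W2, W3] -> [W2, W3, W0, W1]
words = ["A", "B", "C", "D"]
for n in (2, 1, 3, 1):          # SWAPW2 SWAPW SWAPW3 SWAPW
    words = swapw(words, n)
print(words)                    # ['C', 'D', 'A', 'B']
```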
As discussed in the PR implementing parsing of pushw, it may make sense to allow the pushw instruction to take a single 32-byte hex value either instead of or in addition to the current approach of taking 4 inputs of 1 element each.
This is inspired by this contrived but strange example with the current implementation:
let op = Token::new("pushw.1.23.0x1C8.0", 0);
From Bobbin:
"now that I'm seeing how this actually looks, I wonder if we should change the format of pushw instruction to always take a single hexadecimal value of 32 bytes. Or maybe making this as an additional allowed format. So, for example, possible formats could be:
pushw.1.2.3.4
pushw.0x13244... (32 bytes)"
Currently, the assembler handles only a limited set of field operations. We should expand it to support all operations mentioned in the specs.
This would also require updating VM instruction set to support missing operations.
Currently, the assembler does not support comments in the source code. We should implement parsing of comments according to the spec described here.
It would be cool if we had a way to analyze a given script. This would let us make informed judgements about scripts we may have in the standard library. Some interesting statistics that come to mind:
u32add.full
operation x times, and this translated to y VM cycles.Program analysis task list (grjte edit):
We should implement cryptographic operations in v0.2 assembler according to the specs described here.
This will also require updating VM instruction set to support missing operations.
Currently, at the VM level we have a single loadw operation which is used to load data from memory onto the stack. This operation is used as the basis for the loadw.mem and pushw.mem assembly instructions. Specifically, loadw.mem translates directly to the loadw operation, while pushw.mem pads the stack with zeros before executing loadw.
Thus, loadw.mem requires 1 VM cycle, while pushw.mem requires 6 VM cycles. We can make these a little more balanced by changing the semantics of the loadw processor operation. Specifically, the current stack transition diagram for loadw looks as follows:
[addr, 0, 0, 0, 0, .... ] -> [a3, a2, a1, a0, ...]
where addr is the memory address from which to load a word, and a0, a1 etc. are elements of the word located at that address. The net result is that the stack is shifted to the left by one.
With the new approach we could do the following:
[addr, 0, 0, .... ] -> [a3, a2, a1, a0, ...]
The net result would be that the stack is shifted to the right by one.
With this approach, both loadw.mem and pushw.mem instructions can be translated to 4 VM operations:
loadw.mem: MOVDN2 DROP DROP LOADW
pushw.mem: PAD PAD MOVUP2 LOADW
The net effect is that we reduce the cycle count of pushw.mem by 2 cycles while increasing the cycle count of loadw.mem by 3 cycles. Thus, whether this optimization is worth it should be dictated by the relative prevalence of pushw.mem vs. loadw.mem in real-world programs.
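The two proposed translations can be checked with a small stack model (the memory contents, address, and placeholder stack values are made up):

```python
MEM = {5: [13, 12, 11, 10]}   # hypothetical word stored at address 5

def loadw(stack):
    # proposed LOADW semantics: [addr, x, y, ...] -> [a3, a2, a1, a0, ...]
    # (consumes addr plus two elements, pushes a 4-element word)
    return MEM[stack[0]][:] + stack[3:]

def loadw_mem(stack):                                    # MOVDN2 DROP DROP LOADW
    stack = [stack[1], stack[2], stack[0]] + stack[3:]   # MOVDN2
    stack = stack[2:]                                    # DROP DROP
    return loadw(stack)

def pushw_mem(stack):                                    # PAD PAD MOVUP2 LOADW
    stack = [0, 0] + stack                               # PAD PAD
    stack = [stack[2], stack[0], stack[1]] + stack[3:]   # MOVUP2
    return loadw(stack)

# loadw.mem overwrites the top 4 values under addr; pushw.mem pushes the word
assert loadw_mem([5, 9, 9, 9, 9, 42]) == [13, 12, 11, 10, 42]
assert pushw_mem([5, 42]) == [13, 12, 11, 10, 42]
```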
As described in the assembly specs, Miden VM has word-addressable read-write memory. There are a few load
and store
instructions which are responsible for moving data between the memory and the stack. These instructions take memory address from the top of the stack.
From code organization standpoint, Miden assembly supports procedures and (will support) functions. The difference between the two is as follows:
In the context of managing local variables, we have the following situations to consider:
Handling local variables for simple procedures is straightforward - all variables live on the stack and the procedure doesn't need to allocate memory for storing locals. Similarly, handling local variables for functions is straightforward: each function gets its own memory and can place variables wherever it sees fit.
However, for complex procedures which can't fit all locals onto the stack, it is not immediately clear where they should store locals in memory to make sure one procedure doesn't overwrite the locals of another procedure. There are several approaches to handle this.
We can do something similar to what the EVM does: have a convention for putting the "free memory pointer" (fmp
) somewhere in memory (in our case, probably at address 0x0) and then use this free memory pointer to indicate where a procedure could put its locals.
This is the simplest approach in terms of implementation (we don't need to change anything at the VM level), but it would introduce quite a bit of runtime overhead. Basically, each time we need to read or write a local variable to memory, we'd need to read and possibly overwrite fmp
value. Thus, in the worst case, each access to a local variable may require 2 - 3 memory accesses and a bunch of instructions in-between. Overall, I'd expect moving local variables between stack and memory to take up dozens of VM cycles with this approach.
We can still store fmp
at address 0x0, but we would modify the semantics of load and store operations to implicitly read from this address, add the value at the top of the stack to the fmp
and that would be the memory address for the operation. Basically, the memory address would be computed as: mem[0] + stack[0]
, but this would be done implicitly.
This would reduce the number of cycles we'd have to expend, but we would still need to read/write fmp from memory, which would add rows to the memory trace.
Option 3: we can have a dedicated register for `fmp` and would modify the semantics of load and store instructions to compute addresses as `fmp + stack[0]` (basically, the top of the stack provides an offset relative to `fmp`). The value of `fmp` would be set to `0` at the start of the program.
The advantage of this approach is that we drastically cut down on the number of cycles consumed by load and store operations, and each load/store still translates to a single memory operation. The downsides are twofold:
- We'd need a dedicated register for `fmp` which would only be used for the purposes of load/store operations. I don't think this is a big deal.
- To set (and possibly get) the value of `fmp` we would need to have 1 or 2 dedicated instructions.

The reason I said possibly for getting the value of `fmp` is because we might be able to get away with just setting the value using relative offsets. For example, if a procedure needed to allocate 3 words of memory for its locals it could do something like this:
push.3 setfmp

This would be equivalent to `fmp ← fmp + 3`. And then the procedure could store its locals at addresses `fmp`, `fmp - 1`, and `fmp - 2`.
Once the procedure is done, it could reset the value of `fmp` as follows:

push.3 neg setfmp

This would be equivalent to `fmp ← fmp - 3`. We could also change the format of the `push` instruction to accept negative values to make this a bit more ergonomic.
Simple procedures which don't need to store locals in memory won't need to touch the value in `fmp` at all.
One major drawback of options 2 and 3 is that managing absolute memory pointers becomes very tricky (and maybe even impossible). So, probably more thinking is needed here.
Hi,
I was reading the Miden Assembly specification's RAM addressing modes.
In the context of function-local memory and absolute index-based memory addressing, I was wondering how to handle the scenario depicted below.
# Imagine I have the following Miden Assembly code, where I'd like to use the local memory
# ( reserved while invoking `fn0` ) in the `fn1` procedure, which is invoked from `fn0`
#
# i ) But the following program instead accesses memory locations by absolute indices [ not desired ]
# ii ) If I declare `fn2` instead of `fn1`, then it reserves another 2 VM words and lets me work with those [ not desired ]
#
# Question: How do I make `fn{1,2}` use the local memory reserved while invoking their parent function `fn0` ?
proc.fn2.2
pushw.local.0
pushw.local.1
end
proc.fn1.0
pushw.mem.0
pushw.mem.1
end
proc.fn0.2
push.3.2.1.0
push.7.6.5.4
popw.local.1
popw.local.0
exec.fn1
end
begin
exec.fn0
end
The sys_ops file in the processor is mis-commented - it has i/o and push comments instead.
I'd be happy to fix if you want, but I know there's a lot of active work in the processor right now.
As described in the u32 operations note, these operations need to rely on a set of helper registers. This is not yet implemented because the helper registers will actually be located in the decoder, which is currently not implemented.
Some of the helper registers are needed to perform 16-bit range checks. And while we can't yet populate these registers with the needed values, we can make lookups into the `RangeChecker` for these values (this can be done via the `RangeChecker::add_value()` method).
At a high level, there are two things we should do:
1. Add lookups for the needed values into the `RangeChecker`. It might make sense to have some helper functions to handle this rather than duplicating this lookup code in every operation handler.
2. This will result in the `RangeChecker` trace being populated. We should integrate this trace into the overall execution trace similar to how we did this for co-processor execution traces.

For debugging purposes, it would be great to have a way to execute a program one cycle at a time, and to be able to read the state of the VM at every cycle. One way to do this is described below:
First, we could create a separate function in the `processor` crate (where the `execute()` function is defined) to handle step-by-step execution. This function could be called `execute_iter()` and it would return an iterator over process states (we'll also need to define a struct which represents a process state).
Internally, `execute_iter()` could do the same thing as `execute()` but instead of converting a `Process` into an `ExecutionTrace`, it would iterate over the data contained in the process and return it for a given cycle. Basically, we would execute the program to the end (or until there is an error), and then start reading generated states one-by-one as a part of the iteration process. This would be relatively "wasteful", but I think that for debug purposes it's fine.
As mentioned above, the `Process` struct already contains most of the historical data that we'd need (e.g., it keeps track of how memory changed with every cycle). There are two things for which it keeps only the latest state (rather than all historical states):
I think we can tackle the advice provider part at a later time, but it would be good to come up with a way to keep historical states of the stack overflow table. This will probably be relatively heavy - so, this functionality should be optional. Meaning, we'd use it only when executing programs from the `execute_iter()` context, but not when using the regular `execute()` context.
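To make the shape of this concrete, here is a hedged Rust sketch of what an iterator over recorded process states could look like. `VmState`, `StateIter`, and this version of `execute_iter()` are all hypothetical stand-ins, not the actual `processor` crate API:

```rust
// Hedged sketch: VmState, StateIter, and execute_iter are hypothetical
// stand-ins, not the actual miden-vm processor API.
#[derive(Debug, Clone)]
pub struct VmState {
    pub clk: usize,      // cycle at which this state was recorded
    pub stack: Vec<u64>, // operand stack contents at that cycle
}

pub struct StateIter {
    states: Vec<VmState>, // all states recorded while executing the program
    pos: usize,
}

impl Iterator for StateIter {
    type Item = VmState;
    fn next(&mut self) -> Option<VmState> {
        let state = self.states.get(self.pos).cloned();
        self.pos += 1;
        state
    }
}

// Stand-in for execute_iter(): the real function would run the program to
// completion (recording a state per cycle) before returning the iterator.
pub fn execute_iter(states: Vec<VmState>) -> StateIter {
    StateIter { states, pos: 0 }
}
```

The key design point is that iteration happens only after execution finishes, which is what makes the approach "wasteful" but simple.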
Currently, some test modules take input arrays in order while other modules require the inputs to be in the order they would be in after being pushed onto the stack, which is the reverse order.
We should make the order of input arrays consistent across all tests. Specifically, all test input arrays should be provided with data in order (not stack order), and we can update the test helper functions as required.
Changes are required in the following test modules:
All processor integration tests (not unit tests):
We should implement stack manipulation operations in v0.2 assembler according to the specs described here.
The basic idea is that we already have parsers stubbed out for stack manipulation instructions here and we also have the VM instructions for stack manipulation defined here. Now, we need to map each assembly instruction to one or more VM instructions.
For example, on the assembly side, we have `movupw.n` instructions but there are no such instructions on the VM side. However, we can use `SWAPW` instructions to simulate `movupw.n` instructions as follows:

movupw.2 -> SWAPW SWAPW2
movupw.3 -> SWAPW SWAPW2 SWAPW3
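A quick way to sanity-check these mappings is to simulate the `SWAPW*` operations on a 4-word stack. In this hedged sketch, each word is reduced to a single integer label, index 0 is the top of the stack, and all function names are made up for illustration:

```rust
// Hedged sketch: simulate SWAPW-style word swaps on a 4-word stack to verify
// the movupw mappings above. Index 0 is the top of the stack.
fn swapw(stack: &mut [u32; 4], n: usize) {
    // SWAPW (n = 1), SWAPW2 (n = 2), SWAPW3 (n = 3): swap the top word
    // with the word at index n
    stack.swap(0, n);
}

// movupw.2 -> SWAPW SWAPW2
fn movupw2(stack: &mut [u32; 4]) {
    swapw(stack, 1);
    swapw(stack, 2);
}

// movupw.3 -> SWAPW SWAPW2 SWAPW3
fn movupw3(stack: &mut [u32; 4]) {
    swapw(stack, 1);
    swapw(stack, 2);
    swapw(stack, 3);
}
```

Running the simulation confirms that the target word moves to the top while the relative order of the remaining words is preserved.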
For additional references see how parsing of field operations is implemented here.
The shift and rotation operations in the assembly parsers are enforcing parameter bounds on the bit shift value of 1 <= b <= 31, but according to the assembly specs the parameter value b = 0 should be allowed.
This should be fixed in `parse_u32shl`, `parse_u32shr`, `parse_u32rotl`, and `parse_u32rotr` in the assembly ops parsers.
As discussed in PR #40 , we want tests to be checking for exact errors according to Rust best practices.
TL;DR: there are different options and no perfect consensus, so we should make an opinionated decision
It seems like Rust actually doesn't handle error testing as ergonomically as most other things, so there isn't an opinionated "right answer." Most errors in Rust implement the PartialEq trait, which allows using assert_eq! in tests to check exact errors. However, a few kinds of errors don't, most notably std::io::Error. See the reference discussions below for more detailed reasoning, but it can be summarized by "It's an unfortunate ergonomic loss that io::Result instances can't be equated, but indicating that io::Error has an implementation of PartialEq is somewhat of a lie because the only type of error that can be equal is two OS errors"
The info above suggests that either deriving PartialEq (where possible) or manually implementing it for custom error types and then using `assert_eq!` is an acceptable idiomatic option (as recommended in the referenced Stack Overflow discussion as well).
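As a hedged illustration of that approach, a custom error type can derive `PartialEq` so tests can assert exact errors with `assert_eq!`. The `ParseError` enum and this toy `parse_u32div` body are made-up examples, not the assembler's real definitions:

```rust
// Hedged sketch: a custom error type deriving PartialEq so tests can use
// assert_eq! for exact error checks. ParseError and this parse_u32div body
// are made-up examples, not the assembler's real definitions.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingParam(String),
    InvalidOp(String),
}

fn parse_u32div(op: &str) -> Result<(), ParseError> {
    if op.is_empty() {
        // the exact variant and payload can now be asserted on in tests
        return Err(ParseError::MissingParam(op.to_string()));
    }
    if !op.starts_with("u32div") {
        return Err(ParseError::InvalidOp(op.to_string()));
    }
    Ok(())
}
```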
Use `should_panic` or return a `Result<T, E>` from the test
The book presents 2 options:
- `should_panic` with an expectation string
- returning a `Result<T, E>` from the test

This is possibly the "best practice" given that it's in the book, but it's not always fine-grained enough & many people seem dissatisfied with it. Additionally, both options mean that there will be strictly one error checked per unit test, which makes sense with the idea of a "unit test" but could cause a blowup in the number of tests, which may or may not be desirable (needs an opinionated decision).
`should_panic` can be improved upon by using `catch_unwind`, which "allows you to put the code you expect to panic into a closure, and only panics emitted by that code will be considered expected (i.e. a passing test)." See the second SO reference for a discussion & links to examples of this approach being used in the Rust standard library. This approach cannot catch non-unwinding panics.
Final note: using `should_panic` can give different output depending on how it's executed. "As you can see, the output of the second should_panic test case contains a backtrace. However, if you run your tests with the simple cargo test <test_name> command, the output of the both test cases are the same. So, be careful when you configure your CI/CD system!" (see the reference from Yury Zhauniarovich)
Use the `assert_matches` crate
We could use this crate to test pattern matching. It gives more control than the second option without the overhead of implementing PartialEq. This was mentioned in the Rust discussions referenced below, but I haven't seen it mentioned otherwise.
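For comparison, the std `matches!` macro gives similar pattern-based checks without any extra crate or a `PartialEq` impl. This is a hedged sketch with a made-up `VmError` type and `pop_from` function:

```rust
// Hedged sketch: pattern-based error checks via the std `matches!` macro,
// with no PartialEq impl required. VmError and pop_from are made up.
#[derive(Debug)]
enum VmError {
    StackUnderflow { depth: usize },
}

fn pop_from(depth: usize) -> Result<(), VmError> {
    if depth == 0 {
        return Err(VmError::StackUnderflow { depth });
    }
    Ok(())
}
```

The `assert_matches!` macro from the crate works the same way but prints the mismatched value on failure, which is its main ergonomic advantage over a bare `assert!(matches!(..))`.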
unit testing errors (PartialEq):
https://stackoverflow.com/questions/57234140/how-to-assert-io-errors-in-rust
testing with should_panic (catch_unwind):
https://stackoverflow.com/questions/26469715/how-do-i-write-a-rust-unit-test-that-ensures-that-a-panic-has-occurred
article on ins & outs of testing errors:
https://zhauniarovich.com/post/2021/2021-01-testing-errors-in-rust/
We currently put all of the integration tests into the processor crate. This may not be the best place for them as they test a bunch of things besides the validity of processor execution (e.g., assembly compilation, correctness of stdlib procedures). In the future, we'll probably add more integration tests which go beyond just the processor (e.g., testing the proof generation/verification process).
It might be a good idea to move integration tests into a separate crate to make the overall project structure cleaner and more intuitive.
As mentioned in the comment here, the current implementation of the `u32testw` assembly instruction parser (`parse_u32testw()`) produces an incorrect instruction sequence. We could fix it as suggested in the comment, but there might be a better way.
In general, we want `u32testw` to leave `1` at the top of the stack if all 4 top stack elements are `u32` values and `0` otherwise. Thus, we can test if a given element is a `u32` (leaving `1` at the top of the stack if it is) and then use the `AND` operation to combine these values. Thus, we don't need to have `NOT` operations after `EQZ`, and we can replace the `OR` operations with `AND`. This should also reduce the cycle count needed for this operation by 4.
So, for example, to test the first element, we'd do:
span_ops.push(Operation::Dup3);
span_ops.push(Operation::U32split);
span_ops.push(Operation::Swap);
span_ops.push(Operation::Drop);
span_ops.push(Operation::Eqz);
Instead of the current:
span_ops.push(Operation::Dup3);
span_ops.push(Operation::U32split);
span_ops.push(Operation::Swap);
span_ops.push(Operation::Drop);
span_ops.push(Operation::Eqz);
span_ops.push(Operation::Not);
And for the second, we could do:
span_ops.push(Operation::Dup3);
span_ops.push(Operation::U32split);
span_ops.push(Operation::Swap);
span_ops.push(Operation::Drop);
span_ops.push(Operation::Eqz);
span_ops.push(Operation::And);
Instead of the current:
span_ops.push(Operation::Dup3);
span_ops.push(Operation::U32split);
span_ops.push(Operation::Swap);
span_ops.push(Operation::Drop);
span_ops.push(Operation::Eqz);
span_ops.push(Operation::Not);
span_ops.push(Operation::Or);
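The boolean logic behind this change can be checked with a hedged Rust model: an element is a `u32` iff its high 32 bits are zero (which is what `EQZ` on the high limb of `U32SPLIT` computes), and AND-combining the four flags yields the desired result directly, with no `NOT`/`OR` needed. `is_u32` and `u32testw` here are illustrative stand-ins, not parser code:

```rust
// Hedged model of the proposed u32testw logic; is_u32 and u32testw are
// illustrative stand-ins, not parser code.
fn is_u32(v: u64) -> u64 {
    // U32SPLIT + EQZ on the high limb: 1 if the element fits in 32 bits
    if v >> 32 == 0 { 1 } else { 0 }
}

fn u32testw(word: [u64; 4]) -> u64 {
    // AND-combine the four flags: 1 only if every element is a u32
    word.iter().fold(1, |acc, &v| acc & is_u32(v))
}
```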
Tests added in #117 demonstrate how local procedure memory currently overwrites memory from the outer context.
let script = compile(
"
proc.bar.2
pop.local.0
pop.local.1
end
begin
pop.mem.0
pop.mem.1
pop.mem.2
pop.mem.3
exec.bar
push.mem.3
push.mem.2
push.mem.1
push.mem.0
end",
);
let inputs = [1, 2, 3, 4, 5, 6, 7];
test_script_execution(&script, &inputs, &[7, 6, 5, 4, 1]);
Instead of the expected final stack looking like `[7, 6, 5, 4, 1, ...]`, the resulting stack is `[7, 2, 3, 4, 1, ...]`, indicating that absolute memory at addresses 1 and 2 was overwritten by the locals at indices 1 and 0 respectively.
It looks like the reason for this is that `fmp` is only ever updated in the `op_fmpupdate` function, so local procedure memory for procedures called directly from the main context is always offset from 0.
In `validate_op_len` we return `AssemblyError::invalid_op` when any part of the instruction is missing. Examples of this would be:
- calling `parse_adv` with "adv", which is missing the `push.n` or `loadw` portion
- calling `parse_push` with ""

In many of the parsers for the u32 operations (and possibly other places) we return `AssemblyError::missing_param(op)` in these cases. An example of this is calling `parse_u32div` with the op "".
Obviously the case with the u32 ops is trivial, but for consistency I think we should pick one error to use in all equivalent cases when the base instruction is malformed. Thoughts?
Currently, comments in Miden assembly must be surrounded by `#` characters. Thus, even for single-line comments we must write:

# this is a comment #

It is a bit annoying that we need to terminate single-line comments with `#`. The main motivation for doing so was ease of parsing, and in fact parsing this style of comments is very easy (they are stripped out from the source code as implemented here). But maybe there are fairly simple ways to strip out comments which terminate with a newline instead of `#` as well?
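Newline-terminated comments are indeed simple to strip. As a hedged sketch (not the assembler's implementation), treating everything from `#` to the end of the line as a comment takes only a few lines:

```rust
// Hedged sketch, not the assembler's implementation: treat everything from
// `#` to the end of the line as a comment, so no closing `#` is needed.
fn strip_line_comments(source: &str) -> String {
    source
        .lines()
        .map(|line| line.split('#').next().unwrap_or("").trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Note that the existing `# ... #` style would also be stripped by this rule, since the closing `#` simply falls inside the comment.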
We should implement input/output operations according to the specs described here.
This will also require updating VM instruction set to support missing operations.