Comments (6)
> That makes sense, though I guess what I was wondering is if that changes at all with element-addressable memory vs. word-addressable memory. In other words, would that change just make memory look element-addressable, but preserve how memory loads/stores work in the VM today (i.e. word-oriented, with some changes to the constraints as you described)? Or would it also involve reworking the implementation of memory in the VM so that it is element-oriented rather than word-oriented?
The approach I described above would still keep the internal mechanics of memory word-oriented. So, basically, it would just look element-addressable.
> Obviously I'm looking at this through a very narrow lens, so the importance of doing things like memory accesses in word-sized units might have far-reaching implications I'm not thinking about, but in an element-addressable world, it seems out of place to implement memory accesses in terms of words, particularly if `mem_load`/`mem_store` would require multiple helper registers that go essentially unused (beyond their role in the constraint system). It would obviously still be important to enforce word-alignment for `mem_loadw`/`mem_storew`, but I'm surprised it would be necessary to have the memory itself be word-oriented, if the word-alignment restriction ensures that there is always a full word at a given address.
The word-oriented design was chosen primarily for efficiency reasons. For the use cases we are targeting, it will be very common to read/write memory as full words rather than individual elements. For example, nodes in a Merkle tree are words, the output of the RPO hash function is a word (and the inputs are 3 words), assets in the Miden rollup are words, etc. So, we frequently need to read or write a full word (or even two consecutive words), and I wanted to make this as efficient as possible.
Consider `mem_stream`, for example. This instruction reads two consecutive words and overwrites the top 8 elements of the stack with the result in a single cycle. Implementing this instruction with element-addressable memory would not be possible in our current design (or we'd need to add extra columns, which would slow down the rest of the VM). And emulating this instruction with individual loads would probably require ~30 cycles.
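To make the word-oriented mechanics concrete, here is a minimal sketch in Rust (hypothetical types and names, not the VM's actual implementation) of a memory where even a single-element read fetches the full word and routes the remaining elements to helper registers:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the VM's field element and word types.
type Felt = u64;
type Word = [Felt; 4];

// Word-addressable memory: every access operates on a full word.
#[derive(Default)]
struct Memory {
    words: BTreeMap<u64, Word>,
}

impl Memory {
    // mem_loadw-style access: read the whole word at `waddr`.
    fn load_word(&self, waddr: u64) -> Word {
        *self.words.get(&waddr).unwrap_or(&[0; 4])
    }

    // mem_load-style access: even for a single element, the full word is
    // read; the other 3 elements would go into helper registers in the VM.
    fn load_element(&self, waddr: u64, idx: usize) -> (Felt, [Felt; 3]) {
        let word = self.load_word(waddr);
        let mut helpers = [0; 3];
        let mut j = 0;
        for (i, &v) in word.iter().enumerate() {
            if i != idx {
                helpers[j] = v;
                j += 1;
            }
        }
        (word[idx], helpers)
    }

    fn store_word(&mut self, waddr: u64, word: Word) {
        self.words.insert(waddr, word);
    }
}
```

Under this model a `mem_stream`-style operation is just two consecutive `load_word` calls, which is why keeping memory word-oriented makes bulk word access cheap.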
from miden-vm.
Thank you for such a detailed proposal! Overall, I think this would add very nice additional capability to the VM (i.e., having fully element-addressable memory could be useful in many circumstances). A few preliminary thoughts:
Implementation within the VM
I think implementing this within the VM will be quite tricky - but I also haven't spent a lot of time on this - so maybe there are good solutions which I haven't considered yet.
The main issue is that we actually always have to read a full word from a memory address. It is just that for `mem_load` we put the other 3 elements into helper registers (see here). So, to select a specific element from the word, we'd need to do the following:
- Put all 4 elements into helper registers.
- Split the address into 2 parts - this would require putting the lower limb into another helper register.
- Represent the upper limb with 2 binary registers (e.g., `[0, 0]` means 0, `[0, 1]` means 1, etc.). Then we'd be able to use these registers as selectors for selecting a specific value.
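As a sketch of how two binary registers can act as selectors (plain integers here rather than field elements; the function name and index encoding are illustrative assumptions), the selected value is a sum of degree-2 selector products times the word's elements - which is what pushes the populating constraint to degree 3:

```rust
// Hypothetical sketch: select one element of a word using two binary
// selector registers (s1, s0), where the element index is 2*s1 + s0.
fn select(word: [u64; 4], s1: u64, s0: u64) -> u64 {
    assert!(s0 <= 1 && s1 <= 1, "selectors must be binary");
    // Each selector product is degree 2 in (s1, s0); multiplying by the
    // value makes the overall selection expression degree 3.
    (1 - s1) * (1 - s0) * word[0]
        + (1 - s1) * s0 * word[1]
        + s1 * (1 - s0) * word[2]
        + s1 * s0 * word[3]
}
```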
Summing up the above: we'd need 7 helper registers, but currently we have only 6. I have some thoughts on how we can increase the number of helper registers to 8, but this would require a fairly significant refactoring of the decoder.
Another potential complication is that the selectors would be of degree 2, and thus, the constraint for populating the top stack slot would be of degree 3. This means that `MLOAD` and `MSTORE` operation flags cannot have degree greater than 6. This shouldn't be a problem as we currently have some empty slots for degree 5 operations. But this is still something to consider for the future.
Instruction semantics
If I understood everything correctly, the semantics you propose would be useful when addresses are static (or known at compile time). But for dynamic addresses, we'd still run into problems. For example, if we wanted to iterate over a sequence of elements in a loop, we wouldn't be able to easily increment memory address. So, it seems to me the usefulness is relatively limited.
One other idea is to treat memory as if it were element-addressable, and then impose alignment requirements on `mem_loadw` and `mem_storew` operations. So, for example:
- `mem_load.0` would load the first element of the first word;
- `mem_load.1` would load the second element of the first word;
- `mem_load.4` would load the first element of the second word, etc.
- `mem_loadw.0` would load the first word, `mem_loadw.4` would load the second word, but `mem_loadw.1` would be illegal.
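A rough sketch of the addressing rules this scheme implies (hypothetical helpers, not VM code): element-level operations accept any address, while word-level operations must be aligned to a multiple of 4:

```rust
// mem_load-style access: any element address is valid; it maps to a
// (word address, index within word) pair.
fn element_location(addr: u64) -> (u64, usize) {
    (addr / 4, (addr % 4) as usize)
}

// mem_loadw-style access: the address must point at the start of a word,
// otherwise the operation is illegal.
fn word_location(addr: u64) -> Result<u64, String> {
    if addr % 4 == 0 {
        Ok(addr / 4)
    } else {
        Err(format!("unaligned word access at address {addr}"))
    }
}
```

Under these rules `element_location(4)` lands on the first element of the second word, while `word_location(1)` is rejected, matching the examples above.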
I think this would be much cleaner, but there are two main downsides to this approach:
- We still need to solve the implementation problem I described above.
- This would probably break most programs which use memory operations.
> If I understood everything correctly, the semantics you propose would be useful when addresses are static (or known at compile time). But for dynamic addresses, we'd still run into problems. For example, if we wanted to iterate over a sequence of elements in a loop, we wouldn't be able to easily increment memory address. So, it seems to me the usefulness is relatively limited.
I don't think that's necessarily true, since the element offsets for all 32-bit addresses are constant, i.e. adding `0 << 32`, `1 << 32`, `2 << 32`, or `3 << 32` to any address gives you the "paired" address for the given element of that word.
If you wanted to traverse every element of `n` sequential words, completely dynamically, you'd do something like the following (using pseudocode):
```rust
let base_addr: *const Word = /* the address in memory of the first word; doesn't have to be constant */;
for i in 0..n {
    let word_offset: u64 = i / 4;
    let element_offset: u64 = (i % 4) << 32;
    let word_addr: u64 = base_addr as u64 + word_offset;
    let element_addr: u64 = word_addr | element_offset;
    let element = *(element_addr as *const Element);
    // ...do stuff with element...
}
```
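The address arithmetic in that loop can be checked concretely with plain integers standing in for pointers (`element_addrs` is a hypothetical helper added for illustration):

```rust
// Compute the element addresses visited when traversing n sequential
// elements starting at a word-addressed base, with the element index
// carried in the upper 32 bits of the address.
fn element_addrs(base_addr: u64, n: u64) -> Vec<u64> {
    (0..n)
        .map(|i| {
            let word_offset = i / 4;
            let element_offset = (i % 4) << 32;
            let word_addr = base_addr + word_offset;
            word_addr | element_offset
        })
        .collect()
}
```

Note how the low 32 bits advance only once every 4 iterations, while the high bits cycle through the 4 element offsets of each word.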
That's useful in quite a few scenarios, certainly for those in which I was intending to make use of the proposed extension.
> One other idea is to treat memory as if it was element-addressable, and then impose alignment requirements on `mem_loadw` and `mem_storew` operations
I'm definitely a fan of this, as it is much cleaner right from the get-go, and is similar to how, on say x86-64, certain instructions impose additional alignment requirements on their operands (e.g. SSE2 instructions), though as you mentioned it breaks backwards-compatibility. That said, I'm not sure to what extent the memory ops are used currently, so perhaps it isn't too catastrophic. I would think memory gets used pretty frequently, for precomputed constant tables and things like that, but I have no idea.
> We still need to solve the implementation problem I described above.
Is that true if addresses now refer to elements not words? I would think in that case, there is no longer a need for an offset, leaving things unchanged at the instruction level. Is it because the alignment requirements that would be imposed on `mem_loadw` and `mem_storew` require additional helper registers, much in the same way my proposal would?
Having native element-addressable memory support at the VM level seems the superior option to me. There are a number of instances in the transaction kernel which would benefit from this functionality.
> If you wanted to traverse every element of `n` sequential words, completely dynamically, you'd do something like the following (using pseudocode):
I didn't write out the actual assembly code for this, but just doing a rough estimation, I think we'd incur an extra 6 - 7 cycles per element read in this code. This is better than 9 - 10 cycles per read using `padw, push.ADDR, mem_loadw, drop, drop, swap, drop` - but it is not dramatically better. So, there is still a question in my mind whether the extra effort is worth a ~30% improvement.
> I'm not sure to what extent the memory ops are used currently, so perhaps it isn't too catastrophic.
Quite a few modules currently use memory operations, but almost all of them are written (and maintained) by us, and most of them are pretty small. There are two exceptions:
- Transaction kernel, which is fairly complex and relies heavily on memory. Memory accesses there are fairly well encapsulated, and @frisitano believes that it shouldn't be too difficult to move to the new memory model.
- Recursive verifier, which is also fairly complex and also relies on memory quite a bit. @Al-Kindi-0 also believes that we could migrate it to the new memory model (though the effort may be more significant than with the transaction kernel).
> Is that true if addresses now refer to elements not words?
No, addresses right now refer to words, and we can read/write from/to memory only in full words. The way we get around this for the `mem_load` and `mem_store` instructions is by putting the things which don't end up on the stack into helper registers. So, we still read (or write) a full word, but part of the word goes into (or comes from) the helper registers.
As a side note, now that I look at the constraints of the `mem_store` operation, it seems to me that there might be a soundness bug there (i.e., a malicious prover could inject incorrect values into the temporary registers, and overwrite the other 3 elements at a given address with arbitrary values). @Al-Kindi-0 - what do you think? If so, we might have to move back to the approach we used previously, when each memory row in the memory chiplet contained both old and new values.
Coming back to the design for the new `mem_load` and `mem_store` operations, their transition diagrams could look like so:
and
In the above:
- $addr$ is the element address.
- $waddr$ is the word address computed as $waddr = addr / 4$ (integer division).
- $s_0$ and $s_1$ are a binary encoding of the element index within a word. Thus, we can impose a constraint for valid address decomposition as follows: $waddr \cdot 4 + 2 \cdot s_1 + s_0 = addr$. In addition to this, we'll need constraints to enforce that $s_0$ and $s_1$ are binary, and that $addr$ is a u32 value (not sure about the best way to do this yet).
- $val_0, ..., val_3$ are the values currently stored at $waddr$.
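This decomposition can be sanity-checked with plain integers (a sketch, not constraint code; `decompose` is a hypothetical helper):

```rust
// Decompose an element address into a word address and two binary
// selectors, then check the constraint waddr * 4 + 2 * s1 + s0 == addr.
fn decompose(addr: u32) -> (u64, u64, u64) {
    let waddr = (addr / 4) as u64;
    let idx = (addr % 4) as u64; // element index within the word
    let (s1, s0) = (idx / 2, idx % 2); // binary encoding of the index
    // The constraint tying the decomposition together:
    assert_eq!(waddr * 4 + 2 * s1 + s0, addr as u64);
    (waddr, s1, s0)
}
```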
In the case of `MLOAD`, the constraint for the top stack element would use $s_0$ and $s_1$ as selectors to pick the appropriate value, something like: $v_0' = (1 - s_1) \cdot (1 - s_0) \cdot val_0 + (1 - s_1) \cdot s_0 \cdot val_1 + s_1 \cdot (1 - s_0) \cdot val_2 + s_1 \cdot s_0 \cdot val_3$.
The constraints for `MSTORE` seem to be somewhat more complicated - so, I'll need to think more about them. But in either case, we need to have at least 1 extra helper register.
> ...I think we'd incur extra 6 - 7 cycles per element read in this code. This is better than 9 - 10 cycles per read...but it is not dramatically better. So, there is still a question in my mind whether the extra effort is worth ~30% improvement.
I agree that the savings are relatively minimal, though I do think it's a matter of frequency - if such instructions occur frequently, the cycles saved start to add up fast for a program with any significant runtime. That said, the design here is a compromise for the sake of backwards-compatibility, and is really more about ergonomics/uniformity (whether the assembly is hand-written or compiler-generated). Overall though, I much prefer the approach of making memory element-addressable in general, since that seems like an option we can actually consider. The difference in cycle count for equivalent memory accesses between the two models would be night and day I think, since it removes a lot of complexity around accessing individual elements. It does impose a bit more overhead when accessing words, but only in situations where the alignment of the address is not known, and must be checked at runtime. In practice, accessing values smaller than a word in memory is going to be far more common than >= word-sized values (at least for code compiled through Wasm), so favoring those operations is going to have a bigger performance impact.
> No, addresses right now refer to words and we can read/write from/to memory only full words. The way we get around this for mem_load and mem_store instructions is by putting the things which don't end up on the stack into helper registers. So, we still read (or write) a full word, but part of the word goes into (or comes from) the helper registers.
That makes sense, though I guess what I was wondering is if that changes at all with element-addressable memory vs. word-addressable memory. In other words, would that change just make memory look element-addressable, but preserve how memory loads/stores work in the VM today (i.e. word-oriented, with some changes to the constraints as you described)? Or would it also involve reworking the implementation of memory in the VM so that it is element-oriented rather than word-oriented?
Obviously I'm looking at this through a very narrow lens, so the importance of doing things like memory accesses in word-sized units might have far-reaching implications I'm not thinking about, but in an element-addressable world, it seems out of place to implement memory accesses in terms of words, particularly if `mem_load`/`mem_store` would require multiple helper registers that go essentially unused (beyond their role in the constraint system). It would obviously still be important to enforce word-alignment for `mem_loadw`/`mem_storew`, but I'm surprised it would be necessary to have the memory itself be word-oriented, if the word-alignment restriction ensures that there is always a full word at a given address.
Anyway, I want to avoid derailing the conversation here just because I don't have a solid grasp of how some of these pieces fit together in the VM internals, so if I'm missing obvious stuff here (or, obvious if you know the internals well), feel free to ignore the above. I'm certainly curious about the relationship between the memory instructions and their implementation in the VM (and how that affects the constraints), but I imagine it's probably been discussed/documented somewhere already, and I just need to go do some reading.