Comments (9)
One moment when reading I got hope for lazy arrays and dictionaries. ( wishful reading )
But then I thought why not have that as it also covers this case. I spent very little time on atree but maybe this gives some ideas so I am sharing. ( It can be totally stupid, in that case sorry in advance to take your time )
What if what we have something like Hydrate
method on Value
. Normally everything will be dehydrated
( something like pointer with type ) but user of atree ( in this case Cadence ) hydrate values ( something like pointer dereference ) when needed.
We will be able to navigate all object tree with dehydrated values, hydrate something if we need.
In this use case Cadence can iterate and check if value is hydrated.
from atree.
@bluesign Great points and explanation!
atree's data structures (arrays and ordered maps) are basically already lazy. Loading an array or map from storage only loads their root node, and only once elements are accessed, those accessed elements are loaded, and nothing else.
The "dehydrated" elements are Storable
s, the "hydrated" elements/values are Value
s, the "hydrate" function is Storable.StoredValue() Value
, and the "dehydrate" function is Value.Storable() Storable
.
The "problem" is that the current iteration functionality operates on high-level/"hydrated" Value
s, instead of low-level/"dehydrated" Storable
s.
Offering iteration functionality on Storable
level would be a good first step towards avoiding loading values / dehydration. However, what is proposed here goes one step further: it would be even more efficient to not even consider Storable
s in a container that were never read before / loaded into memory (what Faye is referring to above as gaps in the tree).
from atree.
@bluesign Thanks for great points! I think Bastian's reply covered everything.
@turbolent Thanks for explanation and links to Cadence issue+PR for more context.
Would it be possible to add such a feature? Would such an approach work?
I think this feature is possible but we might need a different approach than suggested to iterate loaded elements.
Currently iterator traverses data slabs using next storage ID to avoid loading more metadata slabs. So it doesn't handle "gaps" of unloaded data slabs. We need a different iterator to traverse data slabs from parent slabs.
To help me prioritize, when do we need this feature?
from atree.
Thanks @fxamacker for looking into this!
Yeah, like Bastian mentioned, I've been looking into alternatives, but apart from the one in the original PR onflow/cadence#2460 (which is inefficient), other alternatives are turning out to be mostly dead-ends.
And yes, it is not a blocker for the moment, but would be a blocker for releasing Stable Cadence. The existing PR (onflow/cadence#2460) would only need a very little change once this is available I believe (meaning that there are no big changes waiting on this feature).
Let me know if you need more context on the use-case/how this would be used, I can maybe set up a quick call to walk through an example.
from atree.
Hi @turbolent
What is the use case for iterating over just the loaded elements?
It'll be helpful to know more context (why needed & how used) because some edge cases come to mind:
- What if the loaded element is
StorageID
pointing to another slab? - What if there is gap between loaded elements? For example, there are 3 data slabs for an array and only first and last slabs are loaded.
from atree.
@fxamacker Great questions!
What is the use case for iterating over just the loaded elements?
This idea / feature request is related to onflow/cadence#2460.
For example, imagine a large container (array or dictionary) is loaded from storage, and only one or a few elements are accessed. When the container is moved, we need to potentially perform some operations for elements of the container (in particular, if references where taken to elements, those need to be invalidated).
In that case we know though that it was only possible to create a reference if the element was first loaded, i.e. we can ignore elements that have not been loaded before.
What if the loaded element is StorageID pointing to another slab?
If a storage ID storable (a storable that points to another storable stored in another slab) is encountered, it should only be followed if the targeted slab was previously loaded.
For example, imagine a Cadence array of resources. The array is backed by an atree array, and the resources in it are backed by atree ordered maps, the array is only directly storing storage ID storables.
If a reference is created to the second element, only it gets loaded into memory, the other elements are not.
Thus, when iterating, only the storage ID storable for the loaded slab ID should be followed (i.e. result in a load via StoredValue
, and iterator callback).
What if there is gap between loaded elements? For example, there are 3 data slabs for an array and only first and last slabs are loaded.
As far as I understand, the same logic applies for iteration/loading of internal slabs: I think only the loaded data slabs have to be considered, because by definition, a load of an element that is stored by a data slab in the "gap" should have caused a load of that data slab (and its parents in the tree).
from atree.
I think this feature is possible but we might need a different approach than suggested to iterate loaded elements.
Currently iterator traverses data slabs using next storage ID to avoid loading more metadata slabs. So it doesn't handle "gaps" of unloaded data slabs. We need a different iterator to traverse data slabs from parent slabs.
Nice!
To help me prioritize, when do we need this feature?
It is not urgent, but would be "needed" for the Stable Cadence release, as it features reference invalidation on resource invalidation (move/deletion; see linked Cadence PR). @SupunS is looking into alternative solutions, but it feels like this would be the "most efficient", in terms of avoiding the addition of a new bookkeeping mechanism to Cadence for this purpose.
from atree.
Quick update. I told @j1010001 this should take about 3-5 days total (most of that for writing tests). Will try to open PR this week (most likely tomorrow if there are no surprises or emergencies because its going much faster than expected).
from atree.
Sounds great @fxamacker! This isn't urgent BTW, we only need it for releasing Stable Cadence, which is several months out
from atree.
Related Issues (20)
- Update smoke tests for atree inlining HOT 1
- Add custom maprange linter to GitHub Actions from Cadence
- Add more integration tests for inlining HOT 1
- Deduplicate data when encoding nested Cadence composites to reduce memory and storage HOT 1
- Smoke tests need to check recently added data deduplication feature HOT 1
- Resolve issues found during Cadence integration for Atree inlining and deduplication HOT 1
- Add more integration tests HOT 1
- Determine if atree needs to support edge case related to iterating maps being mutated HOT 1
- Add feature to support mutation of primitive values returned by iterators HOT 1
- Handle edge cases needed to support mutation of inlined maps and arrays during iteration HOT 1
- Additional memory and storage savings are possible by deduplicating Cadence dictionary type info HOT 1
- Pin dependencies in CodeQL workflow
- Allow updating static types of `Array` HOT 1
- Allow updating static types of `OrderedMap` HOT 1
- Add feature to enable migration programs to fix references to non-existent register HOT 1
- Create optimized commit and preload for migrations HOT 1
- Fix the migration feature that filters unreferenced slabs
- SlabIterator should include nested storage ID (non-inlining feature branch) HOT 1
- Allow detection and logging on mutation of elements returned by readonly iterators HOT 2
- PersistentSlabStorage cache and deltas optimization HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from atree.