Coder Social home page Coder Social logo

atree's People

Contributors

dependabot[bot] avatar fxamacker avatar jribbink avatar ramtinms avatar savetherbtz avatar supuns avatar turbolent avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

atree's Issues

Implement more benchmarks compatible with benchstat

Implement benchmarks compatible with benchstat and make it similar to Ramtin's benchmarks used for the spreadsheets. E.g. measure 100 ops rather than 1 op, and group by array size.

Additionally:

  • make long-running new benchmarks skip in short mode
  • use PCG random number generator with deterministic mode for benchmark comparisons

slab flag changes

values used for slab flags should be changed to

slab flag (8bits)
- one bit - set if this is a root slab
- one bit - set if it has pointers to other slabs (for fastcopy)
- one bit - set if no limit on the size (external collision group current implementation)

- rest for types  (32 types)
	- [00000] ArrayData		
	- [00001] ArrayMeta
	- [00010] LargeImutableArray (includes large strings) (not used right now)
	- [01000] MapData
	- [01001] MapMeta
	- [01010] LargeMapEntry (key/value pair)
	- [01011] LargeCollisionGroup (External)
	- [11111] Storable (general purpose for external objects for now) (edited) 

We also need to add method on Slab to check the first few bits with bit masking and exposes

  • IsRootOfAnObject()
  • HasPointers()
  • HasSizeLimit()

Add partial deltas map reset during `Commit`

Issue To Be Solved

Merged PR #165 to fix issue #164 disables deltas map reset during Commit. PR #165 fix was rushed to fix testnet issue related to storageUsed.

It causes all delta slabs to be encoded and saved again in the next Commit.

Suggested Solution

We should only keep slabs with empty address in deltas map after Commit (only discard saved slabs).

Delay encoding TypeInfo to enable optimizing integration into Cadence

@turbolent identified this nice optimization on Friday. It was discussed extensively on Slack today.

Basically, computing type info can be expensive. And type info for some maps and arrays aren't necessary because they aren't save to storage.

By deferring the retrieval of type info until encoding, we can also take advantage of Storage.FastCommit which uses multiple goroutines for encoding slabs.

OMT: maybe optimize by using faster hash algo combination, possibly using wyhash, XXH128, etc. as an alternative to SipHash in front of BLAKE3

Maybe we can use a significantly faster hash such as XXH128 without sacrificing much security (as long as we handle collisions) because SipHash relies on its key being secret for security.

XXHash v2 is used by many software. XXHash v3 (XXH3) is substantially faster and was finalized in v0.8 of its reference implementation, so its digest output won't change in the future.

The preferred Go implementation of XXH3 and XXH128 is https://github.com/zeebo/xxh3. Coincidentally, our BLAKE3 library in Go is by the same author (zeebo). zeebo's BLAKE3 library is used by Jean-Philippe Aumasson (the co-creator of BLAKE/BLAKE2/BLAKE3 and SipHash) in multi-party-sig (a Taurus Group SA project).

The API of XXH3 and XXH128 by zeebo doesn't support seeds/keys but we can workaround that by hashing the seed as a prefix in front of the data. The overhead of doing this is offset by XXH3 and XXH128 being much faster than SipHash.

I'm not proposing we implement a bunch of alternatives, but I thought it would be useful to document some MAC/Hash combinations that are possible using OMT.

BASELINE: OMT's hash algo combination is currently:

  1. SipHash128 -- first 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  2. SipHash128 -- next 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  3. BLAKE3 -- first 64 bit of digest, without key
  4. BLAKE3 -- next 64 bit of digest, without key
  5. no more hashing, linear search all collisions

ALT0: Use WyHash for super fast first level, then fall back to SipHash128 and BLAKE3 on collisions:

  1. WyHashv3 -- 64 bit digest, with 64-bit key
  2. SipHash128 -- first 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  3. SipHash128 -- next 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  4. BLAKE3 -- first 64 bit of digest, without key
  5. BLAKE3 -- next 64 bit of digest, without key
  6. no more hashing, linear search all collisions

ALT1: Replace SipHash128 with XXH128 for super fast first two levels:

  1. XXH128 -- first 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  2. XXH128 -- next 64 bit of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  3. BLAKE3 -- first 64 bit of digest, without key
  4. BLAKE3 -- next 64 bit of digest, without key
  5. no more hashing, linear search all collisions

ALT2: Using only 64-bit digests for the first 2 levels, so the total bits remain 256 bits:

  1. XXH128low -- 64-bit digest, with 64 bit seed hashed as a prefix to the data (this allows use of libraries that don't allow seeds/keys)
  2. SipHash -- 64-bit digest, with 128-bit key (not secret for now, hopefully secret key soon)
  3. BLAKE3 -- first 64 bits of digest, without key
  4. BLAKE3 -- next 64 bits of digest, without key
  5. no more hashing, linear search all collisions

ALT3: Use XXH3 first as the fast & happy path, before the baseline:

  1. XXH128low -- 64-bit digest, with 64 bit seed hashed as a prefix to the data (this allows use of libraries that don't allow seeds/keys)
  2. SipHash128 -- first 64 bits of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  3. SipHash128 -- next 64 bits of digest, with 128-bit key (not secret for now, hopefully secret key soon)
  4. BLAKE3 -- first 64 bits of digest, without key
  5. BLAKE3 -- next 64 bits of digest, without key
  6. no more hashing, linear search all collisions

SAT: Optimize rebalancing algorithm

The rebalancing algorithm can be optimized for speed and to reduce number of slabs accessed/loaded.

  • Check right sibling first (instead of left sibling) to see if it has enough data to lend
  • Check left sibling only if (instead of always) the right sibling doesn't have enough data to lend. The benefit from borrowing from the larger sibling was maybe not worth the extra effort of accessing 2 slabs.
  • Move data within the right sibling's underlying array after it lends so it has capacity for future ops

SAT: root slab size can overflow or underflow on Set()

This can happen when new element triggers the slab to split or merge, causing the root slab size to overflow/underflow without adjustment.

We need to check for such a scenario in Array.Set() to adjust the root slab properly.

Add nested array and nested map in stress test

Issue To Be Solved

Add nested array and nested map in stress test for array and map so that stress test supports arbitrary nested structures.

Supporting this may require implementing deep remove for array and map (currently done in Cadence).

Maybe add versioning to slab layouts

It can be useful to serialize version information in case the layout changes in the future.

Consider adding 1 or 2 bytes to hold layout version info.

  • with 2 bytes, we can have major version 0-255 and minor version 0-255.
  • with 1 byte, we either can have:
    • major version 0-15 and minor version 0-15, or
    • major version 0-255 and no minor versions, or
    • make the high bit tell us we're using 2 bytes, etc.

Remove deltas map reset during Commit

During Commit, delta maps are reset so all slabs with empty Address are wiped out.

EDIT: I was too hasty labeling this as "Bug" instead of "Enhancement". This issue arose because Commit was called in the middle of a transaction by another project (logic error). My apologies for the confusion. I relabeled this as "Enhancement".

OMT: maybe optimize integer map keys if feasible

We can potentially skip SipHash for integer key types to gain speed.

Maybe this optimization can be evaluated after OMT is feature complete and we have more test coverage (and fuzzing) in place.

Thanks @ramtinms for mentioning this possible optimization!

Fix unused `prev` field not being updated in some cases by removing it entirely

A new validation check found that in some edge cases, the prev sibling ids can get out of place.

The bug happens when slab A splits and we need to update slab A's old "right" sibling slab B, so slab B's prev can point to the newly split slab.

Fixing this may require loading an extra data slab. However, prev doesn't appear to be used in SAT and OMT. It only appears to be encoded/decoded.

So we can just remove prev since it's unused, which would give us back 16 bytes in each data slab for both SAT and OMT.

Return error from StoredValue() for non-root slabs

Slabs implement Storable interface, which has func StoredValue(SlabStorage) (Value, error).

Only root slabs have enough information to recreate Array/OrderedMap. Non-root slabs should return error from StoredValue function.

This affects SAT and OMT.

OMT: evaluate using fxamacker/circlehash for level 1 hash

SipHash is too slow to use as a level 1 hash.

XXH3 is a very fast alternative but there are some problems with XXH3_64 and XXH3_128.

  • Current version of github.com/zeebo/xxh3 doesn't support seeds. Adding support for seeds in a way that is compatible with XXH3 and XXH128 requires many lines of changes.

  • XXH3_64 failed smhasher tests. We'll focus on XXH128 and XXH128low because they both passed rurban/smhasher. UPDATE: XXH128low fails demerphq/smhasher (Strict Avalanche Criterion).
    UPDATE: some of these issues may have been fixed upstream and not yet updated in rurban/smhasher.

  • XXH128 is bijective for 128-bit input -> 128-bit digest but is not bijective for 64-bit input -> 64-bit digest, which can make collisions faster to find.

Example collision on 64-bit input for XXH128low (low 64 bits of XXH128):
data=3f1d4bb70c38aa9f, digest128={54ca4b07dcfa8704 ffddad0ba4c9ace9}
data=ec524de4468a8921, digest128={ce414f5214c1fa36 ffddad0ba4c9ace9}

One workaround for inputs <= 8 bytes is to prefix the data with a 64-bit key. Unfortunately, this approach using unmodified zeebo/xxh3 is slower than SipHash128 and also introduces an extra allocation.

Another solution is to use fxamacker/xxh128p when it becomes available shortly. It's optimized for using a 64-bit prefix key with XXH3_128. This solution doesn't change the algorithm of XXH3_128, so it's easy to verify results with existing XXH3_128 implementations. Although this solution would be faster than SipHash128, it would be slower than unmodified XXH3_128.

The simplest solution appears to be to use fxamacker/circlehash. It passes every test in smhasher (both rurban/smhasher and demerphq/smhasher), support a uint64 seed, produce a 64-bit digest, is faster than cespare/XXH64, and (for some data sizes) is faster than unseeded XXH3.

The biggest advantage of CircleHash64 would be simplicity and ease-of-audit given our release schedule.

OMT: optimize binary search

Binary search can be faster. Use the same optimization done for SAT, which is to use linear scan if there are insufficient elements to justify the overhead of binary search.

OMT: Storable() is used incorrectly in some edge cases

Storable is used incorrectly if:

  • there's a hash collision, existing key as Storable type is converted to Value and then back to Storable.
  • Set() is used to modify value, key's storable is created but never used.
  • Set() returns error, value's storable is created but never used.

OMT: Add and modify map features for Cadence integration

Definition of Done

  • Export newDefaultDigesterBuilder
    @fxamacker
  • Export storage field of OrderedMap
    @turbolent
  • Refactor ComparableValue to comparator
    @fxamacker
  • Refactor OrderedMap.Set to return existing storable for key and value (not needed)
  • Add PopIterate to OMT to enable optimized Cadence DictionaryValue.DeepRemove
    @fxamacker
  • Iterator for just keys
    @turbolent
  • Iterator for just values
    @turbolent
  • Refactor Hashable interface to HashInputProvider function
    @fxamacker

Linear scanning

			root
		      /      \
		meta1a        meta1b
		/    \	       /    \
         meta2a    meta2b   meta2c   meta2d
	   /  \     /   \     / \      /   \ 			 
	  d1   d2  d3   d4    ...   dn-1    dn

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.