Coder Social home page Coder Social logo

py-trie's Introduction

Python Implementation of the Ethereum Trie structure

Join the conversation on Discord Build Status PyPI version Python versions

This library and repository was previously located at pipermerriam/py-trie. It was transferred to the Ethereum foundation GitHub in November 2017 and renamed to py-trie.

Installation

python -m pip install trie

Developer Setup

If you would like to hack on py-trie, please check out the Snake Charmers Tactical Manual for information on how we do:

  • Testing
  • Pull Requests
  • Documentation

We use pre-commit to maintain consistent code style. Once installed, it will run automatically with every commit. You can also run it manually with make lint. If you need to make a commit that skips the pre-commit checks, you can do so with git commit --no-verify.

Development Environment Setup

You can set up your dev environment with:

git clone [email protected]:ethereum/py-trie.git
cd py-trie
virtualenv -p python3 venv
. venv/bin/activate
python -m pip install -e ".[dev]"
pre-commit install

Running the tests

You can run the tests with:

git submodule update --init --recursive
pytest tests

Release setup

To release a new version:

make release bump=$$VERSION_PART_TO_BUMP$$

How to bumpversion

The version format for this repo is {major}.{minor}.{patch} for stable, and {major}.{minor}.{patch}-{stage}.{devnum} for unstable (stage can be alpha or beta).

To issue the next version in line, specify which part to bump, like make release bump=minor or make release bump=devnum. This is typically done from the main branch, except when releasing a beta (in which case the beta is released from main, and the previous stable branch is released from said branch).

If you are in a beta version, bumpversion stage will switch to a stable.

To issue an unstable version when the current version is stable, specify the new version explicitly, like bumpversion --new-version 4.0.0-alpha.1 devnum

Usage

>>> from trie import HexaryTrie
>>> t = HexaryTrie(db={})
>>> t.root_hash
b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
>>> t.set(b'my-key', b'some-value')
>>> t.get(b'my-key')
b'some-value'
>>> t.exists(b'another-key')
False
>>> t.set(b'another-key', b'another-value')
>>> t.exists(b'another-key')
True
>>> t.delete(b'another-key')
>>> t.exists(b'another-key')
False

You can also use it like a dictionary.

>>> from trie import HexaryTrie
>>> t = HexaryTrie(db={})
>>> t.root_hash
b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
>>> t[b'my-key'] = b'some-value'
>>> t[b'my-key']
b'some-value'
>>> b'another-key' in t
False
>>> t[b'another-key']  = b'another-value'
>>> b'another-key' in t
True
>>> del t[b'another-key']
>>> b'another-key' in t
False

Traversing (inspecting trie internals)

>>> from trie import HexaryTrie
>>> t = HexaryTrie(db={})
>>> t.root_hash
b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
>>> t[b'my-key'] = b'some-value'
>>> t[b'my-other-key']  = b'another-value'

# Look at the root node:
>>> root_node = t.traverse(())
>>> root_node
HexaryTrieNode(sub_segments=((0x6, 0xd, 0x7, 0x9, 0x2, 0xd, 0x6),), value=b'', suffix=(), raw=[b'\x16\xd7\x92\xd6', b'\xb4q\xb8h\xec\x1c\xe1\xf4\\\x88\xda\xb4\xc1\xc2n\xbaw\xd0\x9c\xf1\xacV\xb4Dk\xa7\xe6\xd7qf\xc2\x82'])

# the root node is an extension down, because the first 7 nibbles are the same between the two keys

# Let's walk down to the child of that extension
>>> prefix6d792d6 = t.traverse(root_node.sub_segments[0])
>>> prefix6d792d6
HexaryTrieNode(sub_segments=((0xb,), (0xf,)), value=b'', suffix=(), raw=[b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', [b' ey', b'some-value'], b'', b'', b'', [b' ther-key', b'another-value'], b''])

# A branch node separates the second nibbles of b'k' and b'o': 0xb and 0xf
# Notice the position of the children in the 11th and 15th index

# Another way to get there without loading the root node from the database is using traverse_from:
>>> assert t.traverse_from(root_node, root_node.sub_segments[0]) == prefix6d792d6

# Embedded nodes can be traversed to the same way as nodes stored in the database:

>>> t.traverse(root_node.sub_segments[0] + (0xb,))
HexaryTrieNode(sub_segments=(), value=b'some-value', suffix=(0x6, 0x5, 0x7, 0x9), raw=[b' ey', b'some-value'])

# This leaf node includes the suffix (the rest of the key, in nibbles, that haven't been traversed,
# just b'ey': 0x6579

Walking a full trie

To walk through the full trie (for example, to verify that all node bodies are present in the database), use HexaryTrieFog and the traversal API above.

For example:

>>> from trie import HexaryTrie
>>> t = HexaryTrie(db={})
>>> t.root_hash
b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
>>> t[b'my-key'] = b'some-value'
>>> t[b'my-other-key']  = b'another-value'
>>> t[b'your-key'] = b'your-value'
>>> t[b'your-other-key'] = b'your-other-value'
>>> t.root_hash
b'\xf8\xdd\xe4\x0f\xaa\xf4P7\xfa$\xfde>\xec\xb4i\x00N\xa3)\xcf\xef\x80\xc4YU\xe8\xe7\xbf\xa89\xd5'

# Initialize a fog object to track unexplored prefixes in a trie walk
>>> from trie.fog import HexaryTrieFog
>>> empty_fog = HexaryTrieFog()
# At the beginning, the unexplored prefix is (), which means that none of the trie has been explored
>>> prefix = empty_fog.nearest_unknown()
()

# So we start by exploring the node at prefix () -- which is the root node:
>>> node = t.traverse(prefix)
HexaryTrieNode(sub_segments=((0x6,), (0x7,)), value=b'', suffix=(), raw=[b'', b'', b'', b'', b'', b'', b"\x03\xd2vk\x85\xce\xe1\xa8\xdb'F\x8c\xe5\x15\xc6\n+M:th\xa1\\\xb13\xcc\xe8\xd0\x1d\xa7\xa8U", b"\x1b\x8d'\xb3\x99(yX\xaa\x96C!\xba'X \xbb|\xa6,\xb5V!\xd3\x1a\x05\xe5\xbf\x02\xa3fR", b'', b'', b'', b'', b'', b'', b'', b'', b''])
# and mark the root as explored, while defining the unexplored children:
>>> level1fog = empty_fog.explore(prefix, node.sub_segments)
# Now the unexplored prefixes are the keys starting with the four bits 6 and the four bits 7.
# All other keys are known to not exist (and so have been explored)
>>> level1fog
HexaryTrieFog<SortedSet([(0x6,), (0x7,)])>

# So we continue exploring. The fog helps choose which prefix to explore next:
>>> level1fog.nearest_unknown()
(0x6,)
# We can also look for the nearest unknown key to a particular target
>>> prefix = level1fog.nearest_unknown((8, 1))
(0x7,)
>>> node7 = node.traverse(prefix)
HexaryTrieNode(sub_segments=((0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6),), value=b'', suffix=(), raw=[b'\x00\x96\xf7W"\xd6', b"\xe2\xe2oN\xe1\xf8\xda\xc1\x8c\x03\x92'\x93\x805\xad-\xef\x07_\x0ePV\x1f\xb5/lVZ\xc6\xc1\xf9"])
# We found an extension node, and mark it in the fog
# For simpliticy, we'll start clobbering the `fog` variable
>>> fog = level1fog.explore(prefix, node7.sub_segments)
HexaryTrieFog<SortedSet([(0x6,), (0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6)])>

# Let's explore the next branch node and see what's left
>>> prefix = fog.nearest_unknown((7,))
(0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6)
>>> node796f75722d6 = t.traverse(prefix)
HexaryTrieNode(sub_segments=((0xb,), (0xf,)), value=b'', suffix=(), raw=[b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', [b' ey', b'your-value'], b'', b'', b'', [b' ther-key', b'your-other-value'], b''])
# Notice that the branch node inlines the values, but the fog and annotated node ignore them for now
>>> fog = fog.explore(prefix, node796f75722d6.sub_segments)
HexaryTrieFog<SortedSet([(0x6,), (0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb), (0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xf)])>

# Index keys may not matter for some use cases, so we can leave them out
#   entirely, like nearest_unknown().
# There's one more feature to consider: we can look directionally to the right
#   of an index for the nearest prefix.
>>> prefix = fog.nearest_right((0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xc))
(0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xf)
# That same index key would give a closer prefix to the left if direction didn't matter
#   (See the difference in the very last nibble)
>>> fog.nearest_unknown((0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xc))
(0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb)
# So we traverse to this last embedded leaf node at `prefix`
>>> a_leaf_node = t.traverse(prefix)
HexaryTrieNode(sub_segments=(), value=b'your-other-value', suffix=(0x7, 0x4, 0x6, 0x8, 0x6, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb, 0x6, 0x5, 0x7, 0x9), raw=[b' ther-key', b'your-other-value'])
# we mark the prefix as fully explored like so:
>>> fog = fog.explore(prefix, a_leaf_node.sub_segments)
HexaryTrieFog<SortedSet([(0x6,), (0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb)])>
# Notice that sub_segments was empty, and the prefix has disappeared from our list of unexplored prefixes

# So far we have dealt with an un-changing trie, but what if it is
#   modified while we are working on it?
>>> del t[b'your-other-key']
>>> t[b'your-key-rebranched'] = b'your-value'
>>> t.root_hash
b'"\xc0\xcaQ\xa7X\x08E\xb5"A\xde\xbfY\xeb"XY\xb1O\x034=\x04\x06\xa9li\xd8\x92\xadP'

# The unexplored prefixes we have before might not exist anymore. They might:
#   1. have been deleted entirely, in which case, we will get a blank node, and need no special treatment
#   2. lead us into the middle of a leaf or extension node, which makes things tricky
>>> prefix = fog.nearest_unknown((8,))
(0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb)
>>> t.traverse(prefix)
TraversedPartialPath: Could not traverse through HexaryTrieNode(sub_segments=((0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb, 0x6, 0x5, 0x7, 0x9),), value=b'', suffix=(), raw=[b'\x19our-key', b'f\xbe\x88\x8f#\xd5\x15-8\xc0\x1f\xfb\xf7\x8b=\x98\x86 \xec\xdeK\x07\xc8\xbf\xa7\x93\xfa\x9e\xc1\x89@\x00']) at (0x7,), only partially traversed with: (0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb)

# Let's drill into what this means:
#   - We fully traversed to a node at prefix (7,)
#   - We tried to traverse into the rest of the prefix
#   - We only got part-way through the extension node: (0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb)
#   - The extension node full sub-segment is actually: (0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb, 0x6, 0x5, 0x7, 0x9)

# So what do we do about it? Catch the exception, and explore with the fog slightly differently
>>> from trie.exceptions import TraversedPartialPath
>>> last_exception = None
>>> try:
      t.traverse(prefix)
    except TraversedPartialPath as exc:
      last_exception = exc

# We can now continue exploring the children of the extension node, by using an attribute on the exception:
>>> sub_segments = last_exception.simulated_node.sub_segments
((0x6, 0x5, 0x7, 0x9),)
# Note that this sub-segment now carries us the rest of the way to the child
#   of the node that we only partially traversed into.
# This "simulated_node" is created by slicing the extension node in two: the
#   first extension node having the path that we (partially) traversed, and the second
#   extension node being the child of that parent, which continues on to point to
#   the child of the original extension.
# If the exception is raised on a leaf node, then the leaf node is sliced into
#   an extension and another shorter leaf node.
>>> fog = fog.explore(prefix, sub_segments)
HexaryTrieFog<SortedSet([(0x6,), (0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb, 0x6, 0x5, 0x7, 0x9)])>

# So now we can pick up where we left off, traversing to the child of the extension node, and so on.
>>> prefix = fog.nearest_unknown((8,))
(0x7, 0x9, 0x6, 0xf, 0x7, 0x5, 0x7, 0x2, 0x2, 0xd, 0x6, 0xb, 0x6, 0x5, 0x7, 0x9)
# The following will not raise a TraversedPartialPath exception, because we know that
#   a node was at the path, and the trie hasn't changed:
>>> t.traverse(prefix)
HexaryTrieNode(sub_segments=((0x2,),), value=b'your-value', suffix=(), raw=[b'', b'', [b'=rebranched', b'your-value'], b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'your-value'])

# etc...

Note: traverse() will access the database for every node from the root to the target node. If navigating a large trie, consider using TrieFrontierCache and HexaryTrie.traverse_from() to minimize database lookups. See the tests in tests/test_hexary_trie_walk.py for some examples.

BinaryTrie

Note: One drawback of Binary Trie is that one key can not be the prefix of another key. For example, if you already set the value value1 with key key1, you can not set another value with key key or key11 and the like.

BinaryTrie branch and witness helper functions

>>> from trie import BinaryTrie
>>> from trie.branches import (
>>>     check_if_branch_exist,
>>>     get_branch,
>>>     if_branch_valid,
>>>     get_witness_for_key_prefix,
>>> )
>>> t = BinaryTrie(db={})
>>> t.root_hash
b"\xc5\xd2F\x01\x86\xf7#<\x92~}\xb2\xdc\xc7\x03\xc0\xe5\x00\xb6S\xca\x82';{\xfa\xd8\x04]\x85\xa4p"
>>> t.set(b'key1', b'value1')
>>> t.set(b'key2', b'value2')

Now Trie looks like this:

    root --->  (kvnode, *common key prefix*)
                         |
                         |
                         |
                    (branchnode)
                     /         \
                    /           \
                   /             \
(kvnode, *remain kepath*)(kvnode, *remain kepath*)
            |                           |
            |                           |
            |                           |
  (leafnode, b'value1')       (leafnode, b'value2')
>>> # check_if_branch_exist function
>>> check_if_branch_exist(t.db, t.root_hash, b'key')
True
>>> check_if_branch_exist(t.db, t.root_hash, b'key1')
True
>>> check_if_branch_exist(t.db, t.root_hash, b'ken')
False
>>> check_if_branch_exist(t.db, t.root_hash, b'key123')
False
>>> # get_branch function
>>> get_branch(t.db, t.root_hash, b'key1')
(b'\x00\x82\x1a\xd9^L|38J\xed\xf31S\xb2\x97A\x8dy\x91RJ\x92\xf5ZC\xb4\x99T&;!\x9f\xa9!\xa2\xfe;', b"\x01*\xaccxH\x89\x08}\x93|\xda\xb9\r\x9b\x82\x8b\xb2Y\xbc\x10\xb9\x88\xf40\xef\xed\x8b'\x13\xbc\xa5\xccYGb\xc2\x8db\x88lPs@)\x86v\xd7B\xf7\xd3X\x93\xc9\xf0\xfd\xae\xe0`j#\x0b\xca;\xf8", b'\x00\x11\x8aEL3\x839E\xbd\xc4G\xd1xj\x0fxWu\xcb\xf6\xf3\xf2\x8e7!M\xca\x1c/\xd7\x7f\xed\xc6', b'\x02value1')

Node started with b'\x00', b'\x01' and b'\x02' are kvnode, branchnode and leafnode respectively.

>>> get_branch(t.db, t.root_hash, b'key')
(b'\x00\x82\x1a\xd9^L|38J\xed\xf31S\xb2\x97A\x8dy\x91RJ\x92\xf5ZC\xb4\x99T&;!\x9f\xa9!\xa2\xfe;',)
>>> get_branch(t.db, t.root_hash, b'key123') # InvalidKeyError
>>> get_branch(t.db, t.root_hash, b'key5') # there is still branch for non-exist key
(b'\x00\x82\x1a\xd9^L|38J\xed\xf31S\xb2\x97A\x8dy\x91RJ\x92\xf5ZC\xb4\x99T&;!\x9f\xa9!\xa2\xfe;',)
>>> # if_branch_valid function
>>> v = t.get(b'key1')
>>> b = get_branch(t.db, t.root_hash, b'key1')
>>> if_branch_valid(b, t.root_hash, b'key1', v)
True
>>> v = t.get(b'key5') # v should be None
>>> b = get_branch(t.db, t.root_hash, b'key5')
>>> if_branch_valid(b, t.root_hash, b'key5', v)
True
>>> v = t.get(b'key1')
>>> b = get_branch(t.db, t.root_hash, b'key2')
>>> if_branch_valid(b, t.root_hash, b'key1', v) # KeyError
>>> if_branch_valid([], t.root_hash, b'key1', v) # AssertionError
>>> # get_witness_for_key_prefix function
>>> get_witness_for_key_prefix(t.db, t.root_hash, b'key1') # equivalent to `get_branch(t.db, t.root_hash, b'key1')`
(b'\x00\x82\x1a\xd9^L|38J\xed\xf31S\xb2\x97A\x8dy\x91RJ\x92\xf5ZC\xb4\x99T&;!\x9f\xa9!\xa2\xfe;', b"\x01*\xaccxH\x89\x08}\x93|\xda\xb9\r\x9b\x82\x8b\xb2Y\xbc\x10\xb9\x88\xf40\xef\xed\x8b'\x13\xbc\xa5\xccYGb\xc2\x8db\x88lPs@)\x86v\xd7B\xf7\xd3X\x93\xc9\xf0\xfd\xae\xe0`j#\x0b\xca;\xf8", b'\x00\x11\x8aEL3\x839E\xbd\xc4G\xd1xj\x0fxWu\xcb\xf6\xf3\xf2\x8e7!M\xca\x1c/\xd7\x7f\xed\xc6', b'\x02value1')
>>> get_witness_for_key_prefix(t.db, t.root_hash, b'key') # this will include additional nodes of b'key2'
(b'\x00\x82\x1a\xd9^L|38J\xed\xf31S\xb2\x97A\x8dy\x91RJ\x92\xf5ZC\xb4\x99T&;!\x9f\xa9!\xa2\xfe;', b"\x01*\xaccxH\x89\x08}\x93|\xda\xb9\r\x9b\x82\x8b\xb2Y\xbc\x10\xb9\x88\xf40\xef\xed\x8b'\x13\xbc\xa5\xccYGb\xc2\x8db\x88lPs@)\x86v\xd7B\xf7\xd3X\x93\xc9\xf0\xfd\xae\xe0`j#\x0b\xca;\xf8", b'\x00\x11\x8aEL3\x839E\xbd\xc4G\xd1xj\x0fxWu\xcb\xf6\xf3\xf2\x8e7!M\xca\x1c/\xd7\x7f\xed\xc6', b'\x02value1', b'\x00\x10O\xa9\x0b\x1c!_`<\xb5^\x98D\x89\x17\x148\xac\xda&\xb3P\xf6\x06[\x1b9\xc09\xbas\x85\xf5', b'\x02value2')
>>> get_witness_for_key_prefix(t.db, t.root_hash, b'') # this will return the whole trie

py-trie's People

Contributors

6ug avatar aashutoshrathi avatar antazoey avatar balajipachai avatar bhargavasomu avatar carver avatar cburgdorf avatar davesque avatar fselmo avatar ftruzzi avatar fubuloubu avatar gsalgado avatar hwwhww avatar kclowes avatar mattermat avatar miohtama avatar mttbernardini avatar nic619 avatar njgheorghita avatar pacrob avatar pckilgore avatar pipermerriam avatar reedsa avatar richardlitt avatar stefanmendoza avatar tmckenzie51 avatar uxio0 avatar wolovim avatar zsaladin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

py-trie's Issues

Python 3.12: no pkg_resources

What happened?

I added support for python 3.12 in this PR: https://github.com/rkvst/rkvst-receipt-scitt/pull/11.

The CICDaction failes because pkg_resources no longer exists.

See https://github.com/rkvst/rkvst-receipt-scitt/actions/runs/6559511687/job/17815261244?pr=11

Code that produced the error

File "/opt/hostedtoolcache/Python/3.12.0/x64/lib/python3.12/site-packages/trie/__init__.py", line 1, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

Full error output

No response

Fill this section in if you know how this could or should be fixed

Use importlib.resources instead.

https://importlib-resources.readthedocs.io/en/latest/migration.html

py-trie Version

2.1.0

Python Version

3.12

Operating System

linux

Output from pip freeze

Run python3 -m pip freeze # required to report issue to py-trie
alabaster==0.7.13
asn1crypto==1.5.1
astroid==3.0.1
attrs==23.1.0
autopep8==2.0.4
Babel==2.13.0
black==23.10.0
boolean.py==4.0
build==1.0.3
CacheControl==0.13.1
cbor2==5.4.6
certifi==2023.7.22
certvalidator==0.11.1
cffi==1.16.0
charset-normalizer==3.3.0
click==8.1.7
coverage==7.3.2
cryptography==41.0.4
cyclonedx-python-lib==4.2.3
cytoolz==0.12.2
defusedxml==0.7.1
dill==0.3.7
docutils==0.18.1
ecdsa==0.18.0
eth-hash==0.5.2
eth-typing==3.5.0
eth-utils==2.2.2
filelock==3.12.4
hexbytes==0.3.1
html5lib==1.1
idna==3.4
imagesize==1.4.1
importlib-metadata==6.8.0
isort==5.12.0
jaraco.classes==3.3.0
jeepney==0.8.0
Jinja2==3.1.2
keyring==24.2.0
license-expression==30.1.1
lxml==4.9.3
markdown-it-py==2.2.0
MarkupSafe==2.1.3
mccabe==0.7.0
mdit-py-plugins==0.3.5
mdurl==0.1.2
mistune==2.0.5
more-itertools==10.1.0
msgpack==1.0.7
mypy-extensions==1.0.0
myst-parser==1.0.0
nh3==0.2.14
oscrypto==1.3.0
packageurl-python==0.11.2
packaging==23.2
pathspec==0.11.2
pip-api==0.0.30
pip-requirements-parser==32.0.1
pip_audit==2.6.1
pkginfo==1.9.6
platformdirs==3.11.0
py-serializable==0.11.1
pycodestyle==2.11.1
pycose==1.0.1
pycparser==2.21
pycryptodome==3.19.0
pyenchant==3.2.2
Pygments==2.16.1
pylint==3.0.1
pyparsing==3.1.1
pyproject_hooks==1.0.0
PyYAML==6.0.1
readme-renderer==42.0
requests==2.31.0
requests-toolbelt==1.0.0
rfc3339==6.2
rfc3986==2.0.0
rich==13.6.0
rlp==3.0.0
SecretStorage==3.3.3
six==1.16.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
Sphinx==5.3.0
sphinx-gallery==0.12.2
sphinx-rtd-theme==1.2.2
sphinx_mdinclude==0.5.3
sphinxcontrib-applehelp==1.0.7
sphinxcontrib-devhelp==1.0.5
sphinxcontrib-htmlhelp==2.0.4
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.6
sphinxcontrib-serializinghtml==1.1.9
sphinxcontrib-spelling==8.0.0
toml==0.10.2
tomlkit==0.12.1
toolz==0.12.0
trie==2.1.1
twine==4.0.2
typing_extensions==4.8.0
urllib3==2.0.7
webencodings==0.5.1
xq==0.0.4
zipp==3.17.0

Batch insertion application

What was wrong?

When a trie adds a single value, it must fix up all the nodes it touched along the way. This carries some performance cost, but is necessary for any single update. When applying a batch update, all the intermediate node fixups are wasted work. We could gain some (likely) significant performance benefits from a smarter batch-apply of a bunch of key/value pairs.

How to fix it

Rough concept: walk through the trie with all scheduled insertions, applying them simultaneously.

For example, if two keys from the batch have the same first nibble, then follow the root node down into the extension/leaf/branch node with that nibble, keeping both of those insertions in memory. Then apply them both to the newly created node at the same time.

Look into ways to reduce our DB size

I've just completed a fast sync of blocks/receipts (no state) on ropsten and after that our DB has 28GB, whereas geth's (including state) is at 14GB. We need to find why we're using so much more and fix it. One thing to look into would be changing the lookup keys (e.g 'transaction-hash-to-block') to something shorter

Modernize project structure

This project was made without the use of our standard project template.

https://github.com/carver/ethereum-python-project-template

The following should be cleaned up to closer match our modern project structure.

  • remove requirements-dev.txt in favor of using extras_require pattern.
  • add classifiers to setup.py
  • update the readme to contain the appropriate badgest, title and general structure.
  • update the tox.ini flake8 environment to use the more modern lint name as well as the same approach, installing from extras_require.
  • add python_requires (and exclude python==3.5.2)

Extra credit for finding other tangible differences and fixing them.

Code Refactoring

What was wrong?

The whole code base is a bit hard to read and could be structured better.

How can it be fixed?

The code base can be refactored with the following improvements

  • Make seperate classes for Leaf Nodes, Branch Nodes, Extension Nodes which in turn inherit a common class BaseNode. The class BaseNode should contain the following
    • Type of the Node (Leaf or Branch or Extension)
    • List of all the keys contained in the Node - (List of size 1 for Extension and Leaf nodes. List of size 16 for Branch nodes).
    • value contained in the node (If has nothing, then Null)
    • get_all_children method
    • Node Encode and Decode methods
    • parse_node method
  • Common BaseTrie class should be implemented from which the BinaryTrie and the HexaryTrie should be implemented.
  • Changing the test cases correspondingly to the above changes
  • Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)

The different actions between BinaryTrie and HexaryTrie

  • Version: 1.1.0 or the latest commit (4b78035)
  • Python: 3.6
  • OS: osx

What was wrong?

When I was trying to switch the transaction and receipt trie from HexaryTrie to BinaryTrie for ShardingVM (ethereum/py-evm#331), I found that the APIs results of these two tries are slightly different in dealing not-found-in-dict.

HexaryTrie returns BLANK_NODE

py-trie/trie/hexary.py

Lines 69 to 79 in 4b78035

def _get(self, node, trie_key):
node_type = get_node_type(node)
if node_type == NODE_TYPE_BLANK:
return BLANK_NODE
elif node_type in {NODE_TYPE_LEAF, NODE_TYPE_EXTENSION}:
return self._get_kv_node(node, trie_key)
elif node_type == NODE_TYPE_BRANCH:
return self._get_branch_node(node, trie_key)
else:
raise Exception("Invariant: This shouldn't ever happen")

BinaryTrie returns None

py-trie/trie/binary.py

Lines 57 to 86 in 4b78035

def _get(self, node_hash, keypath):
"""
Note: keypath should be in binary array format, i.e., encoded by encode_to_bin()
"""
# Empty trie
if node_hash == BLANK_HASH:
return None
nodetype, left_child, right_child = parse_node(self.db[node_hash])
# Key-value node descend
if nodetype == LEAF_TYPE:
if keypath:
return None
return right_child
elif nodetype == KV_TYPE:
# Keypath too short
if not keypath:
return None
if keypath[:len(left_child)] == left_child:
return self._get(right_child, keypath[len(left_child):])
else:
return None
# Branch node descend
elif nodetype == BRANCH_TYPE:
# Keypath too short
if not keypath:
return None
if keypath[:1] == BYTE_0:
return self._get(left_child, keypath[1:])
else:
return self._get(right_child, keypath[1:])

What did you expect it to do?

Code to reproduce the error

from trie import (
    BinaryTrie,
    HexaryTrie,
)


hexary_trie = HexaryTrie(db={})
assert b'hello' not in hexary_trie

binary_trie = BinaryTrie(db={})
assert b'hello' not in binary_trie

result:

Traceback (most recent call last):
  File "hello.py", line 16, in <module>
    assert b'hello' not in binary_trie
AssertionError

^^^^ That's false.

I replaced None with BLANK_NODE in trie._get() function, and then the tests in py-evm are passed.

I'm afraid that I may break BinaryTrie, so I wanna check with @NIC619, could I simply replace None with BLANK_NODE in BinaryTrie._get() function?

Use latest snake-charmers tooling

What was wrong?

Not merged with our project template, so missing a lot of the goodies we might like to have in place.

How to fix

Merge in the template, allowing unrelated branches. And fix the invariable flood of problems. Almost certainly disable mypy, for example. Probably want to figure out how to skip downloading the fixtures submodule when building at readthedocs, because it's huge and unnecessary.

Also, add in a note here about how to do that ^ merge for the first time with an existing project:
https://github.com/ethereum/snake-charmers-tactical-manual/blob/master/template.md

Return annotated node from `HexaryTrie.root_node`

All other traversals return an annotated HexaryTrieNode, but trie.root_node just returns the raw body. The inconsistency seems problematic (and annotations would be nice on the root, too!)

TODOs:

  • actually use self.traverse(()) under the hood for root_node
  • change locations that need the raw root node to either use self.root_node.raw or maybe a custom internal method for speed
  • add a breaking change notice to the CHANGELOG

Update binary trie with new data structure

As proposed and discussed in ethresearch, Binary Trie built on Sparse Merkle Tree has the advantage that it's more simple and straight forward(there only exist one node type - branch node, just like plain Binary Tree) while size of merkle proof is roughly the same(or even slightly better) since most subtree in Sparse Merkle Tree would be empty and we can compress these nodes into 1 bit representation.

The cost is that it takes O(log(N)) time to get/set value. In our case(with 2^160 address space) it's 160 nodes to traverse to get to the leaf node, but there should be room for optimization.

Add Python 3.8 support

Should be pretty straightforward: add a tox and circleci job to test python38, and fix any issues that come up.

Handling a TraversedPartialPath exception is awkward

What was wrong?

In an older HexaryTrieFog API, the TraversedPartialPath exception arguments made more sense. Now, they don't really help you use the fog API without a lot of argument fiddling.

What did you expect it to do?

TPP can return a "simulated" node which helps cleanly deal with the case of using the fog to traverse to a path that doesn't exist anymore.

Vendor the part of the fixtures repo we need

What was wrong?

CI takes nearly 1minute checking out the fixtures repo, and only another 1m actually running the tests. We don't use most of the fixtures repo, and the tests should be pretty static.

To-do

Just include the copy the folder/files we want to use for testing as a folder directly in this repo, and drop the submodule altogether. Also drop the submodule checkout step in the circleci config. CI speed should ~double.

Failed to update value with BinaryTrie

  • Version: 1.1.0 or the latest commit (4b78035)
  • Python: 3.6
  • OS: osx

What was wrong?

Failed to update the value with an existing key. (Not every case)

What did you expect it to do?

In this case, it works:

from trie import (
    BinaryTrie,
)


t = BinaryTrie(db={})

t.set(b'\x01\x01', b'value1')
t.set(b'\x01\x02', b'value2')
t.set(b'\x02\x01', b'value3')
t.set(b'\x02\x02', b'value4')

t.set(b'\x01\x01', b'value5')

But another case, it fails:

from trie import (
    BinaryTrie,
)


t = BinaryTrie(db={})

t.set(b'\x01\x00', b'value1')
t.set(b'\x01\x01', b'value2')
t.set(b'\x02\x00', b'value3')
t.set(b'\x02\x01', b'value4')

t.set(b'\x01\x00', b'value5')

Result:

hello.py:17: in <module>
    t.set(b'\x01\x00', b'value5')
trie/binary.py:97: in set
    self.root_hash = self._set(self.root_hash, encode_to_bin(key), value)
trie/binary.py:139: in _set
    if_delete_subtrie
trie/binary.py:177: in _set_kv_node
    if_delete_subtrie,
trie/binary.py:154: in _set
    if_delete_subtrie
trie/binary.py:267: in _set_branch_node
    new_left_child = self._set(left_child, keypath[1:], value, if_delete_subtrie)
trie/binary.py:139: in _set
    if_delete_subtrie
trie/binary.py:177: in _set_kv_node
    if_delete_subtrie,
trie/binary.py:154: in _set
    if_delete_subtrie
trie/binary.py:265: in _set_branch_node
    "Fail to set the value because it's key"
E   trie.exceptions.NodeOverrideError: Fail to set the value because it's key is the prefix of other existing key
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
=========================== 1 error in 0.24 seconds ============================

Code to reproduce the error

^ ditto

Use eth-hash.keccak instead of eth-utils.keccak

What was wrong?

eth_utils.keccak enables hashes of multiple types, including text strings and hex strings. This comes at a small performance cost, but py-trie uses keccak heavily so is fairly sensitive to these costs. py-trie only ever submits bytes for hashing, so the performance penalty is unnecessary.

How to fix it

  • Replace all usages of eth_utils.keccak with eth_hash.auto.keccak.
  • Add eth-hash as a dependency, and eth-hash[pycryptodome] as a testing dependency.

Flat database layout

At some point we have to implement a trie that operates on a flat database layout. It might be appropriate to focus this on the binary trie implementation, if there is enough confidence that will come to pass. Alternatively, implementing it with the hexary trie lets us focus on things we are in control of now, and would likely be somewhat low effort to create a separate binary trie implementation later.

Implement Merkle Proof Generation and Validation API

For working on things like Plasma and Ethereum 2.0, the Merkle Tree data structure is very critical, especially the generation and validation of merkle proofs. It would be beneficial to implement an API to make this easy where generation takes a key and makes a witness (for the current value in the tree), and validation takes a key, value, and a specific witness and validates that the provided data and witness are valid with the current root of the tree.

The validation part might be done as a separate utility function that could obtain the merkle tree root for that proof and specific key/value pair, and then validates that the root hash matches. This would allow others to validate proofs generated against prior values of the root hash for things like Plasma that need to work with validation of proofs against prior tree roots.

Update to use `eth_utils.toolz` indirection for imports

What is wrong?

This library currently specifies cytoolz as a hard dependency. This makes pypy support more difficult.

How can it be fixed?

  • Remove cytoolz from the direct dependencies.
  • Update the eth-utils requirement to be >=1.3.0,<2
  • Update all imports that come from cytoolz to instead come from eth_utils.toolz

Errors with BinaryTrie

  • Version: current master
  • Python: 3.5
In [1]: from trie import BinaryTrie

In [2]: t = BinaryTrie({})

In [3]: t[b'key1'] = b'value'

In [4]: t[b'key'] = b'value'

Fails with a this error:

py-trie/trie/utils/nodes.py in encode_kv_node(keypath, child_node_hash)
    148     """
    149     if keypath is None or keypath == b'':
--> 150         raise ValidationError("Key path can not be empty")
    151     validate_is_bytes(keypath)
    152     validate_is_bytes(child_node_hash)

ValidationError: Key path can not be empty

Similarly,

In [5]: t = BinaryTrie({})

In [6]: t[b'key'] = b'value'

In [7]: t[b'key1'] = b'value'

fails with

py-trie/trie/binary.py in _set(self, node_hash, keypath, value, if_delete_subtrie)
    118             if keypath:
    119                 raise LeafNodeOverrideError(
--> 120                     "Existing kv pair is being effaced because"
    121                     " it's key is the prefix of the new key")
    122             if if_delete_subtrie:

LeafNodeOverrideError: Existing kv pair is being effaced because it's key is the prefix of the new key

Potential v2 changes

Would like to make some backwards-incompatible changes. For example:

  • Change MissingTrieNode exception to stop subclassing from KeyError #98
  • Default trie to be immutable, where every change spits out a new root hash, and list of nodes to change, but doesn't affect the original trie. (and stop tracking the root hash as a property of the trie) -- This would be cool, but might not make it into v2
  • remove trie.sync #100
  • remove trie.Trie #100
  • Drop py3.5 support #97
  • return HexaryTrieNode from HexaryTrie.root_node #103
  • Move trie.utils to trie._utils
  • Drop pypy spport (to enable the faster, rust-based py-rlp) #118

Not incompatible, but important changes:

  • Publicize some methods that seem to be useful tools, like key_starts_with (See ethereum/trinity#1883 (comment) )
  • Unify imports so that we can do from trie import HexaryTrieFog, NodeIterator etc.

Surely there are others, just wanted to start a running list.

Remove prune flag as an external API

It's too easy to screw up handling of pruning when you use it directly. We should direct everyone to use the squash_changes() API instead. Let's remove the prune keyword from init, and add some internal-only API to enable pruning from within squash_changes.

It seems this needs to go into a major version bump

It's possible that we might be able to hide away the prune keyword somehow and force squash_changes as the only approach to do that. I think we could cover both use cases that way, but I'm not sure how it would look to launch a pruning trie from inside squash_changes() (which uses the prune keyword internally right now).

Originally posted by @carver in #93 (comment)

Python 3.7 support

Add formal support for python 3.7

  • Update classifiers in setup.py
  • Add python3.7 to tox.ini
  • Add python3.7 run(s) to .travis.yml
  • Fix any issues or failures that are exposed.

Import error with eth-hash

  • Version: 1.3.4
  • Python: 3.6
  • OS: osx

What was wrong?

After installing trie via pip, when you attempt to use the trie it throws the error:

ImportError: None of these hashing backends are installed: ['pycryptodome', 'pysha3'].
Install with `pip install eth-hash[pycryptodome].

It appears pip install is missing dependencies in eth-hash

Code to reproduce the error

To reproduce the error, simply try to import HexaryTrie: from trie import HexaryTrie

Proving absence of a key

What was wrong?

One might like to prove that a key is absent from a trie with get_proof(). (Or maybe a separate API to avoid confusion)?

Right now it just crashes if you try to get a proof for a key that's not in the trie.

What did you expect it to do?

Add a new method get_absence_proof() (or something like it) that uses the trie nodes to prove that a key doesn't exist.

Code to reproduce the error

Re-naming of Trie Class

What was wrong?

If binary trie is added into trie.py as BinaryTrie class, should we re-name original Trie class to, like MerkleTrie, PatriciaTrie, MerklePatriciaTrie?

What did you expect it to do?

perhaps we can have both BinaryTrie and re-named Trie in trie.py so people can tell the difference but keep the original Trie reference in __init__.py like this

from .trie import (  # noqa: F401
    MerklePatriciaTrie as Trie,
    BinaryTrie,
)

so that the existing codes implemented with the original Merkle Patricia Trie with from trie import Trie don't require any change

asyncio support

Continuation from some of things touched on in #84

What is wrong

Using py-trie with an asyncio codebase is problematic. Trie operations can be heavy and in an asyncio environment you typically don't want that stuff blocking the main event loop.

How can it be fixed.

I don't really know....

There are two things that I think this can try to address.

  1. The computationally heavy components (like all the trie node hashing)
  2. The database components, actually writing to and reading from the base database.

The database part feels easy enough. An AsyncTrie class where get/set/delete are awaitables, or extend the existing class to have coro_get/coro_set/coro_delete methods that are awaitable. I lean towards the AsyncTrie approach but I could be convinced either way.

The heavy computation offloading feels harder. I think this is because there isn't a one true way to offload computationally heavy stuff because the right approach varies in different contexts. That leads me to believe that we might need to build an API that would enable users of the library to hook into the right places where offloading should/could occur and leave it up to the developer to do the actual offloading. We could provide some built-in mechanisms that operate using this API for things like using the asyncio.Executor patterns.

Add type hints and expose via PEP561

Largely copy/pasted from ethereum/py-evm#1398

Background

Type hints allow us to perform static type checking, among other things. They raise the security bar by catching bugs at development time that, without type support, may turn into runtime bugs.

This stackoverflow answer does a great job at describing their main benefits.

What is wrong?

This library currently does not have any type hints.

This needs to be fixed by:

  1. Add the eth-typing>=2.0.0,<3 library as a dependency.
  2. Adding all missing type hints.
  3. Enforcing (stricter) type checking in CI

How

There does exist tooling (monkeytype) to the generation of type hints for existing code bases. From my personal experience monkeytype can be helpful but does still require manual fine tuning. Also, manually adding these type hints does serve as a great boost to the general understanding of the code base as it forces one to think about the code.

  1. Run mypy --follow-imports=silent --warn-unused-ignores --ignore-missing-imports --no-strict-optional --check-untyped-defs --disallow-incomplete-defs --disallow-untyped-defs --disallow-any-generics -p trie

  2. Eliminate every reported error by adding the right type hint

Because this library supports older versions of python, the type hints will not be able to use the modern python3.6 syntax.

Definition of done

This issue is done when the following criteria are met:

  1. mypy is run in CI

Add a new command to the flake8 environment in the tox.ini file that runs:

mypy --follow-imports=silent --warn-unused-ignores --ignore-missing-imports --no-strict-optional --check-untyped-defs --disallow-incomplete-defs --disallow-untyped-defs --disallow-any-generics -p py_ecc`
  1. Usage of type: ignore (silencing the type checker) is minimized and there's a reasonable explanation for its usage

Stretch goals

When this issue is done, stretch goals can be applied (and individually get funded) to tighten type support to qualify:

  1. mypy --strict --follow-imports=silent --ignore-missing-imports --no-strict-optional -p trie

Hexary trie prunes when it shouldn't

  • Version: 1.4.0
  • Python: 3.6
  • OS: debian linux

What was wrong?

Trying to create a trie of a list of identical hashes, using make_trie_root_and_nodes from py-evm, and I get the exception: trie.exceptions.MissingTrieNode: Trie database is missing hash.

This only occurs if the HexaryTrie has prune = True.

It looks like a leaf was pruned when it shouldn't have been.

The exception occurs on the 129th iteration of the loop.

What did you expect it to do?

Expected it to add the hashes to the trie database like normal so that I can get a root hash and nodes.

Code to reproduce the error

    import rlp
    from trie import HexaryTrie
    ZERO_HASH32 = 32 * b'\x00'
    BLANK_ROOT_HASH = b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n\x5bH\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
    
    
    def make_trie_root_and_nodes( items):
        return _make_trie_root_and_nodes(tuple(rlp.encode(item) for item in items))
    
    def _make_trie_root_and_nodes(items):
        kv_store = {}
        trie = HexaryTrie(kv_store, BLANK_ROOT_HASH, prune=True)
        memory_trie = trie
        for index, item in enumerate(items):
            index_key = rlp.encode(index, sedes=rlp.sedes.big_endian_int)
            memory_trie[index_key] = item
        return trie.root_hash, kv_store
    
    def test_trie_root():
        list_of_bytes = []
        for i in range(129):
            random_bytes = ZERO_HASH32
            list_of_bytes.append(random_bytes)
    
        print(_make_trie_root_and_nodes(list_of_bytes))
    
    test_trie_root()

Got a LevelDBError (Too many open files) when running a state sync

I saw this in the logs when running the StateDownloader on a fully synced ropsten DB (28 GB). We may be leaking file descriptors somewhere, or it may just be that we need to pass a higher max_open_files when creating a LevelDB database

Traceback (most recent call last):
  File "/home/salgado/src/py-evm/p2p/state.py", line 65, in handle_peer
    await self._handle_peer(peer)
  File "/home/salgado/src/py-evm/p2p/state.py", line 90, in _handle_peer
    self.scheduler.process([(node_key, node)])
  File "/home/salgado/virtualenvs/pyevm/lib/python3.6/site-packages/trie-1.3.1-py3.6.egg/trie/sync.py", line 168, in process
    self.commit(request)
  File "/home/salgado/virtualenvs/pyevm/lib/python3.6/site-packages/trie-1.3.1-py3.6.egg/trie/sync.py", line 184, in commit
    self.db[keccak(request.data)] = request.data
  File "/home/salgado/src/py-evm/evm/db/backends/base.py", line 40, in __setitem__
    return self.set(key, value)
  File "/home/salgado/src/py-evm/evm/db/backends/level.py", line 25, in set
    self.db.Put(key, value)
leveldb.LevelDBError: IO error: /home/salgado/.local/share/trinity/ropsten-full-3/1375440.ldb: Too many open files

Update ethereum/tests fixture

  • Version: v2.0.0

What was wrong?

The ethereum/tests fixture hasn't been updated in a long time. It doesn't look like there will need to be major changes, just some expectation changes (famous last words).

Faster verification of HexaryTrieFog.explore inputs

It seems like we should be able to do better if we modeled the data as a tree. IIUC the current approach is super-linear in runtime complexity.

tree = {}
for segment in sub_segments:
    sub_tree = tree
    for nibble in segment:
        sub_tree.setdefault(nibble, {})
        if not isinstance(sub_tree[nibble], collections.Mapping):
            ...  # we've detected a that `segment` is a superset of `sub_tree[nibble]`
        sub_tree = sub_tree[nibble]
    # once the segment terminates, we store the full segment at that position 
    # so that we can later detect segments that are prefixes of other segments.
    sub_tree[segment[-1]] = segment

I think the untested psuedocode above has O(n) complexity.

Originally posted by @pipermerriam in #95

Add mypy tests

Add mypy tests to lint CI job, as defined in project template, which will be merged in #99

Slowly progress from:

  • Validate existing annotations
  • Require annotations everywhere
  • Strictest mypy checks enabled

[HexaryTrie] Can't use proof to compute updated root-hash on key removal

  • Version: 2.0.0a4 (and below)
  • Python: 3.7
  • OS: linux

What was wrong?

Given a key k with value v, in some cases it's not possible to exploit only the information provided by get_proof(k) to determine a new root-hash of the trie where k is deleted.

(Conversely, I was not able to find a case where it's not possible to use get_proof(k) to determine a new root-hash after an update or an insertion of k.)

What did you expect it to do?

Authenticated Data Structures usually allow users to update values in the following fashion:

  1. User queries the ADS for the value of key k
  2. ADS returns value v and its proof p
  3. User computes the root-hash determined by v and p and compares it to their previously known root-hash r to authenticate the response. ^
  4. User sends an update message to the ADS for key k with new value v'
  5. User computes the new root-hash r' determined by k and v' and stores it for authenticating following queries.

(^ In Ethereum Tries this differs a tiny bit in the way that r and p are used to determine v and then values are compared instead of root-hashes)

Now, even though this library does not provide a direct method for computing a new root-hash starting from a proof and a new value, it is still possible to do so by building a "partial" trie with the nodes obtained by get_proof() (in a similar fashion the way get_from_proof() is implemented), perform the update on such trie and then retrieve its new root-hash.

Therefore it should still be possible, for a third party not storing the ADS, to perform any kind of update on k starting just from the proof obtained by get_proof(), whether a literal "update" (i.e. change of value), an insertion or a removal (obviously just in respect to k).

NB: I already have a initial patch for this issue (needs refining because it adds nodes also when it might not be needed), if welcome I can go ahead with a PR.

Code to reproduce the error

First, assume a function new_root_after_delete(root, key, proof) for retrieving the new root-hash
def new_root_after_delete(root, key, proof):
    trie = HexaryTrie({})
    for node in proof:
        trie._persist_node(node)
    trie.root_hash = root
    trie.delete(key)

    return trie.root_hash

I was only able to find two scenarios in which this bug arises, and in particular only when the nodes are long enough to get hashed.

1. k is targeted to a branch node B having only one child node C:

In this case the removal of k causes B to be removed and C to collapse upwards, therefore C should be provided in the proof for k, to be able to update its remaining key and update hashes upwards accordingly.

t = HexaryTrie({}, prune=True)
t[b"\x00\x00"] = b"a"*30
t[b"\x00"] = b"b"*30

# client side
prev_root = t.root_hash
key_to_del = b"\x00"
proof = t.get_proof(key_to_del)

new_root_after_delete(prev_root, key_to_del, proof)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in new_root_after_delete
  File "/home/matteo/src/py-trie/trie/hexary.py", line 86, in wrapped
    fn(trie_self, *args)
  File "/home/matteo/src/py-trie/trie/hexary.py", line 356, in delete
    self._raise_missing_node(exc, key)
  File "/home/matteo/src/py-trie/trie/hexary.py", line 302, in _raise_missing_node
    raise MissingTrieNode(exception.args[0], self.root_hash, key, prefix=None) from exception
trie.exceptions.MissingTrieNode: Trie database is missing hash HexBytes('0x1fb7fdc5aae3c3f57d3e9cfa86bceddc19de02cf587c14a326895f451167c4af') needed to look up node at prefix None, when searching for key HexBytes('0x00') at root hash HexBytes('0x8374694098724fa740337bc9b496646c30219e9c5c87285b88e76d987d43a56d')

A visual representation will make it clear:

t proof proof after deletion
scenario 1 t scenario 1 proof case0 trie_after

In this example the node C corresponds to 1fb7fd, which is not included in proof, that then collapses to 13ef40 after the removal of k (which, for this trivial example it also corresponds to the whole trie).

2. k is targeted to a leaf node L which is child of a branch node B (with no value associated) and, in respect to B, L is the only sibling of some other node S:

In this case, the removal of k causes L to be removed, and since B has no value associated with it B is removed too, causing S to collapse upwards, recreating the first scenario. Therefore S should be provided in the proof for k.

t = HexaryTrie({}, prune=True)
t[b"\x00\x00"] = b"a"*30
t[b"\x00\x01"] = b"b"*30

# client side
prev_root = t.root_hash
key_to_del = b"\x00\x00"
proof = t.get_proof(key_to_del)

new_root_after_delete(prev_root, key_to_del, proof)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in new_root_after_delete
  File "/home/matteo/src/py-trie/trie/hexary.py", line 86, in wrapped
    fn(trie_self, *args)
  File "/home/matteo/src/py-trie/trie/hexary.py", line 356, in delete
    self._raise_missing_node(exc, key)
  File "/home/matteo/src/py-trie/trie/hexary.py", line 302, in _raise_missing_node
    raise MissingTrieNode(exception.args[0], self.root_hash, key, prefix=None) from exception
trie.exceptions.MissingTrieNode: Trie database is missing hash HexBytes('0x873d300c30dc3b1d383643935e480a8ca36821ffd3803f78432a345a37e1a440') needed to look up node at prefix None, when searching for key HexBytes('0x0000') at root hash HexBytes('0xc773de0ac200c0f6f3809b2bb7b03ca1e2974068e75b1bab4b48a6a6a2a2d179')

Visual representation:

t proof proof after deletion
scenario 2 t scenario 2 proof case1 after

In this example the node S corresponds to 873d30 which is not included in proof. After the removal of k, S collapses to ef3798 (which as before, for this trivial example is also the whole trie).

I collapsed the code for convenience, since this issue is already quite long. I hope this helps, let me know if I can start a PR or provide any additional input.

Thanks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.