Deion Currently tendermint uses simple merkle tree to comput

Updated the deion of the issue to reflect <a class="issue-link js-issue-link" da

Namespaced Merkle Tree about celestia-core HOT 11 CLOSED

celestiaorg commented on August 15, 2024

Namespaced Merkle Tree

from celestia-core.

Comments (11)

musalbas commented on August 15, 2024 1

Hence, the simplest way is similar to the prototype but let the hash function decide via the prefix:
if the first NAMESPACE_ID_BYTES (32 bytes) are equal to PARITY_SHARE_NAMESPACE_ID, we are in "coding mode", else we are not.

The namespace ID isn't in the value that's being hashed though, so the hash function still has no way to differentiate. The namespace ID is something that the hash function is responsible for computing - it's not something that's an input to the hash function.

The tree should allow to pass in a "hasher" that enables us to hash leafs and internal nodes differently. Not only is this needed for prefixing leafs / inner nodes differently but also the prefixing with the proper namespaces. As a go interface, this could look like

Leaf and internal nodes are prefixed and hashed differently in the NebulousLabs library - see https://gitlab.com/NebulousLabs/merkletree/-/blob/master/tree.go#L61.

from celestia-core.

musalbas commented on August 15, 2024

In the prototype, the namespaced Merkle tree was implemented by using a standard Merkle tree library, but with a different hash function. In practice I have found this to result in the need for hacks, particularly because I had to make the hash function stateful so that it knows when its hashing an erasure code parity share. This is because in the NMT all the erasure code parity shares should go in a reserved namespaced at the very end of the NMT (all bits set to 1). However, the hash function cannot determine by itself if a piece of data is a parity share or not, because parity shares in theory look like random data. It's worth thinking about the cleanest way to implement the NMT. For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

from celestia-core.

liamsi commented on August 15, 2024

particularly because I had to make the hash function stateful so that it knows when its hashing an erasure code parity share.

I was wondering if we couldn't make the hash function understand from (some prefix of) the data if it is hashing an erasure code parity share or not. It's still a bit hacky but then the hasher wouldn't need to be stateful. Didn't think this through yet. But this sounds very similar:

For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

from celestia-core.

liamsi commented on August 15, 2024

Revisiting this issue with the evolution of the spec in mind:

For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

We reserve a namespace ID for parity shares as per:
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/consensus.md#reserved-namespace-ids

Hence, the simplest way is similar to the prototype but let the hash function decide via the prefix:
if the first NAMESPACE_ID_BYTES (32 bytes) are equal to PARITY_SHARE_NAMESPACE_ID, we are in "coding mode", else we are not.

As the underlying merkle tree, we could also use nebolouslabs' implementation, or change tendermint's implementation (which basically implements the tree of rfc6962) to accept a different hasher. The latter is probably better on the long run as we can upstream it back to tendermint.

from celestia-core.

liamsi commented on August 15, 2024

Thinking about it further, it might be worth to either implement our own merkle tree, or, use one that allows to plug in the underlying hasher in a slightly more fine-grained way, than the nebulousLabs tree:
While Neboulous labs (and most other merkle tree implementations out there) let you pass in the underlying hash function, it does not allow you to hash differently depending on if you are hashing a leaf or an interior node. Although most trees do prefixing correctly to prevent certain attacks, they do not give us the control we need to structure the tree as an NMT.

The tree should allow to pass in a "hasher" that enables us to hash leafs and internal nodes differently. Not only is this needed for prefixing leafs / inner nodes differently but also the prefixing with the proper namespaces. As a go interface, this could look like

type TreeHasher interface {
  HashLeaf(leaf []byte) []byte
  HashInner(left, right []byte) []byte
}

An implementation of this could take in an unmodified (plain) hasher and would need a way to extract the namespace for the value (message) that it wants to hash (for leaf nodes) and compute the n_min / n_max for inner nodes (https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#namespace-merkle-tree).

from celestia-core.

liamsi commented on August 15, 2024

The namespace ID isn't in the value that's being hashed though, so the hasher still has no way to differentiate. The namespace ID is something that the hasher is responsible for computing - it's not something that is an input to the hasher.

Wait, if I submit a message for my application, aren't I supposed to submit it with the right namespace ID (which I need to know upfront)?
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#message

Leaf and internal nodes are prefixed and hashed differently in the NebulousLabs library - see https://gitlab.com/NebulousLabs/merkletree/-/blob/master/tree.go#L61.

I know, I didn't mention nebulousLabs in that sentence but:

most trees do prefixing correctly to prevent certain attacks, they do not give us the control we need to structure the tree as an NMT.

from celestia-core.

musalbas commented on August 15, 2024

Wait, if I submit a message for my application, aren't I supposed to submit it with the right namespace ID (which I need to know upfront)?

Yes but the namespace ID should be part of the transaction data. We assume a function ns(m) that takes as input a message and returns its namespace ID. The hash function being able to compute the namespaces from the leaf values is important to ensure than the tree being ordered correctly is consensus-critical.

I suppose we can modify parity shares in some way such that ns(m) returns the parity shares namespace, e.g. by prefixing the parity shares, which is what I think you're suggesting. However, that would make the size of the leaves in the parity shares greater than the non-parity shares, so that's something that we may need to think about.

I know, I didn't mention nebulousLabs in that sentence but:

I think you can achieve the same effect in the NebulousLabs library by simply passing the tree a custom hasher. The library prefixes leafs and nodes differently, so the hash function can act accordingly. That's what I do in the prototype.

from celestia-core.

liamsi commented on August 15, 2024

I think the point I was trying to make is that it could be cleaner if the merkle tree library we use (or write) would allow to pass in a TreeHasher as above instead of just a plain hasher (as in the NebolousLabs library). It would be the TreeHashers responsibility (in HashLeaf / HashInner) to do what flagDigest in combination with a Flagger do in the prototype: https://github.com/lazyledger/lazyledger-prototype/blob/4e65b735d82d74ff06e6796a9867b43515316cc3/flaghasher.go#L58-L64

Maybe, I should just integrate a first version with nebolousLabs library as you did in the prototype and one variant with as vaguely described by me. Then we can decide what makes the most sense.

from celestia-core.

liamsi commented on August 15, 2024

Yes but the namespace ID should be part of the transaction data.

@adlerjohn: sounds like this needs to go into the Tx data in the spec, too? It's currently only part of the message data.

We assume a function ns(m) that takes as input a message and returns its namespace ID.

If the namespace ID is embedded in the message struct like currently in the spec, it would just return the namespaceID field value:
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#message

from celestia-core.

liamsi commented on August 15, 2024

Updated the description of the issue to reflect celestiaorg/celestia-specs#45 (comment). I'd propose to define a minimalistic interface for the NMT such that the internal actual implementation can easily be swapped out by anything that adheres to that interface (see opening comment).

from celestia-core.

liamsi commented on August 15, 2024

closing in favour of #58

The NMT itself has been implemented in https://github.com/lazyledger/nmt. While the API of the NMT might still undergo minimal changes / iterations (e.g. replacing the underlying merkle tree lib), it is feature complete in the sense that it can be integrated in here.

from celestia-core.

Namespaced Merkle Tree about celestia-core HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent