Coder Social home page Coder Social logo

Comments (11)

musalbas avatar musalbas commented on August 15, 2024 1

Hence, the simplest way is similar to the prototype but let the hash function decide via the prefix:
if the first NAMESPACE_ID_BYTES (32 bytes) are equal to PARITY_SHARE_NAMESPACE_ID, we are in "coding mode", else we are not.

The namespace ID isn't in the value that's being hashed though, so the hash function still has no way to differentiate. The namespace ID is something that the hash function is responsible for computing - it's not something that's an input to the hash function.

The tree should allow to pass in a "hasher" that enables us to hash leafs and internal nodes differently. Not only is this needed for prefixing leafs / inner nodes differently but also the prefixing with the proper namespaces. As a go interface, this could look like

Leaf and internal nodes are prefixed and hashed differently in the NebulousLabs library - see https://gitlab.com/NebulousLabs/merkletree/-/blob/master/tree.go#L61.

from celestia-core.

musalbas avatar musalbas commented on August 15, 2024

In the prototype, the namespaced Merkle tree was implemented by using a standard Merkle tree library, but with a different hash function. In practice I have found this to result in the need for hacks, particularly because I had to make the hash function stateful so that it knows when its hashing an erasure code parity share. This is because in the NMT all the erasure code parity shares should go in a reserved namespaced at the very end of the NMT (all bits set to 1). However, the hash function cannot determine by itself if a piece of data is a parity share or not, because parity shares in theory look like random data. It's worth thinking about the cleanest way to implement the NMT. For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

particularly because I had to make the hash function stateful so that it knows when its hashing an erasure code parity share.

I was wondering if we couldn't make the hash function understand from (some prefix of) the data if it is hashing an erasure code parity share or not. It's still a bit hacky but then the hasher wouldn't need to be stateful. Didn't think this through yet. But this sounds very similar:

For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Revisiting this issue with the evolution of the spec in mind:

For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share.

We reserve a namespace ID for parity shares as per:
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/consensus.md#reserved-namespace-ids

Hence, the simplest way is similar to the prototype but let the hash function decide via the prefix:
if the first NAMESPACE_ID_BYTES (32 bytes) are equal to PARITY_SHARE_NAMESPACE_ID, we are in "coding mode", else we are not.

As the underlying merkle tree, we could also use nebolouslabs' implementation, or change tendermint's implementation (which basically implements the tree of rfc6962) to accept a different hasher. The latter is probably better on the long run as we can upstream it back to tendermint.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Thinking about it further, it might be worth to either implement our own merkle tree, or, use one that allows to plug in the underlying hasher in a slightly more fine-grained way, than the nebulousLabs tree:
While Neboulous labs (and most other merkle tree implementations out there) let you pass in the underlying hash function, it does not allow you to hash differently depending on if you are hashing a leaf or an interior node. Although most trees do prefixing correctly to prevent certain attacks, they do not give us the control we need to structure the tree as an NMT.

The tree should allow to pass in a "hasher" that enables us to hash leafs and internal nodes differently. Not only is this needed for prefixing leafs / inner nodes differently but also the prefixing with the proper namespaces. As a go interface, this could look like

type TreeHasher interface {
  HashLeaf(leaf []byte) []byte
  HashInner(left, right []byte) []byte
}

An implementation of this could take in an unmodified (plain) hasher and would need a way to extract the namespace for the value (message) that it wants to hash (for leaf nodes) and compute the n_min / n_max for inner nodes (https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#namespace-merkle-tree).

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

The namespace ID isn't in the value that's being hashed though, so the hasher still has no way to differentiate. The namespace ID is something that the hasher is responsible for computing - it's not something that is an input to the hasher.

Wait, if I submit a message for my application, aren't I supposed to submit it with the right namespace ID (which I need to know upfront)?
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#message

Leaf and internal nodes are prefixed and hashed differently in the NebulousLabs library - see https://gitlab.com/NebulousLabs/merkletree/-/blob/master/tree.go#L61.

I know, I didn't mention nebulousLabs in that sentence but:

most trees do prefixing correctly to prevent certain attacks, they do not give us the control we need to structure the tree as an NMT.

from celestia-core.

musalbas avatar musalbas commented on August 15, 2024

Wait, if I submit a message for my application, aren't I supposed to submit it with the right namespace ID (which I need to know upfront)?

Yes but the namespace ID should be part of the transaction data. We assume a function ns(m) that takes as input a message and returns its namespace ID. The hash function being able to compute the namespaces from the leaf values is important to ensure than the tree being ordered correctly is consensus-critical.

I suppose we can modify parity shares in some way such that ns(m) returns the parity shares namespace, e.g. by prefixing the parity shares, which is what I think you're suggesting. However, that would make the size of the leaves in the parity shares greater than the non-parity shares, so that's something that we may need to think about.

I know, I didn't mention nebulousLabs in that sentence but:

I think you can achieve the same effect in the NebulousLabs library by simply passing the tree a custom hasher. The library prefixes leafs and nodes differently, so the hash function can act accordingly. That's what I do in the prototype.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

I think the point I was trying to make is that it could be cleaner if the merkle tree library we use (or write) would allow to pass in a TreeHasher as above instead of just a plain hasher (as in the NebolousLabs library). It would be the TreeHashers responsibility (in HashLeaf / HashInner) to do what flagDigest in combination with a Flagger do in the prototype: https://github.com/lazyledger/lazyledger-prototype/blob/4e65b735d82d74ff06e6796a9867b43515316cc3/flaghasher.go#L58-L64

Maybe, I should just integrate a first version with nebolousLabs library as you did in the prototype and one variant with as vaguely described by me. Then we can decide what makes the most sense.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Yes but the namespace ID should be part of the transaction data.

@adlerjohn: sounds like this needs to go into the Tx data in the spec, too? It's currently only part of the message data.

We assume a function ns(m) that takes as input a message and returns its namespace ID.

If the namespace ID is embedded in the message struct like currently in the spec, it would just return the namespaceID field value:
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#message

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Updated the description of the issue to reflect celestiaorg/celestia-specs#45 (comment). I'd propose to define a minimalistic interface for the NMT such that the internal actual implementation can easily be swapped out by anything that adheres to that interface (see opening comment).

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

closing in favour of #58

The NMT itself has been implemented in https://github.com/lazyledger/nmt. While the API of the NMT might still undergo minimal changes / iterations (e.g. replacing the underlying merkle tree lib), it is feature complete in the sense that it can be integrated in here.

from celestia-core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.