nmt's People

Contributors

adlerjohn, chirag018, dependabot[bot], distractedm1nd, elias-orijtech, evan-forbes, ivan-gavran, jbowen93, liamsi, mattdf, msevey, odeke-em, preston-evans98, rach-id, rahulghangas, ramin, renaynay, rootulp, staheri14, vgonkivs, walldiss, wondertan

nmt's Issues

Add proof API

Something along the lines of:
fn GetWithProof(nid NamespaceID) -> (startIdx int, proof Proof, nodes []NamespacedDataOrHashes, ...)
and
fn VerifyProof(nid NamespaceID, startIdx int, proof Proof, nodes []NamespacedDataOrHashes, ...) -> bool

To build the proof, we need to rebuild (parts of) the tree. Hence, the above call also needs to be able to iterate over the leaf data / leaf hashes of the tree.
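For reference, a rough Go rendering of the proposed API could look like the following. This is a sketch only: the ProofAPI name and the [][]byte node representation are placeholders, not part of the library.

// Hypothetical Go rendering of the proposed proof API (names assumed).
type ProofAPI interface {
	// GetWithProof returns the leaf range for nID together with a proof and
	// the namespaced leaf data or hashes needed to verify it.
	GetWithProof(nID namespace.ID) (startIdx int, proof Proof, nodes [][]byte, err error)
	// VerifyProof checks the returned range and proof against a known root.
	VerifyProof(nID namespace.ID, startIdx int, proof Proof, nodes [][]byte, root []byte) bool
}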

Feature Request: document nmt proof format

Overview

Celestia-node now supports retrieving shares with proofs of inclusion 🎉.

Unfortunately, the format of the returned Merkle proof doesn't seem to be documented anywhere, which makes it difficult to verify. I'd suggest adding documentation (maybe here) explaining the format of the proof (is the array of siblings generated by an inorder traversal? preorder? postorder?) and demonstrating how to verify it using the Go implementation of the nmt.
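In the meantime, here is a minimal sketch of round-tripping a proof with the Go library, based on the API as it appears elsewhere in these issues (New, Push, ProveNamespace, VerifyNamespace); exact signatures may differ between versions:

package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/celestiaorg/nmt"
	"github.com/celestiaorg/nmt/namespace"
)

func main() {
	// 1-byte namespace IDs, sha256 as the underlying hash.
	tree := nmt.New(sha256.New(), nmt.NamespaceIDSize(1))
	for _, d := range [][]byte{
		append(namespace.ID{0}, []byte("leaf_0")...),
		append(namespace.ID{1}, []byte("leaf_1")...),
	} {
		if err := tree.Push(d); err != nil {
			panic(err)
		}
	}
	root := tree.Root()

	nID := namespace.ID{0}
	proof, err := tree.ProveNamespace(nID)
	if err != nil {
		panic(err)
	}
	// Verify the proof against the leaves returned for the namespace.
	fmt.Println("proof verifies:", proof.VerifyNamespace(sha256.New(), nID, tree.Get(nID), root))
}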

nmt: unnecessary and expensive power-of-2 exponentiation using math.Pow plus a uint(float64) cast, where a left bit shift (1<<N) suffices

This code here

nodeSpan := uint(math.Pow(float64(2), float64(treeDepth-i)))

performs exponentiation for powers of 2 in a very expensive and unnecessary way. It could simply be 1<<N, which makes for fast code. For example:

  • uint(math.Pow(float64(2), float64(8))) == 1<<8
  • uint(math.Pow(float64(2), float64(1))) == 1<<1
  • uint(math.Pow(float64(2), float64(12))) == 1<<12

Benchmarks

Run these benchmarks with go test -run=^$ -bench=Pow2 -count=10, then compare the two with benchstat:

var sink any

func BenchmarkPow2Naive(b *testing.B) {
        benchmarkPow2(b, func(i int) uint {
                return uint(math.Pow(float64(2), float64(i)))
        })
}

func BenchmarkPow2BitShift(b *testing.B) {
        benchmarkPow2(b, func(i int) uint {
                return 1 << i
        })
}

var cases = []struct {
        i    int
        want uint
}{
        {0, 1},
        {1, 2},
        {2, 4},
        {3, 8},
        {4, 16},
        {5, 32},
        {12, 4096},
        {20, 1048576},
        {24, 16777216},
        {31, 2147483648},
        {63, 9223372036854775808},
}

func benchmarkPow2(b *testing.B, fn func(int) uint) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                for _, tc := range cases {
                        n := fn(tc.i)
                        if n != tc.want {
                                b.Fatalf("Bad result: got=%d want=%d", n, tc.want)
                        }
                        sink = n
                }
        }

        if sink == nil {
                b.Fatal("Benchmark did not run")
        }
        sink = (any)(nil)
}

Results

$ benchstat before.txt after.txt 
name    old time/op    new time/op    delta
Pow2-8     322ns ± 2%     114ns ± 1%  -64.69%  (p=0.000 n=9+8)

name    old alloc/op   new alloc/op   delta
Pow2-8     40.0B ± 0%     40.0B ± 0%     ~     (all equal)

name    old allocs/op  new allocs/op  delta
Pow2-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)

Kindly cc @elias-orijtech @kirbyquerby @willpoint @jhusdero @musalbas @liamsi

Verification logic does not ensure lexicographical ordering

The NMT guarantees lexicographical ordering, and generated proofs follow this ordering as well. Currently, however, the verification logic does not enforce it. This might not be a big deal, since verification would fail anyway if the ordering differed, but it makes the code harder to reason about, and it is clearly better to enforce the ordering there too.

@musalbas suggested enforcing the ordering directly in the hasher. While it might seem better to follow the current pattern and fail much earlier (when adding data to the tree rather than only later when hashing/creating the root), enforcing the ordering while hashing is better from a consistency perspective. It would also immediately resolve the ordering guarantee during verification.

NamespacedMerkleTree.foundInRange should not narrow uint64 to int

In

nmt/nmt.go

Lines 238 to 245 in 6274243

func (n *NamespacedMerkleTree) foundInRange(nID namespace.ID) (bool, int, int) {
	// This is a faster version of this code snippet:
	// https://github.com/celestiaorg/celestiaorg-prototype/blob/2aeca6f55ad389b9d68034a0a7038f80a8d2982e/simpleblock.go#L106-L117
	foundRng, found := n.namespaceRanges[string(nID)]
	// XXX casting from uint64 to int is kinda crappy but nebolousLabs'
	// range proof api requires int params only to convert them to uint64 ...
	return found, int(foundRng.start), int(foundRng.end)
}

the XXX'ed comment correctly identifies the narrowing cast from uint64 to int as problematic. I suggest either (1) removing the casts and changing the return types to uint64, or (2) adding a range check and panicking on overflow. Given #15, and the fact that return types tend to spread to callers, I recommend (1), even if it's a bit more work.
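If (2) is chosen instead, a minimal sketch of the overflow guard could look like this (toInt is a hypothetical helper; math.MaxInt requires Go 1.17+):

// Sketch of option (2): fail loudly instead of silently truncating.
func toInt(u uint64) int {
	if u > uint64(math.MaxInt) {
		panic(fmt.Sprintf("index %d overflows int", u))
	}
	return int(u)
}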

Defining and Clarifying the Verification Process for NMT Proofs Generated from Non-Lexicographically Ordered Trees

Problem

The issue at hand pertains to the generation of NMTs from sets of leaves that are not ordered lexicographically based on their namespaces. The behavior of NMT proofs generated from such malformed trees is unclear and needs to be addressed. The main objectives of this issue are to:

  • Determine whether the current NMT proof verification system is able to detect out-of-order leaves and respond accordingly.
  • If the current system is not capable, then develop and implement measures to invalidate NMT proofs generated from malformed NMTs.

Replace the `IntervalDigest` struct with raw bytes

I'm wondering if we should get rid of this struct entirely and instead store it as raw bytes min||max||digest. For instance, when using this with ipfs, the CID will essentially be this plus some overhead bytes that stem from the multihash standard.

Originally posted by @liamsi in #35 (comment)

bug: Hasher is not go-routine safe

On lazyledger-core, we access the NMT's hasher (e.g. when hashing leaves) from several goroutines.

These will try to re-use the same hasher instance: https://github.com/lazyledger/nmt/blob/6dc97cc6f9fdd1c800108d2cab06a4d8b088ea8a/hasher.go#L98

We should create a new instance instead.

This might be related to: celestiaorg/celestia-core#330

At least tests with larger blocks panic with:

panic: d.nx != 0

goroutine 85 [running]:
crypto/sha256.(*digest).checkSum(0xc00043d9f0, 0x0, 0x0, 0x0, 0x0)
        c:/go/src/crypto/sha256/sha256.go:234 +0x1e0
crypto/sha256.(*digest).Sum(0xc00009eb00, 0x0, 0x0, 0x0, 0xc0, 0xb6, 0x0)
        c:/go/src/crypto/sha256/sha256.go:210 +0x70

refactor(share/ipld): Remove dependency on redundant namespaceID prepend

    After discussing with @Wondertan, we realized that https://github.com/celestiaorg/celestia-node/issues/183 can be tackled without modifying the NMT library at all (and instead by modifying this [visitor](https://github.com/celestiaorg/celestia-node/blob/ed4cdf03f2bd35c30d5b9874d4bc40bea6f5a061/ipld/nmt_adder.go#L34)). So we could put this on the back burner for now. 

I still think the suggested change makes sense. It would also allow us to delete the PrefixedData type entirely. But it might make sense to do this as part of a somewhat larger refactoring (which can/should be split into several PRs) where we also tackle #15. In its current form, and because of the usage of this interface, we still need to do a copy that merges the nid and the data. Hlib convinced me that it is better to hide this copying inside the NMT though. Currently, it happens in the wrapper, which is bad for usability. Especially as Hlib put it:

The reason is; right now, we can avoid storing unnecessary data, but it's hacky. Meaning that on Push we need to prepend, and on Visit remove that we prepend so we don't store it

Originally posted by @liamsi in #55 (comment)

Add possibility to store nodes of the tree (MapStore)

Also reuse stored hashes instead of recomputing them each time.

Copying the related review comment here:

There are other changes that we may need to make to the underlying merkletree implementation that may not be mergeable upstream. For example, we should probably change the implementation to store key=>value pairs of nodes in the tree using the same interface as the SMT MapStore, so that we can plug in other MapStores like an IPFS-based one, and reuse the same code: https://github.com/lazyledger/smt-ipfs-mapstore

This is something that needs to be investigated a bit though, as from my understanding the NebulousLabs merkletree library I think may compute the root on demand and doesn't keep around any of the old nodes in memory. Alternatively, it may be possible to implement this MapStore storage functionality within the NMT library, which uses the MapStore as "cache", and only invokes the NebulousLabs merkletree library to recompute roots if the node data is unavailable in the MapStore.

related to: #6
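For context, the SMT MapStore referenced above is roughly the following key/value abstraction (a sketch; see the lazyledger/smt repo for the authoritative definition):

// Rough shape of the SMT MapStore interface.
type MapStore interface {
	Get(key []byte) ([]byte, error)
	Set(key []byte, value []byte) error
	Delete(key []byte) error
}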

Export nmt-specific hashers

It might make sense to export the nmt-specific hashing currently "hidden" in the internal package:
https://github.com/lazyledger/nmt/tree/master/internal

This particularly makes sense in the light of the IPLD plugin. Instead of redefining the logic when adding a custom hasher (see: celestiaorg/celestia-core#155, https://github.com/lazyledger/go-multihash/pull/1 and in particular this (wip) code), we could just import the hashers from here instead. Besides exporting, this would require some minor changes to the hasher API so that it easily matches HashFunc.

The main benefit of doing this is that the logic is defined in one place only. As a result, we don't need to keep two implementations in sync. Furthermore, we do not need to make sure (e.g. via unit tests) that the hasher used in the nmt internally matches the one used to compute the Sum of an IPLD block (as the registered custom multihasher just imports the code from here).

Note that to use the hashers as described above, we would ideally register two multihashers instead of one (rather than looking at the prefix as we do currently).

New verify and create functions are needed for NamespaceMerkleTreeInclusionProof.

A new verify function is needed to verify a single share via a NamespaceMerkleTreeInclusionProof defined in celestia-specs.

This function should take (i) an NMT root of type namespace.IntervalDigest, (ii) a hash function of type hash.Hash, (iii) a proof of type NamespaceMerkleTreeInclusionProof, and (iv) a share of type Share, and output a boolean and an error, where (true, nil) means the share is committed to by the NMT root and the proof of type NamespaceMerkleTreeInclusionProof is correct.

Similarly, a new create function is needed to create a NamespaceMerkleTreeInclusionProof. This function should take the index of the Share inside the NMT and output the NamespaceMerkleTreeInclusionProof corresponding to the share.
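A hedged sketch of what these two functions could look like in Go; the Share and NamespaceMerkleTreeInclusionProof type names come from celestia-specs, while the interface and method names here are placeholders:

// Hypothetical API matching the prose above.
type ShareProofAPI interface {
	// VerifyShareInclusion returns (true, nil) iff share is committed to by
	// root and the given proof is correct.
	VerifyShareInclusion(root namespace.IntervalDigest, h hash.Hash, proof NamespaceMerkleTreeInclusionProof, share Share) (bool, error)
	// CreateShareInclusionProof builds the proof for the share at shareIndex.
	CreateShareInclusionProof(shareIndex int) (NamespaceMerkleTreeInclusionProof, error)
}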

fix: Resolving Inconsistency in the ProveNamespace Signature for Error Handling

Problem

Identified as part of this EPIC celestiaorg/celestia-app#1296.
This issue is to investigate and address the inconsistency in the ProveNamespace signature: the signature indicates that the method should return an error, but in the current implementation the error value is always nil.

Acceptance Criteria

Identify the cases where the method should return a proper error or remove the error from the method signature if no error conditions are perceived. Refactor the code, if necessary.

feat: verify the range of the namespace IDs of left and right nodes in the Hasher

Problem

The Hasher implementation does not verify the range of namespace IDs in the left and right nodes with respect to each other, which is crucial for proper ordering of leaves based on their namespace IDs and for detecting invalidly constructed trees and proofs.

Quoting from Lazy Ledger paper:

An adversarial consensus node may attempt to produce a block that contains a Merkle tree with children that are not ordered correctly. To prevent this, we can set a condition in nsHash such that there is no valid hash when leftMaxNs ≥ rightMinNs, and thus there would be no valid Merkle root for incorrectly ordered children

The issue was identified as part of celestiaorg/celestia-app#1296

Acceptance Criteria

  • Modify the Hasher implementation to incorporate this additional namespace ID range check for the supplied children nodes; see the sketch below. Possibly, the Hasher can return an error for invalid inputs. A subsequent refactor needs to take place to properly consume this change.
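A minimal sketch of the check, reusing the namespace ranges already extracted in hasher.go (the helper and the error are assumptions, not existing code):

// Hypothetical helper enforcing the condition quoted above: there is no
// valid hash when leftMaxNs >= rightMinNs.
func validateSiblingsOrder(leftMaxNs, rightMinNs []byte) error {
	if bytes.Compare(leftMaxNs, rightMinNs) >= 0 {
		return fmt.Errorf("unordered siblings: leftMaxNs %x >= rightMinNs %x", leftMaxNs, rightMinNs)
	}
	return nil
}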

Bug: Proofs of Absence can be Forged

Problem

The namespaced merkle tree exposes a VerifyNamespace method whose documentation states that it "verifies that the namespace is complete and no leaf of that namespace was left out in the proof."

This API allows consumers to write code like the following:

proof, namespaceData := getUntrustedProofFor(namespaceId)
// check that the namespaceData was all returned
if proof.VerifyNamespace(sha256.New(), namespaceId, namespaceData, n.Root()) {
    process(namespaceData)
} else {
    return errors.New("prover lied about namespace data")
}

Unfortunately, this code is vulnerable. A malicious prover can falsely claim that the namespace is empty and, as long as they also provide an empty proof, the claim will be accepted.

Test Case

A simple test case which reproduces the issue is as follows:

func TestNMT_forgeNamespaceEmptinessProof(t *testing.T) {
	data := [][]byte{
		append(namespace.ID{0}, []byte("leaf_0")...),
		append(namespace.ID{0}, []byte("leaf_1")...),
		append(namespace.ID{1}, []byte("leaf_2")...),
		append(namespace.ID{1}, []byte("leaf_3")...)}
	// Init a tree with the namespace size as well as
	// the underlying hash function:
	tree := New(sha256.New(), NamespaceIDSize(1))
	for _, d := range data {
		if err := tree.Push(d); err != nil {
			panic(fmt.Sprintf("unexpected error: %v", err))
		}
	}

	root := tree.Root()
	actualLeaves := tree.Get(namespace.ID{0})
	if len(actualLeaves) == 0 {
		t.Fatalf("Get(namespace.ID{0}) should have returned two leaves but returned none.")
	}

	forgedProof := Proof{
		start:                   0,
		end:                     0,
		nodes:                   [][]byte{},
		leafHash:                []byte{},
		isMaxNamespaceIDIgnored: true,
	}

	forgedProofSuccess := forgedProof.VerifyNamespace(sha256.New(), namespace.ID{0}, [][]byte{}, root)
	if forgedProofSuccess {
		t.Fatalf("Successfully verified proof that non-empty namespace was empty")
	}
}

Root Cause

This behavior is introduced by the following condition here:

// empty range, proof, and data: always checks out
if len(data) == 0 && isEmptyRange && len(proof.nodes) == 0 {
    return true
}

Suggested Solution

Currently, the handling of empty range proofs is not well defined. I propose rejecting all proofs that do not contain at least one node in either the proof.nodes or data arguments, unless the tree is empty. (Without this special case, it would be impossible to check range proofs against the empty tree.) This will change the semantics of the NMT, causing some existing test cases to fail.
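Under that proposal, the condition quoted above might be tightened to something like the following (emptyTreeRoot is a hypothetical precomputed constant for the root of the empty tree):

// Sketch: only accept a fully empty proof against the empty tree's root.
if len(data) == 0 && isEmptyRange && len(proof.nodes) == 0 {
    return bytes.Equal(root, emptyTreeRoot)
}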

Incorporating test-vectors in the NMT specs

Problem

EPIC: celestiaorg/celestia-app#1296.

Based on the comments by @liamsi in #101 (review)

It would be beneficial to include an appendix of realistic test vectors in the NMT spec for implementers to use when verifying their implementation. These test vectors should specifically cater to the nmt implementation for Celestia and match its parameters.

Acceptance Criteria

Once there is complete confidence in the implementation, proper test vectors shall be generated and added to the specs.

Implement algorithm for subtree root paths

Spec:

Pure function that takes arguments: square size, share index start, and share length, and returns a minimal path to the subtree root that encompasses that entire range, with the path starting from the nearest row root.

Vulnerability in VerifyNamespace: Partial Absence Proofs

Problem

Identified as part of EPIC celestiaorg/celestia-app#1296.
The current implementation of VerifyNamespace is vulnerable to an issue in which a queried node can act inconsistently with the protocol specifications when a queried namespace ID has no corresponding items in the tree (i.e., an absence proof). The attack/issue involves providing a partial absence proof for a namespace ID while remaining undetected by the verification implementation. A partial proof here refers to a proof from an ancestor of a leaf to the root, rather than a proof of inclusion of the leaf itself. This is possible because the implementation cannot distinguish between a full and a partial proof.

Note: Although VerifyNamespace is unable to detect partial proofs, the soundness of the namespace proof is not compromised. In other words, a malicious node cannot convince someone of the absence of a namespace ID that actually exists in the tree. However, the inability to detect this inconsistent behavior may lead to incorrect conclusions at the application layer about the underlying data represented by the NMT. The impact of this issue depends on the context in which the NMT is used and may or may not be considered an attack. Nevertheless, it is important to address it properly, as it currently causes inconsistency and ambiguity in the verification process.
Update: I suspect that the current implementation may be intentional, to help with the non-interactive default rules for inclusion proofs of Blobs. Even if this is the case, the specs need to be updated to make it clear whether partial absence proofs are considered valid.

Root Cause:
This is because VerifyNamespace considers neither the height of the tree nor the total number of leaves during verification. Furthermore, the absence proof is based solely on the leaf hash and not on its underlying data. As a result, any internal (non-leaf) node can be used as a leaf hash in the proof.

Example:
Below is an example of such a partial proof that can be successfully verified by the VerifyNamespace algorithm.
Description of the example: A correct absence proof for nID=1 would consist of the following: proof.start=1, proof.end=2, proof.leafHash=leaf1, and proof.nodes=[leaf0, hash2].
However, it is possible to construct a partial proof that is also considered correct by the VerifyNamespace function.
This alternate proof would have proof.start=0, proof.end=1, proof.leafHash=hash1, and proof.nodes=[hash2].
[Figure: a four-leaf tree; leaves leaf0..leaf3 at the bottom, internal nodes hash1 = parent(leaf0, leaf1) and hash2 = parent(leaf2, leaf3), and the root at the top.]

nID := namespace.ID{1}
// create a tree with 4 leaves; variable names match the figure above
n := New(sha256.New(), NamespaceIDSize(1))
d0 := append([]byte{0}, []byte("data0")...)
d1 := append([]byte{2}, []byte("data1")...)
d2 := append([]byte{3}, []byte("data2")...)
d3 := append([]byte{4}, []byte("data3")...)

n.Push(d0)
n.Push(d1)
n.Push(d2)
n.Push(d3)

// this will populate n.leafHashes with the hash of d0...d3
n.computeLeafHashesIfNecessary()

root := n.Root()
hash1 := n.computeRoot(0, 2)
hash2 := n.computeRoot(2, 4)
leaf0 := n.leafHashes[0]
leaf1 := n.leafHashes[1]

// attack scenario: create a partial proof from hash1 to the root instead of leaf1 to the root
partialProof := Proof{start: 0, end: 1, leafHash: hash1, nodes: [][]byte{hash2}, isMaxNamespaceIDIgnored: true}
// run VerifyNamespace for the fabricated absence proof and see if it can pass
if partialProof.VerifyNamespace(sha256.New(), nID, nil, root) {
    fmt.Println("partial proof is successfully verified")  // this will be executed
} else {
    fmt.Println("verification of the partial proof failed")
}

// correct scenario: create a full proof from leaf1 to the root
fullProof := Proof{start: 1, end: 2, leafHash: leaf1, nodes: [][]byte{leaf0, hash2}}
if fullProof.VerifyNamespace(sha256.New(), nID, nil, root) {
    fmt.Println("full proof is successfully verified")  // this will be executed
} else {
    fmt.Println("verification of the full proof failed") 
}

Proposed Solutions

One of the following options will solve this issue:

  • If the current behaviour does not cause serious issues in the wider system, then we should consider the current behavior of VerifyNamespace valid and include it in the specification.
  • For an absence proof, we should return the underlying leaf data instead of the leaf hash, so that internal nodes won't be acceptable in place of leaf nodes. Please see this comment for the rationale behind this solution.
  • To prevent partial proofs from being successfully verified, we can modify the implementation of VerifyNamespace by adding a new parameter that takes the size of the tree, i.e., the number of data items it represents. This parameter would be used to adjust the verification process to account for the tree size. By doing so, we can ensure that partial proofs do not comply with the height of the tree, since they include fewer nodes than a full proof requires.

namespace: PrefixedData/PrefixedData8 are easy to misuse

The PrefixedData and PrefixedData8 are both byte slices:

// PrefixedData simply represents a slice of bytes which consists of
// a namespace.ID and raw data.
// The user has to guarantee that the bytes are valid namespace prefixed data.
// Go's type system does not allow enforcing the structure we want:
// [namespaceID, rawData ...], especially as this type does not expect any
// particular size for the namespace.
type PrefixedData []byte

// PrefixedData8 like PrefixedData is just a slice of bytes.
// It assumes that the slice it represents is at least 8 bytes.
// This assumption is not enforced by the type system though.
type PrefixedData8 []byte

The definitions are problematic because (1) casting byte slices may violate the type invariants (as pointed out in the comments), (2) they're mutable, and (3) PrefixedData has no way to indicate the namespace identifier length. For an example where mutability is a problem, see

nmt/nmt.go

Lines 253 to 257 in 6274243

// Push adds data with the corresponding namespace ID to the tree.
// Returns an error if the namespace ID size of the input
// does not match the tree's NamespaceSize() or the leaves are not pushed in
// order (i.e. lexicographically sorted by namespace ID).
func (n *NamespacedMerkleTree) Push(namespacedData namespace.PrefixedData) error {

where it is not clear to the caller that the parameter namespacedData is retained by the Push method.

The missing namespace length is problematic for validateAndExtractNamespace,

nmt/nmt.go

Lines 336 to 340 in 6274243

func (n *NamespacedMerkleTree) validateAndExtractNamespace(ndata namespace.PrefixedData) (namespace.ID, error) {
	nidSize := int(n.NamespaceSize())
	if len(ndata) < nidSize {
		return nil, fmt.Errorf("%w: got: %v, want >= %v", ErrMismatchedNamespaceSize, len(ndata), nidSize)
	}

because it can only validate that the namespace+data total length is at least as long as the expected namespace length.

Perhaps the definitions of PrefixedData and PrefixedData8 are caused by wanting to support zero-copy and zero-allocation use, where the user generates PrefixedData values by slicing existing byte slices. If so, the behaviour of Push should document that argument values are retained.

Otherwise, I recommend changing the definition to hide the internal structure, and adding validating constructors. There are several possibilities:

type PrefixedData struct {
    // namespace and data can be sliced from the same
    // underlying []byte allocation.
    namespace []byte
    data []byte
}

func NewPrefixedData(namespace, data []byte) PrefixedData {
    bytes := make([]byte, len(namespace)+len(data))
    d := PrefixedData{
        namespace: bytes[:len(namespace)],
        data: bytes[len(namespace)],
    }
    copy(d.namespace, namespace)
    copy(d.data, data)
    return d
}

or equivalently

type PrefixedData struct {
    namespaceAndData []byte
    namespaceLen int
}
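A matching validating constructor for this second variant could look like the following sketch:

func NewPrefixedData(namespace, data []byte) PrefixedData {
    buf := make([]byte, 0, len(namespace)+len(data))
    buf = append(buf, namespace...)
    buf = append(buf, data...)
    return PrefixedData{namespaceAndData: buf, namespaceLen: len(namespace)}
}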

@odeke-em

Automated CHANGELOG.md check for nmt

There should be an automated step that ensures that a CHANGELOG.md file is appropriately updated for all merges to the main branch.

There should be a proper description in an issue and corresponding PR as well as a summary in the CHANGELOG.md file. This will be enforced through an automated action. In rare cases for small PRs it may be acceptable to skip this.

Make `IntervalDigest` json encodable

Some tests in lazyledger-core will require DataAvailabilityHeader to be encoded via json.

This issue needs to be solved before #312 can be merged.

The json representation of IntervalDigest might be useful for #21 as well.
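One possible shape for the encoding, assuming the min||max||digest layout discussed elsewhere in these issues and hypothetical field names:

// Sketch of a JSON encoding for IntervalDigest (field names assumed;
// []byte values marshal as base64 with encoding/json).
func (d IntervalDigest) MarshalJSON() ([]byte, error) {
	return json.Marshal(struct {
		Min    []byte `json:"min"`
		Max    []byte `json:"max"`
		Digest []byte `json:"digest"`
	}{d.min, d.max, d.digest})
}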

reconsider Push to take in the pair (nid, data) as one argument instead

Push currently looks like this:
https://github.com/lazyledger/nmt/blob/e0a317aa2d0ff84fcf1478527e4c90e6a13245d7/nmt.go#L249

The reason for this is that, in a few parts of the code, we need to operate on the namespaceID independently. Also, at the very beginning of the implementation, I did not fully understand that having a one-to-one mapping of one NMT <-> one namespace size would be the only reasonable use-case. The implementation already stores the namespace size and could internally split and merge the passed-in argument into (nid, data) and back.

The only downside is that the API looks a bit less clear. This can easily be mitigated by a well-documented type alias: type NamespacedData []byte.

I'm proposing to change Push to func (n *NamespacedMerkleTree) Push(ndata NamespacedData) error.

This would also slightly simplify the wrapper that is needed for the rsmt2d library (see #25 ).

api: `Push` for data/shares with decoupled namespace

Currently, tree.Push requires passing namespace.PrefixedData, which is a byte slice with a namespace.ID prepended. However, there is a use case where the pushed data does not need to be prepended with a namespace.ID, and the namespace can instead be passed as a separate field. This is the case where we push parity data generated via rsmt2d. Currently, we prepend the parity namespace to the parity shares, which is not required. This is one of the issues causing us to store an additional 8 bytes of parity namespace per parity share via NMTWrapper, as outlined in celestiaorg/celestia-node#183. I propose adding an additional method like PushWith to the nmt.Tree to allow passing a namespace decoupled from the share data, enabling the mentioned disk usage optimization.
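A minimal sketch of the proposed method, expressed here as a wrapper over the existing Push (the PushWith name comes from this issue; a real implementation would avoid the allocation and copy entirely):

// Hypothetical PushWith: accepts the namespace separately from the data.
func PushWith(n *nmt.NamespacedMerkleTree, nID namespace.ID, data []byte) error {
	ndata := make([]byte, 0, len(nID)+len(data))
	ndata = append(ndata, nID...)
	ndata = append(ndata, data...)
	return n.Push(ndata)
}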

feat: expose methods to get min and max namespace IDs of an NMT

Problem

Currently, there is no exposed method in the nmt API to get the min and max namespace IDs of an NMT. To do so, one needs to compute the root and then make another function call to extract the min and max namespace IDs encoded in the root.

Acceptance Criteria

  • Augment the current NMT API with getMinNamespace and getMaxNamespace as methods of the NamespaceMerkleTree struct. Proper unit tests should be included as well.
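A sketch of the requested accessors, derived from the min||max||digest encoding of the root (the exported names and the raw-bytes root type are assumptions):

// Sketch: read the min/max namespace IDs off the serialized root.
func (n *NamespacedMerkleTree) MinNamespace() namespace.ID {
	ns := int(n.NamespaceSize())
	return namespace.ID(n.Root()[:ns])
}

func (n *NamespacedMerkleTree) MaxNamespace() namespace.ID {
	ns := int(n.NamespaceSize())
	return namespace.ID(n.Root()[ns : 2*ns])
}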

feat: Handling Panics in HashNode Method

Problem

Identified as part of this EPIC: celestiaorg/celestia-app#1296.
The HashNode method of the NMT may panic if it receives invalid or malformed inputs (as mentioned by @evan-forbes in this comment #102 (comment)). For example, the left or right children may not conform to the namespace hash format (i.e., their length is less than the namespace ID size), or the relation between the left and right children's namespace ranges may be invalid. If an invalid node is sent to a light node, it can result in a crash.

It is important to note that returning an error from this method is not possible, as it would violate the interfaces it is intended to fulfill, i.e., NodeHasher and Hash.

Acceptance Criteria

This issue is to implement a proper error handling mechanism for HashNode.

api: Make the api consistent

When we push data to the tree, we are using prefixed data, but when we verify proofs, we are using non-prefixed data. Whatever the API ends up being, we should probably make it consistent.

This also applies to creating and verifying proofs. When creating proofs, we have two different functions: one for creating a ranged proof, and another for creating a proof for a single leaf. When verifying proofs, we use only a single function to verify proofs of a single leaf or a range of leaves. Whatever we choose, we should probably make this consistent as well.

relevant comment #58 (comment)

related to #55

Consider removing abstractions in favour of clarifying requirements to data serialization

The following might result in a slightly simpler API:

get rid of the namespaced data abstraction altogether and change the Push method to take in raw byte pairs (nid, data) instead. This would simplify the API and reduce the number of introduced types / interfaces (see #3 (comment))

I think clarifying the requirements on how the data should be serialized and sorted is favourable over adding serialization concerns into the NMTs API.

ref: #3 (comment)
related discussion: #3 (comment)

Remove NebulousLabs' merkletree dependency for proofs as well

#14 removed the dependency on NebulousLabs' implementation for computing the root of the tree and replaced it with a very simple recursive algorithm that boils down to a few lines of code (derived from the recursive definition of RFC 6962 and very similar to the simple Merkle tree implementation in Tendermint).

It would be nice if the same internal mechanism were used for generating/verifying proofs (instead of using NebulousLabs' range proofs).

Work around the panic thrown by IntervalDigestFromBytes

IntervalDigestFromBytes is meant to deserialize bytes commonly received from the wire, which are not guaranteed to pass the check that panics; an invalid network message can thus kill an application that uses the lib. We can either:

  1. Throw an error instead
  2. Force the lib user to do an additional check beforehand, or recover from the panic :)

But obviously, the first option is preferable.
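A sketch of option 1, assuming the min||max||digest layout and hypothetical field names:

// Error-returning variant of IntervalDigestFromBytes (sketch).
func IntervalDigestFromBytes(nIDLen int, digest []byte) (IntervalDigest, error) {
	if len(digest) < 2*nIDLen {
		return IntervalDigest{}, fmt.Errorf("invalid digest: expected at least %d bytes, got %d", 2*nIDLen, len(digest))
	}
	return IntervalDigest{
		min:    digest[:nIDLen],
		max:    digest[nIDLen : 2*nIDLen],
		digest: digest[2*nIDLen:],
	}, nil
}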

clarify go-routine safety

Most methods in the nmt are not go-routine safe. Investigate which need to be (if any) and clarify this in the go-docs.

related: #37

feat: compute leaf hashes immediately after data items are pushed

Problem

Identified as part of this EPIC: celestiaorg/celestia-app#1296.
When adding data items to the tree, their hashes are not calculated right away. Instead, every time we need to access the leaf hashes, we have to compare the number of data items added so far (the leaves field of NamespaceMerkleTree) with the leaf hashes (stored in the leafHashes field of the same struct). For example, see the computeRoot() method and the computeLeafHashesIfNecessary() function.
This can make debugging and code maintenance more challenging.

Acceptance criteria

To simplify the process, it would be better to hash the data items immediately after they are pushed. We can address this issue by implementing this feature in the Push method and refactoring the other parts of the code as necessary.
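A sketch of the idea inside Push, assuming the leaves and leafHashes fields mentioned above and a treeHasher with a HashLeaf method (details differ in the real code):

// Sketch: hash each leaf eagerly as it is pushed, so leafHashes always
// mirrors leaves and computeLeafHashesIfNecessary becomes unnecessary.
func (n *NamespacedMerkleTree) Push(ndata namespace.PrefixedData) error {
	if _, err := n.validateAndExtractNamespace(ndata); err != nil {
		return err
	}
	n.leaves = append(n.leaves, ndata)
	n.leafHashes = append(n.leafHashes, n.treeHasher.HashLeaf(ndata))
	return nil
}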

Min/max definitions in diagram in readme don't match specs and implementation

If I'm understanding the diagram (https://github.com/celestiaorg/nmt/blob/6d5b99ddb71ae8f8c11d7a5fedc3b84532e949a7/imgs/example_4-leaves.png) correctly, it seems to have mistakes in how the mins and maxes are calculated. For example:

min_2.0 = min(min_1.0, max_1.1)

Judging from the specs (https://github.com/celestiaorg/celestia-specs/blob/ec98170398dfc6394423ee79b00b71038879e211/src/specs/data_structures.md) and the implementation (

nmt/hasher.go

Lines 128 to 131 in 6d5b99d

leftMinNs, leftMaxNs := l[:n.NamespaceLen], l[n.NamespaceLen:flagLen]
rightMinNs, rightMaxNs := r[:n.NamespaceLen], r[n.NamespaceLen:flagLen]
minNs := min(leftMinNs, rightMinNs)
), it should instead say:

min_2.0 = min(min_1.0, min_1.1)

And analogously for all the other non-leaf min and max definitions in the diagram.

Enable markdownlint

[no change needed] but markdownlint will complain here because multiple H1 headings are used in the same document. Do we want to enable markdownlint in this repo and configure a CI check for it, to conform to other celestiaorg repos?

no change needed because this spec can be reviewed without conforming to markdownlint and we can tackle that in a separate issue / PR if desired.

Originally posted by @rootulp in #101 (comment)

nmt: prune's idxStart, idxEnd, maxWidth bounds aren't checked

This code in prune doesn't have sanity bounds checks, and these vectors cause panics:

prune(0, 0, 0)
prune(0, 1, 0)
prune(0, 1, 1)
prune(1, 0, 0)
$ go test -run=Prune -fuzz=^$ -short
--- FAIL: TestPrune (0.00s)
panic: runtime error: slice bounds out of range [:-1] [recovered]
	panic: runtime error: slice bounds out of range [:-1]

goroutine 18 [running]:
testing.tRunner.func1.2({0x12271e0, 0xc00012c018})
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1451 +0x24e
testing.tRunner.func1()
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1454 +0x39f
panic({0x12271e0, 0xc00012c018})
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/panic.go:884 +0x213
github.com/celestiaorg/nmt.prune(0x0, 0x0, 0x0)
	/Users/emmanuelodeke/go/src/github.com/celestiaorg/nmt/subrootpaths.go:52 +0x6e5
github.com/celestiaorg/nmt.TestPrune(0x0?)
	/Users/emmanuelodeke/go/src/github.com/celestiaorg/nmt/vuln_test.go:10 +0x45
testing.tRunner(0xc000116820, 0x12479f8)
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1501 +0x10b
created by testing.(*T).Run
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1548 +0x35f
exit status 2
FAIL	github.com/celestiaorg/nmt	0.369s

Define String representation of IntervalDigest

@liamsi suggested printing IntervalDigest as a CID in a private discussion; however, that may require an additional go-cid dependency for the lib. In any case, we should define a concrete representation to ensure proper test coverage in the parent repo.

Change NMT Leaf Node to Hash the NID

The implementation of the Namespaced Merkle Tree (NMT) leaf node must be changed to hash the namespace ID. In the new implementation, the leaf node 'node' of share data d should be structured as follows:

node.n_min = d.namespaceID
node.n_max = d.namespaceID
node.v = h(0x00, d.namespaceID, d.rawData)

See PR #203 opened for celestia-specs for the current structure of the NMT leaf node.
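A sketch of the proposed leaf hashing in Go, following the pseudocode above (the helper name is assumed; the output layout is n_min || n_max || v):

// Hypothetical leaf hasher implementing v = h(0x00, namespaceID, rawData)
// with n_min = n_max = namespaceID.
func hashLeaf(h hash.Hash, nID, rawData []byte) []byte {
	h.Reset()
	h.Write([]byte{0x00}) // leaf domain-separation prefix
	h.Write(nID)          // the namespace ID is now part of the hashed payload
	h.Write(rawData)
	res := make([]byte, 0, 2*len(nID)+h.Size())
	res = append(res, nID...) // n_min
	res = append(res, nID...) // n_max
	return append(res, h.Sum(nil)...)
}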
