nmt's People

Contributors

adlerjohn, chirag018, dependabot[bot], distractedm1nd, elias-orijtech, evan-forbes, ivan-gavran, jbowen93, liamsi, mattdf, msevey, odeke-em, preston-evans98, rach-id, rahulghangas, ramin, renaynay, rootulp, staheri14, vgonkivs, walldiss, wondertan

nmt's Issues

Add proof API

Something along the lines of:
fn GetWithProof(nid NamespaceID) -> (startIdx int, proof Proof, nodes []NamespacedDataOrHashes, ...)
and
fn VerifyProof(nid NamespaceID, startIdx int, proof Proof, nodes []NamespacedDataOrHashes, ...) -> bool

To build the proof, we need to rebuild (parts of) the tree. Hence, the above call also needs to be able to iterate over the leaf data / leaf hashes of the tree.
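For reference, a rough Go rendering of the proposed API could look like the following. This is a sketch only: the ProofAPI name and the [][]byte node representation are placeholders, not part of the library.

// Hypothetical Go rendering of the proposed proof API (names assumed).
type ProofAPI interface {
	// GetWithProof returns the leaf range for nID together with a proof and
	// the namespaced leaf data or hashes needed to verify it.
	GetWithProof(nID namespace.ID) (startIdx int, proof Proof, nodes [][]byte, err error)
	// VerifyProof checks the returned range and proof against a known root.
	VerifyProof(nID namespace.ID, startIdx int, proof Proof, nodes [][]byte, root []byte) bool
}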

Feature Request: document nmt proof format

Overview

Celestia-node now supports retrieving shares with proofs of inclusion 🎉.

Unfortunately, the format of the returned Merkle proof doesn't seem to be documented anywhere, which makes it difficult to verify. I'd suggest adding documentation (maybe here) explaining the format of the proof (is the array of siblings generated by an inorder traversal? preorder? postorder?) and demonstrating how to verify it using the Go implementation of the nmt.
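In the meantime, here is a minimal sketch of round-tripping a proof with the Go library, based on the API as it appears elsewhere in these issues (New, Push, ProveNamespace, VerifyNamespace); exact signatures may differ between versions:

package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/celestiaorg/nmt"
	"github.com/celestiaorg/nmt/namespace"
)

func main() {
	// 1-byte namespace IDs, sha256 as the underlying hash.
	tree := nmt.New(sha256.New(), nmt.NamespaceIDSize(1))
	for _, d := range [][]byte{
		append(namespace.ID{0}, []byte("leaf_0")...),
		append(namespace.ID{1}, []byte("leaf_1")...),
	} {
		if err := tree.Push(d); err != nil {
			panic(err)
		}
	}
	root := tree.Root()

	nID := namespace.ID{0}
	proof, err := tree.ProveNamespace(nID)
	if err != nil {
		panic(err)
	}
	// Verify the proof against the leaves returned for the namespace.
	fmt.Println("proof verifies:", proof.VerifyNamespace(sha256.New(), nID, tree.Get(nID), root))
}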

nmt: unnecessary and expensive power-of-2 exponentiation using math.Pow plus a uint(float64) cast, where a left bit shift (1<<N) suffices

This code here

nodeSpan := uint(math.Pow(float64(2), float64(treeDepth-i)))

performs exponentiation for powers of 2 in a very expensive and unnecessary way. It could simply be 1<<N, which makes for fast code. For example:

  • uint(math.Pow(float64(2), float64(8))) == 1<<8
  • uint(math.Pow(float64(2), float64(1))) == 1<<1
  • uint(math.Pow(float64(2), float64(12))) == 1<<12

Benchmarks

Run these benchmarks with go test -run=^$ -bench=Pow2 -count=10, then compare the two with benchstat:

var sink any

func BenchmarkPow2Naive(b *testing.B) {
        benchmarkPow2(b, func(i int) uint {
                return uint(math.Pow(float64(2), float64(i)))
        })
}

func BenchmarkPow2BitShift(b *testing.B) {
        benchmarkPow2(b, func(i int) uint {
                return 1 << i
        })
}

var cases = []struct {
        i    int
        want uint
}{
        {0, 1},
        {1, 2},
        {2, 4},
        {3, 8},
        {4, 16},
        {5, 32},
        {12, 4096},
        {20, 1048576},
        {24, 16777216},
        {31, 2147483648},
        {63, 9223372036854775808},
}

func benchmarkPow2(b *testing.B, fn func(int) uint) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                for _, tc := range cases {
                        n := fn(tc.i)
                        if n != tc.want {
                                b.Fatalf("Bad result: got=%d want=%d", n, tc.want)
                        }
                        sink = n
                }
        }

        if sink == nil {
                b.Fatal("Benchmark did not run")
        }
        sink = (any)(nil)
}

Results

$ benchstat before.txt after.txt 
name    old time/op    new time/op    delta
Pow2-8     322ns ± 2%     114ns ± 1%  -64.69%  (p=0.000 n=9+8)

name    old alloc/op   new alloc/op   delta
Pow2-8     40.0B ± 0%     40.0B ± 0%     ~     (all equal)

name    old allocs/op  new allocs/op  delta
Pow2-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)

Kindly cc @elias-orijtech @kirbyquerby @willpoint @jhusdero @musalbas @liamsi

Verification logic does not ensure lexicographical ordering

The NMT guarantees lexicographical ordering, and generated proofs follow this ordering as well. Currently, however, the verification logic does not enforce it. This might not be a big deal, since verification would fail anyway if the ordering differed, but it makes the code harder to reason about, and it is clearly better to enforce the ordering there too.

@musalbas suggested enforcing the ordering directly in the hasher. While it might seem better to follow the current pattern and fail much earlier (when adding data to the tree rather than only later when hashing/creating the root), enforcing the ordering while hashing is better from a consistency perspective. It would also immediately resolve the ordering guarantee during verification.

NamespacedMerkleTree.foundInRange should not narrow uint64 to int

In

nmt/nmt.go

Lines 238 to 245 in 6274243

func (n *NamespacedMerkleTree) foundInRange(nID namespace.ID) (bool, int, int) {
	// This is a faster version of this code snippet:
	// https://github.com/celestiaorg/celestiaorg-prototype/blob/2aeca6f55ad389b9d68034a0a7038f80a8d2982e/simpleblock.go#L106-L117
	foundRng, found := n.namespaceRanges[string(nID)]
	// XXX casting from uint64 to int is kinda crappy but nebolousLabs'
	// range proof api requires int params only to convert them to uint64 ...
	return found, int(foundRng.start), int(foundRng.end)
}

the XXX'ed comment correctly identifies the narrowing cast from uint64 to int as problematic. I suggest either (1) removing the casts and changing the return types to uint64, or (2) adding a range check and panicking on overflow. Given #15, and the fact that return types tend to spread to callers, I recommend (1), even if it's a bit more work.
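If (2) is chosen instead, a minimal sketch of the overflow guard could look like this (toInt is a hypothetical helper; math.MaxInt requires Go 1.17+):

// Sketch of option (2): fail loudly instead of silently truncating.
func toInt(u uint64) int {
	if u > uint64(math.MaxInt) {
		panic(fmt.Sprintf("index %d overflows int", u))
	}
	return int(u)
}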

Defining and Clarifying the Verification Process for NMT Proofs Generated from Non-Lexicographically Ordered Trees

Problem

The issue at hand pertains to the generation of NMTs from sets of leaves that are not ordered lexicographically based on their namespaces. The behavior of NMT proofs generated from such malformed trees is unclear and needs to be addressed. The main objectives of this issue are to:

  • Determine whether the current NMT proof verification system is able to detect out-of-order leaves and respond accordingly.
  • If the current system is not capable, then develop and implement measures to invalidate NMT proofs generated from malformed NMTs.

Replace the `IntervalDigest` struct with raw bytes

I'm wondering if we should get rid of this struct entirely and instead store it as raw bytes min||max||digest. For instance, when using this with ipfs, the CID will essentially be this plus some overhead bytes that stem from the multihash standard.

Originally posted by @liamsi in #35 (comment)

bug: Hasher is not go-routine safe

On lazyledger-core, we access the NMT's hasher (e.g. when hashing leaves) from several goroutines.

These will try to re-use the same hasher instance: https://github.com/lazyledger/nmt/blob/6dc97cc6f9fdd1c800108d2cab06a4d8b088ea8a/hasher.go#L98

We should create a new instance instead.

This might be related to: celestiaorg/celestia-core#330

At least tests with larger blocks panic with:

panic: d.nx != 0

goroutine 85 [running]:
crypto/sha256.(*digest).checkSum(0xc00043d9f0, 0x0, 0x0, 0x0, 0x0)
        c:/go/src/crypto/sha256/sha256.go:234 +0x1e0
crypto/sha256.(*digest).Sum(0xc00009eb00, 0x0, 0x0, 0x0, 0xc0, 0xb6, 0x0)
        c:/go/src/crypto/sha256/sha256.go:210 +0x70

refactor(share/ipld): Remove dependency on redundant namespaceID prepend

    After discussing with @Wondertan, we realized that https://github.com/celestiaorg/celestia-node/issues/183 can be tackled without modifying the NMT library at all (and instead by modifying this [visitor](https://github.com/celestiaorg/celestia-node/blob/ed4cdf03f2bd35c30d5b9874d4bc40bea6f5a061/ipld/nmt_adder.go#L34)). So we could put this on the back burner for now. 

I still think the suggested change makes sense. It would also allow us to delete the PrefixedData type entirely. But it might make sense to do this as part of a somewhat larger refactoring (which can/should be split into several PRs) where we also tackle #15. In its current form, and because of the usage of this interface, we still need to do a copy that merges the nid and the data. Hlib convinced me that it is better to hide this copying inside the NMT though. Currently, it happens in the wrapper, which is bad for usability. Especially as Hlib put it:

The reason is; right now, we can avoid storing unnecessary data, but it's hacky. Meaning that on Push we need to prepend, and on Visit remove that we prepend so we don't store it

Originally posted by @liamsi in #55 (comment)

Add possibility to store nodes of the tree (MapStore)

Also reuse stored hashes instead of recomputing them each time.

Copying the related review comment here:

There are other changes that we may need to make to the underlying merkletree implementation that may not be mergeable upstream. For example, we should probably change the implementation to store key=>value pairs of nodes in the tree using the same interface as the SMT MapStore, so that we can plug in other MapStores like an IPFS-based one, and reuse the same code: https://github.com/lazyledger/smt-ipfs-mapstore

This is something that needs to be investigated a bit though, as from my understanding the NebulousLabs merkletree library I think may compute the root on demand and doesn't keep around any of the old nodes in memory. Alternatively, it may be possible to implement this MapStore storage functionality within the NMT library, which uses the MapStore as "cache", and only invokes the NebulousLabs merkletree library to recompute roots if the node data is unavailable in the MapStore.

related to: #6
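For context, the SMT MapStore referenced above is roughly the following key/value abstraction (a sketch; see the lazyledger/smt repo for the authoritative definition):

// Rough shape of the SMT MapStore interface.
type MapStore interface {
	Get(key []byte) ([]byte, error)
	Set(key []byte, value []byte) error
	Delete(key []byte) error
}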

Export nmt-specific hashers

It might make sense to export the nmt-specific hashing currently "hidden" in the internal package:
https://github.com/lazyledger/nmt/tree/master/internal

This particularly makes sense in the light of the IPLD plugin. Instead of redefining the logic when adding a custom hasher (see: celestiaorg/celestia-core#155, https://github.com/lazyledger/go-multihash/pull/1 and in particular this (wip) code), we could just import the hashers from here instead. Besides exporting, this would require some minor changes to the hasher API so that it easily matches HashFunc.

The main benefit of doing this is that the logic is defined in one place only. As a result, we don't need to keep two implementations in sync. Furthermore, we do not need to make sure (e.g. via unit tests) that the hasher used in the nmt internally matches the one used to compute the Sum of an IPLD block (as the registered custom multihasher just imports the code from here).

Note that to use the hashers as described above, we would ideally register two multihashers instead of one (rather than looking at the prefix as we do currently).

New verify and create functions are needed for NamespaceMerkleTreeInclusionProof.

A new verify function is needed to verify a single share via a NamespaceMerkleTreeInclusionProof defined in celestia-specs.

This function should take (i) an NMT root of type namespace.IntervalDigest, (ii) a hash function of type hash.Hash, (iii) a proof of type NamespaceMerkleTreeInclusionProof, and (iv) a share of type Share, and output a boolean and an error, where (true, nil) means the share is committed to by the NMT root and the proof of type NamespaceMerkleTreeInclusionProof is correct.

Similarly, a new create function is needed to create a NamespaceMerkleTreeInclusionProof. This function should take the index of the Share inside the NMT and output the NamespaceMerkleTreeInclusionProof corresponding to the share.
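A hedged sketch of what these two functions could look like in Go; the Share and NamespaceMerkleTreeInclusionProof type names come from celestia-specs, while the interface and method names here are placeholders:

// Hypothetical API matching the prose above.
type ShareProofAPI interface {
	// VerifyShareInclusion returns (true, nil) iff share is committed to by
	// root and the given proof is correct.
	VerifyShareInclusion(root namespace.IntervalDigest, h hash.Hash, proof NamespaceMerkleTreeInclusionProof, share Share) (bool, error)
	// CreateShareInclusionProof builds the proof for the share at shareIndex.
	CreateShareInclusionProof(shareIndex int) (NamespaceMerkleTreeInclusionProof, error)
}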

fix: Resolving Inconsistency in the ProveNamespace Signature for Error Handling

Problem

Identified as part of this EPIC celestiaorg/celestia-app#1296.
This issue is to investigate and address the inconsistency in the ProveNamespace signature: the signature indicates that the method should return an error, but in the current implementation the error value is always nil.

Acceptance Criteria

Identify the cases where the method should return a proper error or remove the error from the method signature if no error conditions are perceived. Refactor the code, if necessary.

feat: verify the range of the namespace IDs of left and right nodes in the Hasher

Problem

The Hasher implementation does not verify the range of namespace IDs in the left and right nodes with respect to each other, which is crucial for proper ordering of leaves based on their namespace IDs and for detecting invalidly constructed trees and proofs.

Quoting from Lazy Ledger paper:

An adversarial consensus node may attempt to produce a block that contains a Merkle tree with children that are not ordered correctly. To prevent this, we can set a condition in nsHash such that there is no valid hash when leftMaxNs ≥ rightMinNs, and thus there would be no valid Merkle root for incorrectly ordered children

The issue was identified as part of celestiaorg/celestia-app#1296

Acceptance Criteria

  • Modify the Hasher implementation to incorporate this additional namespace ID range check for the supplied children nodes; see the sketch below. Possibly, the Hasher can return an error for invalid inputs. A subsequent refactor needs to take place to properly consume this change.
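A minimal sketch of the check, reusing the namespace ranges already extracted in hasher.go (the helper and the error are assumptions, not existing code):

// Hypothetical helper enforcing the condition quoted above: there is no
// valid hash when leftMaxNs >= rightMinNs.
func validateSiblingsOrder(leftMaxNs, rightMinNs []byte) error {
	if bytes.Compare(leftMaxNs, rightMinNs) >= 0 {
		return fmt.Errorf("unordered siblings: leftMaxNs %x >= rightMinNs %x", leftMaxNs, rightMinNs)
	}
	return nil
}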

Bug: Proofs of Absence can be Forged

Problem

The namespaced merkle tree exposes a VerifyNamespace method whose documentation states that it "verifies that the namespace is complete and no leaf of that namespace was left out in the proof."

This API allows consumers to write code like the following:

proof, namespaceData := getUntrustedProofFor(namespaceId)
// check that the namespaceData was all returned
if proof.VerifyNamespace(sha256.New(), namespaceId, namespaceData, n.Root()) {
    process(namespaceData)
} else {
    return errors.New("prover lied about namespace data")
}

Unfortunately, this code is vulnerable. A malicious prover can falsely claim that the namespace is empty and, as long as they also provide an empty proof, the claim will be accepted.

Test Case

A simple test case which reproduces the issue is as follows:

func TestNMT_forgeNamespaceEmptinessProof(t *testing.T) {
	data := [][]byte{
		append(namespace.ID{0}, []byte("leaf_0")...),
		append(namespace.ID{0}, []byte("leaf_1")...),
		append(namespace.ID{1}, []byte("leaf_2")...),
		append(namespace.ID{1}, []byte("leaf_3")...)}
	// Init a tree with the namespace size as well as
	// the underlying hash function:
	tree := New(sha256.New(), NamespaceIDSize(1))
	for _, d := range data {
		if err := tree.Push(d); err != nil {
			panic(fmt.Sprintf("unexpected error: %v", err))
		}
	}

	root := tree.Root()
	actualLeaves := tree.Get(namespace.ID{0})
	if len(actualLeaves) == 0 {
		t.Fatalf("Get(namespace.ID{0}) should have returned two leaves but returned none.")
	}

	forgedProof := Proof{
		start:                   0,
		end:                     0,
		nodes:                   [][]byte{},
		leafHash:                []byte{},
		isMaxNamespaceIDIgnored: true,
	}

	forgedProofSuccess := forgedProof.VerifyNamespace(sha256.New(), namespace.ID{0}, [][]byte{}, root)
	if forgedProofSuccess {
		t.Fatalf("Successfully verified proof that non-empty namespace was empty")
	}
}

Root Cause

This behavior is introduced by the following condition here:

// empty range, proof, and data: always checks out
if len(data) == 0 && isEmptyRange && len(proof.nodes) == 0 {
    return true
}

Suggested Solution

Currently, the handling of empty range proofs is not well defined. I propose rejecting all proofs that do not contain at least one node in either the proof.nodes or data arguments, unless the tree is empty. (Without this special case, it would be impossible to check range proofs against the empty tree.) This will change the semantics of the NMT, causing some existing test cases to fail.
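Under that proposal, the condition quoted above might be tightened to something like the following (emptyTreeRoot is a hypothetical precomputed constant for the root of the empty tree):

// Sketch: only accept a fully empty proof against the empty tree's root.
if len(data) == 0 && isEmptyRange && len(proof.nodes) == 0 {
    return bytes.Equal(root, emptyTreeRoot)
}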

Incorporating test-vectors in the NMT specs

Problem

EPIC: celestiaorg/celestia-app#1296.

Based on the comments by @liamsi in #101 (review)

It would be beneficial to include an appendix of realistic test vectors in the NMT spec for implementers to use when verifying their implementation. These test vectors should specifically cater to the nmt implementation for Celestia and match its parameters.

Acceptance Criteria

Once there is complete confidence in the implementation, proper test vectors shall be generated and added to the specs.

Implement algorithm for subtree root paths

Spec:

Pure function that takes arguments: square size, share index start, and share length, and returns a minimal path to the subtree root that encompasses that entire range, with the path starting from the nearest row root.

Vulnerability in VerifyNamespace: Partial Absence Proofs

Problem

Identified as part of EPIC celestiaorg/celestia-app#1296.
The current implementation of VerifyNamespace is vulnerable to an issue in which a queried node can act inconsistently with the protocol specifications when a queried namespace ID has no corresponding items in the tree (i.e., an absence proof). The attack/issue involves providing a partial absence proof for a namespace ID while remaining undetected by the verification implementation. A partial proof here refers to a proof from an ancestor of a leaf to the root, rather than a proof of inclusion of the leaf itself. This is possible because the implementation cannot distinguish between a full and a partial proof.

Note: Although VerifyNamespace is unable to detect partial proofs, the soundness of the namespace proof is not compromised. In other words, a malicious node cannot convince someone of the absence of a namespace ID that actually exists in the tree. However, the inability to detect this inconsistent behavior may lead to incorrect conclusions at the application layer about the underlying data represented by the NMT. The impact of this issue depends on the context in which the NMT is used and may or may not be considered an attack. Nevertheless, it is important to address it properly, as it currently causes inconsistency and ambiguity in the verification process.
Update: I suspect that the current implementation may be intentional, to help with the non-interactive default rules for inclusion proofs of Blobs. Even if this is the case, the specs need to be updated to make it clear whether partial absence proofs are considered valid.

Root Cause:
This is because VerifyNamespace considers neither the height of the tree nor the total number of leaves during verification. Furthermore, the absence proof is based solely on the leaf hash and not on its underlying data. As a result, any internal (non-leaf) node can be used as a leaf hash in the proof.

Example:
Below is an example of such a partial proof that can be successfully verified by the VerifyNamespace algorithm.
Description of the example: A correct absence proof for nID=1 would consist of the following: proof.start=1, proof.end=2, proof.leafHash=leaf1, and proof.nodes=[leaf0, hash2].
However, it is possible to construct a partial proof that is also considered correct by the VerifyNamespace function.
This alternate proof would have proof.start=0, proof.end=1, proof.leafHash=hash1, and proof.nodes=[hash2].
[Figure: a four-leaf tree; leaves leaf0..leaf3 at the bottom, internal nodes hash1 = parent(leaf0, leaf1) and hash2 = parent(leaf2, leaf3), and the root at the top.]

nID := namespace.ID{1}
// create a tree with 4 leaves; variable names match the figure above
n := New(sha256.New(), NamespaceIDSize(1))
d0 := append([]byte{0}, []byte("data0")...)
d1 := append([]byte{2}, []byte("data1")...)
d2 := append([]byte{3}, []byte("data2")...)
d3 := append([]byte{4}, []byte("data3")...)

n.Push(d0)
n.Push(d1)
n.Push(d2)
n.Push(d3)

// this will populate n.leafHashes with the hash of d0...d3
n.computeLeafHashesIfNecessary()

root := n.Root()
hash1 := n.computeRoot(0, 2)
hash2 := n.computeRoot(2, 4)
leaf0 := n.leafHashes[0]
leaf1 := n.leafHashes[1]

// attack scenario: create a partial proof from hash1 to the root instead of leaf1 to the root
partialProof := Proof{start: 0, end: 1, leafHash: hash1, nodes: [][]byte{hash2}, isMaxNamespaceIDIgnored: true}
// run VerifyNamespace for the fabricated absence proof and see if it can pass
if partialProof.VerifyNamespace(sha256.New(), nID, nil, root) {
    fmt.Println("partial proof is successfully verified")  // this will be executed
} else {
    fmt.Println("verification of the partial proof failed")
}

// correct scenario: create a full proof from leaf1 to the root
fullProof := Proof{start: 1, end: 2, leafHash: leaf1, nodes: [][]byte{leaf0, hash2}}
if fullProof.VerifyNamespace(sha256.New(), nID, nil, root) {
    fmt.Println("full proof is successfully verified")  // this will be executed
} else {
    fmt.Println("verification of the full proof failed") 
}

Proposed Solutions

One of the following options will solve this issue:

  • If the current behaviour does not cause serious issues in the wider system, then we should consider the current behavior of VerifyNamespace valid and include it in the specification.
  • For an absence proof, we should return the underlying leaf data instead of the leaf hash, so that internal nodes won't be acceptable in place of leaf nodes. Please see this comment for the rationale behind this solution.
  • To prevent partial proofs from being successfully verified, we can modify the implementation of VerifyNamespace by adding a new parameter that takes the size of the tree, i.e., the number of data items it represents. This parameter would be used to adjust the verification process to account for the tree size. By doing so, we can ensure that partial proofs do not comply with the height of the tree, since they include fewer nodes than a full proof requires.

namespace: PrefixedData/PrefixedData8 are easy to misuse

The PrefixedData and PrefixedData8 are both byte slices:

// PrefixedData simply represents a slice of bytes which consists of
// a namespace.ID and raw data.
// The user has to guarantee that the bytes are valid namespace prefixed data.
// Go's type system does not allow enforcing the structure we want:
// [namespaceID, rawData ...], especially as this type does not expect any
// particular size for the namespace.
type PrefixedData []byte

// PrefixedData8 like PrefixedData is just a slice of bytes.
// It assumes that the slice it represents is at least 8 bytes.
// This assumption is not enforced by the type system though.
type PrefixedData8 []byte

The definitions are problematic because (1) casting byte slices may violate the type invariants (as pointed out in the comments), (2) they're mutable, and (3) PrefixedData has no way to indicate the namespace identifier length. For an example where mutability is a problem, see

nmt/nmt.go

Lines 253 to 257 in 6274243

// Push adds data with the corresponding namespace ID to the tree.
// Returns an error if the namespace ID size of the input
// does not match the tree's NamespaceSize() or the leaves are not pushed in
// order (i.e. lexicographically sorted by namespace ID).
func (n *NamespacedMerkleTree) Push(namespacedData namespace.PrefixedData) error {

where it is not clear to the caller that the parameter namespacedData is retained by the Push method.

The missing namespace length is problematic for validateAndExtractNamespace,

nmt/nmt.go

Lines 336 to 340 in 6274243

func (n *NamespacedMerkleTree) validateAndExtractNamespace(ndata namespace.PrefixedData) (namespace.ID, error) {
	nidSize := int(n.NamespaceSize())
	if len(ndata) < nidSize {
		return nil, fmt.Errorf("%w: got: %v, want >= %v", ErrMismatchedNamespaceSize, len(ndata), nidSize)
	}

because it can only validate that the namespace+data total length is at least as long as the expected namespace length.

Perhaps the definitions of PrefixedData and PrefixedData8 are caused by wanting to support zero-copy and zero-allocation use, where the user generates PrefixedData values by slicing existing byte slices. If so, the behaviour of Push should document that argument values are retained.

Otherwise, I recommend changing the definition to hide the internal structure, and adding validating constructors. There are several possibilities:

type PrefixedData struct {
    // namespace and data can be sliced from the same
    // underlying []byte allocation.
    namespace []byte
    data []byte
}

func NewPrefixedData(namespace, data []byte) PrefixedData {
    bytes := make([]byte, len(namespace)+len(data))
    d := PrefixedData{
        namespace: bytes[:len(namespace)],
        data: bytes[len(namespace)],
    }
    copy(d.namespace, namespace)
    copy(d.data, data)
    return d
}

or equivalently

type PrefixedData struct {
    namespaceAndData []byte
    namespaceLen int
}
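A matching validating constructor for this second variant could look like the following sketch:

func NewPrefixedData(namespace, data []byte) PrefixedData {
    buf := make([]byte, 0, len(namespace)+len(data))
    buf = append(buf, namespace...)
    buf = append(buf, data...)
    return PrefixedData{namespaceAndData: buf, namespaceLen: len(namespace)}
}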

@odeke-em

Automated CHANGELOG.md check for nmt

There should be an automated step that ensures that a CHANGELOG.md file is appropriately updated for all merges to the main branch.

There should be a proper description in an issue and corresponding PR as well as a summary in the CHANGELOG.md file. This will be enforced through an automated action. In rare cases for small PRs it may be acceptable to skip this.

Make `IntervalDigest` json encodable

Some tests in lazyledger-core will require DataAvailabilityHeader to be encoded via json.

This issue needs to be solved before #312 can be merged.

The json representation of IntervalDigest might be useful for #21 as well.
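One possible shape for the encoding, assuming the min||max||digest layout discussed elsewhere in these issues and hypothetical field names:

// Sketch of a JSON encoding for IntervalDigest (field names assumed;
// []byte values marshal as base64 with encoding/json).
func (d IntervalDigest) MarshalJSON() ([]byte, error) {
	return json.Marshal(struct {
		Min    []byte `json:"min"`
		Max    []byte `json:"max"`
		Digest []byte `json:"digest"`
	}{d.min, d.max, d.digest})
}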

reconsider Push to take in the pair (nid, data) as one argument instead

Push currently looks like this:
https://github.com/lazyledger/nmt/blob/e0a317aa2d0ff84fcf1478527e4c90e6a13245d7/nmt.go#L249

The reason for this is that, in a few parts of the code, we need to operate on the namespaceID independently. Also, at the very beginning of the implementation, I did not fully understand that having a one-to-one mapping of one NMT <-> one namespace size would be the only reasonable use-case. The implementation already stores the namespace size and could internally split and merge the passed-in argument into (nid, data) and back.

The only downside is that the API looks a bit less clear. This can easily be mitigated by a well-documented type alias: type NamespacedData []byte.

I'm proposing to change Push to func (n *NamespacedMerkleTree) Push(ndata NamespacedData) error.

This would also slightly simplify the wrapper that is needed for the rsmt2d library (see #25 ).

api: `Push` for data/shares with decoupled namespace

Currently, tree.Push requires passing namespace.PrefixedData, which is a byte slice with a namespace.ID prepended. However, there is a use case where the pushed data does not need to be prepended with a namespace.ID, and the namespace can instead be passed as a separate field. This is the case where we push parity data generated via rsmt2d. Currently, we prepend the parity namespace to the parity shares, which is not required. This is one of the issues causing us to store an additional 8 bytes of parity namespace per parity share via NMTWrapper, as outlined in celestiaorg/celestia-node#183. I propose adding an additional method like PushWith to the nmt.Tree to allow passing a namespace decoupled from the share data, enabling the mentioned disk usage optimization.
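A minimal sketch of the proposed method, expressed here as a wrapper over the existing Push (the PushWith name comes from this issue; a real implementation would avoid the allocation and copy entirely):

// Hypothetical PushWith: accepts the namespace separately from the data.
func PushWith(n *nmt.NamespacedMerkleTree, nID namespace.ID, data []byte) error {
	ndata := make([]byte, 0, len(nID)+len(data))
	ndata = append(ndata, nID...)
	ndata = append(ndata, data...)
	return n.Push(ndata)
}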

feat: expose methods to get min and max namespace IDs of an NMT

Problem

Currently, there is no exposed method in the nmt API to get the min and max namespace IDs of an NMT. To do so, one needs to compute the root and then make another function call to extract the min and max namespace IDs encoded in the root.

Acceptance Criteria

  • Augment the current NMT API with getMinNamespace and getMaxNamespace as methods of the NamespaceMerkleTree struct. Proper unit tests should be included as well.
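A sketch of the requested accessors, derived from the min||max||digest encoding of the root (the exported names and the raw-bytes root type are assumptions):

// Sketch: read the min/max namespace IDs off the serialized root.
func (n *NamespacedMerkleTree) MinNamespace() namespace.ID {
	ns := int(n.NamespaceSize())
	return namespace.ID(n.Root()[:ns])
}

func (n *NamespacedMerkleTree) MaxNamespace() namespace.ID {
	ns := int(n.NamespaceSize())
	return namespace.ID(n.Root()[ns : 2*ns])
}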

feat: Handling Panics in HashNode Method

Problem

Identified as part of this EPIC: celestiaorg/celestia-app#1296.
The HashNode method of the NMT may panic if it receives invalid or malformed inputs (as mentioned by @evan-forbes in this comment #102 (comment)). For example, the left or right children may not conform to the namespace hash format (i.e., their length is less than the namespace ID size), or the relation between the left and right children's namespace ranges may be invalid. If an invalid node is sent to a light node, it can result in a crash.

It is important to note that returning an error from this method is not possible, as it would violate the interfaces it is intended to fulfill, i.e., NodeHasher and Hash.

Acceptance Criteria

This issue is to implement a proper error handling mechanism for HashNode.

api: Make the api consistent

When we push data to the tree, we are using prefixed data, but when we verify proofs, we are using non-prefixed data. Whatever the API ends up being, we should probably make it consistent.

This also applies to creating and verifying proofs. When creating proofs, we have two different functions: one for creating a ranged proof, and another for creating a proof for a single leaf. When verifying proofs, we use only a single function to verify proofs of a single leaf or a range of leaves. Whatever we choose, we should probably make this consistent as well.

relevant comment #58 (comment)

related to #55

Consider removing abstractions in favour of clarifying requirements to data serialization

The following might result in a slightly simpler API:

get rid of the namespaced data abstraction altogether and change the Push method to take in raw byte pairs (nid, data) instead. This would simplify the API and reduce the number of introduced types / interfaces (see #3 (comment))

I think clarifying the requirements on how the data should be serialized and sorted is favourable over adding serialization concerns into the NMTs API.

ref: #3 (comment)
related discussion: #3 (comment)

Remove NebulousLabs' merkletree dependency for proofs as well

#14 removed the dependency on NebulousLabs' implementation for computing the root of the tree and replaced it with a very simple recursive algorithm that boils down to a few lines of code (derived from the recursive definition of RFC 6962 and very similar to the simple Merkle tree implementation in Tendermint).

It would be nice if the same internal mechanism were used for generating/verifying proofs (instead of using NebulousLabs' range proofs).

Work around the panic thrown by IntervalDigestFromBytes

IntervalDigestFromBytes is meant to deserialize bytes commonly received from the wire, which are not guaranteed to pass the check that panics; an invalid network message can thus kill an application that uses the lib. We can either:

  1. Throw an error instead
  2. Force the lib user to do an additional check beforehand, or recover from the panic :)

But obviously, the first option is preferable.
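A sketch of option 1, assuming the min||max||digest layout and hypothetical field names:

// Error-returning variant of IntervalDigestFromBytes (sketch).
func IntervalDigestFromBytes(nIDLen int, digest []byte) (IntervalDigest, error) {
	if len(digest) < 2*nIDLen {
		return IntervalDigest{}, fmt.Errorf("invalid digest: expected at least %d bytes, got %d", 2*nIDLen, len(digest))
	}
	return IntervalDigest{
		min:    digest[:nIDLen],
		max:    digest[nIDLen : 2*nIDLen],
		digest: digest[2*nIDLen:],
	}, nil
}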

clarify go-routine safety

Most methods in the nmt are not go-routine safe. Investigate which need to be (if any) and clarify this in the go-docs.

related: #37

feat: compute leaf hashes immediately after data items are pushed

Problem

Identified as part of this EPIC: celestiaorg/celestia-app#1296.
When adding data items to the tree, their hashes are not calculated right away. Instead, every time we need to access the leaf hashes, we have to compare the number of data items added so far (the leaves field of NamespaceMerkleTree) with the leaf hashes (stored in the leafHashes field of the same struct). For example, see the computeRoot() method and the computeLeafHashesIfNecessary() function.
This can make debugging and code maintenance more challenging.

Acceptance criteria

To simplify the process, it would be better to hash the data items immediately after they are pushed. We can address this issue by implementing this feature in the Push method and refactoring the other parts of the code as necessary.
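A sketch of the idea inside Push, assuming the leaves and leafHashes fields mentioned above and a treeHasher with a HashLeaf method (details differ in the real code):

// Sketch: hash each leaf eagerly as it is pushed, so leafHashes always
// mirrors leaves and computeLeafHashesIfNecessary becomes unnecessary.
func (n *NamespacedMerkleTree) Push(ndata namespace.PrefixedData) error {
	if _, err := n.validateAndExtractNamespace(ndata); err != nil {
		return err
	}
	n.leaves = append(n.leaves, ndata)
	n.leafHashes = append(n.leafHashes, n.treeHasher.HashLeaf(ndata))
	return nil
}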

Min/max definitions in diagram in readme don't match specs and implementation

If I'm understanding the diagram (https://github.com/celestiaorg/nmt/blob/6d5b99ddb71ae8f8c11d7a5fedc3b84532e949a7/imgs/example_4-leaves.png) correctly, it seems to have mistakes in how the mins and maxes are calculated. For example:

min_2.0 = min(min_1.0, max_1.1)

Judging from the specs (https://github.com/celestiaorg/celestia-specs/blob/ec98170398dfc6394423ee79b00b71038879e211/src/specs/data_structures.md) and the implementation (

nmt/hasher.go

Lines 128 to 131 in 6d5b99d

leftMinNs, leftMaxNs := l[:n.NamespaceLen], l[n.NamespaceLen:flagLen]
rightMinNs, rightMaxNs := r[:n.NamespaceLen], r[n.NamespaceLen:flagLen]
minNs := min(leftMinNs, rightMinNs)
), it should instead say:

min_2.0 = min(min_1.0, min_1.1)

And analogously for all the other non-leaf min and max definitions in the diagram.

Enable markdownlint

[no change needed] but markdownlint will complain here because multiple H1 headings are used in the same document. Do we want to enable markdownlint in this repo and configure a CI check for it, to conform to other celestiaorg repos?

no change needed because this spec can be reviewed without conforming to markdownlint and we can tackle that in a separate issue / PR if desired.

Originally posted by @rootulp in #101 (comment)

nmt: prune's idxStart, idxEnd, maxWidth bounds aren't checked

This code in prune doesn't have sanity bounds checks, and these vectors cause panics:

prune(0, 0, 0)
prune(0, 1, 0)
prune(0, 1, 1)
prune(1, 0, 0)
$ go test -run=Prune -fuzz=^$ -short
--- FAIL: TestPrune (0.00s)
panic: runtime error: slice bounds out of range [:-1] [recovered]
	panic: runtime error: slice bounds out of range [:-1]

goroutine 18 [running]:
testing.tRunner.func1.2({0x12271e0, 0xc00012c018})
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1451 +0x24e
testing.tRunner.func1()
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1454 +0x39f
panic({0x12271e0, 0xc00012c018})
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/panic.go:884 +0x213
github.com/celestiaorg/nmt.prune(0x0, 0x0, 0x0)
	/Users/emmanuelodeke/go/src/github.com/celestiaorg/nmt/subrootpaths.go:52 +0x6e5
github.com/celestiaorg/nmt.TestPrune(0x0?)
	/Users/emmanuelodeke/go/src/github.com/celestiaorg/nmt/vuln_test.go:10 +0x45
testing.tRunner(0xc000116820, 0x12479f8)
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1501 +0x10b
created by testing.(*T).Run
	/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/testing/testing.go:1548 +0x35f
exit status 2
FAIL	github.com/celestiaorg/nmt	0.369s

Define String representation of IntervalDigest

@liamsi suggested printing IntervalDigest as a CID in a private discussion; however, that may require an additional go-cid dependency for the lib. In any case, we should define a concrete representation to ensure proper test coverage in the parent repo.

Change NMT Leaf Node to Hash the NID

The implementation of the Namespaced Merkle Tree (NMT) leaf node must be changed to hash the namespace ID. In the new implementation, the leaf node 'node' of share data d should be structured as follows:

node.n_min = d.namespaceID
node.n_max = d.namespaceID
node.v = h(0x00, d.namespaceID, d.rawData)

See PR #203 opened for celestia-specs for the current structure of the NMT leaf node.
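A sketch of the proposed leaf hashing in Go, following the pseudocode above (the helper name is assumed; the output layout is n_min || n_max || v):

// Hypothetical leaf hasher implementing v = h(0x00, namespaceID, rawData)
// with n_min = n_max = namespaceID.
func hashLeaf(h hash.Hash, nID, rawData []byte) []byte {
	h.Reset()
	h.Write([]byte{0x00}) // leaf domain-separation prefix
	h.Write(nID)          // the namespace ID is now part of the hashed payload
	h.Write(rawData)
	res := make([]byte, 0, 2*len(nID)+h.Size())
	res = append(res, nID...) // n_min
	res = append(res, nID...) // n_max
	return append(res, h.Sum(nil)...)
}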
