Coder Social home page Coder Social logo

Comments (15)

liamsi avatar liamsi commented on August 15, 2024 2

Regarding retrieving the leaves: the bug was indeed in here (cc @musalbas): https://github.com/lazyledger/go-multihash/blob/a31dec8c92fa0f526d20fbc54572953d5a08e03d/sum.go#L202-L207
These two commits (in combination) fixed it: https://github.com/lazyledger/go-multihash/commit/1743be8d8a5bf1b66b1910780599cef5313b61d8 and https://github.com/lazyledger/go-multihash/commit/88bad1265973e3d986433b60817d2e49a72812f3

Summary: when hashing the leaf of the form nid || rawData inside the NMT, we return nid || nid || sha256(leafPrefix || rawData)but as an IPFS block for a leaf we return leafPrefix || nid || rawData here. And the custom multihasher did not reflect that exact behaviour of the NMT previously but instead returned nid || nid || sha256(leafPrefix || nid || rawData) (need to add more tests).

IMO, two custom hasher would be a bit cleaner. Also moving the hashing code to the NMT repo and just registering the hasher from ll-core (instead of forking go-multihash) would indeed be preferable too (as @Wondertan suggested above).

from celestia-core.

Wondertan avatar Wondertan commented on August 15, 2024 1

Way fork MH if you could have just registered new hash func?

https://github.com/multiformats/go-multihash/blob/master/sum.go#229

from celestia-core.

Wondertan avatar Wondertan commented on August 15, 2024 1

@liamsi, can you pls specify all the use cases for IPFS you are willing to have for lazyledger? I need that to understand the picture widely and to propose other possible workarounds not to introduce, IMO, redundant forks.

from celestia-core.

Wondertan avatar Wondertan commented on August 15, 2024 1

@liamsi, the issue you’ve mentioned is probably enough, and I should have read it before, as I did some wrong assumptions regarding your use case and, therefore, gave slightly wrong suggestions. Going to expand on this soon.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

I'll look into adding a custom multihasher to the plugin. If that does not work, we might need to fork bitswap/ipfs (I hope we can avoid that though).

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Unfortunately, adding a custom multihash isn't just doable via a plugin it seems. Several places check for particular good / valid hashes: e.g. Encode validates against a list of known codes:
https://github.com/multiformats/go-multihash/blob/6f1ea18f1da5f7735ea31b5e2011da61c409e37f/multihash.go#L269-L270
https://github.com/multiformats/go-multihash/blob/6f1ea18f1da5f7735ea31b5e2011da61c409e37f/multihash.go#L38

This small library has a stricter list of allowed hashes:
https://github.com/ipfs/go-verifcid/blob/a9d170f0d9246da3a1512396520b0d4eb7602c67/validate.go#L15-L31
and is called on pinning, garbage collection, inside the blockservice, and quite a few other places.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Also, I'm wondering if we need to add two custom multihashes then. The leaf and the inner nodes are handled differently in terms of the namespaces that get in front of the sha256 hash in the end. It feels a bit odd as other hashers are just this, hash functions that do know nothing about what they are hashing, but here we do need to make a difference for leaves and inner nodes. Additionally, we'd need to "ignore" the max namespace as this is used for the parity shares.

Basically, we'd want these two "hashers":
https://github.com/lazyledger/nmt/blob/6e8a6a5ec3fcae4015ca5dd1923ce033a0458679/internal/hasher.go#L48-L107

from celestia-core.

musalbas avatar musalbas commented on August 15, 2024

Can't you assign different prefixes to inner node and leaf data? Then the hash function will know what it's hashing. That's what NebulousLabs Merkle tree does.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Can't you assign different prefixes to inner node and leaf data?

Yeah, we currently do this anyway (in the NMT as well as in the plugin when returning RawData).
Yes, we could read that prefix byte and hash accordingly adding only one hasher. I think it's a bit cleaner to have two different hashers instead of one that parses the input and decides how to hash it according to the embedded prefix. Also, @Wondertan suggested introducing two different Codecs for inner and leaf nodes and not make that byte part of that RawData here; in that case we'd definitely need two hashers).

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Here is a quick and dirty implementation of only using one hash function: https://github.com/lazyledger/go-multihash/blob/a31dec8c92fa0f526d20fbc54572953d5a08e03d/sum.go#L184

IMO, we should move most of the logic of that code to another repo and the hash function should be a few lines like the other ones in that file. That would be another good argument for two (instead of one) mulihashers, it would make these two functions even a bit cleaner.

For now, I'll keep the above version and see if everything works as expected (over the network). We can iterate on it if it does.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Experimented with using one mh in #155 which additionally required these changes:
https://github.com/lazyledger/go-multihash/pull/1
celestiaorg/go-ipfs#1
celestiaorg/go-verifcid#1

I can confirm that this (almost) works: querying for namespaced CID works, retrieving the leaf data doesn't.
For example:
ipfs dag get bagao4amb5yataazahlu5shzrnd5enehjoa6mefxqvg4nav537bp6xje5ynoa3q2yjojhnumb4afjcqvs7kv6ufho3m works as expected but actually going down the path to the leaf via
ipfs dag get bagao4amb5yataazahlu5shzrnd5enehjoa6mefxqvg4nav537bp6xje5ynoa3q2yjojhnumb4afjcqvs7kv6ufho3m/0/0/0/0/0 times out (or doesn't return).

Would expect sth like below as output instead (output from node that has the data locally and doesn't need to request it)::

Click here for output:
$ipfs dag get bagao4amb5yataazahlu5shzrnd5enehjoa6mefxqvg4nav537bp6xje5ynoa3q2yjojhnumb4afjcqvs7kv6ufho3m/0/0/0/0/0
{"Data":"AyA66dkfMWg7d7ZrUXE1YYwJYDrYfwnJkWC9GOrQD7/Ou/gGQS8ld/nIM8O64ZK+jcvS9TXLIVuxlR47VeCgBJRxwpkU2XOMIiWbySpcwYl2mEKqyfAsj7Qa/xTyNEKlNfXVMRkWJGvO2NfzwlnJk9DH7G82MHitMMZUQM1vjw1NvTVn25adRqgaivg58BsCuAkw31EDiv8mcwfLBVyYEIc9UIxXDXpaBB5v5wcCBJYLvkRGBOrVbzer/vUdGIsj9DHSNxhkuUZGBoaiNgIaanrYpnCWF5iOsb5kCU4hl2QufjqrMexz9aVcIO/shcu32i3a4faivf0ATAxvcY1KVBPYkurwTkjL"}

The latter might be bc how we store the data in the leaf directly (instead of just the hashes and having one additional link), or, bc I've introduced some subtle bug when adding the mh.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Probably for no good reason. It just seemed the most consistent approach, as we definitely need to fork go-verifcid to add the custom multihash to the "good set": https://github.com/lazyledger/go-verifcid/blob/master/validate.go (note that this also means we need to fork go-ipfs as it uses go-verifcid internally).
And go-verifcid uses go-multihash to add the other hashes. So if we ever want to upstream this all (surely in a cleaner version), we'd likely want to fork go-multihash first. If we can avoid forking go-multihash for now, that would certainly be preferable as each fork comes with a non-negligible maintenance overhead... I'll spend the 2nd half of this week on this. Though the priority will be to understand what is going on with the leaves / retrieving the leaf data first.

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

Yes, happy to summarize it here shortly. Will try do so before end of the week. In the meantime, a slightly out of date and relatively vague (and probably confusing) description can be found here: #35

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

To very briefly summarize, we will use IPFS in the following cases:

  1. as a (LL) block store, which means storing the erasure coded version of blocks (see the image below)
  2. for Data Availability proofs: sample cells from the erasure coded version of a block (see image below), e.g. retrieve the data of CID/0/1/1/0/0 of a row or column tree and validate that it checks out with with the roots in the DataAvailabilityHeader

Ideally, we want the hashes in above data availability root to match with the CIDs (it's OK that IPFS adds a few bytes of overhead to identify the multihash etc.). That is why we want to include the namespaces in the CIDs. Also, that's why I think your suggestion to add one or multiple custom multihashers was exactly right. Actually, I do not see a way around this now. While I could get around adding a custom multihash when initially using a simple merkle tree for some very first experiments, we can't do the same if we flag the resulting sha256 hash with the min/max namespace (which we want to do as explained above).

Looking forward to your input!

On a side note, we will also use a DHT to find nodes that stored data for a particular namespace. But I have not yet thought this through but we will likely use a separate protocol on top (using libp2p).

image

from celestia-core.

liamsi avatar liamsi commented on August 15, 2024

On a side note, we will also use a DHT to find nodes that stored data for a particular namespace. But I have not yet thought this through but we will likely use a separate protocol on top (using libp2p).

As @musalbas mentioned on discord, for a start we can simply use the same mechanism as for DA proofs:

The client does a "search " of tree on IPFS to find the leafs for a namespace, which should be O(log(n))
As each node has its min and max namespace id

from celestia-core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.