
rust-multiaddr's Introduction

rust-multiaddr


multiaddr implementation in Rust.


Install

First, add this to your Cargo.toml:

[dependencies]
multiaddr = "*"

Then run cargo build.

Usage

use multiaddr::{multiaddr, Multiaddr};

fn main() {
    // Parse the human-readable form.
    let address: Multiaddr = "/ip4/127.0.0.1/tcp/1234".parse().unwrap();
    // Or build an address with the multiaddr! macro.
    let other = multiaddr!(Ip4([127, 0, 0, 1]), Udp(10500u16), QuicV1);

    assert_eq!(address.to_string(), "/ip4/127.0.0.1/tcp/1234");
    assert_eq!(other.to_string(), "/ip4/127.0.0.1/udp/10500/quic-v1");
}

Maintainers

Captain: @dignifiedquire.

Contribute

Contributions welcome. Please check out the issues.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT © 2015-2017 Friedel Ziegelmeyer


rust-multiaddr's Issues

Purpose of Arc in Multiaddr

I was looking into why Arc is used after an overnight test showed additional allocations tracing back to the Arc in Multiaddr (probably unrelated, though it did catch my eye). The commit that introduced the change did not give a reason for it. Looking over the code, removing the Arc should work fine and would not be a breaking change.
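One plausible motivation for the Arc is cheap cloning. This is an assumption, because the commit does not say. Cloning an `Arc<Vec<u8>>` only increments a reference count, while cloning a `Vec<u8>` allocates and copies. The sketch below compares the two layouts. `SharedAddr` and `OwnedAddr` are hypothetical stand-ins, not the crate's actual `Multiaddr`.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for two possible Multiaddr layouts.
#[derive(Clone)]
struct SharedAddr {
    bytes: Arc<Vec<u8>>, // clone = reference-count increment, no new buffer
}

#[derive(Clone)]
struct OwnedAddr {
    bytes: Vec<u8>, // clone = new heap allocation plus a copy
}

fn main() {
    let shared = SharedAddr { bytes: Arc::new(vec![0x04, 127, 0, 0, 1]) };
    let shared_copy = shared.clone();
    // Both handles point at the same buffer.
    assert!(Arc::ptr_eq(&shared.bytes, &shared_copy.bytes));
    assert_eq!(Arc::strong_count(&shared.bytes), 2);

    let owned = OwnedAddr { bytes: vec![0x04, 127, 0, 0, 1] };
    let owned_copy = owned.clone();
    // Equal contents, but two separate allocations.
    assert_eq!(owned.bytes, owned_copy.bytes);
    assert_ne!(owned.bytes.as_ptr(), owned_copy.bytes.as_ptr());
}
```

The trade-off is that mutating a shared address, for example by pushing a protocol, then needs copy-on-write (e.g. `Arc::make_mut`). That step can allocate when the buffer is shared.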

tests/lib: Update `impl Arbitrary for Proto`

Arbitrary implementation seems to be missing some enum variants.

Protocol has 30 variants. Arbitrary implementation generates 25.

Ideally this would be fixed by removing the need to list every Protocol variant, though I cannot think of a clean way of doing that. Simply updating the Arbitrary implementation seems fine for now.

impl Arbitrary for Proto {
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        use Protocol::*;
        match u8::arbitrary(g) % 26 {
            // TODO: Add Protocol::Quic
            0 => Proto(Dccp(Arbitrary::arbitrary(g))),
            1 => Proto(Dns(Cow::Owned(SubString::arbitrary(g).0))),
            2 => Proto(Dns4(Cow::Owned(SubString::arbitrary(g).0))),
            3 => Proto(Dns6(Cow::Owned(SubString::arbitrary(g).0))),
            4 => Proto(Http),
            5 => Proto(Https),
            6 => Proto(Ip4(Ipv4Addr::arbitrary(g))),
            7 => Proto(Ip6(Ipv6Addr::arbitrary(g))),
            8 => Proto(P2pWebRtcDirect),
            9 => Proto(P2pWebRtcStar),
            10 => Proto(P2pWebSocketStar),
            11 => Proto(Memory(Arbitrary::arbitrary(g))),
            // TODO: impl Arbitrary for Multihash:
            12 => Proto(P2p(multihash(
                "QmcgpsyWgH8Y8ajJz1Cu72KnS5uo2Aa2LpzU7kinSupNKC",
            ))),
            13 => Proto(P2pCircuit),
            14 => Proto(Quic),
            15 => Proto(Sctp(Arbitrary::arbitrary(g))),
            16 => Proto(Tcp(Arbitrary::arbitrary(g))),
            17 => Proto(Udp(Arbitrary::arbitrary(g))),
            18 => Proto(Udt),
            19 => Proto(Unix(Cow::Owned(SubString::arbitrary(g).0))),
            20 => Proto(Utp),
            21 => Proto(Ws("/".into())),
            22 => Proto(Wss("/".into())),
            23 => {
                let a = iter::repeat_with(|| u8::arbitrary(g))
                    .take(10)
                    .collect::<Vec<_>>()
                    .try_into()
                    .unwrap();
                Proto(Onion(Cow::Owned(a), std::cmp::max(1, u16::arbitrary(g))))
            }
            24 => {
                let a: [u8; 35] = iter::repeat_with(|| u8::arbitrary(g))
                    .take(35)
                    .collect::<Vec<_>>()
                    .try_into()
                    .unwrap();
                Proto(Onion3((a, std::cmp::max(1, u16::arbitrary(g))).into()))
            }
            25 => Proto(Tls),
            _ => panic!("outside range"),
        }
    }
}
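A partial mitigation is to pair the generator with a helper that matches every variant without a `_` arm. This is a sketch, not something the crate currently does. Adding a variant to the enum then breaks compilation of that helper until someone revisits the generator. `Protocol`, `VARIANT_COUNT`, `variant_index` and `generate` are hypothetical stand-ins. The enum has only four variants, and a plain seed replaces quickcheck's `Gen`.

```rust
// Hypothetical stand-in for multiaddr's `Protocol`; the real enum is larger.
#[derive(Debug, Clone, PartialEq)]
enum Protocol {
    Tcp(u16),
    Udp(u16),
    Http,
    Tls,
}

// Number of variants; kept next to `variant_index` so both are updated together.
const VARIANT_COUNT: usize = 4;

// Exhaustive match with no `_` arm: adding a variant to `Protocol` makes this
// fail to compile until it is handled, which flags the generator as stale.
fn variant_index(p: &Protocol) -> usize {
    match p {
        Protocol::Tcp(_) => 0,
        Protocol::Udp(_) => 1,
        Protocol::Http => 2,
        Protocol::Tls => 3,
    }
}

// Builds the variant for `index`; `seed` stands in for random data.
fn generate(index: usize, seed: u16) -> Protocol {
    match index % VARIANT_COUNT {
        0 => Protocol::Tcp(seed),
        1 => Protocol::Udp(seed),
        2 => Protocol::Http,
        3 => Protocol::Tls,
        _ => unreachable!("index is reduced modulo VARIANT_COUNT"),
    }
}

fn main() {
    // Every index must round-trip, so the generator skips no variant.
    for i in 0..VARIANT_COUNT {
        assert_eq!(variant_index(&generate(i, 1234)), i);
    }
}
```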

Use trait object

I believe all the multiformats are meant to be extensible, so a Multiaddr trait object is probably more appropriate than a closed enum.
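The sketch below shows what an open, trait-based design could look like. Here any crate can add a protocol by implementing a trait, instead of extending a central enum. `Segment`, `Ip4`, `Tcp`, `Custom` and `render` are all hypothetical; the crate exposes no such trait.

```rust
use std::fmt::Write;

// Hypothetical open-ended segment trait: any crate could implement it.
trait Segment {
    fn name(&self) -> &str;
    // The value part of "/name/value", if the protocol has one.
    fn value(&self) -> Option<String>;
}

struct Ip4([u8; 4]);
struct Tcp(u16);
// A protocol defined "downstream", without touching a central enum.
struct Custom {
    name: String,
}

impl Segment for Ip4 {
    fn name(&self) -> &str {
        "ip4"
    }
    fn value(&self) -> Option<String> {
        let [a, b, c, d] = self.0;
        Some(format!("{}.{}.{}.{}", a, b, c, d))
    }
}

impl Segment for Tcp {
    fn name(&self) -> &str {
        "tcp"
    }
    fn value(&self) -> Option<String> {
        Some(self.0.to_string())
    }
}

impl Segment for Custom {
    fn name(&self) -> &str {
        &self.name
    }
    fn value(&self) -> Option<String> {
        None
    }
}

// Renders a list of trait objects in the human-readable form.
fn render(segments: &[Box<dyn Segment>]) -> String {
    let mut out = String::new();
    for s in segments {
        write!(out, "/{}", s.name()).unwrap();
        if let Some(v) = s.value() {
            write!(out, "/{}", v).unwrap();
        }
    }
    out
}

fn main() {
    let addr: Vec<Box<dyn Segment>> = vec![
        Box::new(Ip4([127, 0, 0, 1])),
        Box::new(Tcp(1234)),
        Box::new(Custom { name: "my-proto".to_string() }),
    ];
    assert_eq!(render(&addr), "/ip4/127.0.0.1/tcp/1234/my-proto");
}
```

This flexibility has costs: dynamic dispatch, one heap allocation per segment, no exhaustive matching, and the need for a registry that maps protocol names or codes to constructors when parsing.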

avoid breaking network compatibility

As witnessed in libp2p/rust-libp2p#3244 (comment) adding a new protocol to the multiaddr implementation breaks communication between nodes using the new feature and nodes using an older version of this library. This would be warranted if understanding the meaning of a multiaddr is always mandatory, but there are cases (like the identify protocol) where that is not the case.

Breakage for such cases could be avoided by adding a new layer of validation: besides syntactically invalid and fully understood there could be a class that is syntactically valid but not fully understood.

Due to the design of the multiaddr syntax, this is not trivial. A protocol segment may have arguments, like /tcp/1234, and without understanding the protocol name it is impossible to know how many arguments follow. Different separator characters (like /tcp=1234) would have avoided this, but that ship has sailed. So the only way to express syntactically valid but not fully understood addresses is to add a variant like Protocol::Unknown(Cow<'a, str>). With it, /tcp2/1234 would yield two unknown segments, with payloads tcp2 and 1234 respectively.
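The sketch below implements the `Protocol::Unknown` idea for the human-readable form. Known protocols consume their arguments, and any component the parser does not recognise becomes its own `Unknown` segment. The enum is a hypothetical three-protocol subset, and `parse` is not the crate's parser.

```rust
use std::borrow::Cow;
use std::net::Ipv4Addr;

// Hypothetical subset of `Protocol` plus the proposed `Unknown` variant.
#[derive(Debug, PartialEq)]
enum Protocol<'a> {
    Ip4(Ipv4Addr),
    Tcp(u16),
    QuicV1,
    // Syntactically valid but not understood: one entry per path component.
    Unknown(Cow<'a, str>),
}

// Parses the human-readable form, keeping unrecognised components instead of failing.
fn parse<'a>(s: &'a str) -> Result<Vec<Protocol<'a>>, String> {
    let mut parts = s.strip_prefix('/').ok_or("must start with '/'")?.split('/');
    let mut out = Vec::new();
    while let Some(name) = parts.next() {
        let proto = match name {
            // Known protocols consume exactly as many components as they define.
            "ip4" => {
                let ip = parts.next().ok_or("ip4 needs an address")?;
                Protocol::Ip4(ip.parse().map_err(|_| format!("bad ip4 address {}", ip))?)
            }
            "tcp" => {
                let port = parts.next().ok_or("tcp needs a port")?;
                Protocol::Tcp(port.parse().map_err(|_| format!("bad port {}", port))?)
            }
            "quic-v1" => Protocol::QuicV1,
            "" => return Err("empty component".to_string()),
            // Unknown name: its argument count is unknown, so keep it verbatim and
            // interpret the next component on its own ("1234" below becomes Unknown too).
            other => Protocol::Unknown(Cow::Borrowed(other)),
        };
        out.push(proto);
    }
    Ok(out)
}

fn main() {
    let parsed = parse("/ip4/127.0.0.1/tcp2/1234").unwrap();
    assert_eq!(parsed[0], Protocol::Ip4(Ipv4Addr::new(127, 0, 0, 1)));
    assert_eq!(parsed[1], Protocol::Unknown(Cow::Borrowed("tcp2")));
    assert_eq!(parsed[2], Protocol::Unknown(Cow::Borrowed("1234")));
}
```

A consumer such as identify could then store or forward these addresses unchanged. Strict rejection would be reserved for places where fully understanding the address is mandatory.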


The alternative to handling this in this library is to always deserialize a multiaddr property as String first and then check whether it can be fully parsed if needed. However, given that multiaddr aims to offer an abstraction over various addressing schemes, I think it is reasonable to expect that this scheme itself is extensible and handles extensions in a graceful fashion.

Tooling for unparsable `Multiaddr`

Sample use-case:

  • Roll out of new Protocol::X.
  • Old DHT nodes should forward Multiaddrs with Protocol::X of new nodes even though they can't parse the Multiaddr. Currently they don't.

Considerations:

  • Instead of providing e.g. an Unparsable type in multiaddr, each user that cares could also carry an Either<Multiaddr, Vec<u8>>.
  • One could add a Protocol::Unparsable(Vec<u8>) containing the remaining unparsable bytes. This might break existing implementations as they depend on Multiaddr either to succeed or fail, but not fail partially.
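The sketch below applies the `Protocol::Unparsable(Vec<u8>)` idea to the binary form. It decodes segments until it meets a code it does not know, then keeps the remaining bytes verbatim so that an old node can still forward them. The three known codes (ip4 = 4, tcp = 6, udp = 273) come from the multiaddr table. `Segment`, `read_uvarint` and `decode` are hypothetical helpers.

```rust
// Hypothetical decoded form; `Unparsable` is the variant proposed above.
#[derive(Debug, PartialEq)]
enum Segment {
    Ip4([u8; 4]),
    Tcp(u16),
    Udp(u16),
    // Everything from the first unrecognised (or truncated) segment onwards,
    // kept byte-for-byte so it can be forwarded unchanged.
    Unparsable(Vec<u8>),
}

// Reads an unsigned varint (LEB128, at most 9 bytes); returns (value, bytes used).
fn read_uvarint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(9) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

// Decodes known segments; stops at the first segment it cannot decode.
fn decode(mut bytes: &[u8]) -> Vec<Segment> {
    let mut out = Vec::new();
    while !bytes.is_empty() {
        let step = read_uvarint(bytes).and_then(|(code, n)| {
            let body = &bytes[n..];
            match code {
                4 if body.len() >= 4 => {
                    Some((Segment::Ip4([body[0], body[1], body[2], body[3]]), n + 4))
                }
                6 if body.len() >= 2 => {
                    Some((Segment::Tcp(u16::from_be_bytes([body[0], body[1]])), n + 2))
                }
                273 if body.len() >= 2 => {
                    Some((Segment::Udp(u16::from_be_bytes([body[0], body[1]])), n + 2))
                }
                _ => None,
            }
        });
        match step {
            Some((segment, used)) => {
                out.push(segment);
                bytes = &bytes[used..];
            }
            None => {
                out.push(Segment::Unparsable(bytes.to_vec()));
                break;
            }
        }
    }
    out
}

fn main() {
    // /ip4/127.0.0.1/udp/1234 in binary form.
    let known = [0x04, 0x7f, 0x00, 0x00, 0x01, 0x91, 0x02, 0x04, 0xd2];
    assert_eq!(decode(&known), vec![Segment::Ip4([127, 0, 0, 1]), Segment::Udp(1234)]);

    // Same prefix followed by code 255 (varint 0xff 0x01), unknown to this decoder.
    let newer = [0x04, 0x7f, 0x00, 0x00, 0x01, 0xff, 0x01, 0x09];
    assert_eq!(decode(&newer)[1], Segment::Unparsable(vec![0xff, 0x01, 0x09]));
}
```

To forward, a node would re-emit the known segments followed by the `Unparsable` bytes unchanged. This matches the concern above: callers that expect all-or-nothing parsing would now see partial success.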

Related:

Remove `from_url` module

As discussed in #71, we should remove the from_url module to reduce the surface area of this crate's public API.

Users are encouraged to implement the from_url functionality themselves and/or create their own crate if they need to reuse the code in different projects.

Change Multiaddr internal structure to store parsed address

As of now, Multiaddr stores a byte array, representing the serialized version of the address.

Although this allows very fast serialization (array copy), I believe this is not the most appropriate format.
Indeed, operations on Multiaddr (e.g. decapsulate) actually work on the list of protocols, not on the byte representation. The bug mentioned in PR #18 is a good example of this.

I see two possible representations:

  • Make Multiaddr store a Vec<Addr>, with Addr being an Enum for all supported protocols
  • Make Multiaddr only store an Addr and an Option<Multiaddr> for the inner and/or outer multiaddr.
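The first representation from the list above can be sketched as a struct that stores the parsed list, so that structural operations become plain `Vec` operations. `Addr` and `Multiaddr` below are hypothetical types with only three protocols, not the crate's definitions.

```rust
use std::fmt;
use std::net::Ipv4Addr;

// Hypothetical `Addr`: one enum variant per supported protocol.
#[derive(Debug, Clone, PartialEq)]
enum Addr {
    Ip4(Ipv4Addr),
    Tcp(u16),
    Ws,
}

// Stores the parsed list instead of the serialized bytes.
#[derive(Debug, Clone, PartialEq, Default)]
struct Multiaddr(Vec<Addr>);

impl Multiaddr {
    // Appends one protocol (encapsulation of a single segment).
    fn with(mut self, addr: Addr) -> Self {
        self.0.push(addr);
        self
    }

    // Removes and returns the last protocol.
    fn pop(&mut self) -> Option<Addr> {
        self.0.pop()
    }
}

impl fmt::Display for Multiaddr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for addr in &self.0 {
            match addr {
                Addr::Ip4(ip) => write!(f, "/ip4/{}", ip)?,
                Addr::Tcp(port) => write!(f, "/tcp/{}", port)?,
                Addr::Ws => write!(f, "/ws")?,
            }
        }
        Ok(())
    }
}

fn main() {
    let mut ma = Multiaddr::default()
        .with(Addr::Ip4(Ipv4Addr::LOCALHOST))
        .with(Addr::Tcp(8080))
        .with(Addr::Ws);
    assert_eq!(ma.to_string(), "/ip4/127.0.0.1/tcp/8080/ws");
    assert_eq!(ma.pop(), Some(Addr::Ws));
    assert_eq!(ma.to_string(), "/ip4/127.0.0.1/tcp/8080");
}
```

The cost is on the serialization side. Writing the binary form now means encoding every segment instead of copying a byte array.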

Make decapsulate return an error if the protocol is not found

The doc of decapsulate currently reads: “Returns the original if the passed in address is not found”.

Could you change decapsulate to return an error if the protocol is not found? That way callers can tell that the protocol was not in the multiaddr.
Additionally, it forces users of decapsulate to explicitly handle this case when calling it.
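The requested behaviour can be sketched as a `Result`-returning function over a protocol list. The semantics here are an assumption, modelled on go-multiaddr's `Decapsulate`: remove the last occurrence of the target segment (protocol and value) and everything after it. `Protocol`, `NotFound` and `decapsulate` are hypothetical.

```rust
use std::fmt;

// Hypothetical protocol list entries.
#[derive(Debug, Clone, PartialEq)]
enum Protocol {
    Ip4([u8; 4]),
    Tcp(u16),
    P2pCircuit,
}

#[derive(Debug, PartialEq)]
struct NotFound;

impl fmt::Display for NotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("protocol not found in multiaddr")
    }
}

impl std::error::Error for NotFound {}

// Drops the last occurrence of `target` and everything after it. Returns an
// error, instead of the unchanged input, when `target` is absent.
fn decapsulate(addr: &[Protocol], target: &Protocol) -> Result<Vec<Protocol>, NotFound> {
    let pos = addr.iter().rposition(|p| p == target).ok_or(NotFound)?;
    Ok(addr[..pos].to_vec())
}

fn main() {
    let addr = vec![Protocol::Ip4([127, 0, 0, 1]), Protocol::Tcp(1234), Protocol::P2pCircuit];
    assert_eq!(
        decapsulate(&addr, &Protocol::Tcp(1234)),
        Ok(vec![Protocol::Ip4([127, 0, 0, 1])])
    );
    // The caller now has to handle the "not present" case explicitly.
    match decapsulate(&addr, &Protocol::Tcp(80)) {
        Ok(rest) => println!("decapsulated: {:?}", rest),
        Err(e) => println!("error: {}", e),
    }
}
```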

feat: validate onion3 addresses

You can currently create invalid onion3 addresses:

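   use multiaddr::{multiaddr, Onion3Addr};
   use rand::{rngs::OsRng, RngCore};
   // 35 random bytes almost never form a valid PUBKEY | CHECKSUM | VERSION.
   let mut b = [0u8; 35];
   OsRng.fill_bytes(&mut b);
   let addr = Onion3Addr::from((b, 12345u16));
   let invalid = multiaddr!(Onion3(addr)); // accepted without validation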

This results in the following errors in the Tor log:

Jun  5 10:10:44 thor Tor[692]: Service address "pd6sf3mqkkkfrn4rk5odgcr2j5sn7m523a4tm7pzpuotk2b7rpuhaeym" invalid checksum.
Jun  5 10:10:44 thor Tor[692]: Invalid onion hostname pd6sf3mqkkkfrn4rk5odgcr2j5sn7m523a4tm7pzpuotk2b7rpuhaeym.onion; rejecting

Question: Should we be validating the checksum and/or the public key in this library? Defined as follows:

  onion_address = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
     CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2]

     where:
       - PUBKEY is the 32 bytes ed25519 master pubkey of the hidden service.
       - VERSION is a one byte version field (default value '\x03')
       - ".onion checksum" is a constant string
       - CHECKSUM is truncated to two bytes before inserting it in onion_address

source: https://github.com/torproject/torspec/blob/main/rend-spec-v3.txt#LL2258C6-L2258C6

This would introduce a number of dependencies: base32 (edit: not required, given the existing data-encoding dependency), sha3, and an ed25519 library if we decided to validate the PUBKEY.

In many cases the user can leave it to tor, but you may not want to e.g. include the address in a database if it is invalid.

I wanted thoughts on if this is in scope for this library or if this validation should be left up to the user.

I'd be happy to work on a PR for this if this is in scope. Changes should be fairly minor: Fallible TryFrom impl, decoding the address and verifying the checksum, version and possibly the ed25519 key, and adding some new tests.
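The VERSION and CHECKSUM checks from the spec excerpt above can be sketched as follows. To keep the sketch dependency-free, the SHA3-256 function is passed in as a parameter, and a real implementation would plug in an actual SHA3-256 (for instance `Sha3_256` from the `sha3` crate, an assumed choice). A free function stands in for the proposed `TryFrom` impl because of that injected hash. `Onion3Error` and `validate_onion3` are hypothetical names, and the sketch does not attempt PUBKEY (ed25519) validation.

```rust
// Hypothetical validation error for onion3 addresses.
#[derive(Debug, PartialEq)]
enum Onion3Error {
    BadVersion(u8),
    BadChecksum,
}

const CHECKSUM_PREFIX: &[u8] = b".onion checksum";

// Checks VERSION and CHECKSUM for the 35 raw bytes PUBKEY (32) | CHECKSUM (2) | VERSION (1).
// `sha3_256` must be a real SHA3-256 in practice; it is injected to avoid dependencies.
fn validate_onion3(
    bytes: &[u8; 35],
    sha3_256: impl Fn(&[u8]) -> [u8; 32],
) -> Result<(), Onion3Error> {
    let (pubkey, rest) = bytes.split_at(32);
    let (checksum, version) = rest.split_at(2);
    if version[0] != 0x03 {
        return Err(Onion3Error::BadVersion(version[0]));
    }
    // CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2]
    let mut input = Vec::with_capacity(CHECKSUM_PREFIX.len() + 33);
    input.extend_from_slice(CHECKSUM_PREFIX);
    input.extend_from_slice(pubkey);
    input.push(version[0]);
    if &sha3_256(&input)[..2] != checksum {
        return Err(Onion3Error::BadChecksum);
    }
    Ok(())
}

fn main() {
    // Demonstration-only stand-in hash (NOT SHA3-256): XOR-folds the input.
    let fake_hash = |data: &[u8]| {
        let mut out = [0u8; 32];
        for (i, &b) in data.iter().enumerate() {
            out[i % 32] ^= b;
        }
        out
    };
    let mut bytes = [7u8; 35];
    bytes[34] = 0x03;
    // Write the checksum the (fake) hash expects, then validate.
    let mut input = CHECKSUM_PREFIX.to_vec();
    input.extend_from_slice(&bytes[..32]);
    input.push(0x03);
    let digest = fake_hash(&input);
    bytes[32..34].copy_from_slice(&digest[..2]);
    assert_eq!(validate_onion3(&bytes, fake_hash), Ok(()));

    bytes[32] ^= 0xff; // corrupt the checksum
    assert_eq!(validate_onion3(&bytes, fake_hash), Err(Onion3Error::BadChecksum));
}
```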

Should we add an empty MA to the failure test?

It seems that the spec doesn't allow an empty MA:

Human-readable multiaddr: (/<protoName string>/<value string>)+
Example: /ip4/127.0.0.1/udp/1234
Machine-readable multiaddr: (<protoCode uvarint><value []byte>)+
Same example: 0x4 0x7f 0x0 0x0 0x1 0x91 0x2 0x4 0xd2
Values are usually length-prefixed with a uvarint
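The sketch below makes the machine-readable grammar concrete. It encodes the example above and reads the `+` as "one or more", so an empty segment list is rejected. The crate itself offers `Multiaddr::empty()`, so enforcing this would change its behaviour. `Segment`, `write_uvarint` and `encode` are hypothetical helpers that cover only ip4 (code 4) and udp (code 273).

```rust
// Hypothetical segments; codes are from the multiaddr table (ip4 = 4, udp = 273).
enum Segment {
    Ip4([u8; 4]),
    Udp(u16),
}

// Appends `value` as an unsigned varint (LEB128).
fn write_uvarint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

// Encodes per (<protoCode uvarint><value []byte>)+ ; `+` means at least one
// segment, so an empty list is rejected.
fn encode(segments: &[Segment]) -> Result<Vec<u8>, &'static str> {
    if segments.is_empty() {
        return Err("a multiaddr needs at least one segment");
    }
    let mut out = Vec::new();
    for s in segments {
        match s {
            Segment::Ip4(ip) => {
                write_uvarint(4, &mut out);
                out.extend_from_slice(ip);
            }
            Segment::Udp(port) => {
                write_uvarint(273, &mut out);
                out.extend_from_slice(&port.to_be_bytes());
            }
        }
    }
    Ok(out)
}

fn main() {
    let bytes = encode(&[Segment::Ip4([127, 0, 0, 1]), Segment::Udp(1234)]).unwrap();
    // Matches the example: 0x4 0x7f 0x0 0x0 0x1 0x91 0x2 0x4 0xd2
    assert_eq!(bytes, [0x04, 0x7f, 0x00, 0x00, 0x01, 0x91, 0x02, 0x04, 0xd2]);
    assert!(encode(&[]).is_err());
}
```

The fixed-size values (4 address bytes, 2 port bytes) carry no length prefix here. That is why the spec says values are only *usually* length-prefixed.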

Add Multiaddr::starts_with

There is Multiaddr::ends_with, but for matching addresses with or without a trailing P2P segment it would be convenient to have a Multiaddr::starts_with method as well.
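On a list-of-protocols representation, both methods reduce to the slice methods `starts_with` and `ends_with`. The sketch below shows this on a hypothetical list-backed `Multiaddr`, which is not the crate's type. `"QmPeer"` is a placeholder peer id.

```rust
// Hypothetical protocols for a list-backed address.
#[derive(Debug, Clone, PartialEq)]
enum Protocol {
    Ip4([u8; 4]),
    Tcp(u16),
    P2p(String),
}

struct Multiaddr(Vec<Protocol>);

impl Multiaddr {
    // True if `prefix`'s protocols are the leading protocols of `self`.
    fn starts_with(&self, prefix: &Multiaddr) -> bool {
        self.0.starts_with(&prefix.0)
    }

    // True if `suffix`'s protocols are the trailing protocols of `self`.
    fn ends_with(&self, suffix: &Multiaddr) -> bool {
        self.0.ends_with(&suffix.0)
    }
}

fn main() {
    let transport = Multiaddr(vec![Protocol::Ip4([10, 0, 0, 1]), Protocol::Tcp(4001)]);
    let with_peer = Multiaddr(vec![
        Protocol::Ip4([10, 0, 0, 1]),
        Protocol::Tcp(4001),
        Protocol::P2p("QmPeer".to_string()), // placeholder peer id
    ]);
    // Same transport address, with or without a trailing /p2p segment.
    assert!(with_peer.starts_with(&transport));
    assert!(!transport.starts_with(&with_peer));
    assert!(with_peer.ends_with(&Multiaddr(vec![Protocol::P2p("QmPeer".to_string())])));
}
```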

Build fails on rust stable due to CID crate

$ cargo --version
cargo 1.42.0 (86334295e 2020-01-31)
$ git rev-parse --short HEAD
9682623
$ cargo build
   Compiling multiaddr v0.3.1 (/Users/jameshageman/Documents/Github/school/fydp/rust-multiaddr)
warning: trait objects without an explicit `dyn` are deprecated
  --> src/errors.rs:13:22
   |
13 |     ParsingError(Box<error::Error + Send + Sync>),
   |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `dyn`: `dyn error::Error + Send + Sync`
   |
   = note: `#[warn(bare_trait_objects)]` on by default

warning: trait objects without an explicit `dyn` are deprecated
  --> src/errors.rs:34:32
   |
34 |     fn cause(&self) -> Option<&error::Error> {
   |                                ^^^^^^^^^^^^ help: use `dyn`: `dyn error::Error`

warning: use of deprecated item 'std::error::Error::description': use the Display impl or to_string()
  --> src/errors.rs:18:21
   |
18 |         f.write_str(error::Error::description(self))
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default

error[E0277]: the trait bound `cid::Error: std::error::Error` is not satisfied
  --> src/errors.rs:50:33
   |
50 |         Error::ParsingError(err.into())
   |                                 ^^^^ the trait `std::error::Error` is not implemented for `cid::Error`
   |
   = note: required because of the requirements on the impl of `std::convert::From<cid::Error>` for `std::boxed::Box<dyn std::error::Error + std::marker::Send + std::marker::Sync>`
   = note: required because of the requirements on the impl of `std::convert::Into<std::boxed::Box<dyn std::error::Error + std::marker::Send + std::marker::Sync>>` for `cid::Error`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.
error: could not compile `multiaddr`.

To learn more, run the command again with --verbose.

Just thought I'd flag the build failure - it looks like #37 is addressing it.

Making the `/p2p` protocol type-safe

The /p2p protocol can only be followed by a PeerId: https://github.com/libp2p/specs/blob/master/addressing/README.md#the-p2p-multiaddr

As per the spec, a PeerId can only be a multihash using either SHA256 or, when the encoded public key is at most 42 bytes, the identity hash.

Currently, /p2p exposes a Multihash which is more general than that.

I see several options of how we can improve this situation:

  1. Extract the PeerId type from rust-libp2p into a dedicated crate and have multiaddr depend on that
  2. Move the entire identity module of libp2p-core into its own crate (keys + PeerId)
  3. Move the rust-libp2p PeerId type into multiaddr
  4. Define a minimal PeerId type within multiaddr that encodes the above invariants
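Option 4 could look roughly like the sketch below: a newtype that accepts only multihashes allowed as peer ids. The sketch assumes two things. First, codes 0x12 (sha2-256, 32-byte digest) and 0x00 (identity). Second, that identity is allowed up to 42 bytes, which is our reading of the linked spec. The `PeerId` here is hypothetical and is not rust-libp2p's type.

```rust
use std::convert::TryFrom;

// Hypothetical minimal PeerId: multihash bytes that satisfy the peer-id rules.
#[derive(Debug, Clone, PartialEq)]
struct PeerId(Vec<u8>);

const IDENTITY: u8 = 0x00;
const SHA2_256: u8 = 0x12;

impl TryFrom<&[u8]> for PeerId {
    type Error = &'static str;

    fn try_from(mh: &[u8]) -> Result<Self, Self::Error> {
        // Both allowed codes and all allowed lengths fit in a single varint
        // byte, so a valid input is <code><len><digest>.
        let (&code, rest) = mh.split_first().ok_or("empty multihash")?;
        let (&len, digest) = rest.split_first().ok_or("missing length")?;
        if digest.len() != len as usize {
            return Err("length prefix does not match digest");
        }
        match code {
            SHA2_256 if len == 32 => Ok(PeerId(mh.to_vec())),
            IDENTITY if len <= 42 => Ok(PeerId(mh.to_vec())),
            SHA2_256 | IDENTITY => Err("wrong digest length for hash function"),
            _ => Err("hash function not allowed for a PeerId"),
        }
    }
}

fn main() {
    let mut sha = vec![SHA2_256, 32];
    sha.extend_from_slice(&[0xab; 32]);
    assert!(PeerId::try_from(sha.as_slice()).is_ok());

    let mut too_long = vec![IDENTITY, 43];
    too_long.extend_from_slice(&[0u8; 43]);
    assert!(PeerId::try_from(too_long.as_slice()).is_err());
}
```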

All of the above have their pros and cons.

Extracting PeerId into its own crate feels a bit odd because it would be a very small crate. On the other hand, it would encode a very important concept in a concise form so it may be worth it.

The next step up would be a crate that encapsulates everything around keys in libp2p into its own crate, i.e. libp2p-identity. That is basically this module + the peer_id module.

Personally, I'd be in favor of option (2). I think it makes sense to break out this part into a separate crate. We can heavily feature-flag that one to the point where the multiaddr crate itself only depends on the bits that define the PeerId and doesn't come with any other dependencies.

Input welcome!

cc @mxinden @dignifiedquire

Consider an inline representation for small multiaddrs

Similar to multiformats/rust-multihash#47

Using the same approach, for a total size of 64 bytes, you get 62 bytes of data. That is enough to store a multiaddr like /ip4/<32 bit addr>/tcp/<port>/ipfs/<256 bit node hash> or even /ip6/<128 bit addr>/tcp/<port>/ipfs/<256 bit node hash> inline, which should be quite useful.

Rust-libp2p nodes frequently store peer ids multiple times, so this would be a nice optimisation.
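The size arithmetic above, and a small-buffer layout that exploits it, can be sketched as follows. Binary sizes: ip4 is 1 code byte + 4, ip6 is 1 + 16, and tcp is 1 + 2. The ipfs/p2p code 421 takes a 2-byte varint, plus 1 length byte and a 34-byte sha2-256 multihash. That gives 45 bytes with ip4 and 57 with ip6, both under 62. `Storage` is a hypothetical type, not the approach actually used in rust-multihash.

```rust
// Hypothetical small-buffer storage: up to 62 bytes inline, heap otherwise.
const INLINE_CAP: usize = 62;

#[derive(Debug, Clone)]
enum Storage {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Vec<u8>),
}

impl Storage {
    fn from_slice(bytes: &[u8]) -> Self {
        if bytes.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..bytes.len()].copy_from_slice(bytes);
            Storage::Inline { len: bytes.len() as u8, buf }
        } else {
            Storage::Heap(bytes.to_vec())
        }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            Storage::Inline { len, buf } => &buf[..*len as usize],
            Storage::Heap(v) => v.as_slice(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, Storage::Inline { .. })
    }
}

fn main() {
    // ip4 (1 + 4) + tcp (1 + 2) + ipfs (2-byte code + 1-byte length + 34)
    let ip4_len = (1 + 4) + (1 + 2) + (2 + 1 + 34);
    // ip6 (1 + 16) + tcp (1 + 2) + ipfs (2-byte code + 1-byte length + 34)
    let ip6_len = (1 + 16) + (1 + 2) + (2 + 1 + 34);
    assert_eq!((ip4_len, ip6_len), (45, 57));

    assert!(Storage::from_slice(&vec![0u8; ip6_len]).is_inline());
    assert!(!Storage::from_slice(&[0u8; 100]).is_inline());
}
```

The sketch does not assert the enum's exact in-memory size. The compiler chooses the layout, so reaching exactly 64 bytes would need a hand-packed representation.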

Let's make fewer breaking changes

A Multiaddr is a core primitive of libp2p and thus an almost guaranteed dependency of everything libp2p. Every breaking change in this library will trigger a cascade of breaking changes across the ecosystem because Multiaddr constantly appears in public type signatures.

Just within our own crates, a breaking change here causes a breaking change in: libp2p-core -> libp2p-swarm -> libp2p-identify (for example) -> libp2p. There are several users of libp2p that build their own libraries on top which continues the chain.

We should do everything possible to:

  • Harden the API against breaking changes
  • Only make them very selectively (4 breaking changes in 1 year is pretty bad)

After a quick survey, I see the following problems:

  • Protocol is not non_exhaustive: There will always be more protocols, we should future proof the API for that.
  • multihash is re-exported and part of the public API despite being < 1.0: This is a problem. A fundamental crate like multiaddr should only have stable dependencies in its API.
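Marking `Protocol` as `#[non_exhaustive]` has a specific effect: downstream crates must include a wildcard arm when matching on it, so adding a variant later is no longer a breaking change. Inside the defining crate the attribute has no effect, so this single-file sketch shows only the shape of the downstream match. The two-variant enum and `transport_name` are hypothetical.

```rust
// Hypothetical protocol enum. Downstream crates matching on it must add a `_`
// arm, so adding variants is a non-breaking change for them.
#[non_exhaustive]
#[derive(Debug)]
pub enum Protocol {
    Tcp(u16),
    Udp(u16),
}

// The shape a downstream match is forced into.
pub fn transport_name(p: &Protocol) -> &'static str {
    match p {
        Protocol::Tcp(_) => "tcp",
        Protocol::Udp(_) => "udp",
        // Mandatory outside the defining crate; unreachable inside it.
        #[allow(unreachable_patterns)]
        _ => "unknown",
    }
}

fn main() {
    assert_eq!(transport_name(&Protocol::Tcp(1234)), "tcp");
}
```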

I am not sure how to deal with multihash. It is useful to represent Certhash and P2p in a type-safe way. One thing we can do is just not update to the latest version as quickly unless we actually need a specific feature.

Update: After looking at multihash in more detail and opening an issue there, I think the best way forward would be to split multihash into two crates: one for the definition of multihash, and one for all the implementations of hash algorithms, the custom derive, etc.

Re-export libp2p_identity::PeerId?

While updating a project with dependencies on multiaddr and multihash, I noticed that libp2p_identity::PeerId isn't re-exported but appears in two public interfaces: Multiaddr::with_p2p and Protocol::P2p. Any objections to a re-export (or a wrapper tuple struct with Deref/DerefMut)? I'm happy to make a PR, but I figured I'd ask first as it has dependency/API ramifications.

I know the libp2p-identity crate is small so I could add it, but it'd be nice to have this library be self contained. I have an API that looks like a tiny bit of IPFS, which is why I want address info without actually pulling in libp2p proper.
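The two options from the issue, a re-export and a Deref wrapper, can be compared in the sketch below. A local `identity` module stands in for the external `libp2p_identity` crate, and its contents are hypothetical. Either option makes that crate's version part of multiaddr's public API, which is the ramification mentioned above.

```rust
use std::ops::Deref;

// Stand-in for the external `libp2p_identity` crate (hypothetical contents).
mod identity {
    #[derive(Debug, Clone, PartialEq)]
    pub struct PeerId(pub Vec<u8>);

    impl PeerId {
        pub fn to_bytes(&self) -> Vec<u8> {
            self.0.clone()
        }
    }
}

// Option 1: plain re-export, so users can name the type through this crate.
pub use identity::PeerId;

// Option 2: a newtype wrapper that derefs to the inner type.
#[derive(Debug, Clone, PartialEq)]
pub struct WrappedPeerId(identity::PeerId);

impl Deref for WrappedPeerId {
    type Target = identity::PeerId;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let id = PeerId(vec![0x00, 0x01, 0xaa]);
    let wrapped = WrappedPeerId(id.clone());
    // Methods of the inner type are reachable through Deref.
    assert_eq!(wrapped.to_bytes(), id.to_bytes());
}
```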

`Multiaddr::from_bytes` panics on output of `to_bytes()`/`.as_slice()`

Wanted to add parsing support for an external library, added a test, found that:

        let address: Multiaddr = "/ipv4/127.0.0.1".parse().unwrap();
        let wtf = Multiaddr::from_bytes(address.to_bytes()).unwrap();

panics. Same with as_slice().to_vec(). Is that expected behavior?

The docs don't really state/show how these APIs are supposed to be used, neither do any tests show that any output of the bytes*-functions of Multiaddr can be parsed by from_bytes...
