The Python implementation of the libp2p networking stack 🐍 [under development]

Home Page: https://libp2p.io

License: Other

py-libp2p's Introduction

py-libp2p

WARNING

py-libp2p is an experimental and work-in-progress repo under development. We do not yet recommend using py-libp2p in production environments. Right now, tests_interop are turned off for CI, and a number of tests are failing. WIP.

The Python implementation of the libp2p networking stack

Read more in the documentation on ReadTheDocs. View the release notes.

Maintainers

Currently maintained by @pacrob and @dhuseby, looking for assistance!

Note that tests/core/test_libp2p/test_libp2p.py contains an end-to-end messaging test between two libp2p hosts, which is the bulk of our proof of concept.

Feature Breakdown

py-libp2p aims for conformity with the standard libp2p modules. Below is a breakdown of the modules we have developed, are developing, and may develop in the future.

Legend: 🍏 Done   🍋 In Progress   🍅 Missing   🌰 Not planned

libp2p Node Status
libp2p 🍏
Identify Protocol Status
Identify 🍋
Transport Protocols Status
TCP 🍏
UDP 🍅
WebSockets 🌰
UTP 🌰
WebRTC 🌰
SCTP 🌰
Tor 🌰
i2p 🌰
cjdns 🌰
Bluetooth LE 🌰
Audio TP 🌰
Zerotier 🌰
QUIC 🌰
Stream Muxers Status
multiplex 🍏
yamux 🍅
benchmarks 🌰
muxado 🌰
spdystream 🌰
spdy 🌰
http2 🌰
QUIC 🌰
Protocol Muxers Status
multiselect 🍏
Switch (Swarm) Status
Switch 🍏
Dialer stack 🍏
Peer Discovery Status
bootstrap list 🍅
Kademlia DHT 🌰
mDNS 🌰
PEX 🌰
DNS 🌰
Content Routing Status
Kademlia DHT 🌰
floodsub 🍏
gossipsub 🍏
PHT 🌰
Peer Routing Status
Kademlia DHT 🌰
floodsub 🍏
gossipsub 🍏
PHT 🌰
NAT Traversal Status
nat-pmp 🌰
upnp 🌰
ext addr discovery 🌰
STUN-like 🌰
line-switch relay 🌰
pkt-switch relay 🌰
Exchange Status
HTTP 🌰
Bitswap 🌰
Bittorrent 🌰
Consensus Status
Paxos 🌰
Raft 🌰
PBFT 🌰
Nakamoto 🌰

Explanation of Basic Two Node Communication

Core Concepts

(non-normative, useful for team notes, not a reference)

Several components of the libp2p stack take part when establishing a connection between two nodes:

  1. Host: a node in the libp2p network.
  2. Connection: the layer 3 connection between two nodes in a libp2p network.
  3. Transport: the component that creates a Connection, e.g. TCP, UDP, QUIC, etc.
  4. Streams: an abstraction on top of a Connection representing parallel conversations about different matters, each of which is identified by a protocol ID. Multiple streams are layered on top of a Connection via the Multiplexer.
  5. Multiplexer: a component that is responsible for wrapping messages sent on a stream with an envelope that identifies the stream they pertain to, normally via an ID. The multiplexer on the other end unwraps the message and routes it internally based on the stream identifier.
  6. Secure channel: optionally establishes a secure, encrypted, and authenticated channel over the Connection.
  7. Upgrader: a component that takes a raw layer 3 connection returned by the Transport, and performs the security and multiplexing negotiation to set up a secure, multiplexed channel on top of which Streams can be opened.
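
To make these pieces concrete, below is a minimal sketch of two hosts talking over a stream. It is written against the API used by the era's examples/chat/chat.py and tests (new_node, get_network().listen, set_stream_handler, new_stream); exact names and signatures differ between py-libp2p versions, so treat it as illustrative rather than authoritative.

import asyncio

import multiaddr

from libp2p import new_node

PROTOCOL_ID = "/echo/1.0.0"


async def main():
    # Each host is a node in the libp2p network.
    host_a = await new_node(transport_opt=["/ip4/127.0.0.1/tcp/8001"])
    host_b = await new_node(transport_opt=["/ip4/127.0.0.1/tcp/8002"])
    await host_a.get_network().listen(multiaddr.Multiaddr("/ip4/127.0.0.1/tcp/8001"))
    await host_b.get_network().listen(multiaddr.Multiaddr("/ip4/127.0.0.1/tcp/8002"))

    # B registers a handler; the protocol muxer routes incoming streams
    # carrying this protocol ID to it.
    async def echo_handler(stream):
        await stream.write(await stream.read())

    host_b.set_stream_handler(PROTOCOL_ID, echo_handler)

    # A learns how to reach B (the peerstore keeps addresses keyed by peer ID).
    host_a.get_peerstore().add_addrs(host_b.get_id(), host_b.get_addrs(), 10)

    # Opening a stream dials B if needed (Transport + Upgrader); the
    # multiplexer then carries this stream over the single raw connection.
    stream = await host_a.new_stream(host_b.get_id(), [PROTOCOL_ID])
    await stream.write(b"hi")
    print(await stream.read())


asyncio.get_event_loop().run_until_complete(main())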

Communication between two hosts X and Y

(non-normative, useful for team notes, not a reference)

Initiate the connection: A host is simply a node in the libp2p network that is able to communicate with other nodes in the network. In order for X and Y to communicate with one another, one of the hosts must initiate the connection. Let's say that X is going to initiate the connection. X will first open a connection to Y. This connection is where all of the actual communication will take place.

Communication over one connection with multiple protocols: X and Y can communicate over the same connection using different protocols and the multiplexer will appropriately route messages for a given protocol to a particular handler function for that protocol, which allows for each host to handle different protocols with separate functions. Furthermore, we can use multiple streams for a given protocol that allow for the same protocol and same underlying connection to be used for communication about separate topics between nodes X and Y.

Why use multiple streams?: Using the same connection for multiple streams avoids the overhead of maintaining multiple connections between X and Y. For X and Y to differentiate between messages on different streams and different protocols, a multiplexer encodes each outgoing message and decodes each incoming one. The multiplexer encodes a message by prepending a header that contains the stream ID (along with some other info). The message is then sent across the raw connection, and the receiving host uses its multiplexer to decode it, i.e. to determine which stream ID the message should be routed to.
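
As a rough illustration of the multiplexer's envelope, here is a sketch of mplex-style framing: a varint header packing the stream ID together with a small flag, followed by a varint length and the payload. This is a simplified model for intuition, not the exact encoder in stream_muxer/mplex.

def encode_uvarint(value: int) -> bytes:
    # Unsigned varint: 7 bits per byte, MSB set on every byte but the last.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_frame(stream_id: int, flag: int, data: bytes) -> bytes:
    # The header packs the stream ID and a 3-bit flag (NEW_STREAM, MESSAGE, CLOSE, ...).
    header = encode_uvarint((stream_id << 3) | flag)
    return header + encode_uvarint(len(data)) + data


def decode_frame(buf: bytes):
    # Mirror of encode_frame; returns (stream_id, flag, payload, bytes_consumed).
    def read_uvarint(offset):
        shift, value = 0, 0
        while True:
            byte = buf[offset]
            offset += 1
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return value, offset
            shift += 7

    header, offset = read_uvarint(0)
    length, offset = read_uvarint(offset)
    payload = buf[offset:offset + length]
    return header >> 3, header & 0x07, payload, offset + length


# The receiver routes the payload to the buffer of the stream named in the header.
print(decode_frame(encode_frame(5, 2, b"hello")))  # (5, 2, b'hello', 7)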

py-libp2p's People

Contributors

alexh, carver, cburgdorf, chihchengliang, csunny, davesque, dependabot[bot], dhuseby, dmuhs, fselmo, hukkinj1, jorropo, kclowes, libp2p-mgmt-read-write[bot], mhchia, nic619, njgheorghita, pacrob, pipermerriam, ralexstokes, reedsa, robzajac, shadowjonathan, stuckinaboot, swedneck, tranlv, web-flow, wolfgang, zaibon, zixuanzh

py-libp2p's Issues

Implement SECIO security upgrade

A security transport upgrade can upgrade a connection to an encrypted and authenticated connection. The default libp2p security transport will be SECIO. One can start by implementing the security interface as defined here and then implement SECIO in secio.py. upgrade_security in transport/upgrader.py can then be implemented to enable the security upgrade.

As we are currently focusing on PubSub, we will not implement SECIO until later.
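
For orientation, a security transport generally exposes an inbound and an outbound handshake that wrap a raw connection and hand back a secured one. A plausible shape for the interface (a sketch only; the authoritative definition is the security interface referenced above) is:

from abc import ABC, abstractmethod


class ISecureTransport(ABC):
    # Sketch of the interface shape only.

    @abstractmethod
    async def secure_inbound(self, conn):
        """Secure a connection we accepted (we are the responder)."""

    @abstractmethod
    async def secure_outbound(self, conn, peer_id):
        """Secure a connection we dialed, verifying we reached peer_id."""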

Plan to move this repo into the libp2p org

Hey all! It was great to meet @zixuanzh and @stuckinaboot at devcon4. I enjoyed our brainstorming and hacking sessions on py-libp2p.

We agreed with @pipermerriam that moving this repo to the libp2p organisation is appropriate once the PoC is delivered and the EF grant is approved. This will help discoverability and will encourage people to contribute, giving us a better chance of sparking a community. In fact, there's a thread of Python contributors willing to help.

I'm capturing some thoughts on the path to migrating this repo to the libp2p org.

  • A NOTICE on the frontpage warning that this is an experimental and work-in-progress repo under heavy development.
  • A sponsorship acknowledgement to the Ethereum Foundation on the frontpage once the grant becomes official.
  • README formatted according to https://github.com/RichardLitt/standard-readme, which is what we lean towards in the libp2p/IPFS universe.
  • A breakdown of features and development status/estimation, in a formatted table. This helps people assess the maturity and lend a hand in those features that are WIP. Maybe a bunch of issues tagged help-wanted to lessen the decision burden for willing contributors.
  • Evaluate existing py-multiaddr implementations, continue maintaining those in the multiformats org: multiformats/multiaddr#7 (comment)
  • Continuous Integration.
  • Badges pointing to the libp2p IRC channel, CI, and so on.

Green Apple Status

Update the status of the green apples in the README to reflect the current state of the repo.

Complete all Notify interface methods

The following interface methods in Notify have not been implemented as they are not crucial to PubSub PoC.

  • disconnected
  • closed_stream
  • listen_close

They should be implemented in swarm.py; however, this will not be possible before #136 lands.

Converting examples/chat/chat.py into a pytest test case

The example at examples/chat/chat.py is a useful benchmark of correctness. It would be nice to have this as a runnable pytest test case (maybe chat over a few messages in varying orders?) to avoid running it manually upon code changes.

Adding logging

It would be nice for py-libp2p to do proper logging for network events and other noteworthy events through a standard Python logging framework.
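
A minimal sketch of what this could look like with the standard logging module: each module grabs a child of a shared "libp2p" logger, and applications configure handlers and levels at the root (the logger name below is illustrative).

import logging

# In a library module, e.g. network/swarm.py:
logger = logging.getLogger("libp2p.network.swarm")


def on_new_connection(peer_id):
    logger.debug("opened connection to %s", peer_id)


# In an application embedding py-libp2p:
logging.basicConfig(level=logging.INFO)
logging.getLogger("libp2p").setLevel(logging.DEBUG)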

PeerStore Persistent Storage

Following the precedent set by go (https://github.com/libp2p/go-libp2p-peerstore/), py-libp2p should have a PeerStore implementation that keeps all peer-store data in memory (already done) and one that keeps it in persistent storage, so that clients can restart their libp2p instance while maintaining the same peer store (needs to be done).

Improved peer ID construction and usage

As mentioned in #64, the py-libp2p notion of a peer ID should mirror the Go libp2p notion by taking the SHA256 hash of the public key generated for the host. More generally, py-libp2p should have a cleaner abstraction for dealing with peer IDs and they should be used more consistently in the library (instead of using simplistic peer IDs like certain parts of the code currently use).
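
A sketch of that construction, assuming the Go convention of a SHA-256 multihash over the host's serialized public key, displayed base58-encoded (uses the third-party base58 package; how the public key bytes are serialized is left to the caller):

import hashlib

import base58  # third-party: pip install base58


def peer_id_from_public_key(pubkey_bytes: bytes) -> str:
    # multihash = <sha2-256 code (0x12)> <digest length (0x20)> <digest>
    digest = hashlib.sha256(pubkey_bytes).digest()
    multihash = bytes([0x12, 0x20]) + digest
    # Peer IDs are conventionally displayed base58btc-encoded.
    return base58.b58encode(multihash).decode()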

Add timeout to mplex_stream's read()

There should be a timeout parameter to mplex_stream's read() method that propagates down to the low-level calls to the underlying read on the raw connection.
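
One possible shape (a sketch, not the final design) is an optional timeout argument on read() that wraps the buffer read in asyncio.wait_for:

import asyncio


class MplexStream:
    # Sketch only; mplex_conn and stream_id mirror the attributes used in
    # stream_muxer/mplex/mplex_stream.py.
    def __init__(self, stream_id, mplex_conn):
        self.stream_id = stream_id
        self.mplex_conn = mplex_conn

    async def read(self, timeout=None):
        coro = self.mplex_conn.read_buffer(self.stream_id)
        if timeout is None:
            return await coro
        # Raises asyncio.TimeoutError if nothing arrives in time; the caller
        # can translate that into a stream-level error.
        return await asyncio.wait_for(coro, timeout)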

Wire up and test close()

Currently, we have close() in tcp.py, mplex.py, mplex_stream.py, and net_stream.py, but they are not properly wired up. We want to be able to do the following in swarm.py.

  • close_peer(peer_id): close all connections to a given peer
  • tearDown(): close all listeners by calling something like transport.listener.close() and close all connections. It should also prevent new connections and/or listeners from being added to the swarm.

We would also want to have test cases on close() in muxed_stream().
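
A rough sketch of how this might look in swarm.py; tearDown() is shown here as tear_down(), and the connections/listeners attribute names are assumptions about the swarm's internals:

class Swarm:
    # Sketch only; self.connections maps peer_id -> muxed connection and
    # self.listeners maps multiaddr string -> listener (assumed names).
    def __init__(self):
        self.connections = {}
        self.listeners = {}
        self._closed = False

    async def close_peer(self, peer_id):
        conn = self.connections.pop(peer_id, None)
        if conn is not None:
            await conn.close()

    async def tear_down(self):
        # Stop accepting new connections/listeners, then close everything.
        self._closed = True
        for listener in self.listeners.values():
            await listener.close()
        self.listeners.clear()
        for peer_id in list(self.connections):
            await self.close_peer(peer_id)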

Overall structure.

Usually Python libs are structured in the following manner:

├── examples
├── LICENSE
├── README.md
├── **setup.py**
├── tests
├── **libp2p**

Note that requirements.txt is usually discouraged for libraries; it is more of an application-level dependency definition facility. Also, all the relevant contents are packed into a single folder, whose name goes into the packages=... arg of the setuptools.setup function.
The proposed structure is simply cleaner to follow and use; for instance, once you run python setup.py develop you get:

  • An "installed" version of your lib, so you no longer hit errors like:
$ python examples/chat/chat.py 
Traceback (most recent call last):
  File "examples/chat/chat.py", line 8, in <module>
    from libp2p.libp2p import *
ImportError: No module named 'libp2p'
  • Auto update of code while developing.
  • You actually need setup.py in order to pack your code as a PyPI package.

I can help on that, if needed.
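
For reference, a minimal setup.py along those lines (name, version, and dependencies are placeholders):

from setuptools import find_packages, setup

setup(
    name="libp2p",      # placeholder
    version="0.0.1",    # placeholder
    packages=find_packages(exclude=["tests", "tests.*", "examples", "examples.*"]),
    install_requires=[
        # runtime dependencies go here rather than in requirements.txt
    ],
    python_requires=">=3.6",
)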

Closing network resources in tests, adding close functionality

Currently no test does any kind of cleanup (e.g. by calling close methods of the underlying connections opened by a host). This balloons into errors (but not test case failures) of the "Task was destroyed but it is pending!" type in CI.

  • py-libp2p should do cleanup in tests.
  • py-libp2p should add close methods to host and network instances in the style of go-libp2p.

from vs. from_id in RPC

Following the RPC precedent, all messages should be serialized the same way across libp2p implementations. However, in python, from is a reserved word so we use from_id.

  • Investigate if there is a way to get around this
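
Two observations that may help: the protobuf wire format identifies fields by number, not name, so a field renamed to from_id (with the same field number) serializes identically to from in other implementations; and if keeping the wire-spec name from is preferred, generated Python messages can still read and write it via getattr/setattr, since only the dot-access syntax collides with the keyword. A sketch (msg stands for an instance of the generated RPC/Message class):

def get_sender(msg) -> bytes:
    # msg.from is a SyntaxError because "from" is a Python keyword,
    # but attribute access by name works on generated protobuf messages.
    return getattr(msg, "from")


def set_sender(msg, peer_id_bytes: bytes) -> None:
    setattr(msg, "from", peer_id_bytes)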

Adding documentation for what is currently completed

Hi,

It was nice meeting some of you at Devcon. It's great to see people our age (I'm a university student too!) do great things in the community.

This might be a tad early, but it would be great to have documentation and a tutorial so people can try using py-libp2p in their projects.

Python has several libraries that can do docs automatically. Do you guys have any thoughts on which ones to use?

MultiAddr to_dict() decapsulate()

Feature Request

We want the following two methods in MultiAddr.py as used in TCP.py
ref: https://github.com/multiformats/js-multiaddr/blob/master/src/index.js

  • to_dict() (equivalent of toOptions())
"""
convert multiaddr to dictionary format for easy access
:return multiaddr_dict: multiaddr in dictionary format for easy access
"""
  • decapsulate(key)
"""
remove multiaddr from another multiaddr by protocol_id
:return multiaddr: multiaddr after modification
"""

similar usage: https://github.com/libp2p/js-libp2p-tcp/blob/master/src/index.js
actual usage: https://github.com/zixuanzh/py-libp2p/blob/ae48708e9c4fe9fa8f1a3b9b9b2f7fbb114e7519/network/tcp.py#L27

We might also need get_ipfs_id() in the future.
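
A string-level sketch of the two methods, treating a multiaddr as alternating /protocol/value components (the real implementation would live on the MultiAddr class and use its parsed protocol table):

def to_dict(maddr: str) -> dict:
    # "/ip4/127.0.0.1/tcp/8000" -> {"ip4": "127.0.0.1", "tcp": "8000"}
    parts = maddr.strip("/").split("/")
    return {parts[i]: parts[i + 1] for i in range(0, len(parts) - 1, 2)}


def decapsulate(maddr: str, protocol_id: str) -> str:
    # Drop everything from the last occurrence of the given protocol onward, e.g.
    # decapsulate("/ip4/127.0.0.1/tcp/8000/p2p/QmPeer", "p2p") -> "/ip4/127.0.0.1/tcp/8000"
    marker = "/" + protocol_id + "/"
    index = maddr.rfind(marker)
    return maddr if index < 0 else maddr[:index]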

Chat example errors when I run with python3.6.3

There are some errors in the chat example when I run it with Python 3.6; the error looks like this:

python3 chat.py --port 3001 -d /ip4/127.0.0.1/tcp/3000/p2p/2d37wbGvQzbAQ84yRouh2m2vBKkN8s5AfH9Q75HZRCUQmJW7yAVSNKzjJj6gcjE2mDNDUHCichXWdMH3S2c8AaDLm3kXmf5R8D5dDg7mMksSs28iX8EnFs7nAHEg6T2xFcg35M8sNmf7arzjFoMW8Vseaykx9c21Hy5RiTxmGZw12urzdAZHgrWZptnVR6Qsr3HAuuKqdENr1PxEjdVdmBL3Gi6iNhYeED9PWLi7noNFF8bvfSY5wmY6ys7oXABtJXeu8dC6Jd5sm3nvUDRFXarUXNLfpAQNRDyuVGvDUVR3DRQVw3btt9pQXmmmaHwzFtxxL5qijZqgbrRdmFrdC5HgyiYs8rB5hbSDpCeRNaD8BtMdtyzSAn8Qt8bTNSYSeBsGveFQU8ovHQ4gLFmKkBdvVLe5pFVdRAnVN6GFAZrWQdWsUh6kEhy2q7R2zRjK6UF4oL5dqmw9eoWmXgzNYfmji1oQSx6DxNNpcdvKM7apEzeX6KQANRbLgn7VkGwoU3eZv2qoHEzWpULp5K6i4hz7Tab12ovct9uQKSFdCoKE7Qp34dhR5sExhns3GyytwiqfySZ7YAsctNRPAwoTTLNGBN3ZQfD4Agvf2Ck
timeout!
Task exception was never retrieved
future: <Task finished coro=<read_data() done, defined at chat.py:16> exception=KeyError(1532609824184786476,)>
Traceback (most recent call last):
  File "chat.py", line 18, in read_data
    read_string = (await stream.read()).decode()
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/network/stream/net_stream.py", line 28, in read
    return await self.muxed_stream.read()
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/stream_muxer/mplex/mplex_stream.py", line 43, in read
    return await self.mplex_conn.read_buffer(self.stream_id)
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/stream_muxer/mplex/mplex.py", line 48, in read_buffer
    data = self.buffers[stream_id]
KeyError: 1532609824184786476

Host A is OK, but when I run host B, the error is raised.

Adding peer discovery using Kad DHT

py-libp2p should implement peer discovery based on the interfaces added in #123 and conforming to standard libp2p implementations of peer discovery (generally using the Kademlia DHT, see go-libp2p and js-libp2p).

@zaibon we saw that you had some work around this in #123. Are you still planning on submitting a PR, and if so on what timeline? We're happy to help and would welcome the addition. Thanks!

Minimum libp2p requirements

I just wanted to cross-link this post: ethresearch/p2p#4 (comment)

It appears to define the subset of libp2p features needed by eth2.0. I thought you might find it useful. My feelings won't be hurt if I'm wrong and you close this issue without explanation. :)

Assigning port numbers in tests to avoid sharing ports

As mentioned in #82, until we move to mocking network communication, we should assign ports to hosts in our test cases in a more canonical way which avoids:

  1. different tests trying to communicate on the same port
  2. assigning port numbers ad-hoc

One way to do this is to have tests share a _next_port function which gives a monotonically increasing port number.
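
A sketch of the shared helper (the starting port is arbitrary; a conftest/fixture module shared by the tests would own the counter):

import itertools

# Module-level counter shared by every test that needs a listening port.
_port_counter = itertools.count(8750)


def _next_port() -> int:
    # Each call returns a fresh, monotonically increasing port, so no two
    # tests in the same run listen on the same port.
    return next(_port_counter)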

Cleanup unnecessary outdated branches

There are a number of branches lingering around that need to be deleted from remote. We need to determine which are necessary to keep around and which can be deleted.

Post PoC design enhancements

Our PoC makes simplifying design choices that will be improved in the next py-libp2p iteration. These tasks are sorted roughly in decreasing order of priority.

  • handle_incoming() should write to a blocking queue buffer (instead of a bytearray) and read_buffer() should block on reads from the corresponding buffer. Consequently, handle_incoming() can be repeatedly scheduled to run in the background as an asyncio task (instead of being invoked on read_buffer calls); see the sketch at the end of this issue. @alexh
  • Protocol muxing: protocol IDs should be sent over the wire and picked up by the receiving side - this way we can multiplex protocols over the same raw connection, multistream-select. @stuckinaboot
  • refactor muxed_connection to mplex @zixuanzh
  • stream IDs should be generated more conventionally - the current design tramples a uuid into a 64 bit representation. @robzajac
  • more robust error handling, propagating exceptions properly @robzajac
  • better style around constants used in the code, such as the handle_incoming() timeout and the assumed protocol ID. (Perhaps a constants file?)
  • multiple transports should be supported in Swarm

More inspiration can be found in the TODOs of the source code.
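
The sketch referenced in the first bullet: per-stream asyncio.Queue buffers let handle_incoming() run as a long-lived background task while read_buffer() blocks on its queue. _read_frame() is a placeholder for whatever parses one framed message off the raw connection; all names here are illustrative.

import asyncio


class MplexConn:
    def __init__(self, raw_conn):
        self.raw_conn = raw_conn
        self.buffers = {}  # stream_id -> asyncio.Queue of payload bytes

    async def handle_incoming(self):
        # Scheduled once, e.g. asyncio.ensure_future(conn.handle_incoming()).
        while True:
            stream_id, payload = await self._read_frame()
            self.buffers.setdefault(stream_id, asyncio.Queue()).put_nowait(payload)

    async def read_buffer(self, stream_id):
        # Blocks until a message for this stream arrives, instead of polling a
        # bytearray or draining frames that belong to other streams.
        return await self.buffers.setdefault(stream_id, asyncio.Queue()).get()

    async def _read_frame(self):
        # Placeholder: parse one (stream_id, payload) frame from self.raw_conn.
        raise NotImplementedError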

Update read logic to support multiple streams over one connection

Read logic should be updated to support multiple streams over one connection. The current logic fails in the case where I have streams open with A and B over one connection: A sends me a message, then B sends me a message, and I call read on my stream with B. Currently, we would pull A's message off the connection and the read call would fail.

This will be fixed by reading all available messages in handle_incoming.

PEP8 compliance

In perusing some of the code, I noticed a fair number of camelcase function names. It would be nice to integrate flake8 into the CI process in order to keep code style consistent.

Implement Gossipsub

Implement Gossipsub following the pubsub spec with the go implementation as reference.

Here is what needs to be done:

Pubsub adjustments:
Modifications to pubsub.py that are necessary to allow gossipsub to function properly

  • modify continuously_read_stream of pubsub.py to process gossipsub RPCs (go reference in handleIncomingRPC)

Message Processing:
Handle new RPC calls and implement gossipsub-specific publish

  • modify protobuf to include control messages
  • control messages GRAFT and PRUNE handled
  • control message IHAVE(ids) handled
  • control message IWANT(ids) handled
  • messages properly forwarded to every peer in peers.floodsub[topic] and every peer in mesh[topic]

Calls on Gossipsub Router
Handle router subscribing/unsubscribing from topics

  • JOIN(topic) properly makes gossipsub router join the topic
  • LEAVE(topic) properly makes gossipsub router leave the topic

Heartbeat Procedure
The router periodically runs a heartbeat procedure as defined in the spec

  • maintain mesh
  • maintain fanout
  • emit gossip

Data structures
These follow the spec, although a judgment will need to be made as to whether they are entirely necessary for V1

  • mcache
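
For mcache, a sketch following the spec's message cache: messages live in per-heartbeat history frames, shift() rotates the frames once per heartbeat, and gossip (IHAVE) only advertises the most recent gossip-window frames. Parameter defaults are illustrative.

class MessageCache:
    def __init__(self, gossip_window: int = 3, history_length: int = 5):
        self._msgs = {}  # message id -> full message
        self._history = [[] for _ in range(history_length)]  # newest frame first
        self._gossip_window = gossip_window

    def put(self, msg_id, msg) -> None:
        self._msgs[msg_id] = msg
        self._history[0].append(msg_id)

    def get(self, msg_id):
        return self._msgs.get(msg_id)

    def window(self):
        # Message ids to advertise via IHAVE when emitting gossip.
        return [mid for frame in self._history[:self._gossip_window] for mid in frame]

    def shift(self) -> None:
        # Called once per heartbeat: drop the oldest frame, start a new one.
        for mid in self._history[-1]:
            self._msgs.pop(mid, None)
        self._history = [[]] + self._history[:-1]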

Pre-benchmarking requirements

In order to perform benchmark tests on various network topologies running floodsub, the following must be completed to prevent memory and latency issues (as well as serving as enhancements):

  • change hardcoded timeout in muxed conn to be an argument
  • make seqno a 64-bit big-endian int
  • make the seen-messages list in pubsub an LRU cache (see the sketch after this list)
  • remove unnecessary print statements, such as in floodsub.py
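
Sketches for the seqno and seen-cache items (maxsize is illustrative):

import struct
from collections import OrderedDict


def encode_seqno(seqno: int) -> bytes:
    # 64-bit unsigned big-endian integer, as pubsub seqnos are encoded.
    return struct.pack(">Q", seqno)


class LRUSeenCache:
    # Bounded replacement for the unbounded seen-messages list.
    def __init__(self, maxsize: int = 512):
        self._maxsize = maxsize
        self._items = OrderedDict()

    def add(self, msg_id) -> None:
        self._items[msg_id] = None
        self._items.move_to_end(msg_id)
        if len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict the oldest entry

    def __contains__(self, msg_id) -> bool:
        return msg_id in self._items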

Floodsub RPC seqno

As per #141, seqno in the RPC message should be a linearly increasing number per the spec, but it is currently just a UUID.

PubsubRouter interface handle_rpc function

The PubsubRouter interface handle_rpc function takes in two parameters, an RPC and the sender peer ID of that message. In go, the equivalent HandleRPC function takes in only a single parameter, which is the RPC. Somehow go manages to extract a from peer ID from this RPC even though protobuf specifies no from property on the RPC object itself.

Given this, we use the router defined above. This issue should be discussed with the relevant parties responsible for maintaining the interfaces and pubsub protobuf.

Cjdns not planned?

I saw that Cjdns is not planned, but Cjdns isn't like Tor or i2p:

  • Tor is not very "IP" compatible; it uses an IP proxy protocol, which requires a program to implement support for one of the supported proxies (SOCKS 4/5) or for Tor itself. (There are some exceptions, but they are too rarely used to consider here.)
  • i2p is better, because you can map another i2p member to a fake IP (for example, you can translate somethings.i2p to 127.4.76.243 locally), and you also have a proxy option, so even a program without proxy support is not a problem. But that requires intervention on the administration page of your router, so it takes work to implement (create an API client that does this for us, or use the proxy).
  • Cjdns? It emulates a FULL IP network; Cjdns implements every IP feature (I'm not sure about anycast).
    But there is more: the way it implements this, the bridge from IP to Cjdns is a TUN interface, so if your program works at layer 3 or above and uses the kernel routing table (which will probably be the case for py-libp2p), you don't have ANY work to do to be Cjdns compatible.
    Any Cjdns address is like any IP address: Cjdns addresses are a subsection of the IP address space (0xfc), and all connections to Cjdns are made by an external daemon that takes packets from the TUN interface and does what it must do.

So, to me, saying "Cjdns not planned" is an error, because you don't need to support Cjdns; Cjdns is fully IP compatible.

This is why I ask:
Why say that?
Would you like someone to write a test verifying that all IP-compatible transports are Cjdns compatible (if needed I will 😄, it is quite simple)?
Or does "supporting" here mean supporting Cjdns without any external daemon?

Kademlia Implementation Roadmap

We just landed #129 to work on Kademlia feature by feature. Thanks @zaibon! Here's a summary of what needs to be done, in descending order of priority.

  • replace (ip, port, node_id) tuple with multiaddr of ip, port, node_id
  • implement RoutedHost with Kad DHT
  • refactor dial by injecting Router as a dependency
  • implement ADD_PROVIDER and REMOVE_PROVIDER
  • implement content routing with Kademlia
  • implement peer discovery by extending content routing as discussed here and relevant to #128
  • phase out rpcudp with gRPC
