The Python implementation of the libp2p networking stack 🐍 [under development]

Home Page: https://libp2p.io

License: Other

py-libp2p's Introduction

py-libp2p

WARNING

py-libp2p is an experimental and work-in-progress repo under development. We do not yet recommend using py-libp2p in production environments. Right now, tests_interop are turned off for CI, and a number of tests are failing. WIP.

The Python implementation of the libp2p networking stack

Read more in the documentation on ReadTheDocs. View the release notes.

Maintainers

Currently maintained by @pacrob and @dhuseby, looking for assistance!

Note that tests/core/test_libp2p/test_libp2p.py contains an end-to-end messaging test between two libp2p hosts, which is the bulk of our proof of concept.

Feature Breakdown

py-libp2p aims for conformity with the standard libp2p modules. Below is a breakdown of the modules we have developed, are developing, and may develop in the future.

Legend: 🍏 Done   🍋 In Progress   🍅 Missing   🌰 Not planned

libp2p Node Status
libp2p 🍏
Identify Protocol Status
Identify 🍋
Transport Protocols Status
TCP 🍏
UDP 🍅
WebSockets 🌰
UTP 🌰
WebRTC 🌰
SCTP 🌰
Tor 🌰
i2p 🌰
cjdns 🌰
Bluetooth LE 🌰
Audio TP 🌰
Zerotier 🌰
QUIC 🌰
Stream Muxers Status
multiplex 🍏
yamux 🍅
benchmarks 🌰
muxado 🌰
spdystream 🌰
spdy 🌰
http2 🌰
QUIC 🌰
Protocol Muxers Status
multiselect 🍏
Switch (Swarm) Status
Switch 🍏
Dialer stack 🍏
Peer Discovery Status
bootstrap list 🍅
Kademlia DHT 🌰
mDNS 🌰
PEX 🌰
DNS 🌰
Content Routing Status
Kademlia DHT 🌰
floodsub 🍏
gossipsub 🍏
PHT 🌰
Peer Routing Status
Kademlia DHT 🌰
floodsub 🍏
gossipsub 🍏
PHT 🌰
NAT Traversal Status
nat-pmp 🌰
upnp 🌰
ext addr discovery 🌰
STUN-like 🌰
line-switch relay 🌰
pkt-switch relay 🌰
Exchange Status
HTTP 🌰
Bitswap 🌰
Bittorrent 🌰
Consensus Status
Paxos 🌰
Raft 🌰
PBFT 🌰
Nakamoto 🌰

Explanation of Basic Two Node Communication

Core Concepts

(non-normative, useful for team notes, not a reference)

Several components of the libp2p stack take part when establishing a connection between two nodes:

  1. Host: a node in the libp2p network.
  2. Connection: the layer 3 connection between two nodes in a libp2p network.
  3. Transport: the component that creates a Connection, e.g. TCP, UDP, QUIC, etc.
  4. Streams: an abstraction on top of a Connection representing parallel conversations about different matters, each of which is identified by a protocol ID. Multiple streams are layered on top of a Connection via the Multiplexer.
  5. Multiplexer: a component that is responsible for wrapping messages sent on a stream with an envelope that identifies the stream they pertain to, normally via an ID. The multiplexer on the other end unwraps the message and routes it internally based on the stream identifier.
  6. Secure channel: optionally establishes a secure, encrypted, and authenticated channel over the Connection.
  7. Upgrader: a component that takes a raw layer 3 connection returned by the Transport, and performs the security and multiplexing negotiation to set up a secure, multiplexed channel on top of which Streams can be opened.
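
To make these pieces concrete, below is a minimal sketch of two hosts talking over a stream. It is written against the API used by the era's examples/chat/chat.py and tests (new_node, get_network().listen, set_stream_handler, new_stream); exact names and signatures differ between py-libp2p versions, so treat it as illustrative rather than authoritative.

import asyncio

import multiaddr

from libp2p import new_node

PROTOCOL_ID = "/echo/1.0.0"


async def main():
    # Each host is a node in the libp2p network.
    host_a = await new_node(transport_opt=["/ip4/127.0.0.1/tcp/8001"])
    host_b = await new_node(transport_opt=["/ip4/127.0.0.1/tcp/8002"])
    await host_a.get_network().listen(multiaddr.Multiaddr("/ip4/127.0.0.1/tcp/8001"))
    await host_b.get_network().listen(multiaddr.Multiaddr("/ip4/127.0.0.1/tcp/8002"))

    # B registers a handler; the protocol muxer routes incoming streams
    # carrying this protocol ID to it.
    async def echo_handler(stream):
        await stream.write(await stream.read())

    host_b.set_stream_handler(PROTOCOL_ID, echo_handler)

    # A learns how to reach B (the peerstore keeps addresses keyed by peer ID).
    host_a.get_peerstore().add_addrs(host_b.get_id(), host_b.get_addrs(), 10)

    # Opening a stream dials B if needed (Transport + Upgrader); the
    # multiplexer then carries this stream over the single raw connection.
    stream = await host_a.new_stream(host_b.get_id(), [PROTOCOL_ID])
    await stream.write(b"hi")
    print(await stream.read())


asyncio.get_event_loop().run_until_complete(main())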

Communication between two hosts X and Y

(non-normative, useful for team notes, not a reference)

Initiate the connection: A host is simply a node in the libp2p network that is able to communicate with other nodes in the network. In order for X and Y to communicate with one another, one of the hosts must initiate the connection. Let's say that X is going to initiate the connection. X will first open a connection to Y. This connection is where all of the actual communication will take place.

Communication over one connection with multiple protocols: X and Y can communicate over the same connection using different protocols and the multiplexer will appropriately route messages for a given protocol to a particular handler function for that protocol, which allows for each host to handle different protocols with separate functions. Furthermore, we can use multiple streams for a given protocol that allow for the same protocol and same underlying connection to be used for communication about separate topics between nodes X and Y.

Why use multiple streams?: Using the same connection for multiple streams avoids the overhead of maintaining multiple connections between X and Y. For X and Y to differentiate between messages on different streams and different protocols, a multiplexer encodes each outgoing message and decodes each incoming one. The multiplexer encodes a message by prepending a header that contains the stream ID (along with some other info). The message is then sent across the raw connection, and the receiving host uses its multiplexer to decode it, i.e. to determine which stream ID the message should be routed to.
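
As a rough illustration of the multiplexer's envelope, here is a sketch of mplex-style framing: a varint header packing the stream ID together with a small flag, followed by a varint length and the payload. This is a simplified model for intuition, not the exact encoder in stream_muxer/mplex.

def encode_uvarint(value: int) -> bytes:
    # Unsigned varint: 7 bits per byte, MSB set on every byte but the last.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_frame(stream_id: int, flag: int, data: bytes) -> bytes:
    # The header packs the stream ID and a 3-bit flag (NEW_STREAM, MESSAGE, CLOSE, ...).
    header = encode_uvarint((stream_id << 3) | flag)
    return header + encode_uvarint(len(data)) + data


def decode_frame(buf: bytes):
    # Mirror of encode_frame; returns (stream_id, flag, payload, bytes_consumed).
    def read_uvarint(offset):
        shift, value = 0, 0
        while True:
            byte = buf[offset]
            offset += 1
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return value, offset
            shift += 7

    header, offset = read_uvarint(0)
    length, offset = read_uvarint(offset)
    payload = buf[offset:offset + length]
    return header >> 3, header & 0x07, payload, offset + length


# The receiver routes the payload to the buffer of the stream named in the header.
print(decode_frame(encode_frame(5, 2, b"hello")))  # (5, 2, b'hello', 7)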

py-libp2p's People

Contributors

alexh, carver, cburgdorf, chihchengliang, csunny, davesque, dependabot[bot], dhuseby, dmuhs, fselmo, hukkinj1, jorropo, kclowes, libp2p-mgmt-read-write[bot], mhchia, nic619, njgheorghita, pacrob, pipermerriam, ralexstokes, reedsa, robzajac, shadowjonathan, stuckinaboot, swedneck, tranlv, web-flow, wolfgang, zaibon, zixuanzh

py-libp2p's Issues

Implement SECIO security upgrade

A security transport upgrade can upgrade a connection to an encrypted and authenticated connection. The default libp2p security transport will be SECIO. One can start by implementing the security interface as defined here and then implement SECIO in secio.py. upgrade_security in transport/upgrader.py can then be implemented to enable the security upgrade.

As we are currently focusing on PubSub, we will not implement SECIO until later.
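
For orientation, a security transport generally exposes an inbound and an outbound handshake that wrap a raw connection and hand back a secured one. A plausible shape for the interface (a sketch only; the authoritative definition is the security interface referenced above) is:

from abc import ABC, abstractmethod


class ISecureTransport(ABC):
    # Sketch of the interface shape only.

    @abstractmethod
    async def secure_inbound(self, conn):
        """Secure a connection we accepted (we are the responder)."""

    @abstractmethod
    async def secure_outbound(self, conn, peer_id):
        """Secure a connection we dialed, verifying we reached peer_id."""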

Plan to move this repo into the libp2p org

Hey all! It was great to meet @zixuanzh and @stuckinaboot at devcon4. I enjoyed our brainstorming and hacking sessions on py-libp2p.

We agreed with @pipermerriam that moving this repo to the libp2p organisation is appropriate once the PoC is delivered and the EF grant is approved. This will help discoverability and will encourage people to contribute, giving us a better chance of sparking a community. In fact, there's a thread of Python contributors willing to help.

I'm capturing some thoughts on the path to migrating this repo to the libp2p org.

  • A NOTICE on the frontpage warning that this is an experimental and work-in-progress repo under heavy development.
  • A sponsorship acknowledgement to the Ethereum Foundation on the frontpage once the grant becomes official.
  • README formatted according to https://github.com/RichardLitt/standard-readme, which is what we lean towards in the libp2p/IPFS universe.
  • A breakdown of features and development status/estimation, in a formatted table. This helps people assess the maturity and lend a hand in those features that are WIP. Maybe a bunch of issues tagged help-wanted to lessen the decision burden for willing contributors.
  • Evaluate existing py-multiaddr implementations, continue maintaining those in the multiformats org: multiformats/multiaddr#7 (comment)
  • Continuous Integration.
  • Badges pointing to the libp2p IRC channel, CI, and so on.

Green Apple Status

Update the status of the green apples in the README to reflect the current state of the repo.

Complete all Notify interface methods

The following interface methods in Notify have not been implemented as they are not crucial to PubSub PoC.

  • disconnected
  • closed_stream
  • listen_close

They should be implemented in swarm.py; however, this will not be possible before #136 lands.

Converting examples/chat/chat.py into a pytest test case

The example at examples/chat/chat.py is a useful benchmark of correctness. It would be nice to have this as a runnable pytest test case (maybe chat over a few messages in varying orders?) to avoid running it manually upon code changes.

Adding logging

It would be nice for py-libp2p to do proper logging for network events and other noteworthy events through a standard Python logging framework.
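
A minimal sketch of what this could look like with the standard logging module: each module grabs a child of a shared "libp2p" logger, and applications configure handlers and levels at the root (the logger name below is illustrative).

import logging

# In a library module, e.g. network/swarm.py:
logger = logging.getLogger("libp2p.network.swarm")


def on_new_connection(peer_id):
    logger.debug("opened connection to %s", peer_id)


# In an application embedding py-libp2p:
logging.basicConfig(level=logging.INFO)
logging.getLogger("libp2p").setLevel(logging.DEBUG)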

PeerStore Persistent Storage

Following the precedent set by go (https://github.com/libp2p/go-libp2p-peerstore/), py-libp2p should have a PeerStore implementation that keeps all peer-store data in memory (already done) and one that keeps it in persistent storage, so that clients can restart their libp2p instance while maintaining the same peer store (needs to be done).

Improved peer ID construction and usage

As mentioned in #64, the py-libp2p notion of a peer ID should mirror the Go libp2p notion by taking the SHA256 hash of the public key generated for the host. More generally, py-libp2p should have a cleaner abstraction for dealing with peer IDs and they should be used more consistently in the library (instead of using simplistic peer IDs like certain parts of the code currently use).
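
A sketch of that construction, assuming the Go convention of a SHA-256 multihash over the host's serialized public key, displayed base58-encoded (uses the third-party base58 package; how the public key bytes are serialized is left to the caller):

import hashlib

import base58  # third-party: pip install base58


def peer_id_from_public_key(pubkey_bytes: bytes) -> str:
    # multihash = <sha2-256 code (0x12)> <digest length (0x20)> <digest>
    digest = hashlib.sha256(pubkey_bytes).digest()
    multihash = bytes([0x12, 0x20]) + digest
    # Peer IDs are conventionally displayed base58btc-encoded.
    return base58.b58encode(multihash).decode()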

Add timeout to mplex_stream's read()

There should be a timeout parameter to mplex_stream's read() method that propagates down to the low-level calls to the underlying read on the raw connection.
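
One possible shape (a sketch, not the final design) is an optional timeout argument on read() that wraps the buffer read in asyncio.wait_for:

import asyncio


class MplexStream:
    # Sketch only; mplex_conn and stream_id mirror the attributes used in
    # stream_muxer/mplex/mplex_stream.py.
    def __init__(self, stream_id, mplex_conn):
        self.stream_id = stream_id
        self.mplex_conn = mplex_conn

    async def read(self, timeout=None):
        coro = self.mplex_conn.read_buffer(self.stream_id)
        if timeout is None:
            return await coro
        # Raises asyncio.TimeoutError if nothing arrives in time; the caller
        # can translate that into a stream-level error.
        return await asyncio.wait_for(coro, timeout)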

Wire up and test close()

Currently, we have close() in tcp.py, mplex.py, mplex_stream.py, and net_stream.py, but they are not properly wired up. We want to be able to do the following in swarm.py.

  • close_peer(peer_id): close all connections to a given peer
  • tearDown(): close all listeners by calling something like transport.listener.close() and close all connections. It should also prevent new connections and/or listeners from being added to the swarm.

We would also want to have test cases on close() in muxed_stream().
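
A rough sketch of how this might look in swarm.py; tearDown() is shown here as tear_down(), and the connections/listeners attribute names are assumptions about the swarm's internals:

class Swarm:
    # Sketch only; self.connections maps peer_id -> muxed connection and
    # self.listeners maps multiaddr string -> listener (assumed names).
    def __init__(self):
        self.connections = {}
        self.listeners = {}
        self._closed = False

    async def close_peer(self, peer_id):
        conn = self.connections.pop(peer_id, None)
        if conn is not None:
            await conn.close()

    async def tear_down(self):
        # Stop accepting new connections/listeners, then close everything.
        self._closed = True
        for listener in self.listeners.values():
            await listener.close()
        self.listeners.clear()
        for peer_id in list(self.connections):
            await self.close_peer(peer_id)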

Overall structure.

Usually Python libs are structured in the following manner:

├── examples
├── LICENSE
├── README.md
├── **setup.py**
├── tests
├── **libp2p**

Note that requirements.txt is usually discouraged for libraries; it is more of an application-level dependency definition facility. Also, all the relevant contents are packed into a single folder, whose name goes into the packages=... arg of the setuptools.setup function.
The proposed structure is simply cleaner to follow and use; for instance, once you run python setup.py develop you get:

  • An "installed" version of your lib, so you no longer hit errors like:
$ python examples/chat/chat.py 
Traceback (most recent call last):
  File "examples/chat/chat.py", line 8, in <module>
    from libp2p.libp2p import *
ImportError: No module named 'libp2p'
  • Auto update of code while developing.
  • You actually need setup.py in order to pack your code as a PyPI package.

I can help on that, if needed.
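
For reference, a minimal setup.py along those lines (name, version, and dependencies are placeholders):

from setuptools import find_packages, setup

setup(
    name="libp2p",      # placeholder
    version="0.0.1",    # placeholder
    packages=find_packages(exclude=["tests", "tests.*", "examples", "examples.*"]),
    install_requires=[
        # runtime dependencies go here rather than in requirements.txt
    ],
    python_requires=">=3.6",
)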

Closing network resources in tests, adding close functionality

Currently no test does any kind of cleanup (e.g. by calling close methods of the underlying connections opened by a host). This balloons into errors (but not test case failures) of the "Task was destroyed but it is pending!" type in CI.

  • py-libp2p should do cleanup in tests.
  • py-libp2p should add close methods to host and network instances in the style of go-libp2p.

from vs. from_id in RPC

Following the RPC precedent, all messages should be serialized the same way across libp2p implementations. However, in python, from is a reserved word so we use from_id.

  • Investigate if there is a way to get around this
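
Two observations that may help: the protobuf wire format identifies fields by number, not name, so a field renamed to from_id (with the same field number) serializes identically to from in other implementations; and if keeping the wire-spec name from is preferred, generated Python messages can still read and write it via getattr/setattr, since only the dot-access syntax collides with the keyword. A sketch (msg stands for an instance of the generated RPC/Message class):

def get_sender(msg) -> bytes:
    # msg.from is a SyntaxError because "from" is a Python keyword,
    # but attribute access by name works on generated protobuf messages.
    return getattr(msg, "from")


def set_sender(msg, peer_id_bytes: bytes) -> None:
    setattr(msg, "from", peer_id_bytes)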

Adding documentation for what is currently completed

Hi,

It was nice meeting some of you at Devcon. It's great to see people our age (I'm a university student too!) do great things in the community.

This might be a tad early, but it would be great to have documentation and a tutorial so people can try using py-libp2p in their projects.

Python has several libraries that can do docs automatically. Do you guys have any thoughts on which ones to use?

MultiAddr to_dict() decapsulate()

Feature Request

We want the following two methods in MultiAddr.py as used in TCP.py
ref: https://github.com/multiformats/js-multiaddr/blob/master/src/index.js

  • to_dict() (equivalent of toOptions())
"""
convert multiaddr to dictionary format for easy access
:return multiaddr_dict: multiaddr in dictionary format for easy access
"""
  • decapsulate(key)
"""
remove multiaddr from another multiaddr by protocol_id
:return multiaddr: multiaddr after modification
"""

similar usage: https://github.com/libp2p/js-libp2p-tcp/blob/master/src/index.js
actual usage: https://github.com/zixuanzh/py-libp2p/blob/ae48708e9c4fe9fa8f1a3b9b9b2f7fbb114e7519/network/tcp.py#L27

We might also need get_ipfs_id() in the future.
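
A string-level sketch of the two methods, treating a multiaddr as alternating /protocol/value components (the real implementation would live on the MultiAddr class and use its parsed protocol table):

def to_dict(maddr: str) -> dict:
    # "/ip4/127.0.0.1/tcp/8000" -> {"ip4": "127.0.0.1", "tcp": "8000"}
    parts = maddr.strip("/").split("/")
    return {parts[i]: parts[i + 1] for i in range(0, len(parts) - 1, 2)}


def decapsulate(maddr: str, protocol_id: str) -> str:
    # Drop everything from the last occurrence of the given protocol onward, e.g.
    # decapsulate("/ip4/127.0.0.1/tcp/8000/p2p/QmPeer", "p2p") -> "/ip4/127.0.0.1/tcp/8000"
    marker = "/" + protocol_id + "/"
    index = maddr.rfind(marker)
    return maddr if index < 0 else maddr[:index]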

Chat example errors when I run with python3.6.3

There are some errors in the chat example when I run it with Python 3.6; the error looks like this:

python3 chat.py --port 3001 -d /ip4/127.0.0.1/tcp/3000/p2p/2d37wbGvQzbAQ84yRouh2m2vBKkN8s5AfH9Q75HZRCUQmJW7yAVSNKzjJj6gcjE2mDNDUHCichXWdMH3S2c8AaDLm3kXmf5R8D5dDg7mMksSs28iX8EnFs7nAHEg6T2xFcg35M8sNmf7arzjFoMW8Vseaykx9c21Hy5RiTxmGZw12urzdAZHgrWZptnVR6Qsr3HAuuKqdENr1PxEjdVdmBL3Gi6iNhYeED9PWLi7noNFF8bvfSY5wmY6ys7oXABtJXeu8dC6Jd5sm3nvUDRFXarUXNLfpAQNRDyuVGvDUVR3DRQVw3btt9pQXmmmaHwzFtxxL5qijZqgbrRdmFrdC5HgyiYs8rB5hbSDpCeRNaD8BtMdtyzSAn8Qt8bTNSYSeBsGveFQU8ovHQ4gLFmKkBdvVLe5pFVdRAnVN6GFAZrWQdWsUh6kEhy2q7R2zRjK6UF4oL5dqmw9eoWmXgzNYfmji1oQSx6DxNNpcdvKM7apEzeX6KQANRbLgn7VkGwoU3eZv2qoHEzWpULp5K6i4hz7Tab12ovct9uQKSFdCoKE7Qp34dhR5sExhns3GyytwiqfySZ7YAsctNRPAwoTTLNGBN3ZQfD4Agvf2Ck
timeout!
Task exception was never retrieved
future: <Task finished coro=<read_data() done, defined at chat.py:16> exception=KeyError(1532609824184786476,)>
Traceback (most recent call last):
  File "chat.py", line 18, in read_data
    read_string = (await stream.read()).decode()
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/network/stream/net_stream.py", line 28, in read
    return await self.muxed_stream.read()
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/stream_muxer/mplex/mplex_stream.py", line 43, in read
    return await self.mplex_conn.read_buffer(self.stream_id)
  File "/Users/magic/workspace/data/www/GitHub/py-libp2p/stream_muxer/mplex/mplex.py", line 48, in read_buffer
    data = self.buffers[stream_id]
KeyError: 1532609824184786476

Host A is OK, but when I run host B, the error is raised.

Adding peer discovery using Kad DHT

py-libp2p should implement peer discovery based on the interfaces added in #123 and conforming to standard libp2p implementations of peer discovery (generally using the Kademlia DHT, see go-libp2p and js-libp2p).

@zaibon we saw that you had some work around this in #123. Are you still planning on submitting a PR, and if so on what timeline? We're happy to help and would welcome the addition. Thanks!

Minimum libp2p requirements

I just wanted to cross-link this post: ethresearch/p2p#4 (comment)

It appears to define the subset of libp2p features needed by eth2.0. I thought you might find it useful. My feelings won't be hurt if I'm wrong and you close this issue without explanation. :)

Assigning port numbers in tests to avoid sharing ports

As mentioned in #82, until we move to mocking network communication, we should assign ports to hosts in our test cases in a more canonical way which avoids:

  1. different tests trying to communicate on the same port
  2. assigning port numbers ad-hoc

One way to do this is to have tests share a _next_port function which gives a monotonically increasing port number.
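
A sketch of the shared helper (the starting port is arbitrary; a conftest/fixture module shared by the tests would own the counter):

import itertools

# Module-level counter shared by every test that needs a listening port.
_port_counter = itertools.count(8750)


def _next_port() -> int:
    # Each call returns a fresh, monotonically increasing port, so no two
    # tests in the same run listen on the same port.
    return next(_port_counter)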

Cleanup unnecessary outdated branches

There are a number of branches lingering around that need to be deleted from remote. We need to determine which are necessary to keep around and which can be deleted.

Post PoC design enhancements

Our PoC makes simplifying design choices that will be improved in the next py-libp2p iteration. These tasks are sorted roughly in decreasing order of priority.

  • handle_incoming() should write to a blocking queue buffer (instead of a bytearray) and read_buffer() should block on reads from the corresponding buffer. Consequently, handle_incoming() can be repeatedly scheduled to run in the background as an asyncio task (instead of being invoked on read_buffer calls); see the sketch at the end of this issue. @alexh
  • Protocol muxing: protocol IDs should be sent over the wire and picked up by the receiving side - this way we can multiplex protocols over the same raw connection, multistream-select. @stuckinaboot
  • refactor muxed_connection to mplex @zixuanzh
  • stream IDs should be generated more conventionally - the current design tramples a uuid into a 64 bit representation. @robzajac
  • more robust error handling, propagating exceptions properly @robzajac
  • better style around constants used in the code, such as the handle_incoming() timeout and the assumed protocol ID. (Perhaps a constants file?)
  • multiple transports should be supported in Swarm

More inspiration can be found in the TODOs of the source code.
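
The sketch referenced in the first bullet: per-stream asyncio.Queue buffers let handle_incoming() run as a long-lived background task while read_buffer() blocks on its queue. _read_frame() is a placeholder for whatever parses one framed message off the raw connection; all names here are illustrative.

import asyncio


class MplexConn:
    def __init__(self, raw_conn):
        self.raw_conn = raw_conn
        self.buffers = {}  # stream_id -> asyncio.Queue of payload bytes

    async def handle_incoming(self):
        # Scheduled once, e.g. asyncio.ensure_future(conn.handle_incoming()).
        while True:
            stream_id, payload = await self._read_frame()
            self.buffers.setdefault(stream_id, asyncio.Queue()).put_nowait(payload)

    async def read_buffer(self, stream_id):
        # Blocks until a message for this stream arrives, instead of polling a
        # bytearray or draining frames that belong to other streams.
        return await self.buffers.setdefault(stream_id, asyncio.Queue()).get()

    async def _read_frame(self):
        # Placeholder: parse one (stream_id, payload) frame from self.raw_conn.
        raise NotImplementedError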

Update read logic to support multiple streams over one connection

Read logic should be updated to support multiple streams over one connection. The current logic fails in the case where I have streams open with A and B over one connection: A sends me a message, then B sends me a message, and I call read on my stream with B. Currently, we would pull A's message off the connection and the read call would fail.

This will be fixed by reading all available messages in handle_incoming.

PEP8 compliance

In perusing some of the code, I noticed a fair number of camelcase function names. It would be nice to integrate flake8 into the CI process in order to keep code style consistent.

Implement Gossipsub

Implement Gossipsub following the pubsub spec with the go implementation as reference.

Here is what needs to be done:

Pubsub adjustments:
Modifications to pubsub.py that are necessary to allow gossipsub to function properly

  • modify continuously_read_stream of pubsub.py to process gossipsub RPCs (go reference in handleIncomingRPC)

Message Processing:
Handle new RPC calls and implement gossipsub-specific publish

  • modify protobuf to include control messages
  • control messages GRAFT and PRUNE handled
  • control message IHAVE(ids) handled
  • control message IWANT(ids) handled
  • messages properly forwarded to every peer in peers.floodsub[topic] and every peer in mesh[topic]

Calls on Gossipsub Router
Handle router subscribing/unsubscribing from topics

  • JOIN(topic) properly makes gossipsub router join the topic
  • LEAVE(topic) properly makes gossipsub router leave the topic

Heartbeat Procedure
The router periodically runs a heartbeat procedure as defined in the spec

  • maintain mesh
  • maintain fanout
  • emit gossip

Data structures
These follow the spec, although a judgment will need to be made as to whether they are entirely necessary for V1

  • mcache
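
For mcache, a sketch following the spec's message cache: messages live in per-heartbeat history frames, shift() rotates the frames once per heartbeat, and gossip (IHAVE) only advertises the most recent gossip-window frames. Parameter defaults are illustrative.

class MessageCache:
    def __init__(self, gossip_window: int = 3, history_length: int = 5):
        self._msgs = {}  # message id -> full message
        self._history = [[] for _ in range(history_length)]  # newest frame first
        self._gossip_window = gossip_window

    def put(self, msg_id, msg) -> None:
        self._msgs[msg_id] = msg
        self._history[0].append(msg_id)

    def get(self, msg_id):
        return self._msgs.get(msg_id)

    def window(self):
        # Message ids to advertise via IHAVE when emitting gossip.
        return [mid for frame in self._history[:self._gossip_window] for mid in frame]

    def shift(self) -> None:
        # Called once per heartbeat: drop the oldest frame, start a new one.
        for mid in self._history[-1]:
            self._msgs.pop(mid, None)
        self._history = [[]] + self._history[:-1]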

Pre-benchmarking requirements

In order to perform benchmark tests on various network topologies running floodsub, the following must be completed to prevent memory and latency issues (as well as serving as enhancements):

  • change hardcoded timeout in muxed conn to be an argument
  • make seqno a 64-bit big-endian int
  • make the seen-messages list in pubsub an LRU cache (see the sketch after this list)
  • remove unnecessary print statements, such as in floodsub.py
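
Sketches for the seqno and seen-cache items (maxsize is illustrative):

import struct
from collections import OrderedDict


def encode_seqno(seqno: int) -> bytes:
    # 64-bit unsigned big-endian integer, as pubsub seqnos are encoded.
    return struct.pack(">Q", seqno)


class LRUSeenCache:
    # Bounded replacement for the unbounded seen-messages list.
    def __init__(self, maxsize: int = 512):
        self._maxsize = maxsize
        self._items = OrderedDict()

    def add(self, msg_id) -> None:
        self._items[msg_id] = None
        self._items.move_to_end(msg_id)
        if len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict the oldest entry

    def __contains__(self, msg_id) -> bool:
        return msg_id in self._items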

Floodsub RPC seqno

As per #141, seqno in the RPC message should be a linearly increasing number per the spec, but it is currently just a UUID.

PubsubRouter interface handle_rpc function

The PubsubRouter interface handle_rpc function takes in two parameters, an RPC and the sender peer ID of that message. In go, the equivalent HandleRPC function takes in only a single parameter, which is the RPC. Somehow go manages to extract a from peer ID from this RPC even though protobuf specifies no from property on the RPC object itself.

Given this, we use the router defined above. This issue should be discussed with the relevant parties responsible for maintaining the interfaces and pubsub protobuf.

Cjdns not planned?

I saw that Cjdns is not planned, but Cjdns isn't like Tor or i2p:

  • Tor is not very "IP" compatible; it uses an IP proxy protocol, which requires a program to implement support for one of the supported proxies (SOCKS 4/5) or for Tor itself. (There are some exceptions, but they are too rarely used to consider here.)
  • i2p is better, because you can map another i2p member to a fake IP (for example, you can translate somethings.i2p to 127.4.76.243 locally), and you also have a proxy option, so even a program without proxy support is not a problem. But that requires intervention on the administration page of your router, so it takes work to implement (create an API client that does this for us, or use the proxy).
  • Cjdns? It emulates a FULL IP network; Cjdns implements every IP feature (I'm not sure about anycast).
    But there is more: the way it implements this, the bridge from IP to Cjdns is a TUN interface, so if your program works at layer 3 or above and uses the kernel routing table (which will probably be the case for py-libp2p), you don't have ANY work to do to be Cjdns compatible.
    Any Cjdns address is like any IP address: Cjdns addresses are a subsection of the IP address space (0xfc), and all connections to Cjdns are made by an external daemon that takes packets from the TUN interface and does what it must do.

So, to me, saying "Cjdns not planned" is an error, because you don't need to support Cjdns; Cjdns is fully IP compatible.

This is why I ask:
Why say that?
Would you like someone to write a test verifying that all IP-compatible transports are Cjdns compatible (if needed I will 😄, it is quite simple)?
Or does "supporting" here mean supporting Cjdns without any external daemon?

Kademlia Implementation Roadmap

We just landed #129 to work on Kademlia feature by feature. Thanks @zaibon! Here's a summary of what needs to be done, in descending order of priority.

  • replace (ip, port, node_id) tuple with multiaddr of ip, port, node_id
  • implement RoutedHost with Kad DHT
  • refactor dial by injecting Router as a dependency
  • implement ADD_PROVIDER and REMOVE_PROVIDER
  • implement content routing with Kademlia
  • implement peer discovery by extending content routing as discussed here and relevant to #128
  • phase out rpcudp with gRPC
