
DHT Bittorrent index

A distributed index of torrents.

Possible ways this could work

Read-only nodes

Each node would either:

  • respond to requests from other nodes to read certain information like:
    • map of infohashes to torrent names
    • list of nodes known to the node
  • send such requests to other nodes

This approach could reduce the load on single nodes. It would be up to a node to build its index of torrents and nodes. Any searches would then be done on the local db.
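
A minimal sketch of such a read-only node in CoffeeScript (the class name ReadOnlyNode and its methods are assumptions for illustration, not the project's actual API):

class ReadOnlyNode
  constructor: ->
    @torrents = {}   # infohash -> torrent name
    @nodes = []      # addresses of known nodes

  # Answer read requests from other nodes
  getTorrentIndex: -> @torrents
  getNodeIndex: -> @nodes

  # Merge an index received from another node into the local db
  mergeTorrentIndex: (remoteIndex) ->
    @torrents[infohash] ?= name for own infohash, name of remoteIndex
    return

  # Searches only ever run against the local index
  search: (term) ->
    term = term.toLowerCase()
    (infohash for own infohash, name of @torrents when name.toLowerCase().indexOf(term) isnt -1)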

Distributed search power

Each node would respond to search requests and implement its own search algorithm to return a list of results.

It could also cascade requests to other nodes in order to find what was requested, but this doesn't have to be implemented on all nodes. It could be implemented on nodes that handle many search requests and belong to one network / owner, e.g. one website that cascades its searches to its other nodes.

Such a node would also respond to requests for its node index.
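
A hedged sketch of how cascading could work, assuming a node object like the one above and a queryPeer callback that forwards the search to another node (both are assumptions):

# Answer from the local index first, then forward the search to known
# peers and merge whatever they return.
cascadeSearch = (node, term, queryPeer) ->
  results = node.search term
  for peer in node.getNodeIndex()
    results = results.concat queryPeer(peer, term)
  results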

Dev

npm install -g coffee-script jasmine

# Run tests with
jasmine

Specs

Each node has a hashmap of

{
	"<infohash1>": "<torrent name1>",
	"<infohash2>": "<torrent name2>",
	// ...
	"<infohashN>": "<torrent nameN>",
}
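
Since tests run with Jasmine, a spec exercising this hashmap could look roughly like the following (the values are the placeholders from above, not real infohashes):

# spec/torrentIndexSpec.coffee -- illustrative only
describe 'torrent index', ->
  it 'maps infohashes to torrent names', ->
    index =
      '<infohash1>': '<torrent name1>'
      '<infohash2>': '<torrent name2>'
    expect(index['<infohash1>']).toBe '<torrent name1>'
    expect(Object.keys(index).length).toBe 2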

Protocol

A node should implement a protocol that allows other nodes to query for torrents and to fetch that node's index of torrents and its index of known nodes.

Commands simply look like shell commands:

<command> [args]

getTorrentIndex [format=JSON]

Returns the node's index of torrents in a given format (JSON is the default).

getNodeIndex [format=JSON]

Returns the node's index of nodes in a given format (JSON is the default).

This is the index/list of nodes known to the node. It should be regularly checked for dead nodes, either by the node returning the index or by the node requesting it.
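
A minimal sketch of how a node might parse and dispatch such command lines (the handler methods on node are assumptions):

# Split "<command> [args]" on whitespace and route it to the node.
handleCommand = (node, line) ->
  [command, args...] = line.trim().split(/\s+/)
  switch command
    when 'getTorrentIndex', 'getNodeIndex'
      format = args[0] ? 'JSON'
      index = if command is 'getTorrentIndex' then node.getTorrentIndex() else node.getNodeIndex()
      if format is 'JSON' then JSON.stringify(index) else index
    else
      "unknown command: #{command}"

For example, handleCommand node, 'getTorrentIndex JSON' would return the torrent index serialised as JSON.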

...

More commands to come


dht-bittorrent-index's Issues

Node index duplicates

At the moment we can add DNS and IP entries to the node index. This can lead to duplicates in our node index, e.g.

sometorrentsite.org can have an IPv4 address and an IPv6 address. What should we save?

  • Only DNS record
    • readable
    • if the site gets taken down... well, fuck
    • only people with a domain can participate
  • Only IPv4
    • Anybody can participate
    • We aren't helping the internet get rid of IPv4 and move on to IPv6
  • Only IPv6
    • Push for upgrade to new protocol
    • Not everybody uses IPv6
  • IPv4 and IPv6
    • Redundancy (both a plus and a minus)

Maybe the question is: is redundancy good or bad for us?
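
One way to sidestep the duplicates regardless of the answer would be to keep all address types together under one entry. A sketch, with addNodeEntry and the entry shape being assumptions:

# Keep DNS, IPv4 and IPv6 together so one node never appears twice
# under different address types.
addNodeEntry = (nodeIndex, {dns, ipv4, ipv6}) ->
  key = dns ? ipv4 ? ipv6        # prefer the DNS name as the key when present
  nodeIndex[key] ?= {}
  entry = nodeIndex[key]
  entry.dns  = dns  if dns?
  entry.ipv4 = ipv4 if ipv4?
  entry.ipv6 = ipv6 if ipv6?
  nodeIndex

index = {}
addNodeEntry index, dns: 'sometorrentsite.org', ipv4: '203.0.113.7'
addNodeEntry index, dns: 'sometorrentsite.org', ipv6: '2001:db8::7'
# index now holds a single entry for sometorrentsite.org with both addresses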

Generalize the project

Right now it's an application that cannot easily be integrated with others. A better way to make it more usable would be to split it into more layers:

Application Layer    <-- to be written by an app calling us
        |
Protocol Layer       <-- maybe the API: a set of methods that an application
        |                server like ExpressJS can call
        |
Db Layer             <-- maybe the Waterline ORM or a custom adapter that calls a DB

With multiple layers it should be easier for developers to integrate the functionality without reimplementing everything.
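
A rough sketch of what that split could look like (class names and methods are assumptions; the real Db layer might be Waterline or a custom adapter):

# Db layer: the only thing that knows how data is stored.
class InMemoryDbLayer
  constructor: -> @torrents = {}
  get: (infohash) -> @torrents[infohash]
  put: (infohash, name) -> @torrents[infohash] = name
  all: -> @torrents

# Protocol layer: the methods an application server like ExpressJS would call.
class ProtocolLayer
  constructor: (@db) ->
  getTorrentIndex: -> @db.all()
  addTorrent: (infohash, name) -> @db.put infohash, name

# The application layer only ever talks to the protocol layer:
protocol = new ProtocolLayer new InMemoryDbLayer()
protocol.addTorrent '<infohash1>', '<torrent name1>'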

Multi-node / Network tests

We need to see how multi-node networks work. For that we should create networks of nodes with different topologies and see how information is propagated and found.

One of the things that comes to mind is the seed of each node. Depending on the topology each node might have a different seed. For example, in a ring network we can test what happens when each created node is the seed of the next node.
How fast will we reach a state of entropy? How fast will the search speed be? How big can the indexes get?
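
A sketch of how such a ring could be wired up in a test (buildRing, createNode and the seed property are assumptions):

# Each node's seed is the node created just before it, wrapping around,
# so every created node is the seed of the next one.
buildRing = (size, createNode) ->
  nodes = (createNode() for i in [0...size])
  for node, i in nodes
    node.seed = nodes[(i + size - 1) % size]
  nodes

# e.g. nodes = buildRing 5, -> new ReadOnlyNode()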

Torrent-name duplicates

An infohash could have multiple names. What will we do in case that happens? Store a list of names? Never update the name?
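
If we did store a list of names, a minimal sketch (the policy and the helper name are assumptions, not a decision):

# Keep every name seen for an infohash instead of a single string.
addName = (torrentIndex, infohash, name) ->
  torrentIndex[infohash] ?= []
  torrentIndex[infohash].push name unless name in torrentIndex[infohash]
  torrentIndex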

getTorrentIndex limits and paging

Nodes should be able to create big indexes. Certain websites have databases of torrents that are a few GB big. A plain getTorrentIndex command on our nodes will indiscriminately return all infohash / torrent-name key-value pairs.

We should improve our getTorrentIndex to have the form

getTorrentIndex [--limit <number> [--page <number>]]

_Syntax:_ [] - Optional argument
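
A sketch of how limit and page could be applied to the index before it is returned (pageTorrentIndex is an assumed helper; pages start at 1):

# Sort the infohashes and return only the requested slice of the index.
pageTorrentIndex = (torrents, limit, page = 1) ->
  keys = Object.keys(torrents).sort()
  start = (page - 1) * limit
  result = {}
  result[infohash] = torrents[infohash] for infohash in keys[start...start + limit]
  result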

Promises

Callbacks in the args? Ugh! Let's use promises. That'll help us create multiple nodes for testing as well.
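
A sketch of wrapping a callback-style query in a promise, assuming a node.query(command, callback) method exists (that method is an assumption here):

# Resolve with the result, reject with the error, no nested callbacks.
queryNodeAsync = (node, command) ->
  new Promise (resolve, reject) ->
    node.query command, (err, result) ->
      if err? then reject err else resolve result

# e.g. Promise.all(queryNodeAsync(node, 'getTorrentIndex') for node in nodes)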

Read-only nodes

Move towards read-only nodes (as described in the readme).


This should also make development easier: we wouldn't need to think about searching, and a node would basically be an access point to a DB.
