
DHT Bittorrent index

A distributed index of torrents.

Possible ways this could work

Read-only nodes

Each node would either:

  • respond to requests from other nodes to read certain information like:
    • map of infohashes to torrent names
    • list of nodes known to the node
  • send such requests to other nodes

This approach could reduce the load on single nodes. It would be up to a node to build its index of torrents and nodes. Any searches would then be done on the local db.
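
A minimal sketch of such a read-only node in CoffeeScript (the class name ReadOnlyNode and its methods are assumptions for illustration, not the project's actual API):

class ReadOnlyNode
  constructor: ->
    @torrents = {}   # infohash -> torrent name
    @nodes = []      # addresses of known nodes

  # Answer read requests from other nodes
  getTorrentIndex: -> @torrents
  getNodeIndex: -> @nodes

  # Merge an index received from another node into the local db
  mergeTorrentIndex: (remoteIndex) ->
    @torrents[infohash] ?= name for own infohash, name of remoteIndex
    return

  # Searches only ever run against the local index
  search: (term) ->
    term = term.toLowerCase()
    (infohash for own infohash, name of @torrents when name.toLowerCase().indexOf(term) isnt -1)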

Distributed search power

Each node would respond to search requests and implement its own search algorithm to return a list of results.

It could also cascade requests to other nodes in order to find what was requested, but this doesn't have to be implemented on all nodes. It could be implemented on nodes that handle many search requests and belong to one network / owner, e.g. one website that cascades its searches to its other nodes.

Such a node would also respond to requests for its node index.
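
A hedged sketch of how cascading could work, assuming a node object like the one above and a queryPeer callback that forwards the search to another node (both are assumptions):

# Answer from the local index first, then forward the search to known
# peers and merge whatever they return.
cascadeSearch = (node, term, queryPeer) ->
  results = node.search term
  for peer in node.getNodeIndex()
    results = results.concat queryPeer(peer, term)
  results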

Dev

npm install -g coffee-script jasmine

# Run tests with
jasmine

Specs

Each node has a hashmap of

{
	"<infohash1>": "<torrent name1>",
	"<infohash2>": "<torrent name2>",
	// ...
	"<infohashN>": "<torrent nameN>",
}
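
Since tests run with Jasmine, a spec exercising this hashmap could look roughly like the following (the values are the placeholders from above, not real infohashes):

# spec/torrentIndexSpec.coffee -- illustrative only
describe 'torrent index', ->
  it 'maps infohashes to torrent names', ->
    index =
      '<infohash1>': '<torrent name1>'
      '<infohash2>': '<torrent name2>'
    expect(index['<infohash1>']).toBe '<torrent name1>'
    expect(Object.keys(index).length).toBe 2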

Protocol

A node should implement a protocol that allows other nodes to query for torrents and to fetch that node's index of torrents and its index of known nodes.

Commands simply look like shell commands:

<command> [args]

getTorrentIndex [format=JSON]

Returns the node's index of torrents in a given format (JSON is the default).

getNodeIndex [format=JSON]

Returns the node's index of nodes in a given format (JSON is the default).

This is the index/list of nodes known to the node. It should be regularly checked for dead nodes, either by the node returning the index or by the node requesting it.
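
A minimal sketch of how a node might parse and dispatch such command lines (the handler methods on node are assumptions):

# Split "<command> [args]" on whitespace and route it to the node.
handleCommand = (node, line) ->
  [command, args...] = line.trim().split(/\s+/)
  switch command
    when 'getTorrentIndex', 'getNodeIndex'
      format = args[0] ? 'JSON'
      index = if command is 'getTorrentIndex' then node.getTorrentIndex() else node.getNodeIndex()
      if format is 'JSON' then JSON.stringify(index) else index
    else
      "unknown command: #{command}"

For example, handleCommand node, 'getTorrentIndex JSON' would return the torrent index serialised as JSON.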

...

More commands to come


dht-bittorrent-index's Issues

Node index duplicates

At the moment we can add DNS and IP entries to the node index. This can lead to duplicates in our node index, e.g.

sometorrentsite.org can have an IPv4 address and an IPv6 address. What should we save?

  • Only DNS record
    • readable
    • if the site gets taken down... well, fuck
    • only people with a domain can participate
  • Only IPv4
    • Anybody can participate
    • We aren't helping the internet get rid of IPv4 and move on to IPv6
  • Only IPv6
    • Push for upgrade to new protocol
    • Not everybody uses IPv6
  • IPv4 and IPv6
    • Redundancy (both a plus and a minus)

Maybe the question is: is redundancy good or bad for us?
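
One way to sidestep the duplicates regardless of the answer would be to keep all address types together under one entry. A sketch, with addNodeEntry and the entry shape being assumptions:

# Keep DNS, IPv4 and IPv6 together so one node never appears twice
# under different address types.
addNodeEntry = (nodeIndex, {dns, ipv4, ipv6}) ->
  key = dns ? ipv4 ? ipv6        # prefer the DNS name as the key when present
  nodeIndex[key] ?= {}
  entry = nodeIndex[key]
  entry.dns  = dns  if dns?
  entry.ipv4 = ipv4 if ipv4?
  entry.ipv6 = ipv6 if ipv6?
  nodeIndex

index = {}
addNodeEntry index, dns: 'sometorrentsite.org', ipv4: '203.0.113.7'
addNodeEntry index, dns: 'sometorrentsite.org', ipv6: '2001:db8::7'
# index now holds a single entry for sometorrentsite.org with both addresses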

Generalize the project

Right now it's an application that cannot easily be integrated with others. A better way to make it more usable would be to split it into more layers:

Application Layer    <-- to be written by an app calling us
        |
Protocol Layer       <-- maybe the API: a set of methods that an application
        |                server like ExpressJS can call
        |
Db Layer             <-- maybe the Waterline ORM or a custom adapter that calls a DB

With multiple layers it should be easier for developers to integrate the functionality without reimplementing everything.
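
A rough sketch of what that split could look like (class names and methods are assumptions; the real Db layer might be Waterline or a custom adapter):

# Db layer: the only thing that knows how data is stored.
class InMemoryDbLayer
  constructor: -> @torrents = {}
  get: (infohash) -> @torrents[infohash]
  put: (infohash, name) -> @torrents[infohash] = name
  all: -> @torrents

# Protocol layer: the methods an application server like ExpressJS would call.
class ProtocolLayer
  constructor: (@db) ->
  getTorrentIndex: -> @db.all()
  addTorrent: (infohash, name) -> @db.put infohash, name

# The application layer only ever talks to the protocol layer:
protocol = new ProtocolLayer new InMemoryDbLayer()
protocol.addTorrent '<infohash1>', '<torrent name1>'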

Multi-node / Network tests

We need to see how multi-node networks work. For that we should create networks of nodes with different topologies and see how information is propagated and found.

One of the things that comes to mind is the seed of each node. Depending on the topology each node might have a different seed. For example, in a ring network we can test what happens when each created node is the seed of the next node.
How fast will we reach a state of entropy? How fast will the search speed be? How big can the indexes get?
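
A sketch of how such a ring could be wired up in a test (buildRing, createNode and the seed property are assumptions):

# Each node's seed is the node created just before it, wrapping around,
# so every created node is the seed of the next one.
buildRing = (size, createNode) ->
  nodes = (createNode() for i in [0...size])
  for node, i in nodes
    node.seed = nodes[(i + size - 1) % size]
  nodes

# e.g. nodes = buildRing 5, -> new ReadOnlyNode()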

Torrent-name duplicates

An infohash could have multiple names. What will we do in case that happens? Store a list of names? Never update the name?
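
If we did store a list of names, a minimal sketch (the policy and the helper name are assumptions, not a decision):

# Keep every name seen for an infohash instead of a single string.
addName = (torrentIndex, infohash, name) ->
  torrentIndex[infohash] ?= []
  torrentIndex[infohash].push name unless name in torrentIndex[infohash]
  torrentIndex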

getTorrentIndex limits and paging

Nodes should be able to create big indexes. Certain websites have databases of torrents that are a few GB big. A plain getTorrentIndex command on our nodes will indiscriminately return all infohash / torrent-name key-value pairs.

We should improve our getTorrentIndex to have the form

getTorrentIndex [--limit <number> [--page <number>]]

_Syntax:_ [] - Optional argument
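
A sketch of how limit and page could be applied to the index before it is returned (pageTorrentIndex is an assumed helper; pages start at 1):

# Sort the infohashes and return only the requested slice of the index.
pageTorrentIndex = (torrents, limit, page = 1) ->
  keys = Object.keys(torrents).sort()
  start = (page - 1) * limit
  result = {}
  result[infohash] = torrents[infohash] for infohash in keys[start...start + limit]
  result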

Promises

Callbacks in the args? Ugh! Let's use promises. That'll help us create multiple nodes for testing as well.
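
A sketch of wrapping a callback-style query in a promise, assuming a node.query(command, callback) method exists (that method is an assumption here):

# Resolve with the result, reject with the error, no nested callbacks.
queryNodeAsync = (node, command) ->
  new Promise (resolve, reject) ->
    node.query command, (err, result) ->
      if err? then reject err else resolve result

# e.g. Promise.all(queryNodeAsync(node, 'getTorrentIndex') for node in nodes)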

Read-only nodes

Move towards read-only nodes (as described in the readme).


This should also make development easier: we wouldn't need to think about searching, and a node would basically be an access point to a DB.
