
combinatorrent's People

Contributors

abhin4v, astro, axman6, erikd, jlouis, johngunderman, nikmikov, saizan, thomaschrstnsn, trofi, unkindpartition, vincenthz

combinatorrent's Issues

Use mmap() for file I/O

We will assume a 64-bit architecture from the start. This means we can use mmap()'ed I/O throughout and simply map files into the virtual address space. In turn, this enables fast disk I/O while outsourcing the caching problem to the kernel.

The task is somewhat isolated to the FS process and its backend FileSystem library, but it needs some work to reach a running state. Most importantly, one needs a way to compute SHA1 over the mmap()'ed store as well.

Write a Users Guide

A users guide would be nice to have. For now it is not worth the effort, because the client still changes too much, too often.

Not really an issue - burn after reading!

I found the post in master/doc/haskell-vs-erlang.mkd to be both fun and enlightening - where on the web is it published so I can "like it" (in the new real world ;-) )
(Google took me here)

Do not send HAVE messages if the peer already has the piece.

When we complete a piece at our end, it is mandatory to send a HAVE message to other peers so they can begin requesting the piece from us. However, if a given peer already possesses the piece, there is no need to send the message over the wire to it.

This optimization is fairly straightforward since the Peer process already has all the needed information at its disposal.
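The filter itself is tiny once the peer's piece set is at hand. A minimal sketch; `haveRecipients` and the peer representation are illustrative, not combinatorrent's actual types:

```haskell
import qualified Data.IntSet as IS

-- For each known peer, keep only those that lack the completed piece;
-- these are the peers that should receive the HAVE message.
haveRecipients :: Int -> [(String, IS.IntSet)] -> [String]
haveRecipients piece peers =
  [ name | (name, pieces) <- peers, not (piece `IS.member` pieces) ]
```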

Properly link supervisors into a tree.

Some supervisors are currently faked and attach to their processes in the wrong way. They should be part of a proper supervisor tree so the client can shut down gracefully.

Improve run-times of PieceSets

Our new PieceSet implementation is expensive to run. It now accounts for more than half of all the work done in the client. Improve this situation.

Obviously, one can track the complete pieceset as a specific constructor in the PieceSet datatype and thus skip a lot of the work in this common case.
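The suggested constructor trick might look like the following; `PieceSet`, `member`, and `insert` are illustrative stand-ins for the real module's API, not combinatorrent's actual definitions:

```haskell
import qualified Data.IntSet as IS

-- A dedicated constructor for the common "all pieces present" case makes
-- membership O(1) and lets set operations short-circuit entirely.
data PieceSet
  = Complete Int           -- ^ every piece present; the Int is the total count
  | Partial Int IS.IntSet  -- ^ total piece count plus the pieces we have
  deriving (Eq, Show)

member :: Int -> PieceSet -> Bool
member _ (Complete _)  = True
member p (Partial _ s) = IS.member p s

insert :: Int -> PieceSet -> PieceSet
insert _ ps@(Complete _) = ps
insert p (Partial n s)
  | IS.size s' == n = Complete n   -- promote once the set fills up
  | otherwise       = Partial n s'
  where s' = IS.insert p s
```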

Use a rate estimate to decide how many blocks to request from the piece manager.

We currently always request 25 blocks and re-request when we are down to 5 blocks left. In practice, a better solution is to take the rate at which the peer sends to us, multiply it by a few seconds (3-5), and divide by the block size of 16 kilobytes; that number is how many blocks to request. We should also use this value to update the bound at which we fill up more blocks. In practice, you need a bandwidth*delay product if this is to be done right.

If we cannot get enough blocks, we can stop asking until something changes the game or a timer of around 10 seconds expires.
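The arithmetic above can be sketched as follows, assuming a bytes-per-second rate, the standard 16384-byte block size, and keeping the existing low-water mark of 5 blocks as a floor; all names are illustrative:

```haskell
-- The standard BitTorrent block size ("16K").
blockSize :: Int
blockSize = 16384

-- Request enough blocks to cover 'secs' seconds of transfer at 'rate'
-- bytes per second, never dropping below 5 so slow peers still pipeline.
blocksToRequest :: Int -> Int -> Int
blocksToRequest rate secs = max 5 ((rate * secs) `div` blockSize)
```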

Improve the command line parser

The command-line parser is currently quite weak and simple. It can be improved to handle many more commands and do much more than what is currently possible.

If you are feeling really adventurous, you should first play a couple of Infocom games, play with the inform interactive fiction creator tool, take a bit of craziness and implement "Adventure for a torrent". Less than this will do as well however :)

DHT support

One very interesting extension is DHT support. Getting this done enables a client to fetch peers from the DHT rather than from the tracker, which greatly improves the robustness of torrents. When doing this, it is important to heed the "private" field in the torrent file.

PieceMgr assertion failure

"PieceMgrP"(Fatal): Process exiting due to ex: user error (P/Blk (655,Block {blockOffset = 81920, blockSize = 16384}) is in the HaveBlocks set)
"ConsoleP"(Info):   Process Terminated by Supervisor

This bug manifests as a failure in the PieceManager. Specifically, the above is an assertion failure because a block appears both in the set of blocks we have and in the set of blocks we are currently downloading. The cause may be stray blocks, which the BitTorrent protocol permits when the FAST extension is not in use.

Add support for multiple trackers.

There is an extension for supporting multiple trackers in the same torrent, and it should be added. There are numerous torrents out there where the main tracker has been dead for a long time while the "backup" trackers in the multi-tracker list are still up and strong.

Handle Snubbing correctly.

When a peer has not sent us any data for some time, it is snubbing us. Handle this case by more aggressively going after new peers.

Listen port improvements.

There are two things to do with the listen port. One is to make it possible to select a different port than the default. The other is to select a random port from a range.

Use intelligent flushing of the sender queue

Currently, the send queue flushes itself after each message has been delivered. Once #5 is implemented, we can flush less often than we do now. That would batch writes, reducing syscall overhead and letting the kernel coalesce packets.

Support UDP tracking (BEP 0015)

Popular trackers die due to excessive TCP handshaking for sending very little information back to the client. The UDP tracking extension allows one to communicate with the tracker over a lightweight UDP interface instead.
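For reference, the BEP 0015 exchange opens with a fixed 16-byte connect request. A sketch of building it with the binary package; `connectRequest` is an illustrative name:

```haskell
import Data.Binary.Put (runPut, putWord32be, putWord64be)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Build the 16-byte BEP 0015 connect request: protocol magic,
-- action 0 (connect), and a caller-chosen transaction id.
connectRequest :: Word32 -> BL.ByteString
connectRequest txid = runPut $ do
  putWord64be 0x41727101980  -- protocol magic defined by BEP 0015
  putWord32be 0              -- action 0 = connect
  putWord32be txid           -- echoed back by the tracker in its reply
```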

Combinatorrent may freak out GHC 6.12.1

We have hit this one (unfortunately with lost context):

HaskellTorrent: internal error: throwTo: unrecognised why_blocked value
     (GHC version 6.12.1 for x86_64_unknown_linux)
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted

It is bug #3923 at the GHC trac.

Currently, we have not seen this bug in the wild for quite a while. If it is there, it is pretty rare. Given that we rearrange the concurrency all the time, it might be impossible to reproduce anymore.

Send Queue Optimization

Currently, the send queue is one queue on which messages flow. The reason we are requesting fairly small 16k piece blocks is because we might want to interleave other messages in the queue stream. We don't do any kind of optimization on this at the moment.

If you take a look at the SCTP protocol, it has a session design in which sessions are multiplexed on top of the same line. That way, you could run a control channel independent of the data channel. We want to simulate this construction in combinatorrent.

A message to be queued is either a control message or a data message. We want messages on the control stream to take precedence over the messages on the data stream.

  • Change the sender Queue into having two queues, one for short
    messages and one for long messages.
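The two-queue discipline can be sketched in a few lines; `SendQueue` and friends are illustrative, not the client's actual sender types:

```haskell
-- Two queues inside one sender: control messages always drain first.
data SendQueue a = SendQueue { control :: [a], payload :: [a] }

enqueue :: Bool -> a -> SendQueue a -> SendQueue a
enqueue isControl m q
  | isControl = q { control = control q ++ [m] }
  | otherwise = q { payload = payload q ++ [m] }

-- Dequeue prefers the control queue, so a short control message never
-- waits behind a long run of 16K piece payloads.
dequeue :: SendQueue a -> Maybe (a, SendQueue a)
dequeue q = case (control q, payload q) of
  (c:cs, _)  -> Just (c, q { control = cs })
  ([], p:ps) -> Just (p, q { payload = ps })
  ([], [])   -> Nothing
```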

Implement scraping

Trackers provide a nice scrape method for asking different kinds of information. Rather than using the current methodology, we could use the scrape method instead.
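By convention, the scrape URL is derived from the announce URL by swapping the last path component; a sketch, where `scrapeUrl` is an illustrative name:

```haskell
import Data.List (isPrefixOf)

-- Replace the final "announce" path component with "scrape". Trackers
-- whose announce URL is not shaped that way do not support scraping.
scrapeUrl :: String -> Maybe String
scrapeUrl url
  | "announce" `isPrefixOf` file = Just (dir ++ "scrape" ++ drop 8 file)
  | otherwise                    = Nothing
  where
    (revFile, revDir) = break (== '/') (reverse url)
    dir  = reverse revDir   -- everything up to and including the last '/'
    file = reverse revFile  -- the last path component (plus any query string)
```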

ETA estimation.

Write code which can estimate the completion time of torrents.
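A first cut is simply bytes remaining over a (preferably smoothed) download rate; `etaSeconds` is an illustrative name:

```haskell
-- Estimate seconds to completion from bytes left and a rate in
-- bytes per second. Returns Nothing when the transfer is stalled.
etaSeconds :: Integer -> Double -> Maybe Integer
etaSeconds bytesLeft rate
  | rate <= 0 = Nothing
  | otherwise = Just (ceiling (fromIntegral bytesLeft / rate))
```

In practice the rate should come from something like an exponentially weighted moving average, or the estimate will swing wildly.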

Add support for partial downloads.

This is a popular feature in modern clients. Rather than having to download everything, the client is allowed to download just part of the torrent and stop once that partial data is complete.

There are some consequences for getting this to work in the client, so it might be worth analyzing and planning a bit ahead before embarking on doing it.

Consider a "pure seeder" mode.

In this mode, the client will assume it has ample upstream bandwidth and change its internal algorithms with the sole purpose of using as much bandwidth as possible. Details are to be worked out.

Improve the HTML pages

My lack of HTML/webdev skills is showing. If anybody wants to improve the pages, they are free to do so!

Keep a track record of peers

Rather than accepting any peer blindly, we should keep a track record of the peers we have spoken to in the past. This gives us a way to filter peers based on their earlier merits, avoid connecting to the same peer twice, blackhole peers which are consistently bad, and so on.

It also paves the way for blocklist support, should you want that kind of thing.

Utilize the HAVE ALL/NONE messages.

Combinatorrent currently just naively sends the Bitfield message regardless of the corner cases. It would be more beneficial to send HAVE ALL or HAVE NONE when that is indeed the case.
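The decision itself is a small pure function, applicable once the peer has negotiated the FAST extension; the message type and names below are illustrative:

```haskell
-- The startup message sent right after the handshake: either one of the
-- FAST extension's one-byte messages, or the full piece bitfield.
data StartupMsg = HaveAll | HaveNone | Bitfield [Bool]
  deriving (Eq, Show)

chooseStartupMsg :: [Bool] -> StartupMsg
chooseStartupMsg haves
  | and haves      = HaveAll        -- seeding: every piece present
  | not (or haves) = HaveNone       -- fresh start: no pieces at all
  | otherwise      = Bitfield haves -- the general case
```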

Optimize block reading access

Right now, 16k blocks are read fairly early and then kept in a queue of requested pieces. So a client requesting 32 blocks will have a memory consumption of at least 32 x 16K = 512K. That is far too much if we expect 40-100 connections: it amounts to something like 20-50 megabytes of waste.

It is possible to optimize this. The first part is simply to read a block at the last possible moment, so the fetched data can be thrown away right after it has gone down the wire. It will also pave the way for the fast extension's SUGGEST option.

Optimize Piece Manager requests

When we grab pieces from the Piece Manager, let it provide us with a pruned set of pieces we can ask with later. This way, we only need to consider pieces we already have once and we get a faster system. When doing this, only prune pieces which are done and checked.

Currently, this is not in the hot code path, so it is not that important to pull off yet.
