kevinlynx / dhtcrawler2

dhtcrawler is a DHT crawler written in Erlang. It can join a DHT network and crawl many P2P torrents. The program saves all torrent info into a database and provides an HTTP interface for searching torrents by keyword.
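Joining a DHT network means speaking the BitTorrent DHT protocol (BEP 5), whose KRPC messages are bencoded dictionaries sent over UDP. The Python sketch below is not taken from dhtcrawler, which is written in Erlang. It only illustrates the wire format of a `ping` query, using the example values from BEP 5. The helper names `bencode` and `ping_query` are hypothetical.

```python
def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts as bencoded bytes (BEP 3)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: <len>:<data>
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dict keys are byte strings and must appear in sorted raw-byte order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def ping_query(transaction_id: bytes, node_id: bytes) -> bytes:
    """Build a KRPC 'ping' query as described in BEP 5 (hypothetical helper)."""
    if len(node_id) != 20:
        raise ValueError("DHT node IDs are 20 bytes long")
    return bencode({"t": transaction_id, "y": "q", "q": "ping", "a": {"id": node_id}})


if __name__ == "__main__":
    print(ping_query(b"aa", b"abcdefghij0123456789"))
```

A crawler sends such queries (plus `find_node`, and answers incoming `get_peers`/`announce_peer`) over UDP to learn about nodes and infohashes.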

License: Other

Shell 100.00%

dhtcrawler2's People

Contributors

kevinlynx

dhtcrawler2's Issues

Internal Server Error

Every search request returns an "Internal Server Error" message. Stats work fine. MongoDB 2.6.1, Erlang 17, Windows 7 32-bit.

Crawler does not work on an intranet

I use Windows 7 x64. After running win_start_crawler.bat, it only creates two collections in MongoDB, and no data is written to the database. I'd like to confirm whether dhtcrawler2 supports crawling from an intranet. If not, how could I make it do so?

failed_to_start_child,hash_cache_writer on Linux

voltagex@devbox:~/src/dhtcrawler2$ erl -pa ebin -noshell -s crawler_app start
load file "/home/voltagex/src/dhtcrawler2/ebin/../priv/dhtcrawler.config" success
dhtcrawler startup 6776, 50, "localhost":27017

=INFO REPORT==== 22-Feb-2018::18:18:15 ===
    application: dhtcrawler
    exited: {{shutdown,
                 {failed_to_start_child,hash_cache_writer,
                     {undef,
                         [{mongo_sup,start_pool,
                              [hash_write_db,5,{"localhost",27017}],
                              []},
                          {hash_cache_writer,init,1,
                              [{file,"src/crawler/hash_cache_writer.erl"},
                               {line,37}]},
                          {gen_server,init_it,2,
                              [{file,"gen_server.erl"},{line,365}]},
                          {gen_server,init_it,6,
                              [{file,"gen_server.erl"},{line,333}]},
                          {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,247}]}]}}},
             {crawler_app,start,[normal,[]]}}
    type: temporary

What order do things need to be started in?

Crawling does not seem to be working

Hi.

I ran the crawling process, but it found no infohashes no matter how long I let it run.
Using a network analyser, I saw several packets being sent in the first phase, and after that no traffic at all. I also disabled the firewall.

Does it still work?

Thank you.

Not finding any torrents

Hello,

Are there any special network requirements in order to get the DHT crawler running?

I opened port 6776 (UDP and TCP) on my router, but after following the instructions in the README (starting crawler, hash and http) and waiting 10 minutes, the crawler still has not found anything.

Stats:

total 0 torrents

2013-07-10 RecvQuery 0 ProcessedQuery 0 Updated 0 New 0
2013-07-09 RecvQuery 0 ProcessedQuery 0 Updated 0 New 0
2013-07-08 RecvQuery 0 ProcessedQuery 0 Updated 0 New 0

Any ideas?

No torrents are being processed

After letting it run on my Ubuntu computer (with all required dependencies installed), the output is the following:

    2017-01-31 RecvQ 23198 ProcessQ 0 Updated 0 New 0 UniqueQ 177 Filtered 0
    2017-01-30 RecvQ 0 ProcessQ 0 Updated 0 New 0 UniqueQ 0 Filtered 0
    2017-01-29 RecvQ 0 ProcessQ 0 Updated 0 New 0 UniqueQ 0 Filtered 0

This is with the crawler, the httpd, and the hash program all running.
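The two stats reports above show different symptoms: in the earlier one RecvQuery stays at 0, while here RecvQ is 23198 but ProcessQ stays at 0. A minimal Python sketch for triaging such lines follows. It is not part of dhtcrawler2; the helper names are hypothetical, and the meaning of each counter is inferred from its name only.

```python
def parse_stats_line(line: str) -> dict:
    """Split '<date> <Name> <count> <Name> <count> ...' into a dict."""
    date, *fields = line.split()
    if len(fields) % 2:
        raise ValueError(f"unpaired field in {line!r}")
    counters = {name: int(count) for name, count in zip(fields[0::2], fields[1::2])}
    return {"date": date, **counters}


def classify(stats: dict) -> str:
    """Rough triage; counter meanings are assumed from their names."""
    # Older builds print RecvQuery/ProcessedQuery, newer ones RecvQ/ProcessQ.
    received = stats.get("RecvQ", stats.get("RecvQuery", 0))
    processed = stats.get("ProcessQ", stats.get("ProcessedQuery", 0))
    if received == 0:
        return "no DHT traffic reaching the crawler"
    if processed == 0:
        return "queries received but none processed"
    return "processing"


if __name__ == "__main__":
    line = "2017-01-31 RecvQ 23198 ProcessQ 0 Updated 0 New 0 UniqueQ 177 Filtered 0"
    print(classify(parse_stats_line(line)))
```

The first case points at the network (ports, NAT, bootstrap), the second at the hash-processing side of the pipeline.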

Fails to compile with Erlang R14B

This is the error message:

==> kdht (compile)
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:54: syntax error before: 'query'
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:66: syntax error before: '('
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:76: syntax error before: '('
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:122: syntax error before: '('
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:150: syntax error before: '('
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:9: function announce_peer/5 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:9: function find_node/3 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:9: function get_peer/3 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:9: function ping/2 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:26: function ping/2 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:29: function find_node/3 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:32: function get_peer/3 undefined
/tmp/dhtcrawler2/deps/kdht/src/msg.erl:35: function announce_peer/5 undefined

Great project by the way!

Indexing rate

Hello,
This is not really a bug or issue, but here goes:
What can be done to increase the torrent indexing rate? I have been running the latest version of the crawler for 8 hours now, and it has indexed only 380 torrents.
The machine is fairly powerful, with a 40 Mbps network connection.
Is there anything I can do to improve the indexing rate?

torcache requests - rate limit

I'm trying out dhtcrawler2 and it performs really well: it discovers around 400 thousand unique torrents per day. However, it indexes only a small fraction of them (around 400). Examining the logs, I found that requests to torcache are timing out, which I suspect is caused by a rate limit.

Are you doing anything special on your instance (bt.com) to get around this limit?
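One common client-side mitigation for a suspected rate limit is to throttle outgoing requests with a token bucket, so that requests are spaced out instead of timing out in bursts. The Python sketch below is a generic illustration, not something dhtcrawler2 is known to do; torcache's actual limit is unknown, and the `rate` and `capacity` values are hypothetical.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock      # injectable for testing
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=2, capacity=5)  # hypothetical: 2 req/s, bursts of 5
    print([bucket.try_acquire() for _ in range(7)])
```

Hashes that are throttled can be queued and retried later rather than dropped, trading latency for a higher share of successfully downloaded torrent metadata.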

Compiles but does not run (Erlang 16, Ubuntu)

/root/dhtcrawler2/ebin/../priv/dhtcrawler.config" success
dhtcrawler startup 26776, 50, "localhost":27017

=INFO REPORT==== 30-Nov-2016::13:07:07 ===
    application: dhtcrawler
    exited: {{shutdown,
                 {failed_to_start_child,hash_cache_writer,
                     {undef,
                         [{mongo_sup,start_pool,
                              [hash_write_db,5,{"localhost",27017}],
                              []},
                          {hash_cache_writer,init,1,
                              [{file,"src/crawler/hash_cache_writer.erl"},
                               {line,37}]},
                          {gen_server,init_it,6,
                              [{file,"gen_server.erl"},{line,304}]},
                          {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,239}]}]}}},
             {crawler_app,start,[normal,[]]}}
    type: temporary

Which versions of dependencies are needed to compile & run ?