A program that crawls GitHub. Only the first page of search results is analysed.
You can specify the keywords to search for and a list of proxies to use; for each request a proxy is chosen at random from that list. Three search types are supported: Repositories, Issues, and Wikis.
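A minimal sketch of how the random proxy selection might look, assuming the crawler uses the requests library (the function name and timeout are illustrative, not part of the program's documented interface):

import random
import requests

def fetch_page(url, proxies):
    """Fetch a page through a proxy picked at random from the list."""
    proxy = random.choice(proxies)  # a new proxy is chosen on every call
    return requests.get(
        url,
        proxies={"http": "http://" + proxy, "https": "http://" + proxy},
        timeout=10,
    ).text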
The crawler collects the URL of each search result.
Specify the parameters as JSON in the file in.txt. The result URLs are written to the file out.txt, also in JSON format.
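Reading and writing those files is plain JSON handling; a sketch using the standard json module (found_urls stands in for whatever the crawler collected):

import json

with open("in.txt") as f:
    params = json.load(f)

keywords = params["keywords"]
proxies = params["proxies"]
search_type = params["type"]  # "Repositories", "Issues", or "Wikis"

found_urls = []  # placeholder: filled by the crawler

with open("out.txt", "w") as f:
    json.dump([{"url": u} for u in found_urls], f, indent=4)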
For example, with this search in the file in.txt:
{
    "keywords": [
        "openstack",
        "nova",
        "css"
    ],
    "proxies": [
        "8.129.215.20:8080",
        "208.80.28.208:8080"
    ],
    "type": "Repositories"
}
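One plausible way to turn such a configuration into a GitHub search request (a sketch; build_search_url is an illustrative helper, not part of the program's documented interface):

from urllib.parse import urlencode

def build_search_url(keywords, search_type):
    # GitHub's search endpoint takes the space-separated keywords as q=
    # and the result type as type=
    query = urlencode({"q": " ".join(keywords), "type": search_type})
    return "https://github.com/search?" + query

# build_search_url(["openstack", "nova", "css"], "Repositories")
# -> https://github.com/search?q=openstack+nova+css&type=Repositories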
If you run
foo@bar:~$ python github_crawler.py
you will get this result in out.txt:
[
    {
        "url": "https://github.com/atuldjadhav/DropBox-Cloud-Storage"
    },
    {
        "url": "https://github.com/michealbalogun/Horizon-dashboard"
    }
]
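Extracting those entries from the first results page could look roughly like this, assuming the HTML is parsed with BeautifulSoup; the CSS selector is an assumption and may need adjusting to GitHub's current markup:

from bs4 import BeautifulSoup

def extract_result_urls(html):
    """Collect result links from one search page (selector is hypothetical)."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"url": "https://github.com" + a["href"]}
        for a in soup.select("a.v-align-middle")  # hypothetical result-title selector
    ]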