github-trending-crawler's Introduction

GitHub-Trending-Crawler

Crawling GitHub Trending Pages every day.

Introduction

It is highly recommended to deploy this program on a Linux server, where it crawls information about popular GitHub repositories in the languages you are interested in every day. It then creates a markdown file to record that information and generates a word cloud image from the repositories' descriptions.
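As a rough illustration of the recording step described above, the following sketch renders a list of repositories as markdown and counts the words in their descriptions, which is the kind of input a word cloud is built from. It is not the project's actual code: the `repos` sample data, the field names, and the functions `render_markdown` and `word_frequencies` are all hypothetical, and the crawling and image generation steps are left out.

```python
import re
from collections import Counter

# Hypothetical sample data; the crawler's real data structures are not
# described in this README.
repos = [
    {"name": "octo/alpha", "language": "Python", "description": "A fast web crawler"},
    {"name": "octo/beta", "language": "Python", "description": "Fast data pipeline for web logs"},
]


def render_markdown(date, repos):
    """Render one day's repositories as a markdown document."""
    lines = [f"# Trending on {date}", ""]
    for repo in repos:
        lines.append(f"- **{repo['name']}** ({repo['language']}): {repo['description']}")
    return "\n".join(lines) + "\n"


def word_frequencies(repos):
    """Count lowercase words across all repository descriptions."""
    words = []
    for repo in repos:
        words.extend(re.findall(r"[a-z]+", repo["description"].lower()))
    return Counter(words)


markdown = render_markdown("2020-02-22", repos)
frequencies = word_frequencies(repos)
print(markdown)
print(frequencies.most_common(3))
```

The word counts could then be handed to a word cloud library to draw the image; which library the project uses is not stated here.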

This crawler is designed to help me keep track of the latest trends in technology and discover new and interesting repositories. In fact, reading the newest markdown file has become part of my daily routine. More importantly, it increases my GitHub contributions :P

The idea was inspired by LJ147.

Requirements

  • Python 3.6+
  • git
  • screen
  • unzip

Configuration

Usage on Linux

$ sudo apt install -y unzip screen python3-pip
$ sudo apt-get install -y python-tk python3-tk

# the `release` branch is stable and contains only code
$ wget https://github.com/fgksgf/GitHub-Trending-Crawler/archive/release.zip
$ unzip release.zip
$ cd GitHub-Trending-Crawler-release/
$ mkdir img
$ git init
$ git remote add origin <YourGitHubRepoURL>

# using a virtual environment is highly recommended
$ pip3 install -r requirements.txt

  1. Switch to the repository directory and type screen at the command prompt. A new screen session opens that looks exactly like the normal command prompt.

  2. Inside the screen session, you can work just as you would in a normal CLI environment. Since screen is an application, it also has its own commands and parameters.

  3. And now, we can run the program: python3 main.py -p -l

  4. While the program is running, press Ctrl + A and then d to detach the screen session. You can then disconnect your SSH session.

  5. When you want to check the status of the crawler, reconnect to your server via SSH and run screen -r to restore the session. For more information about the screen command, see its documentation.

CLI Options

python3 main.py (-h | --help)
python3 main.py (-v | --version)
python3 main.py [-l | --loop] [-p | --push] [--frequency=<f>]

Options:
  -h --help        Show this screen.
  -v --version     Show version.
  -l --loop        Run this program cyclically.
  -p --push        Use git to push the markdown and the image.
  --frequency=<f>  The frequency of crawling [default: daily].
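The option list above can be mirrored with a small parser. The sketch below uses the standard-library argparse module rather than docopt, which the project itself uses, so that it runs without extra dependencies. The `FREQUENCY_SECONDS` mapping is an assumption: only `daily` is documented as a value (the default), so the other names and the conversion to seconds are hypothetical. The version string is taken from the change log.

```python
import argparse

# Assumed mapping from frequency names to seconds. Only "daily" appears in
# the documented options; "hourly" and "weekly" are hypothetical.
FREQUENCY_SECONDS = {"hourly": 3600, "daily": 86400, "weekly": 604800}


def build_parser():
    """Build an argparse parser that mirrors the documented options."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Crawl GitHub Trending pages."
    )
    parser.add_argument("-v", "--version", action="version", version="1.5")
    parser.add_argument("-l", "--loop", action="store_true",
                        help="Run this program cyclically.")
    parser.add_argument("-p", "--push", action="store_true",
                        help="Use git to push the markdown and the image.")
    parser.add_argument("--frequency", default="daily",
                        choices=sorted(FREQUENCY_SECONDS),
                        help="The frequency of crawling [default: daily].")
    return parser


# Equivalent to running: python3 main.py -p -l
args = build_parser().parse_args(["-p", "-l"])
interval = FREQUENCY_SECONDS[args.frequency]
print(args.loop, args.push, args.frequency, interval)
```

With `-l` set, a loop would presumably sleep for `interval` seconds between crawls; how the project actually schedules its runs is not described here.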

Change Logs

V1.5 (2020-02-22)

  • Refactor code with object-oriented methods
  • Split single python file into several files
  • Improve exception handling
  • Add logging feature
  • Use docopt to enhance command-line usage
  • Update requirements

github-trending-crawler's People

Contributors

dependabot[bot], fgksgf

github-trending-crawler's Issues

Question about proxies

This utility looks good. I think it would make a good function for OpenFaaS or faasd.

Why is a random pool of proxies required? The code appears to read HTML pages, which are not usually rate limited.
