Coder Social home page Coder Social logo

gongo's Introduction

Gongo: Grant Announcement Crawler

This project is a Python-based crawler designed to automatically scrape grant announcements from various websites and notify users via Slack. It employs Selenium for dynamic web pages and BeautifulSoup for static pages, facilitating comprehensive coverage across different types of web structures.

This was created to gather information on blockchain or startup support projects in Korea. Gongo means 'announcement' in Korean.

Features

  • Dynamic and Static Page Handling: Supports both dynamic pages that require interaction and static pages for straightforward scraping.
  • Scheduled Execution: Runs at specified times to ensure up-to-date information is fetched without constant manual oversight.
  • Slack Integration: Automatically sends notifications to a designated Slack channel, providing timely updates on new grant announcements.
  • Flexible Configuration: Easily add or modify website configurations to tailor the crawler to specific needs.

Getting Started

  1. Ensure Python 3.7+ is installed.
  2. Clone the repository.
  3. Install dependencies: pip install -r requirements.txt.
  4. Set up the .env file with your Slack API token and channel ID.
  5. Run python start.py to start the scheduler and crawler.

Key Files

The project consists of several key components outlined below:

  • start.py: A scheduler script that triggers the main crawler at specified intervals, ensuring the crawler runs at pre-defined times throughout the day. It also manages logging and the execution schedule, adjusting for weekends.
  • src/main.py: The core script that orchestrates the crawling process. It fetches website configurations, extracts new posts, logs them, and notifies a Slack channel about any new grant announcements.
  • src/websites_config.py: Configuration file containing details of websites to be crawled, including URLs, selectors for scraping, and other parameters tailored to each site's structure.
  • src/fetch_methods/: A directory containing various Python scripts (useSelenium.py, useSeleniumCheckBox.py, useUrlQuery-x.py, etc.), each implementing a specific method for fetching and processing website data.

Configuring Timezone

This program is designed to align with Korean Standard Time. For computers not set to Korean Standard Time, such as AWS EC2 instances, here is how you can adjust the settings appropriately. Applying the commands below updates the timezone for all users and processes on the instance to Korean Standard Time.

To check or change the current timezone in your instance:

  • For Ubuntu or most Linux distributions, you can check the current timezone by running:

    timedatectl or cat /etc/timezone

  • To change the timezone to Seoul, which is the timezone this program is synchronized with, use the following command:

    sudo timedatectl set-timezone Asia/Seoul

gongo's People

Contributors

abitocodes avatar tibarom avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.