htw-webtech / lsfeventscraper

This project forked from pascalweiss/lsfeventscraper

Scrapes all events from HTW-Berlin (university) and stores them in a database (MySQL or PostgreSQL)

License: MIT License

Python 100.00%

LSFEventScraper

LSFEventScraper is a multithreaded Python scraper that extracts all events of the current semester from HTW-Berlin.de.

How it works

Currently, HTW-Berlin.de has a semester-overview page from which the crawler can reach a page for every day of the current semester. These day-overview pages are the source from which every event of the semester can be extracted. The module performs the following steps:

  • fetching the semester-overview
  • extracting all day-overview URLs
  • fetching all day-overviews
  • extracting every event
  • saving the events to a database
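
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: it uses the standard library's html.parser instead of beautifulsoup4, the `fetch` and `extract_events` callables are supplied by the caller, the `day=` URL marker is a made-up placeholder, and the final database step is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_day_urls(overview_html, base_url, marker="day="):
    """Return absolute URLs of all links containing the (assumed) marker."""
    parser = LinkCollector()
    parser.feed(overview_html)
    return [urljoin(base_url, link) for link in parser.links if marker in link]


def scrape_semester(fetch, extract_events, overview_url, workers=8):
    """Run the pipeline: overview -> day URLs -> day pages -> events."""
    overview_html = fetch(overview_url)
    day_urls = extract_day_urls(overview_html, overview_url)
    # Fetch the day pages concurrently; map() preserves the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        day_pages = list(pool.map(fetch, day_urls))
    events = []
    for page in day_pages:
        events.extend(extract_events(page))
    return events
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real run would hand in a function that downloads the URL and returns its HTML.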

Database Configuration

LSFEventScraper can store the events in either MySQL or PostgreSQL. If you want to use it with the corresponding HTWRoomFinder, you need to use PostgreSQL. The first step is to create the appropriate tables. This is how you do it for PostgreSQL:

psql -h <your host> <db-name> <user> < RoomDBInit_PSQL.db

And this is how you do it for MySQL:

mysql -h <your host> -u <user> -p <db-name> < RoomDBInit_MYSQL.db

LSFEventScraper needs to connect to your database, so you also need to provide your credentials. Add them to db_credentials_PSQL.json if you want to use PostgreSQL, or to db_credentials_MYSQL.json if you want to use MySQL.
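
As a rough illustration of how such a credentials file might be read and checked before connecting, consider the sketch below. The key names (`host`, `database`, `user`, `password`) and the `load_credentials` helper are assumptions made for this example; the JSON files shipped with the repository define the real fields.

```python
import json

# Assumed key names; check the JSON files in the repository for the real ones.
REQUIRED_KEYS = ("host", "database", "user", "password")


def load_credentials(path):
    """Read a credentials JSON file and check that the assumed keys exist."""
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError("missing credential keys: " + ", ".join(missing))
    return credentials
```

Failing early on a missing key gives a clearer error than a connection attempt that fails later inside the database driver.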

Requirements

The module requires psycopg2, beautifulsoup4, and mysql-python. You can install them with pip, for example like this: pip install -r requirements.txt

Usage

To reduce dependencies, the whole project is built around the facade pattern. Thus the only class you need to use is LSFEventScraper, which serves as the interface to the whole functionality.
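
For readers unfamiliar with the facade pattern, here is a minimal sketch of the idea. The `Crawler`, `Parser`, `Storage`, and `ScraperFacade` classes are hypothetical stand-ins invented for this example and do not reflect the project's internal structure; only the method names `scrape_events` and `save_events_to_db` mirror the ones used in this README.

```python
class Crawler:
    """Hypothetical subsystem: pretends to fetch pages."""

    def fetch_pages(self):
        return ["<page 1>", "<page 2>"]


class Parser:
    """Hypothetical subsystem: turns a page into an event record."""

    def parse(self, page):
        return {"source": page}


class Storage:
    """Hypothetical subsystem: keeps rows in a list instead of a database."""

    def __init__(self):
        self.rows = []

    def save(self, events):
        self.rows.extend(events)


class ScraperFacade:
    """Single entry point that hides the three subsystems from the caller."""

    def __init__(self, crawler=None, parser=None, storage=None):
        self._crawler = crawler or Crawler()
        self._parser = parser or Parser()
        self._storage = storage or Storage()
        self.events = []

    def scrape_events(self):
        # Fetch and parse; results stay in memory until saved.
        pages = self._crawler.fetch_pages()
        self.events = [self._parser.parse(page) for page in pages]

    def save_events_to_db(self):
        self._storage.save(self.events)
```

Callers only ever touch the facade, so the subsystems behind it can change without affecting client code.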

There are two typical scenarios for using LSFEventScraper:

  1. Scenario: Scraping all events and storing them in a database
# - Fetches all events from HTW-Berlin.de and keeps them in memory.
scraper.scrape_events()

# - Sends a TRUNCATE command to the database to delete all current rows.
scraper.db_access.reset()

# - Saves all events to the database.
scraper.save_events_to_db()
  2. Scenario: Fetching all day-overviews and storing them as HTML files on disk, then scraping the locally stored pages and saving the events to a database later
# - Fetches all day-overviews and stores them as HTML files in ./data_events/
scraper.crawl_day_pages_and_save_to_disk()

# - Later, once the pages are on disk, you can scrape them and store the events.

# - Scrapes all locally stored pages and keeps the events in memory.
scraper.scrape_local_sites()
# - Sends a TRUNCATE command to the database to delete all current rows.
scraper.db_access.reset()
# - Saves all events to the database.
scraper.save_events_to_db()

Test

If all requirements are installed, you can test the scraper with

python main.py

Contributors

pascalweiss
