raghuveerdr / educative.io_scraper

This project forked from anilabhadatta/educative.io_scraper

Educative.io Course Downloader developed using Python and Selenium. Refer to Readme.md for setup instructions.

License: MIT License

Python 100.00%

Educative.io Scraper / Educative.io Downloader

An automation tool built using Python, Selenium and Chrome that scrapes Educative.io courses for offline use.

Disclaimer: I am not responsible for any misuse of this scraper; I made it for my personal use.

To view the downloaded courses, use the Educative-Viewer repository.

  I welcome anyone to contribute here in any form. Star and fork my project 😊 Thanks.
  Repo Version : 8.5 (latest) || Release Version 6.8
  Update 6.9: Added support for scraping a special type of quiz container (Mark Down Quiz) in the course.
  Update 7.0: Fixed File name where "name" is not present in meta property og:title.
  Update 7.1: Various bug fixes related to code containers and improvements.
  Update 7.3: Fixed MarkDown and Copy Code button issue.
  Update 7.4: Fixed File Name for Modules
  Update 7.5: Fixed Slides Opening and Hints Opening issue
  Update 7.6: Fixed Quiz Questions skipped issue
  Update 7.7: Fixed removing of unnecessary tags
  Update 7.8: Fixed Multiple Bugs and Improvements
  Update 7.9: Added Style tag with filter:none
  Update 8.0: Skipped Projects if it is in current page
  Update 8.1: Skipped Assessments
  Update 8.2: Fixed Puzzle Javascript error
  Update 8.3: Fixed Quiz Container and show solution bug
  Update 8.4: Fixed Pagination buttons
  Update 8.5: Fixed Multiple Issues

How to use the Scraper?

  1. Create a text file, copy the URL of the first topic of each course you want to scrape (any number of courses), and paste the URLs into the text file.
  2. Download the chromedriver and educative_scraper executables from the latest releases and run both.
  Note: If the executable release version is older than the current GitHub repo version, run the
        project manually as explained below.
  3. Press 2 to select a config if you don't want to use the default config "0".
  Note: Make sure to generate the config if it is selected for the first time.
  4. Press 1 to generate the config (if not already created) and provide the urls text file path, the save location and the headless mode.
  5. Press 3 to log in to your Educative account.
  6. Press 4 to start scraping.
  7. To return to the Main Menu or exit the scraper, press Ctrl+C / CMD+C.

Note 1: If the scraper fails or the user exits midway for any reason, a log.txt file is created in the save path containing the index and last known URL. To resume the course where it stopped, copy the {index url} entry, replace the corresponding entry in the urls text file with it, and restart the scraper.

Note 2: When updating the urls text file, make sure to delete the URLs that have already been scraped.

Note 3: If your system shuts down due to a power failure or the scraper crashes, log.txt cannot be created. In that case, find the URL and index manually and put the {index url} entry in the urls text file.
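
The resume procedure in the notes above can be sketched in Python. This is only an illustration, not part of the scraper: it assumes the urls text file holds one entry per line, that an entry is either "url" or "index url" separated by whitespace, that all lessons of a course share the same URL path prefix, and that courses are scraped in file order. The function names and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def course_key(entry):
    """Return the URL path without its last segment, used here as a course identifier.

    Assumption: all lessons of one course share the same path prefix.
    """
    url = entry.split()[-1]  # an entry is either "url" or "index url"
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[0]


def resume_entries(entries, resume_entry):
    """Drop the entries before the interrupted course and swap in the resume entry.

    Assumption: entries before the interrupted course were already scraped.
    """
    key = course_key(resume_entry)
    for position, entry in enumerate(entries):
        if not entry.strip():
            continue  # ignore blank lines in the urls file
        if course_key(entry) == key:
            return [resume_entry.strip()] + entries[position + 1:]
    raise ValueError("no entry in the urls file matches the resume entry")


# Hypothetical example data, not real course URLs.
urls = [
    "https://www.educative.io/courses/course-a/intro",
    "https://www.educative.io/courses/course-b/intro",
    "https://www.educative.io/courses/course-c/intro",
]
updated = resume_entries(urls, "12 https://www.educative.io/courses/course-b/lesson-12")
```

Here `updated` starts with the resume entry for the interrupted course and keeps only the courses that come after it, which matches Notes 1 and 2 under the stated assumptions. Check the result by hand before restarting the scraper.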

To Run the project manually using git and python:

Prerequisites:

  Git
  Python 3.9+
  OS: Win/Mac(Intel)/Linux(ARM/AMD) 64bit
  Replace the word "python3" with "python" and "pip3" with "pip" for Windows OS only.
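
The Python version and 64-bit prerequisites can be checked with a short script before continuing. This is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and a 64-bit Python build is used as a stand-in for the 64-bit OS requirement.

```python
import struct
import sys


def check_prerequisites(version_info=sys.version_info,
                        pointer_bits=struct.calcsize("P") * 8):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Compare only (major, minor) against the documented minimum of 3.9.
    if tuple(version_info[:2]) < (3, 9):
        problems.append("Python 3.9 or newer is required")
    # The pointer size of the running interpreter tells us whether it is a 64-bit build.
    if pointer_bits != 64:
        problems.append("a 64-bit Python build is required")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter; the parameters exist only so that other values can be tried.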

Step 1: Clone the repository, open a terminal inside the cloned directory and run the commands in the following steps.

Step 2: Install the virtualenv package for python3 and create a virtual environment named "env".

  pip3 install virtualenv
  virtualenv env

Step 3: Activate the virtual environment.

> (For Windows)

  env\Scripts\activate

> (For MacOS/Linux)

  source env/bin/activate

Step 4: Install the required modules:

  pip3 install -r requirements.txt

Step 5: Download and extract the Chrome-bin for your OS from the latest releases section, and paste it inside the Chrome-bin folder.

Step 6: Open up two terminals and run the following commands in separate terminals.

  python3 chromedriver.py
  python3 educative_scraper.py

Step 7: Follow the steps under "How to use the Scraper?" above, skipping the 2nd step (running the executables).

For Mac OS users only: Refer to this Repository to Disable Chrome Updates.

(Optional) To Build the chromedriver and educative-scraper executables using pyinstaller:

Activate the virtual environment and install the required modules for the project (refer to Steps 2, 3 and 4 above).

Install the pyinstaller package and run the following commands.

  pip3 install pyinstaller

> (For Windows)

  pyinstaller --clean --add-data "Chrome-bin;Chrome-bin" --onefile -i"icon.ico" educative_scraper.py
  pyinstaller --clean --add-data "Chrome-driver;Chrome-driver" --onefile -i"icon.ico" chromedriver.py

> (For MacOS/Linux)

  pyinstaller --clean --add-data Chrome-bin:Chrome-bin --onefile -i"icon.ico" educative_scraper.py
  pyinstaller --clean --add-data "Chrome-driver:Chrome-driver" --onefile -i"icon.ico" chromedriver.py
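
The only difference between the Windows and MacOS/Linux commands above is the separator inside the --add-data value (";" versus ":"), which corresponds to Python's `os.pathsep`. A small sketch of building the same argument list for the current OS follows; the helper name `pyinstaller_command` is hypothetical and not part of the project.

```python
import os


def pyinstaller_command(script, data_dir, icon="icon.ico", separator=os.pathsep):
    """Build the argument list for one of the PyInstaller commands above.

    os.pathsep is ";" on Windows and ":" on MacOS/Linux.
    """
    return [
        "pyinstaller",
        "--clean",
        "--add-data", f"{data_dir}{separator}{data_dir}",
        "--onefile",
        "-i", icon,
        script,
    ]


scraper_cmd = pyinstaller_command("educative_scraper.py", "Chrome-bin")
driver_cmd = pyinstaller_command("chromedriver.py", "Chrome-driver")
```

Each list can be passed to `subprocess.run(...)` from inside the activated virtual environment. Passing a list instead of a single string avoids shell quoting issues around the separator.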

The PyInstaller command may not work on Linux due to a PyInstaller bug; a fix is being investigated.
A whitepaper will be released explaining each function and the cases handled by the scraper.

Contributors

anilabhadatta, boostupstation