gwe9001 / educative.io_scraper

This project forked from anilabhadatta/educative.io_scraper

Educative.io Course Downloader developed using Python and Selenium. Refer to Readme.md for setup instructions.

License: MIT License

Python 100.00%

educative.io_scraper's Introduction

Educative.io Scraper -- Educative.io Downloader

Description: 
Discover the power of automation with my Python-based Educative.io Course Scraper. Harnessing the capabilities of Selenium
and Chromium-based browsers, this tool scrapes and saves Educative.io courses for offline use, enabling you to
learn at your own pace, even without an internet connection.

Contributions:
I wholeheartedly welcome contributions from individuals in any capacity to enhance this project. Together, we can make this project even better. Join us in building a community of learners
and contributors. Thank you for your support!

Disclaimer:
I want to clarify that I am not accountable for any inappropriate use of this scraper. I developed it solely for research
purposes and take no responsibility for its misuse.

Repository Version: v3.3.6 (Latest)
Master Branch: v3-master

To view the downloaded courses, you can use the Educative-Viewer repository, which provides better readability and a user-friendly interface for accessing the downloaded course content.

Steps to use the scraper:

  • Prerequisites:

Git
Python 3.9+
OS: Win(x86/x64) - Mac(ARM64/x64) - Linux(ARM64/x64)
  • Clone the repository and cd into the project directory.

git clone https://github.com/anilabhadatta/educative.io_scraper.git
cd educative.io_scraper
  • Run the following commands to start Educative Scraper.

  • Automatic Steps:

    • Use python3 instead of python for Linux and MacOS.

      python setup.py --install
      python setup.py --run
      
      [Commands]
      --install: Creates a virtual environment and installs the required dependencies.
      --run: Activates the environment and starts the scraper. [Default = True]
      --create: Creates a shortcut executable file linked to the scraper directory.
      
      Note: If you have updated to v3.3.0+, run setup.py with the --install argument again.
            If you have updated to v3.1.0+, redownload the Chrome binary and Chromedriver.
            
            If the git repository is moved to a different location after creating
            the executable, recreate the executable to set the current repository path.
      
  • Manual Steps:

    • Windows:

      python -m venv env
      env\Scripts\activate
      pip install -r requirements.txt
      python EducativeScraper.py
      
    • MacOS/Linux:

      python3 -m venv env
      source env/bin/activate
      pip3 install -r requirements.txt
      python3 EducativeScraper.py
      

      Recommended GUI Settings
  • After the GUI successfully loads, please proceed to follow the subsequent steps.

    • Create a text file.

    • Copy the URLs of the first topic/lesson from any number of courses.

    • Paste all the URLs into the text file and save it.
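The URL file described in the steps above is plain text with one lesson URL per line. As a rough sketch, it could also be produced from Python. The helper name `write_url_file` and the default file name `urls.txt` are hypothetical and are not part of the scraper, which only needs a text file that you select in the GUI.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_url_file(urls, path="urls.txt"):
    """Write one URL per line, skipping blank entries and duplicates.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url or url in cleaned:
            continue
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a valid URL: {url!r}")
        cleaned.append(url)
    # One URL per line, no index or numbering in front of the URL.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

Calling `write_url_file([...])` with the first-lesson URL of each course writes `urls.txt` in the current directory and returns the cleaned list.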




      Reference
    • Select a configuration if you prefer not to use the default configuration.

    • If you prefer not to display the browser window, choose the headless option.

    • Please provide a unique User Data Directory name that the browser will use to store your current session. Ensure that each instance of the scraper has a distinct User Data Directory name.

    • Please select the file path of the text file containing the course URLs, as well as the directory where you would like to save the downloaded content.

    • You can choose to save/export the current configuration for later use, or you can opt for the default configuration.

    • For the initial setup or after updates, click on Download Chromedriver and Download Chrome Binary to automatically download them into the project directory.

    • If you intend to use proxies, enable the proxy option and enter the proxy in the proxies box.


      • For an IP-authorized proxy, you can directly enter the IP:PORT of the proxy.
      • For a USER:PASS-authorized proxy, you'll need to create a localhost tunnel using the Proxy-Login-Automator repository.
      • After setting up the tunnel, enter the IP:PORT of the localhost proxy that you configured in the Proxy-Login-Automator.

    • Click on Start Chromedriver to start the Chromedriver.

    • Click on Login Account to log in to your Educative.io account, then click the Close Browser button to close the browser once the login is complete.

    • Click on Start Scraper to begin scraping the courses.

    • The scraper will automatically stop after scraping all the URLs in the selected text file.

    • If you stop the scraper with the Stop Scraper button before it finishes, or if it encounters an error, the most recent URL is saved in the EducativeScraper.log file. Copy that URL from the INFO log entry and, in the URL text file, replace the URL of the topic/lesson that has already been completed with the copied URL. The scraper will then resume from where it left off.


    • An index is NOT required in the URL text file. Simply paste the URL of the topic from which you want to start or resume scraping.
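The resume step can be sketched in Python as swapping the logged URL into the URL list in place of the entry for the same course. This is only an illustration. It assumes lesson URLs have the form `https://www.educative.io/courses/<course-slug>/<lesson-slug>`, and the helpers `course_key` and `resume_from` are hypothetical names, not part of the scraper. The sketch does not parse EducativeScraper.log; the logged URL is expected to be copied from the log by hand.

```python
from urllib.parse import urlparse


def course_key(url):
    """Return the part of the URL that identifies the course.

    Assumption: the first two path segments (for example
    "courses/<course-slug>") identify the course.
    """
    parsed = urlparse(url.strip())
    segments = [s for s in parsed.path.split("/") if s]
    return (parsed.netloc, tuple(segments[:2]))


def resume_from(urls, logged_url):
    """Replace the first entry of the same course with logged_url."""
    key = course_key(logged_url)
    replaced = False
    result = []
    for url in urls:
        if not replaced and course_key(url) == key:
            # Swap the stale lesson URL for the one copied from the log.
            result.append(logged_url.strip())
            replaced = True
        else:
            result.append(url.strip())
    if not replaced:
        raise ValueError("No URL in the list belongs to the same course")
    return result
```

The returned list can then be written back to the URL text file, one URL per line, before starting the scraper again.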

educative.io_scraper's People

Contributors

anilabhadatta, anhpho, boostupstation
