A Python script that uses Selenium to scrape classes and class metadata offered at Lehigh University from the registration link.
The data is stored in a file named `class_data.json` in the root directory, as a JSON list of objects in the following format:
```
{
    "CRN": crn_number,
    "section": section_number,
    "subject": subject,
    "course_number": course_number,
    "title": course_title
}
```
- Create a virtualenv and activate it. (The commands differ slightly between Windows and Linux/Mac.)
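The virtualenv step above might look like the following (commands are a sketch; adjust the Python executable name to whatever your system provides):

```shell
# Linux/Mac: create and activate a virtualenv in the project root
python3 -m venv venv
source venv/bin/activate

# Windows (cmd) equivalent, shown here as comments:
#   python -m venv venv
#   venv\Scripts\activate.bat
```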
- In order for Selenium to work, you need to download ChromeDriver and place it in the directory containing the shell scripts. Choose the version that matches the web browser you have. Note: you can always opt to use a different browser like Firefox; just make sure to change the code in `new_zealand_links.py` accordingly. Also, if you don't have a Windows machine, you need to change this part of `new_zealand_links.py` to reflect that:

```
CHROMEDRIVER_PATH = './chromedriver.exe'
```
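One way to avoid editing the path by hand is a platform-aware lookup. This helper is not part of the original script (which hardcodes the Windows path); it is a sketch of how the constant could be chosen:

```python
import platform

# Hypothetical cross-platform path selection: on Windows the ChromeDriver
# binary is chromedriver.exe, on Linux/Mac it is just chromedriver.
CHROMEDRIVER_PATH = (
    './chromedriver.exe' if platform.system() == 'Windows'
    else './chromedriver'
)

# With selenium 3.141.0 (the pinned version), the driver would then be
# created with, e.g.:
#   from selenium import webdriver
#   driver = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH)
```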
- Run `pip install -r requirements.txt` from inside the directory containing the `requirements.txt` file, with the virtualenv activated, to install all the dependencies. The dependencies are as follows (installed automatically when the above command is run):
```
autopep8==1.5.3
pycodestyle==2.6.0
selenium==3.141.0
toml==0.10.1
urllib3==1.25.10
```
- Run `python main.py` from the root directory if you want to start from page 1.
- Run `python main.py <page_number>` if you want to start from page `<page_number>`.
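How `main.py` parses the optional argument is not shown in this README; a minimal sketch of the behavior described above (default to page 1, otherwise use the argument) could be:

```python
def start_page(argv):
    """Return the page number to start scraping from; defaults to 1.

    argv[0] is the script name; argv[1], if present, is <page_number>.
    This is an illustrative guess at main.py's behavior, not its actual code.
    """
    return int(argv[1]) if len(argv) > 1 else 1
```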
- Your data will be in `class_data.json`.
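Once a run completes, the output can be read back with the standard `json` module (assuming the list-of-objects format described above):

```python
import json

def load_classes(path='class_data.json'):
    """Load the scraped class records from the JSON output file."""
    with open(path) as f:
        return json.load(f)
```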