AI Web Crawler is a Python tool that downloads data about available research studies, formats it, and uploads it to a database.
Coded in Python v3.8.5
- #7 #8 #9 #15 #25 Download research studies by crawling clinicaltrials.gov
- #6 #26 Schedule crawling tasks to run on a recurring basis
- #27 #33 Use NLP to create a brief summary and a list of keywords for each crawled study
- Automatically upload the data to the specified Firebase database
- #34 Use multithreading for superior performance (up to 7 studies per second); see the sketch after this list
- Fixed: the crawler previously failed to run when HTTP request errors occurred
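A minimal sketch of the fetch-and-thread approach referenced in the list above, assuming the `requests` package and the legacy clinicaltrials.gov `full_studies` JSON endpoint (the endpoint, parameters, and field names below are assumptions, not the project's actual code):

```python
# Sketch only: the endpoint, query parameters, and JSON field names are
# assumptions based on the legacy clinicaltrials.gov "full_studies" API,
# not necessarily what crawler.py actually uses.
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "https://clinicaltrials.gov/api/query/full_studies"


def fetch_page(min_rank, max_rank):
    """Download one page of study records as JSON."""
    params = {"expr": "", "min_rnk": min_rank, "max_rnk": max_rank, "fmt": "json"}
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of silently stopping
    return resp.json()["FullStudiesResponse"].get("FullStudies", [])


def process_study(study):
    """Placeholder for the format / summarize / upload steps."""
    return study["Study"]["ProtocolSection"]["IdentificationModule"]["NCTId"]


if __name__ == "__main__":
    studies = []
    for start in range(1, 300, 100):          # fetch three pages of 100 studies
        studies.extend(fetch_page(start, start + 99))
    with ThreadPoolExecutor(max_workers=8) as pool:  # process studies in parallel
        for nct_id in pool.map(process_study, studies):
            print("processed", nct_id)
```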
- Python 3 (recommended 3.8.5)
- Pip (included with Python 3)
- The Firebase JSON provided by the development team
- Download the code using `git` or straight from GitHub
- To install dependencies, execute the following command in the folder where the code was downloaded
pip install -r requirements.txt
- Run the following commands to install other required dependencies
python -m nltk.downloader stopwords
python -m nltk.downloader universal_tagset
python -m spacy download en
- Place the Firebase JSON into the same folder as the code
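That JSON file is the service-account credential the crawler uses to authenticate to Firebase. A minimal sketch of the upload step, assuming the `firebase_admin` package, Firestore as the target database, and a collection named `studies` (the file name, collection name, and fields are all assumptions):

```python
# Sketch only: the JSON file name, the choice of Firestore, and the
# collection/document layout are assumptions, not the project's actual setup.
import firebase_admin
from firebase_admin import credentials, firestore

# Load the service-account JSON that was placed next to the code.
cred = credentials.Certificate("firebase-service-account.json")
firebase_admin.initialize_app(cred)
db = firestore.client()


def upload_study(nct_id, data):
    """Write one formatted study record, keyed by its NCT ID."""
    db.collection("studies").document(nct_id).set(data)


upload_study("NCT00000000", {"title": "Example study", "keywords": ["example"]})
```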
There are two ways to execute the crawler:
A. Using the admin panel to schedule recurring crawls
python manage.py runserver
To use the admin panel, you must be an authorized user with access to the login information
B. Manually executing the crawler
python crawler.py
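Each crawl also performs the summary and keyword step listed under the features above, using the NLTK and spaCy resources installed earlier. A minimal sketch of one way that step could look (the exact logic in `crawler.py` may differ):

```python
# Sketch only: a simple frequency-based keyword pass and a leading-sentence
# summary, not necessarily the logic crawler.py actually implements.
from collections import Counter

import spacy
from nltk.corpus import stopwords

nlp = spacy.load("en")                        # model installed by `python -m spacy download en`
stop_words = set(stopwords.words("english"))  # corpus installed by the nltk downloader


def keywords_and_summary(text, n_keywords=10, n_sentences=2):
    """Return the most frequent non-stopword nouns and the first few sentences."""
    doc = nlp(text)
    nouns = [tok.lemma_.lower() for tok in doc
             if tok.pos_ in ("NOUN", "PROPN") and tok.lemma_.lower() not in stop_words]
    keywords = [word for word, _ in Counter(nouns).most_common(n_keywords)]
    summary = " ".join(sent.text for sent in list(doc.sents)[:n_sentences])
    return keywords, summary


kws, brief = keywords_and_summary(
    "This study evaluates a new treatment for asthma in adults. "
    "Participants receive the treatment for twelve weeks."
)
print(kws)
print(brief)
```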
- When updating studies that have already been crawled, clinicaltrials.gov imposes a limit that can cause some studies to be left out. In our testing we never came close to this limit as long as the crawler was executed daily.
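If the crawler is run manually rather than through the admin panel's scheduler, one simple way to keep it running daily is a plain loop around the script (a sketch only; any scheduler that invokes `crawler.py` once a day works equally well):

```python
# Sketch only: keeps the manual crawler running once a day in a plain loop.
# The admin panel's scheduler is the supported route; this is just an alternative.
import subprocess
import sys
import time

while True:
    subprocess.run([sys.executable, "crawler.py"], check=False)
    time.sleep(24 * 60 * 60)  # wait one day before the next crawl
```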
Meet our teammates
Adeeb Zaman
Heejoo Cho
Jonathon Sisson
Juntae Kim
Alex Han