studyfind-ai-web-crawler's Introduction

AI Web Crawler 1.0

AI Web Crawler is a Python tool that downloads data about available research studies, formats it, and uploads it to a database.

Coded in Python v3.8.5

Release Notes

v1.0.0 (20/11/2020)

New Features

  • #7 #8 #9 #15 #25 Download research studies by crawling clinicaltrials.gov
  • #6 #26 Schedule crawling tasks to run on a recurring basis
  • #27 #33 Use NLP to create a brief summary and a list of keywords for each crawled study
  • Automatically upload the data to the specified Firebase database
  • #34 Use multithreading to speed up crawling (up to 7 studies per second)
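
The multithreading feature above could be sketched with the standard library's thread pool. This is a minimal illustration, not the project's actual implementation: `crawl_studies`, `fetch_study`, and `max_workers` are hypothetical names, and the real crawler may organize its threads differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_studies(study_ids, fetch_study, max_workers=8):
    """Fetch several studies concurrently.

    `fetch_study` is a hypothetical callable that takes a study ID and
    returns that study's data (for example, by requesting it over HTTP).
    Returns a dict mapping each study ID to its fetched data.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per study so the threads overlap network waits.
        futures = {pool.submit(fetch_study, sid): sid for sid in study_ids}
        for future in as_completed(futures):
            # Any exception raised inside fetch_study is re-raised here.
            results[futures[future]] = future.result()
    return results
```

Because crawling is dominated by waiting on network responses, threads are a reasonable fit here. The achievable rate depends on the remote server and on the chosen `max_workers`.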

Bug Fixes

  • The crawler did not run if there were HTTP request errors (fixed)
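
One way to keep a crawl running when individual requests fail is to catch the error per study and continue. The sketch below is an assumption about how such a fix might look, not the code from the repository. `crawl_tolerantly` and `fetch_study` are hypothetical names. It catches `OSError` because the standard library's network errors (such as `urllib.error.URLError` and `ConnectionError`) derive from it, as does `RequestException` in the `requests` library.

```python
import logging

logger = logging.getLogger("crawler")


def crawl_tolerantly(study_ids, fetch_study):
    """Crawl each study, skipping those whose request fails.

    `fetch_study` is a hypothetical callable that takes a study ID and
    returns its data. Returns (crawled, failed): the fetched data and
    the IDs whose requests raised an error.
    """
    crawled, failed = [], []
    for study_id in study_ids:
        try:
            crawled.append(fetch_study(study_id))
        except OSError as exc:
            # Log the failure and move on instead of aborting the crawl.
            logger.warning("Skipping %s: %s", study_id, exc)
            failed.append(study_id)
    return crawled, failed
```

The returned list of failed IDs can be retried on the next scheduled run.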

Installation Guide

Requirements

  • Python 3 (recommended 3.8.5)
  • Pip (included with Python 3)
  • The Firebase JSON provided by the development team

Installation

  1. Download the code using git or directly from GitHub
  2. To install the dependencies, run the following command in the folder where the code was downloaded
     pip install -r requirements.txt
  3. Run the following commands to install the other required dependencies
     python -m nltk.downloader stopwords
     python -m nltk.downloader universal_tagset
     python -m spacy download en
  4. Place the Firebase JSON in the same folder

Usage

There are two ways to run the crawler:

A. Using the admin panel to schedule recurring crawls

python manage.py runserver

To use the admin panel, you must be an authorized user with access to the login credentials.

B. Manually executing the crawler

python crawler.py

Known Limitations

  • When updating studies that have already been crawled, a limit imposed by clinicaltrials.gov can cause some studies to be left out. In our testing, we never came close to this limit as long as the crawler was executed daily.

Contributors ✨

Meet our teammates

  • Adeeb Zaman
  • Heejoo Cho
  • Jonathon Sisson
  • Juntae Kim
  • Alex Han

studyfind-ai-web-crawler's Issues

Create Admin Panel

Create an admin panel using Django that allows users to run and schedule the crawler, as well as configure the websites.

Download and format data

Given the ID number of a study, download its data from clinicaltrials.gov and format/prune the data to fit the client's specifications.
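
The format/prune step described here could look roughly like the following. This is only a sketch: `prune_study` and the field names `id`, `title`, and `summary` are hypothetical placeholders, since the client's actual specification is not given here.

```python
def prune_study(raw_study, wanted_fields=("id", "title", "summary")):
    """Keep only the wanted fields and tidy string values.

    The default field names are hypothetical placeholders for the
    client's schema.
    """
    pruned = {}
    for field in wanted_fields:
        if field not in raw_study:
            continue  # Missing fields are skipped rather than invented.
        value = raw_study[field]
        if isinstance(value, str):
            # Collapse runs of whitespace left over from the source record.
            value = " ".join(value.split())
        pruned[field] = value
    return pruned
```

Keeping the list of wanted fields as a parameter makes it easy to adjust the output if the client's requirements change.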

Research libraries

Since the team is fairly new to NLP, we will spend some time researching appropriate NLP and web crawling libraries so we can implement those features in later sprints.
