covid19-ml-project

Directory Description

  • scripts: contains '.py' files that run the model
  • data_raw: contains files directly pulled from online sources
  • data_intermediate: contains cleaned files ready to be integrated into main data frame
  • output: contains folders that describe results from the gridsearch
    • data: dfs used for most recent gridsearches (train, validation, test)
    • plot: plots describing the output of the models
    • models_predictions_nopca: predictions and model objects for various hyper-parameters
    • models_predictions_pca: predictions and model objects for various hyper-parameters
    • Note: the previous two folders are empty because their contents were too large to upload. See the Instructions section for how to recreate them.
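
To confirm the layout above before running anything, you can check that each folder exists. The sketch below is not part of the repository: the name `missing_dirs` is hypothetical, and it assumes the subfolders listed under output sit directly inside output and that the check is run from the main project folder. Some folders may only appear after the data files have been unzipped.

```python
from pathlib import Path

# Folder names taken from the directory description above.
# Assumption: data, plot, and the two models_predictions folders live inside output/.
EXPECTED_DIRS = [
    "scripts",
    "data_raw",
    "data_intermediate",
    "output",
    "output/data",
    "output/plot",
    "output/models_predictions_nopca",
    "output/models_predictions_pca",
]


def missing_dirs(project_root="."):
    """Return the expected folders that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```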

Description of Files

scripts

  • building_graphs.py: code to create various data exploration graphs
  • build_master_df.py: assembles df to be used in model
  • chart_results.py: contains functions that build select charts from results
  • create_intermediate_data.py: contains function to populate intermediate data folder
  • fit_models_with_pca.py: functions to prepare and execute gridsearch with pca
  • fit_models_without_pca.py: functions to prepare and execute gridsearch without pca
  • identify_key_features.py: identifies key features in non-pca model
  • import_health.py: creates intermediate data csv for cdc health characteristics
  • load_cl_CDC.py: creates intermediate data csv for cases and deaths
  • load_cl_target.py: creates intermediate data csv for target variables
  • load_interventions.py: creates intermediate data csv for county-level covid interventions
  • pipeline.py: various functions for preparing for grid search
  • pull_census.py: creates intermediate data csv for ACS and NAICS data from census sources
  • pull_noaa.py: creates intermediate data csv for noaa weather data
  • pull_votes.py: creates intermediate data csv for MIT Election Data
  • utils.py: various utility functions

Instructions

WARNING: OUR CODE TAKES A LONG TIME TO RUN. PROCEED WITH CAUTION.

  1. Read our awesome report.
  2. Run setup.sh from the main project folder to unzip the data files and set up your virtual environment.
  3. To recreate the files in the intermediate data folder, run create_intermediate_data.populate_intermediate_data(). Otherwise, simply use the files provided in the zipped folder.
  4. To run our grid search, execute either fit_models_with_pca.py or fit_models_without_pca.py. These files populate parts of the output folder with model objects, predictions, and scores on the validation data. This step takes a long time.
  5. To score the models on our test data, run evaluate_models_on_test.py. This creates files with each model's predictions for the test period in the models_predictions_nopca and models_predictions_pca folders.
  6. Experiment with the functions in chart_results.py to chart the results. execute_plots and execute_test plots can be called to track model performance for different FIPS codes and target variables. Other functions allow experimenting with different null models as benchmarks.
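
Steps 3 to 5 above can be chained from a small driver. This is a sketch under stated assumptions, not a script shipped with the project: `build_commands` and `run_pipeline` are hypothetical names, and it assumes the scripts live in the scripts folder, are run from inside that folder so their imports resolve, and take no command-line arguments.

```python
import subprocess
import sys


def build_commands(use_pca, rebuild_intermediate=False):
    """Return the commands for steps 3-5 in order, as argument lists."""
    commands = []
    if rebuild_intermediate:
        # Step 3 (optional): regenerate the intermediate data files.
        commands.append([
            sys.executable, "-c",
            "import create_intermediate_data; "
            "create_intermediate_data.populate_intermediate_data()",
        ])
    # Step 4: grid search, with or without PCA.
    fit_script = "fit_models_with_pca.py" if use_pca else "fit_models_without_pca.py"
    commands.append([sys.executable, fit_script])
    # Step 5: score the fitted models on the test period.
    commands.append([sys.executable, "evaluate_models_on_test.py"])
    return commands


def run_pipeline(use_pca, rebuild_intermediate=False, scripts_dir="scripts"):
    """Run each step inside scripts_dir, stopping at the first failure."""
    for cmd in build_commands(use_pca, rebuild_intermediate):
        print("Running:", " ".join(cmd[1:]))
        # Assumption: the scripts expect to be started from the scripts folder.
        subprocess.run(cmd, cwd=scripts_dir, check=True)
```

Calling `build_commands(use_pca=False)` only builds the command list, so it is safe to inspect first. `run_pipeline(use_pca=False)` would actually start the non-PCA grid search, which takes a long time.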

Contributors

noahpselman, stevenbuschbach, littleahn
