siddharthpk / seng-474

Data mining project to predict COVID-19 second wave based on first wave government measures to curb COVID-19 cases.

Python 100.00%
data-science datamining neural-network random-forest decision-trees python

SENG 474 Project

Team Members

  • Siddharth Pathak
  • Gillian Bryson
  • Nathan Denny
  • Oliver Lewis
  • Chao Ge

Project Requirements

Since COVID-19 spread rapidly around the world, each country has implemented its own government responses to defend against the virus. The goal of every government is the same, however: to reduce the number of COVID-19 cases and the ability of the virus to spread through its population. Governments can be more efficient and allocate their resources more precisely if they know which combination of responses has the greatest chance of reducing spread. Therefore, our goal for this project has been to create a machine learning program that can accurately predict whether a set of government response measures will lead to an increase or (preferably) a decrease in COVID-19 cases. To achieve this goal, we collected our test and train data from Oxford University[1] and Worldometers[2], which have been tracking COVID-19 since its outbreak at the beginning of 2020. Using this data, we ran several tests on different machine learning models while tuning hyperparameters to determine which methods give the highest classification accuracy.

Experiments

Outlined below are the experiments we conducted with the models to better gauge the results, find new ways of looking at and analysing our data, and check the consistency of the accuracy levels.

Experiment 1 - Different time periods as the target feature

After an initial review of how active cases trend in proportion to government measures, we found that most measures take effect after an incubation period of 7 days. We therefore split the target into four time periods: 1 week, 2 weeks, 3 weeks and 4 weeks after the measure was put into action.
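The target for each time period can be sketched as a simple look-ahead comparison. This is a minimal illustration, not the project's code: the function name `label_after_weeks`, the "higher than on the day of the measure" rule, and the case counts are all assumptions made for the example.

```python
from typing import List, Optional


def label_after_weeks(active_cases: List[int], day: int, weeks: int) -> Optional[int]:
    """Return 1 if active cases are higher `weeks` weeks after `day`, else 0.

    Returns None when the series is too short to look that far ahead.
    """
    future = day + 7 * weeks
    if day < 0 or future >= len(active_cases):
        return None
    return 1 if active_cases[future] > active_cases[day] else 0


# Hypothetical daily active-case counts for one country (made-up numbers):
# a rise for 15 days followed by a steady decline.
series = [100 + 10 * d for d in range(15)] + [240 - 8 * d for d in range(1, 22)]

# One target value per time period, for a measure introduced on day 7.
targets = {w: label_after_weeks(series, day=7, weeks=w) for w in (1, 2, 3, 4)}
print(targets)
```

In this made-up series the label flips from increase to decrease between the 2-week and 3-week horizons, which is the kind of delayed effect that motivates testing several time periods.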

Experiment 2 - Adding an input feature for the number of new cases in the last week or two weeks

To measure the success of a measure accurately, we narrowed the case data we depend on to active cases and new cases. In this experiment we add “New Cases” as a new input feature, split over two time periods: 1 week and 2 weeks. We picked a few countries with stricter measures in place and followed their progress over 1 and 2 weeks by monitoring their case numbers. The numbers raised the question: should a country reduce the measures in place if it is still experiencing a rise in new cases?
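One way to derive such a feature is a trailing sum over the daily new-case counts. The sketch below assumes daily counts are available as a list. The helper `new_cases_in_window`, the feature names, and the numbers are hypothetical and not taken from the project.

```python
from typing import List


def new_cases_in_window(daily_new: List[int], day: int, window_days: int) -> int:
    """Sum of new cases over the `window_days` days ending on `day` (inclusive)."""
    start = max(0, day - window_days + 1)
    return sum(daily_new[start:day + 1])


# Made-up daily new-case counts for 14 consecutive days.
daily_new = [5, 8, 12, 20, 18, 25, 30, 28, 35, 40, 38, 45, 50, 55]

day = 13  # index of the day a measure is evaluated
features = {
    "new_cases_1w": new_cases_in_window(daily_new, day, 7),
    "new_cases_2w": new_cases_in_window(daily_new, day, 14),
}
print(features)
```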

Experiment 3 - Removing country code as an input feature

Removing the country from the set of input features resulted in a generalized model that could be applied to any country. Additional input features such as the one outlined in Experiment 2 were included in this experiment.
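As a rough illustration of this step, the country column can be dropped from each row before training. The column names used here (`country_code`, `school_closing`, `stay_at_home`, `new_cases_1w`) and the values are assumptions for the example, not the project's actual schema.

```python
from typing import Dict, List


def drop_feature(rows: List[Dict[str, object]], name: str) -> List[Dict[str, object]]:
    """Return copies of the rows without the feature called `name`."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


# Hypothetical feature rows; the values are invented.
rows = [
    {"country_code": "CAN", "school_closing": 3, "stay_at_home": 1, "new_cases_1w": 291},
    {"country_code": "NZL", "school_closing": 2, "stay_at_home": 2, "new_cases_1w": 14},
]

generalized = drop_feature(rows, "country_code")
print(generalized)
```

The original rows are left unchanged, so the same data can still be used for the country-specific experiments.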

Experiment 4 - Changing the Variables of each machine

Each machine (model) has a set of variables (hyperparameters) that can be adjusted to achieve better results. The Decision Tree can use different splitting criteria, the Random Forest can have its number of trees tuned, and the Neural Network has a multitude of parameters that can be altered (we chose to focus on the number of layers). We also tested which test/train split percentage would give us the best accuracy scores, given that some methods, like the Random Forest, are less susceptible to overfitting.
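A sweep of this kind could look like the sketch below. It assumes scikit-learn, which this document does not name, and it uses a synthetic dataset in place of the project's data. The candidate values are illustrative only and are not the settings the team tested.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature table (not the project's data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# One entry per hyperparameter setting to compare (illustrative values).
candidates = {
    "tree_gini": DecisionTreeClassifier(criterion="gini", random_state=0),
    "tree_entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "forest_50": RandomForestClassifier(n_estimators=50, random_state=0),
    "forest_100": RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp_1_layer": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "mlp_2_layers": MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0),
}

results = {}
for test_size in (0.2, 0.3, 0.4):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    for name, model in candidates.items():
        model.fit(X_train, y_train)  # refitting discards the previous fit
        results[(name, test_size)] = model.score(X_test, y_test)  # accuracy

best = max(results, key=results.get)
print(best, results[best])
```

Fixing `random_state` keeps the comparison repeatable. A single split per setting is still noisy, so averaging over several seeds or using cross-validation would give a steadier ranking.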

Experimental Results

Best data split percentage:

Averaged over all three machines, the best accuracies were achieved at a 0.3 test/train split, a value that is supported by most of the literature and was generally expected.

Best Tree Criterion:

The split criterion that performed best on average across the weekly time periods was gini, with an accuracy range of [0.8375, 0.95].
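For reference, the gini criterion scores a node by its Gini impurity, 1 minus the sum of squared class proportions. Entropy is the usual alternative. The functions below are a plain-Python sketch of both formulas and are not taken from the project code.

```python
import math
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


mixed = [0, 0, 1, 1]  # evenly mixed node: the worst case for two classes
pure = [1, 1, 1, 1]   # pure node: nothing left to split

print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an evenly mixed one. A tree picks the split that reduces the chosen measure the most.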

Best Neural Hidden Layers Count:

The best-performing number of hidden layers for the neural network was 20, with an accuracy range of [0.78, 0.85].

The following graphs (Figure 1, Figure 2) show the accuracy scores of each method when applied to each time interval, using the parameters that gave us the best accuracy scores during testing:

Our accuracy results all fell within the range [0.7, 0.95], which is very close to our goal of 75% accuracy. If we remove neural networks as one of our options, we get an accuracy range of [0.77, 0.95], which is comfortably within the goals we set for ourselves in the midterm report.

There are far too many figures from our testing to include in this document. To view them, please see our repo at https://github.com/siddharthpk/SENG-474/tree/master/outputs

References

For more info on SARS-CoV-1 see https://www.who.int/ith/diseases/sars/en/
