nihadtp / covid19patientprediction Goto Github PK

View Code? Open in Web Editor NEW

MAchie learning model developed to predict recovery of covid19 patient based on sex, age and state there living.

Scala 100.00%

covid19patientprediction's Introduction

covid19PatientDeathPredictionProbability

This project uses Random Forest classifier module in spark ml library to predict probability of recovery of a covid19 patient based on there age, sex and state they are residing.

##Probability of recovery of different states from age 10 to 90

Here x-axis is age and y-axis probability percentage.
Here we can notice that probability decreases as age increases.
Also probability of recovery is different for different states duu to difference in population, civic sense, hygiene etc
By dividing whole dataset into 80-20 as training and test data. It is calculated to be accurate upto 93%
Main data source are from covid19API in CSV format(Almost 50% data did not have sufficeint information)

Prequisites

Scala 2.11.12
Spark 2.4.6
SBT 1.0
Cassandra 4.0
cqlsh 5.0.1

Cassandra Set up

Log into cassandra cqlsh and create Keyspace called covid19 and table called prediction

cqlsh> CREATE TABLE covid19.prediction(sex text, age float, state text, p_sex double, p_state double, p_state double,\ PRIMARY KEY (state_prop, country_prop, date, uuid));

Steps to build jar

First clone github project to a folder
Download csv data soircce to local covidAPI
In src/main/scala/covid19PredictionApp.scala file. Edit the following

val rawData = spark.read.format("csv")
      .option("header", "true")
      .load("/home/nihad/machine_learning/patient_data/*.csv")

Under load function function put in your correct path

Then build fat jar using sbt assembly plugin

sbt assembly

In target folder you will find covid19Prediction-assembly.jar file
Start your cassandra service
Go to spark folder

spark-submit ~/pathtoyourtargetfolder/covid19Prediction-assembly.jar

Here app starts and fetches data from the folder, converts them to Dataframe and trains using Random Forest classifier And predicted values are then written to cassandra table.

Recommend Projects