This project is part of the Udacity Capstone Project. It trains models using two approaches:
- Automated ML (AutoML)
- HyperDrive, which tunes the hyperparameters of a model
The project is carried out in the following steps:
- Import the external dataset
- Train the AutoML model
- Train the HyperDrive model
- Compare model performance
- Deploy the best model
- Test the model endpoint
The dataset was downloaded from Kaggle and is accessed through GitHub.
Kaggle link: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data
Citation: Davide Chicco, Giuseppe Jurman: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 20, 16 (2020)
12 clinical features:
- age: age of the patient (years)
- anaemia: decrease of red blood cells or hemoglobin (boolean)
- high blood pressure: if the patient has hypertension (boolean)
- creatinine phosphokinase (CPK): level of the CPK enzyme in the blood (mcg/L)
- diabetes: if the patient has diabetes (boolean)
- ejection fraction: percentage of blood leaving the heart at each contraction (percentage)
- platelets: platelets in the blood (kiloplatelets/mL)
- sex: woman or man (binary)
- serum creatinine: level of serum creatinine in the blood (mg/dL)
- serum sodium: level of serum sodium in the blood (mEq/L)
- smoking: if the patient smokes or not (boolean)
- time: follow-up period (days)
In this project, we predict patient mortality from heart failure using the 12 attributes provided in the dataset. In the target column ("DEATH_EVENT"), a value of 1 means the patient died during the follow-up period and 0 means the patient survived.
We downloaded the heart failure dataset from Kaggle as a CSV file, uploaded it to GitHub, and access it in the workspace through the file's raw-content URL on GitHub.
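The access pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the helper names, the repository coordinates passed to `raw_github_url`, and the sample rows are all hypothetical, and the underscore-style column names are assumed to match the Kaggle CSV. In an Azure ML notebook the resulting URL would typically be passed to `TabularDatasetFactory.from_delimited_files(path=url)`; that call is left as a comment because it needs the Azure ML SDK.

```python
import csv
import io


def raw_github_url(user: str, repo: str, branch: str, path: str) -> str:
    """Build the raw-content URL that GitHub serves for a file in a repository."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


def split_features_target(csv_text: str, target: str = "DEATH_EVENT"):
    """Parse CSV text and separate the target column from the feature columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    labels = [int(row.pop(target)) for row in rows]
    return rows, labels


# Illustrative excerpt with made-up values and a reduced column set.
SAMPLE_CSV = (
    "age,ejection_fraction,serum_creatinine,time,DEATH_EVENT\n"
    "60,38,1.1,120,0\n"
    "72,25,1.9,10,1\n"
)

features, labels = split_features_target(SAMPLE_CSV)

# With the Azure ML SDK installed, the same URL could be loaded as a dataset:
#   from azureml.data.dataset_factory import TabularDatasetFactory
#   dataset = TabularDatasetFactory.from_delimited_files(path=url)
```

Reading from the raw-content URL keeps the notebook independent of local file uploads, at the cost of depending on the GitHub repository staying public and unchanged.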
The screenshot below shows the dataset after it is registered.
Before proceeding with the project, we first create a compute instance to run our Jupyter notebooks.
AutoML (Automated ML) rapidly runs multiple iterations over different algorithms. It also supports ensemble methods. In this experiment, the best run was a Voting Ensemble.
AutoML configuration
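For readers without access to the configuration screenshot, a configuration of this kind might look like the sketch below. Every value in `AUTOML_SETTINGS` is a hypothetical placeholder and not the setting used in this project; only the label column `DEATH_EVENT` and the classification task come from the dataset description. The `AutoMLConfig` import is deferred into the function so the snippet can be read and run without the Azure ML SDK installed.

```python
# Hypothetical settings for illustration only; the values actually used
# in this project are not reproduced here.
AUTOML_SETTINGS = {
    "experiment_timeout_minutes": 30,   # assumed time budget
    "max_concurrent_iterations": 4,     # assumed cluster parallelism
    "primary_metric": "accuracy",       # assumed optimisation metric
    "n_cross_validations": 5,           # assumed number of folds
    "enable_early_stopping": True,
}


def build_automl_config(training_data, compute_target):
    """Create an AutoMLConfig for binary classification on DEATH_EVENT.

    Requires the azureml-train-automl package; imported lazily so this
    module still loads where the SDK is not available.
    """
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        task="classification",
        training_data=training_data,
        label_column_name="DEATH_EVENT",
        compute_target=compute_target,
        **AUTOML_SETTINGS,
    )
```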
The best model was a Voting Ensemble, with an accuracy of
AutoML run details
Best AutoML model
Best run ID
We chose a HyperDrive model with random parameter sampling. The early termination policy is a Bandit policy with a slack factor of 0.1, and the primary metric is Accuracy.
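To make the two mechanisms concrete, here is a small pure-Python sketch of how random sampling and a Bandit policy behave. It does not call the Azure ML SDK, and the search space shown (`C` and `max_iter`) is a hypothetical example, since the actual parameters tuned are not listed here. For a metric that is maximised, a Bandit policy with slack factor `s` stops any run whose metric falls below `best / (1 + s)` at an evaluation interval.

```python
import random


def sample_hyperparameters(space: dict, rng: random.Random) -> dict:
    """Random sampling: draw one value per hyperparameter, independently."""
    return {name: rng.choice(choices) for name, choices in space.items()}


def bandit_should_terminate(run_metric: float, best_metric: float,
                            slack_factor: float = 0.1) -> bool:
    """Bandit rule for a maximised metric.

    A run is stopped when its metric is below best_metric / (1 + slack_factor).
    """
    return run_metric < best_metric / (1 + slack_factor)


# Hypothetical search space for illustration only.
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "max_iter": [50, 100, 200],
}

rng = random.Random(0)
candidate = sample_hyperparameters(SEARCH_SPACE, rng)

# With the best accuracy so far at 0.88, the cut-off is about 0.80.
stop_weak_run = bandit_should_terminate(0.79, 0.88)    # below the cut-off
stop_strong_run = bandit_should_terminate(0.85, 0.88)  # within the slack
```

A small slack factor such as 0.1 terminates poorly performing runs early and saves compute, while still tolerating runs that trail the best one by a modest margin.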
Best HyperDrive model
Log files of the services
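A deployed endpoint can be queried with a JSON request along the lines of the sketch below. Several details are assumptions rather than facts about this deployment: the `{"data": [...]}` payload shape (common for AutoML-generated scoring scripts, but dependent on the scoring script in use), the underscore-style column names, the placeholder scoring URI, and the made-up patient values. Replace the URI and key with the ones shown for the deployed service.

```python
import json
import urllib.request

# Illustrative patient record with made-up values; one entry per clinical feature.
SAMPLE_RECORD = {
    "age": 60,
    "anaemia": 0,
    "creatinine_phosphokinase": 250,
    "diabetes": 1,
    "ejection_fraction": 38,
    "high_blood_pressure": 0,
    "platelets": 262000,
    "serum_creatinine": 1.1,
    "serum_sodium": 137,
    "sex": 1,
    "smoking": 0,
    "time": 120,
}


def build_request(scoring_uri, records, key=None):
    """Build a POST request carrying the records as a JSON body."""
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if key:
        # Only needed when key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def query_endpoint(request):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Placeholder URI; no network call is made until query_endpoint is invoked.
request = build_request("http://example.invalid/score", [SAMPLE_RECORD])
```

Calling `query_endpoint(request)` against the real scoring URI should return the predicted `DEATH_EVENT` value for each record sent.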
Application Insights for the HyperDrive service
Application Insights for the AutoML service
Since all related work is complete, we delete the endpoint services, the compute clusters, and the compute instances.
Screen recording of the project in action: Link. The recording demonstrates:
- A working model
- Demo of the deployed model
- Demo of a sample request sent to the endpoint and its response
Standout suggestions attempted and possible future improvements:
- We have enabled Application Insights
- More data would help in getting more insights from the AutoML and HyperDrive methods
- Feature engineering can be performed
- Different feature reduction techniques could be used, such as PCA and RFE
- Using cross-validation techniques would help in curbing problems like overfitting
- The model can be converted to ONNX format and deployed on edge services.
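As an illustration of the cross-validation suggestion above, the sketch below splits sample indices into k folds using only the standard library. It is a generic example and not code from this project; the function name `k_fold_indices` and the sizes used are hypothetical. Each sample lands in the validation set exactly once, so every model is scored on data it was not trained on.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) pairs for k contiguous folds.

    The first n_samples % k folds receive one extra sample so that all
    samples are used.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Ten samples split into three folds of sizes 4, 3 and 3.
folds = list(k_fold_indices(10, 3))
```

In practice the rows should be shuffled (and ideally stratified on `DEATH_EVENT`) before splitting, which this minimal version leaves out.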