ecode-ethiopia / supervised-machine-learning-ensemble-model-for-type-2-diabetes-prediction

This project forked from akula01/supervised-machine-learning-ensemble-model-for-type-2-diabetes-prediction

Supervised-Machine-Learning-Ensemble-model-for-Type-2-Diabetes-Prediction

According to the American Diabetes Association (ADA), 30.3 million people in the United States have diabetes, of whom 7.2 million may be undiagnosed and unaware of their condition. Type 2 diabetes is usually diagnosed later in life, whereas the less common Type 1 diabetes is diagnosed early in life. People can live healthy and happy lives with diabetes, but early detection produces a better overall health outcome for most patients. To test how accurately Type 2 diabetes can be predicted, we use patient information from an electronic health records company called Practice Fusion, whose dataset has about 10,000 patient records from 2009 to 2012. This data contains key individual biometrics, including age, diastolic and systolic blood pressure, gender, height, and weight.

We apply popular machine learning algorithms to this data: k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Gradient Boosting, MLP Neural Network, and Naive Bayes. For each algorithm, we tune hyperparameters to produce the best accuracy, and evaluate the performance of every model based on its classification accuracy, precision, sensitivity (recall), specificity, negative predictive value, and F1 score. Overall, the highest classification accuracy achieved is 82.54%, by the MLP Neural Network.
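The six evaluation metrics listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the notebook's actual evaluation code; the function name `classification_metrics` and the toy labels are assumptions made for illustration, with 1 standing for "has diabetes" and 0 for "does not".

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics from 0/1 labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),  # negative predictive value
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels, not taken from the Practice Fusion data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as this toy example, accuracy can look acceptable while precision and sensitivity stay low, which is why reporting all six metrics is more informative than accuracy alone.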

In our study, we find that all algorithms other than Naive Bayes suffered from very low precision. Hence, we take a step further and incorporate all the algorithms into a weighted average (soft voting) ensemble model, in which each algorithm's prediction contributes, according to its weight, to the final decision on whether a patient has diabetes or not.
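As a rough illustration of weighted-average (soft) voting, the sketch below combines per-model probabilities for a single patient. It is not the project's implementation: the function name `soft_vote`, the model probabilities, the weights, and the 0.5 decision threshold are all assumptions chosen for the example, and the source does not state how the weights were set.

```python
def soft_vote(probabilities, weights, threshold=0.5):
    """Weighted-average (soft) voting for one patient.

    probabilities: each model's predicted probability of diabetes.
    weights: non-negative weight given to each model.
    Returns the weighted average probability and the 0/1 decision.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    avg = sum(p * w for p, w in zip(probabilities, weights)) / total
    return avg, int(avg >= threshold)


# Hypothetical probabilities and weights, for illustration only.
model_probs = {"knn": 0.40, "svm": 0.45, "naive_bayes": 0.90}
model_weights = {"knn": 1.0, "svm": 1.0, "naive_bayes": 2.0}

names = list(model_probs)
avg, label = soft_vote(
    [model_probs[n] for n in names],
    [model_weights[n] for n in names],
)
print(f"weighted probability = {avg:.4f}, predicted label = {label}")
```

In this toy case two of the three models fall below 0.5, so an unweighted hard vote would predict "no diabetes", while the weighted average lets the confident, more heavily weighted model change the outcome. scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` argument provides the same kind of combination for fitted estimators; whether the notebook uses it is not stated in the source.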

Unlike previous works, which focused either on a particular set of classifiers or on the Pima Indians dataset, which is heavily biased towards a limited female population, we use a new approach and a new dataset, while keeping the Pima Indians dataset for baseline comparison. While the accuracy of previous works on the Pima Indians dataset was less than 80%, the accuracy of our Ensemble model reached 89% on the same dataset. The accuracy of the Ensemble model on the Practice Fusion data is 85%. So far, our ensemble approach is new in this space.

We firmly believe that the weighted average ensemble model not only performed well on the overall metrics but also helped to recover wrong predictions made by individual models, aiding accurate prediction of Type 2 diabetes. Our model can be used as an alert prompting patients to seek medical evaluation in time.

Contributors

akula01
