This project develops an end-to-end machine learning pipeline for predicting diabetes. The problem is framed as binary classification: the outcome is whether or not an individual has diabetes. The project starts from logistic regression, a popular algorithm for binary classification tasks, and compares it with several other classifiers.
The model takes the following health metrics as input:
- Pregnancies
- Age
- BMI (Body Mass Index)
- Glucose
- Blood Pressure
- Insulin
- Diabetes Pedigree Function
- Skin Thickness
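For reference, standard logistic regression models the probability of diabetes as a sigmoid of a weighted sum of the eight metrics above, collected into a feature vector $\mathbf{x}$. It fits the weights $\mathbf{w}$ and bias $b$ by minimizing the binary cross-entropy (log-loss) over $n$ training rows, where $\hat{p}_i = \sigma(\mathbf{w}^{\top}\mathbf{x}_i + b)$. The 0.5 decision threshold shown last is a common default, not a value the project specifies.

```latex
\hat{p} = P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathbf{x} \in \mathbb{R}^{8}

\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}
  \left[ y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i) \right]

\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{1}{n}\sum_{i=1}^{n} (\hat{p}_i - y_i)\,\mathbf{x}_i,
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} (\hat{p}_i - y_i)

\hat{y} = \begin{cases} 1 & \text{if } \hat{p} \ge 0.5 \\ 0 & \text{otherwise} \end{cases}
```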
Collect data relevant to the problem, including every input variable the model needs.
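A minimal loading sketch using only the Python standard library. It assumes the data is a CSV file whose header follows the widely used Pima Indians Diabetes dataset (the eight features plus an `Outcome` label); the column spellings, the `load_rows` helper, and the sample rows are illustrative assumptions, not the project's actual code or data.

```python
import csv
import io

# Assumed CSV header, following the Pima Indians Diabetes dataset layout:
# the eight features plus an "Outcome" label (1 = diabetic, 0 = not).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
LABEL = "Outcome"


def load_rows(fileobj):
    """Read CSV text into (X, y): float feature rows and integer labels.

    Raises KeyError if an expected column is missing from the header.
    """
    X, y = [], []
    for record in csv.DictReader(fileobj):
        X.append([float(record[name]) for name in FEATURES])
        y.append(int(record[LABEL]))
    return X, y


if __name__ == "__main__":
    # Two made-up rows, only to show the expected file shape.
    sample = io.StringIO(
        "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
        "DiabetesPedigreeFunction,Age,Outcome\n"
        "2,120,70,20,80,28.5,0.4,33,0\n"
        "5,160,80,0,0,35.1,0.9,50,1\n"
    )
    X, y = load_rows(sample)
    print(len(X), y)  # 2 [0, 1]
```

With a real file the call would look like `with open("diabetes.csv", newline="") as f: X, y = load_rows(f)`, where the file name is hypothetical.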
Perform a statistical analysis to understand the distribution, counts, and summary statistics of each variable.
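The summary step can be sketched with the standard `statistics` module. `describe` and `class_balance` are hypothetical helper names. Checking class balance matters because, on an imbalanced dataset, accuracy alone can look good even for a model that rarely predicts the minority class.

```python
import statistics
from collections import Counter


def describe(values):
    """Count, mean, sample standard deviation, min, median and max of a column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def class_balance(labels):
    """Number of rows in each outcome class."""
    return dict(Counter(labels))


if __name__ == "__main__":
    glucose = [85.0, 89.0, 116.0, 137.0, 183.0]  # made-up values
    print(describe(glucose))
    print(class_balance([0, 1, 0, 0, 1]))  # {0: 3, 1: 2}
```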
Visualize the data to identify patterns, outliers, and relationships between variables.
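Plots such as histograms, box plots, and scatter plots are the usual tools for this step. As a numeric complement, the sketch below flags outliers with the 1.5 × IQR rule (the same rule box plots commonly use for their whiskers) and measures linear relationships with the Pearson correlation coefficient. Both helpers are illustrative and not taken from the project.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Pearson's coefficient only captures linear association, so a value near zero does not rule out a nonlinear relationship that a scatter plot would reveal.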
Clean and prepare the data for modeling. This includes handling missing values, feature scaling, and splitting the dataset into training and test sets.
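A dependency-free sketch of the preparation step. It assumes, as in the Pima Indians Diabetes dataset, that a 0 in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, or `BMI` marks a missing measurement, and imputes those zeros with the training-set median; the 80/20 split, the seed, and all function names are illustrative choices. The data is split first so that imputation medians and scaling statistics come from the training rows only, which keeps test-set information from leaking into training.

```python
import random
import statistics

FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
# Assumption (as in the Pima Indians dataset): a 0 in these columns is
# physiologically implausible and stands for a missing measurement.
ZERO_MEANS_MISSING = {"Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"}


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut off a test fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def fit_imputer(X_train):
    """Median of the non-zero training values for each zero-means-missing column.

    Raises statistics.StatisticsError if such a column is all zeros.
    """
    medians = {}
    for j, name in enumerate(FEATURES):
        if name in ZERO_MEANS_MISSING:
            medians[j] = statistics.median([row[j] for row in X_train if row[j] != 0])
    return medians


def impute(X, medians):
    """Replace zeros in the affected columns with the stored training medians."""
    return [[medians[j] if j in medians and v == 0 else v for j, v in enumerate(row)]
            for row in X]


def fit_scaler(X_train):
    """Per-column mean and population standard deviation of the training rows."""
    cols = list(zip(*X_train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column -> 1.0
    return means, stds


def scale(X, means, stds):
    """Standardize every feature to z-scores using the training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def prepare(X, y, test_size=0.2, seed=42):
    """Split first, then fit imputation and scaling on the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size, seed)
    medians = fit_imputer(X_train)
    X_train, X_test = impute(X_train, medians), impute(X_test, medians)
    means, stds = fit_scaler(X_train)
    return scale(X_train, means, stds), scale(X_test, means, stds), y_train, y_test
```

Libraries such as scikit-learn provide equivalents (`train_test_split`, `SimpleImputer`, `StandardScaler`) that follow the same fit-on-train, apply-to-both pattern.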
Train four classifiers on the training set: logistic regression, a support vector classifier (SVC), a decision tree, and Naive Bayes.
Evaluate each model's performance on the test set using accuracy, precision, recall, and F1-score. The best-performing model was Naive Bayes, with an accuracy of up to 76%.
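The four metrics follow directly from the confusion-matrix counts. In a screening setting, recall (the share of diabetic patients the model catches) deserves particular attention, since a false negative is a missed diagnosis. The function name and the choice to report 0.0 when a ratio's denominator is zero are assumptions of this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels.

    `positive` is the label treated as "has diabetes". y_true must be non-empty.
    Ratios with a zero denominator are reported as 0.0 (a convention choice).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
    # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```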
Deploy the trained model on AWS Elastic Beanstalk, using AWS CodePipeline to connect the GitHub repository to Elastic Beanstalk for continuous integration and deployment.
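Elastic Beanstalk's Python platform looks, by default, for a WSGI callable named `application` in `application.py` (configurable through the `WSGIPath` option). The sketch below builds such a callable with only the standard library, assuming a JSON request body keyed by the feature names. `make_application` and the `predict_fn` it wraps are hypothetical; a web framework such as Flask is a common alternative.

```python
import json

# Assumed request format: a JSON object keyed by these column names.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def make_application(predict_fn):
    """Wrap a prediction function in a minimal WSGI app.

    `predict_fn` (hypothetical) takes one feature row, a list of floats in
    FEATURES order, and returns 0 or 1; it should apply the same imputation
    and scaling used in training before calling the trained model.
    """

    def application(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain"), ("Allow", "POST")])
            return [b"POST a JSON object with the eight features\n"]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            row = [float(payload[name]) for name in FEATURES]
        except (ValueError, KeyError, TypeError) as exc:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [f"invalid input: {exc!r}\n".encode()]
        body = json.dumps({"diabetes": int(predict_fn(row))}).encode()
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return application


# In application.py on Elastic Beanstalk, a line like the one below would expose
# the app; `load_trained_model` and `predict_one` are hypothetical names.
# application = make_application(load_trained_model().predict_one)
```

For a quick local check before pushing through CodePipeline, the standard library's `wsgiref.simple_server.make_server(host, port, app)` can serve the same callable.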
This project demonstrates how machine learning can be applied to predicting health outcomes. By following these steps, we can build and deploy a diabetes prediction model, starting from logistic regression and comparing it against other classifiers.
I'm a 2nd-year BTech Computer Science student deeply fascinated by the potential of Artificial Intelligence (AI) and Data Science.
If you have any feedback, please reach out to us at [email protected]