srijanshovit / healthlearning

A repo comprising various Machine Learning and Deep Learning projects in the healthcare domain.

Jupyter Notebook 100.00%

healthlearning's Introduction

Health Learning: ML and Deep Learning for Healthcare 🩺🧠

Health Learning is an open-source project aimed at leveraging machine learning (ML) and deep learning techniques to address various healthcare challenges. By harnessing the power of data-driven approaches, our goal is to develop predictive models, diagnostic tools, and decision support systems to improve patient outcomes, optimize healthcare delivery, and advance medical research.

Motivation 🚀

The field of healthcare is ripe for innovation, with vast amounts of data available from diverse sources such as electronic health records, medical imaging, wearable devices, and genetic sequencing. Health Learning seeks to harness this wealth of data to tackle a wide range of healthcare issues, including disease prediction, diagnosis, treatment optimization, and personalized medicine. By democratizing access to healthcare data and cutting-edge machine learning algorithms, we aim to empower researchers, clinicians, and healthcare professionals to make data-driven decisions and drive innovation in healthcare.

Datasets 📊

Health Learning provides access to a curated collection of healthcare datasets drawn from several sources, including public repositories like Kaggle. These datasets cover a broad spectrum of health-related topics, including maternal health, diabetes classification, cardiovascular disease risk factors, stroke prediction, cancer imaging, and more. Researchers and developers can explore these datasets to develop and validate machine learning models for a wide range of healthcare applications. Each project lists its datasets in its own README.md file.

Contributing 🤝

Health Learning welcomes contributions from researchers, developers, healthcare professionals, and enthusiasts passionate about using machine learning and deep learning for healthcare. Whether you're interested in developing new models, improving existing algorithms, or curating datasets, there are plenty of opportunities to get involved. Check out our Contribution Guidelines to learn how you can contribute to the project.

Note: Health Learning is a community-driven initiative and is not affiliated with any specific healthcare organization or institution. We strive to promote collaboration, transparency, and open exchange of knowledge for the betterment of healthcare worldwide. Join us in our mission to revolutionize healthcare through machine learning and deep learning! 🌍💡

healthlearning's People

Contributors

sohamvalkyrie saksh8 arihant-bhandari mehekfatima avii-07 aditi1807 sinchana0m deedghost aatmajajoshi 1arka02 sanketv010 divyanshi1002 oj1o1 basma2423 piyushseth55 yusuf-khaan sanjanabankar vedanshipathak theiturhs dharanilakkireddy rutikaw1155 dakshsinghrathore roysammy123 jain-anshika w-ight kunalsharma2001 hit-haa18 omghumre avanimehta11 sawarijamgaonkar disha-16 seeratfatima19 officeneerajsaini asymtode712 pradnyagaitonde anuragsarkar12 anasshadad ranamanish674zu chillthrower garvitjoshi1

healthlearning's Issues

Metabolic Syndrome Prediction | 1. Dataset Exploration

Hi, I'm one of the contributors under GSSoC'24 and would like to work on the stated problem statement. I would work on basic Exploratory Data Analysis as well as modelling: I would implement traditional as well as gradient boosting algorithms, submit a report on metrics, and comment and segment the sections in the work file.

I hope I can be assigned this.

Stroke Prediction

I would like to use a few classification models such as decision tree, XGBoost, and random forest for stroke detection. Please assign me the issue. I'm a GSSoC'24 contributor.

Diabetes Prediction | EDA

Hi @SrijanShovit
I would like to perform EDA on the Diabetes Prediction dataset under GSSOC'24.
Could you please assign me this?

Maternal Health Risk prediction

Hi, I am Disha Mukhopadhyay, currently pursuing a BTech in Computer Science and Engineering (CSE). I have done an internship as an AI intern, published an IEEE paper in the machine learning domain, and have some papers in proceedings in the machine learning and data science domain. It would be very helpful if you could assign me this project for GSSOC'24 as a contributor, so that I can work on it.
Approach for this project:
1. Data collection and preprocessing: In this section, we will collect the dataset and preprocess it.
2. Exploratory Data Analysis (EDA): Visual inspection, statistical summary, and data distribution analysis will be performed.
3. Model Selection and Deployment: Four machine learning models (XGBoost, Logistic Regression, Random Forest, Gaussian Naive Bayes) will be chosen and implemented.
4. Model Training and Evaluation: Each model will be trained on the dataset, and the performance of each model will be displayed.
5. Model Comparison and Selection: Will analyze the performance of all models based on the metrics obtained and choose the model that shows the best balance between accuracy, generalizability, and computational efficiency.
Thank you

Ayurveda GPT

Problem Description:
The problem is to develop an Ayurveda GPT (Generative Pre-trained Transformer) application trained on concepts provided by ancient Ayurvedic sages such as Sushruta, Charak, and others.

Solution Description:

Gather Data: Collect a diverse dataset of texts, scriptures, and teachings from ancient Ayurvedic texts, including those authored by Sushruta, Charak, and other sages.
Preprocess Data: Clean and preprocess the collected data to remove noise, standardize text formats, and prepare it for training.
Train GPT Model: Utilize the preprocessed data to train a GPT model, ensuring that it captures the nuances and intricacies of Ayurvedic principles, treatments, and philosophies.
Fine-tune Model: Fine-tune the trained GPT model on specific tasks or domains within Ayurveda, such as diagnosis, treatment recommendations, herbal remedies, etc.
Develop Application: Build an intuitive and user-friendly application interface that allows users to interact with the trained GPT model.
Deploy Application: Deploy the Ayurveda GPT application on a suitable platform, making it accessible to users.

Alternatives Considered:

Utilizing existing Ayurveda datasets: Instead of collecting data from scratch, leverage existing datasets of Ayurvedic texts and teachings.
Transfer learning: Instead of training a GPT model from scratch, use transfer learning techniques to adapt pre-trained language models to the Ayurveda domain.

Additional Context:
The completion of this feature will be determined by the successful development and deployment of the Ayurveda GPT application, as well as its usability and effectiveness in providing accurate and valuable insights into Ayurvedic principles and practices.

Chest X-rays: Pneumonia Detection | 1. Dataset Preparation

Step 1:

Add relevant folder with readme in the repo
Can explore datasets; btw we can start with https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
Explore classes and their respective sizes in dataset and perform necessary augmentation.
Decide on the approximate number of samples for each class and a suitable strategy for noise removal and augmentation (be careful about this).
Upload the augmented dataset to Kaggle (easy to work with later).

All these steps get us ready to start working on a proper dataset.
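
The class-size check and augmentation planning above could be sketched as follows. This is a minimal, dependency-free illustration: the label list and the `augmentation_plan` helper are hypothetical, and matching the largest class is only one possible target.

```python
from collections import Counter


def augmentation_plan(labels, target_per_class=None):
    """Return how many augmented samples each class would need."""
    counts = Counter(labels)
    # Assumption: when no target is given, aim for the size of the largest class.
    if target_per_class is None:
        target_per_class = max(counts.values())
    return {cls: max(target_per_class - n, 0) for cls, n in counts.items()}


# Hypothetical labels; real ones would be read from the dataset folders.
labels = ["PNEUMONIA"] * 6 + ["NORMAL"] * 2
plan = augmentation_plan(labels)
print(plan)
```

Classes that already meet the target get 0, so only the minority classes are augmented.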

Cirrhosis Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.
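
One way to see why different correlation methods can disagree is to compare Pearson and Spearman on the same pair of columns. The sketch below is dependency-free and uses made-up values; in a notebook, pandas' `DataFrame.corr(method=...)` would usually do this.

```python
import math


def pearson(x, y):
    """Pearson correlation: measures linear association."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical columns: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Spearman reports a perfect monotonic relationship here while Pearson is lower, which is the kind of difference worth explaining in a markdown cell.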

@aditi1807

Create an Issue template

I wish to add an issue template for each type of issue, such as bug, feature addition, documentation update, etc.

Maternal Health Risk Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.
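
Detecting outliers without removing them can be done by flagging rows rather than dropping them. A minimal sketch using the common 1.5 × IQR rule follows; the column values and the `iqr_outlier_flags` helper are illustrative assumptions.

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; nothing is removed."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical column; the last value looks like a data-entry error.
heart_rate = [70, 72, 68, 75, 71, 69, 74, 7]
flags = iqr_outlier_flags(heart_rate)
flagged = [v for v, is_outlier in zip(heart_rate, flags) if is_outlier]
print(flagged)
```

Keeping the flags as an extra column leaves the decision about removal or correction to a later step.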

Cardiovascular Heart Disease | Imputation of Null Values, Normalization of the Dataset, and EDA

Hi, I would like to transform the dataset so that it is fit for model training. This includes detecting and removing outliers, filling null values, and normalization.

Can you assign me the issue?

PCOS Detection | 1. Dataset Exploration

@Piyushseth55 Would you like to take it up?

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns
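
The data type and memory comparison in Step 1 can be illustrated without any libraries. The sketch below stores a made-up `ages` column as 64-bit and as 8-bit integers and compares the buffer sizes; with pandas, the comparable calls would be `DataFrame.memory_usage(deep=True)` before and after `astype`.

```python
from array import array
import statistics

# Hypothetical column of small non-negative integers.
ages = [23, 35, 41, 29, 52, 38, 47, 31]

as_int64 = array("q", ages)  # signed 64-bit integers
as_uint8 = array("B", ages)  # unsigned 8-bit integers (0..255)


def buffer_bytes(arr):
    """Bytes used by the raw values, ignoring object overhead."""
    return arr.itemsize * len(arr)


saved = buffer_bytes(as_int64) - buffer_bytes(as_uint8)
print(buffer_bytes(as_int64), buffer_bytes(as_uint8), saved)

# Basic statistical facts for the same column.
mean_age = statistics.fmean(ages)
median_age = statistics.median(ages)
quartiles = statistics.quantiles(ages, n=4)
print(mean_age, median_age, quartiles)
```

A narrower type is only safe when every value fits its range, so the minimum and maximum should be checked first.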

Body Fat Prediction | 1. Dataset Exploration

Use plots and statistical methods to detect outliers and propose solutions for dealing with them.
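
One possible solution to propose is winsorizing, which clips extreme values to chosen percentiles instead of dropping rows. This is a sketch under assumptions: the 5th/95th percentile bounds and the sample values are arbitrary choices for illustration.

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given lower and upper percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements with one extreme value.
values = list(range(1, 21)) + [500]
clipped = winsorize(values)
print(min(clipped), max(clipped))
```

The row count is unchanged, which matters when the dataset is small.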

Breast Cancer | 2. EDA

Is your feature request related to a problem? Please describe.

Breast Cancer EDA

Describe the solution you'd like

EDA

Describe alternatives you've considered

No response

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Metabolic Syndrome Prediction | 3. Statistical Tests for Feature Importance

Explore 3-4 tests like PCA, chi2, anova, Permutation related tests. @Arihant-Bhandari
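
As one reading of "permutation related tests", the sketch below runs a permutation test on the difference in a feature's mean between two label groups. The group values are made up, and the helper name and defaults are assumptions rather than part of the issue.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between values and labels
        new_a = pooled[: len(group_a)]
        new_b = pooled[len(group_a):]
        if abs(statistics.fmean(new_a) - statistics.fmean(new_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical feature values for the two label groups.
without_condition = [5.1, 5.3, 5.0, 5.4, 5.2, 5.5]
with_condition = [6.1, 6.3, 6.0, 6.4, 6.2, 6.5]
p_value = permutation_test(without_condition, with_condition)
print(p_value)
```

A small p-value suggests the feature differs between the groups; it does not by itself measure how useful the feature is to a model.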

Cardiovascular Heart Disease Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.

@deedGhost Please proceed with this as Step 1 is done.

Acne Prediction | Input pipeline and Dataset preprocessing

I wish to add a deep learning model that can predict acne based on image analysis.

Cirrhosis Prediction | 1. Explore Dataset

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns

@aditi1807 Please proceed if you would like to take it up.

Body Fat Prediction | 2. EDA

Step 2:

Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.
Proceed here after #1

Brain Tumor MRI Classification

As a GSSOC'24 contributor, I am looking forward to working on "Brain Tumor MRI Classification", focusing on MRI images. The dataset will undergo preprocessing steps such as cleaning (if required; for example, if the image data contains salt-and-pepper noise, it can be removed with a median filter) and feature extraction. For modeling, I will use convolutional neural networks (CNNs), which are well suited to image classification tasks.
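
The median filter mentioned above can be sketched on a tiny grayscale image stored as nested lists. This is an illustration only: the pixel values are made up, and a real notebook would more likely call an image library's median filter.

```python
def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood."""
    height, width = len(image), len(image[0])
    radius = size // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Near the borders the window is simply cropped to the image.
            window = sorted(
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            output[y][x] = window[len(window) // 2]
    return output


# Hypothetical 4x4 image: uniform gray with one "salt" and one "pepper" pixel.
image = [[100] * 4 for _ in range(4)]
image[1][2] = 255  # salt
image[3][0] = 0    # pepper
denoised = median_filter(image)
print(denoised)
```

Isolated extreme pixels disappear because they never reach the middle of the sorted window, while uniform regions are left unchanged.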

Thanks!

Infant Health Prediction | 1. Explore Dataset

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns

@Basma2423 Please proceed.

Missing Code of Conduct File in Repository

Currently, the repository lacks a Code of Conduct file, which is an essential component for fostering a healthy and inclusive open-source community. A Code of Conduct serves as a guideline for expected behaviour, ensuring that contributors and participants feel safe, respected, and valued within our community space.

Please assign this issue to me.

Lung Cancer Detection | 1. Dataset Exploration

Hello @SrijanShovit
I am one of the contributors to GSSOC'24. I would like to contribute to the Lung Cancer Detection project by doing EDA and then building ML models for the prediction. I will be using matplotlib and plotly for the analysis, and for modelling I will be using algorithms like LR, RF, and XGBClassifier.

Hoping to get a positive response.

Thank you

Breast Cancer Prediction | 1. Dataset Exploration

@SrijanShovit
I would like to contribute to the Breast Cancer Prediction project as a GSSOC'24 contributor by implementing a Logistic Regression model.
Thank you.

Breast Cancer | 3. Statistical Feature Importance

Is your feature request related to a problem? Please describe.

Feature importance

Describe the solution you'd like

Statistical Methods

Describe alternatives you've considered

@vedanshipathak Proceed here.

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Pneumonia Detection from Chest X-rays

Description

Pneumonia is a common and potentially life-threatening condition, particularly among vulnerable populations such as children, the elderly, and individuals with weakened immune systems. Chest X-rays are a common imaging modality used to diagnose pneumonia, but interpretation can be subjective and time-consuming for radiologists. By developing a machine learning model to automatically detect pneumonia from chest X-ray images, this project aims to assist radiologists in triaging cases, speeding up diagnosis, and improving patient outcomes.

Adding a pneumonia detection project would further expand the scope of this repository and provide a valuable resource for healthcare professionals and researchers working in the field of medical imaging and diagnosis.

@SrijanShovit I will be really thankful if you kindly assign me this issue as part of GSSoC'24

Tuberculosis Classification DL | 1. Dataset Prep

Is your feature request related to a problem? Please describe.

Tuberculosis is a communicable illness that causes ill health and the death of millions, so I would like to create a classification model using deep learning techniques.

Describe the solution you'd like

X-ray examination is considered the most commonly used method because of its low cost, wide range of applications, and speed, so detecting features from X-rays is our top priority.

Describe alternatives you've considered

I propose a deep learning model: a CNN with basic layers such as Conv2D, MaxPooling2D, Flatten, and Dense. We could also use YOLO, or whichever works best. For this tuberculosis classification we will use a chest X-ray image dataset and aim to achieve high accuracy.

Additional context

Steps to be followed:-

1. Dataset preprocessing and preparation (gathering, cleaning, understanding, choosing the best dataset, etc.)
2. Feature engineering and image preprocessing
3. Augmentation (flipping, cropping, resizing, rotating, etc.)
4. Train and test split
5. Train the best model using the training set
6. Validate the model using the test set
7. Accuracy, mAP, confusion matrix
8. Inference and testing in an actual environment
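
For the train and test split in the steps above, a stratified split keeps the class proportions the same in both sets. The following is a minimal sketch: the labels, the 20% test fraction, and the `stratified_split` helper are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with per-class proportions kept."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class, so no class is left unevaluated.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical, imbalanced labels.
labels = ["tb"] * 10 + ["normal"] * 40
train_idx, test_idx = stratified_split(labels)
tb_in_test = sum(1 for i in test_idx if labels[i] == "tb")
print(len(train_idx), len(test_idx), tb_in_test)
```

Augmentation would then be applied to the training indices only, so that augmented copies of an image never leak into the test set.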

Code of Conduct

I agree to follow this project's Code of Conduct

Stroke Prediction | 1. Explore Dataset

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns

@Divyanshi1002 Please proceed.

Infant Health | EDA

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Perform EDA on the Infant Health Dataset
The infant health data is ready and I would like to perform EDA on it.

Details
Perform univariate and multivariate analysis, outlier detection (and possibly removal, if you allow it), and all other standard EDA processes.
Feature selection.
Feature extraction: will try to create new features for the best accuracy.

I also want to train ML and DL models using Linear Regression, Decision Tree, XGBoost, and a neural network.

Compare all the models and determine which one performs best on this data.

I am a GSSOC contributor.

Body Fat Prediction | 3. Feature Importance | 3.3 ML Based Feature Importance Tests

Is your feature request related to a problem? Please describe.

Related to highlighting features importance in this dataset.

Describe the solution you'd like

Explore techniques like RFE, DT and others (at least 5).

Describe alternatives you've considered

No response

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Body Fat Prediction | 3. Feature Importance | 3.1 PCA

Use PCA to plot the explained variance ratio and the cumulative explained variance ratio, and conclude the ranking of feature importance.
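
The two quantities to plot can be computed as in the sketch below, which finds the principal component variances with plain power iteration and deflation. It is a dependency-free illustration on made-up rows, not a robust PCA; scikit-learn's `PCA` exposes the same ratios as `explained_variance_ratio_`. Since each component mixes several features, turning these ratios into a feature ranking also needs the component loadings.

```python
import math
from itertools import accumulate


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its vector via power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to explain
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)), v


def explained_variance_ratio(rows):
    """Share of total variance captured by each principal component."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    ratios = []
    for _ in range(d):
        value, vec = top_eigenpair(cov)
        ratios.append(max(value, 0.0) / total)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return ratios


# Hypothetical two-feature data with uncorrelated columns.
rows = [[2, 0], [-2, 0], [0, 1], [0, -1]]
ratios = explained_variance_ratio(rows)
cumulative = list(accumulate(ratios))
print([round(r, 6) for r in ratios], [round(c, 6) for c in cumulative])
```

The `ratios` list feeds the explained variance plot and `cumulative` feeds the cumulative plot.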

Please proceed @MehekFatima

Stroke Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.

Once #16 is finished, proceed with this @Divyanshi1002

Create a PULL Request Template

I wish to add a pull request template for contributing in this project

explorative data analysis

I want to add some more modifications to the existing notebooks regarding exploratory data analysis, such as data normalization, reducing skewness, and trying more models like random forests, naive Bayes, etc.
Please assign this issue to me under GSSoC 2024.
I can contribute to this project.

Cardiovascular Heart Disease Prediction | 1. Dataset Exploration

Hey @SrijanShovit
I would like to contribute to the Cardiovascular Heart Disease project as a GSSOC'24 contributor by implementing the KNN and SVM models on the given dataset.
Thank you.

Metabolic Syndrome Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.
After #5, proceed here @Arihant-Bhandari

Period Tracker to detect PCOS

Hi @SrijanShovit
The purpose of this issue is to build and enhance a period tracker application to better support individuals with PCOS or to detect PCOS.
Could you assign me this issue under GSSOC'24?

Contributors Highlight

Is your feature request related to a problem? Please describe.

To highlight handles of contributors

Describe the solution you'd like

Script that picks the contributors from repo and fetches their linkedin, twitter, github profiles and displays in tabular form.

Describe alternatives you've considered

No response

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Lung Cancer Detection | 2. EDA

Hi @SrijanShovit ,
I would like to perform EDA on Lung Cancer Detection, which includes detecting outliers and handling null values.
Could you assign me this issue?

Body Fat Prediction | Distribution Exploration

Check whether the features follow a normal distribution or some other distribution. What impact can this have on ML models?
What remedies are there for any resulting issues?
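
A quick numeric check of the questions above is to measure skewness before and after a candidate remedy. The sketch below uses a log transform on made-up right-skewed values; both the data and the choice of `log1p` are assumptions, and other transforms may suit a given column better.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed feature (a few very large values).
values = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 60]
logged = [math.log1p(v) for v in values]  # log(1 + v) also handles zeros

skew_before = skewness(values)
skew_after = skewness(logged)
print(round(skew_before, 2), round(skew_after, 2))
```

A value near 0 indicates a roughly symmetric column. Models that are sensitive to feature scale or distribution, such as linear models, tend to benefit most from this kind of transform, while tree-based models are largely unaffected.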

Endometrial Cancer Prediction | 1. Explore Dataset & 2. EDA

Explore the dataset: https://www.kaggle.com/datasets/yeganehbavafa/uterine-corpus-endometrial-carcinoma

@RutikaW1155

Do check what insights the other 1. Explore Dataset & 2. EDA issues ask for. Make sure to add a README and the files in a separate folder.

CTG Data: Fetal Health Classification | 1. Data Preparation and Pipeline Creation

Hi @SrijanShovit, I would like to work on this dataset. These are the proposed steps:

1. Exploring the dataset, EDA, and class identification
2. Visualizing the dataset and the corresponding classes to gain insights
3. Imputing outliers and NULL values
4. Normalization of skewed values and one-hot encoding of text attributes
5. Comparing KNN, Random Forest, and SVC
6. Validating the evaluated models on the validation set using metrics like accuracy, precision, recall, and F1 score
7. Including visualizations (e.g., training curves, confusion matrices) to better illustrate the model performance
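
The validation metrics named in the steps above can be computed from confusion-matrix counts, as in this minimal sketch. It treats one class as positive for brevity; for a multi-class label the same calculation would be repeated per class. The label lists are made up.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical validation labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced health data, precision and recall usually say more than accuracy alone.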

Further:

Creation of an automated pipeline and a custom transformer for each of the proposed dataset steps.

Body Fat Prediction | 3. Feature Importance | 3.2 Chi-squared and/or ANOVA test

Use these tests to rank the features again by their contribution to the label. @MehekFatima (if you want to take up the issue)
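
For the chi-squared part, the statistic for one categorical feature against a categorical label can be computed directly from a contingency table. The counts below are made up, and continuous features or labels would have to be binned first, which is an assumption this sketch leaves to the analyst.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if feature and label were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are feature categories, columns are label classes.
table = [[10, 20],
         [20, 10]]
statistic, dof = chi_squared_statistic(table)
print(round(statistic, 4), dof)
```

Computing the statistic for every feature and sorting in descending order gives one possible ranking. `scipy.stats.chi2_contingency` additionally returns a p-value, and by default applies a continuity correction to 2x2 tables, so its statistic can differ from this one.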

Feat: Automate greeting using Github bot 🤖

Describe the feature

As the contributor count rises on the repo, it becomes increasingly challenging for maintainers to personally greet and encourage each contributor for their valuable input. Equally important is the reminder for them to review the project's contribution guidelines.

Add ScreenShots

PR greeting message ⬇️

ISSUE greeting message ⬇️

Record

I agree to follow this project's Code of Conduct
I'm a GSSoC'24 contributor
I want to work on this issue

Diabetes Classification | 1. Dataset Exploration

Hey @SrijanShovit
I am Anurag, a contributor in GSSoC'24. I am exploring machine learning and would like to work on the diabetes classification dataset.
My approach:
I would begin by creating in-depth plots of the diagnostic features to identify which parameters are relevant to the diagnosis. Based on my findings, I would use the chosen parameters to create a classification model using either Logistic Regression or a Random Forest Classifier, making changes along the way if necessary.
I hope you find my approach useful and assign me this project under GSSoC'24.

Thank You,
Anurag

Hepatitis C Prediction | 1. Explore Dataset

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns

@SanjanaBankar Please proceed.

Maternal Health Risk Prediction | 1. Explore Dataset

Step 1

Load the dataset
Explore and confirm features and label(s) of this dataset
Explore size/shape of dataset
Investigate the data types of features and labels, and choose a better data type for a particular column where possible
Calculate the memory usage differences
Explore the statistical facts like mean, median, x percentiles of the columns

Infant Health Prediction | 2. EDA

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.

PCOS Detection | 2. EDA

Is your feature request related to a problem? Please describe.

PCOS Detection EDA to investigate data and generate insights.

Describe the solution you'd like

Step 2:

Perform univariate and multivariate analysis and draw conclusions from it.
Explore the correlation matrix (try different methods and check whether they give the same conclusion, and why).
Check the distribution (skewness) of the columns.
Detect outliers (don't remove them).
Detect class label imbalance.

Provide as many relevant graphs and conclusive markdown cells as possible.

Describe alternatives you've considered

No response

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Stroke Prediction

Hello @SrijanShovit,
I would like to contribute to Stroke Prediction under GSSOC'24 by implementing the following methods:

Logistic Regression: It is an appropriate choice for predicting stroke likelihood (0 if no stroke, 1 if stroke).

Random Forest: Combines multiple decision trees to improve predictive performance, which suits the variety of attributes in the dataset and the potential non-linear relationships between predictors and stroke likelihood.

Please assign me this issue under the label GSSOC'24.
Looking forward to contributing to this project!

Best Regards,
Divyanshi
GSSOC'24 contributor

Decision tree classification, Data visualization

Hey @SrijanShovit
I am a GSSOC contributor. I'll add a decision tree classifier to Lung Cancer Detection. Can you please assign it to me?
