The project "Data Analysis of Cardiovascular Disease (CVD)" analyzes the factors that contribute to cardiovascular disease and builds a machine-learning model to predict the likelihood of an individual developing CVD. The analysis uses a dataset of 70,000 individual records from the Kaggle platform. The findings offer insight into the lifestyle, genetic, and environmental factors that play a role in cardiovascular health.
- Source: Cardiovascular Disease Dataset (Kaggle)
- Size: 70,000 individuals
- Objective: To discover the factors influencing CVD and build a predictive model
Discovery & Objective Definition
- Objective: Identify key factors contributing to CVD and build a predictive model
Data Cleaning
- Handling missing values and outliers
- Normalization and scaling of numerical data
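The two cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the field names `ap_hi` and `ap_lo` (systolic and diastolic pressure) and the plausibility bounds are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical records; field names and values are illustrative only.
records = [
    {"ap_hi": 120, "ap_lo": 80},
    {"ap_hi": 140, "ap_lo": 90},
    {"ap_hi": 16020, "ap_lo": 80},   # implausible systolic reading
    {"ap_hi": 110, "ap_lo": None},   # missing diastolic reading
    {"ap_hi": 130, "ap_lo": 85},
]

# Assumed plausibility bounds in mmHg; tune these for the real data.
BOUNDS = {"ap_hi": (60, 250), "ap_lo": (30, 200)}


def is_clean(row):
    """Keep a row only if every bounded field is present and within range."""
    for field, (low, high) in BOUNDS.items():
        value = row.get(field)
        if value is None or not low <= value <= high:
            return False
    return True


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


cleaned = [row for row in records if is_clean(row)]
scaled_ap_hi = standardize([row["ap_hi"] for row in cleaned])
```

Dropping rows is only one way to handle missing or extreme values; imputing or clipping may be preferable when too many records would otherwise be lost. Note that `standardize` would divide by zero for a constant column.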
Exploratory Data Analysis
- Identify correlations between variables and CVD
- Visualize the relationships using various graphs
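As a rough sketch of the correlation step, the Pearson coefficient between a numeric feature and the binary CVD label can be computed directly (with a binary label this is the point-biserial correlation). The sample values and the names `ap_hi` and `cardio` are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy sample: systolic pressure and the binary CVD label.
ap_hi = [110, 120, 125, 140, 150, 160]
cardio = [0, 0, 0, 1, 1, 1]

r = pearson(ap_hi, cardio)  # close to +1 means the feature rises with CVD
```

A coefficient near zero suggests no linear relationship; it does not rule out a nonlinear one, which is where plots help.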
Model Building & Testing
- Build machine learning models to predict CVD
- Optimize and evaluate the model performance
Results Interpretation & Reporting
- Interpret key findings and draw actionable recommendations
- Visualize insights and communicate results effectively
Age:
- CVD cases rise noticeably between ages 50 and 55.
- Aging causes heart muscles to thicken and arteries to stiffen, increasing blood pressure.
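One way to check an age pattern like this is to group individuals into five-year bands and compare the share of CVD cases in each. The sketch below assumes age is recorded in days and that the label is a 0/1 value; both are assumptions about the dataset layout, and the sample rows are invented.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365.25

# Hypothetical (age_in_days, cardio_label) pairs, for illustration only.
sample = [
    (16500, 0), (17200, 0), (18300, 1), (18900, 0),
    (19400, 1), (19800, 1), (20500, 1), (21900, 1),
]


def age_band(age_days, width=5):
    """Return the lower bound of the age band (in years) containing age_days."""
    years = int(age_days / DAYS_PER_YEAR)
    return years - years % width


def cvd_rate_by_band(rows):
    """Map each age band to the fraction of its rows labeled with CVD."""
    counts = defaultdict(lambda: [0, 0])  # band -> [cases, total]
    for age_days, label in rows:
        band = age_band(age_days)
        counts[band][0] += label
        counts[band][1] += 1
    return {band: cases / total for band, (cases, total) in sorted(counts.items())}


rates = cvd_rate_by_band(sample)
```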
Body Mass Index (BMI):
- Higher BMI is associated with a higher risk of CVD.
- High BMI places extra strain on the heart and is associated with insulin resistance and type 2 diabetes, both of which raise CVD risk.
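For reference, BMI is weight in kilograms divided by the square of height in meters. The sketch below assumes height is stored in centimeters and weight in kilograms, and uses the standard WHO adult categories; the function names are illustrative.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value):
    """WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


example = bmi(weight_kg=85, height_cm=170)   # about 29.4
label = bmi_category(example)                # "overweight"
```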
Blood Pressure:
- Elevated blood pressure (hypertension) is a major risk factor for CVD.
- Hypertension increases the heart's workload and can damage artery walls, raising the risk of heart attack.
Lifestyle:
- Physical inactivity, smoking, and heavy alcohol consumption contribute to increased CVD risk.
Model Building:
- The dataset was used to train a machine-learning model to predict CVD.
- Features used included age, BMI, blood pressure, cholesterol, and lifestyle factors.
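The README does not say which algorithm was trained, so the following is only a sketch of what a simple baseline could look like: logistic regression fitted by gradient descent, written in plain Python on synthetic data. The choice of model, the three features, and the toy labeling rule are all assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic function, clamped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += error * row[j]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, row):
    """Predict 1 (CVD) when the estimated probability is at least 0.5."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5)


# Synthetic, already-standardized features: [age, bmi, systolic pressure].
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1) for _ in range(3)]
    # Assumed toy rule: risk grows with all three features, plus noise.
    y.append(int(row[0] + row[1] + row[2] + rng.gauss(0, 1) > 0))
    X.append(row)

# Hold out the last 50 rows so accuracy is measured on unseen data.
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
w, b = train_logistic(X_train, y_train)
accuracy = sum(predict(w, b, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
```

In practice a library implementation would replace the hand-written training loop; the point here is the shape of the workflow: prepare features, split, fit, and score on held-out rows.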
Model Performance:
- The predictive model achieved an accuracy of approximately 72%.
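Accuracy is the share of predictions that match the true label. A small sketch of how it relates to the confusion-matrix counts follows, using invented labels and predictions rather than the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means CVD."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # share of true CVD cases the model catches
specificity = tn / (tn + fp)   # share of healthy individuals correctly cleared
```

For a screening use case, sensitivity and specificity are worth reporting alongside accuracy, since the two kinds of error carry different costs.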
Feature Importance:
- Age, BMI, and blood pressure were the most significant features.
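The README does not state how feature importance was measured. One model-agnostic option is permutation importance: shuffle one feature column at a time and record how much accuracy drops. The sketch below uses a hypothetical stand-in model and invented data to show the idea.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in model: flags CVD when systolic pressure exceeds 130.
# Columns are [systolic pressure, height in cm]; the model ignores height.
def toy_model(row):
    return int(row[0] > 130)


X = [[110, 170], [120, 165], [125, 180], [128, 160], [135, 175],
     [140, 168], [150, 172], [160, 158], [118, 182], [145, 166]]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
```

Shuffling the pressure column hurts accuracy, while shuffling the ignored height column changes nothing, so its importance is exactly zero.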
Enhanced Public Awareness:
- Promote lifestyle changes like a balanced diet and regular physical activity.
- Encourage early screening and intervention.
Integration into Clinical Practice:
- Incorporate predictive models in routine clinical assessments.
Lifestyle Interventions:
- Expand programs to reduce risk factors through accessible lifestyle changes.
Expand Sample Size:
- Use a larger and more diverse sample to improve generalizability.
Incorporate More Variables:
- Add more detailed variables, such as specific cholesterol levels (LDL, HDL).
/
|-- notebooks/
|   |-- eda-CVD-final.ipynb
|-- data/
|   |-- Data Analysis of Cardiovascular Disease.pdf
|-- README.md