- Performed data cleaning
- Outlier and missing-value treatment
- Found the number of clusters using the silhouette score
- Built models with both K-means clustering and hierarchical clustering and compared the results
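The comparison of the two clustering methods might look roughly like the sketch below. It is not the project's code. It makes these assumptions:

- It uses synthetic data from `make_blobs` instead of the country dataset.
- It uses complete linkage for the hierarchical model, because the linkage method is not stated.
- It measures agreement between the two partitions with the adjusted Rand index (ARI).

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in for the scaled country data (hypothetical, not the project's dataset)
X, _ = make_blobs(n_samples=150, centers=4, n_features=5, random_state=42)

K = 4  # number of clusters, matching the k chosen for the K-means model

# K-means labels are 0..K-1
kmeans = KMeans(n_clusters=K, n_init=10, random_state=42)
km_labels = kmeans.fit_predict(X)

# Hierarchical clustering; "complete" linkage is an assumption, then cut the tree into at most K clusters
Z = linkage(X, method="complete")
hc_labels = fcluster(Z, t=K, criterion="maxclust")  # labels are 1..K

# ARI of 1.0 means identical groupings (ignoring label names); values near 0 mean chance-level agreement
ari = adjusted_rand_score(km_labels, hc_labels)
print(f"ARI between K-means and hierarchical: {ari:.2f}")
print(f"Silhouette (K-means): {silhouette_score(X, km_labels):.2f}")
print(f"Silhouette (hierarchical): {silhouette_score(X, hc_labels):.2f}")
```

The ARI and silhouette comparison is only one way to compare the two results. Cross-tabulating the two label vectors, or comparing cluster means per feature, works as well.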
Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, scipy
- Data quality checks on columns such as exports and imports
- Rounding numbers to 2 decimal places for ease of analysis
- Checking for missing values
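The three data-cleaning steps can be sketched with pandas as follows. The rows are made up, and every column except `exports` and `imports` is a hypothetical example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; "country" and "income" are assumed column names
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "exports": [10.456, 55.1234, np.nan, 31.999],
    "imports": [44.9012, 37.3, 28.05, np.nan],
    "income": [1610.0, 9930.0, 12900.0, 5900.0],
})

# Data quality checks on exports and imports: data types, summary statistics, invalid negatives
print(df[["exports", "imports"]].dtypes)
print(df[["exports", "imports"]].describe())
negative = (df[["exports", "imports"]] < 0).sum()
print("Negative values per column:\n", negative)

# Round all numeric columns to 2 decimal places
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].round(2)

# Missing values per column, as a count and as a percentage of rows
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
print(pd.DataFrame({"missing": missing, "percent": missing_pct}))
```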
- Outlier identification, with upper-tail values capped at the 99th percentile
- Univariate and bivariate analysis
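One way to cap the upper tail at the 99th percentile is shown in this minimal sketch. It assumes that only values above the 99th percentile are clipped and that the lower tail is left unchanged. The synthetic columns `income` and `gdpp` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed data for illustration only; column names are hypothetical
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.lognormal(mean=9, sigma=1, size=200),
    "gdpp": rng.lognormal(mean=8, sigma=1.2, size=200),
})


def cap_upper(frame: pd.DataFrame, cols, pct: float = 0.99) -> pd.DataFrame:
    """Return a copy where values above the `pct` quantile of each column are set to that quantile."""
    capped = frame.copy()
    for col in cols:
        upper = capped[col].quantile(pct)
        n_above = int((capped[col] > upper).sum())  # how many outliers get capped
        print(f"{col}: capping {n_above} value(s) above {upper:.2f}")
        capped[col] = capped[col].clip(upper=upper)
    return capped


df_capped = cap_upper(df, ["income", "gdpp"])
```

Capping keeps every row. This matters when there are only about a couple of hundred countries, because dropping outlier rows could remove exactly the countries the analysis is looking for.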
- Burundi has a low income according to the preliminary analysis and may be a country in dire need of aid
- Before clustering, I first computed the Hopkins statistic to check the clustering tendency of the data
- Hopkins statistic obtained: 0.96
- Scaled the data using StandardScaler from sklearn
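The Hopkins check and the scaling step can be sketched as follows. This is an illustration, not the project's implementation. It makes these assumptions:

- It uses the convention where values near 1 indicate a strong clustering tendency and values near 0.5 indicate uniformly random data. This matches reading the reported 0.96 as highly clusterable.
- It samples 10% of the points.
- It runs on synthetic data.

`hopkins` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def hopkins(X: np.ndarray, sample_frac: float = 0.1, seed: int = 0) -> float:
    """Hopkins statistic: near 1 suggests clustered data, near 0.5 suggests random data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(sample_frac * n))
    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # u: distances from m uniform random points (inside the data's bounding box) to the nearest data point
    uniform_pts = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u_dist, _ = nn.kneighbors(uniform_pts, n_neighbors=1)

    # w: distances from m sampled data points to their nearest *other* data point
    idx = rng.choice(n, size=m, replace=False)
    w_dist, _ = nn.kneighbors(X[idx], n_neighbors=2)  # column 0 is the point itself

    u_sum = u_dist[:, 0].sum()
    w_sum = w_dist[:, 1].sum()
    return u_sum / (u_sum + w_sum)


# Synthetic clustered data standing in for the country features
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=1)
print(f"Hopkins statistic: {hopkins(X):.2f}")

# Standardise each feature to zero mean and unit variance before clustering
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

The statistic depends on the random sample. Averaging it over several seeds gives a more stable value than a single run.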
- Started building the model with K-means clustering, using the silhouette score to determine the number of clusters
- Based on the elbow curve, I chose k = 4 for the model
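Choosing k can be sketched like this. For each candidate k, the code computes the silhouette score and the K-means inertia, which is the quantity plotted in an elbow curve. It then fits the final model with k = 4. The data here is synthetic, and the range of k values (2 to 8) is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, scaled stand-in for the country features (not the project's data)
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=7)
X_scaled = StandardScaler().fit_transform(X)

silhouette = {}
inertia = {}  # within-cluster sum of squared distances, plotted against k for the elbow curve
for k in range(2, 9):  # the silhouette score needs at least 2 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    silhouette[k] = silhouette_score(X_scaled, km.labels_)
    inertia[k] = km.inertia_

for k in silhouette:
    print(f"k={k}: silhouette={silhouette[k]:.3f}, inertia={inertia[k]:.1f}")

# Fit the final model with the chosen number of clusters
final_model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)
labels = final_model.labels_
```

Plotting `inertia` against k with matplotlib gives the elbow curve. Look for a k with a high silhouette score that also sits near the "elbow", where adding clusters stops reducing inertia much.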