
iron486 / country_data_clustering


Categorization of world countries using socio-economic and health factors

Jupyter Notebook 14.29% HTML 85.71%
birch country-data data-visualization dbscan kmeans correlation-analysis unsupervised-learning plotly


Country_data_clustering

The objective of this project was the categorisation of world countries using socio-economic and health factors that indicate the overall development of the country.

The dataset was taken from Kaggle: https://www.kaggle.com/datasets/rohan0301/unsupervised-learning-on-country-data.

The data cover 167 countries and are provided in CSV format (Country_data_clustering) with the following features:

country: Name of the country;

child_mort: Deaths of children under 5 years old per 1000 live births;

exports: Exports of goods and services per capita, given as a percentage of the GDP per capita;

health: Total health spending per capita, given as a percentage of the GDP per capita;

imports: Imports of goods and services per capita, given as a percentage of the GDP per capita;

income: Net income per person;

inflation: The measurement of the annual growth rate of the total GDP;

life_expec: The average number of years a newborn child would live if the current mortality pattern remains the same;

total_fer: The number of children that would be born to each woman if the current age-fertility rate remains the same;

gdpp: The GDP per capita, calculated as the total GDP divided by the total population.

In the notebook Country_data_clustering_kmeans.ipynb I applied the K-means algorithm, while in Country_data_clustering_DBSCAN_Birch.ipynb I applied DBSCAN and Birch.

First, I imported the libraries and read the dataset. Then I explored the dataset, looking at the main statistical parameters and computing the correlation matrix for all the numerical features.
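The exploration step can be sketched roughly as follows. This is a minimal illustration and not the notebook's own code: the small DataFrame stands in for the CSV, its values are made up for the example, and only a few of the columns listed above are used.

```python
import pandas as pd

# Tiny made-up sample standing in for the real CSV (values are illustrative only).
# With the actual file, pd.read_csv would be called on the path to the CSV instead.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E"],
        "child_mort": [90.0, 60.0, 30.0, 10.0, 4.0],
        "life_expec": [56.0, 62.0, 70.0, 76.0, 81.0],
        "gdpp": [500.0, 1500.0, 5000.0, 20000.0, 45000.0],
    }
)

# Main statistical parameters (count, mean, std, quartiles) per numeric column.
summary = df.describe()

# Pearson correlation matrix restricted to the numerical features.
corr = df.select_dtypes(include="number").corr()

print(summary)
print(corr.round(2))
```

Selecting the numeric columns explicitly keeps the `country` name column out of the correlation computation.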


I plotted the countries in the World and in Europe with their respective value for each feature. The interactive plots can be found at the following links:

Interactive-plots_Europe_child_mort, Interactive-plots_Europe_exports, Interactive-plots_Europe_gdpp, Interactive-plots_Europe_health, Interactive-plots_Europe_imports, Interactive-plots_Europe_income, Interactive-plots_Europe_inflation, Interactive-plots_Europe_life_expec, Interactive-plots_Europe_total_fer, Interactive-plots_World_child_mort, Interactive-plots_World_exports, Interactive-plots_World_gdpp, Interactive-plots_World_health, Interactive-plots_World_imports, Interactive-plots_World_income, Interactive-plots_World_inflation, Interactive-plots_World_life_expec, Interactive-plots_World_total_fer

Afterwards, I drew a violin plot to show the distribution of the values of each feature. I then scaled the data and applied the K-means algorithm, plotting the inertia and the silhouette score for each number of clusters tried.

According to the inertia plot, the optimal number of clusters is 4, since the curve has an "elbow" there. The silhouette score is also high at 4 clusters. Nevertheless, I chose 3 clusters, since with 3 the algorithm better isolates the countries that need more help.
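A sketch of how inertia and silhouette score can be computed over a range of cluster counts is given here. It rests on several assumptions: the data are synthetic blobs rather than the country dataset, `StandardScaler` is assumed as the scaling method (the scaler actually used is not stated above), and the range of `k` values is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine numerical features (not the real data).
X, _ = make_blobs(n_samples=167, n_features=9, centers=3, random_state=0)

# Standardise so that features on different scales contribute equally.
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")
```

Plotting `inertias` against `k` gives the elbow curve, while `silhouettes` gives the second criterion. As in the choice of 3 clusters above, the two metrics only inform the decision, and interpretability of the resulting groups can reasonably outweigh them.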

Next, I created an interactive plot that visualizes the clusters (shown in 3 different colors) more clearly. Both the static and the interactive versions are available below (click on the link below the figure).

Each feature can be restricted to a range of values by clicking on the bar associated with that feature and releasing once the desired range is selected.

Features vs Labels Kmeans: Interactive Plot


Click here to check the interactive plot --> Features vs Labels Kmeans: Interactive Plot

Below, the clusters are plotted on the globe. Each cluster groups countries with similar development conditions.

Kmeans: Needed Help Per Country


Click here to check the interactive plot --> Kmeans: Needed Help Per Country

Finally, I drew a correlation plot highlighting the 3 clusters and showing how they are separated in the feature space.

Kmeans clustering scatterplots


Click here to download the plot --> Kmeans: scatterplots

DBSCAN and Birch were also applied (see the notebook Country_data_clustering_DBSCAN_Birch.ipynb), with the following results:

DBSCAN: Needed Help Per Country


Birch: clustering scatterplots


Note: for the other algorithms, the interactive plots and the other graphs equivalent to those shown for K-means can be found in the notebooks.

DBSCAN labelled a considerable number of countries as outliers, even though different hyperparameters were tested.
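How the number of DBSCAN outliers depends on the hyperparameters can be illustrated with a small sketch. The data here are synthetic, and the `eps` values, `min_samples=5`, and `StandardScaler` are assumptions chosen for the example, not the settings tested in the notebook.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled country features (not the real data).
X, _ = make_blobs(n_samples=167, n_features=9, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

noise_counts = {}
for eps in (0.5, 1.0, 1.5):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X_scaled)
    # DBSCAN marks points that belong to no cluster with the label -1.
    noise_counts[eps] = int((labels == -1).sum())
    n_clusters = len(set(labels) - {-1})
    print(f"eps={eps}: clusters={n_clusters}, outliers={noise_counts[eps]}")
```

For a fixed `min_samples`, enlarging `eps` can only reduce the number of points labelled as noise, so sweeping `eps` like this is one way to see how persistent the outliers are.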

With Birch, the result is similar to that of K-means, apart from a few countries that were assigned to different classes than under K-means.
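One way to quantify how closely two clusterings agree is the adjusted Rand index, which ignores how the clusters happen to be numbered. The sketch below is an illustration on synthetic data, with 3 clusters and default Birch settings assumed. It is not the comparison performed in the notebook.

```python
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled country features (not the real data).
X, _ = make_blobs(n_samples=167, n_features=9, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
birch_labels = Birch(n_clusters=3).fit_predict(X_scaled)

# 1.0 means identical partitions; values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(kmeans_labels, birch_labels)
print(f"Adjusted Rand index between K-means and Birch: {ari:.3f}")
```

A score close to 1 with a handful of disagreeing points would correspond to the situation described above, where only a few countries change class.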

