akhilreddy1795 / clustering_models

Repo for the data science project on clustering models

Jupyter Notebook 100.00%

data-science clustering-algorithm kmeans-clustering heirarchical-clustering seaborn python3 pandas-dataframe exploratory-data-analysis dataanalysis

clustering_models's Introduction

Data Science Clustering Models: Project Overview

Performed data cleaning operations
Treated outliers and missing values
Determined the number of clusters using the silhouette score
Built models with both K-means clustering and hierarchical clustering and compared the results

Code and Resources Used

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, scipy

Data Cleaning

Data quality checks on columns such as exports and imports
Rounded numbers to 2 decimal places for ease of analysis
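
A minimal sketch of the two cleaning steps above, assuming a pandas DataFrame with `exports` and `imports` columns. The sample rows and the specific checks (missing and negative counts) are illustrative assumptions, not taken from the notebook.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook loads its own dataset.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "exports": [10.12345, 55.5, 28.98765],
    "imports": [44.98765, 31.25, 17.33333],
})

numeric_cols = ["exports", "imports"]

# Simple quality report: dtype, missing values, and negative values per column.
quality_report = pd.DataFrame({
    "dtype": df[numeric_cols].dtypes.astype(str),
    "n_missing": df[numeric_cols].isna().sum(),
    "n_negative": (df[numeric_cols] < 0).sum(),
})
print(quality_report)

# Round the numeric columns to 2 decimal places.
df[numeric_cols] = df[numeric_cols].round(2)
print(df)
```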

EDA

Checked for missing values
Identified outliers and capped upper-tail values at the 99th percentile
Univariate and bivariate analysis
Burundi has low income according to the preliminary analysis and may be a country in dire need of aid
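
The missing-value check and the 99th-percentile capping could look roughly like the sketch below. The synthetic data and the choice of columns to cap (`income`, `gdpp`) are assumptions for illustration; the notebook may cap a different set of columns.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.lognormal(mean=9, sigma=1, size=200),
    "gdpp": rng.lognormal(mean=8, sigma=1.2, size=200),
})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Cap only the upper tail: values above the 99th percentile are set to it.
cols_to_cap = ["income", "gdpp"]
upper = df[cols_to_cap].quantile(0.99)
capped = df.copy()
capped[cols_to_cap] = df[cols_to_cap].clip(upper=upper, axis=1)

print(capped[cols_to_cap].max())
```

Capping rather than dropping the extreme rows keeps every country in the data, which matters when the goal is to rank all of them for aid.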

Model Building

Before clustering, I computed the Hopkins score to check the data's clustering tendency
Achieved a Hopkins score of 0.96
Scaled the data using StandardScaler from sklearn
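
scikit-learn does not ship a Hopkins statistic, so the sketch below is one common way to compute it, not necessarily the notebook's implementation. It assumes the convention in which values near 1 indicate strong clustering tendency and values near 0.5 indicate roughly uniform data. The function name `hopkins_statistic`, its parameters, and the synthetic two-blob data are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def hopkins_statistic(X, sample_frac=0.1, seed=0):
    """Hopkins statistic using plain nearest-neighbour distances (a sketch)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # u: distance from uniform random points in the bounding box
    #    to their nearest real data point.
    uniform_pts = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = nn.kneighbors(uniform_pts, n_neighbors=1)[0][:, 0]

    # w: distance from sampled real points to their nearest *other* real point
    #    (the first neighbour returned is the point itself).
    idx = rng.choice(n, size=m, replace=False)
    w = nn.kneighbors(X[idx], n_neighbors=2)[0][:, 1]

    return u.sum() / (u.sum() + w.sum())


# Synthetic data with two well-separated groups (assumption, for illustration).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(100, 2)),
])

# Standardise each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

h = hopkins_statistic(X_scaled)
print(f"Hopkins statistic: {h:.2f}")
```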

Started building the model with K-means clustering, using the silhouette score to determine the number of clusters

From the elbow curve, I concluded that k = 4 should be used for my model
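
As a rough illustration of how k can be chosen, the sketch below computes both the K-means inertia (the quantity plotted on an elbow curve) and the silhouette score over a range of k. The synthetic four-blob data, the range of k, and the `random_state` values are assumptions; the notebook's own data and settings may differ.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with four well-separated groups (assumption, for illustration).
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

rows = []
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    rows.append({
        "k": k,
        "inertia": model.inertia_,  # within-cluster sum of squares (elbow curve)
        "silhouette": silhouette_score(X_scaled, model.labels_),
    })

results = pd.DataFrame(rows).set_index("k")
print(results)

# The k with the highest silhouette score is one reasonable choice.
best_k = int(results["silhouette"].idxmax())
print("k with highest silhouette score:", best_k)
```

Inertia always decreases as k grows, so the elbow is read off visually, while the silhouette score gives a single number per k that can be compared directly.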

Model Performance

Plotted the clusters for GDPP vs. Child_mortality and noted the clusters with high mortality rates, as these are the countries that need immediate aid
Differentiated clusters based on child_mortality, income, and gdpp
Cluster 3 contains the countries that should be prioritized for funding.
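
The cluster profiling and the K-means vs. hierarchical comparison might be sketched as follows. The data is synthetic, and both the Ward linkage and the use of the adjusted Rand index to compare the two labelings are assumptions, not confirmed choices from the notebook. Cluster label numbers are arbitrary, so the priority cluster is picked by its mean child mortality rather than by a fixed label.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic countries in four groups (assumption, for illustration only).
rng = np.random.default_rng(42)
groups = [  # (child_mortality mean, income mean, gdpp mean)
    (90.0, 1500.0, 700.0),
    (40.0, 6000.0, 3000.0),
    (15.0, 20000.0, 12000.0),
    (5.0, 45000.0, 40000.0),
]
frames = [
    pd.DataFrame({
        "child_mortality": rng.normal(cm, 3.0, 30),
        "income": rng.normal(inc, inc * 0.05, 30),
        "gdpp": rng.normal(gdp, gdp * 0.05, 30),
    })
    for cm, inc, gdp in groups
]
df = pd.concat(frames, ignore_index=True)

features = ["child_mortality", "income", "gdpp"]
X_scaled = StandardScaler().fit_transform(df[features])

# Fit both models with the same number of clusters.
df["kmeans_cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
df["hier_cluster"] = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X_scaled)

# Profile each K-means cluster by the mean of the three indicators.
profile = df.groupby("kmeans_cluster")[features].mean()
print(profile.round(2))

# The cluster with the highest mean child mortality is the aid-priority candidate.
priority_cluster = profile["child_mortality"].idxmax()
print("Priority cluster label:", priority_cluster)

# Agreement between the two labelings (1.0 means identical partitions).
ari = adjusted_rand_score(df["kmeans_cluster"], df["hier_cluster"])
print(f"Adjusted Rand index, K-means vs. hierarchical: {ari:.2f}")
```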
