shiivashaakeri / clustering-kmeans-from-scratch

Clustering using k-means from Scratch

This is a from-scratch Python implementation of the k-means clustering algorithm. It groups data points into K clusters, using Euclidean distance to measure how far each point is from each cluster centroid.
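
Each k-means iteration alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. A minimal NumPy sketch of those two steps follows. It illustrates standard Lloyd's iterations and is not the repository's KMeans class; the function names assign_clusters and update_centroids are hypothetical, and leaving an empty cluster's centroid where it was is only one of several possible ways to handle empty clusters.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each row of X."""
    # Broadcast (n, 1, d) - (1, K, d) -> (n, K, d), then take the norm over the feature axis.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def update_centroids(X, labels, centroids):
    """Mean of each cluster's points; an empty cluster keeps its old centroid (assumption)."""
    new = centroids.astype(float)  # astype returns a copy, so the input is not modified
    for k in range(len(centroids)):
        members = X[labels == k]
        if len(members) > 0:
            new[k] = members.mean(axis=0)
    return new


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = assign_clusters(X, centroids)
print(labels.tolist())     # [0, 0, 1, 1]
centroids = update_centroids(X, labels, centroids)
print(centroids.tolist())  # [[0.0, 0.5], [10.0, 10.5]]
```

Repeating these two steps until the assignments stop changing (or an iteration cap is reached) is the whole algorithm.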

Getting Started

Install the required packages (scikit-learn is published on PyPI as scikit-learn and imported as sklearn):

pip install numpy pandas matplotlib scikit-learn

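Import the KMeans class from this repository's kmeans module (not scikit-learn's sklearn.cluster.KMeans, which has a different interface):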

from kmeans import KMeans

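Load a dataset (this example uses the Iris dataset bundled with scikit-learn):

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # feature matrix, shape (150, 4)
Y = iris.target  # true species labels; not passed to the model below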
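
The clustering only needs the feature matrix X, a 2-D array with one row per sample (for Iris, 150 rows of 4 features). Assuming the repository's KMeans.fit accepts any such NumPy array (the README only demonstrates Iris), a synthetic stand-in with a known grouping can be generated as below; the blob centers and spread are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Three hypothetical Gaussian blobs in 4-D, 50 points each, mimicking the shape of iris.data.
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [0, 5, 0, 5]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 4)) for c in centers])
# Ground-truth blob index for each row (0 for the first 50 rows, then 1, then 2).
Y = np.repeat(np.arange(len(centers)), 50)
print(X.shape, Y.shape)  # (150, 4) (150,)
```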

Usage

Instantiate the KMeans class with the number of clusters, K, and the maximum number of iterations, maxIter:

model = KMeans(K=5, maxIter=150)

Fit the model to the data:

hist = model.fit(X)
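
The README does not document what fit returns in hist. Because it is plotted as a convergence curve against its index, one plausible reading is one objective value per iteration. The sketch below, which is not the repository's code, runs Lloyd's iterations and records the within-cluster sum of squared Euclidean distances after each one, stopping early once the assignments stop changing. The name fit_kmeans, its seed parameter, and the random choice of K data points as initial centroids are all assumptions.

```python
import numpy as np


def fit_kmeans(X, K, max_iter=150, seed=0):
    """Lloyd's k-means sketch. Returns (centroids, labels, hist), where hist[t]
    is the within-cluster sum of squared Euclidean distances after iteration t."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as K distinct data points chosen at random (an assumption).
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    hist = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for k in range(K):
            if np.any(new_labels == k):
                centroids[k] = X[new_labels == k].mean(axis=0)
        # Record the objective using the updated centroids.
        hist.append(float(((X - centroids[new_labels]) ** 2).sum()))
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the centroids are unchanged too
        labels = new_labels
    return centroids, labels, hist


# Usage on two synthetic blobs of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
centroids, labels, hist = fit_kmeans(X, K=2)
print(len(hist), round(hist[-1], 3))
```

Under these assumptions each recorded value is no larger than the one before it, which is why such a curve flattens out as the algorithm converges. The result still depends on the initial centroids, so k-means can settle in a local optimum; running it from several random starts and keeping the lowest final objective is a common remedy.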

Plot the convergence of the algorithm:

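import matplotlib.pyplot as plt

# Plot each entry of hist against its index in the history
plotX = list(range(len(hist)))
plt.plot(plotX, hist)
# Label roughly ten evenly spaced positions on the x-axis
xTicks = [plotX[int((len(plotX) - 1) / 10 * i)] for i in range(10)]
plt.xticks(xTicks)
plt.show()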
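
The tick expression in the plotting snippet picks ten positions with i running from 0 to 9, so the last labeled tick sits about 90% of the way along the x-axis and the final iteration is never labeled; for histories of ten or fewer entries some ticks also repeat. If you want both ends labeled, a helper along these lines could be used instead (iteration_ticks is a hypothetical name, not part of the repository):

```python
import numpy as np


def iteration_ticks(n_iters, max_ticks=11):
    """Evenly spaced integer tick positions over 0..n_iters-1, including both ends."""
    if n_iters <= 0:
        return []
    # num <= n_iters keeps the spacing >= 1, so the rounded ticks are all distinct.
    positions = np.linspace(0, n_iters - 1, num=min(n_iters, max_ticks))
    return np.round(positions).astype(int).tolist()


print(iteration_ticks(5))  # [0, 1, 2, 3, 4]
ticks = iteration_ticks(150)
print(ticks[0], ticks[-1])  # 0 149
```

With matplotlib, it would be called as plt.xticks(iteration_ticks(len(hist))).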

Get the mean point and variance for each cluster:

meanDists, variances = model.getMetrics()
for i in range(len(meanDists)):
    print('\nFor cluster %d:' % i)
    print('Mean Point:', meanDists[i])
    print('Variance:', variances[i])
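
The README does not define how getMetrics computes its variance. As a hedged illustration, the sketch below treats a cluster's mean point as the average of its members and its variance as the mean squared Euclidean distance from those members to the mean; the repository might instead report per-feature variances. cluster_metrics is a hypothetical helper, and its 'Empty Cluster' string for empty clusters is chosen to match the str check used in the intra-to-inter snippet.

```python
import numpy as np


def cluster_metrics(X, labels, K):
    """Per-cluster mean point and variance, returned as two lists of length K.

    Assumption: variance = mean squared Euclidean distance of the cluster's
    points to their mean. Empty clusters get the string 'Empty Cluster'.
    """
    means, variances = [], []
    for k in range(K):
        members = X[labels == k]
        if len(members) == 0:
            means.append('Empty Cluster')
            variances.append('Empty Cluster')
            continue
        mu = members.mean(axis=0)
        means.append(mu)
        # Squared distance of each member to the mean, averaged over members.
        variances.append(float(((members - mu) ** 2).sum(axis=1).mean()))
    return means, variances


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
means, variances = cluster_metrics(X, labels, K=3)
print(means[0].tolist(), variances[0])  # [1.0, 0.0] 1.0
print(variances[2])                     # Empty Cluster
```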

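Get the intra-to-inter ratio for each cluster (its intra-cluster distance divided by its inter-cluster distance; lower values indicate a cluster that is compact relative to its separation from the others). The loop reuses meanDists from the previous step to detect empty clusters: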

interClusterDistances = model.getInterClusterDistances()
intraClusterDistances = model.getIntraClusterDistances()
for i in range(len(meanDists)):
    print('\nFor cluster %d:' % i)
    # getMetrics() returns a string instead of a mean point for an empty cluster
    if isinstance(meanDists[i], str):
        print('Intra-to-Inter Ratio: Empty Cluster')
    else:
        print('Intra-to-Inter Ratio:', intraClusterDistances[i] / interClusterDistances[i])
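
The README does not spell out how getIntraClusterDistances and getInterClusterDistances are defined. One common pair of definitions, assumed here and possibly different from the repository's, is: intra = mean Euclidean distance from a cluster's points to its centroid, and inter = distance from that centroid to the nearest other centroid. intra_inter_ratios is a hypothetical helper that returns None for empty clusters and needs at least two centroids.

```python
import numpy as np


def intra_inter_ratios(X, labels, centroids):
    """Per-cluster ratio of intra- to inter-cluster distance (None for empty clusters).

    Assumed definitions (the repository may differ):
      intra = mean Euclidean distance from the cluster's points to its centroid
      inter = Euclidean distance from its centroid to the nearest other centroid
    Requires at least two centroids.
    """
    ratios = []
    for k in range(len(centroids)):
        members = X[labels == k]
        if len(members) == 0:
            ratios.append(None)
            continue
        intra = np.linalg.norm(members - centroids[k], axis=1).mean()
        others = np.delete(centroids, k, axis=0)  # every centroid except cluster k's
        inter = np.linalg.norm(others - centroids[k], axis=1).min()
        ratios.append(float(intra / inter))
    return ratios


X = np.array([[0.0, 1.0], [0.0, -1.0], [10.0, 1.0], [10.0, -1.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
print(intra_inter_ratios(X, labels, centroids))  # [0.1, 0.1]
```

Here each cluster's points sit 1 unit from their centroid while the centroids are 10 units apart, giving a ratio of 0.1 for both clusters.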
