kmeans's Introduction

C++ implementation of KMeans

K-Means is a simple clustering algorithm, and clustering is a form of unsupervised learning. Given a fixed number of clusters and an input dataset, the algorithm tries to partition the data so that points within the same cluster are similar to each other (high intra-cluster similarity) and points in different clusters are dissimilar (low inter-cluster similarity).

Algorithm

1. Initialize the cluster centers, either randomly within the range of the input data or (recommended) with some of the existing training examples.
2. Repeat until convergence (for example, until the assignments no longer change):
2.1. Assign each datapoint to the closest cluster. The distance between a point and a cluster center is measured using the Euclidean distance.
2.2. Update the current estimate of each cluster center by setting it to the mean of all instances belonging to that cluster.
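
The steps above can be sketched in a few lines of C++. This is a minimal illustration and not necessarily how the code in this repository is written: the names `Point`, `KMeansResult`, `squaredDistance` and `kmeans`, the `maxIterations` cap, and the choice to leave an empty cluster's center unchanged are all assumptions made for the example. The caller supplies the initial centers, for instance a few of the training examples.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias: one datapoint

// Squared Euclidean distance. The square root is skipped because it does
// not change which center is closest.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

struct KMeansResult {
    std::vector<Point> centers;       // final cluster centers
    std::vector<std::size_t> labels;  // cluster index of each datapoint
};

// Hypothetical helper: runs k-means starting from the given initial centers
// (step 1 is left to the caller).
KMeansResult kmeans(const std::vector<Point>& data,
                    std::vector<Point> centers,
                    int maxIterations = 100) {
    assert(!data.empty() && !centers.empty());
    const std::size_t k = centers.size();
    const std::size_t dim = data[0].size();
    std::vector<std::size_t> labels(data.size(), 0);

    for (int iter = 0; iter < maxIterations; ++iter) {
        // Step 2.1: assign each datapoint to its closest center.
        bool changed = false;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double dist = squaredDistance(data[i], centers[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = true;
            }
        }
        // Converged: no assignment changed since the previous iteration.
        if (!changed && iter > 0) break;

        // Step 2.2: move each center to the mean of its assigned points.
        std::vector<Point> sums(k, Point(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            ++counts[labels[i]];
            for (std::size_t d = 0; d < dim; ++d) sums[labels[i]][d] += data[i][d];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // assumption: empty cluster keeps its center
            for (std::size_t d = 0; d < dim; ++d) {
                centers[c][d] = sums[c][d] / static_cast<double>(counts[c]);
            }
        }
    }
    return {centers, labels};
}
```

Calling `kmeans(data, {data[0], data[2]})` would, for example, start from the first and third training examples as initial centers.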

Disadvantages of K-Means

The number of clusters has to be set in the beginning
The results depend on the initial cluster centers
It's sensitive to outliers
It's not suitable for finding non-convex clusters
It's not guaranteed to find a global optimum, so it can get stuck in a local minimum
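
The optimum in question is usually measured as the within-cluster sum of squared Euclidean distances between each point and its assigned center. A common way to soften the dependence on the initial centers is to run the algorithm from several initializations and keep the run with the lowest value. The sketch below only computes that value; the name `withinClusterSumOfSquares` and the `Point` alias are assumptions made for the example, not part of this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias: one datapoint

// Hypothetical helper: sum of squared Euclidean distances between each
// datapoint and the center of the cluster it is assigned to.
double withinClusterSumOfSquares(const std::vector<Point>& data,
                                 const std::vector<Point>& centers,
                                 const std::vector<std::size_t>& labels) {
    assert(data.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const Point& center = centers[labels[i]];
        for (std::size_t d = 0; d < data[i].size(); ++d) {
            const double diff = data[i][d] - center[d];
            total += diff * diff;
        }
    }
    return total;
}
```

Lower values mean tighter clusters, so two runs with the same number of clusters can be compared directly on this number.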

C++ implementation

C++ code


kmeans's Issues

DEFUSE

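Hello, I recently came across your paper "Asymptotically Unbiased Estimation for Delayed Feedback Modeling via Label Correction" and thought it was quite good. I am also working on delayed feedback modeling. I wanted to leave a comment on the DEFUSE project, but issues are not enabled there, so I am posting here instead. Would you be open to exchanging ideas on delayed feedback modeling?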
