This repo implements simple k-means clustering.
conda env create -f environment.yml
conda activate kmeans
python test.py -n_clusters 5 -n_points 100
# After the above commands, look into the `result` dir to check the generated result
Time spent on the algorithm implementation: 45 min
Time spent on the visualization code: 1 hour
- `test.py` generates `-n_points` data points from `-n_clusters` 2-dimensional multivariate Gaussian distributions. The parameters of each distribution (mean and covariance) are randomly sampled, but the cross-dimensional covariances are set to 0 for visual tidiness.
- The generated data points and the Gaussian distributions they were sampled from are saved as `result/raw.png` and `result/gaussian.png`, respectively.
- After data generation, `KMeans` runs on the data, and its `labels` and `centroids` are plotted to `result/kmeans.png`.
- To compare `Gaussian` and `KMeans` together, refer to `result/final.png`, which concatenates all of the generated PNGs.
- To check convergence, refer to `result/log.txt`, which logs the `inertia` and `center shift`.
- Capturing snapshots of the centers as they shift across KMeans iterations.
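
The pipeline described above (sample points from Gaussians with zero cross-covariance, run k-means, log inertia and center shift per iteration) can be illustrated with a minimal, standard-library-only sketch. This is not the code in this repo: the function names `sample_gaussians`, `kmeans`, and `sq_dist`, the sampling ranges, the initialisation from random data points, the tolerance, and the definition of center shift as the Euclidean norm of all center displacements are assumptions made for illustration. It also assumes `n_points` is the total number of points, split evenly across the distributions.

```python
import math
import random


def sample_gaussians(n_clusters, n_points, rng):
    """Draw n_points 2-D points from n_clusters axis-aligned Gaussians."""
    # Random mean and per-axis standard deviation for each distribution.
    # The ranges are arbitrary choices for this sketch (assumption).
    params = [
        ((rng.uniform(-10, 10), rng.uniform(-10, 10)),
         (rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)))
        for _ in range(n_clusters)
    ]
    points = []
    for i in range(n_points):
        (mx, my), (sx, sy) = params[i % n_clusters]
        # Zero cross-covariance means x and y can be sampled independently.
        points.append((rng.gauss(mx, sx), rng.gauss(my, sy)))
    return points


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, rng, max_iter=100, tol=1e-6):
    """Lloyd's algorithm; returns labels, centers, and a per-iteration log."""
    centers = rng.sample(points, k)  # initialise from k data points (assumption)
    labels = []
    log = []  # one (inertia, center_shift) tuple per iteration
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])  # leave an empty cluster in place
        # Inertia: sum of squared distances from points to their assigned center.
        inertia = sum(sq_dist(p, new_centers[lab]) for p, lab in zip(points, labels))
        # Center shift: Euclidean norm of all center displacements (assumption).
        shift = math.sqrt(sum(sq_dist(c, n) for c, n in zip(centers, new_centers)))
        log.append((inertia, shift))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers, log


rng = random.Random(0)
points = sample_gaussians(n_clusters=5, n_points=100, rng=rng)
labels, centers, log = kmeans(points, k=5, rng=rng)
for step, (inertia, shift) in enumerate(log):
    print(f"iter {step}: inertia={inertia:.3f} center_shift={shift:.6f}")
```

With this formulation the logged inertia never increases from one iteration to the next, and the run stops once the center shift falls to the tolerance, which is the behaviour one would look for when reading a convergence log. Keeping a copy of `centers` at every iteration would also give the data needed to draw per-iteration snapshots.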