I'm trying to apply a Constrained K-means to my data and I get this error "NotImplemen

Constrained K-Means not implemented to sparse Matrix (k-means-constrained issue, closed)

Comments (3)

joshlk commented on July 20, 2024

The error is what it says on the tin: you can’t use a sparse input.

You need to convert the sparse input to a normal (dense) format. Usually you can do this by calling X.todense() if it’s a SciPy sparse matrix.
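As a minimal sketch of that conversion, assuming SciPy and NumPy are installed and the data fits in memory once it is dense: toarray() returns a plain NumPy array, whereas todense() on a SciPy sparse matrix returns a numpy.matrix. The helper names to_dense, dense_size_bytes and fit_constrained are made up for illustration; KMeansConstrained and its n_clusters, size_min and size_max parameters come from the k-means-constrained package.

```python
import numpy as np
from scipy import sparse


def dense_size_bytes(X):
    """Estimate the memory needed to hold X as a dense array."""
    rows, cols = X.shape
    return rows * cols * X.dtype.itemsize


def to_dense(X):
    """Return X as a plain 2-D NumPy array, converting it if it is sparse."""
    if sparse.issparse(X):
        return X.toarray()
    return np.asarray(X)


def fit_constrained(X, n_clusters, size_min, size_max):
    """Cluster a possibly sparse X with KMeansConstrained after densifying it."""
    # Imported here so the helpers above work without the package installed.
    from k_means_constrained import KMeansConstrained

    model = KMeansConstrained(
        n_clusters=n_clusters, size_min=size_min, size_max=size_max
    )
    return model.fit_predict(to_dense(X))


if __name__ == "__main__":
    # Tiny illustrative matrix, not the data from this issue.
    X_sparse = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
    print(dense_size_bytes(X_sparse))
    print(type(to_dense(X_sparse)))
```

Checking dense_size_bytes first is worthwhile, because a matrix that is small when sparse can need far more memory once it is dense.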

ericjardimx commented on July 20, 2024

OK, done. It took 16 hours with 100 iterations. The clusters were all evenly distributed, with almost the same number of points in each cluster, equal to the assigned minimum.

The native scikit-learn k-means gave me very unevenly distributed clusters. I know that k-means should give roughly similar numbers of individuals in each cluster, but compared to the original, shouldn't the constrained result be something in between: not so unevenly distributed, but also not so even?

sklearn.cluster k-means, 100 iterations:

1 | 1764
2 | 872
3 | 2019
4 | 5183
5 | 1956
6 | 1388
7 | 1588
8 | 2241
9 | 3476
10 | 2017
11 | 869
12 | 3238
13 | 3637
14 | 2970
15 | 1362
16 | 4002
17 | 1894
18 | 5300
19 | 2672
20 | 3289
21 | 2353
22 | 68407
23 | 2752
24 | 1349
25 | 5436

k_means_constrained KMeansConstrained, 100 iterations:

min_clus = 5280
max_clus = 13202

1 | 5280
2 | 5280
3 | 5280
4 | 5280
5 | 5280
6 | 5304
7 | 5280
8 | 5280
9 | 5280
10 | 5280
11 | 5280
12 | 5280
13 | 5280
14 | 5280
15 | 5280
16 | 5280
17 | 5280
18 | 5280
19 | 5280
20 | 5280
21 | 5280
22 | 5280
23 | 5280
24 | 5280
25 | 5280

joshlk commented on July 20, 2024

The cluster distribution depends on the data distribution; there are no guarantees for either normal k-means or constrained k-means. The input data drives the output cluster distribution.

In practice, from experience, k-means will usually produce a power-law distribution of cluster sizes (which your results above appear to have), and constrained k-means will produce a more uniform one.

As your k-means-constrained output cluster distribution is completely uniform, this indicates you need a lower minimum cluster size and possibly more clusters to better represent your data. But this decision depends on your use case.
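To put numbers on that, here is a small sketch using only the constrained cluster sizes and the min_clus value posted above. The helper slack is a hypothetical name, and the alternative minimum of 1000 is an arbitrary value chosen for comparison, not a recommendation.

```python
# Cluster sizes reported above for KMeansConstrained (25 clusters).
constrained_sizes = [5280] * 5 + [5304] + [5280] * 19
size_min = 5280  # min_clus from the run above


def slack(n_samples, n_clusters, size_min):
    """Points left to place freely once every cluster holds size_min."""
    return n_samples - n_clusters * size_min


n_samples = sum(constrained_sizes)
n_clusters = len(constrained_sizes)
free_points = slack(n_samples, n_clusters, size_min)
print(n_samples, n_clusters, free_points)  # 132024 25 24

# Hypothetical lower minimum, for comparison only.
print(slack(n_samples, n_clusters, 1000))  # 107024
```

With a minimum of 5280, only 24 of the 132024 points are not already claimed by the minimum-size constraint, so the result is forced to be nearly uniform whatever the data looks like. A lower minimum leaves more points free to follow the structure of the data.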
