Comments (3)
The error is what it says on the tin: you can’t use a sparse input.
You need to convert the sparse input to a normal (dense) format. Usually you can do this with X.todense() if it’s a scipy sparse array.
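A minimal sketch of that conversion, assuming the input is a SciPy CSR matrix (the toy matrix below is made up for illustration). For SciPy sparse matrices, todense() returns a numpy.matrix, while toarray() returns a plain numpy.ndarray, which is usually the safer thing to hand to an estimator. The dense copy needs rows × columns × itemsize bytes, so it is worth checking that number before converting a large matrix.

```python
import numpy as np
from scipy import sparse

# Toy sparse input (illustrative only, not the data from this thread).
X_sparse = sparse.csr_matrix(np.array([
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
]))

# Estimate the memory the dense copy will need before converting.
n_rows, n_cols = X_sparse.shape
dense_bytes = n_rows * n_cols * X_sparse.dtype.itemsize

# toarray() gives a regular ndarray; todense() would give a numpy.matrix here.
X_dense = X_sparse.toarray()

print(type(X_dense).__name__, X_dense.shape, dense_bytes)
```

X_dense can then be passed to the clustering estimator in place of the sparse input.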
from k-means-constrained.
OK, I did this. It took 16 hours with 100 iterations. The clusters came out almost perfectly even: nearly every cluster has the same number of points, equal to the assigned minimum.
The native scikit-learn k-means gave me very unevenly distributed clusters. I know that k-means should give roughly the same number of individuals in each cluster, but compared with the original, shouldn't the constrained result be something in between: not so uneven, but also not so even?
sklearn.cluster k-means, 100 iterations:
1 | 1764
2 | 872
3 | 2019
4 | 5183
5 | 1956
6 | 1388
7 | 1588
8 | 2241
9 | 3476
10 | 2017
11 | 869
12 | 3238
13 | 3637
14 | 2970
15 | 1362
16 | 4002
17 | 1894
18 | 5300
19 | 2672
20 | 3289
21 | 2353
22 | 68407
23 | 2752
24 | 1349
25 | 5436
k_means_constrained KMeansConstrained, 100 iterations:
min_clus = 5280
max_clus = 13202
1 | 5280
2 | 5280
3 | 5280
4 | 5280
5 | 5280
6 | 5304
7 | 5280
8 | 5280
9 | 5280
10 | 5280
11 | 5280
12 | 5280
13 | 5280
14 | 5280
15 | 5280
16 | 5280
17 | 5280
18 | 5280
19 | 5280
20 | 5280
21 | 5280
22 | 5280
23 | 5280
24 | 5280
25 | 5280
The cluster distribution depends on the data distribution; there are no guarantees for either normal k-means or constrained k-means. The input data drives the output cluster distribution.
In practice, from experience, k-means will usually produce a power-law distribution of cluster sizes (which your output above appears to have), and constrained k-means will produce a more uniform one.
As your k-means-constrained output cluster distribution is completely uniform, this indicates you need a lower minimum cluster size and possibly more clusters to better represent your data. But this decision depends on your use case.
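The arithmetic behind that advice can be checked from the numbers posted above. The sketch below uses only the reported constrained cluster sizes and min_clus; the variable names and the lower minimum of 1000 are hypothetical, chosen for illustration. It shows that the minimum size multiplied by the number of clusters already accounts for almost every point, so the solver has very little room to produce anything but a uniform result.

```python
# Cluster sizes reported for KMeansConstrained in this thread.
constrained_sizes = [5280] * 25
constrained_sizes[5] = 5304  # cluster 6 is the only one above the minimum

n_samples = sum(constrained_sizes)
n_clusters = len(constrained_sizes)
size_min = 5280  # min_clus as posted

# Points left over once every cluster has received its minimum.
slack = n_samples - n_clusters * size_min
at_minimum = sum(1 for size in constrained_sizes if size == size_min)

# Hypothetical lower minimum, for comparison only.
lower_size_min = 1000
lower_slack = n_samples - n_clusters * lower_size_min

print(n_samples, slack, at_minimum, lower_slack)
```

With the posted settings only 24 of the 132024 points are free to go wherever the data pulls them, which matches the single cluster of 5304. A lower minimum leaves far more points unconstrained; which value is appropriate still depends on the use case.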