
Comments (3)

joshlk commented on July 20, 2024

The error is what it says on the tin: you can’t use a sparse input.

You need to convert the sparse input to a normal (dense) format. Usually you can do this with X.todense() if it’s a SciPy sparse matrix.
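
As a minimal sketch of that conversion, assuming a small toy matrix in place of the real data (the names X_sparse and X_dense are illustrative only): toarray() is used here because it returns a plain NumPy array, whereas todense() returns a numpy.matrix. The dense copy stores every zero explicitly, so it is worth estimating its size before converting a large input.

```python
import numpy as np
from scipy import sparse

# Toy sparse matrix standing in for the real feature matrix (hypothetical data).
X_sparse = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]))

# toarray() returns a plain numpy.ndarray; todense() returns a numpy.matrix.
X_dense = X_sparse.toarray()

# A dense array stores every entry, zeros included, so estimate the
# memory it needs before converting a large matrix.
n_rows, n_cols = X_sparse.shape
dense_bytes = n_rows * n_cols * X_sparse.dtype.itemsize

print(type(X_dense).__name__, X_dense.shape, dense_bytes)
```

The dense array can then be passed to the clustering estimator in place of the sparse input.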

from k-means-constrained.

ericjardimx commented on July 20, 2024

OK, I've done this. It took 16 hours with 100 iterations. The clusters were all evenly distributed, with almost the same number of points in each cluster, equal to the assigned minimum.

The native scikit-learn k-means gave me very unevenly distributed clusters. I know that k-means should give an approximately equal number of individuals in each cluster, but compared with the original, shouldn't the constrained version be something in between: not so unevenly distributed, but also not so even?

sklearn.cluster k-means, 100 iterations:

1 | 1764
2 | 872
3 | 2019
4 | 5183
5 | 1956
6 | 1388
7 | 1588
8 | 2241
9 | 3476
10 | 2017
11 | 869
12 | 3238
13 | 3637
14 | 2970
15 | 1362
16 | 4002
17 | 1894
18 | 5300
19 | 2672
20 | 3289
21 | 2353
22 | 68407
23 | 2752
24 | 1349
25 | 5436

k_means_constrained KMeansConstrained, 100 iterations:

min_clus = 5280
max_clus = 13202

1 | 5280
2 | 5280
3 | 5280
4 | 5280
5 | 5280
6 | 5304
7 | 5280
8 | 5280
9 | 5280
10 | 5280
11 | 5280
12 | 5280
13 | 5280
14 | 5280
15 | 5280
16 | 5280
17 | 5280
18 | 5280
19 | 5280
20 | 5280
21 | 5280
22 | 5280
23 | 5280
24 | 5280
25 | 5280


joshlk commented on July 20, 2024

The cluster distribution depends on the data distribution; there are no guarantees for either normal k-means or constrained k-means. The input data drives the output cluster distribution.

In my experience, k-means will usually produce a power-law distribution of cluster sizes (which your k-means output above appears to have), and constrained k-means will produce a more uniform one.

As your k-means-constrained output cluster distribution is completely uniform, this indicates you need a lower minimum cluster size and possibly more clusters to better represent your data. But this decision depends on your use case.
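
A quick arithmetic check, using only the sizes posted above, suggests why the output is so uniform: the minimum size multiplied by the number of clusters already accounts for nearly every point, so the constraint leaves almost nothing to distribute freely. The variable names below are illustrative only.

```python
# Cluster sizes copied from the KMeansConstrained output in this thread.
sizes = [5280] * 25
sizes[5] = 5304  # cluster 6 is the only one above the minimum

size_min = 5280  # min_clus from the thread
n_points = sum(sizes)
n_clusters = len(sizes)

# Points the minimum-size constraint commits regardless of the data.
forced = size_min * n_clusters
# Points left over that can be placed freely.
slack = n_points - forced

print(n_points, forced, slack)
```

With only 24 of 132,024 points unconstrained, the minimum effectively dictates the result; a lower minimum would leave more room for the data to shape the cluster sizes.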

