Comments (13)

lachhebo avatar lachhebo commented on May 17, 2024 1

I can, but I'm more interested in the internal validity of my clusters.

My plan is to use the clustering produced by the SOM to estimate the number of clusters, and possibly to feed this unsupervised clustering into a supervised model.

from minisom.

JustGlowing avatar JustGlowing commented on May 17, 2024 1

Of course, thanks for using Minisom. Leave a star if you like it!

JustGlowing avatar JustGlowing commented on May 17, 2024

Hi @lachhebo, the quantization error tells you how much information you lose when you quantize your data with the SOM. To give you an idea: if the quantization error is 0, the weights of your network match the original data exactly. To know whether the SOM is reliable, you have to test it on your specific application.
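
To make the definition concrete, here is a minimal standalone sketch, assuming the quantization error is the average Euclidean distance between each sample and its closest weight vector. MiniSom provides its own `quantization_error` method; the function, the toy data and the two codebooks below are made up for illustration only.

```python
from math import dist


def quantization_error(data, codebook):
    """Mean distance from each sample to its closest codebook vector."""
    return sum(min(dist(x, w) for w in codebook) for x in data) / len(data)


# Toy 2-D samples (hypothetical data).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A codebook that reproduces every sample exactly: no information is lost.
exact = list(data)

# A coarser codebook with a single vector: two samples are one unit away.
coarse = [(0.0, 0.0)]

print(quantization_error(data, exact))   # 0.0
print(quantization_error(data, coarse))  # mean of the distances 0, 1 and 1
```

A low value only says the weights sit close to the data; it does not say the map is useful for a given task, which is why it still has to be tested on the application.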

lachhebo avatar lachhebo commented on May 17, 2024

In my case, I'm trying to estimate the number of clusters in a dataset.

What I'm thinking of doing is splitting my dataset in two: train and test.
Then I would train my SOM on the training set, optimising the quantization error.
Finally, I would compare the distance map of my SOM to the activation frequencies of the test set.

Do you think this is the way to get a SOM that is as reliable as possible?

JustGlowing avatar JustGlowing commented on May 17, 2024

Is your data labeled?

lachhebo avatar lachhebo commented on May 17, 2024

Yes, it is

JustGlowing avatar JustGlowing commented on May 17, 2024

Then you can compare the clusters you obtain with your labels.
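
One simple way to do that comparison is cluster purity: the fraction of samples that carry the majority label of their cluster. This is only a sketch of one possible measure, not something MiniSom provides; the function name and the toy cluster ids and labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Fraction of samples whose label is the majority label of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    # For each cluster, count how many samples carry its most common label.
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority / len(labels)


# Hypothetical assignments: cluster 0 mixes two labels, cluster 1 is pure.
cluster_ids = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]

print(cluster_purity(cluster_ids, labels))  # 5 of 6 samples match their cluster's majority
```

Purity tends to rise as the number of clusters grows, so it is most meaningful when comparing solutions with the same number of clusters.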

JustGlowing avatar JustGlowing commented on May 17, 2024

Then you can use a cluster quality measure. There are many; this is one example: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html

lachhebo avatar lachhebo commented on May 17, 2024

IMHO, applying the silhouette score directly to the clustering produced by the SOM is not pertinent: many nodes are next to each other, so the silhouette score will be low. The correct number of clusters is probably lower than the number of nodes.

JustGlowing avatar JustGlowing commented on May 17, 2024

It depends on how you derive your clusters. I usually recommend using small maps and assuming that each position in the map gives you a cluster. For example, a 2-by-2 map will give you 4 clusters. This way the silhouette score is suitable.
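
As a rough sketch of the "one map position, one cluster" idea: each sample's winning position (row, column) can be flattened into a single integer label. In practice the positions would come from calling `som.winner(x)` on each sample; here they are hard-coded, and the helper name is hypothetical.

```python
def flat_cluster_labels(winners, n_cols):
    """Turn (row, col) map positions into integer cluster labels."""
    return [row * n_cols + col for row, col in winners]


# Hypothetical winning positions of five samples on a 2-by-2 map.
winners = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1)]

labels = flat_cluster_labels(winners, n_cols=2)
print(labels)  # [0, 1, 2, 3, 1]
```

The resulting label list has at most 4 distinct values on a 2-by-2 map and can be passed, together with the data, to a cluster quality measure such as the silhouette score.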

lachhebo avatar lachhebo commented on May 17, 2024

It will work, but I will get a higher quantization error, and a simpler algorithm like Affinity Propagation will probably do as well in this case.

I think it's better to use a bigger map with a lower quantization error, then try to interpret the distance map and see if it is reliable.

lachhebo avatar lachhebo commented on May 17, 2024

Thanks for your time and your work. It is a great package and I have already starred it!

JustGlowing avatar JustGlowing commented on May 17, 2024

Anyway, to go back to your initial question: you need to tune the SOM to get the quantization error that you want. More clusters means a lower quantization error. The best solution depends only on how many clusters there are in your data.
