Comments (13)
I can, but I'm more interested in the internal validity of my clusters.
My plan is to use the clustering produced by the SOM to assess the number of clusters, and maybe to use this unsupervised clustering in a supervised model.
from minisom.
Of course, thanks for using Minisom. Leave a star if you like it!
from minisom.
Hi @lachhebo, the quantization error simply tells you how much information you lose when you quantize your data with the SOM. To give you an idea: if the quantization error is 0, the weights of the winning neurons match the original data exactly. To know if the SOM is reliable, you have to test it for your specific application.
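To make the definition concrete, here is a minimal NumPy sketch of the quantization error as the average distance between each sample and its closest weight vector. The function name, the toy data and the two codebooks are made up for illustration; in MiniSom itself you would call `som.quantization_error(data)` on a trained map.

```python
import numpy as np

def quantization_error(data, weights):
    """Mean distance between each sample and its closest codebook vector.

    data:    array of shape (n_samples, n_features)
    weights: array of shape (rows, cols, n_features), the SOM codebook
    """
    codebook = weights.reshape(-1, weights.shape[-1])
    # Pairwise Euclidean distances, shape (n_samples, rows * cols).
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

# Toy data (an assumption for this example): the corners of the unit square.
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# A 2x2 codebook whose vectors coincide with the samples: nothing is lost.
exact = data.reshape(2, 2, 2)
# A 1x1 codebook sitting at the centroid: every sample is sqrt(0.5) away.
coarse = np.array([[[0.5, 0.5]]])

print(quantization_error(data, exact))
print(quantization_error(data, coarse))
```

The first codebook reproduces the data exactly, so its error is zero; the single-node codebook loses the most information.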
from minisom.
In my case, I'm trying to assess the number of clusters in a dataset.
What I'm thinking of doing is splitting my dataset in two: train and test.
Then I would train my SOM on the training set, optimising the quantization error.
Finally, I would compare the distance map of my SOM to the activation frequencies of the test set.
Do you think this is the way to get a SOM that is as reliable as possible?
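A rough sketch of the activation-frequency part of this plan, in plain NumPy. The `hit_counts` helper, the two-blob data and the hand-written 1x3 `weights` array are all assumptions standing in for a trained SOM; with MiniSom, `som.activation_response(data)` gives the per-node counts.

```python
import numpy as np

def hit_counts(data, weights):
    """Count how many samples have each node as their best matching unit."""
    rows, cols, _ = weights.shape
    codebook = weights.reshape(rows * cols, -1)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=-1)
    winners = dists.argmin(axis=1)
    return np.bincount(winners, minlength=rows * cols).reshape(rows, cols)

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated blobs.
blob_a = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(100, 2))
data = rng.permutation(np.vstack([blob_a, blob_b]))
train, test = data[:150], data[150:]

# Stand-in for a trained 1x3 SOM: two nodes on the blobs, one in the empty gap.
weights = np.array([[[0.0, 0.0], [2.5, 2.5], [5.0, 5.0]]])

# Fraction of samples won by each node, on each split.
train_freq = hit_counts(train, weights) / len(train)
test_freq = hit_counts(test, weights) / len(test)
print(train_freq)
print(test_freq)
```

If the map generalises, nodes that are rarely hit on the training set (here the one in the gap) should also be rarely hit on the test set, and those are typically the nodes that show up as boundaries in the distance map.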
from minisom.
Is your data labeled?
from minisom.
Yes, it is
from minisom.
Then you can compare the clusters you obtain with your labels.
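One simple way to do that comparison is a contingency table plus a purity score. This is only a sketch under my own assumptions: `contingency`, `purity` and the toy `clusters` / `labels` arrays are hypothetical, and purity is just one of several possible measures.

```python
import numpy as np

def contingency(clusters, labels):
    """Rows are cluster ids, columns are class labels, cells are counts."""
    cluster_ids, c_idx = np.unique(clusters, return_inverse=True)
    label_ids, l_idx = np.unique(labels, return_inverse=True)
    table = np.zeros((len(cluster_ids), len(label_ids)), dtype=int)
    # Add 1 to the (cluster, label) cell of every sample.
    np.add.at(table, (c_idx, l_idx), 1)
    return table

def purity(clusters, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    table = contingency(clusters, labels)
    return table.max(axis=1).sum() / table.sum()

# Toy example: cluster ids (e.g. one per SOM node) and the known labels.
clusters = np.array([0, 0, 0, 1, 1, 1, 2, 2])
labels = np.array(["a", "a", "b", "b", "b", "b", "c", "c"])

table = contingency(clusters, labels)
score = purity(clusters, labels)
print(table)
print(score)
```

Keep in mind that purity rises as the number of clusters grows, so it is more useful for comparing maps of the same size than for choosing the size.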
from minisom.
Then you can use a cluster quality measure. There are many; here is one example: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
from minisom.
IMHO, using the silhouette score directly on the clustering produced by the SOM is not appropriate: many nodes lie next to each other, so the silhouette score will be low. The correct number of clusters is probably smaller than the number of nodes.
from minisom.
It depends on how you derive your clusters. I usually recommend using small maps and assuming that each position in the map gives you a cluster. For example, a 2-by-2 map will give you 4 clusters. This way the silhouette score is suitable.
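As a sketch of the small-map idea: treat each node of a 2-by-2 map as one cluster and score the result with the mean silhouette coefficient. Everything here is illustrative: the hand-written `weights` stand in for a trained map, and `mean_silhouette` is a plain NumPy version assuming Euclidean distance (the linked `sklearn.metrics.silhouette_score` is the function you would normally use).

```python
import numpy as np

def winner_labels(data, weights):
    """One cluster id per sample: the flat index (row * cols + col) of its winner."""
    rows, cols, _ = weights.shape
    codebook = weights.reshape(rows * cols, -1)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def mean_silhouette(data, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    ids = np.unique(labels)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = dists[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other cluster.
        b = min(dists[i, labels == k].mean() for k in ids if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy data: four tight groups near the corners of a 10-by-10 square.
corners = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
data = (corners[:, None, :] + offsets[None, :, :]).reshape(-1, 2)

# Stand-in for a trained 2x2 map with one node per group.
weights = corners.reshape(2, 2, 2)

labels = winner_labels(data, weights)
score = mean_silhouette(data, labels)
print(labels)
print(score)
```

With one node per well-separated group the score is close to 1; on a large map, where neighbouring nodes split the same group, the same computation would give much lower values.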
from minisom.
It will work, but I will get a higher quantization error, and a simpler algorithm like affinity propagation will probably do just as well in this case.
I think it's better to use a bigger map with a lower quantization error, and then try to interpret the distance map and see if it is reliable.
from minisom.
Thanks for your time and your work. It is a great package and I already starred it!
from minisom.
Anyway, to go back to your initial question: you need to tune the SOM to get the quantization error that you desire. More clusters means lower quantization error. The best solution depends only on how many clusters there are in your data.
from minisom.