I would like to obtain the activation of every SOM unit for a given input, i.e., not j

PS: If you want to merge it I can set up a pull request, which <code class="notranslat

Activation on the surface of the map,about peterwittek/somoclu

Comments (46)

peterwittek commented on July 18, 2024 1

This activation matrix is calculated explicitly only for the GPU kernel and it is not directly accessible to the user. The solution you are suggestion is flawless, perhaps it is worth merging back to the Python code, although it is bound to be sluggish for large maps and very high dimensional spaces (unless your numpy was compiled with a parallel BLAS/LAPACK).

I don't necessary see what you mean by using D as the input to another SOM. This D is basically nothing but a special Gram matrix.

from somoclu.

peterwittek commented on July 18, 2024 1

I added a new method view_activation_map to study what could possible go wrong (commit aa4d017). As yet, I have no idea.

from somoclu.

peterwittek commented on July 18, 2024 1

I keep forgetting that the codebook entries are initialized randomly between zero and one. This is fine as long as your data is also such. I played with random data initialized somewhere in [-1,1]^2, so the map itself was garbage, the activation map should have been my least problem. So once I work with data in [0,1]^n, the map is fine.

The next issue is that the dot product and the Euclidean distance should generate the same topology. Actually, this is only true if the vectors are normalized (both codebook and data vectors), as I myself pointed out in issue #30 and then duly forgot.

If you factor in these two observations, I think the results of the activation map should be fine. Without normalization, you still have many identical values for the argmax, but otherwise this bit of code generates nice maps and matching activation maps (only one point is labelled to avoid mess):

import somoclu
import numpy as np

c1 = np.random.rand(50, 2)/5
c2 = (0.2, 0.5) + np.random.rand(50, 2)/5
c3 = (0.4, 0.1) + np.random.rand(50, 2)/5
data = np.float32(np.concatenate((c1, c2, c3)))
colors = ["red"] * 50
colors.extend(["green"] * 50)
colors.extend(["blue"] * 50)
n_rows, n_columns = 30, 50
som = somoclu.Somoclu(n_columns, n_rows, data=data, maptype="toroid",
                      gridtype="rectangular")
som.train(epochs=10)
labels = ["" for _ in range(150)]
labels[0] = 0
som.view_umatrix(bestmatches=True, bestmatchcolors=colors, labels=labels)
map_state = som.get_surface_state()
som.view_activation_map(map_state, 0, labels=labels)

from somoclu.

peterwittek commented on July 18, 2024 1

To be on the safe side, I added the CITATION file. Thanks for the hint!

from somoclu.

oliviaguest commented on July 18, 2024 1

Oh, and with respect to my results, the github repo I have written will be made available as will a pre-print and an OSF repo.

from somoclu.

peterwittek commented on July 18, 2024 1

JSS looked like a nice open access journal... Awesome about the repo! I wish everybody else was doing it.

from somoclu.

peterwittek commented on July 18, 2024 1

Many thanks. I made some cosmetic changes and added a bit of documentation. I believe we are following the best practices of open source software development ;-)

from somoclu.

oliviaguest commented on July 18, 2024

I have to admit I have never heard of a Gramian matrix before, so I just looked it up on Wikipedia. I obviously don't fully understand it yet, however maybe reposing my question might help you understand my idea.

So I was wondering if I could create a 2nd SOM, which took the D (the special Gram matrix - in what way is it special by the way?) as input in order to self-organise itself. Is it a meaningful thing to attempt? This would be a hierarchical SOM, similar to, e.g., Hierarchical SOM-Based Detection of Novel Behavior for 3D Human Tracking, by German Ignacio Parisi and Stefan Wermter.

Am I making any sense? 😄

from somoclu.

oliviaguest commented on July 18, 2024

PS: If you want to merge it I can set up a pull request, which .py shall I put the function in and what name would you like it to have? 😉

from somoclu.

peterwittek commented on July 18, 2024

I did not know about this hierarchical variant, but I think you need something different than D. So you have some n/10 nodes, where n is the number of data instances, and you initialize the map with randomly selected vectors of your data instances. Then, if I got it right, they identify the non-outliers, and either the BMU's weight vector, or a non-outlier data instance is used to construct the feature vectors described in Eqs. (2-4) -- I am not sure which. In any case, D does not seem to play a role.

The helper functions go into train.py, preferably as part of the Somoclu object. Many thanks!

from somoclu.

oliviaguest commented on July 18, 2024

OK, sorry that might have confused things more.

I am not trying to build on or replicate their work. I just wanted to use the activation on the map (as the function I will add to somoclu now does). Is your intuition that that will not work?

from somoclu.

peterwittek commented on July 18, 2024

Let me try to understand it. You get an #rows x #columns matrix as the activation map. Then what would be the training data for the next SOM?

from somoclu.

oliviaguest commented on July 18, 2024

It would be trained on the activation for each input on the surface of the map below it.

from somoclu.

oliviaguest commented on July 18, 2024

PR for the above function: #44 (hope that is consistent with the way you have been doing things as much as possible) 😀

from somoclu.

peterwittek commented on July 18, 2024

Thanks, I merged it.

from somoclu.

oliviaguest commented on July 18, 2024

One last silly question: When does np.argmax(map_state, axis=1) (where map_state is the output of get_surface_state) become equal to the BMUs? Sometimes never?

from somoclu.

peterwittek commented on July 18, 2024

I am looking into now. They should always match. An obvious difference is that your get_surface_state calculates a dot product, and not a distance, so that argmax looks suspicious.

from somoclu.

oliviaguest commented on July 18, 2024

Oooh, how else do you locate the most active unit? I must have something conceptually wrong in my head.

from somoclu.

peterwittek commented on July 18, 2024

Okay, the BMUs are fine. If I calculate the activation map with the Euclidean distance, the argmin coincides with the BMU. I forgot why, but in the BMUs, the second index is the row.

from somoclu.

oliviaguest commented on July 18, 2024

That makes absolute sense. Thanks.

from somoclu.

peterwittek commented on July 18, 2024

I still don't get it, though. The argmax on the dot product often gives the same value, even when the BMUs are different. I have to leave it here now, but I'll try to get back to this

from somoclu.

oliviaguest commented on July 18, 2024

Yes, that bit doesn't make sense to me either. Have you managed to get that on a map on your end too?

from somoclu.

oliviaguest commented on July 18, 2024

Thanks so much. 😄

from somoclu.

peterwittek commented on July 18, 2024

Honestly, I quite like this activation map. I just made the plotting a bit smarter, so you can pass an arbitrary vector and see the activation (commit b718f01). For instance, you can do this:

new_vector = np.array([0.2, 0.3])
som.view_activation_map(data_vector=new_vector)

If you have time, it might be worth changing the calculation of the activation to the Euclidean distance.

from somoclu.

oliviaguest commented on July 18, 2024

Yes, that's great. I would really like to explore this more, I am hitting up against a deadline (tomorrow and next week) but will when I have time.

from somoclu.

oliviaguest commented on July 18, 2024

Done! See: #46
This also fixes everything with respect to my initial question!

from somoclu.

peterwittek commented on July 18, 2024

Many thanks. I pushed commit debac27 just now to achieve the same thing for new vectors when viewing the activation map. Well, I still did not catch your original question, but if it fixes things for you, maybe we can close this.

from somoclu.

oliviaguest commented on July 18, 2024

Yeah sure - the reason I/we (on my project) came up with the idea for the surface map will become clear hopefully when I can more freely discuss the model I am working on. But I do agree with you that it is nonetheless useful more generally too.

When the time comes how would you like me to cite you/Somoclu? Tangentially, have you heard of CITATION files?

from somoclu.

peterwittek commented on July 18, 2024

Thanks for asking, this is a bit of a sad story. The paper on Somoclu was accepted for publication ages ago by the Journal of Statistical Software, after a two-year review process. As of today, the editor still refuses to tell anything about the publication date. This is one reason why there is no CITATION file, the other being my lack of awareness.

The manuscript is on arXiv, so as for now, I would like to ask you to cite that version, as referenced in the Readme:

Peter Wittek, Shi Chao Gao, Ik Soo Lim, Li Zhao (2015). Somoclu: An Efficient Parallel Library for Self-Organizing Maps. arXiv:1305.1422.

Then Depsy theoretically scans manuscripts for links to PyPI or CRAN packages, so a link to Somoclu's PyPI page would also be welcome. Unfortunately, Depsy is not doing a great job, at least not yet.

You might want to consider making your result available as a notebook or as a repository. I am ardent supporter of open science, and I believe that making the results of a paper easily reproducible is hugely important. Well, anyhow, no pressure, and thanks again for asking.

from somoclu.

oliviaguest commented on July 18, 2024

That's so sad, can't you un-submit it and send it to a more welcoming journal - perhaps?

from somoclu.

oliviaguest commented on July 18, 2024

I have been using seaborn a lot. And I wonder if this is in any way as useful to you as it is to me. I have been visualising the surface states using (dis)similalrity matrices:

import seaborn as sns
import matplotlib.pylab as plt

def create_similarity_matrix(X, labels, save_as):

    # Calculate the pairwise correlations as a metric for similarity
    corrmat = 1-pairwise_distances(X, metric="correlation")

    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=(12,9))

    # Y axis has inverted labels (seaborn default, no idea why)
    yticklabels = np.atleast_2d(labels)
    yticklabels = np.fliplr(yticklabels)[0]

    # Draw the heatmap using seaborn
    sns.heatmap(corrmat, vmax=1, vmin=-1, square=True,
    xticklabels = labels, yticklabels = labels, cmap = "RdBu_r", center = 0)
    f.tight_layout()

    # This sets the ticks to a readable angle
    plt.yticks(rotation=0)
    plt.xticks(rotation=90)

    # This sets the labels for the two axes
    ax.set_yticklabels(yticklabels, ha = 'right', va='center', size= 8)
    ax.set_xticklabels(labels, ha = 'center', va = 'top', size= 8)

    # Save and close the figure
    plt.savefig(save_as+'_heatmap.png', bbox_inches='tight')
    plt.close()

create_similarity_matrix(som.get_surface_state(), labels, 'state')

from somoclu.

peterwittek commented on July 18, 2024

I love seaborn -- just importing it does magic to figures. I wish it was merged with matplotlib.

I am trying to grasp what this heatmap implies. This would be another way of looking at clustering structures between data points. Is this correct? We struggle a lot with evaluating cluster consistency and shifts between the clusters over evolving maps, so it is indeed useful, thanks. Perhaps you can add it to train.py as a helper function -- not as a method of the Somoclu class -- with minor changes to allow both saving to a file and inline visualization. No pressure, though, only if you have a bit of time.

Risking of being overly pedantic, allow me one tiny comment: scoping in Python is very simplistic. It deletes variables in exactly one specific case: when it returns from a function. So the del at the end of the function definition is unnecessary, at least in principle.

from somoclu.

oliviaguest commented on July 18, 2024

Yeah, sorry about the del was copy pasting in a hurry from a non-function/script to make this into a function for you.

I will have to ask you to look at a paper using (dis)similarity matrices. It's a very Googleable term. You could look at my recent preprint which explains them here. Hopefully that will help. Then please feel free to ask me questions if it's still unclear how that could be applied to SOMs, but it should be easier to understand. Perhaps running it on the example you have in the docs for somoclu might also help clarify the usefulness or indeed lack thereof. I have no idea if you will find it as useful as I do!

Tangent, but I actually presented SOMs connected in a hierarchy last week at a conference and I used these matrices to show their representational spaces.

from somoclu.

oliviaguest commented on July 18, 2024

PS: Clusters in the heatmap would be detected by calculating the pairwise correlations between each pattern's activation on the surface of the map.

from somoclu.

peterwittek commented on July 18, 2024

Wow, and I thought I knew SOMs well. Thanks for the paper. I've seen some work on hierarchical SOMs, but I never made the connection to deep learning. Normally I think of SOMs as the shallowest possible learners, but this hierarchy changes the game. This heatmap is handy indeed.

from somoclu.

oliviaguest commented on July 18, 2024

No, sorry, I'm confusing you because I was being brief.

So that preprint uses DSMs, but not SOMs. It uses convolutional artificial neural networks.

My hierarchical SOMs are not written up yet. Hopefully at some point.

Even though they are a little similar they are not overlapping other than by my presence. Completely different PIs. Two different projects.

from somoclu.

peterwittek commented on July 18, 2024

I think I got it -- the question is more about the representation. It is still very interesting, though.

from somoclu.

oliviaguest commented on July 18, 2024

I really hope it's useful and clear. Otherwise please get back to me. I don't want you spending too much time on it. And thanks for nice comments.

from somoclu.

oliviaguest commented on July 18, 2024

Also I'd just like to say it's been really nice interacting with you as an open source developer who's so positive and welcoming to new ideas and contributions. Definitely get back to me if you want me to incorporate this idea into somoclu.

from somoclu.

oliviaguest commented on July 18, 2024

PS: Tidied up the code above even more. 😄

from somoclu.

peterwittek commented on July 18, 2024

It looks great. Please add it to train.py and send a pull request when you have a moment. I want to make sure you get the credit for it. Then I'll do some minor changes to integrate it nicely with the rest of the module.

from somoclu.

oliviaguest commented on July 18, 2024

My PR is very rough, apologies: #47

from somoclu.

commented on July 18, 2024

I did become interested in the activation patterns, I was especially interested if I would be able to find any activation pattern that has two distinct 'hotzones', this doesn't appear to be possible.

Here are a few collages of 100 activation patterns each from a 44x44 codebook I trained with a toroidal hexagonal lattice:

http://imgur.com/a/yqueJ

Brighter=greater distance between sample and that point in the codemap.

After some consideration I decided that isn't informative enough, here is the inverse (1/distance):

http://imgur.com/a/vQ5Oa

The two are not related, I used random sampling.

Code:

def collage(arr):  
    def unblockshaped(arr, h, w):
        n, nrows, ncols = arr.shape
        return (arr.reshape(h//nrows, -1, nrows, ncols).swapaxes(1,2).reshape(h, w))

    sqrt_lastdim = math.sqrt(arr.shape[0])
    sqrt_ceil = math.ceil(sqrt_lastdim)
    if not sqrt_lastdim.is_integer():
        diff = sqrt_ceil ** 2 - arr.shape[0]
        arr = np.concatenate((arr, np.zeros((diff, arr.shape[1],arr.shape[2]))), axis=0)

    grid = unblockshaped(arr, arr.shape[1]*sqrt_ceil, arr.shape[2]*sqrt_ceil)
    return grid

X = data
W = codebook.reshape((codebook.shape[0] * codebook.shape[1], codebook.shape[2]))
D = -2*np.dot(W, X.T) + (W**2).sum(1)[:, None] + (X**2).sum(1)[:, None].T
D = D.swapaxes(0,1).reshape(-1,codebook.shape[0],codebook.shape[1])

grid = collage(codebook.swapaxes(0,2).swapaxes(1,2))
plt.imshow(grid)
plt.show()

for i in range(10):
    idx = np.random.choice(D.shape[0], 100)
    grid = collage(D[idx])

    plt.close()
    plt.gcf().set_size_inches(4, 4)
    plt.gca().axes.get_xaxis().set_visible(False)
    plt.gca().axes.get_yaxis().set_visible(False)
    plt.gca().set_aspect('equal', adjustable='box')    

    plt.imshow(grid)
    plt.savefig(str(i)+'.png', bbox_inches='tight', pad_inches=0, dpi=256)

I fairly sure I confused the heck out of the x,y axis somewhere, so any non square lattice may not work.

from somoclu.

peterwittek commented on July 18, 2024

This is interesting. I am not all that surprised that you have hot zones all over the place, since a plain, random-initialized SOM can only be expected to preserve the local topology.

Perhaps you can get fewer and more localized hot spots. Initialize the map with PCA, then train it for one epoch with a low radius and a low learning rate. This map will not look great, since in one epoch, the training phase does not have a chance to pull the points apart that the PCA collapses to the same spot. The only purpose of this is to see if only one region lights up in the activation map. If this is the case, from here it is a matter of tuning the training parameters to get what you want.

from somoclu.

commented on July 18, 2024

I actually feel somewhat stupid, all I had to do was make the values negative to make the images look more intuitive, now brighter is actually better without having to use inverse:

http://imgur.com/a/rrXCu

The second album of the previous post does however show the absolute centers of influence though. So I guess inverse has it's uses nevertheless.

I'm actually not sure how to interpret these results, what significance does the activation pattern hold? Because it's of higher dimensionality than the data was to begin with - it is however very constrained, much like the pixels of images. Quite interesting, still have to think about it and how it may be of use. It might have potential when you need to project data into a higher dimensional space, although I'm not sure of how much use it would be because it adopts the same problems that images have - being spatial.

It certainly ought to be far more useful than a BMU, considering that the BMU is a tiny tiny fraction of the whole, lots of information gets discarded. I'm pretty sure the activation map each sample makes is significant when comparing samples.

from somoclu.

oliviaguest commented on July 18, 2024

The activation pattern allows us to calculate the similarity matrix. This is my only use really. The similarity matrix depicts a representational space that could not be captured by the BMUs since they are by definition orthogonal.

from somoclu.

Activation on the surface of the map about somoclu HOT 46 CLOSED

Comments (46)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent