Hi, I'm very impressed with the quick reproduction, nice work!


Positional embedding not stored in checkpoints - problem for tuning/inference at higher resolution about mae-pytorch HOT 6 CLOSED

pengzhiliang commented on September 28, 2024

Positional embedding not stored in checkpoints - problem for tuning/inference at higher resolution

from mae-pytorch.

Comments (6)

pengzhiliang commented on September 28, 2024

Thank you for the kind words!

As you know, we use the sine-cosine positional embedding described in the paper. These embeddings are not learnable parameters, so they are not stored in checkpoints. One potential solution is to use a learnable positional embedding instead.

If you run at a higher resolution, you may not need to interpolate the sine-cosine positional embedding.
The get_sinusoid_encoding_table function can return any given number of positional embeddings, so you only need to change the corresponding parameters.
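
For readers following along, here is a minimal sketch of regenerating a fixed sine-cosine table for any number of positions. It is not the repository's code: `sinusoid_table` is a hypothetical plain-Python stand-in for the function mentioned above, and the sizes (196 and 784 positions, 8 channels) are assumptions chosen for illustration.

```python
import math

def sinusoid_table(n_position, d_hid):
    """Return n_position rows of fixed sine-cosine encodings (hypothetical helper)."""
    table = []
    for pos in range(n_position):
        row = []
        for j in range(d_hid):
            # Channels 2k and 2k+1 share one frequency.
            angle = pos / (10000 ** (2 * (j // 2) / d_hid))
            # Even channels use sine, odd channels use cosine.
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

low = sinusoid_table(196, 8)    # assumed 14 x 14 patch grid
high = sinusoid_table(784, 8)   # assumed 28 x 28 patch grid

# The encoding depends only on the flat position index, so the larger
# table extends the smaller one rather than rescaling it.
print(high[:196] == low)
```

Under these assumptions, the first 196 rows of the larger table are identical to the smaller table: asking for more positions appends new ones and leaves the existing ones where they were.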


pengzhiliang commented on September 28, 2024

Please feel free to reopen this issue with more info if you are still stuck on this problem.
Thank you!


atonderski commented on September 28, 2024

Hi, sorry for the slow response.

I agree that the sine-cosine embeddings are not learnable. However, it seems they still need to be interpolated for the model to work well. I suspect this is at least partly because they are 1D, so the model has to learn the number of rows/columns. E.g., it cannot say "look one patch down" but rather has to say "look X patches forward".
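
To make the row/column point concrete, here is a small sketch. It assumes patches are flattened in row-major order; the grid widths 14 and 28 (for example, 224 and 448 pixel inputs with 16 pixel patches) and the helper names are assumptions for illustration only.

```python
def index_below(idx, grid_w):
    """Flat index of the patch directly below `idx` in a row-major grid."""
    return idx + grid_w

def to_row_col(idx, grid_w):
    """Convert a flat patch index into (row, column)."""
    return divmod(idx, grid_w)

start = 5
for grid_w in (14, 28):
    below = index_below(start, grid_w)
    # "One patch down" is a different flat offset on each grid.
    print(grid_w, below - start, to_row_col(below, grid_w))

# An offset of 14, which means "one patch down" on the 14-wide grid,
# stays in the same row on the 28-wide grid.
print(to_row_col(start + 14, 28))
```

With a 1D index, "one patch down" is an offset equal to the grid width, so an offset learned at one resolution points somewhere else once the grid gets wider.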

I have attached attention visualizations that show what happens if you run at a higher resolution with and without interpolating the positional embedding. As you can see, the non-interpolated version looks worse and has weird diagonal stripes.

This is not a big issue for me, but I wanted to let you (and anyone else who has the same problem) know about this. I think the best solution is what I mentioned before: simply include the positional embeddings in the checkpoint even though they are not learnable parameters.

Original:

With interpolation:

Without interpolation:
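
The suggestion above, keeping the fixed positional embeddings in the checkpoint and interpolating them for a larger input, could look roughly like the sketch below. It is not the repository's implementation: `TinyEncoder` and `resize_pos_embed` are hypothetical names, the random tensor is a placeholder for the real sine-cosine table, and bicubic interpolation over an assumed 14 x 14 to 28 x 28 grid is one common choice rather than the method used for the images in this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an encoder with a fixed positional table."""

    def __init__(self, grid=14, dim=8):
        super().__init__()
        pos = torch.randn(1, grid * grid, dim)  # placeholder for the sine-cosine table
        # A persistent buffer is saved in state_dict() but is not a trainable parameter.
        self.register_buffer("pos_embed", pos, persistent=True)

def resize_pos_embed(pos_embed, old_grid, new_grid):
    """Interpolate a (1, old_grid**2, dim) table to new_grid**2 positions."""
    dim = pos_embed.shape[-1]
    # Restore the 2D layout and move channels first, as F.interpolate expects.
    grid = pos_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_grid, new_grid),
                         mode="bicubic", align_corners=False)
    # Flatten back to the (1, positions, dim) layout.
    return grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)

model = TinyEncoder()
state = model.state_dict()
resized = resize_pos_embed(state["pos_embed"], old_grid=14, new_grid=28)
print("pos_embed" in state, len(list(model.parameters())), tuple(resized.shape))
```

Because the table lives in a buffer, it travels with the checkpoint without being updated by the optimizer, and it can be resized after loading.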


cliangyu commented on September 28, 2024

@atonderski Could you please share how to draw the self-attention map, without class token?


atonderski commented on September 28, 2024

Yeah, so since there is no class token, I am visualising the attention map of an arbitrarily picked token (signified by the red dot in the image). There are of course as many attention maps as there are tokens/patches.
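
As a rough illustration of the idea (not the code used for the images above, which was not shared in this thread), the attention map of one picked token is that token's row of softmax attention weights, reshaped to the patch grid. The sketch below uses random query and key vectors, a single head, and an assumed 4 x 4 grid; all names are hypothetical.

```python
import math
import random

def attention_row(q, keys):
    """Softmax attention weights of one query vector over all key vectors."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
grid, dim = 4, 8  # assumed 4 x 4 patch grid, one attention head
n = grid * grid
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

token = 5  # arbitrarily picked query token (the "red dot")
weights = attention_row(queries[token], keys)
# Reshape the flat weights into the patch grid so they can be shown as an image.
attn_map = [weights[r * grid:(r + 1) * grid] for r in range(grid)]
print(len(attn_map), len(attn_map[0]))
```

In a real model the queries and keys would come from a chosen attention layer, and the resulting grid would be upsampled and overlaid on the input image.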


cliangyu commented on September 28, 2024

There are 12 images... do they correspond to 12 heads?
Do you mind pushing the code? Thank you!

