Image tokens dimension mismatch with the layernorm bias dimension about coca-pytorch HOT 12 CLOSED

lucidrains commented on September 3, 2024

Image tokens dimension mismatch with the layernorm bias dimension

from coca-pytorch.

Comments (12)

szxuhongye commented on September 3, 2024

@szxuhongye also, can you give the full trace?
torch 1.13.1, and the following is the full trace:

Traceback (most recent call last):
  File "playground.py", line 39, in <module>
    loss = coca(
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/coca_pytorch/coca_pytorch.py", line 433, in forward
    image_embeds, image_tokens = self.embed_image(images=images, image_tokens=image_tokens)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/coca_pytorch/coca_pytorch.py", line 412, in embed_image
    img_queries = self.img_attn_pool(img_queries, image_tokens)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/coca_pytorch/coca_pytorch.py", line 245, in forward
    context = self.context_norm(context)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/coca_pytorch/coca_pytorch.py", line 25, in forward
    return F.layer_norm(x, x.shape[-1:], self.gamma, self.beta)
  File "/home/usr/anaconda3/envs/coca/lib/python3.8/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected weight to be of same shape as normalized_shape, but got weight of shape [1024] and normalized_shape = [1000]
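
For readers following the trace: the failing call normalizes over the last dimension of whatever tensor reaches it, so the error can likely be reproduced without CoCa at all. The sketch below is only an illustration under assumptions: `gamma`, `beta`, and the `layer_norm` helper are stand-ins modeled on the line shown in the trace, not the library's actual class, and the tensor shapes are made up to match the numbers in the error message.

```python
import torch
import torch.nn.functional as F

# Stand-in parameters sized for image_dim = 1024 (assumption: mirrors the
# gamma/beta used on line 25 of coca_pytorch.py in the trace above).
gamma = torch.ones(1024)
beta = torch.zeros(1024)

def layer_norm(x):
    # Same call pattern as in the trace: normalized_shape is taken from the
    # input's last dimension, so it must equal the parameter size.
    return F.layer_norm(x, x.shape[-1:], gamma, beta)

# Token embeddings shaped (batch, seq, dim) normalize fine.
# The sequence length of 64 is arbitrary here.
embeddings = torch.randn(4, 64, 1024)
out = layer_norm(embeddings)

# Classifier logits shaped (batch, num_classes) do not: the last dimension
# is 1000 while the parameters have 1024 entries.
logits = torch.randn(4, 1000)
try:
    layer_norm(logits)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(out.shape)
print(error_message)
```

If this reading is right, the 1000 in the error is `num_classes` from the ViT and the 1024 is `image_dim`, which would mean the image encoder handed back logits rather than embeddings.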

lucidrains commented on September 3, 2024

does your image_dim on CoCa init match up with the dim of the vit?

lucidrains commented on September 3, 2024

could you post a simple reproducible script?

szxuhongye commented on September 3, 2024

could you post a simple reproducible script?

I just ran the code that you provide in the README.md

import torch

from vit_pytorch.simple_vit_with_patch_dropout import SimpleViT
from vit_pytorch.extractor import Extractor

vit = SimpleViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    patch_dropout = 0.5  # https://arxiv.org/abs/2212.00794
)

vit = Extractor(vit, return_embeddings_only = True, detach = False)

from coca_pytorch.coca_pytorch import CoCa

coca = CoCa(
    dim = 512,                     # model dimension
    img_encoder = vit,             # vision transformer - image encoder, returning image embeddings as (batch, seq, dim)
    image_dim = 1024,              # image embedding dimension, if not the same as model dimensions
    num_tokens = 20000,            # number of text tokens
    unimodal_depth = 6,            # depth of the unimodal transformer
    multimodal_depth = 6,          # depth of the multimodal transformer
    dim_head = 64,                 # dimension per attention head
    heads = 8,                     # number of attention heads
    caption_loss_weight = 1.,      # weight on the autoregressive caption loss
    contrastive_loss_weight = 1.,  # weight on the contrastive loss between image and text CLS embeddings
).cuda()

text = torch.randint(0, 20000, (4, 512)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()

loss = coca(
    text = text,
    images = images,
    return_loss = True  # set this to True to get the full caption + contrastive loss
)

loss.backward()

lucidrains commented on September 3, 2024

@szxuhongye it runs for me

which version of coca-pytorch are you on? and are you on the latest vit-pytorch?

szxuhongye commented on September 3, 2024

@szxuhongye it runs for me

which version of coca-pytorch are you on? and are you on the latest vit-pytorch?

coca-pytorch 0.07 and vit-pytorch 0.40.2
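
Since the thread ends up hinging on what is actually installed, a small report of package versions from inside the affected environment may save a round trip. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is made up for illustration, and the package names are the distribution names mentioned in this thread.

```python
from importlib import metadata

def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

# Run this with the same interpreter that runs playground.py.
print(installed_versions(["torch", "vit-pytorch", "coca-pytorch"]))
```

Running it with the same interpreter as the failing script matters, because a different conda environment can report different versions.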

lucidrains commented on September 3, 2024

@szxuhongye hmm, that looks ok

which version of pytorch?

lucidrains commented on September 3, 2024

@szxuhongye also, can you give the full trace?

lucidrains commented on September 3, 2024

@szxuhongye i see, the Extractor may be broken in pytorch 1.13.1 and not returning the embeddings, and instead, returning the logits
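
One way to test that hypothesis is to look at the shape of what the wrapped encoder returns before it ever reaches CoCa. The helper below is a hypothetical sketch, not part of either library: it only classifies a shape tuple, on the assumption that embeddings come back as (batch, seq, image_dim) and logits as (batch, num_classes).

```python
def diagnose_encoder_output(shape, image_dim, num_classes):
    """Guess what an image encoder returned from its output shape alone."""
    last = shape[-1]
    if len(shape) == 3 and last == image_dim:
        return "embeddings"   # (batch, seq, image_dim): what CoCa expects
    if len(shape) == 2 and last == num_classes:
        return "logits"       # (batch, num_classes): classifier head output
    return "unexpected"

# With the objects from the README script, the check would look like:
#   diagnose_encoder_output(tuple(vit(images).shape), image_dim=1024, num_classes=1000)
print(diagnose_encoder_output((4, 1000), image_dim=1024, num_classes=1000))
```

A result of "logits" would be consistent with the `normalized_shape = [1000]` reported in the trace.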

lucidrains commented on September 3, 2024

@szxuhongye hmm no, it still works for me. try running all the cells in that colab. it is also on pytorch 1.13.0

szxuhongye commented on September 3, 2024

@szxuhongye hmm no, it still works for me. try running all the cells in that colab. it is also on pytorch 1.13.0

It is truly weird: it works after I uninstalled and reinstalled the entire environment. Thank you for your response.

lucidrains commented on September 3, 2024

glad it is working now 😁
