attn_mask about coca-pytorch HOT 3 OPEN

lucidrains commented on July 29, 2024 1

attn_mask

from coca-pytorch.

Comments (3)

gshaikov-paige commented on July 29, 2024 1

@pldlgb we only mask the last row of sim because that row corresponds to the CLS token query. Without this mask, the CLS query would attend to every key before it, including PAD keys.

We don't need to mask the other queries. We don't care what PAD queries attend to, since their outputs are masked out when we compute the CE loss. We also don't need to mask text queries: the causal mask already restricts them to looking backwards, and because padding sits at the end of the sequence, they only ever see other text tokens.
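To make the shape of this mask concrete, here is a minimal sketch that rebuilds it for one toy sequence. The padding id of 0, the helper name build_cls_attn_mask, and the convention that True means "may attend" are assumptions for illustration; only the F.pad call itself comes from this thread.

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # assumed padding id for this sketch


def build_cls_attn_mask(text: torch.Tensor, pad_id: int = PAD_ID) -> torch.Tensor:
    """Return a (batch, seq + 1, seq + 1) boolean mask; True = may attend (assumed)."""
    seq = text.shape[1]
    # One row per example: which text keys are real (non-PAD) tokens.
    cls_mask = (text != pad_id).unsqueeze(1)  # (batch, 1, seq)
    # Pad the key axis with one True column on the right (the CLS key itself)
    # and the query axis with `seq` all-True rows on top (the text queries).
    return F.pad(cls_mask, (0, 1, seq, 0), value=True)


# Three real tokens followed by two PAD tokens; CLS is appended afterwards.
text = torch.tensor([[5, 6, 7, 0, 0]])
attn_mask = build_cls_attn_mask(text)
print(attn_mask[0].int())
```

Only the last row (the CLS query) contains False entries, and they sit exactly at the PAD key positions. Every other row is left fully open here and relies on the causal mask.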


skyerhxx commented on July 29, 2024

I have the same question. It seems like attn_mask = F.pad(cls_mask, (0, 1, seq, 0), value=True) is not right.
Based on the original paper, the attn_mask here should be in the form of an inverted triangle, so that the feature at the current timestep cannot see features from future timesteps.

Welcome to discuss.


gshaikov-paige commented on July 29, 2024

@skyerhxx This is not the causal mask. It is a separate mask that prevents the CLS token from attending to PAD tokens in the batch.
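As a rough illustration of how the two masks differ and combine, the sketch below builds both in plain Python so the layout can be printed without any dependencies. The function names are hypothetical, and True again means "may attend", which is an assumption of this sketch rather than the repository's exact convention.

```python
def causal_allowed(n):
    """allowed[i][j] is True when query i may attend to key j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cls_pad_allowed(is_real):
    """is_real[j] is False for PAD keys; only the last (CLS) row is restricted."""
    seq = len(is_real)
    n = seq + 1  # text positions plus the appended CLS token
    rows = [[True] * n for _ in range(seq)]
    rows.append(list(is_real) + [True])  # CLS may always see itself
    return rows


def combine(a, b):
    """Element-wise AND of two boolean grids of the same size."""
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


is_real = [True, True, True, False, False]  # three tokens, then two PAD
allowed = combine(causal_allowed(len(is_real) + 1), cls_pad_allowed(is_real))
for row in allowed:
    print("".join("x" if v else "." for v in row))
# x.....
# xx....
# xxx...
# xxxx..
# xxxxx.
# xxx..x
```

The triangular part comes entirely from the causal mask. The extra mask only removes the PAD keys from the final CLS row, which the causal mask alone would leave visible.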

We add PAD tokens to the text batch because text examples have different lengths but the tensor has a fixed dimension, so to concatenate them into a batch tensor one must pad the end of each shorter sequence with a dummy token, i.e. a PAD token. However, since we append the CLS token at the very end, it would attend to the entire sequence, including the PAD tokens, which we don't want. So we mask them out.
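The padding step and its effect on the CLS query can be sketched as follows. This is an illustration under assumptions, not the repository's code: the padding id of 0, the random logits standing in for the CLS row of sim, and the variable names are all made up for the example.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed padding id for this sketch

# Two text examples of different lengths, padded at the end to a common length.
examples = [torch.tensor([5, 6, 7]), torch.tensor([8, 9, 10, 11, 12])]
text = pad_sequence(examples, batch_first=True, padding_value=PAD_ID)  # (2, 5)

# Stand-in logits for the CLS query against the 5 text keys plus the CLS key.
torch.manual_seed(0)
cls_sim = torch.randn(text.shape[0], text.shape[1] + 1)

# Keys the CLS query may attend to: non-PAD text keys, plus itself.
self_key = torch.ones(text.shape[0], 1, dtype=torch.bool)
keep = torch.cat([text != PAD_ID, self_key], dim=1)

# Fill masked logits with a very large negative value before the softmax.
cls_sim = cls_sim.masked_fill(~keep, -torch.finfo(cls_sim.dtype).max)
weights = cls_sim.softmax(dim=-1)
print(weights)
```

In the first example the two PAD positions receive zero attention weight, while the second example, which has no padding, is unaffected.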

