Comments (5)
Thanks for the question!
Indeed, query augmentation is only applied to queries. The v0.1 implementation (from which your snippet is taken) does this in a terse way that can seem confusing at first.
The above snippet appends `103`s (i.e., `[MASK]` token IDs) to both queries and documents and sets the corresponding `attention_mask` entries to zero. This attention mask controls attention inside the BERT encoder, which is separate from ColBERT's late interaction (MaxSim). After the BERT encoder, we mask out this padding only on the document side, right before MaxSim. When storing the document embeddings to disk, we filter out these padding embeddings too (they are masked anyway).
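Concretely, the padding step can be sketched as follows. This is a minimal illustration, not the actual ColBERT code; `MAX_QLEN` and the helper name are made up for the example, and `103` is BERT's `[MASK]` token ID:

```python
import torch

MASK_ID = 103   # BERT's [MASK] token id
MAX_QLEN = 32   # illustrative fixed query length

def augment_query(token_ids):
    """Pad a tokenized query to MAX_QLEN with [MASK] ids and build the
    attention mask: 1 for real tokens, 0 for the appended [MASK]s."""
    n = len(token_ids)
    padded = token_ids + [MASK_ID] * (MAX_QLEN - n)
    attn = [1] * n + [0] * (MAX_QLEN - n)
    return torch.tensor(padded), torch.tensor(attn)

ids, attn = augment_query([101, 2054, 2003, 102])  # e.g. "[CLS] what is [SEP]"
# All 32 positions are fed to the encoder; the zeros in `attn` only stop
# the real tokens from attending to the appended [MASK] positions.
```

On the document side, the analogous padding positions are then dropped before MaxSim (and before writing embeddings to disk); on the query side they are kept.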
In contrast, on the query side, the representations of the `[MASK]`s are not masked out before MaxSim. Inside the BERT encoder, however, setting the attention mask to zero forces the mask tokens' representations to be computed in the last layer without influencing the other tokens' representations. Empirically, this has little to no effect on performance given plenty of training data, so we kept the simpler implementation. (Keep in mind that query augmentation itself, as described above, still contributes substantially to performance.)
Let me know if you have further questions!
from colbert.
This relies on an implementation detail of HuggingFace Transformers: the attention mask doesn't change whether a position's representation is computed; it only controls which positions can be attended to.
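A toy self-attention computation makes this concrete. This is a pure-PyTorch sketch of BERT-style additive key masking, not HuggingFace's actual code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 4, 8)                     # 4 token vectors of dim 8
attn_mask = torch.tensor([1., 1., 0., 0.])   # last two positions are padding

# Single-head self-attention with additive key masking, BERT-style.
scores = (x @ x.transpose(-1, -2)) / 8 ** 0.5
scores = scores + (1.0 - attn_mask) * -1e9   # masked KEYS get huge negative scores
weights = F.softmax(scores, dim=-1)          # each row: how one position attends
out = weights @ x

# weights[..., 2:] is ~0 everywhere: nobody attends TO the masked positions.
# But rows 2 and 3 of `out` are still computed: the masked positions
# themselves attend to the unmasked tokens and get full representations.
```

The key observation is that the mask zeroes out attention *to* a position, while the masked position's own output vector is still produced by attending to the unmasked tokens.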
So, to answer your question: in queries, the `[MASK]` positions are represented by attending (in the 12th layer) to all the other query tokens. Note that by default this happens only in the 12th layer. If you let it happen in all 12 layers instead, there is little to no difference either. You can experiment with this to see how it behaves.
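For reference, the late interaction itself can be sketched like this. This is an illustrative implementation of MaxSim, not ColBERT's own code; the query-side `[MASK]` embeddings are simply extra rows of `Q`:

```python
import torch

def maxsim_score(Q, D):
    """Late-interaction relevance: each query embedding (the appended
    [MASK] embeddings included) takes its maximum similarity over all
    document embeddings; the per-position maxima are summed.
    Q: (num_query_tokens, dim), D: (num_doc_tokens, dim)."""
    return (Q @ D.T).max(dim=1).values.sum()

# Tiny worked example: two query vectors, two document vectors.
Q = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
D = torch.tensor([[2.0, 0.0], [0.0, 3.0]])
score = maxsim_score(Q, D)   # max(2, 0) + max(0, 3) = 5
```

Since document-side padding embeddings are dropped before this step, only the query-side `[MASK]` rows participate in the max.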
Also, you may want to use the v0.2 code, which should be easier to understand and is more extensive overall.
Thanks for your reply!
I know that the doc side doesn't use the mask padding now. However, I am still confused about how mask padding contributes. As I understand it, the mask padding doesn't affect the other tokens' representations, but do the other tokens' representations affect the masks' representations? I am not sure about this.
Besides, did you experiment with how the query max_length affects performance with mask padding?
I still have some doubts but I think query augmentation with mask padding is a great idea, and I will do some further experiments to verify it. Thank you for your reply!
I am having trouble understanding query augmentation and why it adds so much to performance. I saw the ablation study, and the difference is quite significant.
From your earlier comments, it seems that augmentation takes effect only in the last layer of ColBERT, which again confuses me about the importance of this step and why it would contribute so much to performance.