
Comments (5)

okhat commented on July 23, 2024

Thanks for the question!

Indeed, query augmentation is only applied to queries. The v0.1 implementation (from which your snippet is taken) does this concisely, in a way that can seem confusing at first.

The above snippet appends token id 103 (i.e., [MASK] tokens) to queries and documents and sets the corresponding attention_mask entries to zero. This attention mask controls attention inside the BERT encoder, which is distinct from ColBERT's interaction mechanism (MaxSim). After the BERT encoder, this padding is masked out, but only on the document side, right before MaxSim. When storing documents to disk, we filter out these padding embeddings too (which are masked anyway).
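To make the padding step concrete, here is a minimal plain-Python sketch of what the snippet does, assuming BERT's [MASK] token id is 103 and a fixed target length. The helper name `pad_with_masks` is illustrative, not ColBERT's actual API:

```python
MASK_ID = 103  # BERT's [MASK] token id, as in the snippet above

def pad_with_masks(token_ids, max_len):
    """Append [MASK] ids up to max_len and build an attention mask
    that is 1 for real tokens and 0 for the appended [MASK] padding."""
    n_real = len(token_ids)
    padded = token_ids + [MASK_ID] * (max_len - n_real)
    attention_mask = [1] * n_real + [0] * (max_len - n_real)
    return padded, attention_mask

# A short query ([CLS] ... [SEP]) padded to length 8:
ids, mask = pad_with_masks([101, 2054, 2003, 102], 8)
# ids  -> [101, 2054, 2003, 102, 103, 103, 103, 103]
# mask -> [1, 1, 1, 1, 0, 0, 0, 0]
```

The same padding is applied to documents; the difference described above is only in what happens to the padded positions *after* the encoder.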

In contrast, on the query side, the representations of the [MASK] tokens are not masked out before MaxSim. Inside the BERT encoder, however, setting the attention mask to zero forces each [MASK] token's representation to be computed in the last layer without influencing the other tokens' representations. Empirically, this has little to no effect on performance given plenty of training data, so the implementation keeps the simpler form. (Keep in mind that query augmentation itself, as described, still contributes substantially to performance.)
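The asymmetry between the two sides can be sketched with a toy MaxSim in plain Python: document padding embeddings are filtered out before scoring, while query embeddings (including those from [MASK] positions) all participate. The function name and 2-d embeddings are illustrative only:

```python
def maxsim_score(query_embs, doc_embs, doc_mask):
    """ColBERT-style late interaction: for each query embedding
    (including [MASK] positions), take the max dot product over the
    *unmasked* document embeddings, then sum over the query side."""
    kept = [d for d, keep in zip(doc_embs, doc_mask) if keep]
    return sum(
        max(sum(qi * di for qi, di in zip(q, d)) for d in kept)
        for q in query_embs
    )

# Query of 3 embeddings (the last from a [MASK] position - still scored);
# document of 3 embeddings, the last one padding (masked out before MaxSim).
q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
d = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # padding emb would dominate if kept
score = maxsim_score(q, d, [1, 1, 0])  # -> 1.0 + 1.0 + 0.5 = 2.5
```

Note that the [MASK]-position query embedding contributes a MaxSim term like any other query token, while the masked document embedding never enters the max.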

Let me know if you have further questions!

from colbert.

okhat commented on July 23, 2024

This relies on an implementation detail of HuggingFace transformers: the attention mask doesn't change whether a position's output is computed; it only controls which positions can be attended to.

So, to answer your question: in queries, the [MASK] positions are represented by attending (in the 12th layer) to all the other query tokens. Note that this happens only in the 12th layer by default. If you do it in all 12 layers instead, there's little to no difference either. You can experiment with it to see how it behaves.
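A tiny single-step attention sketch illustrates the convention described above: a zeroed mask position is excluded as a *key* (other tokens cannot attend to it), but it still acts as a *query* and receives an output computed from the unmasked tokens. Scalar values and uniform attention scores are used purely for illustration:

```python
def attend(values, attn_mask):
    """One attention step with uniform raw scores: every position
    (masked or not) attends over the positions where attn_mask == 1,
    so masked positions still receive outputs but influence none."""
    kept = [v for v, m in zip(values, attn_mask) if m]
    weight = 1.0 / len(kept)  # uniform scores -> uniform softmax weights
    # Each position, including masked ones, acts as a query over `kept`.
    return [sum(weight * v for v in kept) for _ in values]

# Three positions; the last is a [MASK] padding position (mask = 0).
# Its value (100.0) never leaks into the outputs, yet it still gets one.
out = attend([2.0, 4.0, 100.0], [1, 1, 0])  # -> [3.0, 3.0, 3.0]
```

This mirrors the HuggingFace behavior: the mask is applied to attention scores over keys, so a masked position's representation is built from the real tokens, which is exactly what lets the query-side [MASK] embeddings carry useful signal into MaxSim.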

Also, you may want to use the v0.2 code, which should be easier to understand and otherwise more extensive.


jiqiujia commented on July 23, 2024

Thanks for your reply!
I understand now that the document side doesn't use the mask padding. However, I am still confused about how mask padding contributes. As I understand it, mask padding doesn't affect the other tokens' representations, but do the other tokens' representations affect the masks' representations? I am not sure about this.
Also, did you experiment with how the query max_length affects performance with mask padding?


jiqiujia commented on July 23, 2024

I still have some doubts, but I think query augmentation with mask padding is a great idea, and I will do some further experiments to verify it. Thank you for your reply!


ashantanu commented on July 23, 2024

I am having trouble understanding query augmentation and why it adds so much to performance. I saw the ablation study, and the difference is quite significant.
From your earlier comments, it seems augmentation happens only in the last layer of ColBERT, which makes me even more confused about the importance of this step and why it would contribute so much to performance.

