Comments (5)
Thanks for the question!
Indeed, query augmentation is only applied to queries. The v0.1 implementation (from which your snippet is taken) does this in a terse way that can seem confusing at first.
The above snippet appends `103`s (i.e., `[MASK]` token IDs) to both queries and documents and sets the corresponding `attention_mask` entries to zero. This attention mask controls attention inside the BERT encoder, which is separate from ColBERT's late interaction (MaxSim). After the BERT encoder, we mask out this padding only on the document side, right before MaxSim. When storing the document embeddings to disk, we filter out these padding embeddings too (they are masked anyway).
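Concretely, the padding step can be sketched as follows. This is a minimal illustration, not the actual ColBERT code; `MAX_QLEN` and the helper name are made up for the example, and `103` is BERT's `[MASK]` token ID:

```python
import torch

MASK_ID = 103   # BERT's [MASK] token id
MAX_QLEN = 32   # illustrative fixed query length

def augment_query(token_ids):
    """Pad a tokenized query to MAX_QLEN with [MASK] ids and build the
    attention mask: 1 for real tokens, 0 for the appended [MASK]s."""
    n = len(token_ids)
    padded = token_ids + [MASK_ID] * (MAX_QLEN - n)
    attn = [1] * n + [0] * (MAX_QLEN - n)
    return torch.tensor(padded), torch.tensor(attn)

ids, attn = augment_query([101, 2054, 2003, 102])  # e.g. "[CLS] what is [SEP]"
# All 32 positions are fed to the encoder; the zeros in `attn` only stop
# the real tokens from attending to the appended [MASK] positions.
```

On the document side, the analogous padding positions are then dropped before MaxSim (and before writing embeddings to disk); on the query side they are kept.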
In contrast, on the query side, the representations of the `[MASK]`s are not masked out before MaxSim. Inside the BERT encoder, however, setting the attention mask to zero forces the mask tokens' representations to be computed in the last layer without influencing the other tokens' representations. Empirically, this has little to no effect on performance given plenty of training data, so we kept the simpler implementation. (Keep in mind that query augmentation itself, as described above, still contributes substantially to performance.)
Let me know if you have further questions!
from colbert.
This relies on an implementation detail of HuggingFace Transformers: the attention mask doesn't change whether a position's representation is computed; it only controls which positions can be attended to.
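A toy self-attention computation makes this concrete. This is a pure-PyTorch sketch of BERT-style additive key masking, not HuggingFace's actual code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 4, 8)                     # 4 token vectors of dim 8
attn_mask = torch.tensor([1., 1., 0., 0.])   # last two positions are padding

# Single-head self-attention with additive key masking, BERT-style.
scores = (x @ x.transpose(-1, -2)) / 8 ** 0.5
scores = scores + (1.0 - attn_mask) * -1e9   # masked KEYS get huge negative scores
weights = F.softmax(scores, dim=-1)          # each row: how one position attends
out = weights @ x

# weights[..., 2:] is ~0 everywhere: nobody attends TO the masked positions.
# But rows 2 and 3 of `out` are still computed: the masked positions
# themselves attend to the unmasked tokens and get full representations.
```

The key observation is that the mask zeroes out attention *to* a position, while the masked position's own output vector is still produced by attending to the unmasked tokens.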
So, to answer your question: in queries, the `[MASK]` positions are represented by attending (in the 12th layer) to all the other query tokens. Note that by default this happens only in the 12th layer. If you let it happen in all 12 layers instead, there is little to no difference either. You can experiment with this to see how it behaves.
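For reference, the late interaction itself can be sketched like this. This is an illustrative implementation of MaxSim, not ColBERT's own code; the query-side `[MASK]` embeddings are simply extra rows of `Q`:

```python
import torch

def maxsim_score(Q, D):
    """Late-interaction relevance: each query embedding (the appended
    [MASK] embeddings included) takes its maximum similarity over all
    document embeddings; the per-position maxima are summed.
    Q: (num_query_tokens, dim), D: (num_doc_tokens, dim)."""
    return (Q @ D.T).max(dim=1).values.sum()

# Tiny worked example: two query vectors, two document vectors.
Q = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
D = torch.tensor([[2.0, 0.0], [0.0, 3.0]])
score = maxsim_score(Q, D)   # max(2, 0) + max(0, 3) = 5
```

Since document-side padding embeddings are dropped before this step, only the query-side `[MASK]` rows participate in the max.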
Also, you may want to use the v0.2 code, which should be easier to understand and is more extensive overall.
Thanks for your reply!
I know that the doc side doesn't use the mask padding now. However, I am still confused about how mask padding contributes. As I understand it, the mask padding doesn't affect the other tokens' representations, but do the other tokens' representations affect the masks' representations? I am not sure about this.
Besides, did you experiment with how the query max_length affects performance with mask padding?
I still have some doubts but I think query augmentation with mask padding is a great idea, and I will do some further experiments to verify it. Thank you for your reply!
I am having trouble understanding query augmentation and why it adds so much to performance. I saw the ablation study, and the difference is quite significant.
From your earlier comments, it seems that augmentation takes effect only in the last layer of ColBERT, which again confuses me about the importance of this step and why it would contribute so much to performance.