Comments (5)
Hi Jo, thanks for the feedback!
I think that with d=32 and float16, the MS MARCO passages index consumes around 35 GiB, so one could use a much smaller machine than m5a.12xlarge.
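For context, that 35 GiB figure checks out with some back-of-envelope arithmetic; the average token count per passage below is my assumption, not a number from this thread:

```python
# Rough index-size estimate for ColBERT embeddings on MS MARCO passages.
num_passages = 8_841_823   # MS MARCO passage collection size
avg_tokens = 67            # assumed average BERT tokens kept per passage
dim = 32                   # reduced embedding dimension (d=32)
bytes_per_value = 2        # float16

total_bytes = num_passages * avg_tokens * dim * bytes_per_value
print(f"~{total_bytes / 2**30:.1f} GiB")  # → ~35.3 GiB
```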
To your main points:
> It would be nice if the tokenization parts could be moved out of the torch ColBERT model. This would allow direct export of the model to ONNX format for fast serving (e.g., with ONNX-RT).
Please check the current code on master (now v0.2). I think it already does so.
> Move to AutoModel instead of BertModel so the user can choose by an argument which pre-trained checkpoint should be used.
Good advice! Will do this soon and let you know. I hope it's relatively straightforward.
Let me know if there are other things I can do that are helpful. I'm about to release a quantization flag for indexing in ColBERT that represents each vector with just 32 bytes and yet gets >37% MRR@10 on MS MARCO passages (dev). Conversely, a lot of our users want to use all the Vespa optimizations for our late interaction (MaxSim) operator but don't want to miss out on some features in our repo (e.g., end-to-end retrieval).
Is there any way we can make Vespa and ColBERT interoperate more directly, so people don't have to choose one or the other?
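For readers unfamiliar with the late interaction (MaxSim) operator mentioned above, here is a minimal NumPy sketch; the shapes and random data are illustrative:

```python
import numpy as np

def maxsim(Q, D):
    """Late interaction score: for each query token embedding, take the max
    dot product over all document token embeddings, then sum over query tokens.
    Q: (num_query_tokens, dim), D: (num_doc_tokens, dim)."""
    sim = Q @ D.T                  # pairwise query-token x doc-token dot products
    return sim.max(axis=1).sum()   # max over doc tokens, summed over query tokens

rng = np.random.default_rng(0)
Q = rng.standard_normal((32, 32)).astype(np.float32)   # 32 query tokens, d=32
D = rng.standard_normal((67, 32)).astype(np.float32)   # 67 doc tokens, d=32
score = float(maxsim(Q, D))
```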
from colbert.
Hey @jobergum !
I thought you might be interested in our new quantization branch. By default, it represents each vector in just 32 bytes. I generally get results very similar to using the full 128-dim embeddings.
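A minimal sketch of one way "32 bytes per vector" can work: a 32-dim embedding stored as one signed byte per dimension. The symmetric scaling below is illustrative, not necessarily ColBERT's actual scheme:

```python
import numpy as np

def quantize(v):
    """Symmetric int8 quantization: one signed byte per dimension."""
    scale = float(np.abs(v).max()) / 127.0
    return np.round(v / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

v = np.random.default_rng(0).standard_normal(32).astype(np.float32)
q, scale = quantize(v)
v_hat = dequantize(q, scale)

print(q.nbytes)  # → 32 bytes per token vector
```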
Thank you for the quick feedback, Omar, I appreciate it!
> I think that with d=32 and float16, the MS MARCO passages index consumes around 35 GiB, so one could use a much smaller machine than m5a.12xlarge.
Yes, you are right, and that makes CPU serving even more attractive compared to GPU serving. bfloat16/int8 support is coming soon to Vespa as well (vespa-engine/vespa#17118).
> Let me know if there are other things I can do that are helpful. I'm about to release a quantization flag for indexing in ColBERT that represents each vector with just 32 bytes and yet gets >37% MRR@10 on MS MARCO passages (dev).
Then we are down to 18 GiB for the passage task, which also makes the document ranking task more practical.
> Conversely, a lot of our users want to use all the Vespa optimizations for our late interaction (MaxSim) operator but don't want to miss out on some features in our repo (e.g., end-to-end retrieval).
Yes, end-to-end retrieval using only ColBERT depends on Vespa allowing multiple tensors per document to be indexed in the HNSW graph; we don't want to duplicate the passage text across up to 80 sub-documents. That matters especially for document ranking, but also for passage ranking, where we could additionally store the token_ids of the passage for another re-ranking stage using full cross-attention on, e.g., the top 10 hits from ColBERT MaxSim.
Vespa allows efficient candidate retrieval using sparse signals (e.g., HDCT or docT5query with WAND), dense retrieval via ANN (HNSW), or a hybrid of the two in the same query. Personally, I think ColBERT shines as a re-ranker compared with a cross-attention model, but we do see the need to allow indexing multiple vectors for the same document, so I think we will get there.
> Please check the current code on master (now v0.2). I think it already does so.
I see, I'll check it out. I used an older version of this repo when training the linked snapshot weights, plus a small wrapper for the query forward pass. We used your repo to produce the passage representations, and we plan on releasing the precomputed term vectors on a data hub soon.
> Is there any way we can make Vespa and ColBERT interoperate more directly, so people don't have to choose one or the other?
I have to think about this. I think the first important parts are that the ColBERT model allows exporting to ONNX and that Vespa can index multiple vectors per document in the HNSW graph. The offline vectorization is best done outside of Vespa (with batch size > 1), which makes GPUs attractive.
Awesome, thanks Jo!
@okhat thanks for the update. That is awesome! So we can use tensor<int8>(t{}, x[32]). That will also enable further runtime optimizations of the MaxSim operator in Vespa. The current version uses bfloat16, but that only saves memory; the evaluation is still in float, so having int8 could enable faster MaxSim evaluation.
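A sketch of why int8 can speed up MaxSim evaluation: the pairwise dot products can run in integer arithmetic and be rescaled once at the end. The per-matrix scales below are an illustrative simplification:

```python
import numpy as np

def quantize(M):
    """Symmetric int8 quantization with a single per-matrix scale."""
    scale = float(np.abs(M).max()) / 127.0
    return np.round(M / scale).astype(np.int8), scale

def maxsim_int8(Q_q, q_scale, D_q, d_scale):
    # Integer matmul (accumulate in int32 to avoid overflow), rescale once.
    sim = Q_q.astype(np.int32) @ D_q.astype(np.int32).T
    return sim.max(axis=1).sum() * (q_scale * d_scale)

rng = np.random.default_rng(0)
Q = rng.standard_normal((32, 32)).astype(np.float32)
D = rng.standard_normal((67, 32)).astype(np.float32)

Q_q, qs = quantize(Q)
D_q, ds = quantize(D)
approx = maxsim_int8(Q_q, qs, D_q, ds)   # integer-path score
exact = (Q @ D.T).max(axis=1).sum()      # float reference
```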
Will merge https://github.com/vespa-engine/sample-apps/blob/passage-ranking/msmarco-ranking/passage-ranking.md to master this week; just wrapping up the performance summary.