jingtaozhan / repconc
WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval
License: MIT License
Hello,
Thank you very much for this very interesting paper and for open-sourcing the results.
I was wondering whether you are planning to release the training code as well?
Thanks,
claeyzre
In Table 1, is there any explanation for why the results of unsupervised PQ (MRR@10 = 0.028 at a compression ratio of 64x) are so poor?
In our experience, PQ works reasonably well. For example, when we use PQ to compress 768-dimensional vectors generated by the ANCE model with M=32 (compression ratio = 96x), we get MRR@10 = 0.252 on MS MARCO Passage Dev. We used IndexPQ from the FAISS library for this.
Also, for the results reported on unsupervised methods (PQ, ScaNN, OPQ, etc.), which encoder produced the uncompressed input vectors? Is it the trained STAR model?
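For reference, the compression ratios quoted in this question can be checked with simple arithmetic. The sketch below assumes 768-dimensional float32 vectors and 8 bits per sub-vector code; the function names are hypothetical and are not part of FAISS or the RepCONC code. Under those assumptions, M=32 gives 96x, and a 64x ratio would correspond to M=48.

```python
def pq_code_bytes(num_subvectors: int, bits_per_code: int = 8) -> float:
    """Size in bytes of one PQ-compressed vector (assumed 8 bits per sub-vector)."""
    return num_subvectors * bits_per_code / 8


def compression_ratio(dim: int, num_subvectors: int,
                      bits_per_code: int = 8, bytes_per_float: int = 4) -> float:
    """Ratio of the raw float vector size to the PQ code size."""
    raw_bytes = dim * bytes_per_float  # e.g. 768 * 4 = 3072 bytes
    return raw_bytes / pq_code_bytes(num_subvectors, bits_per_code)


if __name__ == "__main__":
    print(compression_ratio(768, 32))  # M=32 setting described in the question
    print(compression_ratio(768, 48))  # M that would yield 64x under these assumptions
```

If the vectors are stored in a different precision, or the codes use a different number of bits, the ratios change accordingly.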
The RepCONC paper mentions several times that it does not fix index assignments like JPQ.
However, Section 3.6.2 contains a line that seems to contradict this: "To enable end-to-end retrieval during training, we fix the Index Assignments and only train the query encoder and PQ Centroid Embeddings.". Is this line a typo?
Hi,
Thanks for sharing the code! Does your code support DP (DataParallel) or DDP (DistributedDataParallel) training? I tried running your training script, but it seems to support only single-GPU training. Could you please tell me how to train your model with DP or DDP?
Hi @jingtaozhan,
Thanks for releasing this great repository and interesting paper. I'm interested in evaluating how well the model generalizes across different datasets, for example the datasets in the BEIR Benchmark (https://github.com/UKPLab/beir).
It would really help if sample code were available for evaluating an already trained RepCONC model on a dataset from the BEIR Benchmark.
Thanks!
Kind Regards,
Nandan Thakur