jingtaozhan / repconc

WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

License: MIT License

Python 100.00%
dense-retrieval product-quantization efficiency neural-ranking information-retrieval

repconc's Issues

Release of the training code

Hello,

Thank you very much for this very interesting paper and open sourcing the results.
I was wondering if you were planning on releasing the training code as well?

Thanks,

claeyzre

Unsupervised PQ results

In Table 1, is there any explanation for why the results of unsupervised PQ (MRR@10 = 0.028 at a compression ratio of 64x) are so poor?
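
For readers unfamiliar with the metric, MRR@10 averages, over queries, the reciprocal rank of the first relevant passage within the top 10 results (0 if none appears). A minimal sketch follows; the function name `mrr_at_k` and the dictionary-based input layout are illustrative assumptions, not code from the repconc repository or the official MS MARCO evaluation script.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank at cutoff k.

    rankings: dict mapping query id -> list of passage ids, best first.
    relevant: dict mapping query id -> set of relevant passage ids.
    Both layouts are assumptions made for this sketch.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        # Only the first relevant hit within the top k contributes.
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in rel:
                total += 1.0 / rank
                break
    # Average over the queries present in `rankings`.
    return total / len(rankings)


# Toy data: q1 has its relevant passage at rank 2, q2 has none retrieved.
toy_rankings = {"q1": ["a", "b"], "q2": ["c", "d"]}
toy_relevant = {"q1": {"b"}, "q2": {"x"}}
toy_score = mrr_at_k(toy_rankings, toy_relevant)  # (1/2 + 0) / 2
```

Note that this sketch averages over the queries in `rankings`; an evaluation script that averages over all judged queries would give a lower score if some queries have no ranking.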

In our experience, PQ works reasonably well. For example, when we use PQ with M=32 (compression ratio = 96x) to compress the 768-dimensional vectors generated by the ANCE model, we get MRR@10 = 0.252 on MS MARCO Passage Dev. We used IndexPQ from the FAISS library for this.
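
The 96x figure follows from simple arithmetic, assuming the uncompressed vectors are stored as 32-bit floats and each of the M sub-vector codes takes 8 bits (the issue does not state either, so both are assumptions). The helper name `pq_compression_ratio` is hypothetical.

```python
def pq_compression_ratio(dim, m, nbits=8, bytes_per_float=4):
    """Ratio of raw vector size to PQ code size.

    dim: dimensionality of the uncompressed vector.
    m: number of PQ sub-vectors (one code each).
    nbits: bits per code (assumed 8, i.e. 256 centroids per sub-space).
    bytes_per_float: assumed 4 (float32 storage).
    """
    raw_bytes = dim * bytes_per_float   # e.g. 768 * 4 = 3072 bytes
    code_bytes = m * nbits / 8          # e.g. 32 * 8 / 8 = 32 bytes
    return raw_bytes / code_bytes


ratio_m32 = pq_compression_ratio(768, 32)  # setting described in this issue
ratio_m48 = pq_compression_ratio(768, 48)
```

Under the same assumptions, M=32 gives 96x and M=48 gives 64x on 768-dimensional vectors, so the two MRR@10 numbers being compared here would come from different code sizes.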

Also, when reporting results for unsupervised methods (PQ, ScaNN, OPQ, etc.), which encoder produces the uncompressed input vectors? Is it the trained STAR model?

Fixing of Index Assignments

The RepCONC paper mentions several times that it does not fix index assignments like JPQ.
However, Section 3.6.2 contains a line that seems to contradict this: "To enable end-to-end retrieval during training, we fix the Index Assignments and only train the query encoder and PQ Centroid Embeddings.". Is this line a typo?

Multi-GPU training

Hi,
Thanks for sharing the code! Does your code support DP or DDP training? I tried running your training script, but it seems to support only single-GPU training. Could you please tell me how to train your model with DP or DDP?

Evaluating RepCONC on different datasets in a zero-shot fashion

Hi @jingtaozhan,

Thanks for releasing this super repository and interesting paper. I'm interested in evaluating the model generalization across different datasets. For example, evaluating the model on different datasets from the BEIR Benchmark (https://github.com/UKPLab/beir).

It would really help if sample code were available for evaluating an already trained RepCONC model on a dataset from the BEIR Benchmark.

Thanks!

Kind Regards,
Nandan Thakur
