jingtaozhan / repconc

WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

License: MIT License

Python 100.00%

dense-retrieval product-quantization efficiency neural-ranking information-retrieval

repconc's Issues

Release of the training code

Hello,

Thank you very much for this very interesting paper and open sourcing the results.
I was wondering whether you are planning to release the training code as well?

Thanks,

claeyzre

Unsupervised PQ results

In Table 1, is there an explanation for why the results of unsupervised PQ (MRR@10 = 0.028 at a compression ratio of 64x) are so poor?

In our experience, PQ works reasonably well. For example, when we use PQ with M=32 (compression ratio = 96x) to compress the 768-dimensional vectors generated by the ANCE model, we get MRR@10 = 0.252 on MS MARCO Passage Dev. We used IndexPQ from the FAISS library for this.

Also, for the results reported on unsupervised methods (PQ, ScaNN, OPQ, etc.), which encoder produces the uncompressed input vectors? Is it the trained STAR model?
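
The compression ratios quoted in this issue can be checked with a little arithmetic. The sketch below is only an illustration and rests on two assumptions that the issue does not state: the uncompressed vectors are stored as 4-byte floats, and each PQ sub-vector is encoded with 8 bits. The function name `pq_compression_ratio` is hypothetical.

```python
def pq_compression_ratio(dim: int, num_subvectors: int,
                         nbits: int = 8, bytes_per_float: int = 4) -> float:
    """Ratio of the raw vector size to the PQ code size, both in bytes.

    Assumptions (not stated in the issue): float32 inputs and
    8 bits per sub-vector code.
    """
    raw_bytes = dim * bytes_per_float        # e.g. 768 * 4 = 3072 bytes
    code_bytes = num_subvectors * nbits / 8  # e.g. 32 * 8 / 8 = 32 bytes
    return raw_bytes / code_bytes


# ANCE setting from the issue: 768 dimensions, M = 32 sub-vectors.
ratio_m32 = pq_compression_ratio(dim=768, num_subvectors=32)
# Under the same assumptions, a 64x ratio corresponds to M = 48.
ratio_m48 = pq_compression_ratio(dim=768, num_subvectors=48)

print(ratio_m32)  # 96.0
print(ratio_m48)  # 64.0
```

Under these assumptions, M=32 reproduces the 96x figure, and a 64x ratio on 768-dimensional vectors would correspond to M=48. Whether the paper's baseline used that setting is not something this sketch can confirm.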

Fixing of Index Assignments

The RepCONC paper mentions several times that, unlike JPQ, it does not fix the index assignments.
However, Section 3.6.2 contains a line that seems to contradict this: "To enable end-to-end retrieval during training, we fix the Index Assignments and only train the query encoder and PQ Centroid Embeddings.". Is this line a typo?
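
To make the quoted sentence concrete, here is a toy sketch of what "fixing the index assignments while training the centroid embeddings" means in generic product quantization. It is not the RepCONC or JPQ implementation; the function `reconstruct`, the variable names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List


def reconstruct(codes: List[int],
                centroids: List[List[List[float]]]) -> List[float]:
    """Rebuild a document vector by concatenating one centroid per sub-space.

    codes[m] is the index assignment for sub-space m;
    centroids[m][k] is the k-th centroid embedding of sub-space m.
    """
    vector: List[float] = []
    for m, k in enumerate(codes):
        vector.extend(centroids[m][k])
    return vector


# Toy setup (hypothetical values): 2 sub-spaces, 2 centroids each, sub-dimension 2.
centroids = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 2.0], [3.0, 3.0]],
]
codes = [1, 0]  # index assignments for one document, kept fixed

before = reconstruct(codes, centroids)  # [1.0, 1.0, 2.0, 2.0]

# Simulate a training step that moves one centroid; the codes are untouched.
centroids[0][1] = [1.5, 0.5]
after = reconstruct(codes, centroids)   # [1.5, 0.5, 2.0, 2.0]

print(before, after, codes)
```

In this picture the document's stored codes never change, so the compressed index does not need to be rebuilt, yet the reconstructed document vector still moves because the centroid it points to was updated.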

Multi-gpu training

Hi,
Thanks for sharing the code! Does your code support DataParallel (DP) or DistributedDataParallel (DDP) training? I tried running your training script, but it seems to support only a single GPU. Could you please tell me how to train your model with DP or DDP?

Evaluating RepCONC on different datasets in a zero-shot fashion

Hi @jingtaozhan,

Thanks for releasing this great repository and interesting paper. I'm interested in evaluating how well the model generalizes across different datasets, for example on datasets from the BEIR Benchmark (https://github.com/UKPLab/beir).

It would really help if sample code were available for evaluating an already trained RepCONC model on a dataset from the BEIR Benchmark.

Thanks!

Kind Regards,
Nandan Thakur
