Clarification on PLAID retrieval (colbert issue, closed)

thibault-formal commented on August 26, 2024

Clarification on PLAID retrieval

from colbert.

Comments (6)

okhat commented on August 26, 2024

Hey Thibault! Hope you’re well.

I can check, but basically we return k documents while computing exact scores for a larger number of candidates than k.

okhat commented on August 26, 2024

This sounds right, but @santhnm2 might be able to confirm too

thibault-formal commented on August 26, 2024

Hey Omar! I hope you are well too!
I see -- so basically

in the k=10 scenario, you compute 256/4=64 exact scores
in the k=100 scenario, you compute 1024/4=256 exact scores
etc.
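
The arithmetic in the two cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `num_exact_scores` and the `NDOCS_BY_K` table are hypothetical, and only the two candidate budgets quoted above (256 and 1024) are included, since the values for larger k are not stated here.

```python
# Candidate budgets for the two cases quoted in the thread.
# Budgets for other values of k are not stated there, so they are left out.
NDOCS_BY_K = {10: 256, 100: 1024}


def num_exact_scores(ndocs: int) -> int:
    """Number of candidates that get an exact score: a quarter of the budget."""
    return ndocs // 4


for k, ndocs in NDOCS_BY_K.items():
    print(f"k={k}: budget={ndocs}, exact scores={num_exact_scores(ndocs)}")
```

In both cases the number of exactly scored candidates is well above the k results that are finally returned.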

Just checking that I understood this correctly, as I have been working on related stuff :)
Thanks

santhnm2 commented on August 26, 2024

Yes, this is correct. This function is where we choose the hyperparameters according to k: https://github.com/stanford-futuredata/ColBERT/blob/main/colbert/searcher.py#L88
And here is where the number of exact scores is computed:

ColBERT/colbert/search/index_storage.py, line 152 in fc3ce55:

pids = pids[torch.topk(approx_scores, k=(config.ndocs // 4)).indices]
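
For readers following along, here is a rough pure-Python sketch of the two-stage idea behind that line: keep the top `ndocs // 4` candidates by approximate score, compute exact scores only for those, and return the best k. It is not the ColBERT implementation, which works on torch tensors. The function `prune_then_rerank`, the `exact_score_fn` callback, and the toy scores are all hypothetical.

```python
import heapq


def prune_then_rerank(approx_scores, exact_score_fn, ndocs, k):
    """Prune by approximate score, score survivors exactly, return the top k.

    approx_scores: dict mapping passage id -> approximate score (toy data here).
    exact_score_fn: hypothetical callback returning the exact score of a pid.
    """
    n_exact = ndocs // 4
    # Stage 1: keep the ndocs // 4 passage ids with the best approximate scores.
    survivors = heapq.nlargest(n_exact, approx_scores, key=approx_scores.get)
    # Stage 2: compute exact scores for the survivors only.
    exact = {pid: exact_score_fn(pid) for pid in survivors}
    # Final answer: the k best passage ids by exact score.
    top_k = heapq.nlargest(k, exact, key=exact.get)
    return top_k, len(exact)


# Toy data: 256 candidates whose approximate score decreases with the pid.
toy_approx = {pid: 1000.0 - pid for pid in range(256)}
top_k, n_scored = prune_then_rerank(toy_approx, lambda pid: float(pid), ndocs=256, k=10)
print(f"returned {len(top_k)} results after {n_scored} exact scores")
```

The point of the sketch is only that the expensive exact scoring touches `ndocs // 4` candidates, while the caller still receives k results.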

thibault-formal commented on August 26, 2024

Perfect, thank you both for the quick answer!

thibault-formal commented on August 26, 2024

Hi again,

I have another (unrelated) question regarding PLAID: did you evaluate the performance on the BEIR benchmark? Could there be a performance drop (OOD) due to the approximation?

EDIT: I saw the LoTTE results (apparently no drop), but I wonder if that also holds on BEIR.

Thanks
