Comments (9)

bclavie commented on August 19, 2024 3

Will do @timothepearce

This is (probably) the last feature I'll push before spending some time on important housekeeping (setting up CI, tests, better documentation, tutorials for training on a new language, etc.), but I'm hoping to have it out next week (in beta, just like everything else in RAGatouille at the moment 😄)!

from ragatouille.

bclavie commented on August 19, 2024 2

Hey @hiranya911 @timothepearce, closing this issue as it's now available in 0.0.4a1 #31 🥳

bclavie commented on August 19, 2024 2

> What are the scores returned by the rerank() API? Are they logits (log probabilities) or some other scaled values?

This is a good question and could do with more explaining. These are non-normalised MaxSim scores, which is how ColBERT scores documents: for each query token, compute the cosine similarity with every document token and keep the maximum; the total score is the sum of those per-token maxima. (A good, slightly longer explanation can be found here.) This could be normalised to give a "relevance" estimate.
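To make the scoring concrete, here is a minimal NumPy sketch of MaxSim as described above. It is not RAGatouille's or ColBERT's actual implementation: the function names are made up for illustration, the embeddings are assumed to be plain `(tokens, dim)` arrays, and dividing by the number of query tokens is only one possible way to normalise, not something the library does.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum, over query tokens, of the best cosine similarity with any document token.

    query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim).
    Hypothetical helper for illustration only.
    """
    # L2-normalise each token vector so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    # Keep the max per query token, then sum those maxima.
    return float(sim.max(axis=1).sum())


def normalised_maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Assumed normalisation: average per query token, so the result lies in [-1, 1]."""
    return maxsim_score(query_emb, doc_emb) / query_emb.shape[0]


# Toy example: 2 query tokens, 2 document tokens, 2 dimensions.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [1.0, 1.0]])
raw = maxsim_score(query, doc)
scaled = normalised_maxsim(query, doc)
```

Because the raw score is a sum over query tokens, it grows with query length, which is why raw scores are only comparable between documents scored against the same query.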

> Are there any recommendations on the content length of documents passed into rerank()?

Anything up to the maximum length of your ColBERT's base model is fine (ColBERTv2 is built on bert-base-uncased, so 512 tokens), but the longer the documents, the slower the process. I think it's mostly about finding the sweet spot for you between doc length and efficiency constraints!

okhat commented on August 19, 2024 1

That would be so cool! I have some code for this, @bclavie, I can get it.

QQ for @hiranya911: do you want the docs to be pre-encoded, or supplied at query time?

santhnm2 commented on August 19, 2024

The search function in ColBERT accepts a pids argument which can be used to rank only the given documents.

hiranya911 commented on August 19, 2024

@okhat I think I want to pass the documents as raw text, similar to how the MS MARCO cross-encoder API is set up. But I'm sure passing pre-encoded docs is a valid use case too.

bclavie commented on August 19, 2024

Hey @hiranya911, this is definitely something that I'll be adding to the roadmap (@okhat, please do share the code you have 😄), thanks for the suggestion!

timothepearce commented on August 19, 2024

@bclavie When you're done, ping me here, and I'll PR weaviate/reranker-transformers to add RAGatouille there!

hiranya911 commented on August 19, 2024

This is working like a charm. Thanks for the quick turnaround 🙏

Couple of questions when you have a moment:

  1. What are the scores returned by the rerank() API? Are they logits (log probabilities) or some other scaled values?
  2. Are there any recommendations on the content length of documents passed into rerank()?
