
Faiss indexing assertion (closed)

cmacdonald commented on August 26, 2024
Faiss indexing assertion

from colbert.

Comments (5)

okhat commented on August 26, 2024

That's correct: the assertion was firing on every call to _flush_to_cpu, when it should only be checked after the last one. (The issue doesn't show up when _flush_to_cpu is called only once, which happens to be my setup.)

Just pushed a relaxation of the assertion. It should work for you now.
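
As an illustration of the pattern described above (not ColBERT's actual code), here is a minimal sketch in which a check runs only on the final flush. The class name, the `final` flag, the buffer sizes, and the nature of the check itself (a completeness check on the total item count) are all assumptions made for the example; only the method name _flush_to_cpu comes from the thread.

```python
class BufferedAdder:
    """Hypothetical stand-in for a buffer that is periodically flushed to CPU."""

    def __init__(self, expected_total, buffer_size):
        self.expected_total = expected_total  # assumed: total items we expect to see
        self.buffer_size = buffer_size        # assumed: flush threshold
        self.buffer = []
        self.flushed = []
        self.flush_calls = 0

    def add(self, items):
        for item in items:
            self.buffer.append(item)
            if len(self.buffer) >= self.buffer_size:
                # Intermediate flush: the data is still incomplete here,
                # so the completeness check must not run.
                self._flush_to_cpu(final=False)

    def finish(self):
        # Last flush: only now is it valid to check the total.
        self._flush_to_cpu(final=True)
        return self.flushed

    def _flush_to_cpu(self, final):
        self.flushed.extend(self.buffer)
        self.buffer.clear()
        self.flush_calls += 1
        if final:
            assert len(self.flushed) == self.expected_total


adder = BufferedAdder(expected_total=10, buffer_size=4)
adder.add(range(10))
result = adder.finish()
print(adder.flush_calls, len(result))
```

With a single flush the check would pass either way, which matches the remark that the problem stays hidden when _flush_to_cpu is called only once.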


okhat commented on August 26, 2024

For FAISS indexing, here's a sample command for MS MARCO:

python -m colbert.index_faiss --index_root /root/to/indexes/ --index_name IndexName --root /root/to/experiments --experiment ExperimentName --run OptionalRunName --partitions 65536 --sample 0.3

You can probably drop --sample 0.3, in which case the default (0.05) is used. That will be faster and probably lose no effectiveness (unless you increase the partitions further).
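
For scripting this, a small sketch that assembles the command above as an argument list, leaving out --sample (and --run) unless they are given. The helper name and its parameters are hypothetical; the flags are only those shown in the sample command, and the paths are the placeholders from it.

```python
def build_index_faiss_args(index_root, index_name, root, experiment,
                           partitions, run=None, sample=None):
    """Hypothetical helper: build the argv list for colbert.index_faiss."""
    args = [
        "python", "-m", "colbert.index_faiss",
        "--index_root", index_root,
        "--index_name", index_name,
        "--root", root,
        "--experiment", experiment,
    ]
    if run is not None:
        args += ["--run", run]
    args += ["--partitions", str(partitions)]
    if sample is not None:
        # Omitting --sample falls back to the default mentioned above (0.05).
        args += ["--sample", str(sample)]
    return args


args = build_index_faiss_args(
    "/root/to/indexes/", "IndexName",
    "/root/to/experiments", "ExperimentName",
    partitions=65536,
)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args, check=True) once the placeholder paths are replaced with real ones.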

If this run cannot use faiss-gpu for any reason, you can also reduce --partitions to just 16384. That is fast to index on CPU, at the cost of slightly slower retrieval, so it's a tradeoff. (I just realized this is not relevant to you here, since the error above only occurs with faiss-gpu. In that case, you can use 65536, or even 2x or 4x that, and indexing will still be quick enough.)

The original runs in the paper used just 2000 partitions. Using up to 200k can yield much faster retrieval (much faster than the paper's results too) but you pay the cost upfront during faiss indexing. This is not an issue at all when indexing with faiss-gpu, though.

If you use a large number of partitions, I recommend --nprobe 16 or --nprobe 32 during retrieval. Otherwise, --nprobe 10 is more than enough for small partition numbers.
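
To see roughly why more partitions speed up retrieval even with a higher --nprobe, here is a back-of-the-envelope sketch. It assumes an IVF-style index in which each query visits nprobe of the partitions and the partitions are about equal in size; it ignores the cost of comparing the query against the partition centroids, which grows with the number of partitions. The function name is made up for the example.

```python
def scanned_fraction(partitions, nprobe):
    """Approximate share of the index visited per query (equal-size partitions assumed)."""
    return min(nprobe, partitions) / partitions


# Partition counts and nprobe values taken from the discussion above.
for partitions, nprobe in [(2000, 10), (65536, 16), (65536, 32)]:
    frac = scanned_fraction(partitions, nprobe)
    print(f"partitions={partitions:>6} nprobe={nprobe:>2} scanned~{frac:.4%}")
```

Under these assumptions, 2000 partitions with --nprobe 10 touches about 0.5% of the index per query, while 65536 partitions with --nprobe 32 touches about a tenth of that.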

Let me know if any other parameters are unclear. I will eventually move those to their own set of instructions.


cmacdonald commented on August 26, 2024

Thanks Omar. Looking forward to seeing you tomorrow!


okhat commented on August 26, 2024

Looking forward to seeing you tomorrow as well!


okhat commented on August 26, 2024

Seems resolved. Closing but feel free to re-open if needed.

