
Comments (16)

bclavie commented on July 19, 2024

Hey @RubenAMtz,

Thanks for this! There are a few problems here, some of them due to RAGatouille and one in your code.

1 - The way indexing works is that documents are first embedded, then processed (this is what Iteration 17 refers to) to create clusters and ensure querying will be super fast. By default, colbert-v2.0 uses 20 k-means iterations, which creates a really strong index! I'll provide an easy way of lowering this in the future for tests, etc... As a workaround, if you'd like to lower it for your own tests, you can do so by first loading RAG normally and then setting RAG.model.config.kmeans_niters = 10 (or any other value).
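A quick sketch of that workaround (colbert-ir/colbertv2.0 is the usual checkpoint; 10 is just an example value):

```python
from ragatouille import RAGPretrainedModel

# Load the pretrained model as usual...
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# ...then lower the number of k-means iterations before calling .index().
# The default is 20; a smaller value trades index quality for speed in tests.
RAG.model.config.kmeans_niters = 10
```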

2 - RAGatouille currently ships with faiss-cpu as the default install, because it supports all platforms and doesn't require a GPU. For indexing, faiss-gpu is much quicker (cc @timothepearce, this is relevant to you too). I need to figure out a way to easily change which one is installed depending on the user's platform, or add a warning at indexing time; faiss is finicky because the CPU and GPU builds are entirely separate packages...

In the meantime, you can manually use faiss-gpu by installing it via pip:

```bash
pip uninstall faiss-cpu
pip install faiss-gpu
```

This should massively speed up indexing! (It'll still be slow!)

In an upcoming release (soon, hopefully), I'll be adding more warnings, both in the documentation and when running .index() so the user is at least made aware more clearly!

3 - The one issue that is on your end: add_to_index should be used very sparingly! With the way ColBERT works, for large volumes of documents it's generally more efficient (especially with faiss-gpu!) to just rebuild the index. For indexing large collections, you'll need to load your data into memory and send it all to RAG.index() in one go, without creating batches (the documents will automatically be processed in batches by .index()).
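As a sketch (load_my_documents is a hypothetical placeholder for however you read your corpus into a list of strings):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Hypothetical helper: stands in for however you load your corpus
# into memory as a list of strings.
all_documents = load_my_documents()

# One single .index() call over the whole collection; it batches the
# documents internally, so no manual batching or add_to_index() loop.
RAG.index(collection=all_documents, index_name="my_index")
```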


fblissjr commented on July 19, 2024

This is pretty much what I get on WSL2: zero progress, no CPU or GPU usage. I made some very minor modifications to the upstream ColBERT code to disable distributed processing (I kept getting remote node errors with torch), but I'm still stuck here, even on the toy wiki example in notebook 1. Running it as a .py script wrapped in main gives the same result.

Anyone found a way around this?


bclavie commented on July 19, 2024

Hey,

Thanks to all of you for flagging these issues! This is all quite odd: there seems to be a lot of variability in how well it runs on Windows/WSL, with some people reporting it working great and (seemingly many) others having all sorts of issues. I appreciate this is frustrating!

Supporting Windows is currently not something I can prioritise, but I'd greatly appreciate it if someone managed to figure out what exactly in the upstream library is causing these issues 🤔


bclavie commented on July 19, 2024

In the meantime, the new .rerank() function (example here) could fare better on Windows because it doesn't rely on multiprocessing. Not a perfect substitute for full-corpus ColBERT search, sadly, but it could be worth a try!
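A rough sketch of the call (the query, candidate documents, and k here are placeholders):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# .rerank() scores an in-memory list of candidate documents against a
# query directly, without building an index first.
results = RAG.rerank(
    query="What does late interaction mean in ColBERT?",
    documents=[
        "ColBERT scores queries against documents via late interaction.",
        "WSL2 runs a Linux kernel inside Windows.",
        "FAISS is a library for fast vector similarity search.",
    ],
    k=2,
)
```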


bclavie commented on July 19, 2024

Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case)


bclavie commented on July 19, 2024

Hey,

Multiprocessing is no longer enforced for indexing when using no GPU or a single GPU, thanks to @Anmol6's excellent upstream work on stanford-futuredata/ColBERT#290, propagated here by #51.

This is likely to fix the indexing problems on Windows (or at least, one of them). Performance may still be worse than on Linux, but it should at least start and run properly! Let me know if this solves the issue.


RubenAMtz commented on July 19, 2024

@bclavie I see, that makes sense. I've implemented the changes, except for the kmeans_niters parameter; however, I've now been waiting for around 30 minutes on this screen:

[screenshot of the stalled indexing output]

GPU usage is still at 0 - is the long waiting time expected? Maybe I need to adjust niters as you suggested.


RubenAMtz commented on July 19, 2024

Thanks, @bclavie, I'll give it a try and keep an eye on this issue; hopefully someone with the time and expertise will come along to figure out what is causing the issues.


fblissjr commented on July 19, 2024

@bclavie Thanks for the response! Will share any details if I can nail it down.


fblissjr commented on July 19, 2024

> In the meantime, the new .rerank() function (example here) could fare better on Windows because it doesn't rely on multiprocessing. Not a perfect substitute for full-corpus ColBERT search, sadly, but it could be worth a try!

This one ran quickly and painlessly on my WSL2 setup.


fblissjr commented on July 19, 2024

FYI - this PR in ColBERT fixed it! The 01-basic indexing notebook now indexes in under 2 seconds! It was definitely related to distributed mode on a single GPU / workstation.

stanford-futuredata/ColBERT#290


bclavie commented on July 19, 2024

Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for MPS support on MacBooks)

Can't thank @Anmol6 enough for taking this on!


fblissjr commented on July 19, 2024

> Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for MPS support on MacBooks)
>
> Can't thank @Anmol6 enough for taking this on!

Just tried the last part of example 2 and I'm getting the same error as before. The trainer is definitely still forcing distributed torch, but the collection indexer fix did sort out indexing. Good sign!


fblissjr commented on July 19, 2024

> Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case)

Totally - seeing the indexing process work gives me the weekend to explore how it all works; much of it is intuitive so far. Appreciate it, looking forward to this project as it grows!


TheMcSebi commented on July 19, 2024

I've gotten it to work relatively quickly on WSL2 by using Python 3.10 and pinning torch to 2.0.1. I'm running CUDA 12.3 on Ubuntu 22.04. This is what I did to successfully install and run RAGatouille:

```bash
conda create -n rag python=3.10
conda activate rag
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
git clone https://github.com/bclavie/RAGatouille
cd RAGatouille/
pip install -e .
pip uninstall faiss-cpu
conda install faiss-gpu
```

To get started, I used a slightly modified version of the code included in the README.md to index my Obsidian notes, which only took about half a minute in total.
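For reference, a minimal sketch of what that indexing code can look like (the vault path and index name are placeholders):

```python
from pathlib import Path
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Collect the text of every Markdown note in the vault
# (the path is a placeholder).
notes = [
    p.read_text(encoding="utf-8")
    for p in Path("~/obsidian-vault").expanduser().rglob("*.md")
]

RAG.index(collection=notes, index_name="obsidian_notes")
```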

I also successfully indexed and queried a large text corpus of roughly 1 GB for testing. It did in fact take a very long time to start noticeably using the GPU, but the entire process still finished within roughly 2 hours.

Some metrics from running queries against an index of this size:

| Conditions | Time to response |
| --- | --- |
| First run after a cold start of WSL2 | 3 minutes until first response |
| Second run | 30 seconds until first response |
| Consecutive queries without restarting the interpreter | less than 1 second 🤯 |

I've uploaded the two scripts I'm using to index and query the database; the search script includes some code to postprocess the resulting documents using llama2 hosted by a local ollama server:

- create_index.py
- do_search.py

The whole setup generates surprisingly good and consistent results in my very limited tests.
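For reference, a minimal sketch of what the query side can look like (the index path is just RAGatouille's default output location; the ollama postprocessing step is omitted here):

```python
from ragatouille import RAGPretrainedModel

# Load the existing index from disk instead of rebuilding it.
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/obsidian_notes")

results = RAG.search(query="What did I write about backups?", k=5)
for result in results:
    print(result["rank"], result["score"], result["content"][:80])
```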

from ragatouille.

fblissjr commented on July 19, 2024

I just got myself a Mac Studio M2 Ultra, and have been running this on WSL2 + CUDA (RTX 4090) and now on the Mac. No more issues on either so far (I haven't run all the example notebooks yet, just the first few). Bravo, team.
