Comments (16)
Hey @RubenAMtz,
Thanks for this! There are a few problems here, some due to RAGatouille and one in your code.
1 - The way indexing works is that documents are first embedded, then processed (this is what Iteration 17 means) to create clusters and ensure querying will be super fast. By default, colbert-v2.0 uses 20 k-means iterations, which creates a really strong index! I'll provide an easy way of lowering this in the future for tests, etc. As a workaround, if you'd like to lower it for your own tests, you can do so by first loading RAG normally and then setting RAG.model.config.kmeans_niters = 10 (or any other value).
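Put together, the workaround looks roughly like this (a minimal sketch; the index name and document are placeholders, and lowering kmeans_niters trades index quality for speed):

```python
from ragatouille import RAGPretrainedModel

# Load the pretrained checkpoint as usual.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Lower the number of k-means iterations before calling .index()
# (the default is 20; fewer iterations = faster but weaker index).
RAG.model.config.kmeans_niters = 10

# Placeholder document and index name, just for a quick test.
RAG.index(collection=["a tiny test document"], index_name="kmeans_test")
```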
2 - RAGatouille currently ships with faiss-cpu as the default install, because it supports all platforms and doesn't require a GPU. For indexing, faiss-gpu is much quicker (cc @timothepearce, this is relevant to you too). I need to figure out a way to easily change which one is installed depending on the user's platform, or add a warning at indexing time; faiss is finicky because the two builds are entirely separate packages...
In the meantime, you can manually use faiss-gpu by installing it via pip:
pip uninstall faiss-cpu
pip install faiss-gpu
This should massively speed up indexing! (It'll still be slow!)
In an upcoming release (soon, hopefully), I'll be adding more warnings, both in the documentation and when running .index() so the user is at least made aware more clearly!
3 - The one issue that is on your end: add_to_index should be used very sparingly! With the way ColBERT works, for large volumes of documents it's generally more efficient (especially with faiss-gpu!) to just rebuild the index. For indexing large collections, load your data into memory and send it all to RAG.index() in one go, without creating batches (the documents will automatically be processed in batches by .index()).
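As a sketch of that pattern (the corpus path and index name are assumptions; the point is the single .index() call instead of repeated add_to_index() calls):

```python
from pathlib import Path

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Load the whole collection into memory first...
documents = [p.read_text() for p in Path("my_corpus/").glob("*.txt")]

# ...then pass it all in one go; .index() batches documents internally,
# so there is no need to loop with add_to_index().
RAG.index(collection=documents, index_name="my_corpus")
```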
from ragatouille.
This is pretty much what I get on WSL2: zero progress, no CPU or GPU usage. I very minorly modified some of the upstream ColBERT code to disable distributed processing (I kept getting remote-node errors with torch), but I'm just stuck here, even on the toy wiki example in notebook 1. Running it as a .py script wrapped in main gives the same result.
Anyone found a way around this?
Hey,
Thanks to all of you for flagging up the issues! This is all quite odd, and there appears to be quite a lot of variability in how well it runs on Windows/WSL, with some people reporting it working great and (seemingly many) others having all sorts of issues. I appreciate this is quite frustrating!
Supporting Windows is currently not something I can prioritise, but I'd greatly appreciate it if someone managed to figure out what exactly in the upstream library is causing these issues 🤔
In the meantime, the new .rerank() function (example here) could maybe fare better on Windows because it doesn't rely on multiprocessing. Not a perfect substitute for full-corpus ColBERT search sadly, but could be worth a try!
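A rough usage sketch (the query and candidate documents are made up; in practice the candidates would typically come from a cheaper first-stage retriever such as BM25):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Rerank a small candidate set in memory -- no index is built,
# so the multiprocessing-heavy indexing path is never touched.
results = RAG.rerank(
    query="What is ColBERT?",
    documents=[
        "ColBERT is a late-interaction retrieval model.",
        "FAISS is a library for efficient vector similarity search.",
    ],
    k=2,
)
```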
Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case).
Hey,
Multiprocessing is no longer enforced for indexing when using no GPU or a single GPU, thanks to @Anmol6's excellent upstream work in stanford-futuredata/ColBERT#290, propagated here by #51.
This is likely to fix the indexing problems on Windows (or at least one of them). Performance may still be worse than on Linux, but it should at least start and run properly! Let me know if this solves the issue.
@bclavie I see, it makes sense. I've implemented the changes except for the kmeans_niters parameter; however, I've been waiting for around 30 minutes on this screen:
GPU usage is still at 0. Is the long waiting time expected? Maybe I need to adjust kmeans_niters as you suggested.
Thanks, @bclavie, I'll give it a try and keep an eye on this issue. Hopefully someone with the time and expertise will come along to find out what is causing these issues.
@bclavie Thanks for the response! Will share any details if I can nail it down.
> In the meantime, the new .rerank() function (example here) could maybe fare better on Windows because it doesn't rely on multiprocessing. Not a perfect substitute for full-corpus ColBERT search sadly, but could be worth a try!

This one ran quickly and painlessly on my WSL2 setup.
FYI - this PR in ColBERT fixed it! Indexing took under 2 seconds in the 01-basic indexing notebook! It was definitely related to distributed mode on a single GPU/workstation:
stanford-futuredata/ColBERT#290
Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for mps support on MacBooks).
Can't thank @Anmol6 enough for taking this on!
> Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for mps support on MacBooks). Can't thank @Anmol6 enough for taking this on!

Just tried the last part of example 2, and I'm getting that same error as before. The trainer is definitely forcing distributed torch, but the collection indexer fix resolved indexing. Good sign!
> Yeah training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority as training on windows is a rarer use case)

Totally - seeing the indexing process work gives me the weekend to explore how it all fits together; much of this is intuitive so far. Appreciate it; looking forward to this project as it grows!
I've gotten it to work relatively quickly on WSL2 by using Python 3.10 and pinning torch to 2.0.1. I'm running CUDA 12.3 on Ubuntu 22.04. This is what I did to successfully install and run RAGatouille:
conda create -n rag python=3.10
conda activate rag
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
git clone https://github.com/bclavie/RAGatouille
cd RAGatouille/
pip install -e .
pip uninstall faiss-cpu
conda install faiss-gpu
To get started, I used a slightly modified version of the code included in the README.md to index my Obsidian notes, which only took about half a minute in total.
I also successfully indexed and queried a large text corpus of roughly 1 GB for testing. It did in fact take a very long time to start noticeably using the GPU, but the entire process finished within roughly 2 hours.
Some metrics on running queries against an index of this size:

| Conditions | Time to response |
| --- | --- |
| First run after a cold start of WSL2 | 3 minutes until first response |
| Second run | 30 seconds until first response |
| Consecutive queries without restarting the interpreter | less than 1 second 🤯 |
I've uploaded the two scripts I'm using to index and query the database. The search script includes some code to postprocess the resulting documents using llama2 hosted by a local ollama server.
create_index.py
do_search.py
It generates surprisingly good and consistent results from my very limited tests.
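For anyone who can't grab the scripts, the search-and-postprocess step might look roughly like this (a sketch, not the uploaded code; the index path, model name, and prompt are assumptions, and it expects an ollama server on its default port):

```python
import requests

from ragatouille import RAGPretrainedModel

# Load the previously built index and retrieve a few passages.
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")
hits = RAG.search(query="What did I write about ColBERT?", k=3)

# Hand the retrieved passages to llama2 via the local ollama API.
context = "\n\n".join(hit["content"] for hit in hits)
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": f"Answer using these notes:\n{context}\n\nQuestion: ...",
        "stream": False,
    },
)
print(response.json()["response"])
```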
I just got myself a Mac Studio M2 Ultra, and have been running this on WSL2 + CUDA (RTX 4090) and now on the Mac. So far, no more issues on either (I haven't run all the example notebooks yet, just the first few). Bravo, team.