Comments (4)
Hey, thank you for flagging! This was caused by RAGatouille not exporting the collection to disk properly, so reloading the index loaded a far-too-short list. I've just pushed an updated version to PyPI where it should (experimentally) work properly; please let me know if you run into another issue!
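What "exporting the collection properly" has to guarantee can be illustrated with a round-trip check: every passage written at indexing time must come back when the index is reloaded. This is a minimal sketch, not RAGatouille's code; `save_collection`, `load_collection`, and the single-JSON-array file format are hypothetical.

```python
import json
import os
import tempfile


def save_collection(collection: list[str], path: str) -> None:
    """Write every passage to disk as one JSON array (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collection, f, ensure_ascii=False)


def load_collection(path: str, expected_size: int) -> list[str]:
    """Reload the passages and fail loudly if any were lost on export (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    if len(collection) != expected_size:
        raise ValueError(
            f"collection on disk has {len(collection)} passages, "
            f"but the index expects {expected_size}"
        )
    return collection


if __name__ == "__main__":
    passages = [f"passage {i}" for i in range(7077)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "collection.json")
        save_collection(passages, path)
        reloaded = load_collection(path, expected_size=len(passages))
        print(len(reloaded))  # 7077
```

Checking the length at load time turns a silent truncation into an immediate, descriptive error instead of an `IndexError` deep inside a later search.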
I still seem to have this issue on Mac with 0.0.4b1.
I indexed ~7k documents with the following code:
```python
import json

from ragatouille import RAGPretrainedModel
from ragatouille.data import CorpusProcessor
from tqdm import tqdm

from dao import Storage  # the reporter's own storage module

if __name__ == '__main__':
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    db = Storage('db')
    processor = CorpusProcessor()

    # Each stored row keeps a JSON payload at index 2; pull out its text field.
    documents = []
    for item in tqdm(db.get_all_items()):
        documents.append(json.loads(item[2])['text'])
    print(type(documents), type(documents[0]))  # This outputs list, str

    processed = processor.process_corpus(documents)
    print(type(processed), type(processed[0]))  # This outputs list, str

    index_name = RAG.index(index_name='rag_test', collection=processed)
    print('---')
    print(index_name)
```
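The two `print` calls above check the collection's shape by eye. A minimal sketch of the same check as a function that fails fast before indexing; `validate_collection` is hypothetical, and rejecting empty or whitespace-only passages is an extra assumption, not a documented RAGatouille requirement.

```python
def validate_collection(collection) -> list[str]:
    """Check that a collection is a non-empty, flat list of non-empty strings (hypothetical helper)."""
    if not isinstance(collection, list) or not collection:
        raise TypeError("collection must be a non-empty list")
    for i, doc in enumerate(collection):
        if not isinstance(doc, str):
            raise TypeError(f"item {i} is {type(doc).__name__}, expected str")
        # Assumption: blank passages are treated as a data error.
        if not doc.strip():
            raise ValueError(f"item {i} is empty or whitespace only")
    return collection
```

In the script above it would wrap the processed corpus, e.g. `processed = validate_collection(processor.process_corpus(documents))`, given that `process_corpus` returns a list of strings as reported for 0.0.4b1.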
Then I try querying with:
```python
from ragatouille import RAGPretrainedModel

query = "What manga did Hayao Miyazaki write?"
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
results = RAG.search(query, index_name="colbert/indexes/rag_test")
print(results)
```
And I get:
```
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/colbert/data/collection.py", line 25, in __getitem__
    return self.data[item]
IndexError: list index out of range
```
With `self.data` being:
```
0 = {str} 'list with 7077 elements starting with...'
1 = {list: 3} [...]
```
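The debugger view explains the traceback: the reloaded `self.data` holds only two entries (a string and a 3-element list) instead of one entry per passage, so looking up any passage id of 2 or more raises `IndexError`. A minimal sketch of a lookup with an explicit bounds check that reports the mismatch more clearly; this `Collection` class is a hypothetical stand-in, not ColBERT's `colbert/data/collection.py`, and the two entries below are placeholders for the contents elided in the debugger output.

```python
class Collection:
    """Minimal stand-in for a passage collection (hypothetical, not ColBERT's class)."""

    def __init__(self, data: list) -> None:
        self.data = data

    def __getitem__(self, pid: int):
        # Same lookup as `return self.data[item]`, but ids outside
        # [0, len(self.data)) produce a descriptive error instead of
        # a bare "list index out of range".
        if not 0 <= pid < len(self.data):
            raise IndexError(
                f"passage id {pid} is out of range for a collection of "
                f"{len(self.data)} entries; the collection on disk may be "
                "incomplete or in an unexpected shape"
            )
        return self.data[pid]


# Placeholder shape matching the debugger view: two entries where
# thousands of passages were expected.
broken = Collection(["<str>", [None, None, None]])
try:
    broken[42]
except IndexError as err:
    print(err)
```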
@hochbergg Thanks for flagging, this is a good catch!
This is due to me having missed an important update in the README. The way to load an existing index is no longer `RAGPretrainedModel.from_pretrained()` but `RAGPretrainedModel.from_index(index_path)`.
I'm also hoping to restore support for the previous way, but am fiddling with the best way for the config to work for this. I'll update the README asap to clear up the confusion, let me know if it works for you!
Changed it to:
```python
from ragatouille import RAGPretrainedModel

query = "What are some examples of adversarial AI attacks?"
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/rag_test")
results = RAG.search(query)
print(results)
```
And it works. Thank you. (Haven't seen the README update yet, but extrapolated from other example code)
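The working call passes the full on-disk location of the index. A small helper can build that path from the index name; `index_path` is hypothetical, and the `.ragatouille/colbert/indexes/<name>` layout is taken from the path used above, so it may differ with other versions or configurations.

```python
from pathlib import Path


def index_path(index_name: str, root: str = ".ragatouille") -> str:
    """Build the location of a named index (layout assumed from the path used in this thread)."""
    # as_posix() keeps forward slashes on every platform.
    return (Path(root) / "colbert" / "indexes" / index_name).as_posix()


print(index_path("rag_test"))  # .ragatouille/colbert/indexes/rag_test
```

With it, the load becomes `RAGPretrainedModel.from_index(index_path("rag_test"))`.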
Related Issues (20)
- How to get token level similarity scores?
- Cannot access pre-trained ColBERT model on Windows 11 (CPU-only)
- ImportError: DLL load failed while importing segmented_maxsim_cpp: The specified module could not be found.
- Can't search with k over 128
- Inconsistent search results length for high top-k values
- Rework Dependencies: ship with barebones dependencies & bundle different features as extras
- 02-basic_training.ipynb fails
- You have a GPU available, but only `faiss-cpu` is currently installed.
- TypeError: array([15055, 320, 22479, 2853, 8197, ..., 374, 3827]) is not JSON serializable
- Can't install on WSL 2 Windows 10 or Crashes (using faiss-gpu)
- mac m1: trainer.train: ImportError: incompatible architecture (have 'x86_64', need 'arm64')
- Pytorch 2.1 on Runpod running Examples hangs with message
- llama-index version 0.10.x not compatible
- Training resume feature isn't available due to removal in upstream ColBERT
- Issue with indexing BGE-M3 (large dimensionality vectors)
- Replace ColBERT with jina-colbert-v1-en
- ImportError: cannot import name 'Document' from 'llama_index' (unknown location)
- ImportError: cannot import name 'LLM' from 'llama_index.core.llms'
- Discussion / forum for RAGatouille?
- Is there a way to quiet the progress bar printout?