
Comments (4)

bclavie commented on July 19, 2024

Hey, thank you for flagging! This was caused by RAGatouille not exporting the collection to disk properly, so a far-too-short list was loaded when the index was reloaded. I've just pushed an updated version to PyPI where it should (experimentally) work properly; please let me know if you run into another issue!
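To illustrate the failure mode described above, here is a minimal, hypothetical sketch (not RAGatouille's actual export code) of writing a passage collection to disk and checking on reload that nothing was lost. The `collection.json` file name and the `save_collection` / `load_collection` helpers are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import List


def save_collection(passages: List[str], directory: str) -> str:
    """Write every passage to <directory>/collection.json (hypothetical layout)."""
    path = os.path.join(directory, "collection.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(passages, f, ensure_ascii=False)
    return path


def load_collection(path: str, expected_size: int) -> List[str]:
    """Reload the passages and fail loudly if the list is shorter than the index expects."""
    with open(path, encoding="utf-8") as f:
        passages = json.load(f)
    if len(passages) != expected_size:
        raise ValueError(
            f"collection has {len(passages)} passages, index expects {expected_size}"
        )
    return passages


if __name__ == "__main__":
    docs = ["Passage one.", "Passage two.", "Passage three."]
    with tempfile.TemporaryDirectory() as tmp:
        path = save_collection(docs, tmp)
        restored = load_collection(path, expected_size=len(docs))
        print(len(restored))  # 3
```

Checking the size at load time turns a silent truncation into an immediate, descriptive error instead of an `IndexError` deep inside a later search.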


hochbergg commented on July 19, 2024

I still seem to have this issue on Mac with 0.0.4b1.
I indexed ~7k documents with the following code:

```python
import json

from ragatouille import RAGPretrainedModel
from ragatouille.data import CorpusProcessor
from tqdm import tqdm

from dao import Storage  # the poster's own storage module, not part of RAGatouille

if __name__ == '__main__':
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    db = Storage('db')
    processor = CorpusProcessor()

    # item[2] holds a JSON string; keep only its 'text' field
    documents = []
    for item in tqdm(db.get_all_items()):
        documents.append(json.loads(item[2])['text'])

    print(type(documents), type(documents[0])) # This outputs list, str
    processed = processor.process_corpus(documents)
    print(type(processed), type(processed[0])) # This outputs list, str

    # Build the index; the return value is printed below
    index_name = RAG.index(index_name='rag_test', collection=processed)

    print('---')
    print(index_name)
```

Then I try querying with:

```python
from ragatouille import RAGPretrainedModel

query = "What manga did Hayao Miyazaki write?"
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
results = RAG.search(query, index_name="colbert/indexes/rag_test")

print(results)
```

And get the following error:

```
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/colbert/data/collection.py", line 25, in __getitem__
    return self.data[item]
IndexError: list index out of range
```

With `self.data` being:

```
0 = {str} 'list with 7077 elements starting with...'
1 = {list: 3} [...]
```
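The debugger view appears to show `self.data` holding only two top-level entries rather than one entry per passage, so any passage id returned by the search beyond that range fails. The sketch below reproduces this with a simplified, hypothetical stand-in for the collection class in the traceback (only the `self.data[item]` lookup is taken from it) and adds a check that reports out-of-range ids up front. `find_missing_pids` and the pid values are illustrative, not part of ColBERT or RAGatouille.

```python
from typing import List, Sequence


class Collection:
    """Simplified stand-in: passages are looked up by integer pid, as in the traceback."""

    def __init__(self, data: List[str]):
        self.data = data

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, item: int) -> str:
        return self.data[item]  # raises IndexError when pid >= len(self.data)


def find_missing_pids(collection: Collection, pids: Sequence[int]) -> List[int]:
    """Return the pids that fall outside the collection's valid range."""
    return [pid for pid in pids if not 0 <= pid < len(collection)]


if __name__ == "__main__":
    # A truncated collection with two entries, like the debugger view above
    truncated = Collection(["first entry", "second entry"])
    hit_pids = [0, 1, 4210]  # hypothetical pids returned by a search

    print(find_missing_pids(truncated, hit_pids))  # [4210]
    try:
        [truncated[pid] for pid in hit_pids]
    except IndexError as err:
        print("IndexError:", err)  # list index out of range
```

Negative pids are also reported as missing, since a plain Python list would otherwise silently wrap them around to the end.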


bclavie commented on July 19, 2024

@hochbergg Thanks for flagging, this is a good catch!

This is because I missed an important update in the README. The way to load an existing index is no longer `RAGPretrainedModel.from_pretrained()` but `RAGPretrainedModel.from_index(index_path)`.

I'm also hoping to restore support for the previous way, but I'm still working out the best way for the config to handle it. I'll update the README ASAP to clear up the confusion; let me know if it works for you!


hochbergg commented on July 19, 2024

I changed it to:

```python
query = "What are some examples of adversarial AI attacks?"
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/rag_test")
results = RAG.search(query)

print(results)
```

And it works, thank you! (I haven't seen the README update yet, but extrapolated from other example code.)
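The two query attempts in this thread used different paths: `colbert/indexes/rag_test` in the failing call and `.ragatouille/colbert/indexes/rag_test` in the working one. A small helper can build the path from the index name and confirm it exists before it is handed to `RAGPretrainedModel.from_index()`. This is a sketch only: the `.ragatouille/colbert/indexes/<index_name>` layout is inferred from the path used here rather than from documented behavior, and `default_index_path` / `require_index` are hypothetical names.

```python
from pathlib import Path


def default_index_path(index_name: str, root: str = ".ragatouille") -> Path:
    """Build <root>/colbert/indexes/<index_name> (layout assumed from this thread)."""
    return Path(root) / "colbert" / "indexes" / index_name


def require_index(index_name: str, root: str = ".ragatouille") -> str:
    """Return the index path as a string, or raise if no index directory exists there."""
    path = default_index_path(index_name, root)
    if not path.is_dir():
        raise FileNotFoundError(f"no index directory at {path}")
    return str(path)


if __name__ == "__main__":
    print(default_index_path("rag_test").as_posix())  # .ragatouille/colbert/indexes/rag_test
```

With RAGatouille installed, the result would then be passed on as `RAGPretrainedModel.from_index(require_index("rag_test"))`, so a wrong or missing path fails with a clear message before any model loading happens.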
