Comments (13)
Hi, for now I have a workaround solution. Please try wrapping your entry point like this:
if __name__ == '__main__':
    main()  # your main code
My code:
from ragatouille import RAGPretrainedModel
import requests


def run():
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    def get_wikipedia_page(title: str):
        """
        Retrieve the full text content of a Wikipedia page.

        :param title: str - Title of the Wikipedia page.
        :return: str - Full text content of the page as a raw string.
        """
        # Wikipedia API endpoint
        URL = "https://en.wikipedia.org/w/api.php"
        # Parameters for the API request
        params = {
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        }
        # Custom User-Agent header to comply with Wikipedia's best practices
        headers = {"User-Agent": "RAGatouille_tutorial/0.0.1 ([email protected])"}
        response = requests.get(URL, params=params, headers=headers)
        data = response.json()
        # Extract the page content from the API response
        page = next(iter(data["query"]["pages"].values()))
        return page["extract"] if "extract" in page else None

    full_document = get_wikipedia_page("Hayao_Miyazaki")
    RAG.index(
        collection=[full_document],
        index_name="Miyazaki-1234",
        max_document_length=180,
        split_documents=True,
    )
    results = RAG.search(query="What animation studio did Miyazaki found?", k=3)
    print(results)


if __name__ == '__main__':
    run()
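For context on why the guard matters: on macOS and Windows, multiprocessing defaults to the "spawn" start method, which re-imports the main module in every child process, so any top-level indexing code runs again in the child. A stdlib-only sketch to check which start method your platform uses (the helper name here is illustrative, not part of RAGatouille):

```python
import multiprocessing as mp


def current_start_method() -> str:
    # "fork" on Linux, "spawn" on macOS and Windows. Under "spawn",
    # each child process re-imports the main module, so top-level
    # code such as RAG.index(...) runs again in the child and trips
    # multiprocessing's "safe importing of main module" check.
    return mp.get_start_method()


if __name__ == "__main__":
    print(current_start_method())
```

If this prints "spawn", any multiprocessing-using call must sit behind the `if __name__ == '__main__'` guard.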
Reinstalling the environment solved the issue
Hey! Thank you for flagging, this is helping track down what's been causing a stream of new problems.
Since this is a multiprocessing error, I believe it could be related to the recent changes around multiprocessing in upstream ColBERT. Tagging @Anmol6, who's leading the work on this -- that might have to be tweaked or reverted, as it seems to have caused a lot of unexpected issues.
@excubo-jg, would you be able to try with the previous version of ColBERT to check whether it works fine? pip uninstall colbert-ai && pip install colbert-ai==0.2.16
@bclavie I tried, but this did not change the outcome.
I'm having a similar issue on an M2.
❯ python3 main.py
[Jan 17, 23:40:44] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
[Jan 17, 23:40:51] #> Note: Output directory .ragatouille/colbert/indexes/my_index already exists
[Jan 17, 23:40:55] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
[Jan 17, 23:40:56] #> Note: Output directory .ragatouille/colbert/indexes/my_index already exists
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
prepare(preparation_data)
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "/Users/swayamchidrawar/repos/ragtest/main.py", line 10, in <module>
index_path = RAG.index(index_name="my_index", collection=my_documents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 125, in index
return self.model.index(
^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 204, in index
self.indexer.index(
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 78, in index
self.__launch(collection)
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 83, in __launch
manager = mp.Manager()
^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/managers.py", line 563, in start
self._process.start()
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
To fix this issue, refer to the "Safe importing of main module"
section in https://docs.python.org/3/library/multiprocessing.html
Traceback (most recent call last):
File "/Users/swayamchidrawar/repos/ragtest/main.py", line 10, in <module>
index_path = RAG.index(index_name="my_index", collection=my_documents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 125, in index
return self.model.index(
^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 204, in index
self.indexer.index(
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 78, in index
self.__launch(collection)
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 83, in __launch
manager = mp.Manager()
^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/managers.py", line 567, in start
self._address = reader.recv()
^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
raise EOFError
EOFError
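The traceback above shows `mp.Manager()` being launched while the child process is still bootstrapping the main module. A minimal stdlib reproduction of the fixed pattern, with no RAGatouille involved, looks like this:

```python
import multiprocessing as mp


def run():
    # mp.Manager() starts a helper process. If this call sits at module
    # top level, a spawn-started child re-imports the module, reaches
    # this line again, and raises the RuntimeError shown above.
    with mp.Manager() as manager:
        shared = manager.list()
        shared.append("indexed")
        return list(shared)


if __name__ == "__main__":
    # The guard keeps the Manager from starting while a child process
    # is re-importing the main module.
    print(run())
```

The same structure applies to the RAGatouille case: keep the `RAG.index(...)` call inside a function that is only invoked behind the guard.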
Hey,
Appreciate the feedback, guys! I've got less time than I'd like to track it down because of health (big thanks to @Anmol6 for helping), but so far we strongly suspect a dependency issue and are trying to pinpoint exactly why it only happens to a (currently seemingly random) subset of users.
Would you be able to:
- Try once again with a downgraded ColBERT version, 0.2.15 this time? (pip uninstall -y colbert-ai && pip install colbert-ai==0.2.15)
- Send us your full pip freeze? Via https://pastebin.com if possible!
Thanks a lot
Oh, sorry to hear that. I hope you get well soon!
- Downgrading to 0.2.15 has no impact.
Here is the output from pip freeze:
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.2.0
attrs==23.2.0
beautifulsoup4==4.12.2
bitarray==2.9.2
blinker==1.7.0
blis==0.7.11
catalogue==2.0.10
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
colbert-ai==0.2.17
confection==0.1.4
cymem==2.0.8
dataclasses-json==0.6.3
datasets==2.16.1
Deprecated==1.2.14
dill==0.3.7
distro==1.9.0
faiss-cpu==1.7.4
filelock==3.13.1
Flask==3.0.0
frozenlist==1.4.1
fsspec==2023.10.0
git-python==1.0.3
gitdb==4.0.11
GitPython==3.1.41
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.2
idna==3.6
itsdangerous==2.1.2
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
langchain==0.1.0
langchain-community==0.0.13
langchain-core==0.1.11
langcodes==3.3.0
langsmith==0.0.80
llama-index==0.9.32
MarkupSafe==2.1.3
marshmallow==3.20.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio==1.5.9
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numpy==1.26.3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
onnx==1.15.0
openai==1.7.2
packaging==23.2
pandas==2.1.4
pillow==10.2.0
preshed==3.0.9
protobuf==4.25.2
psutil==5.9.7
pyarrow==14.0.2
pyarrow-hotfix==0.6
pydantic==2.5.3
pydantic_core==2.14.6
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
PyYAML==6.0.1
RAGatouille==0.0.4b2
regex==2023.12.25
requests==2.31.0
ruff==0.1.13
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.11.4
sentence-transformers==2.2.2
sentencepiece==0.1.99
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.25
srsly==2.4.8
sympy==1.12
tenacity==8.2.3
thinc==8.2.2
threadpoolctl==3.2.0
tiktoken==0.5.2
tokenizers==0.15.0
torch==2.1.2
torchvision==0.16.2
tqdm==4.66.1
transformers==4.36.2
triton==2.1.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions==4.9.0
tzdata==2023.4
ujson==5.9.0
urllib3==2.1.0
voyager==2.0.2
wasabi==1.1.2
weasel==0.3.4
Werkzeug==3.0.1
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
+1 to this. I'm experiencing a similar issue on Ubuntu 22.04 + LangChain, though I also tried the library apart from the LangChain integration.
If this detail helps narrow it down: in-memory indexing and rerank both work; the problem is creating the index files. It seems that flow gets interrupted.
@VidyasagarDudekula Nice, this worked for me 👍
I just checked with the current version, and the error persists.
@VidyasagarDudekula describes how to fix https://python.langchain.com/docs/integrations/retrievers/ragatouille, which is not the indexing example.
@excubo-jg could you try again with the newer version?
I did try with 6b0 and still got the same error.