Comments (13)

VidyasagarDudekula commented on July 19, 2024

Hi, for now I have a workaround. Please try wrapping your entry point in a main guard:

if __name__ == '__main__':
    main() #your main code

My code:

import requests
from ragatouille import RAGPretrainedModel


def run():
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    def get_wikipedia_page(title: str):
        """
        Retrieve the full text content of a Wikipedia page.

        :param title: str - Title of the Wikipedia page.
        :return: str - Full text content of the page as raw string.
        """
        # Wikipedia API endpoint
        URL = "https://en.wikipedia.org/w/api.php"

        # Parameters for the API request
        params = {
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        }
        # Custom User-Agent header to comply with Wikipedia's best practices
        headers = {"User-Agent": "RAGatouille_tutorial/0.0.1 ([email protected])"}

        response = requests.get(URL, params=params, headers=headers)
        data = response.json()

        # Extracting page content
        page = next(iter(data["query"]["pages"].values()))
        return page["extract"] if "extract" in page else None
    full_document = get_wikipedia_page("Hayao_Miyazaki")
    RAG.index(
        collection=[full_document],
        index_name="Miyazaki-1234",
        max_document_length=180,
        split_documents=True,
    )
    results = RAG.search(query="What animation studio did Miyazaki found?", k=3)
    print(results)
if __name__ == '__main__':
    run()

from ragatouille.

excubo-jg commented on July 19, 2024

Reinstalling the environment solved the issue.

bclavie commented on July 19, 2024

Hey! Thank you for flagging, this is helping track down what's causing a stream of new problems.

This being a multiprocessing error, I believe it could be related to the recent changes around multiprocessing in upstream ColBERT. Tagging @Anmol6 who's leading the work on this -- that might have to be tweaked or reverted as it seems to have caused a lot of unexpected issues.

@excubo-jg, would you be able to try with the previous version of ColBERT to check if it works fine? pip uninstall colbert-ai && pip install colbert-ai==0.2.16.

excubo-jg commented on July 19, 2024

@bclavie I tried, but this did not change the outcome.

sway4em commented on July 19, 2024

I'm having a similar issue, on M2.

❯ python3 main.py
[Jan 17, 23:40:44] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(


[Jan 17, 23:40:51] #> Note: Output directory .ragatouille/colbert/indexes/my_index already exists


[Jan 17, 23:40:55] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(


[Jan 17, 23:40:56] #> Note: Output directory .ragatouille/colbert/indexes/my_index already exists


Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/swayamchidrawar/repos/ragtest/main.py", line 10, in <module>
    index_path = RAG.index(index_name="my_index", collection=my_documents)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 125, in index
    return self.model.index(
           ^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 204, in index
    self.indexer.index(
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 78, in index
    self.__launch(collection)
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 83, in __launch
    manager = mp.Manager()
              ^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/managers.py", line 563, in start
    self._process.start()
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Traceback (most recent call last):
  File "/Users/swayamchidrawar/repos/ragtest/main.py", line 10, in <module>
    index_path = RAG.index(index_name="my_index", collection=my_documents)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 125, in index
    return self.model.index(
           ^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 204, in index
    self.indexer.index(
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 78, in index
    self.__launch(collection)
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/site-packages/colbert/indexer.py", line 83, in __launch
    manager = mp.Manager()
              ^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/managers.py", line 567, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/Users/swayamchidrawar/anaconda3/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError
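
For readers following the traceback: the RuntimeError comes from the child process that mp.Manager() starts. Under the spawn start method that child re-imports main.py, reaches the top-level RAG.index(...) call again, and tries to start another process while it is still bootstrapping; the parent then sees the EOFError. The sketch below shows the guarded pattern using only the standard library. It is not ColBERT's code: build_index is a hypothetical stand-in for the RAG.index(...) call, and spawn is requested explicitly as an assumption so that the behaviour matches the macOS run above.

```python
import multiprocessing as mp


def build_index(documents):
    """Hypothetical stand-in for RAG.index(...): starting a Manager launches a child process."""
    # Request "spawn" explicitly so the sketch behaves the same on Linux as on macOS.
    ctx = mp.get_context("spawn")
    with ctx.Manager() as manager:
        shared = manager.list(documents)
        return len(shared)


def main():
    return build_index(["doc one", "doc two"])


if __name__ == "__main__":
    # Only the parent process runs this branch. The spawned child re-imports
    # this file under a different module name and therefore skips it.
    print(main())
```

Moving the call to main() out of the guard and up to module top level would be expected to bring back the RuntimeError shown in the traceback.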

bclavie commented on July 19, 2024

Hey,

Appreciate the feedback guys! I've got less time than I'd like to track it down (and big thanks to @Anmol6 for helping) because of health, but so far we're strongly suspecting a dependency issue and are trying to pinpoint exactly why it only happens to a (currently seemingly) random subset of users.

Would you be able to:

  • Try once again with a downgraded ColBERT version, 0.2.15 this time? (pip uninstall -y colbert-ai && pip install colbert-ai==0.2.15)
  • Could you send us your full pip freeze? Via https://pastebin.com if possible!

Thanks a lot
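
Alongside the full pip freeze, a short summary of the pieces most relevant to a multiprocessing error can be handy. The following is only a sketch using the standard library: collect_environment is a hypothetical helper, and the default package list is an assumption based on names that appear in this thread.

```python
import multiprocessing as mp
import platform
import sys
from importlib import metadata


def collect_environment(packages=("colbert-ai", "RAGatouille", "torch", "faiss-cpu")):
    """Return a dict with the interpreter, OS, start method and package versions."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # In a fresh interpreter this is the platform default. The call also
        # fixes that default for the process, so run this as a separate script.
        "start_method": mp.get_start_method(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The start method is included because the error message in this thread is specific to processes that are not started with fork.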

excubo-jg commented on July 19, 2024

Oh, sorry to hear that. I hope you get well soon!

  • Downgrading to 0.2.15 has no impact.

Here is the output from pip freeze:
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.2.0
attrs==23.2.0
beautifulsoup4==4.12.2
bitarray==2.9.2
blinker==1.7.0
blis==0.7.11
catalogue==2.0.10
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
colbert-ai==0.2.17
confection==0.1.4
cymem==2.0.8
dataclasses-json==0.6.3
datasets==2.16.1
Deprecated==1.2.14
dill==0.3.7
distro==1.9.0
faiss-cpu==1.7.4
filelock==3.13.1
Flask==3.0.0
frozenlist==1.4.1
fsspec==2023.10.0
git-python==1.0.3
gitdb==4.0.11
GitPython==3.1.41
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.2
idna==3.6
itsdangerous==2.1.2
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
langchain==0.1.0
langchain-community==0.0.13
langchain-core==0.1.11
langcodes==3.3.0
langsmith==0.0.80
llama-index==0.9.32
MarkupSafe==2.1.3
marshmallow==3.20.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio==1.5.9
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numpy==1.26.3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
onnx==1.15.0
openai==1.7.2
packaging==23.2
pandas==2.1.4
pillow==10.2.0
preshed==3.0.9
protobuf==4.25.2
psutil==5.9.7
pyarrow==14.0.2
pyarrow-hotfix==0.6
pydantic==2.5.3
pydantic_core==2.14.6
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
PyYAML==6.0.1
RAGatouille==0.0.4b2
regex==2023.12.25
requests==2.31.0
ruff==0.1.13
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.11.4
sentence-transformers==2.2.2
sentencepiece==0.1.99
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.25
srsly==2.4.8
sympy==1.12
tenacity==8.2.3
thinc==8.2.2
threadpoolctl==3.2.0
tiktoken==0.5.2
tokenizers==0.15.0
torch==2.1.2
torchvision==0.16.2
tqdm==4.66.1
transformers==4.36.2
triton==2.1.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions==4.9.0
tzdata==2023.4
ujson==5.9.0
urllib3==2.1.0
voyager==2.0.2
wasabi==1.1.2
weasel==0.3.4
Werkzeug==3.0.1
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4

sway4em commented on July 19, 2024

https://pastebin.com/L7gPAT5a

GMartin-dev commented on July 19, 2024

+1 to this. I'm experiencing a similar issue on Ubuntu 22.04 with langchain, and I also tried the library on its own, without the langchain integration.
In case this detail helps narrow it down:
in-memory indexing and reranking work; the problem is in creating the index files, where the flow seems to get interrupted.

bjsi commented on July 19, 2024

@VidyasagarDudekula Nice, this worked for me 👍

excubo-jg commented on July 19, 2024

I just checked with the current version and the error persists.

@VidyasagarDudekula describes how to fix the example at https://python.langchain.com/docs/integrations/retrievers/ragatouille, which is not the indexing example.

Anmol6 commented on July 19, 2024

@excubo-jg could you try again with the newer version?

excubo-jg commented on July 19, 2024

I tried with 6b0 and still got the same error.
