"Error while generating embeddings: runtime error: loading model failed: specified file not found in archive" Huggingface /settings endpoint

emraza1ai commented on June 16, 2024
"Error while generating embeddings: runtime error: loading model failed: specified file not found in archive" Huggingface /settings endpoint

from meilisearch.

Comments (10)

sanders41 commented on June 16, 2024

The issue may be that a BERT model is required, and I don't think BAAI/bge-m3 is BERT-based. Have you tried BERT models, and do you still get the same error?

https://www.meilisearch.com/docs/learn/experimental/vector_search#generate-auto-embeddings-with-huggingface

emraza1ai commented on June 16, 2024

You are right. BAAI/bge-m3 is based on XLM-RoBERTa, not BERT.
But I have also tried avsolatorio/GIST-all-MiniLM-L6-v2, which is based on BERT, to no avail:

uid=73 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'avsolatorio/GIST-all-MiniLM-L6-v2'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT23.760405S' enqueued_at=datetime.datetime(2024, 2, 10, 14, 34, 31, 113366) started_at=datetime.datetime(2024, 2, 10, 14, 34, 31, 115745) finished_at=datetime.datetime(2024, 2, 10, 14, 34, 54, 876150)

emraza1ai commented on June 16, 2024

@sanders41 I even tried the model mentioned in the doc link you shared (https://www.meilisearch.com/docs/learn/experimental/vector_search#generate-auto-embeddings-with-huggingface).
It downloads BAAI/bge-base-en-v1.5:

config.json [00:00:00] [███████████████████████████████████████████████████████████████████] 777 B/777 B 1.09 KiB/s (0s)
tokenizer.json [00:00:01] [████████████████████████████████████████████████████] 694.72 KiB/694.72 KiB 459.47 KiB/s (0s)
pytorch_model.bin [00:01:36] [███████████████████████████████████████████████████] 417.71 MiB/417.71 MiB 4.32 MiB/s (0s)

But then it still gives the same error:

uid=75 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'BAAI/bge-base-en-v1.5'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT102.462092400S' enqueued_at=datetime.datetime(2024, 2, 10, 14, 53, 38, 266617) started_at=datetime.datetime(2024, 2, 10, 14, 53, 38, 304698) finished_at=datetime.datetime(2024, 2, 10, 14, 55, 20, 766790)

sanders41 commented on June 16, 2024

I was able to reproduce this on Linux, so it isn't a Windows-specific issue. Hugging Face is returning a 401 (unauthorized) error, so it looks like a token is needed. hf-hub appears to have a way to pass a token (https://docs.rs/hf-hub/0.3.2/hf_hub/api/sync/struct.ApiBuilder.html#method.with_token), but I don't see a way to pass the token to Meilisearch.

uid=9 index_uid='movies' status='failed' task_type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'bge-base-en', 'revision': 'v1.5', 'documentTemplate': "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"}}} error={'message': 'internal: Error while generating embeddings: error: fetching file from HG_HUB failed: request error: https://huggingface.co/bge-base-en/resolve/v1.5/config.json: status code 401.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT0.098242910S' enqueued_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 198972) started_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 203195) finished_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 301438)

emraza1ai commented on June 16, 2024

You are facing a different error:

error={'message': 'internal: Error while generating embeddings: error: fetching file from HG_HUB failed: request error: https://huggingface.co/bge-base-en/resolve/v1.5/config.json: status code 401.'

I think this is because you didn't put the BAAI/ prefix before bge-base-en-v1.5, so the request resolves to https://huggingface.co/bge-base-en/resolve/v1.5/config.json, which is not the BAAI repository.
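To make the resolution concrete, here is a minimal sketch of how a model identifier and revision appear to map to a download URL, based only on the URL in the error message above. The helper name `resolve_url` is hypothetical, and the `main` default revision is an assumption about the Hugging Face Hub convention, not something confirmed for Meilisearch in this thread.

```python
# Hypothetical helper: mirrors the URL pattern seen in the 401 error message.
HF_BASE = "https://huggingface.co"


def resolve_url(model: str, filename: str = "config.json", revision: str = "main") -> str:
    """Build the URL a file would be fetched from for a given model id.

    The "main" default is an assumption (the usual Hub default branch).
    """
    return f"{HF_BASE}/{model}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    # Without the BAAI/ prefix, the path points at a different (non-BAAI) repository.
    print(resolve_url("bge-base-en", revision="v1.5"))
    # With the organization prefix, the path points at the BAAI repository.
    print(resolve_url("BAAI/bge-base-en-v1.5"))
```

The first call reproduces the failing URL from the error; the second shows where the fully qualified identifier points.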

Correct way:

curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"BAAI/bge-base-en-v1.5\"}}}"

Wrong way:

curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"bge-base-en\", \"revision\": \"v1.5\"}}}"
or
curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"bge-base-en-v1.5\"}}}"
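Because shell quoting of inline JSON is easy to get wrong, the same settings update can be sketched in Python using only the standard library. This is an illustration, not an official client: the function name `build_embedder_request` is hypothetical, the host, index name, and payload are taken from the curl commands in this comment, and no API key is sent, matching those commands.

```python
import json
import urllib.request


def build_embedder_request(base_url, index_uid, model, revision=None):
    """Build (but do not send) a PATCH request that sets the 'default' embedder."""
    embedder = {"source": "huggingFace", "model": model}
    if revision is not None:
        embedder["revision"] = revision
    # json.dumps handles all quoting, so no manual escaping is needed.
    body = json.dumps({"embedders": {"default": embedder}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/indexes/{index_uid}/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    req = build_embedder_request(
        "http://localhost:7700", "meeting_actions", "BAAI/bge-base-en-v1.5"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running instance, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the exact JSON body before it reaches the server.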

sanders41 commented on June 16, 2024

I tried both ways and got the same error, but I'll try again to make sure I didn't have a typo.

sanders41 commented on June 16, 2024

Apparently I did have a typo. I tried again and didn't get any errors this time.

emraza1ai commented on June 16, 2024

Glad it works for you!
But coming back to my original error:

uid=69 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'BAAI/bge-base-en-v1.5'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None 

What should I do? I looked around and wondered whether it might be a symlink issue on Windows, but I don't think so. It seems to be some other problem.
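One diagnostic that might help narrow this down: a PyTorch `pytorch_model.bin` checkpoint is normally a zip archive containing a `data.pkl` entry, so listing the entries of the downloaded file shows whether the archive itself is intact. This is only a sketch under the assumption that the "archive" in the error message refers to that zip container, which nothing in this thread confirms. The function names are hypothetical, and the path to the cached file must be supplied by the user.

```python
import sys
import zipfile


def list_archive_entries(path):
    """Return the entry names of a zip-based checkpoint, or None if it is not a zip file."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def has_pickle_entry(names):
    """Check for the data.pkl entry that zip-format PyTorch checkpoints contain."""
    return any(n == "data.pkl" or n.endswith("/data.pkl") for n in names)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python check_archive.py <path to pytorch_model.bin>")
    # Pass the path of the downloaded pytorch_model.bin as the only argument.
    entries = list_archive_entries(sys.argv[1])
    if entries is None:
        print("Not a zip archive (possibly truncated, or a legacy format).")
    else:
        print(f"{len(entries)} entries; data.pkl present: {has_pickle_entry(entries)}")
```

If the file is a valid archive and `data.pkl` is present, the download is probably fine and the failure is more likely in how the entry is looked up on Windows than in the file itself.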

emraza1ai commented on June 16, 2024

Any update on this?

dureuill commented on June 16, 2024

Hello 👋, and thank you for your report ❤️

Given that the issue doesn't reproduce on Linux or macOS, it might be Windows-specific.

👉 Windows bugs are not the priority for the Meili team, so we need someone from the community to check if they can reproduce the bug on their machine, and then, someone to fix the bug.

That said, I'll try to reproduce it on a Windows machine if I can find the time.
