Comments (10)
The issue may be that a BERT model is required, and I don't think BAAI/bge-m3 is BERT. Have you tried BERT models and still get the same error?
from meilisearch.
You are right. BAAI/bge-m3 is based on XLM-RoBERTa, not BERT.
But I have tried avsolatorio/GIST-all-MiniLM-L6-v2, which is based on BERT, to no avail:
uid=73 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'avsolatorio/GIST-all-MiniLM-L6-v2'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT23.760405S' enqueued_at=datetime.datetime(2024, 2, 10, 14, 34, 31, 113366) started_at=datetime.datetime(2024, 2, 10, 14, 34, 31, 115745) finished_at=datetime.datetime(2024, 2, 10, 14, 34, 54, 876150)
from meilisearch.
I even tried the model mentioned at the doc link you shared, @sanders41: https://www.meilisearch.com/docs/learn/experimental/vector_search#generate-auto-embeddings-with-huggingface
It downloads BAAI/bge-base-en-v1.5:
config.json [00:00:00] [███████████████████████████████████████████████████████████████████] 777 B/777 B 1.09 KiB/s (0s)
tokenizer.json [00:00:01] [████████████████████████████████████████████████████] 694.72 KiB/694.72 KiB 459.47 KiB/s (0s)
pytorch_model.bin [00:01:36] [███████████████████████████████████████████████████] 417.71 MiB/417.71 MiB 4.32 MiB/s (0s)
But then it still gives the same error:
uid=75 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'BAAI/bge-base-en-v1.5'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT102.462092400S' enqueued_at=datetime.datetime(2024, 2, 10, 14, 53, 38, 266617) started_at=datetime.datetime(2024, 2, 10, 14, 53, 38, 304698) finished_at=datetime.datetime(2024, 2, 10, 14, 55, 20, 766790)
from meilisearch.
I was able to recreate this on Linux, so it isn't a Windows-specific issue. Hugging Face is giving a 401 access denied error, so it looks like a token is needed. hf-hub appears to have a way to pass a token (https://docs.rs/hf-hub/0.3.2/hf_hub/api/sync/struct.ApiBuilder.html#method.with_token), but I don't see a way to pass the token to Meilisearch.
uid=9 index_uid='movies' status='failed' task_type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'bge-base-en', 'revision': 'v1.5', 'documentTemplate': "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"}}} error={'message': 'internal: Error while generating embeddings: error: fetching file from HG_HUB failed: request error: https://huggingface.co/bge-base-en/resolve/v1.5/config.json: status code 401.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None duration='PT0.098242910S' enqueued_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 198972) started_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 203195) finished_at=datetime.datetime(2024, 2, 10, 16, 53, 50, 301438)
from meilisearch.
You are facing a different error:
error={'message': 'internal: Error while generating embeddings: error: fetching file from HG_HUB failed: request error: https://huggingface.co/bge-base-en/resolve/v1.5/config.json: status code 401.'
I think this is because you didn't specify BAAI/ before bge-base-en-v1.5, so it resolves to https://huggingface.co/bge-base-en/resolve/v1.5/config.json
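To make the resolution concrete, here is a minimal Python sketch of how the URL in the error message appears to be assembled from the `model` and `revision` settings. The pattern is inferred from the failing URL above; the function name `resolve_url` and the `"main"` default revision are assumptions for illustration, not Meilisearch's actual code.

```python
def resolve_url(model: str, revision: str = "main", filename: str = "config.json") -> str:
    """Build a Hugging Face Hub file URL following the pattern seen in the error.

    Assumption: "main" is used here as the revision when none is given.
    """
    return f"https://huggingface.co/{model}/resolve/{revision}/{filename}"


# Without the organisation prefix, this reproduces the URL from the failed task.
print(resolve_url("bge-base-en", "v1.5"))

# With the "BAAI/" prefix, the request points at the organisation's repository.
print(resolve_url("BAAI/bge-base-en-v1.5"))
```

The first call yields exactly the URL that returned status code 401, which is consistent with the missing BAAI/ prefix being the cause.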
Correct way:
curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"BAAI/bge-base-en-v1.5\"}}}"
Wrong way:
curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"bge-base-en\", \"revision\": \"v1.5\"}}}"
OR
curl -X PATCH "http://localhost:7700/indexes/meeting_actions/settings" -H "Content-Type: application/json" --data-binary "{\"embedders\": {\"default\": {\"source\": \"huggingFace\", \"model\": \"bge-base-en-v1.5\"}}}"
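If shell quoting gets in the way, the same settings update can be built in Python, where `json.dumps` handles the escaping. This is a minimal sketch using only the standard library; the helper names `embedder_settings` and `build_settings_request` are made up for illustration, and the endpoint and payload shape are copied from the curl commands above. The request is only constructed, not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def embedder_settings(model, revision=None):
    """Return the settings payload for a Hugging Face embedder named "default"."""
    embedder = {"source": "huggingFace", "model": model}
    if revision is not None:
        embedder["revision"] = revision
    return {"embedders": {"default": embedder}}


def build_settings_request(base_url, index_uid, settings):
    """Build (but do not send) the PATCH request for an index's settings."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/indexes/{index_uid}/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_settings_request(
    "http://localhost:7700",
    "meeting_actions",
    embedder_settings("BAAI/bge-base-en-v1.5"),
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a running instance: urllib.request.urlopen(request)
```

Passing the full `BAAI/bge-base-en-v1.5` identifier as `model` mirrors the "correct way" above.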
from meilisearch.
I tried both ways and got the same error, but I'll try again to make sure I didn't have a typo.
from meilisearch.
Apparently I did have a typo; I tried again and didn't get any errors this time.
from meilisearch.
Glad for you!
But coming back to my original error:
uid=69 index_uid='meeting_actions' status='failed' type='settingsUpdate' details={'embedders': {'default': {'source': 'huggingFace', 'model': 'BAAI/bge-base-en-v1.5'}}} error={'message': 'internal: Error while generating embeddings: runtime error: loading model failed: specified file not found in archive.', 'code': 'internal', 'type': 'internal', 'link': 'https://docs.meilisearch.com/errors#internal'} canceled_by=None
What should I do? From what I found, it might be a symlink issue on Windows, but I don't think so; it seems to be some other problem.
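One way to rule the symlink theory in or out is to scan the directory holding the downloaded model files for dangling symlinks. The following is a minimal Python sketch, not a confirmed diagnosis of this error; `CACHE_DIR` is a placeholder, since the thread does not say where the files are stored.

```python
import os


def find_broken_symlinks(root):
    """Return paths under root that are symlinks whose target does not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a dangling one.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


# CACHE_DIR is a placeholder: point it at wherever the model files were downloaded.
CACHE_DIR = "."
for path in find_broken_symlinks(CACHE_DIR):
    print("dangling symlink:", path)
```

An empty result would suggest that dangling symlinks are not the cause and that the problem lies elsewhere.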
from meilisearch.
Any update on this?
from meilisearch.
Hello 👋, and thank you for your report ❤️
Given that the issue doesn't reproduce on Linux or macOS, it might be Windows-specific.
👉 Windows bugs are not the priority for the Meili team, so we need someone from the community to check if they can reproduce the bug on their machine, and then, someone to fix the bug.
That said, I'll try to reproduce it on a Windows machine if I can find the time.
from meilisearch.