Meilisearch x Azure: current incompatibilities about meilisearch HOT 3 OPEN

curquiza commented on June 16, 2024

Meilisearch x Azure: current incompatibilities

Comments (3)

Kerollmops commented on June 16, 2024

Hello 👋

While trying to reproduce the issue on Azure, I observed some interesting behavior on the Basic B2 and B3 App Service plans and on the Standard S1 and S2/S3 plans.

While using a B2 plan:

I sent the movies dataset (15 MiB) to Meilisearch twice.
The task queue shows that both tasks succeeded.

content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:37:52 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "results": [
        {
            "uid": 1,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT2.706101982S",
            "enqueuedAt": "2024-02-14T14:31:37.6525331Z",
            "startedAt": "2024-02-14T14:31:37.692560368Z",
            "finishedAt": "2024-02-14T14:31:40.39866235Z"
        },
        {
            "uid": 0,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT72.261062519S",
            "enqueuedAt": "2024-02-14T14:29:55.660306903Z",
            "startedAt": "2024-02-14T14:29:55.701157519Z",
            "finishedAt": "2024-02-14T14:31:07.962220038Z"
        }
    ],
    "total": 2,
    "limit": 20,
    "from": 1,
    "next": null
}
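To compare task queue responses like the one above without reading the JSON by eye, here is a minimal sketch. It assumes the response has the shape shown in this comment and that `duration` is always expressed in seconds (`PT…S`), as in these samples. The helper names `duration_seconds` and `summarize_tasks` are made up for this illustration and are not part of Meilisearch.

```python
import re

# Assumption: durations use only the seconds form ("PT2.706101982S"),
# which is what the responses in this thread show.
_SECONDS = re.compile(r"^PT(\d+(?:\.\d+)?)S$")


def duration_seconds(iso):
    """Convert a 'PT<seconds>S' duration string to a float."""
    match = _SECONDS.match(iso)
    if match is None:
        raise ValueError("unsupported duration format: %r" % iso)
    return float(match.group(1))


def summarize_tasks(response):
    """Return (uid, status, received, indexed, seconds) for each task."""
    rows = []
    for task in response["results"]:
        details = task.get("details") or {}
        duration = task.get("duration")  # null while a task is still enqueued
        rows.append((
            task["uid"],
            task["status"],
            details.get("receivedDocuments"),
            details.get("indexedDocuments"),
            duration_seconds(duration) if duration else None,
        ))
    return rows


# Trimmed copy of the response shown above.
sample = {
    "results": [
        {"uid": 1, "status": "succeeded", "duration": "PT2.706101982S",
         "details": {"receivedDocuments": 31944, "indexedDocuments": 31944}},
        {"uid": 0, "status": "succeeded", "duration": "PT72.261062519S",
         "details": {"receivedDocuments": 31944, "indexedDocuments": 31944}},
    ],
    "total": 2,
}

for row in summarize_tasks(sample):
    print(row)
```

A flat summary like this makes it easier to spot a task whose `indexedDocuments` count or status changes between two fetches.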

Then I sent the movies dataset again, this time also moving from the B2 to the B3 plan (or from S1 to S2/S3). According to Meilisearch, everything worked: it successfully committed the task and returned task uid 2.

content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:37:54 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "taskUid": 2,
    "indexUid": "movies",
    "status": "enqueued",
    "type": "documentAdditionOrUpdate",
    "enqueuedAt": "2024-02-14T14:37:54.772560323Z"
}

However, when I fetched the task queue again, it was broken: task 1 was now marked as failed and task 2 was not listed at all.

content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:38:28 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "results": [
        {
            "uid": 1,
            "indexUid": "movies",
            "status": "failed",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 0
            },
            "error": {
                "message": "No such file or directory (os error 2)",
                "code": "internal",
                "type": "internal",
                "link": "https://docs.meilisearch.com/errors#internal"
            },
            "duration": "PT0.014666324S",
            "enqueuedAt": "2024-02-14T14:31:37.6525331Z",
            "startedAt": "2024-02-14T14:37:54.82802857Z",
            "finishedAt": "2024-02-14T14:37:54.842694894Z"
        },
        {
            "uid": 0,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT72.261062519S",
            "enqueuedAt": "2024-02-14T14:29:55.660306903Z",
            "startedAt": "2024-02-14T14:29:55.701157519Z",
            "finishedAt": "2024-02-14T14:31:07.962220038Z"
        }
    ],
    "total": 2,
    "limit": 20,
    "from": 1,
    "next": null
}

I understand from this that moving from the B2 to the B3 plan, or from S1 to S2/S3, breaks the consistency of the disk. The LMDB database seems to be rolled back to a previous state, but the file containing the documents to update is not restored, hence the error we got ("No such file or directory").

We are lucky that the Meilisearch database wasn't corrupted in this experiment. Note that the task with uid 2 is missing even though Meilisearch confirmed that it had enqueued it. Data is lost when you switch from B2 to B3, and probably between other plans as well.
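The two symptoms seen here (a task that goes from succeeded to failed, and an acknowledged task that disappears) can be checked mechanically. The following is only a sketch: `compare_task_queues` is a hypothetical helper, and it assumes that both task queue responses fit in a single page, as they do in this experiment.

```python
def compare_task_queues(before, after, acknowledged_uids):
    """Compare two task queue responses taken before and after a plan change.

    before / after: parsed JSON bodies with a "results" list of tasks.
    acknowledged_uids: task uids for which an "enqueued" confirmation
    was received (assumption: the caller kept track of them).
    """
    before_status = {t["uid"]: t["status"] for t in before["results"]}
    after_status = {t["uid"]: t["status"] for t in after["results"]}

    # Tasks that had succeeded but now report another status.
    regressed = sorted(
        uid for uid, status in before_status.items()
        if status == "succeeded"
        and uid in after_status
        and after_status[uid] != "succeeded"
    )
    # Tasks that were listed or acknowledged but are no longer listed.
    known = set(before_status) | set(acknowledged_uids)
    missing = sorted(uid for uid in known if uid not in after_status)
    return {"regressed": regressed, "missing": missing}


# Statuses taken from the responses in this comment.
before = {"results": [{"uid": 1, "status": "succeeded"},
                      {"uid": 0, "status": "succeeded"}]}
after = {"results": [{"uid": 1, "status": "failed"},
                     {"uid": 0, "status": "succeeded"}]}

print(compare_task_queues(before, after, acknowledged_uids=[2]))
```

With the statuses from this experiment, task 1 is reported as regressed and task 2 as missing.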

Next Steps

I am currently setting up a Meilisearch instance on a Standard plan (not a Basic one) to move it from S1 to S2. If the issue does not occur there, we will be able to document which plan changes are safe. What is certain is that moving a Meilisearch instance between Basic plans is unsafe.

Update: the same issue appears on the Standard plans S1 and S2. I updated this comment accordingly.

Kerollmops commented on June 16, 2024

Hello @cmaneu 👋

I have some news: Meilisearch works perfectly on the Basic and Standard plans (so far) as long as you don't change the plan. The issue does not show up during indexing or any other normal operation. The Azure infrastructure snapshots the disk when you change a service plan, which is expected.

However, the disk snapshot is not consistent:

The files containing the documents to index are not there. That is expected, as Meilisearch has already indexed them and committed the change to the LMDB database, flushing it and ensuring the data was durably written to disk.
The database is rolled back to a point in time before the data was processed and flushed to disk. As a result, Meilisearch believes that some document updates are still enqueued and need to be processed, and it tries to read the missing content file ☝️
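The two points above can be reproduced in miniature. This is a toy model, not Meilisearch's implementation: a JSON file stands in for the LMDB task database, a second file stands in for the pending documents, and all names (`simulate_inconsistent_snapshot`, `tasks.json`, `update-1.json`) are invented for the illustration.

```python
import errno
import json
import shutil
import tempfile
from pathlib import Path


def simulate_inconsistent_snapshot(root):
    """Replay the failure in miniature and return the resulting errno."""
    db = root / "tasks.json"           # stands in for the LMDB task database
    payload = root / "update-1.json"   # stands in for the documents to index
    snapshot = root / "tasks.snapshot.json"

    # 1. A task is enqueued: the payload is on disk and the database says so.
    payload.write_text(json.dumps([{"id": 1, "title": "Example"}]))
    db.write_text(json.dumps({"1": "enqueued"}))

    # 2. The platform copies the database file at this point in time.
    shutil.copy(db, snapshot)

    # 3. The task is processed: the database is updated, the payload removed.
    db.write_text(json.dumps({"1": "succeeded"}))
    payload.unlink()

    # 4. The plan change brings back the old database, but not the payload.
    shutil.copy(snapshot, db)

    # 5. After restart, the engine finds an enqueued task and reads its payload.
    state = json.loads(db.read_text())
    if state["1"] != "enqueued":
        return None
    try:
        payload.read_text()
    except FileNotFoundError as err:
        return err.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    code = simulate_inconsistent_snapshot(Path(tmp))
    print(code == errno.ENOENT)
```

The point of the sketch is that neither file is damaged on its own. The failure comes only from the two files reflecting different points in time.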

We are lucky in this scenario because the LMDB databases of Meilisearch are not corrupted: you can still use Meilisearch. However, other users have reported database corruption. It probably results from a snapshot that is not consistent within a single file: one half of the file is captured at one point in time and the other half at another, which creates inconsistencies in LMDB's internal B+tree data structures.

@meilisearch/docs-team (👋) can start by documenting that upgrading or downgrading plans on Azure results in database corruption or, in the best case, inconsistencies, and link to this very issue. However, @cmaneu, are you aware of any way to ensure that a snapshot is taken consistently over the whole disk before we change the plan? Even blocking write operations during the snapshot would be perfect 👀 It may be related to the daily snapshotting being capped at 10 for Standard S3 plans. I am sure this problem affects many, if not all, transactional databases, e.g., SQLite, RocksDB, and PostgreSQL.

Have a nice day 🏄

knd775 commented on June 16, 2024

My DB just got corrupted again without making any changes. Anything I could gather before I reinitialize the instance? Here's the error I am getting now:
