Comments (3)

Kerollmops commented on June 16, 2024

Hello 👋

While trying to reproduce the issue on Azure, I observed some interesting behavior with the Basic B2 and B3 App Service plans and the Standard S1 and S2/S3 plans.

While using a B2 plan:

  1. I sent the movies dataset (15MiB) two times to Meilisearch.
  2. When looking at the task queue, we can see that two tasks succeeded.
content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:37:52 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "results": [
        {
            "uid": 1,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT2.706101982S",
            "enqueuedAt": "2024-02-14T14:31:37.6525331Z",
            "startedAt": "2024-02-14T14:31:37.692560368Z",
            "finishedAt": "2024-02-14T14:31:40.39866235Z"
        },
        {
            "uid": 0,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT72.261062519S",
            "enqueuedAt": "2024-02-14T14:29:55.660306903Z",
            "startedAt": "2024-02-14T14:29:55.701157519Z",
            "finishedAt": "2024-02-14T14:31:07.962220038Z"
        }
    ],
    "total": 2,
    "limit": 20,
    "from": 1,
    "next": null
}
  3. Then I sent the movies dataset again but decided to move from the B2 to the B3 plan (or from S1 to S2/S3). As far as Meilisearch was concerned, everything worked: it successfully committed the task and sent me task uid 2.
content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:37:54 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "taskUid": 2,
    "indexUid": "movies",
    "status": "enqueued",
    "type": "documentAdditionOrUpdate",
    "enqueuedAt": "2024-02-14T14:37:54.772560323Z"
}
  4. However, when I fetched the task queue, everything was broken.
content-encoding: gzip
content-type: application/json
date: Wed, 14 Feb 2024 14:38:28 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
vary: accept-encoding

{
    "results": [
        {
            "uid": 1,
            "indexUid": "movies",
            "status": "failed",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 0
            },
            "error": {
                "message": "No such file or directory (os error 2)",
                "code": "internal",
                "type": "internal",
                "link": "https://docs.meilisearch.com/errors#internal"
            },
            "duration": "PT0.014666324S",
            "enqueuedAt": "2024-02-14T14:31:37.6525331Z",
            "startedAt": "2024-02-14T14:37:54.82802857Z",
            "finishedAt": "2024-02-14T14:37:54.842694894Z"
        },
        {
            "uid": 0,
            "indexUid": "movies",
            "status": "succeeded",
            "type": "documentAdditionOrUpdate",
            "canceledBy": null,
            "details": {
                "receivedDocuments": 31944,
                "indexedDocuments": 31944
            },
            "error": null,
            "duration": "PT72.261062519S",
            "enqueuedAt": "2024-02-14T14:29:55.660306903Z",
            "startedAt": "2024-02-14T14:29:55.701157519Z",
            "finishedAt": "2024-02-14T14:31:07.962220038Z"
        }
    ],
    "total": 2,
    "limit": 20,
    "from": 1,
    "next": null
}

I understand from this that moving from the B2 to the B3 plan (or from S1 to S2/S3) breaks the disk or, more specifically, its consistency. The LMDB database seems to be rolled back to a previous state, but the file containing the documents to update is not, hence the error we got ("No such file or directory").

We are lucky that the Meilisearch database wasn't corrupted in this experiment. Note that the task with uid 2 is missing even though we received confirmation that Meilisearch enqueued it. There is data loss when you switch from B2 to B3, and probably between other plans.
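The two symptoms above (a task that went from "succeeded" to "failed", and an acknowledged task that vanished) can be checked mechanically. Below is a minimal sketch, not part of Meilisearch: `diff_task_listings` is a hypothetical helper that compares two task listings shaped like the responses shown earlier, trimmed here to the `uid` and `status` fields. It assumes both listings cover the same range of tasks; with pagination (`limit`), a uid absent from one page is not necessarily lost.

```python
def diff_task_listings(before, after, acknowledged_uids=()):
    """Compare two task listings and report signs of a rolled-back queue.

    `before` and `after` are dicts with a "results" list, as in the
    responses above. `acknowledged_uids` are task uids that the server
    confirmed as enqueued (hypothetical bookkeeping done by the client).
    """
    before_by_uid = {task["uid"]: task for task in before["results"]}
    after_by_uid = {task["uid"]: task for task in after["results"]}

    report = {"regressed": [], "missing": []}

    for uid, old in sorted(before_by_uid.items()):
        new = after_by_uid.get(uid)
        if new is None:
            # The task was listed earlier and is gone now.
            report["missing"].append(uid)
        elif old["status"] == "succeeded" and new["status"] != "succeeded":
            # A finished task should never change status afterwards.
            report["regressed"].append((uid, old["status"], new["status"]))

    for uid in acknowledged_uids:
        # The server confirmed these uids, so they should be listed.
        if uid not in after_by_uid and uid not in report["missing"]:
            report["missing"].append(uid)

    report["missing"].sort()
    return report


# Listings trimmed from the responses above (before and after the plan change).
before = {"results": [{"uid": 1, "status": "succeeded"},
                      {"uid": 0, "status": "succeeded"}]}
after = {"results": [{"uid": 1, "status": "failed"},
                     {"uid": 0, "status": "succeeded"}]}

report = diff_task_listings(before, after, acknowledged_uids=[2])
print(report)
# {'regressed': [(1, 'succeeded', 'failed')], 'missing': [2]}
```

Running such a comparison before and after a plan change would flag both the regressed task 1 and the missing task 2 from this experiment.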

Next Steps

I am currently trying to set up Meilisearch on a Standard plan (not a Basic one) and move it from S1 to S2. If the issue is gone, we will be able to document which plans are safe. It is unsafe to move a Meilisearch instance between Basic plans, that's for sure.

Update: after investigating, the same issue appears on the Standard plans S1 and S2. I updated this comment accordingly.


Kerollmops commented on June 16, 2024

Hello @cmaneu 👋

I have some news: Meilisearch works perfectly on the Basic and Standard plans (so far) as long as you don't change the plan. The issue is not triggered by indexing or any other Meilisearch activity. The Azure infrastructure snapshots the disk when you change a service plan, which is expected.

However, the disk snapshot is not consistent:

  • The files containing the documents to index are not there. That is expected as Meilisearch has already indexed them and committed the change in the LMDB database by flushing to disk and ensuring the data was durably written to disk.
  • The database is rolled back to a point in the past, before the data was processed and flushed to disk. As a result, Meilisearch thinks that some document updates are still enqueued and need to be processed, and it tries to read the missing content files ☝️

We are lucky in this scenario because the LMDB databases of Meilisearch are not corrupted: you can still use Meilisearch. However, some other users reported database corruption, probably caused by a snapshot that is not consistent even within a single file: half of a file is captured at one moment and the other half at another, creating inconsistencies in LMDB's internal B+Tree data structures.

@meilisearch/docs-team (👋) can start by documenting that upgrading or downgrading plans on Azure results in database corruption, or in inconsistencies in the best case, and link to this very issue. However, @cmaneu, I would like to ask whether you are aware of any way to ensure that a snapshot is taken consistently over the whole disk before we decide to change the plan. Even blocking write operations during the snapshot would be perfect 👀 It may be related to the daily snapshotting being capped at 10 for Standard S3 plans. I am sure this problem affects many, if not all, transactional databases, e.g., SQLite, RocksDB, and PostgreSQL.

Have a nice day 🏄


knd775 commented on June 16, 2024

My DB just got corrupted again without making any changes. Anything I could gather before I reinitialize the instance? Here's the error I am getting now:
image

