
Write up Sync details about watermelondb HOT 40 CLOSED

nozbe avatar nozbe commented on May 16, 2024 8
Write up Sync details

from watermelondb.

Comments (40)

flettling avatar flettling commented on May 16, 2024 19

Alright, during the past weeks we (@theVasyl, @sebastian-schlecht, @fletling and others) have built a sync engine for our app on top of WatermelonDB, and we’re very close to releasing it in our production app. Below I’m going to share some insights into how we did it and what we learned.
For now this is more of a list of bullet points; if there’s interest in this topic I’ll try to turn it into a more extensive and coherent blog post at some point.

Our app:

Pave is an issue tracking app for construction site managers and architects. In terms of the data model it is basically a todo app.

The problem:

  • Our app needs to work fully offline
  • We want the offline functionality to work on top of our existing GraphQL API.
  • There are some existing synchronisation solutions for those who are willing to rebuild their whole backend (e.g. Realm, KintoJS), but nothing in the GraphQL ecosystem.
  • There are some caching solutions for Apollo (see e.g. apollographql/apollo-link#125 and https://github.com/awslabs/aws-mobile-appsync-sdk-js). But for our case they turned out to be not persistent enough, and too many edge cases were failing. Before, we used a custom caching solution that works roughly like the AWS AppSync client, with redux-offline under the hood. Also, the Apollo cache in particular is awfully slow. We therefore wanted to move the client-side data to a proper client-side database.

How we envision the architecture of a sync engine:

  • Some basic characteristics of the sync engine
    • incremental sync with timestamps
    • conflict detection using timestamps
    • client side conflict resolution
  • The sync engine should not be part of WatermelonDB, as the concept outlined above does not depend on WatermelonDB. It should work more like an “adapter”: just as WatermelonDB sits on top of SQLite or LokiJS via different adapters, the sync engine should sit on top of WatermelonDB without being part of it. This allows for better separation of concerns and testability.
  • Similarly, the sync engine should make only minimal assumptions about the backend API and work with existing backends without breaking changes (an important USP compared to e.g. Realm or Kinto). There needs to be a spec for what the backend must support conceptually; beyond that, the approach should again be to provide an adapter. The spec would roughly go like this:
    • The backend needs to support the last modified timestamp
    • The backend needs to update records only when the last modified timestamp matches
    • The backend needs to respond with a “conflict” flag in case of a conflict and include the original record in the response.
    • potentially other points that I forgot
  • It is then up to the developer to make an adapter for their existing backend. Ideally we’d provide exhaustive tests such that it can be determined easily whether the adapter works as required.
  • The synchronisation runs per Collection. Somewhere there needs to be a configuration to which API calls a Collection corresponds. See below for how we solved this at Pave.
  • The sync engine needs to resolve relations. This problem consists of two integral parts
    • How are client vs server side ids handled?
    • In what order are different collections synced? Basically the parent of a relation always needs to be synced first.
  • The sync engine should support syncing only a part of the database at once. See below for a description why we need this at Pave. Other use cases would for instance be syncing only newest data or specifically filtered data etc.
  • The approach should always be offline first. WatermelonDB should hold a full local representation of the server data. That way the UI bindings can just use the data from WatermelonDB and not even be aware that there’s a backend behind the database that is synced. The transformation from local data to remote data happens exclusively during synchronisation. This is particularly relevant for IDs (as discussed before) as well as for files (see below for how we do this at Pave).
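
The backend requirements listed above (a last-modified timestamp, updates applied only when that timestamp matches, and a “conflict” flag plus the server's record otherwise) could be sketched roughly as follows. This is only an in-memory illustration under assumptions: `createStore`, `insert`, `update` and the `lastModified` field name are hypothetical and not part of WatermelonDB or of any real backend.

```javascript
// Hypothetical in-memory backend illustrating the conflict-detection contract.
function createStore() {
  const records = new Map();
  let clock = 0; // monotonically increasing stand-in for a server timestamp

  return {
    insert(id, fields) {
      const record = { id, ...fields, lastModified: ++clock };
      records.set(id, record);
      return record;
    },
    // The client sends the lastModified value it last saw for this record.
    update(id, fields, expectedLastModified) {
      const current = records.get(id);
      if (!current) return { conflict: false, notFound: true, record: null };
      if (current.lastModified !== expectedLastModified) {
        // The record changed in the meantime: flag the conflict and
        // return the server's version so the client can resolve it.
        return { conflict: true, record: current };
      }
      const next = { ...current, ...fields, lastModified: ++clock };
      records.set(id, next);
      return { conflict: false, record: next };
    },
  };
}
```

A client that receives `conflict: true` would then run its own resolution and retry with the `lastModified` value from the returned record.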

What we currently do at Pave:

  • We built an extension of WatermelonDB that we call SyncableDatabase in our code (SyncableDatabase extends WatermelonDB's Database). Similarly we have a SyncableCollection, SyncableModel etc. This is probably not ideal, because that way we don’t have a true synchronisation adapter (as envisioned above) but something that is highly entangled with WatermelonDB.
  • The SyncableModel holds a syncConfig where we configure which API calls are made and how a remote record relates to a local record. In particular, this is where we map remote IDs to local IDs to resolve relations. Below is an excerpt of such a syncConfig. It is incomplete, but it should give a rough idea. Please note that in our case it is specific to our GraphQL backend; ideally that would be abstracted away as discussed before.
class Ticket extends SyncableModel {
  static table = Tables.tickets;
  static syncConfig = {
    read: {
      operationName: 'tickets',
      query: `// GraphQL query string goes here`,
      // creates input variables object for query
      variables: async ({
        additionalVariables,
        since,
        convertLocalToRemoteId,
      }: ReadVariablesInput) => ({
        input: {
          projectId: await convertLocalToRemoteId(additionalVariables.projectId),
          withDeleted: true,
          since,
        },
      }),
    },
    create: {
      operationName: 'createTicket',
      mutation: `// GraphQL mutation goes here`,
      // creates input variables object for create mutation
      // Here we transform the local record to the remote representation
      variables: async ({
        record,
        uncleanRecord,
        projectId,
        convertLocalToRemoteId,
      }: CreateVariablesInput<SyncableModel>) => {
        // [...]
      },
    },
    // ... same for the update / delete mutations
    mapRemoteToRawLocal: async (/* ... */) => {
      // [...]
      // Transform the remote representation into a raw local record
    },
    // ...
  };
}
  • Regarding client/server IDs: We’re currently storing them in a separate Collection. We don’t find this solution optimal as it results in additional DB roundtrips.
  • For the order of syncing it is important that the parents of relations are synced first. We didn’t find a smarter way than hardcoding this.
  • The synchronisation of an individual collection is triggered by calling a sync() function on a collection.
  • Our data model has the notion of a project. (For quick context: Our data model is basically a todo app where todo items are part of a project.) The synchronisation should run per project. We therefore introduced the concept of an “additionalQuery”. Somewhere in the sync engine the engine determines which records to sync by querying for created/updated/deleted records. In our case we use this additionalQuery to filter for a specific projectId:
    const unsyncedRecords = !!additionalQuery
      ? await this.query(
          Q.where(columnName('_status'), Q.notEq('synced')),
          additionalQuery,
        ).fetch()
      : await this.query(
          Q.where(columnName('_status'), Q.notEq('synced')),
        ).fetch();
    // this is a hack from Radek see https://github.com/Nozbe/WatermelonDB/issues/5
    // the !!additionalQuery? is necessary because with the hack watermelondb throws an error when no additionalQuery is provided
    const queryForDeleted = !!additionalQuery
      ? // $FlowFixMe
        this.query(Q.where(columnName('_status'), 'deleted'), additionalQuery)
      : this.query(Q.where(columnName('_status'), 'deleted'));

    queryForDeleted.description = queryForDeleted._rawDescription;

    const deletedRecords = await queryForDeleted.fetch();

And then the signature of the sync function is async sync(additionalVariables?: Object = {}, additionalQuery?: Query<Record>)

  • Similarly, the read side of the synchronisation should run per project, so we need to include the projectId in the API request. We therefore introduced the concept of additionalVariables, which allows injecting arbitrary data into the sync() function that can then be used in the syncConfig.
  • Since calling sync({ projectId }, Q.where(columnName('project_id'), Q.eq(projectId))) all the time is quite ugly we built a wrapper class that we call ProjectSyncer. Here we expose a ProjectSyncer.sync() function that takes care of the correct order of collection syncing (hardcoded as mentioned above) as well as providing the additionalQuery and additionalVariables.
  • With regards to conflict resolution we currently just do last write wins.
  • One more thing: Our data also includes files that we need to sync. Here the concept of the syncConfig in the Model comes in handy: Downloading a file is just an additional transformation of data from remote record to local record. That would roughly look like this:
class Attachment extends SyncableModel {
    static syncConfig = {
        // [...]
        mapRemoteToRawLocal: async ({
          cleanRemote,
          convertRemoteToLocalId,
          projectId,
        }: MapRemoteToRawLocalInput<SyncableModel>) => {
          // [...]
          rawLocal.file = await fileSync.sync(cleanRemote.file.url, projectId)
          // [...]
        },
    };
}

The rest of the sync engine doesn't even need to be aware of the notion of files.
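
Regarding the sync order mentioned above: instead of hardcoding it, the order could in principle be derived from the relations themselves. The following is a sketch only, not what we run at Pave; the `parentsOf` map and the `syncOrder` helper are hypothetical, and the relations shown are assumptions for illustration.

```javascript
// Hypothetical map: collection name -> collections it references (its parents).
const parentsOf = {
  attachments: ['tickets'],
  tickets: ['projects'],
  projects: [],
};

// Depth-first topological sort: every parent appears before its children.
function syncOrder(deps) {
  const order = [];
  const state = {}; // undefined = unvisited, 1 = in progress, 2 = done

  function visit(name) {
    if (state[name] === 2) return;
    if (state[name] === 1) {
      throw new Error(`Circular relation involving "${name}"`);
    }
    state[name] = 1;
    for (const parent of deps[name] || []) visit(parent);
    state[name] = 2;
    order.push(name);
  }

  Object.keys(deps).forEach(visit);
  return order;
}

console.log(syncOrder(parentsOf)); // [ 'projects', 'tickets', 'attachments' ]
```

A wrapper like ProjectSyncer could then iterate over this list and call sync() on each collection in turn.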

Some resources that we found quite helpful while building this:

DX of WatermelonDB in particular with regards to syncing:

  • We often need to resort to ._raw and sanitizedRaw, which doesn’t feel right. This is particularly the case when updating the _status and last_modified fields. There should be helper functions like markSynced() or updateTimestamp() that can be used in a record builder.
  • We found the handling of deleted records to be inconsistent with the handling of created/updated records, as discussed by @theVasyl elsewhere in this thread. A “delete” is done as a soft delete in the sync engine concept, and therefore a delete is basically just a special case of an “update” operation.
  • We had some trouble with Flow. This may be related to us using extensions of the original WatermelonDB classes that are apparently not understood by Flow correctly. I’m not exactly sure yet whether that’s a mistake on our side or on the side of WatermelonDB; as of today we’re resorting to $FlowFixMe quite a lot.
  • The handling of the last_modified timestamp (in particular: fetching the newest timestamp per collection under the constraint of an additionalQuery) should be built more deeply into WatermelonDB. We currently use adapter.setLocal, but that feels a little redundant and unnecessary.
  • The same applies to the notion of local/remote IDs. That is something any sync engine needs to support and should therefore be built more deeply into WatermelonDB. We currently use a separate table for that, but (as with setLocal above) this comes with the drawback of additional boilerplate code in the schema that is not specific to our app, as well as additional database roundtrips during synchronisation.
  • For the local/remote ID mapping it would be helpful if recordBuilder functions could be async.
  • Generally WatermelonDB is totally awesome! Even though it’s still young, it’s in quite a mature state, and we haven’t found any significant bug that would impede us (a few small things here and there, obviously, but that’s expected; we’re going to open a few more issues about the small stuff we found).

So, this was quite a long post. I hope it helps. I’m available for questions, and of course I’d be most interested in your feedback on the concept we came up with and how to improve it further.

@radex I haven’t looked at the pull request you mentioned above yet, but I will do so at some point in the next few days.


radex avatar radex commented on May 16, 2024 3

@luisgregson that might be difficult for you to figure out before we push documentation and some helper functions for it, but just in short:

  1. All records have _status and _changed fields:
    • if _status is created, it's a new record to push to server
    • if _status is updated, it's a record to update
  2. when you pull data from the server, and something has been updated on both the client and the server, you have to resolve a conflict. there will be a helper function for it. _changed will hold the comma-separated names of the columns that were changed on the client since the last sync. you can use that to resolve the conflict (client-changed columns shall win over the server's)
  3. to get records deleted on the client, use adapter.getDeletedRecords. this will give you IDs of stuff to delete on the server.
  4. after sync, use adapter.destroyDeletedRecords to permanently remove records previously marked as deleted, and mark created/updated records as _status='synced', _changed=''.

Again, there will be a few helper functions so that it's straightforward.
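
Until those helpers exist, the bookkeeping in the steps above might look roughly like this on plain raw records. It is only a sketch: `collectLocalChanges` and `markAsSynced` are hypothetical names, not Watermelon APIs, and the raw records are reduced to the `id`, `_status` and `_changed` columns discussed above.

```javascript
// Raw records as plain objects; only `id`, `_status` and `_changed` matter here.
function collectLocalChanges(rawRecords, deletedIds) {
  return {
    created: rawRecords.filter((r) => r._status === 'created'),
    updated: rawRecords.filter((r) => r._status === 'updated'),
    deleted: deletedIds, // IDs only, e.g. as obtained from adapter.getDeletedRecords
  };
}

// After a successful push, a pushed record goes back to the "synced" state.
function markAsSynced(raw) {
  return { ...raw, _status: 'synced', _changed: '' };
}
```

Records that are already `synced` are simply left out of the payload.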


radex avatar radex commented on May 16, 2024 2

@ashconnell

Am I correct in assuming this sync strategy will only work if you were to pull "all" of the data available to the user in an app?

Correct, all data is pulled.

For the majority of apps discussed using WatermelonDB it seems feasible that when an existing user first logs into the apps that you could fetch all data for the user at launch, but for a chat app that data could potentially be hundreds of threads and thousands upon thousands of messages over a user's lifetime.

Right. A cache for a real-time chat app is not the primary use case for Watermelon. We're not planning to develop a partial synchronization/caching scheme ourselves, but:

  • the implementation of sync is documented under “Sync design and implementation”, if you want to take this up yourself
  • you can still achieve some mechanism for partial sync using Watermelon sync. Your pull changes API endpoint could, for example, only send messages from the last two weeks, and likewise the app could have a job that destroys old records. Watermelon wasn't designed for this, but you could do it.
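
As a rough sketch of that second idea (a two-week window on the pull endpoint plus a cleanup job in the app): the field names `createdAt` and `updatedAt` and both function names below are assumptions for illustration, not part of Watermelon sync.

```javascript
const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

// Server side: only return messages changed since the last pull
// AND created within the two-week window.
function selectMessagesToSend(messages, lastPulledAt, now) {
  const cutoff = now - TWO_WEEKS_MS;
  return messages.filter(
    (m) => m.updatedAt > (lastPulledAt || 0) && m.createdAt >= cutoff,
  );
}

// Client side: IDs of local records the cleanup job may destroy.
// Records with unsynced local changes are kept, so nothing is lost.
function selectExpiredIds(localMessages, now) {
  const cutoff = now - TWO_WEEKS_MS;
  return localMessages
    .filter((m) => m.createdAt < cutoff && m._status === 'synced')
    .map((m) => m.id);
}
```

Keeping the same cutoff on both sides avoids the server re-sending records that the client has just pruned.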


radex avatar radex commented on May 16, 2024 2

@servocoder

Concerning the second option that uses @action decorator, I couldn't find a way to invoke it directly. Is it designed to be used for models which are fetched as a result of find()/query() methods solely? In other words I am able to access a model action in the following way only:

Correct, this is meant to be used on Model instance methods. If you have global actions (such as creating users — if users don't belong to any other record), you can define functions any way you like and use database.action(). Hopefully documentation will clear that up.

@radex What is the correct way to break/invalidate push process? Let's say in case of a server is down. All changes will be considered as resolved and never be synced. I can throw an error, but is there a more graceful way than throw an error?

pushChanges: async ({ changes, lastPulledAt }) => {
  await axios.post('/user')
    .catch((e) => {
      // server is down, need to keep all changes for the next sync
    });
}

This is correct. Throwing an error in an async function is the same as rejecting a promise. If axios.post returns a Promise, you don't need a catch block
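
A minimal sketch of that, assuming a hypothetical `postChanges` function standing in for the HTTP call (for example a thin wrapper around axios.post) that returns a Promise and rejects when the server is unreachable:

```javascript
// `postChanges` is a hypothetical stand-in for the HTTP request.
function makePushChanges(postChanges) {
  return async function pushChanges({ changes, lastPulledAt }) {
    // No try/catch here: if the request rejects, this async function
    // rejects too, so the caller sees the push as failed.
    await postChanges({ changes, lastPulledAt });
  };
}
```

Swallowing the error in a `.catch()` would do the opposite: the function would resolve normally and the push would look successful.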


radex avatar radex commented on May 16, 2024 1

Our first intuition is that this wouldn't work, because fetch doesn't return deleted objects (as discussed further above).

Right! But this is because Query.constructor rewrites your query to add Q.where('_status', Q.notEq('deleted')). That's why query.description = query._rawDescription. But like I'm saying, this is not covered by tests, so it might not work


radex avatar radex commented on May 16, 2024 1

@fletling @theVasyl @sebastian-schlecht @brandondrew et al:

I'm beginning the work of rewriting our sync implementation, open sourcing it as part of 🍉, and making it an essentially self-documenting sample implementation for others to follow. Check it out here: #142 . This is very early, almost no code there, but you can follow along, see the proposed API, and the rough procedure of sync. I'd be happy to hear your comments!


luisgregson avatar luisgregson commented on May 16, 2024

In the current documentation you mention using a sync engine with the current synchronization primitives. Did you have specific ones in mind? I'd like to see if I can get something working.


flettling avatar flettling commented on May 16, 2024

@radex As long as the documentation is not there I'd abuse this thread to ask a few additional questions:

  1. Could you briefly explain how the _changed field works? I assume that's related to sync, right?

  2. Is there a recommended way to set the _status, _changed and last_modified fields? I'm currently using record._raw = sanitizedRaw(...) for everything but I thought a helper function might be helpful here.

  3. Is there a way to find the maximum last_modified time in a collection without retrieving all records? I'm currently storing it using db.adapter.getLocal/setLocal but that doesn't seem ideal.


radex avatar radex commented on May 16, 2024

Could you briefly explain how the _changed field works? I assume that's related to sync, right?

Yes, it tracks which columns were changed since the last sync.

If you do:

record.update(() => {
  record.text = 'abc'
  record.fooBar = 10 // the @field does `record._setRaw('foo_bar', 10)` under the hood
})

Watermelon will update _changed to be text,foo_bar. It's just comma separated names of columns (we figured that's going to be the simplest, fastest format).

This is so that if you have a conflict (server says that it has an updated record R1, but local R1's status is also updated), you can take the server's version and only replace the columns that were changed locally.

After sync (after you send the local versions of records to the server), you're supposed to set the status to 'synced' and reset '_changed' to ''.

If the status is 'created', then changes are not tracked (since the whole record is going to be new to the database).
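
To illustrate the resolution described above (take the server's version and re-apply only the locally changed columns), here is a small sketch on plain raw records. `resolveConflict` is a hypothetical helper written for this example, not an existing Watermelon function.

```javascript
// `local` and `remote` are raw records (plain objects keyed by column name).
function resolveConflict(local, remote) {
  // ''.split(',') would yield [''], hence the guard for "nothing changed".
  const changedColumns = local._changed ? local._changed.split(',') : [];

  // Start from the server's version, but keep the local sync bookkeeping
  // so the record is still pushed on the next sync.
  const resolved = {
    ...remote,
    _status: local._status,
    _changed: local._changed,
  };

  // Columns changed locally since the last sync win over the server.
  for (const column of changedColumns) {
    resolved[column] = local[column];
  }
  return resolved;
}
```

With `_changed` set to `text,foo_bar`, the resolved record keeps the local `text` and `foo_bar` and takes every other column from the server.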

Is there a recommended way to set the _status, _changed and last_modified fields? I'm currently using record._raw = sanitizedRaw(...) for everything but I thought a helper function might be helpful here.

Using sanitizedRaw in sync code is fine! Agreed, a bunch of helper functions would be useful (for finding all the records to sync, resolving the conflicts automatically, etc.)

Is there a way to find the maximum last_modified time in a collection without retrieving all records? I'm currently storing it using db.adapter.getLocal/setLocal but that doesn't seem ideal.

I think we're doing the same thing. I agree it's not ideal. But I don't know all the details. I think this is partly an artifact of the backend system we use. @Stanley knows more about this


theVasyl avatar theVasyl commented on May 16, 2024

Hi, @radex , thanks a lot for the project! Very timely.

I will also abuse this thread for some clarification questions. =)

Maybe you could help clarify the following:
There seems to be a _status attribute (usage clear), but also a syncStatus attribute.
Might be I misunderstand smth... How is syncStatus to be used (if at all)?

Thanks a lot for your help.
(And for the lib, again)


radex avatar radex commented on May 16, 2024

but also a syncStatus attribute.
Might be I misunderstand smth... How is syncStatus to be used (if at all)?

Sorry for the confusion: the syncStatus property is a @field('_status'). That is, there's a _status column in the database, but in JavaScript, it's available as record.syncStatus. So it's the same thing.


theVasyl avatar theVasyl commented on May 16, 2024

Thanks a lot.
Thought that much, but wasn't sure.


kennethpdev avatar kennethpdev commented on May 16, 2024

Do you guys have sample implementations? I'd rather see live code if possible :)


radex avatar radex commented on May 16, 2024

Not yet from me. @theVasyl, @fletling did you get the sync to work on your side? Do you have code snippets to share maybe? :)


sebastian-schlecht avatar sebastian-schlecht commented on May 16, 2024

Working with @fletling and @theVasyl here.

We're currently struggling with how to get all deleted records from a specific table. Is it possible to include deleted items in the result when querying against a specific table?

collection.query(Q.where("_status", Q.notEq("synced"))).fetch();

does not work because _status != deleted is always added to the underlying query.


radex avatar radex commented on May 16, 2024

@sebastian-schlecht

#5 (comment) — you can only get the IDs of the deleted records. That should be enough, since… well… they're deleted, so you only want to delete them on the server, correct? Or is there a specific reason why you need the full details of the record marked as deleted?


theVasyl avatar theVasyl commented on May 16, 2024

@radex Thank you for your help!
As to your question regarding code samples: Everything still very much WIP.

Adapted our prototype sync workflow to use getDeletedRecords and destroyDeletedRecords.
We were wondering, if there is a good reason those two work through the adapter and not on collection level. Seems an implicit collection argument would make them more generic, no?


radex avatar radex commented on May 16, 2024

We were wondering, if there is a good reason those two work through the adapter and not on collection level.

I guess this is just used for sync, so I didn't think that it's necessary to expose it to Collection.

And if you're asking why it just returns IDs — we figured this is enough, will be more performant, and there were some concerns about caching/consistency, but I don't remember the details. I have to dig through internal documentation and open-source that too


theVasyl avatar theVasyl commented on May 16, 2024

Yes, we want to use it for sync as well. Currently looking into building a generic sync mechanism for all our models. And how to pass the model names. That's the reason for the question.

Yes, understood the reasoning behind IDs from the previous responses.


theVasyl avatar theVasyl commented on May 16, 2024

The current blocker is how to getDeletedRecords() for a table, but only the ones where a certain condition applies (e.g. they belong to a certain projectId).
Seems Watermelon just exposes the sqlite getDeletedRecords method here.

Any ideas?


radex avatar radex commented on May 16, 2024

The current blocker is how to getDeletedRecords() for a table, but only the ones where a certain condition applies (ex. they belong to a certain projectId).

Why do you need that?


theVasyl avatar theVasyl commented on May 16, 2024

The current setup is a local WatermelonDB containing all the projects for a given user.
Projects have Tickets, Attachments etc.

Syncing all projects at once would take too long and use too much data.
And is actually not needed.

So we're trying to make the sync functionality work per project.
So, e.g., only sync all tickets for the current project.

This seems to work except for the deletion part.
The method above returns the deletions for the whole table without any filtering possibility.
So currently looking for a workaround for that.


radex avatar radex commented on May 16, 2024

The method above returns the deletions for the whole table without any filtering possibility.
So currently looking for a workaround for that.

I can't promise this is going to work correctly all of the time, as it really wasn't designed to do so, but I think this workaround might work:

const query = collection.query(Q.where(xxx), Q.where('_status', 'deleted'))
query.description = query._rawDescription

const records = await query.fetch()

Let me know if this works. It might be easier if you sync deletions all at once, or if you fetch data for sync all at once but just send it in batches...


theVasyl avatar theVasyl commented on May 16, 2024

Thanks!
Our first intuition is that this wouldn't work, because fetch doesn't return deleted objects (as discussed further above).
But will test shortly and report.


brandondrew avatar brandondrew commented on May 16, 2024

In the case of a conflict between server and client, do I correctly infer from what is above that my code will be able to determine what to do?

(For contrast, and to explain why I ask, GunDB—last time I looked at it—seemed more concerned with having a perfectly deterministic resolution algorithm than with allowing the developer to control the resolution and determine which data is worth preserving.)

For example, if I wanted to display both sets of data to the end user and ask them to choose what to keep (or combine data into a third option), would that be possible?


radex avatar radex commented on May 16, 2024

In the case of a conflict between server and client, do I correctly infer from what is above that my code will be able to determine what to do?

Yes, it's up to you. The information you have during sync is:

  • What's the current state of the client version
  • Which fields changed since last sync on client version
  • What's the current server version

So you could hook up UI to show both versions somehow and let user decide. The easiest resolution scheme is to use server version, and apply fields changed locally since last sync from local version.


radex avatar radex commented on May 16, 2024

@fletling whoa, nice post. It will take me a while to digest all this and respond, since your needs are a lot different than ours (for us, no GraphQL, no existing stuff to support, and a preference for full-database, not per-collection sync). But this will help, so thank you.


radex avatar radex commented on May 16, 2024

@fletling will you be able to find some time this week to take a look at #142 and give us feedback about the proposed API and sketch of the sync algorithm? I can already see that your case is specific enough you might not be able to use the "standard 🍉 sync", but that's also why I could use your feedback on a) why it isn't possible for you to use per-database sync, or at least something closer to it but with this GraphQL abstraction, and b) how we can structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible


radex avatar radex commented on May 16, 2024

also, before I fully reply to your note from last week, I have one question:

The same applies to the notion of local/remote IDs. That is something any sync engine needs to support and should therefore be built more deeply into WatermelonDB. We currently use a separate table for that, but (as with setLocal above) this comes with the drawback of additional boilerplate code in the schema that is not specific to our app, as well as additional database roundtrips during synchronisation.

Why? Why can't you use the same IDs locally as on the server? This is what we've been always doing, and I assumed there's no reason to treat local IDs separately


flettling avatar flettling commented on May 16, 2024

@fletling will you be able to find some time this week to take a look at #142 and give us feedback about the proposed API and sketch of the sync algorithm? I can already see that your case is specific enough you might not be able to use the "standard 🍉 sync", but that's also why I could use your feedback on a) why it isn't possible for you to use per-database sync, or at least something closer to it but with this GraphQL abstraction, and b) how we can structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible

Sure! I left some comments in particular regarding the two questions a) and b) in here:
#142

Why? Why can't you use the same IDs locally as on the server? This is what we've been always doing, and I assumed there's no reason to treat local IDs separately

Theoretically it should be possible but let me describe some edge cases:
That’s generally a problem of being able to make sure that IDs are unique. I know that for instance with UUIDs there’s negligible chance of IDs not being unique (https://stackoverflow.com/questions/1155008/how-unique-is-uuid) but let’s assume they may not be random enough to be unique:
Let’s assume that there are clients (1) and (2) that, due to bad luck, generated the same ID x for a record. (1) has already synced its record to the backend, so in the backend we have a record with ID x. Now (2) attempts to sync its record with ID x. The backend is going to reject it. (2) would now have to generate a new ID y and attempt to sync to the backend again. (2) also needs to update all its relations to use y instead of x. What happens when some of these relations have already been synced to the backend? Then these need to be updated as well. Surely this is theoretically possible, but it would definitely be better to just avoid this situation in the first place. An easy way to avoid it is to use separate client-side and server-side IDs and just map them in the network layer of the synchronisation. That way the backend doesn’t have to be aware of multiple clients, and the client doesn’t have to be aware of the backend-side IDs at all.
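
To make the mapping idea concrete, here is a rough sketch of translating local IDs to remote IDs in the network layer. `IdMap` and `toRemotePayload` are hypothetical names for illustration only (at Pave the mapping currently lives in a separate collection), and the column name `project_id` in the usage is just an example.

```javascript
// Hypothetical two-way mapping between client-generated and server-assigned IDs.
class IdMap {
  constructor() {
    this.localToRemote = new Map();
    this.remoteToLocal = new Map();
  }
  link(localId, remoteId) {
    this.localToRemote.set(localId, remoteId);
    this.remoteToLocal.set(remoteId, localId);
  }
  toRemote(localId) {
    return this.localToRemote.get(localId) ?? null;
  }
  toLocal(remoteId) {
    return this.remoteToLocal.get(remoteId) ?? null;
  }
}

// Translate a local record into the payload sent to the backend on create.
function toRemotePayload(record, idMap, relationColumns) {
  const payload = { ...record };
  delete payload.id; // the backend assigns its own ID
  for (const column of relationColumns) {
    const remoteId = idMap.toRemote(record[column]);
    if (remoteId === null) {
      // This is why parents of a relation have to be synced first.
      throw new Error(`Parent referenced by "${column}" has not been synced yet`);
    }
    payload[column] = remoteId;
  }
  return payload;
}
```

For example, after `idMap.link('local-p1', 'srv-42')`, a ticket with `project_id: 'local-p1'` is sent with `project_id: 'srv-42'`, and the response's new server ID is linked to the ticket's local ID in the same way.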

See for instance https://softwareengineering.stackexchange.com/questions/287163/generation-of-ids-in-offline-online-application and https://softwareengineering.stackexchange.com/questions/236309/strategy-for-generating-unique-and-secure-identifiers-for-use-in-a-sometimes-of for some discussion around this topic.

Also this here is helpful: https://tech.trello.com/sync-two-id-problem/

Apart from that there’s also security considerations about generating IDs client side - on which I’m not an expert so I’d refer to some discussion thread here again: https://stackoverflow.com/questions/105034/create-guid-uuid-in-javascript?noredirect=1&lq=1 and https://stackoverflow.com/questions/1296234/is-there-any-danger-to-creating-uuid-in-javascript-client-side


brandondrew avatar brandondrew commented on May 16, 2024

but let’s assume they may not be random enough to be unique

Why would you make that assumption?

It seems to me that:

  1. the odds are infinitesimally small (or "negligible" as you say)
  2. you can detect a collision and throw an exception, and therefore
  3. there is no risk of data loss

If I'm wrong about 2, then 3 disappears. But isn't # 1 good enough? Aren't the odds higher that all your servers will be simultaneously stolen by criminals or hit by lightning?

I might be missing an important point. If so, please clarify for me. Thanks!


radex avatar radex commented on May 16, 2024

Theoretically it should be possible but let me describe some edge cases:
That’s generally a problem of being able to make sure that IDs are unique. I know that for instance with UUIDs there’s negligible chance of IDs not being unique (https://stackoverflow.com/questions/1155008/how-unique-is-uuid) but let’s assume they may not be random enough to be unique:
Let’s assume that there are clients (1) and (2) that, due to bad luck, generated the same ID x for a record. (1) has already synced its record to the backend, so in the backend we have a record with ID x. Now (2) attempts to sync its record with ID x. The backend is going to reject it. (2) would now have to generate a new ID y and attempt to sync to the backend again. (2) also needs to update all its relations to use y instead of x. What happens when some of these relations have already been synced to the backend? Then these need to be updated as well. Surely this is theoretically possible, but it would definitely be better to just avoid this situation in the first place. An easy way to avoid it is to use separate client-side and server-side IDs and just map them in the network layer of the synchronisation. That way the backend doesn’t have to be aware of multiple clients, and the client doesn’t have to be aware of the backend-side IDs at all.

See for instance https://softwareengineering.stackexchange.com/questions/287163/generation-of-ids-in-offline-online-application and https://softwareengineering.stackexchange.com/questions/236309/strategy-for-generating-unique-and-secure-identifiers-for-use-in-a-sometimes-of for some discussion around this topic.

Also this here is helpful: https://tech.trello.com/sync-two-id-problem/

Apart from that, there are also security considerations around generating IDs client side. I’m not an expert on that, so I’d refer to some discussion threads again: https://stackoverflow.com/questions/105034/create-guid-uuid-in-javascript?noredirect=1&lq=1 and https://stackoverflow.com/questions/1296234/is-there-any-danger-to-creating-uuid-in-javascript-client-side

I read through the links you posted and I remain unconvinced that this is necessary.

At Nozbe, we've been using 16-character client-generated random IDs with sync for 10 years, and we haven't ever detected conflicting IDs.

Currently, 🍉 generates IDs that have 8e24 possibilities. So if you have 10M teams, each of which has 100K records (1T = 1e12), the probability of one collision (one in a trillion!), which would cause a single sync error or a single record data loss is (if I'm calculating correctly)… about 6%. OK, that's actually higher than I expected, but we're talking one in a freaking trillion here. And if that's not enough:

  • we can add more bits of entropy by generating a longer ID, or just by broadening the alphabet from [a-z0-9] to [a-zA-Z0-9]; that would bring the probability of a collision among a trillion records down to about 0.001%.
  • we can use crypto instead of Math.random() to ensure the randomness is dispersed better
  • you could swap randomId() for a proper UUID implementation, which at 122 bits of entropy brings the probability of a collision among a trillion records down to about 9e-14.
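
For anyone who wants to check the numbers above, here is a small sketch using the standard birthday-bound approximation p ≈ 1 − exp(−n² / 2N), where n is the number of IDs generated and N the number of possible IDs. The function and variable names are made up for illustration.

```javascript
// Birthday-bound approximation: p ≈ 1 - exp(-n^2 / (2N))
// n = number of IDs generated, N = number of possible IDs.
function collisionProbability(n, possibilities) {
  // -Math.expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
  return -Math.expm1(-(n * n) / (2 * possibilities))
}

const records = 1e12 // 10M teams x 100K records each

const lowercaseIds = Math.pow(36, 16) // 16 chars from [a-z0-9], about 8e24
const mixedCaseIds = Math.pow(62, 16) // 16 chars from [a-zA-Z0-9]
const uuidV4 = Math.pow(2, 122) // 122 random bits

console.log(collisionProbability(records, lowercaseIds)) // about 0.06 (6%)
console.log(collisionProbability(records, mixedCaseIds)) // about 1e-5 (0.001%)
console.log(collisionProbability(records, uuidV4)) // about 9e-14
```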

TL;DR: Unless you're building a nuclear power facility, the extremely tiny improvement in data safety by eliminating the possibility of ID conflicts is completely outweighed by the risk of a lot of additional complexity.
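
As for the "crypto instead of Math.random()" option in the list above, a minimal sketch could look like the following. It assumes a Node.js environment (where `crypto.randomInt` is available); `secureRandomId` is a hypothetical name and this is not 🍉's actual ID generator. In React Native you would need a different source of secure randomness.

```javascript
const { randomInt } = require('crypto')

const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'

// Hypothetical replacement for a Math.random()-based ID generator.
function secureRandomId(length = 16) {
  let id = ''
  for (let i = 0; i < length; i += 1) {
    // randomInt(max) returns a uniformly distributed integer in [0, max),
    // drawn from a cryptographically secure source.
    id += ALPHABET[randomInt(ALPHABET.length)]
  }
  return id
}

console.log(secureRandomId()) // a random 16-character string
```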

PS. Regarding safety, I don't see this as relevant to sync. You always have to check on the server whether a user is allowed to create a given type of record. IDs are no different here.

psolom avatar psolom commented on May 16, 2024

Hey guys, you do an amazing job!

I can see the Sync feature is about to be released, and some PRs are already merged.

Some docs related to Sync have been written (in a PR), but I would ask you to make them more extensive and comprehensive, and more examples would be nice to have. Looking forward to the final Sync release.

radex avatar radex commented on May 16, 2024

@servocoder have you seen the docs in this PR: https://github.com/Nozbe/WatermelonDB/blob/cafe5421981693a0d9b5ca051b3eb271bfc41711/docs/Advanced/Sync.md ? Those are a little bit more written up. If you have specific suggestions — please comment on that PR. Otherwise, this is more or less what we'll ship and we expect contributions to improve it more!

psolom avatar psolom commented on May 16, 2024

Why does the synchronize function require actionsEnabled to be enabled?

new Database({
  adapter,
  modelClasses: [...],
  actionsEnabled: true,
})

This allows the sync to work, but when I try to create a new entity via database.collections.get('user').create(), I get an error saying that the create operation must be performed inside an action:

Diagnostic error: Collection.create() can only be called from inside of an Action.

If this is by design, then I am curious what the correct way is to access or invoke a model action directly.

psolom avatar psolom commented on May 16, 2024

I would expect to invoke model action like this:

db.collections.get('user').myAction()

or

db.collections.get('user').model.myAction()

or any other way

At this point I can't figure out how to access a model action.

radex avatar radex commented on May 16, 2024

Why does the synchronize function require actionsEnabled to be enabled?

Actions are necessary to ensure safety. Because of asynchronicity, if you have database writes that depend on database reads and something else is happening simultaneously, Bad Things™ can happen. TL;DR: Only one write action may run at a time.
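
To illustrate the kind of problem described above, here is a minimal sketch of a read-then-write race and of a queue that runs one write at a time. `ActionQueue`, `increment`, and `demo` are hypothetical names for this example only; this is not how WatermelonDB implements actions internally.

```javascript
// Hypothetical sketch of a write queue: each action starts only after the
// previous one has settled. Not WatermelonDB's actual implementation.
class ActionQueue {
  constructor() {
    this.tail = Promise.resolve()
  }

  enqueue(work) {
    // Start `work` only once the previous action has finished.
    const result = this.tail.then(() => work())
    // Keep the chain usable even if `work` rejects.
    this.tail = result.catch(() => {})
    return result
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// A write that depends on an earlier read, with an async gap in between.
async function increment(state) {
  const current = state.count
  await sleep(10)
  state.count = current + 1
}

async function demo() {
  // Without a queue, both calls read 0, so one increment is lost.
  const unsafe = { count: 0 }
  await Promise.all([increment(unsafe), increment(unsafe)])

  // With a queue, the second call waits for the first to finish.
  const safe = { count: 0 }
  const queue = new ActionQueue()
  await Promise.all([
    queue.enqueue(() => increment(safe)),
    queue.enqueue(() => increment(safe)),
  ])

  return { unsafe: unsafe.count, safe: safe.count }
}

demo().then((result) => console.log(result)) // { unsafe: 1, safe: 2 }
```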

I get an error saying that the create operation must be performed inside an action:

Will be written up in more detail shortly. But:

database.action(async () => {
  // do your creates/updates here
})

or

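import { Model } from '@nozbe/watermelondb'
import { action } from '@nozbe/watermelondb/decorators'

class User extends Model {
  // Methods marked with @action run inside an Action, so writes are allowed here
  @action async someActionOnUser() {
    // do creates/updates here
  }
}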

ashconnell avatar ashconnell commented on May 16, 2024

Am I correct in assuming this sync strategy will only work if you were to pull "all" of the data available to the user in an app?

I've been researching solutions for a chat-based app I'm building that needs to function offline but work in realtime when connected.

For the majority of apps discussed as using WatermelonDB, it seems feasible to fetch all of a user's data at launch when an existing user first logs into the app. For a chat app, though, that data could potentially be hundreds of threads and thousands upon thousands of messages over a user's lifetime.

Are there any alternative ideas floating around that don't involve a flow like this where the first sync involves fetching "everything"?

psolom avatar psolom commented on May 16, 2024

@radex this works like a charm:

database.action(async () => {
  // await the create so the action does not finish before the write completes
  await db().collections.get('user').create(...)
})

Concerning the second option, which uses the @action decorator, I couldn't find a way to invoke it directly. Is it designed to be used only on model instances fetched via the find()/query() methods? In other words, I am only able to access a model action in the following way:

db().collections.get('user').find('id').then((user) => {
  user.createUser(...)
});

But this looks odd if I want to define all my C(R)UD operations as model actions. In order to create a new user, I have to get another User model instance first. Am I missing something?

psolom avatar psolom commented on May 16, 2024

@radex What is the correct way to abort the push process, say, when the server is down? As written below, all changes will be considered resolved and will never be synced. I can throw an error, but is there a more graceful way than that?

pushChanges: async ({ changes, lastPulledAt }) => {
  await axios.post('/user')
    .catch((e) => {
      // server is down, need to keep all changes for the next sync
    });
}
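
One possible shape for this, sketched under the assumption that synchronize() treats a rejected pushChanges promise as a failed push and keeps the local changes marked as unsynced (check the Sync docs for the exact contract): let the error propagate out of pushChanges and handle it where the sync is triggered. `sendToServer`, `makePushChanges`, and `trySync` are hypothetical stand-ins for the axios call, the pushChanges option, and the code that calls synchronize().

```javascript
// Hypothetical stand-in for the real HTTP call (e.g. axios.post).
async function sendToServer(payload, { serverUp }) {
  if (!serverUp) throw new Error('server is down')
  return { ok: true }
}

// Builds a pushChanges function that lets failures propagate.
function makePushChanges(options) {
  return async ({ changes, lastPulledAt }) => {
    // No .catch() here: a rejection signals that nothing was pushed.
    await sendToServer({ changes, lastPulledAt }, options)
  }
}

// Stand-in for the code that triggers a sync: the failure is handled here,
// not swallowed inside pushChanges.
async function trySync(pushChanges) {
  try {
    await pushChanges({ changes: {}, lastPulledAt: null })
    return 'synced'
  } catch (error) {
    return `will retry later: ${error.message}`
  }
}

trySync(makePushChanges({ serverUp: false })).then((status) => console.log(status))
// will retry later: server is down
```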
