Coder Social home page Coder Social logo

Comments (9)

m110 avatar m110 commented on June 16, 2024 4

We should consider one of the workarounds for PostgreSQL described by @oskardudycz here https://event-driven.io/en/ordering_in_postgres_outbox/ :)

from watermill.

roblaszczak avatar roblaszczak commented on June 16, 2024 3

I spent a pretty long time trying to find out why it's not always reproducible - and I found it!

Generally speaking, two pre-conditions need to happen: you need to have a small PollInterval, and two specific queries need to happen: query A, which is executed first and committed second, and B, which is executed second but committed as first.

It's illustrated by that test: https://github.com/ThreeDotsLabs/watermill-sql/pull/23/files#diff-9c7c1342e6c9993e31c519e27f897bea0ffb721cb39fd7cf9bb70cfc4a56b6ebR297

In the end, I was able to reproduce it in MySQL and Postgres.

It wasn't detected by other tests earlier because they were just fast enough (even with 1ms poll interval).
Typically, on production systems, we are not using such small poll interval as well - so the change of it's happening is even smaller.

PR with final implementation: ThreeDotsLabs/watermill-sql#23

It's changing the public API slightly, so a version bump was required. However, the change of the API is rather cosmetic, so migration should be simple.

The initial PoC implementation is closed. Thanks, @condorcet, for the review!

from watermill.

m110 avatar m110 commented on June 16, 2024 2

Hey @condorcet, thanks for sharing this!

I think this topic is important enough to consider bold moves if there's no way around it. 😅

We could either release a v2 with the new API or create a new set of structs that would live next to existing ones.

I think v2 sounds better overall, as the current version is not only limited but can lead to serious bugs. After all, we need event handling to be rock solid, especially when we trust they happen in a transaction.

We should then consider adding a migration that adds the tx_id column for existing users and an easy way to apply it.

I'm happy to review a PR if you're okay with sharing the solution. :)

@roblaszczak FYI

from watermill.

condorcet avatar condorcet commented on June 16, 2024 1

I faced the same issue and tried to implement Postgres adapter based on ideas from this article. Seems it works fine, at least I don't see broken ordering in my integration tests anymore.

But I couldn't find an option to extend existing SQL adapter because interfaces relies on integer "offset". While in article we have to implement "offset" actually as a tuple (transaction_id, offset) and operate it components in queries, for example, to receiving message:

SELECT 
     position, message_id, message_type, data
FROM
     outbox
WHERE
     (
          (transaction_id = last_processed_transaction_id
               AND position > last_processed_position)
          OR
          (transaction_id > last_processed_transaction_id)
     )
     AND transaction_id < pg_snapshot_xmin(pg_current_snapshot())
ORDER BY
    transaction_id ASC,
    position ASC

So we have support transaction_id in adapter interfaces, something like that:

type OffsetAdapter {
    AckMessageQuery(topic string, tx_id int, offset int, consumerGroup string) (string, []interface{}) // added tx_id
}

In my case I added "offset" type:

type Offset {
    TransactionID int64
    Offset               int64
}

And interfaces become something like that:

type OffsetAdapter {
    AckMessageQuery(topic string, offset Offset, consumerGroup string) (string, []interface{}) // <-- int -> Offset
}

But anyway this is breaking change for generic interfaces that tries to fit both MySQL and Postgres. By the way, adding transaction_id in table also seems as breaking changes, because of migration users tables to new schema etc.

@m110 What do you think about this? Could we really achieve non-breaking changes for SQL adapter in this case? Or it's better to make separate Postgres adapter?

from watermill.

roblaszczak avatar roblaszczak commented on June 16, 2024 1

if offset is bigserial...

@roblaszczak does this mean that the standard driver ""offset" SERIAL," regular serial is not affected by this issue? I would really like to know, as I have watermill running critical services in production.

After investigation, I can confirm that this issue may occur under very specific circumstances (I tested it with BIGSERIAL, but to my best knowledge, it could happen with SERIAL as well).

The good news is that it is not always the case - in some systems we run with @m110, the original implementation works fine. In the end, I was able to reproduce it just with one application deployed on GCP (with local Docker deployment, it worked fine with the same project). But in the end, I was not able to identify what was making the difference.

It doesn't change the fact that this bug requires a fix, so I prepared a proof of concept that uses the pattern mentioned earlier to ensure consistency: https://github.com/ThreeDotsLabs/watermill-sql/pull/22/files

It requires some cleanups, but it's already running for one week in that project without an issue. I'll try to find some time soon to clean it up and add a new release. It's not merged nor released yet, so I can still introduce breaking changes in this PR.

In the meantime, I recommend cross-checking the data in your systems to ensure that no events were missed. As I said earlier: it's not always an issue, so there is a chance that no events were missed.

Thanks for your patience!

from watermill.

hlwone avatar hlwone commented on June 16, 2024

Hey @condorcet, can MySQL escape from this bug?

from watermill.

condorcet avatar condorcet commented on June 16, 2024

Hey @condorcet, can MySQL escape from this bug?

@hlwone I have no strong experience with MySQL, I think it depends on implementation of AUTO_INCREMENT and maybe it's not a problem actually (for example lock modes https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html)

from watermill.

dkotik avatar dkotik commented on June 16, 2024

if offset is bigserial...

@roblaszczak does this mean that the standard driver ""offset" SERIAL," regular serial is not affected by this issue? I would really like to know, as I have watermill running critical services in production.

from watermill.

dkotik avatar dkotik commented on June 16, 2024

Excellent work! Great catch!

from watermill.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.