Coder Social home page Coder Social logo

Comments (20)

BrendanBall avatar BrendanBall commented on July 3, 2024 1

Thanks @derekkraan

from horde.

derekkraan avatar derekkraan commented on July 3, 2024 1

@sleipnir I believe I did this a long time ago. I don't know what on earth is happening at Github, but I sent this email months ago.
image

from horde.

arjan avatar arjan commented on July 3, 2024

Wow, indeed.. this is not supposed to happen. Even a 1ms sleep inbetween helps already.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

Great find. Are you able to submit a PR to fix this @BrendanBall?

from horde.

BrendanBall avatar BrendanBall commented on July 3, 2024

I can try. I'm currently not sure why this happens.
I can see that it does an insert to keys_ets_table and then asynchronously when getting a crdt diff name conflicts are checked.
I don't currently understand why both processes end up in a conflict.

Do you think it's good enough to do another name conflict check in handle_call :register, or might that cause other issues?

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

I believe the main race condition is between the call to lookup and GenServer.call in Horde.Registry.register/3. After we do lookup we don't check again. Only the code in the handle_call of Horde.RegistryImpl is run serially, the lookup is run in parallel.

So we register both processes, but the second one is overwriting the first (in ETS). Then the first one comes in via the CRDT and sees that the second is in ETS, it sees a conflict, deregisters the second (and tells it to exit), reregisters the first (in ETS). Then the second comes in via the CRDT, it sees a conflict, deregisters the first (and tells it to exit), reregisters the second. Then the exit signal comes in from the second, resulting in it also being removed from the registry.

So I believe the issue is: 1. there is a race condition in Horde.Registry.register/3, 2. eagerly registering in ETS causes weird things to happen later when the diffs come in from the CRDT.

Now I'm thinking about whether it is right to eagerly register things in ETS. For sure 1. needs to be fixed.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

Hmm not sure if my logic on 2. is correct here, but there is definitely something fishy going on here. With some debug logging in handle_call({:register, ... and process_diff(state, {:add, ... we should be able to figure out what exactly is happening.

In principle the CRDT should be doing conflict resolution and only sending either pid1 or pid2 to RegistryImpl. I believe this should be the second pid, since the CRDT is "last write wins". If that's the case, then it should see pid2 and do nothing.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

OK, running the test reveals: the CRDT is sending both pids in diffs. So the scenario I sketched previously is indeed happening.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

Ah it's getting both pids from the CRDT because the CRDT diff interval is very low, so it's another race condition. But the pids are at least coming in in the correct order. So we'll always get either pid1, then pid2, or only pid2 from the CRDT.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

This is a bit tricky, I'm almost inclined to say that we should remove eagerly registering in ETS. This is not great, but I am not sure how else this would work?

from horde.

BrendanBall avatar BrendanBall commented on July 3, 2024

Doing a first lookup seems to fix the issue
However I'm not 100% confident that this will always fix the issue or might cause other issues because of the eventual consistency of the crdt

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

I am also still worried about this eagerly adding entries to the ETS table and possible race conditions stemming from that.

from horde.

BrendanBall avatar BrendanBall commented on July 3, 2024

Do you know why entries are currently added eagerly?
It could also cause other issues if we remove that?

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

We do it like that to make local changes immediately visible in the registry. So the impact of removing that would be that the registry would be eventually consistent also for local changes.

from horde.

arjan avatar arjan commented on July 3, 2024

At the call site you expect a registry entry to be effective immediately, I think? It might lead to strange situations otherwise maybe? Because then an immediate lookup after a register will fail?

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

Horde users are already expected to deal with eventual consistency from one node to the next, so I am wondering whether removing the immediate consistency from local Registry operations would have a large impact.

I think most scenarios where one is starting a process and then immediately looking it up could be fixed by simply referring to the pid. That's how people are expected to deal with eventual consistency when starting remote processes.

I'm leaning towards removing immediate consistency for local registry changes. It would require a new major version number: 0.9. Would like to think on it for a while, and if anyone has any ideas or input then please mention it here.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

It's been a month. I think it's a good idea to remove this "eager" process registration. Let's do that.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

from horde.

sleipnir avatar sleipnir commented on July 3, 2024

@derekkraan Please do this optionally. I think that changing this behavior could have important impacts for other use cases than the one mentioned above by you. It would be prudent to do this optionally for the user

from horde.

sleipnir avatar sleipnir commented on July 3, 2024

Ok.

from horde.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.