Comments (20)
Thanks @derekkraan
from horde.
@sleipnir I believe I did this a long time ago. I don't know what on earth is happening at Github, but I sent this email months ago.
from horde.
Wow, indeed.. this is not supposed to happen. Even a 1ms sleep inbetween helps already.
from horde.
Great find. Are you able to submit a PR to fix this @BrendanBall?
from horde.
I can try. I'm currently not sure why this happens.
I can see that it does an insert to keys_ets_table
and then asynchronously when getting a crdt diff name conflicts are checked.
I don't currently understand why both processes end up in a conflict.
Do you think it's good enough to do another name conflict check in handle_call :register
, or might that cause other issues?
from horde.
I believe the main race condition is between the call to lookup
and GenServer.call
in Horde.Registry.register/3
. After we do lookup
we don't check again. Only the code in the handle_call
of Horde.RegistryImpl
is run serially, the lookup
is run in parallel.
So we register both processes, but the second one is overwriting the first (in ETS). Then the first one comes in via the CRDT and sees that the second is in ETS, it sees a conflict, deregisters the second (and tells it to exit), reregisters the first (in ETS). Then the second comes in via the CRDT, it sees a conflict, deregisters the first (and tells it to exit), reregisters the second. Then the exit signal comes in from the second, resulting in it also being removed from the registry.
So I believe the issue is: 1. there is a race condition in Horde.Registry.register/3
, 2. eagerly registering in ETS causes weird things to happen later when the diffs come in from the CRDT.
Now I'm thinking about whether it is right to eagerly register things in ETS. For sure 1. needs to be fixed.
from horde.
Hmm not sure if my logic on 2. is correct here, but there is definitely something fishy going on here. With some debug logging in handle_call({:register, ...
and process_diff(state, {:add, ...
we should be able to figure out what exactly is happening.
In principle the CRDT should be doing conflict resolution and only sending either pid1 or pid2 to RegistryImpl
. I believe this should be the second pid, since the CRDT is "last write wins". If that's the case, then it should see pid2 and do nothing.
from horde.
OK, running the test reveals: the CRDT is sending both pids in diffs. So the scenario I sketched previously is indeed happening.
from horde.
Ah it's getting both pids from the CRDT because the CRDT diff interval is very low, so it's another race condition. But the pids are at least coming in in the correct order. So we'll always get either pid1, then pid2, or only pid2 from the CRDT.
from horde.
This is a bit tricky, I'm almost inclined to say that we should remove eagerly registering in ETS. This is not great, but I am not sure how else this would work?
from horde.
Doing a first lookup seems to fix the issue
However I'm not 100% confident that this will always fix the issue or might cause other issues because of the eventual consistency of the crdt
from horde.
I am also still worried about this eagerly adding entries to the ETS table and possible race conditions stemming from that.
from horde.
Do you know why entries are currently added eagerly?
It could also cause other issues if we remove that?
from horde.
We do it like that to make local changes immediately visible in the registry. So the impact of removing that would be that the registry would be eventually consistent also for local changes.
from horde.
At the call site you expect a registry entry to be effective immediately, I think? It might lead to strange situations otherwise maybe? Because then an immediate lookup after a register will fail?
from horde.
Horde users are already expected to deal with eventual consistency from one node to the next, so I am wondering whether removing the immediate consistency from local Registry operations would have a large impact.
I think most scenarios where one is starting a process and then immediately looking it up could be fixed by simply referring to the pid. That's how people are expected to deal with eventual consistency when starting remote processes.
I'm leaning towards removing immediate consistency for local registry changes. It would require a new major version number: 0.9. Would like to think on it for a while, and if anyone has any ideas or input then please mention it here.
from horde.
It's been a month. I think it's a good idea to remove this "eager" process registration. Let's do that.
from horde.
from horde.
@derekkraan Please do this optionally. I think that changing this behavior could have important impacts for other use cases than the one mentioned above by you. It would be prudent to do this optionally for the user
from horde.
Ok.
from horde.
Related Issues (20)
- DynamicSupervisor.start_child hangs on infinite timeout HOT 3
- Is it possible to start a Horde dynamic supervisor with a registered name? HOT 2
- match error when stopping a process HOT 1
- Stopping Child via Horde.DynamicSupervisor.terminate_child with via_tuple HOT 1
- Occasional Horde Registry Lookup Misses HOT 2
- GenServer.cast cant find process via {:via, Horde.Registry, ...} HOT 6
- Default random node distribution strategy HOT 5
- Match error while doing process handoff in dynamic supervisor
- Use new PartitionSupervisor
- Timeouts and Crashes in the Cluster HOT 2
- Disappearing values from Horde.Registry after spawning more than two nodes. HOT 2
- Why we need to handle {:error, {:already_started, pid}} ? HOT 4
- SayHello GenServer never (re)started on `count3` node?
- uniform_random_distribution.ex not in deps 8.7 HOT 1
- Old child specs remain in Supervisor CRDT after process handoff
- Registry.lookup doesn't always return new item that was just registered HOT 13
- Example of How to Start Children like a Regular Supervisor
- [BUG]: Horde Supervisor behaves one_for_all no matter what HOT 8
- Include horde_process in the main repo HOT 12
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from horde.