Comments (10)
Hi @QuantamHD,
This is correct. We use DynamicSupervisor for doing the actual supervision, and the challenge with temporary
and transient
processes is reflecting this information in the δ-CRDT. In the case of permanent
processes, the state is leading, but in the case of temporary
or transient
processes, this is reversed.
For an overview of the behaviour of temporary
and transient
, see https://hexdocs.pm/elixir/Supervisor.Spec.html#module-restart-values-restart
I have shied away from doing "real" supervision in Horde.Supervisor
, preferring to delegate this to DynamicSupervisor
. This won't be possible anymore if we support transient and temporary children.
If you have any other questions, don't hesitate to ask!
from horde.
In terms of "leading" state do you mean that the state can be set before process creation and not modified afterward for permanent strategies? While for the temporary and transient cases require CRDT modification at process creation time as well as process exits?
Could we achieve this feature with monitors on the processes managed by the DynamicSupervisor or do you think we would need a fully implemented supervisor behavior?
from horde.
What I mean is, with permanent
restart, a process is only removed from the supervisor by us, by calling DynamicSupervisor.terminate_child
. With transient
and temporary
restart, the process is removed by the supervisor itself when the process exits (depending on the exit reason and the strategy).
We could achieve this with monitors on the processes that are managed by the DynamicSupervisor, but the complexity of our "supervision by proxy" approach is increasing every time we extend the approach, and I think we should consider (and I have been considering) whether Horde could benefit from fully implementing its own supervisor instead of relying on DynamicSupervisor.
Having our own supervisor implementation in Horde would also simplify some other parts of the code, like the graceful shutdown bits, which is already a bit of a hack.
Do you have any thoughts on this?
from horde.
I would say that makes sense to me. We could also base the implementation off of https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/dynamic_supervisor.ex given that Horde currently tries to maintain API parity with it.
from horde.
One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.
from horde.
We could also base the implementation off Elixir's DynamicSupervisor
This sounds like a very good option. If our changes are limited, then we will also be able to port any changes / bugfixes etc from the stdlib DynamicSupervisor back to Horde's forked version and keep them mostly in sync.
One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.
I hadn't thought about this. We are extending Supervisor into places it has not been before, so some interpretation is required. In this case I think never restarting is the correct behaviour as you say. For processes that should always be restarted until they have a normal exit code, we have :transient
.
from horde.
Keeping the bug fixes of stdlib DynamicSupervisor sounds great! It looks like we would need to insert CRDT modification code here. Could you explain to me the current "supervision by proxy" strategy employed by Horde? I see that you select nodes bases a consistent hash scheme, and it looks like you forward child specs to that node based on that. If we employed a custom Supervisor how might that strategy change?
from horde.
The current strategy is to let DynamicSupervisor
do all our supervision for us. So we proxy all calls either down to the DynamicSupervisor
or to the DeltaCRDT (depending on what information is desired).
There is already a use case that I have had to work around (when the CRDT is not "leading"), and that's during a shutdown. During a shutdown, what we want is to update the CRDT state for each process as they shut down, and not at the end, after all processes have shut down. This is because there could be a big difference in shutdown time per process, and we don't want to have a scenario where a process shuts down and then has to wait for all other processes on the node to shut down before it is started on another node. So this is already a chink in the armour of our "supervision by proxy" approach.
And the way this works in practice is that there is a "graceful shutdown" process that recognizes when a shutdown has been initiated and proxies terminate messages from the ProcessCanary
processes when there is a shutdown ongoing.
So this "graceful shutdown" behaviour is a candidate for improvement if we fork DynamicSupervisor.
I don't think the hash scheme ("distribution strategy") details will change if we fork DynamicSupervisor. This is purely for choosing which node a process should start on, and doesn't have anything to do with supervision of the process once it has actually started.
from horde.
I have a little WIP branch up for you to look at #62 I added and modified the dynamic supervisor so that child_ids aren't thrown away and on process deletion, the supervisor will also remove the child_id from the CRDT. I have one failing test related to deduping I was hoping you could give me a hand with
from horde.
Implemented in #70
from horde.
Related Issues (20)
- ETS lookup crash in Registry.lookup
- Horde Process topology HOT 12
- DynamicSupervisor.start_child hangs on infinite timeout HOT 3
- Is it possible to start a Horde dynamic supervisor with a registered name? HOT 2
- match error when stopping a process HOT 1
- Stopping Child via Horde.DynamicSupervisor.terminate_child with via_tuple HOT 1
- Occasional Horde Registry Lookup Misses HOT 2
- GenServer.cast cant find process via {:via, Horde.Registry, ...} HOT 6
- race condition in name conflict in same registry process results in all processes losing HOT 20
- Default random node distribution strategy HOT 5
- Match error while doing process handoff in dynamic supervisor
- Use new PartitionSupervisor
- Timeouts and Crashes in the Cluster HOT 2
- Disappearing values from Horde.Registry after spawning more than two nodes. HOT 2
- Why we need to handle {:error, {:already_started, pid}} ? HOT 4
- SayHello GenServer never (re)started on `count3` node?
- uniform_random_distribution.ex not in deps 8.7 HOT 1
- Old child specs remain in Supervisor CRDT after process handoff
- Registry.lookup doesn't always return new item that was just registered HOT 13
- Example of How to Start Children like a Regular Supervisor
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from horde.