Coder Social home page Coder Social logo

Comments (10)

derekkraan avatar derekkraan commented on July 3, 2024

Hi @QuantamHD,

This is correct. We use DynamicSupervisor for doing the actual supervision, and the challenge with temporary and transient processes is reflecting this information in the δ-CRDT. In the case of permanent processes, the state is leading, but in the case of temporary or transient processes, this is reversed.

For an overview of the behaviour of temporary and transient, see https://hexdocs.pm/elixir/Supervisor.Spec.html#module-restart-values-restart

I have shied away from doing "real" supervision in Horde.Supervisor, preferring to delegate this to DynamicSupervisor. This won't be possible anymore if we support transient and temporary children.

If you have any other questions, don't hesitate to ask!

from horde.

QuantamHD avatar QuantamHD commented on July 3, 2024

In terms of "leading" state do you mean that the state can be set before process creation and not modified afterward for permanent strategies? While for the temporary and transient cases require CRDT modification at process creation time as well as process exits?

Could we achieve this feature with monitors on the processes managed by the DynamicSupervisor or do you think we would need a fully implemented supervisor behavior?

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

What I mean is, with permanent restart, a process is only removed from the supervisor by us, by calling DynamicSupervisor.terminate_child. With transient and temporary restart, the process is removed by the supervisor itself when the process exits (depending on the exit reason and the strategy).

We could achieve this with monitors on the processes that are managed by the DynamicSupervisor, but the complexity of our "supervision by proxy" approach is increasing every time we extend the approach, and I think we should consider (and I have been considering) whether Horde could benefit from fully implementing its own supervisor instead of relying on DynamicSupervisor.

Having our own supervisor implementation in Horde would also simplify some other parts of the code, like the graceful shutdown bits, which is already a bit of a hack.

Do you have any thoughts on this?

from horde.

QuantamHD avatar QuantamHD commented on July 3, 2024

I would say that makes sense to me. We could also base the implementation off of https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/dynamic_supervisor.ex given that Horde currently tries to maintain API parity with it.

from horde.

QuantamHD avatar QuantamHD commented on July 3, 2024

One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

We could also base the implementation off Elixir's DynamicSupervisor

This sounds like a very good option. If our changes are limited, then we will also be able to port any changes / bugfixes etc from the stdlib DynamicSupervisor back to Horde's forked version and keep them mostly in sync.

One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.

I hadn't thought about this. We are extending Supervisor into places it has not been before, so some interpretation is required. In this case I think never restarting is the correct behaviour as you say. For processes that should always be restarted until they have a normal exit code, we have :transient.

from horde.

QuantamHD avatar QuantamHD commented on July 3, 2024

Keeping the bug fixes of stdlib DynamicSupervisor sounds great! It looks like we would need to insert CRDT modification code here. Could you explain to me the current "supervision by proxy" strategy employed by Horde? I see that you select nodes bases a consistent hash scheme, and it looks like you forward child specs to that node based on that. If we employed a custom Supervisor how might that strategy change?

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

The current strategy is to let DynamicSupervisor do all our supervision for us. So we proxy all calls either down to the DynamicSupervisor or to the DeltaCRDT (depending on what information is desired).

There is already a use case that I have had to work around (when the CRDT is not "leading"), and that's during a shutdown. During a shutdown, what we want is to update the CRDT state for each process as they shut down, and not at the end, after all processes have shut down. This is because there could be a big difference in shutdown time per process, and we don't want to have a scenario where a process shuts down and then has to wait for all other processes on the node to shut down before it is started on another node. So this is already a chink in the armour of our "supervision by proxy" approach.

And the way this works in practice is that there is a "graceful shutdown" process that recognizes when a shutdown has been initiated and proxies terminate messages from the ProcessCanary processes when there is a shutdown ongoing.

So this "graceful shutdown" behaviour is a candidate for improvement if we fork DynamicSupervisor.

I don't think the hash scheme ("distribution strategy") details will change if we fork DynamicSupervisor. This is purely for choosing which node a process should start on, and doesn't have anything to do with supervision of the process once it has actually started.

from horde.

QuantamHD avatar QuantamHD commented on July 3, 2024

I have a little WIP branch up for you to look at #62 I added and modified the dynamic supervisor so that child_ids aren't thrown away and on process deletion, the supervisor will also remove the child_id from the CRDT. I have one failing test related to deduping I was hoping you could give me a hand with

from horde.

derekkraan avatar derekkraan commented on July 3, 2024

Implemented in #70

from horde.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.