kailuowang commented on June 4, 2024

The Queue tries to finish all work that is already enqueued; that's the intention.
Upon receiving Retire, it first calls dispatchWork, which tries to dispatch all the work in the queue.
Once called, dispatchWork exhausts either the work queue or the worker queue. Thus, if there are still workers left, all the work in the queue has been dispatched and the remaining workers can be retired, so the Queue sends them the NoWorkLeft message. If there is still work left in the queue, then there won't be any Worker left to receive that NoWorkLeft message.
Then, in the retiring state, the Queue tries to dispatch multiple times to finish the leftover work.
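
To make the "exhausts one queue or the other" point concrete, here is a rough model in plain Python. It is only a sketch of the logic described above, not Kanaloa's actual Scala/Akka code; the names `dispatch_work` and `on_retire` and the use of deques are assumptions made for illustration.

```python
from collections import deque


def dispatch_work(work_queue, idle_workers):
    """Pair queued work with idle workers until one of the two runs out."""
    assignments = []
    while work_queue and idle_workers:
        assignments.append((idle_workers.popleft(), work_queue.popleft()))
    return assignments


def on_retire(work_queue, idle_workers):
    """Hypothetical model of the Queue's reaction to Retire."""
    assignments = dispatch_work(work_queue, idle_workers)
    # After dispatching, at most one of the two queues is non-empty.
    # Any workers still idle have nothing to do, so they would be the
    # ones receiving NoWorkLeft.
    retired = list(idle_workers)
    idle_workers.clear()
    return assignments, retired
```

With three pieces of work and one idle worker, `retired` comes back empty and two pieces of work stay queued for the retiring state. With one piece of work and two idle workers, the work queue ends up empty and the second worker is retired.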

Bottom line: ShutdownGracefully does mean shut down gracefully, with no work left behind.
Another factor that adds to the confusion is that other actors, such as Worker and QueueProcessor, watch the Queue, but that's more a sanity check than part of the shutdown process. If the Queue accidentally dies, there is no reason to keep either the QueueProcessor or the Workers around. The whole system had better be recreated by the supervisor, which is the Dispatcher, although we need some tests around that.

from kanaloa.

nsauro commented on June 4, 2024

I've been poking around a bit at this.

I have a basic idea which I just wanted to put down in writing somewhere, so I figured where better than here 😄

So, here's a rundown of things I think we need to address, one way or another:

  • Currently, there is potential for Work loss. That is, if the Queue receives the final RetiringTimeout and there is still Work in the queue, that Work goes down with it. To fix this, we could simply send a message back to the original sender at that point, similar to EnqueueRejected(work). While the message name is slightly off, it might be better to reuse the same public message that is already used to reject work than to introduce a new one.
  • The above idea, of Work being enqueued and then subsequently rejected, made me think that we might be missing something in acknowledgement. Currently, the sender can know when Work is enqueued or rejected, but the sender never knows when the Work was completed (one way or another). This might be handy to have, since it allows the sender (not the replyTo) to track the status of Work from start to finish and fully react to it. Definitely a thing up for debate. In either event, we would want messages like these to be optional.
  • So, the idea behind graceful shutdown is that all work completes, and then all components shut themselves down. The only slight wrinkle I see is that in the retiring state, we are still actively scheduling work (if it is already enqueued). The issue is that, because of the RetireTimeout, we have no control over or guarantee that work we schedule while in this state will be completed successfully. Say that during the retiring state, right at T-1 (just before shutdown), a Worker gets new work and sends a message to its Routee, and then the system shuts down. That message is now in limbo (from the point of view of Kanaloa's end user). Did it go through? Did it time out? If this was a transaction of some sort, there is no record of what happened (the replyTo and the sender both get nothing). The thing that rubs me the wrong way here is that during graceful shutdown, we can actively be increasing our chances of not having a graceful shutdown (by scheduling more work).
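
For the first point, the fix could look roughly like this. This is a plain-Python sketch of the idea, not Kanaloa code: the shape of `EnqueueRejected` (in particular the `reason` field), the `(work, sender)` pairs and the `send` callback are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EnqueueRejected:
    work: str
    reason: str  # hypothetical field, not taken from Kanaloa


def on_retiring_timeout(work_queue, send):
    """Drain leftover (work, sender) pairs and tell each sender about it."""
    rejected = 0
    while work_queue:
        work, sender = work_queue.popleft()
        # Instead of letting the Work die with the Queue, hand it back.
        send(sender, EnqueueRejected(work, "queue retired before dispatch"))
        rejected += 1
    return rejected
```

The sender then sees one more rejection, which it already has to handle, rather than silence.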

I think the safest thing to do (as things are implemented currently) is, upon retire, to immediately reject all queued messages (similar to how things would need to be rejected in the first point above). This lets the sender confidently handle work that we had no guarantee of executing, and react accordingly. To the sender, this is just another rejection event, which it already has to handle.

The graceful shutdown would then be a signal for "in flight" operations. That is, did everything that was in flight succeed, and were all unexecuted queued messages successfully returned to the sender?
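
A small model of that proposal, again in plain Python and with made-up names (`RetiringQueue`, `is_shut_down_cleanly`), just to pin down what "graceful" would mean under it:

```python
class RetiringQueue:
    """Hypothetical model: reject queued work on retire, wait for in-flight work."""

    def __init__(self, queued, in_flight):
        self.queued = list(queued)        # (work, sender) pairs not yet dispatched
        self.in_flight = set(in_flight)   # work already handed to a worker
        self.returned = []                # (sender, work) rejections sent on retire
        self.retiring = False

    def retire(self):
        # Stop scheduling: everything still queued goes straight back
        # to its sender instead of being dispatched.
        self.retiring = True
        for work, sender in self.queued:
            self.returned.append((sender, work))
        self.queued.clear()

    def work_finished(self, work):
        self.in_flight.discard(work)

    def is_shut_down_cleanly(self):
        # Graceful means: nothing queued, nothing still in flight.
        return self.retiring and not self.queued and not self.in_flight
```

The point of the design is that after `retire()` the amount of outstanding work can only shrink, so the timeout only ever races against work that was already in flight.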

I know this is a bit of a departure from what was originally envisioned, but personally, I feel like we need to do our absolute best to make sure Kanaloa never drops work.

One interesting thing to note, and one thing we might want to explore, is that there actually might be diverging behaviors on shutdown between the pulling and pushing models.

One other thing to note is that, based on our earlier chats about this, there might actually be a case for implementing multiple types of shutdown behavior. I think many good points were brought up about both the current and potential implementations, and they don't need to be mutually exclusive.
