Comments (2)
The `Queue` tries to finish all work already enqueued; that's the intention. Upon receiving `Retire`, it first calls `dispatchWork`, which tries to dispatch all the work in the queue. Once called, `dispatchWork` exhausts either the work queue or the worker queue. Thus, if there are still workers left, all the work in the queue has been dispatched and the remaining workers can be retired, so the `Queue` sends them the `NoWorkLeft` message. If there is still work left in the queue, then there won't be any `Worker` left to receive that `NoWorkLeft` message.
Then, in the retiring state, the `Queue` will try to dispatch multiple times to finish the leftover work.
Bottom line is, `ShutdownGracefully` does mean shut down gracefully, no work left behind.
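The dispatch-then-retire reasoning above can be sketched as a small model. This is only an illustration in plain Python, not kanaloa's actual Akka code: `dispatch_work`, `on_retire`, and the plain deques standing in for the work queue and the idle-worker queue are all assumptions made for the example.

```python
from collections import deque

NO_WORK_LEFT = "NoWorkLeft"  # stand-in for the message named in the comment


def dispatch_work(work_queue, idle_workers):
    """Pair queued work with idle workers until one side runs out.

    Returns the list of (worker, work) assignments made.
    """
    assignments = []
    while work_queue and idle_workers:
        assignments.append((idle_workers.popleft(), work_queue.popleft()))
    return assignments


def on_retire(work_queue, idle_workers):
    """Hypothetical Retire handler: dispatch first, then retire spare workers."""
    assignments = dispatch_work(work_queue, idle_workers)
    # Any worker still idle here implies the work queue is empty,
    # so it can safely be told that no work is left.
    notices = [(worker, NO_WORK_LEFT) for worker in idle_workers]
    idle_workers.clear()
    return assignments, notices
```

With one queued item and two idle workers, the first worker gets the item and the second gets `NoWorkLeft`. With more work than workers, no notice is sent and the leftover work stays queued for later dispatch attempts.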
Another factor that adds to the confusion is that other actors, such as `Worker` and `QueueProcessor`, watch the `Queue`, but that's more a sanity check than part of the shutdown process. If the `Queue` accidentally dies, there is no reason to keep either the `QueueProcessor` or the `Worker`s around. The whole system had better be recreated by the supervisor, which is the `Dispatcher`, although we need some tests around that.
from kanaloa.
I've been poking around a bit at this.
I have a basic idea which I just wanted to put down in writing somewhere, so I figured, where better than here?
So, here's a rundown of things I think we need to address, one way or another:
- Currently, there is potential for Work loss. I.e., if the `Queue` receives the final `RetiringTimeout` and there is still Work in the queue, that Work goes with it. To fix this, we could easily, at that point, send a message back to the original `sender`, similar to `EnqueueRejected(work)`. While the message is slightly off, it might be better to reuse the same public message that is used now to reject work than to introduce a new one.
- The above idea, of having Work which was enqueued subsequently get rejected, made me think that we might be missing something from acknowledgement. Currently, the `sender` can know when Work is enqueued or rejected, but the `sender` never knows when the Work was completed (one way or another). This might be handy to have, since it allows the `sender` (not the `replyTo`) to track the status of Work from start to finish and fully react to it. Definitely a thing up for debate. In either event, we would want messages like these to be optional to send.
- So, the idea behind graceful shutdown is that all work completes, and then all components shut themselves down. The only slight wrinkle I see with this is that in the retiring state, we are still actively scheduling work (if it is already enqueued). The issue is that, because of the `RetireTimeout`, we have no control over or guarantee that work we schedule while in this state will be completed successfully. Suppose that during the retiring state, right at T-1 (before shutdown), a Worker gets new work and sends a message to its Routee, and then the system shuts down. That message is now in limbo (from the point of view of Kanaloa's end user). Did it go through? Did it time out? If this was a transaction of some sort, there is no record of what happened (the `replyTo` and the `sender` both get nothing). The thing that rubs me the wrong way here is that in graceful shutdown, we can actively be increasing our chances of not having a graceful shutdown (by scheduling more work).
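As a rough sketch of the first point, here is what handing leftover Work back on the final timeout could look like. It is a plain-Python model under stated assumptions, not kanaloa code: the `Work` record, the `reason` field, the `on_retiring_timeout` function, and the `send(recipient, message)` callback (standing in for an actor tell) are all hypothetical. Only the `EnqueueRejected` name comes from the discussion.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    payload: str
    sender: str  # hypothetical: whoever enqueued the work


@dataclass(frozen=True)
class EnqueueRejected:
    work: Work
    reason: str  # hypothetical field, not confirmed by the thread


def on_retiring_timeout(work_queue, send):
    """Drain the queue on the final timeout, rejecting each item to its sender.

    `send(recipient, message)` is an assumed callback standing in for an actor tell.
    Returns the number of rejected items.
    """
    rejected = 0
    while work_queue:
        work = work_queue.popleft()
        send(work.sender, EnqueueRejected(work, "queue retired before dispatch"))
        rejected += 1
    return rejected
```

Reusing the rejection message means the sender needs no new handling path: Work that never ran looks the same as Work that was never accepted.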
I think the safest thing to do (as things are currently implemented) is, upon retire, to immediately reject all queued messages (similar to how things would need to be rejected in the above point). This gives the `sender` the ability to confidently handle things that we had no guarantee of executing, and it can react accordingly. To the `sender`, this is just another rejection event, which it already has to handle.
The graceful shutdown would then be a signal for "in flight" operations. I.e., did everything that was in flight succeed, and were all unexecuted queued messages successfully returned to the `sender`?
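To make the proposal concrete, here is a toy model of "reject on retire, then wait only for in-flight work". It is a sketch under assumptions, not the existing or intended kanaloa implementation: the `RetiringQueue` class and its method names are invented for illustration.

```python
from collections import deque


class RetiringQueue:
    """Toy model: on retire, queued work is rejected and only in-flight work is awaited."""

    def __init__(self, queued, in_flight):
        self.queued = deque(queued)
        self.in_flight = set(in_flight)
        self.rejected = []  # work handed back to its sender
        self.retiring = False

    def retire(self):
        # Stop scheduling: everything not yet dispatched goes back to the sender.
        self.retiring = True
        while self.queued:
            self.rejected.append(self.queued.popleft())

    def work_finished(self, work):
        # Called when a worker reports a result (success or failure) for `work`.
        self.in_flight.discard(work)

    def shutdown_was_graceful(self):
        # Graceful here means nothing is left queued and nothing is still in flight.
        return self.retiring and not self.queued and not self.in_flight
```

In this model no new work is scheduled after `retire()`, so the only uncertainty left at the timeout is work that was already in flight.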
I know this is a bit of a departure from what was originally envisioned, but personally, I feel like we need to do our absolute best to make sure Kanaloa never drops work.
One interesting thing to note, and one thing we might want to explore, is that there actually might be diverging behaviors on shutdown between the pulling and pushing models.
One other thing to note is that, based on our earlier chats about this, there might actually be a case for implementing multiple types of shutdown behavior. I think there were many good points brought up about both the current and potential implementations, and they don't need to be mutually exclusive.