Opening this to provide feedback. But it might take a few days to get it all out.

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

OK. It is too simplistic. Gun has a few different <code class="notranslat

Feedback,about maria-12648430/wait_statem

Comments (28)

lhoguin commented on June 20, 2024

Having to call wait_statem:enter_wait/1 can be a problem if you want to set other actions at the same time. At least what it's doing should be documented.

from wait_statem.

lhoguin commented on June 20, 2024

One immediate problem is that this will not work for modules that still want to handle some calls, for example gun:info/1. Perhaps instead of a full gen_statem (or alongside this option) it would be good to have a way to enter wait mode and initialise the wait mode state, and to forward only some events to the wait module (my_statem:wait catchall would call wait_statem:handle_event or something).

from wait_statem.

Maria-12648430 commented on June 20, 2024

I think in general that it is good. Right now I am mostly wondering about whether it is a good idea to validate the options every time we enter "wait" mode. Though I suppose it's not very heavy anyway.

It was assumed that enter_wait would not be called often, at least not within the same process. You wait for whatever you need, once you got it (or not), you can use it for good (or not).

Having to call wait_statem:enter_wait/1 can be a problem if you want to set other actions at the same time. At least what it's doing should be documented.

The set of actions you can use there is limited:

You could use replies, but you can do them via gen_statem:reply/2 beforehand as well.
You could set, update or cancel generic timeouts. It may be feasible to provide for this, but it's not a top priority right now ;) Event and state timeouts are implicitly cancelled by the state change, so those are not usable there.
You could postpone the event that caused the current execution leading up to the enter_wait call, but that is tricky as it will be replayed after the state change inside the new callback module and might mess up things in there. It seems a pointless thing to do I think.
You could change the callback module before entering wait, in effect leading to returning to a different callback module thatn where enter_wait was called from, but this seems to me an usual edge case.

One immediate problem is that this will not work for modules that still want to handle some calls, for example gun:info/1. Perhaps instead of a full gen_statem (or alongside this option) it would be good to have a way to enter wait mode and initialise the wait mode state, and to forward only some events to the wait module (my_statem:wait catchall would call wait_statem:handle_event or something).

Oh, but you can handle calls and other external events by giving a callback (a fun or {Mod, Fun}) in the external_events option, or with the more fine-grained call_events, cast_events and info_events options. This callback will be called with the event ({call, From}, cast, info) and the internal data ("internal" meaning the "old" module's data, wait_statem's is different) as arguments. You can answer via gen_statem:reply or do whatever you want to do. The callback may return ignore or {ignore, NewData} to discard the event, or postpone or {postpone, NewData} to postpone the event and replay after leaving wait.

from wait_statem.

Maria-12648430 commented on June 20, 2024

@lhoguin we're currently exploring another approach btw, a behavior which we dubbed gen_agent for the time being, based on gen_statem itself, but specialized and sort of stripped down. Because actually, the wait_statem layer is a gen_statem in its own right, so we may as well use a real gen_statem here.

It takes care of the timing and retrying, and implementing modules only supply the logic for attempting to get the resource you want. This can in fact even be a multi-step process (like, establish a connection, do some protocol handshake, try a upgrade to SSL, do another handshake, etc). In the end, it just hands the resource over to the client process (or gives up), and that's it then (with sockets, there is also another step of giving it away to the client process, but that should be doable).

At a glance, this offers more flexibility, which is important if this is supposed to make its way into OTP, and has another benefit in that you can employ several agents at once if the client needs several resources to become operational.

Last but not least, it can be used from any kind of client process, not only gen_statems like with wait_statem.

It's more complex to use, probably, though...

@juhlig is currently drafting it up and experimenting, I'll let you know when there is something to see ;)

from wait_statem.

Maria-12648430 commented on June 20, 2024

@lhoguin gen_agent, check it out :) Still drafty (so don't mind the namings etc) and only briefly tested, but you'll get the general idea I guess :)

from wait_statem.

lhoguin commented on June 20, 2024

I have no idea how to use it. A demo would be good.

from wait_statem.

Maria-12648430 commented on June 20, 2024

I admit the README docu is a bit sparse 😅 I'll slap together an example on monday. Too tired right now... 😞

from wait_statem.

Maria-12648430 commented on June 20, 2024

@lhoguin an example is up, not much, but it's something you can play with. For comments on gen_agent, please open a ticket there ;)

from wait_statem.

lhoguin commented on June 20, 2024

OK. It is too simplistic. Gun has a few different gen_statem states just for the connection part (DNS, TCP, TLS...) and Gun users may have even more (proxies / protocol upgrades) so this wouldn't work very well. I don't think we want a solution that cannot be used along with gen_statem for that reason.

from wait_statem.

Maria-12648430 commented on June 20, 2024

OK. It is too simplistic. Gun has a few different gen_statem states just for the connection part (DNS, TCP, TLS...) and Gun users may have even more (proxies / protocol upgrades) so this wouldn't work very well. I don't think we want a solution that cannot be used along with gen_statem for that reason.

@lhoguin are you referring to wait_statem or gen_agent (or both)?

from wait_statem.

lhoguin commented on June 20, 2024

I am referring to gen_agent. I think a special state machine we can switch to/from is the way to go, only it has to be made as flexible as possible to support all cases.

from wait_statem.

Maria-12648430 commented on June 20, 2024

I'm not trying to argue for one or against the other, don't get me wrong, but I really think gen_agent is a better approach. It looks like we are misunderstanding each other and/or talking at cross purposes. I'll explain ;)

This is the difference between the two approaches, ie what each one is:

the wait_statem approach is to turn your gen_statem process into something, like, "fill me the gaps"
the gen_agent approach is to let it fetch something for the client process, like, "get me something to fill the gaps"

Actually, from my point of view anyway, gen_agent is the more flexible solution in many aspects:

Usage:
- wait_statem can only be used with gen_statem processes, as it is a gen_statem callback module
- gen_agent can be used with anything, as agents are independent processes
Entering/starting:
- entering wait_statem via enter_wait (or manually) happens via a state change, which means it can not be done within an enter call, and the enter_wait call has to be last, ie what it returns must be the return from the state function.
- gen_agents can be started from anywhere, including the shell
Nesting:
- wait_statems are not nestable, you cannot enter another wait_statem layer when you are in one already. It may be possible to somehow implement that, but managing the state data (of the base gen_statem, ie (un-)packing/updating it) across several layers looks very complicated
- gen_agents can use other gen_agents from within themselves, and etcetc
Freedom (of sorts):
- while in the wait_statem layer, the things the base gen_statem can do are limited to what the wait_statem passes through to it; it can handle external events to some extend, but it cannot set timeouts, switch states, etc
- while a gen_agent is running, the client process is free to do whatever it wants to do. It can decide to just wait for the result, but it can also do whatever it meanwhile wants/needs to do
Parallelism:
- with wait_statem, it is only possible to wait for one thing (which may consist of several things) at once. For example, if you need two connections, you either have to set up both in the same wait_statem layer, or wait for one to complete, then do the other
- several gen_agents can be used simultaneously, so you can let one fetch the first connection and let another fetch the other connection at the same time

At the same time, gen_agent is a behavior and thereby exhibits a cleaner interface to implement. For wait_statem, you need all sorts of callbacks, too, but you have to pass them at runtime in the configuration (Opts).

Finally, if you need several steps to make a resource operational (like DNS -> TCP -> SSL -> ...), you can wrap them in one run, but I think it would be the better to handle that by means of your gen_statem, like this:

with wait_statem, you start in the dns state, enter wait_statem, and transition to the tcp state when that succeeds (by giving the tcp state as success return state to wait_statem), then enter wait_statem again and transition to the ssl state on success, etc, and from there to the operational state. If OTOH wait_statem fails in ssl for example, you transition back to the tcp state (you don't need DNS again) via the error return state, and try again from there.
with gen_agent, you also start in the dns state, start an agent, wait for it to return, then on success transition to the tcp state (manually, ie by means of your gen_statem), then start another agent for tcp and transition to ssl on success, and from ssl to operational.

With both mechanisms, if you need to backtrack (eg ssl fails, so you go back to tcp), you may have to implement extra measures for retry timing manually. Or not, depends.

With that all said, both implementations are far from perfect yet. Right now we are thinking of adding "phases" or "stages" to gen_agent, a bit like states, which would make multi-step setups easier. There is still a long way to go either way.

from wait_statem.

lhoguin commented on June 20, 2024

OK I probably misunderstand how to use gen_agent still. But from the sound of it, I'd have to have a gen_statem (Gun connection) AND gen_agent, meaning two processes, and that's simply unacceptable. I also do not think that this provides much value compared to just implementing it manually in my process. The backoff function would be useful but that's about it.

The wait_statem on the other hand, provided that it is flexible enough to cover just about anything, would be ideal: I would switch to a state specifically for waiting; switch to the wait_statem; let wait_statem take care of the backoff/whatever; meanwhile let wait_statem call my state function (with my state data) so that I can deal with calls and messages; and then when wait_statem is done switch back to the appropriate state. I also need wait_statem to let me return actions so that I can stop the process if necessary for example.

I know it doesn't work with things other than gen_statem but that's probably OK? A gen_server is easily converted to a gen_statem anyway. Non-gen code is not recommended.

Perhaps gen_agent can have uses as well but I think it solves a different problem than backoff after connection failures.

from wait_statem.

Maria-12648430 commented on June 20, 2024

Again, not arguing, just trying to clear things up some more ;)

You seem to have your heart set on wait_statem, so we will keep working on it. I just don't think we can reasonably expect to sell it to OTP, gen_agent looks more promising for that matter.

But from the sound of it, I'd have to have a gen_statem (Gun connection) AND gen_agent, meaning two processes,

Yes, read more below.

and that's simply unacceptable.

An Erlanger shying away from yet another process, that's just short of an atheist pope 😁 What has the world come to, I wonder 😅 j/k, but seriously, what's the problem here?

I also do not think that this provides much value compared to just implementing it manually in my process. The backoff function would be useful but that's about it.

The wait_statem on the other hand, provided that it is flexible enough to cover just about anything,

Yeah ^^; At some point, it may not be worth the hassle and you could just use gen_statem itself, ie the common ground the abstraction provides becomes so small that it's just not worth it any more. It's definitely something to keep in mind.

would be ideal: I would switch to a state specifically for waiting; switch to the wait_statem

... which you can't do from a state enter call, as it involves a state change. That complicates matters I think.

let wait_statem take care of the backoff/whatever; meanwhile let wait_statem call my state function (with my state data) so that I can deal with calls and messages; and then when wait_statem is done switch back to the appropriate state.

Hm, something similar can be done with gen_agent, but you would have to go about it differently... Enter the wait state, start_link a process, which in turn start_links and runs an agent and sends the agent result back to the gun connection process. While it runs, the gun connection can do whatever it likes, reply to stuff, switch states, whatever... Once it receives the agent result from the process, it can use that to update its state data and become operational.

The difference is that it won't turn your virgin gun connection into an operational gun connection, it delivers the things you need so that a virgin gun connection can turn itself into an operational one.

I also need wait_statem to let me return actions so that I can stop the process if necessary for example.

And here it starts getting complicated... You can't return just anything you could from a state callback, at a quick glance the same restrictions as for what a state enter call could return would apply. That means checking the return values, and adapting the checks in case gen_statem introduces new things.

Perhaps gen_agent can have uses as well but I think it solves a different problem than backoff after connection failures.

You think so? I would say it solves the same problem (in a nutshell: try (and retry) to get a resource), only with a different approach, and different requirements for how to use them. wait_statem is a plug-in state, gen_agent is a dedicated process.

from wait_statem.

Maria-12648430 commented on June 20, 2024

(I probably know too little about gun to grasp all the implications, btw 😅 Don't hate me plz)

from wait_statem.

lhoguin commented on June 20, 2024

When you have 1 million connections, and you make those connections use two processes instead of one, you end up with a substantial amount of wasted memory. I think gen_agent is temporary so it wouldn't be 1 million of them at any one time under normal conditions, but in the worst case scenario it would be and the memory jump would be significant. And under normal conditions, I would still expect a fair amount of clients retrying their connection at any one time.

The state enter call is not important. You can easily switch to a state and trigger an internal event and do the switch there. You do not have to do it in the state enter call. Gun uses the switch state and trigger event multiple times when setting up the connection already.

The difference is that it won't turn your virgin gun connection into an operational gun connection, it delivers the things you need so that a virgin gun connection can turn itself into an operational one.

I think that's the difference in the problem being solved.

wait_statem lets me wait before doing whatever I need to be done to get in a good state (not only resource fetching!)

gen_agent does resource fetching.

The gen_agent example gets me a TCP socket. But that's not what I want. Gun needs to do a DNS check then a TCP connect then a TLS upgrade. If any of these steps fail, or if the connection gets closed after it has been established, I want to wait, and then go back to the DNS check and repeat the steps. If the TLS upgrade fails, I cannot just retry it, I need to at least go back to a TCP connection. The steps are separate in order to gather metrics and identify which part of the connection is slow if necessary.

So the problem they solve, while similar, is not the same. I cannot just use gen_agent to get back to a good state. I could use it to tell me it's OK to retry again, but I might as well just use a timer for that.

It's OK to not allow everything to be returned. I just need to do replies, perhaps timeouts and such. I don't think you need to adapt to new gen_statem return values, you can just disallow what would break wait_statem. And if a new value would break it then yes it needs to be disabled, but that's probably going to be rare. Or you could just not check return values and rely on the developer if that's a bother. I don't think it's a bad thing if the callback manually "pops" wait_statem to be honest, if there's a good reason.

PS: Haet

from wait_statem.

Maria-12648430 commented on June 20, 2024

When you have 1 million connections, and you make those connections use two processes instead of one, you end up with a substantial amount of wasted memory. I think gen_agent is temporary so it wouldn't be 1 million of them at any one time under normal conditions, but in the worst case scenario it would be and the memory jump would be significant. And under normal conditions, I would still expect a fair amount of clients retrying their connection at any one time.

Ok, was just interested ;)

It's OK to not allow everything to be returned. I just need to do replies, perhaps timeouts and such. I don't think you need to adapt to new gen_statem return values, you can just disallow what would break wait_statem. And if a new value would break it then yes it needs to be disabled, but that's probably going to be rare. Or you could just not check return values and rely on the developer if that's a bother. I don't think it's a bad thing if the callback manually "pops" wait_statem to be honest, if there's a good reason.

Ok, I refactored it. Everything non-wait_statem is now passed through to the base module. You can return keep_state_and_data, keep_state, next_state if you use the same state there, stop, and stop_and_reply, as well as the tuple variants, and some actions. For timeouts, you can use generic ones, but not event and state timeouts.

PS: Haet

One of those famous typos I guess? =^^=

from wait_statem.

lhoguin commented on June 20, 2024

I will have to try if that's enough to use it with Gun before making further comments I think.

from wait_statem.

Maria-12648430 commented on June 20, 2024

I will have to try if that's enough to use it with Gun before making further comments I think.

Oh, by all means, give it your worst 😉

Anyway, new ideas coming in (in part courtesy of @juhlig, thanks).

We could use state timeouts if we were staying in the same state, managing pseudo-states via the state data. That would simplify postponing events, too, no reason to do that manually any more.
Event timeouts can probably be emulated with generic timeouts.
next_event could be interpreted as "leave the wait state (early)", ie pop the wait_statem module.

I'll explore tomorrow 😃

from wait_statem.

lhoguin commented on June 20, 2024

Not sure I understand, but you can't know which events must be postponed and which events must be handled by the user's gen_statem.

from wait_statem.

Maria-12648430 commented on June 20, 2024

Everything that is not a wait_statem-specific event (distinguished by tagging it with a unique reference) is passed to the base gen_statem. If that decides to postpone the event, it is postponed until the state changes, either by returning next_state with a different state than the wait state or by the resource-fetching callback returning ok or erroror a respective tuple, after which the wait_statem module will be popped and the gen_statem transitions to the success or error state.

However, don't know what you think, but if I look at it now, the whole thing has turned into just a very complicated, cumbersome, restriction-laden way to use a timer :P

from wait_statem.

lhoguin commented on June 20, 2024

by the resource-fetching callback returning ok or erroror a respective tuple, after which the wait_statem module will be popped and the gen_statem transitions to the success or error state.

I think I misunderstood parts of wait_statem. I do not think it should have a resource-fetching callback. I think its job should strictly be to wait and optionally to forward events that occur while waiting to the user module (and in that case, handle the commands returned). It should support many different wait/backoff/etc strategies in a standardised way so that all libraries that use it can use the same configuration format. When it has finished waiting, or when the user module decides that's enough, it gives back control to the user module. That's it. The user module can THEN do its resource-fetching or whatnot if necessary.

Configuration can be like:

#{
    wait_strategy => fixed_time() | backoff() | ...
    events_strategy => postpone() | ignore() | error_out_the_calls() | call_user_module(module(), function(), State)
}

I do not think there is much more to be done:

How long should wait_statem wait before giving back control to the user module by popping itself out?
What should be done about incoming events?

wait_statem should not DO anything itself, not even invoke a callback to "retry". And it should only wait ONCE.

I know this feels backward with regard to push/pop and how wait_statem has to keep track of attempts. To that end I propose another value to be used in the configuration when it matters:

#{
    retry_attempt => non_neg_integer()
}

This value is likely to be already defined by the user if they want a hard limit on the number of retries (Gun defaults to 5). wait_statem doesn't need to keep state of its own, just configuration.

Another value that is likely to be useful is the user module's own state, so maybe have a #{state => State} key and put the state back to normal once wait_statem is done. Of course that state needs to be updated when the user module is called, but it's really the only thing wait_statem needs to keep track of, the rest of its own state is static.

Forget about the resource fetching, and focus on just waiting. This is the missing functionality that we reimplement just about everywhere. If I could switch to a not_connected state and push wait_statem and let it do its thing in any gen_statem I write, and if the configuration for that behavior is the same across all user libraries because they all use wait_statem, then the problem is solved.

from wait_statem.

Maria-12648430 commented on June 20, 2024

Ok, so in a nutshell, you want something that waits for the sake of waiting, not for something to happen, right? :)

from wait_statem.

lhoguin commented on June 20, 2024

Whether something happens afterwards is of no concern to wait_statem yes.

from wait_statem.

Maria-12648430 commented on June 20, 2024

I see... hm, ok then, but in that case I don't see much value in it, I have to admit. Or maybe I'm getting things wrong again, so let's clear things up more before I start on a wild goose chase again ;)

What you would do now is roughly this:

When entering the not_connected state, you calculate an appropriate timeout for waiting (maybe fixed, maybe depending on the attempt number, maybe with some random jitter, or a combination of those) and start a state timeout with that value.
You have a state function or handle_event clause that transitions to the connecting state when it is called. Alternatively, it may send an internal event or something.
If you want to ignore all events when in that state, you have a catch-all clause returning keep_state_and_data, if you want to postpone all events you have the same plus the postpone action, if you want to handle events you have appropriate clauses.
In any case, you have to track the attempt number and keep the settings for the timeout strategy (fixed time, backoff growth, jitter range etc) in the state data (depending on what you need).

With wait_statem...

When entering the not_connected state, you push the module and pass over the attempt number and timeout strategy configuration. But you can't push the module from a state enter call, you need to get into a "real" (event) state call one way or other first. And you need to tell it what to do when the waiting time has passed and wait_statem is popped, like transition to a different state or send an internal event, to let the user statem know that it should continue.
If you want to ignore or postpone events, instead of having function clauses, you tell wait_statem to ignore or postpone. That's shorter than writing function clauses, but not much I'd say. If you want to handle events, you have to write the appropriate function clauses, too.
You have to keep track of the attempt number and strategy settings in the user statem too, to pass them on when pushing wait_statem.

With all this, it looks to me that there is not much room for making things simpler here, that the things you could simplify are special cases (like ignoring/postponing), and that on the other hand you have to do other things in a more complicated way (entering wait_statem in the first place, for example)...

The only useful thing seems to be the waiting time calculation, and there is only one really useful function for that, a+b*(n^c)+d, where a is a constant (attempt-independent) time, b is an attempt-dependent time growing with respect to the attempt number n and the backoff exponent c, and d a random (jitter) time offset. Everything else is just special cases of this one (like, b=0 and d=0 is fixed_time etc).
Another possibly useful timeout function is a logistic function, but it's probably very rare in comparison...

from wait_statem.

Maria-12648430 commented on June 20, 2024

And you need to tell it what to do when the waiting time has passed and wait_statem is popped, like transition to a different state or send an internal event, to let the user statem know that it should continue.

Ok, that could actually be done by re-inserting the state timeout via next_event when popping wait_statem...

from wait_statem.

Maria-12648430 commented on June 20, 2024

And you need to tell it what to do when the waiting time has passed and wait_statem is popped, like transition to a different state or send an internal event, to let the user statem know that it should continue.

Ok, that could actually be done by re-inserting the state timeout via next_event when popping wait_statem...

... in which case you need a function clause for that in the users statem, too. If you set up wait_statem to pass through events, there really is no difference to using a plain gen_statem any more, or is there?

from wait_statem.

lhoguin commented on June 20, 2024

I think it's about as useful as the timer module, yes. But if done right it can be very nice to have.

I suppose there could be a way to also manage the number of retries as well, if the wait_statem gives back its state when it gives back control. But there will need to be a way to reset the number of retries. In Gun retries are reset after a connection was successfully established (including TLS and such).

I don't know if you need the extra event when switching to not_connected, perhaps you can switch the state AND the callback module all in one? So the not_connected state function in the user module would only ever be called as a callback if necessary (or perhaps we can have a callback with a fixed name to avoid configuring it, and if the callback doesn't exist then you can default to ignore perhaps). And in that case wait_statem could switch the state and pop the callback module all in one go as well.

In that case not_connected is just so that when we do sys:state we can see the state name, it would matter little otherwise.

from wait_statem.

Feedback about wait_statem HOT 28 OPEN

Comments (28)

Related Issues (1)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent