serverlessworkflow / specification

Serverless Workflow Specification

Home Page: http://serverlessworkflow.io

License: Apache License 2.0

Languages: Gherkin 73.24%, TypeScript 25.32%, Makefile 1.00%, JavaScript 0.44%
Topics: cncf, serverless, specification, workflow

specification's People

Contributors

antmendoza, bbalakriz, cdavernas, elanhasson, fjtirado, gibson042, hbelmiro, jbbianchi, jorgenj, josephblt, lcaparelli, lsytj0413, lynxnathan, manick02, manuelstein, marianmacik, maslino, matthias-pichler-warrify, mikoskinen, mswiderski, pmorie, radtriste, ricardozanini, tomasdavidorg, transientvariable, tsurdilo, wenfengwang, whuanle, yzhao244, zdrapela


specification's Issues

spec stateDataFilter dataInputPath and dataOutputPath question

What is the question:
Example: selecting items where veggieLike is true.

state config:

{
    "name": "VegetablesOnlyState",
    "type": "inject",
    "stateDataFilter": {
        "dataInputPath": "{{ $.vegetables }}",
        "dataOutputPath": "{{ $.[?(@.veggieLike)] }}"
    },
    "transition": {
        "nextState": "someNextState"
    }
}

If data input 1 is:

{
    "fruits": [ "apple", "orange", "pear" ],
    "vegetables": [
        {
            "veggieName": "potato",
            "veggieLike": true
        },
        {
            "veggieName": "broccoli",
            "veggieLike": false
        }
    ]
}

then the state output is an object:

{
    "veggieName": "potato",
    "veggieLike": true
}

but if data input 2 is:

{
    "fruits": [ "apple", "orange", "pear" ],
    "vegetables": [
        {
            "veggieName": "potato",
            "veggieLike": true
        },
        {
            "veggieName": "tomato",
            "veggieLike": true
        },
        {
            "veggieName": "broccoli",
            "veggieLike": false
        }
    ]
}

is the output expected to be:

[
    {
        "veggieName": "potato",
        "veggieLike": true
    },
    {
        "veggieName": "tomato",
        "veggieLike": true
    }
]

If that is the output, there is a problem: for input 1 the output is an object, while for input 2 the output is an array, so the two outputs have different schemas.
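
For what it's worth, many JSONPath implementations return a list of matches for a filter expression even when only one element matches; under that (assumed) semantics, input 1 would also produce an array, keeping the schema consistent:

[
    {
        "veggieName": "potato",
        "veggieLike": true
    }
]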

Is cron-like scheduling possible?

What is the question:

Continuing the issue from previous repo cncf/wg-serverless#245

Original Question
I've been reviewing the spec and it seems like you can define the state/task start kind to be scheduled, but this only takes an interval as a property. Is it possible to specify a cron-like syntax?

Example
From Wikipedia https://en.wikipedia.org/wiki/Cron

This example runs a shell program called export_dump.sh at 23:45 (11:45 PM) every Saturday.

45 23 * * 6 /home/oracle/scripts/export_dump.sh

@tsurdilo Added an expected cron example above that I'm trying to figure out if it's supported.
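
For illustration, if cron support were added, the scheduled start might look something like this (purely a hypothetical sketch; the current spec only defines an interval property):

{
    "start": {
        "kind": "scheduled",
        "schedule": {
            "cron": "45 23 * * 6"
        }
    }
}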

Function invocations (gRPC)

What would you like to be added:
Functions can currently be called using OpenAPI specs. The idea is to also allow the use of gRPC APIs (and maybe others, such as AsyncAPI, plain HTTP calls, etc).

Why is this needed:
Support for these could be optional in the spec (with OpenAPI remaining mandatory), but when specified, it would at least clarify how function invocations performed by the workflow bind to service protocols other than OpenAPI.
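
As a rough illustration, a gRPC-backed function definition might look like this (a hypothetical sketch; the type value and operation format are invented, not part of the spec):

{
    "functions": [
        {
            "name": "checkInventory",
            "type": "grpc",
            "operation": "file://protos/inventory.proto#InventoryService#CheckItem"
        }
    ]
}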

ForEach State / Parallel State: Branch use of workflowId question

What is the question:
In the ForEach state and in Parallel state branches, a sub-workflow id can be used. How do you specify whether to wait for the sub-workflow to complete? Why does the SubFlow state have waitForCompletion, but the ForEach state and Parallel state branches do not? (See the sketch below.)
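
To make the gap concrete, this is roughly the option being asked about (a hypothetical sketch; a waitForCompletion property on branches does not exist in the current spec):

{
    "branches": [
        {
            "name": "ProcessOrderBranch",
            "workflowId": "processOrderWorkflow",
            "waitForCompletion": true
        }
    ]
}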

Ongoing - workflow model optimization

What would you like to be added:
Issue to discuss workflow optimization ideas.
We should look into things like:

  1. reducing the model complexity
  2. reducing the model verbosity while still keeping it as descriptive as possible
  3. looking into alternative approaches for certain model definitions
  4. any other ideas/thoughts

Why is this needed:

Prepare for release

We are ready to do another release. Please add here any issues/concerns.

The release needs to happen for the SDKs as well, and should also be reflected on the website.

Action & EventsRef

What would you like to be added:

The eventsRef option in the Action definition is confusing: it really seems to be an RPC call where the transport protocol is CloudEvents (the remote procedure is invoked via an event and the response is received via another event). This really seems like another way to invoke a 'function', and thus, in my humble opinion, should be redefined as a type of function rather than a separate type of action.

Why is this needed:

As the Action is defined now, it's really unclear at first read what these 'events' in the eventsRef are for. It took me several reads through the entire spec to understand what this is doing. It would be much easier to understand if this were redefined so that this is just another type of 'function' rather than a different type of action.

Proposal:

  • Remove the 'eventsRef' attribute from 'action' definition
  • Add support for a new type of 'function', re-purposing the description for eventsRef.

Same behavior, but as a 'function', because what it seems to be describing is a way to 'invoke' something via cloud-event.
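
For example, the proposal might look roughly like this (a hypothetical sketch; the type value and event-reference property names are invented):

{
    "functions": [
        {
            "name": "makeCreditCheck",
            "type": "event",
            "triggerEventRef": "CreditCheckRequestedEvent",
            "resultEventRef": "CreditCheckCompletedEvent"
        }
    ]
}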

Question about "State data filtering - Using multiple filters"

What is the question:
In the spec: https://github.com/serverlessworkflow/specification/blob/master/specification.md#-state-data-filtering---using-multiple-filters

Question 1:
After "dataInputPath": "{{ $.hello }}" is evaluated and filters the input, the current state data should be:

{
    "english": "Hello",
    "spanish": "Hola",
    "german": "Hallo",
    "russian": "Здравствуйте"
}

instead of

{
    "hello": {
        "english": "Hello",
        "spanish": "Hola",
        "german": "Hallo",
        "russian": "Здравствуйте"
    }
}

because the JSONPath expression $.hello selects the value of the "hello" field, and that value does not include the "hello" wrapper itself.

Question 2:
In the example, "greetingFunction" returns:

{
    "finalCustomerGreeting": "Hola John Michaels!"
}

and the action defines "dataResultsPath": "{{ $.finalCustomerGreeting }}".

After "dataResultsPath" is evaluated and filters the result, why is the action result merged into the state data under a "finalCustomerGreeting" field? For the JSONPath expression $.finalCustomerGreeting, the result is the string value only and does not contain the "finalCustomerGreeting" key itself. So shouldn't just "Hola John Michaels!" be merged into the state data? If not, what exactly does the action "dataResultsPath" mean?
In the specification, the description of "dataResultsPath" is "JsonPath expression that filters the actions data result, to be added to or merged with the states data" (https://github.com/serverlessworkflow/specification/blob/master/specification.md#-state-data-filtering---action-data-filter).
Does this description only cover how to select and filter the action result as a JSONPath expression? It does not say under what field name the filtered result is put into the state data.

Workflow invocation bindings

What would you like to be added:
A workflow can currently be designed to be triggered by events (using event-type state as starting state) and it can be scheduled (using scheduled start state). The "default" starting kind leaves it to the runtime with no API spec of the workflow invocation (e.g. the data expected). The discussion around the scope (public/private) has led to the idea of allowing different bindings of the workflow.

Why is this needed:
Workflow invocation bindings, in general (event, scheduled, OpenAPI, gRPC), are crucial to the way the workflow interacts with its environment. The entry point to the workflow (starting state) should be reusable (regardless of whether the input data comes from a CloudEvent, an API call, or some other structured message). But at the same time, the semantics of a state may depend on the input method; e.g. a trigger by a certain CloudEvent type may use a different starting state than the entry point of an API call.

More info:
The binding of a workflow to an API is currently left to the runtime. It should be possible to bind a workflow to one method or another, define access control (which requires AuthZ and AuthN), etc. There's a trade-off between making these part of the workflow or separating them out. AuthN and AuthZ, for example, are something I'd expect not to be part of the workflow definition.
Serverless Workflows may separate out the spec of bindings, which would separate the data communication method (event, API call) from the workflow definition. We could also define invocations as part of the workflow (where starting states are defined).

spec End Definition kind

What is the question:
default - Default workflow execution completion; no other special behavior.
terminate - Completes all execution flows in the given workflow instance. All activities/actions being executed are completed. If a terminate end is reached inside a ForEach, Parallel, or SubFlow state, the entire workflow instance is terminated.

If kind is set to "default", workflow execution completes with no other special behavior. Does that mean the workflow instance is complete, or if there are other states still running, will they continue to run and transition to their next states? Can you give an example of the difference between default and terminate?

Error types with OpenAPI

What would you like to be added:
The possibility to define errors similarly to function definitions. Functions can be defined referencing an OpenAPI operation ID from the service specification. That specification also contains responses, named either by a specific HTTP response code (e.g. 401) or by a wildcard ("default") covering the codes not explicitly specified. HTTP response codes >= 400 are considered errors and should be treated as such. Regardless of the HTTP code (which may be considered protocol-specific), the OpenAPI spec also identifies application-domain response objects.

At the application level, a named object in an error response should be usable during error handling.
For the particular case of OpenAPI, it would also be nice to support handling of unspecified HTTP response errors (by their code).

Why is this needed:

When function calls fail at the application level (e.g. item not in inventory), they are identified by a mandatory (human-readable) error string. A developer using an OpenAPI spec of a service would not know where this error string comes from:

  • The response may be a 404 NotFound, which has 404 as its code and carries an additional reason phrase intended for the human user (e.g. "Not Found"), but that may vary (e.g. with well-known sub-phrases such as "Site Not Found")
  • The OpenAPI response carries a description field which, as far as I can see, is not used as the reason phrase, as it also allows markdown formatting
  • The content (body) may be binary, or an object in various encodings

Even if the OpenAPI spec encodes a meaningful object or identifier in the error response, the developer can only guess or learn by trial-and-error if the error string used in the workflow works with the runtime and matches the expected error.

OpenAPI operation error responses can clearly be identified by the response status code, but the spec doesn't allow using the error code without an error string. Furthermore, for the same response code (e.g. 404 Not Found), there may be multiple response contents; e.g. a oneOf clause in the OpenAPI spec may allow different objects to be returned as part of the error response. Nonetheless, these objects encode the error that is specific to the application domain, e.g. "Item not in inventory".
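
A sketch of what this could enable in error handling (hypothetical; the current spec matches errors by string only):

{
    "onErrors": [
        {
            "error": "404",
            "transition": {
                "nextState": "HandleItemNotInInventory"
            }
        }
    ]
}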

SubFlow state; child/parent interactions

What is the question:

Given the following 'parent' and 'child' workflow definitions:
Parent workflow

{
    "id": "ParentWorkflow",
    ...
    "events": [{
        "name": "ParentEvent",
        ...
     }],
    "functions": [{
        "name": "ParentFunction",
        ...
    }],
    "states":[
        {
            "name": "WaitForChild",
            "type": "SubFlow",
            "waitForCompletion": true,
            "workflowId": "ChildWorkflow",
            "start": { "kind": "default" },
            "end": { "kind": "default" }
        }
    ]
}

Child workflow

{
    "id": "ChildWorkflow",
    ...
    "events": [{
        "name": "ChildEvent",
        ...
     }],
    "functions": [{
        "name": "ChildFunction",
        ...
    }],
    "states":[
        ...
    ]
}

Inheritance

The SubFlow state section of the spec states that:

Sub-workflows inherit all the function and event definitions of their parent workflow.

What does this mean? Given that the intent of sub-workflows appears to be reusability amongst potentially many parent workflow definitions, this seems to imply that the function/event definitions of the parent workflow are injected at run-time into the 'execution context' of the child workflow instance when it's invoked. Is this correct?

Given the example above, can the ChildWorkflow have an Operation state which references a ParentFunction, even though one such function is not actually defined in the child workflow definition?

Does this mean that a child workflow can reference functions/events in its state definitions which are not actually defined in its workflow definition? I have a concern that this means that a particular workflow definition cannot be fully validated until the very moment when an instance of that workflow is created (since it might reference events/functions that are only provided by a 'parent' workflow).

Parent state data

The SubFlow state section of the spec states that:

If the waitForCompletion property is set to true, sub-workflows have the ability to edit the parent's workflow data. If the waitForCompletion property is set to false, data access to the parent workflow should not be allowed.

I'm confused about what "edit the parent's workflow data" means. Is it actually allowing the child to alter the data of the parent workflow in some way, or is it just intending something more like the following?

  • waitForCompletion=false, parent doesn't wait for child to finish, so child output data won't be available to potentially merge into parent state data
  • waitForCompletion=true, parent waits for child to finish, so child output data can be merged into parent state data (via outputDataFilter defined by the parent)

How to select only some fields from the input JSON using a state data filter?

What is the question:
inputJson:

{
    "inputkey1": "value1",
    "inputkey2": "value2",
    "inputkey3": "value3",
    "inputkey4": "value4"
}

If I want outputJson to be:

{
    "outputkey1": "value1",
    "outputkey2": "value2"
}

When two fields (inputkey1 and inputkey2) need to be taken from the dataInputPath JSON data to construct an output JSON, how should dataOutputPath be set?
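
For the selection part, a JSONPath union selector may help (a sketch, assuming the runtime's JSONPath implementation supports selecting multiple members; note that JSONPath can select fields but cannot rename them, e.g. inputkey1 to outputkey1):

{
    "stateDataFilter": {
        "dataOutputPath": "{{ $['inputkey1','inputkey2'] }}"
    }
}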

Provide .NET SDK

What would you like to be added:
Provide .NET SDK

Why is this needed:
To support customers building .NET applications.

Uniqueness constraint for workflows

What would you like to be added:

The ability to define some 'uniqueness' attributes of a workflow, thereby allowing the runtime to enforce a 'single logical instance' of a workflow at any given time.

Why is this needed:

Note: I think the Event state + correlation rules actually already support this in a way, but as far as I can tell it only works for workflows that are invoked via a consumed event.

Some common cases that I don't think are currently supported (please correct me if I'm missing something):

  • For scheduled (cron-style) workflows, the ability to skip launching a new workflow instance if a previous workflow instance is still running.
    • For example, an hourly 'garbage collection' workflow that deletes stale resources. If the previous one is still running, users may not want to have a second workflow instance running at the same time.
  • For directly invoked workflows, the ability to skip launching a new workflow instance if a similar workflow instance is already running
    • For example, a 'recovery workflow' which terminates and relaunches a cloud resource, 'similar' in this case would mean the same type of workflow targeting the same cloud resource.
  • Similar for sub-workflows, the ability to skip launching a new workflow instance if a similar workflow instance is already running

My understanding (which could be very incorrect) of the event correlation rules is that I could accomplish all of these use-cases but only if my workflows are invoked via events:

  • If I want a single garbage-collection workflow, never more than one, then I can set a correlation rule with a hard-coded contextAttributeValue=SomeValue in the event definition.
  • If I want to ensure there's only one workflow for a given resource at one time, then I can instead have a correlation rule with contextAttributeName=SomeAttributeContainingTheResourceId (a sketch of such a correlation rule appears at the end of this issue)

Unfortunately, this doesn't work very well for scheduled (cron-style) workflows because it would require the arrival of an 'event' at roughly the same timing as the cron schedule (per the spec docs). It also seems to preclude the use of the SubFlow state as that doesn't seem to invoke the child workflow via event, if I understand things correctly.

I don't have a solution to propose here, but I wanted to initiate a conversation about how we might extend the existing model so that users can accomplish this for:

  • directly invoked workflows
  • sub-workflows
  • scheduled (cron-style) workflows
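
To make the correlation-rule idea concrete, an event definition along these lines could be used (a sketch based on the spec's event correlation definition; the names and values are invented):

{
    "name": "RecoveryRequestedEvent",
    "type": "com.example.recovery.requested",
    "source": "monitoring",
    "correlation": [
        {
            "contextAttributeName": "resourceid"
        }
    ]
}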

Add Workflow Tracing extension

Describe the extension you would like to be added:
Tracing Extension
What is the main purpose of the extension:
Allow users to define a tracing extension, which can help optimize workflows with tools such as Jaeger and OpenTelemetry.

Retries; exponential backoff & max backoff duration

What would you like to be added:

Support for exponential backoff and to be able to cap the maximum amount of time between retry attempts.

Why is this needed:

Exponential backoff

A common strategy for retrying on failure conditions is to backoff exponentially, to support progressively larger backoff durations if the failure is potentially due to something like a network partition or service outage.

The current spec describes 'multiplier' as a duration, where the subsequent backoff between retries is incremented by the duration value of 'multiplier'. Would it be possible to add support for float values as well, in which case the backoff between retries would be calculated by multiplying the previous backoff duration by the float value (with the interval attribute providing the initial value when there have been no previous retry attempts)? Or perhaps the current field should be renamed to something like 'increment', and a new 'multiplier' field expecting a float value could be added?

Max backoff duration

I think there are many use-cases for this, but one such use-case:

For cloud orchestration workflows, a common use-case is to request a resource in one step of a workflow and then to 'poll' for completion of the resource in a subsequent step (e.g. launch a compute instance and then wait for status=AVAILABLE). Workflow implementers want to be 'kind' to the service being called and thus back off, but they also want to keep the workflow timely/responsive by not waiting too long between checks.

Would it be possible to add a maxDuration attribute, which sets an upper bound on the backoff duration between attempts of a state?
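
Putting both requests together, a retry definition might look like this (a hypothetical sketch; the float-valued multiplier and the maxDuration attribute are the proposed additions, not part of the current spec):

{
    "name": "PollingRetryStrategy",
    "interval": "PT10S",
    "multiplier": 2.0,
    "maxDuration": "PT5M",
    "maxAttempts": 20
}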

"isAdult to true" is not JsonPath syntax

What is the question:

{
    "isAdult": "{{ $.[?(@.age > 18)] }}"
}

According to the spec, this would set the value of isAdult to true, as the expression returns a non-empty subset of the JSON data.

"isAdult to true, as the expression returns a non-empty subset of the JSON data"
The behavior quoted here from the specification (master branch), coercing a non-empty result set into the boolean true, is not something JSONPath syntax itself supports. Who is expected to implement it?

The 'kind' parameter for event seems unnecessary

What would you like to be added:

The kind parameter of the Event Definition can be removed.

Why is this needed:

In the specification these are all of the places where an event might be referenced:

In each case, whether the referenced event will be produced or consumed is apparent from context. What purpose does kind serve? It could be used as a sort of type safety, but I think its real-life impact is to introduce unnecessary complexity and verbosity, and to cause trivial errors.
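
For reference, an event definition carrying the kind attribute in question looks roughly like this (a sketch based on the spec's event definition):

{
    "name": "OrderCreatedEvent",
    "type": "com.example.order.created",
    "source": "orders",
    "kind": "consumed"
}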

Allow functions to reference "trigger" and "response" events rather than defining string resource

What would you like to be added:

There are cases where functions are triggered by events, not by a string-type resource (URI/ARN/etc). For these cases,
allow functions to reference "trigger" and "response" events (produced/consumed events).

Add a detailed description of how event-triggered functions should behave within different states, such as operation, event, and callback states.

Why is this needed:

Defining event types as consumed/produced

This is carried over from issue cncf/wg-serverless#186 in old repository created by @manuelstein.

What would you like to be added:

Add ability for event definitions to define a produced/consumed type

Why is this needed:
A workflow description has an events property, a set of CloudEvent definitions that can be "consumed or produced". Each event definition has a mandatory source and type in accordance with CloudEvents.

Issue: It is difficult to identify which events are produced and which are consumed. An engine that acts as the producer would need to ensure that all created events' source+ID pairs are unique, e.g. by putting itself as the source. An engine may want to register as a consumer of consumed event types only.
Should the spec separate produced/consumed event types?

Also, for the types produced, what would be used as source?
The developer can statically set an absolute URI and would somehow need to ensure all produced events' IDs are unique (UUIDs are typically reliable). Or would the workflow engine want to put its own absolute URI to identify itself as the source of an event?

Check Images and Links in specification documents

Why is this needed:
We recently moved to this repository and did a little restructuring of documents and their locations.

It would be great if we could check all the images and links in the spec documents to make sure they are still valid and work.

Thanks!

Add a parameter classifier to the Function definition scheme

What would you like to be added:
A way to classify the parameter in the function definition (optional) so implementations could have a hint of how to handle the parameter when executing a given function.

For example, given the following function definition used by an operation state:

 "functions": [
        {
            "name": "userAuthServiceCall",
            "resource": "http://authservice.com",
            "type": "rest",
            "parameters": [
                {"name": "user", "classifier": "querystring"},
                {"name": "token", "classifier": "header"}
            ],
            "metadata": {
                "method": "GET"
            }
        }
]
(...)

Why is this needed:
With the classifier, implementations would know how to handle these parameters in a REST call scenario, for instance.
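
For instance, given the definition above, a runtime could construct a request along these lines (an illustrative sketch; the parameter values are invented):

GET http://authservice.com/?user=john HTTP/1.1
token: eyJhbGciOiJIUzI1NiJ9...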

StateDataFilter dataInputPath question

What is the question:
In the spec 0.5 version, there are two examples.

In the first example:

dataInputPath is {{ $.hello }}.

The event state stateDataFilter is invoked to filter the data input. Its "dataInputPath" is evaluated and filters only the "hello" greetings in different languages. At this point our event state data contains:

{
    "hello": {
        "english": "Hello",
        "spanish": "Hola",
        "german": "Hallo",
        "russian": "Здравствуйте"
    }
}

Please note that the wrapping JSON field "hello" is kept.

In the second example:

dataInputPath is {{ $.fruits }}, and please note that here the resulting state data does not contain a wrapping "fruits" field.

These two examples make me wonder why stateDataFilter's dataInputPath sometimes keeps the wrapping field when putting data into the state data. The same question applies to stateDataFilter's dataOutputPath.

What is state data, and how is data merged with it?

What is the question:
"Merged with the states data" appears many times in the specification (the action data filter's dataResultPath, the error data filter), but the spec does not explain what state data is or how the merge works. Can you add some explanations and examples?
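
For illustration, here is the kind of merge that seems to be implied, reusing the greeting example from the spec (a sketch; the exact semantics are precisely what this issue asks the spec to define):

State data before the action:

{
    "customer": { "name": "John Michaels" }
}

Filtered action result:

{
    "finalCustomerGreeting": "Hola John Michaels!"
}

State data after the merge:

{
    "customer": { "name": "John Michaels" },
    "finalCustomerGreeting": "Hola John Michaels!"
}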

Retries; maxAttempts and interval attributes

What is the question:

If I'm understanding correctly, maxAttempts and the interval attribute can both indicate the maximum number of attempts. Are these fields really both allowing the user to specify the number of attempts, or am I misunderstanding something about these fields?

From the spec

interval: Interval value for retry (ISO 8601 repeatable format). For example: "R5/PT15M" (starting from now, repeat 5 times with 15-minute intervals)
maxAttempts: Maximum number of retry attempts. A value of 0 means no retries are performed
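
The apparent overlap can be seen in a single retry definition (a sketch using the spec's fields): the R5 prefix in the interval already encodes five repetitions, while maxAttempts independently encodes an attempt count:

{
    "name": "TimeoutRetryStrategy",
    "interval": "R5/PT15M",
    "maxAttempts": 3
}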

Research to add support to JSON Patch Schema (RFC 6902)

What would you like to be added:
From the RFC 6902 document:

JSON Patch defines a JSON document structure for expressing a sequence of operations to apply to a JavaScript Object Notation (JSON) document

It would be interesting to support this RFC in scenarios where a user needs to modify the content of the data within the workflow. I'm opening this issue to keep track of this and to do some research around use cases where this would be useful instead of using JSONPath directly. It's super simple to understand and it's gaining adoption in the Kubernetes community, for instance in the kustomize tool.

Why is this needed:
This is an alternative for transforming the data in a declarative way.
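
For reference, a minimal JSON Patch document per RFC 6902 looks like this (the field names are invented for illustration):

[
    { "op": "replace", "path": "/status", "value": "approved" },
    { "op": "add", "path": "/tags/-", "value": "reviewed" },
    { "op": "remove", "path": "/draft" }
]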

Remove need to set "value" and "operator" from data-based switch state

What would you like to be added:
Remove the need to set "value" and "operator" in data-based switch states.
Why is this needed:
This can be accomplished with JsonPath expressions alone; there is no need to set comparison values or hard-code "operator" values, which will never be enough.
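
Roughly, the proposal would replace the first shape below with the second (a hypothetical sketch; the property names are illustrative, not exact spec fields):

A condition with value and operator:

{
    "path": "$.applicant.age",
    "value": "18",
    "operator": "GreaterThan"
}

The same condition as a single JsonPath expression:

{
    "condition": "{{ $.applicant[?(@.age > 18)] }}"
}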

Question about timeout definition in callback state

What is the question:

states:
- name: CheckCredit
  type: callback
  start:
    kind: default
  action:
    functionRef:
      refName: callCreditCheckMicroservice
  eventRef: CreditCheckCompletedEvent
  timeout: PT15M
  transition:
    nextState: StartApplication
- name: StartApplication
  type: subflow
  workflowId: startApplicationWorkflowId
  end:
    kind: default

Two scenarios:

  1. The CreditCheckCompletedEvent event occurs within the defined timeout; the flow runs well and StartApplication is triggered.
  2. The CreditCheckCompletedEvent event does not occur within the defined timeout; StartApplication is still triggered.

My question is: how does the StartApplication state know whether the previous state timed out or not? We do not want to call a subflow when the previous state has timed out. Is there something in the spec like onErrors or a default transition for the callback state?
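
For example, something along these lines would address it (a hypothetical sketch; a timeout-specific onErrors handler is not defined by the current callback state):

- name: CheckCredit
  type: callback
  action:
    functionRef:
      refName: callCreditCheckMicroservice
  eventRef: CreditCheckCompletedEvent
  timeout: PT15M
  onErrors:
    - error: timeout
      transition:
        nextState: HandleCreditCheckTimeout
  transition:
    nextState: StartApplication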

Single expression language

What would you like to be added:
Currently we allow users to specify, per expression, the expression language to be used.
This is not portable, and we should consider defining the use of a single expression language. On the one hand this does reduce possible adoption, as different implementations may want to use different expression languages, but it promotes portability, which is one of the main goals of this specification.

After looking at many different expression languages, I tend to go with https://opensource.google/projects/cel.
Another option mentioned is the FEEL expression language, which is part of the DMN specification (OMG).

Should the order of conditions in a Switch State be honored by implementations?

What is the question:
Are implementations expected to honor the order of Switch State conditions? Seems logical, but it's not explicitly stated, so I thought it would be worth asking.

In a switch with 2 conditions (A and B), it's possible that both A and B are true, so order matters: either A or B is eligible for execution, and they could produce different results depending on which gets run.

I believe this is important not only for data-based conditions, but for event-based conditions as well: it's possible that events A and B have already happened, or arrive simultaneously, when the switch state is reached, and the order in which they are consumed matters.

Update workflow error handling section

What would you like to be added:
Current "workflow error handling" section does not clearly define how errors / retries / timeouts etc can be handled.
Examples do not include some error checking / timeout / retry examples either.
We should update the spec to address both points.
Why is this needed:

Add Workflow KPI extension

What would you like to be added:

This request is to add Key Performance Indicators (KPI) spec extension.

Why is this needed:

  • How can we monitor which workflows or specific states were completed in time and which were not?
  • How can we monitor if a specific function execution (service call) was completed in time? How can we monitor if its execution was within the range of expected cost?

KPIs are very useful for runtime workflow analysis tools.

Add Workflow Simulation Extension

Describe the extension you would like to be added:
Workflow Simulation extension
What is the main purpose of the extension:
The extension would allow users to define structural and capacity analysis info. This will provide pre- and post-execution optimization for workflows.

Proposal to move the 'Retry definition' section (under Events state) to the general section on retries

What would you like to be added:

Can we move the 'Retry definition' section under the 'Event state' section into the general section on retries?

Why is this needed:

The 'Retry definition' section under the 'Event state' has some great detail on the schema and behavior of retries. All other states only link to the generic retries section, which does not include all of this detail. On a first read through the doc, the Event state is the first state mentioned with retries, so it's natural to be introduced to the concept at this point; but when using the spec as reference material, it's hard to locate all the details on retry behavior where they are currently located (partly under the Event state, and partly in the general 'retries' section).

Add mechanism to control non-determinism in retries

What would you like to be added:
On the retry definition there is an option to increase the time period to be waited between attempts ("multiplier"). Many exponential backoff implementations also make use of a (usually small) bounded random amount of time that is added to the delay.

It would be nice if this bounded random amount of time (sometimes referred to as "jitter") were part of the spec; it can be very important to certain workloads, and IMO it's better that this is enforced by the specification than left up to the implementation.

This parameter could be of type integer and be measured in milliseconds. The delay for the n-th attempt could be defined as: interval * multiplier^(n-1) + rand(0, jitter), though I believe the exact formula could be left up to the implementation. The key point here is enforcing a mechanism that provides bounded randomness.

Why is this needed:

This is important in scenarios where the transient failure is the result of a race condition, for example between two actions that, by bad luck, fail, wait the same amount of time before retrying, and fail again, only to wait the same amount of time once more (in this scenario, their "interval" and "multiplier" would be the same).
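
Concretely, the retry definition might be extended like this (a hypothetical sketch; jitter, in milliseconds, is the proposed addition):

{
    "name": "RaceProneActionRetryStrategy",
    "interval": "PT2S",
    "multiplier": 2.0,
    "jitter": 500,
    "maxAttempts": 10
}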
