flyteorg / flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Home Page: https://flyte.org
License: Apache License 2.0
It's unclear exactly what format things should be in, but for I/O types like CSV/Blob/Schema we should be able to provide a download link for the user.
Options:
Likely it will be option 2.
For things like a list of CSVs, we also have to consider how to display the individual items.
Background: We don't have any transactional guarantees for the case where a schedule rule in CloudWatch is, say, deleted but the subsequent database update fails. Although we return an error and the user can retry (the delete call to CloudWatch is idempotent), unless the user actually retries we have no guarantee of being in a non-corrupt state.
We could update the scheduled workflow event dequeuing logic to trigger a call to delete a rule when no active launch plan versions exist. Unfortunately, this exposes a possible race condition when an end user calls disable in one step and then enable separately after that.
As a solution, [~matthewsmith] proposed adding an epoch to schedule names to distinguish them. Since we already want to make schedule names more descriptive (with some kind of truncated project & domain in the name), that work can fall under this work item.
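A minimal sketch of how such a name could be built (the helper name, the 8-character truncation, and the use of a Unix epoch are all assumptions, not the final design; the 64-character cap is the CloudWatch Events rule-name limit):

```typescript
// Hypothetical schedule-name builder: embeds a truncated project/domain plus an
// epoch so that a recreated schedule never collides with a half-deleted one.
const MAX_RULE_NAME_LENGTH = 64;

function buildScheduleName(project: string, domain: string, launchPlan: string, epoch: number): string {
  const base = `${project.slice(0, 8)}-${domain.slice(0, 8)}-${launchPlan}`;
  const suffix = `-${epoch}`;
  // Truncate the descriptive part, never the epoch that disambiguates versions.
  return base.slice(0, MAX_RULE_NAME_LENGTH - suffix.length) + suffix;
}

// e.g. buildScheduleName("flytekit", "production", "daily-report", 1573505376)
//   -> "flytekit-producti-daily-report-1573505376"
```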
Allow loose parallelism as a native part of the Flyte spec. In other words, allow a 'parallel node' to take a list of inputs and map the work out to replicas of the same executable: task, workflow, or launch plan.
Creating CRDs should not result in a death spiral of the operator. We should provide hooks to validate the spec.
The local testing story is weak... we can do a better job documenting tips for how to improve it.
Our initial idea is that the pyflyte execute command can be run locally, but this has some problems: it uses an auto-deleting temp dir, it might mess up real outputs in S3, etc.
We'll play around with it and at least come up with some short-term workarounds.
Ensure parallel node executions are visible in a reasonable manner in the CLI.
Expected: The error message should still be expanded.
Actual: The error message renders collapsed, but the row is still the size that it would be with the error message expanded. Now the content sits in the middle of a row that is too tall.
This is a task to audit our usage of error messages.
The graph components in the console are designed to be a reusable package, but while it's under active development I'm leaving it inside the flyteconsole repo. This ticket is for tracking the work to be done to publish it as a standalone package.
Right now, if a container is misconfigured or otherwise broken, the job sticks around forever. Propeller should garbage-collect it and fail the execution.
Some repr methods in the Flytekit SDK rely on "required" configurations. This obscures the real exception when the config is not available in the environment.
Admin currently allows tasks to be parents of other nodes (1->many) and nodes to be parents of other tasks (1-1). This has led to some confusion/assumptions:
We have talked separately on different occasions about how this should ideally be represented. This task is to track the concrete steps towards a better model.
Update: This is a UI bug. We should not attempt to retrieve inputs if no inputsUri is set, and should not attempt to retrieve outputs if closure.outputsUri is unset.
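A minimal sketch of the guard, assuming a hypothetical fetchUri helper; only inputsUri and closure.outputsUri come from the bug description above:

```typescript
// Hypothetical types/fetchers for illustration.
interface NodeExecution {
  inputsUri?: string;
  closure: { outputsUri?: string };
}

async function fetchIoData(execution: NodeExecution, fetchUri: (uri: string) => Promise<unknown>) {
  // Skip the request entirely when the URI is unset, instead of firing a call
  // that is guaranteed to fail.
  const inputs = execution.inputsUri ? await fetchUri(execution.inputsUri) : null;
  const outputs = execution.closure.outputsUri ? await fetchUri(execution.closure.outputsUri) : null;
  return { inputs, outputs };
}
```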
Direct child
[https://flyte.lyft.net/api/v1/data/node_executions/flytekit/production/y9n8xi9amd/task1-b0e1be7f74-h-task-sqb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0]
Grandchild (nested subtask)
[https://flyte.lyft.net/api/v1/data/node_executions/flytekit/production/y9n8xi9amd/task1-b0e1be7f74-h-task-sqb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0-78d085b30a--sub-taskb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0]
The above URLs should both return NodeExecution data for the ids provided, but instead they return an "invalid URI" error.
The full execution ID has the form ex:project:domain:id.
In the UI we only show the last portion ("id").
The CLI requires the full "ex:project:domain:id", meaning you can't easily copy-paste between the two.
Request from pricing.
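For illustration, a sketch of round-tripping the full ID (the ExecutionId shape and function names are hypothetical; the ex:project:domain:id layout comes from above):

```typescript
interface ExecutionId {
  project: string;
  domain: string;
  name: string;
}

function formatExecutionId(id: ExecutionId): string {
  return `ex:${id.project}:${id.domain}:${id.name}`;
}

function parseExecutionId(raw: string): ExecutionId {
  const [prefix, project, domain, name] = raw.split(":");
  if (prefix !== "ex" || !project || !domain || !name) {
    throw new Error(`not a full execution ID: ${raw}`);
  }
  return { project, domain, name };
}
```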
We need to determine what info should be available in the breadcrumbs.
The inputs for launching a workflow accept a Notifications field, which can be used to specify notification rules for specific states. It's a little complicated (it can be email, PagerDuty, or Slack, to multiple recipients, for multiple states), so we'll tackle it as a separate task.
There is an expectation from Admin that some type of output will exist in storage for a NodeExecution. This turns out not to be the case if a container is running without the SDK. We need some type of handling for this case.
{"json":\{"exec_id":"","node":"","ns":"-development","routine":"worker-13","src":"handler.go:216","tasktype":"spark","wf":"***.SparkTasksWorkflow"}
,"level":"warning","msg":"No plugin found for Handler-type [spark], defaulting to [container]","ts":"2019-11-11T21:09:36Z"}
Defaulting Spark to container doesn't make sense; ideally we should fail cleanly at the Propeller level and expose the error to users instead of executing it as a container task and causing unknown/weird container failures. I think this also applies to other task types like Hive/Sidecar.
The Admin API returns error code values that we can use to show more informative errors to users.
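For example, a sketch assuming the codes follow standard gRPC status conventions (the numeric codes are the standard gRPC values; the user-facing copy is invented for illustration):

```typescript
// Illustrative mapping from gRPC-style status codes to user-facing messages.
const errorMessages: Record<number, string> = {
  3: "The request was invalid. Check the values you entered and try again.", // INVALID_ARGUMENT
  5: "The requested item could not be found.",                               // NOT_FOUND
  7: "You do not have permission to perform this action.",                   // PERMISSION_DENIED
  16: "Your session has expired. Please sign in again.",                     // UNAUTHENTICATED
};

function userFacingError(code: number): string {
  return errorMessages[code] ?? "An unexpected error occurred. Please try again.";
}
```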
Ensure we have a good visualization for parallel nodes in the UI.
We don't currently support list/map or some of the less common types. This task is to at least implement list/map and explore if there is anything we can do about supporting the other types.
We have enough information from the activity execution entity to make calls directly to AWS to retrieve log stream events.
Accessing log streams requires specific permissions. These won't exist on the client (nor should they), but the server side could be granted that role and act as a proxy for the logs.
So it might look something like this:
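One hedged possibility, using the AWS SDK v3 GetLogEventsCommand (a real call) behind a hypothetical Express route; the route shape, region, and parameter names are assumptions:

```typescript
import { CloudWatchLogsClient, GetLogEventsCommand } from "@aws-sdk/client-cloudwatch-logs";
import express from "express";

const logs = new CloudWatchLogsClient({ region: "us-east-1" });
const app = express();

// The server holds the IAM role and forwards log events to the client.
app.get("/api/v1/logs/:group/:stream", async (req, res) => {
  const result = await logs.send(
    new GetLogEventsCommand({
      logGroupName: req.params.group,
      logStreamName: req.params.stream,
      startFromHead: true,
      nextToken: typeof req.query.token === "string" ? req.query.token : undefined,
    })
  );
  // Return the events plus a pagination token for follow-up requests.
  res.json({ events: result.events, nextToken: result.nextForwardToken });
});
```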
Questions/Concerns:
Problem:
The messages coming back from the API are decoded by protobufjs. But since all the fields in a proto message are optional by convention, we don't have any assurance that the records are valid and usable. This has caused errors on the client side before.
Solution options:
One option is to continue type-casting (message as X) or type-guarding (message is X) to the stricter types present on the client side. This has the advantage of being flexible in the UI requirements, and the disadvantage of being difficult to keep in sync with the protobuf source of truth.
Option 3 is ideal, but the amount of work necessary to do so is concerning (especially considering it may not work correctly and we might have to back it out).
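To make the casting-vs-guarding tradeoff concrete, a sketch with a hypothetical Execution shape (field names are illustrative, not the real flyteidl types):

```typescript
// Decoded shape: protobufjs leaves everything optional.
interface DecodedExecution {
  id?: { project?: string; domain?: string; name?: string };
  closure?: { phase?: number };
}

// The stricter client-side shape the UI actually wants to work with.
interface Execution {
  id: { project: string; domain: string; name: string };
  closure: { phase: number };
}

// Type guard: runtime-checks the record before the UI trusts it, instead of
// blindly casting with `message as Execution`.
function isExecution(message: DecodedExecution): message is Execution {
  return (
    !!message.id?.project &&
    !!message.id?.domain &&
    !!message.id?.name &&
    typeof message.closure?.phase === "number"
  );
}
```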
For workflows which take boolean values, the Console renders a toggle switch. When the toggle remains switched to "off", the resulting computed value is undefined instead of false. This translates to passing no value for the input when making the launch request.
For required inputs with no default value, that will result in a 400.
At the very least, if a boolean value is required and has no default, we should be translating an unchecked toggle to false to make sure the launch request succeeds.
Once default values are implemented for the form, this should become less of an issue.
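A minimal sketch of the proposed coercion (names are illustrative):

```typescript
function toBooleanInputValue(toggleValue: boolean | undefined, required: boolean): boolean | undefined {
  // An untouched toggle yields undefined; for a required input with no
  // default, send an explicit false so the launch request doesn't 400.
  if (toggleValue === undefined && required) {
    return false;
  }
  return toggleValue;
}
```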
The UI currently hides workflows which are marked as archived. But you can only set this value via the CLI / API. Users should be able to mark a workflow as archived through the UI as well.
TCS are excited about the native parallelization offered in Flyte 2.0. This task is for the Propeller-side execution of parallel nodes.
We want to make some updates to the way we load items:
TODO: Document all the places where we currently use loading spinners.
Currently the UI does not show that a task execution is memoized; it is simply absent from the execution details if it was skipped because of a cache hit.
Depends on #138
Already in the cli:
flyte-cli -h flyte.lyft.net -p flytekit -d development list-executions -f "eq(workflow.version,gitsha)"
This is to track potential for this in the UI.
Customer notes:
NOTE: The UI can already filter executions by Version, but we don't show versions in the executions table. The work here is mostly for adding that.
Will require a small amount of UX work to determine how to surface versions in the table rows.
There are probably some hotkeys worth implementing. This is a placeholder to determine what those should be.
It's useful to filter executions down by the value of certain inputs. For instance, if a workflow takes a region code as an input and is run frequently with different values for the region code, a user may want to only see executions using one given value of that code ("SEA").
This functionality will require a design spec, since workflows may have many inputs of varying types and indexing across those types and values is non-trivial.
Note: There is an internal design document that could be cleaned up and moved to public in order to provide guidance for this item.
We need a story around what types of testing we are doing for the UI, and an update of the existing test coverage to move toward that goal.
Right now, we have a mixture of tests implemented with react-testing-library, Enzyme(?), and react-test-renderer (mostly snapshots which we don't really need).
The target will be:
This is to cover any overflow / nice-to-haves on the graph implementation after the initial usable version. Some ideas:
This will probably be similar to Workflow Version details, in that it will show information from the closure. But it may not show the graph, or it may optionally allow a user to show a graph view of the workflow at that version.
TODO: Determine which details of a LP are useful to show.
Admin handles most of the auth flow. Console needs to properly handle 401 responses and redirect to the auth flow to refresh cookies.
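A sketch of what that handling could look like (the /login redirect path is an assumption, not a confirmed Admin endpoint):

```typescript
// Wrapper around fetch that catches expired sessions.
async function apiFetch(input: RequestInfo, init?: RequestInit): Promise<Response> {
  const response = await fetch(input, { credentials: "include", ...init });
  if (response.status === 401) {
    // Cookies have expired; bounce through the auth flow and return to the
    // current page once Admin has refreshed them.
    const redirect = encodeURIComponent(window.location.href);
    window.location.assign(`/login?redirect_url=${redirect}`);
  }
  return response;
}
```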
flyteidl is currently being output as an ES6 module, which makes it incompatible with NodeJS unless it is run through webpack first. There's no real reason to do it that way, and protobufjs supports CommonJS output, so we should switch to that.
On the Execution Details page, expose the Launch Plan which was used to create the execution.
Implement specification of nodes in SDK.
It should be possible to specify pre- and post-validators on nodes to prevent advancement of a node (or cache poisoning) if the input/output data does not meet standards.