kubeflow/pipelines
Machine Learning Pipelines for Kubeflow
Home Page: https://www.kubeflow.org/docs/components/pipelines/
License: Apache License 2.0
Here is the raw HTTP:
GET /api/v1/namespaces/kubeflow/services/ml-pipeline:8888/proxy/apis/v1beta1/jobs?page_size=10&resource_reference_key.id=9333ecee-28b2-4c53-807d-bbfd2a45423f&resource_reference_key.type=EXPERIMENT HTTP/1.1
Host: 35.224.113.48
User-Agent: Go-http-client/1.1
Accept: application/json
Accept-Encoding: gzip
HTTP/2.0 500 Internal Server Error
Connection: close
Audit-Id: a32bebb7-3520-4819-9f2f-1003a0d39977
Content-Type: application/json
Date: Fri, 09 Nov 2018 08:55:32 GMT
{"error":"Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name \"UUID\": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string: sql: Scan error on column index 0, name \"UUID\": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string","code":13,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name \"UUID\": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string: sql: Scan error on column index 0, name \"UUID\": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string"}]}
Raw error from the service: Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type &lt;nil&gt; into type *string: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type &lt;nil&gt; into type *string (code: 13)
Currently, runs have two ways of indicating which pipeline was used to start them: a pipeline ID, or an embedded pipeline spec.
The UI should pass that spec when cloning a run that does not have a pipeline ID. We also need to figure out the UX, since a user might change their mind after starting a clone from a run and then want to switch to another pipeline.
@ajayalfred any thoughts here?
Our SDK samples need a notebook to demonstrate the ability to create components and pipelines.
I'd like to contribute to/help with this project; are there any milestones or actual code for pipelines?
It looks like there are two procedures.
Please reply if you don't mind.
It looks like it's showing the total # of jobs across all experiments, instead of just that experiment.
Does the maxHistory parameter cause old Argo workflows to be removed, or does it just define the number of records kept in workflowHistory?
I have the following configuration (kept short for simplicity):
apiVersion: kubeflow.org/v1alpha1
kind: ScheduledWorkflow
metadata:
  name: iris-trainer
  namespace: playground
spec:
  enabled: true
  maxHistory: 5
  trigger:
    cronSchedule:
      cron: "@hourly"
  workflow:
    spec:
      # argo workflow declaration
      entrypoint: iris-train
      onExit: exit-handler
      arguments:
        parameters:
        - name: learning-rate
          value: "0.01"
        - name: num-boost-round
          value: "100"
      templates:
      - name: iris-train
In workflowHistory I see the last 5 records:
trigger:
  LastIndex: 47
  LastTriggeredTime: 2018-09-03T03:00:00Z
  NextTriggeredTime: 2018-09-03T04:00:00Z
workflowHistory:
  completed:
  - Phase: Succeeded
    createdAt: 2018-09-03T03:00:08Z
    finishedAt: 2018-09-03T03:00:28Z
    index: 47
    name: iris-trainer-47-991149392
    namespace: playground
    scheduledAt: 2018-09-03T03:00:00Z
    selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-47-991149392
    startedAt: 2018-09-03T03:00:08Z
    uid: 7702b405-af25-11e8-a9d4-06bcfad5caf4
  - Phase: Succeeded
    createdAt: 2018-09-03T02:00:07Z
    finishedAt: 2018-09-03T02:00:27Z
    index: 46
    name: iris-trainer-46-1007927011
    namespace: playground
    scheduledAt: 2018-09-03T02:00:00Z
    selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-46-1007927011
    startedAt: 2018-09-03T02:00:07Z
    uid: 1516d4a3-af1d-11e8-a9d4-06bcfad5caf4
  - Phase: Succeeded
    createdAt: 2018-09-03T01:00:07Z
    finishedAt: 2018-09-03T01:00:27Z
    index: 45
    name: iris-trainer-45-1024704630
    namespace: playground
    scheduledAt: 2018-09-03T01:00:00Z
    selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-45-1024704630
    startedAt: 2018-09-03T01:00:07Z
    uid: b35d925e-af14-11e8-a9d4-06bcfad5caf4
  - Phase: Succeeded
    createdAt: 2018-09-03T00:00:08Z
    finishedAt: 2018-09-03T00:00:26Z
    index: 44
    name: iris-trainer-44-1041482249
    namespace: playground
    scheduledAt: 2018-09-03T00:00:00Z
    selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-44-1041482249
    startedAt: 2018-09-03T00:00:08Z
    uid: 51fcd7bb-af0c-11e8-a9d4-06bcfad5caf4
  - Phase: Succeeded
    createdAt: 2018-09-02T23:00:08Z
    finishedAt: 2018-09-02T23:00:28Z
    index: 43
    name: iris-trainer-43-1058259868
    namespace: playground
    scheduledAt: 2018-09-02T23:00:00Z
    selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-43-1058259868
    startedAt: 2018-09-02T23:00:08Z
    uid: f01968c7-af03-11e8-a9d4-06bcfad5caf4
Unfortunately, the old workflows weren't removed:
argo -n playground list
NAME STATUS AGE DURATION
iris-trainer-47-991149392 Succeeded 33m 20s
iris-trainer-46-1007927011 Succeeded 1h 20s
iris-trainer-45-1024704630 Succeeded 2h 20s
iris-trainer-44-1041482249 Succeeded 3h 18s
iris-trainer-43-1058259868 Succeeded 4h 20s
iris-trainer-42-1075037487 Succeeded 5h 18s
iris-trainer-41-1091815106 Succeeded 6h 19s
iris-trainer-40-1108592725 Succeeded 7h 19s
iris-trainer-39-3373026837 Succeeded 8h 19s
iris-trainer-38-3356249218 Succeeded 9h 19s
iris-trainer-37-3406582075 Succeeded 10h 20s
iris-trainer-36-3389804456 Succeeded 11h 20s
iris-trainer-35-3440137313 Succeeded 12h 21s
iris-trainer-34-3423359694 Succeeded 13h 18s
iris-trainer-33-3473692551 Succeeded 14h 18s
iris-trainer-32-3456914932 Succeeded 15h 19s
iris-trainer-31-3507247789 Succeeded 16h 19s
iris-trainer-30-3490470170 Succeeded 17h 19s
iris-trainer-29-3171842504 Succeeded 18h 18s
iris-trainer-28-3188620123 Succeeded 19h 18s
iris-trainer-27-3138287266 Succeeded 20h 18s
iris-trainer-26-3155064885 Succeeded 21h 18s
iris-trainer-25-3104732028 Succeeded 22h 19s
iris-trainer-24-3121509647 Succeeded 23h 18s
iris-trainer-23-3071176790 Succeeded 1d 19s
iris-trainer-22-3087954409 Succeeded 1d 18s
iris-trainer-21-3037621552 Succeeded 1d 19s
iris-trainer-20-3054399171 Succeeded 1d 20s
iris-trainer-19-1023865987 Succeeded 1d 18s
iris-trainer-18-1007088368 Succeeded 1d 20s
iris-trainer-17-1258752653 Succeeded 1d 19s
iris-trainer-16-1241975034 Succeeded 1d 18s
iris-trainer-15-1225197415 Succeeded 1d 21s
iris-trainer-14-1208419796 Succeeded 1d 18s
iris-trainer-13-1191642177 Succeeded 1d 20s
iris-trainer-12-1174864558 Succeeded 1d 19s
iris-trainer-11-1158086939 Succeeded 1d 20s
iris-trainer-10-1141309320 Succeeded 1d 19s
iris-trainer-9-3808217808 Succeeded 1d 20s
iris-trainer-8-3824995427 Succeeded 1d 19s
iris-trainer-7-4043104474 Succeeded 1d 18s
iris-trainer-6-4059882093 Succeeded 1d 20s
iris-trainer-5-4009549236 Succeeded 1d 18s
iris-trainer-4-4026326855 Succeeded 1d 19s
iris-trainer-3-3975993998 Succeeded 1d 25s
iris-trainer-2-3992771617 Succeeded 1d 20s
iris-trainer-1-3942438760 Succeeded 1d 21s
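For reference, the pruning behavior I expected from maxHistory could be sketched like this (delete_workflow is a hypothetical stand-in for the actual Argo delete call, not a real API):

```python
def prune_workflows(workflows, max_history, delete_workflow):
    """Keep only the newest max_history completed workflows; delete the rest.

    `workflows` is a list of dicts with `index` and `name` keys;
    `delete_workflow` is a hypothetical callback standing in for the
    real Argo workflow deletion.
    """
    newest_first = sorted(workflows, key=lambda w: w["index"], reverse=True)
    # Everything beyond the first max_history entries is stale history.
    for workflow in newest_first[max_history:]:
        delete_workflow(workflow["name"])
    return [w["name"] for w in newest_first[:max_history]]

deleted = []
workflows = [{"index": i, "name": f"iris-trainer-{i}"} for i in range(1, 48)]
kept = prune_workflows(workflows, 5, deleted.append)
# kept holds iris-trainer-47 down to iris-trainer-43; the 42 older ones are deleted
```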
When switching between the "Experiments" and "Runs" tabs, the page title changes and the "Create experiment" button is hidden. This is incorrect: the title and the actions above the tabs should not change based on tab selection, since that breaks the design system rules.
Also, the Experiments link in the nav must remain highlighted as long as the user is in that section.
The API throws an invalid-input error if the experiment name is a duplicate. The UI needs to handle this properly.
Currently, the ScheduledWorkflow CRD reliably starts Argo workflows, but does not monitor that they complete successfully. It relies on retries embedded in the Argo workflow itself.
The ScheduledWorkflow CRD could provide a retry functionality.
It's probably going to be the most frequently used tab. Especially in cases where the user only has one or two experiments.
See https://github.com/golang/go/wiki/Modules
Argo has just switched recently: https://github.com/argoproj/argo/pull/1071/files
Hey. I would like to try this project out on minikube without GKE. Can't really find any docs around this.
This is a quick analysis of the experiment details page performance. Most of the time is spent because we have to make multiple consecutive requests to load all the information we need. We currently do this:
- getExperiment API to get the details (name, description, etc.).
- listJobs API to get all recurring jobs in this experiment.
- listRuns API to show the first page of runs in this experiment.
- getRun API to get each run's details (name, status, duration, etc.).
- getPipeline on its pipeline ID, in order to show the pipeline name.
- getExperiment on its experiment ID, if any, to show the experiment name. This is not needed when listing runs of a given experiment, but it's technical debt we accumulated, since we're using the same component to list runs everywhere.
This is needed by the UI to show the total number of resources when paging through them. It's also needed when showing the total number of recurring runs in an experiment.
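Since getExperiment, listJobs, and listRuns do not depend on one another, one possible mitigation is to issue them concurrently instead of sequentially, so page latency approaches the slowest call rather than the sum of all calls. A minimal sketch of the idea; the fetch functions here are hypothetical stand-ins, not the real API client:

```python
import asyncio

# Hypothetical async fetchers standing in for the real API calls.
async def get_experiment(experiment_id):
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": experiment_id, "name": "my-experiment"}

async def list_jobs(experiment_id):
    await asyncio.sleep(0.01)
    return [{"job": "recurring-1"}]

async def list_runs(experiment_id):
    await asyncio.sleep(0.01)
    return [{"run": "run-1"}]

async def load_experiment_page(experiment_id):
    # The three calls are independent, so run them concurrently:
    # total latency ~= the slowest call instead of the sum of all three.
    return await asyncio.gather(
        get_experiment(experiment_id),
        list_jobs(experiment_id),
        list_runs(experiment_id),
    )

experiment, jobs, runs = asyncio.run(load_experiment_page("abc-123"))
```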
It might make sense to sanitize the pipeline input.
For example, if there is a parameter that requires a GCS path, a stray leading space before the path (e.g. " gs://pipeline/input-bucket") will cause the pipeline to fail. Bugs of this type are hard to detect.
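A minimal sketch of the kind of pre-submission cleanup this suggests; sanitize_params is a hypothetical helper, not part of the actual SDK:

```python
def sanitize_params(params):
    """Strip leading/trailing whitespace from string parameter values.

    Hypothetical cleanup step applied before submitting a run, so that
    a stray space in a GCS path doesn't fail the pipeline.
    """
    return {
        name: value.strip() if isinstance(value, str) else value
        for name, value in params.items()
    }

cleaned = sanitize_params({"input-path": " gs://pipeline/input-bucket", "epochs": 10})
# cleaned["input-path"] == "gs://pipeline/input-bucket"; non-strings pass through
```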
If an exit handler is specified in the DSL, the compiler currently uses a bit of a hack to ensure that it is always called before the pipeline terminates: an extra DAG named something like exit-handler-1 is added to the compiled YAML. It is the only task of the entrypoint DAG, and it wraps all other steps within the pipeline (they are tasks within the exit handler's DAG).
This works alright as a workaround, but it clutters the UI in a rather unhelpful way, so we currently hide this node by checking whether its name starts with "exit-handler".
This is potentially problematic, as that is not the most unlikely name for a user to pick, so perhaps a prefix like "__exit-handler" would be better until a proper solution to the exit handler problem is found.
Currently, we have hardcoded the release version as the image tag in the samples. We need to make it easy to update these image tags during releases.
At best, we can avoid double releases.
Feature request: in the SDK, support get_or_create_experiment() in addition to create_experiment() (Bradley has the context).
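The intended semantics could be sketched as below; the client and its method names here are assumptions for illustration, not the real SDK surface:

```python
class _FakeClient:
    """Stand-in for the pipelines client; the method names are assumptions."""

    def __init__(self):
        self._experiments = {}
        self._next_id = 1

    def get_experiment_by_name(self, name):
        return self._experiments.get(name)

    def create_experiment(self, name):
        exp = {"id": str(self._next_id), "name": name}
        self._next_id += 1
        self._experiments[name] = exp
        return exp

def get_or_create_experiment(client, name):
    # Return the existing experiment if one with this name exists,
    # otherwise create it -- making the call idempotent.
    existing = client.get_experiment_by_name(name)
    return existing if existing is not None else client.create_experiment(name)

client = _FakeClient()
first = get_or_create_experiment(client, "my-exp")
second = get_or_create_experiment(client, "my-exp")
# second call reuses the first experiment instead of erroring on the duplicate
```

This would also sidestep the duplicate-name error described above, since repeated calls with the same name are safe.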
This is needed so that pods can be garbage collected after they've finished, and to remove the dependency on the cluster state.
Currently, we're already seeing issues when a cluster is resized where the frontend can't find pods started by some of the runs.
Will this repository provide a CLI for ScheduledWorkflows to manage the controller, like argo's CLI? Or will subcommands be added to argo?
The Pipeline API Server Swagger Client (Go) returns incomplete output for the pipeline.upload method.
For instance, using the CLI in the directory:
kubeflow/pipelines/backend/src/cmd/ml
With the command:
go run main.go pipeline --namespace kubeflow upload ./samples/hello-world.yaml --name pipleline87 -o json
We get the output:
{
"created_at": "0001-01-01T00:00:00.000Z",
"parameters": null
}
In the All runs list, I click on a run, then use the browser's back button. This does not return me to the same page.
Today the user needs to click on every step of the workflow to see its output. This is not convenient for complex reusable workflows. We need to support a mode where the user can see the output of the entire graph in a single feed.
- Run overview section cannot be collapsed. Remove the collapse action.
- Parameters section: use the table style below. Runs (objects) are shown as rows, and parameters (attributes) are shown as columns. Match the table font style with the 'Overview' table (Roboto, 14px, for cell content).
- The title of the aggregate view should read "All selected runs".
- Vertically top-align all charts in a section.
I'd like to insert the run view page into a notebook when I submit the run.
For this I'd like to have a minimal page without left or top navigation, with only the run view and the refresh button.
In addition to run_pipeline (which doesn't actually create a pipeline object in the UI), bring back the client method for actually creating the pipeline.
People might want to share pipelines, later run other experiments based on that pipeline definition but not have the original notebook to hand, etc.
(Bradley has the context on this.)
Let's remove dsl.python_op in favor of dsl.python_component.
The Experiments tab will be used much more frequently than the Pipelines tab.
The Experiments tab also gives much more useful information.
It doesn't look like this install:
pip3 install https://storage.googleapis.com/ml-pipeline/release/0.1.2/kfp.tar.gz --upgrade
includes the kubernetes client lib, which I think is intended to be included, as the %%docker "magic" requires it.
This issue has manifested itself several times. The latest was today, when the Prow tests were fixed.
Our test images only use the branch code:
git clone https://github.com/kubeflow/pipelines /ml
git checkout 6e96b054fb2585f3577155fa92dd107c6e1b5dd2
But the tests that Prow runs are taken from the result of merging the base branch (master) with the PR branch.
I1103 23:36:06.094] Checkout: /workspace/github.com/googleprivate/ml master:296b540cd724fed645e1652f12428462fd5375ed,1532:5afe507591f58f76a12c9f0f3b6659a30b657060 to /workspace/github.com/googleprivate/ml
I1103 23:36:06.094] Call: git init github.com/googleprivate/ml
I1103 23:36:06.101] Call: git clean -dfx
I1103 23:36:06.105] Call: git reset --hard
I1103 23:36:06.110] Call: git config --local user.name 'K8S Bootstrap'
I1103 23:36:06.116] Call: git config --local user.email k8s_bootstrap@localhost
I1103 23:36:06.122] Call: git fetch --quiet --tags [email protected]:googleprivate/ml master +refs/pull/1532/head:refs/pr/1532
I1103 23:36:10.765] Call: git checkout -B test 296b540cd724fed645e1652f12428462fd5375ed
I1103 23:36:11.199] Call: git show -s --format=format:%ct HEAD
I1103 23:36:11.204] Call: git merge --no-ff -m 'Merge +refs/pull/1532/head:refs/pr/1532' 5afe507591f58f76a12c9f0f3b6659a30b657060
This effectively means that the test code is always taken from master while test image code is taken from the branch and may be out of sync.
We should also do something like:
git clone https://github.com/kubeflow/pipelines
cd pipelines
git merge --no-ff 321ca814db4955b3950b0fac06a2d289fe4db39a -m "Merged PR"
For example, if a user sets the start date to 2/31/2018, the form will show that date, but the component will set the start date to undefined.
We should either add constraints on the days ourselves, or at least show an error message indicating that the date is invalid.
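The validation itself is simple: a round-trip through a strict date parser rejects impossible calendar dates. A minimal sketch of the constraint the form could apply (the actual UI component is not shown here):

```python
from datetime import datetime

def is_valid_date(date_str):
    """Return True only if date_str is a real calendar date in M/D/YYYY form.

    strptime rejects dates like 2/31/2018 ("day is out of range for month"),
    which is exactly the constraint the form is missing.
    """
    try:
        datetime.strptime(date_str, "%m/%d/%Y")
        return True
    except ValueError:
        return False

print(is_valid_date("2/28/2018"))  # True
print(is_valid_date("2/31/2018"))  # False: February has no 31st
```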
I think it should be "/pipelines", because our project is called that. But we could also make /pipeline redirect to it.
-- EDIT
"Last 5 runs" - is the last one (the rightmost) the real last run?
(Note: not sure if this is already tracked).
With the current way the Pipeline API Server swagger client (Go) is implemented, it does not seem possible to specify a "name" in the "Pipeline Create" API call.