kubeflow / pipelines

Machine Learning Pipelines for Kubeflow

Home Page: https://www.kubeflow.org/docs/components/pipelines/

License: Apache License 2.0

Topics: kubeflow-pipelines, mlops, kubeflow, machine-learning, kubernetes, pipeline, data-science

pipelines's Introduction


Overview of the Kubeflow pipelines service

Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

Kubeflow pipelines are reusable end-to-end ML workflows built using the Kubeflow Pipelines SDK.

The Kubeflow pipelines service has the following goals:

  • End-to-end orchestration: enabling and simplifying the orchestration of end-to-end machine learning pipelines.
  • Easy experimentation: making it easy for you to try numerous ideas and techniques, and to manage your various trials/experiments.
  • Easy re-use: enabling you to re-use components and pipelines to quickly assemble end-to-end solutions, without having to rebuild each time.
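
To make this concrete, here is a minimal sketch of a pipeline defined with the Kubeflow Pipelines SDK (v2-style API); the component and pipeline names are illustrative, not from this README:

# Minimal sketch of a pipeline built with the KFP SDK (v2-style API).
from kfp import compiler, dsl

@dsl.component
def say_hello(name: str) -> str:
    greeting = f"Hello, {name}!"
    print(greeting)
    return greeting

@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(recipient: str = "world"):
    say_hello(name=recipient)

if __name__ == "__main__":
    # Compile to a YAML package that the KFP backend can run.
    compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")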

Installation

  • Install Kubeflow Pipelines using one of the options described in Installation Options for Kubeflow Pipelines.

  • The Docker container runtime has been deprecated on Kubernetes 1.20+. Kubeflow Pipelines switched to the Emissary executor by default as of Kubeflow Pipelines 1.8. The Emissary executor is container-runtime agnostic, meaning you can run Kubeflow Pipelines on a Kubernetes cluster with any container runtime.

Documentation

Get started with your first pipeline and read further information in the Kubeflow Pipelines overview.

See the various ways you can use the Kubeflow Pipelines SDK.

See the Kubeflow Pipelines API doc for API specification.

Consult the Python SDK reference docs when writing pipelines using the Python SDK.

Refer to the versioning policy and feature stages documentation for more information about how we manage versions and feature stages (such as Alpha, Beta, and Stable).

Contributing to Kubeflow Pipelines

Before you start contributing to Kubeflow Pipelines, read the guidelines in How to Contribute. To learn how to build and deploy Kubeflow Pipelines from source code, read the developer guide.

Kubeflow Pipelines Community Meeting

The meeting happens every other Wednesday, 10-11 AM (PST). Calendar Invite or Join Meeting Directly

Meeting notes

Kubeflow Pipelines Slack Channel

#kubeflow-pipelines

Blog posts

Acknowledgments

Kubeflow Pipelines uses Argo Workflows by default under the hood to orchestrate Kubernetes resources. The Argo community has been very supportive, and we are very grateful. A Tekton backend is also available; to use it, refer to the Kubeflow Pipelines with Tekton repository.

pipelines's People

Contributors

ajchili, ark-kun, bobgy, capri-xiyue, chensun, chongyouquan, connor-mccarthy, dependabot[bot], gaoning777, gkcalat, hongye-sun, ironpan, ji-yaqi, jingzhang36, jlyaoyuli, jsondai, kevinbnaughton, linchin, neuromage, nikenano, qimingj, rileyjbauer, rmgogogo, rui5i, sinachavoshi, themichaelhu, tomcli, vicaire, yebrahim, zijianjoy


pipelines's Issues

Give generated exit-handler a more special name

If an exit handler is specified in the DSL, the compiler currently uses a bit of a hack to ensure that it is always called before the pipeline terminates: an extra DAG named something like exit-handler-1 is added to the compiled YAML; it is the only task of the entrypoint DAG and wraps all other steps within the pipeline (they are tasks within the exit handler's DAG).

This works all right as a workaround, but it clutters the UI in a rather unhelpful way, so we currently hide this node by checking whether its name starts with "exit-handler".

This is potentially problematic because that is not an unlikely name for a user to pick, so something like "__exit-handler" might be better until a proper solution to the exit handler problem is found.
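
For reference, a minimal sketch (KFP v1-style DSL; the component is hypothetical) of the pattern that produces the wrapper DAG described above:

import kfp.dsl as dsl

def echo_op(msg: str):
    # Hypothetical component: a container that just echoes a message.
    return dsl.ContainerOp(
        name="echo",
        image="alpine:3.12",
        command=["echo", msg],
    )

@dsl.pipeline(name="exit-handler-demo")
def demo_pipeline():
    exit_task = echo_op("cleanup")
    # Everything inside this block compiles into the exit handler's DAG,
    # which becomes the sole task of the entrypoint DAG.
    with dsl.ExitHandler(exit_task):
        echo_op("work")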

How is this project proceeding?

I'd like to contribute to this project; are there any milestones or actual code for pipelines yet?

It looks like there are two possible approaches:

  1. Move argoproj/argo into here. This probably requires permission from the argoproj members.
  2. Implement the pipeline module from scratch using Kubernetes APIs (e.g. CRDs), like Argo does.

Please reply if you don't mind.

ScheduledWorkflow CRD: CLI

Will this repository provide CLIs for managing the ScheduledWorkflow controller, like argo does? Or should subcommands be added to argo instead?

Date pickers in NewRun do not handle invalid days well

For example, if a user sets the start date to 2/31/2018, the form will show that date, but the component will set the start date to undefined.

We should either add constraints on the days ourselves, or at least show an error message indicating that the date is invalid.
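
As a sketch of the constraint (shown here in Python, though the form itself is frontend code), an impossible calendar day can be rejected before the form accepts it:

from datetime import datetime

def is_valid_date(text: str) -> bool:
    # strptime rejects impossible calendar days such as 2/31/2018.
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False

print(is_valid_date("2/31/2018"))  # False
print(is_valid_date("2/28/2018"))  # True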

Our test code and our test image code are not always the same.

This issue has manifested itself several times, most recently today when the Prow tests were fixed.

Our test images only use the branch code:

git clone https://github.com/kubeflow/pipelines /ml
cd /ml
git checkout 6e96b054fb2585f3577155fa92dd107c6e1b5dd2

But the tests that Prow runs are taken from the result of merging the base branch (master) with the PR branch.

I1103 23:36:06.094] Checkout: /workspace/github.com/googleprivate/ml master:296b540cd724fed645e1652f12428462fd5375ed,1532:5afe507591f58f76a12c9f0f3b6659a30b657060 to /workspace/github.com/googleprivate/ml
I1103 23:36:06.094] Call:  git init github.com/googleprivate/ml
I1103 23:36:06.101] Call:  git clean -dfx
I1103 23:36:06.105] Call:  git reset --hard
I1103 23:36:06.110] Call:  git config --local user.name 'K8S Bootstrap'
I1103 23:36:06.116] Call:  git config --local user.email k8s_bootstrap@localhost
I1103 23:36:06.122] Call:  git fetch --quiet --tags [email protected]:googleprivate/ml master +refs/pull/1532/head:refs/pr/1532
I1103 23:36:10.765] Call:  git checkout -B test 296b540cd724fed645e1652f12428462fd5375ed
I1103 23:36:11.199] Call:  git show -s --format=format:%ct HEAD
I1103 23:36:11.204] Call:  git merge --no-ff -m 'Merge +refs/pull/1532/head:refs/pr/1532' 5afe507591f58f76a12c9f0f3b6659a30b657060

This effectively means that the test code is taken from the merge with master, while the test image code is taken from the branch alone, so the two may be out of sync.

We should also do something like:

git clone https://github.com/kubeflow/pipelines
cd pipelines
git merge --no-ff 321ca814db4955b3950b0fac06a2d289fe4db39a -m "Merged PR"

SDK should require kubernetes client lib

It doesn't look like this install:

pip3 install https://storage.googleapis.com/ml-pipeline/release/0.1.2/kfp.tar.gz --upgrade

includes the kubernetes client lib, which I think is intended to be included, since the %%docker "magic" requires it.
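
A minimal sketch of the fix, assuming the SDK declares its dependencies in setup.py (the package list here is an assumption, not the actual file):

from setuptools import find_packages, setup

setup(
    name="kfp",
    packages=find_packages(),
    install_requires=[
        # Assumption: declaring the Kubernetes client so that
        # `pip install kfp` pulls it in for the %%docker magic.
        "kubernetes",
    ],
)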

Pipeline input cleansing

It might make sense to cleanse pipeline inputs.
For example, if a parameter requires a GCS path, a stray space before the path (e.g. " gs://pipeline/input-bucket") will cause the pipeline to fail. This type of bug is hard to detect.
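
A minimal sketch of the kind of cleansing suggested (the GCS-path check and function name are illustrative):

def cleanse_gcs_path(value: str) -> str:
    # Strip incidental whitespace before validating the parameter.
    value = value.strip()
    if not value.startswith("gs://"):
        raise ValueError(f"expected a GCS path, got {value!r}")
    return value

cleanse_gcs_path(" gs://pipeline/input-bucket")  # -> 'gs://pipeline/input-bucket'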

Compare experience – UX changes

  1. Rows selected for comparison should not display selection styling in the 'Overview' section. The checkboxes remain active and rows aren't highlighted. See below for row styling.

(screenshot omitted)

  2. Run overview section cannot be collapsed. Remove the collapse action.

  3. Parameters section table style below: runs (objects) are shown as rows, and parameters (attributes) are shown as columns. Match the table font style with the 'Overview' table (Roboto, 14px, for cell content).

(screenshot omitted)

  4. Metrics should be its own section, not part of the overview section.

(screenshot omitted)

  5. The title of the aggregate view should read "All selected runs".

  6. Vertically top-align all charts in a section.

Better render pipeline description

The sample now has a link to the source code, but the text is cropped.

(screenshot omitted)

We should probably also render the URL as a hyperlink, so users don't have to copy the path.

Remember the page I was on

In the All runs list, I click on a run, then use the browser's back button. This does not return me to the same page.

Run list perf optimizations

This is a quick analysis of the experiment details page performance. Most of the time is spent making multiple consecutive requests to load all the information we need. We currently do this:

  1. Call getExperiment API to get the details (name, description.. etc).
  2. Call listJobs API to get all recurring jobs in this experiment.
  3. Call listRuns API to show the first page of runs in this experiment.
  4. For each run (in parallel), call its getRun API to get its details (name, status, duration... etc).
  5. For each run (in parallel), call getPipeline on its pipeline ID, in order to show the pipeline name.
  6. For each run (in parallel), call getExperiment on its experiment ID, if any, to show the experiment name. This is not needed when listing runs of a given experiment, but it's technical debt we accumulated, since we're using the same component to list runs everywhere.

Some low-hanging perf improvements can be obtained by doing the following:

  • There is no need to do the first three steps in sequence; they're not interdependent (see the sketch below).
  • There is no need to do the last three steps in sequence; they all use the same metadata fields.
  • We can render the list of runs as we get them, before we get the details of each run and its pipeline. This will show their names and statuses, but not their run times or pipeline names; subsequent requests then re-render to fill out these fields.
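
A conceptual sketch of the first improvement (Python asyncio stands in for the frontend's request logic; the client object and its methods are hypothetical stand-ins for the getExperiment, listJobs, and listRuns APIs above):

import asyncio

async def load_experiment_page(client, experiment_id: str):
    # The three calls are independent, so issue them concurrently
    # instead of one after another.
    experiment, jobs, runs_page = await asyncio.gather(
        client.get_experiment(experiment_id),
        client.list_jobs(experiment_id),
        client.list_runs(experiment_id, page_size=10),
    )
    return experiment, jobs, runs_page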

Can't read full text in "Choose a pipeline" dialog

The visible parts of the Pipeline name and Description columns are too short to decide which pipeline to choose.

(screenshot omitted)

Maybe allow text wrapping with a max row height, and/or narrow the timestamp column?

(screenshot omitted)

Pipeline API Server Swagger Client (Go) for Pipeline Upload returns incomplete output

The Pipeline API Server Swagger Client (Go) returns incomplete output for the pipeline.upload method.

For instance, using the CLI in the directory:

kubeflow/pipelines/backend/src/cmd/ml

With the command:

go run main.go pipeline --namespace kubeflow upload ./samples/hello-world.yaml --name pipleline87 -o json

We get the output:

{
"created_at": "0001-01-01T00:00:00.000Z",
"parameters": null
}

feature request: restore the client method for creating a pipeline

In addition to run_pipeline (which doesn't actually create a pipeline object in the UI), bring back the client method for actually creating a pipeline.
People might want to share pipelines, or later run other experiments based on a pipeline definition when they no longer have the original notebook to hand, etc.
(Bradley has the context on this.)

Experiment list title should not change

When switching between the "Experiments" and "Runs" tabs, the page title changes and the "Create experiment" button is hidden. This is incorrect: the title and the actions above the tabs should not change based on tab selection, since that breaks the design system rules.

Also, the Experiments link in the nav must remain highlighted as long as the user is in that section.

(screenshots omitted)

Create a sample notebook

Our SDK samples need a notebook to demonstrate the ability to create components and pipelines.

Embeddable run view page

I'd like to insert the run view page into a notebook when I submit the run.

For this I'd like to have a minimal page without left or top navigation, with only the run view and the refresh button.
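
A hedged sketch of how such a minimal view could be embedded from a notebook, assuming the view existed at a dedicated URL (the URL pattern and query parameter below are hypothetical, not a documented KFP endpoint):

from IPython.display import IFrame

run_id = "my-run-id"  # hypothetical run id returned at submission time
# Embed the (hypothetical) minimal run view in the notebook output cell.
IFrame(f"https://kfp.example.com/#/runs/details/{run_id}?minimal=true",
       width=900, height=600)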

Doesn't remove old containers (> maxHistory)

Does the maxHistory parameter also remove old Argo workflows, or does it just define the number of records kept in workflowHistory?

I have the following configuration (kept short for simplicity):

apiVersion: kubeflow.org/v1alpha1
kind: ScheduledWorkflow
metadata:
  name: iris-trainer
  namespace: playground
spec:
  enabled: true
  maxHistory: 5
  trigger:
    cronSchedule:
      cron: "@hourly"
  workflow:
    spec:
    
      # argo workflow declaration    
      entrypoint: iris-train
      onExit: exit-handler

      arguments:
        parameters:
        - name: learning-rate
          value: "0.01"
        - name: num-boost-round
          value: "100"

      templates:

      - name: iris-train

In workflowHistory I see the 5 most recent records:

 trigger:
    LastIndex: 47
    LastTriggeredTime: 2018-09-03T03:00:00Z
    NextTriggeredTime: 2018-09-03T04:00:00Z
  workflowHistory:
    completed:
    - Phase: Succeeded
      createdAt: 2018-09-03T03:00:08Z
      finishedAt: 2018-09-03T03:00:28Z
      index: 47
      name: iris-trainer-47-991149392
      namespace: playground
      scheduledAt: 2018-09-03T03:00:00Z
      selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-47-991149392
      startedAt: 2018-09-03T03:00:08Z
      uid: 7702b405-af25-11e8-a9d4-06bcfad5caf4
    - Phase: Succeeded
      createdAt: 2018-09-03T02:00:07Z
      finishedAt: 2018-09-03T02:00:27Z
      index: 46
      name: iris-trainer-46-1007927011
      namespace: playground
      scheduledAt: 2018-09-03T02:00:00Z
      selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-46-1007927011
      startedAt: 2018-09-03T02:00:07Z
      uid: 1516d4a3-af1d-11e8-a9d4-06bcfad5caf4
    - Phase: Succeeded
      createdAt: 2018-09-03T01:00:07Z
      finishedAt: 2018-09-03T01:00:27Z
      index: 45
      name: iris-trainer-45-1024704630
      namespace: playground
      scheduledAt: 2018-09-03T01:00:00Z
      selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-45-1024704630
      startedAt: 2018-09-03T01:00:07Z
      uid: b35d925e-af14-11e8-a9d4-06bcfad5caf4
    - Phase: Succeeded
      createdAt: 2018-09-03T00:00:08Z
      finishedAt: 2018-09-03T00:00:26Z
      index: 44
      name: iris-trainer-44-1041482249
      namespace: playground
      scheduledAt: 2018-09-03T00:00:00Z
      selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-44-1041482249
      startedAt: 2018-09-03T00:00:08Z
      uid: 51fcd7bb-af0c-11e8-a9d4-06bcfad5caf4
    - Phase: Succeeded
      createdAt: 2018-09-02T23:00:08Z
      finishedAt: 2018-09-02T23:00:28Z
      index: 43
      name: iris-trainer-43-1058259868
      namespace: playground
      scheduledAt: 2018-09-02T23:00:00Z
      selfLink: /apis/argoproj.io/v1alpha1/namespaces/playground/workflows/iris-trainer-43-1058259868
      startedAt: 2018-09-02T23:00:08Z
      uid: f01968c7-af03-11e8-a9d4-06bcfad5caf4

Unfortunately, the old workflows weren't removed:

argo -n playground list
NAME                         STATUS      AGE    DURATION
iris-trainer-47-991149392    Succeeded   33m    20s
iris-trainer-46-1007927011   Succeeded   1h     20s
iris-trainer-45-1024704630   Succeeded   2h     20s
iris-trainer-44-1041482249   Succeeded   3h     18s
iris-trainer-43-1058259868   Succeeded   4h     20s
iris-trainer-42-1075037487   Succeeded   5h     18s
iris-trainer-41-1091815106   Succeeded   6h     19s
iris-trainer-40-1108592725   Succeeded   7h     19s
iris-trainer-39-3373026837   Succeeded   8h     19s
iris-trainer-38-3356249218   Succeeded   9h     19s
iris-trainer-37-3406582075   Succeeded   10h    20s
iris-trainer-36-3389804456   Succeeded   11h    20s
iris-trainer-35-3440137313   Succeeded   12h    21s
iris-trainer-34-3423359694   Succeeded   13h    18s
iris-trainer-33-3473692551   Succeeded   14h    18s
iris-trainer-32-3456914932   Succeeded   15h    19s
iris-trainer-31-3507247789   Succeeded   16h    19s
iris-trainer-30-3490470170   Succeeded   17h    19s
iris-trainer-29-3171842504   Succeeded   18h    18s
iris-trainer-28-3188620123   Succeeded   19h    18s
iris-trainer-27-3138287266   Succeeded   20h    18s
iris-trainer-26-3155064885   Succeeded   21h    18s
iris-trainer-25-3104732028   Succeeded   22h    19s
iris-trainer-24-3121509647   Succeeded   23h    18s
iris-trainer-23-3071176790   Succeeded   1d     19s
iris-trainer-22-3087954409   Succeeded   1d     18s
iris-trainer-21-3037621552   Succeeded   1d     19s
iris-trainer-20-3054399171   Succeeded   1d     20s
iris-trainer-19-1023865987   Succeeded   1d     18s
iris-trainer-18-1007088368   Succeeded   1d     20s
iris-trainer-17-1258752653   Succeeded   1d     19s
iris-trainer-16-1241975034   Succeeded   1d     18s
iris-trainer-15-1225197415   Succeeded   1d     21s
iris-trainer-14-1208419796   Succeeded   1d     18s
iris-trainer-13-1191642177   Succeeded   1d     20s
iris-trainer-12-1174864558   Succeeded   1d     19s
iris-trainer-11-1158086939   Succeeded   1d     20s
iris-trainer-10-1141309320   Succeeded   1d     19s
iris-trainer-9-3808217808    Succeeded   1d     20s
iris-trainer-8-3824995427    Succeeded   1d     19s
iris-trainer-7-4043104474    Succeeded   1d     18s
iris-trainer-6-4059882093    Succeeded   1d     20s
iris-trainer-5-4009549236    Succeeded   1d     18s
iris-trainer-4-4026326855    Succeeded   1d     19s
iris-trainer-3-3975993998    Succeeded   1d     25s
iris-trainer-2-3992771617    Succeeded   1d     20s
iris-trainer-1-3942438760    Succeeded   1d     21s
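
For reference, a conceptual sketch (in Python, though the actual controller is written in Go) of the pruning behavior the reporter expects maxHistory to perform:

def prune(workflows, max_history=5):
    # Keep the newest max_history workflows by trigger index;
    # everything older is a candidate for deletion.
    ordered = sorted(workflows, key=lambda w: w["index"], reverse=True)
    return ordered[:max_history], ordered[max_history:]

workflows = [{"name": f"iris-trainer-{i}", "index": i} for i in range(1, 48)]
keep, delete = prune(workflows)
# keep matches the five entries shown in workflowHistory (indexes 47..43);
# delete corresponds to the 42 older workflows still listed by `argo list`.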

Support cloning run started from notebook

Currently, runs have two ways of telling which pipeline was used to start them:

  • Either a pipeline was uploaded to the system first, in which case the run will include its id.
  • Or the run was started from a notebook (or the CLI), in which case it will (can) embed the entire pipeline spec; a sketch of this path follows.
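
For context, a minimal sketch of the notebook path, assuming a pipeline function is defined elsewhere and a KFP endpoint is reachable:

import kfp

client = kfp.Client()  # assumes a reachable Kubeflow Pipelines endpoint
# Submitting this way embeds the compiled pipeline spec in the run itself;
# no pipeline object is uploaded, so the run carries no pipeline id.
client.create_run_from_pipeline_func(
    my_pipeline,  # assumed: a @dsl.pipeline-decorated function defined elsewhere
    arguments={"param": "value"},  # hypothetical pipeline parameters
)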

The UI should pass that spec when cloning a run that does not have a pipeline id. We also need to figure out the UX, since a user might change their mind after starting a clone from a run and want to switch to another pipeline.

@ajayalfred any thoughts here?

Unsupported Scan Error While Listing the Jobs of an Experiment

Here is the raw HTTP:

GET /api/v1/namespaces/kubeflow/services/ml-pipeline:8888/proxy/apis/v1beta1/jobs?page_size=10&resource_reference_key.id=9333ecee-28b2-4c53-807d-bbfd2a45423f&resource_reference_key.type=EXPERIMENT HTTP/1.1
Host: 35.224.113.48
User-Agent: Go-http-client/1.1
Accept: application/json
Accept-Encoding: gzip

HTTP/2.0 500 Internal Server Error
Connection: close
Audit-Id: a32bebb7-3520-4819-9f2f-1003a0d39977
Content-Type: application/json
Date: Fri, 09 Nov 2018 08:55:32 GMT

{"error":"Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string","code":13,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *string"}]}
Raw error from the service: Failed to list jobs.: List jobs failed.: List data model failed.: InternalServerError: Failed to list jobs: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type into type *string: sql: Scan error on column index 0, name "UUID": unsupported Scan, storing driver.Value type into type *string (code: 13)

Show results of the entire workflow in one view

Today the user needs to click on every step of the workflow to see its output. This is not convenient for complex reusable workflows. We need to support a mode where the user can see the output of the entire graph in a single feed.

Persist pod logs after they finish

This is needed so that pods can be garbage collected after they've finished, and to remove the dependency on the cluster state.

Currently, we're already seeing issues when a cluster is resized where the frontend can't find pods started by some of the runs.
