Demo of MLRun with GitHub-based projects and automated CI/CD,
using the Iris dataset and XGBoost classification with a hyper-parameter search.
The following examples demonstrate complete machine-learning pipelines, including data collection, data preparation, model training, and automated deployment. They show how you can:
- Run pipelines locally on a notebook.
- Run some or all tasks on an elastic Kubernetes cluster using serverless functions/jobs.
- Create automated ML workflows using KubeFlow Pipelines.
- Maintain the project lifecycle.
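The training task at the core of these pipelines is an Iris classifier with a hyper-parameter search. A minimal stand-alone sketch of that idea (using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, so it runs without extra dependencies; the grid values are illustrative, not the demo's exact parameters):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the Iris dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Small illustrative hyper-parameter grid; the demo searches XGBoost parameters.
grid = {"max_depth": [2, 3], "learning_rate": [0.1, 0.3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42), grid, cv=3)
search.fit(X_tr, y_tr)

# Evaluate the best model found by the search on the held-out split.
accuracy = search.best_estimator_.score(X_te, y_te)
```

In the demo this search runs as parallel MLRun tasks on the cluster rather than a single in-process grid search.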
The demo applications are tested on Iguazio's Data Science PaaS and use Iguazio's shared data fabric (v3io). They can be modified to work with any shared file storage by replacing the `apply(v3io_mount())` calls with other KubeFlow volume modifiers (e.g. `apply(mlrun.platforms.mount_pvc())`).
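As a sketch, the swap looks like the following (assuming MLRun is installed; the function name, image, and PVC details are placeholders, and the import is guarded so the snippet degrades gracefully where MLRun is absent):

```python
# Sketch: replacing the Iguazio v3io mount with a generic PVC mount.
# Function name, image, and PVC details below are placeholders.
try:
    import mlrun
    from mlrun.platforms import mount_pvc

    fn = mlrun.new_function("xgb-trainer", kind="job", image="mlrun/mlrun")
    # On Iguazio you would use: fn.apply(mlrun.mount_v3io())
    # On any other cluster, mount a shared PersistentVolumeClaim instead:
    fn.apply(mount_pvc(pvc_name="shared-data",
                       volume_name="data",
                       volume_mount_path="/mnt/data"))
    configured = True
except ImportError:
    # mlrun is not installed in this environment; the calls above are the point.
    configured = False
```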
You can request a free trial of Iguazio PaaS.
Pre-requisites:
- A Kubernetes cluster with KubeFlow and Nuclio pre-installed.
- The MLRun service and UI installed; see the MLRun readme.
- This repo cloned to your own Git fork.
- A client or notebook properly configured with MLRun and KubeFlow, in which you run:
  `mlrun project my-proj/ -u git://github.com/<your-fork>/demo-xgb-project.git`
- Run the local playground notebook to build, test, and run functions.
- Open the project notebook and follow the instructions to run an automated ML pipeline and manage source control.
Note: alternatively, you can run the `main` pipeline from the CLI and specify the artifacts path using:
`mlrun project my-proj/ -r main -p "v3io:///users/admin/kfp/{{workflow.uid}}/"`
Project files:
- Project notebook (load and run workflows)
- Project spec (functions, workflows, etc.)
- Local function spec (XGBoost)
- Function notebook (code, test, build, run)
- Function code (generated from the notebook)
- Workflow code (init + dsl)
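For orientation, a project spec of this kind typically looks like the abridged, illustrative YAML below; the function and workflow names and paths are assumptions, not the demo's exact contents:

```yaml
# Illustrative MLRun project spec (abridged); names and paths are placeholders.
name: my-proj
functions:
- name: xgb
  url: src/iris.yaml        # local function spec (the XGBoost trainer)
workflows:
- name: main
  path: src/workflow.py     # workflow code (init + dsl)
artifacts: []
```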