This project is forked from netapp/kubeflow_jupyter_pipeline.

License: BSD 3-Clause "New" or "Revised" License

Kubeflow and Jupyter Examples

This repository contains example Kubeflow pipeline definitions and Jupyter Notebooks that show how data scientists and data engineers can incorporate NetApp data management functions using Kubeflow and Jupyter. The pipeline definitions show how these functions can be incorporated as steps in automated Kubeflow pipeline workflows; the Notebooks show how they can be performed on demand from within a Jupyter Notebook environment. For comprehensive documentation on NetApp's Kubeflow and Jupyter integrations, refer to TR-4798.

Kubeflow Pipeline Definitions

Note: All example Kubeflow pipeline definitions are in the 'Pipelines' folder.

ai-training-run.py

  • Description: Python script that creates a Kubeflow pipeline definition for an AI/ML model training run with built-in, near-instantaneous dataset and model versioning and traceability. This is intended to demonstrate how a data scientist could define an automated AI/ML workflow that incorporates automated dataset and model versioning and traceability.
  • Instructions for Use: When you execute this script, it will produce a Kubeflow pipeline definition in the form of a YAML file that can then be uploaded to the Kubeflow dashboard. Whenever you then execute the pipeline from the Kubeflow dashboard or via a pre-scheduled run, snapshots of the dataset volume and the model volume will be triggered for versioning and traceability purposes. For detailed documentation, refer to section 7.4 in TR-4798.
  • Dependencies: The following Python modules are required in order to execute the script (these can be installed with pip) - kfp, kubernetes, netapp_ontap
  • Compatibility: This example pipeline only supports volumes that reside on NetApp ONTAP storage systems or software-defined instances.
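The kfp-based definition scripts in this folder all follow the same compile-to-YAML pattern: define a pipeline function with the DSL, then compile it to a YAML file. The sketch below is a hypothetical illustration of that pattern using the kfp v1 DSL; the step names, images, and echo commands are placeholders, not the actual contents of ai-training-run.py.

```python
def pipeline_filename(script_name: str) -> str:
    """Derive the output YAML filename from a pipeline script name."""
    return script_name.removesuffix(".py") + ".yaml"


def build_and_compile(output_path: str = "ai-training-run.yaml") -> str:
    """Define a snapshot -> train -> snapshot pipeline and compile it to YAML."""
    # kfp is imported inside the function so the helper above has no
    # third-party dependencies.
    import kfp.dsl as dsl
    import kfp.compiler

    @dsl.pipeline(
        name="ai-training-run",
        description="Training run with dataset and model snapshots.",
    )
    def training_pipeline():
        # Placeholder steps; the real script invokes NetApp data
        # management functions at these points.
        dataset_snap = dsl.ContainerOp(
            name="dataset-snapshot", image="python:3.9",
            command=["echo", "snapshot dataset volume"])
        train = dsl.ContainerOp(
            name="train-model", image="python:3.9",
            command=["echo", "run training"])
        train.after(dataset_snap)  # train only after the dataset snapshot
        model_snap = dsl.ContainerOp(
            name="model-snapshot", image="python:3.9",
            command=["echo", "snapshot model volume"])
        model_snap.after(train)    # version the model after training

    kfp.compiler.Compiler().compile(training_pipeline, output_path)
    return output_path
```

Running such a script once produces the YAML file, which is then uploaded via the Kubeflow dashboard (Pipelines > Upload pipeline).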

create-data-scientist-workspace.py

  • Description: Python script that creates a Kubeflow pipeline definition for a workflow that can be used to near-instantaneously clone potentially massive datasets for use in a developer workspace. This is intended to demonstrate how a data scientist or data engineer could define an automated AI/ML workflow that incorporates the rapid cloning of datasets for use in workspaces, etc.
  • Instructions for Use: When you execute this script, it will produce a Kubeflow pipeline definition in the form of a YAML file that can then be uploaded to the Kubeflow dashboard. Whenever you then execute the pipeline from the Kubeflow dashboard or via a pre-scheduled run, a clone of the volume that contains the dataset will be created and instructions for subsequently provisioning a Jupyter Notebook workspace with access to the new clone will be printed to the logs. For detailed documentation, refer to section 7.5 in TR-4798.
  • Dependencies: The following Python modules are required in order to execute the script (these can be installed with pip) - kfp
  • Compatibility: This example pipeline is not compatible with NetApp FlexGroup volumes. At the time of this posting, FlexGroup volumes must be cloned by using ONTAP System Manager, the ONTAP CLI, or the ONTAP API, and then imported into the Kubernetes cluster. For details about importing a volume using Trident, see section 8.1 in TR-4798.
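For context on what the clone step amounts to at the storage layer: with Trident as the CSI provisioner, recent releases also allow a dataset clone to be requested declaratively through a PVC that names the source PVC as its dataSource. The manifest below is an illustrative sketch; the names, namespace, storage class, and size are placeholders, and the requested size must be at least that of the source volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-clone        # placeholder name for the cloned workspace volume
  namespace: data-science
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ontap-nas
  resources:
    requests:
      storage: 1Ti           # must be >= the source PVC's size
  dataSource:
    kind: PersistentVolumeClaim
    name: dataset            # existing PVC to clone
```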

replicate-data-cloud-sync.py

  • Description: Python script that creates a Kubeflow pipeline definition for a workflow that can be used to trigger a Cloud Sync replication update (for an existing Cloud Sync relationship). This is intended to demonstrate how a data scientist or data engineer could define an automated AI/ML workflow that incorporates Cloud Sync for data movement between platforms (e.g. NFS, S3) and/or across environments (e.g. edge data center, core data center, private cloud, public cloud).
  • Instructions for Use: When you execute this script, it will produce a Kubeflow pipeline definition in the form of a YAML file that can then be uploaded to the Kubeflow dashboard. Whenever you then execute the pipeline from the Kubeflow dashboard or via a pre-scheduled run, a Cloud Sync replication update will be triggered. For detailed documentation, refer to section 7.7 in TR-4798.
  • Dependencies: The following Python modules are required in order to execute the script (these can be installed with pip) - kfp, kubernetes, requests
  • Compatibility: This example pipeline is compatible with any existing Cloud Sync relationship.
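As a rough illustration of what such a trigger looks like, the snippet below PUTs to the Cloud Sync "sync" endpoint with requests. The base URL and endpoint path are assumptions based on the public Cloud Sync API and should be verified against the current API documentation; the relationship ID and bearer token are placeholders.

```python
def sync_endpoint(relationship_id: str,
                  base_url: str = "https://api.cloudsync.netapp.com") -> str:
    """Build the URL for the assumed Cloud Sync 'trigger sync' endpoint."""
    return f"{base_url}/api/relationships/{relationship_id}/sync"


def trigger_sync(relationship_id: str, bearer_token: str) -> int:
    """PUT to the sync endpoint to start a replication update."""
    import requests  # imported here; the URL helper above is dependency-free
    resp = requests.put(
        sync_endpoint(relationship_id),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    resp.raise_for_status()  # surface HTTP errors instead of returning them
    return resp.status_code
```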

replicate-data-snapmirror.py

  • Description: Python script that creates a Kubeflow pipeline definition for a workflow that can be used to trigger an asynchronous SnapMirror replication update. This is intended to demonstrate how a data scientist or data engineer could define an automated AI/ML workflow that incorporates SnapMirror replication for data movement across sites.
  • Instructions for Use: When you execute this script, it will produce a Kubeflow pipeline definition in the form of a YAML file that can then be uploaded to the Kubeflow dashboard. Whenever you then execute the pipeline from the Kubeflow dashboard or via a pre-scheduled run, a SnapMirror replication update will be triggered via Ansible. For detailed documentation, refer to section 7.6 in TR-4798.
  • Dependencies: The following Python modules are required in order to execute the script (these can be installed with pip) - kfp, kubernetes, ansible, netapp-lib
  • Compatibility: This example pipeline only supports volumes that reside on NetApp ONTAP storage systems or software-defined instances.
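For reference, an Ansible task that triggers an update on an existing SnapMirror relationship typically uses the na_ontap_snapmirror module from the netapp.ontap collection. The playbook below is a hypothetical sketch: the hostname, SVM/volume paths, and credentials are placeholders, and the parameter names should be checked against the module documentation for your collection version.

```yaml
# Hypothetical playbook; all values are placeholders.
- name: Trigger SnapMirror replication update
  hosts: localhost
  collections:
    - netapp.ontap
  tasks:
    - name: Update an existing SnapMirror relationship
      na_ontap_snapmirror:
        state: present                         # with an existing relationship,
                                               # this triggers an update
        source_path: "svm_src:vol_dataset"
        destination_path: "svm_dst:vol_dataset_dr"
        hostname: "{{ dest_cluster }}"         # destination cluster mgmt LIF
        username: "{{ ontap_user }}"
        password: "{{ ontap_password }}"
```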

Jupyter Notebooks

Note: All example Jupyter Notebooks are in the 'Notebooks' folder.

Cloud-Sync.ipynb

  • Description: Jupyter Notebook containing Python code that can be used to trigger a Cloud Sync replication update (for an existing Cloud Sync relationship). This is intended to demonstrate how a data scientist could trigger a Cloud Sync replication update within the interactive Jupyter Notebook environment, without having to access another, potentially unfamiliar, tool.
  • Instructions for Use: Simply upload to a workspace that supports Jupyter Notebooks. For detailed documentation, refer to section 7.3 in TR-4798.
  • Dependencies: The following Python modules are required in order to execute the code contained within the Notebook (these can be installed with pip) - requests
  • Compatibility: This example notebook is compatible with any existing Cloud Sync relationship.

SnapMirror.ipynb

  • Description: Jupyter Notebook containing Python code that can be used to trigger an asynchronous SnapMirror replication update via Ansible. This is intended to demonstrate how a data scientist could trigger a SnapMirror replication update within the interactive Jupyter Notebook environment, without having to access another, potentially unfamiliar, tool.
  • Instructions for Use: Simply upload to a workspace that supports Jupyter Notebooks.
  • Dependencies: The following Python modules are required in order to execute the code contained within the Notebook (these can be installed with pip) - ansible, netapp-lib
  • Compatibility: This example notebook only supports volumes that reside on NetApp ONTAP storage systems or software-defined instances.

Snapshot.ipynb

  • Description: Jupyter Notebook containing Python code that can be used to trigger a snapshot for near instantaneous dataset or model versioning and/or traceability. This is intended to demonstrate how a data scientist could implement dataset or model versioning and/or traceability via snapshots from within the interactive Jupyter Notebook environment, without having to access another, potentially unfamiliar, tool.
  • Instructions for Use: Simply upload to a workspace that supports Jupyter Notebooks. For detailed documentation, refer to section 7.2 in TR-4798. You may also want to review this blog post.
  • Dependencies: The following Python modules are required in order to execute the code contained within the Notebook (these can be installed with pip) - netapp_ontap
  • Compatibility: This example notebook only supports volumes that reside on NetApp ONTAP storage systems or software-defined instances.
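As a hedged sketch of what snapshot-based versioning looks like with the netapp_ontap library (the actual notebook may differ): a helper builds a timestamped snapshot name, and a second function looks up the volume by name and posts a snapshot of it. The cluster address, credentials, and volume name are placeholders, and verify=False disables TLS verification for lab use only.

```python
from datetime import datetime


def snapshot_name(prefix: str = "dataset") -> str:
    """Build a timestamped snapshot name, e.g. dataset_20240101_120000."""
    return f"{prefix}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"


def take_volume_snapshot(cluster: str, username: str, password: str,
                         volume_name: str) -> str:
    """Trigger an ONTAP snapshot of the named volume; returns the snapshot name."""
    # netapp_ontap is imported here so snapshot_name() above stays usable
    # without the library installed.
    from netapp_ontap import HostConnection, config
    from netapp_ontap.resources import Volume, Snapshot

    config.CONNECTION = HostConnection(cluster, username=username,
                                       password=password, verify=False)
    volume = Volume.find(name=volume_name)    # look up the volume by name
    name = snapshot_name(volume_name)
    Snapshot(volume.uuid, name=name).post()   # create the snapshot
    return name
```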

Disclaimers

(c) 2019 NetApp Inc. (NetApp), All Rights Reserved

NetApp disclaims all warranties, excepting NetApp shall provide support of unmodified software pursuant to a valid, separate, purchased support agreement. No distribution or modification of this software is permitted by NetApp, except under separate written agreement, which may be withheld at NetApp's sole discretion.

THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contributors: mboglesby, unnatural940
