h2o-kubeflow's Introduction

H2O + Kubeflow Integration

This is a project for the integration of H2O.ai and Kubeflow. The integration of H2O and Kubeflow is an extremely powerful opportunity, as it provides a turn-key solution for easily deployable and highly scalable machine learning applications, with minimal input required from the user.

Kubeflow

Kubeflow is an open source project managed by Google and built on top of their Kubernetes engine. It is designed to alleviate some of the more tedious tasks associated with machine learning. Kubeflow helps orchestrate deployment of apps through the full cycle of development, testing, and production, and allows for resource scaling as demand increases.

H2O 3

H2O 3’s goal is to reduce the time spent by data scientists on time-consuming tasks like designing grid search algorithms and tuning hyperparameters, while also providing an interface that allows newer practitioners an easy foothold into the machine learning space.

Driverless AI

Driverless AI is an artificial intelligence (AI) platform for automatic machine learning. Driverless AI automates some of the most difficult data science and machine learning workflows, such as feature engineering, model validation, model tuning, model selection and model deployment. It aims to achieve the highest predictive accuracy, comparable to expert data scientists, but in much shorter time thanks to end-to-end automation. Driverless AI also offers automatic visualizations and machine learning interpretability (MLI).

Contents

This repository contains all the necessary components for deploying H2O.ai's core products on Kubeflow:

h2o-kubeflow
|-- dockerfiles
    |-- A copy of the dockerfiles that are currently part of components in the POC
|-- h2o-kubeflow // --> Ksonnet registry containing all packages offered in this repo
    |-- h2oai
        |-- Ksonnet package containing deployment templates for core offerings from H2O.ai [H2O-3, Driverless AI]
    |-- <all other package directories>
        |-- Ksonnet packages built as a proof of concept. Not consistently maintained
    |-- registry.yaml // --> file defining all packages included in the registry

Quick Start

Complete deployment steps can be found inside this directory: https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow/h2oai.

Repository for Kubeflow can be found here, and complete steps to deploy Kubeflow can be found in their User Documentation

You will also need ksonnet and kubectl command line tools.

  • Create a Kubernetes cluster, either on-prem or on Google Cloud
  • Run the following commands to set up your ksonnet app (this is how you deploy Kubeflow)

NOTE: Kubeflow is managed by Google's Kubeflow team, and some of the commands to deploy Kubeflow's core components may change. Refer to https://www.kubeflow.org/docs/started/getting-started/ for comprehensive steps to launch Kubeflow. The H2O components do not require a running Kubeflow deployment, but they benefit from Kubeflow's core functionality. Launching Kubeflow before starting the H2O deployments is recommended but not required.

# create ksonnet app
ks init <my_ksonnet_app>
cd <my_ksonnet_app>

# add ksonnet registry to app containing all the kubeflow manifests as maintained by Google Kubeflow team
ks registry add kubeflow https://github.com/kubeflow/kubeflow/tree/master/kubeflow
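# add ksonnet registry to app containing all the h2o component manifests
ks registry add h2o-kubeflow https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow
# install the h2oai package from the h2o-kubeflow registry
ks pkg install h2o-kubeflow/h2oai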

# create namespace and environment for deployments
kubectl create namespace kubeflow
ks env add <my_environment_name>
ks prototype use io.ksonnet.pkg.h2oai-h2o3 h2o3 \
--name h2o3 \
--namespace kubeflow \
--memory 2 \
--cpu 1 \
--replicas 2 \
--model_server_image <location_of_docker_image>

ks apply <my_environment_name> -c h2o3
  • Run kubectl get svc -n kubeflow to find the External IP address.
  • Open a Jupyter notebook on a local computer that has H2O installed.
import h2o
h2o.init(ip="<External IP address>", port=54321)
  • You can now follow the steps for running H2O 3 AutoML that can be found here
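
The External IP lookup in the steps above can also be scripted. The following is a minimal sketch and not part of this repository: the helper name find_external_ip and the sample table are hypothetical, the service is assumed to be called h2o3 (matching the --name flag used above), and the parser assumes the default column layout of kubectl get svc (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE).

```python
from typing import Optional


def find_external_ip(svc_table: str, service_name: str) -> Optional[str]:
    """Return the EXTERNAL-IP of service_name from `kubectl get svc` text output.

    Returns None if the service is missing or its IP is a placeholder
    such as <pending> or <none>.
    """
    for line in svc_table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assumed columns: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
        if len(fields) >= 4 and fields[0] == service_name:
            ip = fields[3]
            return None if ip.startswith("<") else ip
    return None


# Hypothetical sample output, for illustration only.
SAMPLE = """\
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)           AGE
h2o3    LoadBalancer   10.3.240.12    35.200.10.20    54321:31234/TCP   5m
other   LoadBalancer   10.3.240.13    <pending>       80:30080/TCP      1m
"""

print(find_external_ip(SAMPLE, "h2o3"))
```

In practice the table text would come from running kubectl get svc -n kubeflow (for example through subprocess.run with capture_output=True and text=True), and the returned address would be passed as the ip argument of h2o.init together with port=54321.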

Burst to Cloud (NOT CONSISTENTLY MAINTAINED)

If you are interested in additional orchestration, follow the steps below to set up a Kubernetes cluster that can scale as demand for resources grows.

Note: This is a prototype and will continue to be changed/modified as time progresses.

  1. Start a machine with Ubuntu 16.04. This can be on-premises or in the cloud
  2. Copy all the scripts from the scripts folder in this repo to the machine
  3. Move deployment-status.service and deployment-status.timer to /etc/systemd/system/ and enable the services.
sudo mv deployment-status.service /etc/systemd/system/
sudo mv deployment-status.timer /etc/systemd/system/
sudo systemctl enable deployment-status.service deployment-status.timer
sudo systemctl start deployment-status.service deployment-status.timer
  4. Move deployment-status.sh, k8s_master_setup.sh and k8s_slave_setup.sh to a new directory /opt/kubeflow/
sudo mkdir /opt/kubeflow
sudo mv deployment-status.sh /opt/kubeflow/
sudo mv k8s_master_setup.sh /opt/kubeflow/
sudo mv k8s_slave_setup.sh /opt/kubeflow/
  5. Run sudo /opt/kubeflow/k8s_master_setup.sh. This script will modify k8s_slave_setup.sh with the commands needed to connect any other Ubuntu 16.04 machines to the Kubernetes cluster
  6. Run the new k8s_slave_setup.sh on any other machines you want to connect to the cluster
  7. k8s_slave_setup.sh will also create a new file called config.txt in /opt/kubeflow/. Modify its final line, KSONNET_APP, to hold the relative path to the directory created by ks init. For example, for /home/ubuntu/my_ksonnet_app use KSONNET_APP=my_ksonnet_app
  8. Use kubectl get nodes to ensure that all nodes are attached properly to the cluster
  9. Follow the steps above to deploy H2O on Kubeflow + Kubernetes
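
The config.txt edit described in the steps above could be automated along these lines. This is only a sketch under assumptions: the helper set_ksonnet_app is hypothetical, config.txt is assumed to hold plain KEY=value lines, and the path is taken relative to the user's home directory, as the /home/ubuntu/my_ksonnet_app example suggests. Only the KSONNET_APP key comes from the walkthrough itself.

```python
import os


def set_ksonnet_app(config_text: str, app_dir: str, base_dir: str) -> str:
    """Return config_text with KSONNET_APP set to app_dir relative to base_dir."""
    relative = os.path.relpath(app_dir, start=base_dir)
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("KSONNET_APP"):
            lines[i] = f"KSONNET_APP={relative}"
            break
    else:
        # No existing entry was found: append one as the final line.
        lines.append(f"KSONNET_APP={relative}")
    return "\n".join(lines) + "\n"


# Hypothetical file contents; SOME_OTHER_SETTING is a made-up placeholder.
example = "SOME_OTHER_SETTING=1\nKSONNET_APP=\n"
print(set_ksonnet_app(example, "/home/ubuntu/my_ksonnet_app", "/home/ubuntu"))
```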

h2o-kubeflow's People

Contributors

fjudith, nkpng2k


h2o-kubeflow's Issues

Documentation Update:

Hello,

The README.md uses the master branch of kubeflow. As of the v0.2.0 change, the following command is no longer required, as tf-hub has been merged into kubeflow/core.

ks pkg install kubeflow/tf-job

Using kubeflow v0.1.x requires the following commands:

KUBEFLOW_VERSION=v0.1.3
KUBEFLOW_REGISTRY="github.com/kubeflow/kubeflow/tree/${KUBEFLOW_VERSION}/kubeflow"

ks pkg install kubeflow/core@${KUBEFLOW_VERSION}
ks pkg install kubeflow/tf-serving@${KUBEFLOW_VERSION}
ks pkg install kubeflow/tf-job@${KUBEFLOW_VERSION}

Startup/flatfile generation is broken with the latest kubectl

In kubectl 1.13.1, "kubectl get pods -o wide" returns two extra columns "NOMINATED NODE" and "READINESS GATES", both of which may contain "<none>". This breaks docker-startup.sh in such a way that it never starts:

if $(kubectl get pods -o wide | grep -q "<none>")
then
sleep 5

It should also include "grep $DEP_NAME" to focus on h2o pods only.

Another issue related to filtering is that the flatfile may catch any old pods stuck in the "Terminated" state due to the second filter not being specific enough. It should include something like "grep Running".
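
One way to avoid depending on the column layout at all is to read the JSON output of kubectl get pods -o json instead of the wide table. The sketch below illustrates the two filters suggested in this issue (match the deployment name, keep only running pods). It is not the code in docker-startup.sh: the helper build_flatfile, the sample pod names, and the assumption that the flatfile holds one ip:port entry per line on port 54321 are illustrative. The fields metadata.name, metadata.deletionTimestamp, status.phase and status.podIP are standard Kubernetes pod fields.

```python
import json
from typing import List


def build_flatfile(pods_json: str, dep_name: str, port: int = 54321) -> List[str]:
    """Return "ip:port" entries for running pods whose name starts with dep_name."""
    entries = []
    for pod in json.loads(pods_json).get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        if not meta.get("name", "").startswith(dep_name):
            continue  # not one of the H2O pods
        if status.get("phase") != "Running" or "deletionTimestamp" in meta:
            continue  # skip pending, finished, or terminating pods
        ip = status.get("podIP")
        if ip:
            entries.append(f"{ip}:{port}")
    return entries


# Hypothetical pod list, shaped like `kubectl get pods -o json` output.
SAMPLE_PODS = json.dumps({"items": [
    {"metadata": {"name": "h2o3-0"},
     "status": {"phase": "Running", "podIP": "10.0.0.1"}},
    {"metadata": {"name": "h2o3-1"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "h2o3-old", "deletionTimestamp": "2019-01-01T00:00:00Z"},
     "status": {"phase": "Running", "podIP": "10.0.0.2"}},
    {"metadata": {"name": "other-0"},
     "status": {"phase": "Running", "podIP": "10.0.0.9"}},
]})

print("\n".join(build_flatfile(SAMPLE_PODS, "h2o3")))
```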

can't install h2o-kubeflow/h2oai : Error resolve registry library. blob size and Git Data API

On kubeflow 0.4.1

ks pkg install h2o-kubeflow/h2oai

ERROR resolve registry library: GET https://api.github.com/repos/h2oai/h2o-kubeflow/contents/h2o-kubeflow/h2oai/dockerfiles/DAIMojoRestServer4-1.11.1.jar?ref=48ea56153149c036a75e47cb9f58375f4032f193: 403 This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size. [{Resource:Blob Field:data Code:too_large Message:}] 

Issue with more than one h2o3 replica

Hello,

I'm getting the following error when I try to follow the AutoML tutorial.
I suspect it is related to client session stickiness to h2o3 instances, because it does not happen when I'm using only one replica.

image

Stuck at notebook deployment

Hello,

I'm trying to follow your guide on a local cluster running CoreOS and Kubernetes 1.10.0.

I'm currently stuck on the deployment of the notebook from the JupyterHub web interface.

Below are the logs from the tf-hub-0 pod.

 [I 2018-04-04 20:40:57.308 JupyterHub log:122] 302 POST /hub/spawn → /user/admin/ ([email protected]) 10269.41ms
[I 2018-04-04 20:40:57.326 JupyterHub log:122] 302 GET /user/admin/ → /hub/user/admin/ (@127.0.0.1) 2.63ms
[I 2018-04-04 20:41:07.356 JupyterHub base:722] Pending spawn for admin didn't finish in 10.0 seconds
[I 2018-04-04 20:41:07.357 JupyterHub base:727] admin is pending spawn
[I 2018-04-04 20:41:07.380 JupyterHub log:122] 200 GET /hub/user/admin/ ([email protected]) 10032.90ms
[W 2018-04-04 20:41:18.158 JupyterHub user:458] admin's server never showed up at http://10.2.74.46:8888/user/admin/ after 30 seconds. Giving up
[I 2018-04-04 20:41:23.669 JupyterHub base:722] Pending spawn for admin didn't finish in 10.0 seconds
[I 2018-04-04 20:41:23.670 JupyterHub base:727] admin is pending stop 

Similar logs are returned by Kubernetes 1.9.x.

Running a curl loop from tf-hub-0 to http://10.2.74.46:8888/user/admin shows that the connection is refused.

root@tf-hub-0:/# until curl http://10.2.74.46:8888/user/admin; do sleep 1 ; done
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
curl: (7) Failed to connect to 10.2.74.46 port 8888: Connection refused
### POD gets deleted after 30s ###
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
curl: (7) Failed to connect to 10.2.74.46 port 8888: No route to host
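
The same reachability check can be written with a bounded wait instead of an open-ended shell loop. This is a rough sketch with a hypothetical helper name, wait_for_port. It only tests whether a TCP connection can be opened, so it is a weaker check than the curl request above, which also speaks HTTP.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Retry a TCP connect until it succeeds or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "connection refused", "no route to host", and timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call, using the pod address from the logs above:
# reachable = wait_for_port("10.2.74.46", 8888, timeout=30.0)
```

The default of 30 seconds mirrors the window after which JupyterHub gives up in the log above.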

Logs from the jupyter-admin pod

 usermod: no changes
Execute the command as jovyan
[W 2018-04-04 21:02:26.963 SingleUserNotebookApp configurable:168] Config option `open_browser` not recognized by `SingleUserNotebookApp`.  Did you mean `browser`? 

h2o3-static : 404 Not Found

ks pkg install h2o-kubeflow3/h2o3-static
shows:
ERROR resolve registry library: GET https://api.github.com/repos/h2oai/h2o-kubeflow/contents//h2o3-static?ref=7d3b278732d4c73dc15be61551a169f0ba7f63f6: 404 Not Found []

When I try to open this URL, nothing is there.
Maybe the URL should be:
https://api.github.com/repos/h2oai/h2o-kubeflow/contents/h2o-kubeflow/h2o3-static
Any help?

h2o3-static: ERROR No prototypes names matched 'io.ksonnet.pkg.h2o3-static'

Hello,

I'm facing the following error message when applying the recent document update.

~/h2oworkspace/kubeflow-demo# KS_ENV=local
~/h2oworkspace/kubeflow-demo# ks apply ${KF_ENV} -c h2o3-static -n kubeflow
~/h2oworkspace/kubeflow-demo# ERROR No prototype names matched 'io.ksonnet.pkg.h2o3-static'
~/h2oworkspace/kubeflow-demo# tree
.
├── app.yaml
├── components
│   ├── kubeflow-core.jsonnet
│   └── params.libsonnet
├── environments
│   ├── base.libsonnet
│   ├── default
│   │   ├── main.jsonnet
│   │   └── params.libsonnet
│   └── local
│       ├── main.jsonnet
│       └── params.libsonnet
├── lib
│   └── v1.7.0
│       ├── k8s.libsonnet
│       ├── k.libsonnet
│       └── swagger.json
└── vendor
    └── kubeflow
        ├── core
        │   ├── all.libsonnet
        │   ├── ambassador.libsonnet
        │   ├── centraldashboard.libsonnet
        │   ├── cert-manager.libsonnet
        │   ├── cloud-endpoints.libsonnet
        │   ├── iap.libsonnet
        │   ├── jupyterhub.libsonnet
        │   ├── kubeform_spawner.py
        │   ├── nfs.libsonnet
        │   ├── parts.yaml
        │   ├── prototypes
        │   │   ├── all.jsonnet
        │   │   ├── cert-manager.jsonnet
        │   │   ├── cloud-endpoints.jsonnet
        │   │   └── iap-ingress.jsonnet
        │   ├── README.md
        │   ├── spartakus.libsonnet
        │   ├── tests
        │   │   ├── ambassador_test.jsonnet
        │   │   ├── iap_test.jsonnet
        │   │   ├── jupyterhub_test.jsonnet
        │   │   ├── nfs_test.jsonnet
        │   │   ├── spartakus_test.jsonnet
        │   │   ├── tf-job_test.jsonnet
        │   │   └── util_test.jsonnet
        │   ├── tf-job.libsonnet
        │   ├── util.libsonnet
        │   ├── version-info.json
        │   └── version.libsonnet
        └── tf-serving
            ├── parts.yaml
            ├── prototypes
            │   └── tf-serving-all-features.jsonnet
            ├── README.md
            ├── tf-serving.libsonnet
            └── util.libsonnet

13 directories, 43 files

unable to pull prototype : core

Hi,

When I tried to configure ks, Kubeflow and H2O, I got stuck partway through because of the error below:
$ks generate core kubeflow-core --name=kubeflow-core --namespace=kubeflow-test
ERROR no prototype names matched 'core'

I did not find any prototype named core at all:
[root@k8smastr-vanaraj sibin-ml]# ks prototype list
NAME                                      DESCRIPTION
====                                      ===========
io.ksonnet.pkg.ambassador                 Ambassador
io.ksonnet.pkg.centraldashboard           centraldashboard
io.ksonnet.pkg.cert-manager               Certificate generation on GKE.
io.ksonnet.pkg.cloud-endpoints            Cloud Endpoint domain creation.
io.ksonnet.pkg.configMap                  A simple config map with optional user-specified data
io.ksonnet.pkg.deployed-service           A deployment exposed with a service
io.ksonnet.pkg.echo-server                A simple echo server.
io.ksonnet.pkg.google-cloud-filestore-pv  Creates PV and PVC based on Google Cloud Filestore NFS
io.ksonnet.pkg.h2o3-static                H2O3 Static Cluster
io.ksonnet.pkg.iap-ingress                Ingress for IAP on GKE.
io.ksonnet.pkg.jupyterhub                 jupyterhub Component
io.ksonnet.pkg.metric-collector           Service monitor for kubeflow on GCP.
io.ksonnet.pkg.namespace                  Namespace with labels automatically populated from the name
io.ksonnet.pkg.prometheus                 Prometheus Service.
io.ksonnet.pkg.single-port-deployment     Replicates a container n times, exposes a single port
io.ksonnet.pkg.single-port-service        Service that exposes a single port
io.ksonnet.pkg.spartakus                  spartakus component for usage collection
io.ksonnet.pkg.tensorboard                ksonnet components for Tensorboard
io.ksonnet.pkg.tf-job-operator            A TensorFlow job operator.
io.ksonnet.pkg.tf-serving                 A TensorFlow serving deployment
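
A listing like the one above can be checked for a given prototype before calling ks generate. The following is a hypothetical helper, not part of ksonnet: it assumes every data row starts with a name beginning with "io." followed by a free-text description, and it compares only the last dotted component of each name. ksonnet's own matching rules may differ.

```python
from typing import Dict


def parse_prototype_list(output: str) -> Dict[str, str]:
    """Map prototype name -> description from `ks prototype list` text output."""
    prototypes = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Keep only data rows; header and separator rows do not start with "io.".
        if parts and parts[0].startswith("io."):
            prototypes[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return prototypes


def has_prototype(output: str, name: str) -> bool:
    """True if name equals a full prototype name or its last dotted component."""
    return any(full == name or full.rsplit(".", 1)[-1] == name
               for full in parse_prototype_list(output))


# Excerpt of the listing above.
LISTING = """\
NAME DESCRIPTION
==== ===========
io.ksonnet.pkg.h2o3-static H2O3 Static Cluster
io.ksonnet.pkg.jupyterhub jupyterhub Component
io.ksonnet.pkg.tf-serving A TensorFlow serving deployment
"""

print(has_prototype(LISTING, "h2o3-static"), has_prototype(LISTING, "core"))
```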
