
This project forked from wbuchwalter/fairing


👩‍🔬 [Experimental] Easily train and serve ML models on Kubernetes, directly from your python code.

License: MIT License



Fairing

Easily train and serve ML models on Kubernetes, directly from your python code.

This project uses Metaparticle behind the scenes.

fairing allows you to express how you want your model to be trained and served using native python decorators.


Requirements

If you are going to use fairing on your local machine (as opposed to from a Jupyter Notebook deployed inside a Kubernetes cluster for example), you will need to have access to a deployed Kubernetes cluster, and have the kubeconfig for this cluster on your machine.

You will also need to have Docker installed locally.

Getting fairing

Note: This project requires Python 3.

pip install fairing

Or, in a Jupyter Notebook, create a new cell and execute: !pip install fairing.

Training

fairing provides a @Train class decorator allowing you to specify how you want your model to be packaged and trained.
Your model needs to be defined as a class to work with fairing.

This constraint makes it possible to support more complex training strategies and simplifies usage from within a Jupyter Notebook.

The following examples should help you understand how fairing works.

Simple Training

Your class needs to define a train method that will be called during training:

from fairing.train import Train

@Train(repository='<your-repo-name>')
class MyModel(object):
    def train(self):
        # Training logic goes here
        pass

Complete example: examples/simple-training/main.py

Hyperparameters Tuning

Allows you to run multiple trainings in parallel, each one with different values for your hyperparameters.

Your class should define a hyperparameters method that returns a dictionary of hyperparameters and their values. This dictionary will be automatically passed to your train method. Don't forget to add a new argument to your train method to receive the hyperparameters.

import random

from fairing.train import Train
from fairing.strategies.hp import HyperparameterTuning

@Train(
    repository='<your-repo-name>',
    strategy=HyperparameterTuning(runs=6),
)
class MyModel(object):
    def hyperparameters(self):
        return {
            'learning_rate': random.normalvariate(0.5, 0.45)
        }

    def train(self, hp):
        # Training logic goes here
        pass

To specify that we want to train our model with hyperparameter tuning rather than a simple training, we pass a strategy parameter to the @Train decorator and specify the number of runs we wish to create.

Complete example: examples/hyperparameter-tuning/main.py
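Note that random.normalvariate(0.5, 0.45) can occasionally return zero or negative values, which are not valid learning rates. A minimal sketch (plain Python, independent of fairing; the helper name is ours) of clamping each draw into a usable range:

```python
import random

def sample_learning_rate(mu=0.5, sigma=0.45, low=1e-4, high=1.0):
    # normalvariate is unbounded, so clamp each draw into [low, high]
    # before handing it to the training run.
    return min(max(random.normalvariate(mu, sigma), low), high)

# Each of the runs draws its own hyperparameters independently.
samples = [sample_learning_rate() for _ in range(6)]
```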

Population Based Training

We can also ask fairing to train our code using Population Based Training.

This is a more advanced training strategy that needs to hook into different lifecycle steps of your model, so we need to define several additional methods in our model class.

The name of a PVC that supports multiple readers and writers needs to be passed to the PopulationBasedTraining strategy. It is used to store and exchange the models generated during training, enabling the explore/exploit mechanism of Population Based Training.

from fairing.train import Train
from fairing.strategies.pbt import PopulationBasedTraining

@Train(
    repository='<your-repo-name>',
    strategy=PopulationBasedTraining(
        population_size=10,
        exploit_count=6,
        steps_per_exploit=5000,
        pvc_name='<pvc-name>',
        model_path=MODEL_PATH
    )
)
class MyModel(object):
    def hyperparameters(self):
        # Return the dictionary of hyperparameters
        pass

    def build(self, hp):
        # Build the model
        pass

    def train(self, hp):
        # Training logic goes here
        pass

    def save(self):
        # Save the model at MODEL_PATH
        pass

    def restore(self, model_path):
        # Restore the model from model_path
        pass

Complete example: examples/population-based-training/main.py
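To make the save/restore contract concrete, here is a minimal sketch of what those two methods might do, independent of fairing. The file layout and helper names are our own assumptions; in a real cluster, MODEL_PATH would be the mount point of the shared PVC passed via pvc_name, so every population member sees the same files (a temp directory stands in for it here):

```python
import json
import os
import tempfile

# Assumption: stand-in for the shared PVC mount point.
MODEL_PATH = os.path.join(tempfile.gettempdir(), 'pbt-demo-models')

def save_checkpoint(state, model_path=MODEL_PATH):
    # Persist state to the shared volume so other population
    # members can exploit it.
    os.makedirs(model_path, exist_ok=True)
    with open(os.path.join(model_path, 'checkpoint.json'), 'w') as f:
        json.dump(state, f)

def restore_checkpoint(model_path=MODEL_PATH):
    # Load the state most recently written by any member.
    with open(os.path.join(model_path, 'checkpoint.json')) as f:
        return json.load(f)

save_checkpoint({'learning_rate': 0.1, 'step': 5000})
restored = restore_checkpoint()
```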

Usage with Kubeflow

Simple TfJob

Instead of creating native Kubernetes Jobs, fairing can leverage Kubeflow's TfJobs, assuming you have Kubeflow installed in your cluster. Simply pass the Kubeflow architecture to the train decorator (note that you can still use all the training strategies mentioned above):

from fairing.train import Train
from fairing.architectures.kubeflow.basic import BasicArchitecture

@Train(repository='wbuchwalter', architecture=BasicArchitecture())
class MyModel(object):
    def train(self):
        # Training logic goes here
        pass

Distributed Training

Using Kubeflow, we can also ask fairing to start a distributed training instead. Simply import the DistributedTraining architecture instead of BasicArchitecture:

from fairing.train import Train
from fairing.architectures.kubeflow.distributed import DistributedTraining

@Train(
    repository='<your-repo-name>',
    architecture=DistributedTraining(ps_count=2, worker_count=5),
)
class MyModel(object):
    ...

Specify the number of desired parameter servers with ps_count and the number of workers with worker_count. Another instance of type master will always be created.

See https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow#modifying-your-model-to-use-tfjobs-tf_config to understand how you need to modify your model to support distributed training with Kubeflow.
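Under a TfJob, each replica receives a TF_CONFIG environment variable describing the cluster and the replica's own role. A minimal sketch of inspecting it (this follows the standard TensorFlow convention linked above, not a fairing API; the helper name and example addresses are ours):

```python
import json
import os

def cluster_role():
    # Return (job_type, task_index) for this replica from TF_CONFIG.
    tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
    task = tf_config.get('task', {})
    return task.get('type', 'master'), task.get('index', 0)

# Example: what the first worker replica might see under a TfJob.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'ps': ['ps-0:2222', 'ps-1:2222'],
                'worker': ['worker-0:2222'],
                'master': ['master-0:2222']},
    'task': {'type': 'worker', 'index': 0},
})
job_type, task_index = cluster_role()
```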

Complete example: examples/distributed-training/main.py

From a Jupyter Notebook

To make fairing work from a Jupyter Notebook deployed with Kubeflow, a few more requirements must be met (such as Knative Build being deployed). Refer to the dedicated documentation and example.

TensorBoard

You can easily attach a TensorBoard instance to monitor your training:

@Train(
    repository='<your-repo-name>',
    tensorboard={
      'log_dir': LOG_DIR,
      'pvc_name': '<pvc-name>',
      'public': True # Request a public IP
    }
)
class MyModel(object):
    ...
