clipper-v0's Issues

Provide extra support for Scikit-Learn models

Scikit-Learn models should be extremely simple to deploy in Clipper. I have two ideas in mind to help with this.

First, we should write a small Python library that can be imported into a Jupyter notebook and used something like this:

>>> from sklearn import svm
>>> import clipper_lib
>>> mymodel = svm.SVC()
>>> mymodel.fit(X, Y)
>>> clipper_config = clipper_lib.ClipperConf()
>>> cl = clipper_lib.start(clipper_config, name="pytest")
>>> cl.add_model(mymodel, name="model1")
>>> cl.status()
>>> print(cl)
"Clipper running at 127.0.0.1:1337 serving the pytest application"
>>> # From another session, connect to the already-running instance:
>>> cl2 = clipper_lib.connect(clipper_config)
>>> cl.stop()

The idea with the library is to automatically deploy Clipper and add models to a running Clipper instance directly from a Jupyter notebook.

Second, we should provide a script to automatically deploy Clipper and all dependencies (including model wrappers) that just needs to be pointed to a serialized Scikit-Learn model (or models).
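A rough sketch of how such a script could work (the script name, its flags, and the clipper_lib module are all hypothetical at this point):

#!/usr/bin/env python
# deploy_clipper.py -- hypothetical one-shot deploy script.
# Loads one or more pickled scikit-learn models, starts Clipper and its
# dependencies, and registers each model with the running instance.
import argparse
import pickle

import clipper_lib  # the library proposed above; name not final


def main():
    parser = argparse.ArgumentParser(
        description="Deploy Clipper with serialized scikit-learn models")
    parser.add_argument("model_files", nargs="+",
                        help="paths to pickled scikit-learn models")
    args = parser.parse_args()

    cl = clipper_lib.start(clipper_lib.ClipperConf())
    for i, path in enumerate(args.model_files):
        with open(path, "rb") as f:
            cl.add_model(pickle.load(f), name="model%d" % i)


if __name__ == "__main__":
    main()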

Possible API functionality

  • start
  • stop
  • restart
  • connect
  • deploy model
  • update model (include options for how to deal with cache, correction policies)
  • set config parameter
  • status (hierarchical description of models, including current params)
  • metrics

Python FFI bindings for external query interface

For some types of ML serving applications, a highly concurrent REST API doesn't make sense as the primary serving API. Particularly for reinforcement learning, being able to query Clipper from a Python library directly is much easier. Add a blocking external query interface with Python FFI bindings.
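A minimal sketch of what the Python side could look like via ctypes, assuming the Rust code is built as a shared library exposing a C-ABI entry point (the library name and symbol below are placeholders, not an existing interface):

import ctypes

# Hypothetical shared library built from the Clipper Rust code.
_lib = ctypes.CDLL("libclipper.so")
_lib.clipper_predict_blocking.argtypes = [
    ctypes.c_uint32,                  # user id
    ctypes.POINTER(ctypes.c_double),  # input feature vector
    ctypes.c_size_t,                  # input length
]
_lib.clipper_predict_blocking.restype = ctypes.c_double


def predict(uid, x):
    """Block until Clipper returns a prediction for input vector x."""
    arr = (ctypes.c_double * len(x))(*x)
    return _lib.clipper_predict_blocking(uid, arr, len(x))

A blocking call like this keeps a reinforcement learning loop simple: query, observe reward, update, repeat.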

Clean shutdown of running Clipper instance

Currently Clipper is stopped by killing the process and letting the OS clean up everything. Instead, there are several resources that should be cleanly shut down (e.g. terminating TCP connections to model wrappers). Using Ctrl-C and a signal handler is probably the simplest way to trigger shutdown. Once a shutdown is triggered, the REST API should stop accepting requests but allow all existing requests to finish. Then we need to signal all child threads to stop. Most of the threads communicate with each other through mpsc channels and message passing, so for the most part we should be able to signal termination by disconnecting the channels. The metrics thread needs special-case handling, and a few others might as well.

My hope is that this will allow the model-wrapper RPC servers to detect a client has disconnected and start listening for a new connection.

Update terminology to match the paper

We should standardize terminology around the terms used in the Clipper paper. E.g., features -> models/model wrappers, tasks -> correction policies.

Estimating distributional shift in input data

For monitoring, it would be useful if we could actively monitor the distribution of the input data and determine if/when there is a distributional shift. Some potential strategies:

  • Examine the distributions of input data conditioned on their predicted class
  • Clustering on the inputs/distribution of outputs
  • Images can potentially be clustered by examining their vector embedding (via the hidden layer/output of some deep network) but might be slow
  • Text clustering using K-means or hierarchical clustering - maybe too specialized
  • Mann–Whitney U test

To start, we should implement a logging mechanism for input data.
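As a starting point for the Mann–Whitney idea, here is a sketch that compares a logged reference window of a single input feature against the most recent window (assumes scipy is available; in practice this would run per feature over the logged data):

import numpy as np
from scipy.stats import mannwhitneyu


def detect_shift(reference, recent, alpha=0.01):
    """Flag a shift when the Mann-Whitney U test rejects the hypothesis
    that the two windows were drawn from the same distribution."""
    _, p_value = mannwhitneyu(reference, recent, alternative="two-sided")
    return p_value < alpha


# Example: the recent window is drawn from a shifted distribution.
rng = np.random.RandomState(0)
reference = rng.normal(0.0, 1.0, size=5000)  # logged historical inputs
recent = rng.normal(0.5, 1.0, size=1000)     # latest window of inputs
print(detect_shift(reference, recent))       # True: shift detected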

Monitoring the distribution of predicted y-values

Using the Clipper system metrics, we would like to track the distribution of predicted y-values across tasks such as binary classification, multiclass classification and regression. This can be done (for binary and multiclass classification) by creating a fixed-bucket histogram within Clipper metrics. For now, we will not focus on regression.
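A toy version of the fixed-bucket histogram, sketched in plain Python rather than the actual Clipper metrics types:

from collections import Counter


class PredictionHistogram:
    """Fixed-bucket histogram over predicted class labels."""

    def __init__(self, classes):
        self.counts = Counter({c: 0 for c in classes})

    def observe(self, predicted_class):
        self.counts[predicted_class] += 1

    def distribution(self):
        total = sum(self.counts.values()) or 1
        return {c: n / float(total) for c, n in self.counts.items()}


hist = PredictionHistogram(classes=[0, 1])
for y_hat in [0, 1, 1, 1, 0]:
    hist.observe(y_hat)
print(hist.distribution())  # {0: 0.4, 1: 0.6}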

Automate the Grafana setup process

Minimize the number of steps needed for users to initialize and access Grafana for metrics visualization. Ideally, the user should simply be able to navigate to a local URL and immediately be presented with relevant visualizations.
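One possible approach, sketched against Grafana's HTTP API (the datasource type, URL, and dashboard file are placeholders that depend on how Clipper ends up exporting metrics):

import json
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")  # default Grafana credentials

# Register a datasource pointing at wherever Clipper publishes metrics.
# "influxdb" is a placeholder type here.
requests.post(GRAFANA + "/api/datasources", auth=AUTH, json={
    "name": "clipper-metrics",
    "type": "influxdb",
    "url": "http://localhost:8086",
    "access": "proxy",
})

# Import a pre-built dashboard so the user lands on working visualizations.
with open("clipper_dashboard.json") as f:
    dashboard = json.load(f)
requests.post(GRAFANA + "/api/dashboards/db", auth=AUTH,
              json={"dashboard": dashboard, "overwrite": True})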

Update libsvm/liblinear wrappers for improved portability

Clipper should move the wrapper package back into the clipper-server code. We can then create liblinear-sys and libsvm-sys packages that will compile the C libraries from source rather than assuming they are available in /usr/local/lib.

Variable latency predictions

Rather than having one static latency objective, Clipper should support a latency objective associated with each prediction request.
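For example, the request payload could carry its own objective; the endpoint path and latency_slo_micros field below are hypothetical, not the current API:

import requests

# Hypothetical per-request latency objective attached to a prediction.
response = requests.post("http://localhost:1337/predict", json={
    "uid": 1,
    "input": [0.1, 0.4, 0.3],
    "latency_slo_micros": 20000,  # this request tolerates 20 ms
})
print(response.json())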

Unify Clipper deployment

Currently Clipper deployment relies on having a running Redis instance and model wrappers. It would be good to have a unified script that launches and shuts down all of them together.
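A sketch of what a unified launcher could look like (binary names, paths, and arguments are placeholders):

import atexit
import subprocess


def launch_all():
    """Start Redis, the model wrappers, and Clipper together, and tear
    them all down when this script exits."""
    procs = [
        subprocess.Popen(["redis-server", "--port", "6379"]),
        # Placeholder model wrapper command line:
        subprocess.Popen(["python", "sklearn_wrapper.py", "--model", "svm.pkl"]),
        subprocess.Popen(["./clipper-server", "--conf", "clipper.toml"]),
    ]

    def teardown():
        for p in reversed(procs):  # stop Clipper first, Redis last
            p.terminate()
            p.wait()

    atexit.register(teardown)
    return procs


if __name__ == "__main__":
    launch_all()
    input("Press Enter to shut everything down...")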

Allow new users to be added dynamically at runtime

Right now the Correction Model Table is statically allocated when Clipper is initialized and new users cannot be added to the running system. This was just for ease of programming, but needs to be fixed so that the correction model table is growable.

Tracking Prediction Error

For active monitoring of system and model performance, we would like to track prediction error with respect to all of the data and feedback; as the system runs, we will accumulate more data/feedback for validation. Training on the latest window of feedback, we will evaluate on the entire set of training data/feedback.
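A sketch of that evaluation loop, assuming feedback arrives as (x, y) pairs and the correction model exposes scikit-learn-style fit/score methods:

import numpy as np


class ErrorTracker:
    """Retrain on the latest window of feedback, then evaluate on all
    feedback accumulated so far."""

    def __init__(self, model, window_size=1000):
        self.model = model
        self.window_size = window_size
        self.all_x, self.all_y = [], []

    def add_feedback(self, x, y):
        self.all_x.append(x)
        self.all_y.append(y)

    def evaluate(self):
        X = np.asarray(self.all_x)
        Y = np.asarray(self.all_y)
        # Train on the most recent window of feedback...
        self.model.fit(X[-self.window_size:], Y[-self.window_size:])
        # ...and report error against everything accumulated so far.
        return 1.0 - self.model.score(X, Y)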

Update metrics tracking

In the course of refactoring, some metrics reporting got lost. This should be fixed and updated.

Add frontend RPC interface

The current REST interface is pretty slow. We should provide an equivalent RPC interface for higher performance.

Create end-to-end example

Before release, we should create an end-to-end example demonstrating how to use Clipper with an existing web-server (Nginx probably) in a full cluster setup on AWS.

Rewrite model wrapper RPC system to make Clipper listen for incoming connections

The model wrapper RPC system treats the model wrappers as long-lived RPC servers. When a Clipper instance is started it requests a long-lived TCP connection to each of the model wrappers.

This decision was made because the model wrappers are logical servers: they listen for incoming RPC requests and return responses. However, as we start to think of Clipper in the context of long-running serving clusters, it's clear that Clipper is actually a much longer-lived resource than the model wrappers. New models will be added and existing models updated frequently within the uptime of a single Clipper instance. Furthermore, forcing each model wrapper to listen on a different port could potentially lead to running out of ports.

Instead, per @jegonzal's suggestion, let's reverse the connection direction and have Clipper listen for new model wrapper connections on a known port number and tell model wrappers where to find Clipper at runtime. As we start to move model wrappers into Docker containers, this information can be provided through an environment variable.
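A sketch of the model-wrapper side under the reversed design, reading Clipper's address from the environment as suggested (the variable names and handshake line are placeholders):

import os
import socket


def connect_to_clipper():
    """Dial out to Clipper instead of listening for it. In a Docker
    deployment, CLIPPER_HOST/CLIPPER_PORT would be injected into the
    container's environment."""
    host = os.environ.get("CLIPPER_HOST", "localhost")
    port = int(os.environ.get("CLIPPER_PORT", "7000"))
    sock = socket.create_connection((host, port))
    # Placeholder handshake identifying which model this wrapper serves.
    sock.sendall(b"MODEL sklearn-svm\n")
    return sock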

External query functionality

Right now, Clipper does everything besides actually receive and respond to external queries. It's not a very useful prediction server without that functionality, though. I think I can use the new Hyper async-io mechanism that was recently merged into master to implement Clipper's partially event-driven query processing.

Components of this PR

  • Basic integration with Hyper
  • Add way to POST inputs
  • Error handling for malformed, broken requests
  • Support for prepopulating user models from training data
  • Implement online updates

Refactor RPC system

The RPC system in place was just enough to run benchmarks for the paper. I need to add support for strings, variable-length arrays, and byte arrays. The easiest way to do this is probably to not roll my own RPC system. This is a matter of adding more general-purpose serialization (protobuf?) and adding support for variable-length inputs.

More broadly, I'd like to factor out the actual communication mechanism from the batching layer so that we can support more than one implementation at once. This will allow for single-node IPC when everything is on the same box and specialized RPC implementations for performance if necessary.

Testing and basic documentation

I want to do some minor refactoring to clean up the code organization and make it slightly more modular. This will help with both #1 and make it easier to extend in the future. In doing this refactor, it would be good to add documentation and unit tests to the existing code.

Detect when a model wrapper dies

It's possible for a model wrapper to die. At a minimum, the Clipper PredictionBatcher should detect this and stop sending requests. It also shouldn't panic! and cause errors in the rest of Clipper.

Support notification when a prediction cache entry gets filled

When we perform online updates to the correction model, we don't have the same SLOs and so we train with all available model predictions for best accuracy. We need some way to determine when the model predictions for the new piece of feedback are available. To avoid having to poll the cache, I want to implement a notification mechanism.
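One way to express the mechanism, sketched here in Python with a callback registry (the real implementation would live in the Rust cache):

import threading


class PredictionCache:
    """Toy cache that fires registered callbacks when an entry is filled,
    so the correction-policy updater never has to poll."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self._listeners = {}  # key -> list of callbacks

    def notify_on_fill(self, key, callback):
        with self._lock:
            if key not in self._entries:
                self._listeners.setdefault(key, []).append(callback)
                return
            value = self._entries[key]
        callback(value)  # entry already filled: fire immediately

    def put(self, key, value):
        with self._lock:
            self._entries[key] = value
            callbacks = self._listeners.pop(key, [])
        for cb in callbacks:
            cb(value)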

Management Utility Proposal

Developers need some way of managing and administering a Clipper instance.

Proposal:

A running Clipper instance will expose an admin endpoint on a different port number that admin commands can be executed against. Accompanying this, we will provide a Python library that can either be imported (e.g. into a Jupyter notebook) or used from the command line to aid administration. The Python library will include some special functionality to help users serve Scikit-Learn and PySpark models as easily as possible.

Partial list of commands

class ClipperManager:

  @classmethod
  def start(cls, conf):
    """Returns a new ClipperManager object."""

  @classmethod
  def connect(cls, address):
    """Connects to an already running Clipper instance. Returns a new ClipperManager object."""

  def restart(self):
    """
    Cleanly restarts Clipper (including emptying the caches) but does not touch the
    running model wrappers.
    """

  def add_model(self, model_conf):
    """Tell Clipper about a new, running model wrapper."""

  def add_replica(self, name, location):
    """
    A single offline model in Clipper can have many replicas.
    This method informs Clipper about a new replica for an existing model.
    It will throw an exception if name does not match the name of any existing
    model in Clipper.
    """

  def deploy_model(self, name, wrapper_executable, num_replicas=1, extra_cl_args=None):
    """
    Launch a new model wrapper and point it at the model file. Assumes the model wrapper is
    a self-contained executable. Any additional arguments (e.g. model file location) must be
    provided via extra_cl_args.
    """

  def deploy_spark_model(self, name, model_type, model_file):
    """Starts a model wrapper for this serialized Spark model."""

  def deploy_sklearn_model(self, name, model, **kwargs):
    """Take an in-memory scikit-learn model and create and start a model wrapper for it."""

  def update_model(self, name, model_file): ...

  def remove_model(self, name, cleanup=False): ...

  def update_config(self, **params): ...

  def status(self): ...

  def metrics(self): ...

  def stop(self, teardown=False): ...

Case Studies/Tutorials

  1. Basic Scikit-Learn workflow:
from sklearn import svm
import numpy as np
import pandas as pd
import clipper

# Load some data and train a model
df = pd.read_csv(fname, sep=",", header=None)
data = df.values
print("Number of image files:", len(data))
Y = data[:,0]
X = data[:,1:]
model = svm.SVC()
model.fit(X,Y)

# Deploy trained model in Clipper

# start() takes kwargs for any of the fields in a ClipperConf
# (https://github.com/amplab/clipper/blob/39eef86a4c3926677e11e68847384f072245513a/clipper_server/src/clipper/configuration.rs#L15)
clp = clipper.start(name="spam-detector")
# There is now a Clipper server running on this node, but not serving any models yet.
# start() automatically launches Clipper, then connects to the running instance.

# Get the status of Clipper. Returns configuration information, as well as hierarchical information about any models, but not performance information.
json_status = clp.status()
print(json_status)

# Add an in-memory scikit-learn model directly to Clipper. This bypasses the need for the
# user to know about model wrappers
clp.deploy_sklearn_model("sklearn-svm", model)

# Request a prediction. This is just a convenience library to send predictions from Python to the
# REST API.
uid = 1
test_pred = clipper.request_prediction(uid, X[17])
# Update the user's correction model
clipper.schedule_update(uid, X[17], Y[17])

# See performance metrics (throughput, latency, number of requests, etc)
json_metrics = clp.metrics()
print(json_metrics)

Now in a second session (e.g. different time, different user, even different machine)

from sklearn import tree
import numpy as np
import pandas as pd
import clipper
# There is already a running instance, so we don't want to launch a new one but just connect to
# the existing one to manage it.
clp = clipper.connect(ip="localhost", port=clipper.DEFAULT_ADMIN_PORT)

# Train a decision tree
X, Y = load_data(labeled_data_path)
dt_model = tree.DecisionTreeClassifier()
dt_model.fit(X,Y)
clp.deploy_sklearn_model("sklearn-dt", dt_model)
# Clipper is now serving 2 models

# Request a prediction. This is just a convenience library to send predictions from Python to the
# REST API.
uid = 1
test_pred = clipper.request_prediction(uid, X[17])
# Update the user's correction model
clipper.schedule_update(uid, X[17], Y[17])

# See performance metrics (throughput, latency, number of requests, etc)
json_metrics = clp.metrics()
print(json_metrics)

# Shut down Clipper
# Teardown tells Clipper to shut down the model wrappers as well
clp.stop(teardown=True)

Open questions:

  • How/how much should Clipper manage the model wrappers? From an internal systems standpoint, the separation between model wrappers and Clipper makes sense (isolation, scaleout, independent resource allocation). But what about from a management perspective? It turns into a huge pain to manage model wrappers separately.
  • Semantics of model add/update/delete (see #22)

Finish implementing RPC support for strings, variable length inputs

Clipper should support arrays of both fixed and variable length {bytes,ints,floats}, as well as text strings. Support for some of these data types in the RPC layer has not been implemented yet.

This issue has two components:

  • Implement the three methods highlighted here to support sending bytes and strings, as well as the accompanying decode methods for unit testing. The encoding format for bytes should mirror the formats for floats/ints, and the proposed format for strings is described in the module documentation. Let's use this library for LZ4 compression on the Rust side, and Python-LZ4 for decoding on the Python side.
  • Implement support for receiving these inputs in the Python RPC server code rather than raising NotImplementedError.
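A sketch of the string encoding on the Python side: length-prefix each string, mirror the numeric formats, and LZ4-compress the body (the exact header layout is still open; this uses the lz4.frame API from the python lz4 package):

import struct

import lz4.frame


def encode_strings(strings):
    """Encode a batch of strings as [count][len0][bytes0][len1][bytes1]...
    with the whole body LZ4-compressed."""
    body = b"".join(
        struct.pack("<I", len(encoded)) + encoded
        for encoded in (s.encode("utf-8") for s in strings)
    )
    return struct.pack("<I", len(strings)) + lz4.frame.compress(body)


def decode_strings(buf):
    (count,) = struct.unpack_from("<I", buf, 0)
    body = lz4.frame.decompress(buf[4:])
    strings, offset = [], 0
    for _ in range(count):
        (n,) = struct.unpack_from("<I", body, offset)
        offset += 4
        strings.append(body[offset:offset + n].decode("utf-8"))
        offset += n
    return strings


assert decode_strings(encode_strings(["hello", "world"])) == ["hello", "world"]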
