clipper-v0's Issues

Provide extra support for Scikit-Learn models

Scikit-Learn models should be extremely simple to deploy in Clipper. I have two ideas in mind to help with this.

First, we should write a small Python library that can be imported into a Jupyter notebook and used something like this:

>>> from sklearn import svm
>>> import clipper_lib
>>> mymodel = svm.SVC()
>>> mymodel.fit(X, Y)
>>> clipper_config = clipper_lib.ClipperConf()
>>> cl = clipper_lib.start(clipper_config, name="pytest")
>>> cl.add_model(mymodel, name="model1")
>>> cl.status()
>>> print(cl)
"Clipper running at 127.0.0.1:1337 serving the pytest application"
>>> # From another session, connect to the already-running instance:
>>> cl2 = clipper_lib.connect(clipper_config)
>>> cl.stop()

The idea with the library is to automatically deploy Clipper and add models to a running Clipper instance directly from a Jupyter notebook.

Second, we should provide a script to automatically deploy Clipper and all dependencies (including model wrappers) that just needs to be pointed to a serialized Scikit-Learn model (or models).
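A rough sketch of how such a script could work (the script name, its flags, and the clipper_lib module are all hypothetical at this point):

#!/usr/bin/env python
# deploy_clipper.py -- hypothetical one-shot deploy script.
# Loads one or more pickled scikit-learn models, starts Clipper and its
# dependencies, and registers each model with the running instance.
import argparse
import pickle

import clipper_lib  # the library proposed above; name not final


def main():
    parser = argparse.ArgumentParser(
        description="Deploy Clipper with serialized scikit-learn models")
    parser.add_argument("model_files", nargs="+",
                        help="paths to pickled scikit-learn models")
    args = parser.parse_args()

    cl = clipper_lib.start(clipper_lib.ClipperConf())
    for i, path in enumerate(args.model_files):
        with open(path, "rb") as f:
            cl.add_model(pickle.load(f), name="model%d" % i)


if __name__ == "__main__":
    main()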

Possible API functionality

  • start
  • stop
  • restart
  • connect
  • deploy model
  • update model (include options for how to deal with cache, correction policies)
  • set config parameter
  • status (hierarchical description of models, including current params)
  • metrics

Python FFI bindings for external query interface

For some types of ML serving applications, a highly concurrent REST API doesn't make sense as the primary serving API. Particularly for reinforcement learning, being able to query Clipper from a Python library directly is much easier. Add a blocking external query interface with Python FFI bindings.
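A minimal sketch of what the Python side could look like via ctypes, assuming the Rust code is built as a shared library exposing a C-ABI entry point (the library name and symbol below are placeholders, not an existing interface):

import ctypes

# Hypothetical shared library built from the Clipper Rust code.
_lib = ctypes.CDLL("libclipper.so")
_lib.clipper_predict_blocking.argtypes = [
    ctypes.c_uint32,                  # user id
    ctypes.POINTER(ctypes.c_double),  # input feature vector
    ctypes.c_size_t,                  # input length
]
_lib.clipper_predict_blocking.restype = ctypes.c_double


def predict(uid, x):
    """Block until Clipper returns a prediction for input vector x."""
    arr = (ctypes.c_double * len(x))(*x)
    return _lib.clipper_predict_blocking(uid, arr, len(x))

A blocking call like this keeps a reinforcement learning loop simple: query, observe reward, update, repeat.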

Clean shutdown of running Clipper instance

Currently Clipper is stopped by killing the process and letting the OS clean up everything. Instead, there are several resources that should be cleanly shut down (e.g. terminating TCP connections to model wrappers). Using Ctrl-C and a signal handler is probably the simplest way to trigger shutdown. Once a shutdown is triggered, the REST API should stop accepting requests but allow all existing requests to finish. Then we need to signal all child threads to stop. Most of the threads communicate with each other through mpsc channels and message passing, so for the most part we should be able to signal termination by disconnecting the channels. The metrics thread needs special-case handling, and a few others might as well.

My hope is that this will allow the model-wrapper RPC servers to detect a client has disconnected and start listening for a new connection.

Update terminology to match the paper

We should standardize terminology around the terms used in the Clipper paper. E.g., features -> models/model wrappers, tasks -> correction policies.

Estimating distributional shift in input data

For monitoring, it would be useful if we could actively monitor the distribution of the input data and determine if/when there is a distributional shift. Some potential strategies:

  • Examine the distributions of input data conditioned on their predicted class
  • Clustering on the inputs/distribution of outputs
  • Images can potentially be clustered by examining their vector embedding (via the hidden layer/output of some deep network) but might be slow
  • Text clustering using K-means or hierarchical clustering - maybe too specialized
  • Mann–Whitney U test

To start, we should implement a logging mechanism for input data.
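As a starting point for the Mann–Whitney idea, here is a sketch that compares a logged reference window of a single input feature against the most recent window (assumes scipy is available; in practice this would run per feature over the logged data):

import numpy as np
from scipy.stats import mannwhitneyu


def detect_shift(reference, recent, alpha=0.01):
    """Flag a shift when the Mann-Whitney U test rejects the hypothesis
    that the two windows were drawn from the same distribution."""
    _, p_value = mannwhitneyu(reference, recent, alternative="two-sided")
    return p_value < alpha


# Example: the recent window is drawn from a shifted distribution.
rng = np.random.RandomState(0)
reference = rng.normal(0.0, 1.0, size=5000)  # logged historical inputs
recent = rng.normal(0.5, 1.0, size=1000)     # latest window of inputs
print(detect_shift(reference, recent))       # True: shift detected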

Monitoring the distribution of predicted y-values

Using the Clipper system metrics, we would like to track the distribution of predicted y-values across tasks such as binary classification, multiclass classification and regression. This can be done (for binary and multiclass classification) by creating a fixed-bucket histogram within Clipper metrics. For now, we will not focus on regression.
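A toy version of the fixed-bucket histogram, sketched in plain Python rather than the actual Clipper metrics types:

from collections import Counter


class PredictionHistogram:
    """Fixed-bucket histogram over predicted class labels."""

    def __init__(self, classes):
        self.counts = Counter({c: 0 for c in classes})

    def observe(self, predicted_class):
        self.counts[predicted_class] += 1

    def distribution(self):
        total = sum(self.counts.values()) or 1
        return {c: n / float(total) for c, n in self.counts.items()}


hist = PredictionHistogram(classes=[0, 1])
for y_hat in [0, 1, 1, 1, 0]:
    hist.observe(y_hat)
print(hist.distribution())  # {0: 0.4, 1: 0.6}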

Automate the Grafana setup process

Minimize the number of steps needed for users to initialize and access Grafana for metrics visualization. Ideally, the user should simply be able to navigate to a local URL and immediately be presented with relevant visualizations.
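One possible approach, sketched against Grafana's HTTP API (the datasource type, URL, and dashboard file are placeholders that depend on how Clipper ends up exporting metrics):

import json
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")  # default Grafana credentials

# Register a datasource pointing at wherever Clipper publishes metrics.
# "influxdb" is a placeholder type here.
requests.post(GRAFANA + "/api/datasources", auth=AUTH, json={
    "name": "clipper-metrics",
    "type": "influxdb",
    "url": "http://localhost:8086",
    "access": "proxy",
})

# Import a pre-built dashboard so the user lands on working visualizations.
with open("clipper_dashboard.json") as f:
    dashboard = json.load(f)
requests.post(GRAFANA + "/api/dashboards/db", auth=AUTH,
              json={"dashboard": dashboard, "overwrite": True})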

Update libsvm/liblinear wrappers for improved portability

Clipper should move the wrapper package back into the clipper-server code. We can then create liblinear-sys and libsvm-sys packages that will compile the C libraries from source rather than assuming they are available in /usr/local/lib.

Variable latency predictions

Rather than having one static latency objective, Clipper should support a latency objective associated with each prediction request.
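For example, the request payload could carry its own objective; the endpoint path and latency_slo_micros field below are hypothetical, not the current API:

import requests

# Hypothetical per-request latency objective attached to a prediction.
response = requests.post("http://localhost:1337/predict", json={
    "uid": 1,
    "input": [0.1, 0.4, 0.3],
    "latency_slo_micros": 20000,  # this request tolerates 20 ms
})
print(response.json())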

Unify Clipper deployment

Currently Clipper deployment relies on having a running Redis instance and model wrappers. It would be good to have a unified script that launches and shuts down all of them together.
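A sketch of what a unified launcher could look like (binary names, paths, and arguments are placeholders):

import atexit
import subprocess


def launch_all():
    """Start Redis, the model wrappers, and Clipper together, and tear
    them all down when this script exits."""
    procs = [
        subprocess.Popen(["redis-server", "--port", "6379"]),
        # Placeholder model wrapper command line:
        subprocess.Popen(["python", "sklearn_wrapper.py", "--model", "svm.pkl"]),
        subprocess.Popen(["./clipper-server", "--conf", "clipper.toml"]),
    ]

    def teardown():
        for p in reversed(procs):  # stop Clipper first, Redis last
            p.terminate()
            p.wait()

    atexit.register(teardown)
    return procs


if __name__ == "__main__":
    launch_all()
    input("Press Enter to shut everything down...")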

Allow new users to be added dynamically at runtime

Right now the Correction Model Table is statically allocated when Clipper is initialized and new users cannot be added to the running system. This was just for ease of programming, but needs to be fixed so that the correction model table is growable.

Tracking Prediction Error

For active monitoring of system and model performance, we would like to track prediction error with respect to all of the data and feedback; as the system runs, we will accumulate more data/feedback for validation. Training on the latest window of feedback, we will evaluate on the entire set of training data/feedback.
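A sketch of that evaluation loop, assuming feedback arrives as (x, y) pairs and the correction model exposes scikit-learn-style fit/score methods:

import numpy as np


class ErrorTracker:
    """Retrain on the latest window of feedback, then evaluate on all
    feedback accumulated so far."""

    def __init__(self, model, window_size=1000):
        self.model = model
        self.window_size = window_size
        self.all_x, self.all_y = [], []

    def add_feedback(self, x, y):
        self.all_x.append(x)
        self.all_y.append(y)

    def evaluate(self):
        X = np.asarray(self.all_x)
        Y = np.asarray(self.all_y)
        # Train on the most recent window of feedback...
        self.model.fit(X[-self.window_size:], Y[-self.window_size:])
        # ...and report error against everything accumulated so far.
        return 1.0 - self.model.score(X, Y)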

Update metrics tracking

In the course of refactoring, some metrics reporting got lost. This should be fixed and updated.

Add frontend RPC interface

The current REST interface is pretty slow. We should provide an equivalent RPC interface for higher performance.

Create end-to-end example

Before release, we should create an end-to-end example demonstrating how to use Clipper with an existing web-server (Nginx probably) in a full cluster setup on AWS.

Rewrite model wrapper RPC system to make Clipper listen for incoming connections

The model wrapper RPC system treats the model wrappers as long-lived RPC servers. When a Clipper instance is started it requests a long-lived TCP connection to each of the model wrappers.

This decision was made because the model wrappers are logical servers: they listen for incoming RPC requests and return responses. However, as we start to think of Clipper in the context of long-running serving clusters, it's clear that Clipper is actually a much longer-lived resource than the model wrappers. New models will be added and existing models updated frequently within the uptime of a single Clipper instance. Furthermore, forcing each model wrapper to listen on a different port could potentially lead to running out of ports.

Instead, per @jegonzal's suggestion, let's reverse the connection direction and have Clipper listen for new model wrapper connections on a known port number and tell model wrappers where to find Clipper at runtime. As we start to move model wrappers into Docker containers, this information can be provided through an environment variable.
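A sketch of the model-wrapper side under the reversed design, reading Clipper's address from the environment as suggested (the variable names and handshake line are placeholders):

import os
import socket


def connect_to_clipper():
    """Dial out to Clipper instead of listening for it. In a Docker
    deployment, CLIPPER_HOST/CLIPPER_PORT would be injected into the
    container's environment."""
    host = os.environ.get("CLIPPER_HOST", "localhost")
    port = int(os.environ.get("CLIPPER_PORT", "7000"))
    sock = socket.create_connection((host, port))
    # Placeholder handshake identifying which model this wrapper serves.
    sock.sendall(b"MODEL sklearn-svm\n")
    return sock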

External query functionality

Right now, Clipper does everything besides actually receive and respond to external queries. It's not a very useful prediction server without that functionality, though. I think I can use the new Hyper async-io mechanism that was recently merged into master to implement Clipper's partially event-driven query processing.

Components of this PR

  • Basic integration with Hyper
  • Add way to POST inputs
  • Error handling for malformed, broken requests
  • Support for prepopulating user models from training data
  • Implement online updates

Refactor RPC system

The RPC system in place was just enough to run benchmarks for the paper. I need to add support for strings, variable-length arrays, and byte arrays. The easiest way to do this is probably to not roll my own RPC system. This is a matter of adding more general-purpose serialization (protobuf?) and adding support for variable-length inputs.

More broadly, I'd like to factor out the actual communication mechanism from the batching layer so that we can support more than one implementation at once. This will allow for single-node IPC when everything is on the same box and specialized RPC implementations for performance if necessary.

Testing and basic documentation

I want to do some minor refactoring to clean up the code organization and make it slightly more modular. This will help with both #1 and make it easier to extend in the future. In doing this refactor, it would be good to add documentation and unit tests to the existing code.

Detect when a model wrapper dies

It's possible for a model wrapper to die. At a minimum, the Clipper PredictionBatcher should detect this and stop sending requests. It also shouldn't panic! and cause errors in the rest of Clipper.

Support notification when a prediction cache entry gets filled

When we perform online updates to the correction model, we don't have the same SLOs and so we train with all available model predictions for best accuracy. We need some way to determine when the model predictions for the new piece of feedback are available. To avoid having to poll the cache, I want to implement a notification mechanism.
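One way to express the mechanism, sketched here in Python with a callback registry (the real implementation would live in the Rust cache):

import threading


class PredictionCache:
    """Toy cache that fires registered callbacks when an entry is filled,
    so the correction-policy updater never has to poll."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self._listeners = {}  # key -> list of callbacks

    def notify_on_fill(self, key, callback):
        with self._lock:
            if key not in self._entries:
                self._listeners.setdefault(key, []).append(callback)
                return
            value = self._entries[key]
        callback(value)  # entry already filled: fire immediately

    def put(self, key, value):
        with self._lock:
            self._entries[key] = value
            callbacks = self._listeners.pop(key, [])
        for cb in callbacks:
            cb(value)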

Management Utility Proposal

Developers need some way of managing and administering a Clipper instance.

Proposal:

A running Clipper instance will expose an admin endpoint on a different port number that admin commands can be executed against. Accompanying this, we will provide a Python library that can either be imported (e.g. into a Jupyter notebook) or used from the command line to aid administration. The Python library will include some special functionality to help users serve Scikit-Learn and PySpark models as easily as possible.

Partial list of commands

class ClipperManager:

  @classmethod
  def start(cls, conf):
    """Returns a new ClipperManager object."""

  @classmethod
  def connect(cls, address):
    """Connects to an already running Clipper instance. Returns a new ClipperManager object."""

  def restart(self):
    """
    Cleanly restarts Clipper (including emptying the caches) but does not touch the
    running model wrappers.
    """

  def add_model(self, model_conf):
    """Tell Clipper about a new, running model wrapper."""

  def add_replica(self, name, location):
    """
    A single offline model in Clipper can have many replicas.
    This method informs Clipper about a new replica for an existing model.
    It will throw an exception if name does not match the name of any existing
    model in Clipper.
    """

  def deploy_model(self, name, wrapper_executable, num_replicas=1, extra_cl_args=None):
    """
    Launch a new model wrapper and point it at the model file. Assumes the model wrapper is
    a self-contained executable. Any additional arguments (e.g. model file location) must be
    provided via extra_cl_args.
    """

  def deploy_spark_model(self, name, model_type, model_file):
    """Starts a model wrapper for this serialized Spark model."""

  def deploy_sklearn_model(self, name, model, **kwargs):
    """Take an in-memory scikit-learn model and create and start a model wrapper for it."""

  def update_model(self, name, model_file): ...

  def remove_model(self, name, cleanup=False): ...

  def update_config(self, **params): ...

  def status(self): ...

  def metrics(self): ...

  def stop(self, teardown=False): ...

Case Studies/Tutorials

  1. Basic Scikit-Learn workflow:
from sklearn import svm
import numpy as np
import pandas as pd
import clipper

# Load some data and train a model
df = pd.read_csv(fname, sep=",", header=None)
data = df.values
print("Number of image files:", len(data))
Y = data[:,0]
X = data[:,1:]
model = svm.SVC()
model.fit(X,Y)

# Deploy trained model in Clipper

# start() takes kwargs for any of the fields in a ClipperConf
# (https://github.com/amplab/clipper/blob/39eef86a4c3926677e11e68847384f072245513a/clipper_server/src/clipper/configuration.rs#L15)
clp = clipper.start(name="spam-detector")
# There is now a Clipper server running on this node, but not serving any models yet.
# start() automatically launches Clipper, then connects to the running instance.

# Get the status of Clipper. Returns configuration information, as well as hierarchical information about any models, but not performance information.
json_status = clp.status()
print(json_status)

# Add an in-memory scikit-learn model directly to Clipper. This bypasses the need for the
# user to know about model wrappers
clp.deploy_sklearn_model("sklearn-svm", model)

# Request a prediction. This is just a convenience library to send predictions from Python to the
# REST API.
uid = 1
test_pred = clipper.request_prediction(uid, X[17])
# Update the user's correction model
clipper.schedule_update(uid, X[17], Y[17])

# See performance metrics (throughput, latency, number of requests, etc)
json_metrics = clp.metrics()
print(json_metrics)

Now in a second session (e.g. different time, different user, even different machine)

from sklearn import tree
import numpy as np
import pandas as pd
import clipper
# There is already a running instance, so we don't want to launch a new one but just connect to
# the existing one to manage it.
clp = clipper.connect(ip="localhost", port=clipper.DEFAULT_ADMIN_PORT)

# Train a decision tree
X, Y = load_data(labeled_data_path)
dt_model = tree.DecisionTreeClassifier()
dt_model.fit(X,Y)
clp.deploy_sklearn_model("sklearn-dt", dt_model)
# Clipper is now serving 2 models

# Request a prediction. This is just a convenience library to send predictions from Python to the
# REST API.
uid = 1
test_pred = clipper.request_prediction(uid, X[17])
# Update the user's correction model
clipper.schedule_update(uid, X[17], Y[17])

# See performance metrics (throughput, latency, number of requests, etc)
json_metrics = clp.metrics()
print(json_metrics)

# Shut down Clipper
# Teardown tells Clipper to shut down the model wrappers as well
clp.stop(teardown=True)

Open questions:

  • How/how much should Clipper manage the model wrappers? From an internal systems standpoint, the separation between model wrappers and Clipper makes sense (isolation, scaleout, independent resource allocation). But what about from a management perspective? It turns into a huge pain to manage model wrappers separately.
  • Semantics of model add/update/delete (see #22)

Finish implementing RPC support for strings, variable length inputs

Clipper should support arrays of both fixed and variable length {bytes,ints,floats}, as well as text strings. Support for some of these data types in the RPC layer has not been implemented yet.

This issue has two components:

  • Implement the three methods highlighted here to support sending bytes and strings, as well as the accompanying decode methods for unit testing. The encoding format for bytes should mirror the formats for floats/ints, and the proposed format for strings is described in the module documentation. Let's use this library for LZ4 compression on the Rust side, and Python-LZ4 for decoding on the Python side.
  • Implement support for receiving these inputs in the Python RPC server code rather than raising NotImplementedError.
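A sketch of the string encoding on the Python side: length-prefix each string, mirror the numeric formats, and LZ4-compress the body (the exact header layout is still open; this uses the lz4.frame API from the python lz4 package):

import struct

import lz4.frame


def encode_strings(strings):
    """Encode a batch of strings as [count][len0][bytes0][len1][bytes1]...
    with the whole body LZ4-compressed."""
    body = b"".join(
        struct.pack("<I", len(encoded)) + encoded
        for encoded in (s.encode("utf-8") for s in strings)
    )
    return struct.pack("<I", len(strings)) + lz4.frame.compress(body)


def decode_strings(buf):
    (count,) = struct.unpack_from("<I", buf, 0)
    body = lz4.frame.decompress(buf[4:])
    strings, offset = [], 0
    for _ in range(count):
        (n,) = struct.unpack_from("<I", body, offset)
        offset += 4
        strings.append(body[offset:offset + n].decode("utf-8"))
        offset += n
    return strings


assert decode_strings(encode_strings(["hello", "world"])) == ["hello", "world"]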
