moe's Introduction

MOE Logo

[![Build Status](https://travis-ci.org/Yelp/MOE.svg?branch=master)](https://travis-ci.org/Yelp/MOE) Note: travis link temporarily disabled. The last major MOE commit built successfully but our travis flow appears to be broken (out of date packages perhaps?). Tests and docker still pass/build locally and MOE still works.

Metric Optimization Engine. A global, black box optimization engine for real world metric optimization.

Full documentation is available at http://yelp.github.io/MOE/. Or, build the documentation locally with make docs.

What is MOE?

MOE (Metric Optimization Engine) is an efficient way to optimize a system's parameters, when evaluating parameters is time-consuming or expensive.

Here are some examples of when you could use MOE:

  • Optimizing a system's click-through rate (CTR). MOE is useful when evaluating CTR requires running an A/B test on real user traffic, and getting statistically significant results requires running this test for a substantial amount of time (hours, days, or even weeks).

  • Optimizing tunable parameters of a machine-learning prediction method. MOE is useful if calculating the prediction error for one choice of the parameters takes a long time, which might happen because the prediction method is complex and takes a long time to train, or because the data used to evaluate the error is huge.

  • Optimizing the design of an engineering system (an airplane, the traffic network in a city, a combustion engine, a hospital). MOE is useful if evaluating a design requires running a complex physics-based numerical simulation on a supercomputer.

  • Optimizing the parameters of a real-world experiment (a chemistry, biology, or physics experiment, a drug trial). MOE is useful when every experiment needs to be physically created in a lab, or very few experiments can be run in parallel.

MOE is ideal for problems in which the optimization problem's objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as little as possible.

Internally, MOE uses Bayesian global optimization, which performs optimization using Bayesian statistics and optimal learning.

Optimal learning is the study of efficient methods for collecting information, particularly when doing so is time-consuming or expensive, and was developed and popularized from its roots in decision theory by Prof. Peter Frazier (Cornell, Operations Research and Information Engineering) and Prof. Warren Powell (Princeton, Operations Research and Financial Engineering). For more information about the mathematics of optimal learning, and more real-world applications like heart surgery, drug discovery, and materials science, see these intro slides to optimal learning.

Why do we need MOE?

A video and slide deck introduction to MOE are also available.

MOE solves this optimization problem internally by:

  1. Building a Gaussian Process (GP) with the historical data
  2. Optimizing the hyperparameters of the Gaussian Process (model selection)
  3. Finding the points of highest Expected Improvement (EI)
  4. Returning the points to sample, then repeat
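
As a concrete illustration of step 3, here is a minimal sketch (not MOE's implementation) of the analytic Expected Improvement for a single candidate point, assuming we are minimizing and that the GP posterior mean mu and standard deviation sigma at that point are already known:

import math

def expected_improvement(mu, sigma, best_value_so_far):
    """Analytic EI for minimization: E[max(best_value_so_far - f(x), 0)] under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best_value_so_far - mu, 0.0)
    z = (best_value_so_far - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))       # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    return (best_value_so_far - mu) * cdf + sigma * pdf

MOE maximizes this quantity (step 3) over the domain to choose the next points to sample.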

Externally you can use MOE through:

  1. The REST interface
  2. The Python interface
  3. The C++ interface

You can be up and optimizing in a matter of minutes; see the examples of using MOE below.

Install

Install in docker:

This is the recommended way to run the MOE REST server. All dependencies are installed and everything is built automatically, inside an isolated container.

Docker (http://docs.docker.io/) is a container-based virtualization framework. Unlike traditional virtualization, Docker is fast, lightweight, and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from any other, and nothing gets shared.

$ docker pull yelpmoe/latest # You can also pull specific versions like yelpmoe/v0.1.0
$ docker run -p 6543:6543 yelpmoe/latest

If you are on OSX, or want a build based on the current master branch, you may need to build the image manually.

$ git clone https://github.com/Yelp/MOE.git
$ cd MOE
$ docker build -t moe_container .
$ docker run -p 6543:6543 moe_container

The webserver and REST interface are now running on port 6543 inside the container: http://localhost:6543

Install from source:

See Install Documentation

Running MOE

REST/web server and interactive demo

From the directory where MOE is installed:

$ pserve --reload development.ini # MOE server is now running at http://localhost:6543

The REST interface documentation

Or, from the command line,

$ curl -X POST -H "Content-Type: application/json" -d '{"domain_info": {"dim": 1}, "points_to_evaluate": [[0.1], [0.5], [0.9]], "gp_historical_info": {"points_sampled": [{"value_var": 0.01, "value": 0.1, "point": [0.0]}, {"value_var": 0.01, "value": 0.2, "point": [1.0]}]}}' http://127.0.0.1:6543/gp/ei

gp_ei endpoint documentation.
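
The same request can also be made from Python; a minimal sketch using the requests library (the payload mirrors the curl example above):

import requests

payload = {
    'domain_info': {'dim': 1},
    'points_to_evaluate': [[0.1], [0.5], [0.9]],
    'gp_historical_info': {
        'points_sampled': [
            {'value_var': 0.01, 'value': 0.1, 'point': [0.0]},
            {'value_var': 0.01, 'value': 0.2, 'point': [1.0]},
        ],
    },
}
response = requests.post('http://127.0.0.1:6543/gp/ei', json=payload)
print response.json()  # expected improvement at each of points_to_evaluate; see the gp_ei endpoint docs for the exact schema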

From ipython

$ ipython
> from moe.easy_interface.experiment import Experiment
> from moe.easy_interface.simple_endpoint import gp_next_points
> exp = Experiment([[0, 2], [0, 4]])
> exp.historical_data.append_sample_points([[[0, 0], 1.0, 0.01]])
> next_point_to_sample = gp_next_points(exp)
> print next_point_to_sample

easy_interface documentation.

Within Python

See examples/next_point_via_simple_endpoint.py for this code or http://yelp.github.io/MOE/examples.html for more examples.

import math
import random

from moe.easy_interface.experiment import Experiment
from moe.easy_interface.simple_endpoint import gp_next_points
from moe.optimal_learning.python.data_containers import SamplePoint


# Note: this function can be anything: the output of a batch job, the results of an A/B experiment, the value of a physical experiment, etc.
def function_to_minimize(x):
    """Calculate an aribitrary 2-d function with some noise with minimum near [1, 2.6]."""
    return math.sin(x[0]) * math.cos(x[1]) + math.cos(x[0] + x[1]) + random.uniform(-0.02, 0.02)

if __name__ == '__main__':
    exp = Experiment([[0, 2], [0, 4]])  # 2D experiment, we build a tensor product domain
    # Bootstrap with some known or already sampled point(s)
    exp.historical_data.append_sample_points([
        SamplePoint([0, 0], function_to_minimize([0, 0]), 0.05),  # Iterables of the form [point, f_val, f_var] are also allowed
        ])

    # Sample 20 points
    for i in range(20):
        # Use MOE to determine what is the point with highest Expected Improvement to use next
        next_point_to_sample = gp_next_points(exp)[0]  # By default we only ask for one point
        # Sample the point from our objective function, we can replace this with any function
        value_of_next_point = function_to_minimize(next_point_to_sample)

        print "Sampled f({0:s}) = {1:.18E}".format(str(next_point_to_sample), value_of_next_point)

        # Add the information about the point to the experiment historical data to inform the GP
        exp.historical_data.append_sample_points([SamplePoint(next_point_to_sample, value_of_next_point, 0.01)])  # We can add some noise

More examples can be found in the <MOE_DIR>/examples directory.

Within C++

  • Expected Improvement Demo - http://yelp.github.io/MOE/gpp_expected_improvement_demo.html
  • Gaussian Process Hyperparameter Optimization Demo - http://yelp.github.io/MOE/gpp_hyperparameter_optimization_demo.html
  • Combined Demo - http://yelp.github.io/MOE/gpp_hyper_and_EI_demo.html

Contributing

See Contributing Documentation

License

MOE is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

moe's People

Contributors

jialeiwang, norases, peter-i-frazier, rmcgibbo, suntzu86, timdumol


moe's Issues

No mutable default arguments

I did not realize Python default arguments are evaluated once, at function definition time, and shared across calls, like a static variable in C. So:

def bar(foo=numpy.array([])):
  foo.resize(foo.size + 1)

will eventually run out of memory if bar is called often enough. See:
http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Default_Argument_Values

Google's style for using a mutable default value is:

def bar(foo=None):
  if foo is None:
    foo = numpy.array([])

Thankfully this doesn't break anything right now b/c I only read from my function args, copying before writes. But it's fragile/scary and should be fixed.

MOE crashes when there is no bootstrap data

The following error gets spit out and it causes the REST server to completely crash (when looking for next points):

Grad Cholesky failed; matrix singular. k=0

We should handle this gracefully. If there is no bootstrap data we should return a random point in the domain, or the center of mass of the domain, or something (this could even be a user option), but not just crash hard; a sketch of such a fallback is below.
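
A minimal sketch of such a fallback, assuming the domain is given as a list of [min, max] pairs (names here are illustrative, not MOE internals):

import random

def next_point_without_data(domain_bounds, use_center=False):
    """Return a reasonable point to sample when the GP has no historical data."""
    if use_center:
        # Center of mass of the hypercube domain.
        return [(lo + hi) / 2.0 for lo, hi in domain_bounds]
    # Otherwise pick a uniformly random point in the domain.
    return [random.uniform(lo, hi) for lo, hi in domain_bounds]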

clean up/improve grad EI vectorized computation

The code for this is currently pretty complicated and kind of ugly. I use numpy.ma.mask to pull out irrelevant terms and then have to apply the masks across multiple vectors, compress masked arrays, and do a bunch of other tricks.

At one point we even copy the grad_chol_decomp tensor "num_mc_iter" times (well, usually more like 1-10% of num_mc_iter, b/c not all MC iterations have a nonzero contribution), which is a pretty ugly looking operation too.

Partition long vectorized loops in Python

Currently we vectorize the computation of EI & grad EI for performance. At the moment, we vectorize over all num_mc_iterations at once.

This has a couple issues:

  1. To do the vectorization, we duplicate a lot of data (even the gradient tensors) in memory. This is pretty wasteful and, after a certain size, it'll be faster to limit the size of the duplication (see the sketch below).
  2. We currently compute all the grad_mu updates at once (instead of taking them one at a time, alongside a grad_chol_decomp update). So it's M * grad_mu, where M could be hundreds of thousands or millions. If grad_mu is large in magnitude, we're losing a lot of precision. Breaking up the loops limits how big this term can get (before being offset by the grad variance term).
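
A minimal sketch of the chunking idea from point 1, assuming a vectorized estimator ei_from_mc_block(n) that returns the mean EI contribution over n Monte Carlo draws (all names are illustrative):

def estimate_ei_in_chunks(ei_from_mc_block, num_mc_iterations, chunk_size=10000):
    """Accumulate an EI estimate over fixed-size blocks instead of one giant vectorized pass."""
    total = 0.0
    num_done = 0
    while num_done < num_mc_iterations:
        block = min(chunk_size, num_mc_iterations - num_done)
        # ei_from_mc_block returns a mean over `block` draws, so weight it by the block size.
        total += ei_from_mc_block(block) * block
        num_done += block
    return total / num_mc_iterations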

Clean up and unify strings/enums across C++ and python

Sync these types with the C++ enums (e.g., see line 35: self._domain_type = C_GP.DomainTypes.tensor_product vs domain_type = TENSOR_PRODUCT_DOMAIN_TYPE in moe/optimal_learning/python/cpp_wrappers/domain.py). We could drop the enums and use a common set of string names to specify everything, or produce a mapping from string names to enums. Either way, right now we specify this information in multiple different ways.

Expand tests for gp EI

Right now the Python tests only check against the analytic EI of the C++.

Try it against an MC run (by setting the seed) for various problem sizes.

Support conda build system.

scipy uses this, and the academic community seems to be shifting over to it (as of PyCon 2014). Also requested by Julian K.

http://conda.pydata.org/

This will allow us to build into that ecosystem and apparently get Windows support almost for "free".

Support multiple compilers

Users should be able to select which compiler to use at the manual-install level (I think this is already reasonably easy) and at the docker level.

gcc, clang, icc are the main ones we'd care about, I think.

icc in particular is interesting b/c the resulting code is at least 2x faster than gcc's (even in long Monte Carlo loops).

Update pyramid file structure

  1. Pull views into modules based on logic boundaries
  2. Pull schemas into views
  3. Move templates into similar module boundaries as views
  4. Move simple_endpoint stuff into a module

improve variance, grad var & cholesky thereof computation in Python GaussianProcess

I found a number of useful performance boosts in C++ by taking advantage of numerous symmetries in these matrices/tensors (it also makes the code fairly opaque). The python implementation skips most of this stuff b/c it's easier to understand (based on the formulas you'd "naturally" derive) and to save developer time.

We can revisit this later.

Sphinx and REST docs

We should be more explicit about the interfaces of our pyramid classes/methods, including parameter types, return types, exceptions raised, etc.

We should also explicitly document the REST interface. jsonschema can do this and there may be a way to hack colander into doing it as well.

Give warnings when the user is doing "bad" things with MOE

We should warn the user when they try to do things that could cause errors; a sketch of a couple of these checks appears after the list.

  1. No bootstrap data - warn that MOE will pick a random point in the domain to sample next.
  2. "Low" bootstrap data (e.g., 3 points in a 10-D space) - warn that MOE has little information to go on.
  3. Hyperparameter optimization with "low" data - warn that this may not converge to anything sensible.
  4. Two points that are identical with no noise - warn and then throw one out.

Anything else you guys can think of?
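
A minimal sketch of a couple of these checks, assuming historical data arrives as a list of (point, value, noise_variance) tuples (illustrative only, not MOE's actual validation code):

import warnings

def check_historical_data(points_sampled, dim):
    """Emit warnings for empty or suspiciously small bootstrap data and for exact duplicates with no noise."""
    if not points_sampled:
        warnings.warn('No bootstrap data: MOE will pick a random point in the domain to sample next.')
    elif len(points_sampled) < dim:
        # "Low data" threshold here is an arbitrary illustrative choice.
        warnings.warn('Very little bootstrap data ({0} points in a {1}-D space).'.format(len(points_sampled), dim))
    seen = set()
    for point, value, noise_var in points_sampled:
        key = tuple(point)
        if noise_var == 0.0 and key in seen:
            warnings.warn('Duplicate point with no noise: {0}; consider dropping one.'.format(key))
        seen.add(key)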

Implement hessians and newton's method for optimization in Python

Currently python uses gradient descent for all optimization, even though hyperparameter optimization is much faster with Newton's method.

implement Hessian computation for:

  • covariance functions
  • log likelihood measures

and implement a NewtonOptimizer.

All this already exists in C++; so if you want to use it, call that instead.
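
For reference, here is a minimal sketch of the Newton update such a NewtonOptimizer would perform (gradient_fn and hessian_fn are assumed callables returning the gradient and Hessian of the objective with respect to the hyperparameters; this is not the C++ implementation):

import numpy

def newton_step(hyperparameters, gradient_fn, hessian_fn):
    """One Newton update for minimization: solve H * delta = grad and step against it (flip the sign to maximize)."""
    grad = gradient_fn(hyperparameters)
    hess = hessian_fn(hyperparameters)
    delta = numpy.linalg.solve(hess, grad)
    return hyperparameters - delta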

High level math docs.

We should explain what MOE is, why it is needed and some background. We should be able to steal almost all of this content from my thesis, the paper and the C++ docs.

We should be able to have links to demos and some graphs from this.

Any ideas for general outline?

MOE crashes with no initial data

When asking for a new point MOE will complain that the grad cholesky matrix is singular and crash hard. This brings down the whole rest server.

A hack is in place at line 190 of moe/views/gp_next_points_pretty_view.py:

if gaussian_process.num_sampled == 0:

Flesh out REST interface.

Right now only about 50% of the functionality of MOE is exposed through the REST interface. Bring that up to 100%.

improve gradient descent (and optimizer) 'communication'

Right now gradient descent doesn't offer much introspection into what it did & whether it succeeded:

  1. indicate whether convergence occurred (e.g., did the gradient drop below tolerance?)
  2. indicate the best objective function value found so far
  3. allow users to indicate a minimum best (like if I can't find an objective value better than 1.5, then I should indicate failure)

1, 2, and 3 are readily done by constructing an "IOContainer" for optimizers. This could have 3 fields--point, value, success. On input, point and value are the initial guess and the minimum acceptable objective value. On output, point and value are the result, and success indicates whether optimization succeeded.
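
A minimal sketch of what such an IOContainer could look like (the field names follow the description above; this is a proposal, not existing code):

class IOContainer(object):

    """Input/output container for optimizers: point, value, success."""

    def __init__(self, point, value, success=False):
        # On input: point = initial guess, value = minimum acceptable objective value.
        # On output: point = best point found, value = its objective value,
        # success = whether convergence and/or the "minimum best" criterion were met.
        self.point = point
        self.value = value
        self.success = success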

This goes hand in hand with GH-91 (e.g., having a tolerance condition is what indicates whether convergence occurs).

make python files pass flake8, etc

Since sclark added some new checks in flake8 & friends, a bunch of things in optimal_learning/python/* and tests/optimal_learning/python/* don't pass anymore, particularly with respect to import ordering.

I'll fix this up but I want to get my current branch out asap: #55

fix TODOs, ticket numbers, etc

I like to litter my code with

TODO(eliu): blah blah blah (#### ticket number if you're lucky)

Currently the ticket number isn't always present. When it is, the numbers are a mix of Yelp-Trac, Yelp-Jira, and GitHub numbers.

These comments are handy--when someone comes to do a new ticket, they can quickly see the major code components that would be affected. Readers/users can also see that improvements are planned and missing features aren't just due to ignorance :)

Before open source, we should clean up the style to:

TODO(GH-###): description

No individual username; always have a GH ticket number. This includes mirroring all unfinished Yelp tickets on GitHub.

implement q,p-EI for the heuristic optimizers

Right now you can only use constant liar, kriging believer, etc. to solve q,0-EI. There's no reason why we can't do this with q,p-EI.

The basic algorithm is:

for i in range(q):
  new_point = get_next_best_point(GP, ...)
  new_point_value = get_truth_value(new_point)  # e.g., kriging
  GP.add_point(new_point, new_point_value)

So we can simply first do:

points_being_sampled_value = get_truth_values(points_being_sampled)
GP.add_points(points_being_sampled, points_being_sampled_value)

And that provides a way of solving q,p-EI. For standard kriging believer, this will drop the EI at each of the points_being_sampled to 0, allowing the heuristic solve (over q points) to identify new areas of interest.

More descriptive 500 errors.

Right now when anything fails in the REST server, we just return a 500 error. We should be explicit about what went wrong (C++ matrix error vs schema validation vs whatever). We can pass the exceptions up through pyramid.

Adopt an optimized BLAS library or libraries

Currently we install libblas-dev and liblapack-dev through apt-get in a docker container. These are just the "reference" implementations--no optimizations, and they're at best no faster than naive implementations of these algorithms.

There are several optimized BLAS libraries (and LAPACK*) out there. ATLAS is free (and also generally the worst) and easy to get (you can apt-get it). EIGEN can also be adopted for this purpose (i.e., potentially without rewriting all of MOE to use their matrix objects). GOTO2 is great but I believe the license is non-commercial; not sure what this means for us. And if the user has icc available, the Intel MKL is one of the best.

*LAPACK can be built on top of BLAS using only function calls, but I think performant libraries will do some extra cross-library optimization.

These libraries would be used by numpy/scipy and I can add hooks in the C++ to call these guys instead of my by-hand implementations.

MOE crashes when two points are identical (with no noise)

LinAlgError: Singular matrix

We should combine these two points instead of putting them both into the matrix. If they are the exact same point, with the exact same value, with no noise, then they are the same point and should be treated as such (instead of crashing).

let python callers pass in random sources

It'd be nice to define a simple container like:

class Randomness(object):
  def __init__(self, uniform=numpy.random.uniform, normal=numpy.random.normal):
    self.uniform = uniform  # callable by uniform(min, max)
    self.normal = normal  # callable by normal()

Handy for testing or for easily trying out other generators/distribution transformations. Also makes it easy to have Python use the same RNG as C++ (could make testing easier).

Merge similar optimization methods in C++

BLOCKED ON: #274

Currently I have things like:
ComputeOptimalPointToSampleWithRandomStarts
and
ComputeOptimalPointToSampleViaMultistartGradientDescent

The only difference here is that the second one lets you specify the starting_points for MGD and the first one generates them for you. The first one wraps the second one. That's a lot of duplicated comments and junk for such a teeny tiny difference.

use real getters/setters for "current_point", "hyperparameters", etc

Currently I have a lot of functions like
get_current_point(), set_current_point(), get_hyperparameters(), etc. These should be replaced with proper getters/setters. Some of these functions are part of the ABC interface; here's how to set those up:
http://bip.weizmann.ac.il/course/python/PyMOTW/PyMOTW/docs/abc/index.html

And for the regular ones, we can do either

@property
def x(self):
  pass
@x.setter
def x(self, val):
  pass

or

def get_x(self):
  pass
def set_x(self, val):
  pass
x = property(get_x, set_x)

gradient descent in python is pretty slow

It takes ~0.7s to perform 10 rounds of 1000 gd iterations, optimizing x^2 + y^2 + z^2. Yikes. For reference, that's a really small number of gd iterations and also a tiny number of multistarts on an objective function that is basically free to compute.

We might be able to speed up the loop, the domain update restrictions can be vectorized, etc.

This isn't a huge deal since these things are blazing fast in C++ but still.
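
One possible vectorization of the domain update restriction, assuming a tensor-product domain stored as numpy arrays of per-dimension bounds (a sketch, not the current implementation):

import numpy

def restricted_update(point, step, domain_min, domain_max):
    """Apply a gradient descent step and clamp the result to the hypercube domain in one vectorized call."""
    return numpy.clip(point + step, domain_min, domain_max)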

fall back on heuristic optimizers instead of random search?

Currently when EPI optimization via gradient descent fails, we fall back on random search. For larger q values, random search performance can be very poor.

Instead, it probably makes more sense to fall back on one of the heuristic optimizers like kriging believer. It'll be worth testing this at least on some random cases from GP priors to see that it does in fact outperform random search.

It also appears that using random search to solve 1,0-EI, 1,1-EI, 1,2-EI, and 1,3-EI sequentially generally yields better results than random searching for q,0-EI directly. This needs further investigation.

Check for redundancy in tests.

def assert_relatively_equal(self, value_one, value_two, tol=None):

is defined multiple times. Pull them all out and DRY.
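
A minimal sketch of a single shared helper the test classes could import (the default tolerance is an illustrative assumption):

def assert_relatively_equal(value_one, value_two, tol=1e-12):
    """Assert that two floats agree to within a relative tolerance."""
    denom = max(abs(value_one), abs(value_two))
    if denom == 0.0:
        # Both values are exactly zero.
        assert value_one == value_two
    else:
        assert abs(value_one - value_two) / denom <= tol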

Colander to serialize/deserialize numpy arrays

Since moe expects numpy arrays as input and gives numpy arrays as output, it'd be nice if colander supported this directly.

Currently (as a hack), I (eliu) call numpy.array() on the output of params.get() (from deserialized colander operations) and I pass output.tolist() to the colander serializer. This means I import numpy in every file and we have numpy.array() and out.tolist() scattered throughout. It'd be nice to have all of these happen in 1 central place (via colander).

Colander supports type extensions, e.g.:
http://colander.readthedocs.org/en/latest/extending.html#an-example
So it looks like it could be as simple as providing a serializer that calls array.tolist() and then uses the "regular" serializer, and a deserializer that uses the regular deserializer and calls numpy.array() on its output. There might be a more elegant way to do this too. I don't know much about colander so I'm not sure if this would work. And I can't find any examples of other people interfacing colander and numpy.
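
A minimal sketch of that idea, assuming colander's documented custom-type interface (serialize/deserialize methods); untested, just to show the shape of the extension:

import colander
import numpy

class NumpyArray(colander.SchemaType):

    """Colander type that serializes numpy arrays via tolist() and deserializes back via numpy.array()."""

    def serialize(self, node, appstruct):
        if appstruct is colander.null:
            return colander.null
        return numpy.asarray(appstruct).tolist()

    def deserialize(self, node, cstruct):
        if cstruct is colander.null:
            return colander.null
        return numpy.array(cstruct)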

If we switch to jsonschema (or whatever), that new tool should also support this.

fix points_to_sample/points_being_sampled naming discrepancies

Currently, when talking about EI, we use the words "points_to_sample" and "points_being_sampled" to refer to the same thing. Recently I defined the term "q,p-EI" to describe the problem that MOE solves. We can optimize 'q' new points to sample given 'p' points concurrently being run in an experiment.

q <=> points_to_sample (b/c these points will be run in an experiment after return)
p <=> points_being_sampled (b/c these experiments are not yet resolved)

When the experiments from points q or p resolve, they become part of "HistoricalData" or "points_sampled".

The use of "points_to_sample" in GP computations is reasonable and won't be changed. This refers to the union of q & p; this point will be made more clear in the docs.

This blocks ADS-3094 (true EPI for C++).

Expose MOE options through REST interface

Optimization iterations, tolerances, Monte Carlo iterations, and other configuration values used by MOE should all be exposed through the REST interface.

Care is needed to allow only relevant parameters (e.g., we can't try to run gradient descent with Newton parameters as input).

List (incomplete):

  • domain, domain_type (currently we just do a hypercube, but support for "arbitrary" domains is coming)
  • optimizer_type (along with its params, see below)
  • log_likelihood objective type (marginal, cross validation)
  • all args of cpp_wrappers.optimization_parameters.build_newton_parameters
  • all args of cpp_wrappers.optimization_parameters.build_gradient_descent_parameters
  • num MC iterations
  • num_random_samples (from cpp_wrappers.optimization_interface.ExpectedImprovementOptimizationParameters; also see views.gp_next_points_pretty_view.py)
  • constant liar values
  • kriging believer values

Note: some of these are already done but I'm just dumping out what I can think of.

Make project a pip/setuptools compatible project

We want to make the install as easy as

$ python setup.py install
or
$ pip install moe

with all of the trimmings of a full setuptools package. This should work with different prefixes etc.

The C++ should also build properly.
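
A minimal sketch of the setup.py this implies (the metadata and requirements here are placeholders, and the C++ extension build is elided):

from setuptools import setup, find_packages

setup(
    name='MOE',
    version='0.1.0',  # placeholder version
    description='Metric Optimization Engine',
    packages=find_packages(),
    install_requires=[
        # e.g. 'numpy', 'scipy', 'pyramid', 'colander', ...
    ],
    # The C++ extension would be added via ext_modules or a custom build step.
)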

Fix logging.

Right now we do a combination of stdout and file logging. We also have different ways of calling logging (importing at the file level, setting things up there, and passing loggers around). The Python and C++ also handle things differently.

We should unify all logging.

  1. Make logging consistent across all python.
  2. Unify how logging can be turned on/off and levels set and where it goes across python and cpp.
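
For the Python side, a minimal sketch of a consistent pattern we could standardize on (a module-level logger everywhere, with configuration done once at application startup):

import logging

# In every module:
logger = logging.getLogger(__name__)

# Once, at application startup (e.g., in the pserve entry point):
def configure_logging(level=logging.INFO, filename=None):
    """Configure root logging once; individual modules just use logging.getLogger(__name__)."""
    logging.basicConfig(
        level=level,
        filename=filename,  # None means log to stderr
        format='%(asctime)s %(name)s %(levelname)s %(message)s',
    )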

Make sane defaults for all "normal" use cases.

We should have good defaults for

  • EI opt GD and/or Newton
  • Hyper opt GD and/or Newton
  • Testing/demo opt
  • Python vs C++

Defaults for everything that we think the user will want to do, optimizing for what they care about (speed vs accuracy, maybe even having two sets of defaults, or keyed on tolerance).
