Coder Social home page Coder Social logo

ardaegeunlu / contextual-gaussian-process-bandit-optimization Goto Github PK

View Code? Open in Web Editor NEW
30.0 1.0 5.0 2.79 MB

Simple implementation of the CGP-UCB algorithm.

License: MIT License

Python 100.00%
multi-armed-bandit gaussian-processes machine-learning machine-learning-algorithms reinforcement-learning reinforcement-learning-algorithms

contextual-gaussian-process-bandit-optimization's Introduction

Contextual-Gaussian-Process-Bandit-Optimization

A straightforward implementation of the CGP-UCB algorithm [1].

CGP-UCB is an intuitive upper-confidence style algorithm, in which the payoff function is modeled as a sample from a Gaussian process defined over joint action-context space. It is shown that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications [2].

Dependencies

You need to have the following libraries.

GPy
matplotlib
numpy

How to Run

Define your Input Space

# create arrays of actions and contexts
actions = np.arange(-3, 3.25, 0.25)
contexts = np.arange(-3, 3.25, 0.25)

Create an Environment

environment = DummyEnvironment()

DummyEnvironment used for testing is given below.

class DummyEnvironment(object):
  def sample(self, x):
    return np.sin(x[0]) + np.cos(x[1])
  def sample_noisy(self, x):
    return [self.sample(x) + np.random.normal(loc=0.0, scale=0.02)]

Create a Kernel

Define a kernel using GPy Kernels or you can create one for yourself. Jupyter Tutorial on GPy Kernels GPy Documentation

# works on the first dim. of input_space, index=0
kernel1 = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1., active_dims=[0])
# works on the second dim. of input_space, index=1
kernel2 = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1., active_dims=[1])
# composite kernel by the product of two kernels.
kernel = kernel1 * kernel2

Initialize and Run

# initialize CGP-UCB
agent = CGPUCB(actions=actions, contexts=contexts, sample_from_environment=environment.sample_noisy, kernel=kernel)
rounds = 300
number_of_contexts = contexts.size
best_strategy_rewards = []
for i in range(rounds):
  # choose a random context.
  context_index = np.floor(np.random.rand()*number_of_contexts)
  # iterate learning algorithm for 1 round.
  agent.learn(context_index)
  # save best_strategy's reward for the current context. (Only used to plot regret.)
  best_strategy_rewards.append(environment.get_best_reward_in_context(context_index))

Some Tests and Plots

CGP-UCB is iterated for 300 learning rounds using the DummyEnvironment given above.

Regret Plot

The following code plots the regret. Note that plot_regret is not a part of the CGP-UCB class.

plot_regret(best=best_strategy_rewards, agent=agent.Y)

The sublinear regret shows that the CGP-UCB converges to the best action-context pair.

Wireframe Plot of Mean and Payoff Function

The following code generates a 3D plot of deduced means at all input-space points and the real environment function.

agent.plot_environment_and_mean()

Plotting Slices

You can fix either context or action at some fixed points, here we fix context to [-np.pi/2.0, 0, np.pi/2.0] to produce 3 seperate 2d plots.

agent.plot_slices(fixed_dimension=1, slices=[-np.pi/2.0, 0, np.pi/2.0])

Other Possible Plots

GPy offers a plethora of plotting options. CGP-UCB.gp attribute can be used to access them. The function "builtin_plot()" utilizes CGP-UCB.gp.plot() function with your choice of kwargs. The following code is used to create the contour plot below after 300 rounds.

agent.builtin_plot(projection='2d', title='GPy Contour Plot', xlabel='Actions', ylabel='Contexts')

A few Words about Composite Kernels

Additive Kernel

An alternative is to consider the additive combination (kS ⊕ kZ) of kernels. The intuition behind this construction is that a GP with additive kernel can be understood as a generative model, which first samples a function fS(s, z) that is constant along z, and various along s with regularity as expressed by ks; it then samples a function fz(s, z), which varies along z and is constant along s; then f = fs + fz. Thus, the fz component models overall trends according to the context (e.g., encoding assumptions about similarity within clusters of contexts), and the fS models action-specific deviation from this trend [3].

Product Kernel

The intuition behind this product kernel is a conjunction of the notions of similarities induced by the kernels over context and action spaces: Two context-action pairs are similar (large correlation) if the contexts are similar and actions are similar. Note that many kernel functions used in practice are already in product form. For example, if kZ and kS are squared exponential kernels, then the product k = kZ ⊗ kS is a squared exponential kernel [4].

Acknowledgments

Reference

[1,2,3,4] Krause, A., Ong, C.S. (2011). Contextual Gaussian Process Bandit Optimization. Advances in Neural Information Processing Systems 24 (NIPS 2011), pp.2447-2455.

Also check for implementation

Srinivas, N., Krause, A., Kakade, S. and Seeger, M. (2012). Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory, 58(5), pp.3250-3265.

Thanks to https://github.com/tushuhei/gpucb for initial template on GP-UCB.

contextual-gaussian-process-bandit-optimization's People

Contributors

ardaegeunlu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

contextual-gaussian-process-bandit-optimization's Issues

There is only one kernel function in this repo

It seems that we only use single RBF kernel on the actions here, but CGP-UCB requires two different kernel functions k_z :Z×Z→R and k_s : S × S → R on the space of contexts and actions, and use them to derive the kernel on the product space.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.