slots

A multi-armed bandit library for Python

Slots is intended to be a basic, very easy-to-use multi-armed bandit library for Python.

Author

Roy Keyes -- roy.coding@gmail

License: MIT

See LICENSE.txt

Introduction

slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with n choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest payout over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration" (testing different arms in order to find the best one) and "exploitation" (using the best known choice). There are many variations of this problem; see here for more background.
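To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy simulation in plain Python. It is a sketch that does not use slots and is not taken from its source; the function name `epsilon_greedy` and its parameters are hypothetical, and the arms are assumed to pay out 0 or 1.

```python
import random


def epsilon_greedy(probs, trials=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on win/lose arms; return (pulls, wins) per arm."""
    rng = random.Random(seed)
    n = len(probs)
    pulls = [0] * n
    wins = [0] * n
    for _ in range(trials):
        if sum(pulls) == 0 or rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the arm with the highest observed win rate.
            rates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
            arm = max(range(n), key=rates.__getitem__)
        pulls[arm] += 1
        # Simulated payout: 1 with probability probs[arm], else 0.
        wins[arm] += int(rng.random() < probs[arm])
    return pulls, wins


pulls, wins = epsilon_greedy([0.4, 0.9, 0.8])
print("pulls per arm:", pulls)
print("wins per arm: ", wins)
```

With a small epsilon most pulls go to whichever arm currently looks best, while the occasional random pull keeps the estimates for the other arms from going stale.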

slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:

Using slots to determine the best of 3 variations on a live website.

import slots

mab = slots.MAB(3, live=True)

Make the first choice randomly, record the response, and input the reward. In this example, arm 2 was chosen. Run online_trial (inputting the most recent result) until the test criterion is met.

mab.online_trial(bandit=2, payout=1)

The response of mab.online_trial() is a dict of the form:

{'new_trial': boolean, 'choice': int, 'best': int}

Where:

  • If the criterion is met, new_trial = False.
  • choice is the current choice of arm to try.
  • best is the current best estimate of the highest payout arm.
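The loop implied by this response format might look like the sketch below. It relies only on the `online_trial(bandit=..., payout=...)` call and the three dict keys described above; the helper `run_live_test`, the `get_payout` callback, and the `max_trials` safety cap are hypothetical names introduced for illustration.

```python
def run_live_test(mab, get_payout, first_choice, max_trials=10000):
    """Feed results to mab.online_trial until it reports the test is done.

    mab          -- object with an online_trial(bandit=..., payout=...) method
    get_payout   -- hypothetical callback: show arm `choice`, return the payout
    first_choice -- arm to try first (e.g. chosen at random)
    """
    choice = first_choice
    best = None
    for _ in range(max_trials):
        payout = get_payout(choice)
        result = mab.online_trial(bandit=choice, payout=payout)
        best = result['best']
        if not result['new_trial']:
            # The stopping criterion has been met.
            break
        # Otherwise try the arm suggested for the next trial.
        choice = result['choice']
    return best
```

In a real deployment `get_payout` would serve the chosen variation to a visitor and report whether it converted; the cap simply guards against a criterion that is never met.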

To test strategies on arms with pre-set probabilities:

# Try 3 bandits with arbitrary win probabilities
b = slots.MAB(3, live=False)
b.run()

To inspect the results and compare the estimated win probabilities versus the true win probabilities:

# Current best guess
b.best()
> 0

# Estimate of the payout probabilities
b.est_probs()
> array([ 0.83888149,  0.78534031,  0.32786885])

# Ground truth payout probabilities (if known)
b.bandits.probs
> [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]

By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
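For orientation, the arm-selection rules behind the other three strategies can be sketched in plain Python as follows. These are common textbook forms, not code from slots; the function names are hypothetical, and treating the Bayesian bandit as Thompson sampling with a uniform Beta prior is an assumption.

```python
import math
import random


def softmax_choice(means, temperature=0.1, rng=random):
    """Pick an arm with probability proportional to exp(mean / temperature)."""
    weights = [math.exp(m / temperature) for m in means]
    return rng.choices(range(len(means)), weights=weights)[0]


def ucb1_choice(wins, pulls):
    """Pick the arm with the highest upper confidence bound (UCB1)."""
    # Every arm must be tried once before the bound is defined.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    total = sum(pulls)
    scores = [w / n + math.sqrt(2 * math.log(total) / n)
              for w, n in zip(wins, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)


def bayesian_choice(wins, pulls, rng=random):
    """Thompson sampling: draw from each arm's Beta posterior, pick the max."""
    samples = [rng.betavariate(w + 1, n - w + 1) for w, n in zip(wins, pulls)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Each function only chooses the next arm; the caller is expected to pull that arm and update the `wins` and `pulls` counts, as in any of the simulated trials above.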

Regret analysis

A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of payouts (wins) that have been lost by using the actual sequence of pulls rather than always pulling the currently best known arm. The current regret value can be calculated by calling the mab.regret() method.
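One plausible reading of that definition, written out in plain Python, is sketched below. The standalone `regret` function and its handling of unpulled arms are assumptions made for illustration; the exact formula used by mab.regret() may differ.

```python
def regret(pulls, wins):
    """Fraction of payouts lost relative to always pulling the best known arm."""
    total_pulls = sum(pulls)
    if total_pulls == 0:
        return 0.0
    # Observed win rate of each arm that has been pulled at least once.
    rates = [w / p for w, p in zip(wins, pulls) if p > 0]
    best_rate = max(rates)
    # Wins expected had every pull gone to the best known arm, minus actual wins.
    lost = total_pulls * best_rate - sum(wins)
    return lost / total_pulls


# Two arms pulled 10 times each, with 2 and 8 wins respectively.
print(regret([10, 10], [2, 8]))
```

Here the best known arm wins 80% of the time, so 20 pulls of it would be expected to yield 16 wins; the actual 10 wins leave 6 of 20 payouts lost.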

For example, the regret curves for several different MAB strategies can be generated as follows:

import matplotlib.pyplot as plt
import slots

# Test multiple strategies for the same bandit probabilities
probs = [0.4, 0.9, 0.8]

# Raw strings keep the LaTeX backslashes in the labels intact
strategies = [{'strategy': 'eps_greedy', 'regret': [],
               'label': r'$\epsilon$-greedy ($\epsilon$=0.1)'},
              {'strategy': 'softmax', 'regret': [],
               'label': r'Softmax ($T$=0.1)'},
              {'strategy': 'ucb', 'regret': [],
               'label': 'UCB1'},
              {'strategy': 'bayesian', 'regret': [],
               'label': 'Bayesian bandit'},
              ]

# One MAB instance per strategy, all with the same arm probabilities
for s in strategies:
    s['mab'] = slots.MAB(probs=probs, live=False)

# Run trials and calculate the regret after each trial
for t in range(10000):
    for s in strategies:
        s['mab']._run(s['strategy'])
        s['regret'].append(s['mab'].regret())

# Pretty plotting
# (matplotlib >= 3.6 names these styles 'seaborn-v0_8-poster' and 'seaborn-v0_8-whitegrid')
plt.style.use(['seaborn-poster','seaborn-whitegrid'])

plt.figure(figsize=(15,4))

for s in strategies:
    plt.plot(s['regret'], label=s['label'])

plt.legend()
plt.xlabel('Trials')
plt.ylabel('Regret')
plt.title('Multi-armed bandit strategy performance (slots)')
plt.ylim(0, 0.2)

Regret plot

API documentation

For documentation on the slots API, see slots-docs.md.

Todo list:

  • More MAB strategies
  • Argument to save regret values after each trial in an array.
  • TESTS!

Contributing

I welcome contributions, though the pace of development is highly variable. Please file issues and submit pull requests as appropriate.

The current development environment uses:

  • pytest >= 5.3 (5.3.2)
  • black >= 19.1 (19.10b0)
  • mypy = 0.761

You can install these with pip from dev-requirements.txt (e.g. pip install -r dev-requirements.txt).

For mypy config, see mypy.ini. For black config, see pyproject.toml.

slots's People

Contributors

jcbozonier, roycoding, sinanh, szeitlin, zd123

slots's Issues

Fix `true_divide` `RuntimeWarning`

slots.py:377: RuntimeWarning: invalid value encountered in true_divide
  sum(self.pulls) * np.max(np.nan_to_num(self.wins / self.pulls))
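The warning is what NumPy emits for a 0/0 division, which would happen here for any arm that has not been pulled yet. One possible way to avoid it, sketched below, is to divide only where the pull count is positive. The helper `safe_win_rates` is hypothetical and is not the fix adopted in slots.

```python
import numpy as np


def safe_win_rates(wins, pulls):
    """Per-arm win rates, with 0.0 for arms that have never been pulled."""
    wins = np.asarray(wins, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    rates = np.zeros_like(wins)
    # Divide only where pulls > 0; other entries keep their initial 0.0,
    # so no 0/0 is evaluated and no RuntimeWarning is raised.
    np.divide(wins, pulls, out=rates, where=pulls > 0)
    return rates


print(safe_win_rates([3, 0, 1], [4, 0, 2]))
```

The result matches np.nan_to_num(wins / pulls) for these inputs, but without computing the invalid values first.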

Refactor for fixed and variable payouts

Refactor to handle historical and future payouts for both fixed payout values (binary and numerical) and variable payouts.

  • For live trials, handle binary or numerical payouts.
  • For off-line trials, handle fixed binary or numerical payouts and variable numerical payouts.

This probably requires a relatively significant API change.

Basic defaults result in an error

Running

mab = slots.MAB()
mab.run()

results in an error.

This seems to be due to how the defaults are handled for a "live" trial.

live=False might be a better default?

Add tests

We need tests!

  • I am planning to use pytest.
  • Asymptotic tests would be nice for the MAB strategies.
