eytancanzini / neural_exploration

This project forked from sauxpa/neural_exploration

Study of NeuralUCB and regret analysis for contextual bandits with neural decision-making.

neural_exploration

Contextual bandits are single-state decision processes with the assumption that the rewards for each arm at each step are generated from a (possibly noisy) function of observable features. Similarly, contextual MDPs offer a setting for reinforcement learning where rewards and transition probabilities can be inferred from vector features.
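For contextual MDPs, a common modelling choice (the linear MDP assumption) writes the transition kernel as P(s' | s, a) = ⟨φ(s, a), μ(s')⟩. The snippet below is a minimal sketch, not code from this repo: it assumes finite state and action spaces, features φ(s, a) lying on the probability simplex and a hypothetical matrix `mu` whose rows are distributions over next states. Together these guarantee that every P[s, a, :] is a valid distribution.

```python
import numpy as np


def linear_transition_kernel(phi, mu):
    """Return P[s, a, s'] = <phi[s, a], mu[:, s']> for a linear MDP.

    phi: array of shape (S, A, d); each phi[s, a] lies on the probability simplex.
    mu:  array of shape (d, S); each row mu[k] is a distribution over next states.
    """
    return np.einsum("sad,dt->sat", phi, mu)


rng = np.random.default_rng(0)
S, A, d = 4, 2, 3
# Hypothetical random features and factor distributions, for illustration only.
phi = rng.dirichlet(np.ones(d), size=(S, A))  # shape (S, A, d)
mu = rng.dirichlet(np.ones(S), size=d)        # shape (d, S)
P = linear_transition_kernel(phi, mu)
print(P.shape, np.allclose(P.sum(axis=-1), 1.0))  # (4, 2, 4) True
```

Because each row of `mu` sums to one and each φ(s, a) sums to one, the rows of P sum to one without any renormalization.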

The literature has focused on optimistic exploration bounds under the assumption that rewards depend linearly on the features, resulting in celebrated algorithms such as LinUCB (bandits) and LinUCB-VI (fixed-horizon RL with value iteration).
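The LinUCB rule mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a single parameter vector shared across arms (each arm contributes its own feature vector), a ridge regularizer `reg` and an exploration weight `alpha`; both names are hypothetical and the repo's implementation may differ. The arm picked is the one maximizing xᵀθ̂ + α·sqrt(xᵀA⁻¹x), with A = reg·I + Σ x xᵀ and θ̂ = A⁻¹b.

```python
import numpy as np


class LinUCB:
    """Minimal LinUCB sketch with one parameter vector shared across arms."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A_inv = np.eye(dim) / reg  # inverse of the regularized Gram matrix A
        self.b = np.zeros(dim)          # running sum of reward * feature

    def select(self, features):
        """features: array of shape (n_arms, dim). Returns the chosen arm index."""
        theta_hat = self.A_inv @ self.b
        means = features @ theta_hat
        # Per-arm confidence width sqrt(x^T A^{-1} x).
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, self.A_inv, features))
        return int(np.argmax(means + self.alpha * widths))

    def update(self, x, reward):
        """Sherman-Morrison rank-one update of A^{-1} after observing (x, reward)."""
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b += reward * x


# Hypothetical linear bandit: expected reward of arm a is x_a^T theta_star.
rng = np.random.default_rng(0)
dim, n_arms, T = 5, 10, 2000
theta_star = rng.normal(size=dim)
theta_star /= np.linalg.norm(theta_star)
agent = LinUCB(dim, alpha=1.0)
regret = 0.0
for t in range(T):
    X = rng.normal(size=(n_arms, dim)) / np.sqrt(dim)
    expected = X @ theta_star
    a = agent.select(X)
    agent.update(X[a], expected[a] + 0.1 * rng.normal())
    regret += expected.max() - expected[a]
print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

The Sherman-Morrison update keeps each round at O(d²) instead of re-inverting A.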

Recently, https://arxiv.org/pdf/1911.04462.pdf introduced NeuralUCB, an optimistic (upper confidence bound) exploration algorithm that leverages deep neural networks as universal function approximators to relax the linearity assumption.
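The NeuralUCB idea of replacing the linear feature x by the network gradient g(x; θ) in the confidence width can be sketched as follows. This is a simplified illustration, not the paper's exact procedure nor this repo's code: it assumes a one-hidden-layer ReLU network, a fixed exploration weight `gamma`, and plain SGD on the squared loss (the paper's L2 penalty towards the initial weights is omitted). The class name `NeuralUCBSketch` and all hyperparameters are hypothetical.

```python
import numpy as np


class NeuralUCBSketch:
    """Simplified NeuralUCB with a one-hidden-layer ReLU network f(x; theta).

    Arm score: f(x; theta) + gamma * sqrt(g(x)^T Z^{-1} g(x) / m), where g(x) is the
    gradient of f w.r.t. all parameters, m the hidden width, and
    Z = lam * I + sum over played arms of g g^T / m.
    """

    def __init__(self, dim, width=32, gamma=1.0, lam=1.0, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.m, self.gamma, self.lr = width, gamma, lr
        self.W1 = self.rng.normal(scale=np.sqrt(2.0 / dim), size=(width, dim))
        self.w2 = self.rng.normal(scale=np.sqrt(1.0 / width), size=width)
        self.Z_inv = np.eye(width * dim + width) / lam
        self.X, self.y = [], []

    def _forward_and_grad(self, x):
        pre = self.W1 @ x
        h = np.maximum(pre, 0.0)
        # df/dW1[j, :] = w2[j] * 1[pre_j > 0] * x  and  df/dw2 = h
        grad_W1 = np.outer(self.w2 * (pre > 0), x)
        return self.w2 @ h, np.concatenate([grad_W1.ravel(), h])

    def _bonus(self, g):
        return self.gamma * np.sqrt(max(g @ self.Z_inv @ g, 0.0) / self.m)

    def bonus(self, x):
        return self._bonus(self._forward_and_grad(x)[1])

    def select(self, features):
        scores = []
        for x in features:
            f, g = self._forward_and_grad(x)
            scores.append(f + self._bonus(g))
        return int(np.argmax(scores))

    def update(self, x, reward, n_steps=20):
        # Z <- Z + g g^T / m, applied to Z^{-1} via Sherman-Morrison.
        _, g = self._forward_and_grad(x)
        u = g / np.sqrt(self.m)
        Zu = self.Z_inv @ u
        self.Z_inv -= np.outer(Zu, Zu) / (1.0 + u @ Zu)
        self.X.append(x)
        self.y.append(reward)
        # Plain SGD on the squared loss over the collected history.
        n_w1 = self.W1.size
        for _ in range(n_steps):
            i = self.rng.integers(len(self.X))
            f, g = self._forward_and_grad(self.X[i])
            step = self.lr * (f - self.y[i]) * g
            self.W1 -= step[:n_w1].reshape(self.W1.shape)
            self.w2 -= step[n_w1:]


# Hypothetical nonlinear bandit: expected reward cos(3 * x^T a).
rng = np.random.default_rng(1)
dim, n_arms, T = 5, 10, 300
a = rng.normal(size=dim)
a /= np.linalg.norm(a)
agent = NeuralUCBSketch(dim, gamma=0.5)
regret = 0.0
for t in range(T):
    X = rng.normal(size=(n_arms, dim)) / np.sqrt(dim)
    expected = np.cos(3.0 * (X @ a))
    k = agent.select(X)
    agent.update(X[k], expected[k] + 0.1 * rng.normal())
    regret += expected.max() - expected[k]
print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

The bonus of an arm shrinks as similar gradients accumulate in Z, which is the same mechanism as LinUCB's width with x replaced by g(x; θ).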

The goal of this repo is to implement these methods, design synthetic bandits and MDPs to test the various algorithms, and introduce NeuralUCB-VI, a value iteration algorithm that uses neural approximators of the rewards and transition kernel for efficient exploration of fixed-horizon MDPs.
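Value-iteration methods of this kind share an optimistic backward induction step: Q_h = r̂_h + bonus_h + P̂ V_{h+1}, clipped at the remaining horizon, and V_h = max_a Q_h. The sketch below shows only that generic step, assuming finite states and actions, rewards in [0, 1], and already-computed estimates `r_hat`, `bonus` and `P_hat` (all hypothetical names). In NeuralUCB-VI those estimates would come from neural approximators; this is not the repo's implementation.

```python
import numpy as np


def optimistic_value_iteration(r_hat, bonus, P_hat, H):
    """Backward induction with an exploration bonus over a finite horizon H.

    r_hat, bonus: arrays of shape (H, S, A), estimated rewards and bonuses.
    P_hat: array of shape (S, A, S), estimated transition kernel.
    Returns optimistic Q-values of shape (H, S, A) and a greedy policy (H, S).
    """
    _, S, A = r_hat.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value beyond the horizon is zero
    for h in reversed(range(H)):
        # Optimistic Bellman backup, clipped at the largest achievable return H - h.
        Q[h] = np.minimum(r_hat[h] + bonus[h] + P_hat @ V_next, H - h)
        V_next = Q[h].max(axis=1)
    return Q, Q.argmax(axis=2)


rng = np.random.default_rng(0)
H, S, A = 5, 6, 3
r_hat = rng.uniform(size=(H, S, A))             # hypothetical reward estimates
P_hat = rng.dirichlet(np.ones(S), size=(S, A))  # hypothetical estimated kernel
bonus = 0.1 * np.ones((H, S, A))                # hypothetical constant bonus
Q_opt, policy = optimistic_value_iteration(r_hat, bonus, P_hat, H)
Q_plain, _ = optimistic_value_iteration(r_hat, np.zeros_like(bonus), P_hat, H)
```

Adding a nonnegative bonus can only raise the Q-values, which is what makes the greedy policy optimistic.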

Experiments

All methods are tested on three types of contextual rewards: linear, quadratic, and highly nonlinear (cosine).
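The three reward families can be illustrated with simple functions of a projection xᵀa. The exact forms, the frequency of the cosine, the feature distribution and the noise level used in the repo are not stated here, so everything below (including `sample_round`) is a hypothetical sketch of such a synthetic bandit.

```python
import numpy as np


# Hypothetical reward shapes; the exact functions used in the repo may differ.
def linear_reward(x, a):
    return x @ a


def quadratic_reward(x, a):
    return (x @ a) ** 2


def cosine_reward(x, a, freq=2.0 * np.pi):
    return np.cos(freq * (x @ a))


def sample_round(rng, a, h, n_arms=10, noise_std=0.1):
    """Draw one round: arm features, noiseless expected rewards, noisy rewards."""
    X = rng.normal(size=(n_arms, a.size)) / np.sqrt(a.size)
    expected = h(X, a)
    observed = expected + noise_std * rng.normal(size=n_arms)
    return X, expected, observed


rng = np.random.default_rng(0)
a = rng.normal(size=5)
a /= np.linalg.norm(a)
for h in (linear_reward, quadratic_reward, cosine_reward):
    X, expected, observed = sample_round(rng, a, h)
    print(h.__name__, "best arm:", int(np.argmax(expected)))
```

A linear model can represent the first family exactly, while the quadratic and especially the cosine rewards are not linear in x.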

For episodic MDPs, transition matrices are assumed to be linear in the features in all cases. LinUCB and LinUCB-VI perform well in the linear case (sublinear or even no regret growth), but they are slightly suboptimal in the quadratic case and fail completely in the presence of stronger nonlinearity. This is consistent with results from https://arxiv.org/abs/1907.05388 on approximately linear MDPs and https://arxiv.org/pdf/1911.00567.pdf on low-rank MDPs, which bound the performance of linear exploration as a function of the magnitude of the nonlinearity.
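The regret regimes described above (linear growth versus sublinear or no growth) can be checked numerically from a regret curve. A minimal sketch, assuming per-round expected rewards of the best and chosen arms are logged: the cumulative regret is R_T = Σ_t (best_t − chosen_t), and the slope of log R_t against log t is about 1 for linear growth, about 0.5 for √T growth, and near 0 when regret stops growing. The helper names are hypothetical.

```python
import numpy as np


def cumulative_regret(best_expected, chosen_expected):
    """R_t = sum_{s <= t} (best expected reward - chosen expected reward), for all t."""
    return np.cumsum(np.asarray(best_expected) - np.asarray(chosen_expected))


def growth_exponent(regret, burn_in=10):
    """Slope of log R_t vs log t: about 1 for linear growth, < 1 if sublinear."""
    t = np.arange(1, len(regret) + 1)
    mask = (t > burn_in) & (regret > 0)
    slope, _ = np.polyfit(np.log(t[mask]), np.log(regret[mask]), 1)
    return slope


T = 10_000
t = np.arange(1, T + 1)
linear_R = cumulative_regret(np.full(T, 0.5), np.zeros(T))        # constant gap
sqrt_R = cumulative_regret(1.0 / (2.0 * np.sqrt(t)), np.zeros(T))  # shrinking gap
print(round(growth_exponent(linear_R), 3), round(growth_exponent(sqrt_R), 2))
```

A constant per-round gap, as for a linear model facing strong nonlinearity, gives an exponent of 1, whereas a gap shrinking like 1/√t gives roughly 0.5.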

Neural exploration, on the other hand, relies on more expressive approximators that can predict rewards or Q-functions generated by more complicated functions of the features (given a wide or deep enough architecture, neural networks are universal approximators). NeuralUCB and NeuralUCB-VI explore efficiently and quickly reach optimality (no or very slow regret growth).

Contributors

sauxpa
