

The LoCA Regret:

A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

We introduce an experimental setup to evaluate the model-based behavior of RL methods, inspired by work from neuroscience on detecting model-based behavior in humans and animals.

Setup

Task A and task B have the same transition dynamics, but a reward function that is locally different. An experiment consists of first pretraining a method on task A, followed by local pretraining around T1 of task B. After pretraining, the agent is trained and evaluated on the full environment of task B. The additional training a method needs before it has fully adapted to task B determines the size of the LoCA regret.
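
To make the idea of "additional training needed" concrete, here is a minimal sketch that scores an evaluation curve recorded while training on task B. It is an illustration only, not the repository's implementation: the function name `loca_regret_sketch`, the normalization to an optimal score of 1.0, and the two example curves are all assumptions made for this example.

```python
from typing import Sequence


def loca_regret_sketch(eval_scores: Sequence[float], optimal_score: float = 1.0) -> float:
    """Sum the shortfall from optimal performance over task-B evaluation points.

    Assumption: each entry of eval_scores is a normalized performance measure
    taken at regular intervals during training on task B.
    """
    return sum(max(0.0, optimal_score - score) for score in eval_scores)


# Hypothetical evaluation curves, invented purely for illustration.
fast_adapter = [0.2, 0.9, 1.0, 1.0, 1.0]  # adapts to task B almost immediately
slow_adapter = [0.0, 0.1, 0.3, 0.6, 1.0]  # needs much more training on task B

print("fast adapter:", loca_regret_sketch(fast_adapter))
print("slow adapter:", loca_regret_sketch(slow_adapter))
```

Under this sketch, a method that adapts quickly after the local pretraining accumulates a small value, while a method that has to relearn from the full environment accumulates a larger one.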


Requirements:

This code is tested with Python 3.7. To install the required packages, run:

pip install -r requirements.txt

Tabular Experiments


Usage:

First, change to the LoCA_tabular directory:

cd LoCA_tabular

Then use the following commands to run training or plot the results:

  • Training: python main.py
  • Visualize results: python show_results.py

Parameters           Description
--method             {1: mb_vi, 2: mb_su, 3: mb_nstep, 4: sarsa_lambda, 5: qlearning}  algorithm to run
--LoCA_pretraining   {False, True}  skip the pretraining phase
--alpha_multp        step-size parameter, any value > 0
--S_multp            {1, 2, 3, ...}  artificially increases the size of the state space
--n_step             1  only relevant when method = 3 (mb_nstep)
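
As a rough illustration of how the parameters above might be combined, the sketch below builds (but does not run) command lines for a small sweep. It assumes that `--method` accepts the integer IDs shown in the table and that `--S_multp` and `--n_step` take plain integers on the command line. The helper `build_command` is hypothetical and not part of the repository.

```python
import itertools

# Method IDs as listed in the parameter table above.
METHODS = {1: "mb_vi", 2: "mb_su", 3: "mb_nstep", 4: "sarsa_lambda", 5: "qlearning"}


def build_command(method, s_multp=1, n_step=None):
    """Return an argument list for main.py (hypothetical helper)."""
    args = ["python", "main.py", "--method", str(method), "--S_multp", str(s_multp)]
    if n_step is not None:
        # The table states that --n_step only matters for method 3 (mb_nstep).
        if METHODS[method] != "mb_nstep":
            raise ValueError("--n_step is only relevant when method = 3 (mb_nstep)")
        args += ["--n_step", str(n_step)]
    return args


# Sweep every method over two state-space multipliers, then add one n-step run.
commands = [build_command(m, s) for m, s in itertools.product(METHODS, (1, 2))]
commands.append(build_command(3, n_step=1))

for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell from inside LoCA_tabular, or passed to `subprocess.run` as a list.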

MountainCar Experiments

We adapted the MountainCar environment for computing the LoCA regret. In our variation, the existing terminal state at the top of the hill corresponds to T1; we added a second terminal state, T2, that corresponds to the cart being at the bottom of the hill with a velocity close to 0.
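
The two terminal states can be pictured with a small check like the one below. This is a sketch under stated assumptions, not the repository's code: the goal position 0.5 and the sin(3x) height profile are those of the standard Gym MountainCar-v0 task, while the tolerances `POSITION_TOL` and `VELOCITY_TOL` and the function `terminal_state` are hypothetical values and names chosen for illustration.

```python
import math

# Assumed constants; none of these are taken from the repository.
GOAL_POSITION = 0.5              # hilltop goal in Gym's MountainCar-v0
VALLEY_POSITION = -math.pi / 6   # minimum of the sin(3x) height profile, about -0.52
POSITION_TOL = 0.05              # hypothetical tolerance for "at the bottom of the hill"
VELOCITY_TOL = 0.005             # hypothetical tolerance for "velocity close to 0"


def terminal_state(position, velocity):
    """Return "T1", "T2", or None for a non-terminal state."""
    if position >= GOAL_POSITION:
        return "T1"  # existing terminal state at the top of the hill
    near_valley = abs(position - VALLEY_POSITION) <= POSITION_TOL
    near_rest = abs(velocity) <= VELOCITY_TOL
    if near_valley and near_rest:
        return "T2"  # added terminal state at the bottom of the hill
    return None


print(terminal_state(0.5, 0.0))
print(terminal_state(-0.52, 0.001))
print(terminal_state(-0.52, 0.03))
```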


Usage:

First, change to the LoCA_MountainCar directory:

cd LoCA_MountainCar
  • Pre-training + Training: python main.py --method sarsa_lambda --env MountainCar
  • Pre-training with shuffled actions + Training: python main.py --env MountainCar --flipped_actions
  • Training: python main.py --env MountainCar --no_pre_training
  • Visualize results: python show_results.py
  • Visualize MuZero results: tensorboard --logdir=/results

Arguments            Description
--method             {sarsa_lambda, MuZero}  name of the algorithm
--env                {MountainCar}  name of the environment
--no_pre_training    skip the pretraining phase
--flipped_actions    pretrain with shuffled actions to cancel the effect of model learning

The MuZero code for the experiments is adopted from here.

Citation:

If you found this work useful, please consider citing our paper.

@article{van2020loca,
  title={The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning},
  author={Van Seijen, Harm and Nekoei, Hadi and Racah, Evan and Chandar, Sarath},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={6562--6572},
  year={2020}
}
