twni2016 / memory-rl
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment, NeurIPS 2023 (oral)

Home Page: https://arxiv.org/abs/2307.03864

License: MIT License

Languages: Python 95.60%, Jupyter Notebook 4.40%
Topics: credit-assignment, memory, pomdp, transformer, dqn

memory-rl's Introduction

Evaluating Memory and Credit Assignment in Memory-Based RL

This is the official code for the paper (Sections 5.1 & 5.2: discrete control)

"When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment", NeurIPS 2023 (oral)

by Tianwei Ni, Michel Ma, Benjamin Eysenbach, and Pierre-Luc Bacon.

Please switch to the dedicated branch for the code for Section 5.3 (PyBullet continuous control).

Modular Design

The code has a modular design that requires three configuration files. We hope this design facilitates future research on different environments, RL algorithms, and sequence models (a hypothetical config sketch follows the list below).

  • config_env: specify the environment, with config_env.env_name specifying the exact (memory / credit assignment) length of the task
    • Passive T-Maze (this work)
    • Active T-Maze (this work)
    • Passive Visual Match (based on [Hung et al., 2018])
    • Key-to-Door (based on [Raposo et al., 2021])
  • config_rl: specify the RL algorithm and its hyperparameters
    • DQN (with epsilon-greedy exploration)
    • SAC-Discrete (we find that --freeze_critic can prevent gradient explosion; see the discussion in Appendix C.1 of the latest arXiv version of the paper)
  • config_seq: specify the sequence model and its hyperparameters, including the training sequence length (config_seq.sampled_seq_len) and the number of layers (config_seq.model.seq_model_config.n_layer)
    • LSTM [Hochreiter and Schmidhuber, 1997]
    • Transformer (GPT-2) [Radford et al., 2019]
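
As a reference for adding new tasks, the flag syntax above (e.g. --config_env.env_name 50) suggests ml_collections-style config files that expose a get_config() function. The sketch below is a hypothetical illustration of that structure, not one of the repository's actual files (those live under configs/):

    from ml_collections import ConfigDict

    def get_config():
        # Hypothetical environment config, for illustration only;
        # the real environment configs live in configs/envs/.
        config = ConfigDict()
        # For the T-Maze / Visual Match / Key-to-Door tasks, env_name encodes the
        # memory or credit-assignment length and can be overridden on the command
        # line, e.g. --config_env.env_name 50.
        config.env_name = 50
        return config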

Installation

We use Python 3.7+; the basic requirements are listed in requirements.txt.
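
A typical setup, assuming a fresh virtual environment (adjust to your environment manager of choice):

    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt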

Reproducing the Results

Below are example commands to reproduce the main results shown in Figures 3 and 6. For the ablation results, please adjust the corresponding hyperparameters.

To run Passive T-Maze with a memory length of 50 with an LSTM-based agent:

python main.py \
    --config_env configs/envs/tmaze_passive.py \
    --config_env.env_name 50 \
    --config_rl configs/rl/dqn_default.py \
    --train_episodes 20000 \
    --config_seq configs/seq_models/lstm_default.py \
    --config_seq.sampled_seq_len -1

To run Passive T-Maze with a memory length of 1500 with a Transformer-based agent:

python main.py \
    --config_env configs/envs/tmaze_passive.py \
    --config_env.env_name 1500 \
    --config_rl configs/rl/dqn_default.py \
    --train_episodes 6700 \
    --config_seq configs/seq_models/gpt_default.py \
    --config_seq.sampled_seq_len -1

To run Active T-Maze with a memory length of 20 with a Transformer-based agent:

python main.py \
    --config_env configs/envs/tmaze_active.py \
    --config_env.env_name 20 \
    --config_rl configs/rl/dqn_default.py \
    --train_episodes 40000 \
    --config_seq configs/seq_models/gpt_default.py \
    --config_seq.sampled_seq_len -1 \
    --config_seq.model.seq_model_config.n_layer 2 \
    --config_seq.model.seq_model_config.n_head 2

To run Passive Visual Match with a memory length of 60 with a Transformer-based agent:

python main.py \
    --config_env configs/envs/visual_match.py \
    --config_env.env_name 60 \
    --config_rl configs/rl/sacd_default.py \
    --shared_encoder --freeze_critic \
    --train_episodes 40000 \
    --config_seq configs/seq_models/gpt_cnn.py \
    --config_seq.sampled_seq_len -1

To run Key-to-Door with a memory length of 120 with an LSTM-based agent:

python main.py \
    --config_env configs/envs/keytodoor.py \
    --config_env.env_name 120 \
    --config_rl configs/rl/sacd_default.py \
    --shared_encoder --freeze_critic \
    --train_episodes 40000 \
    --config_seq configs/seq_models/lstm_cnn.py \
    --config_seq.sampled_seq_len -1 \
    --config_seq.model.seq_model_config.n_layer 2

To run Key-to-Door with a memory length of 250 with a Transformer-based agent:

python main.py \
    --config_env configs/envs/keytodoor.py \
    --config_env.env_name 250 \
    --config_rl configs/rl/sacd_default.py \
    --shared_encoder --freeze_critic \
    --train_episodes 30000 \
    --config_seq configs/seq_models/gpt_cnn.py \
    --config_seq.sampled_seq_len -1 \
    --config_seq.model.seq_model_config.n_layer 2 \
    --config_seq.model.seq_model_config.n_head 2

The train_episodes budget for each task is specified in budget.py.

By default, logging data is stored in the logs/ folder in CSV format. If you use the --debug flag, it is stored in the debug/ folder instead.
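
Since the logs are plain CSV files, they can also be inspected outside the provided notebook. Below is a minimal sketch using pandas; the file path and column names here are placeholders, so check your logs/ directory for the actual run folder and metric names:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Placeholder path and column names -- inspect your own logs/ directory.
    df = pd.read_csv("logs/<run_dir>/progress.csv")
    print(df.columns)  # see which metrics were actually logged

    # Plot one (assumed) metric against (assumed) environment steps.
    df.plot(x="env_steps", y="return", legend=False)
    plt.xlabel("environment steps")
    plt.ylabel("episode return")
    plt.show()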

Logging and Plotting

After the logging data is stored, you can plot the learning curves and aggregation plots (e.g., Figures 3 and 6) using the vis.ipynb Jupyter notebook.

We also share the logging data used in the paper on Google Drive (< 400 MB).

Acknowledgement

The code is largely based on prior works.

Questions

If you have any questions, please raise an issue (preferred) or send an email to Tianwei ([email protected]).

memory-rl's Issues

Movement Penalty in TMaze

Hi, thanks for releasing this! I'm curious about the reward function used in the main TMaze results. Maybe I'm misunderstanding the way the active/passive versions disentangle credit assignment.

It seems like the goal of both environments is to unit-test whether a Transformer can learn to recall information from the first (passive) or third (active, after going backwards first) observation at the final timestep. The penalty is meant to remove the sparse exploration problem of navigating to the goal position. You implement that penalty as rew = float(x < time_step - oracle_length) * penalty, which is active whenever the policy falls behind the pace of the optimal policy. Since the policy has no way to make up that pace, it is penalized from the first timestep it disagrees with the optimal policy onward. Wouldn't we be testing the same recall ability on the final timestep if the penalty were instead rew = float(action != right and time_step > oracle_length) * penalty? Both provide dense signal to move towards the goal, but the second version can provide that signal in any sub-optimal episode, whereas the original version basically seems like kryptonite for epsilon-greedy exploration and forces an unusually low-epsilon schedule?
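
For concreteness, the two formulations discussed above can be written side by side. This is a sketch that mirrors the snippets quoted in the question; the variable names (x for the corridor position, time_step, oracle_length, action, right, and a negative penalty constant) are taken from the issue text, not verified against the repository's implementation:

    # Formulation quoted in the issue: penalize the agent on every step where it has
    # fallen behind the pace of the optimal policy. Since a lost step can never be
    # recovered, the penalty persists from the first sub-optimal step onward.
    def penalty_behind_pace(x, time_step, oracle_length, penalty):
        return float(x < time_step - oracle_length) * penalty

    # Alternative proposed in the issue: penalize only the steps (after the oracle
    # phase) on which the agent does not move right, which still gives a dense
    # signal toward the goal in any sub-optimal episode.
    def penalty_not_moving_right(action, right, time_step, oracle_length, penalty):
        return float(action != right and time_step > oracle_length) * penalty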

Port for Minigrid Environments

Hello and congrats for the nice paper!

I am working on my master's thesis and am using your code as my codebase. However, I have integrated the Minigrid environments and use them instead of the custom mazes you created.

Now, if I understand correctly, the concept of "memory" refers to the number of timesteps of data the policy is fed (for each batch).

If I use a Minigrid environment, should the code theoretically work as is?
If not, what changes should I make?

Thanks in advance!
