Training a Best Response with a Belief Model Hidden State Over a Pool of Hanabi Policies
This codebase is largely based on https://github.com/facebookresearch/hanabi_SAD, with some modifications. The code is set up to train a best response, using a belief model's hidden state, against a pool of 6 policies; it can be run via python pyhanabi/selfplay.py. Change line 151 in pyhanabi/selfplay.py to modify the 6 training-pool policies. Change line 146 in pyhanabi/selfplay.py to modify the belief model whose hidden state the best response uses; the belief model should be such that applying the softmax function to its hidden state yields a categorical distribution over the unobservable environment features.
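For concreteness, here is a minimal sketch of the property the belief model is assumed to satisfy. The names, shapes, and checkpoint paths below are purely hypothetical and are not taken from selfplay.py.
# Minimal sketch, not the actual selfplay.py code. Edit lines 151 and 146 of
# pyhanabi/selfplay.py to set the real policy pool and belief model.
import torch
import torch.nn.functional as F

# hypothetical pool of 6 pre-trained policy checkpoints
policy_pool = ["../models/pool_policy_%d.pthw" % i for i in range(6)]

# suppose the belief model's hidden state holds one logit vector per
# unobservable feature, e.g. per hidden card slot over card identities
belief_hidden = torch.randn(5, 25)  # 5 slots x 25 card types, illustrative only

# applying softmax along the last dimension should yield a valid categorical
# distribution over the unobservable environment features
belief_dist = F.softmax(belief_hidden, dim=-1)
assert torch.allclose(belief_dist.sum(-1), torch.ones(5))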
We have been using pytorch-1.5.1, cuda-10.1, and cudnn-v7.6.5 in our development environment. Other configurations may also work, but we have not tested them extensively. We also use conda/miniconda to manage environments.
# create new conda env
conda create -n hanabi python=3.7
conda activate hanabi
# install pytorch
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# install other dependencies
pip install numpy
pip install psutil
# if the current cmake version is < 3.15
conda install -c conda-forge cmake
For convenience, add the following lines to your .bashrc, after the line containing conda activate xxx.
# set path
CONDA_PREFIX=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CPATH=${CONDA_PREFIX}/include:${CPATH}
export LIBRARY_PATH=${CONDA_PREFIX}/lib:${LIBRARY_PATH}
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
# avoid tensor operations using all cpu cores
export OMP_NUM_THREADS=1
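If editing .bashrc is inconvenient, a roughly equivalent effect for PyTorch's CPU intra-op parallelism can be obtained from inside Python; this is an alternative sketch, not how this repo configures it.
# alternative to OMP_NUM_THREADS=1: limit PyTorch's intra-op CPU
# parallelism to a single thread before any heavy tensor work
import torch
torch.set_num_threads(1)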
Clone & build.
git clone --recursive https://github.com/facebookresearch/hanabi.git
cd hanabi
mkdir build
cd build
cmake ..
make -j10
hanabi/pyhanabi/tools contains example scripts for launching training runs. dev.sh is a fast-launching script for debugging; it needs 2 GPUs to run, 1 for training and 1 for simulation. The other scripts are examples of a more formal training run; they require 3 GPUs, 1 for training and 2 for simulation.
The important flags are:
--sad 1 to enable the Simplified Action Decoder;
--pred_weight 0.25 to enable the auxiliary task and scale the aux loss by 0.25 (a rough sketch follows the launch command below);
--shuffle_color 1 to enable other-play.
cd pyhanabi
sh tools/dev.sh
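As a rough, hypothetical illustration of how --pred_weight enters the objective (this is not the repo's actual loss code; the loss values are placeholders):
import torch

pred_weight = 0.25               # value passed via --pred_weight
rl_loss = torch.tensor(1.3)      # placeholder RL (TD) loss
aux_loss = torch.tensor(0.8)     # placeholder auxiliary prediction loss

# a non-zero pred_weight enables the auxiliary task and scales its loss;
# pred_weight 0 effectively disables it
total_loss = rl_loss + pred_weight * aux_loss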
Run the following commands to download the trained models used to produce the tables in the paper.
cd model
sh download.sh
To evaluate a model, simply run
cd pyhanabi
python tools/eval_model.py --weight ../models/sad_2p_10.pthw --num_player 2
The results on Hanabi can be further improved by running search on top of our agents; please refer to the paper and code for details. We have also open-sourced a single-agent implementation of R2D2, tested on Atari, here.
Use black to format Python code; run black *.py before pushing.
The root contains a .clang-format file that defines the coding style of this repo. Run the following commands before submitting a PR or pushing:
clang-format -i *.h
clang-format -i *.cc
Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.