
hanabi-br-withbelief

Training a Best Response with a Belief Model Hidden State Over a Pool of Hanabi Policies

This codebase is largely based on https://github.com/facebookresearch/hanabi_SAD, with some modifications. It is set up to train a best response that uses a belief model's hidden state, against a pool of 6 policies, and can be run with python pyhanabi/selfplay.py. To change the 6 policies in the training pool, edit line 151 of pyhanabi/selfplay.py. To change the belief model whose hidden state the best response uses, edit line 146 of pyhanabi/selfplay.py. The belief model should be one whose hidden state, after applying the softmax function, yields a categorical distribution over the unobservable environment features.
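The belief requirement can be illustrated with a small, pure-Python sketch. It assumes the hidden state is a flat list of logits. Splitting it into per-card-slot blocks (for example 5 hand slots x 25 card types) is a hypothetical layout, not necessarily what the belief model configured in selfplay.py uses. The uniform partner sampler is likewise an assumption about how a training pool might be used, not a description of the code at line 151.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: returns probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hidden_to_belief(hidden, num_slots=1):
    """Turn a belief model's hidden state into categorical distribution(s).

    num_slots=1 treats the whole hidden state as one categorical distribution.
    num_slots>1 is a hypothetical per-card-slot layout: the vector is split
    into equal blocks and each block is normalised separately.
    """
    if len(hidden) % num_slots != 0:
        raise ValueError("hidden state length must be divisible by num_slots")
    size = len(hidden) // num_slots
    return [softmax(hidden[i * size:(i + 1) * size]) for i in range(num_slots)]


def sample_partner(pool, rng):
    """Pick a training partner uniformly at random (assumed scheme)."""
    return rng.choice(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [f"policy_{i}" for i in range(6)]  # placeholder names for the 6 policies
    hidden = [rng.gauss(0.0, 1.0) for _ in range(5 * 25)]
    belief = hidden_to_belief(hidden, num_slots=5)
    print(sample_partner(pool, rng), [round(sum(row), 6) for row in belief])
```

Using the max-subtraction form of softmax keeps the result identical while avoiding overflow for large hidden-state values.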

Compile

We have been using pytorch-1.5.1, cuda-10.1, and cudnn-v7.6.5 in our development environment. Other settings may also work, but we have not tested them extensively. We use conda/miniconda to manage environments.

# create new conda env
conda create -n hanabi python=3.7
conda activate hanabi

# install pytorch
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install other dependencies
pip install numpy
pip install psutil

# if the current cmake version is < 3.15
conda install -c conda-forge cmake
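Once the packages are installed, a quick check can confirm that the interpreter and PyTorch build match the tested configuration. This is a hypothetical helper, not part of the repo. torch.__version__, torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.device_count() are standard PyTorch attributes; the check still runs, with a note, if torch cannot be imported.

```python
import sys

# Versions this README reports testing with (cuDNN 7.6.5 is reported as 7605).
TESTED = {"python": "3.7", "torch": "1.5.1", "cuda": "10.1", "cudnn": 7605}


def parse_version(version):
    """'1.5.1+cu101' -> (1, 5, 1); the local build suffix after '+' is ignored."""
    base = version.split("+")[0]
    return tuple(int(part) for part in base.split(".") if part.isdigit())


def environment_report():
    """Return human-readable lines comparing the installed stack to TESTED."""
    py = f"{sys.version_info[0]}.{sys.version_info[1]}"
    lines = [f"python {py} (tested {TESTED['python']})"]
    try:
        import torch  # imported lazily so the check still runs without torch
    except (ImportError, OSError):
        lines.append("torch is not installed or failed to load")
        return lines
    same = parse_version(torch.__version__) == parse_version(TESTED["torch"])
    note = "" if same else "  <- differs from tested version"
    lines.append(f"torch {torch.__version__} (tested {TESTED['torch']}){note}")
    lines.append(f"cuda {torch.version.cuda} (tested {TESTED['cuda']})")
    lines.append(f"cudnn {torch.backends.cudnn.version()} (tested {TESTED['cudnn']})")
    lines.append(f"visible GPUs: {torch.cuda.device_count()}")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```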

Clone & Build this repo

For convenience, add the following lines to your .bashrc, after the conda activate xxx line.

# set path
CONDA_PREFIX=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CPATH=${CONDA_PREFIX}/include:${CPATH}
export LIBRARY_PATH=${CONDA_PREFIX}/lib:${LIBRARY_PATH}
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}

# prevent tensor operations from using all CPU cores
export OMP_NUM_THREADS=1
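After opening a new shell, a small hypothetical diagnostic can confirm that these exports took effect. It only inspects environment variables (a plain dict can be passed in for testing) and assumes CONDA_PREFIX has been exported by conda activate, which is conda's normal behaviour.

```python
import os


def check_build_env(env=None):
    """Return a list of problems with the variables the .bashrc snippet sets."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return ["CONDA_PREFIX is not set (is the conda env activated?)"]
    # Each search path should contain the matching directory under the prefix.
    expected = {
        "CPATH": os.path.join(prefix, "include"),
        "LIBRARY_PATH": os.path.join(prefix, "lib"),
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib"),
    }
    problems = []
    for var, path in expected.items():
        # Normalise entries so 'prefix/bin/../include' matches 'prefix/include'.
        entries = [os.path.normpath(p) for p in env.get(var, "").split(os.pathsep) if p]
        if os.path.normpath(path) not in entries:
            problems.append(f"{var} does not contain {path}")
    if env.get("OMP_NUM_THREADS") != "1":
        problems.append("OMP_NUM_THREADS is not 1; tensor ops may use every CPU core")
    return problems


if __name__ == "__main__":
    for problem in check_build_env():
        print("warning:", problem)
```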

Clone & build.

git clone --recursive https://github.com/facebookresearch/hanabi.git

cd hanabi
mkdir build
cd build
cmake ..
make -j10

Run

hanabi/pyhanabi/tools contains some example scripts for launching training runs. dev.sh is a fast-launching script for debugging; it needs 2 GPUs, 1 for training and 1 for simulation. The other scripts are examples of more formal training runs and require 3 GPUs, 1 for training and 2 for simulation.
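The GPU requirements above can be checked before launching. This hypothetical helper encodes only the counts stated here (2 for dev.sh, 3 otherwise); assigning cuda:0 to training and the remaining devices to simulation is an assumed layout, and the actual device flags are whatever the scripts in tools/ pass.

```python
import os


def required_gpus(script):
    """GPU count stated in this README: dev.sh needs 2, the other scripts need 3."""
    return 2 if os.path.basename(script) == "dev.sh" else 3


def split_devices(num_gpus):
    """Assign the first GPU to training and the rest to simulation (assumed layout)."""
    if num_gpus < 2:
        raise ValueError("need at least one training GPU and one simulation GPU")
    return "cuda:0", [f"cuda:{i}" for i in range(1, num_gpus)]


def visible_gpu_count(env=None):
    """Count GPUs listed in CUDA_VISIBLE_DEVICES; None if the variable is unset."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


if __name__ == "__main__":
    script = "tools/dev.sh"
    needed = required_gpus(script)
    have = visible_gpu_count()
    if have is not None and have < needed:
        print(f"{script} needs {needed} GPUs but only {have} are visible")
    print(split_devices(needed))
```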

The important flags are:

--sad 1 to enable "Simplified Action Decoder";

--pred_weight 0.25 to enable the auxiliary task and scale the auxiliary loss by 0.25;

--shuffle_color 1 to enable other-play.
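A rough Python sketch of what the last two flags mean. The loss combination follows the description of --pred_weight. The colour permutation is a generic illustration of other-play colour shuffling; the function names, the (color, rank) card encoding, and drawing one permutation per episode are assumptions, not taken from the repo, and how the shuffle is applied inside the C++ environment is not shown.

```python
import random

NUM_COLORS = 5  # Hanabi has five colours


def total_loss(rl_loss, aux_loss, pred_weight):
    """RL loss plus the auxiliary loss scaled by --pred_weight (0 disables it)."""
    return rl_loss + pred_weight * aux_loss


def sample_color_permutation(rng):
    """Random relabelling of colours, assumed to be drawn once per episode."""
    perm = list(range(NUM_COLORS))
    rng.shuffle(perm)
    return perm


def permute_cards(cards, perm):
    """Apply the relabelling to the (color, rank) cards the agent observes."""
    return [(perm[color], rank) for color, rank in cards]


def unpermute_color(permuted_color, perm):
    """Map a colour chosen in permuted space (e.g. a colour hint) back to the real colour."""
    return perm.index(permuted_color)


if __name__ == "__main__":
    rng = random.Random(0)
    perm = sample_color_permutation(rng)
    print(total_loss(1.0, 2.0, 0.25), perm, permute_cards([(0, 1), (3, 5)], perm))
```

Mapping actions back through the inverse permutation keeps the agent's choices consistent with the real game while it only ever sees relabelled colours.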

cd pyhanabi
sh tools/dev.sh

Trained Models

Run the following command to download the trained models used to produce the tables in the paper.

cd model
sh download.sh

To evaluate a model, simply run

cd pyhanabi
python tools/eval_model.py --weight ../models/sad_2p_10.pthw --num_player 2
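eval_model.py reports its own statistics. The sketch below only shows how such a summary is typically computed, assuming you have a list of final per-episode scores; the mean / standard error / perfect-game breakdown is an assumption about what is useful to report, not the script's exact output format.

```python
import math
import statistics

MAX_SCORE = 25  # 5 colours x 5 ranks; a perfect game scores 25


def summarize_scores(scores):
    """Mean, standard error of the mean, and perfect-game rate of episode scores."""
    if len(scores) < 2:
        raise ValueError("need at least two episodes for a standard error")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    perfect = sum(1 for s in scores if s == MAX_SCORE) / len(scores)
    return mean, sem, perfect


if __name__ == "__main__":
    mean, sem, perfect = summarize_scores([25, 20, 25, 10])
    print(f"score: {mean:.2f} +/- {sem:.2f}, perfect: {perfect:.0%}")
```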

Related Repos

The results on Hanabi can be further improved by running search on top of our agents. Please refer to the paper and code for details.

We also open-sourced a single agent implementation of R2D2 tested on Atari here.

Contribute

Python

Use black to format Python code; run black *.py before pushing.

C++

The root contains a .clang-format file that defines the coding style of this repo. Run the following commands before submitting a PR or pushing:

clang-format -i *.h
clang-format -i *.cc
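The two formatting steps can be wrapped in a small pre-push helper. This is a hypothetical script, not part of the repo. It mirrors the shell globs above (only files in the given directory, non-recursively), prints the commands, and runs them only when --apply is passed.

```python
import pathlib
import subprocess
import sys


def format_commands(root="."):
    """Build the black and clang-format commands for files directly under root."""
    root = pathlib.Path(root)
    py = sorted(str(p) for p in root.glob("*.py"))
    cpp = sorted(str(p) for p in root.glob("*.h")) + sorted(str(p) for p in root.glob("*.cc"))
    cmds = []
    if py:
        cmds.append(["black", *py])
    if cpp:
        # clang-format picks up the repo's .clang-format file from the directory tree
        cmds.append(["clang-format", "-i", *cpp])
    return cmds


def run(root=".", dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in format_commands(root):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run="--apply" not in sys.argv)
```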

Copyright

Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.

This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.

Contributors

gfppoy, ravihammond
