
sequential_social_dilemma_games's Introduction

Build Status

Sequential Social Dilemma Games

This repo is an open-source implementation of DeepMind's Sequential Social Dilemma (SSD) multi-agent game-theoretic environments [1]. SSDs can be thought of as analogous to spatially and temporally extended Prisoner's Dilemma-like games. The reward structure poses a dilemma because individual short-term optimal strategies lead to poor long-term outcomes for the group.

The implemented environments are structured to be compatible with OpenAI's Gym environments as well as RLlib's MultiAgentEnv.
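
To illustrate the interface, here is a minimal sketch of the dict-based multi-agent loop these environments expose, where observations, rewards, and dones are keyed by agent id. The import path, constructor arguments, and action sampling below are assumptions for illustration, not a verified snippet from the repo.

# A sketch of the RLlib MultiAgentEnv-style interaction loop (names are assumptions).
from social_dilemmas.envs.harvest import HarvestEnv  # assumed import path

env = HarvestEnv(num_agents=2)
observations = env.reset()          # {agent_id: observation}
done = {"__all__": False}
while not done["__all__"]:
    # Replace random sampling with a trained policy in practice.
    actions = {agent_id: env.action_space.sample() for agent_id in observations}
    observations, rewards, done, infos = env.step(actions)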

Implemented Games

  • Cleanup: A public goods dilemma in which agents get a reward for consuming apples, but must use a cleaning beam to clean a river in order for apples to grow. While an agent is cleaning the river, other agents can exploit it by consuming the apples that appear.

Image of the cleanup game

  • Harvest: A tragedy-of-the-commons dilemma in which apples regrow at a rate that depends on the amount of nearby apples. If individual agents employ an exploitative strategy by greedily consuming too many apples, the collective reward of all agents is reduced.

Image of the Harvest game

Schelling diagrams for Harvest and Cleanup

The above plot shows the empirical Schelling diagrams for both Cleanup (A) and Harvest (B) (from [2]). These diagrams show the payoff that an individual agent can expect if it follows a defecting/exploitative strategy (red) vs. a cooperative strategy (blue), given the number of other agents that are cooperating. We can see that an individual agent can almost always benefit from greedily defecting, but the more agents that defect, the worse the outcomes for all agents.

Setup instructions

To install the SSD environments:

Anaconda/miniconda

git clone -b master https://github.com/eugenevinitsky/sequential_social_dilemma_games
cd sequential_social_dilemma_games
conda create -n ssd python==3.8.10 # Create a conda virtual environment
conda activate ssd
python3 setup.py develop
pip3 install -r requirements.txt
# Patch ray due to https://github.com/ray-project/ray/issues/7946
# And https://github.com/ray-project/ray/pull/8491
. conda_uint8_patch.sh

Venv

git clone -b master https://github.com/eugenevinitsky/sequential_social_dilemma_games
cd sequential_social_dilemma_games
python3 -m venv venv # Create a Python virtual environment
. venv/bin/activate
pip3 install --upgrade pip setuptools wheel
python3 setup.py develop
pip3 install -r requirements.txt
# Patch ray due to https://github.com/ray-project/ray/issues/7946
# And https://github.com/ray-project/ray/pull/8491
. venv_uint8_patch.sh

To install the requirements for learning (pick sb3, rllib, or all):

pip3 install social-dilemmas[sb3|rllib|all]
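
For example, to install only the RLlib extras (quote the brackets if your shell expands them):

pip3 install "social-dilemmas[rllib]"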

If using RLlib, after the setup you can run experiments like so:

  • To train with default parameters (baseline model on Cleanup with 2 agents):
python3 run_scripts/train.py
  • To train the MOA with 5 agents:
python3 run_scripts/train.py --model moa --num_agents 5

Many more options are available; they can be found in default_args.py. A collection of preconfigured training scripts can be found in run_scripts.

Note that RLlib's initialization time can be rather long (up to 5 minutes), and it grows with the number of agents and the complexity of the model you use.

If using Stable-Baselines3:

  • To train using MARL-Baselines3 and PPO:
python3 run_scripts/sb3_train.py --env harvest --num_agents 5
  • To train using MARL-Baselines3 and independent PPO:
python3 run_scripts/sb3_independent.py --env harvest --num_agents 5
  • To train using MARL-Baselines3 and independent PPO with inequity aversion (see the reward-shaping sketch below):
python3 run_scripts/sb3_independent.py --env harvest --num_agents 5 --inequity-averse-reward=True --alpha=5.0 --beta=0.05
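
For reference, the --alpha and --beta flags correspond to the disadvantageous (envy) and advantageous (guilt) inequity-aversion weights from [2]. Below is a minimal sketch of that reward shaping; it assumes the per-agent temporally smoothed rewards e are already computed, and the function name and signature are illustrative rather than the repo's actual implementation.

# Sketch of the inequity-aversion reward shaping of Hughes et al. [2].
# r_i: agent i's extrinsic reward this step; e: smoothed rewards, one per agent.
def inequity_averse_reward(r_i, e, i, alpha=5.0, beta=0.05):
    n = len(e)
    disadvantageous = sum(max(e[j] - e[i], 0.0) for j in range(n) if j != i)
    advantageous = sum(max(e[i] - e[j], 0.0) for j in range(n) if j != i)
    return r_i - (alpha / (n - 1)) * disadvantageous - (beta / (n - 1)) * advantageous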

CUDA, cuDNN and tensorflow-gpu

If you run into any CUDA errors, make sure you have a compatible set of CUDA/cuDNN/TensorFlow versions installed. However, beware of the following:

The compatibility table on the TensorFlow site does not list specific minor versions for CUDA and cuDNN. However, if the specific versions are not matched, there will be an error when you try to use TensorFlow. source

Tests

Tests are located in the test folder and can be run individually, or all at once with python -m pytest. Many of the less obviously defined rules of the games can be understood by reading the tests, each of which outlines some aspect of a game.

Constructing new environments

Every environment that subclasses MapEnv probably needs to implement the following methods:

class NewMapEnv(MapEnv):
    ...
    
    def custom_reset(self):
        """Reset custom elements of the map. For example, spawn apples"""
        pass

    def custom_action(self, agent, action):
        """Execute any custom, non-move actions that may be defined, like fire or clean"""
        pass

    def custom_map_update(self):
        """Custom map updates that don't have to do with agent actions"""
        pass

    def setup_agents(self):
        """Construct all the agents for the environment"""
        raise NotImplementedError
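
As a purely illustrative sketch of how these hooks divide responsibilities, a new environment might look like the following. Every helper name used here (apple_points, spawn_points, place_item, SimpleAgent) is hypothetical and not necessarily part of the repo's actual API.

class SimpleAppleEnv(MapEnv):

    def custom_reset(self):
        # Respawn an apple on every designated apple cell (hypothetical helper).
        for row, col in self.apple_points:
            self.place_item(row, col, b"A")

    def custom_action(self, agent, action):
        # This toy environment defines no beams, so non-move actions do nothing.
        pass

    def custom_map_update(self):
        # No map dynamics beyond agent actions (e.g. no probabilistic regrowth).
        pass

    def setup_agents(self):
        # Create one agent per spawn point (hypothetical agent class).
        for i, spawn_point in enumerate(self.spawn_points):
            self.agents["agent-%d" % i] = SimpleAgent("agent-%d" % i, spawn_point)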

PPO Results

The graphs below display results for Cleanup and Harvest using untuned PPO in RLlib. As of yet, A3C remains untested.

Collective cleanup reward:

Collective reward plot of cleanup

Collective harvest reward:

Collective reward plot of harvest

Relevant papers

  1. Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).

  2. Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A. G., Dunning, I., Zhu, T., McKee, K., Koster, R., Roff, H., & Graepel, T. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems (pp. 3330-3340).

  3. Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P. A., Strouse, D. J., Leibo, J. Z. & de Freitas, N. (2018). Intrinsic Social Motivation via Causal Influence in Multi-Agent RL. arXiv preprint arXiv:1810.08647.

Contributors

This code base was initially developed by Eugene Vinitsky and Natasha Jaques; help with reproduction was provided by Joel Leibo, Antonio Castenada, and Edward Hughes. Additional development was done by Hugo Heemskerk. Support for PettingZoo was provided by Rohan Potdar.

Citation

If you want to cite this repository in academic work, please use the following citation:

@misc{SSDOpenSource,
  author = {Vinitsky, Eugene and Jaques, Natasha and Leibo, Joel and Castenada, Antonio and Hughes, Edward},
  title = {An Open Source Implementation of Sequential Social Dilemma Games},
  year = {2019},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/eugenevinitsky/sequential_social_dilemma_games}},
}

sequential_social_dilemma_games's Issues

Colors for orientations

Agents observing other agents need to know which way they are facing.
We need to shade the agents' colors slightly (or use some other scheme) to indicate orientation.

Fix orientation of axes

Currently the orientation of the axes is flipped relative to how the map should be displayed: in the ASCII array, the DOWN action moves toward the top of the array, the UP action moves toward the bottom, and LEFT/RIGHT are mirrored in the same way. This is convenient for coding but will need to be flipped when actually displaying.

Clean up map update methods

Map update methods are currently split in a very ad hoc way between HarvestEnv and MapEnv. As much of the update process as possible should be moved into MapEnv.

Color map

Need to convert the internally manipulated map into an appropriate color state for the agent.

Apples can't spawn where a firing beam is

The order of events is:
(1) Move
(2) Fire
(3) Spawn

We currently check in the apple spawn method whether a firing beam occupies the apple spawn point, and if so, we don't spawn an apple there. This leads to two bugs:
(1) Apples can't be spawned in points temporarily obscured by a firing beam.
(2) We check whether an agent is currently in that spot before spawning an apple, but an agent COULD be there and merely be temporarily obscured by the beam.

Add test for river crossing in cleanup

  • if an agent crosses a river tile in cleanup, the river tile should still be there once the agent moves off it
  • if an agent crosses a river tile, stays there, and then moves, the river tile should still be there after it moves
  • similarly for stream cells and waste cells

Implement spawning for lemons, apples, and waste in cleanup

function SoftHuangpu:postUpdate(gameState)
  local wasteDensity = 0
  if self._config.potentialWasteArea ~= 0 then
    wasteDensity = 1 - self._state.permittedLemonCells:size() /
        self._config.potentialWasteArea
  end
  if wasteDensity >= self._config.thresholdDepletion then
    self._state.lemonSpawnProbability = 0
    self._state.appleRespawnProbability = 0
  else
    self._state.lemonSpawnProbability = self._config.lemonSpawnProbability
    if wasteDensity <= self._config.thresholdRestoration then
      self._state.appleRespawnProbability = self._config.appleRespawnProbability
    else
      -- Interpolate.
      self._state.appleRespawnProbability = (1 - (
          wasteDensity - self._config.thresholdRestoration) / (
          self._config.thresholdDepletion - self._config.thresholdRestoration)) *
          self._config.appleRespawnProbability
    end
  end
  super.postUpdate(self, gameState)
end

The hyperparameters are:

thresholdDepletion = 0.4
thresholdRestoration = 0.0
lemonSpawnProbability = 0.5
appleRespawnProbability = 0.05
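
For reference, here is a rough Python port of the Lua logic above. It is a sketch only; the dictionary keys mirror the Lua names rather than whatever the repo's implementation actually uses.

def update_spawn_probabilities(state, config):
    # Fraction of the potential waste area currently covered by waste.
    waste_density = 0.0
    if config["potentialWasteArea"] != 0:
        waste_density = 1 - len(state["permittedLemonCells"]) / config["potentialWasteArea"]

    if waste_density >= config["thresholdDepletion"]:
        # Too much waste: nothing spawns until the river is cleaned.
        state["lemonSpawnProbability"] = 0.0
        state["appleRespawnProbability"] = 0.0
    else:
        state["lemonSpawnProbability"] = config["lemonSpawnProbability"]
        if waste_density <= config["thresholdRestoration"]:
            state["appleRespawnProbability"] = config["appleRespawnProbability"]
        else:
            # Linearly interpolate between the full respawn probability and zero.
            frac = (waste_density - config["thresholdRestoration"]) / (
                config["thresholdDepletion"] - config["thresholdRestoration"])
            state["appleRespawnProbability"] = (1 - frac) * config["appleRespawnProbability"]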

Add agent actions to rendered videos

  • If we can see what actions the agents intended to take at each frame, it will be easier to debug visually.
  • For example, the top right corner of the video could say: "Agent-1, move-left" or "Agent-2, fire-up"

Build model for Harvest

Construct the following model in the run scripts (https://arxiv.org/pdf/1810.08647.pdf section 6.3):
a single convolutional layer with a kernel of size 3, stride of size 1, and 6 output
channels. This is connected to two fully connected layers of size 32 each, and an LSTM with 128 cells.
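
A minimal tf.keras sketch of this architecture is below. The input shape, sequence length, number of actions, activations, and output head are assumptions for illustration; only the conv/FC/LSTM sizes come from the paper.

import tensorflow as tf
from tensorflow.keras import layers

def build_harvest_model(obs_shape=(15, 15, 3), num_actions=8, seq_len=20):
    # A sequence of RGB observations (shapes here are illustrative assumptions).
    obs = tf.keras.Input(shape=(seq_len,) + obs_shape)
    # Single convolutional layer: kernel 3, stride 1, 6 output channels.
    x = layers.TimeDistributed(layers.Conv2D(6, 3, strides=1, activation="relu"))(obs)
    x = layers.TimeDistributed(layers.Flatten())(x)
    # Two fully connected layers of size 32 each.
    x = layers.TimeDistributed(layers.Dense(32, activation="relu"))(x)
    x = layers.TimeDistributed(layers.Dense(32, activation="relu"))(x)
    # LSTM with 128 cells.
    x = layers.LSTM(128)(x)
    # Illustrative policy head over the discrete actions.
    logits = layers.Dense(num_actions)(x)
    return tf.keras.Model(obs, logits)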

Finish run script

Run script is up on the example_script branch in the run_script folder.
The remaining part is to figure out how to set the filters correctly for the ConvNet.

Add a gitignore

gitignore:
__pycache__
eggs
Can probably use a standard gitignore.

Replay is broken for custom models

Custom models must be registered to be re-used in RLlib via ModelCatalog.register_custom_model(model_name, model_class). However, when we replay runs we no longer have this information. Consider storing this as a tune function in the env config so that it can be recreated later.
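
For reference, registration in RLlib looks roughly like the sketch below; stashing the chosen name in the saved config is one possible way to make replay self-contained. The import path and the env_config key used here are illustrative, not the repo's actual layout.

from ray.rllib.models import ModelCatalog
from models.causal_to_fc_net import CausalToFCNet  # illustrative import

ModelCatalog.register_custom_model("causal_to_fc_net", CausalToFCNet)

config = {
    "model": {"custom_model": "causal_to_fc_net"},
    # Also storing the name here lets a replay script re-register the model
    # before restoring the trainer (illustrative key).
    "env_config": {"custom_model_name": "causal_to_fc_net"},
}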

Add synchrony to agent actions

Agent actions are currently stepped through one by one, regardless of how they may interact with other actions.
For example, if an agent intends to move right into a cell whose occupant also intends to move right, and the first agent moves first, its move will be disallowed because the second agent has not yet vacated the cell.
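
One possible way to make this synchronous is to gather every agent's intended destination first and then commit only non-conflicting moves, repeating until no further moves can be committed. This is a sketch of that idea, not the repo's implementation:

def resolve_moves(positions, intended):
    # positions / intended: {agent_id: (row, col)} current and proposed cells.
    committed = dict(positions)
    pending = {a: d for a, d in intended.items() if d != positions[a]}
    progress = True
    while progress and pending:
        progress = False
        occupied = set(committed.values())
        wanted = {}
        for a, d in pending.items():
            wanted.setdefault(d, []).append(a)
        for d, agents in wanted.items():
            # Commit a move only if exactly one agent wants the cell and it is
            # currently free; chains resolve over later passes, swaps never do.
            if len(agents) == 1 and d not in occupied:
                committed[agents[0]] = d
                del pending[agents[0]]
                progress = True
    return committed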

What happens if a firing beam hits an agent?

@natashamjaques do we just overlay a new color that indicates agent + firing beam, does the firing beam temporarily obscure the agent, or does the agent stay with its color unchanged but a point gets subtracted? There's some choice here and I'm curious what y'all did.

Check model correctness

@natashamjaques I've implemented a model from the paper; it is in models/causal_to_fc_net.py. It would be good if you could check it over and see if it looks right to you. It'd be a tragedy if down the line we realized I misunderstood something.
