dicg's Introduction

Deep Implicit Coordination Graphs (DICG)

Installation

We recommend running the code inside a conda virtual environment. Create one with

conda create -n dicg python=3.7

Activate the virtual environment by running

conda activate dicg

Install dependencies by running the following command in the root directory of this repo (in the virtual environment):

pip install -r requirements.txt

The predator-prey environment and the traffic junction environment are included in this repo. To run the StarCraft Multi-agent Challenge (SMAC) environment, please follow the instructions here. You will need to install the full StarCraft II game as well as the SMAC maps.
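For reference, a typical SMAC setup looks roughly like the following; this is only a sketch, so defer to the SMAC instructions linked above for your platform, and note that the installation path below is just an example:

pip install git+https://github.com/oxwhirl/smac.git

export SC2PATH=/path/to/StarCraftII   # where the full StarCraft II game is installed

with the SMAC map files placed under $SC2PATH/Maps/SMAC_Maps.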

Running Experiments

Predator-Prey

Experiment runners are in /exp_runners/predatorprey/. A detailed argument specification (environment size, number of agents, network sizes, algorithm hyperparameters, etc.) can be found in the included runner files. To train in Predator-Prey, cd into /exp_runners/predatorprey/ and run one of the following commands, replacing the penalty P with the desired value (0, -1, -1.25, or -1.5):

| Approach | Command |
| --- | --- |
| DICG-CE-MLP | python dicg_ce_predatorprey_runner.py --penalty P |
| DICG-DE-MLP | python dicg_de_predatorprey_runner.py --penalty P |
| CENT-MLP | python cent_predatorprey_runner.py --penalty P |
| DEC-MLP | python dec_predatorprey_runner.py --penalty P |

The model checkpoints will be saved in /exp_runners/predatorprey/data/local/.
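The results below are averaged over 5 random seeds. A sweep over penalties and seeds can be scripted in the shell, for example as follows. This is only a sketch: the --seed argument is assumed to be accepted by the runners (as suggested by the issue discussion further down), and the seed values here are placeholders, not the seeds used for the reported numbers.

for P in 0 -1 -1.25 -1.5; do
  for SEED in 0 1 2 3 4; do
    python dicg_ce_predatorprey_runner.py --penalty $P --seed $SEED
  done
done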

Reported average return (averaged over 5 random seeds, up to 7.5e6 environment steps):

| Approach | P = 0 | P = -1 | P = -1.25 | P = -1.5 |
| --- | --- | --- | --- | --- |
| DICG-CE-MLP | 78.6 ± 0.1 | 70.2 ± 0.7 | 72.0 ± 1.0 | 72.0 ± 0.7 |
| DICG-DE-MLP | 78.2 ± 0.0 | 70.5 ± 2.7 | 72.4 ± 1.1 | 70.0 ± 2.7 |
| CENT-MLP | 75.0 ± 0.1 | -39.8 ± 3.8 | -52.28 ± 1.3 | -65.8 ± 2.9 |
| DEC-MLP | 74.4 ± 0.3 | 56.6 ± 0.7 | 56.7 ± 1.1 | 57.9 ± 0.3 |

SMAC

Experiment runners are in /exp_runners/smac/; a detailed argument specification can be found in the included runner files. To train in SMAC, cd into /exp_runners/smac/ and run one of the following commands, replacing MAP with the desired scenario string (8m_vs_9m, 3s_vs_5z, or 6h_vs_8z):

| Approach | Command |
| --- | --- |
| DICG-CE-LSTM | python dicg_ce_smac_runner.py --map MAP |
| DICG-DE-LSTM | python dicg_de_smac_runner.py --map MAP |
| CENT-LSTM | python cent_smac_runner.py --map MAP |
| DEC-LSTM | python dec_smac_runner.py --map MAP |

The model checkpoints will be saved in /exp_runners/smac/data/local/.
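Before launching these runners, make sure the StarCraft II installation and the SMAC maps from the Installation section are in place. A quick sanity check from the shell, assuming SC2PATH was exported as sketched above:

echo $SC2PATH

ls $SC2PATH/Maps/SMAC_Maps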

Reported average win rate (averaged over 5 random seeds, up to 9e6 environment steps for 8m_vs_9m and 1.8e7 environment steps for the other two maps):

| Approach | 8m_vs_9m | 3s_vs_5z | 6h_vs_8z |
| --- | --- | --- | --- |
| DICG-CE-LSTM | 72 ± 11 % | 96 ± 3 % | 9 ± 9 % |
| DICG-DE-LSTM | 87 ± 6 % | 99 ± 1 % | 0 |
| CENT-LSTM | 42 ± 6 % | 0 | 0 |
| DEC-LSTM | 65 ± 16 % | 94 ± 5 % | 0 |

Traffic Junction

Experiment runners are in /exp_runners/traffic/; a detailed argument specification can be found in the included runner files.

To train in Traffic Junction, cd into /exp_runners/traffic/ and run one of the following commands, replacing the difficulty D with the desired setting (easy, medium, or hard):

| Approach | Command |
| --- | --- |
| DICG-CE-MLP | python dicg_ce_traffic_runner.py --difficulty D |
| DICG-DE-MLP | python dicg_de_traffic_runner.py --difficulty D |
| CENT-MLP | python cent_traffic_runner.py --difficulty D |
| DEC-MLP | python dec_traffic_runner.py --difficulty D |

The model checkpoints will be saved in /exp_runners/traffic/data/local/.
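All three experiment families write their logs and checkpoints under a data/local/ directory next to their runners. The exact sub-directory naming depends on the experiment configuration, so the simplest way to see what a training run has produced is to list that directory, e.g.

ls -R exp_runners/traffic/data/local/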

Reported average success rate (averaged over 5 random seeds, up to 2.5e6 environment steps):

| Approach | easy | medium | hard |
| --- | --- | --- | --- |
| DICG-CE-MLP | 98.1 ± 1.9 % | 80.5 ± 6.8 % | 22.8 ± 4.6 % |
| DICG-DE-MLP | 95.6 ± 1.5 % | 90.8 ± 2.9 % | 82.2 ± 6.0 % |
| CENT-MLP | 97.7 ± 0.9 % | 0 | 0 |
| DEC-MLP | 90.2 ± 6.5 % | 81.3 ± 4.8 % | 69.4 ± 4.9 % |

dicg's People

Contributors

parachutel

dicg's Issues

About Traffic Junction env training and GPU availability

Hi, sorry to trouble you.
I am trying to run the code in Traffic Junction (every difficulty) with the provided arguments, but the results do not look right for the medium and hard versions (easy is fine).
The evaluation success rate is still 0 when training ends at the total environment step budget (the 2.5e6 environment steps already set).
I did not change anything in the initial settings.
Commands: python dicg_de_traffic_runner.py --difficulty 'medium' (or dicg_ce_traffic_runner.py)
python dicg_de_traffic_runner.py --difficulty 'hard' (or dicg_ce_traffic_runner.py)
I don't know where the error is.
Also, can I train on a GPU (is CUDA supported)?
Thanks in advance for your help!

Any difference between the baseline scripts and paper description?

Hi,

Thank you for sharing your wonderful work.

I was just trying to reproduce Fig. 3 from your paper, and when I run python dec_predatorprey_runner.py --penalty -1 --seed 0 with all default parameters, it gives a final average return of 68.947. This seems surprisingly good compared to the baseline in Fig. 3 of your paper and the reference results table in README.md. Is there any parameter I need to change to get the reported results for the decentralized architecture?

Thank you very much.

Zero win rate in SMAC scenario

Hello, thanks for your elegant code.
When I directly run:
python dicg_ce_smac_runner.py --map 3s_vs_5z
python dicg_ce_smac_runner.py --map 6h_vs_8z
the win rate is always zero.

When I directly run:
python dicg_ce_smac_runner.py --map 8m_vs_9m
the win rate stays below 0.1,
which is very different from the results reported in the paper.

Can you give me a hand?

Seeds used in Figure 3 of the paper

Hello, my name is Roman Chiva. I am a student at TU Delft in the Netherlands. Together with a group of students, I am carrying out a reproducibility project on the results presented in the paper "Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning". One of the parts we have to reproduce is Figure 3 of the paper, which shows the average return on test episodes for varying penalties p, averaged over 5 runs. We were wondering which seeds were used for these runs.

Regards,
Roman Chiva
