inoryy / reaver
Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.

License: MIT License

Languages: Python 100.00%
Topics: artificial-intelligence, deep-learning, machine-learning, reinforcement-learning, actor-critic, tensorflow, pysc2, starcraft-ii, starcraft2, deepmind

reaver's Introduction

Reaver: Modular Deep Reinforcement Learning Framework

MoveToBeacon CollectMineralShards DefeatRoaches DefeatZerglingsAndBanelings FindAndDefeatZerglings BuildMarines

Project status: No longer maintained!
Unfortunately, I am no longer able to further develop or provide support to the project.

Introduction

Reaver is a modular deep reinforcement learning framework with a focus on various StarCraft II based tasks, following in the footsteps of DeepMind, who are pushing the state of the art of the field through the lens of playing a modern video game with human-like interface and limitations. This includes observing visual features similar (though not identical) to what a human player would perceive and choosing actions from a similar pool of options a human player would have. See the StarCraft II: A New Challenge for Reinforcement Learning article for more details.

Though development is research-driven, the philosophy behind the Reaver API is akin to the StarCraft II game itself: it has something to offer both for novices and experts in the field. For hobbyist programmers, Reaver offers all the tools necessary to train DRL agents by modifying only a small and isolated part of the agent (e.g. hyperparameters). For veteran researchers, Reaver offers a simple but performance-optimized codebase with modular architecture: agent, model, and environment are decoupled and can be swapped at will.

While the focus of Reaver is on StarCraft II, it also has full support for other popular environments, notably Atari and MuJoCo. Reaver agent algorithms are validated against reference results; e.g. the PPO agent is able to match the results of Proximal Policy Optimization Algorithms. Please see below for more details.

Installation

PIP Package

The easiest way to install Reaver is through the PIP package manager:

pip install reaver

You can also install additional extras (e.g. gym support) through the helper flags:

pip install reaver[gym,atari,mujoco]

Manual Installation

If you plan to modify the Reaver codebase, you can retain its module functionality by installing from source:

$ git clone https://github.com/inoryy/reaver-pysc2
$ pip install -e reaver-pysc2/

By installing with the -e flag, Python will look for reaver in the specified folder rather than in site-packages storage.
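
If you need the optional extras in a source install as well, pip can typically install them from the local checkout too; the command below is a sketch that assumes the source package defines the same gym/atari/mujoco extras as the PIP release:

$ pip install -e "reaver-pysc2/[gym,atari,mujoco]"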

Windows

Please see the wiki page for detailed instructions on setting up Reaver on Windows.

However, if possible, please consider using a Linux OS instead due to performance and stability considerations. If you would like to see your agent perform with full graphics enabled, you can save a replay of the agent on Linux and open it on Windows. This is how the video recording listed below was made.

Requirements

  • PySC2 >= 3.0.0
  • StarCraft II >= 4.1.2 (instructions)
  • gin-config >= 0.3.0
  • TensorFlow >= 2.0.0
  • TensorFlow Probability >= 0.9

Optional Extras

If you would like to use Reaver with other supported environments, you must also install the relevant packages:

  • gym >= 0.10.0
  • atari-py >= 0.1.5
  • mujoco-py >= 1.50.0
    • roboschool >= 1.0 (alternative)

Quick Start

You can train a DRL agent with multiple StarCraft II environments running in parallel with just four lines of code!

import reaver as rvr

env = rvr.envs.SC2Env(map_name='MoveToBeacon')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=4)
agent.run(env)

Moreover, Reaver comes with highly configurable command-line tools, so this task can be reduced to a short one-liner!

python -m reaver.run --env MoveToBeacon --agent a2c --n_envs 4 2> stderr.log

With the line above, Reaver will initialize the training procedure with a set of pre-defined hyperparameters, optimized specifically for the given environment and agent. After a while you will start seeing logs with various useful statistics in your terminal.

| T    118 | Fr     51200 | Ep    212 | Up    100 | RMe    0.14 | RSd    0.49 | RMa    3.00 | RMi    0.00 | Pl    0.017 | Vl    0.008 | El 0.0225 | Gr    3.493 | Fps   433 |
| T    238 | Fr    102400 | Ep    424 | Up    200 | RMe    0.92 | RSd    0.97 | RMa    4.00 | RMi    0.00 | Pl   -0.196 | Vl    0.012 | El 0.0249 | Gr    1.791 | Fps   430 |
| T    359 | Fr    153600 | Ep    640 | Up    300 | RMe    1.80 | RSd    1.30 | RMa    6.00 | RMi    0.00 | Pl   -0.035 | Vl    0.041 | El 0.0253 | Gr    1.832 | Fps   427 |
...
| T   1578 | Fr    665600 | Ep   2772 | Up   1300 | RMe   24.26 | RSd    3.19 | RMa   29.00 | RMi    0.00 | Pl    0.050 | Vl    1.242 | El 0.0174 | Gr    4.814 | Fps   421 |
| T   1695 | Fr    716800 | Ep   2984 | Up   1400 | RMe   24.31 | RSd    2.55 | RMa   30.00 | RMi   16.00 | Pl    0.005 | Vl    0.202 | El 0.0178 | Gr   56.385 | Fps   422 |
| T   1812 | Fr    768000 | Ep   3200 | Up   1500 | RMe   24.97 | RSd    1.89 | RMa   31.00 | RMi   21.00 | Pl   -0.075 | Vl    1.385 | El 0.0176 | Gr   17.619 | Fps   423 |

Reaver should quickly converge to about 25-26 RMe (mean episode reward), which matches DeepMind's results for this environment. Specific training time depends on your hardware. The logs above were produced on a laptop with an Intel i5-7300HQ CPU (4 cores) and a GTX 1050 GPU; training took around 30 minutes.

After Reaver has finished training, you can look at how it performs by appending --test and --render flags to the one-liner.

python -m reaver.run --env MoveToBeacon --agent a2c --test --render 2> stderr.log

Google Colab

A companion Google Colab notebook is available to try out Reaver online.

Key Features

Performance

Many modern DRL algorithms rely on being executed in multiple environments at the same time in parallel. As Python has a GIL, this feature must be implemented through multiprocessing. The majority of open source implementations solve this task with a message-based approach (e.g. Python multiprocessing.Pipe or MPI), where individual processes communicate by sending data through IPC. This is a valid and most likely the only reasonable approach for the large-scale distributed setups that companies like DeepMind and OpenAI operate on.

However, for a typical researcher or hobbyist a much more common scenario is having access only to a single machine, whether it is a laptop or a node on an HPC cluster. Reaver is optimized specifically for this case by making use of shared memory in a lock-free manner. This approach nets a significant performance boost of up to 1.5x speed-up in StarCraft II sampling rate (and up to 100x speed-up in the general case), being bottlenecked almost exclusively by the GPU input/output pipeline.
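
As a minimal standalone sketch of the idea (an illustration, not Reaver's actual implementation): all workers and the main process map the same pre-allocated buffer as a numpy array, so observations are exchanged without any pickling or message passing. Note that this pattern relies on the fork start method, i.e. Linux.

import numpy as np
from multiprocessing import Process, Array

n_envs, obs_shape = 4, (84, 84)
# one flat, lock-free shared buffer holding all observations
buf = Array('f', n_envs * int(np.prod(obs_shape)), lock=False)
obs = np.frombuffer(buf, dtype=np.float32).reshape((n_envs,) + obs_shape)

def worker(idx):
    # stand-in for env.step(); writes straight into the shared buffer
    obs[idx] = np.full(obs_shape, idx, dtype=np.float32)

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(n_envs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(obs.mean(axis=(1, 2)))  # each row was filled in-place by its worker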

Extensibility

The three core Reaver modules - envs, models, and agents - are almost completely detached from each other. This ensures that extending functionality in one module is seamlessly integrated into the others.
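
For example, swapping in a custom model while keeping the same agent and environment only means passing a different builder function. The sketch below assumes the builder signature mirrors rvr.models.build_fully_conv (observation and action specs in, model out); check the actual signature before relying on it:

import reaver as rvr

def build_my_model(obs_spec, act_spec):
    # replace this delegation with your own network definition
    return rvr.models.build_fully_conv(obs_spec, act_spec)

env = rvr.envs.SC2Env(map_name='MoveToBeacon')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), build_my_model, rvr.models.SC2MultiPolicy, n_envs=4)
agent.run(env)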

Configurability

All configuration is handled through gin-config and can be easily shared as .gin files. This includes all hyperparameters, environment arguments, and model definitions.
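
As a rough illustration, a .gin file is a plain list of bindings that gin-config loads before the agent is constructed; the binding names below are made up for the example and are not Reaver's actual parameter names:

import gin

# contents of a hypothetical my_experiment.gin:
#   A2C.discount = 0.99
#   A2C.entropy_coef = 0.01
gin.parse_config_file('my_experiment.gin')
# construct the agent afterwards; gin supplies the bound hyperparameters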

Implemented Agents

  • Advantage Actor-Critic (A2C)
  • Proximal Policy Optimization (PPO)

Additional RL Features

  • Generalized Advantage Estimation (GAE); see the sketch after this list
  • Reward clipping
  • Gradient norm clipping
  • Advantage normalization
  • Baseline (critic) bootstrapping
  • Separate baseline network
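
To make the list above concrete, below is a minimal numpy version of GAE with critic bootstrapping; it is an illustration of the math, not Reaver's actual implementation:

import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    # rewards, dones: length T; values: length T + 1, where values[T] bootstraps
    # the value of the final state (the baseline bootstrapping listed above)
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_adv = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last_adv = delta + gamma * lam * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages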

But Wait! There's more!

When experimenting with novel ideas it is important to get feedback quickly, which is often not realistic with complex environments like StarCraft II. As Reaver was built with a modular architecture, its agent implementations are not actually tied to StarCraft II at all. You can drop in many popular game environments (e.g. OpenAI Gym) as replacements and verify that implementations work with those first:

python -m reaver.run --env CartPole-v0 --agent a2c 2> stderr.log

import reaver as rvr

env = rvr.envs.GymEnv('CartPole-v0')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec())
agent.run(env)

Supported Environments

Currently the following environments are supported by Reaver:

  • StarCraft II via PySC2 (tested on all minigames)
  • OpenAI Gym (tested on CartPole-v0)
  • Atari (tested on PongNoFrameskip-v0)
  • MuJoCo (tested on InvertedPendulum-v2 and HalfCheetah-v2)

Results

Map | Reaver (A2C) | DeepMind SC2LE | DeepMind ReDRL | Human Expert
MoveToBeacon | 26.3 (1.8) [21, 31] | 26 | 27 | 28
CollectMineralShards | 102.8 (10.8) [81, 135] | 103 | 196 | 177
DefeatRoaches | 72.5 (43.5) [21, 283] | 100 | 303 | 215
FindAndDefeatZerglings | 22.1 (3.6) [12, 40] | 45 | 62 | 61
DefeatZerglingsAndBanelings | 56.8 (20.8) [21, 154] | 62 | 736 | 727
CollectMineralsAndGas | 2267.5 (488.8) [0, 3320] | 3,978 | 5,055 | 7,566
BuildMarines | -- | 3 | 123 | 133
  • Human Expert results were gathered by DeepMind from a GrandMaster level player.
  • DeepMind ReDRL refers to the current state-of-the-art results, described in the Relational Deep Reinforcement Learning article.
  • DeepMind SC2LE are the results published in the StarCraft II: A New Challenge for Reinforcement Learning article.
  • Reaver (A2C) are results gathered by training the reaver.agents.A2C agent, replicating the SC2LE architecture as closely as possible on available hardware. Results are gathered by running the trained agent in --test mode for 100 episodes and calculating episode total rewards. Listed are the mean, standard deviation (in parentheses), and min & max (in square brackets).

Training Details

Map | Samples | Episodes | Approx. Time (hr)
MoveToBeacon | 563,200 | 2,304 | 0.5
CollectMineralShards | 74,752,000 | 311,426 | 50
DefeatRoaches | 172,800,000 | 1,609,211 | 150
FindAndDefeatZerglings | 29,760,000 | 89,654 | 20
DefeatZerglingsAndBanelings | 10,496,000 | 273,463 | 15
CollectMineralsAndGas | 16,864,000 | 20,544 | 10
BuildMarines | - | - | -
  • Samples refer to the total number of observe -> step -> reward chains in one environment.
  • Episodes refer to the total number of StepType.LAST flags returned by PySC2.
  • Approx. Time is the approximate training time on a laptop with an Intel i5-7300HQ CPU (4 cores) and a GTX 1050 GPU.

Note that I did not put much time into hyperparameter tuning, focusing mostly on verifying that the agent is capable of learning rather than maximizing sample efficiency. For example, a naive first try on MoveToBeacon required about 4 million samples; however, after some playing around I was able to reduce that all the way down to 102,000 (a ~40x reduction) with the PPO agent.


Mean episode rewards with std.dev filled in-between.

Video Recording

A video recording of the agent performing on all six minigames is available online at https://youtu.be/gEyBzcPU5-w. In the video, on the left is the agent acting with randomly initialized weights and no training, whereas on the right it is trained to target scores.

Reproducibility

The problem of reproducibility of research has recently become a subject of many debates in science in general, and Reinforcement Learning is no exception. One of the goals of Reaver as a scientific project is to help facilitate reproducible research. To this end, Reaver comes bundled with various tools that simplify the process:

  • All experiments are saved into separate folders with automatic model checkpoints enabled by default
  • All configuration is handled through the gin-config Python library and saved to the experiment results directory
  • During training, various statistics are duplicated into the experiment results directory
  • Results directory structure simplifies sharing individual experiments with full information

Pre-trained Weights & Summary Logs

To lead the way with reproducibility, Reaver is bundled with pre-trained weights and full Tensorboard summary logs for all six minigames. Simply download an experiment archive from the releases tab and unzip it into the results/ directory.

You can use the pre-trained weights by appending the --experiment flag to the reaver.run command:

python -m reaver.run --env <map_name> --experiment <map_name>_reaver --test 2> stderr.log

Tensorboard logs are available if you launch tensorboard --logdir=results/summaries.
You can also view them directly online via Aughie Boards.

Why "Reaver"?

Reaver is a very special and subjectively cute Protoss unit in the StarCraft game universe. In the StarCraft: Brood War version of the game, the Reaver was notorious for being slow, clumsy, and often borderline useless if left on its own due to buggy in-game AI. However, in the hands of dedicated players who invested time into mastery of the unit, the Reaver became one of the most powerful assets in the game, often playing a key role in tournament-winning games.

Acknowledgement

A predecessor to Reaver, named simply pysc2-rl-agent, was developed as the practical part of a bachelor's thesis at the University of Tartu under the supervision of Ilya Kuzovkin and Tambet Matiisen. You can still access it on the v1.0 branch.

Support

If you encounter a codebase-related problem, please open a ticket on GitHub and describe it in as much detail as possible. If you have more general questions or are simply seeking advice, feel free to send me an email.

I am also a proud member of an active and friendly SC2AI online community; we mostly use Discord for communication. People of all backgrounds and levels of expertise are welcome to join!

Citing

If you have found Reaver useful in your research, please consider citing it with the following BibTeX:

@misc{reaver,
  author = {Ring, Roman},
  title = {Reaver: Modular Deep Reinforcement Learning Framework},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/inoryy/reaver}},
}

reaver's People

Contributors

inoryy


reaver's Issues

Simplify setup

  • Create an env.yml for conda
  • Maybe docker container?
    • How to handle sc2 installation? Blizzard explicitly requires EULA agreement

How to use the Plot.py util file?

Hi @inoryy,
I am trying to figure out how to use the plot.py file in your utils folder; I want more data on the experiment I am running.

Sorry if this is basic, but could you explain how you use this util file?
As in, do I have to call it from the command line or from a Python file?

Marine stuck in 'MoveToBeacon'

Hi there! Thanks for your great work! 👍
But I met an unexpected problem; my environment is Windows 10.
My code is as follows:

import reaver as rvr
from multiprocessing import Process

if __name__ == '__main__':
    p = Process()
    p.start()
    env = rvr.envs.SC2Env(map_name='MoveToBeacon')
    agent = rvr.agents.A2C(env.obs_spec(), env.act_spec()
                           , rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=1)
    agent.run(env)

But I got the traceback below and the Marine just will not move anywhere:

Process Process-2:
Traceback (most recent call last):
  File "C:\Users\Saber\Anaconda3\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\Saber\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\base\multiproc.py", line 52, in _run
    obs = self._env.reset()
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\sc2.py", line 69, in reset
    obs, reward, done = self.obs_wrapper(self._env.reset())
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\sc2.py", line 126, in __call__
    obs['feature_screen'][self.feature_masks['screen']],
  File "C:\Users\Saber\Anaconda3\lib\site-packages\pysc2\lib\named_array.py", line 145, in __getitem__
    index = _get_index(obj, index)
  File "C:\Users\Saber\Anaconda3\lib\site-packages\pysc2\lib\named_array.py", line 207, in _get_index
    "Can't index by type: %s; only int, string or slice" % type(index))
TypeError: Can't index by type: <class 'list'>; only int, string or slice

and I also got stuck in 'CartPole-v0'. Nothing is shown after I have waited for quite a while, and my code is:

import reaver as rvr
from multiprocessing import Process

if __name__ == '__main__':
    p = Process()
    p.start()
    env = rvr.envs.GymEnv('CartPole-v0')
    agent = rvr.agents.A2C(env.obs_spec(), env.act_spec())
    agent.run(env)

Any idea about this, Thanks!

Rewards clipping or scaling

Need to investigate if clipping or scaling rewards improves performance.

Does it even make sense if I'm already clipping grads?
How will the agent know that one action is better than another if both get reward = 1?

Some errors when running the code

I have received some errors when running the code. But I don't know why this happens.

Process Process-1:
Traceback (most recent call last):
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\liupenghui\pysc2-rl-agent-master\common\env.py", line 22, in worker
    env = env_fn_wrapper.x()
  File "E:\liupenghui\pysc2-rl-agent-master\common\env.py", line 14, in _thunk
    env = sc2_env.SC2Env(**params)
  File "C:\Anaconda3\lib\site-packages\pysc2\env\sc2_env.py", line 132, in __init__
    self._setup((agent_race, bot_race, difficulty), **kwargs)
  File "C:\Anaconda3\lib\site-packages\pysc2\env\sc2_env.py", line 173, in _setup
    self.run_config = run_configs.get()
  File "C:\Anaconda3\lib\site-packages\pysc2\run_configs\__init__.py", line 38, in get
    if FLAGS.sc2_run_config is None:  # Find the highest priority as default.
  File "C:\Anaconda3\lib\site-packages\absl\flags\_flagvalues.py", line 488, in __getattr__
    raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --sc2_run_config before flags were parsed.

Separate std variable for continuous policies

Seems using a single, separate variable for (log?) standard deviation is more popular than making it part of the network, e.g. (Schulman et al., 2015). Should probably use this approach instead of the currently implemented one, at least while comparing algorithms against baselines.

Can't use tf.get_variable() though; it goes away in 2.0.

question about sampling

Hey,

I wanted to ask about the calculation in the sample function.

return tf.argmax(tf.log(u) / probs, axis=1)

It divides by probs. Does that mean that lower probabilities have better chances to get picked? Better exploration???
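
For reference, this expression is the exponential-race sampling trick: with u drawn uniformly from (0, 1), argmax(log(u) / probs) returns index i with probability proportional to probs[i], so lower probabilities are not favoured. A quick numpy sketch of the same expression shows the empirical frequencies matching probs:

import numpy as np

probs = np.array([0.2, 0.3, 0.5])
u = np.random.rand(100000, 3)
samples = np.argmax(np.log(u) / probs, axis=1)
print(np.bincount(samples) / len(samples))  # approximately [0.2, 0.3, 0.5]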

Random seed setting

Hi, inoryy.
How did you set your random seed, as shown in your learning curves?

code error when running the test code

import reaver as rvr
env = rvr.envs.SC2Env(map_name='MoveToBeacon')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=1)
agent.run(env)

1st error: ...pysc2/lib/features.py:737: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.

2nd error:
.../pysc2/lib/named_array.py", line 208, in _get_index
"Can't index by type: %s; only int, string or slice" % type(index))

Possible to make agents play against each other?

Hello!

I am new to Reinforcement Learning, and really wanted to try to implement a model that would be able to play against itself, and I found your awesome project!

Is there a way to make it play against itself, and to speed that up?

Fix default logger

The current default is essentially /dev/null, which is probably not the expected behavior for people trying to run reaver from inside their own codebase.

Restore previous experiment result issue

Thank you for always kindly answering.

The computer suddenly stopped during learning, so I tried to use the restore function to continue from the learning results so far. However, I discovered in the log file that loading was no longer progressing and had stopped at one point.

In this case, which part is the problem?

19-22-02

From Dohyeong

Trouble with running PySC2

Hey, I created an env on conda to test reaver, and when I tried the command for the Beacon minigame I had the logger issue. I followed your hotfix but I end up having weird errors depending on the agent I specify (A2C/PPO).
For PPO, for instance, I get this error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\tf\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\ProgramData\Anaconda3\envs\tf\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\base\msg_multiproc.py", line 48, in _run
    obs = self._env.reset()
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\sc2.py", line 73, in reset
    obs, reward, done = self.obs_wrapper(self._env.reset())
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\sc2.py", line 130, in __call__
    obs['feature_screen'][self.feature_masks['screen']],
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pysc2\lib\named_array.py", line 145, in __getitem__
    index = _get_index(obj, index)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pysc2\lib\named_array.py", line 207, in _get_index
    "Can't index by type: %s; only int, string or slice" % type(index))
TypeError: Can't index by type: <class 'list'>; only int, string or slice

(The same traceback is printed, interleaved, by each of the worker processes.)

I love your work anyway and looking forward a fix thanks

Screen features KeyError

Hello, I'm trying to run the script with these flags, which essentially specify both feature and RGB observations:

parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0)
parser.add_argument("--sz", type=int, default=32)

parser.add_argument("--feature_screen_size", type=int, default=84)
parser.add_argument("--feature_minimap_size", type=int, default=64)
#action space features 1, rgb 2 (needed if both rgb and features are on)
parser.add_argument("--action_space", type=str, default='features')

parser.add_argument("--rgb_screen_size", type=str, default="120")
parser.add_argument("--rgb_minimap_size", type=str, default="64")

parser.add_argument("--envs", type=int, default=32)
parser.add_argument("--render", type=int, default=1)
parser.add_argument("--steps", type=int, default=16)
parser.add_argument("--updates", type=int, default=1000000)
parser.add_argument('--lr', type=float, default=7e-4)
parser.add_argument('--vf_coef', type=float, default=0.25)
parser.add_argument('--ent_coef', type=float, default=1e-3)
parser.add_argument('--discount', type=float, default=0.99)
parser.add_argument('--clip_grads', type=float, default=1.)
parser.add_argument("--run_id", type=int, default=-1)
parser.add_argument("--map", type=str, default='MoveToBeacon')
parser.add_argument("--cfg_path", type=str, default='config.json.dist')
parser.add_argument("--test", type=bool, nargs='?', const=True, default=False)
parser.add_argument("--restore", type=bool, nargs='?', const=True, default=False)
parser.add_argument('--save_replay', type=bool, nargs='?', const=True, default=False)

but I got this error:
    return [self._preprocess(obs, _type) for _type in ['screen', 'minimap'] + self.feats['non_spatial']]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in _preprocess
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in <listcomp>
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in <listcomp>
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
KeyError: 'screen'

Dimension of screen feature

Hello,

I am trying to apply a Relational Network in the DefeatRoaches environment using the code you uploaded.
The size of the screen feature is 16; is it so small that it affects performance?

I want to know whether the performance graph shown on the web page was obtained with this screen feature size.

From Dohyeong Kim

Fail to run the demo

When I run the demo code shown on the readme page, an error occurs as below.
RuntimeError: v1.summary.FileWriter is not compatible with eager execution. Use tf.summary.create_file_writer,or a with v1.Graph().as_default(): context

error with Function xx is currently not available

Hi, thank you for the great reaver.
I tested run.py with --env MoveToBeacon --agent ppo --n_envs 1 on macOS without a GPU, but got the following error:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/reaver/envs/base/shm_multiproc.py", line 48, in _run
    obs, rew, done = self._env.step(data)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/reaver/envs/sc2.py", line 87, in step
    obs, reward, done = self.obs_wrapper(self._env.step(self.act_wrapper(action)))
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in step
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in <listcomp>
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in <listcomp>
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/features.py", line 1608, in transform_action
    raise ValueError("Function %s/%s is currently not available" % (
ValueError: Function 331/Move_screen is currently not available

The action id in the error is not the same every time.
Then I ran the command python3 -m pysc2.bin.agent --map MoveToBeacon, with the result:

I1023 10:11:01.933034 4632368576 sc2_env.py:506] Starting episode 1: [terran] on MoveToBeacon
0/no_op ()
1/move_camera (1/minimap [64, 64])
2/select_point (6/select_point_act [4]; 0/screen [84, 84])
3/select_rect (7/select_add [2]; 0/screen [84, 84]; 2/screen2 [84, 84])
4/select_control_group (4/control_group_act [5]; 5/control_group_id [10])
7/select_army (7/select_add [2])
453/Stop_quick (3/queued [2])
451/Smart_screen (3/queued [2]; 0/screen [84, 84])
452/Smart_minimap (3/queued [2]; 1/minimap [64, 64])
331/Move_screen (3/queued [2]; 0/screen [84, 84])
332/Move_minimap (3/queued [2]; 1/minimap [64, 64])
333/Patrol_screen (3/queued [2]; 0/screen [84, 84])
334/Patrol_minimap (3/queued [2]; 1/minimap [64, 64])
12/Attack_screen (3/queued [2]; 0/screen [84, 84])
13/Attack_minimap (3/queued [2]; 1/minimap [64, 64])
274/HoldPosition_quick (3/queued [2])

My pysc2 version is 3.0.0 and my reaver version is 2.1.9.
I set ensure_available_actions=False and it works, but I don't think it's a good idea.

RNN based agent

  • AlphaStar uses an LSTM, right? Why is there no RNN in your rvr.models.build_fully_conv?

I train under the ubuntu

Dear inoryy,
Thanks for your sharing; I have learned a lot. Now I have trained four minigames and they are consistent with your results. But the other three minigames cannot run; the error is id 1/id 17 unknown. I use a GTX 1080 Ti and Ubuntu 16.04. I wonder if it has something to do with that?

Question about performance on BuildMarines

Hi @inoryy, in addition to the results on these minigames, I notice there aren't any results on BuildMarines. May I ask if there is an update or a planned follow-up?

By the way, awesome repo!

Tensorflow 2.X issue

Hi,

Thank you very much for the nice open-source project! After installation, I get a tf.summary.FileWriter is not compatible with eager execution error when I try agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=4). I think this is because of a tensorflow version issue. I wonder how you handled this issue!

Thanks

Is this a bug in runner.py?

Thank you for the great code. When I tried new maps, I found some problems in runner.py. When there is more than one env and one env is done before the others, it restarts the game. By the time all envs are done, the calculated rewards contain many episodes, which gives a much bigger number. If you understand what I am talking about, please tell me whether there is any problem.

import reaver results in error

First, I followed the install instructions to install both reaver and pysc2 from source.

The error has changed: I realized I did not have TF-Probability installed. However, I'm still receiving an error.

When running import reaver I receive the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/__init__.py", line 1, in <module>
    import reaver.envs
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/__init__.py", line 6, in <module>
    from .gym import GymEnv
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/gym.py", line 3, in <module>
    from reaver.envs.atari import AtariPreprocessing
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/atari.py", line 29, in <module>
    import gin.tf
  File "/home/hf/.local/lib/python3.6/site-packages/gin/tf/__init__.py", line 20, in <module>
    from gin.tf.utils import GinConfigSaverHook
  File "/home/hf/.local/lib/python3.6/site-packages/gin/tf/utils.py", line 34, in <module>
    config.register_file_reader(tf.io.gfile.GFile, tf.io.gfile.exists)
AttributeError: module 'tensorflow._api.v1.io' has no attribute 'gfile'


OLD ERROR:
When running import reaver I received the following error:

  File "<stdin>", line 1, in <module>
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/__init__.py", line 1, in <module>
    import reaver.envs
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/__init__.py", line 2, in <module>
    from .sc2 import SC2Env
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/sc2.py", line 5, in <module>
    from pysc2.lib import actions
ModuleNotFoundError: No module named 'pysc2.lib'

However I have no problems with 'import pysc2'

Error in loading pre-trained models for minigames

Hi, I just wanted to reproduce the reported results by downloading the zip files from the releases tab.

I ran into two issues:

  1. In the unzipped folders, the config.gin files seem to be missing.
  2. In an attempt to resolve the first issue, I started the training for the corresponding minigame and interrupted it to get the config.gin file. But loading the model checkpoints seems to give another problem, with the error message:
    Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint.
    Could it be due to the version of tensorflow?

Thanks again!

How to watch full graphics replay

Hello,

Can you please elaborate further on how you recorded the full graphics replay? I am currently using SC2 Linux version 4.1.2 and have been trying to watch the full graphics replay on Windows by logging into Battle.net, but I keep failing to do so, presumably because of the version difference. You briefly mentioned it in the README, but can you please explain in a little more detail how you did it?

Thank you.

Agents not training in final step

Hi inoryy,

I am having an issue with the final step where the agents train in the reaver colab.

Everything loads fine until the pygame part. It just shows that I have deleted the tcmalloc library and stops running shortly thereafter.


Any ideas on how to fix?

No way to save replays

After carefully inspecting the code, I don't see a way to specify a replay directory or flag. However, in previous versions it seems that this functionality was included.

CPU BiasOp only supports NHWC.

I received some errors when running the code:
InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC. [[Node: Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](Conv/convolution, Conv/biases/read)]]
I want to use 2 GPUs, so I modified args.gpu=2.

Implementation of transformer into model

Hello, thank you for sharing good code.

I am trying to solve the DefeatRoaches minigame by using a Relational Network.

I found example code of a Transformer for MNIST classification and modified the fully_conv.py file to use it. Unlike the original code, I only use the screen feature without the minimap feature. But the result is still not good.

Would you give me a recommendation on how to modify it to reach DeepMind's performance?

Thank you.
From Dohyeong

Max pooling

Need to try adding a max pooling layer to the model.
Intuitively, the agent might benefit from spatial translation invariance on some maps like DefeatRoaches.
Why doesn't DM use it?

Issue with running on google colab

At the last step, it shows:
ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

Windows support (multiprocessing / shared memory)

There's no os.fork() on Windows, so it seems that when I launch a new worker it re-creates the ProcEnv object, which no longer has access to the MultiProcEnv shared memory reference. Need to either rewrite how I pass the reference or temporarily implement message-based communication instead for Windows.
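
A minimal sketch of the message-based fallback (an illustration, not the actual Reaver code): a Pipe-based worker avoids relying on inherited shared memory, so it also works with Windows' spawn start method, at the cost of pickling every observation.

import numpy as np
from multiprocessing import Process, Pipe

def worker(conn):
    # observations are pickled and sent back through the pipe on every step
    while True:
        cmd = conn.recv()
        if cmd == 'step':
            conn.send(np.zeros((84, 84), dtype=np.float32))  # stand-in for env.step()
        elif cmd == 'close':
            conn.close()
            break

if __name__ == '__main__':
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send('step')
    obs = parent.recv()
    parent.send('close')
    p.join()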

Issue running StarCraft II agent on CPU only setup

First, if I run python -m reaver.run --env MoveToBeacon --agent a2c --n_envs 4 2> stderr.log I get UnimplementedError (see above for traceback): Generic conv implementation only supports NHWC tensor format for now. So I changed line 67 in run.py into if not int(args.gpu).
After that, this problem seems to be solved, but I get another problem whenever the game loading is done: ValueError: Argument is out of range for 12/Attack_screen (3/queued [2]; 0/screen [0, 0]), got: [[1], [8, 40]]. The argument that is out of range is not the same each time. So is there something I overlooked? Thx
