timzaman / dotaclient Goto Github PK

View Code? Open in Web Editor NEW

26.0 3.0 7.0 437 KB

distributed RL spaghetti al arabiata

Python 81.69% Dockerfile 0.67% Jsonnet 17.63%

spaghetti

dotaclient's Introduction

DotaClient on K8s

DotaClient is a reinforcement learning system to train RL agents to play Dota 2 through self-play.

Video: (Youtube) 1v1 self play, 9 Mar 2019, uses fountain for regen!.
Video: (Youtube) 1v1 self play, 29 jan 2019.

This is built upon the DotaService project, that exposes the game of Dota2 as a (grpc) service for synchronous play.

Distributed Agents self-playing Dota 2.
Experience/Model Broker (rmq).
Distributed Optimizer (PyTorch)

Prerequisites

Kubeflow's PyTorch Operator
Kubernetes Cluster (e.g. GKE).
Build the dota docker image
Build the dotaservice docker image
Build the rabbitmq docker image
Install ksonnet

Launch distributed dota training

cd ks-app
ks show default  # Shows the full manifest
ks param list  # Lists all parameters
ks apply default  # Launches everything you need

Note: A typical job has 40 agents per optimizer. One optimizer does around 1000 steps/s.

dotaclient's People

Contributors

Stargazers

Watchers

Forkers

nostrademous 19871010 lenlrx liamdiprose tzuren atodgl

dotaclient's Issues

Augmentation Arena

@Nostrademous need your help. I think we should make a dota map that we can augment. The creep spawn should be randomized, the creeps themselves should have random hp and attack dmg, and preferablies the heroes themselves too. If we don't do this, the self-play can never generalize well enough and get out of the local minimal of just sitting back and last hitting.

Removing Invalid Actions

One suggestions that seems reasonable to me is:
https://ai.stackexchange.com/a/2994

This is regarding:
https://github.com/TimZaman/dotaclient/blob/master/policy.py#L127-L133

Games get cut off after a certain period.

GCP games get cut off around 580~582 dota_time, while local games (Mac) do not. This problem probably was always there, but never caught. Some recent assertions caught this.

Locally I can reproduce (yay!) by setting the dota_time very high. I just got locally 655 and 654 dota_time before we get a 'bad' end state. I think dota times out somehow.

Movements: MoveToPosition?

I now found MoveToPosition too, @Nostrademous . How's that different?

Detect End-of-Game Condition

If we monitor the console.log we should check for the presence of the following string:
<DATE> - <TIME>: Match signout: duration = <NUM> (<FLOAT_NUM>) good guys win = <BOOL>

If bool is 0 then Dire won. If bool is 1 then Radiant won.

Alternatively there is an earlier message that states:
<DATE> - <TIME>: Building: npc_dota_<goodguys or badguys>_fort destroyed at <TIME>

Agent receives 20 latest models instead of 1.

Ex:

$ kubectl logs dotaservice-deployment-6d54f576fc-xdbdw agent
2019-01-06 23:12:55,932 INFO     setup_model_cb(host=rmq.default.svc.cluster.local, port=5672)
2019-01-06 23:12:56,820 INFO     Received new model: version=34, size=1207326b
2019-01-06 23:12:56,824 INFO     Updated weights to version 34
2019-01-06 23:12:56,824 INFO     === Starting Episode 0.
2019-01-06 23:12:56,824 INFO     Starting game.
2019-01-06 23:12:56,884 INFO     Received new model: version=35, size=1207326b
2019-01-06 23:12:56,887 INFO     Updated weights to version 35
2019-01-06 23:12:56,991 INFO     Received new model: version=36, size=1207326b
2019-01-06 23:12:56,994 INFO     Updated weights to version 36
2019-01-06 23:12:57,021 INFO     Received new model: version=37, size=1207326b
2019-01-06 23:12:57,024 INFO     Updated weights to version 37
2019-01-06 23:12:57,094 INFO     Received new model: version=38, size=1207326b
2019-01-06 23:12:57,097 INFO     Updated weights to version 38
2019-01-06 23:12:57,101 INFO     Received new model: version=39, size=1207326b
2019-01-06 23:12:57,103 INFO     Updated weights to version 39
2019-01-06 23:12:57,154 INFO     Received new model: version=40, size=1207326b
2019-01-06 23:12:57,158 INFO     Updated weights to version 40
2019-01-06 23:12:57,161 INFO     Received new model: version=41, size=1207326b
2019-01-06 23:12:57,163 INFO     Updated weights to version 41
2019-01-06 23:12:57,208 INFO     Received new model: version=42, size=1207326b
2019-01-06 23:12:57,212 INFO     Updated weights to version 42
2019-01-06 23:12:57,215 INFO     Received new model: version=43, size=1207326b
2019-01-06 23:12:57,217 INFO     Updated weights to version 43
2019-01-06 23:12:57,220 INFO     Received new model: version=44, size=1207326b
2019-01-06 23:12:57,222 INFO     Updated weights to version 44
2019-01-06 23:12:57,265 INFO     Received new model: version=45, size=1207326b
2019-01-06 23:12:57,268 INFO     Updated weights to version 45
2019-01-06 23:12:57,272 INFO     Received new model: version=46, size=1207326b
2019-01-06 23:12:57,275 INFO     Updated weights to version 46
2019-01-06 23:12:57,317 INFO     Received new model: version=47, size=1207326b
2019-01-06 23:12:57,321 INFO     Updated weights to version 47
2019-01-06 23:12:57,324 INFO     Received new model: version=48, size=1207326b
2019-01-06 23:12:57,328 INFO     Updated weights to version 48
2019-01-06 23:12:57,371 INFO     Received new model: version=49, size=1207326b
2019-01-06 23:12:57,373 INFO     Updated weights to version 49
2019-01-06 23:12:57,379 INFO     Received new model: version=50, size=1207326b
2019-01-06 23:12:57,383 INFO     Updated weights to version 50
2019-01-06 23:12:57,386 INFO     Received new model: version=51, size=1207326b
2019-01-06 23:12:57,389 INFO     Updated weights to version 51
2019-01-06 23:12:57,432 INFO     Received new model: version=52, size=1207326b
2019-01-06 23:12:57,436 INFO     Updated weights to version 52
2019-01-06 23:12:57,439 INFO     Received new model: version=53, size=1207326b
2019-01-06 23:12:57,441 INFO     Updated weights to version 53
2019-01-06 23:12:57,478 INFO     Received new model: version=54, size=1207326b
2019-01-06 23:12:57,481 INFO     Updated weights to version 54
2019-01-06 23:13:45,003 INFO     Player 0 rollout.

Negative probabilities selected

Probably due to action masking, but at the core probably a pytorch bug.

kubectl logs job4-ppo-dotaservice-6f4cc5d688-gfvt5 agent
2019-01-25 08:17:55,147 INFO     main(rmq_host=job4-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-25 08:17:55,178 INFO     setup_model_cb(host=job4-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-25 08:17:55,214 INFO     Received new model: version=3266, size=1207838b
2019-01-25 08:17:55,219 INFO     === Starting Gane 0.
2019-01-25 08:17:55,219 INFO     Starting game.
2019-01-25 08:17:55,229 INFO     Player 0 using weights version 3266
2019-01-25 08:17:55,238 INFO     Player 5 using weights version 3266
2019-01-25 08:18:13,561 INFO     Received new model: version=3267, size=1207838b
Traceback (most recent call last):
  File "agent.py", line 698, in main
    await game.play(game_id=game_id)
  File "agent.py", line 621, in play
    action_pb = player.obs_to_action(obs=obs)
  File "agent.py", line 506, in obs_to_action
    hidden=self.hidden,
  File "agent.py", line 467, in select_action
    action_dict = self.policy.select_actions(head_prob_dict=head_prob_dict)
  File "/root/dotaclient/policy.py", line 181, in select_actions
    action_dict['target_unit'] = cls.sample_action(head_prob_dict['target_unit'])
  File "/root/dotaclient/policy.py", line 161, in sample_action
    return Categorical(probs).sample()
  File "/root/.local/lib/python3.7/site-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:298
2019-01-25 08:19:00,834 ERROR    Unclosed connection: Channel('127.0.0.1', 13337, ..., path=None)

AI vs Scripted

@TimZaman I know you are hoping to test our AI/ML design against Default bots; unfortunately there is no way to do that that I see "with the same hero selection for both sides".

I'm going to spend some time in the next few days and write a Lua-scripted bot so that we can achieve this game-play mode. I will make sure the scripted bot only uses actions that are currently available to the AI bot (no abilities, no items, etc.) and makes decision at a similar frequency (once every 5 frames).

This is just a heads-up.

GCS Setup

So I created a GCS account and tested that my authentication key is valid and I can retrieve my bucket list.

>>> def implicit():
...     from google.cloud import storage
...     storage_client = storage.Client()
...     buckets = list(storage_client.list_buckets())
...     print(buckets)
... 
>>> implicit()
[<Bucket: pydota2>]
>>> quit()

What changes and where do I need to make to run using the distributed K8s setup? Any help you can offer in terms of directions or direct answer would probably save me tons of time.

Thanks

gcp upload of tb file too big

2019-01-13 16:42:25,801 INFO     steps_per_s=176.02, avg_weight_age=6.12, mean_reward=-0.60, loss=0.7546
Traceback (most recent call last):
  File "optimizer.py", line 514, in <module>
    mq_prefetch_count=args.mq_prefetch_count,
  File "optimizer.py", line 493, in main
    dota_optimizer.run()
  File "optimizer.py", line 315, in run
    self.step(experiences=experiences)
  File "optimizer.py", line 429, in step
    blob.upload_from_filename(filename=self.events_filename)
  File "/root/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1136, in upload_from_filename
    predefined_acl=predefined_acl,
  File "/root/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1081, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/root/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 991, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/root/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 938, in _do_resumable_upload
    response = upload.transmit_next_chunk(transport)
  File "/root/.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 392, in transmit_next_chunk
    method, url, payload, headers = self._prepare_request()
  File "/root/.local/lib/python3.7/site-packages/google/resumable_media/_upload.py", line 530, in _prepare_request
    self._stream, self._chunk_size, self._total_bytes)
  File "/root/.local/lib/python3.7/site-packages/google/resumable_media/_upload.py", line 825, in get_next_chunk
    raise ValueError(msg)
ValueError: 8682306 bytes have been read from the stream, which exceeds the expected total 8664934.

Better Exploration and Training

This is its own issue regarding agent expectations and the comment you made with respect to win/loss reward and it not mattering currently.

I think that becoming a 'last-hit' master is currently all the agent can hope to accomplish. I say this because there is the following intrinsic rewards:

Stay near the center (sure a micro-reward but it still counts) so it encourages the agent not to "explore" through negative reinforcement
The bot cannot do anything other than move or attack enemy NPC units or heroes or deny own units. It has no concept of enemy tower, just its own tower and only for purposes of knowing its health but nothing else about it. It is encouraged to last-hit, deny, kill enemy hero, not die.
It could potentially learn to help push the creep wave into enemy tower and thus "win" the game but it is discouraged from doing so currently as the reward for killing enemy hero is 3x the reward for winning a game early. (you kill enemy hero twice at +3.0 reward for each kill to win and thus gain the +1.0 reward for win, or you destroy 1 tower which the bot cannot even attack as it doesn't have that unit handle for no-reward other than the +1.0 win reward)

As a result, I think becoming a last-hit champ is all it can aspire to.

Optimizer ValueError

I kicked off a local run on Friday afternoon with the current HEAD at the time and let it run over the weekend. Sunday night I noticed that the optimizer had seemed to die about 2 hours after starting with the error below:

2019-02-15 15:37:57,825 INFO     steps_per_s=25.76, avg_weight_age=1.0, reward_per_sec=-0.0000, loss=-0.1758, entropy=5.023, advantage=0.018
2019-02-15 15:37:57,878 INFO     iteration 27/10000
2019-02-15 15:40:09,232 INFO      epoch 1/4
2019-02-15 15:40:12,433 INFO      epoch 2/4
2019-02-15 15:40:15,710 INFO      epoch 3/4
Traceback (most recent call last):
  File "optimizer.py", line 782, in <module>
    run_local=args.run_local,
  File "optimizer.py", line 736, in main
    dota_optimizer.run()
  File "optimizer.py", line 461, in run
    loss_d, entropy_d, advantage = self.train(experiences=batch)
  File "optimizer.py", line 646, in train
    loss, policy_loss, entropy_loss, advantage_loss))
ValueError: loss=nan, policy_loss=nan, entropy_loss=nan, advantage_loss=nan

The error above is one issue. The 2nd issue is that the agent kept running for 2 more days (as well as the Dota Service) without a hiccup even though the Optimizer went down. The Agent just would continue to re-use the same weights version, over and over and over.

Create (or use mine) dotaworld module

From an architecture standpoint - we really should have a:

dotaservice module - already exists, responsible for all the network/file IO between Dota2 and our code
dotaagent module (called dotaclient currently) - responsible for the AI portion of controlling the bots - uses NNs, RL, calculates rewards, learning, etc.
dotaworld module - responsible for mapping the CMsgBotWorldState protobuf information acquired at whatever interval to tracked entities in the Python world.

I started work on 3 in my code already:
https://github.com/pydota2/pydota2/tree/master/dotaworld

Problem reloading weights

2019-01-06 18:42:18,355 INFO     Downloading: runs/Jan06_04-11-46_optimizer-master-0/model_000000726.pt
Traceback (most recent call last):
  File "optimizer.py", line 362, in <module>
    pretrained_model=args.pretrained_model,
  File "optimizer.py", line 338, in main
    pretrained_model=pretrained_model,
  File "optimizer.py", line 80, in __init__
    self.policy_base.load_state_dict(torch.load(pretrained_model), strict=False)
  File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 528, in _load
    magic_number = pickle_module.load(f)
EOFError: Ran out of input

Add Support for 5v5 GameMode

All Pick Support - DOTA_GAMEMODE_AP

Steps per second of optimizer slow

Profile this. Possibly due to slow upload of tensorboard event files or model uploading.

Notice all of the jumps in the below figure relate to restarts of the optimizer.

Sum entropy per head before averaging.

Optimizer drops out bc of rmq

Optimizer drops out [here, worker 9 out of 12 optimizers] with below error. RMQ is fine.

The other optimizers then drop out:

Traceback (most recent call last):
  File "optimizer.py", line 353, in <module>
    pretrained_model=args.pretrained_model,
  File "optimizer.py", line 335, in main
    dota_optimizer.run()
  File "optimizer.py", line 171, in run
    self.step(experiences=experiences)
  File "optimizer.py", line 205, in step
    loss = self.finish_episode(rewards=all_discounted_rewards, log_probs=all_logprobs)
  File "optimizer.py", line 118, in finish_episode
    loss.backward()
  File "/root/.local/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/dotaclient/distributed.py", line 37, in allreduce_params
    dist.all_reduce(has_grad_count, op=dist.ReduceOp.SUM) # [0. to world_size]
  File "/root/.local/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 838, in all_reduce
    work.wait()
RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:543] Connection closed by peer [10.20.94.2]:15019

[Request] Please provide a run instructions

In order to collaborate and work together, since I'm not very much familiar with K8s provisioning, configuration, and the necessary steps for deploying my workload against them, would you be kind enough to provide a README or at least links to instructions that you found useful to setup my own cluster, credentials, account, etc?

I would like to be able to throw as much money as I want at the problem and replicate your current setup and work (I have corporate time and money to throw at the problem coming up soon). Once I'm up and running there is a lot more value I should be able to bring.

Handle dotaservice crashes

If the dotaservice crashes, we don't want to perform a rollout.

Compute entropy only for valid targets.

Otherwise leads to these kind of bs situations, where is actually optimizes for getting invalid units to come up so that the entropy seems higher.

 'target_unit': tensor([[[3.0567e-02, 2.6279e-02, 2.6279e-02, 2.6279e-02, 2.6279e-02,
          2.6279e-02, 5.9059e-04, 1.3049e-05, 9.7086e-04, 7.9768e-04,
          3.0369e-04, 5.4091e-04, 4.1392e-02, 4.1392e-02, 4.1392e-02,
          4.1392e-02, 4.1392e-02, 4.1392e-02, 4.1392e-02, 4.1392e-02,
          4.1392e-02, 4.1392e-02, 8.5043e-05, 3.6217e-04, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02]]], grad_fn=<SoftmaxBackward>),

Error enemy reward calculation

Occasionally

Traceback (most recent call last):
  File "agent.py", line 751, in main
    await game.play(game_id=game_id)
  File "agent.py", line 684, in play
    rad_rew = sum(players[TEAM_RADIANT].rewards[-1].values())
IndexError: list index out of range

NaN's in loss

$ kubectl logs job11-optimizer-master-0 -p
2019-01-27 16:29:21,969 INFO     main(rmq_host=job11-rmq.default.svc.cluster.local, rmq_port=5672, epochs=4 seq_per_epoch=32, batch_size=8, seq_len=256 learning_rate=0.0001, pretrained_model=None, mq_prefetch_count=4, entropy_coef=0.02)
2019-01-27 16:29:21,970 INFO     init_distribution
2019-01-27 16:29:21,970 WARNING  skipping distribution: world size too small (1)
2019-01-27 16:29:21,983 INFO     Checkpointing to: exp2/job11
2019-01-27 16:29:22,305 INFO     Found a latest model in pretrained dir: exp2/job11/model_000000484.pt
2019-01-27 16:29:22,305 INFO     Downloading: exp2/job11/model_000000484.pt
2019-01-27 16:29:22,431 INFO     Connected to RMQ
2019-01-27 16:29:22,571 INFO     iteration 485/10000
2019-01-27 16:29:25,229 INFO      epoch 1/4
2019-01-27 16:29:30,171 INFO      epoch 2/4
2019-01-27 16:29:34,918 INFO      epoch 3/4
2019-01-27 16:29:39,586 INFO      epoch 4/4
2019-01-27 16:29:44,268 INFO     steps_per_s=421.46, avg_weight_age=1.0, reward_per_sec=0.0202, loss=nan, entropy=nan
Traceback (most recent call last):
  File "optimizer.py", line 737, in <module>
    run_local=args.run_local,
  File "optimizer.py", line 693, in main
    dota_optimizer.run()
  File "optimizer.py", line 495, in run
    self.writer.add_histogram('losses', losses, it)
  File "/root/.local/lib/python3.7/site-packages/tensorboardX/writer.py", line 406, in add_histogram
    histogram(tag, values, bins), global_step, walltime)
  File "/root/.local/lib/python3.7/site-packages/tensorboardX/summary.py", line 146, in histogram
    hist = make_histogram(values.astype(float), bins)
  File "/root/.local/lib/python3.7/site-packages/tensorboardX/summary.py", line 168, in make_histogram
    counts = counts[start:end]
UnboundLocalError: local variable 'start' referenced before assignment

Crash

Ran latest code on local machine over 3-4 days and after 5994 games it spat out an error:

Traceback (most recent call last):
  File "agent.py", line 985, in main
    await game.play(config=config, game_id=game_id)
  File "agent.py", line 898, in play
    player.compute_reward(prev_obs=prev_obs[team_id], obs=obs)
  File "agent.py", line 797, in compute_reward
    reward = get_reward(prev_obs=prev_obs, obs=obs, player_id=self.player_id)
  File "agent.py", line 123, in get_reward
    unit = get_unit(obs, player_id=player_id)
  File "agent.py", line 247, in get_unit
    raise ValueError("unit {} not found in state:\n{}".format(player_id, state))
ValueError: unit 5 not found in state:

Followed by the dump of the protobuf

Dotaclient deadlocks on rmq connection loss

Agent:

$ tzaman@Tims-Mac-Pro dotaclient (master) $ kubectl logs dotaservice-deployment-8c499c96-2mgbd agent
2019-01-06 21:18:41,153 INFO     setup_model_cb(host=rmq.default.svc.cluster.local, port=5672)
2019-01-06 21:18:41,283 INFO     Received new model: version=284, size=1207326b
2019-01-06 21:18:41,287 INFO     Updated weights to version 284
2019-01-06 21:18:41,288 INFO     === Starting Episode 0.
2019-01-06 21:18:41,288 INFO     Starting game.
2019-01-06 21:18:41,290 ERROR    error on dispatch
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "agent.py", line 515, in main
    await game.play()
  File "agent.py", line 430, in play
    response = await asyncio.wait_for(self.dota_service.reset(self.config), timeout=120)
  File "/usr/lib/python3.7/asyncio/tasks.py", line 416, in wait_for
    return fut.result()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 588, in __call__
    await stream.send_message(message, end=True)
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 163, in send_message
    await self.send_request()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 132, in send_request
    protocol = await self._channel.__connect__()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 477, in __connect__
    self._protocol = await self._create_connection()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 465, in _create_connection
    ssl=self._ssl)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 948, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.7/asyncio/base_events.py", line 935, in create_connection
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 475, in sock_connect
    return await fut
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 505, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 13337)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 333, in run
    yield from self.dispatch_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 288, in dispatch_frame
    yield from channel.dispatch_frame(frame)
  File "/root/.local/lib/python3.7/site-packages/aioamqp/channel.py", line 111, in dispatch_frame
    yield from methods[(frame.class_id, frame.method_id)](frame)
  File "/root/.local/lib/python3.7/site-packages/aioamqp/channel.py", line 631, in basic_deliver
    content_body_frame = yield from self.protocol.get_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 264, in get_frame
    yield from frame.read_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/frame.py", line 462, in read_frame
    payload_data = yield from self.reader.readexactly(self.frame_length)
  File "/usr/lib/python3.7/asyncio/streams.py", line 679, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.7/asyncio/streams.py", line 473, in _wait_for_data
    await self._waiter
concurrent.futures._base.CancelledError

Dotaservice (0.3.2):

$ tzaman@Tims-Mac-Pro dotaclient (master) $ kubectl logs dotaservice-deployment-8c499c96-2mgbd dotaservice
2019-01-06 21:18:41,932 INFO     DotaService 0.3.2 serving on :13337

Speed dropouts correlated with strategy collapse

The below particular example didnt fully shit itself, for the simple reason that its learning rate was very modest (1e-5). The rest is typically 1e-4 as shown in below comments.

Example: 15 Jan 23:09

Related log entry:

2019-01-16 06:55:27,372 INFO     ::step episode=1427
2019-01-16 06:55:37,756 INFO     steps_per_s=559.17, avg_weight_age=4.88, mean_reward=3.36, loss=-0.0141
2019-01-16 06:55:38,705 INFO     ::step episode=1428
2019-01-16 06:55:42,167 INFO     steps_per_s=711.36, avg_weight_age=3.75, mean_reward=0.29, loss=-0.0138
2019-01-16 06:55:43,089 INFO     ::step episode=1429
2019-01-16 06:55:47,309 INFO     steps_per_s=805.36, avg_weight_age=4.94, mean_reward=0.43, loss=-0.0181
2019-01-16 06:56:12,566 INFO     ::step episode=1430
2019-01-16 06:56:20,674 INFO     steps_per_s=241.90, avg_weight_age=5.94, mean_reward=2.61, loss=-0.0661
2019-01-16 06:57:19,174 INFO     ::step episode=1431
2019-01-16 06:57:28,744 INFO     steps_per_s=140.80, avg_weight_age=2.25, mean_reward=3.39, loss=-0.1711
2019-01-16 06:58:26,423 INFO     ::step episode=1432

At the time, pods were being preempted:

NAME                                     READY   STATUS              RESTARTS   AGE
dotaservice-deployment-b8dc7b9dc-2kjlf   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-5lm7s   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-5z2tr   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-6bwzc   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-6k5p9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-7cdxh   2/2     Running             1          6h
dotaservice-deployment-b8dc7b9dc-87vzs   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-9pnr9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-cxb8s   2/2     Running             0          2h
dotaservice-deployment-b8dc7b9dc-f7xls   0/2     ContainerCreating   0          12m
dotaservice-deployment-b8dc7b9dc-fmqm5   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-h2h4h   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-h5j29   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hdcrs   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-hvr7d   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hwvsz   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hxbst   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-j22n7   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-lltwf   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-m5j5d   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-m6md5   2/2     Running             1          6h
dotaservice-deployment-b8dc7b9dc-m85nk   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-mzb9p   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-nrdq4   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-rcqhv   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-scc7t   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-smsrw   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-svzc9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-wdcjw   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-wjwql   0/2     ContainerCreating   0          12m
dotaservice-deployment-b8dc7b9dc-xtnx7   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-z9t6r   2/2     Running             0          6h
optimizer-master-0                       1/1     Running             0          6h
pytorch-operator-6f87db67b7-hgsr5        1/1     Running             0          1d
rmq-7c86fd9767-bqjfk                     1/1     Running             0          6h

PPO Improvements

https://arxiv.org/abs/1901.10314

NaN in policy loss

$ kubectl logs job1-optimizer-master-0
2019-02-03 08:08:55,542 INFO     main(rmq_host=job1-rmq.default.svc.cluster.local, rmq_port=5672, epochs=4 seq_per_epoch=32, batch_size=8, seq_len=256 learning_rate=1e-05, pretrained_model=None, mq_prefetch_count=4, entropy_coef=0.0001)
2019-02-03 08:08:55,542 INFO     init_distribution
2019-02-03 08:08:55,543 WARNING  skipping distribution: world size too small (1)
2019-02-03 08:08:55,555 INFO     Checkpointing to: exp3/job1
2019-02-03 08:08:55,820 INFO     Found a latest model in pretrained dir: exp3/job1/model_000000018.pt
2019-02-03 08:08:55,821 INFO     Downloading: exp3/job1/model_000000018.pt
2019-02-03 08:08:55,967 INFO     Connected to RMQ
2019-02-03 08:08:56,118 INFO     iteration 19/10000
2019-02-03 08:09:04,984 INFO      epoch 1/4
2019-02-03 08:09:09,354 INFO      epoch 2/4
2019-02-03 08:09:13,669 INFO      epoch 3/4
2019-02-03 08:09:17,948 INFO      epoch 4/4
2019-02-03 08:09:22,256 INFO     steps_per_s=321.15, avg_weight_age=2.8, reward_per_sec=-0.0045, loss=0.0894, entropy=7.509, advantage=-0.047
2019-02-03 08:09:22,535 INFO     iteration 20/10000
2019-02-03 08:09:36,020 INFO      epoch 1/4
2019-02-03 08:09:40,070 INFO      epoch 2/4
2019-02-03 08:09:44,142 INFO      epoch 3/4
2019-02-03 08:09:48,175 INFO      epoch 4/4
2019-02-03 08:09:52,276 INFO     steps_per_s=273.46, avg_weight_age=2.8, reward_per_sec=-0.0024, loss=0.3443, entropy=7.537, advantage=-0.098
2019-02-03 08:09:52,509 INFO     iteration 21/10000
2019-02-03 08:10:05,592 INFO      epoch 1/4
2019-02-03 08:10:09,640 INFO      epoch 2/4
2019-02-03 08:10:13,671 INFO      epoch 3/4
2019-02-03 08:10:17,794 INFO      epoch 4/4
2019-02-03 08:10:22,531 INFO     steps_per_s=287.60, avg_weight_age=2.8, reward_per_sec=-0.0050, loss=0.3568, entropy=7.509, advantage=0.088
2019-02-03 08:10:22,784 INFO     iteration 22/10000
2019-02-03 08:10:33,289 INFO      epoch 1/4
2019-02-03 08:10:37,302 INFO      epoch 2/4
2019-02-03 08:10:41,340 INFO      epoch 3/4
2019-02-03 08:10:45,373 INFO      epoch 4/4
2019-02-03 08:10:49,536 INFO     steps_per_s=322.51, avg_weight_age=13.8, reward_per_sec=-0.0008, loss=0.2533, entropy=7.513, advantage=0.070
2019-02-03 08:10:49,779 INFO     iteration 23/10000
2019-02-03 08:11:01,499 INFO      epoch 1/4
2019-02-03 08:11:05,246 INFO      epoch 2/4
2019-02-03 08:11:09,120 INFO      epoch 3/4
2019-02-03 08:11:12,983 INFO      epoch 4/4
2019-02-03 08:11:16,889 INFO     steps_per_s=299.38, avg_weight_age=12.0, reward_per_sec=-0.0036, loss=0.3263, entropy=7.541, advantage=0.130
2019-02-03 08:11:17,135 INFO     iteration 24/10000
tzaman@Tims-Mac-Pro dotaclient (master) $ kubectl logs job1-optimizer-master-0 -p
2019-02-03 07:58:09,186 INFO     main(rmq_host=job1-rmq.default.svc.cluster.local, rmq_port=5672, epochs=4 seq_per_epoch=32, batch_size=8, seq_len=256 learning_rate=1e-05, pretrained_model=None, mq_prefetch_count=4, entropy_coef=0.0001)
2019-02-03 07:58:09,186 INFO     init_distribution
2019-02-03 07:58:09,186 WARNING  skipping distribution: world size too small (1)
2019-02-03 07:58:09,198 INFO     Checkpointing to: exp3/job1
2019-02-03 07:58:09,442 INFO     Connected to RMQ
2019-02-03 07:58:09,649 INFO     iteration 1/10000
2019-02-03 08:00:13,929 INFO      epoch 1/4
2019-02-03 08:00:18,560 INFO      epoch 2/4
2019-02-03 08:00:22,967 INFO      epoch 3/4
2019-02-03 08:00:27,461 INFO      epoch 4/4
2019-02-03 08:00:32,009 INFO     steps_per_s=59.25, avg_weight_age=0.0, reward_per_sec=-0.0082, loss=0.3138, entropy=7.515, advantage=0.227
2019-02-03 08:00:32,601 INFO     iteration 2/10000
2019-02-03 08:00:34,976 INFO      epoch 1/4
2019-02-03 08:00:39,304 INFO      epoch 2/4
2019-02-03 08:00:43,549 INFO      epoch 3/4
2019-02-03 08:00:47,804 INFO      epoch 4/4
2019-02-03 08:00:52,170 INFO     steps_per_s=407.36, avg_weight_age=1.0, reward_per_sec=-0.0057, loss=0.1834, entropy=7.549, advantage=0.153
2019-02-03 08:00:52,454 INFO     iteration 3/10000
2019-02-03 08:00:54,928 INFO      epoch 1/4
2019-02-03 08:00:58,679 INFO      epoch 2/4
2019-02-03 08:01:02,412 INFO      epoch 3/4
2019-02-03 08:01:06,143 INFO      epoch 4/4
2019-02-03 08:01:10,061 INFO     steps_per_s=457.79, avg_weight_age=2.0, reward_per_sec=-0.0045, loss=0.1381, entropy=7.549, advantage=0.109
2019-02-03 08:01:10,330 INFO     iteration 4/10000
2019-02-03 08:01:36,341 INFO      epoch 1/4
2019-02-03 08:01:40,406 INFO      epoch 2/4
2019-02-03 08:01:44,472 INFO      epoch 3/4
2019-02-03 08:01:48,500 INFO      epoch 4/4
2019-02-03 08:01:52,630 INFO     steps_per_s=204.48, avg_weight_age=3.0, reward_per_sec=-0.0052, loss=0.2404, entropy=7.518, advantage=0.120
2019-02-03 08:01:52,858 INFO     iteration 5/10000
2019-02-03 08:01:55,481 INFO      epoch 1/4
2019-02-03 08:01:59,453 INFO      epoch 2/4
2019-02-03 08:02:03,407 INFO      epoch 3/4
2019-02-03 08:02:07,419 INFO      epoch 4/4
2019-02-03 08:02:11,792 INFO     steps_per_s=441.39, avg_weight_age=4.0, reward_per_sec=-0.0048, loss=0.2777, entropy=7.482, advantage=0.130
2019-02-03 08:02:12,060 INFO     iteration 6/10000
2019-02-03 08:02:15,000 INFO      epoch 1/4
2019-02-03 08:02:19,372 INFO      epoch 2/4
2019-02-03 08:02:23,516 INFO      epoch 3/4
2019-02-03 08:02:27,732 INFO      epoch 4/4
2019-02-03 08:02:32,221 INFO     steps_per_s=425.83, avg_weight_age=5.0, reward_per_sec=-0.0029, loss=0.2681, entropy=7.498, advantage=-0.018
2019-02-03 08:02:32,489 INFO     iteration 7/10000
2019-02-03 08:03:00,428 INFO      epoch 1/4
2019-02-03 08:03:04,404 INFO      epoch 2/4
2019-02-03 08:03:08,522 INFO      epoch 3/4
2019-02-03 08:03:12,645 INFO      epoch 4/4
2019-02-03 08:03:16,870 INFO     steps_per_s=189.22, avg_weight_age=4.0, reward_per_sec=-0.0069, loss=0.2963, entropy=7.481, advantage=0.130
2019-02-03 08:03:17,154 INFO     iteration 8/10000
2019-02-03 08:03:19,849 INFO      epoch 1/4
2019-02-03 08:03:23,899 INFO      epoch 2/4
2019-02-03 08:03:28,062 INFO      epoch 3/4
2019-02-03 08:03:32,274 INFO      epoch 4/4
2019-02-03 08:03:36,499 INFO     steps_per_s=443.39, avg_weight_age=4.9, reward_per_sec=-0.0046, loss=0.2225, entropy=7.487, advantage=0.121
2019-02-03 08:03:36,767 INFO     iteration 9/10000
2019-02-03 08:03:40,356 INFO      epoch 1/4
2019-02-03 08:03:44,524 INFO      epoch 2/4
2019-02-03 08:03:48,800 INFO      epoch 3/4
2019-02-03 08:03:53,079 INFO      epoch 4/4
2019-02-03 08:03:57,397 INFO     steps_per_s=428.69, avg_weight_age=5.8, reward_per_sec=-0.0031, loss=0.1637, entropy=7.529, advantage=-0.027
2019-02-03 08:03:57,624 INFO     iteration 10/10000
2019-02-03 08:04:09,730 INFO      epoch 1/4
2019-02-03 08:04:13,922 INFO      epoch 2/4
2019-02-03 08:04:18,159 INFO      epoch 3/4
2019-02-03 08:04:22,414 INFO      epoch 4/4
2019-02-03 08:04:26,668 INFO     steps_per_s=297.42, avg_weight_age=4.4, reward_per_sec=-0.0052, loss=0.2684, entropy=7.551, advantage=0.137
2019-02-03 08:04:26,937 INFO     iteration 11/10000
2019-02-03 08:04:40,910 INFO      epoch 1/4
2019-02-03 08:04:44,869 INFO      epoch 2/4
2019-02-03 08:04:48,852 INFO      epoch 3/4
2019-02-03 08:04:52,930 INFO      epoch 4/4
2019-02-03 08:04:57,053 INFO     steps_per_s=277.99, avg_weight_age=5.0, reward_per_sec=-0.0055, loss=0.3511, entropy=7.519, advantage=0.157
2019-02-03 08:04:57,328 INFO     iteration 12/10000
2019-02-03 08:05:08,032 INFO      epoch 1/4
2019-02-03 08:05:12,169 INFO      epoch 2/4
2019-02-03 08:05:16,640 INFO      epoch 3/4
2019-02-03 08:05:20,895 INFO      epoch 4/4
2019-02-03 08:05:25,137 INFO     steps_per_s=309.99, avg_weight_age=5.4, reward_per_sec=-0.0043, loss=0.1677, entropy=7.552, advantage=0.002
2019-02-03 08:05:25,377 INFO     iteration 13/10000
2019-02-03 08:05:32,969 INFO      epoch 1/4
2019-02-03 08:05:36,927 INFO      epoch 2/4
2019-02-03 08:05:40,960 INFO      epoch 3/4
2019-02-03 08:05:45,058 INFO      epoch 4/4
2019-02-03 08:05:49,178 INFO     steps_per_s=351.36, avg_weight_age=4.3, reward_per_sec=-0.0046, loss=0.3550, entropy=7.511, advantage=0.136
2019-02-03 08:05:49,449 INFO     iteration 14/10000
2019-02-03 08:06:07,304 INFO      epoch 1/4
2019-02-03 08:06:11,466 INFO      epoch 2/4
2019-02-03 08:06:15,637 INFO      epoch 3/4
2019-02-03 08:06:19,784 INFO      epoch 4/4
2019-02-03 08:06:24,006 INFO     steps_per_s=242.56, avg_weight_age=4.4, reward_per_sec=-0.0027, loss=0.1183, entropy=7.534, advantage=0.041
2019-02-03 08:06:24,288 INFO     iteration 15/10000
2019-02-03 08:06:28,117 INFO      epoch 1/4
2019-02-03 08:06:32,300 INFO      epoch 2/4
2019-02-03 08:06:36,325 INFO      epoch 3/4
2019-02-03 08:06:40,490 INFO      epoch 4/4
2019-02-03 08:06:44,682 INFO     steps_per_s=408.59, avg_weight_age=4.5, reward_per_sec=-0.0066, loss=0.3987, entropy=7.405, advantage=0.203
2019-02-03 08:06:44,939 INFO     iteration 16/10000
2019-02-03 08:06:58,538 INFO      epoch 1/4
2019-02-03 08:07:02,430 INFO      epoch 2/4
2019-02-03 08:07:06,373 INFO      epoch 3/4
2019-02-03 08:07:10,318 INFO      epoch 4/4
2019-02-03 08:07:14,332 INFO     steps_per_s=276.31, avg_weight_age=4.3, reward_per_sec=-0.0044, loss=0.3123, entropy=7.525, advantage=0.166
2019-02-03 08:07:14,598 INFO     iteration 17/10000
2019-02-03 08:07:32,498 INFO      epoch 1/4
2019-02-03 08:07:36,640 INFO      epoch 2/4
2019-02-03 08:07:40,742 INFO      epoch 3/4
2019-02-03 08:07:44,962 INFO      epoch 4/4
2019-02-03 08:07:49,240 INFO     steps_per_s=249.25, avg_weight_age=4.9, reward_per_sec=-0.0031, loss=0.1578, entropy=7.548, advantage=-0.241
2019-02-03 08:07:49,485 INFO     iteration 18/10000
2019-02-03 08:07:52,115 INFO      epoch 1/4
2019-02-03 08:07:55,908 INFO      epoch 2/4
2019-02-03 08:07:59,834 INFO      epoch 3/4
2019-02-03 08:08:03,765 INFO      epoch 4/4
2019-02-03 08:08:07,733 INFO     steps_per_s=443.28, avg_weight_age=4.6, reward_per_sec=-0.0054, loss=0.3175, entropy=7.556, advantage=0.137
2019-02-03 08:08:07,999 INFO     iteration 19/10000
2019-02-03 08:08:15,437 INFO      epoch 1/4
2019-02-03 08:08:19,210 INFO      epoch 2/4
2019-02-03 08:08:23,094 INFO      epoch 3/4
2019-02-03 08:08:27,007 INFO      epoch 4/4
2019-02-03 08:08:30,989 INFO     steps_per_s=352.08, avg_weight_age=4.8, reward_per_sec=-0.0024, loss=0.2117, entropy=7.556, advantage=-0.033
2019-02-03 08:08:31,275 INFO     iteration 20/10000
2019-02-03 08:08:45,272 INFO      epoch 1/4
2019-02-03 08:08:49,297 INFO      epoch 2/4
Traceback (most recent call last):
  File "optimizer.py", line 746, in <module>
    run_local=args.run_local,
  File "optimizer.py", line 702, in main
    dota_optimizer.run()
  File "optimizer.py", line 441, in run
    loss, entropy_d, advantage = self.train(experiences=batch)
  File "optimizer.py", line 619, in train
    loss, policy_loss, entropy_loss, advantage_loss))
ValueError: loss=nan, policy_loss=nan, entropy_loss=0.00010979062062688172, advantage_loss=0.003007180755957961

add_image error

Kills Optimizer.py

Traceback (most recent call last):
  File "optimizer.py", line 737, in <module>
    run_local=args.run_local,
  File "optimizer.py", line 693, in main
    dota_optimizer.run()
  File "optimizer.py", line 507, in run
    self.writer.add_image('canvas', canvas, it, dataformats='HWC')
TypeError: add_image() got an unexpected keyword argument 'dataformats'

Transformers vs LTSM & CNN/RNN for Attention

https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

MoveToPosition doesn't exist.

hUnit:Action_MoveToPosition(vLoc) gives me:

[VScript] Script Runtime Error: ...a 2 beta/game/dota/scripts/vscripts/bots/bot_generic.lua:14: attempt to call method 'Action_MoveToPosition' (a nil value)
stack traceback:
	...a 2 beta/game/dota/scripts/vscripts/bots/bot_generic.lua:14: in function <...a 2 beta/game/dota/scripts/vscripts/bots/bot_generic.lua:2>

Calculating Reward on Game End

If you look at the stream of rewards below (entire Game #2) you will see that it ends in victory for Dire, however you only see 1 death each from both agents. Also, based on tower_hp it looks like the tower was not even close to dying, meaning the game ended b/c the Radiant agent died a 2nd time, but I don't have the -3.0 kill reward for Player 0 in the reward a second time.

This make me believe that we don't capture the rewards between the last reward sync and the game end.

2019-02-05 10:59:50,789 INFO     === Starting Game 2.
2019-02-05 10:59:50,789 INFO     Starting game.
2019-02-05 10:59:50,797 INFO     Player 0 using weights version 0
2019-02-05 10:59:50,802 INFO     Player 5 using weights version 0
2019-02-05 11:00:16,411 INFO     Player 0 rollout.
2019-02-05 11:00:16,412 INFO     Player 0 reward sum: -0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.114,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-02-05 11:00:16,429 INFO     Player 5 rollout.
2019-02-05 11:00:16,430 INFO     Player 5 reward sum: 0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.0,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.114}
2019-02-05 11:00:33,551 INFO     Received new model: version=0, size=1472372b
2019-02-05 11:00:40,146 INFO     Player 0 rollout.
2019-02-05 11:00:40,147 INFO     Player 0 reward sum: -0.15 subrewards:
{'death': -3.0,
 'denies': 0.0,
 'enemy': 3.0988716954415696,
 'hp': -1.2002411301619431,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.015,
 'win': 0.0,
 'xp': 0.9700000000000001}
2019-02-05 11:00:40,158 INFO     Player 5 rollout.
2019-02-05 11:00:40,159 INFO     Player 5 reward sum: 0.15 subrewards:
{'death': -3.0,
 'denies': 0.2,
 'enemy': 3.245241130161943,
 'hp': -1.2005383621082364,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.058333333333333334,
 'win': 0.0,
 'xp': 0.96}
2019-02-05 11:00:56,220 INFO     Player 0 rollout.
2019-02-05 11:00:56,221 INFO     Player 0 reward sum: -6.98 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.61683011154303,
 'hp': -1.3583716176202625,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': -5.0,
 'xp': 0.0}
2019-02-05 11:00:56,226 INFO     Player 5 rollout.
2019-02-05 11:00:56,227 INFO     Player 5 reward sum: 6.72 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': 1.3602503028054476,
 'hp': -0.3734285740740741,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.018333333333333333,
 'win': 5.0,
 'xp': 0.756}
2019-02-05 11:00:56,232 INFO     Game finished.

Honestly wanted to ask that for a while, but why did you select PPO as "the" algorithm you are implementing?
Did you consider anything else?

Bad Non_Hero

https://github.com/TimZaman/dotaclient/blob/master/agent.py#L337

Make connection to dotaclient more resilient

kubectl logs dotaservice-deployment-f7ffdb9d5-fbk5l agent -f
2019-01-06 02:12:25,289 INFO     setup_model_cb(host=rmq.default.svc.cluster.local, port=5672)
2019-01-06 02:12:25,344 INFO     Received new model: version=224, size=1207326b
2019-01-06 02:12:25,347 INFO     Updated weights to version 224
2019-01-06 02:12:25,348 INFO     === Starting Episode 0.
2019-01-06 02:12:25,348 INFO     Starting game.
2019-01-06 02:12:25,349 ERROR    error on dispatch
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "agent.py", line 515, in main
    await game.play()
  File "agent.py", line 430, in play
    response = await asyncio.wait_for(self.dota_service.reset(self.config), timeout=120)
  File "/usr/lib/python3.7/asyncio/tasks.py", line 416, in wait_for
    return fut.result()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 588, in __call__
    await stream.send_message(message, end=True)
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 163, in send_message
    await self.send_request()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 132, in send_request
    protocol = await self._channel.__connect__()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 477, in __connect__
    self._protocol = await self._create_connection()
  File "/root/.local/lib/python3.7/site-packages/grpclib/client.py", line 465, in _create_connection
    ssl=self._ssl)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 948, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.7/asyncio/base_events.py", line 935, in create_connection
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 475, in sock_connect
    return await fut
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 505, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 13337)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 333, in run
    yield from self.dispatch_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 280, in dispatch_frame
    frame = yield from self.get_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/protocol.py", line 264, in get_frame
    yield from frame.read_frame()
  File "/root/.local/lib/python3.7/site-packages/aioamqp/frame.py", line 453, in read_frame
    data = yield from self.reader.readexactly(7)
  File "/usr/lib/python3.7/asyncio/streams.py", line 679, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.7/asyncio/streams.py", line 473, in _wait_for_data
    await self._waiter
concurrent.futures._base.CancelledError
2019-01-06 02:12:55,431 INFO     Received new model: version=225, size=1207326b
2019-01-06 02:12:55,434 INFO     Updated weights to version 225
2019-01-06 02:13:40,232 INFO     Received new model: version=226, size=1207326b
2019-01-06 02:13:40,235 INFO     Updated weights to version 226
2019-01-06 02:14:24,366 INFO     Received new model: version=227, size=1207326b
2019-01-06 02:14:24,369 INFO     Updated weights to version 227
2019-01-06 02:15:05,102 INFO     Received new model: version=228, size=1207326b
2019-01-06 02:15:05,106 INFO     Updated weights to version 228
2019-01-06 02:15:45,595 INFO     Received new model: version=229, size=1207326b
2019-01-06 02:15:45,598 INFO     Updated weights to version 229
2019-01-06 02:16:23,744 INFO     Received new model: version=230, size=1207326b
2019-01-06 02:16:23,747 INFO     Updated weights to version 230
2019-01-06 02:16:59,806 INFO     Received new model: version=231, size=1207326b
2019-01-06 02:16:59,809 INFO     Updated weights to version 231
2019-01-06 02:17:40,834 INFO     Received new model: version=232, size=1207326b
2019-01-06 02:17:40,838 INFO     Updated weights to version 232
2019-01-06 02:18:19,436 INFO     Received new model: version=233, size=1207326b
2019-01-06 02:18:19,440 INFO     Updated weights to version 233
2019-01-06 02:19:18,664 INFO     Received new model: version=234, size=1207326b
2019-01-06 02:19:18,668 INFO     Updated weights to version 234
2019-01-06 02:20:00,478 INFO     Received new model: version=235, size=1207326b
2019-01-06 02:20:00,481 INFO     Updated weights to version 235
2019-01-06 02:20:40,009 INFO     Received new model: version=236, size=1207326b
2019-01-06 02:20:40,013 INFO     Updated weights to version 236
2019-01-06 02:21:20,128 INFO     Received new model: version=237, size=1207326b
2019-01-06 02:21:20,131 INFO     Updated weights to version 237
2019-01-06 02:22:01,556 INFO     Received new model: version=238, size=1207326b
2019-01-06 02:22:01,559 INFO     Updated weights to version 238
2019-01-06 02:22:49,645 INFO     Received new model: version=239, size=1207326b
2019-01-06 02:22:49,648 INFO     Updated weights to version 239
2019-01-06 02:23:26,087 INFO     Received new model: version=240, size=1207326b
2019-01-06 02:23:26,090 INFO     Updated weights to version 240
2019-01-06 02:24:05,643 INFO     Received new model: version=241, size=1207326b
2019-01-06 02:24:05,646 INFO     Updated weights to version 241
2019-01-06 02:24:54,556 INFO     Received new model: version=242, size=1207326b
2019-01-06 02:24:54,559 INFO     Updated weights to version 242
2019-01-06 02:25:29,720 INFO     Received new model: version=243, size=1207326b
2019-01-06 02:25:29,722 INFO     Updated weights to version 243
2019-01-06 02:26:10,312 INFO     Received new model: version=244, size=1207326b
2019-01-06 02:26:10,316 INFO     Updated weights to version 244
2019-01-06 02:26:47,232 INFO     Received new model: version=245, size=1207326b
2019-01-06 02:26:47,235 INFO     Updated weights to version 245