eugenevinitsky / sequential_social_dilemma_games
Repo for reproduction of sequential social dilemmas
License: MIT License
Agent actions are currently stepped through one by one, regardless of how they may interact with other actions.
For example, suppose an agent intends to move right into a cell occupied by another agent that also intends to move right: if the first agent moves before the second, its move is disallowed because the cell is still occupied, even though the occupant is about to vacate it.
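A sketch of one possible fix: collect all intended moves first and resolve them simultaneously, so that an agent may enter a cell whose occupant is moving out this step. All names here (`pos`, `intended_pos`, `resolve_moves`) are assumptions for illustration, not the repo's actual API.

```python
# Hypothetical simultaneous move resolution. A move succeeds if its target
# cell is either empty or being vacated this step, and no other agent wants
# the same cell. This single pass is a sketch; chains of conditional moves
# would need iteration to resolve fully.

def resolve_moves(agents):
    targets = [tuple(a.intended_pos) for a in agents]
    occupied = {tuple(a.pos) for a in agents}
    # Cells whose occupant intends to leave this step.
    vacated = {tuple(a.pos) for a in agents
               if tuple(a.intended_pos) != tuple(a.pos)}
    for agent, target in zip(agents, targets):
        contested = targets.count(target) > 1
        blocked = target in occupied and target not in vacated
        if not contested and not blocked:
            agent.pos = list(target)
```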
The order of events is:
(1) Move
(2) Fire
(3) Spawn
However, we currently check in the apple spawn method whether a firing beam occupies the apple spawn point, and if so, we don't spawn an apple there. This leads to two bugs:
(1) Apples can't spawn in cells temporarily obscured by a firing beam.
(2) We check whether an agent is currently in that spot before spawning an apple; however, an agent COULD be there and just be temporarily obscured by the beam.
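Both bugs stem from consulting a map that has beams drawn onto it. A sketch of a fix, assuming the env keeps an underlying world map (agents, apples, walls) separate from the rendered view that overlays beams; the names `world_map`, `spawn_points`, and the cell symbols are assumptions.

```python
import random

def spawn_apples(world_map, spawn_points, spawn_prob):
    """Spawn apples from the true map state, ignoring transient beams.

    A beam only obscures a cell in the rendered view; it should neither
    block apple spawning nor hide an agent standing in the spawn cell.
    Assumed symbols: ' ' empty, 'A' apple, 'P' agent.
    """
    for (row, col) in spawn_points:
        cell = world_map[row][col]
        # Check the true occupant: skip cells that really hold an agent or
        # an apple; beams never appear in world_map, so they cannot block.
        if cell == " " and random.random() < spawn_prob:
            world_map[row][col] = "A"
```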
gitignore:
pycache
eggs
Can probably use a standard gitignore.
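For example, a standard Python .gitignore covering these entries plus the usual build artifacts:

```gitignore
# byte-compiled files
__pycache__/
*.py[cod]

# packaging / eggs
*.egg-info/
.eggs/
dist/
build/

# misc
.ipynb_checkpoints/
.DS_Store
```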
env.render should use matplotlib to show the results. It currently appears to do nothing.
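A minimal sketch of what render could do with matplotlib, assuming the env can produce an RGB array of the current map (`rgb_map` here is an assumed input, and `render` an assumed signature):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts/CI; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

def render(rgb_map, filename=None):
    """Display (or save) the current map state as an image."""
    fig, ax = plt.subplots()
    ax.imshow(np.asarray(rgb_map))
    ax.axis("off")
    if filename is not None:
        fig.savefig(filename)
    else:
        plt.show()
    plt.close(fig)
```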
README.md should describe the six methods every subclass of MapEnv needs to implement.
In cleanup the spawn probabilities don't seem to work correctly; the waste spawns too fast.
Map update methods are currently split in a very ad-hoc way between HarvestEnv and MapEnv. As much of the update process as possible should be moved into MapEnv.
Descriptions of harvest and cleanup can be found at:
https://arxiv.org/pdf/1810.08647.pdf
Agents observing other agents need to know which way they are facing.
We need to shade the colors of the agents slightly (or some other scheme) to indicate orientation
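One way to do this (a sketch of an assumed scheme, not the project's actual choice) is to dim the agent's base color by a per-orientation factor, so observers can read facing direction from the shade:

```python
# Dimming factors chosen to be exactly representable in binary floating
# point, so shading is deterministic across platforms.
ORIENTATION_SHADE = {"UP": 1.0, "RIGHT": 0.875, "DOWN": 0.75, "LEFT": 0.5}

def shade_color(rgb, orientation):
    """Return the agent's base RGB color dimmed according to its orientation."""
    factor = ORIENTATION_SHADE[orientation]
    return tuple(int(c * factor) for c in rgb)
```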
The run script is up on the example_script branch in the run_script folder.
The remaining task is to figure out how to set the filters correctly for the ConvNet.
@natashamjaques I've implemented a model from the paper; it is in models/causal_to_fc_net.py. It would be good if you could check it over and see if it looks right to you. It'd be a tragedy if down the line we realized I misunderstood something.
Custom models must be registered to be re-used in RLlib via ModelCatalog.register_custom_model(model_name, model_class). However, when we replay things we no longer have this information. Consider storing this as a tune function in the env config so that it can be recreated later.
...before it's too late.
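The idea can be sketched without pulling in ray: keep a registry and record the registered name in the config, so replay code can recover the model class. `MODEL_REGISTRY`, `register_model`, and `recreate_model` are assumed helpers; in actual RLlib code the registration call would be `ModelCatalog.register_custom_model(name, model_class)`.

```python
# Stand-in for RLlib's model catalog, to show the config round-trip.
MODEL_REGISTRY = {}

def register_model(name, model_class, config):
    """Register the model and record its name in the config for later replay."""
    MODEL_REGISTRY[name] = model_class
    config["custom_model"] = name
    return config

def recreate_model(config):
    """Look the model class back up from a saved config at replay time."""
    return MODEL_REGISTRY[config["custom_model"]]
```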
@natashamjaques do we just overlay a new color that indicates agent + firing beam, does the firing beam temporarily obscure the agent, or does the agent stay with its color unchanged but a point gets subtracted? There's some choice here and I'm curious what y'all did.
Currently, a firing beam will cause the apple to disappear. Is this the correct behavior?
I'm currently picking harvest vs. cleanup by changing it directly in the code in rollout.py; this should be a command-line arg.
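A sketch of exposing that choice via argparse (flag name and defaults are assumptions):

```python
import argparse

def make_parser():
    """Build the rollout argument parser with an --env flag."""
    parser = argparse.ArgumentParser(description="Roll out a trained policy.")
    parser.add_argument("--env", choices=("harvest", "cleanup"),
                        default="harvest",
                        help="Which environment to roll out in.")
    return parser
```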
function SoftHuangpu:postUpdate(gameState)
  local wasteDensity = 0
  if self._config.potentialWasteArea ~= 0 then
    wasteDensity = 1 - self._state.permittedLemonCells:size() /
        self._config.potentialWasteArea
  end
  if wasteDensity >= self._config.thresholdDepletion then
    self._state.lemonSpawnProbability = 0
    self._state.appleRespawnProbability = 0
  else
    self._state.lemonSpawnProbability = self._config.lemonSpawnProbability
    if wasteDensity <= self._config.thresholdRestoration then
      self._state.appleRespawnProbability = self._config.appleRespawnProbability
    else
      -- Interpolate.
      self._state.appleRespawnProbability = (1 - (
          wasteDensity - self._config.thresholdRestoration) / (
          self._config.thresholdDepletion - self._config.thresholdRestoration)) *
          self._config.appleRespawnProbability
    end
  end
  super.postUpdate(self, gameState)
end
The hyperparameters are:
thresholdDepletion = 0.4
thresholdRestoration = 0.0
lemonSpawnProbability = 0.5
appleRespawnProbability = 0.05
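A Python transcription of that spawn-probability schedule with the listed hyperparameters, to make the interpolation easy to check (`spawn_probabilities` is a name chosen here, not the repo's):

```python
THRESHOLD_DEPLETION = 0.4
THRESHOLD_RESTORATION = 0.0
LEMON_SPAWN_PROBABILITY = 0.5
APPLE_RESPAWN_PROBABILITY = 0.05

def spawn_probabilities(waste_density):
    """Return (lemon_spawn_prob, apple_respawn_prob) for a waste density."""
    if waste_density >= THRESHOLD_DEPLETION:
        return 0.0, 0.0
    if waste_density <= THRESHOLD_RESTORATION:
        return LEMON_SPAWN_PROBABILITY, APPLE_RESPAWN_PROBABILITY
    # Linearly interpolate the apple probability between the two thresholds.
    frac = (waste_density - THRESHOLD_RESTORATION) / (
        THRESHOLD_DEPLETION - THRESHOLD_RESTORATION)
    return LEMON_SPAWN_PROBABILITY, (1 - frac) * APPLE_RESPAWN_PROBABILITY
```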
We need to convert the internal map representation that we manipulate into an appropriate color state for the agent.
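A sketch of that conversion, mapping map characters to RGB values; the specific symbols and colors here are assumptions for illustration:

```python
import numpy as np

# Assumed symbols: ' ' empty, 'A' apple, 'P' agent, '@' wall, 'F' firing beam.
CHAR_TO_COLOR = {
    " ": (0, 0, 0),        # empty: black
    "A": (0, 255, 0),      # apple: green
    "P": (0, 0, 255),      # agent: blue
    "@": (128, 128, 128),  # wall: gray
    "F": (255, 255, 0),    # beam: yellow
}

def map_to_colors(char_map):
    """Return an (H, W, 3) uint8 RGB array for the given character map."""
    h, w = len(char_map), len(char_map[0])
    rgb = np.zeros((h, w, 3), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            rgb[i, j] = CHAR_TO_COLOR[char_map[i][j]]
    return rgb
```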
Construct the following model in the run scripts (https://arxiv.org/pdf/1810.08647.pdf section 6.3):
a single convolutional layer with a kernel of size 3, stride of size 1, and 6 output
channels. This is connected to two fully connected layers of size 32 each, and an LSTM with 128 cells.
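A back-of-the-envelope shape and parameter check for that architecture can help when setting the ConvNet filters. The 15x15x3 input here is an assumption for illustration only:

```python
def conv_out_hw(h, w, kernel=3, stride=1, padding=0):
    """Output height/width of a conv layer (no dilation)."""
    return ((h + 2 * padding - kernel) // stride + 1,
            (w + 2 * padding - kernel) // stride + 1)

def param_counts(h=15, w=15, in_ch=3):
    """Parameter counts for conv(3x3, stride 1, 6 ch) -> FC 32 -> FC 32 -> LSTM 128."""
    conv = in_ch * 6 * 3 * 3 + 6                 # weights + biases
    ch, cw = conv_out_hw(h, w)
    flat = ch * cw * 6                           # flattened conv output
    fc1 = flat * 32 + 32
    fc2 = 32 * 32 + 32
    lstm = 4 * (128 * (32 + 128) + 128)          # 4 gates: input + recurrent + bias
    return {"conv": conv, "fc1": fc1, "fc2": fc2, "lstm": lstm}
```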
Currently the orientation of the axes is flipped, i.e. the diagram of moving in the ascii array looks like this:

          ^
          |
         DOWN
  RIGHT --*-- LEFT
          UP
          |
          v

This is convenient for coding but will need to be flipped when actually displaying.
You can run each of the run_scripts for 50 steps just to check that they are not broken by any changes.