david-lindner / safe-grid-gym

A gym interface for AI safety gridworlds created in pycolab.

License: Apache License 2.0

Python 100.00%

safe-grid-gym's Introduction

safe-grid-gym

An OpenAI Gym interface for the AI safety gridworlds by DeepMind, which are implemented in pycolab.

This repository combines and extends two previous implementations which can be found at:

Features

safe_grid_gym additionally provides:

Additional toy environments
Additional features for the Gym environment:
- Additional render modes ansi and rgb_array allowing for more automated experimentation
- A TransitionBoatRace environment that provides the last two boards as state information
Easier dependency management by providing a setup.py
Unit tests for the Gym environment using the demonstrations provided in the ai-safety-gridworlds repository

To handle the dependency on ai-safety-gridworlds, we use a fork of the official repository that provides a setup.py.

You can use the code from the official ai-safety-gridworlds repository instead by adding it to your PYTHONPATH.

Usage

With safe_grid_gym, the AI safety gridworlds can be used like any other gym environment. For example, to take 10 random actions in the boat race environment and render the gridworld after each one, you can do:

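import safe_grid_gym  # needed so that gym.make can find the gridworld environments
import gym

env = gym.make("BoatRace-v0")
action_space = env.action_space

env.reset()  # start an episode before taking the first step
for i in range(10):
    action = action_space.sample()  # pick a random action
    state, reward, done, info = env.step(action)
    env.render(mode="human")
    if done:
        env.reset()  # begin a new episode once the current one ends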

safe-grid-gym's Issues

Wrong Gym version in Travis cache

The version of Gym in our Travis cache is 0.11.0, but we're using 0.10.9. This mismatch breaks the build, as in #23.

We should either modify the code to work with 0.11.0 or delete the current cache. In the future, the build should also check for changes to setup.py and overwrite the previous cache before restoring pip caches.
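One way to detect changes to setup.py, sketched here only as an illustration, is to store a digest of the file next to the cache and compare it on each build. The marker file and the function names below are hypothetical and are not part of this repository or of Travis.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def cache_is_stale(setup_py, stamp_file):
    """Return True if `setup_py` changed since `stamp_file` was written.

    `stamp_file` is a hypothetical marker kept alongside the pip cache.
    A missing marker is treated as stale.
    """
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != file_digest(setup_py)


def refresh_stamp(setup_py, stamp_file):
    """Record the current digest once the cache has been rebuilt."""
    Path(stamp_file).write_text(file_digest(setup_py))
```

A build script could call cache_is_stale before restoring the cache, discard the cache when it returns True, and call refresh_stamp after reinstalling the dependencies.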

There's something wrong with the Toy Gridworlds...

Something about the toy gridworlds is broken. It looks like a tabular Q-learner learns the worst-case policy instead of the best-case one, despite thoroughly exploring the environment.

For reference, here's the command I'm using:
python main.py -L test-log/<env_name> -E 1000 -V 100 -EE 50 <env_name> tabular-q -l .2 -dl 6000

When <env_name> is set to sokoban:

When <env_name> is set to way:

When <env_name> is set to corners:

Gym wrapper for cheating

Perhaps, rather than making new environments for cheating, we could have a wrapper. I think this would be cleaner both here and in usage.

Add transitions to state information

For experiments using safe-grid-agents we want to consider the boat race environment using transitions instead of states.

To implement this, we can add a parameter to the GridworldEnv class that, when enabled, makes each observation a concatenation of the agent's previous state and its current state, instead of the current state alone.
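The idea can be sketched independently of GridworldEnv as a small wrapper around any object with reset() and step() methods. This is a minimal illustration under stated assumptions, not the implementation used in this repository: the class names are hypothetical, the 4-tuple step result follows the classic gym API, and the two states are returned as a pair rather than concatenated so that the sketch needs no array library.

```python
class TransitionObservations:
    """Wrap an environment so each observation is a (previous, current) pair."""

    def __init__(self, env):
        self.env = env
        self._last = None

    def reset(self):
        state = self.env.reset()
        # No previous state exists yet, so the first pair repeats the start state.
        self._last = state
        return (state, state)

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        observation = (self._last, state)
        self._last = state
        return observation, reward, done, info


class CountingEnv:
    """Hypothetical stand-in environment whose state is a step counter."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3, {}


env = TransitionObservations(CountingEnv())
first = env.reset()
second, reward, done, info = env.step(None)
print(first, second)
```

With board arrays as states, the pair could be joined along a new or existing axis before being returned, which is a design choice left open here.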

Remove cheat argument from safety gridworlds wrapper

The cheat argument is no longer necessary because cheating is handled in the safe-grid-agents repo, as discussed in #6.

Runtime errors in new versions of gym(nasium)

Can you apply the following patch? It fixes runtime errors caused by:

* gym being renamed to gymnasium
* changed gymnasium API requiring calling reset() before step()
* incorrect value of metadata

fix-runtime-errors.txt

I attached it as a .txt file because a GitHub bug does not allow me to name it <something>.patch.
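For readers hitting the same errors, the API difference can be bridged in calling code with small helpers like the ones below. This is a sketch and not the content of the attached patch; the helper names are hypothetical. It relies on the fact that gymnasium's step() returns five values (observation, reward, terminated, truncated, info) and its reset() returns (observation, info), whereas older gym versions return four values and a bare observation.

```python
def normalize_step(result):
    """Convert a step() result to the classic (obs, reward, done, info) form."""
    if len(result) == 5:
        # Gymnasium-style result: merge the two end-of-episode flags.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info


def normalize_reset(result):
    """Return only the observation from a reset() result.

    Heuristic: a 2-tuple whose second item is a dict is treated as the
    gymnasium-style (obs, info) pair; anything else is returned unchanged.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

Calling normalize_reset(env.reset()) once before the first normalize_step(env.step(action)) also satisfies the requirement that reset() precede step().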

Toy Gridworlds Rendering Feedback

There seems to be undesirable behavior in the way the toy gridworlds render in Tensorboard. Since the AI Safety Gridworlds renderings were functioning normally before, I'm assuming this has something to do with the code here. See the figure below for an example rendering.

Specifically, I feel like the following is unwanted behavior:
(1) multiple agents in each frame (maybe this is a general safe-grid-gym thing, or something to do with a parameter in safe-grid-agents?)
(2) overlapping rendered characters (e.g. the A, S values at the bottom), which also seems related to (1)
(3) only rendering partial trajectories (ideally we'd see the agent go from the initial state to the end state for each trajectory)
(4) what do A and S represent? They seem ambiguous, so as a user I don't know how to interpret those numbers