
safe-grid-gym's Introduction

safe-grid-gym

An OpenAI Gym interface for the AI safety gridworlds by DeepMind, which are implemented in pycolab.

This repository combines and extends two previous implementations which can be found at:

Features

safe_grid_gym additionally provides:

  • Additional toy environments
  • Additional features for the Gym environment:
    • Additional render modes ansi and rgb_array allowing for more automated experimentation
    • A TransitionBoatRace environment, which provides the last two boards as state information
  • Easier dependency management by providing a setup.py
  • Unit tests for the Gym environment, using the demonstrations provided in the ai-safety-gridworlds repository

To handle the dependency on the ai-safety-gridworlds we use a fork of the official repository that provides a setup.py.

You can use the code from the official ai-safety-gridworlds repository instead by adding it to your PYTHONPATH.

Usage

By using safe_grid_gym, the AI safety gridworlds can be used like any other Gym environment. For example, to take 10 random actions in the boat race environment and render the gridworld, you can do:

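import safe_grid_gym  # importing registers the safe-grid-gym environments with Gym
import gym

env = gym.make("BoatRace-v0")
action_space = env.action_space

state = env.reset()  # reset once before the first step
for _ in range(10):
    action = action_space.sample()
    state, reward, done, info = env.step(action)
    env.render(mode="human")
    if done:  # start a new episode once the current one has ended
        state = env.reset()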

safe-grid-gym's People

Contributors

alok, david-lindner, dhruvramani, jvmncs, timorl


safe-grid-gym's Issues

Wrong Gym version in Travis cache

The current version of Gym in our Travis cache is 0.11.0; however, we're using 0.10.9. This causes the build to break, as in #23.

We should either modify the code to work with 0.11.0 or delete the current cache. In the future, we should also have Travis check for changes to setup.py and overwrite the previous cache before restoring the pip caches.

There's something wrong with the Toy Gridworlds...

Something about the toy gridworlds is broken. A tabular Q-learner appears to learn the worst-case policy instead of the best-case one, despite thoroughly exploring the environment.

For reference, here's the command I'm using:
python main.py -L test-log/<env_name> -E 1000 -V 100 -EE 50 <env_name> tabular-q -l .2 -dl 6000

When <env_name> is set to sokoban:
[Screenshot, 2019-01-11 6:12:35 PM]

When <env_name> is set to way:
[Screenshot, 2019-01-11 6:12:50 PM]

When <env_name> is set to corners:
[Screenshot, 2019-01-11 6:13:01 PM]

Gym wrapper for cheating

Perhaps, rather than making new environments for cheating, we could have a wrapper. I think this would be cleaner both here and in usage.

Add transitions to state information

For experiments using safe-grid-agents we want to consider the boat race environment using transitions instead of states.

To implement this, we can add a parameter to the GridworldEnv class that, when activated, makes each observation a concatenation of the previous state and the current state, instead of just the current state.
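A minimal sketch of the proposed option, written as a standalone wrapper rather than as a GridworldEnv parameter. The class names TransitionObservation and CounterEnv are hypothetical, and repeating the first board on reset (when there is no previous state yet) is an assumption. It only relies on an environment with old-style Gym reset()/step() methods that return NumPy boards; the two boards are stacked along a new leading axis, which is one way to realize the concatenation.

```python
import numpy as np


class TransitionObservation:
    """Hypothetical wrapper: each observation holds the previous and the
    current board, stacked along a new leading axis."""

    def __init__(self, env):
        self.env = env
        self._last = None

    def reset(self):
        state = np.asarray(self.env.reset())
        # Assumption: with no previous board yet, the first board is repeated.
        self._last = state
        return np.stack([state, state])

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        state = np.asarray(state)
        obs = np.stack([self._last, state])  # (previous, current)
        self._last = state
        return obs, reward, done, info


class CounterEnv:
    """Hypothetical toy stand-in whose 1x2 board holds the step count."""

    def reset(self):
        self.t = 0
        return np.array([[0, 0]])

    def step(self, action):
        self.t += 1
        return np.array([[self.t, self.t]]), 0.0, self.t >= 3, {}


env = TransitionObservation(CounterEnv())
first = env.reset()
obs, reward, done, info = env.step(0)
print(first.shape, obs[0].tolist(), obs[1].tolist())  # (2, 1, 2) [[0, 0]] [[1, 1]]
```

In an actual Gym setup this would more naturally subclass gym.Wrapper and also update observation_space to the doubled shape, so that agents see a consistent space definition.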

Runtime errors in new versions of gym(nasium)

Could you apply the following patch? It fixes runtime errors caused by:

* gym being renamed to gymnasium
* the changed gymnasium API, which requires calling reset() before step()
* an incorrect metadata value

fix-runtime-errors.txt

I attached it as a .txt file because a GitHub bug does not allow me to name it <something>.patch.
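For code that has to run against both the old Gym API and Gymnasium, a small compatibility layer can paper over two of the API differences involved: Gymnasium's reset() returns (observation, info), and its step() returns five values (observation, reward, terminated, truncated, info) instead of four. The sketch below is not the contents of the attached patch; the helper names reset_compat and step_compat and the toy environments are hypothetical.

```python
def reset_compat(env):
    """Return just the initial observation under either API."""
    result = env.reset()
    # Gymnasium returns (obs, info); old Gym returns obs alone.
    # Assumption: observations are never themselves 2-tuples (the gridworld
    # boards are arrays), otherwise this check would be ambiguous.
    if isinstance(result, tuple) and len(result) == 2:
        return result[0]
    return result


def step_compat(env, action):
    """Return an old-style (obs, reward, done, info) tuple under either API."""
    result = env.step(action)
    if len(result) == 5:
        # Gymnasium: the episode ends when it is terminated or truncated.
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result
    return obs, reward, done, info


class OldStyleEnv:
    """Hypothetical toy environment following the old Gym API."""

    def reset(self):
        return "s0"

    def step(self, action):
        return "s1", 1.0, True, {}


class NewStyleEnv:
    """Hypothetical toy environment following the Gymnasium API."""

    def reset(self, seed=None):
        return "s0", {}

    def step(self, action):
        return "s1", 1.0, False, True, {}


for env in (OldStyleEnv(), NewStyleEnv()):
    obs = reset_compat(env)  # always reset before the first step
    print(obs, step_compat(env, 0))
```

Collapsing terminated and truncated into a single done flag discards a distinction that matters for agents that bootstrap value estimates, so new code targeting only Gymnasium is better off keeping both flags.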

Toy Gridworlds Rendering Feedback

There seems to be undesirable behavior in the way the toy gridworlds render in TensorBoard. Since the AI Safety Gridworlds rendering was functioning normally before, I'm assuming this has something to do with the code in this repository. See the figure below for an example rendering.

Specifically, I feel like the following is unwanted behavior:
(1) multiple agents in each frame (maybe this is a general safe-grid-gym issue, or something to do with a parameter in safe-grid-agents?)
(2) overlapping rendered characters (e.g. the A and S values at the bottom), which also seems related to (1)
(3) only partial trajectories are rendered (ideally we'd see the agent go from the initial state to the end state for each trajectory)
(4) it is unclear what A and S represent; they seem ambiguous, so as a user I don't know how to interpret those numbers

[Figure: example rendering ("render corner")]

Merge with toy gridworlds

To avoid code duplication, perhaps we could merge this repo with the toy gridworlds one (whichever way is more convenient). This would be especially nice for having a consistent interface for corrupted gridworlds and would also make solving #6 easier.

dtype of board is forced to float32

When converting a position into an observation, the dtype of the board is forced to float32. This breaks the assertion that the resulting observation is within the observation space for the toy envs.
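A minimal NumPy sketch of the failure mode and one possible fix: cast the board to the observation space's dtype instead of hard-coding float32. The helpers board_to_observation and in_space are hypothetical, and the assumption that the toy envs use uint8 boards with values in [0, 4] is purely illustrative. in_space stands in for a containment check that also requires the observation's dtype to be safely castable to the space's dtype.

```python
import numpy as np


def board_to_observation(board, dtype=None):
    """Hypothetical helper: turn a board into an observation.

    Rather than always forcing float32, cast to the observation space's
    dtype when one is given, and otherwise keep the board's own dtype.
    """
    board = np.asarray(board)
    return board if dtype is None else board.astype(dtype, copy=False)


def in_space(obs, low, high, dtype):
    """Assumed containment check: values must lie in [low, high] and the
    observation's dtype must be safely castable to the space's dtype."""
    return bool(
        np.can_cast(obs.dtype, dtype)
        and np.all(obs >= low)
        and np.all(obs <= high)
    )


# Assumption: the toy env's observation space holds uint8 boards in [0, 4].
board = np.array([[0, 1], [2, 3]], dtype=np.uint8)

forced = board.astype(np.float32)              # current behavior
fixed = board_to_observation(board, np.uint8)  # proposed behavior

print(in_space(forced, 0, 4, np.uint8))  # False: float32 -> uint8 is not a safe cast
print(in_space(fixed, 0, 4, np.uint8))   # True
```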

AI Safety Gridworlds version 1.3

DeepMind has pushed Version 1.3 to the upstream repo. From a brief glance at the commit history there don't seem to be any breaking changes, but it might be worth a quick test to see if we can pull the changes in.
