
yarlp

Yet Another Reinforcement Learning Package

Implementations of CEM, REINFORCE, TRPO, DDQN, and A2C with reproducible benchmarks. Experiments are templated using jsonschema and compared against published results. This is meant to be a starting point for working implementations of classic RL algorithms; unfortunately, even implementations from OpenAI Baselines are not always reproducible.

A working Dockerfile with yarlp installed can be run with:

  • docker build -t "yarlpd" .
  • docker run -it yarlpd bash

To see the available benchmark commands, run:

python yarlp/experiment/experiment.py --help

If you want to run things manually, look in the examples directory or start from this snippet:

from yarlp.agent.trpo_agent import TRPOAgent
from yarlp.utils.env_utils import NormalizedGymEnv

# Wrap the gym environment in yarlp's normalizing wrapper
env = NormalizedGymEnv('MountainCarContinuous-v0')

# Train a TRPO agent with a fixed seed for reproducibility
agent = TRPOAgent(env, seed=123)
agent.train(max_timesteps=1000000)

Benchmarks

We benchmark against published results and OpenAI Baselines where available, using yarlp/experiment/experiment.py. Benchmark scripts for OpenAI Baselines were written ad hoc, such as this one.

Atari10M

[Score plots: BeamRider, Breakout, Pong, QBert, Seaquest, SpaceInvaders]

DDQN with dueling networks and prioritized replay

python yarlp/experiment/experiment.py run_atari10m_ddqn_benchmark

I trained 6 Atari environments for 10M time-steps (40M frames) using 1 random seed, since I only have 1 GPU and limited time on this Earth. I used DDQN with dueling networks, but no prioritized replay (although it's implemented). I compare the final mean 100-episode raw scores for yarlp (with exploration of 0.01) against results from van Hasselt et al., 2015 and Wang et al., 2016, which train for 200M frames and evaluate on 100 episodes (with exploration of 0.05).
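For the curious, here is a minimal numpy sketch of the two tricks involved, the double-DQN target and the dueling aggregation. This illustrates the published techniques, not yarlp's actual implementation, and all names here are made up for the example:

import numpy as np

# Double-DQN target (van Hasselt et al., 2015): the online network selects the
# next action, the target network evaluates it. q_* have shape (batch, n_actions).
def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    best_actions = np.argmax(q_online_next, axis=1)
    q_eval = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval

# Dueling aggregation (Wang et al., 2016): combine a state-value stream and an
# advantage stream; centering the advantages resolves the constant ambiguity
# between the two streams.
def dueling_q_values(value, advantage):
    # value: (batch, 1), advantage: (batch, n_actions)
    return value + advantage - advantage.mean(axis=1, keepdims=True)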

I don't compare to OpenAI Baselines because the OpenAI DDQN implementation was not able to reproduce published results as of 2018-01-20. See this GitHub issue, although I found these benchmark plots to be pretty helpful.

| env | yarlp DUEL, 40M frames | van Hasselt et al. DDQN, 200M frames | Wang et al. DUEL, 200M frames |
| --- | --- | --- | --- |
| BeamRider | 8705 | 7654 | 12164 |
| Breakout | 423.5 | 375 | 345 |
| Pong | 20.73 | 21 | 21 |
| QBert | 5410.75 | 14875 | 19220.3 |
| Seaquest | 5300.5 | 7995 | 50245.2 |
| SpaceInvaders | 1978.2 | 3154.6 | 6427.3 |
[Learning curves: BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, PongNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

A2C

python yarlp/experiment/experiment.py run_atari10m_a2c_benchmark

A2C on 10M time-steps (40M frames) with 1 random seed. Results are compared to the A3C learning curves in Mnih et al., 2016, read off Figure 3 at 10M time-steps. You are invited to run multiple seeds and the full 200M frames for a better comparison.
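For reference, here is a minimal numpy sketch of the A2C objective (advantage-weighted policy gradient plus a value loss and an entropy bonus); the names and coefficients are illustrative assumptions, not yarlp's actual code:

import numpy as np

def a2c_loss(log_probs, values, returns, entropy, vf_coef=0.5, ent_coef=0.01):
    # log_probs: (batch,) log pi(a_t|s_t) of the actions taken
    # values:    (batch,) critic estimates V(s_t)
    # returns:   (batch,) n-step discounted returns
    # entropy:   (batch,) per-state policy entropy
    advantages = returns - values
    # Policy term: increase log-probs of actions with positive advantage.
    pg_loss = -np.mean(log_probs * advantages)
    # Critic term: regress values toward the empirical returns.
    vf_loss = np.mean((returns - values) ** 2)
    # Entropy bonus encourages exploration.
    return pg_loss + vf_coef * vf_loss - ent_coef * np.mean(entropy)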

| env | yarlp A2C, 40M frames | Mnih et al. A3C, 40M frames (16 threads) |
| --- | --- | --- |
| BeamRider | 3150 | ~3000 |
| Breakout | 418 | ~150 |
| Pong | 20 | ~20 |
| QBert | 3644 | ~1000 |
| SpaceInvaders | 805 | ~600 |
[Learning curves: BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, PongNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

Here are some more plots from OpenAI to compare against.

Mujoco1M

TRPO

python yarlp/experiment/experiment.py run_mujoco1m_benchmark

We average over 5 random seeds instead of 3 for both Baselines and yarlp; more seeds probably wouldn't hurt here. We report 95% confidence intervals.
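As a sketch of how such intervals can be computed from per-seed scores (illustrative only, not the package's plotting code):

import numpy as np
from scipy import stats

def mean_ci95(scores_per_seed):
    # 95% t-confidence interval for the mean over random seeds.
    scores = np.asarray(scores_per_seed, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean
    lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
    return mean, lo, hi

# e.g. final mean episode rewards from 5 seeds of one environment
print(mean_ci95([1012.0, 987.5, 1043.1, 995.2, 1020.8]))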

[Learning curves: Hopper-v1, HalfCheetah-v1, Reacher-v1, Swimmer-v1, InvertedDoublePendulum-v1, Walker2d-v1, InvertedPendulum-v1]

CLI scripts

CLI convenience scripts will be installed with the package:

  • Run a benchmark:
    • python yarlp/experiment/experiment.py --help
  • Plot yarlp results against OpenAI Baselines benchmarks:
    • compare_benchmark <yarlp-experiment-dir> <baseline-experiment-dir>
  • Experiments:
    • Experiments are defined in JSON and validated with jsonschema; see the experiment_configs directory for sample experiment configs. Specifying multiple values for a parameter runs a grid search over them in parallel (a minimal validation sketch follows this list).
    • Example: run_yarlp_experiment --spec-file experiment_configs/trpo_experiment_mult_params.json
  • Experiment plots:
    • make_plots <experiment-dir>
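As a rough illustration of the jsonschema validation and grid search described above, here is a self-contained sketch; the schema and the spec fields (agent, env, params) are hypothetical stand-ins, not yarlp's actual spec format:

from itertools import product
import jsonschema

# Hypothetical, simplified experiment schema; yarlp's real schema is more detailed.
SCHEMA = {
    "type": "object",
    "required": ["agent", "env"],
    "properties": {
        "agent": {"type": "string"},
        "env": {"type": "string"},
        # A list of values for a parameter implies a grid search over it.
        "params": {
            "type": "object",
            "additionalProperties": {"type": "array", "items": {"type": "number"}},
        },
    },
}

spec = {
    "agent": "TRPOAgent",
    "env": "Hopper-v1",
    "params": {"max_kl": [0.01, 0.05], "gamma": [0.99]},
}

# Raises jsonschema.ValidationError if the spec doesn't match the schema.
jsonschema.validate(spec, SCHEMA)

# Grid search = Cartesian product over every listed parameter value.
keys = sorted(spec["params"])
for combo in product(*(spec["params"][k] for k in keys)):
    print(dict(zip(keys, combo)))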
