araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.

Home Page: https://stable-baselines.readthedocs.io/

License: MIT License

Topics: rl, zoo, reinforcement-learning, stable-baselines, openai-gym, openai, gym, pybullet, hyperparameters, optimization

rl-baselines-zoo's Introduction

WARNING: This repository is no longer maintained; please use the RL-Baselines3 Zoo, powered by Stable-Baselines3, for an up-to-date version.


RL Baselines Zoo: a Collection of Pre-Trained Reinforcement Learning Agents

A collection of trained Reinforcement Learning (RL) agents, with tuned hyperparameters, using Stable Baselines.

We are looking for contributors to complete the collection!

Goals of this repository:

  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

Enjoy a Trained Agent

If the trained agent exists, then you can see it in action using:

python enjoy.py --algo algo_name --env env_id

For example, enjoy A2C on Breakout for 5000 timesteps:

python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000
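Under the hood, enjoy.py roughly amounts to loading the saved model and stepping the environment with it. A minimal sketch of that loop (not the actual enjoy.py code; the PPO2/CartPole paths below are assumed for illustration):

import gym
from stable_baselines import PPO2  # any algorithm class from the zoo works the same way

env = gym.make("CartPole-v1")
# Path assumed for illustration; enjoy.py builds it from --algo/--env/--folder.
model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl")

obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
env.close()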

If you have trained an agent yourself, load it from the log folder instead:

# exp-id 0 corresponds to the last experiment, otherwise, you can specify another ID
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 0

To load the best model (when using an evaluation environment):

python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-best

Train an Agent

The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

For example (with tensorboard support):

python train.py --algo ppo2 --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Evaluate the agent every 10000 steps, using 10 episodes for each evaluation:

python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10

Save a checkpoint of the agent every 100000 steps:

python train.py --algo td3 --env HalfCheetahBulletEnv-v0 --save-freq 100000

Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i trained_agents/a2c/BreakoutNoFrameskip-v4.pkl -n 5000
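In terms of the Stable Baselines API, continuing training boils down to loading the saved model with an environment attached and calling learn() again. A rough sketch (env setup and output path assumed for illustration, not the exact train.py logic):

from stable_baselines import A2C
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.vec_env import VecFrameStack

# Recreate the Atari env the agent was trained on (vectorized, frame-stacked)
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", num_env=1, seed=0), n_stack=4)
model = A2C.load("trained_agents/a2c/BreakoutNoFrameskip-v4.pkl", env=env)
model.learn(total_timesteps=5000)
model.save("logs/a2c/BreakoutNoFrameskip-v4_continued")  # output path assumed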

Note: when training TRPO, you have to use mpirun to enable multiprocessing:

mpirun -n 16 python train.py --algo trpo --env BreakoutNoFrameskip-v4

Hyperparameter Tuning

We use Optuna for optimizing the hyperparameters.

Note: hyperparameter search is not implemented for ACER and DQN for now. When using the SuccessiveHalvingPruner ("halving"), you must specify --n-jobs > 1.

Budget of 1000 trials with a maximum of 50000 steps:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler tpe --pruner median
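The tuning logic lives in utils/hyperparams_opt.py; conceptually it is a standard Optuna study with the chosen sampler and pruner. A minimal sketch with a hypothetical objective (the real objective samples algorithm hyperparameters, trains the agent, and reports intermediate rewards so the pruner can stop bad trials early):

import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler

def objective(trial):
    # Hypothetical objective: sample two hyperparameters and return a score.
    learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 1e-2)
    gamma = trial.suggest_uniform("gamma", 0.9, 0.9999)
    # Stand-in for the mean evaluation reward of a trained agent.
    return -((learning_rate - 1e-3) ** 2) - (gamma - 0.99) ** 2

study = optuna.create_study(sampler=TPESampler(), pruner=MedianPruner(), direction="maximize")
study.optimize(objective, n_trials=100, n_jobs=2)
print(study.best_params)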

Env Wrappers

You can specify in the hyperparameter config one or more wrappers to use around the environment:

For a single wrapper:

env_wrapper: gym_minigrid.wrappers.FlatObsWrapper

For multiple wrappers, specify a list:

env_wrapper:
    - utils.wrappers.DoneOnSuccessWrapper:
        reward_offset: 1.0
    - utils.wrappers.TimeFeatureWrapper

Note that you can easily pass parameters to the wrappers too (as in the example above).
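At env-creation time, the list above simply translates into nested wrapper calls, applied in order. A sketch of the equivalent Python (the env id and exact wrapper signatures are assumed for illustration):

import gym
from utils.wrappers import DoneOnSuccessWrapper, TimeFeatureWrapper  # zoo-local wrappers

env = gym.make("FetchReach-v1")                      # example env, assumed
env = DoneOnSuccessWrapper(env, reward_offset=1.0)   # first entry of the list
env = TimeFeatureWrapper(env)                        # second entry wraps the previous one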

Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using --env-kwargs:

python enjoy.py --algo ppo2 --env MountainCar-v0 --env-kwargs goal_velocity:10
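This is roughly equivalent to constructing the environment yourself with keyword arguments:

import gym

env = gym.make("MountainCar-v0", goal_velocity=10)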

Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using --hyperparams:

python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"
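These overrides end up as keyword arguments of the algorithm constructor, merged with the values from the YAML file. A rough equivalent in plain Stable Baselines (a sketch, not train.py itself):

import gym
from stable_baselines import A2C

env = gym.make("MountainCarContinuous-v0")
model = A2C("MlpPolicy", env, learning_rate=0.001, policy_kwargs=dict(net_arch=[64, 64]))
model.learn(total_timesteps=10000)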

Record a Video of a Trained Agent

Record 1000 steps:

python -m utils.record_video --algo ppo2 --env BipedalWalkerHardcore-v2 -n 1000

Current Collection: 120+ Trained Agents!

Scores can be found in benchmark.md. To compute them, simply run python -m utils.benchmark.

Atari Games

7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).

RL Algo BeamRider Breakout Enduro Pong Qbert Seaquest SpaceInvaders
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACKTR ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TRPO

Additional Atari Games (to be completed):

RL Algo MsPacman
A2C ✔️
ACER ✔️
ACKTR ✔️
PPO2 ✔️
DQN ✔️

Classic Control Environments

RL Algo CartPole-v1 MountainCar-v0 Acrobot-v1 Pendulum-v0 MountainCarContinuous-v0
A2C ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ N/A N/A
ACKTR ✔️ ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ N/A N/A
DDPG N/A N/A N/A ✔️ ✔️
SAC N/A N/A N/A ✔️ ✔️
TD3 N/A N/A N/A ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️

Box2D Environments

RL Algo BipedalWalker-v2 LunarLander-v2 LunarLanderContinuous-v2 BipedalWalkerHardcore-v2 CarRacing-v0
A2C ✔️ ✔️ ✔️ ✔️
ACER N/A ✔️ N/A N/A N/A
ACKTR ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️
DQN N/A ✔️ N/A N/A N/A
DDPG ✔️ N/A ✔️
SAC ✔️ N/A ✔️ ✔️
TD3 ✔️ N/A ✔️
TRPO ✔️ ✔️ ✔️

PyBullet Environments

See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs. These environments are similar to the MuJoCo envs but use the free PyBullet simulator. We are using the BulletEnv-v0 versions.

Note: those environments are derived from Roboschool and are much harder than the MuJoCo versions (see the PyBullet issue).

RL Algo Walker2D HalfCheetah Ant Reacher Hopper Humanoid
A2C ✔️ ✔️ ✔️ ✔️
ACKTR ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ✔️ ✔️
SAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️

PyBullet Envs (Continued)

RL Algo Minitaur MinitaurDuck InvertedDoublePendulum InvertedPendulumSwingup
A2C
ACKTR
PPO2 ✔️ ✔️ ✔️ ✔️
DDPG
SAC ✔️ ✔️
TD3 ✔️ ✔️
TRPO

MiniGrid Envs

See https://github.com/maximecb/gym-minigrid: a simple, lightweight and fast Gym implementation of the classic gridworld environments.

RL Algo Empty FourRooms DoorKey MultiRoom Fetch
A2C
PPO2 ✔️ ✔️
DDPG
SAC
TRPO

There are 19 environment groups (variations for each) in total.

Note that you need to specify --gym-packages gym_minigrid with enjoy.py and train.py, as it is not a standard Gym environment. You also need to install the custom Gym package or put it on your Python path:

pip install gym-minigrid
python train.py --algo ppo2 --env MiniGrid-DoorKey-5x5-v0 --gym-packages gym_minigrid

This does the same thing as:

import gym_minigrid

Also, you may need to specify a Gym environment wrapper in the hyperparameters, as MiniGrid environments have a Dict observation space, which is not supported by Stable Baselines for now.

MiniGrid-DoorKey-5x5-v0:
  env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
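In plain Python, that wrapper line corresponds to something like the following sketch:

import gym
import gym_minigrid  # registers the MiniGrid-* environments
from gym_minigrid.wrappers import FlatObsWrapper

env = FlatObsWrapper(gym.make("MiniGrid-DoorKey-5x5-v0"))  # flattens the Dict observation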

Colab Notebook: Try it Online!

You can train agents online using the Colab notebook.

Installation

Stable-Baselines PyPi Package

Minimum version: stable-baselines[mpi] >= 2.10.0

apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg
pip install -r requirements.txt

Please see the Stable Baselines README for alternatives.

Docker Images

Build docker image (CPU):

./scripts/build_docker.sh

GPU:

USE_GPU=True ./scripts/build_docker.sh

Pull built docker image (CPU):

docker pull stablebaselines/rl-baselines-zoo-cpu

GPU image:

docker pull stablebaselines/rl-baselines-zoo

Run script in the docker image:

./scripts/run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1

Tests

To run tests, first install pytest, then:

python -m pytest -v tests/

Citing the Project

To cite this repository in publications:

@misc{rl-zoo,
  author = {Raffin, Antonin},
  title = {RL Baselines Zoo},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/araffin/rl-baselines-zoo}},
}

Contributing

If you trained an agent that is not present in the RL Zoo, please submit a Pull Request (including the hyperparameters and the score).

Contributors

We would like to thank our contributors: @iandanforth, @tatsubori, @Shade5.

rl-baselines-zoo's People

Contributors

araffin, caburu, clonedone, gautams3, iandanforth, jmribeiro, johannesul, josiahcoad, keshaviyengar, laurelkeys, shade5, tatsubori


rl-baselines-zoo's Issues

A custom policy for Dict observation spaces.

I think this is a fairly large task, so let me raise it as an issue here.
Currently, flattening the Dict observation space (combinations like image and text) for Gym envs like MiniGrid is ongoing.
MlpPolicy can handle it, but it might look awkward without appropriate feature extraction such as a CNN.

Custom policies are the way to go.

Retrieving Q-values of trained agents. (Question)

Hi there!
Really loving the trained models - kudos to you guys.

Is there a way for me to retrieve the Q-values of these trained agents?
Meaning - I'd like to obtain the probabilities of choosing each action from a given state.
Currently, when setting the stochastic parameter to True, the model.predict method (as shown in enjoy.py) only returns the predicted action.

Is there a configuration I can use in order to obtain the full action probabilities for a given state?

Thanks in advance for your help.

Tensorboard log directory creation fails on windows with train.py

On Windows if you pass the --tensorboard-log argument to train.py it will fail to create the proper directory structure.

Traceback (most recent call last):
  File "train.py", line 189, in <module>
    main()
  File "train.py", line 169, in main
    model.learn(n_timesteps)
  File "c:\users\ian\clones\stable-baselines\stable_baselines\ppo2\ppo2.py", line 255, in learn
    with SetVerbosity(self.verbose), TensorboardWriter(self.graph, self.tensorboard_log, tb_log_name) as writer:
  File "c:\users\ian\clones\stable-baselines\stable_baselines\common\base_class.py", line 558, in __enter__
    self.writer = tf.summary.FileWriter(save_path, graph=self.graph)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\summary\writer\writer.py", line 352, in __init__
    filename_suffix)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\summary\writer\event_file_writer.py", line 67, in __init__
    gfile.MakeDirs(self._logdir)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 374, in recursive_create_dir
    pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a directory: tblogs/MuscledAnt-v0\PPO2_1; No such file or directory

Cause

This is due to https://github.com/araffin/rl-baselines-zoo/blob/master/train.py#L51 which should instead use os.path.join()
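A sketch of the suggested fix, building the path with os.path.join so it also works with Windows path separators (variable names hypothetical):

import os

tensorboard_log_dir = "tblogs"   # value of --tensorboard-log (example)
env_id = "MuscledAnt-v0"

# Portable:
save_path = os.path.join(tensorboard_log_dir, env_id)
# instead of the non-portable concatenation:
# save_path = tensorboard_log_dir + "/" + env_id
print(save_path)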

[question] Architecture Search

I understand that there is a way to tune hyperparameters. Is there a way to tune the actual model (number of layers and tensors)?
If not, is it possible to integrate something like adanet?

pybullet_envs required even for atari environments

Describe the bug
enjoy.py raises a ModuleNotFoundError: No module named 'pybullet_envs' even when operating only on Atari environments.

Code example
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000

System Info

➜ pip freeze
absl-py==0.8.0
astor==0.8.0
atari-py==0.2.6
attrs==19.1.0
backcall==0.1.0
bleach==3.1.0
cloudpickle==1.2.1
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
entrypoints==0.3
future==0.17.1
gast==0.2.2
google-pasta==0.1.7
grpcio==1.23.0
gym==0.14.0
h5py==2.9.0
ipykernel==5.1.2
ipython==7.8.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.15.1
Jinja2==2.10.1
joblib==0.13.2
jsonschema==3.0.2
jupyter==1.0.0
jupyter-client==5.3.1
jupyter-console==6.0.0
jupyter-core==4.5.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
mpi4py==3.0.2
nbconvert==5.6.0
nbformat==4.4.0
notebook==6.0.1
numpy==1.16.4
opencv-python==4.1.0.25
pandas==0.25.1
pandocfilters==1.4.2
parso==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.9.1
ptyprocess==0.6.0
pyglet==1.3.2
Pygments==2.4.2
pyparsing==2.4.2
pyrsistent==0.15.4
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
pyzmq==18.1.0
qtconsole==4.5.4
scikit-learn==0.21.3
scipy==1.3.1
Send2Trash==1.5.0
six==1.12.0
snakeviz==2.0.1
stable-baselines==2.7.0
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
tornado==6.0.3
traitlets==4.3.2
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.5
widgetsnbextension==3.5.1
wrapt==1.11.2

Additional context
From the code of enjoy.py it seemed that the import of pybullet_envs is guarded; however, the exception is raised anyway.

# For pybullet envs
warnings.filterwarnings("ignore")
import gym
try:
    import pybullet_envs
except ImportError:
    pybullet_envs = None
import numpy as np
try:
    import highway_env
except ImportError:
    highway_env = None
import stable_baselines

The full trace of the exception is:

Traceback (most recent call last):
  File "enjoy.py", line 24, in <module>
    from utils import ALGOS, create_test_env, get_latest_run_id, get_saved_hyperparams
  File "/home/giorgio/projects/research/rl-baselines-zoo/utils/__init__.py", line 1, in <module>
    from .utils import make_env, ALGOS, linear_schedule, create_test_env,\
  File "/home/giorgio/projects/research/rl-baselines-zoo/utils/utils.py", line 9, in <module>
    import pybullet_envs
ModuleNotFoundError: No module named 'pybullet_envs'

So the problem is an unguarded import in rl-baselines-zoo/utils/utils.py.

An easy fix would be to wrap the import in the utils.py module as well; otherwise, pybullet should be listed as a required dependency.

[Feature request] Checkpointing in train.py

I've been using train.py as an entry point for testing new agents and environments.

These agents, environments, and associated code are not necessarily stable, either in the sense of the physics simulation or in terms of code quality.

Thus it is not surprising that, when training for millions of steps, there will be mishaps, for example:

  • An agent becomes physically unstable and the simulator crashes or hangs
  • An agent enters an undesirable part of policy space and begins behaving erratically
  • An agent learns to hack the reward function
  • Some other part of the code crashes or hangs

Currently train.py saves the state of the agent after the specified training run completes.

This ticket is to consider and discuss adding a mechanism by which intermediate checkpoints are produced during training.

New process started before current process has finished its bootstrapping

Describe the bug

There is a race condition in the multiprocessing lib if you don't use

if __name__ == "__main__":
    main()

System Info

OS: Windows 10
Installed: via pip
GPU: 970
Python: 3.6.5

Additional context

C:\Users\Ian\clones\rl-baselines-zoo>python enjoy.py --algo ppo2 --env Walker2DBulletEnv-v0
pybullet build time: Jan 11 2019 15:30:44
c:\users\ian\clones\gym\gym\logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
pybullet build time: Jan 11 2019 15:30:44
c:\users\ian\clones\gym\gym\logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="mp_main")
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Ian\clones\rl-baselines-zoo\enjoy.py", line 59, in <module>
    seed=args.seed, log_dir=log_dir, should_render=not args.no_render)
  File "C:\Users\Ian\clones\rl-baselines-zoo\utils\utils.py", line 115, in create_test_env
    env = SubprocVecEnv([make_env(env_id, 0, seed, log_dir)])
  File "c:\users\ian\clones\stable-baselines\stable_baselines\common\vec_env\subproc_vec_env.py", line 59, in __init__
    process.start()
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Clarification Optuna optimization n-jobs

Hi, I would like to ask for clarification on how the Optuna optimization actually works here. Suppose I have a DummyVecEnv, and in the configuration file I have n_envs = 4. I know that the environments will run sequentially on a single CPU core.
But what happens if I launch the Optuna optimization with a number of jobs = 2? Will Optuna automatically use two cores, putting 4 envs on one core and 4 on the other?

Thank you!

[Question] When should TimeFeatureWrapper be used?

Hello,

In the tuned hyperparameters yml files, I noticed that some environments are wrapped with TimeFeatureWrapper. This is the case for most environments trained with TD3 (and TRPO) but not for the other algorithms. How do you decide when the environment should be wrapped in a TimeFeatureWrapper?

I understand from this paper that this wrapper is necessary for environments with a fixed number of time steps so that they respect the Markov property.

To give more context, I would like to compare the performance of TD3 and A2C on the same environment over an equal number of time steps per episode.
If I train with TimeFeatureWrapper, the episode lengths are not guaranteed to be equal, so comparing the mean reward per episode doesn't make sense anymore.
If I train without the wrapper, I may violate the Markov property.
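For reference, such a wrapper conceptually just appends the normalized remaining time to the observation. A minimal sketch (not the zoo's actual utils.wrappers.TimeFeatureWrapper):

import gym
import numpy as np

class SimpleTimeFeatureWrapper(gym.Wrapper):
    """Append the fraction of remaining time to the observation (sketch)."""
    def __init__(self, env, max_steps=1000):
        super(SimpleTimeFeatureWrapper, self).__init__(env)
        self.max_steps = max_steps
        self.current_step = 0
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, 1.0)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        self.current_step = 0
        return self._add_time(self.env.reset(**kwargs))

    def step(self, action):
        self.current_step += 1
        obs, reward, done, info = self.env.step(action)
        return self._add_time(obs), reward, done, info

    def _add_time(self, obs):
        return np.append(obs, 1.0 - self.current_step / self.max_steps)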

Thanks,
Pierre

[question] Why is the environment instantiated differently for DDPG and DQN?

Hello,

Thanks a lot for this amazing code.

I noticed that the environment is instantiated differently when using either DQN or DDPG. Specifically at line 249 of train.py, the env is created with:

env = gym.make(env_id, **env_kwargs)
env.seed(args.seed)

in the case of DQN and DDPG whereas it is created with the make_env helper:

env = DummyVecEnv([make_env(env_id, 0, args.seed, wrapper_class=env_wrapper, log_dir=log_dir, env_kwargs=env_kwargs)])

for all the other algorithms.

This means that the environment is not vectorized, and it is not possible to specify the log directory or to monitor the training. Why did you make a special case for DQN and DDPG?

Thanks

Docker repos missing files

Have rl-baselines-zoo, GPU edition, pulled, not built.

Trying to run:

docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/root/code/stable-baselines,type=bind araffin/stable-baselines bash -c 'cd /root/code/stable-baselines/ && pytest tests/'

Am running:

sudo docker run --runtime=nvidia -it araffin/stable-baselines bash

Traversing into /root/code/, the directory is empty. It seems there is something wrong with the image. There are similar issues with the rl-zoo image.

I have little experience with docker, so I might well have missed something.

Kind regards

Cannot save in google colab

This is my google colab file

https://colab.research.google.com/drive/1dnROnz1kDQsHI4ReTjjF79EnHc3GlvKd

For some reason I cannot seem to save any trained model in Google Colab.
It is giving some kind of segmentation error. The initial part of the notebook is the same as the template Colab notebook, and the version of Stable Baselines is the latest, as I am updating the Stable Baselines version.

However, when trying to save, it gives this error:
!python train.py --algo ppo2 --env BipedalWalkerHardcore-v2 -f logs -n 10000
========== BipedalWalkerHardcore-v2 ==========
OrderedDict([('cliprange', 'lin_0.2'),
('ent_coef', 0.001),
('gamma', 0.99),
('lam', 0.95),
('learning_rate', 'lin_2.5e-4'),
('n_envs', 16),
('n_steps', 2048),
('n_timesteps', 100000000.0),
('nminibatches', 32),
('noptepochs', 10),
('normalize', True),
('policy', 'MlpPolicy')])
Using 16 environments
Normalizing input and return
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/stable_baselines/common/policies.py:436: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Saving to logs/ppo2/BipedalWalkerHardcore-v2_2
[e88f4df2d52c:03067] *** Process received signal ***
[e88f4df2d52c:03067] Signal: Segmentation fault (11)
[e88f4df2d52c:03067] Signal code: Address not mapped (1)
[e88f4df2d52c:03067] Failing at address: 0x7fee1aba120d
[e88f4df2d52c:03067] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fee1de55890]
[e88f4df2d52c:03067] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7fee1da94785]
[e88f4df2d52c:03067] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7fee1e2ffe44]
[e88f4df2d52c:03067] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7fee1da95615]
[e88f4df2d52c:03067] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7fee1e2fdcb3]
[e88f4df2d52c:03067] *** End of error message ***

I want to save the logs folder to my Google Drive, or at least find a way to download it to my computer. But in this case, I cannot even seem to save the model to the logs folder.

Installation of dependencies

My System is Ubuntu 16.04
Core i3
4 GB RAM
Laptop Integrated Graphics
HP Laptop

When I try installing the dependencies using the command given below:
pip install stable-baselines>=2.2.1 box2d box2d-kengz pyyaml pybullet==2.1.0 pytablewriter

It gives the following error:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-org3fV/stable-baselines/

I looked it up online and found that I needed to update setuptools. I updated setuptools using pip, but it still gives the same result.

AttributeError: module 'train' has no attribute '__path__'

Baselines Zoo

Following the example scripts on the GitHub welcome page.

./run_docker_cpu.sh python -m train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \   --sampler random --pruner median

python -m train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \   --sampler random --pruner median

Both give the same error msg:

... Error while finding module specification for 'train.py' (AttributeError: module 'train' has no attribute '__path__') #27

System Info

  • Mac 10.14.4 (Mojave)
  • Python 3.6.8

Additional context
Other scripts, like
./run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1
run OK.

ModuleNotFoundError: No module named 'stable_baselines.ddpg.memory', when loading ddpg pendulum-v0

Describe the bug
ModuleNotFoundError: No module named 'stable_baselines.ddpg.memory', when loading ddpg pendulum-v0

Code example

import os

from utils import ALGOS
folder = "trained_agents/zoo/"

def load_model(env_name, algo):
    algo_path = os.path.join(folder, algo)
    print(algo_path)
    assert os.path.isdir(algo_path), "The {} folder was not found".format(algo_path)

    found = False
    for ext in ['pkl', 'zip']:
        model_path = "{}/{}.{}".format(algo_path, env_name, ext)
        found = os.path.isfile(model_path)
        if found:
            break
    if not found:
        raise ValueError("No model found for {} on {}, path: {}".format(algo, env_name, model_path))

    nn_model = ALGOS[algo].load(model_path)

    return nn_model

if __name__ == "__main__":
	nn_model = load_model("Pendulum-v0", "ddpg")
Traceback (most recent call last):
  File "zoo_model.py", line 33, in <module>
    nn_model = load_model("Pendulum-v0", "ddpg")
  File "zoo_model.py", line 28, in load_model
    nn_model = ALGOS[algo].load(model_path)
  File "/home/zxiong/development/general_dev_p3/lib/python3.6/site-packages/stable_baselines/ddpg/ddpg.py", line 1104, in load
    data, params = cls._load_from_file(load_path, custom_objects=custom_objects)
  File "/home/zxiong/development/general_dev_p3/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 615, in _load_from_file
    data, params = BaseRLModel._load_from_file_cloudpickle(load_path)
  File "/home/zxiong/development/general_dev_p3/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 549, in _load_from_file_cloudpickle
    data, params = cloudpickle.load(file_)
ModuleNotFoundError: No module named 'stable_baselines.ddpg.memory'

System Info
Describe the characteristic of your environment:

  • python 3.6.7
  • Tensorflow 1.14.0
  • cloudpickle 1.2.1

[Question] TD3 on FetchReach?

I ran with the hyperparameters given for FetchReach TD3, and it wasn't able to solve perfectly in 25k timesteps like written — can someone verify that this is the case?

Can't tune hyperparameters with CustomSACPolicy - multiple values for keyword argument 'layers'

Describe the bug
Running hyperparameter tuning with SAC and CustomSACPolicy returns

TypeError: __init__() got multiple values for keyword argument 'layers'.

Note, hyperparameter tuning is working fine with MlpPolicy and normal training is working fine with CustomSACPolicy. The issue seems to be coming from Tensorflow.

Code example

After a recent git clone and using the default hyperparameters in hyperparameters/sac.yml:

python train.py --algo sac --env HopperBulletEnv-v0 -n 50000 -optimize --n-trials 100 --n-jobs 1

Full traceback:

Traceback (most recent call last):
  File "***/bin/anaconda3/lib/python3.7/site-packages/optuna/study.py", line 648, in _run_trial
    result = func(trial)
  File "***/rl-baselines-zoo/utils/hyperparams_opt.py", line 88, in objective
    model = model_fn(**kwargs)
  File "train.py", line 373, in create_model
    verbose=0, **kwargs)
  File "***/bin/anaconda3/lib/python3.7/site-packages/stable_baselines/sac/sac.py", line 125, in __init__
    self.setup_model()
  File "***/bin/anaconda3/lib/python3.7/site-packages/stable_baselines/sac/sac.py", line 145, in setup_model
    **self.policy_kwargs)
  File "***/rl-baselines-zoo/utils/utils.py", line 71, in __init__
    feature_extraction="mlp")
TypeError: __init__() got multiple values for keyword argument 'layers'

System Info

  • stable baselines: 2.10.0 (installed with pip)
  • rl-baselines-zoo commit: 645ea17
  • Python 3.7.4
  • Tensorflow: 1.14.0
  • Gym: 0.15.4
  • Pybullet: 2.5.8
  • Ubuntu 18.04
  • GPU: GeForce GTX 1060
  • CUDA: 10.2

Running average file (obs_rms.pkl, ret_rms.pkl) for Humanoid-v2?

When making an env for 'HumanoidBulletEnv-v0' using create_test_env, the running average files are loaded from the stats_path directory.

I tried loading the env without them: the agent acts as if it is not trained at all, but it was able to walk when I loaded the running average files.

I've trained MuJoCo's 'Humanoid-v2' using PPO with the same hyperparameters you provide in ppo2.yml, and the reward also improves over time.

However, when I render the trained 'Humanoid-v2', the agent acts as if it is not trained.

I think it's because I need to load 'Humanoid-v2' with running average files.
Could you please provide the running average files for 'Humanoid-v2'?

I tried loading the 'HumanoidBulletEnv-v0' running average files for the 'Humanoid-v2' env, but I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-7bfe2d31eb5a> in <module>
      7 
      8 # Enjoy trained agent
----> 9 obs = env.reset()
     10 running_reward = 0.0
     11 ep_len = 0

~/anaconda2/envs/py3_6/lib/python3.6/site-packages/stable_baselines/common/vec_env/vec_normalize.py in reset(self)
     87             self.old_obs = obs
     88         self.ret = np.zeros(self.num_envs)
---> 89         return self._normalize_observation(obs)
     90 
     91     def save_running_average(self, path):

~/anaconda2/envs/py3_6/lib/python3.6/site-packages/stable_baselines/common/vec_env/vec_normalize.py in _normalize_observation(self, obs)
     63             if self.training:
     64                 self.obs_rms.update(obs)
---> 65             obs = np.clip((obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon), -self.clip_obs,
     66                           self.clip_obs)
     67             return obs

ValueError: operands could not be broadcast together with shapes (1,376) (44,) 

[question] Tuning Gym env variable?

In Gym, it's possible to pass arguments to the env when making it, e.g.:
env = gym.make('Myrl-v0', **params)
Is it possible to tune a gym env variable with baseline-zoo train (or otherwise)?

Default value for hyperparams raises AttributeError

Describe the bug
The current default value for hyperparams raises an AttributeError in utils.create_test_env.

Code example

# use default value for `hyperparams`
_env = create_test_env("BreakoutNoFrameskip-v4", n_envs=1, seed=0, 
                       is_atari=True, log_dir='.', should_render=False)

Traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-05121958d8f1> in <module>()
----> 1 _env = create_test_env("BreakoutNoFrameskip-v4", n_envs=1, seed=0, is_atari=True, log_dir='.', should_render=False)

/content/rl-baselines-zoo/utils/utils.py in create_test_env(env_id, n_envs, is_atari, stats_path, seed, log_dir, should_render, hyperparams)
    160 
    161     # Create the environment and wrap it if necessary
--> 162     env_wrapper = get_wrapper_class(hyperparams)
    163     if 'env_wrapper' in hyperparams.keys():
    164         del hyperparams['env_wrapper']

/content/rl-baselines-zoo/utils/utils.py in get_wrapper_class(hyperparams)
     93         return wrapper_name.split('.')[-1]
     94 
---> 95     if 'env_wrapper' in hyperparams.keys():
     96         wrapper_name = hyperparams.get('env_wrapper')
     97         wrapper_module = importlib.import_module(get_module_name(wrapper_name))

AttributeError: 'NoneType' object has no attribute 'keys'

System Info
Running on Colab, with version 2.9.0a0 of Stable Baselines, and using the code after cloning the repository as:

!git clone https://github.com/araffin/rl-baselines-zoo.git
cd rl-baselines-zoo/
from utils import create_test_env
cd ..
_env = create_test_env("BreakoutNoFrameskip-v4", n_envs=1, seed=0, is_atari=True, log_dir='.', should_render=False)

(where each line is a code cell).

Additional context
In the create_test_env function, hyperparams=None by default:

def create_test_env(env_id, n_envs=1, is_atari=False,
                    stats_path=None, seed=0,
                    log_dir='', should_render=True, hyperparams=None):

However, this raises an AttributeError ('NoneType' object has no attribute 'keys') when trying to call hyperparams.keys():

if 'env_wrapper' in hyperparams.keys():

Shouldn't hyperparams be {} by default? Or keep it None, but add a check for it before the line above?
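A sketch of the second option, keeping the default None but guarding it before the keys() call (hypothetical snippet, mirroring the signature above):

def create_test_env(env_id, n_envs=1, is_atari=False,
                    stats_path=None, seed=0,
                    log_dir='', should_render=True, hyperparams=None):
    # Avoid the AttributeError when no hyperparameters are passed.
    if hyperparams is None:
        hyperparams = {}
    # ... rest of the original function unchanged ...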

Google Colab error for Soft Actor-Critic

I am running rl-baselines-zoo for the humanoid Bullet env in Google Colab. At first I ran it with ppo2, and it gave a very good result, with rewards going up to 1600. Now I am running Soft Actor-Critic and it is giving the following error.

!python train.py --algo sac --env HumanoidBulletEnv-v0 --n-timesteps 10000000

========== HumanoidBulletEnv-v0 ==========
OrderedDict([('batch_size', 64),
             ('buffer_size', 1000000),
             ('ent_coef', 'auto'),
             ('gradient_steps', 1),
             ('learning_rate', 'lin_3e-4'),
             ('learning_starts', 1000),
             ('n_timesteps', 20000000.0),
             ('normalize', "{'norm_obs': True, 'norm_reward': False}"),
             ('policy', 'CustomSACPolicy'),
             ('train_freq', 1)])
Using 1 environments
pybullet build time: Apr 11 2019 07:40:52
/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Normalizing input and return
Traceback (most recent call last):
  File "train.py", line 171, in <module>
    model = ALGOS[args.algo](env=env, tensorboard_log=tensorboard_log, verbose=1, **hyperparams)
TypeError: 'NoneType' object is not callable
[fc45386ee43e:03596] *** Process received signal ***
[fc45386ee43e:03596] Signal: Segmentation fault (11)
[fc45386ee43e:03596] Signal code: Address not mapped (1)
[fc45386ee43e:03596] Failing at address: 0x7f0ed740320d
[fc45386ee43e:03596] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0eda6b8890]
[fc45386ee43e:03596] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f0eda2f7785]
[fc45386ee43e:03596] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f0edab62e44]
[fc45386ee43e:03596] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f0eda2f8615]
[fc45386ee43e:03596] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f0edab60cb3]
[fc45386ee43e:03596] *** End of error message ***

BreakoutNoFrameskip-v4 in ppo2 does not match

When I try to load the pkl file from the ppo2 folder (under trained_agents), I get this error:
TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got 'model/c1/w:0' of type 'str' instead.

After checking the folder, I found that the description for BreakoutNoFrameskip-v4.pkl is td3.
I think there may be a mistake.

How to use Optuna for custom environments

This isn't a bug or anything like that, but I wonder if anyone could point me in the right direction.

One can do this:

python -m train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

But when you've created a custom environment...

env=DummyVecEnv([lambda: RunEnv(...)])
model= A2C(CnnPolicy,env).learn(total_timesteps)

... how can I enter the Optuna parameters - or is it even possible?

Of course I can create a custom Gym environment, but that's a bit clunky.

Thankful for feedback

Kind regards

Cannot reproduce DQN Breakout baseline

I'm excited by the "stable" promise of stable-baselines, but currently I'm not able to reproduce the DQN results for Breakout. It is well known that you should get a 300+ score in Breakout with DQN, and this can be confirmed by monitor.csv in benchmark.zip in this repo. Coincidentally, OpenAI Baselines is also broken for DQN/Breakout. I suspect their bug has also impacted stable-baselines.

Here are my results:

python train.py --algo dqn --env BreakoutNoFrameskip-v4

Tensorboard curve:
image

Last 3 stdout log:

--------------------------------------
| % time spent exploring  | 1        |
| episodes                | 56300    |
| mean 100 episode reward | 8.9      |
| steps                   | 9940866  |
--------------------------------------
--------------------------------------
| % time spent exploring  | 1        |
| episodes                | 56400    |
| mean 100 episode reward | 8.4      |
| steps                   | 9967305  |
--------------------------------------
--------------------------------------
| % time spent exploring  | 1        |
| episodes                | 56500    |
| mean 100 episode reward | 8.5      |
| steps                   | 9993810  |
--------------------------------------

As we can see, training does not converge: the reward stays stuck around 8.5, sometimes randomly peaking at up to 22, still well below the expected 300+.

HER+SAC on Robotic Environment

Hi I am getting this error while running this code:

python train.py --algo her --env FetchPickAndPlace-v1 --tensorboard-log "C:\Users\pei-seng.tan\Desktop\Deep_RL\rl-baselines-zoo\USM_RL\SAC+HER" --eval-freq 10000 --eval-episodes 10 --save-freq 100000

It happens when the number of timesteps reaches 10000.

image

May I know how to solve it?

PyBullet environments not working

Describe the bug
Hi,
First, thanks for such a good repo.

After installation, I could run the trained agent for BreakoutNoFrameskip-v4 with video showing up.
However, when I tried to test the trained agent for a PyBullet env, such as HopperBulletEnv-v0, the window pops up but shows only a black image.

black_window

There is no apparent error message in the terminal; how can I solve this?

Cheers

Code example
Succeed:
$ python enjoy.py --algo ppo2 --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000

Failed (window pop up, but with whole black image):
$ python enjoy.py --algo ppo2 --env HopperBulletEnv-v0 --folder trained_agents/ -n 1000
$ python enjoy.py --algo ppo2 --env HumanoidBulletEnv-v0 --folder trained_agents/ -n 1000
...

System Info
Describe the characteristic of your environment:

  • pip install all dependencies for stable-baselines (tested with openai-gym 0.12.5, 0.14.0 and 0.15.0)
  • GPU models and configuration: CUDA-10.0, Driver-418.87
  • Python version: 3.5
  • Tensorflow version: 1.14.0
  • opencv-python: tested with 3.4.5.20 and 3.4.7.28 --> same issue, however, when changing to 4.X version, the window will crash with error msg "error: (-215:Assertion failed) src_depth != CV_16F && src_depth != CV_32S in function 'convertToShow'".

Missing dependency in the provided Colab notebook

The bug
Running the first cell !python train.py --algo a2c --env CartPole-v1 --n-timesteps 100000 throws an error: cannot import optuna

System Info
Running the provided Colab notebook.

Solution
Add !pip install optuna to the install dependencies section

How to connect to the container with e.g. Remmina

Have rl-baselines-zoo, GPU edition. Am running:

sudo docker run --runtime=nvidia -it basezoo bash
[sudo] password for hh:

  • Xvfb :1 -screen 0 1024x768x24
  • export DISPLAY=:1
  • DISPLAY=:1
  • display=1
  • file=/tmp/.X11-unix/X1
  • sleep 1
    ++ seq 1 10
  • for i in '$(seq 1 10)'
  • '[' -e /tmp/.X11-unix/X1 ']'
  • break
  • '[' -e /tmp/.X11-unix/X1 ']'
  • exec bash
    root@e62f057cb381:/#

Portainer gives me an IP address, but no port info.

Evidently, a display is created, supposedly so I can connect from outside. And it would be quite helpful.

Have I understood this correctly? If so, how do I connect with e.g. Remmina?

Help is much appreciated. Thank you!

Error "ImportError: DLL load failed" on Windows 10

I got the following error in train.py on Windows 10 when running the statement from mpi4py import MPI:

ImportError: DLL load failed: The specified procedure could not be found.

The solution was to uninstall mpi4py package, install Microsoft MPI 10.0 and then reinstall mpi4py.

I think it might be good to document that people should first install Microsoft MPI 10.0. Even better, just don't import mpi4py if everything is on a single machine :).

More: https://stackoverflow.com/a/58653569/207661

Or is this part compiled to machine language?

Thanks for your great job.

However, I can't find the source code for the different algorithms. For example, I can only train an agent with a2c but can't check how A2C is implemented. Am I missing something, or is this part compiled to machine language?

[question] Hyperparameters for Roboschool HumanoidFlagrunHarder

Hi,
Currently, I have used 4 algorithms from stable-baselines for the Roboschool HumanoidFlagrunHarder task. My evaluation metric is the mean reward over 100 episodes. Basically: PPO2 is perfect, A2C gets a mean reward of 500, DDPG gets a mean reward around 0, and SAC gets a mean reward of 280. I have been looking for the hyperparameter settings in stable-baselines-zoo for A2C, DDPG, and SAC, but could only find the Bullet Humanoid env for SAC (quite close to Roboschool HFH). Thus, do you have any suggestions for A2C, DDPG, and SAC on this task? The number of timesteps is 400M for on-policy methods and 20M for off-policy methods. It would be nice if you added them to the set of hyperparameters.
Thanks.

How many training steps were used to obtain the pre-trained models?

Is there any document stating how many training steps were used to obtain the pre-trained models? Some pre-trained models seem far below the state of the art. For instance, the DQN models on BeamRider and Qbert only achieve 948.0 and 550.0, whereas with other algorithms (e.g., PPO2 and ACKTR) such reward values can be 10,000+.
It would be better if you could provide these pre-trained models as a trustworthy baseline for benchmarking.

TRPO "underflow encountered in multiply"

While running a TRPO training, after some time (random, anywhere from 15 sec to 1 min) it exits with the following:
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

Using the recent version, 2.9.0, Python 3.7.5.

Inclusion of baseline results

There should be a way to see your results that tells you what one should expect when running the training from scratch. At a minimum, there should be information on the number of training steps and the eventual 100-episode average one might expect from the baseline, but it would be much better to show the entire training curve. Without this, the baseline is not very meaningful, as one may never know whether they actually replicated the expected result.

A few good RL baseline frameworks do this; for example, here is how other frameworks display their results: Garage, RLlib, Coach. I love the UX that Garage provides, as well as Coach's approach of making results part of the repo itself.

Currently, there is a benchmark.zip file in the repo, but it seems monitor.csv and progress.csv are not helpful (for example, for DQN, progress.csv is empty and monitor.csv only has the last few rows). Furthermore, these files are currently not produced at all when you run the experiment.

[feature request] Execution Drivers and Standardized Agent Interface

Currently, the way to train an agent is to 1) instantiate the environment, 2) instantiate the agent, passing the environment to the constructor, and 3) call the learn method.

Some agent frameworks have started implementing execution drivers, i.e., objects responsible for interacting the agent with the environment.

Would love to see such a feature in stable-baselines, given that it would greatly simplify the pipeline for testing a new agent and comparing it with existing ones.

Current code:
env = gym.make("CartPole-v0")
agent = DQN(env, ...)
agent.learn(...)

What if there were driver objects such that the execution would go something like this:

driver = BottleneckDriver(max_timesteps=10000, max_episodes=200)
metrics = [TotalTimesteps(), AverageTimestepReward(), AverageEpisodeReward(), ...]
driver.run(agent, env, metrics)
for metric in metric:
      print(f"{metric.name}: {metric.result()}")

Example


class Driver(BaseDriver):
    """ 
      Runs until one of the conditions is met - max_timesteps or episodes
    """
    def __init__(self, agent, environment, max_timesteps=math.inf, max_episodes=math.inf, observers=None):
        super(Driver, self).__init__(agent, environment, observers)
        self._timesteps = max_timesteps
        self._episodes = max_episodes

    def run(self):
        self._environment.reset()
        done = False
        while not done:
            self.step()
            done = self.total_episodes >= self._episodes or self.total_steps >= self._timesteps
And a base class:

Timestep = namedtuple("Timestep", "t state action reward next_state is_terminal info")


class BaseDriver(ABC):

    def __init__(self, agent, environment, observers):
        """
        :param agent: The agent to interact with the environment
        :param environment: The environment
        :param observers: The observers
        """
        self._agent = agent
        self._environment = environment
        self._observers = observers or []

        self._total_steps = 0
        self._total_episodes = 0

    @property
    def total_steps(self):
        return self._total_steps

    @property
    def total_episodes(self):
        return self._total_episodes

    @abstractmethod
    def run(self):
        raise NotImplementedError()

    def step(self):
        state = self._environment.state
        action = self._agent.action(state)
        timestep = self._environment.step(action)
        for observer in self._observers:
            observer(timestep)
            self._agent.reinforcement(timestep)
        self._total_steps += 1
        is_terminal = timestep.is_terminal
        if is_terminal:
            self._total_episodes += 1
        return timestep

    def episode(self):
        self._environment.reset()
        is_terminal = False
        trajectory = [self._environment.state]
        while not is_terminal:
            timestep = self.step()
            trajectory.append(timestep)
        return trajectory

Would love to contribute with such features.
Let me know what you think.

Docker image requires python3-tk package

I'm trying to use your docker image araffin/rl-baselines-zoo-cpu
When I typed the command:

./run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1

It shows below error log

Executing in the docker (cpu image):
python train.py --algo ppo2 --env CartPole-v1
+ export DISPLAY=:1
+ Xvfb :1 -screen 0 1024x768x24
+ DISPLAY=:1
+ display=1
+ file=/tmp/.X11-unix/X1
+ sleep 1
++ seq 1 10
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ break
+ '[' -e /tmp/.X11-unix/X1 ']'
+ exec bash -c 'cd /root/code/stable-baselines/ && python train.py --algo ppo2 --env CartPole-v1'
/root/venv/lib/python3.5/site-packages/gym/envs/registration.py:64: UserWarning: register(timestep_limit=1000) is deprecated. Use register(max_episode_steps=1000) instead.
  warnings.warn("register(timestep_limit={}) is deprecated. Use register(max_episode_steps={}) instead.".format(timestep_limit, timestep_limit))
Traceback (most recent call last):
  File "/usr/lib/python3.5/tkinter/__init__.py", line 36, in <module>
    import _tkinter
ImportError: No module named '_tkinter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from stable_baselines.common import set_global_seeds
  File "/root/venv/lib/python3.5/site-packages/stable_baselines/__init__.py", line 6, in <module>
    from stable_baselines.gail import GAIL
  File "/root/venv/lib/python3.5/site-packages/stable_baselines/gail/__init__.py", line 2, in <module>
    from stable_baselines.gail.dataset.dataset import ExpertDataset, DataLoader
  File "/root/venv/lib/python3.5/site-packages/stable_baselines/gail/dataset/dataset.py", line 8, in <module>
    import matplotlib.pyplot as plt
  File "/root/venv/lib/python3.5/site-packages/matplotlib/pyplot.py", line 2372, in <module>
    switch_backend(rcParams["backend"])
  File "/root/venv/lib/python3.5/site-packages/matplotlib/pyplot.py", line 207, in switch_backend
    backend_mod = importlib.import_module(backend_name)
  File "/root/venv/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/root/venv/lib/python3.5/site-packages/matplotlib/backends/backend_tkagg.py", line 1, in <module>
    from . import _backend_tk
  File "/root/venv/lib/python3.5/site-packages/matplotlib/backends/_backend_tk.py", line 5, in <module>
    import tkinter as Tk
  File "/usr/lib/python3.5/tkinter/__init__.py", line 38, in <module>
    raise ImportError(str(msg) + ', please install the python3-tk package')
ImportError: No module named '_tkinter', please install the python3-tk package

Then I start a new docker container:
docker run -it -v araffin/rl-baselines-zoo-cpu
And then I installed python3-tk from the console.
After this step, I can run python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000 in the docker bash.

Integrating custom envs with rl-baselines-zoo

[question] [feature request]

Am enjoying r-b-z. Wanting to make envs, or wanting to modify them.

I know you can define a custom env as described in link 1 and in link 2.

The problem with link 1 is that you're limited in the number of parameters you can pass; e.g. you can't, to my knowledge, apply Optuna like this:

./run_docker_gpu.sh python -m train.py --algo ppo2 --env MYCUSTOMENV-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler random --pruner median

I'm aware you can define a custom env

import gym
from gym import spaces

class CustomEnv(gym.Env):
  """Custom Environment that follows gym interface"""
  metadata = {'render.modes': ['human']}

  def __init__(self, arg1, arg2, ...):

but I don't believe you can pass e.g. -optimize --n-trials 1000 --n-jobs 2 random --pruner median to it, so it's more limited than the first script; I haven't found a way to use train.py with CustomEnv, as it were.

Link 2 I haven't gotten to work with r-b-z.

I think it's like this because r-b-z imports or stores its "internal" Gym envs somewhere (i.e., you don't need to import Gym to use r-b-z), so I can't go in there and modify or add files.

My question is: where does r-b-z store its internal Gym envs? Or is there some way to have r-b-z re-reference its internal Gym to a "normal" Gym install "outside" of r-b-z, so that I can go in there and modify things directly?

Help would be warmly appreciated!

Kind regards

Deprecation warning latest optuna version

Using the latest Optuna and running training results in multiple warnings:
The use of optuna.structs.TrialPruned is deprecated. Please use optuna.exceptions.TrialPruned instead

The following is a simple fix: in file utils/hyperparams_opt.py, change line 176
raise optuna.structs.TrialPruned()
with:
raise optuna.exceptions.TrialPruned()

GPU vs CPU Performance

Hello,

first of all I want to say thank you for this great framework and the tuned hyperparameters.

Now to my question:
I tested the performance with my own gym/baselines examples, and there was only a 5-10 second difference.
I also tested your implementation, and the difference between CPU and GPU is really small (about 5 sec).
Is this normal behavior, or should the difference be much higher (maybe I installed nvidia-docker with the wrong nvidia-runtime settings)?

I hope maybe you can help me out

Thanks!!!

Regards Mom0

enjoy.py throws error about observation space with HalfCheetahBulletEnv-v0 and own saved model

Hi,

Playing around with stable baselines 2.9.0 (installed with pip) on Ubuntu 18.04.4 LTS with Python 3.6.9, gym 0.16.0, tensorflow 1.14.0 and pybullet 2.6.5.

When I run

python enjoy.py --algo ppo2 --env HalfCheetahBulletEnv-v0 --folder trained_agents/ -n 150000

all is well.

When I run

python enjoy.py --algo ppo2 --env HalfCheetahBulletEnv-v0 --folder logs/ -n 150000

so that enjoy.py loads the model I have trained and saved with train.py, I get the following error:

"Error: the environment passed must have at least the same observation space as the model was trained on."

While trying to see what the problem is today, I've noticed that, because of the stored hyperparameters for ppo2 HalfCheetahBulletEnv-v0, train.py wraps the training environment in the TimeFeatureWrapper from utils/wrappers.py, but enjoy.py does not, because it ends up going into the elif "Bullet" in env_id: branch in the create_test_env() function in utils/utils.py.

I've looked, and the wrapper changes the observation space from (26,) to (27,), so that may be what it is complaining about in the error message.

Am I barking up the right tree, and why doesn't the error occur with the zoo's trained_agents saved models?

Thank you!

What observation space/environment were the pre-trained Atari DQN agents trained with?

Describe the bug

Unable to load the pre-trained Atari DQN agents because the observation space doesn't match the Atari environment observation space.

Code example

(Using the Space Invaders model copied from the trained_agents directory.)

import matplotlib.pyplot as plt

from stable_baselines import DQN
from stable_baselines.common.atari_wrappers import make_atari

env = make_atari('SpaceInvadersNoFrameskip-v4')
model = DQN.load('../../atari-models/SpaceInvadersNoFrameskip-v4.pkl')
obs = env.reset()

action, _states = model.predict(obs)
obs, rewards, dones, info = env.step(action)

plt.imshow(obs)
plt.show()

Error Message

ValueError                                Traceback (most recent call last)
<ipython-input-21-4057e0f0b223> in <module>
      3 obs = env.reset()
      4 
----> 5 action, _states = model.predict(obs)
      6 obs, rewards, dones, info = env.step(action)
      7 

~/miniconda3/envs/baselines/lib/python3.6/site-packages/stable_baselines/common/base_class.py in predict(self, observation, state, mask, deterministic)
    717             mask = [False for _ in range(self.n_envs)]
    718         observation = np.array(observation)
--> 719         vectorized_env = self._is_vectorized_observation(observation, self.observation_space)
    720 
    721         observation = observation.reshape((-1,) + self.observation_space.shape)

~/miniconda3/envs/baselines/lib/python3.6/site-packages/stable_baselines/common/base_class.py in _is_vectorized_observation(observation, observation_space)
    647                                  "Box environment, please use {} ".format(observation_space.shape) +
    648                                  "or (n_env, {}) for the observation shape."
--> 649                                  .format(", ".join(map(str, observation_space.shape))))
    650         elif isinstance(observation_space, gym.spaces.Discrete):
    651             if observation.shape == ():  # A numpy array of a number, has shape empty tuple '()'

ValueError: Error: Unexpected observation shape (210, 160, 3) for Box environment, please use (84, 84, 4) or (n_env, 84, 84, 4) for the observation shape.

System Info
Describe the characteristic of your environment:

  • Stable baselines installed following README instructions in a isolated conda environment.

  • GPU models and configuration: single RTX 2070 GPU.

  • Python version: 3.6.8

  • Tensorflow version: 1.14

Converting a model into PyTorch

Hi,

I'm trying to load your pre-trained DQN Breakout agent into a PyTorch network. I have figured out how to get the weights transferred over but the pytorch agent is still not able to do well on the game.

I'm using the exact same env object as in the non-PyTorch case, which works, so that isn't the cause. I think the cause must be that the stable-baselines agent does some pre-processing behind the scenes that I am not aware of. Can anyone help me with some advice on this?

E.g., at the very least I imagine the stable-baselines agent normalises pixel values to the range 0-1, but I can't find where this is happening.

benchmark.md agents hyperparameters

Hi,

Thanks for the amazing lib; an open-source RL benchmark is really valuable nowadays.
Nevertheless, I am wondering where I can find the hyperparameters used for the benchmarked agents, like the network architecture, optimizer parameters and other important RL settings ;)

Can I run without early termination?

I am trying to run ppo2 on MountainCar-v0, and the following two issues may need your help :)

  1. The output in Tensorboard suggests that every episode can only run for at most 200 steps. Is there any way to change the maximum number of steps per episode (or simply not use the early-termination trick)?
  2. The episode reward shown in Tensorboard seems to be the discounted reward. Where should I add the real episode reward or the episode length to the Tensorboard output?

Could you give me some advice on the above issues? Thanks a lot!

ImportError: cannot import name 'FlattenDictWrapper'

I tried running the trained agents locally and from Docker; the same bug about FlattenDictWrapper appears.

Code example

Traceback (most recent call last):
  File "train.py", line 25, in <module>
    from stable_baselines.common.cmd_util import make_atari_env
  File "/root/venv/lib/python3.5/site-packages/stable_baselines/common/cmd_util.py", line 8, in <module>
    from gym.wrappers import FlattenDictWrapper
ImportError: cannot import name 'FlattenDictWrapper'

System Info
Describe the characteristic of your environment:
Stable_baselines was installed from PIP and Docker
Use on CPU
The local python version is 3.7
The local Tensorflow version is 1.14

BipedalWalkerHardcore-v3

Is it possible to have a pre-trained BipedalWalkerHardcore-v3 agent with ppo2 added to the repo, given the bug found in BipedalWalkerHardcore-v2?
