inarikami / keras-rl2

Reinforcement learning with tensorflow 2 keras

License: MIT License

Python 100.00%

atari deep deep-reinforcement-learning dqn algorithms

keras-rl2's Introduction

Deep Reinforcement Learning for Tensorflow 2 Keras

NOTE: Requires tensorflow==2.1.0

What is it?

keras-rl2 implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras.

Furthermore, keras-rl2 works with OpenAI Gym out of the box. This means that evaluating and playing around with different algorithms is easy.

Of course you can extend keras-rl2 according to your own needs. You can use built-in Keras callbacks and metrics or define your own. Even more so, it is easy to implement your own environments and even algorithms by simply extending some simple abstract classes. Documentation is available online.
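As a rough illustration of what "implementing your own environment" involves, here is a minimal sketch of a toy environment with the Gym-style reset/step/render/close interface. The names CoinFlipEnv and run_episode are hypothetical and exist only for this example. The class is kept free of imports from the library so it runs on its own; in a real project it would presumably derive from rl.core.Env (or gym.Env) instead.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical) with a Gym-style interface."""

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return [self.coin]

    def step(self, action):
        # Reward 1.0 when the action matches the coin shown in the last observation.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        done = self.steps >= self.episode_length
        return [self.coin], reward, done, {}

    def render(self, mode="human"):
        print("coin:", self.coin)

    def close(self):
        pass


def run_episode(env, policy):
    """Run one episode with a policy mapping observation -> action."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        observation, reward, done, _ = env.step(policy(observation))
        total += reward
    return total


# A policy that copies the observed coin collects the maximum reward.
print(run_episode(CoinFlipEnv(episode_length=5, seed=0), lambda obs: obs[0]))  # 5.0
```

The important part is the contract: reset() returns an observation, and step(action) returns the tuple (observation, reward, done, info).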

What is included?

As of today, the following algorithms have been implemented:

Deep Q Learning (DQN) [1], [2]
Double DQN [3]
Deep Deterministic Policy Gradient (DDPG) [4]
Continuous DQN (CDQN or NAF) [6]
Cross-Entropy Method (CEM) [7], [8]
Dueling network DQN (Dueling DQN) [9]
Deep SARSA [10]
Asynchronous Advantage Actor-Critic (A3C) [5] (not yet implemented)
Proximal Policy Optimization Algorithms (PPO) [11] (not yet implemented)

You can find more information on each agent in the doc.

Installation

Install Keras-RL2 from PyPI (recommended):

pip install keras-rl2

Install from GitHub source:

git clone https://github.com/wau/keras-rl2.git
cd keras-rl2
python setup.py install

Examples

If you want to run the examples, you'll also have to install:

gym by OpenAI: Installation instruction
h5py: simply run pip install h5py

For the Atari example you will also need:

Pillow: pip install Pillow
gym[atari]: Atari module for gym. Use pip install gym[atari]

Once you have installed everything, you can try out a simple example:

python examples/dqn_cartpole.py

This is a very simple example and it should converge relatively quickly, so it's a great way to get started! It also visualizes the game during training, so you can watch it learn. How cool is that?

If you have questions or problems, please file an issue or, even better, fix the problem yourself and submit a pull request!

References

[1] Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013
[2] Human-level control through deep reinforcement learning, Mnih et al., 2015
[3] Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015
[4] Continuous control with deep reinforcement learning, Lillicrap et al., 2015
[5] Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016
[6] Continuous Deep Q-Learning with Model-based Acceleration, Gu et al., 2016
[7] Learning Tetris Using the Noisy Cross-Entropy Method, Szita et al., 2006
[8] Deep Reinforcement Learning (MLSS lecture notes), Schulman, 2016
[9] Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2016
[10] Reinforcement learning: An introduction, Sutton and Barto, 2011
[11] Proximal Policy Optimization Algorithms, Schulman et al., 2017


keras-rl2's Issues

Problem logging with tensorboard

I am trying to log the training of my agent over time.

I am used to using TensorBoard; however, when I create a callback, I get an error when running fit.

from keras.callbacks import TensorBoard
tb =  TensorBoard(log_dir='./keras-rl')
dqn.fit(env, nb_steps=1200000, visualize=False, verbose=1, callbacks=[tb])


AttributeError: 'TensorBoard' object has no attribute '_should_trace'

And when I try and use WandbLogger as suggested:

from keras.callbacks import WandbLogger

ImportError: cannot import name 'WandbLogger' from 'keras.callbacks'

If there is a solution to this I would be thankful, or another way of doing live monitoring would be great too!
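One possible way to get live monitoring without TensorBoard is a small custom callback that appends one row per finished episode to a CSV file, which can then be tailed or plotted while training runs. The sketch below is an assumption-laden illustration, not the library's own logger: EpisodeCsvLogger is a hypothetical name, and it assumes the agent calls on_episode_end(episode, logs) with the keys 'episode_reward' and 'nb_steps' in logs. To pass it to dqn.fit(..., callbacks=[...]) it would presumably need to derive from rl.callbacks.Callback; here it is a plain class so the example runs on its own.

```python
import csv


class EpisodeCsvLogger:
    """Hypothetical callback sketch: appends one CSV row per finished episode."""

    def __init__(self, path):
        self.path = path
        # Write the header once, truncating any earlier log.
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerow(["episode", "episode_reward", "nb_steps"])

    def on_episode_end(self, episode, logs=None):
        # 'logs' is assumed to carry the per-episode statistics.
        logs = logs or {}
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow(
                [episode, logs.get("episode_reward"), logs.get("nb_steps")]
            )
```

Because every row is flushed when the file is closed, the log can be read from another process while training is still in progress.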

Does anyone know why the DQN agent adds a dimension to the input array?

So this is my setting:

keras==2.3.1
keras-rl2
tensorflow-gpu==2.0.0-beta1
numpy==1.16.4

When I fit the DQN agent, the model fails with a shape error. This is how I build it:

#Build the NN
model = Sequential()
model.add(Convolution2D(32, (8, 8), strides=(4, 4),input_shape=input_shape,activation='relu'))
model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'))
model.add(Convolution2D(64, (2, 2), strides=(1, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(512,activation='relu'))
model.add(Dense(nb_actions, activation='softmax'))
print(model.summary())

policy = BoltzmannGumbelQPolicy()
memory= SequentialMemory(limit=50000, window_length = 1)
dqn = DQNAgent(model = model, nb_actions = nb_actions , memory=memory, train_interval=1,
               nb_steps_warmup=5000, target_model_update=10000, policy=policy)

dqn.compile(Adam(lr=1e-4), metrics=['mae'])

This is the error :

ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (1, 1, 240, 256, 3)

But it starts here:

/usr/local/lib/python3.6/dist-packages/rl/core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)

at line 169 of core.py.

I'm quite new to this library, so maybe I'm missing something.

DDPG ValueError: name for name_scope must be a string.

While running either DDPG agent, I encounter a ValueError in a TensorFlow 2 ops.py method. The problem appears to be reproduced whenever the AdditionalUpdatesOptimizer class is initialized.

Error:

ValueError: name for name_scope must be a string.

The class that potentially causes the error, in keras-rl2's rl/util.py:

class AdditionalUpdatesOptimizer(optimizers.Optimizer):
    def __init__(self, optimizer, additional_updates):
        super().__init__(optimizer)
        self.optimizer = optimizer
        self.additional_updates = additional_updates

    def get_updates(self, params, loss):
        updates = self.optimizer.get_updates(params=params, loss=loss)
        updates += self.additional_updates
        self.updates = updates
        return self.updates

    def get_config(self):
        return self.optimizer.get_config()

Traceback list:

  File "/Users/taylormcnally/.vscode/extensions/ms-python.python-2019.5.18875/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/Users/taylormcnally/.vscode/extensions/ms-python.python-2019.5.18875/pythonFiles/lib/python/ptvsd/__main__.py", line 434, in main
    run()
  File "/Users/taylormcnally/.vscode/extensions/ms-python.python-2019.5.18875/pythonFiles/lib/python/ptvsd/__main__.py", line 312, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/taylormcnally/Documents/GitHub/keras-rl2/examples/ddpg_pendulum.py", line 58, in <module>
    agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])
  File "/Users/taylormcnally/Documents/GitHub/keras-rl2/rl/agents/ddpg.py", line 122, in compile
    critic_optimizer = AdditionalUpdatesOptimizer(critic_optimizer, critic_updates)
  File "/Users/taylormcnally/Documents/GitHub/keras-rl2/rl/util.py", line 84, in __init__
    super().__init__(optimizer)
  File "/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 263, in __init__
    with backend.name_scope(self._name) as name_scope:
  File "/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 739, in name_scope
    return ops.name_scope_v2(name)
  File "/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6248, in __init__
    raise ValueError("name for name_scope must be a string.")
ValueError: name for name_scope must be a string.

ValueError: Variable Tensor("Mean_1:0", shape=(), dtype=float32) has `None` for gradient while training DDPG

Whenever I try to run the ddpg_pendulum example (or any other DDPG example), I always get the error

ValueError: Variable Tensor("Mean_1:0", shape=(), dtype=float32) has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

Each time the training completes an interval, this problem occurs. Is there any way to get over it?

My Tensorflow and Keras versions are 2.1.0 and 2.3.1 respectively.

please add PPO, A3C...

Guys, Keras-rl is the best reinforcement learning library.
It is easy to handle despite the complexity of RL algorithms.
Keras-rl is far better than Stable Baselines.
Please add PPO, A3C and others, as DQN is the least capable RL algorithm.

Thanks!

Agent fails with sequential data

The Agent implementation fails for data of indeterminate length, such as temporal data. An environment that outputs data of the shape (None, data_dim) fails for an accompanying model with a fitting LSTM as first layer.

It appears that either the Agent or the standard Processor adds another dimension to the observation, causing a shape mismatch between the Environment output and the Model input, raising a ValueError. The shape received from an Environment that outputs (None, 10) is:

ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [1, 1, None, 10]

The first "1" refers to the batch dimension and is to be expected. As an immediate workaround, one can add a squeeze layer to the model, something along the lines of Input>Squeeze>LSTM>Output.
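The shape arithmetic can be sketched in a few lines. The assumption here, which matches the error message but is not confirmed by the maintainers in this thread, is that the second "1" is the window_length axis that the replay memory adds in front of each observation. Both helper names below are hypothetical and work on plain shape tuples, so no TensorFlow is needed to follow the reasoning.

```python
def agent_input_shape(observation_shape, window_length=1, batch_size=1):
    """Shape handed to the model: batch and window axes are prepended (assumed)."""
    return (batch_size, window_length) + tuple(observation_shape)


def squeeze_axis(shape, axis):
    """Shape after dropping a length-1 axis, as a squeeze layer would."""
    if shape[axis] != 1:
        raise ValueError("can only squeeze an axis of length 1")
    return shape[:axis] + shape[axis + 1:]


received = agent_input_shape((None, 10))
print(received)                   # (1, 1, None, 10): ndim=4, rejected by the LSTM
print(squeeze_axis(received, 1))  # (1, None, 10): ndim=3, what the LSTM expects
```

Under this reading, squeezing axis 1 is only valid while window_length is 1; a longer window would need a reshape instead.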

Check that you are up-to-date with the master branch of Keras-RL. You can update with:
pip install git+git://github.com/wau/keras-rl2.git --upgrade --no-deps
Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short). If you report an error, please include the error message and the backtrace.

Example Code:

import rl.memory
import rl.agents
import rl.core
import tensorflow as tf
import numpy as np

BATCH_SIZE = 1
DATA_DIM = 10

class Environment(rl.core.Env):
	def __init__(self, data_dim = 10, game_length = 50):
		self.reward_counter = 0
		self.data_dim = data_dim
		self.game_length = game_length
		self.reward = 0.1
		self.observation = [[0] * self.data_dim]
		self.observation[0][0] = 1
		self.done = False

	def step(self, action):
		action_number = np.argmax(action)
		# Defaults so that every returned value is defined on every code path.
		reward = 0
		done = self.done
		if not self.reward_counter + action_number % self.data_dim or np.random.rand() < 0.05:
			self.reward *= 1.1
			self.observation.append([0]*self.data_dim)
			self.observation[-1][self.reward_counter%self.data_dim] = 1
			self.reward_counter += 1
			reward = self.reward
		# The observation grows over the episode, giving shape (None, data_dim).
		observation = np.array(self.observation)
		if len(self.observation) > self.game_length and np.random.rand() < 0.05:
			self.done = True
			done = self.done
		info = {}
		return observation, reward, done, info

	def reset(self):
		self.done = False
		self.reward_counter = 0
		self.reward = 0.1
		self.observation = [[0] * self.data_dim]
		self.observation[0][0] = 1
		observation = self.observation
		observation = np.array(observation)
		return observation

	def close(self):
		self.__del__()

if __name__ == '__main__':
	lstm_input = tf.keras.Input(batch_shape = (BATCH_SIZE, 1, None, DATA_DIM))
	# lstm_input = tf.keras.backend.squeeze(lstm_input, 1) # uncomment squeeze layer to fix model.
	x = tf.keras.layers.LSTM(20)(lstm_input)
	x = tf.keras.layers.Dense(10, activation='softmax')(x) # output size doesn't actually matter here
	model = tf.keras.Model(inputs = [lstm_input], outputs = [x])

	memory = rl.memory.SequentialMemory(50000, window_length=BATCH_SIZE)
	processor = rl.core.Processor()

	agent = rl.agents.DQNAgent(model, memory=memory, processor=processor, nb_actions=10, batch_size=BATCH_SIZE)
	agent.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01))

	env = Environment(data_dim=DATA_DIM)

	agent.fit(env, nb_steps=int(5e5), log_interval=1000)

AttributeError: Tensor.op is meaningless when eager execution is enabled


I am running the example scripts after installing precisely as documented above.

python examples/dqn_cartpole.py

Produces the error and trace:

Training for 50000 steps ...
2020-04-06 16:34:29.589490: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1483] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/opt/venv/keras-rl2/lib/python3.6/site-packages/rl/memory.py:40: UserWarning: Not enough entries to sample without replacement. Consider increasing your warm-up phase to avoid oversampling!
  warnings.warn('Not enough entries to sample without replacement. Consider increasing your warm-up phase to avoid oversampling!')
Traceback (most recent call last):
  File "examples/dqn_cartpole.py", line 46, in <module>
    dqn.fit(env, nb_steps=50000, visualize=False, verbose=2)
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/rl/core.py", line 194, in fit
    metrics = self.backward(reward, terminal=done)
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/rl/agents/dqn.py", line 322, in backward
    metrics = self.trainable_model.train_on_batch(ins + [targets, masks], [dummy_targets, targets])
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 917, in train_on_batch
    self._make_train_function()
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1986, in _make_train_function
    **self._function_kwargs)
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3544, in function
    return EagerExecutionFunction(inputs, outputs, updates=updates, name=name)
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3438, in __init__
    add_sources=True, handle_captures=True, base_graph=source_graph)
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/eager/lift_to_graph.py", line 325, in lift_to_graph
    add_sources=add_sources))
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/eager/lift_to_graph.py", line 114, in _map_subgraph
    ops_to_visit = [_as_operation(init_tensor)]
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/eager/lift_to_graph.py", line 37, in _as_operation
    return op_or_tensor.op
  File "/opt/venv/keras-rl2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 987, in op
    "Tensor.op is meaningless when eager execution is enabled.")
AttributeError: Tensor.op is meaningless when eager execution is enabled.

(I'll note that editing the example to increase the warmup resolves the warning message, but I still see this same AttributeError message. I get the same error and trace when attempting the other examples as well.)

p.s. thanks for providing and maintaining this excellent resource!

Preprocessing Data from Observations

Hello,

is there a way to implement a custom preprocessing / featurizing routine into the training process?
Is such a feature already available?

I am currently making use of a featurizer to preprocess the observations from the environment.
As I haven't found a way to implement it into the agent, I had to define this preprocessor as a part of the environment.
Unfortunately, the preprocessor transforms the low-dimensional environment state into a high-dimensional feature vector,
which is then appended to the memory buffer.
Consequently, the training uses a huge amount of RAM, although it should be possible to perform the preprocessing just in time, directly after low-dimensional observations have been loaded from the memory.
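One way this could be approached is through the processor hooks. As far as I can tell, rl.core.Processor exposes a process_state_batch method that is applied to batches after they are sampled from memory and before they reach the network, which is where just-in-time featurization would fit. The sketch below is an illustration under that assumption: featurize and JustInTimeProcessor are hypothetical names, the squared-feature expansion is a stand-in for a real featurizer, and the class is written without inheriting from rl.core.Processor so that it runs on its own.

```python
import numpy as np


def featurize(batch):
    """Hypothetical featurizer: appends squared features along the last axis."""
    return np.concatenate([batch, batch ** 2], axis=-1)


class JustInTimeProcessor:
    """Sketch: store raw observations, featurize only when a batch is used."""

    def process_observation(self, observation):
        # Keep the low-dimensional observation so the memory stays small.
        return observation

    def process_state_batch(self, batch):
        # Assumed input shape: (batch_size, window_length, obs_dim).
        return featurize(np.asarray(batch, dtype="float32"))


processor = JustInTimeProcessor()
stored = np.ones((32, 1, 4))  # what the replay memory would hold
print(processor.process_state_batch(stored).shape)  # (32, 1, 8)
```

With this arrangement the model's input shape has to match the featurized dimension (8 here), while the memory only ever holds the 4-dimensional raw observations.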

Thank you.

AttributeError: 'Sequential' object has no attribute 'uses_learning_phase'

I tried running one of the DDPG examples: 'python ddpg_pendulum.py'
After 81 iterations the program stopped with an error:
AttributeError: 'Sequential' object has no attribute 'uses_learning_phase'

'Sequential' is a Keras object that indeed does not have this attribute in a few versions I checked.

Can you check this out and comment on this problem?

Thank you.

Package versions have conflicting dependencies

I am getting the following error trying to install keras-rl2 on my M1 Macbook. The tensorflow version I have is 2.4.0rc0.

ERROR: Cannot install keras-rl2==1.0.0, keras-rl2==1.0.1, keras-rl2==1.0.2, keras-rl2==1.0.3 and keras-rl2==1.0.4 because these package versions have conflicting dependencies.

What should I do to solve this problem?

AttributeError when running dqn_cartpole.py

I ran the suggested pip commands to update keras-rl2 and keras. No errors with the pip commands.

I am using Python 3.7.5

Any help on how to fix would be appreciated. When I run dqn_cartpole.py, I am getting the following error:

AttributeError: Tensor.op is meaningless when eager execution is enabled.

Full traceback:

Traceback (most recent call last):
  File "dqn_cartpole.py", line 46, in <module>
    dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/rl/core.py", line 194, in fit
    metrics = self.backward(reward, terminal=done)
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/rl/agents/dqn.py", line 324, in backward
    metrics = self.trainable_model.train_on_batch(ins + [targets, masks], [dummy_targets, targets])
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 917, in train_on_batch
    self._make_train_function()
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1986, in _make_train_function
    **self._function_kwargs)
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3544, in function
    return EagerExecutionFunction(inputs, outputs, updates=updates, name=name)
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3438, in __init__
    add_sources=True, handle_captures=True, base_graph=source_graph)
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 325, in lift_to_graph
    add_sources=add_sources))
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 114, in _map_subgraph
    ops_to_visit = [_as_operation(init_tensor)]
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 37, in _as_operation
    return op_or_tensor.op
  File "/home/valmiki/miniconda3/envs/gym_keras_rl_n2/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 987, in op
    "Tensor.op is meaningless when eager execution is enabled.")
AttributeError: Tensor.op is meaningless when eager execution is enabled.

"IndexError: list index out of range" in examples/dqn_cartpole.py

I ran the program "examples/dqn_cartpole.py", but it failed with this error:

IndexError: list index out of range

According to the message, it happens at line 46:

dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)

I'm using TensorFlow 2.3.0. Is that the cause of the error?
By the way, my Keras-RL2 version is 1.0.4.

TypeError: len is not well defined for symbolic Tensors. (activation_4/Identity:0) Please call `x.shape` rather than `len(x)` for shape information.

Running the latest Keras-RL with nightly TensorFlow 2.0 (tf-nightly-2.0-preview) on Python 3.6 (Anaconda), I get an error when trying to run:

https://github.com/wau/keras-rl2/blob/master/examples/dqn_atari.py

The output and traceback are:

2019-08-08 17:55:57.294275: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-08 17:55:57.312932: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fbb2e93a0a0 executing computations on platform Host. Devices:
2019-08-08 17:55:57.312954: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
permute (Permute)            (None, 84, 84, 4)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 20, 20, 32)        8224      
_________________________________________________________________
activation (Activation)      (None, 20, 20, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 9, 9, 64)          32832     
_________________________________________________________________
activation_1 (Activation)    (None, 9, 9, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 64)          36928     
_________________________________________________________________
activation_2 (Activation)    (None, 7, 7, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               1606144   
_________________________________________________________________
activation_3 (Activation)    (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 2052      
_________________________________________________________________
activation_4 (Activation)    (None, 4)                 0         
=================================================================
Total params: 1,686,180
Trainable params: 1,686,180
Non-trainable params: 0
_________________________________________________________________
None
Traceback (most recent call last):
  File "atari2.py", line 96, in <module>
    train_interval=4, delta_clip=1.)
  File "/Users/jheaton/.local/lib/python3.6/site-packages/rl/agents/dqn.py", line 107, in __init__
    if hasattr(model.output, '__len__') and len(model.output) > 1:
  File "/Users/jheaton/miniconda3/envs/tensorflow2/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 738, in __len__
    "shape information.".format(self.name))
TypeError: len is not well defined for symbolic Tensors. (activation_4/Identity:0) Please call `x.shape` rather than `len(x)` for shape information.

Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question in the Discord.

Thank you!


DDPGAgent is incompatible with MultiInputProcessor for HandReach-v0 env

DDPGAgent fails to train on the critic model while using a MultiInputProcessor within its backward method, specifically at lines 260-263:

                if len(self.critic.inputs) >= 3:
                    state1_batch_with_action = state1_batch[:]
                else:
                    state1_batch_with_action = [state1_batch]
                state1_batch_with_action.insert(self.critic_action_input_idx, target_actions)

This throws the error TypeError: unhashable type: 'slice' since state1_batch is a dictionary with three keys, as returned from the processor. It seems that this chunk of code automatically assumes that state1_batch will be a list instead of a dictionary. The same can be said a few lines down with state0_batch. I would love to be able to fix this myself, but am unsure why there was a hardcoded 3 in the logic or why the length of the inputs would make a difference. I'd love to understand if someone is willing to explain.
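The mismatch can be illustrated without the library. A dictionary cannot be sliced with [:], so the copy step fails before the action is ever inserted. A possible workaround, sketched below under assumptions, is to normalize the batch to a list first. The helper names and the key_order parameter are hypothetical; the order of the resulting list would have to match the order of the critic's observation inputs, which this sketch cannot know. My reading of the hardcoded 3 is that a critic with at least three inputs has one action input plus two or more observation inputs, so the state batch is expected to already be a list of arrays, but that is an interpretation rather than something stated in the code.

```python
def as_input_list(state_batch, key_order=None):
    """Return the batch as a new list, whether it arrived as a dict or a sequence."""
    if isinstance(state_batch, dict):
        # key_order is a hypothetical parameter; sorted keys are only a fallback.
        keys = key_order if key_order is not None else sorted(state_batch)
        return [state_batch[k] for k in keys]
    return list(state_batch)


def with_action(state_batch, action_batch, action_idx, key_order=None):
    """Insert the action batch at the critic's action input position."""
    inputs = as_input_list(state_batch, key_order)
    inputs.insert(action_idx, action_batch)
    return inputs


# Placeholder strings stand in for the per-key observation arrays.
batch = {"goal": "G", "observation": "O"}
print(with_action(batch, "A", 1, key_order=["observation", "goal"]))
# ['O', 'A', 'G']
```

Because as_input_list always builds a new list, the original batch is left untouched, which is what the [:] copy in the quoted code was presumably meant to guarantee.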

Here is the script: hand_reach.py


Pip install fails because of tf-version (pypi lists only version 1.0.2)

> sudo pip install keras-rl2 --no-cache-dir
Collecting keras-rl2
  Downloading keras-rl2-1.0.3.tar.gz (40 kB)
     |████████████████████████████████| 40 kB 308 kB/s 
ERROR: Could not find a version that satisfies the requirement tensorflow==2.0.0-beta1 (from keras-rl2) (from versions: none)
ERROR: No matching distribution found for tensorflow==2.0.0-beta1 (from keras-rl2)

The library seems to expect an exact tensorflow version

pip install git+git://github.com/wau/keras-rl2.git --upgrade --no-deps

works.

I think pip tries to install 1.0.2

It requires tensorflow==2.0.0-beta1 and should require the stable version

I had the tensorflow 2.0.0 but when I installed keras-rl with pip it uninstalled the stable version and installed the beta1.
It should require a version greater than or equal to 2.0.0.

AttributeError: No rl.version information

When I run:

import rl

rl.__version__

I get

AttributeError: module 'rl' has no attribute '__version__'

Thanks!
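Until the module exposes a version attribute, the installed version can usually be read from the package metadata instead. The sketch below uses the standard-library importlib.metadata module, which requires Python 3.8 or newer; installed_version is a hypothetical helper name, and "keras-rl2" is the distribution name used by the pip command in the README.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if keras-rl2 is installed, otherwise None.
print(installed_version("keras-rl2"))
```

Note that the lookup uses the distribution name (keras-rl2), not the import name (rl).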

Soft Actor Critic

Can you please add the SAC algorithm?

ERROR: keras-rl2 1.0.3 has requirement tensorflow==2.0.0-beta1, but you'll have tensorflow 2.1.0 which is incompatible.

Hello,

The stated requirement for keras-rl2 is tensorflow==2.1.0.
When I installed keras-rl2 with pip, it removed my tensorflow 2.1.0 and installed the 2.0.0-beta1 version.

My code did not run, so I tried to install tensorflow 2.1.0 as required, but got this error:
"ERROR: keras-rl2 1.0.3 has requirement tensorflow==2.0.0-beta1, but you'll have tensorflow 2.1.0 which is incompatible.
Installing collected packages: tensorflow
Attempting uninstall: tensorflow
Found existing installation: tensorflow 2.0.0b1
Uninstalling tensorflow-2.0.0b1:
Successfully uninstalled tensorflow-2.0.0b1
Successfully installed tensorflow-2.1.0"

[ X] Check that you are up-to-date with the master branch of Keras-RL. You can update with:
pip install git+git://github.com/wau/keras-rl2.git --upgrade --no-deps
[ X] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps

Reference to model is changed when training starts

Hello. First of all, I am using version 1.0.3, but the issue also holds for the latest version (1.0.4). My problem is that the reference to my model gets changed. I need to use the reference to my created model throughout the training, but as soon as the training starts, it suddenly becomes a reference to an Agent, in my case a DQNAgent.

It would be great not to have any side effects when starting the training, since the model variable could still be used after having started the training, as was my case. I have been searching for solutions and I found out that I was not the only one having problems with this: there are some issues on the original repository which indirectly address this as well. As a result, this library (as well as its predecessor) ends up not being fully compatible with some of the original Keras stuff, such as the TensorBoard callback: see keras-rl/keras-rl#255.