archsyscall / deeprl-tensorflow2

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

License: Apache License 2.0

Python 100.00%
tensorflow machine-learning reinforcement-learning a2c a3c reinforce dqn trpo ppo sac ddpg deep-learning deep-reinforcement-learning tensorflow2 dueling-dqn double-dqn rainbow-dqn

deeprl-tensorflow2's Introduction


Deep Reinforcement Learning in TensorFlow2

DeepRL-TensorFlow2 is a repository that implements a variety of popular Deep Reinforcement Learning algorithms using TensorFlow2. The priority of this repository is easy-to-understand code, so if you are a student or a researcher studying Deep Reinforcement Learning, I think it is a great place to start. Each algorithm is contained in a single Python script, so you don't have to jump between files to study a specific algorithm. This repository is constantly being updated, and new Deep Reinforcement Learning algorithms will continue to be added.

Algorithms


DQN

Paper: Playing Atari with Deep Reinforcement Learning
Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

# idea01. Approximate the Q-function with a neural network
# (assumes: from tensorflow.keras.layers import Input, Dense; from tensorflow.keras.optimizers import Adam)
def create_model(self):
    model = tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model

# idea02. Use a target network to stabilize the bootstrapped Q-targets
self.target_model = ActionStateModel(self.state_dim, self.action_dim)
 
# idea03. Use a ReplayBuffer to increase data efficiency
# (assumes: from collections import deque; import random; import numpy as np)
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    
    def put(self, state, action, reward, next_state, done):
        self.buffer.append([state, action, reward, next_state, done])
    
    def sample(self):
        sample = random.sample(self.buffer, args.batch_size)
        states, actions, rewards, next_states, done = map(np.asarray, zip(*sample))
        states = np.array(states).reshape(args.batch_size, -1)
        next_states = np.array(next_states).reshape(args.batch_size, -1)
        return states, actions, rewards, next_states, done
    
    def size(self):
        return len(self.buffer)
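
For context, here is a minimal sketch of how these pieces typically fit together in a DQN training step; the method names (target_update, replay) and the surrounding agent structure are illustrative assumptions, not necessarily the repository's exact code.

# Hypothetical sketch: sync the target network and fit Q-values on a sampled minibatch
def target_update(self):
    # periodically copy the online network's weights into the target network
    weights = self.model.model.get_weights()
    self.target_model.model.set_weights(weights)

def replay(self):
    # regress Q(s, a) toward r + gamma * max_a' Q_target(s', a')
    states, actions, rewards, next_states, done = self.buffer.sample()
    targets = self.model.predict(states)
    next_q_values = self.target_model.predict(next_states).max(axis=1)
    targets[range(args.batch_size), actions] = rewards + (1 - done) * next_q_values * args.gamma
    self.model.train(states, targets)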

Getting Started

# Discrete Action Space Deep Q-Learning
$ python DQN/DQN_Discrete.py

DRQN

Paper: Deep Recurrent Q-Learning for Partially Observable MDPs
Authors: Matthew Hausknecht, Peter Stone
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

# idea01. Feed the recent state history through an LSTM layer so the agent can handle partial observability
def create_model(self):
    return tf.keras.Sequential([
        Input((args.time_steps, self.state_dim)),
        LSTM(32, activation='tanh'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
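
Since the LSTM expects a sequence, the agent has to maintain a short history of observations. A minimal sketch of one way to do this, assuming the history is stored as a NumPy array of shape (time_steps, state_dim); the attribute name self.states is illustrative:

import numpy as np

# Hypothetical helper: keep a sliding window of the most recent observations
def update_states(self, next_state):
    self.states = np.roll(self.states, -1, axis=0)  # drop the oldest observation
    self.states[-1] = next_state                    # append the newest one

# usage (illustrative): q_values = self.model.predict(
#     self.states.reshape(1, args.time_steps, self.state_dim))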

Getting Started

# Discrete Action Space Deep Recurrent Q-Learning
$ python DRQN/DRQN_Discrete.py

DoubleDQN

Paper: Deep Reinforcement Learning with Double Q-learning
Authors: Hado van Hasselt, Arthur Guez, David Silver
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

# idea01. Mitigate Q-value overestimation: select the next action with the online network,
#         but evaluate it with the target network
on_action = np.argmax(self.model.predict(next_states), axis=1)
next_q_values = self.target_model.predict(next_states)[range(args.batch_size), on_action]
targets[range(args.batch_size), actions] = rewards + (1-done) * next_q_values * args.gamma
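
For comparison, vanilla DQN both selects and evaluates the next action with the same target network, which is the source of the overestimation bias; a sketch using the same variable names as above:

# Vanilla DQN target for comparison: the max over the target network's own estimates,
# so one network both picks and scores the action
next_q_values = self.target_model.predict(next_states).max(axis=1)
targets[range(args.batch_size), actions] = rewards + (1 - done) * next_q_values * args.gamma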

Getting Started

# Discrete Action Space Double Deep Q-Learning
$ python DoubleDQN/DoubleDQN_Discrete.py

DuelingDQN

Paper: Dueling Network Architectures for Deep Reinforcement Learning
Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

# idea01. Split the Q-function into a state-value stream and an advantage stream
def create_model(self):
    state_input = Input((self.state_dim,))
    backbone_1 = Dense(32, activation='relu')(state_input)
    backbone_2 = Dense(16, activation='relu')(backbone_1)
    value_output = Dense(1)(backbone_2)
    advantage_output = Dense(self.action_dim)(backbone_2)
    output = Add()([value_output, advantage_output])
    model = tf.keras.Model(state_input, output)
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model
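
Note that the dueling paper aggregates the two streams as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), which keeps V and A identifiable; the plain Add() above omits the mean subtraction. A sketch of the paper's aggregation, reusing value_output and advantage_output from the model above (the Lambda layer is an assumed addition):

from tensorflow.keras.layers import Lambda
import tensorflow as tf

# subtract the mean advantage before adding the state value
advantage_centered = Lambda(
    lambda a: a - tf.reduce_mean(a, axis=1, keepdims=True))(advantage_output)
output = Add()([value_output, advantage_centered])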

Getting Started

# Discrete Action Space Dueling Deep Q-Learning
$ python DuelingDQN/DuelingDQN_Discrete.py

A2C

Paper: Actor-Critic Algorithms
Authors: Vijay R. Konda, John N. Tsitsiklis
Method: On-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

# idea01. Use the advantage (TD target minus baseline) to reduce variance
def advantage(self, td_targets, baselines):
    return td_targets - baselines
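
The td_targets fed into this function are typically one-step bootstrapped returns computed with the critic; a minimal sketch under that assumption (method and attribute names are illustrative):

import numpy as np

# Hypothetical helper: one-step TD target, using the critic's value estimate to bootstrap
def td_target(self, reward, next_state, done):
    if done:
        return reward
    v_value = self.critic.model.predict(np.reshape(next_state, [1, self.state_dim]))
    return reward + args.gamma * v_value[0][0]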

Getting Started

# Discrete Action Space Advantage Actor-Critic
$ python A2C/A2C_Discrete.py

# Continuous Action Space Advantage Actor-Critic
$ python A2C/A2C_Continuous.py

A3C

Paper: Asynchronous Methods for Deep Reinforcement Learning
Authors: Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
Method: On-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

# idea01. Reduce the correlation between samples by running multiple workers asynchronously
def train(self, max_episodes=1000):
    workers = []

    for i in range(self.num_workers):
        env = gym.make(self.env_name)
        workers.append(WorkerAgent(
            env, self.global_actor, self.global_critic, max_episodes))

    for worker in workers:
        worker.start()

    for worker in workers:
        worker.join()

# idea02. Improve exploration through an entropy bonus
entropy_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
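
The entropy bonus is usually computed from the policy's probabilities and subtracted from the advantage-weighted policy loss with a small coefficient. A hedged sketch of how the pieces can be combined (the 0.01 coefficient and the variable names actions_onehot, logits, advantages are illustrative):

# advantage-weighted policy-gradient loss minus an entropy bonus
ce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
policy_loss = ce(actions_onehot, logits, sample_weight=tf.stop_gradient(advantages))
probs = tf.nn.softmax(logits)
entropy = -tf.reduce_sum(probs * tf.math.log(probs + 1e-8), axis=1)
total_loss = policy_loss - 0.01 * tf.reduce_mean(entropy)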

Getting Started

# Discrete Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Discrete.py

# Continuous Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Continuous.py

PPO

Paper: Proximal Policy Optimization Algorithms
Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Method: On-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

# idea01. Use importance sampling (the probability ratio) so data collected by the old policy can be reused
# idea02. Clip the ratio to prevent destructively large policy updates
def compute_loss(self, old_policy, new_policy, actions, gaes):
    gaes = tf.stop_gradient(gaes)
    old_log_p = tf.math.log(
        tf.reduce_sum(old_policy * actions))
    old_log_p = tf.stop_gradient(old_log_p)
    log_p = tf.math.log(tf.reduce_sum(
        new_policy * actions))
    ratio = tf.math.exp(log_p - old_log_p)
    clipped_ratio = tf.clip_by_value(
        ratio, 1 - args.clip_ratio, 1 + args.clip_ratio)
    surrogate = -tf.minimum(ratio * gaes, clipped_ratio * gaes)
    return tf.reduce_mean(surrogate)
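
The gaes passed into compute_loss are typically produced by Generalized Advantage Estimation; a sketch of that computation (the function name and args.gae_lambda are assumed names, not necessarily the repository's):

import numpy as np

# Hypothetical helper: GAE over one rollout, walking backwards through the TD errors
def gae_target(self, rewards, v_values, next_v_value, done):
    gaes = np.zeros_like(rewards, dtype=np.float64)
    td_targets = np.zeros_like(rewards, dtype=np.float64)
    gae_cumulative = 0.0
    forward_val = 0.0 if done else next_v_value
    for k in reversed(range(len(rewards))):
        delta = rewards[k] + args.gamma * forward_val - v_values[k]
        gae_cumulative = args.gamma * args.gae_lambda * gae_cumulative + delta
        gaes[k] = gae_cumulative
        forward_val = v_values[k]
        td_targets[k] = gaes[k] + v_values[k]
    return gaes, td_targets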

Getting Started

# Discrete Action Space Proximal Policy Optimization
$ python PPO/PPO_Discrete.py

# Continuous Action Space Proximal Policy Optimization
$ python PPO/PPO_Continuous.py

DDPG

Paper: Continuous control with deep reinforcement learning
Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Continuous

Core Ideas

# idea01. Use a deterministic actor that maps states directly to actions
def create_model(self):
    return tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(32, activation='relu'),
        Dense(self.action_dim, activation='tanh'),
        Lambda(lambda x: x * self.action_bound)
    ])

# idea02. Add exploration noise to the action, then clip it to the valid action range
action = np.clip(action + noise, -self.action_bound, self.action_bound)
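
The exploration noise is commonly drawn from an Ornstein-Uhlenbeck process so that it is correlated over time; a sketch with standard default parameters (theta, sigma, and dt are illustrative values, not necessarily the repository's):

import numpy as np

# Hypothetical helper: mean-reverting Ornstein-Uhlenbeck noise
def ou_noise(x, mu=0.0, theta=0.15, sigma=0.2, dt=1e-1, dim=1):
    # pull x back toward mu, then add a scaled Gaussian perturbation
    return x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * np.random.normal(size=dim)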

Getting Started

# Continuous Action Space Deep Deterministic Policy Gradient
$ python DDPG/DDPG_Continuous.py

TRPO

Paper: Trust Region Policy Optimization
Authors: John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
Method: On-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

# NOTE: Not yet implemented!

TD3

Paper: Addressing Function Approximation Error in Actor-Critic Methods
Authors: Scott Fujimoto, Herke van Hoof, David Meger
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Continuous

# NOTE: Not yet implemented!

SAC

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Authors: Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Method: Off-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

# NOTE: Not yet implemented!

deeprl-tensorflow2's People

Contributors

archsyscall, pathway

deeprl-tensorflow2's Issues

Problem in A3C continuous

Hello everyone.
I am trying to run the continuous A3C script, but I get an error saying "unrecognized arguments". Please see the attached screenshots.
[screenshots attached]

How can I solve this?

Any idea why DQN is slow on CPU and on GPU?

The issue is probably not DQN-specific; DQN_Discrete.py is just the only module I tried, both on my MacBook Pro and on Google Colab. It runs okay, but both runs take almost the same time. To activate the GPU, I added the following lines to main():

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
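
A quick way to confirm whether TensorFlow is actually placing work on the GPU is to list the visible devices and enable device-placement logging (standard TF 2.x calls; the exact output depends on the setup):

import tensorflow as tf

# show the GPUs TensorFlow can see, and log which device each op runs on
print(tf.config.list_physical_devices('GPU'))
tf.debugging.set_log_device_placement(True)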

Update:
The wandb report shows 0% GPU utilization; you can check the graphs a few minutes after starting training here.

A3C_continues.py

The following traceback is printed repeatedly:

Traceback (most recent call last):
  File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 165, in __del__
    self.close()
  File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 81, in close
    if self.isopen and sys.meta_path:
AttributeError: 'Viewer' object has no attribute 'isopen'

I don't know how to fix it.

Hyper-parameters for successful DQN Agent

Hi @marload,

Great repository you have here 😄! I am running your DQN script and trying to solve CartPole with it (i.e., consistently getting a score above 200).

I ran the script with the default parameters, but the agent is having trouble learning a successful policy. All I get are scores fluctuating between 10 and 100 over the first 800 episodes of training. There was one episode above 200, but it came early in training, and bearing in mind that epsilon would still have been very high at that point, I think it was down to chance.

So my question is: if you have trained a successful agent with this algorithm, can you share the "working" parameters? Or is DQN just unstable by nature, so I should run the script a couple more times and hope for something better?

I have not reviewed the code thoroughly, because I wanted to see it working first, but at first glance, it looks clean and simple.

Anyway, thanks for posting it on Reddit, not sure why it was deleted. I hope I can learn a thing or two from it since I am working on something similar at the moment. 😄

Have a great day!
