alfredvc / paac

Open source implementation of the PAAC algorithm presented in Efficient Parallel Methods for Deep Reinforcement Learning
Home Page: https://arxiv.org/abs/1705.04862
License: Other
Hello,
May I ask a naive question: did you try implementing an LSTM on this architecture? Or did you already try it and find it not as efficient (maybe too time consuming?) as people expect?
In any case, thanks for such a non-hardware-demanding idea/architecture.
Best,
Chih-Chieh
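One practical wrinkle an LSTM would add to a synchronous setup like PAAC is bookkeeping: every one of the n_e environments steps in lockstep, so the recurrent state must be carried between batched forward passes and reset for whichever environments just finished an episode. A minimal numpy sketch of that bookkeeping (the masking only, not the LSTM cell itself; the function name and shapes are illustrative assumptions, not code from this repo):

```python
import numpy as np

def carry_lstm_state(h, c, dones):
    """Reset the recurrent state of environments whose episode just ended.

    h, c:  (n_envs, hidden_size) hidden and cell states from the last step
    dones: (n_envs,) 1.0 where the episode terminated, else 0.0
    Returns the states to feed into the next batched LSTM step.
    """
    mask = (1.0 - np.asarray(dones))[:, None]  # broadcast over the hidden dim
    return h * mask, c * mask
```

The extra cost is mostly this state shuffling plus the loss of the single big batched convolution over time steps, which may be why an LSTM is often assumed to hurt throughput in this design.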
Hi,
Thanks for the great implementation. I am currently learning RL and trying to adapt paac to the simple use case of CartPole. I modified the paac code to add a new environment for CartPole and replaced the NIPS network with a simple linear network. In essence, I am trying to reproduce the A3C implementation of CartPole from https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py. Running paac on CartPole never seems to converge to higher rewards; the maximum reward I get is around 30. I understand that every environment needs hyperparameter tuning, but I don't know what else to try to make it work for a simple use case like CartPole. The reference A3C implementation at https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py converges to successful rewards after a few thousand steps, but the paac implementation never moves beyond 30. Can you recommend anything else I can do to make it work, or am I missing any fundamental settings? The changes I have already tried are:
The paac model is capable of solving much more complicated environments, and I am surprised that it struggles with the classic and simplest CartPole problem. I expected paac to solve CartPole much faster than CPU-based A3C.
Thanks in advance
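When a simple environment like CartPole stalls at a low reward, one of the first things worth checking is the return/advantage computation, since PAAC (like A3C) trains on n-step bootstrapped returns and a sign or masking bug there silently kills learning. A minimal numpy sketch of that computation for one environment, under the assumption that terminals are 1.0 at episode ends (illustrative helper, not code from this repo):

```python
import numpy as np

def n_step_returns(rewards, terminals, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1}, walked backwards,
    bootstrapping from the critic's value at the state after the last step
    and cutting the recursion at episode boundaries."""
    R = bootstrap_value
    returns = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * (1.0 - terminals[t]) * R
        returns[t] = R
    return returns
```

Comparing these values against the ones your modified code feeds into the policy-gradient loss is a quick sanity check before reaching for hyperparameter tuning.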
Looking at a handful of A3C implementations and their results on Seaquest, they appear to score around 50K:
PAAC, however, reaches a plateau around 2K according to our tests (similar to your paper). Visual inspection of the policy shows that the submarine does not resurface. While this is a common difficulty of the game, A3C appears to be able to overcome it (maybe due to a modification in OpenAI Gym, since their Atari setup has some differences from ALE).
We've looked at various exploration strategies (ε-greedy, Boltzmann, Bayesian dropout), with no improvement so far.
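For reference, the two simpler exploration schemes mentioned above can be sketched in a few lines of numpy; this is a generic illustration of ε-greedy and Boltzmann (softmax-with-temperature) action selection, not the exact code used in our tests:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon take a uniform random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(logits, temperature, rng):
    """Sample an action from softmax(logits / temperature).
    Lower temperature -> closer to greedy; higher -> closer to uniform."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                      # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))
```

Neither scheme changes what the policy can represent; they only reshape how often the agent deviates from its current best action, which may be why they did not help with the resurfacing behaviour.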
Do you see any particular reason PAAC would underperform in this case? An LSTM might help, but from the two OpenAI Gym pointers above, it seems it should not be critical for Seaquest.