alfredvc / paac

Open source implementation of the PAAC algorithm presented in Efficient Parallel Methods for Deep Reinforcement Learning
Home Page: https://arxiv.org/abs/1705.04862
License: Other
Hello,
May I ask a naive question: did you try implementing an LSTM on this architecture? Or did you already try it and find it not as efficient (maybe too time consuming?) as people expect?
In any case, thanks for such a non-hardware-demanding idea/architecture.
Best,
Chih-Chieh
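One practical wrinkle an LSTM would add to a synchronous setup like PAAC is bookkeeping: every one of the n_e environments steps in lockstep, so the recurrent state must be carried between batched forward passes and reset for whichever environments just finished an episode. A minimal numpy sketch of that bookkeeping (the masking only, not the LSTM cell itself; the function name and shapes are illustrative assumptions, not code from this repo):

```python
import numpy as np

def carry_lstm_state(h, c, dones):
    """Reset the recurrent state of environments whose episode just ended.

    h, c:  (n_envs, hidden_size) hidden and cell states from the last step
    dones: (n_envs,) 1.0 where the episode terminated, else 0.0
    Returns the states to feed into the next batched LSTM step.
    """
    mask = (1.0 - np.asarray(dones))[:, None]  # broadcast over the hidden dim
    return h * mask, c * mask
```

The extra cost is mostly this state shuffling plus the loss of the single big batched convolution over time steps, which may be why an LSTM is often assumed to hurt throughput in this design.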
Hi,
Thanks for the great implementation. I am currently learning RL and trying to adapt paac to the simple use case of CartPole. I modified the paac code to add a new environment for CartPole and replaced the NIPS network with a simple linear network. In essence, I am trying to reproduce the A3C implementation of CartPole from https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py. Running paac on CartPole never seems to converge to higher rewards; the maximum reward I get is around 30. I understand that every environment needs hyperparameter tuning, but I don't know what else to try to make it work for a simple use case like CartPole. The reference A3C implementation at https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py converges to successful rewards after a few thousand steps, but the paac implementation never moves beyond 30. Can you recommend anything else I can do to make it work, or am I missing any fundamental settings? The changes I have already tried are:
The paac model is capable of solving much more complicated environments, and I am surprised that it struggles with the classic and simplest CartPole problem. I expected paac to solve CartPole much faster than CPU-based A3C.
Thanks in advance
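When a simple environment like CartPole stalls at a low reward, one of the first things worth checking is the return/advantage computation, since PAAC (like A3C) trains on n-step bootstrapped returns and a sign or masking bug there silently kills learning. A minimal numpy sketch of that computation for one environment, under the assumption that terminals are 1.0 at episode ends (illustrative helper, not code from this repo):

```python
import numpy as np

def n_step_returns(rewards, terminals, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1}, walked backwards,
    bootstrapping from the critic's value at the state after the last step
    and cutting the recursion at episode boundaries."""
    R = bootstrap_value
    returns = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * (1.0 - terminals[t]) * R
        returns[t] = R
    return returns
```

Comparing these values against the ones your modified code feeds into the policy-gradient loss is a quick sanity check before reaching for hyperparameter tuning.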
Looking at a handful of A3C implementations and their results on Seaquest, they appear to score around 50K:
PAAC, however, reaches a plateau around 2K according to our tests (similar to your paper). Visual inspection of the policy shows that the submarine does not resurface. While this is a common difficulty of the game, A3C appears to be able to overcome it (maybe due to a modification in OpenAI Gym, since their Atari setup has some differences from ALE).
We've looked at various exploration strategies (ε-greedy, Boltzmann, Bayesian dropout), with no improvement so far.
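For reference, the two simpler exploration schemes mentioned above can be sketched in a few lines of numpy; this is a generic illustration of ε-greedy and Boltzmann (softmax-with-temperature) action selection, not the exact code used in our tests:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon take a uniform random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(logits, temperature, rng):
    """Sample an action from softmax(logits / temperature).
    Lower temperature -> closer to greedy; higher -> closer to uniform."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                      # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))
```

Neither scheme changes what the policy can represent; they only reshape how often the agent deviates from its current best action, which may be why they did not help with the resurfacing behaviour.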
Do you see any particular reason PAAC would underperform in this case? An LSTM might help, but from the two OpenAI Gym pointers above, it seems it should not be critical for Seaquest.