needle

Introduction

This is my personal implementation of several reinforcement learning algorithms, some of which are cutting edge, including:

  1. Deep Q-Network (DQN)
  2. Deep Deterministic Policy Gradient (DDPG)
  3. Asynchronous Advantage Actor-Critic (A3C)
  4. REINFORCE
  5. Truncated Natural Policy Gradient (TNPG) (I may have cited the wrong paper, since it doesn't use Conjugate Gradient to solve the equations)
  6. Trust Region Policy Optimization (TRPO) with Generalized Advantage Estimation (GAE); a sketch of GAE follows this list
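
For reference, here is a minimal NumPy sketch of Generalized Advantage Estimation on a single finished trajectory; the function name, signature, and the default gamma/lambda values are illustrative and may not match the library's actual implementation:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    `rewards` has length T; `values` has length T + 1 (the last entry is the
    bootstrap value of the final state, 0 if the episode terminated)."""
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum of residuals
        advantages[t] = running
    return advantages
```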

Some optimizations are included: Double DQN is implemented in place of the vanilla DQN update, and prioritized sampling is currently under development.
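
As a rough illustration of the difference (not the library's actual code; the array names below are made up), the Double DQN target uses the online network to choose the next action and the target network to evaluate it:

```python
import numpy as np

def q_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute one-step targets for a batch of transitions.

    q_online_next / q_target_next: Q-values of the next states under the
    online and target networks, shape (batch, num_actions)."""
    not_done = 1.0 - dones
    # Vanilla DQN: the target network both selects and evaluates the action.
    vanilla = rewards + gamma * not_done * q_target_next.max(axis=1)
    # Double DQN: the online network selects, the target network evaluates.
    best = q_online_next.argmax(axis=1)
    double = rewards + gamma * not_done * q_target_next[np.arange(len(rewards)), best]
    return vanilla, double
```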

The library is inspired by the paper Benchmarking Deep Reinforcement Learning for Continuous Control, whose home page is here. If you find duplicated code, that's my fault; I promise, however, that I wrote every line of code myself.

Much of the code is ad hoc and needs refactoring. Issues and discussions are always appreciated.

Tests

I developed the library on an ancient MacBook Air (Mid 2013, i5 with 4 GB RAM) without a GPU, so you should have no problem running any of these toy experiments.

Only a few examples are available right now because of remaining bugs, but DDPG should succeed. All code depends on OpenAI Gym and TensorFlow, so please install both before running any experiments.
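
For orientation, here is a generic rollout loop using the classic Gym API; it is not this repo's code (the agents launched from main.py presumably wrap a similar loop), and the random policy below stands in for a trained agent:

```python
import gym

# Classic Gym API: reset() -> observation, step(a) -> (observation, reward, done, info).
env = gym.make("CartPole-v0")
observation = env.reset()
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()  # placeholder for a trained agent's action
    observation, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```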

Example commands:

python main.py --mode train --agent DDPG      --env MountainCarContinuous-v0
python main.py --mode train --agent REINFORCE --env CartPole-v0               --batch_size 10 --iterations 8000 --learning_rate 0.1
python main.py --mode train --agent A2C       --env CartPole-v0               --replay_buffer_size 200 --batch_size 200
python main.py --mode train --agent A2C       --env Copy-v0                   --replay_buffer_size 200 --batch_size 200 --iterations 6000
python main.py --mode train --agent TNPG      --env Copy-v0                   --batch_size 10 --iterations 8000
python main.py --mode train --agent TRPO      --env Copy-v0                   --batch_size 10 --iterations 8000

Notes

An experimental A2C (synchronous advantage actor-critic) agent runs on CartPole-v0. Note that A2C uses an LSTM by default.

A2C on `Copy-v0` succeeds with probability about 0.7 after 4k-6k steps; otherwise it gets stuck in a local minimum where, for some specific characters, the agent always moves left. I find that using a small learning rate for the actor helps it reach the global minimum.

TNPG sometimes solves `Copy-v0` in ~1k steps. More experiments are needed.

Question: can we combine TNPG and A3C with an LSTM? The actor and critic networks share many weights, so how should we apply a suitable gradient to them? (A sketch of the usual shared-loss approach follows.)
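
For the non-trust-region part of that question, the usual A3C/A2C answer is to minimize a single weighted sum of the actor and critic losses (plus an entropy bonus) over the shared parameters. Below is a minimal NumPy sketch of those loss terms for a discrete softmax policy; the function name and coefficients are illustrative and not taken from this repo:

```python
import numpy as np

def actor_critic_loss(logits, values, actions, returns,
                      value_coef=0.5, entropy_coef=0.01):
    """Combined loss whose gradient is applied to the shared weights."""
    # Softmax over action logits, shape (batch, num_actions).
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    log_pi = np.log(probs[np.arange(len(actions)), actions])
    advantages = returns - values
    policy_loss = -(log_pi * advantages).mean()          # actor term
    value_loss = 0.5 * ((returns - values) ** 2).mean()  # critic term
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=1).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```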

5 independent runs of TNPG (batch size 10, delta_KL = 0.001):

./doc/TNPG.png
