
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

Badges: black, flake8, isort, and pytest build status; numpydoc docstring style; pre-commit.

This repository is an implementation of the paper Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method (Riedmiller, 2005).

Please โญ this repository if you found it useful!


Table of Contents 📜

  • Summary
  • Installation
  • Running
  • Results
  • Differences from the Paper
  • Reproducibility

For implementations of other deep learning papers, check the implementations repository!


Summary 📝

Neural Fitted Q-Iteration (NFQ) uses a multilayer neural network as its Q-network: the input is the observation (s) and the action (a), and the output is the action value Q(s, a). Instead of online Q-learning, the paper proposes offline batch updates: experience is collected throughout the episode, and the network is then retrained on that whole batch. The paper also suggests a hint-to-goal heuristic, where the network is explicitly trained on artificial transitions inside the goal region so that it correctly estimates the value of the goal region. A minimal sketch of one fitting step is shown below.
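
For intuition, here is a minimal sketch of one NFQ fitting step, assuming a cost-based formulation (lower Q is better) and a binary action set as in the Cartpole Regulator task. Names such as q_net and batch, and the use of an MSE loss with PyTorch's Rprop optimizer, are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

def nfq_fit(q_net, optimizer, batch, gamma=0.95, actions=(0.0, 1.0)):
    """One batch fit: regress Q(s, a) toward c + gamma * min_a' Q(s', a')."""
    states, taken_actions, costs, next_states, dones = batch

    # Targets take the minimum over actions because Q predicts cost-to-go;
    # terminal (goal/forbidden) states bootstrap to 0.
    with torch.no_grad():
        next_qs = torch.stack(
            [q_net(torch.cat([next_states, torch.full((len(next_states), 1), a)], dim=1)).squeeze(1)
             for a in actions],
            dim=1,
        )
        targets = costs + gamma * next_qs.min(dim=1).values * (1 - dones)

    # Hint-to-goal: artificial goal-region (state, action) pairs with target cost 0
    # would be appended to the batch here before fitting.
    preds = q_net(torch.cat([states, taken_actions], dim=1)).squeeze(1)
    loss = nn.functional.mse_loss(preds, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the paper uses Rprop, e.g. torch.optim.Rprop(q_net.parameters())
    return loss.item()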

Installation 🧱

First, clone this repository from GitHub. Since this repository contains submodules, you should use the --recursive flag.

git clone --recursive https://github.com/seungjaeryanlee/implementations-nfq.git

If you already cloned the repository without the flag, you can download the submodules separately with the git submodule command:

git clone https://github.com/seungjaeryanlee/implementations-nfq.git
git submodule update --init --recursive

After cloning the repository, use requirements.txt for a simple installation of the required PyPI packages.

pip install -r requirements.txt

You can read more about each package in the comments of the requirements.txt file!

Running ๐Ÿƒ

You can train the NFQ agent on Cartpole Regulator using the given configuration file with the command below:

python train_eval.py -c cartpole.conf

For a reproducible run, use the --RANDOM_SEED flag.

python train_eval.py -c cartpole.conf --RANDOM_SEED=1

To save a trained agent, use the --SAVE_PATH flag.

python train_eval.py -c cartpole.conf --SAVE_PATH=saves/cartpole.pth

To load a trained agent, use the --LOAD_PATH flag.

python train_eval.py -c cartpole.conf --LOAD_PATH=saves/cartpole.pth

To enable logging to TensorBoard or W&B, use appropriate flags.

python train_eval.py -c cartpole.conf --USE_TENSORBOARD --USE_WANDB

Results 📊

This repository uses TensorBoard for offline logging and Weights & Biases for online logging. You can see all the metrics in my summary report on Weights & Biases!

Plots (available in the W&B report): train and evaluation episode length, train and evaluation episode cost, total cycles, total cost, and train loss.

Differences from the Paper 👥

  • Of the 3 environments (Pole Balancing, Mountain Car, Cartpole Regulator), only the Cartpole Regulator environment, the most difficult of the three, was implemented and tested.
  • For the Cartpole Regulator, the success condition is relaxed so that a state is successful whenever the pole angle is at most 24 degrees away from the upright position. In the original paper, the cart must also be at the center of the track within a 0.05 tolerance. (See the sketch after this list.)
  • Evaluation of the trained policy is done on only 1 evaluation environment instead of 1000.
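
A minimal sketch of the relaxed success check described above; the paper's additional cart-position condition is shown commented out. The function and variable names are assumptions, not the repository's actual code.

import math

def is_success(pole_angle_rad: float, cart_position: float) -> bool:
    within_angle = abs(pole_angle_rad) <= math.radians(24)
    # The original paper additionally requires the cart to be near the center:
    # within_center = abs(cart_position) <= 0.05
    return within_angle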

Reproducibility 🎯

Despite there being no open-source code, the paper had sufficient detail to implement NFQ. However, the results were not fully reproducible: we had to relax the definition of goal states and simplify the evaluation. Still, the agent was able to learn to balance a CartPole for 3000 steps while training only in a 100-step environment.

A few nits:

  • There is no specification of the pole angle for goal and forbidden states. We require the pole to be 0 to 24 degrees from the upright position for a goal state, and treat any state with an angle of 90 degrees or more as forbidden.
  • The paper randomly initializes the network weights within [-0.5, 0.5], but does not mention how the biases are initialized.
  • The velocities of the success states are not mentioned. We use a normal distribution to randomly generate velocities for the hint-to-goal variant (see the sketch after this list).
  • It is unclear whether to add experience before or after training the agent in each epoch. We assume adding experience before training.
  • The learning rate for the Rprop optimizer is not specified.
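
A sketch of how the assumptions above might look in code: weights drawn uniformly from [-0.5, 0.5], biases zeroed because the paper does not specify them, and hint-to-goal velocities drawn from a normal distribution. The network shape and the variable names are illustrative, not the repository's actual code.

import torch
import torch.nn as nn

def init_weights(module):
    # Paper: weights uniform in [-0.5, 0.5]; bias initialization is unspecified,
    # so biases are zeroed here (an assumption).
    if isinstance(module, nn.Linear):
        nn.init.uniform_(module.weight, -0.5, 0.5)
        nn.init.zeros_(module.bias)

# Illustrative Q-network: 4 observation dimensions + 1 action input.
q_net = nn.Sequential(nn.Linear(5, 5), nn.Sigmoid(), nn.Linear(5, 5), nn.Sigmoid(), nn.Linear(5, 1))
q_net.apply(init_weights)

# Hint-to-goal states: cart position and pole angle near 0, velocities ~ N(0, 1).
goal_positions = torch.zeros(100, 2)
goal_velocities = torch.randn(100, 2)   # assumed normal distribution for velocities
hint_states = torch.cat([goal_positions, goal_velocities], dim=1)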
