Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
This repository is an implementation of the paper Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method (Riedmiller, 2005).
Please ⭐ this repository if you found it useful!
For implementations of other deep learning papers, check the implementations repository!
Neural Fitted Q-Iteration (NFQ) uses a multi-layer neural network as a Q-network: its input is an observation-action pair (s, a) and its output is the action value Q(s, a). Instead of online Q-learning, the paper proposes batch offline updates: experience is collected throughout the episode, and the network is updated on that batch. The paper also suggests the hint-to-goal heuristic, where the network is explicitly trained on artificial samples from the goal region so that it can correctly estimate the value of that region.
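The batch update above amounts to computing fitted Q targets over the collected transitions and regressing the network toward them. A minimal sketch of the target computation (not this repository's actual code; `q` is any callable approximating the cost-to-go, and NFQ minimizes cost, hence the min over actions):

```python
def nfq_targets(q, transitions, gamma=0.95, actions=(0, 1)):
    """Compute fitted Q targets for a batch of transitions.

    q: callable (state, action) -> estimated cost-to-go.
    transitions: list of (state, action, cost, next_state, done) tuples.
    Since NFQ minimizes cost, the Bellman backup takes a min over actions.
    """
    targets = []
    for s, a, c, s_next, done in transitions:
        if done:
            # Terminal transitions get the immediate cost only.
            targets.append(c)
        else:
            targets.append(c + gamma * min(q(s_next, a2) for a2 in actions))
    return targets
```

The Q-network is then trained (with Rprop, in the paper) on the pairs ((s, a), target) for the whole batch at once.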
First, clone this repository from GitHub. Since this repository contains submodules, you should use the --recursive flag.
git clone --recursive https://github.com/seungjaeryanlee/implementations-nfq.git
If you already cloned the repository without the flag, you can download the submodules separately with the git submodule command:
git clone https://github.com/seungjaeryanlee/implementations-nfq.git
git submodule update --init --recursive
After cloning the repository, use requirements.txt for a simple installation of the PyPI packages.
pip install -r requirements.txt
You can read more about each package in the comments of the requirements.txt file!
You can train the NFQ agent on Cartpole Regulator using the given configuration file with the below command:
python train_eval.py -c cartpole.conf
For a reproducible run, use the --RANDOM_SEED flag.
python train_eval.py -c cartpole.conf --RANDOM_SEED=1
To save a trained agent, use the --SAVE_PATH flag.
python train_eval.py -c cartpole.conf --SAVE_PATH=saves/cartpole.pth
To load a trained agent, use the --LOAD_PATH flag.
python train_eval.py -c cartpole.conf --LOAD_PATH=saves/cartpole.pth
To enable logging to TensorBoard or Weights & Biases (W&B), use the appropriate flags.
python train_eval.py -c cartpole.conf --USE_TENSORBOARD --USE_WANDB
This repository uses TensorBoard for offline logging and Weights & Biases for online logging. You can see all the metrics in my summary report on Weights & Biases!
- Of the three environments in the paper (Pole Balancing, Mountain Car, Cartpole Regulator), only the Cartpole Regulator, the most difficult one, was implemented and tested.
- For the Cartpole Regulator, the success condition is relaxed: a state is successful whenever the pole angle is at most 24 degrees away from the upright position. In the original paper, the cart must also be near the center of the track, within a 0.05 tolerance.
- Evaluation of the trained policy is done in only 1 evaluation environment, instead of 1000.
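The relaxed success condition from the list above can be expressed as a simple predicate (a sketch of this implementation's convention, assuming the angle is measured from upright in radians):

```python
import math

def is_success(pole_angle_rad):
    """Relaxed success check: pole within 24 degrees of upright.

    Unlike the original paper, no condition is placed on the cart
    position (the 0.05 center tolerance is dropped).
    """
    return abs(pole_angle_rad) <= math.radians(24)
```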
Although no open-source code was available, the paper had sufficient detail to implement NFQ. However, the results were not fully reproducible: we had to relax the definition of goal states and simplify evaluation. Still, the agent was able to learn to balance a CartPole for 3000 steps while training only on 100-step episodes.
A few nits:
- The paper does not specify the pole angles for goal and forbidden states. We require the pole to be within 0 to 24 degrees of the upright position for a goal state, and treat any state 90 or more degrees from upright as forbidden.
- The paper randomly initializes network weights within [−0.5, 0.5], but does not mention bias initialization.
- The velocities of the success states are not mentioned. We use a normal distribution to randomly generate velocities for the hint-to-goal variant.
- It is unclear whether experience should be added before or after training the agent in each epoch. We assume it is added before training.
- The learning rate for the Rprop optimizer is not specified.
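Regarding the unspecified goal-state velocities, the hint-to-goal samples can be generated along these lines (a sketch, not this repository's actual code; `angle_max_deg` and `velocity_std` are assumed values, since the paper specifies neither):

```python
import random

def hint_to_goal_samples(n, angle_max_deg=24.0, velocity_std=1.0):
    """Generate artificial goal-region samples (hint-to-goal heuristic).

    Angles are drawn uniformly from the goal region; velocities are
    drawn from a normal distribution because the paper does not
    specify them. Each sample is paired with target cost 0, so the
    network is explicitly trained to output the goal value there.
    """
    samples = []
    for _ in range(n):
        angle = random.uniform(-angle_max_deg, angle_max_deg)
        angle_vel = random.gauss(0.0, velocity_std)
        samples.append(((angle, angle_vel), 0.0))
    return samples
```

These artificial samples are simply appended to the real experience batch before each training pass.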