
Temporal Leap Hierarchical Reinforcement Learning

Developed by Alex Zhao & Bowei He. This repository implements the Temporal Leap Hierarchical Reinforcement Learning algorithm. The code is based on the work by Nachum et al.: https://github.com/tensorflow/models/tree/master/research/efficient-hrl


Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine (https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning" by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine (https://arxiv.org/abs/1810.01257).

Requirements:

Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite

Run a continuous evaluation job for that experiment:

python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite

To run the same experiment with online representation learning (the "Near-Optimal" paper), change hiro_orig to hiro_repr. Alternatively, use hiro_xy to run HIRO on only the xy coordinates of the agent.

To run on other environments, change ant_maze to something else; e.g., ant_push_multi, ant_fall_multi, etc. See context/configs/* for other options.
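The train/eval commands above follow a single argument pattern (experiment name, algorithm, environment, agent, suite), so sweeps over environments are just a loop. A minimal sketch that prints the commands for several environments without launching them (the environment names and script paths are from this README; the loop itself is illustrative):

```shell
# Dry-run sweep: print one train and one eval command per environment.
for env in ant_maze ant_push_multi ant_fall_multi; do
  echo "python scripts/local_train.py exp_$env hiro_repr $env base_uvf suite"
  echo "python scripts/local_eval.py exp_$env hiro_repr $env base_uvf suite"
done
```

Dropping the echo runs the jobs directly; in practice the eval job is typically started in a separate process alongside training.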

Basic Code Guide:

The code for training resides in train.py. The code trains a lower-level policy (a UVF agent in the code) and a higher-level policy (a MetaAgent in the code) concurrently. The higher-level policy communicates goals to the lower-level policy. In the code, this is called a context. Not only does the lower-level policy act with respect to a context (a higher-level specified goal), but the higher-level policy also acts with respect to an environment-specified context (corresponding to the navigation target location associated with the task). Therefore, in context/configs/* you will find both specifications for task setup as well as goal configurations. Most remaining hyperparameters used for training/evaluation may be found in configs/*.
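The two-level loop described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the repo's actual UVF/MetaAgent classes: the higher level emits a goal (the "context") every c steps, the lower level acts conditioned on the current state and that goal, and (as in the HIRO paper) the lower level is rewarded for reducing its distance to the goal.

```python
import numpy as np

def intrinsic_reward(state, goal):
    # HIRO-style lower-level reward: negative distance from state to goal.
    return -np.linalg.norm(goal - state)

def run_episode(env_step, init_state, meta_policy, low_policy, horizon=100, c=10):
    """Run one episode of the two-level loop; returns the summed intrinsic reward."""
    state, total = init_state, 0.0
    goal = meta_policy(state)          # higher level emits a goal (a "context")
    for t in range(horizon):
        if t > 0 and t % c == 0:
            goal = meta_policy(state)  # re-sample the goal every c steps
        action = low_policy(state, goal)
        next_state = env_step(state, action)
        total += intrinsic_reward(next_state, goal)
        state = next_state
    return total
```

In the real code both policies are trained concurrently off-policy, and the higher level additionally conditions on the environment-specified context (the navigation target).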

NOTE: Not all the code corresponding to the "Near-Optimal" paper is included. Namely, changes to low-level policy training proposed in the paper (discounting and auxiliary rewards) are not implemented here. Performance should not change significantly.

Maintained by Ofir Nachum (ofirnachum).

