
lifelong_rl

Overview

PyTorch implementations of RL algorithms, focusing on model-based, lifelong, reset-free, and offline algorithms. Official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Originally derived from rlkit.

Status

The project has been released but will receive periodic updates. Contributions, bug reports, benchmarking results, and other comments are welcome.

Algorithms in this codebase

Note: "Online" here means not offline, i.e. data is collected by interacting with an environment. "Batch" refers to algorithms that learn from data in batches (e.g. PPO) rather than from a replay buffer; it is not a synonym for offline RL.

*Reward and terminal functions are learned in this codebase for flexibility, but we also support providing them by hand.
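As an illustration of what hand-specified functions can look like, here is a minimal sketch (the exact interface expected by the codebase may differ; the observation indices and scales below are purely illustrative):

```python
import numpy as np

def reward_fn(obs, action, next_obs):
    # Hypothetical reward for a locomotion-style task: forward progress
    # minus a small control cost (index 0 as x-position is illustrative).
    forward_progress = next_obs[0] - obs[0]
    control_cost = 1e-3 * np.sum(np.square(action))
    return forward_progress - control_cost

def terminal_fn(obs, action, next_obs):
    # Hypothetical termination: the agent's height (illustrative index 1)
    # leaves a healthy range.
    return not (0.2 < next_obs[1] < 1.0)
```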

Usage

Installation

  1. Create the Anaconda environment

    $ conda env create -f environment.yml
    

    Optionally, also install MuJoCo (see the official MuJoCo installation instructions).

  2. Install doodad to run experiments (v0.2).

Running experiments

You can run experiments with:

python run_scripts/<script name>.py

Use -h to see more options for running. Experiments require a variant dictionary (as in rlkit), which specifies a base setting for each hyperparameter. Experiments also require a sweep_values dictionary, which should contain only the hyperparameters to be swept over (overwriting the corresponding values in variant).
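A rough sketch of how the two dictionaries interact: each swept hyperparameter overwrites its base value in variant, yielding one run per grid point. Note that expand_sweep and the hyperparameter names here are hypothetical, for illustration only:

```python
from itertools import product

# Illustrative hyperparameter names; real run scripts define their own.
variant = {
    'discount': 0.99,
    'policy_lr': 3e-4,
    'num_epochs': 500,
}

sweep_values = {
    'policy_lr': [1e-4, 3e-4],
    'discount': [0.95, 0.99],
}

def expand_sweep(variant, sweep_values):
    # Overwrite each swept hyperparameter's base value in variant,
    # producing one variant per point of the grid.
    keys = sorted(sweep_values)
    variants = []
    for values in product(*(sweep_values[k] for k in keys)):
        v = dict(variant)
        v.update(zip(keys, values))
        variants.append(v)
    return variants
```

Here the grid has 2 × 2 = 4 points, so four variants would be launched.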

Logging experiments

Results from experiments are saved in data/, and a snapshot containing the relevant networks to evaluate policies offline is stored in itr_$n every save_snapshot_every epochs. Data from the offline training phase is stored in offline_itr_$n instead. We support Viskit for plotting, as well as Weights and Biases (include -w True in the call to the run script).
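The snapshot naming convention above can be summarized by a small helper (hypothetical, for illustration only; the codebase itself handles this internally):

```python
def snapshot_name(epoch, save_snapshot_every, offline=False):
    # A snapshot is written every `save_snapshot_every` epochs; the
    # offline training phase uses the offline_itr_ prefix instead of itr_.
    if epoch % save_snapshot_every != 0:
        return None
    return ('offline_itr_' if offline else 'itr_') + str(epoch)
```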

Visualizing experiments

scripts/viz_hist.py can be used to record a video from a MuJoCo environment using stored data from the agent's replay buffer, which is modified to additionally store env sim states for MuJoCo environments. There are also a variety of ways visualization can be done manually.
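A minimal sketch of the replay idea, assuming the buffer exposes a list of stored sim states. The rendering calls are shown as comments because they require a MuJoCo environment; select_frames is a hypothetical helper:

```python
def select_frames(num_states, stride):
    # Render every `stride`-th stored sim state to keep videos short.
    return list(range(0, num_states, stride))

# Replay loop (requires MuJoCo; shown for illustration):
# for i in select_frames(len(sim_states), stride=5):
#     env.sim.set_state(sim_states[i])      # restore the stored state
#     env.sim.forward()                     # recompute derived quantities
#     frame = env.render(mode='rgb_array')  # collect frames into a video
```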

Repo structure

  • agent_data/
    • Stores .pkl files of numpy arrays of past transitions
    • Useful for demonstrations, offline data, etc.
    • You can download some example datasets from our link here
  • data/
    • Stores logging information and experiment models
    • itr_$n is the snapshot after epoch $n; similarly offline_itr_$n is for offline training
  • experiment_configs/
    • Experiment configuration files
    • get_config creates a dictionary consisting of networks and parameters used to initialize a run
    • get_offline_algorithm and get_algorithm create an RLAlgorithm from the config
  • experiment_utils/
    • Files associated with launching experiments with doodad (should not require modification)
  • lifelong_rl/
    • Main codebase
  • run_scripts/
    • Scripts to launch experiments: pick config, algorithm, hyperparameters
    • If both an offline algorithm and an algorithm are specified, the offline algorithm is run first
    • Should specify hyperparameters for runs in variant
    • Optionally, perform a grid search over some hyperparameters using sweep_params
  • scripts/
    • Example utility scripts
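A sketch of reading one of the agent_data/ datasets, assuming it is a pickled dict of numpy arrays keyed by names like 'observations' (the actual keys depend on how the data was saved, so treat these as assumptions):

```python
import pickle

def load_transitions(path):
    # agent_data/ stores .pkl files of numpy arrays of past transitions.
    with open(path, 'rb') as f:
        return pickle.load(f)

def num_transitions(data, key='observations'):
    # Number of stored transitions, assuming arrays share a leading axis.
    return len(data[key])
```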

Acknowledgements

This codebase was originally modified from rlkit. Some parts of the code are taken from ProMP, mjrl, handful-of-trials-pytorch, and dads.

Citation

This is the official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Note that the code has been modified since the paper so results may be slightly different.

@inproceedings{lu2021lisp,
  title     = {Reset-Free Lifelong Learning with Skill-Space Planning},
  author    = {Kevin Lu and
               Aditya Grover and
               Pieter Abbeel and
               Igor Mordatch},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  year      = {2021}
}

License

MIT
