
lifelong_rl

Overview

PyTorch implementations of RL algorithms, focusing on model-based, lifelong, reset-free, and offline algorithms. Official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Originally derived from rlkit.

Status

The project has been released but will receive periodic updates. Contributions, bug reports, benchmarking results, and other comments are welcome.

Algorithms in this codebase

Note: "Online" here means not offline, i.e. data is collected by interacting with an environment. "Batch" refers to algorithms that learn from data in batches (e.g. PPO) rather than from a replay buffer; it is not a synonym for offline RL.

*Reward and terminal functions are learned in this codebase for flexibility, but we also support providing them by hand.
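As an illustration of what hand-specified functions can look like, here is a minimal sketch (the exact interface expected by the codebase may differ; the observation indices and scales below are purely illustrative):

```python
import numpy as np

def reward_fn(obs, action, next_obs):
    # Hypothetical reward for a locomotion-style task: forward progress
    # minus a small control cost (index 0 as x-position is illustrative).
    forward_progress = next_obs[0] - obs[0]
    control_cost = 1e-3 * np.sum(np.square(action))
    return forward_progress - control_cost

def terminal_fn(obs, action, next_obs):
    # Hypothetical termination: the agent's height (illustrative index 1)
    # leaves a healthy range.
    return not (0.2 < next_obs[1] < 1.0)
```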

Usage

Installation

  1. Create the Anaconda environment

    $ conda env create -f environment.yml
    

    Optionally, also install MuJoCo (see the official MuJoCo installation instructions).

  2. Install doodad to run experiments (v0.2).

Running experiments

You can run experiments with:

python run_scripts/<script name>.py

Use -h to see more options for running. Experiments require a variant dictionary (as in rlkit), which specifies a base setting for each hyperparameter. Experiments also require a sweep_values dictionary, which should contain only the hyperparameters to be swept over (overwriting the corresponding values in variant).
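A rough sketch of how the two dictionaries interact: each swept hyperparameter overwrites its base value in variant, yielding one run per grid point. Note that expand_sweep and the hyperparameter names here are hypothetical, for illustration only:

```python
from itertools import product

# Illustrative hyperparameter names; real run scripts define their own.
variant = {
    'discount': 0.99,
    'policy_lr': 3e-4,
    'num_epochs': 500,
}

sweep_values = {
    'policy_lr': [1e-4, 3e-4],
    'discount': [0.95, 0.99],
}

def expand_sweep(variant, sweep_values):
    # Overwrite each swept hyperparameter's base value in variant,
    # producing one variant per point of the grid.
    keys = sorted(sweep_values)
    variants = []
    for values in product(*(sweep_values[k] for k in keys)):
        v = dict(variant)
        v.update(zip(keys, values))
        variants.append(v)
    return variants
```

Here the grid has 2 × 2 = 4 points, so four variants would be launched.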

Logging experiments

Results from experiments are saved in data/, and a snapshot containing the relevant networks to evaluate policies offline is stored in itr_$n every save_snapshot_every epochs. Data from the offline training phase is stored in offline_itr_$n instead. We support Viskit for plotting, as well as Weights and Biases (include -w True in the call to the run script).
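The snapshot naming convention above can be summarized by a small helper (hypothetical, for illustration only; the codebase itself handles this internally):

```python
def snapshot_name(epoch, save_snapshot_every, offline=False):
    # A snapshot is written every `save_snapshot_every` epochs; the
    # offline training phase uses the offline_itr_ prefix instead of itr_.
    if epoch % save_snapshot_every != 0:
        return None
    return ('offline_itr_' if offline else 'itr_') + str(epoch)
```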

Visualizing experiments

scripts/viz_hist.py can be used to record a video from a MuJoCo environment using stored data from the agent's replay buffer, which is modified to additionally store env sim states for MuJoCo environments. There are also a variety of ways visualization can be done manually.
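A minimal sketch of the replay idea, assuming the buffer exposes a list of stored sim states. The rendering calls are shown as comments because they require a MuJoCo environment; select_frames is a hypothetical helper:

```python
def select_frames(num_states, stride):
    # Render every `stride`-th stored sim state to keep videos short.
    return list(range(0, num_states, stride))

# Replay loop (requires MuJoCo; shown for illustration):
# for i in select_frames(len(sim_states), stride=5):
#     env.sim.set_state(sim_states[i])      # restore the stored state
#     env.sim.forward()                     # recompute derived quantities
#     frame = env.render(mode='rgb_array')  # collect frames into a video
```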

Repo structure

  • agent_data/
    • Stores .pkl files of numpy arrays of past transitions
    • Useful for demonstrations, offline data, etc.
    • You can download some example datasets from our link here
  • data/
    • Stores logging information and experiment models
    • itr_$n is the snapshot after epoch $n; similarly offline_itr_$n is for offline training
  • experiment_configs/
    • Experiment configuration files
    • get_config creates a dictionary consisting of networks and parameters used to initialize a run
    • get_offline_algorithm and get_algorithm create an RLAlgorithm from the config
  • experiment_utils/
    • Files associated with launching experiments with doodad (should not require modification)
  • lifelong_rl/
    • Main codebase
  • run_scripts/
    • Scripts to launch experiments: pick config, algorithm, hyperparameters
    • If both an offline algorithm and an algorithm are specified, the offline algorithm is run first
    • Should specify hyperparameters for runs in variant
    • Optionally, perform a grid search over some hyperparameters using sweep_params
  • scripts/
    • Example utility scripts
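A sketch of reading one of the agent_data/ datasets, assuming it is a pickled dict of numpy arrays keyed by names like 'observations' (the actual keys depend on how the data was saved, so treat these as assumptions):

```python
import pickle

def load_transitions(path):
    # agent_data/ stores .pkl files of numpy arrays of past transitions.
    with open(path, 'rb') as f:
        return pickle.load(f)

def num_transitions(data, key='observations'):
    # Number of stored transitions, assuming arrays share a leading axis.
    return len(data[key])
```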

Acknowledgements

This codebase was originally modified from rlkit. Some parts of the code are taken from ProMP, mjrl, handful-of-trials-pytorch, and dads.

Citation

This is the official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Note that the code has been modified since the paper so results may be slightly different.

@inproceedings{lu2021lisp,
  title     = {Reset-Free Lifelong Learning with Skill-Space Planning},
  author    = {Kevin Lu and
               Aditya Grover and
               Pieter Abbeel and
               Igor Mordatch},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  year      = {2021}
}

License

MIT
