Synthetic Experience Replay

Synthetic Experience Replay (SynthER) is a diffusion-based approach to arbitrarily upsample an RL agent's collected experience, leading to large gains in sample efficiency and scaling benefits. We integrate SynthER into a variety of offline and online algorithms in this codebase, including SAC, TD3+BC, IQL, EDAC, and CQL. For further details, please see the paper:

Synthetic Experience Replay; Cong Lu*, Philip J. Ball*, Yee Whye Teh, Jack Parker-Holder. Published at NeurIPS, 2023.

Setup

To install, clone the repository and run the following:

git submodule update --init --recursive
pip install -r requirements.txt

The code was tested on Python 3.8 and 3.9. If you don't have MuJoCo installed, follow the instructions here: https://github.com/openai/mujoco-py#install-mujoco.
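Before installing the requirements, it may help to confirm that the active interpreter is one of the tested versions. The following is a minimal sketch, not part of the repository; the helper name `is_tested_python` is hypothetical.

```python
import sys

# Versions this README reports as tested; other versions may or may not work.
TESTED_VERSIONS = {(3, 8), (3, 9)}


def is_tested_python(version=None):
    """Return True if the (major, minor) pair is one of the tested versions."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in TESTED_VERSIONS


if __name__ == "__main__":
    if not is_tested_python():
        current = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"Warning: Python {current} is not among the tested versions (3.8, 3.9).")
```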

Running Instructions

Offline RL

Diffusion model training (this automatically generates samples and saves them):

python3 synther/diffusion/train_diffuser.py --dataset halfcheetah-medium-replay-v2

Baseline without SynthER (e.g. on TD3+BC):

python3 synther/corl/algorithms/td3_bc.py --config synther/corl/yaml/td3_bc/halfcheetah/medium_replay_v2.yaml --checkpoints_path corl_logs/

Offline RL training with SynthER:

# Generating diffusion samples on the fly.
python3 synther/corl/algorithms/td3_bc.py --config synther/corl/yaml/td3_bc/halfcheetah/medium_replay_v2.yaml --checkpoints_path corl_logs/ --name SynthER --diffusion.path path/to/model-100000.pt

# Using saved diffusion samples.
python3 synther/corl/algorithms/td3_bc.py --config synther/corl/yaml/td3_bc/halfcheetah/medium_replay_v2.yaml --checkpoints_path corl_logs/ --name SynthER --diffusion.path path/to/samples.npz
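
Before pointing --diffusion.path at a saved samples file, it can be useful to check what the file contains. The sketch below is a generic NumPy inspection helper, not part of the repository; `summarize_npz` is a hypothetical name, and it makes no assumption about which array names the training script writes.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


# Example usage (the path is the placeholder from the command above):
#   for name, (shape, dtype) in summarize_npz("path/to/samples.npz").items():
#       print(name, shape, dtype)
```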

Online RL

Baselines (SAC, REDQ):

# SAC.
python3 synther/online/online_exp.py --env quadruped-walk-v0 --results_folder online_logs/ --exp_name SAC --gin_config_files 'config/online/sac.gin'

# REDQ.
python3 synther/online/online_exp.py --env quadruped-walk-v0 --results_folder online_logs/ --exp_name REDQ --gin_config_files 'config/online/redq.gin'

SynthER (SAC):

# DMC environments.
python3 synther/online/online_exp.py --env quadruped-walk-v0 --results_folder online_logs/ --exp_name SynthER --gin_config_files 'config/online/sac_synther_dmc.gin' --gin_params 'redq_sac.utd_ratio = 20' 'redq_sac.num_samples = 1000000'

# OpenAI Gym environments (different gin config).
python3 synther/online/online_exp.py --env HalfCheetah-v2 --results_folder online_logs/ --exp_name SynthER --gin_config_files 'config/online/sac_synther_openai.gin' --gin_params 'redq_sac.utd_ratio = 20' 'redq_sac.num_samples = 1000000'

Thinking of adding SynthER to your own algorithm?

Our codebase has everything you need for diffusion with low-dimensional data along with example integrations with RL algorithms. For a custom use-case, we recommend starting from the training script and SimpleDiffusionGenerator class in synther/diffusion/train_diffuser.py. You can modify the hyperparameters specified in config/resmlp_denoiser.gin to suit your own needs.
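
For a custom use-case, the data handed to a diffusion model over low-dimensional inputs is typically one flat vector per transition. The sketch below shows one way to pack transitions into such rows and unpack them again. The column layout and the function names `flatten_transitions` and `split_transitions` are assumptions made for illustration and may differ from what synther/diffusion/train_diffuser.py does internally.

```python
import numpy as np


def flatten_transitions(obs, actions, rewards, next_obs, dones):
    """Stack each transition into one row: [obs, action, reward, next_obs, done]."""
    return np.concatenate(
        [obs, actions, rewards[:, None], next_obs, dones[:, None].astype(obs.dtype)],
        axis=1,
    )


def split_transitions(rows, obs_dim, action_dim):
    """Invert flatten_transitions, given the observation and action sizes."""
    # Column boundaries for obs | action | reward | next_obs | done.
    boundaries = np.cumsum([obs_dim, action_dim, 1, obs_dim])
    obs, actions, rewards, next_obs, dones = np.split(rows, boundaries, axis=1)
    # Generated "done" values are continuous, so threshold them back to booleans.
    return obs, actions, rewards[:, 0], next_obs, dones[:, 0] > 0.5
```

Thresholding the terminal flag at 0.5 is one simple choice for mapping generated values back to booleans; other handling of terminals may suit your environment better.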

Additional Notes

  • Our codebase uses wandb for logging; you will need to set --wandb-entity across the repository.
  • Our pixel-based experiments are based on a modified version of the V-D4RL repository. The latent representations are derived from the trunks of the actor and critic.

Acknowledgements

SynthER builds upon many works and open-source codebases in both diffusion modelling and reinforcement learning. We would like to particularly thank the authors of:

Contact

Please contact Cong Lu or Philip Ball for any queries. We welcome any suggestions or contributions!

synther's Issues

Specify supported Python versions

Hey!

Many thanks for your interesting paper and providing your code.

I had some installation trouble and had to try a few different Python versions before the requirements installed successfully. Specifically, with more recent Python versions, pip was unable to build the wheel for dm-control. It would be helpful to specify in the README which versions you used, to simplify this process. I found that 3.7.9 worked.

Thanks,

Glass

Issue with the online experiment

When I run the online experiment, the program gives the following error:

  File "/home/xxx/miniconda3/envs/synth/lib/python3.8/site-packages/dm_control/mujoco/index.py", line 302, in _get_size_name
    return sizes.array_sizes[struct_name][field_name][0]
KeyError: 'name_actuatoradr'
  In call to configurable 'redq_sac' (<function redq_sac at 0x7f55e0a46310>)

Could you please tell me if my environment configuration might cause the problem? (No problems with offline WM training and policy training)
Thanks!
