Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization

ICLR 2024

Kun Lei · Zhengmao He* · Chenhao Lu* · Kaizhe Hu · Yang Gao · Huazhe Xu

Project Page | arXiv | Twitter

Code Overview

We evaluate Uni-O4 on standard D4RL benchmarks during offline and online fine-tuning phases. In addition, we utilize Uni-O4 to enable rapid adaptation of our quadrupedal robot dog to new and challenging environments. This repo contains five branches:

master (default) -> Uni-O4
go1_sdk -> SDK set-up for the Go1 robot
data_collecting_deployment -> Deploy the Go1 in the real world for data collection
unio4-offline-robot -> Run Uni-O4 on the dataset collected by the real-world robot dog
go1-online-finetuning -> Fine-tune the robot online in the real world

Clone each branch: git clone -b [Branch Name] https://github.com/Lei-Kun/Uni-O4.git
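
To fetch all five branches in one go, the clone command above can be wrapped in a small script. This is only a sketch: cloning each branch into a directory named after it is an assumption (chosen so that paths such as ./go1_sdk/build used later in this README resolve), and the helper names clone_command and clone_all are hypothetical.

```python
import subprocess

REPO_URL = "https://github.com/Lei-Kun/Uni-O4.git"

# Branch names as listed in this README.
BRANCHES = [
    "master",
    "go1_sdk",
    "data_collecting_deployment",
    "unio4-offline-robot",
    "go1-online-finetuning",
]


def clone_command(branch):
    """Return the git command that clones one branch into a directory named after it."""
    # Assumption: the target directory is named after the branch.
    return ["git", "clone", "-b", branch, REPO_URL, branch]


def clone_all(dry_run=True):
    """Print every clone command; execute them only when dry_run is False."""
    for branch in BRANCHES:
        cmd = clone_command(branch)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


clone_all()  # dry run: prints the five commands without touching the network
```

Call clone_all(dry_run=False) to actually run the clones.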

For D4RL benchmarks

Requirements

torch 1.12.0
mujoco 2.2.1
mujoco-py 2.1.2.14
d4rl 1.1

To install all the required dependencies:

Install MuJoCo from here.
Install Python packages listed in requirements.txt using pip install -r requirements.txt. You should specify the version of mujoco-py in requirements.txt depending on the version of MuJoCo engine you have installed.
Manually download and install d4rl package from here.
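
After installation, a quick check along these lines may help confirm that the environment matches the versions listed under Requirements. It is a sketch under assumptions: it treats the names in the Requirements list as pip distribution names (this may not hold for a manually installed d4rl or for the MuJoCo engine itself), and the helpers installed_version and check_versions are hypothetical.

```python
from importlib import metadata

# Versions pinned in the Requirements list of this README.
PINNED = {
    "torch": "1.12.0",
    "mujoco": "2.2.1",
    "mujoco-py": "2.1.2.14",
    "d4rl": "1.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pinned=PINNED, lookup=installed_version):
    """Map each package name to a tuple (expected, found, ok)."""
    report = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu113" when comparing.
        ok = found is not None and found.split("+")[0] == expected
        report[package] = (expected, found, ok)
    return report


for package, (expected, found, ok) in check_versions().items():
    status = "ok" if ok else "MISMATCH"
    print(f"{package}: expected {expected}, found {found} [{status}]")
```

A MISMATCH line for mujoco-py is not necessarily an error, since its version should follow the MuJoCo engine you installed.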

Running the code

main.py: trains the network, storing checkpoints along the way. Set-up for other domains is coming soon.
Example - for offline pre-training:

./scripts/mujoco_loco/hm.sh

Example - for online fine-tuning:

./ppo_finetune/scripts/mujoco_loco/hm.sh

Real-world tasks set-up

See INSTALL.md for installation instructions.

For real-world adaptation tasks involving quadrupedal robots, our approach is a three-step process. First, we pre-train a policy in a simulator, which takes several minutes. Then we fine-tune the policy in the real world with the Uni-O4 algorithm, first offline and then online.

Pre-training in Isaac Gym:

cd ./unio4-offline-robot
pip install -e .
cd ./scripts
python train.py

Offline fine-tuning with Uni-O4 - collecting data (build the SDK following INSTALL.md):

1) Start up the Go1 SDK:

cd ./go1_sdk/build
./lcm_position

2) Run:

cd ./data_collecting_deployment
pip install -e .
cd ./data_collecting_deployment/go1_gym_deploy/scripts
python deploy_policy --deploy_policy 'sim'

'sim' -> pretrained policy in simulator
'offline' -> offline fine-tuned policy in real-world
'online' -> online fine-tuned policy in real-world

Offline fine-tuning with Uni-O4 - run Uni-O4 on the collected dataset:

# copy the collected dataset to unio4-offline-robot first
cd ./unio4-offline-robot
./run.sh

Online fine-tuning with PPO:

cd ./go1_sdk/build
./lcm_position
cd ./go1-online-finetuning
python off2on.py

Citation

If you use Uni-O4, please cite our paper as follows:

@inproceedings{
lei2024unio,
title={Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization},
author={Kun LEI and Zhengmao He and Chenhao Lu and Kaizhe Hu and Yang Gao and Huazhe Xu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=tbFBh3LMKi}
}
