
ericonaldo / ilswiss


ILSwiss is an easy-to-run Imitation Learning (IL, also called Learning from Demonstration, LfD) and Reinforcement Learning (RL) framework (template) in PyTorch.

License: MIT License

Python 99.61% Dockerfile 0.39%

ilswiss's Introduction

ILSwiss

[News!] We have tested the environment with Python 3.10 and PyTorch 2.0, along with newer versions of the other packages. You can install the new environment from requirement2.yaml. Note that the newer versions of gym and gymnasium are not yet consistent with each other.

ILSwiss is an easy-to-run Imitation Learning (IL, also called Learning from Demonstration, LfD) framework (template) in PyTorch, built on existing code bases.

If you want to run image-based dm_control benchmark more efficiently, try this repo.

This repository is built on top of rlswiss and rlkit. The original rlswiss contains meta-RL methods and redundant code; in this repo, we clean up and optimize the code architecture, and modify and re-implement algorithms to make imitation learning experiments easier to run (rlkit focuses on general RL algorithms). Following tianshou, we also introduce vectorized environments (vec envs) that sample data in parallel to speed up the sampling stage, and we add tensorboard support. ILSwiss supports experiment logging with wandb, and envpool for accelerating training (see the example yaml file at https://github.com/Ericonaldo/ILSwiss/blob/main/exp_specs/sac/sac_hopper_envpool.yaml).

You can easily build experiment code on top of this framework for your research. We will continue to maintain this repo while keeping it clear and clean.

RL algorithms being implemented (for potential research):

  • ValueDICE
  • OPOLO

Implemented RL algorithms:

Implemented IL algorithms:

  • Adversarial Inverse Reinforcement Learning
    • AIRL / GAIL / FAIRL / Discriminator-Actor-Critic (DAC) (Different reward signals for AIRL / GAIL / FAIRL, and absorbing state for DAC)
  • Behaviour Cloning (BC)
  • DAgger
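
The list above notes that AIRL, GAIL, and FAIRL differ in the reward signal derived from the discriminator. As a rough sketch, assuming the discriminator outputs a logit l with D = sigmoid(l), the reward forms commonly used in the literature can be written as below. The function names are hypothetical, and this is not copied from the ILSwiss source, which may differ in detail.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def airl_reward(logit: float) -> float:
    # log D - log(1 - D) simplifies to the logit itself.
    return logit


def gail_reward(logit: float) -> float:
    # -log(1 - D) equals softplus(logit); always non-negative.
    return softplus(logit)


def fairl_reward(logit: float) -> float:
    # Forward-KL variant: -logit * exp(logit).
    return -logit * math.exp(logit)
```

With these forms only the mapping from logit to reward changes between the three methods, which is why they can share one adversarial training loop.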

Requirements

To make sure the code runs correctly, we recommend using the following environment:

  • Linux platform (e.g. Ubuntu 18.04)
  • Python 3.8 (Anaconda)

To install the required packages, run the following command:

pip install -r requirements.txt

For various reasons (e.g. network issues), some packages may fail to install. In that case, we recommend the following steps instead:

Install dmc2gym by running

pip install git+https://github.com/denisyarats/dmc2gym.git

Install PyTorch. Select an appropriate version to match your CUDA version, e.g.

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

Install other packages by running

pip install -r requirements.txt

Running Notes:

Before running, assign important log and output paths in rlkit/launchers/config.py.

A simple multiple-process scheduler is provided for hyperparameter grid search. (We call it "multiple processing" to distinguish it from multi-processing, since it only starts many independent sub-processes that do not communicate with each other.)
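
To illustrate the idea of independent sub-processes without communication, here is a minimal sketch using only the Python standard library. The function name run_independent and its parameters are hypothetical; the scheduler in run_experiment.py may be organized differently.

```python
import subprocess
import sys
import time


def run_independent(commands, max_parallel=2, poll_interval=0.05):
    """Run each command as its own sub-process, at most max_parallel at once.

    The sub-processes never talk to each other; we only wait for them to exit.
    Returns the exit codes in the order the commands were given.
    """
    pending = list(enumerate(commands))
    running = {}  # job index -> Popen handle
    exit_codes = [None] * len(commands)
    while pending or running:
        # Start new jobs while a slot is free.
        while pending and len(running) < max_parallel:
            idx, cmd = pending.pop(0)
            running[idx] = subprocess.Popen(cmd)
        # Collect jobs that have finished.
        for idx, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[idx] = code
                del running[idx]
        if running:
            time.sleep(poll_interval)
    return exit_codes
```

For example, run_independent([[sys.executable, "-c", "pass"]] * 3, max_parallel=2) runs three trivial jobs, two at a time, and returns their exit codes.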

The main entry point is run_experiment.py, which takes an experiment yaml file from exp_specs/: python run_experiment.py -g 0 -e your_yaml_path or python run_experiment.py -e your_yaml_path.

When you run run_experiment.py, it reads the yaml file and generates a set of small yaml files, each containing a single hyperparameter setting. The yaml file also specifies a script file path (see run_scripts/); that script is run once for every small yaml file. See exp_specs/sac/bc.yaml for an explanation of each parameter.
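
The expansion into single-setting specs can be sketched as a Cartesian product over the list-valued hyperparameters. This is only an illustration: the key names "variables" and "constants", the flat (non-nested) layout, and the function expand_grid are assumptions made for the example, not the confirmed ILSwiss yaml schema.

```python
import itertools


def expand_grid(spec: dict) -> list:
    """Split a spec with list-valued 'variables' into one spec per combination."""
    constants = dict(spec.get("constants", {}))
    variables = spec.get("variables", {})
    names = sorted(variables)  # fixed order so the output is reproducible
    settings = []
    for values in itertools.product(*(variables[name] for name in names)):
        single = dict(constants)          # shared settings
        single.update(zip(names, values))  # one value per hyperparameter
        settings.append(single)
    return settings
```

A spec with two seeds and two learning rates yields four single-setting dictionaries, each of which could be written to its own small yaml file and passed to the run script.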

NOTE: all experiments, including evaluation tasks (see run_scripts/evaluate_policy.py and exp_specs/evaluate_policy) and render tasks, can be run under this framework by specifying the yaml file (in the same multiple-process style).

Running RL algorithms

RL algorithms do not need demonstrations, so all you need to do is write an experiment yaml file (see the example in exp_specs/sac/sac_hopper.yaml) and run it as described above.

For on-policy algorithms (e.g., PPO), we clear the buffer after every training step.
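
The reason is that on-policy methods should only learn from data collected by the current policy, so stale transitions must not survive an update. A minimal sketch of that pattern follows; RolloutBuffer and train_step are hypothetical names for illustration, not the classes used in ILSwiss.

```python
class RolloutBuffer:
    """Stores transitions collected by the current policy only."""

    def __init__(self):
        self.transitions = []

    def add(self, obs, action, reward, next_obs, done):
        self.transitions.append((obs, action, reward, next_obs, done))

    def clear(self):
        self.transitions.clear()

    def __len__(self):
        return len(self.transitions)


def train_step(buffer, update_fn):
    """Update on the freshly collected data, then drop it."""
    batch = list(buffer.transitions)  # copy so update_fn keeps its data
    update_fn(batch)
    buffer.clear()  # the next update must use data from the new policy
    return len(batch)
```

An off-policy algorithm such as SAC would skip the clear() call and keep sampling from the accumulated replay data.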

Example scripts

-e is the path to the yaml file; -g is the GPU id.

run mbpo for hopper:

python run_experiment.py -e exp_specs/mbpo/mbpo_hopper.yaml -g 0

run sac-ae for finger_spin:

python run_experiment.py -e exp_specs/sac/sac_ae_dmc_finger_spin.yaml -g 0

run sac for hopper:

python run_experiment.py -e exp_specs/sac/sac_hopper.yaml -g 0

run ppo for hopper:

python run_experiment.py -e exp_specs/ppo/ppo_hopper.yaml -g 0

run td3 for humanoid:

python run_experiment.py -e exp_specs/td3/td3_humanoid.yaml -g 0

run her for pick with td3:

python run_experiment.py -e exp_specs/her/her_pick_td3.yaml -g 0

Running IL algorithms

IL algorithms need demonstrations to be specified. Expert demonstration data files for the standard mujoco and dmc tasks, generated by us in the matching input format, can be downloaded here. If you want to sample your own data, train an expert agent with an RL algorithm and sample from it using run_scripts/gen_expert_demo.py or run_scripts/evaluate_policy.py, and do not forget to adapt your IO format.

Once the demos are ready, write the path for each expert name in demos_listing.yaml (it already contains some examples). Then specify the expert name and the number of trajectories in the corresponding yaml file (see exp_specs/bc.yaml for an example). After that, you can run it as a regular experiment as described above.

Example scripts

gen expert data for hopper:

python run_experiment.py -e exp_specs/gen_expert/hopper.yaml -g 0

run bc for hopper:

python run_experiment.py -e exp_specs/bc.yaml -g 0

run gail for walker:

python run_experiment.py -e exp_specs/gail/gail_walker.yaml -g 0

Notes on wandb logging

The wandb project name can be configured in rlkit/launchers/config.py. The experiment name shown for each trial in the wandb UI is the same as exp_name in the yaml file.

Some qualitative baseline results

See exp_specs for detailed curve results.

SAC

Envs                        Mean         Std
Pendulum                    139.7313     79.8126
InvertedPendulum-v2         1000.0000    0.0000
InvertedDoublePendulum-v2   9358.8740    0.1043
Ant-v2                      5404.5532    1520.4961
Hopper-v2                   3402.9494    446.4877
Humanoid-v2                 6043.9907    726.1788
HalfCheetah-v2              13711.6445   111.4709
Walker2d-v2                 5639.3267    29.9715

SAC-AE

Envs                 Mean     Std
Finger_Spin (600K)   983.42   5.82
Reach_Easy (1600K)   782.8    23.86

Random

Envs                        Mean       Std
InvertedPendulum-v2         25.2800    5.5318
InvertedDoublePendulum-v2   78.2829    10.7335
Ant-v2                      713.5986   203.9204
Hopper-v2                   13.0901    0.1022
Humanoid-v2                 64.7384    2.3037
HalfCheetah-v2              74.4849    12.3917
Walker2d-v2                 7.0708     0.1292
Swimmer-v2                  15.5430    6.6655

ilswiss's People

Contributors

ericonaldo, fineartz, xxyqwq, zbzhu99

ilswiss's Issues

FileNotFoundError:

Traceback (most recent call last):
  File "run_scripts/gen_expert_demos.py", line 335, in <module>
    experiment(exp_specs)
  File "run_scripts/gen_expert_demos.py", line 152, in experiment
    policy = joblib.load(specs["expert_path"])["policy"]
  File "/mnt/liang/miniconda3/envs/py38/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in load
    with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: './logs/test-gen-hopper-demos/test_gen_hopper_demos_2023_04_13_09_54_46_0000--s-0/params.pkl'

Excuse me, where is this params.pkl file?

Support wandb logger

I recently integrated a wandb logger for tracking training records in another repo that has a similar code structure. I will try to migrate the feature to this repository.

For more details about wandb, refer to https://wandb.ai

Where can I get the "Random" results shown in the readme?

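  • Were the results in the Random table obtained with something like a random walk (i.e. without any reinforcement learning algorithm)? I could not find a corresponding yaml file in exp_specs.
  • Are there any published papers built on this code framework? I would like to use them as a reference.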

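Can AIRL be used with TD3 or DDPG?

Hello authors,
In the original AIRL paper, AIRL is used with an on-policy algorithm. If I want to apply AIRL to DDPG, but the DDPG model only outputs a single deterministic action, how should AIRL be trained?

Thanks!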

Where is DAC?

I'm sorry to bother you, but I cannot find the implementation of DAC in your code.
