tinkoff-ai / sac-rnd

Official implementation for "Anti-Exploration by Random Network Distillation", ICML 2023

License: Apache License 2.0

Anti-Exploration by Random Network Distillation

This repository contains an official implementation of Anti-Exploration by Random Network Distillation. All code is written in Jax.

Dependencies & Docker setup

To set up a Python environment (with the dev tools of your choice; in our workflow we use conda and Python 3.8), install all the requirements:

pip install -r requirements.txt

With this setup, you also need to install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe worked for us:

mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}

You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
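
As a quick sanity check after the manual setup, the sketch below verifies that the mujoco210 `bin` directory exists and is listed in `LD_LIBRARY_PATH`. It is a minimal illustration, not part of this repository: the function name `check_mujoco_setup` is hypothetical, and the script assumes the binaries were unpacked under `~/.mujoco`, as in the recipe above when run as root.

```python
import os
from pathlib import Path


def check_mujoco_setup(mujoco_root, ld_library_path):
    """Return a list of problems found; an empty list means the layout looks fine.

    Hypothetical helper for illustration only, not part of this repository.
    """
    problems = []
    bin_dir = Path(mujoco_root) / "mujoco210" / "bin"
    if not bin_dir.is_dir():
        problems.append(f"missing directory: {bin_dir}")
    # LD_LIBRARY_PATH is a colon-separated list on Linux; skip empty entries.
    entries = [Path(p) for p in ld_library_path.split(":") if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    # Assumption: binaries live under ~/.mujoco (i.e. /root/.mujoco for root).
    root = Path.home() / ".mujoco"
    for problem in check_mujoco_setup(root, os.environ.get("LD_LIBRARY_PATH", "")):
        print("WARNING:", problem)
```

The check only inspects paths. It does not confirm that mujoco_py itself builds or imports correctly, so the official mujoco_py guide remains the reference for any remaining system dependencies.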

Docker

We also provide a simpler way: a Dockerfile that is already set up to work. All you have to do is build and run it :)

docker build -t sac_rnd .

To run the container, mount the repository directory:

docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>/sac-rnd-jax:/workspace/sac-rnd-jax" \
    --name sac_rnd \
    sac_rnd bash

How to reproduce experiments

Configs for the main experiments are stored in configs/sac-rnd/<task_type>. All available hyperparameters are listed in offline_sac/algorithms/<algo>.py.

For example, to start SAC-RND training on the halfcheetah-medium-v2 dataset, run the following:

python offline_sac/algorithms/sac_rnd.py \
    --config_path="configs/sac-rnd/halfcheetah/halfcheetah_medium.yaml" \
    --beta=<take the best value from the paper appendix>
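
When launching several runs, it can help to assemble the command above programmatically. The sketch below is a hypothetical wrapper, not part of this repository: the names `build_train_command` and `run_training` are made up for illustration, and the config path pattern `configs/sac-rnd/<task_type>/<task_type>_<dataset>.yaml` is an assumption generalized from the single example above, so check it against the actual files. The `beta` value must still come from the paper appendix.

```python
import subprocess
import sys


def build_train_command(task_type, dataset, beta,
                        script="offline_sac/algorithms/sac_rnd.py"):
    """Build the argv list for one SAC-RND training run.

    Assumption: configs follow configs/sac-rnd/<task_type>/<task_type>_<dataset>.yaml,
    inferred from the halfcheetah example; verify against the repository.
    """
    config_path = f"configs/sac-rnd/{task_type}/{task_type}_{dataset}.yaml"
    return [
        sys.executable,  # the current Python interpreter
        script,
        f"--config_path={config_path}",
        f"--beta={beta}",
    ]


def run_training(task_type, dataset, beta):
    """Run one training job from the repository root and raise if it fails."""
    subprocess.run(build_train_command(task_type, dataset, beta), check=True)
```

For example, `run_training("halfcheetah", "medium", beta)` would run the same command as above, with `beta` set to the value you looked up in the paper appendix.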

To reproduce our sweeps, create a wandb sweep from one of the configs in configs/sweeps (wandb sweep <path to sweep config>), then start a wandb agent with the resulting sweep ID (wandb agent <sweep ID>). That's all! Have fun!

Citing

If you use this code for your research, please consider citing it with the following BibTeX entry:

@article{nikulin2023anti,
  title={Anti-Exploration by Random Network Distillation},
  author={Nikulin, Alexander and Kurenkov, Vladislav and Tarasov, Denis and Kolesnikov, Sergey},
  journal={arXiv preprint arXiv:2301.13616},
  year={2023}
}
