This project forked from eureka-research/eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models"

Home Page: https://eureka-research.github.io/

License: MIT License

Eureka: Human-Level Reward Design via Coding Large Language Models

[Website] [arXiv] [PDF]

Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform in-context evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Eureka generates reward functions that outperform expert human-engineered rewards without any task-specific prompting or pre-defined reward templates. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free approach to reinforcement learning from human feedback (RLHF), readily incorporating human oversight to improve the quality and the safety of the generated rewards in context. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time a simulated five-finger Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at human speed.

Installation

Eureka requires Python ≥ 3.8. We have tested on Ubuntu 20.04 and 22.04.

  1. Create a new conda environment with:

    conda create -n eureka python=3.8
    conda activate eureka

  2. Install IsaacGym (tested with Preview Release 4/4). Follow the instructions to download the package, then run:

    tar -xvf IsaacGym_Preview_4_Package.tar.gz
    cd isaacgym/python
    pip install -e .
    # test the installation
    python examples/joint_monkey.py

  3. Install Eureka:

    git clone https://github.com/eureka-research/Eureka.git
    cd Eureka; pip install -e .
    cd isaacgymenvs; pip install -e .
    cd ../rl_games; pip install -e .

  4. Eureka currently uses the OpenAI API for language model queries, so you need an OpenAI API key. Set it as an environment variable in your terminal:

    export OPENAI_API_KEY="YOUR_API_KEY"
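
Before launching a long run, it can help to confirm that the key is actually visible to Python. The snippet below is a minimal sketch, not part of Eureka; the helper name `require_api_key` is hypothetical. It treats an empty or whitespace-only value as missing.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running Eureka")
    return key
```

Calling `require_api_key()` with no arguments reads the current process environment.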

Getting Started

Navigate to the eureka directory and run:

    python eureka.py env={environment} iteration={num_iterations} sample={num_samples}
  • {environment} is the task to perform. Options are listed in eureka/cfg/env.
  • {num_samples} is the number of reward samples to generate per iteration. Default value is 16.
  • {num_iterations} is the number of Eureka iterations to run. Default value is 5.
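
To make the interaction between the two parameters concrete, here is a toy sketch of a sample-and-select loop: every iteration proposes `num_samples` candidates, scores them, and carries the best one forward. It is an illustrative assumption, not Eureka's actual implementation; `propose` and `evaluate` are hypothetical stand-ins for reward generation by the language model and for RL training plus evaluation.

```python
import random


def evolutionary_search(propose, evaluate, num_iterations=5, num_samples=16):
    """Toy loop: sample candidates, score them, carry the best one forward."""
    best, best_score = None, float("-inf")
    for _ in range(num_iterations):
        # Propose `num_samples` candidates conditioned on the current best.
        candidates = [propose(best) for _ in range(num_samples)]
        for candidate in candidates:
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


# Stand-ins: a "candidate" is just a number and the score peaks at 3.0.
rng = random.Random(0)
propose = lambda parent: (0.0 if parent is None else parent) + rng.uniform(-1, 1)
evaluate = lambda x: -(x - 3.0) ** 2
best, score = evolutionary_search(propose, evaluate)
```

With the defaults this evaluates 16 × 5 = 80 candidates in total, which is why the number of samples and the number of iterations together determine the cost of a run.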

Below are some example commands to try out Eureka:

    python eureka.py env=shadow_hand sample=4 iteration=2 model=gpt-4-0314
    python eureka.py env=humanoid sample=16 iteration=5 model=gpt-3.5-turbo-16k-0613
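
When scripting several runs, the same commands can be assembled programmatically. The following is a minimal sketch; `eureka_command` is a hypothetical helper, and its defaults mirror the documented values (16 samples, 5 iterations). It only builds the argument list and does not launch anything.

```python
import shlex


def eureka_command(environment, num_iterations=5, num_samples=16, model=None):
    """Build the argument list for one Eureka run (hypothetical helper)."""
    args = [
        "python",
        "eureka.py",
        f"env={environment}",
        f"iteration={num_iterations}",
        f"sample={num_samples}",
    ]
    if model is not None:
        # Only pass `model=` when the caller overrides the configured default.
        args.append(f"model={model}")
    return args


cmd = eureka_command("shadow_hand", num_iterations=2, num_samples=4, model="gpt-4-0314")
print(shlex.join(cmd))
# prints: python eureka.py env=shadow_hand iteration=2 sample=4 model=gpt-4-0314
```

The returned list can be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to the eureka directory.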

Each run creates a timestamped folder in eureka/outputs that stores the Eureka log as well as all intermediate reward functions and their associated policies.
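
To locate the most recent run from a script, one option is to pick the most recently modified subfolder. This is a sketch under the assumption that run folders sit directly under the directory you pass in; if your outputs are nested one level deeper, point it at that level instead. `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path


def latest_run_dir(outputs_root):
    """Return the most recently modified subdirectory, or None if there is none.

    Assumes each run lives in its own folder directly under `outputs_root`.
    """
    run_dirs = [p for p in Path(outputs_root).iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    return max(run_dirs, key=lambda p: p.stat().st_mtime)
```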

Other command line parameters can be found in eureka/cfg/config.yaml. The list of supported environments can be found in eureka/cfg/env.

Eureka Pen Spinning Demo

We have released the Eureka pen-spinning policy in isaacgymenvs/isaacgymenvs/checkpoints. Try visualizing it with the following command:

    cd isaacgymenvs/isaacgymenvs
    python train.py test=True headless=False force_render=True task=ShadowHandSpin checkpoint=checkpoints/EurekaPenSpinning.pth

Running Eureka on a New Environment

  1. Create a new IsaacGym environment; instructions can be found here.
  2. Verify that standard RL works for your new environment:

    cd isaacgymenvs/isaacgymenvs
    python train.py task=YOUR_NEW_TASK

  3. Create a new YAML file your_new_task.yaml in eureka/cfg/env:

    env_name: your_new_task
    task: YOUR_NEW_TASK
    description: ...

  4. Construct the raw environment code that will serve as context for Eureka, as well as the skeleton environment code to which the Eureka reward will be appended:

    cd eureka/utils
    python prune_env.py your_new_task

  5. Try out Eureka!

    python eureka.py env=your_new_task
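
If you add several environments, the YAML file from the steps above can be generated by a small script. This is a minimal sketch: `write_env_config` is a hypothetical helper, it writes only the three keys shown above, and it quotes the description with `json.dumps` on the assumption that a JSON string is an acceptable double-quoted YAML scalar for the config loader.

```python
import json
from pathlib import Path


def write_env_config(cfg_dir, env_name, task, description):
    """Write <env_name>.yaml with the env_name, task, and description keys."""
    path = Path(cfg_dir) / f"{env_name}.yaml"
    lines = [
        f"env_name: {env_name}",
        f"task: {task}",
        # Quote the free-text description so colons or '#' do not break parsing.
        f"description: {json.dumps(description)}",
    ]
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, `write_env_config("eureka/cfg/env", "your_new_task", "YOUR_NEW_TASK", "...")` creates eureka/cfg/env/your_new_task.yaml; replace the description placeholder with your own task description.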

Acknowledgement

We thank the following open-sourced projects:

License

This codebase is released under MIT License.

Citation

If you find our work useful, please consider citing us!

@article{ma2023eureka,
  title   = {Eureka: Human-Level Reward Design via Coding Large Language Models},
  author  = {Yecheng Jason Ma and William Liang and Guanzhi Wang and De-An Huang and Osbert Bastani and Dinesh Jayaraman and Yuke Zhu and Linxi Fan and Anima Anandkumar},
  year    = {2023},
  journal = {arXiv preprint arXiv:2310.12931}
}

Disclaimer: This project is strictly for research purposes, and not an official product from NVIDIA.
