eureka-research / eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)

Home Page: https://eureka-research.github.io/

License: MIT License

HTML 0.04% Batchfile 0.01% Python 18.35% CMake 0.01% Jupyter Notebook 81.61%

eureka's People

Contributors

jasonma2016

eureka's Issues

The reward function "Sparse"

Hi, I cannot find the baseline reward function "Sparse" in the code. I also wonder how to calculate the Human Normalized Scores, as discussed in Issue #22.
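For what it's worth, here is a minimal sketch of how a human normalized score could be computed. The formula (method - sparse) / |human - sparse| is my assumption about the paper's normalization and should be verified against the paper and Issue #22; it is not an answer from the maintainers.

def human_normalized_score(method: float, human: float, sparse: float) -> float:
    # Assumed definition: 0.0 means no better than the sparse reward,
    # 1.0 means on par with the human-engineered reward.
    return (method - sparse) / abs(human - sparse)

# Hypothetical example values:
print(human_normalized_score(method=7.5, human=6.0, sparse=2.0))  # 1.375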

No module named fbx

Hi, congratulations on your amazing work. When I was trying to install IsaacGymEnvs, I ran into the issue that there is no module named fbx, and when I tried to solve it via https://aps.autodesk.com/developer/overview/fbx-sdk it seems there is no wheel file at all. Then everything gets stuck. Does anyone know why?

Thank you!

(eureka) yu@yu-G470:~/project/Eureka$ python test.py
Importing module 'gym_38' (/home/yu/project/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/yu/project/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
/home/yu/anaconda3/envs/eureka/lib/python3.8/site-packages/torch/utils/cpp_extension.py:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import packaging  # type: ignore[attr-defined]
/home/yu/anaconda3/envs/eureka/lib/python3.8/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/home/yu/anaconda3/envs/eureka/lib/python3.8/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
PyTorch version 2.1.1+cu121
Device count 1
/home/yu/project/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/yu/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Emitting ninja build file /home/yu/.cache/torch_extensions/py38_cu121/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
2023-11-19 21:37:46,282 - INFO - logger - logger initialized
<unknown>:3: DeprecationWarning: invalid escape sequence \*
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx'
FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    envs = isaacgymenvs.make(
  File "/home/yu/project/Eureka/isaacgymenvs/isaacgymenvs/__init__.py", line 29, in make
    env_path = cfg.env_path
AttributeError: 'NoneType' object has no attribute 'env_path'
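For comparison, upstream IsaacGymEnvs documents creating environments roughly as in the sketch below. In the Eureka fork, the traceback shows make() also dereferencing a global Hydra config (cfg.env_path), so the fork may require extra setup that this snippet does not cover; the task name and device strings here are purely illustrative.

# Sketch based on the upstream IsaacGymEnvs README, not on the Eureka fork's make().
import isaacgym        # IsaacGym must be imported before torch
import isaacgymenvs
import torch

envs = isaacgymenvs.make(
    seed=0,
    task="ShadowHand",   # illustrative task name
    num_envs=16,
    sim_device="cuda:0",
    rl_device="cuda:0",
    headless=True,
)
print(envs.observation_space, envs.action_space)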

subprocess.Popen waits forever

I was running eureka.py and got this output:

python eureka.py env=shadow_hand sample=4 iteration=2 model=gpt-4-0314
[2023-10-24 23:13:24,730][root][INFO] - Workspace: /home/123/isaacgym/python/Eureka/eureka/outputs/eureka/2023-10-24_23-13-24
[2023-10-24 23:13:24,730][root][INFO] - Project Root: /home/123/isaacgym/python/Eureka/eureka
[2023-10-24 23:13:24,730][root][INFO] - Using LLM: gpt-4-0314
[2023-10-24 23:13:24,730][root][INFO] - Task: ShadowHand
[2023-10-24 23:13:24,730][root][INFO] - Task description: to make the shadow hand spin the object to a target orientation
[2023-10-24 23:13:24,758][root][INFO] - Iteration 0: Generating 4 samples with gpt-4-0314
[2023-10-24 23:14:01,699][root][INFO] - Iteration 0: Prompt Tokens: 1735, Completion Tokens: 1254, Total Tokens: 2989
[2023-10-24 23:14:01,699][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-24 23:14:13,181][root][INFO] - Iteration 0: Code Run 0 successfully training!
[2023-10-24 23:14:13,181][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-24 23:14:26,009][root][INFO] - Iteration 0: Code Run 1 successfully training!
[2023-10-24 23:14:26,009][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-24 23:14:26,009][root][INFO] - Iteration 0: Code Run 2 cannot parse function signature!
[2023-10-24 23:14:26,009][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-24 23:14:40,169][root][INFO] - Iteration 0: Code Run 3 successfully training!

Then the process got stuck at rl_run.communicate() and didn't move forward anymore, with this traceback:
Traceback (most recent call last):
  File "eureka.py", line 397, in <module>
    main()
  File "/home/.local/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/.local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/.local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/.local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/.local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/.local/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 119, in run
    ret = run_job(
  File "/home/.local/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "eureka.py", line 211, in main
    rl_run.communicate()
  File "/home/anaconda3/envs/HarryVae/lib/python3.8/subprocess.py", line 1020, in communicate
    self.wait()
  File "/home/anaconda3/envs/HarryVae/lib/python3.8/subprocess.py", line 1083, in wait
    return self._wait(timeout=timeout)
  File "/home/anaconda3/envs/HarryVae/lib/python3.8/subprocess.py", line 1822, in _wait
    (pid, sts) = self._try_wait(0)
  File "/home/anaconda3/envs/HarryVae/lib/python3.8/subprocess.py", line 1780, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
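For what it's worth, communicate() simply blocks until the child process exits, so a long RL training run can look like a hang even when it is progressing. Below is a hedged sketch of a more transparent pattern (redirect the child's output to a file and poll it); the command and file names are hypothetical and nothing here is taken from how eureka.py actually launches training.

import subprocess, time

# Illustrative only: launch a long-running training command and watch it
# instead of blocking on communicate().
with open("rl_run.log", "w") as logfile:
    proc = subprocess.Popen(
        ["python", "train.py", "task=ShadowHandGPT"],   # hypothetical command
        stdout=logfile, stderr=subprocess.STDOUT,
    )
    while proc.poll() is None:          # None means the child is still running
        time.sleep(30)
        with open("rl_run.log") as f:
            tail = f.readlines()[-5:]   # peek at recent progress
        print("".join(tail), end="")
print("Training finished with return code", proc.returncode)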

Error with Eureka Pen Spinning Demo

Hello, I want to test Eureka with the given environment, but when I test the trained pen spinning demo there is an error:

JointSpec type free not yet supported!
JointSpec type free not yet supported!
JointSpec type free not yet supported!
[Error] [carb.gym.plugin] *** Could not create contact graph to compute collision filters! Are contacts specified properly?

JointSpec type free not yet supported!
JointSpec type free not yet supported!
[Error] [carb.gym.plugin] *** Could not create contact graph to compute collision filters! Are contacts specified properly?

{'observation_space': Box(-inf, inf, (211,), float32), 'action_space': Box(-1.0, 1.0, (20,), float32), 'agents': 1, 'value_size': 1}
build mlp: 211
RunningMeanStd: (1,)
RunningMeanStd: (211,)
=> loading checkpoint '/home/tuna831/eureka/isaacgymenvs/isaacgymenvs/checkpoints/EurekaPenSpinning.pth'
Segmentation fault (core dumped)

I tested other task environments, such as the humanoid, and they work fine.

How can I solve this error?

An environment issue

I followed the Git workflow to set up the entire environment and executed this command:
python eureka.py env=shadow_hand sample=16 iteration=2 model=gpt-3.5-turbo (In order to increase the success rate, a sample size of 16 was set.)

I've received the following log:
[2023-10-26 15:58:07,197][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-26 15:58:07,198][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-26 15:58:36,157][root][INFO] - Iteration 0: Code Run 1 execution error!
[2023-10-26 15:58:36,157][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-26 15:59:07,472][root][INFO] - Iteration 0: Code Run 2 execution error!
[2023-10-26 15:59:07,473][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-26 15:59:40,148][root][INFO] - Iteration 0: Code Run 3 execution error!
[2023-10-26 15:59:40,149][root][INFO] - Iteration 0: Processing Code Run 4
[2023-10-26 16:00:05,806][root][INFO] - Iteration 0: Code Run 4 execution error!
[2023-10-26 16:00:05,806][root][INFO] - Iteration 0: Processing Code Run 5
[2023-10-26 16:00:35,364][root][INFO] - Iteration 0: Code Run 5 execution error!
[2023-10-26 16:00:35,364][root][INFO] - Iteration 0: Processing Code Run 6
[2023-10-26 16:01:05,237][root][INFO] - Iteration 0: Code Run 6 execution error!
[2023-10-26 16:01:05,238][root][INFO] - Iteration 0: Processing Code Run 7
[2023-10-26 16:01:36,413][root][INFO] - Iteration 0: Code Run 7 execution error!
[2023-10-26 16:01:36,414][root][INFO] - Iteration 0: Processing Code Run 8
[2023-10-26 16:02:02,988][root][INFO] - Iteration 0: Code Run 8 execution error!
[2023-10-26 16:02:02,988][root][INFO] - Iteration 0: Processing Code Run 9
[2023-10-26 16:02:30,920][root][INFO] - Iteration 0: Code Run 9 execution error!
[2023-10-26 16:02:30,920][root][INFO] - Iteration 0: Processing Code Run 10
[2023-10-26 16:03:06,019][root][INFO] - Iteration 0: Code Run 10 execution error!
[2023-10-26 16:03:06,020][root][INFO] - Iteration 0: Processing Code Run 11
[2023-10-26 16:03:33,438][root][INFO] - Iteration 0: Code Run 11 execution error!
[2023-10-26 16:03:33,439][root][INFO] - Iteration 0: Processing Code Run 12
[2023-10-26 16:04:02,910][root][INFO] - Iteration 0: Code Run 12 execution error!
[2023-10-26 16:04:02,911][root][INFO] - Iteration 0: Processing Code Run 13
[2023-10-26 16:04:33,260][root][INFO] - Iteration 0: Code Run 13 execution error!
[2023-10-26 16:04:33,260][root][INFO] - Iteration 0: Processing Code Run 14
[2023-10-26 16:05:00,725][root][INFO] - Iteration 0: Code Run 14 execution error!
[2023-10-26 16:05:00,726][root][INFO] - Iteration 0: Processing Code Run 15
[2023-10-26 16:05:32,530][root][INFO] - Iteration 0: Code Run 15 execution error!

When I check each failed log, they all show the same error. Have you encountered this issue before?
Traceback (most recent call last):
  File "/homeb/yulong/Eureka/Eureka-main/eureka/../isaacgymenvs/isaacgymenvs/train.py", line 214, in <module>
    launch_rlg_hydra()
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/yulong/miniconda3/envs/eureka_v2/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/homeb/yulong/Eureka/Eureka-main/eureka/../isaacgymenvs/isaacgymenvs/train.py", line 203, in launch_rlg_hydra
    statistics = runner.run({
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/torch_runner.py", line 124, in run
    self.run_train(args)
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/torch_runner.py", line 101, in run_train
    self.agent.train()
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/common/a2c_common.py", line 1251, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/common/a2c_common.py", line 1115, in train_epoch
    batch_dict = self.play_steps()
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/common/a2c_common.py", line 686, in play_steps
    self.obs, rewards, self.dones, infos = self.env_step(res_dict['actions'])
  File "/homeb/yulong/Eureka/Eureka-main/rl_games/rl_games/common/a2c_common.py", line 504, in env_step
    obs, rewards, dones, infos = self.vec_env.step(actions)
  File "/homeb/yulong/Eureka/Eureka-main/isaacgymenvs/isaacgymenvs/utils/rlgames_utils.py", line 256, in step
    return self.env.step(actions)
  File "/homeb/yulong/Eureka/Eureka-main/isaacgymenvs/isaacgymenvs/tasks/base/vec_task.py", line 355, in step
    self.post_physics_step()
  File "/homeb/yulong/Eureka/Eureka-main/isaacgymenvs/isaacgymenvs/tasks/shadow_handgpt.py", line 653, in post_physics_step
  File "/homeb/yulong/Eureka/Eureka-main/isaacgymenvs/isaacgymenvs/tasks/shadow_handgpt.py", line 370, in compute_reward
AttributeError: 'ShadowHandGPT' object has no attribute 'target_orientation'

HumanoidGPT environment error

Thank you for releasing this work! I was trying to run the humanoid example provided in the README, but consistently got this error:

Error executing job with overrides: ['task=HumanoidGPT', 'wandb_activate=False', 'wandb_entity=', 'wandb_project=', 'headless=True', 'capture_video=False', 'force_render=False', 'max_iterations=3000']
Traceback (most recent call last):
  File "/home/ubuntu/Eureka/eureka/../isaacgymenvs/isaacgymenvs/train.py", line 203, in launch_rlg_hydra
    statistics = runner.run({
  File "/home/ubuntu/Eureka/rl_games/rl_games/torch_runner.py", line 124, in run
    self.run_train(args)
  File "/home/ubuntu/Eureka/rl_games/rl_games/torch_runner.py", line 101, in run_train
    self.agent.train()
  File "/home/ubuntu/Eureka/rl_games/rl_games/common/a2c_common.py", line 1251, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/home/ubuntu/Eureka/rl_games/rl_games/common/a2c_common.py", line 1115, in train_epoch
    batch_dict = self.play_steps()
  File "/home/ubuntu/Eureka/rl_games/rl_games/common/a2c_common.py", line 686, in play_steps
    self.obs, rewards, self.dones, infos = self.env_step(res_dict['actions'])
  File "/home/ubuntu/Eureka/rl_games/rl_games/common/a2c_common.py", line 504, in env_step
    obs, rewards, dones, infos = self.vec_env.step(actions)
  File "/home/ubuntu/Eureka/isaacgymenvs/isaacgymenvs/utils/rlgames_utils.py", line 256, in step
    return  self.env.step(actions)
  File "/home/ubuntu/Eureka/isaacgymenvs/isaacgymenvs/tasks/base/vec_task.py", line 355, in step
    self.post_physics_step()
  File "/home/ubuntu/Eureka/isaacgymenvs/isaacgymenvs/tasks/humanoidgpt.py", line 267, in post_physics_step
    self.compute_reward(self.actions)
  File "/home/ubuntu/Eureka/isaacgymenvs/isaacgymenvs/tasks/humanoidgpt.py", line 187, in compute_reward
    self.rew_buf[:], self.rew_dict = compute_reward(self.velocity, self.max_velocity, self.vel_scale)
AttributeError: 'HumanoidGPT' object has no attribute 'velocity'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I could get at least one piece of executable code for the shadow hand example, but not for the humanoid. I tried to find where velocity is defined in the environment (before it is referenced) and couldn't find it. Is this a bug in the humanoid environment? Thank you!
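For context on why these execution errors happen: the generated reward sometimes references attributes (here self.velocity) that the environment never defines. Below is a minimal, hedged sketch of a reward that instead derives forward velocity from the root-state tensor that the standard IsaacGym humanoid task exposes; the root_states layout ([pos(3), quat(4), lin_vel(3), ang_vel(3)]) and the compute_reward signature are assumptions based on upstream IsaacGymEnvs, not code from this repository.

from typing import Tuple, Dict
import torch

@torch.jit.script
def compute_reward(root_states: torch.Tensor,
                   target_velocity: float) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
    # Assumed root_states layout: [pos(3), quat(4), lin_vel(3), ang_vel(3)]
    forward_vel = root_states[:, 7]  # x component of the linear velocity
    vel_reward = torch.exp(-torch.square(forward_vel - target_velocity))
    return vel_reward, {"vel_reward": vel_reward}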

Error executing job with overrides: ['env=humanoid', 'sample=16', 'iteration=5', 'model=gpt-3.5-turbo-16k-0613']

Hi, I'm getting this error when I try to follow the Getting Started instructions. Any tips on how to solve this?

(eureka) kamikazeknoedel@TechnoMage:~/Dokumente/Eureka/isaacgym/python/Eureka/eureka$ python eureka.py env=humanoid sample=16 iteration=5 model=gpt-3.5-turbo-16k-0613
[2024-05-09 14:15:10,130][root][INFO] - Workspace: /home/kamikazeknoedel/Dokumente/Eureka/isaacgym/python/Eureka/eureka/outputs/eureka/2024-05-09_14-15-10
[2024-05-09 14:15:10,130][root][INFO] - Project Root: /home/kamikazeknoedel/Dokumente/Eureka/isaacgym/python/Eureka/eureka
[2024-05-09 14:15:10,130][root][INFO] - Using LLM: gpt-3.5-turbo-16k-0613
[2024-05-09 14:15:10,130][root][INFO] - Task: Humanoid
[2024-05-09 14:15:10,130][root][INFO] - Task description: to make the humanoid run as fast as possible
[2024-05-09 14:15:10,147][root][INFO] - Iteration 0: Generating 16 samples with gpt-3.5-turbo-16k-0613
[2024-05-09 14:15:16,864][httpx][INFO] - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-05-09 14:15:17,131][root][INFO] - Iteration 0: Prompt Tokens: 1085, Completion Tokens: 3129, Total Tokens: 4214
Error executing job with overrides: ['env=humanoid', 'sample=16', 'iteration=5', 'model=gpt-3.5-turbo-16k-0613']
Traceback (most recent call last):
  File "eureka.py", line 122, in main
    response_cur = responses[response_id]["message"]["content"]
TypeError: 'Choice' object is not subscriptable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
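A likely cause, offered as an assumption rather than a confirmed answer: eureka.py indexes the API response as a dictionary, which matches the pre-1.0 openai package, while openai>=1.0 returns typed objects. Below is a minimal sketch of the attribute-style access used by the newer client; pinning openai<1.0 to match the original code is the other obvious workaround.

# Sketch for openai>=1.0, where choices are objects rather than dicts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo-16k-0613",
    messages=[{"role": "user", "content": "Say hello"}],
    n=4,
)
for choice in resp.choices:
    content = choice.message.content  # was responses[i]["message"]["content"] pre-1.0
    print(content)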

Asking for the release of best K generated code samples on IsaacGym and Dexterity

Dear Eureka project members,

I really appreciate Eureka, which shows the ability to design a better reward function via an LLM. Since running your experiments is extremely expensive and I haven't found those code samples in your repository, I am wondering if your team could release your best-K generated code samples on IsaacGym and Dexterity so that I can conduct broader exploration.

Best,
Chi-Chang Lee

Request for Information on Prompts Used in Curriculum Learning for Pen Spinning Task

This is very interesting research. Thank you for making it public. I ran python eureka.py with the following settings and conducted training and testing. I got the results shown in the figure and video below.

Additionally, I am attempting to replicate the impressive Pen Spinning video that you have shared. You mentioned in your paper and on the project page that you utilized Curriculum Learning.

The pen spinning task requires a Shadow Hand to continuously rotate a pen to achieve some pre-defined spinning patterns for as many cycles as possible. We solve this task by (1) instructing Eureka to generate a reward function for re-orienting the pen to random target configurations, and then (2) fine-tuning this pre-trained policy using the Eureka reward to reach the desired sequence of pen-spinning configurations. As shown, Eureka fine-tuning quickly adapts the policy to successfully spin the pen for many cycles in a row. In contrast, neither pre-trained nor learning-from-scratch policies can complete even a single cycle.

If you don't mind, could you please share the prompts you used during the (1) pre-training and (2) fine-tuning stages of Curriculum Learning?

defaults:
  - _self_
  - env: shadow_hand
  - override hydra/launcher: local
  - override hydra/output: local

hydra:
  job:
    chdir: True

# LLM parameters
model: gpt-4-0314  # LLM model (other options are gpt-4, gpt-4-0613, gpt-3.5-turbo-16k-0613)
temperature: 1.0
suffix: GPT  # suffix for generated files (indicates LLM model)

# Eureka parameters
iteration: 5 # how many iterations of Eureka to run
sample: 16 # number of Eureka samples to generate per iteration
max_iterations: 3000 # RL Policy training iterations (decrease this to make the feedback loop faster)
num_eval: 5 # number of evaluation episodes to run for the final reward
capture_video: False # whether to capture policy rollout videos

# Weights and Biases
use_wandb: False # whether to use wandb for logging
wandb_username: "" # wandb username if logging with wandb
wandb_project: "" # wandb project if logging with wandb
task: ShadowHand
env_name: shadow_hand
description: to make the shadow hand spin the object to a target orientation 

[Attached video: rl-video-step-0.mp4]

IndexError: list index out of range

Hello, thank you very much for your work. When I run "python eureka.py env=humanoid sample=16 iteration=5 model=gpt-3.5-turbo-16k-0613", I get "IndexError: list index out of range". What should I do to run it correctly?

Error from Eureka Pen Spinning Demo

Tried
cd isaacgymenvs/isaacgymenvs
python train.py test=True headless=False force_render=True task=ShadowHandSpin checkpoint=checkpoints/EurekaPenSpinning.pth
and I get

Importing module 'gym_38' (/home/ram/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/ram/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.0.0+cu117
Device count 1
/home/ram/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/ram/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Emitting ninja build file /home/ram/.cache/torch_extensions/py38_cu117/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
2023-10-21 20:22:16,606 - INFO - logger - logger initialized
<unknown>:3: DeprecationWarning: invalid escape sequence \*
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx'
FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
train.py:75: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_name="config", config_path="./cfg")
/home/ram/anaconda3/envs/eureka/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/defaults_list_override for more information.

deprecation_warning(msg)
Traceback (most recent call last):
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data2/jasonyma/isaac_gpt/train/2023-10-21_20-22-17'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data2/jasonyma/isaac_gpt/train'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data2/jasonyma/isaac_gpt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data2/jasonyma'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  [Previous line repeated 1 more time]
  File "/home/ram/anaconda3/envs/eureka/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/data2'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
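A possible workaround, under the assumption that the failing path comes from a run directory left hardcoded in the training config's Hydra settings (the traceback is pathlib trying to create /data2/jasonyma/... before training starts): override the run directory on the command line, for example

python train.py test=True headless=False force_render=True task=ShadowHandSpin checkpoint=checkpoints/EurekaPenSpinning.pth hydra.run.dir=./outputs/pen_spin_test

hydra.run.dir is a standard Hydra override; the exact config key this repository uses may differ.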

Getting an error from the example given on the Getting Started page

When I ran
python eureka.py env=shadow_hand sample=4 iteration=2 model=gpt-4-0314
I got the following output:

[2023-10-21 20:06:04,835][root][INFO] - Workspace: /home/ram/Eureka/eureka/outputs/eureka/2023-10-21_20-06-04
[2023-10-21 20:06:04,835][root][INFO] - Project Root: /home/ram/Eureka/eureka
[2023-10-21 20:06:04,835][root][INFO] - Using LLM: gpt-4-0314
[2023-10-21 20:06:04,835][root][INFO] - Task: ShadowHand
[2023-10-21 20:06:04,836][root][INFO] - Task description: to make the shadow hand spin the object to a target orientation
[2023-10-21 20:06:04,868][root][INFO] - Iteration 0: Generating 4 samples with gpt-4-0314
[2023-10-21 20:07:02,853][root][INFO] - Iteration 0: Prompt Tokens: 1735, Completion Tokens: 1992, Total Tokens: 3727
[2023-10-21 20:07:02,854][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-21 20:07:22,595][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-21 20:07:22,595][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-21 20:07:42,063][root][INFO] - Iteration 0: Code Run 1 execution error!
[2023-10-21 20:07:42,063][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-21 20:08:01,523][root][INFO] - Iteration 0: Code Run 2 execution error!
[2023-10-21 20:08:01,524][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-21 20:08:20,803][root][INFO] - Iteration 0: Code Run 3 execution error!
[2023-10-21 20:08:22,234][root][INFO] - All code generation failed! Repeat this iteration from the current message checkpoint!
[2023-10-21 20:08:22,235][root][INFO] - Iteration 1: Generating 4 samples with gpt-4-0314
[2023-10-21 20:08:48,630][root][INFO] - Iteration 1: Prompt Tokens: 1735, Completion Tokens: 1432, Total Tokens: 3167
[2023-10-21 20:08:48,631][root][INFO] - Iteration 1: Processing Code Run 0
[2023-10-21 20:09:05,350][root][INFO] - Iteration 1: Code Run 0 execution error!
[2023-10-21 20:09:05,350][root][INFO] - Iteration 1: Processing Code Run 1
[2023-10-21 20:09:25,084][root][INFO] - Iteration 1: Code Run 1 execution error!
[2023-10-21 20:09:25,084][root][INFO] - Iteration 1: Processing Code Run 2
[2023-10-21 20:09:44,618][root][INFO] - Iteration 1: Code Run 2 execution error!
[2023-10-21 20:09:44,618][root][INFO] - Iteration 1: Processing Code Run 3
[2023-10-21 20:10:03,932][root][INFO] - Iteration 1: Code Run 3 execution error!
[2023-10-21 20:10:05,409][root][INFO] - All code generation failed! Repeat this iteration from the current message checkpoint!
[2023-10-21 20:10:05,409][root][INFO] - All iterations of code generation failed, aborting...
[2023-10-21 20:10:05,409][root][INFO] - Please double check the output env_iter*_response*.txt files for repeating errors!

Application to a different environment and reward type

Can this reward design algorithm be applied to a different application setup?
For instance, could it be used in a custom environment that simulates a network with nodes and users, where an RL algorithm performs some kind of network optimization? In that case, the environment code would be fed to the LLM agent and the task would be described in natural language. Any intuition on how this could be implemented?

The entire procedure of the experiment

How can I provide feedback so that the system modifies the reward function based on human judgement? Also, it does not seem possible to enable capture_video while training through eureka.py; when I tried, it got stuck.

Unfair comparison! Human designed reward in the source code

In the environment source code, the function compute_success provides the human-engineered reward function, and it is added to the system message. So the reward generation is not zero-shot anymore, because GPT-4 can read the human reward function and optimize on top of it.

Thoughts and follow-up questions after reading the paper

Hi,
Today I was reading your paper, and I have a few questions:

  1. The model is frozen after training by Eureka (along with its GPT-4 API). When you deploy this model on a robot, does it need access to the GPT-4 API to do inference and change its behavior as the real-world environment changes?
  2. How do we incorporate explainability into Eureka if we want to understand why a specific policy was chosen?
  3. How easy is it to incorporate multi-sensor data into robot perception to affect the robot's actions?
  4. How easy is it to scale Eureka to cooperative multi-agent (robot) systems?

Questions about the performance bars on IsaacGym

Dear Eureka project members,

I really appreciate Eureka, which shows the ability to design a better reward function via an LLM. I have two questions about your bar results:

  1. How do you handle the case where the sparse scheme is better than the human design? I ran the experiment on the Quadcopter task, and without considering the survival condition, the sparse scheme is much better than the human design (your bar results seem to differ from mine).
  2. Is the number in the FrankaCabinet results a typo? The bar labeled 12x appears to be the same height as the bar labeled 2x.

Also, some tasks show a spike in success performance when their survival conditions are unstable, which might cause an unfair comparison, for example the FrankaCabinet and Anymal tasks. Perhaps it would be better to count survival as part of the performance?

best,
Chi-Chang Lee

Error with example code

I want to test the given example, but there is an error.

(eureka) js@js:~/isaacgym/python/Eureka/eureka$ python eureka.py env=shadow_hand sample=4 iteration=2 model=gpt-4-0314
Traceback (most recent call last):
  File "eureka.py", line 23, in <module>
    @hydra.main(config_path="cfg", config_name="config", version_base="1.3")
TypeError: main() got an unexpected keyword argument 'version_base'
What is the solution to this problem?
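A hedged guess at the cause: the version_base keyword was only added to hydra.main() in Hydra 1.2, so an older hydra-core raises this TypeError. A quick check follows; the version threshold is an assumption based on the Hydra changelog, not on this repository's pinned requirements.

import hydra

print(hydra.__version__)
# If this prints 1.1.x or older, try upgrading, e.g.:
#   pip install --upgrade "hydra-core>=1.3"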

Question about "Reward Generation Problem". Having env ground truth for RL seems weird.

Hi, my name is Ce Hao. I really appreciate that Eureka can generate a reward function via an LLM.
You formulate this problem as a "reward generation problem", and I agree with this point.

However, I have a question about the problem formulation: you assume access to the source code of the environment and then apply RL algorithms. Usually, the ground truth of the environment dynamics is not provided to RL methods, and the environment model is just a black box for the RL agent; that is why we rely on RL to explore the environment.

So in a real-world environment, where access to source code is definitely infeasible, how can we use Eureka to generate a suitable reward function? Maybe it is a good future direction. Thanks.

Questions regarding the use of Eureka for a real robot

Hi,

I have the following questions regarding the use of Eureka for a real robot:

  1. Which files are the best place to add real-robot SDK calls to drive a real robot?
  2. Are there any plans to integrate an on-device LLM such as Llama, instead of cloud access to GPT-4 or 3.5, to enable near-real-time robot control?
  3. Optionally, are there any plans to add a four-legged robot similar to Spot in the simulator?

Release of ckpt for all tasks

Very nice work, Jason!

I want to try the provided tasks (especially the dexterous hand tasks). Could the checkpoints be provided? Currently it seems that only the checkpoint for one task (pen spinning) is available.

Thank you so much!

Yanjie Ze

Adaptation for Isaac SIM

Thanks for your great work! Have you tested this kit on Isaac Sim? Considering Isaac Sim's large user base, I believe your project would have more impact if adapted to the Isaac Sim platform.

Error building wheels for numpy

Hi, I'm getting this error when I try to do step 3 (Install Eureka). I'm not installing IsaacGym because I'm looking to use Eureka on my custom RL environment. Any tips on how to work around this?

note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for numpy
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144555 sha256=d5cef593d7513ad6f92861fc756c70fda5dbd5e3e9fd54b965bc4a39a6256068
  Stored in directory: /home/bdiu/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
  Building wheel for lit (pyproject.toml) ... done
  Created wheel for lit: filename=lit-17.0.6-py3-none-any.whl size=93255 sha256=8fe2ac33a1cdc8025cbb6b833c4d09f6834ed03cba7df1e8286b5afad736dc49
  Stored in directory: /home/bdiu/.cache/pip/wheels/30/dd/04/47d42976a6a86dc2ab66d7518621ae96f43452c8841d74758a
Successfully built gym antlr4-python3-runtime lit
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects

Please advise on normalizing observations

I wrote my custom task, but the second output from my actor network is all NaNs and everything crashes. When I generate fake "observations" from normal distributions everything works. Can someone advise me how to solve this problem?
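rl_games normalizes observations with a running mean/std (see the RunningMeanStd lines in the logs earlier on this page), and unnormalized observations with wildly different scales are a common source of NaNs in custom tasks. Below is a minimal, hedged sketch of that technique with clipping; the class and variable names are illustrative and not taken from this repository. It is also worth checking the raw observations themselves for NaN/Inf (e.g. with torch.nan_to_num) before normalizing.

import torch

class RunningMeanStd:
    # Tracks a running mean/variance and normalizes observations (illustrative sketch).
    def __init__(self, shape, epsilon: float = 1e-4, clip: float = 5.0, device: str = "cpu"):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = epsilon
        self.clip = clip

    def update(self, x: torch.Tensor) -> None:
        batch_mean = x.mean(dim=0)
        batch_var = x.var(dim=0, unbiased=False)
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Parallel mean/variance update (Chan et al.)
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta.pow(2) * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        x = (x - self.mean) / torch.sqrt(self.var + 1e-8)
        return torch.clamp(x, -self.clip, self.clip)  # clipping keeps outliers from blowing up the MLP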
