
NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"

License: MIT License

Topics: compositionality, embodied-ai, language-grounding, robotic-manipulation, vision-and-language


VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

[Figure: overview of the VLMbench manipulation tasks]

VLMbench is a robotics manipulation benchmark that contains diverse language instructions over categorized robotic manipulation tasks. In this work, we aim to fill in the last mile of embodied agents---object manipulation by following human guidance, e.g., “move the red mug next to the box while keeping it upright.” VLMbench is the first benchmark with compositional designs for vision-and-language reasoning about manipulation, and it categorizes manipulation tasks from the perspective of task constraints. Meanwhile, we introduce an Automatic Manipulation Solver (AMSolver), in which modular rule-based task templates are created to automatically generate robot demonstrations with language instructions, covering diverse object shapes and appearances, action types, and motion constraints. Click here for the website and paper.

This repo includes the implementations of AMSolver, VLMbench, and 6D-CLIPort.

News

03/04/2023

  • More starter code has been added under the examples folder!

09/16/2022

  • The work has been accepted by NeurIPS 2022 (Datasets and Benchmarks Track)!

AMSolver Install

Users can use AMSolver to run the current tasks in VLMbench or to build new tasks. To run AMSolver, you should install CoppeliaSim 4.1.0 and PyRep first. Then, let's install AMSolver:

pip install -r requirements.txt
pip install -r cliport/requirements.txt # Not needed if you don't run 6D-CLIPort
pip install -e .

Then, copy simAddOnScript_PyRep.lua from the current folder into the CoppeliaSim folder:

cp ./simAddOnScript_PyRep.lua /Path/To/CoppeliaSim

Note that this file is overwritten whenever you reinstall PyRep, so you should copy it again after each reinstall.

Running in headless server

To render observations on headless servers, users need to start an X server (Xorg). First, ensure that the Nvidia driver is properly installed. Then, run the following commands:

screen
python ./startx.py 0 # The id of the DISPLAY

Exit the screen session (CTRL+A, D). Any other commands should be run in different sessions/terminals.

Now, you should find that an X server is running on each GPU. To render the application with the first GPU, add the following command before running any other Python code:

export DISPLAY=:0.0 # Keep the first number the same as the startx argument; the second number is the id of your GPU
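Equivalently, the same selection can be done from inside a Python script, as long as it happens before the simulator is imported. Below is a minimal sketch (a workflow assumption, not part of the original instructions); the display/GPU numbering follows the comment above:

# Minimal sketch: select the X server started by startx.py before importing the simulator.
import os

os.environ["DISPLAY"] = ":0.0"  # first number = startx argument, second number = GPU id

# Import pyrep / amsolver / vlmbench modules only after DISPLAY is set.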

VLMbench Baselines

The precollected full dataset can be found here: Dataset. A smaller sample dataset can be found here: Sample Dataset. The dataset is under the CC BY 4.0 license.

We also provide a script to automatically download the dataset using gdrive.

bash ./download_dataset.sh -s /Save/Path/For/Dataset -p Dataset_split -t Tasks
# bash ./download_dataset.sh -h for more help on arguments

The pretrained models of all baselines can be found here: Model

To test pretrained 6D-CLIPort models:

python vlm/scripts/cliport_test.py --task TASK_TO_TEST --data_folder /Path/to/VLMbench/Dataset/test --checkpoints_folder /Path/to/Pretrained/Models

To train new 6D-CLIPort models:

python vlm/scripts/train_baselines.py --data_dir /Path/to/VLMbench/Dataset --train_tasks TASK_NEED_TO_TRAIN

Examples

We provide several examples to help you get started; please check the code under the examples folder. gym_test.py shows how to run VLMbench as a gym environment; ensure you have gym installed (pip install gymnasium). A rough interaction sketch is shown below.
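For orientation, a standard gymnasium interaction loop looks like the sketch below. The environment id "vlmbench/pick-v0" used here is a hypothetical placeholder; the actual environment construction and observation/action formats are defined in examples/gym_test.py.

# Minimal gymnasium-style loop; see examples/gym_test.py for how a VLMbench
# environment is actually constructed and stepped.
import gymnasium as gym

env = gym.make("vlmbench/pick-v0")  # hypothetical id, for illustration only

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()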

Generate Customized Demonstrations

To generate new demonstrations for training and validation, users can set the output data directory via the save_path parameter and run:

python tools/dataset_generator_NLP.py

Meanwhile, the test configurations can be generated by running:

python tools/test_config_generator.py

Add new objects and tasks

All object models are saved in vlm/object_models. To import new objects into VLMbench, users can use vlm/object_models/save_model.py. We recommend that users first save the object model as a CoppeliaSim model file (.ttm) and then use the extra_from_ttm function inside save_model.py. More examples can be found in save_model.py.

All task templates in the current VLMbench can be found in vlm/tasks. To generate new task templates, users can use tools/task_builder_NLP.py for basic task template generation. Then, the variations of a task can be written as child classes of the basic task template, as in the sketch below. For more details, refer to the code under vlm/tasks.
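For illustration, a task variation is typically a thin subclass that customizes the episode setup and then defers to the parent template. The sketch below is hypothetical: the class names and the attribute are assumptions (loosely modeled on vlm/tasks/drop_pen.py and vlm/tasks/drop_pen_color.py), so consult the actual files under vlm/tasks for the real interface.

# Hypothetical sketch of a task variation; class names and attributes are assumptions.
from vlm.tasks.drop_pen import DropPen  # assumed parent template class


class DropPenColor(DropPen):
    def init_episode(self, index):
        # Customize this variation here (e.g., choose which pen color this episode uses),
        # then defer to the parent template for scene setup and instruction generation.
        self.target_color_index = index  # hypothetical attribute, for illustration
        return super().init_episode(index)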

Citation

@inproceedings{
  zheng2022vlmbench,
  title={{VLM}bench: A Compositional Benchmark for Vision-and-Language Manipulation},
  author={Kaizhi Zheng and Xiaotong Chen and Odest Jenkins and Xin Eric Wang},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=NAYoSV3tk9}
}


vlmbench's Issues

Solutions about OOB problems of gdrive

Recently, the original gdrive tool stopped working due to Google's new policy. To solve this problem, please check the detailed discussion in this issue. Here, I just summarize the key points of one working solution:

  1. Follow the instructions to install the latest Go (at least 1.18).
  2. Clone another gdrive repo.
  3. Get your clientId and clientSecret by following the instructions in the repo.
  4. Edit handlers_drive.go to include your credentials.
  5. Execute compile to get the gdrive binary.
  6. Run gdrive about to authorize access to your Google Drive.

ImportError pyrep

Hello, I encountered an ImportError:

File "VLMbench/amsolver/__init__.py", line 5, in <module>
    import pyrep
ModuleNotFoundError: No module named 'pyrep'

Next I tried to install this package using pip. However, it raised another error:

File "VLMbench/amsolver/__init__.py", line 9, in <module>
    raise ImportError(
ImportError: PyRep version must be greater than 4.1.0.2. Please update PyRep.

The latest version of pyrep I can install is 3.2.0, and I am stuck here now :(

Evaluation issues

Hello, I checked your evaluation script cliport_test.py, but I am confused about how the beginning phase is handled.

TwoStreamClipLingUNetLatTransporterAgent.act takes in the observation and instruction at each step, but directly outputs the place action (skipping pick). So I wonder how this can work at the beginning of task evaluation, when nothing has been picked yet. More generally, this is a question about how you extend to an arbitrary number of steps (as claimed in the paper) with a two-stage agent (CLIPort): directly applying place only seems reasonable when pick can be ignored (e.g. the object is already picked).

By the way, I cannot access the waypoint information in the small sample dataset. Maybe the waypoints could provide some clues for the above question, but for now I have no idea.

waypoint_type etc. are None in waypoints_info in cliport_test

Hi VLMBench gurus,

I'm just trying to run the cliport_test script with the provided models.

I installed the code and dependencies, and downloaded the trained model and the seen and unseen splits of the data:
bash download_dataset.sh -s ~/vlmbench/Dataset -p valid_unseen -t pick
bash download_dataset.sh -s ~/vlmbench/Dataset -p valid_seen -t pick

When running cliport_test, the waypoint_type etc. fields are all None in the waypoints_info array.
python vlm/scripts/cliport_test.py --task pick --data_folder /Path/Dataset/valid --checkpoints_folder /Path/models

This is the waypoints_info array:
['waypoint0', None, None, 1, None, False, array([ 0.09891216, -0.09790423, 0.85907155, -0.64400905, 0.76425868,
-0.01948901, 0.02795067])], ['waypoint1', None, None, 1, None, False, array([ 0.10433818, -0.0974073 , 0.7792573 , -0.64400905, 0.76425868,
-0.01948901, 0.02795067])], ['waypoint2', None, None, 1, None, False, array([ 0.09755566, -0.09802846, 0.8790251 , -0.64400905, 0.76425868,
-0.01948901, 0.02795067])], ['waypoint3', None, None, 1, None, False, array([ 0.43130848, -0.15798603, 0.85353625, 0.39130512, 0.79267031,
-0.0298808 , 0.46654186])]

The code catches this error and says "need re-generate: /Path/valid/seen/pick_cube_shape/variation0/episodes/episode4".

Am I missing some installation step? Am I downloading the correct dataset files?

Thanks,
Le

How to edit obj part textures

Hi, thanks for your splendid work.

I would like to modify a specific part's texture of an object, such as the top drawer of an entire drawer object.
From this line in PyRep, it seems possible to modify it in code, rather than by modifying the object's texture itself. Is there any guidance or example code for this?

Thank you!

Error while running to task_reset()

When I tried to write my own Python script to run it, like RLBench, I encountered two problems: 1. The images from the five cameras are all black. 2. When the code reached task.reset(), I encountered an error with the following information:
External call to simCallScriptFunction failed (_WriteCustomDataBlock@PyRep): Script function does not exist.
After the error occurred, V-REP crashed.
In terminal:
Traceback (most recent call last):
File "test.py", line 77, in
descriptions, obs = task.reset()
File "/home/xiesenwei/robotics/VLMBench/vlmbench/amsolver/task_environment.py", line 93, in reset
desc = self._scene.init_episode(
File "/home/xiesenwei/robotics/VLMBench/vlmbench/amsolver/backend/scene.py", line 131, in init_episode
self.descriptions = self._active_task.init_episode(index)
File "/home/xiesenwei/robotics/VLMBench/vlmbench/vlm/tasks/drop_pen_color.py", line 26, in init_episode
return super().init_episode(index)
File "/home/xiesenwei/robotics/VLMBench/vlmbench/vlm/tasks/drop_pen.py", line 49, in init_episode
waypoints = GraspTask.get_path(try_ik_sampling=False, ignore_collisions=True)
File "/home/xiesenwei/robotics/VLMBench/vlmbench/amsolver/backend/unit_tasks.py", line 284, in get_path
WriteCustomDataBlock(waypoint.get_handle(),"waypoint_type","pre_grasp")
File "/home/xiesenwei/robotics/VLMBench/vlmbench/amsolver/backend/utils.py", line 276, in WriteCustomDataBlock
pyrep_utils.script_call('_WriteCustomDataBlock@PyRep', PYREP_SCRIPT_TYPE,
File "/home/xiesenwei/anaconda3/envs/vlm/lib/python3.8/site-packages/pyrep/backend/utils.py", line 65, in script_call
return sim.simExtCallScriptFunction(
File "/home/xiesenwei/anaconda3/envs/vlm/lib/python3.8/site-packages/pyrep/backend/sim.py", line 698, in simExtCallScriptFunction
_check_return(ret)
File "/home/xiesenwei/anaconda3/envs/vlm/lib/python3.8/site-packages/pyrep/backend/sim.py", line 27, in _check_return
raise RuntimeError(
RuntimeError: The call failed on the V-REP side. Return value: -1

Error: signal 11:

/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libcoppeliaSim.so.1(_Z11_segHandleri+0x30)[0x7f4ba9411ae0]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f4c4bd49090]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Core.so.5(_ZNK18QThreadStorageData3getEv+0x2b)[0x7f4ba6e3d1eb]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5OpenGL.so.5(_Z19qt_qgl_paint_enginev+0x2d)[0x7f4ba8ba61ed]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Widgets.so.5(_ZN14QWidgetPrivate11repaint_sysERK7QRegion+0x94)[0x7f4ba84a5094]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Widgets.so.5(_ZN14QWidgetPrivate16syncBackingStoreEv+0x5f)[0x7f4ba84bea6f]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Widgets.so.5(_ZN7QWidget5eventEP6QEvent+0x300)[0x7f4ba84d5920]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Widgets.so.5(_ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent+0x9c)[0x7f4ba849792c]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Widgets.so.5(_ZN12QApplication6notifyEP7QObjectP6QEvent+0x2b0)[0x7f4ba849ead0]
/home/xiesenwei/robotics/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04/libQt5Core.so.5(_ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent+0x108)[0x7f4ba7005008]
QMutex: destroying locked mutex

Meanwhile, I encountered an error while running dataset_generator_NLP.py in the tools directory: The call failed on the V-REP side. Return value: -1

Thanks for any help

Running headless mode raises OpenGL error.

Hi,
I'm trying to use this package and python examples/gym_test.py raises the following error:

This plugin does not support createPlatformOpenGLContext!


Error: signal 11:

<path_to_CoppeliaSim_Edu_V4_1_0>/libcoppeliaSim.so.1(_Z11_segHandleri+0x30)[0x2b256a2deae0]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x2b255d1c7090]
<path_to_CoppeliaSim_Edu_V4_1_0>/libQt5Gui.so.5(_ZNK14QOpenGLContext10shareGroupEv+0x0)[0x2b256b992060]
<path_to_CoppeliaSim_Edu_V4_1_0>/libQt5Gui.so.5(_ZN16QOpenGLFunctions25initializeOpenGLFunctionsEv+0x4b)[0x2b256bc5ea4b]
<path_to_CoppeliaSim_Edu_V4_1_0>/libQt5Gui.so.5(_ZN24QOpenGLFramebufferObjectC1EiiNS_10AttachmentEjj+0xc8)[0x2b256bc62a18]
<path_to_CoppeliaSim_Edu_V4_1_0>/libsimExtOpenGL3Renderer.so(_ZN18CFrameBufferObjectC2Eii+0x5a)[0x2b25a0acf24a]
<path_to_CoppeliaSim_Edu_V4_1_0>/libsimExtOpenGL3Renderer.so(_ZN16COpenglOffscreenC1EiiiP14QOpenGLContext+0x72)[0x2b25a0acf602]
<path_to_CoppeliaSim_Edu_V4_1_0>/libsimExtOpenGL3Renderer.so(_Z21executeRenderCommandsbiPv+0x2550)[0x2b25a0acdb90]
<path_to_CoppeliaSim_Edu_V4_1_0>/libcoppeliaSim.so.1(_ZN16CPluginContainer11extRendererEiPv+0x19)[0x2b256a4a8249]
<path_to_CoppeliaSim_Edu_V4_1_0>/libcoppeliaSim.so.1(_ZN13CVisionSensor24_extRenderer_prepareViewEi+0x347)[0x2b256a1af107]
QMutex: destroying locked mutex

I have installed PyRep and CoppeliaSim, and I also set the following env variables:

export COPPELIASIM_ROOT=EDIT/ME/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT

Do you have any tips?

An issue about the obs

Thanks for your great work!
Are there any segmentation masks or bounding boxes in the dataset?
And would you consider adding these annotations to the dataset, or are there interfaces to extract these annotations in this work?

Dataset about long-horizon tasks

Hello! I am interested in obtaining the dataset of long-horizon tasks, such as those in Figure 1 of the paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation". This dataset would be extremely useful for advancing research in multi-task vision-and-language manipulation learning. If it is not possible to share the dataset, would it be possible to provide guidance on how to generate similar datasets, including long-horizon task trajectories with observations, abstract instructions, and decomposed sub-tasks?

Thank you for your time in considering this request.
