
Official Code for RVT: Robotic View Transformer for 3D Object Manipulation

Home Page: https://robotic-view-transformer.github.io/

License: Other

Python 100.00%
object-manipulation rlbench robotics rvt

rvt's Introduction


RVT: Robotic View Transformer for 3D Object Manipulation
Ankit Goyal, Jie Xu, Yijie Guo, Valts Blukis, Yu-Wei Chao, Dieter Fox
CoRL 2023 (Oral)

If you find our work useful, please consider citing:

@article{goyal2023rvt,
  title={RVT: Robotic View Transformer for 3D Object Manipulation},
  author={Goyal, Ankit and Xu, Jie and Guo, Yijie and Blukis, Valts and Chao, Yu-Wei and Fox, Dieter},
  journal={CoRL},
  year={2023}
}

Getting Started

Install RVT

  • Tested (Recommended) Versions: Python 3.8. We used CUDA 11.1.

  • Step 1 (Optional): We recommend using conda and creating a virtual environment.

conda create --name rvt python=3.8
conda activate rvt
  • Step 2: Install PyTorch. Make sure the PyTorch version is compatible with the CUDA version. One recommended version compatible with CUDA 11.1 and PyTorch3D can be installed with the following command. More instructions to install PyTorch can be found here.
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
  • Step 3: Install PyTorch3D. One recommended version that is compatible with the rest of the library can be installed as follows. Note that this might take some time. For more instructions, visit here. A quick import check to verify the installation is sketched after the commands below.
curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_HOME=$(pwd)/cub-1.10.0
pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'
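Optionally (not part of the official instructions), you can verify the PyTorch and PyTorch3D installs before continuing:

import torch
import pytorch3d

# Both packages should import cleanly and PyTorch should report a CUDA device.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(pytorch3d.__version__)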

  • Step 4: Install CoppeliaSim, which is required by PyRep. Once you have downloaded CoppeliaSim, add the following to your ~/.bashrc file. (NOTE: edit the 'EDIT ME' in the first line.)

export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
export DISPLAY=:1.0

Remember to source your .bashrc (source ~/.bashrc) or .zshrc (source ~/.zshrc) after this.

  • Step 5: Clone the repository with the submodules using the following command.
git clone --recurse-submodules git@github.com:NVlabs/RVT.git && cd RVT && git submodule update --init

Now, locally install RVT and other libraries using the following command. Make sure you are in folder RVT.

pip install -e . 
pip install -e rvt/libs/PyRep 
pip install -e rvt/libs/RLBench 
pip install -e rvt/libs/YARR 
pip install -e rvt/libs/peract_colab
  • Step 6: Download dataset.
    • For experiments on RLBench, we use the pre-generated dataset provided by PerAct. Please download it and place it under RVT/rvt/data/xxx, where xxx is either train, test, or val.

    • Additionally, we use the same dataloader as PerAct, which is based on YARR. YARR creates a replay buffer on the fly, which can increase the startup time. We provide an option to directly load the replay buffer from disk. We recommend using the pre-generated replay buffer (98 GB) as it reduces the startup time. You can either download replay.tar.xz, which contains the replay buffer for all tasks, or the replay buffers for individual tasks. After downloading, uncompress the replay buffer(s) (for example, using the command tar -xf replay.tar.xz) and place them under RVT/rvt/replay/replay_xxx, where xxx is either train or val. Note that this is useful only if you want to train RVT from scratch; it is not needed if you only want to evaluate the pre-trained model. A quick check of the expected folder layout is sketched below.
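As a quick sanity check (a hypothetical helper, not part of the repo), you can verify the expected folder layout before launching training:

from pathlib import Path

# Assumed layout from the instructions above; adjust `root` to where RVT is cloned.
root = Path("RVT/rvt")
for split in ["train", "test", "val"]:
    print(f"data/{split}:", (root / "data" / split).is_dir())
for split in ["train", "val"]:
    print(f"replay/replay_{split}:", (root / "replay" / f"replay_{split}").is_dir())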

Using the library:

Training RVT

Default command

To train RVT on all RLBench tasks, use the following command (from folder RVT/rvt):

python train.py --exp_cfg_path configs/all.yaml --device 0,1,2,3,4,5,6,7

We use 8 V100 GPUs. Change the device flag depending on available compute.

More details about train.py
  • default parameters for an experiment are defined here.
  • default parameters for rvt are defined here.
  • the parameters for the experiment and rvt can be overwritten in two ways:
    • specifying the path of a yaml file
    • manually overwriting them using an opts string of the format <param1> <val1> <param2> <val2> ..
  • Manual overwriting takes precedence over the yaml file.
python train.py --exp_cfg_opts <> --mvt_cfg_opts <> --exp_cfg_path <> --mvt_cfg_path <>

The following command overwrites the parameters for the experiment with the configs/all.yaml file. It also overwrites the bs parameter through the command line. A conceptual sketch of how these overrides are merged follows the command.

python train.py --exp_cfg_opts "bs 4" --exp_cfg_path configs/all.yaml --device 0
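Conceptually, the merge order can be pictured with a yacs-style sketch (an assumption about the config mechanism, not the repo's actual code; parameter values and names other than bs are placeholders):

from yacs.config import CfgNode as CN

# Illustrative precedence: defaults, then the yaml file, then command-line opts.
cfg = CN()
cfg.bs = 24          # placeholder default
cfg.epochs = 15      # placeholder default

cfg.merge_from_file("configs/all.yaml")   # --exp_cfg_path
cfg.merge_from_list(["bs", 4])            # --exp_cfg_opts "bs 4"
cfg.freeze()
print(cfg.bs)  # 4: the opts string wins over the yaml file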

Evaluate on RLBench

Evaluate RVT on RLBench

Download the pretrained RVT model. Place the model (model_14.pth trained for 15 epochs or 100K steps) and the config files under the folder runs/rvt/. Run evaluation using (from folder RVT/rvt):

python eval.py --model-folder runs/rvt  --eval-datafolder ./data/test --tasks all --eval-episodes 25 --log-name test/1 --device 0 --headless --model-name model_14.pth
Evaluate the official PerAct model on RLBench

Download the officially released PerAct model. Put the downloaded policy under the runs folder with the recommended folder layout: runs/peract_official/seed0. Run the evaluation using:

python eval.py --eval-episodes 25 --peract_official --peract_model_dir runs/peract_official/seed0/weights/600000 --model-name QAttentionAgent_layer0.pt --headless --tasks all --eval-datafolder ./data/test --device 0

Gotchas

  • If you get a Qt plugin error like qt.qpa.plugin: Could not load the Qt platform plugin "xcb", try uninstalling opencv-python and installing opencv-python-headless:
pip uninstall opencv-python                                                                                         
pip install opencv-python-headless
  • If you have CUDA 11.7, an alternate installation strategy could be to use the following commands for Step 2 and Step 3. Note that this is not heavily tested.
# Step 2:
pip install torch torchvision torchaudio
# Step 3:
pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'
  • If you are having issues running evaluation on a headless server, please refer to #2 (comment).

  • If you want to generate visualization videos, please refer to #5.

FAQs

Q. What is the advantage of RVT over PerAct?

RVT both trains faster and performs better than PerAct.

Q. What resources are required to train RVT?

For training on 18 RLBench tasks, with 100 demos per task, we use 8 V100 GPUs (16 GB memory each). The model trains in ~1 day.

Note that for fair comparison with PerAct, we used the same dataset, which means duplicate keyframes are loaded into the replay buffer. For other datasets, one could consider not doing so, which might further speed up training.

Q. Why do you use pe_fix=True in the rvt config?

For a fair comparison with the official PerAct model, we use this setting. More details about this can be found in the PerAct code. Going forward, we recommend using pe_fix=False for language input.

Q. Why are the results for PerAct different from the PerAct paper?

In the PerAct paper, for each task, the best checkpoint is chosen based on the validation set performance. Hence, the model weights can be different for different tasks. We evaluate PerAct and RVT only on the final checkpoint, so that all tasks are strictly evaluated on the same model weights. Note that only the final model for PerAct has been released officially.

Q. Why is there a variance in performance on RLBench even when evaluating the same checkpoint?

We hypothesize that it is because of the sampling-based planner used in RLBench, which could be a source of randomness. Hence, we evaluate each checkpoint 5 times and report the mean and variance.

Q. Why did you use a cosine decay learning rate scheduler instead of a fixed learning rate schedule as done in PerAct?

We found the cosine learning rate scheduler led to faster convergence for RVT. Training PerAct with our training hyper-parameters (cosine learning rate scheduler and same number of iterations) led to worse performance (in ~4 days of training time). Hence for Fig. 1, we used the official hyper-parameters for PerAct.

Q. For my use case, I want to render images at real camera locations (input camera poses) with PyTorch3D. Is it possible to do so and how can I do that?

Yes, it is possible to do so. A self-sufficient example is present here. Depending on your use case, the code may need to be modified. Also note that 3D augmentation cannot be used while rendering images at real camera locations, as it would change the pose of the camera with respect to the point cloud.
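As a rough illustration (a hedged sketch of generic PyTorch3D usage, not the repo's exact code), rendering a colored point cloud from a known camera pose looks roughly like this; R, T, focal length, and image size are placeholders for your real calibration values:

import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    PerspectiveCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
    AlphaCompositor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
pts = torch.rand(1, 2048, 3, device=device)   # (B, N, 3) point cloud (placeholder)
rgb = torch.rand(1, 2048, 3, device=device)   # per-point colors (placeholder)
point_cloud = Pointclouds(points=pts, features=rgb)

# Camera extrinsics/intrinsics from your calibration (identity/placeholder here).
R = torch.eye(3, device=device)[None]                 # (B, 3, 3)
T = torch.zeros(1, 3, device=device); T[:, 2] = 2.0   # move the camera back
cameras = PerspectiveCameras(R=R, T=T, focal_length=1.0, device=device)

raster_settings = PointsRasterizationSettings(image_size=220, radius=0.01, points_per_pixel=5)
renderer = PointsRenderer(
    rasterizer=PointsRasterizer(cameras=cameras, raster_settings=raster_settings),
    compositor=AlphaCompositor(),
)
images = renderer(point_cloud)   # (B, H, W, 3) rendered view at the given pose
print(images.shape)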

For questions and comments, please contact Ankit Goyal.

Acknowledgement

We sincerely thank the authors of the following repositories for sharing their code.

License

Copyright © 2023, NVIDIA Corporation & affiliates. All rights reserved.

This work is made available under the Nvidia Source Code License. The pretrained RVT model is released under the CC-BY-NC-SA-4.0 license.

rvt's People

Contributors

imankgoyal, markusgrotz


rvt's Issues

Query Regarding Training Time and Default Parameters on RLBench Task with 8 A100 GPUs

Hello there, I hope you're doing well. I wanted to share my experience with your project. I've been using 8 A100 GPUs, each with 40GB of memory, to train on the RLBench tasks. However, I've noticed that training a single epoch takes around 12 hours with the default parameters you provide. I'm curious if there might be an issue somewhere in my setup. I'd greatly appreciate your insights and guidance on this matter.
Thank you for your time and effort in developing this project. Looking forward to your response.

About the tp1 in replay

Hi @imankgoyal,

Thank you for your great work. I have one question about the "tp1" keys in the replay_samples, like 'front_rgb_tp1', 'gripper_pose_tp1' etc. From the replay buffer implementation, I think the tp1 keys are used to store the next key_frame's observations and actions. So in the RVT training, the tp1 observations and actions are not used? That means we just need the current transition's observation and actions for extracting features and supervision.

Is my understanding correct? If there are any mistakes, please correct me.

Thank you very much.

`place_with_mean` argument during training vs eval

Hi,
Thank you for your great work!

I am confused by how you set place_with_mean variable.

In rvt/configs/all.yaml, it is set to False so I assume during training, you've used the center of scene bounds to normalize the point clouds in place_pc_in_cube.

However, in the eval code it is set to True by default (args.use_input_place_with_mean is False by default, so here it becomes True), and therefore the RVTAgent object normalizes the point cloud with the point-cloud mean instead of the scene-bound center that was used during training.

I have run the evaluation code on your released model on stack_cups, and the experiment with your setting gets better results (24% success rate vs 12%).

So my questions are:

  1. What is the reason behind doing different normalization during training vs eval?
  2. Any intuitions on why the model normalized with scene bound center during training performs better with a different normalization during eval phase?
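For readers following this issue, here is a minimal sketch of the two normalization modes being compared (an illustrative simplification with placeholder bounds, not the repo's actual place_pc_in_cube implementation):

import torch

# with_mean=True recenters by the point-cloud mean (the eval default discussed
# above); with_mean=False recenters by the scene-bound center (the training
# setting). Bounds are placeholders in [x0, y0, z0, x1, y1, z1] order.
def center_point_cloud(pc, scene_bounds=None, with_mean=False):
    if with_mean:
        center = pc.mean(dim=0)
    else:
        bounds = torch.tensor(scene_bounds, dtype=pc.dtype)
        center = (bounds[:3] + bounds[3:]) / 2.0
    return pc - center

pc = torch.rand(1024, 3)
print(center_point_cloud(pc, with_mean=True).mean(dim=0))                        # ~0
print(center_point_cloud(pc, scene_bounds=[0.0, -0.5, 0.5, 1.0, 0.5, 1.5]).shape)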

Training Network on Real Franka Panda Arm

Hi,

I appreciate all the hard work being done on this project! I am currently trying to train the network on a real Franka Panda arm and am hoping for some guidance. I have the following questions that I'd appreciate if you could help answer:

Could you please elucidate on the data collection process that was followed?

I'd like to understand more about the eye-hand calibration. Specifically, I'm curious to know how the rvt output action was projected to the robot world frame?

I'm also looking for any specific suggestions or best practices that I should keep in mind when training on real-world data?

Thank you in advance for your support!

paper question

Thank you very much for the work you've done. May I ask a question? Can I interpret your work as being based on multi-view? I'm curious about the primary difference between your approach and multi-view. If someone were to emulate your work using just multi-view for their experiments, would they outperform you?

Sorry for the disturbance and thank you in advance.

Code for evaluating on real Franka Panda

Hi,

Thank you for such a great paper. In your demonstrations I have noted that you evaluated your network on real Franka Panda as well, and I was wondering if you could publish the code for evaluation on real robot as well.

Thank you in advance.

Questions about inference speed

Hi, thanks for your great work! May I ask how you compute FPS for PerAct and RVT? I also tested it, but I got a pretty high FPS for both PerAct and RVT, so I wonder if there is anything wrong with my measurement.

Data collection issues

Hi @imankgoyal

Firstly, I would like to express my gratitude for your exceptional work. It's truly inspiring and helpful.

I have a question related to data handling, specifically regarding keyframes. Let's consider a scenario where there are three keyframes, and the first keyframe is at frame 36. When collecting data for frames 10, 20, and 30, it appears that these three keyframes are recollected, and the timestep uniformly decreases. Moreover, in the low_dim_state, frames 10, 20, and 30 all seem to be treated as the starting point of the event, with the timestep set to 1 for each.

However, I'm a bit puzzled about the handling of frame 40. Why is the timestep reset to 1 at this point? Additionally, during the evaluation process, I haven't noticed any reset of the timestep. In my opinion, it might be more consistent to disregard the timestep. What are your thoughts on this approach? Any clarification or insight you could provide would be greatly appreciated.

Thank you for your time and consideration.

Best regards,
LemonWade

AttributeError: 'CustomMultiTaskRLBenchEnv2' object has no attribute 'get_ground_truth_action'

Hello, thank you for the wonderful work!

I would like to check demos saved. For this I run the following script:

python eval.py --model-folder runs/rvt --eval-datafolder ./data/val --tasks close_jar --eval-episodes 25 --log-name val/1 --device 0 --model-name model_14.pth --save-video --ground-truth

It fails with the following error:

Traceback (most recent call last):
  File "/home/projects/RVT/rvt/eval.py", line 540, in <module>
    _eval(args)
  File "/home/projects/RVT/rvt/eval.py", line 494, in _eval
    scores = eval(
  File "/home/miniconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/projects/RVT/rvt/eval.py", line 284, in eval
    raise e
  File "/home/projects/RVT/rvt/eval.py", line 278, in eval
    for replay_transition in generator:
  File "/home/projects/RVT/rvt/libs/YARR/yarr/utils/rollout_generator.py", line 43, in generator
    actions = env.get_ground_truth_action(eval_demo_seed)
AttributeError: 'CustomMultiTaskRLBenchEnv2' object has no attribute 'get_ground_truth_action'

About models training

Thank you for your great work!

When I trained the model from scratch, I found that the training gets stuck like this:
[screenshot: training progress stalled]

The GPU utilization is at max, but the GPU memory usage is very low.
[screenshot: GPU utilization and memory]

So is it normal, or have you encountered the same situation?

Thanks.

KeyError: 'lang_goal_tokens'

Thank you very much for your work. Below is a bug I encountered while reproducing

[screenshot of the error]

I downloaded and decompressed the data and replay for a single task. Later, due to the deprecation of np.bool in numpy, I replaced all instances of np.bool with np.bool_. When I executed the training code again python train.py --exp_cfg_path configs/all_100.yaml --device 0, I encountered a KeyError: 'lang_goal_tokens'. Did I do something wrong?

configs/all_100.yaml

exp_id: rvt
tasks: slide_block_to_color_target
bs: 3
num_workers: 3
epochs: 15
sample_distribution_mode: task_uniform
peract:
  lr: 1e-4
  warmup_steps: 2000
  optimizer_type: lamb
  lr_cos_dec: True
  transform_augmentation_xyz: [0.125, 0.125, 0.125]
  transform_augmentation_rpy: [0.0, 0.0, 45.0]
rvt:
  place_with_mean: False

logs

(rvt-zzy) root@7708b7cca4e2:/data/zzy/RVT/rvt# python train.py --exp_cfg_path configs/all_100.yaml --device 0              
dict(exp_cfg)={'agent': 'our', 'tasks': 'slide_block_to_color_target', 'exp_id': 'rvt', 'resume': '', 'bs': 3, 'epochs': 15, 'num_workers': 3, 'sample_distribution_mode': 'task_uniform', 'peract': CfgNode({'lambda_weight_l2': 1e-06, 'lr': 0.00030000000000000003, 'optimizer_type': 'lamb', 'warmup_steps': 2000, 'lr_cos_dec': True, 'add_rgc_loss': True, 'num_rotation_classes': 72, 'transform_augmentation': True, 'transform_augmentation_xyz': [0.125, 0.125, 0.125], 'transform_augmentation_rpy': [0.0, 0.0, 45.0]}), 'rvt': CfgNode({'gt_hm_sigma': 1.5, 'img_aug': 0.1, 'place_with_mean': False, 'move_pc_in_bound': True}), 'peract_official': CfgNode({'cfg_path': 'configs/peract_official_config.yaml'})}
Training on 1 tasks: ['slide_block_to_color_target']
[Info] Replay dataset already exists in the disk: replay/replay_train/slide_block_to_color_target
Created Dataset. Time Cost: 0.21861758629480998 minutes
MVT Vars: {'training': True, '_parameters': OrderedDict(), '_buffers': OrderedDict(), '_non_persistent_buffers_set': set(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'depth': 8, 'img_feat_dim': 3, 'img_size': 220, 'add_proprio': True, 'proprio_dim': 4, 'add_lang': True, 'lang_dim': 512, 'lang_len': 77, 'im_channels': 64, 'img_patch_size': 11, 'final_dim': 64, 'attn_dropout': 0.1, 'decoder_dropout': 0.0, 'self_cross_ver': 1, 'add_corr': True, 'add_pixel_loc': True, 'add_depth': True, 'pe_fix': True}
Start training ...
Rank [0], Epoch [0]: Training on train dataset
  0%|                                                                                                                                                         | 0/53333 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/data/zzy/RVT/rvt/train.py", line 260, in experiment
    out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
  File "/data/zzy/RVT/rvt/train.py", line 54, in train
    raw_batch = next(data_iter)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
    data = next(self.dataset_iter)
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
    yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 803, in sample_transition_batch
    store = self._get_from_disk(
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 456, in _get_from_disk
    store[k][i] = v # NOTE: potential bug here, should % self._replay_capacity
KeyError: 'lang_goal_tokens'

Questions about the scene bounds and img aug in real-robot experiments

Thank you for your great work.

I am trying to run the real-robot experiments with RVT, and I have two questions:

  1. Do you move the point clouds into the bound in real-robot experiments? If so, how to identify the scene bounds for the real-robot experiments like in
    pc, img_feat, self.scene_bounds, no_op=not self.move_pc_in_bound
  2. Do you utilize any augmentation for real-robot training like in
    action_trans_con, action_rot, pc = apply_se3_aug_con(

Thank you very much!

How much memory is required on the GPU?

Hi,

I am running the code on a 4GB NVIDIA RTX 3050 GPU. I am getting a runtime error while running the eval.py for RVT
python eval.py --model-folder runs/rvt --eval-datafolder ./data/test --tasks close_jar --eval-episodes 25 --log-name test/1 --device 0 --headless --model-name model_14.pth

Error:
RuntimeError: CUDA out of memory. Tried to allocate 610.00 MiB (GPU 0; 3.81 GiB total capacity; 1.77 GiB already allocated; 531.69 MiB free; 1.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a way I can avoid it?
Setting max_split_size_mb does not help. I tried export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 before running the code. But I am getting the same error.

add_final and repeat in replay

Thanks for your impressive work!
When reviewing the replay you provided, I found that each episode has a replay entry with terminal=-1, which has the same value as the entry before it with terminal=True. Is this necessary, or is it done out of some special consideration?

And also there are many repeated replays generated by the loop

RVT/rvt/utils/dataset.py

Lines 379 to 413 in 0b170d7

for i in range(len(demo) - 1):
    if not demo_augmentation and i > 0:
        break
    if i % demo_augmentation_every_n != 0:  # choose only every n-th frame
        continue
    obs = demo[i]
    desc = descs[0]
    # if our starting point is past one of the keypoints, then remove it
    while (
        next_keypoint_idx < len(episode_keypoints)
        and i >= episode_keypoints[next_keypoint_idx]
    ):
        next_keypoint_idx += 1
    if next_keypoint_idx == len(episode_keypoints):
        break
    _add_keypoints_to_replay(
        replay,
        task,
        task_replay_storage_folder,
        d_idx,
        i,
        obs,
        demo,
        episode_keypoints,
        cameras,
        rlbench_scene_bounds,
        voxel_sizes,
        rotation_resolution,
        crop_augmentation,
        next_keypoint_idx=next_keypoint_idx,
        description=desc,
        clip_model=clip_model,
        device=device,
    )
with the following keypoints repeated over and over again. Why is this done?
Thanks in advance!

[Training] How to evaluate when training?

Hi, RVT is an amazing work, it is efficient and effective comparing with other works. At the same time, I want to ask some question about the training procedure:

  1. Do you add a validation period when training, like evaluating on the validation dataset at the end of each epoch? And if I want to do so, how can I do it?
  2. How to select the best checkpoint when training?
  3. Have you tried to generate more training data to enhance the model performance?

Question about the up direction of "front" and "back" camera in renderer.py

Hi @imankgoyal, thanks for your great work.

While reviewing the code in rvt/mvt/renderer.py, I noticed in the get_cube_R_T function that the up direction for the five cameras is defined as follows:

    for view in elev_azim:
        if view in ["left", "right"]:
            up.append((0, 0, 1))
        else:
            up.append((0, 1, 0))

It appears that the front and back cameras are aligned along the Y-axis as defined in RLBench. However, I'm curious about the up direction being set to (0, 1, 0) for these cameras. Since this also represents the Y-axis, wouldn't this configuration lead to a potential conflict in the camera orientation?

I apologize for any inconvenience this question may cause, and I appreciate your time in addressing this query.
Thank you!
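For context on the convention being discussed, here is generic PyTorch3D usage (not the repo's get_cube_R_T; all values are placeholders) showing how elevation, azimuth, and an up vector together determine the camera extrinsics:

from pytorch3d.renderer import look_at_view_transform

# A camera at distance 2, elevation 0, azimuth 90 degrees, with the world
# z-axis as the up direction.
R, T = look_at_view_transform(dist=2.0, elev=0.0, azim=90.0, up=((0.0, 0.0, 1.0),))
print(R.shape, T.shape)  # torch.Size([1, 3, 3]) torch.Size([1, 3])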

Run parameters

Hello, I would like to ask whether there are any optional parameters that can be set at run time.

question about learning rate

Hi, just a quick question regarding the learning rate. In your paper it says "an initial learning rate of 2.4 × 10−4", but in the code it shows 1 × 10−4. If we multiply by 3*8, it would be 2.4 × 10−3; would that be right?

This error occurs when I run eval.py

I wanted to run eval.py, but this error occurred.

2023-07-04 15:30:38.419021: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-07-04 15:30:38.465609: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-04 15:30:39.192211: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
MVT Vars: {'training': True, '_parameters': OrderedDict(), '_buffers': OrderedDict(), '_non_persistent_buffers_set': set(), '_backward_pre_hooks': OrderedDict(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_hooks_with_kwargs': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_forward_pre_hooks_with_kwargs': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'depth': 8, 'img_feat_dim': 3, 'img_size': 220, 'add_proprio': True, 'proprio_dim': 4, 'add_lang': True, 'lang_dim': 512, 'lang_len': 77, 'im_channels': 64, 'img_patch_size': 11, 'final_dim': 64, 'attn_dropout': 0.1, 'decoder_dropout': 0.0, 'self_cross_ver': 1, 'add_corr': True, 'add_pixel_loc': True, 'add_depth': True, 'pe_fix': True}
Agent Information
<rvt.models.rvt_agent.RVTAgent object at 0x7f81b1d98100>
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/data/cjy/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, webgl, xcb.

Rendered image with color noise in the background

Hi,

When visualizing the rendered images, I found a lot of color noise in the background (which should ideally be dark), different from Figure 5 in your great paper. Can the color noise be removed with a special render setting, or do you manually remove it from the rendered image for better visualization?
Looking forward to your reply~

Best,
Jerry


Issues while running the training script

Hi, Thanks for your great work!!

I am trying to run the training script as it is without using the replay buffer.
I am getting the following errors.

Start training ...
Rank [0], Epoch [0]: Training on train dataset
Rank [0], Epoch [0]: Training on train dataset
Rank [0], Epoch [0]: Training on train dataset
Rank [0], Epoch [0]: Training on train dataset
Traceback (most recent call last):
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 300, in
mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 260, in experiment
out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 54, in train
raw_batch = next(data_iter)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in next
data = self._next_data()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
data = next(self.dataset_iter)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 772, in sample_transition_batch
indices = self.sample_index_batch(batch_size, distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 706, in sample_index_batch
state_index = np.random.randint(low = self._task_replay_start_index[task_index],
File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

Traceback (most recent call last):
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 300, in
mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 260, in experiment
out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 54, in train
raw_batch = next(data_iter)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in next
data = self._next_data()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
data = next(self.dataset_iter)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 772, in sample_transition_batch
indices = self.sample_index_batch(batch_size, distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 706, in sample_index_batch
state_index = np.random.randint(low = self._task_replay_start_index[task_index],
File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

Traceback (most recent call last):
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 300, in
mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 260, in experiment
out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 54, in train
raw_batch = next(data_iter)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in next
data = self._next_data()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
data = next(self.dataset_iter)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 772, in sample_transition_batch
indices = self.sample_index_batch(batch_size, distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 706, in sample_index_batch
state_index = np.random.randint(low = self._task_replay_start_index[task_index],
File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

Traceback (most recent call last):
File "", line 1, in
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "", line 1, in
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 300, in
mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 260, in experiment
out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/train.py", line 54, in train
raw_batch = next(data_iter)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in next
data = self._next_data()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
data = next(self.dataset_iter)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 772, in sample_transition_batch
indices = self.sample_index_batch(batch_size, distribution_mode)
File "/home/kiyogi/harsh/RVT_related_stuff/RVT/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 706, in sample_index_batch
state_index = np.random.randint(low = self._task_replay_start_index[task_index],
File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high

srun: error: gpu-11: task 2: Exited with exit code 1
Traceback (most recent call last):
File "", line 1, in
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "", line 1, in
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/kiyogi/miniconda3/envs/rvt/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
srun: error: gpu-11: tasks 0-1: Exited with exit code 1
srun: error: gpu-11: task 3: Exited with exit code 1

What is low_dim_state?

What do the 4 dimensions in obs["low_dim_state"] correspond to? I think the last one is the gripper open/closed bit?

Thanks!

Does line 150 give a Cartesian coordinate of a pixel?

RVT/rvt/mvt/mvt_single.py

Lines 146 to 157 in 0b170d7

self.pixel_loc = torch.zeros(
    (self.num_img, 3, self.img_size, self.img_size)
)
self.pixel_loc[:, 0, :, :] = (
    torch.linspace(-1, 1, self.num_img).unsqueeze(-1).unsqueeze(-1)
)
self.pixel_loc[:, 1, :, :] = (
    torch.linspace(-1, 1, self.img_size).unsqueeze(0).unsqueeze(-1)
)
self.pixel_loc[:, 2, :, :] = (
    torch.linspace(-1, 1, self.img_size).unsqueeze(0).unsqueeze(0)
)

It seems that lines 153 and 156 are giving Cartesian coordinates of a pixel (maybe x and y). However, line 150 does not give anything because it is only related to num_images.

Is it a bug?
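For readers following this question, a tiny stand-alone reproduction of the construction quoted above (with small num_img and img_size so the values are easy to inspect) shows what each channel holds: channel 0 is constant within an image and varies only across images, while channels 1 and 2 vary across rows and columns.

import torch

# Tiny reproduction of the pixel_loc construction quoted above.
num_img, img_size = 2, 3
pixel_loc = torch.zeros((num_img, 3, img_size, img_size))
pixel_loc[:, 0, :, :] = torch.linspace(-1, 1, num_img).unsqueeze(-1).unsqueeze(-1)
pixel_loc[:, 1, :, :] = torch.linspace(-1, 1, img_size).unsqueeze(0).unsqueeze(-1)
pixel_loc[:, 2, :, :] = torch.linspace(-1, 1, img_size).unsqueeze(0).unsqueeze(0)

print(pixel_loc[0, 0])  # all -1.0: channel 0 is constant per image
print(pixel_loc[1, 0])  # all +1.0: it varies only with the image index
print(pixel_loc[0, 1])  # rows vary from -1 to 1
print(pixel_loc[0, 2])  # columns vary from -1 to 1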

About model training in the real-world

Thanks for your great work!
I want to know if training in the real-world starts from scratch or if it involves fine-tuning based on the model trained in the simulator.

Query Regarding Translation Loss Convergence in Real Robot Training

Hi. First of all many thanks for the excellent paper and great codebase.
In the paper you mention that a single RVT model can solve tasks with just ~10 demos per task, so recently I have been collecting data (10 simple tasks with 10 demos each) with my setup (similar to yours: a single RGBD camera and a smaller robot arm). While training, I found that the rotation losses (x, y, z) converge well, but the translation loss does not: it only drops to ~4, while the rotation evaluation loss converges to ~0.2. Given this, I would like to ask the following questions for some clarity.

  1. When training on real robot data, did you start the training from scratch or was there any preliminary training involved using simulations?
  2. During your experiments with real robot data, did you observe a similar convergence pattern between translation and rotation?

Thank you in advance for your insights!

Why is time an essential part of low_dim_state?

Hi, thank you for your impressive work.
Time seems to be an essential part of the low_dim_state, which confuses me. Why does it matter? If we can view most points on a trajectory (if not all) as the starting point of a new sub-trajectory, does that mean we can scale the time dimension arbitrarily? However, it does not seem to work when time is not included in the low_dim_state, or when it is scaled. Any insights on this design?

About Video Visualization

Hi,

Thank you for the great video. In your demonstrations I noted that you show the simulation video, but I don't see how to save or visualize the result in eval.py. Can you provide any instructions on how to do it? (I assume it would be like setting cinematic_recorder.enabled=True as PerAct does, but I don't find it in this codebase.)

Thank you in advance.

question about camera calibration in real world

Thanks for your excellent work!
I currently have two RealSense cameras, and I want to use these two cameras to generate a point cloud for use in RVT. I already know how to calibrate the intrinsic parameters of each camera and perform hand-eye calibration for a single camera with the robotic arm. How should I calibrate these two cameras to ensure that the point clouds from both cameras are aligned? Additionally, how can I determine the orientation of the point cloud with respect to the front of the robotic arm?
Thank you!

Details about the real-world experiments

Hello! I am trying to reproduce the real-world results. I have one question: it is reported in the paper that only a third-person view camera is mounted. I was wondering where the camera is located approximately. Is it located in the front view as in the simulations? Thank you very much ~

Number of key-frames

Hi! Thank you for an interesting work. I'm trying to train RVT on real data. Could you please tell me how many key-frames were collected for a real episode on average?

Questions about the details

Hi, thanks for your splendid work! I am deeply impressed by your work.
I have some questions regarding some implementation details and your paper.

  1. How much time does a single re-rendering pass take? Also, I am curious about the reasoning behind the decision to re-render.
  2. Have you trained your model with fewer demonstrations in the simulator? I am asking because you used 100 demonstrations per task in simulation and 7-14 demonstrations per task in the real world; it seems surprising that real-world training works well with far fewer demonstrations than simulation.
  3. May I ask for the details of the 'without view correspondence' ablation experiment (third model in Table 2, left), compared to the original code?
  4. I wonder about the quality of the re-rendered results. In Figure 2(c), the virtual images show some degradation (e.g., black jitter in the background); does the model still work robustly in this poor setting?
  5. I would like to ask about details from [Appendix A.2 Experiments on dataset with input cameras in orthogonal configuration]. Regarding the model that directly uses input camera images (27.2%), is any augmentation technique applied here, such as adding camera-location noise?

If I am wrong, please correct me. Thank you!

how to render with real camera pose

Hi, thanks for your great codebase. I was trying to use PyTorch3D to produce a rendered view that matches the real observation, but failed. I wonder if you could also share some code from your ablation study that uses real camera views? Thanks.

proprioception dimension

Hi first of all thank you for the amazing work of RVT.

I have a question. I'm trying to reproduce some of the results in a real-robot setting with a Franka robot, and I'm trying to understand the proprioception data. According to the MVT config, the proprioception dim is 4. What do those 4 dimensions represent? If it were the joint positions I would expect it to be bigger, and if it were the Cartesian position of the end-effector I would expect either 3 or 6 (or 7 with a quaternion).

Thank you in advance
