ut-austin-rpl / ditto

Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

License: MIT License

Python 26.09% TeX 0.14% Jupyter Notebook 73.76%
computer-vision robotics 3d-reconstruction digital-twin deep-learning

ditto's Introduction

Ditto: Building Digital Twins of Articulated Objects from Interaction

Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

CVPR 2022, Oral

Project | arXiv

intro

News

2022-04-28: We released the data generation code of Ditto here.

Introduction

Ditto (Digital Twins of Articulated Objects) is a model that reconstructs the part-level geometry and articulation model of an articulated object given observations before and after an interaction. Specifically, we use a PointNet++ encoder to encode the input point cloud observations and fuse the subsampled point features with a simple attention layer. We then use two independent decoders to propagate the fused point features into two sets of dense point features, one for geometry reconstruction and one for articulation estimation. We construct feature grids/planes by projecting and pooling the point features, and query local features from the constructed feature grids/planes. Conditioned on these local features, we use separate decoders to predict occupancy, segmentation, and joint parameters with respect to the query points. In the end, we extract an explicit geometry and articulation model from the implicit decoders.
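
To make the data flow concrete, here is a minimal, self-contained sketch of this pipeline in PyTorch. It is not the actual Ditto implementation: the PointNet++ encoder is replaced by a shared per-point MLP, the feature grid/plane construction is reduced to global pooling, and all module names, dimensions, and output sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SimplifiedDitto(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Stand-in for the PointNet++ encoder (shared between the two observations).
        self.encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Simple attention layer that fuses features of the before/after observations.
        self.fusion = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        # Two independent decoders that propagate the fused point features.
        self.geo_decoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.art_decoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        # Implicit heads conditioned on (local feature, query point).
        self.occ_head = nn.Linear(feat_dim + 3, 1)    # occupancy
        self.seg_head = nn.Linear(feat_dim + 3, 1)    # mobile/static segmentation
        self.joint_head = nn.Linear(feat_dim + 3, 7)  # e.g. axis (3) + pivot (3) + state (1)

    def forward(self, pc_before, pc_after, query_points):
        # pc_before, pc_after: (B, N, 3) point clouds; query_points: (B, Q, 3)
        f_before = self.encoder(pc_before)
        f_after = self.encoder(pc_after)
        # Cross-attention: the "before" features attend to the "after" features.
        fused, _ = self.fusion(f_before, f_after, f_after)
        # Global pooling stands in for projecting/pooling into feature grids/planes
        # and interpolating local features at the query points.
        geo_feat = self.geo_decoder(fused).mean(dim=1)
        art_feat = self.art_decoder(fused).mean(dim=1)
        q = query_points
        geo_cond = torch.cat([geo_feat.unsqueeze(1).expand(-1, q.size(1), -1), q], dim=-1)
        art_cond = torch.cat([art_feat.unsqueeze(1).expand(-1, q.size(1), -1), q], dim=-1)
        return self.occ_head(geo_cond), self.seg_head(art_cond), self.joint_head(art_cond)

occ, seg, joint = SimplifiedDitto()(torch.rand(1, 1024, 3), torch.rand(1, 1024, 3), torch.rand(1, 256, 3))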

If you find our work useful in your research, please consider citing.

Installation

  1. Create a conda environment and install the required packages.
conda env create -f conda_env_gpu.yaml -n Ditto

You can change the PyTorch and CUDA versions in conda_env_gpu.yaml.

  2. Build the ConvONets dependencies by running python scripts/convonet_setup.py build_ext --inplace.

  3. Download the data, then unzip data.zip under the repo's root.

Training

# single GPU
python run.py experiment=Ditto_s2m

# multiple GPUs
python run.py trainer.gpus=4 +trainer.accelerator='ddp' experiment=Ditto_s2m

# multiple GPUs + wandb logging
python run.py trainer.gpus=4 +trainer.accelerator='ddp' logger=wandb logger.wandb.group=s2m experiment=Ditto_s2m

Testing

# only supports a single GPU
python run_test.py experiment=Ditto_s2m trainer.resume_from_checkpoint=/path/to/trained/model/

Demo

Here is a minimal demo that starts from multiview depth maps before and after an interaction and ends with a reconstructed digital twin. To run the demo, you need to install this library for visualization.

We provide the posed depth images of a real-world laptop to run the demo. You can download them from here and put them under data. You can also run the demo on your own data that follows the same format.
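
Below is a minimal sketch (not the demo notebook's actual code) of how posed multiview depth maps can be back-projected into a world-frame point cloud for each observation; the intrinsics K and camera-to-world poses are assumed to accompany the downloaded data, and the function name is hypothetical.

import numpy as np

def depth_to_world_points(depth, K, T_cam2world):
    """depth: (H, W) metric depth map; K: (3, 3) intrinsics; T_cam2world: (4, 4) pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0                                          # drop invalid pixels
    z = depth[valid]
    # Pinhole back-projection into the camera frame.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)    # (N, 4) homogeneous
    return (T_cam2world @ pts_cam.T).T[:, :3]                  # (N, 3) in world frame

# Points from all views are concatenated into one cloud per observation
# (one before and one after the interaction) and fed to the model.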

Data and pre-trained models

Data: here. Remember to cite Shape2Motion and Abbatematteo et al. as well as Ditto when using these datasets.

Pre-trained models: Shape2Motion dataset, Synthetic dataset.

Useful tips

  1. Run eval "$(python run.py -sc install=bash)" under the root directory to enable auto-completion for command-line options.

  2. Install pre-commit hooks with pip install pre-commit; pre-commit install to get automatic formatting before each commit.

Related Repositories

  1. Our code is based on the fantastic Lightning-Hydra-Template.

  2. We use ConvONets as our backbone.

Citing

@inproceedings{jiang2022ditto,
   title={Ditto: Building Digital Twins of Articulated Objects from Interaction},
   author={Jiang, Zhenyu and Hsu, Cheng-Chun and Zhu, Yuke},
   booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

ditto's People

Contributors

steve-tod


ditto's Issues

The results of testing on the pre-trained model

Hello, we have run the test using the pre-trained model provided with this paper, but the results seem to be incorrect. Where might the problem be? We haven't changed the program code. Thank you for your reply.
Ditto_testresult

Different visualization result in demo_depth_map.ipynb

I have been trying to replicate the visualization result shown in demo_depth_map.ipynb (https://github.com/UT-Austin-RPL/Ditto/blob/master/notebooks/demo_depth_map.ipynb). The following are the steps I executed:

  1. Clone the Ditto repo and create a virtual env in PyCharm (Python 3.8.10). Then perform the rest in the created virtual env.
  2. Install the required packages with 'pip install -r requirements.txt'.
  3. Build the dependencies with 'python scripts/convonet_setup.py build_ext --inplace'.
  4. Collect all the required data and put it under the root/data directory, following the README (https://github.com/UT-Austin-RPL/Ditto#data-and-pre-trained-models, https://utexas.box.com/s/ujb2ky8y9vaog7nheth1n3tmm1rgx9t7, https://utexas.box.com/s/a4h001b3ciicrt3f71t4xd3wjsm04be7, https://utexas.box.com/s/zbf5bja20n2w6umryb1bcfbbcm3h2ysn).
  5. Install an older version of utils3d with 'pip install git+git://github.com/Steve-Tod/utils3d.git@bbd72687404436b37c90230a572891075aa8a53b', because Pyrenderer is located at a different path in the newest version and causes an error in my environment.
  6. Run all the cells in demo_depth_map.ipynb directly, which gives the following results (from different viewpoints):

output1
output2

The prismatic joint axis looks different from the demo, and the meshes of both digital twins in my result seem to be incomplete.

Below is the picture from the demo.
download

What's the meaning of 'recenter' when inferring the pivot point?

Hi,

Thanks for your great work. When going through the demo code, I don't quite understand the recenter process. Could you help explain it a bit? Based on my current understanding, the pivot point is already the averaged motion origin. Why do we want another recenter operation? What's the purpose of the double cross product?

if recenter:
    pivot_point = np.cross(axis, np.cross(pivot_point, axis))
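
For reference, here is a small numerical check (illustrative, not taken from the repo) of what that double cross product does: for a unit axis a, a × (p × a) = p − (p · a) a, so it removes the component of the pivot point along the joint axis and re-centers it onto the plane through the origin perpendicular to the axis.

import numpy as np

axis = np.array([0.0, 0.0, 1.0])           # unit joint axis
pivot_point = np.array([0.3, -0.2, 5.0])   # averaged motion origin (example values)

recentered = np.cross(axis, np.cross(pivot_point, axis))
projected = pivot_point - np.dot(pivot_point, axis) * axis

assert np.allclose(recentered, projected)
print(recentered)                          # [ 0.3 -0.2  0. ]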

FileNotFoundError: [Errno 2] No such file or directory:

Hello, this error comes up when I run training. Is there any solution?

File "C:\Users\Labor\anaconda3\envs\Ditto\lib\runpy.py", line 234, in _get_code_from_file
with io.open_code(decoded_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Labor\Digital-Twin\Ditto\logs\runs\2022-05-19\Ditto_s2m-10-04-35\run.py'

Where do you save the digital twins?

Hi! I'm new to the way your code is organized, and I wonder where the output of the model is saved. And how can I visualize it using the utils3d tools?
Many thanks!

Are the models in the canonical object space?

Hi,

I notice that in the demo, the laptop is converted from the depth image to camera coordinates, then to world coordinates. What's the reason for converting into world coordinates? Does it guarantee that the object is in the canonical object space?

When training on the Shape2Motion data, is the training data in the canonical object space (with a canonical pose)? Or is it just in camera coordinates?

How to visualize the test results of shape2motion dataset?

Hi, according to the tutorial, I have managed to visualize the test results on the real-world dataset, which relies on the input depth images and RGB images. However, how do I visualize the test results on the Shape2Motion dataset? Thank you very much for your help!
