ut-austin-rpl / ditto

Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

License: MIT License

Python 26.09% TeX 0.14% Jupyter Notebook 73.76%
computer-vision robotics 3d-reconstruction digital-twin deep-learning

ditto's Introduction

Ditto: Building Digital Twins of Articulated Objects from Interaction

Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

CVPR 2022, Oral

Project | arXiv

intro

News

2022-04-28: We released the data generation code of Ditto here.

Introduction

Ditto (Digital Twins of Articulated Objects) is a model that reconstructs the part-level geometry and articulation model of an articulated object given observations before and after an interaction. Specifically, we use a PointNet++ encoder to encode the input point cloud observations and fuse the subsampled point features with a simple attention layer. We then use two independent decoders to propagate the fused point features into two sets of dense point features, one for geometry reconstruction and one for articulation estimation. We construct feature grids/planes by projecting and pooling the point features, and query local features from the constructed feature grids/planes. Conditioned on these local features, we use separate decoders to predict occupancy, segmentation, and joint parameters with respect to the query points. In the end, we extract an explicit geometry and articulation model from the implicit decoders.
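
To make the data flow concrete, here is a minimal, self-contained sketch of this pipeline in PyTorch. It is not the actual Ditto implementation: the PointNet++ encoder is replaced by a shared per-point MLP, the feature grid/plane construction is reduced to global pooling, and all module names, dimensions, and output sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SimplifiedDitto(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Stand-in for the PointNet++ encoder (shared between the two observations).
        self.encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Simple attention layer that fuses features of the before/after observations.
        self.fusion = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        # Two independent decoders that propagate the fused point features.
        self.geo_decoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.art_decoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        # Implicit heads conditioned on (local feature, query point).
        self.occ_head = nn.Linear(feat_dim + 3, 1)    # occupancy
        self.seg_head = nn.Linear(feat_dim + 3, 1)    # mobile/static segmentation
        self.joint_head = nn.Linear(feat_dim + 3, 7)  # e.g. axis (3) + pivot (3) + state (1)

    def forward(self, pc_before, pc_after, query_points):
        # pc_before, pc_after: (B, N, 3) point clouds; query_points: (B, Q, 3)
        f_before = self.encoder(pc_before)
        f_after = self.encoder(pc_after)
        # Cross-attention: the "before" features attend to the "after" features.
        fused, _ = self.fusion(f_before, f_after, f_after)
        # Global pooling stands in for projecting/pooling into feature grids/planes
        # and interpolating local features at the query points.
        geo_feat = self.geo_decoder(fused).mean(dim=1)
        art_feat = self.art_decoder(fused).mean(dim=1)
        q = query_points
        geo_cond = torch.cat([geo_feat.unsqueeze(1).expand(-1, q.size(1), -1), q], dim=-1)
        art_cond = torch.cat([art_feat.unsqueeze(1).expand(-1, q.size(1), -1), q], dim=-1)
        return self.occ_head(geo_cond), self.seg_head(art_cond), self.joint_head(art_cond)

occ, seg, joint = SimplifiedDitto()(torch.rand(1, 1024, 3), torch.rand(1, 1024, 3), torch.rand(1, 256, 3))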

If you find our work useful in your research, please consider citing.

Installation

  1. Create a conda environment and install the required packages.
conda env create -f conda_env_gpu.yaml -n Ditto

You can change the PyTorch and CUDA versions in conda_env_gpu.yaml.

  2. Build the ConvONets dependencies by running python scripts/convonet_setup.py build_ext --inplace.

  3. Download the data, then unzip data.zip under the repo's root.

Training

# single GPU
python run.py experiment=Ditto_s2m

# multiple GPUs
python run.py trainer.gpus=4 +trainer.accelerator='ddp' experiment=Ditto_s2m

# multiple GPUs + wandb logging
python run.py trainer.gpus=4 +trainer.accelerator='ddp' logger=wandb logger.wandb.group=s2m experiment=Ditto_s2m

Testing

# only supports a single GPU
python run_test.py experiment=Ditto_s2m trainer.resume_from_checkpoint=/path/to/trained/model/

Demo

Here is a minimal demo that starts from multiview depth maps before and after an interaction and ends with a reconstructed digital twin. To run the demo, you need to install this library for visualization.

We provide the posed depth images of a real-world laptop to run the demo. You can download them from here and put them under data. You can also run the demo on your own data that follows the same format.
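
Below is a minimal sketch (not the demo notebook's actual code) of how posed multiview depth maps can be back-projected into a world-frame point cloud for each observation; the intrinsics K and camera-to-world poses are assumed to accompany the downloaded data, and the function name is hypothetical.

import numpy as np

def depth_to_world_points(depth, K, T_cam2world):
    """depth: (H, W) metric depth map; K: (3, 3) intrinsics; T_cam2world: (4, 4) pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0                                          # drop invalid pixels
    z = depth[valid]
    # Pinhole back-projection into the camera frame.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)    # (N, 4) homogeneous
    return (T_cam2world @ pts_cam.T).T[:, :3]                  # (N, 3) in world frame

# Points from all views are concatenated into one cloud per observation
# (one before and one after the interaction) and fed to the model.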

Data and pre-trained models

Data: here. Remember to cite Shape2Motion and Abbatematteo et al. as well as Ditto when using these datasets.

Pre-trained models: Shape2Motion dataset, Synthetic dataset.

Useful tips

  1. Run eval "$(python run.py -sc install=bash)" under the root directory to enable auto-completion for command-line options.

  2. Install pre-commit hooks with pip install pre-commit; pre-commit install to get automatic formatting before each commit.

Related Repositories

  1. Our code is based on the fantastic Lightning-Hydra-Template.

  2. We use ConvONets as our backbone.

Citing

@inproceedings{jiang2022ditto,
   title={Ditto: Building Digital Twins of Articulated Objects from Interaction},
   author={Jiang, Zhenyu and Hsu, Cheng-Chun and Zhu, Yuke},
   booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

ditto's People

Contributors

steve-tod


ditto's Issues

The results of testing on the pre-trained model

Hello, we have run the test using the pre-trained model provided with this paper, but the results seem to be incorrect. Where might the problem be? We haven't changed the program code. Thank you for your reply.
Ditto_testresult

Different visualization result in demo_depth_map.ipynb

I have been trying to replicate the visualization result shown in demo_depth_map.ipynb (https://github.com/UT-Austin-RPL/Ditto/blob/master/notebooks/demo_depth_map.ipynb). The following are the steps I executed:

  1. Clone the Ditto repo and create a virtual env in PyCharm (Python 3.8.10). Then perform the rest in the created virtual env.
  2. Install the required packages with 'pip install -r requirements.txt'.
  3. Build the dependencies with 'python scripts/convonet_setup.py build_ext --inplace'.
  4. Collect all the required data and put it under the root/data directory, following the README (https://github.com/UT-Austin-RPL/Ditto#data-and-pre-trained-models, https://utexas.box.com/s/ujb2ky8y9vaog7nheth1n3tmm1rgx9t7, https://utexas.box.com/s/a4h001b3ciicrt3f71t4xd3wjsm04be7, https://utexas.box.com/s/zbf5bja20n2w6umryb1bcfbbcm3h2ysn).
  5. Install an older version of utils3d with 'pip install git+git://github.com/Steve-Tod/utils3d.git@bbd72687404436b37c90230a572891075aa8a53b', because Pyrenderer is located at a different path in the newest version and causes an error in my environment.
  6. Run all the cells in demo_depth_map.ipynb directly, which gives the following results (from different viewpoints):

output1
output2

The prismatic joint axis looks different from the demo, and the meshes of both digital twins in my result seem to be incomplete.

Below is the picture from the demo.
download

What's the meaning of 'recenter' when inferring the pivot point?

Hi,

Thanks for your great work. When going through the demo code, I don't quite understand the recenter process. Could you help explain it a bit? Based on my current understanding, the pivot point is already the averaged motion origin. Why do we want another recenter operation? What's the purpose of the double cross product?

if recenter:
    pivot_point = np.cross(axis, np.cross(pivot_point, axis))
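
For reference, here is a small numerical check (illustrative, not taken from the repo) of what that double cross product does: for a unit axis a, a × (p × a) = p − (p · a) a, so it removes the component of the pivot point along the joint axis and re-centers it onto the plane through the origin perpendicular to the axis.

import numpy as np

axis = np.array([0.0, 0.0, 1.0])           # unit joint axis
pivot_point = np.array([0.3, -0.2, 5.0])   # averaged motion origin (example values)

recentered = np.cross(axis, np.cross(pivot_point, axis))
projected = pivot_point - np.dot(pivot_point, axis) * axis

assert np.allclose(recentered, projected)
print(recentered)                          # [ 0.3 -0.2  0. ]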

FileNotFoundError: [Errno 2] No such file or directory:

Hello, this error comes up when I run training. Is there any solution?

File "C:\Users\Labor\anaconda3\envs\Ditto\lib\runpy.py", line 234, in _get_code_from_file
with io.open_code(decoded_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Labor\Digital-Twin\Ditto\logs\runs\2022-05-19\Ditto_s2m-10-04-35\run.py'

Where do you save the digital twins?

Hi! I'm new to the way your code is organized, and I wonder where the output of the model is saved. And how can I visualize it using the utils3d tools?
Many thanks!

Are the models in the canonical object space?

Hi,

I notice that in the demo, the laptop is converted from the depth image to camera coordinates, then to world coordinates. What's the reason for converting into world coordinates? Does it guarantee that the object is in the canonical object space?

When training on the Shape2Motion data, is the training data in the canonical object space (with a canonical pose)? Or is it just in camera coordinates?

How to visualize the test results of shape2motion dataset?

Hi, according to the tutorial, I have managed to visualize the test results on the real-world dataset, which relies on the input depth images and RGB images. However, how do I visualize the test results on the Shape2Motion dataset? Thank you very much for your help!
