
li-guihai / total3dunderstanding


This project forked from gap-lab-cuhk-sz/total3dunderstanding


Implementation of CVPR'20 Oral: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

License: MIT License


total3dunderstanding's Introduction

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang
In CVPR, 2020.

Figure: input image (img.jpg), 3D bounding boxes (3dbbox.png), and mesh reconstruction (recon.png).


Install

This implementation uses Python 3.6, PyTorch 1.1.0, and cudatoolkit 9.0. We recommend using conda to set up the environment.

  • Install with conda:
conda env create -f environment.yml
conda activate Total3D
  • Install with pip:
pip install -r requirements.txt
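Because the install notes pin specific versions, a quick check can catch a mismatched environment before training. The script below is a minimal sketch, not part of the repository: it compares the running Python, `torch.__version__`, and `torch.version.cuda` against the versions named above, treating a missing PyTorch as a mismatch. The helper names `version_prefix_matches` and `collect_versions` are hypothetical.

```python
import importlib
import sys

# Versions named in the install notes.
EXPECTED = {"python": "3.6", "torch": "1.1.0", "cuda": "9.0"}


def version_prefix_matches(installed, expected):
    """Return True if `installed` begins with the dotted parts of `expected`.

    Local build tags such as "+cu90" are ignored, and parts are compared
    whole, so "3.6.13" matches "3.6" but "3.60" does not.
    """
    if installed is None:
        return False
    inst_parts = installed.split("+")[0].split(".")
    exp_parts = expected.split(".")
    return inst_parts[:len(exp_parts)] == exp_parts


def collect_versions():
    """Return installed versions; None marks a missing PyTorch or CUDA build."""
    found = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found
    found["torch"] = torch.__version__
    found["cuda"] = torch.version.cuda  # None on CPU-only builds
    return found


if __name__ == "__main__":
    versions = collect_versions()
    for name, expected in EXPECTED.items():
        ok = version_prefix_matches(versions[name], expected)
        print("{:<7} expected {:<6} found {!s:<10} {}".format(
            name, expected, versions[name], "ok" if ok else "MISMATCH"))
```

Run it inside the activated environment; any MISMATCH line points at the package to revisit.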

Demo

The pretrained model can be downloaded here. We also provide the pretrained Mesh Generation Net here. Put the pretrained models under

out/pretrained_models
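To confirm the downloaded checkpoints landed in the right place, a small helper can list them. This sketch assumes the checkpoint files use the `.pth` extension (as the `model_best.pth` files produced by training do); the function name `find_pretrained_models` is hypothetical.

```python
from pathlib import Path


def find_pretrained_models(root="out/pretrained_models"):
    """Return the .pth files directly under `root`, sorted by name.

    Returns an empty list when the folder does not exist yet, so the caller
    can report "nothing found" instead of crashing.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p for p in root_path.glob("*.pth") if p.is_file())


if __name__ == "__main__":
    models = find_pretrained_models()
    if not models:
        print("No .pth files found under out/pretrained_models")
    for path in models:
        print(path)
```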

The demo below shows how the method works. vtk is used to visualize the 3D scenes, and the outputs are saved under 'demo/outputs'. You can also try the method on your own inputs with this script.

cd Total3DUnderstanding
python main.py configs/total3d.yaml --mode demo --demo_path demo/inputs/1

Data preparation

In our paper, we use SUN-RGBD to train our Layout Estimation Net (LEN) and Object Detection Net (ODN), and use Pix3D to train our Mesh Generation Net (MGN).

Preprocess SUN-RGBD data

You can either directly download the processed training/testing data [link] (recommended) to

data/sunrgbd/sunrgbd_train_test_data

or

  1. Download the raw SUN-RGBD data to
data/sunrgbd/Dataset/SUNRGBD
  2. Download the 37 class labels of objects in SUN RGB-D images [link] to
data/sunrgbd/Dataset/SUNRGBD/train_test_labels
  3. Follow this work to download the preprocessed clean data of SUN RGB-D [link] to
data/sunrgbd/Dataset/data_clean
  4. Follow this work to download the preprocessed ground truth of SUN RGB-D [link], and place the '3dlayout' and 'updated_rtilt' folders at
data/sunrgbd/Dataset/3dlayout
data/sunrgbd/Dataset/updated_rtilt
  5. Run the following to generate training and testing data in 'data/sunrgbd/sunrgbd_train_test_data'.
python utils/generate_data.py

   If everything goes smoothly, a ground-truth scene will be visualized like

gt_scene.png
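If you follow the manual route, the folders from steps 1 to 4 must exist before `utils/generate_data.py` is run. The sketch below checks for them; the directory list is copied from those steps, while `missing_dirs` and the report format are hypothetical and not part of the repository.

```python
from pathlib import Path

# Folders required by the manual SUN-RGBD preparation steps (steps 1-4).
SUNRGBD_RAW_DIRS = [
    "data/sunrgbd/Dataset/SUNRGBD",
    "data/sunrgbd/Dataset/SUNRGBD/train_test_labels",
    "data/sunrgbd/Dataset/data_clean",
    "data/sunrgbd/Dataset/3dlayout",
    "data/sunrgbd/Dataset/updated_rtilt",
]


def missing_dirs(required, base="."):
    """Return the entries of `required` that are not directories under `base`."""
    base_path = Path(base)
    return [rel for rel in required if not (base_path / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(SUNRGBD_RAW_DIRS)
    if missing:
        print("Missing before running utils/generate_data.py:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All raw SUN-RGBD folders are in place.")
```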

Preprocess Pix3D data

You can either directly download the preprocessed ground-truth data [link] (recommended) to

data/pix3d/train_test_data

Each sample contains the object class, 3D points (sampled on meshes), sample id, and object image (without mask). Samples in the training set are flipped for augmentation.
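The Pix3D sample description and the flip augmentation can be pictured as follows. This is an illustrative sketch only: the `Pix3DSample` container and `flip_sample` helper are hypothetical, and the assumption that mirroring the image corresponds to negating the x coordinate of the points is ours, not the repository's. The point is that the image and the 3D points must be flipped together so they stay consistent.

```python
from typing import NamedTuple

import numpy as np


class Pix3DSample(NamedTuple):
    """Hypothetical container mirroring the fields listed for each sample."""
    object_class: str
    points: np.ndarray  # (N, 3) points sampled on the mesh surface
    sample_id: int
    image: np.ndarray   # (H, W, 3) object crop, no mask applied


def flip_sample(sample):
    """Mirror the image left-right and the points along x, together.

    ASSUMPTION: the point cloud's x axis corresponds to the image's
    horizontal axis; the real preprocessing may use another convention.
    """
    flipped_image = sample.image[:, ::-1, :].copy()
    flipped_points = sample.points.copy()
    flipped_points[:, 0] = -flipped_points[:, 0]
    return sample._replace(image=flipped_image, points=flipped_points)
```

Returning a new sample (rather than flipping in place) keeps the original available, which matters if both versions end up in the training set.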

or

  1. Download the Pix3D dataset to
data/pix3d/metadata
  2. Run the following to generate the train/test data into 'data/pix3d/train_test_data'
python utils/preprocess_pix3d.py
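The README does not say how `utils/preprocess_pix3d.py` samples the 3D points on each mesh. One common technique is area-weighted uniform surface sampling: pick triangles with probability proportional to their area, then draw a uniform point inside each. The NumPy sketch below shows that technique; `sample_surface_points` and its arguments are hypothetical and may differ from what the script actually does.

```python
import numpy as np


def sample_surface_points(vertices, faces, n_points, seed=0):
    """Sample points uniformly over a triangle mesh's surface.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Triangles are chosen with probability proportional to their area, then a
    point is drawn uniformly inside each chosen triangle.
    """
    rng = np.random.RandomState(seed)
    tri = vertices[faces]  # (F, 3, 3) triangle corners
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    chosen = rng.choice(len(faces), size=n_points, p=areas / areas.sum())

    # Uniform barycentric coordinates via the square-root trick.
    r1 = np.sqrt(rng.rand(n_points, 1))
    r2 = rng.rand(n_points, 1)
    a, b, c = tri[chosen, 0], tri[chosen, 1], tri[chosen, 2]
    return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c
```

Without the area weighting, small triangles would be oversampled and the point density would follow the mesh tessellation instead of the surface.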

Training and Testing

We use configuration files (see 'configs/****.yaml') to fully control the training/testing process. There are three subtasks in Total3D (layout estimation, object detection and mesh reconstruction). We first pretrain each task individually, followed by joint training.

Pretraining
  1. Switch the keyword in 'configs/total3d.yaml' between ('layout_estimation', 'object_detection') as below to pretrain the two tasks individually.
train:
  phase: 'layout_estimation' # or 'object_detection'

python main.py configs/total3d.yaml --mode train

The two pretrained models can be found at

out/total3d/a_folder_named_with_script_time/model_best.pth
  2. Train the Mesh Generation Net by:
python main.py configs/mgnet.yaml --mode train

The pretrained model can be found at

out/mesh_gen/a_folder_named_with_script_time/model_best.pth
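Every training run writes to a new time-stamped folder, so locating the checkpoint you just produced can be scripted. The sketch below picks the `model_best.pth` with the newest modification time under a run root, instead of parsing folder names whose exact timestamp format is not documented here. `latest_checkpoint` is a hypothetical helper; note that both pretraining phases write under 'out/total3d', so double-check that the newest checkpoint belongs to the phase you want.

```python
from pathlib import Path


def latest_checkpoint(run_root, filename="model_best.pth"):
    """Return `filename` from the most recently modified run folder, or None.

    Only sub-folders that actually contain `filename` are considered, so
    aborted runs without a checkpoint are skipped.
    """
    root = Path(run_root)
    if not root.is_dir():
        return None
    candidates = [d / filename for d in root.iterdir()
                  if d.is_dir() and (d / filename).is_file()]
    if not candidates:
        return None
    # Compare the checkpoint files' own modification times.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint("out/mesh_gen"))
```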
Joint training

List the addresses of the three pretrained models in 'configs/total3d.yaml', and modify the phase name to 'joint' as

weight: ['folder_to_layout_estimation/model_best.pth', 'folder_to_object_detection/model_best.pth', 'folder_to_mesh_recon/model_best.pth']

train:
  phase: 'joint'

Then run the following for joint training.

python main.py configs/total3d.yaml --mode train

The trained model can be found at

out/total3d/a_folder_named_with_script_time/model_best.pth
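The manual edits to 'configs/total3d.yaml' (switching `train.phase` and listing checkpoints under `weight`) can also be applied programmatically. This sketch assumes `weight` is a top-level key and `phase` sits under `train`, as the snippets above suggest, and that PyYAML is available; `configure_run` and `update_yaml` are hypothetical names. Round-tripping through PyYAML drops comments from the file.

```python
import copy

VALID_PHASES = ("layout_estimation", "object_detection", "joint")


def configure_run(config, phase=None, weights=None):
    """Return a copy of `config` with train.phase and/or weight updated."""
    if phase is not None and phase not in VALID_PHASES:
        raise ValueError("unknown phase: {!r}".format(phase))
    if phase == "joint" and weights is not None and len(weights) != 3:
        raise ValueError("joint training expects three pretrained checkpoints")
    new_cfg = copy.deepcopy(config)  # never mutate the caller's dict
    if phase is not None:
        new_cfg.setdefault("train", {})["phase"] = phase
    if weights is not None:
        new_cfg["weight"] = list(weights)
    return new_cfg


def update_yaml(path, phase=None, weights=None):
    """Apply configure_run to a YAML file in place (needs PyYAML)."""
    import yaml  # PyYAML; imported lazily so configure_run works without it

    with open(path) as f:
        config = yaml.safe_load(f)
    # Build the new config before opening for write, so a validation error
    # cannot leave a truncated file behind.
    new_cfg = configure_run(config, phase, weights)
    with open(path, "w") as f:
        yaml.safe_dump(new_cfg, f, default_flow_style=False)
```

For example, `update_yaml('configs/total3d.yaml', 'joint', [len_ckpt, odn_ckpt, mgn_ckpt])` would prepare joint training, and passing only `weights=['folder_to_fully_trained_model/model_best.pth']` would prepare testing.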
Testing

Make sure the weight path is updated to

weight: ['folder_to_fully_trained_model/model_best.pth']

and run

python main.py configs/total3d.yaml --mode test

This script generates all 3D scenes on the test set of SUN-RGBD under

out/total3d/a_folder_named_with_script_time/visualization

You can also visualize a 3D scene given the sample id as

python utils/visualize.py --result_path out/total3d/a_folder_named_with_script_time/visualization --sequence_id 274
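To inspect several test scenes in a row, the visualization command can be wrapped in a loop. This is a sketch under assumptions: it must be run from the repository root, and `visualize_command` and `visualize_many` are hypothetical helpers that only rebuild the command line shown above.

```python
import subprocess
import sys


def visualize_command(result_path, sequence_id):
    """Build the utils/visualize.py command for one sample id."""
    return [sys.executable, "utils/visualize.py",
            "--result_path", str(result_path),
            "--sequence_id", str(sequence_id)]


def visualize_many(result_path, sequence_ids):
    """Run the visualizer once per id; each call finishes before the next starts."""
    for seq_id in sequence_ids:
        subprocess.run(visualize_command(result_path, seq_id), check=True)
```

Using `sys.executable` keeps the child process in the same (conda) interpreter as the caller, and `check=True` stops the loop at the first failing id.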
Differences to the paper
  1. We retrained the model with a schedule that halves the learning rate whenever there is no improvement within five steps, which is much more efficient.
  2. We do not provide the Faster RCNN code. Users can train their 2D detector with [link].
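The plateau rule in item 1 can be illustrated in plain Python. This sketch halves the learning rate after five consecutive steps without improvement in a monitored loss; the class name `HalveOnPlateau`, the `min_delta` parameter, and the exact counting rule are our assumptions, not taken from the repository. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.5` implements the same idea; check its documentation for how `patience` is counted.

```python
class HalveOnPlateau:
    """Halve the learning rate after `patience` consecutive steps without gain.

    A plain-Python illustration; `min_delta` decides what counts as a gain.
    """

    def __init__(self, lr, patience=5, factor=0.5, min_delta=0.0):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, val_loss):
        """Record one validation loss and return the (possibly reduced) lr."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor
                self.bad_steps = 0  # restart the count after each reduction
        return self.lr
```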

Citation

If you find our work helpful, please cite

@InProceedings{Nie_2020_CVPR,
author = {Nie, Yinyu and Han, Xiaoguang and Guo, Shihui and Zheng, Yujian and Chang, Jian and Zhang, Jian Jun},
title = {Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Our method partially follows the data processing steps in this work. If it is also helpful to you, please cite

@inproceedings{huang2018cooperative,
  title={Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation},
  author={Huang, Siyuan and Qi, Siyuan and Xiao, Yinxue and Zhu, Yixin and Wu, Ying Nian and Zhu, Song-Chun},
  booktitle={Advances in Neural Information Processing Systems},
  pages={206--217},
  year={2018}
}	
