3D Scene De-rendering Networks (3D-SDN)

Project | Paper | Poster

PyTorch implementation for 3D-aware scene de-rendering and editing. Our method integrates disentangled representations of semantics, geometry, and appearance into a deep generative model. This disentanglement supports 3D-aware scene manipulations such as (a) translation, (b) rotation, (c) color and texture editing, and (d) object removal and occlusion recovery.

3D-Aware Scene Manipulation via Inverse Graphics
Shunyu Yao*, Tzu-Ming Harry Hsu*, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum
In Neural Information Processing Systems (NeurIPS) 2018.
MIT CSAIL, Tsinghua University, and Google Research.

Framework

Our de-renderer consists of a semantic, a textural, and a geometric branch. The textural and geometric renderers then learn to reconstruct the original image from the representations produced by the de-renderer modules.
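
To make the data flow concrete, here is a conceptual sketch in Python-style pseudocode. The function names are hypothetical and do not correspond to the released API; the sketch only illustrates how the three branches factor an image and how the renderers invert that factorization.

    # Conceptual sketch only -- hypothetical names, not the released API.
    def de_render(image):
        semantic_map = semantic_branch(image)       # per-pixel semantic labels
        objects_3d = geometric_branch(image)        # per-object 3D pose and shape
        texture_codes = textural_branch(image)      # per-object appearance codes
        return semantic_map, objects_3d, texture_codes

    def render(semantic_map, objects_3d, texture_codes):
        maps_25d = geometric_renderer(objects_3d)   # 2.5D maps (e.g. normals)
        return textural_renderer(semantic_map, maps_25d, texture_codes)

Editing an image then amounts to de-rendering it, modifying the intermediate representations (e.g. an object's 3D pose), and re-rendering.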

Example Results on Cityscapes

Example user editing results on Cityscapes. (a) We move two cars closer to the camera.
(b) We rotate a car by different angles.
(c) We recover a tiny, occluded car and move it closer; our model synthesizes the previously occluded region.
(d) We move a small car closer and then change its location.

Prerequisites

  • Linux
  • Python 3.6+
  • PyTorch 0.4
  • NVIDIA GPU (GPU memory > 8GB) + CUDA 9.0
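
A quick way to verify these prerequisites from inside the environment is a short Python check; this is a minimal sketch, not part of the repository:

    # Minimal sanity check for the prerequisites above (not part of the repo).
    import torch

    assert torch.__version__.startswith('0.4'), torch.__version__
    assert torch.cuda.is_available(), 'an NVIDIA GPU with CUDA is required'
    print('CUDA version:', torch.version.cuda)   # expect 9.0
    gib = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
    print('GPU memory: %.1f GiB' % gib)          # expect > 8 GB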

Getting Started

Installation

  1. Clone this repository

    git clone https://github.com/ysymyth/3D-SDN.git && cd 3D-SDN
  2. Download the pre-trained weights

    ./models/download_models.sh
  3. Set up the conda environment

    conda env create -f environment.yml && conda activate 3dsdn
  4. Compile dependencies in geometric/maskrcnn

    ./scripts/build.sh
  5. Set up environment variables

    source ./scripts/env.sh

Image Editing

We use ./assets/0006_30-deg-right_00043.png as the example image for editing.

Semantic Branch

python semantic/vkitti_test.py \
    --ckpt ./models \
    --id vkitti-semantic \
    --root_dataset ./assets \
    --test_img 0006_30-deg-right_00043.png \
    --result ./assets/example/semantic

Geometric Branch

python geometric/scripts/main.py \
    --do test \
    --dataset vkitti \
    --mode extend \
    --source maskrcnn \
    --ckpt_dir ./models/vkitti-geometric-derender3d \
    --maskrcnn_path ./models/vkitti-geometric-maskrcnn/mask_rcnn_vkitti_0100.pth \
    --edit_json ./assets/vkitti_edit_example.json \
    --input_file ./assets/0006_30-deg-right_00043.png \
    --output_dir ./assets/example/geometric

Textural Branch

python textural/edit_vkitti.py \
    --name vkitti-textural \
    --checkpoints_dir ./models \
    --edit_dir ./assets/example/geometric/vkitti/maskrcnn/0006/30-deg-right \
    --edit_source ./assets/0006_30-deg-right_00043.png \
    --edit_num 5 \
    --segm_precomputed_path ./assets/example/semantic/0006_30-deg-right_00043.png \
    --results_dir ./assets/example \
    --feat_pose True \
    --feat_normal True

The edit results can then be viewed at ./assets/example/vkitti-textural_edit_edit_60/index.html.

Simply run cd ./assets/example/vkitti-textural_edit_edit_60 && python -m http.server 1234 and point your browser at the server. You should see the editing results together with the intermediate 2.5D representations.

Training/Testing

Please set up the datasets first and refer to semantic/README.md, geometric/README.md, and textural/README.md for training and testing details. To download the Virtual KITTI dataset:

    ./datasets/download_vkitti.sh

Please cite the Virtual KITTI paper if you use the data.

Experiments

Virtual KITTI Benchmark

Here is a fragment of our Virtual KITTI benchmark edit specification, given as a JSON file. For each edit pair, the source image is world/topic/source.png and the target image is world/topic/target.png, and a list of operations specifies how to transform the source image into the target image. Mirroring how humans describe edits, each operation either moves (modify) an object from one position to another or removes (delete) it from the view. An operation may additionally enlarge the object (zoom) or rotate it about the y-axis (ry); note that the y-axis points downwards, consistent with the axis convention of the Virtual KITTI dataset. The u and v fields denote an object's 3D center projected onto the image plane, and roi marks a target region of interest around the target (u, v) position. The benchmark contains 92 such pairs.

{
    "world": "0006",
    "topic": "fog",
    "source": "00055",
    "target": "00050",
    "operations": [
        {
            "type": "modify",
            "from": {"u": "750.9", "v": "213.9"},
            "to": {"u": "804.4", "v": "227.1", "roi": [194, 756, 269, 865]},
            "zoom": "1.338",
            "ry": "0.007"
        }
    ]
}
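
The following sketch shows how such a specification might be consumed in Python. It assumes the benchmark file holds a list of records shaped like the fragment above; note that the numeric fields (u, v, zoom, ry) are stored as strings and must be cast to float. The path layout and top-level list structure are assumptions, not guaranteed by the released code.

    # Sketch of reading the edit specification (assumed structure, not the released API).
    import json

    with open('./assets/vkitti_edit_benchmark.json') as f:
        pairs = json.load(f)  # assumed: a list of edit-pair records

    for pair in pairs:
        # assumed layout: <world>/<topic>/<frame>.png
        source = '%s/%s/%s.png' % (pair['world'], pair['topic'], pair['source'])
        target = '%s/%s/%s.png' % (pair['world'], pair['topic'], pair['target'])
        for op in pair['operations']:
            if op['type'] == 'modify':
                u, v = float(op['from']['u']), float(op['from']['v'])  # projected 3D center
                zoom = float(op.get('zoom', '1'))  # enlargement factor
                ry = float(op.get('ry', '0'))      # rotation about the (downward) y-axis
                # ... move the object at (u, v) to op['to'], within op['to']['roi']
            elif op['type'] == 'delete':
                pass  # ... remove the object and recover the occluded region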

Semantic Branch

python semantic/vkitti_test.py \
    --ckpt ./models \
    --id vkitti-semantic \
    --root_dataset ./datasets/vkitti \
    --test_img benchmark \
    --benchmark_json ./assets/vkitti_edit_benchmark.json \
    --result ./assets/vkitti-benchmark/semantic

Geometric Branch

python geometric/scripts/main.py \
    --do test \
    --dataset vkitti \
    --mode extend \
    --source maskrcnn \
    --ckpt_dir ./models/vkitti-geometric-derender3d \
    --maskrcnn_path ./models/vkitti-geometric-maskrcnn/mask_rcnn_vkitti_0100.pth \
    --output_dir ./assets/vkitti-benchmark/geometric \
    --edit_json ./assets/vkitti_edit_benchmark.json

Textural Branch

python textural/edit_benchmark.py \
    --name vkitti-textural \
    --checkpoints_dir ./models \
    --dataroot ./datasets/vkitti \
    --edit_dir ./assets/vkitti-benchmark/geometric/vkitti/maskrcnn \
    --edit_list ./assets/vkitti_edit_benchmark.json \
    --experiment_name benchmark_3D \
    --segm_precomputed_path ./assets/vkitti-benchmark/semantic \
    --results_dir ./assets/vkitti-benchmark/ \
    --feat_pose True \
    --feat_normal True

The benchmark edit results can then be viewed at ./assets/vkitti-benchmark/vkitti-textural_benchmark_3D_edit_60/index.html.

Reference

If you find this useful for your research, please cite the following paper.

@inproceedings{3dsdn2018,
  title={3D-Aware Scene Manipulation via Inverse Graphics},
  author={Yao, Shunyu and Hsu, Tzu Ming Harry and Zhu, Jun-Yan and Wu, Jiajun and Torralba, Antonio and Freeman, William T. and Tenenbaum, Joshua B.},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}

For any questions, please contact Shunyu Yao and Tzu-Ming Harry Hsu.

Acknowledgements

This work is supported by NSF #1231216, NSF #1524817, ONR MURI N00014-16-1-2007, Toyota Research Institute, and Facebook.

The semantic branch borrows from Semantic Segmentation on MIT ADE20K dataset in PyTorch, the geometric branch borrows from pytorch-mask-rcnn and neural_renderer, and the textural branch borrows from pix2pixHD.


3d-sdn's Issues

MaskRCNN build fails

Hello,
I am trying to run your code, but building the maskrcnn module fails with an error complaining that CUDA cannot be found:

    /home/phong/miniconda3/envs/3dsdn/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC/THCGeneral.h:12:18: fatal error: cuda.h: No such file or directory
     #include "cuda.h"

I installed everything through the conda environment, but so far I only get this error. Please help!

MaskRCNN training on the Cityscapes dataset

I use the provided code to train the MaskRCNN on the Cityscapes dataset. The command is as follows:

[screenshot: training command]

However, after some 100 steps, I get the following error:

[screenshot: error traceback]

I checked the size of the boxes, and the size is [0].

Could you please give me some advice for solving this problem?

Thanks a lot!

cupy package and its compatibility with CUDA 9

Hi,

I am following your guidelines, and in the geometric branch (at test time) I run into problems with the cupy package; it seems to have trouble with the CUDA compiler. I was wondering if by chance you have faced the same issue and know how to fix it.
Here is part of the error I get:

 File "/opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 241, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy/cuda/nvrtc.pyx", line 98, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 108, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 53, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:
...
  File "/opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/manipulation/join.py", line 49, in concatenate
    return core.concatenate_method(tup, axis)
  File "cupy/core/core.pyx", line 2423, in cupy.core.core.concatenate_method
  File "cupy/core/core.pyx", line 2467, in cupy.core.core.concatenate_method
  File "cupy/core/core.pyx", line 2502, in cupy.core.core._concatenate
  File "cupy/core/elementwise.pxi", line 839, in cupy.core.core.ufunc.__call__
  File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
  File "cupy/core/elementwise.pxi", line 638, in cupy.core.core._get_ufunc_kernel
  File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
  File "/opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 164, in compile_with_cache
    ptx = compile_using_nvrtc(source, options, arch)
  File "/opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 82, in compile_using_nvrtc
    ptx = prog.compile(options)
  File "/opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 245, in compile
    raise CompileException(log, self.src, self.name, options)
cupy.cuda.compiler.CompileException: /opt/conda/envs/3dsdn/lib/python3.6/site-packages/cupy/core/include/cupy/carray.cuh(10): catastrophic error: cannot open source file "cuda_fp16.h"

1 catastrophic error detected in the compilation of "/tmp/tmpif65tvzr/kern.cu".
Compilation terminated.

Undefined name 'E_NLayers' in networks.py

Where is E_NLayers defined?

flake8 testing of https://github.com/ysymyth/3D-SDN on Python 3.7.1

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./textural/models/networks.py:96:16: F821 undefined name 'E_NLayers'
        netE = E_NLayers(input_nc, output_nc, ndf, n_layers=4, norm_layer=norm_layer,
               ^
./textural/models/networks.py:99:16: F821 undefined name 'E_NLayers'
        netE = E_NLayers(input_nc, output_nc, ndf, n_layers=5, norm_layer=norm_layer,
               ^
2     F821 undefined name 'E_NLayers'
2
