This project forked from hamnaanaa/3d-scene-diffusion-guidance-using-scene-graphs

The implementation for the paper "3D Scene Diffusion Guidance using Scene Graphs": a diffusion model for conditional 3D scene generation with classifier-free guidance on scene graphs.

3D Scene Diffusion Guidance using Scene Graphs

Paper | arXiv

This repository contains the source code for the paper 3D Scene Diffusion Guidance using Scene Graphs.

Abstract

Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.
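
To make the idea of a relational graph convolution concrete, here is a minimal pure-Python sketch of one message-passing step in the general R-GCN style: each node combines its own features with the mean of its neighbours' features, using a separate weight matrix per relation type. This is not the repository's implementation; the function names, the `left_of` relation, and the weight matrices are illustrative assumptions.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One relational graph convolution step (illustrative sketch).

    features:    list of node feature vectors
    edges:       list of (source, relation, target) triples
    rel_weights: dict mapping relation name -> weight matrix
    self_weight: weight matrix applied to each node's own features
    """
    # Group the incoming neighbours of every node by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in edges:
        incoming[dst][rel].append(src)

    out = []
    for i, h in enumerate(features):
        acc = matvec(self_weight, h)  # self-connection term
        for rel, neighbours in incoming[i].items():
            norm = 1.0 / len(neighbours)  # mean over neighbours of this relation
            for j in neighbours:
                msg = matvec(rel_weights[rel], features[j])
                acc = [a + norm * m for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU non-linearity
    return out


# Two nodes, one hypothetical "left_of" edge from node 0 to node 1.
features = [[1.0, 0.0], [0.0, 1.0]]
edges = [(0, "left_of", 1)]
identity = [[1.0, 0.0], [0.0, 1.0]]
rel_weights = {"left_of": [[2.0, 0.0], [0.0, 2.0]]}
updated = rgcn_layer(features, edges, rel_weights, identity)
```

Node 0 has no incoming edges and keeps its own features, while node 1 additionally receives a relation-specific message from node 0.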

Setup

To set up all the necessary dependencies, you can use Conda. Open your terminal and execute the following command:

conda env create -f environment.yml

Usage

guided-diffusion/inference.ipynb provides a script for generating scenes using the trained denoising network.

guided-diffusion/main.ipynb provides a training script for the denoising network.

[Optional] FastText Embeddings by Facebook Research

We use the FastText embeddings by Facebook Research to embed the scene object descriptions more robustly than a standard Word2Vec encoder would. FastText embeds the textual description of each object in the scene into a 300-dimensional vector stored in a node. Combining these nodes with their inter-node relations yields the scene graph used as input to the denoising network. You can download the model binary here and place it in the models folder.
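
The node-plus-relation structure described above can be sketched as follows. This is a hedged illustration, not the repository's data format: `SceneGraph`, `placeholder_embed`, the object descriptions, and the `next_to` relation are all hypothetical. The placeholder embedding only produces a deterministic 300-dimensional vector so the sketch runs without the model binary; with the `fasttext` Python package one would instead load the binary with `fasttext.load_model(path)` and call `get_sentence_vector(text)` on the result.

```python
import hashlib
from dataclasses import dataclass, field

EMBED_DIM = 300  # embedding size stated in this README


def placeholder_embed(text, dim=EMBED_DIM):
    """Deterministic stand-in for a FastText sentence embedding.

    NOT a real embedding: it only yields a fixed-length vector of floats
    in [0, 1] so that the sketch is runnable without the model binary.
    """
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{text}:{counter}".encode()).digest()
        values.extend(b / 255.0 for b in digest)
        counter += 1
    return values[:dim]


@dataclass
class SceneGraph:
    """Hypothetical container: one embedding per object plus typed edges."""
    nodes: list = field(default_factory=list)   # 300-dim embedding per object
    labels: list = field(default_factory=list)  # original text descriptions
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_object(self, description):
        self.labels.append(description)
        self.nodes.append(placeholder_embed(description))
        return len(self.nodes) - 1  # index of the new node

    def add_relation(self, source, relation, target):
        self.edges.append((source, relation, target))


graph = SceneGraph()
bed = graph.add_object("a wooden double bed")
lamp = graph.add_object("a small bedside lamp")
graph.add_relation(lamp, "next_to", bed)
```

Keeping the text labels next to the embeddings makes it easy to inspect a graph before it is passed to a network.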

[Optional] DVIS Library for 3D Scene Visualization

guided-diffusion/inference.ipynb contains a code section on DVIS library usage to visualize the generated scenes in 3D.

Results

Generated Scenes

Below is a 4x4 table providing an overview of the 3D scene synthesis results. Each row depicts a single result: the first column gives a natural-language description of the scene, and the second column shows the corresponding scene graph used as input to the generative process. The remaining two columns show the synthesized 3D scene from the side and from the top. The selected results cover (1) very complex, (2) disconnected, (3) repetitive, and (4) simple scene graphs.

Denoising Process

The following GIF demonstrates the denoising process applied to a single scene:
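
As a rough illustration of how classifier-free guidance typically enters each denoising step, the sketch below combines a conditional and an unconditional noise prediction. It uses the common convention in which a scale of 1 reproduces the conditional prediction and a scale of 0 the unconditional one. The function name and the toy values are assumptions, and the repository's notebooks may use a different parameterization.

```python
def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance combination (illustrative sketch).

    Moves from the unconditional noise prediction toward the conditional
    one; scales above 1 extrapolate beyond the conditional prediction.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy predictions standing in for the outputs of a denoising network
# evaluated with and without the scene-graph condition.
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 1.0]

same_as_cond = guided_noise(eps_cond, eps_uncond, 1.0)
same_as_uncond = guided_noise(eps_cond, eps_uncond, 0.0)
amplified = guided_noise(eps_cond, eps_uncond, 2.0)
```

In a full sampler, the combined prediction would replace the single network output at every denoising step.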

Acknowledgement

This work was developed with the TUM Visual Computing Group, led by Prof. Matthias Niessner. It builds upon DiffuScene, and we thank Yinyu Nie for his great support and supervision.

Citation

If you find this work useful, please cite:

@misc{naanaa20233d,
      title={3D Scene Diffusion Guidance using Scene Graphs}, 
      author={Mohammad Naanaa and Katharina Schmid and Yinyu Nie},
      year={2023},
      eprint={2308.04468},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
