This project forked from htyjers/strdiffusion

[CVPR 2024] Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

License: Apache License 2.0

StrDiffusion

This repository is the official code for the paper "Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting" by Haipeng Liu, Yang Wang (corresponding author), Biao Qian, Meng Wang, and Yong Rui. CVPR 2024, Seattle, USA.

Introduction

In this paper, we propose a novel structure-guided diffusion model for image inpainting, named StrDiffusion. It reformulates the conventional texture denoising process under the guidance of the structure to derive a simplified denoising objective (Eq. 11) for inpainting, and reveals that: 1) the semantically sparse structure helps tackle the semantic discrepancy in the early stage, while the dense texture generates reasonable semantics in the late stage; 2) the semantics from the unmasked regions essentially offer time-dependent guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics. For the denoising process, a structure-guided neural network is trained to estimate the simplified denoising objective by exploiting the consistency of the denoised structure between masked and unmasked regions. In addition, we devise an adaptive resampling strategy as a formal criterion for whether the structure is competent to guide the texture denoising process, while regulating their semantic correlation.

Figure 1. Illustration of the proposed StrDiffusion pipeline.

Figure 2. Illustration of the adaptive resampling strategy.

In summary, our StrDiffusion reveals:

  • The semantically sparse structure encourages consistent semantics in the denoised results in the early stage, while the dense texture carries out the semantic generation in the late stage.
  • The semantics from the unmasked regions essentially offer time-dependent guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics.
  • How well the structure guides the texture depends greatly on the semantic correlation between them. Motivated by this, an adaptive resampling strategy monitors the semantic correlation and regulates it via the resampling iteration.
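As a rough illustration of the resampling idea in the last point, the loop below re-noises and re-denoises a candidate while a correlation score stays under a threshold. This is a generic sketch, not the authors' implementation: `denoise_step`, `renoise`, `correlation_score`, `threshold` and `max_resamples` are all hypothetical names, and the assumption that a learned scorer (such as the Discriminator Network trained below) supplies the score is ours.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def adaptive_resample(
    texture: T,
    structure: T,
    denoise_step: Callable[[T, T], T],          # hypothetical: one structure-guided denoising step
    renoise: Callable[[T], T],                  # hypothetical: push the sample back to a noisier state
    correlation_score: Callable[[T, T], float],  # hypothetical: higher means better correlated
    threshold: float = 0.5,                     # assumed cut-off, not a value from the paper
    max_resamples: int = 5,                     # assumed cap on resampling iterations
) -> Tuple[T, int]:
    """Denoise once, then resample while the structure/texture correlation is too low."""
    candidate = denoise_step(texture, structure)
    used = 0
    while correlation_score(candidate, structure) < threshold and used < max_resamples:
        # Correlation is still weak: re-noise the candidate and denoise it again.
        candidate = denoise_step(renoise(candidate), structure)
        used += 1
    return candidate, used


# Toy demo on plain floats: each step moves the texture halfway toward the structure.
result = adaptive_resample(
    texture=0.0,
    structure=1.0,
    denoise_step=lambda x, s: x + 0.5 * (s - x),
    renoise=lambda x: x,                        # no noise, to keep the demo deterministic
    correlation_score=lambda x, s: 1.0 - abs(s - x),
    threshold=0.9,
)
print(result)  # (0.9375, 3)
```

The cap on iterations keeps the loop bounded even when the score never reaches the threshold; how the repository actually chooses these values for a given T is not covered by this sketch.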

Dependencies

  • OS: Ubuntu 20.04.6
  • NVIDIA:
    • CUDA: 12.3
    • cuDNN: 8.5.0
  • Python 3
  • PyTorch >= 1.13.0
  • Python packages: pip install -r requirements.txt
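A small script along the following lines could confirm that the installed PyTorch satisfies the `>= 1.13.0` requirement before training. It is only a sketch: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are made up for this example, and it uses only `torch.__version__` and `torch.cuda.is_available()` from PyTorch.

```python
import re
import sys


def version_tuple(version: str) -> tuple:
    """Turn a version such as '1.13.0+cu117' into (1, 13, 0), ignoring build suffixes."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    numbers = (numbers + ["0", "0", "0"])[:3]   # pad short versions such as '2.1'
    return tuple(int(n) for n in numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that '1.9.0' is correctly older than '1.13.0'."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(min_torch: str = "1.13.0") -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(torch.__version__, min_torch):
        problems.append(f"PyTorch {torch.__version__} is older than {min_torch}")
    if not torch.cuda.is_available():
        problems.append("CUDA is not visible to PyTorch")
    return problems
```

Calling `check_environment()` after `pip install -r requirements.txt` and printing the returned list gives a quick summary; it does not verify the CUDA or cuDNN versions listed above.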

Train-[Structure Denoising Model]

  1. Dataset Preparation:

    Download the mask and image datasets, then change into the StrDiffusion/train/structure directory and edit the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/structure/config/inpainting/train.py

Train-[Texture Denoising Model]

  1. Dataset Preparation:

    Download the mask and image datasets, then change into the StrDiffusion/train/texture directory and edit the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/texture/config/inpainting/train.py

Train-[Discriminator Network]

  1. Dataset Preparation:

    Download the mask and image datasets, then change into the StrDiffusion/train/discriminator directory and edit the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/discriminator/config/inpainting/train.py
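The three training commands above could be wrapped in a small launcher such as the sketch below. The script paths are taken from this README; everything else is an assumption: the helper names `build_command` and `run_stage` are made up, the commands are assumed to be run from the repository root, and the order simply follows the README, which does not say whether the stages depend on one another.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as listed in this README, relative to the repository root.
TRAIN_SCRIPTS = {
    "structure": "train/structure/config/inpainting/train.py",
    "texture": "train/texture/config/inpainting/train.py",
    "discriminator": "train/discriminator/config/inpainting/train.py",
}


def build_command(stage: str, repo_root: str = ".") -> list:
    """Build the interpreter command for one training stage."""
    script = Path(repo_root) / TRAIN_SCRIPTS[stage]
    return [sys.executable, str(script)]


def run_stage(stage: str, repo_root: str = ".") -> int:
    """Run one stage and return its exit code; fail early if the script is missing."""
    cmd = build_command(stage, repo_root)
    if not Path(cmd[1]).is_file():
        raise FileNotFoundError(f"training script not found: {cmd[1]}")
    # check=True raises CalledProcessError if the training script exits with an error.
    return subprocess.run(cmd, check=True).returncode
```

Looping over `TRAIN_SCRIPTS` and calling `run_stage(stage)` for each key would run the stages one after another, stopping at the first failure.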

Test-[StrDiffusion]

  1. Dataset Preparation:

    Download the mask and image datasets, then change into the StrDiffusion/test/texture directory and edit the dataset paths in the option file /config/inpainting/options/test/ir-sde.yml

    • You can set the mask path here
    • You can set the image path here
  2. Pre-trained models:

    Download the pre-trained model for Places2 (T=400 or T=100), then change into the StrDiffusion/test/texture directory and edit the model paths in the option file /config/inpainting/options/test/ir-sde.yml

    • You can set the path of the Texture Denoising Model here
    • You can set the path of the Structure Denoising Model here
    • You can set the path of the Discriminator Network here
  3. For each value of T, you can set the corresponding hyperparameters of the adaptive resampling strategy here

  4. Run the following command:

python3 ./test/texture/config/inpainting/test.py
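Since testing needs two dataset paths and three model paths, a quick pre-flight check like the sketch below can catch a missing file before the test script starts. The function name `missing_paths`, the labels and the `/path/to/...` values are placeholders for illustration; they are not keys or paths from ir-sde.yml.

```python
from pathlib import Path


def missing_paths(required: dict) -> dict:
    """Return the {label: path} entries whose path does not exist on disk."""
    return {label: path for label, path in required.items() if not Path(path).exists()}


# Placeholder paths: replace them with the values you set in the test option file.
required = {
    "mask dataset": "/path/to/masks",
    "image dataset": "/path/to/images",
    "texture denoising model": "/path/to/texture.pth",
    "structure denoising model": "/path/to/structure.pth",
    "discriminator network": "/path/to/discriminator.pth",
}

for label, path in missing_paths(required).items():
    print(f"missing {label}: {path}")
```

The check only confirms that each path exists; it does not validate that a checkpoint matches the chosen T.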

Example Results

  • Visual comparison between our method and the competitors.

  • Visualization of the denoised results for IR-SDE and StrDiffusion during the denoising process.

Citation

If any part of our paper or repository is helpful to your work, please cite:

@misc{liu2024structure,
      title={Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting}, 
      author={Haipeng Liu and Yang Wang and Biao Qian and Meng Wang and Yong Rui},
      year={2024},
      eprint={2403.19898},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

This implementation is based on / inspired by:
