pmh9960 / gcdp

Official PyTorch implementation of "Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis." (ICCV 2023)

Home Page: https://pmh9960.github.io/research/GCDP/

Gaussian-Categorical Diffusion Process: Official PyTorch Implementation (ICCV 2023)

This is the official PyTorch implementation of the paper: Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis.

GCDP stands for Gaussian-Categorical Diffusion Process.

Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
Minho Park*, Jooyeol Yun*, Seunghwan Choi, and Jaegul Choo
KAIST
In ICCV 2023. (* indicates equal contribution)

Paper: https://arxiv.org/abs/2308.08157
Project page: https://pmh9960.github.io/research/GCDP/

Abstract: Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates both images and corresponding layout pairs. Our experiments reveal that we can guide text-to-image generation models to be aware of the semantics of different image regions by training the model to generate semantic labels for each pixel. We demonstrate that our approach achieves higher text-image correspondence compared to existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets, where text-image pairs are scarce.

Updates

  • [16/08/2023] Pretrained models have been released!
  • [30/07/2023] Code and project page are open to the public.

Gaussian-categorical Diffusion Process

The GCDP implementation is available in imagen_pytorch/joint_imagen.py.
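The forward (noising) half of a Gaussian-categorical diffusion process can be sketched in plain Python: the image is corrupted with Gaussian noise as in standard diffusion, while each layout pixel's class label is corrupted toward the uniform distribution as in Multinomial Diffusion (Hoogeboom et al.). This is a minimal illustration under stated assumptions, not the code in imagen_pytorch/joint_imagen.py: the cosine schedule, the single schedule shared by both modalities, and all function names (cosine_alpha_bar, q_sample_joint, and so on) are hypothetical choices made for the example.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar_t of a cosine schedule (assumed, not taken from the repo)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def q_sample_image(x0, alpha_bar, rng):
    """Gaussian marginal: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


def q_probs_layout(label, alpha_bar, num_classes):
    """Multinomial-diffusion marginal for one pixel: alpha_bar * onehot(y0) + (1 - alpha_bar) / K."""
    return [alpha_bar * (1.0 if k == label else 0.0) + (1.0 - alpha_bar) / num_classes
            for k in range(num_classes)]


def q_sample_layout(y0, alpha_bar, num_classes, rng):
    """Sample a noisy class label for every pixel of the flattened layout y0."""
    classes = range(num_classes)
    return [rng.choices(classes, weights=q_probs_layout(y, alpha_bar, num_classes))[0]
            for y in y0]


def q_sample_joint(x0, y0, t, T, num_classes, seed=0):
    """Noise an (image, layout) pair to the same timestep t.

    Assumption: both modalities share one schedule; the actual GCDP code may differ.
    """
    rng = random.Random(seed)
    alpha_bar = cosine_alpha_bar(t, T)
    return q_sample_image(x0, alpha_bar, rng), q_sample_layout(y0, alpha_bar, num_classes, rng)
```

At t = 0 the pair is returned unchanged; as t approaches T the image values tend to pure Gaussian noise and each pixel's label distribution tends to uniform over the K classes.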

Installation

conda env create -f environment.yaml
conda activate GCDP

Training GCDP

First, prepare the official Cityscapes or MM CelebA-HQ dataset with the following structure:

Cityscapes
root
 └ leftImg8bit
   └ train
     └ aachen
     └ ...
   └ val
   └ test
 └ gtFine
   └ train
     └ aachen
     └ ...
   └ val
   └ test

MM CelebA-HQ
root
 └ CelebA-HQ-img
   └ 1.jpg
   └ 2.jpg
   └ ...
 └ CelebAMask-HQ-mask-anno
   └ preprocessed
     └ 1.png
     └ 2.png
     └ ...
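Before training, it can help to check that every image has a matching layout. The sketch below pairs files according to the directory trees above and reports images whose mask is missing. The helper names are hypothetical, and the Cityscapes file suffixes (_leftImg8bit.png, _gtFine_labelIds.png) are assumed from the standard Cityscapes release; the repository's own data loaders may expect a different mask type, such as train IDs.

```python
from pathlib import Path


def pair_celeba(root):
    """Pair CelebA-HQ-img/<id>.jpg with CelebAMask-HQ-mask-anno/preprocessed/<id>.png.

    Returns (pairs, missing): matched (image, mask) paths, and image ids without a mask.
    """
    root = Path(root)
    img_dir = root / "CelebA-HQ-img"
    mask_dir = root / "CelebAMask-HQ-mask-anno" / "preprocessed"
    pairs, missing = [], []
    # Sort by (length, name) so numeric ids come out as 1, 2, ..., 10 rather than 1, 10, 2.
    for img in sorted(img_dir.glob("*.jpg"), key=lambda p: (len(p.stem), p.stem)):
        mask = mask_dir / f"{img.stem}.png"
        if mask.is_file():
            pairs.append((img, mask))
        else:
            missing.append(img.stem)
    return pairs, missing


def pair_cityscapes(root, split="train"):
    """Pair leftImg8bit/<split>/<city>/*_leftImg8bit.png with gtFine label maps.

    Assumes standard Cityscapes naming and *_gtFine_labelIds.png masks (hypothetical choice).
    """
    root = Path(root)
    img_root = root / "leftImg8bit" / split
    gt_root = root / "gtFine" / split
    suffix = "_leftImg8bit.png"
    pairs, missing = [], []
    for img in sorted(img_root.glob(f"*/*{suffix}")):
        stem = img.name[: -len(suffix)]
        mask = gt_root / img.parent.name / f"{stem}_gtFine_labelIds.png"
        if mask.is_file():
            pairs.append((img, mask))
        else:
            missing.append(stem)
    return pairs, missing
```

An empty missing list for each split is a quick signal that the layout matches what the training scripts are expected to read.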

Fill in the training and evaluation directories in scripts/celeba/train_base_128x128.sh, then run:

bash scripts/celeba/train_base_128x128.sh --root /path/to/data
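To make the joint training objective concrete, the following sketch combines an image term and a layout term for a single timestep: a mean-squared error on the predicted Gaussian noise, plus a KL divergence between the true multinomial-diffusion posterior q(y_{t-1} | y_t, y_0) and the posterior obtained by plugging the model's predicted label probabilities in place of y_0, following the parameterization of Hoogeboom et al. The weighting lam, the per-pixel averaging, and all function names are assumptions for illustration; the loss actually used by this repository may include additional terms or a different weighting.

```python
import math


def categorical_posterior(y_t, y0_probs, alpha_t, alpha_bar_prev, num_classes):
    """q(y_{t-1} | y_t, y0) for one pixel, following multinomial diffusion (Hoogeboom et al.).

    y0_probs is either a one-hot vector (true posterior) or predicted class
    probabilities (model posterior). alpha_t is the per-step signal level and
    alpha_bar_prev the cumulative level at step t-1.
    """
    K = num_classes
    unnorm = [
        (alpha_t * (1.0 if k == y_t else 0.0) + (1.0 - alpha_t) / K)
        * (alpha_bar_prev * y0_probs[k] + (1.0 - alpha_bar_prev) / K)
        for k in range(K)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def joint_loss(eps, eps_pred, y_t, y0, y0_pred, alpha_t, alpha_bar_prev, num_classes, lam=1.0):
    """Hypothetical joint objective: image noise MSE + lam * mean per-pixel layout KL."""
    mse = sum((a - b) ** 2 for a, b in zip(eps, eps_pred)) / len(eps)
    kls = []
    for yt_k, y0_k, pred_k in zip(y_t, y0, y0_pred):
        onehot = [1.0 if k == y0_k else 0.0 for k in range(num_classes)]
        q_true = categorical_posterior(yt_k, onehot, alpha_t, alpha_bar_prev, num_classes)
        q_model = categorical_posterior(yt_k, pred_k, alpha_t, alpha_bar_prev, num_classes)
        kls.append(kl(q_true, q_model))
    return mse + lam * sum(kls) / len(kls)
```

With perfect predictions both terms vanish and the loss is 0. The KL term assumes alpha_t < 1 and alpha_bar_prev < 1, so that every entry of the model posterior is strictly positive.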

Testing GCDP

You can generate text-conditioned image-layout pairs with pretrained GCDP models. Fill in the model checkpoint paths and validation directories in scripts/celeba/test_base.sh and scripts/celeba/test_sr.sh, then run:

bash scripts/celeba/test_base.sh --checkpoint_path /path/to/base/checkpoint
bash scripts/celeba/test_sr.sh --checkpoint_path /path/to/sr/checkpoint
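If you run both test stages regularly, a small Python wrapper can issue the two commands above in the order listed and stop at the first failure. The wrapper and its function names are hypothetical conveniences, not part of the repository; it only reuses the script paths and the --checkpoint_path flag shown above. With dry_run=True (the default) it prints the commands without executing them.

```python
import shlex
import subprocess

# Script paths as given in this README.
BASE_SCRIPT = "scripts/celeba/test_base.sh"
SR_SCRIPT = "scripts/celeba/test_sr.sh"


def build_test_commands(base_ckpt, sr_ckpt):
    """Return the two test invocations in README order: base model, then super-resolution."""
    return [
        ["bash", BASE_SCRIPT, "--checkpoint_path", str(base_ckpt)],
        ["bash", SR_SCRIPT, "--checkpoint_path", str(sr_ckpt)],
    ]


def run_tests(base_ckpt, sr_ckpt, dry_run=True):
    """Print each command; unless dry_run, execute it and raise on a non-zero exit code."""
    commands = build_test_commands(base_ckpt, sr_ckpt)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Passing argument lists rather than a single shell string avoids quoting problems when checkpoint paths contain spaces.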

Pretrained Models

Checkpoints for the GCDP models are available here.

  • password: GCDP

Citation

@inproceedings{park2023learning,
  title={Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis},
  author={Park, Minho and Yun, Jooyeol and Choi, Seunghwan and Choo, Jaegul},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7591--7600},
  year={2023}
}

Acknowledgements

This repository is based on imagen-pytorch by lucidrains and Multinomial Diffusion by Hoogeboom et al.
