jackbonadies / structured-diffusion-guidance

This project forked from weixi-feng/structured-diffusion-guidance

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

License: Other

Python 32.39% Jupyter Notebook 67.61%

Structured Diffusion Guidance (ICLR 2023)

We propose a method to fuse language structures into diffusion guidance for compositional text-to-image generation.

Project Page | Paper | Google Colab (coming soon)

This is the official codebase for Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng¹, Xuehai He², Tsu-Jui Fu¹, Varun Jampani³, Arjun Akula³, Pradyumna Narayana³, Sugato Basu³, Xin Eric Wang², William Yang Wang¹
¹UCSB, ²UCSC, ³Google

Setup

Clone this repository and then create a conda environment with:

conda env create -f environment.yaml
conda activate structure_diffusion

If you already have a Stable Diffusion environment, you can instead install the extra dependencies with:

pip install stanza nltk scenegraphparser tqdm matplotlib
pip install -e .
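
To confirm that the extra dependencies are visible to the active environment, a quick check along these lines may help. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import name `sng_parser` for the `scenegraphparser` package is an assumption that should be verified against your installation.

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# Assumption: the scenegraphparser package is imported as "sng_parser".
REQUIRED_MODULES = ["stanza", "nltk", "sng_parser", "tqdm", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every module can be located, and otherwise lists the names that still need to be installed.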

Inference

This repository currently supports only Stable Diffusion v1.4. Please refer to the official stable-diffusion repository to download the pre-trained model and place it under models/ldm/stable-diffusion-v1/. Our method is training-free and can be applied directly to the trained Stable Diffusion checkpoint.
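
Before running inference, it can be useful to check that a checkpoint is actually present in the expected directory. The sketch below is illustrative only: `find_checkpoints` is a hypothetical helper, and the `.ckpt` file extension is an assumption based on how Stable Diffusion v1 checkpoints are commonly distributed.

```python
from pathlib import Path


def find_checkpoints(model_dir="models/ldm/stable-diffusion-v1"):
    """Return the .ckpt files found directly under model_dir, sorted by path."""
    directory = Path(model_dir)
    if not directory.is_dir():
        # The directory has not been created yet, so no checkpoint is available.
        return []
    return sorted(p for p in directory.iterdir() if p.suffix == ".ckpt")
```

An empty result means the model still needs to be downloaded or linked into that directory.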

To generate an image, run

python scripts/txt2img_demo.py --prompt "A red teddy bear in a christmas hat sitting next to a glass" --plms --parser_type constituency

By default, the guidance scale is 7.5 and the output image size is 512x512. For now, only PLMS sampling with a batch size of 1 is supported. In addition to the default Stable Diffusion arguments, we add --parser_type and --conjunction.

usage: txt2img_demo.py [-h] [--prompt [PROMPT]] ...
                       [--parser_type {constituency,scene_graph}] [--conjunction] [--save_attn_maps]

optional arguments:
    ...
  --parser_type {constituency,scene_graph}
  --conjunction         If True, the input prompt is a conjunction of two concepts like "A and B"
  --save_attn_maps      If True, the attention maps will be saved as a .pth file with the name same as the image
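
Because the batch size is limited to 1, generating images for several prompts means invoking the script once per prompt. The following sketch assembles the argument list for one invocation using only the flags shown above. The helper `build_command` is hypothetical and not part of the repository, and it assumes the script is launched as `python scripts/txt2img_demo.py` from the repository root.

```python
VALID_PARSERS = ("constituency", "scene_graph")


def build_command(prompt, parser_type="constituency",
                  conjunction=False, save_attn_maps=False):
    """Build the argument list for a single txt2img_demo.py run."""
    if parser_type not in VALID_PARSERS:
        raise ValueError(f"parser_type must be one of {VALID_PARSERS}")
    # --plms is always passed because only PLMS sampling is supported.
    cmd = ["python", "scripts/txt2img_demo.py",
           "--prompt", prompt, "--plms", "--parser_type", parser_type]
    if conjunction:
        cmd.append("--conjunction")
    if save_attn_maps:
        cmd.append("--save_attn_maps")
    return cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, looping over prompts one at a time.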

Without the --conjunction flag, the model applies one key and multiple values in each cross-attention layer. For concept conjunction prompts, you can run:

python scripts/txt2img_demo.py --prompt "A red car and a white sheep" --plms --parser_type constituency --conjunction
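
The "one key, multiple values" behavior of the default (non-conjunction) mode can be illustrated with a small, dependency-free sketch. It assumes that a single attention map is computed from the queries and one key matrix, applied to each value matrix in turn, and that the results are averaged. The averaging step, the function names, and the plain-list matrices are assumptions made for illustration and do not reproduce the repository's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def attention_map(queries, key):
    """Scaled dot-product attention weights from queries and ONE key matrix."""
    scale = 1.0 / math.sqrt(len(key[0]))
    key_t = [list(col) for col in zip(*key)]  # transpose: (dim, tokens)
    scores = matmul(queries, key_t)
    return [softmax([s * scale for s in row]) for row in scores]


def one_key_multi_value(queries, key, values):
    """Apply one shared attention map to several value matrices and average."""
    attn = attention_map(queries, key)
    outputs = [matmul(attn, value) for value in values]
    averaged = []
    for rows in zip(*outputs):  # the same query row taken from every output
        averaged.append([sum(col) / len(outputs) for col in zip(*rows)])
    return averaged
```

In this sketch the attention map depends only on the single key, so swapping in different value matrices changes what is mixed into the output but not where each query attends.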

Overall, compositional prompts remain a challenge for Stable Diffusion v1.4. It may still take several attempts to get a correct image with our method. The improvement is system-level rather than sample-level, and we are still looking for good evaluation metrics for compositional T2I synthesis. We observe fewer missing objects with Stable Diffusion v2, and we are implementing our method on top of it as well. Please feel free to reach out for a discussion.

Comments

Our codebase builds heavily on Stable Diffusion. Thanks for open-sourcing!

Citing our Paper

If you find our code or paper useful for your research, please consider citing (Coming soon)

@article{feng2022training,
  title={Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis},
  author={Feng, Weixi and He, Xuehai and Fu, Tsu-Jui and Jampani, Varun and Akula, Arjun and Narayana, Pradyumna and Basu, Sugato and Wang, Xin Eric and Wang, William Yang},
  journal={ICLR},
  year={2023}
}
