Latent Diffusion Model (LDM) for Layout-to-image generation


This is an unofficial repository of LDM for layout-to-image generation. The config and code for this task in the official LDM repo are currently incomplete, so this repo aims to reproduce LDM on the layout-to-image generation task. If you find it useful, please cite the original LDM paper.


Machine environment

  • Ubuntu version: 18.04.5 LTS
  • CUDA version: 11.6
  • Testing GPU: Nvidia Tesla V100

Requirements

A conda environment named ldm_layout can be created and activated with:

conda env create -f environment.yaml
conda activate ldm_layout
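
Optionally, a quick sanity check (our suggestion, not part of the repo) confirms that PyTorch was installed with working CUDA support:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"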

Datasets setup

We provide two approaches to set up the datasets:

Auto-download

To automatically download the datasets and save them into the default path (../), please use the following script:

bash tools/download_datasets.sh

Manual setup

Text-to-image generation

  • We use the COCO 2014 splits for the text-to-image task; they can be downloaded from the official COCO website.

  • Please create a folder named 2014 and organize the downloaded data and annotations as follows (a download sketch is given after the structure).

    COCO 2014 file structure
    >2014
    ├── annotations
    │   ├── captions_val2014.json
    │   └── ...
    └── val2014
        ├── COCO_val2014_000000000073.jpg
        └── ...
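
A minimal download sketch for this split, assuming the standard val2014 archives from the official COCO website (verify the URLs before use):

mkdir -p 2014 && cd 2014
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip val2014.zip && unzip annotations_trainval2014.zip
cd ..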

Layout-to-image generation

  • We use the COCO 2017 splits to test LDM on the layout-to-image task; they can be downloaded from the official COCO website.

  • Please create a folder named 2017 and organize the downloaded data and annotations as follows (a download sketch is given after the structure).

    COCO 2017 file structure
    >2017
    ├── annotations
    │   ├── captions_val2017.json
    │   └── ...
    └── val2017
        ├── 000000000872.jpg
        └── ...
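
A minimal download sketch for this split, assuming the standard val2017 archives from the official COCO website (verify the URLs before use):

mkdir -p 2017 && cd 2017
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip && unzip annotations_trainval2017.zip
cd ..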

File structure for dataset and code

Please make sure the file structure matches the following; otherwise, modify the config files to point to the corresponding paths.

File structure
>datasets
└── coco
    ├── 2014
    │   ├── annotations
    │   ├── val2014
    │   └── ...
    └── 2017
        ├── annotations
        ├── val2017
        └── ...
>ldm_layout
├── configs
│   ├── ldm
│   └── ...
├── exp
│   └── ...
├── ldm
├── taming
├── scripts
├── tools
└── ...
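
A minimal sketch for arranging the downloaded splits, assuming the 2014 and 2017 folders from the steps above sit in the parent directory of ldm_layout (the default path ../):

mkdir -p datasets/coco
mv 2014 2017 datasets/coco/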

VQGAN models setup

We provide a script to download the VQGAN-f8 model released with the official LDM GitHub repo.

To automatically download VQGAN-f8 and save it into the default path (exp/), please use the following script:

bash tools/download_models.sh
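
If the script is unavailable, the checkpoint can be fetched manually. The URL below is the vq-f8 archive referenced by the official LDM repo's download scripts, and the destination folder is our assumption; check tools/download_models.sh for the exact layout it expects:

mkdir -p exp
wget -O exp/vq-f8.zip https://ommer-lab.com/files/latent-diffusion/vq-f8.zip
unzip exp/vq-f8.zip -d exp/vq-f8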

Train LDM for layout-to-image generation

We provide scripts for training LDM on both text-to-image and layout-to-image generation.

Once the datasets are properly set up, LDM can be trained with the following commands.

Text-to-image

bash tools/ldm/train_ldm_coco_T2I.sh
  • Default output folder will be exp/ldm/T2I

Layout-to-image

bash tools/ldm/train_ldm_coco_Layout2I.sh
  • Default output folder will be exp/ldm/Layout2I

Multi-GPU training

Change "--gpus" to identify the number of GPUs for training.

For example, to train with 4 GPUs:

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t True --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True
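
For single-GPU training, the same command should work with one device index. Note the trailing comma: PyTorch Lightning treats "0," as a list of GPU indices, whereas a bare "0" would mean zero GPUs (a sketch; untested on this repo):

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t True --gpus 0, -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True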

Inference

Change "-t" to identify training or testing phase. (Note that multi-gpu testing is supported.)

For example, to test with 4 GPUs:

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t False --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True

Acknowledgement

We build the LDM_layout codebase heavily on the codebases of Latent Diffusion Models (LDM) and VQGAN. We sincerely thank the authors for open-sourcing their work!

Citation

If you find this code useful for your research, please consider citing:

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{blattmann2022retrievalaugmented,
      title={Retrieval-Augmented Diffusion Models},
      author={Andreas Blattmann and Robin Rombach and Kaan Oktay and Björn Ommer},
      year={2022},
      eprint={2204.11824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
