This is an unofficial repository of LDM for layout-to-image generation. The config and code for this task in the official LDM repo are currently incomplete, so this repo aims to reproduce LDM on the layout-to-image generation task. If you find it useful, please cite the original LDM paper.
- Ubuntu version: 18.04.5 LTS
- CUDA version: 11.6
- Testing GPU: Nvidia Tesla V100
A conda environment named ldm_layout can be created and activated with:
conda env create -f environment.yaml
conda activate ldm_layout
We provide two approaches to set up the datasets:
To automatically download the datasets and save them into the default path (../), please use the following script:
bash tools/download_datasets.sh
- We use the COCO 2014 splits for the text-to-image task, which can be downloaded from the official COCO website (a Python sketch for manual download follows this list).
- Please create a folder named 2014 and collect the downloaded data and annotations as follows.

COCO 2014 file structure
```
2014
├── annotations
│   ├── captions_val2014.json
│   └── ...
└── val2014
    ├── COCO_val2014_000000000073.jpg
    └── ...
```
- We use the COCO 2017 splits to test LDM on the layout-to-image task, which can be downloaded from the official COCO website.
- Please create a folder named 2017 and collect the downloaded data and annotations as follows.

COCO 2017 file structure
```
2017
├── annotations
│   ├── captions_val2017.json
│   └── ...
└── val2017
    ├── 000000000872.jpg
    └── ...
```
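For the manual approach, the following is a minimal Python sketch (not part of the repo's tools) that fetches the val splits from the official COCO download URLs and unpacks them into the assumed default root ../datasets/coco/. The target path is an assumption based on the file structure below; adjust it if your layout differs.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE = Path("../datasets/coco")  # assumed default dataset root (see file structure below)
URLS = {
    "2014": [
        "http://images.cocodataset.org/zips/val2014.zip",
        "http://images.cocodataset.org/annotations/annotations_trainval2014.zip",
    ],
    "2017": [
        "http://images.cocodataset.org/zips/val2017.zip",
        "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
    ],
}

for year, urls in URLS.items():
    target = BASE / year
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        zip_path = target / url.rsplit("/", 1)[-1]
        print(f"downloading {url} -> {zip_path}")
        urllib.request.urlretrieve(url, zip_path)
        # each archive unpacks into annotations/ or valXXXX/ inside the year folder
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
```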
Please make sure that the file structure is the same as the following (a small path check is sketched after the tree). Otherwise, you may need to modify the config files to match the corresponding paths.
File structure
```
datasets
└── coco
    ├── 2014
    │   ├── annotations
    │   ├── val2014
    │   └── ...
    └── 2017
        ├── annotations
        ├── val2017
        └── ...
ldm_layout
├── configs
│   ├── ldm
│   └── ...
├── exp
│   └── ...
├── ldm
├── taming
├── scripts
├── tools
└── ...
```
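As a quick sanity check, the sketch below verifies that the expected dataset paths exist before launching training or testing. It is not part of the repo; the relative paths are assumptions taken from the tree above and it is assumed to be run from inside ldm_layout/.

```python
from pathlib import Path

# Paths assumed from the file structure above; run from inside ldm_layout/.
expected = [
    "../datasets/coco/2014/annotations/captions_val2014.json",
    "../datasets/coco/2014/val2014",
    "../datasets/coco/2017/annotations/captions_val2017.json",
    "../datasets/coco/2017/val2017",
]

for p in expected:
    print(("ok      " if Path(p).exists() else "MISSING ") + p)
```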
We provide a script to download the VQGAN-f8 checkpoint released in the LDM GitHub repo.
To automatically download VQGAN-f8 and save it into the default path (exp/), please use the following script:
bash tools/download_models.sh
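For reference, the following is a minimal sketch of how an LDM-style config typically instantiates the first-stage VQGAN-f8. It assumes the usual LDM config layout (model.params.first_stage_config) and that the weights were saved under exp/ by the script above; the exact checkpoint file name is an assumption, so adjust the paths to your setup.

```python
# Minimal sketch, assuming the usual LDM config layout (model.params.first_stage_config)
# and that the VQGAN-f8 weights were saved under exp/ by download_models.sh.
# Paths below are assumptions; adjust them to your actual setup.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml")
first_stage_cfg = config.model.params.first_stage_config
vqgan = instantiate_from_config(first_stage_cfg)

# If ckpt_path is not already set inside the config, load the weights manually
# (the exact file name under exp/ is an assumption):
sd = torch.load("exp/vqgan/model.ckpt", map_location="cpu")["state_dict"]
vqgan.load_state_dict(sd, strict=False)
vqgan.eval()
```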
We now provide scripts for training LDM on the text-to-image and layout-to-image tasks.
Once the datasets are properly set up, LDM can be trained with the following commands.
bash tools/ldm/train_ldm_coco_T2I.sh
- Default output folder will be exp/ldm/T2I
bash tools/ldm/train_ldm_coco_Layout2I.sh
- Default output folder will be exp/ldm/Layout2I
Change "--gpus" to identify the number of GPUs for training.
For example, using 4 gpus
python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
-t True --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
-n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True
Change "-t" to identify training or testing phase. (Note that multi-gpu testing is supported.)
For example, using 4 gpus for testing
python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
-t False --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
-n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True
The LDM_layout codebase is built heavily on the codebases of the Latent Diffusion Model (LDM) and VQGAN. We sincerely thank the authors for open-sourcing their work!
If you find this code useful for your research, please consider citing:
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{https://doi.org/10.48550/arxiv.2204.11824,
doi = {10.48550/ARXIV.2204.11824},
url = {https://arxiv.org/abs/2204.11824},
author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Retrieval-Augmented Diffusion Models},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}