The u-vit from zizhao-hu

U-ViT
Official PyTorch implementation of xxxx

Dependency

pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116  # install torch-1.13.1
pip install accelerate==0.12.0 absl-py ml_collections einops wand ftfy==6.1.1 transformers==4.23.1

pip install -U xformers
pip install -U --pre triton

Preparation Before Training and Evaluation

Autoencoder

Download the stable-diffusion directory from this link (it contains image autoencoders converted from Stable Diffusion). Place the downloaded directory at assets/stable-diffusion in this codebase. The autoencoders are used in latent diffusion models.

Data

MS-COCO: Download the COCO 2014 training data, validation data, and annotations, then extract features by running the following scripts:

python scripts/extract_mscoco_feature.py
python scripts/extract_mscoco_feature.py --split=val
python scripts/extract_test_prompt_feature.py
python scripts/extract_empty_feature.py
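The four extraction steps above can also be chained from a single Python driver. This is a minimal sketch, assuming the scripts are run from the repository root and need no arguments beyond those shown. The names `EXTRACTION_STEPS` and `run_extraction` are hypothetical helpers, not part of this codebase.

```python
import subprocess
import sys

# Script invocations copied from the commands above, in the same order.
EXTRACTION_STEPS = [
    ["scripts/extract_mscoco_feature.py"],
    ["scripts/extract_mscoco_feature.py", "--split=val"],
    ["scripts/extract_test_prompt_feature.py"],
    ["scripts/extract_empty_feature.py"],
]


def run_extraction(dry_run=False):
    """Run each extraction step in order, stopping at the first failure.

    With dry_run=True the commands are only collected, not executed.
    """
    commands = [[sys.executable, *step] for step in EXTRACTION_STEPS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_extraction(dry_run=True)` returns the command lists without running anything, which can help confirm the paths before starting a long extraction.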

Reference statistics for FID

Download the fid_stats directory from this link (it contains reference statistics for FID). Place the downloaded directory at assets/fid_stats in this codebase. In addition to evaluation, these reference statistics are used to monitor FID during training.
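Before training, it may help to confirm that the downloaded directories are where the code expects them. The following is a small sketch, assuming it is run from the repository root. `REQUIRED_ASSET_DIRS` and `missing_assets` are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# Directories named in the preparation steps above.
REQUIRED_ASSET_DIRS = ("assets/stable-diffusion", "assets/fid_stats")


def missing_assets(root="."):
    """Return the required asset directories that do not exist under root."""
    root = Path(root)
    return [d for d in REQUIRED_ASSET_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing asset directories:", ", ".join(missing))
    else:
        print("All asset directories are in place.")
```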

Configs

In the config files:

config.nnet = d(
    name='uvit_t2i',
    ...,
    c=c,
    v=v,
    ...
)
# Change c and v to set the caption and image transformer depths.
# Set name to 'uvit_t2i_old', 'uvit_t2i_cross', or 'uvit_t2i' for the original U-ViT-small, cross-attention, and self-attention models, respectively.
# name='uvit_t2i' with c=0, v=0 is equivalent to U-ViT-small, but cannot load the pretrained weights provided with the U-ViT paper.
# name='uvit_t2i_old' ignores the c and v values.
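The relationship between `name`, `c`, and `v` described in the comments above can be illustrated with a small helper. This is a sketch only: `make_nnet_config` is a hypothetical function, a plain `dict` stands in for whatever `d(...)` returns in the real config files, and dropping `c` and `v` for `'uvit_t2i_old'` is just one way to mirror the note that this variant ignores them.

```python
# Model variants listed in the config comments above.
VARIANTS = {
    "uvit_t2i_old": "original U-ViT-small",
    "uvit_t2i_cross": "cross-attention model",
    "uvit_t2i": "self-attention model",
}


def make_nnet_config(name, c=0, v=0):
    """Build an illustrative nnet config dict (hypothetical helper)."""
    if name not in VARIANTS:
        raise ValueError(
            f"unknown model name {name!r}; expected one of {sorted(VARIANTS)}"
        )
    if c < 0 or v < 0:
        raise ValueError("c and v must be non-negative depths")
    config = {"name": name}
    if name != "uvit_t2i_old":
        # 'uvit_t2i_old' ignores c and v, so they are set only for the other variants.
        config["c"] = c  # caption transformer depth
        config["v"] = v  # image transformer depth
    return config
```

For example, `make_nnet_config("uvit_t2i_cross", c=2, v=2)` keeps both depths, while `make_nnet_config("uvit_t2i_old", c=2, v=2)` returns only the name.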

Training

We use the Hugging Face accelerate library for distributed data parallel and mixed-precision training. The training command is:

# MS-COCO (U-ViT-S/2)
accelerate launch --num_processes 1 --mixed_precision fp16 train_t2i_discrete.py --config=configs/mscoco_uvit_small.py
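When launching runs from a script, the command above can be assembled programmatically. The sketch below uses only the flags that appear in this README. `accelerate_command` is a hypothetical helper, and the assumption that every extra script option takes the `--key=value` form is taken from the commands shown here, not from the scripts themselves.

```python
import shlex


def accelerate_command(script, config, num_processes=1,
                       mixed_precision="fp16", **script_args):
    """Assemble an `accelerate launch` command as a list of arguments."""
    cmd = [
        "accelerate", "launch",
        "--num_processes", str(num_processes),
        "--mixed_precision", mixed_precision,
        script,
        f"--config={config}",
    ]
    # Extra keyword arguments become --key=value flags for the script itself.
    cmd += [f"--{key}={value}" for key, value in script_args.items()]
    return cmd


train_cmd = accelerate_command("train_t2i_discrete.py",
                               "configs/mscoco_uvit_small.py")
# Print the command as a single shell-style string for inspection.
print(shlex.join(train_cmd))
```

The resulting list can be passed to `subprocess.run(train_cmd, check=True)` to start the run. Raising `num_processes` is the usual way to use more than one GPU.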

Sampling

# Running this command stores the images generated from the prompt file test.txt at --nnet_path
accelerate launch --num_processes 1 --mixed_precision fp16 sample_t2i_discrete.py --config=configs/mscoco_uvit_small.py --nnet_path=nnet.pth --input_path=test.txt
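The prompt file passed via `--input_path` can be generated from a list of captions. The sketch below assumes the file holds one prompt per line. That format is an assumption, not something this README states, so check it against `sample_t2i_discrete.py` before relying on it. `write_prompt_file` is a hypothetical helper.

```python
from pathlib import Path


def write_prompt_file(prompts, path="test.txt"):
    """Write prompts to a text file, one per line (assumed format).

    Blank prompts are skipped and internal whitespace is collapsed so that
    each prompt stays on a single line. Returns the number of prompts written.
    """
    cleaned = [" ".join(p.split()) for p in prompts if p.strip()]
    text = "".join(line + "\n" for line in cleaned)
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)
```

For example, `write_prompt_file(["a green train on the tracks", "two dogs playing in the snow"])` writes a two-line `test.txt` in the current directory.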

Evaluation (MS-COCO (U-ViT-S/2))

# FID
accelerate launch --multi_gpu --num_processes 1 --mixed_precision fp16 eval_t2i_discrete.py --config=configs/mscoco_uvit_small.py --nnet_path=nnet.pth

# CLIP Score
# The JSON file passed as the first argument contains 30000 test captions; it is produced by running 'python scripts/extract_mscoco_feature.py --split=val'
python tools/clipscore.py assets/datasets/coco256_features/val/eval_captions/captions.json workdir/*/*/ckpts/*.ckpt/eval_samples/
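The shell expands the wildcards in the sample path before `tools/clipscore.py` runs, so with several runs or checkpoints under `workdir` the command receives more than one directory. The sketch below lists what the pattern matches so the expansion can be checked first. `find_sample_dirs` is a hypothetical helper, and whether `clipscore.py` accepts more than one directory is not stated in this README.

```python
import glob

# Pattern taken from the CLIP score command above.
SAMPLES_PATTERN = "workdir/*/*/ckpts/*.ckpt/eval_samples/"


def find_sample_dirs(pattern=SAMPLES_PATTERN):
    """Return the sorted directories that the sample pattern expands to.

    The trailing slash in the pattern restricts matches to directories.
    """
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for sample_dir in find_sample_dirs():
        print(sample_dir)
```

If more than one directory is printed, passing a single explicit path to the CLIP score command avoids ambiguity.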

References

This implementation is based on:

Extended Analytic-DPM (provides the FID reference statistics on CIFAR10 and CelebA 64x64)
guided-diffusion (provides the FID reference statistics on ImageNet)
pytorch-fid (provides a PyTorch port of the official FID implementation)
dpm-solver (provides the sampler)
