
zizhao-hu / u-vit


This project forked from baofff/u-vit


A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

License: MIT License



U-ViT
Official PyTorch implementation of xxxx


Dependency

pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116  # install torch-1.13.1
pip install accelerate==0.12.0 absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1

pip install -U xformers
pip install -U --pre triton
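To confirm that the pinned packages above actually ended up in your environment, a small check like the following can help. It is a sketch using only the standard library's importlib.metadata; the helper name check_pins is hypothetical, and torch is left out because wheels from the cu116 index report versions with a CUDA suffix (for example 1.13.1+cu116).

```python
from importlib import metadata

# Exact versions pinned in the pip command above (torch omitted on purpose).
PINNED = {
    "accelerate": "0.12.0",
    "ftfy": "6.1.1",
    "transformers": "4.23.1",
}


def check_pins(pins, lookup=metadata.version):
    """Return {package: problem} for packages that are missing or mis-versioned."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("all pinned packages match")
```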

Preparation Before Training and Evaluation

Autoencoder

Download the stable-diffusion directory from this link (it contains image autoencoders converted from Stable Diffusion) and place it at assets/stable-diffusion in this codebase. These autoencoders are used by the latent diffusion models.
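For intuition about what the autoencoder does to the input size, the arithmetic below counts the tokens one MS-COCO sample becomes. It is a sketch under stated assumptions, not code from this repository: 256×256 images (matching the coco256 feature directory used in evaluation), the Stable Diffusion autoencoder's 8× downsampling into a 4-channel latent, patch size 2 for U-ViT-S/2, 77 CLIP text tokens, and one timestep token.

```python
def uvit_token_count(image_size=256, downsample=8, latent_channels=4,
                     patch_size=2, text_tokens=77, time_tokens=1):
    """Count the tokens a U-ViT-style backbone sees for one latent-diffusion sample.

    All defaults are assumptions (see the lead-in), not values read from the configs.
    """
    if image_size % downsample:
        raise ValueError("image_size must be divisible by the autoencoder downsampling factor")
    latent_size = image_size // downsample          # 256 // 8 = 32
    if latent_size % patch_size:
        raise ValueError("latent size must be divisible by the patch size")
    patches_per_side = latent_size // patch_size    # 32 // 2 = 16
    image_tokens = patches_per_side ** 2            # 16 * 16 = 256
    return {
        "latent_shape": (latent_channels, latent_size, latent_size),
        "image_tokens": image_tokens,
        # Time, text, and image patches are all treated as tokens.
        "total_tokens": image_tokens + text_tokens + time_tokens,
    }


if __name__ == "__main__":
    print(uvit_token_count())
```

Under these assumptions, uvit_token_count() reports a (4, 32, 32) latent, 256 image tokens, and 334 tokens in total.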

Data

python scripts/extract_mscoco_feature.py
python scripts/extract_mscoco_feature.py --split=val
python scripts/extract_test_prompt_feature.py
python scripts/extract_empty_feature.py
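If you prefer to run the four extraction steps from one place, a sketch like this executes them in the listed order from the repository root and stops at the first failing step. The helper names are hypothetical; it only wraps the commands above and adds no options of its own.

```python
import subprocess
import sys

# The feature-extraction commands listed above, in the same order.
EXTRACTION_STEPS = [
    ["scripts/extract_mscoco_feature.py"],
    ["scripts/extract_mscoco_feature.py", "--split=val"],
    ["scripts/extract_test_prompt_feature.py"],
    ["scripts/extract_empty_feature.py"],
]


def extraction_commands(python=sys.executable):
    """Prefix each step with the Python interpreter to use."""
    return [[python, *step] for step in EXTRACTION_STEPS]


def run_extraction(dry_run=True):
    """Print each command; execute it unless dry_run is True."""
    for cmd in extraction_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first non-zero exit


if __name__ == "__main__":
    run_extraction(dry_run=True)
```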

Reference statistics for FID

Download the fid_stats directory from this link (it contains reference statistics for FID) and place it at assets/fid_stats in this codebase. Besides evaluation, these reference statistics are used to monitor FID during training.
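The reference statistics are the mean and covariance of Inception features, and FID compares them with the same statistics computed on generated samples. The sketch below computes the Fréchet distance with NumPy only; it assumes each statistics file is an .npz archive with mu and sigma arrays, which you should check against the files in assets/fid_stats.

```python
import numpy as np


def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2)).
    Tr((sigma1 sigma2)^(1/2)) is the sum of square roots of the eigenvalues of
    sigma1 @ sigma2; for covariance matrices these are real and non-negative,
    so small negative or imaginary parts are treated as numerical noise.
    """
    mu1, mu2 = np.atleast_1d(mu1), np.atleast_1d(mu2)
    sigma1, sigma2 = np.atleast_2d(sigma1), np.atleast_2d(sigma2)
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_sqrt)


def load_stats(path):
    """Assumption: the file is an .npz archive with "mu" and "sigma" entries."""
    with np.load(path) as f:
        return f["mu"], f["sigma"]
```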

Configs

In the config files, set the network as follows:

config.nnet = d(
    name='uvit_t2i',
    # ... (other network fields unchanged)
    c=c,
    v=v,
    # ...
)
# c and v set the depths of the caption and image transformers.
# name selects the model: 'uvit_t2i_old' (original U-ViT-small), 'uvit_t2i_cross' (cross-attention), or 'uvit_t2i' (self-attention).
# name='uvit_t2i' with c=0, v=0 is equivalent to U-ViT-small, but it cannot load the pretrained weights released with the U-ViT paper.
# name='uvit_t2i_old' ignores the c and v values.
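The naming rules in the comments above can be summarized as a small validation helper. This is a hypothetical sketch (make_nnet_fields is not part of the codebase) that returns plain dict fields you would copy into config.nnet; it does not build the ml_collections config itself, and the non-negativity check is the helper's own assumption.

```python
# Model variants named in the config comments.
VALID_NAMES = {
    "uvit_t2i_old": "original U-ViT-small",
    "uvit_t2i_cross": "cross-attention",
    "uvit_t2i": "self-attention",
}


def make_nnet_fields(name, c=0, v=0):
    """Hypothetical helper: validate name and attach depths only where they are used."""
    if name not in VALID_NAMES:
        raise ValueError(f"unknown nnet name {name!r}; expected one of {sorted(VALID_NAMES)}")
    if c < 0 or v < 0:
        raise ValueError("c and v are transformer depths and must be non-negative")
    fields = {"name": name}
    if name != "uvit_t2i_old":  # 'uvit_t2i_old' ignores c and v
        fields.update(c=c, v=v)
    return fields


if __name__ == "__main__":
    print(make_nnet_fields("uvit_t2i", c=0, v=0))
```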

Training

We use the Hugging Face accelerate library for distributed data-parallel training with mixed precision. The training command is:

# MS-COCO (U-ViT-S/2)
accelerate launch --num_processes 1 --mixed_precision fp16 train_t2i_discrete.py --config=configs/mscoco_uvit_small.py
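To scale the command to several GPUs, you can vary --num_processes. The hypothetical helper below rebuilds the accelerate launch commands in this README as argument lists and adds accelerate's --multi_gpu flag when more than one process is requested; treat the multi-process case as an untested assumption.

```python
import shlex


def accelerate_cmd(script, config, num_processes=1, mixed_precision="fp16", **script_flags):
    """Build an `accelerate launch` argument list in the style of this README."""
    cmd = ["accelerate", "launch"]
    if num_processes > 1:
        cmd.append("--multi_gpu")  # accelerate's flag for multi-GPU data parallel
    cmd += ["--num_processes", str(num_processes), "--mixed_precision", mixed_precision]
    cmd += [script, f"--config={config}"]
    # Extra keyword arguments become --key=value flags for the script itself.
    cmd += [f"--{key}={value}" for key, value in script_flags.items()]
    return cmd


if __name__ == "__main__":
    print(shlex.join(accelerate_cmd("train_t2i_discrete.py", "configs/mscoco_uvit_small.py")))
```

With the defaults it reproduces the training command above; calling it with sample_t2i_discrete.py, nnet_path="nnet.pth", and input_path="test.txt" reproduces the sampling command.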

Sampling

# Generates images from the prompts in test.txt and stores them at the path given by --nnet_path
accelerate launch --num_processes 1 --mixed_precision fp16 sample_t2i_discrete.py --config=configs/mscoco_uvit_small.py --nnet_path=nnet.pth --input_path=test.txt
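The prompt file format is not documented here; assuming one prompt per line, a sketch like this can prepare and re-read test.txt. The function names are hypothetical, so check sample_t2i_discrete.py for how --input_path is actually parsed.

```python
from pathlib import Path


def format_prompts(prompts):
    """Normalize prompts to single lines and drop empty ones."""
    cleaned = []
    for prompt in prompts:
        prompt = " ".join(prompt.split())  # collapse newlines and repeated spaces
        if prompt:
            cleaned.append(prompt)
    return cleaned


def write_prompt_file(prompts, path="test.txt"):
    """Write one prompt per line (assumed --input_path format)."""
    cleaned = format_prompts(prompts)
    Path(path).write_text("".join(p + "\n" for p in cleaned), encoding="utf-8")
    return cleaned


def read_prompt_file(path="test.txt"):
    """Read the prompts back, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```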

Evaluation on MS-COCO (U-ViT-S/2)

# FID
accelerate launch --multi_gpu --num_processes 1 --mixed_precision fp16 eval_t2i_discrete.py --config=configs/mscoco_uvit_small.py --nnet_path=nnet.pth

# CLIP Score
# captions.json (the first argument), which holds the 30000 test captions, is produced by running 'python scripts/extract_mscoco_feature.py --split=val'
python tools/clipscore.py assets/datasets/coco256_features/val/eval_captions/captions.json workdir/*/*/ckpts/*.ckpt/eval_samples/
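For reference, CLIPScore as defined by Hessel et al. (2021) is w · max(cos(E_image, E_caption), 0) with w = 2.5, averaged over image-caption pairs. Assuming tools/clipscore.py follows that definition, the NumPy sketch below reproduces only the final step from precomputed CLIP embeddings; it does not load CLIP or any images itself.

```python
import numpy as np


def clip_score(image_emb, caption_emb, w=2.5):
    """CLIP-S = w * max(cos(image, caption), 0), per pair and averaged.

    image_emb and caption_emb are (N, D) arrays of CLIP embeddings whose rows
    are matched pairs (row i of each belongs to the same sample).
    """
    img = np.asarray(image_emb, dtype=np.float64)
    cap = np.asarray(caption_emb, dtype=np.float64)
    img = img / np.linalg.norm(img, axis=-1, keepdims=True)
    cap = cap / np.linalg.norm(cap, axis=-1, keepdims=True)
    cos = np.sum(img * cap, axis=-1)            # cosine similarity per pair
    per_pair = w * np.clip(cos, 0, None)        # negative similarities count as 0
    return float(per_pair.mean()), per_pair
```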

References

This implementation is based on

Contributors: baofff, zizhao-hu, zhenxuan00