SVD Xtend

Stable Video Diffusion Training Code and Extensions 🚀

💡 Highlight

  • Finetuning SVD. See Part 1.
  • Tracklet-Conditioned Video Generation. Building upon SVD, you can control the movement of objects using tracklets (sequences of bounding boxes). See Part 2.

Part 1: Training

Comparison

size=(512, 320), motion_bucket_id=127, fps=7, noise_aug_strength=0.00
generator=torch.manual_seed(111)
[Comparison figure: four examples, each showing the init image, the result before fine-tuning, and the result after fine-tuning.]
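The settings listed above correspond to call arguments of the Diffusers `StableVideoDiffusionPipeline`. The following is a minimal sketch of how they could be passed, not the repository's own evaluation script. The helper name `generate`, the `model_path` argument, and the output file name are assumptions for illustration, and `size=(512, 320)` is read as (width, height) to match the training flags.

```python
# Sampling settings from the comparison; size=(512, 320) is taken as (width, height).
COMPARISON_SETTINGS = {
    "width": 512,
    "height": 320,
    "motion_bucket_id": 127,
    "fps": 7,
    "noise_aug_strength": 0.0,
}
SEED = 111


def generate(image_path, model_path, output_path="generated.mp4"):
    """Run image-to-video sampling with the settings above.

    `model_path` is an assumption: point it at base SVD weights or at a
    fine-tuned checkpoint saved in Diffusers format.
    """
    # Heavy dependencies are imported here so the settings can be inspected
    # without torch or diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to("cuda")

    # Resize the conditioning image to the sampling resolution.
    image = load_image(image_path).resize(
        (COMPARISON_SETTINGS["width"], COMPARISON_SETTINGS["height"])
    )
    frames = pipe(
        image,
        generator=torch.manual_seed(SEED),
        **COMPARISON_SETTINGS,
    ).frames[0]
    export_to_video(frames, output_path, fps=COMPARISON_SETTINGS["fps"])
    return output_path
```

Calling `generate` once with the original weights and once with a fine-tuned checkpoint, keeping the same seed and init image, gives a before/after pair of the kind compared here.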

Video Data Processing

BDD100K is a driving video/image dataset, but it is not required for training; any video data can be used. Refer to the data-reading logic in DummyDataset. In short, you only need to modify self.base_folder and arrange your videos in the following file structure:

self.base_folder
    ├── video_name1
    │   ├── video_frame1
    │   ├── video_frame2
    │   ...
    ├── video_name2
    │   ├── video_frame1
    │   ├── ...

Training Configuration (on the BDD100K dataset)

This training configuration is for reference only. I set all parameters of the UNet to be trainable during training and used a learning rate of 1e-5.

accelerate launch train_svd.py \
    --pretrained_model_name_or_path=/path/to/weight \
    --per_gpu_batch_size=1 --gradient_accumulation_steps=1 \
    --max_train_steps=50000 \
    --width=512 \
    --height=320 \
    --checkpointing_steps=1000 --checkpoints_total_limit=1 \
    --learning_rate=1e-5 --lr_warmup_steps=0 \
    --seed=123 \
    --mixed_precision="fp16" \
    --validation_steps=200
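To make the flag values above concrete, the sketch below works out what they imply for a run. It assumes the flags follow the conventions of the Diffusers example training scripts (steps are counted in optimizer updates, a checkpoint is written every `checkpointing_steps`, and validation runs every `validation_steps`). The function name `schedule_summary` and the `num_gpus` parameter are hypothetical.

```python
def schedule_summary(max_train_steps, checkpointing_steps, validation_steps,
                     per_gpu_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Derive run-level quantities from the training flags."""
    effective_batch = per_gpu_batch_size * gradient_accumulation_steps * num_gpus
    return {
        # Samples consumed per optimizer update across all GPUs.
        "effective_batch_size": effective_batch,
        # Total samples drawn over the whole run.
        "samples_seen": effective_batch * max_train_steps,
        # Number of times a checkpoint is written.
        "checkpoints_written": max_train_steps // checkpointing_steps,
        # Number of validation passes.
        "validation_runs": max_train_steps // validation_steps,
    }


# Values from the command above, assuming a single GPU.
summary = schedule_summary(
    max_train_steps=50000,
    checkpointing_steps=1000,
    validation_steps=200,
    per_gpu_batch_size=1,
    gradient_accumulation_steps=1,
    num_gpus=1,
)
```

Under these assumptions, `--checkpoints_total_limit=1` means that only the most recent checkpoint remains on disk at any time, even though checkpoints are written repeatedly.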

Part 2: Tracklet2Video

Tracklet2Video

We have attempted to add layout control on top of img2video, which makes object motion more controllable, as shown in the comparison below. The code and weights will be updated soon. Note that we run SVD at a resolution of 512×320, so the quality of the generated videos appears poor, which is somewhat unfair to SVD. Our intention here is to demonstrate the effectiveness of tracklet control, and we will address the video quality as soon as possible.

[Comparison figure: two examples, each showing the init image, the video generated by SVD, and the video generated by our method.]
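Since the tracklet code has not been released yet, the following is only a hypothetical sketch of what a tracklet (one bounding box per frame for a single object) might look like as a data structure. The `Tracklet` class, the `normalize_tracklet` function, and the (x1, y1, x2, y2) pixel convention are all assumptions, not the project's actual format.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    """Boxes for one object instance: one (x1, y1, x2, y2) pixel box per frame."""
    instance_id: int
    boxes: list


def normalize_tracklet(tracklet, width=512, height=320):
    """Scale pixel boxes to [0, 1] so they no longer depend on the resolution."""
    return [
        (x1 / width, y1 / height, x2 / width, y2 / height)
        for x1, y1, x2, y2 in tracklet.boxes
    ]


# An object that moves toward the bottom-right between two frames.
car = Tracklet(instance_id=0, boxes=[(0, 0, 256, 160), (128, 80, 384, 240)])
normalized = normalize_tracklet(car)
```

Normalized coordinates are a common choice for box conditioning because the same tracklet can then be reused at a different generation resolution.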

Methods

We have utilized the Self-Tracking training from Boximator and the Instance-Enhancer from TrackDiffusion. For more details, please refer to the paper.

๐Ÿท๏ธ TODO List

  • Support text2video (WIP)
  • Support more conditional inputs, such as layout

โ™ฅ๏ธ Acknowledgement

Our model is related to Diffusers and Stability AI. Thanks for their great work!

Thanks to Boximator and GLIGEN for their awesome models.

โœ’๏ธ Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@article{li2023trackdiffusion,
  title={Trackdiffusion: Multi-object tracking data generation via diffusion models},
  author={Li, Pengxiang and Liu, Zhili and Chen, Kai and Hong, Lanqing and Zhuge, Yunzhi and Yeung, Dit-Yan and Lu, Huchuan and Jia, Xu},
  journal={arXiv preprint arXiv:2312.00651},
  year={2023}
}

Contributors

pixeli99, blakeone, kiteretsu77, ciarastrawberry
