This project forked from showlab/motiondirector

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

Home Page: https://showlab.github.io/MotionDirector/

License: Apache License 2.0

Python 100.00%

MotionDirector

This is the official repository of MotionDirector.

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Project Page arXiv


MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.

ToDo

  • Gradio Demo
  • More trained weights of MotionDirector

Setup

Requirements

# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt

Weights of Foundation Models

git lfs install
## You can choose the ModelScopeT2V or ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/

Weights of trained MotionDirector

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs
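Before training or inference, it can help to confirm that the clone commands above actually populated their target folders (a `git clone` without git-lfs can leave a folder present but incomplete). The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks that each folder exists and is non-empty. Only one foundation model is needed, so a missing entry for the model you did not clone is expected.

```python
from pathlib import Path

# Target folders used by the git clone commands above.
EXPECTED_DIRS = [
    "models/zeroscope_v2_576w",
    "models/model_scope",
    "outputs",
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected folders that are absent or empty under root.

    Hypothetical helper for illustration; it does not validate file contents.
    """
    missing = []
    for rel in expected:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dirs():
        print(f"missing or empty: {rel}")
```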

Usage

Training

Train MotionDirector on multiple videos:

python MotionDirector_train.py --config ./configs/config_multi_videos.yaml

Train MotionDirector on a single video:

python MotionDirector_train.py --config ./configs/config_single_video.yaml

Note:

  • Before running the above commands, replace the paths to the foundation model weights and the training data with your own in the config file you use: config_multi_videos.yaml or config_single_video.yaml.
  • Training on multiple 16-frame videos usually takes 300~500 steps, about 9~16 minutes on one A5000 GPU. Training on a single video takes 50~150 steps, about 1.5~4.5 minutes on one A5000 GPU. Training requires around 14GB of VRAM.
  • Reduce n_sample_frames if your GPU memory is limited.
  • Reduce the learning rate and increase the training steps for better performance.
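The timings quoted above imply a roughly constant cost per training step, which gives a quick way to budget time when you increase the number of steps. The sketch below only redoes that arithmetic; the function names are hypothetical, and the resulting figure of about 1.8 to 1.9 seconds per step applies to the A5000 numbers in the note, not to other GPUs or frame counts.

```python
def seconds_per_step(minutes, steps):
    """Average wall-clock seconds per training step."""
    return minutes * 60.0 / steps


def estimate_minutes(steps, sec_per_step=1.8):
    """Estimated training time in minutes for a given number of steps."""
    return steps * sec_per_step / 60.0


if __name__ == "__main__":
    # Figures quoted in the note: (minutes, steps)
    for minutes, steps in [(9, 300), (16, 500), (1.5, 50), (4.5, 150)]:
        rate = seconds_per_step(minutes, steps)
        print(f"{steps} steps in {minutes} min: {rate:.2f} s/step")
    # Doubling the multi-video upper bound to 1000 steps:
    print(f"1000 steps: about {estimate_minutes(1000):.0f} min")
```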

Inference

python MotionDirector_inference.py --model /path/to/the/foundation/model  --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.

Note:

  • Replace /path/to/the/foundation/model with your own path to the foundation model, like ZeroScope.
  • The value of checkpoint_index selects which checkpoint to load, identified by the training step at which it was saved.
  • The value of noise_prior controls how much the inversion noise of the reference video affects the generation. We recommend setting it to 0 for MotionDirector trained on multiple videos to get the most diverse generations, and to 0.1~0.5 for MotionDirector trained on a single video for faster convergence and better alignment with the reference video.
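The notes above can be folded into a small wrapper that assembles the inference command. This is a sketch under stated assumptions: the helper names are hypothetical, the flags are exactly those shown in the command above, and the single-video default of 0.3 is just an arbitrary pick from the recommended 0.1~0.5 range.

```python
import shlex


def recommended_noise_prior(num_training_videos, single_video_value=0.3):
    """Follow the note: 0 for multi-video training, 0.1~0.5 for a single video.

    single_video_value is an assumed default taken from within that range.
    """
    if num_training_videos < 1:
        raise ValueError("need at least one training video")
    return 0.0 if num_training_videos > 1 else single_video_value


def build_inference_command(model, prompt, checkpoint_folder,
                            checkpoint_index, noise_prior, seed=None):
    """Return the argument list for MotionDirector_inference.py."""
    args = [
        "python", "MotionDirector_inference.py",
        "--model", model,
        "--prompt", prompt,
        "--checkpoint_folder", checkpoint_folder,
        "--checkpoint_index", str(checkpoint_index),
        "--noise_prior", str(noise_prior),
    ]
    if seed is not None:
        args += ["--seed", str(seed)]
    return args


if __name__ == "__main__":
    cmd = build_inference_command(
        model="/path/to/the/foundation/model",
        prompt="Your prompt",
        checkpoint_folder="/path/to/the/trained/MotionDirector",
        checkpoint_index=300,
        noise_prior=recommended_noise_prior(num_training_videos=4),
    )
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would launch the script directly without shell quoting issues.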

Inference with pre-trained MotionDirector

All available weights are in the official Hugging Face repo. Run the download command from "Weights of trained MotionDirector" above to place the weights in the outputs folder, then run the following inference commands to generate videos.

MotionDirector trained on multiple videos:

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A person is riding a bicycle past the Eiffel Tower." --checkpoint_folder ./outputs/train/riding_bicycle/ --checkpoint_index 300 --noise_prior 0. --seed 7192280

Note:

  • Replace /path/to/the/ZeroScope with your own path to the foundation model, in this case ZeroScope.
  • Change the prompt to generate different videos.
  • The seed is random by default. Set it to a specific value to reproduce the corresponding result listed below.

Results:

Reference videos: "A person is riding a bicycle."

Videos generated by MotionDirector:

  • "A person is riding a bicycle past the Eiffel Tower." (seed: 7192280)
  • "A panda is riding a bicycle in a garden." (seed: 8040063)
  • "An alien is riding a bicycle on Mars." (seed: 2390886)
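To regenerate all three results rather than one, the same command can be repeated with each prompt and seed pair. The sketch below only builds and prints the commands; the prompts, seeds, checkpoint folder, checkpoint index, and noise_prior come from this section, while the function name and the placeholder model path are illustrative.

```python
import shlex

# (prompt, seed) pairs listed in the results above.
RUNS = [
    ("A person is riding a bicycle past the Eiffel Tower.", 7192280),
    ("A panda is riding a bicycle in a garden.", 8040063),
    ("An alien is riding a bicycle on Mars.", 2390886),
]


def riding_bicycle_commands(model_path, runs=RUNS):
    """Build one inference command per (prompt, seed) pair."""
    commands = []
    for prompt, seed in runs:
        commands.append([
            "python", "MotionDirector_inference.py",
            "--model", model_path,
            "--prompt", prompt,
            "--checkpoint_folder", "./outputs/train/riding_bicycle/",
            "--checkpoint_index", "300",
            "--noise_prior", "0.",
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # Replace the placeholder with your own path to ZeroScope.
    for cmd in riding_bicycle_commands("/path/to/the/ZeroScope"):
        print(shlex.join(cmd))
```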

MotionDirector trained on a single video:

16 frames:

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A tank is running on the moon." --checkpoint_folder ./outputs/train/car_16/ --checkpoint_index 150 --noise_prior 0.5 --seed 8551187

Reference video: "A car is running on the road."

Videos generated by MotionDirector:

  • "A tank is running on the moon." (seed: 8551187)
  • "A lion is running past the pyramids." (seed: 431554)
  • "A spaceship is flying past Mars." (seed: 8808231)

24 frames:

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A truck is running past the Arc de Triomphe." --checkpoint_folder ./outputs/train/car_24/ --checkpoint_index 150 --noise_prior 0.5 --width 576 --height 320 --num-frames 24 --seed 34543

Reference video: "A car is running on the road."

Videos generated by MotionDirector:

  • "A truck is running past the Arc de Triomphe." (seed: 34543)
  • "An elephant is running in a forest." (seed: 2171736)
  • "A person on a camel is running past the pyramids." (seed: 4904126)
  • "A spacecraft is flying past the Milky Way galaxy." (seed: 3235677)

MotionDirector for Cinematic Shots

1. Zoom

1.1 Dolly Zoom (Hitchcockian Zoom)

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a dolly zoom." --checkpoint_folder ./outputs/train/dolly_zoom/ --checkpoint_index 150 --noise_prior 0.5 --seed 9365597

Reference video: "A man standing in room captured with a dolly zoom."

Videos generated by MotionDirector:

  • "A firefighter standing in front of a burning forest captured with a dolly zoom." (seed: 9365597, noise_prior: 0.5)
  • "A lion sitting on top of a cliff captured with a dolly zoom." (seed: 1675932, noise_prior: 0.5)
  • "A Roman soldier standing in front of the Colosseum captured with a dolly zoom." (seed: 2310805, noise_prior: 0.5)
  • "A firefighter standing in front of a burning forest captured with a dolly zoom." (seed: 4615820, noise_prior: 0.3)
  • "A lion sitting on top of a cliff captured with a dolly zoom." (seed: 4114896, noise_prior: 0.3)
  • "A Roman soldier standing in front of the Colosseum captured with a dolly zoom." (seed: 7492004)
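The dolly zoom results use two noise_prior values, 0.5 and 0.3. As an intuition for what such a weight can do, the sketch below mixes a stand-in "inversion noise" vector with fresh Gaussian noise using square-root weights, so that the result keeps unit variance when both inputs have unit variance. This is an assumption for illustration only: the README states just that noise_prior controls how much the inversion noise affects generation, and the function name, the blending formula, and the use of plain Python lists instead of latent tensors are not taken from the repository.

```python
import math
import random


def blend_noise(inversion_noise, noise_prior, rng=None):
    """Mix inversion noise with fresh Gaussian noise (illustrative only).

    Assumed formula: sqrt(p) * inversion + sqrt(1 - p) * gaussian.
    p = 0 ignores the reference entirely; p = 1 reuses its noise as is.
    """
    if not 0.0 <= noise_prior <= 1.0:
        raise ValueError("noise_prior must be in [0, 1]")
    rng = rng or random.Random()
    a = math.sqrt(noise_prior)
    b = math.sqrt(1.0 - noise_prior)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in inversion_noise]


if __name__ == "__main__":
    reference = [0.4, -1.2, 0.7, 0.1]  # made-up stand-in for inversion noise
    for p in (0.0, 0.3, 0.5):
        mixed = blend_noise(reference, p, rng=random.Random(0))
        print(p, [round(v, 3) for v in mixed])
```

Under this reading, a higher noise_prior pulls every sample toward the reference video's layout and motion, while a lower value leaves more room for variation between seeds.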

1.2 Zoom In

The reference video is shot with my own water cup. You can also pick up your cup or any other object to practice camera movements and turn it into imaginative videos. Create your AI films with customized camera movements!

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom in." --checkpoint_folder ./outputs/train/zoom_in/ --checkpoint_index 150 --noise_prior 0.3 --seed 1429227

Reference video: "A cup in a lab captured with a zoom in."

Videos generated by MotionDirector:

  • "A firefighter standing in front of a burning forest captured with a zoom in." (seed: 1429227)
  • "A lion sitting on top of a cliff captured with a zoom in." (seed: 487239)
  • "A Roman soldier standing in front of the Colosseum captured with a zoom in." (seed: 1393184)

1.3 Zoom Out

python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom out." --checkpoint_folder ./outputs/train/zoom_out/ --checkpoint_index 150 --noise_prior 0.3 --seed 4971910

Reference video: "A cup in a lab captured with a zoom out."

Videos generated by MotionDirector:

  • "A firefighter standing in front of a burning forest captured with a zoom out." (seed: 4971910)
  • "A lion sitting on top of a cliff captured with a zoom out." (seed: 1767994)
  • "A Roman soldier standing in front of the Colosseum captured with a zoom out." (seed: 8203639)

More results

If you have a more impressive MotionDirector or generated videos, please feel free to open an issue and share them with us. We would greatly appreciate it. Improvements to the code are also highly welcome.

Please refer to the Project Page for more results.

Citation

@article{zhao2023motiondirector,
  title={MotionDirector: Motion Customization of Text-to-Video Diffusion Models},
  author={Zhao, Rui and Gu, Yuchao and Wu, Jay Zhangjie and Zhang, David Junhao and Liu, Jiawei and Wu, Weijia and Keppo, Jussi and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2310.08465},
  year={2023}
}
