splinter21 / animate-anything

This project is a fork of alibaba/animate-anything.


Fine-Grained Open Domain Image Animation with Motion Guidance

Home Page: https://animationai.github.io/AnimateAnything/

License: MIT License



👉 AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance

Zuozhuo Dai, Zhenghao Zhang, Menghao Li, Junchao Liao, Siyu Zhu, Long Qin, Weizhi Wang


Showcases

Demo video: video_demol.mp4

Showcase examples (input image with mask → animated result), with prompts:

  • Barbie watching the camera with a smiling face.
  • The cloak swaying in the wind.
  • A red fish is swimming.

Framework

[Framework overview figure]

News 🔥

2024.2.5: Support multi-GPU training with Accelerate and DeepSpeed. With DeepSpeed zero_stage 2 and offload_optimizer_device cpu, you can now fully finetune animate-anything on 4x16G V100 GPUs and SVD on 4x24G A10 GPUs.

2023.12.27: Support finetuning based on the SVD (Stable Video Diffusion) model. Released the SVD-based animate_anything_svd_v1.0.

2023.12.18: Updated the model to animate_anything_512_v1.02.

Features Planned

  • 💥 Enhanced prompt-following: generate long, detailed captions using LLaVA.
  • 💥 Replace the U-Net with a Diffusion Transformer (DiT) as the base model.
  • 💥 Variable resolutions.
  • 💥 Support Hugging Face demo / Google Colab.
  • etc.

Getting Started

This repository is based on Text-To-Video-Finetuning.

Create Conda Environment (Optional)

It is recommended to install Anaconda.

Windows Installation: https://docs.anaconda.com/anaconda/install/windows/

Linux Installation: https://docs.anaconda.com/anaconda/install/linux/

conda create -n animation python=3.10
conda activate animation

Python Requirements

pip install -r requirements.txt

Running inference

Please download the pretrained model to output/latent, then run the following command, replacing {download_model} with the name of the model you downloaded:

python train.py --config output/latent/{download_model}/config.yaml --eval validation_data.prompt_image=example/barbie2.jpg validation_data.prompt='A cartoon girl is talking.'

To control the motion area, we can use labelme to generate a binary mask. First, use labelme to draw a polygon over the motion area of the reference image.

Then run the following command to convert the labelme JSON file into a mask.

labelme_json_to_dataset qingming2.json
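Alternatively, if labelme_json_to_dataset is not convenient, the binary mask can be rasterized directly from the labelme JSON with a few lines of Python. This is only a minimal sketch, assuming the standard labelme JSON layout (imageWidth, imageHeight, and polygon shapes) and using Pillow; the file names follow the example above:

import json
from PIL import Image, ImageDraw

# Load the labelme annotation drawn for the reference image.
with open("qingming2.json") as f:
    ann = json.load(f)

# Start from an all-black mask and fill every annotated polygon in white.
mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
draw = ImageDraw.Draw(mask)
for shape in ann["shapes"]:
    polygon = [tuple(point) for point in shape["points"]]  # [[x, y], ...] -> [(x, y), ...]
    draw.polygon(polygon, fill=255)

# Save the binary mask passed to validation_data.mask below.
mask.save("example/qingming2_label.jpg")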

Then run the following command for inference:

python train.py --config output/latent/{download_model}/config.yaml --eval validation_data.prompt_image=example/qingming2.jpg validation_data.prompt='Peoples are walking on the street.' validation_data.mask=example/qingming2_label.jpg 

Users can adjust the motion strength when using the mask motion model:

python train.py --config output/latent/{download_model}/config.yaml --eval validation_data.prompt_image=example/qingming2.jpg validation_data.prompt='Peoples are walking on the street.' validation_data.mask=example/qingming2_label.jpg validation_data.strength=5

Video super resolution

The model outputs low-resolution videos; you can use a video super-resolution model to produce high-resolution videos. For example, we can use Real-CUGAN for cartoon-style video super resolution:

git clone https://github.com/bilibili/ailab.git
cd ailab/Real-CUGAN
python inference_video.py

Training

Using Captions

You can use caption files when training on videos. Simply place the videos into a folder and create a JSON file with captions like this:

[
      {"caption": "Cute monster character flat design animation video", "video": "000001_000050/1066697179.mp4"}, 
      {"caption": "Landscape of the cherry blossom", "video": "000001_000050/1066688836.mp4"}
]

Then, in your config, set dataset_types to video_json and set the video_dir and video_json paths like this:

  - dataset_types: 
      - video_json
    train_data:
      video_dir: '/webvid/webvid/data/videos'
      video_json: '/webvid/webvid/data/40K.json'
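You can also generate this JSON programmatically. Below is a small sketch that assumes a hypothetical convention of one <clip>.txt caption file stored next to each <clip>.mp4; adapt the pairing logic to however your captions are actually stored. The paths match the config above:

import json
from pathlib import Path

video_dir = Path("/webvid/webvid/data/videos")

entries = []
for video in sorted(video_dir.rglob("*.mp4")):
    caption_file = video.with_suffix(".txt")  # hypothetical sidecar caption file
    if not caption_file.exists():
        continue
    entries.append({
        "caption": caption_file.read_text().strip(),
        # video paths in the JSON are relative to train_data.video_dir
        "video": str(video.relative_to(video_dir)),
    })

with open("/webvid/webvid/data/40K.json", "w") as f:
    json.dump(entries, f, indent=2)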

Process Automatically

You can automatically caption the videos using the Video-BLIP2-Preprocessor Script and set the dataset_types and json_path like this:

  - dataset_types: 
      - video_blip
    train_data:
      json_path: 'blip_generated.json'

Configuration

The configuration uses a YAML format borrowed from the Tune-A-Video repositories.

All configuration details are in example/train_mask_motion.yaml, and each parameter is documented with a description of what it does.

Finetuning animate-anything

You can finetune animate-anything with text, motion mask, and motion strength guidance on your own dataset. The following config requires around 30 GB of GPU RAM. You can reduce train_batch_size, train_data.width, train_data.height, and n_sample_frames in the config to lower the GPU RAM requirement:

python train.py --config example/train_mask_motion.yaml pretrained_model_path=<download_model>
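The commands in this README override individual config values with dotted key=value arguments. As a rough illustration only, the memory-saving reductions mentioned above can be expressed as overrides in Python, assuming an OmegaConf-style config loader (common in Tune-A-Video-derived code; the repo's actual loader may differ):

# Illustration only: merge dotted key=value overrides into the YAML config,
# mirroring the command-line overrides shown in this README. Assumes an
# OmegaConf-style config; not necessarily the loader this repo uses.
from omegaconf import OmegaConf

config = OmegaConf.load("example/train_mask_motion.yaml")
overrides = OmegaConf.from_dotlist([
    "train_batch_size=1",          # smaller batch to reduce GPU RAM
    "train_data.width=256",        # lower resolution to reduce GPU RAM
    "train_data.height=256",
    "validation_data.strength=5",  # motion strength, as in the inference example
])
config = OmegaConf.merge(config, overrides)
print(OmegaConf.to_yaml(config))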

Finetune Stable Video Diffusion

The Stable Video Diffusion (SVD) img2vid model can generate high-resolution videos, but it has no text or motion mask control. You can finetune SVD with motion mask guidance using the following command and pretrained model. This config requires around 80 GB of GPU RAM.

python train_svd.py --config example/train_svd_mask.yaml pretrained_model_path=<download_model>

If you only want to finetune SVD on your own dataset without motion mask control, please use the following config:

python train_svd.py --config example/train_svd.yaml pretrained_model_path=<svd_model>

Multi-GPU training

I strongly recommend multi-GPU training with Accelerate, which greatly reduces the VRAM requirement. First configure Accelerate with DeepSpeed; an example config is located in example/deepspeed.yaml.

Then replace the 'python train_xx.py ...' commands above with 'accelerate launch train_xx.py ...', for example:

accelerate launch --config_file example/deepspeed.yaml train_svd.py --config example/train_svd_mask.yaml pretrained_model_path=<download_model>

Bibtex

Please cite this paper if you find the code useful for your research:

@misc{dai2023animateanything,
      title={AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance}, 
      author={Zuozhuo Dai and Zhenghao Zhang and Yao Yao and Bingxue Qiu and Siyu Zhu and Long Qin and Weizhi Wang},
      year={2023},
      eprint={2311.12886},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Shoutouts

Contributors

daizuozhuo, dailingx, leojc, sculmh, eltociear
