moore-animateanyone's Introduction

🤗 Introduction

Update: 🔥🔥🔥 We have launched a HuggingFace Spaces demo of Moore-AnimateAnyone!

This repository reproduces AnimateAnyone. To match the results demonstrated in the original paper, we adopt various approaches and tricks, which may differ somewhat from the paper and from another implementation.

It is worth noting that this is a very preliminary version that aims to approximate the performance shown in AnimateAnyone (roughly 80% in our tests).

We will continue to develop it, and we welcome feedback and ideas from the community. The enhanced version will also be launched on our MoBi MaLiang AIGC platform, running on our own full-featured GPU S4000 cloud computing platform.

📝 Release Plans

  • Inference codes and pretrained weights
  • Training scripts

Note: The training code involves private data and packages. We will organize this portion of the code and release it as soon as possible.

🎞️ Examples

Here are some results we generated at a resolution of 512x768.

compare-1-1.mp4
compare-2-2.mp4
demo3.mp4
demo4.mp4
demo5.mp4
demo6.mp4

Limitations: We observe the following shortcomings in the current version:

  1. Artifacts may appear in the background when the reference image has a clean background.
  2. Suboptimal results may arise when there is a scale mismatch between the reference image and the keypoints. We have yet to implement the preprocessing techniques mentioned in the paper.
  3. Some flickering and jittering may occur when the motion sequence is subtle or the scene is static.

We plan to address these issues in the near future. We appreciate your patience!

⚒️ Installation

Build Environment

We recommend Python >=3.10 and CUDA 11.7. Then build the environment as follows:

# [Optional] Create a virtual env
python -m venv .venv
source .venv/bin/activate
# Install with pip:
pip install -r requirements.txt
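
The version recommendation above can be checked from Python before running anything heavier. This is a minimal sketch and not part of the repository: the helper names `python_ok` and `cuda_version` are hypothetical, and it assumes that PyTorch is installed through requirements.txt and is the component that reports the CUDA version.

```python
import sys


def python_ok(version=None, minimum=(3, 10)):
    """Return True if the given (or running) Python version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # assumption: provided by requirements.txt
    except ImportError:
        return None
    return torch.version.cuda  # a string, or None for CPU-only builds


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    print("CUDA (reported by PyTorch):", cuda_version() or "not detected")
```

If the reported CUDA version differs from 11.7, the install may still work, but that combination is outside what this README recommends.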

Download weights

Download our trained weights, which include four parts: denoising_unet.pth, reference_unet.pth, pose_guider.pth and motion_module.pth.

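Download the pretrained weights of the base models and other components (the stable-diffusion-v1-5, sd-vae-ft-mse and image_encoder directories in the layout below).

Download the DWPose weights (dw-ll_ucoco_384.onnx, yolox_l.onnx) by following the instructions of the DWPose project.

Put these weights under a directory, such as ./pretrained_weights, and organize them as follows: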

./pretrained_weights/
|-- DWPose
|   |-- dw-ll_ucoco_384.onnx
|   `-- yolox_l.onnx
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- denoising_unet.pth
|-- motion_module.pth
|-- pose_guider.pth
|-- reference_unet.pth
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

Note: If you have already downloaded some of the pretrained models, such as StableDiffusion V1.5, you can specify their paths in the config file (e.g. ./configs/prompts/animation.yaml).
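
Because a misplaced file is an easy mistake to make, the layout above can be checked with a short script before running inference. This is a sketch under stated assumptions, not a tool shipped with the repository: `missing_weights` and `EXPECTED_FILES` are hypothetical names, and the list covers only a subset of the tree (for sd-vae-ft-mse it checks only config.json, on the assumption that either weight format is acceptable).

```python
from pathlib import Path

# A subset of the files from the directory tree above.
EXPECTED_FILES = [
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
    "image_encoder/config.json",
    "image_encoder/pytorch_model.bin",
    "denoising_unet.pth",
    "motion_module.pth",
    "pose_guider.pth",
    "reference_unet.pth",
    "sd-vae-ft-mse/config.json",
    "stable-diffusion-v1-5/model_index.json",
    "stable-diffusion-v1-5/unet/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
]


def missing_weights(root="./pretrained_weights"):
    """Return the expected files (relative paths) not found under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected weight files found.")
```

If some models live elsewhere and are referenced through the config file, the corresponding entries will be reported as missing here even though inference can still find them.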

🚀 Inference

Here is the CLI command for running the inference script:

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 784 -L 64

You can refer to the format of animation.yaml to add your own reference images or pose videos. To convert a raw video into a pose video (keypoint sequence), run the following command:

python tools/vid2pose.py --video_path /path/to/your/video.mp4
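
When processing several videos, it can be convenient to build the two commands above from Python. The sketch below only assembles argument lists from the flags shown in this README. The function names are hypothetical, and reading -W, -H and -L as width, height and number of frames is an assumption based on the size limits mentioned in the Gradio section.

```python
import shlex
import sys


def pose2vid_command(config="./configs/prompts/animation.yaml",
                     width=512, height=784, length=64):
    """Build the inference command from this README as an argument list."""
    return [
        sys.executable, "-m", "scripts.pose2vid",
        "--config", str(config),
        "-W", str(width),    # assumed: output width
        "-H", str(height),   # assumed: output height
        "-L", str(length),   # assumed: number of frames
    ]


def vid2pose_command(video_path):
    """Build the pose-extraction command for one raw video."""
    return [sys.executable, "tools/vid2pose.py", "--video_path", str(video_path)]


if __name__ == "__main__":
    # Print the commands instead of running them; the path is a placeholder.
    print(shlex.join(vid2pose_command("/path/to/your/video.mp4")))
    print(shlex.join(pose2vid_command()))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.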

🎨 Gradio Demo

HuggingFace Demo: We have launched a quick preview demo of Moore-AnimateAnyone on HuggingFace Spaces! We appreciate the assistance provided by the HuggingFace team in setting up this demo.

To reduce waiting time, we limit the size (width, height, and length) and the number of inference steps when generating videos.

If you have your own GPU (>= 16GB VRAM), you can run a local Gradio app with the following command:

python app.py

🖌️ Try on MoBi MaLiang

We will launch this model on our MoBi MaLiang AIGC platform, running on our own full-featured GPU S4000 cloud computing platform. MoBi MaLiang now integrates various AIGC applications and functionalities (e.g. text-to-image, controllable generation). You can try it by clicking this link or scanning the QR code below via WeChat!

⚖️ Disclaimer

This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.

🙏🏻 Acknowledgements

We first thank the authors of AnimateAnyone. Additionally, we would like to thank the contributors to the magic-animate, animatediff and Open-AnimateAnyone repositories for their open research and exploration. Furthermore, our repo incorporates some code from dwpose and animatediff-cli-prompt-travel, and we extend our thanks to them as well.

moore-animateanyone's People

Contributors

songtao-liu-mt, lixunsong
