
moore-animateanyone's Introduction

🤗 Introduction

update 🔥🔥🔥 We propose a face reenactment method based on our AnimateAnyone pipeline: it uses the facial landmarks of a driving video to control the pose of a given source image while preserving the identity of the source image. Specifically, we disentangle head pose (including eye blinks) and mouth motion from the landmarks of the driving video, so the expression and movements of the source face can be controlled precisely. We have released the inference code and pretrained models for face reenactment!

update 🏋️🏋️🏋️ We have released our training code! Now you can train your own AnimateAnyone models. See here for more details. Have fun!

update 🔥🔥🔥 We have launched a HuggingFace Spaces demo of Moore-AnimateAnyone here!

This repository reproduces AnimateAnyone. To align with the results demonstrated in the original paper, we adopt various approaches and tricks, which may differ somewhat from the paper and from another implementation.

It's worth noting that this is a very preliminary version, aiming to approximate the performance (roughly 80% in our tests) shown in AnimateAnyone.

We will continue to develop it, and we also welcome feedback and ideas from the community. The enhanced version will also be launched on our MoBi MaLiang AIGC platform, running on our own full-featured GPU S4000 cloud computing platform.

📝 Release Plans

  • Inference codes and pretrained weights of AnimateAnyone
  • Training scripts of AnimateAnyone
  • Inference codes and pretrained weights of face reenactment
  • Training scripts of face reenactment
  • Inference scripts of audio driven portrait video generation
  • Training scripts of audio driven portrait video generation

🎞️ Examples

AnimateAnyone

Here are some AnimateAnyone results we generated, at a resolution of 512x768.

compare-1-1.mp4
compare-2-2.mp4
demo3.mp4
demo4.mp4
demo5.mp4
demo6.mp4

Limitations: We observe the following shortcomings in the current version:

  1. Artifacts may appear in the background, especially when the reference image has a clean background.
  2. Suboptimal results may arise when there is a scale mismatch between the reference image and the keypoints. We have yet to implement the preprocessing techniques mentioned in the paper.
  3. Some flickering and jittering may occur when the motion sequence is subtle or the scene is static.

These issues will be addressed and improved in the near future. We appreciate your patience!

Face Reenactment

Here are some face reenactment results we generated, at a resolution of 512x512.

1.mp4
2.mp4
3.mp4
4.mp4

⚒️ Installation

Build Environment

We recommend Python >= 3.10 and CUDA 11.7. Then build the environment as follows:

# [Optional] Create a virtual env
python -m venv .venv
source .venv/bin/activate
# Install with pip:
pip install -r requirements.txt  
# For face landmark extraction
git clone https://github.com/emilianavt/OpenSeeFace.git  
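
After installation, you can quickly confirm that PyTorch was built against the expected CUDA version and can see your GPU (a minimal check, not part of the official setup):

import torch

# Should print the torch version, the CUDA version it was built against,
# and whether a GPU is visible from this environment.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())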

Download weights

Automatic download: You can run the following command to download the weights automatically:

python tools/download_weights.py

The weights will be placed under the ./pretrained_weights directory. The whole downloading process may take a long time.

Manual download: You can also download the weights manually, which involves the following steps:

  1. Download our AnimateAnyone trained weights, which include four parts: denoising_unet.pth, reference_unet.pth, pose_guider.pth and motion_module.pth.

  2. Download our trained weights of face reenactment, and place these weights under pretrained_weights.

  3. Download the pretrained weights of the base models and other components (StableDiffusion V1.5, sd-vae-ft-mse, and the image encoder); a hedged download sketch follows this list.

  4. Download dwpose weights (dw-ll_ucoco_384.onnx, yolox_l.onnx) following this.
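
For the base components in step 3, here is a minimal download sketch using huggingface_hub; the repository IDs below are assumptions inferred from the directory layout, and tools/download_weights.py remains the authoritative reference:

from huggingface_hub import snapshot_download

# Repo IDs are assumptions based on the component names in the weight tree;
# adjust them if tools/download_weights.py uses different sources.
snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    local_dir="./pretrained_weights/stable-diffusion-v1-5",
)
snapshot_download(
    repo_id="stabilityai/sd-vae-ft-mse",
    local_dir="./pretrained_weights/sd-vae-ft-mse",
)
# The image encoder is assumed to come from the image_encoder subfolder of an
# SD image-variations checkpoint.
snapshot_download(
    repo_id="lambdalabs/sd-image-variations-diffusers",
    allow_patterns=["image_encoder/*"],
    local_dir="./pretrained_weights",
)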

Finally, these weights should be organized as follows:

./pretrained_weights/
|-- DWPose
|   |-- dw-ll_ucoco_384.onnx
|   `-- yolox_l.onnx
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- denoising_unet.pth
|-- motion_module.pth
|-- pose_guider.pth
|-- reference_unet.pth
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
|-- reenact
|   |-- denoising_unet.pth
|   |-- reference_unet.pth
|   |-- pose_guider1.pth
|   |-- pose_guider2.pth
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

Note: If you have already downloaded some of the pretrained models, such as StableDiffusion V1.5, you can specify their paths in the config file (e.g. ./config/prompts/animation.yaml).
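
If you prefer not to edit the YAML by hand, a small sketch of patching it programmatically is shown below. It assumes the config loads with OmegaConf, and the key name pretrained_base_model_path is hypothetical, so check animation.yaml for the actual field names:

from omegaconf import OmegaConf

cfg = OmegaConf.load("./configs/prompts/animation.yaml")
# The key name below is hypothetical; use the field that actually holds the
# StableDiffusion V1.5 path in animation.yaml.
cfg.pretrained_base_model_path = "/path/to/your/stable-diffusion-v1-5"
OmegaConf.save(cfg, "./configs/prompts/animation_local.yaml")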

🚀 Training and Inference

Inference of AnimateAnyone

Here is the cli command for running inference scripts:

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 784 -L 64

You can refer to the format of animation.yaml to add your own reference images or pose videos. To convert a raw video into a pose video (keypoint sequence), run the following command:

python tools/vid2pose.py --video_path /path/to/your/video.mp4
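
Under the hood, the conversion is essentially "run a pose detector frame by frame and re-encode the result as a video". Below is a rough sketch of that loop; the DWposeDetector call convention is an assumption here, so treat tools/vid2pose.py and src/dwpose as the authoritative interface:

import imageio
from src.dwpose import DWposeDetector  # run from the repo root so 'src' is importable

detector = DWposeDetector()                 # constructor arguments assumed default
reader = imageio.get_reader("/path/to/your/video.mp4")
fps = reader.get_meta_data().get("fps", 24)

pose_frames = []
for frame in reader:                        # each frame is an HxWx3 uint8 array
    pose_frames.append(detector(frame))     # call convention is an assumption

imageio.mimsave("/path/to/your/video_kps.mp4", pose_frames, fps=fps)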

Inference of Face Reenactment

Here is the cli command for running inference scripts:

python -m scripts.lmks2vid --config ./configs/prompts/inference_reenact.yaml --driving_video_path YOUR_OWN_DRIVING_VIDEO_PATH --source_image_path YOUR_OWN_SOURCE_IMAGE_PATH  

We provide some face images in ./config/inference/talkinghead_images, and some face videos in ./config/inference/talkinghead_videos for inference.

Training of AnimateAnyone

Note: Package dependencies have been updated; you may need to upgrade your environment via pip install -r requirements.txt before training.

Data Preparation

Extract keypoints from raw videos:

python tools/extract_dwpose_from_vid.py --video_root /path/to/your/video_dir

Extract the meta info of the dataset:

python tools/extract_meta_info.py --root_path /path/to/your/video_dir --dataset_name anyone 

Update lines in the training config file:

data:
  meta_paths:
    - "./data/anyone_meta.json"

Stage1

Put the openpose ControlNet weights under ./pretrained_weights; they are used to initialize the pose_guider.

Put sd-image-variation under ./pretrained_weights; it is used to initialize the UNet weights.

Run command:

accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Stage2

Put the pretrained motion module weights mm_sd_v15_v2.ckpt (download link) under ./pretrained_weights.

Specify the stage1 training weights in the config file stage2.yaml, for example:

stage1_ckpt_dir: './exp_output/stage1'
stage1_ckpt_step: 30000 

Run command:

accelerate launch train_stage_2.py --config configs/train/stage2.yaml

🎨 Gradio Demo

HuggingFace Demo: We have launched a quick preview demo of Moore-AnimateAnyone on HuggingFace Spaces! We appreciate the assistance provided by the HuggingFace team in setting up this demo.

To reduce waiting time, we limit the size (width, height, and length) and inference steps when generating videos.

If you have your own GPU resources (>= 16GB VRAM), you can run a local Gradio app with the following command:

python app.py

Community Contributions

🖌️ Try on Mobi MaLiang

We have launched this model on our MoBi MaLiang AIGC platform, running on our own full-featured GPU S4000 cloud computing platform. Mobi MaLiang has now integrated various AIGC applications and functionalities (e.g. text-to-image, controllable generation). You can experience it by clicking this link or scanning the QR code below via WeChat!

⚖️ Disclaimer

This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.

🙏🏻 Acknowledgements

We first thank the authors of AnimateAnyone. Additionally, we would like to thank the contributors to the majic-animate, animatediff, and Open-AnimateAnyone repositories for their open research and exploration. Furthermore, our repo incorporates some code from dwpose and animatediff-cli-prompt-travel, and we extend our thanks to them as well.

moore-animateanyone's People

Contributors: kegeyang, liangyang-mt, lixunsong, npjd, songtao-liu-mt

moore-animateanyone's Issues

vid2pose.py: No module named 'src'

Traceback (most recent call last):
File "E:\AI\Moore-AnimateAnyone\tools\vid2pose.py", line 1, in
from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'

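A common workaround (a sketch, not an official fix): run the script from the repository root, or prepend the repository root to sys.path at the top of tools/vid2pose.py before the src import:

import os
import sys

# Make the repository root importable when the script lives in tools/.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.dwpose import DWposeDetector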

ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'

Traceback (most recent call last):
  File "Moore-AnimateAnyone\app.py", line 16, in <module>
    from src.models.unet_2d_condition import UNet2DConditionModel
  File "Moore-AnimateAnyone\src\models\unet_2d_condition.py", line 40, in <module>
    from .unet_2d_blocks import (
  File "Moore-AnimateAnyone\src\models\unet_2d_blocks.py", line 15, in <module>
    from .transformer_2d import Transformer2DModel
  File "Moore-AnimateAnyone\src\models\transformer_2d.py", line 7, in <module>
    from diffusers.models.embeddings import CaptionProjection
ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings' 
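
This error usually points to a diffusers version mismatch: the class appears to have been renamed in newer diffusers releases, so installing the diffusers version pinned in requirements.txt is the safest fix. A compatibility shim, as a hedged sketch (the replacement class name is an assumption about newer diffusers):

try:
    from diffusers.models.embeddings import CaptionProjection
except ImportError:
    # Newer diffusers renamed the class (assumption); fall back to the new name.
    from diffusers.models.embeddings import PixArtAlphaTextProjection as CaptionProjection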

Enhancing the Fidelity of Generated Animations in Moore-AnimateAnyone

Dear Moore-AnimateAnyone Contributors,

I hope this message finds you well. I have been thoroughly exploring the capabilities of the Moore-AnimateAnyone repository and am deeply impressed by the strides made in animating still images with such remarkable results. The demo hosted on HuggingFace Spaces is particularly indicative of the potential this technology holds.

However, upon delving into the examples provided and running my own tests, I have observed certain limitations that I believe, if addressed, could significantly elevate the quality of the animations produced. I would like to propose a few enhancements that could potentially mitigate these issues and refine the overall animation process.

  1. Background Artifacts: The presence of artifacts in animations, especially when the reference image has a clean background, can be quite distracting. Could we consider implementing a more robust background detection and preservation algorithm to maintain the integrity of the original image?

  2. Scale Mismatch: The suboptimal results due to scale mismatch between the reference image and keypoints are noticeable. While the paper suggests preprocessing techniques, their implementation is not yet apparent in the current version. Could we prioritise the integration of these preprocessing techniques to improve the handling of scale variations?

  3. Motion Subtleties: The flickering and jittering in animations with subtle motions or static scenes detract from the fluidity of the animation. Would it be possible to introduce a smoothing mechanism or a motion threshold to ensure that only significant movements are translated into the animation sequence?

I understand that these enhancements may involve considerable research and development efforts, but I believe they could be instrumental in pushing the boundaries of what Moore-AnimateAnyone can achieve. Additionally, these improvements could be pivotal in the deployment of this technology on the MoBi MaLiang AIGC platform, ensuring a more polished and professional output for end-users.

I am keen to follow the progress of this project and am more than willing to contribute to discussions or testing, should you find my feedback of value.

Thank you for your dedication to this innovative project, and I look forward to your thoughts on the potential for these enhancements.

Best regards,
yihong1120

training error in stage 1

File "Moore-AnimateAnyone/src/models/mutual_self_attention.py", line 180, in hacked_basic_transformer_inner_forward
norm_hidden_states[_uc_mask],
IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [3, 9216, 320] at index 0
Steps: 1%|▎ | 249/30000 [06:30<12:57:21, 1.57s/it, lr=1e-5, step_loss=0.107]

Gradio crashes part way through

But the iterations continue.
Once it completed, the output was just noise. This was length 32, from a 25-frame source video.

20240113T1143.mp4

[Keeping warm together] Join the AnimateAnyone discussion group~

Thanks to the repo authors for open-sourcing! Please don't delete this; it is purely for technical exchange. Sharing a digital-human AIGC discussion group where Animate Anyone and other fundamental digital-human techniques are discussed daily, with resource sharing, one-click install scripts, and more. Beginner-friendly and active; join to improve development and learning efficiency together.

Safetensor models

Is there an easy way to use safetensor models with the pipeline?
I have a few merges I would like to try.
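
One possible starting point (a sketch, not a supported path): load the .safetensors file into an ordinary state dict and remap keys as needed before handing it to the pipeline's UNet:

from safetensors.torch import load_file

# Load the merged checkpoint as a plain state dict.
state_dict = load_file("/path/to/your/merged_model.safetensors")
print(len(state_dict), "tensors loaded")

# Keys exported by typical SD merge tools often carry prefixes such as
# "model.diffusion_model."; they may need remapping before calling
# load_state_dict on the UNet classes used in this repo.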

vid2pose - import error

Hello!

Could you tell me why I'm getting an error when launching the vid2pose module? I used the command that you provided:
python tools/vid2pose.py --video_path my_path/to_file.mp4

Console log:
Traceback (most recent call last):
  File "/content/Moore-AnimateAnyone/tools/vid2pose.py", line 1, in <module>
    from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'

ReferenceAttentionControl

For unconditional generation during training, should the reference embedding be concatenated to the normal_hidden_states?

Unable to load weights from checkpoint file

Every time I run this, I get the following error:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 109, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 122, in load_state_dict
    raise ValueError(
ValueError: Unable to locate the file ./pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin which is necessary to load this pretrained model. Make sure you have saved the model properly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1554, in process_api
    result = await self.call_function(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1192, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
  File "C:\AI\AnimateAnyone\Moore-AnimateAnyone\app.py", line 52, in animate
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 800, in from_pretrained
    state_dict = load_state_dict(model_file, variant=variant)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 127, in load_state_dict
    raise OSError(
OSError: Unable to load weights from checkpoint file for './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin' at './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Any solutions?
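
The initial EOFError ("Ran out of input") usually means the expected checkpoint file is missing, empty, or truncated, i.e. the download did not complete. A quick sanity check of the downloaded weights, as a sketch:

import os

# List every file under pretrained_weights with its size; zero-byte or
# suspiciously small .bin/.pth files indicate an incomplete download.
for root, _, files in os.walk("./pretrained_weights"):
    for name in files:
        path = os.path.join(root, name)
        print(f"{os.path.getsize(path) / 1e6:10.1f} MB  {path}")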

Weights of stage one

Hello, thanks for your open weights. However, I would like to use some features of the result from the first training stage; will you share these weights?
I am also wondering about the first-stage result's ability to keep identity and generate high-quality images; could you share your experience with this?

colab demo

Would someone be so kind as to make a Google Colab to test this? Thanks :)

overfit phenomenon?

Thanks for your great work. Have you ever encountered the phenomenon of overfitting?

Here is a function that is not used?

In src/pipelines/utils.py, the function set_tensor_interpolation_method doesn't look like it has ever been used. I searched for this function name globally in VSCode and found that it only appears once, where it is defined.
I then found that the variable tensor_interpolation is only modified inside this function, which means that the return value of get_tensor_interpolation_method is always None, if I understand correctly.
get_tensor_interpolation_method is used when building the Pose2VideoPipeline, and I'm not sure whether this affects the results.

max length 128

Is there any way to increase the length of the output video beyond 4 seconds?

Can you provide a more powerful model?

Thank you very much for providing your valuable code and model. The current model's results may need further optimization.
In addition, could you open source the training code? I would be very grateful.

config.json

OSError: Error no file named config.json found in directory ./pretrained_weights/stable-diffusion-v1-5/.

different transforms in preprocess training data

I noticed that different transformation operations are used for pose and image.
For pose:
self.cond_transform = transforms.Compose(
    [
        transforms.RandomResizedCrop(
            self.img_size,
            scale=self.img_scale,
            ratio=self.img_ratio,
            interpolation=transforms.InterpolationMode.BILINEAR,
        ),
        transforms.ToTensor(),
    ]
)

For image:

self.transform = transforms.Compose(
    [
        transforms.RandomResizedCrop(
            self.img_size,
            scale=self.img_scale,
            ratio=self.img_ratio,
            interpolation=transforms.InterpolationMode.BILINEAR,
        ),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ]
)
Why does pose not require the final normalization step?
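
A likely explanation (an assumption based on the usual Stable Diffusion convention, not a statement from the authors): the image branch is later encoded by the VAE, which expects inputs in [-1, 1], whereas the pose maps are consumed by the pose guider as ControlNet-style conditioning in [0, 1]. Normalize([0.5], [0.5]) is exactly the [0, 1] to [-1, 1] mapping:

import torch
from torchvision import transforms

x = torch.rand(3, 64, 64)                  # ToTensor() output lies in [0, 1]
y = transforms.Normalize([0.5], [0.5])(x)  # (x - 0.5) / 0.5 maps it to [-1, 1]
print(x.min().item(), x.max().item())      # ~0.0 .. ~1.0
print(y.min().item(), y.max().item())      # ~-1.0 .. ~1.0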

torch

I have tried all torch versions; which version works?

Face distortion

The facial details are all depicted in the pose, which causes the face in the generated image to be distorted.
The official implementation does not include facial details.

Training code error

Many thanks for releasing the training code.

However, when following the environment setup and data preparation and then running the stage 1 training command, I got the error in the attached screenshot. Is there anything wrong?


Looking forward to your reply. Thank you again !

Are these flickering results expected?

Very good job! I ran your code in Colab, used the anyone-video-2 kpts from your repo, and just chose my own reference image, but the results don't seem good. Could you check?

20240108-211948_anyone-video-2_784x512_3_0644.mp4
+._anyone-video-2_784x512_3_0728.mp4

Resources required for training?

Hello,

Thank you so much for releasing the training code. How much GPU VRAM is required for training? If one wants to train it on a single A100 (40GB), how long will it take to get very good results?

Questions about training dataset

Such an open-source effort with amazing results!
I have some questions about the training data. Approximately how much video data was used to train the model?

Question on datasets

Congratulations on achieving such amazing results!!!
Both cartoons and real people can be animated with smooth motion, so I have a question about which type of dataset you used during training, e.g., the UBC dataset or a dataset from TikTok?

Cannot run example scripts. OOM Error

Thank you for your great work. When I directly run your provided command, it gives "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)"

Guidance for Fine-tuning Moore AnimateAnyone with a Small Dataset

Hello Moore AnimateAnyone team,

I've been exploring your remarkable project and am interested in applying it to a specific domain by fine-tuning the pre-trained model on a small, domain-specific dataset. I would appreciate some guidance on the best practices for fine-tuning the model effectively. My questions are as follows:

  1. Model Weight Initialization: For fine-tuning, is it recommended to initialize the model with the provided pre-trained weights and then continue training on the new dataset? If so, could you provide an example or guidance on loading the pre-trained weights correctly before starting the fine-tuning process?

  2. Two-Stage Training Process: The training process for the model is described as two-stage. Should fine-tuning on a new dataset also follow this two-stage approach, or are there any modifications or considerations we should be aware of for fine-tuning?

  3. Data Preparation and Augmentation: For fine-tuning on a small dataset, are there any specific data preparation or augmentation techniques you recommend to prevent overfitting and ensure the model generalizes well to the new domain?

  4. Hyperparameter Adjustments: Are there any specific hyperparameters (e.g., learning rate, batch size) that you suggest tweaking for fine-tuning as opposed to training from scratch?

  5. Evaluation during Fine-tuning: What are the best practices for evaluating the model during the fine-tuning process to ensure that it's adapting well to the new dataset without forgetting the knowledge gained during pre-training?

Any guidance, examples, or additional resources you could provide would be greatly appreciated. Fine-tuning deep learning models can be nuanced, and insights from the creators would be invaluable.

Thank you for your work on this innovative project and for your support to the community.
