MimicMotion

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Yuang Zhang1,2, Jiaxi Gu1, Li-Wen Wang1, Han Wang1,2, Junqi Cheng1, Yuefeng Zhu1, Fangyuan Zou1
[1Tencent; 2Shanghai Jiao Tong University]


Highlights: rich details, good temporal smoothness, and long video length.

Overview

An overview of the framework of MimicMotion.

In recent years, generative artificial intelligence has made significant advances in image generation, spawning a variety of applications. Video generation, however, still faces considerable challenges in controllability, video length, and richness of detail, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length with any motion guidance. Compared with previous methods, our approach has several highlights. Firstly, confidence-aware pose guidance achieves temporal smoothness, so model robustness can be enhanced with large-scale training data. Secondly, regional loss amplification based on pose confidence significantly reduces image distortion. Lastly, a progressive latent fusion strategy is proposed for generating long, smooth videos, so that videos of arbitrary length can be generated with acceptable resource consumption. In extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in multiple aspects.
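As a rough illustration of the idea of fusing overlapping segments into one long video, the sketch below splits a frame range into overlapping windows and cross-fades the frames they share. It is not the authors' implementation: the function names, the linear ramp weights, and the use of one float per frame as a stand-in for a latent are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def split_into_segments(num_frames: int, seg_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that cover num_frames with a fixed overlap."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    stride = seg_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + seg_len, num_frames)
        segments.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return segments


def fuse_segments(
    segments: Sequence[Tuple[int, int]],
    seg_values: Sequence[Sequence[float]],
    num_frames: int,
    overlap: int,
) -> List[float]:
    """Blend per-frame values from overlapping segments with linear ramp weights.

    Hypothetical weighting: a frame's weight ramps up over the first `overlap`
    frames of a segment (unless it is the first segment) and ramps down over the
    last `overlap` frames (unless it is the last segment).
    """
    fused = [0.0] * num_frames
    weight_sum = [0.0] * num_frames
    for (start, end), values in zip(segments, seg_values):
        length = end - start
        for i, value in enumerate(values):
            weight = 1.0
            if start > 0 and i < overlap:
                weight = min(weight, (i + 1) / (overlap + 1))
            if end < num_frames and (length - 1 - i) < overlap:
                weight = min(weight, (length - i) / (overlap + 1))
            fused[start + i] += weight * value
            weight_sum[start + i] += weight
    # Normalise so that the weights at every frame sum to one.
    return [f / w for f, w in zip(fused, weight_sum)]


if __name__ == "__main__":
    segs = split_into_segments(num_frames=40, seg_len=16, overlap=6)
    print(segs)
```

With constant inputs the fused output is unchanged, and where two segments disagree the result moves gradually from one to the other across the overlap.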

News

  • [2024-07-01]: Project page, code, technical report and a basic model checkpoint are released. A better checkpoint supporting higher quality video generation will be released very soon. Stay tuned!

Quickstart

The initially released model checkpoint supports generating videos of up to 16 frames at 576x1024 resolution. If you run out of memory, reduce the number of frames.

Environment setup

Python 3+ with torch 2.x is recommended; this setup has been validated on an Nvidia V100 GPU. Run the commands below to install the Python dependencies:

conda env create -f environment.yaml
conda activate mimicmotion
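After activating the environment, a quick check along the following lines can confirm that the interpreter and torch match the recommendation above. This is a minimal sketch, not part of the repository: check_environment and its report keys are hypothetical names, and it only uses torch.__version__ and torch.cuda.is_available().

```python
import importlib.util
import sys


def check_environment(min_python=(3, 0), torch_major=2):
    """Collect basic facts about the active interpreter and torch install."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without torch

        report["torch_version"] = str(torch.__version__)
        # Versions look like "2.1.0+cu118"; the major number is the first field.
        report["torch_ok"] = int(report["torch_version"].split(".")[0]) >= torch_major
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If torch_ok or cuda_available comes back False, the conda environment is probably not active or the installed torch build does not match the GPU driver.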

Download weights

Please download weights manually as follows:

cd MimicMotions/
mkdir models
  1. Download SVD model: stabilityai/stable-video-diffusion-img2vid-xt-1-1
    git lfs install
    git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
    mkdir -p models/SVD
    mv stable-video-diffusion-img2vid-xt-1-1 models/SVD/
    
  2. Download DWPose pretrained model: dwpose
    git lfs install
    git clone https://huggingface.co/yzd-v/DWPose
    mv DWPose models/
    
  3. Download the pre-trained checkpoint of MimicMotion from Huggingface
    curl -L -o models/MimicMotion.pth https://huggingface.co/ixaac/MimicMotion/resolve/main/MimicMotion.pth
    

Finally, all the weights should be organized under models/ as follows:

models/
├── DWPose
│   ├── dw-ll_ucoco_384.onnx
│   └── yolox_l.onnx
├── SVD
│   └── stable-video-diffusion-img2vid-xt-1-1
└── MimicMotion.pth
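To confirm the layout before running inference, a small script such as the one below can list whatever is still missing. It is a sketch under the assumption that the tree above is complete; EXPECTED_PATHS and missing_weights are names made up for this example and are not part of the repository.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the models/ folder.
EXPECTED_PATHS = (
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
    "SVD/stable-video-diffusion-img2vid-xt-1-1",
    "MimicMotion.pth",
)


def missing_weights(models_dir="models"):
    """Return the expected paths that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing under models/:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected weights are in place.")
```

Run it from the repository root so that the relative models path resolves correctly.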

Model inference

A sample configuration for testing is provided as configs/test.yaml. You can modify it to suit your needs.

python inference.py --inference_config configs/test.yaml

Citation

@article{mimicmotion2024,
  title={MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance},
  author={Yuang Zhang and Jiaxi Gu and Li-Wen Wang and Han Wang and Junqi Cheng and Yuefeng Zhu and Fangyuan Zou},
  journal={arXiv preprint arXiv:2406.19680},
  year={2024}
}
