
zf223669 / diffusestylegesture


This project forked from youngseng/diffusestylegesture


DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models (IJCAI 2023) | The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 (ICMI 2023)

License: MIT License

Shell 0.16% JavaScript 1.89% Python 96.20% CSS 0.13% HTML 1.61% Batchfile 0.01%


DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Further Work

📢 QPGesture - Upper-body gesture generation based on motion matching.

📢 UnifiedGesture - Trains on multiple gesture datasets and refines the generated gestures.

News

📢 9/Oct/23 - We received the Reproducibility Award from the GENEA Committee. We strongly recommend trying DiffuseStyleGesture+ first, since its code is partially optimized compared to that of DiffuseStyleGesture.

📢 29/Aug/23 - Released the DiffuseStyleGesture+ paper; refer to the official GENEA Challenge 2023 paper for more details.

📢 5/Aug/23 - Released code and pre-trained models of DiffuseStyleGesture+ on BEAT and TWH.

📢 31/Jul/23 - Uploaded a tutorial video on visualizing gestures.

📢 25/Jun/23 - Uploaded the presentation video.

📢 9/May/23 - First release: arXiv, demo, code, pre-trained models on ZEGGS, and issue.

1. Getting started

This code was tested on an NVIDIA GeForce RTX 2080 Ti and requires:

  • conda3 or miniconda3
conda create -n DiffuseStyleGesture python=3.7
conda activate DiffuseStyleGesture
pip install -r requirements.txt 

2. Quick Start

  1. Download the pre-trained model from Tsinghua Cloud or Google Cloud and put it into ./main/mydiffusion_zeggs/.
  2. Download WavLM Large and put it into ./main/mydiffusion_zeggs/WavLM/.
  3. cd ./main/mydiffusion_zeggs/ and run
python sample.py --config=./configs/DiffuseStyleGesture.yml --no_cuda 0 --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./015_Happy_4_x_1_0.wav" --max_len 320
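If you script sampling, a small wrapper can keep the two GPU flags from drifting apart. The sketch below is a hypothetical helper (build_sample_cmd and as_shell are not part of this repository); it reuses only the flags from the command above and fills both --no_cuda and --gpu from one argument, because this README requires them to be the same.

```python
import shlex
from typing import List


# Hypothetical helper, not part of the DiffuseStyleGesture codebase.
def build_sample_cmd(gpu: int, model_path: str, audio_path: str, max_len: int,
                     config: str = "./configs/DiffuseStyleGesture.yml") -> List[str]:
    """Assemble the sample.py command line from the Quick Start step.

    --no_cuda and --gpu are both filled from the single `gpu` argument,
    since the README says they must name the same GPU.
    """
    if max_len < 0:
        raise ValueError("max_len must be >= 0 (0 means generate the whole length)")
    return [
        "python", "sample.py",
        "--config=" + config,
        "--no_cuda", str(gpu),
        "--gpu", str(gpu),
        "--model_path", model_path,
        "--audiowavlm_path", audio_path,
        "--max_len", str(max_len),
    ]


def as_shell(cmd: List[str]) -> str:
    # shlex.join() needs Python 3.8+, so quote manually for the 3.7 environment.
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    cmd = build_sample_cmd(0, "./model000450000.pt", "./015_Happy_4_x_1_0.wav", 320)
    print(as_shell(cmd))
    # From ./main/mydiffusion_zeggs/ you could launch it with:
    # subprocess.run(cmd, check=True)
```

Returning a list rather than a string lets you pass it straight to subprocess.run without shell quoting issues.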

You will get a .bvh file named yyyymmdd_hhmmss_smoothing_SG_minibatch_320_[1, 0, 0, 0, 0, 0]_123456.bvh in the sample_dir folder. It can then be visualized in Blender, giving the following result (to visualize .bvh files with Blender, see this issue and this tutorial video):

(Demo video: 0001-0933.mp4)

The parameters no_cuda and gpu must be the same, i.e. the index of the GPU you want to use. max_len is the length you want to generate; set it to 0 to generate the whole length. To use your own audio, rename the audio file as xxx_style_xxx.wav, e.g. 000_Neutral_xxx.wav (or Happy, Sad, ...). Please refer to this issue to set the style and intensity you want.
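The naming rule above places the style in the second underscore-separated field of the audio file name. A minimal sketch, assuming the style is always exactly that field; audio_style and styled_audio_name are hypothetical helpers, not functions from sample.py.

```python
from pathlib import Path


def audio_style(path: str) -> str:
    """Return the style field of a file named xxx_style_xxx.wav (hypothetical helper)."""
    p = Path(path)
    parts = p.stem.split("_")
    if p.suffix.lower() != ".wav" or len(parts) < 3 or not parts[1]:
        raise ValueError("expected a name like xxx_style_xxx.wav, got %r" % p.name)
    return parts[1]


def styled_audio_name(path: str, style: str, prefix: str = "000") -> str:
    """Build a conforming file name for your own recording (hypothetical helper).

    Only the file name is returned; the caller decides where to rename it.
    """
    if not style or "_" in style:
        raise ValueError("style must be a single token such as 'Neutral'")
    return "%s_%s_%s" % (prefix, style, Path(path).name)
```

For example, styled_audio_name("my_talk.wav", "Neutral") yields 000_Neutral_my_talk.wav, which you would then pass via --audiowavlm_path. The helper does not check style names, since the full list and the intensity settings are described in the linked issue.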

3. Train your own model

(1) Get ZEGGS dataset

Follow the same procedure as ZEGGS.

An example is as follows. Download the original ZEGGS dataset from here and put it in the ./ubisoft-laforge-ZeroEGGS-main/data/ folder. Then cd ./ubisoft-laforge-ZeroEGGS-main/ZEGGS and run python data_pipeline.py to process the dataset. You will get the ./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/train/ and ./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/test/ folders.

If you find it difficult to obtain and process the data, you can instead download the data already processed by ZEGGS from Tsinghua Cloud or Baidu Cloud and put it in the ./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/ folder.
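Before running the LMDB conversion, it can help to confirm that the processed folders are in place. A small sketch, assuming it is run from the repository root so the relative path from this README resolves; missing_splits is a hypothetical helper, and a non-empty folder is only a rough sign that processing succeeded.

```python
from pathlib import Path
from typing import List

# Path taken from this README; assumed to be relative to the repository root.
TRIMMED = Path("./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed")


def missing_splits(root: Path = TRIMMED) -> List[Path]:
    """List the train/test folders under root that are missing or empty."""
    missing = []
    for split in ("train", "test"):
        folder = root / split
        # is_dir() short-circuits, so iterdir() is only called on real folders.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(folder)
    return missing


if __name__ == "__main__":
    gaps = missing_splits()
    print("missing or empty:", [str(g) for g in gaps] if gaps else "none")
```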

(2) Process ZEGGS dataset

cd ./main/mydiffusion_zeggs/
python zeggs_data_to_lmdb.py

(3) Train

python end2end.py --config=./configs/DiffuseStyleGesture.yml --no_cuda 0 --gpu 0

The model will be saved in the ./main/mydiffusion_zeggs/zeggs_mymodel3_wavlm/ folder.
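To sample from your own model, pass a saved checkpoint to sample.py via --model_path. The sketch below picks the newest one, assuming your checkpoints are named like the released model000450000.pt; the README does not state the naming scheme for training output, and checkpoint_step and latest_checkpoint are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: training writes checkpoints named like the released model,
# "model" + zero-padded step count + ".pt" (e.g. model000450000.pt).
_CKPT_NAME = re.compile(r"^model(\d+)\.pt$")


def checkpoint_step(name: str) -> Optional[int]:
    """Return the step encoded in a checkpoint file name, or None if it does not match."""
    m = _CKPT_NAME.match(name)
    return int(m.group(1)) if m else None


def latest_checkpoint(save_dir: str) -> Optional[Path]:
    """Return the highest-step checkpoint in save_dir, or None if there is none."""
    found = []
    for p in Path(save_dir).glob("model*.pt"):
        step = checkpoint_step(p.name)
        if step is not None:
            found.append((step, p))
    if not found:
        return None
    return max(found, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Save folder from this README, relative to the repository root.
    print(latest_checkpoint("./main/mydiffusion_zeggs/zeggs_mymodel3_wavlm/"))
```

Comparing parsed step numbers rather than file names keeps the choice correct even if the zero-padding width ever changes.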

Reference

Our work is mainly inspired by MDM, Text2Gesture, and Listen, denoise, action!

Citation

If you find this code useful in your research, please cite:

@inproceedings{ijcai2023p650,
  title     = {DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models},
  author    = {Yang, Sicheng and Wu, Zhiyong and Li, Minglei and Zhang, Zhensong and Hao, Lei and Bao, Weihong and Cheng, Ming and Xiao, Long},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages     = {5860--5868},
  year      = {2023},
  month     = {8},
  doi       = {10.24963/ijcai.2023/650},
  url       = {https://doi.org/10.24963/ijcai.2023/650},
}

Please feel free to contact us ([email protected]) with any questions or concerns.
