
This project forked from yerfor/geneface


GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code

License: MIT License


GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao | Zhejiang University, ByteDance

arXiv | GitHub Stars | visitors | downloads | Chinese documentation

This repository is the official PyTorch implementation of our ICLR-2023 paper, in which we propose GeneFace for generalized and high-fidelity audio-driven talking face generation. The inference pipeline is as follows:



GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio. Watch this video for a clear lip-sync comparison against previous NeRF-based methods. You can also visit our project page for more details.

👷 This repo is still under construction ...

Improvements on the way:

  • Improve the efficiency of the NeRF-based renderer by integrating RAD-NeRF, which can infer in real time.
  • Improve the efficiency of 3DMM extraction by porting the TF-based deep_3drecon to a PyTorch-based one, which is compatible with newer GPUs such as the RTX3090 and is 8x faster than the current TF-based version.

🔥 Update:

  • 2023.3.7 An upgraded version of GeneFace, namely GeneFace-S, will be released on May 1st. It provides more accurate and stable lip shapes for various voices; it can be trained 4x faster (only 10 hours for a whole model) and can infer in real time (25fps) on an RTX2080! Please stay tuned!
  • 2023.2.22 We release a 1-minute demo video, in which GeneFace is driven by a Chinese song generated by DiffSinger.
  • 2023.2.20 We release a stable 3D landmark post-processing strategy in inference/ners/lm3d_nerf_infer.py, which improves the stability and quality of the final results by a large margin.

Quick Start!

We provide pre-trained models and processed datasets of GeneFace in this release to enable a quick start. Below, we show how to run inference with the pre-trained models in 4 steps. If you want to train GeneFace on your own target person video, please refer to the later sections (Prepare Environments, Prepare Datasets, and Train Models).

  • Step1. Create a new python env named geneface following the guide in docs/prepare_env/install_guide_nerf.md. Download BFM_model_front.mat at this link and place it into both the ./deep_3drecon/BFM and ./data_util/BFM_models directories.

  • Step2. Download lrs3.zip and May.zip from the release and unzip them into the checkpoints directory.

  • Step3. Download the binarized dataset of May.mp4 at this link (about 3.5 GB) and place it at data/binary/videos/May/trainval_dataset.npy.

After the above steps, the structure of your checkpoint and data directory should look like this:

> checkpoints
    > lrs3
        > lm3d_vae
        > syncnet
    > May
        > postnet
        > lm3d_nerf
        > lm3d_nerf_torso
> data
    > binary
        > videos
            > May
                trainval_dataset.npy
  • Step4. Run the scripts below:
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_nerf.sh

You can then find the output video at infer_out/May/pred_video/zozo.mp4.
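If one of the scripts fails with a missing-file error, it can help to compare your working tree against the layout listed in the steps above. The following is a minimal sketch of such a check, not part of the repository: the helper name `find_missing` is hypothetical, and the path list is assembled from Step1 to Step3 under the assumption that the script is run from the repository root.

```python
from pathlib import Path

# Paths taken from Step1-Step3 above, relative to the repository root.
EXPECTED_PATHS = [
    "checkpoints/lrs3/lm3d_vae",
    "checkpoints/lrs3/syncnet",
    "checkpoints/May/postnet",
    "checkpoints/May/lm3d_nerf",
    "checkpoints/May/lm3d_nerf_torso",
    "data/binary/videos/May/trainval_dataset.npy",
    "deep_3drecon/BFM/BFM_model_front.mat",
    "data_util/BFM_models/BFM_model_front.mat",
]


def find_missing(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected checkpoints and data files were found.")
```

The check only tests for existence; it does not validate the contents of the checkpoints or the dataset.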

Prepare Environments

Please follow the steps in docs/prepare_env.

Prepare Datasets

Please follow the steps in docs/process_data.

Train Models

Please follow the steps in docs/train_models.

Train GeneFace on other target person videos

Apart from the May.mp4 provided in this repo, we also provide 8 target person videos that were used in our experiments. You can download them at this link. To train on a new video named <video_id>.mp4, place it into the data/raw/videos/ directory, then create a new folder at egs/datasets/videos/<video_id> and edit the config files in it, using the provided example folder egs/datasets/videos/May as a template.
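The file placement described above can be sketched in a few lines of Python. This is an illustrative helper, not a script shipped with the repository: the function name `prepare_new_video` is hypothetical, and it assumes that copying the example folder egs/datasets/videos/May is a reasonable starting point. The copied config files still have to be edited by hand for the new video.

```python
import shutil
from pathlib import Path


def prepare_new_video(repo_root, video_file, example_id="May"):
    """Copy a video and the example config folder into place for a new video_id.

    Returns the new config folder, whose files still need manual editing.
    """
    root = Path(repo_root)
    video_file = Path(video_file)
    video_id = video_file.stem  # "Obama.mp4" -> "Obama"

    # Place the raw video under data/raw/videos/ unless it is already there.
    raw_dir = root / "data" / "raw" / "videos"
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / video_file.name
    if video_file.resolve() != target.resolve():
        shutil.copy2(video_file, target)

    # Clone the example config folder as a template for the new video.
    # copytree raises FileExistsError if the target folder already exists,
    # so an existing config is never overwritten.
    config_root = root / "egs" / "datasets" / "videos"
    new_config = config_root / video_id
    shutil.copytree(config_root / example_id, new_config)
    return new_config
```

For example, `prepare_new_video(".", "/path/to/Obama.mp4")` would copy the video to data/raw/videos/Obama.mp4 and create egs/datasets/videos/Obama as a copy of the May folder.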

You can also record your own video and train a unique GeneFace model for yourself!

Todo List

  • The inference process of the NeRF-based renderer is relatively slow (it takes about 2 hours on 1 RTX2080Ti to render 250 frames at 512x512 resolution with n_samples_per_ray_fine=128). Currently, this can be partially alleviated by using multiple GPUs or by setting --n_samples_per_ray and --n_samples_per_ray_fine to lower values. In the future we will add acceleration techniques to the NeRF-based renderer.
  • GeneFace uses 3D landmarks as the intermediate representation between the audio2motion and motion2image mappings. However, the 3D landmark sequence generated by the postnet sometimes contains bad cases (such as a shaking head or an extra-large mouth) that degrade the quality of the rendered video. Currently, we partially alleviate this problem by postprocessing the predicted 3D landmark sequence. We call for better postprocessing methods.
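To illustrate what postprocessing a landmark sequence can mean, here is a minimal sketch of temporal smoothing with a moving average. It is not the strategy implemented in this repository; it is a generic baseline. The function name `smooth_landmarks` and the assumed array layout `(T, N, 3)` (T frames, N landmarks, 3 coordinates) are illustrative assumptions.

```python
import numpy as np


def smooth_landmarks(lm3d, window=5):
    """Moving-average filter along the time axis.

    lm3d: array of shape (T, N, 3), i.e. T frames of N 3D landmarks (assumed layout).
    window: odd number of frames averaged per output frame.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    lm3d = np.asarray(lm3d, dtype=np.float64)
    half = window // 2
    # Repeat the first and last frames so the output keeps T frames.
    padded = np.pad(lm3d, ((half, half), (0, 0), (0, 0)), mode="edge")
    # A cumulative sum with a leading zero row gives every window sum in one pass.
    csum = np.cumsum(padded, axis=0)
    csum = np.concatenate([np.zeros_like(csum[:1]), csum], axis=0)
    return (csum[window:] - csum[:-window]) / window
```

A larger window suppresses more frame-to-frame jitter (the shaking-head case) but also blurs fast lip motion, so the window size trades stability against lip-sync sharpness.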

Citation

@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}

Acknowledgements

Our code is based on the following repos:

