
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation (ECCV 2022 Oral)

Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, Bolei Zhou.

Animating high-fidelity video portrait with speech audio is crucial for virtual reality and digital entertainment. While most previous studies rely on accurate explicit structural information, recent works explore the implicit scene representation of Neural Radiance Fields (NeRF) for realistic generation. In order to capture the inconsistent motions as well as the semantic difference between human head and torso, some work models them via two individual sets of NeRF, leading to unnatural results. In this work, we propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using one unified set of NeRF. The proposed model can handle the detailed local facial semantics and the global head-torso relationship through two semantic-aware modules. Specifically, we first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering. Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions. Extensive evaluations demonstrate that our proposed approach renders realistic video portraits.

Environment and Pre-requisites

This project is developed and tested on Ubuntu 18.04 with Python 3.7, PyTorch 1.9.1, and CUDA 11.1. Since the repository builds on AD-NeRF by Guo et al., the environment installation and some pre-requisites (e.g., PyTorch3D and the 3DMM basis parameters) generally follow theirs.

  • You can create an anaconda environment called sspnerf with:

    conda env create -f environment.yml
    conda activate sspnerf
    
  • PyTorch3D

    We recommend installing from a local clone:

    git clone https://github.com/facebookresearch/pytorch3d.git
    cd pytorch3d && pip install -e .
    

    Note: PyTorch3D is only used in a single dataset pre-processing step and is not involved in the later training process. If the main environment conflicts with the requirements of PyTorch3D, you can create a separate conda environment for PyTorch3D, activate it for the corresponding pre-processing step, and then switch back to the original environment (see the sketch after this list).

  • Basel Face Model 2009

    Put "01_MorphableModel.mat" into data_util/face_tracking/3DMM/, then run:

    cd data_util/face_tracking
    python convert_BFM.py
    
  • spconv

    We recommend installing it with the commands below (the generated wheel name may differ depending on your Python version):

    git clone https://github.com/traveller59/spconv --recursive
    cd spconv
    git checkout abf0acf30f5526ea93e687e3f424f62d9cd8313a
    git submodule update --init --recursive
    export CUDA_HOME="/usr/local/cuda-11.1"
    python setup.py bdist_wheel
    cd dist
    pip install spconv-1.2.1-cp36-cp36m-linux_x86_64.whl
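
If PyTorch3D's requirements conflict with the main sspnerf environment (see the PyTorch3D note above), one possible workaround is a dedicated conda environment that you activate only for the PyTorch3D-based pre-processing step. Below is a minimal sketch; the environment name "pytorch3d" and the exact install commands are assumptions and may need adjusting to your CUDA setup.

    # separate env used only for the PyTorch3D pre-processing step
    conda create -n pytorch3d python=3.7 -y
    conda activate pytorch3d
    # a PyTorch build matching CUDA 11.1 (the version this repo is tested on)
    pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
    # build PyTorch3D from the local clone
    cd pytorch3d && pip install -e . && cd ..
    # ... run the PyTorch3D-dependent pre-processing step here ...
    # then switch back to the main environment for the remaining steps
    conda activate sspnerf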
    

Dataset Preparation and Pre-processing

Dataset Download and Preparation

Due to YouTube copyright issues, we cannot provide the raw videos/frames. The YouTube video ID of each corresponding video is provided in video_id.txt. You can download them with tools like youtube-dl. Afterwards, select the clips of interest (those with a clear talking face of a certain speaker) and crop the frames into a square shape for later pre-processing (i.e., so that the head, torso, and background all remain in the frame).
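
As a concrete illustration only (not part of the official pipeline), the commands below download one video with youtube-dl and use ffmpeg to cut a clip and crop it to a square. The video ID, time range, and crop window are placeholders that you would adapt per speaker.

    # download one of the videos listed in video_id.txt (VIDEO_ID is a placeholder)
    youtube-dl -f mp4 -o raw_video.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"
    # cut a clip with a clear talking face and crop it to a square region
    # crop=w:h:x:y -- here a hypothetical 720x720 window starting at x=600, y=0
    ffmpeg -i raw_video.mp4 -ss 00:00:10 -to 00:01:10 -vf "crop=720:720:600:0" -c:a copy clip.mp4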

Data Pre-processing

The data pre-processing code is included in the data_util folder, and the main pre-processing entry point is data_util/process_data.py. To run it step by step, you can use the bash command below:

bash process_data.sh $person_name$

Compared with the AD-NeRF pipeline, we additionally parse more detailed facial regions and save the 3DMM facial keypoints.
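
For example, with a hypothetical speaker identifier Obama (replace it with the name you used when preparing the data), the call would be:

    bash process_data.sh Obama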

Training

Since this work was done during an internship at SenseTime Research, part of the code still needs to go through an internal review and permission process before release.

Citation

If you find our work useful, please kindly cite as:

@article{liu2022semantic,
  title = {Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation},
  author = {Liu, Xian and Xu, Yinghao and Wu, Qianyi and Zhou, Hang and Wu, Wayne and Zhou, Bolei},
  journal = {arXiv preprint arXiv:2201.07786},
  year = {2022}
}

Related Projects

If you are interested in NeRF / neural implicit representations combined with semantic maps, we also recommend checking out these related works:

  • Object-compositional implicit neural surfaces: [ECCV 2022] ObjectSDF.

  • Neural implicit generative model: [ECCV 2022] Sem2NeRF.

Acknowledgement

  • The codebase is developed based on AD-NeRF by Guo et al. Many thanks for their great contributions!
