

Multitask Emotion Recognition with Uncertainty Prediction

This repository contains our solution for the ABAW2021 Challenge (team name: NISL-2021). We won first place in the valence-arousal (VA) challenge track (leaderboard). On the test set, the valence CCC is 0.5326 and the arousal CCC is 0.4537.
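For reference, the VA track is scored with the Concordance Correlation Coefficient (CCC). Below is a generic sketch of the standard CCC formula, not the evaluation code from this repository:

```python
import numpy as np

def ccc(preds, labels):
    """Concordance Correlation Coefficient.

    Standard definition:
    2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mp, ml = preds.mean(), labels.mean()
    cov = ((preds - mp) * (labels - ml)).mean()
    return 2 * cov / (preds.var() + labels.var() + (mp - ml) ** 2)
```

A CCC of 1 means perfect agreement in both correlation and scale; unlike Pearson correlation, it penalizes shifts in mean and variance.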

Three Emotion Descriptors

We trained unified models to predict three emotion descriptors:

  • 12 facial action units, i.e., AU1, AU2, AU4, AU6, AU7, AU10, AU12, AU15, AU23, AU24, AU25, AU26

  • 7 basic emotions, i.e., neutral, anger, disgust, fear, happiness, sadness and surprise

  • continuous emotions: valence and arousal in the range [-1, 1].
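A unified model emits all three descriptors at once. The sketch below shows one plausible way to split a raw output vector into the three heads; the 12+7+2 layout and the activations are assumptions for illustration, not the repository's exact implementation:

```python
import numpy as np

def package_predictions(raw):
    """Split a raw 21-dim output into the three emotion descriptors.

    Assumed layout: 12 AU logits, 7 basic-emotion logits,
    then valence and arousal.
    """
    aus = 1.0 / (1.0 + np.exp(-raw[:12]))          # per-AU sigmoid probabilities
    exp_logits = raw[12:19]
    exps = np.exp(exp_logits - exp_logits.max())
    exps = exps / exps.sum()                       # softmax over 7 emotions
    va = np.clip(np.tanh(raw[19:21]), -1.0, 1.0)   # valence/arousal in [-1, 1]
    return {"AU": aus, "EXPR": exps, "VA": va}
```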

Models

Our models use efficient CNN-RNN architectures. The visual and visual-audio model architectures are shown in the picture below. The visual backbone is MobileFaceNet and the audio backbone is MarbleNet.

(Architecture diagram: visual and visual-audio models)

We list the number of parameters of our visual model (EMENet-V) and visual-audio model (EMENet-VA) as follows:

Model       # Param.   FLOPs
EMENet-V    1.68M      228M
EMENet-VA   1.91M      234M

Note that the FLOPs count is the number of floating-point operations when the visual input is a single RGB image (112x112) and the audio input is a single mel spectrogram (64x64). Our models can also accept a sequence of facial images and a sequence of spectrograms.

We trained not only single models but also deep ensembles. A deep ensemble consists of several models with the same architecture but different random initializations. We used deep ensembles for emotion uncertainty estimation. For the aleatoric and epistemic uncertainty of emotions, please refer to our paper and our presentation video.
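The sketch below shows the standard deep-ensemble uncertainty decomposition for a single continuous target (e.g., valence): aleatoric uncertainty as the average of the models' predicted variances, epistemic uncertainty as the disagreement between the models' means. Whether this matches the paper's exact formulation is an assumption; see the paper for details.

```python
import numpy as np

def ensemble_uncertainty(means, variances):
    """Combine per-model Gaussian predictions for one target.

    means, variances: arrays of shape (n_models,), one entry per
    ensemble member.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    pred = means.mean()              # ensemble prediction
    aleatoric = variances.mean()     # average predicted data noise
    epistemic = means.var()          # disagreement between models
    return pred, aleatoric, epistemic
```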

Requirements

  1. Python 3.9
  2. CUDA 11.0
  3. Install the other requirements with
pip install -r requirements.txt
  4. When using audio-visual models, NeMo is required. Install it with:
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit['all']

Usage

To use the visual models trained by us, change the current working directory to Training. You can download model weights from here.

To use the visual-audio models, change the current working directory to Train_Multimodal. You can download model weights from here.

After downloading the model weights, put the compressed file in current-workdir/checkpoints/ and uncompress it. The model weights are stored in separate folders, where each folder corresponds to a single model in an ensemble.
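Given that layout, the ensemble members can be enumerated by listing the subfolders of checkpoints/. This small helper is an illustration of that convention, not a utility shipped with the repository:

```python
from pathlib import Path

def list_ensemble_checkpoints(root="checkpoints"):
    """Return the names of the per-model weight folders under root.

    Per the layout above, each subfolder of checkpoints/ holds the
    weights of one ensemble member.
    """
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())
```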

Here are a couple of tasks you can perform with our implementation.

Inference on one image

You can load a single visual model's weights and then infer the three emotion descriptors for a single input image.

To do this, you can first prepare a cropped face image, like the image example_image/00081.jpg.

Then set test_image_path in inference_one_image.py (in the Training directory) to the path of your face image.

Suppose you want to load the model weights in student_round_3_exp_0; you can run

python inference_one_image.py --names student_round_3_exp_0 --load_epochs 4 --cuda --auxiliary

You will obtain an output image that illustrates the predictions and aleatoric uncertainty of the three emotion descriptors. A larger bar indicates higher uncertainty.

(Example output: predictions and uncertainty bars for the three emotion descriptors)

A live video demo

This live demonstration uses the webcam to capture human faces and predicts the three emotion descriptors and their uncertainty.

The main script for this demo is Demo/Demo_Visual_Model.py. You can run this demo with:

python Demo_Visual_Model.py --use_cuda --save_file output.avi

This live demo will save an output video output.avi to the current directory. You may press q to exit this demo.

Predicting on videos

You can load an ensemble of models' weights and predict the three emotion descriptors on the test set of the Aff-wild2 dataset. This requires you to prepare the test set data in advance.

First, run create_test_set_file.py with the correct directories:

python create_test_set_file.py --data_dir directory-to-cropped-faces-test-set --video_dir directory-to-video-files-test-set 

This will create a test_set.pkl that stores the test-set frame paths and their extracted audio files.
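If you want to inspect test_set.pkl, it can be read back with the standard pickle module. The structure shown here is hypothetical, for illustration only; the actual keys written by create_test_set_file.py may differ:

```python
import pickle

# Hypothetical layout: per-video entries with frame paths and an audio file.
test_set = {
    "video_01": {
        "frames": ["cropped_faces/video_01/00001.jpg",
                   "cropped_faces/video_01/00002.jpg"],
        "audio": "extracted_audio/video_01.wav",
    },
}

with open("test_set.pkl", "wb") as f:
    pickle.dump(test_set, f)

with open("test_set.pkl", "rb") as f:
    loaded = pickle.load(f)
```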

Then with visual models, you can run

python test_ensemble.py --names student_round_3_exp_0 student_round_3_exp_3 student_round_3_exp_5 student_round_3_exp_6 student_round_3_exp_9 --load_epochs 4 8 4 7 6 --cuda

This will load the model weights from five visual-model experiments (e.g., the epoch-4 weights from student_round_3_exp_0). A folder named ensemble_preds_student_round_3 will then be created, where all predictions are stored.

For visual-audio models, replace the experiment names and epoch numbers according to the downloaded weight files.

Cite

If you are interested, please cite our work:

@inproceedings{deng2021iterative,
  title={Iterative Distillation for Better Uncertainty Estimates in Multitask Emotion Recognition},
  author={Deng, Didan and Wu, Liang and Shi, Bertram E},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3557--3566},
  year={2021}
}

Contributors

  • wtomin
