M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation

Description

This repository holds the PyTorch implementation of the approach described in our report "M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation", which we used for our entry to the ABAW Challenge 2020 (VA track). We provide models trained on Aff-Wild2.

Update

2020.02.10: Initial public release

How to run

First, clone the repository and install the dependencies:

# clone the project
git clone https://github.com/sailordiary/m3t.pytorch
cd m3t.pytorch
# install dependencies
python3 -m pip install -r requirements.txt --user

To evaluate our pretrained models, first download the checkpoints from the release page, then run eval.py to generate validation or test set predictions:

# download the checkpoint
wget 
# to report CCC on the validation set
python3 eval.py --test_on_val --checkpoint m3t_mtl-vox2.pt
python3 get_smoothed_ccc predictions_val.pt
# to generate test set predictions
python3 eval.py --checkpoint m3t_mtl-vox2.pt
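
For reference, the CCC reported above is the concordance correlation coefficient between predicted and ground-truth valence (or arousal) sequences. The sketch below follows the standard textbook definition in plain Python; the function name `concordance_cc` is hypothetical, and it does not reproduce whatever smoothing `get_smoothed_ccc` applies before scoring.

```python
from math import fsum


def concordance_cc(preds, labels):
    """Concordance correlation coefficient of two equal-length sequences.

    Illustrative helper only (not taken from this repository).
    """
    n = len(preds)
    if n == 0 or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fsum(preds) / n
    mean_l = fsum(labels) / n

    # Population (1/n) variances and covariance, as in the usual CCC formula.
    var_p = fsum((p - mean_p) ** 2 for p in preds) / n
    var_l = fsum((l - mean_l) ** 2 for l in labels) / n
    cov = fsum((p - mean_p) * (l - mean_l) for p, l in zip(preds, labels)) / n

    denom = var_p + var_l + (mean_p - mean_l) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2.0 * cov / denom
```

Unlike Pearson correlation, CCC also penalizes a constant offset or scale mismatch between predictions and labels, so a perfectly correlated but shifted prediction scores below 1.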

Dataset

We use the Aff-Wild2 dataset. The raw videos are decoded with ffmpeg and passed to RetinaFace-ResNet50 for face detection. To extract log-Mel spectrogram energies, first extract 16 kHz mono WAV files from the audio tracks, then refer to process/extract_melspec.py.
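
The audio extraction step can be sketched as follows, assuming ffmpeg is on the PATH. The helper names and file paths are hypothetical and not part of this repository; only standard ffmpeg options are used (`-vn` drops the video stream, `-ac 1` downmixes to mono, `-ar 16000` resamples to 16 kHz).

```python
import subprocess


def build_wav_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg argument list that writes a mono WAV file.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # ignore the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sampling rate in Hz
        wav_path,
    ]


def extract_wav(video_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_wav_command(video_path, wav_path), check=True)
```

Calling `extract_wav("video.mp4", "video.wav")` on each raw video would produce the WAV files that process/extract_melspec.py then takes as input.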

We provide the cropped-aligned face tracks (256x256, ~79 GB zipped) as well as the pre-computed SENet-101 and TCAE features used in our experiments here: [OneDrive]

Some files are still being uploaded at this moment. Please check the page again later.

Note that in addition to the 256-dimensional encoder features, we also saved the 12 AU activation scores predicted by TCAE; the two are concatenated into a 268-dimensional vector for each video frame. We only used the encoder features for our experiments, but feel free to experiment with this extra information.
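
To recover the two parts from a saved per-frame vector, a split along these lines may help. Note the assumption: the text above does not state the concatenation order, so this sketch assumes the 256 encoder features come first and the 12 AU scores last. Please verify against the released files before relying on it. The function name is hypothetical.

```python
ENCODER_DIM = 256  # TCAE encoder feature size
NUM_AUS = 12       # number of AU activation scores


def split_tcae_vector(frame_vector):
    """Split one 268-d frame vector into (encoder_features, au_scores).

    ASSUMPTION: encoder features occupy the first 256 entries and the AU
    scores the last 12; the actual ordering should be checked in the data.
    Works on any 1-D sliceable sequence (list, NumPy array, tensor).
    """
    expected = ENCODER_DIM + NUM_AUS
    if len(frame_vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(frame_vector)}")
    return frame_vector[:ENCODER_DIM], frame_vector[ENCODER_DIM:]
```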

Model Zoo

Coming soon...

Citation

@misc{zhang2020m3t,
    title={$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild},
    author={Yuan-Hang Zhang and Rulin Huang and Jiabei Zeng and Shiguang Shan and Xilin Chen},
    year={2020},
    eprint={2002.02957},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
