This project forked from sailordiary/m3f.pytorch

PyTorch code for "M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation"

Home Page: https://github.com/swayhrl/m3t.pytorch

License: MIT License

M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation

Description

This repository holds the PyTorch implementation of the approach described in our report "M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation", which we used for our entry to the ABAW Challenge 2020 (VA track). We provide models trained on Aff-Wild2.

Update

  • 2020.02.10: Initial public release

How to run

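First, clone the project and install the dependencies:

# clone the project and enter its directory
git clone https://github.com/sailordiary/m3t.pytorch
cd m3t.pytorch
# install the Python dependencies
python3 -m pip install -r requirements.txt --user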

To evaluate our pretrained models, first download the checkpoints from the release page, then run eval.py to generate validation or test set predictions:

# download the checkpoint
wget 
# to report CCC on the validation set
python3 eval.py --test_on_val --checkpoint m3t_mtl-vox2.pt
python3 get_smoothed_ccc predictions_val.pt
# to generate test set predictions
python3 eval.py --checkpoint m3t_mtl-vox2.pt
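
For reference, the concordance correlation coefficient (CCC) reported on the validation set can be computed as in the minimal sketch below. It uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population statistics. The function name `concordance_cc` is hypothetical, and the sketch does not reproduce whatever smoothing the repository's own script applies before scoring.

```python
def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    Illustrative only: `concordance_cc` is a hypothetical helper, not a
    function taken from this repository.
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population (divide-by-n) variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    # CCC is undefined when both sequences are the same constant;
    # returning 0.0 in that case is a convention of this sketch.
    return 2.0 * cov / denom if denom > 0 else 0.0
```

Unlike Pearson correlation, CCC also penalizes a constant offset between predictions and labels: shifting every prediction by the same amount leaves the covariance unchanged but increases the denominator.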

Dataset

We use the Aff-Wild2 dataset. The raw videos are decoded with ffmpeg and passed to RetinaFace-ResNet50 for face detection. To extract log-Mel spectrogram energies, first extract 16 kHz mono wave files from the audio tracks, then refer to process/extract_melspec.py.
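
The audio extraction step could look like the sketch below, which builds an ffmpeg command line for a 16 kHz mono wave file. The helper name `ffmpeg_wav_command`, the file names, and the choice of 16-bit PCM (`pcm_s16le`) are assumptions made for illustration; the exact settings used for the released features may differ.

```python
import subprocess


def ffmpeg_wav_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg argument list that extracts a mono wave file.

    Hypothetical helper for illustration; not part of this repository.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-acodec", "pcm_s16le",   # assumed: 16-bit PCM samples
        wav_path,
    ]


def extract_wav(video_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(ffmpeg_wav_command(video_path, wav_path), check=True)
```

Calling `extract_wav("video.mp4", "video.wav")` (placeholder file names) requires ffmpeg to be on the PATH.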

We provide the cropped-aligned face tracks (256x256, ~79 GB zipped) as well as the pre-computed SENet-101 and TCAE features used in our experiments here: [OneDrive]

Some files are still being uploaded at this moment. Please check the page again later.

Note that in addition to the 256-dimensional encoder features, we also saved the 12 AU activation scores predicted by TCAE; the two are concatenated into a 268-dimensional vector for each video frame. We used only the encoder features in our experiments, but feel free to experiment with this extra information.
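
As a rough sketch, a per-frame TCAE vector could be split back into its two parts as shown below. The ordering (encoder features first, AU scores last) is an assumption, not something confirmed here, so check it against the released files before relying on it. `split_tcae_features` is a hypothetical helper name.

```python
ENCODER_DIM = 256  # encoder feature size stated above
NUM_AUS = 12       # number of AU activation scores stated above


def split_tcae_features(frame_vec):
    """Split one 268-dimensional frame vector into (encoder, au_scores).

    ASSUMPTION: the encoder features occupy the first 256 entries and the
    AU scores the last 12. Works on any 1-D sequence that supports len()
    and slicing (list, NumPy array, PyTorch tensor).
    """
    if len(frame_vec) != ENCODER_DIM + NUM_AUS:
        raise ValueError(
            f"expected {ENCODER_DIM + NUM_AUS} values, got {len(frame_vec)}"
        )
    return frame_vec[:ENCODER_DIM], frame_vec[ENCODER_DIM:]
```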

Model Zoo

Coming soon...

Citation

@misc{zhang2020m3t,
    title={$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild},
    author={Yuan-Hang Zhang and Rulin Huang and Jiabei Zeng and Shiguang Shan and Xilin Chen},
    year={2020},
    eprint={2002.02957},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
