EMOTION2VEC

Official PyTorch code for extracting features and training downstream models with
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

emotion2vec Logo

(Logo generated by DALL·E 3)


News

  • 🆕 A 9-class emotion recognition model obtained by iteratively fine-tuning emotion2vec has been released in modelscope and FunASR. First, emotion2vec is fine-tuned on academic speech emotion recognition datasets; then 150k hours of Chinese and English data are labeled, and the samples whose text emotion matches the speech emotion with high SER confidence (more than 10k hours in total) are selected to fine-tune emotion2vec again, yielding the weights of this release.
  • emotion2vec has been integrated into modelscope and FunASR.
  • We release the paper and create a WeChat group for emotion2vec.
  • We release code, checkpoints, and extracted features for emotion2vec.

Guides

emotion2vec is the first universal speech emotion representation model. Through self-supervised pre-training, emotion2vec can extract emotion representations across different tasks, languages, and scenarios.

Performance

Performance on IEMOCAP

emotion2vec achieves SOTA with only linear layers on the mainstream IEMOCAP dataset. Refer to the paper for more details.

Performance on other languages

emotion2vec outperforms state-of-the-art SSL models on multiple languages (Mandarin, French, German, Italian, etc.). Refer to the paper for more details.

Performance on other speech emotion tasks

Refer to the paper for more details.

Visualization

UMAP visualizations of learned features on the IEMOCAP dataset. Red and blue tones denote low- and high-arousal emotion classes, respectively. Refer to the paper for more details.
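
As a rough sketch of how a similar plot can be produced from utterance-level features, the snippet below projects them to 2D with UMAP; the file names and the binary arousal labels are placeholders, not the exact setup used in the paper.

import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

# Placeholder inputs: (N, 768) utterance-level features and binary arousal labels.
feats = np.load("iemocap_utt_feats.npy")      # hypothetical file name
arousal = np.load("iemocap_arousal.npy")      # 0 = low arousal, 1 = high arousal (assumed)

emb = umap.UMAP(n_components=2, random_state=0).fit_transform(feats)
plt.scatter(emb[arousal == 0, 0], emb[arousal == 0, 1], c="red", s=5, label="low arousal")
plt.scatter(emb[arousal == 1, 0], emb[arousal == 1, 1], c="blue", s=5, label="high arousal")
plt.legend()
plt.savefig("umap_iemocap.png", dpi=200)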

Extract features

Download extracted features

We provide extracted features for the popular emotion dataset IEMOCAP. The features are taken from the last layer of emotion2vec, stored in .npy format, with a frame rate of 50 Hz. Utterance-level features are computed by averaging the frame-level features.

All wav files are extracted from the original dataset to support diverse downstream tasks. If you want to train with the standard 5,531 utterances for 4-class emotion classification, please refer to the iemocap_downstream folder.
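
As a minimal example of working with these files, the snippet below loads one frame-level feature file and pools it to an utterance-level vector; the path is hypothetical, and the (T, 768) layout follows the description above.

import numpy as np

# Hypothetical path to one extracted feature file (frame-level, 50 Hz, 768-dim).
frame_feats = np.load("IEMOCAP_features/Ses01F_impro01_F000.npy")   # shape: (T, 768)

# Utterance-level feature, as described above: the mean over frames.
utt_feat = frame_feats.mean(axis=0)                                 # shape: (768,)
print(frame_feats.shape, utt_feat.shape)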

Extract features from your dataset

Install from the source code

The minimum environment requirements are python>=3.8 and torch>=1.13. Our testing environment uses python 3.8 and torch 2.0.1.

  1. git clone the repo and install fairseq:
     pip install fairseq
     git clone https://github.com/ddlBoJack/emotion2vec.git
  2. Download the emotion2vec checkpoint from:
  3. Modify and run scripts/extract_features.sh

Install from modelscope (Recommended)

  1. Install modelscope and funasr:
     pip install -U funasr modelscope
  2. Run the code:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

'''
Using the emotion representation model
rec_result only contains {'feats'}
	granularity="utterance": {'feats': [*768]}
	granularity="frame": {feats: [T*768]}
'''
inference_pipeline = pipeline(
    task=Tasks.emotion_recognition,
    model="iic/emotion2vec_base", model_revision="v2.0.4")
rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav', output_dir="./outputs", granularity="utterance")
print(rec_result)
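
# Illustrative sketch (an assumption, not from the original example): turn the
# returned representation into a numpy vector for downstream use. Assumes
# rec_result follows the structure documented above; the exact container
# (a dict or a list of dicts) may vary across modelscope versions.
import numpy as np
item = rec_result[0] if isinstance(rec_result, list) else rec_result
embedding = np.asarray(item["feats"], dtype=np.float32)   # (768,) for granularity="utterance"
print(embedding.shape)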

'''
Using the finetuned emotion recognition model
rec_result contains {'feats', 'labels', 'scores'}
	extract_embedding=False: 9-class emotions with scores
	extract_embedding=True: 9-class emotions with scores, along with features

9-class emotions:
    0: angry
    1: disgusted
    2: fearful
    3: happy
    4: neutral
    5: other
    6: sad
    7: surprised
    8: unknown
'''
inference_pipeline = pipeline(
    task=Tasks.emotion_recognition,
    model="iic/emotion2vec_base_finetuned", model_revision="v2.0.4")
rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav', output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(rec_result)

The model will be downloaded automatically.

Refer to the modelscope pages of emotion2vec_base and emotion2vec_base_finetuned for more details.

Install from FunASR

  1. Install funasr:
     pip install -U funasr
  2. Run the code:
from funasr import AutoModel

'''
Using the emotion representation model
rec_result only contains {'feats'}
	granularity="utterance": {'feats': [*768]}
	granularity="frame": {feats: [T*768]}
'''
model = AutoModel(model="iic/emotion2vec_base", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/test.wav"
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance")
print(rec_result)

'''
Using the finetuned emotion recognition model
rec_result contains {'feats', 'labels', 'scores'}
	extract_embedding=False: 9-class emotions with scores
	extract_embedding=True: 9-class emotions with scores, along with features

9-class emotions:
    0: angry
    1: disgusted
    2: fearful
    3: happy
    4: neutral
    5: other
    6: sad
    7: surprised
    8: unknown
'''
model = AutoModel(model="iic/emotion2vec_base_finetuned", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/test.wav"
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(rec_result)

The model will be downloaded automatically.
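
If you only need the predicted class from the finetuned model, a minimal post-processing sketch is shown below; it assumes each element of rec_result is a dict with parallel 'labels' and 'scores' lists, as documented in the comment above.

# Illustrative only: pick the top-scoring emotion from the finetuned model's output.
# Assumes rec_result is a list with one dict per utterance containing parallel
# 'labels' and 'scores' lists (see the comment in the example above).
item = rec_result[0]
best_label, best_score = max(zip(item["labels"], item["scores"]), key=lambda p: p[1])
print(f"Predicted emotion: {best_label} (score {best_score:.3f})")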

FunASR supports file-list input in wav.scp (Kaldi style):

wav_name1 wav_path1.wav
wav_name2 wav_path2.wav
...
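
A hedged sketch of batch inference with such a list is shown below; it assumes model.generate accepts the wav.scp path in place of a single wav file, so check the FunASR documentation for the exact behaviour of your version.

from funasr import AutoModel

model = AutoModel(model="iic/emotion2vec_base", model_revision="v2.0.4")

# Assumption: FunASR accepts a Kaldi-style wav.scp as input and returns one
# result entry per listed utterance; verify against the FunASR docs.
rec_results = model.generate("wav.scp", output_dir="./outputs", granularity="utterance")
print(len(rec_results))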

Refer to FunASR for more details.

Training your downstream model

We provide training scripts for the IEMOCAP dataset in the iemocap_downstream folder. You can modify them to train your downstream model on other datasets.
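
As a rough illustration of what such a downstream model can look like (the paper reports SOTA on IEMOCAP with only linear layers), here is a minimal linear-probe sketch over pre-extracted 768-dim utterance features; the file paths and hyperparameters are placeholders, not the settings used in iemocap_downstream.

import numpy as np
import torch
import torch.nn as nn

# Hypothetical pre-extracted data: (N, 768) utterance features and 4-class labels.
feats = torch.from_numpy(np.load("train_utt_feats.npy")).float()
labels = torch.from_numpy(np.load("train_labels.npy")).long()

probe = nn.Linear(768, 4)                      # single linear layer on frozen features
optim = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                        # placeholder number of epochs
    optim.zero_grad()
    loss = loss_fn(probe(feats), labels)
    loss.backward()
    optim.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")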

Citation

If you find our emotion2vec code and paper useful, please kindly cite:

@article{ma2023emotion2vec,
  title={emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation},
  author={Ma, Ziyang and Zheng, Zhisheng and Ye, Jiaxin and Li, Jinchao and Gao, Zhifu and Zhang, Shiliang and Chen, Xie},
  journal={arXiv preprint arXiv:2312.15185},
  year={2023}
}

