ddlbojack / emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Hello, could you update the WeChat group QR code?
How are utterance embeddings obtained? Are they derived from frame-level features through convolution or pooling?
I was trying to create IEMOCAP embeddings on my own, but my GPU with 8 GB of memory gave me a CUDA OOM error. How much memory do I need to process this?
Feature extraction only works with the base model. Is there any plan to fix this?
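For readers unfamiliar with the pooling option mentioned in the question, here is a minimal sketch of mean pooling over time, which is one common way to turn frame-level features into a single utterance vector. It is an illustration only: the thread does not confirm which method emotion2vec uses, and the helper name `mean_pool` is hypothetical.

```python
def mean_pool(frames):
    """Average frame-level vectors (T x D) into one D-dimensional vector.

    Illustrative only; not confirmed to be the authors' method.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # zip(*frames) iterates over feature dimensions (columns).
    return [sum(column) / len(frames) for column in zip(*frames)]


# Toy input: 3 frames with 2 features each (real features would be e.g. 768-dim).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(frames))  # [3.0, 4.0]
```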
omegaconf.errors.ValidationError: Object of unsupported type: '_MISSING_TYPE'
full_key:
reference_type=None
object_type=None
Is this due to a software package conflict? I can't solve this problem.
Thank you for your contribution; your work is truly amazing. However, I would like to train emotion2vec for a pretraining task. Could you provide the source code or offer any suggestions?
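Hello! One of my recent projects used emotion2vec. Could I join the group chat to communicate with you? My WeChat is available through my profile picture (a QR code). If you are not busy, you can add me on WeChat by scanning it. Thank you very much.
Many thanks to the authors for open-sourcing such a good pre-trained emotion model.
I saw the following description on ModelScope:
First, emotion2vec is fine-tuned on academic speech emotion recognition datasets; then 150,000 hours of Chinese and English data are labeled, keeping only the data where the text emotion matches the speech emotion and the confidence is high.
Could you open-source the text emotion model and the speech emotion model trained on the academic datasets? I would like to train a 3-class model based on this method.
Thanks!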
Thank you for sharing your nice work!
In the script emotion2vec_extract_features.sh, I noticed that features are extracted from the last layer.
Have you tried extracting features from other layers as well?
I'm just curious if this approach is based on empirical insight.
Please update the QR code.
When loading the data2vec2 model with fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path]), an error occurs: KeyError: "_name". Could you please tell me how to solve this model-loading problem?
Could you please share the script for training the network on the upstream task? I want to fine-tune the model.
Thanks!
What is the dataset Emo-262? Does your group collect it and will it be available for the public? How can I get it?
Note: in the Table 2 caption, LSSED is misspelled as LSED. You may want to check the paper.
Hello, the group chat QR code has expired.
As in the title.
Hi, thank you very much for your work.
I would like to build on your work and do some interesting follow-up work.
I have not found anything about fine-tuning the model on ModelScope or GitHub.
Could you please guide me on how to fine-tune and retrain your model?
Many thanks.
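I actually have a requirement: for long audio, I need to slice it and compute emotion classification probabilities per slice, for example one result every 5 s. However, the pipeline API is currently wrapped too rigidly and does not support this; it only computes a single result from a global average. It would be great if the pipeline interface accepted an additional slice-length input, so that the resulting probability vector gained a time dimension.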
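The per-slice scoring requested here can be approximated outside the pipeline by cutting the waveform into fixed-length windows and scoring each window separately. The sketch below is a hedged illustration: `chunk_bounds` and `scores_over_time` are hypothetical helpers, and `classify` stands in for whatever function returns emotion probabilities for one chunk; it is not an emotion2vec or pipeline API.

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds):
    """Return (start, end) sample indices of consecutive fixed-length windows.

    The last window is shorter when the audio length is not a multiple
    of the window size.
    """
    chunk = int(sample_rate * chunk_seconds)
    if chunk <= 0:
        raise ValueError("chunk length must cover at least one sample")
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]


def scores_over_time(waveform, sample_rate, chunk_seconds, classify):
    """Apply `classify` (a hypothetical per-chunk scorer) to every window."""
    bounds = chunk_bounds(len(waveform), sample_rate, chunk_seconds)
    return [classify(waveform[start:end]) for start, end in bounds]


# Toy run: 25 samples at 2 Hz with 5 s windows gives windows of 10, 10, 5 samples.
# `len` is a stand-in for a real classifier here.
print(chunk_bounds(25, 2, 5))                      # [(0, 10), (10, 20), (20, 25)]
print(scores_over_time([0.0] * 25, 2, 5, len))     # [10, 10, 5]
```

With a real classifier, each list entry would be one probability vector, so the output gains the time dimension described above. Very short trailing windows may give unreliable predictions and could be dropped or merged into the previous window.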
Hey author, thanks for open-sourcing this.
I wanted to ask whether emotion2vec is better than https://github.com/audeering/w2v2-how-to
Thanks in advance.
Sorry for missing the last update.
Dear Authors,
You have only shared the train.npy, train.lengths, and train.emo files in the iemocap_downstream folder.
Would you mind also sharing the test and dev versions of these files? This would make testing your models more convenient.
Thank you in advance.
Best regards,
Aaron
I want to know whether emotion2vec can run on an ARM server.
Thank you for providing the code!
I am a novice in the field of SER. I have trained the downstream model using the provided train.npy, train.lengths, and train.emo files, but I'm unsure how to use the obtained model for category inference on the features within train.npy.
I noticed that the shape of the train.npy you provided is (1253877, 768). In my understanding, it represents 1253877 samples with 768-dimensional features each. I would like to classify these 1253877 samples using the pre-trained model. How can I achieve this?
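One possible reading, which this thread does not confirm, is that the rows of train.npy are frame-level features concatenated across utterances, with each entry in train.lengths giving the number of frames in one utterance. Under that assumption, the matrix would first have to be split back into utterances before classifying them. A minimal sketch follows; `split_by_lengths` is a hypothetical helper, and the toy data replaces the real files.

```python
from itertools import accumulate


def split_by_lengths(frames, lengths):
    """Split a concatenated frame sequence into per-utterance chunks.

    Assumes (not confirmed by the source) that `lengths[i]` is the number
    of consecutive rows of `frames` belonging to utterance i.
    """
    if sum(lengths) != len(frames):
        raise ValueError("lengths must sum to the number of frames")
    ends = list(accumulate(lengths))   # exclusive end offset of each utterance
    starts = [0] + ends[:-1]           # start offset of each utterance
    return [frames[start:end] for start, end in zip(starts, ends)]


# Toy data: 5 frames with 1 feature each, belonging to 2 utterances.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
lengths = [2, 3]
utterances = split_by_lengths(frames, lengths)
print([len(u) for u in utterances])  # [2, 3]
```

The helper relies only on `len` and slicing, so it should also work on a NumPy array loaded from disk. Each resulting chunk would then be passed to the downstream model as one utterance.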
Hi @ddlBoJack,
Please share some information about the checkpoint file linked in the README. Is it the best-performing model so far?
Also, for the train.py file given for IEMOCAP, are the features frame-level or utterance-level?
Thanks,
Hello!
Thank you for such nice work!
I am performing speaker diarization with pyannote and want to run emotion detection on the audio segments I receive from the diarization model. The segments vary in length. I'm sure I'll have to do some kind of splitting because very long segments (around 200 s) cause CUDA OOM, but I'm wondering what the optimal segment length is for the emotion2vec_plus_large model: 3 seconds, 15 seconds, or something else?
Thank you!