audeering / w2v2-how-to

How to use our public wav2vec2 dimensional emotion model

License: MIT License

speech-emotion-recognition deep-learning wav2vec2 transformer-models arousal dominance valence msp-podcast onnx

w2v2-how-to's Introduction

How to use our public dimensional emotion model

An introduction to our model for dimensional speech emotion recognition based on wav2vec 2.0. The model is available from doi:10.5281/zenodo.6221127 and released under CC BY-NC-SA 4.0. The model was created by fine-tuning the pre-trained wav2vec2-large-robust model on MSP-Podcast (v1.7). The pre-trained model was pruned from 24 to 12 transformer layers before fine-tuning. In this tutorial we use the ONNX export of the model. The original Torch model is hosted at Hugging Face. Further details are given in the associated paper.

License

The model can be used for non-commercial purposes, see CC BY-NC-SA 4.0. For commercial usage, a license for devAIce must be obtained. The source code in this GitHub repository is released under the MIT license.

Quick start

Create and activate a Python virtual environment, then install audonnx.

$ pip install audonnx

Load the model and test it on a random signal.

import audeer
import audonnx
import numpy as np


url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')

archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)
model = audonnx.load(model_root)

sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
model(signal, sampling_rate)
{'hidden_states': array([[-0.00711814,  0.00615957, -0.00820673, ...,  0.00666412,
          0.00952989,  0.00269193]], dtype=float32),
 'logits': array([[0.6717072 , 0.6421313 , 0.49881312]], dtype=float32)}

The hidden states might be used as embeddings for related speech emotion recognition tasks. The order in the logits output is: arousal, dominance, valence.
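
For convenience, the three logits can be zipped with their dimension names. A minimal sketch continuing the quick-start snippet above (the names list simply restates the order documented here):

outputs = model(signal, sampling_rate)

# The logits arrive in the documented order: arousal, dominance, valence.
names = ['arousal', 'dominance', 'valence']
scores = dict(zip(names, outputs['logits'][0].tolist()))
print(scores)  # {'arousal': 0.67..., 'dominance': 0.64..., 'valence': 0.49...}

# The hidden states can serve as an utterance-level embedding,
# e.g. as input features for a downstream classifier.
embedding = outputs['hidden_states'][0]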

Tutorial

For a detailed introduction, please check out the notebook.

$ pip install -r requirements.txt
$ jupyter notebook notebook.ipynb 

Citation

If you use our model in your own work, please cite the following paper:

@article{wagner2023dawn,
    title={Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap},
    author={Wagner, Johannes and Triantafyllopoulos, Andreas and Wierstorf, Hagen and Schmitt, Maximilian and Burkhardt, Felix and Eyben, Florian and Schuller, Bj{\"o}rn W},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    pages={1--13},
    year={2023},
}

w2v2-how-to's People

Contributors

frankenjoe, hagenw

w2v2-how-to's Issues

Convert VAD to Ekman

Hello,

This model provides VAD values in 3D space.

However, the Ekman model is more intuitive for sharing results with users.

I have found papers with 3D representations hinting at how to perform this conversion.

Are you aware of a straightforward approach to perform the conversion between both models?

Ideally in Python, but any hint on the algorithm would also do.

Best,

Ed
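
There is no canonical conversion in this repository. A common heuristic in the literature is nearest-prototype matching in VAD space; a minimal sketch, where the prototype coordinates are illustrative placeholders (not values from the paper) that would need calibration against real data:

import numpy as np

# Hypothetical arousal/dominance/valence prototypes for Ekman's six
# basic emotions, scaled to the model's [0, 1] output range. These
# coordinates are illustrative placeholders only.
PROTOTYPES = {
    'anger':     (0.9, 0.8, 0.2),
    'disgust':   (0.5, 0.6, 0.2),
    'fear':      (0.8, 0.2, 0.2),
    'happiness': (0.7, 0.6, 0.9),
    'sadness':   (0.2, 0.3, 0.2),
    'surprise':  (0.8, 0.5, 0.6),
}

def vad_to_ekman(arousal, dominance, valence):
    # Return the Ekman category whose prototype is nearest in VAD space.
    point = np.array([arousal, dominance, valence])
    return min(
        PROTOTYPES,
        key=lambda name: np.linalg.norm(point - np.array(PROTOTYPES[name])),
    )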

Range value of arousal, valence, dominance

I wonder what the value range of arousal, valence, and dominance is. As far as I know, the model output is a logit vector of size 3 representing those dimensions, and its values appear to lie in [0, 1]. I see that you use the MSP-Conversation Corpus for fine-tuning. But when I looked at the MSP-Conversation Corpus paper, they mentioned that
"Notice that the values of the traces are in the range between -100 and 100. The figure shows that extreme values are uncommon. Most of the annotations are concentrated between -40 to 40 for valence, -20 to 50 for arousal, and -20 to 40 for dominance"

Do you guys normalize that feature, or do something related?
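
The repository itself does not document the normalization, but mapping raw traces in [-100, 100] to [0, 1] would be a simple affine rescaling. A sketch of that arithmetic (whether this is the scheme actually used is an assumption only the authors can confirm):

def rescale(x, src=(-100.0, 100.0), dst=(0.0, 1.0)):
    # Affinely map x from the src interval onto the dst interval.
    lo, hi = src
    a, b = dst
    return a + (x - lo) * (b - a) / (hi - lo)

rescale(40.0)   # -> 0.7
rescale(-20.0)  # -> 0.4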

Negative values for Arousal

When I run the model, I get some negative values for the arousal element. I thought arousal, dominance, and valence range between 0 and 1. Can anyone interpret what is happening, or what these negative values mean?
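
The observed negative values suggest the regression head is not bounded to [0, 1]. If downstream code assumes that range, one pragmatic (lossy) option is to clip; a minimal sketch:

import numpy as np

# Clamp predictions into [0, 1] before further processing.
logits = model(signal, sampling_rate)['logits']
clipped = np.clip(logits, 0.0, 1.0)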

Memory Leak during inference

Hi,

I have more than 100,000 audio files (each audio file is about 1-2 minutes). My goal is to use the API to infer arousal, dominance, and valence from these audio files. I simply loop over the audio files and feed them to the API one by one, but there seems to be a memory leak after about 5,000 iterations.

The error looks like this:

[E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Softmax node. Name:'Softmax_246' Status Message: C:\a\_work\1\s\onnxruntime\core\framework\bfc_arena.cc:376 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 14304160000

I was wondering if there is some way to fix this problem? I am really new to deep learning frameworks and look forward to your help. (I am running the code on a CPU machine with 32 GB of RAM.)
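
The failed allocation grows with input length, so one mitigation is to process each file in fixed-length chunks instead of as a single long signal. A minimal sketch, not part of the repo (audiofile is assumed for reading, and averaging chunk logits is a simplification):

import audiofile
import audonnx
import numpy as np

model = audonnx.load('model')

def predict_in_chunks(path, chunk_dur=30.0):
    # Split the signal into fixed-length chunks so that ONNX Runtime
    # never has to allocate one huge activation buffer per file.
    signal, sampling_rate = audiofile.read(path, always_2d=False)
    chunk = int(chunk_dur * sampling_rate)
    logits = [
        model(signal[start:start + chunk].astype(np.float32), sampling_rate)['logits']
        for start in range(0, len(signal), chunk)
    ]
    # Average arousal/dominance/valence over all chunks.
    return np.concatenate(logits).mean(axis=0)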

Other pretrained models

Hi authors,

Is it possible to release the XLS version of the model or the CNN14 model?
My current project needs a classifier that predicts valence mainly from paralinguistic cues. I read in the paper's analysis that the released w2v2-L-robust model learned sentiment from the linguistic content.
So I'm wondering if it's possible to access one of your other models that, in your analysis, does not rely on linguistic content as much? It would be a great help!

Thanks!

Error in using audinterface.Feature

import audinterface

# model is the audonnx model from the quick start; sr is the sampling rate.
interface = audinterface.Feature(
    model.outputs["logits"].labels,
    process_func=model,
    process_func_applies_sliding_window=False,
    process_func_args={
        "outputs": "logits",
    },
    sampling_rate=sr,
    resample=True,
    verbose=True,
    win_dur=1.0,
    hop_dur=0.5,
)

AttributeError: type object 'type' has no attribute 'id'.

This happens with the latest version, 0.9.0.
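
An untested workaround while this is open: wrap the model in a plain function that selects the logits itself, which avoids the process_func_args plumbing entirely. A sketch, not a confirmed fix (model and sr as in the snippet above):

def process_func(signal, sampling_rate):
    # Call the model directly and pick the logits output ourselves.
    return model(signal, sampling_rate)['logits']

interface = audinterface.Feature(
    model.outputs['logits'].labels,
    process_func=process_func,
    sampling_rate=sr,
    resample=True,
    win_dur=1.0,
    hop_dur=0.5,
    verbose=True,
)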

There is an error when running the notebook code


ConnectionRefusedError Traceback (most recent call last)
~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1349 h.request(req.get_method(), req.selector, req.data, headers,
-> 1350 encode_chunked=req.has_header('Transfer-encoding'))
1351 except OSError as err: # timeout error

~/miniconda3/envs/torch/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
1280 """Send a complete request to the server."""
-> 1281 self._send_request(method, url, body, headers, encode_chunked)
1282

~/miniconda3/envs/torch/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1326 body = _encode(body, 'body')
-> 1327 self.endheaders(body, encode_chunked=encode_chunked)
1328

~/miniconda3/envs/torch/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
1275 raise CannotSendHeader()
-> 1276 self._send_output(message_body, encode_chunked=encode_chunked)
1277

~/miniconda3/envs/torch/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
1035 del self._buffer[:]
-> 1036 self.send(msg)
1037

~/miniconda3/envs/torch/lib/python3.7/http/client.py in send(self, data)
975 if self.auto_open:
--> 976 self.connect()
977 else:

~/miniconda3/envs/torch/lib/python3.7/http/client.py in connect(self)
1442
-> 1443 super().connect()
1444

~/miniconda3/envs/torch/lib/python3.7/http/client.py in connect(self)
947 self.sock = self._create_connection(
--> 948 (self.host,self.port), self.timeout, self.source_address)
949 self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

~/miniconda3/envs/torch/lib/python3.7/socket.py in create_connection(address, timeout, source_address)
727 try:
--> 728 raise err
729 finally:

~/miniconda3/envs/torch/lib/python3.7/socket.py in create_connection(address, timeout, source_address)
715 sock.bind(source_address)
--> 716 sock.connect(sa)
717 # Break explicitly a reference cycle

ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)
/tmp/ipykernel_2304374/3507476301.py in
20 url,
21 dst_path,
---> 22 verbose=True,
23 )
24

~/miniconda3/envs/torch/lib/python3.7/site-packages/audeer/core/io.py in download_url(url, destination, force_download, verbose)
188 pbar.update(block_size)
189
--> 190 urllib.request.urlretrieve(url, destination, reporthook=bar_update)
191
192 return destination

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
245 url_type, path = splittype(url)
246
--> 247 with contextlib.closing(urlopen(url, data)) as fp:
248 headers = fp.info()
249

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
523 req = meth(req)
524
--> 525 response = self._open(req, data)
526
527 # post-process response

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in _open(self, req, data)
541 protocol = req.type
542 result = self._call_chain(self.handle_open, protocol, protocol +
--> 543 '_open', req)
544 if result:
545 return result

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 for handler in handlers:
502 func = getattr(handler, meth_name)
--> 503 result = func(*args)
504 if result is not None:
505 return result

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in https_open(self, req)
1391 def https_open(self, req):
1392 return self.do_open(http.client.HTTPSConnection, req,
-> 1393 context=self._context, check_hostname=self.check_hostname)
1394
1395 https_request = AbstractHTTPHandler.do_request

~/miniconda3/envs/torch/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1350 encode_chunked=req.has_header('Transfer-encoding'))
1351 except OSError as err: # timeout error
-> 1352 raise URLError(err)
1353 r = h.getresponse()
1354 except:

URLError: <urlopen error [Errno 111] Connection refused>

Fine-tune on another dataset

Hi,

I am currently conducting a research project with my partner on developing an SER model for New Zealand English. We evaluated the model you provided here and achieved promising results, but would like to fine-tune it on another corpus.

We were wondering what input format the model expects our dataset to be in for training. We have it as a Dataset object using the datasets library from HuggingFace. The debug console in the image below shows the structure of our Dataset. It currently has audio, arousal, and valence annotations as inputs to the model.

[image: debug console showing the structure of the Dataset]

Was this the input used, or was a different input expected?
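
The repository does not ship training code, so the expected layout is not documented here. For wav2vec2-style fine-tuning, a common shape is raw 16 kHz audio plus numeric targets; a hypothetical sketch with the HuggingFace datasets library (all column names are assumptions, not the authors' specification):

from datasets import Audio, Dataset

# Hypothetical minimal layout: file paths decoded to 16 kHz audio,
# plus numeric regression targets in [0, 1].
ds = Dataset.from_dict({
    'audio': ['clip_001.wav', 'clip_002.wav'],
    'arousal': [0.61, 0.35],
    'valence': [0.48, 0.72],
}).cast_column('audio', Audio(sampling_rate=16_000))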

audonnx requirements: trainer and onnx depend on different protobuf versions

pip install audonnx

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
trainer 0.0.20 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.

So I installed protobuf 3.9.2, and then:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
onnx 1.13.1 requires protobuf<4,>=3.20.2, but you have protobuf 3.9.2 which is incompatible.
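
Since the two pins are mutually exclusive in one environment, one way out is to install audonnx into its own virtual environment, isolated from trainer:

$ python -m venv w2v2-env
$ source w2v2-env/bin/activate
$ pip install audonnx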
