Hi, thanks for your excellent work on this repository. I am wonderin

Normalization for Librispeech pre-training data (s3prl issue, CLOSED)

rtst777 commented on July 24, 2024

Normalization for Librispeech pre-training data

from s3prl.

Comments (6)

andi611 commented on July 24, 2024

Good point! We haven't verified delta_delta.

In our recent experiments, we found that the choice of acoustic features has an effect on model performance (these experiments will be released soon with our new paper, TERA).
We are still exploring which features work best for these reconstruction-based models.
One thing is certain for now: the acoustic features are a parameter choice that researchers have to explore.

rtst777 commented on July 24, 2024

> Good point! We haven't verified delta_delta.
>
> In our recent experiments, we found that the choice of acoustic features has an effect on model performance (these experiments will be released soon with our new paper, TERA).
> We are still exploring which features work best for these reconstruction-based models.
> One thing is certain for now: the acoustic features are a parameter choice that researchers have to explore.

Makes sense. Looking forward to your new paper :)

andi611 commented on July 24, 2024

Thank you for your interest in our work.

Yes, for Mockingjay:
First, features are extracted with Librosa.
Next, delta is applied.
And finally, CMVN is also applied (the zero mean and unit variance you mentioned).
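
For readers following along, the three steps above can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's actual code: the 80-dim base feature is inferred from the 160/240 dimensions mentioned later in this thread, the input is a random placeholder matrix rather than real Librosa output, `np.gradient` stands in for `librosa.feature.delta` (which uses a Savitzky-Golay filter), and CMVN is assumed to be per utterance. The helper names `add_deltas` and `cmvn` are hypothetical.

```python
import numpy as np


def add_deltas(feat, order=1):
    """Append time differences up to `order` along the feature axis.

    feat: array of shape (num_frames, feat_dim).
    np.gradient is a simple stand-in for librosa.feature.delta here.
    """
    parts = [feat]
    for _ in range(order):
        # Each pass differentiates the previous block along the time axis.
        parts.append(np.gradient(parts[-1], axis=0))
    return np.concatenate(parts, axis=1)


def cmvn(feat, eps=1e-10):
    """Zero mean and unit variance per feature dimension (assumed per utterance)."""
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + eps)


rng = np.random.default_rng(0)
# Placeholder for 80-dim Mel features over 200 frames (assumed sizes).
mel = rng.standard_normal((200, 80))

delta_feat = cmvn(add_deltas(mel, order=1))        # static + delta
delta_delta_feat = cmvn(add_deltas(mel, order=2))  # static + delta + delta-delta

print(delta_feat.shape, delta_delta_feat.shape)
```

With an 80-dim base feature, `order=1` gives 160 dimensions and `order=2` gives 240, which matches the delta and delta_delta sizes discussed in this thread.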

rtst777 commented on July 24, 2024

> Thank you for your interest in our work.
>
> Yes, for Mockingjay:
> First, features are extracted with Librosa.
> Next, delta is applied.
> And finally, CMVN is also applied (the zero mean and unit variance you mentioned).

Thank you for your answer! May I ask why you used delta instead of delta_delta? Doesn't delta_delta (240 dim) provide more information than delta (160 dim) for Mockingjay to learn?

yilunzhao commented on July 24, 2024

> > Thank you for your interest in our work.
> > Yes, for Mockingjay:
> > First, features are extracted with Librosa.
> > Next, delta is applied.
> > And finally, CMVN is also applied (the zero mean and unit variance you mentioned).
>
> Thank you for your answer! May I ask why you used delta instead of delta_delta? Doesn't delta_delta (240 dim) provide more information than delta (160 dim) for Mockingjay to learn?

Hi, thanks for the impressive work on this repository!

I have a question: have you tried simply concatenating different acoustic features such as Mel, MFCC, fMLLR, etc. to form a new input feature? Intuitively, I would expect higher-dimensional features to give better results. I wonder whether this approach makes sense. Thanks!

andi611 commented on July 24, 2024

> Hi, thanks for the impressive work on this repository!
>
> I have a question: have you tried simply concatenating different acoustic features such as Mel, MFCC, fMLLR, etc. to form a new input feature? Intuitively, I would expect higher-dimensional features to give better results. I wonder whether this approach makes sense. Thanks!

Interesting thought!
It makes sense to me. In our recent study, we found that input features have a large effect on reconstruction-based pre-trained models.
We’ve tried MFCC, fbank, and fMLLR separately, but we haven’t used them in combination yet.
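
As a rough illustration of the combination idea discussed above (not something the authors report having tried), frame-aligned feature streams could be normalized separately and then concatenated along the feature axis. Everything in this sketch is an assumption: the helper `concat_features` is hypothetical, the arrays are random placeholders, and the 80/13/40 dimensions are arbitrary example sizes rather than values from the repository.

```python
import numpy as np


def concat_features(streams, eps=1e-10):
    """Normalize each stream per dimension, then concatenate on the feature axis.

    streams: list of arrays, each of shape (num_frames, dim_i), frame-aligned.
    """
    frame_counts = {s.shape[0] for s in streams}
    if len(frame_counts) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    # Normalizing per stream keeps one feature type from dominating by scale.
    normed = [(s - s.mean(axis=0)) / (s.std(axis=0) + eps) for s in streams]
    return np.concatenate(normed, axis=1)


rng = np.random.default_rng(0)
# Random placeholders standing in for real Mel, MFCC, and fMLLR features.
mel = rng.standard_normal((200, 80))
mfcc = rng.standard_normal((200, 13))
fmllr = rng.standard_normal((200, 40))

combined = concat_features([mel, mfcc, fmllr])
print(combined.shape)
```

Whether the larger input actually helps a reconstruction-based model is an empirical question that this sketch does not answer. In practice the streams must also share the same frame rate before they can be stacked this way.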
