andi611 commented on July 24, 2024

Thanks for raising your concern.
I don't think it's normal.

In our new paper TERA,
we show the results of fine-tuning pre-trained models for ASR with PyTorch-Kaldi (see Table VI of the TERA paper).
By using the same setup as you did:

  • We pre-trained the base (3-layer) TERA model with 960 hours of Librispeech
  • We fine-tuned the pre-trained base TERA with liGRU
  • We used train-clean-100 for downstream adaptation
  • We used test-clean for testing

We achieved 8.23% WER, and 5.84% after rescoring.

I'm not sure what the cause of your current result is.
However, I have two guesses:

  1. fMLLR data mismatch with the pre-trained model?
  2. Not successfully loading the pre-trained model?

Let me know if you need any more help.

JINGZIjingzi commented on July 24, 2024

Thanks a lot for your quick response!
I will read the TERA paper in detail.

As for your guesses,

  1. The fMLLR data is extracted following the PyTorch-Kaldi tutorial for Librispeech. I don't know where the mismatch would come from. By the way, I wonder whether the TERA model can work across languages.
  2. By checking the log file, I think the model was loaded successfully (a quick check is sketched below).
    (screenshot of the log file)
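
A minimal way to double-check this beyond the log file, for anyone hitting the same issue (the checkpoint path and key names below are placeholders, not s3prl's exact loading code): snapshot the weights, load the checkpoint, and count how many parameters actually changed.

```python
import torch
import torch.nn as nn

# Stand-in for the actual TERA encoder; substitute your real model instance.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
model = nn.TransformerEncoder(layer, num_layers=3)

# Checkpoint path and key name are assumptions; adjust to your setup.
ckpt = torch.load("path/to/tera_checkpoint.ckpt", map_location="cpu")
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(list(state.keys())[:5])  # peek at what the checkpoint contains

# Snapshot weights, load, then count how many parameters changed;
# if none changed, the pre-trained weights were never applied.
before = {n: p.clone() for n, p in model.named_parameters()}
model.load_state_dict(state, strict=False)
changed = sum(not torch.equal(before[n], p) for n, p in model.named_parameters())
print(f"{changed}/{len(before)} parameters changed after loading")
```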

andi611 commented on July 24, 2024
  1. The fMLLR data is extracted following the PyTorch-Kaldi tutorial for Librispeech. I don't know where the mismatch would come from.

The fMLLR extraction process involves training; I assume the mismatch comes from the pre-trained model having been trained on my fMLLR data while your fMLLR data is used during downstream ASR adaptation.
One possible solution is to pre-train the model with your fMLLR data. Another is to use my Kaldi files, including these directories under librispeech/s5: data/, exp/, and fmllr/.
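
As a quick first sanity check (a sketch only; the kaldiio reader and the scp path are assumptions, not our exact pipeline), you can at least confirm your fMLLR features have the input dimension the pre-trained model expects:

```python
import kaldiio  # pip install kaldiio; any Kaldi scp/ark reader works

EXPECTED_DIM = 40  # fMLLR features are typically 40-dim; check the pre-train config

# The scp path is an assumption; point it at your extracted fMLLR features.
feats = kaldiio.load_scp("fmllr/train_clean_100/feats.scp")
utt_id, mat = next(iter(feats.items()))
print(utt_id, mat.shape)  # (num_frames, feat_dim)
assert mat.shape[1] == EXPECTED_DIM, (
    f"feature dim {mat.shape[1]} != expected {EXPECTED_DIM}; "
    "this fMLLR setup likely differs from the one used for pre-training"
)
```

Note that a matching dimension does not rule out a mismatch: fMLLR transforms are estimated from a GMM system, so features extracted with a different Kaldi recipe can still differ in their statistics.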

By the way, I wonder whether the TERA model can work across languages.

Interesting topic; we will look into it in future work.

JINGZIjingzi commented on July 24, 2024

Thank you! I will check my Kaldi files and try again.

ArtemisZGL commented on July 24, 2024

@andi611 Hello, when using the TERA model for ASR, it does not seem to give much of a performance gain compared to the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?

andi611 commented on July 24, 2024

@andi611 Hello, when using the TERA model for ASR, it does not seem to give much of a performance gain compared to the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?

Hi,
May I ask which scratch model you are referring to?
In Table VII, the scratch model achieved 10.47 / 7.68 WER (row r), while the best TERA model achieved 8.23 / 5.84 WER (row s).

I did not try different ASR models; this hybrid model (PyTorch-Kaldi) was chosen because it is easy to plug pre-trained models into.
I'm sure other ASR models can work with TERA; however, I haven't found a better option than PyTorch-Kaldi at the moment.

If you have any recommended ASR models, please suggest them and we will try to work on them.

ArtemisZGL commented on July 24, 2024

@andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA. And what is the difference between this and Table VII (row r)?

andi611 commented on July 24, 2024

And what is the difference between this and Table VII (row r)?

First, the difference between Table VII row r and the other TERA rows is this:
the scratch model (Table VII row r) is a randomly initialized TERA (i.e., 3 layers of Transformer Encoder) trained from scratch end-to-end with the ASR model, while the other TERAs are pre-trained with the proposed self-supervised task and only after pre-training are they used with the ASR model.
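
In code terms, the difference is roughly the following (the model construction and checkpoint path are placeholders for illustration, not our exact implementation):

```python
import torch
import torch.nn as nn

def build_encoder() -> nn.Module:
    # Same 3-layer Transformer Encoder architecture in both cases.
    layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
    return nn.TransformerEncoder(layer, num_layers=3)

# Scratch baseline (Table VII row r): random init, trained end-to-end
# together with the downstream ASR model.
scratch_encoder = build_encoder()

# Pre-trained TERA: identical architecture, but the weights come from the
# self-supervised pre-training checkpoint before ASR fine-tuning begins.
pretrained_encoder = build_encoder()
ckpt = torch.load("tera_pretrained.ckpt", map_location="cpu")  # path is an assumption
pretrained_encoder.load_state_dict(ckpt, strict=False)
```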

@andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA.

The WER differences between features (MFCC, FBANK, fMLLR, and TERA) are minor in this particular ASR framework; in other ASR frameworks, the improvement may vary. However, these WERs still serve as a comparison of which feature is better (TERA > fMLLR > FBANK > MFCC).

ArtemisZGL commented on July 24, 2024

@andi611 Thanks for your reply. I just meant that the TERA representations don't show much improvement compared to those features (MFCC, etc.). By the way, when pre-training Mockingjay, I got some CUDA OOMs in some training steps (not too many); will they influence the performance?

andi611 commented on July 24, 2024

When pre-training Mockingjay, I got some CUDA OOMs in some training steps (not too many); will they influence the performance?

Yes. When the CUDA OOM message is shown, that particular batch is skipped, so that batch is not learned by the model.
Longer sequences request more memory, so the model misses out on those long sequences.

You can either:

  1. use a smaller batch size and accumulate gradients by changing gradient_accumulation_steps: 1 in your config, or
  2. sample a random sub-sequence whenever the sequence length exceeds a threshold, by adding this option to your config (see the sketch below).
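
For option 2, here is a minimal sketch of the idea (the function, threshold name, and value are assumptions, not the repo's exact config mechanism): crop a random sub-sequence when an utterance exceeds the length budget, so long utterances still contribute instead of being skipped on OOM.

```python
import torch

MAX_FRAMES = 1500  # length threshold; an assumption, tune it to your GPU memory

def sample_subsequence(feats: torch.Tensor, max_frames: int = MAX_FRAMES) -> torch.Tensor:
    """feats: (num_frames, feat_dim). Randomly crop utterances that are too long."""
    num_frames = feats.size(0)
    if num_frames <= max_frames:
        return feats
    start = torch.randint(0, num_frames - max_frames + 1, (1,)).item()
    return feats[start : start + max_frames]

# Example: a 3000-frame utterance gets cropped to 1500 frames instead of
# triggering an OOM and being dropped entirely.
cropped = sample_subsequence(torch.randn(3000, 40))
print(cropped.shape)  # torch.Size([1500, 40])
```

For option 1, note that halving the batch size while doubling gradient_accumulation_steps keeps the effective batch size unchanged.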

ArtemisZGL commented on July 24, 2024

@andi611 Thanks a lot! I will try that.
