andi611 commented on July 24, 2024

Thanks for raising your concern.
I don't think it's normal.

In our new paper TERA,
we show the results of fine-tuning pre-trained models for ASR with PyTorch-Kaldi (see Table VI of the TERA paper).
By using the same setup as you did:

  • We pre-trained the base (3-layer) TERA model with 960 hours of Librispeech
  • We fine-tuned the pre-trained base TERA with liGRU
  • We used train-clean-100 for downstream adaptation
  • We used test-clean for testing

We achieved 8.23% WER, and 5.84% after rescoring.

I'm not sure what the cause of your current result is.
However, I have two guesses:

  1. fMLLR data mismatch with the pre-trained model?
  2. Not successfully loading the pre-trained model?

Let me know if you need any more help.

JINGZIjingzi commented on July 24, 2024

Thanks a lot for your quick response!
I will read the TERA paper in detail.

As for your guesses,

  1. The fMLLR data is extracted following the PyTorch-Kaldi tutorial for Librispeech. I don't know where the mismatch would come from. By the way, I wonder whether the TERA model can work across languages.
  2. By checking the log file, I think the model was loaded successfully (a quick check is sketched below).
    (screenshot of the log file)
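
A minimal way to double-check this beyond the log file, for anyone hitting the same issue (the checkpoint path and key names below are placeholders, not s3prl's exact loading code): snapshot the weights, load the checkpoint, and count how many parameters actually changed.

```python
import torch
import torch.nn as nn

# Stand-in for the actual TERA encoder; substitute your real model instance.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
model = nn.TransformerEncoder(layer, num_layers=3)

# Checkpoint path and key name are assumptions; adjust to your setup.
ckpt = torch.load("path/to/tera_checkpoint.ckpt", map_location="cpu")
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(list(state.keys())[:5])  # peek at what the checkpoint contains

# Snapshot weights, load, then count how many parameters changed;
# if none changed, the pre-trained weights were never applied.
before = {n: p.clone() for n, p in model.named_parameters()}
model.load_state_dict(state, strict=False)
changed = sum(not torch.equal(before[n], p) for n, p in model.named_parameters())
print(f"{changed}/{len(before)} parameters changed after loading")
```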

andi611 commented on July 24, 2024
  1. The fMLLR data is extracted following the PyTorch-Kaldi tutorial for Librispeech. I don't know where the mismatch would come from.

The fMLLR extraction process involves training; I assume the mismatch comes from the pre-trained model having been trained on my fMLLR data while your fMLLR data is used during downstream ASR adaptation.
One possible solution is to pre-train the model with your fMLLR data. Another is to use my Kaldi files, including these directories under librispeech/s5: data/, exp/, and fmllr/.
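
As a quick first sanity check (a sketch only; the kaldiio reader and the scp path are assumptions, not our exact pipeline), you can at least confirm your fMLLR features have the input dimension the pre-trained model expects:

```python
import kaldiio  # pip install kaldiio; any Kaldi scp/ark reader works

EXPECTED_DIM = 40  # fMLLR features are typically 40-dim; check the pre-train config

# The scp path is an assumption; point it at your extracted fMLLR features.
feats = kaldiio.load_scp("fmllr/train_clean_100/feats.scp")
utt_id, mat = next(iter(feats.items()))
print(utt_id, mat.shape)  # (num_frames, feat_dim)
assert mat.shape[1] == EXPECTED_DIM, (
    f"feature dim {mat.shape[1]} != expected {EXPECTED_DIM}; "
    "this fMLLR setup likely differs from the one used for pre-training"
)
```

Note that a matching dimension does not rule out a mismatch: fMLLR transforms are estimated from a GMM system, so features extracted with a different Kaldi recipe can still differ in their statistics.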

By the way, I wonder whether the TERA model can work across languages.

Interesting topic; we will look into it in future work.

JINGZIjingzi commented on July 24, 2024

Thank you! I will check my Kaldi files and try again.

ArtemisZGL commented on July 24, 2024

@andi611 Hello, when using the TERA model for ASR, it does not seem to give much of a performance gain compared to the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?

andi611 commented on July 24, 2024

@andi611 Hello, when using the TERA model for ASR, it does not seem to give much of a performance gain compared to the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?

Hi,
May I ask which scratch model you are referring to?
In Table VII, the scratch model achieved 10.47 / 7.68 WER (row r), while the best TERA model achieved 8.23 / 5.84 WER (row s).

I did not try different ASR models; this hybrid model (PyTorch-Kaldi) was chosen because it is easy to plug pre-trained models into.
I'm sure other ASR models can work with TERA; however, I haven't found a better option than PyTorch-Kaldi at the moment.

If you have any recommended ASR models, please suggest them and we will try to work on them.

ArtemisZGL commented on July 24, 2024

@andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA. And what is the difference between this and Table VII (row r)?

andi611 commented on July 24, 2024

And what is the difference between this and Table VII (row r)?

First, the difference between Table VII row r and the other TERA rows is this:
the scratch model (Table VII row r) is a randomly initialized TERA (i.e., 3 layers of Transformer Encoder) trained from scratch end-to-end with the ASR model, while the other TERAs are pre-trained with the proposed self-supervised task and only after pre-training are they used with the ASR model.
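
In code terms, the difference is roughly the following (the model construction and checkpoint path are placeholders for illustration, not our exact implementation):

```python
import torch
import torch.nn as nn

def build_encoder() -> nn.Module:
    # Same 3-layer Transformer Encoder architecture in both cases.
    layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
    return nn.TransformerEncoder(layer, num_layers=3)

# Scratch baseline (Table VII row r): random init, trained end-to-end
# together with the downstream ASR model.
scratch_encoder = build_encoder()

# Pre-trained TERA: identical architecture, but the weights come from the
# self-supervised pre-training checkpoint before ASR fine-tuning begins.
pretrained_encoder = build_encoder()
ckpt = torch.load("tera_pretrained.ckpt", map_location="cpu")  # path is an assumption
pretrained_encoder.load_state_dict(ckpt, strict=False)
```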

@andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA.

The WER differences between features (MFCC, FBANK, fMLLR, and TERA) are minor in this particular ASR framework; in other ASR frameworks, the improvement may vary. However, these WERs still serve as a comparison of which feature is better (TERA > fMLLR > FBANK > MFCC).

ArtemisZGL commented on July 24, 2024

@andi611 Thanks for your reply. I just meant that the TERA representations don't show much improvement compared to those features (MFCC, etc.). By the way, when pre-training Mockingjay, I got some CUDA OOMs in some training steps (not too many); will they influence the performance?

andi611 commented on July 24, 2024

When pre-training Mockingjay, I got some CUDA OOMs in some training steps (not too many); will they influence the performance?

Yes. When the CUDA OOM message is shown, that particular batch is skipped, so that batch is not learned by the model.
Longer sequences request more memory, so the model misses out on those long sequences.

You can either:

  1. use a smaller batch size and accumulate gradients by changing gradient_accumulation_steps: 1 in your config, or
  2. sample a random sub-sequence whenever the sequence length exceeds a threshold, by adding this option to your config (see the sketch below).
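
For option 2, here is a minimal sketch of the idea (the function, threshold name, and value are assumptions, not the repo's exact config mechanism): crop a random sub-sequence when an utterance exceeds the length budget, so long utterances still contribute instead of being skipped on OOM.

```python
import torch

MAX_FRAMES = 1500  # length threshold; an assumption, tune it to your GPU memory

def sample_subsequence(feats: torch.Tensor, max_frames: int = MAX_FRAMES) -> torch.Tensor:
    """feats: (num_frames, feat_dim). Randomly crop utterances that are too long."""
    num_frames = feats.size(0)
    if num_frames <= max_frames:
        return feats
    start = torch.randint(0, num_frames - max_frames + 1, (1,)).item()
    return feats[start : start + max_frames]

# Example: a 3000-frame utterance gets cropped to 1500 frames instead of
# triggering an OOM and being dropped entirely.
cropped = sample_subsequence(torch.randn(3000, 40))
print(cropped.shape)  # torch.Size([1500, 40])
```

For option 1, note that halving the batch size while doubling gradient_accumulation_steps keeps the effective batch size unchanged.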

ArtemisZGL commented on July 24, 2024

@andi611 Thanks a lot! I will try that.
