Comments (11)
Thanks for raising your concern.
I don't think it's normal.
In our new paper TERA,
we show the results of fine-tuning pre-trained models for ASR with PyTorch-Kaldi (See TABLE VI of the TERA paper).
Using the same setup as yours:
- We pre-trained the base (3-layer) TERA model on the 960 hours of LibriSpeech
- We fine-tuned the pre-trained base TERA with liGRU
- We used train-clean-100 for downstream adaptation
- We used test-clean for testing
We achieved 8.23% WER, and 5.84% after rescoring.
I'm not sure what the cause of your current result is.
However, I have two guesses:
- An fMLLR data mismatch with the pre-trained model?
- The pre-trained model not being loaded successfully?
Let me know if you need any more help.
from s3prl.
Thanks a lot for your quick response!
I will read the TERA paper in detail.
As for your guesses,
- The fMLLR data was extracted by following the PyTorch-Kaldi tutorial for LibriSpeech, so I don't know where the mismatch would come from. By the way, I wonder whether the TERA model can work across languages.
- Judging from the log file, I believe the model was loaded successfully.
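As a sanity check beyond the log file, comparing the parameter names the model expects against those the checkpoint provides can reveal a silent partial load. A minimal sketch in plain Python (the key lists mirror a PyTorch-style state dict; the helper name is my own, not an s3prl API):

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Compare parameter names in the model against those in a checkpoint.

    Returns (missing, unexpected):
      missing    -- names the model expects but the checkpoint lacks
                    (these layers would stay randomly initialized)
      unexpected -- names the checkpoint carries but the model does not use
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected

# Example: one layer missing from the checkpoint, one extra key in it.
model = ["encoder.layer0.weight", "encoder.layer1.weight"]
ckpt = ["encoder.layer0.weight", "optimizer.step"]
missing, unexpected = diff_state_dicts(model, ckpt)
print(missing)     # ['encoder.layer1.weight']
print(unexpected)  # ['optimizer.step']
```

An empty `missing` list is stronger evidence than a "loaded" log line alone, since a partial load can still print success.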
> The fMLLR data was extracted by following the PyTorch-Kaldi tutorial for LibriSpeech, so I don't know where the mismatch would come from.
The fMLLR extraction process involves training, so I assume the mismatch arises because the model was pre-trained on my fMLLR data but adapted downstream on your fMLLR data.
One possible solution is to pre-train the model with your fMLLR data. Another is to use my Kaldi files, including these directories under librispeech/s5:
- data/
- exp/
- fmllr/
> By the way, I wonder whether the TERA model can work across languages.
An interesting topic; we will look into it in future work.
Thank you! I will check my Kaldi files and try again.
@andi611 Hello, when using the TERA model in ASR, it does not seem to perform much better than the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?
> @andi611 Hello, when using the TERA model in ASR, it does not seem to perform much better than the scratch model. Have you tried different ASR models after TERA? Why did you choose this hybrid model?
Hi,
May I ask which scratch model you are referring to?
In Table VII, the scratch model achieved 10.47 / 7.68 WER (row r), while the best TERA model achieved 8.23 / 5.84 WER (row s).
I did not try different ASR models; this hybrid model (PyTorch-Kaldi) was chosen because it is easy to plug pre-trained models into.
I'm sure other ASR models can work with TERA; however, I haven't found a better option than PyTorch-Kaldi at the moment.
If you have any recommended ASR models, please suggest them and we will try to work on them.
@andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA. And what is the difference between this and Table VII (row r)?
> And what is the difference between this and Table VII (row r)?
First, the difference between Table VII row r and the other TERA rows:
the scratch model (Table VII row r) is a randomly initialized TERA (i.e., 3 layers of Transformer encoder) trained from scratch end-to-end with the ASR model, while the other TERA models are first pre-trained with the proposed self-supervised task and only then used with the ASR model.
> @andi611 Sorry, in Table V, where TERA is frozen without fine-tuning, the result of liGRU + MFCC (or other features) is similar to the result of TERA.
The WER differences between features (MFCC, FBANK, fMLLR, and TERA) are small under this particular ASR framework; in other ASR frameworks the improvement may vary. Still, these WERs serve as a comparison of which feature is better (TERA > fMLLR > FBANK > MFCC).
@andi611 Thanks for your reply. I just mean that the TERA representation does not bring much improvement over those features (MFCC, etc.). By the way, when pre-training Mockingjay, I get CUDA OOM at some training steps (not too many); will it influence the performance?
> When pre-training Mockingjay, I get CUDA OOM at some training steps (not too many); will it influence the performance?
Yes. When the CUDA OOM message is shown, that particular batch is skipped, so the model does not learn from it.
Longer sequences request more memory, so the model ends up missing out on those long sequences.
You can either:
- use a smaller batch size and accumulate gradients by changing gradient_accumulation_steps: 1 in the config, or
- sample a random sub-sequence when the sequence length exceeds a threshold, by adding this line to your config.
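The two options above can be sketched in plain Python. This is a minimal illustration of the ideas, not the s3prl implementation; the function names and the equal-sized micro-batch assumption are mine:

```python
import random

def sample_subsequence(seq, max_len, rng=random):
    # Option 2: randomly crop a contiguous window from sequences longer
    # than max_len, so one very long utterance cannot exhaust GPU memory.
    if len(seq) <= max_len:
        return seq
    start = rng.randrange(len(seq) - max_len + 1)
    return seq[start:start + max_len]

def accumulated_grad(micro_batch_grads):
    # Option 1: averaging the mean gradient of k equal-sized micro-batches
    # reproduces the mean gradient of the full batch, so shrinking the
    # per-step batch while raising gradient_accumulation_steps keeps the
    # effective update the same at a lower peak memory cost.
    k = len(micro_batch_grads)
    return sum(sum(g) / len(g) for g in micro_batch_grads) / k

# Two micro-batches of 2 samples behave like one batch of 4:
full = [1.0, 2.0, 3.0, 4.0]
print(accumulated_grad([[1.0, 2.0], [3.0, 4.0]]))  # 2.5
print(sum(full) / len(full))                        # 2.5
```

Either way, no batch is silently dropped, which is what causes the missed long sequences described above.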
@andi611 Thanks a lot! I will try that.