Questions regarding TERA (s3prl issue, closed)

s3prl commented on July 24, 2024

Questions regarding TERA

from s3prl.

Comments (7)

andi611 commented on July 24, 2024

Hi,

I have several questions regarding your TERA paper:

You used masked reconstruction for 80%, noisy reconstruction for 10% and clean reconstruction for 10%. Is this strategy better than 100% masked reconstruction? By how much?

The 80% / 10% / 10% masking strategy along the time axis comes from BERT; according to the BERT paper, it works better than 100% masking. This also makes sense, as the 10% clean portion lets the model see unaltered data during training.
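To make the 80% / 10% / 10% split concrete, here is a minimal sketch in plain Python. It is not the code from this repo: the function name `alter_time`, the per-frame dice roll, the `select_prob=0.15` selection rate (borrowed from the BERT convention) and zero-valued masking are all assumptions made for illustration.

```python
import random


def alter_time(frames, select_prob=0.15, rng=None):
    """Apply a BERT-style 80/10/10 alteration along the time axis.

    frames: list of feature vectors (lists of floats), one per time step.
    Returns (altered_frames, selected_indices); the input is not modified.
    NOTE: select_prob and the per-frame decision are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    altered = [list(f) for f in frames]  # copy so the clean target stays intact
    selected = []
    for t in range(len(frames)):
        if rng.random() >= select_prob:
            continue  # frame not selected, left untouched
        selected.append(t)
        roll = rng.random()
        if roll < 0.8:
            # 80%: mask the frame (here: set it to zeros)
            altered[t] = [0.0] * len(frames[t])
        elif roll < 0.9:
            # 10%: replace the frame with a randomly chosen frame
            altered[t] = list(frames[rng.randrange(len(frames))])
        # remaining 10%: keep the clean frame as-is
    return altered, selected
```

The model would then be trained to reconstruct the original `frames` from `altered`, so the selected positions cover all three cases: masked, replaced, and clean.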

Your feature extractor is a transformer and you added liGRU on top of it for finetuning. Why did you use a GRU instead of transformers for finetuning? Does the GRU perform better?

We also explored adding a single MLP layer (which is basically fine-tuning just the transformers). In our experiments we found liGRU to perform better than the MLP, at the cost of significantly longer training time (due to liGRU's recurrence).

You also mentioned in the paper that not freezing TERA in finetuning performs better. By not freezing, do you mean you don't freeze it from the beginning, or you freeze TERA for a while and then unfreeze it in the later epochs?

We don't freeze it from the beginning; we did not explore unfreezing strategies.

To understand your TERA code, which files should I look into? It seems that they are wrapped quite deep?

The code related to input alterations (time + freq + mag) can be found in transformer/mam.py, and the code for the model architecture can be found in transformer/model.py.

Thx!

I hope this helps!

JunwenBai commented on July 24, 2024

Thx for the reply!
Besides, for the alteration strategy, why did you ignore C in Fig. 2? And, how long does it typically take for pretrain+finetuning with your code?

andi611 commented on July 24, 2024

Thx for the reply!
Besides, for the alteration strategy, why did you ignore C in Fig. 2?

It is NOT ignored; it is incorporated in the time alteration objective (80% mask / 10% replace / 10% clean). Fig. 2C is the 10% replacement. Implemented here.

And, how long does it typically take for pretrain+finetuning with your code?

It depends on your GPU and the amount of pre-training data. In my case, pre-training takes about a day on 100 hrs of data, around 3 days on 460 hrs, and around 5 days on 960 hrs. Finetuning with liGRU takes a week; with an MLP it takes only a couple of hours (less than a day).

JunwenBai commented on July 24, 2024

This is with one 1080Ti GPU?

andi611 commented on July 24, 2024

This is with one 1080Ti GPU?

Yes!

JunwenBai commented on July 24, 2024

When you evaluate WER, do you consider 'color' and 'colour' to be the same? Namely, American English versus British English?

andi611 commented on July 24, 2024

When you evaluate WER, do you consider 'color' and 'colour' to be the same? Namely, American English versus British English?

No. Since all the label data are from LibriSpeech, we do not consider the difference between American English and British English. (FYI, we use the standard Kaldi pipeline for evaluating WER.)
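For readers unfamiliar with how spelling variants affect the score, the following is a minimal sketch of word-level WER using a plain edit distance. It is not Kaldi's scoring code; the function name `word_error_rate` is hypothetical and the whitespace tokenization is an assumption. With exact string matching and no text normalization, 'color' versus 'colour' counts as one substitution.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Words are compared as exact strings, so spelling variants such as
    'color' / 'colour' count as substitutions (no normalization applied).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and hyp[:j]; start with the empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution out of four reference words
print(word_error_rate("the color is red", "the colour is red"))  # 0.25
```

Treating the two spellings as equal would require normalizing the transcripts before scoring, which is a separate preprocessing step and not part of the metric itself.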
