Questions regarding TERA (s3prl issue, closed)

s3prl commented on July 24, 2024
Questions regarding TERA

from s3prl.

Comments (7)

andi611 commented on July 24, 2024

> Hi,
>
> I have several questions regarding your TERA paper:
>
> 1. You used masked reconstruction for 80%, noisy reconstruction for 10%, and clean reconstruction for 10%. Is this strategy better than 100% masked reconstruction? By how much?

The 80% / 10% / 10% strategy along the time axis comes from BERT, whose paper reports that it works better than 100% masking. This also makes sense intuitively: the 10% clean portion lets the model see clean data during training.
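To make the 80% / 10% / 10% time alteration concrete, here is a minimal pure-Python sketch. It assumes a feature matrix stored as a list of frames. Several details are hypothetical choices for illustration: the function name `time_alter`, the 15% selection ratio, the 7-frame block length, and the single dice roll per utterance. BERT makes this decision per token, and transformer/mam.py may handle these details differently. The loss is meant to be computed only on the selected frames, which the model must reconstruct from the altered input.

```python
import random
from typing import List, Optional, Tuple

Frames = List[List[float]]  # one list of feature values per frame


def time_alter(
    x: Frames,
    alter_ratio: float = 0.15,  # hypothetical: fraction of frames to select
    block_len: int = 7,         # hypothetical: consecutive frames per block
    rng: Optional[random.Random] = None,
) -> Tuple[Frames, List[bool]]:
    """Return an altered copy of x and a per-frame flag for the selected frames.

    Hypothetical sketch: one dice roll per utterance decides whether the
    selected blocks are masked (80%), replaced (10%), or left clean (10%).
    """
    rng = rng if rng is not None else random.Random()
    num_frames = len(x)
    out = [frame[:] for frame in x]
    selected = [False] * num_frames

    # Pick distinct start positions for blocks of consecutive frames.
    num_starts = max(1, num_frames - block_len + 1)
    num_blocks = max(1, int(num_frames * alter_ratio) // block_len)
    starts = rng.sample(range(num_starts), min(num_blocks, num_starts))
    for s in starts:
        for t in range(s, min(s + block_len, num_frames)):
            selected[t] = True

    dice = rng.random()
    if dice < 0.8:
        # 80%: mask the selected frames by zeroing them.
        for t in range(num_frames):
            if selected[t]:
                out[t] = [0.0] * len(x[t])
    elif dice < 0.9:
        # 10%: replace each selected block with a random segment of the same utterance.
        for s in starts:
            src = rng.randrange(num_starts)
            for k in range(block_len):
                if s + k < num_frames and src + k < num_frames:
                    out[s + k] = x[src + k][:]
    # Remaining 10%: keep the input clean; the selected frames are still reconstructed.
    return out, selected


x = [[float(t), float(t) + 0.5] for t in range(100)]
altered, selected = time_alter(x, rng=random.Random(0))
```

Whatever the dice decides, the reconstruction target is always the original frames. In the clean case, the model therefore still learns to reproduce the selected frames, even though they were not corrupted.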

> 2. Your feature extractor is a Transformer, and you added a liGRU on top of it for fine-tuning. Why did you use a GRU instead of Transformers for fine-tuning? Does the GRU perform better?

We also explored adding a single MLP layer instead (which amounts to fine-tuning essentially just the Transformer). In our experiments, liGRU performs better than the MLP, at the cost of significantly longer training time due to liGRU's recurrence.

> 3. You also mentioned in the paper that not freezing TERA during fine-tuning performs better. By not freezing, do you mean you don't freeze it from the beginning, or that you freeze TERA for a while and then unfreeze it in later epochs?

We don't freeze it from the beginning; we did not explore unfreezing strategies.

> 4. To understand your TERA code, which files should I look at? The code seems to be nested quite deeply.

The code for the input alterations (time, frequency, and magnitude) is in transformer/mam.py, and the model architecture is in transformer/model.py.

> Thx!

I hope this helps!
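Besides the time alteration, the answer above mentions frequency and magnitude alterations. Below is a minimal pure-Python sketch of those two, based on our reading of the TERA paper. The frequency alteration zeroes one block of consecutive feature channels across all frames. The magnitude alteration adds Gaussian noise with some probability. The function names, the maximum block width, the noise probability, and the noise scale are hypothetical and are not taken from transformer/mam.py.

```python
import random
from typing import List, Optional, Tuple

Frames = List[List[float]]  # one list of feature values per frame


def freq_alter(
    x: Frames,
    max_width: int = 8,  # hypothetical: largest block of channels to mask
    rng: Optional[random.Random] = None,
) -> Tuple[Frames, Tuple[int, int]]:
    """Zero one random block of consecutive feature channels in every frame.

    Returns the altered copy and (start, width) of the masked block;
    a width of 0 means this call left the input unchanged.
    """
    rng = rng if rng is not None else random.Random()
    dim = len(x[0])
    width = rng.randint(0, min(max_width, dim))  # inclusive on both ends
    start = rng.randint(0, dim - width)
    out = [frame[:] for frame in x]
    for frame in out:
        for d in range(start, start + width):
            frame[d] = 0.0
    return out, (start, width)


def mag_alter(
    x: Frames,
    prob: float = 0.1,  # hypothetical: chance of adding noise to this utterance
    std: float = 0.2,   # hypothetical: standard deviation of the Gaussian noise
    rng: Optional[random.Random] = None,
) -> Frames:
    """With probability prob, add independent Gaussian noise to every feature value."""
    rng = rng if rng is not None else random.Random()
    if rng.random() >= prob:
        return [frame[:] for frame in x]
    return [[v + rng.gauss(0.0, std) for v in frame] for frame in x]
```

Both functions return a new matrix and leave the input untouched, so the original features remain available as the reconstruction target.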

JunwenBai commented on July 24, 2024

Thx for the reply!
Besides, for the alteration strategy, why did you ignore C in Fig. 2? And how long does it typically take to pre-train and fine-tune with your code?

andi611 commented on July 24, 2024

> Thx for the reply!
> Besides, for the alteration strategy, why did you ignore C in Fig. 2?

It is NOT ignored; it is incorporated in the time alteration objective (80% mask / 10% replace / 10% clean). Fig. 2C is the 10% replacement. Implemented here.

> And how long does it typically take to pre-train and fine-tune with your code?

It depends on your GPU and the amount of pre-training data. In my case, pre-training on 100 hrs of data takes about a day, on 460 hrs around 3 days, and on 960 hrs around 5 days. Fine-tuning with liGRU takes a week; with an MLP it takes only a couple of hours (less than a day).

JunwenBai commented on July 24, 2024

Is this with one 1080Ti GPU?

andi611 commented on July 24, 2024

> Is this with one 1080Ti GPU?

Yes!

JunwenBai commented on July 24, 2024

When you evaluate WER, do you consider 'color' and 'colour' to be the same? That is, do you treat American English and British English spellings as equivalent?

andi611 commented on July 24, 2024

> When you evaluate WER, do you consider 'color' and 'colour' to be the same? That is, do you treat American English and British English spellings as equivalent?

All the label data come from LibriSpeech, so no, we do not account for differences between American and British English. (FYI, we use the standard Kaldi pipeline for evaluating WER.)
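For reference, WER is the number of substitutions, deletions, and insertions divided by the number of reference words, taken from a word-level edit distance. The minimal sketch below is not the Kaldi implementation, and the function names are hypothetical. It shows that, without spelling normalization, 'color' vs. 'colour' counts as one substitution.

```python
from typing import List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref prefix into hyp[:j]
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = cur
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word errors over total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(r.split()) for r in refs)
    if total == 0:
        raise ValueError("references contain no words")
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / total


print(wer(["THE COLOR RED"], ["THE COLOUR RED"]))  # prints 0.3333333333333333
```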
