Comments (7)
> Hi,
> I have several questions regarding your TERA paper:
> - You used masked reconstruction for 80%, noisy reconstruction for 10%, and clean reconstruction for 10%. Is this strategy better than 100% masked reconstruction? By how much?
The 80% / 10% / 10% masking strategy along the time axis is from BERT; according to the BERT paper it performs better than 100% masking. This also makes sense, as the 10% clean portion lets the model see clean data during training.
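For illustration, here is a minimal sketch of such an 80/10/10 time alteration on spectrogram frames (a simplified, hypothetical example; the actual code in transformer/mam.py masks contiguous spans and handles batching):

```python
import torch

def time_alteration(spec: torch.Tensor, select_ratio: float = 0.15):
    """BERT-style time alteration on a (T, D) spectrogram.

    Of the frames selected for reconstruction: 80% are zero-masked,
    10% are replaced with random frames from the same utterance,
    and 10% are left clean.
    """
    T, _ = spec.shape
    altered = spec.clone()
    selected = torch.rand(T) < select_ratio                # frames to reconstruct
    dice = torch.rand(T)                                   # per-frame 80/10/10 draw
    mask_idx = selected & (dice < 0.8)                     # 80%: zero-mask
    replace_idx = selected & (dice >= 0.8) & (dice < 0.9)  # 10%: random replacement
    # the remaining 10% of selected frames stay clean but are still reconstructed
    altered[mask_idx] = 0.0
    altered[replace_idx] = spec[torch.randint(0, T, (int(replace_idx.sum()),))]
    return altered, selected
```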
> - Your feature extractor is a Transformer, and you added a liGRU on top of it for fine-tuning. Why did you use a GRU instead of Transformers for fine-tuning? Does the GRU perform better?
We also explored adding a single MLP layer (which is basically fine-tuning just the Transformer). In our experiments we found liGRU performs better than the MLP, at the cost of significantly longer training time (due to liGRU's recurrence).
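To make the trade-off concrete, here is a toy sketch of the two downstream heads (the dimensions are made up, and a plain nn.GRU stands in for liGRU, which PyTorch does not provide built-in):

```python
import torch.nn as nn

FEAT_DIM, N_CLASSES = 768, 41  # hypothetical TERA feature dim / phone classes

# MLP head: a per-frame projection -- cheap and fully parallel over time.
mlp_head = nn.Linear(FEAT_DIM, N_CLASSES)

# Recurrent head: adds temporal modeling on top of the extracted features,
# but its step-by-step recurrence makes training much slower. (liGRU differs
# from a plain GRU by dropping the reset gate and using ReLU + batch norm.)
class RNNHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(FEAT_DIM, 512, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * 512, N_CLASSES)

    def forward(self, feats):  # feats: (batch, time, FEAT_DIM) from TERA
        hidden, _ = self.rnn(feats)
        return self.proj(hidden)
```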
> - You also mentioned in the paper that not freezing TERA during fine-tuning performs better. By not freezing, do you mean you don't freeze it from the beginning, or that you freeze TERA for a while and then unfreeze it in later epochs?
We don't freeze it from the beginning; we did not explore unfreezing schedules.
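Concretely, "not freezing" just means the upstream parameters receive gradients from the first fine-tuning step. A sketch with placeholder modules (not our exact training script):

```python
import torch
import torch.nn as nn

upstream = nn.Linear(80, 768)    # placeholder for the pre-trained TERA model
downstream = nn.Linear(768, 41)  # placeholder for the downstream head

# One optimizer updates upstream and downstream weights jointly from epoch 0.
optimizer = torch.optim.Adam(
    list(upstream.parameters()) + list(downstream.parameters()), lr=1e-4)

# The frozen alternative would instead stop gradients into the upstream:
# for p in upstream.parameters():
#     p.requires_grad = False
```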
> - To understand your TERA code, which files should I look into? They seem to be nested quite deeply.
The code for the input alterations (time + freq + mag) can be found in transformer/mam.py, and the model architecture in transformer/model.py.
> Thx!
I hope this helps!
Thx for the reply!
Also, for the alteration strategy, why did you ignore C in Fig. 2? And how long does it typically take to pre-train + fine-tune with your code?
> For the alteration strategy, why did you ignore C in Fig. 2?
It is NOT ignored; it is incorporated into the time alteration objective (80% mask / 10% replace / 10% clean). Fig. 2C is the 10% replacement, implemented in transformer/mam.py.
> And how long does it typically take to pre-train + fine-tune with your code?
It depends on your GPU and the amount of pre-training data. In my case, 100 hrs of pre-training data takes a day to pre-train, 460 hrs takes around 3 days, and 960 hrs takes around 5 days. Fine-tuning with the liGRU takes a week; with the MLP it takes only a couple of hours (less than a day).
This is with one 1080Ti GPU?
> This is with one 1080Ti GPU?
Yes!
When you evaluate WER, do you consider 'color' and 'colour' to be the same word? That is, American English vs. British English spellings?
> When you evaluate WER, do you consider 'color' and 'colour' to be the same word? That is, American English vs. British English spellings?
Since all the labeled data come from LibriSpeech, no, we do not treat American and British spellings as the same word. (FYI, we use the standard Kaldi pipeline for evaluating WER.)
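For reference, WER compares words as exact strings, so spelling variants count as substitutions. A minimal illustration of the (S + D + I) / N formula (just an example, not the Kaldi scoring script):

```python
def wer(ref: list[str], hyp: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref)."""
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # sub/match
                          d[i - 1][j] + 1,                                # deletion
                          d[i][j - 1] + 1)                                # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the color is red".split(), "the colour is red".split()))  # 0.25
```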