Comments (12)

Some-random commented on June 26, 2024

Also, can you provide us with a direct link to the pretrained Tera model trained on the 960 hours of LibriSpeech? There are just too many items in the Google Drive... And the N, F, NT naming convention is confusing...

andi611 commented on June 26, 2024

Hi,

Can we comment out all the other directories (like train_clean_360, train_other_500) in run.sh? It seems like doing the preprocessing on the whole 960 hours of data takes too much time.

Although I haven't tried, I assume you can.
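
For what it's worth, I haven't verified the structure of that run.sh, but if it iterates over the LibriSpeech subsets the way the standard Kaldi recipe does, restricting preprocessing to the smaller sets would look roughly like this (the loop shape and subset names are assumptions, not copied from the actual script):

# hypothetical excerpt of run.sh: comment out the subsets you don't need
# for part in dev_clean test_clean train_clean_100 train_clean_360 train_other_500; do
for part in dev_clean test_clean train_clean_100; do
    # feature extraction / fmllr steps for $part go here
    echo "preprocessing $part"
done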

We're only interested in using the pre-trained LibriSpeech model to fine-tune on librispeech-clean-100.

Since the fmllr transform is learned, it won't match the pre-trained LibriSpeech model. (Even if you preprocess the whole 960 hours, the transform will be different.)
In other words, the pre-trained fmllr models work best only with the fmllr data I have on Google Drive.
If you wish to use the pre-trained models, you will have to use that data (or re-train a model with your own data).

However, if you insist on fine-tuning the pre-trained model on other fmllr data,
I think it will still work, but with decreased performance.

Also, can you provide us with a direct link to the pretrained Tera model trained on the 960 hours of LibriSpeech?

-F: frequency masking
-N: noise alteration
-K: it just means Kaldi data
-NT: you can ignore this, it's for an ablation study.
You can use fmllrBase960-F-N-K-libri or fmllrBase960-F-K-libri; they both perform well.

Leeyouxie commented on June 26, 2024

Hi, Andi:
I am running the ASR task with the pre-trained Tera encoder. According to your second response, I have a question about a pre-trained model and its matched data. Does the pre-trained model from Google Drive work best on the LibriSpeech data processed following https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi?

andi611 commented on June 26, 2024

Hi, Andi:
I am running the ASR task with the pre-trained Tera encoder. According to your second response, I have a question about a pre-trained model and its matched data. Does the pre-trained model from Google Drive work best on the LibriSpeech data processed following https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi?

No, following https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi will learn a different fmllr transform from the one used for pre-training.
The "pre-trained model from Google Drive" works best on "the fmllr data from Google Drive".

Leeyouxie commented on June 26, 2024

Thanks for your reply. Can you tell me the difference between "the fmllr data from Google Drive" and the data produced by https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi?

andi611 commented on June 26, 2024

Thanks for your reply. Can you tell me the difference between "the fmllr data from Google Drive" and the data produced by https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi?

The difference is that the fmllr transform is learned (trained by maximizing likelihood), hence the learned transform will not be exactly the same.
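
For intuition, here is the textbook fMLLR (feature-space MLLR) formulation, which is standard and not specific to this repo: per speaker, an affine transform of the features

    \hat{x}_t = A x_t + b

is estimated by maximizing the likelihood of the transformed features under the existing HMM-GMM acoustic model,

    (A, b) = \arg\max_{A, b} \sum_t \log p(A x_t + b \mid \lambda) + T \log |\det A|

where \lambda is the acoustic model and T is the number of frames. Since the solution depends on both the data and the model used for alignment, two preprocessing runs generally yield different transforms, and hence differently distributed features.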

Leeyouxie commented on June 26, 2024

Thanks, I got it! I ran the ASR task successfully using the pre-trained Tera encoder, but I got 16.7 WER on the TIMIT fmllr dataset. Compared with the 12th row of Table VIII (14.5 WER), there is an obvious gap in the WER score.

1. I followed the steps in https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/wiki/Extracting-with-Kaldi to preprocess the data.
2. I downloaded the fmllrBase960-F-N-K-libri ckpt from your Google Drive.
3. I ran the script: python run_exp.py cfg/timit_transformer_liGRU_fmllr.cfg

Is there something wrong with what I did?

andi611 commented on June 26, 2024

The config file is slightly different. I've uploaded my experiment results to Google Drive; they contain the config file conf.cfg I used for training and the res.res file that shows the 14.5% WER.

The main differences are:

1. A LIN network is used before the pre-trained model:
[architecture1]
arch_name = lin
arch_proto = proto/Lin.proto
arch_library = nn_transformer
arch_class = LIN
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

2. Another MLP network is used after the pre-trained model and before the liGRU:

[architecture3]
arch_name = MLP_layers_in
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = 640
dnn_drop = 0.2
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = True
dnn_use_laynorm = False
dnn_act = leaky_relu
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

Hence the whole model is as follows (fmllr features → LIN → pre-trained Transformer → input MLP → liGRU → two MLP output heads, whose monophone and context-dependent NLL losses are summed):

[model]
model_proto = proto/model.proto
model = fea_mock=compute(lin,fmllr)
	fea_mock=compute(TRANSFORMER_AM,fea_mock)
	fea_mock=compute(MLP_layers_in,fea_mock)
	out_dnn1=compute(liGRU_layers,fea_mock)
	out_dnn2=compute(MLP_layers1,out_dnn1)
	out_dnn3=compute(MLP_layers2,out_dnn1)
	loss_mono=cost_nll(out_dnn3,lab_mono)
	loss_mono_w=mult_constant(loss_mono,1.0)
	loss_cd=cost_nll(out_dnn2,lab_cd)
	loss_final=sum(loss_cd,loss_mono_w)
	err_final=cost_err(out_dnn2,lab_cd)

Leeyouxie commented on June 26, 2024

The config file is slightly different. I've uploaded my experiment results to Google Drive; they contain the config file conf.cfg I used for training and the res.res file that shows the 14.5% WER.

The main differences are:

1. A LIN network is used before the pre-trained model:
[architecture1]
arch_name = lin
arch_proto = proto/Lin.proto
arch_library = nn_transformer
arch_class = LIN
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

2. Another MLP network is used after the pre-trained model and before the liGRU:

[architecture3]
arch_name = MLP_layers_in
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = 640
dnn_drop = 0.2
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = True
dnn_use_laynorm = False
dnn_act = leaky_relu
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

Hence the whole model is as follows (fmllr features → LIN → pre-trained Transformer → input MLP → liGRU → two MLP output heads, whose monophone and context-dependent NLL losses are summed):

[model]
model_proto = proto/model.proto
model = fea_mock=compute(lin,fmllr)
	fea_mock=compute(TRANSFORMER_AM,fea_mock)
	fea_mock=compute(MLP_layers_in,fea_mock)
	out_dnn1=compute(liGRU_layers,fea_mock)
	out_dnn2=compute(MLP_layers1,out_dnn1)
	out_dnn3=compute(MLP_layers2,out_dnn1)
	loss_mono=cost_nll(out_dnn3,lab_mono)
	loss_mono_w=mult_constant(loss_mono,1.0)
	loss_cd=cost_nll(out_dnn2,lab_cd)
	loss_final=sum(loss_cd,loss_mono_w)
	err_final=cost_err(out_dnn2,lab_cd)

Hi, I ran the ASR again using your provided config file, but I still got 16.3 WER. I found that architecture2's arch_library is nn_mockingjay in your config; I used nn_transformer instead of nn_mockingjay. The following is my config file:

timit_transformer_liGRU_fmllr.cfg.zip

I wonder whether there is something wrong with my config file.

andi611 commented on June 26, 2024

Hi,

nn_mockingjay is the naming from my previous code,
which has since been renamed to nn_transformer; they are the same, so please ignore the difference.

Try using this ckpt instead (the one I used):
ckpt_file = /xxx/liwubo/Self-Supervised-Speech-Pretraining-and-Representation-Learning/ckpts/fmllrBase960-K-libri/states-1000000.ckpt
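
For reference, the checkpoint is wired in through the [architecture2] section of the cfg. A minimal sketch of the relevant lines follows; it shows only the keys discussed in this thread, the path is a placeholder for your local copy, and all other keys should stay as in the uploaded conf.cfg:

[architecture2]
# the name referenced by the [model] section (compute(TRANSFORMER_AM, ...))
arch_name = TRANSFORMER_AM
# formerly nn_mockingjay; renamed to nn_transformer (same code)
arch_library = nn_transformer
# placeholder path: replace with your downloaded fmllrBase960-K-libri checkpoint
ckpt_file = /path/to/ckpts/fmllrBase960-K-libri/states-1000000.ckpt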

Leeyouxie commented on June 26, 2024

Hi, Andi. I downloaded the fmllrBase960-K-libri ckpt from Google Drive and got a slight improvement (16.8 to 15.8). I found that some dimensions are different in the config file. The following shows the config file saved in exp/xxx/conf.cfg.
conf.cfg.zip

Thanks!

andi611 commented on June 26, 2024

Hi, Andi. I downloaded the fmllrBase960-K-libri ckpt from Google Drive and got a slight improvement (16.8 to 15.8). I found that some dimensions are different in the config file. The following shows the config file saved in exp/xxx/conf.cfg.
conf.cfg.zip

Thanks!

Hi, do you mean the dimensions of [architecture5] or [architecture6]?
Those are controlled by PyTorch-Kaldi; I'm not very sure why it does that.
