
Comments (5)

chevalierNoir commented on August 19, 2024

Hi,

In LRS3 pretraining, the model is trained for 400K steps. The number of finetuning steps depends on the task and is 30K for VSR and AVSR with 30 hours of labeled data. You can find these numbers in the optimization.max_update field of the corresponding configuration files. If you use 8 GPUs, you can simulate the 32-GPU pretraining by appending optimization.update_freq=[4], which makes the effective batch size 4 times larger.
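As a rough sketch of that arithmetic (the per-GPU batch value below is an arbitrary placeholder, not a value from the configs; only the ratio matters):

```python
# Sketch: why optimization.update_freq=[4] on 8 GPUs approximates
# the 32-GPU pretraining setup. Per-GPU batch size is a placeholder.
per_gpu_batch = 1000                 # e.g. frames per GPU per step (illustrative)
n_gpus, update_freq = 8, 4           # 8 GPUs, gradients accumulated over 4 steps

effective_batch = per_gpu_batch * n_gpus * update_freq
reference_batch = per_gpu_batch * 32 * 1   # 32 GPUs, no accumulation
assert effective_batch == reference_batch  # same effective batch per update

# optimization.max_update is unchanged (400K for pretraining, 30K for the
# 30h VSR/AVSR fine-tuning); each update just consumes 4x more data.
```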


chevalierNoir commented on August 19, 2024

Hi,

Regarding the inferior performance of AV-HuBERT compared to audio-HuBERT on ASR in the clean setting: we also noticed this phenomenon in our paper (last paragraph of Section 4.5) and attribute it to the hyperparameters being selected based on lip-reading rather than ASR. For Table 4, we trained an audio-HuBERT for one iteration using the clusters from the last-iteration AV-HuBERT.

  1. In our previous experiments, we noticed that setting different modality dropout values in the first 4 iterations does not have as large an impact as it does on the last iteration. We will re-check those values in our original configurations for the first 4 iterations on LRS3.
  2. mask_prob_image in the config file is the probability of each frame being masked, which equals $p\times l$ in the paper. Note that $p$ in the paper is the probability of one frame being selected as the start of a masked span (see the sketch below).
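As a rough illustration of that relation (the numeric values below are placeholders rather than the repo's actual config defaults, and overlap between masked spans is ignored):

```python
# Sketch: relating mask_prob_image (config) to p and l (paper).
# Values are illustrative placeholders, not the actual AV-HuBERT defaults.
mask_prob_image = 0.30   # probability of each frame being masked (config field)
mask_length = 5          # l: length of each masked span, in frames

# p: probability of a frame being selected as the start of a masked span.
# Ignoring span overlaps, the expected masked fraction is p * l.
p = mask_prob_image / mask_length
print(f"p = {p:.3f}, p * l = {p * mask_length:.2f}")  # p * l equals mask_prob_image
```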


li563042811 commented on August 19, 2024

Thank you for your reply. I think the reason the AV-HuBERT I trained didn't surpass the A-HuBERT in your paper may be that I used 8 A100 GPUs for pretraining and finetuning without changing batch_size or max_tokens.
In your paper you used 32 GPUs, so my training may have run for fewer epochs than yours. Could you tell me how many epochs your AV-HuBERT pretraining and finetuning run for?


li563042811 commented on August 19, 2024

Hi,
I pretrained AV-HuBERT on LRS3 with optimization.update_freq=[4] appended, still using 8 GPUs. The epoch count changed from 69 to 275. The 1st iteration took 102 hours, and after finetuning on the 30h data the clean AV WER came to 12.56, compared to 15.88 with optimization.update_freq=[1].
I didn't change any other hyperparameters.
The pretrain config file is avhubert/conf/pretrain/base_lrs3_iter1.yaml with optimization.update_freq=[4] appended.
The finetune config file is avhubert/conf/av-finetune/base_noise_pt_noise_ft_30h.yaml with optimization.update_freq=[4] appended.
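The epoch count is consistent with that change: with optimization.max_update fixed, each update consumes 4x more data, so roughly 4x more epochs are traversed. A quick sanity check using the numbers above:

```python
# Sanity check: a fixed number of updates with a 4x larger effective batch
# should traverse the dataset about 4x more times.
epochs_with_update_freq_1 = 69
update_freq = 4
print(epochs_with_update_freq_1 * update_freq)  # 276, close to the reported 275
```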

I have two questions:

  1. In your experiments, does the improvement seen in the 1st iteration also hold in the last iteration, with the same relative improvement?
  2. Could you give the WER of iterations 1-5 of your AV-HuBERT model pretrained on the 433h LRS3 data and fine-tuned on the 30h data?


chevalierNoir commented on August 19, 2024

Hi,

  1. We always used 32 GPUs for pretraining in each iteration, so we don't know the exact relative improvement of 32 GPUs over 8 GPUs, but my guess is that the gain will hold. The gain from using 32 GPUs is also observed in the original HuBERT paper (Figure 3).
  2. For AVSR/ASR, we only train the model for one iteration (the last iteration). For VSR (lip reading), we mostly use CTC fine-tuning for the first 4 iterations and use seq2seq fine-tuning only for the last iteration (where it outperforms CTC). The CTC numbers for each iteration are in the paper (first row of Table 2).

Note that for fine-tuning we always use 8 GPUs; we haven't tried 32 GPUs there.

