Dear Author : Hello ! I am a Master from xi’an jiaotong university, thanks for you


The phone accuracy of Mockingjay about s3prl HOT 4 CLOSED

s3prl commented on July 24, 2024

The phone accuracy of Mockingjay

from s3prl.

Comments (4)

andi611 commented on July 24, 2024

Hi, the reason is that the features used to train Mockingjay do not match the phone labels (there is one label for every 10 ms, so you need to use features with windows of 25 ms and an overlap of 10 ms).

You can either:

- use a different preprocessing script that matches these phone labels (`python preprocess/preprocess_libri.py --feature_type=fbank --delta=False # 80-dim`); I recommend this approach.
- or change to evaluate on the Montreal phone set (which matches the features that you are using).

The accuracy of Mockingjay should be in the range of 64%~67% (on test-clean), depending on the amount of unlabeled data used in pre-training (see TABLE I of our paper).
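To make the window, shift, and overlap arithmetic in this thread concrete, here is a minimal sketch. It is not code from s3prl: `num_frames` and `overlap_ms` are hypothetical helpers, and the frame count assumes no padding at the signal edges, which may differ from what the repository's feature extraction does.

```python
def num_frames(duration_ms: float, frame_length_ms: float, frame_shift_ms: float) -> int:
    """Count the full analysis windows that fit in the signal (assumes no padding)."""
    if duration_ms < frame_length_ms:
        return 0
    return 1 + int((duration_ms - frame_length_ms) // frame_shift_ms)


def overlap_ms(frame_length_ms: float, frame_shift_ms: float) -> float:
    """Overlap between consecutive windows is the window length minus the shift."""
    return frame_length_ms - frame_shift_ms


duration_ms = 1000.0  # a hypothetical 1-second utterance

# 25 ms window with a 10 ms shift: consecutive windows overlap by 15 ms.
frames_shift_10 = num_frames(duration_ms, 25.0, 10.0)
# 25 ms window with a 15 ms shift: consecutive windows overlap by 10 ms.
frames_shift_15 = num_frames(duration_ms, 25.0, 15.0)

print(frames_shift_10, overlap_ms(25.0, 10.0))
print(frames_shift_15, overlap_ms(25.0, 15.0))
```

The two settings give a different number of frames for the same audio, so labels generated for one frame shift cannot be paired one-to-one with features extracted at another.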


developerFanYu commented on July 24, 2024

Thank you for your answer! But I also have two problems.

1. First, I read the paper 'Montreal Forced Aligner: trainable text-speech alignment using Kaldi' and found that it uses a 25 ms window size and a 10 ms frame shift, which means a 15 ms overlap. Could you tell me whether I should use an overlap of 10 ms or 15 ms?
2. Second, I modified 'audio.py' and set 'frame_length_ms=25, frame_shift_ms=10' to obtain mel160 features. I also set 'args.phone_set==montreal_phone'. When I use my pre-trained model, I cannot get the expected result. When I use 'frame_shift_ms=15', the result is also below 10%.

Commands:

`python run_upstream.py --run=transformer --config=config/config/mockingjay_libri_melBase.yaml --name=mockingjay_melBase`

`python run_downstream.py --run=phone_1hidden --upstream=transformer --ckpt=path_to_ckpt/states-500000.ckpt --phone_set=montreal_phone`

I have sent my configurations and results to your email. Could you give me some suggestions? Thank you very much!


developerFanYu commented on July 24, 2024


------------------ Original Message ------------------
From: "andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning" <[email protected]>
Sent: Sunday, October 25, 2020, 7:12 PM
To: "andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning" <Self-Supervised-Speech-Pretraining-and-Representation-Learning@noreply.github.com>
Cc: "凡凡" <[email protected]>; "Author" <[email protected]>
Subject: Re: [andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning] The phone accuracy of Mockingjay (#44)

Hi, the reason is that the features used to train Mockingjay do not match the phone labels (there is one label for every 10 ms, so you need to use features with windows of 25 ms and an overlap of 10 ms). You can either use a different preprocessing script that matches these phone labels (`python preprocess/preprocess_libri.py --feature_type=fbank --delta=False # 80-dim`), or change to evaluate on the Montreal phone set (which matches the features that you are using).


andi611 commented on July 24, 2024

Thank you for your answer! But I also have two problems.

First, I read the paper 'Montreal Forced Aligner: trainable text-speech alignment using Kaldi' and found that it uses a 25 ms window size and a 10 ms frame shift, which means a 15 ms overlap. Could you tell me whether I should use an overlap of 10 ms or 15 ms?

The alignment from the Montreal Forced Aligner only gives a time interval for each phone (for example, 1.4 s to 1.7 s).
Hence you need to compute how these intervals map to every time frame.
This is done here in the preprocessing stage.
So the window size and frame shift are your choice,
but you need to re-run this script every time you change the settings in utility/audio.py.
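The interval-to-frame mapping described above can be sketched roughly as follows. This is an illustration only, not the repository's preprocessing script: `intervals_to_frame_labels` is a hypothetical helper, the example intervals are made up, and labelling each frame by the phone that covers the frame's centre is an assumed convention.

```python
def intervals_to_frame_labels(intervals, n_frames, frame_length_ms=25.0,
                              frame_shift_ms=10.0, fill="sil"):
    """Assign one phone label per frame from (start_s, end_s, phone) intervals.

    Assumption: a frame takes the phone whose interval contains the frame centre.
    Frames not covered by any interval get the `fill` label.
    """
    labels = []
    for i in range(n_frames):
        # Centre of frame i in seconds.
        center_s = (i * frame_shift_ms + frame_length_ms / 2.0) / 1000.0
        label = fill
        for start_s, end_s, phone in intervals:
            if start_s <= center_s < end_s:
                label = phone
                break
        labels.append(label)
    return labels


# Made-up aligner output: (start in seconds, end in seconds, phone).
intervals = [(0.00, 0.03, "sil"), (0.03, 0.06, "AH"), (0.06, 0.10, "T")]

labels_shift_10 = intervals_to_frame_labels(intervals, n_frames=8, frame_shift_ms=10.0)
labels_shift_15 = intervals_to_frame_labels(intervals, n_frames=6, frame_shift_ms=15.0)

print(labels_shift_10)
print(labels_shift_15)
```

The same intervals yield label sequences of different lengths for the two frame shifts, which is why the labels have to be regenerated whenever the feature settings change.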

Second, I modified 'audio.py' and set 'frame_length_ms=25, frame_shift_ms=10' to obtain mel160 features. I also set 'args.phone_set==montreal_phone'. When I use my pre-trained model, I cannot get the expected result. When I use 'frame_shift_ms=15', the result is also below 10%.

You also need to change this line in your downstream.yaml to load Montreal phone labels:

  # phone_path: 'data/cpc_phone'
  phone_path: 'data/libri_phone'

These two problems are related; see here for more instructions.

I hope this helps!
Andy

