
The phone accuracy of Mockingjay (s3prl issue, closed)

s3prl commented on July 24, 2024
The phone accuracy of Mockingjay

from s3prl.

Comments (4)

andi611 commented on July 24, 2024

Hi, the reason is that the features used to train Mockingjay do not match the phone labels (there is one label for every 10 ms, so you need to use features with windows of 25 ms and an overlap of 10 ms).

You can either:

  1. use a different preprocessing command that matches these phone labels: `python preprocess/preprocess_libri.py --feature_type=fbank --delta=False` (80-dim features). I recommend this approach.

  2. or evaluate on the Montreal phone set instead (which matches the features that you are using).

The accuracy of Mockingjay should be in the range of 64%~67% (on test-clean), depending on the amount of unlabeled data used in pre-training (see TABLE I of our paper).


developerFanYu commented on July 24, 2024

Thank you for your answer! But I still have two problems.

  1. First, I read the paper 'Montreal Forced Aligner: trainable text-speech alignment using Kaldi' and found that it uses a 25 ms window size and a 10 ms frame shift, which means a 15 ms overlap. Could you tell me whether I should use an overlap of 10 ms or 15 ms?

  2. Second, I modified 'audio.py' and set 'frame_length_ms=25, frame_shift_ms=10' to obtain mel160 features. Further, I set 'args.phone_set==montreal_phone'. When I use my pre-trained model, I cannot get the expected result. When I use 'frame_shift_ms=15', the result is also below 10%.

Commands:

`python run_upstream.py --run=transformer --config=config/config/mockingjay_libri_melBase.yaml --name=mockingjay_melBase`

`python run_downstream.py --run=phone_1hidden --upstream=transformer --ckpt=path_to_ckpt/states-500000.ckpt --phone_set=montreal_phone`

I have sent my configurations and results to your email. Could you give me some suggestions? Thank you very much!


developerFanYu commented on July 24, 2024

andi611 commented on July 24, 2024

> Thank you for your answer! But I still have two problems.
>
>   1. First, I read the paper 'Montreal Forced Aligner: trainable text-speech alignment using Kaldi' and found that it uses a 25 ms window size and a 10 ms frame shift, which means a 15 ms overlap. Could you tell me whether I should use an overlap of 10 ms or 15 ms?

The alignment produced by the Montreal Forced Aligner only gives the time interval of each phone (e.g. 1.4 s to 1.7 s).
Hence you need to compute how these intervals map onto each time frame.
This is done here in the preprocessing stage.
So the window size and frame shift are your choice,
but you need to re-run this script every time you change the settings in utility/audio.py.
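
To illustrate why the labels have to be regenerated whenever the framing changes, here is a minimal sketch of mapping phone intervals to per-frame labels. It is not the s3prl preprocessing script: the function name `intervals_to_frame_labels`, its parameters, the `"sil"` fill label, and the choice to label each frame by its centre time are all assumptions made for illustration.

```python
def intervals_to_frame_labels(intervals, num_frames,
                              frame_length_ms=25.0, frame_shift_ms=10.0,
                              fill="sil"):
    """Return one phone label per frame (hypothetical helper, not s3prl code).

    intervals: list of (start_s, end_s, phone) tuples, as an aligner might give.
    A frame is labelled with the phone whose interval contains the frame centre;
    frames not covered by any interval get the `fill` label.
    """
    labels = []
    for i in range(num_frames):
        # Centre of frame i in seconds: frame start plus half the window length.
        center_s = (i * frame_shift_ms + frame_length_ms / 2.0) / 1000.0
        label = fill
        for start_s, end_s, phone in intervals:
            if start_s <= center_s < end_s:
                label = phone
                break
        labels.append(label)
    return labels


# Toy alignment: "HH" for the first 50 ms, "AH" for the next 50 ms.
toy_intervals = [(0.0, 0.05, "HH"), (0.05, 0.10, "AH")]

# The same audio yields different label sequences for different frame shifts.
labels_shift_10 = intervals_to_frame_labels(toy_intervals, 8, frame_shift_ms=10.0)
labels_shift_15 = intervals_to_frame_labels(toy_intervals, 6, frame_shift_ms=15.0)
print(labels_shift_10)
print(labels_shift_15)
```

With a different frame shift, both the number of labels and the frame at which the phone changes are different, so labels built for one setting will not line up with features extracted under another.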

>   2. Second, I modified 'audio.py' and set 'frame_length_ms=25, frame_shift_ms=10' to obtain mel160 features. Further, I set 'args.phone_set==montreal_phone'. When I use my pre-trained model, I cannot get the expected result. When I use 'frame_shift_ms=15', the result is also below 10%.

You also need to change this line in your downstream.yaml to load Montreal phone labels:

  # phone_path: 'data/cpc_phone'
  phone_path: 'data/libri_phone'

These two problems are related; see here for more instructions.

I hope this helps!
Andy
