Hi! Tomiinek!! I trained tacotron with generated-trainin

[Question] WaveRNN Vocoder (multilingual_text_to_speech, closed)

tomiinek commented on May 24, 2024

[Question] WaveRNN Vocoder

from multilingual_text_to_speech.

Comments (6)

Tomiinek commented on May 24, 2024

Hello, I am sorry for the late response.

I also had problems training the model, and training was very slow. You could try a different vocoder such as MelGAN or WaveGlow (I do not know what the current state of the art is).
I have two questions. Do the training audio files all have the same scale, and are they normalized to the same volume? Are quant values up to 1000 OK?

sooftware commented on May 24, 2024

I'll check the audio scale. Do your Tacotron mel spectrograms use a frame length of 50 ms and a frame shift of 12.5 ms?
Also, is there a WaveRNN model trained on LJ Speech?

Thank you for your response.

sooftware commented on May 24, 2024

The audio scale is the same (up to 1000).

sooftware commented on May 24, 2024

Furthermore, if I feed the pre-generated mel spectrograms (from gta.py) into the WaveRNN you released, it produces no speech at all.

Tomiinek commented on May 24, 2024

> Is the frame length of your tacotron Mel spectrogram 50 ms and frame shift 12.5 ms?

Yes, it is.
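
For reference, the vocoder has to be configured with the same framing as the spectrograms it receives, expressed in samples rather than milliseconds. The snippet below is only a sketch of that conversion: the function name `frame_params` and the sample rates in the usage loop are illustrative assumptions, since the thread does not state which sample rate was used.

```python
def frame_params(sample_rate, frame_length_ms=50.0, frame_shift_ms=12.5):
    """Convert STFT frame length and frame shift from milliseconds to samples.

    The defaults are the values confirmed in this thread (50 ms / 12.5 ms).
    The sample rate is NOT given in the thread and must be supplied.
    """
    win_length = int(round(sample_rate * frame_length_ms / 1000.0))
    hop_length = int(round(sample_rate * frame_shift_ms / 1000.0))
    return win_length, hop_length


if __name__ == "__main__":
    # Assumed sample rates, for illustration only.
    for sr in (16000, 22050):
        win, hop = frame_params(sr)
        print(f"{sr} Hz: window = {win} samples, hop = {hop} samples")
```

If the hop length used to train the vocoder differs from the one used to extract the mel spectrograms, the upsampling inside the vocoder will not line up with the frames, which is one plausible cause of unintelligible output.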

> Also, is there a WaveRNN model trained on LJ Speech?

I think that it is in the original repository that I forked.

> The audio scale is the same (up to 1000).

OK, and is the scale still the same if you apply some volume normalization, such as `sox infile.wav outfile.wav gain -n -3`? I would really expect the audio scale to be in [-1, 1].
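
To make the check concrete, here is a minimal sketch of peak normalization in plain Python. It mimics what `gain -n -3` is understood to do (scale the signal so its peak sits at -3 dBFS), but it is not the sox implementation, and the helper names `peak_abs` and `normalize_peak` are made up for this example. Samples are plain lists of numbers to keep the snippet dependency-free; the same arithmetic applies to arrays.

```python
import math


def peak_abs(samples):
    """Return the largest absolute sample value (0.0 for empty input)."""
    return max((abs(s) for s in samples), default=0.0)


def normalize_peak(samples, target_db=-3.0):
    """Scale samples so that the peak amplitude equals target_db dBFS.

    Full scale (0 dBFS) is taken to be an amplitude of 1.0, so the
    result always lies inside [-1, 1] when target_db <= 0.
    """
    peak = peak_abs(samples)
    if peak == 0:
        # Silent input: nothing to scale.
        return [0.0 for _ in samples]
    target_amplitude = 10.0 ** (target_db / 20.0)
    gain = target_amplitude / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    # Hypothetical samples on the "up to 1000" scale mentioned above.
    raw = [0.0, 500.0, -1000.0, 250.0]
    scaled = normalize_peak(raw)
    print("peak before:", peak_abs(raw))
    print("peak after: ", peak_abs(scaled))
```

Running `peak_abs` over a few training files before and after normalization is a quick way to confirm whether the data really is on a [-1, 1] scale.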