Comments (10)

michael-conrad avatar michael-conrad commented on May 28, 2024

Is this the right approach?

https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html

from multilingual_text_to_speech.

michael-conrad avatar michael-conrad commented on May 28, 2024

I noticed that the code is set to overwrite a checkpoint's params when an explicit param file is given.

So I'm trying:

python train-ga.py --checkpoint generated_switching --hyper_parameters generated_switching_cherokee6 --accumulation_size 5

after making sure the alphabets and languages from the checkpointed version are appended to the versions in the new params file.

Tomiinek avatar Tomiinek commented on May 28, 2024

Ah, I am sorry for the late response, I forgot ...

Please include instructions on how to resume training starting with your 70k iteration weights.
Is this the right approach?
https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html

These are just weights and not checkpoints (so it is missing optimizer-related things and so on), but you can use them for initialization. Look at these lines. The last four lines are not relevant in this case, so you can remove them.
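The warm-start the tutorial above describes can be sketched roughly like this; `TinyModel` and the file name are hypothetical stand-ins for the repo's actual model class and the published 70k-iteration weight file:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's model class; the real code builds
# the model from hyper-parameters before calling load_state_dict.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 8)
        self.decoder = nn.Linear(8, 4)

pretrained = TinyModel()
torch.save(pretrained.state_dict(), "weights.pt")  # stands in for the published weights

# Warm start: restore only the weights. A full checkpoint would also carry
# optimizer state; plain weights do not, so a fresh optimizer is created.
model = TinyModel()
model.load_state_dict(torch.load("weights.pt", map_location="cpu"))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Training then proceeds as for a new model, just with initialized weights instead of random ones.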

Would it be possible to add additional languages as part of a fine tuning process?

I originally wanted to include the "fine-tuning" feature, but the code became very complicated and I actually did not need it for my experiments. I removed all the code related to fine-tuning in this commit 6c603ef. Check out the train.py file.

The typical use case is probably that you fine-tune the multilingual model to a single new language or speaker. Things are complicated because you have to make sure that the alphabet, speakers, etc. match and decide what to do if they do not (which approach to initialization to use, etc.). In the case of the generated model, you also (IMHO) want to freeze all the encoder parameters and fine-tune just the language and speaker embeddings, and maybe also the decoder, but in the case of other models supported by the code, you would want to freeze or train different parts ...

michael-conrad avatar michael-conrad commented on May 28, 2024

These are just weights and not checkpoints (so it is missing optimizer-related things and so on), but you can use them for initialization. Look at these lines. The last four lines are not relevant in this case, so you can remove them.

So, I can add a CLI option like "--with__weights" or similar, load the weights, but otherwise do everything as for a new model?

If yes, would there be any advantage in starting with the previous parameters and then adding the additional language, so that everything stays in the same order of embeddings?
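For what it's worth, keeping the pre-trained rows in their original order while appending a new language could look roughly like this (the embedding sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Sketch: grow a language embedding by one new language while keeping
# the pre-trained rows at their original indices (hypothetical sizes).
old_emb = nn.Embedding(10, 32)            # 10 pre-trained languages
new_emb = nn.Embedding(11, 32)            # 10 old + 1 new

with torch.no_grad():
    new_emb.weight[:10] = old_emb.weight  # copy old rows; row 10 keeps random init
```

This way any language id that was valid for the old model still selects the same embedding in the new one.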

stale avatar stale commented on May 28, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

padmanabhankrishnamurthy avatar padmanabhankrishnamurthy commented on May 28, 2024

Hi,

Just wanted to know if there has been any movement on this, and if there's a clearer path to fine-tuning the model with new languages/speakers now?

For example, if I wanted to add support for English without having to re-train, what parameters would I have to freeze / train to enable this?

Thanks!

Tomiinek avatar Tomiinek commented on May 28, 2024

Hello, I am sorry guys, no movement. The training script is also not very fine-tuning friendly 😔

padmanabhankrishnamurthy avatar padmanabhankrishnamurthy commented on May 28, 2024

Thanks for the reply!

I've been trying to adapt the current code for fine-tuning on the LJSpeech dataset, i.e., adding support for English and for the LJSpeech speaker.

My approach currently involves freezing all parameters of the character encoder using param.requires_grad = False and training only the language encoder and the speaker encoder. Since there is only one speaker in the LJSpeech dataset, I have even set multi_speaker to False to turn off the adversarial speaker classifier. My model has been training for around two days (150 epochs on only the LJSpeech dataset), and while speech is starting to be generated in the LJSpeech speaker's voice, the model appears to have lost all information about the other speakers. Consequently, feeding in any speaker id produces speech only in the LJSpeech speaker's voice.
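The freezing step described above can be sketched as follows; the module layout and names are hypothetical, not the repo's actual ones:

```python
import torch
import torch.nn as nn

# Hypothetical module layout standing in for the real model.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),                 # character encoder (to freeze)
    "language_embedding": nn.Embedding(5, 8),   # to train
    "speaker_embedding": nn.Embedding(3, 8),    # to train
})

# Freeze the character encoder; leave everything else trainable.
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith("encoder")

# Hand the optimizer only the trainable parameters, so the frozen
# encoder accumulates no gradients and no optimizer state.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Filtering the parameter list passed to the optimizer (rather than relying on requires_grad alone) also keeps the optimizer from allocating state for the frozen weights.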

Does this approach seem right to you?

Tomiinek avatar Tomiinek commented on May 28, 2024

Oh, interesting!

Just to clarify ... are you using GeneratedConvolutionalEncoder as the encoder? If so, how did you add English? Did you make the inner embedding bigger and trainable while fixing the rest of the encoder parameters?
Also, how do you load the pre-trained model and treat the speaker embeddings? Because if you set multi_speaker=False, the checkpoint has some extra parameters (and maybe the decoder expects larger inputs?).

Fixing the decoder seems OK, but you cannot expect that the resulting voice will exactly match Linda's. Maybe you can try fine-tuning it too, but with a lower learning rate.
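One way to deal with a checkpoint that carries extra parameters (such as the adversarial speaker classifier) is PyTorch's strict=False loading, which reports mismatched keys instead of raising. The module names below are made up for illustration:

```python
import torch
import torch.nn as nn

# A checkpoint saved with multi_speaker=True has parameters (here a fake
# "speaker_classifier") that a multi_speaker=False model lacks.
full = nn.ModuleDict({
    "decoder": nn.Linear(8, 8),
    "speaker_classifier": nn.Linear(8, 3),
})
slim = nn.ModuleDict({"decoder": nn.Linear(8, 8)})

# strict=False loads the matching keys and returns the rest for inspection.
result = slim.load_state_dict(full.state_dict(), strict=False)
# result.unexpected_keys now lists the ignored classifier parameters.
```

This only solves key mismatches; if the decoder's input size differs between the two configurations, the shapes themselves will still conflict and need explicit handling.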

padmanabhankrishnamurthy avatar padmanabhankrishnamurthy commented on May 28, 2024

Hi,

So unfortunately, our fine-tuning experiments didn't work out.
But we're trying another line of experiments in which we're attempting to get a single English speaker to speak in another language (say, German). In this case, since the use case involves only one English speaker, is it sufficient to train the model on English recordings of only the target speaker and German recordings of multiple other speakers? That is, am I right in concluding that recordings of multiple English speakers are unnecessary, since we wish to synthesise German speech in only one particular English voice?

Thanks!
