Comments (10)
Is this the right approach?
from multilingual_text_to_speech.
I noticed that the code is set to overwrite a checkpoint's params when an explicit params file is given.
So I'm trying:
python train-ga.py --checkpoint generated_switching --hyper_parameters generated_switching_cherokee6 --accumulation_size 5
after making sure the alphabets and languages from the checkpointed version are appended to the versions in the new params file.
Ah, I am sorry for the late response, I forgot ...
Please include instructions on how to resume training starting with your 70k iteration weights.
Is this the right approach?
https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html
These are just weights and not checkpoints (so it is missing optimizer-related things and so on), but you can use them for initialization. Look at these lines. The last four lines are not relevant in this case, so you can remove them.
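A minimal sketch of that warm-start pattern, following the PyTorch recipe linked above (the model here is a toy placeholder, not the repository's actual class, and `weights.pt` is an assumed path): since the released file is a weights-only state_dict, you load it into a freshly built model and start training with a fresh optimizer, because there is no optimizer state to restore.

```python
import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for the real model class; the repository's
    # architecture and checkpoint layout will differ.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))


# Pretend this is the released weights-only file (no optimizer state).
pretrained = make_model()
torch.save(pretrained.state_dict(), "weights.pt")

# Warm start: construct a fresh model, load only the weights, and
# create a brand-new optimizer, since a weights-only file carries no
# optimizer, scheduler, or epoch information.
model = make_model()
model.load_state_dict(torch.load("weights.pt", map_location="cpu"))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```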
Would it be possible to add additional languages as part of a fine tuning process?
I originally wanted to include the "fine-tuning" feature, but the code became very complicated and I actually did not need it for my experiments. I removed all the code related to fine-tuning in this commit 6c603ef. Check out the train.py file.
The typical use case is probably that you fine-tune the multilingual model to a single new language or speaker. Things are complicated because you have to make sure that the alphabet, speakers, etc. match, and decide what to do if they do not (which approach to initialization to take, and so on). In the case of the generated model, you also (IMHO) want to freeze all the encoder parameters and fine-tune just the language and speaker embeddings, and maybe also the decoder; but in the case of the other models supported by the code, you want to freeze or train different parts ...
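A hedged sketch of that selective freezing (the module names `encoder`, `language_embedding`, `speaker_embedding`, and `decoder` are illustrative, not the repository's exact attributes): switch off gradients on the frozen parts and hand only the still-trainable parameters to the optimizer.

```python
import torch
import torch.nn as nn


class TTSModel(nn.Module):
    # Toy model with the rough structure described above; names are
    # placeholders for the real submodules.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(32, 64)               # to be frozen
        self.language_embedding = nn.Embedding(10, 8)  # fine-tuned
        self.speaker_embedding = nn.Embedding(50, 8)   # fine-tuned
        self.decoder = nn.Linear(80, 80)               # optionally fine-tuned


model = TTSModel()

# Freeze all encoder parameters so gradients are never computed for them.
for p in model.encoder.parameters():
    p.requires_grad = False

# Build the optimizer over only the parameters that still require grad.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```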
> These are just weights and not checkpoints (so it is missing optimizer-related things and so on), but you can use them for initialization. Look at these lines. The last four lines are not relevant in this case, so you can remove them.
So, I can add a CLI option ("--with__weights" or similar) to load the weights, but otherwise do everything as for a new model?
If yes, would there be any advantage in starting with the previous parameters and then adding the additional language, so that everything stays in the same embedding order?
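One way to keep the old embedding indices valid (a sketch, assuming alphabets are ordered lists of symbols, as the params files suggest): preserve the checkpoint's symbol order exactly and append only the genuinely new symbols at the end, so that row i of the checkpointed character embedding still refers to the same symbol.

```python
def merge_alphabets(old_alphabet, new_alphabet):
    """Keep the checkpoint's symbol order; append new symbols at the end.

    Index i of the merged alphabet then still equals old_alphabet[i] for
    all old symbols, so only the appended embedding rows need fresh
    initialization.
    """
    seen = set(old_alphabet)
    merged = list(old_alphabet)
    for symbol in new_alphabet:
        if symbol not in seen:
            seen.add(symbol)
            merged.append(symbol)
    return merged


merged = merge_alphabets(["a", "b", "c"], ["b", "x", "a", "y"])
# merged == ["a", "b", "c", "x", "y"]; indices 0-2 are unchanged.
```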
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi,
Just wanted to know if there has been any movement on this, and if there's a clearer path to fine-tuning the model with new languages / speakers now?
For example, if I wanted to add support for English without having to re-train, what parameters would I have to freeze / train to enable this?
Thanks!
Hello, I am sorry guys, no movement. The training script is also not very fine-tuning friendly.
Thanks for the reply!
I've been trying to adapt the current code for fine-tuning on the LJSpeech dataset, i.e., adding support for English and for the LJSpeech speaker.
My approach currently involves freezing all parameters of the character encoder using param.requires_grad=False and training only the language encoder and the speaker encoder. Since there is only one speaker in the LJSpeech dataset, I have even set multi_speaker to False to turn off the adversarial speaker classifier.
My model has been training for around 2 days (150 epochs on only the LJSpeech dataset), and while speech is starting to be generated in the LJSpeech speaker's voice, the model appears to have lost all information about the other speakers. Consequently, feeding in any speaker id produces speech only in the LJSpeech speaker's voice.
Does this approach seem right to you?
Oh, interesting!
Just to clarify ... are you using GeneratedConvolutionalEncoder as the encoder? If so, how did you add English? Did you make the inner embedding bigger and trainable while fixing the rest of the encoder parameters?
Also, how do you load the pre-trained model, and how do you treat the speaker embeddings? Because if you set multi_speaker=False, the checkpoint has some extra parameters (and maybe the decoder expects larger inputs?).
Fixing the decoder seems OK, but you cannot expect the resulting voice to exactly match Linda's. Maybe you can try fine-tuning it too, but with a lower learning rate.
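When the saved state_dict and the new model disagree like that (extra speaker-classifier parameters in the checkpoint, new embeddings in the model), PyTorch's `load_state_dict(..., strict=False)` loads the overlapping keys and reports the mismatches instead of raising. A sketch with toy modules (the submodule names are illustrative, not the repository's):

```python
import torch
import torch.nn as nn

# "old" mimics the checkpointed model, "new" the fine-tuning model.
old = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "speaker_classifier": nn.Linear(8, 4),  # exists only in the checkpoint
})
new = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "language_embedding": nn.Embedding(5, 8),  # exists only in the new model
})

# strict=False copies the matching keys (here, the encoder) and returns
# the lists of mismatched keys so you can inspect what was skipped.
result = new.load_state_dict(old.state_dict(), strict=False)
print(result.missing_keys)     # keys the new model has but the checkpoint lacks
print(result.unexpected_keys)  # checkpoint keys with no counterpart in the model
```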
Hi,
So unfortunately, our fine-tuning experiments didn't work out.
But we're trying another line of experiments in which we're attempting to get a single English speaker to speak in another language (say, German). In this case, since the use case involves only one English speaker, is it sufficient to train the model using English recordings of only the target speaker and German recordings of multiple other speakers? That is, am I right in concluding that recordings of multiple English speakers are unnecessary, since we wish to synthesise German speech in only one particular English voice?
Thanks!