It works fine on a single GPU, but fails on 3 GPUs; the only change is "max_gp


train failed on multiGPUs about multilingual_text_to_speech HOT 5 CLOSED

tomiinek commented on May 27, 2024

train failed on multiGPUs

from multilingual_text_to_speech.

Comments (5)

JoeyHeisenberg commented on May 27, 2024

Here is the result


Tomiinek commented on May 27, 2024

Hi, could you please provide the .json file with parameters you are using?

There is a mismatch in shapes. You are using the generated encoder, so the batch_size must be divisible by the number of languages you are training on times the number of GPUs. So let's say you are training on CSS10 with 10 languages and you want to use 3 GPUs; then set the batch_size parameter to 30, 60, 90, ...

Hope it helps 😁
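The divisibility rule above can be written as a quick check before launching a run. This is only a sketch: the function names are made up for illustration and are not part of the repository.

```python
def is_valid_batch_size(batch_size, num_languages, num_gpus):
    """Return True if batch_size is a multiple of num_languages * num_gpus."""
    return batch_size % (num_languages * num_gpus) == 0


def valid_batch_sizes(num_languages, num_gpus, count=3):
    """List the first `count` batch sizes that satisfy the rule."""
    step = num_languages * num_gpus
    return [step * k for k in range(1, count + 1)]


# CSS10 with 10 languages on 3 GPUs, as in the example above
print(valid_batch_sizes(10, 3))        # [30, 60, 90]
print(is_valid_batch_size(50, 10, 3))  # False
```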


JoeyHeisenberg commented on May 27, 2024

The results above were actually obtained with batch_size set to 60.

I tried setting it to 180, but it still failed:

Original Traceback (most recent call last):
  File "/data/glusterfs_speech_tts/public_data/11104653/tools/miniconda3/envs/envMTS/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/data/glusterfs_speech_tts/public_data/11104653/tools/miniconda3/envs/envMTS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/glusterfs_speech_tts/public_data/11104653/multiLingual_voice_cloning/Multilingual_Text_to_Speech/modules/tacotron2.py", line 364, in forward
    encoded = self._encoder(embedded, text_length, languages)
  File "/data/glusterfs_speech_tts/public_data/11104653/tools/miniconda3/envs/envMTS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/glusterfs_speech_tts/public_data/11104653/multiLingual_voice_cloning/Multilingual_Text_to_Speech/modules/encoder.py", line 208, in forward
    x = x.reshape(bs // self._groups, self._groups * self._input_dim, -1)
RuntimeError: shape '[3, 5120, -1]' is invalid for input of size 2994176
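The numbers in the RuntimeError can be checked by hand. The sketch below assumes that self._groups is 10 (one group per language being trained on), so that self._input_dim would be 5120 / 10 = 512, and that x has shape (bs, input_dim, time). Both are guesses from the traceback, not confirmed from encoder.py, and the helper name is hypothetical.

```python
def consistent_batch_sizes(total_elements, groups, grouped_channels, rows):
    """Find (bs, time_steps) pairs that match the failing reshape.

    Assumes x has shape (bs, input_dim, time_steps) and that
    rows == bs // groups, as in the reshape call from the traceback.
    """
    input_dim = grouped_channels // groups  # 5120 // 10 = 512 (assumed)
    matches = []
    # bs // groups == rows means bs lies in [rows*groups, (rows+1)*groups)
    for bs in range(rows * groups, (rows + 1) * groups):
        if total_elements % (bs * input_dim) == 0:
            matches.append((bs, total_elements // (bs * input_dim)))
    return matches


# Values taken from: shape '[3, 5120, -1]' is invalid for input of size 2994176
print(consistent_batch_sizes(2994176, groups=10, grouped_channels=5120, rows=3))
# [(34, 172)]
```

Under these assumptions, the only per-GPU batch consistent with the error holds 34 samples, which is not a multiple of 10. That would suggest an incomplete batch reached the encoder on each GPU, rather than the configured batch_size itself being wrong.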

Here is the .json:
{
  "balanced_sampling": true,
  "batch_size": 180,
  "case_sensitive": false,
  "characters": " abcdefghijklmnopqrstuvwxyzçèéßäöōǎǐíǒàáǔüèéìūòóùúāēěīâêôûñőűабвгдежзийклмнопрстуфхцчшщъыьэюяёάέήίαβγδεζηθικλμνξοπρςíστυφχψωόύώ",
  "checkpoint_each_epochs": 5,
  "dataset": "css10",
  "encoder_dimension": 256,
  "encoder_type": "generated",
  "epochs": 300,
  "generator_bottleneck_dim": 8,
  "generator_dim": 20,
  "languages": ["german", "french", "hungarian", "chinese", "spanish", "dutch", "finnish", "russian", "japanese", "greek"],
  "language_embedding_dimension": 32,
  "learning_rate": 0.001,
  "learning_rate_decay_each": 10000,
  "learning_rate_decay_start": 10000,
  "multi_language": true,
  "perfect_sampling": true,
  "predict_linear": false,
  "version": "GENERATED-TRAINING"
}


Tomiinek commented on May 27, 2024

Oh, ok. I still can't figure out where the problem is 😢

Could you please add print(x.shape) at line 197 of encoder.py and send me the output? I cannot test it myself right now.

If you want to make it work immediately, change line 229 in train.py from:

eval_sampler = PerfectBatchSampler(dataset.dev, hp.languages, hp.batch_size, data_parallel_devices=dp_devices, shuffle=False)

to:

eval_sampler = PerfectBatchSampler(dataset.dev, hp.languages, hp.batch_size, data_parallel_devices=dp_devices, shuffle=False, drop_last=True)

and it should work.
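To illustrate what drop_last=True generally does in a batch sampler, here is a minimal standalone sketch. It is not the repository's PerfectBatchSampler, and the dataset size of 462 is an invented number chosen only to produce a short final batch.

```python
def batch_indices(num_samples, batch_size, drop_last=False):
    """Split range(num_samples) into consecutive batches of indices.

    With drop_last=True, a final batch shorter than batch_size is discarded.
    """
    batches = [
        list(range(start, min(start + batch_size, num_samples)))
        for start in range(0, num_samples, batch_size)
    ]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# Hypothetical evaluation set of 462 samples with batch_size 180
print([len(b) for b in batch_indices(462, 180)])                  # [180, 180, 102]
print([len(b) for b in batch_indices(462, 180, drop_last=True)])  # [180, 180]
```

Without drop_last, the leftover batch of 102 samples would be split across 3 GPUs as 34 each, which is not divisible by 10 languages; dropping it keeps every batch at the full configured size.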


Tomiinek commented on May 27, 2024

Should be solved by 0401a6d. Thank you!
