adelacvg / ttts
Train the next generation of TTS systems.
License: Mozilla Public License 2.0
Please add a license. Keep in mind that the MPL (from Coqui) may impose restrictions on the licensing. Thank you!
Hello, thanks for your code contribution. I'm trying to train the GPT on my datasets and have some questions:
Hi, thanks for making this! Will you be sharing the pre-trained models (beta versions?) on Hugging Face?
Hello, I followed all your training steps, but the generated audio has serious pronunciation problems. It feels like something is wrong in the GPT part, but debugging hasn't revealed the cause.
Have you encountered this situation before, and how did you handle it?
text: "中部地区如江汉平原,当前冬小麦普遍到了拔节期,南方则进入孕穗期,这时候小麦幼穗对外界环境更加敏感,长时间降温甚至降雪,可能会形成局部冻害。"
results.zip
I used the training code from main, but it didn't restrict clips to a minimum of 4 s. May I ask what consideration the 4 s minimum audio length in the code is based on?
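For illustration, a minimal sketch of what a minimum-duration filter like the one asked about might look like. The names (`AudioClip`, `MIN_SECONDS`, `filter_short_clips`) are assumptions for this sketch, not identifiers from the ttts codebase:

```python
# Hypothetical minimum-duration filter sketching the 4 s cutoff
# discussed above. All names here are illustrative, not from ttts.
from dataclasses import dataclass

MIN_SECONDS = 4.0  # clips shorter than this are dropped from training


@dataclass
class AudioClip:
    path: str
    num_samples: int
    sample_rate: int

    @property
    def seconds(self) -> float:
        # duration in seconds = sample count / sample rate
        return self.num_samples / self.sample_rate


def filter_short_clips(clips, min_seconds=MIN_SECONDS):
    """Keep only clips at least `min_seconds` long."""
    return [c for c in clips if c.seconds >= min_seconds]


clips = [
    AudioClip("a.wav", 22050 * 3, 22050),  # 3 s -> dropped
    AudioClip("b.wav", 22050 * 5, 22050),  # 5 s -> kept
]
kept = filter_short_clips(clips)
```

A plausible rationale for such a cutoff is that very short clips give the autoregressive model too little acoustic context per example, but that is speculation pending the author's answer.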
Hi,
Is there a timeline on when this will support English?
Thank you!
Thank you for your amazing work! The result of this model sounds really promising. If possible, could you guide me on how to train this model in another language? Thank you in advance.
Using the master branch code and training on the AISHELL-3 dataset: with a single GPU the loss is normal, but with two GPUs the loss becomes NaN after a single backward pass.
Hi, thanks for your nice work on TTS. I'd like to know: have you trained this yourself, and how does the resulting TTS perform?
How long does it take to train a model from scratch, and how much dataset is needed?
Hello, I reached the diffusion step following your code, and then the following error appeared:
omegaconf.errors.ConfigAttributeError: Missing key timm_model_name , full_key: clip.vision_cfg.timm_model_name
I think this parameter (timm_model_name) is not set in the diffusion configuration, so I would like to ask which configuration your pretrained model is based on.
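One common workaround for a missing config key is to inject a default before the config is consumed. A plain-dict sketch of that idea follows; the sibling keys and the `None` default are assumptions for illustration, not the author's actual settings:

```python
# Hypothetical workaround sketch: add the missing key with a harmless
# default before the config is read. The nesting (clip.vision_cfg)
# mirrors the error message; the values are illustrative.
config = {
    "clip": {
        "vision_cfg": {
            "image_size": 224,  # illustrative existing key
        }
    }
}

# OmegaConf raises ConfigAttributeError on missing keys; the plain-dict
# analogue of the fix is to set a default before the key is accessed:
config["clip"]["vision_cfg"].setdefault("timm_model_name", None)

name = config["clip"]["vision_cfg"]["timm_model_name"]  # no longer raises
```

Whether `None` is an acceptable value for `timm_model_name` depends on the pretrained model's actual configuration, which only the author can confirm.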
I am currently retraining a 16 kHz HiFi-GAN, but with the latent as input the loss is difficult to converge. A human voice can be heard, but there is an electronic artifact in the sound. What could be the reason for this? Looking forward to your reply, thank you.
Hi,
I am trying to train the VQ-VAE on ~200 h of speech. It seems to work fine at first, but after ~50k steps the loss shoots up and then goes NaN. Any idea what the problem could be?
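A common mitigation for this kind of mid-training blow-up (not confirmed as the author's approach) is to clip gradients and skip optimizer steps whose loss is non-finite or exploding. A minimal pure-Python sketch of the skip logic, with illustrative names only:

```python
# Hypothetical NaN/explosion guard for a training loop. The helper
# name and threshold are illustrative, not part of the ttts codebase.
import math


def step_is_safe(loss_value: float, max_loss: float = 100.0) -> bool:
    """Reject NaN/Inf or runaway losses so the optimizer step is skipped."""
    return math.isfinite(loss_value) and loss_value < max_loss


# Simulated per-step losses around the point where training diverges:
losses = [2.1, 1.8, 350.0, float("nan")]
applied_steps = [l for l in losses if step_is_safe(l)]
```

In a real PyTorch loop the same guard would sit between `loss.backward()` and `optimizer.step()`, often alongside gradient-norm clipping; whether that addresses the VQ-VAE divergence here is an open question for the author.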
Hi,
Which dataset was used to train this model?
Thank you!
Hi @adelacvg
Can we use this kind of model for speech-to-speech (voice conversion)?
About the diffusion model input: why use the GPT latent?
If the VQ code were used as the diffusion model input instead, would something go wrong?
Compared with SPEAR-TTS, is this model's VQ code a semantic token?
Thanks