adelacvg / ttts
Train the next generation of TTS systems.
License: Mozilla Public License 2.0
Please add a license. Keep in mind that the MPL (from Coqui) may impose restrictions on the licensing. Thank you!
Hello, thanks for your code contribution. I'm trying to train the GPT on my datasets and have some questions:
Hi, thanks for making this! Will you be sharing the pre-trained models (beta versions?) on Hugging Face?
Hello, I followed all your training steps, but the generated audio has serious pronunciation problems. It feels like something is wrong in the GPT part, but debugging hasn't revealed the cause.
Have you encountered this situation before, and how did you handle it?
text: "中部地区如江汉平原,当前冬小麦普遍到了拔节期,南方则进入孕穗期,这时候小麦幼穗对外界环境更加敏感,长时间降温甚至降雪,可能会形成局部冻害。"
results.zip
I used the training code from main, but it didn't restrict clips to a minimum of 4 s. May I ask what consideration the 4 s minimum audio length in the code is based on?
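For illustration, a minimal sketch of what a minimum-duration filter like the one asked about might look like. The names (`AudioClip`, `MIN_SECONDS`, `filter_short_clips`) are assumptions for this sketch, not identifiers from the ttts codebase:

```python
# Hypothetical minimum-duration filter sketching the 4 s cutoff
# discussed above. All names here are illustrative, not from ttts.
from dataclasses import dataclass

MIN_SECONDS = 4.0  # clips shorter than this are dropped from training


@dataclass
class AudioClip:
    path: str
    num_samples: int
    sample_rate: int

    @property
    def seconds(self) -> float:
        # duration in seconds = sample count / sample rate
        return self.num_samples / self.sample_rate


def filter_short_clips(clips, min_seconds=MIN_SECONDS):
    """Keep only clips at least `min_seconds` long."""
    return [c for c in clips if c.seconds >= min_seconds]


clips = [
    AudioClip("a.wav", 22050 * 3, 22050),  # 3 s -> dropped
    AudioClip("b.wav", 22050 * 5, 22050),  # 5 s -> kept
]
kept = filter_short_clips(clips)
```

A plausible rationale for such a cutoff is that very short clips give the autoregressive model too little acoustic context per example, but that is speculation pending the author's answer.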
Hi,
Is there a timeline on when this will support English?
Thank you!
Thank you for your amazing work! The result of this model sounds really promising. If possible, could you guide me on how to train this model in another language? Thank you in advance.
Using the master branch code and training on the AISHELL-3 dataset: with a single GPU the loss is normal, but with two GPUs the loss becomes NaN after a single backward pass.
Hi, thanks for your nice work on TTS. I'd like to know: have you trained this yourself, and how does the resulting TTS perform?
How long does it take to train a model from scratch, and how much dataset is needed?
Hello, I reached the diffusion step following your code, and then the following error appeared:
omegaconf.errors.ConfigAttributeError: Missing key timm_model_name , full_key: clip.vision_cfg.timm_model_name
I think this parameter (timm_model_name) is not set in the diffusion configuration, so I would like to ask which configuration your pretrained model is based on.
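One common workaround for a missing config key is to inject a default before the config is consumed. A plain-dict sketch of that idea follows; the sibling keys and the `None` default are assumptions for illustration, not the author's actual settings:

```python
# Hypothetical workaround sketch: add the missing key with a harmless
# default before the config is read. The nesting (clip.vision_cfg)
# mirrors the error message; the values are illustrative.
config = {
    "clip": {
        "vision_cfg": {
            "image_size": 224,  # illustrative existing key
        }
    }
}

# OmegaConf raises ConfigAttributeError on missing keys; the plain-dict
# analogue of the fix is to set a default before the key is accessed:
config["clip"]["vision_cfg"].setdefault("timm_model_name", None)

name = config["clip"]["vision_cfg"]["timm_model_name"]  # no longer raises
```

Whether `None` is an acceptable value for `timm_model_name` depends on the pretrained model's actual configuration, which only the author can confirm.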
I am currently retraining a 16 kHz HiFi-GAN, but with the latent as input the loss is difficult to converge. A human voice can be heard, but there is an electronic artifact in the sound. What could be the reason for this? Looking forward to your reply, thank you.
Hi,
I am trying to train the VQ-VAE on ~200 h of speech. It seems to work fine at first, but after ~50k steps the loss shoots up and then goes NaN. Any idea what the problem could be?
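A common mitigation for this kind of mid-training blow-up (not confirmed as the author's approach) is to clip gradients and skip optimizer steps whose loss is non-finite or exploding. A minimal pure-Python sketch of the skip logic, with illustrative names only:

```python
# Hypothetical NaN/explosion guard for a training loop. The helper
# name and threshold are illustrative, not part of the ttts codebase.
import math


def step_is_safe(loss_value: float, max_loss: float = 100.0) -> bool:
    """Reject NaN/Inf or runaway losses so the optimizer step is skipped."""
    return math.isfinite(loss_value) and loss_value < max_loss


# Simulated per-step losses around the point where training diverges:
losses = [2.1, 1.8, 350.0, float("nan")]
applied_steps = [l for l in losses if step_is_safe(l)]
```

In a real PyTorch loop the same guard would sit between `loss.backward()` and `optimizer.step()`, often alongside gradient-norm clipping; whether that addresses the VQ-VAE divergence here is an open question for the author.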
Hi,
Which dataset was used to train this model?
Thank you!
Hi @adelacvg
Can we use this kind of model for speech-to-speech (voice conversion)?
About the diffusion model input: why use the GPT latent?
If the VQ code were used as the diffusion model input instead, would something go wrong?
Compared with SPEAR-TTS, is this model's VQ code a semantic token?
Thanks