
ttts's People

Contributors

adelacvg

ttts's Issues

License

Please add a license. Keep in mind that the MPL (Coqui) may impose restrictions on the license you choose. Thank you!

Some questions about training GPT

Hello, thanks for your code contribution. I am trying to train the GPT on my own datasets and have some questions:

  1. I use the same batch size as the configuration file on 8 A100 GPUs. Training 2000 steps takes about 3 hours. Is this normal?
  2. Based on your experience, how low should the loss value be?
  3. I only changed the input phoneme sequence. The VQ-VAE is the same as the pre-trained model you provided, and I can get correct audio using the VQ decoder. However, when I use the latent as input to the pre-trained diffusion model and Vocos, I cannot get correct audio. Is this normal?
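To make the numbers in question 1 easier to compare, the reported 3 hours for 2000 steps can be converted into seconds per step. The sketch below only does this arithmetic; the per-GPU batch size is not stated in the issue, so the value used in the throughput call is a hypothetical placeholder.

```python
def seconds_per_step(total_hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on one training step."""
    return total_hours * 3600.0 / steps


def samples_per_second(batch_size_per_gpu: int, num_gpus: int, step_seconds: float) -> float:
    """Global throughput, assuming every GPU processes one batch per step."""
    return batch_size_per_gpu * num_gpus / step_seconds


# Figures from the issue: 2000 steps in about 3 hours on 8 GPUs.
step_time = seconds_per_step(total_hours=3.0, steps=2000)
print(f"{step_time:.1f} s/step")  # prints "5.4 s/step"

# ASSUMPTION: 16 samples per GPU is a placeholder, not a value from the issue.
print(f"{samples_per_second(16, 8, step_time):.1f} samples/s")
```

Whether 5.4 s per step is reasonable depends on the batch size, the sequence lengths, and the data-loading pipeline, none of which are given here.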

Model

Hi, thanks for making this! Will you be sharing the pre-trained models (beta versions?) on Hugging Face?

Unstable pronunciation

Hello, I followed all your training steps, but the generated audio has serious pronunciation problems. There seems to be a problem with the GPT part, but debugging does not reveal it.
Do you have any experience handling this situation?

text: "中部地区如江汉平原,当前冬小麦普遍到了拔节期,南方则进入孕穗期,这时候小麦幼穗对外界环境更加敏感,长时间降温甚至降雪,可能会形成局部冻害。"
results.zip

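I used the training code in main, but I did not restrict the audio to a minimum of 4 s. What is the reasoning behind limiting the audio to at least 4 s in the code?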
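For readers unfamiliar with this kind of restriction, a minimum-duration filter over a list of clips could look like the sketch below. This is not the repository's code: the function name, the `(path, num_samples, sample_rate)` tuple layout, and the file names are all hypothetical, and only the 4 s threshold comes from the question.

```python
from typing import Iterable, List, Tuple

MIN_SECONDS = 4.0  # threshold mentioned in the question


def filter_by_min_duration(
    clips: Iterable[Tuple[str, int, int]],
    min_seconds: float = MIN_SECONDS,
) -> List[str]:
    """Keep the paths of clips whose duration is at least min_seconds.

    Each clip is a hypothetical (path, num_samples, sample_rate) tuple.
    """
    kept = []
    for path, num_samples, sample_rate in clips:
        duration = num_samples / sample_rate
        if duration >= min_seconds:
            kept.append(path)
    return kept


# Hypothetical clips: one of 5 s and one of 3 s, both sampled at 24 kHz.
clips = [("a.wav", 24000 * 5, 24000), ("b.wav", 24000 * 3, 24000)]
print(filter_by_min_duration(clips))  # prints ['a.wav']
```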

English support

Hi,
Is there a timeline on when this will support English?
Thank you!

Training on a new language

Thank you for your amazing work! The result of this model sounds really promising. If possible, could you guide me on how to train this model in another language? Thank you in advance.

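VQ-VAE multi-GPU training: loss is NaN

I am using the master branch code to train on the aishell3 dataset. With a single GPU the loss is normal; with 2 GPUs the loss becomes NaN after one backward pass.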

Training diffusion error

Hello, I reached the [diffusion step] by following your code, and then the following error appeared.

omegaconf.errors.ConfigAttributeError: Missing key timm_model_name , full_key: clip.vision_cfg.timm_model_name

I think this parameter (timm_model_name) is missing from the diffusion configuration, so I would like to ask which configuration your pretrained model is based on.
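A quick way to confirm which keys a configuration lacks before starting a run is to walk the dotted key paths over the loaded config. The sketch below uses plain nested dictionaries rather than omegaconf, and the example config contents (the `layers` entry) are hypothetical; only the key path `clip.vision_cfg.timm_model_name` comes from the error message above.

```python
from typing import Any, List, Mapping


def missing_keys(config: Mapping[str, Any], required: List[str]) -> List[str]:
    """Return the dotted key paths in `required` that are absent from `config`."""
    missing = []
    for dotted in required:
        node: Any = config
        for part in dotted.split("."):
            if isinstance(node, Mapping) and part in node:
                node = node[part]
            else:
                missing.append(dotted)
                break
    return missing


# ASSUMPTION: a made-up config that lacks timm_model_name, mirroring the error.
config = {"clip": {"vision_cfg": {"layers": 12}}}
required = ["clip.vision_cfg.timm_model_name", "clip.vision_cfg.layers"]
print(missing_keys(config, required))  # prints ['clip.vision_cfg.timm_model_name']
```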

Training HiFi-GAN failed

I am currently retraining a 16k HiFi-GAN, but with the latent as input the loss is difficult to converge. Although a human voice can be heard, the output contains an electronic-sounding artifact. What could be the reason for this? Looking forward to your reply, thank you.

VQ-VAE training fails

Hi,

I am trying to train the VQ-VAE on ~200 h of speech. It seems to work fine at first, but after ~50k steps the loss shoots up and then goes to NaN. Any idea what the problem could be?
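When diagnosing a run like this, it can help to locate the first step where the logged loss departs from its recent average or stops being finite. The following is a minimal sketch over a plain list of logged loss values; the function name, the window size, the spike factor, and the sample values are illustrative assumptions, not settings from the repository.

```python
import math
from typing import List, Sequence


def find_loss_spikes(losses: Sequence[float], window: int = 5, factor: float = 3.0) -> List[int]:
    """Return step indices where the loss is non-finite or exceeds
    `factor` times the mean of the previous `window` finite values."""
    spikes = []
    for i in range(window, len(losses)):
        loss = losses[i]
        if not math.isfinite(loss):
            spikes.append(i)
            continue
        recent = [x for x in losses[i - window:i] if math.isfinite(x)]
        if recent and loss > factor * (sum(recent) / len(recent)):
            spikes.append(i)
    return spikes


# Made-up log: the loss decreases, jumps at step 6, then becomes NaN at step 7.
logged = [1.0, 0.9, 0.8, 0.8, 0.7, 0.7, 5.0, float("nan")]
print(find_loss_spikes(logged))  # prints [6, 7]
```

Knowing the first flagged step makes it possible to inspect the batch and the checkpoint just before the divergence.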

Dataset

Hi,
Which dataset was used to train this model?
Thank you!

Some questions about the diffusion model

Why does the diffusion model use the GPT latent as its input?
Would anything go wrong if the VQ code were used as the diffusion model input instead?
Compared with SPEAR-TTS, is this model's VQ code a semantic token?
Thanks
