roatienza / efficientspeech

PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.

License: Apache License 2.0

Python 48.95% Shell 0.86% Jupyter Notebook 50.19%
neural speech synthesis tts

efficientspeech's People

Contributors

roatienza, v0xie

efficientspeech's Issues

[Request] Provide German model

Many thanks, Rowel, for this repository and the provided English models. I like the quality and the RTF on CPU; it's about 10 on my PC, and I will soon install it on different kinds of RPis.

Do you have any plans to train a German model, for instance based on the ThorstenVoice Dataset 2022.10?

Can you share your experience with regard to training speed?

Unable to run demo on Windows

(es) Q:\Utilities\CUDA\efficientspeech>python demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.ckpt  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav
100%|█████████████████████████████████████████████████████████████████████████████| 6.76M/6.76M [00:00<00:00, 68.4MB/s]
A:\Anaconda\envs\es\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Removing weight norm...
Traceback (most recent call last):
  File "Q:\Utilities\CUDA\efficientspeech\demo.py", line 121, in <module>
    model = model.load_from_checkpoint(checkpoint,
  File "A:\Anaconda\envs\es\lib\site-packages\lightning\pytorch\utilities\model_helpers.py", line 93, in __get__
    raise TypeError(
TypeError: The classmethod `EfficientSpeech.load_from_checkpoint` cannot be called on an instance. Please call it on the class type and make sure the return value is used.
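
The error above says that `load_from_checkpoint` was called on a model instance rather than on the class. The following is a minimal, self-contained sketch of that behaviour. `restricted_classmethod` and `TinyModel` are hypothetical stand-ins written for illustration, not Lightning's or the repository's actual code. For demo.py, the equivalent change would presumably be to call `EfficientSpeech.load_from_checkpoint(...)` on the class and use the returned model.

```python
class restricted_classmethod:
    """Simplified stand-in for a class-only classmethod guard (not Lightning's actual code)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # Accessing the method through an instance is rejected outright.
        if instance is not None:
            raise TypeError(
                f"The classmethod `{owner.__name__}.{self.func.__name__}` "
                "cannot be called on an instance."
            )
        return lambda *args, **kwargs: self.func(owner, *args, **kwargs)


class TinyModel:  # hypothetical model class used only for this example
    def __init__(self, weights=None):
        self.weights = weights

    @restricted_classmethod
    def load_from_checkpoint(cls, checkpoint):
        # A real loader would read a file; here the "checkpoint" is just a dict.
        return cls(weights=checkpoint["weights"])


checkpoint = {"weights": [0.1, 0.2]}

# Failing pattern: build an instance first, then call the loader on it.
try:
    TinyModel().load_from_checkpoint(checkpoint)
except TypeError as err:
    error_message = str(err)

# Working pattern: call the loader on the class and keep the returned model.
model = TinyModel.load_from_checkpoint(checkpoint)
```

The key point is that the loader returns a new model, so the return value has to be assigned; calling it on an existing instance and discarding the result would not load anything into that instance.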

ONNX models?

I've tried efficientspeech on my Raspberry Pi 4 and it works pretty well (~2s for 3s audio) 👍, but it still needs to be a bit faster to be really useful.
In your code I've seen a comment about the ONNX models being ~3 times faster.
I could not get the conversion script to work, so I was wondering if you could upload the model for testing? 🙂

ONNX Inference Issue

Thank you for the excellent work and for sharing this implementation.

I converted the model to ONNX and ran inference, but I ran into the issue below. I would appreciate any suggestions.

The ONNX input dimension is fixed, so the phoneme array has to be padded with additional IDs. The existing code repeats the phonemes until the ONNX input length is reached, which produces repeated audio of the same content. Is there a specific ID I can pad with to avoid unwanted audio at the end? Or is there a way to pass a dynamic-length phoneme array to the ONNX model? Please let me know if I am missing anything here and how to avoid this.
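
To make the two padding strategies in this question concrete, here is a small sketch in plain Python. `PAD_ID` and `ONNX_INPUT_LEN` are assumed values chosen for illustration; the repository's actual pad ID (if any) and exported input length are not stated in this issue. Separately, `torch.onnx.export` accepts a `dynamic_axes` argument for marking variable-length dimensions, but whether the conversion script or the model works with it is not confirmed here.

```python
# Hypothetical values for illustration only; not taken from the repository.
PAD_ID = 0          # assumed padding token id
ONNX_INPUT_LEN = 8  # assumed fixed input length of the exported model


def pad_by_repeating(phonemes, target_len):
    """Fill the fixed-length input by cycling through the phonemes (the behaviour described in the issue)."""
    if not phonemes or len(phonemes) > target_len:
        raise ValueError("phoneme sequence must be non-empty and fit the target length")
    repeats = -(-target_len // len(phonemes))  # ceiling division
    return (phonemes * repeats)[:target_len]


def pad_with_id(phonemes, target_len, pad_id=PAD_ID):
    """Fill the remainder with a single pad id and also return the true length."""
    if len(phonemes) > target_len:
        raise ValueError("phoneme sequence is longer than the target length")
    return phonemes + [pad_id] * (target_len - len(phonemes)), len(phonemes)


phonemes = [12, 7, 33]
repeated = pad_by_repeating(phonemes, ONNX_INPUT_LEN)
padded, true_len = pad_with_id(phonemes, ONNX_INPUT_LEN)
```

Returning the true length alongside the padded sequence would let a caller trim trailing audio after inference, assuming the model's output can be mapped back to input positions.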

I can promote this repo, but I need some info and a guide for training

I have recently released this tutorial.

Training with DLAS was very easy. I used the Ozen Toolkit to prepare the whole speech training dataset with a single click.

So both data preparation and training were a single click each.

Can you give me more information or guides on how to train a speech model with your repo?

I would like to make a tutorial.

Master Deep Voice Cloning in Minutes: Unleash Your Vocal Superpowers! Free and Locally on Your PC

Error in networks.py

Hello,

I am running into this error:

  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 260, in _optimizer_step
    call._call_lightning_module_hook(
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 140, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1256, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 155, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 225, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 70, in optimizer_step
    closure_result = closure()
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure
    step_output = self._step_fn()
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 307, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 287, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 367, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/tts/efficientspeech/model.py", line 214, in training_step
    y_hat = self.forward(x)
  File "/home/tts/efficientspeech/model.py", line 156, in forward
    return self.phoneme2mel(x, train=True) if self.training else self.predict_step(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tts/efficientspeech/layers/networks.py", line 421, in forward
    pred = self.encoder(x, train=train)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tts/efficientspeech/layers/networks.py", line 370, in forward
    fused_features = torch.cat([fused_features, pitch_features,
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 110 for tensor number 1 in the list.
Epoch 0: 0%| | 0/129 [00:02<?, ?it/s]
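
The final `RuntimeError` reflects the rule that tensors being concatenated must agree in every dimension except the one they are joined along. The sketch below mimics that rule on plain shape tuples, without PyTorch, so the mismatch is easy to see. The shapes are hypothetical, since the real shapes of `fused_features` and `pitch_features` do not appear in the traceback, and `cat_shape` is an illustrative helper, not a PyTorch function.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating tensors of the given shapes along `dim`.

    Mirrors the rule in the error above: every dimension except `dim` must match.
    """
    first = shapes[0]
    total = 0
    for index, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError(f"tensor {index} has {len(shape)} dims, expected {len(first)}")
        for axis, (expected, got) in enumerate(zip(first, shape)):
            if axis != dim and expected != got:
                raise ValueError(
                    f"dimension {axis}: expected size {expected} but got size {got} "
                    f"for tensor number {index}"
                )
        total += shape[dim]
    return first[:dim] + (total,) + first[dim + 1:]


# Hypothetical shapes (batch, sequence, channels); the real ones are not shown in the traceback.
fused = (8, 1, 128)
pitch = (8, 110, 128)

# Mismatch in a non-concatenated dimension (sequence length 1 vs 110).
try:
    cat_shape([fused, pitch], dim=2)
except ValueError as err:
    message = str(err)

# Matching shapes concatenate cleanly along the channel dimension.
ok = cat_shape([(8, 110, 128), (8, 110, 128)], dim=2)
```

Printing the shapes of each tensor just before the `torch.cat` call in layers/networks.py would show which input has the unexpected size of 1.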
