
SqueezeWave's People

Contributors

bichenwuucb, bohanzhai, tianrengao


SqueezeWave's Issues

Slower than WaveGlow on GPU

I tested SqueezeWave and WaveGlow with the same mel input on a GeForce RTX 2080 GPU. The results show that SqueezeWave is two times slower than WaveGlow (on short sentences).
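
For anyone reproducing this comparison, here is a minimal sketch of a fair GPU timing loop, assuming a model object with a WaveGlow-style infer(mel, sigma) method and a mel tensor already on the GPU. CUDA calls are asynchronous, so synchronize before reading the clock:

    import time
    import torch

    def time_infer(model, mel, sigma=0.6, n_runs=20):
        # Warm up so cuDNN autotuning and allocator setup are excluded.
        with torch.no_grad():
            for _ in range(3):
                model.infer(mel, sigma=sigma)
        torch.cuda.synchronize()
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(n_runs):
                model.infer(mel, sigma=sigma)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_runs

Without the warm-up and synchronization, short inputs can make either model look arbitrarily fast or slow.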

preprocess

I want to train my own model on Chinese data. How do I get the mel features and target data for training from .wav files? Where can I find the preprocessing script?
Thank you!
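
Until an official preprocessing script is pointed out (the upstream WaveGlow repo ships mel2samp.py for exactly this), here is a hedged librosa-based sketch that mimics the usual 22.05 kHz / 80-mel setup. The exact n_fft, hop_length, and fmax are assumptions and must match the values in the SqueezeWave config you train with:

    import librosa
    import numpy as np
    import torch

    def wav_to_mel(path, sr=22050, n_fft=1024, hop=256, n_mels=80):
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop,
            win_length=n_fft, n_mels=n_mels, fmin=0.0, fmax=8000.0)
        # Log-compress with a floor, as Tacotron-style pipelines do.
        return torch.from_numpy(np.log(np.clip(mel, 1e-5, None)))

    torch.save(wav_to_mel("example.wav"), "example.pt")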

bias from the model

'SqueezeWave' object has no attribute 'upsample'

Hello, when I use denoiser.py I hit this error, but when I run the denoiser.py from WaveGlow, it works. How can I solve this problem? Thank you.
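
WaveGlow's denoiser.py reads the dtype and device for its zero-mel input from waveglow.upsample.weight, and SqueezeWave removed the upsample layer, which would explain this AttributeError. A hedged patch, assuming denoiser.py otherwise matches WaveGlow's version: take dtype and device from any real parameter instead.

    # In Denoiser.__init__, replace the reference to the removed upsample layer:
    p = next(squeezewave.parameters())   # any parameter of the model
    mel_input = torch.zeros((1, 80, 88), dtype=p.dtype, device=p.device)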

Running on CPU

It looks like the Apex library needs CUDA; when I run without CUDA, I get the error below. Any ideas?

    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 48, in main
    squeezewave, _ = amp.initialize(squeezewave, [], opt_level="O3")
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param WN.0.in_layers.0.0.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
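
Apex AMP indeed assumes the model lives on a CUDA device. A hedged workaround for inference.py: only call amp.initialize when CUDA is actually available, and otherwise run in fp32 on the CPU (is_fp16 is the script's own flag; the rest mirrors its variable names):

    if is_fp16 and torch.cuda.is_available():
        from apex import amp
        squeezewave, _ = amp.initialize(squeezewave, [], opt_level="O3")
    else:
        squeezewave = squeezewave.float()  # plain fp32 on CPU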

Loss calculation

Hi!
Can you help me understand how the loss is calculated and what a good loss value looks like during training?
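
For reference, the loss appears to follow WaveGlow's maximum-likelihood objective: the squared norm of the latent z under a zero-mean Gaussian prior with standard deviation sigma, minus the log-determinant terms from the affine couplings and invertible 1x1 convolutions, normalized per element. A sketch assuming it matches WaveGlowLoss (so a negative per-element value is normal and not by itself a sign of trouble):

    import torch

    class SqueezeWaveLoss(torch.nn.Module):   # assumed to mirror WaveGlowLoss
        def __init__(self, sigma=1.0):
            super().__init__()
            self.sigma = sigma

        def forward(self, model_output):
            z, log_s_list, log_det_W_list = model_output
            log_s_total = sum(torch.sum(s) for s in log_s_list)
            log_det_W_total = sum(log_det_W_list)
            loss = (torch.sum(z * z) / (2 * self.sigma ** 2)
                    - log_s_total - log_det_W_total)
            # Normalize by the number of elements in z.
            return loss / (z.size(0) * z.size(1) * z.size(2))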

Got NaN in loss calculation

I got negative values and then NaN in the loss calculation, and training could not recover. There seems to be an overflow in an intermediate variable. Is there any problem with the following calculation?

loss = torch.sum(z*z)/(2*self.sigma*self.sigma) - log_s_total - log_det_W_total
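
A negative loss is expected here (it is a negative log-likelihood with Jacobian terms), but NaN suggests an exploding log_s or log_det_W. A common mitigation, sketched under the assumption of a standard train.py loop: clip gradients, and if running fp16, fall back to fp32 to rule out half-precision overflow.

    # In the training loop, between backward() and optimizer.step():
    loss.backward()
    # Keep a single bad batch from blowing up the flow's scale terms.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()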

train.py

Traceback (most recent call last):
File "train.py", line 203, in
train(num_gpus, args.rank, args.group_name, **train_config)
File "train.py", line 131, in train
epoch_offset = max(0, int(iteration / len(train_loader)))
ZeroDivisionError: division by zero

Hello, when I train the model I hit this error. How can I solve it, and how should I load the data? Thank you.
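
len(train_loader) being zero means the dataset resolved to no files, usually because the training file list is empty or its paths are wrong. A hedged guard for train.py, placed right after the DataLoader is built:

    assert len(train_loader) > 0, (
        "train_loader is empty -- check that the file list named in the "
        "config points at existing .wav files")
    epoch_offset = max(0, int(iteration / len(train_loader)))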

GPU Conversion

Hey, great work, great minds.
Your implementation seems highly optimized for low-compute devices. I'm going through the paper right now. Given the nature of the model, what do you think about converting it to run its computations on GPU?
The idea behind the work is to make the mel-to-audio process available on mobile processors, but that also means we could get nearly similar quality to WaveGlow's at higher speeds by doing the work on GPU.
Looking forward to your ideas!

wav has no voice

Hello everyone. When I run the denoiser, I hit the weight-norm error, and when I use another method to work around it, the output wav file has no voice. I don't know the reason. How can I solve this? Thank you.
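
If the workaround was to strip or skip weight normalization by hand, silent output is plausible because the learned weight-norm scales are lost. Assuming glow.py keeps WaveGlow's remove_weightnorm helper, the supported loading path looks like:

    squeezewave = torch.load(checkpoint_path)['model']
    # Fold the weight-norm scales into the weights instead of discarding them.
    squeezewave = squeezewave.remove_weightnorm(squeezewave)
    squeezewave.cuda().eval()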

Question: is this line of code a typo?

Hi. I'm trying to adjust parameters by setting filter_length = 2048 and win_length = 1200.

And I got an assertion error at stft.py line 68: assert(win_length >= filter_length).

I think the inequality sign should point the other way.

Thank you.
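
That reading looks right: the FFT size must be at least the window length, not the other way around. A hedged fix for that line in stft.py:

    assert filter_length >= win_length  # the window must fit inside the FFT frame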

hop_length

Hello!
How can I modify the model so that it works with spectrograms whose hop_length is 192? (The spectrograms I generate with my model sound better when hop_length is 192.)

denoiser of SqueezeWave

Hello, when I run this project's denoiser.py, I get the error:
'Upsample1d' object has no attribute 'weight'
How can I solve this? Thank you.

About CPU inference time

Really nice and practical work! I have one question about the inference time on CPU: did you measure it with C++ or Python code? And were any optimizations used?
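
For a quick CPU measurement in Python (a sketch, not the authors' benchmark; the thread count strongly affects PyTorch CPU numbers, so pin it explicitly):

    import time
    import torch

    torch.set_num_threads(4)           # match your target device
    with torch.no_grad():
        start = time.perf_counter()
        audio = squeezewave.infer(mel, sigma=0.6)
        elapsed = time.perf_counter() - start
    print(f"{audio.numel() / elapsed / 1e3:.1f} k samples/s")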

Docker

Please offer a Docker image.

License

Hi, what license is this under? Can you guys add one please? Thanks.

Parameter configs for different sample rate, hop length etc.

Hi. Thanks for your nice work!
I'm working on training with mel spectrograms of different sample rates, hop lengths, and window sizes, but I got shape-mismatch errors. Are there any suggestions for the SqueezeWave_config parameters? How should they be adapted for different mel params?
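
A hedged sanity check while waiting for an official answer: the shapes can only line up when the training segment tiles into whole mel frames and whole audio groups, so segment_length should be divisible by both hop_length and the model's n_audio_channel (names as in the WaveGlow-style config.json; the exact constraint in SqueezeWave may differ):

    hop_length = 192
    n_audio_channel = 128
    segment_length = 16128             # 84 frames * 192, also 126 groups * 128
    assert segment_length % hop_length == 0
    assert segment_length % n_audio_channel == 0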

RuntimeError: expected type CUDAFloatType but got CUDAHalfType

Epoch: 0
Traceback (most recent call last):
File "/snap/pycharm-community/172/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/172/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/mylinuxpc/git/SqueezeWave/train.py", line 204, in
train(num_gpus, args.rank, args.group_name, **train_config)
File "/home/mylinuxpc/git/SqueezeWave/train.py", line 141, in train
outputs = model((mel, audio))
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/mylinuxpc/git/SqueezeWave/glow.py", line 234, in forward
output = self.WN[k]((audio_0, spect))
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/mylinuxpc/git/SqueezeWave/glow.py", line 177, in forward
n_channels_tensor)
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
RuntimeError:
expected type CUDAFloatType but got CUDAHalfType (compute_types at /pytorch/aten/src/ATen/native/TensorIterator.cpp:134)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f109e540fe1 in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f109e540dfa in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIterator::compute_types() + 0x36b (0x7f109efb2e6b in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
[frames #3 through #63 (libcaffe2 add kernels, libtorch autograd/JIT, and CPython interpreter frames) elided]
operation failed in interpreter:
@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a+input_b
             ~~~~~~~~~~~~~~~ <--- HERE
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts
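
The failing add is input_a + input_b with one operand in fp16 and the other in fp32, i.e. audio and mel reach WN.forward with different dtypes. A hedged guard for the training loop, just before the model call in train.py:

    # Keep both inputs in the same precision as the model parameters.
    dtype = next(model.parameters()).dtype
    mel, audio = mel.to(dtype), audio.to(dtype)
    outputs = model((mel, audio))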

Using spectrograms generated by FastSpeech with SqueezeWave

The FastSpeech project (https://github.com/xcmyz/FastSpeech) generates mel spectrograms from text quite fast. I am trying to feed FastSpeech's mels into the SqueezeWave vocoder instead of using mel2samp.py to generate the mel .pt files.

I saved mel_postnet_torch (the mel spectrogram) to a .pt file and used it to generate a wav with SqueezeWave, but I get the following error.

Traceback (most recent call last):
  File "inference.py", line 87, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 57, in main
    audio = squeezewave.infer(mel, sigma=sigma).float()
  File "/mount/data/SqueezeWave/glow.py", line 261, in infer
    output = self.WN[k]((audio_0, spect))
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mount/data/SqueezeWave/glow.py", line 165, in forward
    spect = self.cond_layer(spect)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead

Any idea what could be the issue?

I save the mel by adding one line right after https://github.com/xcmyz/FastSpeech/blob/master/synthesis.py#L66:
torch.save(mel_postnet_torch, "filename.pt")
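
The conv layer wants a 3-D [batch, 80, frames] tensor, but the saved mel is [1, 1, 80, 133]. A hedged loading shim for inference.py (the transpose check covers FastSpeech forks that emit [frames, n_mels]):

    mel = torch.load("filename.pt")
    mel = mel.squeeze()                # drop singleton dims -> [80, 133]
    if mel.size(0) != 80:
        mel = mel.t()                  # handle [frames, 80] layouts
    mel = mel.unsqueeze(0).cuda()      # add the batch dim -> [1, 80, 133]
    audio = squeezewave.infer(mel, sigma=0.6)

Even with matching shapes, the two projects must agree on the mel scale (log base, normalization, fmin/fmax), or the audio will come out distorted.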

[Question] MOS results

Something looks strange in the MOS table presented in the paper. The WaveGlow score is 4.57 ± 0.04, only 0.05 lower than ground truth. Does anyone understand why?

The samples shared for SqueezeWave are also of lower quality than expected, sounding a bit robotic even with the 128L model. The WaveGlow samples shared by NVIDIA sound much better to me.
