
SqueezeWave's People

Contributors

bichenwuucb, bohanzhai, tianrengao


SqueezeWave's Issues

Slower than WaveGlow on GPU

I tested SqueezeWave and WaveGlow with the same mel input on a GeForce RTX 2080 GPU. The results show that SqueezeWave is two times slower than WaveGlow (on short sentences).
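
For anyone reproducing this comparison, here is a minimal sketch of a fair GPU timing loop, assuming a model object with a WaveGlow-style infer(mel, sigma) method and a mel tensor already on the GPU. CUDA calls are asynchronous, so synchronize before reading the clock:

    import time
    import torch

    def time_infer(model, mel, sigma=0.6, n_runs=20):
        # Warm up so cuDNN autotuning and allocator setup are excluded.
        with torch.no_grad():
            for _ in range(3):
                model.infer(mel, sigma=sigma)
        torch.cuda.synchronize()
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(n_runs):
                model.infer(mel, sigma=sigma)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_runs

Without the warm-up and synchronization, short inputs can make either model look arbitrarily fast or slow.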

preprocess

I want to train my own model on Chinese data. How do I get the mel features and target data for training from .wav files? Where can I find the preprocessing script?
Thank you!
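
Until an official preprocessing script is pointed out (the upstream WaveGlow repo ships mel2samp.py for exactly this), here is a hedged librosa-based sketch that mimics the usual 22.05 kHz / 80-mel setup. The exact n_fft, hop_length, and fmax are assumptions and must match the values in the SqueezeWave config you train with:

    import librosa
    import numpy as np
    import torch

    def wav_to_mel(path, sr=22050, n_fft=1024, hop=256, n_mels=80):
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop,
            win_length=n_fft, n_mels=n_mels, fmin=0.0, fmax=8000.0)
        # Log-compress with a floor, as Tacotron-style pipelines do.
        return torch.from_numpy(np.log(np.clip(mel, 1e-5, None)))

    torch.save(wav_to_mel("example.wav"), "example.pt")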

bias from the model

'SqueezeWave' object has no attribute 'upsample'

Hello, when I use denoiser.py I hit this error, but when I run the denoiser.py from WaveGlow, it works. How can I solve this problem? Thank you.
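
WaveGlow's denoiser.py reads the dtype and device for its zero-mel input from waveglow.upsample.weight, and SqueezeWave removed the upsample layer, which would explain this AttributeError. A hedged patch, assuming denoiser.py otherwise matches WaveGlow's version: take dtype and device from any real parameter instead.

    # In Denoiser.__init__, replace the reference to the removed upsample layer:
    p = next(squeezewave.parameters())   # any parameter of the model
    mel_input = torch.zeros((1, 80, 88), dtype=p.dtype, device=p.device)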

Running on CPU

It looks like the Apex library needs CUDA; when I run without CUDA, I get the error below. Any ideas?

    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 48, in main
    squeezewave, _ = amp.initialize(squeezewave, [], opt_level="O3")
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6.egg/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param WN.0.in_layers.0.0.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
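
Apex AMP indeed assumes the model lives on a CUDA device. A hedged workaround for inference.py: only call amp.initialize when CUDA is actually available, and otherwise run in fp32 on the CPU (is_fp16 is the script's own flag; the rest mirrors its variable names):

    if is_fp16 and torch.cuda.is_available():
        from apex import amp
        squeezewave, _ = amp.initialize(squeezewave, [], opt_level="O3")
    else:
        squeezewave = squeezewave.float()  # plain fp32 on CPU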

Loss calculation

Hi!
Can you help me understand how the loss is calculated and what a good loss value looks like during training?
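
For reference, the loss appears to follow WaveGlow's maximum-likelihood objective: the squared norm of the latent z under a zero-mean Gaussian prior with standard deviation sigma, minus the log-determinant terms from the affine couplings and invertible 1x1 convolutions, normalized per element. A sketch assuming it matches WaveGlowLoss (so a negative per-element value is normal and not by itself a sign of trouble):

    import torch

    class SqueezeWaveLoss(torch.nn.Module):   # assumed to mirror WaveGlowLoss
        def __init__(self, sigma=1.0):
            super().__init__()
            self.sigma = sigma

        def forward(self, model_output):
            z, log_s_list, log_det_W_list = model_output
            log_s_total = sum(torch.sum(s) for s in log_s_list)
            log_det_W_total = sum(log_det_W_list)
            loss = (torch.sum(z * z) / (2 * self.sigma ** 2)
                    - log_s_total - log_det_W_total)
            # Normalize by the number of elements in z.
            return loss / (z.size(0) * z.size(1) * z.size(2))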

Got NaN in loss calculation

I got negative values and then NaN in the loss calculation, and training could not recover. There seems to be an overflow in an intermediate variable. Is there any problem with the following calculation?

loss = torch.sum(z*z)/(2*self.sigma*self.sigma) - log_s_total - log_det_W_total
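
A negative loss is expected here (it is a negative log-likelihood with Jacobian terms), but NaN suggests an exploding log_s or log_det_W. A common mitigation, sketched under the assumption of a standard train.py loop: clip gradients, and if running fp16, fall back to fp32 to rule out half-precision overflow.

    # In the training loop, between backward() and optimizer.step():
    loss.backward()
    # Keep a single bad batch from blowing up the flow's scale terms.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()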

train.py

Traceback (most recent call last):
File "train.py", line 203, in
train(num_gpus, args.rank, args.group_name, **train_config)
File "train.py", line 131, in train
epoch_offset = max(0, int(iteration / len(train_loader)))
ZeroDivisionError: division by zero

Hello, when I train the model I hit this error. How can I solve it, and how should I load the data? Thank you.
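
len(train_loader) being zero means the dataset resolved to no files, usually because the training file list is empty or its paths are wrong. A hedged guard for train.py, placed right after the DataLoader is built:

    assert len(train_loader) > 0, (
        "train_loader is empty -- check that the file list named in the "
        "config points at existing .wav files")
    epoch_offset = max(0, int(iteration / len(train_loader)))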

GPU Conversion

Hey, great work, great minds.
Your implementation seems highly optimized for low-compute devices. I'm going through the paper right now. Given the nature of the model, what do you think about converting it to run its computations on GPU?
The idea behind the work is to make the mel-to-audio process available on mobile processors, but that also means we could get nearly similar quality to WaveGlow's at higher speeds by doing the work on GPU.
Looking forward to your ideas!

wav has no voice

Hello everyone. When I run the denoiser, I hit the weight-norm error, and when I use another method to work around it, the output wav file has no voice. I don't know the reason. How can I solve this? Thank you.
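
If the workaround was to strip or skip weight normalization by hand, silent output is plausible because the learned weight-norm scales are lost. Assuming glow.py keeps WaveGlow's remove_weightnorm helper, the supported loading path looks like:

    squeezewave = torch.load(checkpoint_path)['model']
    # Fold the weight-norm scales into the weights instead of discarding them.
    squeezewave = squeezewave.remove_weightnorm(squeezewave)
    squeezewave.cuda().eval()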

Question: is this line of code a typo?

Hi. I'm trying to adjust parameters by setting filter_length = 2048 and win_length = 1200.

And I got an assertion error at stft.py line 68: assert(win_length >= filter_length).

I think the inequality sign should point the other way.

Thank you.
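
That reading looks right: the FFT size must be at least the window length, not the other way around. A hedged fix for that line in stft.py:

    assert filter_length >= win_length  # the window must fit inside the FFT frame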

hop_length

Hello!
How can I modify the model so that it works with spectrograms whose hop_length is 192? (The spectrograms I generate with my model sound better when hop_length is 192.)

denoiser of SqueezeWave

Hello, when I run this project's denoiser.py, I get the error:
'Upsample1d' object has no attribute 'weight'
How can I solve this? Thank you.

About CPU inference time

Really nice and practical work! I have one question about the inference time on CPU: did you measure it with C++ or Python code? And were any optimizations used?
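
For a quick CPU measurement in Python (a sketch, not the authors' benchmark; the thread count strongly affects PyTorch CPU numbers, so pin it explicitly):

    import time
    import torch

    torch.set_num_threads(4)           # match your target device
    with torch.no_grad():
        start = time.perf_counter()
        audio = squeezewave.infer(mel, sigma=0.6)
        elapsed = time.perf_counter() - start
    print(f"{audio.numel() / elapsed / 1e3:.1f} k samples/s")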

Docker

Please offer a Docker image.

License

Hi, what license is this under? Can you guys add one please? Thanks.

Parameter configs for different sample rate, hop length etc.

Hi. Thanks for your nice work!
I'm working on training with mel spectrograms of different sample rates, hop lengths, and window sizes, but I got shape-mismatch errors. Are there any suggestions for the SqueezeWave_config parameters? How should they be adapted for different mel params?
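
A hedged sanity check while waiting for an official answer: the shapes can only line up when the training segment tiles into whole mel frames and whole audio groups, so segment_length should be divisible by both hop_length and the model's n_audio_channel (names as in the WaveGlow-style config.json; the exact constraint in SqueezeWave may differ):

    hop_length = 192
    n_audio_channel = 128
    segment_length = 16128             # 84 frames * 192, also 126 groups * 128
    assert segment_length % hop_length == 0
    assert segment_length % n_audio_channel == 0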

RuntimeError: expected type CUDAFloatType but got CUDAHalfType

Epoch: 0
Traceback (most recent call last):
File "/snap/pycharm-community/172/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/172/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/mylinuxpc/git/SqueezeWave/train.py", line 204, in
train(num_gpus, args.rank, args.group_name, **train_config)
File "/home/mylinuxpc/git/SqueezeWave/train.py", line 141, in train
outputs = model((mel, audio))
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/mylinuxpc/git/SqueezeWave/glow.py", line 234, in forward
output = self.WN[k]((audio_0, spect))
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/mylinuxpc/git/SqueezeWave/glow.py", line 177, in forward
n_channels_tensor)
File "/home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
RuntimeError:
expected type CUDAFloatType but got CUDAHalfType (compute_types at /pytorch/aten/src/ATen/native/TensorIterator.cpp:134)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f109e540fe1 in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f109e540dfa in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIterator::compute_types() + 0x36b (0x7f109efb2e6b in /home/mylinuxpc/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
[frames #3 through #63 (libcaffe2 add kernels, libtorch autograd/JIT, and CPython interpreter frames) elided]
operation failed in interpreter:
@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a+input_b
             ~~~~~~~~~~~~~~~ <--- HERE
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts
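
The failing add is input_a + input_b with one operand in fp16 and the other in fp32, i.e. audio and mel reach WN.forward with different dtypes. A hedged guard for the training loop, just before the model call in train.py:

    # Keep both inputs in the same precision as the model parameters.
    dtype = next(model.parameters()).dtype
    mel, audio = mel.to(dtype), audio.to(dtype)
    outputs = model((mel, audio))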

Using spectrograms generated by FastSpeech with SqueezeWave

The FastSpeech project (https://github.com/xcmyz/FastSpeech) generates mel spectrograms from text quite fast. I am trying to feed FastSpeech's mels into the SqueezeWave vocoder instead of using mel2samp.py to generate the mel .pt files.

I saved mel_postnet_torch (the mel spectrogram) to a .pt file and used it to generate a wav with SqueezeWave, but I get the following error.

Traceback (most recent call last):
  File "inference.py", line 87, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 57, in main
    audio = squeezewave.infer(mel, sigma=sigma).float()
  File "/mount/data/SqueezeWave/glow.py", line 261, in infer
    output = self.WN[k]((audio_0, spect))
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mount/data/SqueezeWave/glow.py", line 165, in forward
    spect = self.cond_layer(spect)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead

Any idea what could be the issue?

I save the mel by adding one line right after https://github.com/xcmyz/FastSpeech/blob/master/synthesis.py#L66:
torch.save(mel_postnet_torch, "filename.pt")
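
The conv layer wants a 3-D [batch, 80, frames] tensor, but the saved mel is [1, 1, 80, 133]. A hedged loading shim for inference.py (the transpose check covers FastSpeech forks that emit [frames, n_mels]):

    mel = torch.load("filename.pt")
    mel = mel.squeeze()                # drop singleton dims -> [80, 133]
    if mel.size(0) != 80:
        mel = mel.t()                  # handle [frames, 80] layouts
    mel = mel.unsqueeze(0).cuda()      # add the batch dim -> [1, 80, 133]
    audio = squeezewave.infer(mel, sigma=0.6)

Even with matching shapes, the two projects must agree on the mel scale (log base, normalization, fmin/fmax), or the audio will come out distorted.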

[Question] MOS results

Something looks strange in the MOS table presented in the paper. The WaveGlow score is 4.57 ± 0.04, only 0.05 lower than ground truth. Does anyone understand why?

The samples shared for SqueezeWave are also of lower quality than expected, sounding a bit robotic even with the 128L model. The WaveGlow samples shared by NVIDIA sound much better to me.
