
seungwonpark / melgan


MelGAN vocoder (compatible with NVIDIA/tacotron2)

Home Page: http://swpark.me/melgan/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
tts neural-vocoder gan pytorch

melgan's Introduction

MelGAN

Unofficial PyTorch implementation of MelGAN vocoder

Key Features

  • MelGAN is lighter, faster, and better at generalizing to unseen speakers than WaveGlow.
  • This repository uses the same mel-spectrogram function as NVIDIA/tacotron2, so it can directly convert the output of NVIDIA's tacotron2 into raw audio.
  • A model pretrained on LJSpeech-1.1 is available via PyTorch Hub.

Prerequisites

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. This can be any set of WAV files with a 22050 Hz sample rate (e.g., LJSpeech, which was used in the paper).
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit the configuration YAML file.
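Because preprocessing expects 22050 Hz audio, it can be worth scanning a new dataset before running preprocess.py. The sketch below is a hypothetical helper (find_wrong_sample_rate is not part of this repository) built on Python's standard wave module, which only reads uncompressed PCM WAV files.

```python
import wave
from pathlib import Path


def find_wrong_sample_rate(root, expected_sr=22050):
    """Return (path, sample_rate) for every .wav under `root` (searched
    recursively) whose sample rate differs from `expected_sr`."""
    mismatches = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as f:
            sr = f.getframerate()
        if sr != expected_sr:
            mismatches.append((str(path), sr))
    return mismatches
```

An empty list means every file already has the expected rate; anything listed can be resampled first, for example with ffmpeg -ar 22050.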

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]
    • cp config/default.yaml config/config.yaml and then edit config.yaml
    • Write the root paths of the training and validation files on the 2nd and 3rd lines.
    • Each path should contain pairs of *.wav files with corresponding (preprocessed) *.mel files.
    • The data loader searches the path recursively for these files.
  • tensorboard --logdir logs/
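The pairing rule above (each *.wav next to a preprocessed *.mel, found recursively) can be checked before starting a run. This is a sketch under the assumption that a mel file shares its WAV's directory and base name; collect_wav_mel_pairs is a hypothetical helper, not the repository's data loader.

```python
from pathlib import Path


def collect_wav_mel_pairs(root):
    """Recursively pair each *.wav under `root` with a same-named *.mel.

    Returns (pairs, missing): pairs is a list of (wav_path, mel_path) strings,
    missing lists the WAV files that have no matching .mel file yet.
    """
    pairs, missing = [], []
    for wav in sorted(Path(root).rglob("*.wav")):
        mel = wav.with_suffix(".mel")  # assumption: same directory, same stem
        if mel.exists():
            pairs.append((str(wav), str(mel)))
        else:
            missing.append(str(wav))
    return pairs, missing
```

A non-empty missing list usually means preprocess.py has not been run on that directory yet.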

Pretrained model

Try with Google Colab: TODO

import torch
vocoder = torch.hub.load('seungwonpark/melgan', 'melgan')
vocoder.eval()
mel = torch.randn(1, 80, 234) # use your own mel-spectrogram here

if torch.cuda.is_available():
    vocoder = vocoder.cuda()
    mel = mel.cuda()

with torch.no_grad():
    audio = vocoder.inference(mel)
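To listen to the output, the samples can be saved as a WAV file. This sketch assumes audio is a 1-D tensor of 16-bit integer samples at 22050 Hz; if it is instead a float tensor in [-1, 1], it would need to be scaled by 32767 and rounded first. write_wav_int16 is a hypothetical helper built on the standard wave module.

```python
import sys
import wave
from array import array


def write_wav_int16(path, samples, sample_rate=22050):
    """Write mono 16-bit PCM samples (ints in [-32768, 32767]) to `path`."""
    data = array("h", samples)  # signed 16-bit integers
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(data.tobytes())
```

For example: write_wav_int16("out.wav", audio.cpu().numpy().tolist()).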

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]

Results

See audio samples at: http://swpark.me/melgan/. The model was trained on a V100 GPU for 14 days using LJSpeech-1.1.

Implementation Authors

License

BSD 3-Clause License.

Useful resources

melgan's People

Contributors

benwu95, seungwonpark


melgan's Issues

mel

This repository use identical mel-spectrogram function from NVIDIA/tacotron2, so this can be directly used to convert output from NVIDIA's tacotron2 into raw-audio.

how?

Report an error

ok
from .res_stack import ResStack
#from res_stack import ResStack

Use MelGAN as a universal vocoder

Hello, thanks for your nice implementation of MelGAN.

I think MelGAN could be used as a universal vocoder, and I believe the original paper mentions a multi-speaker training scheme. Have you tried a multi-speaker setting? It would be really useful if it could serve as a universal vocoder, similar to this.

Wrong implementation of Generator

The last layer should be:

nn.utils.weight_norm(nn.Conv1d(32, 1, kernel_size=7, stride=1, padding=3)),

not:

nn.utils.weight_norm(nn.ConvTranspose1d(32, 1, kernel_size=7, stride=1, padding=3)),

omg...
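One reason a mistake like this can go unnoticed is that both layers give an output of the same length for kernel_size=7, stride=1, padding=3, so no shape error is raised. The output-length formulas below follow the PyTorch documentation for Conv1d and ConvTranspose1d; the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of torch.nn.Conv1d (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose1d_out_len(length, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    """Output length of torch.nn.ConvTranspose1d (formula from the PyTorch docs)."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Intended last layer: Conv1d(32, 1, kernel_size=7, stride=1, padding=3)
print(conv1d_out_len(16000, 7, stride=1, padding=3))            # 16000
# Reported layer: ConvTranspose1d(32, 1, kernel_size=7, stride=1, padding=3)
print(conv_transpose1d_out_len(16000, 7, stride=1, padding=3))  # 16000
```

The same formula shows why an upsampling layer such as ConvTranspose1d(kernel_size=16, stride=8, padding=4), used here purely as an illustration, multiplies the length by exactly 8: conv_transpose1d_out_len(10, 16, 8, 4) returns 80.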

How to Edit default.yaml

Hi, I am at the step in your instructions where I am supposed to edit default.yaml, but I am not sure which paths to enter or where exactly to put them when I open the YAML file in Notepad.

Can I do the google colab implementation for you?

Hello Seungwon Park @seungwonpark,

You said:

"Try with Google Colab: TODO..."

This is my second time implementing an ML model on Google Colab. I am a beginner in ML, but I learn best hands-on.

  1. Thanks for being awesome
  2. Can I do the Google Colab implementation?
  3. Have you attempted it? If so did you face any obstacles?
  4. Do you want a fork or pull request?

Sound artifact at the end of the sample

First of all, thank you for the wonderful repository.

I know that this issue has been discussed in a previous issue, but I wanted to know if the artifact that appears at the end of the inferred sentence can be solved by the current repository. I am experiencing artifacts at the end of a sample (and also in between sentences) no matter what I try, so I was hoping that someone could point me in the right direction to address this issue.

Also, I was wondering where -11.5129 came from in the following line:

zero = torch.full((1, self.mel_channel, 10), -11.5129).to(mel.device)

Thanks in advance.
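The value -11.5129 matches ln(1e-5). Assuming the mels come from NVIDIA/tacotron2's dynamic range compression, log(clamp(x, min=1e-5) * C) with C = 1, this is the smallest value a log-mel bin can take, i.e. a silent frame. Padding with such frames therefore appends silence, whose generated samples (hop_length per frame) can be trimmed afterwards. The helpers below are a hypothetical pure-Python illustration, not the repository's inference code.

```python
import math

CLIP_VAL = 1e-5  # assumed clip value of the Tacotron2-style compression
SILENCE_LOG_MEL = math.log(CLIP_VAL)  # about -11.5129


def compress(magnitude, clip_val=CLIP_VAL, C=1.0):
    """log(clamp(x, min=clip_val) * C) for a single mel magnitude."""
    return math.log(max(magnitude, clip_val) * C)


def pad_with_silence(mel, n_frames=10, value=SILENCE_LOG_MEL):
    """Append n_frames silent frames to a mel stored as channel rows
    (n_mel_channels lists, each holding one value per frame)."""
    return [row + [value] * n_frames for row in mel]


def trim_padding(audio, n_frames=10, hop_length=256):
    """Drop the samples generated for the padded frames (hop_length each)."""
    return audio[: len(audio) - n_frames * hop_length]
```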

Better audio quality with larger resnet

Hi, great repo!

I found that the audio quality improves considerably with a slightly increased ResNet as suggested in https://arxiv.org/pdf/2005.05106.pdf. The shaky and metallic artefacts are reduced a lot.

Here is a comparison between your pretrained LJSpeech model and a model I am still training (for TTS, I used https://github.com/as-ideas/ForwardTacotron):

Original (6400 epochs):
https://drive.google.com/file/d/1LOIB9B7LDX9g-kVu_p1anGJgJ5vjE27s/view?usp=sharing

Larger ResNet (2000 epochs):
https://drive.google.com/file/d/19_d2SQU1xZi-o90MJ8NcKhIS6AFwliH-/view?usp=sharing

If you are interested I could open a PR making the layers more flexible.

Use this implementation for TTS engine

Could you create a separate branch for a TTS implementation? That is the ultimate goal for every neural vocoder. I will try to use this implementation with NVIDIA's Tacotron2, since the preprocessing for both networks is the same.

Note: I am already working on it and will post the output samples here by tomorrow.

Unable to resume training from official checkpoint

Hi @seungwonpark

Thanks for all this. I am using your official checkpoint nvidia_tacotron2_LJ11_epoch6400.pt

When I try to resume training from that checkpoint I get the following error:

2020-01-07 00:26:35,386 - INFO - Resuming from checkpoint: ./nvidia_tacotron2_LJ11_epoch6400.pt
Traceback (most recent call last):
  File "trainer.py", line 52, in <module>
    train(args, pt_dir, args.checkpoint_path, trainloader, valloader, writer, logger, hp, hp_str)
  File "/delip/workspace/melgan/utils/train.py", line 34, in train
    model_d.load_state_dict(checkpoint['model_d'])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 816, in load_state_dict
    state_dict = state_dict.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

Looks like a mismatch between my PyTorch version and the one used for this checkpoint? My PyTorch version is 1.3.0a0+24ae9b5.

Do you have a checkpoint for the latest PyTorch? Or alternatively, what was the PyTorch version used with this checkpoint?

PS: md5sum for my copy of the checkpoint is 1cb89dc08401770fa9e2dd7d5c704bf5

Optimize the network to remove click-like sound artifacts

After 2000 epochs, the sound quality reaches a usable level, but one problem remains: a metallic, click-like noise artifact at the end of each generated sample. More optimization and R&D are needed to remove this kind of noise artifact.

Loading without cuda results in an error

I get the following error when loading the model via torch hub:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

To fix it, apply the following change to this line:

state_dict = torch.hub.load_state_dict_from_url(params['model_url'],
                                                progress=progress,
                                                map_location=torch.device('cpu'))

Torch hub command not working

Hello,
Thanks a lot for your implementation.
I am slightly confused because I am trying to run your torch hub command but it does not work.

Also if I look in /home/user/.cache/torch/checkpoints/, I cannot find the checkpoint even though I do get the logging: Downloading: "https://github.com/seungwonpark/melgan/releases/download/v0.1-alpha/nvidia_tacotron2_LJ11_epoch3200.pt" to /home/user/.cache/torch/checkpoints/nvidia_tacotron2_LJ11_epoch3200.pt

Batch size = 16?

Hi,
Thank you for your nice implementation. I have a question about the choice of batch size. The network looks small enough for a bigger batch size, for example 32 or 64 on a GTX 1080 Ti. Is a batch size of 16 a kind of regularization?
Another question concerns the G/D updates: for your generated samples, did you use a 1:1 update ratio?
Thanks.

Augmentation

Hi,
I've noticed that you add random noise to the audio. Would it make more sense to add random noise to the mel spectrograms instead? That is closer to what you would get from something like Tacotron, which produces noisy mels.

torchscript implementation

I noticed a torchscript branch. Is there a torchscript implementation of just the model conversion and inference?

Why remove weight norm?

Why is weight norm removed in eval mode?
Should weight norm be kept or removed at inference time?
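Weight normalization reparameterizes each weight vector as w = g * v / ||v||. During training the optimizer updates g and v; removing the reparameterization for inference (torch.nn.utils.remove_weight_norm in PyTorch) computes w once and stores it as an ordinary weight, so the outputs stay the same and only the per-forward recomputation is saved. A pure-Python illustration on a single neuron, with hypothetical class names, not the repository's code:

```python
import math


def weight_norm(g, v):
    """Weight normalization: w = g * v / ||v||."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


class WeightNormNeuron:
    """Training-time form: keeps g and v, recomputes w on every call."""

    def __init__(self, g, v):
        self.g, self.v = g, v

    def __call__(self, x):
        w = weight_norm(self.g, self.v)
        return sum(wi * xi for wi, xi in zip(w, x))


class FoldedNeuron:
    """Inference form after removing weight norm: w is computed once."""

    def __init__(self, g, v):
        self.w = weight_norm(g, v)

    def __call__(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))


x = [1.0, -1.0]
print(WeightNormNeuron(2.0, [3.0, 4.0])(x), FoldedNeuron(2.0, [3.0, 4.0])(x))
```

Keeping weight norm at inference is therefore not wrong; removing it just avoids redundant work on every forward pass.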

inference

inference.py -p [checkpoint path] -i [input mel path]
Which files do I use to generate speech?
I found the file "nvidia_tacotron2_LJ11_epoch6400.pt"; what is it for?

hop_length

Hi!
You once commented that "the model architecture upsamples the mel-spectrogram by 256 times, so the hop_length can't be changed." Which part of the model does this? And if I change the upsampling factor, can I change hop_length?
Thanks!
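The 256 comes from the generator's upsampling layers: each transposed convolution multiplies the time axis by its stride, and the product of the strides must equal hop_length so that every mel frame yields exactly hop_length samples. Assuming the (8, 8, 2, 2) strides described in the MelGAN paper, the arithmetic looks like this (helper names are hypothetical); a different hop_length would need a different set of strides with that product, plus retraining.

```python
from functools import reduce
import operator

# Assumption: the four upsampling stages described in the MelGAN paper.
UPSAMPLE_STRIDES = (8, 8, 2, 2)


def total_upsampling(strides=UPSAMPLE_STRIDES):
    """Product of the transposed-convolution strides (samples per mel frame)."""
    return reduce(operator.mul, strides, 1)


def check_hop_length(hop_length, strides=UPSAMPLE_STRIDES):
    """Raise if the mel hop length does not match the generator's upsampling."""
    factor = total_upsampling(strides)
    if hop_length != factor:
        raise ValueError(
            f"hop_length={hop_length} but the generator upsamples by {factor}")


def num_output_samples(n_frames, strides=UPSAMPLE_STRIDES):
    """Number of audio samples generated from n_frames mel frames."""
    return n_frames * total_upsampling(strides)


print(total_upsampling())       # 256
print(num_output_samples(234))  # 59904
```

For the 234-frame random mel in the PyTorch Hub example, the generator itself would produce 59904 samples, about 2.7 seconds at 22050 Hz.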

Some notable differences with official implementation

Hi,
Just FYI: in the official MelGAN repo, the authors used hinge losses, whereas the paper describes an L2 loss. This repo is consistent with the paper! I am setting up some experiments with the hinge loss to see the differences. Another note: the default segment_length in the official repo is 8192 (vs. 16k in this repo).
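For comparison, here are the least-squares (L2) and hinge objectives written out over lists of discriminator outputs. This is a simplified pure-Python sketch with hypothetical function names; real implementations work on tensors, average over every discriminator scale, and MelGAN also adds a feature-matching term that is omitted here.

```python
def mean(xs):
    return sum(xs) / len(xs)


# Least-squares (L2) GAN objective
def lsgan_d_loss(d_real, d_fake):
    """Push D(real) toward 1 and D(fake) toward 0."""
    return mean([(x - 1.0) ** 2 for x in d_real]) + mean([x ** 2 for x in d_fake])


def lsgan_g_loss(d_fake):
    """Push D(fake) toward 1."""
    return mean([(x - 1.0) ** 2 for x in d_fake])


# Hinge GAN objective
def hinge_d_loss(d_real, d_fake):
    """Penalize D(real) < 1 and D(fake) > -1; no penalty beyond the margins."""
    return (mean([max(0.0, 1.0 - x) for x in d_real])
            + mean([max(0.0, 1.0 + x) for x in d_fake]))


def hinge_g_loss(d_fake):
    """Raise D(fake), with no margin."""
    return -mean(d_fake)
```

With d_real = [1.0, 2.0] and d_fake = [0.0, -2.0], the hinge discriminator loss ignores the real output of 2.0 entirely (it is past the margin), while the least-squares loss still penalizes its distance from 1.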

Random crashes with custom dataset, tensor size mismatch

I created a small test dataset that you can replicate by downloading this podcast and following these steps.

I then used ffmpeg to convert it to a mono 22050 Hz WAV file with ffmpeg -i input.mp3 -ac 1 -ar 22050 output.wav

I used sox to split it on silence into many smaller pieces in a split_files output folder with sox -V3 output.wav split_files/output.wav silence -l 0 3.0 1.0 5% : newfile : restart

There should be 240 pieces.

The last 24 pieces were used for validation.

Here are two separate errors (note that DataLoader shuffling was set to False for both runs, yet they crash at different steps):
[eric@eric-pc melgan]$ python trainer.py -c config/default.yaml -n test4
2019-10-24 23:06:54,795 - INFO - Starting new training run.
Validation loop: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:03<00:00,  6.90it/s]
g 31.2470 d 56.5574 | step 13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:08<00:00,  1.51it/s]
2019-10-24 23:07:10,354 - INFO - Saved checkpoint to: chkpt/test4/test4_df8b090_0000.pt
g 29.4583 d 55.8972 | step 26: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.91it/s]
g 29.3384 d 55.7414 | step 39: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.90it/s]
g 31.0743 d 55.8826 | step 52: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.87it/s]
g 30.2437 d 55.5219 | step 65: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.89it/s]
Validation loop: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:03<00:00,  6.98it/s]
g 32.9035 d 58.3628 | step 78: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.88it/s]
g 32.2074 d 55.6909 | step 91: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:06<00:00,  1.87it/s]
g 30.4200 d 55.2120 | step 93:  15%|██████████████████████████▏                                                                                                                                           	| 2/13 [00:01<00:09,  1.20it/s]2019-10-24 23:07:59,489 - INFO - Exiting due to exception: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
	data = fetcher.fetch(index)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
	return self.collate_fn(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
	return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 16000 and 15986 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Traceback (most recent call last):
  File "/home/eric/Documents/repos/melgan/utils/train.py", line 64, in train
	for (melG, audioG), (melD, audioD) in loader:
  File "/usr/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1060, in __iter__
	for obj in iterable:
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
	return self._process_data(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
	data.reraise()
  File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
	raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
	data = fetcher.fetch(index)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
	return self.collate_fn(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
	return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 16000 and 15986 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

g 30.4200 d 55.2120 | step 93:  15%|██████████████████████████▏                                                                                                                                           	| 2/13 [00:01<00:08,  1.28it/s]
[eric@eric-pc melgan]$ python trainer.py -c config/default.yaml -n test5
2019-10-24 23:11:19,808 - INFO - Starting new training run.
Validation loop: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:03<00:00,  6.96it/s]
g 31.1410 d 56.5434 | step 13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:08<00:00,  1.51it/s]
2019-10-24 23:11:35,537 - INFO - Saved checkpoint to: chkpt/test5/test5_df8b090_0000.pt
g 30.1641 d 56.2416 | step 21:  62%|████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                             	| 8/13 [00:04<00:02,  1.93it/s]2019-10-24 23:11:39,845 - INFO - Exiting due to exception: Caught RuntimeError in DataLoader worker process 8.
Original Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
	data = fetcher.fetch(index)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
	return self.collate_fn(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
	return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 16000 and 15958 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Traceback (most recent call last):
  File "/home/eric/Documents/repos/melgan/utils/train.py", line 64, in train
	for (melG, audioG), (melD, audioD) in loader:
  File "/usr/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1060, in __iter__
	for obj in iterable:
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
	return self._process_data(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
	data.reraise()
  File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
	raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 8.
Original Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
	data = fetcher.fetch(index)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
	return self.collate_fn(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
	return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
	return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 16000 and 15958 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

g 30.1641 d 56.2416 | step 21:  62%|████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                             	| 8/13 [00:04<00:02,  1.81it/s]
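The sizes in these errors (16000 vs. 15986 and 15958) suggest that some clips produced by the silence split are shorter than the 16000-sample segment length, so the default collate function cannot stack them. One way to guarantee equal lengths, sketched here as hypothetical helpers rather than the repository's loader, is to crop long clips at a random offset and zero-pad short ones. If the mel is computed or cropped separately, the same padding would have to be applied before the mel is computed; alternatively, short clips can simply be dropped from the file list.

```python
import random


def fix_length(audio, segment_length=16000, rng=random):
    """Return exactly segment_length samples: a random crop if the clip is
    long enough, otherwise the clip zero-padded at the end."""
    if len(audio) >= segment_length:
        start = rng.randint(0, len(audio) - segment_length)
        return list(audio[start:start + segment_length])
    return list(audio) + [0] * (segment_length - len(audio))


def drop_short(clips, segment_length=16000):
    """Alternative: keep only clips that are at least segment_length long."""
    return [c for c in clips if len(c) >= segment_length]
```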

There is some noise in the gap position

Hello, thank you very much for the good work!
I am using Chinese datasets for my experiments, and I found some noise in the gaps. Is this the best result I can expect?
Here are the samples: syn.zip

How about the inference speed?

Thanks for your implementation of MelGAN. As stated in the introduction, MelGAN is lighter, faster, and better at generalizing. How fast is inference?
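A common way to answer this is the real-time factor: synthesis time divided by the duration of the audio produced. A hypothetical sketch, assuming hop_length 256 and 22050 Hz audio; no measured numbers are implied here.

```python
import time


def audio_seconds(n_frames, hop_length=256, sample_rate=22050):
    """Duration of audio produced from n_frames mel frames."""
    return n_frames * hop_length / sample_rate


def real_time_factor(synthesize, n_frames, hop_length=256, sample_rate=22050):
    """Wall-clock time of synthesize() divided by the audio duration it yields.
    Values below 1.0 mean faster than real time."""
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds(n_frames, hop_length, sample_rate)
```

For example, real_time_factor(lambda: vocoder.inference(mel), mel.shape[-1]) inside torch.no_grad(). On a GPU, calling torch.cuda.synchronize() inside the timed function makes sure asynchronous kernels are included, and the first (warm-up) run is best discarded.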

Validation data

Quick question: where do I get / generate the validation data required (I'm using the LJSpeech set)?

Please also let me know if you can point to other or better training data with corresponding validation data.
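As far as the README describes it, validation data is just a second directory of preprocessed *.wav/*.mel pairs, so it can be carved out of LJSpeech itself. A hypothetical sketch of a reproducible hold-out split; the caller would then move or copy the held-out files into the validation directory.

```python
import random


def split_train_val(paths, n_val, seed=1234):
    """Reproducibly hold out n_val files for validation.

    Returns (train, val) as two disjoint lists covering all of `paths`.
    """
    shuffled = sorted(paths)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]
```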

Error in inference

Hello. When I run inference.py, I get this error:
0it [00:00, ?it/s]
How can I solve this? Thank you.

Discriminator dominates over generator

I could get audible results at epoch>350, but they don't look good.
Also, d_loss gets too low and g_loss gets too high.

[Screenshot, 2019-10-19]

Perhaps this could be caused by:

  • Discriminator may have learned that real data have discrete values (np.int16): Adding a gaussian noise may help. See soumith/ganhacks#14
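A minimal sketch of that idea, assuming int16 samples are scaled to [-1, 1) by 32768 (a common convention, assumed here) before Gaussian noise is added. The noise scale of about one quantization step is an illustrative choice, not a tuned value, and add_dequantization_noise is a hypothetical name.

```python
import random

MAX_WAV_VALUE = 32768.0  # assumed int16 scaling constant


def add_dequantization_noise(samples, sigma=1.0 / MAX_WAV_VALUE, rng=None):
    """Scale int16 samples to floats and add N(0, sigma^2) noise so real
    audio no longer sits exactly on the int16 grid."""
    rng = rng or random.Random()
    return [s / MAX_WAV_VALUE + rng.gauss(0.0, sigma) for s in samples]
```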

Strange inference results with pretrained 6400epoch model

Thank you for your implementation and effort. I have a question about inference and getting test samples from your pretrained model. Training, preprocessing, and inference all run without problems on my Ubuntu machine, but the results are strange: I cannot reproduce your samples with the model trained for 6400 epochs.

Am I missing something crucial? Can this model generate audio unconditionally? What mel input is expected for inference? Can your implementation perform audio translation?

My generated test samples and config files are in folder:https://drive.google.com/drive/folders/1zRhTFP7GepXrm_DPHkF1Nt94LZXMBX4Z?usp=sharing

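Question about preprocessing MelGAN training data

Hello Seungwon, and first of all, thank you for releasing such great code.
I have little experience with speech synthesis, so I apologize if this question is a basic one.

I have trained a WaveRNN vocoder before.
The repo I used as a reference trained the vocoder on mel-spectrograms generated by the mel-prediction network placed in front of it.

If the front-end model is fixed anyway, I suspect that training on the mels the model actually generates would give better results than training on mels produced by signal processing. I would like to hear your opinion on this.

One more question: a vocoder can be trained on a single speaker, but it can also be trained on multiple speakers. Does training on multiple speakers inevitably lower the audio quality the model produces?

Thank you.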

LJSpeech Checkpoints

Hello,

I can't find pretrained LJSpeech checkpoints in this repo. Is it possible to train for a different language starting from the LJSpeech checkpoint?

Could you share a link to the checkpoint from the last LJSpeech training step?

Thank you.

Questions to use melgan on my own dataset

Hi, I encounter some problems when I try to use melgan on my dataset.
The first one: you note in default.yaml that we should leave hop_length at 256. Why can't I change the value? Is this a limitation of the model structure?
The second question: in the MelFromDisk class, you use a mapping in __getitem__ for the training set. What is this mapping used for? As far as I can tell, the input idx lies in [0, len(wav_list)), and the mapping covers the same range.

New error
#from .res_stack import ResStack
from res_stack import ResStack

Text2Mel input to MelGan outputs noisy audio file without any speech

Hey!

I've retrained the Text2Mel model after removing the mel reduction step from the preprocessor and changing the hparams to:

hop_length = 256
win_length = 1024
max_N = 180 # Maximum number of characters.
max_T = 210 # Maximum number of mel frames.
e = 512 # embedding dimension
d = 256 # Text2Mel hidden unit dimension

I'm feeding the generated mels to MelGAN, but the output audio file is just a noisy honk.
Any ideas?

EOFError: Ran out of input with num_workers>0 in windows

There seems to be a related issue: https://discuss.pytorch.org/t/pytorch-windows-eoferror-ran-out-of-input-when-num-workers-0/25918/18, but I haven't found a workaround so far.

python trainer.py -c config/config.yaml -n firstrun
2020-04-16 06:20:54,376 - INFO - Starting new training run.
Validation loop: 0%| | 0/1283 [00:00<?, ?it/s]
2020-04-16 06:20:54,386 - INFO - Exiting due to exception: '__getstate__'
Traceback (most recent call last):
  File "C:\Users\susinder\PycharmProjects\melgan_seungwonpark\utils\train.py", line 60, in train
    validate(hp, args, model_g, model_d, valloader, writer, step)
  File "C:\Users\susinder\PycharmProjects\melgan_seungwonpark\utils\validation.py", line 13, in validate
    for mel, audio in loader:
  File "C:\Users\susinder\Anaconda3\envs\test\lib\site-packages\tqdm\std.py", line 1119, in __iter__
    for obj in iterable:
  File "C:\Users\susinder\Anaconda3\envs\test\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\reduction.py", line 60, in dump

Validation loop: 0%| | 0/1283 [00:00<?, ?it/s]

(test) C:\Users\susinder\PycharmProjects\melgan_seungwonpark>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\susinder\Anaconda3\envs\test\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
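On Windows, DataLoader workers are started with spawn, so the dataset and everything it references must be pickled; the EOFError in the child is a side effect of the parent failing to pickle. The exception message refers to Python's __getstate__ hook, which pickle looks up when serializing ordinary objects. One plausible cause (an assumption, not confirmed against this repository) is a config object defined as a dict subclass with __getattr__ = dict.__getitem__, which raises KeyError instead of AttributeError for that lookup on Python versions before 3.11. A hypothetical attribute-style dict that pickles cleanly:

```python
import pickle


class AttrDict(dict):
    """dict with attribute access that raises AttributeError (not KeyError)
    for missing names, so pickle's probing for hooks such as __getstate__
    behaves normally and the object can be sent to spawned workers."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


hp = AttrDict(batch_size=16, num_workers=0)
restored = pickle.loads(pickle.dumps(hp))
print(restored.batch_size)  # 16
```

As a fallback, setting num_workers to 0 loads data in the main process, which avoids pickling entirely at the cost of slower data loading.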
