
dctts-pytorch's Introduction

DC-TTS

A PyTorch implementation of the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.

Thanks to Kyubyong/dc_tts, which helped me overcome several difficulties.

Dataset

  • The LJ Speech Dataset. A public domain speech dataset consisting of 13,100 short audio clips of a single female speaker.

Train

I tuned the hyperparameters and trained a model on the LJ Speech Dataset. The hyperparameters may not be optimal and differ slightly from those used in the original paper.

To train a model yourself on the LJ Speech Dataset:

  1. Download the dataset, extract it into a directory, and set that directory in pkg/hyper.py
  2. Run preprocessing
    python3 main.py --action preprocess
    
  3. Train the Text2Mel network; you can change the training device for Text2Mel in pkg/hyper.py
    python3 main.py --action train --module Text2Mel
    
  4. Train the SSRN network; its training device can be changed as well
    python3 main.py --action train --module SuperRes
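
The three commands above can be chained from a small wrapper. This is a minimal sketch, not part of the repository: the names STAGES, build_commands and run_all are hypothetical, and it assumes main.py sits in the current working directory and that the dataset path has already been set in pkg/hyper.py.

```python
import subprocess

# The three stages, in the order the steps above list them.
STAGES = [
    ["--action", "preprocess"],
    ["--action", "train", "--module", "Text2Mel"],
    ["--action", "train", "--module", "SuperRes"],
]


def build_commands(python="python3", script="main.py"):
    """Return one argv list per stage (hypothetical helper)."""
    return [[python, script, *stage] for stage in STAGES]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing stage.
            subprocess.run(cmd, check=True)
```

Calling run_all() only prints the commands; run_all(dry_run=False) executes them one after another. Each training stage runs until it is interrupted or finishes, so in practice the stages are usually launched separately.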
    

Samples

Some synthesized samples are in the synthesis directory. The corresponding sentences are listed in sentences.txt. The pre-trained Text2Mel and SuperRes models (auto-saved to logdir/text2mel/pkg/trained.pkg and logdir/superres/pkg/trained.pkg during training) are loaded when synthesizing.

You can synthesize the samples listed in sentences.txt with

python3 main.py --action synthesis
  • Attention Matrix for the sentence: "Which came first... the chicken or the egg? Did the universe have a beginning... and if so, what happened before then? Where did the universe come from... and where is it going?"

Pre-trained model

The samples in the synthesis directory were generated with a Text2Mel model trained for 410k batches and a SuperRes model trained for 190k batches.

The current result is not very satisfying; specifically, some vowels are skipped. I hope someone can find better hyperparameters and train better models. Please tell me if you manage to get a great model.

You can download the current pre-trained model from my Dropbox.

Dependencies

  • scipy, librosa, num2words
  • pytorch >= 0.4.0

Related

TensorFlow implementation: Kyubyong/dc_tts

Please email me or open an issue if you have any questions or suggestions.

dctts-pytorch's People

Contributors

chaiyujin

dctts-pytorch's Issues

Problem with my own small dataset: ValueError: negative step not yet supported

Hi, I created my own dataset in Persian with 229 rows (I know that is very few, but it is only for a test).
First I changed this: ( vocab = "PE ابپتثجچ‌حخدذرز‌ژس‌شصضطظعغفقکگلمنوهیءآاًهٔة'.?")
After running (python3 main.py --action train --module Text2Mel) I got this error:

ule Text2Mel
train: Text to Mel
loop 0
▝[1/8] |gs: 0, mels: 0.321015, bd1: 0.693171, atten: 0.637633, scale: 99.999000| [0.0% eta ETA unknown]
Traceback (most recent call last):
  File "main.py", line 62, in
    main()
  File "main.py", line 47, in main
    train(args.module, args.load)
  File "/home/ai/dctts-pytorch/pkg/train.py", line 54, in train
    train_text2mel(load_trained)
  File "/home/ai/dctts-pytorch/pkg/train.py", line 197, in train_text2mel
    plot_spectrum(mels[0].cpu().data, "mel_true", gs, dir=logdir)
  File "/home/ai/dctts-pytorch/pkg/utils.py", line 110, in plot_spectrum
    im = ax.imshow(np.flip(spectrum, 0), cmap="jet", aspect=0.2 * spectrum.shape[1] / spectrum.shape[0])
  File "<array_function internals>", line 6, in flip
  File "/home/ai/Downloads/con/lib/python3.7/site-packages/numpy/lib/function_base.py", line 254, in flip
    return m[indexer]
ValueError: negative step not yet supported
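
One plausible reading of this traceback: mels[0].cpu().data is still a torch tensor when it reaches np.flip, and np.flip reverses an axis by indexing with a negative-step slice, which torch tensors do not accept. If that diagnosis is right, the error is unrelated to the dataset size or the Persian vocab. A minimal sketch of a workaround, under that assumption, converts the input to a NumPy array before flipping. The helper names to_numpy and flipped_for_plot are hypothetical and not part of the repository.

```python
import numpy as np


def to_numpy(spectrum):
    """Return a NumPy array for either a NumPy input or a torch-like tensor."""
    # Duck-typed check so this sketch does not need to import torch:
    # torch tensors expose detach(), cpu() and numpy().
    if hasattr(spectrum, "detach"):
        spectrum = spectrum.detach().cpu().numpy()
    return np.asarray(spectrum)


def flipped_for_plot(spectrum):
    """Flip along axis 0, which is what plot_spectrum passes to imshow."""
    return np.flip(to_numpy(spectrum), 0)
```

In pkg/utils.py this would amount to converting spectrum once at the top of plot_spectrum, so that the later spectrum.shape lookups keep working unchanged.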

Training with GPU vs Movidius

Hello,
I'm a beginner, but I like to try things.
I'm using an Nvidia 1050 GPU, but during training every loop takes about 39 minutes 👎 and 4329 loops are needed. Is that normal?

Is it easy to modify this code to make it compatible with the Movidius NCS?

Thanks!
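
For scale, the two figures quoted above multiply out to roughly 117 days of continuous training. The sketch below assumes the loop time stays constant at 39 minutes and that all 4329 loops are actually run; the function name is hypothetical.

```python
def estimate_days(minutes_per_loop, loops):
    """Total wall-clock time in days, assuming a constant loop time."""
    total_minutes = minutes_per_loop * loops
    return total_minutes / 60 / 24


# Figures taken from the post above.
days = estimate_days(39, 4329)
print(f"about {days:.0f} days")
```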

Multiple GPU

Hello,

How can I incorporate DataParallel into this project?
model = nn.DataParallel(model)

thanks
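
A minimal sketch of the usual nn.DataParallel pattern, not tested against this repository. TinyNet is a hypothetical stand-in for the Text2Mel or SuperRes module, and wrap_for_multi_gpu and unwrap are illustrative helper names. The wrapped model keeps the real network under .module, so saving the state dict of the unwrapped module keeps checkpoint keys free of the "module." prefix.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the repository's networks."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 4)

    def forward(self, x):
        return self.proj(x)


def wrap_for_multi_gpu(model):
    """Move the model to the GPU if present; wrap it when several exist."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        # Each forward call splits the batch along dim 0 across the GPUs.
        model = nn.DataParallel(model)
    return model, device


def unwrap(model):
    """Return the underlying module whether or not it is wrapped."""
    return model.module if isinstance(model, nn.DataParallel) else model


model, device = wrap_for_multi_gpu(TinyNet())
batch = torch.randn(16, 8, device=device)
out = model(batch)                    # shape (16, 4)
state = unwrap(model).state_dict()    # keys without the "module." prefix
```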

GPU idle between iteration

Awesome work.

When I checked GPU utilization during training, I found a lot of GPU idle time between iterations.

Did you see similar behavior? I'm using NFS storage, so one possible reason is that loading and feeding the mels is the bottleneck.

  • Text2Mel, batch size 64, on V100 16GB (pytorch 1.0)
nvidia-smi -l 1 | grep MiB
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   162W / 300W |   5484MiB / 16130MiB |     31%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   209W / 300W |   5484MiB / 16130MiB |     95%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   208W / 300W |   5484MiB / 16130MiB |     24%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    77W / 300W |   5484MiB / 16130MiB |     94%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    71W / 300W |   5484MiB / 16130MiB |      0%      Default |


  • SSRN, batch size 32, on V100 16GB (pytorch 1.0)
nvidia-smi -l 1 | grep MiB
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   207W / 300W |  10386MiB / 16130MiB |     29%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   49C    P0   263W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0    65W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   283W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    61W / 300W |  10386MiB / 16130MiB |     34%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    66W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    62W / 300W |  10386MiB / 16130MiB |     28%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    61W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   179W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   227W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   293W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
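
The idle time in these logs can be quantified with a small parser. This is a sketch that assumes each sample line ends with the utilization percentage followed by "Default", as in the output above; the function names are hypothetical. The sample data is the Text2Mel log from this post.

```python
import re

# Matches the GPU-Util column, e.g. "|     31%      Default |".
UTIL_RE = re.compile(r"(\d+)%\s+Default")

TEXT2MEL_LOG = """
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   162W / 300W |   5484MiB / 16130MiB |     31%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   209W / 300W |   5484MiB / 16130MiB |     95%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   208W / 300W |   5484MiB / 16130MiB |     24%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    77W / 300W |   5484MiB / 16130MiB |     94%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    71W / 300W |   5484MiB / 16130MiB |      0%      Default |
"""


def parse_utilization(log_text):
    """Extract the utilization percentage from every sample line."""
    return [int(m.group(1)) for m in UTIL_RE.finditer(log_text)]


def summarize(utils):
    """Fraction of samples at 0% and the mean utilization."""
    idle = sum(1 for u in utils if u == 0)
    return {
        "samples": len(utils),
        "idle_fraction": idle / len(utils),
        "mean_util": sum(utils) / len(utils),
    }


summary = summarize(parse_utilization(TEXT2MEL_LOG))
print(summary)
```

For the Text2Mel log, 6 of the 10 one-second samples read 0%. One-second sampling is coarse, so this only indicates that the GPU is often waiting; it does not by itself show that NFS reads are the cause.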

Training type

Hi! One question:
Is it mandatory to train both Text2Mel and SSRN, or does it work if only one of them is trained?

Thanks!
