
dc_tts's Introduction

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

I implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just to replicate the paper. Rather, I'd like to gain insights into various sound projects.

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.3 (Note that the API of tf.contrib.layers.layer_norm has changed since 1.3)
  • librosa
  • tqdm
  • matplotlib
  • scipy

Data

I train English models and a Korean model on four different speech datasets.

1. LJ Speech Dataset
2. Nick Offerman's Audiobooks
3. Kate Winslet's Audiobook
4. KSS Dataset

LJ Speech Dataset has recently become a widely used benchmark for TTS because it is publicly available and contains 24 hours of reasonable-quality samples. Nick's and Kate's audiobooks are additionally used to see whether the model can learn from smaller, more variable speech data; they are 18 hours and 5 hours long, respectively. Finally, KSS Dataset is a Korean single-speaker speech dataset of more than 12 hours.

Training

  • STEP 0. Download LJ Speech Dataset or prepare your own data.
  • STEP 1. Adjust hyperparameters in hyperparams.py. (If you want to do preprocessing, set prepro to True; a short example follows below.)
  • STEP 2. Run python train.py 1 for training Text2Mel. (If you set prepro to True, run python prepro.py first.)
  • STEP 3. Run python train.py 2 for training SSRN.

You can do STEPs 2 and 3 at the same time if you have more than one GPU.
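For STEP 1, a minimal edit to hyperparams.py might look like the following sketch (prepro and data are attributes this README already refers to; the path is an assumption):

    # hyperparams.py (excerpt)
    prepro = True                  # precompute mel/mag spectrograms with prepro.py
    data = "/data/LJSpeech-1.1"    # assumed path to the unpacked dataset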

Training Curves

Attention Plot

Sample Synthesis

I generate speech samples based on the Harvard Sentences, as in the original paper. The sentence list is already included in the repo.

  • Run synthesize.py and check the files in samples.

Generated Samples

Dataset | Samples (at training steps)
LJ      | 50k, 200k, 310k, 800k
Nick    | 40k, 170k, 300k, 800k
Kate    | 40k, 160k, 300k, 800k
KSS     | 400k

Pretrained Model for LJ

Download this.

Notes

  • The paper didn't mention normalization, but I couldn't get the model to work without it, so I added layer normalization.
  • The paper fixed the learning rate at 0.001, but that didn't work for me, so I decayed it (see the sketch after this list).
  • I tried to train Text2Mel and SSRN simultaneously, but it didn't work. I guess separating the two networks eases the training burden.
  • The authors claimed that the model can be trained within a day; unfortunately, that luck was not mine. Still, it is obviously much faster than Tacotron, as it uses only convolution layers.
  • Thanks to the guided attention, the attention plot looks monotonic almost from the beginning. I guess this holds the alignment tight so it doesn't lose track.
  • The paper didn't mention dropout; I applied it, as I believe it helps with regularization.
  • Check also other TTS models such as Tacotron and Deep Voice 3.
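For the learning-rate note above, a minimal TF 1.x decay sketch looks like this (the decay interval and rate are illustrative assumptions, not necessarily the schedule used in this repo):

    import tensorflow as tf

    global_step = tf.Variable(0, trainable=False, name="global_step")
    # Start at 0.001 as in the paper, then halve every 100k steps.
    lr = tf.train.exponential_decay(0.001, global_step,
                                    decay_steps=100000, decay_rate=0.5,
                                    staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)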

dc_tts's People

Contributors

kyubyong, w19787

dc_tts's Issues

Vocabulary ( ) ! ; , - and eval sentences

The LJSpeech Dataset contains at least these additional characters: ( ) ! ; , -
Comma and dash are included in the original paper. Are these characters intentionally omitted?

Also, I think the eval sentences give very little insight into the capabilities of the network.
With this set we don't know whether:

  • Questions are pronounced correctly
  • Pauses after commas are reasonable
  • Context-based pronunciation is correct (e.g. from the tacotron2 sample: "He has read the whole thing.")

Spanish database used

Hello again,

I saw that you uploaded Spanish TTS samples to your SoundCloud:
https://soundcloud.com/kyubyong-park/sets/ms10_es_t

I would like to know which Spanish database (audio/transcription) you used to get these results, and with which TTS model you achieved them.

Can I get access to that database?
Could you upload the pre-trained model for Spanish?

Did you parse audiobooks, splitting them into sentences and making the transcriptions?

Thank you!

I struggled to get anything but errors for the longest time. Here's what finally let me run synthesize.py

Starting with the downloads listed on the main page:
LJ Dataset: https://keithito.com/LJ-Speech-Dataset/
This needs to be unpacked and put somewhere. Adjust 'data = ' in hyperparams.py to point to it.
----> Change the name of the .csv file in that folder to 'transcript.csv'. I can't remember now what it was called, but everything will fail unless it's named transcript.csv.

Pretrained models for LJ : https://www.dropbox.com/s/1oyipstjxh2n5wo/LJ_logdir.tar?dl=0
Make a directory named 'logdir' in dc_tts and untar so that both the LJ01-1 and LJ01-2 folders end up in there. Again, everything bombs unless they're located there.

Finally, mkdir samples in dc_tts, or it'll complain about having nowhere to write.

If you get obscure errors like "TypeError: __new__() got an unexpected keyword argument 'file'", you're in luck, because I know what to do: your protobuf Python library is out of date. It turned out a rogue older version was hiding out in ~/.local/lib/python2.7/site-packages/. Pip didn't seem to mind or notice when I tried a million varieties of uninstalling/reinstalling. Finally, I rm -rf'ed that directory and then did sudo -H pip install --upgrade protobuf. You need to make sure your version is greater than 3.something-or-other, or again, no dice.

... python synthesize.py now runs to completion. It produces 20 files, which I assume are meant to match 'harvard_sentences.txt', all garbled static or silence only, but it did complete. That's a start.

Training with your own voice samples

I am looking to train using my own voice as a sample and was wondering if anyone else has done this yet?

How many samples did you use? How long did you train for?

how to test pretrained model on sound data?

Hello,
I am looking into real-time speech recognition. Can I use the pretrained model to do that, and how?
I would appreciate it if anyone could give me tips on how to start.

Thanks,

Bucket Boundaries error

Hello,

I've got an issue when trying to run the training script with my input wav file.

The error is:
ValueError: bucket_boundaries must not be empty

I think it might be because of the length of the input files (1 minute).

Could anyone point me in the right direction on how to edit the hyperparams so that I can use a file of that length?

Is there any pointer/explanation besides the linked paper that explains the different parameters in hyperparams?

Thank you for your help.

Non-Latin alphabet

Can it work with an extended Latin alphabet, e.g. Polish? Does it have to be basic Latin only?

How to do the fine-tuning training?

I used the seed model as a base for a different voice, but the output still sounds like LJ.
I think I'm missing some steps here. @Kyubyong said to adjust the hyperparameters, but I'm not sure exactly what to do beyond the obvious steps.

Here's what I did:
Since the batch size is 32, and the author claims to have augmented the model with a minute of voice data, I used 32 voice samples for my second voice.
I edited hyperparams.py to reflect the new data location and train.py to save the model after just one step. I also deleted the mel and mag folders just in case.
Then I ran prepro.py, train.py 1, and train.py 2. I then ran synthesize.py, and the output sounds like LJ.

Help?

How to run pre-trained models to generate voice e.g. Joe Rogan?

Hi,
Can you please provide instructions on how to use this repo and generate speech from input text? Sorry I'm just having a bit of trouble figuring out how to run it.

Also is there a pre-trained model of Joe Rogan? How would I go about building this? Thanks!

How to improve text input notation (question)

For example, for "blue" the dc_tts output speed is fast. What I want is to slow down some parts of a word, for example "bluuue" instead of "blue". But when I input "bluuue", the pronunciation of the output changes strangely and no longer sounds like "blue". How can I achieve that?

Strange TensorFlow error due to an argument

Hello,

First of all, thank you for your effort with the open-source TTS projects.
In summary, I'm trying to create a Spanish TTS using your open-source dc_tts.

To do that, I want to train on my laptop the net that you uploaded as the "pre-trained net", to find out whether I will be able to train my own network for Spanish (once I have the Spanish database ready), but I get a TensorFlow error about an argument. I have all the libraries installed correctly and the LJ database downloaded.

The only thing I did was download the repository, set the database paths (LJ) correctly, and run the train script with parameter "1".

C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: TypeError: a bytes-like object is required, not 'str'

Thank you!

Why is the 1st frame of the mel spectrogram set to 0?

Hello, in train.py, line 52:
self.S = tf.concat((tf.zeros_like(self.mels[:, :1, :]), self.mels[:, :-1, :]), 1)

the first frame of the mel spectrogram is set to 0. Could you please explain why?
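For context, this looks like the standard teacher-forcing shift for an autoregressive decoder: the input at frame t is the ground-truth frame t-1, and frame 0 has no predecessor, so a zero "go" frame is prepended. A minimal NumPy sketch of the same operation:

    import numpy as np

    # Toy batch of mel frames: (batch=1, time=4, n_mels=2)
    mels = np.arange(8, dtype=np.float32).reshape(1, 4, 2)

    # Prepend a zero "go" frame and drop the last frame, mirroring
    # the tf.concat line quoted above.
    go = np.zeros_like(mels[:, :1, :])
    S = np.concatenate((go, mels[:, :-1, :]), axis=1)
    # S[:, 0] is all zeros; S[:, t] equals mels[:, t-1] for t >= 1.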

Redundant operations

dc_tts/train.py

Line 131 in 8b38110

self.train_op = self.optimizer.apply_gradients(self.clipped, global_step=self.global_step)

I don't think you need to create a train operation for each element here (the model has quite a few parameters).

load pre-trained model from checkpoint?

I'm trying to use the pre-trained model as a seed for a new voice, but I don't see any code that loads the pre-trained model from the checkpoint. Has anyone gotten that to work?
Thank you
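A minimal TF 1.x sketch of such a restore, mirroring the saver1.restore call in synthesize.py (the logdir path is an assumption):

    import tensorflow as tf

    # Build the training graph first (e.g. g = Graph(num=1)), then:
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint("logdir/LJ01-1")  # assumed path
        saver.restore(sess, ckpt)
        # ...continue training on the new voice data from here.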

No .npy file found

I have installed all the libraries but it keeps giving me this:

Traceback (most recent call last):
File "train.py", line 160, in
if gs > hp.num_iterations: break
File "/usr/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 1014, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 839, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1279, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: exceptions.IOError: [Errno 2] No such file or directory: 'mels/LJ024-0008.npy'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 209, in call
ret = func(*args)
File "/home/alansmithans/dc_tts/data_load.py", line 113, in _load_spectrograms
return fname, np.load(mel), np.load(mag)
File "/home/alansmithans/.local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 422, in load
fid = open(os_fspath(file), "rb")
IOError: [Errno 2] No such file or directory: 'mels/LJ024-0008.npy'

What is minimum train dataset?

Hi, nice work. I really liked the generated Kate sample. I was wondering what the minimum training dataset size is, since Kate's audiobook was 5 hours and the other datasets were bigger?

Thanks

What is the shape of the input node?

Hi, I am trying to port this model to OpenVINO IR, but I need to know a few details about the model:

  1. Input and output nodes
  2. Shape of the input node

Thanks!
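One way to find the node names and shapes is to load the checkpoint's meta graph and list its placeholders; a hedged TF 1.x sketch (the .meta filename is an assumption):

    import tensorflow as tf

    saver = tf.train.import_meta_graph("logdir/LJ01-1/model.meta")  # assumed path
    graph = tf.get_default_graph()
    for op in graph.get_operations():
        if op.type == "Placeholder":
            print(op.name, [out.shape for out in op.outputs])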

No sound in generated samples

So far I've been training for about 3 hours and the generated samples contain nothing but silence. I was wondering what could be causing this.

Usage guide / tutorial ?

Hi,

I've had okay results with Keithito's Tacotron implementation, but I wanted to try this too (and your Tacotron 2 implementation as well; I suppose the question fits both equally).

Could you give a short guide on how to run this model from scratch? My own dataset is modeled after Keithito's LJSpeech dataset, so anything that works with that should work on my data.

On top of that, could you share your weights/checkpoints? I've noticed that, even though my dataset is in Dutch, it worked after only a few hundred iterations on top of Keithito's English Tacotron weights. Mainly the alignment was hard and slow to train from scratch, I believe.

Thanks!

How to get the loss of a model?

Hi,

Sorry, I'm new to TensorFlow. How can I output a graph of the loss over time?

Is it possible to also get the best loss, in order to compare two different models after n iterations? I want to train two TTS models on the same dataset and compare them.

Thanks!
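For what it's worth, the usual TF 1.x pattern for this is a scalar summary plus TensorBoard; a minimal, self-contained sketch (the constant stands in for the model's real loss tensor):

    import tensorflow as tf

    loss = tf.constant(0.5)  # stand-in for the real loss tensor
    loss_summary = tf.summary.scalar("loss", loss)
    writer = tf.summary.FileWriter("logdir")
    with tf.Session() as sess:
        for step in range(3):
            writer.add_summary(sess.run(loss_summary), global_step=step)
    # View the curves with: tensorboard --logdir logdir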

[SOLVED] It is not training

This is pretty weird.

The attention plot is also blank:

[blank attention plot image: alignment_009k]

I restarted again, same issue 😕

The synthesised audio is blank. Each sentence produces an audio sample of 10 seconds of silence.

Can't synthesize audio

Traceback (most recent call last):
File ".\synthesize.py", line 67, in
synthesize()
File ".\synthesize.py", line 34, in synthesize
saver1.restore(sess, tf.train.latest_checkpoint(hp.logdir + "-1"))
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1264, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'

I get the error message below when I run train.py:

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]

Training Graph loaded
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
  0%|                                           | 0/42 [00:00<?, ?b/s]
INFO:tensorflow:gs/global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.UnknownError'>, AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]
                                                                      
---------------------------------------------------------------------------
OutOfRangeError                           Traceback (most recent call last)
c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

OutOfRangeError: FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]

During handling of the above exception, another exception occurred:

OutOfRangeError                           Traceback (most recent call last)
c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
    953           start_standard_services=start_standard_services)
--> 954       yield sess
    955     except Exception as e:

<ipython-input-3-aada1ae82d48> in <module>()
    128             for _ in tqdm(range(g.num_batch), total=g.num_batch, ncols=70, leave=False, unit='b'):
--> 129                 gs, _ = sess.run([g.global_step, g.train_op])
    130 

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 

OutOfRangeError: FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]

Caused by op 'bucket_by_sequence_length/bucket/dequeue_top', defined at:
  File "c:\users\home\appdata\local\programs\python\python35\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\home\appdata\local\programs\python\python35\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-aada1ae82d48>", line 122, in <module>
    g = Graph(num=num); print("Training Graph loaded")
  File "<ipython-input-3-aada1ae82d48>", line 21, in __init__
    self.L, self.mels, self.mags, self.fnames, self.num_batch = get_batch()
  File "C:\Users\Home\AppData\Local\Programs\Python\Python35\Scripts\examples\dc_tts-master\data_load.py", line 121, in get_batch
    dynamic_pad=True)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\contrib\training\python\training\bucket_ops.py", line 414, in bucket_by_sequence_length
    shared_name=shared_name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\contrib\training\python\training\bucket_ops.py", line 288, in bucket
    dequeued = top_queue.dequeue(name="dequeue_top")
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 421, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 2602, in _queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]


During handling of the above exception, another exception occurred:

UnknownError                              Traceback (most recent call last)
<ipython-input-3-aada1ae82d48> in <module>()
    139 
    140                 # break
--> 141                 if gs > hp.num_iterations: break
    142 
    143     print("Done")

c:\users\home\appdata\local\programs\python\python35\lib\contextlib.py in __exit__(self, type, value, traceback)
     75                 value = type()
     76             try:
---> 77                 self.gen.throw(type, value, traceback)
     78                 raise RuntimeError("generator didn't stop after throw()")
     79             except StopIteration as exc:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
    962         # threads which are not checking for `should_stop()`.  They
    963         # will be stopped when we close the session further down.
--> 964         self.stop(close_summary_writer=close_summary_writer)
    965       finally:
    966         # Close the session to finish up all pending calls.  We do not care

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in stop(self, threads, close_summary_writer)
    790       # reported.
    791       self._coord.join(threads,
--> 792                        stop_grace_period_secs=self._stop_grace_secs)
    793     finally:
    794       # Close the writer last, in case one of the running threads was using it.

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387       self._registered_threads = set()
    388       if self._exc_info_to_raise:
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:
    391         if ignore_live_threads:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py in _run(self, sess, enqueue_op, coord)
    236           break
    237         try:
--> 238           enqueue_callable()
    239         except self._queue_closed_exception_types:  # pylint: disable=catching-non-exception
    240           # This exception indicates that a queue was closed.

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _single_operation_run()
   1229         with errors.raise_exception_on_not_ok_status() as status:
   1230           tf_session.TF_Run(self._session, None, {}, [],
-> 1231                             target_list_as_strings, status, None)
   1232       return _single_operation_run
   1233     elif isinstance(fetches, ops.Tensor):

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    471             None, None,
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive
    475     # as there is a reference to status from this from the traceback due to

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]

How to uncompress the pretrained model?

Hello,

I downloaded the pre-trained model (LJ_logdir.tar) from your Dropbox and tried to uncompress it with WinZip on Windows; it gave me the error "Could not read the header". I tried tar -xvf on Linux, and again I got errors.

What is the format of this file, and how do I successfully untar it?

thanks,
Buvana

FileNotFoundError: [Errno 2] No such file or directory - Wavs folder not found

I'm trying to run prepo.py but I get this error. I do in fact have the .wav files in this directory, and I have changed the hyperparams to point to the correct folders. Am I getting something wrong here? Should I add some argument to python prepo.py?

Traceback (most recent call last):
  File "prepo.py", line 20, in <module>
    fname, mel, mag = load_spectrograms(fpath)
  File "C:\Users\dbarroso\Development Projects\Morti-OS Suite\Morti-OS-Suite\TTS\utils.py", line 152, in load_spectrograms
    mel, mag = get_spectrograms(fpath)
  File "C:\Users\dbarroso\Development Projects\Morti-OS Suite\Morti-OS-Suite\TTS\utils.py", line 32, in get_spectrograms
    y, sr = librosa.load(fpath, sr=hp.sr)
  File "C:\anaconda3\envs\morti_os\lib\site-packages\librosa\core\audio.py", line 112, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\anaconda3\envs\morti_os\lib\site-packages\audioread\__init__.py", line 80, in audio_open
    return rawread.RawAudioFile(path)
  File "C:\anaconda3\envs\morti_os\lib\site-packages\audioread\rawread.py", line 61, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\<USERNAME>\\TTS\\data\\private\\voice\\wheatly\\wavs\\\ufeffSM001-0001.wav'

NOTE:
in this section of the error:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\<USERNAME>\\TTS\\data\\private\\voice\\wheatly\\wavs\\\ufeffSM001-0001.wav'

\ufeff should not be there; the filename should just be SM001-0001.wav. At some point something is adding this to the path.
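One possible workaround, assuming the transcript was saved with a UTF-8 byte-order mark that leaks into the first parsed filename: read it with the 'utf-8-sig' codec, which strips the BOM (the per-line lstrip is a belt-and-braces fallback):

    import codecs

    with codecs.open("transcript.csv", "r", encoding="utf-8-sig") as f:
        for line in f:
            fname = line.split("|")[0].lstrip("\ufeff")
            print(fname)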

I can't get up and running the code

I am trying to run train.py but I get this error:

Traceback (most recent call last):
File "train.py", line 141, in <module>
g = Graph(num=num); print("Training Graph loaded")
File "train.py", line 40, in __init__
self.L, self.mels, self.mags, self.fnames, self.num_batch = get_batch()
File "/home/james/Jamie/dc_tts/data_load.py", line 99, in get_batch
fpath, text_length, text = tf.train.slice_input_producer([fpaths, text_lengths, texts], shuffle=True)
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'slice_input_producer'
I'm at most intermediate with Python and machine learning. Could anyone help me get the code up and running? I am using the LJSpeech dataset.
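This repo targets TF 1.x; the queue-based input pipeline (tf.train.slice_input_producer) was removed in TF 2.x. Installing TensorFlow 1.x is the most reliable fix; as an untested stopgap, the v1 compatibility shim may work:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
    # tf.train.slice_input_producer is available again under this alias.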

Invalid argument: TypeError: a bytes-like object is required, not 'str'

Python 3.5-3.6 is the only TensorFlow platform available on Windows now. If you try to run this code on any new installation, it will fail with this error:

Invalid argument: TypeError: a bytes-like object is required, not 'str'

Previous issues that mentioned this have been closed, but no fix was suggested. It is not a closed issue.

I am trying to fix the type mismatch, but simple encoding just causes other errors. Any ideas, so this code can remain usable for everyone?
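Under Python 3, tf.py_func hands the callback bytes rather than str, which is the usual cause here. A hedged sketch of the kind of guard that could be added in data_load.py (the helper name is hypothetical):

    def as_text(value):
        # tf.py_func passes numpy bytes objects under Python 3.
        if isinstance(value, bytes):
            return value.decode("utf-8")
        return value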

gpu out of memory

There is no issue during train 1, but I get an OOM error during train 2.
I am using tensorflow-gpu 1.4.0 on a 1080 Ti GPU. Has anyone had the same issue?
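SSRN works on full-resolution magnitude spectrograms, so training step 2 needs noticeably more memory than step 1. A common workaround is lowering the batch size in hyperparams.py (assumption: B is the batch-size attribute; the fine-tuning issue above suggests the default is 32, and 16 is an illustrative value):

    # hyperparams.py (excerpt)
    B = 16  # halve the batch size to shrink SSRN's memory footprint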

what are the memory requirements to run the model?

I can see that on synthesize.py my GTX 1080 runs out of memory, and my GTX 1070 Ti has enough memory to load the graph, but I can't make it through even a single iteration of the loop. What kind of systems are people using to successfully run synthesize.py or train.py?

Training with GPU vs Movidius

Hello,
I'm a noob, but I like to try...
I use an Nvidia 1050 GPU, but in training every loop takes ~39 min 👎 and 4329 loops are needed. Is that normal?

Is it easy to modify this code to be compatible with the Movidius NCS?

Thanks!

Change the generated wav

I use the TTS model, but the output wav is always read by the same person. I want to know how I can change the output frequency.
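The model clones the single training speaker, so changing the voice itself requires retraining on other data; the pitch of an already generated wav can, however, be shifted in post-processing. A hedged sketch (filenames are assumptions):

    import librosa
    import soundfile as sf

    y, sr = librosa.load("samples/1.wav", sr=None)
    y_up = librosa.effects.pitch_shift(y, sr, n_steps=4)  # raise by 4 semitones
    sf.write("samples/1_shifted.wav", y_up, sr)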

Horizontal Attention plot at synthesis

If, at synthesis time, you save and plot the attention computed with the model pretrained on LJ Speech, for example, it looks like this:

[attention plot image: alignment_3]

Why is it horizontal and not diagonal like during training? The synthesis works just fine, though...

If I comment out, in "networks.py", in the function "Attention", the part corresponding to "monotonic attention", like this:

    A = tf.matmul(Q, K, transpose_b=True) * tf.rsqrt(tf.to_float(hp.d))
    # if mononotic_attention:  # for inference
    #     key_masks = tf.sequence_mask(prev_max_attentions, hp.max_N)
    #     reverse_masks = tf.sequence_mask(hp.max_N - hp.attention_win_size - prev_max_attentions, hp.max_N)[:, ::-1]
    #     masks = tf.logical_or(key_masks, reverse_masks)
    #     masks = tf.tile(tf.expand_dims(masks, 1), [1, hp.max_T, 1])
    #     paddings = tf.ones_like(A) * (-2 ** 32 + 1)  # (B, T/r, N)
    #     A = tf.where(tf.equal(masks, False), A, paddings)
    A = tf.nn.softmax(A) # (B, T/r, N)
    max_attentions = tf.argmax(A, -1)  # (B, T/r)
    R = tf.matmul(A, V)
    R = tf.concat((R, Q), -1)

the attention plot becomes diagonal, and the synthesis is not too bad, but it has the problem mentioned in the paper: it may skip letters or pronounce parts of words several times.

Forcibly incremental attention matrix passing problem at synthesis

To synthesize the mel spectrogram at (audio) step t, the full (forcibly incremental) attention matrix is needed.
In synthesize.py, only the argmaxed values are passed, so the returned _Y changes at each step.
For this reason, the results are problematic.

For example,

Let _Y3 be the g.Y array obtained at step 3.
Let _Y4 be the g.Y array obtained at step 4.

Then _Y3[:,3,:] is not equal to _Y4[:,3,:], and _Y4[:,3,:] affects _Y4[:,4,:].

Pre-trained Nick models

Is there any way these could be shared? Without supplying the obviously copyrighted audio sources, of course.
