
dc_tts's Introduction

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

I implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just to replicate the paper. Rather, I'd like to gain insights into various sound projects.

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.3 (Note that the API of tf.contrib.layers.layer_norm has changed since 1.3)
  • librosa
  • tqdm
  • matplotlib
  • scipy

Data

I train English models and a Korean model on four different speech datasets.

1. LJ Speech Dataset
2. Nick Offerman's Audiobooks
3. Kate Winslet's Audiobook
4. KSS Dataset

LJ Speech Dataset has recently become a widely used benchmark for TTS because it is publicly available and contains 24 hours of reasonable-quality samples. Nick's and Kate's audiobooks are additionally used to see whether the model can learn from smaller, more variable speech data; they are 18 hours and 5 hours long, respectively. Finally, KSS Dataset is a Korean single-speaker speech dataset of more than 12 hours.

Training

  • STEP 0. Download LJ Speech Dataset or prepare your own data.
  • STEP 1. Adjust hyperparameters in hyperparams.py. (If you want to do preprocessing, set prepro to True; a short example follows below.)
  • STEP 2. Run python train.py 1 for training Text2Mel. (If you set prepro to True, run python prepro.py first.)
  • STEP 3. Run python train.py 2 for training SSRN.

You can do STEPs 2 and 3 at the same time if you have more than one GPU.
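For STEP 1, a minimal edit to hyperparams.py might look like the following sketch (prepro and data are attributes this README already refers to; the path is an assumption):

    # hyperparams.py (excerpt)
    prepro = True                  # precompute mel/mag spectrograms with prepro.py
    data = "/data/LJSpeech-1.1"    # assumed path to the unpacked dataset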

Training Curves

Attention Plot

Sample Synthesis

I generate speech samples based on the Harvard Sentences, as in the original paper. The sentence list is already included in the repo.

  • Run synthesize.py and check the files in samples.

Generated Samples

Dataset | Samples (at training steps)
LJ      | 50k, 200k, 310k, 800k
Nick    | 40k, 170k, 300k, 800k
Kate    | 40k, 160k, 300k, 800k
KSS     | 400k

Pretrained Model for LJ

Download this.

Notes

  • The paper didn't mention normalization, but I couldn't get the model to work without it, so I added layer normalization.
  • The paper fixed the learning rate at 0.001, but that didn't work for me, so I decayed it (see the sketch after this list).
  • I tried to train Text2Mel and SSRN simultaneously, but it didn't work. I guess separating the two networks eases the training burden.
  • The authors claimed that the model can be trained within a day; unfortunately, that luck was not mine. Still, it is obviously much faster than Tacotron, as it uses only convolution layers.
  • Thanks to the guided attention, the attention plot looks monotonic almost from the beginning. I guess this holds the alignment tight so it doesn't lose track.
  • The paper didn't mention dropout; I applied it, as I believe it helps with regularization.
  • Check also other TTS models such as Tacotron and Deep Voice 3.
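For the learning-rate note above, a minimal TF 1.x decay sketch looks like this (the decay interval and rate are illustrative assumptions, not necessarily the schedule used in this repo):

    import tensorflow as tf

    global_step = tf.Variable(0, trainable=False, name="global_step")
    # Start at 0.001 as in the paper, then halve every 100k steps.
    lr = tf.train.exponential_decay(0.001, global_step,
                                    decay_steps=100000, decay_rate=0.5,
                                    staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)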

dc_tts's People

Contributors

kyubyong, w19787

dc_tts's Issues

Vocabulary ( ) ! ; , - and eval sentences

The LJSpeech Dataset contains at least these additional characters: ( ) ! ; , -
Comma and dash are included in the original paper. Are these characters intentionally omitted?

Also, I think the eval sentences give very little insight into the capabilities of the network.
With this set we don't know whether:

  • Questions are pronounced correctly
  • Pauses after commas are reasonable
  • Context-based pronunciation is correct (e.g. from the tacotron2 sample: "He has read the whole thing.")

Spanish database used

Hello again,

I saw that you uploaded Spanish TTS samples to your SoundCloud:
https://soundcloud.com/kyubyong-park/sets/ms10_es_t

I would like to know which Spanish database (audio/transcription) you used to get these results, and with which TTS model you achieved them.

Can I get access to that database?
Could you upload the pre-trained model for Spanish?

Did you parse audiobooks, splitting them into sentences and making the transcriptions?

Thank you!

I struggled to get anything but errors for the longest time. Here's what finally let me run synthesize.py

Starting with the downloads listed on the main page:
LJ Dataset: https://keithito.com/LJ-Speech-Dataset/
This needs to be unpacked and put somewhere. Adjust 'data = ' in hyperparams.py to point to it.
----> Change the name of the .csv file in that folder to 'transcript.csv'. I can't remember now what it was called, but everything will fail unless it's named transcript.csv.

Pretrained models for LJ : https://www.dropbox.com/s/1oyipstjxh2n5wo/LJ_logdir.tar?dl=0
Make a directory named 'logdir' in dc_tts and untar so that both the LJ01-1 and LJ01-2 folders end up in there. Again, everything bombs unless they're located there.

Finally, mkdir samples in dc_tts, or it'll complain about having nowhere to write.

If you get obscure errors like "TypeError: __new__() got an unexpected keyword argument 'file'", you're in luck, because I know what to do: your protobuf Python library is out of date. It turned out a rogue older version was hiding out in ~/.local/lib/python2.7/site-packages/. Pip didn't seem to mind or notice when I tried a million varieties of uninstalling/reinstalling. Finally, I rm -rf'ed that directory and then did sudo -H pip install --upgrade protobuf. You need to make sure your version is greater than 3.something-or-other, or again, no dice.

... python synthesize.py now runs to completion. It produces 20 files, which I assume are meant to match 'harvard_sentences.txt', all garbled static or silence only, but it did complete. That's a start.

Training with your own voice samples

I am looking to train using my own voice as a sample and was wondering if anyone else has done this yet?

How many samples did you use? How long did you train for?

how to test pretrained model on sound data?

Hello,
I am looking into real-time speech recognition. Can I use the pretrained model to do that, and how?
I would appreciate it if anyone could give me tips on how to start.

Thanks,

Bucket Boundaries error

Hello,

I've got an issue when trying to run the training script with my input wav file.

The error is:
ValueError: bucket_boundaries must not be empty

I think it might be because of the length of the input files (1 minute).

Could anyone point me in the right direction on how to edit the hyperparams so that I can use a file of that length?

Is there any pointer/explanation besides the linked paper that explains the different parameters in hyperparams?

Thank you for your help.

Non-Latin alphabet

Can it work with an extended Latin alphabet, e.g. Polish? Does it have to be basic Latin only?

How to do the fine-tuning training?

I used the seed model as a base for a different voice, but the output still sounds like LJ.
I think I'm missing some steps here. @Kyubyong said to adjust the hyperparameters, but I'm not sure exactly what to do beyond the obvious steps.

Here's what I did:
Since the batch size is 32, and the author claims to have augmented the model with a minute of voice data, I used 32 voice samples for my second voice.
I edited hyperparams.py to reflect the new data location and train.py to save the model after just one step. I also deleted the mel and mag folders just in case.
Then I ran prepro.py, train.py 1, and train.py 2. I then ran synthesize.py, and the output sounds like LJ.

Help?

How to run pre-trained models to generate voice e.g. Joe Rogan?

Hi,
Can you please provide instructions on how to use this repo and generate speech from input text? Sorry I'm just having a bit of trouble figuring out how to run it.

Also is there a pre-trained model of Joe Rogan? How would I go about building this? Thanks!

How to improve text input notation (question)

For example, for "blue" the dc_tts output speed is fast. What I want is to slow down some parts of a word, for example "bluuue" instead of "blue". But when I input "bluuue", the pronunciation of the output changes strangely and no longer sounds like "blue". How can I achieve that?

Strange TensorFlow error due to an argument

Hello,

First of all, thank you for your effort with the open-source TTS projects.
In summary, I'm trying to create a Spanish TTS using your open-source dc_tts.

To do that, I want to train on my laptop the net that you uploaded as the "pre-trained net", to find out whether I will be able to train my own network for Spanish (once I have the Spanish database ready), but I get a TensorFlow error about an argument. I have all the libraries installed correctly and the LJ database downloaded.

The only thing I did was download the repository, set the database paths (LJ) correctly, and run the train script with parameter "1".

C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: TypeError: a bytes-like object is required, not 'str'

Thank you!

Why is the 1st frame of the mel spectrogram set to 0?

Hello, in train.py, line 52:
self.S = tf.concat((tf.zeros_like(self.mels[:, :1, :]), self.mels[:, :-1, :]), 1)

the first frame of the mel spectrogram is set to 0. Could you please explain why?
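For context, this looks like the standard teacher-forcing shift for an autoregressive decoder: the input at frame t is the ground-truth frame t-1, and frame 0 has no predecessor, so a zero "go" frame is prepended. A minimal NumPy sketch of the same operation:

    import numpy as np

    # Toy batch of mel frames: (batch=1, time=4, n_mels=2)
    mels = np.arange(8, dtype=np.float32).reshape(1, 4, 2)

    # Prepend a zero "go" frame and drop the last frame, mirroring
    # the tf.concat line quoted above.
    go = np.zeros_like(mels[:, :1, :])
    S = np.concatenate((go, mels[:, :-1, :]), axis=1)
    # S[:, 0] is all zeros; S[:, t] equals mels[:, t-1] for t >= 1.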

Redundant operations

dc_tts/train.py

Line 131 in 8b38110

self.train_op = self.optimizer.apply_gradients(self.clipped, global_step=self.global_step)

I don't think you need to create a train operation for each element here (the model has quite a few parameters).

load pre-trained model from checkpoint?

I'm trying to use the pre-trained model as a seed for a new voice, but I don't see any code that loads the pre-trained model from the checkpoint. Has anyone gotten that to work?
Thank you
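A minimal TF 1.x sketch of such a restore, mirroring the saver1.restore call in synthesize.py (the logdir path is an assumption):

    import tensorflow as tf

    # Build the training graph first (e.g. g = Graph(num=1)), then:
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint("logdir/LJ01-1")  # assumed path
        saver.restore(sess, ckpt)
        # ...continue training on the new voice data from here.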

No .npy file found

I have installed all the libraries but it keeps giving me this:

Traceback (most recent call last):
File "train.py", line 160, in
if gs > hp.num_iterations: break
File "/usr/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 1014, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 839, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1279, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: exceptions.IOError: [Errno 2] No such file or directory: 'mels/LJ024-0008.npy'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 209, in call
ret = func(*args)
File "/home/alansmithans/dc_tts/data_load.py", line 113, in _load_spectrograms
return fname, np.load(mel), np.load(mag)
File "/home/alansmithans/.local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 422, in load
fid = open(os_fspath(file), "rb")
IOError: [Errno 2] No such file or directory: 'mels/LJ024-0008.npy'

What is minimum train dataset?

Hi, nice work. I really liked the generated Kate sample. I was wondering what the minimum training dataset size is, since Kate's audiobook was 5 hours and the other datasets were bigger?

Thanks

What is the shape of the input node?

Hi, I am trying to port this model to OpenVINO IR, but I need to know a few details about the model:

  1. Input and output nodes
  2. Shape of the input node

Thanks!
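One way to find the node names and shapes is to load the checkpoint's meta graph and list its placeholders; a hedged TF 1.x sketch (the .meta filename is an assumption):

    import tensorflow as tf

    saver = tf.train.import_meta_graph("logdir/LJ01-1/model.meta")  # assumed path
    graph = tf.get_default_graph()
    for op in graph.get_operations():
        if op.type == "Placeholder":
            print(op.name, [out.shape for out in op.outputs])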

No sound in generated samples

So far I've been training for about 3 hours and the generated samples contain nothing but silence. I was wondering what could be causing this.

Usage guide / tutorial ?

Hi,

I've had okay results with Keithito's Tacotron implementation, but I wanted to try this too (and your Tacotron 2 implementation as well; I suppose the question fits both equally).

Could you give a short guide on how to run this model from scratch? My own dataset is modeled after Keithito's LJSpeech dataset, so anything that works with that should work on my data.

On top of that, could you share your weights/checkpoints? I've noticed that, even though my dataset is in Dutch, it worked after only a few hundred iterations on top of Keithito's English Tacotron weights. Mainly the alignment was hard and slow to train from scratch, I believe.

Thanks!

How to get the loss of a model?

Hi,

Sorry, I'm new to TensorFlow. How can I output a graph of the loss over time?

Is it possible to also get the best loss, in order to compare two different models after n iterations? I want to train two TTS models on the same dataset and compare them.

Thanks!
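For what it's worth, the usual TF 1.x pattern for this is a scalar summary plus TensorBoard; a minimal, self-contained sketch (the constant stands in for the model's real loss tensor):

    import tensorflow as tf

    loss = tf.constant(0.5)  # stand-in for the real loss tensor
    loss_summary = tf.summary.scalar("loss", loss)
    writer = tf.summary.FileWriter("logdir")
    with tf.Session() as sess:
        for step in range(3):
            writer.add_summary(sess.run(loss_summary), global_step=step)
    # View the curves with: tensorboard --logdir logdir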

[SOLVED] It is not training

This is pretty weird.

The attention plot is also blank:

[blank attention plot image: alignment_009k]

I restarted again, same issue 😕

The synthesised audio is blank. Each sentence produces an audio sample of 10 seconds of silence.

Can't synthesize audio

Traceback (most recent call last):
File ".\synthesize.py", line 67, in
synthesize()
File ".\synthesize.py", line 34, in synthesize
saver1.restore(sess, tf.train.latest_checkpoint(hp.logdir + "-1"))
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1264, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'

I get the error message below when I run train.py:

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]

Training Graph loaded
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
  0%|                                           | 0/42 [00:00<?, ?b/s]
INFO:tensorflow:gs/global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.UnknownError'>, AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]
                                                                      
---------------------------------------------------------------------------
OutOfRangeError                           Traceback (most recent call last)
c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

OutOfRangeError: FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]

During handling of the above exception, another exception occurred:

OutOfRangeError                           Traceback (most recent call last)
c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
    953           start_standard_services=start_standard_services)
--> 954       yield sess
    955     except Exception as e:

<ipython-input-3-aada1ae82d48> in <module>()
    128             for _ in tqdm(range(g.num_batch), total=g.num_batch, ncols=70, leave=False, unit='b'):
--> 129                 gs, _ = sess.run([g.global_step, g.train_op])
    130 

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 

OutOfRangeError: FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]

Caused by op 'bucket_by_sequence_length/bucket/dequeue_top', defined at:
  File "c:\users\home\appdata\local\programs\python\python35\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\home\appdata\local\programs\python\python35\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-aada1ae82d48>", line 122, in <module>
    g = Graph(num=num); print("Training Graph loaded")
  File "<ipython-input-3-aada1ae82d48>", line 21, in __init__
    self.L, self.mels, self.mags, self.fnames, self.num_batch = get_batch()
  File "C:\Users\Home\AppData\Local\Programs\Python\Python35\Scripts\examples\dc_tts-master\data_load.py", line 121, in get_batch
    dynamic_pad=True)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\contrib\training\python\training\bucket_ops.py", line 414, in bucket_by_sequence_length
    shared_name=shared_name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\contrib\training\python\training\bucket_ops.py", line 288, in bucket
    dequeued = top_queue.dequeue(name="dequeue_top")
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 421, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 2602, in _queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_0_bucket_by_sequence_length/bucket/top_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: bucket_by_sequence_length/bucket/dequeue_top = QueueDequeueV2[component_types=[DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bucket_by_sequence_length/bucket/top_queue)]]


During handling of the above exception, another exception occurred:

UnknownError                              Traceback (most recent call last)
<ipython-input-3-aada1ae82d48> in <module>()
    139 
    140                 # break
--> 141                 if gs > hp.num_iterations: break
    142 
    143     print("Done")

c:\users\home\appdata\local\programs\python\python35\lib\contextlib.py in __exit__(self, type, value, traceback)
     75                 value = type()
     76             try:
---> 77                 self.gen.throw(type, value, traceback)
     78                 raise RuntimeError("generator didn't stop after throw()")
     79             except StopIteration as exc:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
    962         # threads which are not checking for `should_stop()`.  They
    963         # will be stopped when we close the session further down.
--> 964         self.stop(close_summary_writer=close_summary_writer)
    965       finally:
    966         # Close the session to finish up all pending calls.  We do not care

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\supervisor.py in stop(self, threads, close_summary_writer)
    790       # reported.
    791       self._coord.join(threads,
--> 792                        stop_grace_period_secs=self._stop_grace_secs)
    793     finally:
    794       # Close the writer last, in case one of the running threads was using it.

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387       self._registered_threads = set()
    388       if self._exc_info_to_raise:
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:
    391         if ignore_live_threads:

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py in _run(self, sess, enqueue_op, coord)
    236           break
    237         try:
--> 238           enqueue_callable()
    239         except self._queue_closed_exception_types:  # pylint: disable=catching-non-exception
    240           # This exception indicates that a queue was closed.

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\client\session.py in _single_operation_run()
   1229         with errors.raise_exception_on_not_ok_status() as status:
   1230           tf_session.TF_Run(self._session, None, {}, [],
-> 1231                             target_list_as_strings, status, None)
   1232       return _single_operation_run
   1233     elif isinstance(fetches, ops.Tensor):

c:\users\home\appdata\local\programs\python\python35\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    471             None, None,
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive
    475     # as there is a reference to status from this from the traceback due to

UnknownError: AttributeError: 'numpy.ndarray' object has no attribute 'replace'
	 [[Node: PyFunc = PyFunc[Tin=[DT_STRING], Tout=[DT_STRING, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Gather)]]

How to uncompress the pretrained model?

Hello,

I downloaded the pre-trained model (LJ_logdir.tar) from your Dropbox and tried to uncompress it with WinZip on Windows; it gave me the error "Could not read the header". I tried tar -xvf on Linux, and again I got errors.

What is the format of this file, and how do I successfully untar it?

thanks,
Buvana

FileNotFoundError: [Errno 2] No such file or directory - Wavs folder not found

I'm trying to run prepo.py but I get this error. I do in fact have the .wav files in this directory, and I have changed the hyperparams to point to the correct folders. Am I getting something wrong here? Should I add some argument to python prepo.py?

Traceback (most recent call last):
  File "prepo.py", line 20, in <module>
    fname, mel, mag = load_spectrograms(fpath)
  File "C:\Users\dbarroso\Development Projects\Morti-OS Suite\Morti-OS-Suite\TTS\utils.py", line 152, in load_spectrograms
    mel, mag = get_spectrograms(fpath)
  File "C:\Users\dbarroso\Development Projects\Morti-OS Suite\Morti-OS-Suite\TTS\utils.py", line 32, in get_spectrograms
    y, sr = librosa.load(fpath, sr=hp.sr)
  File "C:\anaconda3\envs\morti_os\lib\site-packages\librosa\core\audio.py", line 112, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\anaconda3\envs\morti_os\lib\site-packages\audioread\__init__.py", line 80, in audio_open
    return rawread.RawAudioFile(path)
  File "C:\anaconda3\envs\morti_os\lib\site-packages\audioread\rawread.py", line 61, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\<USERNAME>\\TTS\\data\\private\\voice\\wheatly\\wavs\\\ufeffSM001-0001.wav'

NOTE:
in this section of the error:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\<USERNAME>\\TTS\\data\\private\\voice\\wheatly\\wavs\\\ufeffSM001-0001.wav'

\ufeff should not be there; the filename should just be SM001-0001.wav. At some point something is adding this to the path.
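One possible workaround, assuming the transcript was saved with a UTF-8 byte-order mark that leaks into the first parsed filename: read it with the 'utf-8-sig' codec, which strips the BOM (the per-line lstrip is a belt-and-braces fallback):

    import codecs

    with codecs.open("transcript.csv", "r", encoding="utf-8-sig") as f:
        for line in f:
            fname = line.split("|")[0].lstrip("\ufeff")
            print(fname)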

I can't get up and running the code

I am trying to run train.py but I get this error:

Traceback (most recent call last):
File "train.py", line 141, in <module>
g = Graph(num=num); print("Training Graph loaded")
File "train.py", line 40, in __init__
self.L, self.mels, self.mags, self.fnames, self.num_batch = get_batch()
File "/home/james/Jamie/dc_tts/data_load.py", line 99, in get_batch
fpath, text_length, text = tf.train.slice_input_producer([fpaths, text_lengths, texts], shuffle=True)
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'slice_input_producer'
I'm at most intermediate with Python and machine learning. Could anyone help me get the code up and running? I am using the LJSpeech dataset.
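This repo targets TF 1.x; the queue-based input pipeline (tf.train.slice_input_producer) was removed in TF 2.x. Installing TensorFlow 1.x is the most reliable fix; as an untested stopgap, the v1 compatibility shim may work:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
    # tf.train.slice_input_producer is available again under this alias.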

Invalid argument: TypeError: a bytes-like object is required, not 'str'

Python 3.5-3.6 is the only TensorFlow platform available on Windows now. If you try to run this code on any new installation, it will fail with this error:

Invalid argument: TypeError: a bytes-like object is required, not 'str'

Previous issues that mentioned this have been closed, but no fix was suggested. It is not a closed issue.

I am trying to fix the type mismatch, but simple encoding just causes other errors. Any ideas, so this code can remain usable for everyone?
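Under Python 3, tf.py_func hands the callback bytes rather than str, which is the usual cause here. A hedged sketch of the kind of guard that could be added in data_load.py (the helper name is hypothetical):

    def as_text(value):
        # tf.py_func passes numpy bytes objects under Python 3.
        if isinstance(value, bytes):
            return value.decode("utf-8")
        return value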

gpu out of memory

There is no issue during train 1, but I get an OOM error during train 2.
I am using tensorflow-gpu 1.4.0 on a 1080 Ti GPU. Has anyone had the same issue?
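SSRN works on full-resolution magnitude spectrograms, so training step 2 needs noticeably more memory than step 1. A common workaround is lowering the batch size in hyperparams.py (assumption: B is the batch-size attribute; the fine-tuning issue above suggests the default is 32, and 16 is an illustrative value):

    # hyperparams.py (excerpt)
    B = 16  # halve the batch size to shrink SSRN's memory footprint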

what are the memory requirements to run the model?

I can see that on synthesize.py my GTX 1080 runs out of memory, and my GTX 1070 Ti has enough memory to load the graph, but I can't make it through even a single iteration of the loop. What kind of systems are people using to successfully run synthesize.py or train.py?

Training with GPU vs Movidius

Hello,
I'm a noob, but I like to try...
I use an Nvidia 1050 GPU, but in training every loop takes ~39 min 👎 and 4329 loops are needed. Is that normal?

Is it easy to modify this code to be compatible with the Movidius NCS?

Thanks!

Change the generated wav

I use the TTS model, but the output wav is always read by the same person. I want to know how I can change the output frequency.
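The model clones the single training speaker, so changing the voice itself requires retraining on other data; the pitch of an already generated wav can, however, be shifted in post-processing. A hedged sketch (filenames are assumptions):

    import librosa
    import soundfile as sf

    y, sr = librosa.load("samples/1.wav", sr=None)
    y_up = librosa.effects.pitch_shift(y, sr, n_steps=4)  # raise by 4 semitones
    sf.write("samples/1_shifted.wav", y_up, sr)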

Horizontal Attention plot at synthesis

If, at synthesis time, you save and plot the attention computed with the model pretrained on LJ Speech, for example, it looks like this:

[attention plot image: alignment_3]

Why is it horizontal and not diagonal like during training? The synthesis works just fine, though...

If I comment out, in "networks.py", in the function "Attention", the part corresponding to "monotonic attention", like this:

    A = tf.matmul(Q, K, transpose_b=True) * tf.rsqrt(tf.to_float(hp.d))
    # if mononotic_attention:  # for inference
    #     key_masks = tf.sequence_mask(prev_max_attentions, hp.max_N)
    #     reverse_masks = tf.sequence_mask(hp.max_N - hp.attention_win_size - prev_max_attentions, hp.max_N)[:, ::-1]
    #     masks = tf.logical_or(key_masks, reverse_masks)
    #     masks = tf.tile(tf.expand_dims(masks, 1), [1, hp.max_T, 1])
    #     paddings = tf.ones_like(A) * (-2 ** 32 + 1)  # (B, T/r, N)
    #     A = tf.where(tf.equal(masks, False), A, paddings)
    A = tf.nn.softmax(A) # (B, T/r, N)
    max_attentions = tf.argmax(A, -1)  # (B, T/r)
    R = tf.matmul(A, V)
    R = tf.concat((R, Q), -1)

the attention plot becomes diagonal, and the synthesis is not too bad, but it has the problem mentioned in the paper: it may skip letters or pronounce parts of words several times.

Forcibly incremental attention matrix passing problem at synthesis

To synthesize the mel spectrogram at (audio) step t, the full (forcibly incremental) attention matrix is needed.
In synthesize.py, only the argmaxed values are passed, so the returned _Y changes at each step.
For this reason, the results are problematic.

For example,

Let _Y3 be the g.Y array obtained at step 3.
Let _Y4 be the g.Y array obtained at step 4.

Then _Y3[:,3,:] is not equal to _Y4[:,3,:], and _Y4[:,3,:] affects _Y4[:,4,:].

Pre-trained Nick models

Is there any way these could be shared? Without supplying the obviously copyrighted audio sources, of course.
