youssefsharief / arabic-tacotron-tts



End-to-end Arabic TTS system based on Tacotron

License: MIT License

Python 100.00%

arabic-tacotron-tts's Introduction

Arabic Tacotron TTS

An implementation of Tacotron speech synthesis in TensorFlow for Arabic.

Audio Samples

Check the audio samples from models trained with this repo on Nawar Halabi's speech corpus.

Background

In April 2017, Google published a paper, Tacotron: Towards End-to-End Speech Synthesis, where they present a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs. However, they didn't release their source code or training data. This is an attempt to provide an open-source implementation of the model described in their paper.

This implementation is largely the same as Keithito's implementation; the changes I made are listed under "Changes from the original repo" below.

See this article to learn more about the project: Arabic-Tacotron-TTS

Quick Start

Installing dependencies

1. Install Python 3. Use version 3.5 rather than a newer Python version, for TensorFlow support. You can use Anaconda to create a new environment:
   1. conda create -n myenv python=3.5
   2. activate myenv
2. Install the requirements: pip install -r requirements.txt
3. Install TensorFlow: pip install tensorflow or pip install tensorflow-gpu
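A minimal sketch of an environment check (a hypothetical check_env.py, not part of the repo). It assumes a TensorFlow 1.x release is needed, because the traceback in the "Demo Server Error" issue shows hparams.py calling tf.contrib.training.HParams, and tf.contrib was removed in TensorFlow 2.0. It avoids f-strings so it also runs on the recommended Python 3.5.

```python
import sys


def parse_version(version):
    """Turn a version string such as '1.15.0' into a tuple of ints, e.g. (1, 15, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_versions(python_version, tf_version):
    """Return warnings for a Python version tuple and a TensorFlow version string.

    tf_version is None when TensorFlow is not installed.
    """
    warnings = []
    if python_version[:2] != (3, 5):
        warnings.append(
            "README recommends Python 3.5, found {}.{}".format(*python_version[:2]))
    if tf_version is None:
        warnings.append("TensorFlow is not installed")
    elif parse_version(tf_version)[:1] >= (2,):
        # Assumption: hparams.py uses tf.contrib, which TensorFlow 2.x no longer has.
        warnings.append(
            "TensorFlow {} has no tf.contrib; a 1.x release is assumed to be needed".format(tf_version))
    return warnings


if __name__ == "__main__":
    try:
        import tensorflow as tf
        tf_version = tf.__version__
    except ImportError:
        tf_version = None
    for message in check_versions(tuple(sys.version_info), tf_version):
        print("WARNING:", message)
```

Run it inside the activated environment; no output means neither check fired.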

Using a pre-trained model

1. Download and unpack the pretrained model.
2. Extract the model files into a folder of your choice.
3. Run the demo server: python demo_server.py --checkpoint .\{folder_in_a_destination_of_your_choice}\model.ckpt-200000
4. Point your browser at localhost:9200.
5. Type what you want to synthesize. Use only diacritised Arabic text.
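Besides the browser, the running server can be queried from a script. This is a minimal sketch under a stated assumption: that demo_server.py keeps the GET /synthesize?text=... endpoint returning WAV audio from Keith Ito's tacotron demo server, which this README does not confirm. The script name fetch_wav.py and both function names are hypothetical.

```python
import sys
import urllib.parse
import urllib.request


def build_synthesis_url(text, host="localhost", port=9200):
    # urlencode percent-encodes the UTF-8 bytes of the Arabic letters and diacritics
    query = urllib.parse.urlencode({"text": text})
    return "http://{}:{}/synthesize?{}".format(host, port, query)


def synthesize_to_file(text, out_path, host="localhost", port=9200):
    """Request audio for text (assumed endpoint) and write the response bytes to out_path."""
    url = build_synthesis_url(text, host, port)
    with urllib.request.urlopen(url) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


if __name__ == "__main__":
    # Usage: python fetch_wav.py "<diacritised Arabic text>" out.wav
    print(synthesize_to_file(sys.argv[1], sys.argv[2]), "bytes written")
```

If the request fails with a 404, the endpoint in this fork differs from the assumed one.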

Training

Note: you need roughly 40 GB of free disk space to train a model.
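A small sketch for checking free space before starting, using the standard library's shutil.disk_usage. It assumes the data will live under your home directory (the training steps use ~/tacotron) and treats GB as 1024**3 bytes; the function name is hypothetical.

```python
import os
import shutil

REQUIRED_GB = 40  # rough figure from the note above


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that contains path."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb, free_gb


if __name__ == "__main__":
    target = os.path.expanduser("~")
    ok, free = has_enough_space(target)
    print("{:.1f} GB free under {}: {}".format(free, target, "OK" if ok else "probably not enough"))
```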

Download a speech dataset. The following are supported out of the box:
- Nawar Halabi's speech corpus
You can use other datasets if you convert them to the right format. See TRAINING_DATA.md for more info.
Preprocess
- Unpack the dataset.
- Add a folder called nawar_without_hag9 in ~/tacotron.
  - Download temp_filtered.csv and add it there.
  - Add a folder called wavs there, containing all the wav files. Your tree should look like this:
```
tacotron
    |- nawar_without_hag9
        |- temp_filtered.csv
        |- wavs
```
- Run python .\preprocess.py --dataset nawar
- Update max_iters to 400 if it is not already set to that value.
Train
- python .\train.py
Monitor with TensorBoard (optional)
- The trainer dumps audio and alignments every 1000 steps to ~/tacotron/logs-tacotron. You can monitor training with TensorBoard: tensorboard --logdir ~/tacotron/logs-tacotron
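Before running preprocess.py, the folder layout from the Preprocess step can be checked with a short script. This is a sketch, not part of the repo: it only checks that temp_filtered.csv and wavs/ exist (it does not parse the CSV, whose format is not documented here), then counts the WAV files and reports their sample rates. Python's wave module reads only uncompressed PCM WAV files.

```python
import os
import wave


def check_dataset(base_dir):
    """Check ~/tacotron/nawar_without_hag9 and summarise its WAV files.

    Returns {"wav_files": count, "sample_rates": set_of_rates}.
    Raises FileNotFoundError if temp_filtered.csv or wavs/ is missing.
    """
    dataset_dir = os.path.join(base_dir, "nawar_without_hag9")
    csv_path = os.path.join(dataset_dir, "temp_filtered.csv")
    wav_dir = os.path.join(dataset_dir, "wavs")
    for required in (csv_path, wav_dir):
        if not os.path.exists(required):
            raise FileNotFoundError("missing: " + required)
    sample_rates = set()
    count = 0
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(wav_dir, name), "rb") as wav_file:
            sample_rates.add(wav_file.getframerate())
        count += 1
    return {"wav_files": count, "sample_rates": sample_rates}


if __name__ == "__main__":
    print(check_dataset(os.path.expanduser("~/tacotron")))
```

More than one entry in sample_rates means the recordings are not at a uniform rate.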

Changes from the original repo

Added Arabic speech corpus preprocessing code and created temp_filtered.csv
Hosted an Arabic model trained on Nawar Halabi's Speech Corpus
Updated hparams to work with the Arabic speech corpus
Added instructions on how to reproduce the results
Added Arabic-specific tests
Removed some unused code
Updated symbols to match Arabic phonetic transcription
Adjusted the data feeder so that all input text is phonetised by arabic_pronounce

Areas of Improvement

Add cleaners
Add embedded diacritiser

Summary of important commands

python .\preprocess.py --dataset nawar
python .\train.py --restore_step 201000
python .\demo_server.py --checkpoint C:\Users\User\tacotron\logs-tacotron\model.ckpt-70000
python -m pytest
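The step numbers in the commands above come from checkpoint file names such as model.ckpt-70000. A sketch for finding the newest one, assuming TensorFlow's usual model.ckpt-N.index / .data-* / .meta naming (as in the paths shown here and in the "Weights" issue) and assuming --restore_step takes the N from model.ckpt-N. The steps are compared numerically, so 70000 beats 9000.

```python
import os
import re

CHECKPOINT_RE = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint(log_dir):
    """Return (prefix, step) for the newest model.ckpt-N in log_dir, or None."""
    best_step = None
    for name in os.listdir(log_dir):
        match = CHECKPOINT_RE.match(name)
        if match:
            step = int(match.group(1))
            if best_step is None or step > best_step:
                best_step = step
    if best_step is None:
        return None
    # TensorFlow restores from the prefix, i.e. without .index/.data-*/.meta
    return os.path.join(log_dir, "model.ckpt-{}".format(best_step)), best_step


if __name__ == "__main__":
    found = latest_checkpoint(os.path.expanduser("~/tacotron/logs-tacotron"))
    if found is None:
        print("no checkpoints found")
    else:
        prefix, step = found
        print("python train.py --restore_step {}".format(step))
        print("python demo_server.py --checkpoint {}".format(prefix))
```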

Notes

If you change the config, remember to delete the files in the training folder and then run preprocessing again.
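A sketch of that cleanup step. The location ~/tacotron/training is an assumption (it is where Keith Ito's preprocess.py writes by default); check where your preprocessing output actually lives before running this, since it deletes files. The folder itself is kept and only its contents are removed.

```python
import os
import shutil


def reset_training_folder(training_dir):
    """Delete everything inside training_dir; return the number of top-level entries removed."""
    if not os.path.isdir(training_dir):
        return 0
    removed = 0
    for name in os.listdir(training_dir):
        path = os.path.join(training_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        removed += 1
    return removed


if __name__ == "__main__":
    # Hypothetical default location, mirroring the ~/tacotron paths used above.
    target = os.path.expanduser("~/tacotron/training")
    print("removed", reset_training_folder(target), "entries from", target)
```

After this, run python .\preprocess.py --dataset nawar again.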

Thanks to

Suhail Kwailat, Dr. Taha Zerrouki, Dr. Motaz Saad, Dr. Nawar Halabi, Keith Ito, Dr. Basem Ahmed, and Leo Ma for their detailed feedback and recommendations.

arabic-tacotron-tts's Issues

Sampling rate

Did you downsample the dataset before extracting the mel spectrograms?

temp_filtered.csv

temp_filtered.csv contains 906 samples. Do you have the rest of the Arabic Speech Corpus data? The copy I downloaded has 1813 samples.
Thanks!

Hardware Requirements

Hello, I'm a beginner trying to implement Tacotron 2, but I have a problem using TensorFlow because, as far as I understand, I don't have a suitable GPU.
I'm using a 2012 MacBook Pro with the following specifications:
Graphics: Intel® HD Graphics 4000 (IVB GT2)
Disk space: 512.1 GB SSD
Memory: 8 GB
Processor: Intel® Core™ i5-3210M CPU @ 2.50GHz × 4
I'm also dual-booting OS X and Ubuntu.

Do you have any recommendations for a suitable GPU, or perhaps one provided in the cloud, or any other way around this issue?
Any help is much appreciated.

Create a simple cli for batch TTS

There is a pressing need for a batch TTS tool.

tacotron2

@youssefsharief

Is there a DLL for Linux?

alsalam alikum
I would like to run it on an embedded device such as a Raspberry Pi, so I need a DLL or a simple, lightweight way to run it on devices with limited resources. Is it possible to integrate it into Festival or into the libraries supported by Speech Dispatcher? Please give me some feedback.

Thank you in advance.
Regards,
Dr. Sherif Omran

temp_filtered.csv

I can't find the temp_filtered.csv file.
Can anyone help me?

Demo Server Error

I got this error when I ran py demo_server.py, regardless of whether I included the checkpoint.

2021-12-31 02:11:16.182040: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-12-31 02:11:16.182221: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "C:\Users\user\desktop\arabic-tacotron-tts\demo_server.py", line 3, in <module>
    from hparams import hparams, hparams_debug_string
  File "C:\Users\user\desktop\arabic-tacotron-tts\hparams.py", line 5, in <module>
    hparams = tf.contrib.training.HParams(
AttributeError: module 'tensorflow' has no attribute 'contrib'

Do you know what could have gone wrong here?

Paper for published results

Hi @youssefsharief, I am wondering whether you've published a paper with the mentioned results.
Could I have your email, please?

Some suggestions regarding the Arabic TTS system

Hello developers. I am not a programmer, but I want to make some suggestions for improving the quality and supporting a wider variety of users.
First, I request changing the vocoder to a better one, such as HiFi-GAN or MelGAN, if it supports direct text-to-speech output without mel spectrograms. Second, I suggest SAPI-fying the project, which means porting it so it can be used by SAPI 5 (Speech Application Programming Interface) programs such as text readers, e-book readers, screen readers for blind people, and other assistive technologies on Windows. The TTS system should be responsive, without lag before or during speech. I hope that we blind people can find a good, free, responsive Arabic TTS to read our texts, without relying on paid voices made in the West. I hope you consider my suggestions and that this project isn't dead. Thanks for your great help.

Weights

salaam 3laikm :)
Great work on this project. Can you please re-upload the weights?

I am getting the following error:
Data loss: Unable to open table file .\weights\model.ckpt-200000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

No alignment for Urdu

Hi,
I have used this model to train on my Urdu speech dataset. It contains 10,000 .wav files totalling 15 hours of speech; the average file is 5.4 seconds long. I used the default parameters and trained for 50,000 steps, with transliterated labels and transliteration_cleaner. Here is the alignment.

Error while running the demo server

When I run the demo server I get this error, and I don't know how to fix it or what I should do!

Loading checkpoint: ./home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts/model.ckpt-20000
Traceback (most recent call last):
  File "demo_server.py", line 91, in <module>
    synthesizer.load(args.checkpoint)
  File "/home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts/synthesizer.py", line 26, in load
    saver.restore(self.session, checkpoint_path)
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1715, in restore
    if not checkpoint_exists(compat.as_text(save_path)):
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 2056, in checkpoint_exists
    if file_io.get_matching_files(pathname):
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/lib/io/file_io.py", line 342, in get_matching_files
    for single_filename in filename
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts; No such file or directory

How to do this step: Point your browser at localhost:9200

WARNING:tensorflow:From /home/arabic-tacotron-tts/models/tacotron.py:56: MultiRNNCell.init (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
Initialized Tacotron model. Dimensions:
embedding: 256
prenet out: 128
encoder out: 256
decoder out (5 frames): 400
decoder out (1 frame): 80
postnet out: 256
linear out: 1025
Loading checkpoint: demo/model.ckpt-200000
WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:23: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-04-24 22:52:06.595957: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:24: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:25: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

Serving on port 9200
الْق.........................................................................
When I enter diacritized text, nothing happens.
What should I add to complete the steps for the pre-trained model?

Is there any dataset larger than this one? Did you try using this TTS for voice cloning in Arabic?

Time lag for execution on tensorflow cpu

This is more of a question than an issue. I installed arabic-tacotron on two machines, one with an i5 CPU and 4 GB of RAM, the other with an i7 and 8 GB of RAM. On those machines, arabic-tacotron took 125 seconds (i5) and 95 seconds (i7) to produce the sound.
Keithito's English Tacotron produces sound after 35 seconds on the i7 machine.
I wonder if you could share how long arabic-tacotron takes to produce sound on the machines you installed it on, and what might be causing the big slowdown compared to Keithito's Tacotron.
Also, are there any plans to port it to TensorFlow 2?

Python version support for TensorFlow

In the 'Quick Start' section, it says "Use version 3.5 instead of newer python versions for TensorFlow support",
but the TensorFlow website says "TensorFlow is tested and supported on the following 64-bit systems: Python 3.7–3.9".
So can I ignore the advice to use Python 3.5 instead of a newer version?

Demo without server

Hey,

I was wondering how I can demo the model without a server. For example, the input would be a txt file containing all the sentences I'd like to synthesize, and the output would be a set of wav files, one per sentence.
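A minimal sketch of what such a batch script (a hypothetical batch_synthesize.py) could look like. The demo server traceback above confirms that demo_server.py calls synthesizer.load(args.checkpoint); the class name Synthesizer and a synthesize(text) method returning WAV bytes are assumptions carried over from Keith Ito's tacotron and are not confirmed for this repo. The synthesis function is passed in, so the file handling can be tried without TensorFlow.

```python
import os
import sys


def read_sentences(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def synthesize_batch(sentences, synthesize, out_dir):
    """Call synthesize(text) -> bytes for each sentence and write 0001.wav, 0002.wav, ..."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, sentence in enumerate(sentences, start=1):
        path = os.path.join(out_dir, "{:04d}.wav".format(index))
        with open(path, "wb") as f:
            f.write(synthesize(sentence))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Usage: python batch_synthesize.py <checkpoint> <sentences.txt> <out_dir>
    from synthesizer import Synthesizer  # assumed class name in the repo's synthesizer.py

    synth = Synthesizer()
    synth.load(sys.argv[1])
    written = synthesize_batch(read_sentences(sys.argv[2]), synth.synthesize, sys.argv[3])
    print("wrote {} files".format(len(written)))
```

As in the web demo, each line should be diacritised Arabic text.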
