arabic-tacotron-tts's Introduction

Arabic Tacotron TTS

An implementation of Tacotron speech synthesis in TensorFlow for Arabic.

Audio Samples

Listen to audio samples from models trained with this repo on Nawar Halabi's speech corpus.

Background

In April 2017, Google published a paper, Tacotron: Towards End-to-End Speech Synthesis, where they present a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs. However, they didn't release their source code or training data. This is an attempt to provide an open-source implementation of the model described in their paper.

This implementation closely follows Keith Ito's (keithito) implementation. The differences are listed under "Changes from the original repo".

To learn more about this project, see the article Arabic-Tacotron-TTS.

Quick Start

Installing dependencies

  1. Install Python 3. Use version 3.5 rather than a newer Python version, for TensorFlow support. You could use Anaconda to create a new environment:

    1. conda create -n myenv python=3.5
    2. activate myenv
  2. Install requirements: pip install -r requirements.txt

  3. Install tensorflow

    • pip install tensorflow or pip install tensorflow-gpu
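After installing, it may help to confirm which TensorFlow release was picked up. The sketch below rests on one assumption drawn from the code itself: hparams.py calls tf.contrib.training.HParams, and tf.contrib was removed in TensorFlow 2.x, so a 1.x release is needed. The function names are illustrative and not part of this repo.

```python
def tf_major_version(version):
    """Return the major version number from a string such as '1.15.0'."""
    return int(version.split(".")[0])


def has_tf_contrib(version):
    """tf.contrib ships only with TensorFlow releases older than 2.0."""
    return tf_major_version(version) < 2


def describe_install():
    """Return a one-line summary of the installed TensorFlow (hypothetical helper)."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed"
    if has_tf_contrib(tf.__version__):
        return "TensorFlow {} (tf.contrib available)".format(tf.__version__)
    return "TensorFlow {} (tf.contrib was removed in 2.x)".format(tf.__version__)
```

Calling print(describe_install()) inside the activated environment shows which case applies.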

Using a pre-trained model

  1. Download and unpack the pretrained model

  2. Extract the model files into a folder in a destination of your choice

  3. Run the demo server: python demo_server.py --checkpoint .\{folder_in_a_destination_of_your_choice}\model.ckpt-200000

  4. Point your browser at localhost:9200

    • Type what you want to synthesize. Use only diacritised Arabic text.
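Note that --checkpoint takes a checkpoint prefix (for example model.ckpt-200000), not a single file. As a minimal sketch, assuming the download is in TensorFlow's usual checkpoint layout (a `<prefix>.index` file plus one or more `<prefix>.data-*` shards), the following helper reports which pieces are absent before you start the server. The function name is hypothetical.

```python
import glob
import os


def missing_checkpoint_files(prefix):
    """Return the expected checkpoint files that are absent for `prefix`.

    Assumption: the checkpoint consists of `<prefix>.index` and at least
    one `<prefix>.data-XXXXX-of-XXXXX` shard.
    """
    missing = []
    if not os.path.isfile(prefix + ".index"):
        missing.append(prefix + ".index")
    # glob.escape guards against special characters in the folder name.
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        missing.append(prefix + ".data-*")
    return missing
```

An empty list means both parts were found. This only checks that the files exist, not that they are intact.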

Training

Note: you need 40GB (more or less) of free disk space to train a model.

  1. Download a speech dataset. Nawar Halabi's Arabic Speech Corpus is supported out of the box.

  2. Preprocess

    • Unpack the dataset
    • Add a folder called nawar_without_hag9 in ~/tacotron
      • Download temp_filtered.csv and add it there.
      • Add a folder called wavs there and put all the wav files in it. Your tree should look like this:
    tacotron
        |- nawar_without_hag9
            |- temp_filtered.csv
            |- wavs
    
    • Run python .\preprocess.py --dataset nawar
    • Set max_iters to 400 if it is not already set to that value
  3. Train

    • python .\train.py
  4. Monitor with TensorBoard (optional)

    • The trainer dumps audio and alignments every 1000 steps. You can find these in ~/tacotron/logs-tacotron.
    • To make sense of this data with TensorBoard, run: tensorboard --logdir ~/tacotron/logs-tacotron
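Before running preprocess.py, the folder layout from the Preprocess step can be checked with a short script. This is a sketch under the assumption that the layout is exactly the tree shown above (nawar_without_hag9 containing temp_filtered.csv and a wavs folder). The function name is hypothetical.

```python
import os


def check_dataset_layout(base_dir):
    """Return a list of problems found under <base_dir>/nawar_without_hag9."""
    root = os.path.join(base_dir, "nawar_without_hag9")
    csv_path = os.path.join(root, "temp_filtered.csv")
    wav_dir = os.path.join(root, "wavs")
    problems = []
    if not os.path.isfile(csv_path):
        problems.append("missing file: " + csv_path)
    if not os.path.isdir(wav_dir):
        problems.append("missing folder: " + wav_dir)
    elif not any(n.lower().endswith(".wav") for n in os.listdir(wav_dir)):
        problems.append("no .wav files in: " + wav_dir)
    return problems
```

For the layout above, the call would be check_dataset_layout(os.path.expanduser("~/tacotron")); an empty list means nothing is missing.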

Changes from the original repo

  1. Added Arabic speech corpus preprocessing code and created temp_filtered.csv
  2. Hosted Arabic trained model based on Nawar Halabi's Speech Corpus
  3. Updated hparams to work with the Arabic speech corpus
  4. Added an instructional explanation on how to reproduce
  5. Added Arabic specific tests
  6. Removed some of the unused code
  7. Updated symbols to match the Arabic phonetic language
  8. Adjusted data-feeder so that all input text is phonetised by arabic_pronounce

Areas of Improvement

  • Add cleaners
  • Add embedded diacritiser

Summary of important commands

  • python .\preprocess.py --dataset nawar
  • python .\train.py --restore_step 201000
  • python .\demo_server.py --checkpoint C:\Users\User\tacotron\logs-tacotron\model.ckpt-70000
  • python -m pytest

Notes

If you change the config, remember to delete the files in the training folder and then run preprocessing again.
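Clearing the training folder can be scripted. The sketch below is only an illustration: the helper name is hypothetical, and you need to pass the path of your own training folder, which is not spelled out here. It deletes everything inside the folder, so double-check the path before running it.

```python
import os
import shutil


def clear_directory(path):
    """Delete everything inside `path`, keep the folder itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)  # real sub-folder: remove recursively
        else:
            os.remove(full)      # regular file or symlink
        removed += 1
    return removed
```

After the folder is empty, run python .\preprocess.py --dataset nawar again.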

Thanks to

Suhail Kwailat, Dr. Taha Zerrouki, Dr. Motaz Saad, Dr. Nawar Halabi, Keith Ito, Dr. Basem Ahmed, and Leo Ma for their detailed feedback and recommendations.

arabic-tacotron-tts's People

Contributors

begeekmyfriend, candlewill, jyegerlehner, keithito, mgoldey, pawelkopec, r9y9, srstevenson, yoosif0

arabic-tacotron-tts's Issues

Sampling rate

Did you downsample the dataset before extracting Mel spectrum?

temp_filtered.csv

temp_filtered.csv contains 906 samples. Do you have the rest of the Arabic Speech Corpus data? The copy I downloaded has 1813 samples.
Thanks!

Hardware Requirements

Hello, I'm a beginner and trying to implement tacotron2, but I have a problem with using TensorFlow since from what I have understood I don't have a suitable GPU.
I'm using Macbook Pro 2012 that has the following specification:
Graphics: Intel® HD Graphics 4000 (IVB GT2)
Disk space: 512.1 GB SSD
Memory: 8 GB
Processor: Intel® Core™ i5-3210M CPU @ 2.50GHz × 4
I'm also using dual boot with OS X and Ubuntu

Do you have any recommendations on a suitable GPU, perhaps one provided in the cloud, or any other way around this issue?
Any help is much appreciated

is there a dll for linux

alsalam alikum
I would like to run it on an embedded device such as a Raspberry Pi, so I need a DLL or a simple, lightweight way to run it on devices with limited resources. Is it possible to integrate it into Festival or into the libraries supported by Speech Dispatcher? Please give me some feedback.

thank you in advance
regards,
Dr. Sherif Omran

Demo Server Error

I got this error when I invoked py demo_server.py, regardless of whether I included the checkpoint.

2021-12-31 02:11:16.182040: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-12-31 02:11:16.182221: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "C:\Users\user\desktop\arabic-tacotron-tts\demo_server.py", line 3, in <module>
    from hparams import hparams, hparams_debug_string
  File "C:\Users\user\desktop\arabic-tacotron-tts\hparams.py", line 5, in <module>
    hparams = tf.contrib.training.HParams(
AttributeError: module 'tensorflow' has no attribute 'contrib'

Do you know what could have gone wrong here?

paper for published result

Hi @youssefsharief, I am wondering whether you have published a paper with the results mentioned here.
Could I have your email address, please?

some suggestions regarding the arabic tts system

Hello developers. I am not a programmer, but I want to make some suggestions for improving the quality and supporting a wider variety of users.
First, I would ask you to change the vocoder to a better one such as HiFi-GAN or MelGAN, if it supports direct text-to-speech output without mel spectrograms. Second, I suggest "SAPI-fying" the project, that is, porting it so that it can be used by SAPI 5 (Speech Application Programming Interface) programs such as text readers, ebook readers, screen readers for blind people, and other assistive technologies for Windows. The TTS system should be responsive, with no lag before the speech starts or in the middle of it. I hope that we blind people can find a good, free, responsive Arabic TTS to read our texts without relying on paid voices made in the West. I hope you will consider my suggestions and that this project isn't dead. Thanks for your great help.

Weights

salaam 3laikm :)
great work on this project. can you please re-upload the weights ?

I am getting the following error:
Data loss: Unable to open table file .\weights\model.ckpt-200000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

No alignment for Urdu

Hi,
I have used this model to train on my Urdu speech dataset. It contains 10,000 .wav files totalling 15 hours of speech, with an average file length of 5.4 seconds. I used the default parameters and trained for 50,000 steps. I used transliterated labels and transliteration_cleaner. Here is the alignment:
[Image: step-50000-align]

Error while running the demo server

When I run the demo server I get this error, and I don't know how to fix it or what I should do.

Loading checkpoint: ./home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts/model.ckpt-20000
Traceback (most recent call last):
  File "demo_server.py", line 91, in <module>
    synthesizer.load(args.checkpoint)
  File "/home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts/synthesizer.py", line 26, in load
    saver.restore(self.session, checkpoint_path)
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1715, in restore
    if not checkpoint_exists(compat.as_text(save_path)):
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 2056, in checkpoint_exists
    if file_io.get_matching_files(pathname):
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/lib/io/file_io.py", line 342, in get_matching_files
    for single_filename in filename
  File "/home/khalyl/anaconda3/envs/tacotronenv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./home/khalyl/Desktop/work/tacotron/arabic-tacotron-tts; No such file or directory

How to do this step: Point your browser at localhost:9200

WARNING:tensorflow:From /home/arabic-tacotron-tts/models/tacotron.py:56: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
Initialized Tacotron model. Dimensions:
embedding: 256
prenet out: 128
encoder out: 256
decoder out (5 frames): 400
decoder out (1 frame): 80
postnet out: 256
linear out: 1025
Loading checkpoint: demo/model.ckpt-200000
WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:23: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-04-24 22:52:06.595957: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:24: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /home/arabic-tacotron-tts/synthesizer.py:25: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

Serving on port 9200
الْق.........................................................................
When I enter diacritized text, nothing happens.
What else do I need to do to complete the pre-trained model steps?

Time lag for execution on tensorflow cpu

This is more of a question than an issue. I installed arabic-tacotron on two machines, one with an i5 CPU and 4 GB of RAM, the other with an i7 and 8 GB of RAM. On those machines, arabic-tacotron took 125 seconds (i5) and 95 seconds (i7) to produce the sound.
keithito's English Tacotron produces sound after 35 seconds on the i7 machine.
I wonder if you could share how long arabic-tacotron takes to produce sound on the machines you installed it on, and what might be causing the big slowdown compared to keithito's Tacotron.
Also, are there any plans to port it to TensorFlow 2?

Python version support for TensorFlow

The 'Quick Start' section says "Use version 3.5 instead of newer python versions for TensorFlow support",
but the TensorFlow website says "TensorFlow is tested and supported on the following 64-bit systems: Python 3.7–3.9".
So, can I ignore the advice to use Python 3.5 instead of a newer version?

Demo without server

Hey,

I was wondering how I can demo the model without a server. For example, the input would be a text file containing all the sentences I would like to synthesize, and the output would be a set of wav files, one per sentence.
