Coder Social home page Coder Social logo

krgauraw / indic-tts Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ai4bharat/indic-tts

0.0 0.0 0.0 581 KB

Text-to-Speech for languages of India

License: MIT License

Shell 0.62% JavaScript 0.46% Python 31.57% CSS 0.75% HTML 2.84% Jupyter Notebook 63.39% Dockerfile 0.37%

indic-tts's Introduction

AI4Bharat Indic-TTS

Towards Building Text-To-Speech Systems for the Next Billion Users

๐ŸŽ‰ Accepted at ICASSP 2023

Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.

TL;DR: We open-source SOTA Text-To-Speech models for 13 Indian languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil and Telugu.

PWC PWC PWC PWC PWC PWC PWC PWC PWC PWC PWC PWC

Authors: Gokul Karthik Kumar*, Praveen S V*, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

[ArXiv Preprint] [Audio Samples] [Try It Live] [Video]

Unified architecture of our TTS system

Results

Setup:

Environment Setup:

# 1. Create environment
sudo apt-get install libsndfile1-dev ffmpeg enchant
conda create -n tts-env
conda activate tts-env

# 2. Setup PyTorch
pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

# 3. Setup Trainer
git clone https://github.com/gokulkarthik/Trainer 

cd Trainer
pip3 install -e .[all]
cd ..
[or]
cp Trainer/trainer/logging/wandb_logger.py to the local Trainer installation # fixed wandb logger
cp Trainer/trainer/trainer.py to the local Trainer installation # fixed model.module.test_log and added code to log epoch 
add `gpus = [str(gpu) for gpu in gpus]` in line 53 of trainer/distribute.py

# 4. Setup TTS
git clone https://github.com/gokulkarthik/TTS 

cd TTS
pip3 install -e .[all]
cd ..
[or]
cp TTS/TTS/bin/synthesize.py to the local TTS installation # added multiple output support for TTS.bin.synthesis

# 5. Install other requirements
> pip3 install -r requirements.txt

Data Setup:

  1. Format IndicTTS dataset in LJSpeech format using preprocessing/FormatDatasets.ipynb
  2. Analyze IndicTTS dataset to check TTS suitability using preprocessing/AnalyzeDataset.ipynb

Training Steps:

  1. Set the configuration with main.py, vocoder.py, configs and run.sh. Make sure to update the CUDA_VISIBLE_DEVICES in all these files.
  2. Train and test by executing sh run.sh

Inference:

Trained model weight and config files can be downloaded at this link.

python3 -m TTS.bin.synthesize --text <TEXT> \
    --model_path <LANG>/fastpitch/best_model.pth \
    --config_path <LANG>/config.json \
    --vocoder_path <LANG>/hifigan/best_model.pth \
    --vocoder_config_path <LANG>/hifigan/config.json \
    --out_path <OUT_PATH>

Code Reference: https://github.com/coqui-ai/TTS `

indic-tts's People

Contributors

gokulnc avatar svp19 avatar ashwin-014 avatar vibhanshujainiiitr avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.