gro-tts's Introduction

Speech Synthesis for Gronings

In this project, we implement several state-of-the-art Text-to-Speech (TTS) architectures for Gronings, a Low Saxon language spoken in the province of Groningen and around the Groningen border in Drenthe and Friesland in the Netherlands. [1]

To build TTS systems for Gronings, we used ESPnet2, a speech processing toolkit. The setup configuration and installation steps documented below follow the original ESPnet documentation.

Install ESPnet2 locally

Setup Configuration

The following setup was used for the experiments. You do not need this exact configuration, but the versions you choose must be compatible with each other.

  • Ubuntu 20.04 LTS
  • Python 3.8.12
  • CUDA toolkit version 11.1 (run nvcc -V to check it)
  • NVIDIA driver version 470.103.01 (run nvidia-smi to check it)
  • Highest CUDA version supported by the driver: 11.4 (shown as "CUDA Version" by nvidia-smi)
  • PyTorch 1.10.1+cu111
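
As a rough illustration of the compatibility check described above, the following sketch parses the version strings printed by nvcc -V and nvidia-smi and tests whether the CUDA toolkit is no newer than the CUDA version the driver supports. The function names are hypothetical, and the sample strings only imitate the format of real tool output using the versions listed above.

```python
import re
from typing import Tuple


def parse_nvcc_cuda_version(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) of the CUDA toolkit from `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_smi_cuda_version(smi_output: str) -> Tuple[int, int]:
    """Extract (major, minor) of the driver-supported CUDA version from `nvidia-smi` output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported_by_driver(nvcc_output: str, smi_output: str) -> bool:
    """Return True if the toolkit version is not newer than what the driver supports."""
    return parse_nvcc_cuda_version(nvcc_output) <= parse_smi_cuda_version(smi_output)


# Illustrative sample lines (assumed formatting), matching the setup listed above.
NVCC_SAMPLE = "Cuda compilation tools, release 11.1, V11.1.105"
SMI_SAMPLE = "| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |"

print(toolkit_supported_by_driver(NVCC_SAMPLE, SMI_SAMPLE))
```

In practice the two strings would come from running the commands themselves, for example with subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout.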

Step 1: Install the following packages

  • cmake
  • sox
  • sndfile
  • ffmpeg
  • flac

The following command will install all the above packages.

 $ sudo apt-get install cmake sox libsndfile1-dev ffmpeg flac
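
To confirm that the packages above are visible before building ESPnet, a small check along these lines may help. It is only a sketch: the function name is hypothetical, and the library lookup finds the libsndfile runtime library, not the development headers that libsndfile1-dev also installs.

```python
import ctypes.util
import shutil

# Command-line tools from the list above; sndfile is a library, so it is checked separately.
REQUIRED_TOOLS = ("cmake", "sox", "ffmpeg", "flac")
REQUIRED_LIBRARIES = ("sndfile",)


def find_missing_dependencies(tools=REQUIRED_TOOLS, libraries=REQUIRED_LIBRARIES):
    """Return the names of tools not on PATH and shared libraries that cannot be located."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [
        "lib" + lib for lib in libraries if ctypes.util.find_library(lib) is None
    ]
    return missing


missing = find_missing_dependencies()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All system dependencies found.")
```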

Step 2: Installation

  1. Git clone the ESPnet repo
$ cd <any-place>
$ git clone https://github.com/espnet/espnet
  2. Set up the Anaconda environment

ESPnet recipes use <espnet-root>/tools/activate_python.sh to select the Python interpreter, so you have to create this file. To do so:

$ cd <espnet-root>/tools
$ ./setup_anaconda.sh [output-dir-name|default=venv] [conda-env-name|default=root] [python-version|default=none]
# e.g.
$ ./setup_anaconda.sh anaconda espnet 3.8
  3. Install ESPnet

The Makefile tries to install ESPnet and all dependencies including PyTorch. You can specify the PyTorch version (must be compatible with your CUDA version), for example:

$ cd <espnet-root>/tools
$ make TH_VERSION=1.10.1+cu111

By default, the CUDA version is derived from the output of the nvcc command. Alternatively, you can specify the CUDA version explicitly:

$ cd <espnet-root>/tools
$ make TH_VERSION=1.10.1+cu111 CUDA_VERSION=11.1
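
The +cu111 suffix in TH_VERSION is the PyTorch build tag for CUDA 11.1, that is, the CUDA version with the dot removed. As a minimal sketch, the make invocations shown above could be assembled from the two version numbers as follows. The helper names are hypothetical and not part of ESPnet.

```python
from typing import List


def torch_local_version(torch_version: str, cuda_version: str) -> str:
    """Append the CUDA build tag to a PyTorch version, e.g. 1.10.1 and 11.1 -> 1.10.1+cu111."""
    return f"{torch_version}+cu{cuda_version.replace('.', '')}"


def espnet_make_command(torch_version: str, cuda_version: str, pin_cuda: bool = False) -> List[str]:
    """Build the argument list for the make call run inside <espnet-root>/tools."""
    command = ["make", f"TH_VERSION={torch_local_version(torch_version, cuda_version)}"]
    if pin_cuda:
        # Pass the CUDA version explicitly so it is not derived from nvcc.
        command.append(f"CUDA_VERSION={cuda_version}")
    return command


print(" ".join(espnet_make_command("1.10.1", "11.1", pin_cuda=True)))
# prints: make TH_VERSION=1.10.1+cu111 CUDA_VERSION=11.1
```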

Step 3: Check Installation

Note that not all packages need to be installed for TTS development.

$ cd <espnet-root>/tools
$ . ./activate_python.sh; python3 check_install.py
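
For a quicker look at only the packages that matter most for TTS work, a check like the one below can be run inside the activated environment. It is a sketch that complements check_install.py without replacing it. The module list (torch, espnet, espnet2) is an assumption about what is needed, and the function name is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable


def importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed minimal set of packages for TTS development with ESPnet2.
status = importable(["torch", "espnet", "espnet2"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```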

Text-to-Speech Systems

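The following architectures and neural vocoders have been implemented for Gronings:

  • Architectures: Tacotron 2, FastSpeech 2, Conformer FastSpeech 2
  • Neural vocoders: Parallel Wavegan, Hifi-gan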

FastSpeech 2 has been implemented in two ways.

  1. Using Tacotron 2 as the Teacher Forced Aligner
  2. Using Montreal Forced Aligner to get the alignments
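
FastSpeech 2 is trained with a duration, in acoustic frames, for every input phoneme, which is why an aligner is needed. For the second route, a forced aligner yields start and end times per phoneme, and these can be turned into frame counts roughly as sketched below. The sample rate, hop length, phoneme labels and timings are all hypothetical and are not taken from the Gronings recipe.

```python
from typing import List, Tuple

Interval = Tuple[str, float, float]  # (phoneme, start time in s, end time in s)


def intervals_to_durations(
    intervals: List[Interval], sample_rate: int = 22050, hop_length: int = 256
) -> List[Tuple[str, int]]:
    """Convert aligned phoneme intervals into per-phoneme frame counts.

    Boundaries are rounded to frame indices first, so the durations of
    contiguous intervals add up to the frame index of the final boundary.
    """
    frames_per_second = sample_rate / hop_length
    durations = []
    for phoneme, start, end in intervals:
        start_frame = int(round(start * frames_per_second))
        end_frame = int(round(end * frames_per_second))
        durations.append((phoneme, end_frame - start_frame))
    return durations


# Hypothetical alignment for a three-phoneme utterance.
alignment = [("h", 0.00, 0.08), ("a", 0.08, 0.25), ("i", 0.25, 0.40)]
for phoneme, frames in intervals_to_durations(alignment):
    print(phoneme, frames)
```

Rounding the boundaries instead of the individual lengths avoids accumulating rounding error, so the durations stay consistent with the number of frames in the target spectrogram.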

The training procedures for the architectures and the vocoders are described in recipe and neural vocoder, respectively.

Results, Online Demo and Pre-trained Models

Results

You can listen to the generated samples here.

| Dataset  | Architecture           | Vocoder          | Mean Opinion Score (MOS) |
|----------|------------------------|------------------|--------------------------|
| Gronings | Ground Truth           | -                | -                        |
| Gronings | Tacotron 2             | Parallel Wavegan | -                        |
| Gronings | FastSpeech 2           | Parallel Wavegan | -                        |
| Gronings | Conformer FastSpeech 2 | Parallel Wavegan | -                        |
| Gronings | Tacotron 2             | Hifi-gan         | -                        |
| Gronings | FastSpeech 2           | Hifi-gan         | -                        |
| Gronings | Conformer FastSpeech 2 | Hifi-gan         | -                        |

Online Demo

The real-time demo is available on HuggingFace!

The demo uses FastSpeech 2 (with Tacotron 2 as the Teacher Forced Aligner) and a pre-trained Parallel Wavegan vocoder. This vocoder is pre-trained on English data, since the current ESPnet+HuggingFace integration does not support vocoders trained on custom data.

Pre-trained Models

The following models were trained on approximately 2 hours of Gronings speech data and are available on HuggingFace!

References

Footnotes

  1. https://en.wikipedia.org/wiki/Gronings_dialect
