In this project, we implement several state-of-the-art Text-to-Speech (TTS) architectures for Gronings, a Low Saxon language spoken in the province of Groningen and in the adjacent border areas of Drenthe and Friesland in the Netherlands.
To build the TTS systems for Gronings, ESPnet2, an end-to-end speech processing toolkit, has been used. The setup configuration and installation steps, which follow the original ESPnet documentation, are described below.
For the experiments, the following setup has been used. This exact configuration is not required; however, compatibility between the different versions must be ensured (a quick check is shown after the list).
- Ubuntu 20.04 LTS
- Python 3.8.12
- CUDA version 11.1 (run nvcc -V to check it)
- CUDA Driver version 470.103.01 (run nvidia-smi to check it)
- CUDA version 11.4 as reported by nvidia-smi (the highest CUDA version supported by the driver)
- PyTorch 1.10.1+cu111
- cmake
- sox
- sndfile
- ffmpeg
- flac
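Once PyTorch has been installed (see the ESPnet installation steps below), the version compatibility can be verified with a one-liner; the first two values should match the versions listed above (1.10.1+cu111 and 11.1), and the last should be True if the driver is set up correctly.
$ python3 -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'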
The following command installs the system packages listed above (cmake, sox, sndfile, ffmpeg, and flac).
$ sudo apt-get install cmake sox libsndfile1-dev ffmpeg flac
- Git clone the ESPnet repo
$ cd <any-place>
$ git clone https://github.com/espnet/espnet
- Setup Anaconda Environment
You have to create <espnet-root>/tools/activate_python.sh to specify the Python interpreter used in ESPnet recipes. To do so:
$ cd <espnet-root>/tools
$ ./setup_anaconda.sh [output-dir-name|default=venv] [conda-env-name|default=root] [python-version|default=none]
# e.g.
$ ./setup_anaconda.sh anaconda espnet 3.8
- Install ESPnet
The Makefile installs ESPnet and all of its dependencies, including PyTorch. You can specify the PyTorch version (it must be compatible with your CUDA version), for example:
$ cd <espnet-root>/tools
$ make TH_VERSION=1.10.1+cu111
Note that the CUDA version is derived from the nvcc command. Alternatively, you can also specify the CUDA version explicitly:
$ cd <espnet-root>/tools
$ make TH_VERSION=1.10.1+cu111 CUDA_VERSION=11.1
Note that not all of the packages need to be installed for TTS development. You can check which packages were installed correctly with:
$ cd <espnet-root>/tools
$ . ./activate_python.sh; python3 check_install.py
The following architectures and neural vocoders have been implemented for Gronings:
- Architecture
  - Tacotron 2
  - FastSpeech 2
  - Conformer FastSpeech 2
- Neural Vocoder
  - Parallel WaveGAN
  - HiFi-GAN
FastSpeech 2 has been implemented in two ways:
- Using Tacotron 2 as the Teacher Forced Aligner
- Using the Montreal Forced Aligner (MFA) to get the alignments
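The architectures above are trained with ESPnet2-style recipes. As a rough, generic sketch of how an ESPnet2 TTS recipe is typically invoked (the Gronings recipe directory name below is a placeholder, not the actual path in this repository):
$ cd <espnet-root>/egs2/<gronings-recipe>/tts1
$ ./run.sh --ngpu 1
Running run.sh without stage options goes through all stages; individual stages (data preparation, feature extraction, statistics collection, training, and decoding) can typically be selected with the --stage and --stop_stage options.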
The full procedure for training the architectures and vocoders can be found in recipe and neural vocoder.
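For the variant that uses Tacotron 2 as the Teacher Forced Aligner, ESPnet2 typically proceeds in two steps: the training data are first decoded with the trained Tacotron 2 model in teacher-forcing mode to obtain durations, and FastSpeech 2 is then trained on those durations. A rough sketch follows; the option names are taken from the generic tts1 template and may differ between ESPnet versions, and the set names, config, and paths are placeholders.
# 1. obtain durations by decoding with teacher forcing using the trained Tacotron 2 model
$ ./run.sh --stage 7 --inference_args "--use_teacher_forcing true" --test_sets "<train-set> <dev-set> <eval-set>"
# 2. train FastSpeech 2 on the teacher-forced outputs
$ ./run.sh --stage 6 --train_config conf/tuning/train_fastspeech2.yaml --teacher_dumpdir <teacher-forced-decoding-output-dir>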
Results
You can listen to the generated samples here.
Dataset | Architecture | Vocoder | Mean Opinion Score (MOS) |
---|---|---|---|
Gronings | Ground Truth | - | - |
Gronings | Tacotron 2 | Parallel WaveGAN | - |
Gronings | FastSpeech 2 | Parallel WaveGAN | - |
Gronings | Conformer FastSpeech 2 | Parallel WaveGAN | - |
Gronings | Tacotron 2 | HiFi-GAN | - |
Gronings | FastSpeech 2 | HiFi-GAN | - |
Gronings | Conformer FastSpeech 2 | HiFi-GAN | - |
Online Demo
The real-time demo is available on HuggingFace!
FastSpeech 2 (using Tacotron 2 as the Teacher Forced Aligner) and a pre-trained Parallel WaveGAN vocoder are used in the demo. The vocoder is pre-trained on English data, since the current ESPnet+HuggingFace integration does not allow using a vocoder trained on custom data.
Pre-trained Models
The following models are trained on approximately 2 hours of Gronings speech data and are available on HuggingFace!
- FastSpeech 2 (using Tacotron 2 as the Teacher Forced Aligner)
- Tacotron 2
- Parallel WaveGAN vocoder
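For reference, a pre-trained ESPnet2 TTS model published on HuggingFace can be loaded for local inference via the espnet_model_zoo integration. The snippet below is only a sketch: the model tag and the input sentence are placeholders, not the actual repository names.
$ pip install espnet_model_zoo
$ python3 <<'EOF'
# Minimal inference sketch; replace the placeholder tag with the actual
# HuggingFace model repository name and the text with a Gronings sentence.
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

tts = Text2Speech.from_pretrained("<huggingface-model-tag>")
output = tts("<Gronings input text>")
sf.write("sample.wav", output["wav"].numpy(), tts.fs)
EOF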