Russian Open Text To Speech (TTS) Dataset

Arguably the largest public Russian TTS dataset to date:

~5 000 voices;
~13 000 hours;
(new!) A new domain - public speech with ~3 000 hours;
(new!) A new domain - radio with ~10 000 hours;
Speaker labels for new domains are coming soon!

Prove us wrong! Open issues, collaborate, submit a PR, contribute, share your datasets! Let's make TTS/STT in Russian (and more) as open and available as CV models.

Updates

Update 2019-11-04

New training datasets added:

10,430 hours radio_v4;
2,709 hours public_speech;
154 hours radio_v4_add;
5% sample of all new datasets with annotation.
Speaker labels are coming soon!

Update 2019-06-28

`russian_young_male_1` added (~43 hours)

Update 2019-05-24

It's alive!
Looking for collaborators!

Downloads

Links

Metadata file: coming soon!

| Dataset | Clips | Hours | Size, GB | Comment | Links | Md5sum |
|---|---:|---:|---:|---|---|---|
| 5% of radio + public_speech | 469,797 | 665 | 66.7 | | mp3+txt, manifest file | `84397631475426f505babbb73b4197d9` |
| radio | 7,603,192 | 10,430 | 1,195 | | mp3, txt, manifest file | `7c2273a5b8c3cc10df3754dbe9c783e1` |
| public_speech | 1,700,060 | 2,709 | 301 | | mp3, txt, manifest file | `d41f3f21d3cb9328de3cd6a530a70832` |
| radio_add | 92,679 | 157 | 18 | | mp3, txt, manifest file | `ae00489678836b92e3a65d2ee8b51960` |
| russian_middle_aged_male_1 | 45,311 | 64 | 9.7 | RNNoise | wav+txt | `f1157d6dfd07c302c23cfe7dcb0298f5` |
| russian_middle_aged_male_2 | 46,684 | 38 | 6.0 | RNNoise | wav+txt | `059ab6b3e5fa77319f7bf20e594fc133` |
| russian_young_male_1 (tts_2) | 118,536 | 43 | 4.9 | | wav+txt | `403c90662beb51ac9a39d64b879e0f1b` |
| total | 9,606,462 | 13,446 | 1,535 | | | |

Download instructions

End to end

Use download.sh or download.py with this config file. Please check the config first.
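The config format used by download.sh / download.py is not described here. As a rough illustration of what an end-to-end download does, here is a minimal Python sketch that assumes a hypothetical config consisting of a plain list of file names on the ru-open-stt.ams3.digitaloceanspaces.com bucket used in the wget examples. `download_all` and its arguments are illustrative names, not the repository's API.

```python
import os
import urllib.request
from urllib.parse import urljoin

# Bucket taken from the wget examples; the file names passed in are hypothetical.
BASE_URL = "https://ru-open-stt.ams3.digitaloceanspaces.com/"


def download_all(file_names, dest_dir, base_url=BASE_URL):
    """Download each listed file into dest_dir, skipping files already present.

    Hypothetical helper for illustration, not the repository's download.py.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for name in file_names:
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # completed on an earlier run
        partial = target + ".part"
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a finished file on the next run.
        urllib.request.urlretrieve(urljoin(base_url, name), partial)
        os.replace(partial, target)
        fetched.append(target)
    return fetched
```

Rerunning the function skips files that already finished, but it restarts an interrupted file from scratch; resuming inside a file is what `aria2c -c` is for.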

Manually

Download each dataset separately:

Via wget

wget https://ru-open-stt.ams3.digitaloceanspaces.com/some_file

For multi-threaded downloads, use aria2 with the -x flag, e.g.:

aria2c -c -x5 https://ru-open-stt.ams3.digitaloceanspaces.com/some_file

Download the metadata.
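Each file in the table above comes with an Md5sum. On Linux, `md5sum some_file` prints the digest directly. A portable equivalent, sketched here with Python's standard hashlib module, reads the file in chunks so that multi-gigabyte archives do not have to fit in memory. The function names are illustrative.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's digest with the checksum listed in the table."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, `verify("some_file", "<md5 from the table>")` returns False for a truncated or corrupted download, in which case the file should be fetched again.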

Data collection / denoising / normalization methodology

The dataset is compiled using open domain sources.

Russian_middle_aged/young_male

The dataset is then cleaned using the best ASR engine we have at hand, and only items with a character error rate (CER) below 0.1 are kept.
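The filtering rule can be illustrated as follows. This is a minimal sketch, not the authors' pipeline: it assumes the reference text and the ASR transcript have already been normalized the same way (case, punctuation), and it uses the standard definition of CER as the character-level Levenshtein distance divided by the reference length. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        # Convention assumed here: empty reference is perfect only if hypothesis is empty.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def filter_by_cer(items, threshold=0.1):
    """Keep (text, asr_transcript) pairs whose CER is below the threshold."""
    return [(ref, hyp) for ref, hyp in items if cer(ref, hyp) < threshold]
```

For instance, "привет мир" transcribed as "превед мир" has two substitutions over ten characters, a CER of 0.2, so that item would be dropped.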

Then, where applicable:

Spectral gating / de-noising is applied;
RNNoise is applied;

All files are normalized as follows:

Converted to mono, if necessary;
Converted to 22 kHz sampling rate, if necessary;
Stored as 16-bit integers;
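The three normalization steps can be sketched with the Python standard library alone. Assumptions, not taken from the authors' code: "22 kHz" is read as 22,050 Hz, input samples are floats in [-1, 1] given as per-frame channel tuples, and resampling uses plain linear interpolation with no anti-aliasing filter. A production pipeline would use a proper resampler such as sox or ffmpeg.

```python
import array
import sys
import wave

TARGET_RATE = 22050  # assumption: "22 kHz" read as 22,050 Hz


def to_mono(frames):
    """Average channels; frames is a sequence of per-frame channel tuples."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # position of output sample k in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def to_int16(samples):
    """Scale floats in [-1, 1] to signed 16-bit integers, clipping overflow."""
    return array.array(
        "h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )


def write_wav(path, pcm, rate=TARGET_RATE):
    """Write mono 16-bit PCM samples to a WAV file."""
    data = array.array("h", pcm)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(data.tobytes())
```

A stereo 44.1 kHz clip would go through `to_mono`, then `resample_linear(..., 44100)`, then `to_int16` and `write_wav`.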

22 kHz was chosen because it is a common rate in the TTS literature, though in real applications rates as low as 8 kHz may suffice.

Radio/Public Speech

To make runtime augmentation and processing easier and faster, all files are normalized as follows:

Converted to mono, if necessary;
Converted to 16 kHz sampling rate, if necessary;
Stored as 32 kbps mp3;
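These steps map naturally onto a single ffmpeg invocation. The sketch below only builds the argument list (`-ac` sets the channel count, `-ar` the sampling rate, `-b:a` the audio bitrate); it assumes an ffmpeg build that includes the libmp3lame encoder and is not necessarily how the authors converted the files. The function names are hypothetical.

```python
import subprocess


def ffmpeg_mp3_command(src, dst, rate=16000, bitrate="32k"):
    """Build an ffmpeg command: downmix to mono, resample, encode as MP3."""
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",                  # one audio channel (mono)
        "-ar", str(rate),            # output sampling rate in Hz
        "-codec:a", "libmp3lame",    # MP3 encoder (must be compiled into ffmpeg)
        "-b:a", bitrate,             # audio bitrate, 32 kbps by default
        dst,
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_mp3_command(src, dst), check=True)
```

Building the list separately from running it makes the exact command easy to log or inspect before converting thousands of files.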

Contacts

Please contact us here or just create a GitHub issue!

Authors in alphabetic order:

Anna Slizhikova;
Alexander Veysov;
Dmitry Voronin;
Yuri Baburov;

License

Dual license: CC BY-NC, with commercial usage available after agreement with the dataset authors.

Donations

Donate (each coffee pays for several full downloads), donate via open_collective, or use our DO referral link to help.
