Arguably the largest public Russian TTS dataset to date:
- ~5 000 voices;
- ~13 000 hours;
- (new!) A new domain - public speech with ~3 000 hours;
- (new!) A new domain - radio with ~10 000 hours;
- Speaker labels for new domains are coming soon!
Prove us wrong! Open issues, collaborate, submit a PR, contribute, share your datasets! Let's make TTS/STT in Russian (and more) as open and available as CV models.
New train datasets added:
- 10,430 hours radio_v4;
- 2,709 hours public_speech;
- 154 hours radio_v4_add;
- 5% sample of all new datasets with annotation.
- Speaker labels are coming soon!
## **_Update 2019-06-28_**
`russian_young_male_1` added (~43 hours)
## **_Update 2019-05-24_**
It's alive!
Looking for collaborators!
Metadata file: coming soon!
Voice | Clips | Hours | GB | Comment | Links | Md5sum |
---|---|---|---|---|---|---|
5% of radio + public_speech | 469,797 | 665 | 66.7 | | mp3+txt, manifest file | 84397631475426f505babbb73b4197d9 |
radio | 7,603,192 | 10,430 | 1,195 | | mp3, txt, manifest file | 7c2273a5b8c3cc10df3754dbe9c783e1 |
public_speech | 1,700,060 | 2,709 | 301 | | mp3, txt, manifest file | d41f3f21d3cb9328de3cd6a530a70832 |
radio_add | 92,679 | 157 | 18 | | mp3, txt, manifest file | ae00489678836b92e3a65d2ee8b51960 |
russian_middle_aged_male_1 | 45,311 | 64 | 9.7 | Rnnoise | wav+txt | f1157d6dfd07c302c23cfe7dcb0298f5 |
russian_middle_aged_male_2 | 46,684 | 38 | 6.0 | Rnnoise | wav+txt | 059ab6b3e5fa77319f7bf20e594fc133 |
russian_young_male_1 (tts_2) | 118,536 | 43 | 4.9 | | wav+txt | 403c90662beb51ac9a39d64b879e0f1b |
total | 9,606,462 | 13,446 | 1,535 | | | |
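The Md5sum column can be used to check a finished download. A minimal sketch using only the Python standard library; the function names `md5sum` and `verify` are illustrative and are not part of the repository's scripts:

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the value from the table
    (comparison ignores case and surrounding whitespace)."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, `verify(local_path, "7c2273a5b8c3cc10df3754dbe9c783e1")` would check a downloaded `radio` archive, where `local_path` is wherever you saved it.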
- Download everything via `download.sh` or `download.py` with this config file. Please check the config first.
- Download each dataset separately:
Via wget: `wget https://ru-open-stt.ams3.digitaloceanspaces.com/some_file`

For multi-threaded downloads, use aria2 with the `-x` flag, e.g. `aria2c -c -x5 https://ru-open-stt.ams3.digitaloceanspaces.com/some_file`
- Download the meta data.
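If neither `wget` nor `aria2c` is available, a single file can also be fetched from Python. This is a rough standard-library sketch, not the repository's `download.py`; `file_url` and `download` are hypothetical helper names, and `some_file` remains a placeholder exactly as in the commands above:

```python
import shutil
import urllib.request

# Base URL taken from the wget / aria2c examples above.
BASE_URL = "https://ru-open-stt.ams3.digitaloceanspaces.com/"


def file_url(name, base_url=BASE_URL):
    """Join the bucket base URL and a file name with exactly one slash."""
    return base_url.rstrip("/") + "/" + name.lstrip("/")


def download(url, dest, chunk_size=1 << 20):
    """Stream the response body to `dest` in chunks and return `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

Unlike `wget -c` or `aria2c -c`, this sketch is single-threaded and does not resume interrupted transfers, so the command-line tools are likely the better choice for the larger archives.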
The dataset is compiled using open domain sources.
Then the dataset is cleaned using the best ASR engine we have at hand, and only items with a CER of less than `0.1` are kept.
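To make the filtering criterion concrete, here is a minimal sketch of a character error rate (CER) filter. It assumes CER is the character-level Levenshtein distance divided by the length of the reference text, and that the source annotation is the reference while the ASR output is the hypothesis; the authors' actual pipeline and text normalization are not described here, so treat those choices as assumptions:

```python
def edit_distance(reference, hypothesis):
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Edit distance normalized by reference length (assumed definition)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def keep_clean(items, max_cer=0.1):
    """Keep (annotation, asr_transcript) pairs whose CER is below max_cer."""
    return [(ref, hyp) for ref, hyp in items if cer(ref, hyp) < max_cer]
```

With this definition, a 20-character utterance survives the filter only if the ASR transcript differs from the annotation by at most one character.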
Then where applicable:
All files are normalized as follows:
- Converted to mono, if necessary;
- Converted to 22 kHz sampling rate, if necessary;
- Stored as 16-bit integers;
22 kHz was chosen as the optimal rate used in the literature, though in real applications rates as low as 8 kHz may suffice.
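A quick way to confirm that a `wav` file matches the format listed above is to read its header with the standard `wave` module. This is a sketch only: `wav_format` and `is_normalized` are illustrative names, and the default of 22050 Hz is an assumption about what "22 kHz" means exactly, so pass a different `sample_rate` if the files say otherwise:

```python
import wave


def wav_format(path):
    """Read channel count, sample rate and bit depth from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bits": wav.getsampwidth() * 8,
        }


def is_normalized(path, sample_rate=22050):
    """True if the file is mono, 16-bit, at the expected sample rate.

    22050 Hz is an assumed reading of "22 kHz".
    """
    expected = {"channels": 1, "sample_rate": sample_rate, "bits": 16}
    return wav_format(path) == expected
```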
All files are normalized for easier / faster runtime augmentations and processing as follows:
- Converted to mono, if necessary;
- Converted to 16 kHz sampling rate, if necessary;
- Stored as 32 kbps `mp3`.
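If you want to bring your own audio into the same shape (mono, 16 kHz, 32 kbps `mp3`), one option is `ffmpeg`. The sketch below only builds the command line; using `ffmpeg` at all is an assumption, since the tool the authors used is not stated, and `ffmpeg_mp3_command` is a hypothetical helper name:

```python
def ffmpeg_mp3_command(src, dst, sample_rate=16000, bitrate="32k"):
    """Build an ffmpeg argument list for mono, 16 kHz, 32 kbps output.

    Assumes ffmpeg is installed with an mp3 encoder and that `dst`
    ends in ".mp3" so ffmpeg picks the mp3 format from the extension.
    """
    return [
        "ffmpeg",
        "-i", src,                # input file
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", str(sample_rate),  # output sampling rate in Hz
        "-b:a", bitrate,          # target audio bitrate
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.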
Please contact us here or just create a GitHub issue!
Authors in alphabetical order:
- Anna Slizhikova;
- Alexander Veysov;
- Dmitry Voronin;
- Yuri Baburov;
Dual license: CC BY-NC, with commercial usage available after agreement with the dataset authors.
Donate (each coffee pays for several full downloads), contribute via open_collective, or use our DO referral link to help.