open_tts's Introduction

Russian Open Text To Speech (TTS) Dataset

Arguably the largest public Russian TTS dataset to date:

  • ~5 000 voices;
  • ~13 000 hours;
  • (new!) A new domain - public speech with ~3 000 hours;
  • (new!) A new domain - radio with ~10 000 hours;
  • Speaker labels for new domains are coming soon!

Prove us wrong! Open issues, collaborate, submit a PR, contribute, share your datasets! Let's make TTS/STT in Russian (and more) as open and available as CV models.

Updates

Update 2019-11-04

New train datasets added:

  • 10,430 hours radio_v4;
  • 2,709 hours public_speech;
  • 154 hours radio_v4_add;
  • 5% sample of all new datasets with annotation.
  • Speaker labels are coming soon!

Update 2019-06-28

russian_young_male_1 added (~43 hours)

Update 2019-05-24

It's alive!
Looking for collaborators)

Downloads

Links

Metadata file: coming soon!

| Voice | Clips | Hours | GB | Comment | Links | Md5sum |
|---|---|---|---|---|---|---|
| 5% of radio + public_speech | 469,797 | 665 | 66.7 | | mp3+txt, manifest file | 84397631475426f505babbb73b4197d9 |
| radio | 7,603,192 | 10,430 | 1,195 | | mp3, txt, manifest file | 7c2273a5b8c3cc10df3754dbe9c783e1 |
| public_speech | 1,700,060 | 2,709 | 301 | | mp3, txt, manifest file | d41f3f21d3cb9328de3cd6a530a70832 |
| radio_add | 92,679 | 157 | 18 | | mp3, txt, manifest file | ae00489678836b92e3a65d2ee8b51960 |
| russian_middle_aged_male_1 | 45,311 | 64 | 9.7 | Rnnoise | wav+txt | f1157d6dfd07c302c23cfe7dcb0298f5 |
| russian_middle_aged_male_2 | 46,684 | 38 | 6.0 | Rnnoise | wav+txt | 059ab6b3e5fa77319f7bf20e594fc133 |
| russian_young_male_1 (tts_2) | 118,536 | 43 | 4.9 | | wav+txt | 403c90662beb51ac9a39d64b879e0f1b |
| total | 9,606,462 | 13,446 | 1,535 | | | |
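
The Md5sum column can be used to check a finished download. Below is a minimal sketch, assuming the archive is already on disk; the helper names `md5_of_file` and `verify_download` are ours and are not part of the repository scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for multi-GB archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with the value listed in the table."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download(local_path, "ae00489678836b92e3a65d2ee8b51960")` should return `True` for an intact copy of radio_add, where `local_path` is wherever you saved the archive.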

Download instructions

End to end

Use download.sh or download.py with this config file. Please check the config first.

Manually

  1. Download each dataset separately:

Via wget

wget https://ru-open-stt.ams3.digitaloceanspaces.com/some_file

For multi-threaded downloads, use aria2 with the -x flag, e.g.

aria2c -c -x5 https://ru-open-stt.ams3.digitaloceanspaces.com/some_file
  2. Download the metadata.
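
If you download several files manually, the aria2c call above can be scripted. This is a rough sketch, not the repository's download.py: the function names are hypothetical, the file names are whatever you pass in, and `dry_run=True` only builds the commands without running them.

```python
import subprocess

# Base URL taken from the wget / aria2c examples above.
BASE_URL = "https://ru-open-stt.ams3.digitaloceanspaces.com"


def aria2_command(file_name, connections=5):
    """Build the aria2c call for one file as an argument list."""
    url = f"{BASE_URL}/{file_name}"
    # -c resumes a partial download, -x sets the connections per server.
    return ["aria2c", "-c", f"-x{connections}", url]


def download_all(file_names, connections=5, dry_run=True):
    """Build (and optionally run) one aria2c command per file."""
    commands = [aria2_command(name, connections) for name in file_names]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if aria2c fails.
            subprocess.run(command, check=True)
    return commands
```

Running it with `dry_run=True` first lets you inspect the commands before anything is downloaded.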

Data collection / denoising / normalization methodology

The dataset is compiled using open domain sources.

Russian_middle_aged/young_male

Then the dataset is cleaned using the best ASR engine we have at hand, and only items with a character error rate (CER) below 0.1 are kept.
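
To illustrate the CER filter, here is a minimal sketch. It assumes CER is the character-level Levenshtein distance divided by the reference length, which is the usual definition; the exact ASR engine and text normalization we used are not shown, and all function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Character-level Levenshtein distance between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate of an ASR hypothesis against the reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def keep_clean_items(items, max_cer=0.1):
    """items: iterable of (clip_id, reference_text, asr_text) tuples."""
    return [clip_id for clip_id, ref, hyp in items if cer(ref, hyp) < max_cer]
```

The comparison is strict, so a clip with a CER of exactly 0.1 is dropped.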

Then where applicable:

All files are normalized as follows:

  • Converted to mono, if necessary;
  • Converted to 22 kHz sampling rate, if necessary;
  • Stored as 16-bit integers;

22 kHz was chosen as a rate commonly used in the literature, though in real applications rates as low as 8 kHz may suffice.
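
The three normalization steps can be sketched in plain Python as follows. This is only an illustration, not the tooling we actually used: the input is assumed to be frames of float samples in [-1, 1], the default target of 22050 Hz is our assumption for "22 kHz", and the linear-interpolation resampler has no anti-aliasing filter, so a real pipeline should use a proper band-limited resampler instead.

```python
def to_mono(frames):
    """Average the channels of each frame (frames are tuples of floats)."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def to_int16(samples):
    """Clip to [-1, 1] and scale to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def normalize(frames, src_rate, dst_rate=22050):
    """Mono -> target sampling rate -> 16-bit integers."""
    return to_int16(resample_linear(to_mono(frames), src_rate, dst_rate))
```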

Radio/Public Speech

All files are normalized for easier / faster runtime augmentations and processing as follows:

  • Converted to mono, if necessary;
  • Converted to 16 kHz sampling rate, if necessary;
  • Stored as 32 kbps mp3;
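
One way to reproduce this format is with ffmpeg. Note that ffmpeg is an assumption here, since the tool used for the conversion is not named above, and the function names are hypothetical. The sketch only builds the command unless `dry_run=False` is passed.

```python
import subprocess


def mp3_command(src_path, dst_path, rate_hz=16000, bitrate="32k"):
    """Build an ffmpeg call: mono, 16 kHz, 32 kbps mp3."""
    return [
        "ffmpeg",
        "-i", src_path,       # input file
        "-ac", "1",           # one audio channel (mono)
        "-ar", str(rate_hz),  # output sampling rate in Hz
        "-b:a", bitrate,      # audio bitrate
        dst_path,             # output format is inferred from the extension
    ]


def convert(src_path, dst_path, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    command = mp3_command(src_path, dst_path)
    if not dry_run:
        # Requires ffmpeg on PATH; raises CalledProcessError on failure.
        subprocess.run(command, check=True)
    return command
```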

Contacts

Please contact us here or just create a GitHub issue!

Authors in alphabetic order:

  • Anna Slizhikova;
  • Alexander Veysov;
  • Dmitry Voronin;
  • Yuri Baburov;

License

Dual license: CC BY-NC, with commercial usage available after agreement with the dataset authors.

Donations

Donate (each coffee pays for several full downloads), contribute via open_collective, or use our DO referral link to help.
