kyubyong / css10

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

License: Apache License 2.0

Languages: HTML 41.27%, Python 31.82%, Jupyter Notebook 26.91%
Topics: speech, speech-to-text, dataset

Abstract

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pretrained models, and test resources publicly available. We hope they will be used for future speech tasks.

For details, check our paper. Kyubyong presented this paper at the 2018 workshop of the Korean Society of Speech Sciences.

Environments & Dependencies

  • Linux
  • Python 2.X or 3.X
  • TensorFlow == 1.3
  • NumPy
  • Librosa
  • Matplotlib
  • tqdm
  • scipy

Audiobooks & Datasets

| Code | Language | Audiobook | Running Time (hh:mm:ss) | Reader | Dataset |
|------|----------|-----------|-------------------------|--------|---------|
| de | German | 1. Meister Floh; 2. Die acht Gesichter am Biwasee; 3. Auswahl aus Die Serapionsbrüder | 16:42:45 | Hokuspokus | CSS German |
| el | Greek | Παραμύθι χωρίς όνομα (Tale Without Name) | 04:08:14 | Rapunzelina | CSS Greek |
| es | Spanish | 1. Bailén; 2. El 19 de Marzo y el 2 de Mayo; 3. La Batalla de los Arapiles | 23:49:49 | Tux | CSS Spanish |
| fi | Finnish | 1. Gulliverin matkat kaukaisilla mailla; 2. Ensimmäiset novellit; 3. Kaleri-orja; 4. Salmelan heinätalkoot | 10:32:03 | Harri Tapani Ylilammi | CSS Finnish |
| fr | French | 1. Les Misérables - tome 5; 2. Arsène Lupin contre Herlock Sholmès | 19:09:03 | Gilles G. Le Blanc | CSS French |
| hu | Hungarian | Egri csillagok | 10:00:25 | Diana Majlinger | CSS Hungarian |
| ja | Japanese | 明暗 (Meian) | 14:55:36 | ekzemplaro | CSS Japanese |
| nl | Dutch | 20.000 Mijlen onder Zee | 14:06:40 | Bart de Leeuw | CSS Dutch |
| ru | Russian | 1. Ice March - Ледяной поход; 2. Early Short Stories; 3. Short Stories for Children and Adults | 21:22:10 | Mark Chulsky | CSS Russian |
| zh | Chinese | 1. 朝花夕拾 (Chao Hua Si She); 2. 呐喊 (Call to Arms) | 06:27:04 | Jing Li | CSS Chinese |
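
To get a sense of the overall size of the collection, the per-language running times listed above can be added up. The following is a minimal sketch, not part of the original repository: the values are copied from the Running Time column, and the helper names `to_seconds` and `to_hms` are made up for illustration.

```python
# Running times copied from the Running Time column (HH:MM:SS per language code).
RUNNING_TIMES = {
    "de": "16:42:45",
    "el": "04:08:14",
    "es": "23:49:49",
    "fi": "10:32:03",
    "fr": "19:09:03",
    "hu": "10:00:25",
    "ja": "14:55:36",
    "nl": "14:06:40",
    "ru": "21:22:10",
    "zh": "06:27:04",
}


def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total):
    """Format a number of seconds back into 'H:MM:SS' (hypothetical helper)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:d}:{:02d}:{:02d}".format(hours, minutes, seconds)


total_seconds = sum(to_seconds(t) for t in RUNNING_TIMES.values())
print(to_hms(total_seconds))  # 141:13:49
```

Under these figures the ten datasets add up to roughly 141 hours of audio, with Spanish the largest and Greek the smallest.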

Pretrained Models & Audio Samples

| Code | Language | Pretrained Models | Audio Samples |
|------|----------|-------------------|---------------|
| de | German | DCTTS, TACOTRON | DCTTS, TACOTRON |
| el | Greek | DCTTS | DCTTS |
| es | Spanish | DCTTS, TACOTRON | DCTTS, TACOTRON |
| fi | Finnish | DCTTS, TACOTRON | DCTTS, TACOTRON |
| fr | French | DCTTS, TACOTRON | DCTTS, TACOTRON |
| hu | Hungarian | DCTTS, TACOTRON | DCTTS, TACOTRON |
| ja | Japanese | DCTTS, TACOTRON | DCTTS, TACOTRON |
| nl | Dutch | DCTTS, TACOTRON | DCTTS, TACOTRON |
| ru | Russian | DCTTS, TACOTRON | DCTTS, TACOTRON |
| zh | Chinese | DCTTS, TACOTRON | DCTTS, TACOTRON |

Cite

@article{park2019css10,
  title={CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages},
  author={Park, Kyubyong and Mulc, Thomas},
  journal={Interspeech},
  year={2019}
}

By Kyubyong Park, Tommy Mulc

css10's People

Contributors

kyubyong, tmulc18

css10's Issues

Numbers or ( ) are not considered by the models

Both can appear in automatically extracted sentences. It looks like numbers can be handled with NLTK, but "(" seems harder to handle, since parentheses are associated with an "inflection" in the intonation.
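
One possible workaround is to preprocess the text before synthesis. The sketch below is only an illustration under stated assumptions, not the repository's own normalization: it drops the parenthesis characters (which also discards the intonation cue mentioned above) and merely locates the numbers that would still need to be spelled out in the target language. The function names `strip_parentheses` and `find_numbers` are hypothetical.

```python
import re


def strip_parentheses(text):
    """Drop '(' and ')' but keep the words between them (hypothetical helper)."""
    without_parens = re.sub(r"[()]", "", text)
    # Collapse any repeated whitespace left behind by the removal.
    return re.sub(r"\s+", " ", without_parens).strip()


def find_numbers(text):
    """Return digit sequences, allowing '.' or ',' between digit groups.

    These are the tokens that would still have to be converted to words
    by a language-specific step, which this sketch does not attempt.
    """
    return re.findall(r"\d+(?:[.,]\d+)*", text)


sentence = "Chapter 12 (part two) has 1,250 words."
cleaned = strip_parentheses(sentence)
print(cleaned)                # Chapter 12 part two has 1,250 words.
print(find_numbers(cleaned))  # ['12', '1,250']
```

Spelling out the located numbers depends on the language, so that step is left open here.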

Brazilian Portuguese Model

Could you make a database available in Brazilian Portuguese? If not, could you guide me on how to train one like the databases you made available?

Output node names

Hi,

I would like to freeze the pretrained Tacotron model (French) but I can't figure out what the output node names are. I tried various tools to visualize the model but none of them succeeded on my (old) machine because of the model size.

Would you mind sharing this information?

Thank you for your support and for publishing your work publicly.

Can't use GPU

Hi,

I was able to run the synthesis script (synthesize.py) on CPU successfully.
But when I tried to use a GPU, I ran into several issues.
First, I tried tensorflow-gpu==1.3.0, but according to this chart: https://www.tensorflow.org/install/source#gpu, it requires CUDA 8, and according to this list: https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md, the only nvidia docker base image for CUDA 8.0 is based on Ubuntu 16.04, where I failed to install the requirements.
Second, I tried tensorflow-gpu==1.5.0 with CUDA 9, but the Nvidia base image for Ubuntu 18.04 supports only CUDA 9.2, not 9.0, and those look incompatible...
Third, I tried tensorflow-gpu==1.13.1 with CUDA 10.0, using a CUDA 10.0 based Ubuntu 18.04 docker base image.
With that setup, TensorFlow can detect the GPU, but the session initialization (sess.run()) takes forever and eats up all the GPU memory.
I tried limiting the memory usage, after which the session initialization finished after more than 4 minutes, but the feed-forward pass then got stuck at the very beginning, with no progress at all for several minutes.

Any ideas or suggestions? What am I doing wrong?

Thanks!

Synthesis example

Hi there, I am new to this project.

Could you please give an example of using a pretrained model to synthesize new audio?

Vocab for Japanese

I can't find the vocab for the Japanese DCTTS model. Could you share it with me? Thanks.
