Coder Social home page Coder Social logo

adambear / yourtts Goto Github PK

View Code? Open in Web Editor NEW

This project forked from edresson/yourtts

0.0 1.0 0.0 419.75 MB

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

License: Other

Python 0.01% Jupyter Notebook 99.99%

yourtts's Introduction

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

In our recent paper we propose the YourTTS model. YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.

Production version

Come try our latest and greatest fullband english only model https://coqui.ai/

Audios samples

Visit our website for audio samples.

Implementation

All of our experiments were implemented on the Coqui TTS repo.

Colab Demos

Demo URL
Zero-Shot TTS link
Zero-Shot VC link
Zero-Shot VC - Experiment 1 (trained with just VCTK) link

Checkpoints

All the released checkpoints are licensed under CC BY-NC-ND 4.0

Model URL
Speaker Encoder link
Exp 1. YourTTS-EN(VCTK) link
Exp 1. YourTTS-EN(VCTK) + SCL link
Exp 2. YourTTS-EN(VCTK)-PT link
Exp 2. YourTTS-EN(VCTK)-PT + SCL link
Exp 3. YourTTS-EN(VCTK)-PT-FR link
Exp 3. YourTTS-EN(VCTK)-PT-FR SCL link
Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR SCL link

Coqui TTS released model

TTS

To use the ๐Ÿธ TTS version v0.7.0 released YourTTS model for Text-to-Speech use the following command:

tts  --text "This is an example!" --model_name tts_models/multilingual/multi-dataset/your_tts  --speaker_wav target_speaker_wav.wav --language_idx "en"

Considering the "target_speaker_wav.wav" an audio sample from the target speaker.

Voice conversion

To use the ๐Ÿธ TTS released YourTTS model for voice conversion use the following command:

tts --model_name tts_models/multilingual/multi-dataset/your_tts  --speaker_wav target_speaker_wav.wav --reference_wav  target_content_wav.wav --language_idx "en"

Considering the "target_content_wav.wav" as the reference wave file to convert into the voice of the "target_speaker_wav.wav" speaker.

Results replicability

To insure replicability, we make the audios used to generate the MOS available here. In addition, we provide the MOS for each audio here.

To re-generate our MOS results, follow the instructions here. To predict the test sentences and generate the SECS, please use the Jupyter Notebooks available here.

Test Speakers:

LibriTTS (test clean): 1188, 1995, 260, 1284, 2300, 237, 908, 1580, 121 and 1089

VCTK: p261, p225, p294, p347, p238, p234, p248, p335, p245, p326 and p302

MLS Portuguese: 12710, 5677, 12249, 12287, 9351, 11995, 7925, 3050, 4367 and 13069

Citation


@ARTICLE{2021arXiv211202418C,
  author = {{Casanova}, Edresson and {Weber}, Julian and {Shulby}, Christopher and {Junior}, Arnaldo Candido and {G{\"o}lge}, Eren and {Antonelli Ponti}, Moacir},
  title = "{YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone}",
  journal = {arXiv e-prints},
  keywords = {Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing},
  year = 2021,
  month = dec,
  eid = {arXiv:2112.02418},
  pages = {arXiv:2112.02418},
  archivePrefix = {arXiv},
  eprint = {2112.02418},
  primaryClass = {cs.SD},
  adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211202418C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

yourtts's People

Contributors

edresson avatar weberjulian avatar yuripourre avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.