
robinhad / ukrainian-tts

181.0 7.0 13.0 250 KB

Ukrainian TTS (text-to-speech) using ESPNET

Home Page: https://huggingface.co/spaces/robinhad/ukrainian-tts

License: MIT License

Python 94.99% Jupyter Notebook 5.01%
ukrainian tts coqui-ai ukrainian-language text-to-speech espnet espnetv2 speech-synthesis

ukrainian-tts's Introduction

title: Ukrainian TTS
emoji: 🐌
colorFrom: blue
colorTo: yellow
sdk: gradio
sdk_version: 3.40.1
python_version: 3.10.3
app_file: app.py
pinned: false

Ukrainian TTS 📢🤖

Ukrainian TTS (text-to-speech) using ESPNET.


Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts
Note: the online demo saves user input to improve the user experience; by using it, you consent to this data being analyzed.
Link to source code and models -> https://github.com/robinhad/ukrainian-tts
Telegram bot -> https://t.me/uk_tts_bot

Features ⚙️

  • Completely offline
  • Multiple voices
  • Automatic stress with priority queue: acute -> user-defined > dictionary > model
  • Control speech speed
  • Python package works on Windows, Mac (x86/M1), Linux (x86/ARM)
  • Inference on mobile devices (run the models through espnet_onnx, without text cleaners)
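The stress priority order listed above can be illustrated with a small resolver. This is a hypothetical sketch, not the package's internals: it assumes stress is written as "+" before the stressed vowel (the notation visible in the accented output in the issues below), reads "acute" as a combining acute accent (U+0301) typed right after a vowel, and uses plain dicts and a callable as stand-ins for the user-defined entries, the bundled dictionary and the stress model.

```python
from typing import Callable, Dict, Optional

COMBINING_ACUTE = "\u0301"  # combining acute accent, typed right after a stressed vowel


def from_acute(word: str) -> Optional[str]:
    """Convert an acute-marked word to '+' notation, or return None if unmarked."""
    if COMBINING_ACUTE not in word:
        return None
    out = []
    for ch in word:
        if ch == COMBINING_ACUTE and out:
            out.insert(len(out) - 1, "+")  # '+' goes before the vowel the accent follows
        else:
            out.append(ch)
    return "".join(out)


def resolve_stress(
    word: str,
    user_dict: Dict[str, str],
    dictionary: Dict[str, str],
    model_predict: Callable[[str], str],
) -> str:
    """Return `word` in '+' notation, trying each stress source in priority order."""
    marked = from_acute(word)
    if marked is not None:      # 1. acute accent typed by the user
        return marked
    key = word.lower()
    if key in user_dict:        # 2. user-defined entries (hypothetical dict)
        return user_dict[key]
    if key in dictionary:       # 3. bundled stress dictionary (hypothetical dict)
        return dictionary[key]
    return model_predict(word)  # 4. stress model as the last resort


def unmarked_model(word: str) -> str:
    """Hypothetical model stand-in that returns the word without a stress mark."""
    return word


if __name__ == "__main__":
    dictionary = {"привіт": "прив+іт"}
    print(resolve_stress("приві\u0301т", {}, dictionary, unmarked_model))  # прив+іт (acute)
    print(resolve_stress("привіт", {}, dictionary, unmarked_model))        # прив+іт (dictionary)
```

The early returns make the priority explicit: a lower-priority source is only consulted when every higher one has no answer for the word.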

Support ❤️

If you like my work, please support ❤️ -> https://send.monobank.ua/jar/48iHq4xAXm
You're welcome to join the UA Speech Recognition and Synthesis community on Telegram: https://t.me/speech_recognition_uk

Examples 🤖

Oleksa (male):

oleksa.mp4
More voices 📢🤖

Tetiana (female):

tetiana.mp4

Dmytro (male):

dmytro.mp4

Lada (female):

lada.mp4

Mykyta (male):

mykyta.mp4

How to use: 📢

Quickstart

Install using (the leading ! runs the command from a Jupyter or Colab cell; omit it in a regular shell):

!pip install git+https://github.com/robinhad/ukrainian-tts.git

Code example:

from ukrainian_tts.tts import TTS, Voices, Stress
import IPython.display as ipd

tts = TTS(device="cpu") # can try gpu, mps
with open("test.wav", mode="wb") as file:
    _, output_text = tts.tts("Привіт, як у тебе справи?", Voices.Dmytro.value, Stress.Dictionary.value, file)
print("Accented text:", output_text)

ipd.Audio(filename="test.wav")

See the example notebook: tts_example.ipynb

How to contribute: 🙌

See the list of current problems: #35

How to train: 🏋️

Link to guide: training/STEPS.md

Attribution 🤝

ukrainian-tts's People

Contributors

kant2002, noirtier-villefort, robinhad, serh007, seriar


ukrainian-tts's Issues

Want 🐸TTS back

What happened? Why does it now run on ESPNET instead of Coqui AI? The ESPNET models sound horrible and unrealistic.

Memory leak

When running the model on short sentences (about 130 characters), memory usage grows rapidly to 4 GB and beyond.
Target: constant memory usage, or at the very least no more than 2 GB.
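To check progress toward constant memory usage, a repeat-and-measure harness helps. The sketch below uses the standard library's tracemalloc; the function name memory_per_call and the callable it takes are hypothetical, and in practice fn would wrap one synthesis call such as the tts.tts(...) call from the quickstart. Caveat: tracemalloc only sees allocations made through Python's allocators, so native buffers (PyTorch tensor storage, for example) may not be counted; pair it with a process-level RSS measurement when the leak is inside the model.

```python
import gc
import tracemalloc
from typing import Callable, List


def memory_per_call(fn: Callable[[], object], calls: int = 5) -> List[int]:
    """Call `fn` repeatedly and return traced Python memory (bytes) after each call."""
    tracemalloc.start()
    usage: List[int] = []
    try:
        for _ in range(calls):
            fn()
            gc.collect()  # discard garbage so only retained memory is measured
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage


if __name__ == "__main__":
    retained = []

    def leaky_step() -> None:
        # Stand-in for one synthesis call that keeps a reference it should drop.
        retained.append(bytearray(1_000_000))

    # A steadily rising series suggests memory is retained between calls.
    print(memory_per_call(leaky_step))
```

A flat series across many calls is what the target above asks for; a series that grows by roughly the same amount per call points to state kept between requests.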

Handle numbers

  • Мало бути 2 людей -> Мало бути двоє людей
  • Car plate numbers
  • Вона народилась 1923 року -> Вона народилась тисяча дев'ятсот двадцять третього року.
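The normalization these examples call for can be sketched as a regex pass that hands each digit run, together with the word after it, to a verbalizer that picks the grammatical form. Both functions here are hypothetical: toy_verbalize is a two-entry lookup covering only the examples above, whereas a real verbalizer would need full Ukrainian number-to-words rules (case, gender, collective and ordinal forms), and car plate numbers would need their own pattern.

```python
import re
from typing import Callable

NUMBER = re.compile(r"\d+")


def expand_numbers(text: str, verbalize: Callable[[str, str], str]) -> str:
    """Replace every digit run in `text` with words from `verbalize(number, next_word)`.

    The following word is passed along because Ukrainian number forms depend on
    it: collective "двоє" before "людей", genitive ordinal before "року", etc.
    """
    def repl(match: re.Match) -> str:
        rest = text[match.end():].split(maxsplit=1)
        next_word = rest[0].strip(".,!?;:") if rest else ""
        return verbalize(match.group(), next_word)

    return NUMBER.sub(repl, text)


def toy_verbalize(number: str, next_word: str) -> str:
    """Hypothetical lookup for the two issue examples; unknown numbers stay as digits."""
    table = {
        ("2", "людей"): "двоє",
        ("1923", "року"): "тисяча дев'ятсот двадцять третього",
    }
    return table.get((number, next_word), number)


if __name__ == "__main__":
    print(expand_numbers("Мало бути 2 людей", toy_verbalize))
    print(expand_numbers("Вона народилась 1923 року.", toy_verbalize))
```

Keeping the verbalizer pluggable separates finding numbers from choosing their grammatical form, which is where nearly all of the difficulty lies.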

Tidying up

  • Redo speakers list
  • Add features
  • More examples - bot, embedded?

Demo improvements

  • Attributions
    • Yehor Smoliakov for dataset
    • Oleksii Syvokon for ukrainian-word-stress

Phonemizer

  • Test available phonemizers: espeak, lang_uk
  • Should be portable: x86_64, ARM, Windows, Linux
  • No assumptions or simplifications: output how a word should be pronounced in the reference accent
  • Let the user override stress

Windows installation crashes

I have an issue with installing this module on my machine.

Setup:
Windows 10
Virtual Python Environment 3.9

Command executed: pip install git+https://github.com/robinhad/ukrainian-tts.git

Error:

Building wheels for collected packages: ctc-segmentation
  Building wheel for ctc-segmentation (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for ctc-segmentation (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-39
      creating build\lib.win-amd64-cpython-39\ctc_segmentation
      copying ctc_segmentation\ctc_segmentation.py -> build\lib.win-amd64-cpython-39\ctc_segmentation
      copying ctc_segmentation\partitioning.py -> build\lib.win-amd64-cpython-39\ctc_segmentation
      copying ctc_segmentation\__init__.py -> build\lib.win-amd64-cpython-39\ctc_segmentation
      running build_ext
      building 'ctc_segmentation.ctc_segmentation_dyn' extension
      creating build\temp.win-amd64-cpython-39
      creating build\temp.win-amd64-cpython-39\Release
      creating build\temp.win-amd64-cpython-39\Release\ctc_segmentation
      "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\dell\AppData\Local\Temp\pip-build-env-1a6sa50c\overlay\Lib\site-packages\numpy\core\include -ID:\Roman\Projects\Python\python-physical-activity\.venv_39\include -ID:\Roman\Projects\Python\python-physical-activity\.venv_39\Scripts\include -ID:\Roman\Projects\Python\python-physical-activity\.venv_39\Scripts\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /Tcctc_segmentation/ctc_segmentation_dyn.c /Fobuild\temp.win-amd64-cpython-39\Release\ctc_segmentation/ctc_segmentation_dyn.obj
      ctc_segmentation_dyn.c
      ctc_segmentation/ctc_segmentation_dyn.c(38): fatal error C1083: Cannot open include file: 'Python.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for ctc-segmentation
Failed to build ctc-segmentation
ERROR: Could not build wheels for ctc-segmentation, which is required to install pyproject.toml-based projects

Error import StressOption

Traceback (most recent call last):
  File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
    from ukrainian_tts.tts import TTS, Voices, StressOption
ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

TypeError: TTS.__init__() got an unexpected keyword argument 'cache_dir'

I ran this code in Google Colab:

!pip install git+https://github.com/robinhad/ukrainian-tts.git


from ukrainian_tts.tts import TTS, Voices, Stress
import IPython.display as ipd

tts = TTS(device="cpu", cache_dir="model") # can try gpu, mps
with open("test.wav", mode="wb") as file:
    _, output_text = tts.tts("Привіт, як у тебе справи?", Voices.Dmytro.value, Stress.Dictionary.value, file)
print("Accented text:", output_text)

ipd.Audio(filename="test.wav")

And got the error shown in the title.

What am I doing wrong?

Add stress for words from the RID app

There is an app for learning Ukrainian words: RID. Unfortunately, it's not very well maintained and will be removed in June. But I was able to reverse-engineer it, and download all 9580 words from their servers. You can download the data from this repo: https://github.com/acmpo6ou/rid-words

Here is an example of a word file:

{
    "id":11,
    "title":"Талалай",
    "description":"Той, хто багато, беззмістовно говорить. «— Досі, — впав у річ сповідальник, — ти мені здавався більше талалаєм, ніж чистобрехою, а втім, не знаю, за кого тебе мати надалі» (Мігель де Сервантес «Премудрий гідальго Дон Кіхот з Ламанчі», перекл.\t Микола Лукаш).",
    "html_description":"\u003cp\u003eТой, хто багато, беззмістовно говорить.\u003c/p\u003e\r\n\r\n\u003cp\u003e\u003cem\u003e\u0026laquo;\u0026mdash; Досі, \u0026mdash; впав у річ сповідальник, \u0026mdash; ти мені здавався більше \u003cstrong\u003eталалаєм\u003c/strong\u003e, ніж чистобрехою, а втім, не знаю, за кого тебе мати надалі\u0026raquo; (Мігель де Сервантес \u0026laquo;Премудрий гідальго Дон Кіхот з Ламанчі\u0026raquo;, перекл. Микола Лукаш).\u003c/em\u003e\u003c/p\u003e\r\n",
    "word_category_id":2,
    "stresses":[
        6
    ],
    "word_images":[
        "/uploads/word_image/photo/16772/crop_version_ok-4946387_960_720.webp"
    ],
    "done":false,
    "favorite":false,
    "shared_link":"http://rid.ck.ua/sharing/talalaj"
}

The interesting fields are title (the word itself) and stresses (an array of stress positions for the word; a word can have multiple stresses).

Using this data you can expand your dictionary with more words and their stresses. I would contribute a PR myself, but I'm not sure how to. I found a file stress.trie that probably stores all stresses, but I'm not sure how to edit it.
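A minimal sketch of the conversion this issue asks for, turning one RID record into the "+"-before-vowel notation seen in the package's accented output. The function name is hypothetical, and it assumes stresses holds 1-based character positions, an inference from the single sample above (position 6 of "Талалай" is the final "а", matching талала́й). It emits one variant per listed stress; writing the results into stress.trie is not covered, since that file's format isn't described here.

```python
import json
from typing import List


def rid_to_plus_variants(word_json: str) -> List[str]:
    """Turn one RID word record into '+'-marked stress variants, one per stress."""
    record = json.loads(word_json)
    title: str = record["title"]
    variants = []
    for pos in record["stresses"]:
        i = pos - 1  # assumed 1-based position -> 0-based index
        if not 0 <= i < len(title):
            raise ValueError(f"stress position {pos} out of range for {title!r}")
        variants.append(title[:i] + "+" + title[i:])
    return variants


if __name__ == "__main__":
    print(rid_to_plus_variants('{"title": "Талалай", "stresses": [6]}'))  # ['Талал+ай']
```

Emitting separate variants, rather than several "+" marks in one word, keeps each entry in the same one-stress-per-word shape as the rest of the output.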

Error with file: speakers.pth

FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

Vits improvements

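from TTS.tts.models.vits import VitsArgs  # Coqui TTS

# Decoder settings taken from the HiFi-GAN V3 generator configuration
vitsArgs = VitsArgs(
    resblock_type_decoder="2",
    upsample_rates_decoder=[8, 8, 4],  # total upsampling factor 8 * 8 * 4 = 256
    upsample_kernel_sizes_decoder=[16, 16, 8],  # 2x each upsampling rate
    upsample_initial_channel_decoder=256,
    resblock_kernel_sizes_decoder=[3, 5, 7],
    resblock_dilation_sizes_decoder=[[1, 2], [2, 6], [3, 12]],
)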
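In a HiFi-GAN style decoder like the one VITS uses, the transposed-convolution strides multiply out to the number of audio samples generated per latent frame, so their product has to equal the spectrogram hop length. Below is a small sanity check for the V3 settings above. The function is hypothetical, and the 256-sample hop length is an assumption (the value in Coqui's default VITS audio config); verify it against the actual training config.

```python
import math
from typing import Sequence


def check_upsampling(rates: Sequence[int], kernel_sizes: Sequence[int], hop_length: int) -> int:
    """Validate HiFi-GAN-style upsampling settings and return the total factor."""
    if len(rates) != len(kernel_sizes):
        raise ValueError("need one kernel size per upsampling stage")
    total = math.prod(rates)
    if total != hop_length:
        raise ValueError(f"upsampling factor {total} != hop_length {hop_length}")
    for rate, kernel in zip(rates, kernel_sizes):
        # A kernel shorter than its stride would leave output samples no tap covers.
        if kernel < rate:
            raise ValueError(f"kernel size {kernel} is smaller than stride {rate}")
    return total


if __name__ == "__main__":
    print(check_upsampling([8, 8, 4], [16, 16, 8], hop_length=256))  # 256
```

If the hop length differed (for example 512), the check would fail and the rates would need an extra stage or a larger factor.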

New stress model and benchmark

  • stress model for words not present in dictionary
  • automatic stress based on sentence (text?) level context
  • benchmark and dataset with edge cases to check against
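For the benchmark, one simple metric is word-level stress accuracy against a hand-checked reference. This is a hypothetical sketch, assuming both prediction and reference use the "+"-before-vowel notation from the accented output; reference words with no "+" (such as "з") are skipped, so a missing mark on a word that should be stressed counts as an error.

```python
from typing import Sequence


def stress_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Share of stressable reference words whose '+' placement is reproduced exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    correct = total = 0
    for pred_sent, ref_sent in zip(predicted, reference):
        pred_words, ref_words = pred_sent.split(), ref_sent.split()
        # Texts must match once stress marks are removed, or words would misalign.
        if [w.replace("+", "") for w in pred_words] != [w.replace("+", "") for w in ref_words]:
            raise ValueError(f"texts differ beyond stress marks: {ref_sent!r}")
        for pred_word, ref_word in zip(pred_words, ref_words):
            if "+" not in ref_word:
                continue  # nothing to stress in the reference (e.g. "з")
            total += 1
            correct += int(pred_word == ref_word)
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(stress_accuracy(["б+ублики пік."], ["б+ублики п+ік."]))  # 0.5
```

Edge cases like the unstressed one-vowel words reported in the issues can then be collected into the reference set and tracked with this number.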

Can't install with pip

The library fails to install.

(.venv) ➜  Ukranian-tts (.venv) ➜  Ukranial-tts pip install git+https://github.com/robinhad/ukrainian-tts.git
Collecting git+https://github.com/robinhad/ukrainian-tts.git
  Cloning https://github.com/robinhad/ukrainian-tts.git to /tmp/pip-req-build-ax0lmxfq
  Running command git clone --filter=blob:none --quiet https://github.com/robinhad/ukrainian-tts.git /tmp/pip-req-build-ax0lmxfq
  Resolved https://github.com/robinhad/ukrainian-tts.git to commit 15d57e30f092bdcb34d9eba48c1d7900033835c7
  Preparing metadata (setup.py) ... done
Collecting num2words@ git+https://github.com/kant2002/num2words.git@kant/add-cases
  Cloning https://github.com/kant2002/num2words.git (to revision kant/add-cases) to /tmp/pip-install-pnae2v9s/num2words_22d7ec5d87cb48b6b86fba47cdc8dd13
  Running command git clone --filter=blob:none --quiet https://github.com/kant2002/num2words.git /tmp/pip-install-pnae2v9s/num2words_22d7ec5d87cb48b6b86fba47cdc8dd13
  WARNING: Did not find branch or tag 'kant/add-cases', assuming revision or ref.
  Running command git checkout -q kant/add-cases
  error: pathspec 'kant/add-cases' did not match any file(s) known to git
  error: subprocess-exited-with-error

  × git checkout -q kant/add-cases did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q kant/add-cases did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
(.venv) ➜  Ukranial-tts
zsh: parse error near `➜'
(.venv) ➜  Ukranian-tts

One vowel words in the end of the sentence aren't stressed

Input:


Бобер на березі з бобренятами бублики пік.

Боронила борона по боронованому полю.

Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.

Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.

Result:


Боб+ер н+а березі з бобрен+ятами б+ублики пік.

Борон+ила борон+а п+о борон+ованому п+олю.

Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.

Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
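One hypothetical post-processing pass for this bug is to mark any word that still has no "+" but contains exactly one vowel, since such a word can only be stressed on that vowel. The sketch assumes the "+"-before-vowel notation shown in the result above and the ten Ukrainian vowel letters. It is not how the package handles stress, and it does not help with multi-vowel words such as "окроп", which still need the dictionary or the model.

```python
import re

VOWELS = "аеєиіїоуюяАЕЄИІЇОУЮЯ"
WORD = re.compile(r"[\w'+]+")  # word characters, apostrophes and existing '+' marks


def stress_single_vowel_words(text: str) -> str:
    """Add '+' before the vowel of any unmarked word that has exactly one vowel."""
    def repl(match: re.Match) -> str:
        word = match.group()
        if "+" in word:
            return word  # already stressed
        positions = [i for i, ch in enumerate(word) if ch in VOWELS]
        if len(positions) != 1:
            return word  # no vowel, or several vowels to choose from
        i = positions[0]
        return word[:i] + "+" + word[i:]

    return WORD.sub(repl, text)


if __name__ == "__main__":
    print(stress_single_vowel_words("Боб+ер н+а березі з бобрен+ятами б+ублики пік."))
```

Running the first result line through it marks "пік." as "п+ік." and leaves every other word unchanged.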
