
A Vietnamese Voice Text-to-Speech Model ✨

Home Page: https://huggingface.co/spaces/thinhlpg/vixtts-demo


viXTTS Demo 🗣️🔥

Quick Start ✨

👉 Visit https://huggingface.co/spaces/thinhlpg/vixtts-demo to use it right away, with no installation required.

Introduction 👋

viXTTS is a text-to-speech voice generation tool that offers voice cloning in Vietnamese and other languages. The model is a fine-tuned version of XTTS-v2.0.3, trained on the viVoice dataset. This repository is primarily intended for demonstration purposes.

The model can be accessed at: viXTTS on Hugging Face

Online usage (Recommended)

Local Usage

This code is designed to run on Ubuntu or WSL2. It is not intended for use on macOS or Windows.

Hardware Recommendations

  • At least 10GB of free disk space
  • At least 16GB of RAM
  • Nvidia GPU with a minimum of 4GB of VRAM
  • By default, the model runs on the GPU. Without a GPU, it falls back to the CPU and runs much more slowly.
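
The GPU-first, CPU-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the demo's actual code: the function name `pick_device` is hypothetical, and it assumes PyTorch is the backend that reports GPU availability.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check still works when PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is False when no Nvidia GPU or driver is found.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Running this before starting the demo gives a quick indication of whether inference is likely to use the GPU or the slower CPU path.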

Required Software

  • Git
  • Python >= 3.9 and <= 3.11. The default version is 3.11; you can change it in the run.sh file.
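
To check whether an interpreter falls inside the supported range, a short sketch like the one below may help. The names `MIN_VERSION`, `MAX_VERSION`, and `is_supported` are illustrative, and the comparison assumes that "<= 3.11" covers every 3.11.x patch release.

```python
import sys

# Supported range taken from the requirement above (major, minor only).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor) part of `version` is within range."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status}")
```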

Usage

git clone https://github.com/thinhlpg/vixtts-demo
cd vixtts-demo
./run.sh
  1. Run run.sh (dependencies are installed automatically on the first run).
  2. Open the Gradio demo link.
  3. Load the model and wait for loading to finish.
  4. Run inference and enjoy 🤗
  5. The result is saved in output/
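
Since results are written to output/, a small helper can locate the most recent one after an inference run. This is a sketch under assumptions: `latest_output` is a hypothetical name, and the README does not state the file names or formats the demo produces, so the helper simply returns the newest file of any type.

```python
from pathlib import Path
from typing import Optional


def latest_output(output_dir: str = "output") -> Optional[Path]:
    """Return the most recently modified file in `output_dir`, or None."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return None
    # Only regular files are considered; subdirectories are ignored.
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    newest = latest_output()
    print(newest if newest else "No output files found yet")
```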

Limitations

  • Subpar performance on Vietnamese input sentences shorter than 10 words (inconsistent output and odd trailing sounds).
  • The model is fine-tuned only on Vietnamese. Its effectiveness with other languages has not been tested, so quality may be lower.
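
One way to work around the short-sentence limitation is to flag such inputs before sending them to the model. The sketch below is illustrative only: `word_count` and `is_short_input` are hypothetical helpers, and splitting on whitespace is an approximation, since in Vietnamese it counts syllables rather than linguistic words.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an approximation of word count)."""
    return len(text.split())


def is_short_input(text: str, min_words: int = 10) -> bool:
    """Return True if `text` has fewer than `min_words` tokens."""
    return word_count(text) < min_words


if __name__ == "__main__":
    sample = "Xin chào các bạn"
    if is_short_input(sample):
        print(f"Warning: only {word_count(sample)} tokens; output may be unstable")
```

A caller could use the flag to warn the user or to merge short sentences with their neighbours before synthesis.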

Contributions

This project is not actively maintained, and I do not plan to release the fine-tuning code because it is sensitive and might be used for unethical purposes. If you want to contribute a version for another operating system, such as Windows or macOS, please fork the repository, create a new branch, test thoroughly on that OS, and submit a pull request describing your changes.

Acknowledgements

We would like to express our gratitude to all the libraries and resources that have played a role in the development of this demo, especially:

Citation

@misc{viVoice,
  author = {Thinh Le Phuoc Gia and Tuan Pham Minh and Hung Nguyen Quoc and Trung Nguyen Quoc and Vinh Truong Hoang},
  title = {viVoice: Enabling Vietnamese Multi-Speaker Speech Synthesis},
  url = {https://github.com/thinhlpg/viVoice},
  year = {2024}
}

A manuscript and a friendly dev log documenting the process might be made available later (including other works that were experimented with, but details about the filtering process are not specified in this README file).

Contact 💬


vixtts-demo's Issues

A question about the vocab file

I tried training XTTS v2 with my own modifications. I changed the tokenizer files and the XTTS config file so that the model accepts Vietnamese. After I changed the vocab file, I got this error:

size mismatch for gpt.text_embedding.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([7544, 1024]).
size mismatch for gpt.text_head.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([7544, 1024]).
size mismatch for gpt.text_head.bias: copying a param with shape torch.Size([6681]) from checkpoint, the shape in current model is torch.Size([7544]).

If you have experience with this, please give me some pointers. Thank you.
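
All three tensors in the error differ only in their first dimension, 6681 in the checkpoint versus 7544 in the current model, which suggests the modified vocab has more entries than the one the checkpoint was trained with. The sketch below shows one way to list such mismatches from plain shape dictionaries. The helper `shape_mismatches` is hypothetical, the shapes are copied from the error message, and the code only diagnoses the difference; it does not resolve it.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Map each shared parameter name to (checkpoint_shape, model_shape)
    for every parameter whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message above.
checkpoint = {
    "gpt.text_embedding.weight": (6681, 1024),
    "gpt.text_head.weight": (6681, 1024),
    "gpt.text_head.bias": (6681,),
}
model = {
    "gpt.text_embedding.weight": (7544, 1024),
    "gpt.text_head.weight": (7544, 1024),
    "gpt.text_head.bias": (7544,),
}

for name, (ckpt, cur) in shape_mismatches(checkpoint, model).items():
    extra = cur[0] - ckpt[0]
    print(f"{name}: checkpoint {ckpt} vs model {cur} ({extra} extra rows)")
```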

Great job!

Thank you and your teammates for a great job!
