

XTTS-2-UI: A User Interface for XTTS-2 Text-Based Voice Cloning

This repository contains the essential code for cloning any voice using just text and a 10-second audio sample of the target voice. XTTS-2-UI is simple to set up and use. Example Results 🔊

Works in 16 languages and has built-in voice recording/uploading. Note: don't expect ElevenLabs-level quality; it is not there yet.

Model

The model used is tts_models/multilingual/multi-dataset/xtts_v2. For more details, refer to Hugging Face - XTTS-v2 and its specific version XTTS-v2 Version 2.0.2.
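As a rough illustration (a minimal sketch using the Coqui TTS Python API, not the repo's own app code), loading the model and cloning a voice from a short reference clip looks roughly like this; the file names below are placeholders:

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Any ~10-second clip of the target voice works as the reference.
tts.tts_to_file(
    text="Hello, this is a cloned voice.",
    speaker_wav="targets/example.wav",
    language="en",
    file_path="output.wav",
)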

Table of Contents

  • Setup
  • Inference
  • Target Voices Dataset
  • Language Support
  • Notes
  • Credits

Setup

To set up this project, follow these steps in a terminal:

  1. Clone the Repository

    • Clone the repository to your local machine.
      git clone https://github.com/pbanuru/xtts2-ui.git
      cd xtts2-ui
  2. Create a Virtual Environment:

    • Run the following command to create a Python virtual environment:
      python -m venv venv
    • Activate the virtual environment:
      • Windows:

        # cmd prompt
        venv\Scripts\activate

        or

        # git bash
        source venv/Scripts/activate
      • Linux/Mac:

        source venv/bin/activate
  3. Install PyTorch:

    • If you have an Nvidia CUDA-Enabled GPU, choose the appropriate PyTorch installation command:
      • Before installing PyTorch, check your CUDA version by running:
        nvcc --version
      • For CUDA 12.1:
        pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
      • For CUDA 11.8:
        pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    • If you don't have a CUDA-enabled GPU, follow the instructions on the PyTorch website to install the appropriate version of PyTorch for your system. (Either way, you can verify the install with the quick check shown after this list.)
  4. Install Other Required Packages:

    • Install direct dependencies:
      pip install -r requirements.txt
    • Upgrade the TTS package to the latest version:
      pip install --upgrade TTS

After completing these steps, your environment is ready and you can start using the project.
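
To confirm that the PyTorch install from step 3 can see your GPU, you can run a quick check like this (a generic snippet, not part of this repo):

import torch
print(torch.__version__)          # e.g. 2.1.0+cu121 for a CUDA 12.1 build
print(torch.cuda.is_available())  # True if PyTorch can use your GPU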

Models will be downloaded automatically upon first use.

Download paths:

  • macOS: /Users/USR/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2
  • Windows: C:\Users\YOUR-USER-ACCOUNT\AppData\Local\tts\tts_models--multilingual--multi-dataset--xtts_v2
  • Linux: /home/${USER}/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2

Inference

To run the application:

python app.py
OR
streamlit run app2.py 

Alternatively, you can run everything from the terminal by providing sample input texts in texts.json and generating multiple audio files with multiple speakers (you may need to adjust appTerminal.py):

python appTerminal.py
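
For orientation only, here is a hedged sketch of what such a batch run does conceptually; this is not the repo's appTerminal.py, and the actual texts.json schema may differ (it is assumed here to be a plain list of strings):

import json
from pathlib import Path

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

texts = json.load(open("texts.json"))             # assumed: a list of strings
speakers = sorted(Path("targets").glob("*.wav"))  # every reference voice in targets/
Path("outputs").mkdir(exist_ok=True)

for i, text in enumerate(texts):
    for spk in speakers:
        tts.tts_to_file(
            text=text,
            speaker_wav=str(spk),
            language="en",
            file_path=f"outputs/{spk.stem}_{i}.wav",
        )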

On initial use, you will need to agree to the terms:

[XTTS] Loading XTTS...
 > tts_models/multilingual/multi-dataset/xtts_v2 has been updated, clearing model cache...
 > You must agree to the terms of service to use this model.
 | > Please see the terms of service at https://coqui.ai/cpml.txt
 | > "I have read, understood and agreed to the Terms and Conditions." - [y/n]
 | | >

If your model is re-downloading each run, please consult Issue 4723 on GitHub.

Target Voices Dataset

The dataset consists of a single folder named targets, pre-populated with several voices for testing purposes.

To add more voices (if you don't want to go through the GUI), create a 24 kHz WAV file of approximately 10 seconds and place it in the targets folder. You can use yt-dlp to download a voice from YouTube for cloning:

yt-dlp -x --audio-format wav "https://www.youtube.com/watch?"
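
If the downloaded clip is not already 24 kHz, one way (not repo-specific) to resample it to 24 kHz mono before placing it in targets/ is with torchaudio; the file names are placeholders:

import torchaudio

wav, sr = torchaudio.load("downloaded.wav")           # clip extracted by yt-dlp
wav = wav.mean(dim=0, keepdim=True)                   # mix down to mono
wav = torchaudio.functional.resample(wav, sr, 24000)  # resample to 24 kHz
torchaudio.save("targets/my_speaker.wav", wav, 24000)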

Sample Audio Examples:

Sample clips are provided for English, Russian, and Arabic (audio links not reproduced here).

Language Support

Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Italian, Japanese (see Notes below), Korean, Polish, Portuguese, Russian, Spanish, Turkish

Notes

If you would like to select Japanese as the target language, you must install a dictionary.

# Lite version
pip install fugashi[unidic-lite]

or for more serious processing:

# Full version
pip install fugashi[unidic]
python -m unidic download

More details here.

Credits

  1. Heavily based on https://github.com/kanttouchthis/text_generation_webui_xtts/
