Coder Social home page Coder Social logo

bruce2233 / gpt-sovits Goto Github PK

View Code? Open in Web Editor NEW

This project forked from rvc-boss/gpt-sovits

0.0 0.0 0.0 2.44 MB

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

License: MIT License

Shell 0.04% Python 99.95% Batchfile 0.01%

gpt-sovits's Introduction

GPT-SoVITS-WebUI

A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.

madewithlove


Licence Huggingface

English | 中文简体 | 日本語


Check out our demo video here!

few.shot.fine.tuning.demo.mp4

Features:

  1. Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.

  2. Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.

  3. Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.

  4. WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.

Environment Preparation

If you are a Windows user (tested with win>=10) you can install directly via the prezip. Just download the prezip, unzip it and double-click go-webui.bat to start GPT-SoVITS-WebUI.

Tested Environments

  • Python 3.9, PyTorch 2.0.1, CUDA 11
  • Python 3.10.13, PyTorch 2.1.2, CUDA 12.3

Note: numba==0.56.4 require py<3.11

Quick Install with Conda

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh

Install Manually

Make sure you have the distutils for python3.9 installed

sudo apt-get install python3.9-distutils

Pip Packages

pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet

Additional Requirements

If you need Chinese ASR (supported by FunASR), install:

pip install modelscope torchaudio sentencepiece funasr

FFmpeg

Conda Users
conda install ffmpeg
Ubuntu/Debian Users
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
MacOS Users
brew install ffmpeg
Windows Users

Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.

Pretrained Models

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

For Chinese ASR (additionally), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/damo_asr/models.

For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.

Dataset Format

The TTS annotation .list file format:

vocal_path|speaker_name|language|text

Language dictionary:

  • 'zh': Chinese
  • 'ja': Japanese
  • 'en': English

Example:

D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.

Todo List

  • High Priority:

    • Localization in Japanese and English.
    • User guide.
    • Japanese and English dataset fine tune training.
  • Features:

    • Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
    • TTS speaking speed control.
    • Enhanced TTS emotion control.
    • Experiment with changing SoVITS token inputs to probability distribution of vocabs.
    • Improve English and Japanese text frontend.
    • Develop tiny and larger-sized TTS models.
    • Colab scripts.
    • Try expand training dataset (2k hours -> 10k hours).
    • better sovits base model (enhanced audio quality)
    • model mix

Credits

Special thanks to the following projects and contributors:

Thanks to all contributors for their efforts

gpt-sovits's People

Contributors

alexzhou1995 avatar anyacoder avatar blaise-tk avatar c4fun avatar d3lik avatar eltociear avatar elvizlai avatar erythrocyte3803 avatar http-404-usernotfound avatar kexul avatar miuzarte avatar pengoosedev avatar rafaelgodoyebert avatar ricecakey06 avatar rvc-boss avatar thestingerx avatar tps-f avatar yuan-manx avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.