jeffgyf / emotional-vits

This project forked from innnky/emotional-vits

An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations

License: MIT License

Python 2.19% Jupyter Notebook 97.80% Cython 0.02%

Emotional VITS

Online demo: Hugging Face Spaces | bilibili demo

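The dataset needs no emotion annotations. An emotion extraction model extracts an emotion embedding from each utterance, and that embedding is fed into the network, which makes VITS synthesis emotion-controllable.

Model structure

  • Compared with the original VITS, only the TextEncoder is modified. [Figure: model structure diagram]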
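
How the emotion embedding enters the TextEncoder is not spelled out in the text. One plausible scheme, shown here purely as an assumption, is to project the utterance-level emotion vector to the encoder's hidden size and add it to every token's hidden state. The names `project`, `condition_on_emotion`, `weight`, and `bias` are hypothetical, the sizes are toy values, and a real model would use a learned linear layer in PyTorch rather than plain lists.

```python
from typing import List

Vector = List[float]


def project(emo: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Linear projection: out[j] = bias[j] + sum_i emo[i] * weight[i][j]."""
    return [
        bias[j] + sum(emo[i] * weight[i][j] for i in range(len(emo)))
        for j in range(len(bias))
    ]


def condition_on_emotion(
    text_hidden: List[Vector], emo: Vector, weight: List[Vector], bias: Vector
) -> List[Vector]:
    """Add the projected emotion vector to every token's hidden state."""
    emo_hidden = project(emo, weight, bias)
    return [[h + e for h, e in zip(token, emo_hidden)] for token in text_hidden]


# Toy sizes (assumed): 3-dim emotion embedding, 2-dim hidden state, 2 tokens.
text_hidden = [[0.0, 1.0], [2.0, 3.0]]
emo = [1.0, 0.0, -1.0]
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # shape (3, 2)
bias = [0.0, 0.0]

conditioned = condition_on_emotion(text_hidden, emo, weight, bias)
```

The same emotion vector is added at every token position, so it acts as a global, utterance-level bias on the text representation rather than a per-phoneme control.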

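Strengths and weaknesses of the model

Weaknesses:

  • At inference time, a reference audio clip must be supplied to specify the emotion before any audio can be synthesized. The model itself does not know which emotional features correspond to words such as "excited" or "calm".
  • For a single-speaker model, this can be handled by pre-selection: manually pick a few clips that sound "excited", "calm", "quiet", and so on, and build the emotion text -> emotion embedding mapping by hand (a clustering algorithm can simplify the selection).
  • For a multi-speaker model, pre-selection has limits. For the same emotion, say "calm", the corresponding emotion embeddings may differ between speakers, so building the emotion text -> emotion embedding mapping becomes tedious, and it is hard to describe similar emotions across speakers with one unified standard.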
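
The manual mapping described above can be kept as a small lookup table keyed by both speaker and emotion label, so that the same label may point to different reference clips for different speakers. A minimal sketch, assuming hypothetical speaker names, labels, and file paths (none of them come from the repository):

```python
from typing import Dict, Tuple

# (speaker, emotion label) -> hand-picked reference audio.
# All names and paths below are made-up placeholders.
REFERENCE_AUDIO: Dict[Tuple[str, str], str] = {
    ("speaker_a", "excited"): "refs/speaker_a/excited_01.wav",
    ("speaker_a", "calm"): "refs/speaker_a/calm_01.wav",
    ("speaker_b", "calm"): "refs/speaker_b/calm_01.wav",
}


def pick_reference(speaker: str, label: str) -> str:
    """Return the reference clip chosen for this speaker and label."""
    try:
        return REFERENCE_AUDIO[(speaker, label)]
    except KeyError:
        raise KeyError(
            f"no reference audio picked for speaker={speaker!r}, label={label!r}"
        ) from None


calm_a = pick_reference("speaker_a", "calm")
calm_b = pick_reference("speaker_b", "calm")
```

Keying on the speaker as well as the label reflects the limitation noted in the list: a "calm" clip chosen for one speaker is not assumed to work for another.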

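Strengths:

  • Any ordinary TTS dataset can be used for emotion control; no manual emotion labels are needed.
  • Because no correspondence between emotion text and embeddings is specified during training, all emotion embeddings lie in one continuous space.
  • In theory, therefore, any emotion that appears in a speaker's dataset can be synthesized at inference time simply by feeding in the embedding of a clip with the target emotion, with no limit imposed by a fixed number of emotion classes.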
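
Because the embeddings are continuous vectors rather than class IDs, nothing restricts the input to an embedding taken directly from one clip. As one illustration, two reference embeddings could be blended by linear interpolation. This is a sketch of the idea only: `blend` is a hypothetical helper, the 3-dimensional vectors are toy values, and the source does not claim that a blended vector yields a perceptually intermediate emotion.

```python
from typing import List


def blend(emb_a: List[float], emb_b: List[float], alpha: float) -> List[float]:
    """Linear interpolation: alpha=0 returns emb_a, alpha=1 returns emb_b."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]


calm = [0.0, 2.0, 4.0]     # toy stand-in for a "calm" reference embedding
excited = [4.0, 2.0, 0.0]  # toy stand-in for an "excited" reference embedding
halfway = blend(calm, excited, 0.5)
```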

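Quickly selecting audio for each emotion

A clustering algorithm can automatically group the audio clips by their emotion embeddings, which roughly separates the categories whose emotions differ substantially. See emotion_clustering.ipynb for usage.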
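
To give a feel for the clustering step, here is a plain k-means sketch in pure Python. It is not the notebook's code: the 2-D points are toy stand-ins for real emotion embeddings (which would be loaded from the extracted *.emo.npy files), the function names are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster index per input point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation: k points spread evenly over the input order.
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean(members)
    return labels


# Toy 2-D "embeddings": two clips near (0, 0), two near (5, 5).
embeddings = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels = kmeans(embeddings, k=2)
```

The cluster indices carry no meaning on their own. After clustering, one would still listen to a few clips per cluster and name the clusters ("excited", "calm", and so on) by hand.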

Pre-requisites

  1. Python >= 3.6
  2. Clone this repository
  3. Install Python requirements. Please refer to requirements.txt
  4. Prepare datasets
  5. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace

# Preprocessing (g2p) for your own datasets. Preprocessed phonemes for nene have already been provided.
python preprocess.py --text_index 2 --filelists filelists/train.txt filelists/val.txt --text_cleaners japanese_cleaners

  6. Extract emotional embeddings. This generates a *.emo.npy file for each wav file.
python emotion_extract.py --filelists filelists/train.txt filelists/val.txt
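
For orientation, both the preprocessing and the extraction command read the same filelists. The sketch below assumes the pipe-separated layout used by multi-speaker VITS filelists (`audio path|speaker id|text`), in which `--text_index 2` selects the third field. The sample line and the `parse_filelist_line` helper are hypothetical, so check your own filelists before relying on this layout.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    audio_path: str
    speaker_id: int
    text: str


def parse_filelist_line(line: str, text_index: int = 2) -> Entry:
    """Split one 'path|speaker|text' line; text_index picks the text field."""
    fields: List[str] = line.rstrip("\n").split("|")
    if len(fields) <= text_index:
        raise ValueError(f"expected at least {text_index + 1} fields: {line!r}")
    return Entry(
        audio_path=fields[0],
        speaker_id=int(fields[1]),  # assumes the speaker id sits in field 1
        text=fields[text_index],
    )


# Made-up sample line, not taken from the repository's filelists.
entry = parse_filelist_line("dataset/nene/0001.wav|0|こんにちは\n")
```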

Training Example

# nene
python train_ms.py -c configs/nene.json -m nene

# If you are fine-tuning from a pretrained original VITS checkpoint:
python train_ms.py -c configs/nene.json -m nene --ckptD /path/to/D_xxxx.pth --ckptG /path/to/G_xxxx.pth

Inference Example

See inference.ipynb or use MoeGoe

Contributors

innnky, jaywalnut310, jik876, eltociear, juheeuu, wind4000