nangongmujd, GitHub

amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

athena

An open-source implementation of a sequence-to-sequence-based speech processing engine

bark

🔊 Text-Prompted Generative Audio Model

bert-vits2

VITS2 backbone with BERT

denoisenet

An implementation of DenoiseNet (https://arxiv.org/pdf/1701.01687.pdf)

diffsinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

faceswap

3D face swapping implemented in Python

forwardtacotron

⏩ Generating speech in a single forward pass without any attention!

fullsubnet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

generspeech

PyTorch implementation of GenerSpeech (NeurIPS'22): a text-to-speech model for zero-shot style transfer of out-of-domain (OOD) custom voices.

gpt-sovits

Just 1 minute of voice data can be used to train a good TTS model!

hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

istftnet-pytorch

iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

megatts2

Unofficial implementation of Mega-TTS 2

minigpt-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

montreal-forced-aligner

Command line utility for forced alignment using Kaldi

mtts

A demo of a Mandarin Chinese TTS frontend

natspeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

niftynet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy

opentransformer

A No-Recurrence Sequence-to-Sequence Model for Speech Recognition

paddlespeech

An easy-to-use speech toolkit including SOTA/streaming ASR with punctuation, influential TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.