Jean Du's Projects
Alibaba's open-source Kaldi with DFSMN support.
ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Uses CNN/LSTM and ensemble models to classify documents according to the API sequences each document calls.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
LLM based TTS model, providing inference/training/deployment full-stack ability.
Config files for my GitHub profile.
EmotiVoice: a Multi-Voice and Prompt-Controlled TTS Engine
End-to-End Speech Processing Toolkit
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
A Fundamental End-to-End Speech Recognition Toolkit
FSA/FST algorithms, differentiable, with PyTorch compatibility.
KWS demo based on CTC prefix beam search.
ModelScope: bring the notion of Model-as-a-Service to life.
Instant voice cloning by MyShell
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
Multilingual Voice Understanding Model
Self-supervised VAD (voice activity detection)
Production First and Production Ready End-to-End Keyword Spotting Toolkit
Production First and Production Ready End-to-End Speech Recognition Toolkit
A 10000+ hours dataset for Chinese speech recognition
Text Normalization & Inverse Text Normalization
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Robust Speech Recognition via Large-Scale Weak Supervision
Port of OpenAI's Whisper model in C/C++