cafew / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

License: BSD 2-Clause "Simplified" License

Python 100.00%

Speaker Diarization Using OpenAI Whisper

Speaker diarization pipeline based on OpenAI Whisper. I'd like to thank @m-bain for the Wav2Vec2 forced alignment and @mu4farooqi for the punctuation realignment algorithm.

This work is based on OpenAI's Whisper, Nvidia NeMo, and Facebook's Demucs.

Please star the project on GitHub (see top-right corner) if you appreciate my contribution to the community!

What is it

This repository combines Whisper's ASR capabilities with Voice Activity Detection (VAD) and speaker embeddings to identify the speaker of each sentence in the transcription generated by Whisper. First, the vocals are extracted from the audio to increase speaker embedding accuracy. The transcription is then generated with Whisper, and its timestamps are corrected and aligned using WhisperX to minimize diarization errors caused by time shift. Next, the audio is passed to MarbleNet for VAD and segmentation to exclude silences, and TitaNet extracts speaker embeddings to identify the speaker of each segment. Finally, the result is matched against the timestamps generated by WhisperX to assign a speaker to each word, and the assignment is realigned using punctuation models to compensate for minor time shifts.
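
The step that associates speaker segments with word timestamps can be illustrated with a minimal sketch. This is not the code in diarize.py or helpers.py; the names Word, overlap, and assign_speakers are hypothetical, and the rule used here (pick the speaker turn with the largest time overlap, falling back to the nearest turn) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """A transcribed word with aligned timestamps in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


# A speaker turn as (start_seconds, end_seconds, speaker_label); hypothetical layout.
SpeakerTurn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def distance_to_turn(t: float, turn: SpeakerTurn) -> float:
    """Distance in seconds from time t to the turn interval (0.0 if t is inside it)."""
    if t < turn[0]:
        return turn[0] - t
    if t > turn[1]:
        return t - turn[1]
    return 0.0


def assign_speakers(words: List[Word], turns: List[SpeakerTurn]) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it most.

    Words that overlap no turn (for example, words inside a region the VAD
    marked as silence) fall back to the turn nearest to the word's midpoint.
    """
    if not turns:
        raise ValueError("at least one speaker turn is required")
    labelled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t[0], t[1]))
        if overlap(w.start, w.end, best[0], best[1]) == 0.0:
            mid = (w.start + w.end) / 2
            best = min(turns, key=lambda t: distance_to_turn(mid, t))
        labelled.append((w.text, best[2]))
    return labelled


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    turns = [(0.0, 1.0, "Speaker 0"), (1.0, 2.0, "Speaker 1")]
    for text, speaker in assign_speakers(words, turns):
        print(f"{speaker}: {text}")
```

A mapping like this is sensitive to timestamp drift, since a word whose timestamps slip across a turn boundary is assigned to the wrong speaker. That is why the pipeline aligns timestamps before this step and realigns with punctuation afterwards.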

Whisper, WhisperX and NeMo parameters are hard-coded in diarize.py and helpers.py; I will add CLI arguments to change them later.

Usage

python diarize.py -a AUDIO_FILE_NAME

Known Limitations

Only tested on English, although several other languages are supported.
Overlapping speakers are not yet handled. One possible approach is to separate the audio and isolate one speaker at a time before feeding it into the pipeline, but this would require much more computation.
There may be some errors; please raise an issue if you encounter any.

Acknowledgements

Special thanks to @adamjonas for supporting this project.
