> **Warning**
> This project is still a work in progress! See Issues for a list of known bugs and missing features.
# whisper-live

`whisper-live` is a CLI tool for real-time audio transcription using Whisper-based models.
It currently loads models using the HuggingFace transformers library.
The implementation is heavily based on https://github.com/davabase/whisper_real_time. More advanced logic may be considered for future releases.
The process is as follows:

- Audio is recorded from the microphone by a background thread.
- Every N seconds, an audio chunk is created, where N is set by the `--recording-duration` argument.
- The audio chunk is sent to the model for transcription.
- The transcription is printed to the console.

Note: audio chunks (and transcriptions) are not aware of previous chunks. Once a chunk has been processed, its audio is discarded; once a transcription has been printed, its text is no longer considered.
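The loop above can be sketched roughly as follows. This is a minimal, stdlib-only sketch: `record_audio` and `transcribe` are hypothetical stand-ins for the real microphone thread and the Whisper model call, not the tool's actual code.

```python
import queue
import threading
import time

RECORDING_DURATION = 2  # seconds per chunk (what --recording-duration controls)
SAMPLE_RATE = 4         # toy rate: fake "samples" produced per second

def record_audio(q: queue.Queue, stop: threading.Event) -> None:
    """Background thread: push raw 'samples' onto a queue (stand-in for a mic)."""
    t = 0
    while not stop.is_set():
        q.put(t)  # a real implementation would push audio frames here
        t += 1
        time.sleep(1.0 / SAMPLE_RATE)

def transcribe(chunk: list) -> str:
    """Hypothetical stand-in for the Whisper model call."""
    return f"[chunk of {len(chunk)} samples]"

def main(run_seconds: float = 5.0) -> list:
    q: queue.Queue = queue.Queue()
    stop = threading.Event()
    threading.Thread(target=record_audio, args=(q, stop), daemon=True).start()

    transcripts = []
    deadline = time.monotonic() + run_seconds
    while time.monotonic() < deadline:
        time.sleep(RECORDING_DURATION)  # wait N seconds
        chunk = []
        while not q.empty():            # drain everything recorded so far
            chunk.append(q.get())
        if chunk:
            text = transcribe(chunk)    # the chunk is discarded after this call
            print(text)                 # the text is not reused afterwards
            transcripts.append(text)
    stop.set()
    return transcripts
```

Note how each iteration drains the queue and transcribes in isolation, mirroring the fact that chunks carry no context from their predecessors.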
## Installation

- Clone this repo.
- Make sure you have `hatch` installed. If not, follow the guide here.
- Create a virtual environment with `hatch env create` and activate it with `hatch shell`.
- Install `pre-commit` hooks for linting: `pre-commit install`.
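Assuming `hatch` and `pre-commit` are already on your `PATH`, the setup steps above look roughly like this (`<repo-url>` is a placeholder for this repository's URL):

```shell
git clone <repo-url> && cd whisper-live  # replace <repo-url> with this repo's URL
hatch env create                         # create the virtual environment
hatch shell                              # activate it
pre-commit install                       # install the linting hooks
```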
> **Warning**
> No unit tests are currently implemented!
## License

`whisper-live` is distributed under the terms of the MIT license.
This project was inspired by: