> **Warning**
> This project is still a work in progress! See Issues for a list of known bugs and missing features.
# whisper-live

`whisper-live` is a CLI tool for real-time audio transcription using Whisper-based models.
It currently loads models using the HuggingFace transformers library.
The implementation is heavily based on https://github.com/davabase/whisper_real_time. More advanced logic may be considered for future releases.
The process is as follows:

- Audio is recorded from the microphone by a background thread.
- Every N seconds, an audio chunk is created, where N is set by the `--recording-duration` argument.
- The audio chunk is sent to the model for transcription.
- The transcription is printed to the console.

Note: audio chunks (and transcriptions) are not aware of previous chunks. Once a chunk has been processed, its audio is discarded; once a transcription has been printed, its text is no longer considered.
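The loop above can be sketched roughly as follows. This is a minimal, stdlib-only sketch: `record_audio` and `transcribe` are hypothetical stand-ins for the real microphone thread and the Whisper model call, not the tool's actual code.

```python
import queue
import threading
import time

RECORDING_DURATION = 2  # seconds per chunk (what --recording-duration controls)
SAMPLE_RATE = 4         # toy rate: fake "samples" produced per second

def record_audio(q: queue.Queue, stop: threading.Event) -> None:
    """Background thread: push raw 'samples' onto a queue (stand-in for a mic)."""
    t = 0
    while not stop.is_set():
        q.put(t)  # a real implementation would push audio frames here
        t += 1
        time.sleep(1.0 / SAMPLE_RATE)

def transcribe(chunk: list) -> str:
    """Hypothetical stand-in for the Whisper model call."""
    return f"[chunk of {len(chunk)} samples]"

def main(run_seconds: float = 5.0) -> list:
    q: queue.Queue = queue.Queue()
    stop = threading.Event()
    threading.Thread(target=record_audio, args=(q, stop), daemon=True).start()

    transcripts = []
    deadline = time.monotonic() + run_seconds
    while time.monotonic() < deadline:
        time.sleep(RECORDING_DURATION)  # wait N seconds
        chunk = []
        while not q.empty():            # drain everything recorded so far
            chunk.append(q.get())
        if chunk:
            text = transcribe(chunk)    # the chunk is discarded after this call
            print(text)                 # the text is not reused afterwards
            transcripts.append(text)
    stop.set()
    return transcripts
```

Note how each iteration drains the queue and transcribes in isolation, mirroring the fact that chunks carry no context from their predecessors.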
## Installation

- Clone this repo.
- Make sure you have `hatch` installed. If not, follow the guide here.
- Create a virtual environment with `hatch env create` and activate it with `hatch shell`.
- Install `pre-commit` hooks for linting: `pre-commit install`.
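Assuming `hatch` and `pre-commit` are already on your `PATH`, the setup steps above look roughly like this (`<repo-url>` is a placeholder for this repository's URL):

```shell
git clone <repo-url> && cd whisper-live  # replace <repo-url> with this repo's URL
hatch env create                         # create the virtual environment
hatch shell                              # activate it
pre-commit install                       # install the linting hooks
```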
> **Warning**
> No unit tests are currently implemented!
## License

`whisper-live` is distributed under the terms of the MIT license.
This project was inspired by: