Transcribe speech to text in Node.js using OpenAI's Whisper models converted to the cross-platform ONNX format.
- Add the dependency to your project
npm install whisper-onnx-speech-to-text
- Download the Whisper model of your choice
npx whisper-onnx-speech-to-text download
import { initWhisper } from 'whisper-onnx-speech-to-text';
const whisper = await initWhisper("base.en");
const transcript = await whisper.transcribe("example/sample.wav");
[
  {
    text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.",
    chunks: [
      { timestamp: [0, 8.18], text: " And so my fellow Americans ask not what your country can do for you" },
      { timestamp: [8.18, 11.06], text: " ask what you can do for your country." }
    ]
  }
]
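The result shape shown above can be post-processed without touching the library itself. The following is a minimal sketch that turns each chunk into a timestamped line. The `Chunk` and `TranscriptEntry` interfaces and both helper functions are hypothetical names inferred from the sample output, not types exported by the package.

```typescript
// Shapes inferred from the sample output above (assumption: not the package's own types).
interface Chunk {
  timestamp: [number, number]; // [start, end] in seconds
  text: string;
}

interface TranscriptEntry {
  text: string;
  chunks: Chunk[];
}

// Format a number of seconds as mm:ss.cc (minutes, seconds, hundredths).
function formatTime(seconds: number): string {
  const totalCentis = Math.round(seconds * 100);
  const minutes = Math.floor(totalCentis / 6000);
  const secs = Math.floor((totalCentis % 6000) / 100);
  const centis = totalCentis % 100;
  const pad = (n: number): string => (n < 10 ? "0" : "") + String(n);
  return pad(minutes) + ":" + pad(secs) + "." + pad(centis);
}

// Produce one "[start - end] text" line per chunk, in order.
function toTimestampedLines(transcript: TranscriptEntry[]): string[] {
  const lines: string[] = [];
  for (const entry of transcript) {
    for (const chunk of entry.chunks) {
      const start = formatTime(chunk.timestamp[0]);
      const end = formatTime(chunk.timestamp[1]);
      lines.push("[" + start + " - " + end + "] " + chunk.text.trim());
    }
  }
  return lines;
}
```

Passing the `transcript` value from the example to `toTimestampedLines` would give one line per chunk, the first starting with `[00:00.00 - 00:08.18]`.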
The initWhisper() function takes the name of the model and returns an instance of the Whisper class initialized with the chosen model.
The Whisper class has the following methods:
- transcribe(filePath: string, language?: string): transcribes speech from a wav file.
  - filePath: path to the wav file.
  - language: target language for recognition, given as the full language name in English, such as 'spanish'.
- disposeModel(): disposes of the initialized model.
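The two methods are naturally used together: transcribe one or more files, then release the model. Below is a minimal sketch of that pattern. `WhisperLike` and `transcribeAll` are hypothetical names introduced for illustration; the interface only mirrors the method descriptions above, and the sketch assumes each returned entry has a `text` field as in the sample output. It does not assume whether `disposeModel()` is synchronous or returns a promise.

```typescript
// Hypothetical structural type mirroring the two methods described above.
interface WhisperLike {
  transcribe(filePath: string, language?: string): Promise<Array<{ text: string }>>;
  disposeModel(): unknown;
}

// Transcribe several wav files with one model instance, then always dispose it.
async function transcribeAll(
  whisper: WhisperLike,
  filePaths: string[],
  language?: string
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  try {
    for (const filePath of filePaths) {
      const entries = await whisper.transcribe(filePath, language);
      // Join the text of all returned entries into one string per file.
      results[filePath] = entries.map((entry) => entry.text).join("").trim();
    }
  } finally {
    // Runs on success and on error; awaiting is harmless if the call is synchronous.
    await whisper.disposeModel();
  }
  return results;
}
```

With the real package this might be called as `await transcribeAll(await initWhisper("base.en"), ["example/sample.wav"])`, provided the returned instance matches the assumed shape. The `try`/`finally` keeps the model from staying loaded if a transcription throws.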