DeepSpeechJs

Run-time transcription with DeepSpeech from NodeJs, using the native client. Some examples and tests.

What's DeepSpeech?

DeepSpeech is an open-source Speech-To-Text engine. Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.

DeepSpeech run-time transcript, from Node Js

You want to run a DeepSpeech speech-to-text transcription of a well formatted WAV file at run time, from NodeJs. I tested two options:

  1. Spawning, from your NodeJs main thread, an external DeepSpeech command line program. That's the simplest but slowest way. In general, spawning an external process and catching its stdout is a trivial approach, applicable whenever you have no better inter-process communication option.

    Example: deepSpeechTranscriptSpawn.js.

  2. Using the DeepSpeech native NodeJs client interface. That's the more performant way.

    Example: deepSpeechTranscriptNative.js.

    The example is very rough and presumes the audio file is a "well formatted" WAV file. The audio file is just read into memory and the deepspeech model.stt() API is called. The official examples repo contains audio examples that show how to validate a WAV file and how to process speech from streaming / in-memory buffers.
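Option 1 (spawning the command line program and catching its stdout) can be sketched as follows. This is a minimal illustration, not the code of deepSpeechTranscriptSpawn.js: the function names are hypothetical, the `deepspeech` executable is assumed to be on the PATH (for example inside the activated virtualenv), and the `--model`, `--scorer` and `--audio` flags are the ones used in the Install section.

```javascript
const { spawn } = require('child_process')

// Build the argument list for the deepspeech command line program.
function buildCliArgs(modelPath, scorerPath, audioPath) {
  return ['--model', modelPath, '--scorer', scorerPath, '--audio', audioPath]
}

// Spawn a command and resolve with its trimmed stdout.
// stderr is passed through to the parent process, so log lines stay visible.
function runAndCaptureStdout(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] })
    let stdout = ''
    child.stdout.setEncoding('utf8')
    child.stdout.on('data', chunk => { stdout += chunk })
    child.on('error', reject) // e.g. the executable was not found
    child.on('close', code => {
      if (code === 0) resolve(stdout.trim())
      else reject(new Error(`${command} exited with code ${code}`))
    })
  })
}

// Assumption: the executable is named `deepspeech` and is on the PATH.
function transcribeWithCli(modelPath, scorerPath, audioPath) {
  return runAndCaptureStdout('deepspeech', buildCliArgs(modelPath, scorerPath, audioPath))
}

// Usage (paths as in the Install section):
// transcribeWithCli('./models/deepspeech-0.9.3-models.pbmm',
//                   './models/deepspeech-0.9.3-models.scorer',
//                   './audio/4507-16021-0012.wav').then(console.log)
```

Every call pays the full process start-up and model loading cost again, which is why this approach is the slow one.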

DeepSpeech official native NodeJs API
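Option 2 can be sketched in a few lines with the official `deepspeech` npm package. This is a rough illustration under stated assumptions, not the code of deepSpeechTranscriptNative.js: `loadModel` and `transcribeFile` are hypothetical names, and, as described above, the file is read into memory as-is and handed to `model.stt()` without any validation.

```javascript
const fs = require('fs')

// Load the acoustic model and, optionally, the external scorer.
// The package is required lazily so that this file can be loaded
// even where `deepspeech` is not installed.
function loadModel(modelPath, scorerPath) {
  const DeepSpeech = require('deepspeech')
  const model = new DeepSpeech.Model(modelPath)
  if (scorerPath) model.enableExternalScorer(scorerPath)
  return model
}

// Read the whole audio file into memory and transcribe it.
// No validation: the file is assumed to be 16-bit, 16 kHz, mono PCM.
// The short WAV header is passed along with the samples.
function transcribeFile(model, audioPath) {
  const audioBuffer = fs.readFileSync(audioPath)
  return model.stt(audioBuffer)
}

// Usage (paths as in the Install section):
// const model = loadModel('./models/deepspeech-0.9.3-models.pbmm',
//                         './models/deepspeech-0.9.3-models.scorer')
// console.log(transcribeFile(model, './audio/4507-16021-0012.wav'))
```

Keeping the loaded model around and reusing it for several files avoids reloading it for every transcript, which the spawn approach cannot do.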

What's a well formatted WAV audio file?

DeepSpeech requires a 16-bit, 16 kHz, mono WAV input audio file. To record such a file:

sudo apt install sox
sudo apt install mediainfo

rec -r 16000 -c 1 -b 16 -e signed-integer my_recording.wav

mediainfo my_recording.wav
General
Complete name                            : my_recording.wav
Format                                   : Wave
File size                                : 64.0 KiB
Duration                                 : 2 s 48 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 256 kb/s

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 2 s 48 ms
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 16.0 kHz
Bit depth                                : 16 bits
Stream size                              : 64.0 KiB (100%)
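The same check can be done from NodeJs before calling the engine. The sketch below parses the RIFF/WAVE header and compares it with the format stated above (PCM, 1 channel, 16 kHz, 16 bits). The function names are hypothetical and this is not part of the repository; it only handles plain RIFF files with a standard `fmt ` chunk.

```javascript
// Read the audio format fields from the `fmt ` chunk of a WAV file buffer.
function readWavFormat(buffer) {
  if (buffer.length < 12 ||
      buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file')
  }
  let offset = 12
  // Walk the chunk list: 4-byte id, 4-byte little-endian size, then the body.
  while (offset + 8 <= buffer.length) {
    const chunkId = buffer.toString('ascii', offset, offset + 4)
    const chunkSize = buffer.readUInt32LE(offset + 4)
    const body = offset + 8
    if (chunkId === 'fmt ') {
      if (body + 16 > buffer.length) throw new Error('truncated fmt chunk')
      return {
        audioFormat: buffer.readUInt16LE(body),        // 1 means PCM
        channels: buffer.readUInt16LE(body + 2),
        sampleRate: buffer.readUInt32LE(body + 4),
        bitsPerSample: buffer.readUInt16LE(body + 14)
      }
    }
    offset = body + chunkSize + (chunkSize % 2) // chunks are word aligned
  }
  throw new Error('fmt chunk not found')
}

// True when the format matches the 16-bit, 16 kHz, mono PCM requirement.
function isWellFormatted(format) {
  return format.audioFormat === 1 &&
         format.channels === 1 &&
         format.sampleRate === 16000 &&
         format.bitsPerSample === 16
}

// Usage:
// const format = readWavFormat(require('fs').readFileSync('my_recording.wav'))
// console.log(format, isWellFormatted(format))
```

For the recording above, the reported 256 kb/s bit rate is consistent with these values: 16000 samples/s × 16 bits × 1 channel = 256000 bit/s.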

Install

  1. Install DeepSpeech

    # Create and activate a virtualenv
    virtualenv -p python3 $HOME/tmp/deepspeech-venv/
    source $HOME/tmp/deepspeech-venv/bin/activate
    
    # Install DeepSpeech
    pip3 install deepspeech
    
    # Download pre-trained English model files
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
    
    mkdir models
    mv *.pbmm *.scorer models/
    
    # Download example audio files
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
    tar xvf audio-0.9.3.tar.gz
    
    # Transcribe an audio file
    deepspeech --model models/deepspeech-0.9.3-models.pbmm --scorer models/deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav

  2. Install this repo

    git clone https://github.com/solyarisoftware/deepspeechjs && cd deepspeechjs

  3. Install the official DeepSpeech npm package

    npm install deepspeech

Run the test

The bash script test_elapsed.sh compares the elapsed transcription time for the audio file ./audio/4507-16021-0012.wav (corresponding to the text "why should one halt on the way") in 3 cases: the DeepSpeech command line program, the spawn example, and the native client example.

(deepspeech-venv) $ test_elapsed.sh

deepspeech_cli

Loading model from file models/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-31 11:04:53.878150: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0121s.
Loading scorer from files models/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000152s.
Running inference.
why should one halt on the way
Inference took 1.527s for 2.735s audio file.

real	0m1,798s
user	0m2,483s
sys	0m0,495s

deepSpeechTranscriptSpawn

why should one halt on the way

real	0m1,832s
user	0m2,509s
sys	0m0,544s

deepSpeechTranscriptNative

usage: node deepSpeechTranscriptNative [<model pbmm file>] [<model scorer file>] [<audio file>]
using: node deepSpeechTranscriptNative ./models/deepspeech-0.9.3-models.pbmm ./models/deepspeech-0.9.3-models.scorer ./audio/4507-16021-0012.wav

TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-31 11:05:01.371379: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

pbmm      : ./models/deepspeech-0.9.3-models.pbmm
scorer    : ./models/deepspeech-0.9.3-models.scorer
elapsed   : 11ms

audio file: ./audio/4507-16021-0012.wav
transcript: why should one halt on the way
elapsed   : 1553ms

real	0m1,669s
user	0m1,928s
sys	0m0,103s

As expected, the native client is faster than the spawn client: 1669ms versus 1832ms of total (real) time in this run, about 160ms less. Inside the native client, the transcription step alone took 1553ms.
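The `elapsed` lines in the native client output can be produced by timing each step separately inside the NodeJs process. A minimal sketch of such a helper, with a hypothetical name and not taken from the repository:

```javascript
// Run a synchronous function and report how long it took, in whole milliseconds.
function timed(fn) {
  const start = process.hrtime.bigint() // monotonic clock, in nanoseconds
  const result = fn()
  const elapsedMs = Number((process.hrtime.bigint() - start) / 1000000n)
  return { result, elapsedMs }
}

// Usage, assuming `model` is a loaded DeepSpeech model and `audioBuffer`
// holds the audio data:
// const { result, elapsedMs } = timed(() => model.stt(audioBuffer))
// console.log(`transcript: ${result}`)
// console.log(`elapsed   : ${elapsedMs}ms`)
```

Timing inside the process isolates model loading and inference from the NodeJs start-up time, which the `real` value printed by `time` includes.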

Disclaimer

IMPORTANT: unfortunately the npm package deepspeech causes a crash with Node version 16.0.0. See issue. To run this project you have to downgrade the installed Node version. For example, I had success with Node version 14.16.1.
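A script can fail early with a readable message instead of crashing. The sketch below is an illustration with hypothetical names. It assumes that every Node release from major version 16 onward is affected, whereas only 16.0.0 (crash) and 14.16.1 (working) are reported above, so adjust the threshold to what you observe.

```javascript
// Extract the major version number from a version string such as '14.16.1'.
function nodeMajorVersion(version = process.versions.node) {
  return Number.parseInt(version.split('.')[0], 10)
}

// Throw when the given Node version is at or above the first major version
// assumed to be incompatible with the deepspeech npm package.
// Assumption: firstBadMajor = 16, extrapolated from the 16.0.0 crash.
function assertSupportedNode(version = process.versions.node, firstBadMajor = 16) {
  if (nodeMajorVersion(version) >= firstBadMajor) {
    throw new Error(
      `Node ${version} may crash with the deepspeech npm package; ` +
      'try an older release such as 14.16.1'
    )
  }
}

// Usage, before requiring the deepspeech package:
// assertSupportedNode()
```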

Changelog

  • 0.0.9 test script testPerformances.sh improved

To do

License

MIT (c) Giorgio Robino
