
alex-asr's Introduction

Alex-ASR

Incremental speech recognition decoder for Kaldi NNET2 and GMM models with Python bindings (tested with Python 2.7 and Python 3.4).

Python module documentation is here.

Example Usage

from alex_asr import Decoder
import wave

# Load the speech recognition model from the "asr_model_dir" directory.
decoder = Decoder("asr_model_dir/")

# Load audio frames from the input wav file.
data = wave.open("input.wav")
frames = data.readframes(data.getnframes())

# Feed the audio data to the decoder.
decoder.accept_audio(frames)
decoder.decode(data.getnframes())
decoder.input_finished()

# Get and print the best hypothesis.
prob, word_ids = decoder.get_best_path()
print(" ".join(map(decoder.get_word, word_ids)))

Build & Install

Installing requirements on Ubuntu 14.04

apt-get update
apt-get install -y build-essential libatlas-base-dev python-dev python-pip git wget gfortran g++ unzip zlib1g-dev automake autoconf libtool subversion
pip install Cython

Installation

$ python setup.py install

Configuration

  • The decoder takes one argument, model_dir, for initialization: a directory containing the decoder model and its configuration.
  • It expects a file called alex_asr.conf in that directory. This file specifies the filenames of all other configs and adheres to Kaldi configuration conventions (i.e. one option per line in a text file).
  • All filenames specified in this config are relative to model_dir.
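For illustration, a model_dir for an nnet2 model might look like this (the filenames match the example config below; the exact set of files depends on which features you enable):

```
model_dir/
├── alex_asr.conf      # main config
├── final.mdl          # acoustic model
├── HCLG.fst           # decoding graph
├── words.txt          # word symbol table
├── final.mat          # LDA transform matrix
├── decoder.cfg
├── decodable.cfg
└── mfcc.cfg
```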

Example of alex_asr.conf that should reside in model_dir:

--model_type=nnet2     # Supported model types are nnet2 (nnet2::AmNnet) and gmm (AmDiagGmm).
--model=final.mdl      # Filename of the mdl file for the decoder.
--hclg=HCLG.fst        # Filename of the fst file with the decoding HCLG fst.
--words=words.txt      # Filename with a list of words (each line contains "<word> <word-id>").
--mat_lda=final.mat    # Filename of the LDA transform matrix.
--mat_cmvn=cmvn.mat    # Filename of the CMVN matrix with global CMVN stats used for the OnlineCMVN estimator.
--use_lda=true         # true/false; whether to apply the LDA transform specified by --mat_lda to MFCC features.
--use_ivectors=true    # true/false; whether to use iVector features for decoding
                       # (depends on your decoder). If set to true, you also need to specify --cfg_ivector
                       # with the configuration for the iVector extractor.
--use_cmvn=false       # true/false; whether to do OnlineCMVN estimation. Uses --mat_cmvn as an initial
                       # matrix for OnlineCMVN estimation. If set to true, --cfg_cmvn must specify a file
                       # with the configuration for the estimator.
--use_pitch=false      # true/false; whether to use the pitch feature. If true, --cfg_pitch must specify a file
                       # with the configuration of the pitch extractor.
--bits_per_sample=16   # 8/16; number of bits per sample.

# These parameters specify the filenames of the configurations for the individual parts of the decoder, detailed below.
--cfg_decoder=decoder.cfg
--cfg_decodable=decodable.cfg
--cfg_mfcc=mfcc.cfg
--cfg_cmvn=cmvn.cfg
--cfg_splice=splice.cfg
--cfg_endpoint=endpoint.cfg
--cfg_ivector=ivector.cfg
--cfg_pitch=pitch.cfg
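The --bits_per_sample value must match the sample width of the audio you feed to the decoder. One quick way to check a wav file is the stdlib wave module; for a self-contained illustration, this sketch builds a 16-bit mono wav in memory and reads its width back (the "16" mirrors the example config above):

```python
import io
import struct
import wave

# Build a tiny 16-bit mono wav entirely in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes per sample = 16 bits
    w.setframerate(8000)
    w.writeframes(struct.pack("<4h", 0, 100, -100, 0))

# Read the sample width back, as you would for a real input file.
buf.seek(0)
with wave.open(buf, "rb") as w:
    bits_per_sample = w.getsampwidth() * 8

print(bits_per_sample)  # 16
```

For a real input, open the file directly with wave.open("input.wav") and compare getsampwidth() * 8 against your config.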

Decoder configuration

Example decoder.cfg:

--max-active=7000
--min-active=200
--beam=15.0
--lattice-beam=8.0

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/decoder/lattice-faster-decoder.h#L69

Decodable configuration

Example decodable.cfg:

--acoustic-scale=0.1

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/nnet2/online-nnet2-decodable.h#L48

MFCC configuration

Example mfcc.cfg:

--low-freq=128
--high-freq=3800

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/feature-mfcc.h#L63

Online CMVN configuration

Online CMVN configuration is needed when you set --use_cmvn=true.

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/online-feature.h#L176

Splice configuration

Example splice.cfg:

--left-context=3
--right-context=3

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/online-feature.h#L384
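To illustrate what the splicing configured above does, here is a pure-Python sketch (not Kaldi's implementation) that concatenates each feature frame with its left and right context frames, repeating the edge frames at the boundaries:

```python
def splice(frames, left=3, right=3):
    """Concatenate each frame with `left` preceding and `right` following
    frames; indices past the edges are clamped so edge frames repeat."""
    n = len(frames)
    out = []
    for i in range(n):
        spliced = []
        for j in range(i - left, i + right + 1):
            j = min(max(j, 0), n - 1)   # clamp at the boundaries
            spliced.extend(frames[j])
        out.append(spliced)
    return out

# Three one-dimensional "frames", spliced with context 1 on each side.
frames = [[1.0], [2.0], [3.0]]
print(splice(frames, left=1, right=1))
# [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]
```

With --left-context=3 and --right-context=3 as in the example config, each output frame would be 7 input frames wide.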

Endpoint configuration

Endpointing configuration is needed if you intend to call EndpointDetected and TrailingSilenceLength functions of the Decoder.

Example endpoint.cfg:

--endpoint.silence_phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/online2/online-endpoint.h#L159

IVector configuration

Ivector configuration is needed if you set --use_ivectors=true.

Example ivector.cfg:

--splice-config=ivector_extractor/splice_opts
--cmvn-config=ivector_extractor/online_cmvn.conf
--lda-matrix=ivector_extractor/final.mat
--global-cmvn-stats=ivector_extractor/global_cmvn.stats
--diag-ubm=ivector_extractor/final.dubm
--ivector-extractor=ivector_extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/online2/online-ivector-feature.h#L110

Pitch configuration

Pitch configuration is needed if you set --use_pitch=true.

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/pitch-functions.h#L136

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/pitch-functions.h#L250

Regenerate and publish documentation

Provided you have built the module, the documentation can be built with the following commands:

$ cd doc
$ bash build_and_push_to_web.sh 

This uses sphinx to build the documentation and pushes it as a GitHub page of the repository.

License

Copyright 2015 Charles University in Prague

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Credits

Author: Lukas Zilka ([email protected]).

Adapted from Ondra Platek's PyKaldi https://github.com/UFAL-DSG/pykaldi

Integrated Cython code from: https://github.com/vchahun/pyfst

alex-asr's People

Contributors

jurcicek, ondrejklejch, oplatek, proycon, ticcky


alex-asr's Issues

Install Alex asr

Hi,
I want to install alex-asr to try online Kaldi recognition, but after these commands:
apt-get update
apt-get install -y build-essential libatlas-base-dev python-dev python-pip git wget gfortran g++ unzip zlib1g-dev automake autoconf libtool subversion
pip install Cython

when I try to install the script with Python 2.7, the system shows this error:

running install
running bdist_egg
running egg_info
writing requirements to alex_asr.egg-info/requires.txt
writing alex_asr.egg-info/PKG-INFO
writing top-level names to alex_asr.egg-info/top_level.txt
writing dependency_links to alex_asr.egg-info/dependency_links.txt
reading manifest file 'alex_asr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'alex_asr.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
It appears that the env is prepared. If there are errors, try deleting libs/ and rerunning the script.
echo alex_asr.a libs/kaldi/tools/openfst//src/lib/.libs/libfst.a libs/kaldi/src/online2/kaldi-online2.a libs/kaldi/src/ivector/kaldi-ivector.a libs/kaldi/src/nnet2/kaldi-nnet2.a libs/kaldi/src/lat/kaldi-lat.a libs/kaldi/src/decoder/kaldi-decoder.a libs/kaldi/src/cudamatrix/kaldi-cudamatrix.a libs/kaldi/src/feat/kaldi-feat.a libs/kaldi/src/transform/kaldi-transform.a libs/kaldi/src/gmm/kaldi-gmm.a libs/kaldi/src/thread/kaldi-thread.a libs/kaldi/src/hmm/kaldi-hmm.a libs/kaldi/src/tree/kaldi-tree.a libs/kaldi/src/matrix/kaldi-matrix.a libs/kaldi/src/util/kaldi-util.a libs/kaldi/src/base/kaldi-base.a > setup.py.add_libs
echo -msse -msse2 -Wall -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -g -Ilibs/kaldi/tools/openfst//include -Ilibs/kaldi/src -Ilibs/kaldi/tools/ATLAS/include -Ilibs/kaldi/tools/CLAPACK -Wno-sign-compare -I. -fPIC > setup.py.cxxflags_kaldi
echo "-Llibs/kaldi/tools/openfst//lib -Llibs/kaldi/tools/openfst//lib/fst" > setup.py.cxxflags_pyfst
skipping 'alex_asr/decoder.cpp' Cython extension (up-to-date)
cythoning alex_asr/fst/_fst.pyx to alex_asr/fst/_fst.cpp

Error compiling Cython file:

...
label_fst_map['ROOT'] = self
for label, fst in label_fst_map.items():
assert (not fst.osyms or fst.osyms == self.osyms) # output symbols must match
label_id = self.osyms[label]
label_fst_pairs.push_back(pair[int, libfst.ConstStdVectorFstPtr](label_id, fst.fst))
libfst.Replace(label_fst_pairs, result.fst, self.osyms['ROOT'], epsilon)

^

alex_asr/fst/_fst.pyx:766:22: no suitable method found

Error compiling Cython file:

...
label_fst_map['ROOT'] = self
for label, fst in label_fst_map.items():
assert (not fst.osyms or fst.osyms == self.osyms) # output symbols must match
label_id = self.osyms[label]
label_fst_pairs.push_back(pair[int, libfst.ConstLogVectorFstPtr](label_id, fst.fst))
libfst.Replace(label_fst_pairs, result.fst, self.osyms['ROOT'], epsilon)

^

alex_asr/fst/_fst.pyx:1407:22: no suitable method found
building 'alex_asr.fst._fst' extension
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -Ilibs/kaldi/tools/openfst/include -Ilibs/kaldi/src -I/usr/include/python2.7 -c alex_asr/fst/_fst.cpp -o build/temp.linux-x86_64-2.7/alex_asr/fst/_fst.o -std=c++0x -Llibs/kaldi/tools/openfst//lib -Llibs/kaldi/tools/openfst//lib/fst
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
alex_asr/fst/_fst.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
#error Do not use this file, it is the result of a failed Cython compilation.
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Can you help me?

Thanks in advance

Mat

I am looking to hire an ESP32 Python & C programmer to help me expose the existing ESP32 I2S C driver to ESP32 Python (LoBo version) for microphone recording and audio playing for speech recognition. Can I hire you? [email protected]

Hi
I am looking to hire an ESP32 Python & C programmer to help me expose the existing ESP32 I2S C driver to ESP32 Python (LoBo version) for microphone recording and audio playing for speech recognition.
Can I hire you? [email protected]

There is an existing working sample in C of a firmware interface for ESP32 I2S, reading a microphone via the I2S ICS-43434 IC and playing audio using the I2S MAX98357A IC.
A sample of using ESP32 I2S in C can be seen here: https://goo.gl/oSGTsu and https://github.com/MrBuddyCasino/ESP32_Alexa
The target is to expose the C lib from the ESP32 IDF to Python, to send audio to Google speech recognition and Google text-to-speech from the ESP32 (the free service, as in Chrome).
The development environment must run as Python from https://github.com/loboris/MicroPython_ESP32_psRAM_LoBo
The work is fixed-price and milestone-based; payment will be in 6 milestones as below. I will provide the hardware for this:

  1. Expose to LoBo Python the ESP32-IDE /MicroPython/Tools/esp-idf/components/driver/include/driver/I2S.H and I2S.C by integrating them into the machine module, so that dir(machine) shows the I2S interface and all commands to set up I2S for recording and playing using the ESP32 DMA. With a test Python app showing that everything in the machine.I2S module works. No new make file will be created, only code added to the machine lib C module.
  2. Use the above machine.I2S to record to and play from a wav file using a digital mic on the I2S ICS-43434 IC and play wav using the MAX98357A IC. With a test Python app showing that recording and playing work with arbitrary files, as play("test.wav") and record("rec.wav"); both must work at the same time. Also support FLAC encoding, so that if test.wav is a FLAC file it is played correctly, and a command record_flac("rec_flac.wav") produces a FLAC file.
  3. As above, but record and play from RAM, with FLAC encoding as needed by Google SR and TTS. See https://cloud.google.com/speech-to-text/docs/encoding.
    With a test Python app demonstrating it works.
  4. Use the RAM FLAC recording to do SR, so the recognized mic audio is returned as text from Google back to the ESP32.
  5. Send any text to be played by Google text-to-speech from the ESP32.
    Replace the button recording function with a recognized voice keyword as the gate to full SR (as with "OK Google").
    All from Python, as:
  6. You say the wake word "Lovey's", recorded to RAM in a loop all the time. A lower audio level is the mark to send a chunk of 1 sec ("Lovey's") of data to Google SR after FLAC encoding. Mind adjust_for_ambient_noise(source), see https://goo.gl/M5zrKR
  7. Wait to get "Lovey's" recognized as returned text; if not, go to stage 1.
  8. Play TTS "Yes master?"
  9. You say to the ESP32 mic (I2S ICS-43434) "What is the baby temperature now?"
  10. Get back the text on the ESP32 in Python: "What is the baby temperature now". If not recognized, send text-to-speech saying "please repeat the question with silence between the words". Go to 4.
  11. The ESP32 sends to TTS "The temperature now is 30 degrees".
  12. The returned wav or FLAC is played using the MAX98357A over I2S.
  13. The command loop continues until there is no command for 30 sec; then the keyword "Lovey's" must be said again. Go to 1.
  14. The Google API uses wav Audio Encoding name: FLAC (Free Lossless Audio Codec), 16-bit or 24-bit, required for streams. See a sample of FLAC in Python here. So, FLAC can be done in C or Python, but the interface must be from Python. See https://goo.gl/mtrwVN
  15. Maybe this is a good source for FLAC in C: https://goo.gl/cCYFTh
    See 200 repository results for "speech recognition" +Python https://goo.gl/UqNf1w
    Also see https://realpython.com/python-speech-recognition/#picking-a-python-speech-recognition-package
    Contact: [email protected]
    Skype: nissim.test

Installation problem of alex-asr

Hello,
I want to use alex_asr (https://github.com/UFAL-DSG/alex-asr) to deploy my Kaldi model. When installing it, I get the following error message:

******************Message *************************
g++ -msse -msse2 -Wall -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -g -Ilibs/kaldi/tools/openfst//include -Ilibs/kaldi/src -Ilibs/kaldi/tools/ATLAS/include -Ilibs/kaldi/tools/CLAPACK -Wno-sign-compare -I. -fPIC -c -o src/decoder.o src/decoder.cc
In file included from ./src/decoder_config.h:15:0,
from ./src/decoder.h:8,
from src/decoder.cc:1:
./src/utils.h:63:48: error: ‘vector’ has not been declared
vector alpha,
^~~~~~
./src/utils.h:63:54: error: expected ‘,’ or ‘...’ before ‘<’ token
vector alpha,
^
src/decoder.cc: In member function ‘bool alex_asr::Decoder::GetBestPath(std::vector
, kaldi::BaseFloat
)’:
src/decoder.cc:190:50: error: ‘vector’ does not name a type; did you mean ‘perror’?
static_cast<vector *>(0),
^~~~~~
perror
src/decoder.cc:190:56: error: expected ‘>’ before ‘<’ token
static_cast<vector *>(0),
^
src/decoder.cc:190:56: error: expected ‘(’ before ‘<’ token
src/decoder.cc:190:56: error: expected primary-expression before ‘<’ token
src/decoder.cc:190:62: error: expected primary-expression before ‘>’ token
static_cast<vector *>(0),
^
src/decoder.cc:190:65: error: expected primary-expression before ‘>’ token
static_cast<vector *>(0),
^
src/decoder.cc: In member function ‘int32 alex_asr::Decoder::TrailingSilenceLength()’:
src/decoder.cc:241:58: error: no matching function for call to ‘TrailingSilenceLength(kaldi::TransitionModel&, std::cxx11::string&, kaldi::LatticeFasterOnlineDecoder&)’
*decoder
);
^
In file included from ./src/decoder_config.h:12:0,
from ./src/decoder.h:8,
from src/decoder.cc:1:
libs/kaldi/src/online2/online-endpoint.h:192:7: note: candidate: template<class FST, class DEC> int32 kaldi::TrailingSilenceLength(const kaldi::TransitionModel&, const string&, const DEC&)
int32 TrailingSilenceLength(const TransitionModel &tmodel,
^~~~~~~~~~~~~~~~~~~~~
libs/kaldi/src/online2/online-endpoint.h:192:7: note: template argument deduction/substitution failed:
src/decoder.cc:241:58: note: couldn't deduce template parameter ‘FST’
*decoder
);
^
: recipe for target 'src/decoder.o' failed
make: *** [src/decoder.o] Error 1
error: [Errno 2] No such file or directory: 'setup.py.add_libs'

I need your help please.

Thanks in advance
