
alex-asr's Introduction

Alex-ASR

Incremental speech recognition decoder for Kaldi NNET2 and GMM models with Python bindings (tested with Python 2.7 and Python 3.4).

Python module documentation is here.

Example Usage

from alex_asr import Decoder
import wave

# Load the speech recognition model from the "asr_model_dir" directory.
decoder = Decoder("asr_model_dir/")

# Load audio frames from the input wav file.
data = wave.open("input.wav")
frames = data.readframes(data.getnframes())

# Feed the audio data to the decoder.
decoder.accept_audio(frames)
decoder.decode(data.getnframes())
decoder.input_finished()

# Get and print the best hypothesis.
prob, word_ids = decoder.get_best_path()
print(" ".join(map(decoder.get_word, word_ids)))

Build & Install

Installing requirements on Ubuntu 14.04

apt-get update
apt-get install -y build-essential libatlas-base-dev python-dev python-pip git wget gfortran g++ unzip zlib1g-dev automake autoconf libtool subversion
pip install Cython

Installation

$ python setup.py install

Configuration

  • The decoder takes one argument, model_dir, for initialization: a directory containing the decoder model and its configuration.
  • It expects a file called alex_asr.conf in that directory. This file specifies the filenames of all other configs and adheres to Kaldi configuration conventions (i.e. one option per line in a text file).
  • All filenames specified in this config are relative to model_dir.
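For illustration, a model_dir for an nnet2 model might look like this (the filenames match the example config below; the exact set of files depends on which features you enable):

```
model_dir/
├── alex_asr.conf      # main config
├── final.mdl          # acoustic model
├── HCLG.fst           # decoding graph
├── words.txt          # word symbol table
├── final.mat          # LDA transform matrix
├── decoder.cfg
├── decodable.cfg
└── mfcc.cfg
```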

Example of alex_asr.conf that should reside in model_dir:

--model_type=nnet2     # Supported model types are nnet2 (nnet2::AmNnet) and gmm (AmDiagGmm).
--model=final.mdl      # Filename of the mdl file for the decoder.
--hclg=HCLG.fst        # Filename of the fst file with the decoding HCLG fst.
--words=words.txt      # Filename with a list of words (each line contains "<word> <word-id>").
--mat_lda=final.mat    # Filename of the LDA transform matrix.
--mat_cmvn=cmvn.mat    # Filename of the CMVN matrix with global CMVN stats used for the OnlineCMVN estimator.
--use_lda=true         # true/false; whether to apply the LDA transform specified by --mat_lda to MFCC features.
--use_ivectors=true    # true/false; whether to use iVector features for decoding
                       # (depends on your decoder). If set to true, you also need to specify --cfg_ivector
                       # with the configuration for the iVector extractor.
--use_cmvn=false       # true/false; whether to do OnlineCMVN estimation. Uses --mat_cmvn as an initial
                       # matrix for OnlineCMVN estimation. If set to true, --cfg_cmvn must specify a file
                       # with the configuration for the estimator.
--use_pitch=false      # true/false; whether to use the pitch feature. If true, --cfg_pitch must specify a file
                       # with the configuration of the pitch extractor.
--bits_per_sample=16   # 8/16; number of bits per sample.

# These parameters specify the filenames of the configurations for the individual parts of the decoder, detailed below.
--cfg_decoder=decoder.cfg
--cfg_decodable=decodable.cfg
--cfg_mfcc=mfcc.cfg
--cfg_cmvn=cmvn.cfg
--cfg_splice=splice.cfg
--cfg_endpoint=endpoint.cfg
--cfg_ivector=ivector.cfg
--cfg_pitch=pitch.cfg
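The --bits_per_sample value must match the sample width of the audio you feed to the decoder. One quick way to check a wav file is the stdlib wave module; for a self-contained illustration, this sketch builds a 16-bit mono wav in memory and reads its width back (the "16" mirrors the example config above):

```python
import io
import struct
import wave

# Build a tiny 16-bit mono wav entirely in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes per sample = 16 bits
    w.setframerate(8000)
    w.writeframes(struct.pack("<4h", 0, 100, -100, 0))

# Read the sample width back, as you would for a real input file.
buf.seek(0)
with wave.open(buf, "rb") as w:
    bits_per_sample = w.getsampwidth() * 8

print(bits_per_sample)  # 16
```

For a real input, open the file directly with wave.open("input.wav") and compare getsampwidth() * 8 against your config.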

Decoder configuration

Example decoder.cfg:

--max-active=7000
--min-active=200
--beam=15.0
--lattice-beam=8.0

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/decoder/lattice-faster-decoder.h#L69

Decodable configuration

Example decodable.cfg:

--acoustic-scale=0.1

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/nnet2/online-nnet2-decodable.h#L48

MFCC configuration

Example mfcc.cfg:

--low-freq=128
--high-freq=3800

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/feature-mfcc.h#L63

Online CMVN configuration

Online CMVN configuration is needed when you set --use_cmvn=true.

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/online-feature.h#L176

Splice configuration

Example splice.cfg:

--left-context=3
--right-context=3

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/online-feature.h#L384
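To illustrate what the splicing configured above does, here is a pure-Python sketch (not Kaldi's implementation) that concatenates each feature frame with its left and right context frames, repeating the edge frames at the boundaries:

```python
def splice(frames, left=3, right=3):
    """Concatenate each frame with `left` preceding and `right` following
    frames; indices past the edges are clamped so edge frames repeat."""
    n = len(frames)
    out = []
    for i in range(n):
        spliced = []
        for j in range(i - left, i + right + 1):
            j = min(max(j, 0), n - 1)   # clamp at the boundaries
            spliced.extend(frames[j])
        out.append(spliced)
    return out

# Three one-dimensional "frames", spliced with context 1 on each side.
frames = [[1.0], [2.0], [3.0]]
print(splice(frames, left=1, right=1))
# [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]
```

With --left-context=3 and --right-context=3 as in the example config, each output frame would be 7 input frames wide.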

Endpoint configuration

Endpointing configuration is needed if you intend to call EndpointDetected and TrailingSilenceLength functions of the Decoder.

Example endpoint.cfg:

--endpoint.silence_phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/online2/online-endpoint.h#L159

IVector configuration

Ivector configuration is needed if you set --use_ivectors=true.

Example ivector.cfg:

--splice-config=ivector_extractor/splice_opts
--cmvn-config=ivector_extractor/online_cmvn.conf
--lda-matrix=ivector_extractor/final.mat
--global-cmvn-stats=ivector_extractor/global_cmvn.stats
--diag-ubm=ivector_extractor/final.dubm
--ivector-extractor=ivector_extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/online2/online-ivector-feature.h#L110

Pitch configuration

Pitch configuration is needed if you set --use_pitch=true.

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/pitch-functions.h#L136

Details: https://github.com/kaldi-asr/kaldi/blob/master/src/feat/pitch-functions.h#L250

Regenerate and publish documentation

Provided you have built the module, the documentation can be built with the following commands:

$ cd doc
$ bash build_and_push_to_web.sh 

This uses sphinx to build the documentation and pushes it as a GitHub page of the repository.

License

Copyright 2015 Charles University in Prague

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Credits

Author: Lukas Zilka ([email protected]).

Adapted from Ondra Platek's PyKaldi https://github.com/UFAL-DSG/pykaldi

Integrated Cython code from: https://github.com/vchahun/pyfst

alex-asr's People

Contributors

jurcicek, ondrejklejch, oplatek, proycon, ticcky


alex-asr's Issues

Install Alex asr

Hi,
I want to install alex-asr to try online Kaldi recognition, but after these commands:
apt-get update
apt-get install -y build-essential libatlas-base-dev python-dev python-pip git wget gfortran g++ unzip zlib1g-dev automake autoconf libtool subversion
pip install Cython

when I try to install the script with Python 2.7, the system shows this error:

running install
running bdist_egg
running egg_info
writing requirements to alex_asr.egg-info/requires.txt
writing alex_asr.egg-info/PKG-INFO
writing top-level names to alex_asr.egg-info/top_level.txt
writing dependency_links to alex_asr.egg-info/dependency_links.txt
reading manifest file 'alex_asr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'alex_asr.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
It appears that the env is prepared. If there are errors, try deleting libs/ and rerunning the script.
echo alex_asr.a libs/kaldi/tools/openfst//src/lib/.libs/libfst.a libs/kaldi/src/online2/kaldi-online2.a libs/kaldi/src/ivector/kaldi-ivector.a libs/kaldi/src/nnet2/kaldi-nnet2.a libs/kaldi/src/lat/kaldi-lat.a libs/kaldi/src/decoder/kaldi-decoder.a libs/kaldi/src/cudamatrix/kaldi-cudamatrix.a libs/kaldi/src/feat/kaldi-feat.a libs/kaldi/src/transform/kaldi-transform.a libs/kaldi/src/gmm/kaldi-gmm.a libs/kaldi/src/thread/kaldi-thread.a libs/kaldi/src/hmm/kaldi-hmm.a libs/kaldi/src/tree/kaldi-tree.a libs/kaldi/src/matrix/kaldi-matrix.a libs/kaldi/src/util/kaldi-util.a libs/kaldi/src/base/kaldi-base.a > setup.py.add_libs
echo -msse -msse2 -Wall -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -g -Ilibs/kaldi/tools/openfst//include -Ilibs/kaldi/src -Ilibs/kaldi/tools/ATLAS/include -Ilibs/kaldi/tools/CLAPACK -Wno-sign-compare -I. -fPIC > setup.py.cxxflags_kaldi
echo "-Llibs/kaldi/tools/openfst//lib -Llibs/kaldi/tools/openfst//lib/fst" > setup.py.cxxflags_pyfst
skipping 'alex_asr/decoder.cpp' Cython extension (up-to-date)
cythoning alex_asr/fst/_fst.pyx to alex_asr/fst/_fst.cpp

Error compiling Cython file:

...
label_fst_map['ROOT'] = self
for label, fst in label_fst_map.items():
assert (not fst.osyms or fst.osyms == self.osyms) # output symbols must match
label_id = self.osyms[label]
label_fst_pairs.push_back(pair[int, libfst.ConstStdVectorFstPtr](label_id, fst.fst))
libfst.Replace(label_fst_pairs, result.fst, self.osyms['ROOT'], epsilon)

^

alex_asr/fst/_fst.pyx:766:22: no suitable method found

Error compiling Cython file:

...
label_fst_map['ROOT'] = self
for label, fst in label_fst_map.items():
assert (not fst.osyms or fst.osyms == self.osyms) # output symbols must match
label_id = self.osyms[label]
label_fst_pairs.push_back(pair[int, libfst.ConstLogVectorFstPtr](label_id, fst.fst))
libfst.Replace(label_fst_pairs, result.fst, self.osyms['ROOT'], epsilon)

^

alex_asr/fst/_fst.pyx:1407:22: no suitable method found
building 'alex_asr.fst._fst' extension
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -Ilibs/kaldi/tools/openfst/include -Ilibs/kaldi/src -I/usr/include/python2.7 -c alex_asr/fst/_fst.cpp -o build/temp.linux-x86_64-2.7/alex_asr/fst/_fst.o -std=c++0x -Llibs/kaldi/tools/openfst//lib -Llibs/kaldi/tools/openfst//lib/fst
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
alex_asr/fst/_fst.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
#error Do not use this file, it is the result of a failed Cython compilation.
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Can you help me?

Thanks in advance

Mat

I am looking to hire an ESP32 Python & C programmer to help me expose the existing ESP32 I2S C driver to ESP32 Python (LoBo version) for microphone recording and audio playing for speech recognition. Can I hire you? [email protected]

Hi
I am looking to hire an ESP32 Python & C programmer to help me expose the existing ESP32 I2S C driver to ESP32 Python (LoBo version) for microphone recording and audio playing for speech recognition.
Can I hire you? [email protected]

There is an existing working sample in C of a firmware interface for ESP32 I2S, reading a microphone via the I2S ICS-43434 IC and playing audio using the I2S MAX98357A IC.
A sample of using ESP32 I2S in C can be seen here: https://goo.gl/oSGTsu and https://github.com/MrBuddyCasino/ESP32_Alexa
The target is to expose the C lib from the ESP32 IDF to Python, to send audio to Google speech recognition and Google text-to-speech from the ESP32 (the free service, as in Chrome).
The development environment must run as Python from https://github.com/loboris/MicroPython_ESP32_psRAM_LoBo
The work is fixed-price and milestone-based; payment will be in 6 milestones as below. I will provide the hardware for this:

  1. Expose to LoBo Python the ESP32-IDE /MicroPython/Tools/esp-idf/components/driver/include/driver/I2S.H and I2S.C by integrating them into the machine module, so that dir(machine) shows the I2S interface and all commands to set up I2S for recording and playing using the ESP32 DMA. With a test Python app showing that everything in the machine.I2S module works. No new make file will be created, only code added to the machine lib C module.
  2. Use the above machine.I2S to record to and play from a wav file using a digital mic on the I2S ICS-43434 IC and play wav using the MAX98357A IC. With a test Python app showing that recording and playing work with arbitrary files, as play("test.wav") and record("rec.wav"); both must work at the same time. Also support FLAC encoding, so that if test.wav is a FLAC file it is played correctly, and a command record_flac("rec_flac.wav") produces a FLAC file.
  3. As above, but record and play from RAM, with FLAC encoding as needed by Google SR and TTS. See https://cloud.google.com/speech-to-text/docs/encoding.
    With a test Python app demonstrating it works.
  4. Use the RAM FLAC recording to do SR, so the recognized mic audio is returned as text from Google back to the ESP32.
  5. Send any text to be played by Google text-to-speech from the ESP32.
    Replace the button recording function with a recognized voice keyword as the gate to full SR (as with "OK Google").
    All from Python, as:
  6. You say the wake word "Lovey's", recorded to RAM in a loop all the time. A lower audio level is the mark to send a chunk of 1 sec ("Lovey's") of data to Google SR after FLAC encoding. Mind adjust_for_ambient_noise(source), see https://goo.gl/M5zrKR
  7. Wait to get "Lovey's" recognized as returned text; if not, go to stage 1.
  8. Play TTS "Yes master?"
  9. You say to the ESP32 mic (I2S ICS-43434) "What is the baby temperature now?"
  10. Get back the text on the ESP32 in Python: "What is the baby temperature now". If not recognized, send text-to-speech saying "please repeat the question with silence between the words". Go to 4.
  11. The ESP32 sends to TTS "The temperature now is 30 degrees".
  12. The returned wav or FLAC is played using the MAX98357A over I2S.
  13. The command loop continues until there is no command for 30 sec; then the keyword "Lovey's" must be said again. Go to 1.
  14. The Google API uses wav Audio Encoding name: FLAC (Free Lossless Audio Codec), 16-bit or 24-bit, required for streams. See a sample of FLAC in Python here. So, FLAC can be done in C or Python, but the interface must be from Python. See https://goo.gl/mtrwVN
  15. Maybe this is a good source for FLAC in C: https://goo.gl/cCYFTh
    See 200 repository results for "speech recognition" +Python https://goo.gl/UqNf1w
    Also see https://realpython.com/python-speech-recognition/#picking-a-python-speech-recognition-package
    Contact: [email protected]
    Skype: nissim.test

Installation problem of alex-asr

Hello,
I want to use alex_asr (https://github.com/UFAL-DSG/alex-asr) to deploy my Kaldi model. When installing it, I get the following error message:

******************Message *************************
g++ -msse -msse2 -Wall -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -g -Ilibs/kaldi/tools/openfst//include -Ilibs/kaldi/src -Ilibs/kaldi/tools/ATLAS/include -Ilibs/kaldi/tools/CLAPACK -Wno-sign-compare -I. -fPIC -c -o src/decoder.o src/decoder.cc
In file included from ./src/decoder_config.h:15:0,
from ./src/decoder.h:8,
from src/decoder.cc:1:
./src/utils.h:63:48: error: ‘vector’ has not been declared
vector alpha,
^~~~~~
./src/utils.h:63:54: error: expected ‘,’ or ‘...’ before ‘<’ token
vector alpha,
^
src/decoder.cc: In member function ‘bool alex_asr::Decoder::GetBestPath(std::vector
, kaldi::BaseFloat
)’:
src/decoder.cc:190:50: error: ‘vector’ does not name a type; did you mean ‘perror’?
static_cast<vector *>(0),
^~~~~~
perror
src/decoder.cc:190:56: error: expected ‘>’ before ‘<’ token
static_cast<vector *>(0),
^
src/decoder.cc:190:56: error: expected ‘(’ before ‘<’ token
src/decoder.cc:190:56: error: expected primary-expression before ‘<’ token
src/decoder.cc:190:62: error: expected primary-expression before ‘>’ token
static_cast<vector *>(0),
^
src/decoder.cc:190:65: error: expected primary-expression before ‘>’ token
static_cast<vector *>(0),
^
src/decoder.cc: In member function ‘int32 alex_asr::Decoder::TrailingSilenceLength()’:
src/decoder.cc:241:58: error: no matching function for call to ‘TrailingSilenceLength(kaldi::TransitionModel&, std::cxx11::string&, kaldi::LatticeFasterOnlineDecoder&)’
*decoder
);
^
In file included from ./src/decoder_config.h:12:0,
from ./src/decoder.h:8,
from src/decoder.cc:1:
libs/kaldi/src/online2/online-endpoint.h:192:7: note: candidate: template<class FST, class DEC> int32 kaldi::TrailingSilenceLength(const kaldi::TransitionModel&, const string&, const DEC&)
int32 TrailingSilenceLength(const TransitionModel &tmodel,
^~~~~~~~~~~~~~~~~~~~~
libs/kaldi/src/online2/online-endpoint.h:192:7: note: template argument deduction/substitution failed:
src/decoder.cc:241:58: note: couldn't deduce template parameter ‘FST’
*decoder
);
^
: recipe for target 'src/decoder.o' failed
make: *** [src/decoder.o] Error 1
error: [Errno 2] No such file or directory: 'setup.py.add_libs'

I need your help please.

Thanks in advance
