Python Implementation of the Feature Extraction Process in Kaldi
This repository contains Python scripts for the feature extraction process in speech recognition systems. It is derived from Kaldi (https://github.com/kaldi-asr/kaldi), but may be more flexible for most speech recognition systems, especially end-to-end ones.
- feature_extraction_template.py: a template for the feature extraction process; blocks that need to be filled in by the user are marked with comments containing the keyword "BLOCK"
- utils/extract_window.py: the preprocessing and window slicing functions
- utils/fft2melmx.py: an adopted script that calculates the mel weights for converting FFT features to mel features; see the comments in the file for details
- utils/deltas.py: the delta feature calculation function used in TensorFlow graphs
- utils/deltas_np.py: the delta feature calculation function which can be used without TensorFlow
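To illustrate what a delta feature calculation typically does, here is a minimal NumPy sketch of regression-based deltas with the first and last frames repeated at the edges. The names `compute_deltas` and `add_deltas` and the default window of 2 frames are assumptions made for this example; they are not the interface of utils/deltas.py or utils/deltas_np.py, and the results may differ from Kaldi's at the utterance edges.

```python
import numpy as np


def compute_deltas(features, window=2):
    """Regression-based deltas over a (num_frames, num_bins) array.

    Hypothetical helper for illustration only.
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so that every frame has
    # `window` neighbours on each side.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denominator = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas = np.zeros(features.shape, dtype=np.float64)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]
        behind = padded[window - n: window - n + num_frames]
        deltas += n * (ahead - behind)
    return deltas / denominator


def add_deltas(features, window=2):
    """Append first- and second-order deltas along the feature axis."""
    delta = compute_deltas(features, window)
    delta_delta = compute_deltas(delta, window)
    return np.concatenate([features, delta, delta_delta], axis=1)


if __name__ == "__main__":
    fbank = np.random.randn(100, 40)  # placeholder for real fbank features
    print(add_deltas(fbank).shape)
```

With 40-dimensional input features, `add_deltas` returns 120-dimensional frames (static, delta, and delta-delta).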
Most of the arguments can be modified in feature_extraction_template.py. The default arguments in the template are the ones used for the wide residual BLSTM network (WRBN) based acoustic model.
- feature_extraction_template.py only demonstrates the feature extraction process using TensorFlow. It should be easy to adapt it to a version without TensorFlow (using utils/deltas_np.py).
- The log energy pre-window function in Kaldi is not currently supported. It is seldom used in my work, though. Contributions are welcome.
- Currently only filter-bank (fbank) features are supported. It should be easy to extend the code to other popular features such as MFCC.
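As a rough idea of how the MFCC extension mentioned above might look, the sketch below applies an orthonormal DCT-II to log fbank features and keeps the first few coefficients. The function names `dct_matrix` and `fbank_to_mfcc` and the default of 13 coefficients are assumptions made for this example and are not part of this repository. Kaldi's own MFCC computation has further options, such as cepstral liftering, which are omitted here.

```python
import numpy as np


def dct_matrix(num_ceps, num_mel_bins):
    """Orthonormal DCT-II matrix of shape (num_ceps, num_mel_bins).

    Hypothetical helper for illustration only.
    """
    k = np.arange(num_ceps)[:, None]      # cepstral index
    n = np.arange(num_mel_bins)[None, :]  # mel bin index
    matrix = np.sqrt(2.0 / num_mel_bins) * np.cos(
        np.pi / num_mel_bins * (n + 0.5) * k
    )
    # The zeroth row uses a different scale so that the rows are orthonormal.
    matrix[0, :] = np.sqrt(1.0 / num_mel_bins)
    return matrix


def fbank_to_mfcc(log_fbank, num_ceps=13):
    """Project (num_frames, num_mel_bins) log fbank features onto DCT bases."""
    return log_fbank @ dct_matrix(num_ceps, log_fbank.shape[1]).T


if __name__ == "__main__":
    log_fbank = np.random.randn(100, 40)  # placeholder for real log fbank features
    print(fbank_to_mfcc(log_fbank).shape)
```

The input must already be log mel energies; the DCT step only decorrelates the mel bins and reduces the dimensionality.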