This project forked from csukuangfj/kaldifeat

Kaldi-compatible online & offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd - Provide C++ & Python API

Home Page: https://csukuangfj.github.io/kaldifeat

License: Other

Shell 3.36% C++ 79.38% Python 10.36% Makefile 0.04% CMake 6.85%

kaldifeat

Documentation: https://csukuangfj.github.io/kaldifeat

Note: If you are looking for a version that does not depend on PyTorch, please see https://github.com/csukuangfj/kaldi-native-fbank

Supported feature extractors (name, options class, feature computer class, and usage):

Fbank for Whisper
  Options: kaldifeat.WhisperFbankOptions
  Feature computer: kaldifeat.WhisperFbank
  Usage:
    opts = kaldifeat.WhisperFbankOptions()
    opts.device = torch.device('cuda', 0)
    fbank = kaldifeat.WhisperFbank(opts)
    features = fbank(wave)
  See #82

Fbank for Whisper-V3
  Options: kaldifeat.WhisperFbankOptions
  Feature computer: kaldifeat.WhisperFbank
  Usage:
    opts = kaldifeat.WhisperFbankOptions()
    opts.num_mels = 128
    opts.device = torch.device('cuda', 0)
    fbank = kaldifeat.WhisperFbank(opts)
    features = fbank(wave)

FBANK
  Options: kaldifeat.FbankOptions
  Feature computer: kaldifeat.Fbank
  Usage:
    opts = kaldifeat.FbankOptions()
    opts.device = torch.device('cuda', 0)
    opts.frame_opts.window_type = 'povey'
    fbank = kaldifeat.Fbank(opts)
    features = fbank(wave)

Streaming FBANK
  Options: kaldifeat.FbankOptions
  Feature computer: kaldifeat.OnlineFbank
  Usage: See ./kaldifeat/python/tests/test_fbank.py

MFCC
  Options: kaldifeat.MfccOptions
  Feature computer: kaldifeat.Mfcc
  Usage:
    opts = kaldifeat.MfccOptions()
    opts.num_ceps = 13
    mfcc = kaldifeat.Mfcc(opts)
    features = mfcc(wave)

Streaming MFCC
  Options: kaldifeat.MfccOptions
  Feature computer: kaldifeat.OnlineMfcc
  Usage: See ./kaldifeat/python/tests/test_mfcc.py

PLP
  Options: kaldifeat.PlpOptions
  Feature computer: kaldifeat.Plp
  Usage:
    opts = kaldifeat.PlpOptions()
    opts.mel_opts.num_bins = 23
    plp = kaldifeat.Plp(opts)
    features = plp(wave)

Streaming PLP
  Options: kaldifeat.PlpOptions
  Feature computer: kaldifeat.OnlinePlp
  Usage: See ./kaldifeat/python/tests/test_plp.py

Spectrogram
  Options: kaldifeat.SpectrogramOptions
  Feature computer: kaldifeat.Spectrogram
  Usage:
    opts = kaldifeat.SpectrogramOptions()
    print(opts)
    spectrogram = kaldifeat.Spectrogram(opts)
    features = spectrogram(wave)

Feature extraction compatible with Kaldi using PyTorch, supporting CUDA, batch processing, chunk processing, and autograd.

The following Kaldi-compatible command-line tools are implemented:

  • compute-fbank-feats
  • compute-mfcc-feats
  • compute-plp-feats
  • compute-spectrogram-feats

(NOTE: We will implement other types of features, e.g., pitch, i-vector, etc., soon.)

HINT: It also supports streaming feature extractors for Fbank, MFCC, and PLP.

Usage

Let us first generate a test wave using sox:

# generate a wave of 1.2 seconds, containing a sine-wave
# swept from 300 Hz to 3300 Hz
sox -n -r 16000 -b 16 test.wav synth 1.2 sine 300-3300

HINT: Download test.wav.
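
If sox is not available, a similar signal can be generated with Python's standard library. The following is a rough sketch, not a reimplementation of sox. It assumes that `300-3300` denotes an exponential sweep (as the sox manual describes the `-` separator), and the function name `write_sweep` and the amplitude of 0.8 are illustrative choices that do not come from kaldifeat or sox. The samples are not bit-identical to sox's output, so the feature values printed later in this document will not be reproduced exactly.

```python
import math
import struct
import wave


def write_sweep(path, duration=1.2, sample_rate=16000,
                f0=300.0, f1=3300.0, amplitude=0.8):
    """Write a mono 16-bit PCM file with an exponential sine sweep.

    Hypothetical helper: approximates `synth 1.2 sine 300-3300`,
    but does not reproduce sox's exact output.
    """
    num_samples = round(duration * sample_rate)
    ratio = f1 / f0
    # Phase is the integral of 2*pi*f(t), with f(t) = f0 * ratio**(t/duration).
    scale = 2.0 * math.pi * f0 * duration / math.log(ratio)
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        phase = scale * (ratio ** (t / duration) - 1.0)
        samples.append(int(round(amplitude * 32767 * math.sin(phase))))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % num_samples, *samples))
    return num_samples
```

Calling `write_sweep("sweep.wav")` writes 19200 samples (1.2 seconds at 16 kHz). Use the sox-generated test.wav if you want to reproduce the numbers below.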

Fbank

import torchaudio

import kaldifeat

filename = "./test.wav"
wave, samp_freq = torchaudio.load(filename)

wave = wave.squeeze()

opts = kaldifeat.FbankOptions()
opts.frame_opts.dither = 0
# Yes, it has the same options as Kaldi

fbank = kaldifeat.Fbank(opts)
features = fbank(wave)

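By default, torchaudio.load returns floating-point samples in the range [-1, 1], whereas Kaldi operates on 16-bit integer sample values. To compute features that are compatible with Kaldi, wave samples have to be scaled to the range [-32768, 32768]. NOTE: You can skip this step if you don't care about compatibility with Kaldi.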

The following is an example:

wave *= 32768
fbank = kaldifeat.Fbank(opts)
features = fbank(wave)
print(features[:3])

The output is:

tensor([[15.0074, 21.1730, 25.5286, 24.4644, 16.6994, 13.8480, 11.2087, 11.7952,
         10.3911, 10.4491, 10.3012,  9.8743,  9.6997,  9.3751,  9.3476,  9.3559,
          9.1074,  9.0032,  9.0312,  8.8399,  9.0822,  8.7442,  8.4023],
        [13.8785, 20.5647, 25.4956, 24.6966, 16.9541, 13.9163, 11.3364, 11.8449,
         10.2565, 10.5871, 10.3484,  9.7474,  9.6123,  9.3964,  9.0695,  9.1177,
          8.9136,  8.8425,  8.5920,  8.8315,  8.6226,  8.8605,  8.9763],
        [13.9475, 19.9410, 25.4494, 24.9051, 17.0004, 13.9207, 11.6667, 11.8217,
         10.3411, 10.7258, 10.0983,  9.8109,  9.6762,  9.4218,  9.1246,  8.7744,
          9.0863,  8.7488,  8.4695,  8.6710,  8.7728,  8.7405,  8.9824]])
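
Each row printed above has 23 values, one per mel bin. The number of rows in the full `features` tensor follows from Kaldi's framing rule. The sketch below assumes Kaldi's default frame options (25 ms frame length, 10 ms frame shift, `snip_edges=True`) and that kaldifeat keeps these defaults. The function `num_frames` is a hypothetical helper written for this illustration and is not part of kaldifeat.

```python
def num_frames(num_samples, sample_rate=16000, frame_length_ms=25.0,
               frame_shift_ms=10.0, snip_edges=True):
    """Number of frames produced by Kaldi-style framing (illustrative helper)."""
    window = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if snip_edges:
        # Only frames that fit completely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # Without edge snipping, the count depends only on the frame shift.
    return (num_samples + shift // 2) // shift


# test.wav holds 1.2 s at 16 kHz, i.e. 19200 samples.
print(num_frames(19200))                    # 118
print(num_frames(19200, snip_edges=False))  # 120
```

Under these assumptions, `features` for test.wav would have shape (118, 23).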

You can compute the fbank feature for the same wave with Kaldi using the following commands:

echo "1 test.wav" > test.scp
compute-fbank-feats --dither=0 scp:test.scp ark,t:test.txt
head -n4 test.txt

The output is:

1  [
  15.00744 21.17303 25.52861 24.46438 16.69938 13.84804 11.2087 11.79517 10.3911 10.44909 10.30123 9.874329 9.699727 9.37509 9.347578 9.355928 9.107419 9.00323 9.031268 8.839916 9.082197 8.744139 8.40221
  13.87853 20.56466 25.49562 24.69662 16.9541 13.91633 11.33638 11.84495 10.25656 10.58718 10.34841 9.747416 9.612316 9.39642 9.06955 9.117751 8.913527 8.842571 8.59212 8.831518 8.622513 8.86048 8.976251
  13.94753 19.94101 25.4494 24.90511 17.00044 13.92074 11.66673 11.82172 10.34108 10.72575 10.09829 9.810879 9.676199 9.421767 9.124647 8.774353 9.086291 8.74897 8.469534 8.670973 8.772754 8.740549 8.982433

You can see that kaldifeat produces the same output as Kaldi (within some tolerance due to numerical precision).
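
The comparison can also be made programmatically. The sketch below parses the first row of the Kaldi text output shown above and compares it with the first row printed by kaldifeat. The parser `parse_kaldi_text_matrix` and the tolerance of 1e-3 are assumptions made for this illustration. The parser handles only the simple single-matrix layout shown above.

```python
import math

# First utterance header and first feature row, copied from test.txt above.
KALDI_TEXT = """1  [
  15.00744 21.17303 25.52861 24.46438 16.69938 13.84804 11.2087 11.79517 10.3911 10.44909 10.30123 9.874329 9.699727 9.37509 9.347578 9.355928 9.107419 9.00323 9.031268 8.839916 9.082197 8.744139 8.40221
"""

# First row of the kaldifeat output, copied from the tensor printed above.
KALDIFEAT_ROW = [
    15.0074, 21.1730, 25.5286, 24.4644, 16.6994, 13.8480, 11.2087, 11.7952,
    10.3911, 10.4491, 10.3012, 9.8743, 9.6997, 9.3751, 9.3476, 9.3559,
    9.1074, 9.0032, 9.0312, 8.8399, 9.0822, 8.7442, 8.4023,
]


def parse_kaldi_text_matrix(text):
    """Parse rows of a Kaldi text-format matrix (illustrative helper)."""
    rows = []
    for line in text.splitlines():
        if "[" in line:  # header line: "<utterance-id>  ["
            continue
        values = line.replace("]", " ").split()
        if values:
            rows.append([float(v) for v in values])
    return rows


kaldi_rows = parse_kaldi_text_matrix(KALDI_TEXT)
max_abs_diff = max(abs(a - b) for a, b in zip(kaldi_rows[0], KALDIFEAT_ROW))
rows_match = all(
    math.isclose(a, b, abs_tol=1e-3) for a, b in zip(kaldi_rows[0], KALDIFEAT_ROW)
)
print(f"max abs diff: {max_abs_diff:.1e}, match: {rows_match}")
```

The remaining difference comes mostly from the kaldifeat values being printed with four decimal places.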

HINT: Download test.scp and test.txt.

To run on a GPU, set opts.device and move the wave to the same device:

import torch

opts = kaldifeat.FbankOptions()
opts.device = torch.device("cuda", 0)

fbank = kaldifeat.Fbank(opts)
features = fbank(wave.to(opts.device))

MFCC, PLP, Spectrogram

To compute MFCC features, please replace kaldifeat.FbankOptions and kaldifeat.Fbank with kaldifeat.MfccOptions and kaldifeat.Mfcc, respectively. The same goes for PLP and Spectrogram.

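Please refer to the test scripts under ./kaldifeat/python/tests (for example, test_fbank.py, test_mfcc.py, and test_plp.py) for more examples.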

HINT: In the examples, you can find that

  • kaldifeat supports batch processing as well as chunk processing
  • kaldifeat uses the same options as Kaldi's compute-fbank-feats and compute-mfcc-feats
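
To illustrate what chunk processing means, here is a conceptual sketch in plain Python. It is not kaldifeat's implementation or API. It assumes that chunk processing means computing features for a limited number of frames at a time, which bounds peak memory use while giving the same result as processing all frames at once. A simple per-frame log energy stands in for a real feature, and all function names are hypothetical.

```python
import math


def frame_log_energy(samples, start, window):
    """Log energy of one frame (a stand-in for a real feature computation)."""
    energy = sum(x * x for x in samples[start:start + window])
    return math.log(max(energy, 1e-10))


def count_frames(num_samples, window, shift):
    """Number of full frames that fit into the signal."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def features_all_at_once(samples, window=400, shift=160):
    n = count_frames(len(samples), window, shift)
    return [frame_log_energy(samples, i * shift, window) for i in range(n)]


def features_in_chunks(samples, chunk_size, window=400, shift=160):
    """Process at most `chunk_size` frames per step and concatenate the results."""
    n = count_frames(len(samples), window, shift)
    out = []
    for first in range(0, n, chunk_size):
        last = min(first + chunk_size, n)
        out.extend(
            frame_log_energy(samples, i * shift, window) for i in range(first, last)
        )
    return out


signal = [math.sin(0.01 * n) for n in range(19200)]
assert features_in_chunks(signal, chunk_size=50) == features_all_at_once(signal)
```

Because frames are independent of each other, the chunked and unchunked results are identical. Only the amount of work held in memory at one time changes.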

Usage in other projects

icefall

icefall uses kaldifeat to extract features for a pre-trained model.

See https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/pretrained.py.

k2

k2 uses kaldifeat's C++ API.

See https://github.com/k2-fsa/k2/blob/v2.0-pre/k2/torch/csrc/features.cu.

lhotse

lhotse uses kaldifeat to extract features on GPU.

See https://github.com/lhotse-speech/lhotse/blob/master/lhotse/features/kaldifeat.py.

sherpa

sherpa uses kaldifeat for streaming feature extraction.

See https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/pruned_stateless_emformer_rnnt2/decode.py

Installation

Refer to https://csukuangfj.github.io/kaldifeat for installation.

Contributors

csukuangfj, trunglebka, dadoou, cuongnm5, pkufool, veelion
