funcwj / setk

Tools for Speech Enhancement integrated with Kaldi

License: Apache License 2.0

CMake 0.64% C++ 31.40% Shell 10.48% Python 50.51% Perl 6.96%

kaldi speech-enhancement beamforming speech speech-separation rir-generator time-frequency-masking

setk's Introduction

SETK: Speech Enhancement Tools integrated with Kaldi

Here are some speech enhancement/separation tools integrated with Kaldi. I use them for front-end data processing.

Python Scripts

Supervised (mask-based) adaptive beamformer (GEVD/MVDR/MCWF...)
Data conversion among MATLAB, NumPy and Kaldi
Data visualization (TF-mask, spatial/spectral features, beam pattern...)
Unified data and I/O handlers for Kaldi scripts (.scp), archives, wave files and NumPy ndarrays...
Unsupervised mask estimation (CGMM/CACGMM)
Spatial/Spectral feature computation
DS (delay-and-sum) beamformer, SD (super-directive) beamformer
AuxIVA, WPE & WPD, FB (Fixed Beamformer)
Mask computation (iam, irm, ibm, psm, crm)
RIR simulation (1D/2D arrays)
Single-channel speech separation (TF spectral masking)
Si-SDR/SDR/WER evaluation
Pywebrtc VAD wrapper
Mask-based source localization
Noise suppression
Data simulation
...
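The mask types listed above (iam, irm, ibm, psm, crm) have common textbook definitions. The following is a minimal per-bin sketch. It assumes an additive mixture y = s + n. The function name compute_masks is hypothetical, and setk's own scripts may use different exponents, thresholds or clipping.

```python
import cmath
import math


def compute_masks(s, n):
    """Common T-F masks for one bin, given complex clean speech s and noise n.

    Assumes y = s + n. Definitions follow common usage and may differ in
    detail from setk's scripts.
    """
    y = s + n
    eps = 1e-12  # guards against division by zero in silent bins
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return {
        "ibm": 1.0 if ps > pn else 0.0,  # binary mask, 0 dB local criterion
        "irm": math.sqrt(ps / (ps + pn + eps)),  # ratio mask with exponent 0.5
        "iam": abs(s) / (abs(y) + eps),  # ideal amplitude mask
        # phase-sensitive mask: amplitude ratio times cos of the phase difference
        "psm": abs(s) / (abs(y) + eps) * math.cos(cmath.phase(s) - cmath.phase(y)),
        "crm": s / (y + eps),  # complex ratio mask, so y * crm is (almost) s
    }
```

For s = 3+4j and n = 1, the PSM equals the real part of s / y (0.875), and the CRM is the full complex ratio.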

Please check out the instructions for the usage of the scripts.
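The Si-SDR item in the Python list refers to the standard scale-invariant SDR. Below is a minimal pure-Python sketch of that metric; it is not setk's evaluation script. The name si_sdr is hypothetical, and the sketch assumes equal-length, real-valued signals.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length real signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # remove DC offsets first, as is common practice for Si-SDR
    em = sum(estimate) / len(estimate)
    rm = sum(reference) / len(reference)
    est = [x - em for x in estimate]
    ref = [x - rm for x in reference]
    # optimal scaling of the reference onto the estimate
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    # undefined (division by zero) for a perfect estimate
    return 10 * math.log10(sum(t * t for t in target) / sum(v * v for v in residual))
```

Scaling the estimate by any non-zero factor leaves the result unchanged, which is the point of the "scale-invariant" variant.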

Kaldi Commands

Compute time-frequency masks (ibm, irm etc)
Compute phase & magnitude spectrogram & complex STFT
Separate the target component using input masks
Wave reconstruction from enhanced spectral features
Complex matrix/vector class
MVDR/GEVD beamformer (depends on T-F masks, not very stable)
Fixed beamformer
Compute angular spectrogram based on SRP-PHAT
RIR generator (based on RIR-Generator)
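The SRP-PHAT angular spectrogram can be illustrated for a single STFT frame. This is a minimal pure-Python sketch with several assumptions: a far-field source, a linear array with known mic positions, and a sound speed of 343 m/s. The names srp_phat and plane_wave are hypothetical. setk's Kaldi command may use a different geometry, angle grid or weighting.

```python
import cmath
import math

SOUND_SPEED = 343.0  # speed of sound in m/s (assumed)


def plane_wave(freqs, mic_x, theta_deg):
    """Noise-free far-field spectra for a unit source at theta_deg (from the array axis).

    Mics farther along the direction of arrival hear the source earlier.
    """
    cos_t = math.cos(math.radians(theta_deg))
    return [[cmath.exp(2j * math.pi * f * x * cos_t / SOUND_SPEED) for f in freqs] for x in mic_x]


def srp_phat(spectra, freqs, mic_x, angles_deg):
    """SRP-PHAT power of one STFT frame for each candidate angle.

    spectra[m][k]: complex STFT value of mic m at frequency freqs[k] (Hz)
    mic_x[m]:      mic position in metres along a linear array (assumed geometry)
    """
    power = []
    for theta in angles_deg:
        cos_t = math.cos(math.radians(theta))
        total = 0.0
        for i in range(len(mic_x)):
            for j in range(i + 1, len(mic_x)):
                # arrival time at mic j minus arrival time at mic i for this angle
                tdoa = (mic_x[i] - mic_x[j]) * cos_t / SOUND_SPEED
                for k, f in enumerate(freqs):
                    cross = spectra[i][k] * spectra[j][k].conjugate()
                    if cross == 0:
                        continue
                    # PHAT: keep only the phase, then undo the hypothesized delay
                    total += (cross / abs(cross) * cmath.exp(-2j * math.pi * f * tdoa)).real
        power.append(total)
    return power
```

At the true angle, every pair/frequency term phase-aligns to 1, so the maximum equals the number of mic pairs times the number of frequencies. Evaluating this for every frame gives the angle-versus-time map.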

To build the sources, you first need to compile Kaldi with the --shared flag and patch matrix/matrix-common.h as follows:

typedef enum {
    kTrans          = 112,  // CblasTrans
    kNoTrans        = 111,  // CblasNoTrans
    kConjTrans      = 113,  // CblasConjTrans
    kConjNoTrans    = 114   // CblasConjNoTrans
} MatrixTransposeType;
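The patch adds two conjugate variants to Kaldi's transpose enum. Its comments map each value to the matching CBLAS constant, presumably so the values can be passed straight through to CBLAS calls on complex matrices. The small Python sketch below spells out what each mode computes. apply_transpose is a hypothetical helper, not part of Kaldi or setk.

```python
from enum import IntEnum


class MatrixTransposeType(IntEnum):
    # same numeric values as the patched Kaldi enum / CBLAS constants
    kNoTrans = 111
    kTrans = 112
    kConjTrans = 113
    kConjNoTrans = 114


def apply_transpose(a, mode):
    """Apply a transpose mode to a complex matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    if mode == MatrixTransposeType.kNoTrans:  # A
        return [list(row) for row in a]
    if mode == MatrixTransposeType.kTrans:  # A^T
        return [[a[r][c] for r in range(rows)] for c in range(cols)]
    if mode == MatrixTransposeType.kConjTrans:  # A^H (conjugate transpose)
        return [[a[r][c].conjugate() for r in range(rows)] for c in range(cols)]
    if mode == MatrixTransposeType.kConjNoTrans:  # conj(A), no transpose
        return [[x.conjugate() for x in row] for row in a]
    raise ValueError(f"unknown transpose mode: {mode}")
```

kConjTrans (A^H) is the form needed for covariance matrices and beamformer weights such as w^H y.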

Then run

mkdir build
cd build
export KALDI_ROOT=/path/to/kaldi/root
export OPENFST_ROOT=/path/to/openfst/root
# on UNIX, Kaldi needs to be compiled with OpenBLAS
export OPENBLAS_ROOT=/path/to/openblas/root
cmake ..
make -j

I now mainly work on the sptk package; development based on Kaldi has stopped.

Developers who want to make commits or PRs: please remember to set up pre-commit for code style formatting.


setk's Issues

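How are the speech mask and noise mask for online beamforming obtained?

Hi, reading your code, I found that the initial masks for online beamforming are read from files. How are they computed? Also, the parameters appear to be updated simply by smoothing with a factor alpha, which is not the method of this repo, right?

By the way, in the function do_online_beamform, the call
chunk = beamformer.run(speech_mask[base:base + chunk_size], stft_mat[:, :, base:base + chunk_size], noise_mask=noise_mask, normalize=ban)
passes normalize, but the run function has no normalize argument.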
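The question describes a simple smoothing with a factor alpha. A common mask-weighted form of that recursive update is sketched below. update_covariance is a hypothetical helper, not setk's do_online_beamform code. It assumes one spatial covariance matrix per frequency bin, updated frame by frame with a fixed alpha.

```python
def update_covariance(phi, frame, mask, alpha):
    """One step of mask-weighted recursive averaging of a spatial covariance matrix.

    phi:   previous M x M estimate (list of rows of complex numbers)
    frame: length-M complex STFT vector y_t for one frequency bin
    mask:  T-F mask value (speech or noise) for this bin and frame
    alpha: smoothing factor in [0, 1); larger values keep more history
    Returns alpha * phi + (1 - alpha) * mask * y_t y_t^H.
    """
    m = len(frame)
    return [
        [alpha * phi[i][j] + (1 - alpha) * mask * frame[i] * frame[j].conjugate() for j in range(m)]
        for i in range(m)
    ]
```

Running this separately with the speech mask and the noise mask yields the two matrices that a mask-based MVDR or GEVD beamformer needs at each chunk.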

Size not matched

Hi,
Thanks a lot for your code!
In the line
https://github.com/funcwj/setk/blob/master/include/stft.cc#L237
you want to add [0: end - s] and [-s: end], but you wrote

denominator.Range(0, window_size + s).AddVec(1, analysis_window_square.Range(-s, window_size + s));
The sizes do not match. Is this a bug?
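For reference, Kaldi's VectorBase::Range(offset, length) takes a length as its second argument, not an end index. Assuming s is a negative shift in that branch, both ranges in the quoted line have length window_size + s. The sketch below mimics that slicing in Python to accumulate the overlap-add normalization term: the sum of squared analysis windows shifted by multiples of the frame shift. kaldi_range and ola_denominator are hypothetical names, not setk's code.

```python
import math


def kaldi_range(vec, offset, length):
    """Mimic Kaldi's VectorBase::Range(offset, length): `length` items from `offset`."""
    assert 0 <= offset and offset + length <= len(vec), "range out of bounds"
    return vec[offset:offset + length]


def ola_denominator(window, shift):
    """Sum of squared windows overlapping one frame when frames are `shift` apart."""
    size = len(window)
    win_sq = [w * w for w in window]
    denom = [0.0] * size
    for s in range(-(size // shift) * shift, size, shift):
        if -size < s < 0:
            # earlier frame: denom[0 : size + s] += win_sq[-s : size]
            for i, v in enumerate(kaldi_range(win_sq, -s, size + s)):
                denom[i] += v
        elif 0 <= s < size:
            # this or a later frame: denom[s : size] += win_sq[0 : size - s]
            for i, v in enumerate(kaldi_range(win_sq, 0, size - s)):
                denom[s + i] += v
    return denom
```

With a square-root periodic Hann window and 50% overlap, every entry of the denominator is 1, as expected for perfect reconstruction.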

usage of this repository

This Kaldi-based SE project looks good; it would be perfect if you could provide detailed usage instructions.

train_rnn_mask.sh

Hello,
In train_rnn_mask.sh, the output layer includes the option "include-activation=", but I get an error when compiling it. Can you help me?
Thank you!

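Question about the suppression capability of CircularSDBeamformer

Hi,
I am trying to implement superdirective beamforming following the superdirective material in Jacob's Design of Circular Differential Microphone Arrays, but with a 6-mic circular array of radius 1 cm, the beampattern I get differs considerably from the one shown in the book.

On p. 104 of the book, Fig. 6.5 (Patterns of the superdirective beamformer with a UCA (M = 6), without the symmetry constraint) shows that the beampattern has several nulls with suppression as deep as about 40 dB.

In my implementation, however, the maximum suppression with the same array setup is only about 5 dB.
I also tried setk's CircularSDBeamformer(radius=1. / 100, num_arounded=6) and got results similar to my own implementation, which are also quite different from the book.

What might be causing this difference?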
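One way to compare against the book is to compute the pattern independently. Below is a minimal pure-Python sketch of a standard superdirective beamformer (MVDR under spherically diffuse noise) for a uniform circular array. It assumes far-field plane waves and c = 343 m/s. All function names and the diag_load parameter are hypothetical; this is not setk's CircularSDBeamformer code.

The diagonal loading term deserves attention in any implementation. With a 1 cm radius the diffuse-noise coherence matrix is badly conditioned, and the amount of loading trades directivity (and null depth) for robustness to white noise.

```python
import cmath
import math

SOUND_SPEED = 343.0  # m/s (assumed)


def uca_steering(freq, radius, num_mics, theta):
    """Far-field steering vector of a uniform circular array for azimuth theta (rad)."""
    kr = 2 * math.pi * freq * radius / SOUND_SPEED
    return [cmath.exp(1j * kr * math.cos(theta - 2 * math.pi * m / num_mics)) for m in range(num_mics)]


def diffuse_coherence(freq, radius, num_mics):
    """Spherically diffuse noise coherence matrix: sinc(2 pi f d_ij / c)."""
    pos = [(radius * math.cos(2 * math.pi * m / num_mics), radius * math.sin(2 * math.pi * m / num_mics))
           for m in range(num_mics)]
    gamma = []
    for xi, yi in pos:
        row = []
        for xj, yj in pos:
            arg = 2 * math.pi * freq * math.hypot(xi - xj, yi - yj) / SOUND_SPEED
            row.append(1.0 if arg == 0 else math.sin(arg) / arg)
        gamma.append(row)
    return gamma


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (complex-safe)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def superdirective_weights(freq, radius, num_mics, look, diag_load=1e-4):
    """w = (G + eps I)^-1 d / (d^H (G + eps I)^-1 d), with G the diffuse coherence."""
    d = uca_steering(freq, radius, num_mics, look)
    gamma = diffuse_coherence(freq, radius, num_mics)
    for i in range(num_mics):
        gamma[i][i] += diag_load  # diagonal loading (regularization)
    x = solve(gamma, d)
    norm = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [xi / norm for xi in x]


def beampattern_db(w, freq, radius, angles):
    """Magnitude response |w^H d(theta)| in dB for each azimuth (rad)."""
    out = []
    for theta in angles:
        d = uca_steering(freq, radius, len(w), theta)
        resp = sum(wi.conjugate() * di for wi, di in zip(w, d))
        out.append(20 * math.log10(max(abs(resp), 1e-12)))
    return out
```

Plotting beampattern_db for several diag_load values shows how strongly the nulls depend on regularization. As diag_load grows, the weights approach the delay-and-sum weights d / M.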

--src-begin=32000,0 \ --src-sdr=3 \

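When mixing two .wav files, if I mix the two original recordings directly, what should these two parameters be changed to? Or can I just use the original values shown above?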
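As a rough guide, the sketch below shows one common way to mix two sources at a target level ratio, with a start offset for each source. How setk actually interprets the flags must be checked against the script. This sketch assumes --src-begin gives per-source sample offsets (so 32000 would be 2 s at 16 kHz) and --src-sdr gives the source-to-interference ratio in dB. mix_at_sdr is a hypothetical name.

```python
import math


def mix_at_sdr(src, itf, sdr_db, src_begin=0, itf_begin=0):
    """Mix two mono signals so that the src/itf power ratio equals sdr_db.

    src_begin / itf_begin: sample offsets where each signal starts in the mixture
    (an assumed analogue of --src-begin; check setk's script for the real meaning).
    """
    p_src = sum(x * x for x in src) / len(src)
    p_itf = sum(x * x for x in itf) / len(itf)
    # choose the gain so that 10 * log10(p_src / (gain^2 * p_itf)) == sdr_db
    gain = math.sqrt(p_src / (p_itf * 10 ** (sdr_db / 10)))
    length = max(src_begin + len(src), itf_begin + len(itf))
    mix = [0.0] * length
    for i, x in enumerate(src):
        mix[src_begin + i] += x
    for i, x in enumerate(itf):
        mix[itf_begin + i] += gain * x
    return mix, gain
```

Here each signal's power is averaged over its own length. A script might instead measure power only over the overlapping region, which gives a different gain when the offsets differ.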

error occurred while compiling setk with kaldi & OpenBLAS

I patched kaldi/src/matrix/matrix-common.h and followed the instructions:

mkdir build
cd build
export KALDI_ROOT=/kaldi/root/dir
export OPENBLAS_ROOT=/openblas/root/dir
cmake ..
make -j

and then got the error below:

/usr/include/c++/5/bits/basic_string.h:5172:5: note: candidate: template<class _CharT, class _Traits, class _Alloc> std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&)
operator<<(basic_ostream<_CharT, _Traits>& __os,
^
/usr/include/c++/5/bits/basic_string.h:5172:5: note: template argument deduction/substitution failed:
......./setk/include/beamformer.cc:174:50: note: 'kaldi::MessageLogger' is not derived from 'std::basic_ostream<_CharT, _Traits>'
<< std::imag(s) << ")" << std::endl;
^
include/CMakeFiles/setk.dir/build.make:182: recipe for target 'include/CMakeFiles/setk.dir/beamformer.cc.o' failed
make[2]: *** [include/CMakeFiles/setk.dir/beamformer.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:85: recipe for target 'include/CMakeFiles/setk.dir/all' failed
make[1]: *** [include/CMakeFiles/setk.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Can somebody help, please?
Thanks.

kaldi-android

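Hi, I read your blog post 《在安卓上使用kaldi进行模型开发》 ("Using Kaldi for model development on Android"), but I could not find how you use Kaldi's nnet and feat modules on Android. I would like to use Kaldi's decoding module on Android, but very little material covers this.
Would you mind showing how you call the nnet and feat modules?
Thanks!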

mask.scp file for adaptive beamformer

Hello,
I would like to understand how msk.scp is defined for the adaptive_beamformer example in setk/doc/adaptive_beamformer/, so that I can run it. I tried to run it by creating a mask.scp file containing the path of the egs.npy mask file produced by the previous scripts. Any help would be appreciated.
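Kaldi-style .scp files are plain text with one "<key> <path>" entry per line. The sketch below rests on two assumptions to check against the script. First, setk's mask reader accepts .npy paths in such a file (its I/O handlers do cover NumPy arrays). Second, the keys must match the utterance keys of the multi-channel input scp. Under those assumptions, a mask scp could be generated as follows; masks_from_dir and format_scp are hypothetical helpers.

```python
from pathlib import Path


def masks_from_dir(mask_dir):
    """Map each <utt-key>.npy file in mask_dir to its absolute path (key = file stem)."""
    return {p.stem: str(p.resolve()) for p in Path(mask_dir).glob("*.npy")}


def format_scp(entries):
    """Return Kaldi-style scp text: one '<key> <path>' line per entry, sorted by key."""
    lines = []
    for key, path in sorted(entries.items()):
        # the first whitespace separates key and value, so keys must not contain any
        if not key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid utterance key: {key!r}")
        lines.append(f"{key} {path}")
    return "\n".join(lines) + "\n"


# example (hypothetical directory):
# Path("mask.scp").write_text(format_scp(masks_from_dir("exp/masks")))
```

Keeping the entries sorted by key follows the usual convention for Kaldi data directories.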