
funcwj / setk


Tools for Speech Enhancement integrated with Kaldi

License: Apache License 2.0

CMake 0.64% C++ 31.40% Shell 10.48% Python 50.51% Perl 6.96%
kaldi speech-enhancement beamforming speech speech-separation rir-generator time-frequency-masking

setk's Introduction

SETK: Speech Enhancement Tools integrated with Kaldi

This repository collects speech enhancement and separation tools integrated with Kaldi. I use them for front-end data processing.

Python Scripts

  • Supervised (mask-based) adaptive beamformer (GEVD/MVDR/MCWF...)
  • Data conversion among MATLAB, NumPy and Kaldi
  • Data visualization (TF-mask, spatial/spectral features, beam pattern...)
  • Unified data and IO handlers for Kaldi's scripts, archives, wave and numpy's ndarray...
  • Unsupervised mask estimation (CGMM/CACGMM)
  • Spatial/Spectral feature computation
  • DS (delay-and-sum) beamformer, SD (super-directive) beamformer
  • AuxIVA, WPE & WPD, FB (Fixed Beamformer)
  • Mask computation (iam, irm, ibm, psm, crm)
  • RIR simulation (1D/2D arrays)
  • Single channel speech separation (TF spectral masking)
  • Si-SDR/SDR/WER evaluation
  • Pywebrtc vad wrapper
  • Mask-based source localization
  • Noise suppression
  • Data simulation
  • ...
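
As an illustration of the mask computation item above, here is a minimal NumPy sketch of two of the listed masks, the ideal binary mask (ibm) and the ideal ratio mask (irm). The function name `compute_masks` and the exact IRM formula are assumptions made for this example; IRM definitions vary between papers, and the repository's scripts may differ.

```python
import numpy as np

def compute_masks(speech_mag, noise_mag, eps=1e-8):
    """Return (ibm, irm) for two magnitude spectrograms of equal shape.

    Hypothetical helper for illustration, not the repository's API.
    """
    # IBM: 1 where speech strictly dominates noise in a T-F bin, else 0
    ibm = (speech_mag > noise_mag).astype(np.float32)
    # IRM (one common definition): sqrt of speech power over total power
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    irm = np.sqrt(speech_pow / (speech_pow + noise_pow + eps))
    return ibm, irm

# toy 2x2 "spectrograms"
speech = np.array([[2.0, 0.5], [1.0, 3.0]])
noise = np.array([[1.0, 1.0], [1.0, 4.0]])
ibm, irm = compute_masks(speech, noise)
```

Both masks take values in [0, 1] and have the same shape as the input spectrograms, so they can be multiplied element-wise with a mixture spectrogram.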

Please check out the instructions for how to use the scripts.

Kaldi Commands

  • Compute time-frequency masks (ibm, irm etc)
  • Compute phase & magnitude spectrogram & complex STFT
  • Separate the target component using input masks
  • Wave reconstruction from enhanced spectral features
  • Complex matrix/vector class
  • MVDR/GEVD beamformer (depends on T-F masks, not very stable)
  • Fixed beamformer
  • Compute angular spectrogram based on SRP-PHAT
  • RIR generator (reference from RIR-Generator)

To build the sources, you first need to compile Kaldi with the --shared flag and patch matrix/matrix-common.h so that MatrixTransposeType reads:

typedef enum {
    kTrans          = 112,  // CblasTrans
    kNoTrans        = 111,  // CblasNoTrans
    kConjTrans      = 113,  // CblasConjTrans
    kConjNoTrans    = 114   // CblasConjNoTrans
} MatrixTransposeType;

Then run

mkdir build
cd build
export KALDI_ROOT=/path/to/kaldi/root
export OPENFST_ROOT=/path/to/openfst/root
# on UNIX, Kaldi needs to be compiled with OpenBLAS
export OPENBLAS_ROOT=/path/to/openblas/root
cmake ..
make -j

I now mainly work on the sptk package; Kaldi-based development has stopped.

For developers (who want to make commits or PRs), please remember to set up pre-commit for code style formatting.

setk's People

Contributors

funcwj, gleb-shnshn

setk's Issues

How are the speech mask and noise mask obtained for online beamforming?

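Hello, reading your code I found that the initial mask for online beamforming is read from a file. Is there a method for computing it? Also, the parameter update looks like a simple smoothing with the factor alpha, which is not the method from this repo, right?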

By the way, the function do_online_beamform contains this call:
chunk = beamformer.run(speech_mask[base:base + chunk_size], stft_mat[:, :, base:base + chunk_size], noise_mask=noise_mask, normalize=ban)
but the run function has no normalize argument.
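
The alpha-based update described in this question reads like recursive (exponential) smoothing of mask-weighted spatial covariance matrices across chunks. The sketch below shows that general idea only. The helper names, the value of `alpha`, and the array layouts (mask as frames × bins, STFT as channels × bins × frames, inferred from the slicing in the call quoted above) are assumptions, and this is not the repository's implementation.

```python
import numpy as np

def masked_covariance(mask, stft):
    """Mask-weighted spatial covariance per frequency bin.

    mask: (T, F) real values in [0, 1]; stft: (N, F, T) complex.
    Returns an array of shape (F, N, N). Hypothetical helper.
    """
    weighted = stft * mask.T[None, ...]                       # (N, F, T)
    # sum over frames of mask * x x^H, separately for every bin
    cov = np.einsum("nft,mft->fnm", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-8)                 # (F,)
    return cov / norm[:, None, None]

def smooth_update(prev_cov, chunk_cov, alpha=0.8):
    """Exponential smoothing: keep alpha of the old estimate."""
    return alpha * prev_cov + (1 - alpha) * chunk_cov

# toy data: 4 channels, 5 bins, 20 frames, processed in chunks of 10 frames
rng = np.random.default_rng(0)
N, F, T, chunk_size = 4, 5, 20, 10
stft_mat = rng.standard_normal((N, F, T)) + 1j * rng.standard_normal((N, F, T))
speech_mask = rng.uniform(size=(T, F))

cov = None
for base in range(0, T, chunk_size):
    chunk_cov = masked_covariance(speech_mask[base:base + chunk_size],
                                  stft_mat[:, :, base:base + chunk_size])
    cov = chunk_cov if cov is None else smooth_update(cov, chunk_cov)
```

A larger `alpha` makes the estimate change more slowly between chunks; a smaller one tracks the current chunk more closely.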

usage of this repository

The SE project using Kaldi looks good. If you could provide detailed usage instructions, that would be perfect.

train_rnn_mask.sh

Hello,
In "train_rnn_mask.sh", the output layer includes the option "include-activation= ", but I get an error when I compile it. Can you help me?
Thank you!

Question about the suppression capability of CircularSDBeamformer

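Hello,
I am trying to implement superdirective beamforming following the superdirective material in Jacob's Design of Circular Differential Microphone Arrays, but with a 6-microphone circular array of radius 1 cm, the beam pattern I get differs considerably from the one shown in the book.

On p. 104 of the book, Fig. 6.5, "Patterns of the superdirective beamformer with a UCA (M = 6), without the symmetry constraint", shows a beam pattern with several nulls that give strong suppression of about 40 dB.

In my implementation, with the same array setup, the maximum suppression is only about 5 dB.
I also tried CircularSDBeamformer(radius=1. / 100, num_arounded=6) from setk and found the result similar to my own implementation and quite different from the book.

Could you advise what might be going wrong here?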
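
For readers who want to reproduce the comparison, here is a minimal sketch of a textbook superdirective design on a uniform circular array: weights proportional to Γ⁻¹d, where Γ is the coherence matrix of a spherically isotropic (diffuse) noise field and d is the far-field steering vector. The function names, the sound speed of 340 m/s, the 1 kHz test frequency, and the diagonal loading value are all assumptions for illustration; this is neither the book's exact formulation nor setk's CircularSDBeamformer.

```python
import numpy as np

def uca_positions(radius, num_mics):
    """Microphone coordinates (M, 2) on a circle, first mic at angle 0."""
    angles = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def steer_vector(pos, theta, freq, c=340.0):
    """Far-field steering vector for a plane wave from azimuth theta (rad)."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    delay = pos @ direction / c                      # (M,) seconds
    return np.exp(2j * np.pi * freq * delay)

def sd_weights(pos, theta, freq, c=340.0, loading=1e-6):
    """Superdirective weights w = G^-1 d / (d^H G^-1 d)."""
    dist = np.linalg.norm(pos[:, None] - pos[None], axis=-1)   # (M, M)
    # np.sinc(x) = sin(pi x) / (pi x), so this equals sin(kd) / (kd)
    gamma = np.sinc(2 * freq * dist / c)
    gamma = gamma + loading * np.eye(len(pos))       # diagonal loading
    d = steer_vector(pos, theta, freq, c)
    num = np.linalg.solve(gamma, d)
    return num / (d.conj() @ num)

def beam_pattern_db(w, pos, freq, thetas, c=340.0):
    """Response magnitude in dB over a list of azimuths."""
    resp = np.array([w.conj() @ steer_vector(pos, t, freq, c) for t in thetas])
    return 20 * np.log10(np.abs(resp) + 1e-12)

pos = uca_positions(radius=0.01, num_mics=6)
thetas = np.linspace(0, 2 * np.pi, 361)
w = sd_weights(pos, theta=0.0, freq=1000.0)
pattern = beam_pattern_db(w, pos, 1000.0, thetas)
```

On an array this small, Γ is close to singular at low frequencies, so the amount of diagonal loading trades directivity against robustness. Sweeping `loading` and the frequency is one way to check whether regularization contributes to a gap of the kind described, although it may not be the cause here.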

Inquiry

Hey, are you familiar with Kaldi's keyword module? I'm a student too.

--src-begin=32000,0 \ --src-sdr=3 \

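When mixing two .wav files, if I am mixing the two original recordings, what should the two parameters above be changed to? Or can I just use the values as they are?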
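
As background for this question, here is a sketch of how mixing two signals at a target ratio is commonly done. It assumes that an option like --src-sdr denotes the power ratio in dB between the first source and the second, and that an option like --src-begin gives start offsets in samples. Both readings are guesses from the option names, not the script's documented behavior, and `mix_at_sdr` is a hypothetical helper.

```python
import numpy as np

def mix_at_sdr(target, interferer, sdr_db, offset=0):
    """Add `interferer` to `target` starting at `offset` samples,
    scaled so that target power / interferer power equals sdr_db (dB).

    Hypothetical helper for illustration, not the repository's script.
    """
    mix_len = max(len(target), offset + len(interferer))
    padded = np.zeros(mix_len)
    padded[:len(target)] = target
    placed = np.zeros(mix_len)
    placed[offset:offset + len(interferer)] = interferer
    # powers are measured on the unpadded signals
    p_t = np.mean(target ** 2)
    p_i = np.mean(interferer ** 2)
    scale = np.sqrt(p_t / (p_i * 10 ** (sdr_db / 10)))
    return padded + scale * placed, scale

rng = np.random.default_rng(0)
src1 = rng.standard_normal(16000)
src2 = 0.1 * rng.standard_normal(8000)
mix, scale = mix_at_sdr(src1, src2, sdr_db=3.0, offset=4000)
achieved = 10 * np.log10(np.mean(src1 ** 2) / np.mean((scale * src2) ** 2))
```

Under these assumptions, the ratio sets how loud the second source is relative to the first and the offsets set where each source starts, so both are free choices of the simulation and not properties of the input files.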

error occurred while compiling setk with kaldi & OpenBLAS

I patched kaldi/src/matrix/matrix-common.h in kaldi and followed the instruction:

mkdir build
cd build
export KALDI_ROOT=/kaldi/root/dir
export OPENBLAS_ROOT=/openblas/root/dir
cmake ..
make -j

then got this error below:

/usr/include/c++/5/bits/basic_string.h:5172:5: note: candidate: template<class _CharT, class _Traits, class _Alloc> std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&)
operator<<(basic_ostream<_CharT, _Traits>& __os,
^
/usr/include/c++/5/bits/basic_string.h:5172:5: note: template argument deduction/substitution failed:
......./setk/include/beamformer.cc:174:50: note: 'kaldi::MessageLogger' is not derived from 'std::basic_ostream<_CharT, _Traits>'
<< std::imag(s) << ")" << std::endl;
^
include/CMakeFiles/setk.dir/build.make:182: recipe for target 'include/CMakeFiles/setk.dir/beamformer.cc.o' failed
make[2]: *** [include/CMakeFiles/setk.dir/beamformer.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:85: recipe for target 'include/CMakeFiles/setk.dir/all' failed
make[1]: *** [include/CMakeFiles/setk.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Can somebody help, please?
Thanks.

kaldi-android

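Hi, I read your blog post "Model Development with Kaldi on Android", but I could not find how you use Kaldi's nnet and feat modules on Android. I currently want to use Kaldi's decoding module on Android, but very little material covers this.
Would it be possible to see how you call the nnet and feat modules?
Thanks!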

mask.scp file for adaptive beamformer

Hello,
I would like to understand how msk.scp is defined in the adaptive_beamformer example in setk/doc/adaptive_beamformer/ so that I can run it. I tried creating a mask.scp file containing the path to the egs.npy mask file produced by the previous scripts. Any help would be appreciated.
