
fdndlp's Introduction

Frequency Domain Variance-normalized Delayed Linear Prediction Algorithm

Introduction

This program is an implementation of variance-normalized delayed linear prediction in the time-frequency domain, aimed at speech dereverberation and known as the weighted prediction error (WPE) method.
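
As a rough sketch of the underlying model (notation follows the reference listed below, not the variable names used in the code), the dereverberated signal in each frequency bin is the observation minus a delayed linear prediction, and the prediction filter is estimated from variance-normalized (weighted) prediction errors:

    d_{t,f} = x_{t,f} - \sum_{\tau=\Delta}^{\Delta+K-1} g_{\tau,f}^{H} \, x_{t-\tau,f},
    \qquad
    g_f = \arg\min_{g} \sum_{t} \frac{|d_{t,f}|^{2}}{\lambda_{t,f}}

where \Delta is the prediction delay, K the filter order, and \lambda_{t,f} the short-time variance of the desired signal, re-estimated from d_{t,f} at each iteration.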

Requirements

  • MATLAB Code
    • Signal Processing Toolbox
  • Python Code
    • Python 3.x
    • NumPy
    • soundfile
    • matplotlib (Optional)

Run the Demo

  • MATLAB code

    • Run the script demo_fdndlp.m in MATLAB; by default it uses the audio sample in wav_sample.
    • To use your own data, change filepath and sample_name in demo_fdndlp.m.
    • The configurations are gathered in config.m. Change these settings with care.
  • Python code

    • Usage:
      python wpe.py [-h] [-o OUTPUT] [-m MIC_NUM] [-n OUT_NUM] [-p ORDER] filename
    • To use the default configurations and the given audio sample, run:
      python wpe.py ../wav_sample/sample_4ch.wav
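
For example, the optional flags shown in the usage string can be combined as follows (the values below are illustrative choices, not recommended defaults, and the output filename is arbitrary):

      python wpe.py -m 4 -n 1 -p 30 -o drv_out.wav ../wav_sample/sample_4ch.wav

Assuming the flags map directly to their metavariables (MIC_NUM, OUT_NUM, ORDER, OUTPUT), this processes four microphone channels, writes one output channel using prediction order 30, and saves the result to drv_out.wav.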

Layout

 ./
 +-- matlab/                          matlab code files
 |   +-- lib/
 |   |   +-- +util/                   utility functions
 |   |   |-- stftanalysis.m           
 |   |   |-- stftsynthesis.m
 |   |-- demo_fdndlp.m
 |   |-- fdndlp.m
 |   |-- config.m
 +-- python/                          python code files
 |   |-- wpe.py
 |   |-- stft.py
 +-- wav_sample/                      audio samples
 |   |-- sample_4ch.wav               reverberant speech
 |   |-- drv_sample_4ch.wav           dereverberated speech
 |-- README.md

Reference

WPE speech dereverberation

Nakatani T., Yoshioka T., Kinoshita K., et al. Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(7): 1717-1731.

fdndlp's People

Contributors

alex-ht, helianvine


fdndlp's Issues

Can this Python code remove real-time reverb?

Hey!

I have a wet recording of a clap inside a small room (4.3 x 1.8 x 2.4 meters). I want to remove the reverb from the sound before passing it on.

Later on, I want to use this code to remove reverb from a machine room on a ship, with several electric motors, hydraulics, and valves. Is it possible to remove reverb with this code?

Improvement to reduce memory usage on large audio files

The np.diag(sigma2) call in __ndlp uses O(n^2) memory, where n grows linearly with the signal (audio) length. I believe it can be fixed by replacing a matrix multiply with an element-wise multiply. Note two things:
(1) np.dot on two 2-D arrays is interpreted as a matrix multiply.
(2) np.dot(A, np.diag(B)) = matmul(A, np.diag(B)) = A * B[np.newaxis, :]
(2.1) To see why this holds: entry (i, j) of the matrix product is the sum over k of A[i, k] * diag(B)[k, j], and every term vanishes except k = j, leaving A[i, j] * B[j]. In other words, each column of A is scaled by the corresponding entry of B, which is exactly the element-wise broadcast above.

Numerical check (run it as often as you want to verify):
import numpy as np
tmp1 = np.random.rand(2, 8)
tmp2 = np.random.rand(8)
res1 = tmp1 @ np.diag(tmp2)
res2 = tmp1 * tmp2[None, :]
print(np.allclose(res1, res2))  # True
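
To put a rough number on the memory difference (the recording length, sample rate, and hop size here are assumptions for illustration only): a 60-second recording at 16 kHz with a 256-sample hop gives on the order of n ≈ 3750 frames per frequency bin, so the dense diagonal already costs over 100 MB while the 1-D weight vector stays negligible:

import numpy as np
n = 3750                                  # ~ 60 s * 16 kHz / 256-sample hop (illustrative)
print(np.diag(np.ones(n)).nbytes / 1e6)   # ~112.5 MB for the dense n-by-n diagonal
print(np.ones(n).nbytes / 1e6)            # ~0.03 MB for the 1-D weight vector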

Code to be modified:

def __ndlp(self, xk):
    """Variance-normalized delayed linear prediction.

    This is the core WPE algorithm implementation. The input should be
    the reverberant time-frequency signal in a single frequency bin and
    the output will be the dereverberated signal in the corresponding
    frequency bin.

    Args:
        xk: A 2-dimensional numpy array with shape=(frames, input_channels)
    Returns:
        A 2-dimensional numpy array with shape=(frames, output_channels)
    """
    cols = xk.shape[0] - self.d
    xk_buf = xk[:, 0:self.out_num]
    # Zero-pad so that every frame has a full prediction history.
    xk = np.concatenate(
        (np.zeros((self.p - 1, self.channels)), xk),
        axis=0)
    xk_tmp = xk[:, ::-1].copy()
    # Stack the delayed observations into a (channels * p, cols) matrix
    # as a strided view (no extra copy).
    frames = stride_tricks.as_strided(
        xk_tmp,
        shape=(self.channels * self.p, cols),
        strides=(xk_tmp.strides[-1], xk_tmp.strides[-1] * self.channels))
    frames = frames[::-1]
    # Initial per-frame weights: channel-averaged inverse power of the observation.
    sigma2 = np.mean(1 / (np.abs(xk_buf[self.d:]) ** 2), axis=1)

    for _ in range(self.iterations):
        # Weighted covariance of the delayed observations and weighted
        # cross-correlation with the target channels.
        x_cor_m = np.dot(
            # np.dot(frames, np.diag(sigma2)),  # REPLACE THIS LINE WITH THE FOLLOWING
            frames * sigma2[None, :],
            np.conj(frames.T))
        x_cor_v = np.dot(
            frames,
            np.conj(xk_buf[self.d:] * sigma2.reshape(-1, 1)))
        # Solve for the prediction filter, subtract the predicted
        # reverberation, and update the weights from the new estimate.
        coeffs = np.dot(np.linalg.inv(x_cor_m), x_cor_v)
        dk = xk_buf[self.d:] - np.dot(frames.T, np.conj(coeffs))
        sigma2 = np.mean(1 / (np.abs(dk) ** 2), axis=1)
    return np.concatenate((xk_buf[0:self.d], dk))

Discontinuities every N samples?

Hi there. Thanks for sharing this code. I tried to run the MATLAB example in Octave, which seems fine once the sonogram display (which doesn't exist in Octave) is removed. I noticed, however, that every N samples (default: 256) there is a discontinuity in the generated output file. This is already noticeable in the included example as a slight, regular audible crackle at times, although the nature of the voice makes it less apparent; I tried various other sound sources for which the problem is strongly perceptible. Are you aware of this issue? Is it inherent in the algorithm?
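
One plausible cause worth checking (a sketch only; the actual windowing in stftsynthesis.m / stft.py has not been verified here, and the frame length and hop below are assumed values) is an analysis/synthesis window pair that does not satisfy the constant-overlap-add (COLA) condition. A non-flat overlap-add profile modulates the output at the frame rate, which is heard as a regular crackle every hop-size samples:

import numpy as np

def ola_profile(window_product, hop, n_frames=16):
    """Overlap-add one frame's analysis*synthesis window product and
    return the steady-state middle part (edge ramps removed)."""
    frame_len = len(window_product)
    out = np.zeros(hop * n_frames + frame_len)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += window_product
    return out[frame_len:-frame_len]

frame_len, hop = 512, 256   # assumed 50% overlap, not taken from the repository
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)  # periodic Hann

# Hann analysis with rectangular synthesis sums to a constant: no frame-rate ripple.
print(np.ptp(ola_profile(hann, hop)))         # ~0
# Hann analysis times Hann synthesis at 50% overlap does not: ripple every `hop` samples.
print(np.ptp(ola_profile(hann * hann, hop)))  # ~0.5

If the overlap-add profile of the actual window pair is not flat, normalizing the synthesis output by that profile (or switching to a COLA-compliant pair) removes the periodic discontinuities.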
