
fdndlp's Introduction

Frequency Domain Variance-normalized Delayed Linear Prediction Algorithm

Introduction

This program is an implementation of variance-normalized delayed linear prediction in the time-frequency domain, aimed at speech dereverberation and known as the weighted prediction error (WPE) method.
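
As a rough sketch of the underlying model (notation follows the reference listed below, not the variable names used in the code), the dereverberated signal in each frequency bin is the observation minus a delayed linear prediction, and the prediction filter is estimated from variance-normalized (weighted) prediction errors:

    d_{t,f} = x_{t,f} - \sum_{\tau=\Delta}^{\Delta+K-1} g_{\tau,f}^{H} \, x_{t-\tau,f},
    \qquad
    g_f = \arg\min_{g} \sum_{t} \frac{|d_{t,f}|^{2}}{\lambda_{t,f}}

where \Delta is the prediction delay, K the filter order, and \lambda_{t,f} the short-time variance of the desired signal, re-estimated from d_{t,f} at each iteration.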

Requirements

  • MATLAB Code
    • Signal Processing Toolbox
  • Python Code
    • Python 3.x
    • NumPy
    • soundfile
    • matplotlib (Optional)

Run the Demo

  • MATLAB code

    • Run the script demo_fdndlp.m in MATLAB; by default it uses the audio sample in wav_sample.
    • To use your own data, change filepath and sample_name in demo_fdndlp.m.
    • The configurations are gathered in config.m. Change these settings with care.
  • Python code

    • Usage:
      python wpe.py [-h] [-o OUTPUT] [-m MIC_NUM] [-n OUT_NUM] [-p ORDER] filename
    • To use the default configurations and the given audio sample, run:
      python wpe.py ../wav_sample/sample_4ch.wav
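
For example, the optional flags shown in the usage string can be combined as follows (the values below are illustrative choices, not recommended defaults, and the output filename is arbitrary):

      python wpe.py -m 4 -n 1 -p 30 -o drv_out.wav ../wav_sample/sample_4ch.wav

Assuming the flags map directly to their metavariables (MIC_NUM, OUT_NUM, ORDER, OUTPUT), this processes four microphone channels, writes one output channel using prediction order 30, and saves the result to drv_out.wav.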

Layout

 ./
 +-- matlab/                          matlab code files
 |   +-- lib/
 |   |   +-- +util/                   utility functions
 |   |   |-- stftanalysis.m           
 |   |   |-- stftsynthesis.m
 |   |-- demo_fdndlp.m
 |   |-- fdndlp.m
 |   |-- config.m
 +-- python/                          python code files
 |   |-- wpe.py
 |   |-- stft.py
 +-- wav_sample/                      audio samples
 |   |-- sample_4ch.wav               reverberant speech
 |   |-- drv_sample_4ch.wav           dereverberated speech
 |-- README.md

Reference

WPE speech dereverberation

Nakatani T., Yoshioka T., Kinoshita K., et al. Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(7): 1717-1731.

fdndlp's People

Contributors

alex-ht, helianvine


fdndlp's Issues

Can this Python code remove real-time reverb?

Hey!

I have a wet recording of a clap inside a small room (4.3 x 1.8 x 2.4 meters). I want to remove the reverb from the sound before passing it on.

Later on, I want to use this code to remove reverb from a machine room on a ship, with several electric motors, hydraulics, and valves. Is it possible to remove reverb with this code?

Improvement to reduce memory usage on large audio files

The np.diag(sigma2) call in __ndlp uses O(n^2) memory, where n grows linearly with the signal (audio) length. I believe it can be fixed by replacing a matrix multiply with an element-wise multiply. Note two things:
(1) np.dot on two 2-D arrays is interpreted as a matrix multiply.
(2) np.dot(A, np.diag(B)) = matmul(A, np.diag(B)) = A * B[np.newaxis, :]
(2.1) To see why this holds: entry (i, j) of the matrix product is the sum over k of A[i, k] * diag(B)[k, j], and every term vanishes except k = j, leaving A[i, j] * B[j]. In other words, each column of A is scaled by the corresponding entry of B, which is exactly the element-wise broadcast above.

Numerical check (run it as often as you want to verify):
import numpy as np
tmp1 = np.random.rand(2, 8)
tmp2 = np.random.rand(8)
res1 = tmp1 @ np.diag(tmp2)
res2 = tmp1 * tmp2[None, :]
print(np.allclose(res1, res2))  # True
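
To put a rough number on the memory difference (the recording length, sample rate, and hop size here are assumptions for illustration only): a 60-second recording at 16 kHz with a 256-sample hop gives on the order of n ≈ 3750 frames per frequency bin, so the dense diagonal already costs over 100 MB while the 1-D weight vector stays negligible:

import numpy as np
n = 3750                                  # ~ 60 s * 16 kHz / 256-sample hop (illustrative)
print(np.diag(np.ones(n)).nbytes / 1e6)   # ~112.5 MB for the dense n-by-n diagonal
print(np.ones(n).nbytes / 1e6)            # ~0.03 MB for the 1-D weight vector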

Code to be modified:

def __ndlp(self, xk):
    """Variance-normalized delayed linear prediction.

    This is the core WPE algorithm implementation. The input should be
    the reverberant time-frequency signal in a single frequency bin and
    the output will be the dereverberated signal in the corresponding
    frequency bin.

    Args:
        xk: A 2-dimensional numpy array with shape=(frames, input_channels)
    Returns:
        A 2-dimensional numpy array with shape=(frames, output_channels)
    """
    cols = xk.shape[0] - self.d
    xk_buf = xk[:, 0:self.out_num]
    # Zero-pad so that every frame has a full prediction history.
    xk = np.concatenate(
        (np.zeros((self.p - 1, self.channels)), xk),
        axis=0)
    xk_tmp = xk[:, ::-1].copy()
    # Stack the delayed observations into a (channels * p, cols) matrix
    # as a strided view (no extra copy).
    frames = stride_tricks.as_strided(
        xk_tmp,
        shape=(self.channels * self.p, cols),
        strides=(xk_tmp.strides[-1], xk_tmp.strides[-1] * self.channels))
    frames = frames[::-1]
    # Initial per-frame weights: channel-averaged inverse power of the observation.
    sigma2 = np.mean(1 / (np.abs(xk_buf[self.d:]) ** 2), axis=1)

    for _ in range(self.iterations):
        # Weighted covariance of the delayed observations and weighted
        # cross-correlation with the target channels.
        x_cor_m = np.dot(
            # np.dot(frames, np.diag(sigma2)),  # REPLACE THIS LINE WITH THE FOLLOWING
            frames * sigma2[None, :],
            np.conj(frames.T))
        x_cor_v = np.dot(
            frames,
            np.conj(xk_buf[self.d:] * sigma2.reshape(-1, 1)))
        # Solve for the prediction filter, subtract the predicted
        # reverberation, and update the weights from the new estimate.
        coeffs = np.dot(np.linalg.inv(x_cor_m), x_cor_v)
        dk = xk_buf[self.d:] - np.dot(frames.T, np.conj(coeffs))
        sigma2 = np.mean(1 / (np.abs(dk) ** 2), axis=1)
    return np.concatenate((xk_buf[0:self.d], dk))

Discontinuities every N samples?

Hi there. Thanks for sharing this code. I tried to run the MATLAB example in Octave, which seems fine once the sonogram display (which doesn't exist in Octave) is removed. I noticed, however, that every N samples (default: 256) there is a discontinuity in the generated output file. This is already noticeable in the included example as a slight, regular audible crackle at times, although the nature of the voice makes it less apparent; I tried various other sound sources for which the problem is strongly perceptible. Are you aware of this issue? Is it inherent in the algorithm?
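
One plausible cause worth checking (a sketch only; the actual windowing in stftsynthesis.m / stft.py has not been verified here, and the frame length and hop below are assumed values) is an analysis/synthesis window pair that does not satisfy the constant-overlap-add (COLA) condition. A non-flat overlap-add profile modulates the output at the frame rate, which is heard as a regular crackle every hop-size samples:

import numpy as np

def ola_profile(window_product, hop, n_frames=16):
    """Overlap-add one frame's analysis*synthesis window product and
    return the steady-state middle part (edge ramps removed)."""
    frame_len = len(window_product)
    out = np.zeros(hop * n_frames + frame_len)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += window_product
    return out[frame_len:-frame_len]

frame_len, hop = 512, 256   # assumed 50% overlap, not taken from the repository
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)  # periodic Hann

# Hann analysis with rectangular synthesis sums to a constant: no frame-rate ripple.
print(np.ptp(ola_profile(hann, hop)))         # ~0
# Hann analysis times Hann synthesis at 50% overlap does not: ripple every `hop` samples.
print(np.ptp(ola_profile(hann * hann, hop)))  # ~0.5

If the overlap-add profile of the actual window pair is not flat, normalizing the synthesis output by that profile (or switching to a COLA-compliant pair) removes the periodic discontinuities.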
