This project forked from yuguochencuc/dbt-net

DBT-Net

The audio demos for the paper "DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement" are provided (accepted by IEEE TASLP). The code and the pretrained model are also released.

Overall architecture:

[Image: overall architecture of DBT-Net]

Code:

You can use dual_aia_trans_merge_crm() in aia_trans.py for dual-branch speech enhancement (SE), while aia_complex_trans_mag() and aia_complex_trans_ri() are single-branch approaches. The trained weights for the VB dataset, the 30h WSJ0-SI84 dataset, and the 300h 2020 DNS-Challenge dataset are also provided. You can directly perform inference or fine-tune the model using vb_aia_merge_new.pth.tar.

requirements:

CUDA 10.1
torch == 1.8.0
pesq == 0.0.1
librosa == 0.7.2
SoundFile == 0.10.3

How to train

Step1

Prepare your data. Run json_extract.py to generate JSON files, which record the utterance file names for both the training and validation sets.

# Run json_extract.py
python json_extract.py
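To make the expected output of this step concrete, here is a minimal sketch of writing one JSON list of utterance file names per subset. It is not the repository's json_extract.py: the function name write_file_lists, the output file names train_files.json and valid_files.json, and the assumption that each subset lives in a single directory of .wav files are all hypothetical and only illustrate the idea.

```python
import json
from pathlib import Path


def write_file_lists(train_dir, valid_dir, out_dir, pattern="*.wav"):
    """Write one JSON list of utterance file names per subset.

    Hypothetical helper: the directory layout and output names are
    assumptions, not the format used by the repository's script.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for subset, wav_dir in (("train", train_dir), ("valid", valid_dir)):
        # Sort so the list is reproducible across runs and file systems.
        names = sorted(p.name for p in Path(wav_dir).glob(pattern))
        out_path = out_dir / f"{subset}_files.json"
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(names, f, indent=2)
        written[subset] = names
    return written
```

Calling write_file_lists("data/train/noisy", "data/valid/noisy", "Json") with your own directories would then produce the two lists; check the real script for the file names and structure that the training code expects.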

Step2

Change the parameter settings according to your directory layout (in config_vb.py or config_dns.py).
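Since a wrong path in the config usually only surfaces once training starts, a quick sanity check can help. The sketch below reports which configured paths do not exist on disk. The setting names json_dir, file_path, and checkpoint_dir are hypothetical placeholders; substitute the variables actually defined in config_vb.py or config_dns.py.

```python
from pathlib import Path


def missing_paths(settings):
    """Return the names of settings whose path does not exist on disk."""
    return [name for name, path in settings.items() if not Path(path).exists()]


# Hypothetical setting names and values; replace them with the variables
# actually defined in config_vb.py or config_dns.py.
settings = {
    "json_dir": "./Json",
    "file_path": "./Dataset/VB",
    "checkpoint_dir": "./BEST_MODEL",
}
print("Missing paths:", missing_paths(settings))
```

An empty list means every configured path was found.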

Step3

Network training (you can also use the aia_complex_trans_mag() and aia_complex_trans_ri() networks in aia_trans.py for single-branch SE).

# Run main_vb.py or main_dns.py to begin network training.
# solver_merge.py and train_merge.py contain the detailed training process.
python main_vb.py

Inference:

The trained weights are provided in BEST_MODEL.

# Run enhance_vb.py or enhance_wsj.py to enhance the noisy speech samples.
python enhance_vb.py

Experimental Results

WSJ0-SI84 Dataset

[Image: results on the WSJ0-SI84 dataset]

DNSMOS

[Image: DNSMOS results]

Voice-Bank + Demand dataset

[Image: results on the Voice-Bank + Demand dataset]

Spectrogram Visualization

[Image: spectrogram visualization]
