
haoheliu / voicefixer_main

265 stars · 12 watching · 48 forks · 22.03 MB

General Speech Restoration

Home Page: https://haoheliu.github.io/demopage-voicefixer/

License: MIT License

Python 99.43% Shell 0.57%
speech-processing speech-enhancement speech-analysis speech-synthesis machine-learning tts speech-to-text speech

voicefixer_main's Introduction


2021-11-06: I have just updated the code structure to make it easier to understand. It may contain bugs at the moment; I will run some test training later.

2021-11-01: I will update the code and make it easier to use later.

VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim to restore severely degraded speech and historical recordings.


Usage

Environment (do this first)

# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh 

VoiceFixer for general speech restoration

Here we take VF_UNet (VoiceFixer with UNet as the analysis module) as an example.

  • Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json # you can modify the configuration file to personalize your training

You can check the logs directory for checkpoints, logs, and validation results.

  • Evaluation

Automatic evaluation generates a .csv file for each testset.

For example, to evaluate on all testsets (the default):

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> 

For example, to evaluate only on the general_speech_restoration (GSR) testset:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --testset  general_speech_restoration \
                    --description  general_speech_restoration_eval

You can pass the following testsets to --testset:

  • base: all testsets
  • clip: speech clipped at thresholds of 0.1, 0.25, and 0.5
  • reverb: reverberant speech
  • general_speech_restoration: speech containing all kinds of random distortions
  • enhancement: noisy speech
  • speech_super_resolution: low-resolution speech with sampling rates of 2 kHz, 4 kHz, 8 kHz, 16 kHz, and 24 kHz

If you would like to evaluate on only a small portion of the data, e.g. 10 utterances, pass the number to the --limit_numbers argument:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --limit_numbers 10 

Evaluation results will be saved in the exp_results folder.

ResUNet for general speech restoration

  • Training
# pass in a configuration file to the training script
python3 train_gsr_unet.py -c config/vctk_base_gsr_unet.json

You can check the logs directory for checkpoints, logs, and validation results.

  • Evaluation (similar to voicefixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

ResUNet for single-task speech restoration

  • Training

    • Denoising
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_denoising.json
    • Dereverberation
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_dereverberation.json
    • Super Resolution
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_super_resolution.json
    • Declipping
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_declipping.json

You can check the logs directory for checkpoints, logs, and validation results.

  • Evaluation (similar to voicefixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

Citation

@misc{liu2021voicefixer,
    title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},
    author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},
    year={2021},
    eprint={2109.13731},
    archivePrefix={arXiv},
    primaryClass={cs.SD}
}

(Images: real-life restoration examples.)

voicefixer_main's People

Contributors

ak391, anonymous20211004, haoheliu, msinanyildirim, satvik-venkatesh, ws-choi


voicefixer_main's Issues

Training VoiceFixer on Custom Dataset

Hi,
I want to restore my own speech audio samples. For that, I want to train the voicefixer repo on my custom dataset, because when I use the pretrained voicefixer, the speaker's voice changes slightly.

However, when I try to train on my custom dataset, I get the following error:
RuntimeError: The size of tensor a (301) must match the size of tensor b (934) at non-singleton dimension 2

This happens during the sanity check; the input and target tensors do not have the same size. Any idea how I can resolve this issue? Also, if you could list the steps for training the model on custom data, that would be really helpful.
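
One way to narrow this down is to verify that every input/target pair has matching length and sample rate. A minimal sketch (the directory layout and file-pairing convention below are hypothetical, not from the repo):

import glob
import os
import soundfile as sf

# Hypothetical layout: paired degraded/clean wavs with matching file names.
degraded_dir = "data/degraded"
clean_dir = "data/clean"

for degraded_path in sorted(glob.glob(os.path.join(degraded_dir, "*.wav"))):
    clean_path = os.path.join(clean_dir, os.path.basename(degraded_path))
    d, c = sf.info(degraded_path), sf.info(clean_path)
    # A length or sample-rate mismatch here typically surfaces later as a
    # "size of tensor a must match size of tensor b" error during training.
    if d.frames != c.frames or d.samplerate != c.samplerate:
        print(f"mismatch: {degraded_path} ({d.frames} frames @ {d.samplerate} Hz) "
              f"vs {clean_path} ({c.frames} frames @ {c.samplerate} Hz)")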

Thanks in advance !
Look forward to hearing from you.

Regards,
Harsh

How long does it take to train?

Hi, thanks for making this great program! What GPU did you use, and how long did it take to train? I am just curious whether it would be feasible to train my own version. Thanks!

License

Hi,
Are you planning to eventually change this license to a more permissive one (e.g., ISC or MIT)?
Thanks!

Vocoder crash

Thanks for your pip module! However, I get an error on a machine with 2 GPUs. I am using this simple code as-is (with real file paths):

from voicefixer import VoiceFixer
voicefixer = VoiceFixer()
voicefixer.restore(input="",    # input wav file path
                   output="",   # output wav file path
                   cuda=False,  # whether to use gpu acceleration
                   mode=0)      # you can try out mode 0 and 1 to find the best result

from voicefixer import Vocoder  # universal speaker-independent vocoder
vocoder = Vocoder(sample_rate=44100)  # only the 44100 sample rate is supported
vocoder.oracle(fpath="",      # input wav file path
               out_path="")   # output wav file path

The voicefixer part works OK, but the vocoder part fails with the following error:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    vocoder.oracle(fpath="/home/MassProcessor/_in/KERR_0012.wav", # input wav file path
  File "/usr/local/lib/python3.8/dist-packages/voicefixer/vocoder/base.py", line 56, in oracle
    wav_re = self.model(mel)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/voicefixer/vocoder/model/generator.py", line 95, in forward
    conditions = self.condnet(conditions)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 294, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

I tried this code on another machine without a GPU, and the vocoder worked perfectly.

Update with a workaround: turn off CUDA for the vocoder with export CUDA_DEVICE_AVAILABLE= (yes, nothing after the "=").
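
The traceback points to a device mismatch: the mel tensor ends up on the GPU while the vocoder's weights stay on the CPU. Besides the environment-variable workaround, a hedged sketch of a fix is to move the model to the GPU explicitly (this assumes the Vocoder object exposes its network as .model, as the self.model(mel) call in base.py suggests; the file paths are placeholders):

import torch
from voicefixer import Vocoder

vocoder = Vocoder(sample_rate=44100)  # only the 44100 sample rate is supported
if torch.cuda.is_available():
    # Put the generator's weights on the GPU so that F.conv1d sees
    # matching input/weight types.
    vocoder.model = vocoder.model.cuda()
vocoder.oracle(fpath="in.wav",      # placeholder input path
               out_path="out.wav")  # placeholder output path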

Train with 16k data

Hi, thanks for the awesome work.
I have a question: if I train the model using 16 kHz data, will the trained model still deliver the expected performance?

License

Please add a license file.

questions for vocoder

Hi, @haoheliu. Thank you for your awesome work.

  1. After reading the code for the vocoder part, I found that there is only a pre-trained model and no training steps. Why is there no implementation of this part? Under what setup was the pre-trained model obtained, and how does it perform?
  2. The vocoder in the original TFGAN paper does not include the subband discriminator (there is also no implementation of this part). Since I did not see a relevant explanation in the paper, what benefit or impact does the subband discriminator have on the model?

If I can get an answer, it will help me a lot.
Thank you.

about speech super-resolution

When training for speech super-resolution, you use a masking method, but the high-frequency content of the input is absent, so mask * 0 = 0. Why use masking rather than direct mapping?
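
To make the question concrete, a toy sketch (the tensors here are stand-ins, not the repo's model): a multiplicative mask cannot recreate energy in bands that are zero in the input, while a direct mapping can.

import torch

input_spec = torch.zeros(1, 257, 100)   # toy spectrogram: the band is entirely silent
mask_logits = torch.randn(1, 257, 100)  # stand-in for a network's mask prediction

# Masking: estimate = mask * input. Wherever the input is zero (the missing
# high frequencies), the product is zero no matter what the mask predicts.
est_masking = torch.sigmoid(mask_logits) * input_spec
print(est_masking.abs().sum())  # tensor(0.) -- masking cannot create new energy

# Mapping: the network outputs the spectrogram directly, so it can place
# energy in bands that are empty in the input.
est_mapping = torch.randn(1, 257, 100)  # stand-in for a direct prediction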

SISNR implementation

Hi, thanks for the awesome work. I was going through the SI-SNR implementation and noticed that you use the negative of the SDR here. If I have not grossly misunderstood anything, could you please let me know why you use the negative? Thanks.
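
For context, a generic sketch of SI-SNR used as a training loss (not the repo's exact implementation): SI-SNR is a quality measure where higher is better, so when used as a loss it is negated, because optimizers minimize.

import torch

def si_snr_loss(est, ref, eps=1e-8):
    # Zero-mean both signals along time.
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to get the target component.
    s_target = (torch.sum(est * ref, dim=-1, keepdim=True) * ref) \
               / (ref.pow(2).sum(dim=-1, keepdim=True) + eps)
    e_noise = est - s_target
    si_snr = 10 * torch.log10(
        s_target.pow(2).sum(-1) / (e_noise.pow(2).sum(-1) + eps) + eps)
    # Negate: gradient descent minimizes, but a higher SI-SNR is better.
    return -si_snr.mean()

loss = si_snr_loss(torch.randn(2, 16000), torch.randn(2, 16000))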

about checkpoint and training details

Thank you for your awesome work.

Could you provide the pre-trained checkpoint model?
For training, are the analysis and synthesis stages trained together or separately?

Pre-trained model

Can you provide a pre-trained model or trained parameters? Thanks.
