cstr-edinburgh / magphase

MagPhase Vocoder: Speech analysis/synthesis system for TTS and related applications.

License: Apache License 2.0

Python 99.36% Shell 0.64%
tts merlin synthesis vocoder speech-analysis phase-spectra

magphase's Introduction

CSTR-Edinburgh blog

This is the repository for the CSTR-Edinburgh blog. It is based on Jekyll Now.

The intention of the blog is to showcase our projects, e.g. an abstract plus a figure or table from a published paper, with links to the paper, source code and results. Posts should be brief: rather than going into too much detail, they should refer to the respective papers.

Creating a new post

  1. Go to the _posts directory
  2. Click Create new file
  3. Include the header below, and modify its title and the body:
  4. Save in the format year-month-day-title.md

Boilerplate header:

---
layout: post
title: Modify this title.
author: Your Name
---

Your content goes here.
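
The steps above can also be scripted. The following is a minimal sketch, not part of the blog repository: `make_post` is a hypothetical helper, and lowercasing the title and joining its words with hyphens is an assumption about how the title part of the filename should look.

```python
import re
from datetime import date


def make_post(title, author, body, day=None):
    """Return (filename, file contents) for a new post.

    Hypothetical helper: the slug rule (lowercase, hyphen-separated)
    is an assumption, not something the blog README specifies.
    """
    day = day or date.today()
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    filename = f"{day:%Y-%m-%d}-{slug}.md"   # year-month-day-title.md
    header = (
        "---\n"
        "layout: post\n"
        f"title: {title}\n"
        f"author: {author}\n"
        "---\n"
    )
    return filename, header + "\n" + body + "\n"


filename, text = make_post(
    "MagPhase Vocoder", "Your Name", "Your content goes here.", date(2018, 4, 1)
)
```

The returned text can then be saved under the returned filename in the _posts directory.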

Images

  • Place images in CSTR-Edinburgh.github.io/images/, i.e. go to that directory and click Upload files.
  • Prefix image filenames with the filename of your blog post, e.g.: year-month-day-title-imagename.png
  • Include them in your text by: ![an image alt text]({{ site.baseurl }}/images/year-month-day-title-imagename.png "an image title")
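
As an illustration of the naming rule, the sketch below builds the image line for a post. `image_markdown` is a hypothetical helper and the example names are made up; only the output pattern comes from the list above.

```python
def image_markdown(post_name, image_name, alt_text, title):
    """Build the Markdown image line for a post (hypothetical helper).

    post_name is the post's filename without the ".md" extension.
    """
    # Doubled braces in the f-string produce the literal Liquid tag.
    path = f"{{{{ site.baseurl }}}}/images/{post_name}-{image_name}"
    return f'![{alt_text}]({path} "{title}")'


line = image_markdown(
    "2018-04-01-magphase-vocoder", "spectrum.png",
    "an image alt text", "an image title",
)
```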

magphase's People

Contributors

felipeespic, oliverwatts, ronanki

magphase's Issues

Is variable frame rate essential for quality?

To my understanding, the MagPhase vocoder can also use a fixed frame rate. I guess the advantages of using a variable frame rate are computational efficiency and quality. If my understanding is correct, I'd like to know which of the two benefits more, because if a fixed frame rate can be used, it is easier to integrate the vocoder into existing TTS frameworks.

read_reaper_est_file: IndexError: too many indices for array

Error:

Traceback (most recent call last):
  File "0_batch_feature_extraction_for_merlin.py", line 63, in <module>
    feat_extraction(in_wav_dir, token, out_feats_dir)
  File "0_batch_feature_extraction_for_merlin.py", line 40, in feat_extraction
    mp.analysis_compressed(wav_file, out_dir=out_feats_dir)
  File "/home/mtoman/MagPhase/src/magphase.py", line 1607, in analysis_compressed
    m_mag, m_real, m_imag, v_f0, fs, v_shift = analysis_lossless(wav_file, fft_len=fft_len)
  File "/home/mtoman/MagPhase/src/magphase.py", line 1582, in analysis_lossless
    v_pm_sec, v_voi = la.read_reaper_est_file(est_file, check_len_smpls=len(v_sig), fs=fs)
  File "/home/mtoman/MagPhase/src/libaudio.py", line 455, in read_reaper_est_file
    v_pm_sec = m_data[:,0]
IndexError: too many indices for array

In:
https://github.com/CSTR-Edinburgh/magphase/blob/master/src/libaudio.py#L452

Caused by an EST file
temp_15857.est.txt
(the input sound sample looks pretty normal)

Potential solution:
Adding ndmin=2 to the loadtxt call ensures the returned array always has two dimensions, but this leads to another error later on.
So perhaps we should raise an error here and skip the file in question?
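
The effect of ndmin=2 can be reproduced outside MagPhase. The sketch below assumes the problem file contained a single data row; the row itself (a time and a voicing flag) is made up for illustration and is not taken from temp_15857.est.txt.

```python
import io

import numpy as np

# Hypothetical file body with a single data row (time in seconds, voicing flag).
one_row = "0.005 1\n"

# Default behaviour: a single row collapses to a 1-D array,
# so m_data[:, 0] would raise "IndexError: too many indices for array".
m_data = np.loadtxt(io.StringIO(one_row))

# With ndmin=2 the result is always 2-D, so column indexing works.
m_data_2d = np.loadtxt(io.StringIO(one_row), ndmin=2)
v_pm_sec = m_data_2d[:, 0]
```

A file that yields only one pitch mark may still be unusable downstream, which would be consistent with the follow-up error mentioned above.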

16k sample rate audio error

Sorry if this question is too naive.

Can I use this tool on audio with a 16 kHz sample rate? I tried to run the demo code on a 16 kHz file and got this error:

user@localhost:~/Downloads/magphase-master/demos$ python demo_copy_synthesis_lossless.py
Analysing.....................................................
Extracting epochs with REAPER...
Residual symmetry: P:7371.288574  N:7185.912109  MEAN:-840.706116
Inverting signal
Traceback (most recent call last):
  File "demo_copy_synthesis_lossless.py", line 71, in <module>
    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file_orig)
  File "/home/user/Downloads/magphase-master/src/magphase.py", line 2891, in analysis_lossless
    m_fft, v_shift = analysis_with_del_comp_from_pm(v_sig, fs, v_pm_smpls, fft_len=fft_len)
  File "/home/user/Downloads/magphase-master/src/magphase.py", line 291, in analysis_with_del_comp_from_pm
    l_frms, v_lens, v_pm_plus, v_shift, v_rights = windowing(v_in_sig, v_pm_smpls_defi, win_func=win_func)
  File "/home/user/Downloads/magphase-master/src/magphase.py", line 108, in windowing
    v_frm = v_frm * v_win
ValueError: operands could not be broadcast together with shapes (217,) (421,) 

Thanks,
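
For readers unfamiliar with this ValueError, the sketch below shows one generic way a frame and its window can end up with different lengths: NumPy silently truncates a slice that runs past the end of an array. This is not a diagnosis of the MagPhase code path; the signal, the indices and the `padded_frame` helper are all hypothetical, chosen only to reproduce the shapes in the message.

```python
import numpy as np

v_sig = np.zeros(1000)        # placeholder signal
win_len = 421
start = 783                   # only 217 samples remain after this index

v_win = np.hanning(win_len)
v_frm = v_sig[start:start + win_len]   # truncated silently to 217 samples
# v_frm * v_win would fail: shapes (217,) and (421,) cannot be broadcast.


def padded_frame(v_sig, start, win_len):
    """Return exactly win_len samples, zero-padding past the end of the signal."""
    v_frm = v_sig[start:start + win_len]
    return np.pad(v_frm, (0, win_len - len(v_frm)), mode="constant")


v_out = padded_frame(v_sig, start, win_len) * v_win   # shapes now match
```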

RuntimeWarning: invalid value encountered in divide

By @hyuezhi:

Hi, when I run the code on my own data (child voice data), I always get the warning "magphase.py:352: RuntimeWarning: invalid value encountered in divide". Is there anything I did wrong?
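
NumPy emits this warning when a division produces NaN, for example 0/0. The sketch below shows a common way to guard such a division. It is a generic pattern, not the code at magphase.py:352, and `safe_divide` and its fill value are hypothetical.

```python
import numpy as np


def safe_divide(v_num, v_den, fill=0.0):
    """Divide element-wise, writing `fill` wherever the denominator is zero."""
    v_num = np.asarray(v_num, dtype=float)
    v_den = np.asarray(v_den, dtype=float)
    v_out = np.full(np.broadcast(v_num, v_den).shape, fill)
    # Positions where the mask is False keep the fill value already in v_out.
    np.divide(v_num, v_den, out=v_out, where=(v_den != 0))
    return v_out


v_ratio = safe_divide([0.0, 4.0], [0.0, 2.0])
```

Whether replacing the undefined values is acceptable depends on what the quantity represents, so the warning is worth investigating rather than only silencing.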

Magphase vocoder

I am trying to use Magphase in a Python 3.6 environment. After modifying some Magphase files to work with my Python version, I still have a problem:

@6985130d66b7:~/magphase/demos$ python demo_copy_synthesis_low_dim.py
Analysing.....................................................
Extracting epochs with REAPER...
Residual symmetry: P:2120.314453 N:1149.693604 MEAN:0.606241
Inverting signal
Synthesising.................................................
Traceback (most recent call last):
  File "demo_copy_synthesis_low_dim.py", line 79, in <module>
    v_syn_sig = mp.synthesis_from_compressed(m_mag_mel_log, m_real_mel, m_imag_mel, v_lf0_smth, fs, b_const_rate=b_const_rate, b_out_hpf=False)
  File "/magphase/src/magphase.py", line 854, in synthesis_from_compressed
    m_mag = np.exp(la.sp_mel_unwarp(m_mag_mel_log, fft_len_half, alpha=alpha, in_type='log'))
  File "/magphase/src/libaudio.py", line 683, in sp_mel_unwarp
    m_sp_unwr = mcep_to_sp_cosmat(m_mcep[:,:ncoeffs], nbins_out, alpha=alpha, out_type=in_type)
  File "/magphase/src/libaudio.py", line 611, in mcep_to_sp_cosmat
    v_bins_out = np.linspace(0, np.pi, num=n_spbins)
  File "<array_function internals>", line 6, in linspace
  File "/.local/lib/python3.6/site-packages/numpy/core/function_base.py", line 113, in linspace
    num = operator.index(num)
TypeError: 'float' object cannot be interpreted as an integer
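
A frequent cause of this TypeError when porting Python 2 code is that "/" between integers returns a float in Python 3. The sketch below illustrates the difference. It assumes, without having checked the MagPhase source, that the bin count is derived from an FFT length by division; the value 4096 is only an example.

```python
import numpy as np

fft_len = 4096                    # example value only

n_float = fft_len / 2 + 1         # 2049.0: "/" is true division in Python 3
n_int = fft_len // 2 + 1          # 2049: "//" keeps the result an int

# An integer count is accepted by np.linspace.
v_bins = np.linspace(0, np.pi, num=n_int)

# Passing n_float instead raises the TypeError on recent NumPy versions;
# int(n_float) is another way to convert an existing float count.
```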

Adding magphase to Merlin configuration.py, output dims?

The script that extracts features for magphase says that it typically extracts 60 mag, 45 real, and 45 imag features. I am using 48 kHz audio, just like in the script, so are those numbers correct for my case? Are delta or delta-delta features extracted as well? What should I put in configuration.py as the output dimensions for these features?
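
The arithmetic behind the question can be sketched as follows. It assumes that the extracted features are static only and that the training toolkit appends delta and delta-delta windows to each stream; both points are assumptions that should be checked against the Merlin recipe in use. The one-dimensional lf0 stream is also an assumption.

```python
# Static dimensions quoted in the question, plus an assumed 1-D lf0 stream.
static_dims = {"mag": 60, "real": 45, "imag": 45, "lf0": 1}

N_WINDOWS = 3   # assumption: static + delta + delta-delta

dims_with_deltas = {name: n * N_WINDOWS for name, n in static_dims.items()}

total_static = sum(static_dims.values())
total_with_deltas = sum(dims_with_deltas.values())
```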

run 0_batch_feature_extraction_for_merlin.py error

When I run 0_batch_feature_extraction_for_merlin.py, I get the following error:

Analysing file: 108955.wav............................
Extracting epochs with REAPER...
Residual symmetry: P:934.997803 N:1402.585083 MEAN:-0.363143
Traceback (most recent call last):
  File "0_batch_feature_extraction_for_merlin.py", line 63, in <module>
    feat_extraction(in_wav_dir, file_name_token, out_feats_dir)
  File "0_batch_feature_extraction_for_merlin.py", line 40, in feat_extraction
    mp.analysis_compressed(wav_file, out_dir=out_feats_dir)
  File "/mnt/diskd/deng/training-tts-magphase/src/magphase.py", line 1585, in analysis_compressed
    m_mag, m_real, m_imag, v_f0, fs, v_shift = analysis_lossless(wav_file, fft_len=fft_len)
  File "/mnt/diskd/deng/training-tts-magphase/src/magphase.py", line 1565, in analysis_lossless
    m_fft, v_shift = analysis_with_del_comp_from_pm(v_sig, fs, v_pm_smpls, fft_len=fft_len)
  File "/mnt/diskd/deng/training-tts-magphase/src/magphase.py", line 196, in analysis_with_del_comp_from_pm
    raise ValueError("fft_len (%d) is shorter than the maximum frame length (%d). Please, increase de FFT length." % (fft_len,len_max))
ValueError: fft_len (4096) is shorter than the maximum frame length (14972). Please, increase de FFT length.

Can someone give me a hand?
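
The error message asks for a larger FFT length. A minimal sketch for choosing one is shown below; `next_pow2` is a hypothetical helper, and using a power of two is a common convention rather than a documented MagPhase requirement. It does not address why a frame of 14972 samples occurs in the first place.

```python
def next_pow2(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


len_max = 14972                 # maximum frame length from the error message
fft_len = next_pow2(len_max)    # candidate value for the fft_len argument
```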

demo_run_for_merlin has lower result voice quality than demo_copy_synthesis_low_dim

I extracted features and resynthesised speech from the same source audio, then compared the results of the demo_run_for_merlin scripts and demo_copy_synthesis_low_dim.sh. Comparing them in Audacity, the output of demo_run_for_merlin has lower voice quality than that of demo_copy_synthesis_low_dim.sh. Could you please help me improve it? I will use the MagPhase vocoder for my TTS project with Merlin and want the best possible quality. Thank you so much.

Constant-rate features vs variable-rate labels

Hi Felipe,

I've been comparing the quality of Magphase copy-synthesis (low-dim) on a female voice depending on the value of b_const_rate (I used mag_dim=60 and phase_dim=45). It seemed to me that the constant-rate version has a little more buzziness, but I somehow found it preferable to the pitch-synchronous version, which is sometimes a bit noisy. (It looks as if the interpolation were filtering the noise, albeit a bit too much.) Does this match your observations?

Also, I have a question regarding the training with Merlin. Is it preferable to use the constant-rate version (b_const_rate = 1) or to use pitch-synchronous features and warp the labels (b_conv_labs_rate = 1)?

Thanks!
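
To make the constant-rate idea concrete, the sketch below resamples variable-rate frames onto a fixed time grid by linear interpolation. It is a generic illustration, not MagPhase's own conversion: `to_constant_rate`, the frame times and the 5 ms shift are all assumptions.

```python
import numpy as np


def to_constant_rate(m_feats, v_times_sec, frame_shift_sec=0.005):
    """Linearly interpolate each feature column onto a constant-rate grid.

    m_feats: (n_frames, n_dims) features at irregular frame times.
    v_times_sec: increasing frame times in seconds, one per row.
    """
    v_grid = np.arange(v_times_sec[0], v_times_sec[-1] + 1e-9, frame_shift_sec)
    m_out = np.column_stack(
        [np.interp(v_grid, v_times_sec, m_feats[:, k])
         for k in range(m_feats.shape[1])]
    )
    return m_out, v_grid


# Toy input: irregular frame times and two features that are linear in time.
v_times_sec = np.array([0.0, 0.004, 0.011, 0.015, 0.024])
m_feats = np.column_stack([v_times_sec, 2.0 * v_times_sec])
m_const, v_grid = to_constant_rate(m_feats, v_times_sec)
```

Linear interpolation between neighbouring frames acts as a mild smoother, which is one possible reading of the filtering effect described above.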

MagPhase vocoder v2.0 released (April 2018)

New in Version 2.0 (April 2018):

  • Constant frame-rate support.
  • Improved sound quality.
  • Two types of post-filter available.
  • Selectable number of coefficients for phase features (real and imag).
  • Selectable number of coefficients for the magnitude feature (mag).

run copy_syn error

When I run demo_copy_synthesis_lossless.py, demo_copy_synthesis_low_dim.py or 2_batch_wave_generation.py, I always get the following error: A value in x_new is above the interpolation range

Analysing.....................................................
Extracting epochs with REAPER...
Residual symmetry: P:2120.314453 N:1149.693604 MEAN:0.606241
Inverting signal
Synthesising.................................................
Traceback (most recent call last):
  File "./demo_copy_synthesis_low_dim.py", line 74, in <module>
    v_syn_sig = mp.synthesis_from_compressed(m_mag_mel_log, m_real_mel, m_imag_mel, v_lf0, fs, fft_len)
  File "/home/huawei/magphase-master/src/magphase.py", line 530, in synthesis_from_compressed
    m_real_mel = f_intrp_real(np.arange(ncoeffs_mag))
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
    out_of_bounds = self._check_bounds(x_new)
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 528, in _check_bounds
    raise ValueError("A value in x_new is above the interpolation "
ValueError: A value in x_new is above the interpolation range.

I also found that labs_var_rate/ is always empty. Can you give me some help? ^_^
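
The ValueError comes from SciPy's interp1d, which by default refuses query points outside the range it was built on. The sketch below reproduces it and shows two generic workarounds. The array sizes are hypothetical, and neither option explains why the query range and the data range disagree inside MagPhase.

```python
import numpy as np
from scipy.interpolate import interp1d

v_x = np.arange(45)                  # hypothetical: 45 known points
v_y = np.linspace(0.0, 1.0, 45)
f_intrp = interp1d(v_x, v_y)         # out-of-range queries raise by default

v_x_new = np.arange(60)              # queries 45..59 lie above the range
try:
    f_intrp(v_x_new)
    raised = False
except ValueError:
    raised = True

# Option 1: clip the queries to the known range before evaluating.
v_clipped = f_intrp(np.clip(v_x_new, v_x[0], v_x[-1]))

# Option 2: let interp1d extrapolate linearly beyond the range.
f_extrap = interp1d(v_x, v_y, fill_value="extrapolate")
v_extrap = f_extrap(v_x_new)
```

Both options hide a mismatch rather than fix it, so checking that the arguments passed to the synthesis function match its current signature is a sensible first step.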
