
Library to build speech synthesis systems designed for easy and fast prototyping.

Home Page: https://r9y9.github.io/nnmnkwii/latest/

License: Other

Python 80.64% Shell 0.16% Cython 19.20%
machine-learning speech-synthesis voice-conversion python text-to-speech speech-processing

nnmnkwii's Introduction


nnmnkwii ([nanamin kawaii])


Library to build speech synthesis systems designed for easy and fast prototyping.

Documentation

  • STABLE: most recently tagged version of the documentation.
  • LATEST: in-development version of the documentation.

Installation

The latest release is available on PyPI. Assuming you already have numpy installed, you can install nnmnkwii with:

pip install nnmnkwii

If you want the latest development version, run:

pip install git+https://github.com/r9y9/nnmnkwii

or:

git clone https://github.com/r9y9/nnmnkwii
cd nnmnkwii
python setup.py develop # or install

This should resolve the package dependencies and install nnmnkwii properly.

At the moment, nnmnkwii.autograd package depends on PyTorch. If you need autograd features, please install PyTorch as well.

Acknowledgements

The library is inspired by several open source projects.

Logo was created by Gloomy Ghost (@740291272) (#40)

nnmnkwii's People

Contributors

aria-k-alethia, hiroshiba, hyama5, jimregan, mateuszroszkowski, npn, r9y9, taroushirani, yamachu



nnmnkwii's Issues

Rounding error for the number of frames

Hi.

I found that there was a slight difference in the number of frames between linguistic features (X) and acoustic features (Y).

For example,

(filename, X_linguistic.shape, Y_acoustic.shape) (#frames, #features)
BASIC5000_0619.npy (1242, 541) (1281, 199)
BASIC5000_0538.npy (2651, 541) (2761, 199)
BASIC5000_0537.npy (587, 541) (587, 199)

https://github.com/r9y9/nnmnkwii/blob/master/nnmnkwii/frontend/merlin.py#L186

I think it would be good to modify the implementation of this part as follows:

frame_number = int((end_time - start_time) / frame_shift_in_micro_sec)
↓
frame_number = int(end_time / frame_shift_in_micro_sec) - int(start_time / frame_shift_in_micro_sec)

The original implementation in Merlin looks like this:

https://github.com/CSTR-Edinburgh/merlin/blob/9160d9f1d18fee45d1f0398779883a410a511112/src/frontend/label_normalisation.py#L209

frame_number = int(end_time/50000) - int(start_time/50000)
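The difference is easy to reproduce in isolation. A minimal sketch with made-up label times (in HTS 100 ns units, so 50000 is a 5 ms frame shift):

```python
# Hypothetical label times in HTS 100 ns units; 50000 = 5 ms frame shift.
frame_shift = 50000
start_time, end_time = 30000, 60000

# Current behavior: round the segment duration once.
n_current = int((end_time - start_time) / frame_shift)
# Proposed (Merlin-style): round each boundary separately, so per-segment
# frame counts stay consistent with the absolute frame grid.
n_proposed = int(end_time / frame_shift) - int(start_time / frame_shift)

print(n_current, n_proposed)  # 0 1
```

Summed over many segments, the per-boundary version guarantees the total frame count matches the final end time, which explains the mismatches in the table above.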

Regards.

Installation error on Python 3.7 / macOS

Command "/Users/ghostgloomy/VS/deepvoice3_pytorch/venv/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-install-jo8yuick/bandmat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-record-ia0kg871/install-record.txt --single-version-externally-managed --compile --install-headers /Users/ghostgloomy/VS/deepvoice3_pytorch/venv/include/site/python3.7/bandmat" failed with error code 1 in /private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-install-jo8yuick/bandmat/

Performance optimization

At the moment my TTS demonstration notebook (PyTorch implementation) is roughly two times slower than Merlin's slt_full_demo for frame-wise training. Need to investigate once I finish the design work.

nnmnkwii:

8% 2/25 [00:22<04:19, 11.27s/it]

merlin:

2017-08-11 19:17:21,970     INFO main.train_DNN: epoch 1, validation error 164.203384, train error 171.853867  time spent 7.81
2017-08-11 19:17:29,401    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:29,635     INFO main.train_DNN: epoch 2, validation error 162.061722, train error 167.491043  time spent 7.60
2017-08-11 19:17:37,138    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:37,384     INFO main.train_DNN: epoch 3, validation error 160.627853, train error 165.390228  time spent 7.68
2017-08-11 19:17:44,817    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:45,056     INFO main.train_DNN: epoch 4, validation error 159.642395, train error 163.833908  time spent 7.60
2017-08-11 19:17:52,522    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:52,760     INFO main.train_DNN: epoch 5, validation error 158.908524, train error 162.641251  time spent 7.63
2017-08-11 19:18:00,324    DEBUG main.train_DNN: calculating validation loss

Installation error on Python 3.5 / Ubuntu 14.04

I installed using python3.5 -m pip -v --no-cache-dir install nnmnkwii and got the following error. Please advise how to resolve this.
......................
error: can't copy 'nnmnkwii/util/_example_data/slt_arctic_demo_data': doesn't exist or not a regular file
error
Cleaning up...
Removing source in /tmp/pip-build-vrm0moli/nnmnkwii
Command "/usr/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vrm0moli/nnmnkwii/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tw90rj7-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-vrm0moli/nnmnkwii/
Exception information:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/usr/local/lib/python3.5/dist-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/usr/local/lib/python3.5/dist-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/usr/local/lib/python3.5/dist-packages/pip/req/req_install.py", line 878, in install
spinner=spinner,
File "/usr/local/lib/python3.5/dist-packages/pip/utils/__init__.py", line 707, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/usr/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vrm0moli/nnmnkwii/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tw90rj7-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-vrm0moli/nnmnkwii/

Data files cannot be fetched.

I tried the 00-Quick start guide and got the following warning in [12].

/home/korguchi/anaconda3/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py:125: UserWarning: No files are collected. You might have specified wrong data source.
  warn("No files are collected. You might have specified wrong data source.")

I also tried the example for nnmnkwii.datasets.FileSourceDataset. The same problem occurred.

Environment

  • Ubuntu 16.04.3 LTS (Windows Subsystem for Linux)
  • Anaconda3 Python3.6

Support JVS as Dataset

Hi,

I wonder if you would like to add JVS as a dataset; it is a Japanese multi-speaker speech dataset.

If you want, I think I can write the code.

Namespace decision: function/pre-processing/util

Currently it's not obvious how the packages (namespaces) differ. Utility functions can be used for pre-processing, and utility functions are of course functions. Maybe merge them into a single function module?

Problem when trying to install via pip install

I have a problem when I try to install the nnmnkwii module. It responds with the error shown in the attached screenshot.

I use python on window, "Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32 Type "copyright", "credits" or "license()" for more information."

Can someone suggest how to fix it?
Thank you very much.

Problem with loading the data source for the VCTK dataset

Hello @r9y9, brilliant work! But while loading the VCTK dataset with transcriptions, which are in files with this structure:

VCTK-Corpus/
    COPYING
    README
    speaker-info.txt
    txt/
        p225/
            p225_001.txt

I get the following error :

 File "prepare_accoustic_features.py", line 128, in <module>
    X = X_dataset.asarray(verbose=1)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 153, in asarray
    D = self[0].shape[-1]
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 126, in __getitem__
    *self.collected_files[idx])
  File "prepare_accoustic_features.py", line 56, in collect_features
    fs, x = wavfile.read(wav_path)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/scipy/io/wavfile.py", line 233, in read
    fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'Please call Stella.'
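For context on where such an error can come from: in nnmnkwii's FileSourceDataset, whatever extra items collect_files yields alongside each path are passed on to collect_features as additional positional arguments. A standalone mock of that hand-off (an illustration only, not the library code or the asker's script):

```python
# Mock of the collect_files -> collect_features hand-off: each collected
# entry is unpacked as positional arguments. If the tuple is ordered
# (transcription, path) instead of (path, transcription), the text ends
# up where the wav path belongs, producing exactly this FileNotFoundError.
def getitem(collected_files, collect_features, idx):
    return collect_features(*collected_files[idx])

# Hypothetical collected entry: (wav path, transcription text).
files = [("wav48/p225/p225_001.wav", "Please call Stella.")]
feats = getitem(files, lambda wav_path, text: (wav_path, len(text)), 0)
print(feats)  # ('wav48/p225/p225_001.wav', 19)
```

So the first thing to check is that collect_files returns the wav path as the first element of each entry, and that collect_features takes its arguments in the same order.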

Improved support for labels

Hi

Thanks for writing this useful library! I have been trying it for a few days and felt the need for better support for non-HTS labels.

It would be good to have something like this: https://github.com/facebookresearch/loop/blob/master/utils.py#L143 which does not depend on label files and uses nltk 's cmudict to generate phonemes.

I can contribute if you guide me.

My current workaround is that I use merlin's scripts to generate test and train labels to use with your code.

Bug in parameter generation

Hello, I found a bug in paramgen.mlpg. Specifically, the generated parameters at the beginning and end of an utterance become small even when the static mean has a large value. The following Google Colab notebook shows an example of the strange MLPG behavior:
https://colab.research.google.com/drive/1C5TzPjaDRwDKuOV_XmeCmMAnX_QxEH3P

This might be caused by using the distributions of dynamic features of the first (t=0) and final (t=T-1) frames for MLPG, although these distributions cannot be defined without using the values of frames t=-1 and t=T.
Merlin overcomes this problem by giving a very large value (100000000000) to the variance of the first and final frames.
https://github.com/CSTR-Edinburgh/merlin/blob/master/src/frontend/mlpg_fast.py
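The effect can be demonstrated without the library. Below is a self-contained, dense-matrix MLPG sketch for 1-D static + delta features; it illustrates the boundary issue and the Merlin-style variance workaround, and is not nnmnkwii's banded implementation:

```python
import numpy as np

def simple_mlpg(means, variances):
    """Dense MLPG for 1-D static + delta ([-0.5, 0, 0.5]) features.

    means, variances: (T, 2) arrays, columns = [static, delta].
    Solves (W^T P W) c = W^T P mu for the static trajectory c.
    """
    T = means.shape[0]
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)   # static window
    for t in range(T):     # delta window; boundary taps fall off the edge
        if t > 0:
            W[T + t, t - 1] = -0.5
        if t < T - 1:
            W[T + t, t + 1] = 0.5
    mu = np.concatenate([means[:, 0], means[:, 1]])
    prec = 1.0 / np.concatenate([variances[:, 0], variances[:, 1]])
    return np.linalg.solve(W.T @ (prec[:, None] * W), W.T @ (prec * mu))

T = 10
means = np.column_stack([np.full(T, 5.0), np.zeros(T)])  # static mean 5
variances = np.ones((T, 2))

buggy = simple_mlpg(means, variances)

# Merlin-style fix: huge variance for the ill-defined edge delta terms.
variances[0, 1] = variances[-1, 1] = 1e11
fixed = simple_mlpg(means, variances)

print(buggy[:2], fixed[:2])  # frames near the edges dip below 5 without the fix
```

With unit variances everywhere, the delta constraint at t=0 (which has no t=-1 neighbor) drags the first frames toward zero; inflating the edge delta variances makes those constraints uninformative and the trajectory tracks the static mean.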

If you don't mind, I'll make a PR to fix this issue.

pip install produces UnicodeDecodeError

Hi! I get a UnicodeDecodeError when trying to install your library.

pip install --user nnmnkwii
Collecting nnmnkwii
  Using cached nnmnkwii-0.0.6.tar.gz
    Complete output from command python setup.py egg_info:
    fatal: Not a git repository (or any parent up to mount point /tmp)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-1it86_s_/nnmnkwii/setup.py", line 110, in <module>
        README = open('README.rst').read()
      File "/opt/conda/envs/pytorch-py35/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 300: ordinal not in range(128)

Any thoughts what this could be and how to fix it?
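For reference, the traceback shows setup.py reading README.rst with the environment's default (ASCII) codec. One workaround is exporting a UTF-8 locale (e.g. LANG=en_US.UTF-8) before installing; on the code side, passing an explicit encoding avoids the problem entirely. A minimal reproduction and fix, using a stand-in file rather than the actual setup.py:

```python
import os
import tempfile

# Stand-in README containing non-ASCII (an em dash is a multi-byte
# UTF-8 sequence beginning with 0xe2, matching the traceback).
path = os.path.join(tempfile.mkdtemp(), "README.rst")
with open(path, "w", encoding="utf-8") as f:
    f.write("nnmnkwii \u2014 speech synthesis library\n")

# An explicit encoding makes the read independent of the locale default;
# open(path).read() would raise UnicodeDecodeError under an ASCII locale.
with open(path, encoding="utf-8") as f:
    README = f.read()

print("\u2014" in README)  # True
```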

ImportError occurs when installed via "pip install", while setup from git works well

env: Python 3.5
First, I installed nnmnkwii with "pip install nnmnkwii" and it succeeded,
but when I import nnmnkwii, an error occurs:

from nnmnkwii.preprocessing.alignment import DTWAligner
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/preprocessing/alignment.py", line 4, in
from nnmnkwii.baseline.gmm import MLPG
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/baseline/gmm.py", line 5, in
from nnmnkwii.paramgen import mlpg
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/paramgen/__init__.py", line 3, in
from ._mlpg import build_win_mats, mlpg, mlpg_grad, unit_variance_mlpg_matrix
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/paramgen/_mlpg.py", line 9, in
from nnmnkwii.util.linalg import cholesky_inv_banded
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/util/linalg.py", line 4, in
from ._linalg import dpotri_full_L, dpotri_full_U
ImportError: No module named 'nnmnkwii.util._linalg'

But after downloading the .zip file and setting up with "python setup.py develop", it works well.
I am confused.

How to reproduce lab files for jsut?

Thank you for the great repo! I'm working on vocoder training with jsut, and I tried to reproduce the lab files, but after running
perl ./segment_julius.pl jsut/
I got lab files in a quite different style from the ones downloaded in the Colab notebook.
I'm planning to train a vocoder on my own dataset, and I would prefer the same kind of lab files as the ones from Colab, because I've managed to reproduce the result with those. How can I convert the lab files after running the command? Thanks in advance.

Here's the one from colab

0 50000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]
50000 100000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[3]
100000 1200000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[4]
1200000 1250000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[5]
1250000 1300000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[6]
1300000 1600000 x^sil-hh+iy=t@1_2/A:0_0_0/B:1-1-2@1-1&1-4#1-3$1-4!0-1;0-1|iy/C:1+1+4/D:0_0/E:content+1@1+3&1+2#0+1/F:content_1/G:0_0/H:4=3@1=2|L-H%/I:9=6/J:13+9-2[2]

but what I've got is

0.0000000 0.3125000 silB
0.3125000 0.3525000 m
0.3525000 0.4325000 i
0.4325000 0.5225000 z
0.5225000 0.5525000 u
0.5525000 0.6525000 o
0.6525000 0.7525000 m
0.7525000 0.8225000 a
0.8225000 0.8725000 r
0.8725000 0.9725000 e:
0.9725000 1.0925000 sh
1.0925000 1.1225000 i
1.1225000 1.2325000 a
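The two outputs differ in more than formatting: Julius emits plain phoneme labels with times in seconds, while the Colab files are HTS-style full-context labels with times in 100 ns units. Producing real full-context labels requires a text-processing front end on top of the alignments, but the time-unit part of the conversion is mechanical. A hedged sketch (the helper name is made up):

```python
# Convert the time columns of a Julius-style line (seconds) to HTS
# 100 ns units; 1 second = 1e7 * 100 ns.
def julius_times_to_hts(line):
    start, end, label = line.split()
    to_hts = lambda sec: int(round(float(sec) * 1e7))
    return to_hts(start), to_hts(end), label

print(julius_times_to_hts("0.3125000 0.3525000 m"))
# (3125000, 3525000, 'm')
```

The context features (the `x^x-sil+hh=iy@...` strings) still have to come from a linguistic front end; this only lines up the time axes.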

Seq2seq

Currently I have only tested frame-wise training and sequence-wise training with exactly time-aligned linguistic/acoustic frame features, but we could use seq2seq models for roughly aligned ones. Sequence-wise training on a non-aligned dataset, which I haven't tested yet, might reveal library design weaknesses, so I should try it as soon as possible.

Non-pretrained wavenet support

Hi, I was wondering if anyone has attempted to use the WORLD vocoder features as part of a WaveNet implementation. That is, instead of vocoding through pyworld (as is done in the tutorials here), the vocoder features would be used to train the WaveNet neural vocoder as in https://github.com/kan-bayashi/PytorchWaveNetVocoder.

The wavenet implementation demonstrated here is a pretrained model (trained on English data) - I assume that would make it unsuitable for another language? As such I would like to train the vocoder myself on my own language data. I have a model trained for my language using the repo I just mentioned, but I am unsure how to bootstrap this to a full TTS system. Thanks!

numpy is a dependency, but it is also required by setup, thus a "cold" pip install nnmnkwii fails

This could be related to #64 .
Installing with pip install nnmnkwii as instructed in the main README fails when done in a fresh environment.
Running pip install numpy first, and then pip install nnmnkwii, succeeds.
This is because setup.py imports numpy:

import numpy as np

and uses it in two places to set the extension include directories:

include_dirs=[np.get_include()],

It would be preferable if these actions could be performed differently. numpy is indeed listed as a dependency of the package:

'numpy >= 1.11.0',

but it is not guaranteed to be installed before setup.py itself is executed.

Regards.

Logo

I want to replace the current logo with something great.

Different multi speaker dataset

Hi, I'm trying to use your code for a different multi-speaker dataset. I want global conditioning on speaker IDs and local conditioning on another speech-like signal. Which way do you recommend: changing the prototype functions (nnmnkwii), or changing your dataset-loading functions inside wavenet_vocoder (like the cmu_arctic one)?

Metrics package

  • Mel-cepstrum distortion (MCD)
  • Root mean squared error (RMSE)
  • VUV error?
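For the first item, here is a minimal mel-cepstral distortion sketch using the conventional dB formula, excluding the 0th (energy) coefficient; this is an illustration, not necessarily the API the package would expose:

```python
import numpy as np

def melcd_db(mc1, mc2):
    """Mel-cepstral distortion in dB between two (frames, order) arrays,
    skipping the 0th (energy) coefficient, averaged over frames."""
    diff = mc1[:, 1:] - mc2[:, 1:]
    return (10.0 / np.log(10.0)) * np.mean(
        np.sqrt(2.0 * np.sum(diff ** 2, axis=1)))

a = np.zeros((4, 25))
b = np.zeros((4, 25))
print(melcd_db(a, b))  # identical sequences give 0.0
```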

Problems with noisy, loud audio?

Hi @r9y9, I followed the synthesis tutorial in the documentation. I have tried both your DNN-based and RNN-based models, step by step following the Python tutorial. However, after training, at the test stage I got noisy, loud audio. Can you give me some advice about this problem? In fact, I made some small changes: I split the train and test phases into separate files, and both the model file and the parameters are saved locally. At the test stage, I just load those files from disk and go through the remaining test steps. Also, I don't use the test files from your tutorial, but instead randomly select a lab file from 'data/slt_arctic_full_data/label_state_align/'. Am I doing something wrong?
