
Library to build speech synthesis systems designed for easy and fast prototyping.

Home Page: https://r9y9.github.io/nnmnkwii/latest/

License: Other

Python 80.64% Shell 0.16% Cython 19.20%
machine-learning speech-synthesis voice-conversion python text-to-speech speech-processing

nnmnkwii's Introduction


nnmnkwii ([nanamin kawaii])


Library to build speech synthesis systems designed for easy and fast prototyping.

Documentation

  • STABLE: most recently tagged version of the documentation.
  • LATEST: in-development version of the documentation.

Installation

The latest release is available on PyPI. Assuming you already have numpy installed, you can install nnmnkwii with:

pip install nnmnkwii

If you want the latest development version, run:

pip install git+https://github.com/r9y9/nnmnkwii

or:

git clone https://github.com/r9y9/nnmnkwii
cd nnmnkwii
python setup.py develop # or install

This should resolve the package dependencies and install nnmnkwii properly.

At the moment, nnmnkwii.autograd package depends on PyTorch. If you need autograd features, please install PyTorch as well.

Acknowledgements

The library is inspired by several open source projects.

Logo was created by Gloomy Ghost (@740291272) (#40)

nnmnkwii's People

Contributors

aria-k-alethia, hiroshiba, hyama5, jimregan, mateuszroszkowski, npn, r9y9, taroushirani, yamachu



nnmnkwii's Issues

Rounding error for the number of frames

Hi.

I found that there was a slight difference in the number of frames between linguistic features (X) and acoustic features (Y).

For example,

(filename, X_linguistic.shape, Y_acoustic.shape) (#frames, #features)
BASIC5000_0619.npy (1242, 541) (1281, 199)
BASIC5000_0538.npy (2651, 541) (2761, 199)
BASIC5000_0537.npy (587, 541) (587, 199)

https://github.com/r9y9/nnmnkwii/blob/master/nnmnkwii/frontend/merlin.py#L186

I think it would be good to modify the implementation of this part as follows:

frame_number = int((end_time - start_time) / frame_shift_in_micro_sec)
↓
frame_number = int(end_time / frame_shift_in_micro_sec) - int(start_time / frame_shift_in_micro_sec)

The original implementation in Merlin looks like this:

https://github.com/CSTR-Edinburgh/merlin/blob/9160d9f1d18fee45d1f0398779883a410a511112/src/frontend/label_normalisation.py#L209

frame_number = int(end_time/50000) - int(start_time/50000)
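The difference is easy to reproduce in isolation. A minimal sketch with made-up label times (in HTS 100 ns units, so 50000 is a 5 ms frame shift):

```python
# Hypothetical label times in HTS 100 ns units; 50000 = 5 ms frame shift.
frame_shift = 50000
start_time, end_time = 30000, 60000

# Current behavior: round the segment duration once.
n_current = int((end_time - start_time) / frame_shift)
# Proposed (Merlin-style): round each boundary separately, so per-segment
# frame counts stay consistent with the absolute frame grid.
n_proposed = int(end_time / frame_shift) - int(start_time / frame_shift)

print(n_current, n_proposed)  # 0 1
```

Summed over many segments, the per-boundary version guarantees the total frame count matches the final end time, which explains the mismatches in the table above.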

Regards.

Installation error on Python 3.7 / macOS

Command "/Users/ghostgloomy/VS/deepvoice3_pytorch/venv/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-install-jo8yuick/bandmat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-record-ia0kg871/install-record.txt --single-version-externally-managed --compile --install-headers /Users/ghostgloomy/VS/deepvoice3_pytorch/venv/include/site/python3.7/bandmat" failed with error code 1 in /private/var/folders/8v/vsyd4gdj325f52cllphs6_m40000gn/T/pip-install-jo8yuick/bandmat/

Performance optimization

At the moment my TTS demonstration notebook (PyTorch implementation) is roughly two times slower than Merlin's slt_full_demo for frame-wise training. Need to investigate once I finish the design work.

nnmnkwii:

8% 2/25 [00:22<04:19, 11.27s/it]

merlin:

2017-08-11 19:17:21,970     INFO main.train_DNN: epoch 1, validation error 164.203384, train error 171.853867  time spent 7.81
2017-08-11 19:17:29,401    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:29,635     INFO main.train_DNN: epoch 2, validation error 162.061722, train error 167.491043  time spent 7.60
2017-08-11 19:17:37,138    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:37,384     INFO main.train_DNN: epoch 3, validation error 160.627853, train error 165.390228  time spent 7.68
2017-08-11 19:17:44,817    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:45,056     INFO main.train_DNN: epoch 4, validation error 159.642395, train error 163.833908  time spent 7.60
2017-08-11 19:17:52,522    DEBUG main.train_DNN: calculating validation loss
2017-08-11 19:17:52,760     INFO main.train_DNN: epoch 5, validation error 158.908524, train error 162.641251  time spent 7.63
2017-08-11 19:18:00,324    DEBUG main.train_DNN: calculating validation loss

Installation error on Python 3.5 / Ubuntu 14.04

I installed using python3.5 -m pip -v --no-cache-dir install nnmnkwii and got the following error. Please advise how to resolve this.
......................
error: can't copy 'nnmnkwii/util/_example_data/slt_arctic_demo_data': doesn't exist or not a regular file
error
Cleaning up...
Removing source in /tmp/pip-build-vrm0moli/nnmnkwii
Command "/usr/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vrm0moli/nnmnkwii/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tw90rj7-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-vrm0moli/nnmnkwii/
Exception information:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/usr/local/lib/python3.5/dist-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/usr/local/lib/python3.5/dist-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/usr/local/lib/python3.5/dist-packages/pip/req/req_install.py", line 878, in install
spinner=spinner,
File "/usr/local/lib/python3.5/dist-packages/pip/utils/__init__.py", line 707, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/usr/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vrm0moli/nnmnkwii/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tw90rj7-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-vrm0moli/nnmnkwii/

Data files cannot be fetched.

I tried the 00-Quick start guide and got the following warning in [12].

/home/korguchi/anaconda3/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py:125: UserWarning: No files are collected. You might have specified wrong data source.
  warn("No files are collected. You might have specified wrong data source.")

I also tried the example for nnmnkwii.datasets.FileSourceDataset. The same problem occurred.

Environment

  • Ubuntu 16.04.3 LTS (Windows Subsystem for Linux)
  • Anaconda3 Python3.6

Support JVS as Dataset

Hi,

I wonder if you would like to add JVS as a dataset; it is a Japanese multi-speaker speech dataset.

If you want, I think I can write the code.

Namespace decision: function/pre-processing/util

Currently it's not obvious how the packages (namespaces) differ. Utility functions can be used for pre-processing, and utility functions are of course functions. Maybe merge them into a single function module?

Problem when trying to install via pip install

I have a problem when I try to install the nnmnkwii module. It responds with the error shown in the attached screenshot.

I use python on window, "Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32 Type "copyright", "credits" or "license()" for more information."

Can someone suggest how to fix it?
Thank you very much.

Problem with loading the data source for the VCTK dataset

Hello @r9y9, brilliant work! But while loading the VCTK dataset with transcriptions, which are in files with this structure:

VCTK-Corpus/
    COPYING
    README
    speaker-info.txt
    txt/
        p225/
            p225_001.txt

I get the following error :

 File "prepare_accoustic_features.py", line 128, in <module>
    X = X_dataset.asarray(verbose=1)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 153, in asarray
    D = self[0].shape[-1]
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 126, in __getitem__
    *self.collected_files[idx])
  File "prepare_accoustic_features.py", line 56, in collect_features
    fs, x = wavfile.read(wav_path)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/scipy/io/wavfile.py", line 233, in read
    fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'Please call Stella.'
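For context on where such an error can come from: in nnmnkwii's FileSourceDataset, whatever extra items collect_files yields alongside each path are passed on to collect_features as additional positional arguments. A standalone mock of that hand-off (an illustration only, not the library code or the asker's script):

```python
# Mock of the collect_files -> collect_features hand-off: each collected
# entry is unpacked as positional arguments. If the tuple is ordered
# (transcription, path) instead of (path, transcription), the text ends
# up where the wav path belongs, producing exactly this FileNotFoundError.
def getitem(collected_files, collect_features, idx):
    return collect_features(*collected_files[idx])

# Hypothetical collected entry: (wav path, transcription text).
files = [("wav48/p225/p225_001.wav", "Please call Stella.")]
feats = getitem(files, lambda wav_path, text: (wav_path, len(text)), 0)
print(feats)  # ('wav48/p225/p225_001.wav', 19)
```

So the first thing to check is that collect_files returns the wav path as the first element of each entry, and that collect_features takes its arguments in the same order.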

Improved support for labels

Hi

Thanks for writing this useful library! I have been trying it for a few days and felt the need for better support for non-HTS labels.

It would be good to have something like this: https://github.com/facebookresearch/loop/blob/master/utils.py#L143 which does not depend on label files and uses nltk 's cmudict to generate phonemes.

I can contribute if you guide me.

My current workaround is that I use merlin's scripts to generate test and train labels to use with your code.

Bug in parameter generation

Hello, I found a bug in paramgen.mlpg. Specifically, the generated parameters at the beginning and end of an utterance become small even when the static mean has a large value. The following Google Colab notebook shows an example of the strange MLPG behavior:
https://colab.research.google.com/drive/1C5TzPjaDRwDKuOV_XmeCmMAnX_QxEH3P

This might be caused by using the distributions of dynamic features of the first (t=0) and final (t=T-1) frames for MLPG, although these distributions cannot be defined without using the values of frames t=-1 and t=T.
Merlin overcomes this problem by giving a very large value (100000000000) to the variance of the first and final frames.
https://github.com/CSTR-Edinburgh/merlin/blob/master/src/frontend/mlpg_fast.py
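The effect can be demonstrated without the library. Below is a self-contained, dense-matrix MLPG sketch for 1-D static + delta features; it illustrates the boundary issue and the Merlin-style variance workaround, and is not nnmnkwii's banded implementation:

```python
import numpy as np

def simple_mlpg(means, variances):
    """Dense MLPG for 1-D static + delta ([-0.5, 0, 0.5]) features.

    means, variances: (T, 2) arrays, columns = [static, delta].
    Solves (W^T P W) c = W^T P mu for the static trajectory c.
    """
    T = means.shape[0]
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)   # static window
    for t in range(T):     # delta window; boundary taps fall off the edge
        if t > 0:
            W[T + t, t - 1] = -0.5
        if t < T - 1:
            W[T + t, t + 1] = 0.5
    mu = np.concatenate([means[:, 0], means[:, 1]])
    prec = 1.0 / np.concatenate([variances[:, 0], variances[:, 1]])
    return np.linalg.solve(W.T @ (prec[:, None] * W), W.T @ (prec * mu))

T = 10
means = np.column_stack([np.full(T, 5.0), np.zeros(T)])  # static mean 5
variances = np.ones((T, 2))

buggy = simple_mlpg(means, variances)

# Merlin-style fix: huge variance for the ill-defined edge delta terms.
variances[0, 1] = variances[-1, 1] = 1e11
fixed = simple_mlpg(means, variances)

print(buggy[:2], fixed[:2])  # frames near the edges dip below 5 without the fix
```

With unit variances everywhere, the delta constraint at t=0 (which has no t=-1 neighbor) drags the first frames toward zero; inflating the edge delta variances makes those constraints uninformative and the trajectory tracks the static mean.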

If you don't mind, I'll make a PR to fix this issue.

pip install produces UnicodeDecodeError

Hi! I get a UnicodeDecodeError when trying to install your library.

pip install --user nnmnkwii
Collecting nnmnkwii
  Using cached nnmnkwii-0.0.6.tar.gz
    Complete output from command python setup.py egg_info:
    fatal: Not a git repository (or any parent up to mount point /tmp)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-1it86_s_/nnmnkwii/setup.py", line 110, in <module>
        README = open('README.rst').read()
      File "/opt/conda/envs/pytorch-py35/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 300: ordinal not in range(128)

Any thoughts what this could be and how to fix it?
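For reference, the traceback shows setup.py reading README.rst with the environment's default (ASCII) codec. One workaround is exporting a UTF-8 locale (e.g. LANG=en_US.UTF-8) before installing; on the code side, passing an explicit encoding avoids the problem entirely. A minimal reproduction and fix, using a stand-in file rather than the actual setup.py:

```python
import os
import tempfile

# Stand-in README containing non-ASCII (an em dash is a multi-byte
# UTF-8 sequence beginning with 0xe2, matching the traceback).
path = os.path.join(tempfile.mkdtemp(), "README.rst")
with open(path, "w", encoding="utf-8") as f:
    f.write("nnmnkwii \u2014 speech synthesis library\n")

# An explicit encoding makes the read independent of the locale default;
# open(path).read() would raise UnicodeDecodeError under an ASCII locale.
with open(path, encoding="utf-8") as f:
    README = f.read()

print("\u2014" in README)  # True
```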

ImportError occurs when installed via "pip install", while setup from git works well

env: Python 3.5
First, I installed nnmnkwii with "pip install nnmnkwii" and it succeeded,
but when I import nnmnkwii, an error occurs:

from nnmnkwii.preprocessing.alignment import DTWAligner
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/preprocessing/alignment.py", line 4, in
from nnmnkwii.baseline.gmm import MLPG
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/baseline/gmm.py", line 5, in
from nnmnkwii.paramgen import mlpg
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/paramgen/__init__.py", line 3, in
from ._mlpg import build_win_mats, mlpg, mlpg_grad, unit_variance_mlpg_matrix
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/paramgen/_mlpg.py", line 9, in
from nnmnkwii.util.linalg import cholesky_inv_banded
File "/export/soft/env_pytorch/img/lib/python3.5/site-packages/nnmnkwii/util/linalg.py", line 4, in
from ._linalg import dpotri_full_L, dpotri_full_U
ImportError: No module named 'nnmnkwii.util._linalg'

But after downloading the .zip file and setting up with "python setup.py develop", it works well.
I am confused.

How to reproduce lab files for jsut?

Thank you for the great repo! I'm working on vocoder training with jsut, and I tried to reproduce the lab files, but after running
perl ./segment_julius.pl jsut/
I got lab files in a quite different style from the ones downloaded in the Colab notebook.
I'm planning to train a vocoder on my own dataset, and I would prefer the same kind of lab files as the ones from Colab, because I've managed to reproduce the result with those. How can I convert the lab files after running the command? Thanks in advance.

Here's the one from colab

0 50000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]
50000 100000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[3]
100000 1200000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[4]
1200000 1250000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[5]
1250000 1300000 x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[6]
1300000 1600000 x^sil-hh+iy=t@1_2/A:0_0_0/B:1-1-2@1-1&1-4#1-3$1-4!0-1;0-1|iy/C:1+1+4/D:0_0/E:content+1@1+3&1+2#0+1/F:content_1/G:0_0/H:4=3@1=2|L-H%/I:9=6/J:13+9-2[2]

but what I've got is

0.0000000 0.3125000 silB
0.3125000 0.3525000 m
0.3525000 0.4325000 i
0.4325000 0.5225000 z
0.5225000 0.5525000 u
0.5525000 0.6525000 o
0.6525000 0.7525000 m
0.7525000 0.8225000 a
0.8225000 0.8725000 r
0.8725000 0.9725000 e:
0.9725000 1.0925000 sh
1.0925000 1.1225000 i
1.1225000 1.2325000 a
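The two outputs differ in more than formatting: Julius emits plain phoneme labels with times in seconds, while the Colab files are HTS-style full-context labels with times in 100 ns units. Producing real full-context labels requires a text-processing front end on top of the alignments, but the time-unit part of the conversion is mechanical. A hedged sketch (the helper name is made up):

```python
# Convert the time columns of a Julius-style line (seconds) to HTS
# 100 ns units; 1 second = 1e7 * 100 ns.
def julius_times_to_hts(line):
    start, end, label = line.split()
    to_hts = lambda sec: int(round(float(sec) * 1e7))
    return to_hts(start), to_hts(end), label

print(julius_times_to_hts("0.3125000 0.3525000 m"))
# (3125000, 3525000, 'm')
```

The context features (the `x^x-sil+hh=iy@...` strings) still have to come from a linguistic front end; this only lines up the time axes.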

Seq2seq

Currently I have only tested frame-wise training and sequence-wise training with exactly time-aligned linguistic/acoustic frame features, but we could use seq2seq models for roughly aligned ones. Sequence-wise training on a non-aligned dataset, which I haven't tested yet, might reveal library design weaknesses, so I should try it as soon as possible.

Non-pretrained wavenet support

Hi, I was wondering if anyone has attempted to use the WORLD vocoder features as part of a WaveNet implementation. That is, instead of vocoding through pyworld (as is done in the tutorials here), the vocoder features would be used to train the WaveNet neural vocoder as in https://github.com/kan-bayashi/PytorchWaveNetVocoder.

The wavenet implementation demonstrated here is a pretrained model (trained on English data) - I assume that would make it unsuitable for another language? As such I would like to train the vocoder myself on my own language data. I have a model trained for my language using the repo I just mentioned, but I am unsure how to bootstrap this to a full TTS system. Thanks!

numpy is a dependency, but it is also required by setup, thus a "cold" pip install nnmnkwii fails

This could be related to #64 .
Installing with pip install nnmnkwii as instructed in the main README fails when done in a fresh environment.
Running pip install numpy first, and then pip install nnmnkwii, succeeds.
This is because setup.py imports numpy:

import numpy as np

and uses it in two places to set the extension include directories:

include_dirs=[np.get_include()],

It would be preferable if these actions could be performed differently. numpy is indeed listed as a dependency of the package:

'numpy >= 1.11.0',

but it is not guaranteed to be installed before setup.py itself is executed.

Regards.

Logo

I want to replace the current logo with something great.

Different multi speaker dataset

Hi, I'm trying to use your code for a different multi-speaker dataset. I want global conditioning on speaker IDs and local conditioning on another speech-like signal. Which way do you recommend: changing the prototype functions (nnmnkwii), or changing your dataset-loading functions inside wavenet_vocoder (like the cmu_arctic one)?

Metrics package

  • Mel-cepstrum distortion (MCD)
  • Root mean squared error (RMSE)
  • VUV error?
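For the first item, here is a minimal mel-cepstral distortion sketch using the conventional dB formula, excluding the 0th (energy) coefficient; this is an illustration, not necessarily the API the package would expose:

```python
import numpy as np

def melcd_db(mc1, mc2):
    """Mel-cepstral distortion in dB between two (frames, order) arrays,
    skipping the 0th (energy) coefficient, averaged over frames."""
    diff = mc1[:, 1:] - mc2[:, 1:]
    return (10.0 / np.log(10.0)) * np.mean(
        np.sqrt(2.0 * np.sum(diff ** 2, axis=1)))

a = np.zeros((4, 25))
b = np.zeros((4, 25))
print(melcd_db(a, b))  # identical sequences give 0.0
```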

Problems with noisy, loud audio?

Hi @r9y9, I followed the synthesis tutorial in the documentation. I have tried both your DNN-based and RNN-based models, step by step following the Python tutorial. However, after training, at the test stage I got noisy, loud audio. Can you give me some advice about this problem? In fact, I made some small changes: I split the train and test phases into separate files, and both the model file and the parameters are saved locally. At the test stage, I just load those files from disk and go through the remaining test steps. Also, I don't use the test files from your tutorial, but instead randomly select a lab file from 'data/slt_arctic_full_data/label_state_align/'. Am I doing something wrong?
