ossian's Introduction

Ossian + DNN demo

Ossian is a collection of Python code for building text-to-speech (TTS) systems, with an emphasis on easing research into building TTS systems with minimal expert supervision. Work on it started with funding from the EU FP7 Project Simple4All, and this repository contains a version which is considerably more up to date than that previously available. In particular, the original version of the toolkit relied on HTS to perform acoustic modelling. Although it is still possible to use HTS, Ossian now supports the use of neural nets trained with the Merlin toolkit as duration and acoustic models. All comments and feedback about ways to improve it are very welcome.

Python dependencies

Use the pip package installer -- within a Python virtualenv as necessary -- to get some necessary packages:

pip install numpy
pip install scipy
pip install configobj
pip install scikit-learn
pip install regex
pip install lxml
pip install argparse

We will use the Merlin toolkit to train neural networks, which adds the following dependencies:

pip install bandmat 
pip install theano
pip install matplotlib

Getting the tools

Clone the Ossian GitHub repository as follows:

git clone https://github.com/CSTR-Edinburgh/Ossian.git

This will create a directory called ./Ossian; the following discussion assumes that an environment variable $OSSIAN is set to point to this directory.

Ossian relies on the Hidden Markov Model Toolkit (HTK) and HMM-based Speech Synthesis System (HTS) for alignment and (optionally) acoustic modelling -- here are some notes on obtaining and compiling the necessary tools. To get a copy of the HTK source code it is necessary to register on the HTK website to obtain a username and password. It is assumed here that these have been obtained and that the environment variables $HTK_USERNAME and $HTK_PASSWORD hold them.

Running the following script will download and install the necessary tools (including Merlin):

./scripts/setup_tools.sh $HTK_USERNAME $HTK_PASSWORD

Acquire some data

Ossian expects its training data to be in the directories:

 ./corpus/<LANG>/speakers/<DATA_NAME>/txt/*.txt
 ./corpus/<LANG>/speakers/<DATA_NAME>/wav/*.wav

Text and wave files should be numbered consistently with each other. <LANG> and <DATA_NAME> are both arbitrary strings, but it is sensible to choose ones which make obvious sense.
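Before training, it can help to confirm that every text file has a matching wave file. The sketch below assumes that "numbered consistently" means the two files share a base name (for example utt_001.txt and utt_001.wav); the function name check_corpus is hypothetical and is not part of Ossian.

```python
import os


def check_corpus(speaker_dir):
    """Compare base names under txt/ and wav/ inside one speaker directory.

    Returns two sorted lists: base names that have text but no audio,
    and base names that have audio but no text.
    """
    def stems(subdir, ext):
        folder = os.path.join(speaker_dir, subdir)
        if not os.path.isdir(folder):
            return set()
        return {os.path.splitext(name)[0]
                for name in os.listdir(folder)
                if name.endswith(ext)}

    txt = stems("txt", ".txt")
    wav = stems("wav", ".wav")
    return sorted(txt - wav), sorted(wav - txt)
```

Call it with the path to ./corpus/<LANG>/speakers/<DATA_NAME>; two empty lists mean every utterance has both a transcript and a recording.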

Download and unpack this toy (Romanian) corpus for some guidance:

cd $OSSIAN
wget https://www.dropbox.com/s/uaz1ue2dked8fan/romanian_toy_demo_corpus_for_ossian.tar?dl=0
tar xvf romanian_toy_demo_corpus_for_ossian.tar\?dl\=0

This will create the following directory structures:

./corpus/rm/speakers/rss_toy_demo/
./corpus/rm/text_corpora/wikipedia_10K_words/

Let's start by building some voices on this tiny dataset. The results will sound bad, but if you can get it to speak, no matter how badly, the tools are working and you can retrain on more data of your own choosing. Below are instructions on how to train HTS-based and neural network based voices on this data.

You can download one-hour sets of data in various languages that we prepared here: http://tundra.simple4all.org/ssw8data.html

DNN-based voice using a naive recipe

Ossian trains voices according to a given 'recipe' -- the recipe specifies a sequence of processes which are applied to an utterance to turn it from text into speech, and is given in a file called $OSSIAN/recipes/<RECIPE>.cfg (where <RECIPE> is the name of the specific recipe you are using). We will start with a recipe called naive_01_nn. If you want to add components to the synthesiser, the best way to start will be to take the file for an existing recipe, copy it to a file with a new name and modify it.

The recipe naive_01_nn is a language independent recipe which naively uses letters as acoustic modelling units. It will work reasonably for languages with sensible orthographies (e.g. Romanian) and less well for e.g. English.

Ossian will put all files generated during training on the data <DATA_NAME> in language <LANG> according to recipe <RECIPE> in a directory called:

 $OSSIAN/train/<LANG>/speakers/<DATA_NAME>/<RECIPE>/

When it has successfully trained a voice, the components needed at synthesis are copied to:

 $OSSIAN/voices/<LANG>/<DATA_NAME>/<RECIPE>/

Assuming that we want to start by training a voice from scratch, we should make sure that these locations do not already exist for our combination of data/language/recipe, removing them if they do:

rm -r $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/ $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/
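The training and voice locations follow a fixed pattern, so it is possible to see which of them are present before deleting anything. This is a minimal sketch built only from the two path templates given above; the helper names ossian_dirs and existing_dirs are hypothetical and are not part of Ossian.

```python
import os


def ossian_dirs(ossian_root, lang, data_name, recipe):
    """Return the (train, voice) directories for one language/data/recipe combination."""
    train_dir = os.path.join(ossian_root, "train", lang, "speakers", data_name, recipe)
    voice_dir = os.path.join(ossian_root, "voices", lang, data_name, recipe)
    return train_dir, voice_dir


def existing_dirs(ossian_root, lang, data_name, recipe):
    """List whichever of the two directories is already present on disk."""
    return [d for d in ossian_dirs(ossian_root, lang, data_name, recipe)
            if os.path.isdir(d)]
```

For the toy corpus, existing_dirs(os.environ["OSSIAN"], "rm", "rss_toy_demo", "naive_01_nn") returns an empty list when there is nothing left over from an earlier run.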

Then to train, do this:

cd $OSSIAN
python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn

As various messages printed during training will inform you, Ossian does not directly support training the neural networks used for duration and acoustic modelling. The command line above prepares the data and configs needed to train the duration and acoustic models, but the Merlin toolkit needs to be called separately to actually train them. The NNs it produces then need to be converted back to a suitable format for Ossian. This is a little messy, but better integration between Ossian and Merlin is an ongoing area of development.

Here's how to do this -- these same instructions will have been printed when you called ./scripts/train.py above. First, train the duration model:

cd $OSSIAN
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg

For this toy data, training on CPU like this will be quick. Alternatively, to use GPU for training, do:

./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg

If training went OK, then you can export the trained model to a better format for Ossian. The basic problem is that the NN-TTS tools store the model as a Python pickle file -- if this is made on a GPU machine, it can only be used on a GPU machine. This script converts to a more flexible format understood by Ossian -- call it with the same config file you used for training and the name of a directory where the new format should be put:

python ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/duration_predictor

When training the duration model, there will be loads of warnings saying WARNING: no silence found! -- these are not a problem and can be ignored.

Similarly for the acoustic model:

cd $OSSIAN
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg

Or:

./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg

Then:

python ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/acoustic_predictor
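The CPU commands for the duration and acoustic models differ only in the predictor name, so they can be generated in a loop. The sketch below only assembles the same command lines shown above and, optionally, runs them; the function names are hypothetical, and it assumes the commands are run from the $OSSIAN directory with THEANO_FLAGS set to an empty string, as in the CPU examples.

```python
import os
import subprocess


def merlin_commands(ossian_root, lang, data_name, recipe, predictor):
    """Build the train and export command lines for one predictor."""
    config = os.path.join(ossian_root, "train", lang, "speakers", data_name,
                          recipe, "processors", predictor, "config.cfg")
    out_dir = os.path.join(ossian_root, "voices", lang, data_name,
                           recipe, "processors", predictor)
    train = ["python", "./tools/merlin/src/run_merlin.py", config]
    export = ["python", "./scripts/util/store_merlin_model.py", config, out_dir]
    return train, export


def all_commands(ossian_root, lang, data_name, recipe):
    """Commands for the duration model first, then the acoustic model."""
    cmds = []
    for predictor in ("duration_predictor", "acoustic_predictor"):
        cmds.extend(merlin_commands(ossian_root, lang, data_name, recipe, predictor))
    return cmds


def run_all(ossian_root, lang, data_name, recipe):
    """Run each command in order, stopping at the first failure."""
    env = dict(os.environ, THEANO_FLAGS="")  # empty flags, as in the CPU commands
    for cmd in all_commands(ossian_root, lang, data_name, recipe):
        subprocess.check_call(cmd, cwd=ossian_root, env=env)
```

Printing the result of all_commands(...) before calling run_all(...) is a cheap way to confirm that the paths match those printed by ./scripts/train.py.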

If training went OK, you can synthesise speech. There is an example Romanian sentence in $OSSIAN/test/txt/romanian.txt -- we will synthesise a wave file for it in $OSSIAN/test/wav/romanian_toy_naive.wav like this:

mkdir -p $OSSIAN/test/wav/

python ./scripts/speak.py -l rm -s rss_toy_demo -o ./test/wav/romanian_toy_naive.wav naive_01_nn ./test/txt/romanian.txt

You can find the audio for this sentence here for comparison (it was not used in training).

The configuration files used for duration and acoustic model training will work as-is for the toy data set, but when you move to other data sets, you will want to experiment with editing them to get better performance. In particular, you will want to increase training_epochs to train voices on larger amounts of data; this could be set to e.g. 30 for the acoustic model and e.g. 100 for the duration model. You will also want to experiment with learning_rate, batch_size, and network architecture (hidden_layer_size, hidden_layer_type). Currently, Ossian only supports feed-forward networks.
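These settings can be edited by hand, but a small script makes repeated experiments easier to reproduce. The sketch below is illustrative only: it assumes the options live in a section named [Architecture] and that the file can be read by Python's standard configparser, and the values in SAMPLE are placeholders rather than the contents of a real Ossian or Merlin config. The helper name set_options is hypothetical.

```python
import configparser
import io

# Placeholder config text; the section name and all values are assumptions.
SAMPLE = """\
[Architecture]
hidden_layer_size : [1024, 1024]
hidden_layer_type : ['TANH', 'TANH']
learning_rate : 0.002
batch_size : 256
training_epochs : 5
"""


def set_options(config_text, section, updates):
    """Return config_text with the given options replaced in one section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for key, value in updates.items():
        parser.set(section, key, str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Example: raise the epoch count, as suggested for larger data sets.
updated = set_options(SAMPLE, "Architecture", {"training_epochs": 30})
```

Note that configparser drops comments and writes "key = value" rather than "key : value" when it saves, so keep a copy of the original config.cfg before overwriting it.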

Other recipes

We have used many other recipes with Ossian which will be documented here when cleaned up enough to be useful to others. These will give the ability to add more knowledge to the voices built, in the form of lexicons, letter-to-sound rules etc., and integrate existing trained components where they are available for the target language.

add instructions on adding more text

ossian's People

Contributors

candlewill, jrmeyer, oliverwatts

ossian's Issues

problem when training

When training the demo, I encountered this problem:

== Train voice (proc no. 1 (word_splitter)) ==
<Tokenisers.RegexTokeniser object at 0x7f6bc78bc400>
['call', 'class', 'delattr', 'dict', 'dir', 'doc', 'eq', 'format', 'ge', 'getattribute', 'gt', 'hash', 'init', 'init_subclass', 'le', 'lt', 'module', 'ne', 'new', 'reduce', 'reduce_ex', 'repr', 'setattr', 'sizeof', 'str', 'subclasshook', 'weakref', 'add_safetext', 'add_terminal_tokens', 'add_token_classes', 'apply_to_utt', 'apply_to_utts_which_have', 'child_node_type', 'class_attribute', 'class_patterns', 'classify_token', 'component_path', 'default_class', 'do_training', 'get_location', 'get_training_dir', 'language', 'lowercase_safetext', 'parallelisable', 'process_utterance', 'processor_name', 'regex', 'reuse_component', 'safetext_attribute', 'safetext_token', 'split_attribute', 'split_pattern', 'splitting_function', 'target_nodes', 'train', 'train_on_utts_which_have', 'trained', 'verify', 'voice_resources']
<class 'Tokenisers.RegexTokeniser'>
False
Train processor word_splitter
RegexTokeniser requires no training
Applying processor word_splitter
ppppppppppppppppppppppppppppp

== Train voice (proc no. 2 (segment_adder)) ==
<Phonetisers.NaivePhonetiser object at 0x7f6bc78bc8d0>
['call', 'class', 'delattr', 'dict', 'dir', 'doc', 'eq', 'format', 'ge', 'getattribute', 'gt', 'hash', 'init', 'init_subclass', 'le', 'lt', 'module', 'ne', 'new', 'reduce', 'reduce_ex', 'repr', 'setattr', 'sizeof', 'str', 'subclasshook', 'weakref', 'apply_to_utt', 'apply_to_utts_which_have', 'child_node_type', 'class_attribute', 'component_path', 'do_training', 'get_location', 'get_phonetic_segments', 'get_training_dir', 'language', 'output_attribute', 'parallelisable', 'possible_pause_classes', 'probable_pause_classes', 'process_utterance', 'processor_name', 'reuse_component', 'target_attribute', 'target_nodes', 'train', 'train_on_utts_which_have', 'trained', 'verify', 'voice_resources', 'word_classes']
<class 'Phonetisers.NaivePhonetiser'>
False
Train processor segment_adder
NaivePhonetiser requires no training
Applying processor segment_adder
uuuuuuuuuuuuuuuuuuuuuuuuuuuuu

== Train voice (proc no. 3 (word_vector_tagger)) ==
<VSMTagger.VSMTagger object at 0x7f6bc78bc3c8>
['call', 'class', 'delattr', 'dict', 'dir', 'doc', 'eq', 'format', 'ge', 'getattribute', 'gt', 'hash', 'init', 'init_subclass', 'le', 'lt', 'module', 'ne', 'new', 'reduce', 'reduce_ex', 'repr', 'setattr', 'sizeof', 'str', 'subclasshook', 'weakref', 'process_text_line', 'apply_to_utt', 'apply_to_utts_which_have', 'component_path', 'context_size', 'discretisation_method', 'do_training', 'get_location', 'get_training_dir', 'input_attribute', 'language', 'n_discretisation_bins', 'norm_counts', 'output_attribute_stem', 'parallelisable', 'process_utterance', 'processor_name', 'rank', 'replace_whitespace', 'reuse_component', 'svd_algorithm', 'table_file', 'target_nodes', 'tokenisation_pattern', 'train', 'train_on_utts_which_have', 'trained', 'unseen_method', 'verify', 'voice_resources', 'vsm']
<class 'VSMTagger.VSMTagger'>
True
Applying processor word_vector_tagger
p<Utterance.Utterance object at 0x7f6d0904b588>
//token[@token_class='word']
[]
Traceback (most recent call last):
File "./scripts/train.py", line 148, in
main_work()
File "./scripts/train.py", line 83, in main_work
train(opts, dirs)
File "./scripts/train.py", line 122, in train
voice.train(corpus)
File "/home/chenliang/1_code/2_TTS/Ossian/scripts/main/Voice.py", line 378, in train
processor.apply_to_utt(utterance, voice_mode=self.run_mode)
File "/home/chenliang/1_code/2_TTS/Ossian/scripts/processors/UtteranceProcessor.py", line 222, in apply_to_utt
self.process_utterance(utterance)
File "/home/chenliang/1_code/2_TTS/Ossian/scripts/processors/VSMTagger.py", line 76, in process_utterance
kwargs={"field": "dim
%s"%(i)})
File "/home/chenliang/1_code/2_TTS/Ossian/scripts/util/NodeProcessors.py", line 35, in enrich_nodes
assert len(nodes) > 0
AssertionError

Does anyone know why this happened? How can it be solved? Thank you for helping.

output wav after training is silence

The tool setup of Ossian and Merlin went well, as did training on Arabic data (text and 16 kHz wav). No errors occurred during training, but the synthesized voice is just silence. The same thing happened when I trained the demo according to the GitHub setup instructions: the output is silence.

is it possible to train an LSTM neural network with Ossian/Merlin?

I have specified in /Ossian/train/.../speakers/.../naive_01_nn/processors/acoustic_predictor/config.cfg:

hidden_layer_size : [1024, 1024, 1024, 1024, 512]
hidden_layer_type : ['TANH', 'TANH', 'TANH', 'TANH', 'LSTM']
...

sequential_training : True

run_merlin runs fine, but store_merlin produces an error complaining about the following assertion in store_merlin.py:
assert len(param_vals) == len(layer_types) * 2 ## W and b for each layer

Can Ossian be used with an LSTM neural network?

Python3 support

Is there any plan to make it possible to use Python 3 with Ossian?

Chinese front-end

What do I need to prepare to build a Mandarin Chinese front-end? Are *.txt files and *.wav files enough, or are additional steps needed to make a front-end for Chinese? Thanks.

which specific python version to try on this repo

I am using Python 2.7.18 on Ubuntu 20.04 and created a virtual environment to run it, but there are syntax errors in the scripts, especially in print statements. Did anyone face a similar issue, and how can it be solved? Manually correcting a few errors would be fine, but it throws more than 50 syntax errors. What is the solution to this problem? Kindly help. Thanks!

how to train?

Is it possible to train the model with .wav files whose sample rate is 16000 Hz, and if so, how?

How to add dictionary to Ossian?

Hi Oliver,

I'm wondering how to add specific entries (like numbers or abbreviations) to
something like a dictionary or lexicon for Ossian.

I'm guessing it will look like this:

1 one
2 two
25 twenty five
NASA naesuh
Ms. miss

Thanks!

-josh

Errors running demo

Hello,

I am stepping through the demo example in the README and I get some errors which result in .cmp files not getting created. I see that another user had this same issue and I tried reinstalling everything as they found to be the solution, but I still get the same errors. The errors appear to start in the step 'acoustic_feature_extractor'. What's strange is that the errors are e.g.

Cannot open file /proj/tts/tools/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/adr_diph1_001.sp.double!

but that file does exist. Also, when I separately run the command that appears to produce that error:

/proj/tts/tools/Ossian//tools/bin//x2x +df /proj/tts/tools/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/adr_diph1_001.sp.double | /proj/tts/tools/Ossian//tools/bin//sopr -R -m 32768.0 | /proj/tts/tools/Ossian//tools/bin//mcep -a 0.77 -m 59 -l 2048.0 -j 0 -f 0.0 -q 3 > /proj/tts/tools/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/adr_diph1_001.mgc

it runs without producing any error. I am attaching the output from running the demo. Any advice or suggestions are greatly appreciated.
demo_out.txt

Training issue

I had a training issue previously posed in #7.
When I run the following command
python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn

I had an error message like this
== Train voice (proc no. 6 (aligner)) ==
Train processor aligner

      Training aligner -- see /WORK/TTS/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt

ERROR [+2319] HERest: file name of vocabulary list expected
FATAL ERROR - Terminating program /WORK/TTS/Ossian//tools/bin//HERest
Aligner training failed

The last couple of lines of the file /WORK/TTS/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt are shown below.
Reestimation failed: cmp.mmf not made

Step 3 in script /WORK/TTS/Ossian//scripts/acoustic_model_training/subrecipes/script/standard_alignment.sh failed, aborted!

Does anyone know the workaround for this issue?

Validation error increasing in acoustic model training

I am using Ossian to train a Bangla (Bengali) voice. My data-set consists of ~4000 sentences (7 hours of speech). The error graph I obtained after training the acoustic model looks like this:
[Figure: adrita_acoustic, the error graph from acoustic model training]

I have used (almost) all the default settings, except changing some hyper-parameters as follows:

  • batch_size       : 128
  • training_epochs  : 15
  • L2_regularization: 0.003

The synthesized speech does not sound bad, but judging by the error graph I think there is a lot of room for improvement. Can someone point me to changes that would improve the acoustic model? Do I need more data (I am working on it), or should I reduce the size or number of layers of the NN? Any suggestions about the hyper-parameters? Thanks.

Training error

The command line is:
python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn

and the err msg:

'''
-- Gather corpus
-- Train voice
/home/zhsk/Ossian-bak/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn
/home/zhsk/Ossian-bak/Ossian/voices//rm/rss_toy_demo/naive_01_nn
try loading config from python...
/home/zhsk/Ossian-bak/Ossian/recipes/naive_01_nn.cfg
{'state_contexts': [('start_time', './attribute::start'), ('end_time', './attribute::end'), ('htk_state', 'count(./preceding-sibling::state) + 1'), ('htk_monophone', './ancestor::segment/attribute::pronunciation'), ('ll_segment', './ancestor::segment/preceding::segment[2]/attribute::pronunciation'), ('l_segment', './ancestor::segment/preceding::segment[1]/attribute::pronunciation'), ('c_segment', './ancestor::segment/attribute::pronunciation'), ('r_segment', './ancestor::segment/following::segment[1]/attribute::pronunciation'), ('rr_segment', './ancestor::segment/following::segment[2]/attribute::pronunciation'), ('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', "count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), ('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), ('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'phrase')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 
'phrase')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'utt')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 'utt')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'dur_label_maker': <FeatureDumper.FeatureDumper object at 0x7ff7f63ed3d0>, 'SKLDecisionTreePausePredictor': <class 'SKLProcessors.SKLDecisionTreePausePredictor'>, 'train_stages': [[<Tokenisers.RegexTokeniser object at 0x7ff7f63ed310>, <Phonetisers.NaivePhonetiser object at 0x7ff7f63edd50>, <VSMTagger.VSMTagger object at 0x7ff7f63edb50>], [<FeatureDumper.FeatureDumper object at 0x7ff7f63edbd0>, <FeatureExtractor.WorldExtractor object at 0x7ff7f63eddd0>, <Aligner.StateAligner object at 0x7ff7f63edc10>, <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7ff7f63ede10>, <PhraseMaker.PhraseMaker object at 0x7ff7f63edcd0>, <FeatureDumper.FeatureDumper object at 0x7ff7f63edd10>], [<FeatureDumper.FeatureDumper object at 0x7ff7f63ed3d0>, <NN.NNDurationPredictor object at 0x7ff7f63edf90>, 
<FeatureDumper.FeatureDumper object at 0x7ff7f63ed390>, <NN.NNAcousticPredictor object at 0x7ff7f63edb10>]], 'PUNC_PATT': '[\p{C}||\p{P}||\p{S}]', 'JUNCTURE_NODES': "//token[@token_class='space'] | //token[@token_class='punctuation']", 'WorldExtractor': <class 'FeatureExtractor.WorldExtractor'>, 'RegexTokeniser': <class 'Tokenisers.RegexTokeniser'>, 'current_dir': '/home/zhsk/Ossian-bak/Ossian/recipes', 'phrase_adder': <PhraseMaker.PhraseMaker object at 0x7ff7f63edcd0>, 'dur_data_maker': <FeatureDumper.FeatureDumper object at 0x7ff7f63edd10>, 'pause_predictor': <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7ff7f63ede10>, 'speech_generation': [<FeatureDumper.FeatureDumper object at 0x7ff7f63ed3d0>, <NN.NNDurationPredictor object at 0x7ff7f63edf90>, <FeatureDumper.FeatureDumper object at 0x7ff7f63ed390>, <NN.NNAcousticPredictor object at 0x7ff7f63edb10>], 'runtime_stages': [[<Tokenisers.RegexTokeniser object at 0x7ff7f63ed310>, <Phonetisers.NaivePhonetiser object at 0x7ff7f63edd50>, <VSMTagger.VSMTagger object at 0x7ff7f63edb50>], [<SKLProcessors.SKLDecisionTreePausePredictor object at 0x7ff7f63ede10>, <PhraseMaker.PhraseMaker object at 0x7ff7f63edcd0>], [<FeatureDumper.FeatureDumper object at 0x7ff7f63ed3d0>, <NN.NNDurationPredictor object at 0x7ff7f63edf90>, <FeatureDumper.FeatureDumper object at 0x7ff7f63ed390>, <NN.NNAcousticPredictor object at 0x7ff7f63edb10>]], 'text_proc': [<Tokenisers.RegexTokeniser object at 0x7ff7f63ed310>, <Phonetisers.NaivePhonetiser object at 0x7ff7f63edd50>, <VSMTagger.VSMTagger object at 0x7ff7f63edb50>], 'acoustic_predictor': <NN.NNAcousticPredictor object at 0x7ff7f63edb10>, 'duration_predictor': <NN.NNDurationPredictor object at 0x7ff7f63edf90>, 'NNAcousticPredictor': <class 'NN.NNAcousticPredictor'>, 'align_label_dumper': <FeatureDumper.FeatureDumper object at 0x7ff7f63edbd0>, 'pause_predictor_features': [('response', './attribute::has_silence="yes"'), ('token_is_punctuation', './attribute::token_class="punctuation"'), 
('since_start_utterance_in_words', "count(preceding::token[@token_class='word'])"), ('till_end_utterance_in_words', "count(following::token[@token_class='word'])"), ('L_vsm_d1', "./preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './attribute::vsm_d1'), ('R_vsm_d1', "./following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './attribute::vsm_d2'), ('R_vsm_d2', "./following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './attribute::vsm_d3'), ('R_vsm_d3', "./following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './attribute::vsm_d4'), ('R_vsm_d4', "./following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './attribute::vsm_d5'), ('R_vsm_d5', "./following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './attribute::vsm_d6'), ('R_vsm_d6', "./following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './attribute::vsm_d7'), ('R_vsm_d7', "./following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './attribute::vsm_d8'), ('R_vsm_d8', "./following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './attribute::vsm_d9'), ('R_vsm_d9', "./following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', 
'./attribute::vsm_d10'), ('R_vsm_d10', "./following::token[@token_class='word'][1]/attribute::vsm_d10")], 'PhraseMaker': <class 'PhraseMaker.PhraseMaker'>, 'alignment': [<FeatureDumper.FeatureDumper object at 0x7ff7f63edbd0>, <FeatureExtractor.WorldExtractor object at 0x7ff7f63eddd0>, <Aligner.StateAligner object at 0x7ff7f63edc10>, <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7ff7f63ede10>, <PhraseMaker.PhraseMaker object at 0x7ff7f63edcd0>, <FeatureDumper.FeatureDumper object at 0x7ff7f63edd10>], 'VSMTagger': <class 'VSMTagger.VSMTagger'>, 'duration_data_contexts': [('state_1_nframes', '(./state[1]/attribute::end - ./state[1]/attribute::start) div 5'), ('state_2_nframes', '(./state[2]/attribute::end - ./state[2]/attribute::start) div 5'), ('state_3_nframes', '(./state[3]/attribute::end - ./state[3]/attribute::start) div 5'), ('state_4_nframes', '(./state[4]/attribute::end - ./state[4]/attribute::start) div 5'), ('state_5_nframes', '(./state[5]/attribute::end - ./state[5]/attribute::start) div 5')], 'phone_and_state_contexts': [('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', "count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), ('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), 
('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'phrase')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 'phrase')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'utt')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 'utt')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'word_vector_tagger': <VSMTagger.VSMTagger object at 0x7ff7f63edb50>, 'inspect': <module 'inspect' from '/usr/lib/python2.7/inspect.pyc'>, 'tokenisation_pattern': '(\p{Z}[\p{C}||\p{P}||\p{S}]\p{Z}+|\p{Z}*[\p{C}||\p{P}||\p{S}]+\Z)', 'aligner': <Aligner.StateAligner object at 0x7ff7f63edc10>, 'sys': <module 'sys' 
(built-in)>, 'dnn_label_maker': <FeatureDumper.FeatureDumper object at 0x7ff7f63ed390>, 'NaivePhonetiser': <class 'Phonetisers.NaivePhonetiser'>, 'tokeniser': <Tokenisers.RegexTokeniser object at 0x7ff7f63ed310>, 'LETTER_PATT': '[\p{L}||\p{N}||\p{M}]', 'AcousticModelWorld': <class 'AcousticModel.AcousticModelWorld'>, 'speech_feature_extractor': <FeatureExtractor.WorldExtractor object at 0x7ff7f63eddd0>, 'dim': 10, 'c': <module 'default.const' from '/home/zhsk/Ossian-bak/Ossian/scripts/default/const.py'>, 'FeatureDumper': <class 'FeatureDumper.FeatureDumper'>, 'word_vsm_dim': 10, 'speech_coding_config': {'delta_delta_window': '1.0 -2.0 1.0', 'static_window': '1', 'order': 59, 'delta_window': '-0.5 0.0 0.5'}, 'pause_prediction': [<SKLProcessors.SKLDecisionTreePausePredictor object at 0x7ff7f63ede10>, <PhraseMaker.PhraseMaker object at 0x7ff7f63edcd0>], 'i': 5, 'SPACE_PATT': '\p{Z}', 'PUNC_OR_SPACE_PATT': '[\p{Z}||\p{C}||\p{P}||\p{S}]', 'phonetiser': <Phonetisers.NaivePhonetiser object at 0x7ff7f63edd50>, 'NNDurationPredictor': <class 'NN.NNDurationPredictor'>, 'os': <module 'os' from '/usr/lib/python2.7/os.pyc'>, 'phone_contexts': [('htk_monophone', './attribute::pronunciation'), ('start_time', './attribute::start'), ('end_time', './attribute::end'), ('ll_segment', 'preceding::segment[2]/attribute::pronunciation'), ('l_segment', 'preceding::segment[1]/attribute::pronunciation'), ('c_segment', './attribute::pronunciation'), ('r_segment', 'following::segment[1]/attribute::pronunciation'), ('rr_segment', 'following::segment[2]/attribute::pronunciation'), ('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', "count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 
'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), ('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), ('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'phrase')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 'phrase')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y('token[@token_class="word"]', 'utt')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y('token[@token_class="word"]', 'utt')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'StateAligner': <class 'Aligner.StateAligner'>}
train
Cannot load NN model from model_dir: /home/zhsk/Ossian-bak/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor -- not trained yet
Cannot load NN model from model_dir: /home/zhsk/Ossian-bak/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor -- not trained yet

== Train voice (proc no. 1 (word_splitter)) ==
Train processor word_splitter
RegexTokeniser requires no training
Applying processor word_splitter
p p p p p pp pp p p p pp p pp p p p pp pp p p p pp

== Train voice (proc no. 2 (segment_adder)) ==
Train processor segment_adder
NaivePhonetiser requires no training
Applying processor segment_adder
p p pp p pp p pp p p pp p p pp p p pp p p pp p p p

== Train voice (proc no. 3 (word_vector_tagger)) ==
Train processor word_vector_tagger
Count types...
Assemble cooccurance matrix...
Factorise cooccurance matrix...
Write output to /home/zhsk/Ossian-bak/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/word_vector_tagger/table_file.table
Applying processor word_vector_tagger
p p p p p p p p p p p p p p p p p p p p p p p p p p p p p

== Train voice (proc no. 4 (feature_dumper)) ==
Train processor feature_dumper
Applying processor feature_dumper
p p pp p pp p pp p p pp p p pp p pp p p pp p p p p

== Train voice (proc no. 5 (acoustic_feature_extractor)) ==
Train processor acoustic_feature_extractor
Applying processor acoustic_feature_extractor
pppppp p p p p pp pp p p p p p p p p p p p p p p p

== Train voice (proc no. 6 (aligner)) ==
Train processor aligner

      Training aligner -- see /home/zhsk/Ossian-bak/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt

ERROR [+2319] HERest: file name of vocabulary list expected
FATAL ERROR - Terminating program /home/zhsk/Ossian-bak/Ossian//tools/bin//HERest

LookupTable has no field 'dim_10' among its fields:

I am trying to train a voice for Hindi. I followed exactly the steps described in the documentation. What am I missing? If possible, please provide a step-by-step guide to training a Hindi voice.
I tested the same steps with an English voice and they ran without problems.

This is the whole error:
== Train voice (proc no. 3 (word_vector_tagger)) ==
Applying processor word_vector_tagger
p
Traceback (most recent call last):
  File "./scripts/train.py", line 147, in <module>
    main_work()
  File "./scripts/train.py", line 82, in main_work
    train(opts, dirs)
  File "./scripts/train.py", line 121, in train
    voice.train(corpus)
  File "/home/nyg/Ossian/scripts/main/Voice.py", line 363, in train
    processor.apply_to_utt(utterance, voice_mode=self.run_mode)
  File "/home/nyg/Ossian/recipes/../scripts/processors/UtteranceProcessor.py", line 222, in apply_to_utt
    self.process_utterance(utterance)
  File "/home/nyg/Ossian/recipes/../scripts/processors/VSMTagger.py", line 76, in process_utterance
    kwargs={"field": "dim_%s"%(i)})
  File "/home/nyg/Ossian/scripts/util/NodeProcessors.py", line 34, in enrich_nodes
    transformed_data = function(input_data, **kwargs)
  File "/home/nyg/Ossian/scripts/util/LookupTable.py", line 85, in lookup
    assert field in self.fields,"LookupTable has no field '%s' among its fields: %s"%(field, " ".join(self.fields))
AssertionError: LookupTable has no field 'dim_10' among its fields: dim_1 dim_2 dim_3 dim_4 dim_5 dim_6 dim_7 dim_8 dim_9
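
The assertion message lists the fields the table actually contains (dim_1 to dim_9), while the lookup asked for dim_10. A minimal sketch of that comparison follows, assuming the recipe requests 10 dimensions as the 'dim_10' in the message suggests. The helper name `missing_fields` is hypothetical and is not part of Ossian; it only shows which requested fields are absent and says nothing about why the table came out with 9 columns.

```python
def missing_fields(available_fields, requested_dim, prefix="dim_"):
    """Return the field names that would be requested but are not in the table.

    Hypothetical helper for illustration only; not part of Ossian.
    """
    requested = ["%s%d" % (prefix, i) for i in range(1, requested_dim + 1)]
    available = set(available_fields)
    return [name for name in requested if name not in available]


# Field list copied from the AssertionError message above.
fields_in_table = "dim_1 dim_2 dim_3 dim_4 dim_5 dim_6 dim_7 dim_8 dim_9".split()

# Assumed requested dimensionality: 10.
print(missing_fields(fields_in_table, 10))
```

Running the same check against the header of the trained table file would show whether the table and the recipe agree on the number of dimensions.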

Using Ossian with Phone-level alignment

Hi,
I am struggling to use Ossian with phone-level alignment instead of state-level alignment. Switching from one to the other is easy in Merlin, but in Ossian it is not so obvious.
Is there a recipe I could use for that?

Has anybody used Ossian to train Hindi TTS?

I am looking for better configurations for training a Hindi TTS voice.
I have used Ossian with 16 NN layers each for the acoustic and duration models, trained for 25 and 100 epochs respectively, using the naive_01_nn recipe, but there is no significant change compared with the previous configurations.

So could you please post a good configuration or a new recipe?

AttributeError: 'NNAcousticPredictor' object has no attribute 'model'

Traceback (most recent call last):
  File "./scripts/speak.py", line 181, in <module>
    main_work()
  File "./scripts/speak.py", line 131, in main_work
    output_labfile=output_labfile)
  File "/home/ashwini/Ossian/scripts/main/Voice.py", line 261, in synth_utterance
    processor.apply_to_utt(utt, voice_mode=self.run_mode) ## utt is changed in place
  File "/home/ashwini/Ossian/voices/en/ens_toy_demo/naive_01_nn/../../../../scripts/processors/UtteranceProcessor.py", line 222, in apply_to_utt
    self.process_utterance(utterance)
  File "/home/ashwini/Ossian/voices/en/ens_toy_demo/naive_01_nn/../../../../scripts/processors/NN.py", line 850, in process_utterance
    streams = self.model.generate(label, variance_expansion=self.variance_expansion,
AttributeError: 'NNAcousticPredictor' object has no attribute 'model'

The model was trained and I was able to save it, but this error occurs when synthesising speech.
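
The traceback shows `self.model` being read on an object that never had that attribute set. One general way this can arise in Python is a constructor that assigns the attribute only when a saved model is found. Training logs elsewhere on this page print "Cannot load NN model from model_dir: ... -- not trained yet", which would be consistent with that pattern, though this is an assumption and not confirmed from Ossian's source. The class below is a hypothetical toy, not Ossian's `NNAcousticPredictor`, and the file name `model.bin` is invented for the illustration.

```python
import os


class ToyPredictor(object):
    """Hypothetical stand-in that mimics the suspected failure mode."""

    def __init__(self, model_dir):
        self.model_dir = model_dir
        # 'model.bin' is an invented file name, used only for this sketch.
        model_file = os.path.join(model_dir, "model.bin")
        if os.path.isfile(model_file):
            # The attribute exists only if loading succeeded.
            self.model = "loaded from %s" % model_file
        else:
            print("Cannot load model from %s" % model_dir)

    def generate(self):
        # Guarding the access gives a clearer error than AttributeError.
        if not hasattr(self, "model"):
            raise RuntimeError("no model was loaded from %s" % self.model_dir)
        return self.model


predictor = ToyPredictor("/nonexistent/model_dir")  # hypothetical path
print(hasattr(predictor, "model"))
```

If the real cause is similar, checking that the voice directory used at synthesis time actually contains the trained model files would be a reasonable first step.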

cmp files not generated

I tried to run the rss_toy_demo example step by step according to the README.md. However, I get errors when executing python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn. Digging into the code, I found that there are no *.cmp files under the /train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/ folder.

Could anyone tell me why? And how to generate these files?
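
A quick way to confirm what the report describes is to list the `.cmp` files in the directory in question. The sketch below uses only the Python standard library; the function name `list_cmp_files` is hypothetical, and the path is taken from the report above and assumed to be relative to the Ossian checkout. It reports what is on disk and does not generate the files.

```python
import glob
import os


def list_cmp_files(cmp_dir):
    """Return the sorted paths of all *.cmp files directly inside cmp_dir."""
    return sorted(glob.glob(os.path.join(cmp_dir, "*.cmp")))


# Path from the report above, assumed relative to the Ossian directory.
cmp_dir = os.path.join("train", "rm", "speakers", "rss_toy_demo",
                       "naive_01_nn", "cmp")
found = list_cmp_files(cmp_dir)
print("%d .cmp file(s) found in %s" % (len(found), cmp_dir))
```

An empty result points at the step that should have written these files, so the log of the acoustic feature extraction stage is the natural place to look next.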

sys.exit('set_up_data.py: No matching data files found in %s and %s'%( \

Here is the detailed output:

root@de-3879-ng-1-034425-3089955241-qx7zj:~/workspace/Projects/Ossian# python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn
 -- Gather corpus
 -- Train voice
/root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn
/root/workspace/Projects/Ossian/voices//rm/rss_toy_demo/naive_01_nn
try loading config from python...
/root/workspace/Projects/Ossian/recipes/naive_01_nn.cfg
{'state_contexts': [('start_time', './attribute::start'), ('end_time', './attribute::end'), ('htk_state', 'count(./preceding-sibling::state) + 1'), ('htk_monophone', './ancestor::segment/attribute::pronunciation'), ('ll_segment', './ancestor::segment/preceding::segment[2]/attribute::pronunciation'), ('l_segment', './ancestor::segment/preceding::segment[1]/attribute::pronunciation'), ('c_segment', './ancestor::segment/attribute::pronunciation'), ('r_segment', './ancestor::segment/following::segment[1]/attribute::pronunciation'), ('rr_segment', './ancestor::segment/following::segment[2]/attribute::pronunciation'), ('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', "count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), ('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), ('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'phrase\')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', 
\'phrase\')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'utt\')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', \'utt\')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'dur_label_maker': <FeatureDumper.FeatureDumper object at 0x7fd779ceb6d0>, 'SKLDecisionTreePausePredictor': <class 'SKLProcessors.SKLDecisionTreePausePredictor'>, 'train_stages': [[<Tokenisers.RegexTokeniser object at 0x7fd77a645b90>, <Phonetisers.NaivePhonetiser object at 0x7fd779cd5d50>, <VSMTagger.VSMTagger object at 0x7fd779ceb710>], [<FeatureDumper.FeatureDumper object at 0x7fd779ceb790>, <FeatureExtractor.WorldExtractor object at 0x7fd779ceb510>, <Aligner.StateAligner object at 0x7fd779ceb590>, <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7fd779ceb610>, <PhraseMaker.PhraseMaker object at 0x7fd779ceb7d0>, <FeatureDumper.FeatureDumper object at 0x7fd779ceb690>], [<FeatureDumper.FeatureDumper object at 0x7fd779ceb6d0>, <NN.NNDurationPredictor object at 0x7fd77ea2bb50>, 
<FeatureDumper.FeatureDumper object at 0x7fd779cd5d90>, <NN.NNAcousticPredictor object at 0x7fd779cd5ed0>]], 'PUNC_PATT': '[\\p{C}||\\p{P}||\\p{S}]', 'JUNCTURE_NODES': "//token[@token_class='space'] | //token[@token_class='punctuation']", 'WorldExtractor': <class 'FeatureExtractor.WorldExtractor'>, 'RegexTokeniser': <class 'Tokenisers.RegexTokeniser'>, 'current_dir': '/root/workspace/Projects/Ossian/recipes', 'phrase_adder': <PhraseMaker.PhraseMaker object at 0x7fd779ceb7d0>, 'dur_data_maker': <FeatureDumper.FeatureDumper object at 0x7fd779ceb690>, 'pause_predictor': <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7fd779ceb610>, 'speech_generation': [<FeatureDumper.FeatureDumper object at 0x7fd779ceb6d0>, <NN.NNDurationPredictor object at 0x7fd77ea2bb50>, <FeatureDumper.FeatureDumper object at 0x7fd779cd5d90>, <NN.NNAcousticPredictor object at 0x7fd779cd5ed0>], 'runtime_stages': [[<Tokenisers.RegexTokeniser object at 0x7fd77a645b90>, <Phonetisers.NaivePhonetiser object at 0x7fd779cd5d50>, <VSMTagger.VSMTagger object at 0x7fd779ceb710>], [<SKLProcessors.SKLDecisionTreePausePredictor object at 0x7fd779ceb610>, <PhraseMaker.PhraseMaker object at 0x7fd779ceb7d0>], [<FeatureDumper.FeatureDumper object at 0x7fd779ceb6d0>, <NN.NNDurationPredictor object at 0x7fd77ea2bb50>, <FeatureDumper.FeatureDumper object at 0x7fd779cd5d90>, <NN.NNAcousticPredictor object at 0x7fd779cd5ed0>]], 'text_proc': [<Tokenisers.RegexTokeniser object at 0x7fd77a645b90>, <Phonetisers.NaivePhonetiser object at 0x7fd779cd5d50>, <VSMTagger.VSMTagger object at 0x7fd779ceb710>], 'acoustic_predictor': <NN.NNAcousticPredictor object at 0x7fd779cd5ed0>, 'duration_predictor': <NN.NNDurationPredictor object at 0x7fd77ea2bb50>, 'NNAcousticPredictor': <class 'NN.NNAcousticPredictor'>, 'align_label_dumper': <FeatureDumper.FeatureDumper object at 0x7fd779ceb790>, 'pause_predictor_features': [('response', './attribute::has_silence="yes"'), ('token_is_punctuation', 
'./attribute::token_class="punctuation"'), ('since_start_utterance_in_words', "count(preceding::token[@token_class='word'])"), ('till_end_utterance_in_words', "count(following::token[@token_class='word'])"), ('L_vsm_d1', "./preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './attribute::vsm_d1'), ('R_vsm_d1', "./following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './attribute::vsm_d2'), ('R_vsm_d2', "./following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './attribute::vsm_d3'), ('R_vsm_d3', "./following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './attribute::vsm_d4'), ('R_vsm_d4', "./following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './attribute::vsm_d5'), ('R_vsm_d5', "./following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './attribute::vsm_d6'), ('R_vsm_d6', "./following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './attribute::vsm_d7'), ('R_vsm_d7', "./following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './attribute::vsm_d8'), ('R_vsm_d8', "./following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './attribute::vsm_d9'), ('R_vsm_d9', "./following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', 
"./preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './attribute::vsm_d10'), ('R_vsm_d10', "./following::token[@token_class='word'][1]/attribute::vsm_d10")], 'PhraseMaker': <class 'PhraseMaker.PhraseMaker'>, 'alignment': [<FeatureDumper.FeatureDumper object at 0x7fd779ceb790>, <FeatureExtractor.WorldExtractor object at 0x7fd779ceb510>, <Aligner.StateAligner object at 0x7fd779ceb590>, <SKLProcessors.SKLDecisionTreePausePredictor object at 0x7fd779ceb610>, <PhraseMaker.PhraseMaker object at 0x7fd779ceb7d0>, <FeatureDumper.FeatureDumper object at 0x7fd779ceb690>], 'VSMTagger': <class 'VSMTagger.VSMTagger'>, 'duration_data_contexts': [('state_1_nframes', '(./state[1]/attribute::end - ./state[1]/attribute::start) div 5'), ('state_2_nframes', '(./state[2]/attribute::end - ./state[2]/attribute::start) div 5'), ('state_3_nframes', '(./state[3]/attribute::end - ./state[3]/attribute::start) div 5'), ('state_4_nframes', '(./state[4]/attribute::end - ./state[4]/attribute::start) div 5'), ('state_5_nframes', '(./state[5]/attribute::end - ./state[5]/attribute::start) div 5')], 'phone_and_state_contexts': [('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', "count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), 
('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), ('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'phrase\')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', \'phrase\')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'utt\')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', \'utt\')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', 
"./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'word_vector_tagger': <VSMTagger.VSMTagger object at 0x7fd779ceb710>, 'inspect': <module 'inspect' from '/usr/lib/python2.7/inspect.pyc'>, 'tokenisation_pattern': 
'(\\p{Z}*[\\p{C}||\\p{P}||\\p{S}]*\\p{Z}+|\\p{Z}*[\\p{C}||\\p{P}||\\p{S}]+\\Z)', 'aligner': <Aligner.StateAligner object at 0x7fd779ceb590>, 'sys': <module 'sys' (built-in)>, 'dnn_label_maker': <FeatureDumper.FeatureDumper object at 0x7fd779cd5d90>, 'NaivePhonetiser': <class 'Phonetisers.NaivePhonetiser'>, 'tokeniser': <Tokenisers.RegexTokeniser object at 0x7fd77a645b90>, 'LETTER_PATT': '[\\p{L}||\\p{N}||\\p{M}]', 'AcousticModelWorld': <class 'AcousticModel.AcousticModelWorld'>, 'speech_feature_extractor': <FeatureExtractor.WorldExtractor object at 0x7fd779ceb510>, 'dim': 10, 'c': <module 'default.const' from '/root/workspace/Projects/Ossian/scripts/default/const.pyc'>, 'FeatureDumper': <class 'FeatureDumper.FeatureDumper'>, 'word_vsm_dim': 10, 'speech_coding_config': {'delta_delta_window': '1.0 -2.0 1.0', 'static_window': '1', 'order': 59, 'delta_window': '-0.5 0.0 0.5'}, 'pause_prediction': [<SKLProcessors.SKLDecisionTreePausePredictor object at 0x7fd779ceb610>, <PhraseMaker.PhraseMaker object at 0x7fd779ceb7d0>], 'i': 5, 'SPACE_PATT': '\\p{Z}', 'PUNC_OR_SPACE_PATT': '[\\p{Z}||\\p{C}||\\p{P}||\\p{S}]', 'phonetiser': <Phonetisers.NaivePhonetiser object at 0x7fd779cd5d50>, 'NNDurationPredictor': <class 'NN.NNDurationPredictor'>, 'os': <module 'os' from '/usr/lib/python2.7/os.pyc'>, 'phone_contexts': [('htk_monophone', './attribute::pronunciation'), ('start_time', './attribute::start'), ('end_time', './attribute::end'), ('ll_segment', 'preceding::segment[2]/attribute::pronunciation'), ('l_segment', 'preceding::segment[1]/attribute::pronunciation'), ('c_segment', './attribute::pronunciation'), ('r_segment', 'following::segment[1]/attribute::pronunciation'), ('rr_segment', 'following::segment[2]/attribute::pronunciation'), ('length_left_word', "count(ancestor::token/preceding::token[@token_class='word'][1]/descendant::segment)"), ('length_current_word', 'count(ancestor::token/descendant::segment)'), ('length_right_word', 
"count(ancestor::token/following::token[@token_class='word'][1]/descendant::segment)"), ('since_beginning_of_word', "count_Xs_since_start_Y('segment', 'token')"), ('till_end_of_word', "count_Xs_till_end_Y('segment', 'token')"), ('length_l_phrase_in_words', "count(ancestor::phrase/preceding::phrase[1]/descendant::token[@token_class='word'])"), ('length_c_phrase_in_words', "count(ancestor::phrase/descendant::token[@token_class='word'])"), ('length_r_phrase_in_words', "count(ancestor::phrase/following::phrase[1]/descendant::token[@token_class='word'])"), ('length_l_phrase_in_segments', 'count(ancestor::phrase/preceding::phrase[1]/descendant::segment)'), ('length_c_phrase_in_segments', 'count(ancestor::phrase/descendant::segment)'), ('length_r_phrase_in_segments', 'count(ancestor::phrase/following::phrase[1]/descendant::segment)'), ('since_phrase_start_in_segs', "count_Xs_since_start_Y('segment', 'phrase')"), ('till_phrase_end_in_segs', "count_Xs_till_end_Y('segment', 'phrase')"), ('since_phrase_start_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'phrase\')'), ('till_phrase_end_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', \'phrase\')'), ('since_start_sentence_in_segments', "count_Xs_since_start_Y('segment', 'utt')"), ('since_start_sentence_in_words', 'count_Xs_since_start_Y(\'token[@token_class="word"]\', \'utt\')'), ('since_start_sentence_in_phrases', "count_Xs_since_start_Y('phrase', 'utt')"), ('till_end_sentence_in_segments', "count_Xs_till_end_Y('segment', 'utt')"), ('till_end_sentence_in_words', 'count_Xs_till_end_Y(\'token[@token_class="word"]\', \'utt\')'), ('till_end_sentence_in_phrases', "count_Xs_till_end_Y('phrase', 'utt')"), ('length_sentence_in_segments', 'count(ancestor::utt/descendant::segment)'), ('length_sentence_in_words', "count(ancestor::utt/descendant::token[@token_class='word'])"), ('length_sentence_in_phrases', 'count(ancestor::utt/descendant::phrase)'), ('L_vsm_d1', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d1"), ('C_vsm_d1', './ancestor::token/attribute::vsm_d1'), ('R_vsm_d1', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d1"), ('L_vsm_d2', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d2"), ('C_vsm_d2', './ancestor::token/attribute::vsm_d2'), ('R_vsm_d2', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d2"), ('L_vsm_d3', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d3"), ('C_vsm_d3', './ancestor::token/attribute::vsm_d3'), ('R_vsm_d3', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d3"), ('L_vsm_d4', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d4"), ('C_vsm_d4', './ancestor::token/attribute::vsm_d4'), ('R_vsm_d4', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d4"), ('L_vsm_d5', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d5"), ('C_vsm_d5', './ancestor::token/attribute::vsm_d5'), ('R_vsm_d5', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d5"), ('L_vsm_d6', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d6"), ('C_vsm_d6', './ancestor::token/attribute::vsm_d6'), ('R_vsm_d6', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d6"), ('L_vsm_d7', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d7"), ('C_vsm_d7', './ancestor::token/attribute::vsm_d7'), ('R_vsm_d7', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d7"), ('L_vsm_d8', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d8"), ('C_vsm_d8', './ancestor::token/attribute::vsm_d8'), ('R_vsm_d8', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d8"), ('L_vsm_d9', 
"./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'StateAligner': <class 'Aligner.StateAligner'>}
train
Cannot load NN model from model_dir: /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor -- not trained yet
Cannot load NN model from model_dir: /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor -- not trained yet


== Train voice (proc no. 1 (word_splitter))  ==
Train processor word_splitter
RegexTokeniser requires no training
          Applying processor word_splitter
uuuuuuuuuuuuuuuuuuuuuuuuuuuuu

== Train voice (proc no. 2 (segment_adder))  ==
Train processor segment_adder
NaivePhonetiser requires no training
          Applying processor segment_adder
uuuuuuuuuuuuuuuuuuuuuuuuuuuuu

== Train voice (proc no. 3 (word_vector_tagger))  ==
          Applying processor word_vector_tagger
u u u u u u u u u u u u u u u u u u u u u u u u u u u u u 

== Train voice (proc no. 4 (feature_dumper))  ==
Train processor feature_dumper
FeatureDumper already trained -- questions exist:
/root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn//SomeFileName
          Applying processor feature_dumper
uuuuuuuuuuuuuuuuuuuuuuuuuuuuu

== Train voice (proc no. 5 (acoustic_feature_extractor))  ==
Train processor acoustic_feature_extractor
          Applying processor acoustic_feature_extractor
uuuuuuuuuuuuuuuuuuuuuuuuuuuuu

== Train voice (proc no. 6 (aligner))  ==
Train processor aligner

          Training aligner -- see /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt

set_up_data.py: No matching data files found in /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/align_lab and /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp
Aligner training failed

Thanks very much.
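
A note on the last message in this log: set_up_data.py reports that it found no matching data files in align_lab and cmp. One way to inspect this by hand is to list the utterance names that occur in both directories. The sketch below is not Ossian code. The pairing rule (same basename, different extension) and the extensions .lab and .cmp are assumptions, and the function name is hypothetical.

```python
import os
import tempfile


def matching_basenames(dir_a, ext_a, dir_b, ext_b):
    """Return sorted basenames that have a file in both directories."""
    def stems(directory, ext):
        # A missing directory contributes no names instead of raising.
        if not os.path.isdir(directory):
            return set()
        return {f[:-len(ext)] for f in os.listdir(directory) if f.endswith(ext)}

    return sorted(stems(dir_a, ext_a) & stems(dir_b, ext_b))


if __name__ == "__main__":
    # Toy layout standing in for <voice_dir>/align_lab and <voice_dir>/cmp.
    with tempfile.TemporaryDirectory() as voice_dir:
        lab_dir = os.path.join(voice_dir, "align_lab")
        cmp_dir = os.path.join(voice_dir, "cmp")
        os.makedirs(lab_dir)
        os.makedirs(cmp_dir)
        for name in ("utt_001.lab", "utt_002.lab"):
            open(os.path.join(lab_dir, name), "w").close()
        open(os.path.join(cmp_dir, "utt_002.cmp"), "w").close()
        # Only utt_002 has a file in both directories.
        print(matching_basenames(lab_dir, ".lab", cmp_dir, ".cmp"))
```

An empty result on the real directories would be consistent with the message above, and looking at which of the two directories is empty may narrow down the step that failed.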

Errors when installing tools

I got the following error message when running the setup_tools.sh script:

/bin/bash: clang: command not found
Makefile:239: recipe for target 'delta.o' failed
make[2]: *** [delta.o] Error 127
make[2]: Leaving directory '/mnt/c/Users/ild/OneDrive - University of Edinburgh/project/Ossian/tools/downloads/SPTK-3.6/bin/delta'
Makefile:317: recipe for target 'install-recursive' failed
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory '/mnt/c/Users/ild/OneDrive - University of Edinburgh/project/Ossian/tools/downloads/SPTK-3.6/bin'
Makefile:268: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1
22

I set up the $OSSIAN directory variable in a separate setup script, and since the OneDrive folder has spaces in its name, I had to reassign IFS="". Since this change did not carry over along with the $OSSIAN variable, I had to set IFS="" in the setup_tools.sh script as well (so the path wouldn't get cut off when installing Merlin). Could that be what causes the problems, or is it unrelated? I have tried to google the errors, but there don't seem to be any straightforward answers as to what is going wrong here.
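
On the question about spaces: changing IFS affects every later unquoted word-splitting step in a script, so quoting each expansion is usually the narrower fix. The sketch below uses Python 3's shlex module to show how an unquoted path like the one in the log is split into several words and how quoting keeps it whole. The first line of the log (clang: command not found) looks like a separate issue, so the sketch also checks PATH for clang with shutil.which. Whether setup_tools.sh quotes its variables is not something this sketch verifies.

```python
import shlex
import shutil

# Path copied from the log above; it contains spaces.
ossian_dir = "/mnt/c/Users/ild/OneDrive - University of Edinburgh/project/Ossian"

# Unquoted, a shell-style split breaks the path into several words.
unquoted_words = shlex.split("cd " + ossian_dir)

# shlex.quote wraps the path so that it survives splitting as one word.
quoted_command = "cd " + shlex.quote(ossian_dir)
quoted_words = shlex.split(quoted_command)

print(len(unquoted_words))
print(quoted_words == ["cd", ossian_dir])

# shutil.which returns None when a program is not on PATH.
print("clang found:", shutil.which("clang") is not None)
```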

aligner failed

ERROR [+7390] StepAlpha: Alpha prune failed sq(9) > qHi(8) at time 71260
FATAL ERROR - Terminating program /home/ ashwini/Ossian//tools/bin//HERest
Aligner training failed

Training takes up too much disk space

Hi Oliver,

When I train on a 3.5-hour corpus, I run out of disk space (12 GB) very quickly:

OSSIAN$ du -h train/chv/speakers/news/naive_01_nn -d 1
26M     train/chv/speakers/news/naive_01_nn/time_lab
888K    train/chv/speakers/news/naive_01_nn/dnn_training_ACOUST
9.7G    train/chv/speakers/news/naive_01_nn/cmp
129M    train/chv/speakers/news/naive_01_nn/lab_dur
8.7M    train/chv/speakers/news/naive_01_nn/align_lab
8.5M    train/chv/speakers/news/naive_01_nn/dur
64M     train/chv/speakers/news/naive_01_nn/utt
253M    train/chv/speakers/news/naive_01_nn/processors
12M     train/chv/speakers/news/naive_01_nn/align_log
629M    train/chv/speakers/news/naive_01_nn/lab_dnn
11G     train/chv/speakers/news/naive_01_nn

I see most of the space is taken up under the cmp directory:

OSSIAN$ du -h train/chv/speakers/news/naive_01_nn/cmp -d 1
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_mgc_lf0_vuv_bap_199
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_norm_mgc_lf0_vuv_bap_199
4.4G    train/chv/speakers/news/naive_01_nn/cmp/binary_label_502
2.9G    train/chv/speakers/news/naive_01_nn/cmp/nn_no_silence_lab_502
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_no_silence_lab_norm_502
9.7G    train/chv/speakers/news/naive_01_nn/cmp

So the binary_label_502 and nn_no_silence_lab_502 take up the most space under cmp.

Are there any workarounds?

I'm running Ossian on AWS with 16G disk space, and since the OS takes up about 4G, training crashes after I train the frontend and move on to Merlin.

Specifically, I crash after this command:

python ./tools/merlin/src/run_merlin.py /home/ubuntu/Ossian/train//chv/speakers/news/naive_01_nn/processors/acoustic_predictor/config.cfg

Thanks!

-josh
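
For tracking this kind of growth from inside a script, the du listings above can be approximated in Python, together with a check of the free space left on the volume. This is a measurement sketch only, not an Ossian or Merlin feature. The function names are hypothetical, and the sizes are apparent file sizes, which can differ slightly from the block counts that du reports. It does not say which intermediate directories are safe to delete.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


def free_gib(path="."):
    """Free space on the volume holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Replace "." with the voice directory, e.g. the naive_01_nn folder above.
    for size, name in largest_subdirs("."):
        print("%10.1f MiB  %s" % (size / 2**20, name))
    print("free: %.1f GiB" % free_gib("."))
```

Running the check between the frontend stage and the Merlin stage would show how much headroom is left before the large label directories are written.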

can't run the demo

Dear all,

I have successfully executed the setup script:
./scripts/setup_tools.sh

and prepared the training data:
./corpus/rm/speakers/rss_toy_demo/
./corpus/rm/text_corpora/wikipedia_10K_words/

But I can't run the demo. Here is the output:

python ./scripts/train.py -s rss_toy_demo -l rm -p 1 naive_01_nn
/home/liao/Ossian/
-- Gather corpus
-- Train voice
/home/liao/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn
/home/liao/Ossian/voices//rm/rss_toy_demo/naive_01_nn
try loading config from python...
/home/liao/Ossian/recipes/naive_01_nn.cfg
trouble importting from merlin -- installed properly?

Another thing: there seems to be a bug in "tools/g2p/Utility.cc", which shows up when running test_release.sh:

Utility.cc: In function ‘int Core::getline(std::istream&, std::__cxx11::string&, std::__cxx11::string)’:
Utility.cc:43:21: error: ‘EOF’ was not declared in this scope
     if (is.get() == EOF) return EOF;
                     ^
Utility.cc:48:35: error: ‘EOF’ was not declared in this scope
     while (((token = is.get()) != EOF) &&
                                   ^
error: command 'g++' failed with exit status 1
rm: cannot remove './train/en/speakers/tundra_toy_demo/*': No such file or directory
rm: cannot remove 'voices/en/tundra_toy_demo/english_gold_basic': No such file or directory
-- Gather corpus
Traceback (most recent call last):
  File "./scripts/train.py", line 147, in <module>
    main_work()
  File "./scripts/train.py", line 82, in main_work
    train(opts, dirs)
  File "./scripts/train.py", line 111, in train
    for f in os.listdir(c):
OSError: [Errno 2] No such file or directory: '/home/liao/Ossian/corpus/en/speakers/tundra_toy_demo/txt'
/home/liao/Ossian/train//en/speakers/tundra_toy_demo/english_gold_basic
/home/liao/Ossian/voices//en/tundra_toy_demo/english_gold_basic
No voice of specified configuration exists to synthesise from

Could you give me some tips?

Thanks a lot!

Best Regards,
Yuanfu
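
The OSError in the traceback says that corpus/en/speakers/tundra_toy_demo/txt does not exist, and the post only mentions preparing the rm data, so the English demo corpus may simply not be in place. A small pre-flight check like the one below would report this before train.py starts. It is a sketch, not part of Ossian: the function names are hypothetical, and the layout corpus/<lang>/speakers/<speaker>/txt is taken from the path in the traceback.

```python
import os


def corpus_dir(ossian_root, lang, speaker, subdir="txt"):
    """Build the corpus path using the layout seen in the traceback."""
    return os.path.join(ossian_root, "corpus", lang, "speakers", speaker, subdir)


def missing_corpus_dirs(ossian_root, lang, speaker, subdirs=("txt",)):
    """Return the expected corpus directories that do not exist."""
    missing = []
    for subdir in subdirs:
        path = corpus_dir(ossian_root, lang, speaker, subdir)
        if not os.path.isdir(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Falls back to the current directory if $OSSIAN is not set.
    root = os.environ.get("OSSIAN", ".")
    for path in missing_corpus_dirs(root, "en", "tundra_toy_demo"):
        print("missing:", path)
```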

Training error

Hi,

I am getting the error below. Please see the screen log.

Can anyone help me solve this issue? Thanks in advance.

[@token_class='word'][1]/attribute::vsm_d9"), ('C_vsm_d9', './ancestor::token/attribute::vsm_d9'), ('R_vsm_d9', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d9"), ('L_vsm_d10', "./ancestor::token/preceding::token[@token_class='word'][1]/attribute::vsm_d10"), ('C_vsm_d10', './ancestor::token/attribute::vsm_d10'), ('R_vsm_d10', "./ancestor::token/following::token[@token_class='word'][1]/attribute::vsm_d10")], 'StateAligner': <class 'Aligner.StateAligner'>}
train
Cannot load NN model from model_dir: /home/user/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor -- not trained yet
Cannot load NN model from model_dir: /home/user/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor -- not trained yet

== Train voice (proc no. 1 (word_splitter)) ==
Train processor word_splitter
RegexTokeniser requires no training
Applying processor word_splitter
u u u u u u uu u u u u uu u u u u u uu u u u u u u u u

== Train voice (proc no. 2 (segment_adder)) ==
Train processor segment_adder
NaivePhonetiser requires no training
Applying processor segment_adder
u u u u u u uu u u u u u uu u u u u u uu u u u u u u u

== Train voice (proc no. 3 (word_vector_tagger)) ==
Applying processor word_vector_tagger
u u u u u u u u u u u u u u u u u u u u u u u u u u u u u

== Train voice (proc no. 4 (feature_dumper)) ==
Train processor feature_dumper
FeatureDumper already trained -- questions exist:
/home/user/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn//SomeFileName
Applying processor feature_dumper
u u u u u u uu u u u u u uu u u u u u uu u u u u u u u

== Train voice (proc no. 5 (acoustic_feature_extractor)) ==
Train processor acoustic_feature_extractor
Applying processor acoustic_feature_extractor
u u u u u u uu u u u u u uu u u u u u uu u u u u u u u

== Train voice (proc no. 6 (aligner)) ==
Train processor aligner

      Training aligner -- see /home/user/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt

sh: 1: /home/user/Ossian//tools/bin//HLEd: not found
sh: 1: /home/user/Ossian//tools/bin//HLEd: not found
Aligner training failed
(venv)
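
The two "HLEd: not found" lines mean the shell could not find an executable at tools/bin/HLEd, which would be consistent with the HTK build step not having completed. The sketch below checks a tools directory for executables before training is started. It is not Ossian code, and the function name is hypothetical. The tool names HLEd and HERest are taken from the logs on this page; the full set of binaries Ossian needs is not listed here.

```python
import os


def missing_binaries(bin_dir, names):
    """Return the names that are not executable files inside bin_dir."""
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Falls back to the current directory if $OSSIAN is not set.
    ossian_root = os.environ.get("OSSIAN", ".")
    bin_dir = os.path.join(ossian_root, "tools", "bin")
    for name in missing_binaries(bin_dir, ["HLEd", "HERest"]):
        print("not found or not executable:", os.path.join(bin_dir, name))
```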

Change Processor File / class names

Hi Oliver et al,

I was looking up some things on SKLDecisionTreePausePredictor, because the SKL part wasn't obvious to me. Then I saw it stands for scikit-learn, and that there's a file called SKLProcessors.py with other processors built on scikit-learn.

I suggest the SKL be removed from class and file names, to make things more transparent.

In general, it seems that the *.py scripts in $OSSIAN/scripts/processors/* are one processor per file. (e.g. PhraseMaker.py, VSMTagger.py, Syllabifier.py).

SKLProcessors.py deviates from this trend, with multiple processors contained in the one file. I suggest that some kind of standard for processors and their filenames be set.

Maybe stick with one processor per Python script?

Thoughts?

-josh

Ossian training

After running the duration predictor command, the TANH model file is not created inside the nnets model folder, which is inside the dnn_training_DUR folder. How can I correct this? The acoustic predictor works, by the way.
