yajiemiao / eesen

This project forked from srvk/eesen


The official repository of the Eesen project

License: Apache License 2.0

Shell 8.03% Perl 4.56% Python 0.39% Makefile 0.87% C++ 82.08% C 0.78% Cuda 3.25% Protocol Buffer 0.04%

eesen's People

Contributors

abuccts, fmetze, naxingyu, riebling, shuang777, yajiemiao


eesen's Issues

BLAS alternatives

Is it possible to use an alternative to ATLAS?

Running the install for ATLAS, I hit the "CPU Throttling apparently enabled!" error/abort. I'm reluctant to disable CPU dynamic frequency scaling, especially when I have a number of other linear algebra libraries already installed (CUDA/cuBLAS, OpenBLAS, boost/ublas, scipy/blas). Wouldn't the CUDA libcublas be acceptable, or even preferable?
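
A later issue in this list shows EESEN compiling with -DHAVE_OPENBLAS, so OpenBLAS is one alternative the build already knows about; cuBLAS only covers the GPU side, so a host BLAS is still needed. A minimal sketch of pointing the build at an existing OpenBLAS, assuming EESEN's configure accepts the same OpenBLAS option as Kaldi's (check the options listed in the configure script; the flag name may differ):

cd src
./configure --openblas-root=/opt/openblas --use-cuda=yes --cudatk-dir=/usr/local/cuda
make depend && make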

getting different results with same setup

Hi, YaJie,
I am trying to train an LSTM model for handwriting recognition with EESEN. I got different results with the same experiment setup. The token accuracy is the same for the first several iterations; it then differs in the subsequent iterations. Could you tell me the reason?
Some log information is as follows:
Token Accuracy in first pass training:
./exp/log/tr.iter1.log:TOKEN_ACCURACY >> -0.00429153% <<
./exp/log/tr.iter2.log:TOKEN_ACCURACY >> 0% <<
./exp/log/tr.iter3.log:TOKEN_ACCURACY >> 6.46579% <<
./exp/log/tr.iter4.log:TOKEN_ACCURACY >> 72.9836% <<
./exp/log/tr.iter5.log:TOKEN_ACCURACY >> 87.4248% <<
./exp/log/tr.iter6.log:TOKEN_ACCURACY >> 90.1313% <<
./exp/log/tr.iter7.log:TOKEN_ACCURACY >> 92.1912% <<
./exp/log/tr.iter8.log:TOKEN_ACCURACY >> 92.1286% <<
./exp/log/tr.iter9.log:TOKEN_ACCURACY >> 94.884% <<
./exp/log/tr.iter10.log:TOKEN_ACCURACY >> 96.2871% <<
./exp/log/tr.iter11.log:TOKEN_ACCURACY >> 97.1303% <<
./exp/log/tr.iter12.log:TOKEN_ACCURACY >> 97.5863% <<
./exp/log/tr.iter13.log:TOKEN_ACCURACY >> 97.9297% <<
./exp/log/tr.iter14.log:TOKEN_ACCURACY >> 98.1461% <<
./exp/log/tr.iter15.log:TOKEN_ACCURACY >> 98.3608% <<

Token Accuracy in second pass training:

./exp/log/tr.iter1.log:TOKEN_ACCURACY >> -0.00429153% <<
./exp/log/tr.iter2.log:TOKEN_ACCURACY >> 0% <<
./exp/log/tr.iter3.log:TOKEN_ACCURACY >> 17.2112% <<
./exp/log/tr.iter4.log:TOKEN_ACCURACY >> 83.2679% <<
./exp/log/tr.iter5.log:TOKEN_ACCURACY >> 92.4763% <<
./exp/log/tr.iter6.log:TOKEN_ACCURACY >> 94.1155% <<
./exp/log/tr.iter7.log:TOKEN_ACCURACY >> 95.1631% <<
./exp/log/tr.iter8.log:TOKEN_ACCURACY >> 95.7934% <<
./exp/log/tr.iter9.log:TOKEN_ACCURACY >> 96.2399% <<
./exp/log/tr.iter10.log:TOKEN_ACCURACY >> 97.2067% <<
./exp/log/tr.iter11.log:TOKEN_ACCURACY >> 97.8258% <<
./exp/log/tr.iter12.log:TOKEN_ACCURACY >> 98.2346% <<
./exp/log/tr.iter13.log:TOKEN_ACCURACY >> 98.4879% <<
./exp/log/tr.iter14.log:TOKEN_ACCURACY >> 98.7412% <<
./exp/log/tr.iter15.log:TOKEN_ACCURACY >> 98.8743% <<

Some logs in first pass training:

< VLOG[1] (train-ctc-parallel:EvalParallel():ctc-loss.cc:182) After 1010 sequences (0.167242Hr): Obj(log[Pzx]) = -18.7755   TokenAcc = -0.13721%

< VLOG1 After 2020 sequences (0.385994Hr): Obj(log[Pzx]) = -14.9581 TokenAcc = 0%
< VLOG1 After 3030 sequences (0.638081Hr): Obj(log[Pzx]) = -16.637 TokenAcc = 0%
< VLOG1 After 4040 sequences (0.918656Hr): Obj(log[Pzx]) = -17.8677 TokenAcc = 0%
< VLOG1 After 5050 sequences (1.22437Hr): Obj(log[Pzx]) = -19.0778 TokenAcc = 0%
< VLOG1 After 6060 sequences (1.55504Hr): Obj(log[Pzx]) = -20.3243 TokenAcc = 0%
< VLOG1 After 7070 sequences (1.91034Hr): Obj(log[Pzx]) = -21.2999 TokenAcc = 0%
< VLOG1 After 8080 sequences (2.29047Hr): Obj(log[Pzx]) = -22.514 TokenAcc = 0%
< VLOG1 After 9090 sequences (2.69605Hr): Obj(log[Pzx]) = -23.3817 TokenAcc = 0%
< VLOG1 After 10100 sequences (3.12853Hr): Obj(log[Pzx]) = -24.4061 TokenAcc = 0%
< VLOG1 After 11110 sequences (3.58984Hr): Obj(log[Pzx]) = -25.2887 TokenAcc = 0%
< VLOG1 After 12120 sequences (4.08381Hr): Obj(log[Pzx]) = -26.0652 TokenAcc = 0%
< VLOG1 After 13130 sequences (4.61721Hr): Obj(log[Pzx]) = -27.3839 TokenAcc = 0%
< VLOG1 After 14140 sequences (5.20036Hr): Obj(log[Pzx]) = -28.5032 TokenAcc = 0%
< VLOG1 After 15150 sequences (5.85033Hr): Obj(log[Pzx]) = -30.1663 TokenAcc = 0%
< VLOG1 After 16160 sequences (6.62743Hr): Obj(log[Pzx]) = -32.4056 TokenAcc = 0%

Some logs in second pass training:

VLOG1 After 1010 sequences (0.167242Hr): Obj(log[Pzx]) = -17.1955 TokenAcc = -0.13721%
VLOG1 After 2020 sequences (0.385994Hr): Obj(log[Pzx]) = -14.9554 TokenAcc = 0%
VLOG1 After 3030 sequences (0.638081Hr): Obj(log[Pzx]) = -16.6586 TokenAcc = 0%
VLOG1 After 4040 sequences (0.918656Hr): Obj(log[Pzx]) = -17.8926 TokenAcc = 0%
VLOG1 After 5050 sequences (1.22437Hr): Obj(log[Pzx]) = -19.1039 TokenAcc = 0%
VLOG1 After 6060 sequences (1.55504Hr): Obj(log[Pzx]) = -20.3462 TokenAcc = 0%
VLOG1 After 7070 sequences (1.91034Hr): Obj(log[Pzx]) = -21.3233 TokenAcc = 0%
VLOG1 After 8080 sequences (2.29047Hr): Obj(log[Pzx]) = -22.5532 TokenAcc = 0%
VLOG1 After 9090 sequences (2.69605Hr): Obj(log[Pzx]) = -23.4058 TokenAcc = 0%
VLOG1 After 10100 sequences (3.12853Hr): Obj(log[Pzx]) = -24.4316 TokenAcc = 0%
VLOG1 After 11110 sequences (3.58984Hr): Obj(log[Pzx]) = -25.3131 TokenAcc = 0%
VLOG1 After 12120 sequences (4.08381Hr): Obj(log[Pzx]) = -26.0824 TokenAcc = 0%
VLOG1 After 13130 sequences (4.61721Hr): Obj(log[Pzx]) = -27.3863 TokenAcc = 0%
VLOG1 After 14140 sequences (5.20036Hr): Obj(log[Pzx]) = -28.4302 TokenAcc = 0%
VLOG1 After 15150 sequences (5.85033Hr): Obj(log[Pzx]) = -29.8969 TokenAcc = 0%
VLOG1 After 16160 sequences (6.62743Hr): Obj(log[Pzx]) = -32.0402 TokenAcc = 0%

Why is the Obj(log[Pzx]) different?

No gradient clipping in parallel version lstm training?

Hi
I saw the gradient clipping in the non-parallel version of the LSTM code (i.e. bilstm-layer.h),
but I cannot see the corresponding part in the parallel version (i.e. bilstm-parallel-layer.h).
Although training seems fine in almost all cases (I tried several architectures of different sizes on swbd), I wonder whether there is a reason you did not include clipping in the parallel-version LSTM.
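
A quick way to confirm the observation, assuming the layer headers sit under src/net/ as in the standard layout (the paths and the grep pattern are guesses, not a statement about the code):

grep -n -i clip src/net/bilstm-layer.h src/net/bilstm-parallel-layer.h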

different Token Accuracy on same sets

I am trying to train an RNN model for handwriting character string recognition with the EESEN tool. I use the training set as the cv set; in other words, my cv set is identical to the training set. But I find that the token accuracy on the training set differs from that on the cv set at each epoch in the training log.
Some of the log output is as follows:
EPOCH 4 RUNNING ... ENDS [2016-Jul-12 16:15:11]: lrate 0.000245, TRAIN ACCURACY 55.2512%, VALID ACCURACY 78.5178%

gpucompute: cuda-matrix.cc:1075:57: error: ‘cuda_apply_heaviside’ was not declared in this scope

The function "cuda_apply_heaviside" does not seem to be declared in gpucompute. As a result, the current version of the gpu module (and EESEN) does not compile:

g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I /opt/openblas/include -I /srv/data/speech/eesen/tools/openfst/include -DHAVE_OPENFST_GE_10400 -std=c++0x -g -DHAVE_CUDA -I/usr/local/cuda/include -c -o cuda-matrix.o cuda-matrix.cc
cuda-matrix.cc: In instantiation of ‘void eesen::CuMatrixBase::ApplyHeaviside() [with Real = float]’:
cuda-matrix.cc:1227:16: required from here
cuda-matrix.cc:1075:57: error: ‘cuda_apply_heaviside’ was not declared in this scope
cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
^
cuda-matrix.cc: In instantiation of ‘void eesen::CuMatrixBase::ApplyHeaviside() [with Real = double]’:
cuda-matrix.cc:1228:16: required from here
cuda-matrix.cc:1075:57: error: ‘cuda_apply_heaviside’ was not declared in this scope
make: *** [cuda-matrix.o] Error 1

As ApplyHeaviside() was introduced in the last commit, and searching for cuda_apply_heaviside shows it is available in Kaldi's cu-kernels.h but not in EESEN, my guess is that some changes were left out of the last commit.
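
Until the missing kernel wrapper lands, one way to keep building is to drop back to the commit before ApplyHeaviside() was introduced; a sketch using plain git (the file path is taken from the error heading above, and make depend/make is the usual Kaldi-style rebuild):

cd src
git log --oneline -3 -- gpucompute/cuda-matrix.cc   # find the commit that added ApplyHeaviside()
git checkout HEAD~1                                 # or the specific earlier hash shown by git log
make depend && make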

SVN checkout error

SVN checkout fails because of the presence of a .svn/ folder inside the src/makefiles/ folder.
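
If a stray .svn/ directory is the culprit, it can be located and removed before retrying the checkout; a sketch:

find src/makefiles -type d -name .svn               # confirm where the stray metadata lives
find src/makefiles -type d -name .svn -prune -exec rm -rf {} +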

the output of LSTM

First, thanks for your help all the time. I have been confused by the modeled units all along. For instance, in units.txt:
[screenshot of units.txt omitted]
I wonder why we should model the first phone and the second phone; neither of them exists in my training labels. Can I delete them and not model them?
Any help would be appreciated.
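
One routine check before deleting anything: list the units that never occur in the labels actually used for training. A sketch, assuming a label file with one utterance per line ("<utt-id> <unit> <unit> ..."); the file names here are illustrative, not the recipe's exact paths:

cut -d' ' -f2- labels.txt | tr ' ' '\n' | sort -u > units.in_labels
awk '{print $1}' units.txt | sort -u > units.in_table
comm -23 units.in_table units.in_labels             # units in the table that never appear in the labels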

Cuda memory

Hi~
Thanks for your help all the time. I have a question again: when I run an experiment on my own data, tr.iter1.log shows:
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 980 free:4000M, used:94M, total:4095M, free/total:0.976835
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 980 free:4000M, used:94M, total:4095M, free/total:0.976835
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 980 free:4000M, used:94M, total:4095M, free/total:0.976835
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 980 free:4000M, used:94M, total:4095M, free/total:0.976834
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 0 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [0]: GeForce GTX 980 free:3984M, used:110M, total:4095M, free/total:0.972929 version 5.2
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
How can I solve the problem?
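
Note that the lines quoted above are LOG/WARNING output rather than an error message; if the run actually fails, the failure will appear later in tr.iter1.log. The one actionable suggestion in the excerpt is the compute-mode hint; following it (root required; the numeric codes vary slightly across driver versions) looks like:

sudo nvidia-smi -c 1                     # compute-exclusive mode, as the WARNING suggests
sudo nvidia-smi -c 3                     # EXCLUSIVE_PROCESS on newer drivers
nvidia-smi -q | grep -i "compute mode"   # verify the setting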

Did you have experience with Obj = nan, TokenAcc = nan%?

Hi
I am slightly modifying your character-based RNN+CTC experiment on swbd.
I am trying to use a minimal character unit set (alphabet (26) + {space ' - } + noise + laugh + vocal-noise) instead of including all the characters such as digits, &, _. Thus the RNN has 34 output units. For this experiment I had to modify lexicon2.txt and units.txt, and this makes the transcriptions longer sequences than before.
For example: 260 : t w o - s i x t y, 401k : f o u r - o - o n e - k

But this experiment consistently produces nan for Obj and TokenAcc, even when I try smaller learning rates and various RNN architectures.
I suspect this is because 'train-ctc-parallel' does not rescale alpha and beta during the forward-backward algorithm. It seems the non-parallel version uses a rescaling kernel (i.e. _compute_ctc_alpha_one_sequence_rescale), but the parallel version uses code without rescaling.

Have you had a similar experience with this nan error?
Did I miss the rescaling part in your code? I hope I have not made a mistake and am not bothering you too much.

Here are a few lines of an example log:
VLOG1 After 20 sequences (0.000913889Hr): Obj(log[Pzx]) = -50.5868 TokenAcc = -nan%
VLOG1 After 40 sequences (0.00273056Hr): Obj(log[Pzx]) = nan TokenAcc = -260%
VLOG1 After 60 sequences (0.00498056Hr): Obj(log[Pzx]) = -59.5562 TokenAcc = -140.909%
VLOG1 After 80 sequences (0.00747778Hr): Obj(log[Pzx]) = -75.5068 TokenAcc = -34.9206%
VLOG1 After 100 sequences (0.0101417Hr): Obj(log[Pzx]) = -67.462 TokenAcc = -66.6667%
VLOG1 After 120 sequences (0.0129083Hr): Obj(log[Pzx]) = -31.2848 TokenAcc = 11.6279%
...
VLOG1 After 740 sequences (0.1354Hr): Obj(log[Pzx]) = 10.6927 TokenAcc = 0%
VLOG1 After 760 sequences (0.140061Hr): Obj(log[Pzx]) = 8.33837 TokenAcc = 0%
VLOG1 After 780 sequences (0.144731Hr): Obj(log[Pzx]) = 4.95938 TokenAcc = 0%
VLOG1 After 800 sequences (0.149453Hr): Obj(log[Pzx]) = 2.56755 TokenAcc = 0%
VLOG1 After 820 sequences (0.154206Hr): Obj(log[Pzx]) = 9.55445 TokenAcc = 0%
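
Independent of the rescaling question, one routine check when character-level transcriptions get much longer: CTC cannot align an utterance whose label sequence is longer than its frame sequence, and such utterances are a common source of degenerate objectives. A sketch for finding them; the feats.scp path and the name of the token-level label file are illustrative, so use whatever files your training actually reads:

feat-to-len scp:data/train/feats.scp ark,t:- | sort > frames.len
awk '{print $1, NF-1}' labels.tr | sort > labels.len
join frames.len labels.len | awk '$3 > $2 {print $1, "tokens="$3, "frames="$2}'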

LDC

Hi!
I'm a student in China and I'm not a member of the LDC, so I don't know the format of the text. Can you provide me with an example?

where is prune-lm?

Hi,
I tried to run the WSJ example, but I get an error:
local/wsj_data_prep.sh: line 161: prune-lm: command not found

Where is the tool "prune-lm"?
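
prune-lm is not an EESEN binary; it ships with IRSTLM, which the WSJ data-preparation script expects to find on the PATH. A sketch, assuming IRSTLM has been built somewhere under tools/ (install it by whatever method matches your IRSTLM version; the path below is illustrative):

export IRSTLM=/path/to/eesen/tools/irstlm
export PATH=$IRSTLM/bin:$PATH
which prune-lm                      # should now resolve; then re-run local/wsj_data_prep.sh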

tedlium example training error

I tried to run the tedlium example, but I ran into a problem:
it fails during the training process.

========Network Training with the 110-Hour Set=========
steps/train_ctc_parallel.sh --add-deltas true --num-sequence 20 --frame-num-limit 25000 --learn-rate 0.00004 --report-step 1000 --halving-after-epoch 12 --feats-tmpdir exp/train_phn_l5_c320/XXXXX data/train_tr95 data/train_cv05 exp/train_phn_l5_c320
feat-to-len scp:data/train_tr95/feats.scp ark,t:-
feat-to-len scp:data/train_cv05/feats.scp ark,t:-
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- |' ark,scp:exp/train_phn_l5_c320/g8Kne/train.ark,exp/train_phn_l5_c320/train_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 51984 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 51984 feature matrices.
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_cv05/utt2spk scp:data/train_cv05/cmvn.scp scp:exp/train_phn_l5_c320/cv.scp ark:- |' ark,scp:exp/train_phn_l5_c320/g8Kne/cv.ark,exp/train_phn_l5_c320/cv_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/train_cv05/utt2spk scp:data/train_cv05/cmvn.scp scp:exp/train_phn_l5_c320/cv.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 2583 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 2583 feature matrices.
Initializing model as exp/train_phn_l5_c320/nnet/nnet.iter0
TRAINING STARTS [2017-Feb-1 17:57:13]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ... Removing features tmpdir exp/train_phn_l5_c320/g8Kne @ liao-ubuntu2
cv.ark
train.ark

I would appreciate any help,
Thanks
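
The EPOCH banner above is printed by the wrapper script; when the epoch dies straight away and the feature tmpdir is cleaned up, the real failure is usually recorded in the per-iteration log. A sketch (the log path is assumed from the experiment directory shown above):

tail -n 50 exp/train_phn_l5_c320/log/tr.iter1.log
nvidia-smi                          # also confirm a GPU is actually visible if you built with CUDA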

Training error

I get the following syntax errors and negative token accuracy while running on my own dataset.
Kindly help me sort this out.

                      Model Training                           

steps/train_ctc_parallel.sh --add-deltas true --num-sequence 10 --learn-rate 0.00004 --report-step 1000 --halving-after-epoch 12 --feats-tmpdir exp/train_phn_l2_c200/XXXXX data/train_tr95 data/train_cv05 exp/train_phn_l2_c200
feat-to-len scp:data/train_tr95/feats.scp ark,t:-
feat-to-len scp:data/train_cv05/feats.scp ark,t:-
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l2_c200/train.scp ark:- |' ark,scp:exp/train_phn_l2_c200/Ofwfy/train.ark,exp/train_phn_l2_c200/train_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l2_c200/train.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 2823 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 2823 feature matrices.
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_cv05/utt2spk scp:data/train_cv05/cmvn.scp scp:exp/train_phn_l2_c200/cv.scp ark:- |' ark,scp:exp/train_phn_l2_c200/Ofwfy/cv.ark,exp/train_phn_l2_c200/cv_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/train_cv05/utt2spk scp:data/train_cv05/cmvn.scp scp:exp/train_phn_l2_c200/cv.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 151 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 151 feature matrices.
TRAINING STARTS [2016-Feb-11 09:46:16]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ... ENDS [2016-Feb-11 09:50:04]: lrate 4e-05, TRAIN ACCURACY -71.7361%, VALID ACCURACY -75.7551%
(standard_in) 1: syntax error
(standard_in) 1: syntax error
steps/train_ctc_parallel.sh: line 163: [: too many arguments
(standard_in) 1: syntax error
steps/train_ctc_parallel.sh: line 175: [: 1: unary operator expected
EPOCH 2 RUNNING ... ENDS [2016-Feb-11 09:53:56]: lrate 4e-05, TRAIN ACCURACY -71.7361%, VALID ACCURACY -75.7551%
(standard_in) 1: syntax error
(standard_in) 1: syntax error
steps/train_ctc_parallel.sh: line 163: [: too many argument
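
The "(standard_in) 1: syntax error" lines are bc's error message: the training script almost certainly feeds the reported accuracies into bc for its halving/stopping comparisons, and an accuracy string that bc cannot parse (for example one still carrying its % sign, or an empty value) produces exactly this error, which then cascades into the [ ... ] test failures on lines 163 and 175. The bc behaviour itself is easy to reproduce:

echo "-71.7361% < -75.7551%" | bc   # the stray % signs trigger: (standard_in) 1: syntax error
echo " < 0.5" | bc                  # an empty left-hand value does the same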

Lattice Decoding Error

I'm building a text recogniser using the EESEN framework; the training step completes successfully.

But the decoding stage fails. I have changed the beam, lattice-beam and acoustic scale many times.
The problem is still there.
The decoding command is:
steps/decode_ctc_lat.sh --cmd "$decode_cmd" --nj 1 --beam 18.0 --lattice_beam 8.0 --max-active 5000 --acwt 1.3
data/lang_phn_test_${lm_suffix} data/test_handwritten $dir/decode_test_handwritten_${lm_suffix} || exit 1;
This is the content of the log file:

net-output-extract --class-frame-counts=exp/train_phn_l2_c140/label.counts --apply-log=true exp/train_phn_l2_c140/final.nnet "ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/test_handwritten/split1/1/utt2spk scp:data/test_handwritten/split1/1/cmvn.scp scp:data/test_handwritten/split1/1/feats.scp ark:- |" ark:- | latgen-faster --max-active=5000 --max-mem=50000000 --beam=18.0 --lattice-beam=8.0 --acoustic-scale=1.3 --allow-partial=true --word-symbol-table=data/lang_phn_test_tg/words.txt data/lang_phn_test_tg/TLG.fst ark:- "ark:|gzip -c > exp/train_phn_l2_c140/decode_test_handwritten_tg/lat.1.gz"

Started at Thu Sep 1 17:55:04 CEST 2016

net-output-extract --class-frame-counts=exp/train_phn_l2_c140/label.counts --apply-log=true exp/train_phn_l2_c140/final.nnet 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/test_handwritten/split1/1/utt2spk scp:data/test_handwritten/split1/1/cmvn.scp scp:data/test_handwritten/split1/1/feats.scp ark:- |' ark:-
LOG (net-output-extract:SelectGpuId():cuda-device.cc:77) Manually selected to compute on CPU.
LOG (net-output-extract:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
latgen-faster --max-active=5000 --max-mem=50000000 --beam=18.0 --lattice-beam=8.0 --acoustic-scale=1.3 --allow-partial=true --word-symbol-table=data/lang_phn_test_tg/words.txt data/lang_phn_test_tg/TLG.fst ark:- 'ark:|gzip -c > exp/train_phn_l2_c140/decode_test_handwritten_tg/lat.1.gz'
LOG (net-output-extract:ClassPrior():class-prior.cc:33) Computing class-priors from : exp/train_phn_l2_c140/label.counts
apply-cmvn --norm-vars=true --utt2spk=ark:data/test_handwritten/split1/1/utt2spk scp:data/test_handwritten/split1/1/cmvn.scp scp:data/test_handwritten/split1/1/feats.scp ark:-
AHTD3A0002_Para2_1 ahA heM yaB naA sp aeA daM ayA sp waM aeA naA sp taB deB yaA sp aeA raM aaA hhA aaA taA sp comA aeA naA sp aaA naA sp aeA naA bslA yaA sp aeA ayE sp aeA yaA sp aaA kaA sp aeA naA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para2_1 is 0.847503 over 873 frames.
AHTD3A0002_Para2_2 ghM taE sp waM aeA naA sp aeA naB waM aaA ayA sp aeA heM maE sp ahA naA sp aaA ayA sp aeA naA sp taB aaE naA sp waM aeA naA bslA kaM yaA sp aaA baM haE aaE aeA raM deA sp dotA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para2_2 is 0.79611 over 923 frames.
AHTD3A0002_Para2_3 ahA raM aaA hhA taA sp aeA heM laA sp aeA haE sp ayA sp sp waM aeA deB dotA sp waM aeA naA sp ghB yaB toB sp comA aeA heM broA aeA naA sp aaA naA sp aeA naA sp aeA naA sp dbqA aaA deA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para2_3 is 0.828762 over 936 frames.
AHTD3A0002_Para2_4 alA heM aaE zhaA sp ayA sp hhA heM yaB alM aaE sp sp zhaA sp equA aaE sp ahA naA sp aeA yaA sp dotA sp sp aeA alA heM aeE yaA sp ahA naA sp aeA alE sp aeA naA sp taB ghM toB yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para2_4 is 0.832251 over 892 frames.
AHTD3A0002_Para3_1 ahA naA sp aeA naA sp aaA ayA sp aeA naA sp dotA dotA sp sp seM aeE jaB yaA sp aeA yaA sp dotA sp dhM aaA sp aeA yaA sp aaA kaA sp ahA naA sp taB yaA sp ahA raM aeA yaA sp dotA sp baE
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_1 is 0.851592 over 914 frames.
AHTD3A0002_Para3_2 raM fslA hhA sp ayA sp waM aeA naA sp aaA yaA sp aeA raM aaA ayA sp sp aeA deB dotA sp thB maE sp dotA heM ahA broA aeA heM maE sp dbqA dotA taE sp aeA naA sp comA heA alM aaE aeA aaA hhA aaA taA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_2 is 0.857723 over 885 frames.
AHTD3A0002_Para3_3 ghA bslA aeE naB waM aaA ayA sp aaA ayA sp aeA yaA sp dotA sp waM aeA naA sp aaA ayA sp comA aeA naA sp taB haE alM yaA sp broA aeA naB shM aeE dbqA alB kaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_3 is 0.835602 over 858 frames.
AHTD3A0002_Para3_4 waM heM yaA sp sp shA aeA naA sp jaB waM aeA naA sp waM aeA naA sp heM yaA sp aeA waM aaA kaM ayM taE sp aeA ahA naA sp n3A aeE yaA sp dbqA sp aeA ayE sp aeA yaA sp dotA dotA dbqA comA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_4 is 0.857684 over 871 frames.
AHTD3A0002_Para3_5 heM yaA sp aeA daM heM aaE sp aeA naA sp aaA naA sp aeA naA sp dotA dotA sp sp aeA jaB daM sp aeA aeE jaB haE sp waM aeA naA sp aeA yaA sp aeA naA sp waM aeA naA sp heM wlE yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_5 is 0.841021 over 854 frames.
AHTD3A0002_Para3_6 naA sp aeA yaA sp aaA jaB haE sp aeA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para3_6 is 0.948729 over 180 frames.
AHTD3A0002_Para4_1 waM aaA aaE aeA naA sp sp aeA shM yaB alE alM aeA deB amE ayA sp aeA ayE ayM alB ayE sp aeA naA sp ahA naA sp aeA naA sp aeA naA sp comA sp aeA yaA sp dotA n6A aeA saM laA sp aaA ayB yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_1 is 0.866579 over 912 frames.
AHTD3A0002_Para4_2 taB wlE daM yaA sp ahA raM maE sp aaA raM ayA sp aeA naA sp dhM broA aeA raM deB dotA sp aeA naA sp aaA laB eeE sp aeA naA sp faM yaA sp ahA naA sp deA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_2 is 0.891789 over 847 frames.
AHTD3A0002_Para4_3 seM naA sp sp dotA sp aeA aaA ayA sp aeA naA sp yaB zaE aaA laA sp aeA yaA sp dotA sp waM heM yaA sp aeA yaA sp ahA naA aeE yaA sp aeA daM aaA hhA sp aeA baM haE sp dotA aeE yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_3 is 0.867084 over 870 frames.
AHTD3A0002_Para4_4 alA haM aeA yaA sp waM heM yaA sp broA aeA ghB yaB alM yaA sp aeA deB comA aeA raM deB dotA dotA alA sp aeA dotA dotA heM laA sp aeA yaA sp ahA aeA aeE naB hhE sp aeA ayE heM daM sp faM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_4 is 0.834662 over 869 frames.
AHTD3A0002_Para4_5 heM yaA sp aeA raM deB yaA sp aaA ayA aeA alA hypA sp aeA seM aeE yaA sp aeA jaB daM sp aeA naA sp aeA naA bslA daM ayA sp daM aeA naA kaM yaA sp faM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_5 is 0.870726 over 830 frames.
AHTD3A0002_Para4_6 haE sp aeA naA sp aeA naA sp aaA yaA sp aeA aaA deA aeA ayE sp khM aeA seM ayM alB yaA sp yaB aeE keB laA sp aeA laB aeA naA sp maB naA sp aeA yaA sp aeA yaA sp naB aaE sp aeA naB yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_6 is 0.885211 over 903 frames.
AHTD3A0002_Para4_7 n1A n2A aeA yaA sp aeA seM aeE aeE raM deB comA aeA seM raM taB heE sp aeA yaA sp aeA naA sp heM yaA sp aeA deB keB aeE naA sp aeA naA bslA hhA keB naA sp aeA naA sp kaM raM naA sp taB aeE yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_7 is 0.83318 over 913 frames.
AHTD3A0002_Para4_8 aaA heA sp heA sp waM aeA naA sp ghA sp seM amE dotA sp sp seM aeE naA sp aeA seM ayE n9A aeA waM aaA raM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0002_Para4_8 is 0.886301 over 659 frames.
AHTD3A0003_Para4_3 dotA heM ayB yaA sp aaA yaA sp aeA naB hhE ayB aeA heM daM aaA ayA sp aeA naA sp aeA naA sp aaA raM aaA ayA sp hypA sp aeA ghA aeE yaA sp aeA waM jaB yaA sp aeA naA sp aeA ghA bslA ayM yaB ayE sp aeA seM kaM yaA sp maB eeE sp keB naA sp aeA naA sp ghB raM aaA deA aeA taB yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0003_Para4_3 is 0.825307 over 954 frames.
AHTD3A0003_Para4_4 naB ayM alB ayE aeE saM laA sp aeA maB maE sp aeA yaB heE sp shM aaE raM heA sp maE sp n1A n2A sp aeA seM aeE aeE sp aeA alE sp aeA ghA bslA alB yaA sp aeA aaA deA aeA n3A dbqA ayB aeA shM yaB hhE sp ghA sp heM sp sp ayA sp aeA naA sp jaB raM yaB baE
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0003_Para4_4 is 0.814618 over 924 frames.
AHTD3A0003_Para4_5 aaA seM taM aeE jaB ayM n1A n5A n0A maE sp aeA seM alM yaA sp thB maE sp aeA amE dotA n7A maE aeE yaA sp n8A n2A n9A sp aeA aeE haE sp aaA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0003_Para4_5 is 0.925499 over 745 frames.
AHTD3A0004_Para1_1 raM aaA deA sp sp aeA shM yaB alE sp amA alM amE naB yaB ayE ayM alB ayE sp aeA yaA sp aeA naA sp ahA brcA dotA keB naA sp aeA naA sp dotA n6A alB alE sp aeA naB daM yaA sp yaB wlE daM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_1 is 0.892255 over 988 frames.
AHTD3A0004_Para1_2 raM aaA ayA sp aeA deB yaA sp raM aeA yaA sp brcA alB eeE aeA taA sp faM yaA sp shM alE deA sp zhaA alB ghB dotA sp aeA yaA sp scrA ghE ghM yaB raM deB yaB naA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_2 is 0.826241 over 1006 frames.
AHTD3A0004_Para1_3 heM yaA sp kaM raM naA sp aeA shM ayM taM aeE sp aeA daM dbqA aeE sp aeA yaA sp dbqA dotA aeE raM alA sp aeA heA dotA sp waM heM yaA sp aeA ghB ayA sp dotA sp waM aeA naA sp deB dotA sp baE sp aeA naA sp heM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_3 is 0.820131 over 990 frames.
AHTD3A0004_Para1_4 waM aaA laB aeE naA sp aeA heM daM alA sp faM heM yaA sp aeA raM deB yaA sp aaA ayA aeA alA hypA deA aeA seM aeE yaA sp aeA waM jaB yaA sp aeA naA sp aeA naA sp amA ayA ayE daM aeA naA sp aaA hhA heM maB eeE keB bslA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_4 is 0.828792 over 1031 frames.
AHTD3A0004_Para1_5 ahA naA sp aeA aaA deA sp heM yaA sp aeA seM ayM alB ayE sp aeA laA sp aeA yaA sp aeA maB raM aeA seA sp aeA yaA sp aeA aaE alB jaE sp maE sp aeA yaA sp aeA seM aeE aeE ayA sp comA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_5 is 0.88772 over 1011 frames.
AHTD3A0004_Para1_6 ghA bslA alB eeA eeE aaA deA aeA n3A dbqA bslA ayB aeA deB yaB hhE naA sp aeA naA sp heM daM alA ayA sp aeA naA bslA hhA taB heM dotA sp aeA naA sp aaA naA sp aeA naA sp jaB daM yaA sp aeA raM heA alA sp waM aeA naA sp ahA naA sp aeA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_6 is 0.805979 over 1014 frames.
AHTD3A0004_Para1_7 amE dotA n7A n5A aeE naA sp aeA waM aeA naA sp waM aeA naA sp aeA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para1_7 is 0.912582 over 368 frames.
AHTD3A0004_Para2_1 ahA naA sp jaB ayE sp shM aeE naA sp aeA daM naA sp waM heM hhA keB aeE naA sp aaA naA sp aeA naA sp n3A heM raM deA sp ayA sp waM aeA naA sp hhA ayM daM sp aeA naA hhA sp aeA naA sp dotA sp waM aeA raM deA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para2_1 is 0.872719 over 991 frames.
AHTD3A0004_Para2_2 ghA sp heA sp aeA yaA sp aaA ayA sp hhA sp broA aeA yaA sp aeA naA sp zhaA sp ayA sp aeA deB yaA sp baM aeE raM aaA deB yaA sp dotA fslA brcA crcA aeE naA sp aeA raM deA sp hypA deA sp heM daM alA sp waM waM yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para2_2 is 0.844068 over 1052 frames.
AHTD3A0004_Para2_3 zhaA sp dotA ayM raM deA sp aaA ayA sp sp dotA sp waM ahA naA sp ahA naA sp aeA alA hypA deA aeA faE sp saA sp ghA sp sp aeA deB yaA sp aeA naA sp jaB raM yaB baE sp aeA naA sp taB keB naA sp aeA deB brcA aeA naA sp aaA yaA sp aeA seM ayE sp aaA yaA
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para2_3 is 0.868646 over 1087 frames.
AHTD3A0004_Para2_4 ghA bslA alA sp deA aeA ghA sp heM sp dotA sp waM aeA deB aeE sp dhM yaA sp shA sp dotA aeE aaA aaA yaA sp deB raM alA sp aeA naB heM dotA dotA sp ghA sp aeA deB yaA sp aaA yaA sp aeA seM haE sp aaA ayA sp waM aeA naA sp keB laA sp aeA yaA sp taB aeE jaB naB baE
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance AHTD3A0004_Para2_4 is 0.839514 over 1090 frames.
WARNING (latgen-faster:ProcessNonemitting():lattice-faster-decoder.cc:772) Error, no surviving tokens: frame is 61
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
WARNING (latgen-faster:PruneTokensForFrame():lattice-faster-decoder.cc:456) No tokens alive [doing pruning]
KALDI_ASSERT: at latgen-faster:PruneForwardLinks:lattice-faster-decoder.cc:314, failed: link_extra_cost == link_extra_cost
Stack trace is:
eesen::KaldiGetStackTrace()
eesen::KaldiAssertFailure_(char const*, char const*, int, char const*)
eesen::LatticeFasterDecoder::PruneForwardLinks(int, bool*, bool*, float)
eesen::LatticeFasterDecoder::PruneActiveTokens(float)
eesen::LatticeFasterDecoder::Decode(eesen::DecodableInterface*)
eesen::DecodeUtteranceLatticeFaster(eesen::LatticeFasterDecoder&, eesen::DecodableInterface&, fst::SymbolTable const*, std::string, double, bool, bool, eesen::TableWriter<eesen::BasicVectorHolder>*, eesen::TableWriter<eesen::BasicVectorHolder>*, eesen::TableWriter<eesen::CompactLatticeHolder>*, eesen::TableWriter<eesen::LatticeHolder>*, double*)
latgen-faster(main+0x83a) [0x8191127]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb6b82a83]
latgen-faster() [0x8190810]
bash: line 1: 3076 Broken pipe net-output-extract --class-frame-counts=exp/train_phn_l2_c140/label.counts --apply-log=true exp/train_phn_l2_c140/final.nnet "ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/test_handwritten/split1/1/utt2spk scp:data/test_handwritten/split1/1/cmvn.scp scp:data/test_handwritten/split1/1/feats.scp ark:- |" ark:-
3077 Aborted (core dumped) | latgen-faster --max-active=5000 --max-mem=50000000 --beam=18.0 --lattice-beam=8.0 --acoustic-scale=1.3 --allow-partial=true --word-symbol-table=data/lang_phn_test_tg/words.txt data/lang_phn_test_tg/TLG.fst ark:- "ark:|gzip -c > exp/train_phn_l2_c140/decode_test_handwritten_tg/lat.1.gz"

Accounting: time=274 threads=1

Ended (code 134) at Thu Sep 1 17:59:38 CEST 2016, elapsed time 274 seconds
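
The failed assertion link_extra_cost == link_extra_cost can only trip when link_extra_cost is NaN (NaN is the one value that is not equal to itself), so something non-finite is reaching the lattice costs; changing the beams will not fix that. One quick check is whether the network scores themselves contain NaN or inf. The pipeline below is just the net-output-extract command from the log with the decoder replaced by a text dump and a grep:

net-output-extract --class-frame-counts=exp/train_phn_l2_c140/label.counts --apply-log=true exp/train_phn_l2_c140/final.nnet "ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/test_handwritten/split1/1/utt2spk scp:data/test_handwritten/split1/1/cmvn.scp scp:data/test_handwritten/split1/1/feats.scp ark:- |" ark,t:- | grep -c -E 'nan|inf'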

Softmax probabilty vs Number of frames

My network is

120 400 0.1 1.0 50.0 1.0 400 31 0.1 31 31

While trying to plot a decode output (softmax vs. number of frames) using

net-output-extract --class-frame-counts=$srcdir/label.counts --apply-log=true $srcdir/final.nnet "$feats" ark,t:my1.ark

my input file has 62 frames.
While plotting the values in my1.ark:

  1. I get a 72 x 62 matrix instead of 31 x 62. Why? (see the check sketched below)
  2. my1.ark contains softmax values per frame, does it not?
  3. If not, how do I get the softmax probabilities?
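
On question 1, the dimensions of what net-output-extract wrote can be read straight off the text archive: rows are frames and columns are network output units, so a column count other than the expected number of labels means either the plot is reading the matrix transposed or final.nnet has more outputs than assumed. A sketch, assuming my1.ark holds a single utterance:

awk 'NR==2 {print NF}' my1.ark      # columns in the first data row = number of output units
grep -c '^' my1.ark                 # total lines; roughly frames + 2 for a single utterance

On questions 2 and 3: with --class-frame-counts and --apply-log=true the values are prior-normalised log scores rather than raw softmax outputs; the usual route to plain posteriors in Kaldi-style toolkits is to drop the prior division and the log, but check net-output-extract's own option list for the exact flags it supports.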

the installation of eesen

Hello,
May I ask for some help? While installing EESEN with "./configure --use-cuda=yes --cudatk-dir=/path/to/cuda_library", I have no idea what the path of the CUDA library is. Can anyone help me? Any help is appreciated. @yajiemiao
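
--cudatk-dir expects the root of the CUDA toolkit installation, i.e. the directory that contains bin/nvcc, include/ and lib64/. A sketch for a typical Linux install (adjust the path to wherever nvcc actually lives on your machine):

which nvcc                          # e.g. /usr/local/cuda/bin/nvcc
ls /usr/local/cuda                  # should list bin, include, lib64, ...
./configure --use-cuda=yes --cudatk-dir=/usr/local/cuda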

few questions

Hi there,
I am willing to test, but before that may I ask a few questions:

  • Is a GPU just recommended or actually necessary? I intend to run on multicore machines.
  • Can I use a grid engine, as with Kaldi's queue.pl? (a sketch follows below)
  • How long does it take to train on the 108-hour TEDLIUM 1 set, in your experience?
  • How long would that take for 30 hours of audio, and do you think that is enough to get decent results?
    Cheers,
    Vinny.
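
On the grid-engine question: the EESEN recipes follow the Kaldi convention of taking their parallelisation command from shell variables such as $decode_cmd (visible in the decoding issue above), so pointing those variables at queue.pl is the usual route. A sketch of a cmd.sh-style setup; queue options are site-specific and omitted here:

export train_cmd="queue.pl"
export decode_cmd="queue.pl"
# or, to run everything on a local multicore machine:
export train_cmd="run.pl"
export decode_cmd="run.pl"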
