Hi, I tested Eesen on the TIMIT data, but the results are poor, as shown below.
Training process:
EPOCH 11 RUNNING ... ENDS [2016-Jun-6 17:02:47]: lrate 4e-05, TRAIN ACCURACY 23.4300%, VALID ACCURACY 17.3147%
EPOCH 12 RUNNING ... ENDS [2016-Jun-6 17:07:02]: lrate 4e-05, TRAIN ACCURACY 25.2924%, VALID ACCURACY 16.1223%
EPOCH 13 RUNNING ... ENDS [2016-Jun-6 17:11:18]: lrate 4e-05, TRAIN ACCURACY 26.1150%, VALID ACCURACY 18.4033%
EPOCH 14 RUNNING ... ENDS [2016-Jun-6 17:15:33]: lrate 4e-05, TRAIN ACCURACY 26.6806%, VALID ACCURACY 19.5179%
EPOCH 15 RUNNING ... ENDS [2016-Jun-6 17:19:51]: lrate 4e-05, TRAIN ACCURACY 27.1350%, VALID ACCURACY 18.6625%
EPOCH 16 RUNNING ... ENDS [2016-Jun-6 17:24:07]: lrate 2e-05, TRAIN ACCURACY 27.4092%, VALID ACCURACY 20.1400%
EPOCH 17 RUNNING ... ENDS [2016-Jun-6 17:28:23]: lrate 1e-05, TRAIN ACCURACY 27.5363%, VALID ACCURACY 20.2177%
finished, too small rel. improvement .0777
Training succeeded. The final model exp/train_phn_l5_c320/final.nnet
Removing features tmpdir exp/train_phn_l5_c320/ptrXL @ pingan-nlp-001
cv.ark train.ark
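The value in the "too small rel. improvement" line appears to match the change in validation accuracy between the last two epochs (20.2177 - 20.1400 = 0.0777). A minimal sketch for listing that change per epoch is given here; it assumes log lines in exactly the format pasted above, and `valid_acc_deltas` is a hypothetical helper name, not part of Eesen.

```shell
#!/bin/bash
# Hypothetical helper (not part of Eesen): print the validation accuracy of
# each epoch and its change from the previous epoch.
# Assumes log lines of the form:
#   EPOCH <n> RUNNING ... ENDS [...]: lrate <x>, TRAIN ACCURACY <a>%, VALID ACCURACY <b>%
valid_acc_deltas() {
  awk '
    /VALID ACCURACY/ {
      epoch = $2
      acc = $NF                 # last field, e.g. "20.2177%"
      sub(/%$/, "", acc)        # strip the trailing percent sign
      if (have_prev)
        printf "epoch %s valid %.4f delta %.4f\n", epoch, acc, acc - prev
      else
        printf "epoch %s valid %.4f\n", epoch, acc
      prev = acc
      have_prev = 1
    }' "$@"
}
```

The function reads the files given as arguments, or standard input if none are given, so the training output above can be piped straight into it.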
Testing process:
mrjb1_sx64-0000000-0000248 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjb1_sx64-0000000-0000248 is 0.454562 over 246 frames.
mrjh0_sa1-0000000-0000385 she had your dark suit in greasy wash water all
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa1-0000000-0000385 is 0.577131 over 383 frames.
mrjh0_sa2-0000000-0000317 how ask me to carry an oily rag like
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa2-0000000-0000317 is 0.483511 over 315 frames.
mrjh0_si1145-0000000-0000487 how unauthentic
LOG (latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:294) Rebuilding repository.
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1145-0000000-0000487 is 0.258022 over 485 frames.
mrjh0_si1775-0000000-0000306 how unauthentic
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1775-0000000-0000306 is 0.384129 over 304 frames.
mrjh0_si515-0000000-0000296 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si515-0000000-0000296 is 0.429838 over 294 frames.
mrjh0_sx155-0000000-0000394 how unauthentic
I compared the ark files for the TIMIT and TEDLIUM data and found some differences, but I do not know where they come from.
The TEDLIUM ark file looks like this:
AlGore_2009 [
510340.6 586395.1 608272.1 621239.9 642546.4 653072.2 651401.9 651305.8 653922.6 659371.4 654681.1 652654.5 646230.6 645681.9 650887.6 655483.5 666377.6 671666.1 672115.6 669366.7 669373.2 681050.7 703447.4 715073.2 709013.8 702928.3 713154.4 718430.6 711170 688705.3 658752.9 641324.2 630078.5 628411.7 623944.6 627934.9 639849.6 641777.4 643522.4 627100.5 39020
6946354 9087419 9763794 1.018412e+07 1.091917e+07 1.127568e+07 1.123372e+07 1.124698e+07 1.134869e+07 1.154412e+07 1.137156e+07 1.1279e+07 1.104712e+07 1.103819e+07 1.121266e+07 1.137482e+07 1.174376e+07 1.193318e+07 1.195034e+07 1.184173e+07 1.18378e+07 1.224044e+07 1.303755e+07 1.345864e+07 1.321482e+07 1.298168e+07 1.337751e+07 1.358918e+07 1.332344e+07 1.25133e+07 1.146994e+07 1.091216e+07 1.056395e+07 1.05361e+07 1.041939e+07 1.053006e+07 1.088328e+07 1.093435e+07 1.097803e+07 1.044419e+07 0 ]
And the TIMIT one looks like this:
fadg0_sa1 [
3077.437 3576.837 3893.808 4497.17 4646.433 4888.595 5084.933 5245.375 5266.312 5316.513 5304.906 5279.905 5159.947 5092.513 5093.656 5096.891 5198.106 5342.096 5525.816 5622.102 5590.077 5587.714 5621.955 5658.111 5640.733 5684.978 5922.412 6028.531 5843.909 5494.285 5123.665 4873.254 4768.456 4619.075 4454.212 4446.68 4533.783 4809.863 5073.438 5097.519 372
28369.65 38061.98 44509.9 59787.96 63547.87 70383.9 75846.95 80695.33 82071.57 83632.43 82730.72 81498.48 78174.86 76341.12 75977.55 75682.39 78059.57 82118.61 87383.28 90191.3 89340.34 89230.34 90614.35 91722.68 90768.06 91814.1 99787.2 103876.6 97762.07 85880.71 74550.43 67565.18 64682.43 60528.35 56254.4 56227.67 58352.52 65413.38 72421.16 72856.04 0 ]
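The raw numbers are hard to compare directly because the two entries cover very different amounts of audio. The two-row layout, with a frame-like count at the end of the first row and a 0 at the end of the second, looks like Kaldi-style CMVN statistics (per-dimension sums and sums of squares); that is an assumption based on the layout, not something I have confirmed. Under that assumption, the first dimension has a mean of about 8.27 for fadg0_sa1 (3077.437 / 372) and about 13.08 for AlGore_2009 (510340.6 / 39020). A rough sketch that converts such a text dump into per-dimension mean and variance follows; `cmvn_mean_var` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: turn text-format statistics of the form
#   <key> [
#     sum_1 ... sum_D count
#     sumsq_1 ... sumsq_D 0 ]
# into per-dimension mean and variance.
# Assumes the three-line layout shown above (header, sums, sums of squares).
cmvn_mean_var() {
  awk '
    /\[/ { key = $1; row = 0; next }     # header line: "<key> ["
    {
      sub(/\]/, "")                       # drop the closing bracket; fields are re-split
      row++
      if (row == 1) {                     # per-dimension sums, then the frame count
        dim = NF - 1
        frames = $NF
        for (i = 1; i <= dim; i++) sum[i] = $i
      } else if (row == 2) {              # per-dimension sums of squares, then 0
        for (i = 1; i <= dim; i++) {
          mean = sum[i] / frames
          var = $i / frames - mean * mean
          printf "%s frames=%d dim=%d mean=%.3f var=%.3f\n", key, frames, i, mean, var
        }
      }
    }' "$@"
}
```

If the archive is binary, Kaldi's `copy-matrix` can dump it as text first, for example `copy-matrix ark:<your-stats.ark> ark,t:- | cmvn_mean_var`, where `<your-stats.ark>` is a placeholder for the actual file.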
Yet the feature extraction script is essentially the same as the TEDLIUM one (I only modified the existing code); the diff between the two is:
91c91
< || exit 208;
---
> || exit 1;
106c106
< || exit 209;
---
> || exit 1;
And the script looks like this:
#!/bin/bash
# Copyright 2012 Karel Vesely Johns Hopkins University (Author: Daniel Povey)
# Apache 2.0
# To be run from .. (one directory up from here)
# see ../run.sh for example

# Begin configuration section.
nj=4
cmd=run.pl
fbank_config=conf/fbank.conf
compress=true
# End configuration section.

echo "$0 $@" # Print the command line for logging

if [ -f path.sh ]; then . ./path.sh; fi
. parse_options.sh || exit 1;

if [ $# != 3 ]; then
  echo "usage: make_fbank.sh [options] <data-dir> <log-dir> <path-to-fbankdir>";
  echo "options: "
  echo " --fbank-config <config-file> # config passed to compute-fbank-feats "
  echo " --nj <nj> # number of parallel jobs"
  echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  exit 1;
fi

data=$1
logdir=$2
fbankdir=$3

# make $fbankdir an absolute pathname.
fbankdir=`perl -e '($dir,$pwd)= @ARGV; if($dir!~m:^/:) { $dir = "$pwd/$dir"; } print $dir; ' $fbankdir ${PWD}`

# use "name" as part of name of the archive.
name=`basename $data`

mkdir -p $fbankdir || exit 1;
mkdir -p $logdir || exit 1;

if [ -f $data/feats.scp ]; then
  mkdir -p $data/.backup
  echo "$0: moving $data/feats.scp to $data/.backup"
  mv $data/feats.scp $data/.backup
fi

scp=$data/wav.scp

required="$scp $fbank_config"

for f in $required; do
  if [ ! -f $f ]; then
    echo "make_fbank.sh: no such file $f"
    exit 1;
  fi
done

utils/validate_data_dir.sh --no-text --no-feats $data || exit 1;

if [ -f $data/spk2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/spk2warp"
  vtln_opts="--vtln-map=ark:$data/spk2warp --utt2spk=ark:$data/utt2spk"
elif [ -f $data/utt2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/utt2warp"
  vtln_opts="--vtln-map=ark:$data/utt2warp"
fi

for n in $(seq $nj); do
  # the next command does nothing unless $fbankdir/storage/ exists, see
  # utils/create_data_link.pl for more info.
  utils/create_data_link.pl $fbankdir/raw_fbank_$name.$n.ark
done

if [ -f $data/segments ]; then
  echo "$0 [info]: segments file exists: using that."
  split_segments=""
  for n in $(seq $nj); do
    split_segments="$split_segments $logdir/segments.$n"
  done

  utils/split_scp.pl $data/segments $split_segments || exit 1;
  rm $logdir/.error 2>/dev/null

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    extract-segments scp,p:$scp $logdir/segments.JOB ark:- \| \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config ark:- ark:- \| \
    copy-feats --compress=$compress ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
    || exit 208;
else
  echo "$0: [info]: no segments file exists: assuming wav.scp indexed by utterance."
  split_scps=""
  for n in $(seq $nj); do
    split_scps="$split_scps $logdir/wav.$n.scp"
  done

  utils/split_scp.pl $scp $split_scps || exit 1;

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config scp,p:$logdir/wav.JOB.scp ark:- \| \
    copy-feats --compress=$compress ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
    || exit 209;
fi

if [ -f $logdir/.error.$name ]; then
  echo "Error producing fbank features for $name:"
  tail $logdir/make_fbank_${name}.1.log
  exit 1;
fi

# concatenate the .scp files together.
for n in $(seq $nj); do
  cat $fbankdir/raw_fbank_$name.$n.scp || exit 1;
done > $data/feats.scp

rm $logdir/wav.*.scp $logdir/segments.* 2>/dev/null

nf=`cat $data/feats.scp | wc -l`
nu=`cat $data/utt2spk | wc -l`
if [ $nf -ne $nu ]; then
  echo "It seems not all of the feature files were successfully ($nf != $nu);"
  echo "consider using utils/fix_data_dir.sh $data"
fi

echo "Succeeded creating filterbank features for $name"
Is there something wrong?
Also, what do "out-moded" and "journalese" mean in decoding output like the following?
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx340-0000000-0000242 is 0.466467 over 240 frames.
mbns0_sx430-0000000-0000343 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx430-0000000-0000343 is 0.430763 over 341 frames.
mbns0_sx70-0000000-0000119 journalese