Hi,
Great code and model!
I just ran LSTM-Word-Small on my Mac, but got an unsatisfying result and I can't figure out why: the test perplexity comes out at 115.87, noticeably higher than the ~97.6 the paper reports for the LSTM-Word-Small configuration.
Here is the log:
yangyifans-MacBook-Pro:lstm-char-cnn yang1fan2$ th main.lua -savefile word-small -word_vec_size 200 -highway_layers 0 -use_chars 0 -use_words 1 -rnn_size 200 -EOS '+'
loading data files...
Word vocab size: 9999, Char vocab size: 50
reshaping tensors...
data load done. Number of batches in train: 1267, val: 100, test: 1
Word vocab size: 9999, Char vocab size: 50, Max word length (incl. padding): 19
creating an LSTM-CNN with 2 layers
number of parameters in the model: 4652799
cloning rnn
cloning criterion
100/31675 (epoch 0.08), train_loss = 1092.5376
200/31675 (epoch 0.16), train_loss = 1062.9700
300/31675 (epoch 0.24), train_loss = 707.9908
400/31675 (epoch 0.32), train_loss = 538.6978
500/31675 (epoch 0.39), train_loss = 508.0643
600/31675 (epoch 0.47), train_loss = 562.3513
700/31675 (epoch 0.55), train_loss = 447.6828
800/31675 (epoch 0.63), train_loss = 361.3279
900/31675 (epoch 0.71), train_loss = 341.5817
1000/31675 (epoch 0.79), train_loss = 384.3430
1100/31675 (epoch 0.87), train_loss = 322.6886
1200/31675 (epoch 0.95), train_loss = 282.5245
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch1.00_309.59.t7
1300/31675 (epoch 1.03), train_loss = 336.0408
1400/31675 (epoch 1.10), train_loss = 272.2516
1500/31675 (epoch 1.18), train_loss = 328.0399
1600/31675 (epoch 1.26), train_loss = 413.7821
1700/31675 (epoch 1.34), train_loss = 250.8095
1800/31675 (epoch 1.42), train_loss = 245.7039
1900/31675 (epoch 1.50), train_loss = 335.7718
2000/31675 (epoch 1.58), train_loss = 252.8674
2100/31675 (epoch 1.66), train_loss = 211.3629
2200/31675 (epoch 1.74), train_loss = 281.4043
2300/31675 (epoch 1.82), train_loss = 201.7554
2400/31675 (epoch 1.89), train_loss = 297.1916
2500/31675 (epoch 1.97), train_loss = 308.5774
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch2.00_229.58.t7
2600/31675 (epoch 2.05), train_loss = 204.1578
2700/31675 (epoch 2.13), train_loss = 229.0827
2800/31675 (epoch 2.21), train_loss = 258.6883
2900/31675 (epoch 2.29), train_loss = 215.6444
3000/31675 (epoch 2.37), train_loss = 201.1391
3100/31675 (epoch 2.45), train_loss = 245.8782
3200/31675 (epoch 2.53), train_loss = 285.2821
3300/31675 (epoch 2.60), train_loss = 210.9398
3400/31675 (epoch 2.68), train_loss = 255.7903
3500/31675 (epoch 2.76), train_loss = 138.6418
3600/31675 (epoch 2.84), train_loss = 167.4747
3700/31675 (epoch 2.92), train_loss = 196.0062
3800/31675 (epoch 3.00), train_loss = 272.9710
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch3.00_199.35.t7
3900/31675 (epoch 3.08), train_loss = 189.8568
4000/31675 (epoch 3.16), train_loss = 228.6565
4100/31675 (epoch 3.24), train_loss = 224.3237
4200/31675 (epoch 3.31), train_loss = 182.4509
4300/31675 (epoch 3.39), train_loss = 231.6450
4400/31675 (epoch 3.47), train_loss = 198.1385
4500/31675 (epoch 3.55), train_loss = 213.2757
4600/31675 (epoch 3.63), train_loss = 194.8259
4700/31675 (epoch 3.71), train_loss = 261.0416
4800/31675 (epoch 3.79), train_loss = 175.5076
4900/31675 (epoch 3.87), train_loss = 246.1651
5000/31675 (epoch 3.95), train_loss = 199.7342
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch4.00_181.66.t7
5100/31675 (epoch 4.03), train_loss = 239.4468
5200/31675 (epoch 4.10), train_loss = 199.5525
5300/31675 (epoch 4.18), train_loss = 224.9765
5400/31675 (epoch 4.26), train_loss = 180.9969
5500/31675 (epoch 4.34), train_loss = 193.3227
5600/31675 (epoch 4.42), train_loss = 167.5974
5700/31675 (epoch 4.50), train_loss = 230.5838
5800/31675 (epoch 4.58), train_loss = 141.6197
5900/31675 (epoch 4.66), train_loss = 166.2485
6000/31675 (epoch 4.74), train_loss = 204.4503
6100/31675 (epoch 4.81), train_loss = 155.5831
6200/31675 (epoch 4.89), train_loss = 192.1082
6300/31675 (epoch 4.97), train_loss = 189.0958
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch5.00_172.21.t7
6400/31675 (epoch 5.05), train_loss = 146.9685
6500/31675 (epoch 5.13), train_loss = 177.8722
6600/31675 (epoch 5.21), train_loss = 196.9578
6700/31675 (epoch 5.29), train_loss = 150.1310
6800/31675 (epoch 5.37), train_loss = 127.3223
6900/31675 (epoch 5.45), train_loss = 255.7305
7000/31675 (epoch 5.52), train_loss = 221.3599
7100/31675 (epoch 5.60), train_loss = 200.8017
7200/31675 (epoch 5.68), train_loss = 184.7957
7300/31675 (epoch 5.76), train_loss = 140.1135
7400/31675 (epoch 5.84), train_loss = 177.0135
7500/31675 (epoch 5.92), train_loss = 147.8841
7600/31675 (epoch 6.00), train_loss = 153.9457
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch6.00_162.51.t7
7700/31675 (epoch 6.08), train_loss = 141.3737
7800/31675 (epoch 6.16), train_loss = 193.2330
7900/31675 (epoch 6.24), train_loss = 134.6816
8000/31675 (epoch 6.31), train_loss = 111.2546
8100/31675 (epoch 6.39), train_loss = 168.0664
8200/31675 (epoch 6.47), train_loss = 184.5089
8300/31675 (epoch 6.55), train_loss = 168.0994
8400/31675 (epoch 6.63), train_loss = 145.1965
8500/31675 (epoch 6.71), train_loss = 174.8552
8600/31675 (epoch 6.79), train_loss = 173.7721
8700/31675 (epoch 6.87), train_loss = 191.3827
8800/31675 (epoch 6.95), train_loss = 161.0672
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch7.00_158.39.t7
8900/31675 (epoch 7.02), train_loss = 218.3148
9000/31675 (epoch 7.10), train_loss = 157.8990
9100/31675 (epoch 7.18), train_loss = 183.1041
9200/31675 (epoch 7.26), train_loss = 176.4712
9300/31675 (epoch 7.34), train_loss = 157.2909
9400/31675 (epoch 7.42), train_loss = 172.3378
9500/31675 (epoch 7.50), train_loss = 170.8574
9600/31675 (epoch 7.58), train_loss = 143.9417
9700/31675 (epoch 7.66), train_loss = 186.8887
9800/31675 (epoch 7.73), train_loss = 162.1487
9900/31675 (epoch 7.81), train_loss = 157.1883
10000/31675 (epoch 7.89), train_loss = 156.6241
10100/31675 (epoch 7.97), train_loss = 180.1722
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch8.00_151.39.t7
10200/31675 (epoch 8.05), train_loss = 116.0106
10300/31675 (epoch 8.13), train_loss = 129.2786
10400/31675 (epoch 8.21), train_loss = 207.6249
10500/31675 (epoch 8.29), train_loss = 104.5945
10600/31675 (epoch 8.37), train_loss = 132.6894
10700/31675 (epoch 8.45), train_loss = 176.2369
10800/31675 (epoch 8.52), train_loss = 134.9308
10900/31675 (epoch 8.60), train_loss = 142.0736
11000/31675 (epoch 8.68), train_loss = 133.2670
11100/31675 (epoch 8.76), train_loss = 172.5976
11200/31675 (epoch 8.84), train_loss = 121.0163
11300/31675 (epoch 8.92), train_loss = 110.9682
11400/31675 (epoch 9.00), train_loss = 158.1777
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch9.00_149.49.t7
11500/31675 (epoch 9.08), train_loss = 148.1391
11600/31675 (epoch 9.16), train_loss = 178.7232
11700/31675 (epoch 9.23), train_loss = 123.9441
11800/31675 (epoch 9.31), train_loss = 123.8790
11900/31675 (epoch 9.39), train_loss = 190.1114
12000/31675 (epoch 9.47), train_loss = 203.7419
12100/31675 (epoch 9.55), train_loss = 159.9928
12200/31675 (epoch 9.63), train_loss = 158.1153
12300/31675 (epoch 9.71), train_loss = 131.7295
12400/31675 (epoch 9.79), train_loss = 188.1800
12500/31675 (epoch 9.87), train_loss = 142.4499
12600/31675 (epoch 9.94), train_loss = 230.6982
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch10.00_146.57.t7
12700/31675 (epoch 10.02), train_loss = 156.9309
12800/31675 (epoch 10.10), train_loss = 137.6987
12900/31675 (epoch 10.18), train_loss = 129.4219
13000/31675 (epoch 10.26), train_loss = 158.5684
13100/31675 (epoch 10.34), train_loss = 161.0942
13200/31675 (epoch 10.42), train_loss = 180.7851
13300/31675 (epoch 10.50), train_loss = 116.3297
13400/31675 (epoch 10.58), train_loss = 103.2180
13500/31675 (epoch 10.66), train_loss = 228.6890
13600/31675 (epoch 10.73), train_loss = 152.2666
13700/31675 (epoch 10.81), train_loss = 126.1322
13800/31675 (epoch 10.89), train_loss = 112.6598
13900/31675 (epoch 10.97), train_loss = 135.5179
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch11.00_145.04.t7
14000/31675 (epoch 11.05), train_loss = 100.1897
14100/31675 (epoch 11.13), train_loss = 141.0636
14200/31675 (epoch 11.21), train_loss = 149.9115
14300/31675 (epoch 11.29), train_loss = 112.7567
14400/31675 (epoch 11.37), train_loss = 147.1632
14500/31675 (epoch 11.44), train_loss = 137.0094
14600/31675 (epoch 11.52), train_loss = 129.4210
14700/31675 (epoch 11.60), train_loss = 136.0187
14800/31675 (epoch 11.68), train_loss = 123.0264
14900/31675 (epoch 11.76), train_loss = 137.9644
15000/31675 (epoch 11.84), train_loss = 130.8094
15100/31675 (epoch 11.92), train_loss = 87.5872
15200/31675 (epoch 12.00), train_loss = 128.7816
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch12.00_141.58.t7
15300/31675 (epoch 12.08), train_loss = 146.3830
15400/31675 (epoch 12.15), train_loss = 161.2118
15500/31675 (epoch 12.23), train_loss = 127.5935
15600/31675 (epoch 12.31), train_loss = 133.2026
15700/31675 (epoch 12.39), train_loss = 217.7041
15800/31675 (epoch 12.47), train_loss = 145.0895
15900/31675 (epoch 12.55), train_loss = 107.9422
16000/31675 (epoch 12.63), train_loss = 143.7288
16100/31675 (epoch 12.71), train_loss = 120.0762
16200/31675 (epoch 12.79), train_loss = 143.4678
16300/31675 (epoch 12.87), train_loss = 134.0410
16400/31675 (epoch 12.94), train_loss = 185.3824
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch13.00_140.84.t7
16500/31675 (epoch 13.02), train_loss = 161.8008
16600/31675 (epoch 13.10), train_loss = 142.4872
16700/31675 (epoch 13.18), train_loss = 151.0291
16800/31675 (epoch 13.26), train_loss = 138.8018
16900/31675 (epoch 13.34), train_loss = 114.5137
17000/31675 (epoch 13.42), train_loss = 140.7112
17100/31675 (epoch 13.50), train_loss = 105.9626
17200/31675 (epoch 13.58), train_loss = 83.9275
17300/31675 (epoch 13.65), train_loss = 163.0975
17400/31675 (epoch 13.73), train_loss = 130.6434
17500/31675 (epoch 13.81), train_loss = 119.0841
17600/31675 (epoch 13.89), train_loss = 107.8958
17700/31675 (epoch 13.97), train_loss = 137.8417
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch14.00_132.52.t7
17800/31675 (epoch 14.05), train_loss = 124.0942
17900/31675 (epoch 14.13), train_loss = 117.4392
18000/31675 (epoch 14.21), train_loss = 130.3233
18100/31675 (epoch 14.29), train_loss = 112.2990
18200/31675 (epoch 14.36), train_loss = 105.0138
18300/31675 (epoch 14.44), train_loss = 107.7117
18400/31675 (epoch 14.52), train_loss = 112.1500
18500/31675 (epoch 14.60), train_loss = 117.9624
18600/31675 (epoch 14.68), train_loss = 142.7740
18700/31675 (epoch 14.76), train_loss = 134.4659
18800/31675 (epoch 14.84), train_loss = 91.5064
18900/31675 (epoch 14.92), train_loss = 100.8196
19000/31675 (epoch 15.00), train_loss = 103.1925
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch15.00_130.91.t7
19100/31675 (epoch 15.07), train_loss = 117.1723
19200/31675 (epoch 15.15), train_loss = 116.2074
19300/31675 (epoch 15.23), train_loss = 80.1053
19400/31675 (epoch 15.31), train_loss = 135.2300
19500/31675 (epoch 15.39), train_loss = 185.7589
19600/31675 (epoch 15.47), train_loss = 136.6290
19700/31675 (epoch 15.55), train_loss = 111.4722
19800/31675 (epoch 15.63), train_loss = 113.1709
19900/31675 (epoch 15.71), train_loss = 94.4868
20000/31675 (epoch 15.79), train_loss = 111.0743
20100/31675 (epoch 15.86), train_loss = 119.4882
20200/31675 (epoch 15.94), train_loss = 120.4031
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch16.00_130.00.t7
20300/31675 (epoch 16.02), train_loss = 136.0015
20400/31675 (epoch 16.10), train_loss = 98.3182
20500/31675 (epoch 16.18), train_loss = 141.7701
20600/31675 (epoch 16.26), train_loss = 171.3912
20700/31675 (epoch 16.34), train_loss = 99.4955
20800/31675 (epoch 16.42), train_loss = 126.5100
20900/31675 (epoch 16.50), train_loss = 135.4863
21000/31675 (epoch 16.57), train_loss = 91.0479
21100/31675 (epoch 16.65), train_loss = 126.2115
21200/31675 (epoch 16.73), train_loss = 149.4726
21300/31675 (epoch 16.81), train_loss = 87.0476
21400/31675 (epoch 16.89), train_loss = 78.0156
21500/31675 (epoch 16.97), train_loss = 70.4944
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch17.00_126.35.t7
21600/31675 (epoch 17.05), train_loss = 102.3634
21700/31675 (epoch 17.13), train_loss = 109.8009
21800/31675 (epoch 17.21), train_loss = 129.0442
21900/31675 (epoch 17.28), train_loss = 89.7495
22000/31675 (epoch 17.36), train_loss = 108.9761
22100/31675 (epoch 17.44), train_loss = 106.9783
22200/31675 (epoch 17.52), train_loss = 85.5451
22300/31675 (epoch 17.60), train_loss = 126.5788
22400/31675 (epoch 17.68), train_loss = 132.2608
22500/31675 (epoch 17.76), train_loss = 74.0349
22600/31675 (epoch 17.84), train_loss = 75.8679
22700/31675 (epoch 17.92), train_loss = 97.7860
22800/31675 (epoch 18.00), train_loss = 110.0467
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch18.00_125.53.t7
22900/31675 (epoch 18.07), train_loss = 89.6176
23000/31675 (epoch 18.15), train_loss = 138.7959
23100/31675 (epoch 18.23), train_loss = 90.8744
23200/31675 (epoch 18.31), train_loss = 140.9495
23300/31675 (epoch 18.39), train_loss = 149.4366
23400/31675 (epoch 18.47), train_loss = 127.3338
23500/31675 (epoch 18.55), train_loss = 90.9294
23600/31675 (epoch 18.63), train_loss = 97.4022
23700/31675 (epoch 18.71), train_loss = 103.0955
23800/31675 (epoch 18.78), train_loss = 102.0323
23900/31675 (epoch 18.86), train_loss = 104.4937
24000/31675 (epoch 18.94), train_loss = 92.4890
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch19.00_124.30.t7
24100/31675 (epoch 19.02), train_loss = 102.4333
24200/31675 (epoch 19.10), train_loss = 99.1028
24300/31675 (epoch 19.18), train_loss = 109.3732
24400/31675 (epoch 19.26), train_loss = 109.8171
24500/31675 (epoch 19.34), train_loss = 97.5112
24600/31675 (epoch 19.42), train_loss = 145.1198
24700/31675 (epoch 19.49), train_loss = 96.1052
24800/31675 (epoch 19.57), train_loss = 81.6132
24900/31675 (epoch 19.65), train_loss = 100.9439
25000/31675 (epoch 19.73), train_loss = 129.0468
25100/31675 (epoch 19.81), train_loss = 87.8252
25200/31675 (epoch 19.89), train_loss = 89.5284
25300/31675 (epoch 19.97), train_loss = 52.1641
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch20.00_123.69.t7
25400/31675 (epoch 20.05), train_loss = 99.9568
25500/31675 (epoch 20.13), train_loss = 118.0871
25600/31675 (epoch 20.21), train_loss = 118.6653
25700/31675 (epoch 20.28), train_loss = 90.6946
25800/31675 (epoch 20.36), train_loss = 114.4039
25900/31675 (epoch 20.44), train_loss = 78.5488
26000/31675 (epoch 20.52), train_loss = 112.3676
26100/31675 (epoch 20.60), train_loss = 92.4415
26200/31675 (epoch 20.68), train_loss = 130.9558
26300/31675 (epoch 20.76), train_loss = 108.5386
26400/31675 (epoch 20.84), train_loss = 88.6149
26500/31675 (epoch 20.92), train_loss = 71.9182
26600/31675 (epoch 20.99), train_loss = 152.6365
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch21.00_123.18.t7
26700/31675 (epoch 21.07), train_loss = 105.6602
26800/31675 (epoch 21.15), train_loss = 126.5473
26900/31675 (epoch 21.23), train_loss = 106.3288
27000/31675 (epoch 21.31), train_loss = 114.4642
27100/31675 (epoch 21.39), train_loss = 104.3161
27200/31675 (epoch 21.47), train_loss = 106.3294
27300/31675 (epoch 21.55), train_loss = 91.8286
27400/31675 (epoch 21.63), train_loss = 85.4033
27500/31675 (epoch 21.70), train_loss = 121.0194
27600/31675 (epoch 21.78), train_loss = 92.0562
27700/31675 (epoch 21.86), train_loss = 101.6783
27800/31675 (epoch 21.94), train_loss = 84.2354
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch22.00_122.85.t7
27900/31675 (epoch 22.02), train_loss = 106.6517
28000/31675 (epoch 22.10), train_loss = 84.0312
28100/31675 (epoch 22.18), train_loss = 107.4262
28200/31675 (epoch 22.26), train_loss = 113.2599
28300/31675 (epoch 22.34), train_loss = 94.5707
28400/31675 (epoch 22.42), train_loss = 151.1607
28500/31675 (epoch 22.49), train_loss = 105.3479
28600/31675 (epoch 22.57), train_loss = 111.4545
28700/31675 (epoch 22.65), train_loss = 99.9958
28800/31675 (epoch 22.73), train_loss = 139.2409
28900/31675 (epoch 22.81), train_loss = 91.4084
29000/31675 (epoch 22.89), train_loss = 79.4813
29100/31675 (epoch 22.97), train_loss = 97.5256
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch23.00_122.58.t7
29200/31675 (epoch 23.05), train_loss = 117.4427
29300/31675 (epoch 23.13), train_loss = 104.0569
29400/31675 (epoch 23.20), train_loss = 137.5399
29500/31675 (epoch 23.28), train_loss = 91.9614
29600/31675 (epoch 23.36), train_loss = 87.3350
29700/31675 (epoch 23.44), train_loss = 67.8878
29800/31675 (epoch 23.52), train_loss = 103.1114
29900/31675 (epoch 23.60), train_loss = 100.8149
30000/31675 (epoch 23.68), train_loss = 118.3131
30100/31675 (epoch 23.76), train_loss = 123.7189
30200/31675 (epoch 23.84), train_loss = 103.1361
30300/31675 (epoch 23.91), train_loss = 75.9410
30400/31675 (epoch 23.99), train_loss = 122.3899
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch24.00_122.53.t7
30500/31675 (epoch 24.07), train_loss = 83.0460
30600/31675 (epoch 24.15), train_loss = 101.1161
30700/31675 (epoch 24.23), train_loss = 68.5993
30800/31675 (epoch 24.31), train_loss = 115.3679
30900/31675 (epoch 24.39), train_loss = 120.2563
31000/31675 (epoch 24.47), train_loss = 127.7466
31100/31675 (epoch 24.55), train_loss = 78.7842
31200/31675 (epoch 24.63), train_loss = 98.9353
31300/31675 (epoch 24.70), train_loss = 124.4050
31400/31675 (epoch 24.78), train_loss = 115.8360
31500/31675 (epoch 24.86), train_loss = 112.5002
31600/31675 (epoch 24.94), train_loss = 81.2895
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch25.00_122.52.t7
evaluating loss over split index 3
Perplexity on test set: 115.86590686001
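
For anyone comparing numbers: if I read the code right, the values appended to the checkpoint filenames (309.59 down to 122.52) are validation perplexities, i.e. exponentiated average per-word cross-entropies. A minimal sketch of the conversion, with hypothetical helper names that are not part of main.lua:

-- Hypothetical helpers, not from the repo: convert between average
-- per-word negative log-likelihood (in nats) and perplexity.
local function to_perplexity(nll) return math.exp(nll) end
local function to_nll(ppl)        return math.log(ppl) end

-- The test perplexity above corresponds to roughly 4.75 nats per word:
print(to_nll(115.86590686001))         --> ~4.7524
print(to_perplexity(4.7524))           --> ~115.87

So the gap to the paper's number is about 0.17 nats per word, if my reading of the logged quantities is correct.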
Thanks!