Hi,
Great code and model!
I just ran LSTM-Word-Small on my Mac, but got an unsatisfying result and I can't figure out why: the test perplexity comes out at 115.87, noticeably higher than the ~97.6 the paper reports for the LSTM-Word-Small configuration.
Here is the log:
yangyifans-MacBook-Pro:lstm-char-cnn yang1fan2$ th main.lua -savefile word-small -word_vec_size 200 -highway_layers 0 -use_chars 0 -use_words 1 -rnn_size 200 -EOS '+'
loading data files...
Word vocab size: 9999, Char vocab size: 50
reshaping tensors...
data load done. Number of batches in train: 1267, val: 100, test: 1
Word vocab size: 9999, Char vocab size: 50, Max word length (incl. padding): 19
creating an LSTM-CNN with 2 layers
number of parameters in the model: 4652799
cloning rnn
cloning criterion
100/31675 (epoch 0.08), train_loss = 1092.5376
200/31675 (epoch 0.16), train_loss = 1062.9700
300/31675 (epoch 0.24), train_loss = 707.9908
400/31675 (epoch 0.32), train_loss = 538.6978
500/31675 (epoch 0.39), train_loss = 508.0643
600/31675 (epoch 0.47), train_loss = 562.3513
700/31675 (epoch 0.55), train_loss = 447.6828
800/31675 (epoch 0.63), train_loss = 361.3279
900/31675 (epoch 0.71), train_loss = 341.5817
1000/31675 (epoch 0.79), train_loss = 384.3430
1100/31675 (epoch 0.87), train_loss = 322.6886
1200/31675 (epoch 0.95), train_loss = 282.5245
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch1.00_309.59.t7
1300/31675 (epoch 1.03), train_loss = 336.0408
1400/31675 (epoch 1.10), train_loss = 272.2516
1500/31675 (epoch 1.18), train_loss = 328.0399
1600/31675 (epoch 1.26), train_loss = 413.7821
1700/31675 (epoch 1.34), train_loss = 250.8095
1800/31675 (epoch 1.42), train_loss = 245.7039
1900/31675 (epoch 1.50), train_loss = 335.7718
2000/31675 (epoch 1.58), train_loss = 252.8674
2100/31675 (epoch 1.66), train_loss = 211.3629
2200/31675 (epoch 1.74), train_loss = 281.4043
2300/31675 (epoch 1.82), train_loss = 201.7554
2400/31675 (epoch 1.89), train_loss = 297.1916
2500/31675 (epoch 1.97), train_loss = 308.5774
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch2.00_229.58.t7
2600/31675 (epoch 2.05), train_loss = 204.1578
2700/31675 (epoch 2.13), train_loss = 229.0827
2800/31675 (epoch 2.21), train_loss = 258.6883
2900/31675 (epoch 2.29), train_loss = 215.6444
3000/31675 (epoch 2.37), train_loss = 201.1391
3100/31675 (epoch 2.45), train_loss = 245.8782
3200/31675 (epoch 2.53), train_loss = 285.2821
3300/31675 (epoch 2.60), train_loss = 210.9398
3400/31675 (epoch 2.68), train_loss = 255.7903
3500/31675 (epoch 2.76), train_loss = 138.6418
3600/31675 (epoch 2.84), train_loss = 167.4747
3700/31675 (epoch 2.92), train_loss = 196.0062
3800/31675 (epoch 3.00), train_loss = 272.9710
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch3.00_199.35.t7
3900/31675 (epoch 3.08), train_loss = 189.8568
4000/31675 (epoch 3.16), train_loss = 228.6565
4100/31675 (epoch 3.24), train_loss = 224.3237
4200/31675 (epoch 3.31), train_loss = 182.4509
4300/31675 (epoch 3.39), train_loss = 231.6450
4400/31675 (epoch 3.47), train_loss = 198.1385
4500/31675 (epoch 3.55), train_loss = 213.2757
4600/31675 (epoch 3.63), train_loss = 194.8259
4700/31675 (epoch 3.71), train_loss = 261.0416
4800/31675 (epoch 3.79), train_loss = 175.5076
4900/31675 (epoch 3.87), train_loss = 246.1651
5000/31675 (epoch 3.95), train_loss = 199.7342
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch4.00_181.66.t7
5100/31675 (epoch 4.03), train_loss = 239.4468
5200/31675 (epoch 4.10), train_loss = 199.5525
5300/31675 (epoch 4.18), train_loss = 224.9765
5400/31675 (epoch 4.26), train_loss = 180.9969
5500/31675 (epoch 4.34), train_loss = 193.3227
5600/31675 (epoch 4.42), train_loss = 167.5974
5700/31675 (epoch 4.50), train_loss = 230.5838
5800/31675 (epoch 4.58), train_loss = 141.6197
5900/31675 (epoch 4.66), train_loss = 166.2485
6000/31675 (epoch 4.74), train_loss = 204.4503
6100/31675 (epoch 4.81), train_loss = 155.5831
6200/31675 (epoch 4.89), train_loss = 192.1082
6300/31675 (epoch 4.97), train_loss = 189.0958
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch5.00_172.21.t7
6400/31675 (epoch 5.05), train_loss = 146.9685
6500/31675 (epoch 5.13), train_loss = 177.8722
6600/31675 (epoch 5.21), train_loss = 196.9578
6700/31675 (epoch 5.29), train_loss = 150.1310
6800/31675 (epoch 5.37), train_loss = 127.3223
6900/31675 (epoch 5.45), train_loss = 255.7305
7000/31675 (epoch 5.52), train_loss = 221.3599
7100/31675 (epoch 5.60), train_loss = 200.8017
7200/31675 (epoch 5.68), train_loss = 184.7957
7300/31675 (epoch 5.76), train_loss = 140.1135
7400/31675 (epoch 5.84), train_loss = 177.0135
7500/31675 (epoch 5.92), train_loss = 147.8841
7600/31675 (epoch 6.00), train_loss = 153.9457
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch6.00_162.51.t7
7700/31675 (epoch 6.08), train_loss = 141.3737
7800/31675 (epoch 6.16), train_loss = 193.2330
7900/31675 (epoch 6.24), train_loss = 134.6816
8000/31675 (epoch 6.31), train_loss = 111.2546
8100/31675 (epoch 6.39), train_loss = 168.0664
8200/31675 (epoch 6.47), train_loss = 184.5089
8300/31675 (epoch 6.55), train_loss = 168.0994
8400/31675 (epoch 6.63), train_loss = 145.1965
8500/31675 (epoch 6.71), train_loss = 174.8552
8600/31675 (epoch 6.79), train_loss = 173.7721
8700/31675 (epoch 6.87), train_loss = 191.3827
8800/31675 (epoch 6.95), train_loss = 161.0672
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch7.00_158.39.t7
8900/31675 (epoch 7.02), train_loss = 218.3148
9000/31675 (epoch 7.10), train_loss = 157.8990
9100/31675 (epoch 7.18), train_loss = 183.1041
9200/31675 (epoch 7.26), train_loss = 176.4712
9300/31675 (epoch 7.34), train_loss = 157.2909
9400/31675 (epoch 7.42), train_loss = 172.3378
9500/31675 (epoch 7.50), train_loss = 170.8574
9600/31675 (epoch 7.58), train_loss = 143.9417
9700/31675 (epoch 7.66), train_loss = 186.8887
9800/31675 (epoch 7.73), train_loss = 162.1487
9900/31675 (epoch 7.81), train_loss = 157.1883
10000/31675 (epoch 7.89), train_loss = 156.6241
10100/31675 (epoch 7.97), train_loss = 180.1722
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch8.00_151.39.t7
10200/31675 (epoch 8.05), train_loss = 116.0106
10300/31675 (epoch 8.13), train_loss = 129.2786
10400/31675 (epoch 8.21), train_loss = 207.6249
10500/31675 (epoch 8.29), train_loss = 104.5945
10600/31675 (epoch 8.37), train_loss = 132.6894
10700/31675 (epoch 8.45), train_loss = 176.2369
10800/31675 (epoch 8.52), train_loss = 134.9308
10900/31675 (epoch 8.60), train_loss = 142.0736
11000/31675 (epoch 8.68), train_loss = 133.2670
11100/31675 (epoch 8.76), train_loss = 172.5976
11200/31675 (epoch 8.84), train_loss = 121.0163
11300/31675 (epoch 8.92), train_loss = 110.9682
11400/31675 (epoch 9.00), train_loss = 158.1777
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch9.00_149.49.t7
11500/31675 (epoch 9.08), train_loss = 148.1391
11600/31675 (epoch 9.16), train_loss = 178.7232
11700/31675 (epoch 9.23), train_loss = 123.9441
11800/31675 (epoch 9.31), train_loss = 123.8790
11900/31675 (epoch 9.39), train_loss = 190.1114
12000/31675 (epoch 9.47), train_loss = 203.7419
12100/31675 (epoch 9.55), train_loss = 159.9928
12200/31675 (epoch 9.63), train_loss = 158.1153
12300/31675 (epoch 9.71), train_loss = 131.7295
12400/31675 (epoch 9.79), train_loss = 188.1800
12500/31675 (epoch 9.87), train_loss = 142.4499
12600/31675 (epoch 9.94), train_loss = 230.6982
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch10.00_146.57.t7
12700/31675 (epoch 10.02), train_loss = 156.9309
12800/31675 (epoch 10.10), train_loss = 137.6987
12900/31675 (epoch 10.18), train_loss = 129.4219
13000/31675 (epoch 10.26), train_loss = 158.5684
13100/31675 (epoch 10.34), train_loss = 161.0942
13200/31675 (epoch 10.42), train_loss = 180.7851
13300/31675 (epoch 10.50), train_loss = 116.3297
13400/31675 (epoch 10.58), train_loss = 103.2180
13500/31675 (epoch 10.66), train_loss = 228.6890
13600/31675 (epoch 10.73), train_loss = 152.2666
13700/31675 (epoch 10.81), train_loss = 126.1322
13800/31675 (epoch 10.89), train_loss = 112.6598
13900/31675 (epoch 10.97), train_loss = 135.5179
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch11.00_145.04.t7
14000/31675 (epoch 11.05), train_loss = 100.1897
14100/31675 (epoch 11.13), train_loss = 141.0636
14200/31675 (epoch 11.21), train_loss = 149.9115
14300/31675 (epoch 11.29), train_loss = 112.7567
14400/31675 (epoch 11.37), train_loss = 147.1632
14500/31675 (epoch 11.44), train_loss = 137.0094
14600/31675 (epoch 11.52), train_loss = 129.4210
14700/31675 (epoch 11.60), train_loss = 136.0187
14800/31675 (epoch 11.68), train_loss = 123.0264
14900/31675 (epoch 11.76), train_loss = 137.9644
15000/31675 (epoch 11.84), train_loss = 130.8094
15100/31675 (epoch 11.92), train_loss = 87.5872
15200/31675 (epoch 12.00), train_loss = 128.7816
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch12.00_141.58.t7
15300/31675 (epoch 12.08), train_loss = 146.3830
15400/31675 (epoch 12.15), train_loss = 161.2118
15500/31675 (epoch 12.23), train_loss = 127.5935
15600/31675 (epoch 12.31), train_loss = 133.2026
15700/31675 (epoch 12.39), train_loss = 217.7041
15800/31675 (epoch 12.47), train_loss = 145.0895
15900/31675 (epoch 12.55), train_loss = 107.9422
16000/31675 (epoch 12.63), train_loss = 143.7288
16100/31675 (epoch 12.71), train_loss = 120.0762
16200/31675 (epoch 12.79), train_loss = 143.4678
16300/31675 (epoch 12.87), train_loss = 134.0410
16400/31675 (epoch 12.94), train_loss = 185.3824
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch13.00_140.84.t7
16500/31675 (epoch 13.02), train_loss = 161.8008
16600/31675 (epoch 13.10), train_loss = 142.4872
16700/31675 (epoch 13.18), train_loss = 151.0291
16800/31675 (epoch 13.26), train_loss = 138.8018
16900/31675 (epoch 13.34), train_loss = 114.5137
17000/31675 (epoch 13.42), train_loss = 140.7112
17100/31675 (epoch 13.50), train_loss = 105.9626
17200/31675 (epoch 13.58), train_loss = 83.9275
17300/31675 (epoch 13.65), train_loss = 163.0975
17400/31675 (epoch 13.73), train_loss = 130.6434
17500/31675 (epoch 13.81), train_loss = 119.0841
17600/31675 (epoch 13.89), train_loss = 107.8958
17700/31675 (epoch 13.97), train_loss = 137.8417
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch14.00_132.52.t7
17800/31675 (epoch 14.05), train_loss = 124.0942
17900/31675 (epoch 14.13), train_loss = 117.4392
18000/31675 (epoch 14.21), train_loss = 130.3233
18100/31675 (epoch 14.29), train_loss = 112.2990
18200/31675 (epoch 14.36), train_loss = 105.0138
18300/31675 (epoch 14.44), train_loss = 107.7117
18400/31675 (epoch 14.52), train_loss = 112.1500
18500/31675 (epoch 14.60), train_loss = 117.9624
18600/31675 (epoch 14.68), train_loss = 142.7740
18700/31675 (epoch 14.76), train_loss = 134.4659
18800/31675 (epoch 14.84), train_loss = 91.5064
18900/31675 (epoch 14.92), train_loss = 100.8196
19000/31675 (epoch 15.00), train_loss = 103.1925
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch15.00_130.91.t7
19100/31675 (epoch 15.07), train_loss = 117.1723
19200/31675 (epoch 15.15), train_loss = 116.2074
19300/31675 (epoch 15.23), train_loss = 80.1053
19400/31675 (epoch 15.31), train_loss = 135.2300
19500/31675 (epoch 15.39), train_loss = 185.7589
19600/31675 (epoch 15.47), train_loss = 136.6290
19700/31675 (epoch 15.55), train_loss = 111.4722
19800/31675 (epoch 15.63), train_loss = 113.1709
19900/31675 (epoch 15.71), train_loss = 94.4868
20000/31675 (epoch 15.79), train_loss = 111.0743
20100/31675 (epoch 15.86), train_loss = 119.4882
20200/31675 (epoch 15.94), train_loss = 120.4031
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch16.00_130.00.t7
20300/31675 (epoch 16.02), train_loss = 136.0015
20400/31675 (epoch 16.10), train_loss = 98.3182
20500/31675 (epoch 16.18), train_loss = 141.7701
20600/31675 (epoch 16.26), train_loss = 171.3912
20700/31675 (epoch 16.34), train_loss = 99.4955
20800/31675 (epoch 16.42), train_loss = 126.5100
20900/31675 (epoch 16.50), train_loss = 135.4863
21000/31675 (epoch 16.57), train_loss = 91.0479
21100/31675 (epoch 16.65), train_loss = 126.2115
21200/31675 (epoch 16.73), train_loss = 149.4726
21300/31675 (epoch 16.81), train_loss = 87.0476
21400/31675 (epoch 16.89), train_loss = 78.0156
21500/31675 (epoch 16.97), train_loss = 70.4944
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch17.00_126.35.t7
21600/31675 (epoch 17.05), train_loss = 102.3634
21700/31675 (epoch 17.13), train_loss = 109.8009
21800/31675 (epoch 17.21), train_loss = 129.0442
21900/31675 (epoch 17.28), train_loss = 89.7495
22000/31675 (epoch 17.36), train_loss = 108.9761
22100/31675 (epoch 17.44), train_loss = 106.9783
22200/31675 (epoch 17.52), train_loss = 85.5451
22300/31675 (epoch 17.60), train_loss = 126.5788
22400/31675 (epoch 17.68), train_loss = 132.2608
22500/31675 (epoch 17.76), train_loss = 74.0349
22600/31675 (epoch 17.84), train_loss = 75.8679
22700/31675 (epoch 17.92), train_loss = 97.7860
22800/31675 (epoch 18.00), train_loss = 110.0467
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch18.00_125.53.t7
22900/31675 (epoch 18.07), train_loss = 89.6176
23000/31675 (epoch 18.15), train_loss = 138.7959
23100/31675 (epoch 18.23), train_loss = 90.8744
23200/31675 (epoch 18.31), train_loss = 140.9495
23300/31675 (epoch 18.39), train_loss = 149.4366
23400/31675 (epoch 18.47), train_loss = 127.3338
23500/31675 (epoch 18.55), train_loss = 90.9294
23600/31675 (epoch 18.63), train_loss = 97.4022
23700/31675 (epoch 18.71), train_loss = 103.0955
23800/31675 (epoch 18.78), train_loss = 102.0323
23900/31675 (epoch 18.86), train_loss = 104.4937
24000/31675 (epoch 18.94), train_loss = 92.4890
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch19.00_124.30.t7
24100/31675 (epoch 19.02), train_loss = 102.4333
24200/31675 (epoch 19.10), train_loss = 99.1028
24300/31675 (epoch 19.18), train_loss = 109.3732
24400/31675 (epoch 19.26), train_loss = 109.8171
24500/31675 (epoch 19.34), train_loss = 97.5112
24600/31675 (epoch 19.42), train_loss = 145.1198
24700/31675 (epoch 19.49), train_loss = 96.1052
24800/31675 (epoch 19.57), train_loss = 81.6132
24900/31675 (epoch 19.65), train_loss = 100.9439
25000/31675 (epoch 19.73), train_loss = 129.0468
25100/31675 (epoch 19.81), train_loss = 87.8252
25200/31675 (epoch 19.89), train_loss = 89.5284
25300/31675 (epoch 19.97), train_loss = 52.1641
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch20.00_123.69.t7
25400/31675 (epoch 20.05), train_loss = 99.9568
25500/31675 (epoch 20.13), train_loss = 118.0871
25600/31675 (epoch 20.21), train_loss = 118.6653
25700/31675 (epoch 20.28), train_loss = 90.6946
25800/31675 (epoch 20.36), train_loss = 114.4039
25900/31675 (epoch 20.44), train_loss = 78.5488
26000/31675 (epoch 20.52), train_loss = 112.3676
26100/31675 (epoch 20.60), train_loss = 92.4415
26200/31675 (epoch 20.68), train_loss = 130.9558
26300/31675 (epoch 20.76), train_loss = 108.5386
26400/31675 (epoch 20.84), train_loss = 88.6149
26500/31675 (epoch 20.92), train_loss = 71.9182
26600/31675 (epoch 20.99), train_loss = 152.6365
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch21.00_123.18.t7
26700/31675 (epoch 21.07), train_loss = 105.6602
26800/31675 (epoch 21.15), train_loss = 126.5473
26900/31675 (epoch 21.23), train_loss = 106.3288
27000/31675 (epoch 21.31), train_loss = 114.4642
27100/31675 (epoch 21.39), train_loss = 104.3161
27200/31675 (epoch 21.47), train_loss = 106.3294
27300/31675 (epoch 21.55), train_loss = 91.8286
27400/31675 (epoch 21.63), train_loss = 85.4033
27500/31675 (epoch 21.70), train_loss = 121.0194
27600/31675 (epoch 21.78), train_loss = 92.0562
27700/31675 (epoch 21.86), train_loss = 101.6783
27800/31675 (epoch 21.94), train_loss = 84.2354
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch22.00_122.85.t7
27900/31675 (epoch 22.02), train_loss = 106.6517
28000/31675 (epoch 22.10), train_loss = 84.0312
28100/31675 (epoch 22.18), train_loss = 107.4262
28200/31675 (epoch 22.26), train_loss = 113.2599
28300/31675 (epoch 22.34), train_loss = 94.5707
28400/31675 (epoch 22.42), train_loss = 151.1607
28500/31675 (epoch 22.49), train_loss = 105.3479
28600/31675 (epoch 22.57), train_loss = 111.4545
28700/31675 (epoch 22.65), train_loss = 99.9958
28800/31675 (epoch 22.73), train_loss = 139.2409
28900/31675 (epoch 22.81), train_loss = 91.4084
29000/31675 (epoch 22.89), train_loss = 79.4813
29100/31675 (epoch 22.97), train_loss = 97.5256
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch23.00_122.58.t7
29200/31675 (epoch 23.05), train_loss = 117.4427
29300/31675 (epoch 23.13), train_loss = 104.0569
29400/31675 (epoch 23.20), train_loss = 137.5399
29500/31675 (epoch 23.28), train_loss = 91.9614
29600/31675 (epoch 23.36), train_loss = 87.3350
29700/31675 (epoch 23.44), train_loss = 67.8878
29800/31675 (epoch 23.52), train_loss = 103.1114
29900/31675 (epoch 23.60), train_loss = 100.8149
30000/31675 (epoch 23.68), train_loss = 118.3131
30100/31675 (epoch 23.76), train_loss = 123.7189
30200/31675 (epoch 23.84), train_loss = 103.1361
30300/31675 (epoch 23.91), train_loss = 75.9410
30400/31675 (epoch 23.99), train_loss = 122.3899
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch24.00_122.53.t7
30500/31675 (epoch 24.07), train_loss = 83.0460
30600/31675 (epoch 24.15), train_loss = 101.1161
30700/31675 (epoch 24.23), train_loss = 68.5993
30800/31675 (epoch 24.31), train_loss = 115.3679
30900/31675 (epoch 24.39), train_loss = 120.2563
31000/31675 (epoch 24.47), train_loss = 127.7466
31100/31675 (epoch 24.55), train_loss = 78.7842
31200/31675 (epoch 24.63), train_loss = 98.9353
31300/31675 (epoch 24.70), train_loss = 124.4050
31400/31675 (epoch 24.78), train_loss = 115.8360
31500/31675 (epoch 24.86), train_loss = 112.5002
31600/31675 (epoch 24.94), train_loss = 81.2895
evaluating loss over split index 2
saving checkpoint to cv/lm_word-small_epoch25.00_122.52.t7
evaluating loss over split index 3
Perplexity on test set: 115.86590686001
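
For anyone comparing numbers: if I read the code right, the values appended to the checkpoint filenames (309.59 down to 122.52) are validation perplexities, i.e. exponentiated average per-word cross-entropies. A minimal sketch of the conversion, with hypothetical helper names that are not part of main.lua:

-- Hypothetical helpers, not from the repo: convert between average
-- per-word negative log-likelihood (in nats) and perplexity.
local function to_perplexity(nll) return math.exp(nll) end
local function to_nll(ppl)        return math.log(ppl) end

-- The test perplexity above corresponds to roughly 4.75 nats per word:
print(to_nll(115.86590686001))         --> ~4.7524
print(to_perplexity(4.7524))           --> ~115.87

So the gap to the paper's number is about 0.17 nats per word, if my reading of the logged quantities is correct.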
Thanks!