Standard encoder-decoder NMT (following *Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation*, Y. Wu et al.)
- python 3.6
- torch 1.2
- tensorboard 1.14+
- psutil
- dill
- CUDA 9
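The dependencies above can be installed roughly as follows. This is a sketch only, assuming a standard conda/pip setup; the package names are the usual PyPI ones rather than anything pinned by this repo, and you should pick the torch 1.2 wheel that matches your CUDA 9 installation.

```bash
# sketch only: python 3.6 environment with the listed dependencies
conda create -n nmt python=3.6 -y
conda activate nmt
pip install torch==1.2.0 "tensorboard>=1.14" psutil dill
```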
- Source / target files: one sentence per line
- Source / target vocab files: one vocab per line, the top 5 fixed to be `<pad> <unk> <s> </s> <spc>` as defined in `utils/config.py`
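As a hedged sketch of the expected vocab format (the corpus name `train.src`, the output name `vocab.src`, and the 50000-token cutoff are illustrative assumptions, not part of the repo), a vocab list can be built from a whitespace-tokenised corpus while keeping the five special tokens fixed at the top:

```bash
# sketch only: build a source vocab file with the special tokens first,
# then the remaining tokens ordered by corpus frequency
printf '<pad>\n<unk>\n<s>\n</s>\n<spc>\n' > vocab.src
tr -s ' ' '\n' < train.src | grep -v '^$' | sort | uniq -c | sort -rn \
    | awk '{print $2}' | head -n 50000 >> vocab.src
```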
To train the model - check `af-run/run-aaf-pretrain.sh` (an example invocation is sketched after this list).

- `train_path_src` - path to source file for training
- `train_path_tgt` - path to target file for training
- `dev_path_src` - path to source file for validation (default `None`)
- `dev_path_tgt` - path to target file for validation (default `None`)
- `path_vocab_src` - path to source vocab list
- `path_vocab_tgt` - path to target vocab list
- `load_embedding_src` - load pretrained source embedding if provided
- `load_embedding_tgt` - load pretrained target embedding if provided
- `use_type` - tokenise into `word` or `char`
- `save` - dir to save the trained model
- `random_seed` - set random seed
- `share_embedder` - share the embedding matrix across source and target
- `embedding_size_enc` - source embedding size
- `embedding_size_dec` - target embedding size
- `hidden_size_enc` - encoder hidden size
- `num_bilstm_enc` - number of encoder BiLSTM layers
- `num_unilstm_enc` - number of encoder UniLSTM layers (default `0`)
- `hidden_size_dec` - decoder hidden size
- `num_unilstm_dec` - number of decoder UniLSTM layers
- `att_mode` - attention mode: `bahdanau | bilinear | hybrid`
- `hidden_size_att` - only used if `att_mode` is set to `hybrid`
- `residual` - residual connections across LSTM layers
- `hidden_size_shared` - transformed attention output hidden size
- `max_seq_len` - maximum sequence length; longer sentences are filtered out in training
- `batch_size` - batch size
- `batch_first` - set to `True`
- `seqrev` - train seq2seq in reverse order
- `eval_with_mask` - compute loss on non-`<pad>` tokens only (default `True`)
- `scheduled_sampling` - use scheduled sampling
- `teacher_forcing_ratio` - probability of running in teacher forcing mode; set to `1.0` to use teacher forcing throughout
- `dropout` - dropout rate
- `embedding_dropout` - embedding dropout rate
- `num_epochs` - number of epochs
- `use_gpu` - set to `True` if a GPU device is available
- `learning_rate` - learning rate
- `max_grad_norm` - gradient clipping threshold
- `checkpoint_every` - number of batches trained per saved checkpoint (if `dev_path*` is not given, save after every epoch)
- `print_every` - number of batches trained between printed training losses
- `max_count_no_improve` - used when `dev_path*` is given; number of batches trained with no improvement in accuracy on the dev set before rolling back
- `max_count_num_rollback` - reduce the learning rate after rolling back this many times
- `keep_num` - number of checkpoints kept in the model dir (used if `dev_path*` is given)
- `normalise_loss` - normalise the loss on a per-token basis
- `minibatch_split` - if OOM, split each batch into minibatches (note: gradient descent is still done per batch, not per minibatch)
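For orientation, a hedged sketch of how these arguments might be wired together in a run script is shown below. The `train.py` entry point and all concrete values are placeholders, not taken from the repo; only the argument names follow the list above, and `af-run/run-aaf-pretrain.sh` remains the authoritative reference.

```bash
# illustrative only: train.py and the values are placeholders
python train.py \
    --train_path_src data/train.src --train_path_tgt data/train.tgt \
    --dev_path_src data/dev.src --dev_path_tgt data/dev.tgt \
    --path_vocab_src data/vocab.src --path_vocab_tgt data/vocab.tgt \
    --use_type word --save checkpoints/baseline \
    --embedding_size_enc 200 --embedding_size_dec 200 \
    --hidden_size_enc 200 --num_bilstm_enc 2 \
    --hidden_size_dec 200 --num_unilstm_dec 4 \
    --att_mode bilinear --hidden_size_shared 200 \
    --max_seq_len 50 --batch_size 64 --batch_first True \
    --eval_with_mask True --teacher_forcing_ratio 1.0 \
    --dropout 0.2 --num_epochs 20 --use_gpu True \
    --learning_rate 0.001 --max_grad_norm 1.0 \
    --checkpoint_every 500 --print_every 100 \
    --max_count_no_improve 5 --max_count_num_rollback 2 \
    --keep_num 3 --normalise_loss True --minibatch_split 1
```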
To test the model - check `af-run/run-aaf-pretrain.sh` (an example invocation is sketched after this list).

- `test_path_src` - path to source text
- `seqrev` - translate in reverse order or not
- `path_vocab_src` - must be consistent with training
- `path_vocab_tgt` - must be consistent with training
- `use_type` - must be consistent with training
- `load` - path to the model checkpoint
- `test_path_out` - path to save the translated text
- `max_seq_len` - maximum translation sequence length (set to at least the maximum source sentence length)
- `batch_size` - batch size in translation, restricted by memory
- `use_gpu` - set to `True` if a GPU device is available
- `beam_width` - beam width for beam search decoding
- `eval_mode` - default `1` (other modes are for debugging)
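A similarly hedged sketch of a decoding run follows. The `translate.py` entry point, the checkpoint placeholder, and all values are assumptions; only the argument names come from the list above.

```bash
# illustrative only: translate.py and the values are placeholders
python translate.py \
    --test_path_src data/test.src \
    --path_vocab_src data/vocab.src --path_vocab_tgt data/vocab.tgt \
    --use_type word --load checkpoints/baseline/<checkpoint-dir> \
    --test_path_out outputs/test.hyp \
    --max_seq_len 200 --batch_size 50 \
    --use_gpu True --beam_width 5 --eval_mode 1
```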