
lmthang / nmt.matlab


Code to train state-of-the-art Neural Machine Translation systems.

Home Page: http://nlp.stanford.edu/projects/nmt/

MATLAB 52.48% Shell 4.21% Python 17.17% Perl 25.50% Smalltalk 0.64%

nmt.matlab's People

Contributors: lmthang

nmt.matlab's Issues

model ensemble

Hi,

Is the model ensemble part available? If not, could you at least explain how to do model ensembling?

Here is my current understanding. Given the sequence x_1, x_2, ..., x_n and two models m_1, m_2:

  1. Run the encoders of m_1 and m_2 to get the states s_1 and s_2.
  2. Feed the states into the decoders of m_1 and m_2; the first decoder input is EOS.
  3. Each decoder produces a probability distribution over the next word, p_1 and p_2; average them to get (p_1 + p_2)/2.
  4. Pick the top-k words under the averaged distribution as candidates and feed them into the next step; the states s_1 and s_2 are both updated after each step.

Is this correct? I implemented it this way, but I don't see any improvement from using several models.
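For what it's worth, the averaging step described above can be sketched as follows (a minimal NumPy sketch with illustrative names, not the repo's actual ensembling code):

```python
import numpy as np

def ensemble_step(prob_dists, k):
    """Average the per-model next-word distributions and return the
    averaged distribution plus the top-k candidate word ids."""
    avg = np.mean(np.stack(prob_dists), axis=0)  # (p_1 + ... + p_M) / M
    top_k = np.argsort(avg)[::-1][:k]            # ids of the k largest probs
    return avg, top_k

# Two toy "model" distributions over a 4-word vocabulary.
p1 = np.array([0.1, 0.6, 0.2, 0.1])
p2 = np.array([0.3, 0.3, 0.3, 0.1])
avg, top2 = ensemble_step([p1, p2], k=2)
# avg = [0.2, 0.45, 0.25, 0.1]; the top-2 candidate ids are 1 and 2.
```

One common pitfall is score bookkeeping during beam search: each hypothesis must accumulate the log of the *averaged* probability, and every model's decoder state must be advanced with the same chosen word; mixing per-model scores with averaged picks can cancel out any ensemble gain.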

NaN values in "srcVecsAll"

Hi,

I am wondering why I am getting the error
"Subscript indices must either be real positive integers or logicals." in buildSrcVecs, line 58. When I checked srcVecsAll, it contains NaN values, although it should contain only integer indices. Could you help me understand why this happens and what causes these NaN values?
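As a debugging aid (not a fix): NaN is a float and can never be a valid subscript, so checking the index array before using it pinpoints the offending entries. A minimal NumPy analogue of the situation (variable names are illustrative):

```python
import numpy as np

# An index array containing NaN cannot be used as a subscript; the
# MATLAB analogue raises "Subscript indices must either be real
# positive integers or logicals."
src_vecs_all = np.array([3.0, 1.0, np.nan, 2.0])

bad = np.isnan(src_vecs_all)   # locate the offending entries
assert bad.any()               # confirms a NaN is really present

# Filtering (or asserting on) the NaNs before indexing narrows down
# which sentence produced them, e.g. an empty source line or a
# source/target length mismatch in the data files.
valid = src_vecs_all[~bad].astype(int)
```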

Thanks

the content about translations.txt

Hi,
when I use the command lines:
trainLSTM('../output/id.1000/train.10k', '../output/id.1000/valid.100', '../output/id.1000/test.100', 'de', 'en', '../output/id.1000/train.10k.de.vocab.1000', '../output/id.1000/train.10k.en.vocab.1000', '../output/basic', 'isResume', 0)
testLSTM('../output/advanced/modelRecent.mat', 2, 10, 1, '../output/advanced/translations.txt')
the resulting translations.txt contains only line numbers:
"1.
2.


...
...
...
100. "
Could you please tell me how to get the correct translations? translations.txt should contain the translations of the test set.

Error when running the code on toy data

Hi everyone,
I am new to neural networks. I am trying to run trainLSTM on the toy data, but I get this error:


trainLSTM('../output/id.1000/train.10k', '../output/id.1000/valid.100', '../output/id.1000/test.100', 'de', 'en', '../output/id.1000/train.10k.de.vocab.1000', '../output/id.1000/train.10k.en.vocab.1000', '../output/basic', 'isResume', 0)

Loading vocab from file ../output/id.1000/train.10k.en.vocab.1000 ...
Loading vocab from file ../output/id.1000/train.10k.en.vocab.1000 ...
Loading vocab from file ../output/id.1000/train.10k.de.vocab.1000 ...
Bilingual setting
Init LSTM parameters using dataType=double, initRange=0.1
Model size = 458700, individual sizes: W_src{1}=80000 W_tgt{1}=80000 W_emb_src=99500 W_emb_tgt=99600 W_soft=99600
assert = 0
attnFunc = 0
attnOpt = 0
batchSize = 128
dataType = double
debug = 0
decode = 1
dropout = 1
epochFraction = 1
feedInput = 0
finetuneEpoch = 5
finetuneRate = 0.5
gpuDevice = 0
initRange = 0.1
isBi = 1
isClip = 1
isGradCheck = 0
isProfile = 0
isResume = 0
isReverse = 0
learningRate = 1
loadModel =
logFreq = 10
lstmOpt = 0
lstmSize = 100
maxGradNorm = 5
maxLenRatio = 1.5
maxSentLen = 51
minLenRatio = 0.5
numEpoches = 10
numLayers = 1
onlyCPU = 0
outDir = ../output/basic
posWin = 10
saveHDF = 0
seed = 0
shuffle = 1
sortBatch = 1
srcLang = de
srcVocabFile = ../output/id.1000/train.10k.de.vocab.1000
testPrefix = ../output/id.1000/test.100
tgtLang = en
tgtVocabFile = ../output/id.1000/train.10k.en.vocab.1000
trainPrefix = ../output/id.1000/train.10k
validPrefix = ../output/id.1000/valid.100
chunkSize = 12800
baseIndex = 0
clipForward = 50
clipBackward = 1000
nonlinear_gate_f = sigmoid
nonlinear_gate_f_prime = sigmoidPrime
nonlinear_f = tanh
nonlinear_f_prime = tanhPrime
beamSize = 12
stackSize = 100
unkPenalty = 0
lengthReward = 0
isGPU = 0
logId = 6
srcSos = 994
srcEos = 995
srcVocabSize = 995
nullPosId = 0
tgtSos = 995
tgtEos = 996
tgtVocabSize = 996
modelFile = ../output/basic/model.mat
modelRecentFile = ../output/basic/modelRecent.mat
softmaxSize = 100
lr = 1
epoch = 1
bestCostValid = 100000
testPerplexity = 100000
curTestPerpWord = 100000
startIter = 0
iter = 0
epochBatchCount = 0
finetuneCount = 0
modelSize = 458700
Loading data src from file ../output/id.1000/valid.100.de
src 1:gegen , um der von
src end: eine die des von hat , eine for eine .
Loading data tgt from file ../output/id.1000/valid.100.en
tgt 1:than to the ##AT##-##AT## of
tgt end: one a in the of , however an .
numSents=100, numWords=1953
Loading data src from file ../output/id.1000/test.100.de
src 1: und sich gibt Verfügung
src end:Der aber ein des vor und sie am , dass die des stehen und per 3 , um , Ort es für auf , 2 er .
Loading data tgt from file ../output/id.1000/test.100.en
tgt 1: and minute land time
tgt end:The a of the beginning and that the 's double " and in " to mine the guests en 's .
numSents=100, numWords=2225
Load train data srcFile "../output/id.1000/train.10k.de" and tgtFile "../output/id.1000/train.10k.en"
src 1: Unsere aus der Ihrer .
src end:weiß Ein ich und auch eingesetzt sei , wie es Modul war : Express einen , New dieser als Wann 17 , der auch gibt from war , ihm von seiner zu seiner durch ihr gehen zu , für eine so , daß da die der am , und guten auf eine Apartments , daß sich in Dateien eine und .
tgt: from … do .
tgt end:cities shall We apartment , and restaurants has 24 , 't were it was that an a , of a mobile first de of , and one , bit , we was to the of a by his entire unto , Lord be a of to the of the some of the available download of the information , and in a to having in about a of the available … do and reviews .
src input 1: <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> <s_sos> Teil Entwicklung Nähe Sterne des Hotel in kommen Sie eine dir .
src mask: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
tgt input 1: <t_sos> than ##UNDERSCORE## Update is 00 to short your % at Hotel in . <t_eos>
tgt output 1: than ##UNDERSCORE## Update is 00 to short your % at Hotel in . <t_eos> <t_eos>
tgt mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
Epoch 1, lr=1, 30-Oct-2015 19:47:17
varsDenseUpdate: W_src W_tgt W_soft
Index exceeds matrix dimensions.

Error in lstmCostGrad (line 110)
x_t = W_emb(:, input(:, tt));

Error in trainLSTM (line 269)
[costs, grad] = lstmCostGrad(model, trainData, params, 0);
Could anyone help me with it? Thanks
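That error at `W_emb(:, input(:, tt))` typically means some token id in `input` exceeds the number of columns of `W_emb`, i.e. the vocab files do not match the preprocessed data (note the log shows srcVocabSize = 995 and tgtVocabSize = 996). A quick range check on the data, sketched in NumPy with illustrative names, is:

```python
import numpy as np

def check_vocab_ids(input_ids, vocab_size, base_index=0):
    """Return True if every token id fits an embedding matrix with
    `vocab_size` columns. With baseIndex = 0 (as in the log above),
    valid ids in the data files run 0 .. vocab_size-1; they are
    shifted to 1-based indices inside MATLAB."""
    ids = np.asarray(input_ids)
    return bool(((ids >= base_index) & (ids < base_index + vocab_size)).all())

# tgtVocabSize = 996 in the log; an id of 996 or above overflows W_emb.
assert check_vocab_ids([0, 5, 995], 996) is True
assert check_vocab_ids([0, 5, 996], 996) is False
```

If the check fails, the usual cause is that the vocab files were built from a different corpus than the id files being loaded.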

buildSrcVecs.m

Hi, lmthang,
I am puzzled by the following code in buildSrcVecs.m:

% check those that are out of boundaries
excludeIds = find(indicesAll>numSrcHidVecs | indicesAll<(srcMaxLen-srcLens+1));

I do not understand this part:
indicesAll<(srcMaxLen-srcLens+1)

If srcMaxLen is 10 and one of the sentences has length 7, then according to the code, indices < (10-7+1) = 4 are out of bounds. Why? Can you explain it?
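For anyone else puzzled by this: source sentences in a batch are right-aligned, i.e. left-padded to srcMaxLen (see the `src mask: 0 0 ... 1 1` line in the training log above), so a sentence of length srcLens occupies positions srcMaxLen-srcLens+1 .. srcMaxLen and everything to its left is padding. The exclusion test can be sketched like this (1-based positions as in MATLAB; illustrative Python, not the repo's code):

```python
def out_of_bound(indices, num_src_hid_vecs, src_max_len, src_len):
    """Replicate the exclusion test: positions past the available
    hidden vectors, or inside the left padding, are out of bounds
    (positions are 1-based, as in MATLAB)."""
    first_real = src_max_len - src_len + 1  # first non-padding position
    return [i for i in indices
            if i > num_src_hid_vecs or i < first_real]

# srcMaxLen=10, sentence length 7: real words sit at positions 4..10,
# so indices 1..3 fall inside the padding and are excluded.
assert out_of_bound([1, 3, 4, 10], num_src_hid_vecs=10,
                    src_max_len=10, src_len=7) == [1, 3]
```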
