ziyuehuang / mxseq2seq
seq2seq with attention in mxnet
Hi Sheng,
I am having trouble reproducing the seq2seq model from http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
The loss values are higher in Gluon than in PyTorch, and I cannot figure out why. As far as I can tell, the network and hyperparameters are the same.
Could you please help review this? Thanks a lot. @szha
git clone https://github.com/ZiyueHuang/MXSeq2Seq.git
cd MXSeq2Seq/gluon
python seq2seq.py --cuda
Here is the output:
Reading lines...
Read 135842 sentence pairs
Trimmed to 10853 sentence pairs
Counting words...
Counted words:
(u'fra', 4489)
(u'eng', 2925)
[u'elles n ont pas toujours raison .', u'they re not always right .']
3.28828048161
2.78100977883
2.58317873447
2.40703460461
2.28146159903
2.15195642098
2.0522787211
1.96067533115
1.82878418601
1.74697916563
1.67514307116
1.57050654945
1.50527894126
1.41056300702
1.36106930289
Here are the loss values in PyTorch:
3m 23s (- 47m 22s) (5000 6%) 2.8848
6m 44s (- 43m 48s) (10000 13%) 2.3516
10m 12s (- 40m 51s) (15000 20%) 2.0009
13m 38s (- 37m 31s) (20000 26%) 1.7755
16m 49s (- 33m 38s) (25000 33%) 1.5787
20m 5s (- 30m 7s) (30000 40%) 1.4096
23m 17s (- 26m 37s) (35000 46%) 1.3090
26m 33s (- 23m 14s) (40000 53%) 1.0980
29m 45s (- 19m 50s) (45000 60%) 1.0109
32m 57s (- 16m 28s) (50000 66%) 0.9418
36m 12s (- 13m 10s) (55000 73%) 0.8696
39m 27s (- 9m 51s) (60000 80%) 0.8121
42m 41s (- 6m 34s) (65000 86%) 0.7046
45m 56s (- 3m 16s) (70000 93%) 0.6555
49m 7s (- 0m 0s) (75000 100%) 0.6015
In seq2seq.py the attention weights are computed like this:
attn_weights = F.softmax(
    self.attn(F.concat(embedded, hidden[0].flatten(), dim=1)))
Here embedded is the decoder input and hidden is the encoder's hidden state, because the training code sets decoder_hidden = encoder_hidden. However, in several sources I found online, the attention weights are computed from the decoder's hidden state and the encoder's outputs.
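For reference, here is a minimal sketch of the alternative described above, where the weights come from the decoder's hidden state and the encoder's outputs. It uses dot-product scoring in plain Python, which is an assumption chosen for illustration and is not what seq2seq.py does. The names softmax, dot_attention, decoder_hidden and encoder_outputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_hidden, encoder_outputs):
    """Return (weights, context) for a single decoder step.

    decoder_hidden:  list of H floats (current decoder state)
    encoder_outputs: list of T lists of H floats (one per source position)

    Assumption: scores are plain dot products; other sources use a
    learned scoring layer instead.
    """
    # Score each source position against the current decoder state.
    scores = [sum(h * e for h, e in zip(decoder_hidden, enc))
              for enc in encoder_outputs]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    hidden_size = len(decoder_hidden)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(hidden_size)]
    return weights, context


# Toy example: three source positions, hidden size 2.
toy_weights, toy_context = dot_attention(
    [0.5, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In this variant the weights depend on the content of each encoder output, so the same decoder state attends differently to different source sentences. The snippet quoted above instead derives the weights only from the decoder input and the hidden state.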
What are the differences between using two Trainers and using a single Trainer?