
MXSeq2Seq's Issues

Cannot reproduce the results in PyTorch.

Hi, Sheng

I am having trouble reproducing the seq2seq model from http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html in Gluon.

The loss values are higher in Gluon than in PyTorch, and I cannot figure out why. I think the network and the hyperparameters are the same.

Could you please help review this? Thanks a lot. @szha

git clone https://github.com/ZiyueHuang/MXSeq2Seq.git
cd MXSeq2Seq/gluon
python seq2seq.py --cuda

Here is the output:

Reading lines...
Read 135842 sentence pairs
Trimmed to 10853 sentence pairs
Counting words...
Counted words:
(u'fra', 4489)
(u'eng', 2925)
[u'elles n ont pas toujours raison .', u'they re not always right .']
3.28828048161
2.78100977883
2.58317873447
2.40703460461
2.28146159903
2.15195642098
2.0522787211
1.96067533115
1.82878418601
1.74697916563
1.67514307116
1.57050654945
1.50527894126
1.41056300702
1.36106930289

Here are the loss values in PyTorch:

3m 23s (- 47m 22s) (5000 6%) 2.8848
6m 44s (- 43m 48s) (10000 13%) 2.3516
10m 12s (- 40m 51s) (15000 20%) 2.0009
13m 38s (- 37m 31s) (20000 26%) 1.7755
16m 49s (- 33m 38s) (25000 33%) 1.5787
20m 5s (- 30m 7s) (30000 40%) 1.4096
23m 17s (- 26m 37s) (35000 46%) 1.3090
26m 33s (- 23m 14s) (40000 53%) 1.0980
29m 45s (- 19m 50s) (45000 60%) 1.0109
32m 57s (- 16m 28s) (50000 66%) 0.9418
36m 12s (- 13m 10s) (55000 73%) 0.8696
39m 27s (- 9m 51s) (60000 80%) 0.8121
42m 41s (- 6m 34s) (65000 86%) 0.7046
45m 56s (- 3m 16s) (70000 93%) 0.6555
49m 7s (- 0m 0s) (75000 100%) 0.6015

Attention weights

In seq2seq.py, the attention weights are computed like this:

attn_weights = F.softmax(
    self.attn(F.concat(embedded, hidden[0].flatten(), dim=1)))

Here embedded is the decoder's input, and hidden is the encoder's hidden state, since in the training loop you initialize decoder_hidden = encoder_hidden. The problem is that, in the sources I found online, the attention weights are computed from the decoder's hidden state and the encoder's outputs.
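
For reference, below is a minimal sketch of what those sources describe: content-based (dot-product) attention, scoring the encoder's outputs against the decoder's current hidden state. The function name and shapes are illustrative assumptions, not code from the repo.

    import mxnet as mx

    # Illustrative sketch (not from the repo): dot-product attention over the
    # encoder outputs, scored against the decoder's current hidden state.
    # Assumed shapes: decoder_hidden  (batch, hidden_size)
    #                 encoder_outputs (batch, src_len, hidden_size)
    def dot_attention(decoder_hidden, encoder_outputs):
        # Alignment score for each source position: (batch, src_len, 1)
        scores = mx.nd.batch_dot(encoder_outputs,
                                 decoder_hidden.expand_dims(axis=2))
        # Normalize the scores over the source positions.
        attn_weights = mx.nd.softmax(scores, axis=1)
        # Context vector: weighted sum of encoder outputs -> (batch, hidden_size)
        context = mx.nd.batch_dot(encoder_outputs, attn_weights,
                                  transpose_a=True).flatten()
        return attn_weights, context

With this formulation the weights depend on how well each encoder output matches the current decoder state, rather than only on the decoder's input embedding and the initial hidden state.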

