jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

License: MIT License

Python 99.31% Shell 0.69%
attention deep-learning attention-is-all-you-need pytorch nlp natural-language-processing

attention-is-all-you-need-pytorch's Introduction

Attention is all you need: A Pytorch Implementation

This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).

The Transformer is a novel sequence-to-sequence framework that uses the self-attention mechanism instead of convolution or recurrence, and it achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)

The official TensorFlow implementation can be found in tensorflow/tensor2tensor.

To learn more about the self-attention mechanism, you can read "A Structured Self-attentive Sentence Embedding".

The project now supports training and translation with a trained model.

Note that this project is still a work in progress.

BPE-related parts are not yet fully tested.

If you have any suggestions or find an error, feel free to file an issue to let me know. :)

Usage

WMT'16 Multimodal Translation: de-en

An example of training for the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

0) Download the spacy language model.

# conda install -c conda-forge spacy 
python -m spacy download en
python -m spacy download de

1) Preprocess the data with torchtext and spacy.

python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl

2) Train the model

python train.py -data_pkl m30k_deen_shr.pkl -log m30k_deen_shr -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400

3) Test the model

python translate.py -data_pkl m30k_deen_shr.pkl -model trained.chkpt -output prediction.txt

[(WIP)] WMT'17 Multimodal Translation: de-en w/ BPE

1) Download and preprocess the data with BPE:

Since the interfaces are not yet unified, you need to switch the main function call from main_wo_bpe to main.

python preprocess.py -raw_dir /tmp/raw_deen -data_dir ./bpe_deen -save_data bpe_vocab.pkl -codes codes.txt -prefix deen

2) Train the model

python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400

3) Test the model (not ready)

  • TODO:
    • Load vocabulary.
    • Perform decoding after the translation.

Performance

Training

  • Parameter settings:
    • batch size 256
    • warmup step 4000
    • epoch 200
    • lr_mul 0.5
    • label smoothing
    • do not apply BPE and shared vocabulary
    • target embedding / pre-softmax linear layer weight sharing.
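
The warmup and lr_mul settings above refer to the learning-rate schedule described in the paper: a linear warmup followed by inverse-square-root decay, scaled here by lr_mul. The following is a minimal sketch under stated assumptions: d_model = 512 is not given in the list above, and the helper name transformer_lr is hypothetical.

```python
def transformer_lr(step, d_model=512, n_warmup_steps=4000, lr_mul=0.5):
    """Learning rate at a 1-based optimizer step (paper schedule times lr_mul)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup for the first n_warmup_steps, then inverse-square-root decay.
    scale = min(step ** -0.5, step * n_warmup_steps ** -1.5)
    return lr_mul * d_model ** -0.5 * scale


# d_model=512 is an assumption for illustration; the other values come from the list above.
for step in (1, 1000, 4000, 16000):
    print(step, transformer_lr(step))
```

The rate peaks at step == n_warmup_steps, so a larger warmup value lowers and delays the peak.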

Testing

  • coming soon.

TODO

  • Evaluation on the generated text.
  • Attention weight plot.

Acknowledgement

  • The byte pair encoding parts are borrowed from subword-nmt.
  • The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT-py.
  • Thanks for the suggestions from @srush, @iamalbert, @Zessay, @JulesGM, @ZiJianZhao, and @huanghoujing.

attention-is-all-you-need-pytorch's People

Contributors

jadore801120, mattiadg, sliedes, tony2037, yuhsianghuang, zyex030640417

attention-is-all-you-need-pytorch's Issues

Batch Beam Search Problem

In Beam.py, lines 30-31:

self.next_ys = [self.tt.LongTensor(size).fill_(Constants.PAD)]
self.next_ys[0][0] = Constants.BOS

It seems that only the top hypothesis gets "BOS" as its start token, while all other hypotheses get "PAD". Why don't all hypotheses start with "BOS"?
And in Beam.py, lines 65-68:

        # End condition is when top-of-beam is EOS.
        if self.next_ys[-1][0] == Constants.EOS:
            self.done = True
            self.all_scores.append(self.scores)

You set the end condition to be when the top of the beam is "EOS". Why the top of the beam rather than all hypotheses in the beam?
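
One possible reason for starting only the first hypothesis with BOS, offered as a guess rather than a statement about Beam.py: at the first step every hypothesis has the same history, so scoring all of them would fill the beam with copies of the same best token. The sketch below illustrates this with plain Python lists; the function name and the toy probabilities are hypothetical.

```python
import math


def first_step_topk(log_probs, beam_size, all_start_with_bos):
    """Pick beam_size (hypothesis, token) pairs at the first decoding step.

    log_probs holds the next-token log-probabilities after BOS. They are the
    same for every hypothesis because all hypotheses share the same history.
    """
    n_live = beam_size if all_start_with_bos else 1
    candidates = [
        (log_probs[tok], hyp, tok)
        for hyp in range(n_live)
        for tok in range(len(log_probs))
    ]
    # Highest score first; ties broken by hypothesis index, then token index.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(hyp, tok) for _, hyp, tok in candidates[:beam_size]]


log_probs = [math.log(p) for p in (0.6, 0.3, 0.1)]  # toy distribution
print(first_step_topk(log_probs, 2, all_start_with_bos=True))
print(first_step_topk(log_probs, 2, all_start_with_bos=False))
```

With every hypothesis live, both beam slots end up holding token 0; with a single live hypothesis, the beam holds the two best distinct tokens.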

Dimension error in forward pass

I am receiving the following error when I try to run the train script:

File "train.py", line 266, in <module>
    main()
  File "train.py", line 263, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "train.py", line 124, in train
    train_loss, train_accu = train_epoch(model, training_data, crit, optimizer)
  File "train.py", line 55, in train_epoch
    pred = model(src, tgt)
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/attention-is-all-you-need-pytorch/transformer/Models.py", line 179, in forward
    enc_outputs, enc_slf_attns = self.encoder(src_seq, src_pos)
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/attention-is-all-you-need-pytorch/transformer/Models.py", line 76, in forward
    enc_output, slf_attn_mask=enc_slf_attn_mask)
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/attention-is-all-you-need-pytorch/transformer/Layers.py", line 18, in forward
    enc_input, enc_input, enc_input, attn_mask=slf_attn_mask)
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/attention-is-all-you-need-pytorch/transformer/SubLayers.py", line 68, in forward
    return self.layer_norm(outputs + residual), attns
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/attention-is-all-you-need-pytorch/transformer/Modules.py", line 52, in forward
    ln_out = (z - mu.expand_as(z)) / (sigma.expand_as(z) + self.eps)
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/autograd/variable.py", line 681, in expand_as
    return Expand.apply(self, tensor.size())
  File "/home/ubuntu/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 106, in forward
    result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (24) must match the existing size (64) at non-singleton dimension 1. at /home/ubuntu/cuda-ubuntu-16.04-ec2/pytorch/torch/lib/THC/generic/THCTensor.c:323

d_word_vec and d_model must be equal in Encoder

According to the paper, d_word_vec and d_model must be equal. However, the interface for Encoder allows you to set them to different values. If you initialize an Encoder with different values, you get an error at line 54 of MultiHeadAttention during the forward pass.

Multi-GPUs?

Hi, thanks for sharing.
It seems that this code does not support multiple GPUs.
Are you planning to add support for that?

Model training error

Training the model throws the error below:

python train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight
Namespace(batch_size=64, cuda=True, d_inner_hid=1024, d_k=64, d_model=512, d_v=64, d_word_vec=512, data='data/multi30k.atok.low.pt', dropout=0.1, embs_share_weight=False, epoch=10, log=None, max_token_seq_len=52, n_head=8, n_layers=6, n_warmup_steps=4000, no_cuda=False, proj_share_weight=True, save_mode='best', save_model='trained', src_vocab_size=2909, tgt_vocab_size=3150)
('[ Epoch', 0, ']')

  - (Training) : 0%| | 0/453 [00:00<?, ?it/s]/home/user/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py:357: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  result = self.forward(*input, **kwargs)
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 268, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "train.py", line 126, in train
    train_loss, train_accu = train_epoch(model, training_data, crit, optimizer)
  File "train.py", line 57, in train_epoch
    pred = model(src, tgt)
  File "/home/user/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/srv/disk01/user/medical_data/attention-is-all-you-need-pytorch/transformer/Models.py", line 192, in forward
    enc_output, _ = self.encoder(src_seq, src_pos)
ValueError: need more than 1 value to unpack

Feeding the output of the last encoding layer to the decoder

The original paper and the animation on this page seem to feed only the output of the last encoder layer to the decoder, while the implementation here seems to feed the output of each encoder layer to the corresponding decoder layer. That would not work if the encoder and the decoder have different numbers of layers.

KeyError on testing

After fixing a key error about tensor integer types, running translate.py raises KeyErrors for numeric keys, and checking in Python confirms that those keys are missing.
Skipping the missing keys inside the write loop avoids the error but gives poor results.
Result of pred.txt after running the code:
[screenshot, 2018-08-23 3:03:00 pm]

Here's the changed code:
[screenshot, 2018-08-23 3:03:14 pm]

Did anybody experience this or have a fix? Thank you.

Memory Problem?

Hi, I cloned your code and trained it on the WMT English-German task, but it failed with "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCStorage.cu:66".
I ran it with the default settings on a Tesla K40, which has the same 12GB memory capacity as your Titan X.
I don't know why this happens. Do you have any idea? Thanks.

Code failing during translation

For translation, I use the following command:
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=1 python3 translate.py -model trained.chkpt -vocab data/nmt.atok.low.pt -src data/nmt/test.en.atok
and get the following error: [error output not preserved]

Can someone help?

Decoder input

Hi, I am not sure if you are feeding the right input to the decoder.

(pg. 2) "Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next."

I believe your decoder input is a batch of target sequences.
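
Feeding a whole target sequence during training is usually reconciled with the auto-regressive description by teacher forcing: the decoder input is the target shifted right, the labels are the target shifted left, and a causal mask limits each position to earlier inputs. The sketch below shows the idea on plain lists. It assumes the training loop performs this shift, which is not confirmed here; the token indices and helper names are hypothetical.

```python
BOS, EOS = 2, 3  # assumed special-token indices, for illustration only


def teacher_forcing_pair(target):
    """Split one target sequence into decoder input and gold labels."""
    dec_input = target[:-1]  # BOS, y1, ..., ym
    gold = target[1:]        # y1, ..., ym, EOS
    return dec_input, gold


def visible_inputs(dec_input, position):
    """Tokens a position may attend to under a causal (subsequent) mask."""
    return dec_input[: position + 1]


target = [BOS, 10, 11, EOS]
dec_input, gold = teacher_forcing_pair(target)
for i, label in enumerate(gold):
    # Each position predicts `label` while seeing only earlier target tokens.
    print(i, visible_inputs(dec_input, i), "->", label)
```

At inference time there is no gold target, so the same decoder has to be run step by step on its own previous outputs.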

MultiHeadAttention() implementation question

Hi, Yu-Hsiang.
I am not clear about line 62 in the file SubLayers.py:

# back to original mb_size batch
outputs = outputs.view(mb_size, len_q, -1)            # mb_size x len_q x (n_head*d_v)

Is this right?
Is the code above equivalent to the code below (TensorFlow implementation from Kyubyong)?

# Restore shape
outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2 )          # (N, T_q, C)
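
Whether the two restores agree depends on how the first dimension is laid out. The sketch below emulates both on nested Python lists, assuming a head-major layout (row index = head * mb_size + batch); that layout is an assumption about the code in question, and all function names are hypothetical. Each element is tagged with its (head, batch) origin so the mixing is visible.

```python
def make_outputs(n_head, mb_size, len_q, d_v):
    """Build a (n_head*mb_size) x len_q x d_v nested list, head-major (assumed)."""
    return [
        [[(h, b) for _ in range(d_v)] for _ in range(len_q)]
        for h in range(n_head)
        for b in range(mb_size)
    ]


def view_restore(outputs, mb_size, len_q):
    """Emulate outputs.view(mb_size, len_q, -1): reinterpret the flat buffer."""
    flat = [x for row in outputs for step in row for x in step]
    width = len(flat) // (mb_size * len_q)
    return [
        [flat[(b * len_q + t) * width:(b * len_q + t + 1) * width]
         for t in range(len_q)]
        for b in range(mb_size)
    ]


def split_concat_restore(outputs, mb_size, len_q):
    """Emulate concatenating the per-head chunks along the last axis."""
    n_head = len(outputs) // mb_size
    return [
        [[x for h in range(n_head) for x in outputs[h * mb_size + b][t]]
         for t in range(len_q)]
        for b in range(mb_size)
    ]


outputs = make_outputs(n_head=2, mb_size=2, len_q=1, d_v=1)
print(view_restore(outputs, 2, 1))
print(split_concat_restore(outputs, 2, 1))
```

Under this layout the plain view places elements from different batch entries in the same row, while the split-and-concatenate form keeps each row to a single batch entry.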

question about softmax

When calculating the attention in MultiHead, the softmax dim is 0, but I think dim=2 is correct.
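
For attention scores shaped batch x len_q x len_k, each query's weights should sum to 1 over the keys, which corresponds to the last axis (dim=2). A minimal plain-Python sketch of that normalization, with hypothetical helper names:

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores):
    """Normalize scores[batch][len_q][len_k] over the key axis (dim=2)."""
    return [[softmax(q_row) for q_row in sample] for sample in scores]


scores = [[[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]]  # batch=1, len_q=2, len_k=3
weights = attention_weights(scores)
print(weights)
```

Normalizing over dim=0 instead would make the weights sum to 1 across batch entries, which couples unrelated samples.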

masking on tensor.data?

Hi,

I noticed that you were masking out the padded tensor by assigning value to tensor.data.

attn.data.masked_fill_(attn_mask, -float('inf'))

Is this correct?

Based on the discussion here, shouldn't we assign values to tensor itself, instead of tensor.data? In this way, the history of the gradient can be tracked.

bugs in the masking code

Hi, I found that the decoder uses a subsequent mask to mask out future information here. However, at line 123 you feed dec_input (the target embedding) into the first layer. Looking at this line and then at the forward function of the MultiHeadAttention module, there is a residual connection that lets dec_input reach the output directly; see here. So the subsequent mask is not used on that path, which means the model knows the future. Am I correct?

Ubuntu Server Unable to recognise German Character

Ubuntu Server : Ein Boston Terrier läuft über saftig-grünes Gras vor einem wei?^?en Zaun.

Macbook Pro : Ein Boston Terrier läuft über saftig-grünes Gras vor einem weißen Zaun.

Can you tell me how to set up the language encoding in Ubuntu? Best Wishes
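
One option that does not depend on the server's locale is to pass the encoding explicitly whenever the data files are read or written in Python. The sketch below round-trips the sentence through a temporary file; the helper name and file name are hypothetical, and it does not change how a terminal renders the text.

```python
import os
import tempfile

SENTENCE = "Ein Boston Terrier läuft über saftig-grünes Gras vor einem weißen Zaun."


def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.de")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


restored = roundtrip_utf8(SENTENCE)
print(restored == SENTENCE)
```

If the characters are only garbled on screen, the terminal or shell locale (for example a non-UTF-8 LANG setting) may be the cause rather than the files themselves.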

nan loss when training

Training and validation loss is nan (using commit e21800a):

$ python3 preprocess.py -train_src data/multi30k/train.en -train_tgt data/multi30k/train.de -valid_src data/multi30k/val.en -valid_tgt data/multi30k/val.de -output data/multi30k/data.pt
$ python3 train.py -data data/multi30k/data.pt -save_model trained -save_model best
[ Epoch 0 ]
  - (Training)   loss:      nan, accuracy: 3.7 %
  - (Validation) loss:      nan, accuracy: 10.0 %
    - [Info] The checkpoint file has been updated.
[ Epoch 1 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.87 %
[ Epoch 2 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.83 %
[ Epoch 3 ]
  - (Training)   loss:      nan, accuracy: 9.1 %
  - (Validation) loss:      nan, accuracy: 9.92 %
[ Epoch 4 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.91 %

Error about the mask in ScaledDotProductAttention

Currently, the attention mask in the ScaledDotProductAttention is generated in Line 28 in Models.py by:
pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)

Ignoring the batch dimension for the explanation, I assume the generated pad_attn_mask is a matrix of shape (len_q, len_k). This code then produces a matrix like [A 1], where 1 is an all-ones submatrix. However, I think the generated attention mask should look like [B 1 // 1 1], where 1 is an all-ones submatrix and // denotes a row break (sorry, I don't know how to type formulas in Markdown).
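
The key-side mask described above can be sketched on plain lists for a single sequence. PAD = 0 and the helper name are assumptions for illustration. Every query row shares the same mask, which gives the [A 1] pattern; whether padded query rows also need masking depends on whether their outputs are used downstream, which this sketch does not settle.

```python
PAD = 0  # assumed padding index


def key_padding_mask(seq_q, seq_k, pad=PAD):
    """Return mask[i][j] = True where key position j is padding.

    The mask has len(seq_q) rows and len(seq_k) columns, and every query
    row is identical because only the keys are inspected.
    """
    key_is_pad = [tok == pad for tok in seq_k]
    return [list(key_is_pad) for _ in seq_q]


seq = [5, 7, PAD]  # two real tokens followed by one padding token
for row in key_padding_mask(seq, seq):
    print(row)
```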

Masking bug?

I get 98% accuracy after 10 epochs on the multi30k validation set using this 1-layer model:

python train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight -dropout 0.0 -n_layers 1 -n_warmup_steps 40 -epoch 50 -d_inner_hid 1 -d_model 128 -d_word_vec 128 -n_head 4

This is a very small model (note -d_inner_hid 1), which should not get good results at all (98% accuracy is far too high in any case). Generating translations with translate.py produces nonsense. This makes me suspect that there is a problem with the masking code that allows the model to 'cheat' by looking at the target sequence.

I haven't been able to figure out where the problem is, but something seems wrong.

TypeError: cat() takes no keyword arguments

Traceback (most recent call last):
  File "train.py", line 266, in <module>
    main()
  File "train.py", line 263, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "train.py", line 124, in train
    train_loss, train_accu = train_epoch(model, training_data, crit, optimizer)
  File "train.py", line 55, in train_epoch
    pred = model(src, tgt)
  File "/home/sushuting/local/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sushuting/workspace/attention-is-all-you-need-pytorch/transformer/Models.py", line 179, in forward
    enc_outputs, enc_slf_attns = self.encoder(src_seq, src_pos)
  File "/home/sushuting/local/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sushuting/workspace/attention-is-all-you-need-pytorch/transformer/Models.py", line 76, in forward
    enc_output, slf_attn_mask=enc_slf_attn_mask)
  File "/home/sushuting/local/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sushuting/workspace/attention-is-all-you-need-pytorch/transformer/Layers.py", line 18, in forward
    enc_input, enc_input, enc_input, attn_mask=slf_attn_mask)
  File "/home/sushuting/local/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sushuting/workspace/attention-is-all-you-need-pytorch/transformer/SubLayers.py", line 62, in forward
    outputs = torch.cat(torch.split(outputs, mb_size, dim=0), dim=-1)
TypeError: cat() takes no keyword arguments

Accuracy drops during training

Hi. I just followed the tutorial to train the model with the dataset given here. However, the accuracy is relatively high at epoch 0 and declines sharply after that. Has anybody run into a similar problem?

Here is the record:

[ Epoch 0 ]
  - (Training)   ppl: 69.89619, accuracy: 48.127 %, elapse: 9.800 min
  - (Validation) ppl: 86.35283, accuracy: 20.530 %, elapse: 3.096 min
    - [Info] The checkpoint file has been updated.
[ Epoch 1 ]
  - (Training)   ppl: 135.71178, accuracy: 32.377 %, elapse: 10.807 min
  - (Validation) ppl: 865.33501, accuracy: 5.777 %, elapse: 3.052 min
[ Epoch 2 ]
  - (Training)   ppl: 193.38618, accuracy: 27.988 %, elapse: 11.013 min
  - (Validation) ppl: 949.73713, accuracy: 4.359 %, elapse: 3.093 min

Docstring style does not follow PEP 8

As mentioned here:

PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:

"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.
"""
For one liner docstrings, please keep the closing """ on the same line.

but most docstrings in the code are written as:

''' document strings '''

Batch size limitation

Hi, I was wondering why the maximum batch size here is ~100 on a GPU with ~11GB of RAM, whereas in tensor2tensor the maximum batch size is 1024?

eval() questions

In your eval_epoch() function you feedforward src and tgt through the model just like the training phase. Is this correct? Shouldn't eval be similar to testing where the model won't know the true target? Should there be an autoregressive step for evaluation where the prediction words are generated one by one and used by subsequent predictions?

Bug in translating

There's a mistake when repeating data for beam search.

The source seq here

src_seq = Variable(src_seq.data.repeat(beam_size, 1))

gets a matrix in the following order for source sequence

seq1
seq2
seq3
seq1
seq2
seq3
seq1
seq2
seq3

while the beam search input here

input_data = torch.stack([b.get_current_state() for b in beam if not b.done])

takes an input like

seq1
seq1
seq1
seq2
seq2
seq2
seq3
seq3
seq3

, both of which are fed into the decoder

            dec_outputs, dec_slf_attns, dec_enc_attns = self.model.decoder(
                input_data, input_pos, src_seq, enc_outputs)

The order of the two inputs does not match.
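
The two orderings described above can be reproduced with plain lists: repeating the whole batch beam_size times tiles it, while the beam inputs are grouped per sentence. The helper names below are hypothetical, and the sketch only illustrates the ordering, not the tensor code itself.

```python
def tile(seqs, beam_size):
    """Repeat the whole batch beam_size times (seq1, seq2, seq3, seq1, ...)."""
    return [s for _ in range(beam_size) for s in seqs]


def repeat_interleave(seqs, beam_size):
    """Repeat each sentence beam_size times in place (seq1, seq1, seq1, seq2, ...)."""
    return [s for s in seqs for _ in range(beam_size)]


batch = ["seq1", "seq2", "seq3"]
print(tile(batch, 3))
print(repeat_interleave(batch, 3))
```

For the decoder inputs to line up, the source side and the beam side need to use the same one of these two orderings.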

Tensor data type error

This may have something to do with pytorch version (I use 0.4.0), but I think people should know:

Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 268, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "train.py", line 126, in train
    train_loss, train_accu = train_epoch(model, training_data, crit, optimizer)
  File "train.py", line 73, in train_epoch
    return total_loss/n_total_words, n_total_correct/n_total_words
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

About label smoothing.

gold = gold * (1 - eps) + (1 - gold) * eps / num_class

Hi, thanks for the implementation. It is very neat and elegant. I noticed that you mentioned "label smoothing" is not done yet, but I also found that you have implemented this line. I think it is correct, but I am not sure what num_class is in sequence-to-sequence models. Should it be equal to the size of the vocabulary?
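
The quoted line can be tried on a small example, taking num_class to be the target vocabulary size (an assumption, which is what the question suggests). The sketch below applies the formula as quoted and, for comparison, a variant that divides by num_class - 1 so the distribution sums to exactly 1. Both helper names are hypothetical.

```python
def smooth_as_quoted(target, num_class, eps=0.1):
    """Apply gold * (1 - eps) + (1 - gold) * eps / num_class to a one-hot vector."""
    one_hot = [1.0 if c == target else 0.0 for c in range(num_class)]
    return [g * (1 - eps) + (1 - g) * eps / num_class for g in one_hot]


def smooth_normalized(target, num_class, eps=0.1):
    """Variant that spreads eps over the num_class - 1 non-target classes."""
    one_hot = [1.0 if c == target else 0.0 for c in range(num_class)]
    return [g * (1 - eps) + (1 - g) * eps / (num_class - 1) for g in one_hot]


vocab_size = 5  # toy vocabulary size, for illustration only
print(smooth_as_quoted(2, vocab_size), sum(smooth_as_quoted(2, vocab_size)))
print(smooth_normalized(2, vocab_size), sum(smooth_normalized(2, vocab_size)))
```

With the quoted formula the smoothed vector sums to 1 - eps / num_class, which is close to 1 for a realistic vocabulary size.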

Positional Encoding

In position_encoding_init, shouldn't it be

[pos / np.power(10000, (i//2)*2 / d_pos_vec ) for i in range(d_pos_vec)]

instead of

[pos / np.power(10000, 2*i/d_pos_vec) for i in range(d_pos_vec)]

In the original formulation, for positions 2i and 2i+1, the power should be 2i / d_model.
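
The formulation in the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), can be sketched in plain Python as follows. The function name is hypothetical; the exponent uses (i // 2) * 2 so that each sine/cosine pair shares one frequency, as proposed above.

```python
import math


def position_encoding(n_position, d_pos_vec):
    """Return an n_position x d_pos_vec table of sinusoidal encodings."""
    table = []
    for pos in range(n_position):
        row = []
        for i in range(d_pos_vec):
            # Dimensions 2k and 2k+1 share the exponent 2k / d_pos_vec.
            angle = pos / math.pow(10000, 2 * (i // 2) / d_pos_vec)
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


table = position_encoding(n_position=3, d_pos_vec=4)
for row in table:
    print(row)
```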

Softmax layer for output probabilities

Hello,

In the main Transformer model the encoder and decoder parts are calculated.
Then they are fed to a linear layer for the target word projections. But shouldn't this layer be followed by a softmax function to calculate the output probabilities, as in the Transformer schematic?

Or am I overlooking something? I can't seem to locate this last softmax function in the code.
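
A common reason for not seeing an explicit softmax is that cross-entropy losses such as PyTorch's torch.nn.functional.cross_entropy take raw logits and apply log-softmax internally. Whether this code relies on that is an assumption here. The plain-Python sketch below shows the computation such a loss performs for one position; the helper names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits."""
    return -log_softmax(logits)[target]


logits = [2.0, 0.5, -1.0]  # toy output of the final linear layer
print(cross_entropy(logits, 0))
# Softmax is monotonic, so the argmax over logits equals the argmax over probabilities.
print(max(range(len(logits)), key=lambda i: logits[i]))
```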

Preprocessing Error

When I run the following command for preprocessing:
for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; done; done;
I get the following errors:
sed: 1: "data/multi30k/train.en": extra characters at the end of d command
sed: 1: "data/multi30k/val.en": extra characters at the end of d command
sed: 1: "data/multi30k/train.de": extra characters at the end of d command
sed: 1: "data/multi30k/val.de": extra characters at the end of d command

Please advise on how I should proceed.
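
This message is typical of BSD/macOS sed, where -i expects a backup-suffix argument, so the script is taken as the suffix and the file name is parsed as the command; passing an empty suffix (sed -i '' ...) is the usual workaround there. A portable alternative is to drop the last line in Python. The sketch below is a hedged stand-in for the sed step, with a hypothetical helper name.

```python
import os
import tempfile


def drop_last_line(path, encoding="utf-8"):
    """Rewrite the file at path without its final line (like sed '$ d')."""
    with open(path, "r", encoding=encoding) as f:
        lines = f.readlines()
    with open(path, "w", encoding=encoding) as f:
        f.writelines(lines[:-1])


# Small demonstration on a temporary file rather than the real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "train.en")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first\nsecond\nlast\n")
    drop_last_line(demo_path)
    with open(demo_path, "r", encoding="utf-8") as f:
        demo_result = f.read()
print(repr(demo_result))
```

To mirror the shell loop, the function would be applied to every non-test .en and .de file under data/multi30k.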

RunTimeError During Training

After training for the first epoch I get the following error trying to calculate training accuracy and loss:

RuntimeError: Expected object of type.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other

At the following line of code at line 102 in train.py
return total_loss/n_total_words, n_total_correct/n_total_words

What command can resume a crashed run?

Hi, thanks for sharing.
My program has crashed.
What command can I use to continue running the program on the GPU?

CUDA_VISIBLE_DEVICES=3 python train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight

Model training error

ejklektov@gpu3:~/attention-is-all-you-need-pytorch$ CUDA_VISIBLE_DEVICES=5 python3 train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight
Namespace(batch_size=64, cuda=True, d_inner_hid=1024, d_k=64, d_model=512, d_v=64, d_word_vec=512, data='data/multi30k.atok.low.pt', dropout=0.1, embs_share_weight=False, epoch=10, log=None, max_token_seq_len=52, n_head=8, n_layers=6, n_warmup_steps=4000, no_cuda=False, proj_share_weight=True, save_mode='best', save_model='trained', src_vocab_size=2909, tgt_vocab_size=3149)
/home/ejklektov/attention-is-all-you-need-pytorch/transformer/Modules.py:13: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
init.xavier_normal(self.linear.weight)
/home/ejklektov/attention-is-all-you-need-pytorch/transformer/SubLayers.py:33: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
init.xavier_normal(self.w_qs)
/home/ejklektov/attention-is-all-you-need-pytorch/transformer/SubLayers.py:34: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
init.xavier_normal(self.w_ks)
/home/ejklektov/attention-is-all-you-need-pytorch/transformer/SubLayers.py:35: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
init.xavier_normal(self.w_vs)
[ Epoch 0 ]

  - (Training) : 0%| | 0/454 [00:00<?, ?it/s]/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py:491: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  result = self.forward(*input, **kwargs)
train.py:71: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  total_loss += loss.data[0]
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 268, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "train.py", line 126, in train
    train_loss, train_accu = train_epoch(model, training_data, crit, optimizer)
  File "train.py", line 73, in train_epoch
    return total_loss/n_total_words, n_total_correct/n_total_words
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

I changed line 71 of train.py,
loss.data[0] ===> loss.item[0]
but it still doesn't work.

Dropout when predicting

Shouldn't we set dropout prob to 0.0 during prediction?
I notice that in SubLayers.py line 27, the attn_dropout was not set for ScaledDotProductAttention

Why is the "get_attn_subsequent_mask" function needed?

What is the difference between encoder self-attention and decoder self-attention?
Why is the "get_attn_subsequent_mask" function needed in decoder self-attention?

Thanks in advance for your reply!
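
In the paper, encoder self-attention lets every position attend to every other position, while decoder self-attention blocks attention to later positions so that a prediction cannot depend on tokens that have not been generated yet. A subsequent (causal) mask encodes that restriction. The sketch below builds one as nested lists; the function name is hypothetical and True means "blocked".

```python
def subsequent_mask(length):
    """Return mask[i][j] = True when query position i must not see key position j."""
    return [[j > i for j in range(length)] for i in range(length)]


for row in subsequent_mask(4):
    # 'x' marks a blocked (future) position, '.' an allowed one.
    print("".join("x" if blocked else "." for blocked in row))
```

Blocked scores are typically set to negative infinity before the softmax, so their attention weights become zero.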

Assert error in validation.

Hi,

I tried to run training on IWSLT en-fr data. The first training epoch finished with loss: nan, but this may be due to my choice of parameters. The problem is that when validation started, I got the following error:

/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [109,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [121,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCTensorMath.cu line=226 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "attention-is-all-you-need-pytorch/train.py", line 244, in <module>
    main()
  File "attention-is-all-you-need-pytorch/train.py", line 241, in main
    train(transformer, training_data, validation_data, crit, optimizer, opt)
  File "attention-is-all-you-need-pytorch/train.py", line 120, in train
    valid_loss, valid_accu = eval_epoch(model, validation_data, crit)
  File "attention-is-all-you-need-pytorch/train.py", line 85, in eval_epoch
    pred = model(src, tgt)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/hardmnt/hltmt0/data/digangi/attention-is-all-you-need-pytorch/transformer/Models.py", line 180, in forward
    enc_outputs, enc_slf_attns = self.encoder(src_seq, src_pos)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/hardmnt/hltmt0/data/digangi/attention-is-all-you-need-pytorch/transformer/Models.py", line 76, in forward
    enc_output, slf_attn_mask=enc_slf_attn_mask)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/hardmnt/hltmt0/data/digangi/attention-is-all-you-need-pytorch/transformer/Layers.py", line 18, in forward
    enc_input, enc_input, enc_input, attn_mask=slf_attn_mask)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/hardmnt/hltmt0/data/digangi/attention-is-all-you-need-pytorch/transformer/SubLayers.py", line 43, in forward
    outputs = torch.cat(outputs, 2)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 841, in cat
    return Concat(dim)(*iterable)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 310, in forward
    return torch.cat(inputs, self.dim)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCTensorMath.cu:226

I have no experience with PyTorch, so I don't know how to fix this at the moment.
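
The assertion `srcIndex < srcSelectDimSize` comes from an index-select kernel, which is what an embedding lookup uses on the GPU. It fires when some index is greater than or equal to the number of rows in the table being indexed. Because CUDA kernels run asynchronously, the Python traceback can point at a later call (here `torch.cat`) rather than at the lookup itself. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1`, or running the same batch on CPU, usually gives a more precise location. Below is a minimal sketch of the range check, in plain Python so it runs without a GPU. The helper name `find_out_of_range` and the toy data are hypothetical and not part of this repository. Whether the offending indices are word ids or the positions in `src_pos` is an assumption to verify against the actual data.

```python
# Hypothetical helper (not part of the repository): scan a batch of integer
# indices and report those that fall outside an embedding table.

def find_out_of_range(batch, table_size):
    """Return (row, col, index) triples for indices outside [0, table_size)."""
    bad = []
    for row, seq in enumerate(batch):
        for col, idx in enumerate(seq):
            if idx < 0 or idx >= table_size:
                bad.append((row, col, idx))
    return bad


# Toy data (assumed layout): positions count from 1, with 0 used as padding.
# The table is assumed to have 5 rows, so valid indices are 0..4.
src_pos = [
    [1, 2, 3, 0, 0],
    [1, 2, 3, 4, 5],  # 5 is out of range for a table of size 5
]

print(find_out_of_range(src_pos, table_size=5))  # prints [(1, 4, 5)]
```

If such a check reports validation positions or word ids beyond the table size, the likely remedies are to enlarge the corresponding embedding table or to truncate or filter the offending sequences. Which one applies here cannot be determined from the log alone.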
