To run on PyTorch 1.0, both the position and the div term need to be initialized as floats.
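A minimal sketch of that fix, modeled on the annotated-transformer `PositionalEncoding` class. The names `position` and `div_term` come from the issue. The rest of the class layout, dropping the old `Variable` wrapper in favor of a plain buffer, and the assumption that `d_model` is even are illustrative, not the repository's exact code. Passing `dtype=torch.float` to `torch.arange` yields a float tensor, so `torch.exp` never sees a `LongTensor`.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding with float `position` and `div_term`.

    Sketch only: assumes an even d_model so the sin/cos halves line up.
    """

    def __init__(self, d_model: int, dropout: float, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(max_len, d_model)
        # dtype=torch.float avoids an integer (LongTensor) result from arange,
        # on which torch.exp is not implemented in PyTorch 1.0.
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float)
            * -(math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even columns
        pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
        # Shape (1, max_len, d_model); a buffer is saved with the model but not trained.
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add the encodings for the first x.size(1) positions.
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)


m = PositionalEncoding(d_model=20, dropout=0.0)
m.eval()
out = m(torch.zeros(1, 100, 20))
```

With a zero input and no dropout, `out` is just the first 100 rows of the encoding table.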


"exp" not implemented for 'torch.LongTensor' pytorch 1.0 (annotated-transformer issue, closed)

harvardnlp commented on May 16, 2024


from annotated-transformer.

Comments (5)

AmoghM commented on May 16, 2024

@xvdp Are you able to run the rest of it? I am getting two errors which I am not able to resolve:

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

.
.
.
<ipython-input-139-2ae4ba63671c> in encode(self, src, src_mask)
     17 
     18     def encode(self, src, src_mask):
---> 19         return self.encoder(self.src_embed(src), src_mask)
     20 
     21     def decode(self, memory, src_mask, tgt, tgt_mask):
.
.
.
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

NotImplementedError

.
.
.
74     def encode(self, src, src_mask):
---> 75         return self.encoder(self.src_embed(src), src_mask)

Both point to self.encoder() as the fault. I can't figure out what is happening. It would be super helpful if you could provide some insight on this. Link to the Jupyter notebook:
https://github.com/AmoghM/DeepLearning/blob/master/TransformerNetwork/HarvardTransformer.ipynb


v-iashin commented on May 16, 2024

@xvdp
Solved

"exp" not implemented for 'torch.LongTensor' pytorch 1.0

by using the original version:

div_term = 1 / (10000 ** (torch.arange(0., d_model, 2) / d_model))

I am not sure why they decided to use .exp, but my best guess is numerical stability.
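The two forms compute the same frequencies, since 1 / 10000^(2i/d_model) = exp(-2i · ln(10000) / d_model). A quick numerical check, assuming `d_model = 512` purely for illustration; the float start value `0.` makes `torch.arange` return a float tensor in both versions. The thread does not settle why the repository prefers the `exp` form.

```python
import math

import torch

d_model = 512  # illustrative size

# Float start value (0.) gives a float tensor, so both forms avoid LongTensor.
steps = torch.arange(0., d_model, 2)

# Form from the original Transformer formula: 1 / 10000^(2i / d_model)
div_power = 1 / (10000 ** (steps / d_model))

# Form used in annotated-transformer: exp(2i * -(ln 10000 / d_model))
div_exp = torch.exp(steps * -(math.log(10000.0) / d_model))

same = torch.allclose(div_power, div_exp)
print(steps.dtype, same)
```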


xvdp commented on May 16, 2024

Glad you solved it, sorry, I hadn't seen your message.
There are a couple of other things that need to be fixed for this to run on PyTorch 1.0+, such as accessing scalars with .item() instead of .data[0],
but curiously I did not run into your problem.
I'll note it here.
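A small sketch of the scalar-access change. The loss here is a stand-in (`mse_loss` on two made-up tensors), not the annotated-transformer criterion. A reduced loss is a 0-dim tensor; `.item()` returns it as a Python number, while the older `loss.data[0]` idiom indexes into a 0-dim tensor and raises an `IndexError` in PyTorch 1.0+.

```python
import torch
import torch.nn.functional as F

# Stand-in scalar loss: mean of squared differences (0 and 4) is 2.0.
loss = F.mse_loss(torch.tensor([1.0, 2.0]), torch.tensor([1.0, 4.0]))

value = loss.item()  # Python float, the PyTorch 0.4+ idiom

# The pre-0.4 idiom indexes a 0-dim tensor, which is an error in 1.0+.
old_idiom_failed = False
try:
    loss.data[0]
except IndexError:
    old_idiom_failed = True

print(loss.dim(), value, old_idiom_failed)
```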


V-Enzo commented on May 16, 2024

@AmoghM
out = greedy_decode(model, src.cuda(), src_mask.cuda(), max_len=60, start_symbol=TGT.vocab.stoi["<s>"])
Try adding .cuda() to src and src_mask; this moves them to the GPU, where the model already is.
I found the answer at the link below:
https://github.com/huggingface/pytorch-pretrained-BERT/issues/227
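The error says the embedding's index tensor (`src`) was on the CPU while the model's weights were on CUDA. A device-agnostic sketch of the same fix: `to_model_device` is a hypothetical helper, the `nn.Embedding` stands in for `src_embed`, and treating token id 0 as padding for the mask is an assumption. The helper reads the device from the model's parameters and moves the inputs there, so it also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn


def to_model_device(model: nn.Module, *tensors: torch.Tensor):
    """Hypothetical helper: move tensors to the device of the model's parameters."""
    device = next(model.parameters()).device
    return tuple(t.to(device) for t in tensors)


# Stand-in for src_embed; placed on the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
embed = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)

src = torch.tensor([[1, 2, 3]])       # token ids, created on the CPU
src_mask = (src != 0).unsqueeze(-2)   # padding mask, assuming pad id 0

src, src_mask = to_model_device(embed, src, src_mask)
out = embed(src)  # ids and weights now share a device
```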


AmoghM commented on May 16, 2024

@V-Enzo Thanks for pointing at this. I will try to do it and report back.

