Comments (27)
Yeah, I will try to do it next week.
Does anyone have a fix that works? I've tried a few of the suggestions above but I still get this error:
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
when trying to evaluate this line:
run_epoch(data_gen(V, 30, 20), model, SimpleLossCompute(model.generator, criterion, model_opt))
I'm using pytorch version 1.0.0
Try this: https://github.com/ictnlp-wshugen/annotated-transformer_codes
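For context on the error above: nn.Embedding requires int64 (Long) indices, and on Windows numpy's default integer is int32, so tensors built with torch.from_numpy arrive as IntTensor. A minimal illustration of the cast that resolves it (the sizes are just the toy task's):

import numpy as np
import torch
import torch.nn as nn

emb = nn.Embedding(11, 512)                     # vocab V=11, d_model=512 as in the toy example
data = np.random.randint(1, 11, size=(30, 10))  # int32 on Windows
indices = torch.from_numpy(data).long()         # without .long(): the RuntimeError above
out = emb(indices)                              # works once indices are a LongTensor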
@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.

I forked your repo, but I get the result below:
Process finished with exit code 136 (interrupted by signal 8: SIGFPE)
Do you know why?
It seems that there is something wrong with the floating-point handling.
I made a little change in run_epoch(), and the problem was solved:
total_tokens += batch.ntokens.float()
tokens += batch.ntokens.float()
I'm still getting the same error when using the code from ictnlp-wshugen above. Here's the full error (note: I'm using python 3.7.1 on Windows, pytorch 1.0.0, CUDA 9.0.)

Error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

python first.py
C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "first.py", line 58, in <module>
    run_epoch(data_gen(V, 30, 20), model, SimpleLossCompute(model.generator, criterion, model_opt))
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\flow.py", line 54, in run_epoch
    out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 24, in forward
    return self.decode(self.encode(src, src_mask), src_mask, tgt, tgt_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 27, in encode
    return self.encoder(self.src_embed(src), src_mask)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\embeddings.py", line 16, in forward
    return self.lut(x) * math.sqrt(self.d_model)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
OK, the problem occurs on the Windows platform; I have now made it runnable on Windows.
You can update the code or see the change: ictnlp-wshugen/annotated-transformer_codes@ffe3bcc
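For anyone applying that change by hand: the root cause is that numpy's default integer is 32-bit on Windows, so the toy data generator needs an explicit cast. A hedged sketch modeled on the notebook's data_gen (the Batch wrapper is left out):

import numpy as np
import torch

def data_gen(V, batch_size, nbatches):
    "Random copy-task batches; .long() guards against Windows' 32-bit numpy ints."
    for _ in range(nbatches):
        data = torch.from_numpy(
            np.random.randint(1, V, size=(batch_size, 10))).long()
        data[:, 0] = 1  # fixed start symbol, as in the notebook
        yield data, data.clone()  # (src, tgt); wrap in the notebook's Batch as needed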
I gave it a try and found the loss not decreasing below 1.5 for the first copying example. It was the same for pytorch 0.2 as well.
I have gotten it to work with pytorch 0.4 up to and including the copying example. It required inserting a few .item() and .float() calls. Would you like a PR?
Yep, it's as simple as @vthorsteinsson mentioned, at least up to the first example. Jumping between pytorch versions, I had missed the dim parameter in softmax.
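For readers hitting the same thing: since PyTorch 0.4, calling softmax without an explicit dimension is deprecated (the implicitly chosen dim is easy to get wrong in the attention code). A minimal sketch:

import torch
import torch.nn.functional as F

scores = torch.randn(2, 5, 5)        # e.g. attention scores
p_attn = F.softmax(scores, dim=-1)   # pass dim explicitly; omitting it warns and may pick the wrong axis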
@vthorsteinsson would it be possible to have your "corrections"?
@srush Any updates on how to fix things for PyTorch 0.4? Have been stuck on this for quite some time, thank you!
Many may have already solved it, but for those who face the same issue with the toy problem: I fixed the error by changing SimpleLossCompute and run_epoch, mostly because of the tensor norm, which has to be cast from LongTensor to FloatTensor:
class SimpleLossCompute:
    "A simple loss compute and train function."

    def __init__(self, generator, criterion, opt=None):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt

    def __call__(self, x, y, norm):
        x = self.generator(x)
        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),
                              y.contiguous().view(-1)) / norm.float()
        loss.backward()
        if self.opt is not None:
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return loss.item() * norm.float().item()
The run_epoch function can be fixed accordingly, by casting things like batch.ntokens to FloatTensor; see the sketch below.
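A hedged sketch of what that run_epoch fix can look like, with names as in the notebook:

import time

def run_epoch(data_iter, model, loss_compute):
    "Standard training/logging loop, with ntokens cast to float before dividing."
    start = time.time()
    total_tokens = 0.
    total_loss = 0.
    tokens = 0.
    for i, batch in enumerate(data_iter):
        out = model.forward(batch.src, batch.trg,
                            batch.src_mask, batch.trg_mask)
        loss = loss_compute(out, batch.trg_y, batch.ntokens)
        total_loss += loss
        total_tokens += batch.ntokens.float()  # cast: ntokens is a LongTensor
        tokens += batch.ntokens.float()
        if i % 50 == 1:
            elapsed = time.time() - start
            print("Epoch Step: %d Loss: %f Tokens per Sec: %f" %
                  (i, loss / batch.ntokens.float(), tokens / elapsed))
            start = time.time()
            tokens = 0.
    return total_loss / total_tokens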
@richielo hi, I find that after moving to PyTorch 0.4, the cached memory keeps increasing during training, even though I free it after every batch.
@DaoD Hi, I think this may be a different issue. After fixing it in the way I mentioned above, it works fine except for multi-GPU training. Are you sure you are freeing memory and declaring variables correctly?
@richielo Thanks. I found that I did not use norm.item(), which may cause the parameter to be created multiple times.
I also replaced loss.data[0] with loss.item(); I guess this is more stable and avoids warnings.
Anyway, thanks for your warm help.
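For context, a minimal illustration of the difference; the commented-out lines show the patterns that leak memory or fail on PyTorch >= 0.4:

import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()

total = 0.0
total += loss.item()  # .item() detaches to a Python float; no graph is kept alive
# total += loss       # accumulating the tensor retains the autograd graph every step
# loss.data[0]        # pre-0.4 idiom; errors on 0-dim tensors in 0.4+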
@DaoD Oh yes, I missed that too! Glad that you figured it out 👍
Any updates on this? I've applied some fixes, but many others are still required...
@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.
@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.
I forked your repo, but I get the result below:
Process finished with exit code 136 (interrupted by signal 8: SIGFPE)
Do you know why?
@QianhuiWu hi, it looks like there is a "divide by zero" error.
I guess it happens when total_tokens == 0, and I think there should be a check for that in run_epoch().
But the variables total_tokens and tokens count the tokens in the epoch and thus should be integers; I'm confused about why this can be fixed by making them floats.
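A plausible mechanism, for what it's worth: batch.ntokens is a LongTensor, and C-level integer division by zero aborts the process with signal 8 (SIGFPE), whereas float division by zero just yields inf. A tiny illustration, with the dangerous line commented out:

import torch

zero = torch.tensor(0)                   # LongTensor, like ntokens of an all-padding batch
# torch.tensor(5) / zero                 # integer division by zero: may abort with SIGFPE (older PyTorch)
print(torch.tensor(5.) / zero.float())   # float division by zero -> tensor(inf), no crash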
Maybe it is the norm passed to SimpleLossCompute that should be blamed. I solved this by fixing the Batch class:

self.ntokens = (self.trg_y != pad).data.sum()
-> self.ntokens = (self.trg_y != pad).data.sum().item()

since (self.trg_y != pad).data.sum() is a tensor.
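For reference, a trimmed sketch of the notebook's Batch class with that change applied (target-mask construction is elided):

class Batch:
    "Holds a batch of data with masks during training (trimmed sketch)."
    def __init__(self, src, trg=None, pad=0):
        self.src = src
        self.src_mask = (src != pad).unsqueeze(-2)
        if trg is not None:
            self.trg = trg[:, :-1]
            self.trg_y = trg[:, 1:]
            # trg_mask construction elided; see the notebook
            # .item() turns the 0-dim LongTensor count into a plain Python int
            self.ntokens = (self.trg_y != pad).data.sum().item()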
@vthorsteinsson @srush @QianhuiWu @shehel @teucer @DaoD @eraoul @ictnlp-wshugen
Please take a look at this issue:
#28
I need help running it on pytorch 1.0.1.post2.
I'm still getting the same error when using the code from ictnlp-wshugen above. [...]

OK, the problem occurs on the Windows platform; I have now made it runnable on Windows. [...]

Well, I've tried your work. Everything goes well on a single GPU. But when I train it on multiple GPUs, the situation is like #16. Do you have any solution for it?
@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you. [...]

It seems that there is something wrong with the floating-point handling. I made a little change in run_epoch(), and the problem was solved. [...]

Thank you for sharing your code!
I'm still getting the same error when using the code from ictnlp-wshugen above. [...]

Well, I've tried your work. Everything is going well running on a single GPU. [...]

Hi, would you mind sharing how to run it on a single GPU? I keep getting a CUDA out-of-memory error (using Google Colab).