
Comments (27)

srush avatar srush commented on May 22, 2024 4

Yeah, I will try to do it next week.

from annotated-transformer.

ictnlp-wshugen avatar ictnlp-wshugen commented on May 22, 2024 4

Does anyone have a fix that works? I've tried a few of the suggestions above but I still get this error:

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

when trying to evaluate this line:

 run_epoch(data_gen(V, 30, 20), model, 
           SimpleLossCompute(model.generator, criterion, model_opt))

I'm using pytorch version 1.0.0

Try this: https://github.com/ictnlp-wshugen/annotated-transformer_codes

from annotated-transformer.

QianhuiWu avatar QianhuiWu commented on May 22, 2024 1

@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.

I forked your repo, but got the result below:
Process finished with exit code 136 (interrupted by signal 8: SIGFPE)

Do you know why?

It seems that there is something wrong with the floating-point handling.
I made a small change in run_epoch(), and the problem was solved:

total_tokens += batch.ntokens.float()
tokens += batch.ntokens.float()

from annotated-transformer.

ictnlp-wshugen avatar ictnlp-wshugen commented on May 22, 2024 1

I'm still getting the same error when using the code from ictnlp-wshugen above. Here's the full error (note: I'm using python 3.7.1 on Windows, pytorch 1.0.0, CUDA 9.0.)

Error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

python first.py
C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "first.py", line 58, in <module>
    run_epoch(data_gen(V, 30, 20), model, SimpleLossCompute(model.generator, criterion, model_opt))
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\flow.py", line 54, in run_epoch
    out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 24, in forward
    return self.decode(self.encode(src, src_mask), src_mask, tgt, tgt_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 27, in encode
    return self.encoder(self.src_embed(src), src_mask)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\embeddings.py", line 16, in forward
    return self.lut(x) * math.sqrt(self.d_model)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

OK, the problem occurs on the Windows platform; I have now made it runnable on Windows.
You can update the code or see the change: ictnlp-wshugen/annotated-transformer_codes@ffe3bcc
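
For reference, a minimal sketch of the kind of change that fixes the Long/Int mismatch, assuming data_gen builds the toy batches with np.random.randint (whose default dtype is int32 on Windows) and that Batch is the helper class from the notebook:

import numpy as np
import torch

def data_gen(V, batch, nbatches):
    "Generate random data for the copy task, with indices cast to int64."
    for _ in range(nbatches):
        # np.random.randint yields int32 on Windows, which becomes an IntTensor;
        # nn.Embedding requires LongTensor indices, so cast explicitly.
        data = torch.from_numpy(np.random.randint(1, V, size=(batch, 10))).long()
        data[:, 0] = 1
        yield Batch(data.clone(), data.clone(), 0)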

from annotated-transformer.

shehel avatar shehel commented on May 22, 2024

I gave it a try and found the loss not decreasing below 1.5 for the first copying example. It was the same for PyTorch 0.2 as well.

from annotated-transformer.

vthorsteinsson avatar vthorsteinsson commented on May 22, 2024

I have gotten it to work with pytorch 0.4 up until and including the copying example. It required inserting a few .item() and .float() calls. Would you like a PR?

from annotated-transformer.

shehel avatar shehel commented on May 22, 2024

Yep, it's as simple as @vthorsteinsson mentioned, at least up to the first example. Jumping between PyTorch versions, I had missed the dim parameter in softmax.
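
For reference, the attention softmax just needs an explicit dimension on newer PyTorch versions; a standalone sketch (the shapes here are illustrative, not taken from the notebook):

import torch
import torch.nn.functional as F

# scores: (batch, heads, query_len, key_len) attention logits
scores = torch.randn(2, 8, 5, 5)
# Normalize over the last (key) dimension explicitly instead of relying on the old default.
p_attn = F.softmax(scores, dim=-1)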

from annotated-transformer.

teucer avatar teucer commented on May 22, 2024

@vthorsteinsson would it be possible to have your "corrections"?

from annotated-transformer.

richielo avatar richielo commented on May 22, 2024

@srush Any updates on how to fix things for PyTorch 0.4? Have been stuck on this for quite some time, thank you!

from annotated-transformer.

richielo avatar richielo commented on May 22, 2024

Many may have already solved it, but for those who face the same issue with the toy problem: I fixed the error by changing SimpleLossCompute and run_epoch, mainly because the norm tensor has to be converted from a LongTensor to a FloatTensor:

class SimpleLossCompute:
    "A simple loss compute and train function."
    def __init__(self, generator, criterion, opt=None):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt
    
    def __call__(self, x, y, norm):
        x = self.generator(x)
        loss = self.criterion(x.contiguous().view(-1, x.size(-1)), y.contiguous().view(-1)) / norm.float()
        loss.backward()
        if self.opt is not None:
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return loss.item() * norm.float().item()

The run_epoch function can be fixed accordingly by casting things like batch.ntokens to FloatTensor.
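
A minimal sketch of run_epoch with those casts applied (the structure follows the notebook; the timing and logging logic is unchanged, and batch.ntokens is assumed to be a 0-dim LongTensor as in the original Batch class):

import time

def run_epoch(data_iter, model, loss_compute):
    "Standard training loop, with the token counts cast to float (PyTorch 0.4+)."
    start = time.time()
    total_tokens = 0.0
    total_loss = 0.0
    tokens = 0.0
    for i, batch in enumerate(data_iter):
        out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
        loss = loss_compute(out, batch.trg_y, batch.ntokens)
        # batch.ntokens is a LongTensor; cast before accumulating so every
        # later division is done in floating point.
        ntokens = batch.ntokens.float().item()
        total_loss += loss
        total_tokens += ntokens
        tokens += ntokens
        if i % 50 == 1:
            elapsed = time.time() - start
            print("Epoch Step: %d Loss: %f Tokens per Sec: %f" %
                  (i, loss / ntokens, tokens / elapsed))
            start = time.time()
            tokens = 0.0
    return total_loss / total_tokens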

from annotated-transformer.

DaoD avatar DaoD commented on May 22, 2024

@richielo Hi, I find that after moving to PyTorch 0.4, the cached memory keeps increasing during training, even though I free it after every batch.

from annotated-transformer.

richielo avatar richielo commented on May 22, 2024

@DaoD Hi, I think this may be a different issue. After fixing it the way I mentioned above, it works fine except for multi-GPU training. Are you sure you are freeing memory and declaring variables correctly?

from annotated-transformer.

DaoD avatar DaoD commented on May 22, 2024

@richielo Thanks. I found that I was not using norm.item(); that may have been keeping extra tensors around.
I also replaced loss.data[0] with loss.item(), which I believe is more stable and avoids the warnings.
Anyway, thanks for your help.
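
As background, a generic standalone sketch (not the notebook's code) of why .item() matters here: accumulating a tensor that is still attached to the autograd graph keeps the whole graph alive across batches, while .item() stores only a Python float.

import torch

total = 0
for _ in range(3):
    x = torch.randn(4, requires_grad=True)
    loss = (x ** 2).sum()
    total = total + loss      # total stays a graph node, so graph buffers accumulate

total = 0.0
for _ in range(3):
    x = torch.randn(4, requires_grad=True)
    loss = (x ** 2).sum()
    total += loss.item()      # .item() detaches a plain float; the graph can be freed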

from annotated-transformer.

richielo avatar richielo commented on May 22, 2024

@DaoD Oh yes, I missed that too! Glad that you figured it out 👍

from annotated-transformer.

davidalbertonogueira avatar davidalbertonogueira commented on May 22, 2024

Any updates on this? I've applied some fixes, but many others are still required...

from annotated-transformer.

DaoD avatar DaoD commented on May 22, 2024

@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.

from annotated-transformer.

QianhuiWu avatar QianhuiWu commented on May 22, 2024

@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.

I forked your repo, but got the result below:
Process finished with exit code 136 (interrupted by signal 8: SIGFPE)

Do you know why?

from annotated-transformer.

DaoD avatar DaoD commented on May 22, 2024

@QianhuiWu Hi, it looks like there is a "divide by zero" error.
I guess it would happen when total_tokens = 0, and I think there should be a check for that in run_epoch().
But the parameters 'total_tokens' and 'tokens' are counts of tokens in the epoch and thus should be integers; I'm confused why this can be fixed by making them floats.
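
A plausible explanation, shown with a standalone sketch (not the repository's code): integer tensor division by zero is undefined behaviour and, on the PyTorch builds of that era, could abort the whole process with SIGFPE (exit code 136 = 128 + signal 8), whereas float division by zero follows IEEE 754 and just yields inf.

import torch

# torch.tensor(10) / torch.tensor(0)          # integer division by zero: crash or error, version-dependent
print(torch.tensor(10.0) / torch.tensor(0.0))  # float division by zero: tensor(inf)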

from annotated-transformer.

bigempire avatar bigempire commented on May 22, 2024

SimpleLossCompute

Maybe it is the norm that is to blame; I solved this by fixing the Batch class:
self.ntokens = (self.trg_y != pad).data.sum()
-> self.ntokens = (self.trg_y != pad).data.sum().item()
since (self.trg_y != pad).data.sum() is a tensor.
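
To see the difference concretely, a standalone sketch (the pad value and tensor contents are illustrative):

import torch

pad = 0
trg_y = torch.tensor([[3, 5, 0, 0],
                      [2, 4, 6, 0]])
ntokens_tensor = (trg_y != pad).sum()   # 0-dim LongTensor
ntokens = ntokens_tensor.item()         # plain Python int
print(ntokens_tensor, ntokens)          # tensor(5) 5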

from annotated-transformer.

eraoul avatar eraoul commented on May 22, 2024

Does anyone have a fix that works? I've tried a few of the suggestions above but I still get this error:

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

when trying to evaluate this line:

 run_epoch(data_gen(V, 30, 20), model, 
           SimpleLossCompute(model.generator, criterion, model_opt))

I'm using pytorch version 1.0.0

from annotated-transformer.

eraoul avatar eraoul commented on May 22, 2024

I'm still getting the same error when using the code from ictnlp-wshugen above. Here's the full error (note: I'm using python 3.7.1 on Windows, pytorch 1.0.0, CUDA 9.0.)

Error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

python first.py
C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "first.py", line 58, in <module>
    run_epoch(data_gen(V, 30, 20), model, SimpleLossCompute(model.generator, criterion, model_opt))
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\flow.py", line 54, in run_epoch
    out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 24, in forward
    return self.decode(self.encode(src, src_mask), src_mask, tgt, tgt_mask)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\encoder_decoder.py", line 27, in encode
    return self.encoder(self.src_embed(src), src_mask)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\OneDrive - Microsoft\code\pytorch\transformer\refactored-annotated-transformer_codes-master\transformer\embeddings.py", line 16, in forward
    return self.lut(x) * math.sqrt(self.d_model)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\ernichol\AppData\Local\Continuum\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

from annotated-transformer.

AmoghM avatar AmoghM commented on May 22, 2024

@vthorsteinsson @srush @QianhuiWu @shehel @teucer @DaoD @eraoul @ictnlp-wshugen
Please take a look at this issue:
#28
I need help running it on PyTorch 1.0.1.post2.

from annotated-transformer.

benbijituo avatar benbijituo commented on May 22, 2024

I'm still getting the same error when using the code from ictnlp-wshugen above. Here's the full error (note: I'm using python 3.7.1 on Windows, pytorch 1.0.0, CUDA 9.0.)
Error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

OK, the problem occurs on the Windows platform; I have now made it runnable on Windows.
You can update the code or see the change: ictnlp-wshugen/annotated-transformer_codes@ffe3bcc

Well, I've tried your work. Everything goes well running on a single GPU, but when I train it on multiple GPUs, I run into the situation described in #16. Do you have any solution for it?

from annotated-transformer.

qzxyxiaobao avatar qzxyxiaobao commented on May 22, 2024

@davidalbertonogueira you can refer to my repository.
https://github.com/DaoD/annotated-transformer/tree/master/src
Hope this can help you.

I forked your repo, but got the result below:
Process finished with exit code 136 (interrupted by signal 8: SIGFPE)
Do you know why?

It seems that there is something wrong with the floating-point handling.
I made a small change in run_epoch(), and the problem was solved:

total_tokens += batch.ntokens.float()
tokens += batch.ntokens.float()

Thank you for sharing your code!

from annotated-transformer.

yuvaraj91 avatar yuvaraj91 commented on May 22, 2024

I'm still getting the same error when using the code from ictnlp-wshugen above. Here's the full error (note: I'm using python 3.7.1 on Windows, pytorch 1.0.0, CUDA 9.0.)
Error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

OK, the problem occurs on the Windows platform; I have now made it runnable on Windows.
You can update the code or see the change: ictnlp-wshugen/annotated-transformer_codes@ffe3bcc

Well, I've tried your work. Everything goes well running on a single GPU, but when I train it on multiple GPUs, I run into the situation described in #16. Do you have any solution for it?

Hi, would you mind sharing how you ran it on a single GPU? I keep getting a CUDA out-of-memory error (using Google Colab).

from annotated-transformer.
