utils.pytorch's People

Contributors

eladhoffer, nadavbh12, scottclowe, tbennun

utils.pytorch's Issues

Usage examples

Very nice utils!

It'd be cool if you provided usage examples for some/all of them.

I am particularly interested in the cross entropy with label smoothing.

Thanks!
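
For what it's worth, a minimal self-contained sketch of label-smoothed cross entropy in plain PyTorch is below. It is not necessarily this repo's exact API (that is the thing being asked about); the function name and defaults are assumptions for illustration only.

import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, smooth_eps=0.1):
    # cross entropy against a uniformly smoothed target distribution:
    # q'(k) = (1 - eps) * onehot(k) + eps / K
    lsm = F.log_softmax(logits, dim=-1)
    nll = -lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    uniform = -lsm.mean(dim=-1)          # = -1/K * sum_k log p(k)
    loss = (1.0 - smooth_eps) * nll + smooth_eps * uniform
    return loss.mean()

logits = torch.randn(8, 5)               # batch of 8, 5 classes
target = torch.randint(0, 5, (8,))
print(smoothed_cross_entropy(logits, target, smooth_eps=0.1))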

smoothed cross entropy loss

Hi, I noticed you recently changed cross_entropy.py to account for ignore_index, but I think the new version might still be missing a step at the end. In cross_entropy.py, when you calculate the smoothed cross entropy as the sum of the CE and KL terms, averaged over the number of tokens in the batch, shouldn't you subtract the padding tokens from the token count in the denominator before you average?

if reduce:
    kl = kl.mean() if size_average else kl.sum()

# for label smoothing with parameter eps:
if onehot_smoothing:
    entropy = -(math.log(1 - smooth_eps) + smooth_eps *
                math.log(smooth_eps / ((num_classes - 1) * (1 - smooth_eps))))
else:
    entropy = -(target * target.log()).sum()

if size_average:
    kl *= num_classes
    entropy /= logits.size(0)

Here, when you divide by logits.size(0), I think logits.size(0) = batch_size * sequence_length, which includes padding tokens in the total count. Shouldn't it be something like:

num_tokens = targets.ne(ignore_idx).sum()
...
kl = kl.sum() / num_tokens
...
entropy /= num_tokens 

Maybe with some epsilon added to make sure you don't divide by zero? Apologies if I'm wrong about this.
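
Roughly, something along these lines is what I have in mind - a sketch only, not the repo's actual code, with ignore_idx being the padding label:

import torch
import torch.nn.functional as F

def smoothed_ce_ignore_pad(logits, targets, smooth_eps=0.1, ignore_idx=0):
    # logits: (num_tokens, num_classes), targets: (num_tokens,)
    lsm = F.log_softmax(logits, dim=-1)
    pad_mask = targets.eq(ignore_idx)
    safe_targets = targets.masked_fill(pad_mask, 0)   # any valid class index
    nll = -lsm.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    uniform = -lsm.mean(dim=-1)
    per_token = (1.0 - smooth_eps) * nll + smooth_eps * uniform
    per_token = per_token.masked_fill(pad_mask, 0.0)  # drop padding positions
    num_tokens = (~pad_mask).sum().clamp(min=1)       # guard against all-pad batches
    return per_token.sum() / num_tokens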

Several Issues about the quantize.py

Thanks for writing this code; I am using quantize.py to quantize my model, and I ran into the following issues:

  1. model.register_buffer(n, p): the string n cannot include '.', so I replaced every '.' with '_' and the problem was solved.
  2. q_x.scale * (q_x.tensor.float() - q_x.zero_point): maybe the ByteTensor should be converted to a float tensor before this operation?
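
For illustration, a rough sketch of both workarounds; the QTensor namedtuple and the helper names below are hypothetical, not quantize.py's actual API:

import torch
import torch.nn as nn
from collections import namedtuple

# hypothetical container, stands in for whatever quantize.py uses
QTensor = namedtuple('QTensor', ['tensor', 'scale', 'zero_point'])

def register_quantized(module, name, q_x):
    # register_buffer rejects names containing '.', so replace them with '_'
    module.register_buffer(name.replace('.', '_'), q_x.tensor)

def dequantize(q_x):
    # cast the stored byte tensor to float before applying scale / zero-point
    return q_x.scale * (q_x.tensor.float() - q_x.zero_point)

m = nn.Linear(4, 4)
q_w = QTensor(tensor=torch.randint(0, 256, (4, 4), dtype=torch.uint8),
              scale=0.1, zero_point=128)
register_quantized(m, 'layer1.weight_q', q_w)   # stored as 'layer1_weight_q'
print(dequantize(q_w))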

ImportError: cannot import name DEFAULT_PALETTE

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    from seq2seq.tools.utils.log import setup_logging
  File "/home/demobin/github/seq2seq.pytorch/seq2seq/tools/utils/log.py", line 9, in <module>
    from bokeh.plotting.helpers import DEFAULT_PALETTE
ImportError: cannot import name DEFAULT_PALETTE
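
One possible workaround is to guard the import and fall back to a palette from bokeh.palettes; newer bokeh versions removed bokeh.plotting.helpers, and the choice of Spectral6 below is an assumption, not necessarily what log.py originally used:

try:
    from bokeh.plotting.helpers import DEFAULT_PALETTE   # older bokeh
except ImportError:
    try:
        from bokeh.palettes import Spectral6 as DEFAULT_PALETTE   # newer bokeh
    except ImportError:
        # last resort: hardcode a small list of hex colors
        DEFAULT_PALETTE = ['#1f77b4', '#ff7f0e', '#2ca02c',
                           '#d62728', '#9467bd', '#8c564b']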

Possible Bug in LabelSmoothing for Binary Cross Entropy

Current Code:

smooth_eps = smooth_eps or 0
if smooth_eps > 0:
    target = target.float()
    target.add_(smooth_eps).div_(2.)

Shouldn't it be:

smooth_eps = smooth_eps or 0
if smooth_eps > 0:
    target = target.float() * (1 - smooth_eps)
    target = target + (smooth_eps / 2)
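
A quick numeric check of the two variants with smooth_eps = 0.1: the current code maps targets {0, 1} to {0.05, 0.55}, while the proposed fix maps them to {0.05, 0.95}, i.e. it smooths symmetrically by smooth_eps/2 on each side:

import torch

smooth_eps = 0.1
target = torch.tensor([0.0, 1.0])

current = (target + smooth_eps) / 2.0                  # tensor([0.0500, 0.5500])
proposed = target * (1 - smooth_eps) + smooth_eps / 2  # tensor([0.0500, 0.9500])
print(current, proposed)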

Question about label smoothing implementation

Hi, I've been going over your implementation of label smoothing for cross-entropy, and I don't understand why, in this code in cross_entropy.py:

        eps_sum = smooth_eps / num_classes
        eps_nll = 1. - eps_sum - smooth_eps
        likelihood = lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
        loss = -(eps_nll * likelihood + eps_sum * lsm.sum(-1))

you have eps_nll = 1. - eps_sum - smooth_eps instead of just eps_nll = 1. - smooth_eps. Doesn't eps_nll = 1. - eps_sum - smooth_eps introduce an extra term in the loss that shouldn't be there? Going by the paper,

sum_k q(k) log p(k),

is likelihood in the above snippet and

sum_k log p(k),

is lsm.sum(-1). The label-smoothed loss, for uniform u(k), should be

  -(1 - epsilon) sum_k q(k) log p(k) - (epsilon/K) sum_k log p(k),

so shouldn't it be

loss = -((1 - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))?
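
A small runnable comparison of the two variants against the loss computed directly from the smoothed distribution q'(k) = (1 - epsilon) * onehot(k) + epsilon/K (illustrative only, not the repo's test code):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
smooth_eps, num_classes = 0.1, 5
logits = torch.randn(4, num_classes)
target = torch.randint(0, num_classes, (4,))

lsm = F.log_softmax(logits, dim=-1)
likelihood = lsm.gather(-1, target.unsqueeze(-1)).squeeze(-1)
eps_sum = smooth_eps / num_classes

loss_repo = -((1. - eps_sum - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))
loss_issue = -((1. - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

# loss computed directly from the smoothed target distribution q'
q = torch.full((4, num_classes), eps_sum)
q.scatter_(1, target.unsqueeze(-1), 1. - smooth_eps + eps_sum)
loss_direct = -(q * lsm).sum(-1)

print(loss_repo.mean(), loss_issue.mean(), loss_direct.mean())
# loss_issue matches loss_direct; loss_repo = loss_issue + eps_sum * likelihood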

tests / instructions

Hi Elad - what's up?
I am trying to use the smooth cross entropy, but I am not sure exactly what the expected input is. Could you add some rough documentation or an example, please?
Thanks,
Dan
