
Comments (9)

juntang-zhuang commented on August 22, 2024 3

@Mut1nyJD @FilipAndersson245 Hi, I just tried a workaround for the mixed-precision issue: cast the weight and gradient to float32, perform the update, then cast the weight back to float16. This way the float32 cost applies only to the weight update, not to the backward pass, so the computational overhead should be small. See the code linked at juntang-zhuang/Adabelief-Optimizer#31 (comment). Please let me know if you have other suggestions.
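
A minimal sketch of the upcast, update, downcast idea, for illustration only. It is not the code from the linked comment. It uses a plain Adam step on a single scalar weight, simulates float16 storage with the standard-library `struct` module, and uses Python's float64 as a stand-in for float32. The names `to_fp16` and `adam_step_high_precision` are hypothetical.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def adam_step_high_precision(weight_fp16, grad_fp16, state,
                             lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: upcast, update in high precision, downcast the weight."""
    # Upcast. Python floats are float64 here, standing in for float32.
    w = float(weight_fp16)
    g = float(grad_fp16)

    # Optimizer state (step count and moment estimates) stays in high precision.
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])

    # eps=1e-8 is representable at this precision, so the denominator stays positive.
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

    # Downcast only the final weight back to float16 storage.
    return to_fp16(w)


state = {"step": 0, "m": 0.0, "v": 0.0}
w_old = to_fp16(0.5)
grad = to_fp16(1e-4)
w_new = adam_step_high_precision(w_old, grad, state)
print(w_old, w_new)
```

Only the update arithmetic runs at the higher precision in this sketch; the forward and backward passes would still produce float16 gradients.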

from lightweight-gan.

juntang-zhuang commented on August 22, 2024 1

I guess it might be a problem with epsilon in low precision. The default is eps=1e-8 for Adam (1e-16 for AdaBelief), and either value is simply rounded to 0 in 16-bit floating point (float16). In that case the update comes close to dividing by 0 (or by a very small number), and eps loses the stabilizing effect it has in float32.
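
The rounding can be checked without a GPU. The sketch below round-trips values through IEEE 754 half precision using the standard-library `struct` module (format code `e`). The helper name `to_fp16` is hypothetical, and the zero second-moment estimate is an assumed worst case.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


eps_adam = to_fp16(1e-8)        # Adam default eps
eps_adabelief = to_fp16(1e-16)  # AdaBelief default eps
smallest_positive = 2.0 ** -24  # smallest positive (subnormal) float16, about 5.96e-8

print(eps_adam, eps_adabelief)  # both print 0.0

# With a second-moment estimate of zero, eps is the only thing
# keeping the denominator sqrt(v) + eps away from zero.
v = 0.0
denom_high_precision = v ** 0.5 + 1e-8
denom_fp16 = to_fp16(to_fp16(v ** 0.5) + eps_adam)
print(denom_high_precision, denom_fp16)
```

Both default eps values lie below half of the smallest positive float16 value, so they round to exactly zero rather than to a small positive number.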


lucidrains commented on August 22, 2024

Hmm, I don't really know, you should ask @FilipAndersson245


FilipAndersson245 commented on August 22, 2024

I have not tested AdaBelief together with mixed-precision mode. Maybe submit an issue on the author's GitHub?


Mut1nyJD commented on August 22, 2024

> I have not tested AdaBelief together with mixed-precision mode. Maybe submit an issue on the author's GitHub?

Nope, not yet. I wanted to ask here first whether anyone had had some success, but will do, thank you!


juntang-zhuang commented on August 22, 2024

Hi, thanks for testing AdaBelief. AdaBelief might be incompatible with mixed precision, because low precision might cause the difference between the gradient and its moving average to round to 0, hence a division by 0. I'm not very familiar with low-precision training. Could you point to the code where low precision takes effect? We will look into it and perhaps find a solution for the next release of AdaBelief. Thanks a lot.
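
A rough sketch of how this could happen, assuming the published AdaBelief rule in which the second-moment term tracks the squared difference between the gradient and its moving average, with eps added to it. Bias correction is omitted for brevity, float16 is simulated with the standard-library `struct` module, and the helper names and input values are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def adabelief_denominator(grad, m, s_prev=0.0, beta2=0.999, eps=1e-16, cast=float):
    """Return sqrt(s_t) + eps with every intermediate value passed through `cast`.

    s_t = beta2 * s_prev + (1 - beta2) * (grad - m)**2 + eps
    """
    diff = cast(grad - m)
    squared = cast(diff * diff)
    s = cast(cast(beta2 * s_prev) + cast(cast(1 - beta2) * squared))
    s = cast(s + cast(eps))
    return cast(cast(s ** 0.5) + cast(eps))


# A gradient that differs from its moving average by about 1e-4 (assumed values).
grad, m = 0.0101, 0.0100

denom_high_precision = adabelief_denominator(grad, m, cast=float)
denom_fp16 = adabelief_denominator(to_fp16(grad), to_fp16(m), cast=to_fp16)
print(denom_high_precision, denom_fp16)
```

In this sketch the difference itself need not be exactly 0: its square is already too small for float16, and since eps also rounds to 0, nothing keeps the denominator positive.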


Mut1nyJD commented on August 22, 2024

Hmm, okay. I am not so sure that it is purely an AdaBelief problem; it is probably just far worse there.
With normal Adam, training seems to become unstable as well, it just takes a lot longer: after a bit more than 50,000 iterations it crashed with the same error. I am still running PyTorch 1.6, so it may be worth upgrading to 1.7.


Mut1nyJD commented on August 22, 2024

> With normal Adam, training seems to become unstable as well, it just takes a lot longer: after a bit more than 50,000 iterations it crashed with the same error. I am still running PyTorch 1.6, so it may be worth upgrading to 1.7.

Strangely enough, this does not seem to happen when I increase the number of attention layers. Maybe just pure coincidence?


Mut1nyJD commented on August 22, 2024

@juntang-zhuang
Great! That sounds like a reasonable workaround to me, so it should work. Happy to give it another try when I have some time.

