
Comments (5)

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 yes, I think you are talking about the SGD learning rate 'burn-in', which is supposed to be much smaller for the first 1000 batches of training. This was brought up by @xyutao in issue #2.

I'm going to switch the training from Adam to SGD with burn in in a new commit soon.

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 do you have an exact definition of the learning rate over the course of training? I tried switching to SGD and implementing a burn-in phase but was unsuccessful: the losses diverged before the burn-in completed.

From darknet I think the correct burn-in formula is this, which slowly ramps the LR up to 1e-3 over the first 1000 iterations and leaves it there:

# SGD burn-in: ramp the LR up to 1e-3 over the first 1000 iterations
if epoch == 0 and i <= 1000:
    power = ??  # correct exponent unknown
    lr = 1e-3 * (i / 1000) ** power
    for g in optimizer.param_groups:
        g['lr'] = lr

I can't find the correct value of power though. With power=2, training diverged around 200 iterations. Increasing to power=5, training diverged after 400 iterations. power=10 also diverged.

I see that the divergence is in the width and height losses; the other terms appear fine. One problem may be that the width and height terms are bounded below at zero but unbounded above, so it's possible that the network is predicting impossibly large widths and heights, causing the losses there to diverge. I may need to bound these, or redefine the width and height terms and try again. I used a variant of the width and height terms for a different project that had no divergence problems with SGD.
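To illustrate the bounding idea above, here is a hypothetical sketch (not the actual fix from the repo): the darknet-style `exp()` decoding is unbounded above, while a sigmoid-scaled alternative caps the box multiplier, so the width/height loss cannot blow up from that term. The `max_scale` cap is an assumption for illustration.

```python
import math

def wh_exp(t, anchor):
    # Darknet-style decoding: exp() is unbounded above, so a large raw
    # prediction t yields an arbitrarily large box and a huge loss.
    return anchor * math.exp(t)

def wh_bounded(t, anchor, max_scale=4.0):
    # Hypothetical bounded alternative: a sigmoid keeps the multiplier
    # in (0, max_scale), so this term cannot diverge.
    sigmoid = 1.0 / (1.0 + math.exp(-t))
    return anchor * max_scale * sigmoid

print(wh_exp(10.0, 1.0))      # huge, ~22026
print(wh_bounded(10.0, 1.0))  # capped just below 4.0
```

With the bounded form, even a wildly wrong raw prediction produces a finite box size, which keeps the loss gradient manageable during the unstable early iterations.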

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 I've switched from Adam to SGD with burn-in (which exponentially ramps up the learning rate from 0 to 0.001 over the first 1000 iterations) in commit a722601.
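The ramp described above can be sketched as a standalone schedule function. Note this is an assumption-laden illustration, not the code from commit a722601: `power=4` is darknet's default burn-in exponent, and the commit may use a different value or shape.

```python
def burn_in_lr(i, base_lr=1e-3, burn_in=1000, power=4):
    # Polynomial ramp from 0 to base_lr over the first `burn_in` iterations,
    # then hold at base_lr. power=4 mirrors darknet's default net.power and
    # is assumed here for illustration.
    if i < burn_in:
        return base_lr * (i / burn_in) ** power
    return base_lr

print(burn_in_lr(0))     # 0.0
print(burn_in_lr(500))   # 1e-3 * 0.5**4 = 6.25e-05
print(burn_in_lr(1000))  # 0.001
```

In a PyTorch training loop this value would be written into each `g['lr']` of `optimizer.param_groups` at every iteration, as in the snippet earlier in the thread.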

from yolov3.

bobo0810 avatar bobo0810 commented on May 14, 2024

thank you very much

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 you're welcome, but the change opened up different issues, mainly that the height and width terms diverged during training, so I had to bound these using new height and width calculations. See issue #2 for a full explanation.

from yolov3.
