
Comments (5)

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 yes, I think you are talking about the SGD learning rate 'burn-in', which is supposed to be much smaller for the first 1000 batches of training. This was brought up by @xyutao in issue #2.

I'm going to switch the training from Adam to SGD with burn in in a new commit soon.

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 do you have an exact definition of the learning rate over the course of training? I tried switching to SGD and implementing a burn-in phase but was unsuccessful: the losses diverged before the burn-in completed.

From darknet I think the correct burn-in formula is this, which slowly ramps the LR up to 1e-3 over the first 1000 iterations and leaves it there:

# SGD burn-in: ramp the LR up to 1e-3 over the first 1000 iterations
if epoch == 0 and i <= 1000:
    power = ??  # correct exponent unknown
    lr = 1e-3 * (i / 1000) ** power
    for g in optimizer.param_groups:
        g['lr'] = lr

I can't find the correct value of power though. With power=2, training diverged around 200 iterations. Increasing to power=5, training diverged after 400 iterations. power=10 also diverged.

I see that the divergence is in the width and height losses; the other terms appear fine. One problem may be that the width and height terms are bounded below at zero but unbounded above, so it's possible that the network is predicting impossibly large widths and heights, causing the losses there to diverge. I may need to bound these, or redefine the width and height terms and try again. I used a variant of the width and height terms for a different project that had no divergence problems with SGD.
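To illustrate the bounding idea above, here is a hypothetical sketch (not the actual fix from the repo): the darknet-style `exp()` decoding is unbounded above, while a sigmoid-scaled alternative caps the box multiplier, so the width/height loss cannot blow up from that term. The `max_scale` cap is an assumption for illustration.

```python
import math

def wh_exp(t, anchor):
    # Darknet-style decoding: exp() is unbounded above, so a large raw
    # prediction t yields an arbitrarily large box and a huge loss.
    return anchor * math.exp(t)

def wh_bounded(t, anchor, max_scale=4.0):
    # Hypothetical bounded alternative: a sigmoid keeps the multiplier
    # in (0, max_scale), so this term cannot diverge.
    sigmoid = 1.0 / (1.0 + math.exp(-t))
    return anchor * max_scale * sigmoid

print(wh_exp(10.0, 1.0))      # huge, ~22026
print(wh_bounded(10.0, 1.0))  # capped just below 4.0
```

With the bounded form, even a wildly wrong raw prediction produces a finite box size, which keeps the loss gradient manageable during the unstable early iterations.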

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 I've switched from Adam to SGD with burn-in (which exponentially ramps up the learning rate from 0 to 0.001 over the first 1000 iterations) in commit a722601.
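The ramp described above can be sketched as a standalone schedule function. Note this is an assumption-laden illustration, not the code from commit a722601: `power=4` is darknet's default burn-in exponent, and the commit may use a different value or shape.

```python
def burn_in_lr(i, base_lr=1e-3, burn_in=1000, power=4):
    # Polynomial ramp from 0 to base_lr over the first `burn_in` iterations,
    # then hold at base_lr. power=4 mirrors darknet's default net.power and
    # is assumed here for illustration.
    if i < burn_in:
        return base_lr * (i / burn_in) ** power
    return base_lr

print(burn_in_lr(0))     # 0.0
print(burn_in_lr(500))   # 1e-3 * 0.5**4 = 6.25e-05
print(burn_in_lr(1000))  # 0.001
```

In a PyTorch training loop this value would be written into each `g['lr']` of `optimizer.param_groups` at every iteration, as in the snippet earlier in the thread.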

from yolov3.

bobo0810 avatar bobo0810 commented on May 14, 2024

thank you very much

from yolov3.

glenn-jocher avatar glenn-jocher commented on May 14, 2024

@bobo0810 you're welcome, but the change opened up different issues, mainly that the height and width terms diverged during training, so I had to bound these using new height and width calculations. See issue #2 for a full explanation.

from yolov3.
