Comments (5)
@bobo0810 yes, I think you are talking about the SGD learning-rate 'burn-in', which is supposed to keep the LR much smaller for the first 1000 batches of training. This was brought up by @xyutao in issue #2.
I'm going to switch training from Adam to SGD with burn-in in a new commit soon.
from yolov3.
@bobo0810 do you have an exact definition of the learning rate over the course of training? I tried switching to SGD and implementing a burn-in phase but was unsuccessful; the losses diverged before the burn-in completed.
From darknet, I think the correct burn-in formula is the following, which slowly ramps the LR up to 1e-3 over the first 1000 iterations and leaves it there:
```python
# SGD burn-in: ramp the LR up to 1e-3 over the first 1000 iterations
if (epoch == 0) and (i <= 1000):
    power = ...  # correct value unknown
    lr = 1e-3 * (i / 1000) ** power
    for g in optimizer.param_groups:
        g['lr'] = lr
```
I can't find the correct value of power though. With power=2, training diverged around 200 iterations; with power=5, it diverged after 400 iterations; power=10 also diverged.
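For reference, here is a quick plain-Python sketch (no framework needed) of how the burn-in curve behaves for the values of power tried above; the function name `burn_in_lr` is just for illustration:

```python
# Burn-in LR schedule: lr ramps from 0 up to base_lr over the first
# `burn_in` iterations. Higher `power` makes the ramp start more slowly.
def burn_in_lr(i, power, base_lr=1e-3, burn_in=1000):
    return base_lr * (i / burn_in) ** power

# Compare the three exponents at a few checkpoints in the ramp.
for power in (2, 5, 10):
    lrs = [burn_in_lr(i, power) for i in (100, 500, 1000)]
    print(power, lrs)
```

Whatever the exponent, the schedule always reaches exactly 1e-3 at iteration 1000; the exponent only controls how long the LR stays tiny at the start.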
I see that the divergence is in the width and height losses; the other terms appear fine. One problem may be that the width and height terms are bounded at zero from below but unbounded above, so it's possible the network predicts impossibly large widths and heights, causing those losses to diverge. I may need to bound these terms, or redefine them, and try again. I used a variant of the width and height terms in a different project that had no divergence problems with SGD.
@bobo0810 I've switched from Adam to SGD with burn-in (which exponentially ramps up the learning rate from 0 to 0.001 over the first 1000 iterations) in commit a722601.
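One way to realize an exponential (rather than polynomial) ramp is to interpolate the LR geometrically from a small floor up to 1e-3 — a sketch only; the exact schedule lives in commit a722601, and the floor value here is an assumption, since a geometric ramp cannot start at exactly zero:

```python
# Exponential burn-in sketch: geometric interpolation from lr_floor to
# lr_max over the first `burn_in` iterations, then constant at lr_max.
def exp_burn_in_lr(i, lr_max=1e-3, lr_floor=1e-6, burn_in=1000):
    if i >= burn_in:
        return lr_max
    return lr_floor * (lr_max / lr_floor) ** (i / burn_in)
```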
thank you very much
@bobo0810 you're welcome, but the change opened up different issues: mainly, the height and width terms diverged during training, so I had to bound them using new height and width calculations. See issue #2 for a full explanation.