See #2 (comment) for possible SGD warm-up requirements.
from yolov3.
Hi, thanks for the experiment. Are you using weight decay here for this Adam experiment?
@glenn-jocher In the official darknet code, the burn_in config is defined here:
https://github.com/pjreddie/darknet/blob/680d3bde1924c8ee2d1c1dea54d3e56a05ca9a26/cfg/yolov3.cfg#L19
If current batch_num < burn_in, the learning rate would be scaled based on the value of burn_in:
https://github.com/pjreddie/darknet/blob/680d3bde1924c8ee2d1c1dea54d3e56a05ca9a26/src/network.c#L95
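For readers following along, the linked C logic amounts to the following (a Python paraphrase of the darknet code above, not code from this repo; the names `base_lr`, `burn_in`, and `power` are mine):

```python
def burn_in_lr(batch_num, base_lr=1e-3, burn_in=1000, power=4.0):
    """Scaled learning rate during burn-in, mirroring darknet's
    lr * (batch_num / burn_in) ** power schedule."""
    if batch_num < burn_in:
        return base_lr * (batch_num / burn_in) ** power
    return base_lr

# The LR ramps smoothly from 0 up to base_lr over the first burn_in batches:
print(burn_in_lr(0))     # 0.0
print(burn_in_lr(500))   # base_lr * 0.5**4 = 6.25e-05
print(burn_in_lr(1000))  # 0.001
```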
@xyutao thanks for the links. This looks like a fairly easy change to implement. I can go ahead and submit a commit with this. Have you tried this out successfully on your side?
@xyutao From your darknet link I think the correct burn-in formula is this, which will slowly ramp the LR up to 1e-3 over the first 1000 iterations and leave it there:

# SGD burn-in
if (epoch == 0) and (i <= 1000):
    power = ??
    lr = 1e-3 * (i / 1000) ** power
    for g in optimizer.param_groups:
        g['lr'] = lr
I can't find the correct value of power though. I tried power=2 and training diverged around 200 iterations; with power=5 training diverged after 400 iterations; power=10 also diverges.
I see that the divergence is in the width and height losses; the other terms appear fine. One problem may be that the width and height terms are bounded at zero on the bottom but unbounded at the top, so it's possible the network is predicting impossibly large widths and heights, causing those losses to diverge. I may need to bound these or redefine the width and height terms and try again. I used a variant of the width and height terms in a different project that had no divergence problems with SGD.
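To illustrate the boundedness point (my own sketch, not the fix that eventually landed in the repo): the standard exp() decoding of raw wh outputs is unbounded above, while a hypothetical sigmoid-based variant caps the box at a fixed multiple of the anchor, so the loss cannot blow up however extreme the raw prediction is.

```python
import math

def decode_wh_unbounded(t, anchor):
    # Standard YOLO-style decoding: exp() is unbounded above, so a large
    # raw prediction t yields an arbitrarily large box dimension.
    return anchor * math.exp(t)

def decode_wh_bounded(t, anchor, max_scale=4.0):
    # Hypothetical bounded variant: a sigmoid squashes the raw output
    # into (0, 1), so the box is at most max_scale * anchor.
    return anchor * max_scale / (1.0 + math.exp(-t))

# For an extreme raw prediction t=8 and a 13-pixel anchor, the unbounded
# decoding gives a box tens of thousands of pixels wide, while the bounded
# version stays just under 4 * 13 = 52:
print(decode_wh_unbounded(8.0, 13.0))
print(decode_wh_bounded(8.0, 13.0))
```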
@glenn-jocher The default value of power is 4. See:
https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/parser.c#L698
When I tried other yolov3 implementations, the training successfully converged with power = 1 to 4. Maybe, just as you thought, it is a problem with the width and height losses.
Closing this as SGD burn-in has been successfully implemented.
Although I know this is closed: we exclusively use Adam for training with our fork of this repo. It instantly took us from 20% precision on our dataset to 85% (with slight mAP increases as well).
@kieranstrobel that's interesting. Have you trained COCO as well with Adam?
We tried Adam as well as AdaBound recently, but observed performance drops with both on COCO. What LR did you use for Adam vs SGD?
@kieranstrobel I ran a quick comparison using our small coco dataset coco_16img.data. I used the default hyperparameters for both, i.e.:
# Optimizer
optimizer = optim.Adam(model.parameters(), lr=hyp['lr0'], weight_decay=hyp['weight_decay'])
# optimizer = AdaBound(model.parameters(), lr=hyp['lr0'], final_lr=0.1)
optimizer = optim.SGD(model.parameters(), lr=hyp['lr0'], momentum=hyp['momentum'], weight_decay=hyp['weight_decay'], nesterov=True)
The training command was:
python3 train.py --data data/coco_16img.data --batch-size 16 --accumulate 1 --img-size 320 --nosave --cache
BTW, the burn-in period (the original issue topic) has been removed, because the wh-divergence issue is now resolved: GIoU loss replaced the four individual regression losses (x, y, w, h). The scenario above should actually favor Adam, since the dataset trains and validates on the same images, and Adam is known for reducing training losses more than validation losses (and then failing to generalize well), but SGD still clearly outperforms it.
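For context on why GIoU sidesteps the wh divergence: it is a single bounded regression term built from IoU plus a penalty based on the smallest enclosing box. A minimal sketch for axis-aligned (x1, y1, x2, y2) boxes (my own illustration of the idea, not the repo's implementation):

```python
def giou(a, b):
    """Generalized IoU for two boxes given as (x1, y1, x2, y2)."""
    # Intersection area
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box penalty
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area

# GIoU loss is then 1 - giou(pred, target): one term replacing the four
# separate x, y, w, h losses, and bounded since giou lies in (-1, 1].
print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # negative for disjoint boxes
```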
Can you plot a comparison using your custom dataset?
I see xNets (https://arxiv.org/pdf/1908.04646v2.pdf) uses Adam at 5E-5 LR in their results, so I ran another study of Adam on the first epoch of COCO at 320. The results show the lowest validation loss and best mAP (0.202) at 9E-5 Adam LR, which exceeds the 0.161 SGD mAP after the same 1 epoch. The validation losses were also lower with Adam:
Adam val losses, lr=9E-5 (giou, obj, cls): [1.79, 3.96, 2.44]
SGD val losses, lr=0.0023, momentum=0.97 (giou, obj, cls): [1.80, 4.15, 2.68]
I will try to train to 27 epochs with Adam at this LR next.
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 # iE5 Adam LR
do
python3 train.py --epochs 1 --weights weights/darknet53.conv.74 --batch-size 64 --accumulate 1 --img-size 320 --var ${i}
done
sudo shutdown
44.9 SGD vs 45.2 Adam at 9E-5 LR.
@xuefeicao yes, for both. Search train.py for weight_decay.
This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.
@glenn-jocher Do these experimental results apply to this repo now? I see the default optimizer is still SGD.
@nanhui69 yes, but I would recommend yolov5 for new projects.
https://github.com/ultralytics/yolov5
@glenn-jocher Do you happen to know if YOLOv5 has the same issue with Adam performing better than the default SGD?
@danielcrane I don't know, but you can test Adam out on your own training workflows by passing the --adam flag (make sure you reduce your LR accordingly in your hyp file):
Line 6 in c1f8dd9
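The flag essentially boils down to an optimizer switch like the following (a sketch assuming the hyp-dict keys shown in the comparison earlier in this thread; the `nn.Linear` model and the hyp values are stand-ins, not the repo's defaults):

```python
import torch.nn as nn
import torch.optim as optim

def build_optimizer(model, hyp, use_adam=False):
    # Sketch of the --adam switch: the same hyp dict feeds either optimizer,
    # which is why the LR in the hyp file must be reduced for Adam
    # (e.g. ~9E-5 worked above vs ~2E-3 for SGD).
    if use_adam:
        return optim.Adam(model.parameters(), lr=hyp['lr0'],
                          weight_decay=hyp['weight_decay'])
    return optim.SGD(model.parameters(), lr=hyp['lr0'],
                     momentum=hyp['momentum'],
                     weight_decay=hyp['weight_decay'], nesterov=True)

model = nn.Linear(4, 2)  # stand-in model for illustration
hyp = {'lr0': 9e-5, 'momentum': 0.97, 'weight_decay': 5e-4}
opt = build_optimizer(model, hyp, use_adam=True)
print(type(opt).__name__)  # Adam
```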
@glenn-jocher Understood, thanks!
@danielcrane you're welcome! If you have any other questions, feel free to ask.