
Comments (16)

thtrieu commented on May 22, 2024

What batch size are you using? Without the batch size, the step number can't tell you how far you've gone. According to the author of YOLO, he used a pretty powerful machine, and the training has two stages; the first stage (training the convolutional layers with average pooling) takes about a week. So be patient if you're not far past the beginning.

Training deep nets is more of an art than a science. My suggestion is to first train your model on a small dataset to see whether it can overfit the training set; if not, there's a problem to solve before proceeding. Note that due to the data augmentation built into the code, you can't really reach 0.0 loss.
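The overfit sanity check above is framework-agnostic. A minimal illustration with a tiny linear model in NumPy (this is a generic sketch, not darkflow code): if a few hundred gradient steps cannot drive the training loss toward zero on a handful of samples, something in the training logic is broken.

```python
import numpy as np

# Tiny synthetic "dataset" the model should be able to memorize.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))           # 8 samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                        # targets a linear model can fit exactly

w = np.zeros(3)
lr = 0.1
for step in range(5000):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of mean squared error
    w -= lr * grad

loss = float(np.mean((X @ w - y) ** 2))
# loss should be essentially 0: the model can overfit the tiny set
```

If the loss plateaus well above zero even here, inspect the data pipeline, loss, and gradients before scaling up.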

I've trained a few configs with my code, and the loss can shrink well from > 10.0 to around 0.5 or below (the parameters C, B, S are not relevant, since the loss is averaged across the output tensor). I usually start with the default learning rate of 1e-5 and a batch size of 16, or even 8, to speed up the loss decrease at first, until it stops decreasing and seems unstable.

Then I decrease the learning rate to 1e-6 and increase the batch size to 32 and then 64 whenever I feel the loss is stuck (and testing still does not give good results). You can switch to an adaptive learning-rate algorithm (e.g. Adadelta, Adam) if you are familiar with them, by editing yolo_loss() in ./yolo/train.py.
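The schedule described above (start at 1e-5 with small batches, then move to 1e-6 with larger batches when the loss plateaus) could be sketched as a simple plateau detector. This is an illustration of the idea, not darkflow's actual trainer; all names here are hypothetical.

```python
# Hypothetical plateau-based schedule mirroring the advice above:
# start at lr=1e-5 with batch 16, then lr=1e-6 with batch 32, then 64.
SCHEDULE = [(1e-5, 16), (1e-6, 32), (1e-6, 64)]

def next_phase(phase, recent_losses, patience=3, min_delta=1e-3):
    """Advance to the next (lr, batch) phase when the loss is stuck,
    i.e. the best of the last `patience` losses improved by less than
    min_delta over the best seen before them."""
    if len(recent_losses) < 2 * patience:
        return phase
    old_best = min(recent_losses[:-patience])
    new_best = min(recent_losses[-patience:])
    if old_best - new_best < min_delta and phase < len(SCHEDULE) - 1:
        return phase + 1
    return phase

phase = 0
losses = [10.0, 5.0, 2.0, 1.0, 1.0, 1.0, 1.0]   # loss has stopped improving
phase = next_phase(phase, losses)
lr, batch = SCHEDULE[phase]
```

With the stalled loss history above, the schedule advances to the second phase (lr 1e-6, batch 32).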

You can also look at the learning-rate policy the YOLO author used, inside the .cfg files.
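For reference, Darknet .cfg files encode the learning-rate policy in the [net] section. An illustrative (not verbatim) fragment of the steps policy, where the learning rate is multiplied by each scale when training reaches the corresponding batch number:

```ini
[net]
learning_rate=0.001
policy=steps
# at batch 100 multiply the rate by 10, at 25000 and 35000 by 0.1
steps=100,25000,35000
scales=10,.1,.1
```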

Best of luck

from darkflow.

thtrieu commented on May 22, 2024

Just a friendly ping: I've finished training a YOLO model on 4 classes. If you are interested, I will write some notes about the training process.


thtrieu commented on May 22, 2024

I have updated the code through many cycles since then, which will affect the scaling of the loss value, but the mechanism is the same. Here are my notes:

  1. You should really reuse trained weights; this is a supported feature in darkflow. Preferably, taking the first 2 or 3 layers from the original YOLO would be good.

  2. Before training, run fine-tuning on some trained models to see their loss values. These are converged values, so your goal is to get down to around these numbers (approximately 1.5 ~ 1.7).

  3. Make sure you can overfit a very small training dataset before going further. This confirms the training logic is working.

  4. When you get stuck at a loss value, try overfitting a very small training set again. If you can get the loss down, your model is underfitting, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.

  5. Occasionally visualize the predictions and see what kinds of mistakes the model is making. In my case it predicted almost all classes as person due to heavily skewed data. As I gradually increased the weight of the class term in the loss objective, this mistake became less severe. Note that replicating other classes' data to achieve balance results in an unnatural distribution of the training data, so I would advise against this.
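The reweighting in point 5 can be sketched as a weighted classification loss. This is a generic NumPy illustration, not darkflow's actual yolo_loss() implementation, and the weight values are hypothetical.

```python
import numpy as np

def weighted_class_loss(probs, labels, class_weights):
    """Mean weighted cross-entropy over predicted class distributions.
    probs:  (N, C) predicted class probabilities per box
    labels: (N,)   ground-truth class indices
    class_weights: (C,) larger weight => mistakes on that class cost more
    """
    n = len(labels)
    picked = probs[np.arange(n), labels]      # probability of the true class
    losses = -np.log(picked + 1e-9)           # cross-entropy per box
    return float(np.mean(class_weights[labels] * losses))

# Up-weighting the rare class makes misclassifying it as "person" costlier.
probs = np.array([[0.9, 0.1],    # confidently "person", truth is "person"
                  [0.9, 0.1]])   # confidently "person", truth is rare class
labels = np.array([0, 1])
flat = weighted_class_loss(probs, labels, np.array([1.0, 1.0]))
boosted = weighted_class_loss(probs, labels, np.array([1.0, 5.0]))
```

With the boosted weight, the same wrong prediction on the rare class contributes a much larger loss, pushing the optimizer to fix exactly that mistake.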

Good luck, I'd love to hear updates from your training.


KentChun33333 commented on May 22, 2024

@thtrieu What a nice suggestion!

I also encountered similar issues, and found that pre-trained weights can be a real help. Moreover, the quality and quantity of the data itself is really important, especially when training a YOLO-style network; otherwise it is just too hard to converge well...

I am still struggling with this.


fengjian0106 commented on May 22, 2024

@thtrieu thank you~

In my first round of training, the batch size was 12. I get your point about being patient.

My final goal is to find bounding boxes for objects that are not in ImageNet, so I am training without a pre-trained model.

Thanks again!


fengjian0106 commented on May 22, 2024

@thtrieu Yes, I am looking forward to it.


MisayaZ commented on May 22, 2024

@thtrieu I ran fine-tuning on the tiny-yolo-voc model, but the loss value is approximately 6, not 1.5 ~ 1.7.


thtrieu commented on May 22, 2024

I don't have much experience in YOLOv2, maybe @ryansun1900 does.

Here is why YOLOv2's loss is much higher than that of v1:

  • In v2, there are 13 x 13 x 5 = 845 proposal bounding boxes, each with its own confidence (objectness) and conditional class probability terms.
  • In v1, there are only 7 x 7 x 2 = 98 proposal bounding boxes, each with its own confidence term but sharing the cell's conditional class probabilities.

So the output volume of v2 is much larger than v1 (21125 vs 1470), and so is the loss.
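The box counts and output volumes above check out, assuming the standard configurations (v2: 13 x 13 grid, 5 anchors, 20 classes; v1: 7 x 7 grid, 2 boxes per cell, 20 classes):

```python
# YOLOv2: 13x13 grid, 5 anchor boxes; each box carries 4 coordinates +
# 1 objectness + 20 class probabilities = 25 values.
v2_boxes = 13 * 13 * 5                 # 845 proposal boxes
v2_volume = 13 * 13 * 5 * (5 + 20)     # 21125 output values

# YOLOv1: 7x7 grid, 2 boxes of 5 values each, plus 20 class
# probabilities shared per cell.
v1_boxes = 7 * 7 * 2                   # 98 proposal boxes
v1_volume = 7 * 7 * (2 * 5 + 20)       # 1470 output values
```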


ryansun1900 commented on May 22, 2024

So far I don't have much experience training on large data either.
But thtrieu's explanation is correct: the loss implementation differs between YOLOv1 and YOLOv2, so I think the loss difference is reasonable.


[deleted] commented on May 22, 2024

thanks for the good tips :)


Shameendra commented on May 22, 2024

Hi,

  4. When you get stuck at a loss value, try overfitting a very small training set again. If you can get the loss down, your model is underfitting, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.

@thtrieu can you please explain what you mean by increasing the depth? How do we do it? By changing something in the .cfg file? I am training 9 classes with YOLOv2 and have created a cfg file called yolov2-tiny-9c.cfg. So do I make changes in this file or in the original yolov2-tiny.cfg file?
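For illustration, increasing depth in Darknet means inserting extra [convolutional] sections into the cfg you actually train with (here that would be the custom yolov2-tiny-9c.cfg rather than the original), before the final detection layers. The values below are hypothetical:

```ini
# extra block inserted before the detection layers to deepen the network
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
```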


CdAB63 commented on May 22, 2024

I'm training a model for 1 class with yolov3-tiny.cfg. The training set is 6800 JPEGs with 1 to 24 objects each, normalized to 720 pixels in height with variable width. Batch size 24, subdivisions 2, network input size 512x512, learning rate 0.0015, max batches 450000. Although mAP is high (about 98%), average loss is still above 0.5. I guess the model is fully trained at iteration 31500, because beyond this point mAP is stable at 0.98 (98%).

My doubt is: is the model overfit because it does not generalize well, or does it not generalize well because the average loss is still high?
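As a rough sanity check on those numbers (assuming each iteration consumes one batch of 24 images):

```python
# At iteration 31500 with batch size 24 over a 6800-image training set,
# the network has seen each image roughly this many times:
images_seen = 31500 * 24
epochs = images_seen / 6800   # about 111 passes over the data
```

That many passes over a single-class set makes overfitting a plausible explanation for high mAP on training-like data with poor generalization.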

[Screenshot: training loss and mAP chart]


luthfi07 commented on May 22, 2024

(Quoting CdAB63's comment and training chart above.)

Hey, can you tell me how to print a chart like this when training your model?


gitgurra commented on May 22, 2024

Hey, can you tell me how to print a chart like this when training your model?

I think he's using AlexeyAB's repo, which has GUI support for the training chart.


NayabZahra commented on May 22, 2024

Just a friendly ping: I've finished training a YOLO model on 4 classes. If you are interested, I will write some notes about the training process.

I want to see the complete loss-function computation, as I am having trouble understanding it.


krkrman commented on May 22, 2024

Do not pass the dont_show parameter in the training command, and the chart will be displayed.

