
Comments (27)

smichalowski commented on July 26, 2024

I had to lower the batch size from 32 to 20, so the number of iterations increases. That's probably the cause.

ducha-aiki commented on July 26, 2024

@smichalowski Instead of increasing the number of iterations, you could set the iter_size parameter in the solver, which accumulates gradients over several batches before applying an update. This lets you use whatever batch_size fits in memory while effectively training with the batch_size you want. A sketch follows below.
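
A minimal solver.prototxt sketch of this idea; the file name and numbers are illustrative, not taken from this repo:

net: "train_val.prototxt"   # assume batch_size: 16 in its data layers
iter_size: 2                # accumulate gradients over 2 forward/backward passes
# effective batch size = batch_size * iter_size = 16 * 2 = 32,
# so each weight update behaves like a single batch_size: 32 step
base_lr: 0.045
lr_policy: "step"

Caffe normalizes the accumulated gradient by iter_size, so base_lr does not need to be rescaled when using this trick.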

LiberiFatali commented on July 26, 2024

So what's the model accuracy so far? It would be great if you could upload the pre-trained model after training it.
Thanks

smichalowski commented on July 26, 2024

Hi

Training finished; however, the accuracy is not even close to that reported in the paper.

I0328 20:54:29.132506  7687 solver.cpp:337] Iteration 6500000, Testing net (#0)
I0328 21:03:36.287979  7687 solver.cpp:404]     Test net output #0: acc/top-1 = 0.616755
I0328 21:03:36.288162  7687 solver.cpp:404]     Test net output #1: acc/top-5 = 0.829167
I0328 21:03:36.288192  7687 solver.cpp:404]     Test net output #2: loss = 1.79035 (* 1 = 1.79035 loss)

Maybe that's because the scale layer was missing in my implementation, as pointed out by @ducha-aiki in #2.

Anyway, you can download it here: Googlenet V3 iteration 6500000

Currently I'm training this model with scale layers and a slightly modified solver (added iter_size).

I0329 16:03:46.388962 27648 solver.cpp:337] Iteration 490000, Testing net (#0)
I0329 16:09:03.663909 27648 solver.cpp:404]     Test net output #0: acc/top-1 = 0.4969
I0329 16:09:03.671882 27648 solver.cpp:404]     Test net output #1: acc/top-5 = 0.760801
I0329 16:09:03.671911 27648 solver.cpp:404]     Test net output #2: loss = 2.23722 (* 1 = 2.23722 loss)

ducha-aiki commented on July 26, 2024

@smichalowski Nevertheless, it was a nice experiment 👍

smichalowski commented on July 26, 2024

If anyone is interested in this topic, here are the latest numbers:

I0408 21:30:03.270594 24678 solver.cpp:337] Iteration 1010000, Testing net (#0)
I0408 21:35:39.703930 24678 solver.cpp:404]     Test net output #0: acc/top-1 = 0.533099
I0408 21:35:39.704145 24678 solver.cpp:404]     Test net output #1: acc/top-5 = 0.778603
I0408 21:35:39.704166 24678 solver.cpp:404]     Test net output #2: loss = 2.09437 (* 1 = 2.09437 loss)

revilokeb commented on July 26, 2024

@smichalowski Thanks a lot for providing your solver, train_val, and snapshots. Do you have an explanation or intuition for why you are not reaching the accuracy stated in the paper?

elezengz commented on July 26, 2024

@revilokeb Hi, do you have a deploy.prototxt file that can be used to predict the class probabilities? I tried to write one, but the accuracy seems bad compared with the test phase. Thanks a lot!

revilokeb commented on July 26, 2024

@elezengz I don't have one off the shelf, unfortunately; I will let you know when I have one.

smichalowski commented on July 26, 2024

@revilokeb I don't know why it's not reaching the accuracy from the paper. It's still training, but it's not getting better...

I0511 09:52:17.478355  4483 solver.cpp:337] Iteration 2970000, Testing net (#0)
I0511 09:57:31.673902  4483 solver.cpp:404]     Test net output #0: acc/top-1 = 0.556999
I0511 09:57:31.674114  4483 solver.cpp:404]     Test net output #1: acc/top-5 = 0.793503
I0511 09:57:31.674137  4483 solver.cpp:404]     Test net output #2: loss = 1.99363 (* 1 = 1.99363 loss)

@elezengz I don't have a deploy.prototxt; however, you can try this: http://sites.duke.edu/rachelmemo/2015/05/05/convert-train_val-prototxt-to-deploy-prototxt/

elezengz commented on July 26, 2024

@smichalowski Hi, thank you for your quick response. I am doing some work with deep learning and have tried this, but the accuracy is still low. For Inception v3, the validation loss during training can be around 1e-5, which looks really good. But when I manually test on the validation data, the accuracy drops a lot. Could you give me a hand? Thanks again!

revilokeb commented on July 26, 2024

@smichalowski wrt test accuracy: interesting, as you are trying with Inception v3 what I am currently trying with Inception-ResNet v2 (https://github.com/revilokeb/inception_resnetv2_caffe), and although I am far behind you in terms of epochs, it seems I am also leveling off.
I have tried stronger drops in learning rate (not shown), which gives an improvement of a few percentage points in validation error, but nowhere close to what is shown in the paper, and after a few more epochs it levels off again. Most importantly, the validation accuracy after 20 epochs is far away from what the Google guys are reporting.
What learning rate are you currently at?

smichalowski commented on July 26, 2024

Iteration 2984640, lr = 0.0184563

ducha-aiki commented on July 26, 2024

@smichalowski, @revilokeb A friend of mine has noticed that you both use a small batch size but haven't adjusted your learning rate. My tests show that this can cause strong underfitting - see https://github.com/ducha-aiki/caffenet-benchmark/blob/master/BatchSize.md

There are two possibilities (see the sketch below):
a) reduce the learning rate in proportion to the batch size reduction, or
b) keep your learning rate as it is now, but increase iter_size in the solver so that batch_size * iter_size = the batch_size in the paper.
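
A hypothetical illustration of the two options in solver.prototxt terms (all numbers are invented for the example):

# Option (a): batch_size dropped from 32 to 20, so scale the LR by 20/32
base_lr: 0.028   # = 0.045 * 20/32, illustrative

# Option (b): keep base_lr and accumulate gradients until
# batch_size * iter_size reaches the paper's effective batch,
# e.g. 20 * 16 = 320
base_lr: 0.045
iter_size: 16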

revilokeb commented on July 26, 2024

@ducha-aiki Many thanks for your helpful comment! Your explorations of the impact of parameters on learning (https://github.com/ducha-aiki/caffenet-benchmark) are indeed extremely valuable. I have been aware of a few thoughts on this in http://arxiv.org/abs/1404.5997, which from a practical point of view are in line with your conclusion for batch size (note also his reasoning for weight decay).

However, going through the publication http://arxiv.org/abs/1602.07261 I could not really find a comment on their batch size. They state that they are training on "20 replicas", each on a Kepler GPU, probably 12GB (allowing batch_size ~ 16 on each), i.e. an effective batch of roughly 20 * 16 = 320. As I am currently running the stuff on two 12GB GPUs (2 * 16 = 32), that would require lowering my base_lr by a factor of 10 compared to their base_lr, right? (or increasing the effective batch size by using iter_size, as you have pointed out)

smichalowski commented on July 26, 2024

For my second attempt I modified the solver, setting iter_size: 2, but nothing changed.

ducha-aiki commented on July 26, 2024

@revilokeb Yes, a lot of small details remain unclear there :( I think they have an effective batch size of ~320, so you should try decreasing your learning rate ~10x.

@smichalowski 2x could be too small a difference. Their paper states:
"We have trained our networks with stochastic gradient utilizing the TensorFlow [1] distributed machine learning system using 50 replicas running each on a NVidia Kepler GPU with batch size 32 for 100 epochs."

So the effective batch_size is probably 50 x 32 = 1600.

However, I haven't had much success with RMSProp on ImageNet myself, so that could also be a cause.

elezengz commented on July 26, 2024

@revilokeb Hi, I am trying to use your modified Inception v2, but the system reports:
Message type "caffe.LayerParameter" has no field named "scale_param".
It seems that my Caffe does not support the Scale layer. Could you give me a hand to solve it? Inception-BN is working properly. Thanks a lot!

elezengz commented on July 26, 2024

@revilokeb Hi, I have tried Inception-BN on my own dataset. I got excellent results:
I0512 13:13:17.976313 28646 solver.cpp:406] Test net output #1: loss1_accuracy_top5 = 1
I0512 13:13:17.976336 28646 solver.cpp:406] Test net output #2: loss1_loss = 0.0497387 (* 0.3 = 0.0149216 loss)
I0512 13:13:17.976344 28646 solver.cpp:406] Test net output #3: loss2_accuracy_top1 = 0.986902
I0512 13:13:17.976359 28646 solver.cpp:406] Test net output #4: loss2_accuracy_top5 = 1
I0512 13:13:17.976368 28646 solver.cpp:406] Test net output #5: loss2_loss = 0.0724502 (* 0.3 = 0.0217351 loss)
I0512 13:13:17.976375 28646 solver.cpp:406] Test net output #6: loss3_accuracy_top1 = 0.984453
I0512 13:13:17.976383 28646 solver.cpp:406] Test net output #7: loss3_accuracy_top5 = 0.999034
I0512 13:13:17.976399 28646 solver.cpp:406] Test net output #8: loss3_loss = 0.0748366 (* 1 = 0.0748366 loss)

However, when I use the deploy file to test the validation data, I get a really bad result:
Accuracy is: 0.790346907994

If I change to GoogLeNet, there is no such problem. Do you have any idea about it? Thanks a lot!

revilokeb commented on July 26, 2024

@elezengz Maybe we should not derail the discussion in this issue too far away from the test accuracy of Inception v3 (better to open an issue in my Inception-ResNet v2 repo for that topic, or the repo you got the Inception-BN from). A few comments though:
1. caffe master has had a Scale layer (which has got a scale_param) since end of January 2016 (BVLC/caffe#3591). I don't know which branch / commit your Caffe has been built from; maybe check that first. A sketch of the typical usage follows below.
2. Which Inception-BN are you referring to, https://github.com/lim0606/caffe-googlenet-bn? If you look into the train_val.prototxt you will also find a Scale layer there.
3. Accuracy: a bit hard to say without knowing your exact code / deploy; maybe you can post it?
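
For reference, a minimal sketch of how BatchNorm is typically paired with a Scale layer in post-January-2016 Caffe (the layer and blob names here are illustrative):

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true   # learn a per-channel shift in addition to the scale
  }
}

A Caffe built before BVLC/caffe#3591 will reject the scale_param field with exactly the error quoted above.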

smichalowski commented on July 26, 2024

Yesterday I stopped the last training run, lowered the learning rate to 0.045, changed step_size to 32500, and resumed from iteration 1830000. After about 24h I got some nice results. They are not reaching those from the paper, but they are far better than the previous ones. I will leave it running for the next few days and then try changing iter_size.

I0512 21:54:39.445983 25908 solver.cpp:337] Iteration 1885000, Testing net (#0)
I0512 21:59:56.912246 25908 solver.cpp:404]     Test net output #0: acc/top-1 = 0.614998
I0512 21:59:56.912451 25908 solver.cpp:404]     Test net output #1: acc/top-5 = 0.836004
I0512 21:59:56.912477 25908 solver.cpp:404]     Test net output #2: loss = 1.67508 (* 1 = 1.67508 loss)

yuzcccc commented on July 26, 2024

Hi, I have some experience training the Inception-v2 model; maybe it helps you:

  1. Train the net with three softmax losses rather than a single one.
  2. Reduce the learning rate more rapidly than Google did! In my opinion, the optimization can be finished within 30 epochs, with a final learning rate of 1e-5 (gradually reducing the LR from 1e-4 to 1e-5, the loss still drops and the val accuracy still increases). Therefore, I recommend setting the initial LR to 0.045 and gamma to 0.96, and reducing the LR after every 1/6 epoch (see the sketch below).
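
A hypothetical solver.prototxt fragment implementing that schedule; the stepsize assumes ImageNet-scale data (~1.28M images) and an effective batch size of 32, both of which are assumptions rather than values from this thread:

base_lr: 0.045
lr_policy: "step"
gamma: 0.96
# ~1.28M images / batch 32 = 40000 iterations per epoch,
# so 1/6 epoch is roughly 6700 iterations
stepsize: 6700
max_iter: 1200000   # ~30 epochs, illustrative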

elezengz commented on July 26, 2024

@revilokeb Hi, thanks a lot for your reply. Yes, we can open another issue for this discussion. I post my deploy file below.
Head:

name: "inception_bn"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}

Tail:

layer {
  name: "prob"
  type: "Softmax"
  bottom: "loss3_classifier"
  top: "prob"
}
I just use the deploy file they provided along with the train_val file. However, the prediction accuracy seems horribly low.

I use 328 * 328 as input. When I use 299 * 299 files, the result is the same. Thanks again!

elezengz commented on July 26, 2024

@revilokeb For Inception v2 I used 328 * 328; for Inception-BN I used 256 * 256. Thanks.

ck196 commented on July 26, 2024

Does anybody have a deploy.prototxt file for this network?
@smichalowski Can you share your deploy.prototxt file?

smichalowski commented on July 26, 2024

@monkeykju I don't have a deploy.prototxt; I did not create one because of the lack of a working trained model :-) But it should be quite easy to modify train_val.prototxt into deploy.prototxt, please check this link. A sketch of the usual changes follows below.
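
For illustration, the usual mechanical changes when deriving a deploy.prototxt from a train_val.prototxt (a sketch of general Caffe practice, not this repo's exact file; the input shape and layer names are assumptions):

# 1. Replace the Data layers with an input declaration:
name: "inception_v3_deploy"
input: "data"
input_shape {
  dim: 1     # batch size
  dim: 3     # channels
  dim: 299   # height (Inception v3 input size)
  dim: 299   # width
}
# 2. Remove all loss and accuracy layers (including any auxiliary heads).
# 3. Append a Softmax on top of the final classifier:
layer {
  name: "prob"
  type: "Softmax"
  bottom: "classifier"   # hypothetical name of the last InnerProduct layer
  top: "prob"
}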

smichalowski commented on July 26, 2024

Just updated the repo with a new train_val.prototxt that achieves the expected results.
