
A mismatch found in your code and paper (nvcaffe, open issue)

qiuxin2012 commented on September 2, 2024

A mismatch found in your code and paper.


Comments (14)

borisgin commented on September 2, 2024

The code is correct. We are working on updating the paper :)


qiuxin2012 commented on September 2, 2024

@borisgin Another mismatch.
In https://github.com/borisgin/nvcaffe-0.16/blob/caffe-0.16/models/bvlc_alexnet/solver_8K.prototxt
I found that you set rampup_interval: 600, so the warmup for the 8K batch size is 4 epochs.
But your paper (Table 6) says the warmup is 8 epochs.
Is something wrong with the paper?
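The epoch count above follows from simple arithmetic. A minimal sketch, assuming `rampup_interval` is counted in iterations, a global batch of 8,192 images ("8K"), and the standard ImageNet-1k training set of 1,281,167 images; none of these assumptions is stated in the solver file, and the helper names are hypothetical:

```python
import math

# Assumption: ILSVRC-2012 (ImageNet-1k) training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167

def warmup_epochs(rampup_iters: int, batch_size: int,
                  dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Convert a ramp-up length given in iterations into epochs."""
    return rampup_iters * batch_size / dataset_size

def rampup_iters_for(epochs: float, batch_size: int,
                     dataset_size: int = IMAGENET_TRAIN_IMAGES) -> int:
    """Smallest whole number of iterations that covers `epochs` epochs."""
    return math.ceil(epochs * dataset_size / batch_size)

print(round(warmup_epochs(600, 8192), 2))  # 3.84, i.e. roughly 4 epochs
print(rampup_iters_for(8, 8192))           # 1252 iterations for 8 epochs
```

Under these assumptions, matching the 8 epochs in Table 6 would need a ramp-up of roughly twice the configured 600 iterations.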


borisgin commented on September 2, 2024

Yesterday I experimented with a short ramp-up.
Boris


hiyijian commented on September 2, 2024

@borisfom
Possibly another mismatch: wgrad_norm in your code is computed from "g + beta * w" (that is, after regularization has been applied), which is not exactly the paper's "g".
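To see why this matters: the layer-wise rate divides by ||g|| + beta * ||w||, so if the gradient norm is taken after weight decay has already been folded into the gradient, the decay term enters the denominator twice (once inside the norm, once as beta * ||w||). A minimal pure-Python sketch with made-up toy vectors; `eta` plays the role of gw_ratio and `beta` of weight_decay, and the function names are illustrative, not nvcaffe's:

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def lars_rate(w, grad, eta, beta):
    """Local rate eta * ||w|| / (||grad|| + beta * ||w||)."""
    w_norm = l2_norm(w)
    return eta * w_norm / (l2_norm(grad) + beta * w_norm)

eta, beta = 0.001, 5e-4           # toy trust coefficient and weight decay
w = [0.5, -1.0, 2.0]              # toy weights
g = [0.01, -0.02, 0.04]           # toy raw gradient (chosen parallel to w)
g_reg = [gi + beta * wi for gi, wi in zip(g, w)]  # gradient after regularization

rate_raw = lars_rate(w, g, eta, beta)      # norm of the raw g, as in the paper's formula
rate_reg = lars_rate(w, g_reg, eta, beta)  # decay inside the norm and again as beta * ||w||
print(rate_raw, rate_reg)                  # about 0.04878 vs 0.04762
```

With these toy values the rate computed from "g + beta * w" comes out slightly smaller; for other vectors the gap can go either way, but the two generally differ.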


hiyijian commented on September 2, 2024

Also, maybe weight_decay in "rate = gw_ratio * w_norm / (wgrad_norm + weight_decay * w_norm)" should be the global weight_decay multiplied by local_decay?
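A minimal sketch of the quoted rate formula with the proposed change, where the effective decay is the global weight_decay times a per-layer local_decay. Treating local_decay as a plain multiplier is an assumption (analogous to Caffe's per-parameter `decay_mult`), and the zero-norm fallback is hypothetical, not nvcaffe's actual handling:

```python
def local_rate(w_norm: float, wgrad_norm: float, gw_ratio: float,
               weight_decay: float, local_decay: float = 1.0) -> float:
    """rate = gw_ratio * w_norm / (wgrad_norm + effective_decay * w_norm).

    effective_decay = weight_decay * local_decay (the proposed change);
    local_decay = 1.0 reproduces the formula with only the global decay.
    """
    effective_decay = weight_decay * local_decay
    denom = wgrad_norm + effective_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # hypothetical fallback for degenerate layers
    return gw_ratio * w_norm / denom

# Global decay only: 0.002 / 0.051, about 0.0392
rate_default = local_rate(w_norm=2.0, wgrad_norm=0.05, gw_ratio=0.001, weight_decay=5e-4)
# Layer with decay disabled (local_decay = 0): 0.002 / 0.05 = 0.04
rate_no_decay = local_rate(w_norm=2.0, wgrad_norm=0.05, gw_ratio=0.001,
                           weight_decay=5e-4, local_decay=0.0)
```

The design choice here is that a layer whose decay is switched off also drops the decay term from its rate denominator, which keeps the rate consistent with the decay actually applied to that layer.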


borisgin commented on September 2, 2024

It's an interesting idea! How would you tune the local decay automatically? In our experiments we used only a global weight decay, which was fixed.


hiyijian commented on September 2, 2024

Thanks. I use a fixed local decay. What do you think about the first mismatch I mentioned above?


borisgin commented on September 2, 2024

In the GPU branch, the weight update, momentum, and regularization are fused into one kernel, so the Regularize() function skips this stage.
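Roughly, a fused update of this kind applies weight decay inside the per-element update rather than rewriting the gradient beforehand, so a gradient norm taken before the update presumably sees only the raw gradient g. A minimal pure-Python sketch of one such step for a single layer, assuming Caffe-style SGD with momentum and a LARS-style local rate; it illustrates the idea and is not nvcaffe's CUDA kernel (the function name, argument order, and zero-norm fallback are hypothetical):

```python
import math

def fused_lars_sgd_step(w, g, h, global_lr, gw_ratio, weight_decay, momentum):
    """One in-place step for one layer: w (weights), g (raw gradient), h (momentum history).

    Norms are computed from the raw gradient g, before any weight decay is applied.
    Returns the effective learning rate used for this layer.
    """
    w_norm = math.sqrt(sum(x * x for x in w))
    g_norm = math.sqrt(sum(x * x for x in g))
    denom = g_norm + weight_decay * w_norm
    local_rate = gw_ratio * w_norm / denom if w_norm > 0 and denom > 0 else 1.0
    lr = global_lr * local_rate
    # Single pass over the elements: regularization, momentum, and weight update fused.
    for i in range(len(w)):
        d = g[i] + weight_decay * w[i]   # decay applied here, not in a separate Regularize() pass
        h[i] = momentum * h[i] + lr * d  # momentum history
        w[i] -= h[i]                     # weight update
    return lr

w, g, h = [1.0, -2.0], [0.1, 0.2], [0.0, 0.0]
lr = fused_lars_sgd_step(w, g, h, global_lr=1.0, gw_ratio=0.01,
                         weight_decay=0.0, momentum=0.9)
```

Because g itself is never modified, any norm computed from it reflects the raw gradient, which is the point made in the reply above.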


hiyijian commented on September 2, 2024

Wow, I didn't notice that. Thanks.


hiyijian commented on September 2, 2024

Hi @borisgin, did you use any automatic local decay when training on ImageNet with the LARS solver? I found an auto local decay feature in your code, but no mention of it in the paper. Thanks.


borisgin commented on September 2, 2024

My branch has a lot of experimental knobs that I don't put into the official nvcaffe branch because they have not proven themselves yet. Also, some features were developed after we submitted the paper :). The code supports both momentum and weight decay adjustment policies (currently only "poly" or "fixed"), which I used for my experiments with the Neumann optimizer.
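For reference, a minimal sketch of what "fixed" and "poly" adjustment policies might look like for momentum or weight decay, assuming they mirror the formulas of Caffe's `fixed` and `poly` learning-rate policies, base * (1 - iter / max_iter) ^ power. The actual nvcaffe parameter names and formulas for momentum and weight decay are not given in this thread, so the function and its arguments are hypothetical:

```python
def scheduled_value(policy: str, base: float, it: int, max_iter: int,
                    power: float = 1.0) -> float:
    """Value of a hyperparameter (e.g. momentum or weight decay) at iteration `it`."""
    if policy == "fixed":
        return base  # constant for the whole run
    if policy == "poly":
        frac = min(it, max_iter) / max_iter  # clamp so the value never goes negative
        return base * (1.0 - frac) ** power
    raise ValueError(f"unsupported policy: {policy}")

# Weight decay 5e-4 decayed with power 2 over 10,000 iterations, at iteration 5,000
print(scheduled_value("poly", 5e-4, 5000, 10000, power=2.0))  # 0.000125
```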


SeaOfOcean commented on September 2, 2024

@borisgin How did your experiments with the Neumann optimizer go? Can I see the code for the Neumann optimizer?


borisgin commented on September 2, 2024

I experimented with a simplified Neumann optimizer (without the outer loop). I found that it behaves very similarly to standard SGD with momentum, so we decided not to add this optimizer to nvidia/caffe.


Tron-x commented on September 2, 2024

Hello, when training with multiple GPUs to increase the batch size, is this batch size the total batch size across all GPUs, or the batch size per GPU?
