
Comments (14)

borisgin avatar borisgin commented on September 2, 2024

Code is correct. We are working on the paper update :)

from nvcaffe.

qiuxin2012 avatar qiuxin2012 commented on September 2, 2024

@borisgin Another mismatch.
In https://github.com/borisgin/nvcaffe-0.16/blob/caffe-0.16/models/bvlc_alexnet/solver_8K.prototxt
I found that you set rampup_interval: 600, so the warm-up for the 8K batch size is about 4 epochs.
But your paper (Table 6) says the warm-up is 8 epochs.
Is something wrong with the paper?
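For reference, the arithmetic behind the 4-epoch figure can be sketched as below. This is only a rough check: it assumes "8K" means a global batch of 8192 images and that the dataset is the ImageNet-1k training set with 1,281,167 images. Neither number is stated in this thread, and the function name is made up for illustration.

```python
# Assumptions (not stated in the thread): "8K" batch = 8192 images,
# dataset = ImageNet-1k training set with 1,281,167 images.
IMAGENET_TRAIN_IMAGES = 1_281_167
BATCH_SIZE = 8192
RAMPUP_INTERVAL = 600  # iterations, the value quoted from solver_8K.prototxt


def warmup_epochs(rampup_iters: int, batch_size: int, dataset_size: int) -> float:
    """Convert a warm-up length given in iterations into epochs."""
    return rampup_iters * batch_size / dataset_size


epochs = warmup_epochs(RAMPUP_INTERVAL, BATCH_SIZE, IMAGENET_TRAIN_IMAGES)
print(f"warm-up is about {epochs:.2f} epochs")
```

Under these assumptions the result is a little under 4 epochs, which is where the "4 epochs" above comes from.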


borisgin avatar borisgin commented on September 2, 2024


hiyijian avatar hiyijian commented on September 2, 2024

@borisfom
Maybe another mismatch: wgrad_norm in your code is computed from "g + beta * w" (that is, after regularization), which is not exactly the same as the paper's "g".


hiyijian avatar hiyijian commented on September 2, 2024

And maybe weight_decay in "rate = gw_ratio * w_norm / (wgrad_norm + weight_decay * w_norm)" should be global weight_decay * local_decay?
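To make the question concrete, here is a minimal Python sketch of the rate formula quoted above, with an optional per-layer multiplier. It is not the nvcaffe code: the function names, the `local_decay` parameter, and the zero-norm fallback are assumptions made for illustration only.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_rate(weights, grads, gw_ratio, weight_decay, local_decay=1.0):
    """rate = gw_ratio * w_norm / (wgrad_norm + decay * w_norm).

    With local_decay=1.0 this is the formula quoted in the thread.
    A different local_decay gives the proposed variant, where
    decay = global weight_decay * local_decay.
    """
    w_norm = l2_norm(weights)
    wgrad_norm = l2_norm(grads)
    decay = weight_decay * local_decay
    denom = wgrad_norm + decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # hypothetical fallback, not taken from the source
    return gw_ratio * w_norm / denom


weights = [3.0, 4.0]   # w_norm = 5
grads = [0.6, 0.8]     # wgrad_norm = 1
rate_global = lars_local_rate(weights, grads, gw_ratio=0.001, weight_decay=0.0005)
rate_local = lars_local_rate(weights, grads, gw_ratio=0.001, weight_decay=0.0005,
                             local_decay=2.0)
```

With a local decay multiplier above 1, the denominator grows and the resulting rate for that layer is slightly smaller.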


borisgin avatar borisgin commented on September 2, 2024

It's an interesting idea! How would you tune local decay automatically? In our experiments we used only global weight decay, which was fixed.


hiyijian avatar hiyijian commented on September 2, 2024

Thanks. I use a fixed local decay. What do you think about the first mismatch I mentioned above?


borisgin avatar borisgin commented on September 2, 2024

For the GPU branch, the weight update, momentum, and regularization are fused into one kernel, so the Regularize() function skips this stage.
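As a rough illustration of that point, the following Python sketch (not the actual CUDA kernel) does regularization, momentum, and the weight update in a single pass. The function name, the L2 form of the regularizer, and the sample values are assumptions. Because the decay term is only added inside the fused step, a gradient norm taken beforehand sees the raw g rather than g + weight_decay * w.

```python
import math


def fused_sgd_update(weights, grads, history, lr, momentum, weight_decay):
    """One pass per element: regularize, apply momentum, update the weight."""
    for i in range(len(weights)):
        reg_grad = grads[i] + weight_decay * weights[i]      # L2 regularization
        history[i] = momentum * history[i] + lr * reg_grad   # momentum buffer
        weights[i] -= history[i]                             # weight update


weights = [3.0, 4.0]
grads = [0.6, 0.8]
history = [0.0, 0.0]

# The norm is measured before the fused step, so it is computed from raw g.
wgrad_norm = math.sqrt(sum(g * g for g in grads))

fused_sgd_update(weights, grads, history, lr=0.1, momentum=0.9, weight_decay=0.0005)
```

Note that the gradient list itself is never overwritten with the regularized value in this sketch; the decay only exists as a temporary inside the loop.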


hiyijian avatar hiyijian commented on September 2, 2024

Wow, I didn't notice that. Thanks.


hiyijian avatar hiyijian commented on September 2, 2024

Hi @borisgin, did you use any auto local decay when training ImageNet with the LARS solver? I found an auto local decay feature in your code, but no mention of it in the paper. Thanks.


borisgin avatar borisgin commented on September 2, 2024

My branch has a lot of experimental knobs which I don't put into the official nvcaffe branch since they have not proved themselves yet. Also, some features were developed after we submitted the paper :). The code supports both momentum and weight decay adjustment policies (currently only "poly" or "fixed"), which I used for my experiments with the Neumann optimizer.
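For readers unfamiliar with these policy names, here is a small Python sketch of what "fixed" and "poly" could look like for a scheduled value such as weight decay or momentum. It assumes the policies follow the same formulas as Caffe's learning-rate policies of the same names, which this thread does not confirm; the function and parameter names are hypothetical.

```python
def scheduled_value(base, policy, iteration, max_iter, power=1.0):
    """Return the value of a scheduled hyperparameter at a given iteration.

    Assumed to mirror Caffe's lr_policy formulas:
      "fixed": always `base`
      "poly":  base * (1 - iteration / max_iter) ** power
    """
    if policy == "fixed":
        return base
    if policy == "poly":
        return base * (1.0 - iteration / max_iter) ** power
    raise ValueError(f"unsupported policy: {policy!r}")


wd_fixed = scheduled_value(0.0005, "fixed", iteration=500, max_iter=1000)
wd_poly = scheduled_value(0.0005, "poly", iteration=500, max_iter=1000, power=2.0)
```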


SeaOfOcean avatar SeaOfOcean commented on September 2, 2024

@borisgin How are your experiments with the Neumann optimizer going? Can I see the code for the Neumann optimizer?


borisgin avatar borisgin commented on September 2, 2024

I experimented with a simplified Neumann optimizer (without the external loop). I found that it behaves very similarly to standard SGD with momentum, so we decided not to add this optimizer to nvidia/caffe.


Tron-x avatar Tron-x commented on September 2, 2024

Hello, a question about training with multiple GPUs to increase the batch size:
[image]
Is this batch size the total batch size across all GPUs, or the batch size per GPU?

