
Comments (14)

T1anZhenYu commented on May 26, 2024

If you have any updates, please let me know. Thanks.


Charles-Xie commented on May 26, 2024

Could you please tell me whether you were able to reproduce the result? I tried it on cifar100, but FRN performs much worse than BN.


yukkyo commented on May 26, 2024

@Charles-Xie
Thanks for the comment!
I am also struggling with this...

When I only change the BN in the first layer (layer 0), the accuracy and loss become more unstable,
so I'm wondering whether I should change all of the BNs (e.g., changing the BNs inside _make_layer() gives very bad results).

I've also run some experiments, and none of them beat the baseline :(
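For reference, the kind of swap I'm experimenting with looks roughly like this: a minimal sketch following the paper's FRN+TLU formulation, where FRN2d and replace_bn_with_frn are illustrative names rather than this repo's code. Note that a module-for-module swap like this leaves the ReLU after each BN in place, whereas the paper replaces BN+ReLU together with FRN+TLU.

```python
import torch
import torch.nn as nn

class FRN2d(nn.Module):
    """Filter Response Normalization + TLU for NCHW feature maps (per channel)."""
    def __init__(self, num_channels, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))  # TLU threshold

    def forward(self, x):
        # nu^2: mean of the squared activations over the spatial dimensions
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        # affine transform, then TLU: max(gamma * x + beta, tau)
        return torch.max(self.gamma * x + self.beta, self.tau)

def replace_bn_with_frn(module):
    """Recursively swap every nn.BatchNorm2d child for an FRN2d (in place)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FRN2d(child.num_features))
        else:
            replace_bn_with_frn(child)
    return module
```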


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo


Charles-Xie commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.

Any suggestions will be appreciated.


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.

Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting (a sketch of this schedule follows below).
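To be concrete, the schedule I mean looks roughly like this: a minimal sketch assuming 50 warmup epochs out of 400, one scheduler step per epoch, and a cosine decay after the warmup (as in the paper). The names and the other hyperparameters are illustrative, not from this repo.

```python
import math
import torch

def cosine_warmup_cosine_decay(epoch, warmup_epochs=50, total_epochs=400):
    """LR multiplier: cosine ramp-up over warmup_epochs, then cosine decay to 0."""
    if epoch < warmup_epochs:
        # ramp the multiplier from 0 up to 1 along a cosine curve
        return 0.5 * (1.0 - math.cos(math.pi * epoch / warmup_epochs))
    # then decay the multiplier from 1 back to 0 along a cosine curve
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 8)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_warmup_cosine_decay)
# call scheduler.step() once at the end of every epoch
```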


Charles-Xie commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.
Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting.

For cifar100 I have tried the learnable epsilon, and the improvement is about 1% compared with FRN without a learnable epsilon, but the result is still about 4% worse than BN.

I also tried learning rate warmup, and the improvement is less than 1% for FRN. I use a multi-step learning rate schedule, as in the cifar100 experiments of many papers. I don't think the learning rate schedule or warmup can explain the gap between FRN and BN; I'm still trying to find another reason.
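For reference, one way to make epsilon learnable (in the spirit of Section 3.3 of the paper) is to add the absolute value of a learned offset to a small fixed floor, so the effective epsilon stays positive. A minimal sketch; the class name and the 1e-4 init are illustrative and may differ from what this repo does:

```python
import torch
import torch.nn as nn

class FRN2dLearnableEps(nn.Module):
    """FRN + TLU with a learnable epsilon: effective eps = eps + |eps_l|."""
    def __init__(self, num_channels, eps=1e-6, eps_l_init=1e-4):
        super().__init__()
        self.eps = eps
        self.eps_l = nn.Parameter(torch.full((1,), eps_l_init))  # learned offset
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        # |eps_l| keeps the effective epsilon strictly positive while it is learned
        x = x * torch.rsqrt(nu2 + self.eps + self.eps_l.abs())
        return torch.max(self.gamma * x + self.beta, self.tau)
```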


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.
Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting.

For cifar100 I have tried the learnable epsilon, and the improvement is about 1% compared with FRN without a learnable epsilon, but the result is still about 4% worse than BN.

I also tried learning rate warmup, and the improvement is less than 1% for FRN. I use a multi-step learning rate schedule, as in the cifar100 experiments of many papers. I don't think the learning rate schedule or warmup can explain the gap between FRN and BN; I'm still trying to find another reason.

About the epsilon, I will try again and see whether it makes a difference.
About the learning rate schedule: are you using a linear multi-step schedule? The paper clearly says it uses a cosine decay schedule, and in my experience that really helps a lot. Without warmup, FRN doesn't even converge. So I do think these two tricks make a difference.

We address this by using a ramp-up in the learning rate that slowly increases the learning rate from 0 to the peak value during an initial warmup phase. Since all our experiments use cosine learning rate decay schedule, we use a cosine ramp-up schedule as well.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Also, it is mentioned in the paper that

While training InceptionV3 and VGG-A, it was crucial to use learning rate rampup (refer Section 4.1) and learned epsilon (refer Section 3.3) for FRN to achieve peak performance. FRN underperformed other methods on InceptionV3 and failed to learn entirely on VGG-A without rampup. Other methods were not significantly affected.

So for resnet on ImageNet, warmup does not seem to be necessary. But for cifar it may be important (based on your results).


T1anZhenYu commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.

I haven't tried ImageNet, and I only recorded top-1 accuracy with resnet20. Will this help?
top-1 accuracy:

Dataset    FRN      BN
cifar10    92.40%   92.10%
cifar100   65.0%    68.70%

Besides, thanks for quoting the paper. I think it's better to add warmup.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.

I haven't tried ImageNet, and I only recorded top-1 accuracy with resnet20. Will this help?
top-1 accuracy:

Dataset    FRN      BN
cifar10    92.40%   92.10%
cifar100   65.0%    68.70%

Besides, thanks for quoting the paper. I think it's better to add warmup.

Thanks for the result on cifar :) It helps.


Charles-Xie commented on May 26, 2024

@T1anZhenYu sure


T1anZhenYu commented on May 26, 2024

About the results, I made a mistake. Sorry.

Dataset    FRN      BN
cifar10    91.40%   92.10%
cifar100   65.0%    68.70%
@Charles-Xie


