
Comments (14)

T1anZhenYu commented on May 26, 2024

If you have any updates, please let me know. Thanks.


Charles-Xie commented on May 26, 2024

Could you please tell me whether you were able to reproduce the result? I tried it on cifar100, but FRN performs much worse than BN.


yukkyo commented on May 26, 2024

@Charles-Xie
Thanks for the comment!
I am also struggling with this...

When I only change the BN in the first layer (layer 0), the accuracy and loss become more unstable,
so I'm wondering whether I should change all of the BNs (e.g., changing the BNs inside _make_layer() gives very bad results).

I've also run some experiments, and none of them beat the baseline :(
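For reference, the kind of swap I'm experimenting with looks roughly like this: a minimal sketch following the paper's FRN+TLU formulation, where FRN2d and replace_bn_with_frn are illustrative names rather than this repo's code. Note that a module-for-module swap like this leaves the ReLU after each BN in place, whereas the paper replaces BN+ReLU together with FRN+TLU.

```python
import torch
import torch.nn as nn

class FRN2d(nn.Module):
    """Filter Response Normalization + TLU for NCHW feature maps (per channel)."""
    def __init__(self, num_channels, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))  # TLU threshold

    def forward(self, x):
        # nu^2: mean of the squared activations over the spatial dimensions
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        # affine transform, then TLU: max(gamma * x + beta, tau)
        return torch.max(self.gamma * x + self.beta, self.tau)

def replace_bn_with_frn(module):
    """Recursively swap every nn.BatchNorm2d child for an FRN2d (in place)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FRN2d(child.num_features))
        else:
            replace_bn_with_frn(child)
    return module
```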


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo


Charles-Xie commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.

Any suggestions will be appreciated.


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.

Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting (a sketch of this schedule follows below).
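To be concrete, the schedule I mean looks roughly like this: a minimal sketch assuming 50 warmup epochs out of 400, one scheduler step per epoch, and a cosine decay after the warmup (as in the paper). The names and the other hyperparameters are illustrative, not from this repo.

```python
import math
import torch

def cosine_warmup_cosine_decay(epoch, warmup_epochs=50, total_epochs=400):
    """LR multiplier: cosine ramp-up over warmup_epochs, then cosine decay to 0."""
    if epoch < warmup_epochs:
        # ramp the multiplier from 0 up to 1 along a cosine curve
        return 0.5 * (1.0 - math.cos(math.pi * epoch / warmup_epochs))
    # then decay the multiplier from 1 back to 0 along a cosine curve
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 8)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_warmup_cosine_decay)
# call scheduler.step() once at the end of every epoch
```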


Charles-Xie commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.
Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting.

For cifar100 I have tried the learnable epsilon, and the improvement is about 1% compared with FRN without a learnable epsilon, but the result is still about 4% worse than BN.

I also tried learning rate warmup, and the improvement is less than 1% for FRN. I use a multi-step learning rate schedule, as in the cifar100 experiments of many papers. I don't think the learning rate schedule or warmup can explain the gap between FRN and BN; I'm still trying to find another reason.
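For reference, one way to make epsilon learnable (in the spirit of Section 3.3 of the paper) is to add the absolute value of a learned offset to a small fixed floor, so the effective epsilon stays positive. A minimal sketch; the class name and the 1e-4 init are illustrative and may differ from what this repo does:

```python
import torch
import torch.nn as nn

class FRN2dLearnableEps(nn.Module):
    """FRN + TLU with a learnable epsilon: effective eps = eps + |eps_l|."""
    def __init__(self, num_channels, eps=1e-6, eps_l_init=1e-4):
        super().__init__()
        self.eps = eps
        self.eps_l = nn.Parameter(torch.full((1,), eps_l_init))  # learned offset
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        # |eps_l| keeps the effective epsilon strictly positive while it is learned
        x = x * torch.rsqrt(nu2 + self.eps + self.eps_l.abs())
        return torch.max(self.gamma * x + self.beta, self.tau)
```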


T1anZhenYu commented on May 26, 2024

So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo

I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with a batch size of 32 per GPU. For cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning rate warmup) need to be fine-tuned for each dataset and network structure.
Any suggestions will be appreciated.

Thanks for sharing.
I have two questions.

  • Is epsilon learnable on the conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
  • What's your learning rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether it's the best setting.

For cifar100 I have tried the learnable epsilon, and the improvement is about 1% compared with FRN without a learnable epsilon, but the result is still about 4% worse than BN.

I also tried learning rate warmup, and the improvement is less than 1% for FRN. I use a multi-step learning rate schedule, as in the cifar100 experiments of many papers. I don't think the learning rate schedule or warmup can explain the gap between FRN and BN; I'm still trying to find another reason.

About the epsilon, I will try again and see whether it makes a difference.
About the learning rate schedule: are you using a linear multi-step schedule? The paper clearly says it uses a cosine decay schedule, and in my experience that really helps a lot. Without warmup, FRN doesn't even converge. So I do think these two tricks make a difference.

We address this by using a ramp-up in the learning rate that slowly increases the learning rate from 0 to the peak value during an initial warmup phase. Since all our experiments use cosine learning rate decay schedule, we use a cosine ramp-up schedule as well.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Also, it is mentioned in the paper that

While training InceptionV3 and VGG-A, it was crucial to use learning rate rampup (refer Section 4.1) and learned epsilon (refer Section 3.3) for FRN to achieve peak performance. FRN underperformed other methods on InceptionV3 and failed to learn entirely on VGG-A without rampup. Other methods were not significantly affected.

So for resnet on ImageNet, warmup does not seem to be necessary. But for cifar it may be important (based on your results).


T1anZhenYu commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.

I haven't tried ImageNet, and I only recorded top-1 accuracy with resnet20. Will this help?
top-1 accuracy:

Dataset    FRN      BN
cifar10    92.40%   92.10%
cifar100   65.0%    68.70%

Besides, thanks for quoting the paper. I think it's better to add warmup.


Charles-Xie commented on May 26, 2024

@T1anZhenYu Yes, a linear multi-step schedule.
I will try a cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under its best settings) and BN in your experiments?
In my experiments, on CIFAR100 the top-5 error is 25.7 for BN and 30.17 for FRN (with epsilon not learnable); on ImageNet the top-5 error is 24.3 for BN and 24.7 for FRN.

I haven't tried ImageNet, and I only recorded top-1 accuracy with resnet20. Will this help?
top-1 accuracy:

Dataset    FRN      BN
cifar10    92.40%   92.10%
cifar100   65.0%    68.70%

Besides, thanks for quoting the paper. I think it's better to add warmup.

Thanks for the result on cifar :) It helps.


Charles-Xie commented on May 26, 2024

@T1anZhenYu sure


T1anZhenYu commented on May 26, 2024

About the results, I made a mistake. Sorry.

Dataset    FRN      BN
cifar10    91.40%   92.10%
cifar100   65.0%    68.70%
@Charles-Xie


