Comments (14)
If you have any updates, please let me know. Thanks!
from pytorch-filterresponsenormalizationlayer.
And could you please tell me whether you can successfully reproduce the results? I tried it on cifar100, but FRN performs much worse than BN.
@Charles-Xie
Thanks for the comment!
I am also struggling here...
When I change only the BN in layer 0, accuracy and loss become more unstable.
So I'm wondering whether I should change all BNs (e.g., changing the BNs in `_make_layer()` gives very bad results).
I've also done some experiments and none of them exceeded the baseline :(
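For reference, swapping every BN in a torchvision-style ResNet can be done by walking the module tree. This is only a minimal sketch: the `FRN` class below is my own compact reading of the paper's FRN+TLU layer (names are mine, not the repo's), and note that the paper replaces the BN+ReLU *pair*; a naive swap like this one leaves the original ReLUs in place, which may itself cost accuracy.

```python
import torch
import torch.nn as nn

class FRN(nn.Module):
    """Minimal Filter Response Normalization + TLU sketch (per-channel
    mean-square normalization, then learned affine and threshold)."""
    def __init__(self, num_channels, eps=1e-6):
        super().__init__()
        shape = (1, num_channels, 1, 1)
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        self.tau = nn.Parameter(torch.zeros(shape))  # TLU threshold

    def forward(self, x):
        # nu2: mean of squared activations over spatial dims, per channel
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        return torch.max(self.gamma * x + self.beta, self.tau)

def replace_bn_with_frn(module):
    """Recursively swap every nn.BatchNorm2d for an FRN layer."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FRN(child.num_features))
        else:
            replace_bn_with_frn(child)
    return module
```

Called on a full model (e.g. `replace_bn_with_frn(resnet)`), this changes every BN at once, including those created inside `_make_layer()`.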
So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo
> So have you reproduced FRN on cifar100? @Charles-Xie @yukkyo
I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with batch size 32 per GPU. On cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning-rate warmup) need to be fine-tuned for each dataset and network structure.
Any suggestions will be appreciated.
> I am still trying FRN on cifar100 and imagenet. I find it performs slightly worse than BN on imagenet with batch size 32 per GPU. On cifar100 it performs much worse (about a 5-point drop); I guess this is because the specific settings (learnable epsilon, learning-rate warmup) need to be fine-tuned for each dataset and network structure. Any suggestions will be appreciated.
Thanks for sharing.
I have two questions.
- Is epsilon learnable in the Conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
- What is your learning-rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether that's the best setting.
> - Is epsilon learnable in the Conv layers? I have tried both learnable and non-learnable settings, but haven't seen any difference on cifar100.
> - What is your learning-rate warmup mechanism? I am using a cosine ramp-up schedule for 50 epochs out of 400 total epochs on cifar100, but I'm not sure whether that's the best setting.
For cifar100 I have tried learnable epsilon, and the improvement is about 1% compared with FRN without it. But the result is still about 4% worse than BN.
I also tried learning-rate warmup, and the improvement is less than 1% for FRN. I use a multistep learning-rate schedule, as many papers do for cifar100 experiments. I don't think the learning-rate schedule or warmup can explain the gap between FRN and BN. Still trying to find another reason.
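For what it's worth, the paper's learnable epsilon (Section 3.3) is just a small learned offset added to a fixed floor inside the square root, eps = eps0 + |eps_l|. A sketch of that denominator (the init value and variable names are my assumptions):

```python
import torch
import torch.nn as nn

# Learnable epsilon as described in the paper (Sec. 3.3):
# eps = eps0 + |eps_l|, where eps_l is a small learned parameter.
eps0 = 1e-6                                    # fixed floor
eps_l = nn.Parameter(torch.tensor(1e-4))       # learned offset (init assumed)

x = torch.randn(8, 16, 32, 32)                 # N, C, H, W activations
nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)  # per-channel mean square
y = x * torch.rsqrt(nu2 + eps0 + eps_l.abs())  # FRN normalization step
```

Since the offset only matters when `nu2` is tiny, it's plausible the effect differs between 1x1 feature maps (the paper's motivation) and cifar-sized ones.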
> For cifar100 I have tried learnable epsilon, and the improvement is about 1% compared with FRN without it. But the result is still about 4% worse than BN. I also tried learning-rate warmup, and the improvement is less than 1% for FRN. I use a multistep learning-rate schedule, as many papers do for cifar100 experiments.
About the epsilon, I will try again and see the difference.
About the learning-rate schedule: are you using a *linear multistep schedule*? The paper clearly says it uses a cosine decay schedule, and in my experience that really helps a lot. Without warmup, FRN doesn't even converge. So I do think these two tricks make a difference:
> We address this by using a ramp-up in the learning rate that slowly increases the learning rate from 0 to the peak value during an initial warmup phase. Since all our experiments use cosine learning rate decay schedule, we use a cosine ramp-up schedule as well.
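A cosine ramp-up into a cosine decay can be written as a single LR-multiplier function for `torch.optim.lr_scheduler.LambdaLR`. This is a sketch of my reading of that quote (the exact ramp-up shape is an assumption; the 50/400 epoch split mirrors the setting mentioned above):

```python
import math
import torch

def cosine_warmup_cosine_decay(warmup_epochs, total_epochs):
    """LR multiplier: half-cosine ramp from 0 to 1 over warmup_epochs,
    then cosine decay from 1 back to 0 over the remaining epochs."""
    def fn(epoch):
        if epoch < warmup_epochs:
            # rising half-cosine: 0 at epoch 0, 1 at end of warmup
            return 0.5 * (1 - math.cos(math.pi * epoch / warmup_epochs))
        t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1 + math.cos(math.pi * t))
    return fn

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=cosine_warmup_cosine_decay(50, 400))
```

Calling `sched.step()` once per epoch then gives LR = 0 at epoch 0, the peak LR at epoch 50, and a cosine decay to 0 by epoch 400.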
@T1anZhenYu Yes, linear multistep schedule.
I will try cosine learning rate and warmup for FRN later.
Can you tell me the results of FRN (under the best conditions) and BN in your experiments?
In my experiments, on CIFAR100: top-5 error 25.7 for BN and 30.17 for FRN (when epsilon is not learnable); on ImageNet: top-5 error 24.3 for BN and 24.7 for FRN.
@T1anZhenYu It is also mentioned in the paper that
> While training InceptionV3 and VGG-A, it was crucial to use learning rate rampup (refer Section 4.1) and learned epsilon (refer Section 3.3) for FRN to achieve peak performance. FRN underperformed other methods on InceptionV3 and failed to learn entirely on VGG-A without rampup. Other methods were not significantly affected.

So for ResNet on ImageNet, warmup does not seem to be necessary. But for cifar it may be important (according to your information).
> Can you tell me the results of FRN (under the best conditions) and BN in your experiments? In my experiments, on CIFAR100: top-5 error 25.7 for BN and 30.17 for FRN (when epsilon is not learnable); on ImageNet: top-5 error 24.3 for BN and 24.7 for FRN.
I haven't tried ImageNet, and I only recorded top-1 accuracy on resnet20. Will this help?
top1 accuracy:
Dataset | FRN | BN |
---|---|---|
cifar10 | 92.40% | 92.10% |
cifar100 | 65.0% | 68.70% |
Besides, thanks for quoting the paper. I think it's better to add warmup.
> I haven't tried ImageNet, and I only recorded top-1 accuracy on resnet20. Will this help?
>
> Dataset | FRN | BN |
> ---|---|---|
> cifar10 | 92.40% | 92.10% |
> cifar100 | 65.0% | 68.70% |
Thanks for the result on cifar :) It helps.
@T1anZhenYu sure
About the result, I made a mistake. Sorry!
Dataset | FRN | BN |
---|---|---|
cifar10 | 91.40% | 92.10% |
cifar100 | 65.0% | 68.70% |

@Charles-Xie