Looking at the author's code, I see that WSConv2d uses pad_mode 'same'. PyTorch's conv2d doesn't have a pad_mode argument, and I would expect the padding to be greater than 0, yet it is always 0. Could you explain why?
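For context, here is a minimal, torch-free sketch of the 'same' padding arithmetic (function names are my own, not from the repo). It shows one case where padding 0 is still 'same': a 1x1 kernel at stride 1 preserves the spatial size with no padding at all.

```python
def same_padding(kernel_size, dilation=1):
    # Symmetric padding that keeps the spatial size unchanged
    # for a stride-1 convolution ("same" padding).
    effective = dilation * (kernel_size - 1) + 1
    return (effective - 1) // 2

def conv_out_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    # The standard conv output-size formula PyTorch uses.
    effective = dilation * (kernel_size - 1) + 1
    return (in_size + 2 * padding - effective) // stride + 1

# A 1x1 kernel needs no padding to be "same":
assert same_padding(1) == 0
# A 3x3 kernel needs padding 1 to preserve a 32-wide input:
assert conv_out_size(32, 3, padding=same_padding(3)) == 32
```

Note also that since PyTorch 1.9, `nn.Conv2d` accepts `padding='same'` directly (for stride 1), so an explicit pad_mode argument may not be needed.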
Also, I see that in train.py the learning rate is constant. Why is that?
Thank you!
Hi, I noticed that the AveragePool ('pool' layer) is not used in the forward function.
Instead, forward uses torch.mean, so removing the layer doesn't change the pooling behavior.
I tried using this model as a feature extractor and was a bit confused for a moment.
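To illustrate the point, the two operations compute the same thing: averaging a channel's full spatial map (what `torch.mean(x, dim=(2, 3))` does) is exactly what a global average pool produces. A minimal pure-Python sketch (my own names, torch-free for illustration):

```python
def global_avg_pool(feature_map):
    # Average over every spatial position of one channel --
    # equivalent to torch.mean over the H and W dimensions,
    # or to an AvgPool2d whose kernel covers the whole map.
    flat = [v for row in feature_map for v in row]
    return sum(flat) / len(flat)

channel = [[1.0, 2.0],
           [3.0, 4.0]]
# Both formulations yield the same scalar per channel:
assert global_avg_pool(channel) == 2.5
```

So the unused 'pool' attribute is dead weight as far as the forward pass goes, which is presumably why removing it changes nothing.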
Thanks for the great work on the pytorch implementation of NFNet! The accuracies achieved by this implementation are pretty impressive also and I am wondering if these training results were simply derived from the training script, that is, without data augmentation.