psclab-asu / learning-in-the-frequency-domain

Python 88.76% Shell 0.17% Dockerfile 0.04% C++ 3.73% Cuda 7.30%

learning-in-the-frequency-domain's Introduction

Notice: This repository is deprecated. Please use https://github.com/calmevtime/DCTNet instead.

Learning in the Frequency Domain

This is the source code for the CVPR'20 paper entitled "Learning in the Frequency Domain" (https://arxiv.org/abs/2002.12416).

Highlights

We propose a method of learning in the frequency domain (using DCT coefficients as input), which requires little modification to existing CNN models that take RGB input. We validate our method on ResNet-50 and MobileNetV2 for the image classification task and on Mask R-CNN for the instance segmentation task.
We show that learning in the frequency domain better preserves image information in the pre-processing stage than the conventional spatial downsampling approach (spatially resizing the images to 224×224, the default input size of most CNN models) and consequently achieves improved accuracy, i.e., +1.41% on ResNet-50 and +0.66% on MobileNetV2 for the ImageNet classification task, +0.8% on Mask R-CNN for both object detection and instance segmentation tasks.
We analyze the spectral bias from the frequency perspective and show that the CNN models are more sensitive to low-frequency channels than high-frequency channels, similar to the human visual system (HVS).
We propose a learning-based dynamic channel selection method to identify the trivial frequency components for static removal during inference. Experimental results on ResNet-50 show that one can prune up to 87.5% of the frequency channels using the proposed channel selection method with little or no accuracy degradation in the ImageNet classification task.
To the best of our knowledge, this is the first work that explores learning in the frequency domain for object detection and instance segmentation. Experimental results on Mask R-CNN show that learning in the frequency domain can achieve a 0.8% average precision improvement for the instance segmentation task on the COCO dataset.
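
To make "using DCT coefficients as input" concrete, here is a minimal pure-Python sketch of one way to turn a single image plane into frequency channels. It assumes JPEG-style 8×8 blocks and a row-major channel order (index u*8+v); both are illustrative assumptions, and the function names are hypothetical rather than taken from this repository. Applied to each of the Y, Cb, and Cr planes, this layout would give 3 × 64 = 192 channels.

```python
import math

BLOCK = 8  # assumed JPEG-style block size


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def block_dct(block, basis):
    """2D DCT of one n x n block, computed as C @ X @ C^T."""
    basis_t = [list(r) for r in zip(*basis)]
    return matmul(matmul(basis, block), basis_t)


def to_frequency_channels(plane):
    """Split an H x W plane into 8x8 blocks and regroup coefficients by frequency.

    Returns 64 channels of size (H/8) x (W/8). Channel u*8+v holds the
    coefficient (u, v) of every block (row-major order, an assumption).
    """
    h, w = len(plane), len(plane[0])
    assert h % BLOCK == 0 and w % BLOCK == 0, "plane must tile into 8x8 blocks"
    basis = dct_matrix()
    bh, bw = h // BLOCK, w // BLOCK
    channels = [[[0.0] * bw for _ in range(bh)] for _ in range(BLOCK * BLOCK)]
    for by in range(bh):
        for bx in range(bw):
            block = [row[bx * BLOCK:(bx + 1) * BLOCK]
                     for row in plane[by * BLOCK:(by + 1) * BLOCK]]
            coeffs = block_dct(block, basis)
            for u in range(BLOCK):
                for v in range(BLOCK):
                    channels[u * BLOCK + v][by][bx] = coeffs[u][v]
    return channels


# A flat 16x16 plane: all energy should land in channel 0 (the DC term).
plane = [[100.0] * 16 for _ in range(16)]
channels = to_frequency_channels(plane)
```

For the flat plane, every block contributes a DC coefficient of 8 × 100 = 800 and the remaining 63 channels are zero up to rounding. A 448×448 plane would give 64 channels of 56×56 each under the same assumptions.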

Please refer to the image classification and instance segmentation sections for more details.

learning-in-the-frequency-domain's Issues

Such a bad open source

The SE module and the Gumbel softmax sampling code are missing.
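
For readers unfamiliar with the missing piece, the following is a generic sketch of Gumbel softmax sampling applied to per-channel on/off gates. It is not the authors' code: the function names, the two-logit (off, on) layout, and the temperature default are assumptions made for illustration. In PyTorch, `torch.nn.functional.gumbel_softmax` provides the same relaxation for tensors.

```python
import math
import random


def gumbel_noise(rng):
    """Sample Gumbel(0, 1) noise by inverse transform: -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = [(value + gumbel_noise(rng)) / tau for value in logits]
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_channel_gates(gate_logits, tau=1.0, seed=0):
    """For each channel, draw a soft (off, on) pair and a hard 0/1 decision.

    gate_logits: list of (off_logit, on_logit) pairs, one per channel
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    soft, hard = [], []
    for off_logit, on_logit in gate_logits:
        probs = gumbel_softmax([off_logit, on_logit], tau, rng)
        soft.append(probs)
        hard.append(1 if probs[1] > probs[0] else 0)
    return soft, hard


# Strongly "on", strongly "off", and undecided channels.
gate_logits = [(-20.0, 20.0), (20.0, -20.0), (0.0, 0.0)]
soft, hard = sample_channel_gates(gate_logits)
```

Channels whose logits strongly favor one side keep that decision regardless of the noise, while the undecided channel flips at random. During training, the soft values would typically carry the gradient while the hard values select channels.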

The mobilenetv2 cannot be evaluated with your pretrained models.

Hi! Thank you for your great work!
I have evaluated the ResNetDCT_Upscaled_Static with your pretrained parameters successfully.
However, I cannot evaluate "mobilenetv2dct_upscaled_subset" with your pretrained parameters (mobilenetv2dct_upscaled_static_24/32), because the parameters do not match the model as defined.
In fact, none of the defined models matches your pretrained parameters.
Did I miss something? I look forward to your reply!

RuntimeError: Error(s) in loading state_dict for MobileNetV2DCT_Upscaled_Subset:
Missing key(s) in state_dict: "upconv_y.0.weight", "upconv_y.1.weight", "upconv_y.1.bias", "upconv_y.1.running_mean", "upconv_y.1.running_var", "upconv_cb.0.weight", "upconv_cb.2.weight", "upconv_cb.2.bias", "upconv_cb.2.running_mean", "upconv_cb.2.running_var", "upconv_cr.0.weight", "upconv_cr.2.weight", "upconv_cr.2.bias", "upconv_cr.2.running_mean", "upconv_cr.2.running_var".
size mismatch for features.0.conv.0.weight: copying a param with shape torch.Size([24, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 1, 3, 3]).
size mismatch for features.0.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.3.weight: copying a param with shape torch.Size([16, 24, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 32, 1, 1]).
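
A quick way to narrow down this kind of failure is to compare parameter names and shapes before calling `load_state_dict`. The helper below is a hypothetical sketch that works on plain `{name: shape}` mappings. With PyTorch, such a mapping could be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, and likewise for the loaded checkpoint, assuming the checkpoint stores a flat state dict.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not define
      mismatched - {name: (checkpoint shape, model shape)} for shared names
    """
    ckpt_names = set(checkpoint_shapes)
    model_names = set(model_shapes)
    missing = sorted(model_names - ckpt_names)
    unexpected = sorted(ckpt_names - model_names)
    mismatched = {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in sorted(ckpt_names & model_names)
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    return missing, unexpected, mismatched


# A few shapes copied from the error message above; the rest is omitted.
checkpoint_shapes = {
    "features.0.conv.0.weight": (24, 1, 3, 3),
    "features.0.conv.1.weight": (24,),
    "features.0.conv.3.weight": (16, 24, 1, 1),
}
model_shapes = {
    # The error does not show this shape; () is only a placeholder.
    "upconv_y.0.weight": (),
    "features.0.conv.0.weight": (32, 1, 3, 3),
    "features.0.conv.1.weight": (32,),
    "features.0.conv.3.weight": (16, 32, 1, 1),
}

missing, unexpected, mismatched = diff_state_dicts(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

In the reported error, every mismatch differs only in a channel dimension of 24 versus 32, and the `upconv_*` layers are absent from the checkpoint. That pattern suggests the checkpoint was saved from a differently configured model, although the source does not confirm which one.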

Calculate mean and std

@calmevtime @renfengbo
Hi, thanks for your great work. How did you obtain the two values train_upscaled_static_mean and train_upscaled_static_std defined in /data/work/supp/classification/datasets/__init__.py?

I sampled 100,000 images (448×448) from ImageNet and used the following transform to compute the mean and std, but my results are quite different (for example, the first value of the mean is -6.0, while it is -8.8 in __init__.py).

mean:

array([-6.03033250e+00,  1.29993835e-02,  1.49537622e-02,  1.51799573e-02,
       -1.13648491e-04,  1.54878503e-02,  1.53720154e-02,  1.53556921e-02,
        2.81915356e-02,  1.60401630e-04, -4.44463577e-04, -1.01026878e-04,
        5.11066914e-05, -1.98013484e-06, -3.41422224e-05, -2.58331299e-07,
        1.10180435e-03,  2.21792049e-04,  1.39479723e-04, -8.06250381e-05,
       -2.07372379e-04, -1.05006399e-04, -5.55165100e-05, -6.47384501e-05,
        2.58280060e-03, -1.80571003e-04,  1.18345385e-04,  1.12984667e-04,
       -9.14287949e-05,  4.99107647e-05,  8.15401077e-05,  9.16138768e-06,
       -2.27722988e-04,  1.57206738e-02,  1.47642676e-02,  1.54191162e-02,
       -4.14411783e-05,  1.51528882e-02,  1.46344177e-02,  1.49200305e-02,
        6.63593979e-04,  4.27199173e-05,  1.16677284e-04,  6.97712183e-06,
        3.13424659e-05, -1.30740047e-05,  4.64571810e-05, -1.42793226e-05,
        9.56597519e-05, -4.12341833e-05, -5.65912151e-05, -4.26276398e-05,
        3.67857361e-05, -2.01084042e-05, -1.99776542e-05, -4.22449780e-05,
        2.81112843e-04, -3.76434875e-05,  6.74299002e-05, -1.70663524e-05,
        7.03380299e-05, -4.41773653e-05, -1.28061640e-05,  2.91422606e-05,
       -4.79707900e+01, -1.56035420e-01,  1.84412866e-02, -1.81954609e-01,
        8.18166809e-04, -2.88985859e-01,  1.66051001e-02, -8.56152500e-01,
        2.55319219e-01,  6.34916153e-04, -6.54642410e-04, -3.67918243e-04,
       -1.46316862e-04,  4.28928223e-04,  8.34166408e-06,  6.26668053e-04,
        3.02419678e-03,  3.90776024e-04,  6.26678009e-04, -2.69588165e-04,
       -4.69866219e-04,  1.35695171e-04, -7.54751015e-05,  2.07719803e-04,
        2.63585474e-02, -4.98884430e-04,  7.94840765e-05,  4.33962746e-04,
       -9.15148163e-05, -4.85011435e-05,  1.71204340e-05,  7.87531233e-05,
        3.61622124e-04,  1.60066968e-02,  1.53226550e-02,  1.57744312e-02,
       -1.57388401e-04,  1.57735291e-02,  1.55607458e-02,  1.59658862e-02,
        7.61185242e-03,  7.60141182e-05,  1.83090401e-04,  2.97386026e-05,
        1.29148855e-04, -2.14893079e-05, -8.30834389e-05, -2.46814802e-06,
       -4.37176418e-05, -7.39859772e-05,  1.27551041e-04,  1.38395560e-05,
        2.68336916e-05,  3.96110487e-05,  4.98725027e-06,  5.02774382e-05,
        2.35398788e-03,  4.50284004e-05,  4.04752493e-05, -7.75732183e-05,
       -3.65114021e-05,  4.00702047e-05, -1.35714030e-05, -6.49227858e-06,
       -4.03517225e+01, -1.11374043e-01,  1.59219690e-02, -1.36966182e-01,
        1.23506355e-05, -2.23436934e-01,  1.54884302e-02, -6.65007500e-01,
        4.89485303e-02,  4.99520540e-05,  2.91033287e-04,  2.53711338e-04,
        7.27137375e-05,  1.61097932e-05, -3.57716489e-05,  4.53832970e-04,
       -4.07430267e-04, -1.03491755e-04, -1.29601517e-04,  6.75728321e-05,
        1.41087322e-04,  3.39652328e-04,  2.33641863e-05,  8.59971695e-04,
        5.30565369e-03,  3.57365084e-05, -7.69644356e-05, -5.08962917e-05,
       -2.04017639e-05,  4.38169384e-05, -6.81887770e-05,  2.54687166e-05,
        1.86980667e-04,  1.57325134e-02,  1.46990674e-02,  1.51832434e-02,
        3.39924145e-05,  1.47439392e-02,  1.39994238e-02,  1.47644763e-02,
        1.93976776e-03,  6.48592949e-06, -5.98660469e-05,  7.06000447e-06,
       -1.71651232e-05,  1.77008224e-05,  1.34183979e-05, -1.78827178e-05,
        6.76107407e-07,  1.38425112e-05,  3.02519321e-05,  4.65274429e-05,
       -2.75446343e-05,  3.41837192e-05,  1.75191140e-05,  1.25331631e-04,
        3.84187164e-04,  2.43143845e-05, -3.01020527e-05,  7.12374985e-06,
       -7.48400331e-06,  5.84180713e-06,  1.81409192e-05,  3.42158675e-06])

std:

array([ 82.38169827,  11.38608322,   5.81470583,   3.52816731,
         2.28901719,   1.49236127,   1.08501744,   0.93430705,
        11.9924049 ,   5.79940342,   3.82653512,   2.57590202,
         1.75848328,   1.19242361,   0.89143787,   0.78188766,
         6.17919321,   3.91159174,   2.92009203,   2.12184516,
         1.50184193,   1.0502063 ,   0.80137738,   0.71105992,
         3.79255235,   2.68489233,   2.16317836,   1.67390197,
         1.2370023 ,   0.89879742,   0.70613112,   0.63874275,
         2.4841856 ,   1.85924305,   1.55564345,   1.25683596,
         0.97185804,   0.74702268,   0.61385155,   0.57088932,
         1.62551123,   1.27050052,   1.09685006,   0.92327635,
         0.7542027 ,   0.62263785,   0.54023429,   0.51832272,
         1.1820139 ,   0.95243114,   0.8406118 ,   0.73182664,
         0.6246161 ,   0.54960161,   0.49839851,   0.48895034,
         1.00303605,   0.82013756,   0.73174628,   0.6499707 ,
         0.57257392,   0.51675503,   0.47800795,   0.47636717,
       259.36079478,  38.44986519,  19.16965258,  10.89434406,
         6.43545545,   3.66728131,   2.2594354 ,   1.67022082,
        40.60415659,  19.00020315,  11.99398764,   7.44093844,
         4.56251835,   2.660099  ,   1.66803618,   1.24568483,
        20.39905466,  12.24901995,   8.65123948,   5.75999435,
         3.63950595,   2.17124496,   1.38173622,   1.04991463,
        11.63568499,   7.69741319,   5.84057403,   4.13102437,
         2.70702985,   1.66560499,   1.09088518,   0.85387174,
         6.88068855,   4.7445341 ,   3.71515009,   2.7268528 ,
         1.85518841,   1.19665103,   0.82843158,   0.68215483,
         3.91339393,   2.77513896,   2.22651773,   1.68565288,
         1.20127373,   0.83837315,   0.63520696,   0.56045151,
         2.40690855,   1.73863335,   1.41733271,   1.10696651,
         0.83259191,   0.63906468,   0.53207093,   0.49742473,
         1.73745029,   1.28346151,   1.06528646,   0.85709265,
         0.67805348,   0.5552189 ,   0.4881204 ,   0.47174586,
       122.72236084,  15.16835772,   7.08301888,   3.80431485,
         2.17961708,   1.25995895,   0.8367607 ,   0.77972808,
        16.04436612,   7.17905356,   4.3238403 ,   2.56993457,
         1.55470093,   0.94589099,   0.66617143,   0.58821462,
         7.58912016,   4.44666898,   3.04147918,   1.96773926,
         1.24167036,   0.79642635,   0.58626616,   0.53091011,
         4.13914657,   2.71065144,   2.01906855,   1.41162135,
         0.94801087,   0.65401318,   0.51175742,   0.48173046,
         2.3989198 ,   1.66442266,   1.29721596,   0.96672071,
         0.70264342,   0.53906953,   0.45475156,   0.44528184,
         1.38069485,   1.01322971,   0.83189345,   0.67026537,
         0.54118055,   0.46403401,   0.41805331,   0.42281751,
         0.90856605,   0.70672666,   0.60918523,   0.52767384,
         0.45933229,   0.42762222,   0.40134326,   0.41231353,
         0.71600727,   0.58663444,   0.52283646,   0.47365366,
         0.4332441 ,   0.41275816,   0.39301931,   0.40753712])
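
For reference, per-channel statistics like the ones above can be accumulated in a single streaming pass without keeping all images in memory. The sketch below is one possible approach, not the procedure used to produce the values in this repository; the class name and input layout are hypothetical. It does not explain the discrepancy reported above, which may depend on details such as the sampled subset or the value range before the DCT.

```python
import math


class ChannelStats:
    """Accumulate per-channel sums and sums of squares across many images."""

    def __init__(self, num_channels):
        self.count = 0  # number of values seen per channel
        self.total = [0.0] * num_channels
        self.total_sq = [0.0] * num_channels

    def update(self, channels):
        """channels: one flat list of coefficient values per channel."""
        n = len(channels[0])
        for c, values in enumerate(channels):
            assert len(values) == n, "all channels must have the same size"
            self.total[c] += sum(values)
            self.total_sq[c] += sum(v * v for v in values)
        self.count += n

    def finalize(self):
        """Return (mean, std) lists using the population variance."""
        mean = [t / self.count for t in self.total]
        # Var[x] = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
        std = [math.sqrt(max(sq / self.count - m * m, 0.0))
               for sq, m in zip(self.total_sq, mean)]
        return mean, std


# Two tiny "images" with two channels each.
stats = ChannelStats(num_channels=2)
stats.update([[1.0, 2.0], [10.0, 10.0]])
stats.update([[3.0, 4.0], [10.0, 10.0]])
mean, std = stats.finalize()
```

Here channel 0 has mean 2.5 and standard deviation sqrt(1.25), and channel 1 has mean 10 and standard deviation 0. The sum-of-squares formula is simple but can lose precision when the mean is large relative to the spread; Welford's algorithm is a more stable alternative.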

SE_ResNet50DCT() is not defined

Hi! The function SE_ResNet50DCT(), called in classification/models/imagenet/resnet.py (line 479), does not seem to be defined. Where can I find it?

Request for pre-trained model

Thanks for your great work. Could you share the model pre-trained on ImageNet?
