
taesiri / zoomisallyouneed

36 stars · 5 watchers · 2 forks · 112.31 MB

Official code and data for NeurIPS 2023 paper "ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification"

Home Page: https://taesiri.github.io/ZoomIsAllYouNeed/

License: MIT License

Languages: Jupyter Notebook 99.11% · Python 0.88% · Shell 0.01%
Topics: image-recognition · imagenet · neurips · object-detection · ood · out-of-distribution · imagenet-hard

zoomisallyouneed's Introduction

ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification


Abstract

Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to first zoom to the most discriminative region in the image and then extract features from there to predict image labels, discarding the rest of the image. Studying six popular networks ranging from AlexNet to CLIP, we find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zooming, we propose a test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art (SOTA) TTA method. We introduce ImageNet-Hard, a new benchmark that challenges SOTA classifiers including large vision-language models even when optimal zooming is allowed.
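As a concrete illustration of the zoom-based TTA idea described above, here is a minimal sketch, not the authors' implementation: the crop fractions, the use of center crops, and the averaging of softmax scores over crops are all assumptions made for illustration.

```python
import torch
import torchvision.transforms.functional as TF

def zoom_tta_predict(model, image, crop_fracs=(1.0, 0.8, 0.6), out_size=224):
    """Hypothetical zoom-based TTA: classify several center crops
    ("zoom levels") of a CHW image tensor and average the softmax scores."""
    probs = []
    for frac in crop_fracs:
        side = int(min(image.shape[-2:]) * frac)
        crop = TF.center_crop(image, [side, side])            # zoom in
        crop = TF.resize(crop, [out_size, out_size], antialias=True)
        with torch.no_grad():
            probs.append(model(crop.unsqueeze(0)).softmax(dim=-1))
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)     # aggregated prediction
```

The paper's actual method selects zooms more deliberately; this sketch only shows the general shape of a crop-then-aggregate TTA loop.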

[Video: animation.mp4]

ImageNet-Hard

ImageNet-Hard is a new benchmark comprising challenging images curated from the validation sets of several ImageNet-scale datasets. It challenges state-of-the-art vision models because merely zooming in often fails to improve their classification accuracy. Consequently, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle on this dataset, achieving only 2.02% accuracy.

The ImageNet-Hard dataset is available to access and browse on Hugging Face (a minimal loading sketch follows the list):

  • ImageNet-Hard Hugging Face Dataset
  • ImageNet-Hard-4K Hugging Face Dataset
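The snippet below is a minimal loading sketch using the Hugging Face datasets library; the hub ID taesiri/imagenet-hard is taken from the links above, and the exact splits and column names should be verified against the dataset card.

```python
from datasets import load_dataset

# Hub ID assumed from the links above; splits and column names
# should be checked against the dataset card.
ds = load_dataset("taesiri/imagenet-hard")
print(ds)  # shows the available splits and features
```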

Dataset Distribution

[Figure: Dataset Distribution]

Performance Report

Model                    Accuracy (%)
AlexNet                          7.34
VGG-16                          12.00
ResNet-18                       10.86
ResNet-50                       14.74
ViT-B/32                        18.52
EfficientNet-B0                 16.57
EfficientNet-B7                 23.20
EfficientNet-L2-NS              39.00
CLIP-ViT-L/14@224px              1.86
CLIP-ViT-L/14@336px              2.02
OpenCLIP-ViT-bigG-14            15.93
OpenCLIP-ViT-L-14               15.60

Evaluation Code
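The repository's own scripts are the authoritative way to reproduce the numbers above. The snippet below is only a rough sketch of such an evaluation loop, assuming a validation split with image and label columns (where label may hold several acceptable class indices) and using a torchvision ResNet-50 as the model under test.

```python
import torch
from datasets import load_dataset
from torchvision import models

# Assumed: hub ID, split name, and image/label column names.
ds = load_dataset("taesiri/imagenet-hard", split="validation")
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # the weights' matching preprocessing

correct = 0
for ex in ds:
    x = preprocess(ex["image"].convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        pred = model(x).argmax(dim=-1).item()
    # Some images may list several acceptable labels.
    labels = ex["label"] if isinstance(ex["label"], list) else [ex["label"]]
    correct += pred in labels
print(f"accuracy: {correct / len(ds):.2%}")
```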

Supplementary Material

You can find all the supplementary material on Google Drive.

Citation information

If you use this software, please consider citing:

@inproceedings{taesiri2023zoom,
  title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
  author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

zoomisallyouneed's People

Contributors

anguyen8 · taesiri


Forkers

paperwave

zoomisallyouneed's Issues

How to achieve 98.91% ImageNet accuracy

Hi authors!

Congratulations on this amazing work; very interesting indeed. I was reading the paper and the following caught my attention:

" ...we find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images."

I assume 98.91% denotes top-1 accuracy. However, according to a study from MIT, at least 6% of ImageNet validation-set labels are incorrect. I wonder how the authors would reconcile this discrepancy.
