Apply a pre-trained Vision Transformer to retrieve images
An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images.
Given a collection of source images and a query image, build an image retrieval system that finds the source images most relevant or similar to the query.
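Once every image has been turned into a feature vector, retrieval reduces to ranking the source images by their similarity to the query. The sketch below assumes the features are already computed and uses cosine similarity as the ranking score. Both the function name `retrieve_top_k` and the choice of cosine similarity are illustrative assumptions, not a prescribed method.

```python
import numpy as np

def retrieve_top_k(query_feat: np.ndarray, gallery_feats: np.ndarray, k: int = 5):
    """Return indices and cosine similarities of the k gallery images closest to the query.

    query_feat:    shape (D,)   feature vector of the query image
    gallery_feats: shape (N, D) feature vectors of the N source images
    """
    # L2-normalize so that a dot product equals cosine similarity
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + 1e-12)
    scores = g @ q                   # similarity of each source image to the query, shape (N,)
    order = np.argsort(-scores)[:k]  # indices sorted from most to least similar
    return order, scores[order]

gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, sims = retrieve_top_k(np.array([1.0, 0.1]), gallery, k=2)
print(idx)  # [0 2]
```

Any feature extractor can be plugged in front of this ranking step; only the vectors change.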
Feature Extractor: A component that transforms raw image data into meaningful feature vectors (in modern systems, typically a neural network that learns these features automatically). Some well-known extractors are:
- Digital Image Processing: HOG, SIFT...
- Traditional Machine Learning: Bag of Visual Words, PCA...
- Deep Learning: CNNs, Autoencoders, ViT...
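To make the idea of a feature extractor concrete, the sketch below implements a very simple hand-crafted descriptor: a per-channel color histogram. It is not HOG or SIFT; it is an assumed stand-in that only illustrates the key property shared by all the extractors listed above, namely mapping an image of any size to a fixed-length vector.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel intensity histograms into one L1-normalized vector.

    image:   uint8 array of shape (H, W, 3)
    returns: float vector of length 3 * bins, independent of H and W
    """
    feats = []
    for c in range(image.shape[2]):
        # Count pixels of channel c falling into each of `bins` equal-width ranges over [0, 256)
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist)
    vec = np.concatenate(feats).astype(np.float64)
    return vec / vec.sum()  # normalize so images of different sizes are comparable

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255  # a pure red image
print(color_histogram(img).shape)  # (24,)
```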
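Vision Transformer (ViT): A deep learning architecture originally designed for image classification. Instead of relying on convolutions like traditional CNNs, ViT splits an image into fixed-size patches, linearly embeds each patch, and processes the resulting sequence of patch embeddings with a standard Transformer encoder.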
- Paper: https://arxiv.org/pdf/2010.11929.pdf
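The first step of ViT, turning an image into a sequence of patch tokens, can be sketched in NumPy as follows. This assumes a 224×224 RGB input and 16×16 patches (the ViT-B/16 setting), which gives (224/16)² = 196 patches of 16·16·3 = 768 values each. The random matrix `W` is a hypothetical stand-in for the learned linear patch projection; the real model also prepends a learnable class token and adds position embeddings, which are omitted here.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches and flatten each one.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row across the image.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "H and W must be divisible by patch_size"
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch_size, patch_size, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)
# Hypothetical random projection standing in for ViT's learned linear patch embedding
W = rng.standard_normal((tokens.shape[1], 768)) * 0.02
embeddings = tokens @ W
print(tokens.shape, embeddings.shape)  # (196, 768) (196, 768)
```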
Download the dataset (or, if using Google Colab, add a shortcut to it in your Google Drive): here
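The dataset's folder layout is not described here, so the following is only a sketch assuming the images are stored as individual files somewhere under a single root directory. It collects their paths so features can later be extracted for every source image. The extension set and the function name `list_images` are assumptions to adapt to the actual dataset.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to match the actual dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def list_images(root: str) -> list[Path]:
    """Recursively collect image file paths under `root`, sorted for a reproducible order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

For example, `list_images("data/train")` (a hypothetical path) would return every matching image below that folder, including those in sub-folders.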