jhgan00 / image-retrieval-transformers

(Unofficial) PyTorch implementation of Training Vision Transformers for Image Retrieval (El-Nouby, Alaaeldin, et al. 2021).

Training Vision Transformers for Image Retrieval

I have not yet reproduced the results reported in the paper exactly: differential entropy regularization has little effect on the In-shop and SOP datasets.

Requirements

# Python 3.7
pip install -r requirements.txt

Training

See scripts/train.*.sh

CUB-200-2011

# CUB-200-2011
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 2000 \
  --dataset cub200 \
  --data-path /data/CUB_200_2011 \
  --rank 1 2 4 8 \
  --lambda-reg 0.7

Stanford Online Products

# Stanford Online Products
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset sop \
  --m 2 \
  --data-path /data/Stanford_Online_Products \
  --rank 1 10 100 1000 \
  --lambda-reg 0.7

In-shop

# In-shop
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset inshop \
  --data-path /data/In-shop \
  --m 2 \
  --rank 1 10 20 30 \
  --memory-ratio 0.2 \
  --device cuda:2 \
  --encoder-momentum 0.999 \
  --lambda-reg 0.7
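
The three commands above share one set of flags. As a rough illustration only, here is how flags with these names could be declared with `argparse`. This is not the repository's `main.py`: the flag names are copied from the commands above, while the types and the `build_parser` helper are assumptions made for this sketch.

```python
import argparse


def build_parser():
    # Hypothetical parser: flag names come from the commands above,
    # types are guesses based on the values passed to them.
    parser = argparse.ArgumentParser(description="Training flags (illustrative sketch)")
    parser.add_argument("--model", type=str)
    parser.add_argument("--max-iter", type=int)
    parser.add_argument("--dataset", type=str)
    parser.add_argument("--data-path", type=str)
    parser.add_argument("--m", type=int)
    # Several K values are passed after a single --rank flag.
    parser.add_argument("--rank", type=int, nargs="+")
    parser.add_argument("--lambda-reg", type=float)
    parser.add_argument("--memory-ratio", type=float)
    parser.add_argument("--encoder-momentum", type=float)
    parser.add_argument("--device", type=str)
    return parser


# Parse the CUB-200-2011 command from above, given as an explicit list.
args = build_parser().parse_args([
    "--model", "deit_small_distilled_patch16_224",
    "--max-iter", "2000",
    "--dataset", "cub200",
    "--data-path", "/data/CUB_200_2011",
    "--rank", "1", "2", "4", "8",
    "--lambda-reg", "0.7",
])
print(args.rank, args.lambda_reg)
```

Flags that a command omits (for example `--m` in the CUB-200-2011 command) come back as `None` in this sketch; the real script may define its own defaults.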

Experiments

IRT_O – off-the-shelf extraction of features from a ViT backbone, pre-trained on ImageNet;

IRT_L – fine-tuning a transformer with metric learning, in particular with a contrastive loss;

IRT_R – additionally regularizing the output feature space to encourage uniformity.
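
To make the difference between IRT_L and IRT_R concrete, here is a minimal pure-Python sketch of a contrastive loss and a differential entropy (nearest-neighbour) regularizer of the kind described in the paper. It is not the code in this repository: it works on a single small batch of already L2-normalized embeddings, ignores the momentum encoder and memory used in training, and the function names, the margin of 0.5, and the use of `lambda_reg` as the regularizer weight (matching the `--lambda-reg 0.7` value in the commands above) are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(z, labels, margin=0.5):
    """Pull same-label pairs together; penalize different-label pairs
    whose cosine similarity exceeds the margin (assumed value)."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sim = dot(z[i], z[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim
            else:
                total += max(0.0, sim - margin)
    return total / n


def entropy_regularizer(z, eps=1e-8):
    """Negative mean log distance to the nearest neighbour in the batch.
    Minimizing it pushes each point away from its closest neighbour,
    which encourages a more uniform spread of embeddings."""
    n = len(z)
    total = 0.0
    for i in range(n):
        nearest = min(euclidean(z[i], z[j]) for j in range(n) if j != i)
        total += math.log(nearest + eps)
    return -total / n


def total_loss(z, labels, lambda_reg=0.7, margin=0.5):
    # IRT_L corresponds to lambda_reg = 0; IRT_R adds the regularizer.
    return contrastive_loss(z, labels, margin) + lambda_reg * entropy_regularizer(z)


z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
labels = [0, 0, 1]
print(contrastive_loss(z, labels), entropy_regularizer(z), total_loss(z, labels))
```

The regularizer only looks at the single nearest neighbour of each point, so it is cheap to compute and does not need labels.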

†: Models pre-trained with distillation from a convnet trained on ImageNet-1k.

Method	Backbone	SOP R@1	SOP R@10	SOP R@100	SOP R@1000	CUB-200 R@1	CUB-200 R@2	CUB-200 R@4	CUB-200 R@8	In-Shop R@1	In-Shop R@10	In-Shop R@20	In-Shop R@30
IRT_O	DeiT-S	53.12	68.96	81.60	94.09	58.68	71.30	80.96	88.18	31.28	57.03	64.20	68.28
IRT_L	DeiT-S	83.56	93.29	97.23	99.03	73.68	82.58	88.77	92.71	93.09	98.28	98.74	99.02
IRT_R	DeiT-S	82.67	92.73	96.69	98.80	73.73	82.91	89.30	93.35	90.47	97.97	98.61	98.92
IRT_R	DeiT-S†	82.70	92.85	96.92	98.86	76.55	85.26	90.92	94.65	90.66	98.16	98.68	98.99
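
The numeric columns in the table are Recall@K values, with K taken from the `--rank` arguments in the training commands. As a small sketch of how such a number can be computed (not the repository's evaluation code; the function names are made up for this example), a query counts as a hit at K if at least one of its K nearest neighbours by cosine similarity shares its label. The sketch assumes the test set serves as both query and gallery set, with each query excluded from its own ranking; In-Shop uses separate query and gallery splits, which this sketch does not handle.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recall_at_k(embeddings, labels, ks):
    """Return {k: percentage of queries with a same-label item in the top k}."""
    feats = [l2_normalize(v) for v in embeddings]
    hits = {k: 0 for k in ks}
    for i, query in enumerate(feats):
        # Cosine similarity to every other item; the query itself is skipped.
        sims = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(feats)
            if j != i
        ]
        sims.sort(reverse=True)
        ranked_labels = [labels[j] for _, j in sims]
        for k in ks:
            if labels[i] in ranked_labels[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(feats) for k in ks}


# Toy data: the last item has label 0 but sits closer to the label-1 items.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
labels = [0, 0, 1, 1, 0]
print(recall_at_k(embeddings, labels, ks=[1, 2, 4]))
# {1: 80.0, 2: 80.0, 4: 100.0}
```

This brute-force version compares every pair and is only meant for small examples; a real evaluation would compute the similarity matrix in batches on the GPU.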

References

El-Nouby, Alaaeldin, et al. "Training vision transformers for image retrieval." arXiv preprint arXiv:2102.05644 (2021).

image-retrieval-transformers's Issues

Particular Object Retrieval

You have implemented category-level retrieval. Have you considered implementing particular object retrieval as well? Thank you!

Would you mind sharing the data preprocessing?

As there is no "/data" directory, would you mind sharing this part, or the data preprocessing steps? Thanks!

Can I use a custom dataset?

Hello, I'm trying to train on a custom dataset with unsupervised learning, without labels. Is that possible?
If so, could you tell me how?

Trained checkpoints

Hi,

Thank you so much for implementing the code. I am wondering if you can share the checkpoints.

Thanks,
Gowthami

Training on multiple GPUs

Firstly, thanks for such an amazing and clear implementation of the paper. I want to train the model on 8 GPUs with 24GB. Should I only add torch.nn.DataParallel, or should I change other code as well? Thanks in advance.

Test code

Can I have the test section?
Thank you!

Performance on Cars196 dataset

Hi, thanks for this great repo! I've tried out a few runs, and they work nicely.

I've also tested this method on the Cars196 dataset (with the same setup as CUB; I also wrote a dataset file for it, which is almost the same). However, it performed pretty badly, with R@1=52%.

As it is one of the most evaluated datasets in the deep metric learning community, I wonder if you have any idea why this is the case. Usually, if a method works on CUB and SOP, it performs at least comparably on Cars196, and that is not the case here. Thanks in advance.

Can you provide test code?

I need test code to retrieve similar pictures. I would be very grateful if you could provide it!
