jhgan00 / image-retrieval-transformers


(Unofficial) PyTorch implementation of Training Vision Transformers for Image Retrieval (El-Nouby, Alaaeldin, et al. 2021).


Training Vision Transformers for Image Retrieval

  • (Unofficial) PyTorch implementation of Training Vision Transformers for Image Retrieval (El-Nouby, Alaaeldin, et al. 2021).
  • I have not yet reproduced the exact results reported in the paper: differential entropy regularization has little effect on the In-Shop and SOP datasets.

Requirements

# Python 3.7
pip install -r requirements.txt

Training

  • See scripts/train.*.sh
# CUB-200-2011
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 2000 \
  --dataset cub200 \
  --data-path /data/CUB_200_2011 \
  --rank 1 2 4 8 \
  --lambda-reg 0.7
# Stanford Online Products
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset sop \
  --m 2 \
  --data-path /data/Stanford_Online_Products \
  --rank 1 10 100 1000 \
  --lambda-reg 0.7
# In-shop
python main.py \
  --model deit_small_distilled_patch16_224 \
  --max-iter 35000 \
  --dataset inshop \
  --data-path /data/In-shop \
  --m 2 \
  --rank 1 10 20 30 \
  --memory-ratio 0.2 \
  --device cuda:2 \
  --encoder-momentum 0.999 \
  --lambda-reg 0.7
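
The `--rank` values in the commands above match the cut-offs in the results table, so they appear to be the K values at which Recall@K is reported. A minimal sketch of that metric follows. It assumes a single evaluation set in which every item is used as a query against all the others, as is common for CUB-200 and SOP (In-Shop normally uses separate query and gallery sets). `recall_at_k` is a hypothetical helper written for illustration, not the repository's evaluation code.

```python
import math


def recall_at_k(embeddings, labels, ks=(1, 2, 4, 8)):
    """Fraction of queries with at least one same-label item in the top-k neighbours.

    Every item is a query against all other items; the query itself is excluded.
    """
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    emb = [normalise(v) for v in embeddings]
    hits = {k: 0 for k in ks}
    for i, query in enumerate(emb):
        # Rank all other items by cosine similarity to the query, most similar first.
        scored = [
            (sum(a * b for a, b in zip(query, other)), labels[j])
            for j, other in enumerate(emb)
            if j != i
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        ranked_labels = [label for _, label in scored]
        for k in ks:
            if labels[i] in ranked_labels[:k]:
                hits[k] += 1
    return {k: hits[k] / len(emb) for k in ks}


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_labels = [0, 1, 1, 0]
    print(recall_at_k(toy_embeddings, toy_labels, ks=(1, 2, 3)))
```

The pure-Python loops keep the sketch free of dependencies. For real test sets, the same ranking would normally be done with a batched similarity matrix on the GPU.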

Experiments

  • IRTO – off-the-shelf extraction of features from a ViT backbone, pre-trained on ImageNet;
  • IRTL – fine-tuning a transformer with metric learning, in particular with a contrastive loss;
  • IRTR – additionally regularizing the output feature space to encourage uniformity.
  • †: Models pre-trained with distillation with a convnet trained on ImageNet1k
Recall@K (%)
Method  Backbone |  SOP                          |  CUB-200                      |  In-Shop
                 |  R@1    R@10   R@100  R@1000  |  R@1    R@2    R@4    R@8     |  R@1    R@10   R@20   R@30
IRTO    DeiT-S   |  53.12  68.96  81.60  94.09   |  58.68  71.30  80.96  88.18   |  31.28  57.03  64.20  68.28
IRTL    DeiT-S   |  83.56  93.29  97.23  99.03   |  73.68  82.58  88.77  92.71   |  93.09  98.28  98.74  99.02
IRTR    DeiT-S   |  82.67  92.73  96.69  98.80   |  73.73  82.91  89.30  93.35   |  90.47  97.97  98.61  98.92
IRTR    DeiT-S†  |  82.70  92.85  96.92  98.86   |  76.55  85.26  90.92  94.65   |  90.66  98.16  98.68  98.99
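
The IRTL and IRTR variants listed above differ only in the training objective: IRTL uses a contrastive loss, and IRTR adds a differential entropy regularizer weighted by `--lambda-reg`. The sketch below shows one way to write the two terms in PyTorch, following the formulation in the referenced paper as I understand it. The function names, the `margin=0.5` default, and the `eps` constant are assumptions made for illustration. This is not the repository's implementation.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(z, labels, margin=0.5):
    """Contrastive loss on L2-normalised embeddings z of shape (N, D)."""
    sim = z @ z.t()                                     # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) same-class mask
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (1.0 - sim)[same & ~eye].sum()                # pull positive pairs together
    neg = F.relu(sim - margin)[~same].sum()             # penalise negatives above the margin
    return (pos + neg) / len(z)


def entropy_regularizer(z, eps=1e-8):
    """Negative mean log-distance to the nearest neighbour within the batch.

    Minimising this term pushes each embedding away from its closest neighbour,
    which encourages a more uniform spread over the feature space.
    """
    dist = torch.cdist(z, z)                            # pairwise Euclidean distances
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    dist = dist.masked_fill(eye, float("inf"))          # ignore self-distances
    nearest = dist.min(dim=1).values
    return -torch.log(nearest + eps).mean()


def irtr_loss(features, labels, lambda_reg=0.7, margin=0.5):
    """Contrastive loss plus the weighted regularizer (lambda_reg=0 gives the IRTL-style loss)."""
    z = F.normalize(features, dim=1)
    return contrastive_loss(z, labels, margin) + lambda_reg * entropy_regularizer(z)


if __name__ == "__main__":
    torch.manual_seed(0)
    features = torch.randn(8, 16, requires_grad=True)   # stand-in for backbone outputs
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    loss = irtr_loss(features, labels)
    loss.backward()
    print(float(loss))
```

Setting `lambda_reg=0` removes the regularizer, which makes it easy to compare the two settings on the same batch.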

References

  • El-Nouby, Alaaeldin, et al. "Training vision transformers for image retrieval." arXiv preprint arXiv:2102.05644 (2021).

image-retrieval-transformers's Issues

Particular Object Retrieval

You have implemented category-level retrieval. Have you considered implementing particular object retrieval as well? Thank you!

Can I use a custom dataset?

Hello, I'm trying to train on a custom dataset with unsupervised learning, without labels. Is that possible?
If it is, could you tell me how?

Trained checkpoints

Hi,

Thank you so much for implementing the code. I am wondering if you can share the checkpoints.

Thanks,
Gowthami

Training on multiple GPUs

First of all, thanks for such an amazing and clear implementation of the paper. I want to train the model on 8 GPUs with 24 GB each. Should I only add torch.nn.DataParallel, or do I need to change other code as well? Thanks in advance.

Test code

Could you share the test code?
Thank you!

Performance on Cars196 dataset

Hi, thanks for this great repo! I've tried out a few runs, and they work nicely.

I've also tested this method on the Cars196 dataset, using the same setup as for CUB and a dataset file that is almost identical to the CUB one. However, it performed pretty badly, with R@1=52%.

As it is one of the most evaluated datasets in the deep metric learning community, I wonder if you have any idea why this is the case. Usually, methods that work on CUB and SOP perform at least comparably on Cars196, but that is not the case here. Thanks in advance.
