
naver / cgd

Combination of Multiple Global Descriptors for Image Retrieval

Home Page: https://arxiv.org/abs/1903.10663

License: Apache License 2.0

Languages: Python 93.47%, Shell 6.53%
Topics: image-retrieval, global-descriptor, cgd, mxnet, cbir

cgd's Introduction

Combination of Multiple Global Descriptors for Image Retrieval

This is the repository to reproduce the results of our paper "Combination of Multiple Global Descriptors for Image Retrieval".

HeeJae Jun*, Byungsoo Ko*, Youngjoon Kim, Insik Kim, Jongtack Kim (* Authors contributed equally.)

@NAVER/LINE Vision

Approach
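
As described in the paper, CGD extracts several global descriptors from a shared CNN backbone, namely SPoC (average pooling), MAC (max pooling), and GeM (generalized-mean pooling), projects each one with an FC layer, l2-normalizes it, and concatenates the branches into a single embedding trained with a ranking loss plus an auxiliary classification loss. The following is a minimal sketch of how these descriptors can be pooled and combined using MXNet NDArrays; it is not the repository's actual implementation, and the function names and the FC-layer argument are illustrative only.

from mxnet import nd

def spoc(x):
    # Sum/average pooling descriptor: (N, C, H, W) -> (N, C)
    return x.mean(axis=(2, 3))

def mac(x):
    # Maximum activation of convolutions descriptor
    return x.max(axis=(2, 3))

def gem(x, p=3.0, eps=1e-6):
    # Generalized-mean pooling; p=1 recovers SPoC, large p approaches MAC
    return ((nd.maximum(x, eps) ** p).mean(axis=(2, 3))) ** (1.0 / p)

def l2_normalize(v, eps=1e-12):
    # Divide each row by its l2 norm
    return v / (nd.sqrt((v * v).sum(axis=1, keepdims=True)) + eps)

def combined_descriptor(feature_map, fc_layers):
    # fc_layers: one gluon.nn.Dense per branch (hypothetical helper argument).
    # Each branch is pooled, projected, and l2-normalized, then the branches are
    # concatenated and l2-normalized again to form the final embedding.
    branches = [l2_normalize(fc(pool(feature_map)))
                for pool, fc in zip((spoc, mac, gem), fc_layers)]
    return l2_normalize(nd.concat(*branches, dim=1))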

Prerequisites

  • Python 2.7 or above
  • MXNet-1.4.0 or above
  • Numpy and tqdm
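
The Python dependencies are not pinned by the repository; a typical installation (the exact MXNet build, e.g. a CUDA-enabled variant, may differ) would be:

$ pip install mxnet numpy tqdm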

Usage

Download dataset

$ bash download.sh cub200

Extract pre-trained model

$ tar zxvf ./checkpoints/CGD.CUB200.C_concat_MG.ResNet50v.dim1536.tar.gz -C ./checkpoints/

Test

$ python test.py
usage: test.py [-h] [--image-width IMAGE_WIDTH] [--image-height IMAGE_HEIGHT]
               [--batch-size BATCH_SIZE] [--num-workers NUM_WORKERS]
               [--recallk RECALLK] [--data-dir DATA_DIR]
               [--train-txt TRAIN_TXT] [--test-txt TEST_TXT]
               [--bbox-txt BBOX_TXT] --pretrained-model PRETRAINED_MODEL
               [--gpu GPU]
$ python test.py --pretrained-model=checkpoints/CGD.CUB200.C_concat_MG.ResNet50v.dim1536
...
R@   1: 0.7681
R@   2: 0.8484
R@   4: 0.9060
R@   8: 0.9433
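
For reference, Recall@K on this benchmark is usually computed by checking, for each query image, whether at least one of its K nearest neighbors (excluding the query itself) shares the query's class label. The sketch below shows that metric in NumPy over l2-normalized embeddings; it is not the repository's test.py, and the function name is illustrative.

import numpy as np

def recall_at_k(embeddings, labels, ks=(1, 2, 4, 8)):
    # embeddings: (N, D) l2-normalized float array; labels: (N,) integer class ids
    sims = embeddings @ embeddings.T              # cosine similarity for l2-normalized rows
    np.fill_diagonal(sims, -np.inf)               # never retrieve the query itself
    ranked = np.argsort(-sims, axis=1)[:, :max(ks)]
    hits = labels[ranked] == labels[:, None]      # (N, max(ks)) boolean hit matrix
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}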

Citation

@article{jun2019combination,
  title={Combination of Multiple Global Descriptors for Image Retrieval},
  author={Jun, HeeJae and Ko, ByungSoo and Kim, Youngjoon and Kim, Insik and Kim, Jongtack},
  journal={arXiv preprint arXiv:1903.10663},
  year={2019}
}

License

Copyright 2019-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

cgd's People

Contributors

kis2u, kobiso


cgd's Issues

Some questions about the paper

You have done a very nice job on this paper! I have been trying to implement the proposed network, but I ran into a few problems.

  • First, the FC layer after the descriptor consistently hurts my results: on CUB200-2011 I get a top-1 recall of 72 on the final l2-normalized layer, but 76 on the GD(1) layer (under the MG configuration). Could this be caused by the auxiliary classification branch? Should the ranking-loss branch be given a larger loss weight?

  • Second, how many iterations do you train for, and what is your learning-rate schedule? (I use 4000 iterations with Adam, dividing the learning rate by 10 at iterations 1000, 2000, and 3000.)

  • Third, do you use a bias on every FC layer?

  • Finally, do you freeze the BN layers in the backbone?

Inshop Data - Training Auxiliary Classifier - which categories?

Hello guys!

When you trained the model on the In-Shop dataset, which categories did you use for the classification loss?

Were they the top-level categories such as Blouses_Shirts, Cardigans, etc., or the more granular item IDs such as id_00000001, id_00000271, etc.?

I presume the former pulls broad categories together in the latent space, while the latter guides training to bring embeddings of the same or similar garments closer together (e.g., a red polo and a blue polo). But I would like to know how the experiment was actually conducted.

Can someone please help me with this issue?

Thanks & Regards.

Evaluation for other benchmarks

Hi, could you tell me which datasets your models are trained on? Have you evaluated them on other benchmarks such as UKBench, Holidays, Oxford-5k, or Paris-6k? If not, do you intend to test your models on these datasets so that they can be compared with the SOTA models?
Thank you in advance for your reply.

why ResNet-50

Thank you for your great work on this paper!
1. Why does ResNet-50 give the best results among the CNN backbones?

2. In the backbone network without downsampling, the input is 224x224x3 and the output is 14x14x1536, right?

About embedding

Hi, I noticed in Table 5 of your paper that the original embedding dimension is 1536, but you reduce it to 768/512 afterwards. Which method do you use for this dimensionality reduction: is it PCA or just a simple reshape?
Looking forward to your reply. Thanks!
