
naver / cgd

Combination of Multiple Global Descriptors for Image Retrieval

Home Page: https://arxiv.org/abs/1903.10663

License: Apache License 2.0

Languages: Python 93.47%, Shell 6.53%
Topics: image-retrieval, global-descriptor, cgd, mxnet, cbir

cgd's Introduction

Combination of Multiple Global Descriptors for Image Retrieval

This is the repository to reproduce the results of our paper "Combination of Multiple Global Descriptors for Image Retrieval".

HeeJae Jun*, Byungsoo Ko*, Youngjoon Kim, Insik Kim, Jongtack Kim (* Authors contributed equally.)

@NAVER/LINE Vision

Approach
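
As described in the paper, CGD extracts several global descriptors from a shared CNN backbone, namely SPoC (average pooling), MAC (max pooling), and GeM (generalized-mean pooling), projects each one with an FC layer, l2-normalizes it, and concatenates the branches into a single embedding trained with a ranking loss plus an auxiliary classification loss. The following is a minimal sketch of how these descriptors can be pooled and combined using MXNet NDArrays; it is not the repository's actual implementation, and the function names and the FC-layer argument are illustrative only.

from mxnet import nd

def spoc(x):
    # Sum/average pooling descriptor: (N, C, H, W) -> (N, C)
    return x.mean(axis=(2, 3))

def mac(x):
    # Maximum activation of convolutions descriptor
    return x.max(axis=(2, 3))

def gem(x, p=3.0, eps=1e-6):
    # Generalized-mean pooling; p=1 recovers SPoC, large p approaches MAC
    return ((nd.maximum(x, eps) ** p).mean(axis=(2, 3))) ** (1.0 / p)

def l2_normalize(v, eps=1e-12):
    # Divide each row by its l2 norm
    return v / (nd.sqrt((v * v).sum(axis=1, keepdims=True)) + eps)

def combined_descriptor(feature_map, fc_layers):
    # fc_layers: one gluon.nn.Dense per branch (hypothetical helper argument).
    # Each branch is pooled, projected, and l2-normalized, then the branches are
    # concatenated and l2-normalized again to form the final embedding.
    branches = [l2_normalize(fc(pool(feature_map)))
                for pool, fc in zip((spoc, mac, gem), fc_layers)]
    return l2_normalize(nd.concat(*branches, dim=1))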

Prerequisites

  • Python 2.7 or above
  • MXNet-1.4.0 or above
  • Numpy and tqdm
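
The Python dependencies are not pinned by the repository; a typical installation (the exact MXNet build, e.g. a CUDA-enabled variant, may differ) would be:

$ pip install mxnet numpy tqdm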

Usage

Download dataset

$ bash download.sh cub200

Extract pre-trained model

$ tar zxvf ./checkpoints/CGD.CUB200.C_concat_MG.ResNet50v.dim1536.tar.gz -C ./checkpoints/

Test

$ python test.py
usage: test.py [-h] [--image-width IMAGE_WIDTH] [--image-height IMAGE_HEIGHT]
               [--batch-size BATCH_SIZE] [--num-workers NUM_WORKERS]
               [--recallk RECALLK] [--data-dir DATA_DIR]
               [--train-txt TRAIN_TXT] [--test-txt TEST_TXT]
               [--bbox-txt BBOX_TXT] --pretrained-model PRETRAINED_MODEL
               [--gpu GPU]
$ python test.py --pretrained-model=checkpoints/CGD.CUB200.C_concat_MG.ResNet50v.dim1536
...
R@   1: 0.7681
R@   2: 0.8484
R@   4: 0.9060
R@   8: 0.9433
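
For reference, Recall@K on this benchmark is usually computed by checking, for each query image, whether at least one of its K nearest neighbors (excluding the query itself) shares the query's class label. The sketch below shows that metric in NumPy over l2-normalized embeddings; it is not the repository's test.py, and the function name is illustrative.

import numpy as np

def recall_at_k(embeddings, labels, ks=(1, 2, 4, 8)):
    # embeddings: (N, D) l2-normalized float array; labels: (N,) integer class ids
    sims = embeddings @ embeddings.T              # cosine similarity for l2-normalized rows
    np.fill_diagonal(sims, -np.inf)               # never retrieve the query itself
    ranked = np.argsort(-sims, axis=1)[:, :max(ks)]
    hits = labels[ranked] == labels[:, None]      # (N, max(ks)) boolean hit matrix
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}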

Citation

@article{jun2019combination,
  title={Combination of Multiple Global Descriptors for Image Retrieval},
  author={Jun, HeeJae and Ko, ByungSoo and Kim, Youngjoon and Kim, Insik and Kim, Jongtack},
  journal={arXiv preprint arXiv:1903.10663},
  year={2019}
}

License

Copyright 2019-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

cgd's People

Contributors

kis2u, kobiso


cgd's Issues

Some questions about the paper

You have done a very nice job on this paper! I have been trying to implement the proposed network, but I ran into a few problems.

  • First, the FC layer after the descriptor consistently hurts my results: on CUB200-2011 I get a top-1 recall of 72 on the final l2-normalized layer, but 76 on the GD(1) layer (under the MG configuration). Could this be caused by the auxiliary classification branch? Should the ranking-loss branch be given a larger loss weight?

  • Second, how many iterations do you train for, and what is your learning-rate schedule? (I use 4000 iterations with Adam, dividing the learning rate by 10 at iterations 1000, 2000, and 3000.)

  • Third, do you use a bias on every FC layer?

  • Finally, do you freeze the BN layers in the backbone?

Inshop Data - Training Auxiliary Classifier - which categories?

Hello guys!

When you trained the model on the In-Shop dataset, which categories did you use for the classification loss?

Were they the top-level categories such as Blouses_Shirts, Cardigans, etc., or the more granular item IDs such as id_00000001, id_00000271, etc.?

I presume the former pulls broad categories together in the latent space, while the latter guides training to bring embeddings of the same or similar garments closer together (e.g., a red polo and a blue polo). But I would like to know how the experiment was actually conducted.

Can someone please help me with this issue?

Thanks & Regards.

Evaluation for other benchmarks

Hi, could you tell me which datasets your models are trained on? Have you evaluated them on other benchmarks such as UKBench, Holidays, Oxford-5k, or Paris-6k? If not, do you intend to test your models on these datasets so that they can be compared with the SOTA models?
Thank you in advance for your reply.

why ResNet-50

Thank you for your great work on this paper!
1. Why does ResNet-50 give the best results among the CNN backbones?

2. In the backbone network without downsampling, the input is 224x224x3 and the output is 14x14x1536, right?

About embedding

Hi, I noticed in Table 5 of your paper that the original embedding dimension is 1536, but you reduce it to 768/512 afterwards. Which method do you use for this dimensionality reduction: is it PCA or just a simple reshape?
Looking forward to your reply. Thanks!
