nashory / delf-pytorch

PyTorch Implementation of "Large-Scale Image Retrieval with Attentive Deep Local Features"

License: MIT License

Jupyter Notebook 98.84% Python 1.16%

delf-pytorch's Introduction

Pytorch Implementation of Deep Local Feature (DeLF)

reference: https://arxiv.org/pdf/1612.06321.pdf

Prerequisites

PyTorch
python3
CUDA

Training DeLF

DeLF is trained in two stages: (1) the finetune stage and (2) the keypoint stage.
The finetune stage loads a ResNet-50 model pretrained on ImageNet and fine-tunes it.
The keypoint stage freezes the "base" network and updates only the "attention" network used for keypoint selection. After training finishes, the model is saved at repo/<expr>/keypoint/ckpt

(1) training finetune stage:

$ cd train/
$ python main.py \
    --stage 'finetune' \
    --optim 'sgd' \
    --gpu_id 6 \
    --expr 'landmark' \
    --ncls 586 \
    --finetune_train_path <path to train data> \
    --finetune_val_path <path to val data>

(2) training keypoint stage:

load_from: absolute path to pytorch model you wish to load. (<model_name>.pth.tar)
expr: name of experiment you wish to save as.

$ cd train/
$ python main.py \
    --stage 'keypoint' \
    --gpu_id 6 \
    --ncls 586 \
    --optim 'sgd' \
    --use_random_gamma_scaling true \
    --expr 'landmark' \
    --load_from <path to model> \
    --keypoint_train_path <path to train data> \
    --keypoint_val_path <path to val data>

Feature Extraction of DeLF

Extracting DeLF also takes two steps: (1) train PCA, and (2) extract dimension-reduced DeLF.
IMPORTANT: YOU MUST RENAME OR COPY THE MODEL from repo/<expr>/keypoint/ckpt/bestshot.pth.tar to repo/<expr>/keypoint/ckpt/fix.pth.tar.
This step is intentional: it prevents the model from being updated after the PCA matrix has been calculated.
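
The copy step can be scripted. This is a minimal sketch that assumes the checkpoint directory layout described above. The function name freeze_checkpoint is made up for illustration and is not part of the repository.

```python
import shutil
from pathlib import Path


def freeze_checkpoint(ckpt_dir):
    """Copy bestshot.pth.tar to fix.pth.tar inside ckpt_dir and return the new path."""
    ckpt_dir = Path(ckpt_dir)
    src = ckpt_dir / "bestshot.pth.tar"
    dst = ckpt_dir / "fix.pth.tar"
    if not src.is_file():
        raise FileNotFoundError(f"no checkpoint at {src}")
    # Copy instead of move: bestshot.pth.tar stays in place for further training,
    # while fix.pth.tar is the frozen snapshot the PCA matrix will be tied to.
    shutil.copy2(src, dst)
    return dst
```

For the 'landmark' experiment used in the training commands, this would be called as freeze_checkpoint("repo/landmark/keypoint/ckpt") from the directory that contains repo/.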

(1) train PCA

$ cd extract/
$ python extractor.py \
    --gpu_id 4 \
    --load_expr 'delf' \
    --mode 'pca' \
    --stage 'inference' \
    --batch_size 1 \
    --input_path <path to train data> \
    --output_path <output path to save pca matrix>

Note: input_path and output_path are currently hardcoded in extractor.py.

(2) extract dimension reduced DeLF

$ cd extract/
$ python extractor.py \
    --gpu_id 4 \
    --load_expr 'delf' \
    --mode 'delf' \
    --stage 'inference' \
    --batch_size 1 \
    --attn_thres 0.31 \
    --iou_thres 0.92 \
    --top_k 1000 \
    --use_pca True \
    --pca_dims 40 \
    --pca_parameters_path <path to pca matrix file> \
    --input_path <path to train data> \
    --output_path <output path to save extracted features>

Note: pca_parameters_path, input_path and output_path are currently hardcoded in extractor.py.
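
The three selection flags may need a word of explanation. Judging from the code quoted in the issues, attn_thres drops locations whose attention score is too low. iou_thres is presumably the overlap threshold for non-maximum suppression over receptive-field boxes, and top_k caps the number of kept features. The plain-Python sketch below shows that selection logic under those assumptions. It is not the repository's code; select_features and its box format are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_features(boxes, scores, attn_thres=0.31, iou_thres=0.92, top_k=1000):
    """Return indices of kept boxes, highest attention score first."""
    # 1) keep only locations whose attention score exceeds the threshold
    order = sorted((i for i, s in enumerate(scores) if s > attn_thres),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    # 2) greedy NMS: skip a box that overlaps an already kept box too much
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in kept):
            kept.append(i)
        # 3) stop once top_k features have been collected
        if len(kept) == top_k:
            break
    return kept
```

With a high iou_thres such as 0.92, only near-duplicate boxes are suppressed, so most locations that pass the attention threshold survive until the top_k cap applies.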

Visualization

You can visualize DeLF matching between two arbitrary query images. Assume there are two images, test/img1.jpg and test/img2.jpg. Open visualize.ipynb in Jupyter Notebook and run each cell. You should get results like those below.

1) RANSAC Matching (Correspondence Matching + Geometric Verification)

2) Attention Map

Ranking Result on Oxf5k:

glr1k, glr2k: DeLF models trained on a subset of the google-landmark-dataset on Kaggle, containing the top-K instances sorted by the number of images per instance.
ldmk: DeLF model trained on the landmark dataset (exactly the same as in the paper).
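
The README does not say how the glr1k and glr2k subsets were built. The following is one plausible sketch of "top-K instances sorted by the number of images", assuming the annotations are available as (image_id, landmark_id) pairs. The function name and data layout are hypothetical.

```python
from collections import Counter


def top_k_landmarks(samples, k):
    """samples: iterable of (image_id, landmark_id) pairs.

    Keeps only the images that belong to the k landmarks with the most images.
    """
    samples = list(samples)
    counts = Counter(landmark for _, landmark in samples)
    keep = {landmark for landmark, _ in counts.most_common(k)}
    return [(img, lm) for img, lm in samples if lm in keep]
```

Under this reading, glr1k would correspond to k=1000 and glr2k to k=2000.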

glr1k ranking result

glr2k ranking result

ldmk ranking result

Benchmark Result on Oxf5k (comparing to original paper)

Note: DELF_TF is the authors' model; its features were extracted using this nice repo (https://github.com/insikk/delf_enhanced).
PYTORCH_LDMK: Trained with the landmark dataset.
PYTORCH_GLR1K: Trained with a subset of google-landmark-dataset with 1k instance classes.
PYTORCH_GLR2K: Trained with a subset of google-landmark-dataset with 2k instance classes.
PYTORCH_BNK_V3_BAL_HANA: Private currency dataset I personally own, included just as a check.

Classes	DELF_TF	PYTORCH_LDMK	PYTORCH_GLR1K	PYTORCH_GLR2K	PYTORCH_BNK_V3_BAL_HANA
mAP	0.851307	0.849373	0.87828	0.866517	0.489614
all_souls_1	0.751052	0.767453	0.916059	0.886243	0.0584418
all_souls_2	0.517995	0.645628	0.708546	0.767904	0.287783
all_souls_3	0.626499	0.760189	0.881578	0.903977	0.347261
all_souls_4	0.968566	0.930445	0.967221	0.980288	0.515091
all_souls_5	0.735256	0.827341	0.899803	0.911414	0.117378
ashmolean_1	0.83206	0.768585	0.829522	0.860364	0.157126
ashmolean_2	0.844329	0.803305	0.814522	0.88631	0.194069
ashmolean_3	0.8407	0.863916	0.86428	0.841624	0.20158
ashmolean_4	0.857416	0.730968	0.816007	0.829129	0.353456
ashmolean_5	0.77901	0.84768	0.808717	0.875755	0.106619
balliol_1	0.917435	0.818512	0.914453	0.857404	0.362258
balliol_2	0.462124	0.5546	0.68825	0.632167	0.0984046
balliol_3	0.710849	0.72742	0.80883	0.729275	0.209934
balliol_4	0.658099	0.681549	0.749764	0.667446	0.342497
balliol_5	0.739436	0.689549	0.80835	0.716029	0.319832
bodleian_1	0.7943	0.797353	0.833887	0.851872	0.350422
bodleian_2	0.828246	0.549165	0.520681	0.413119	0.643002
bodleian_3	0.84655	0.844758	0.954003	0.841856	0.799652
bodleian_4	0.726362	0.732197	0.916468	0.84604	0.476852
bodleian_5	0.815629	0.864863	0.915992	0.847784	0.773505
christ_church_1	0.953197	0.97743	0.96955	0.987822	0.866622
christ_church_2	0.960692	0.950959	0.975525	0.979186	0.783949
christ_church_3	0.932694	0.951987	0.940492	0.942081	0.263114
christ_church_4	0.965374	0.979779	0.970264	0.981529	0.784185
christ_church_5	0.971503	0.971411	0.976488	0.983004	0.312071
cornmarket_1	0.690551	0.722799	0.692261	0.681911	0.492891
cornmarket_2	0.727338	0.382168	0.32282	0.184599	0.169908
cornmarket_3	0.707911	0.650324	0.696718	0.672553	0.379656
cornmarket_4	0.65958	0.789562	0.656362	0.669228	0.273514
cornmarket_5	0.68901	0.814039	0.606983	0.558519	0.19587
hertford_1	0.92893	0.915811	0.957557	0.951947	0.562145
hertford_2	0.960313	0.942536	0.937546	0.951293	0.524951
hertford_3	0.936073	0.959108	0.97494	0.941641	0.570177
hertford_4	0.898146	0.914434	0.924889	0.927225	0.679879
hertford_5	0.975377	0.929499	0.946097	0.94726	0.235865
keble_1	1	1	1	1	0.954762
keble_2	1	0.944161	1	1	0.921088
keble_3	1	0.932568	1	1	0.931319
keble_4	1	1	1	1	0.331796
keble_5	1	0.87432	1	1	0.944161
magdalen_1	0.710288	0.766209	0.819577	0.861361	0.109972
magdalen_2	0.830566	0.928487	0.914451	0.926896	0.164253
magdalen_3	0.759041	0.832379	0.872577	0.896532	0.168931
magdalen_4	0.853145	0.877747	0.880979	0.844535	0.0728258
magdalen_5	0.761443	0.77776	0.841862	0.791102	0.175314
pitt_rivers_1	1	1	1	1	0.647935
pitt_rivers_2	1	1	1	1	1
pitt_rivers_3	1	1	1	1	0.746479
pitt_rivers_4	1	1	1	1	0.599398
pitt_rivers_5	1	1	1	1	1
radcliffe_camera_1	0.93144	0.916562	0.943584	0.95298	0.860801
radcliffe_camera_2	0.961224	0.980161	0.980304	0.982237	0.936467
radcliffe_camera_3	0.925759	0.908404	0.949748	0.959252	0.871228
radcliffe_camera_4	0.979608	0.98273	0.983941	0.988227	0.787773
radcliffe_camera_5	0.90082	0.936742	0.952967	0.949522	0.894346

Author

Minchul Shin(@nashory)
contact: [email protected]


delf-pytorch's Issues

extractor

When I run extractor.py in the PCA stage, this problem happened. What should I do?

evaluation about benchmark set

Hi! I appreciate you(@nashory) releasing the code.
I am very curious about the way you evaluated the benchmark set.
Could you explain to me in detail?
And, if you have that evaluation code, can you release the code?
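
The repository does not document its evaluation code. As a rough illustration of what the mAP row in the benchmark table measures, here is a simplified average-precision sketch. It is an assumption about the general procedure, not the author's script. The official Oxford5k evaluation additionally handles "junk" images, which this sketch leaves out.

```python
def average_precision(ranked_ids, positives):
    """AP of one ranked list; positives is the set of relevant database ids."""
    positives = set(positives)
    if not positives:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in positives:
            hits += 1
            precision_sum += hits / rank   # precision at this recall point
    return precision_sum / len(positives)


def mean_average_precision(results):
    """results: list of (ranked_ids, positives) pairs, one per query."""
    aps = [average_precision(ranked, pos) for ranked, pos in results]
    return sum(aps) / len(aps)
```

Each per-query row of the table (all_souls_1, all_souls_2, and so on) would correspond to one average_precision value, and the mAP row to their mean.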

How can i get the model pretrained on imagenet?

I want to extract DeLF for a dataset of common things like books, bottles, and pictures. But I found that the matching performance of the offered model finetuned on the Romelandmark dataset is bad. So I want to know how to get the model pretrained on ImageNet. Or do you have any suggestions for extracting DeLF for a dataset of common things? Looking forward to your early reply. Thank you very much!

No module named 'skimage'

I will be very grateful if someone can tell me the answer or upload this module.
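
The skimage module is provided by the scikit-image package on PyPI, so it is installed rather than uploaded. The small check below reports which imports are missing and suggests an install command. The mapping for 'progress' is an assumption based on the import name, and missing_packages is a made-up helper.

```python
import importlib.util

# import name -> pip package name.
# 'skimage' is shipped by scikit-image; the 'progress' entry is an assumption.
PIP_NAMES = {"skimage": "scikit-image", "progress": "progress"}


def missing_packages(modules):
    """Return a pip install hint for every top-level module that cannot be found."""
    hints = []
    for name in modules:
        if importlib.util.find_spec(name) is None:
            hints.append("pip install " + PIP_NAMES.get(name, name))
    return hints
```

Calling missing_packages(["skimage", "progress"]) before training returns an empty list when both imports are available.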

hyper-parameters for training keypoint stage && training result of finetune stage

@nashory Hi, I noticed that when training the keypoint stage you set use_l2_normalized_feature = True. What is the reason for setting this parameter? I also noticed that you set target_layer = layer3 by default. Have you tried target_layer = layer4? If so, which one is better?

Another question that confuses me: when I train the finetune stage directly on google-landmark-dataset-top1k, I get acc1 over 97.5. What was your result at this stage?

Thank you. I look forward to your answer.

"bar" is missing from "utils"

The function "bar" is missing. I would appreciate it if you could upload it. Have a nice day!

Need help: speeding up inference time

I tested the inference time of the PyTorch version: 7 scales take 1-2 s,
while the TensorFlow version takes just 0.1-0.2 s for 7 scales.

How can I speed it up?

Original DELF model by authors has BN for spatial attention

See here the definition of the model in Tensorflow by authors :
https://github.com/tensorflow/models/blob/master/research/delf/delf/python/training/model/delf_model.py

They use BatchNorm between conv1 and conv2 of the spatial attention module.

error message:floor_vml_cpu not implemented for 'Long'

Model file seems missing

There is no file 'archive/model/ldmk/keypoint/ckpt/fix.pth.tar' in the repo, is it missing? How can I get it?

multi-gpu training?

Thanks for the great pytorch code. It seems that this repo does not support multi-gpu training.

Hardcoded arguments for "Feature Extraction of DeLF"

Thanks for the implementation!

In the readme, examples are provided of arguments being passed into the (1) train PCA and (2) extract dimension reduced DeLF steps. However, all the values are hardcoded in extract.py. Thus, the steps in the main README are somewhat misleading.

Is there a specific reason for hardcoding the extract parameters while passing the train parameters on the command line?

I feel that it would be quite convenient to pass all extract parameters from cli.
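
As a sketch of what a command-line interface for the extractor could look like, here is an argparse parser covering the flags shown in the README. This is an illustration only, not code from the repository. The defaults are copied from the README example, and str2bool and build_parser are hypothetical names.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' style strings; plain type=bool would treat 'False' as True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Build a parser for the extraction flags listed in the README (sketch)."""
    p = argparse.ArgumentParser(description="DeLF feature extraction (sketch)")
    p.add_argument("--gpu_id", type=int, default=0)
    p.add_argument("--load_expr", type=str, default="delf")
    p.add_argument("--mode", choices=["pca", "delf"], required=True)
    p.add_argument("--attn_thres", type=float, default=0.31)
    p.add_argument("--iou_thres", type=float, default=0.92)
    p.add_argument("--top_k", type=int, default=1000)
    p.add_argument("--use_pca", type=str2bool, default=False)
    p.add_argument("--pca_dims", type=int, default=40)
    p.add_argument("--pca_parameters_path", type=str)
    p.add_argument("--input_path", type=str)
    p.add_argument("--output_path", type=str)
    return p
```

The parsed values could then replace the module-level constants in the extractor, for example args = build_parser().parse_args() followed by reading args.mode and args.top_k.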

AttributeError: module 'utils.misc' has no attribute 'get_mean_and_std'

The value of receptive field

Hi! Thank you for your great work.
I have a small question. The receptive field value for layer4 is 483 in your code. But you use resnet50 from the torchvision model zoo, which differs from the resnet50 in TensorFlow Slim. Could the receptive field value for layer4 be 427 instead?

No module 'progress'

Was the upload incomplete? The module 'progress' is missing. I would be very grateful if you could respond.

Pre-trained landmark problem

@nashory Thanks for sharing such great work. I downloaded your shared pre-trained weights for the landmarks dataset, but there seems to be an error with the file: I can't read or extract it.
Can you fix the link or the file?

Train Data Format

Hi, I'm trying to train DeLF on my custom dataset, but I can't work out how to arrange the dataset. Where should I place my data, and what format should it be in? Could anyone please help me?
Any help would be great.

what is the difference between the finetune dataset and the keypoint dataset?

train PCA

error

error:floor_vml_cpu not implemented for 'Long'

In train PCA, the error "floor_vml_cpu not implemented for 'Long'" occurs.

What is the meaning of the receptive field, stride and padding of the FeatureExtractor class in extractor.py?

I want to extract DeLF with another network. How should I set the receptive field, stride and padding of the FeatureExtractor class in extractor.py?
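
As general background rather than the repository's own code: the receptive field, effective stride and effective padding of a feature map can be accumulated layer by layer from each layer's kernel size, stride and padding. The sketch below assumes a plain chain of convolution and pooling layers. For a branching network such as ResNet, one would list the layers along the path with the largest kernels. The function name is hypothetical.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride, padding), ordered from input to output.

    Returns (rf, stride, padding) of one output unit, measured in input pixels.
    """
    rf, jump, pad = 1, 1, 0
    for kernel, stride, padding in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (kernel - 1) input steps
        pad += padding * jump       # this layer's padding expressed in input pixels
        jump *= stride              # distance between neighbouring outputs, in input pixels
    return rf, jump, pad


# Example: a 7x7 stride-2 convolution followed by a 3x3 stride-2 max pool
print(receptive_field([(7, 2, 3), (3, 2, 1)]))  # (11, 4, 5)
```

These three values are what allow a feature-map location to be mapped back to a box in the input image.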

The code of "Benchmark Result on Oxf5k"

The "bar" function is missing

Hello, what is the training data set? Where can I download it?

Hello, what is the training data set? Where can I download it?
For example, the default '../../data/landmarks/landmarks_full_val' in the code.

Training datasets

many image download links are invalid.

from feeder import Feeder

When running the code in visualize.ipynb, I hit "from feeder import Feeder" and could not find this module. I'd like to know what the feeder package is and where I can get it. Thank you for sharing.

question about visualize.ipynb

I tried to match two pictures (with your provided trained model), but I get a warning like
"UserWarning: An output with one or more elements was resized since it had shape [999], which does not match the required output shape [998].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
torch.index_select(x1, 0, idx, out=xx1)"
and it repeats over and over until
"UserWarning: An output with one or more elements was resized since it had shape [2], which does not match the required output shape [1].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
torch.index_select(x1, 0, idx, out=xx1)"
@nashory I am confused and would appreciate your reply.

len() of a 0-d tensor

Hi, thanks for releasing this code.
I am trying to train the model on my dataset,
but sometimes when I run the visualize notebook I get an error:

/home/ubuntu/DeLF-pytorch/helper/delf_helper.pyc in GetDelfFeatureFromSingleScale(x, model, scale, pca_mean, pca_vars, pca_matrix, pca_dims, rf, stride, padding, attn_thres, use_pca)
    283     # use attention score to select feature.
    284     indices = None
--> 285     while(indices is None or len(indices) == 0):
    286         indices = torch.gt(scaled_scores, attn_thres).nonzero().squeeze()
    287         attn_thres = attn_thres * 0.5   # use lower threshold if no indexes are found.

/usr/local/lib/python2.7/dist-packages/torch/tensor.pyc in __len__(self)
    368     def __len__(self):
    369         if self.dim() == 0:
--> 370             raise TypeError("len() of a 0-d tensor")
    371         return self.shape[0]

Any idea why?
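
A likely cause, judging from the traceback: when exactly one score passes the threshold, nonzero() returns a tensor of shape [1, 1] and squeeze() collapses it to a 0-d tensor, for which len() is undefined. One plausible fix is to flatten explicitly, for example with .nonzero().reshape(-1), so the result is always 1-D. The plain-Python sketch below mirrors the selection loop from the traceback. It is an illustration, not the repository's code, and it adds a hypothetical max_halvings guard so the loop cannot run forever when no score is positive.

```python
def select_indices(scores, attn_thres, max_halvings=20):
    """Indices whose score exceeds attn_thres; halve the threshold while nothing is selected.

    Returns (indices, threshold_used). indices is always a list, so len() is safe.
    """
    for _ in range(max_halvings + 1):
        indices = [i for i, s in enumerate(scores) if s > attn_thres]
        if indices:
            return indices, attn_thres
        attn_thres *= 0.5   # use a lower threshold if no indices are found
    return [], attn_thres
```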

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

In step 2 (Feature Extraction of DeLF):
After step 1, I have the fix.pth.tar model file. The config in extractor.py is shown here:

MODE = 'pca'           # either "delf" or "pca"
GPU_ID = 4
IOU_THRES = 0.98
ATTN_THRES = 0.37
TOP_K = 1000
USE_PCA = False
PCA_DIMS = 40
SCALE_LIST = [0.25, 0.3535, 0.5, 0.7071, 1.0, 1.4142, 2.0]
ARCH = 'resnet50'
EXPR = 'dummy'
TARGET_LAYER = 'layer3'
MODEL_NAME = 'res18_mix_debase_2'
LOAD_FROM = '../train/repo/res18_mix_debase_1/keypoint/ckpt/fix.pth.tar'
PCA_PARAMETERS_PATH = './output/pca/{}/pca.h5'.format(MODEL_NAME)

However, I get this result in the console:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

No matter where I add print() calls for testing, nothing is printed.

Pretrained weights trained on landmark dataset

I am sharing the pretrained weights trained on the landmark dataset.
Please download it from the following url:

https://drive.google.com/open?id=1dbdaDyVeIb53iGh4Uk5kA4in9-uURoLM

extractor

Hello, when I run extractor.py, this happened. What should I do?

Is there any way to install feeder or its alternative

How to Compute and Sort the Similarity between Retrieved Pictures and Base Pictures

How do I calculate and sort the similarity after extracting the features of the retrieved image and the base image?
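
One common approach for local features, offered here as an assumption rather than the repository's method, is to count descriptor matches between the query and each database image and sort by that count. The README's visualization additionally applies RANSAC geometric verification, which this sketch omits. The max_dist threshold and both function names are hypothetical.

```python
import math


def match_count(query_desc, db_desc, max_dist=0.8):
    """Number of query descriptors whose nearest database descriptor is within max_dist."""
    count = 0
    for q in query_desc:
        nearest = min((math.dist(q, d) for d in db_desc), default=math.inf)
        if nearest <= max_dist:
            count += 1
    return count


def rank_database(query_desc, database, max_dist=0.8):
    """database: dict mapping image name to a list of descriptors. Best match first."""
    scored = [(match_count(query_desc, descs, max_dist), name)
              for name, descs in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))   # more matches first, then by name
    return [(name, score) for score, name in scored]
```

For realistic descriptor counts (up to top_k features per image), the brute-force nearest-neighbour search would normally be replaced by a KD-tree or similar index.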

How to prepare my own dataset to train the keypoint stage?

I want to train a DeLF model for product search. What kind of dataset should I provide for keypoint training? Are there any off-the-shelf datasets I can use for training?

license?

Hi,

Thank you for the cool re-implementation! May I ask you about the license on your code?
I would like to port it to the kornia local features https://kornia.readthedocs.io/en/latest/feature.html

--
Best, Dmytro

datasets

Hello! I always find your repository useful.

I'm wondering what the difference is between the dataset used in the pretraining process and the one used in the finetuning process.

One uses "full" and the other uses "clean". What is the difference between these two datasets?

No module named 'progress'

CUDA out of memory when running notebook/visualize.ipynb

Hi @nashory and everyone,

I have the following error when running get_result(myfeeder, query) in the visualize.ipynb notebook. Could you show me how to fix this? I've tried to reduce workers = 1 to no avail. Thank you in advance!

RuntimeError: CUDA out of memory. Tried to allocate 374.00 MiB (GPU 0; 10.73 GiB total capacity; 9.57 GiB already allocated; 330.62 MiB free; 66.62 MiB cached)

inference for test

Hi, how do I run inference for a retrieval test image?

Some functions in utils/misc.py are not implemented

Functions declared in var __all__ in utils/misc.py, which are get_mean_and_std and init_param, are not implemented. To start training, those two need to be removed from __all__.
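
To see why this breaks imports: a star import such as "from utils.misc import *" looks up every name in __all__, so a listed name with no definition raises AttributeError. The sketch below finds such names. The module built here is a stand-in for utils/misc.py, and mkdir_p is a hypothetical example of a helper that does exist.

```python
import types


def undefined_exports(module):
    """Names listed in module.__all__ that the module does not actually define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Stand-in for utils/misc.py: two exported names have no implementation.
demo = types.ModuleType("misc_demo")
demo.__all__ = ["get_mean_and_std", "init_param", "mkdir_p"]
demo.mkdir_p = lambda path: None   # hypothetical helper that does exist

print(undefined_exports(demo))  # ['get_mean_and_std', 'init_param']
```

Removing the reported names from __all__, or implementing them, resolves the error.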