filipradenovic / cnnimageretrieval-pytorch

CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch

Home Page: http://cmp.felk.cvut.cz/cnnimageretrieval

License: MIT License

Language: Python 100.00%

Topics: image-retrieval convolutional-neural-networks cnn python pytorch

cnnimageretrieval-pytorch's Introduction

CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch

This is a Python toolbox that implements the training and testing of the approach described in our papers:

Fine-tuning CNN Image Retrieval with No Human Annotation,
Radenović F., Tolias G., Chum O., TPAMI 2018 [arXiv]

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples,
Radenović F., Tolias G., Chum O., ECCV 2016 [arXiv]


What is it?

This code implements:

  1. Training (fine-tuning) a CNN for image retrieval
  2. Learning supervised whitening, as post-processing, for global image descriptors
  3. Testing CNN image retrieval on Oxford and Paris datasets

Prerequisites

In order to run this toolbox you will need:

  1. Python3 (tested with Python 3.7.0 on Debian 8.1)
  2. PyTorch deep learning framework (tested with version 1.0.0)
  3. All the rest (data + networks) is automatically downloaded with our scripts

Usage

Navigate (cd) to the root of the toolbox [YOUR_CIRTORCH_ROOT]. You can optionally install the package with pip3 install . Make sure the desired PyTorch and torchvision versions are installed.

Training

An example training script is located in YOUR_CIRTORCH_ROOT/cirtorch/examples/train.py:

python3 -m cirtorch.examples.train [-h] [--training-dataset DATASET] [--no-val]
                [--test-datasets DATASETS] [--test-whiten DATASET]
                [--test-freq N] [--arch ARCH] [--pool POOL]
                [--local-whitening] [--regional] [--whitening]
                [--not-pretrained] [--loss LOSS] [--loss-margin LM]
                [--image-size N] [--neg-num N] [--query-size N]
                [--pool-size N] [--gpu-id N] [--workers N] [--epochs N]
                [--batch-size N] [--optimizer OPTIMIZER] [--lr LR]
                [--momentum M] [--weight-decay W] [--print-freq N]
                [--resume FILENAME]
                EXPORT_DIR

For a detailed explanation of the options, run:

python3 -m cirtorch.examples.train -h

Note: Data and networks used for training and testing are automatically downloaded when using the example script.

Testing

An example testing script is located in YOUR_CIRTORCH_ROOT/cirtorch/examples/test.py:

python3 -m cirtorch.examples.test [-h] (--network-path NETWORK | --network-offtheshelf NETWORK)
               [--datasets DATASETS] [--image-size N]
               [--multiscale MULTISCALE] [--whitening WHITENING] [--gpu-id N]

For a detailed explanation of the options, run:

python3 -m cirtorch.examples.test -h

Note: Data used for testing are automatically downloaded when using the example script.


Papers implementation

Training

For example, to train our best network described in the TPAMI 2018 paper, run the following command. After each epoch, the fine-tuned network is tested on the revisited Oxford and Paris benchmarks:

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --loss 'contrastive' 
            --loss-margin 0.85 --optimizer 'adam' --lr 5e-7 --neg-num 5 --query-size=2000 
            --pool-size=22000 --batch-size 5 --image-size 362
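
The --pool 'gem' option selects generalized-mean (GeM) pooling over the final feature map. A minimal sketch of the operation, matching its definition in the TPAMI 2018 paper (the toolbox's own version lives in cirtorch/layers/functional.py):

import torch
import torch.nn.functional as F

# GeM pooling over a BxCxHxW feature map: p = 1 recovers average pooling,
# and large p approaches max pooling; p can also be learned.
def gem(x, p=3.0, eps=1e-6):
    return F.avg_pool2d(x.clamp(min=eps).pow(p),
                        (x.size(-2), x.size(-1))).pow(1.0 / p)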

Networks can be evaluated with learned whitening after each epoch (the whitening is estimated at the end of the epoch). To achieve this, run the following command. Note that this significantly slows down the entire training procedure; you can instead evaluate networks with learned whitening later, using the example test script.

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --test-datasets 'roxford5k,rparis6k' --test-whiten 'retrieval-SfM-30k' 
            --arch 'resnet101' --pool 'gem' --loss 'contrastive' --loss-margin 0.85 
            --optimizer 'adam' --lr 5e-7 --neg-num 5 --query-size=2000 --pool-size=22000 
            --batch-size 5 --image-size 362

Note: The adjusted (lower) learning rate is set to achieve performance similar to the MatConvNet and PyTorch-0.3.0 implementations of the training.
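
For reference, the --loss 'contrastive' option used above optimizes a margin-based contrastive loss over descriptor pairs. A minimal sketch, assuming L2-normalized descriptors x1 and x2 and a label y that is 1 for matching and 0 for non-matching pairs (the toolbox's exact implementation is in cirtorch/layers/loss.py):

import torch

def contrastive_loss(x1, x2, y, margin=0.85):
    d = (x1 - x2).pow(2).sum(-1).sqrt()   # Euclidean distance between descriptors
    return 0.5 * (y * d.pow(2) +
                  (1 - y) * torch.clamp(margin - d, min=0).pow(2)).sum()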

Testing our pretrained networks

Pretrained networks trained with the same parameters as in our TPAMI 2018 paper are provided, with a precomputed whitening post-processing step. To evaluate them, run:

python3 -m cirtorch.examples.test --gpu-id '0' --network-path 'retrievalSfM120k-resnet101-gem' 
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k' 
                --whitening 'retrieval-SfM-120k'
                --multiscale '[1, 1/2**(1/2), 1/2]'

or

python3 -m cirtorch.examples.test --gpu-id '0' --network-path 'retrievalSfM120k-vgg16-gem' 
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k' 
                --whitening 'retrieval-SfM-120k'
                --multiscale '[1, 1/2**(1/2), 1/2]'
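
The --multiscale option extracts descriptors from rescaled copies of each image and combines them into a single descriptor. A minimal sketch of this idea, assuming the combination is a plain average followed by re-normalization (the toolbox's exact scheme is in cirtorch/networks/imageretrievalnet.py):

import torch
import torch.nn.functional as F

def extract_multiscale(net, img, scales=(1.0, 2 ** -0.5, 0.5)):
    v = 0
    for s in scales:
        x = img if s == 1.0 else F.interpolate(
            img, scale_factor=s, mode='bilinear', align_corners=False)
        v = v + net(x)              # net(x): one L2-normalized global descriptor
    return F.normalize(v / len(scales), p=2, dim=-1)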

The table below compares the performance of networks trained with this framework against the networks used in the paper, which were trained with our CNN Image Retrieval in MatConvNet:

Model                         Oxford  Paris  ROxf (M)  RPar (M)  ROxf (H)  RPar (H)
VGG16-GeM (MatConvNet)          87.9   87.7      61.9      69.3      33.7      44.3
VGG16-GeM (PyTorch)             87.3   87.8      60.9      69.3      32.9      44.2
ResNet101-GeM (MatConvNet)      87.8   92.7      64.7      77.2      38.5      56.3
ResNet101-GeM (PyTorch)         88.2   92.5      65.4      76.7      40.1      55.2

Note (June 2022): We updated the download files for the Oxford 5k and Paris 6k images to use versions with blurred faces, as suggested by the original dataset owners. Bear in mind that "experiments have shown that one can use the face-blurred version for benchmarking image retrieval with negligible loss of accuracy".

Testing your trained networks

To evaluate your trained network at a single scale and without learned whitening:

python3 -m cirtorch.examples.test --gpu-id '0' --network-path YOUR_NETWORK_PATH 
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k'

To evaluate your trained network with multi-scale evaluation and learned whitening as post-processing:

python3 -m cirtorch.examples.test --gpu-id '0' --network-path YOUR_NETWORK_PATH 
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k'
                --whitening 'retrieval-SfM-120k' 
                --multiscale '[1, 1/2**(1/2), 1/2]'
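
The --whitening option applies whitening learned on the named dataset to the extracted descriptors as post-processing. A rough sketch of what that application amounts to, assuming a learned mean m and projection P (placeholders; the toolbox's own routine lives in cirtorch/utils/whiten.py):

import torch

def whiten_apply(X, m, P):
    # X: D x N matrix of descriptors, m: D x 1 mean, P: projection matrix
    X = P @ (X - m)                          # center, then project
    return X / X.norm(dim=0, keepdim=True)   # re-L2-normalize each column
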
Testing off-the-shelf networks

Off-the-shelf networks can be evaluated as well, for example:

python3 -m cirtorch.examples.test --gpu-id '0' --network-offtheshelf 'resnet101-gem'
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k'
                --whitening 'retrieval-SfM-120k' 
                --multiscale '[1, 1/2**(1/2), 1/2]'

Networks with projection (FC) layer after global pooling

Training

An alternative architecture adds a learnable FC (projection) layer after the global pooling. It is important to initialize this layer's parameters with the result of learned whitening. To train such a setup, run one of the following commands (performance is evaluated every 5 epochs on roxford5k and rparis6k):

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --loss 'triplet' --loss-margin 0.5 --optimizer 'adam' --lr 1e-6 
            --arch 'resnet50' --pool 'gem' --whitening 
            --neg-num 5 --query-size=2000 --pool-size=20000 
            --batch-size 5 --image-size 1024 --epochs 100 
            --test-datasets 'roxford5k,rparis6k' --test-freq 5 

or

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --loss 'triplet' --loss-margin 0.5 --optimizer 'adam' --lr 5e-7 
            --arch 'resnet101' --pool 'gem' --whitening 
            --neg-num 4 --query-size=2000 --pool-size=20000 
            --batch-size 5 --image-size 1024 --epochs 100 
            --test-datasets 'roxford5k,rparis6k' --test-freq 5 

or

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --loss 'triplet' --loss-margin 0.5 --optimizer 'adam' --lr 5e-7 
            --arch 'resnet152' --pool 'gem' --whitening 
            --neg-num 3 --query-size=2000 --pool-size=20000 
            --batch-size 5 --image-size 900 --epochs 100 
            --test-datasets 'roxford5k,rparis6k' --test-freq 5 

for ResNet50, ResNet101, or ResNet152, respectively.
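
These commands select --loss 'triplet'. A minimal sketch of a distance-based triplet loss with the margin from --loss-margin, assuming L2-normalized anchor/positive/negative descriptors a, p, n (the toolbox's exact implementation is in cirtorch/layers/loss.py):

import torch

def triplet_loss(a, p, n, margin=0.5):
    d_ap = (a - p).pow(2).sum(-1)   # squared distance anchor-positive
    d_an = (a - n).pow(2).sum(-1)   # squared distance anchor-negative
    return torch.clamp(d_ap - d_an + margin, min=0).sum()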

Implementation details:

  • The FC layer is initialized with the result of whitening learned in a supervised manner from our training data and off-the-shelf features (see the sketch after this list).
  • The whitening for this FC layer is precomputed for popular architectures and pooling methods; see imageretrievalnet.py#L50 for the full list of precomputed FC layers.
  • When this FC layer is added in the fine-tuning procedure, performance is highest if images of similarly high resolution are used at train and test time.
  • When this FC layer is added, the distribution of pairwise distances changes significantly, so a roughly twice-larger margin should be used for the contrastive loss. In this scenario, the triplet loss performs slightly better.
  • Additional tuning of hyper-parameters can be performed to achieve higher performance or faster training. Note that, in this example, the --neg-num and --image-size hyper-parameters are chosen such that training fits on a single GPU with 16 GB of memory.
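
As a rough illustration of the first bullet above: if the learned whitening maps a descriptor x to P @ (x - m), the equivalent linear layer has weight P and bias -P @ m. A minimal sketch, where P and m are placeholders for the precomputed parameters referenced in imageretrievalnet.py (this is not the toolbox's verbatim code):

import torch
import torch.nn as nn

def fc_from_whitening(P: torch.Tensor, m: torch.Tensor) -> nn.Linear:
    # P: (out_dim, in_dim) whitening projection, m: (in_dim,) mean vector
    out_dim, in_dim = P.shape
    fc = nn.Linear(in_dim, out_dim, bias=True)
    with torch.no_grad():
        fc.weight.copy_(P)       # projection becomes the layer weight
        fc.bias.copy_(-P @ m)    # centering folds into the bias
    return fc
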
Testing our pretrained networks with projection layer

Pretrained networks with a projection layer are provided, trained on both the retrieval-SfM-120k (rSfM120k) and google-landmarks-2018 (gl18) train datasets. For this architecture there is no need to compute whitening as a post-processing step (the performance boost is typically insignificant), although one can do so as well. For example, multi-scale evaluation of ResNet101 with GeM and a projection layer, trained on the google-landmarks-2018 (gl18) dataset using high-resolution images and a triplet loss, is performed with the following script:

python3 -m cirtorch.examples.test_e2e --gpu-id '0' --network 'gl18-tl-resnet101-gem-w' 
            --datasets 'roxford5k,rparis6k' --multiscale '[1, 2**(1/2), 1/2**(1/2)]'

Multi-scale performance of all available pre-trained networks is given in the following table:

Model                        ROxf (M)  RPar (M)  ROxf (H)  RPar (H)
rSfM120k-tl-resnet50-gem-w       64.7      76.3      39.0      54.9
rSfM120k-tl-resnet101-gem-w      67.8      77.6      41.7      56.3
rSfM120k-tl-resnet152-gem-w      68.8      78.0      41.3      57.2
gl18-tl-resnet50-gem-w           63.6      78.0      40.9      57.5
gl18-tl-resnet101-gem-w          67.3      80.6      44.3      61.5
gl18-tl-resnet152-gem-w          68.7      79.7      44.2      60.3

Note (June 2022): We updated the download files for the Oxford 5k and Paris 6k images to use versions with blurred faces, as suggested by the original dataset owners. Bear in mind that "experiments have shown that one can use the face-blurred version for benchmarking image retrieval with negligible loss of accuracy".


Related publications

Training (fine-tuning) convolutional neural networks
@article{RTC18,
 title = {Fine-tuning {CNN} Image Retrieval with No Human Annotation},
 author = {Radenovi{\'c}, F. and Tolias, G. and Chum, O.},
 journal = {TPAMI},
 year = {2018}
}
@inproceedings{RTC16,
 title = {{CNN} Image Retrieval Learns from {BoW}: Unsupervised Fine-Tuning with Hard Examples},
 author = {Radenovi{\'c}, F. and Tolias, G. and Chum, O.},
 booktitle = {ECCV},
 year = {2016}
}
Revisited benchmarks for Oxford and Paris ('roxford5k' and 'rparis6k')
@inproceedings{RITAC18,
 author = {Radenovi{\'c}, F. and Iscen, A. and Tolias, G. and Avrithis, Y. and Chum, O.},
 title = {Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking},
 booktitle = {CVPR},
 year = {2018}
}

Versions

master (development)

  • Merged pull request #78 that fixes broken download links
  • Merged pull request #56 that adds setup file
v1.2 (07 Dec 2020)

  • Added example script for descriptor extraction with different publicly available models
  • Added the MIT license
  • Added multi-scale performance on roxford5k and rparis6k for new pre-trained networks with projection, trained on both the retrieval-SfM-120k and google-landmarks-2018 train datasets
  • Added a new example test script without post-processing, for networks that include projection layer
  • Added a few things to the train example: GeMmp pooling, triplet loss, and a small trick to handle really large batches
  • Added more pre-computed whitening options in imageretrievalnet
  • Added triplet loss
  • Added GeM pooling with multiple parameters (one p per channel/dimensionality)
  • Added script to enable download on Windows 10 as explained in Issue #39, courtesy of SongZRui
  • Fixed cropping of down-sampled query image
v1.1 (12 Jun 2019)

  • Migrated code to PyTorch 1.0.0, removed Variable, added torch.no_grad for more speed and less memory at evaluation
  • Added rigid grid regional pooling that can be combined with any global pooling method (R-MAC, R-SPoC, R-GeM)
  • Added PowerLaw normalization layer
  • Added multi-scale testing with any given set of scales, in example test script
  • Fixed precision errors in covariance matrix estimation during whitening learning
  • Fixed minor bugs
v1.0 (09 Jul 2018)

  • First public version
  • Compatible with PyTorch 0.3.0

cnnimageretrieval-pytorch's People

Contributors

carandraug, filipradenovic, gtolias, zurk


cnnimageretrieval-pytorch's Issues

from torchvision import get_image_backend ImportError: cannot import name 'get_image_backend'

I ran the script below on torch 0.4.0 and met an error, any ideas?

python3 -m cirtorch.examples.train ./test --gpu-id '5' --training-dataset 'retrieval-SfM-120k' --optimizer 'adam' --lr 1e-6 --neg-n...

>> './test/retrieval-SfM-120k_resnet101_gem_contrastive_m0.85_adam_lr1.0e-06_wd1.0e-04_nnum5_qsize2000_psize20000_bsize5_imsize362'
>> Using pre-trained model 'resnet101'

imageretrievalnet.py: for 'resnet101' custom pretrained features 'imagenet-caffe-resnet101-features-10a101d.pth' are used
Evaluating network on test datasets...
roxford5k: Extracting...
roxford5k: database images...
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/examples/train.py", line 507, in <module>
    main()
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/examples/train.py", line 240, in main
    test(args.test_datasets, model)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/examples/train.py", line 442, in test
    vecs = extract_vectors(net, images, image_size, transform)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/networks/imageretrievalnet.py", line 174, in extract_vectors
    for i, input in enumerate(loader):
  File "/home/zxt/py3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/zxt/py3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ImportError: Traceback (most recent call last):
  File "/home/zxt/py3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zxt/py3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/datasets/genericdataset.py", line 50, in __getitem__
    img = self.loader(path)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/datasets/datahelpers.py", line 36, in default_loader
    from torchvision import get_image_backend
ImportError: cannot import name 'get_image_backend'

How to Choose the Number of Positive and Negative pairs

Hi Filip,
For your dataset retrieval-SfM-120k.pkl, training uses 1 query + 1 positive + 5 negatives. I want to use more positives and negatives, such as 1 query + 8 positives + 16 negatives. Do you have any suggestions or experience to share? Thanks.

Mean average precision results using the training parameters mentioned in README

Hi @filipradenovic,

Wondering if you have mAP results when running:

python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' 
            --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --loss 'contrastive' 
            --loss-margin 0.85 --optimizer 'adam' --lr 1e-6 --neg-num 5 --query-size=2000 
            --pool-size=20000 --batch-size 5 --image-size 362

I got these results after around 30 epochs:

>> roxford5k: Evaluating...
>> roxford5k: mAP E: 69.27, M: 52.5, H: 25.54
>> roxford5k: mP@k[1, 5, 10] E: [88.24 78.63 72.01], M: [87.14 78.86 71.29], H: [58.57 45.43 37.  ]
>> rparis6k: Evaluating...
>> rparis6k: mAP E: 84.64, M: 67.1, H: 41.5
>> rparis6k: mP@k[1, 5, 10] E: [100.    95.14  92.86], M: [100.    97.71  96.  ], H: [94.29 84.29 81.43]

Do these look correct?

Also, the mAP plot looked something like the following when plotted after every epoch: [three mAP-per-epoch plots omitted]

What training parameters should I change to achieve mAP of 65.3 and 40.0 for roxford5k (M and H)?

Thanks!

GeM pooling parameter

Hi @filipradenovic ,

For your experiment on networks with whitening learned end-to-end, with triplet loss, trained on the Google Landmarks 2018 dataset: could you share the value to which the GeM pooling parameter p converged?

If you could share a learning curve showing the evolution of p over the training run, that would be even better :)

Thanks!

loss

Hello, I have a question about the loss. When I use the gem and rmac pooling in your code, the loss decreases slowly, but when I use the AdaptiveAvgPool method from torchvision, the loss decreases quickly. Does this affect retrieval accuracy? I look forward to your reply. Thank you.

why does 'qidxs' equal 'pidxs' sometimes?

I read 'retrieval-SfM-120k.pkl' in using pickle, then checked whether the two corresponding values in qidxs and pidxs differ (they should be different, right? Because the query and positive image should not be the same picture). However, I found some pairs with equal image idxs:

count = 0
for q, p in zip(data['train']['qidxs'], data['train']['pidxs']):
    if q == p:
        count += 1
print(count)  # 47

47 pairs out of 181697 pairs have the same idx value. Is that normal or a bug?
Thanks!

"Training" by using the default parameters as stated in README

Environment: Linux

Parameter Setting: I used all the parameters as stated in the README, i.e., the following command line:

python3.6 -m cirtorch.examples.train export_model --gpu-id '0' --training-dataset 'retrieval-SfM-120k' --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --loss 'contrastive' --loss-margin 0.85 --optimizer 'adam' --lr 5e-7 --neg-num 5 --query-size=2000 --pool-size=22000 --batch-size 5 --image-size 362

Error I encountered:
File "/tmp/dlinaf/ENV/lib/python3.6/site-packages/PIL/Image.py", line 2687, in open OSError: cannot identify image file <_io.BufferedReader name='/home/data/dlinaf/manifold_ranking/cnn_ir/source_code/cnnimageretrieval-pytorch-master/data/train/retrieval-SfM-120k/ims/0d/40/18/100dda42e5994d9e921060abcd18400d'>

When I train the model, it works for 2 epochs, but the error occurs in the 3rd epoch. It seems that one image cannot be identified. However, I have re-downloaded the dataset and it still does not work.

Could you please help me?

TypeError: integer argument expected, got float

Hi,
I used python3 to train the network, but something seems to be wrong. Could you give me some advice? Thank you!

qiuchenli@qiuchenli-GS43VR-7RE:~/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch$ python3 -m cirtorch.examples.train YOUR_EXPORT_DIR --gpu-id '0' --training-dataset 'retrieval-SfM-120k' --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --loss 'contrastive' --loss-margin 0.85 --optimizer 'adam' --lr 1e-6 --neg-num 5 --query-size=2000 --pool-size=20000 --batch-size 5 --image-size 362

Creating directory if it does not exist:
'YOUR_EXPORT_DIR/retrieval-SfM-120k_resnet101_gem_contrastive_m0.85_adam_lr1.0e-06_wd1.0e-04_nnum5_qsize2000_psize20000_bsize5_imsize362'
Using pre-trained model 'resnet101'
imageretrievalnet.py: for 'resnet101' custom pretrained features 'imagenet-caffe-resnet101-features-10a101d.pth' are used
Evaluating network on test datasets...
roxford5k: Extracting...
roxford5k: database images...

4993/4993 done...
roxford5k: query images...
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/qiuchenli/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch/cirtorch/examples/train.py", line 507, in <module>
    main()
  File "/home/qiuchenli/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch/cirtorch/examples/train.py", line 240, in main
    test(args.test_datasets, model)
  File "/home/qiuchenli/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch/cirtorch/examples/train.py", line 444, in test
    qvecs = extract_vectors(net, qimages, image_size, transform, bbxs)
  File "/home/qiuchenli/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch/cirtorch/networks/imageretrievalnet.py", line 174, in extract_vectors
    for i, input in enumerate(loader):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 210, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/qiuchenli/deeplearn_loop/pytorch-cnnimageretrieval/cnnimageretrieval-pytorch/cirtorch/datasets/genericdataset.py", line 56, in __getitem__
    img = self.transform(img)
  File "/home/qiuchenli/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/home/qiuchenli/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 76, in __call__
    return F.to_tensor(pic)
  File "/home/qiuchenli/.local/lib/python3.5/site-packages/torchvision/transforms/functional.py", line 70, in to_tensor
    img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 674, in tobytes
    self.load()
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1961, in load
    self.im = self.im.crop(self.__crop)
TypeError: integer argument expected, got float

Are query images supposed to be in the database also?

Hi,
When I look at the test code for oxford5k, it seems that the query images are also in the database. The images for the database and the queries, respectively, are found with:

cfg = configdataset(dataset, os.path.join(get_data_root(), 'test'))
images = [cfg['im_fname'](cfg,i) for i in range(cfg['n'])]
qimages = [cfg['qim_fname'](cfg,i) for i in range(cfg['nq'])]

When I run

for im in qimages:
    if im in images:
        print(im)

it seems that all the query images are also available in the database. As far as I understand, they are not supposed to be in both places? I am probably missing something, so a clarification on this would be appreciated. Thank you!

Running the toolbox on Windows

As we know from the script, there are some Linux shell commands in download.py, such as 'wget', 'rm', 'rm -rf', etc. We can easily find alternative commands on Windows 10. Here is a brief list:
wget: you can download and install wget directly from here, choosing the x64 or x86 architecture.
rm: del (removes files)
rm -rf: rd (removes directories)
download.txt

The attachment is a modified script that works properly on Windows 10 (see also the standard-library sketch below). Because GitHub does not support attaching a .py file, it has been renamed to .txt, but I am pretty sure that it works.
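
A hypothetical standard-library alternative for the download and cleanup steps, avoiding shell commands entirely (function names here are this sketch's own, not download.py's):

import os
import shutil
import urllib.request

def fetch(url, dst):
    # cross-platform replacement for 'wget URL -O DST'
    d = os.path.dirname(dst)
    if d:
        os.makedirs(d, exist_ok=True)
    urllib.request.urlretrieve(url, dst)

def remove_tree(path):
    # cross-platform replacement for 'rm -rf PATH'
    shutil.rmtree(path, ignore_errors=True)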

Why are 'qidxs' and 'pidxs' required in the database?

Thanks for your great work on image retrieval!

I found that your training data retrieval-SfM-120k.pkl includes ['cids', 'qidxs', 'pidxs', 'cluster'].
After generating the clusters, why do I have to create positive pairs of images (qidxs, pidxs)?

Thanks!

Is SfM-120k dataset necessary?

In "Our pretrained networks" you wrote a command, that, when run, tries to download SfM-120k dataset, which is huge (>50GB).
Why does that happen? I don't see where would it be used (since network is pretrained)
Is it possible to disable that download and to proceed without this dataset?

can you explain how to decrease positions of positives in cirtorch.utils.evaluate?

Your work is really great! It helps me a lot. When I calculated the mAP with your code, I found a problem. In cirtorch/utils/evaluate.py, you define the function:

def compute_map(ranks, gnd, kappas=[]):
    ......
    k = 0
    ij = 0
    if len(junk):
        # decrease positions of positives based on the number of
        # junk images appearing before them
        ip = 0
        while ip < len(pos):
            while ij < len(junk) and pos[ip] > junk[ij]:
                k += 1
                ij += 1
            pos[ip] = pos[ip] - k
            ip += 1
    .......
In this code (from line 83 in evaluate.py), why don't you reset k = 0 and ij = 0 inside the while (ip < len(pos)) loop? In my opinion, if we do not reset k and ij, it may not be reasonable. Can you explain the reason? Thank you very much!

How to create the clusters for the dataset?

Hi, Filip! Glad to see your marvelous work on image retrieval.
You wrote in your paper that SfM can automatically select the training data, so I wonder if you could share the code for the Structure-from-Motion system used to build the training data? Then we could use it on our own data.

finding positive pairs

Hi Filip,
For your dataset retrieval-SfM-120k.pkl, did you find a positive pair for each image in 'cids' (if one exists)?
I'm trying to build my own dataset using exactly the same approach as yours, using COLMAP.
Do you plan to publish the code for extracting the positive pairs?
Thank you very much!

Evaluating network with PyTorch 0.4.1

When evaluating the provided fine-tuned networks in multi-scale mode with PyTorch 0.4.1, the results are slightly different from the ones in the README, which were generated using 0.3.0, i.e.:

Model                     Oxford  Paris  ROxf (M)  RPar (M)  ROxf (H)  RPar (H)
VGG16-GeM (v0.3.0)          87.2   87.8      60.5      69.3      32.4      44.3
VGG16-GeM (v0.4.1)          87.3   87.8      60.9      69.3      32.9      44.2
ResNet101-GeM (v0.3.0)      88.2   92.5      65.3      76.6      40.0      55.2
ResNet101-GeM (v0.4.1)      88.2   92.5      65.4      76.7      40.1      55.2

This is caused by a different default behavior of the torch.nn.functional.upsample function used for multi-scale evaluation, i.e.:

With align_corners = True, the linearly interpolating modes
(linear, bilinear, and trilinear) don't proportionally align the
output and input pixels, and thus the output values can depend on the
input size. This was the default behavior for these modes up to version
0.3.1. Since then, the default behavior is align_corners = False.
See :class:~torch.nn.Upsample for concrete examples on how this
affects the outputs.

Also, since 0.4.1, this function is deprecated in favor of torch.nn.functional.interpolate.

Both of these issues will be handled when migrating to PyTorch 0.4.1 or higher.
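
A small standalone check of the behavior described above, using the replacement function (align_corners=False is the newer default; both calls are valid in PyTorch 0.4.1+):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
y_new = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
y_old = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=True)
print((y_new - y_old).abs().max())   # nonzero: the two modes differ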

The pretrained model for ResNet101

You say "for some models, we have imported features (convolutions) from caffe because the image retrieval performance is higher for them". What did you do to the caffe model to improve the performance?

Interest in PR w/ additional features?

In my fork, I have code for

  • running on INSTRE
  • alpha query expansion
  • diffusion
  • using regional features instead of global features

Would you be interested in merging a PR that includes (some of) that functionality? It'd take a little work to clean my fork up, so I'm double-checking before I put the effort in.

Why does the average negative distance decrease during training?

I guess I am making a mistake somewhere, because the average negative distance decreases during training, and the test mAP does not keep increasing with epochs as shown in your paper, even though the training loss is decreasing. The learning rate and weight decay are set to the defaults. The difference is that I replaced the max pooling layers in VGG16 with a learnable mixed max-and-average pooling. Should I adjust the learning rate?
Many thanks in advance.

About the number of images in SfM.

Hi, I checked the cids in the pkl file, and found

len(train['cids']) == 91642
len(val['cids']) == 6403

Does this mean the training set only uses 91k images and the validation set 6k images, not 120k or 30k?

Thanks.

How can I test with Flickr100k?

Excuse me, sir:
In our experiments, we would like to test performance not only on Oxford5k or Paris6k; we want to test on the Oxford105k and Paris106k datasets. Can you give me any tips on preparing for these bigger datasets? For example, do I need a new combined ground-truth file, where should it be saved, or is it OK to keep the Oxford/Paris ground truth separate from Flickr100k's ground-truth file?
Thank you so much.

TypeError when using 'rmac' pooling method.

Hi Filip, thank you for this great work. I am trying to use this code to evaluate the off-the-shelf networks. I tested with 'vgg' and 'mac' features and it works well, but when I tried the 'rmac' pooling method, I got the following error.

zxw@Dell:~/codes/cnnimageretrieval-pytorch$ python3 -m cirtorch.examples.test --gpu-id '0' --network-offtheshelf 'vgg16-rmac' --datasets 'oxford5k' --multiscale
>> Loading off-the-shelf network:
>>>> 'vgg16-rmac'
>> imageretrievalnet.py: for 'vgg16' custom pretrained features 'imagenet-caffe-vgg16-features-d369c8e.pth' are used
>>>> loaded network: 
  (meta): dict( 
     architecture: vgg16
     pooling: rmac
     whitening: False
     outputdim: 512
     mean: [0.485, 0.456, 0.406]
     std: [0.229, 0.224, 0.225]
  )

>> oxford5k: Extracting...
>> oxford5k: database images...
Traceback (most recent call last):
  File "/home/zxw/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zxw/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/examples/test.py", line 207, in <module>
    main()
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/examples/test.py", line 178, in main
    vecs = extract_vectors(net, images, args.image_size, transform, ms=ms, msp=msp)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/networks/imageretrievalnet.py", line 181, in extract_vectors
    vecs[:, i] = extract_ms(net, input_var, ms, msp)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/networks/imageretrievalnet.py", line 203, in extract_ms
    v += net(input_var_t).pow(msp).cpu().data.squeeze()
  File "/home/zxw/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/networks/imageretrievalnet.py", line 68, in forward
    o = self.norm(self.pool(self.features(x))).squeeze(-1).squeeze(-1)
  File "/home/zxw/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/layers/pooling.py", line 54, in forward
    return LF.rmac(x, L=self.L, eps=self.eps)
  File "/home/zxw/codes/cnnimageretrieval-pytorch/cirtorch/layers/functional.py", line 39, in rmac
    Wd = idx.tolist()[0]
TypeError: 'int' object is not subscriptable

I checked the code; it seems that 'idx.tolist()' is an integer, so it is not subscriptable. I changed the rmac code in functional.py at line 39 from
Wd = idx.tolist()[0] to
Wd = idx.tolist()
and likewise at line 41 from
Hd = idx.tolist()[0] to
Hd = idx.tolist()
Then I tested the code (using the rmac pooling method) and it works well.

Did I do the right thing? Is this a bug?
Looking forward to your response, thank you.

Expected runtime

Hi --

Roughly what should the expected runtime be for e.g. running the "off-the-shelf" example:

python3 -m cirtorch.examples.test --gpu-id '0' --network-offtheshelf 'resnet101-gem'
                --datasets 'oxford5k,paris6k,roxford5k,rparis6k'
                --whitening 'retrieval-SfM-120k' --multiscale

Thanks!

ims dataset error

I downloaded the ims dataset, but /33/30/d7/0590b47980c41f08fd0dbf8dbbd73033 raises FileNotFoundError.

pytorch version of Deep Shape Matching

Hi Filip, thank you for your excellent work. I wonder if you could provide a PyTorch version of Deep Shape Matching? If you can, it would be very helpful.

RuntimeError: expand(torch.FloatTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

model = ActorCritic(num_inputs, num_outputs, hidden_size).to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)

for _ in range(num_steps):
    state = torch.FloatTensor(state).to(device)
    dist, value = model(state)

     19 state = torch.FloatTensor(state).to(device)
     20 print(state)
---> 21 dist, value = model(state)
     22
     23 action = dist.sample()

~/.conda/envs/reinforcement/lib/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487     result = self._slow_forward(*input, **kwargs)
    488 else:
--> 489     result = self.forward(*input, **kwargs)
    490 for hook in self._forward_hooks.values():
    491     hook_result = hook(self, input, result)

in forward(self, x)
     27 value = self.critic(x)
     28 mu = self.actor(x)
---> 29 std = self.log_std.exp().expand_as(mu)
     30 dist = Normal(mu, std)
     31 return dist, value

RuntimeError: expand(torch.FloatTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

Why do 'qidxs' and 'pidxs' have some values present in both lists?

Hi Filip,

Why do 'qidxs' and 'pidxs' have some values that are present in both lists? I thought qidxs come from the training 'query' set and pidxs from the training 'database' set, which are supposed to be disjoint.
So you did not divide the training data into two sets, training query and training database?
In other words, did you combine the training query and training database into one set called "training" and use them interchangeably?

Multiple loss.backward() for each minibatch during training

Hi @filipradenovic,

Sorry for a trivial question, but I'm not sure how PyTorch stores gradients across multiple calls of loss.backward() before optimizer.step() is called.

  1. Does it sum all the gradients or store an average?
  2. Is there a reason you call loss.backward() multiple times instead of summing all the losses and calling loss.backward() once?
  3. Also, is there a reason for computing the loss for each tuple separately in a minibatch during training, but computing the loss for the whole minibatch together during validation?

Thanks!
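
(For question 1, a quick standalone check: PyTorch sums gradients into .grad across backward() calls until they are zeroed.)

import torch

w = torch.ones(1, requires_grad=True)
(2 * w).backward()
(3 * w).backward()
print(w.grad)   # tensor([5.]) -- gradients are summed (2 + 3), not averaged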

How did you draw Fig. 2 in the "Fine-tuning CNN ..." paper?

[figure omitted]
This figure is very impressive and makes it straightforward to visualize the most distinctive patches of an image. I think it will help greatly in understanding which parts contributed more in my own work. Would you be so kind as to open-source that code?
Many thanks in advance~

Get " RuntimeError: CUDA error (3): initialization error " when running train.py with GPU

Traceback (most recent call last):
  File "/home/superuser/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/superuser/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/superuser/Projects/cnnimageretrieval-pytorch-master-20181101/cirtorch/examples/train.py", line 558, in <module>
    main()
  File "/home/superuser/Projects/cnnimageretrieval-pytorch-master-20181101/cirtorch/examples/train.py", line 268, in main
    loss = train(train_loader, model, criterion, optimizer, epoch)
  File "/home/superuser/Projects/cnnimageretrieval-pytorch-master-20181101/cirtorch/examples/train.py", line 307, in train
    for i, (input, target) in enumerate(train_loader):
  File "/home/superuser/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/home/superuser/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/superuser/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/superuser/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/superuser/Projects/cnnimageretrieval-pytorch-master-20181101/cirtorch/datasets/traindataset.py", line 110, in __getitem__
    output.append(self.loader(self.images[self.nidxs[index][i]]))
RuntimeError: CUDA error (3): initialization error

There is no error when using CPU only.
I am using Python 3.6, PyTorch 0.4.1, and CUDA 9.2; what's wrong?

whiten

Hello! I want to know how I can get my own whiten.pkl.

Not learning batchnorm weights during training

Hi @filipradenovic,

I am wondering what is the reason behind not learning batch norm weights during training.

model.train()
model.apply(set_batchnorm_eval)

def set_batchnorm_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        # freeze running mean and std
        m.eval()
        # freeze parameters
        # for p in m.parameters():
        #     p.requires_grad = False

Thanks!

train

hi, I have a problem when I run train.py:
error: argument --training-dataset/-d: invalid choice: 'retrieval-SfM-120k'
Can you give me some advice? Thanks.

RuntimeError: expand(torch.cuda.FloatTensor{[2048, 1]}, size=[2048]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

I ran the train code on torch 0.4.0, torchvision 0.2.1:

python3 -m cirtorch.examples.train ./train --gpu-id '3' --training-dataset 'retrieval-SfM-120k' \
            --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --loss 'contrastive' \
            --loss-margin 0.85 --optimizer 'adam' --lr 1e-6 --neg-num 5 --query-size=2000 \
            --pool-size=20000 --batch-size 5 --image-size 362

and got this error:

  loss = train(train_loader, model, criterion, optimizer, epoch)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/examples/train.py", line 283, in train
    train_loader.dataset.create_epoch_tuples(model)
  File "/data5/zxt/downloads/cnnimageretrieval-pytorch-master/cirtorch/datasets/traindataset.py", line 170, in create_epoch_tuples
    qvecs[:, i+1] = net(Variable(input.cuda())).data
RuntimeError: expand(torch.cuda.FloatTensor{[2048, 1]}, size=[2048]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

some questions about rmac pooling method

hello,
firstly, your code really helps me a lot, thanks!
I am trying to use the rmac pooling method to get image feature vectors from [N, channels, W, H] feature maps, but I have some questions.
W = x.size(3)
H = x.size(2)

w = min(W, H)
w2 = math.floor(w / 2.0 - 1)

b = (max(H, W) - w) / (steps - 1)
(tmp, idx) = torch.min(torch.abs(((w**2 - w*b) / w**2) - ovr), 0)  # steps(idx) regions for long dimension

# region overplus per dimension
Wd = 0
Hd = 0
if H < W:
    Wd = idx.item() + 1
elif H > W:
    Hd = idx.item() + 1

If W == H in my feature maps, the code above seems not to work; does that matter?

Why does compute_ap() not generate the correct result?

Hi, I am testing the part that computes the mAP, and it does not generate the correct answer.

I checked the details of the code and it is due to this part:

if rank == 0:
    precision_0 = 1.
else:
    precision_0 = float(j) / rank
precision_1 = float(j + 1) / (rank + 1)
ap += (precision_0 + precision_1) * recall_step / 2.

Could you explain why it is not simply written as
ap += precision_1 * recall_step?

Many thanks!
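
(For context: the averaged term is the trapezoidal rule for the area under the precision-recall curve. Writing $r_j$ for the recall after the $j$-th positive,

$\mathrm{AP} = \sum_j \tfrac{1}{2}\bigl(P(r_{j-1}) + P(r_j)\bigr)\,(r_j - r_{j-1}),$

whereas $P(r_j)(r_j - r_{j-1})$ alone would be the coarser rectangle rule.)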

UserWarning

/home/j***/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:747: UserWarning: Possibly corrupt EXIF data. Expecting to read 8 bytes but only got 0. Skipping tag 41487
