
retrieval-2016-deepvision's People

Contributors

amaiasalvador, xavigiro


retrieval-2016-deepvision's Issues

l2 normalization never actually occurs in this codebase

In many places the code calls normalize(feats) but never uses the return value, so the feats array remains un-normalized; as currently called, the function does not operate in place.

Use feats = normalize(feats)
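
A minimal sketch of the bug and the fix (illustrative, not the repo's actual call site; sklearn's normalize returns a new array rather than modifying its input):

    import numpy as np
    from sklearn.preprocessing import normalize

    feats = np.random.rand(10, 512)

    normalize(feats)          # buggy pattern: the normalized copy is discarded
    feats = normalize(feats)  # fix: rebind the name to the normalized copy

    # every row now has unit L2 norm
    assert np.allclose(np.linalg.norm(feats, axis=1), 1.0)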

Asking for the download link of the pretrained model ~

Hi!
I'm interested in your excellent work.
I'm trying to list your method as a comparison in my paper. However, the download link provided in fetch_model.sh seems to be invalid now. Could you please update it? Thanks a lot!

To get it to work with QE

i.e. when self.stage == 'rerank2nd', the vector returned by self.get_query_local_feat(frames_sorted[i_qe], locations_sorted[i_qe]) must be reshaped to (-1, 1) before being added to query_feats.
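
A hedged sketch of the fix as a standalone helper (the names follow the issue text; the shapes and the accumulation step are assumptions):

    import numpy as np

    def accumulate_qe_feat(query_feats, local_feat):
        # local_feat arrives flat, e.g. shape (512,); reshape it to a
        # (-1, 1) column before adding it into the query-expansion sum
        return query_feats + local_feat.reshape(-1, 1)

    # usage (shapes are assumptions):
    query_feats = np.zeros((512, 1))
    query_feats = accumulate_qe_feat(query_feats, np.random.rand(512))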

ZF Model

Hi

Do you have examples of what the params.py file should look like for the ZF model, rather than the VGG model? I could probably track down the appropriate prototxt file myself, but I want to make sure I'm not making a mistake.
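
For reference, here is my guess at the entries that would change (keys and paths are hypothetical, modeled on the py-faster-rcnn layout rather than taken from this repo's params.py):

    # hypothetical ZF entries; every key and path here is an assumption
    params = {}
    params['net_proto'] = 'models/ZF/faster_rcnn_end2end/test.prototxt'
    params['net_weights'] = 'data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel'
    # ZF's last conv layer (conv5) has 256 channels vs. VGG16's 512,
    # so the pooled descriptor dimension would drop from 512 to 256
    params['layer'] = 'conv5'
    params['dimension'] = 256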

Thanks!
Ben

Trained Model

Dear Amaia,

I read your paper “Faster R-CNN Features for Instance Search” and found it to be great work. We are interested in trying out the method and applying it in our research.

I downloaded the code from GitHub but can only reproduce the best performance in Table 1, IPA-max at 55.9%. We cannot reproduce the fine-tuned Faster R-CNN model; would you please share your fine-tuned model, or the model pre-trained on the Microsoft COCO dataset, so that we can fine-tune it ourselves?

Many thanks,

Lianli Gao

No region proposals detected

Hi, while running features.py I keep getting this error:

File "/home/Athma/Downloads/InstanceSearch/retrieval-2016-deepvision/test.py", line 76, in _get_rois_blob
rois, levels = _project_im_rois(im_rois, im_scale_factors)
File "/home/Athma/Downloads/InstanceSearch/retrieval-2016-deepvision/test.py", line 91, in _project_im_rois
im_rois = im_rois.astype(np.float, copy=False)
AttributeError: 'NoneType' object has no attribute 'astype'

My caffe installation of faster rcnn seems to be working just fine, and the network gets loaded in caffe.
This is where the issue comes up:

I0509 13:05:38.588491 22199 net.cpp:270] This network produces output bbox_pred
I0509 13:05:38.588496 22199 net.cpp:270] This network produces output cls_prob
I0509 13:05:38.588518 22199 net.cpp:283] Network initialization done.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 548317115
I0509 13:05:39.171497 22199 net.cpp:816] Ignoring source layer data
I0509 13:05:39.247697 22199 net.cpp:816] Ignoring source layer loss_cls
I0509 13:05:39.247720 22199 net.cpp:816] Ignoring source layer loss_bbox
I0509 13:05:39.249155 22199 net.cpp:816] Ignoring source layer silence_rpn_cls_score
I0509 13:05:39.249172 22199 net.cpp:816] Ignoring source layer silence_rpn_bbox_pred
Extracting database features...
--Traceback ERROR---
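
The traceback shows im_rois arriving as None, i.e. no proposal boxes ever reached _project_im_rois. A minimal guard that fails with a readable message instead (illustrative; only the names taken from the traceback are real):

    import numpy as np

    def check_rois(im_rois):
        # fail early instead of hitting 'NoneType' has no attribute 'astype'
        if im_rois is None or len(im_rois) == 0:
            raise ValueError('no region proposals: verify that the RPN branch '
                             'is present in the test prototxt, or that '
                             'precomputed boxes were loaded')
        return np.asarray(im_rois, dtype=np.float64)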

About ranker.py

Hi, when I run ranker.py with params['dataset'] = 'oxford' set in my params.py, I get the error message below. How can I fix it? Thanks:
Traceback (most recent call last):
  File "ranker.py", line 169, in <module>
    R.rank()
  File "ranker.py", line 149, in rank
    self.get_query_vectors()
  File "ranker.py", line 74, in get_query_vectors
    self.query_feats[i,:] = self.db_feats[np.where(np.array(self.database_list) == query_file)]
ValueError: could not broadcast input array from shape (0,512) into shape (512)
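
The (0,512) on the right-hand side means np.where found no database entry matching query_file. A hedged sketch of the lookup with an explicit failure (names follow the traceback; the rest is a guess):

    import numpy as np

    def get_query_vector(db_feats, database_list, query_file):
        idx = np.flatnonzero(np.array(database_list) == query_file)
        if idx.size == 0:
            # usually a path mismatch or an incomplete dataset download
            raise KeyError('query image not in database list: %s' % query_file)
        return db_feats[idx[0]]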

features.py causes ERROR "check_array ValueError: Found array with 0 sample(s)"

I have successfully run the following:

  • database (oxford)
  • database (paris)
  • data/models/fetch_models.sh
  • read_data.py

But when I run features.py I have the following error:

Traceback (most recent call last):
  File "features.py", line 119, in <module>
    learn_transform(params,feats)
  File "features.py", line 23, in learn_transform
    feats = normalize(feats)
  File "/usr/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 1280, in normalize
    estimator='the normalize function', dtype=FLOAT_DTYPES)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 512)) while a minimum of 1 is required by the normalize function.
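
The shape (0, 512) means no features were extracted before learn_transform ran, typically because the image list resolved to zero files. A minimal pre-check (illustrative; the directory layout and .jpg extension are assumptions):

    import os

    def check_image_list(image_dir):
        # an empty list here later surfaces as sklearn's '0 sample(s)' error
        images = [f for f in os.listdir(image_dir) if f.endswith('.jpg')]
        if not images:
            raise RuntimeError('no images found in %s; re-run the dataset '
                               'download scripts' % image_dir)
        return images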

Choice of features

I was curious if one of the creators might be able to give a little insight into the choice of features used.

It's my understanding that the faster-rcnn roi_pooling layer max-pools the previous conv_* layer in a 7x7 grid, yielding a (n_boxes, 512, 7, 7) output. You take this output and either max- or sum-pool over the 7x7 grid to get a 512-d vector.

I was wondering if you had tried your current approaches using the fc6 layer that maps (n_boxes, 512, 7, 7) -> (n_boxes, 4096). Or, alternatively, flatten (n_boxes, 512, 7, 7) -> (n_boxes, 512 * 7 * 7) -- this would be a big vector, but the 7x7 grid is an roi_pooling parameter that could be reduced to 2x2 or 3x3.

Just curious to see if you (or others) had experimented at all with these different featurizations, or if there are reasons to think that they would not perform well.
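
For concreteness, a sketch of the poolings in question (shapes only, not the repo's code):

    import numpy as np

    pooled = np.random.rand(10, 512, 7, 7)  # (n_boxes, 512, 7, 7) roi_pooling output

    max_desc = pooled.max(axis=(2, 3))      # (n_boxes, 512), max over the 7x7 grid
    sum_desc = pooled.sum(axis=(2, 3))      # (n_boxes, 512), sum over the 7x7 grid
    flat_desc = pooled.reshape(10, -1)      # (n_boxes, 512*7*7), flattened alternative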

Thanks!
Ben

wrong dataset name

        # PCA MODEL - use paris for oxford data and vice versa
        if self.dataset is 'paris':

            self.pca = pickle.load(open(params['pca_model'] + '_oxford.pkl', 'rb'))

        elif self.dataset is 'oxford':

            self.pca = pickle.load(open(params['pca_model'] + '_paris.pkl', 'rb'))

The dataset names are exchanged: each branch loads the PCA pickle with the wrong suffix.
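
A sketch of the correction as this report reads it (it also replaces is with ==, since is on string literals only happens to work through CPython's interning); whether each pickle suffix names the dataset the PCA was learned on or applied to depends on how features.py writes the files:

        # PCA MODEL - suffixes swapped per this report; '==' instead of 'is'
        if self.dataset == 'paris':
            self.pca = pickle.load(open(params['pca_model'] + '_paris.pkl', 'rb'))
        elif self.dataset == 'oxford':
            self.pca = pickle.load(open(params['pca_model'] + '_oxford.pkl', 'rb'))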

ValueError: could not broadcast input array from shape (0,512) into shape (512)

Hello~
When the database is paris and I run ranker.py, I get the following error:
wh@rsliu-X10DAi:~/FR-for-instance/retrieval-2016-deepvision-master$ python ranker.py
Applying PCA
Traceback (most recent call last):
File "ranker.py", line 165, in
R.rank()
File "ranker.py", line 146, in rank
self.get_query_vectors()
File "ranker.py", line 71, in get_query_vectors
self.query_feats[i,:] = self.db_feats[np.where(np.array(self.database_list) == query_file)]
ValueError: could not broadcast input array from shape (0,512) into shape (512)

It is strange because when the database is oxford, it works fine.
I don't know how to fix it. Please help me.
Many thanks.

incomplete download of the oxford and paris datasets?

I ran the scripts get_oxford.sh and get_paris.sh to download the datasets, but found only 1059 images for oxford and 2794 for paris, while your paper reports 5063 images for oxford and 6412 for paris.
Btw, when I run ranker.py I get ValueError: could not broadcast input array from shape (0,512) into shape (512); I guess it is caused by the missing images. A quick completeness check is sketched below.
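
A quick way to compare against the expected counts (the counts come from the paper; the directory layout is an assumption about where the scripts unpack the images):

    import glob

    expected = {'oxford': 5063, 'paris': 6412}
    for name, n in expected.items():
        found = len(glob.glob('data/images/%s/*.jpg' % name))
        print('%s: %d of %d images' % (name, found, n))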

thanks.

What is the dataset for fine-tuning?

Thanks for your great work. I have a question to ask.
It seems that the datasets used for fine-tuning (oxford/paris/ins2013) are also the datasets you run queries on? Am I right? I'm new to image retrieval and not sure whether this is a convention, but it seems unfair if the "training set" and the "test set" are the same.
