
License: MIT License

image-localization loop-closure-detection visual-geolocalization visual-place-recognition adapter relocalization visual-slam


CricaVPR

This is the official repository for the CVPR 2024 paper "CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition".

Getting Started

This repo follows the framework of GSV-Cities for training, and the Visual Geo-localization Benchmark for evaluation. You can download the GSV-Cities datasets HERE, and refer to VPR-datasets-downloader to prepare test datasets.

The test dataset should be organized in a directory tree as such:

├── datasets_vg
    └── datasets
        └── pitts30k
            └── images
                ├── train
                │   ├── database
                │   └── queries
                ├── val
                │   ├── database
                │   └── queries
                └── test
                    ├── database
                    └── queries
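The directory tree above can be created with a short script (a sketch for the empty layout only; the actual dataset images still need to be placed in the leaf folders by VPR-datasets-downloader):

```python
import os

def make_vpr_tree(root):
    """Create the empty datasets_vg directory tree expected for pitts30k."""
    for split in ("train", "val", "test"):
        for part in ("database", "queries"):
            os.makedirs(
                os.path.join(root, "datasets", "pitts30k", "images", split, part),
                exist_ok=True,
            )

make_vpr_tree("datasets_vg")
```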

Before training, you should download the pre-trained foundation model DINOv2 (ViT-B/14) HERE.

Train

python3 train.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=pitts30k --foundation_model_path=/path/to/pre-trained/dinov2_vitb14_pretrain.pth --epochs_num=10

Test

To evaluate the trained model:

python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=pitts30k --resume=/path/to/trained/model/CricaVPR.pth

To add PCA:

python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=pitts30k --resume=/path/to/trained/model/CricaVPR.pth --pca_dim=4096 --pca_dataset_folder=pitts30k/images/train
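The repo's own PCA code is not shown here, but the idea behind `--pca_dim` can be illustrated with a minimal NumPy sketch (an assumption-laden toy: a projection is fit on database descriptors and then applied to all descriptors; the real implementation may differ in details such as whitening):

```python
import numpy as np

def fit_pca(descriptors, dim):
    """Fit a PCA projection on descriptors (one row per descriptor)."""
    mean = descriptors.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:dim]

def apply_pca(descriptors, mean, components):
    """Project descriptors onto the top principal components."""
    return (descriptors - mean) @ components.T

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 64))  # toy 64-D descriptors
mean, comps = fit_pca(feats, 16)
reduced = apply_pca(feats, mean, comps)
print(reduced.shape)  # (100, 16)
```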

Trained Model

You can directly download the trained model HERE.

Related Work

Another work of ours, SelaVPR (two-stage VPR based on DINOv2), achieved SOTA performance on several datasets. The code is released HERE.

Acknowledgements

Parts of this repo are inspired by the following repositories:

GSV-Cities

Visual Geo-localization Benchmark

DINOv2

Citation

If you find this repo useful for your research, please consider leaving a star ⭐️ and citing the paper:

@inproceedings{lu2024cricavpr,
  title={CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition},
  author={Lu, Feng and Lan, Xiangyuan and Zhang, Lijun and Jiang, Dongmei and Wang, Yaowei and Yuan, Chun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2024}
}


cricavpr's Issues

Releasing the model on torch.hub?

Hi, thank you for uploading the code and trained models!
Could you release the model on torch.hub? It is quite simple to do, and it lets people use your model with two lines of code, which would help more people use your model and spread your work!
For example, I did it for CosPlace, and the trained model can be downloaded automatically from anywhere, without cloning the repo, like this:

import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)

so it would be easier for everyone to use, and I could also add it to this benchmarking repo.

PS: I really like the focus on small dimensional features, congrats on the paper!

questions about test dataset

Hello, I used the Pitts30k testing program from NetVLAD with the weight file you provided, and reduced the dimensionality to 4096 using PCA. The results I obtained seem inconsistent with those in the paper. Is there a problem with my dataset?
R@1: 88.0, R@5: 94.7, R@10: 96.6, R@20: 97.4

question about test

Hi, @Lu-Feng
Thank you for this interesting research!

I have some questions about the cross-image encoder.

From the question linked below:
#7 (comment)

I think the cross-image encoder could exploit prior knowledge (the consecutiveness of queries) of the test dataset.

The supplementary material (Table 10) includes an ablation study of the cross-image encoder. The performance on Pitts30k is:
No encoder: 90.6, 95.9, 97.2
Transformer encoder layer x 2 (default): 94.8, 97.4, 98.1

Comparing this with CricaVPR's performance in the link above, when the test dataloader uses shuffle=True:

2024-05-02 12:47:59   [0]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.2, R@5: 96.2, R@10: 97.2, R@100: 99.4
2024-05-02 12:49:38   [1]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 91.8, R@5: 96.1, R@10: 97.3, R@100: 99.4
2024-05-02 12:51:17   [2]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.0, R@5: 96.1, R@10: 97.2, R@100: 99.4
2024-05-02 12:52:57   [3]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 91.9, R@5: 96.1, R@10: 97.3, R@100: 99.4
2024-05-02 12:54:37   [4]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.0, R@5: 96.0, R@10: 97.3, R@100: 99.4
2024-05-02 12:56:28   [5]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.1, R@5: 96.0, R@10: 97.3, R@100: 99.3
2024-05-02 12:58:21   [6]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.1, R@5: 96.0, R@10: 97.3, R@100: 99.4
2024-05-02 13:00:23   [7]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.1, R@5: 96.2, R@10: 97.2, R@100: 99.4
2024-05-02 13:02:23   [8]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 91.9, R@5: 95.9, R@10: 97.2, R@100: 99.4
2024-05-02 13:04:21   [9]: Recalls on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: R@1: 92.0, R@5: 95.9, R@10: 97.2, R@100: 99.3
2024-05-02 13:04:21   Average recall on < BaseDataset, pitts30k - #database: 10000; #queries: 6816 >: 91.99090375586854

It seems the cross-image encoder has much less impact when the test dataloader uses shuffle=True, and I think this issue should be addressed.

Release Code

Hello, when will you be releasing the code?

How do you generate the feature maps in Fig. 6?

Thanks for sharing your work! I wonder how you generated the visualization of the feature maps in Fig. 6 of the paper. Could you please give a detailed explanation (which layer you used, how you combined the features, and so on), or release the relevant code?

Why PCA?

Hi, thank you for the code!

Is there a specific reason the dimensionality reduction is done with PCA? I understand the flexibility of storing the PCA matrix and controlling the dimensionality after fitting, but a learnable linear projection at the end of the network might deliver better performance. Did you test that?

Thanks!

Question about infer_batch_size

Thank you for sharing your remarkable work; I am truly impressed by its performance. However, I have a question about infer_batch_size.

The paper mentions, "An inference batch contains 8 images for Pitts30K and 16 images for others". Given that CricaVPRNet is designed to encode along the batch dimension, I am curious whether infer_batch_size or the image order within the dataset might affect the generated descriptors. If so, could different test results be observed with varying infer_batch_size, or if the test set is shuffled?

Please let me know if I have misunderstood or overlooked any details.
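The concern in the two issues above can be illustrated with a toy cross-batch attention step (a minimal random-feature sketch, not the actual CricaVPR encoder): when each descriptor attends to every other image in its batch, changing the batch composition changes the output for the same image.

```python
import numpy as np

def cross_batch_attention(feats):
    """One self-attention step across the batch dimension:
    each descriptor attends to all descriptors in the same batch."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    # Row-wise softmax over attention scores.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats

rng = np.random.default_rng(0)
imgs = rng.standard_normal((4, 8))  # 4 images, 8-D features

out_a = cross_batch_attention(imgs[:2])      # image 0 batched with image 1
out_b = cross_batch_attention(imgs[[0, 2]])  # image 0 batched with image 2
# The descriptor for image 0 differs between the two batches.
print(np.allclose(out_a[0], out_b[0]))  # False
```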

Questions for DINOv2

Hello, how does your code reduce the feature dimension to 512 or 1024 after loading DINOv2? GeM pooling, or an added linear layer?
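For reference, GeM pooling (one of the options asked about) aggregates patch features into a single descriptor without changing the feature dimension, so a reduction to 512/1024 would additionally need something like a linear projection; which combination CricaVPR uses is for the authors to confirm. A NumPy sketch of GeM:

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over patch positions.
    features: (num_patches, dim) patch features; returns a (dim,) descriptor.
    p = 1 is average pooling; p -> infinity approaches max pooling."""
    clamped = np.clip(features, eps, None)
    return (clamped ** p).mean(axis=0) ** (1.0 / p)

rng = np.random.default_rng(0)
patches = np.abs(rng.standard_normal((256, 768)))  # toy DINOv2-like patch features
desc = gem_pool(patches)
print(desc.shape)  # (768,)
```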

Could you please give me some help about the NordLand dataset?

I found the model performs very poorly on the NordLand dataset while performing very well on other datasets. I downloaded the dataset from here, because the download URL for this dataset in https://github.com/gmberton/VPR-datasets-downloader has become unavailable, and then used the left part of download_nordland.py to format it. However, the eval results are very unsatisfying. Could you please give some help with the Nordland dataset?
