tgdt's Introduction

Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training

This repo is built on top of VSE++ and TERAN.

Setup

Set up the Python environment using conda:

conda env create --file environment.yml
conda activate gls
export PYTHONPATH=.

Get the data

  1. Download and extract the data folder, which contains the annotations, the splits by Karpathy et al., and the precomputed ROUGE-L and SPICE relevances for both the COCO and Flickr30K datasets:
wget http://datino.isti.cnr.it/teran/data.tar
tar -xvf data.tar
  2. Download the bottom-up features for both COCO and Flickr30K. We use the code by Anderson et al. to extract them. The following commands extract them under data/coco/ and data/f30k/. If you prefer another location, be sure to adjust the configuration file accordingly.
# for MS-COCO
wget http://datino.isti.cnr.it/teran/features_36_coco.tar
tar -xvf features_36_coco.tar -C data/coco

# for Flickr30k
wget http://datino.isti.cnr.it/teran/features_36_f30k.tar
tar -xvf features_36_f30k.tar -C data/f30k
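
Before moving on, it can help to confirm that the extraction landed where the configuration expects it. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks the two directories named in the steps above (data/coco and data/f30k), not the files inside them.

```python
from pathlib import Path

# Directories named in the download steps; change these if your
# configuration file points to another location.
EXPECTED_DIRS = ["data/coco", "data/f30k"]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```

Run it from the repository root, the same directory in which the tar commands were issued.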

Evaluate

Download and extract our pre-trained models.

Then, issue the following commands for evaluating a given model.

# F30K
python3 test.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_global.yaml
python3 test.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_local.yaml
python3 test_gl.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_local.yaml

# COCO
python3 test.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_global.yaml
python3 test.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_local.yaml
python3 test_gl.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_local.yaml
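
The checkpoint name model_best_rsum refers to the rsum metric. In VSE++-style codebases, rsum is the sum of Recall@1, Recall@5 and Recall@10 over both retrieval directions (image-to-text and text-to-image), so its maximum is 600. The sketch below illustrates that computation under a simplifying assumption: each image has exactly one matching caption, located at the same index. The real benchmarks pair each image with five captions, and the evaluation scripts in this repository may differ in detail. The function names `recall_at_k` and `rsum` are illustrative, not taken from the repository.

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10)):
    """Recall@K in percent, with queries along the rows of `sims`.

    Simplifying assumption: the only correct match for query i is
    candidate i (one caption per image).
    """
    sims = np.asarray(sims, dtype=float)
    n = sims.shape[0]
    # Score of the ground-truth candidate for each query.
    gt = sims[np.arange(n), np.arange(n)]
    # Rank = number of candidates that score strictly higher than the ground truth.
    ranks = (sims > gt[:, None]).sum(axis=1)
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}


def rsum(sims):
    """Sum of R@1, R@5 and R@10 over both retrieval directions."""
    sims = np.asarray(sims, dtype=float)
    i2t = recall_at_k(sims)    # rows are images, columns are captions
    t2i = recall_at_k(sims.T)  # rows are captions, columns are images
    return sum(i2t.values()) + sum(t2i.values())


if __name__ == "__main__":
    # Toy 2x2 similarity matrix in which every ground-truth pair is ranked second.
    toy = np.array([[0.1, 0.9],
                    [0.8, 0.2]])
    print(recall_at_k(toy))
    print(rsum(toy))
```

Because rsum adds six percentages, a gap of a few points in each recall value is enough to shift it by tens of points.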

Train

To train the model with a given configuration, run the corresponding command:

python3 train.py --config configs/f30k_all.yaml --logger_name runs/f30k_m0.3
python3 train.py --config configs/coco_all.yaml --logger_name runs/coco_m0.3

Citation

Please cite this work if you find it useful:

@article{liu2023efficient,
  title={Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training},
  author={Liu, Chong and Zhang, Yuqi and Wang, Hongsong and Chen, Weihua and Wang, Fan and Huang, Yan and Shen, Yi-Dong and Wang, Liang},
  journal={IEEE Transactions on Image Processing},
  year={2023}
}

tgdt's People

Contributors

hongsong-wang, lcfractal

tgdt's Issues

The results cannot be reproduced

Great work. I tried to reproduce the results with the code in this repository, but I could not match the numbers reported in the paper: the rsum differs by tens of points. What can I do to get the paper's results?
Also, the results I get on Flickr30K and COCO 1K are exactly the same, which is surprising.

Question about the transformer_encoder

Thanks for sharing this great work. I have two questions about it. First, you mention that you use pre-extracted image features, but the code uses a transformer to extract image features instead. This doesn't match the paper. Second, I am wondering why you add transformer_encoder_text and transformer_encoder_img. I have seen your baseline, TERAN, and compared with it you add these two encoders. I don't understand them, especially the .detach() operation. Can you help me with these two questions? Thanks again for this work!

Dataset missing

Hello, the dataset link in your code is missing. Could you share it again?
