mesnico / aladin

Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"

License: MIT License

Python 99.66% Shell 0.34%
computer-vision cross-modal cross-modal-retrieval deep-learning language-and-vision natural-language-processing pytorch

aladin's Introduction

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

Introduction

This is the code for reproducing the results from our paper ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval, accepted at CBMI 2022.

Our code is based on OSCAR, whose repository is available here.

Architecture

Installation

Requirements

  • Python 3.7
  • PyTorch 1.2
  • torchvision 0.4.0
  • CUDA 10.0
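
The pinned versions above can be checked before a long training run. The following is a minimal sketch, not part of the repository: the helper name `mismatches` and the prefix-matching rule are assumptions made for illustration.

```python
# Pinned versions copied from the requirements list above.
REQUIRED = {
    "python": "3.7",
    "torch": "1.2",
    "torchvision": "0.4.0",
    "cuda": "10.0",
}


def mismatches(installed):
    """Return {name: (required, found)} for every requirement not satisfied.

    Hypothetical helper: a found version is accepted when it equals the
    required string or extends it with a further dotted component, so
    "1.2.0" satisfies "1.2". Missing entries are reported with found=None.
    """
    problems = {}
    for name, required in REQUIRED.items():
        found = installed.get(name)
        ok = found is not None and (
            found == required or found.startswith(required + ".")
        )
        if not ok:
            problems[name] = (required, found)
    return problems
```

In a real environment the `installed` dictionary could be filled from `platform.python_version()`, `torch.__version__`, `torchvision.__version__` and `torch.version.cuda`; an empty result means every pinned version matches.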

Setup with Conda

# create a new environment
conda create --name oscar python=3.7
conda activate oscar

# install PyTorch 1.2
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch

export INSTALL_DIR=$PWD

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
python setup.py install --cuda_ext --cpp_ext

# install this repo
cd $INSTALL_DIR
git clone --recursive https://github.com/mesnico/OSCAR-TERAN-distillation
cd OSCAR-TERAN-distillation/coco_caption
./get_stanford_models.sh
cd ..
python setup.py build develop

# install requirements
pip install -r requirements.txt

unset INSTALL_DIR

Download OSCAR & Vin-VL Retrieval data:

Download the checkpoint folder with azcopy:

path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/model_ckpts/coco_ir/base/checkpoint-0132780/' <checkpoint-target-folder> --recursive

Download the IR data

path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/datasets/coco_ir' <data-folder> --recursive

Download the pre-extracted Bottom-Up features

path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/image_features/coco_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000/' <features-folder> --recursive

Training

cd alad 
python train.py --data_dir <data-folder>/coco_ir --img_feat_file <features-folder>/features.tsv --eval_model_dir <checkpoint-target-folder>/checkpoint-0132780 --config configs/<config>.yaml --logger_name <output-folder> --val_step 7000 --max_seq_length 50 --max_img_seq_length 34
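
When launching several runs with different configurations, it can help to assemble the command above programmatically. This is a minimal sketch under stated assumptions: `build_train_command` is a hypothetical helper, not part of the repository, and it only reproduces the flags and values shown in the command above.

```python
def build_train_command(data_folder, features_folder, checkpoint_folder,
                        config, output_folder, val_step=7000):
    """Assemble the train.py invocation shown above as an argument list.

    Hypothetical helper: `config` is the configuration name without the
    .yaml extension, e.g. "alad-alignment-triplet".
    """
    return [
        "python", "train.py",
        "--data_dir", f"{data_folder}/coco_ir",
        "--img_feat_file", f"{features_folder}/features.tsv",
        "--eval_model_dir", f"{checkpoint_folder}/checkpoint-0132780",
        "--config", f"configs/{config}.yaml",
        "--logger_name", output_folder,
        "--val_step", str(val_step),
        "--max_seq_length", "50",
        "--max_img_seq_length", "34",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, cwd="alad", check=True)`, which mirrors the `cd alad` step without changing the working directory of the calling process.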

Configurations

The --config parameter selects the training setup. Configurations are YAML files placed inside the configs folder:

  • alad-alignment-triplet.yaml: Trains the alignment head using a hinge-based triplet ranking loss, also fine-tuning the Vin-VL backbone;
  • alad-matching-triplet-finetune.yaml: Trains only the matching head using a hinge-based triplet ranking loss. The parameter --load-teacher-model can be used to provide a backbone previously trained with the alad-alignment-triplet.yaml configuration;
  • alad-matching-distill-finetune.yaml: Trains only the matching head by distilling the scores from the alignment head. In this case the parameter --load-teacher-model IS REQUIRED, and must point to an alignment head previously trained with the alad-alignment-triplet.yaml configuration;
  • alad-matching-triplet-e2e.yaml: Trains the matching head, also fine-tuning the Vin-VL backbone;
  • alad-alignment-and-matching-distill.yaml: Trains the whole architecture (matching + alignment heads) end-to-end. The variable activate_distillation_after inside the configuration file controls how many epochs to wait before activating the distillation loss, so that the backbone is minimally stable first; alternatively, you can load a pre-trained backbone using the --load-teacher-model option.
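
To make the two mechanisms named in the list above more concrete, here is a minimal sketch of a hinge-based triplet ranking loss and of an epoch gate for the distillation loss. It is not the repository's implementation: the sum-over-all-negatives formulation, the default margin of 0.2, the zero-based epoch counting, and both function names are assumptions made for illustration.

```python
def triplet_hinge_loss(scores, margin=0.2):
    """Sum-of-hinges triplet ranking loss over a square score matrix.

    scores[i][j] is the similarity between image i and caption j, and the
    diagonal holds the matching pairs. Every off-diagonal entry acts as a
    negative (assumption: the actual code may mine only hard negatives).
    """
    n = len(scores)
    loss = 0.0
    for i in range(n):
        positive = scores[i][i]
        for j in range(n):
            if j == i:
                continue
            # Caption j is a negative for image i.
            loss += max(0.0, margin + scores[i][j] - positive)
            # Image j is a negative for caption i.
            loss += max(0.0, margin + scores[j][i] - positive)
    return loss


def distillation_active(epoch, activate_distillation_after):
    """Return True once the distillation loss should be switched on.

    Assumption: epochs are counted from 0 and distillation starts at the
    epoch whose index equals activate_distillation_after.
    """
    return epoch >= activate_distillation_after
```

With a perfectly separated batch such as `[[1.0, 0.0], [0.0, 1.0]]` every hinge is inactive and the loss is zero; the loss grows as negatives come within the margin of their positive pair.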

Monitor Training

Training and validation metrics, as well as model checkpoints, are written to the <output-folder> path. You can monitor all the metrics live using TensorBoard:

tensorboard --logdir <output-folder>

Testing

The following script tests a model on the 1k MS-COCO test set. You can download our best model from here; it was obtained with the alad-alignment-and-matching-distill.yaml configuration.

cd alad
python test.py --data_dir <data-folder>/coco_ir --img_feat_file <features-folder>/features.tsv --eval_model_dir <checkpoint-target-folder>/checkpoint-0132780 --max_seq_length 50 --max_img_seq_length 34 --eval_img_keys_file test_img_keys_1k.tsv --load_checkpoint <path/to/checkpoint.pth.tar>

To test on the 5k test set, simply set --eval_img_keys_file test_img_keys.tsv.
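
The choice between the two test sets comes down to a single flag value, which can be captured in a small lookup. This is a minimal sketch; `eval_keys_args` and the `"1k"`/`"5k"` labels are hypothetical names introduced here, while the file names are the ones given above.

```python
# Key files named in the testing instructions above.
KEY_FILES = {
    "1k": "test_img_keys_1k.tsv",
    "5k": "test_img_keys.tsv",
}


def eval_keys_args(split):
    """Return the --eval_img_keys_file arguments for the chosen test set."""
    if split not in KEY_FILES:
        raise ValueError(f"unknown split {split!r}, expected one of {sorted(KEY_FILES)}")
    return ["--eval_img_keys_file", KEY_FILES[split]]
```

The returned pair can be appended to the rest of the test.py argument list, so that switching test sets never requires editing the command by hand.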

Reference

If you found this code useful, please cite the following paper:

@inproceedings{messina2022aladin,
  title={ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval},
  author={Messina, Nicola and Stefanini, Matteo and Cornia, Marcella and Baraldi, Lorenzo and Falchi, Fabrizio and Amato, Giuseppe and Cucchiara, Rita},
  booktitle={International Conference on Content-based Multimedia Indexing},
  pages={64--70},
  year={2022}
}

aladin's People

Contributors

mesnico


aladin's Issues

Questions about GPU

Hello,
Recently, I have been running experiments in this direction, but I am worried that I will not have enough GPU resources. How much GPU resources are needed to run the whole experiment? I hope you can help me, thank you.

Can't download the checkpoint folder

Hello,
Thank you very much for your answer to my previous question. I appreciate this article very much and am very interested in it. I have tried to run the code, but I get the error 'failed to perform copy command due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public', so I can't download the checkpoint folder and the IR data with azcopy. Hope to get your help, thank you.

No such file or directory: 'training_args.bin'

Hello,
Thank you very much for your answer to my previous question. I appreciate this article very much and am very interested in it. I have tried to run the code, but in the process of running it, I encountered a problem. The call torch.load(op.join(args.eval_model_dir, 'training_args.bin')), between lines 530 and 540 of train.py, reports that the file does not exist, and I didn't find it in the folder. This problem has puzzled me for a long time, and I haven't found a solution. Hope to get your help, thank you very much!

def restore_training_settings(args):
    assert not args.do_train and (args.do_test or args.do_eval)
    train_args = torch.load(op.join(args.eval_model_dir, 'training_args.bin'))
