Coder Social home page Coder Social logo

lily11223344 / ssl-vqa Goto Github PK

View Code? Open in Web Editor NEW

This project forked from crossmodalgroup/ssl-vqa

0.0 0.0 0.0 73 KB

Code for our IJCAI2020 paper: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Shell 0.76% Python 99.24%

ssl-vqa's Introduction

SSL-VQA

Here is the implementation of our IJCAI 2020 paper Overcoming Language Priors with Self-supervised Learning for Visual Question Answering. This repository contains code modified from here, many thanks!

Requirements

  • python 3.6.8

  • pytorch 1.0.1

  • zarr

  • tdqm

  • spacy

  • h5py

Download and preprocess the data

cd data 
bash download.sh
python preprocess_image.py --data trainval
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..

Training

  • Train our model with multi-label VQA loss
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ 
--img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss
  • Train our model with corss-entropy VQA loss
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ 
--img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 1.2 --ce_loss
  • Train the model with 80% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ 
--img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss --ratio 0.8

Evaluation

  • A json file of results from the test set can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/ --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
  • Compute detailed accuracy for each answer type:
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/

Pretrained model & Well-trained model

If you don't want to train from scratch, you can download the pretrained base model from here(for ml_loss), and fine-tune it with our self-supervised loss as below:

CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ 
--img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss --checkpoint_path ml_pretrained.pth

A well-trained model (for ml_loss) can be found here. The test results file produced by it can be found here and its performance is as follows:

Overall score: 58.58
Yes/No: 87.47 Num: 40.3 other: 48.45

Reference

If you found this code is useful, please cite the following paper:

@inproceedings{ijcai2020-151,
  title     = {Overcoming Language Priors with Self-supervised Learning for Visual Question Answering},
  author    = {Zhu, Xi and Mao, Zhendong and Liu, Chunxiao and Zhang, Peng and Wang, Bin and Zhang, Yongdong},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},             
  editor    = {Christian Bessiere}	
  pages     = {1083--1089},
  year      = {2020},
  month     = {7},
  note      = {Main track}
  doi       = {10.24963/ijcai.2020/151},
  url       = {https://doi.org/10.24963/ijcai.2020/151},
}

ssl-vqa's People

Contributors

crossmodalgroup avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.