
This project is forked from yikuan8/transformers-vqa.


Transformers-VQA

An implementation for down-streaming popular pre-trained vision-and-language (V+L) models to VQA tasks.

Currently supported: VisualBERT, LXMERT, and UNITER, on Linux and Google Colab.

Notes:

  • This is a beta version; please feel free to open an issue if you encounter any errors.
  • Our implementation is built on the excellent LXMERT repo.
  • Please consider citing the original works of the V+L models if you adopt their pre-trained weights.
  • If you find our implementation helpful, please consider citing this :)
@inproceedings{li2020comparison,
  title={A comparison of pre-trained vision-and-language models for multimodal representation learning across medical images and reports},
  author={Li, Yikuan and Wang, Hanyin and Luo, Yuan},
  booktitle={2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={1999--2004},
  year={2020},
  organization={IEEE}
}

USAGE

We provide an interactive example of fine-tuning on a customized dataset using Google Colab:

Colab Notebook

The following is another example, fine-tuning on the VQA 2.0 dataset on a Linux server.

0. Clone our repo.

1. Install all Python dependencies (a virtual environment is highly recommended; see the sketch below):

pip install -r requirements.txt
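
If you prefer the recommended virtual environment, a minimal setup could look like this (assuming Python 3 with the built-in venv module; the environment name vqa-env is arbitrary):

python3 -m venv vqa-env          # create an isolated environment
source vqa-env/bin/activate      # activate it in the current shell
pip install -r requirements.txt  # install the dependencies inside it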

2. Download the pre-trained models and place them in models/pretrained/

You can download these models from their own GitHub repos. We also provide commands to handle this:

VisualBERT:

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y" -O models/pretrained/visualbert.th && rm -rf /tmp/cookies.txt
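
If the cookie-based wget above stops working (Google Drive occasionally changes its confirmation flow), the third-party gdown tool is a common alternative; this is a sketch assuming gdown is installed from PyPI:

pip install gdown
gdown "https://drive.google.com/uc?id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y" -O models/pretrained/visualbert.th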

UNITER:

wget https://convaisharables.blob.core.windows.net/uniter/pretrained/uniter-base.pt -P models/pretrained/

LXMERT:

wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P models/pretrained/
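
After the three downloads, it is worth checking that everything landed where the later commands expect it (filenames follow the wget commands above):

ls -lh models/pretrained/
# expected: model_LXRT.pth, uniter-base.pt, visualbert.th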

3. Download the re-distributed JSON files for VQA 2.0 (copied from airsplay/lxmert):

wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/train.json -P data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/nominival.json -P  data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/minival.json -P data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/test.json -P data/
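
As a quick sanity check, you can count the training questions; this assumes each file is a JSON list of question entries, as in the upstream lxmert distribution:

python -c "import json; print(len(json.load(open('data/train.json'))))"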

4. Download the Faster R-CNN features for the MS COCO train2014 (17 GB), val2014 (8 GB), and test2015 images (copied from airsplay/lxmert); this will take a while:

wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/train2014_obj36.zip -P data/img
unzip data/img/train2014_obj36.zip -d data/img && rm data/img/train2014_obj36.zip
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/val2014_obj36.zip -P data/img
unzip data/img/val2014_obj36.zip -d data/img && rm data/img/val2014_obj36.zip
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/test2015_obj36.zip -P data/img
unzip data/img/test2015_obj36.zip -d data/img && rm data/img/test2015_obj36.zip
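
After extraction, each archive should leave a .tsv feature file under data/img/; a quick check (the exact filenames are an assumption based on the zip names):

ls -lh data/img/
# expected: train2014_obj36.tsv, val2014_obj36.tsv, test2015_obj36.tsv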

5. Now that all requirements and dependencies are in place, run this command before fine-tuning on the entire training set:

python vqa.py --tiny

6. If no error pops up, you are good to go. Please refer to param.py for all settings. Here is an example of fine-tuning UNITER:

python vqa.py --model uniter --epochs 6 --max_seq_length 20 --load_pretrained models/pretrained/uniter-base.pt --output models/trained/
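
For illustration, an analogous call for LXMERT would swap in the model name and the checkpoint downloaded in step 2 (same flags as above; the epoch count is arbitrary, and the flag value is assumed to mirror the lowercase model name, as with uniter):

python vqa.py --model lxmert --epochs 6 --max_seq_length 20 --load_pretrained models/pretrained/model_LXRT.pth --output models/trained/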

7.a Local validation with the BEST checkpoint saved during training in step 6:

python vqa.py --test minival --load_trained models/trained/BEST

7.b Inference on the VQA test split; results will be saved to models/trained/test_predict.json:

python vqa.py --test test --load_trained models/trained/BEST
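
To peek at the resulting predictions, Python's built-in json.tool is enough (the file's exact schema is determined by vqa.py and not documented here):

python -m json.tool models/trained/test_predict.json | head -n 20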

