
zhegan27 / villa


Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Home Page: https://arxiv.org/pdf/2006.06195.pdf

License: MIT License

Languages: Python 96.86%, Shell 2.81%, Dockerfile 0.33%
Topics: vision-and-language, adversarial-training, pretraining, visual-question-answering, neurips-2020

villa's People

Contributors

linjieli222, zhegan27


villa's Issues

About the reproduction of VCR experiment results

Hi,
Thanks for your great work!
When I use the following command to train a model, it does not reach the results expected from the paper.
horovodrun -np 1 python train_vcr_adv.py --config config/train-vcr-base-4gpu-adv.json \
    --output_dir vcr/output_base
Using only one GPU, I got these results:
100%|##########| 8000/8000 [4:58:12<00:00, 1.98s/it]
09/10/2021 08:48:59 - INFO - __main__ - ============Step 8000=============
09/10/2021 08:48:59 - INFO - __main__ - 1280000 examples trained at 71 ex/s
09/10/2021 08:48:59 - INFO - __main__ - ===========================================
09/10/2021 08:48:59 - INFO - __main__ - start running validation...
09/10/2021 08:54:06 - INFO - __main__ - validation finished in 307 seconds, score_qa: 72.28 score_qar: 75.06 score: 54.35

I am confused because this result is a few percentage points lower than the numbers reported in the paper.
What should I do? Thanks in advance!
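
One plausible factor to check (an assumption, not a confirmed diagnosis): the config is named train-vcr-base-4gpu-adv.json, so running it with horovodrun -np 1 cuts the effective batch size to a quarter of what the released setup presumably used. A minimal sketch of compensating by scaling gradient accumulation, assuming the config exposes a UNITER-style "gradient_accumulation_steps" key (the key name and the output path are assumptions):

import json

# Hypothetical: derive a 1-GPU config from the released 4-GPU config by
# scaling gradient accumulation so the effective batch size stays the same.
CONFIG_IN = "config/train-vcr-base-4gpu-adv.json"
CONFIG_OUT = "config/train-vcr-base-1gpu-adv.json"   # hypothetical output path
GPUS_EXPECTED, GPUS_AVAILABLE = 4, 1

with open(CONFIG_IN) as f:
    cfg = json.load(f)

# "gradient_accumulation_steps" is assumed to exist in the config.
scale = GPUS_EXPECTED // GPUS_AVAILABLE
cfg["gradient_accumulation_steps"] = cfg.get("gradient_accumulation_steps", 1) * scale

with open(CONFIG_OUT, "w") as f:
    json.dump(cfg, f, indent=4)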

VQA pre-processing

I'd like to apply this model to my own VQA-like dataset.
However, the dataset is in json format (like the original VQA dataset), so I need to convert it to lmdb file format.
If you have code that converts the original VQA data to the LMDB format, could you please share it?
Specifically, how did you calculate the "target" values in the text lmdb?
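
For reference, the "target" values in VQA-style text databases are commonly the standard VQA soft scores, where each candidate answer receives min(#annotators who gave it / 3, 1.0); whether this repo follows exactly that convention is an assumption. A minimal sketch of the computation:

from collections import Counter

def soft_targets(annotator_answers):
    """Compute VQA-style soft scores from the (typically 10) human answers
    for one question: score(answer) = min(count / 3, 1.0)."""
    counts = Counter(annotator_answers)
    return {ans: min(n / 3.0, 1.0) for ans, n in counts.items()}

# Example: ten annotations for one question
print(soft_targets(["2", "2", "2", "2", "two", "2", "2", "2", "2", "3"]))
# {'2': 1.0, 'two': 0.333..., '3': 0.333...}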

Features of img_pos_feat

Hello,

I noticed that img_pos_feat has 7 features. I assume that 4 of them are the coordinates of the boxes. What are the other 3? Is there code where I can see how the 7 features were derived?
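
For context, a common construction of a 7-dimensional box position feature in UNITER-style models is the normalized box corners plus the normalized width, height, and area; whether this repo's feature extraction does exactly this is an assumption. A minimal sketch:

import numpy as np

def box_position_feature(box, img_w, img_h):
    """box = (x1, y1, x2, y2) in pixels; returns a 7-d feature
    (x1/W, y1/H, x2/W, y2/H, w/W, h/H, area ratio)."""
    x1, y1, x2, y2 = box
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
                     w, h, w * h], dtype=np.float32)

print(box_position_feature((10, 20, 110, 220), img_w=640, img_h=480))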

How to extract features to do image retrieval

Thank you for this amazing piece of work.

I'm interested in using VILLA or UNITER to do image retrieval.

I'd like to pre-extract features from VILLA for a folder of images and then retrieve them at inference time by using a text query.

I note that in your paper you publish image retrieval and text retrieval metrics.

I've run the code as noted in the UNITER repo:

# text annotation preprocessing
bash scripts/create_txtdb.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/ann

# image feature extraction (Tested on Titan-Xp; may not run on latest GPUs)
bash scripts/extract_imgfeat.sh $PATH_TO_IMG_FOLDER $PATH_TO_IMG_NPY

# image preprocessing
bash scripts/create_imgdb.sh $PATH_TO_IMG_NPY $PATH_TO_STORAGE/img_db

Most of the scripts and examples I can see in the repo require both images and text to be presented to the model.

Do you have any examples or advice on how to get text-only representations/features that could be used to then retrieve images by their pre-encoded features?

Thanks for any help or guidance you can provide.
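
Independent of this repo's API, once text and image features have been pre-extracted, retrieval reduces to a nearest-neighbour search between the query embedding and the stored image embeddings. A generic sketch of that last step only, with random placeholders standing in for real features (how to obtain single-modality embeddings from a single-stream model like UNITER/VILLA is exactly the open question above):

import numpy as np

def top_k_images(query_emb, image_embs, k=5):
    """query_emb: (d,), image_embs: (n, d); returns indices of the k images
    most similar to the query by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    m = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(-sims)[:k]

# Usage with placeholder embeddings:
rng = np.random.default_rng(0)
print(top_k_images(rng.normal(size=768), rng.normal(size=(1000, 768))))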

Checkpoints of Villa models to run on validation set

Hello,

Thanks for your work and available code. I have downloaded your checkpoints using
download_pretrained.sh

It downloaded several VILLA models, one of which is villa-base.pt. I would then like to run validation on this checkpoint as follows:

python train_vqa_adv.py --config config/train-vqa-base-1gpu-adv.json --checkpoint saved_data/pretrained/villa-base.pt  --valid_steps 1

However, I noticed that when the model is loaded from this checkpoint, the weights of self.vqa_output are not restored. What would you suggest if I want to take your best model and run it on a validation set?
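
One way to narrow this down (a generic PyTorch sketch, not code from this repo) is to list which tensors the checkpoint actually contains; if villa-base.pt is a pre-training checkpoint, a task head such as vqa_output would not be in it and would stay at its random initialization, which would match the observation above:

import torch

# List what the checkpoint contains and whether a VQA head is included.
ckpt = torch.load("saved_data/pretrained/villa-base.pt", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt   # some checkpoints nest the weights (assumption)

head_keys = [k for k in state_dict if "vqa_output" in k]
print(f"{len(state_dict)} tensors in checkpoint; vqa_output keys: {head_keys or 'none'}")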

The number of trainable parameters

Hello, great work! Can you tell me the number of trainable parameters when fine-tuning the retrieval task with UNITER-base?
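
For any PyTorch model, including a UNITER-base retrieval model, the trainable-parameter count can be read directly off model.parameters(); the toy module below is only a placeholder for the real model:

import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))   # placeholder model

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")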

As the epoch increased, so did the GPU memory

Hi ,
Thanks for your great work!
When fine-tuning on VQA, I ran into the following problem: as the epochs increase, so does GPU memory usage, and eventually it exceeds the GPU's memory limit and training stops.

Also, when training with multiple GPUs, GPU 0 uses more memory than any of the others.

This problem has been bothering me for a long time. Do you know what the reason might be?

Thanks for your reply!
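
For context, memory that grows with the number of epochs is often caused by accumulating loss tensors (which keeps their autograd history alive) or by running validation without torch.no_grad(); the GPU 0 imbalance is typical when rank 0 additionally holds evaluation or gathered tensors. A generic PyTorch sketch of the accumulation pitfall (not code from this repo):

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for _ in range(100):
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # BAD:  running_loss += loss       # keeps autograd history for every step
    running_loss += loss.item()        # GOOD: stores only a Python float

# Validation should run without building graphs at all:
with torch.no_grad():
    val_loss = F.mse_loss(model(torch.randn(8, 10)), torch.randn(8, 1)).item()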

training setup

Hi,
Thanks for your excellent work. I am not sure whether the batch size in your paper is the same as the one in the code. In the code, 3072 refers to the total number of tokens, which corresponds to roughly 32 real examples per iteration.

a) Is 32 (real batch size) × 8 (gradient accumulation) the dominant factor?
b) Our V100 machines (16 GB) cannot handle 3072 tokens, so would 1024 tokens (about 8 real examples), 8 GPUs, and 4 gradient-accumulation steps be another workable plan (see the sketch below)?
c) Also, can the released train-vqa-large-8gpu-adv.json reproduce the paper's result? Some parameters seem to be set differently from the paper (e.g., the adversarial learning rate).

We very much hope to reproduce your best results in our limited-resource setting. Thanks a lot.
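
A quick back-of-the-envelope comparison of the two plans, using only the numbers given in the question (whether the 32 examples are per GPU or in total is not clear from the config, so treat this as a rough check):

def effective_batch(examples_per_gpu, n_gpus, grad_accum_steps):
    # examples that contribute to a single optimizer update
    return examples_per_gpu * n_gpus * grad_accum_steps

plan_a = effective_batch(32, n_gpus=1, grad_accum_steps=8)   # 3072 tokens ~= 32 examples
plan_b = effective_batch(8, n_gpus=8, grad_accum_steps=4)    # 1024 tokens ~= 8 examples
print(plan_a, plan_b)   # 256 and 256: the two plans match in effective batch size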

When will the adversarial pre-training code for the in-domain datasets be released?

Hi Zhe,

Thanks for your excellent work. I would like to reproduce some of the VILLA results and run pre-training on the in-domain datasets. Is it possible to simply adapt the adversarial training code in train_vqa_adv.py to the pre-training stage? Is there any specific configuration needed for adversarial training during pre-training?
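
For context, the adversarial training in train_vqa_adv.py perturbs the input embeddings and takes a few inner ascent steps on the perturbation before each parameter update. The sketch below is a heavily simplified, generic version of that idea (toy model, placeholder hyperparameters, no KL consistency term), not the repo's actual fine-tuning or pre-training code:

import torch
import torch.nn.functional as F

# Toy stand-ins for the real embedder and task head.
embed = torch.nn.Embedding(1000, 64)
classifier = torch.nn.Linear(64, 2)
opt = torch.optim.Adam(list(embed.parameters()) + list(classifier.parameters()), lr=1e-3)

adv_steps, adv_lr, adv_max_norm = 3, 1e-2, 1e-1   # placeholder hyperparameters

for _ in range(10):                                # toy training steps
    tokens = torch.randint(0, 1000, (16, 8))
    labels = torch.randint(0, 2, (16,))

    delta = torch.zeros(*tokens.shape, 64, requires_grad=True)   # perturbation on the embeddings
    opt.zero_grad()
    for _ in range(adv_steps):
        emb = embed(tokens)                        # fresh forward pass each inner step
        logits = classifier((emb + delta).mean(dim=1))
        loss = F.cross_entropy(logits, labels) / adv_steps
        loss.backward()                            # accumulates grads on the model AND on delta
        with torch.no_grad():                      # ascent step on the perturbation, then clip
            delta += adv_lr * delta.grad.sign()
            delta.clamp_(-adv_max_norm, adv_max_norm)
        delta.grad.zero_()
    opt.step()                                     # one model update using the accumulated grads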
