
VisualGPT's People

Contributors

junchen14


VisualGPT's Issues

How to do inference?

There don't appear to be any examples of how to do inference with this model.

I'm pretty confused: during training you feed in the entire label along with the image and then use the label for the loss. What am I supposed to feed in if I have a new image?

Additionally, do I just use the gpt2-large tokenizer to decode the model outputs?
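For what it's worth, the usual answer for caption models like this is that at inference time you do not feed the ground truth at all: you decode autoregressively, feeding the image features plus the tokens generated so far, and append one token per step until an end token. Here is a minimal sketch of that loop; the names (`step_fn`, `bos_id`, `eos_id`) and the toy model are illustrative, not the repo's actual API.

```python
# Hedged sketch of autoregressive (greedy) decoding, which is what replaces
# the ground-truth caption at inference time. A real step_fn would be the
# VisualGPT forward pass conditioned on the image features.

def greedy_decode(step_fn, bos_id, eos_id, max_len=20):
    """step_fn(tokens) -> logits over the vocabulary for the next token."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = step_fn(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Toy stand-in for the model: prefers token 3, then EOS (id 2) once the
# generated sequence is long enough.
def toy_step(tokens):
    logits = [0.0] * 5
    logits[3 if len(tokens) < 4 else 2] = 1.0
    return logits

caption_ids = greedy_decode(toy_step, bos_id=1, eos_id=2)
# caption_ids -> [1, 3, 3, 3, 2]
```

Once you have the token ids, decoding them with the matching GPT-2 tokenizer (e.g. `GPT2Tokenizer.decode(caption_ids)` from the transformers library) should give the caption text, assuming the model was trained with that tokenizer's vocabulary.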

coco_detections.hdf5

Hi, the coco_detections.hdf5 file can't be downloaded. Is there any other link?

Running code on IU X-Ray dataset

Hello, I am interested in running your VisualGPT model on the IU X-ray dataset. Can you please explain how I can train on it? I saw issue #4, but I was not able to understand how to create an .h5 file for IU X-ray.
Could you please walk me through how to set up the .h5 file for IU X-ray?

Trying to run code on IU X-ray database

Hi, I've been interested in image captioning, and specifically automatic medical report generation, and I stumbled across your VisualGPT, which seemed to take a promising approach. I've been trying to get it to work with other databases, specifically IU X-ray as mentioned in your article.

I can't figure out how you have set up the COCO database, or how I should structure IU X-ray to fit into your code. Is it still supposed to use coco_detections.hdf5, or am I supposed to create an hdf5 file for IU?
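I'm not certain of the exact schema this repo uses, but loaders derived from the M2 Transformer codebase (which VisualGPT builds on) typically expect one `"<image_id>_features"` dataset per image, holding the Faster R-CNN region features (shape `num_regions × 2048`). Under that assumption, a minimal sketch of building an analogous file for IU X-ray looks like this; the image ids and random features are placeholders for real detector output.

```python
# Hedged sketch, assuming the M2-Transformer-style HDF5 layout:
# one "<image_id>_features" dataset of shape (num_regions, 2048) per image.
# Replace the random arrays with features from a real object detector.
import numpy as np
import h5py

image_ids = [1, 2]            # placeholder IU X-ray image ids
num_regions, feat_dim = 36, 2048

with h5py.File("iu_xray_detections.hdf5", "w") as f:
    for img_id in image_ids:
        feats = np.random.rand(num_regions, feat_dim).astype("float32")
        f.create_dataset(f"{img_id}_features", data=feats)

# Reading one entry back, the way a dataloader would:
with h5py.File("iu_xray_detections.hdf5", "r") as f:
    sample = f["1_features"][()]
```

You would also need to point `--annotation_folder` at IU X-ray captions converted to COCO-style annotation JSON, since the dataset code appears to parse the COCO annotation format.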

About memory overflow error during training

Hi! Thanks for the code. When training reaches batch 42, I get the following error on 4× GTX 2080 Ti:
“CUDA: out of memory, tried to allocate...”
Even with batch_size = 10 it still occurs. Is it purely a hardware problem? What device did you use to train the model?
I noticed that a year ago you said your code doesn't support multi-GPU training; is that still the case?
Thank you!
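One standard workaround for this kind of OOM (not specific to this repo, though the training script does expose a `--gradient_accumulation_steps` flag) is gradient accumulation: run several small micro-batches, average their gradients, and apply a single optimizer step, so the effective batch size stays large while peak memory stays small. A toy sketch of why the accumulated update matches the large-batch one:

```python
# Hedged sketch of gradient accumulation: instead of one optimizer step on
# a large batch (which can overflow GPU memory), average the gradients of
# several micro-batches and apply one update. The toy numbers just show
# that the accumulated update equals the equivalent large-batch update.

def accumulated_update(param, micro_grads, lr=0.1):
    accum = 0.0
    for g in micro_grads:
        accum += g / len(micro_grads)   # scale each micro-batch gradient
    return param - lr * accum           # single optimizer step afterwards

# Four micro-batches of gradient 2.0 behave like one batch of gradient 2.0:
new_param = accumulated_update(1.0, [2.0, 2.0, 2.0, 2.0])
# new_param -> 0.8  (i.e. 1.0 - 0.1 * 2.0)
```

So, assuming the flag behaves as its name suggests, something like `--batch_size 5 --gradient_accumulation_steps 2` should roughly halve the memory footprint of `--batch_size 10` while keeping a similar effective batch.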

About total data performance

Hi, I was glad to read this article. It is the first work that focuses on efficiently adapting large pretrained language models for image captioning, which inspires me a lot!
The results section mainly shows training on subsets of the datasets at different sampling rates. Have you also tested on the full datasets without sampling? How does the performance compare to M2 Transformer?

Batch size in Evaluation Strategy

Hi,
Good to see a new idea in image captioning!

Here is my question. I noticed that in the evaluation stage you feed the samples in one by one (train_visualGPT.py, line 62), and that the batch size in validation and evaluation is set five times smaller than in training. Is there a specific reason for this? As far as I can tell the evaluation actually supports a mini-batch strategy, and the one-by-one approach will cost a lot of time if the evaluation set is large.

I am kind of a freshman in this area so my question might be silly. Feel free to let me know what you think.
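The cost difference the question points at can be sketched with a toy counter: per-sample evaluation makes `len(dataset)` forward calls, while mini-batching makes `ceil(N / batch_size)` calls for the same outputs. The "model" below is a dummy stand-in for an expensive forward pass.

```python
# Hedged illustration: batching does not change the predictions, only the
# number of (costly) forward passes.

def evaluate(samples, batch_size):
    calls = 0
    outputs = []
    for i in range(0, len(samples), batch_size):
        batch = samples[i:i + batch_size]
        calls += 1                             # one forward pass per batch
        outputs.extend(s * 2 for s in batch)   # dummy "prediction"
    return outputs, calls

outs_single, calls_single = evaluate(list(range(100)), batch_size=1)
outs_batched, calls_batched = evaluate(list(range(100)), batch_size=25)
# calls_single -> 100, calls_batched -> 4, outputs identical
```

Beam search over a batch is slightly more involved (sequences finish at different steps), which may be why the authors kept it per-sample, but that is a guess on my part.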

Regards,

ModuleNotFoundError: No module named 'transformers'

I found the above error while running the shell command in Colab.

!python /content/VisualGPT/train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path /content/drive/MyDrive/coco_detections.hdf5 --annotation_folder /content/annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --lr 1e-4 --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
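This error only means the transformers package isn't installed in the Colab runtime. Installing it before running the script should fix it; pinning 3.1.0 matches the version in the requirements list posted elsewhere in these issues, though a newer version may or may not be API-compatible.

```shell
# Install the missing dependency in the Colab runtime before training.
pip install transformers==3.1.0
```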

Using pip/requirements.txt instead of conda

Hi, the requirements.txt doesn't work in this repo because some packages are not available on pypi (or at least not for python 3.8).

I just wanted to dump the steps I had to take to make this work.

First I needed to find a set of libraries that could work together in requirements.txt. This config seems to work for me:

absl-py==0.8.1
asn1crypto==1.2.0
cachetools==4.1.1
certifi==2019.9.11
cffi==1.13.2
chardet==3.0.4
click==7.1.2
cryptography==2.8
cycler==0.10.0
cymem==2.0.2
Cython==0.29.14
cytoolz==0.9.0.1
#dataclasses==0.7
dill==0.3.2
#en-core-web-sm==2.0.0
filelock==3.0.12
future==0.17.1
google-auth==1.21.1
google-auth-oauthlib==0.4.1
grpcio==1.25.0
h5py==2.8.0
idna==2.8
joblib==0.16.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==2.2.3
mkl-fft==1.3.0
mkl-random==1.2.2
mkl-service==2.4.0
msgpack==0.6.2
msgpack-numpy==0.4.4.3
multiprocess==0.70.9
murmurhash==0.28.0
numpy==1.16.4
oauthlib==3.1.0
packaging==20.4
pathlib==1.0.1
pathos==0.2.3
Pillow==6.2.1
plac==0.9.6
pox==0.2.7
ppft==1.6.6.1
preshed==2.0.1
protobuf==3.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.3
pycparser==2.19
pyOpenSSL==19.1.0
pyparsing==2.4.5
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
regex==2017.4.5
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.6
sacremoses==0.0.43
sentencepiece==0.1.91
six==1.13.0
spacy==2.1.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
termcolor==1.1.0
thinc==7.0.2
tokenizers==0.8.1rc2
toolz==0.10.0
torch==1.6.0
torchtext==0.7.0
tqdm==4.32.2
transformers==3.1.0
ujson==1.35
urllib3==1.24.2
Werkzeug==0.16.0
wrapt==1.10.11

The problems with the stock requirements are thinc, spacy and the mkl libs. Afterwards, I needed to upgrade numpy to the latest version (numpy==1.22.0) to fix some runtime errors.

I also had to update torch after the fact to get CUDA 11 working; torch 1.8 seems to work. I installed it with pip install -U --force-reinstall --no-cache-dir torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

After that it seems to be training.

Explaining the cross attention

Hi,

thank you for answering my last question!

I am currently trying to explain part of the caption generation process, and I am interested in Figure 5, where you highlighted the visual scores on the generated captions.

However, if my understanding is correct, you have not put any code for the explanation method in the repo.
It would be really appreciated if you could give a code example of the visualization for better understanding!

Cheers!
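While waiting for the authors' actual method, one common way to produce per-word "visual scores" like those in Figure 5 is to take the decoder's cross-attention weights (one distribution over image regions per generated token) and score each token by its averaged attention mass on the visual features. This is a guess at the technique, not the paper's confirmed procedure; the tiny hand-written weights are placeholders for real attention tensors.

```python
# Hedged sketch: score each generated word by how strongly it attends to
# the image regions, then rank words by that score for highlighting.

def visual_scores(cross_attn):
    """cross_attn: per token, a list of attention weights over regions."""
    return [sum(region_weights) / len(region_weights)
            for region_weights in cross_attn]

tokens = ["a", "dog", "runs"]
cross_attn = [
    [0.1, 0.1, 0.1],   # "a"    attends weakly to the image regions
    [0.9, 0.8, 0.7],   # "dog"  attends strongly
    [0.4, 0.5, 0.3],   # "runs" in between
]
scores = visual_scores(cross_attn)
ranked = sorted(zip(tokens, scores), key=lambda t: -t[1])
# "dog" ranks first, i.e. it would be highlighted most strongly
```

In a real run you would pull `cross_attn` out of the decoder layers during generation (averaging over heads, and possibly over layers) and map the scores to colors over the caption text.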

About checkpoints on 0.1, 0.5, 1% on MS-COCO and IU X-ray?

Dear friend,

Thank you for your novel work on low-resource image captioning datasets.
However, I wonder why you do not provide checkpoints for all the baselines and for your proposed method on MS-COCO, IU X-Ray, and the 0.1%, 0.5% and 1% MS-COCO training subsets?

It also seems that this repo is only set up to train on the MS-COCO dataset; what about IU X-Ray? Did you modify https://github.com/cuhksz-nlp/R2Gen or use this repo directly for those experiments?

I think the above points should be made clear.
Thank you very much.

multi-GPU

Hi, thanks for the code! Is there a version supporting multi-GPU training?
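In stock PyTorch the usual route, if the repo's training loop allows it, is wrapping the model in `torch.nn.DataParallel` (or, better, `DistributedDataParallel`), which automates the scatter/gather pattern below. This pure-Python sketch only illustrates what data parallelism does, not how this repo would implement it:

```python
# Hedged sketch of data parallelism: split the batch across devices, run
# the same model replica on each shard, then gather the outputs. In real
# PyTorch the shards run concurrently on separate GPUs.

def data_parallel(model_fn, batch, num_devices):
    shard_size = (len(batch) + num_devices - 1) // num_devices
    shards = [batch[i:i + shard_size]
              for i in range(0, len(batch), shard_size)]
    outputs = []
    for shard in shards:              # in reality: one replica per device
        outputs.extend(model_fn(shard))
    return outputs

double = lambda xs: [x * 2 for x in xs]   # dummy "model"
result = data_parallel(double, list(range(8)), num_devices=4)
# result -> [0, 2, 4, 6, 8, 10, 12, 14], same as a single-device run
```

Whether a simple `DataParallel` wrap works here depends on how the loss and beam search are structured inside the model's forward pass, which is likely why the authors said multi-GPU was unsupported.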
