
VisualGPT's People

Contributors

junchen14


VisualGPT's Issues

How to do inference?

There don't appear to be any examples of how to do inference with this model.

I'm pretty confused: during training you feed in the entire label along with the image and then use the label for the loss. What am I supposed to feed in if I have a new image?

Additionally, do I just use the gpt2-large tokenizer to decode the model outputs?
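For what it's worth, the usual answer for caption models like this is that at inference time you do not feed the ground truth at all: you decode autoregressively, feeding the image features plus the tokens generated so far, and append one token per step until an end token. Here is a minimal sketch of that loop; the names (`step_fn`, `bos_id`, `eos_id`) and the toy model are illustrative, not the repo's actual API.

```python
# Hedged sketch of autoregressive (greedy) decoding, which is what replaces
# the ground-truth caption at inference time. A real step_fn would be the
# VisualGPT forward pass conditioned on the image features.

def greedy_decode(step_fn, bos_id, eos_id, max_len=20):
    """step_fn(tokens) -> logits over the vocabulary for the next token."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = step_fn(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Toy stand-in for the model: prefers token 3, then EOS (id 2) once the
# generated sequence is long enough.
def toy_step(tokens):
    logits = [0.0] * 5
    logits[3 if len(tokens) < 4 else 2] = 1.0
    return logits

caption_ids = greedy_decode(toy_step, bos_id=1, eos_id=2)
# caption_ids -> [1, 3, 3, 3, 2]
```

Once you have the token ids, decoding them with the matching GPT-2 tokenizer (e.g. `GPT2Tokenizer.decode(caption_ids)` from the transformers library) should give the caption text, assuming the model was trained with that tokenizer's vocabulary.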

coco_detections.hdf5

Hi, the coco_detections.hdf5 file can't be downloaded. Is there any other link?

Running code on IU X-Ray dataset

Hello, I am interested in running your VisualGPT model on the IU X-ray dataset. Can you please explain how I can train on it? I saw issue #4, but I was not able to understand how to create an .h5 file for IU X-ray.
Could you please walk me through how to set up the .h5 file for IU X-ray?

Trying to run code on IU X-ray database

Hi, I've been interested in image captioning, and specifically automatic medical report generation, and I stumbled across your VisualGPT, which seemed to take a promising approach. I've been trying to get it to work with other databases, specifically IU X-ray as mentioned in your article.

I can't figure out how you have set up the COCO database, or how I should structure IU X-ray to fit into your code. Is it still supposed to use coco_detections.hdf5, or am I supposed to create an hdf5 file for IU?
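I'm not certain of the exact schema this repo uses, but loaders derived from the M2 Transformer codebase (which VisualGPT builds on) typically expect one `"<image_id>_features"` dataset per image, holding the Faster R-CNN region features (shape `num_regions × 2048`). Under that assumption, a minimal sketch of building an analogous file for IU X-ray looks like this; the image ids and random features are placeholders for real detector output.

```python
# Hedged sketch, assuming the M2-Transformer-style HDF5 layout:
# one "<image_id>_features" dataset of shape (num_regions, 2048) per image.
# Replace the random arrays with features from a real object detector.
import numpy as np
import h5py

image_ids = [1, 2]            # placeholder IU X-ray image ids
num_regions, feat_dim = 36, 2048

with h5py.File("iu_xray_detections.hdf5", "w") as f:
    for img_id in image_ids:
        feats = np.random.rand(num_regions, feat_dim).astype("float32")
        f.create_dataset(f"{img_id}_features", data=feats)

# Reading one entry back, the way a dataloader would:
with h5py.File("iu_xray_detections.hdf5", "r") as f:
    sample = f["1_features"][()]
```

You would also need to point `--annotation_folder` at IU X-ray captions converted to COCO-style annotation JSON, since the dataset code appears to parse the COCO annotation format.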

About memory overflow error during training

Hi! Thanks for the code. When training reaches batch 42, I get the following error on 4× GTX 2080 Ti:
“CUDA: out of memory, tried to allocate...”
Even with batch_size = 10 it still occurs. Is it purely a hardware problem? What device did you use to train the model?
I noticed that a year ago you said your code doesn't support multi-GPU training; is that still the case?
Thank you!
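One standard workaround for this kind of OOM (not specific to this repo, though the training script does expose a `--gradient_accumulation_steps` flag) is gradient accumulation: run several small micro-batches, average their gradients, and apply a single optimizer step, so the effective batch size stays large while peak memory stays small. A toy sketch of why the accumulated update matches the large-batch one:

```python
# Hedged sketch of gradient accumulation: instead of one optimizer step on
# a large batch (which can overflow GPU memory), average the gradients of
# several micro-batches and apply one update. The toy numbers just show
# that the accumulated update equals the equivalent large-batch update.

def accumulated_update(param, micro_grads, lr=0.1):
    accum = 0.0
    for g in micro_grads:
        accum += g / len(micro_grads)   # scale each micro-batch gradient
    return param - lr * accum           # single optimizer step afterwards

# Four micro-batches of gradient 2.0 behave like one batch of gradient 2.0:
new_param = accumulated_update(1.0, [2.0, 2.0, 2.0, 2.0])
# new_param -> 0.8  (i.e. 1.0 - 0.1 * 2.0)
```

So, assuming the flag behaves as its name suggests, something like `--batch_size 5 --gradient_accumulation_steps 2` should roughly halve the memory footprint of `--batch_size 10` while keeping a similar effective batch.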

About total data performance

Hi, I was glad to read this article. It is the first work that focuses on efficiently adapting large pretrained language models for image captioning, which inspires me a lot!
The results section mainly shows training on subsets of the datasets at different sampling rates. Have you also tested on the full datasets without sampling? How does the performance compare to M2 Transformer?

Batch size in Evaluation Strategy

Hi,
Good to see a new idea in image captioning!

Here is my question. I noticed that in the evaluation stage you feed the samples in one by one (train_visualGPT.py, line 62), and that the batch size in validation and evaluation is set five times smaller than in training. Is there a specific reason for this? As far as I can tell the evaluation actually supports a mini-batch strategy, and the one-by-one approach will cost a lot of time if the evaluation set is large.

I am kind of a freshman in this area so my question might be silly. Feel free to let me know what you think.
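The cost difference the question points at can be sketched with a toy counter: per-sample evaluation makes `len(dataset)` forward calls, while mini-batching makes `ceil(N / batch_size)` calls for the same outputs. The "model" below is a dummy stand-in for an expensive forward pass.

```python
# Hedged illustration: batching does not change the predictions, only the
# number of (costly) forward passes.

def evaluate(samples, batch_size):
    calls = 0
    outputs = []
    for i in range(0, len(samples), batch_size):
        batch = samples[i:i + batch_size]
        calls += 1                             # one forward pass per batch
        outputs.extend(s * 2 for s in batch)   # dummy "prediction"
    return outputs, calls

outs_single, calls_single = evaluate(list(range(100)), batch_size=1)
outs_batched, calls_batched = evaluate(list(range(100)), batch_size=25)
# calls_single -> 100, calls_batched -> 4, outputs identical
```

Beam search over a batch is slightly more involved (sequences finish at different steps), which may be why the authors kept it per-sample, but that is a guess on my part.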

Regards,

ModuleNotFoundError: No module named 'transformers'

I found the above error while running the shell command in Colab.

!python /content/VisualGPT/train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path /content/drive/MyDrive/coco_detections.hdf5 --annotation_folder /content/annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --lr 1e-4 --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
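This error only means the transformers package isn't installed in the Colab runtime. Installing it before running the script should fix it; pinning 3.1.0 matches the version in the requirements list posted elsewhere in these issues, though a newer version may or may not be API-compatible.

```shell
# Install the missing dependency in the Colab runtime before training.
pip install transformers==3.1.0
```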

Using pip/requirements.txt instead of conda

Hi, the requirements.txt doesn't work in this repo because some packages are not available on pypi (or at least not for python 3.8).

I just wanted to dump the steps I had to take to make this work.

First I needed to find a set of libraries that could work together in requirements.txt. This config seems to work for me:

absl-py==0.8.1
asn1crypto==1.2.0
cachetools==4.1.1
certifi==2019.9.11
cffi==1.13.2
chardet==3.0.4
click==7.1.2
cryptography==2.8
cycler==0.10.0
cymem==2.0.2
Cython==0.29.14
cytoolz==0.9.0.1
#dataclasses==0.7
dill==0.3.2
#en-core-web-sm==2.0.0
filelock==3.0.12
future==0.17.1
google-auth==1.21.1
google-auth-oauthlib==0.4.1
grpcio==1.25.0
h5py==2.8.0
idna==2.8
joblib==0.16.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==2.2.3
mkl-fft==1.3.0
mkl-random==1.2.2
mkl-service==2.4.0
msgpack==0.6.2
msgpack-numpy==0.4.4.3
multiprocess==0.70.9
murmurhash==0.28.0
numpy==1.16.4
oauthlib==3.1.0
packaging==20.4
pathlib==1.0.1
pathos==0.2.3
Pillow==6.2.1
plac==0.9.6
pox==0.2.7
ppft==1.6.6.1
preshed==2.0.1
protobuf==3.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.3
pycparser==2.19
pyOpenSSL==19.1.0
pyparsing==2.4.5
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
regex==2017.4.5
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.6
sacremoses==0.0.43
sentencepiece==0.1.91
six==1.13.0
spacy==2.1.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
termcolor==1.1.0
thinc==7.0.2
tokenizers==0.8.1rc2
toolz==0.10.0
torch==1.6.0
torchtext==0.7.0
tqdm==4.32.2
transformers==3.1.0
ujson==1.35
urllib3==1.24.2
Werkzeug==0.16.0
wrapt==1.10.11

The problems with the stock requirements are thinc, spacy and the mkl libs. Afterwards, I needed to upgrade numpy to the latest version (numpy==1.22.0) to fix some runtime errors.

I also had to update torch after the fact to get CUDA 11 working; torch 1.8 seems to work. I installed it with pip install -U --force-reinstall --no-cache-dir torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

After that it seems to be training.

Explaining the cross attention

Hi,

thank you for answering my last question!

I am currently trying to explain part of the caption generation process, and I am interested in Figure 5, where you highlighted the visual scores on the generated captions.

However, if my understanding is correct, you have not put any code for the explanation method in the repo.
It would be really appreciated if you could give a code example of the visualization for better understanding!

Cheers!
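While waiting for the authors' actual method, one common way to produce per-word "visual scores" like those in Figure 5 is to take the decoder's cross-attention weights (one distribution over image regions per generated token) and score each token by its averaged attention mass on the visual features. This is a guess at the technique, not the paper's confirmed procedure; the tiny hand-written weights are placeholders for real attention tensors.

```python
# Hedged sketch: score each generated word by how strongly it attends to
# the image regions, then rank words by that score for highlighting.

def visual_scores(cross_attn):
    """cross_attn: per token, a list of attention weights over regions."""
    return [sum(region_weights) / len(region_weights)
            for region_weights in cross_attn]

tokens = ["a", "dog", "runs"]
cross_attn = [
    [0.1, 0.1, 0.1],   # "a"    attends weakly to the image regions
    [0.9, 0.8, 0.7],   # "dog"  attends strongly
    [0.4, 0.5, 0.3],   # "runs" in between
]
scores = visual_scores(cross_attn)
ranked = sorted(zip(tokens, scores), key=lambda t: -t[1])
# "dog" ranks first, i.e. it would be highlighted most strongly
```

In a real run you would pull `cross_attn` out of the decoder layers during generation (averaging over heads, and possibly over layers) and map the scores to colors over the caption text.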

About checkpoints on 0.1, 0.5, 1% on MS-COCO and IU X-ray?

Dear friend,

Thank you for your novel work on low-resource image captioning datasets.
However, I wonder why you do not provide checkpoints for all the baselines and for your proposed method on MS-COCO, IU X-Ray, and the 0.1%, 0.5% and 1% MS-COCO training subsets?

It also seems that this repo is only set up to train on the MS-COCO dataset; what about IU X-Ray? Did you modify https://github.com/cuhksz-nlp/R2Gen or use this repo directly for those experiments?

I think the above points should be made clear.
Thank you very much.

multi-GPU

Hi, thanks for the code! Is there a version supporting multi-GPU training?
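In stock PyTorch the usual route, if the repo's training loop allows it, is wrapping the model in `torch.nn.DataParallel` (or, better, `DistributedDataParallel`), which automates the scatter/gather pattern below. This pure-Python sketch only illustrates what data parallelism does, not how this repo would implement it:

```python
# Hedged sketch of data parallelism: split the batch across devices, run
# the same model replica on each shard, then gather the outputs. In real
# PyTorch the shards run concurrently on separate GPUs.

def data_parallel(model_fn, batch, num_devices):
    shard_size = (len(batch) + num_devices - 1) // num_devices
    shards = [batch[i:i + shard_size]
              for i in range(0, len(batch), shard_size)]
    outputs = []
    for shard in shards:              # in reality: one replica per device
        outputs.extend(model_fn(shard))
    return outputs

double = lambda xs: [x * 2 for x in xs]   # dummy "model"
result = data_parallel(double, list(range(8)), num_devices=4)
# result -> [0, 2, 4, 6, 8, 10, 12, 14], same as a single-device run
```

Whether a simple `DataParallel` wrap works here depends on how the loss and beam search are structured inside the model's forward pass, which is likely why the authors said multi-GPU was unsupported.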
