Train a deeper LSTM and normalized CNN Visual Question Answering model. This current code can get 58.16 on OpenEnded and 63.09 on Multiple-Choice on test-standard.
Hi,
I am confused about how to use the multiple-choice answers when training and evaluating the model on the multiple-choice task.
Can we process the multiple-choice answers the same way as in the open-ended task?
Thanks.
Hi guys, I recently attempted to reimplement this repository in TensorFlow.
However, the accuracy only reaches about 47%, even though I have checked the code again and again.
Since I am not very familiar with Lua, can anyone help me figure out what the problem is?
Here's my code:
https://github.com/JamesChuanggg/vqa-tf/blob/master/model_VQA.py
I simply followed each step of the original Lua code.
Hi! When I run prepro_img.lua to process the images, it raises the error "Unsupported marker type 0xf0" once it reaches image number 70000+ of train2014. How can I solve it?
I am wondering why the UNK token is used in the test set and whether it affects the results. Also, if I have a validation set for early stopping, must the questions be encoded with the training vocabulary, or should new words from the validation set be added to the vocabulary?
Hi,
I am trying to use this model for abstract scenes multiple choice answers and wanted to confirm the parameters for preprocessing and training.
In prepro.py is it okay that I leave num_ans to be 1000, and in train.lua leave num_output to be 1000?
Thank you!
Hi all,
We were able to save the model and run eval.lua to evaluate questions on validation images.
Now we want to use the model to answer questions about a new image. If such code is already available, we would love to use it. Otherwise we will write the code ourselves and share it back.
Thanks,
Abhinav
envy@ub1404envy:~/os_prj/github/_QA/VT-vision-lab/VQA_LSTM_CNN$ th eval.lua -input_img_h5 data_img.h5 -input_ques_h5 data_prepro.h5 -input_json data_prepro.json -model_path model/lstm.t7
{
out_path : "result/"
batch_size : 500
model_path : "model/lstm.t7"
gpuid : 7
input_ques_h5 : "data_prepro.h5"
rnn_size : 512
common_embedding_size : 1024
input_img_h5 : "data_img.h5"
input_encoding_size : 200
input_json : "data_prepro.json"
img_norm : 1
backend : "cudnn"
num_output : 1000
rnn_layer : 2
}
nil
/home/envy/torch/install/bin/luajit: /home/envy/torch/install/share/lua/5.1/trepl/init.lua:384: /home/envy/torch/install/share/lua/5.1/trepl/init.lua:384: /home/envy/torch/install/share/lua/5.1/cudnn/ffi.lua:1279: 'libcudnn (R4) not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure files named as libcudnn.so.4 or libcudnn.4.dylib are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)
stack traceback:
[C]: in function 'error'
/home/envy/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
eval.lua:49: in main chunk
[C]: in function 'dofile'
...envy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
My CUDA libraries are installed at:
envy@ub1404envy:~/os_prj/github/_QA/VT-vision-lab/VQA_LSTM_CNN$ ll /usr/local/cuda-7.5/targets/x86_64-linux/lib/
total 791192
drwxr-xr-x 3 root root 4096 Feb 11 00:06 ./
drwxr-xr-x 4 root root 4096 Dec 7 05:47 ../
-rw-r--r-- 1 root root 28585480 Aug 15 2015 libcublas_device.a
lrwxrwxrwx 1 root root 16 Aug 15 2015 libcublas.so -> libcublas.so.7.5*
lrwxrwxrwx 1 root root 19 Aug 15 2015 libcublas.so.7.5 -> libcublas.so.7.5.18*
-rwxr-xr-x 1 root root 23938736 Aug 15 2015 libcublas.so.7.5.18*
-rw-r--r-- 1 root root 28220076 Aug 15 2015 libcublas_static.a
-rw-r--r-- 1 root root 322936 Aug 15 2015 libcudadevrt.a
lrwxrwxrwx 1 root root 16 Aug 15 2015 libcudart.so -> libcudart.so.7.5*
lrwxrwxrwx 1 root root 19 Aug 15 2015 libcudart.so.7.5 -> libcudart.so.7.5.18*
-rwxr-xr-x 1 root root 383336 Aug 15 2015 libcudart.so.7.5.18*
-rw-r--r-- 1 root root 720192 Aug 15 2015 libcudart_static.a
-rwxr-xr-x 1 root root 11172416 Feb 11 00:06 libcudnn.so*
-rwxr-xr-x 1 root root 11172416 Feb 11 00:06 libcudnn.so.6.5*
-rwxr-xr-x 1 root root 11172416 Feb 11 00:06 libcudnn.so.6.5.48*
-rw-r--r-- 1 root root 11623922 Feb 11 00:06 libcudnn_static.a
lrwxrwxrwx 1 root root 15 Aug 15 2015 libcufft.so -> libcufft.so.7.5*
lrwxrwxrwx 1 root root 18 Aug 15 2015 libcufft.so.7.5 -> libcufft.so.7.5.18*
-rwxr-xr-x 1 root root 111231960 Aug 15 2015 libcufft.so.7.5.18*
-rw-r--r-- 1 root root 115104400 Aug 15 2015 libcufft_static.a
lrwxrwxrwx 1 root root 16 Aug 15 2015 libcufftw.so -> libcufftw.so.7.5*
To get the image features, run
$ th prepro_img.lua -input_json data_prepro.json -image_root path_to_image_root -cnn_proto path_to_cnn_prototxt -cnn_
im=im*255;
im2=im:clone()
im2[{{3},{},{}}]=im[{{1},{},{}}]-123.68
im2[{{2},{},{}}]=im[{{2},{},{}}]-116.779
im2[{{1},{},{}}]=im[{{3},{},{}}]-103.939
Hello,
could someone please explain this part of the code to me?
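For readers more familiar with NumPy, the Lua snippet above appears to do three things: rescale the image from [0, 1] to [0, 255], subtract a per-channel mean, and swap the channel order from RGB to BGR, which is the input layout commonly used with the Caffe VGG models. The following is a minimal sketch of the same arithmetic, assuming the image is a float array of shape (3, H, W) in RGB order with values in [0, 1]; the function name `to_caffe_vgg_input` is made up for illustration and is not part of the repository.

```python
import numpy as np

# Per-channel means in R, G, B order, copied from the Lua snippet above.
MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def to_caffe_vgg_input(im):
    """Mirror the Lua preprocessing for one image.

    Assumes `im` has shape (3, H, W), RGB channel order, values in [0, 1].
    (Hypothetical helper, not part of the original code.)
    """
    im = im.astype(np.float32) * 255.0      # im = im * 255
    im = im - MEAN_RGB[:, None, None]       # subtract the mean of each RGB channel
    return im[::-1].copy()                  # reverse channel axis: RGB -> BGR


if __name__ == "__main__":
    red = np.zeros((3, 2, 2), dtype=np.float32)
    red[0] = 1.0                            # a pure red image
    out = to_caffe_vgg_input(red)
    print(out[:, 0, 0])                     # channel values of one pixel, in B, G, R order
```

After the call, channel 0 holds the mean-subtracted blue values and channel 2 the mean-subtracted red values, which matches the index swap between `im2[{{3},...}]` and `im[{{1},...}]` in the Lua code.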
I downloaded the pretrained model here: https://filebox.ece.vt.edu/~jiasenlu/codeRelease/vqaRelease/train_val/pretrained_lstm_train-val_test
and the corresponding features here:
https://filebox.ece.vt.edu/~jiasenlu/codeRelease/vqaRelease/train_val/data_train-val_test.zip
There is no error when I run eval.lua.
After I put the result files into https://github.com/VT-vision-lab/VQA,
I came across the following error:
"
loading VQA annotations and questions into memory...
0:00:07.128280
creating index...
index created!
Loading and preparing results...
Traceback (most recent call last):
File "vqaEvalDemo.py", line 31, in <module>
vqaRes = vqa.loadRes(resFile, quesFile)
File "../../VQA/PythonHelperTools/vqaTools/vqa.py", line 165, in loadRes
'Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file.'
AssertionError: Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file.
"
I ran eval.lua to generate the result JSON files and then used the VQA tools to calculate the accuracy. I tried both the evaluate.py script and running vqaEvalDemo.py from the VQA folder. Both give the following error.
loading VQA annotations and questions into memory...
0:00:19.813268
creating index...
index created!
Loading and preparing results...
Traceback (most recent call last):
File "evaluate.py", line 5, in <module>
from vqaEvalDemo import evaluate
File "/workspace/VQA_LSTM_CNN/VQA/PythonEvaluationTools/vqaEvalDemo.py", line 31, in <module>
vqaRes = vqa.loadRes(resFile, quesFile)
File "/workspace/VQA_LSTM_CNN/VQA/PythonHelperTools/vqaTools/vqa.py", line 174, in loadRes
'Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file.'
AssertionError: Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file.
I made sure the question and annotation files are exactly the same as the ones used in training. They are also the same ones available on visualqa.org.
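The assertion message in the two tracebacks above names two possible causes: question ids that are missing from the results, or result ids that are not in the question file. A quick way to see which one applies is to compare the two id sets directly. This is a rough sketch, assuming the results file is a JSON list of objects with a `question_id` field and the question file is a JSON object with a `questions` list; the helper names are hypothetical and not part of the VQA tools.

```python
import json


def diff_question_ids(results, questions):
    """Compare question ids in a results list against a questions dict.

    Assumed layouts (not verified against every VQA release):
      results   -> [{"question_id": ..., "answer": ...}, ...]
      questions -> {"questions": [{"question_id": ..., ...}, ...]}
    """
    res_ids = [r["question_id"] for r in results]
    ques_ids = {q["question_id"] for q in questions["questions"]}
    res_set = set(res_ids)
    return {
        "missing": sorted(ques_ids - res_set),      # questions with no prediction
        "unexpected": sorted(res_set - ques_ids),   # predictions for unknown questions
        "duplicates": len(res_ids) - len(res_set),  # repeated question ids in results
    }


def diff_question_id_files(results_path, questions_path):
    """Same comparison, reading both JSON files from disk."""
    with open(results_path) as f:
        results = json.load(f)
    with open(questions_path) as f:
        questions = json.load(f)
    return diff_question_ids(results, questions)
```

If `missing` is large and `unexpected` is empty, the results were probably produced for a different split (for example test-dev versus val) than the question file passed to the evaluation script.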
The score reported for abstract images using this code is 65 on the CodaLab leaderboard.
Yet, running the code as is (it is preset for the COCO dataset),
I got 55 on the validation set of the abstract dataset using the provided evaluation tool.
Since the difference is pretty large, I'm assuming that the settings (e.g. batch size, iterations, learning rate, layers) should be quite different from those for the real (COCO) dataset.
I was wondering if the settings used to achieve the reported score on the abstract dataset (i.e. how the code should be modified) can be shared.
The eval.lua script accepts a gpuid option (-1 for CPU), but running in CPU mode still requires loading the 'cutorch' and 'cunn' packages. This leads to an error on startup, and simply commenting those lines out and running the code in CPU mode fails when loading the pretrained_model_lstm.t7 file. Is there a workaround for this?
PS: I have the cunn and cutorch packages installed (no GPU though).
Hi,
I was trying to use the pre-trained image features you provide, but looking at the shape of the hdf5 it seems to be (82459, 4096), instead of (82783, 4096) - there are 82783 images in the COCO dataset.
Which images are the ones that have been removed?
Thanks
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$ ll
total 5387680
drwxrwxr-x 7 envy envy 4096 Feb 18 12:33 ./
drwxrwxr-x 5 envy envy 4096 Feb 18 00:37 ../
drwxrwxr-x 4 envy envy 4096 Feb 15 17:29 data/
-rw-rw-r-- 1 envy envy 2014627936 Feb 18 12:32 data_img.h5
-rw-rw-r-- 1 envy envy 2014627936 Dec 14 00:03 data_img.h5-ori
-rw-rw-r-- 1 envy envy 84335736 Feb 18 12:03 data_prepro.h5
-rw-rw-r-- 1 envy envy 9169211 Feb 18 12:03 data_prepro.json
-rw-rw-r-- 1 envy envy 716074236 Dec 16 14:45 data_train_val.zip
-rwxrwxr-x 1 envy envy 9395 Dec 29 19:26 eval.lua*
-rwxrwxr-x 1 envy envy 741 Dec 29 19:26 evaluate.py*
drwxrwxr-x 8 envy envy 4096 Dec 29 19:26 .git/
drwxrwxr-x 2 envy envy 4096 Dec 29 19:26 misc/
drwxrwxr-x 2 envy envy 4096 Feb 17 00:18 model/
-rw-rw-r-- 1 envy envy 3005 Feb 18 12:31 path_to_cnn_prototxt.lua
-rwxrwxr-x 1 envy envy 3403 Dec 29 19:26 prepro_img.lua*
-rwxrwxr-x 1 envy envy 9279 Dec 29 19:26 prepro.py*
-rw-rw-r-- 1 envy envy 53612941 Dec 14 19:57 pretrained_lstm_train.t7
-rw-rw-r-- 1 envy envy 49743190 Dec 16 14:04 pretrained_lstm_train_val.t7.zip
-rwxrwxr-x 1 envy envy 3625 Dec 29 19:26 readme.md*
drwxrwxr-x 2 envy envy 4096 Feb 17 00:18 result/
-rwxrwxr-x 1 envy envy 10759 Dec 29 19:26 train.lua*
-rw-rw-r-- 1 envy envy 574671192 Sep 24 2014 VGG_ILSVRC_19_layers.caffemodel
-rw-rw-r-- 1 envy envy 2715 Feb 18 12:05 yknote---log--1
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$ th train.lua -backend nn
{
learning_rate_decay_every : 50000
batch_size : 500
gpuid : 0
common_embedding_size : 1024
input_img_h5 : "data_img.h5"
input_encoding_size : 200
learning_rate_decay_start : -1
input_json : "data_prepro.json"
num_output : 1000
input_ques_h5 : "data_prepro.h5"
rnn_size : 512
max_iters : 150000
checkpoint_path : "model/"
save_checkpoint_every : 25000
learning_rate : 0.0003
img_norm : 1
backend : "nn"
rnn_layer : 2
seed : 123
}
DataLoader loading h5 file: data_prepro.h5
DataLoader loading h5 file: data_img.h5
Building the model...
shipped data function to cuda...
/home/envy/torch/install/bin/luajit: train.lua:200: index out of range at /home/envy/torch/pkg/torch/lib/TH/generic/THTensorMath.c:156
stack traceback:
[C]: in function 'index'
train.lua:200: in function 'next_batch'
train.lua:247: in function 'opfunc'
/home/envy/torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'rmsprop'
train.lua:303: in main chunk
[C]: in function 'dofile'
...envy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$
Hi,
We tried running the code on a Titan X with 12 GB of memory, but we are getting the following error message. What could be the reason for running out of memory?
cuda runtime error (2) : out of memory at /home/ankit/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
[C]: at 0x7fb62f736820
[C]: in function '__add'
train.lua:276: in function 'opfunc'
/home/ankit/torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'rmsprop'
train.lua:303: in main chunk
[C]: in function 'dofile'
...nkit/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$ th prepro_img.lua -backend nn -input_json data_prepro.json -image_root data_prepro.h5 -cnn_proto model/ -cnn_model VGG_ILSVRC_19_layers.caffemodel
{
backend : "nn"
image_root : "data_prepro.h5"
cnn_proto : "model/"
batch_size : 10
input_json : "data_prepro.json"
gpuid : 1
out_name : "data_img.h5"
cnn_model : "VGG_ILSVRC_19_layers.caffemodel"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
processing 82459 images...
/home/envy/torch/install/bin/luajit: /home/envy/torch/install/share/lua/5.1/image/init.lua:650: attempt to call method 'nDimension' (a nil value)
stack traceback:
/home/envy/torch/install/share/lua/5.1/image/init.lua:650: in function 'scale'
prepro_img.lua:51: in function 'loadim'
prepro_img.lua:95: in main chunk
[C]: in function 'dofile'
...envy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$
envy@ub1404envy:~/os_prj/github/_QA/VQA_LSTM_CNN$ tree
.
├── data
│ ├── annotations
│ │ ├── mscoco_train2014_annotations.json
│ │ ├── mscoco_val2014_annotations.json
│ │ ├── MultipleChoice_mscoco_test2015_questions.json
│ │ ├── MultipleChoice_mscoco_test-dev2015_questions.json
│ │ ├── MultipleChoice_mscoco_train2014_questions.json
│ │ ├── MultipleChoice_mscoco_val2014_questions.json
│ │ ├── OpenEnded_mscoco_test2015_questions.json
│ │ ├── OpenEnded_mscoco_test-dev2015_questions.json
│ │ ├── OpenEnded_mscoco_train2014_questions.json
│ │ └── OpenEnded_mscoco_val2014_questions.json
│ ├── vqa_preprocessing.py
│ ├── vqa_raw_test.json
│ ├── vqa_raw_train.json
│ └── zip
│ ├── Annotations_Train_mscoco.zip
│ ├── Annotations_Val_mscoco.zip
│ ├── Questions_Test_mscoco.zip
│ ├── Questions_Train_mscoco.zip
│ └── Questions_Val_mscoco.zip
├── data_prepro.h5
├── data_prepro.json
├── data_train_val.zip
├── eval.lua
├── evaluate.py
├── misc
│ ├── LSTM.lua
│ ├── netdef.lua
│ └── RNNUtils.lua
├── model
├── path_to_cnn_prototxt.lua
├── prepro_img.lua
├── prepro.py
├── pretrained_lstm_train.t7
├── pretrained_lstm_train_val.t7.zip
├── readme.md
├── result
├── train.lua
├── VGG_ILSVRC_19_layers.caffemodel
├── vgg_ilsvrc_19_layers_deploy-prototxt
├── vgg_ilsvrc_19_layers_deploy-prototxt.lua
├── vgg_ilsvrc_19_layers_deploy-prototxt.lua.lua
├── yknote---log--1
└── yknote---log--2
6 directories, 39 files
How to implement this feedback mechanism in NN?
Firstly, this is really helpful! Thanks for making this public and reproducible.
I see that the steps and scripts to reproduce results on the multiple-choice questions are clearly written. Do you plan to publish the scripts to do the same for the open-ended questions too?
Specifically, I was curious to know how one would go about preprocessing open-ended questions with data/vqa_preprocessing.py and prepro.py so that they work with your model.
COCO_val2014_000000320612.jpg is apparently a PNG and will make image preprocessing break at (quite literally) the last minute. This is more of a PSA than anything else since the problem is detected too far along in the pipeline to practically fix.
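One way to catch such files before starting a long preprocessing run is to look at the first bytes of each image instead of trusting the extension. The sketch below checks the standard PNG and JPEG file signatures; the function names and the directory layout are assumptions made for illustration, not part of the repository.

```python
import os

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker plus next marker prefix


def sniff_image_type(path):
    """Return 'png', 'jpeg', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(PNG_MAGIC):
        return "png"
    if head.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def find_mislabeled(image_dir):
    """List (filename, detected_type) for .jpg/.jpeg files that are not JPEG."""
    bad = []
    for name in sorted(os.listdir(image_dir)):
        if name.lower().endswith((".jpg", ".jpeg")):
            kind = sniff_image_type(os.path.join(image_dir, name))
            if kind != "jpeg":
                bad.append((name, kind))
    return bad
```

Running `find_mislabeled` on the image directory before feature extraction would list any file like the one mentioned above, so it can be converted or skipped up front.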
Hi all,
I'm excited to do some work on the text processing side of the Visual QA task. I develop the spaCy NLP library. I think we should be able to get some extra accuracy, with some extra NLP logic on the question parsing side. We'll see.
The first thing I'd like to try is mapping out-of-vocabulary (OOV) words to similar in-vocabulary tokens, using a word2vec model. For instance, let's say the word colour is OOV. It seems easy to map this to color.
Input: What colour is his shirt?
Tokens: ["What", "colour", "is", "his", "shirt", "?"]
Transform: ["What", "color", "is", "his", "shirt", "?"]
I think this input normalization trick is novel, but it makes sense to me for this problem. It lets you exploit pre-trained vectors without interfering with the rest of your model choices.
I think the normalization could be taken a bit further, by using the POS tagger and parser to compute context-specific keys, so that the replacement could be more exact (sense2vec). I think just the word replacement is probably okay though.
It's also easy to calculate auxiliary features with spaCy. It's easy to train a question classifier, of course. I'm not sure the model is making many errors of that type, though.
If I had to say one thing was unsatisfying about the model, I'd say it's the multiclass classification output. Have you tried having the model output a vector, and using it to find a nearest neighbour?
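The out-of-vocabulary mapping described in this post can be sketched in a few lines. This is only an illustration of the idea: the vectors below are tiny made-up values rather than real word2vec or spaCy vectors, and `normalize_tokens` is a hypothetical helper, not something from the repository or from spaCy.

```python
import math

# Toy 2-d "embeddings" invented for this example; real ones would come from
# a pre-trained model such as word2vec.
TOY_VECTORS = {
    "colour": [0.9, 0.1],
    "color": [1.0, 0.0],
    "shirt": [0.0, 1.0],
    "his": [0.5, 0.5],
}
TRAIN_VOCAB = {"What", "color", "is", "his", "shirt", "?"}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize_tokens(tokens, vocab, vectors, unk="UNK"):
    """Replace each OOV token with its most similar in-vocabulary word."""
    candidates = [w for w in vocab if w in vectors]
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)                      # already known to the model
        elif tok in vectors and candidates:
            out.append(max(candidates, key=lambda w: cosine(vectors[tok], vectors[w])))
        else:
            out.append(unk)                      # no vector available: fall back to UNK
    return out


if __name__ == "__main__":
    tokens = ["What", "colour", "is", "his", "shirt", "?"]
    print(normalize_tokens(tokens, TRAIN_VOCAB, TOY_VECTORS))
```

With these toy vectors, "colour" is replaced by "color" and every other token is left unchanged, matching the Input/Tokens/Transform example in the post.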
Hello,
I trained the model as you described for 150K iterations, but for some reason I'm getting only 49% accuracy, while using your pretrained parameters gives 58% as you mentioned. Does anyone have an idea what could cause such a drop in performance?
Thanks!
It might be a bit confusing to have two outputs and not use the second one.
In line 83:
question = [w if wtoi.get(w,len(wtoi)) != len(wtoi) else 'UNK' for w in txt]
and line 145:
if atoi.get(img['ans'],len(atoi)) != len(atoi):
The checks should instead be
question = [w if wtoi.get(w,len(wtoi)+1) != len(wtoi)+1 else 'UNK' for w in txt]
and
if atoi.get(img['ans'],len(atoi)+1) != len(atoi)+1:
since your indices begin at 1.
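A small example makes the off-by-one visible. Assuming, as the post says, that the index maps start at 1, the last word in the vocabulary has index `len(wtoi)`, which is exactly the sentinel the original check uses for "not found". The sketch below reproduces that check on a made-up three-word vocabulary and compares it with a plain membership test, which is an alternative suggested here rather than the fix proposed in the post.

```python
def encode_with_sentinel(words, wtoi):
    """The original check: uses len(wtoi) as the 'missing' sentinel."""
    return [w if wtoi.get(w, len(wtoi)) != len(wtoi) else "UNK" for w in words]


def encode_with_membership(words, wtoi):
    """Alternative that does not depend on how the indices are numbered."""
    return [w if w in wtoi else "UNK" for w in words]


if __name__ == "__main__":
    # Made-up vocabulary with 1-based indices, as described in the post.
    wtoi = {"what": 1, "color": 2, "shirt": 3}
    words = ["what", "shirt", "zebra"]
    print(encode_with_sentinel(words, wtoi))    # "shirt" (index 3 == len) is lost
    print(encode_with_membership(words, wtoi))  # only "zebra" becomes UNK
```

The membership test avoids picking a sentinel value at all, so it stays correct whether the indices start at 0 or 1.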
Hi all
I just want to make sure: is the number of training images 82783 (from the VQA website)?
I found that in data_prepro.h5 and data_prepro.json the number is 82460.
with h5py.File(h5_data_path,'r') as hf:
tem = hf.get('img_pos_train')
train_data['img_list'] = np.array(tem)
When I check the data:
np.unique(train_data['img_list']).shape[0]
>> 82460
Am I missing something?
Hi! My GPU has 8 GB of memory. When I run train.lua on a single GPU, it raises an "out of memory" error. Is there any way to solve it? I have 2 GPUs in total.
Hi,
We want to use this model as a baseline that we are comparing against, so how should we cite it?
Thanks for open-sourcing the code,
Ilija