peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Home Page: http://panderson.me/up-down-attention/

License: MIT License

CMake 0.91% Makefile 0.22% HTML 0.06% CSS 0.08% Jupyter Notebook 63.93% C++ 25.32% Shell 0.47% Python 6.52% Cuda 2.10% MATLAB 0.29% C 0.08% Dockerfile 0.02%
vqa visual-question-answering captioning-images faster-rcnn caffe image-captioning mscoco mscoco-dataset

bottom-up-attention's People

Contributors

alessandrosteri, bharatpublic, bharatsingh430, peteanderson80


bottom-up-attention's Issues

How to load the .tsv data in TensorFlow?

I am trying to use your .tsv data as image features for image captioning. However, I have no idea how to load the .tsv data so that I can randomly sample batches of features and match them to the corresponding captions. One way I found is to convert the .tsv into .json format, making the .json file a dict with "image_ids" as the keys. However, the .json file is too large to load; in fact, I can't even generate it because I run out of memory. I also failed to use TensorFlow's TextLineReader.

So,
how do I load the .tsv data correctly?
Looking forward to your help!
Thank you!
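For what it's worth, here is a minimal parsing sketch rather than an official recipe: the column names follow tools/read_tsv.py in this repo, the input filename is only an example, and writing one .npy file per image_id (so a TensorFlow input pipeline can load features lazily and pair them with captions by id) is just one possible workaround for the memory problem.

import base64
import csv
import os
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)

# Column order used by the released .tsv feature files (see tools/read_tsv.py).
FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

def parse_tsv(path):
    """Yield one record per image with decoded boxes and features."""
    with open(path) as tsv:
        for item in csv.DictReader(tsv, delimiter='\t', fieldnames=FIELDNAMES):
            item['image_id'] = int(item['image_id'])
            item['num_boxes'] = int(item['num_boxes'])
            for field in ('boxes', 'features'):
                # Python 2; on Python 3 use base64.decodebytes(item[field].encode()).
                buf = base64.decodestring(item[field])
                item[field] = np.frombuffer(buf, dtype=np.float32).reshape(
                    (item['num_boxes'], -1))
            yield item

# Rather than one huge .json, write one small .npy per image so batches can be
# sampled randomly and features looked up by image_id.
if not os.path.exists('features'):
    os.makedirs('features')
for item in parse_tsv('trainval_resnet101_faster_rcnn_genome.tsv'):  # example filename
    np.save(os.path.join('features', '%d.npy' % item['image_id']), item['features'])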

Running python demo.py fails

I downloaded the model file first. When I run demo.py, I get a message like this:
I1017 11:02:41.250684 40048 net.cpp:131] Top shape: 1 2 126 14 (3528)
I1017 11:02:41.250689 40048 net.cpp:139] Memory required for data: 117482412
I1017 11:02:41.250692 40048 layer_factory.hpp:77] Creating layer rpn_cls_prob_reshape
I1017 11:02:41.250699 40048 net.cpp:86] Creating Layer rpn_cls_prob_reshape
I1017 11:02:41.250704 40048 net.cpp:408] rpn_cls_prob_reshape <- rpn_cls_prob
I1017 11:02:41.250713 40048 net.cpp:382] rpn_cls_prob_reshape -> rpn_cls_prob_reshape
I1017 11:02:41.250741 40048 net.cpp:124] Setting up rpn_cls_prob_reshape
I1017 11:02:41.250747 40048 net.cpp:131] Top shape: 1 18 14 14 (3528)
I1017 11:02:41.250751 40048 net.cpp:139] Memory required for data: 117496524
I1017 11:02:41.250756 40048 layer_factory.hpp:77] Creating layer proposal
F1017 11:02:41.250799 40048 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, BoxAnnotatorOHEM, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, InnerProductBlob, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, PSROIPooling, Parameter, Pooling, Power, RNN, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, SmoothL1LossOHEM, Softmax, SoftmaxWithLoss, SoftmaxWithLossOHEM, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

Can somebody share the pretrained features?

I used Chrome to download the features, but the speed is below 50 KB/s, and after 50~100 MB the download gets interrupted; when I retry, it starts again from the beginning.
I feel helpless...

Why use the VG data to train the Faster R-CNN model?

Thanks for sharing the models and features. I have tried the features for VQA with my own model; really impressive results indeed :)
I have two questions:

  1. Since the VQA dataset is based on MSCOCO images, would it be better to train the Faster R-CNN model on the COCO dataset directly?
  2. Could a better object detection model, e.g., R-FCN or Deformable R-FCN, further improve VQA performance?

Training Scripts

This is nice work, and the features improve performance a lot! Could you please share the detailed training process for the final model?

Compilation Error

I got the following error:

token "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." is not valid in preprocessor expressions

I've already updated Eigen and still get the same error.
What should I do?

example images without bounding boxes

First of all, thanks for making this project public.

In the paper and this repo's README, two example images (the one with the bike and the one with the oven) are used to show your model's predictions qualitatively. Since I can't figure out where those images are in the dataset (VG or COCO), could you say where they come from?

Thanks in advance.

Generation of 1600-400-20 vocab files

I ran the setup_vg.py script with the max objects/attributes/relations set to 1600/400/20 respectively. However, the generated vocabulary files are slightly different from the ones provided in the 1600-400-20 folder. Is there any manual post-processing step?

GloVe word embedding for top down attention model

Thank you for the fascinating paper and code!

I notice that for the image captioning model, you decided to use the standard vocabulary and train the word embedding matrix from scratch. So I've been wondering: if I apply the approach from your previous Constrained Beam Search paper (pretrained GloVe vectors with an expanded vocabulary), will it improve the model's performance?

Test 2014 adaptive features are not getting prepared

I tried creating numpy files from the test2014 variable-box features using the following snippet from the read_tsv script in this repo, but it reports a padding error while decoding the string into a numpy array:

item[field] = np.frombuffer(base64.decodestring(item[field]),
                            dtype=np.float32).reshape((item['num_boxes'], -1))

It reports incorrect padding inside the decodestring function. I tried adding '=' at the end, but then the dimensions of the resulting numpy array don't match num_boxes. This error occurs for every tsv file in the test2014 adaptive feature set.

Tried debugging the code:
hRPQAAAAAAAAAAAAAAALSF8D0AAAAAAAAAAEK5hkBUkI4/ubE7PwAAAABMs648eTKRO2Xq5z1mSOE+aKcrPwAAAAAAAAAAAAAAABsb3T7nhK49jVvEPirGgjqzJrM+AAAAAJLICj8B0G4+HmhvPvccLz4AAAAAq37BOoivl0AtCwg8AAAAAAAAAAAAAAAAtsQoPgAAAADbHCo7AAAAAAAAAACSgow/AAAAAFYsqT+fwoM9AAAAAIEkFkHFf6U8AAAAAAAAAACspfQ+AAAAAAAAAACbh5E9AAAAAF/CRz8AAAAAAAAAAGXGa0BfWrs/FetIPKe0RD8AAAAAzPLROwAAAAAAAAAAAAAAAHy/Rz7JO49A/5cPP8bSlz4AAAAANOKEQAAAAAAAAAAAAAAAAAAAA437192 length of string=237423

> /home/juan_fernandez/scripts/read_tsv.py(74)<module>()
-> pdb.set_trace()
(Pdb) c
Traceback (most recent call last):
File "scripts/read_tsv.py", line 74, in
pdb.set_trace()
File "/home/juan_fernandez/anaconda2/envs/py27/lib/python2.7/base64.py", line 328, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Any help would be appreciated.
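A hedged diagnostic rather than a confirmed fix: a base64 string should have a length that is a multiple of 4, and the decoded buffer should hold exactly num_boxes * 2048 float32 values, so checking both tells you whether a record merely lost its '=' padding or was actually truncated (for example by an interrupted download).

import base64

def check_feature_string(s, num_boxes, feat_dim=2048):
    """Report whether a base64 feature string is merely unpadded or truly truncated."""
    if isinstance(s, bytes):
        s = s.decode('ascii')
    s += '=' * (-len(s) % 4)              # restore any stripped '=' padding
    buf = base64.b64decode(s)
    expected = num_boxes * feat_dim * 4   # each float32 value is 4 bytes
    print('decoded %d bytes, expected %d' % (len(buf), expected))
    return len(buf) == expected

If the decoded size comes up short even after re-padding, the .tsv line itself is incomplete and re-downloading that chunk is probably the only remedy.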

How to set a larger batch size in training?

I modified BATCH_SIZE from 64 to 192 in faster_rcnn_end2end_resnet.yml, but I get this error:

F1206 14:38:02.929175 195836 loss_layer.cpp:19] Check failed: bottom[0]->num() == bottom[1]->num() (32 vs. 96) The data and label should have the same number.

I think I'm missing something in the configuration; maybe another parameter also needs to be changed to match BATCH_SIZE.

Image caption

Hi, can you release your image captioning implementation?

About test results

Hi, I just ran the test code using your trained ResNet-101 model on the test set, and got the following numbers on the object detection task:

Mean AP = 0.0146
Weighted Mean AP = 0.1799
Mean Detection Threshold = 0.328

The mean AP (1.46%) is far from the number (10.2%) reported in the table at the bottom of the README, while the weighted mean AP is a bit higher than the reported number. I am wondering whether there is a typo in your table.

thanks!

TensorFlow version?

Hi,

I have a question: will you be releasing a TensorFlow version of your code?

Struggling with installation

Hello,

I am failing badly to build the shipped Caffe with Anaconda, mainly linker errors related to Google protobuf. I've been struggling for 7-8 hours and I'm pretty close to giving up.

So the question is: what modifications are shipped in the caffe/ folder? Can't we just use upstream Caffe?

Not able to define a function in lib/fast_rcnn/test.py

I want to define a function in lib/fast_rcnn/test.py. I implemented a new function in test.py and imported test in demo.ipynb. When I call the new function as test.new_function(), it throws an error: module object has no attribute 'new_function'. How can we add a new function to test.py?
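One thing worth ruling out (a guess, not a confirmed diagnosis): Jupyter caches imported modules, so an edited test.py is not picked up until the module is reloaded or the kernel restarted, and a differently located test module earlier on sys.path would shadow lib/fast_rcnn/test.py. A quick check, assuming lib/ is on sys.path as demo.ipynb sets it up, and with new_function standing in for the hypothetical function from the question:

import fast_rcnn.test as test

print(test.__file__)   # confirm this really is lib/fast_rcnn/test.py, not some other 'test'
reload(test)           # Python 2 builtin; on Python 3 use importlib.reload(test)
test.new_function()    # hypothetical name from the question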

binascii.Error: Incorrect padding

I got binascii.Error: Incorrect padding when reading image 300104 from test2014/test2014_resnet101_faster_rcnn_genome.tsv.1 with tools/read_tsv.py. Is anything wrong?

Traceback (most recent call last):
  File "read_tsv.py", line 64, in <module>
    read_and_save(os.path.join(in_dir, in_file), out_dir)
  File "read_tsv.py", line 45, in read_and_save
    item['features'] = np.frombuffer(base64.decodestring(item['features']), dtype=np.float32).reshape((item['num_boxes'], -1))
  File "/usr/lib64/python2.7/base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Which prototxt is needed for generate_tsv.py?

I want to use your model to extract features, but I don't know which prototxt I need for generate_tsv.py.
Following the "test" part, I chose "models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt", but it reports an error:
cudnn.hpp:122] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
I will run this model on XMedia Wikipedia and Pascal, so I would appreciate more instructions about this part.
Btw, do I need to resize the images to 224x224x3?
Thank you

Running evaluation script on CPU

Hello,

Is it possible to run the evaluation script on the CPU? Should I still install Caffe in the way described in the README of this repository?

Thanks,
Claudio

Required GPU memory

Hello,
I am trying to use the pretrained model to extract image features. I have a GTX 1070 (8 GB) and get an out-of-memory error when running the network on a single image. I suspect this is a Caffe memory-management issue. What do you suggest to solve this without decreasing performance?

image captioning task

Hi peteanderson,
I have followed your work in the cross-modal field for a long time, and your bottom-up & top-down approach has really improved greatly over other works. Following your paper, I implemented the top-down algorithm for image captioning, but I could not reproduce your CIDEr loss, so I just used the cross-entropy loss. If possible, I hope you will put your image captioning code on GitHub.
I hope you roll out another wonderful article at CVPR 2018.

How is L2 normalization of the features implemented?

The paper states that L2 normalization of the image features is crucial for good performance. However, in generate_tsv.py you just take the pool5 output, which is average-pooled into a 2048-d vector.

I'm wondering whether you implemented L2 normalization of the features or not. If you did, please let me know how you do it. Thanks a lot~
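For reference, the normalization itself is a one-liner; a minimal sketch, assuming feats is the num_boxes x 2048 array decoded from the .tsv files and that any L2 normalization happens downstream in the VQA/captioning model rather than inside generate_tsv.py:

import numpy as np

def l2_normalize(feats, eps=1e-8):
    """Scale each 2048-d region feature to unit L2 norm."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / (norms + eps)  # eps guards against all-zero rows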

ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'

Hi, I ran generate_tsv.py to generate pretrained features, but I ran into a problem.

Here is the problem:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "generate_tsv.py", line 157, in generate_tsv
net = caffe.Net(prototxt, caffe.TRAIN, weights=weights)
File "/home/lijinze/bottom-up-attention-master/tools/../lib/rpn/anchor_target_layer.py", line 27, in setup
layer_params = yaml.load(self.param_str)
File "/usr/local/lib/python2.7/dist-packages/yaml/init.py", line 72, in load
return loader.get_single_data()
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
return self.construct_document(node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
for dummy in generator:
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
value = self.construct_mapping(node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
return BaseConstructor.construct_mapping(self, node, deep=deep)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 133, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 88, in construct_object
data = constructor(self, node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 414, in construct_undefined
node.start_mark)
ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
in "", line 2, column 11:
'scales': !!python/tuple [4, 8, 16, 32]

Could you do me a favor? Thank you very much!
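A commonly suggested workaround, assuming the root cause is a newer PyYAML whose default/safe loaders refuse the !!python/tuple tag: pass the unsafe Loader explicitly in the yaml.load call the traceback points at (lib/rpn/anchor_target_layer.py, line 27), and in the other RPN layers that parse param_str the same way. A self-contained illustration with an example param_str:

import yaml

param_str = "{'feat_stride': 16, 'scales': !!python/tuple [4, 8, 16, 32]}"

# Raises "could not determine a constructor for the tag
# 'tag:yaml.org,2002:python/tuple'" on loaders that refuse Python-specific tags:
# layer_params = yaml.safe_load(param_str)

# Works on old and new PyYAML alike: the full (unsafe) Loader still builds tuples.
layer_params = yaml.load(param_str, Loader=yaml.Loader)
print(layer_params['scales'])  # (4, 8, 16, 32)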

Problem in running demo.ipynb

net = caffe.Net(prototxt, caffe.TEST, weights=weights)
Traceback (most recent call last):
File "", line 1, in
Boost.Python.ArgumentError: Python argument types in
Net.__init__(Net, str, int)
did not match C++ signature:
__init__(boost::python::api::object, std::string, std::string, int)
__init__(boost::python::api::object, std::string, int)
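A possible explanation (hedged, since it depends on which Caffe build is on your PYTHONPATH): the bundled Caffe's Python bindings only expose the positional signatures Net(prototxt, weights, phase) and Net(prototxt, phase), so the weights= keyword form used by newer BVLC Caffe does not match any C++ signature. Passing the weights positionally avoids the Boost.Python error:

import caffe

prototxt = 'models/vg/ResNet-101/faster_rcnn_end2end/test.prototxt'              # example path
weights = 'data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel'       # example path

# Instead of: net = caffe.Net(prototxt, caffe.TEST, weights=weights)
net = caffe.Net(prototxt, weights, caffe.TEST)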

Error when I run make -j8 && make pycaffe

CXX src/caffe/internal_thread.cpp
CXX src/caffe/layer.cpp
CXX src/caffe/blob.cpp
CXX src/caffe/syncedmem.cpp
CXX src/caffe/solver.cpp
CXX src/caffe/layer_factory.cpp
CXX src/caffe/data_transformer.cpp
CXX src/caffe/layers/hdf5_data_layer.cpp
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/syncedmem.cpp:3:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/syncedmem.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/blob.cpp:7:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/blob.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from src/caffe/layer_factory.cpp:8:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layer_factory.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from src/caffe/layer.cpp:1:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from ./include/caffe/layers/hdf5_data_layer.hpp:10,
from src/caffe/layers/hdf5_data_layer.cpp:17:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layers/hdf5_data_layer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/data_transformer.cpp:10:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/data_transformer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from ./include/caffe/net.hpp:12,
from ./include/caffe/solver.hpp:7,
from src/caffe/solver.cpp:6:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/solver.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/internal_thread.cpp:5:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/internal_thread.o] Error 1

activation of the relation prediction

@peteanderson80
I saw the option "HAS_RELATION" in the cfg file. I turned it on and add a top[6] data for the proposal_target_layer and set the param num_rel_classes to 21(I am not sure if this is correct for the vg_1600-400-20 dataset) and start training, I got the following error:

File "/home/work/bottom-up-attention/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
    "Generation of gt_relations doesn't accomodate dropping objects"
AssertionError: Generation of gt_relations doesn't accomodate dropping objects

Is there something wrong with my setting?

read_tsv file error

When I use the default setting, which opens the tsv file with 'r+b', I get this error at the line "for item in reader:":
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
When I open with 'r+' instead, I get this at np.frombuffer(base64.decodestring(item[field]), ...):
TypeError: expected bytes-like object, not str
My environment is Windows. Should that matter?
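These two errors look more like the Python 2 script being run under Python 3 than like a Windows problem. A hedged sketch of a Python 3-friendly variant (field names per tools/read_tsv.py, filename only an example):

import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)
FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

# Open in text mode (csv wants str on Python 3); newline='' avoids Windows '\r\n' issues.
with open('test2014_resnet101_faster_rcnn_genome.tsv', 'r', newline='') as tsv:
    for item in csv.DictReader(tsv, delimiter='\t', fieldnames=FIELDNAMES):
        num_boxes = int(item['num_boxes'])
        # base64 wants bytes on Python 3, so encode the str field first.
        buf = base64.decodebytes(item['features'].encode('ascii'))
        features = np.frombuffer(buf, dtype=np.float32).reshape((num_boxes, -1))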

How about the image caption model?

Hello, the bottom-up attention network with attribute prediction that you proposed is impressive! It indeed boosts image captioning performance in the paper. Besides the attention model, I am also interested in the captioning model designed in the paper, where you mention it achieves performance comparable to the state of the art on most evaluation metrics. So, to compare against my own model, could you provide your captioning model implementation code? :)

Error when running tools/demo.ipynb

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 6305:21: Message type "caffe.LayerParameter" has no field named "roi_pooling_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0216 21:21:02.787189 22074 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /bottom-up-attention/models/vg/ResNet-101/faster_rcnn_end2end/test.prototxt
*** Check failure stack trace: ***
Aborted

Could you please give me a solution to fix this error? Thank you!!

Could you please list the platform versions?

When I make pycaffe following your configuration, it always raises some annoying conflicts. Could you please list the platform versions used for your code?

My environment is:

  • Ubuntu 16.04
  • CUDA 8
  • cuDNN 5.0
  • OpenCV 2.4.13
  • MKL 2016
  • NCCL

Will that work?
Thanks~
