stanfordnlp / mac-network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

License: Apache License 2.0

Python 100.00%
attention clevr machine-reasoning compositional-attention-networks tensorflow question-answering vqa

mac-network's Introduction

Compositional Attention Networks for Real-World Reasoning

Drew A. Hudson & Christopher D. Manning

Please note: We have updated the GQA challenge deadline to be May 15. Best of Luck! :)

This is the implementation of Compositional Attention Networks for Machine Reasoning (ICLR 2018) on two visual reasoning datasets: the CLEVR dataset and the new GQA dataset (CVPR 2019). We propose a fully differentiable model that learns to perform multi-step reasoning. See our website and blogpost for more information about the model!

In particular, the implementation includes the MAC cell at mac_cell.py. The code supports the standard cell as presented in the paper as well as additional extensions and variants. Run python main.py -h or see config.py for the complete list of options.
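For readers who want a feel for the recurrence before reading mac_cell.py, below is a minimal NumPy sketch of a single reasoning step (control, read, write) based on the paper's high-level description. It is a conceptual illustration with placeholder names and shapes, not the repository's TensorFlow implementation.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mac_step(control, memory, question, word_states, knowledge, params):
    """One conceptual MAC step: control -> read -> write.

    control:     (d,)   previous control state
    memory:      (d,)   previous memory state
    question:    (d,)   sentence-level question vector
    word_states: (L, d) contextual word representations of the question
    knowledge:   (N, d) knowledge base (e.g. image region features)
    params:      dict with "Wcq" (d, 2d), "Wm" (d, d), "Wwrite" (d, 2d)
    """
    # Control unit: attend over the question words to select the current reasoning operation.
    cq = params["Wcq"] @ np.concatenate([control, question])
    c_new = softmax(word_states @ cq) @ word_states

    # Read unit: attend over the knowledge base, guided by the memory and the new control.
    interactions = knowledge * (params["Wm"] @ memory)
    retrieved = softmax((interactions * c_new).sum(axis=1)) @ knowledge

    # Write unit: integrate the retrieved information into the memory state.
    m_new = params["Wwrite"] @ np.concatenate([retrieved, memory])
    return c_new, m_new

# A network of p cells applies mac_step p times (p corresponds to --netLength).
d, L, N = 8, 5, 10
rng = np.random.default_rng(0)
params = {"Wcq": rng.normal(size=(d, 2 * d)), "Wm": rng.normal(size=(d, d)), "Wwrite": rng.normal(size=(d, 2 * d))}
c, m = np.zeros(d), np.zeros(d)
for _ in range(4):
    c, m = mac_step(c, m, rng.normal(size=d), rng.normal(size=(L, d)), rng.normal(size=(N, d)), params)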

The adaptation of MAC as well as several baselines for the GQA dataset are located at the GQA branch.

Bibtex

For MAC:

@inproceedings{hudson2018compositional,
  title={Compositional Attention Networks for Machine Reasoning},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

For the GQA dataset:

@inproceedings{hudson2018gqa,
  title={GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Requirements

  • TensorFlow (originally developed with version 1.3, but it should work with later versions as well).
  • We performed experiments on a Maxwell Titan X GPU and assume 12GB of GPU memory.
  • See requirements.txt for the required Python packages and run pip install -r requirements.txt to install them.

Pre-processing

Before training the model, we first have to download the CLEVR dataset and extract features for the images:

Dataset

To download and unpack the data, run the following commands:

wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip
unzip CLEVR_v1.0.zip
mv CLEVR_v1.0 CLEVR_v1
mkdir CLEVR_v1/data
mv CLEVR_v1/questions/* CLEVR_v1/data/

The final command moves the dataset questions into the data directory, where we will put all the data files we use during training.

Feature extraction

Extract ResNet-101 features for the CLEVR train, val, and test images with the following commands:

python extract_features.py --input_image_dir CLEVR_v1/images/train --output_h5_file CLEVR_v1/data/train.h5 --batch_size 32
python extract_features.py --input_image_dir CLEVR_v1/images/val --output_h5_file CLEVR_v1/data/val.h5 --batch_size 32
python extract_features.py --input_image_dir CLEVR_v1/images/test --output_h5_file CLEVR_v1/data/test.h5 --batch_size 32
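To sanity-check the extraction, you can open one of the resulting HDF5 files with a few lines of Python. The snippet below is a minimal sketch that assumes the features are stored under a "features" dataset (the key the training code reads); adjust the key and the expected shape if your file differs.

import h5py

# Quick sanity check of the extracted CLEVR features (assumes a "features" dataset).
with h5py.File("CLEVR_v1/data/train.h5", "r") as f:
    print(list(f.keys()))            # datasets stored in the file
    feats = f["features"]
    print(feats.shape, feats.dtype)  # roughly (num_images, 1024, 14, 14) for ResNet-101 conv4 features
    print(feats[0].mean())           # read a single image's feature map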

Training

To train the model, run the following command:

python main.py --expName "clevrExperiment" --train --testedNum 10000 --epochs 25 --netLength 4 @configs/args.txt

First, the program preprocesses the CLEVR questions. It tokenizes them and maps them to integers to prepare them for the network. It then stores a JSON file with this information, as well as word-to-integer dictionaries, in the ./CLEVR_v1/data directory.

Then, the program trains the model. Weights are saved by default to ./weights/{expName} and statistics about the training are collected in ./results/{expName}, where expName is the name we choose to give to the current experiment.

Notes

  • The number of examples used for training and evaluation can be set by --trainedNum and --testedNum respectively.
  • You can use the -r flag to restore and continue training a previously pre-trained model.
  • We recommend varying the number of MAC cells used in the network through the --netLength option to explore different reasoning-process lengths.
  • Good lengths for CLEVR are in the range of 4-16 (using more cells tends to converge faster and achieve slightly higher accuracy, while using fewer cells usually results in more easily interpretable attention maps).

Model variants

We have explored several variants of our model. We provide a few examples in configs/args2-4.txt. For instance, you can run the first with:

python main.py --expName "experiment1" --train --testedNum 10000 --epochs 40 --netLength 6 @configs/args2.txt
  • args2 uses a non-recurrent variant of the control unit that converges faster.
  • args3 incorporates self-attention into the write unit.
  • args4 adds control-based gating over the memory.

See config.py for further available options (Note that some of them are still in an experimental stage).

Evaluation

To evaluate the trained model, and get predictions and attention maps, run the following:

python main.py --expName "clevrExperiment" --finalTest --testedNum 10000 --netLength 16 -r --getPreds --getAtt @configs/args.txt

The command will restore the trained model and evaluate it on the validation set. JSON files with the predictions and the attention distributions produced by the model are saved by default to ./preds/{expName}; a short loading sketch follows the note below.

  • If you are interested in attention maps (--getAtt), we advise limiting the number of evaluated examples to 5,000-20,000 to avoid very large prediction files.
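The exact layout of the prediction files is not documented here, so the following is only a rough loading sketch: the file name and the record keys ("prediction", "answer") are assumptions; print one record first and adapt the keys to what you actually see.

import json

# Hypothetical file name under ./preds/{expName}; list the directory to find the real one.
with open("preds/clevrExperiment/valPredictions-clevrExperiment.json") as f:
    preds = json.load(f)

print(len(preds))
print(preds[0])  # inspect one record to confirm the actual field names

# Assumed keys; replace with the real ones from the record above.
correct = sum(1 for p in preds if p.get("prediction") == p.get("answer"))
print("accuracy:", correct / len(preds))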

Visualization

After we evaluate the model with the command above, we can visualize the attention maps generated by running:

python visualization.py --expName "clevrExperiment" --tier val 

(The tier can also be set to train or test.) The script supports filtering the visualized questions in various ways; see visualization.py for further details.

To get more interpretable visualizations, it is highly recommended to reduce the number of cells to 4-8 (--netLength). Using more cells allows the network to learn more effective ways to approach the task, but these tend to be less interpretable than those of shorter networks (with fewer cells).

Optionally, to make the image attention maps look a little bit nicer, you can do the following (using imagemagick):

for x in preds/clevrExperiment/*Img*.png; do magick convert $x -brightness-contrast 20x35 $x; done;

Thank you for your interest in our model! Please contact me at [email protected] for any questions, comments, or suggestions! :-)

mac-network's People

Contributors

dorarad, drtonyr, kamalkraj


mac-network's Issues

About Grounding in GQA

Thanks for the repo

I see that you did the grounding experiments on the GQA dataset in the paper, but I'm confused about them. For example, how did you transform GQA questions into grounding sentences, and what is the corresponding answer for a GQA question in the grounding experiments? Could you give more detail? Thanks.

Training on VQAv2

I see that there is an option to set the dataset to VQA. I wanted to know whether I could train it on VQAv2 and, if so, how?

preprocess.py extraDataset

First of all, I am trying to use extraDataset. I found one bug in the preprocessData function:
for tier in extraData should be for tier in extraDataset

Secondly, it is unclear how to use the extra options, especially what extraVal means. Does it mean that only the extra validation set is used for validation? The code seems to only train on the validation set, so does it mean the model trains only on the validation set of the extra data?

Number of unique answers.

Thank you very much for the nice dataset!

I have a question about the number of unique answers in the GQA dataset.
When computing the number of unique answers I get:

  • 1845 answers for the training split (based on combining each 'answer' of the 10 training files)
  • 1852 answers for train + valid
  • 1853 for train + valid + test_dev splits.

In the paper you mention that there are 1878; is this discrepancy caused by some answers being present only in the test split?

Have a great day :)

Yana

Evaluation error

Hello! Thank you so much for putting up this beautiful work!

After training on the args.txt configuration, I proceeded to evaluate as per the instructions. I obtained the following error. Do you know what is going on?

2020-06-26 00:53:56.788567: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-26 00:53:56.795400: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599845000 Hz
2020-06-26 00:53:56.795894: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xecc6dc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-26 00:53:56.795930: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-26 00:53:56.801185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-26 00:53:56.923729: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xecf9b60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-26 00:53:56.923796: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P40, Compute Capability 6.1
2020-06-26 00:53:56.925923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:84:00.0
2020-06-26 00:53:56.927409: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.928514: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.929673: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.930791: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.932004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933170: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933611: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933650: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-26 00:53:56.933691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-26 00:53:56.933715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-06-26 00:53:56.933736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
Preprocess data...
Loading data...
took 76.19 seconds
Loading word vectors...
0
{'': 0, '': 1, '': 2, '': 3, 'are': 4, 'there': 5, 'more': 6, 'big': 7, 'green': 8, 'things': 9, 'than': 10, 'large': 11, 'purple': 12, 'shiny': 13, 'cubes': 14, 'how': 15, 'many': 16, 'other': 17, 'of': 18, 'the': 19, 'same': 20, 'shape': 21, 'as': 22, 'tiny': 23, 'cyan': 24, 'matte': 25, 'object': 26, 'is': 27, 'color': 28, 'sphere': 29, 'cube': 30, 'what': 31, 'material': 32, 'that': 33, 'right': 34, 'brown': 35, 'cylinder': 36, 'and': 37, 'left': 38, 'gray': 39, 'on': 40, 'side': 41, 'small': 42, 'rubber': 43, 'behind': 44, 'thing': 45, 'to': 46, 'metallic': 47, 'size': 48, 'any': 49, 'have': 50, 'block': 51, 'blue': 52, 'yellow': 53, 'a': 54, ';': 55, 'it': 56, 'ball': 57, 'its': 58, 'in': 59, 'front': 60, 'does': 61, 'number': 62, 'red': 63, 'spheres': 64, 'made': 65, 'metal': 66, 'cylinders': 67, 'both': 68, 'balls': 69, 'or': 70, 'blocks': 71, 'objects': 72, 'visible': 73, 'another': 74, 'has': 75, 'greater': 76, 'fewer': 77, 'less': 78, 'either': 79, 'anything': 80, 'else': 81, 'do': 82, 'an': 83, 'equal': 84}
85
{'yes': 0, '2': 1, 'no': 2, 'rubber': 3, 'large': 4, '0': 5, 'sphere': 6, 'gray': 7, 'cube': 8, 'blue': 9, 'brown': 10, '1': 11, 'yellow': 12, 'purple': 13, 'cylinder': 14, 'small': 15, 'green': 16, 'metal': 17, '3': 18, '4': 19, 'cyan': 20, '6': 21, 'red': 22, '5': 23, '8': 24, '7': 25, '9': 26, '10': 27}
28
{'': 0, '': 1, '': 2, '': 3, 'are': 4, 'there': 5, 'more': 6, 'big': 7, 'green': 8, 'things': 9, 'than': 10, 'large': 11, 'purple': 12, 'shiny': 13, 'cubes': 14, 'yes': 15, 'how': 16, 'many': 17, 'other': 18, 'of': 19, 'the': 20, 'same': 21, 'shape': 22, 'as': 23, 'tiny': 24, 'cyan': 25, 'matte': 26, 'object': 27, '2': 28, 'is': 29, 'color': 30, 'sphere': 31, 'cube': 32, 'no': 33, 'what': 34, 'material': 35, 'that': 36, 'right': 37, 'brown': 38, 'cylinder': 39, 'and': 40, 'left': 41, 'rubber': 42, 'gray': 43, 'on': 44, 'side': 45, 'small': 46, 'behind': 47, 'thing': 48, '0': 49, 'to': 50, 'metallic': 51, 'size': 52, 'any': 53, 'have': 54, 'block': 55, 'blue': 56, 'yellow': 57, 'a': 58, ';': 59, 'it': 60, 'ball': 61, 'its': 62, 'in': 63, 'front': 64, 'does': 65, 'number': 66, 'red': 67, 'spheres': 68, 'made': 69, 'metal': 70, 'cylinders': 71, '1': 72, 'both': 73, 'balls': 74, 'or': 75, 'blocks': 76, 'objects': 77, 'visible': 78, 'another': 79, 'has': 80, 'greater': 81, 'fewer': 82, 'less': 83, '3': 84, '4': 85, 'either': 86, 'anything': 87, 'else': 88, 'do': 89, '6': 90, 'an': 91, 'equal': 92, '5': 93, '8': 94, '7': 95, '9': 96, '10': 97}
98
took 0.00 seconds
Vectorizing data...
took 13.70 seconds
took 89.90 seconds
Building model...
took 17.59 seconds
Traceback (most recent call last):
File "main.py", line 802, in
main()
File "main.py", line 691, in main
epoch = loadWeights(sess, saver, init)
File "main.py", line 190, in loadWeights
config.restoreEpoch, config.lr = lastLoggedEpoch()
File "main.py", line 62, in lastLoggedEpoch
epoch = int(lastLine[0])
ValueError: invalid literal for int() with base 10: 'epoch'

Thank you very much in advance for the help! :) Take care :)

Pretrained mac network

Thanks for this awesome code base and dataset! :)
Do you plan to release pretrained weights for the mac network?

-bash: fork: Cannot allocate memory

Thanks for the repo

On trying to evaluate the model using python main.py --expName "gqaExperiment" --finalTest --testedNum 1000 --netLength 4 -r --submission --getPreds @configs/gqa/gqa.txt, it always seems to stop preprocessing at 64% and gives the error -bash: fork: Cannot allocate memory

Does this have something to do with having only 16 GB of RAM or would this be because of some other issue?

Thanks

About object features

Hi~, thank you for your great work. I have one question about the object features. The object feature files contain only the object features and the objects' bounding boxes; how can I know the object classes and attributes? I know that sceneGraphs.zip provides information about the objects, attributes, and relations in the image, but how does this information correspond to the object feature files?
Looking forward to your response, thank you~

ValueError while training on data1.2

I followed the instructions in the readme.md, and the training part previously worked with no errors with data.zip.

However, when I cloned the current version of the gqa branch and followed the instructions in the same readme from the beginning (downloading everything again and merging), I got the following error during training, just after the first epoch:

ValueError: Index (148690) out of range (0-108076)

Full stack trace after the first epoch is as follows:

eb  1, 78 (10000 / 10000), t = 0.32 (0.00+0.23), lr 0.0003, l = 1.9574, a = 0.5469, avL = 1.9961, avA = 0.5434, g = -1.0000, emL = 2.0224, emA = 0.5367; gqaExTraceback (most recent call last):
  File "main.py", line 848, in <module>
    main()
  File "main.py", line 710, in main
    evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, getPreds = getPreds, prevRes = evalRes)
  File "main.py", line 251, in runEvaluation
    minLoss = prevRes["val"]["minLoss"] if prevRes else float("inf"))
  File "main.py", line 571, in runEpoch
    imagesBatch = loadImageBatch(data["images"], batch)
  File "main.py", line 363, in loadImageBatch
    imageBatch[i, 0:numObjects] = toFile(imageId)["features"][imageId["idx"], 0:numObjects]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 553, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
    sel[args]
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 457, in _handle_simple
    x,y,z = _translate_int(int(arg), length)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 477, in _translate_int
    raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (148690) out of range (0-108076)
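One way to narrow down errors like this is to compare the image row indices the loader uses against the number of rows in the corresponding features file. The sketch below is a rough diagnostic, not part of the repository: the file names (gqa_objects.h5, gqa_objects_info.json) and the assumption of a single, un-chunked features file are placeholders for your local setup; only the "features" dataset and the "idx" field follow the trace above.

import json
import h5py

FEATURES_FILE = "gqa_objects.h5"          # placeholder: your merged object-features file
INFO_FILE = "gqa_objects_info.json"       # placeholder: imageId -> {"idx": row, ...} mapping

with h5py.File(FEATURES_FILE, "r") as f:
    num_rows = f["features"].shape[0]

with open(INFO_FILE) as f:
    info = json.load(f)

# Any index >= num_rows would reproduce the "Index out of range" error above.
bad = [img_id for img_id, v in info.items() if v.get("idx", 0) >= num_rows]
print("feature rows:", num_rows, "| out-of-range image ids:", bad[:10])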

using it for custom questions?

Hey,

After training, I want to use the model to answer custom questions about a given image from the dataset.

I am not sure if I am right, but it feels like the code does not have that mode. In particular, it loads the JSON file for val/test and re-initializes the embedding. Also, the vocab file takes quite a while to load from the .pkl files.
Can you please help me with this?

About memory's variational dropout

First, thanks for sharing your great work.

As following your code, a question came across about your variational dropout on memory vector. (https://github.com/stanfordnlp/mac-network/blob/master/mac_cell.py#L215, https://github.com/stanfordnlp/mac-network/blob/master/mac_cell.py#L590)

It seems the mask is generated once when the graph is built, keeps its shape (64, 512) afterwards, and is always applied, whether the model is in training or evaluation.

Since this kind of dropout produces stochastic results at evaluation time and accepts only a fixed batch size, I am wondering whether it is OK to apply such a method.

Correct me if I'm reading the code wrong.
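For context, the sketch below shows roughly what variational (recurrent) dropout usually looks like: one mask is sampled per forward pass and shared across timesteps, but it is resampled for every batch and disabled at evaluation, which is the behaviour the question is asking about. This is a NumPy illustration of the general technique only, not the repository's implementation.

import numpy as np

rng = np.random.default_rng(0)

def variational_dropout_mask(batch_size, dim, keep_prob, training):
    """One mask per forward pass, reused across timesteps; identity at evaluation time."""
    if not training:
        return np.ones((batch_size, dim))
    mask = (rng.random((batch_size, dim)) < keep_prob).astype(float)
    return mask / keep_prob                      # inverted-dropout scaling

def run_reasoning_steps(memory, num_steps, keep_prob, training):
    # The mask is sampled once per forward pass, then applied at every step.
    mask = variational_dropout_mask(*memory.shape, keep_prob, training)
    for _ in range(num_steps):
        memory = np.tanh(memory * mask)          # stand-in for the real memory update
    return memory

# Resample per batch during training, and pass training=False at evaluation.
out_train = run_reasoning_steps(np.ones((64, 512)), num_steps=4, keep_prob=0.85, training=True)
out_eval = run_reasoning_steps(np.ones((64, 512)), num_steps=4, keep_prob=0.85, training=False)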

Scene graph baseline for GQA

Hello,

Is there a way to run the scene graph baseline reported in the paper or are there any available details on how to implement it?

Scene graph

Just wondering, does anyone have the scene graphs for all splits of the GQA dataset?

I can't reproduce the performance reported in the paper

Thanks for your interesting work; I ran into some confusion when running your code.
I ran mac-network on the GQA dataset following your GitHub guidance, but the validation accuracy is lower than the paper reports. The performance I get is:
mac-network valid accuracy: 43.82
GQA-LSTM valid accuracy: 45.54
GQA-LSTMCNN valid accuracy: 42.03
Am I doing something wrong?

About object number

Hi,
I just processed the scene graphs and the object features, but I found that the number of objects in scenegraph.json is not equal to that in the gqa_objects.h5 file. For example, the number of objects for image '2386621' is 16 in train_sceneGraphs.json but 18 in the gqa_objects.h5 file. Is there anything wrong with my processing? And how can I match the object numbers to the features?
Thanks!

KeyError: '11183447'

Hi, I uploaded test.json to EvalAI (test2019 phase) and got the following issue:

Traceback (most recent call last):
File "/code/scripts/workers/submission_worker.py", line 336, in run_submission
submission_metadata=submission_serializer.data,
File "/tmp/tmpnr9tlxl6/compute/challenge_data/challenge_225/main.py", line 96, in evaluate
output["result"].append({tier: getScores(questions, questions, predictions, tier, kwargs['submission_metadata']['method_name'])})
File "/tmp/tmpnr9tlxl6/compute/challenge_data/challenge_225/main.py", line 315, in getScores
predicted = predictions[qid]
KeyError: '11183447'

I find that the question id '11183447' is in the validation split, not the test split,
so it is strange that there is a KeyError here in the test2019 phase.

questions are not related to images

Hi,

I just trained a baseline (LSTM+CNN) and checked the predictions, but I noticed that in the generated JSON file several images have questions that do not correspond to the objects in them. For example, this image (id: 2359959) has the questions:
Is there a sandwich in the image?
What kind of food is it?
Is the sandwich on the right?
Are there any clocks or flags?

But actually there is no food in the image.

Also, in this image (id: 2371593) the question is:
In which part of the picture is the cat, the bottom or the top?

But actually the object is a person.

The questions are automatically generated for the images using the scene graph, and I am confused about the step at which these mistakes may happen.

Thanks!

About fine-tuning on CLEVR-Humans

Hello again!

If you don't mind, I have one more question for detailed procedure of fine-tuning the CLEVR-Humans dataset.

I was able to reproduce the 12-step MAC's accuracy (98.9%) using PyTorch, but failed to reproduce the Humans result after fine-tuning (I got 76.6%, lower than the paper's 81.5%).

My fine-tuning was done by (1) loading the model fully trained on CLEVR, (2) initializing the new words' embedding vectors the same way as the original words, and (3) re-training the model on the CLEVR-Humans train set ONLY, following the original model's learning schedule.

It seems your fine-tuning code trains the model on a mixture of the CLEVR and CLEVR-Humans train sets rather than only the CLEVR-Humans train set (sorry if I misread again 😢), so I'm guessing this difference might be the reason.

Since using the mixture of both datasets will take longer than just using CLEVR-Humans, I'm opening this issue thinking you might have encountered the same problem and could help me out.

Thanks!

how to submit to test server

Thanks for this nice repo. I have run experiments on GQA without any problem. After training the model, however, I could not find instructions on how to create the .json file for submitting to the EvalAI test server. Maybe you mentioned it somewhere, but I did not find it. It would be good if you could let me know how such a .json file can be created for submission to the test server. Thank you!

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "main.py", line 23, in
from model import MACnet
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/model.py", line 6, in
import ops
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/ops.py", line 5, in
from mi_gru_cell import MiGRUCell
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/mi_gru_cell.py", line 4, in
class MiGRUCell(tf.nn.rnn_cell.RNNCell):
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).

Hi, I'm trying to run the baseline:
CUDA_VISIBLE_DEVICES=1,2 python main.py --expName "gqaLSTM-CNN" --train --testedNum 10000 --epochs 25 @configs/gqa/gqaLSTMCNN.txt

I used TensorFlow 1.5 with cuDNN 7.3.1 and CUDA Toolkit 9.0, but I got the error:
Preprocess data...
load dictionaries
Loading data...
Reading tier train
Reading tier val
Reading tier testdev
took 26.13 seconds
Loading word vectors...
loaded embs from file
took 0.02 seconds
Vectorizing data...
took 6.98 seconds
answerWordsNum
1845
took 35.19 seconds
Building model...
took 4.80 seconds
2019-04-08 09:23:06.386644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-08 09:23:11.112367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:06:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2019-04-08 09:23:11.251205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 1 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:09:00.0
totalMemory: 11.93GiB freeMemory: 2.15GiB
2019-04-08 09:23:11.251260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2019-04-08 09:23:11.251277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1
2019-04-08 09:23:11.251322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0: Y N
2019-04-08 09:23:11.251329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1: N Y
2019-04-08 09:23:11.251341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2019-04-08 09:23:11.251352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:09:00.0, compute capability: 5.2)
Initializing weights
Training epoch 1...
2019-04-08 09:23:22.341741: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-04-08 09:23:22.342735: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted

How could I fix it?

Best,
Ziyan

About MAC on GQA-like images

Hello,

I would like to run the model on images that are not in the GQA dataset, but as if they were in GQA (basically I just want to replace some images of the dataset with other images, and keep asking the same questions). For running the model on GQA I simply followed the instructions on the GQA branch, which consist of downloading the spatial features and the object features and then merging them.

But how do I extract those features from other images? I saw the extract_features.py script, but I don't fully understand how to use it to extract both spatial and object features. And what about the other parameters (image_height, image_width, model_stage, batch_size)? What should I use to extract features the same way as the ones you generated and made available for download?

Thanks in advance.

Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint

I could not run the evaluation code: python main.py --expName "clevrExperiment" --finalTest --testedNum 10000 --netLength 16 -r --getPreds --getAtt @configs/args.txt
due to the error: Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint.
Checking main.py, it appears the bias was not saved.
Can you suggest how to fix this issue, and where in the code the bias is named and saved to the checkpoint? Thanks very much.

NotFoundError: 2 root error(s) found.
(0) Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint
[[{{node save/RestoreV2}}]]
(1) Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_309]]
0 successful operations.
0 derived errors ignored.

consistency, validity, and plausibility in GQA

Dear @dorarad, I have encountered several problems when running a project on GQA. Could you please help me?

  1. Consistency evaluation. Which .json should be used to evaluate consistency? I used testdev_balanced_questions.json, but a KeyError occurred for ['2062326']. I found this id is included in testdev_all_questions.json.

  2. Validity and Plausibility. According to the provided eval.py, the JSON files should be train_choices.json and val_choices.json. A KeyError: '201497576' is triggered at the line valid = belongs(predicted, choices[qid]["valid"], question), and the two files have no ["valid"] or ["plausible"] fields.

Could you please help me to solve these problems? Thank you

About Evaluation

For the evaluation I ran the given command in the readme with the "--test" parameter, but it gives an "Index out of range" error. What might be the cause?

Testing on epoch 25...
Traceback (most recent call last):2 (0.00+0.24), lr 0.003, l = 2.3097, a = 0.5703, avL = 2.5296, avA = 0.6206, g = -1.0000, emL = 2.4391, emA = 0.6370; gqaExperiment
File "main.py", line 850, in
main()
File "main.py", line 777, in main
evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, evalTest = False, getPreds = True)
File "main.py", line 258, in runEvaluation
minLoss = prevRes["test"]["minLoss"] if prevRes else float("inf"))
File "main.py", line 573, in runEpoch
imagesBatch = loadImageBatch(data["images"], batch)
File "main.py", line 365, in loadImageBatch
imageBatch[i, 0:numObjects] = toFile(imageId)["features"][imageId["idx"], 0:numObjects]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 553, in getitem
selection = sel.select(self.shape, args, dsid=self.id)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
sel[args]
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in getitem
start, count, step, scalar = _handle_simple(self.shape,args)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 457, in _handle_simple
x,y,z = _translate_int(int(arg), length)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 477, in _translate_int
raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (150458) out of range (0-148854)

License?

Hi,

Awesome paper :) Question: what is the license for the code?

Hugh

Train on full GQA dataset

Hi, I am currently working on this dataset.
I followed the guide and successfully trained on Data1.2.zip and the CLEVR version,
but now I want to train on the full dataset from the website (70 GB).

  1. Is Dataset1.2.zip a small subset of the one on the website, or does it have all questions and images (just with some unnecessary parts removed)?
  2. If yes, what should I do to run the baseline model on the "full" dataset?

Thanks for your great work!

GQA 2020 submission

I generated the submit_predict.json and submitted it to the GQA evaluation server. However, I got an accuracy of 0 in the test phase, while the result in the dev phase makes sense. Is it possible that I predicted all answers wrong in the test split?

What is wrong with the submission file?
