stanfordnlp / mac-network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

License: Apache License 2.0

Python 100.00%
attention clevr machine-reasoning compositional-attention-networks tensorflow question-answering vqa

mac-network's Introduction

Compositional Attention Networks for Real-World Reasoning

Drew A. Hudson & Christopher D. Manning

Please note: We have updated the GQA challenge deadline to be May 15. Best of Luck! :)

This is the implementation of Compositional Attention Networks for Machine Reasoning (ICLR 2018) on two visual reasoning datasets: the CLEVR dataset and the new GQA dataset (CVPR 2019). We propose a fully differentiable model that learns to perform multi-step reasoning. See our website and blogpost for more information about the model!

In particular, the implementation includes the MAC cell at mac_cell.py. The code supports the standard cell as presented in the paper as well as additional extensions and variants. Run python main.py -h or see config.py for the complete list of options.
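For readers who want a feel for the recurrence before reading mac_cell.py, below is a minimal NumPy sketch of a single reasoning step (control, read, write) based on the paper's high-level description. It is a conceptual illustration with placeholder names and shapes, not the repository's TensorFlow implementation.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mac_step(control, memory, question, word_states, knowledge, params):
    """One conceptual MAC step: control -> read -> write.

    control:     (d,)   previous control state
    memory:      (d,)   previous memory state
    question:    (d,)   sentence-level question vector
    word_states: (L, d) contextual word representations of the question
    knowledge:   (N, d) knowledge base (e.g. image region features)
    params:      dict with "Wcq" (d, 2d), "Wm" (d, d), "Wwrite" (d, 2d)
    """
    # Control unit: attend over the question words to select the current reasoning operation.
    cq = params["Wcq"] @ np.concatenate([control, question])
    c_new = softmax(word_states @ cq) @ word_states

    # Read unit: attend over the knowledge base, guided by the memory and the new control.
    interactions = knowledge * (params["Wm"] @ memory)
    retrieved = softmax((interactions * c_new).sum(axis=1)) @ knowledge

    # Write unit: integrate the retrieved information into the memory state.
    m_new = params["Wwrite"] @ np.concatenate([retrieved, memory])
    return c_new, m_new

# A network of p cells applies mac_step p times (p corresponds to --netLength).
d, L, N = 8, 5, 10
rng = np.random.default_rng(0)
params = {"Wcq": rng.normal(size=(d, 2 * d)), "Wm": rng.normal(size=(d, d)), "Wwrite": rng.normal(size=(d, 2 * d))}
c, m = np.zeros(d), np.zeros(d)
for _ in range(4):
    c, m = mac_step(c, m, rng.normal(size=d), rng.normal(size=(L, d)), rng.normal(size=(N, d)), params)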

The adaptation of MAC as well as several baselines for the GQA dataset are located at the GQA branch.

Bibtex

For MAC:

@inproceedings{hudson2018compositional,
  title={Compositional Attention Networks for Machine Reasoning},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

For the GQA dataset:

@inproceedings{hudson2018gqa,
  title={GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Requirements

  • TensorFlow (originally developed with version 1.3, but it should work with later versions as well).
  • We performed experiments on a Maxwell Titan X GPU and assume 12GB of GPU memory.
  • See requirements.txt for the required Python packages and run pip install -r requirements.txt to install them.

Pre-processing

Before training the model, we first have to download the CLEVR dataset and extract features for the images:

Dataset

To download and unpack the data, run the following commands:

wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip
unzip CLEVR_v1.0.zip
mv CLEVR_v1.0 CLEVR_v1
mkdir CLEVR_v1/data
mv CLEVR_v1/questions/* CLEVR_v1/data/

The final command moves the dataset questions into the data directory, where we will put all the data files we use during training.

Feature extraction

Extract ResNet-101 features for the CLEVR train, val, and test images with the following commands:

python extract_features.py --input_image_dir CLEVR_v1/images/train --output_h5_file CLEVR_v1/data/train.h5 --batch_size 32
python extract_features.py --input_image_dir CLEVR_v1/images/val --output_h5_file CLEVR_v1/data/val.h5 --batch_size 32
python extract_features.py --input_image_dir CLEVR_v1/images/test --output_h5_file CLEVR_v1/data/test.h5 --batch_size 32
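To sanity-check the extraction, you can open one of the resulting HDF5 files with a few lines of Python. The snippet below is a minimal sketch that assumes the features are stored under a "features" dataset (the key the training code reads); adjust the key and the expected shape if your file differs.

import h5py

# Quick sanity check of the extracted CLEVR features (assumes a "features" dataset).
with h5py.File("CLEVR_v1/data/train.h5", "r") as f:
    print(list(f.keys()))            # datasets stored in the file
    feats = f["features"]
    print(feats.shape, feats.dtype)  # roughly (num_images, 1024, 14, 14) for ResNet-101 conv4 features
    print(feats[0].mean())           # read a single image's feature map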

Training

To train the model, run the following command:

python main.py --expName "clevrExperiment" --train --testedNum 10000 --epochs 25 --netLength 4 @configs/args.txt

First, the program preprocesses the CLEVR questions. It tokenizes them and maps them to integers to prepare them for the network. It then stores a JSON file with this information, as well as word-to-integer dictionaries, in the ./CLEVR_v1/data directory.

Then, the program trains the model. Weights are saved by default to ./weights/{expName} and statistics about the training are collected in ./results/{expName}, where expName is the name we choose to give to the current experiment.

Notes

  • The number of examples used for training and evaluation can be set by --trainedNum and --testedNum respectively.
  • You can use the -r flag to restore and continue training a previously pre-trained model.
  • We recommend varying the number of MAC cells used in the network through the --netLength option to explore different reasoning-process lengths.
  • Good lengths for CLEVR are in the range of 4-16 (using more cells tends to converge faster and achieve slightly higher accuracy, while using fewer cells usually results in more easily interpretable attention maps).

Model variants

We have explored several variants of our model. We provide a few examples in configs/args2-4.txt. For instance, you can run the first with:

python main.py --expName "experiment1" --train --testedNum 10000 --epochs 40 --netLength 6 @configs/args2.txt
  • args2 uses a non-recurrent variant of the control unit that converges faster.
  • args3 incorporates self-attention into the write unit.
  • args4 adds control-based gating over the memory.

See config.py for further available options (Note that some of them are still in an experimental stage).

Evaluation

To evaluate the trained model, and get predictions and attention maps, run the following:

python main.py --expName "clevrExperiment" --finalTest --testedNum 10000 --netLength 16 -r --getPreds --getAtt @configs/args.txt

The command will restore the trained model and evaluate it on the validation set. JSON files with the predictions and the attention distributions produced by the model are saved by default to ./preds/{expName}; a short loading sketch follows the note below.

  • If you are interested in attention maps (--getAtt), we advise limiting the number of evaluated examples to 5,000-20,000 to avoid very large prediction files.
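The exact layout of the prediction files is not documented here, so the following is only a rough loading sketch: the file name and the record keys ("prediction", "answer") are assumptions; print one record first and adapt the keys to what you actually see.

import json

# Hypothetical file name under ./preds/{expName}; list the directory to find the real one.
with open("preds/clevrExperiment/valPredictions-clevrExperiment.json") as f:
    preds = json.load(f)

print(len(preds))
print(preds[0])  # inspect one record to confirm the actual field names

# Assumed keys; replace with the real ones from the record above.
correct = sum(1 for p in preds if p.get("prediction") == p.get("answer"))
print("accuracy:", correct / len(preds))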

Visualization

After we evaluate the model with the command above, we can visualize the attention maps generated by running:

python visualization.py --expName "clevrExperiment" --tier val 

(The tier can also be set to train or test.) The script supports filtering the visualized questions in various ways; see visualization.py for further details.

To get more interpretable visualizations, it is highly recommended to reduce the number of cells to 4-8 (--netLength). Using more cells allows the network to learn more effective ways to approach the task, but these tend to be less interpretable than those of shorter networks (with fewer cells).

Optionally, to make the image attention maps look a little bit nicer, you can do the following (using imagemagick):

for x in preds/clevrExperiment/*Img*.png; do magick convert $x -brightness-contrast 20x35 $x; done;

Thank you for your interest in our model! Please contact me at [email protected] for any questions, comments, or suggestions! :-)

mac-network's People

Contributors

dorarad, drtonyr, kamalkraj


mac-network's Issues

About Grounding in GQA

Thanks for the repo

I see that you did the grounding experiments on the GQA dataset in the paper, but I'm confused about them. For example, how did you transform GQA questions into grounding sentences, and what is the corresponding answer for a GQA question in the grounding experiments? Could you give more detail? Thanks.

Training on VQAv2

I see that there is an option to set the dataset to VQA. I wanted to know whether I could train it on VQAv2 and, if so, how?

preprocess.py extraDataset

First of all, I am trying to use extraDataset. I found one bug in the preprocessData function:
for tier in extraData should be for tier in extraDataset

Secondly, it is unclear how to use the extra options, especially what extraVal means. Does it mean that only the extra validation set is used for validation? The code seems to only train on the validation set, so does it mean the model trains only on the validation set of the extra data?

Number of unique answers.

Thank you very much for the nice dataset!

I have a question about the number of unique answers in the GQA dataset.
When computing the number of unique answers I get:

  • 1845 answers for the training split (based on combining each 'answer' of the 10 training files)
  • 1852 answers for train + valid
  • 1853 for train + valid + test_dev splits.

In the paper you mention that there are 1878; is this discrepancy caused by some answers being present only in the test split?

Have a great day :)

Yana

Evaluation error

Hello! Thank you so much for putting up this beautiful work!

After training on the args.txt configuration, I proceeded to evaluate as per the instructions. I obtained the following error. Do you know what is going on?

2020-06-26 00:53:56.788567: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-26 00:53:56.795400: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599845000 Hz
2020-06-26 00:53:56.795894: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xecc6dc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-26 00:53:56.795930: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-26 00:53:56.801185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-26 00:53:56.923729: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xecf9b60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-26 00:53:56.923796: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P40, Compute Capability 6.1
2020-06-26 00:53:56.925923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:84:00.0
2020-06-26 00:53:56.927409: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.928514: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.929673: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.930791: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.932004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933170: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933611: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /share/apps/python3/3.7.3/intel/lib:/share/apps/gcc/6.3.0/lib64:/share/apps/gcc/6.3.0/lib:/share/apps/mpc/1.0.3/gnu/lib:/share/apps/mpfr/3.1.5/gnu/lib:/share/apps/gmp/6.1.2/gnu/lib:/share/apps/intel/19.0.1/mkl/lib/intel64:/share/apps/intel/19.0.1/lib/intel64:/share/apps/centos/7/usr/lib64:/opt/slurm/lib64
2020-06-26 00:53:56.933650: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-26 00:53:56.933691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-26 00:53:56.933715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-06-26 00:53:56.933736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
Preprocess data...
Loading data...
took 76.19 seconds
Loading word vectors...
0
{'': 0, '': 1, '': 2, '': 3, 'are': 4, 'there': 5, 'more': 6, 'big': 7, 'green': 8, 'things': 9, 'than': 10, 'large': 11, 'purple': 12, 'shiny': 13, 'cubes': 14, 'how': 15, 'many': 16, 'other': 17, 'of': 18, 'the': 19, 'same': 20, 'shape': 21, 'as': 22, 'tiny': 23, 'cyan': 24, 'matte': 25, 'object': 26, 'is': 27, 'color': 28, 'sphere': 29, 'cube': 30, 'what': 31, 'material': 32, 'that': 33, 'right': 34, 'brown': 35, 'cylinder': 36, 'and': 37, 'left': 38, 'gray': 39, 'on': 40, 'side': 41, 'small': 42, 'rubber': 43, 'behind': 44, 'thing': 45, 'to': 46, 'metallic': 47, 'size': 48, 'any': 49, 'have': 50, 'block': 51, 'blue': 52, 'yellow': 53, 'a': 54, ';': 55, 'it': 56, 'ball': 57, 'its': 58, 'in': 59, 'front': 60, 'does': 61, 'number': 62, 'red': 63, 'spheres': 64, 'made': 65, 'metal': 66, 'cylinders': 67, 'both': 68, 'balls': 69, 'or': 70, 'blocks': 71, 'objects': 72, 'visible': 73, 'another': 74, 'has': 75, 'greater': 76, 'fewer': 77, 'less': 78, 'either': 79, 'anything': 80, 'else': 81, 'do': 82, 'an': 83, 'equal': 84}
85
{'yes': 0, '2': 1, 'no': 2, 'rubber': 3, 'large': 4, '0': 5, 'sphere': 6, 'gray': 7, 'cube': 8, 'blue': 9, 'brown': 10, '1': 11, 'yellow': 12, 'purple': 13, 'cylinder': 14, 'small': 15, 'green': 16, 'metal': 17, '3': 18, '4': 19, 'cyan': 20, '6': 21, 'red': 22, '5': 23, '8': 24, '7': 25, '9': 26, '10': 27}
28
{'': 0, '': 1, '': 2, '': 3, 'are': 4, 'there': 5, 'more': 6, 'big': 7, 'green': 8, 'things': 9, 'than': 10, 'large': 11, 'purple': 12, 'shiny': 13, 'cubes': 14, 'yes': 15, 'how': 16, 'many': 17, 'other': 18, 'of': 19, 'the': 20, 'same': 21, 'shape': 22, 'as': 23, 'tiny': 24, 'cyan': 25, 'matte': 26, 'object': 27, '2': 28, 'is': 29, 'color': 30, 'sphere': 31, 'cube': 32, 'no': 33, 'what': 34, 'material': 35, 'that': 36, 'right': 37, 'brown': 38, 'cylinder': 39, 'and': 40, 'left': 41, 'rubber': 42, 'gray': 43, 'on': 44, 'side': 45, 'small': 46, 'behind': 47, 'thing': 48, '0': 49, 'to': 50, 'metallic': 51, 'size': 52, 'any': 53, 'have': 54, 'block': 55, 'blue': 56, 'yellow': 57, 'a': 58, ';': 59, 'it': 60, 'ball': 61, 'its': 62, 'in': 63, 'front': 64, 'does': 65, 'number': 66, 'red': 67, 'spheres': 68, 'made': 69, 'metal': 70, 'cylinders': 71, '1': 72, 'both': 73, 'balls': 74, 'or': 75, 'blocks': 76, 'objects': 77, 'visible': 78, 'another': 79, 'has': 80, 'greater': 81, 'fewer': 82, 'less': 83, '3': 84, '4': 85, 'either': 86, 'anything': 87, 'else': 88, 'do': 89, '6': 90, 'an': 91, 'equal': 92, '5': 93, '8': 94, '7': 95, '9': 96, '10': 97}
98
took 0.00 seconds
Vectorizing data...
took 13.70 seconds
took 89.90 seconds
Building model...
took 17.59 seconds
Traceback (most recent call last):
File "main.py", line 802, in
main()
File "main.py", line 691, in main
epoch = loadWeights(sess, saver, init)
File "main.py", line 190, in loadWeights
config.restoreEpoch, config.lr = lastLoggedEpoch()
File "main.py", line 62, in lastLoggedEpoch
epoch = int(lastLine[0])
ValueError: invalid literal for int() with base 10: 'epoch'

Thank you very much in advance for the help! :) Take care :)

Pretrained mac network

Thanks for this awesome code base and dataset! :)
Do you plan to release pretrained weights for the mac network?

-bash: fork: Cannot allocate memory

Thanks for the repo

On trying to evaluate the model using python main.py --expName "gqaExperiment" --finalTest --testedNum 1000 --netLength 4 -r --submission --getPreds @configs/gqa/gqa.txt, it always seems to stop preprocessing at 64% and gives the error -bash: fork: Cannot allocate memory

Does this have something to do with having only 16 GB of RAM or would this be because of some other issue?

Thanks

About object features

Hi~, thank you for your great work. I have one question about the object features. The object feature files contain only the object features and the objects' bounding boxes; how can I know the object classes and attributes? I know that sceneGraphs.zip provides information about the objects, attributes, and relations in the image, but how does this information correspond to the object feature files?
Looking forward to your response, thank you~

ValueError while training on data1.2

I followed the instructions in the readme.md, and the training part previously worked with no errors with data.zip.

However, when I cloned the current version of the gqa branch and followed the instructions in the same readme from the beginning (downloading everything again and merging), I got the following error during training, just after the first epoch:

ValueError: Index (148690) out of range (0-108076)

Full stack trace after the first epoch is as follows:

eb  1, 78 (10000 / 10000), t = 0.32 (0.00+0.23), lr 0.0003, l = 1.9574, a = 0.5469, avL = 1.9961, avA = 0.5434, g = -1.0000, emL = 2.0224, emA = 0.5367; gqaExTraceback (most recent call last):
  File "main.py", line 848, in <module>
    main()
  File "main.py", line 710, in main
    evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, getPreds = getPreds, prevRes = evalRes)
  File "main.py", line 251, in runEvaluation
    minLoss = prevRes["val"]["minLoss"] if prevRes else float("inf"))
  File "main.py", line 571, in runEpoch
    imagesBatch = loadImageBatch(data["images"], batch)
  File "main.py", line 363, in loadImageBatch
    imageBatch[i, 0:numObjects] = toFile(imageId)["features"][imageId["idx"], 0:numObjects]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 553, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
    sel[args]
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 457, in _handle_simple
    x,y,z = _translate_int(int(arg), length)
  File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 477, in _translate_int
    raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (148690) out of range (0-108076)
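One way to narrow down errors like this is to compare the image row indices the loader uses against the number of rows in the corresponding features file. The sketch below is a rough diagnostic, not part of the repository: the file names (gqa_objects.h5, gqa_objects_info.json) and the assumption of a single, un-chunked features file are placeholders for your local setup; only the "features" dataset and the "idx" field follow the trace above.

import json
import h5py

FEATURES_FILE = "gqa_objects.h5"          # placeholder: your merged object-features file
INFO_FILE = "gqa_objects_info.json"       # placeholder: imageId -> {"idx": row, ...} mapping

with h5py.File(FEATURES_FILE, "r") as f:
    num_rows = f["features"].shape[0]

with open(INFO_FILE) as f:
    info = json.load(f)

# Any index >= num_rows would reproduce the "Index out of range" error above.
bad = [img_id for img_id, v in info.items() if v.get("idx", 0) >= num_rows]
print("feature rows:", num_rows, "| out-of-range image ids:", bad[:10])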

using it for custom questions?

Hey,

After training, I want to use the model to answer custom questions about a given image from the dataset.

I am not sure if I am right, but it feels like the code does not have that mode. In particular, it loads the JSON file for val/test and re-initializes the embedding. Also, the vocab file takes quite a while to load from the .pkl files.
Can you please help me with this?

About memory's variational dropout

First, thanks for sharing your great work.

As following your code, a question came across about your variational dropout on memory vector. (https://github.com/stanfordnlp/mac-network/blob/master/mac_cell.py#L215, https://github.com/stanfordnlp/mac-network/blob/master/mac_cell.py#L590)

It seems the mask is generated once when the graph is built, keeps its shape (64, 512) afterwards, and is always applied, whether the model is in training or evaluation.

Since this kind of dropout produces stochastic results at evaluation time and accepts only a fixed batch size, I am wondering whether it is OK to apply such a method.

Correct me if I'm reading the code wrong.
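For context, the sketch below shows roughly what variational (recurrent) dropout usually looks like: one mask is sampled per forward pass and shared across timesteps, but it is resampled for every batch and disabled at evaluation, which is the behaviour the question is asking about. This is a NumPy illustration of the general technique only, not the repository's implementation.

import numpy as np

rng = np.random.default_rng(0)

def variational_dropout_mask(batch_size, dim, keep_prob, training):
    """One mask per forward pass, reused across timesteps; identity at evaluation time."""
    if not training:
        return np.ones((batch_size, dim))
    mask = (rng.random((batch_size, dim)) < keep_prob).astype(float)
    return mask / keep_prob                      # inverted-dropout scaling

def run_reasoning_steps(memory, num_steps, keep_prob, training):
    # The mask is sampled once per forward pass, then applied at every step.
    mask = variational_dropout_mask(*memory.shape, keep_prob, training)
    for _ in range(num_steps):
        memory = np.tanh(memory * mask)          # stand-in for the real memory update
    return memory

# Resample per batch during training, and pass training=False at evaluation.
out_train = run_reasoning_steps(np.ones((64, 512)), num_steps=4, keep_prob=0.85, training=True)
out_eval = run_reasoning_steps(np.ones((64, 512)), num_steps=4, keep_prob=0.85, training=False)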

Scene graph baseline for GQA

Hello,

Is there a way to run the scene graph baseline reported in the paper or are there any available details on how to implement it?

Scene graph

Just wondering, does anyone have the scene graphs for all splits of the GQA dataset?

I can't reproduce the performance reported in the paper

Thanks for your interesting work; I ran into some confusion when running your code.
I ran mac-network on the GQA dataset following your GitHub guidance, but the validation accuracy is lower than the paper reports. The performance I get is:
mac-network valid accuracy: 43.82
GQA-LSTM valid accuracy: 45.54
GQA-LSTMCNN valid accuracy: 42.03
Am I doing something wrong?

About object number

Hi,
I just processed the scene graphs and the object features, but I found that the number of objects in scenegraph.json is not equal to that in the gqa_objects.h5 file. For example, the number of objects for image '2386621' is 16 in train_sceneGraphs.json but 18 in the gqa_objects.h5 file. Is there anything wrong with my processing? And how can I match the object numbers to the features?
Thanks!

KeyError: '11183447'

Hi, I uploaded test.json to EvalAI (test2019 phase) and got the following issue:

Traceback (most recent call last):
File "/code/scripts/workers/submission_worker.py", line 336, in run_submission
submission_metadata=submission_serializer.data,
File "/tmp/tmpnr9tlxl6/compute/challenge_data/challenge_225/main.py", line 96, in evaluate
output["result"].append({tier: getScores(questions, questions, predictions, tier, kwargs['submission_metadata']['method_name'])})
File "/tmp/tmpnr9tlxl6/compute/challenge_data/challenge_225/main.py", line 315, in getScores
predicted = predictions[qid]
KeyError: '11183447'

I find that the question id '11183447' is in the validation split, not the test split,
so it is strange that there is a KeyError here in the test2019 phase.

questions are not related to images

Hi,

I just trained a baseline (LSTM+CNN) and checked the predictions, but I noticed that in the generated JSON file several images have questions that do not correspond to the objects in them. For example, this image (id: 2359959) has the questions:
Is there a sandwich in the image?
What kind of food is it?
Is the sandwich on the right?
Are there any clocks or flags?

But actually there is no food in the image.

Also, in this image (id: 2371593) the question is:
In which part of the picture is the cat, the bottom or the top?

But actually the object is a person.

The questions are automatically generated for the images using the scene graph, and I am confused about the step at which these mistakes may happen.

Thanks!

About fine-tuning on CLEVR-Humans

Hello again!

If you don't mind, I have one more question for detailed procedure of fine-tuning the CLEVR-Humans dataset.

I was able to reproduce the 12-step MAC's accuracy (98.9%) using PyTorch, but failed to reproduce the Humans result after fine-tuning (I got 76.6%, lower than the paper's 81.5%).

My fine-tuning was done by (1) loading the model fully trained on CLEVR, (2) initializing the new words' embedding vectors the same way as the original words, and (3) re-training the model on the CLEVR-Humans train set ONLY, following the original model's learning schedule.

It seems your fine-tuning code trains the model on a mixture of the CLEVR and CLEVR-Humans train sets rather than only the CLEVR-Humans train set (sorry if I misread again 😢), so I'm guessing this difference might be the reason.

Since using the mixture of both datasets will take longer than just using CLEVR-Humans, I'm opening this issue thinking you might have encountered the same problem and could help me out.

Thanks!

how to submit to test server

Thanks for this nice repo. I have run experiments on GQA without any problem. After training the model, however, I could not find instructions on how to create the .json file for submitting to the EvalAI test server. Maybe you mentioned it somewhere, but I did not find it. It would be good if you could let me know how such a .json file can be created for submission to the test server. Thank you!

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "main.py", line 23, in
from model import MACnet
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/model.py", line 6, in
import ops
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/ops.py", line 5, in
from mi_gru_cell import MiGRUCell
File "/home/gpuuser/shikha_phd/shikha/mac-network_figure_qa/mi_gru_cell.py", line 4, in
class MiGRUCell(tf.nn.rnn_cell.RNNCell):
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).

Hi, I'm trying to run the baseline:
CUDA_VISIBLE_DEVICES=1,2 python main.py --expName "gqaLSTM-CNN" --train --testedNum 10000 --epochs 25 @configs/gqa/gqaLSTMCNN.txt

I used TensorFlow 1.5 with cuDNN 7.3.1 and CUDA Toolkit 9.0, but I got the error:
Preprocess data...
load dictionaries
Loading data...
Reading tier train
Reading tier val
Reading tier testdev
took 26.13 seconds
Loading word vectors...
loaded embs from file
took 0.02 seconds
Vectorizing data...
took 6.98 seconds
answerWordsNum
1845
took 35.19 seconds
Building model...
took 4.80 seconds
2019-04-08 09:23:06.386644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-08 09:23:11.112367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:06:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2019-04-08 09:23:11.251205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 1 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:09:00.0
totalMemory: 11.93GiB freeMemory: 2.15GiB
2019-04-08 09:23:11.251260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2019-04-08 09:23:11.251277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1
2019-04-08 09:23:11.251322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0: Y N
2019-04-08 09:23:11.251329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1: N Y
2019-04-08 09:23:11.251341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2019-04-08 09:23:11.251352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:09:00.0, compute capability: 5.2)
Initializing weights
Training epoch 1...
2019-04-08 09:23:22.341741: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-04-08 09:23:22.342735: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted

How could I fix it?

Best,
Ziyan

About MAC on GQA-like images

Hello,

I would like to run the model on images that are not in the GQA dataset, but as if they were in GQA (basically I just want to replace some images of the dataset with other images, and keep asking the same questions). For running the model on GQA I simply followed the instructions on the GQA branch, which consist of downloading the spatial features and the object features and then merging them.

But how do I extract those features from other images? I saw the extract_features.py script, but I don't fully understand how to use it to extract both spatial and object features. And what about the other parameters (image_height, image_width, model_stage, batch_size)? What should I use to extract features the same way as the ones you generated and made available for download?

Thanks in advance.

Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint

I could not run the evaluation code: python main.py --expName "clevrExperiment" --finalTest --testedNum 10000 --netLength 16 -r --getPreds --getAtt @configs/args.txt
due to the error: Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint.
Checking main.py, it appears the bias was not saved.
Can you suggest how to fix this issue, and where in the code the bias is named and saved to the checkpoint? Thanks very much.

NotFoundError: 2 root error(s) found.
(0) Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint
[[{{node save/RestoreV2}}]]
(1) Not found: Key macModel/MACnetwork/MACCell/linearLayerqInput10/biases/bias not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_309]]
0 successful operations.
0 derived errors ignored.

consistency, validity, and plausibility in GQA

Dear @dorarad, I have encountered several problems when running a project on GQA. Could you please help me?

  1. Consistency evaluation. Which .json should be used to evaluate consistency? I used testdev_balanced_questions.json, but a KeyError occurred for ['2062326']. I found this id is included in testdev_all_questions.json.

  2. Validity and Plausibility. According to the provided eval.py, the JSON files should be train_choices.json and val_choices.json. A KeyError: '201497576' is triggered at the line valid = belongs(predicted, choices[qid]["valid"], question), and the two files have no ["valid"] or ["plausible"] fields.

Could you please help me to solve these problems? Thank you

About Evaluation

For the evaluation I ran the given command in the readme with the "--test" parameter, but it gives an "Index out of range" error. What might be the cause?

Testing on epoch 25...
Traceback (most recent call last):2 (0.00+0.24), lr 0.003, l = 2.3097, a = 0.5703, avL = 2.5296, avA = 0.6206, g = -1.0000, emL = 2.4391, emA = 0.6370; gqaExperiment
File "main.py", line 850, in
main()
File "main.py", line 777, in main
evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, evalTest = False, getPreds = True)
File "main.py", line 258, in runEvaluation
minLoss = prevRes["test"]["minLoss"] if prevRes else float("inf"))
File "main.py", line 573, in runEpoch
imagesBatch = loadImageBatch(data["images"], batch)
File "main.py", line 365, in loadImageBatch
imageBatch[i, 0:numObjects] = toFile(imageId)["features"][imageId["idx"], 0:numObjects]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 553, in getitem
selection = sel.select(self.shape, args, dsid=self.id)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
sel[args]
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in getitem
start, count, step, scalar = _handle_simple(self.shape,args)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 457, in _handle_simple
x,y,z = _translate_int(int(arg), length)
File "/home/ec2-user/conda/envs/tf_gpu/lib/python3.6/site-packages/h5py/_hl/selections.py", line 477, in _translate_int
raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (150458) out of range (0-148854)

License?

Hi,

Awesome paper :) Question: what is the license for the code?

Hugh

Train on full GQA dataset

Hi, I am currently working on this dataset.
I followed the guide and successfully trained on Data1.2.zip and the CLEVR version,
but now I want to train on the full dataset from the website (70 GB).

  1. Is Dataset1.2.zip a small subset of the one on the website, or does it have all questions and images (just with some unnecessary parts removed)?
  2. If yes, what should I do to run the baseline model on the "full" dataset?

Thanks for your great work!

GQA 2020 submission

I generated the submit_predict.json and submitted it to the GQA evaluation server. However, I got an accuracy of 0 in the test phase, while the result in the dev phase makes sense. Is it possible that I predicted all answers wrong in the test split?

What is wrong with the submission file?
