
iter-reason's Introduction

Iterative Visual Reasoning Beyond Convolutions

By Xinlei Chen, Li-Jia Li, Li Fei-Fei and Abhinav Gupta.

Disclaimer

  • This is the authors' implementation of the system described in the paper, not an official Google product.
  • Right now:
    • The available reasoning module is based on convolutions and spatial memory.
  • For simplicity, the released code uses TensorFlow's default crop_and_resize operation rather than the customized one reported in the paper (I find the default one is actually better by ~1%); see the sketch below.
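
For reference, the stock operation takes normalized boxes and bilinearly resamples each region of a feature map to a fixed grid. A minimal sketch of its use, assuming TF 1.x and hypothetical feature-map shapes:

import tensorflow as tf

# Hypothetical conv feature map (batch of 1) and N region boxes.
features = tf.placeholder(tf.float32, [1, 38, 50, 1024])
# Boxes are normalized [y1, x1, y2, x2] rows, one per region.
boxes = tf.placeholder(tf.float32, [None, 4])
# box_ind maps each box to its image in the batch (all zeros for batch 1).
box_ind = tf.zeros([tf.shape(boxes)[0]], dtype=tf.int32)
# Bilinearly resample each region to a fixed 7x7 grid of features.
crops = tf.image.crop_and_resize(features, boxes, box_ind, crop_size=[7, 7])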

Prerequisites

  1. TensorFlow, tested with version 1.6 on Ubuntu 16.04, installed with:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp27-none-linux_x86_64.whl
  2. Other required packages can be installed with pip:
pip install Cython easydict matplotlib opencv-python Pillow pyyaml scipy
  3. For running COCO, the API can be installed globally:
# any path is okay
mkdir ~/install && cd ~/install
git clone https://github.com/cocodataset/cocoapi.git cocoapi
cd cocoapi/PythonAPI
python setup.py install --user
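
A quick sanity check that the prerequisites are in place (a minimal sketch; the pycocotools import is only needed for COCO experiments):

import tensorflow as tf
print(tf.__version__)  # expect 1.6.0 with the wheel above

from pycocotools.coco import COCO  # raises ImportError if the COCO API is missing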

Setup and Running

  1. Clone the repository.
git clone https://github.com/endernewton/iter-reason.git
cd iter-reason
  2. Set up the data. Here we use ADE20K as an example.
mkdir -p data/ADE
cd data/ADE
wget -v http://groups.csail.mit.edu/vision/datasets/ADE20K/ADE20K_2016_07_26.zip
unzip ADE20K_2016_07_26.zip
mv ADE20K_2016_07_26/* ./
rmdir ADE20K_2016_07_26
# then get the train/val/test split
wget -v http://xinleic.xyz/data/ADE_split.tar.gz
tar -xzvf ADE_split.tar.gz
rm -vf ADE_split.tar.gz
cd ../..
  3. Set up the pre-trained ImageNet models. This follows the same procedure as tf-faster-rcnn. By default we use ResNet-50 as the backbone:
 mkdir -p data/imagenet_weights
 cd data/imagenet_weights
 wget -v http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
 tar -xzvf resnet_v1_50_2016_08_28.tar.gz
 mv resnet_v1_50.ckpt res50.ckpt
 cd ../..
  4. Compile the library (for computing bounding box overlaps; a NumPy sketch of this computation appears after this list).
cd lib
make
cd ..
  5. Now you are ready to run! For example, to train and test the baseline:
./experiments/scripts/train.sh [GPU_ID] [DATASET] [NET] [STEPS] [ITER] 
# GPU_ID is the GPU you want to test on
# DATASET in {ade, coco, vg} is the dataset to train/test on, defined in the script
# NET in {res50, res101} is the backbone network to choose from
# STEPS (x10K) is the number of iterations before the learning rate is reduced; multiple steps can be separated by the character 'a' (see the parsing sketch after this list)
# ITER (x10K) is the total number of iterations to run
# Examples:
# train on ADE20K for 320K iterations, reducing learning rate at 280K.
./experiments/scripts/train.sh 0 ade 28 32
# train on COCO for 720K iterations, reducing at 320K and 560K.
./experiments/scripts/train.sh 1 coco 32a56 72
  6. To train and test the reasoning modules (based on ResNet-50):
./experiments/scripts/train_memory.sh [GPU_ID] [DATASET] [MEM] [STEPS] [ITER] 
# MEM in {local} is the type of reasoning module to use
# Examples:
# train on ADE20K with the local spatial memory.
./experiments/scripts/train_memory.sh 0 ade local 28 32
  7. Once training is done, you can test the models separately with test.sh and test_memory.sh. We also provide a separate set of scripts for testing on larger image inputs.

  8. You can use TensorBoard to visualize and track training progress, for example:

tensorboard --logdir=tensorboard/res50/ade_train_5/ --port=7002 &
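
For reference, the library compiled in step 4 implements the bounding-box overlap (IoU) computation in Cython for speed. A pure-NumPy sketch of the same quantity (a hypothetical helper, using the +1 pixel convention common in Faster R-CNN code):

import numpy as np

def bbox_overlaps(boxes, query_boxes):
    # IoU between each row of `boxes` and each row of `query_boxes`,
    # where every box is [x1, y1, x2, y2].
    overlaps = np.zeros((boxes.shape[0], query_boxes.shape[0]))
    for k, q in enumerate(query_boxes):
        q_area = (q[2] - q[0] + 1.) * (q[3] - q[1] + 1.)
        for n, b in enumerate(boxes):
            iw = min(b[2], q[2]) - max(b[0], q[0]) + 1
            ih = min(b[3], q[3]) - max(b[1], q[1]) + 1
            if iw > 0 and ih > 0:
                b_area = (b[2] - b[0] + 1.) * (b[3] - b[1] + 1.)
                overlaps[n, k] = iw * ih / (q_area + b_area - iw * ih)
    return overlaps

boxes = np.array([[0., 0., 9., 9.]])
gt = np.array([[5., 5., 14., 14.]])
print(bbox_overlaps(boxes, gt))  # ~0.143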
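Similarly, the STEPS argument in step 5 is just a list of 10K-iteration multiples joined by the character 'a'. A hypothetical Python mirror of what the shell script expands it to:

def parse_steps(steps, unit=10000):
    # '28' -> [280000]; '32a56' -> [320000, 560000]
    return [int(s) * unit for s in steps.split('a')]

print(parse_steps('32a56'))  # [320000, 560000], matching STEPSIZE in the logs below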

References

@inproceedings{chen18iterative,
    author = {Xinlei Chen and Li-Jia Li and Li Fei-Fei and Abhinav Gupta},
    title = {Iterative Visual Reasoning Beyond Convolutions},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year = {2018}
}

The idea of spatial memory was developed in:

@inproceedings{chen2017spatial,
    author = {Xinlei Chen and Abhinav Gupta},
    title = {Spatial Memory for Context Reasoning in Object Detection},
    booktitle = {Proceedings of the International Conference on Computer Vision},
    year = {2017}
}


iter-reason's Issues

some questions about vgd

What is the file synsets.txt in the folder /lib/datasets/visual_genome (#24)?
I cannot find this file in this project or on the Visual Genome website http://visualgenome.org/api/v0/api_home.html.

Graph-based module

Hi Xinlei,
I like your paper very much, but I have some confusion about the graph-based reasoning. In the issues, I found that you have provided some raw code, but it has not been incorporated into the main codebase. Did you continue to complete this work later?
Looking forward to your answer.

How to test my own image?

Hello, I am very interested in your novel method. When I finish running "train_memory.sh", how can I test my own image? Do I just apply "test_memory.sh"? I would like to know the details, thanks!

Reshape error when testing on COCO

Hello, I was trying to run this code but got a reshape error when testing on COCO.
I finished training and got a model. Then I ran ./experiments/scripts/test_memory.sh 0 coco local 32a56 72 and got output like this:

+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ NET_BASE=res101
+ GPU_ID=0
+ DATASET=coco
+ NET=res101_local
+ OIFS=' 	
'
+ IFS=a
+ STEP=32a56
+ STEPSIZE='['
+ for i in '$STEP'
+ STEPSIZE='[320000,'
+ for i in '$STEP'
+ STEPSIZE='[320000,560000,'
+ STEPSIZE='[320000,560000,]'
+ IFS=' 	
'
+ ITERS=720000
+ array=($@)
+ len=5
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=coco_2014_train+coco_2014_valminusminival
+ TEST_IMDBS=("coco_2014_minival")
+ declare -a TEST_IMDBS
+ [[ ! -z '' ]]
+ EXTRA_ARGS_SLUG=32a56_72
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/test_res101_local_coco_2014_train+coco_2014_valminusminival_32a56_72.txt.2018-12-28_11-12-35
+ exec
++ tee -a experiments/logs/test_res101_local_coco_2014_train+coco_2014_valminusminival_32a56_72.txt.2018-12-28_11-12-35
+ echo Logging output to experiments/logs/test_res101_local_coco_2014_train+coco_2014_valminusminival_32a56_72.txt.2018-12-28_11-12-35
Logging output to experiments/logs/test_res101_local_coco_2014_train+coco_2014_valminusminival_32a56_72.txt.2018-12-28_11-12-35
+ set +x
+ for TEST_IMDB in '"${TEST_IMDBS[@]}"'
+ CUDA_VISIBLE_DEVICES=0
+ python ./tools/test_memory.py --imdb coco_2014_minival --model output/res101_local/coco_2014_train+coco_2014_valminusminival/32a56_72/res101_local_iter_720000.ckpt --cfg experiments/cfgs/res101_local.yml --tag 32a56_72 --net res101_local --visualize --set
Called with args:
Namespace(cfg_file='experiments/cfgs/res101_local.yml', imdb_name='coco_2014_minival', model='output/res101_local/coco_2014_train+coco_2014_valminusminival/32a56_72/res101_local_iter_720000.ckpt', net='res101_local', set_cfgs=[], tag='32a56_72', visualize=True)
Using config:
{'BOTTLE_SCALE': 16.0,
 'CLASSES': None,
 'DATA_DIR': '/home/zty/iter-reason/data',
 'EPS': 1e-14,
 'EXP_DIR': 'res101_local',
 'MEM': {'BETA': 0.5,
         'C': 512,
         'CONV': 3,
         'CROP_SIZE': 7,
         'CT_CONV': 3,
         'CT_FCONV': 3,
         'CT_L': 3,
         'C_STD': 0.01,
         'FC_C': 4096,
         'FC_L': 2,
         'FM_R': 1.0,
         'FP_R': 1.0,
         'INIT_H': 20,
         'INIT_W': 20,
         'IN_CONV': 3,
         'IN_L': 2,
         'ITER': 3,
         'STD': 0.01,
         'U_STD': 0.01,
         'VG_R': 1.0,
         'WEIGHT': 1.0,
         'WEIGHT_FINAL': 1.0},
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/zty/iter-reason',
 'TEST': {'MAX_SIZE': 1000, 'SCALES': [600]},
 'TRAIN': {'BATCH_SIZE': 128,
           'BBOX_THRESH': 1.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'GAMMA': 0.1,
           'IMS_PER_BATCH': 1,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'RATE': 0.0005,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 10000,
           'SNAPSHOT_KEPT': 2,
           'SNAPSHOT_PREFIX': 'res101_local',
           'STEPSIZE': [30000],
           'SUMMARY_INTERVAL': 180,
           'SUMMARY_ITERS': 500,
           'USE_FLIPPED': True,
           'WEIGHT_DECAY': 0.0001}}
loading annotations into memory...
Done (t=1.07s)
creating index...
index created!
coco_2014_minival gt roidb loaded from /home/zty/iter-reason/data/cache/coco_2014_minival_gt_roidb.pkl
2018-12-28 11:12:40.903894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:03:00.0
totalMemory: 10.73GiB freeMemory: 10.52GiB
2018-12-28 11:12:40.903961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-28 11:12:41.438366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-28 11:12:41.438445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-12-28 11:12:41.438464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-12-28 11:12:41.438919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10152 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/zty/iter-reason/tools/../lib/nets/base_memory.py:97: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
ITERATION: 00
ITERATION: 01
ITERATION: 02
WARNING:tensorflow:From /home/zty/iter-reason/tools/../lib/nets/attend_memory.py:104: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
Loading model check point from output/res101_local/coco_2014_train+coco_2014_valminusminival/32a56_72/res101_local_iter_720000.ckpt
Loaded.
score: 1/5000 10.799s
/home/zty/iter-reason/output/res101_local/coco_2014_minival/32a56_72/res101_local_iter_720000/images/COCO_val2014_000000532481.jpg
score: 2/5000 5.542s
score: 3/5000 3.844s
score: 4/5000 2.918s
score: 5/5000 2.351s
score: 6/5000 1.984s
score: 7/5000 1.761s
score: 8/5000 1.562s
score: 9/5000 1.431s
score: 10/5000 1.303s
score: 11/5000 1.192s
score: 12/5000 1.123s
score: 13/5000 1.065s
score: 14/5000 1.017s
score: 15/5000 0.955s
score: 16/5000 0.900s
score: 17/5000 0.869s
score: 18/5000 0.845s
score: 19/5000 0.808s
score: 20/5000 0.772s
score: 21/5000 0.739s
/home/zty/iter-reason/output/res101_local/coco_2014_minival/32a56_72/res101_local_iter_720000/images/COCO_val2014_000000409630.jpg
score: 22/5000 0.722s
score: 23/5000 0.698s
score: 24/5000 0.676s
score: 25/5000 0.652s
Traceback (most recent call last):
  File "./tools/test_memory.py", line 125, in <module>
    test_net(sess, net, imdb, imdb.roidb, filename, args.visualize, iter_test=iter_test)
  File "/home/zty/iter-reason/tools/../lib/model/test.py", line 36, in test_net
    test_net_base(sess, net, imdb, roidb, weights_filename, visualize)
  File "/home/zty/iter-reason/tools/../lib/model/test.py", line 53, in test_net_base
    all_scores[i], blobs = im_detect(sess, imdb, net, [roidb[i]])
  File "/home/zty/iter-reason/tools/../lib/model/test.py", line 21, in im_detect
    _, scores = net.test_image(sess, blobs)
  File "/home/zty/iter-reason/tools/../lib/nets/attend_memory.py", line 274, in test_image
    feed_dict=self._parse_dict(blobs))
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
	 [[node SMN_1/fc7/flatten/flatten/Reshape (defined at /home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1623)  = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](SMN_1/mem_ct_pool5/crops, SMN_1/fc7/flatten/flatten/Reshape/shape)]]
	 [[{{node aggregate/attend_cls_score/_1113}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2344_aggregate/attend_cls_score", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'SMN_1/fc7/flatten/flatten/Reshape', defined at:
  File "./tools/test_memory.py", line 109, in <module>
    net.create_architecture("TEST", imdb.num_classes, tag='default')
  File "/home/zty/iter-reason/tools/../lib/nets/base_memory.py", line 568, in create_architecture
    rois = self._build_memory(training, testing)
  File "/home/zty/iter-reason/tools/../lib/nets/attend_memory.py", line 178, in _build_memory
    rois, batch_ids, iter)
  File "/home/zty/iter-reason/tools/../lib/nets/attend_memory.py", line 152, in _build_pred
    mem_fc7 = self._fc_iter(mem_ct_pool5, is_training, "fc7", iter)
  File "/home/zty/iter-reason/tools/../lib/nets/base_memory.py", line 173, in _fc_iter
    mem_fc7 = slim.flatten(mem_pool5, scope='flatten')
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1623, in flatten
    outputs = core_layers.flatten(inputs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/layers/core.py", line 311, in flatten
    return layer.apply(inputs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 817, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 374, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py", line 551, in call
    outputs = array_ops.reshape(inputs, (array_ops.shape(inputs)[0], -1))
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6482, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
	 [[node SMN_1/fc7/flatten/flatten/Reshape (defined at /home/zty/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1623)  = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](SMN_1/mem_ct_pool5/crops, SMN_1/fc7/flatten/flatten/Reshape/shape)]]
	 [[{{node aggregate/attend_cls_score/_1113}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2344_aggregate/attend_cls_score", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I'm using Python 2.7, CUDA 9.0, and TensorFlow 1.12.0.
Can anybody help with this problem? Thanks!
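
Note: the error appears to boil down to flattening an empty batch of crops. If an image yields zero regions, slim.flatten reshapes a tensor with a zero-sized leading dimension to (0, -1), which Reshape cannot resolve. A minimal sketch of the failure mode, assuming TF 1.x:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 7, 7, 512])
# slim.flatten effectively performs this reshape with a dynamic batch size:
flat = tf.reshape(x, (tf.shape(x)[0], -1))

with tf.Session() as sess:
    # Fine for a non-empty batch of crops:
    sess.run(flat, {x: np.zeros((3, 7, 7, 512), np.float32)})
    # Raises "Reshape cannot infer the missing input size for an empty
    # tensor ..." when the batch of crops is empty:
    sess.run(flat, {x: np.zeros((0, 7, 7, 512), np.float32)})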

By the way, does anybody know how to use Visual Genome dataset? I don't know where to find the file "synsets.txt" and how to set parameters.

Does the attention:a_0 come from the all-zero initialized memory S_0 ?

Hi!
In iter-reason/lib/nets/attend_memory.py, line 154:

At iteration 0, you use the all-zero initialized memory S_0 to predict the confidence of f_0, while I think it would be more reasonable to use ROI features from the conv base, which would be consistent with your paper.

I'm not sure if I understand it correctly.

Reasoning modules

How can I use the global graph-based reasoning module described in your paper?
Is that module included when I use the command "./experiments/scripts/train_memory.sh 0 ade local 28 32"? Thanks!

Does this version include the graph-based module?

Hello, I want to know the implementation details of the graph-based module in your paper, but I can't find the corresponding code in this version. Does this version include that module? Thanks so much for your help!
