xmyqsh / fpn

Feature Pyramid Network

Python 89.97% Shell 1.07% Makefile 0.02% C++ 7.48% Cuda 1.46%

fpn's Introduction

Hi there 👋

fpn's People

Contributors

Stargazers

Watchers

Forkers

thomasdic2000 fallingdust xupengcoding holygen lijuan123 chl916185 zbxzc35 leighton613 10183308 feynman27 enjoyyzh123 bryantheyao benjamesbabala xychen9459 wanjinchang junsenselee summer-meet xtanitfy walkoncross yangxue0827 dlluopf qwzhong1988 vanpersie32 whrenstone yougoforward tingjiaqi123 inmgjim hangil55 dreadlord1984 whz1861 ztf-ucas horaccefeng gaochen315 superhero1991 attendfov huipengzhang humengdoudou zgsxwsdxg jobqiu hongkaichen victelk superinno hezhihao10 powermano feitiandemiaomi tqdavid hhgxx123 destinyzs hukongtao wendywlx tsing-cv liujiandu zymale madhavane-y medical-images-process zxycynthia crystalmiaoshu iq-scm

fpn's Issues

Results on COCO 2014, with TFFRCNN baseline

Hi, I've been looking for a working tensorflow implementation of FPN for some time now, and I think that this actually works :).

I'm using TFFRCNN to establish a baseline (this repo also seems to be a direct port of it, but I could be mistaken?). First I tried training on Pascal VOC 2007 and testing on Pascal VOC 2007. That, sadly, didn't give an increase in accuracy (TFFRCNN reported 0.7 mAP and this reported 0.698 mAP; both were trained for 160k iterations), but the RPN loss during training was really good, so that gave me hope :)

But the COCO dataset seems to be a better candidate for testing this: first, because it is what the authors of the FPN paper report on, and second, because the COCO evaluation metrics are significantly more fine-grained.

Below are the results:


Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.17, 0.20, 0.03
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.34, 0.37, 0.03
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.16, 0.20, 0.04
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.03, 0.08, 0.05
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.18, 0.23, 0.05
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29, 0.27, -0.02
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.19, 0.21, 0.02
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.27, 0.33, 0.06
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.27, 0.33, 0.06
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05, 0.13, 0.08
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30, 0.39, 0.09
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.47, 0.47, 0.00

Where the 3 numbers at the end are (TFFRCNN, FPN, difference between the two).

The paper uses a slightly different training and testing set (training_set + train35k for training and minival for testing; minival is only 5k images, while val is 40k images). But the relative differences between Faster R-CNN and FPN are in the same ballpark. We're seeing large increases in performance for small instances (both AR and AP), which is exactly what FPN sets out to do. So congrats, @xmyqsh! :) The only result that's worse than TFFRCNN is for large instances, but that may be remedied by two things. First, I only used (P3-P5) for the class/bbox heads (similar to the paper), but I see you now use P6 as well. Second, I accidentally used OHEM when training/testing TFFRCNN, and not for FPN, so the test is actually not completely fair to FPN.

I'm going to focus on implementing RoiAlign and attaching a Mask head, so we can maybe replicate the results of Mask R-CNN.

PS. @xmyqsh, should I open a pull request so that we can all train/test on COCO? (I just took the COCO dataset code from TFFRCNN and made a few changes to the training code, so that images with no gt boxes in the training set are handled.)

no module named cython_bbox

When I train on my dataset end to end, it shows this error:
from ..utils.cython_bbox import bbox_overlaps
ImportError: no module named cython_bbox
and I can't find cython_bbox.py in utils.
Please help me, thanks!
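
`cython_bbox` is not a `.py` file, which is probably why it cannot be found in `utils`: in Faster R-CNN style codebases it is normally a Cython extension that has to be compiled before it can be imported. The exact build command for this repository is not shown in this thread. As a stopgap, here is a minimal NumPy sketch of what `bbox_overlaps` computes, assuming boxes are `[x1, y1, x2, y2]` with the inclusive `+1` pixel convention common in those codebases. It is an illustration, not the repository's implementation, and it will be slower than the compiled version.

```python
import numpy as np

def bbox_overlaps(boxes, query_boxes):
    """Pairwise IoU between boxes (N, 4) and query_boxes (K, 4); returns (N, K).

    Assumption: boxes are [x1, y1, x2, y2] with inclusive pixel coordinates,
    so a box's width is x2 - x1 + 1.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    query_boxes = np.asarray(query_boxes, dtype=np.float64).reshape(-1, 4)

    box_areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    query_areas = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
                   (query_boxes[:, 3] - query_boxes[:, 1] + 1))

    # Intersection width/height via broadcasting: (N, 1) against (1, K).
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)

    union = box_areas[:, None] + query_areas[None, :] - inter
    return inter / union
```

Once the extension builds, the compiled `bbox_overlaps` should be preferred.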

Could you share the trained model parameters?

Could you share the trained model parameters? On COCO or VOC? @xmyqsh

dataset structure

Hi, you reference the folder pascal_voc0712 in the training/test scripts, but what is the structure of that folder?

I guess it contains the Pascal VOC 2007 and Pascal VOC 2012 datasets, but I can't find instructions on how to create that folder.

What's the difference between alt_opt training and end2end training?

TypeError: exceptions must be old-style classes or derived from BaseException, not Command exited with non-zero status 1

Please help me
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
File "./tools/train_net_alt_opt.py", line 109, in
max_iters=max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 513, in train_net
sw.train_model(sess, vs_names, max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 406, in train_model
self.train_rpn(sess, vs_names[0], max_iters[0], init_model=self.pretrained_mod
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 154, in train_rpn
raise 'Check your pretrained model {:s}'.format(init_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not
Command exited with non-zero status 1
8.84user 0.76system 0:08.69elapsed 110%CPU (0avgtext+0avgdata 928076maxresident)k
0inputs+2224outputs (0major+134721minor)pagefaults 0swaps
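
The `TypeError` comes from the `raise` statement itself rather than from the model file: line 154 of `train_opt_alt.py` raises a plain string, which Python 2.6 and later reject. The message it was trying to print ("Check your pretrained model ...") suggests that the `.npy` weights could not be loaded, so the path `data/pretrain_model/Resnet50.npy` is worth checking first. A minimal sketch of the same check with a real exception class follows; the function name `check_pretrained_model` is hypothetical and the choice of `IOError` is an assumption.

```python
import os

def check_pretrained_model(init_model):
    """Return the path if the weights file exists, else fail with a clear message.

    Hypothetical helper: the original code does `raise '<string>'`, which is
    itself a TypeError, so the intended message never reaches the user.
    """
    if init_model is None or not os.path.isfile(init_model):
        # Raise an exception instance, not a bare string.
        raise IOError('Check your pretrained model {}'.format(init_model))
    return init_model
```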

Is it proper to backprop all of the proposal target layer losses?

Is it proper to backprop all of the proposal target layer losses?
There are four proposal target layers in total, with _feat_stride equal to 4, 8, 16, and 32.

Has anybody else thought about this question?

Memory leak problem with proposal_layer.py

Hi, I encountered a memory leak when running inference on images with the trained FPN model.

In my case, I load the model with net = caffe.Net(...) in Python and use net.forward() to get the detection results (scores, predicted bboxes). I monitored the program's memory usage and found that the more inferences it runs, the more memory it uses. I looked into the problem and found it might have something to do with the proposal_layer.py file.

I found that if I comment out the lines after

FPN/lib/rpn_msr/proposal_layer.py

Line 194 in 5473ce0

leveled_rois = [None] * 5

there will be no memory leak. However, if I comment out the lines after (including this line)

FPN/lib/rpn_msr/proposal_layer.py

Line 200 in 5473ce0

for level_idx in xrange(0, 5):

the memory leak appears. So I think the memory leak is caused by the following lines.

FPN/lib/rpn_msr/proposal_layer.py

Lines 195 to 198 in 5473ce0

    
    leveled_idxs = [[], [], [], [], []]
    for idx, roi in enumerate(rpn_rois):
        level_idx = level(roi) - 2
        leveled_idxs[level_idx].append(idx)

Has anybody else run into this problem, and is my guess correct?
Any opinions will be appreciated.

alt training error

InvalidArgumentError (see above for traceback): Shape [-1,5] has negative dimensions
[[Node: leveled_rois_4 = Placeholder[dtype=DT_FLOAT, shape=[?,5], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

We've got an error while stopping in post-mortem: <type 'exceptions.KeyboardInterrupt'>

nms model cannot be imported

Hi @xmyqsh ,

When I run the training command, there is an error message
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 23, in <module>
from lib.fast_rcnn.train import get_training_roidb, train_net
File "./faster_rcnn/../lib/fast_rcnn/__init__.py", line 9, in <module>
from . import train
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 15, in <module>
from .nms_wrapper import nms_wrapper
File "./faster_rcnn/../lib/fast_rcnn/nms_wrapper.py", line 14, in <module>
from ..nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nms
Command exited with non-zero status 1

Do you have any idea to fix this issue?

Thank you!
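
`gpu_nms` is a compiled CUDA extension, so this `ImportError` usually means the extensions under `lib` were never built, or were built for a different Python. Until the build works, a pure NumPy NMS can stand in for it. The sketch below is standard greedy non-maximum suppression, assuming `dets` rows are `[x1, y1, x2, y2, score]` with inclusive pixel coordinates; the name `py_cpu_nms` is illustrative and not necessarily what this repository exports.

```python
import numpy as np

def py_cpu_nms(dets, thresh):
    """Greedy NMS. dets: (N, 5) rows of [x1, y1, x2, y2, score].

    Returns the indices of the kept detections, highest score first.
    """
    dets = np.asarray(dets, dtype=np.float64).reshape(-1, 5)
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top-scoring box with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only boxes that do not overlap the selected one too much.
        order = rest[iou <= thresh]
    return keep
```

This runs on the CPU and will be noticeably slower than the CUDA kernel on large proposal sets, so it is a workaround rather than a replacement.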

Can you share the link to the ImageNet-pretrained ResNet-50 model?

Hi, I noticed that your experiment script mentions a pretrained ResNet-50 model. Could you please share the link for downloading this model? Thank you very much.

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.

Hello, training ran successfully for 100 iterations and then the following error suddenly appeared. Does anyone have an idea about this?

iter: 0 / 200000, total loss: 18.0627, rpn_loss_cls: 0.9430, rpn_loss_box: 10.3524, loss_cls: 4.4159, loss_box: 2.3514, lr: 0.000500
speed: 3.120s / iter
2017-12-29 13:22:04.715605: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2001 get requests, put_count=1760 evicted_count=1000 eviction_rate=0.568182 and unsatisfied allocation rate=0.670165
2017-12-29 13:22:04.715654: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
image: 025553069_K1210503_T001_1_10.jpg iter: 20 / 200000, total loss: 3.2476, rpn_loss_cls: 0.2447, rpn_loss_box: 2.8628, loss_cls: 0.0771, loss_box: 0.0630, lr: 0.000500
speed: 1.036s / iter
image: 030446539_K1221297_419_1_05.jpg iter: 40 / 200000, total loss: 4.8578, rpn_loss_cls: 0.1838, rpn_loss_box: 3.7386, loss_cls: 0.4705, loss_box: 0.4649, lr: 0.000500
speed: 1.033s / iter
2017-12-29 13:22:46.655261: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3000 get requests, put_count=3099 evicted_count=1000 eviction_rate=0.322685 and unsatisfied allocation rate=0.308
2017-12-29 13:22:46.655287: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
image: 030454261_K1221455_T001_5_04.jpg iter: 60 / 200000, total loss: 2.1451, rpn_loss_cls: 0.1146, rpn_loss_box: 1.7099, loss_cls: 0.2562, loss_box: 0.0643, lr: 0.000500
speed: 0.708s / iter
image: 025913309_K1214499_161_1_07.jpg iter: 80 / 200000, total loss: 2.7814, rpn_loss_cls: 0.2832, rpn_loss_box: 1.0201, loss_cls: 0.6792, loss_box: 0.7989, lr: 0.000500
speed: 1.077s / iter
image: 030141742_K1217637_285_1_28.jpg iter: 100 / 200000, total loss: 2.7696, rpn_loss_cls: 0.2172, rpn_loss_box: 0.8301, loss_cls: 0.7361, loss_box: 0.9862, lr: 0.000500
speed: 0.945s / iter
Traceback (most recent call last):
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "faster_rcnn/../lib/rpn_msr/anchor_target_layer.py", line 151, in anchor_target_layer
argmax_overlaps = overlaps.argmax(axis=1) # (A)
ValueError: attempt to get argmax of an empty sequence
2017-12-29 13:23:49.293478: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
2017-12-29 13:23:49.330989: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "faster_rcnn/train_net.py", line 106, in
restore=bool(int(args.restore)))
File "faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "faster_rcnn/../lib/fast_rcnn/train.py", line 261, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "faster_rcnn/train_net.py", line 98, in
network = get_network(args.network_name)
File "faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "faster_rcnn/../lib/networks/FPN_train.py", line 25, in __init__
self.setup()
File "faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Command exited with non-zero status 1
167.60user 9.99system 2:55.72elapsed 101%CPU (0avgtext+0avgdata 2398032maxresident)k
0inputs+3640outputs (0major+1128621minor)pagefaults 0swaps
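
The innermost traceback points at `overlaps.argmax(axis=1)` in `anchor_target_layer.py`. NumPy raises exactly this `ValueError` when the reduced axis has length zero, which is what an anchors-by-ground-truth overlap matrix looks like for an image with no ground-truth boxes. That is a plausible cause here, not a confirmed one. The sketch below reproduces the error and drops such images before training. It assumes each roidb entry is a dict with a `'boxes'` array, as in py-faster-rcnn style loaders; the helper name `filter_empty_roidb` is hypothetical.

```python
import numpy as np

def filter_empty_roidb(roidb):
    """Drop roidb entries that have no ground-truth boxes.

    Assumption: every entry is a dict whose 'boxes' value is an (N, 4) array.
    """
    return [entry for entry in roidb
            if np.asarray(entry['boxes']).reshape(-1, 4).shape[0] > 0]

# Reproduce the failure: 12 anchors, 0 ground-truth boxes.
overlaps = np.zeros((12, 0))
try:
    overlaps.argmax(axis=1)
    argmax_failed = False
except ValueError:
    argmax_failed = True

roidb = [{'boxes': np.array([[10, 20, 50, 80]])},
         {'boxes': np.zeros((0, 4))}]
usable = filter_empty_roidb(roidb)  # keeps only the first entry
```

Filtering changes which images the detector sees, so background-only images are no longer used as negatives; whether that matters depends on the dataset.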

Loading pretrained model weights from data/pretrain_model/Resnet50.npy, I met this problem

num gt: 2
num fg: 17
num bg: 111
cudaCheckError() failed in ROIPoolForward: invalid device function
cudaCheckError() failed in ROIPoolForward: driver shutting down
2017-08-21 10:19:34.578317: F tensorflow/stream_executor/cuda/cuda_driver.cc:312] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 4)
2017-08-21 10:19:34.578347: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x61a7d80: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2017-08-21 10:19:34.578429: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
2017-08-21 10:19:34.578450: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Command terminated by signal 6
65.76user 10.62system 1:13.23elapsed 104%CPU (0avgtext+0avgdata 1891988maxresident)k
0inputs+3016outputs (0major+831469minor)pagefaults 0swaps
@xmyqsh, I ran into this problem. How can I solve it?
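
`invalid device function` from a custom CUDA kernel typically means the kernel binary was compiled for a different GPU architecture than the one it is running on. If that is the cause here, rebuilding the ROI pooling op with an `-arch` flag matching the card's compute capability should help. Where that flag lives depends on the repository's build script, which this thread does not show. The helper below is a hypothetical convenience that only formats the flag.

```python
def nvcc_arch_flag(major, minor):
    """Format the nvcc -arch flag for a GPU of compute capability major.minor.

    Hypothetical helper; the compute capability itself has to be looked up
    for the specific card (for example in NVIDIA's published tables).
    """
    return '-arch=sm_{}{}'.format(int(major), int(minor))

flag = nvcc_arch_flag(6, 1)  # compute capability 6.1, e.g. a GTX 1080
```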

Encounter this error: tensorflow.python.framework.errors_impl.NotFoundError

When I run train_net.py, I get the following error:

Traceback (most recent call last):
File "faster_rcnn/train_net.py", line 26, in
from lib.networks.factory import get_network
File "faster_rcnn/../lib/networks/__init__.py", line 8, in
from .FPN_train import FPN_train
File "faster_rcnn/../lib/networks/FPN_train.py", line 9, in
from .network import Network
File "faster_rcnn/../lib/networks/network.py", line 5, in
from ..roi_pooling_layer import roi_pooling_op as roi_pool_op
File "faster_rcnn/../lib/roi_pooling_layer/__init__.py", line 7, in
import roi_pooling_op
File "faster_rcnn/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in
_roi_pooling_module = tf.load_op_library(filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: faster_rcnn/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

What's the reason? Can anybody give some advice?
Thank you.
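
The missing symbol `_ZTIN10tensorflow8OpKernelE` is the type info for `tensorflow::OpKernel`. A common cause is that `roi_pooling.so` was built without linking against `libtensorflow_framework.so`, which TensorFlow 1.4 and later ship separately; another is that the op was compiled against a different TensorFlow version than the one installed. The sketch below assembles the usual `g++` flags for a custom op. The helper name `custom_op_flags` is hypothetical, and the repository's actual build script may pass additional options.

```python
def custom_op_flags(tf_include, tf_lib, link_framework=True):
    """Assemble g++ flags for building a TensorFlow custom op as a shared library.

    tf_include / tf_lib are the directories reported by the installed
    TensorFlow, e.g. tf.sysconfig.get_include() and tf.sysconfig.get_lib().
    """
    flags = ['-std=c++11', '-shared', '-fPIC', '-I' + tf_include]
    if link_framework:
        # Resolves tensorflow::OpKernel and friends on builds that ship
        # libtensorflow_framework.so.
        flags += ['-L' + tf_lib, '-ltensorflow_framework']
    return flags

# Example with placeholder paths (replace with the values printed by
# tf.sysconfig.get_include() and tf.sysconfig.get_lib() on your machine).
example = custom_op_flags('/path/to/tensorflow/include', '/path/to/tensorflow')
```

After rebuilding, `tf.load_op_library` should be retried in the same Python environment that supplied those paths.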

Running test_net.py gives no detection results

I managed to train on my own dataset. After 120k iterations I ran a test using a command similar to
CUDA_VISIBLE_DEVICES=0 python ./faster_rcnn/test_net.py --gpu 0 --weights output/FPN_end2end/voc_0712_trainval/FPN_iter_370000.ckpt --imdb voc_0712_test --cfg ./experiments/cfgs/FPN_end2end.yml --network FPN_test

But some images produce no detections at all, while the others look normal.
I have my own loader inherited from imdb and it works well on faster-rcnn from this repository:

https://github.com/ruotianluo/pytorch-faster-rcnn

I'm totally confused, any suggestions will be appreciated, thanks.

Cannot get the final checkpoint

Hi, I used your model to train on VOC2007.
I got the following checkpoints:

FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage1_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_RPN_iter_8.ckpt.index
FPN_alt_opt_stage1_RPN_iter_8.ckpt.meta
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage2_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_RPN_iter_8.ckpt.index
FPN_alt_opt_stage2_RPN_iter_8.ckpt.meta

but I don't have the final checkpoint, and I get this error:
./tools/test_net.py --gpu 0 --weights --imdb voc_2007_test --cfg experiments/cfgs/FPN_alt_opt.yml --network FPN_alt_opt_train_test
python: can't open file './tools/test_net.py': [Errno 2] No such file or directory

How can I fix it?
Thank you!

FPN ROI Choosing

Hi there! While reading Feature Pyramid Networks for Object Detection, I found a formula for choosing the feature map for each ROI based on the size of the region proposal. Can you show me how you implemented this? I would like to implement FPN on the new Object Detection API provided by TensorFlow.
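
For reference, the formula in the FPN paper assigns an ROI of width w and height h to pyramid level k = floor(k0 + log2(sqrt(w * h) / 224)), with k0 = 4, so a 224 x 224 ROI lands on P4. The sketch below implements that formula in NumPy. It is not this repository's `level(roi)` function, whose source is not shown in this thread; the clamp to levels 2 through 5 and the use of `x2 - x1` (without `+1`) for the width are assumptions.

```python
import numpy as np

def fpn_level(rois, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Map ROIs (N, 4) given as [x1, y1, x2, y2] to FPN pyramid levels.

    Implements k = floor(k0 + log2(sqrt(w * h) / canonical_size)),
    clamped to [k_min, k_max].
    """
    rois = np.asarray(rois, dtype=np.float64).reshape(-1, 4)
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    # Guard against zero-area boxes so log2 stays finite.
    scale = np.sqrt(np.maximum(w * h, 1e-6))
    k = np.floor(k0 + np.log2(scale / canonical_size))
    return np.clip(k, k_min, k_max).astype(np.int64)

levels = fpn_level([[0, 0, 224, 224],    # canonical size
                    [0, 0, 112, 112],    # half the side length
                    [0, 0, 32, 32],      # very small, clamped up
                    [0, 0, 1000, 1000]]) # very large, clamped down
```

ROIs can then be grouped per level with a boolean mask such as `rois[levels == 3]` and pooled from the matching feature map.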

How to run test_net? (to get the mAP of my task)

Hi
Thanks for your excellent work! I'm a new learner and it has helped me a lot.
I noticed that there is no test_net.py in FPN/faster_rcnn/

So, could you tell me how to run a test? I need to know the mAP of my task.

When I use the test_net.py from another project, I get an error:
Traceback (most recent call last):
File "faster_rcnn/test_net.py", line 85, in
network = get_network(args.network_name)
File "faster_rcnn/../lib/networks/factory.py", line 19, in get_network
return FPN_test()
File "faster_rcnn/../lib/networks/FPN_test.py", line 25, in __init__
self.setup()
File "faster_rcnn/../lib/networks/FPN_test.py", line 231, in setup
.fc(n_classes, relu=False, name='cls_score')
File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "faster_rcnn/../lib/networks/network.py", line 390, in fc
dim *= d
TypeError: unsupported operand type(s) for *=: 'int' and 'NoneType'

How can I fix this?

Thanks again :)
@xmyqsh
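
The last frame shows `fc` multiplying together the static dimensions of its input, and one of them is `None`. In other words, the layer feeding `cls_score` has a shape that is not fully known when the graph is built, which can happen when a test script from another project wires up the network differently. The sketch below shows the same computation with an explicit check, so the failure names the offending axis. `flattened_dim` is a hypothetical helper, not the repository's code.

```python
def flattened_dim(static_shape):
    """Product of all non-batch dimensions of a static shape.

    static_shape is a list such as [None, 7, 7, 256], where None marks a
    dimension unknown at graph-construction time. The leading (batch)
    dimension may be None; the others may not.
    """
    dim = 1
    for axis, d in enumerate(static_shape[1:], start=1):
        if d is None:
            raise ValueError(
                'dimension {} of input shape {} is unknown; the fully connected '
                'layer needs a fully defined feature shape'.format(axis, list(static_shape)))
        dim *= d
    return dim

n_inputs = flattened_dim([None, 7, 7, 256])
```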

Could you share the corresponding results?

Did you use both VOC2007 and VOC2012?

Could you share the corresponding results?

Getting -1 for mAP using VOC07+12 trainval for validation

Hi,
I used your shared ckpt file to run validation with VOC07+12 but got -1 for every mAP value (I also tried 07 alone and got the same result :( ) @xmyqsh

Does anyone have any ideas about this error?

VGG based FPN model

I want to use FPN with the VGG model since I only have 2 GTX 980 cards with 8 GB of memory. Do you plan to share a VGG-based model?

ValueError: Shape of a new variable (Fast-RCNN/fc6/weights) must be fully defined, but instead was (?, 1024).

The error happens when constructing the training network. It seems some dimensions of the input to fc6 are undefined. Could you give some suggestions?

alt_opt testing problem

The alt_opt training has finished. But when I test the network, the file ../tools/test_net.py doesn't exist.

can't download .ckpt file

I cannot download the .ckpt file from Google Drive. Is there any other way to download it?

Train New Dataset

Hi @xmyqsh, first of all, thanks for your work. The code is neat and wonderful. But I wonder if you could add more information about the dataset structure and how to use this code to train and test. By the way, it seems a little difficult to switch the training set to my own dataset. Would you give me some advice? The pascal_voc.py in datasets contains many functions and I am not sure whether I should rewrite all of them. (My dataset structure is similar to VOC.)
@ouchjm

Memory allocation problem

Hi, I encountered a memory allocation problem during initialization (solving).

InternalError (see above for traceback): Dst tensor is not initialized. [[Node: fc7_new/weights/Momentum/Initializer/Const = Const[_class=["loc:@fc7_new/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [4096,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

and

Limit: 5111519641
InUse: 5111519488
MaxInUse: 5111519488
NumAllocs: 148
MaxAllocSize: 4991561984

My setting:

single GPU, 11.57 GB of GPU memory available
(same as Faster R-CNN) 1 image, shorter side 600, batch size 128, RPN batch size 256
same error if I change to batch size 16, RPN batch size 32

So I assume the model is too large to fit?

TensorFlow does have an issue with taking up too much memory during initialization. Has anyone encountered or resolved this?

Thanks!

cannot convert float infinity to integer

When training on my own dataset, some images contain none of the objects I want to annotate, so those images have no labels. During training I get
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
But when I set the labels of those images to 0 0 0 0 and generate the XML files, training fails with
tensorflow.python.framework.errors_impl.UnknownError: exceptions.OverflowError: cannot convert float infinity to integer @xmyqsh
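
A dummy `0 0 0 0` box has no real extent, so it is not a usable annotation, and values derived from such a box (scales, regression targets) can become infinite or NaN. Exactly where the infinity arises in this code base is not shown in the thread. A minimal sketch for finding degenerate boxes before training, assuming boxes are `[x1, y1, x2, y2]` rows; the helper name `valid_box_mask` is hypothetical.

```python
import numpy as np

def valid_box_mask(boxes):
    """Boolean mask of boxes (N, 4) that have positive width and height
    and non-negative coordinates. Boxes are [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    has_extent = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    non_negative = (boxes >= 0).all(axis=1)
    return has_extent & non_negative

mask = valid_box_mask([[0, 0, 0, 0],       # dummy placeholder box
                       [10, 20, 50, 80],   # normal box
                       [30, 40, 30, 90]])  # zero width
```

Images whose boxes are all invalid are probably better excluded from the training list than given a placeholder annotation.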

Rationale behind rpn loss in build_loss

In the build_loss function, I see that the labels are filtered for only foreground/background regions before calculating the rpn class loss and that the SmoothL1Loss is normalized by the number of foreground regions. I don't see this in the original Faster-RCNN implementation. @xmyqsh Can you please explain why this is necessary?
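
To make the question concrete, here is a NumPy sketch of the scheme the issue describes: anchors labelled -1 (ignored) are dropped before the classification loss, and the smooth L1 box loss is summed over foreground anchors and divided by their count. It is written from the description above rather than copied from `build_loss`. The label convention (1 foreground, 0 background, -1 ignore) and `sigma = 3.0` are assumptions borrowed from common Faster R-CNN implementations.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """Elementwise smooth L1: quadratic for |x| < 1/sigma^2, linear otherwise."""
    x = np.asarray(x, dtype=np.float64)
    s2 = sigma ** 2
    absx = np.abs(x)
    return np.where(absx < 1.0 / s2, 0.5 * s2 * x ** 2, absx - 0.5 / s2)

def rpn_losses(cls_logits, labels, bbox_pred, bbox_targets):
    """cls_logits: (A, 2), labels: (A,), bbox_pred / bbox_targets: (A, 4).

    Assumes at least one anchor is labelled 0 or 1.
    """
    labels = np.asarray(labels)

    # Classification: softmax cross-entropy over non-ignored anchors only.
    keep = labels != -1
    logits = np.asarray(cls_logits, dtype=np.float64)[keep]
    kept = labels[keep].astype(np.int64)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cls_loss = -log_prob[np.arange(kept.size), kept].mean()

    # Regression: smooth L1 over foreground anchors, normalized by their count.
    fg = labels == 1
    diff = (np.asarray(bbox_pred, dtype=np.float64)[fg] -
            np.asarray(bbox_targets, dtype=np.float64)[fg])
    box_loss = smooth_l1(diff).sum() / max(int(fg.sum()), 1)
    return cls_loss, box_loss
```

With this normalization the box loss does not shrink when an image has few foreground anchors, which is one possible motivation; the author's actual reasoning is not stated in the thread.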

tensorflow.python.framework.errors

When I run FPN_alt_opt.sh, I get the following problem:
dxt@dxt-System-Product-Name:~/FPN-master (2)$ ./experiments/scripts/FPN_alt_opt.sh 0 FPN_alt_opt pascal_voc0712

set -e
export PYTHONUNBUFFERED=True
PYTHONUNBUFFERED=True
GPU_ID=0
NET=FPN_alt_opt
NET_lc=fpn_alt_opt
DATASET=pascal_voc0712
array=($@)
len=3
EXTRA_ARGS=
EXTRA_ARGS_SLUG=
case $DATASET in
TRAIN_IMDB=voc_0712_trainval
TEST_IMDB=voc_0712_test
PT_DIR=pascal_voc
CFG=experiments/cfgs/FPN_alt_opt.yml
++ date +%Y_%m_%d_%H_%M_%S
LOG=experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
exec
++ tee -a experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
tee: experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59: No such file or directory
echo Logging output to experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
Logging output to experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
CUDA_VISIBLE_DEVICES=0
time python ./tools/train_net_alt_opt.py --gpu 0 --weights data/pretrain_model/Resnet50.npy --imdb voc_0712_trainval --cfg experiments/cfgs/FPN_alt_opt.yml --network FPN_alt_opt_train_test
Traceback (most recent call last):
File "./tools/train_net_alt_opt.py", line 26, in
from lib.networks.factory import get_network
File "./tools/../lib/networks/__init__.py", line 8, in
from .FPN_train import FPN_train
File "./tools/../lib/networks/FPN_train.py", line 9, in
from .network import Network
File "./tools/../lib/networks/network.py", line 5, in
from ..roi_pooling_layer import roi_pooling_op as roi_pool_op
File "./tools/../lib/roi_pooling_layer/__init__.py", line 7, in
import roi_pooling_op
File "./tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in
_roi_pooling_module = tf.load_op_library(filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: ./tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_
Command exited with non-zero status 1
1.84user 0.35system 0:02.14elapsed 102%CPU (0avgtext+0avgdata 198848maxresident)k
0inputs+32outputs (0major+49923minor)pagefaults 0swaps

@xmyqsh I don't know how to solve this. Please help me, thanks!!!
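
The `B5cxx11` fragment in the missing symbol is GCC's `[abi:cxx11]` tag. It indicates that `roi_pooling.so` was compiled with the newer libstdc++ ABI (the default since GCC 5), while the installed TensorFlow apparently exports only the old-ABI version of `tensorflow::strings::StrCat`. The usual remedy is to rebuild the op with `-D_GLIBCXX_USE_CXX11_ABI=0` added to the compiler flags. Where those flags are set in this repository is not shown in the thread. The two helpers below are hypothetical and only illustrate the check and the flag.

```python
OLD_ABI_FLAG = '-D_GLIBCXX_USE_CXX11_ABI=0'

def looks_like_cxx11_abi_mismatch(missing_symbol):
    """True if an undefined symbol carries GCC's [abi:cxx11] tag."""
    return 'B5cxx11' in missing_symbol

def with_old_abi(cxx_flags):
    """Return a copy of the compiler flags with the old-ABI define appended once."""
    flags = list(cxx_flags)
    if OLD_ABI_FLAG not in flags:
        flags.append(OLD_ABI_FLAG)
    return flags

symbol = 'ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3'
flags = with_old_abi(['-std=c++11', '-shared', '-fPIC'])
```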

Shape mismatch error when training on new data

When I train on my data using
nohup ./experiments/scripts/FPN_end2end.sh 0 FPN pascal_voc2007 --set RNG_SEED 42 TRAIN.SCALES "[800]" > FPN.log 2>&1 &
I run into this problem:

ignore bn4f_branch2a offset
ignore res5a_branch2c weights
ignore res5a_branch2b weights
ignore res5a_branch2a weights
ignore res3d_branch2b weights
ignore res3d_branch2c weights
ignore res3d_branch2a weights
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in multiply
  pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in multiply
  pred_h = np.exp(dh) * heights[:, np.newaxis]
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 277, in train_model
    bbox_pred = bbox_pred * np.tile(self.bbox_stds, (bbox_pred.shape[0], 1)) + \
ValueError: operands could not be broadcast together with shapes (128,84) (128,8) 
Command exited with non-zero status 1
13.00user 2.02system 0:14.48elapsed 103%CPU (0avgtext+0avgdata 2385504maxresident)k
0inputs+3096outputs (0major+371289minor)pagefaults 0swaps

I don't know how to fix it.
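
The two shapes in the `ValueError` are informative: 84 = 4 x 21 and 8 = 4 x 2. The network's `bbox_pred` covers 21 classes (the 20 PASCAL VOC classes plus background), while the `bbox_stds` computed from the dataset cover 2 classes, so the class count in the network definition most likely does not match the custom dataset. Where the class count is configured in this repository is not shown in the thread. The sketch below performs the same un-normalization with an explicit check that reports the mismatch; `unnormalize_bbox_pred` is a hypothetical helper.

```python
import numpy as np

def unnormalize_bbox_pred(bbox_pred, bbox_stds, bbox_means):
    """Undo target normalization: pred * stds + means, tiled over the batch.

    bbox_pred: (R, 4 * C_net); bbox_stds / bbox_means: 4 * C_data values.
    """
    bbox_pred = np.asarray(bbox_pred, dtype=np.float64)
    stds = np.asarray(bbox_stds, dtype=np.float64).ravel()
    means = np.asarray(bbox_means, dtype=np.float64).ravel()
    if bbox_pred.shape[1] != stds.size:
        raise ValueError(
            'network predicts boxes for {} classes but the dataset statistics '
            'cover {} classes'.format(bbox_pred.shape[1] // 4, stds.size // 4))
    n = bbox_pred.shape[0]
    return bbox_pred * np.tile(stds, (n, 1)) + np.tile(means, (n, 1))
```

The `overflow encountered in exp` warnings earlier in the log are a separate symptom and may simply reflect untrained box regressors early in training.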

Experimental result

Cool code! I wonder whether you have reimplemented the exact results of the FPN paper?

Best!
guangxing

Use ResNet101 Model

If I want to use ResNet-101, what should I change?

UnknownError (see above for traceback): KeyError: b'TRAIN'

When I started end-to-end training, this error occurred:

W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TRAIN'

Caused by op 'RPN/rpn_rois/PyFunc', defined at:
File "./faster_rcnn/train_net.py", line 101, in
network = get_network(args.network_name)
File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in __init__
self.setup()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 427, in setup
.proposal_layer(_feat_stride[2:], anchor_size[2:], 'TRAIN',name = 'rpn_rois'))
File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "./faster_rcnn/../lib/networks/network.py", line 345, in proposal_layer
[tf.float32]),
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
name=name)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

UnknownError (see above for traceback): KeyError: b'TRAIN'
[[Node: RPN/rpn_rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/Reshape_2/_717, RPN/rpn_bbox_pred/BiasAdd/_719, RPN/Reshape_5/_721, RPN/rpn_bbox_pred_1/BiasAdd/_723, RPN/Reshape_8/_725, RPN/rpn_bbox_pred_2/BiasAdd/_727, RPN/Reshape_11/_729, RPN/rpn_bbox_pred_3/BiasAdd/_731, RPN/Reshape_14/_733, RPN/rpn_bbox_pred_4/BiasAdd/_735, _arg_im_info_0_5, RPN/rpn_rois/PyFunc/input_11, RPN/rpn_rois/PyFunc/input_12, RPN/rpn_rois/PyFunc/input_13)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
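A note on the `KeyError: b'TRAIN'` above: under Python 3, string arguments passed through `tf.py_func` arrive inside the Python function as `bytes`, so looking up a config section with that key fails when the sections are keyed by `str`. A minimal sketch of a workaround, assuming the key is used to index a dict-like config; `normalize_cfg_key` and the `cfg` dict here are hypothetical stand-ins, not the repository's actual code:

```python
def normalize_cfg_key(cfg_key):
    """Return cfg_key as text, decoding it first if it arrived as bytes."""
    if isinstance(cfg_key, bytes):
        return cfg_key.decode("utf-8")
    return cfg_key


# Hypothetical stand-in for the real config object; values are illustrative.
cfg = {
    "TRAIN": {"RPN_POST_NMS_TOP_N": 2000},
    "TEST": {"RPN_POST_NMS_TOP_N": 300},
}

# b"TRAIN" is what the Python function receives under Python 3.
key = normalize_cfg_key(b"TRAIN")
post_nms_top_n = cfg[key]["RPN_POST_NMS_TOP_N"]
```

Decoding the key at the top of the Python-side layer function (before any config lookup) is likely the least invasive place to apply this.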

training loss is nan

I noticed that lib/networks/network.py has been modified.
Using the previous version works fine, but the modified version leads to gradient explosion.
By the way, the dataset I use is Caltech Pedestrian Detection.
Apologies for my poor English :)

ValueError: attempt to get argmax of an empty sequence

Please help me. I get this error when I train on my own data:
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(int(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 409, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 263, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "./faster_rcnn/train_net.py", line 101, in
network = get_network(args.network_name)
File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "./faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
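One possible cause of "attempt to get argmax of an empty sequence" in the anchor target layer is an image whose annotation contains no usable ground-truth boxes, so the overlap matrix has an empty axis. This is only a guess from the error message, not a confirmed diagnosis. A minimal sketch of a pre-training check, assuming each roidb entry is a dict with a `boxes` array of `(x1, y1, x2, y2)` rows; `has_valid_boxes` and `filter_roidb` are hypothetical helper names:

```python
import numpy as np


def has_valid_boxes(entry):
    """True if the entry has at least one box with positive width and height."""
    boxes = np.asarray(entry["boxes"], dtype=np.float32).reshape(-1, 4)
    if boxes.shape[0] == 0:
        return False
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    return bool(np.any((widths > 0) & (heights > 0)))


def filter_roidb(roidb):
    """Drop entries that would give the anchor target layer nothing to match."""
    return [entry for entry in roidb if has_valid_boxes(entry)]
```

Running such a filter over a custom dataset before training also reveals how many images were annotated without boxes.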

stuck in loading the resnet50

I am using two GTX 970s to run the network, and it just gets stuck while loading Resnet50.npy. Maybe the memory is too small, but in that case I would expect it to get stuck at a later stage rather than while loading Resnet50.

2017-08-16 09:00:28.165008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 
2017-08-16 09:00:28.165016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y Y 
2017-08-16 09:00:28.165028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y Y 
2017-08-16 09:00:28.165033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
2017-08-16 09:00:28.165036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 970, pci bus id: 0000:06:00.0)
Computing bounding-box regression targets...

19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 164, in train_model
    raise 'Check your pretrained model {:s}'.format(self.pretrained_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not str
Command exited with non-zero status 1
19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps
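The `TypeError` in this log hides the real problem: `train.py` line 164 raises a plain string, which Python rejects, so the original loading failure is never shown. A minimal sketch of raising a proper exception that keeps the underlying cause, assuming the loader is wrapped in a try/except; `load_pretrained` and `load_fn` are hypothetical names, not the repository's API:

```python
def load_pretrained(load_fn, path):
    """Call load_fn(path) and re-raise any failure as a readable exception."""
    try:
        return load_fn(path)
    except Exception as err:
        # Raise an exception instance (not a str) and keep the original cause.
        raise RuntimeError(
            "Check your pretrained model {}: {}".format(path, err)
        )
```

With a change along these lines the log would report why the file could not be loaded, for example that data/pretrain_model/Resnet50.npy does not exist.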

where to download resnet50.npy

When I try to train FPN, I get the following exception:
Exception: Check your pretrained model data/pretrain_model/Resnet50.npy

Where can I download Resnet50.npy?

Getting -1 for mAP using VOC07+12 Trainval

Hi, thank you for your code. I'm new to this. I used your network and changed the base network to ResNeXt, using a ResNeXt-50 model converted from a Caffe model, but I get -1 mAP for all classes. Can you tell me why? Thank you, and please forgive my poor English!

	# One index list per pyramid level; level 2 maps to slot 0.
	leveled_idxs = [[], [], [], [], []]
	for idx, roi in enumerate(rpn_rois):
	    level_idx = level(roi) - 2
	    leveled_idxs[level_idx].append(idx)
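The snippet above depends on a `level(roi)` function that is not shown. For reference, the FPN paper assigns an RoI of width w and height h to level k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4. A minimal sketch of that rule follows; `fpn_level` is a hypothetical name, the clamp range 2 to 6 is an assumption chosen to match the five index lists, and the repository's own `level` may differ:

```python
import math


def fpn_level(roi, k0=4, canonical_size=224, k_min=2, k_max=6):
    """Map an RoI to a pyramid level; the last four values are x1, y1, x2, y2."""
    x1, y1, x2, y2 = roi[-4:]
    w = max(x2 - x1, 1e-6)  # guard against zero or negative extents
    h = max(y2 - y1, 1e-6)
    k = int(math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_size)))
    return min(max(k, k_min), k_max)


# Illustrative RoIs only, not data from the issue.
rpn_rois = [(0, 0, 10, 10), (0, 0, 112, 112), (0, 0, 224, 224), (0, 0, 2000, 2000)]
leveled_idxs = [[] for _ in range(5)]
for idx, roi in enumerate(rpn_rois):
    leveled_idxs[fpn_level(roi) - 2].append(idx)
```

Clamping matters here: without it, a very small or very large RoI would produce an index outside the five lists.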

So, could you tell me how to run a test? I need to know the mAP for my task.

How can I fix this?
