xmyqsh / fpn
Feature Pyramid Network
Hi, I've been looking for a working TensorFlow implementation of FPN for some time now, and I think that this one actually works :).
I'm using TFFRCNN to establish a baseline (this repo also seems to be a direct port of that, but I could be mistaken?). First I tried training and testing on Pascal VOC 2007. That, sadly, didn't give an increase in accuracy (TFFRCNN reported 0.7 mAP and this reported 0.698 mAP; both were trained for 160k iterations), but the RPN loss during training was really good, so that gave me hope :)
But the COCO dataset seems to be a better candidate for testing this: first, because this is what the authors of the FPN paper report on, and second, because the COCO evaluation metrics are significantly more fine-grained.
Below are the results:
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.17, 0.20,  0.03
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.34, 0.37,  0.03
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.16, 0.20,  0.04
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.03, 0.08,  0.05
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.18, 0.23,  0.05
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29, 0.27, -0.02
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.19, 0.21,  0.02
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.27, 0.33,  0.06
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.27, 0.33,  0.06
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05, 0.13,  0.08
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30, 0.39,  0.09
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.47, 0.47,  0.00
Where the 3 numbers at the end are (TFFRCNN, FPN, difference between the two).
The paper uses a slightly different training and testing set (training_set + train35k for training and minival for testing; minival is only 5k images, while val is 40k images). But the relative differences between Faster R-CNN and FPN are in the same ballpark. We're seeing large increases in performance for small instances (both AR and AP), which is exactly what FPN sets out to do. So congrats! @xmyqsh :). The only result that's worse than TFFRCNN is the large instances, but that may be remedied by two things. First, I only used (P3-P5) for the class/bbox heads (similar to the paper), but I see you now use P6 as well. Second, I accidentally used OHEM when training/testing TFFRCNN, and not for FPN, so the test is actually not completely fair to FPN.
I'm going to focus on implementing RoiAlign and attaching a Mask head, so we can maybe replicate the results of Mask R-CNN.
PS. @xmyqsh, should I open a pull request so that we can all train/test on COCO? (I just took the COCO dataset code from TFFRCNN and made a few changes to the training code, so that instances with no gt boxes in the training set are handled.)
When I train on my own dataset end to end, it shows this error:
from ..utils.cython_bbox import bbox_overlaps
ImportError:no module named cython_bbox
I can't find cython_bbox.py in utils.
Please help me, thanks!
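A note on the missing file: `cython_bbox` is not expected to exist as a `.py` file. In Faster R-CNN style code bases it is normally a Cython extension that only becomes importable after the `lib` directory has been compiled, so the first thing to check is presumably whether that build step ran without errors. For debugging without the compiled module, a pure-NumPy stand-in along these lines can be used. It is a sketch, not the repository's implementation; it assumes boxes are `[x1, y1, x2, y2]` with the inclusive `+1` pixel convention and will be slower than the compiled version.

```python
import numpy as np


def bbox_overlaps(boxes, query_boxes):
    """Return the (N, K) IoU matrix between boxes (N, 4) and query_boxes (K, 4).

    Assumption: boxes are [x1, y1, x2, y2] with inclusive pixel coordinates.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    query_boxes = np.asarray(query_boxes, dtype=np.float64).reshape(-1, 4)

    box_areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    query_areas = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
                   (query_boxes[:, 3] - query_boxes[:, 1] + 1))

    # Width and height of each pairwise intersection, clipped at zero.
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)

    union = box_areas[:, None] + query_areas[None, :] - inter
    return inter / union
```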
Could you share the trained model parameters, on COCO or VOC? @xmyqsh
Hi, you reference the folder pascal_voc0712 in the training/test scripts, but what is the structure of that folder?
I guess it contains the Pascal VOC 2007 and Pascal VOC 2012 datasets, but I can't find instructions on how to create that folder.
Please help me.
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
File "./tools/train_net_alt_opt.py", line 109, in
max_iters=max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 513, in train_net
sw.train_model(sess, vs_names, max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 406, in train_model
self.train_rpn(sess, vs_names[0], max_iters[0], init_model=self.pretrained_mod
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 154, in train_rpn
raise 'Check your pretrained model {:s}'.format(init_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not
Command exited with non-zero status 1
8.84user 0.76system 0:08.69elapsed 110%CPU (0avgtext+0avgdata 928076maxresident)k
0inputs+2224outputs (0major+134721minor)pagefaults 0swaps
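The `TypeError` at the end of this traceback hides the real problem. Raising a plain string is not allowed in Python 2.6 and later, so the `raise '...'` statement at line 154 fails before its message is ever shown. The message it was trying to deliver is "Check your pretrained model", which most likely means `data/pretrain_model/Resnet50.npy` is missing or could not be loaded. A minimal sketch of the same check with a real exception class follows; the helper name `check_pretrained_model` is hypothetical and not part of the repository.

```python
import os


def check_pretrained_model(init_model):
    """Return the path if the weights file exists, else raise a readable error.

    Hypothetical helper: a string cannot be raised, so wrap the message
    in an exception class instead.
    """
    if init_model is None or not os.path.isfile(init_model):
        raise IOError('Check your pretrained model {}'.format(init_model))
    return init_model
```

With this in place, a missing weights file reports its own path instead of the unrelated `TypeError`.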
Is it proper to backprop the losses of all the proposal target layers?
There are four proposal target layers in total, with _feat_stride equal to 4/8/16/32.
Has anybody else thought about this question?
Hi, I encountered a memory leak when running inference on images with the trained FPN model.
In my case, I load the model in Python with net = caffe.Net(...) and use net.forward() to get the detection results (scores, predicted bboxes). I monitored the memory usage of the program and found that the more inferences are done, the more memory is used. I looked into the problem and found it might have something to do with the proposal_layer.py file.
I found that if I comment out the lines after
FPN/lib/rpn_msr/proposal_layer.py
Line 194 in 5473ce0
FPN/lib/rpn_msr/proposal_layer.py
Line 200 in 5473ce0
FPN/lib/rpn_msr/proposal_layer.py
Lines 195 to 198 in 5473ce0
Does anybody else have this problem, and is my guess correct?
Any opinions will be appreciated.
InvalidArgumentError (see above for traceback): Shape [-1,5] has negative dimensions
[[Node: leveled_rois_4 = Placeholderdtype=DT_FLOAT, shape=[?,5], _device="/job:localhost/replica:0/task:0/gpu:0"]]
We've got an error while stopping in post-mortem: <type 'exceptions.KeyboardInterrupt'>
Hi @xmyqsh,
When I run the training command, I get this error message:
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 23, in <module>
from lib.fast_rcnn.train import get_training_roidb, train_net
File "./faster_rcnn/../lib/fast_rcnn/__init__.py", line 9, in <module>
from . import train
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 15, in <module>
from .nms_wrapper import nms_wrapper
File "./faster_rcnn/../lib/fast_rcnn/nms_wrapper.py", line 14, in <module>
from ..nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nms
Command exited with non-zero status 1
Do you have any idea how to fix this issue?
Thank you!
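`gpu_nms` is normally a compiled extension in Faster R-CNN ports, so this import error usually means the build step under `lib` did not run or failed part-way (for example because `nvcc` was not found). That is an assumption about this setup, not a confirmed diagnosis. For CPU-only debugging, greedy non-maximum suppression can be sketched in NumPy as below. `nms_cpu` is an illustrative name, and `dets` is assumed to be an `(N, 5)` array of `[x1, y1, x2, y2, score]`.

```python
import numpy as np


def nms_cpu(dets, thresh):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much.

    Returns the indices of the kept rows, highest score first.
    """
    dets = np.asarray(dets, dtype=np.float64).reshape(-1, 5)
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current box with all remaining boxes.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only boxes whose overlap with the current box is small enough.
        order = rest[iou <= thresh]
    return keep
```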
Hi, I noticed that in your experiment script you mention using a ResNet-50 pretrained model. Could you please share the link for downloading this model? Thank you very much.
Hello, after 100 successful iterations the following error suddenly appeared. Does anyone have an idea about this?
iter: 0 / 200000, total loss: 18.0627, rpn_loss_cls: 0.9430, rpn_loss_box: 10.3524, loss_cls: 4.4159, loss_box: 2.3514, lr: 0.000500
speed: 3.120s / iter
2017-12-29 13:22:04.715605: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2001 get requests, put_count=1760 evicted_count=1000 eviction_rate=0.568182 and unsatisfied allocation rate=0.670165
2017-12-29 13:22:04.715654: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
image: 025553069_K1210503_T001_1_10.jpg iter: 20 / 200000, total loss: 3.2476, rpn_loss_cls: 0.2447, rpn_loss_box: 2.8628, loss_cls: 0.0771, loss_box: 0.0630, lr: 0.000500
speed: 1.036s / iter
image: 030446539_K1221297_419_1_05.jpg iter: 40 / 200000, total loss: 4.8578, rpn_loss_cls: 0.1838, rpn_loss_box: 3.7386, loss_cls: 0.4705, loss_box: 0.4649, lr: 0.000500
speed: 1.033s / iter
2017-12-29 13:22:46.655261: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3000 get requests, put_count=3099 evicted_count=1000 eviction_rate=0.322685 and unsatisfied allocation rate=0.308
2017-12-29 13:22:46.655287: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
image: 030454261_K1221455_T001_5_04.jpg iter: 60 / 200000, total loss: 2.1451, rpn_loss_cls: 0.1146, rpn_loss_box: 1.7099, loss_cls: 0.2562, loss_box: 0.0643, lr: 0.000500
speed: 0.708s / iter
image: 025913309_K1214499_161_1_07.jpg iter: 80 / 200000, total loss: 2.7814, rpn_loss_cls: 0.2832, rpn_loss_box: 1.0201, loss_cls: 0.6792, loss_box: 0.7989, lr: 0.000500
speed: 1.077s / iter
image: 030141742_K1217637_285_1_28.jpg iter: 100 / 200000, total loss: 2.7696, rpn_loss_cls: 0.2172, rpn_loss_box: 0.8301, loss_cls: 0.7361, loss_box: 0.9862, lr: 0.000500
speed: 0.945s / iter
Traceback (most recent call last):
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "faster_rcnn/../lib/rpn_msr/anchor_target_layer.py", line 151, in anchor_target_layer
argmax_overlaps = overlaps.argmax(axis=1) # (A)
ValueError: attempt to get argmax of an empty sequence
2017-12-29 13:23:49.293478: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
2017-12-29 13:23:49.330989: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "faster_rcnn/train_net.py", line 106, in
restore=bool(int(args.restore)))
File "faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "faster_rcnn/../lib/fast_rcnn/train.py", line 261, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "faster_rcnn/train_net.py", line 98, in
network = get_network(args.network_name)
File "faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()
InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
Command exited with non-zero status 1
167.60user 9.99system 2:55.72elapsed 101%CPU (0avgtext+0avgdata 2398032maxresident)k
0inputs+3640outputs (0major+1128621minor)pagefaults 0swaps
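The inner `ValueError` comes from `overlaps.argmax(axis=1)` in `anchor_target_layer.py`. NumPy raises it when the reduced axis has length zero, which would happen if `overlaps` has one column per ground-truth box and the current image has none. Whether that is what happened here is an assumption, since the log does not print the box count for the failing image. The sketch below reproduces the error and shows one possible guard, dropping images without boxes before training. The roidb layout (a list of dicts with a `'boxes'` array) and both function names are assumed for illustration.

```python
import numpy as np


def argmax_fails_without_gt(num_anchors=6):
    """Show that argmax over an empty ground-truth axis raises ValueError."""
    overlaps = np.zeros((num_anchors, 0), dtype=np.float32)  # no gt columns
    try:
        overlaps.argmax(axis=1)
    except ValueError:
        return True
    return False


def drop_images_without_gt(roidb):
    """Keep only entries that have at least one ground-truth box.

    Assumption: each entry is a dict with a 'boxes' array of shape (N, 4).
    """
    return [entry for entry in roidb if len(entry['boxes']) > 0]
```

Filtering is the simplest option; an alternative would be to treat such images as pure background inside the layer, which requires changes to the repository code.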
num gt: 2
num fg: 17
num bg: 111
cudaCheckError() failed in ROIPoolForward: invalid device function
cudaCheckError() failed in ROIPoolForward: driver shutting down
2017-08-21 10:19:34.578317: F tensorflow/stream_executor/cuda/cuda_driver.cc:312] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 4)
2017-08-21 10:19:34.578347: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x61a7d80: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2017-08-21 10:19:34.578429: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
2017-08-21 10:19:34.578450: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Command terminated by signal 6
65.76user 10.62system 1:13.23elapsed 104%CPU (0avgtext+0avgdata 1891988maxresident)k
0inputs+3016outputs (0major+831469minor)pagefaults 0swaps
@xmyqsh, I ran into this problem. How can I solve it?
When I run train_net.py, I have met the following error:
Traceback (most recent call last):
File "faster_rcnn/train_net.py", line 26, in
from lib.networks.factory import get_network
File "faster_rcnn/../lib/networks/init.py", line 8, in
from .FPN_train import FPN_train
File "faster_rcnn/../lib/networks/FPN_train.py", line 9, in
from .network import Network
File "faster_rcnn/../lib/networks/network.py", line 5, in
from ..roi_pooling_layer import roi_pooling_op as roi_pool_op
File "faster_rcnn/../lib/roi_pooling_layer/init.py", line 7, in
import roi_pooling_op
File "faster_rcnn/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in
_roi_pooling_module = tf.load_op_library(filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: faster_rcnn/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
What's the reason? Could anybody give some advice?
Thank you.
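An undefined `_ZTIN10tensorflow8OpKernelE` symbol (the typeinfo for `tensorflow::OpKernel`) when loading a custom op is typically a link problem. From TensorFlow 1.4 on, custom ops have to be linked against `libtensorflow_framework.so`, and builds with GCC 5 or newer may also need `-D_GLIBCXX_USE_CXX11_ABI=0`. Whether either applies here depends on the installed TensorFlow and compiler versions, which the report does not state. As a sketch, the extra flags for rebuilding `roi_pooling.so` could be assembled like this; `custom_op_flags` is a hypothetical helper, and the two directories would come from `tf.sysconfig.get_include()` and `tf.sysconfig.get_lib()`.

```python
def custom_op_flags(tf_include_dir, tf_lib_dir, old_cxx_abi=True):
    """Build g++ flags for a TensorFlow custom op (hypothetical helper).

    tf_include_dir / tf_lib_dir: values of tf.sysconfig.get_include() and
    tf.sysconfig.get_lib() for the TensorFlow that will load the op.
    old_cxx_abi: assumption that TensorFlow was built with the pre-C++11 ABI.
    """
    compile_flags = ['-std=c++11', '-shared', '-fPIC', '-I' + tf_include_dir]
    if old_cxx_abi:
        compile_flags.append('-D_GLIBCXX_USE_CXX11_ABI=0')
    # Linking the framework library is what resolves tensorflow::OpKernel.
    link_flags = ['-L' + tf_lib_dir, '-ltensorflow_framework']
    return compile_flags + link_flags
```

The resulting list would be added to the compiler invocation that the repository's build script already uses for the op.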
I managed to train on my own dataset. After 120k iterations I ran a test using a command similar to
CUDA_VISIBLE_DEVICES=0 python ./faster_rcnn/test_net.py --gpu 0 --weights output/FPN_end2end/voc_0712_trainval/FPN_iter_370000.ckpt --imdb voc_0712_test --cfg ./experiments/cfgs/FPN_end2end.yml --network FPN_test
But the detection results for some images turn out to be empty, while the others seem normal.
I have my own loader inherited from imdb and it works well on faster-rcnn from this repository:
I'm totally confused; any suggestions will be appreciated, thanks.
Hi, I used your model to train on VOC2007.
I got the following checkpoints:
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage1_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_RPN_iter_8.ckpt.index
FPN_alt_opt_stage1_RPN_iter_8.ckpt.meta
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage2_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_RPN_iter_8.ckpt.index
FPN_alt_opt_stage2_RPN_iter_8.ckpt.meta
but I don't have the final checkpoint, and I get this error:
./tools/test_net.py --gpu 0 --weights --imdb voc_2007_test --cfg experiments/cfgs/FPN_alt_opt.yml --network FPN_alt_opt_train_test
python: can't open file './tools/test_net.py': [Errno 2] No such file or directory
How can I fix it?
Thank you!
Hi there! While reading Feature Pyramid Networks for Object Detection, I found a formula for choosing the feature map for an RoI based on the size of the region proposal. Can you show me how you implemented this? I wish to implement FPN on the new Object Detection API provided by TensorFlow.
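For reference, the formula in question is Eq. 1 of the FPN paper: an RoI of width w and height h is assigned to pyramid level k = floor(k0 + log2(sqrt(w*h) / 224)), with k0 = 4, so that a 224 x 224 RoI lands on P4. The sketch below implements that equation in NumPy and clamps the result to the levels P2 to P5 used for the heads in the paper. It is not taken from this repository, whose implementation may differ (for example in which levels it uses), and the function name is made up.

```python
import numpy as np


def map_rois_to_levels(rois, k0=4, k_min=2, k_max=5, canonical_size=224.0):
    """Assign each RoI [x1, y1, x2, y2] to a pyramid level (FPN paper, Eq. 1).

    k = floor(k0 + log2(sqrt(w * h) / canonical_size)), clamped to [k_min, k_max].
    """
    rois = np.asarray(rois, dtype=np.float64).reshape(-1, 4)
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    # Guard against zero-area boxes before taking the logarithm.
    scale = np.sqrt(np.maximum(w * h, 1e-6))
    k = np.floor(k0 + np.log2(scale / canonical_size))
    return np.clip(k, k_min, k_max).astype(np.int32)
```

Each RoI is then pooled from the feature map of its assigned level, using that level's stride to convert image coordinates to feature-map coordinates.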
Hi,
Thanks for your excellent work! I'm a new learner and it benefits me a lot.
I notice that there is no test_net.py in FPN/faster_rcnn/.
When I use the test_net.py from another project, I get an error:
Traceback (most recent call last):
File "faster_rcnn/test_net.py", line 85, in
network = get_network(args.network_name)
File "faster_rcnn/../lib/networks/factory.py", line 19, in get_network
return FPN_test()
File "faster_rcnn/../lib/networks/FPN_test.py", line 25, in init
self.setup()
File "faster_rcnn/../lib/networks/FPN_test.py", line 231, in setup
.fc(n_classes, relu=False, name='cls_score')
File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "faster_rcnn/../lib/networks/network.py", line 390, in fc
dim *= d
TypeError: unsupported operand type(s) for *=: 'int' and 'NoneType'
Thx again :)
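The `TypeError` is raised while `fc` multiplies together the static dimensions of its input, so at least one non-batch dimension is `None` when the graph is built. One possible reason, which is not confirmed by the traceback, is that the op feeding `cls_score` does not declare a static output shape in the configuration used by the borrowed `test_net.py`. A small diagnostic such as the following, with the hypothetical name `flattened_dim`, reports the offending shape instead of failing inside the loop.

```python
def flattened_dim(shape):
    """Product of all non-batch dimensions, with a readable error on None.

    shape: a list such as [None, 7, 7, 256]; the first entry (batch) is ignored.
    """
    dims = list(shape)[1:]
    if any(d is None for d in dims):
        raise ValueError(
            'fc input has unknown non-batch dimensions: {}'.format(list(shape)))
    dim = 1
    for d in dims:
        dim *= d
    return dim
```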
@xmyqsh
Did you use both VOC2007 and VOC2012?
Could you share the corresponding results?
Hi,
I used your shared ckpt file to run validation with VOC07+12 but get -1 for every mAP value (I also tried 07 only and got the same result :( ) @xmyqsh
Does anyone have any ideas about this error?
I want to use FPN with a VGG model since I only have 2 GTX 980 cards with 8 GB of memory. Do you plan to share a VGG-based model?
An error happened when constructing the training network; it seems some parameters of the input before fc6 are lost. Could you give some suggestions?
The alt_opt training is done. But when I test the network, the file ../tools/test_net.py doesn't exist.
I cannot download the .ckpt file from Google Drive. Is there any other way to download it?
Hi @xmyqsh, first of all, thanks for your work. The code is neat and wonderful. But I wonder if you could add more information about the dataset structure and how to use this code to train and test. BTW, it seems a little difficult to switch the training set to my own dataset; could you give me some advice? The pascal_voc.py in datasets contains many functions and I am not sure whether I should rewrite all of them (my dataset structure is similar to VOC).
@ouchjm
Hi, I encountered a memory allocation problem during initialization (solving).
InternalError (see above for traceback): Dst tensor is not initialized. [[Node: fc7_new/weights/Momentum/Initializer/Const = Const[_class=["loc:@fc7_new/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [4096,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
and
Limit: 5111519641
InUse: 5111519488
MaxInUse: 5111519488
NumAllocs: 148
MaxAllocSize: 4991561984
My setting:
So I assume the model is too large to fit in memory?
TensorFlow does have the issue of taking up too much memory during initialization. Has anyone encountered or resolved this issue?
Thanks!
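The allocator statistics quoted above already answer part of the question. "Dst tensor is not initialized" is a message TensorFlow commonly emits when a GPU allocation fails, and here `Limit` minus `InUse` leaves almost nothing free. The arithmetic below uses only the numbers from the report; the 4096 x 4096 float32 tensor is the momentum slot for `fc7_new/weights` named in the error.

```python
# Numbers copied from the allocator statistics in the report.
limit_bytes = 5111519641
in_use_bytes = 5111519488
free_bytes = limit_bytes - in_use_bytes

# Size of one [4096, 4096] float32 tensor (4 bytes per element).
fc7_bytes = 4096 * 4096 * 4
fc7_mib = fc7_bytes / float(1024 ** 2)

print(free_bytes)  # 153
print(fc7_mib)     # 64.0
```

With 153 bytes free, a 64 MiB tensor cannot be placed, so the assumption that the model plus optimizer state does not fit in the memory available to TensorFlow on this GPU looks reasonable.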
When training on my own dataset, some images contain none of the objects I want to annotate, so those images have no labels. During training I get:
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
But when I set the labels of those images to 0 0 0 0 and generate the XML files, training fails with:
tensorflow.python.framework.errors_impl.UnknownError: exceptions.OverflowError: cannot convert float infinity to integer @xmyqsh
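Writing `0 0 0 0` as a placeholder box probably trades one failure for another. A zero-size box can produce degenerate or non-finite values once regression targets are computed, which would be consistent with the `OverflowError` above, although the line that actually fails is not shown. A sketch of a sanity check that could be run over annotations before training follows; `valid_box_mask` is a hypothetical helper and `[x1, y1, x2, y2]` in zero-based pixel coordinates is the assumed layout.

```python
import numpy as np


def valid_box_mask(boxes, width, height):
    """Return a boolean mask of boxes that are finite, inside the image and non-empty.

    Assumption: boxes are [x1, y1, x2, y2] in zero-based pixel coordinates.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    x1, y1, x2, y2 = boxes.T
    finite = np.isfinite(boxes).all(axis=1)
    inside = (x1 >= 0) & (y1 >= 0) & (x2 < width) & (y2 < height)
    non_empty = (x2 > x1) & (y2 > y1)
    return finite & inside & non_empty
```

Images whose boxes all fail the check can then be excluded from training rather than given a dummy annotation.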
In the build_loss function, I see that the labels are filtered for only foreground/background regions before calculating the rpn class loss and that the SmoothL1Loss is normalized by the number of foreground regions. I don't see this in the original Faster-RCNN implementation. @xmyqsh Can you please explain why this is necessary?
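For reference, the behaviour described in this question can be written out in NumPy. The sketch assumes the usual RPN label convention (1 = foreground, 0 = background, -1 = anchor ignored during training). Dropping the -1 entries before the cross-entropy plays the same role as an ignore label in a softmax loss layer, and dividing the Smooth L1 sum by the foreground count keeps the box term on a per-positive-anchor scale. Function and argument names are illustrative and are not the repository's `build_loss`.

```python
import numpy as np


def smooth_l1(x):
    """Elementwise Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)


def rpn_losses(cls_prob, labels, bbox_pred, bbox_targets):
    """Classification and box losses for one set of anchors (illustrative).

    cls_prob: (A, 2) softmax probabilities; labels: (A,) in {-1, 0, 1};
    bbox_pred, bbox_targets: (A, 4).
    """
    labels = np.asarray(labels)
    cls_prob = np.asarray(cls_prob, dtype=np.float64)

    # Cross-entropy only over anchors labelled foreground or background.
    keep = labels != -1
    kept_prob = cls_prob[keep]
    kept_labels = labels[keep]
    picked = kept_prob[np.arange(kept_labels.size), kept_labels]
    cls_loss = -np.mean(np.log(picked))

    # Box regression only over foreground anchors, normalized by their count.
    fg = labels == 1
    diff = (np.asarray(bbox_pred, dtype=np.float64)[fg] -
            np.asarray(bbox_targets, dtype=np.float64)[fg])
    num_fg = max(int(fg.sum()), 1)
    box_loss = smooth_l1(diff).sum() / num_fg
    return cls_loss, box_loss
```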
When I run FPN_alt_opt.sh, the following problem occurs:
dxt@dxt-System-Product-Name:~/FPN-master (2)$ ./experiments/scripts/FPN_alt_opt.sh 0 FPN_alt_opt pascal_voc0712
@xmyqsh I don't know how to solve this. Please help me, thank you!
When I train on my data using
nohup ./experiments/scripts/FPN_end2end.sh 0 FPN pascal_voc2007 --set RNG_SEED 42 TRAIN.SCALES "[800]" > FPN.log 2>&1 &
I ran into this problem:
ignore bn4f_branch2a offset
ignore res5a_branch2c weights
ignore res5a_branch2b weights
ignore res5a_branch2a weights
ignore res3d_branch2b weights
ignore res3d_branch2c weights
ignore res3d_branch2a weights
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in multiply
pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in exp
pred_h = np.exp(dh) * heights[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in multiply
pred_h = np.exp(dh) * heights[:, np.newaxis]
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in <module>
restore=bool(int(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 277, in train_model
bbox_pred = bbox_pred * np.tile(self.bbox_stds, (bbox_pred.shape[0], 1)) + \
ValueError: operands could not be broadcast together with shapes (128,84) (128,8)
Command exited with non-zero status 1
13.00user 2.02system 0:14.48elapsed 103%CPU (0avgtext+0avgdata 2385504maxresident)k
0inputs+3096outputs (0major+371289minor)pagefaults 0swaps
I don't know how to fix it.
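The two shapes in the `ValueError` point at a class-count mismatch. `bbox_pred` carries four regression values per class, so its 84 columns correspond to 21 classes (the Pascal VOC default of 20 classes plus background), while the tiled `bbox_stds` has 8 columns, i.e. 2 classes, which is presumably what the custom dataset defines. The network definition and the dataset therefore need to agree on the number of classes; where that number is set in this repository is not shown in the log. The snippet below only illustrates the arithmetic, and the standard-deviation values in it are placeholders.

```python
import numpy as np


def classes_from_bbox_width(num_columns):
    """Number of classes implied by a class-specific bbox array width (4 per class)."""
    if num_columns % 4 != 0:
        raise ValueError('bbox width {} is not a multiple of 4'.format(num_columns))
    return num_columns // 4


bbox_pred = np.zeros((128, 84))               # shape reported for the network output
bbox_stds = np.tile([0.1, 0.1, 0.2, 0.2], 2)  # 8 values, as in the reported (128, 8)

net_classes = classes_from_bbox_width(bbox_pred.shape[1])
data_classes = classes_from_bbox_width(bbox_stds.shape[0])
print(net_classes, data_classes)  # 21 2
```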
Cool code! I wonder whether you have reproduced the exact results of the FPN paper?
Best!
guangxing
If I want to use ResNet-101, what should I change?
When I started to train end-to-end, this error happened:
W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TRAIN'
Caused by op 'RPN/rpn_rois/PyFunc', defined at:
File "./faster_rcnn/train_net.py", line 101, in
network = get_network(args.network_name)
File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 427, in setup
.proposal_layer(_feat_stride[2:], anchor_size[2:], 'TRAIN',name = 'rpn_rois'))
File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "./faster_rcnn/../lib/networks/network.py", line 345, in proposal_layer
[tf.float32]),
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
name=name)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in init
self._traceback = _extract_stack()
UnknownError (see above for traceback): KeyError: b'TRAIN'
[[Node: RPN/rpn_rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/Reshape_2/_717, RPN/rpn_bbox_pred/BiasAdd/_719, RPN/Reshape_5/_721, RPN/rpn_bbox_pred_1/BiasAdd/_723, RPN/Reshape_8/_725, RPN/rpn_bbox_pred_2/BiasAdd/_727, RPN/Reshape_11/_729, RPN/rpn_bbox_pred_3/BiasAdd/_731, RPN/Reshape_14/_733, RPN/rpn_bbox_pred_4/BiasAdd/_735, _arg_im_info_0_5, RPN/rpn_rois/PyFunc/input_11, RPN/rpn_rois/PyFunc/input_12, RPN/rpn_rois/PyFunc/input_13)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
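The paths in this traceback show Python 3.6. Under Python 3, string arguments passed through `tf.py_func` arrive in the wrapped Python function as `bytes`, so a config lookup keyed by the mode sees `b'TRAIN'` instead of `'TRAIN'` and raises `KeyError`. A sketch of a workaround, assuming the layer receives the mode as a single argument and uses it as a dictionary key; the helper name and the example `cfg` values are made up.

```python
def normalize_cfg_key(cfg_key):
    """Return the config key as text, decoding bytes passed in by tf.py_func."""
    if isinstance(cfg_key, bytes):
        return cfg_key.decode('utf-8')
    return cfg_key


# Stand-in for the real config object; the values are placeholders.
cfg = {'TRAIN': {'RPN_POST_NMS_TOP_N': 2000}, 'TEST': {'RPN_POST_NMS_TOP_N': 300}}

mode_cfg = cfg[normalize_cfg_key(b'TRAIN')]
```

Calling the helper at the top of the Python-side layer function keeps the same code working under both Python 2 and Python 3.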
I noticed that lib/networks/network.py has been modified.
Using the previous version is fine, but using the modified version leads to gradient explosion.
By the way, the dataset I use is Caltech Pedestrian Detection.
Apologies for my poor English :)
Please help me. When I train on my own data:
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(int(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 409, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 263, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "./faster_rcnn/train_net.py", line 101, in
network = get_network(args.network_name)
File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "./faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in init
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
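A note on the ValueError in the traceback above: "attempt to get argmax of an empty sequence" is the message NumPy raises when argmax has to reduce over an axis of length zero. In an anchor target layer this would typically mean the anchor/ground-truth overlap matrix is empty, for example because an image has no ground-truth boxes or no anchors survive filtering. That cause is an assumption, not something confirmed from this repo. A minimal sketch of a guard, where `safe_argmax_overlaps` is a hypothetical helper and not a function from the repo:

```python
import numpy as np


def safe_argmax_overlaps(overlaps):
    """Return the best ground-truth index per anchor, or None if there is nothing to match.

    overlaps: array of shape (num_anchors, num_gt) holding IoU values.
    Hypothetical helper for illustration only.
    """
    if overlaps.size == 0:
        # No anchors or no ground-truth boxes: the caller should skip this
        # image (or label every anchor as background) instead of crashing.
        return None
    return overlaps.argmax(axis=1)


# Reproduce the failure mode: three anchors, zero ground-truth boxes.
empty = np.zeros((3, 0), dtype=np.float32)
try:
    empty.argmax(axis=1)
    raised = False
except ValueError:
    raised = True
```

If the guard returns None for some images, printing the shapes of the ground-truth boxes for those images is a cheap way to check whether the dataset annotations are the problem.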
I am using two GTX 970s to run the network, and it gets stuck while loading Resnet50.npy. Maybe the GPU memory is too small, but in that case I would expect it to fail at a later stage rather than while loading Resnet50.
2017-08-16 09:00:28.165008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-08-16 09:00:28.165016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y Y
2017-08-16 09:00:28.165028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: Y Y
2017-08-16 09:00:28.165033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
2017-08-16 09:00:28.165036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 970, pci bus id: 0000:06:00.0)
Computing bounding-box regression targets...
19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in <module>
restore=bool(int(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 164, in train_model
raise 'Check your pretrained model {:s}'.format(self.pretrained_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not str
Command exited with non-zero status 1
19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps
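On the TypeError above: the `raise` at line 164 of train.py raises a plain string, which Python does not allow, so the real reason the weights failed to load is hidden behind the TypeError. A minimal sketch of the usual pattern, assuming a hypothetical `load_pretrained` wrapper and a `load_fn` callable that stands in for whatever the repo actually uses to read the .npy file (neither name is taken from the repo):

```python
def load_pretrained(load_fn, path):
    """Call load_fn(path) and re-raise any failure as a proper exception.

    Hypothetical wrapper for illustration; load_fn is a stand-in for the
    real weight-loading call.
    """
    try:
        return load_fn(path)
    except Exception as err:
        # Raise an exception instance, not a string, and keep the original
        # error text so the underlying cause stays visible.
        raise RuntimeError(
            'Check your pretrained model {:s} ({})'.format(path, err))
```

With this pattern a missing or unreadable file shows up as a RuntimeError that still carries the original message, instead of the TypeError about old-style classes.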
When I try to train FPN, I get the following exception:
Exception: Check your pretrained model data/pretrain_model/Resnet50.npy
Where can I download Resnet50.npy?
Hi, thank you for your code. I'm new to this. I used your network and changed the backbone to ResNeXt, using a resnext50 model converted from a Caffe model, but I get -1 mAP for all classes. Can you tell me why? Thank you, and please forgive my poor English!