
mx-maskrcnn's People

Contributors

anuragarnab, falaktheoptimist, huangzehao, kohillyang, lokeshh, rogerchern, winstywang, xinghedyc, zehaos, ziyuehuang


mx-maskrcnn's Issues

How to test my own picture

I tried to test my own image, but got errors. Could you explain in detail how to test it? Thank you very much.
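To test a single image of your own, one option is to run the module that scripts/demo_single_image.sh invokes and point --image_name at your file. The flag names and values below are copied from the command and Namespace printed in the Qt "xcb" issue further down; the helper build_demo_command is hypothetical, and it assumes you run from the repository root with a trained model/final-0000.params (prefix model/final, epoch 0).

```python
def build_demo_command(image_name, prefix='model/final', epoch=0, gpu=0,
                       dataset='Cityscape', thresh=0.3):
    """Build the argv list for rcnn.tools.demo_single_image (hypothetical helper).

    Flag names mirror the command in scripts/demo_single_image.sh; '--vis true'
    is passed exactly as that script passes it.
    """
    return ['python2', '-m', 'rcnn.tools.demo_single_image',
            '--network', 'resnet_fpn',
            '--dataset', dataset,
            '--prefix', prefix,
            '--epoch', str(epoch),
            '--gpu', str(gpu),
            '--image_name', image_name,
            '--thresh', str(thresh),
            '--vis', 'true']
```

For example, `subprocess.check_call(build_demo_command('my_image.jpg'))` from the repository root runs the demo on my_image.jpg.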

When I run bash scripts/train_alternate.sh, it shows the error below:

Traceback (most recent call last):
  File "train_alternate_mask_fpn.py", line 114, in <module>
    main()
  File "train_alternate_mask_fpn.py", line 111, in main
    args.rcnn_epoch, args.rcnn_lr, args.rcnn_lr_step)
  File "train_alternate_mask_fpn.py", line 31, in alternate_train
    train_shared=False, lr=rpn_lr, lr_step=rpn_lr_step)
  File "/home/luo/mx-maskrcnn-master/rcnn/tools/train_rpn.py", line 149, in train_rpn
    arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/module/base_module.py", line 460, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/luo/mx-maskrcnn-master/rcnn/core/module.py", line 141, in bind
    force_rebind=False, shared_module=None)
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/module/module.py", line 417, in bind
    state_names=self._state_names)
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
    shared_group))
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/luo/mx-maskrcnn-master/incubator-mxnet/python/mxnet/symbol/symbol.py", line 1491, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 1024, 2048)
bbox_weight: (1, 12, 174592)
bbox_target: (1, 12, 174592)
label: (1, 523776)
[15:49:36] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal

What should I do? Thanks.
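CUDA reports "invalid device ordinal" when a process asks for a GPU index that does not exist on the machine (or is hidden by CUDA_VISIBLE_DEVICES). A minimal pre-flight check is sketched below, assuming the training script receives GPU ids as a comma-separated string such as "0,1,2,3" and that you know how many GPUs are visible (for example from nvidia-smi); parse_gpu_ids is a hypothetical helper, not part of the repository.

```python
def parse_gpu_ids(gpu_arg, num_visible):
    """Parse a comma-separated GPU list and reject ids this machine does not have.

    gpu_arg: string such as "0,1,2,3" (assumed format).
    num_visible: number of GPUs visible to this process.
    """
    ids = [int(tok) for tok in gpu_arg.split(',') if tok.strip()]
    bad = [i for i in ids if i < 0 or i >= num_visible]
    if bad:
        raise ValueError('GPU id(s) %s not available; only %d GPU(s) visible'
                         % (bad, num_visible))
    return ids
```

On a single-GPU machine, parse_gpu_ids('0,1,2,3', 1) raises ValueError, which is likely the same kind of mismatch that surfaces inside MXNet as "invalid device ordinal".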

The number of sampled regular locations

The number of sampled regular locations in your implementation seems to be 3.

Dtype h_stride = (hend - hstart)/3.0;
Dtype w_stride = (wend - wstart)/3.0;

But the paper's authors sample 4 regular locations.

Is 3 better than 4?
Do you observe diminishing returns as the number of regular locations increases?
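For comparison, the Mask R-CNN paper describes ROIAlign as bilinearly sampling four regular locations in each bin (a 2×2 grid) and aggregating them (max or average). The sketch below makes the per-axis sample count n a parameter so 2×2 and 3×3 grids can be compared; it places samples at sub-cell centres and averages them, which is an assumption about placement and not necessarily what the repository's CUDA kernel does. Plain Python lists stand in for a feature map.

```python
import math


def bilinear(feat, h, w):
    """Bilinearly interpolate a 2-D feature map (list of rows) at (h, w)."""
    height, width = len(feat), len(feat[0])
    # Clamp the sample point into the valid coordinate range.
    h = min(max(h, 0.0), height - 1.0)
    w = min(max(w, 0.0), width - 1.0)
    h0, w0 = int(math.floor(h)), int(math.floor(w))
    # Clamp the upper neighbours so they never index past the last row/column.
    h1, w1 = min(h0 + 1, height - 1), min(w0 + 1, width - 1)
    a, b = h - h0, w - w0  # fractional offsets = weights of the upper neighbours
    return ((1 - a) * (1 - b) * feat[h0][w0] + (1 - a) * b * feat[h0][w1]
            + a * (1 - b) * feat[h1][w0] + a * b * feat[h1][w1])


def roi_align_bin(feat, hstart, hend, wstart, wend, n):
    """Average of n x n regularly spaced bilinear samples inside one ROI bin.

    n = 2 gives the four regular locations described in the Mask R-CNN paper.
    """
    h_stride = (hend - hstart) / float(n)
    w_stride = (wend - wstart) / float(n)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Sample at the centre of each of the n x n sub-cells.
            total += bilinear(feat, hstart + (i + 0.5) * h_stride,
                              wstart + (j + 0.5) * w_stride)
    return total / (n * n)
```

On a feature map that is linear in h and w, both grids return the value at the bin centre, so any difference between n = 2 and n = 3 only shows up on non-linear features.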

AttributeError: 'module' object has no attribute 'ROIAlign' When I run demo.sh

I got some warnings when I ran make:

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/cuda/include -I/usr/include/python2.7 -c gpu_nms.cpp -o build/temp.linux-x86_64-2.7/gpu_nms.o -Wno-unused-function
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1788:0,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from gpu_nms.cpp:499:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-Bsymbolic-functions -Wl,-z,relro -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/nms_kernel.o build/temp.linux-x86_64-2.7/gpu_nms.o -L/usr/local/cuda/lib64 -Wl,-R/usr/local/cuda/lib64 -lcudart -o /home/lzq12138/lkd/maskrcnn/mx-maskrcnn/rcnn/cython/gpu_nms.so
cd rcnn/pycocotools; python2 setup.py build_ext --inplace; rm -rf build; cd ../../
Warning: Extension name '_mask' does not match fully qualified name 'rcnn.pycocotools._mask' of '_mask.pyx'
Compiling _mask.pyx because it depends on /usr/local/lib/python2.7/dist-packages/Cython/Includes/numpy/__init__.pxd.

I can run bash scripts/train_alternate.sh successfully. Then I stopped training with Ctrl+C and switched to bash scripts/demo.sh:

Error in CustomOp.forward: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mxnet/operator.py", line 782, in forward_entry
    aux=tensors[4])
  File "/home/lzq12138/lkd/maskrcnn/mx-maskrcnn/rcnn/PY_OP/fpn_roi_pooling.py", line 76, in forward
    roi_pool = mx.nd.ROIAlign(feat_dict['stride%s' % s], _rois, (self._pool_h, self._pool_w), 1.0 / float(s))
AttributeError: 'module' object has no attribute 'ROIAlign'

[00:34:38] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [00:34:38] src/operator/custom/custom.cc:293: Check failed: reinterpret_cast(params.info->callbacks[kCustomOpForward])( ptrs.size(), ptrs.data(), tags.data(), reinterpret_cast<const int*>(req.data()), static_cast(ctx.is_train), params.info->contexts[kCustomOpForward])

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272c4c) [0x7f1961857c4c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x33ffaf) [0x7f1961924faf]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20ef112) [0x7f19636d4112]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXExecutorForward+0x15) [0x7f1963665675]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f19d312ce40]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f19d312c8ab]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f19d333c3df]
[bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f19d3340d82]
[bt] (8) python2(PyObject_Call+0x43) [0x4b0cb3]
[bt] (9) python2(PyEval_EvalFrameEx+0x5faf) [0x4c9faf]
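The traceback shows MXNet being imported from /usr/local/lib/python2.7/dist-packages/mxnet, a system-wide install, rather than from the incubator-mxnet copy inside the repository that the training tracebacks on this page use. A build without the ROIAlign operator would be consistent with this AttributeError. A small check of which MXNet is loaded and whether it exposes the operator (the function name is hypothetical):

```python
def mxnet_roialign_status(mx_module):
    """Return (import location, whether mx.nd.ROIAlign exists) for an imported mxnet."""
    location = getattr(mx_module, '__file__', '<unknown>')
    nd = getattr(mx_module, 'nd', None)
    return location, nd is not None and hasattr(nd, 'ROIAlign')
```

Usage: `import mxnet as mx; print(mxnet_roialign_status(mx))`. If the second value is False, one thing to try is putting the repository's incubator-mxnet/python directory ahead of site-packages on PYTHONPATH (an assumption about your layout) so that the demo picks up the same build the training used.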

Demo test gets all "NaN scores"

Um... I'm raising this issue again.
My dataset has 5 classes. Here are the relevant settings in my config:

config.NUM_CLASSES = 5
config.SCALES = [(1024, 2048)]
config.CLASS_ID = [0, 1, 2, 3, 4]
config.TRAIN.SCALE = True (this setting is never used anywhere in your project!)
default.rpn_epoch = 4
default.rcnn_epoch = 12
dataset.ObjectSnap.NUM_CLASSES = 5
dataset.ObjectSnap.CLASS_ID = [0, 1, 2, 3, 4]
dataset.ObjectSnap.SCALES = [(1024, 2048)]

I built a small dataset of 41 images for verification. The net seems to train well. Part of my log:

Epoch[2] Batch [20] Speed: 10.90 samples/sec Train-RPNAcc=0.903181, RPNLogLoss=0.238827, RPNL1Loss=0.933737,
Epoch[2] Batch [40] Speed: 12.69 samples/sec Train-RPNAcc=0.919255, RPNLogLoss=0.200416, RPNL1Loss=0.927370,
Epoch[2] Train-RPNAcc=0.919255
Epoch[2] Train-RPNLogLoss=0.200416
Epoch[2] Train-RPNL1Loss=0.927370
Epoch[2] Time cost=7.387

Epoch[0] Batch [20] Speed: 1.18 samples/sec Train-RCNNAcc=0.936756, RCNNLogLoss=0.921319, RCNNL1Loss=1.693366, MaskACC=0.981722, MaskLogLoss=0.033281,
Epoch[0] Batch [40] Speed: 1.28 samples/sec Train-RCNNAcc=0.946408, RCNNLogLoss=0.617575, RCNNL1Loss=1.681111, MaskACC=0.983959, MaskLogLoss=0.030308,
Epoch[0] Train-RCNNAcc=0.946408
Epoch[0] Train-RCNNLogLoss=0.617575
Epoch[0] Train-RCNNL1Loss=1.681111
Epoch[0] Train-MaskACC=0.983959
Epoch[0] Train-MaskLogLoss=0.030308
Epoch[0] Time cost=67.991

Finally, I got my final-0000.params model.
However, when I test on several images from my training list, I get all-NaN scores in both the demo and the evaluation. How could that happen?

PS: One more point of doubt: the log shows a very high accuracy (around 0.8 to 0.9) even at the very beginning of training!

Hope to get your reply soon. Thanks!
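On the high accuracy at the very start: classification accuracy over sampled ROIs is dominated by background. With the default settings printed in the demo issue on this page (BATCH_ROIS = 256, FG_FRACTION = 0.25), at most 64 of 256 ROIs are foreground, so a classifier that predicts background for everything already scores at least 0.75, and more when an image has few foreground proposals. The sketch assumes the sampler fills the rest of the batch with background ROIs; the helper is hypothetical.

```python
def background_only_accuracy(batch_rois, fg_fraction, available_fg):
    """Accuracy of a classifier that labels every sampled ROI as background.

    Assumes at most round(fg_fraction * batch_rois) foreground ROIs are sampled
    and the remainder of the batch is filled with background ROIs.
    """
    max_fg = int(round(fg_fraction * batch_rois))
    num_fg = min(max_fg, available_fg)
    return 1.0 - float(num_fg) / batch_rois
```

With only 10 foreground proposals in an image, background_only_accuracy(256, 0.25, 10) is about 0.96, so high early accuracy on its own says little about whether the detector is learning.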

MXNetError: [11:03:30] /mx-maskrcnn/incubator-mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (7 vs. 0) Name: MapPlanKernel ErrStr:too many resources requested for launch

It seems to be out of resources, but I have 8 GB of GPU memory and it is not all used. Why does this happen?

MXNetError: [11:03:30] /media/jintian/Netac/CodeSpace/ng/auto_car/mx-maskrcnn/incubator-mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (7 vs. 0) Name: MapPlanKernel ErrStr:too many resources requested for launch
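"too many resources requested for launch" is CUDA's launch-out-of-resources error. It concerns per-launch limits (registers, shared memory, threads per block), not total device memory, so free GPU memory does not rule it out. One common trigger is a kernel that needs more registers per thread than its block size allows, for example in a debug build. A back-of-envelope check, assuming a per-block limit of 65536 registers and 1024 threads (typical of recent NVIDIA GPUs; check your card's compute-capability table); the helper and the numbers passed in are illustrative:

```python
def fits_register_budget(threads_per_block, regs_per_thread,
                         regs_per_block=65536, max_threads=1024):
    """True if a block of this size can launch under the per-block limits.

    Ignores register allocation granularity, so it is an optimistic estimate.
    """
    if threads_per_block > max_threads:
        return False
    return threads_per_block * regs_per_thread <= regs_per_block
```

For example, 1024 threads at 64 registers each fit exactly, while 72 registers per thread would fail to launch even on an otherwise idle GPU.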

Request: the commits that affect the network's performance

I feel both sad and happy that you found the flip-mask mistake. I do think it affected the quality of my trained net (training took more than 6 days). So I wonder: are there other commits that affect training?

Thanks! I still admire your work!
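The issue does not say what the flip-mask mistake was. As a general consistency check for horizontal-flip augmentation, the box and the mask must be mirrored with the same convention. The sketch assumes inclusive pixel coordinates (x2 is the last column), as in Fast R-CNN-style code; all three helpers are hypothetical.

```python
def flip_box(box, width):
    """Mirror an inclusive [x1, y1, x2, y2] box horizontally in an image of this width."""
    x1, y1, x2, y2 = box
    return [width - x2 - 1, y1, width - x1 - 1, y2]


def flip_mask(mask):
    """Mirror a binary mask (list of rows) horizontally."""
    return [row[::-1] for row in mask]


def mask_box(mask):
    """Tight inclusive box [x1, y1, x2, y2] around the non-zero pixels of a non-empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return [min(xs), min(ys), max(xs), max(ys)]
```

If flipping is consistent, mask_box(flip_mask(m)) equals flip_box(mask_box(m), width) for every training mask m.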

Out of array boundary?

int hlow = min(max(static_cast<int>(floor(h)), 0), height-1);
int hhigh = hlow + 1;

The maximum possible value of hlow seems to be height-1, so the maximum value of hhigh is height. Is there a risk that the bottom-most or right-most sample reads outside the feature array?

Also, in these two lines,

Dtype alpha = (hlow == hhigh) ? static_cast<Dtype>(0.5) : (h - hlow) / (hhigh - hlow);
Dtype beta = (wleft == wright) ? static_cast<Dtype>(0.5) : (w - wleft) / (wright - wleft);

hlow is never equal to hhigh, so the denominator always seems to be 1.
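A bounds-safe variant of the quoted index computation clamps the sample position first and the upper neighbour second, so the upper index can never reach height. Because the neighbours are one pixel apart, the interpolation weight reduces to h - hlow, and it becomes 0 when both neighbours collapse onto the last row. A one-axis sketch in Python; the function name is hypothetical and this is not the repository's kernel.

```python
import math


def interp_neighbours(h, size):
    """Return (low, high, frac) for 1-D linear interpolation at position h.

    Both indices are always within [0, size - 1]; frac is the weight of `high`.
    """
    h = min(max(h, 0.0), size - 1.0)   # clamp the sample position itself
    low = int(math.floor(h))
    high = min(low + 1, size - 1)       # never step past the last row/column
    frac = h - low                      # 0 at the border, so `high` is unused there
    return low, high, frac
```

For height 4, a sample at 3.5 yields (3, 3, 0.0) instead of touching row 4.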

libPlayCtrl.so undefined symbol: AR_SetParam

When I compile MXNet, I get this error: OSError: libPlayCtrl.so: undefined symbol: AR_SetParam. Could you help me solve this problem? I usually use the Caffe framework; this is my first time using MXNet. Thank you.

demo problem

mkdir: cannot create directory 'data/cityscape/results/': File exists
Namespace(dataset='Cityscape', dataset_path='data/cityscape', epoch=0, gpu=0, has_rpn=True, image_set='val', network='resnet_fpn', prefix='model/final', proposal='rpn', result_path='data/cityscape/results/', root_path='data', shuffle=False, thresh=0.001, vis=False)
{'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [8],
'CLASS_ID': [0, 24, 25, 26, 27, 28, 31, 32, 33],
'FIXED_PARAMS': ['conv0', 'stage1', 'gamma', 'beta'],
'FIXED_PARAMS_SHARED': ['conv0',
'stage1',
'stage2',
'stage3',
'stage4',
'P5',
'P4',
'P3',
'P2',
'gamma',
'beta'],
'NUM_ANCHORS': 3,
'NUM_CLASSES': 9,
'PIXEL_MEANS': array([0, 0, 0]),
'RCNN_FEAT_STRIDE': [32, 16, 8, 4],
'ROIALIGN': True,
'RPN_FEAT_STRIDE': [64, 32, 16, 8, 4],
'SCALES': [(1024, 2048)],
'TEST': {'BATCH_IMAGES': 1,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': [64, 32, 16, 8, 4],
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': [64, 32, 16, 8, 4],
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 1000,
'RPN_PRE_NMS_TOP_N': 6000},
'TRAIN': {'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': 256,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': False,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': [64, 32, 16, 8, 4],
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALE': True,
'SCALE_RANGE': [0.8, 1]}}
num_images 500
cityscape_val gt roidb loaded from data/cache/cityscape_val_gt_roidb.pkl
scripts/demo.sh: line 24: 31359 Segmentation fault (core dumped) python demo_mask.py --network resnet_fpn --dataset ${DATASET} --image_set ${TEST_SET} --prefix ${PREFIX} --result_path ${RESULT_PATH} --has_rpn --epoch 0 --gpu 0
This is my log. I have finished installing MXNet, but when I run demo.sh I hit this problem. Can anyone help?

make cython failed

When I build the Cython code, it aborts with the error message below. I am running this in a virtual machine. I know it is a GPU or CPU problem, but I don't know how to fix it. Can someone help me?

cd rcnn/cython/; python setup.py build_ext --inplace; rm -rf build; cd ../../
Traceback (most recent call last):
  File "setup.py", line 58, in <module>
    CUDA = locate_cuda()
  File "setup.py", line 46, in locate_cuda
    raise EnvironmentError('The nvcc binary could not be '
EnvironmentError: The nvcc binary could not be located in your $PATH. Either add it to your path, or set $CUDAHOME
cd rcnn/pycocotools; python setup.py build_ext --inplace; rm -rf build; cd ../../
Warning: Extension name '_mask' does not match fully qualified name 'rcnn.pycocotools._mask' of '_mask.pyx'
running build_ext
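The error message describes the lookup: setup.py needs the nvcc compiler, found either through $CUDAHOME or on $PATH. Inside a VM without the CUDA toolkit, neither succeeds, and because locate_cuda() runs at the top of setup.py, none of the extensions in rcnn/cython get built. The sketch below reproduces that lookup so you can check your environment before running make; it is modelled on the error message, not copied from the repository's setup.py, and both function names are hypothetical.

```python
import os


def find_in_path(name, search_path):
    """Return the absolute path of `name` in an os.pathsep-separated path, or None."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    return None


def locate_nvcc(environ=None):
    """Find nvcc via $CUDAHOME first, then $PATH; return None if it is missing."""
    environ = os.environ if environ is None else environ
    if 'CUDAHOME' in environ:
        nvcc = os.path.join(environ['CUDAHOME'], 'bin', 'nvcc')
        return nvcc if os.path.exists(nvcc) else None
    return find_in_path('nvcc', environ.get('PATH', ''))
```

If locate_nvcc() returns None, install the CUDA toolkit and set CUDAHOME (for example to /usr/local/cuda, a common install location). A CPU-only VM would need the GPU NMS extension removed from the build, which is an untested assumption here.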

Error in "_make_data_and_labels" when training the net on my own dataset

I converted my dataset to the Cityscapes format to use this net, and of course modified several places for the dataset file index and class_id.
However, training crashes after the first RPN training stage finishes.
The crash happens in
data_on_imgs['img_%s' % im_i]['bbox_targets_on_levels']['stride%s' % s] = np.concatenate([_bbox_targets, bbox_targets_pad])

data_on_imgs['img_%s' % im_i]['bbox_weights_on_levels']['stride%s' % s] = np.concatenate([_bbox_weights, bbox_weights_pad])

File "mx-maskrcnn/rcnn/core/loader.py", line 278, in _make_data_and_labels
ValueError: all the input array dimensions except for the concatenation axis must match exactly

PS: I checked each image/label pair; both files in every pair have the same size.

Thank you all very much for your help!
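np.concatenate along axis 0 requires every other dimension to agree, so this error means _bbox_targets and bbox_targets_pad have different widths, even though image and label sizes match. One plausible cause (an assumption) is a class count that differs between where the targets are built and where the padding is built, since bbox targets are typically 4 × num_classes wide. A minimal reproduction with hypothetical shapes; deriving the pad width from the array itself avoids the mismatch:

```python
import numpy as np


def pad_to_rows(arr, num_rows):
    """Zero-pad a 2-D array to num_rows rows, reusing arr's own column count."""
    pad = np.zeros((num_rows - arr.shape[0], arr.shape[1]), dtype=arr.dtype)
    return np.concatenate([arr, pad])


# Targets built for 5 classes: 4 * 5 = 20 columns.
targets = np.ones((3, 4 * 5), dtype=np.float32)
padded = pad_to_rows(targets, 8)  # works: the pad also has 20 columns

# A pad sized for 9 classes (36 columns) would reproduce the reported error:
# np.concatenate([targets, np.zeros((5, 4 * 9), dtype=np.float32)])
# -> ValueError: all the input array dimensions except for the concatenation axis must match
```

Printing _bbox_targets.shape and bbox_targets_pad.shape right before line 278 of loader.py shows which width is off.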

fail to "make"

When I build the related Cython code in step 4, I encounter the following errors:
[screenshots: bug1, bug2]

How could I solve it?
Thanks!

This application failed to start because it could not find or load the Qt platform plugin "xcb" in "". Available platform plugins are: minimal, offscreen, xcb.

Hi, I encountered the following problem while running scripts/demo_single_image.sh:

~/mx-maskrcnn$ bash scripts/demo_single_image.sh
Namespace(dataset='Cityscape', epoch=0, gpu=0, image_name='figures/test.jpg', network='resnet_fpn', prefix='model/final', thresh=0.3, vis=True)
This application failed to start because it could not find or load the Qt platform plugin "xcb" in "".

Available platform plugins are: minimal, offscreen, xcb.

Reinstalling the application may fix this problem.
scripts/demo_single_image.sh: line 18: 18915 Aborted (core dumped) python2 -m rcnn.tools.demo_single_image --network resnet_fpn --dataset ${DATASET} --prefix ${PREFIX} --epoch 0 --gpu 0 --image_name figures/test.jpg --thresh 0.3 --vis true

Thank you for your answer!!
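With --vis true the demo opens a window, and the Qt "xcb" error typically appears when a Qt-based GUI is started without a usable X display (for example over SSH without X forwarding). If the window comes from matplotlib (an assumption; an OpenCV build with Qt can raise the same message), switching to the non-interactive Agg backend when no DISPLAY is set avoids Qt entirely. A minimal helper; the name is hypothetical:

```python
import os


def choose_matplotlib_backend(environ=None):
    """Return 'Agg' when no X display is available, else None (keep the default)."""
    environ = os.environ if environ is None else environ
    return 'Agg' if not environ.get('DISPLAY') else None
```

Call `matplotlib.use(backend)` with the returned value before the first `import matplotlib.pyplot`, and save figures with `plt.savefig(...)` instead of showing them.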

Full of errors.

Thanks for this "wonderful" open-source project, but I have to say the code is really bad. I opened an issue reporting a bug and was pointed to a commit; after I tried it, the same errors happened again. The developers are too naive.

src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

I tried to train the model on the COCO dataset, but when it trains the RCNN part, it runs out of memory.
Could you give me some hints to solve this problem?
My GPU is a GTX 1080 with 8 GB.
Here is some information printed during training:
{'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [8],
'CLASS_ID': array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80]),
'FIXED_PARAMS': ['conv0', 'stage1', 'gamma', 'beta'],
'FIXED_PARAMS_SHARED': ['conv0',
'stage1',
'stage2',
'stage3',
'stage4',
'P5',
'P4',
'P3',
'P2',
'gamma',
'beta'],
'NUM_ANCHORS': 3,
'NUM_CLASSES': 81,
'PIXEL_MEANS': array([0, 0, 0]),
'RCNN_FEAT_STRIDE': [32, 16, 8, 4],
'ROIALIGN': True,
'RPN_FEAT_STRIDE': [64, 32, 16, 8, 4],
'SCALES': [(800, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': [64, 32, 16, 8, 4],
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': [64, 32, 16, 8, 4],
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 1000,
'RPN_PRE_NMS_TOP_N': 6000},
'TRAIN': {'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': 256,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': False,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'RPN_BATCH_SIZE': 32,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': [64, 32, 16, 8, 4],
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALE': True,
'SCALE_RANGE': [0.8, 1]}}
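To get a feel for where 8 GB goes, the size of one float32 feature map is channels × height × width × 4 bytes. Assuming 256 channels per FPN level (as in the FPN paper) and an input of up to 800 × 1000 from SCALES above, the stride-4 level (P2) alone is about 51 MB per tensor, before counting every layer's activations, their gradients, and the RCNN and mask heads over 81 classes. A minimal calculator; both helper names are hypothetical:

```python
def tensor_megabytes(shape, bytes_per_element=4):
    """Size of a dense tensor in MB (10**6 bytes); float32 by default."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 1e6


def fpn_level_shape(image_hw, stride, channels=256, batch=1):
    """NCHW shape of an FPN level for an input of size image_hw (ceil division)."""
    h, w = image_hw
    return (batch, channels, -(-h // stride), -(-w // stride))
```

For example, tensor_megabytes(fpn_level_shape((800, 1000), 4)) gives 51.2, and halving both input sides cuts every level's size by a factor of 4.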

About multi-scale training

There is multi-scale training in your implementation.

config.TRAIN.SCALE_RANGE = (0.8, 1)

But the paper's authors resize the shorter edge to 800 pixels on COCO and use 2048×1024 images on Cityscapes.

The FPN backbone should already be fairly robust to scale changes (I'm not sure this is the right way to describe it).

I'm looking forward to single-scale training results.
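One way to read SCALE_RANGE = (0.8, 1) is "rescale each training image by a random factor between 0.8 and 1.0", sketched below. This reading is an assumption for illustration and the repository's actual resizing logic may differ; setting the range to (1, 1) reduces it to single-scale training. The helper name is hypothetical.

```python
import random


def sample_train_size(base_hw, scale_range, rng=random):
    """Pick a training size by scaling base_hw with a factor drawn from scale_range."""
    lo, hi = scale_range
    factor = rng.uniform(lo, hi)
    h, w = base_hw
    return int(round(h * factor)), int(round(w * factor))
```

For a (1024, 2048) base size this yields heights between 819 and 1024 with the aspect ratio preserved; passing a seeded random.Random as rng makes the draw reproducible.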

Training on the Cityscapes dataset gives RCNNL1Loss=nan?

When training in the stage TRAIN RCNN WITH RPN INIT AND DETECTION (after COMBINE RPN2 WITH RCNN1), RCNNL1Loss is always nan. I have tried several times and set base_lr really low, but it is still always nan.
Environment: P40, 8 GPUs.
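A NaN box-regression loss that a lower learning rate does not fix often points at a NaN or infinite regression target, for example from a degenerate or inverted ground-truth box whose width or height is not positive, since the targets take the log of a width or height ratio. The sketch uses the Fast R-CNN-style parameterisation with inclusive (+1) widths; whether this repository uses exactly this form is an assumption. Running is_valid_box over your ground-truth boxes before training is a cheap check.

```python
import math


def box_wh(box):
    """Width and height of [x1, y1, x2, y2] with inclusive pixel coordinates."""
    x1, y1, x2, y2 = box
    return x2 - x1 + 1.0, y2 - y1 + 1.0


def is_valid_box(box):
    """True if all coordinates are finite and width/height are positive."""
    if any(math.isnan(v) or math.isinf(v) for v in box):
        return False
    w, h = box_wh(box)
    return w > 0 and h > 0


def bbox_targets(ex_box, gt_box):
    """(dx, dy, dw, dh) regression targets from an example box to a GT box."""
    ew, eh = box_wh(ex_box)
    gw, gh = box_wh(gt_box)
    ecx, ecy = ex_box[0] + 0.5 * ew, ex_box[1] + 0.5 * eh
    gcx, gcy = gt_box[0] + 0.5 * gw, gt_box[1] + 0.5 * gh
    # math.log raises on a non-positive ratio; numpy's np.log would instead
    # return -inf or nan and silently poison the loss.
    return [(gcx - ecx) / ew, (gcy - ecy) / eh,
            math.log(gw / ew), math.log(gh / eh)]
```

A box with x2 = x1 - 1 has width 0 under this convention, which is exactly the case that turns into -inf in an array-based implementation.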

When training on my own pictures, I get the error below

num_images 1962
cityscape_small_train gt roidb loaded from model/res50-fpn/cityscape/alternate/cache/cityscape_small_train_gt_roidb.pkl
append flipped images to roidb
filtered 3924 roidb entries: 3924 -> 0
[]
0
Traceback (most recent call last):
  File "train_alternate_mask_fpn.py", line 115, in <module>
    main()
  File "train_alternate_mask_fpn.py", line 112, in main
    args.rcnn_epoch, args.rcnn_lr, args.rcnn_lr_step)
  File "train_alternate_mask_fpn.py", line 31, in alternate_train
    train_shared=False, lr=rpn_lr, lr_step=rpn_lr_step)
  File "/home/zhkj/mx-maskrcnn/rcnn/tools/train_rpn.py", line 54, in train_rpn
    allowed_border=9999)
  File "/home/zhkj/mx-maskrcnn/rcnn/core/loader.py", line 407, in __init__
    self.get_batch()
  File "/home/zhkj/mx-maskrcnn/rcnn/core/loader.py", line 507, in get_batch
    iroidb = [roidb[i] for i in range(islice.start, islice.stop)]
IndexError: list index out of range

How can I resolve it? Thanks for your answer!
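The more telling line is "filtered 3924 roidb entries: 3924 -> 0": every image was dropped before batching, so the loader then indexes an empty list. Filters of this kind keep an image only if it has at least one box whose max overlap qualifies as foreground or background; ground-truth boxes have an overlap of 1.0, so an image is dropped when no ground-truth boxes were loaded for it. That usually points at annotations that were not parsed, for example class ids that do not match CLASS_ID (a likely cause, not a certainty). The sketch is modelled on the filter in py-faster-rcnn-style code; that this repository uses the same rule is an assumption, and the thresholds default to FG_THRESH, BG_THRESH_HI and BG_THRESH_LO from the config printed elsewhere on this page.

```python
def filter_roidb(roidb, fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.0):
    """Keep roidb entries that contain at least one foreground or background box."""
    def is_valid(entry):
        overlaps = entry['max_overlaps']
        has_fg = any(o >= fg_thresh for o in overlaps)
        has_bg = any(bg_thresh_lo <= o < bg_thresh_hi for o in overlaps)
        return has_fg or has_bg

    kept = [entry for entry in roidb if is_valid(entry)]
    print('filtered %d roidb entries: %d -> %d'
          % (len(roidb) - len(kept), len(roidb), len(kept)))
    return kept
```

Counting how many of your roidb entries have a non-empty 'boxes' array before filtering is a quick way to confirm whether annotations were loaded at all.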

How to evaluate the result?

Hi, thank you very much for sharing. I ran your demo and successfully segmented 500 pictures; the results are really wonderful!

However, how can I compute the accuracy? Your table reports an average accuracy of 26.2 on the test set.
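Cityscapes instance segmentation results are usually scored with the instance-level evaluation script in the official cityscapesScripts package (evalInstanceLevelSemanticLabeling.py), which reports AP and AP at 50% overlap. The core matching criterion is mask IoU: at the 50% level, a predicted mask counts as a hit if its IoU with a not-yet-matched ground-truth instance of the same class is at least 0.5. A minimal IoU helper for binary masks given as nested lists (hypothetical, not the official code):

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two same-sized binary masks (lists of rows)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pa, pb in zip(row_a, row_b):
            inter += int(bool(pa) and bool(pb))
            union += int(bool(pa) or bool(pb))
    return inter / float(union) if union else 0.0


def is_match_at_50(pred_mask, gt_mask):
    """True if the prediction would count as a hit at the 50% overlap level."""
    return mask_iou(pred_mask, gt_mask) >= 0.5
```

AP is then computed from the precision/recall curve obtained by ranking predictions by score and applying this matching; the official script handles details such as crowd regions and minimum instance sizes.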

Error when resume to train net

When I try to resume training my net, this error occurs.

  File "train_alternate_mask_fpn.py", line 121, in <module>
    main()
  File "train_alternate_mask_fpn.py", line 118, in main
    args.rcnn_epoch, args.rcnn_lr, args.rcnn_lr_step)
  File "train_alternate_mask_fpn.py", line 35, in alternate_train
    train_shared=False, lr=rpn_lr, lr_step=rpn_lr_step)
  File "/home/liyuw/Geonet-mx-maskrcnn/rcnn/tools/train_rpn.py", line 73, in train_rpn
    arg_params, aux_params = load_param(prefix, begin_epoch, convert=True)
  File "/home/liyuw/Geonet-mx-maskrcnn/rcnn/utils/load_model.py", line 49, in load_param
    arg_params, aux_params = load_checkpoint(prefix, epoch)
  File "/home/liyuw/Geonet-mx-maskrcnn/rcnn/utils/load_model.py", line 15, in load_checkpoint
    save_dict = mx.nd.load('%s-%04d.params' % (prefix, epoch))
  File "./incubator-mxnet/python/mxnet/ndarray/utils.py", line 174, in load
    ctypes.byref(names)))
  File "./incubator-mxnet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))

mxnet.base.MXNetError: [19:27:16] src/io/local_filesys.cc:166: Check failed: allow_null LocalFileSystem: fail to open "model/res50-fpn/cityscape/alternate/rpn1-0000.params"
I do not have sudo permission.
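The loader builds the file name as '%s-%04d.params' % (prefix, epoch), so resuming from epoch 0 asks for rpn1-0000.params. MXNet's usual epoch-end checkpoint callback saves under epoch index + 1, so an epoch-0 file would typically not exist for a training stage; resuming needs the epoch of a file that is actually on disk. A small helper to list what is there (both names are hypothetical):

```python
import glob
import os
import re


def checkpoint_path(prefix, epoch):
    """File name built by the loader in the traceback for (prefix, epoch)."""
    return '%s-%04d.params' % (prefix, epoch)


def available_epochs(prefix):
    """Sorted epochs for which <prefix>-NNNN.params exists on disk."""
    pattern = re.compile(re.escape(os.path.basename(prefix)) + r'-(\d{4})\.params$')
    epochs = []
    for path in glob.glob(prefix + '-*.params'):
        match = pattern.match(os.path.basename(path))
        if match:
            epochs.append(int(match.group(1)))
    return sorted(epochs)
```

For example, available_epochs('model/res50-fpn/cityscape/alternate/rpn1') returns an empty list if the first RPN stage never wrote a checkpoint, in which case there is nothing to resume from.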

I got no mask on demo scripts

Hi, I converted our data to the Cityscapes format, trained successfully, and got the final.params model.

But when I evaluate it, I get the following result:
##################################################
what : AP AP_50%
##################################################
person : 0.000 0.000
rider : nan nan
car : nan nan
truck : nan nan
bus : nan nan
train : nan nan
motorcycle : nan nan
bicycle : nan nan

average : 0.000 0.000

And the demo scripts produce no masks.
Could you help me figure out whether the problem happened during training or has some other cause?

Thank you.
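A nan row likely means that class had no ground-truth instances in the evaluated images, so its AP is undefined. The printed average then appears to cover only the classes that are not nan, which is why it equals person's 0.000. With every class except person undefined, it is worth checking that your labels map onto the Cityscapes label ids the evaluation expects. The averaging behaviour is sketched below as an assumption consistent with the numbers printed, not the evaluation script itself:

```python
import math


def mean_ignoring_nan(values):
    """Average of the non-NaN entries; NaN if every entry is NaN."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else float('nan')
```

Applied to the table above (0.000 for person and nan for the other seven classes), it returns 0.0.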

ImportError:No module named bbox

When I run train_alternate_mask_fpn.py, I get this error: ImportError: No module named bbox. I checked and the file exists at /mx-maskrcnn/rcnn/cython/bbox.pyx, but I don't know why it can't be imported. Is it because it is a .pyx file?
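A .pyx file is Cython source and cannot be imported directly; it must first be compiled into an extension module (a .so file on Linux), which is what `python setup.py build_ext --inplace` in rcnn/cython does as part of make. If that step failed, for instance because nvcc was not found, no bbox module exists to import. A quick check for a compiled module next to the source (hypothetical helper):

```python
import glob
import os


def compiled_extension_exists(pyx_path):
    """True if a built extension (e.g. bbox.so or bbox.cpython-36m-*.so) sits next to pyx_path."""
    base, _ = os.path.splitext(pyx_path)
    candidates = (glob.glob(base + '.so') + glob.glob(base + '.*.so')
                  + glob.glob(base + '.pyd') + glob.glob(base + '.*.pyd'))
    return bool(candidates)
```

If compiled_extension_exists('rcnn/cython/bbox.pyx') is False, rerun make from the repository root and read its output for the first error.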

Can not access model on dropbox anymore

Hi,
I could see the Dropbox link about 2 hours ago on my phone, but when I tried to download it just now, the site could no longer be accessed from either PC or phone. Has the link been deactivated or removed? Could you repost the params file somewhere else?
Thanks
Tets

Abnormal RCNNL1Loss and MaskLogLoss after RuntimeWarning

Hi guys,

I am encountering the following issue while training on a single TITAN X GPU:

INFO:root:Epoch[0] Batch [820] Speed: 0.53 samples/sec Train-RCNNAcc=0.753992, RCNNLogLoss=0.927809, RCNNL1Loss=4.257946, MaskACC=0.902436, MaskLogLoss=0.166465,
/home/twang/work/fine_segmentation/baselines/mx-maskrcnn/rcnn/core/metric.py:134: RuntimeWarning: invalid value encountered in greater
  idx = np.where(np.logical_and(mask_prob > 0.5, mask_weight == 1))
INFO:root:Epoch[0] Batch [840] Speed: 0.56 samples/sec Train-RCNNAcc=0.753502, RCNNLogLoss=0.987536, RCNNL1Loss=24666331660818701805251597434880.000000, MaskACC=0.901641, MaskLogLoss=nan,

The RCNNL1Loss basically explodes after this RuntimeWarning.
The MaskLogLoss was okay before this RuntimeWarning, but becomes NaN afterwards.

The error occurred during the 3rd training stage: TRAIN RCNN WITH IMAGENET INIT AND RPN DETECTION

Any advice will be much appreciated. Thanks!
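NumPy emits "invalid value encountered in greater" when comparing NaN values, so mask_prob already contained NaN at that batch, at the same point where RCNNL1Loss jumped by about thirty orders of magnitude. Stopping at the first such batch makes it easier to inspect the inputs that caused it; a lower learning rate or the clip_gradient option of MXNet's optimizers are common things to try afterwards. A minimal check over the logged metric values; the threshold and helper name are assumptions:

```python
import math


def first_divergent_metric(metrics, max_abs=1e6):
    """Return the name of the first metric (sorted by name) that is NaN, infinite,
    or larger in magnitude than max_abs; None if all values look sane.

    metrics maps names such as 'RCNNL1Loss' to floats.
    """
    for name in sorted(metrics):
        value = metrics[name]
        if math.isnan(value) or math.isinf(value) or abs(value) > max_abs:
            return name
    return None
```

Fed the Batch [820] values above it returns None, and fed the Batch [840] values it flags the exploded losses.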

Problem while training

Hi, I encountered the following problem while training:

Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
init_internal(k, arg_params[k])
init P5_lateral_bias
init rpn_conv_weight
init rpn_conv_bias
init rpn_conv_cls_weight
init rpn_conv_cls_bias
init P4_lateral_weight
init P4_lateral_bias
init P4_aggregate_weight
init P4_aggregate_bias
init P3_lateral_weight
init P3_lateral_bias
init P3_aggregate_weight
init P3_aggregate_bias
init P2_lateral_weight
init P2_lateral_bias
init P2_aggregate_weight
init P2_aggregate_bias
init rpn_conv_bbox_weight
init rpn_conv_bbox_bias
lr 0.004 lr_epoch_diff [6] lr_iters [8895]
[14:30:08] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[14:30:12] /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/dmlc-core/include/dmlc/./logging.h:308: [14:30:12] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f71355ef3cc]
[bt] (1) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7storage23GPUPooledStorageManager5AllocEm+0x15e) [0x7f71365e80de]
[bt] (2) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x69) [0x7f71365eb5e9]
[bt] (3) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x172088f) [0x7f713665b88f]
[bt] (4) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor19InitDataEntryMemoryEPSt6vectorINS_7NDArrayESaIS3_EE+0x2a54) [0x7f71366643b4]
[bt] (5) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor15FinishInitGraphEN4nnvm6SymbolENS2_5GraphEPNS_8ExecutorERKSt13unordered_mapINS2_9NodeEntryENS_7NDArrayENS2_13NodeEntryHashENS2_14NodeEntryEqualESaISt4pairIKS8_S9_EEE+0xa11) [0x7f713666ac81]
[bt] (6) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4lessISD_ESaISt4pairIKSD_S4_EEERKSt6vectorIS4_SaIS4_EESR_SR_RKSt13unordered_mapISD_NS2_6TShapeESt4hashISD_ESt8equal_toISD_ESaISG_ISH_ST_EEERKSS_ISD_iSV_SX_SaISG_ISH_iEEERKSN_INS_9OpReqTypeESaIS18_EERKSt13unordered_setISD_SV_SX_SaISD_EEPSN_INS_7NDArrayESaIS1I_EES1L_S1L_PSS_ISD_S1I_SV_SX_SaISG_ISH_S1I_EEEPNS_8ExecutorERKSS_INS2_9NodeEntryES1I_NS2_13NodeEntryHashENS2_14NodeEntryEqualESaISG_IKS1S_S1I_EEE+0x909) [0x7f713666d1e9]
[bt] (7) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(ZN5mxnet8Executor10SimpleBindEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES3_St4lessISC_ESaISt4pairIKSC_S3_EEERKSt6vectorIS3_SaIS3_EESQ_SQ_RKSt13unordered_mapISC_NS1_6TShapeESt4hashISC_ESt8equal_toISC_ESaISF_ISG_SS_EEERKSR_ISC_iSU_SW_SaISF_ISG_iEEERKSM_INS_9OpReqTypeESaIS17_EERKSt13unordered_setISC_SU_SW_SaISC_EEPSM_INS_7NDArrayESaIS1H_EES1K_S1K_PSR_ISC_S1H_SU_SW_SaISF_ISG_S1H_EEEPS0+0x233) [0x7f713666d813]
[bt] (8) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorSimpleBind+0x2d4a) [0x7f713662c40a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f715bc80e40]

Traceback (most recent call last):
  File "train_alternate_mask_fpn.py", line 114, in <module>
    main()
  File "train_alternate_mask_fpn.py", line 111, in main
    args.rcnn_epoch, args.rcnn_lr, args.rcnn_lr_step)
  File "train_alternate_mask_fpn.py", line 31, in alternate_train
    train_shared=False, lr=rpn_lr, lr_step=rpn_lr_step)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/rcnn/tools/train_rpn.py", line 149, in train_rpn
    arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/module/base_module.py", line 460, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/rcnn/core/module.py", line 141, in bind
    force_rebind=False, shared_module=None)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/module/module.py", line 417, in bind
    state_names=self._state_names)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
    shared_group))
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/symbol.py", line 1479, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 1024, 2048)
bbox_weight: (1, 12, 174592)
bbox_target: (1, 12, 174592)
label: (1, 523776)
[14:30:12] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f71355ef3cc]
[bt] (1) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7storage23GPUPooledStorageManager5AllocEm+0x15e) [0x7f71365e80de]
[bt] (2) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x69) [0x7f71365eb5e9]
[bt] (3) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x172088f) [0x7f713665b88f]
[bt] (4) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor19InitDataEntryMemoryEPSt6vectorINS_7NDArrayESaIS3_EE+0x2a54) [0x7f71366643b4]
[bt] (5) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor15FinishInitGraphEN4nnvm6SymbolENS2_5GraphEPNS_8ExecutorERKSt13unordered_mapINS2_9NodeEntryENS_7NDArrayENS2_13NodeEntryHashENS2_14NodeEntryEqualESaISt4pairIKS8_S9_EEE+0xa11) [0x7f713666ac81]
[bt] (6) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4lessISD_ESaISt4pairIKSD_S4_EEERKSt6vectorIS4_SaIS4_EESR_SR_RKSt13unordered_mapISD_NS2_6TShapeESt4hashISD_ESt8equal_toISD_ESaISG_ISH_ST_EEERKSS_ISD_iSV_SX_SaISG_ISH_iEEERKSN_INS_9OpReqTypeESaIS18_EERKSt13unordered_setISD_SV_SX_SaISD_EEPSN_INS_7NDArrayESaIS1I_EES1L_S1L_PSS_ISD_S1I_SV_SX_SaISG_ISH_S1I_EEEPNS_8ExecutorERKSS_INS2_9NodeEntryES1I_NS2_13NodeEntryHashENS2_14NodeEntryEqualESaISG_IKS1S_S1I_EEE+0x909) [0x7f713666d1e9]
[bt] (7) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(ZN5mxnet8Executor10SimpleBindEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES3_St4lessISC_ESaISt4pairIKSC_S3_EEERKSt6vectorIS3_SaIS3_EESQ_SQ_RKSt13unordered_mapISC_NS1_6TShapeESt4hashISC_ESt8equal_toISC_ESaISF_ISG_SS_EEERKSR_ISC_iSU_SW_SaISF_ISG_iEEERKSM_INS_9OpReqTypeESaIS17_EERKSt13unordered_setISC_SU_SW_SaISC_EEPSM_INS_7NDArrayESaIS1H_EES1K_S1K_PSR_ISC_S1H_SU_SW_SaISF_ISG_S1H_EEEPS0+0x233) [0x7f713666d813]
[bt] (8) /home/cougarnet.uh.edu/csmailis/mx-maskrcnn/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorSimpleBind+0x2d4a) [0x7f713662c40a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f715bc80e40]

Any ideas on how to resolve this?
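The log shows cuDNN autotuning running just before cudaMalloc fails, and notes that it can be disabled by setting MXNET_CUDNN_AUTOTUNE_DEFAULT to 0. Autotuning trial runs can need extra workspace memory, so disabling it may help when memory is tight; if it does not, the 1024 × 2048 input itself probably has to shrink (via SCALES) or a GPU with more memory is needed. The variable must be set before mxnet is imported, for example in the environment of the training process; the helper below is hypothetical.

```python
import os


def env_without_cudnn_autotune(base_env=None):
    """Copy of the environment with MXNet's cuDNN autotuning disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
    return env
```

Usage: `subprocess.check_call(['bash', 'scripts/train_alternate.sh'], env=env_without_cudnn_autotune())`, or equivalently `export MXNET_CUDNN_AUTOTUNE_DEFAULT=0` in the shell before running the script.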

Some problems preparing a dataset

Hello! Thanks for your amazing work !

I want to prepare my dataset like Cityscapes; however, I am confused about the annotations in the Cityscapes dataset. In the load_from_seg function (rcnn/dataset/cityscape), what does ins_id mean?

I tried to parse the gtFine png files and found that each value seems to be 1000*class_id plus instance_id. Is that correct? For example, if label 23 has 8 instances, the values in the segmentation would be [23000, 23001 ...23007]; however, in aachen_000000_000019_gtFine_instanceIds.png the ids of label 26 are
[26003, 26004, 26005, 26006, 26007, 26008, 26009, 26010]

To sum up, I would appreciate it if you could explain what ins_id is for, and perhaps the annotation format of Cityscapes (I cannot find detailed documentation on the official website :( ). Thank you very much.
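In Cityscapes *_instanceIds.png files, pixels of classes that have instance annotations store label_id * 1000 + instance_index, while pixels of other classes (and regions marked as crowd) store the plain label_id, which is below 1000. The official cityscapesScripts recover the label id the same way, so 26003 is label 26 (car), instance 3. As the aachen example shows, the indices present in one image need not start at 0, so treat them as opaque ids rather than counts. What ins_id means inside load_from_seg is the repository's own choice; presumably it is this per-pixel value (an assumption). A decoding sketch with a hypothetical helper:

```python
def decode_instance_id(value):
    """Split a Cityscapes instanceIds pixel value into (label_id, instance_index).

    instance_index is None for pixels that carry only a semantic label.
    """
    if value < 1000:
        return value, None
    return value // 1000, value % 1000
```

Collecting the distinct pixel values >= 1000 in one instanceIds image therefore gives one id per annotated instance.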

compile mxnet error with cuda 8.0

/home/cgangee/code/mxnet/mshadow/mshadow/./base.h:371:43: error: ‘CUDA_R_32I’ was not declared in this scope
static const cudaDataType_t kCudaFlag = CUDA_R_32I;

Scale Problem

I found that your code may not be correct when using a multi-scale dataset.
After I fixed the scale error caused by the unused Config.scale, the network always gets NaN loss during multi-scale training, especially when changing the scale from [1024, 2048] to [640, 1024].

BTW, I did change config.TRAIN.SCALE to True. Also, core/tester.py lines 160 and 226 raise errors during evaluation if you do not pass "scale" to the function "im_detect_mask"!
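Why the scale has to reach the detection function: detections are predicted in the coordinates of the resized input, so boxes (and the regions masks are pasted into) must be divided by the resize factor to map back to the original image. Fast R-CNN-style code computes that factor from a (target_size, max_size) pair, which is how SCALES entries such as (1024, 2048) are typically read; that this repository does the same is an assumption, and both helpers are hypothetical.

```python
def resize_factor(im_hw, target_size, max_size):
    """Shorter side -> target_size, unless that pushes the longer side past max_size."""
    short, long_side = min(im_hw), max(im_hw)
    scale = float(target_size) / short
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def unscale_boxes(boxes, scale):
    """Map [x1, y1, x2, y2] boxes from the resized image back to the original image."""
    return [[coord / float(scale) for coord in box] for box in boxes]
```

Note that for a 1024 × 2048 Cityscapes image, the scale (640, 1024) gives a factor of 0.5 rather than 0.625, because the long-side cap wins; the network then sees 512 × 1024 inputs.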
