open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark

Home Page: https://mmdetection.readthedocs.io

License: Apache License 2.0

Shell 0.77% Python 99.16% Dockerfile 0.07%
object-detection instance-segmentation fast-rcnn faster-rcnn mask-rcnn cascade-rcnn ssd retinanet pytorch panoptic-segmentation

mmdetection's People

Contributors

aronlin, bigwangyudong, chhluo, czm369, daavoo, erotemic, hellock, hhaandroid, jbwang1997, johnson-wang, jshilong, mambawong, melikovk, mxbonn, myownskyw7, oceanpang, rangeking, rangilyu, runningleon, ryanxli, sanbuphy, shinya7y, thangvubk, tianyuandu, v-qjqs, xvjiarui, yhcao6, zwhus, zwwwayne, zytx121

mmdetection's Issues

RetinaNet

Hi, hellock
Thanks for bringing such a great project to us! Where can I find the RetinaNet code? Have you released it? I need a PyTorch version of RetinaNet as a baseline for my experiment, and I have a deadline. Can you help me? Thanks a lot!

Error when resuming from checkpoint

I tried to resume training from the checkpoint epoch_5.pth, but got the following message:

Traceback (most recent call last):
  File "tools/train.py", line 84, in <module>
    main()
  File "tools/train.py", line 80, in main
    logger=logger)
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmdet-0.5.0+c21ff08-py3.5.egg/mmdet/apis/train.py", line 59, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmdet-0.5.0+c21ff08-py3.5.egg/mmdet/apis/train.py", line 117, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmcv-0.2.0-py3.5.egg/mmcv/runner/runner.py", line 349, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmcv-0.2.0-py3.5.egg/mmcv/runner/runner.py", line 262, in train
    self.call_hook('after_train_iter')
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmcv-0.2.0-py3.5.egg/mmcv/runner/runner.py", line 222, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/mmcv-0.2.0-py3.5.egg/mmcv/runner/hooks/optimizer.py", line 20, in after_train_iter
    runner.optimizer.step()
  File "/home/rusu5516/anaconda3/envs/pytorch4/lib/python3.5/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The expanded size of the tensor (480) must match the existing size (64) at non-singleton dimension 1

ERROR: Unexpected segmentation fault encountered in worker.

I'm training Mask R-CNN and got this error during the 9th epoch. It looks like a dataloader deadlock.

I also ran into a dataloader deadlock with the Detectron.pytorch code before, and the workaround was to train with 1 img/GPU. (Check this issue)

Any idea what might cause this problem? I'm not sure whether it is a PyTorch dataloader problem or a problem in the dataset's __getitem__() function.

Thanks in advance.

error when testing

When I test my trained model with the following command, I get an error:

python tools/test.py configs/mask_rcnn_r50_fpn_1x.py  ./work_dirs/mask_rcnn_r50_fpn_1x/latest.pth --gpus 2 --eval proposal_fast --out results.pkl
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 11.5 task/s, elapsed: 436s, ETA:     0s

writing results to results.pkl
Starting evaluate proposal_fast
Traceback (most recent call last):
  File "tools/test.py", line 114, in <module>
    main()
  File "tools/test.py", line 110, in main
    coco_eval(result_file, eval_types, dataset.coco)
  File "/home/jiangboyuan/mmdetection/mmdet/core/evaluation/coco_utils.py", line 20, in coco_eval
    ar = fast_eval_recall(result_file, coco, np.array(max_dets))
  File "/home/jiangboyuan/mmdetection/mmdet/core/evaluation/coco_utils.py", line 73, in fast_eval_recall
    gt_bboxes, results, max_dets, iou_thrs, print_summary=False)
  File "/home/jiangboyuan/mmdetection/mmdet/core/evaluation/recall.py", line 86, in eval_recalls
    if proposals[i].ndim == 2 and proposals[i].shape[1] == 5:
AttributeError: 'tuple' object has no attribute 'ndim'

Any suggestions on how to resolve it?

Has this code been tested with cuda 10.0?

Hi
I have problems when running 'compile.sh'. I got the following error:

`/home/yao/.local/lib/python2.7/site-packages/torch/lib/include/THC/THCAtomics.cuh(100): error: cannot overload functions distinguished by return type alone

/home/yao/.local/lib/python2.7/site-packages/torch/lib/include/THC/THCAtomics.cuh(123): error: return value type does not match the function type

2 errors detected in the compilation of "/tmp/tmpxft_0000150c_00000000-4_roi_align_kernel.cpp4.ii".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 2
`
It seems that the problem is with the CUDA version. Which CUDA version was used to test this package? I am using CUDA 10.0 and PyTorch 0.4.1.

BUG REPORT for convfc_bbox_head.py

AttributeError: 'ConvFCRoIHead' object has no attribute 'normalize'/'with_bias':

When I change the type of bbox_head in the config from 'SharedFCRoIHead' to 'ConvFCRoIHead', the above error occurs.

I added the two attributes as shown below and ran pip install ., which fixed the error.

class ConvFCRoIHead(BBoxHead):
    """More general bbox head, with shared conv and fc layers and two optional
    separated branches.

                                /-> cls convs -> cls fcs -> cls
    shared convs -> shared fcs
                                \-> reg convs -> reg fcs -> reg
    """

    def __init__(self,
                 num_shared_convs=0,
                 num_shared_fcs=0,
                 num_cls_convs=0,
                 num_cls_fcs=0,
                 num_reg_convs=0,
                 num_reg_fcs=0,
                 conv_out_channels=256,
                 fc_out_channels=1024,
                 normalize=None,  #  add this line
                 with_bias=False,  #  add this line
                 *args,
                 **kwargs):
        super(ConvFCRoIHead, self).__init__(*args, **kwargs)
        assert (num_shared_convs + num_shared_fcs + num_cls_convs + num_cls_fcs
                + num_reg_convs + num_reg_fcs > 0)
        if num_cls_convs > 0 or num_reg_convs > 0:
            assert num_shared_fcs == 0
        if not self.with_cls:
            assert num_cls_convs == 0 and num_cls_fcs == 0
        if not self.with_reg:
            assert num_reg_convs == 0 and num_reg_fcs == 0
        self.num_shared_convs = num_shared_convs
        self.num_shared_fcs = num_shared_fcs
        self.num_cls_convs = num_cls_convs
        self.num_cls_fcs = num_cls_fcs
        self.num_reg_convs = num_reg_convs
        self.num_reg_fcs = num_reg_fcs
        self.conv_out_channels = conv_out_channels
        self.fc_out_channels = fc_out_channels
        self.normalize = normalize  #  add this line
        self.with_bias = with_bias   #  add this line

Below is a config example for 'ConvFCRoIHead' in bbox_head:

bbox_head=dict(
    type='ConvFCRoIHead',
    num_shared_convs=2,
    num_shared_fcs=0,
    num_cls_convs=1,
    num_cls_fcs=2,
    num_reg_convs=1,
    num_reg_fcs=2,
    conv_out_channels=256,
    fc_out_channels=1024,
    normalize={'type': 'BN'},
    # BBoxHead
    in_channels=256,
    roi_feat_size=7,
    num_classes=11,
    target_means=[0., 0., 0., 0.],
    target_stds=[0.1, 0.1, 0.2, 0.2],
    reg_class_agnostic=False)

PyTorch 1.0 support feature

Since PyTorch 1.0 has already been announced, it would be great if this API could run on the newest PyTorch version.

Can this run on CUDA 9.2?

I saw that your software environment is:

  • Python 3.6 / 3.7
  • PyTorch 0.4.1
  • CUDA 9.0.176
  • CUDNN 7.0.4
  • NCCL 2.1.15

Is the time field of the training log in minutes or in seconds?

I wonder whether the time field of the training log is in minutes or in seconds. For example:

2018-10-23 10:51:07,015 - INFO - Epoch [3][50/2500] lr: 0.02000, time: 1.090, data_time: 0.015, loss_reg: 0.0096, acc: 99.0479, loss_cls: 0.0230, loss_rpn_cls: 0.0012, loss_rpn_reg: 0.0055, loss: 0.0394
2018-10-23 10:51:58,164 - INFO - Epoch [3][100/2500] lr: 0.02000, time: 1.023, data_time: 0.006, loss_reg: 0.0099, acc: 99.0254, loss_cls: 0.0234, loss_rpn_cls: 0.0011, loss_rpn_reg: 0.0060, loss: 0.0403
2018-10-23 10:52:50,521 - INFO - Epoch [3][150/2500] lr: 0.02000, time: 1.047, data_time: 0.007, loss_reg: 0.0097, acc: 98.9961, loss_cls: 0.0237, loss_rpn_cls: 0.0011, loss_rpn_reg: 0.0064, loss: 0.0408
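
One way to answer this from the log itself is to compare the wall-clock timestamps with the reported time values. The sketch below assumes the logger prints one line every 50 iterations (as the [50/2500], [100/2500], [150/2500] counters suggest) and divides the gap between consecutive timestamps by the number of iterations in between. The helper name seconds_per_iter is made up for illustration.

```python
from datetime import datetime

# Timestamps and iteration counters copied from the three log lines above.
LOG = [
    ("2018-10-23 10:51:07,015", 50),
    ("2018-10-23 10:51:58,164", 100),
    ("2018-10-23 10:52:50,521", 150),
]


def seconds_per_iter(log):
    """Return wall-clock seconds per iteration between consecutive log lines."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    out = []
    for (t0, i0), (t1, i1) in zip(log, log[1:]):
        elapsed = (datetime.strptime(t1, fmt)
                   - datetime.strptime(t0, fmt)).total_seconds()
        out.append(round(elapsed / (i1 - i0), 3))
    return out


print(seconds_per_iter(LOG))
```

The two values computed this way, 1.023 and 1.047, match the time fields on the second and third log lines, which suggests the field is the average number of seconds per iteration.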

CPU support?

Hi,

Is it possible to run mmdetection on CPU (without GPU)?

compile error

When I run ./compile.sh, I get the following errors:

`/home/rusu5516/miniconda3/envs/pytorch4/lib/python3.5/site-packages/torch/lib/include/ATen/TensorMethods.h:646:36: required from here
/usr/include/c++/6/tuple:483:67: error: mismatched argument pack lengths while expanding ‘std::is_constructible<_Elements, _UElements&&>’
return _and<is_constructible<_Elements, _UElements&&>...>::value;
^~~~~
/usr/include/c++/6/tuple:484:1: error: body of constexpr function ‘static constexpr bool std::_TC<, _Elements>::_MoveConstructibleTuple() [with _UElements = {std::tuple<at::Tensor, at::Tensor, at::Tensor>}; bool = true; _Elements = {at::Tensor, at::Tensor, at::Tensor}]’ not a return-statement
}

/home/rusu5516/miniconda3/envs/pytorch4/lib/python3.5/site-packages/torch/lib/include/ATen/TensorMethods.h:646:36: required from here
/usr/include/c++/6/tuple:489:65: error: mismatched argument pack lengths while expanding ‘std::is_convertible<_UElements&&, _Elements>’
return _and<is_convertible<_UElements&&, _Elements>...>::value;
^~~~~
/usr/include/c++/6/tuple:490:1: error: body of constexpr function ‘static constexpr bool std::_TC<, _Elements>::_ImplicitlyMoveConvertibleTuple() [with _UElements = {std::tuple<at::Tensor, at::Tensor, at::Tensor>}; bool = true; _Elements = {at::Tensor, at::Tensor, at::Tensor}]’ not a return-statement
}

`

and plenty of these.

Good design choice

Good design choice for the iteration pipeline.
For TwoStageDetector, especially with the pyramid structure in FPN, the many multi-image argsort calls in rpn_head become a bottleneck when you try to enlarge the batch size. This hurts training speed and GPU utilization dramatically as the batch size grows.
SNIPER uses half-precision training and a smaller input image size, which allows a larger batch size even on a single GPU, and it adopts a two-stage training process instead of end-to-end training to cope with this bottleneck.
I have tried to speed up rpn_head with multi-processing or multi-threading directly in Python. However, because of the fork overhead and the GIL, neither seems to be a viable option.
Do you have any ideas for speeding up rpn_head directly?

how to create results.pkl?

When I train the model, I run:
python tools/train.py /home1/clx/mmdetection/configs/mask_rcnn_r50_fpn_1x.py --gpus 2 --work_dir /home1/clx/mmdetection/logs/logs_mask/ --validat

But after training finishes, the work_dir contains a file named "20181019_155702.log".

Segmentation fault

When I run the script “python tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 1 --work_dir logs --validate”, I hit the following problem:
2018-10-13 23:54:59,086 - INFO - workflow: [('train', 1)], max: 12 epochs
Segmentation fault

RuntimeError: arguments are located on different GPUs when test

After training a Faster R-CNN model, a runtime error occurs when using either the provided test.py or the code from the README. The traceback follows.

I trained the model on GPU 1. The error occurs whether I use GPU 1 or any other GPU.

Traceback (most recent call last):
File "tools/test.py", line 121, in <module>
main(arguments)
File "tools/test.py", line 86, in main
outputs = single_test(model, data_loader, args.show)
File "tools/test.py", line 20, in single_test
result = model(return_loss=False, rescale=not show, **data)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/detectors/base.py", line 81, in forward
return self.forward_test(img, img_meta, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/detectors/base.py", line 73, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/detectors/two_stage.py", line 149, in simple_test
self.test_cfg.rpn) if proposals is None else proposals
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/detectors/test_mixins.py", line 10, in simple_test_rpn
proposal_list = self.rpn_head.get_proposals(*proposal_inputs)
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/rpn_heads/rpn_head.py", line 197, in get_proposals
img_meta[img_id]['img_shape'], cfg)
File "/usr/local/lib/python3.5/dist-packages/mmdet/models/rpn_heads/rpn_head.py", line 229, in _get_proposals_single
self.target_stds, img_shape)
File "/usr/local/lib/python3.5/dist-packages/mmdet/core/bbox/transforms.py", line 54, in delta2bbox
gw = pw * dw.exp()
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:314

code in test.py

def parse_args(args):
    parser = argparse.ArgumentParser(description='MMDet test detector')
    parser.add_argument('config', help='test config file path')
    parser.add_argument('checkpoint', help='checkpoint file')
    parser.add_argument(
        '--gpus', default=1, type=int, help='GPU number used for testing')
    parser.add_argument(
        '--proc_per_gpu',
        default=1,
        type=int,
        help='Number of processes per GPU')
    parser.add_argument('--out', help='output result file')
    parser.add_argument(
        '--eval',
        type=str,
        nargs='+',
        choices=['proposal', 'proposal_fast', 'bbox', 'segm', 'keypoints'],
        help='eval types')
    parser.add_argument('--show', action='store_true', help='show results')
    args = parser.parse_args(args)
    return args


def main(args):
    args = parse_args(args)

    if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
        raise ValueError('The output file must be a pkl file.')

    cfg = mmcv.Config.fromfile(args.config)
    cfg.model.pretrained = None
    cfg.data.test.test_mode = True

    dataset = obj_from_dict(cfg.data.test, datasets, dict(test_mode=True))
    if args.gpus == 1:
        model = build_detector(
            cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
        load_checkpoint(model, args.checkpoint)
        model = MMDataParallel(model, device_ids=[1])
        print('using cuda 1')

        data_loader = build_dataloader(
            dataset,
            imgs_per_gpu=1,
            workers_per_gpu=cfg.data.workers_per_gpu,
            num_gpus=1,
            dist=False,
            shuffle=False)
        outputs = single_test(model, data_loader, args.show)
    else:
        model_args = cfg.model.copy()
        model_args.update(train_cfg=None, test_cfg=cfg.test_cfg)
        model_type = getattr(detectors, model_args.pop('type'))
        outputs = parallel_test(
            model_type,
            model_args,
            args.checkpoint,
            dataset,
            _data_func,
            range(args.gpus),
            workers_per_gpu=args.proc_per_gpu)

    if args.out:
        print('writing results to {}'.format(args.out))
        mmcv.dump(outputs, args.out)
        eval_types = args.eval
        if eval_types:
            print('Starting evaluate {}'.format(' and '.join(eval_types)))
            if eval_types == ['proposal_fast']:
                result_file = args.out
            else:
                result_file = args.out + '.json'
                results2json(dataset, outputs, result_file)
            coco_eval(result_file, eval_types, dataset.coco)


if __name__ == '__main__':
    arguments = [
        'configs/faster_rcnn_r50_fpn_1x.py',
        'data/faster_rcnn_r50_fpn_1x/epoch_12.pth',
        '--gpus=1',
        '--out=test.pkl'
    ]
    main(arguments)

code in readme

def infer():
    import mmcv
    from mmcv.runner import load_checkpoint
    from mmdet.models import build_detector
    from mmdet.apis import inference_detector, show_result
    from mmcv.parallel import scatter, collate, MMDataParallel

    cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
    cfg.model.pretrained = None

    # construct the model and load checkpoint
    model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
    print(model)
    checkpoint_path = 'data/faster_rcnn_r50_fpn_1x/epoch_12.pth'
    _ = load_checkpoint(model, checkpoint_path, map_location='cuda:1')
    model = MMDataParallel(model, device_ids=[1])
    # test a single image
    img_path = '/workspace/nas/test/'
    img = mmcv.imread(img_path+'0.jpg')
    result = inference_detector(model, img, cfg, device='cuda:1')
    show_result(img, result)

    # test a list of images
    # img_path = '/workspace/nas/test/'
    # imgs = ['0.jpg', '1.jpg']
    # imgs = [img_path+img for img in imgs]
    # for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:2')):
    #     print(i, imgs[i])
    #     show_result(imgs[i], result)
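
A common way to sidestep "arguments are located on different GPUs" errors in single-GPU testing is to expose only the target GPU to the process, so that it becomes device 0 inside PyTorch. This is a sketch under that assumption, not a confirmed fix for this report. The helper name pin_single_gpu is hypothetical, and the environment variable has to be set before CUDA is initialised.

```python
import os


def pin_single_gpu(physical_id):
    """Expose one physical GPU to this process (call before CUDA is initialised).

    Afterwards the chosen card is visible to CUDA as device 0, so a wrapper
    such as MMDataParallel(model, device_ids=[0]) and tensors created on the
    default 'cuda' device end up on the same card.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_id)
    return [0]  # device_ids to pass to the data-parallel wrapper


device_ids = pin_single_gpu(1)  # physical GPU 1, as in the report above
print(os.environ["CUDA_VISIBLE_DEVICES"], device_ids)
```

Setting the variable on the command line, as in CUDA_VISIBLE_DEVICES=1 python tools/test.py ..., has the same effect without touching the script.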

The 'validate' parameter does not work.

I found that the _non_dist_train() function in mmdet/apis/train.py does not use the 'validate' parameter, so I cannot see validation results even when I pass '--validate'.

FishNet backbone and Guided Anchoring?

Thanks very much to the team for sharing the code!

When I read the code, I could not find the FishNet backbone or the Guided Anchoring design. Will these parts be released in the future, or are they somewhere that I haven't noticed?

Are caffe-style pretrained models available?

Detectron.pytorch allows us to fine-tune from Detectron weights. Is this possible with this repo, or do you provide pretrained weights that were trained in caffe style? Somehow, the caffe-style models perform better.

Cannot train a (faster + res50 + fpn) model with similar mAP using one GPU

Hello, thanks for your excellent work.
I first took the pre-trained model "faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth" and tested it using one Titan Xp on a server.
The resulting mAP is similar:
[image]

Then I used the following command to train: python ./tools/train.py ./configs/faster_rcnn_r50_fpn_1x.py --gpus 1 --work_dir ./experiments/faster_rcnn_r50_fpn_1x
On a single Titan Xp, training took 3 days.
When I tested the result with the same script, the mAP was worse:
[image]
I guess this might be because I only used one GPU to train. But the total number of mini-batches is fixed, and there is no BN option for res50 on a single GPU, so I cannot figure out why the mAP drops by nearly 10 points.
Could you please give some advice?
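
One thing worth checking in a case like this is the learning rate. A widely used heuristic, the linear scaling rule, keeps the learning rate proportional to the total batch size. The sketch below assumes that the default lr of 0.02 (the value visible in the training logs elsewhere on this page) was tuned for 8 GPUs with 2 images each. That reference batch size is an assumption and should be verified against the config and README. scaled_lr is a hypothetical helper, not part of mmdetection.

```python
def scaled_lr(base_lr, base_batch, gpus, imgs_per_gpu):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return base_lr * (gpus * imgs_per_gpu) / base_batch


# Assumed reference setup: lr=0.02 for 8 GPUs x 2 imgs/GPU = 16 images.
lr_one_gpu = scaled_lr(base_lr=0.02, base_batch=16, gpus=1, imgs_per_gpu=2)
print(lr_one_gpu)
```

Under these assumptions a single GPU with 2 images per batch would use a learning rate eight times smaller than the default. Training with the unscaled value is one plausible explanation for a large mAP gap, though not the only one.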

How to use my own dataset

Hello! Thanks for your nice work.
If I want to train on my own dataset, what should I do?
Our dataset is a medical image dataset in Pascal VOC format.
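
As a starting point, Pascal VOC annotations can be read into plain Python structures and then handed to whichever dataset class is used. The sketch below parses one VOC XML annotation with the standard library. The output layout (filename, width, height, and an ann dict with bboxes and labels), the class list, and the convention that label 0 is reserved for background are all assumptions made for illustration. Check the repository's documentation for the exact format its custom dataset expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; replace with the categories of your own dataset.
CLASSES = ("lesion", "nodule")


def parse_voc(xml_text, classes=CLASSES):
    """Convert one Pascal VOC annotation (XML string) into a plain dict."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    bboxes, labels = [], []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        bboxes.append([int(float(box.find(k).text))
                       for k in ("xmin", "ymin", "xmax", "ymax")])
        # Assumption: label 0 is background, so foreground classes start at 1.
        labels.append(classes.index(obj.find("name").text) + 1)
    return {
        "filename": root.find("filename").text,
        "width": int(size.find("width").text),
        "height": int(size.find("height").text),
        "ann": {"bboxes": bboxes, "labels": labels},
    }


SAMPLE = """<annotation>
  <filename>0001.jpg</filename>
  <size><width>512</width><height>512</height><depth>3</depth></size>
  <object><name>nodule</name>
    <bndbox><xmin>30</xmin><ymin>40</ymin><xmax>120</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc(SAMPLE))
```

Running parse_voc over every XML file in the Annotations folder yields a list of such dicts, which can then be saved or converted to whatever annotation format the training config points at.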

How to train with cityscapes dataset?

I used the Python script convert_cityscapes_to_coco.py and successfully converted the Cityscapes dataset to COCO format. But when I modified the config file faster_rcnn_r50_fpn_1x.py and ran the command python tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 2 --work_dir ./out --validate to train, I got this error:

`2018-10-16 11:05:01,156 - INFO - Distributed training: False
2018-10-16 11:05:01,600 - INFO - load model from: modelzoo://resnet50
2018-10-16 11:05:01,825 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer3.4.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, bn1.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer1.2.bn2.num_batches_tracked

loading annotations into memory...
Done (t=4.59s)
creating index...
index created!
2018-10-16 11:05:09,360 - INFO - Start running, host: chenkai@Autodrive, work_dir: /home/chenkai/Documents/mmdetection/out
2018-10-16 11:05:09,360 - INFO - workflow: [('train', 1)], max: 12 epochs
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f2cce44c1d0>>
Traceback (most recent call last):
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "tools/train.py", line 82, in <module>
main()
File "tools/train.py", line 78, in main
logger=logger)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 59, in train_detector
_non_dist_train(model, dataset, cfg, validate=validate)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 117, in _non_dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmcv-0.2.0-py3.6.egg/mmcv/runner/runner.py", line 349, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmcv-0.2.0-py3.6.egg/mmcv/runner/runner.py", line 255, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 37, in batch_processor
losses = model(**data)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
raise output
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
output = module(*input, **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/detectors/base.py", line 79, in forward
return self.forward_train(img, img_meta, **kwargs)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/detectors/two_stage.py", line 111, in forward_train
self.train_cfg.rcnn)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/bbox_heads/bbox_head.py", line 73, in get_bbox_target
target_stds=self.target_stds)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 25, in bbox_target
target_stds=target_stds)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/utils/misc.py", line 24, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 62, in proposal_target_single
labels, reg_num_classes)
File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 75, in expand_target
bbox_targets_expand[i, start:end] = bbox_targets[i, :]
RuntimeError: The expanded size of the tensor (0) must match the existing size (4) at non-singleton dimension 0`

Could you help me to solve this problem? @hellock
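
The last frame of the traceback, bbox_targets_expand[i, start:end] = bbox_targets[i, :], fails with a target size of 0, which is what happens when the slice start:end falls outside the row. One possible cause (an assumption, not confirmed in this thread) is a label index that is not smaller than the num_classes value in the config, for instance when the category ids in the converted annotation file map to labels beyond the configured range. The pure-Python sketch below mimics the slice arithmetic, assuming each class occupies 4 consecutive columns. target_slice_width is a hypothetical helper.

```python
def target_slice_width(label, num_classes):
    """Width of the column slice reserved for `label` in a row of
    4 * num_classes regression targets (4 columns per class)."""
    row = [0.0] * (4 * num_classes)
    start, end = label * 4, (label + 1) * 4
    return len(row[start:end])


print(target_slice_width(label=3, num_classes=9))  # label within range
print(target_slice_width(label=9, num_classes=9))  # label out of range
```

A width of 4 means the four box targets fit, while a width of 0 reproduces the "expanded size of the tensor (0) must match the existing size (4)" situation. Comparing the largest label produced by the dataset with num_classes in the config is a quick way to test this hypothesis.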

ImportError: roi_align_cuda

I compiled the ops and tried to test images using this example, but on import I get this error:
Traceback (most recent call last):
  File "tools/test.py", line 9, in <module>
    from mmdet.core import results2json, coco_eval
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/core/__init__.py", line 6, in <module>
    from .post_processing import * # noqa: F401, F403
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/core/post_processing/__init__.py", line 1, in <module>
    from .bbox_nms import multiclass_nms
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/core/post_processing/bbox_nms.py", line 3, in <module>
    from mmdet.ops import nms
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/ops/__init__.py", line 2, in <module>
    from .roi_align import RoIAlign, roi_align
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/ops/roi_align/__init__.py", line 1, in <module>
    from .functions.roi_align import roi_align
  File "/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/mmdet/ops/roi_align/functions/roi_align.py", line 3, in <module>
    from .. import roi_align_cuda
ImportError: cannot import name 'roi_align_cuda'

I checked the folder and there is no roi_align_cuda. How can I solve this? Messages from my compile:
Building roi align op...
running build_ext
building 'roi_align_cuda' extension
creating build
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include/TH -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c src/roi_align_cuda.cpp -o build/temp.linux-x86_64-3.5/src/roi_align_cuda.o -DTORCH_EXTENSION_NAME=roi_align_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/roi_align_cuda.cpp: In function ‘int roi_align_forward_cuda(at::Tensor, at::Tensor, int, int, float, int, at::Tensor)’:
src/roi_align_cuda.cpp:20:80: error: ‘AT_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")
^
src/roi_align_cuda.cpp:24:3: note: in expansion of macro ‘CHECK_CUDA’
CHECK_CUDA(x);
^
src/roi_align_cuda.cpp:31:3: note: in expansion of macro ‘CHECK_INPUT’
CHECK_INPUT(features);
^
src/roi_align_cuda.cpp: In function ‘int roi_align_backward_cuda(at::Tensor, at::Tensor, int, int, float, int, at::Tensor)’:
src/roi_align_cuda.cpp:20:80: error: ‘AT_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")
^
src/roi_align_cuda.cpp:24:3: note: in expansion of macro ‘CHECK_CUDA’
CHECK_CUDA(x);
^
src/roi_align_cuda.cpp:59:3: note: in expansion of macro ‘CHECK_INPUT’
CHECK_INPUT(top_grad);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Building roi pool op...
running build_ext
building 'roi_pool_cuda' extension
creating build
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include/TH -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c src/roi_pool_cuda.cpp -o build/temp.linux-x86_64-3.5/src/roi_pool_cuda.o -DTORCH_EXTENSION_NAME=roi_pool_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/roi_pool_cuda.cpp: In function ‘int roi_pooling_forward_cuda(at::Tensor, at::Tensor, int, int, float, at::Tensor, at::Tensor)’:
src/roi_pool_cuda.cpp:19:80: error: ‘AT_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")
^
src/roi_pool_cuda.cpp:23:3: note: in expansion of macro ‘CHECK_CUDA’
CHECK_CUDA(x);
^
src/roi_pool_cuda.cpp:30:3: note: in expansion of macro ‘CHECK_INPUT’
CHECK_INPUT(features);
^
src/roi_pool_cuda.cpp: In function ‘int roi_pooling_backward_cuda(at::Tensor, at::Tensor, at::Tensor, float, at::Tensor)’:
src/roi_pool_cuda.cpp:19:80: error: ‘AT_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")
^
src/roi_pool_cuda.cpp:23:3: note: in expansion of macro ‘CHECK_CUDA’
CHECK_CUDA(x);
^
src/roi_pool_cuda.cpp:57:3: note: in expansion of macro ‘CHECK_INPUT’
CHECK_INPUT(top_grad);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Building nms op...
rm .so
echo "Compiling nms kernels..."
Compiling nms kernels...
python setup.py build_ext --inplace
running build_ext
building 'cpu_nms' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/cuda/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c cpu_nms.cpp -o build/temp.linux-x86_64-3.5/cpu_nms.o -Wno-unused-function -Wno-write-strings
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from cpu_nms.cpp:621:
/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/cpu_nms.o -L/usr/local/cuda/lib64 -lcudart -o /home/yu/mmdetection/mmdet/ops/nms/cpu_nms.cpython-35m-x86_64-linux-gnu.so
building 'gpu_nms' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/cuda/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c gpu_nms.cpp -o build/temp.linux-x86_64-3.5/gpu_nms.o -Wno-unused-function -Wno-write-strings
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from gpu_nms.cpp:623:
/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c nms_kernel.cu -o build/temp.linux-x86_64-3.5/nms_kernel.o -arch=sm_52 --ptxas-options=-v -c --compiler-options -fPIC
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z10nms_kernelifPKfPy' for 'sm_52'
ptxas info : Function properties for _Z10nms_kernelifPKfPy
128 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 20480 bytes smem, 344 bytes cmem[0], 12 bytes cmem[2]
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/gpu_nms.o build/temp.linux-x86_64-3.5/nms_kernel.o -L/usr/local/cuda/lib64 -lcudart -o /home/yu/mmdetection/mmdet/ops/nms/gpu_nms.cpython-35m-x86_64-linux-gnu.so
building 'cpu_soft_nms' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/cuda/include -I/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/home/yu/.virtualenvs/Pytorch/include/python3.5m -c cpu_soft_nms.cpp -o build/temp.linux-x86_64-3.5/cpu_soft_nms.o -Wno-unused-function -Wno-write-strings
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from cpu_soft_nms.cpp:621:
/home/yu/.virtualenvs/Pytorch/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
cpu_soft_nms.cpp: In function ‘PyObject* __pyx_pf_12cpu_soft_nms_cpu_soft_nms(PyObject*, PyArrayObject*, float, float, float, unsigned int)’:
cpu_soft_nms.cpp:2450:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
__pyx_t_11 = ((__pyx_v_pos < __pyx_v_N) != 0);
^
cpu_soft_nms.cpp:2961:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
__pyx_t_11 = ((__pyx_v_pos < __pyx_v_N) != 0);
^
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/cpu_soft_nms.o -L/usr/local/cuda/lib64 -lcudart -o /home/yu/mmdetection/mmdet/ops/nms/cpu_soft_nms.cpython-35m-x86_64-linux-gnu.so

Add high-level APIs for training and inference

Training

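from mmcv import Config
from mmdet.apis import train_detector
from mmdet.models import build_detector

cfg = Config.fromfile('xx_config.py')
# users can overwrite some config values here
model = build_detector(cfg.model, train_cfg=cfg.train_cfg)
# proposed high-level call; the final argument list may differ
train_detector(model)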

Inference

from mmcv import Config
from mmcv.runner import load_checkpoint
from mmdet.apis import inference_detector
from mmdet.models import build_detector

cfg = Config.fromfile('xx_config.py')
model = build_detector(cfg.model, train_cfg=cfg.train_cfg)
# another method to construct a model is "model = FasterRCNN(**kwargs)"
load_checkpoint(model, 'checkpoint.pth')
# proposed high-level call; the final argument list may differ
bboxes = inference_detector(model, 'a.jpg', device='cuda:0')

ERROR: assert hasattr(dataset, 'flag')

I used the command

python tools/test.py configs/mask_rcnn_r50_fpn_1x.py weights/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth --gpus 1 --out result.pkl

to test the pretrained model, but I get an error. The output is:

loading annotations into memory...
Done (t=0.61s)
creating index...
index created!
Traceback (most recent call last):
  File "tools/test.py", line 114, in <module>
    main()
  File "tools/test.py", line 84, in main
    shuffle=False)
  File "/home/jwwangchn/anaconda2/lib/python2.7/site-packages/mmdet/datasets/loader/build_loader.py", line 28, in build_dataloader
    sampler = GroupSampler(dataset, imgs_per_gpu)
  File "/home/jwwangchn/anaconda2/lib/python2.7/site-packages/mmdet/datasets/loader/sampler.py", line 14, in __init__
    assert hasattr(dataset, 'flag')
AssertionError

I installed mmcv 0.2.0 from source. How can I solve this problem?

How to print intermediate variables

I want to make some adjustments in RPNHead, but I can't even print strings (much less save a Tensor). I added print("###") in rpn_head.py, and even in apis/train.py, and then trained faster_rcnn using the example configs.
Nothing containing "###" appears in the log, even though the losses are printed.
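
One possible cause, which is only an assumption and not confirmed in this thread, is that the edited files are not the ones Python imports (for example, an installed copy under site-packages is used instead of the edited checkout). The sketch below uses the standard library to show where a module would be loaded from; `module_location` is a hypothetical helper name.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Compare this path with the file that was edited.
print(module_location("mmdet"))

# flush=True rules out output that is merely stuck in a buffer.
print("###", flush=True)
```

If the printed path points somewhere other than the edited checkout, the added print statements are in a file that is never executed.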

Training with default "configs/faster_rcnn_r50_fpn_1x.py" only gets AP=35.6

Hi,
Thanks for sharing this great work.

I used your code to train a Faster R-CNN detector with a ResNet-50 FPN backbone, using the config file configs/faster_rcnn_r50_fpn_1x.py, by running
python tools/train.py ./configs/faster_rcnn_r50_fpn_1x.py --gpus 4 --work_dir ./output --validate

Then I tested the model with
python tools/test.py ./configs/faster_rcnn_r50_fpn_1x.py ./output/latest.pth --gpus 4 --out ./output/results.pkl --eval bbox

I got this result:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.356
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.571
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.382
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.204
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.393
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.450
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.489
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.515
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.331
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.556
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644

I noticed the result (35.6) is lower than the one reported in the MODEL_ZOO (36.4).

I also tested on the 11th epoch and the result is 35.0.

Do you think it is normal? I don't think the variance should be this large.

Strange behaviours when training with imgs_per_gpu=3.

Hi.

Thank you for the great work. I found some strange behaviours when training Faster R-CNN with imgs_per_gpu=3 (although 3 is admittedly an unusual value). First, at the very beginning of training the accuracy is above ~95%, which is high compared to ~89% with imgs_per_gpu=2 or 4. Second, after the validation epoch, an index-related error occurred:

Loading and preparing results...
Traceback (most recent call last):
  File "./tools/train.py", line 81, in <module>
    main()
  File "./tools/train.py", line 77, in main
    logger=logger)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmdet/apis/train.py", line 57, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmdet/apis/train.py", line 92, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmcv/runner/runner.py", line 349, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmcv/runner/runner.py", line 265, in train
    self.call_hook('after_train_epoch')
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmcv/runner/runner.py", line 222, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmdet/core/evaluation/eval_hooks.py", line 93, in after_train_epoch
    self.evaluate(runner, results)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/mmdet/core/evaluation/eval_hooks.py", line 134, in evaluate
    cocoDt = cocoGt.loadRes(tmp_file)
  File "/home/root/miniconda3/envs/mytorch/lib/python3.6/site-packages/pycocotools-2.0-py3.6-linux-x86_64.egg/pycocotools/coco.py", line 318, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range
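
The last frame, `if 'caption' in anns[0]`, raises IndexError whenever `anns` is an empty list, which suggests the results file handed to loadRes contained no entries. Why it would be empty with imgs_per_gpu=3 is not established here. As a minimal sketch, a check like the following could be run on the results file before evaluation; `has_detections` is a hypothetical helper, not part of mmdetection or pycocotools.

```python
import json


def has_detections(result_file):
    """Return True if a COCO-style results JSON file holds at least one entry."""
    with open(result_file) as f:
        results = json.load(f)
    # A valid results file is a non-empty list of detection dicts.
    return isinstance(results, list) and len(results) > 0
```

Skipping evaluation when this returns False would avoid the crash, but it would only hide the underlying problem of an empty result set.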

How to train/eval simultaneously on multiple COCO-style datasets?

For example, I have multiple annotation files (in JSON format) for different datasets. I want to train/eval on those datasets simultaneously by providing the paths of the annotation files, as Detectron.pytorch does.
What is the easiest way to achieve this with mmdetection?
My idea is to use torch.utils.data.ConcatDataset and modify ann_file in the config to accept multiple annotation files.
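
To make the idea concrete, here is a pure-Python sketch of the index mapping that a concatenated dataset performs: cumulative lengths plus a binary search. It imitates the behaviour of torch.utils.data.ConcatDataset without depending on torch. `ConcatSketch` and the stand-in lists are hypothetical, and the sketch does not handle mmdetection-specific attributes such as the `flag` that GroupSampler asserts on.

```python
import bisect
from itertools import accumulate


class ConcatSketch:
    """Present several indexable datasets as one."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # cumulative_sizes[i] is the total length of datasets[0..i].
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")
        # Find which dataset the global index falls into.
        ds = bisect.bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[ds - 1] if ds > 0 else 0
        return self.datasets[ds][idx - offset]


# Stand-ins for datasets built from two different annotation files.
coco_a = ["a0", "a1", "a2"]
coco_b = ["b0", "b1"]
merged = ConcatSketch([coco_a, coco_b])
```

With this mapping, `merged[3]` resolves to the first item of the second dataset, so a data loader can iterate over all annotation files as if they were one.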

fatal error: cuda_runtime.h: No such file or directory when building nms.

The error message is as follows:

In file included from <command-line>:0:0: 
/usr/include/stdc-predef.h:59:1: fatal error: cuda_runtime.h: No such file or directory  
#endif  
^ 
compilation terminated. 
error: command 'nvcc' failed with exit status 1 
make: *** [all] Error 1

This error occurred on an Ubuntu 14.04 machine with CUDA 8.0; the environment variables are set properly.

On another Ubuntu 16.04 machine with CUDA 9.0, the compilation goes smoothly.
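
As a quick diagnostic, the sketch below looks for cuda_runtime.h under a few candidate CUDA roots. The /usr/local/cuda location matches the include path in the build log earlier on this page; the CUDA_HOME and CUDA_PATH variable names are assumptions based on common convention, and `find_cuda_runtime_header` is a hypothetical helper.

```python
import os


def find_cuda_runtime_header(candidates=None):
    """Return the first existing path to cuda_runtime.h, or None."""
    if candidates is None:
        candidates = [
            os.environ.get("CUDA_HOME"),   # assumed variable name
            os.environ.get("CUDA_PATH"),   # assumed variable name
            "/usr/local/cuda",
        ]
    for root in candidates:
        if not root:
            continue  # skip unset environment variables
        header = os.path.join(root, "include", "cuda_runtime.h")
        if os.path.isfile(header):
            return header
    return None


if __name__ == "__main__":
    print(find_cuda_runtime_header() or "cuda_runtime.h not found in the checked locations")
```

If the header exists somewhere else, the include directory passed to nvcc in the nms build would need to point there.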

Where does the 'weight' as the input of the Focal Loss come from?

import torch.nn.functional as F


def sigmoid_focal_loss(pred,
                       target,
                       weight,
                       gamma=2.0,
                       alpha=0.25,
                       reduction='elementwise_mean'):
    pred_sigmoid = pred.sigmoid()
    # pt is large when the prediction disagrees with the target
    pt = (1 - pred_sigmoid) * target + pred_sigmoid * (1 - target)
    # alpha balances positives and negatives; the input weight scales each element
    weight = (alpha * target + (1 - alpha) * (1 - target)) * weight
    weight = weight * pt.pow(gamma)
    return F.binary_cross_entropy_with_logits(
        pred, target, weight, reduction=reduction)

The focal loss takes an input named weight. Could you explain what this weight is and how I can obtain it? Thank you very much.
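
To show how the weight argument enters the formula, here is a scalar re-implementation of the function above using only the standard library. It does not answer where mmdetection computes the weight; it only illustrates that weight acts as a per-element multiplier, so an element with weight 0 contributes nothing to the loss. `focal_loss_element` and `bce_with_logits` are hypothetical helper names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x, t):
    """Numerically stable binary cross-entropy on a raw logit x."""
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


def focal_loss_element(x, t, weight, gamma=2.0, alpha=0.25):
    """Focal loss for one logit x with target t in {0, 1}."""
    p = sigmoid(x)
    pt = (1 - p) * t + p * (1 - t)
    w = (alpha * t + (1 - alpha) * (1 - t)) * weight
    w = w * pt ** gamma
    return w * bce_with_logits(x, t)


print(focal_loss_element(0.0, 1.0, weight=1.0))  # counted element
print(focal_loss_element(0.0, 1.0, weight=0.0))  # ignored element
```

Doubling the weight doubles the element's loss, which is why such a tensor is typically used to mask or rescale individual samples.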
