
Convert MMDetection models to TensorRT, with support for fp16, int8, batched input, dynamic shape, etc.

License: Apache License 2.0

tensorrt mmdetection inference object-detection faster-rcnn cascade-rcnn ssd retinanet yolov3

mmdetection-to-tensorrt's Introduction

MMDet to TensorRT

Note

The main branch supports model conversion for MMDetection>=3.0. If you want to convert a model from an older MMDetection, please switch to the corresponding branch:

News

  • 2024.02: Support MMDetection>=3.0

Introduction

This project aims to support End2End deployment of models in MMDetection with TensorRT.

Mask support is experimental.

Features:

  • fp16
  • int8 (experimental)
  • batched input
  • dynamic input shape
  • combination of different modules
  • DeepStream

Requirement

  • install MMDetection:

    pip install openmim
    mim install mmdet==3.3.0
  • install torch2trt_dynamic:

    git clone https://github.com/grimoire/torch2trt_dynamic.git torch2trt_dynamic
    cd torch2trt_dynamic
    pip install -e .
  • install amirstan_plugin:

    • Install TensorRT

    • Clone the repo and build the plugin

      git clone --depth=1 https://github.com/grimoire/amirstan_plugin.git
      cd amirstan_plugin
      git submodule update --init --progress --depth=1
      mkdir build
      cd build
      cmake -DTENSORRT_DIR=${TENSORRT_DIR} ..
      make -j10

      [!NOTE]

      DON'T FORGET to set the environment variable (e.g. in ~/.bashrc):

      export AMIRSTAN_LIBRARY_PATH=${amirstan_plugin_root}/build/lib
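
Before running a conversion, a quick sanity check that the plugin is discoverable can help. A minimal sketch, assuming the default build output name libamirstan_plugin.so (this file name also appears in logs elsewhere on this page):

import ctypes
import os

# raises KeyError if the variable is unset, OSError if the library was not built
plugin_dir = os.environ['AMIRSTAN_LIBRARY_PATH']
ctypes.CDLL(os.path.join(plugin_dir, 'libamirstan_plugin.so'))
print('amirstan plugin loaded from', plugin_dir)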

Installation

Host

git clone https://github.com/grimoire/mmdetection-to-tensorrt.git
cd mmdetection-to-tensorrt
pip install -e .
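
An optional smoke test to confirm the package is importable (just an import check, nothing more):

python -c "import mmdet2trt; print(mmdet2trt.__file__)"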

Docker

Build docker image

sudo docker build -t mmdet2trt_docker:v1.0 docker/

Run (will show the help for the CLI entrypoint)

sudo docker run --gpus all -it --rm -v ${your_data_path}:${bind_path} mmdet2trt_docker:v1.0

Or, if you want to open a terminal inside the container:

sudo docker run --gpus all -it --rm -v ${your_data_path}:${bind_path} --entrypoint bash mmdet2trt_docker:v1.0

Example conversion:

sudo docker run --gpus all -it --rm -v ${your_data_path}:${bind_path} mmdet2trt_docker:v1.0 ${bind_path}/config.py ${bind_path}/checkpoint.pth ${bind_path}/output.trt

Usage

Create a TensorRT model from an mmdet model. Details can be found in getting_started.md.

CLI

# conversion might take a few minutes.
mmdet2trt ${CONFIG_PATH} ${CHECKPOINT_PATH} ${OUTPUT_PATH}

Run mmdet2trt -h for help on optional arguments.
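
For example, a hedged invocation (the scale and engine flags below are copied from usages elsewhere on this page; check mmdet2trt -h for the authoritative names and defaults):

mmdet2trt ${CONFIG_PATH} ${CHECKPOINT_PATH} ${OUTPUT_PATH} \
    --save-engine=true \
    --min-scale 1 3 320 320 --opt-scale 1 3 800 1344 --max-scale 1 3 1344 1344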

Python

shape_ranges=dict(
    x=dict(
        min=[1,3,320,320],
        opt=[1,3,800,1344],
        max=[1,3,1344,1344],
    )
)
trt_model = mmdet2trt(cfg_path,
                      weight_path,
                      shape_ranges=shape_ranges,
                      fp16_mode=True)

# save converted model
torch.save(trt_model.state_dict(), save_model_path)

# save engine if you want to use it in c++ api
with open(save_engine_path, mode='wb') as f:
    f.write(trt_model.state_dict()['engine'])
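
A saved engine can later be reloaded without repeating the conversion. Below is a minimal sketch using TensorRT's Python API; it assumes the amirstan plugin library has already been loaded into the process (see the Requirement section):

import tensorrt as trt

trt_logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(trt_logger, '')
with open(save_engine_path, 'rb') as f, trt.Runtime(trt_logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())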

Note

The input of the engine is the preprocessed tensor. The outputs of the engine are num_dets, bboxes, scores and class_ids. If you enable the enable_mask flag, there will be an additional mask output. The bboxes output of the engine is not divided by scale_factor.
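
For example, a hedged post-processing sketch (outputs and scale_factor are placeholder names here; scale_factor is the one produced by the MMDetection test pipeline, and the outputs are torch tensors when the TRTModule wrapper is used):

num_dets, bboxes, scores, class_ids = outputs                 # raw engine outputs, batch first
valid = int(num_dets[0])                                      # valid detections for image 0
bboxes = bboxes[0, :valid] / bboxes.new_tensor(scale_factor)  # map boxes back to the original image
scores, class_ids = scores[0, :valid], class_ids[0, :valid]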

Here is how to perform inference with the converted model:

from mmdet.apis import inference_detector
from mmdet2trt.apis import create_wrap_detector

# create wrap detector
trt_detector = create_wrap_detector(trt_model, cfg_path, device_id)

# result shares the same format as mmdetection
result = inference_detector(trt_detector, image_path)
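
Since the result shares MMDetection's format, it can be read the usual way. A minimal sketch, assuming MMDetection 3.x where the result is a DetDataSample (the 0.3 score threshold is arbitrary):

pred = result.pred_instances               # bboxes, scores, labels
keep = pred.scores > 0.3                   # arbitrary score threshold
print(pred.bboxes[keep], pred.labels[keep])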

Try the demo in demo/inference.py, or demo/cpp if you want to do inference with the C++ API.

Read getting_started.md for more details.

How does it work?

Most other projects use the pytorch=>ONNX=>TensorRT route; this repo converts pytorch=>TensorRT directly, avoiding the unnecessary ONNX IR. Read how-does-it-work for details.

Support Model/Module

Note

Some models have only been tested on MMDet<3.0. If you find a model that fails, please report it in an issue.

  • Faster R-CNN
  • Cascade R-CNN
  • Double-Head R-CNN
  • Group Normalization
  • Weight Standardization
  • DCN
  • SSD
  • RetinaNet
  • Libra R-CNN
  • FCOS
  • Fovea
  • CARAFE
  • FreeAnchor
  • RepPoints
  • NAS-FPN
  • ATSS
  • PAFPN
  • FSAF
  • GCNet
  • Guided Anchoring
  • Generalized Attention
  • Dynamic R-CNN
  • Hybrid Task Cascade
  • DetectoRS
  • Side-Aware Boundary Localization
  • YOLOv3
  • PAA
  • CornerNet(WIP)
  • Generalized Focal Loss
  • Grid RCNN
  • VFNet
  • GROIE
  • Mask R-CNN(experiment)
  • Cascade Mask R-CNN(experiment)
  • Cascade RPN
  • DETR
  • YOLOX

Tested on:

  • torch=2.2.0
  • tensorrt=8.6.1
  • mmdetection=3.3.0
  • cuda=11.7

FAQ

Read this page if you run into any problems.

License

This project is released under the Apache 2.0 license.

mmdetection-to-tensorrt's People

Contributors

chenxinfeng4, daavoo, dableuteef, grimoire, init-22, mmeendez8, tehkillerbee, vedrusss


mmdetection-to-tensorrt's Issues

Problem running via docker

I ran the docker using the command:
sudo docker run --gpus all -it --rm -v ${your_data_path}:${bind_path} mmdet2trt_docker:v1.0 ${bind_path}/config.py ${bind_path}/checkpoint.pth ${bind_path}/output.trt

And got an error:
pkg_resources.DistributionNotFound: The 'mmdet2trt' distribution was not found and is required by the application

What could this mean?

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

when using the following code in PyCharm:

import torch
import mmdet2trt

cfg_path = 'faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
weight_path = 'faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
save_path = 'helloworld.trt'
opt_shape_param = [
    [
        [1, 3, 320, 320],      # min shape
        [1, 3, 800, 1344],     # optimize shape
        [1, 3, 1344, 1344],    # max shape
    ]
]
max_workspace_size = 1 << 30    # some module need large workspace, add workspace size when OOM.
trt_model = mmdet2trt(cfg_path, weight_path, opt_shape_param=opt_shape_param, fp16_mode=True, max_workspace_size=max_workspace_size)

however, an error is logged and Python is force-quit.
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Debugging 'import mmdet2trt' via the Python console, I found the code breaks at
import matplotlib.pyplot as plt in ./mmdetection/mmdet/apis/inference.py

Curiously, when I run the demo from mmdetection (mmdetection/demo/image_demo.py), it runs normally. At least this confirms that mmdetection itself works.

So, what is the problem in this case?

test.py

Thanks for your helpful work.

I have a problem when testing trt_model on a dataset.

I found that using tool/test.py to test on the COCO2017 dataset causes problems, and I think it's because of an error related to mmdetection's 'single_gpu_test' and the result from 'ModelWarper'.

The erroneous output is:
400000/5000, 1053.8 task/s, elapsed: 380s, ETA: -374s

It should be stopped at 5000 not 400000. Could you please help me with this problem?

The trt_model I am using was converted from the mmdetection checkpoint 'faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'.

inference error

Thank you for your excellent work. I built the environment according to the documents. But when I try ATSS, I get this error:
base env:

torch=1.6.0
tensorrt=7.0.11
mmdetection=2.3.0
cuda=10.0
cudnn=7.6

run this:

python demo/inference.py tests/images/test_2.jpg ~/dzws/mmdetection/configs/atss/atss_r50_fpn_1x_coco.py checkpoints/atss_r50_fpn_1x_coco.pth detouts/atss.wts

error:

(mmdetection) vs@vx:~/dzws/tmp-file/mmdet_trt/mmdetection-to-tensorrt$ python demo/inference.py tests/images/test_2.jpg ~/dzws/mmdetection/configs/atss/atss_r50_fpn_1x_coco.py checkpoints/atss_r50_fpn_1x_coco.pth detouts/atss.wts
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::949, condition: profileMinDims.d[i] <= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1092, condition: allInputDimensionsSpecified(routine)
Traceback (most recent call last):
  File "demo/inference.py", line 62, in <module>
    main()
  File "demo/inference.py", line 31, in main
    result = inference_detector(trt_model, image_path, cfg_path, args.device)
  File "/home/visi/dzws/tmp-file/mmdet_trt/mmdetection-to-tensorrt/mmdet2trt/apis/inference.py", line 35, in inference_detector
    result = model(tensor)
  File "/home/visi/miniconda3/envs/mmdetection/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/visi/miniconda3/envs/mmdetection/lib/python3.7/site-packages/torch2trt/torch2trt.py", line 415, in forward
    shape = tuple(self.context.get_binding_shape(idx))
ValueError: __len__() should return >= 0

Input 800x1344. Any suggestions? Thanks.

python inference.py outputs an all-zeros result

Hi:

After building the env in docker, I ran inference.py with:

  • img: demo.jpg from mmdetection
  • config: faster_rcnn_r50_fpn_1x_coco.py
  • ckpt: faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
demo# python inference.py ../image.jpg ../../mmdet/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py ../faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth 1.trt
INFO:root:load model from config:../../mmdet/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
INFO:root:model warmup
INFO:root:convert model
Warning: Encountered known unsupported method torch.Tensor.new_zeros
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_zeros
INFO:root:convert take time 44.59940958023071 s
TRTModule()
TRTModule()

However, the classification and box outputs are incorrect, as follows:

trt_bbox:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], device='cuda:0')
trt_classfication_result:
tensor([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1.], device='cuda:0')

The inference.py is below (I did not change anything):

from mmdet2trt import mmdet2trt
import torch
from argparse import ArgumentParser

from mmdet2trt.apis import inference_detector, init_detector
import cv2,logging


def main():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='mmdet Config file')
    parser.add_argument('checkpoint', help='mmdet Checkpoint file')
    parser.add_argument('save_path', help='tensorrt model save path')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    parser.add_argument("--fp16", type=bool, default=False, help="enable fp16 inference")
    args = parser.parse_args()

    cfg_path = args.config

    trt_model = mmdet2trt(cfg_path, args.checkpoint,log_level = logging.INFO, fp16_mode=args.fp16, device=args.device)
    print(trt_model)
    torch.save(trt_model.state_dict(), args.save_path)

    trt_model = init_detector(args.save_path)
    print(trt_model)
    image_path = args.img

    result = inference_detector(trt_model, image_path, cfg_path, args.device)
#    print(result)
    num_detections = result[0].item()

    trt_bbox = result[1][0]
    trt_score = result[2][0]
    print('trt_bbox:')
    print(trt_bbox)
    trt_cls = result[3][0]
    print('trt_classfication_result:')
    print(trt_cls)
    image = cv2.imread(image_path)
    input_image_shape = image.shape
    for i in range(num_detections):
        scores = trt_score[i].item()
        classes = int(trt_cls[i].item())
        if scores < args.score_thr:
            continue
        bbox = tuple(trt_bbox[i])
        bbox = tuple(int(v) for v in bbox)

        color = ((classes>>2 &1) *128 + (classes>>5 &1) *128,
                (classes>>1 &1) *128 + (classes>>4 &1) *128,
                (classes>>0 &1) *128 + (classes>>3 &1) *128)
        cv2.rectangle(image, bbox[:2], bbox[2:], color, thickness=5)

    if input_image_shape[0]>1280 or input_image_shape[1]>720:
        scales = min(720/image.shape[0], 1280/image.shape[1])
        image = cv2.resize(image, (0,0), fx=scales, fy=scales)
    cv2.imwrite('image.jpg', image)

if __name__ == '__main__':
    main()

Issue while converting with int8 support

Describe the bug
Hi! First of all, thank you a lot for your great work! But when I try to add int8 support I get an error - can you please help me understand what needs to be changed?
Traceback (most recent call last):
  File "convert_to_trt.py", line 39, in <module>
    device=device)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 147, in mmdet2trt
    int8_calib_algorithm=int8_calib_algorithm)
  File "/root/space/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 561, in torch2trt_dynamic
    config.set_calibration_profile(profile)
AttributeError: 'tensorrt.tensorrt.IBuilderConfig' object has no attribute 'set_calibration_profile'

To Reproduce
opt_shape_param = None
config_path = '/user_home/packages_atss_stage2/logs_packages_atss/stage2_packages_atss.py'
model_path = '/user_home/packages_atss_stage2/logs_packages_atss/epoch_10.pth'
result_path = '/user_home/packages_atss_stage2/logs_packages_atss/int8_mmdet_to_trt.pth'

fp16 = False
int8 = True
device = 'cuda:0'
trt_log_level = 'VERBOSE'
return_wrap_model = False
max_workspace_size = 1 << 30  # some module and tactic need large workspace.
trt_model = mmdet2trt(config_path,
                      model_path,
                      opt_shape_param=opt_shape_param,
                      fp16_mode=fp16,
                      int8_mode=int8,
                      max_workspace_size=max_workspace_size,
                      trt_log_level=trt_log_level,
                      return_wrap_model=return_wrap_model,
                      device=device)

torch.save(trt_model.state_dict(), result_path)

environment:
Dockerfile from the repo

Thanks for your great work, and there are some questions for help

This is very good and helpful work for me, thank you. But I have two questions about it:
1. For 'mmdetection-to-tensorrt', is the 'amirstan_plugin' tool required? I can't install one of its dependencies, 'cub'.
The website you give is broken now.
2. Did you test this tool on the NVIDIA TX/AGX platform? I haven't been able to install this tool on my computer or the AGX platform due to question 1, but it would be meaningful to use your tool on the TX/AGX platform. Can you give me some help?
Thanks again for your work.

yolov3 can not use int8 calibration?

Thanks for your great work. I'm trying int8 conversion. With SSD, calibration is called and runs normally. But with yolov3, the calibration process does not run, although the engine is generated in the end.

I've tried several configs, shapes, batch sizes, and entropy/minmax modes, still nothing. Any ideas? Thanks in advance.

Settings:
# for yolov3 320
opt_shape_param = [
    [
        [1, 3, 320, 320],
        [1, 3, 320, 320],
        [1, 3, 320, 320],
    ]
]

Environment:
GPU: Tesla V100
nvidia-driver: 418.152.00
cuda: 11.0
cudnn: 8.0
tensorrt: 7.1.2.8
pytorch: 1.7
torchvision: 0.8
mmdetection: 2.7

DCNv2 support

Trying to convert Faster RCNN R-50 DCNv2 to TensorRT, but I'm getting the following error:

[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::948, condition: profileMaxDims.d[i] >= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1092, condition: allInputDimensionsSpecified(routine)
Traceback (most recent call last):
  File "infer_rt.py", line 59, in <module>
    results = model(image_batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/space/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 458, in forward
    shape = tuple(self.context.get_binding_shape(idx))
ValueError: __len__() should return >= 0

Here is my code:

import mmcv
import os
import time
import numpy as np
import cv2
from mmdet2trt import mmdet2trt
from mmdet2trt.apis import inference_detector, init_detector
import torch
from mmdet.datasets.pipelines import Compose

config_file = '/models/faster_rcnn_r50_fpn_mdconv_c3-c5_group4_1x_coco.py'
checkpoint_file = '/models/output.trt'

model = init_detector(checkpoint_file)

def preprocess(img, cfg):
    if isinstance(cfg, str):
        cfg = mmcv.Config.fromfile(cfg)
    data = dict(img=img)
    cfg = cfg.copy()
    cfg.data.test.pipeline[0].type = 'LoadImageFromWebcam'
    test_pipeline = cfg.data.test.pipeline
    test_pipeline = Compose(test_pipeline)
    data = test_pipeline(data)
    return data['img'][0]._data

def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,
               font_scale=0.5, thickness=2):
    size = cv2.getTextSize(label, font, font_scale, thickness)[0]
    x, y = point
    cv2.rectangle(image, (x, y - size[1]),
                  (x + size[0], y), (255, 0, 0), cv2.FILLED)
    cv2.putText(image, label, point, font, font_scale,
                (255, 255, 255), thickness)

IMAGE_PATH = "/models/dataset/sort_potato_l2/val/images"
image_batch = []
for img in os.listdir(IMAGE_PATH):
    if img.endswith("jpg"):
        image = os.path.join(IMAGE_PATH, img)
        image = cv2.imread(image)
        image = preprocess(image, config_file)
        image_batch.append(image)
    if len(image_batch) == 4:
        tic = time.time()
        image_batch = torch.stack(image_batch, axis = 0).cuda()
        results = model(image_batch)
        print("FPS: ", 1/(time.time()-tic))
        image_batch = []

inference compare between trt and pytorch

HI~

I ran inference on 20 images to test the speed and GPU memory usage of TensorRT and PyTorch, respectively.

env:
config : faster_rcnn_r50_fpn_1x_coco.py
checkpoint : faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
GPU: TITAN X (Pascal)

The results are almost the same, but:

  • Question 1: why does TensorRT spend the longest time on the first image?
  • Question 2: TensorRT doesn't seem to show any improvement in speed or GPU memory usage?
[trt] inference take time 1.2806954383850098 s , detect 8 objects
[trt] inference take time 0.07403969764709473 s , detect 22 objects
[trt] inference take time 0.0725240707397461 s , detect 11 objects
[trt] inference take time 0.0938711166381836 s , detect 3 objects
[trt] inference take time 0.18393778800964355 s , detect 4 objects
[trt] inference take time 0.07542586326599121 s , detect 3 objects
[trt] inference take time 0.07514834403991699 s , detect 5 objects
[trt] inference take time 0.0743105411529541 s , detect 19 objects
[trt] inference take time 0.06311678886413574 s , detect 33 objects
[trt] inference take time 0.07394289970397949 s , detect 3 objects
[trt] inference take time 0.08452844619750977 s , detect 5 objects
[trt] inference take time 0.07141757011413574 s , detect 28 objects
[trt] inference take time 0.07546305656433105 s , detect 54 objects
[trt] inference take time 0.06720089912414551 s , detect 9 objects
[trt] inference take time 0.07281136512756348 s , detect 4 objects
[trt] inference take time 0.06780433654785156 s , detect 56 objects
[trt] inference take time 0.0691215991973877 s , detect 17 objects
[trt] inference take time 0.07524847984313965 s , detect 13 objects
[trt] inference take time 0.08305907249450684 s , detect 31 objects
[trt] inference take time 0.0662374496459961 s , detect 39 objects
[trt] inference average time 0.13999524116516113 s

[torch] inference take time 0.08921217918395996 s 
[torch] inference take time 0.08358907699584961 s 
[torch] inference take time 0.08429265022277832 s 
[torch] inference take time 0.08585500717163086 s 
[torch] inference take time 0.07990050315856934 s 
[torch] inference take time 0.08589792251586914 s 
[torch] inference take time 0.09028077125549316 s 
[torch] inference take time 0.09110355377197266 s 
[torch] inference take time 0.08078670501708984 s 
[torch] inference take time 0.08568334579467773 s 
[torch] inference take time 0.09473133087158203 s 
[torch] inference take time 0.08323526382446289 s 
[torch] inference take time 0.08632254600524902 s 
[torch] inference take time 0.08114051818847656 s 
[torch] inference take time 0.08165812492370605 s 
[torch] inference take time 0.08303499221801758 s 
[torch] inference take time 0.08093047142028809 s 
[torch] inference take time 0.08914780616760254 s 
[torch] inference take time 0.08186078071594238 s 
[torch] inference take time 0.07793211936950684 s 
[torch] inference average time 0.08482978343963624 s

Memory-Usage : 2110MiB / 12188MiB

error:batchedNMSPlugin.cpp

hi, I met this problem:
#assertion/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,143
Aborted (core dumped)


environment:

  • OS: [Ubuntu]
  • python_version: [3.6]
  • pytorch_version: [1.6]
  • cuda_version: [cuda-10.2]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.4]

Looking forward to your help, thank you.

inference

I can produce the trt model, but there is a prompt:
"Warning: Encountered known unsupported method torch.Tensor.new_zeros"
But during testing, the following problem appears:
"[TensorRT] ERROR: (Unnamed Layer* 204) [ElementWise]: dimensions not compatible for elementwise
[TensorRT] ERROR: shapeMachine.cpp (285)-Shape Error in operator(): broadcast with incompatible dimensions
[TensorRT] ERROR: Instruction: CHECK_BROADCAST 167 100
Traceback (most recent call last):
File "demo/inference.py", line 61, in
main()
File "demo/inference.py", line 31, in main
result = inference_detector(trt_model, image_path, cfg_path, args.device)
File "/home/nie/mmdetection2trt/mm2trt/mmdet2trt/apis/inference.py", line 39, in inference_detector
result = model(tensor)
File "/home/nie/anaconda3/envs/mla/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/nie/anaconda3/envs/mla/lib/python3.7/site-packages/torch2trt/torch2trt.py", line 394, in forward
shape = tuple(self.context.get_binding_shape(idx))
ValueError: len() should return >= 0"

MemoryError on jetson TX2

I am trying to convert a model with mmdetection-to-tensorrt using the provided Dockerfile on a TX2 machine, but I am getting memory errors.

mmdet2trt configs/retinanet_r50_fpn_2x_coco.py weights/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth weights/model.trt --min-scale 1 3 800 600 --max-scale 1 3 800 600 --opt-scale 1 3 800 600
INFO:mmdet2trt:Model warmup
INFO:mmdet2trt:Converting model
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
INFO:mmdet2trt:Conversion took 80.97697949409485 s
INFO:mmdet2trt:Saving TRT model to: weights/model.trt
Killed

environment:

  • OS: Ubuntu 18.04 LTS
  • python_version: 3.6.9
  • pytorch_version: 1.6.0
  • cuda_version: 10.2
  • cudnn_version: [e.g. 8.0.2.39]
  • mmdetection_version: [e.g. 2.7.0]

We have made several changes to the Dockerfile to make it run on a Jetson TX2 device.

FROM nvcr.io/nvidia/l4t-base:r32.4.4

### update apt and install libs
RUN apt-get update &&\
    apt-get install -y vim cmake libsm6 libxext6 libxrender-dev libgl1-mesa-glx git

### torch install 
RUN wget https://nvidia.box.com/shared/static/9eptse6jyly1ggt9axbja2yrmj6pbarc.whl -O torch-1.6.0-cp36-cp36m-linux_aarch64.whl &&\
    apt-get install -y python3-pip libopenblas-base libopenmpi-dev &&\
    pip3 install Cython &&\
    pip3 install numpy torch-1.6.0-cp36-cp36m-linux_aarch64.whl
### python
RUN pip3 install --upgrade pip

# ### install mmcv

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3-opencv

### scikit image
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    python3-dev libpython3-dev python-pil python3-tk python-imaging-tk \
    build-essential wget locales liblapack-dev

RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8
ENV LANG en_US.UTF-8



RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip
RUN pip3 install -U testresources setuptools

RUN pip3 install -U numpy
#####

RUN git clone https://github.com/open-mmlab/mmcv.git /root/space/mmcv &&\
    cd root/space/mmcv &&\
    MMCV_WITH_OPS=1 pip install -e .

### git mmdetection
RUN git clone --depth=1 https://github.com/open-mmlab/mmdetection.git /root/space/mmdetection

### install mmdetection
RUN cd /root/space/mmdetection &&\ 
    pip3 install -r requirements.txt &&\
    python3 setup.py develop

## install cmake - amirstan plugin below requires cmake version > 3.13
RUN cd /root/space/ &&\
    wget https://github.com/Kitware/CMake/releases/download/v3.19.1/cmake-3.19.1.tar.gz &&\
    tar -xf cmake-3.19.1.tar.gz &&\
    cd cmake-3.19.1 &&\
    apt-get install -y libssl-dev &&\
    ./configure &&\
    make &&\
    make install


### git amirstan plugin
RUN git clone --depth=1 https://github.com/grimoire/amirstan_plugin.git /root/space/amirstan_plugin &&\ 
    cd /root/space/amirstan_plugin &&\ 
    git submodule update --init --progress --depth=1

### install amirstan plugin
RUN cd /root/space/amirstan_plugin &&\ 
    mkdir build &&\
    cd build &&\
    cmake .. &&\
    make -j10 &&\
    echo "export AMIRSTAN_LIBRARY_PATH=/root/space/amirstan_plugin/build/lib" >> /root/.bashrc

### git torch2trt_dynamic
RUN git clone --depth=1 https://github.com/grimoire/torch2trt_dynamic.git /root/space/torch2trt_dynamic

### install torch2trt_dynamic
RUN cd /root/space/torch2trt_dynamic &&\
    python3 setup.py develop

### git mmdetection-to-tensorrt
RUN git clone --depth=1 https://github.com/grimoire/mmdetection-to-tensorrt.git /root/space/mmdetection-to-tensorrt

### install mmdetection-to-tensorrt
RUN cd /root/space/mmdetection-to-tensorrt &&\
    python3 setup.py develop

## setuptools for python3
RUN apt-get install -y python3-setuptools

### install torchvision
RUN  apt-get install -y libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev &&\
     git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision &&\
     cd torchvision &&\
     export BUILD_VERSION=0.7.0 &&\  
     python3 setup.py install

WORKDIR /root/space

How to export a model with only batched NMS layer

Hi, thanks for your amazing work!

I want to export a model with only "BatchedNMS" (located in post_processing/batched_nms.py) for some reason. In other words, I just want to speed up the process of NMS through tensorRT.
Could you please tell me how to organize the config file, or how to export it directly?

Thank you!

Some questions about this project

This is a very interesting and useful project, thank you for your contribution, but I still have three questions:
1. I can get results with retinanet and yolo v3, but two-stage detection algorithms, such as faster-rcnn and cascade_rcnn, do not work; a Segmentation fault (core dumped) error is reported.
2. I used fp16 mode, but the converted retinanet and yolo v3 models are larger than the original mmdetection models, and the inference time is also very long, so there must be some problem, but I don't know where it happens.
3. retinanet and yolo v3 only work at the specified size.
Finally, I would like to join the discussion group, please approve; thank you. The following is my environment:
environment:

  • OS: [Ubuntu18.04]
  • python_version: [3.6]
  • pytorch_version: [1.3]
  • cuda_version: [cuda-10.0]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.5.0]

int8 model convert error

env: GPU: Tesla T4, nvidia-driver: 450.51.6, cuda: 11.03, cudnn: 8.04, tensorrt: 7.1.3.4, pytorch: 1.6/1.7, torchvision: 0.7/0.8, mmdetection: 2.7

Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] INFO: Starting Calibration.
[TensorRT] ERROR: engine.cpp (936) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
Aborted

Details with cuda-memcheck:

========= Invalid __global__ read of size 4
=========     at 0x00000960 in void allClassNMS_kernel<float, float, int=2>(int, int, int, int, float, bool, bool, float*, float*, int*, float*, int*, bool)
=========     by thread (288,0,0) in block (0,0,0)
=========     Address 0x7f2751b0d890 is out of bounds
=========     Device Frame:void allClassNMS_kernel<float, float, int=2>(int, int, int, int, float, bool, bool, float*, float*, int*, float*, int*, bool) (void allClassNMS_kernel<float, float, int=2>(int, int, int, int, float, bool, bool, float*, float*, int*, float*, int*, bool) : 0x960)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib64/libcuda.so (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so [0x7ea0b]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so [0xc0751]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so (_Z18allClassNMS_kernelIffLi2EEviiiifbbPT0_PT_PiS3_S4_b + 0x1ea) [0x5e4ea]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so (_Z15allClassNMS_gpuIffE14pluginStatus_tP11CUstream_stiiiifbbPvS3_S3_S3_S3_b + 0x14d) [0x5d2dd]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so (_Z12nmsInferenceP11CUstream_stiiibiiiiiffN8nvinfer18DataTypeEPKvS2_S4_PvS5_S5_S5_S5_bbb + 0x2c8) [0x5cdb8]
=========     Host Frame:/data/cyw/mmdet2trt/amirstan_plugin/build/lib/libamirstan_plugin.so (_ZN8amirstan6plugin22BatchedNMSPluginCustom7enqueueEPKN8nvinfer116PluginTensorDescES5_PKPKvPKPvSA_P11CUstream_st + 0x74) [0x57694]
=========     Host Frame:/data/cyw/TensorRT-7.1.3.4/lib/libnvinfer.so.7 (_ZNK8nvinfer12rt4cuda24PluginV2DynamicExtRunner7executeERKNS0_13CommonContextERKNS0_19ExecutionParametersE + 0x3e4) [0x76b5a4]
=========     Host Frame:/data/cyw/TensorRT-7.1.3.4/lib/libnvinfer.so.7 (_ZN8nvinfer12rt16ExecutionContext15enqueueInternalEPP10CUevent_st + 0x4a7) [0x6f7287]
=========     Host Frame:/data/cyw/TensorRT-7.1.3.4/lib/libnvinfer.so.7 (_ZN8nvinfer12rt16ExecutionContext9enqueueV2EPPvP11CUstream_stPP10CUevent_st + 0x1fc) [0x6f8f0c]
=========     Host Frame:/data/cyw/miniconda/envs/open-mmlab1.6/lib/python3.7/site-packages/tensorrt/tensorrt.so [0x9ece6]
=========     Host Frame:/data/cyw/miniconda/envs/open-mmlab1.6/lib/python3.7/site-packages/tensorrt/tensorrt.so [0xd91e4]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x274) [0x165914]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165a31]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x52fe) [0x1d239e]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x114829]
=========     Host Frame:python (_PyFunction_FastCallDict + 0x1d5) [0x115925]
=========     Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x1344d3]
=========     Host Frame:python (PyObject_Call + 0x6e) [0x126ffe]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x1e4a) [0x1ceeea]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x114829]
=========     Host Frame:python (_PyFunction_FastCallDict + 0x1d5) [0x115925]
=========     Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x1344d3]
=========     Host Frame:python [0x16be1a]
=========     Host Frame:python (_PyObject_FastCallKeywords + 0x48b) [0x16cccb]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x49e6) [0x1d1a86]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x164e7b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x416) [0x1cd4b6]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x164e7b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x416) [0x1cd4b6]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x164e7b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x416) [0x1cd4b6]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x114829]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x115714]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x11573c]
=========     Host Frame:python [0x22cf14]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x237331]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x237523]
=========     Host Frame:python [0x238655]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x23877c]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x223d5]
=========     Host Frame:python [0x1dcff0]
=========
#assertion/data/cyw/mmdet2trt/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,140
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found

convert one-stage detector failed

hello, first I should say this is a great job.
But when I convert a one-stage detector (retinanet and atss) in mmdet (2.5.0), I encounter the error shown in the attached screenshot.

It seems that some traced tensor is wrong. Could you help me fix it?
my env:
ubuntu16.04
cuda10.1
cudnn7.6.5
tensorrt6.0.1.5

TorchVision: 0.5.0
OpenCV: 4.4.0
MMCV: 1.1.6
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMDetection: 2.5.0+

A stable release version tagged to specific version of MMDetection?

Hi, may I know if there is a plan to release a stable (tagged) version for a specific version of MMDetection? I've noticed that some bugs related to mmdetection have been fixed but are not yet included in a tagged release.

I wanted to make sure that I can still reproduce my conversion a few months later as a reference. Alternatively, could you point me to which version of MMDetection I should be using with tag version 0.3.0?

Thanks!

How to change the class scores before it is sent to nms?

First, I want to thank you for your great work, it has really helped me a lot. Could you please help me with the question below?

I changed the method "get_bboxes" of class "BBoxHead" and added a new parameter to the original "mmdetection" in test mode. This parameter is a function in which I do some tricks to change the scores (the result of F.softmax...).

If I want to make the same change in "mmdetection-to-tensorrt", what should I do? Or how can I get the original scores out of the get_bboxes method when I use "mmdetection-to-tensorrt"?

ModuleNotFoundError: No module named 'mmdet2trt.models'; 'mmdet2trt' is not a package

I ran mmdet2trt.py to convert a .pth to a .engine file; however, I got the following error:

Traceback (most recent call last):
  File "mmdet2trt.py", line 8, in <module>
    from mmdet2trt.models.builder import build_wraper
  File "/home/sycv/workplace/pengyuzhou/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 8, in <module>
    from mmdet2trt.models.builder import build_wraper
ModuleNotFoundError: No module named 'mmdet2trt.models'; 'mmdet2trt' is not a package

I ran the script with python mmdet2trt.py --config=/home/sycv/workplace/pengyuzhou/mmdetection/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py --checkpoint==/home/sycv/workplace/pengyuzhou/mmdetection-to-tensorrt/yolov3_d53_mstrain-608_273e_coco-139f5633.pth --output==/home/sycv/workplace/pengyuzhou/mmdetection-to-tensorrt/demo/


environment:

  • OS: [e.g. Ubuntu]
  • python_version: 3.7
  • pytorch_version: 1.6
  • cuda_version: 10.2
  • cudnn_version: 7.6.5
  • mmdetection_version: 2.5


error in deformable_im2col: a PTX JIT compilation failed

When running the mmdet2trt with my config file and checkpoint, I got the following warning:

root@f406c58d8080:~/space/mmdetection-to-tensorrt# mmdet2trt /home/config_ct2_full.py /home/epoch_24.pth /home/qcgd.trt
/usr/local/lib/python3.6/dist-packages/mmcv/utils/misc.py:304: UserWarning: "deformable_groups" is deprecated in `DeformConv2d.__init__`, please use "deform_groups" instead
  f'"{src_arg_name}" is deprecated in '
/usr/local/lib/python3.6/dist-packages/mmcv/utils/misc.py:304: UserWarning: "out_size" is deprecated in `RoIAlign.__init__`, please use "output_size" instead
  f'"{src_arg_name}" is deprecated in '
/usr/local/lib/python3.6/dist-packages/mmcv/utils/misc.py:304: UserWarning: "sample_num" is deprecated in `RoIAlign.__init__`, please use "sampling_ratio" instead
  f'"{src_arg_name}" is deprecated in '
INFO:mmdet2trt:Model warmup
INFO:mmdet2trt:Converting model
**Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor**
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
INFO:mmdet2trt:Conversion took 97.33503580093384 s
INFO:mmdet2trt:Saving TRT model to: /home/qcgd.trt

And then I tried to test the converted tensorrt model using tools/test.py, and I got the following error:

root@f406c58d8080:~/space/mmdetection-to-tensorrt/tools# python3 test.py /home/config_ct2_full.py /home/qcgd.trt --out /home/result.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[                                                  ] 0/176, elapsed: 0s, ETA:error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
error in deformable_im2col: a PTX JIT compilation failed
#assertion/root/space/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,138
Aborted (core dumped)

The mmdet config file contains:

model = dict(
    type='FasterRCNN',
    pretrained='/home/resnet50.pth',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        dcn=dict(type='DCN', deformable_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True)),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.05, 0.1, 0.4, 0.8, 1.0, 1.25, 2.5, 10.0, 20],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', out_size=7, sample_num=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=5,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            match_low_quality=False,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

The docker image was created using docker/Dockerfile. Can anybody help solve this problem? Thanks.

skeleton or stgcn module

hi, how long until the skeleton or ST-GCN module is added?
Would it take me a long time to write a module supporting skeleton and ST-GCN myself? I haven't written a TensorRT program before. :)

error when converting mmdetection-yolov3 to tensorrt

1. environment:

  • OS: ubuntu 18.04
  • python_version: Python 3.7.6
  • pytorch_version:
>>> torch.__version__
'1.7.0+cu110'
>>> torchvision.__version__
'0.8.1+cu110'
>>> mmdet.__version__
'2.5.0'
>>> mmcv.__version__
'1.2.0'
>>> tensorrt.__version__
'7.2.1.6'
  • cuda_version:
nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
  • cudnn_version:
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
  • mmdetection_version: 2.5.0

  • nvidia driver version:

NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1

2. error information

/home/yangshuai/anaconda3/envs/mmlab/bin/python /home/yangshuai/Downloads/pycharm-2019.3.3/plugins/python/helpers/pydev/pydevd.py --cmd-line --multiproc --qt-support=auto --client 127.0.0.1 --port 46145 --file /home/yangshuai/project/runtime/mmdetection-to-tensorrt/demo/inference.py
pydev debugger: process 6725 is connecting

Connected to pydev debugger (build 193.6494.30)
<class 'tensorrt.tensorrt.IPluginFactory'>
WARNING:root:module mmdet.models.VFNetHead not exist.
WARNING:root:module mmdet.models.VFNet not exist.
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin GridAnchorDynamicPluginDynamic version 1
Traceback (most recent call last):
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 147, in mmdet2trt
    int8_calib_algorithm=int8_calib_algorithm)
  File "/home/yangshuai/project/runtime/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 513, in torch2trt_dynamic
    outputs = module(*inputs)
  File "/home/yangshuai/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/models/detectors/two_stage.py", line 50, in forward
    rois = rpn_head(feat, x)
  File "/home/yangshuai/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/models/dense_heads/rpn_head.py", line 31, in forward
    device=cls_scores[0].device)
  File "/home/yangshuai/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/core/anchor/anchor_generator.py", line 61, in forward
    x, stride=self.generator.strides[index], device=device)
  File "/home/yangshuai/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yangshuai/project/runtime/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 307, in wrapper
    converter['converter'](ctx)
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/converters/anchor_generator.py", line 28, in convert_AnchorGeneratorDynamic
    base_anchors=base_anchors)
  File "/home/yangshuai/project/runtime/mmdetection-to-tensorrt/mmdet2trt/converters/plugins/create_gridanchordynamic_plugin.py", line 69, in create_gridanchordynamic_plugin
    return creator.create_plugin(layer_name, pfc)
AttributeError: 'NoneType' object has no attribute 'create_plugin'

It seems Python cannot find the plugin in create_gridanchordynamic_plugin.py; creator returns None:

    creator = trt.get_plugin_registry().get_plugin_creator(
        'GridAnchorDynamicPluginDynamic', '1', '')

I had installed TensorRT, torch2trt_dynamic, amirstan_plugin, mmdet, and mmdetection-to-tensorrt according to the instructions.

Mask rcnn can not use int8 calib?

When I set (enable_mask=True) and (int8_mode=True, int8_calib_dataset=calib_dataset, int8_calib_alg="entropy"), the model can be converted to an int8 model, but the calibration does not work.

Whether the output of the model is the result before the NMS?

Thank you for your great work. But I have some problems when I convert the mmdetection model to the torch model or the tensorrt model.
Is the output of the model the result before NMS? I obtain a 100-dimensional result when I print "inference_detector(trt_model, image_path, cfg_path, args.device)", and the result is different from the post-processed result of the mmdetection model.
I also tested the torch_model and the tensorrt model with return_wrap_model=True. The outputs of the two models are the same, but they are both 100-dimensional tensors, which is much more than my ground truth.

performs well in NVIDIA agx platform,and some questions about batch inference

Thanks for your help. I have tested this tool on the NVIDIA AGX platform, and we get a nearly 3X speedup compared to using mmdetection directly. That's an amazing result!
I have another question about batch inference. Mmdetection 2.4 now supports batch inference, and it would be useful to add batch inference to this tool. The existing batch inference method in this tool does not work well, right?
By the way, this is really a good tool for people who use mmdetection. I know this is your part-time project and there are still some inference questions while testing. Are you planning to publish a more detailed tutorial? I will use this tool for a long time and I would like to contribute if I can, for example with docs writing.
My email: [email protected] and wechat: heboyong

KeyError

KeyError: 'QG_RCNN is not in the detector registry'

GPU Memory-Usage Control

Referring to the issue in project mmdetection-to-tensorrt: multiple batches for one inference are supported now. Cool!

However, GPU memory usage increases when using multiple batches (GPU memory usage is a tough problem in some projects). So, how can I control memory usage and keep GPU memory usage as low as possible?
Are there any parameters that can be set for this?

mAP of the converted cascade rcnn model drops

The project is a great job. We successfully converted the cascade rcnn model and increased the inference speed. But the mAP of the model on coco drops from 45 to 41.9 after the conversion, for both the fp32 and fp16 settings. Could you please tell us where the problem may be? Thanks. @grimoire

run mmdet2trt() failed

Hi,
I used the latest version of mmdetection-to-tensorrt and the latest version of amirstan_plugin, but running mmdet2trt failed.
my model configuration is as follows:
CFG='./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py' CHECK_POINT='./checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'

and then I called mmdet2trt api as follows:
trt_model, torch_model = mmdet2trt(ConfigInfo.CFG, ConfigInfo.CHECK_POINT, opt_shape_param=None, fp16_mode=False, max_workspace_size=1 << 30, return_warp_model=True)

but I got this failure:

root@de6cc444df48:/home/install-file/tensorrt/mmdetection-to-tensorrt-master# python test_mmdet2trt.py
Warning: Encountered known unsupported method torch.Tensor.new_zeros
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
python: /home/install-file/tensorrt/amirstan_plugin/src/plugin/repeatDimsPlugin/repeatDimsPlugin.cpp:144: virtual nvinfer1::DataType amirstan::plugin::RepeatDimsPluginDynamic::getOutputDataType(int, const nvinfer1::DataType*, int) const: Assertion `nbInputs == 1' failed.
Aborted (core dumped)
thanks

Bug: Generalized Focal Loss

There is a problem with the conversion of the Generalized Focal Loss module.

There is a warning:

WARNING:root:can't find wrap module for type:<class 'mmdet.models.detectors.gfl.GFL'>, use <class 'mmdet2trt.models.detectors.two_stage.TwoStageDetectorWraper'> instead.

I cannot continue converting the module on a GTX 1080:

Traceback (most recent call last):
  File "tools/TRTconvert.py", line 173, in <module>
    main()
  File "tools/TRTconvert.py", line 155, in main
    max_workspace_size=max_workspace_size)
  File "/home/fyy/Downloads/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 99, in mmdet2trt
    wrap_model = build_wraper(torch_model, TwoStageDetectorWraper)
  File "/home/fyy/Downloads/mmdetection-to-tensorrt/mmdet2trt/models/builder.py", line 36, in build_wraper
    wrap_model = default_wraper(module)
  File "/home/fyy/Downloads/mmdetection-to-tensorrt/mmdet2trt/models/detectors/two_stage.py", line 18, in __init__
    mmdet_rpn_head = self.model.rpn_head
  File "/home/fyy/anaconda3/envs/mmdec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'GFL' object has no attribute 'rpn_head'

but the conversion succeeds on a Tesla T4, with the same warning.

Could you please figure out the problem?

Tutorial

Dear author, is there any tutorial documentation?

can't use trtexec to load the .engine file

To Reproduce

  1. Export the .engine file:
  • use docker "mmdet2trt_docker:v1.0" to export the xxx.engine file.
  2. Use the NVIDIA TensorRT image to test the .engine file:
  • "docker run --rm -it --gpus all -v [your_local_path]:/workspace/hostdir nvcr.io/nvidia/tensorrt:20.11-py3"
  • install torch
  • install torch2trt_dynamic
  • install amirstan_plugin
  • use "trtexec --loadEngine=xxx.engine --verbose" to test the .engine file

environment:

  • OS: [Ubuntu]
  • python_version: [3.6.9]
  • pytorch_version: [1.6.0]
  • cuda_version: [10.2]
  • mmdetection_version: [2.3.0]
  • docker: nvcr.io/nvidia/tensorrt:20.03-py3

I attached some screenshots. Could you help to check the reason?
Thanks in advance!


mmdet2trt run error

While running DCNv2 model conversion to TRT I'm getting the following error:

Traceback (most recent call last):
  File "/usr/local/bin/mmdet2trt", line 11, in <module>
    load_entry_point('mmdet2trt', 'console_scripts', 'mmdet2trt')()
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 358, in main
    enable_mask=args.enable_mask)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 147, in mmdet2trt
    int8_calib_algorithm=int8_calib_algorithm)
  File "/root/space/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 513, in torch2trt_dynamic
    outputs = module(*inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/models/detectors/two_stage.py", line 52, in forward
    result = self.roi_head_wraper(feat, rois, x.shape[2:])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/models/roi_heads/cascade_roi_head.py", line 104, in forward
    i, feat, torch.cat([rois_pad, rois], dim=1))
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/models/roi_heads/cascade_roi_head.py", line 59, in _bbox_forward
    roi_feats = bbox_roi_extractor(x[:bbox_roi_extractor.num_inputs], rois)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 22, in forward
    return self.roi_extractor(feats, rois, roi_scale_factor)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/space/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 298, in wrapper
    outputs = method(*args, **kwargs)
  File "/root/space/mmdetection-to-tensorrt/mmdet2trt/models/roi_heads/roi_extractors/pooling_layers/roi_align_extractor.py", line 15, in forward
    return self.module(feats, rois, roi_scale_factor)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/fp16_utils.py", line 164, in new_func
    return old_func(*args, **kwargs)
  File "/root/space/mmdetection/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 92, in forward
    rois_ = rois[inds]
  File "/root/space/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 298, in wrapper
    outputs = method(*args, **kwargs)
IndexError: too many indices for tensor of dimension 2

I want to decrease model inference time, so I modified the mmdet2trt.py script, adding the following opt_shape_param at line 321 (all sizes are multiples of 32):

    opt_shape_param=[
        [
            [1,3,544,960],    #  all are same because it's needed to use built engine in C++
            [1,3,544,960],  
            [1,3,544,960], 
        ]
    ]

Running the following command produces the error above:

mmdet2trt --save-engine=true mmdetection/configs/dcn/cascade_rcnn_r101_fpn_dconv_c3-c5_1x_coco.py pytorch_models/cascade_rcnn_r101_fpn_dconv_c3-c5_1x_coco_20200203-3b2f0594.pth pytorch_models/cascade_rcnn_r101_fpn_dconv_c3-c5_1x_coco_20200203-3b2f0594_converted_fp32.pth

Note: everything works fine if I use bigger sizes, for example [1,3,608,1088].
Why don't smaller sizes work?

Environment:
the converter is run within the Docker image provided in the project.

Additional context
It looks like there is some issue with padding during the calibration step.

Can't use triton-inference-server to deploy the trt engine.

Describe the bug
I want to deploy the trt engine with triton-inference-server, but it can't load the trt model.

To Reproduce

I converted the TensorRT engine from the mmdet model with the docker container CLI:

mmdet2trt --fp16 cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_fp16.py epoch_5.pth output.trt

This produced the output.trt model file. I then created the following directory layout for the model:

models
    ├── big_model
    │   └── 1
    │       └── model.plan  # (rename from output.trt)
    └── libamirstan_plugin.so

and then tried to deploy it with tritonserver:

docker run --rm --gpus device=3 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  --env LD_PRELOAD=/models/libamirstan_plugin.so -p 8800:8000 -p 8801:8001 -p 8802:8002 \
  -v $(pwd):/models nvcr.io/nvidia/tritonserver:20.08-py3 \
  tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1

However, it cannot load the model:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.08 (build 15533555)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0224 07:22:55.402466 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I0224 07:22:55.408652 1 metrics.cc:193]   GPU 0: GeForce RTX 2080 Ti
I0224 07:22:55.409009 1 server.cc:119] Initializing Triton Inference Server
I0224 07:22:55.850319 1 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7f13f6000000' with size 268435456
I0224 07:22:55.852497 1 netdef_backend_factory.cc:46] Create NetDefBackendFactory
I0224 07:22:55.852517 1 plan_backend_factory.cc:48] Create PlanBackendFactory
I0224 07:22:55.852523 1 plan_backend_factory.cc:55] Registering TensorRT Plugins
I0224 07:22:55.852566 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0224 07:22:55.852579 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0224 07:22:55.852600 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0224 07:22:55.852638 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0224 07:22:55.852645 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0224 07:22:55.852653 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0224 07:22:55.852660 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0224 07:22:55.852671 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0224 07:22:55.852683 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0224 07:22:55.852709 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0224 07:22:55.852716 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0224 07:22:55.852723 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0224 07:22:55.852734 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0224 07:22:55.852742 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0224 07:22:55.852749 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0224 07:22:55.852757 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0224 07:22:55.852765 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0224 07:22:55.852772 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0224 07:22:55.852779 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0224 07:22:55.852785 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0224 07:22:55.852793 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0224 07:22:55.852802 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0224 07:22:55.852810 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0224 07:22:55.852816 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0224 07:22:55.852828 1 onnx_backend_factory.cc:53] Create OnnxBackendFactory
I0224 07:22:55.860046 1 libtorch_backend_factory.cc:53] Create LibTorchBackendFactory
I0224 07:22:55.860167 1 custom_backend_factory.cc:46] Create CustomBackendFactory
I0224 07:22:55.860172 1 backend_factory.h:44] Create TritonBackendFactory
I0224 07:22:55.860203 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0224 07:22:55.860364 1 autofill.cc:142] TensorFlow SavedModel autofill: Internal: unable to autofill for 'big_model', unable to find savedmodel directory named 'model.savedmodel'
I0224 07:22:55.860396 1 autofill.cc:155] TensorFlow GraphDef autofill: Internal: unable to autofill for 'big_model', unable to find graphdef file named 'model.graphdef'
I0224 07:22:55.860420 1 autofill.cc:168] PyTorch autofill: Internal: unable to autofill for 'big_model', unable to find PyTorch file named 'model.pt'
I0224 07:22:55.860450 1 autofill.cc:180] Caffe2 NetDef autofill: Internal: unable to autofill for 'big_model', unable to find netdef files: 'model.netdef' and 'init_model.netdef'
I0224 07:22:56.123378 1 autofill.cc:376] failed to load /models/big_model/1/model.plan: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/session/inference_session.cc:279 onnxruntime::InferenceSession::InferenceSession(const onnxruntime::SessionOptions&, const onnxruntime::Environment&, const void*, int) result was false. Could not parse model successfully while constructing the inference session

I0224 07:22:56.123459 1 autofill.cc:212] ONNX autofill: Internal: unable to autofill for 'big_model', unable to find onnx file
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
E0224 07:23:19.480148 1 logging.cc:43] coreReadArchive.cpp (38) - Serialization Error in verifyHeader: 0 (Version tag does not match)
E0224 07:23:19.516318 1 logging.cc:43] INVALID_STATE: std::exception
E0224 07:23:19.516344 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
I0224 07:23:19.548534 1 autofill.cc:225] TensorRT autofill: Internal: unable to autofill for 'big_model', unable to find a compatible plan file.
W0224 07:23:19.548552 1 autofill.cc:265] Proceeding with simple config for now
I0224 07:23:19.548576 1 model_config_utils.cc:629] autofilled config: name: "big_model"

E0224 07:23:19.558529 1 model_repository_manager.cc:1633] unexpected platform type  for big_model
error: creating server: Internal - failed to load all models

Can anyone please point out what I'm missing? Thank you.
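For what it's worth, here is a hedged sanity check (not a documented workflow of this project) that can be run inside the serving container to see whether the plan deserializes at all once the plugin is loaded. If the TensorRT version used for conversion differs from the one bundled in tritonserver:20.08-py3, deserialization fails with exactly the "Version tag does not match" error shown in the log above. The paths match this report; the availability of the tensorrt Python package inside the container is an assumption.

```python
import ctypes

import tensorrt as trt

# Load the custom plugin library first (same role as LD_PRELOAD above).
ctypes.CDLL('/models/libamirstan_plugin.so')

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, '')

# Try to deserialize the plan that Triton refused to load.
with open('/models/big_model/1/model.plan', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# None here (together with a "Version tag does not match" log line) points to
# a TensorRT version mismatch between the conversion and serving environments.
print('deserialized OK' if engine is not None else 'deserialization failed')
```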

Environment:

  • OS: Ubuntu 18.04
  • python_version: 3.6
  • pytorch_version: 1.6
  • cuda_version: 10.2
  • cudnn_version: unknown (I use the provided docker image)
  • mmdetection_version: 2.9.0

CUDA error: an illegal memory access was encountered

Describe the bug
First of all, thank you for a great project.
I installed this repo on the nvcr.io/nvidia/pytorch:20.10-py3 image (together with all prerequisites).

I'm trying to convert the LVIS model from mmdetection (https://github.com/open-mmlab/mmdetection/blob/master/configs/lvis/mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py).
The conversion process seems to go OK, but when I try to do inference I get RuntimeError: CUDA error: an illegal memory access was encountered when accessing the result tensors.
I tried both with and without mask support.

Please let me know if I'm doing something wrong. Thanks again!

To Reproduce
Convert the model using this command line:

mmdet2trt --fp16 1 --enable-mask 0 --save-engine 1 --max-workspace-gb 4 \
mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py \
mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1-ec55ce32.pth \
faster_rcnn_r101_lvis

Inference code:

import mmdet2trt.apis
import mmdet.apis
import imageio
from skimage.transform import resize

config_path = 'mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py'
checkpoint_path = 'mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1-ec55ce32.pth'

img = imageio.imread('test_image_1.jpg')
img = resize(img, (800, 1333), order=1, anti_aliasing=True, preserve_range=True)

mmdet_model = mmdet.apis.init_detector(config_path, checkpoint_path, 'cuda:0')

mmdet_res = mmdet.apis.inference_detector(mmdet_model, img)

trt_model = mmdet2trt.apis.init_detector('faster_rcnn_r101_lvis')

trt_res = mmdet2trt.apis.inference_detector(trt_model, img, config_path, 'cuda:0')

Environment:

  • OS: [e.g. Ubuntu]
  • python_version: 3.6
  • pytorch_version: 1.7
  • cuda_version: 11.1
  • cudnn_version: [e.g. 8.0.2.39]
  • mmdetection_version: 2.6.0


GPU Memory

Thanks for your useful project. It works really well on a Tesla T4. I tested several algorithms; on average they run about three times faster, and some models, such as HTC with an R-50-FPN backbone, even get about an 8x speed-up.

Since MMDetection does not currently support YOLOv4, I analyzed TensorRT's performance for YOLOv4 using the project 'Tianxiaomo/pytorch-YOLOv4'. It gets about a 2x speed-up compared to Darknet, and GPU memory usage with the TensorRT engine drops to about 1/4 of the PyTorch model's.

However, GPU memory usage increases with your project. From what I understand of TensorRT's optimizations, memory usage should decrease thanks to dynamic tensor memory.

Could you please look into this?

Environment:

  • OS: CentOS
  • python_version: 3.6
  • pytorch_version: 1.5.1
  • cuda_version: cuda-10.2
  • cudnn_version: 7.6.5
  • mmdetection_version: 2.5.0

Question about interpolate

Hey, when the tensor is interpolated in the FPN layer, the output values change and differ from the original mmdetection output, so the output of the entire network deviates. The problem may be in torch2trt_dynamic/torch2trt_dynamic/converters/interpolate_custom.py. Is there any solution for this?
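Whether the converter actually picks a different resize mode is only a guess, but as an illustration of how sensitive the feature maps are to this, here is a small self-contained sketch (shapes are arbitrary) comparing two upsampling modes in plain PyTorch:

```python
import torch
import torch.nn.functional as F

# Arbitrary FPN-like feature map (batch, channels, height, width).
x = torch.randn(1, 256, 25, 34)

# mmdetection's FPN upsamples with mode='nearest' by default; if the TensorRT
# resize layer ends up with a different mode (or align_corners setting), the
# values drift and the deviation propagates through the whole head.
up_nearest = F.interpolate(x, scale_factor=2, mode='nearest')
up_bilinear = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

print('max abs difference between modes:',
      (up_nearest - up_bilinear).abs().max().item())
```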

WARNING:root:can't find wrap module for type:<class 'mmdet.models.backbones.resnext.ResNeXt'>, use <class 'mmdet2trt.models.backbones.base_backbone.BaseBackboneWraper'> instead.

WARNING:root:can't find wrap module for type:<class 'mmdet.models.backbones.resnext.ResNeXt'>, use <class 'mmdet2trt.models.backbones.base_backbone.BaseBackboneWraper'> instead.
[TensorRT] ERROR: Parameter check failed at: ../builder/Layers.cpp::TopKLayer::3499, condition: k > 0 && k <= MAX_TOPK_K
[TensorRT] INTERNAL ERROR: Assertion failed: mParams.k > 0
../builder/Layers.cpp:3534
Aborting...
Traceback (most recent call last):
  File "to_trt_model.py", line 32, in <module>
    trt_model = mmdet2trt(cfg_path, weight_path, opt_shape_param=opt_shape_param, fp16_mode=True, max_workspace_size=max_workspace_size)
  File "/home/willer/tianchi/mmd_solution/mmdetection-to-tensorrt/mmdet2trt/mmdet2trt.py", line 147, in mmdet2trt
    int8_calib_algorithm=int8_calib_algorithm)
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 513, in torch2trt_dynamic
    outputs = module(*inputs)
  File "/home/willer/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/willer/tianchi/mmd_solution/mmdetection-to-tensorrt/mmdet2trt/models/detectors/single_stage.py", line 49, in forward
    result = bbox_head(feat, x)
  File "/home/willer/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/willer/tianchi/mmd_solution/mmdetection-to-tensorrt/mmdet2trt/models/dense_heads/gfl_head.py", line 79, in forward
    scores = mm2trt_util.gather_topk(scores, 1, topk_inds)
  File "/home/willer/tianchi/mmd_solution/mmdetection-to-tensorrt/mmdet2trt/ops/util_ops.py", line 50, in gather_topk
    num_index = len(index.shape)
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/shape_converter.py", line 11, in new_getattribute
    return get_tensor_shape(self)
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/shape_converter.py", line 5, in get_tensor_shape
    return self.size()
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 307, in wrapper
    converter['converter'](ctx)
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/converters/size.py", line 36, in convert_size
    input_trt = trt_(ctx.network, input)
  File "/home/willer/tianchi/mmd_solution/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 144, in trt_
    num_dim = len(t._trt.shape)
RuntimeError: std::exception

C++ inference error due to opt_shape_param

Hi @grimoire,

Thanks for your cool work! Could you please take a look at the problem I faced using converted model in C++?

To be able to run inference in C++, I must specify the same opt_shape_param for min, opt and max, like below (during model conversion):

opt_shape_param=[
        [
            [1,3,800,1344], 
            [1,3,800,1344],
            [1,3,800,1344]
        ]
      ]

Otherwise, engine->getBindingDimensions() in C++ returns a completely wrong set of input tensor dimensions when I try to use the serialized engine file.

But then I noticed this approach works incorrectly in C++: it never produces detections (num_detections is always zero and the other output tensors are all zeros). I checked how it works in Python and found it raises the following error:

[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]

I guessed that meant I must specify different min/opt/max sets in opt_shape_param. I tried the following, and the converted .pth model worked well in Python:

opt_shape_param=[
        [
            [1,3,320,320], 
            [1,3,800,1344],
            [1,3,1344,1344]
        ]
      ]

But the serialized .engine file in that case didn't work in C++. It looks like the .engine includes wrong input tensor dims (as I wrote at the beginning).

Is there any solution for dynamic dims in C++? Or is it possible to use non-dynamic dims?

I'm working within a docker container built using the Dockerfile provided in the project.

Thanks!
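Not an answer to the C++ question itself, but for reference, a hedged Python sketch of the dynamic-shape handshake TensorRT expects; the C++ calls (setBindingDimensions/getBindingDimensions) are analogous. The engine path and shapes are placeholders, and loading the plugin library first is assumed to be required, as elsewhere in this project:

```python
import ctypes

import tensorrt as trt

# The custom plugin library must be loaded before deserializing the engine
# (path is a placeholder).
ctypes.CDLL('/path/to/libamirstan_plugin.so')

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, '')

with open('model.engine', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# With a dynamic optimization profile, the engine reports -1 for dynamic dims ...
print('engine binding 0 dims:', engine.get_binding_shape(0))

# ... and a concrete input shape must be set on the execution context before
# enqueueing. It has to lie between the profile's min and max shapes, which is
# exactly what the "profileMinDims.d[i] <= dimensions.d[i]" error complains about.
context.set_binding_shape(0, (1, 3, 800, 1344))
print('context binding 0 dims:', context.get_binding_shape(0))
```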

Not quite sure how to use batch inference

Hello! Thank you for a nice repository!

I have managed to get the basic optimization to work, i.e. I can convert models from FP32 to FP16 when batch_size = 1, for example using:

opt_shape_param=[
    [
        [1,3,320,240],    # min shape
        [1,3,1333,800],   # optimize shape
        [1,3,1333,1333],  # max shape
    ]
]

This gives me varying performance increases depending on the model I optimize.

I am now trying to optimize a model for 8 images as input, i.e.

opt_shape_param=[
    [
        [8,3,320,240],    # min shape
        [8,3,640,480],    # optimize shape
        [8,3,1080,720],   # max shape
    ]
]

I have read that batch inference is supported, but when I look at the API for inference_detector, it can only accept one image or image path. So I tried using TRTModule() directly, but now I am getting only zeros as output even though nothing seems to be wrong with building the engine.

Would you perhaps have time to explain how to use batch inference with a simple example?
If you need any additional information on what I have done so far, I will post it here.

Thank you.
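In case it helps others, here is a rough hedged sketch of what batched inference could look like when driving the converted model directly. The use of mmdet2trt.apis.init_detector to restore the model, the normalization values, and the output layout (num_dets, bboxes, scores, class_ids) are all assumptions to verify against your own config and engine profile:

```python
import numpy as np
import torch

import mmdet2trt.apis

# Assumption: init_detector restores the converted detector (as used in the
# LVIS report above) and the wrapped module accepts a batched NCHW tensor.
trt_model = mmdet2trt.apis.init_detector('batched_model.pth')

# Per-channel normalization; placeholders that must match the model's own
# img_norm_cfg.
mean = torch.tensor([123.675, 116.28, 103.53]).view(1, 3, 1, 1)
std = torch.tensor([58.395, 57.12, 57.375]).view(1, 3, 1, 1)

# All images in one batch must share the same shape, and that shape (including
# the batch dimension) has to fall inside the engine's min/max profile.
imgs = torch.from_numpy(
    np.random.randint(0, 255, (8, 480, 640, 3), dtype=np.uint8).astype(np.float32))
batch = ((imgs.permute(0, 3, 1, 2) - mean) / std).cuda()

with torch.no_grad():
    # Assumed output layout: num_dets, bboxes, scores, class_ids.
    num_dets, bboxes, scores, class_ids = trt_model(batch)

for i in range(batch.shape[0]):
    n = int(num_dets[i])
    print(f'image {i}: {n} detections')
```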
