syencil / tensorrt

TensorRT-7 Network Lib: covers common object detection, keypoint detection, face detection, OCR, and more; models can be trained on your own data.

CMake 0.51% C++ 97.74% C 0.49% Cuda 1.01% Objective-C 0.26%

fcos hourglass psenet retinaface retinanet tensorrt yolov3 yolov5

tensorrt's Introduction

TensorRT-7 Network Lib

Introduction

Python ===> Onnx ===> tensorRT ===> .h/.so
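Supports FP32, FP16, and INT8 quantization, as well as engine serialization and deserialization.
Multi-threaded concurrency built on a thread pool speeds up pre-processing and post-processing.
Some OpenCV operators are rewritten or fused to improve cache utilization and avoid unnecessary passes over the data.
During inference the GPU and CPU sides can run asynchronously to hide latency.
Supports pruning, distillation, quantization, and swapping in a lightweight backbone.
Recommended for use together with https://github.com/Syencil/mobile-yolov5-pruning-distillation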

Model Zoo

Model	Training git	Infer Time	Total Time
Yolov5x	https://github.com/ultralytics/yolov5 https://github.com/Syencil/mobile-yolov5-pruning-distillation	32.5ms	58ms
PANNet(Pse++)	https://github.com/WenmuZhou/PAN.pytorch	18.5ms	45ms
PSENet	https://github.com/WenmuZhou/PSENet.pytorch	22ms	48ms
Yolov3	https://github.com/YunYang1994/tensorflow-yolov3	14.5ms	29.5ms
Retinaface	https://github.com/biubug6/Pytorch_Retinaface https://github.com/Syencil/Pytorch_Retinaface	2.3ms	12.3ms
Retinanet	mmdetection + configs/nas_fpn/retinanet_r50_fpn_crop640_50e_coco.py	22.9ms	333ms
Fcos	mmdetection + configs/fcos/fcos_r50_caffe_fpn_4x4_1x_coco.py	-	-
ResNet	-	-	-
Hourglass	https://github.com/Syencil/Keypoints	28ms	37ms
SimplePose	https://github.com/microsoft/human-pose-estimation.pytorch	3ms	7ms

The test environment is a Tesla P40 with 4 CPU threads.

Quick Start

Code -> Onnx

	git	Convert
tensorflow	https://github.com/onnx/tensorflow-onnx	`python -m tf2onnx.convert`
pytorch	-	`torch.onnx.export(model, img, weights, verbose=False, opset_version=11, input_names=['images'], output_names=['output'])`
Onnx	onnx-simplifier	`python3 -m onnxsim in.onnx out.onnx`

C++

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
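make project_lib     # replace project_lib with the library target of the model to build
make project_name    # replace project_name with the matching executable target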
./bin/project_name

Tips

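The ONNX model must be exported with a fully specified input size. In practice TensorRT does not offer ideal dynamic input either, so the input size has to be fixed at the freeze stage.
When building a new project, you usually only need to inherit from DetectionTRT, SegmentationTRT, or KeypointTRT (all derived from the TensorRT class) and implement postProcess. The public interface consists of two methods, initSession and predOneImage, which keeps calling code simple.
Because operator coverage in ONNX and TensorRT is now fairly complete, most of the time only the corresponding post-processing has to be implemented. A problematic operator can usually be replaced with a workaround in the Python code; if that fails, consider writing a custom plugin.
On the CHW and HWC data formats:
- CHW: better for the GPU. When inference or post-processing runs in CUDA, CHW lets threads read memory in a coalesced pattern, which matters because of how the hardware DRAM is accessed. For a detailed performance comparison see Programming Massively Parallel Processors.
- HWC: better for the CPU. With CPU processing, HWC guarantees that the data handled by a single thread sits at contiguous memory addresses. Since CPU caches exploit spatial locality, this improves efficiency considerably.
- In summary: if post-processing decodes on the CPU, export HWC output from ONNX; if it decodes on the GPU, export CHW output. The input side has not been specifically tested; the assumption is that although TensorFlow keeps the HWC format, it surely optimizes for it internally.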
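The layout argument above can be made concrete by computing flat memory offsets. The following is a minimal NumPy sketch, not code from this repository; the helper names `hwc_offset` and `chw_offset` and the tiny shape are assumptions chosen for illustration.

```python
import numpy as np

h, w, c = 2, 3, 4  # tiny feature map: height, width, channels (illustrative sizes)

# The same values stored in both layouts.
hwc = np.arange(h * w * c, dtype=np.float32).reshape(h, w, c)
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

def hwc_offset(y, x, k):
    """Flat offset of channel k at pixel (y, x) in an HWC buffer (hypothetical helper)."""
    return (y * w + x) * c + k

def chw_offset(y, x, k):
    """Flat offset of channel k at pixel (y, x) in a CHW buffer (hypothetical helper)."""
    return (k * h + y) * w + x

y, x = 1, 2
hwc_offsets = [hwc_offset(y, x, k) for k in range(c)]
chw_offsets = [chw_offset(y, x, k) for k in range(c)]

print(hwc_offsets)  # channels of one pixel are adjacent in HWC
print(chw_offsets)  # in CHW they are h * w elements apart
```

In HWC, a CPU thread that decodes one pixel touches consecutive addresses. In CHW, neighbouring pixels of the same channel are adjacent instead, which is the pattern that lets neighbouring GPU threads read neighbouring addresses.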

Darknet

Overview

Location: yolov3_darknet_main.cpp

Notes

Convert the darknet model to ONNX with pytorch-yolov4 before use.

SimplePose

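Overview

Location: simplePose_main.cpp
Python training code: https://github.com/microsoft/human-pose-estimation.pytorch

Notes

After exporting to ONNX, allocate more temporary CUDA space when parsing the ONNX file; otherwise parsing the deconv layers raises an error.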

StreamProcess

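Overview

Location: stream_main.cpp
This project is a simple yolov5-based demo of latency hiding, achieved by separating the GPU side from the CPU side.
The base example runs inference on a video and renders the result; preFunc and postFunc can be freely changed or rewritten to meet other needs.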
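The idea of separating the stages so that they overlap can be sketched as a three-stage pipeline connected by bounded queues. This is a minimal Python illustration under stated assumptions, not the C++ implementation in stream_main.cpp: `run_pipeline`, its parameters, and the sentinel mechanism are hypothetical, and the three callables stand in for preFunc, the GPU inference call, and postFunc.

```python
import queue
import threading

def run_pipeline(frames, pre_func, infer_func, post_func, capacity=4):
    """Run pre -> infer -> post as three overlapping stages (illustrative sketch)."""
    q_pre = queue.Queue(maxsize=capacity)   # pre-processed inputs waiting for inference
    q_out = queue.Queue(maxsize=capacity)   # raw outputs waiting for post-processing
    results = []
    stop = object()                         # sentinel that marks the end of the stream

    def pre_stage():
        for frame in frames:
            q_pre.put(pre_func(frame))
        q_pre.put(stop)

    def infer_stage():
        while True:
            item = q_pre.get()
            if item is stop:
                q_out.put(stop)
                return
            q_out.put(infer_func(item))

    def post_stage():
        while True:
            item = q_out.get()
            if item is stop:
                return
            results.append(post_func(item))

    threads = [threading.Thread(target=stage) for stage in (pre_stage, infer_stage, post_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Toy stand-ins for the real stages.
print(run_pipeline(range(5), lambda f: f + 1, lambda t: t * 2, lambda o: o - 1))
```

While one frame is in the inference stage, the next frame can already be pre-processed and the previous one post-processed. Because each stage is a single thread reading a FIFO queue, results keep the input order.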

PanNet (PseNet V2)

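Overview

Location: psenetv2_main.cpp
Original Python training code: https://github.com/WenmuZhou/PAN.pytorch
Code adapted for TensorRT: https://github.com/Syencil/PAN.pytorch

Notes

The PAN and PSE code bases are highly similar; for export, refer either to the PseNet section or to my modified fork.
The ONNX output of the PAN network has not passed through sigmoid (as an experiment, sigmoid is applied in post-processing instead).
Computing sigmoid on the CPU is fairly expensive; see fast-sigmoid-algorithm. CPU comparison for 100000 evaluations: sigmoid ==> 2.81878ms, fast sigmoid ==> 0.589737ms. On the GPU the difference is negligible.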

    fast_sigmoid(x) = (x / (1 + |x|)) * 0.5 + 0.5
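A short Python sketch of the formula above next to the exact sigmoid shows how the two relate. The function names are chosen here for illustration; the fast variant is an approximation, so its values differ from the exact sigmoid away from x = 0.

```python
import math

def sigmoid(x: float) -> float:
    """Exact logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fast_sigmoid(x: float) -> float:
    """Approximation from the formula above: no exp, only abs, add, divide."""
    return (x / (1.0 + abs(x))) * 0.5 + 0.5

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, sigmoid(x), fast_sigmoid(x))
```

Both functions are monotonic and cross 0.5 at x = 0, so thresholding at 0.5 selects the same pixels. Any other threshold tuned for the exact sigmoid does not carry over unchanged and would need to be re-tuned for the approximation.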

PseNet

Overview

Location: psenet_main.cpp
Original Python training code: https://github.com/WenmuZhou/PSENet.pytorch

Notes

The torch-to-ONNX code can be added to predict.py; it only requires adding one member function to the Pytorch_model class.

    def export(self, onnx_path, input_size):
        assert isinstance(input_size, list) or isinstance(input_size, tuple)
        self.net.export = True
        img = torch.zeros((1, 3, input_size[0], input_size[1])).to(self.device)
        with torch.no_grad():
            torch.onnx.export(self.net, img, onnx_path, verbose=True, opset_version=11, export_params=True, do_constant_folding=True)
        print("Onnx Simplify...")
        os.system("python3 -m onnxsim {} {}".format(onnx_path, onnx_path))
        print('Export complete. ONNX model saved to %s\nView with https://github.com/lutzroeder/netron' % onnx_path)

To make things easier for TensorRT, I moved sigmoid into the torch code. Modify the forward method of PSENet in models/model.py, and add a member variable export=False in __init__ to control it.

        if self.export:
            x = torch.sigmoid(x)
        return x

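When converting ONNX to TensorRT you may hit the error "This version of TensorRT only supports asymmetric"; the bilinear upsampling mode can be problematic. The fix is to set align_corners=True in every F.interpolate call, modify the corresponding cpp file in onnx-tensorrt, then rebuild it and replace the TensorRT lib.
To inspect the feature map of each kernel, just uncomment the relevant code in psenet.cpp.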

Yolov5

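Overview

Location: yolov5_main.cpp
Original Python training code: https://github.com/ultralytics/yolov5
Model compression and acceleration: https://github.com/Syencil/mobile-yolov5-pruning-distillation

Notes

The TensorRT decode targets the BxHxWxAC layout (convenient for parallelizing along the height dimension and for integration on embedded targets). The ONNX exported by the original yolov5 is BxAxHxWxC, so line 28 of models/yolo.py needs to be changed to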

            if self.export:
                x[i] = x[i].permute(0, 2, 3, 1).contiguous()
            else:
                x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
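To see how the two branches relate, the same permutations can be reproduced with NumPy. This is an illustrative sketch with made-up sizes, not part of models/yolo.py; it assumes the head output has shape (bs, na * no, ny, nx), as the view call in the original branch implies.

```python
import numpy as np

bs, na, no, ny, nx = 1, 3, 7, 4, 5  # batch, anchors, outputs per anchor, grid h, grid w
x = np.arange(bs * na * no * ny * nx, dtype=np.float32).reshape(bs, na * no, ny, nx)

# Export branch: B x H x W x (A*C)
export_out = np.ascontiguousarray(x.transpose(0, 2, 3, 1))

# Original branch: B x A x H x W x C
orig_out = np.ascontiguousarray(
    x.reshape(bs, na, no, ny, nx).transpose(0, 1, 3, 4, 2)
)

# Element (anchor a, output c) of a grid cell sits at index a * no + c in the export layout.
b, a, gy, gx, c = 0, 2, 1, 3, 4
print(export_out[b, gy, gx, a * no + c] == orig_out[b, a, gy, gx, c])
```

Both layouts hold the same numbers; in the export layout everything belonging to one grid row is contiguous, so a decoder can split the work by rows.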

RetinaFace

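Overview

Location: retinaface_main.cpp
Original Python training code: https://github.com/biubug6/Pytorch_Retinaface

Notes

When running convert_to_onnx.py, change the arguments to opset_version=11 and verbose=True.
Because this project does not need keypoints, the landmark decode part has been removed.
Candidates are filtered directly with a threshold of 0.6 (the original uses 0.02 + topK), followed by NMS.
Multi-threading is supported.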
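The "threshold, then NMS" step can be sketched in NumPy as follows. This is a generic greedy NMS written for illustration, not the repository's C++/CUDA code; the function name and the IoU threshold of 0.4 are assumptions, and only the score threshold of 0.6 comes from the notes above.

```python
import numpy as np

def filter_and_nms(boxes, scores, score_thres=0.6, iou_thres=0.4):
    """Drop low-score boxes, then run greedy NMS. Boxes are rows of [x1, y1, x2, y2]."""
    mask = scores >= score_thres
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)                 # highest score first
    boxes, scores = boxes[order], scores[order]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    selected = []
    idxs = np.arange(len(boxes))
    while idxs.size > 0:
        i, rest = idxs[0], idxs[1:]
        selected.append(i)
        # Intersection of the current best box with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        idxs = rest[iou <= iou_thres]           # keep only boxes that overlap little

    keep = np.array(selected, dtype=np.int64)
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 10, 10]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7, 0.3])
kept_boxes, kept_scores = filter_and_nms(boxes, scores)
print(kept_boxes, kept_scores)
```

A high score threshold removes most candidates before NMS, so the pairwise overlap work stays small; that is the trade-off against the original 0.02 + topK setting.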

Yolov3

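Overview

Location: yolov3_main.cpp
Original Python training code: https://github.com/YunYang1994/tensorflow-yolov3
Code adapted for TensorRT: https://github.com/Syencil/tensorflow-yolov3

Notes

Training is identical to the original repository. The main changes are that a fixed input size is used at freeze time and that the Python decode implementation is modified: a decode_full_shape method is added to core/yolov3.py.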

    def decode_full_shape(self, conv_output, anchors, stride):
        """
        return tensor of shape [batch_size, output_size, output_size, anchor_per_scale, 5 + num_classes]
               contains (x, y, w, h, score, probability)
        """
        conv_shape = conv_output.get_shape().as_list()
        batch_size = conv_shape[0]
        output_size = conv_shape[1]
        anchor_per_scale = len(anchors)

        conv_output = tf.reshape(conv_output, (batch_size, output_size, output_size, anchor_per_scale, 5 + self.num_class), name="reshape")

        conv_raw_dxdy = conv_output[:, :, :, :, 0:2]
        conv_raw_dwdh = conv_output[:, :, :, :, 2:4]
        conv_raw_conf = conv_output[:, :, :, :, 4:5]
        conv_raw_prob = conv_output[:, :, :, :, 5:]

        y_np = np.tile(np.arange(output_size, dtype=np.int32)[..., np.newaxis], [1, output_size])
        x_np = np.tile(np.arange(output_size, dtype=np.int32)[np.newaxis, ...], [output_size, 1])

        xy_grid_np = np.concatenate([np.reshape(x_np, [np.shape(x_np)[0], np.shape(x_np)[1], 1]), np.reshape(y_np, [np.shape(y_np)[0], np.shape(y_np)[1], 1])], axis=2)
        xy_grid_np = np.tile(np.reshape(xy_grid_np, [1, np.shape(xy_grid_np)[0], np.shape(xy_grid_np)[1], 1, np.shape(xy_grid_np)[2]]), [batch_size, 1, 1, anchor_per_scale, 1])

        anchor_np = np.tile(np.reshape(anchors, [1, 1, 1, -1]), [batch_size, output_size, output_size, 1])

        xy_grid = tf.constant(xy_grid_np, dtype=tf.float32)
        stride_tf = tf.constant(shape=[batch_size, output_size, output_size, anchor_per_scale * 2], value=stride, dtype=tf.float32)
        anchor_tf = tf.constant(anchor_np, dtype=tf.float32)

        pred_xy = tf.sigmoid(conv_raw_dxdy)
        pred_wh = tf.exp(conv_raw_dwdh)

        pred_xy = tf.reshape(pred_xy, [batch_size, output_size, output_size, anchor_per_scale * 2])
        pred_wh = tf.reshape(pred_wh, [batch_size, output_size, output_size, anchor_per_scale * 2])
        xy_grid = tf.reshape(xy_grid, [batch_size, output_size, output_size, anchor_per_scale * 2])

        pred_xy = tf.add(pred_xy, xy_grid)
        pred_xy = tf.multiply(pred_xy, stride_tf)
        pred_wh = tf.multiply(pred_wh, anchor_tf)
        pred_wh = tf.multiply(pred_wh, stride_tf)

        pred_xy = tf.reshape(pred_xy, [batch_size, output_size, output_size, anchor_per_scale, 2])
        pred_wh = tf.reshape(pred_wh, [batch_size, output_size, output_size, anchor_per_scale, 2])

        pred_xywh = tf.concat([pred_xy, pred_wh], axis=4)

        pred_conf = tf.sigmoid(conv_raw_conf)
        pred_prob = tf.sigmoid(conv_raw_prob)

        return tf.concat([pred_xywh, pred_conf, pred_prob], axis=4, name="decode")

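The NMS code is taken from faster-rcnn. The changes are support for multi-dimensional bbox input and replacing the shared memory with a dynamically sized array.
For INT8 calibration, about 50 images are enough.
Multi-threading is supported.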

RetinaNet

Overview

Location: retinanet_main.cpp
Python training code: mmdetection configs/nas_fpn/retinanet_r50_fpn_crop640_50e_coco.py

Notes

Set opset=11 when converting to ONNX.
If parsing the ONNX file fails with "Assertion failed: ctx->tensors().count(inputName)", download and build the latest onnx-tensorrt source and replace the corresponding TensorRT lib.

ResNet

Overview

Location: resnet_main.cpp
Applies to any classification model that can be converted directly.

FCOS

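Overview

Location: fcos_main.cpp
Python training code: mmdetection configs/fcos/fcos_r50_caffe_fpn_4x4_1x_coco.py

Notes

TensorRT does not currently support Group Normalization; the GN version would have to be implemented separately.
GN support will be added when time permits.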

Hourglass

Overview

Location: hourglass_main.cpp (Hourglass)
Python training code: https://github.com/Syencil/Keypoints

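Changelog

2021.01.26

Added a TensorRT version of darknet.

2020.11.05

Added a KeypointTRT abstract class. As of now, detection, segmentation, and keypoint models can all be converted and deployed in "one click".
Implemented Microsoft's SimplePose.

2020.09.10

Implemented StreamProcess, which separates the CPU and GPU sides to hide latency, with yolov5 and a video stream as the demo.
The thread-safe queue can now be given a bounded capacity (to avoid running out of memory). The push and emplace operations were changed to try_ variants, which may succeed or fail instead of blocking.
Added an nms_cpu option.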
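The bounded queue with non-blocking try_ operations described in this entry can be sketched as follows. This is a minimal Python illustration, not the repository's C++ class; `BoundedQueue`, `try_push`, and `try_pop` are hypothetical names modelled on the description.

```python
import threading
from collections import deque

class BoundedQueue:
    """Thread-safe FIFO with a fixed capacity and non-blocking operations (sketch)."""

    def __init__(self, capacity: int):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()

    def try_push(self, item) -> bool:
        """Append the item and return True, or return False at once if the queue is full."""
        with self._lock:
            if len(self._items) >= self._capacity:
                return False
            self._items.append(item)
            return True

    def try_pop(self):
        """Return (True, item) if an item was available, otherwise (False, None)."""
        with self._lock:
            if not self._items:
                return False, None
            return True, self._items.popleft()

q = BoundedQueue(capacity=2)
print(q.try_push("frame0"), q.try_push("frame1"), q.try_push("frame2"))
```

Because a full queue rejects the push instead of blocking, the producer decides what to do next (retry later or drop the frame), and memory use stays bounded even when the consumer is slower.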

2020.09.03

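Rewrote OpenCV's bilinear resize operator.
Fused cvtColor and HWC2CHW into a single operation.
Improved the cache hit rate.
Enabled multi-threaded image pre-processing. OpenCV's original resize + cvtColor takes about 15ms; after the rewrite, resize + cvtColor + HWC2CHW takes 4.8ms in total.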
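What fusing cvtColor with HWC2CHW means can be shown with a small Python sketch: each source pixel is read once and its three values are written straight to their final planes, instead of first producing an intermediate RGB image. The explicit loop only illustrates the access pattern and is not fast in Python; both function names are hypothetical, and a 3-channel BGR input is assumed.

```python
import numpy as np

def fused_bgr_hwc_to_rgb_chw(img):
    """Single pass over the pixels: swap B and R and write planar output directly."""
    h, w, _ = img.shape
    src = img.reshape(h * w, 3)
    out = np.empty((3, h * w), dtype=np.float32)
    for i in range(h * w):
        b, g, r = src[i]          # one contiguous read per pixel
        out[0, i] = r
        out[1, i] = g
        out[2, i] = b
    return out.reshape(3, h, w)

def two_step_reference(img):
    """Unfused version: channel swap first, then layout change."""
    rgb = img[:, :, ::-1]
    return np.ascontiguousarray(rgb.transpose(2, 0, 1)).astype(np.float32)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
print(np.array_equal(fused_bgr_hwc_to_rgb_chw(image), two_step_reference(image)))
```

The fused form avoids one full write and one full read of an intermediate image, which is where the saved memory traffic comes from.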

2020.08.31

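Implemented a thread-safe queue.
Implemented a thread pool built on the thread-safe queue.
Replaced std::thread with the thread pool in the model decode stage. Applied to the yolov3, yolov5, and retinaface models.
The pre-processing part will be reworked when time permits; the current traversal order is not very cache-friendly.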
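Decoding with a thread pool usually means splitting the feature map into row blocks, submitting each block to the pool, and letting the calling thread handle the leftover rows while it waits. The sketch below illustrates that pattern in Python with `concurrent.futures`; `decode_rows`, `parallel_decode`, and the way the block size is computed are assumptions for illustration, not the repository's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_rows(feature, row_start, row_count, thres):
    """Collect (y, x, score) for every cell above the threshold in a block of rows."""
    hits = []
    for y in range(row_start, row_start + row_count):
        for x, score in enumerate(feature[y]):
            if score > thres:
                hits.append((y, x, score))
    return hits

def parallel_decode(pool, feature, workers, thres):
    """Split the rows into blocks; the pool takes `workers` blocks, the caller the rest."""
    height = len(feature)
    block_size = height // (workers + 1)      # assumed split; the remainder goes to the caller
    futures = []
    block_start = 0
    for _ in range(workers):
        futures.append(pool.submit(decode_rows, feature, block_start, block_size, thres))
        block_start += block_size
    # The calling thread decodes the remaining rows instead of idling.
    tail = decode_rows(feature, block_start, height - block_start, thres)
    results = []
    for future in futures:
        results.extend(future.result())       # wait for each submitted block
    results.extend(tail)
    return results

feature = [[((y * 4 + x) % 5) / 4.0 for x in range(4)] for y in range(10)]
pool = ThreadPoolExecutor(max_workers=3)
print(len(parallel_decode(pool, feature, workers=3, thres=0.5)))
pool.shutdown()
```

Reusing one pool across frames avoids creating and destroying threads for every image, which is the cost the changelog entries set against plain std::thread.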

2020.08.26

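Abstracted the overall design: models are separated out strictly along object-oriented lines to increase code reuse.
Detection networks converted so far: yolov5, yolov3, retinaface, retinanet, fcos
Segmentation networks converted so far: psenet, psev2
Networks not yet converted: resnet (classification), hourglass (keypoints)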

2020.08.19

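Rewrote the CMake files and resolved some legacy issues. The scripts were reorganized to use CMake's built-in variables wherever possible, and some dependency paths were merged, improving the project's portability.
Benchmarked OpenMP: for running tasks across threads it is slower than std::thread, but when threads have to be created and destroyed frequently it is faster than std::thread. It later turned out that OpenMP is based on a thread pool, so the plan is to use a thread pool instead.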

2020.07.13

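Added PANNet (PSEv2) to the OCR series. The model is lightweight overall and does not need as many kernels as PSE. That said, both PSE and PAN have to traverse all text pixels during decode, so decode is not much faster. Tests show that the actual decode speed gap is already very small; the FPS gain seems to come mainly from the lighter backbone.
This time the post-processing sigmoid was not put into the ONNX model. I also found a fast sigmoid trick online. On the GPU the difference is small, but on the CPU the speed differs by a factor of 5 to 6.
Changed the OCR detection output so that it includes both a mask and an RBox.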

2020.07.04

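Added PSENet to the OCR series. Inference time is not too slow, but the progressive scale expansion algorithm in the decode stage takes too long; this part could be optimized further.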

2020.06.30

Minor fix: timing now uses cudaEvent rather than the CPU.

2020.06.17

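The ultralytics version of yolo really does perform well, and its Python code is nicely packaged. A quick test showed that yolov5s converted to TensorRT is much faster while accuracy remains good.
The next target is ctpn from the OCR family, but the reproduced results have been consistently poor, so it may take a while.
The ncnn repository will be updated when time permits.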

2020.05.28

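Starting today, each update will be recorded here.

Multi-threading support (deployed for yolo and retinaface)
BGR-to-RGB and HWC-to-CHW conversion are done in one step
With worker=4, total pre- and post-processing time drops from 30ms to about 15ms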

tensorrt's Issues

Serializing Engine Failed

Multi-threading question

for (auto &future : futures) {
            future = mThreadPool.submit(&Yolov5::postProcessParall, this, block_start, block_size, height, width, scale_idx, postThres, origin_output, &bboxes);
            block_start += block_size;
        }
        this->postProcessParall(block_start, height-block_start, height, width, scale_idx, postThres, origin_output, &bboxes);
        for (auto &future : futures){
            future.get();
        }

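Question:

this->postProcessParall(block_start, height-block_start, height, width, scale_idx, postThres, origin_output, &bboxes);

What is this step for?
Before this step, the multi-threaded tasks are submitted.
After this step, the multi-threaded tasks are run.
I don't understand what this step itself does.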

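yolov5s detects no boxes

Hello, and thanks for your work.

I came here from your lightweight yolov5s project and am also using your yolov5s_voc code. After putting it into this project everything runs successfully, but after prediction nothing is drawn on the image, and printing the size of bboxs also gives 0.

How can this be solved?

Thanks again!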

hi

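Could we exchange contact details? xiong97531 (WeChat ID)

Can the final TensorRT code be compiled into a .so?

As the title says. Thanks!

Which yolov5 version is used? Is it v4.0?

Can this project only use the x variant of YOLOv5?

As the title says.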

CalibrationTablePath and SerializedPath

hi, could you please tell me how can I get CalibrationTablePath file and SerializedPath file?
Thanks, look forward for your reply

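Questions about TRT plugins

Hello, how do you handle layers that TensorRT does not support, for example the Upsample layer? Looking forward to your reply.

Is there an open-source TensorRT acceleration of the siamrpn object tracking algorithm?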

yolov5 run error:Found unsupported datatype (11) when importing initializer: model.0.conv.total_ops

Hello, I followed the Quick Start and got yolov5s.onnx from https://github.com/Syencil/mobile-yolov5-pruning-distillation. Running ./bin/yolov5 raises the error "Found unsupported datatype (11) when importing initializer: model.0.conv.total_ops". I checked the ONNX export log and a "DOUBLE" datatype exists, which may be what leads to the error. How can I solve this? Any help will be appreciated.
I use the pytorch1.4 and TensorRT7.1.3.4.

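Was TensorRT installed from the deb package? Thanks.

YOLOv5s accuracy drop and model data format issues

There are two main problems; please help:
1. In issue #11, regarding the converted yolov5s model detecting fewer objects than the Python version, you pointed out that the pre-processing is the cause. How exactly should it be modified? Have you tried it?
2. When running the converted TensorRT model, the following messages appear: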
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/18/2020-14:24:17] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
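Is there a relatively simple way to fix this, or do the data types need to be converted in the relevant parts of the original model code?

Build problem

I followed the build steps below and got errors. Running make + project_lib or make + project_name fails with "make: *** No rule to make target '+'.  Stop."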

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make + project_lib
make + project_name
./bin/project_name

Running make directly gives the following errors. How can this be solved, or are there version requirements for the dependencies?

[ 28%] Linking CXX executable bin/stream
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/stream.dir/build.make:129: bin/stream] Error 1
make[1]: *** [CMakeFiles/Makefile2:184: CMakeFiles/stream.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

After configuring with cmake, make reports an error.

[ 20%] Linking CXX executable bin/yolo5
/usr/bin/ld: cannot find -lyolo5trt

Why can the yolo5trt library not be found?
What else needs to be changed in cmakelist.txt?

hand keypoints detection

@Syencil

Does this repo have an implementation for hand pose keypoint detection?

please advise

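"In practice TensorRT does not offer ideal dynamic input either"? Doesn't TensorRT support dynamic input? A model like psenet definitely needs dynamic input.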

are you available?

@Syencil Hi there,
i have a task dealing with text detection.
are you available for freelancing?
if so contact me by my email so i can share with you more information.
[email protected]

RetinaFace issue on Jetson Nano with TensorRT 7.1.3 & Cuda 10.2

Hey @Syencil .
First of all thank you for the awesome code samples.

RetinaFace sample works well on TensorRT 7.1.3 with Cuda 11 but fails with no detection on Jetson Nano (TRT 7.1.3 & Cuda 10.2).

Can you help me in this regard?

Problems with TensorRT 6.0

Tried TensorRT 6.0 and ran into problems: building and loading an ONNX model exported from PyTorch fails.

YOLOv5 parse problem

The problem is that I can not parse the yolov5 model

the error is

While parsing node number 176 [Resize]:
ERROR: ModelImporter.cpp:124 In function parseGraph:
[5] Assertion failed: ctx->tensors().count(inputName)
[07/28/2021-11:07:41] [E] Parsing File Failed
[07/28/2021-11:07:41] [E] Init Session Failed!
Segmentation fault

My environment

Tensorrt 7.0
cuda 10.2
opencv 3.4

I hope I can get some help, thanks !

how to build in win10?

Hi, thanks for sharing. I have some problems running cmake on win10.
Here is my cmakelist (building only retinaface):

cmake_minimum_required(VERSION 3.5)
project(tensorRT)
set_property(GLOBAL PROPERTY USE_FOLDERS on)

# output

set(EXECUTABLE_OUTPUT_PATH "${PROJECT_BINARY_DIR}/bin")
message(STATUS "Project_binary_dir : ${PROJECT_BINARY_DIR}")

# c++ 11

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

# tensorRT

set(tensorrt_dir D:/c/TensorRT-7.0.0.11.Windows10.x86_64.cuda-10.0.cudnn7.6/TensorRT-7.0.0.11)
set(project_dir my/path/to/tensorRT-7)
include_directories(${tensorrt_dir}/include)
include_directories(${project_dir}/include)
link_directories(${tensorrt_dir}/lib)
link_directories(${project_dir}/source)
link_directories(${project_dir}/lib)

# Loggers

aux_source_directory(${common_dir}/source common_src)
set(COMMON_SRC ${common_src} CACHE INTERNAL "common_source" )
set(LOGGER_SRC ${common_dir}/source/logger.cpp CACHE INTERNAL "logger" )

message(STATUS "TensorRT Header => ${tensorrt_dir}/include")
message(STATUS "TensorRT Lib => ${tensorrt_dir}/lib")

# find opencv

find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
if(NOT OpenCV_LIBRARY_DIRS)
set(OpenCV_LIBRARY_DIRS D:/software/opencv/build/x64/vc14/lib)
message(WARING " Can not find opencv lib. It will use the default path => ${OpenCV_LIBRARY_DIRS}")
endif()
link_directories(${OpenCV_LIBRARY_DIRS})
message(STATUS "OpenCV_INCLUDE_DIRS => ${OpenCV_INCLUDE_DIRS}")
message(STATUS "OpenCV_LIBRARY_DIRS => ${OpenCV_LIBRARY_DIRS}")

if(NOT OpenCV_FOUND)
message(ERROR "OpenCV not found!")
endif(NOT OpenCV_FOUND)

# find cuda

find_package(CUDA)
find_package(CUDA REQUIRED)

#include_directories(${CUDA_INCLUDE_DIRS})
include_directories(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/include)
if(NOT CUDA_LIBRARY_DIRS)
set(CUDA_LIBRARY_DIRS C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64)
message(WARING " Can not find CUDA lib. It will use the default path => ${CUDA_LIBRARY_DIRS}")
endif()
link_directories(${CUDA_LIBRARY_DIRS})
message(STATUS "CUDA_INCLUDE_DIRS : ${CUDA_INCLUDE_DIRS}")
message(STATUS "CUDA_LIBRARY_DIRS : ${CUDA_LIBRARY_DIRS}")

###############################################
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${project_dir}/lib)
set(TRT source/tensorrt.cpp source/logger.cpp source/utils.cpp source/utils.cu)
set(INT8 source/Int8Calibrator.cu source/Int8Calibrator.cpp)

set(RETINAFACE ${TRT} ${INT8} source/retinaface.cpp)

set(CV_LIB libopencv_core.so libopencv_imgproc.so libopencv_imgcodecs.so)
set(TRT_LIB libnvinfer.so libnvonnxparser.so cudart.so)

cuda_add_executable(retinaface retinaface_main.cpp)
target_link_libraries(retinaface retinafacetrt.so ${TRT_LIB} ${CV_LIB})

#########################3#####################
cuda_add_library(retinafacetrt SHARED ${RETINAFACE})

About the line "set(project_dir /work/tensorRT-7)": what is the tensorRT-7 directory? I cannot generate it or find it in the repository files.

I can run cmake and build successfully with compiler VNC14 and generate tensorRT.sln. But it cannot open retinaface.h when I debug the .sln: fatal error: no such file in retinaface.vcxproj.
It also does not generate retinafaceInt8.calibration and retinafaceInt8.calibration.
So what should I do to compile on win10?

Build error: /usr/bin/ld: cannot find -lfcostrt

jetson Xavier NX, jetpack 4.5, cuda10.2, tensor rt 7.1.3

Running make directly after cmake gives:

[ 31%] Linking CXX executable bin/fcos
/usr/bin/ld: cannot find -lfcostrt
/usr/bin/ld: cannot find -llibnvinfer.so.7.1.3
/usr/bin/ld: cannot find -llibnvonnxparser.so.7.1.3
collect2: error: ld returned 1 exit status
CMakeFiles/fcos.dir/build.make:113: recipe for target 'bin/fcos' failed
make[2]: *** [bin/fcos] Error 1
CMakeFiles/Makefile2:215: recipe for target 'CMakeFiles/fcos.dir/all' failed
make[1]: *** [CMakeFiles/fcos.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I changed line 56 of CMakeLists.txt from:
set(TRT_LIB libnvinfer.so.7.0.0 libnvonnxparser.so.7.0.0 cuda.so)
to:
set(TRT_LIB /usr/lib/aarch64-linux-gnu/libnvinfer.so.7.1.3 /usr/lib/aarch64-linux-gnu/libnvonnxparser.so.7.1.3 cuda.so)
Note: my libnvinfer.so is located in /usr/lib/aarch64-linux-gnu/ and its version is 7.1.3; 7.0.0 could not be found, so I changed the version and location.

Running make again then gives:

[ 31%] Linking CXX executable bin/fcos
/usr/bin/ld: cannot find -lfcostrt
collect2: error: ld returned 1 exit status
CMakeFiles/fcos.dir/build.make:117: recipe for target 'bin/fcos' failed
make[2]: *** [bin/fcos] Error 1
CMakeFiles/Makefile2:215: recipe for target 'CMakeFiles/fcos.dir/all' failed
make[1]: *** [CMakeFiles/fcos.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

That is two lines fewer than before, but libfcostrt.so is still not generated.

No SOURCES given to target: ctpn

hi there,

when I do the cmake . step, error pops up like:

CMake Error at /opt/cmake-3.12.2-Linux-x86_64/share/cmake-3.12/Modules/FindCUDA.cmake:1816 (add_library):
Cannot find source file:
source/ctpn.cpp

please help, thanks!

Question about converting Psenet ONNX to TensorRT

Could you share the code for converting Psenet to TensorRT?

About the infer time problem

Hi there! Thank you for your excellent codes and it helps me a lot. I trained a network with pytorch and deployed it with tensorRT successfully. But the infer time (do NOT include pre/post process) got longer compared to inferring in torch. While converted to INT8 the model is getting faster but not enough. Is that normal? Maybe there is something I missed while deploying the model. I have no idea about it and can you hint me with any ideas?
GPU: GTX1080Ti/CUDA10.0
Model: DeeplabV3Plus with backbone ResNet50
pytorch1.6 infer time 15ms
tensorrt infer time 22ms/FP32, 13ms/INT8

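Could you also upload the trained test models?

That way we could try running the code.

/usr/bin/ld: cannot find -lyolov5trt

in make:
not found yolov5trt

run yolov5 failed

Running the yolov5 code fails with:

double free or corruption (out)

By cutting down the code I found that the error appears after running pre-processing and inference; it looks like a double-free memory problem.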

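Data pre-processing question

A question about data pre-processing: cv2 reads images in BGR format, but the original project trains with RGB. Is it really fine not to convert to RGB?

Retinanet: PyTorch to ONNX

Hello, did you use the tool provided by mmdetection to convert the mmdetection Retinanet model to ONNX?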

Batch process

Do you plan to support batch processing??

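Are yolov5m.pt or yolov5l.pt supported?

Or can only yolov5x.pt be accelerated?

trtParams.CalibrationTablePath

Hello! In psenetv2, how are the two parameters trtParams.CalibrationTablePath and trtParams.SerializedPath determined?

PSEnet: converting ONNX to a pb model

Hello, I have a question about converting the ONNX model of the PSE algorithm to a pb model. I ran into this problem:
Traceback (most recent call last):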
File "/home/fffan/fffan_files/Experiment/Example/onnx2pb/onnx2pb.py", line 45, in
onnx2pb_2(onnx_input_path)
File "/home/fffan/fffan_files/Experiment/Example/onnx2pb/onnx2pb.py", line 14, in onnx2pb_2
tf_rep = prepare(model)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/backend.py", line 65, in prepare
return cls.onnx_model_to_tensorflow_rep(model, strict)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/backend.py", line 85, in onnx_model_to_tensorflow_rep
return cls._onnx_graph_to_tensorflow_rep(model.graph, opset_import, strict)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/backend.py", line 146, in _onnx_graph_to_tensorflow_rep
strict=strict)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/backend.py", line 241, in _onnx_node_to_tensorflow_op
return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/handlers/handler.py", line 60, in handle
cls.args_check(node, kwargs)
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/handlers/backend/resize.py", line 89, in args_check
"Tensorflow")
File "/home/fffan/下载/onnx-tensorflow-tf-1.x/onnx_tf/common/exception.py", line 50, in call
raise self._func(self.get_message(op, framework))
RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.
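I don't see pytorch_half_pixel being used anywhere in my PyTorch code, but the model conversion always fails with this error. Can you help resolve this?
Looking forward to your reply.

fcos bug

hi, at line 45 of fcos.cpp,
float tmp = sigmoid(cls_f[pos*length+c]) * cen_f[pos];

should this instead be
float tmp = sigmoid(cls_f[pos+length*c]) * cen_f[pos];

yolov5s produces no boxes after INT8 quantization?

Hello, FP32 testing produces boxes normally. Why are there no boxes after INT8 quantization?

Frozen graph

In tf2, how can yolov3 be converted to a frozen graph and the corresponding .pb file generated?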

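The PyTorch model takes input in B, C, H, W format, while an image read by OpenCV is H, W, C. How does the C++ code convert H, W, C to C, H, W, similar to numpy.transpose?

This is a very good project, but https://github.com/Syencil/tensorflow-yolov3 is gone. Could you upload it again? With the original code I cannot complete the .pb to .onnx conversion

As the title says: https://github.com/Syencil/tensorflow-yolov3 is gone. Could you upload your project again? I cannot complete the .pb to .onnx conversion with the original code. Could you share it?

Question about pytorch-yolov3-onnx-tensorRT acceleration

I tried converting the ultralytics yolov3 to ONNX and then to TensorRT, but the measured speed is about 20ms per image in the original PyTorch version and about the same in TensorRT. How can I speed it up? Convert to INT8?

The resnet infer build has problems; the class inheritance in the code may need to be changed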

retinaface : INVALID ARGUMENT : can not find binding of given name 588,587.

ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.5
Domain:
Model version: 0

While parsing node number 108 [Resize]:
ERROR: ModelImporter.cpp:124 In function parseGraph:
[5] Assertion failed: ctx->tensors().count(inputName)
[08/12/2020-10:45:42] [E] Parsing File Failed
[08/12/2020-10:45:42] [E] Init Session Failed!
Segmentation fault (core dumped)

I have checked the printed export_onnx_model output; there is no %108.

If I choose opset version 12, the printed export_onnx_model output contains %108:Long(),
but the same error occurs.
