
Comments (7)

FeiYull avatar FeiYull commented on May 24, 2024

@JinRanYAO Is the data you're testing a picture or a video?

from tensorrt-alpha.

FeiYull avatar FeiYull commented on May 24, 2024

@JinRanYAO Try the following command to build an FP16-quantized engine; it improves inference performance by about 100%:

./trtexec --onnx=yolov8n-pose.onnx --saveEngine=yolov8n-pose-fp16.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:2x3x640x640 --maxShapes=images:4x3x640x640 --fp16

[FP32]:
[04/07/2024-09:15:16] [I] preprocess time = 0.841472; infer time = 5.80734; postprocess time = 0.186192
[04/07/2024-09:15:16] [I] preprocess time = 0.837504; infer time = 5.76032; postprocess time = 0.13976
[04/07/2024-09:15:16] [I] preprocess time = 0.845184; infer time = 5.75726; postprocess time = 0.209248
[04/07/2024-09:15:16] [I] preprocess time = 0.839952; infer time = 5.76222; postprocess time = 0.170016
[04/07/2024-09:15:16] [I] preprocess time = 0.844816; infer time = 5.76472; postprocess time = 0.146288
[04/07/2024-09:15:16] [I] preprocess time = 0.838784; infer time = 5.76434; postprocess time = 0.203216
[04/07/2024-09:15:16] [I] preprocess time = 0.808864; infer time = 5.5223; postprocess time = 0.150368
[04/07/2024-09:15:16] [I] preprocess time = 0.811856; infer time = 5.52139; postprocess time = 0.184
[04/07/2024-09:15:16] [I] preprocess time = 0.80856; infer time = 5.52371; postprocess time = 0.20792
[04/07/2024-09:15:16] [I] preprocess time = 0.809776; infer time = 5.51814; postprocess time = 0.168032
[04/07/2024-09:15:16] [I] preprocess time = 0.810064; infer time = 5.5215; postprocess time = 0.208496
[04/07/2024-09:15:16] [I] preprocess time = 0.811216; infer time = 5.51797; postprocess time = 0.201968
[04/07/2024-09:15:16] [I] preprocess time = 0.809136; infer time = 5.51658; postprocess time = 0.179296

[FP16]:
[04/07/2024-09:15:26] [I] preprocess time = 0.84056; infer time = 2.59362; postprocess time = 0.177744
[04/07/2024-09:15:26] [I] preprocess time = 0.84752; infer time = 2.43448; postprocess time = 0.132512
[04/07/2024-09:15:26] [I] preprocess time = 0.840256; infer time = 2.42754; postprocess time = 0.206288
[04/07/2024-09:15:26] [I] preprocess time = 0.841216; infer time = 2.43272; postprocess time = 0.160144
[04/07/2024-09:15:26] [I] preprocess time = 0.840736; infer time = 2.42774; postprocess time = 0.137648
[04/07/2024-09:15:26] [I] preprocess time = 0.841296; infer time = 2.4313; postprocess time = 0.194464
[04/07/2024-09:15:26] [I] preprocess time = 0.840992; infer time = 2.43011; postprocess time = 0.149072
[04/07/2024-09:15:26] [I] preprocess time = 0.83664; infer time = 2.43083; postprocess time = 0.184176
[04/07/2024-09:15:26] [I] preprocess time = 0.841136; infer time = 2.4283; postprocess time = 0.20736
[04/07/2024-09:15:26] [I] preprocess time = 0.844864; infer time = 2.4312; postprocess time = 0.165424
[04/07/2024-09:15:26] [I] preprocess time = 0.842; infer time = 2.42846; postprocess time = 0.207552
[04/07/2024-09:15:26] [I] preprocess time = 0.8444; infer time = 2.43054; postprocess time = 0.203488
[04/07/2024-09:15:26] [I] preprocess time = 0.84024; infer time = 2.43106; postprocess time = 0.179952


JinRanYAO avatar JinRanYAO commented on May 24, 2024

@FeiYull Thank you for your quick reply!

  1. My project is built on ROS, so I don't use utils::InputStream. I call yolov8.init() once at the beginning, and when I receive an image I run the following code for each frame. I think this is equivalent to using utils::InputStream::IMAGE? Is this code reasonable, or can it be improved anywhere?

         imgs_batch.emplace_back(frame.clone());
         yolov8.copy(imgs_batch);
         utils::DeviceTimer d_t1; yolov8.preprocess(imgs_batch);  float t1 = d_t1.getUsedTime();
         utils::DeviceTimer d_t2; yolov8.infer();                 float t2 = d_t2.getUsedTime();
         utils::DeviceTimer d_t3; yolov8.postprocess(imgs_batch); float t3 = d_t3.getUsedTime();
         float avg_times[3] = { t1, t2, t3 };
         sample::gLogInfo << "preprocess time = " << avg_times[0] << "; "
             "infer time = " << avg_times[1] << "; "
             "postprocess time = " << avg_times[2] << std::endl;
         yolov8.reset();
         imgs_batch.clear();

  2. Thanks. I tried FP16, and the inference time decreased from 40 ms to 30 ms, but preprocessing still takes about 20 ms. Could I use INT8 to go faster?
  3. Additionally, my raw image size is 1920x1080. Is too much time being spent on the resize?
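For reference, trtexec can also attempt an INT8 build with the same shape flags as the FP16 command earlier in this thread. This is a sketch, not a tested recipe: without a calibration cache (passed via --calib) or a quantized ONNX model, trtexec assigns dummy dynamic ranges, so accuracy must be re-validated before use.

```shell
# Hypothetical INT8 build; yolov8n-pose.onnx and the shape ranges
# are taken from the FP16 command above. --fp16 is kept as a
# fallback precision for layers that cannot run in INT8.
./trtexec --onnx=yolov8n-pose.onnx \
          --saveEngine=yolov8n-pose-int8.trt \
          --buildOnly \
          --minShapes=images:1x3x640x640 \
          --optShapes=images:2x3x640x640 \
          --maxShapes=images:4x3x640x640 \
          --int8 --fp16
```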


FeiYull avatar FeiYull commented on May 24, 2024

@JinRanYAO
It is recommended to step into the function YOLOv8Pose::preprocess and measure the time overhead of each stage inside it:

void YOLOv8Pose::preprocess(const std::vector<cv::Mat>& imgsBatch)


JinRanYAO avatar JinRanYAO commented on May 24, 2024

@FeiYull It seems that resize, bgr2rgb, norm, and hwc2chw each cost about the same time, roughly 5 ms per stage. Could I use the similar functions in OpenCV when I receive the image, instead of running these stages here?


FeiYull avatar FeiYull commented on May 24, 2024

@JinRanYAO You can merge the following operations into one:

  1. resizeDevice
  2. bgr2rgbDevice
  3. normDevice

Inside the CUDA kernel called by resizeDevice, modify the per-pixel write as follows:

[before]

pdst[0] = c0;
pdst[1] = c1;
pdst[2] = c2;

[after]

// bgr2rgb
pdst[0] = c2;
pdst[1] = c1;
pdst[2] = c0;

// normalization
// float scale = 255.f;
// float means[3] = { 0.f, 0.f, 0.f };
// float stds[3] = { 1.f, 1.f, 1.f };
pdst[0] = (pdst[0] / scale - means[0]) / stds[0];
pdst[1] = (pdst[1] / scale - means[1]) / stds[1];
pdst[2] = (pdst[2] / scale - means[2]) / stds[2];
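The fused per-pixel math can be sanity-checked on the host. A minimal sketch, assuming the scale/means/stds values from the comments; fuse_pixel is a hypothetical helper for illustration, not part of tensorrt-alpha:

```cpp
#include <array>

// Host-side reference for the fused bgr2rgb + normalization step.
// c0, c1, c2 are the B, G, R values produced by the resize.
// With scale = 255, means = {0,0,0}, stds = {1,1,1}, this reduces to
// swapping channels and dividing by 255.
std::array<float, 3> fuse_pixel(float c0, float c1, float c2,
                                float scale,
                                const float means[3],
                                const float stds[3])
{
    return {
        (c2 / scale - means[0]) / stds[0],  // R (from the B slot)
        (c1 / scale - means[1]) / stds[1],  // G
        (c0 / scale - means[2]) / stds[2],  // B (from the R slot)
    };
}
```

Fusing the three passes this way saves two full reads and writes of the tensor per frame, which is where the reported speedup comes from.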


JinRanYAO avatar JinRanYAO commented on May 24, 2024

@FeiYull Thanks for your advice; the preprocess time decreased to 8 ms after merging resize, bgr2rgb, and norm into one kernel. I then resize the image to the engine input size as soon as it is received, and use the same src_size and dst_size in yolov8-pose. Finally, I simplified the preprocess code by deleting the affine matrix and interpolation to save more time. Here is my code now.

__global__
void resize_rgb_padding_device_kernel(unsigned char* src, int src_width, int src_height, int src_area, int src_volume,
        float* dst, int dst_width, int dst_height, int dst_area, int dst_volume,
        int batch_size, float padding_value, float inv_scale)
{
    // One thread per destination pixel (dx) per batch image (dy).
    int dx = blockDim.x * blockIdx.x + threadIdx.x;
    int dy = blockDim.y * blockIdx.y + threadIdx.y;

    if (dx < dst_area && dy < batch_size)
    {
        int dst_y = dx / dst_width;
        int dst_x = dx % dst_width;

        // The source was already resized to the network input size, so no
        // affine matrix or interpolation is needed: read the same pixel.
        unsigned char* v = src + dy * src_volume + dst_y * src_width * 3 + dst_x * 3;

        // Fused bgr2rgb + scaling, writing HWC float output.
        float* pdst = dst + dy * dst_volume + dst_y * dst_width * 3 + dst_x * 3;
        pdst[0] = (v[2] + 0.5f) * inv_scale;
        pdst[1] = (v[1] + 0.5f) * inv_scale;
        pdst[2] = (v[0] + 0.5f) * inv_scale;
    }
}

After simplifying, the preprocess time decreases to about 6 ms, and the inference result is still correct. Is this code all right, or can anything be improved?
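The kernel's per-pixel math can be mirrored on the host to check the channel swap and scaling in isolation. A sketch with a hypothetical helper name, not part of tensorrt-alpha:

```cpp
// Host-side mirror of the kernel's per-pixel math: reads one BGR
// uchar pixel, writes one RGB float pixel scaled by inv_scale.
// The + 0.5f offset before scaling matches the kernel above.
void simplify_pixel(const unsigned char* v, float* pdst, float inv_scale)
{
    pdst[0] = (v[2] + 0.5f) * inv_scale;  // R
    pdst[1] = (v[1] + 0.5f) * inv_scale;  // G
    pdst[2] = (v[0] + 0.5f) * inv_scale;  // B
}
```

A unit check like this is a cheap way to confirm the simplified kernel and the original three-stage pipeline agree pixel-for-pixel before trusting end-to-end results.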

