Comments (4)
It is OK to enable both of them. TensorRT will choose a higher-precision kernel if it results in overall lower runtime, or if no low-precision implementation exists.
Read this for more details.
from mmdeploy.
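The flag combination described above can be sketched with the TensorRT Python API. This is a minimal sketch, assuming the `tensorrt` package is installed; the network definition and the INT8 calibrator that a real INT8 build needs are omitted, and the import is guarded only so the snippet reads and runs without TensorRT present.

```python
# Minimal sketch: enabling both FP16 and INT8 builder flags.
# Assumes the `tensorrt` package is installed; network definition and the
# INT8 calibrator required by real INT8 builds are omitted.
try:
    import tensorrt as trt
    HAVE_TRT = True
except ImportError:  # lets the sketch run without TensorRT installed
    HAVE_TRT = False


def make_mixed_precision_config(builder):
    """Return a builder config with both low-precision flags set.

    The flags are independent; with both set, TensorRT chooses a kernel
    precision per layer, falling back to higher precision when that is
    faster overall or when no low-precision implementation exists.
    """
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    return config


if HAVE_TRT:
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    config = make_mixed_precision_config(builder)
```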
@grimoire That makes sense; however, Nvidia states "..There are three precision flags: FP16, INT8, and TF32, and they may be enabled independently..", so are you really supposed to enable more than one at the same time? Well, if it works, I guess it is fine. In any case, I will test this further and see what happens on my Jetson AGX Xavier.
Slightly off-topic: the link you sent states that "..TensorRT will still choose a higher-precision kernel if it results in overall lower runtime..."
However, if I enable FP16 on a GPU architecture without native FP16 support (e.g. my Quadro P2000), a warning is given:
[TRT] [W] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
Naturally, the default FP32 would have been the fastest here, but FP16 is still used instead. So does TensorRT actually pick the fastest kernel in this case?
According to my experiment, enabling both flags is slightly faster than INT8 or FP16 alone, so I guess TensorRT does some precision-related optimization.
And I guess TensorRT falls back to FP16 when a layer does not support INT8, regardless of the device. That is why performance is poor on devices without native FP16 support.
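One way to avoid the Half2 warning on hardware like the Quadro P2000 is to query the builder's platform capabilities before setting the flags. Again a sketch: `platform_has_fast_fp16` and `platform_has_fast_int8` are TensorRT builder attributes, and the guarded import is only so the snippet degrades gracefully without the library.

```python
# Sketch: only request the low-precision modes the platform natively
# supports, so the builder never emits the "Half2 support requested on
# hardware without native FP16 support" warning.
try:
    import tensorrt as trt
except ImportError:  # lets the sketch run without TensorRT installed
    trt = None


def choose_precision_flags(builder):
    """Return the low-precision builder flags this platform supports.

    On a GPU without fast FP16 (e.g. a Pascal-class Quadro P2000),
    skipping BuilderFlag.FP16 avoids the warning and the associated
    slowdown from emulated half precision.
    """
    flags = []
    if builder.platform_has_fast_fp16:
        flags.append(trt.BuilderFlag.FP16)
    if builder.platform_has_fast_int8:
        flags.append(trt.BuilderFlag.INT8)
    return flags


if trt is not None:
    builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
    config = builder.create_builder_config()
    for flag in choose_precision_flags(builder):
        config.set_flag(flag)
```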
@grimoire I see, that is good to know. In that case, I think we can close this issue.
Related Issues (20)
- How to remove the "mmdeploy" domain from the model's inputs while keeping ONNX inference working
- [Bug] How do I use multiprocess or multithreaded loading models
- [Bug] Dynamic Axes Not supported with this tensorRT version
- [Bug] [mmdeploy] [error] [module_adapter.h:31] unhandled exception: invalid argument (1)
- [Bug] Issue with tools/deploy.py on centerpoint model
- [Bug] Detection is not possible depending on the aspect ratio of the defect.
- [Docs] ONNX export Optimizer
- [Feature] DSVT in mmdeploy
- [Bug] Converting to ONNX: Dynamic batch size on input gives all dynamic axes on output.
- [Bug] The C++ environment has been configured. When I use the exported onnx file, it shows that the model cannot be loaded during C++ inference.
- Fatal error: mmdeploy:MMCVModulatedDeformConv2d(-1) is not a registered function/op
- RTMO model conversion to ONNX
- [Bug] RTMDet model to ONNXRuntime or Torchscript doesn't produce masks
- [Bug] Failed to convert fcos-resnet50 to onnx
- [Bug] Detection TensorRT has zero output binding shape
- [Feature] Torchscript Backend support for Pytorch 2.1.0+ versions
- [Bug] When I deploy segmentation, the SDK produces discontinuous results: sometimes the result is normal, other times it is not.
- [Bug] Error when attempting model conversion for Rockchip RK3588
- [Docs] RTMO inference with TensorRT
- [Bug] Wrong Onnx Execution Provider - Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]