Comments (6)
TRTIS currently only supports a variable-sized dimension for batching, but this is a common request, so we are planning to fix it. Issue #8 is tracking this request; add upvotes there to indicate that you are interested in it.
Did you solve this issue? And if so, could you share your solution?
I am also interested in how to serve models with variably sized outputs.
I was wrong: the output shape is not variable, since there is an upper bound on the number of objects detected. So just set the dims to this upper bound. That should work fine.
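To illustrate the suggestion above, the output can be declared with a fixed upper bound in the model's `config.pbtxt`. A hypothetical sketch (tensor names and sizes are illustrative, not from the actual model):

```
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]   # upper bound on detections x box coordinates
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]      # one score per candidate box
  }
]
```

With this configuration every response carries 100 boxes, and the client discards entries below a score threshold; only the batch dimension (via `max_batch_size`) is allowed to vary.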
Thanks for getting back @srihari-humbarwadi. Defining an upper bound works because your model type returns a fixed-size tensor, but I'm still curious whether variable-sized outputs are supported in tensorrt-inference-server. Perhaps a dev can point me to the relevant docs?
For context:
I'm assuming your model (possibly from here?) outputs fixed-size tensors, with the intention that boxes be ignored based on their associated score.
However, returning a fixed-size output is not ideal for performance reasons. It doesn't matter much for simple result types, but consider the case where the served model is a MaskRCNN and the return type includes a pixel mask for each detected object. Without an output signature that allows variable-sized tensors, the payload size would be worst-case for every response. I'd like to support variable outputs to reduce the payload in the common case (where fewer than the maximum number of objects are detected). For tf-serving, this involved modifying the output before exporting a saved model, so that the return type only includes results for objects whose score exceeds some threshold.
Is this behavior supported in tensorrt-inference-server?
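The tf-serving workaround described above amounts to masking the fixed-size output by score before export, so the serialized response only contains real detections. A minimal NumPy sketch of the idea (threshold, shapes, and names are illustrative, not the actual export code):

```python
import numpy as np

def filter_detections(boxes, scores, threshold=0.5):
    """Keep only detections whose score exceeds the threshold, so the
    response payload scales with the number of real objects rather
    than the model's fixed upper bound."""
    keep = scores > threshold
    return boxes[keep], scores[keep]

# Fixed-size model output: 100 candidate boxes, most of them padding.
boxes = np.zeros((100, 4), dtype=np.float32)
scores = np.zeros(100, dtype=np.float32)
scores[:3] = [0.9, 0.8, 0.7]  # only 3 "real" detections

kept_boxes, kept_scores = filter_detections(boxes, scores)
print(kept_boxes.shape)  # (3, 4) instead of (100, 4)
```

For a MaskRCNN-style model the same mask would also be applied to the per-object pixel masks, which is where the payload savings become significant.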
Thanks @deadeyegoodwin !
Hello, have you solved it?
Related Issues (20)
- rapidjson.JSONDecodeError: Parse error at offset 116: Invalid value. HOT 2
- Triton's gpu memory footprint is twice that of Tensorrt-LLM
- Server stuck Inference phase from Client's request ! HOT 2
- why can not cancel the request in the first model of ensemble_model HOT 6
- How to make use of dynamic batching with Triton Python backend? HOT 1
- MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort HOT 1
- In trtton, the difference in inference speed is not large depending on the number of GPUs. HOT 4
- Tensorrt_backend does not close plugin handle. HOT 1
- [feature request] Static (or mostly static) build instructions (currently build with only `libboost_filesystem.a` fails) HOT 3
- The model got an issue with the OpenVINO Backend. HOT 3
- CPU Core Affinity Pinning for `KIND_CPU` models HOT 1
- When max_batch_size=0, how to set cuda graph shape HOT 2
- about BLOCK_SIZE and 65536 HOT 2
- Shared base model weights in memory HOT 1
- boost library cannot download HOT 2
- Fail to download boost_1_76_0.tar.gz while create stub HOT 3
- Add CMake FetchContent support to triton client libraries HOT 6
- [Question] Could we use dynamic rank input on triton? HOT 2
- `pb_utils.TritonError.NOT_HEALTHY` Error Code HOT 2
- The size of restfulapi response is too big, triton server is oom-killed by kernel.But why does triton continue to apply memory without considering memory limits? HOT 2