Comments (6)
Thanks for reporting this issue. I have filed an internal issue for further investigation.
from server.
@justanhduc 22.12 is a very old release. We have made some changes in our client library to use CUDA Python for the CUDA shared memory handle implementation.
Can you try upgrading to our latest client and let us know if you are still seeing this issue?
We are still using the cudaGetDevice call, so you would probably run into this issue again.
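For context, the client-side flow being discussed can be sketched as follows. This is a minimal sketch against the `tritonclient` CUDA shared memory utilities; the helper name, region name, and device id are illustrative and not taken from the issue, and an actual run requires a GPU and a running Triton server.

```python
def register_cuda_input(client, input_data, region_name="input_region", device_id=0):
    """Allocate a CUDA shared memory region, copy input_data into it,
    and register it with the Triton server via `client`.
    Returns the region handle so the caller can destroy it later."""
    # Imported lazily so the sketch stays importable without a GPU stack.
    import tritonclient.utils.cuda_shared_memory as cudashm

    byte_size = input_data.nbytes
    # create_shared_memory_region is where the CUDA Python based
    # handle implementation (and the cudaGetDevice call) comes into play.
    handle = cudashm.create_shared_memory_region(region_name, byte_size, device_id)
    cudashm.set_shared_memory_region(handle, [input_data])
    client.register_cuda_shared_memory(
        region_name, cudashm.get_raw_handle(handle), device_id, byte_size
    )
    return handle
```

The raw handle returned by `get_raw_handle` is what gets serialized and sent to the server during registration, which is why a device-query mismatch on the client side can surface as the error reported here.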
Hi @tanmayv25. Thanks for the pointer. I used the latest server Docker image (24.03) and the latest client, and I'm still facing this issue. Could you look into it further?
Hi @justanhduc
I tried to reproduce the issue with a repro script on Triton 24.03. With p = 1, I did not get any error and the script ran successfully.
Attaching the repro script for reference.
cuda_shm_repo.zip
@justanhduc Can you help us reproduce the issue?
Hi @lkomali @tanmayv25. Sorry, I didn't see the notification earlier. I will run your code and see if I can reproduce the error.