Comments (6)
I have observed that the issue is a pod being prematurely added to the service endpoints while it is still in the initialization phase, specifically once the init container starts. This premature addition leads to client errors, because the pod is not yet ready to handle requests.
Logs:
2024-05-29 11:29:00 vertex-triton-server-6d64b9586d-pjfl9 0/1 Pending 0 2m17s <none> gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:01 vertex-triton-server-6d64b9586d-pjfl9 0/1 Init:0/1 0 2m18s <none> gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:03 vertex-triton-server-6d64b9586d-pjfl9 0/1 Init:0/1 0 2m20s 10.4.4.4 gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:10 vertex-triton-server-6d64b9586d-pjfl9 0/1 PodInitializing 0 2m27s 10.4.4.4 gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
During scale-up, while the new pod is still in the Init:0/1 state, it is already assigned an IP (10.4.4.4). This coincides with the client errors mentioned above:
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed
However, the READY status reported by the pods is correct, so the issue is probably not related to the readiness probe.
@patriksabol very interesting problem. Pods should not be selected by a service until they're running and have passed their startup and readiness probes.
In your second post, it appears that none of the pods are past the init container stage, yet you're seeing their readiness probes succeed. Is that correct?
Given the specific error tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed, I'd like to see the definition of your service as well. There could be something in it that's leading to the problem.
In the meantime, I'll review the Triton code to see whether we've somehow introduced any timing issues w.r.t. readiness/liveness probes.
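For reference, a pod is only added to a Service's ready endpoints once its Ready condition is true, which in turn requires its readiness probe to pass. A minimal sketch of such a probe against Triton's standard HTTP health endpoint (the container name and timings here are illustrative, not from the original post):

containers:
  - name: triton                    # illustrative container name
    ports:
      - containerPort: 8000
        name: http
    readinessProbe:
      httpGet:
        path: /v2/health/ready      # Triton's HTTP readiness endpoint
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 5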
@whoisj In my second post, there is only one pod, shown at four different times. I wanted to show that an IP address was assigned while it was still in the init state.
Meanwhile, I have removed the initContainer, and now the IP address is assigned to a running pod:
2024-05-29 15:13:45 vertex-triton-server-74c9fcf77f-q9gmf 0/1 ContainerCreating 0 4m45s <none> gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:07 vertex-triton-server-74c9fcf77f-q9gmf 0/1 Running 0 6m7s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:46 vertex-triton-server-74c9fcf77f-q9gmf 0/1 Running 0 6m46s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:46 vertex-triton-server-74c9fcf77f-q9gmf 1/1 Running 0 6m46s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
I am still seeing the READY status reported correctly for the pods (i.e., READY only becomes 1/1 once the pod is actually ready), checked with this command:
kubectl get pods -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")]}'
However, the "Socket closed" error still persists during scale-up. Once all pods are in the READY state, there are no "Socket closed" errors.
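One way to confirm whether the not-yet-ready pod's IP is actually landing in the service's load-balancing set is to watch the Endpoints object during scale-up (service name taken from the definition below):

kubectl get endpoints vertex-triton-server-service -o yaml -w

A pod that has not yet passed its readiness probe should appear only under notReadyAddresses, not under addresses; if its IP shows up under addresses while still initializing, that would explain the errors.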
This is my client code:
# Assumed context: `image` is an HxWx3 uint8 numpy array and `model_name` a str,
# both defined earlier in the calling code.
import numpy as np
import tritonclient.grpc as grpcclient
from time import strftime

with grpcclient.InferenceServerClient("IP_ADDRESS:8001") as client:  # Note the change to grpcclient and typically a different port
    input0 = grpcclient.InferInput("tile", [1, 3, image.shape[0], image.shape[1]], "UINT8")
    input0.set_data_from_numpy(np.expand_dims(np.moveaxis(image, -1, 0), axis=0))
    # Prepare the model_name input as an array of bytes
    model_name_bytes = np.array([model_name.encode('utf-8')])
    model_name_bytes = np.expand_dims(model_name_bytes, axis=0)
    try:
        input1 = grpcclient.InferInput("model_name", [1, 1], "BYTES")
        input1.set_data_from_numpy(model_name_bytes)
        outputs = [
            grpcclient.InferRequestedOutput("geojson_output")
        ]
        response = client.infer('cartographer_model', [input0, input1], outputs=outputs, model_version="1")
        geojson_result = response.as_numpy("geojson_output").tobytes().decode('utf-8')
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [INFO] Received GeoJSON response')
    except Exception as e:
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [ERROR] {e}')
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [ERROR] Failed to receive GeoJSON response')
This is the service definition:
apiVersion: v1
kind: Service
metadata:
  name: vertex-triton-server-service
  labels:
    app: vertex-triton-server
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: 8000
      name: http
    - port: 8001
      targetPort: 8001
      name: grpc
    - port: 8002
      targetPort: 8002
      name: metrics
  selector:
    app: vertex-triton-server
In your service definition, I believe targetPort should be the name of the port in the target container.
apiVersion: v1
kind: Service
metadata:
  name: vertex-triton-server-service
  labels:
    app: vertex-triton-server
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: http-triton
      name: http
    - port: 8001
      targetPort: grpc-triton
      name: grpc
    - port: 8002
      targetPort: metrics-triton
      name: metrics
  selector:
    app: vertex-triton-server
By specifying the numeric port number, you could somehow be bypassing the service's selector. I am not 100% sure, but I think it's worth trying port names instead to see whether it resolves the issue. Let me know.
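Note that a named targetPort only resolves if the pod template declares containerPorts with matching names; a sketch of what the container spec would need (port names mirror the service above, the container name is illustrative):

containers:
  - name: triton
    ports:
      - containerPort: 8000
        name: http-triton
      - containerPort: 8001
        name: grpc-triton
      - containerPort: 8002
        name: metrics-triton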
Unfortunately, that does not work. It seems that passing values by name is just for clarity of configuration.
As I mentioned, given the error, it appears to be a problem with the service and not with Triton Server.
Perhaps you could check the Triton Server logs to see whether any inference requests are even reaching the pods in question.
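For example, something along these lines (assuming the pods carry the app=vertex-triton-server label used by the service selector):

kubectl logs -l app=vertex-triton-server --tail=100

If no inference requests appear in the logs around the time of the "Socket closed" errors, that would point at the service/endpoint routing rather than the server.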