Comments (6)
I have observed that the issue is a pod being prematurely added to the service endpoints while it is still in the initialization phase, specifically once the init container starts. This premature addition leads to client errors, because the pod is not yet ready to handle requests.
Logs:
2024-05-29 11:29:00 vertex-triton-server-6d64b9586d-pjfl9 0/1 Pending 0 2m17s <none> gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:01 vertex-triton-server-6d64b9586d-pjfl9 0/1 Init:0/1 0 2m18s <none> gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:03 vertex-triton-server-6d64b9586d-pjfl9 0/1 Init:0/1 0 2m20s 10.4.4.4 gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
2024-05-29 11:29:10 vertex-triton-server-6d64b9586d-pjfl9 0/1 PodInitializing 0 2m27s 10.4.4.4 gke-vertex-serving-cluster-gpupool-92732217-22df <none> <none>
During scale-up, while the new pod is still in the Init:0/1 state, it is already assigned an IP (10.4.4.4). This coincides with the client errors mentioned above:
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed
However, the READY status reported by the pods is correct, so the issue is probably not related to the readiness probe.
@patriksabol very interesting problem. Pods should not be selected by a service until they're running and have passed their startup and readiness probes.
In your second post, it appears that none of the pods are past the init container stage, yet you're seeing their readiness probes succeed. Is that correct?
Given the specific error tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed, I'd like to see the definition of your service as well. There could be something in it that's leading to the problem.
In the meantime, I'll review the Triton code to see whether we've somehow introduced any timing issues w.r.t. readiness/liveness probes.
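For reference, a pod is only added to a Service's ready endpoints once its Ready condition is true, which in turn requires its readiness probe to pass. A minimal sketch of such a probe against Triton's standard HTTP health endpoint (the container name and timings here are illustrative, not from the original post):

containers:
  - name: triton                    # illustrative container name
    ports:
      - containerPort: 8000
        name: http
    readinessProbe:
      httpGet:
        path: /v2/health/ready      # Triton's HTTP readiness endpoint
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 5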
@whoisj In my second post, there is only one pod, shown at four different times. I wanted to show that an IP address was assigned while it was still in the init state.
Meanwhile, I have removed the initContainer, and now the IP address is assigned to a running pod:
2024-05-29 15:13:45 vertex-triton-server-74c9fcf77f-q9gmf 0/1 ContainerCreating 0 4m45s <none> gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:07 vertex-triton-server-74c9fcf77f-q9gmf 0/1 Running 0 6m7s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:46 vertex-triton-server-74c9fcf77f-q9gmf 0/1 Running 0 6m46s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
2024-05-29 15:15:46 vertex-triton-server-74c9fcf77f-q9gmf 1/1 Running 0 6m46s 10.96.5.4 gke-vertex-serving-cluster-gpupool-a0253cf2-v5j9 <none> <none>
I am still seeing the READY status reported correctly for the pods (i.e., READY only becomes 1/1 once the pod is actually ready), checked with this command:
kubectl get pods -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")]}'
However, the "Socket closed" error still persists during scale-up. Once all pods are in the READY state, there are no "Socket closed" errors.
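One way to confirm whether the not-yet-ready pod's IP is actually landing in the service's load-balancing set is to watch the Endpoints object during scale-up (service name taken from the definition below):

kubectl get endpoints vertex-triton-server-service -o yaml -w

A pod that has not yet passed its readiness probe should appear only under notReadyAddresses, not under addresses; if its IP shows up under addresses while still initializing, that would explain the errors.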
This is my client code:
# Assumed context: `image` is an HxWx3 uint8 numpy array and `model_name` a str,
# both defined earlier in the calling code.
import numpy as np
import tritonclient.grpc as grpcclient
from time import strftime

with grpcclient.InferenceServerClient("IP_ADDRESS:8001") as client:  # Note the change to grpcclient and typically a different port
    input0 = grpcclient.InferInput("tile", [1, 3, image.shape[0], image.shape[1]], "UINT8")
    input0.set_data_from_numpy(np.expand_dims(np.moveaxis(image, -1, 0), axis=0))
    # Prepare the model_name input as an array of bytes
    model_name_bytes = np.array([model_name.encode('utf-8')])
    model_name_bytes = np.expand_dims(model_name_bytes, axis=0)
    try:
        input1 = grpcclient.InferInput("model_name", [1, 1], "BYTES")
        input1.set_data_from_numpy(model_name_bytes)
        outputs = [
            grpcclient.InferRequestedOutput("geojson_output")
        ]
        response = client.infer('cartographer_model', [input0, input1], outputs=outputs, model_version="1")
        geojson_result = response.as_numpy("geojson_output").tobytes().decode('utf-8')
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [INFO] Received GeoJSON response')
    except Exception as e:
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [ERROR] {e}')
        print(f'{strftime("%Y-%m-%d %H:%M:%S")} [ERROR] Failed to receive GeoJSON response')
This is the service definition:
apiVersion: v1
kind: Service
metadata:
  name: vertex-triton-server-service
  labels:
    app: vertex-triton-server
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: 8000
      name: http
    - port: 8001
      targetPort: 8001
      name: grpc
    - port: 8002
      targetPort: 8002
      name: metrics
  selector:
    app: vertex-triton-server
In your service definition, I believe targetPort should be the name of the port in the target container.
apiVersion: v1
kind: Service
metadata:
  name: vertex-triton-server-service
  labels:
    app: vertex-triton-server
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: http-triton
      name: http
    - port: 8001
      targetPort: grpc-triton
      name: grpc
    - port: 8002
      targetPort: metrics-triton
      name: metrics
  selector:
    app: vertex-triton-server
By specifying the numeric port number, you could somehow be bypassing the service's selector. I am not 100% sure, but I think it's worth trying port names instead to see whether it resolves the issue. Let me know.
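Note that a named targetPort only resolves if the pod template declares containerPorts with matching names; a sketch of what the container spec would need (port names mirror the service above, the container name is illustrative):

containers:
  - name: triton
    ports:
      - containerPort: 8000
        name: http-triton
      - containerPort: 8001
        name: grpc-triton
      - containerPort: 8002
        name: metrics-triton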
Unfortunately, that does not work. It seems that passing values by name is just for clarity of configuration.
As I mentioned, given the error, it appears to be a problem with the service and not with Triton Server.
Perhaps you could check the Triton Server logs to see whether any inference requests are even reaching the pods in question.
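For example, something along these lines (assuming the pods carry the app=vertex-triton-server label used by the service selector):

kubectl logs -l app=vertex-triton-server --tail=100

If no inference requests appear in the logs around the time of the "Socket closed" errors, that would point at the service/endpoint routing rather than the server.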