TURN connection breaks when the backend pod enters graceful shutdown about stunner HOT 13 CLOSED

rg0now commented on June 16, 2024

TURN connection breaks when the backend pod enters graceful shutdown

from stunner.

Comments (13)

rg0now commented on June 16, 2024

This turns out to be two related bugs, both caused by the fact that we remove the pod IP from the stunnerd config as soon as the pod enters the Terminating state, instead of waiting until it has actually terminated. This causes the following problems:

Clients can no longer refresh existing permissions to terminating pods. This usually happens 1-5 minutes after pod shutdown starts.
Packets sent on existing connections to terminating pods immediately fail with a "peer port administratively prohibited" error.
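
Both symptoms follow from a single lookup against the set of permitted peer IPs. The snippet below is only a rough model of that idea, not stunnerd's actual code: the `PermissionTable` class and its method names are hypothetical, and it is written in Python for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionTable:
    """Hypothetical stand-in for the peer IPs a TURN server may relay to."""

    allowed_peers: set[str] = field(default_factory=set)

    def allow_permission(self, peer_ip: str) -> bool:
        # Consulted when a client creates or refreshes a permission.
        return peer_ip in self.allowed_peers

    def allow_send(self, peer_ip: str) -> bool:
        # Consulted for every packet relayed to the peer.
        return peer_ip in self.allowed_peers


table = PermissionTable({"10.244.0.3"})
before = (table.allow_permission("10.244.0.3"), table.allow_send("10.244.0.3"))

# The pod enters Terminating and its IP is dropped from the config right away.
table.allowed_peers.discard("10.244.0.3")
after = (table.allow_permission("10.244.0.3"), table.allow_send("10.244.0.3"))
```

In this model a single removal flips both checks at once, which matches the two failures listed above: permission refreshes are rejected and in-flight traffic is dropped.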

The takeaway is that fixing this on the stunnerd side would be more difficult than expected.

rg0now commented on June 16, 2024

Further investigation: it turns out the problem is that we still use the old Endpoints API for backend pod discovery, which does not report terminating pods, in contrast to the modern EndpointSlice API, which does.

By default, Kubernetes removes the IPs of all "Terminating" backend pods from the Endpoints object. For the above example, after graceful shutdown starts we get an empty Endpoints resource:

apiVersion: v1
kind: Endpoints
metadata:
  name: media-plane
  namespace: default
  labels:
    app: media-plane

Since we use the endpoint IPs in this object for permission request handlers and for port-range filtering, we immediately break all existing connections to "Terminating" pods.
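
To make the effect concrete, here is a minimal sketch of reading backend IPs from a parsed Endpoints object. The helper name `endpoints_ips` is hypothetical and the code is Python rather than the operator's own language; it only assumes the standard `subsets[].addresses[].ip` layout of the v1 Endpoints resource.

```python
def endpoints_ips(endpoints: dict) -> set[str]:
    """Collect the ready addresses listed in a parsed v1 Endpoints object."""
    ips: set[str] = set()
    # `subsets` is missing entirely when no backend pod is ready.
    for subset in endpoints.get("subsets") or []:
        for addr in subset.get("addresses") or []:
            ips.add(addr["ip"])
    return ips


# The Endpoints resource shown above, after graceful shutdown has started.
terminating_only = {
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {
        "name": "media-plane",
        "namespace": "default",
        "labels": {"app": "media-plane"},
    },
}

permitted = endpoints_ips(terminating_only)
```

With no `subsets` in the object, `permitted` is the empty set, so nothing is left for the permission handler to match against.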

The EndpointSlice API, however, returns all the pod IPs, including the Terminating ones:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: media-plane-qzbrt
  namespace: default
  labels:
    app: media-plane
    kubernetes.io/service-name: media-plane
addressType: IPv4
ports:
- name: ""
  port: 9001
  protocol: UDP
endpoints:
- addresses:
  - 10.244.0.3
  conditions:
    ready: false
    serving: true
    terminating: true
  nodeName: stunner

Observe the endpoint IP 10.244.0.3 with terminating: true: that's our terminating backend pod.

So the solution is to migrate the gateway operator from the Endpoints API to the EndpointSlice API and then add the IPs of terminating pods to the list of permitted endpoints.
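
The selection rule could look roughly like the sketch below. This is an illustration under assumptions, not the operator's implementation: `permitted_ips` is a hypothetical helper, the code is Python, and the choice to admit every endpoint that is either ready or terminating (without also requiring `serving`) is one possible policy.

```python
def permitted_ips(endpoint_slice: dict) -> set[str]:
    """Collect IPs of ready and terminating endpoints from a parsed EndpointSlice."""
    ips: set[str] = set()
    for endpoint in endpoint_slice.get("endpoints") or []:
        conditions = endpoint.get("conditions") or {}
        # An unset `ready` is treated as ready, an unset `terminating` as not terminating.
        ready = conditions.get("ready") is not False
        terminating = conditions.get("terminating") is True
        if ready or terminating:
            ips.update(endpoint.get("addresses") or [])
    return ips


# The EndpointSlice shown above: a single backend pod in graceful shutdown.
slice_obj = {
    "apiVersion": "discovery.k8s.io/v1",
    "kind": "EndpointSlice",
    "addressType": "IPv4",
    "endpoints": [
        {
            "addresses": ["10.244.0.3"],
            "conditions": {"ready": False, "serving": True, "terminating": True},
            "nodeName": "stunner",
        }
    ],
}

permitted = permitted_ips(slice_obj)
```

Here the terminating pod's address stays in `permitted`, so existing connections to it would keep passing the permission check until the pod is actually gone and disappears from the EndpointSlice.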

rg0now commented on June 16, 2024

Addendum: with 2 "ready" and one "terminating" pods:

media-plane-55658cb4f5-hdw6c   1/1     Running       0          10.244.0.14
media-plane-55658cb4f5-pjp9c   1/1     Terminating   0          10.244.0.12
media-plane-55658cb4f5-vjvnz   1/1     Running       0          10.244.0.13

We get this EndpointSlice:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: media-plane-qzbrt
  namespace: default
  labels:
    app: media-plane
    kubernetes.io/service-name: media-plane
ports:
- name: ""
  port: 9001
  protocol: UDP
addressType: IPv4
endpoints:
- addresses:
  - 10.244.0.12
  conditions:
    ready: false
    serving: true
    terminating: true
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-pjp9c
    namespace: default
    uid: 277d19b4-f5ab-4616-8024-4deebca8f7e9
- addresses:
  - 10.244.0.13
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-vjvnz
    namespace: default
    uid: 965ae9b0-13c8-4145-a850-533b857876af
- addresses:
  - 10.244.0.14
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-hdw6c
    namespace: default
    uid: 02a3e3e2-b83c-442c-afe4-4a7ad647bf19

rg0now commented on June 16, 2024

I've tested this and the issue can no longer be reproduced with the new STUNner dev version that uses the EndpointSlice controller.

Fire up the UDP greeter example again but set terminationGracePeriodSeconds: 300 in the media-plane deployment.
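
One way to set the grace period without editing the manifest by hand is a `kubectl patch` call. The sketch below only builds the patch body; the helper name `grace_period_patch` is hypothetical, and it assumes the deployment lives in the current namespace.

```python
import json


def grace_period_patch(seconds: int) -> str:
    """Build a patch body that sets the pod template's termination grace period."""
    patch = {
        "spec": {
            "template": {
                "spec": {"terminationGracePeriodSeconds": seconds}
            }
        }
    }
    return json.dumps(patch)


# Print a command line that can be pasted into a shell.
print(f"kubectl patch deployment media-plane -p '{grace_period_patch(300)}'")
```

Changing the pod template triggers a rollout, so wait for the new pod to become ready before opening the tunnel.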

Create a turncat tunnel and create a TURN allocation:

export IPERF_ADDR=$(kubectl get pod -l app=media-plane -o jsonpath="{.items[0].status.podIP}")
turncat --log=all:TRACE - 'k8s://stunner/udp-gateway:udp-listener' udp://$IPERF_ADDR:9001
Hi
Greetings from STUNner!
...

Scale the media-plane deployment down to 0 pods:

kubectl scale deployment media-plane --replicas=0

This will cause the UDP greeter pod to enter the Terminating state:

kubectl get pods
NAME                           READY   STATUS        RESTARTS   AGE
media-plane-55658cb4f5-d7h2l   1/1     Terminating   0          91s

And the turncat tunnel stays open:

...
Hi again after terminate
Greetings from STUNner!
And the connection remains open, isn't it?
Greetings from STUNner!
...
