Comments (13)

rg0now avatar rg0now commented on June 16, 2024

This turns out to be two related bugs, both caused by the fact that we remove the pod IP from the stunnerd config as soon as the pod enters the terminating state, instead of waiting until it has actually terminated. This causes the following problems:

  • Clients can no longer refresh existing permissions to terminating pods. This usually happens 1-5 mins after pod shutdown starts.
  • Sending packets on existing connections to terminating pods immediately fails with a "peer port administratively prohibited" error.

The takeaway is that fixing this on the stunnerd side would be more difficult than expected.

from stunner.

rg0now avatar rg0now commented on June 16, 2024

Further investigation: it turns out the problem is that we're still using the old Endpoints API for backend pod discovery, which does not report terminating pods, in contrast to the modern EndpointSlice API, which does.

By default, Kubernetes removes all pod IPs that belong to "Terminating" backend pods from the Endpoints object. For the above example, after graceful shutdown starts we get an empty Endpoints resource:

apiVersion: v1
kind: Endpoints
metadata:
  name: media-plane
  namespace: default
  labels:
    app: media-plane

Since we use the endpoint IPs in this object for permission request handlers and for port-range filtering, we immediately break all existing connections to "Terminating" pods.
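
To make the failure mode concrete, here is a minimal Python sketch of how a permission check driven by the Endpoints object behaves. The helper names (endpoints_ips, is_peer_permitted) are hypothetical and are not STUNner code; only the subsets[].addresses[].ip layout comes from the Kubernetes Endpoints schema, and 10.244.0.3 is the IP of the terminating pod in this example.

```python
from typing import Any, Dict, Set


def endpoints_ips(endpoints: Dict[str, Any]) -> Set[str]:
    """Collect the ready addresses listed in a v1 Endpoints object."""
    ips: Set[str] = set()
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.add(address["ip"])
    return ips


def is_peer_permitted(peer_ip: str, allowed: Set[str]) -> bool:
    """Hypothetical permission check: only listed endpoint IPs are allowed."""
    return peer_ip in allowed


# The Endpoints object from the example: it has no "subsets" at all
# once the only backend pod is terminating.
empty_endpoints = {
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {"name": "media-plane", "namespace": "default"},
}

allowed = endpoints_ips(empty_endpoints)
# The terminating pod is no longer in the allowed set, so the check fails.
print(is_peer_permitted("10.244.0.3", allowed))  # False
```

With an empty allowed set, every permission request and every packet towards the terminating pod would be rejected, which matches the two symptoms described earlier.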

The EndpointSlice API, however, returns all the pod IPs, including the Terminating ones:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: media-plane-qzbrt
  namespace: default
  labels:
    app: media-plane
    kubernetes.io/service-name: media-plane
addressType: IPv4
ports:
- name: ""
  port: 9001
  protocol: UDP
endpoints:
- addresses:
  - 10.244.0.3
  conditions:
    ready: false
    serving: true
    terminating: true
  nodeName: stunner

Observe the endpoint IP 10.244.0.3 with terminating: true: that's our terminating backend pod.

So the solution is to migrate the gateway operator from the Endpoints API to the EndpointSlice API and then add the pod IPs of terminating pods to the list of permitted endpoints.
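
A rough sketch of that selection logic, in Python for brevity (the operator itself is not written in Python, and this is not its actual code). The function name permitted_peer_ips and the include_terminating switch are hypothetical. Requiring serving: true for terminating endpoints is an assumption of this sketch, not something stated above. Missing conditions are defaulted the way the EndpointSlice API documents them: an unset ready counts as ready, an unset terminating as not terminating.

```python
from typing import Any, Dict, List


def permitted_peer_ips(endpoint_slice: Dict[str, Any],
                       include_terminating: bool = True) -> List[str]:
    """Collect the addresses that should stay on the permitted-peer list."""
    permitted: List[str] = []
    for endpoint in endpoint_slice.get("endpoints") or []:
        conditions = endpoint.get("conditions") or {}
        ready = conditions.get("ready")
        ready = True if ready is None else ready
        serving = conditions.get("serving")
        serving = ready if serving is None else serving
        terminating = bool(conditions.get("terminating"))
        # Assumption: keep a terminating endpoint only while it is still serving.
        if ready or (include_terminating and terminating and serving):
            permitted.extend(endpoint.get("addresses") or [])
    return permitted


# The EndpointSlice from the example, reduced to the fields used here.
media_plane_slice = {
    "addressType": "IPv4",
    "endpoints": [
        {
            "addresses": ["10.244.0.3"],
            "conditions": {"ready": False, "serving": True, "terminating": True},
        },
    ],
}

print(permitted_peer_ips(media_plane_slice))                             # ['10.244.0.3']
print(permitted_peer_ips(media_plane_slice, include_terminating=False))  # []
```

The second call mimics the old Endpoints-based behaviour, where the terminating pod disappears from the list as soon as shutdown starts.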

rg0now avatar rg0now commented on June 16, 2024

Addendum: with two "ready" pods and one "terminating" pod:

media-plane-55658cb4f5-hdw6c   1/1     Running       0          10.244.0.14   
media-plane-55658cb4f5-pjp9c   1/1     Terminating   0          10.244.0.12   
media-plane-55658cb4f5-vjvnz   1/1     Running       0          10.244.0.13   

We get this EndpointSlice:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: media-plane-qzbrt
  namespace: default
  labels:
    app: media-plane
    kubernetes.io/service-name: media-plane
ports:
- name: ""
  port: 9001
  protocol: UDP
addressType: IPv4
endpoints:
- addresses:
  - 10.244.0.12
  conditions:
    ready: false
    serving: true
    terminating: true
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-pjp9c
    namespace: default
    uid: 277d19b4-f5ab-4616-8024-4deebca8f7e9
- addresses:
  - 10.244.0.13
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-vjvnz
    namespace: default
    uid: 965ae9b0-13c8-4145-a850-533b857876af
- addresses:
  - 10.244.0.14
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: stunner
  targetRef:
    kind: Pod
    name: media-plane-55658cb4f5-hdw6c
    namespace: default
    uid: 02a3e3e2-b83c-442c-afe4-4a7ad647bf19
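
As a quick illustration, the endpoints of this slice can be split by state with a few lines of Python. The helper split_by_state is hypothetical and only reads the conditions fields shown above.

```python
from typing import Any, Dict, List, Tuple


def split_by_state(endpoints: List[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Return (ready_ips, terminating_ips) for a list of EndpointSlice endpoints."""
    ready_ips: List[str] = []
    terminating_ips: List[str] = []
    for endpoint in endpoints:
        conditions = endpoint.get("conditions") or {}
        if conditions.get("terminating"):
            terminating_ips.extend(endpoint["addresses"])
        elif conditions.get("ready") is not False:
            # An unset "ready" condition is treated as ready.
            ready_ips.extend(endpoint["addresses"])
    return ready_ips, terminating_ips


# The three endpoints of the slice above, reduced to addresses and conditions.
endpoints = [
    {"addresses": ["10.244.0.12"],
     "conditions": {"ready": False, "serving": True, "terminating": True}},
    {"addresses": ["10.244.0.13"],
     "conditions": {"ready": True, "serving": True, "terminating": False}},
    {"addresses": ["10.244.0.14"],
     "conditions": {"ready": True, "serving": True, "terminating": False}},
]

ready_ips, terminating_ips = split_by_state(endpoints)
print(ready_ips)        # ['10.244.0.13', '10.244.0.14']
print(terminating_ips)  # ['10.244.0.12']
```

New connections should only target the ready IPs, while the permitted list would be the union of both, so existing connections to 10.244.0.12 keep working during graceful shutdown.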

rg0now avatar rg0now commented on June 16, 2024

I've tested this and the issue can no longer be reproduced with the new STUNner dev version that uses the EndpointSlice controller.

  1. Fire up the UDP greeter example again but set terminationGracePeriodSeconds: 300 in the media-plane deployment.

  2. Create a turncat tunnel and create a TURN allocation:

    export IPERF_ADDR=$(kubectl get pod -l app=media-plane -o jsonpath="{.items[0].status.podIP}")
    turncat --log=all:TRACE - 'k8s://stunner/udp-gateway:udp-listener' udp://$IPERF_ADDR:9001   
    Hi
    Greetings from STUNner!
    ...
    
  3. Scale the media-plane deployment down to 0 pods:

    kubectl scale deployment media-plane --replicas=0
    

    This will cause the UDP greeter pod to enter the Terminating state:

    kubectl get pods
    NAME                           READY   STATUS        RESTARTS   AGE
    media-plane-55658cb4f5-d7h2l   1/1     Terminating   0          91s
    
  4. And the turncat tunnel stays open:

    ...
    Hi again after terminate
    Greetings from STUNner!
    And the connection remains open, isn't it?
    Greetings from STUNner!
    ...
    
