
dgkanatsios / azuregameserversscalingkubernetes


Scaling Dedicated Game Servers on Azure Kubernetes Service

License: MIT License

Go 89.57% Shell 6.21% Dockerfile 0.49% HTML 1.84% Makefile 1.90%
kubernetes azure aks openarena golang scaling game game-servers

azuregameserversscalingkubernetes's People

Contributors: brianpeek, dependabot[bot], dgkanatsios


azuregameserversscalingkubernetes's Issues

Work with Scheduled Events for Azure Linux VMs

Azure provides an API to see if a VM will reboot for maintenance. Details for this API are listed here. You can try it this way:

kubectl run busybox --image=busybox --rm --restart=Never -it -- /bin/sh
wget -O- --header=Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2017-08-01" # first call takes some time, it will probably return 200 and empty body
# go to the portal, manually click “reboot” on a VM
# after a while …
wget -O- --header=Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2017-08-01"
Connecting to 169.254.169.254 (169.254.169.254:80)
{"DocumentIncarnation":1,"Events":[{"EventId":"8A506C35-679A-4F78-9D76-C16DBE65EE6F","EventStatus":"Scheduled","EventType":"Reboot","ResourceType":"VirtualMachine","Resources":["aks-nodepool1-34166363-1"],"NotBefore":"Mon, 26 Nov 2018 10:52:34 GMT"}]}

Proposed design:

  • Create a DaemonSet with an app that calls this API every X minutes
  • If a reboot event is scheduled:
      • Clean up the Pods on this Node (e.g. if it’s a game server, don’t schedule any more players/games here)
      • Cordon the Node so it’s unschedulable
  • When the reboot finishes (we should keep state in a ConfigMap?), uncordon the Node
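
As a rough illustration of the detection step in the design above, the sketch below parses a Scheduled Events response and checks whether a Reboot event targets a given Node. The struct fields follow the sample response shown earlier; the type names and `rebootScheduledFor` are hypothetical helpers, not part of the project, and the HTTP polling, cordon and uncordon steps are left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// scheduledEvent mirrors the fields visible in the sample response above.
type scheduledEvent struct {
	EventId     string
	EventStatus string
	EventType   string
	Resources   []string
	NotBefore   string
}

// scheduledEvents is the top-level document returned by the endpoint.
type scheduledEvents struct {
	DocumentIncarnation int
	Events              []scheduledEvent
}

// rebootScheduledFor (hypothetical helper) reports whether body contains
// a Reboot event whose Resources list includes nodeName.
func rebootScheduledFor(body []byte, nodeName string) (bool, error) {
	// The first call may return an empty body; treat that as "no events".
	if len(bytes.TrimSpace(body)) == 0 {
		return false, nil
	}
	var doc scheduledEvents
	if err := json.Unmarshal(body, &doc); err != nil {
		return false, err
	}
	for _, ev := range doc.Events {
		if ev.EventType != "Reboot" {
			continue
		}
		for _, res := range ev.Resources {
			if res == nodeName {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	sample := []byte(`{"DocumentIncarnation":1,"Events":[{"EventId":"8A506C35-679A-4F78-9D76-C16DBE65EE6F","EventStatus":"Scheduled","EventType":"Reboot","ResourceType":"VirtualMachine","Resources":["aks-nodepool1-34166363-1"],"NotBefore":"Mon, 26 Nov 2018 10:52:34 GMT"}]}`)
	scheduled, err := rebootScheduledFor(sample, "aks-nodepool1-34166363-1")
	fmt.Println(scheduled, err) // prints "true <nil>"
}
```

A DaemonSet Pod could call such a function on each poll and cordon its own Node when it returns true.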

Scaling issue

Sometimes a race condition occurs in the DGSCol controller.

Repro:

  • create a collection of 5 replicas
  • (optionally) set active players
  • scale it down to 3

You will see 3 Running DGS and 3 as MarkedForDeletion.

The problem lies in the fact that non-parented DGSs (the MarkedForDeletion ones) trigger DGSCol controller updates (because of #31). In one of these updates, there is a chance that another scale down operation will occur, while the original one is still in progress.

DGS and Pod have the same name

In the current implementation, the Pod created by a DedicatedGameServer instance has the same name as that instance. This might cause a lot of problems in the future, so we should change it.

Don't create hostPort if not needed

Right now the project creates a hostPort (and thus opens traffic to the Internet) for every port in every container. We should come up with a way for each DedicatedGameServer to select which containers' ports get a hostPort.
We probably need to declare this in the DedicatedGameServer YAML.

Introduce Admission Webhook

We should introduce Admission Webhooks to the project. At first, they should perform two tasks:

i) validate that incoming DedicatedGameServerCollection and DedicatedGameServer objects have resource requests/limits in their Pod template
ii) introduce pod affinity for the Pods so they are better grouped together

hostNetwork

Investigate whether creating Pods on hostNetwork is the best idea.

Testing

Provide some testing methods for the Controllers

PortRegistry issue

Noticed a panic when 5 DGSs were loaded with no exported ports. We should write a test for it.

Improve delete strategy

Currently, when we scale down a collection, we randomly choose the DGSs that will become MarkedForDeletion and be removed from the collection. We could improve this algorithm by prioritising DGSs in the following order:

  • DGSs that are in MarkedForDeletion
  • DGSs that are Idle
  • DGSs that have 0 players
  • all others
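
The ordering above could be sketched as a stable sort over a rank. This is only an illustration under assumptions: the `dgs` struct and its fields are a hypothetical simplification of a DedicatedGameServer, and `deletionRank` and `pickForDeletion` are names made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// dgs is a hypothetical, simplified view of a DedicatedGameServer.
type dgs struct {
	Name              string
	MarkedForDeletion bool
	Idle              bool
	ActivePlayers     int
}

// deletionRank returns a lower number for servers that should be removed first.
func deletionRank(d dgs) int {
	switch {
	case d.MarkedForDeletion:
		return 0
	case d.Idle:
		return 1
	case d.ActivePlayers == 0:
		return 2
	default:
		return 3
	}
}

// pickForDeletion returns the names of up to n servers, best candidates first.
func pickForDeletion(servers []dgs, n int) []string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]dgs(nil), servers...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return deletionRank(sorted[i]) < deletionRank(sorted[j])
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	if n < 0 {
		n = 0
	}
	names := make([]string, 0, n)
	for _, d := range sorted[:n] {
		names = append(names, d.Name)
	}
	return names
}

func main() {
	servers := []dgs{
		{Name: "busy", ActivePlayers: 8},
		{Name: "empty"},
		{Name: "idle", Idle: true},
		{Name: "marked", MarkedForDeletion: true},
	}
	fmt.Println(pickForDeletion(servers, 2)) // prints "[marked idle]"
}
```

A stable sort keeps the existing order among servers of equal rank, so any tie-breaking (random or otherwise) can be applied before the call.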

Customize image creation

Right now, the OpenArena image details are mostly hardcoded. Think about making it more flexible.

Custom scheduler

It would be interesting to investigate the creation of a custom Pod scheduler. This scheduler would distribute the Pods/DedicatedGameServers to each Node, by satisfying the rule ‘is Node A full of running Pods? If not, keep scheduling there. If yes, schedule on B’ and so on and so forth.
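
The rule quoted above amounts to a first-fit choice over an ordered list of Nodes. The sketch below shows only that selection logic, assuming a hypothetical `node` struct with a pod count and a capacity; it does not use the Kubernetes scheduler APIs, and a real scheduler would also have to consider resources, taints and so on.

```go
package main

import "fmt"

// node is a hypothetical summary of a Node's pod capacity.
type node struct {
	Name        string
	RunningPods int
	MaxPods     int
}

// pickNode returns the first node, in the given order, that still has room.
// Earlier nodes therefore fill up before later ones receive any Pods.
func pickNode(nodes []node) (string, bool) {
	for _, n := range nodes {
		if n.RunningPods < n.MaxPods {
			return n.Name, true
		}
	}
	return "", false // every node is full
}

func main() {
	nodes := []node{
		{Name: "A", RunningPods: 10, MaxPods: 10},
		{Name: "B", RunningPods: 3, MaxPods: 10},
		{Name: "C", RunningPods: 0, MaxPods: 10},
	}
	name, ok := pickNode(nodes)
	fmt.Println(name, ok) // prints "B true"
}
```

Packing Pods onto as few Nodes as possible leaves the remaining Nodes empty, which makes them easier to remove when scaling the cluster down.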

Pod Autoscaling

Investigate autoscaling. The user should specify the minimum and maximum number of replicas when creating a DGSCollection. Investigate adding autoscalers for both Pods and Nodes.

Better Port assigning

Right now a port is randomly generated and its existence is checked against Azure Storage. It would be faster to load the entire Port table into memory first.
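
A minimal sketch of the in-memory idea, under stated assumptions: `portRegistry`, its methods and the port range are hypothetical, and the ports already in use are passed in as a plain slice where the real project would load them from storage. Unlike the current random pick, this sketch scans the range sequentially for the first free port.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoFreePort = errors.New("no free port in range")

// portRegistry is a hypothetical in-memory table of ports in use.
type portRegistry struct {
	mu     sync.Mutex
	lo, hi int
	used   map[int]bool
}

// newPortRegistry builds the table once from the ports already taken.
func newPortRegistry(lo, hi int, taken []int) *portRegistry {
	r := &portRegistry{lo: lo, hi: hi, used: make(map[int]bool)}
	for _, p := range taken {
		r.used[p] = true
	}
	return r
}

// acquire reserves and returns the lowest free port in [lo, hi].
func (r *portRegistry) acquire() (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for p := r.lo; p <= r.hi; p++ {
		if !r.used[p] {
			r.used[p] = true
			return p, nil
		}
	}
	return 0, errNoFreePort
}

// release marks a port as free again.
func (r *portRegistry) release(p int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.used, p)
}

func main() {
	r := newPortRegistry(20000, 20002, []int{20000})
	p, err := r.acquire()
	fmt.Println(p, err) // prints "20001 <nil>"
}
```

The mutex matters because several controller workers may request ports at the same time; the table would still need to be written back to storage if it has to survive a restart.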

x509: certificate signed by unknown authority

I hit this error during the "Testing with NodeJs App" step. I used a new OpenSSL certificate, as per the instructions in the FAQ.

kubectl apply -f https://raw.githubusercontent.com/dgkanatsios/azuregameserversscalingkubernetes/master/artifacts/examples/simplenodejsudp/dedicatedgameservercollection.yaml

Error from server (InternalError): error when creating "https://raw.githubusercontent.com/dgkanatsios/azuregameserversscalingkubernetes/master/artifacts/examples/simplenodejsudp/dedicatedgameservercollection.yaml": Internal error occurred: failed calling webhook "aks-gaming-webhookserver-mutation.azuregaming.com": Post https://aks-gaming-webhookserver.dgs-system.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

Is this because I am using a self-signed OpenSSL certificate instead of one from a valid CA? Is there a way to get past this without actually procuring a cert from a CA?

DedicatedGameServer status - what to do if failed?

A DedicatedGameServer can signal to our API Server that it has Failed. We do nothing here other than marking the entire DedicatedGameServerCollection as Failed. We should investigate if we should do something else in this case:

  • restart the DedicatedGameServer (simply delete it and one will be recreated by the DedicatedGameServerCollection Controller)
  • have the DedicatedGameServerCollection Controller create an extra one, so the cluster administrator can investigate the faulty one's logs
  • anything else?

We could have the user select what to do via an extra flag on the DedicatedGameServerCollection.

Table Storage

Right now, we're using Table Storage as the backend storage for the project. Is it the best option? Can we provide alternatives, perhaps by creating a storage interface?
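
One possible shape for such a storage interface is sketched below. Everything here is an assumption for illustration: the `store` interface, the `serverRecord` fields and the in-memory implementation are hypothetical and do not reflect the project's actual Table Storage code.

```go
package main

import (
	"fmt"
	"sync"
)

// serverRecord is a hypothetical subset of what might be persisted per DGS.
type serverRecord struct {
	Name          string
	Status        string
	ActivePlayers int
}

// store is a hypothetical abstraction that Table Storage, or any other
// backend, could implement.
type store interface {
	Upsert(rec serverRecord) error
	Get(name string) (serverRecord, bool, error)
	Delete(name string) error
}

// memoryStore is an in-memory implementation, handy for tests.
type memoryStore struct {
	mu   sync.RWMutex
	data map[string]serverRecord
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: make(map[string]serverRecord)}
}

func (m *memoryStore) Upsert(rec serverRecord) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[rec.Name] = rec
	return nil
}

func (m *memoryStore) Get(name string) (serverRecord, bool, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	rec, ok := m.data[name]
	return rec, ok, nil
}

func (m *memoryStore) Delete(name string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, name)
	return nil
}

func main() {
	var s store = newMemoryStore()
	_ = s.Upsert(serverRecord{Name: "openarena-1", Status: "Running", ActivePlayers: 3})
	rec, found, _ := s.Get("openarena-1")
	fmt.Println(rec.Status, rec.ActivePlayers, found) // prints "Running 3 true"
}
```

With the controllers depending only on the interface, an in-memory backend like this one would also make the controllers easier to test.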

Namespaces

Right now the default namespace is used for both admin stuff (operator/apiserver) and game server pods. We should create and manage two namespaces: one for the game server pods and one for the admin stuff (operator/apiserver).

Error while running OpenArena Pods

I am unable to get the pods running after the last step:

kubectl get pods
NAME                    READY   STATUS      RESTARTS   AGE
openarena-adjjh-bsuvw   0/1     Completed   0          14m
openarena-kiaoy-iaijg   0/1     Completed   0          14m
openarena-lkkvy-lvqdz   0/1     Completed   0          14m
openarena-rcrzp-owwdf   0/1     Completed   0          14m
openarena-utwna-cnknn   0/1     Completed   0          14m

kubectl logs openarena-adjjh-bsuvw
Starting: /data/oa_ded.x86_64 +set dedicated 2 +set fs_homepath /data/openarena +set net_port 27960 +exec server_config_openarena-adjjh.cfg +map dm4ish
Start processing
stdbuf: failed to run command '/data/oa_ded.x86_64': No such file or directory
Finished processing

I checked the Azure File share and could not find the '/data/oa_ded.x86_64' file. However, I do see a bunch of .cfg files (see attached).

image

Errors during DGS delete

When you delete a DedicatedGameServerCollection, some errors are logged to the console.

Example:

INFO[0072] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"openarena-collection-example-ifzdl", UID:"0a1a0d34-91b9-11e8-81bc-aaf3f78eddce", APIVersion:"v1", ResourceVersion:"236963", FieldPath:""}): type: 'Warning' reason: 'Error in getting the DedicatedGameServer' dedicatedgameservers.azuregaming.com "openarena-collection-example-ifzdl" not found
INFO[0072] dedicatedgameservers.azuregaming.com "openarena-collection-example-ayqki" not found
E0727 19:21:53.103336    3217 PodController.go:160] error syncing 'default/openarena-collection-example-ayqki': dedicatedgameservers.azuregaming.com "openarena-collection-example-ayqki" not found

We should check if they can be removed.

Error while setting the MarkedForDeletion flag

During testing with the Node.js demo app (https://github.com/dgkanatsios/azuregameserversscalingkubernetes/blob/master/docs/installation.md#testing-with-nodejs-demo-app-an-echo-http-server), the pod goes into an error state every time the DedicatedGameServer MarkedForDeletion state is set to true (markedfordeletion|true).

Pod log:

$ kubectl logs simplenodejsudp-emijv-lxgdr
UDP Server listening on 0.0.0.0:22222
Set status Assigned OK
Set status Healthy OK
Message received from 104.172.182.40:53085 -

UDP message sent to 104.172.182.40:53085

Message received from 104.172.182.40:53085 - hello

UDP message sent to 104.172.182.40:53085

Message received from 104.172.182.40:53085 - players|8

Set Active Players to running OK
UDP message sent to 104.172.182.40:53085

Message received from 104.172.182.40:53085 - markedfordeletion|true

Set Server Status OK
/app/index.js:87
serverResponse = `${serverResponse}, set Server Status to ${status} OK\n`;
^
ReferenceError: status is not defined
    at Request.<anonymous> (/app/index.js:87:69)
    at Request._callback (/app/node_modules/lodash/lodash.js:10052:25)
    at Request.requestRetryReply [as reply] (/app/node_modules/requestretry/index.js:105:19)
    at Request.<anonymous> (/app/node_modules/requestretry/index.js:138:10)
    at Request.self.callback (/app/node_modules/request/request.js:185:22)
    at Request.emit (events.js:182:13)
    at Request.<anonymous> (/app/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:182:13)
    at IncomingMessage.<anonymous> (/app/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:273:13)

Provide a unified status for DedicatedGameServerCollection

We need to provide a unified status for the entire DedicatedGameServerCollection object. Probably we should handle this on the DGSCollectionController?

Two open questions remain: (i) is this status needed at all, and (ii) does it have to be reflected in Table Storage as well?

Documentation for #54

Add related documentation for:

  • applying default requests/limits to DGS Pods
  • building the project
  • simplenodejsudp project commands (active players, status)

Wrong DGSCol status when scaling down

  • Create a DGSCol of X replicas
  • Scale it down to Y
  • Notice that the AvailableReplicas shows X-1 instead of Y and that state (either DGSCol or Pod) does not show the correct value
