In this workshop we'll set up a Raspberry Pi cluster and run through a series of configuration scenarios to familiarize you with Kubernetes cluster administration.
The raspberry pi cluster is connected to the internet and is accessible from the workstation.
Basic knowledge of Linux and SSH is expected.
Each team has been given its own Raspberry Pi cluster with unique IP addresses and node names. Each Raspberry Pi has been preconfigured to run Kubernetes.
Linux shell: with SSH support and the Kubernetes CLI (kubectl) installed.
Node.js: any version of Node.js, or your favourite tool to generate HTTP load.
Each cluster consists of 4 nodes. Organize yourselves so that each team member chooses one node to configure, then perform the following steps:
-
SSH into your node:
ssh pi@k8-t<team>-n<node>
NOTE On some machines, .local, .lan or .localdomain must be appended to the hostname, e.g. ssh pi@k8-t<team>-n<node>.local
-
Install Docker
Kubernetes is a container orchestrator that runs on Docker.
Run the following command to install Docker Community Edition on the node:
curl -sSL get.docker.com | sh
-
Install kubeadm
Kubernetes is managed through APIs on both master and worker nodes. kubeadm is used for cluster administration tasks, such as joining a cluster. Install the kubeadm tool on your node:
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - && \
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list && \
sudo apt-get update -q && \
sudo apt-get install -qy kubeadm
This step is based on the following source.
One or more nodes in the cluster are responsible for managing the worker nodes. A cluster is created by initializing one or more masters and then joining worker nodes to the cluster.
The kubernetes API resides on the master node(s). The kubernetes CLI, kubectl, sends API requests to the master, which in turn sends requests to the kubelet API on each worker node.
NOTE This should only be executed on the master node (node number 1). If your node has a different number, please continue to the Worker Nodes section and ask your master-node teammate for the join command.
-
Pull the images to speed up the init process and avoid potential timeout:
sudo kubeadm config images pull
-
Initialize the cluster. Your teammates will have to join within 10 minutes:
NOTE If you are assigned node number 4, do not join the cluster at this stage. If you accidentally do, execute sudo kubeadm reset on the node and kubectl delete node k8-t<team>-n4 to remove yourself from the cluster.
sudo kubeadm init --token-ttl=10m
The command finishes successfully with the following message (the token and hash are unique for each master):

... Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.10.140:6443 --token r3xyoq.t92yjpgsdrf0y7e4 \
    --discovery-token-ca-cert-hash sha256:0d919582cbce47e25a8ac22f7166c9633816475b8a87be749d62953c0ef492f0
This step is based on the "Master Node Setup" from this source.
Get the join command containing the token and certificate hash from the team member assigned to the master node.
NOTE If you have node number 4, please skip this step for now. Only nodes 1, 2 and 3 should be added to the cluster at this stage.
-
Execute the command and join the master to form a cluster:
ssh pi@k8-t<team>-n<node>.local
sudo kubeadm join k8-t1-n1:6443 --token ${TOKEN} --discovery-token-ca-cert-hash sha256:${CERT_HASH}
kubectl stores credentials in a config file. Multiple credentials, or contexts, can be stored in the same file. Commands are sent to the active context.
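For reference, a minimal sketch of what such a config file looks like (all names and data placeholders below are illustrative, not taken from the workshop cluster):

```yaml
apiVersion: v1
kind: Config
clusters:
- name: pi-cluster                     # illustrative name
  cluster:
    server: https://k8-t1-n1:6443
    certificate-authority-data: <base64-encoded CA certificate>
users:
- name: pi-admin                       # illustrative name
  user:
    client-certificate-data: <base64-encoded client certificate>
    client-key-data: <base64-encoded client key>
contexts:
- name: pi-admin@pi-cluster            # a context pairs a user with a cluster
  context:
    cluster: pi-cluster
    user: pi-admin
current-context: pi-admin@pi-cluster   # the active context
```

kubectl config get-contexts lists the stored contexts, and kubectl config use-context <name> switches between them.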
-
Log in to the master node and create a copy of the credentials that is readable by the pi user:
ssh k8-t<team>-n1
sudo cp /etc/kubernetes/admin.conf .
sudo chown $(id -u):$(id -g) admin.conf
exit
-
On your workstation copy the credentials file using scp:
Warning If you have existing kubernetes configuration, back it up before executing the command below as it will overwrite the configuration.
mkdir -p ~/.kube
scp pi@k8-t<team>-n1.local:~/admin.conf ~/.kube/config
-
Check the status of the nodes:
kubectl get nodes
The nodes are connected but not ready to receive workloads.
NAME       STATUS     ROLES    AGE     VERSION
k8-t1-n1   NotReady   master   6m39s   v1.16.3
k8-t1-n2   NotReady   <none>   5m45s   v1.16.3
k8-t1-n3   NotReady   <none>   5m41s   v1.16.3
k8-t1-n4   NotReady   <none>   5m37s   v1.16.3
-
Check the status of the pods in the kubernetes system namespace:
kubectl get pods -n kube-system
The coredns pods are stuck in the Pending state because no pod network driver has been installed yet.
NAME                               READY   STATUS    RESTARTS   AGE
coredns-5644d7b6d9-nsvl9           0/1     Pending   0          3m16s
coredns-5644d7b6d9-s62lp           0/1     Pending   0          3m16s
etcd-k8-t5-n1                      1/1     Running   0          2m24s
kube-apiserver-k8-t5-n1            1/1     Running   0          2m16s
kube-controller-manager-k8-t5-n1   1/1     Running   0          2m37s
kube-proxy-bb5p6                   1/1     Running   0          2m14s
kube-proxy-m48xp                   1/1     Running   0          3m16s
kube-proxy-wcdrr                   1/1     Running   0          2m2s
kube-scheduler-k8-t5-n1            1/1     Running   0          2m13s
The node network driver adds an overlay network on the cluster that allows cross-node communication.
-
Install the Weave Net network driver.
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
This adds a daemonset resource, which runs a pod on each node, providing the overlay network.
-
Check the status of the pods in the kube-system namespace:
kubectl get pods -n kube-system
Wait until all pods are in the Running state. Note that weave-net pods have been added to all nodes. To see which pod is running on which node, -o wide can be added to the above command.

NAME                               READY   STATUS    RESTARTS   AGE
coredns-5644d7b6d9-bnnbd           1/1     Running   0          159m
coredns-5644d7b6d9-srxpc           1/1     Running   0          159m
etcd-k8-t1-n1                      1/1     Running   3          158m
kube-apiserver-k8-t1-n1            1/1     Running   3          158m
kube-controller-manager-k8-t1-n1   1/1     Running   4          158m
kube-proxy-228w8                   1/1     Running   3          107m
kube-proxy-d5hs7                   1/1     Running   3          106m
kube-proxy-fd8lw                   1/1     Running   3          106m
kube-proxy-h4jnm                   1/1     Running   3          159m
kube-proxy-k6z7n                   1/1     Running   3          120m
kube-scheduler-k8-t1-n1            1/1     Running   4          158m
weave-net-dhfnz                    2/2     Running   11         27m
weave-net-f7c5c                    2/2     Running   11         27m
weave-net-kpk4j                    2/2     Running   10         27m
weave-net-m8fr4                    2/2     Running   12         27m
weave-net-rb8vf                    2/2     Running   11         27m
For educational purposes, we use LEDs to show the loads running on the cluster. Each LED signifies one pod, which runs one or more containers. READY 1/1 in the output above signifies that 1 out of 1 containers in the pod is responding to health checks and ready to serve traffic.
When pods are terminated, a grace period allows for graceful termination of services. For this reason, it takes a few seconds for an LED to turn off after its pod has been terminated.
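The grace period can be tuned per pod; a sketch of the relevant field (the field name is standard Kubernetes, the pod and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example          # illustrative
spec:
  # seconds between SIGTERM and SIGKILL when the pod is deleted (default 30)
  terminationGracePeriodSeconds: 10
  containers:
  - name: app
    image: example/app:latest     # illustrative
```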
Table of LED colors:
| Color | Pods |
|---|---|
| Red | whack a pod (initially) |
| Green | whack a pod (after upgrade) |
| Yellow | node presentation |
| Blue | nginx ingress |
| Purple | 404-service |
| Green flash | pod is starting |
| Red flash | pod is terminating |
-
Grant the blinkt pods access to read pod information from the kubernetes API:
kubectl create -f https://raw.githubusercontent.com/apprenda/blinkt-k8s-controller/master/kubernetes/blinkt-k8s-controller-rbac.yaml
Role Based Access Control (RBAC) policies specify which kubernetes API calls can be made, in which namespaces, and by which users or service accounts. This allows for fine-grained access control of kubernetes resources.
To show the status of the pods in the cluster, the blinkt pods must be allowed to read pod status from the API.
The linked policy creates a service account that is bound to the cluster-admin role, allowing the pods full access to all kubernetes APIs. This is not recommended in production.
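In essence, the linked manifest boils down to something like the following sketch (the exact names in the blinkt repository may differ):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blinkt-k8s-controller
  namespace: kube-system
---
# bind the service account to the built-in cluster-admin role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: blinkt-k8s-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin   # full access; fine for a workshop, not for production
subjects:
- kind: ServiceAccount
  name: blinkt-k8s-controller
  namespace: kube-system
```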
-
Add the blinkt controller:
kubectl apply -f blinkt-k8s-controller-ds.yaml
The blinkt controller is added as a daemonset. Daemonsets run a pod on each node in the cluster. This is typically used for services that need to run on all nodes, like logging and monitoring agents.
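A minimal daemonset with a node selector looks roughly like this (a sketch, not the actual blinkt-k8s-controller-ds.yaml; the image name is illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: blinkt-k8s-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: blinkt-k8s-controller
  template:
    metadata:
      labels:
        name: blinkt-k8s-controller
    spec:
      # only run on nodes carrying this label (see the labeling steps below)
      nodeSelector:
        deviceType: blinkt
      containers:
      - name: blinkt-k8s-controller
        image: example/blinkt-controller:latest   # illustrative
```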
-
Check that the daemonset has been added:
kubectl get daemonset -n kube-system
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
blinkt-k8s-controller   0         0         0       0            0           deviceType=blinkt             8s
kube-proxy              3         3         3       3            3           beta.kubernetes.io/os=linux   9m24s
weave-net               3         3         3       3            3           <none>                        2m1s
The daemonset reports that it wants 0 pods running in the cluster. Looking at the NODE SELECTOR column, we can see that it requires a label on the node before it will start a pod there. This way, node labels can be used to target services at nodes with special hardware or requirements.
-
Label the nodes to enable the blinkt controller:
kubectl label node k8-t<team>-n1 deviceType=blinkt
kubectl label node k8-t<team>-n2 deviceType=blinkt
kubectl label node k8-t<team>-n3 deviceType=blinkt
The LEDs should all flash green as the daemonset starts on the nodes.
-
Check the number of desired pods in the daemonset:
kubectl get daemonset -n kube-system
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
blinkt-k8s-controller   2         2         2       2            2           deviceType=blinkt             21h
kube-proxy              5         5         5       5            5           beta.kubernetes.io/os=linux   22h
weave-net               5         5         5       5            5           <none>

Three nodes were labeled, but only 2 pods were scheduled. This is because workloads are not allowed to run on the master node by default.
-
Remove the NoSchedule taint on the master node:
kubectl taint nodes k8-t<team>-n1 node-role.kubernetes.io/master:NoSchedule-
The LEDs will flash green on the master as the pod is scheduled and started.
If you want to taint it again, execute the following command; you'll see the LEDs flash red as the pod is evicted.
kubectl taint nodes k8-t<team>-n1 node-role.kubernetes.io/master=:NoSchedule
Untaint the master again so that all LED strips are active.
This section is from Source.
-
Deploy an application in the cluster
kubectl apply -f deployment.yaml
The application runs as a deployment, and will be scheduled on any node which has enough resources to run it.
The deployment runs with 5 replicas, so 5 red LEDs should light up on your nodes.
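The provided deployment.yaml looks roughly like this (a sketch; the image and label names are illustrative, though the cpu request matches the grep output shown later):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lmw-leaf
spec:
  replicas: 5
  selector:
    matchLabels:
      app: lmw-leaf
  template:
    metadata:
      labels:
        app: lmw-leaf
        blinkt: show              # illustrative: tells the blinkt controller to light an LED
        blinktColor: FF0000       # red
    spec:
      containers:
      - name: lmw-leaf
        image: example/lmw-leaf:latest   # illustrative
        resources:
          requests:
            cpu: 1000m
```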
-
Check which nodes the pods are running on:
kubectl get pods -o wide
The LEDs should match the NODE column:

NAME                       READY   STATUS    RESTARTS   AGE   IP          NODE
lmw-leaf-58cbf5674-7czpc   1/1     Running   0          14m   10.40.0.1   t<team>_n1
lmw-leaf-58cbf5674-kqb95   1/1     Running   0          14m   10.32.0.9   t<team>_n3
lmw-leaf-58cbf5674-wp9rd   1/1     Running   0          14m   10.38.0.1   t<team>_n2
lmw-leaf-58cbf5674-lp264   1/1     Running   0          14m   10.43.0.3   t<team>_n2
lmw-leaf-58cbf5674-vrttg   1/1     Running   0          14m   10.44.0.8   t<team>_n3
Look at the whack a pod board. Your team should now have 5 moles ready to be whacked.
-
Let's upgrade it:
Kubernetes uses rolling upgrades by default. That means that old pods are terminated (as many as the disruption budget allows) as new ones are started. When the new pods answer health checks, the remaining old pods are terminated. This means upgrades can be performed without downtime.
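The rollout behaviour is tunable on the deployment spec; a sketch of the relevant fields with their Kubernetes defaults:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # how many replicas may be down during the rollout
      maxSurge: 25%         # how many extra replicas may be created above the desired count
```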
Red is a negative color. Edit deployment.yaml, commenting out the blinktColor: FF0000 #red line with a hash (#) character and removing the hash from the #blinktColor: 00FF00 #green line. Then apply the change:
kubectl apply -f deployment.yaml
LEDs should show pods gradually being terminated, red being replaced with green. Looking at the whack a pod screen, you should see your moles being whacked and new ones appearing.
-
Let's scale it:
kubectl scale --replicas=16 deployment/lmw-leaf
Lots of moles and green LEDs! Check the status:
kubectl get pods
A few of the pods are in Pending state:
NAME                        READY   STATUS    RESTARTS   AGE
lmw-leaf-68f4cc7f5d-gm79b   0/1     Pending   0          103s
-
Figure out why the application isn't starting:
kubectl describe pod lmw-leaf-68f4cc7f5d-gm79b
... Warning FailedScheduling <unknown> default-scheduler 0/3 nodes are available: 3 Insufficient cpu.
The application declares a resource request for scheduling, and the cluster no longer has enough free resources to run all the pods.
grep resources: -A2 deployment.yaml
Resource requests signify the minimum amount of resources a container requires to run. If they can't be satisfied, we'll need to add more nodes to the cluster.

resources:
  requests:
    cpu: 1000m
Node 4 to the rescue!
-
Fail to join the master:
ssh k8-t<team>-n4
sudo kubeadm join 192.168.10.140:6443 --token r3xyoq.t92yjpgsdrf0y7e4 \
    --discovery-token-ca-cert-hash sha256:0d919582cbce47e25a8ac22f7166c9633816475b8a87be749d62953c0ef492f0 -v=1
That didn't work; the token has expired! Stop waiting by pressing ctrl+c.

Failed to connect to API Server "192.168.10.140:6443": token id "krl4bq" is invalid for this cluster or it has expired. Use "kubeadm token create" on the control-plane node to create a new valid token
-
Create a new token:
ssh k8-t<team>-n1
sudo kubeadm token create --print-join-command --ttl 10m
-
Use the join command printed to join the node:
ssh k8-t<team>-n4
sudo kubeadm join 192.168.40.107:6443 --token kh3pz3.s31hmgt9e31jdp0i --discovery-token-ca-cert-hash sha256:e48ae4a31280d4057caf6ef60d86055c51ea0bf1619a006038d4f34ce5a21a95 -v=1
-
Check that the node has joined, and wait for it to be in a Ready state:
kubectl get nodes
-
Label the new node to schedule the blinkt pod, enabling the LEDs:
kubectl label node k8-t<team>-n4 deviceType=blinkt
-
All of your pods should now be in the Running state:
kubectl get pods
-
Run a simple web application on the cluster:
kubectl apply -f kubernetes-rocks.yaml
The spec also exposes the pods through a NodePort service, which assigns a random unassigned port above 30000 on all nodes. In a production environment you would have a load balancer in front of the port.
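A NodePort service for the app might look like this sketch (the real kubernetes-rocks.yaml may differ; the selector label is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-rocks
spec:
  type: NodePort
  selector:
    app: kubernetes-rocks   # illustrative label on the app pods
  ports:
  - port: 8000              # port inside the cluster
    targetPort: 8000        # container port
    # nodePort: 30631       # picked at random from 30000-32767 when omitted
```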
-
Browse to the application
kubectl get svc
Check which port was assigned in the PORT(S) column. The application listens on port 8000 and the mapped node port is after the colon (:):

NAME               TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
kubernetes-rocks   NodePort   10.105.251.110   <none>        8000:30631/TCP   11m
In the above example, the service is assigned port 30631; browsing to http://k8-t<team>-n<node>:30631 should show the page.
-
Label the nodes to see where the application is running:
kubectl label node k8-t<team>-n1 failure-domain.beta.kubernetes.io/zone=k8-t<team>-n1
kubectl label node k8-t<team>-n2 failure-domain.beta.kubernetes.io/zone=k8-t<team>-n2
kubectl label node k8-t<team>-n3 failure-domain.beta.kubernetes.io/zone=k8-t<team>-n3
kubectl label node k8-t<team>-n4 failure-domain.beta.kubernetes.io/zone=k8-t<team>-n4
The application uses the kubernetes API to determine which failure zone it is running in. When we design kubernetes clusters, we generally want multiple nodes in each failure zone so that the application stays available if a zone goes down.
-
Reload the web page; the failure zone should be visible in the top right corner.
An ingress controller allows traffic into the cluster. Ingress resources configure the controller to route traffic to services based on HTTP path and/or host.
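An ingress resource routing all paths to the kubernetes-rocks service might look like this sketch (the real kubernetes-rocks-ingress.yaml may differ; the API version matches the Kubernetes 1.16 release used in this workshop):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: kubernetes-rocks
  annotations:
    kubernetes.io/ingress.class: nginx   # handled by the nginx ingress controller
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: kubernetes-rocks
          servicePort: 8000
```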
We'll use helm to deploy a well defined configuration for this application. Helm uses templating to generate and apply lots of yaml files in the cluster:
-
install helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
-
Add the stable chart repo:
This adds a repo uri and gives it the name stable.
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
-
Install an nginx based ingress controller:
helm upgrade --install --force ingress-controller stable/nginx-ingress \
  --set controller.replicaCount=2 \
  --set controller.service.type=NodePort \
  --set controller.service.nodePorts.http=30080 \
  --set controller.service.nodePorts.https=30443 \
  --set controller.service.omitClusterIP="true" \
  --set controller.image.repository=quay.io/kubernetes-ingress-controller/nginx-ingress-controller-arm \
  --set controller.podLabels.blinkt=show \
  --set controller.podLabels.blinktColor=0000FF \
  --set defaultBackend.replicaCount=2 \
  --set defaultBackend.image.repository=gcr.io/google_containers/defaultbackend-arm \
  --set defaultBackend.service.omitClusterIP="true" \
  --set defaultBackend.podLabels.blinkt=show \
  --set defaultBackend.podLabels.blinktColor=FF0080
Helm charts set values that are used in the templating of yaml files. In this example we set them directly on the command line, but normally they are provided in a values.yaml file and checked into version control.
The above configuration sets the LED color to blue for the ingress-controller and purple for the 404-service, also known as the default backend. Any requests that do not match a service in the ingress are sent to the default backend.
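The --set flags above translate mechanically into a values.yaml that could be passed with -f values.yaml instead:

```yaml
controller:
  replicaCount: 2
  service:
    type: NodePort
    nodePorts:
      http: 30080
      https: 30443
    omitClusterIP: "true"
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-arm
  podLabels:
    blinkt: show
    blinktColor: "0000FF"
defaultBackend:
  replicaCount: 2
  image:
    repository: gcr.io/google_containers/defaultbackend-arm
  service:
    omitClusterIP: "true"
  podLabels:
    blinkt: show
    blinktColor: "FF0080"
```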
-
Configure the service to use the ingress
kubectl apply -f kubernetes-rocks-ingress.yaml
-
Browse to the service through the ingress:
Browse to http://k8-t<team>-n<node>:30080 and https://k8-t<team>-n<node>:30443. The nginx ingress controller provides a default self-signed certificate for https.
Let's simulate some instability.
-
Install chaoskube, a chaos monkey that randomly kills pods:
helm upgrade --install --force chaos stable/chaoskube \
  --set imageTag=v0.16.0-arm32v6 \
  --set namespaces=default \
  --set labels='app!=nginx-ingress' \
  --set dryRun=false \
  --set rbac.create=true \
  --set interval=1s
-
Let's generate some load through the NodePort service:
npm install -g artillery
artillery quick --count 10 -n 20 http://k8-t<team>-n<node>:<port>
All traffic should generate an ok (200) response:
... Codes: 200: 200
Increase the load by increasing the number of virtual users (--count) or the number of requests per user (-n).
-
Let's generate some load through the ingress service:
artillery quick --count 10 -n 20 http://k8-t<team>-n<node>:30080
artillery quick --count 10 -n 20 https://k8-t<team>-n<node>:30443
Time to patch the servers without service downtime
-
Cordon the node to prevent new pods from being scheduled:
kubectl cordon k8-t<team>-n1
kubectl drain k8-t<team>-n1 --ignore-daemonsets --force
cordon prevents new workloads from being scheduled on the node. drain cordons the node and evicts all running workloads, moving them to other nodes.
-
Uncordon the node to allow workloads to be scheduled again.
kubectl uncordon k8-t<team>-n1
Affinity and anti-affinity can control which pods are scheduled together. Let's separate some app pods from loadbalancer pods:
-
Add an anti-affinity to the worker pods:
spec:
  affinity:
  # podAntiAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #   - labelSelector:
  #       matchExpressions:
  #       - key: app
  #         operator: In
  #         values:
  #         - nginx-ingress
  #     topologyKey: "kubernetes.io/hostname"
As the yellow worker pods are killed by chaoskube, they should be scheduled separately from the blue loadbalancer pods.
-
Restart all the pods
Pods are cattle and can be slaughtered indiscriminately, as they will be restarted as soon as possible. This will however result in downtime, and setting the grace period to 0 may lead to etcd corruption and is not recommended in production.
kubectl delete pods --all --force --grace-period=0
The kubernetes dashboard provides simple metrics and a graphical management tool for kubernetes.
-
Apply the kubernetes-dashboard manifest:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta6/aio/deploy/recommended.yaml
-
Make the service account cluster admin.
The kubernetes dashboard can only do what the logged-in user is allowed to do. Setting the service account as cluster admin and logging in as that user allows for cluster-wide access. This is not recommended for production systems:
kubectl create clusterrolebinding kubernetes-dashboard-admin-binding --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard
-
Get the auth token:
The kubernetes dashboard uses auth tokens to log in. The user or service account determines the level of access:
kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | awk '/^kubernetes-dashboard-token-/{print $1}') | awk '$1=="token:"{print $2}'
Copy the entire token from the terminal as we'll need it shortly:
eyJhbGciOiJSUzI1NiIsImtpZCI6InhJb3RJNWxodklSUlRxbHVPUllZczgzSURxeXhYelRHVFhvNFFIRmoza2cifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi05cDg0NSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjYzYjIyNTlkLTg3NDUtNDg1ZC1iMjA2LWIzYWNiNDJiOWFkZSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ivR84ABsG6ikYxAkfet9HeTHP0zApTtscxmKG_a5waJ3tVejF0pUQpbf1TBdKPFr1rPvvMwlADCSBBWJ2moz3svsqUHUt6KGX0zAD-BqpQN9JjQYIscZgOUvACH9Q2QbP5GwQxUI-DOcBEEb_WdAXSpRyp4G4h-Nv_4CoEexMfvmUzlJnnzDGvLBaSL7Fh597AogY84dft9QOrb8bw1nbHPmcAMwPSuuqNAPPbMtiyYyOq_JfU5-bDzR1znEKbzj05dP0jqYQ-FHncQcJ2uMfoow50x0f557V_qSxPU-C-eBPudQ-TVhw3fOxq-xUpgRm9WvcRTyxMYhllafLcgMcA
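The awk extraction above can be exercised offline against a mocked `kubectl describe secret` output (the secret name and token value below are made up for illustration):

```shell
# mocked output of `kubectl describe secret` for a dashboard token
mock_describe() {
  printf 'Name:         kubernetes-dashboard-token-9p845\n'
  printf 'Type:         kubernetes.io/service-account-token\n'
  printf '\nData\n====\n'
  printf 'ca.crt:     1025 bytes\n'
  printf 'token:      eyJhbGciOiJSUzI1NiJ9.fake.payload\n'
}

# same filter as the real command: print the value on the "token:" line
token=$(mock_describe | awk '$1=="token:"{print $2}')
echo "$token"
```

Because awk splits on whitespace, the filter is robust against the variable padding that `kubectl describe` uses between the key and the value.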
-
Start the proxy server
Since the service is not exposed through any NodePort or ingress, we can start a kubernetes API proxy to access it:
kubectl proxy
Browse to http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login, select Token login and paste the token from the above command.
-
Accessing any service
If you look closely at the above URI, you can replace the namespace, servicename and port to access any service:
Replace:

- the namespace with default
- the protocol from https to http
- the service name and port with lmw-leaf:3000

Result:

http://localhost:8001/api/v1/namespaces/default/services/http:lmw-leaf:3000/proxy/
You now see the output of the lmw-leaf service
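The proxy URL pattern can be captured in a small helper function (a sketch; the path pattern itself is the standard kubectl proxy service path):

```shell
# build_proxy_url <namespace> <protocol> <service> <port>
# builds the kubectl proxy path for an arbitrary service
build_proxy_url() {
  ns=$1; proto=$2; svc=$3; port=$4
  echo "http://localhost:8001/api/v1/namespaces/${ns}/services/${proto}:${svc}:${port}/proxy/"
}

# reproduces the URL from the example above
build_proxy_url default http lmw-leaf 3000
# → http://localhost:8001/api/v1/namespaces/default/services/http:lmw-leaf:3000/proxy/
```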
From source.
The moles will no longer be repressed
-
Uninstall chaoskube
Helm charts can easily be removed:
helm delete chaos
Your cluster should now be healthy
-
Reduce the CPU resource requests and increase the number of moles:
kubectl patch deployment lmw-leaf --patch "
spec:
replicas: 30
template:
spec:
containers:
- name: lmw-leaf
resources:
requests:
cpu: 100m
"