kubeadm-ansible's Introduction

Kubeadm Ansible Playbook

Build a Kubernetes cluster using Ansible with kubeadm. The goal is to easily install a Kubernetes cluster on machines running:

  • Ubuntu 16.04
  • CentOS 7
  • Debian 9

System requirements:

  • Deployment environment must have Ansible 2.4.0+
  • Master and nodes must have passwordless SSH access from the deployment machine (see the example below)
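
For example, you can push your public key from the deployment machine to every host with ssh-copy-id (a minimal sketch; replace the user and addresses with your own):

$ ssh-keygen -t rsa -b 4096        # only if you don't already have a key pair
$ for host in 192.16.35.12 192.16.35.10 192.16.35.11; do ssh-copy-id ubuntu@$host; done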

Usage

Add the system information for your hosts to a file called hosts.ini. For example:

[master]
192.16.35.12

[node]
192.16.35.[10:11]

[kube-cluster:children]
master
node

If you're working with Ubuntu, add the property ansible_python_interpreter='python3' to each host:

[master]
192.16.35.12 ansible_python_interpreter='python3'

[node]
192.16.35.[10:11] ansible_python_interpreter='python3'

[kube-cluster:children]
master
node

Before continuing, edit group_vars/all.yml to match your desired configuration.

For example, I chose to run flannel instead of calico, and thus:

# Network implementation('flannel', 'calico')
network: flannel

Note: Depending on your setup, you may need to modify cni_opts to an available network interface. By default, kubeadm-ansible uses eth1. Your default interface may be eth0.
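
For example, to make the CNI plugin bind to eth0 instead, you could override cni_opts in group_vars/all.yml roughly like this (a sketch; double-check the exact option format used by your checkout and network plugin):

# Option passed to the CNI plugin (flannel shown here)
cni_opts: "--iface=eth0"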

After going through the setup, run the site.yaml playbook:

$ ansible-playbook site.yaml
...
==> master1: TASK [addon : Create Kubernetes dashboard deployment] **************************
==> master1: changed: [192.16.35.12 -> 192.16.35.12]
==> master1:
==> master1: PLAY RECAP *********************************************************************
==> master1: 192.16.35.10               : ok=18   changed=14   unreachable=0    failed=0
==> master1: 192.16.35.11               : ok=18   changed=14   unreachable=0    failed=0
==> master1: 192.16.35.12               : ok=34   changed=29   unreachable=0    failed=0

The playbook will download the /etc/kubernetes/admin.conf file from the master to $HOME/admin.conf.

If that doesn't work, download admin.conf from the master node manually:

$ scp k8s@k8s-master:/etc/kubernetes/admin.conf .

Verify that the cluster is fully running using kubectl:

$ export KUBECONFIG=~/admin.conf
$ kubectl get node
NAME      STATUS    AGE       VERSION
master1   Ready     22m       v1.6.3
node1     Ready     20m       v1.6.3
node2     Ready     20m       v1.6.3

$ kubectl get po -n kube-system
NAME                                    READY     STATUS    RESTARTS   AGE
etcd-master1                            1/1       Running   0          23m
...

Resetting the environment

Finally, reset all kubeadm-installed state using the reset-site.yaml playbook:

$ ansible-playbook reset-site.yaml

Additional features

These are optional features you may want to install to make your life easier.

Enable/disable these features in group_vars/all.yml (all disabled by default):

# Additional feature to install
additional_features:
  helm: false
  metallb: false
  healthcheck: false

Helm

This will install helm in your cluster (https://helm.sh/) so you can deploy charts.

MetalLB

This will install MetalLB (https://metallb.universe.tf/), which is very useful if you deploy the cluster locally and need a load balancer to access the services.

Healthcheck

This will install k8s-healthcheck (https://github.com/emrekenci/k8s-healthcheck), a small application to report cluster status.

Utils

Collection of scripts/utilities

Vagrantfile

This Vagrantfile is taken from https://github.com/ecomm-integration-ballerina/kubernetes-cluster and slightly modified to copy ssh keys inside the cluster (installing https://github.com/dotless-de/vagrant-vbguest is highly recommended).

Tips & Tricks

Specify user for Ansible

If you use Vagrant or your remote user is root, add this to hosts.ini:

[master]
192.16.35.12 ansible_user='root'

[node]
192.16.35.[10:11] ansible_user='root'

Access Kubernetes Dashboard

As of release 1.7, the Dashboard no longer has full admin privileges granted by default, so you need to create a token to access the resources:

$ kubectl -n kube-system create sa dashboard
$ kubectl create clusterrolebinding dashboard --clusterrole cluster-admin --serviceaccount=kube-system:dashboard
$ kubectl -n kube-system get sa dashboard -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: 2017-11-27T17:06:41Z
  name: dashboard
  namespace: kube-system
  resourceVersion: "69076"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/dashboard
  uid: 56b880bf-d395-11e7-9528-448a5ba4bd34
secrets:
- name: dashboard-token-vg52j

$ kubectl -n kube-system describe secrets dashboard-token-vg52j
...
token:      eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtdG9rZW4tdmc1MmoiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTZiODgwYmYtZDM5NS0xMWU3LTk1MjgtNDQ4YTViYTRiZDM0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZCJ9.bVRECfNS4NDmWAFWxGbAi1n9SfQ-TMNafPtF70pbp9Kun9RbC3BNR5NjTEuKjwt8nqZ6k3r09UKJ4dpo2lHtr2RTNAfEsoEGtoMlW8X9lg70ccPB0M1KJiz3c7-gpDUaQRIMNwz42db7Q1dN7HLieD6I4lFsHgk9NPUIVKqJ0p6PNTp99pBwvpvnKX72NIiIvgRwC2cnFr3R6WdUEsuVfuWGdF-jXyc6lS7_kOiXp2yh6Ym_YYIr3SsjYK7XUIPHrBqWjF-KXO_AL3J8J_UebtWSGomYvuXXbbAUefbOK4qopqQ6FzRXQs00KrKa8sfqrKMm_x71Kyqq6RbFECsHPA

$ kubectl proxy

Copy and paste the token from above into the dashboard login screen to log in to the dashboard.
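
With kubectl proxy running, the dashboard is typically reachable at a URL like the following (the exact path depends on the dashboard version and the namespace it was deployed to):

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/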

kubeadm-ansible's People

Contributors

0x77dev, ailurusumbra, cdrage, deamwork, dvidr, jlafourc, karolkieglerski, khann-adill, n0madic, njordr, perriea, priitliivak, sc68cal, tuxpeople

kubeadm-ansible's Issues

Kubeadm init idempotency

I would expect that running the site playbook a second time would leave the Kubernetes cluster in its current state.
In its current version the site playbook actually removes the existing cluster with kubeadm reset.

Is there any way to run kubeadm init in an idempotent manner? In its current version it throws errors that some resources already exist.
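
One common workaround (a sketch, not what the playbook currently does) is to guard the init command with Ansible's creates argument so it is skipped once the cluster exists; the variable names below are illustrative:

- name: Init Kubernetes cluster
  shell: kubeadm init --pod-network-cidr {{ pod_network_cidr }} --token {{ token }}
  args:
    creates: /etc/kubernetes/admin.conf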

flannel doesn't start

get pods --all-namespaces

kube-system kube-flannel-ds-bdcxj 0/1 CrashLoopBackOff 6 10m
kube-system kube-flannel-ds-llstg 0/1 CrashLoopBackOff 6 10m

kubectl exec kube-apiserver-master.domain.local -c kube-apiserver -n kube-system ps
--authorization-mode=Node,RBAC

kubectl get sa flannel -n kube-system
NAME SECRETS AGE
flannel 1 19m

When I try: kubectl create -f https://github.com/coreos/flannel/blob/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
error: error converting YAML to JSON: yaml: line 376: mapping values are not allowed in this context

And my kubectl logs show:

Error adding network: open /run/flannel/subnet.env: no such file or directory
Error while adding to cni network: open /run/flannel/subnet.env: no such file or directory
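
Note that the URL above points at the GitHub HTML page rather than the raw manifest, which is what causes the "error converting YAML to JSON" message; the raw file (assuming it still exists at that path) would be fetched like this:

$ kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml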

Reset Kubernetes component failed

TASK [kubernetes/master : Reset Kubernetes component] ********************************************************************************************************************
Wednesday 04 July 2018 12:49:13 +0530 (0:00:00.486) 0:00:27.593 ********
fatal: [10.10.121.244]: FAILED! => {"changed": true, "cmd": "kubeadm reset", "delta": "0:00:00.043798", "end": "2018-07-04 12:51:06.493269", "msg": "non-zero return code", "rc": 1, "start": "2018-07-04 12:51:06.449471", "stderr": "Aborted reset operation", "stderr_lines": ["Aborted reset operation"], "stdout": "[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.\n[reset] are you sure you want to proceed? [y/N]: ", "stdout_lines": ["[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.", "[reset] are you sure you want to proceed? [y/N]: "]}
to retry, use: --limit @/root/kubeadm-ansible/site.retry
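
Recent kubeadm versions prompt for confirmation before resetting, which is what aborts the task here; a possible fix (a sketch, not necessarily how the playbook solves it) is to force the reset:

- name: Reset Kubernetes component
  shell: kubeadm reset --force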

Unable to join nodes

Hi,

I get the following error when trying to join nodes to the master:

TASK [kubernetes-node : Join to Kubernetes cluster] ****************************
fatal: [kubernetes-node1.k8s.anfieldroad.int]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_default_ipv4'\n\nThe error appears to have been in '/home/praetorian/andy/ansible/roles/kubernetes-node/tasks/join.yml': line 7, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join to Kubernetes cluster\n ^ here\n"}

cat inventories/infrastructure/group_vars/kubernetes-cluster.yml

file: kubernetes-cluster.yml

master_ip: "{{ hostvars[groups['kubernetes-master'][0]]['ansible_default_ipv4'].address | default(groups['kubernetes-master'][0]) }}"

Is something incorrect here? (My group name differs from the one in your code, but that is intentional.)
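
That error usually means Ansible never gathered facts for the master host, so hostvars has no ansible_default_ipv4 for it. A sketch of a workaround, assuming your master group is called kubernetes-master as in your inventory, is to force fact gathering on that group before the node play:

- hosts: kubernetes-master
  gather_facts: yes
  tasks:
    - name: Collect facts so master_ip can resolve ansible_default_ipv4
      setup: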

When kubeadm is upgraded to v1.10.1, init fails

export OS_IMAGE=centos7
[all:vars]

# Kubernetes
#kube_version=v1.9.3
kube_version=v1.10.1
token=b0f7b8.8d1767876297d85c
TASK [kubernetes/master : Init Kubernetes cluster] *******************************************

fatal

fatal: [192.16.35.12]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.10.1 --pod-network-cidr 10.244.0.0/16 --token b0f7b8.8d1767876297d85c ", "delta": "0:02:10.301000", "end": "2018-04-21 17:10:02.936757", "msg": "non-zero return code", "rc": 1, "start": "2018-04-21 17:07:52.635757", "stderr": "\t[WARNING FileExisting-crictl]: crictl not found in system path\nSuggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl\ncouldn't initialize a Kubernetes cluster", "stderr_lines": ["\t[WARNING FileExisting-crictl]: crictl not found in system path", "Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl", "couldn't initialize a ........1.10.1", "\t\t- k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)", "", "If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'"]}
to retry, use: --limit @/vagrant/site.retry

PLAY RECAP ****************************************************************************************************************************************************************************
192.16.35.12 : ok=11 changed=2 unreachable=0 failed=1

journalctl -xeu kubelet

Output:

Apr 21 17:24:50 master1 kubelet[27523]: W0421 17:24:50.988435 27523 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection clea
Apr 21 17:24:50 master1 kubelet[27523]: W0421 17:24:50.988492 27523 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 21 17:24:50 master1 kubelet[27523]: I0421 17:24:50.988537 27523 docker_service.go:244] Docker cri networking managed by cni
Apr 21 17:24:51 master1 kubelet[27523]: I0421 17:24:51.005521 27523 docker_service.go:249] Docker Info: &{ID:NC7E:CYY4:TOY6:QJPM:CP6S:KQT7:GKCY:NF63:MOT6:B2IW:5HQY:FHVS Containers:0
Apr 21 17:24:51 master1 kubelet[27523]: F0421 17:24:51.005645 27523 server.go:233] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd
Apr 21 17:24:51 master1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Apr 21 17:24:51 master1 systemd[1]: Unit kubelet.service entered failed state.
Apr 21 17:24:51 master1 systemd[1]: kubelet.service failed.
lines 1219-1270/1270 (END)
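
The last kubelet line points at a cgroup driver mismatch between the kubelet and Docker; one way to align them (an illustration, not the playbook's own fix) is to switch Docker to the systemd driver and restart both services:

$ cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
$ systemctl restart docker && systemctl restart kubelet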

/etc/systemd/system/kubelet.service.d does not exist

Friday 26 April 2019 10:43:12 +0300 (0:00:01.133) 0:00:41.830 **********
fatal: [192.168.0.14]: FAILED! => {"changed": false, "checksum": "eecc1bfa10de9cbc7a8bc88d9bc9911558e4a949", "msg": "Destination directory /etc/systemd/system/kubelet.service.d does not exist"}
to retry, use: --limit @/root/workspace/github/kubeadm-ansible/site.retry

[root@k8s-redhat-master1 kubeadm-ansible]# rpm -qa | grep kube
kubeadm-1.14.1-0.x86_64
kubelet-1.14.1-0.x86_64
kubernetes-cni-0.7.5-0.x86_64
kubectl-1.14.1-0.x86_64

I think the version is wrong. I then set a specific version, but it still always failed.

Thanks!
Br, Cody

Support for 1.12.1

Looks like we've got to pass the API server address with its port alongside the token when updating to / using kubeadm 1.12.1:

fatal: [192.168.1.115]: FAILED! => {"changed": true, "cmd": "kubeadm join --ignore-preflight-errors --token b0f7b8.8d1767876297d85c 192.168.1.124:6443 --discovery-token-unsafe-skip-ca-verification", "delta": "0:00:00.027406", "end": "2018-10-17 14:41:58.072164", "msg": "non-zero return code", "rc": 3, "start": "2018-10-17 14:41:58.044758", "stderr": "[tlsBootstra
pToken: Invalid value: \"\": the bootstrap token is invalid, discovery: Invalid value: \"\": discoveryToken or discoveryFile must be set, discoveryTokenAPIServers: Invalid value: \"b0f7b8.8d1767876297d85c\": address b0f7b8.8d1767876297d85c: missing port in address]", "stderr_lines": ["[tlsBootstrapToken: Invalid value: \"\": the bootstrap token is invalid, discov
ery: Invalid value: \"\": discoveryToken or discoveryFile must be set, discoveryTokenAPIServers: Invalid value: \"b0f7b8.8d1767876297d85c\": address b0f7b8.8d1767876297d85c: missing port in address]"], "stdout": "[validation] WARNING: kubeadm doesn't fully support multiple API Servers yet", "stdout_lines": ["[validation] WARNING: kubeadm doesn't fully support mul
tiple API Servers yet"]}
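
In other words, the API server address must be passed together with its port, separately from the token, roughly like:

$ kubeadm join 192.168.1.124:6443 --token b0f7b8.8d1767876297d85c --discovery-token-unsafe-skip-ca-verification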

Not updating apt cache after adding repositories on Ubuntu 16.04

During playbook execution, the docker-engine package is not installed.

TASK [docker : Add Docker yum repository] *******************************************************************************************************************
skipping: [k8s-master]
skipping: [k8s-node-0]
skipping: [k8s-node-1]

TASK [docker : Install docker engine (RHEL/CentOS)] *********************************************************************************************************
skipping: [k8s-master]
skipping: [k8s-node-0]
skipping: [k8s-node-1]

TASK [docker : Install docker engine (Debian/Ubuntu)] *******************************************************************************************************
fatal: [k8s-master]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'docker-engine' is available"}
fatal: [k8s-node-0]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'docker-engine' is available"}
fatal: [k8s-node-1]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'docker-engine' is available"}
to retry, use: --limit @/home/ubuntu/kubeadm-ansible/site.retry

A manual "apt-get update", or adding "update_cache: yes" to the "Install docker engine" task in roles/docker/tasks/pkg.yml, solves the issue.
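
A sketch of that change for the failing task in roles/docker/tasks/pkg.yml (the task and package names are taken from the log above):

- name: Install docker engine (Debian/Ubuntu)
  apt:
    name: docker-engine
    state: present
    update_cache: yes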

Notice: Shutting down dockerproject.org APT and YUM repos 2020-03-31

Ansible Log:

TASK [docker : Install docker engine (RHEL/CentOS)] ***********************************************************************************
Thursday 04 June 2020  20:56:27 +0300 (0:00:01.111)       0:00:16.571 ********* 
fatal: [10.103.37.33]: FAILED! => {"changed": false, "msg": "Failure talking to yum: failure: repodata/repomd.xml from Docker: [Errno 256] No more mirrors to try.\nhttps://yum.dockerproject.org/repo/main/centos/7/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found"}

Docker blog article: https://www.docker.com/blog/changes-dockerproject-org-apt-yum-repositories/
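
Since those repositories are gone, the repo definitions have to point at the current Docker repositories instead; on CentOS, for example, something along these lines (an illustration, not the playbook's current content):

$ yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ yum install -y docker-ce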

License?

Thanks for putting this together @kairen! I'm using this repo as a source of inspiration for my own kubernetes deployments with ansible, but I both 1) want to properly credit you, and 2) be able to legally use some of what you've written in my own deployments.

In order for others to freely use what you've done (with attribution), do you plan to add a permissive license (perhaps MIT, BSD-3, Apache) to this repository?

Processing Conflict: docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts docker-io

I tried to execute this script, but ran into this issue.
Could you help me check it? :)

[root@master1 kubeadm-ansible]# cat site.yaml 
---

- hosts: kube-cluster
  gather_facts: yes
  become: yes
  roles:
    - { role: docker, tags: docker }

- hosts: master
  gather_facts: yes
  become: yes
  roles:
    - { role: kubernetes/master, tags: master }
    - { role: cni, tags: cni }

- hosts: node
  gather_facts: yes
  become: yes
  roles:
    - { role: kubernetes/node, tags: node }


[root@master1 kubeadm-ansible]# ansible-playbook site.yaml

TASK [docker : Install docker engine (RHEL/CentOS)] **************************************************************************************************************************************************************************************************************************
Friday 14 December 2018  11:33:03 +0200 (0:00:00.550)       0:00:02.481 ******* 
fatal: [192.168.0.32]: FAILED! => {"changed": false, "msg": "\n\nTransaction check error:\n  file /usr/bin/docker from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/docker-containerd from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/docker-containerd-shim from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/dockerd from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n\nError Summary\n-------------\n\n", "rc": 1, "results": ["Loaded plugins: fastestmirror, langpacks\nLoading mirror speeds from cached hostfile\n * epel: mirror.yandex.ru\nResolving Dependencies\n--> Running transaction check\n---> Package docker-engine.x86_64 0:17.03.1.ce-1.el7.centos will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package            Arch        Version                       Repository   Size\n================================================================================\nInstalling:\n docker-engine      x86_64      17.03.1.ce-1.el7.centos       Docker       19 M\n\nTransaction Summary\n================================================================================\nInstall  1 Package\n\nTotal size: 19 M\nInstalled size: 19 M\nDownloading packages:\nRunning transaction check\nRunning transaction test\n"]}
fatal: [192.168.0.33]: FAILED! => {"changed": false, "msg": "\n\nTransaction check error:\n  file /usr/bin/docker from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/docker-containerd from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/docker-containerd-shim from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n  file /usr/bin/dockerd from install of docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts with file from package docker-common-2:1.13.1-75.git8633870.el7.centos.x86_64\n\nError Summary\n-------------\n\n", "rc": 1, "results": ["Loaded plugins: fastestmirror, langpacks\nLoading mirror speeds from cached hostfile\n * epel: www.nic.funet.fi\nResolving Dependencies\n--> Running transaction check\n---> Package docker-engine.x86_64 0:17.03.1.ce-1.el7.centos will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package            Arch        Version                       Repository   Size\n================================================================================\nInstalling:\n docker-engine      x86_64      17.03.1.ce-1.el7.centos       Docker       19 M\n\nTransaction Summary\n================================================================================\nInstall  1 Package\n\nTotal size: 19 M\nInstalled size: 19 M\nDownloading packages:\nRunning transaction check\nRunning transaction test\n"]}
fatal: [192.168.0.18]: FAILED! => {"changed": false, "msg": "Error: docker-engine conflicts with 2:docker-1.13.1-88.git07f3374.el7.centos.x86_64\n", "rc": 1, "results": ["Loaded plugins: fastestmirror, langpacks\nLoading mirror speeds from cached hostfile\n * epel: www.nic.funet.fi\nResolving Dependencies\n--> Running transaction check\n---> Package docker-engine.x86_64 0:17.03.1.ce-1.el7.centos will be installed\n--> Processing Conflict: docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts docker-io\n--> Restarting Dependency Resolution with new changes.\n--> Running transaction check\n---> Package docker.x86_64 2:1.13.1-75.git8633870.el7.centos will be updated\n---> Package docker.x86_64 2:1.13.1-88.git07f3374.el7.centos will be an update\n--> Processing Dependency: docker-common = 2:1.13.1-88.git07f3374.el7.centos for package: 2:docker-1.13.1-88.git07f3374.el7.centos.x86_64\n--> Processing Dependency: docker-client = 2:1.13.1-88.git07f3374.el7.centos for package: 2:docker-1.13.1-88.git07f3374.el7.centos.x86_64\n--> Running transaction check\n---> Package docker-client.x86_64 2:1.13.1-75.git8633870.el7.centos will be updated\n---> Package docker-client.x86_64 2:1.13.1-88.git07f3374.el7.centos will be an update\n---> Package docker-common.x86_64 2:1.13.1-75.git8633870.el7.centos will be updated\n---> Package docker-common.x86_64 2:1.13.1-88.git07f3374.el7.centos will be an update\n--> Processing Conflict: docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts docker-io\n--> Processing Conflict: docker-engine-17.03.1.ce-1.el7.centos.x86_64 conflicts docker\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"]}
        to retry, use: --limit @/root/workspace/kubeadm-ansible/site.retry

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************************************
192.168.0.18               : ok=3    changed=0    unreachable=0    failed=1   
192.168.0.32               : ok=3    changed=0    unreachable=0    failed=1   
192.168.0.33               : ok=3    changed=0    unreachable=0    failed=1   

CentOS 7 Docker package name update

TASK [docker : Install docker engine (RHEL/CentOS)] ***********************************************************************************
Thursday 04 June 2020  21:07:43 +0300 (0:00:01.219)       0:00:13.731 ********* 
fatal: [10.103.37.30]: FAILED! => {"changed": false, "msg": "No package matching 'docker-engine-17.03.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'docker-engine-17.03.*' found available, installed or updated"]}

kube-dns issue

Hello,

I have an issue with kube-dns.
I use flannel.

Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               49s                 default-scheduler  Successfully assigned kube-dns-598d7bf7d4-2f5hd to myvm2
  Normal   SuccessfulMountVolume   49s                 kubelet, myvm2     MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SuccessfulMountVolume   49s                 kubelet, myvm2     MountVolume.SetUp succeeded for volume "kube-dns-token-mwz9h"
  Normal   SandboxChanged          32s (x11 over 47s)  kubelet, myvm2     Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  31s (x12 over 48s)  kubelet, myvm2     Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kube-dns-598d7bf7d4-2f5hd_kube-system" network: open /run/flannel/subnet.env: no such file or directory

OL7.8 unexpected kernel config: CONFIG_CGROUP_PIDS

fatal: [10.200.152.11]: FAILED! => changed=true
cmd: |-
kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.22.2 --pod-network-cidr 10.244.0.0/16 --token b0f7b8.8d1767876297d85c --apiserver-advertise-address 10.200.152.11 --cri-socket=/var/run/containerd/containerd.sock
delta: '0:00:10.179032'
end: '2021-09-22 16:00:24.597867'
msg: non-zero return code
rc: 1
start: '2021-09-22 16:00:14.418835'
stderr: |2-
[WARNING FileExisting-tc]: tc not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: NAME:
crictl info - Display information of the container runtime

USAGE:
   crictl info [command options] [arguments...]

OPTIONS:
   --output value, -o value  Output format, One of: json|yaml (default: "json")

time="2021-09-22T16:00:24-04:00" level=fatal msg="failed to connect: failed to connect: context deadline exceeded"
, error: exit status 1
        [ERROR SystemVerification]: unexpected kernel config: CONFIG_CGROUP_PIDS
        [ERROR SystemVerification]: missing required cgroups: pids
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

stderr_lines:
stdout: |-
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.1.12-124.22.2.el7uek.x86_64
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_PIDS: not set
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled (as module)
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_FAIR_GROUP_SCHED: enabled
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: not set - Required for aufs.
CONFIG_BLK_DEV_DM: enabled (as module)
CONFIG_CFS_BANDWIDTH: enabled
CONFIG_CGROUP_HUGETLB: enabled
CONFIG_SECCOMP: enabled
CONFIG_SECCOMP_FILTER: enabled
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
CGROUPS_PIDS: missing
CGROUPS_HUGETLB: enabled
stdout_lines:

Unable to init my kubeadm cluster, any ideas?

kubeadm is failing to configure master

I tried to run the script on bare metal (Ubuntu 16.04) and am getting the following error:

fatal: [10.21.236.81]: FAILED! => {"changed": true, "cmd": "kubeadm reset", "delta": "0:00:00.064778", "end": "2018-07-12 03:29:03.101311", "msg": "non-zero return code", "rc": 1, "start": "2018-07-12 03:29:03.036533", "stderr": "Aborted reset operation", "stderr_lines": ["Aborted reset operation"], "stdout": "[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.\n[reset] are you sure you want to proceed? [y/N]: ", "stdout_lines": ["[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.", "[reset] are you sure you want to proceed? [y/N]: "]}

No access to file error while running the playbook

Hello,

I downloaded this repository. In my case, I use only one node (the master node acts as both master and worker). My hosts.ini file looks like this:

[master]
172.x.x.5

[kube-cluster:children]
master

[default]
ansible_become=true
ansible_user='<my_user_name>'
ansible_become_method='sudo'
ansible_become_user='kubernetes'

...as I need to become the 'kubernetes' sudo user rather than root.

I am getting the following error (console log):

 [WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details


PLAY [kube-cluster] ************************************************************

TASK [Gathering Facts] *********************************************************
Tuesday 16 July 2019  16:40:02 +0200 (0:00:00.023)       0:00:00.023 ********** 
ok: [172.x.x.5]

TASK [commons/os-checker : Get os_version from /etc/os-release] ****************
Tuesday 16 July 2019  16:40:04 +0200 (0:00:02.057)       0:00:02.080 ********** 
skipping: [172.x.x.5]

TASK [commons/os-checker : Get distro name from /etc/os-release] ***************
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.036)       0:00:02.117 ********** 
skipping: [172.x.x.5]

TASK [commons/os-checker : Set fact ansible_os_family var to Debian] ***********
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.036)       0:00:02.153 ********** 
skipping: [172.x.x.5]

TASK [commons/os-checker : Set fact ansible_os_family var to Debian] ***********
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.035)       0:00:02.189 ********** 
skipping: [172.x.x.5]

TASK [commons/os-checker : Set fact ansible_os_family var to RedHat] ***********
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.035)       0:00:02.224 ********** 
skipping: [172.x.x.5]

TASK [commons/os-checker : Override config file directory for Debian] **********
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.034)       0:00:02.259 ********** 
skipping: [172.x.x.5]

TASK [docker : Install Docker container engine] ********************************
Tuesday 16 July 2019  16:40:04 +0200 (0:00:00.034)       0:00:02.294 ********** 
included: /home/user/Pulpit/kubeadm-ansible-master/roles/docker/tasks/pkg.yml for 172.x.x.5

TASK [docker : Install apt-transport-https] ************************************
Tuesday 16 July 2019  16:40:05 +0200 (0:00:00.059)       0:00:02.353 ********** 
skipping: [172.x.x.5]

TASK [docker : Add Docker APT GPG key] *****************************************
Tuesday 16 July 2019  16:40:05 +0200 (0:00:00.034)       0:00:02.388 ********** 
skipping: [172.x.x.5]

TASK [docker : Add Docker APT repository] **************************************
Tuesday 16 July 2019  16:40:05 +0200 (0:00:00.035)       0:00:02.424 ********** 
skipping: [172.x.x.5]

TASK [docker : Add Docker yum repository] **************************************
Tuesday 16 July 2019  16:40:05 +0200 (0:00:00.035)       0:00:02.460 ********** 
fatal: [172.x.x.5]: FAILED! => {"changed": false, "details": "[Errno 13] Access denied: '/etc/yum.repos.d/docker.repo'", "msg": "Cannot open repo file /etc/yum.repos.d/docker.repo."}

PLAY RECAP *********************************************************************
172.x.x.5                 : ok=2    changed=0    unreachable=0    failed=1    skipped=9    rescued=0    ignored=0   

Tuesday 16 July 2019  16:40:05 +0200 (0:00:00.861)       0:00:03.322 ********** 
=============================================================================== 
Gathering Facts --------------------------------------------------------- 2.06s
docker : Add Docker yum repository -------------------------------------- 0.86s
docker : Install Docker container engine -------------------------------- 0.06s
commons/os-checker : Get os_version from /etc/os-release ---------------- 0.04s
commons/os-checker : Get distro name from /etc/os-release --------------- 0.04s
docker : Add Docker APT GPG key ----------------------------------------- 0.04s
commons/os-checker : Set fact ansible_os_family var to Debian ----------- 0.04s
docker : Add Docker APT repository -------------------------------------- 0.04s
commons/os-checker : Set fact ansible_os_family var to Debian ----------- 0.04s
commons/os-checker : Override config file directory for Debian ---------- 0.03s
docker : Install apt-transport-https ------------------------------------ 0.03s
commons/os-checker : Set fact ansible_os_family var to RedHat ----------- 0.03s
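
One likely cause is that the [default] group above is never referenced by the playbook, so its connection variables are not applied and the yum repository task runs without privilege escalation; also note that becoming the non-root 'kubernetes' user would still not be allowed to write to /etc/yum.repos.d. A sketch of an inventory that applies escalation (to root) to the whole cluster:

[master]
172.x.x.5

[kube-cluster:children]
master

[kube-cluster:vars]
ansible_user='<my_user_name>'
ansible_become=true
ansible_become_method='sudo'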

insecure registries issue

I get an error:
fatal: [10.1.4.19]: FAILED! => {"msg": "The conditional check 'insecure_registrys is defined and insecure_registrys > 0' failed. The error was: Unexpected templating type error occurred on ({% if insecure_registrys is defined and insecure_registrys > 0 %} True {% else %} False {% endif %}): '>' not supported between instances of 'NoneType' and 'int'\n\nThe error appears to have been in '/Users/admin/Documents/work/kubeadm-ansible/roles/docker/tasks/main.yml': line 18, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Add any insecure registrys to Docker config\n ^ here\n"

Is this an Ansible bug? I tried to add
--extra-vars 'INSECURE_REGISTRY=[]' but that didn't help either.

Kubeadm

FAILED! => {"changed": true, "cmd": "kubeadm reset", "delta": "0:00:00.031730", "end": "2018-06-28 11:39:08.070430", "msg": "non-zero return code", "rc": 1, "start": "2018-06-28 11:39:08.038700", "stderr": "Aborted reset operation", "stderr_lines": ["Aborted reset operation"], "stdout": "[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.\n[reset] are you sure you want to proceed? [y/N]: ", "stdout_lines": ["[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.", "[reset] are you sure you want to proceed? [y/N]: "]}

Error during installation

The following error is reported during installation:

k8s-m1: The key's randomart image is:
    k8s-m1: +---[RSA 2048]----+
    k8s-m1: |     o=+*o+*=o.  |
    k8s-m1: |     ..=ooo+oo   |
    k8s-m1: |      o   o.*oo  |
    k8s-m1: |     . . o *.o   |
    k8s-m1: |      = S *      |
    k8s-m1: |     o o O o     |
    k8s-m1: |        E ...    |
    k8s-m1: |         o +=    |
    k8s-m1: |          ==..   |
    k8s-m1: +----[SHA256]-----+
    k8s-m1: Warning: Permanently added '192.16.35.10' (ECDSA) to the list of known hosts.
    k8s-m1: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

Kubernetes upgrade

I changed the Kubernetes version from 1.14.0 to 1.16.0 after I ran the playbook for the first time and then re-ran it. It ran through and updated the packages like this:

TASK [commons/pre-install : Install kubernetes packages (RHEL/CentOS)] ******************************************************************************************************************************************************************************************************************
Friday 27 September 2019  16:26:28 +0200 (0:00:00.295)       0:01:25.410 ******
changed: [172.16.202.130] => (item=['kubelet-1.16.0', 'kubeadm-1.16.0', 'kubectl-1.16.0'])

But it did not take care of upgrading the cluster configuration; kubectl still shows me 1.14.0 for all nodes. The upgrade works like this: https://v1-15.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-15/

Running sudo kubeadm upgrade plan on the master shows the available version to upgrade to. Then upgrade the first master node with sudo kubeadm upgrade apply <NEWVERSION>. Once this is done, every additional master node needs sudo kubeadm upgrade node. Do not forget to restart the kubelet.
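
On the first master, that sequence is roughly:

$ sudo kubeadm upgrade plan
$ sudo kubeadm upgrade apply <NEWVERSION>
$ sudo systemctl restart kubelet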

As for the worker nodes, do (one node after another):

kubectl drain $NODE --ignore-daemonsets
sudo kubeadm upgrade node
sudo systemctl restart kubelet
kubectl uncordon $NODE

I'm afraid I currently don't have enough time to implement this myself.

Depends: libltdl7 (>= 2.4.6) but it is not installable

The playbook fails while installing docker-ce due to missing dependencies. I can fix that manually, but I am not sure how to do it in Ansible.

Any idea how to fix that?

Distribution:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"

Container runtime error while executing site.yaml

TASK [kubernetes/master : Init Kubernetes cluster] ****************************************************************************************************
Tuesday 13 April 2021 09:09:51 +0000 (0:00:10.456) 0:01:31.555 *********
fatal: [10.128.0.5]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.19.2 --pod-network-cidr 10.244.0.0/16 --token b0f7b8.8d1767876297d85c --apiserver-advertise-address 10.128.0.5 --cri-socket=/var/run/containerd/containerd.sock \n", "delta": "0:00:10.188770", "end": "2021-04-13 09:10:01.929454", "msg": "non-zero return code", "rc": 1, "start": "2021-04-13 09:09:51.740684", "stderr": "W0413 09:09:51.799678 4261 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]\nerror execution phase preflight: [preflight] Some fatal errors occurred:\n\t[ERROR CRI]: container runtime is not running: output: NAME:\n crictl info - Display information of the container runtime\n\nUSAGE:\n crictl info [command options] [arguments...]\n\nOPTIONS:\n --output value, -o value Output format, One of: json|yaml (default: "json")\n \ntime="2021-04-13T09:10:01Z" level=fatal msg="failed to connect: failed to connect: context deadline exceeded"\n, error: exit status 1\n[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["W0413 09:09:51.799678 4261 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]", "error execution phase preflight: [preflight] Some fatal errors occurred:", "\t[ERROR CRI]: container runtime is not running: output: NAME:", " crictl info - Display information of the container runtime", "", "USAGE:", " crictl info [command options] [arguments...]", "", "OPTIONS:", " --output value, -o value Output format, One of: json|yaml (default: "json")", " ", "time="2021-04-13T09:10:01Z" level=fatal msg="failed to connect: failed to connect: context deadline exceeded"", ", error: exit status 1", "[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[init] Using Kubernetes version: v1.19.2\n[preflight] Running pre-flight checks", "stdout_lines": ["[init] Using Kubernetes version: v1.19.2", "[preflight] Running pre-flight checks"]}

Hardcoded bootstrap token

group_vars/all.yml contains a hardcoded bootstrap token. Although this makes it easier to clone and run the playbook, IMO there should be some information in the documentation or during execution stating that it is recommended to change the token before deploying the cluster.
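
A fresh token can be generated locally and pasted into the token variable before deploying, for example:

$ kubeadm token generate

(Copy the output into the token setting in group_vars/all.yml.)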

Kube-system critical pods eviction

I used this setup and had a 3-node cluster. It was fine until I applied one more deployment; then all the critical pods got evicted.

I think these critical pods should never be candidates for eviction.

Issue with Vagrant / Centos

Running with CentOS

export OS_IMAGE=centos7

Do a vagrant up, get the following:

==> master1: TASK [kubernetes/master : Init Kubernetes cluster] *****************************
==> master1: fatal: [192.16.35.12]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.9.1 --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 192.16.35.12 --token b0f7b8.8d1767876297d85c ", "delta": "0:01:57.685986", "end": "2018-02-14 14:47:44.580484", "msg": "non-zero return code", "rc": 1, "start": "2018-02-14 14:45:46.894498", "stderr": "\t[WARNING FileExisting-crictl]: crictl not found in system path\ncouldn't initialize a Kubernetes cluster", "stderr_lines": ["\t[WARNING FileExisting-crictl]: crictl not found in system path", "couldn't initialize a Kubernetes cluster"], "stdout": "[init] Using Kubernetes version: v1.9.1\n[init] Using Authorization modes: [Node RBAC]\n[preflight] Running pre-flight checks.\n[preflight] Starting the kubelet service\n[certificates] Generated ca certificate and key.\n[certificates] Generated apiserver certificate and key.\n[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.16.35.12]\n[certificates] Generated apiserver-kubelet-client certificate and key.\n[certificates] Generated sa key and public key.\n[certificates] Generated front-proxy-ca certificate and key.\n[certificates] Generated front-proxy-client certificate and key.\n[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"\n[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"\n[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"\n[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"\n[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"\n[etcd] Wrote Static Pod manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"\n[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\".\n[init] This might take a minute or longer if the control plane images have to be pulled.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or 
healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\n\nUnfortunately, an error has occurred:\n\ttimed out waiting for the condition\n\nThis error is likely caused by:\n\t- The kubelet is not running\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\n\t- There is no internet connection, so the kubelet cannot pull the following control plane images:\n\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.9.1\n\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.9.1\n\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.9.1\n\nIf you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:\n\t- 'systemctl status kubelet'\n\t- 'journalctl -xeu kubelet'", "stdout_lines": ["[init] Using Kubernetes version: v1.9.1", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks.", "[preflight] Starting the kubelet service", "[certificates] Generated ca certificate and key.", "[certificates] Generated apiserver certificate and key.", "[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.16.35.12]", "[certificates] Generated apiserver-kubelet-client certificate and key.", "[certificates] Generated sa key and public key.", "[certificates] Generated front-proxy-ca certificate and key.", "[certificates] Generated front-proxy-client certificate and key.", "[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"", "[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"", "[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"", "[etcd] Wrote Static Pod 
manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"", "[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\".", "[init] This might take a minute or longer if the control plane images have to be pulled.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "\t- There is no internet connection, so the kubelet cannot pull the following control plane images:", "\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.9.1", "\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.9.1", "\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.9.1", "", "If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'"]}
==> master1: 	to retry, use: --limit @/vagrant/site.retry
==> master1:
==> master1: PLAY RECAP *********************************************************************
==> master1: 192.16.35.10               : ok=9    changed=5    unreachable=0    failed=0
==> master1: 192.16.35.11               : ok=9    changed=5    unreachable=0    failed=0
==> master1: 192.16.35.12               : ok=18   changed=13   unreachable=0    failed=1
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

See here for a description of the issue - kubernetes/kubeadm#613

A resolution from Kubernetes is not targeted until 1.10.

A partial fix is described in the last message of that issue.

Support working behind proxy

In many organizations, environments are set up behind a firewall, and running anything with network requirements behind a proxy is always troublesome. There are ways to alleviate the proxy issue, and they could be part of this playbook's offering. This would ease the learning curve for newcomers.
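
For example, Docker on a systemd host can be pointed at a proxy with a drop-in like the one below (a sketch with a placeholder proxy address; apt/yum, the kubelet and image pulls need similar treatment):

# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/12,10.244.0.0/16"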

[ERROR KubeletVersion]: the kubelet version is higher than the control plane version.

Hi,

I am using this repo to deploy k8s to VMs. The last time I did it (Dec 2020) it was elegant and smooth and went without any issues.
Today I am stuck with this error while initializing the cluster.

To reproduce:
Workstation = (same as in Dec 2020) Ubuntu 18.04.1 + Ansible 2.9.20

I create a cluster with one master and one worker.

Tried all of the following OS versions (master + worker):
Ubuntu 18.04
Ubuntu 20.04
CentOS 7
CentOS 8

Tried with two different cloud providers (Serverspace and Contabo)

I also tried the newest repo clone as well as my "old" clone from Dec 2020; all these combinations resulted in the identical error (see below).

Thanks for any clues!

TASK [kubernetes/master : Init Kubernetes cluster] ****************************************************************************************************************************************************************
Friday 16 April 2021 20:44:39 +0200 (0:00:01.864) 0:03:18.800 **********
fatal: [master1]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.20.1 --pod-network-cidr 10.244.0.0/16 --token b0f7b8.8d1767876297d85c --apiserver-advertise-address 31.44.3.145 \n", "delta": "0:00:00.350402", "end": "2021-04-16 18:44:41.815988", "msg": "non-zero return code", "rc": 1, "start": "2021-04-16 18:44:41.465586", "stderr": "\t[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/\nerror execution phase preflight: [preflight] Some fatal errors occurred:\n\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.21.0" Control plane version: "1.20.1"\n[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["\t[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/", "error execution phase preflight: [preflight] Some fatal errors occurred:", "\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.21.0" Control plane version: "1.20.1"", "[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[init] Using Kubernetes version: v1.20.1\n[preflight] Running pre-flight checks", "stdout_lines": ["[init] Using Kubernetes version: v1.20.1", "[preflight] Running pre-flight checks"]}

PLAY RECAP ********************************************************************************************************************************************************************************************************
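
The preflight error means the packages were installed at the latest available version (kubelet 1.21.0 at the time) while kube_version still pins the control plane to v1.20.1. A workaround (a sketch; it depends on where you set the variable) is to move kube_version up to the version that was actually installed, or to pin the kubelet/kubeadm/kubectl packages to the same release:

# group_vars/all.yml
kube_version: v1.21.0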

clustered kubernetes master servers

Is this possible using these ansible scripts? I set my masters to be multiple servers but it only initialised the first master.

I'd like to use my own variation of these ansible playbooks (I was originally writing my own and borrowed the initialisation functionality from this) to build a production k8s cluster, but it needs to have redundancy and fault tolerance for the master.

Cheers

Andy

setup failure: Error: E:Conflicting values set for option Signed-By regarding source https://apt.kubernetes.io/ kubernetes-xenial: /usr/share/keyrings/kubernetes-archive-keyring.gpg != , E:The list of sources could not be read.

TASK [commons/pre-install : Install Kubernetes packages] ***********************
Thursday 12 August 2021 17:12:07 -0700 (0:00:00.171) 0:00:52.200 *******
included: /home/yy/kubeadm-ansible/roles/commons/pre-install/tasks/pkg.yml for 192.168.0.11, 192.168.0.12

TASK [commons/pre-install : Add Kubernetes APT GPG key] ************************
Thursday 12 August 2021 17:12:07 -0700 (0:00:00.113) 0:00:52.314 *******
changed: [192.168.0.11]
changed: [192.168.0.12]

TASK [commons/pre-install : Add Kubernetes APT repository] *********************
Thursday 12 August 2021 17:12:09 -0700 (0:00:02.143) 0:00:54.457 *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: apt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://apt.kubernetes.io/ kubernetes-xenial: /usr/share/keyrings/kubernetes-archive-keyring.gpg != , E:The list of sources could not be read.
fatal: [192.168.0.11]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 192.168.0.11 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.8666284-1749-124276574227406/AnsiballZ_apt_repository.py", line 102, in \r\n _ansiballz_main()\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.8666284-1749-124276574227406/AnsiballZ_apt_repository.py", line 94, in _ansiballz_main\r\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.8666284-1749-124276574227406/AnsiballZ_apt_repository.py", line 40, in invoke_module\r\n runpy.run_module(mod_name='ansible.modules.apt_repository', init_globals=None, run_name='main', alter_sys=True)\r\n File "/usr/lib/python3.6/runpy.py", line 205, in run_module\r\n return _run_module_code(code, init_globals, run_name, mod_spec)\r\n File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code\r\n mod_name, mod_spec, pkg_name, script_name)\r\n File "/usr/lib/python3.6/runpy.py", line 85, in _run_code\r\n exec(code, run_globals)\r\n File "/tmp/ansible_apt_repository_payload_yi7jlojn/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 604, in \r\n File "/tmp/ansible_apt_repository_payload_yi7jlojn/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 581, in main\r\n File "/usr/lib/python3/dist-packages/apt/cache.py", line 168, in init\r\n self.open(progress)\r\n File "/usr/lib/python3/dist-packages/apt/cache.py", line 223, in open\r\n self._cache = apt_pkg.Cache(progress)\r\napt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://apt.kubernetes.io/ kubernetes-xenial: /usr/share/keyrings/kubernetes-archive-keyring.gpg != , E:The list of sources could not be read.\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: apt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://apt.kubernetes.io/ kubernetes-xenial: /usr/share/keyrings/kubernetes-archive-keyring.gpg != , E:The list of sources could not be read.
fatal: [192.168.0.12]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 192.168.0.12 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.943152-1750-197405217095954/AnsiballZ_apt_repository.py", line 102, in \r\n _ansiballz_main()\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.943152-1750-197405217095954/AnsiballZ_apt_repository.py", line 94, in _ansiballz_main\r\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n File "/home/yy/.ansible/tmp/ansible-tmp-1628813529.943152-1750-197405217095954/AnsiballZ_apt_repository.py", line 40, in invoke_module\r\n runpy.run_module(mod_name='ansible.modules.apt_repository', init_globals=None, run_name='main', alter_sys=True)\r\n File "/usr/lib/python3.6/runpy.py", line 205, in run_module\r\n return _run_module_code(code, init_globals, run_name, mod_spec)\r\n File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code\r\n mod_name, mod_spec, pkg_name, script_name)\r\n File "/usr/lib/python3.6/runpy.py", line 85, in _run_code\r\n exec(code, run_globals)\r\n File "/tmp/ansible_apt_repository_payload_hi8nupwx/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 604, in \r\n File "/tmp/ansible_apt_repository_payload_hi8nupwx/ansible_apt_repository_payload.zip/ansible/modules/apt_repository.py", line 581, in main\r\n File "/usr/lib/python3/dist-packages/apt/cache.py", line 168, in init\r\n self.open(progress)\r\n File "/usr/lib/python3/dist-packages/apt/cache.py", line 223, in open\r\n self._cache = apt_pkg.Cache(progress)\r\napt_pkg.Error: E:Conflicting values set for option Signed-By regarding source https://apt.kubernetes.io/ kubernetes-xenial: /usr/share/keyrings/kubernetes-archive-keyring.gpg != , E:The list of sources could not be read.\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

PLAY RECAP *********************************************************************
192.168.0.11 : ok=16 changed=8 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0
192.168.0.12 : ok=16 changed=8 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0
192.168.0.13 : ok=12 changed=7 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0

Thursday 12 August 2021 17:12:10 -0700 (0:00:00.945) 0:00:55.402 *******

docker : Install docker engine (Debian/Ubuntu) ------------------------- 21.04s
docker : Install apt-transport-https ------------------------------------ 9.63s
docker : Add Docker APT repository -------------------------------------- 4.91s
docker : Add Docker APT GPG key ----------------------------------------- 2.31s
commons/pre-install : Add Kubernetes APT GPG key ------------------------ 2.14s
docker : Copy Docker engine service file -------------------------------- 2.13s
Gathering Facts --------------------------------------------------------- 2.09s
docker : Copy Docker environment config file ---------------------------- 1.58s
Gathering Facts --------------------------------------------------------- 1.44s
docker : Enable and check Docker service -------------------------------- 1.42s
docker : Hold docker version -------------------------------------------- 1.22s
docker : Add any insecure registries to Docker config ------------------- 1.07s
commons/pre-install : Add Kubernetes APT repository --------------------- 0.95s
commons/os-checker : Override config file directory for Debian ---------- 0.25s
docker : Add registry to Docker config ---------------------------------- 0.25s
commons/os-checker : Set fact ansible_os_family var to RedHat ----------- 0.24s
docker : Install docker engine (RHEL/CentOS) ---------------------------- 0.24s
commons/os-checker : Set fact ansible_os_family var to Debian ----------- 0.24s
commons/os-checker : Set fact ansible_os_family var to Debian ----------- 0.24s
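
The Signed-By conflict above usually means the kubernetes-xenial source ended up defined twice on the host, once with a Signed-By keyring and once without, so apt refuses to read the source list. A hedged cleanup on each affected host (the file name below is only an example; remove whichever duplicate the grep turns up) before re-running the playbook:

$ grep -r 'apt.kubernetes.io' /etc/apt/sources.list /etc/apt/sources.list.d/
$ sudo rm /etc/apt/sources.list.d/apt_kubernetes_io.list   # the duplicate entry found above
$ sudo apt-get update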

How to join a new node to an existing cluster?

#vi host.ini

[master]
192.168.0.38

[node]
192.168.0.4
192.168.0.20 (new node)
192.168.0.17

[kube-cluster:children]
master
node

kubeadm-ansible]# ansible-playbook site.yaml

TASK [kubernetes/node : Join to Kubernetes cluster] ***********************************************************************************************************************
Friday 20 December 2019 04:56:28 +0200 (0:00:00.842) 0:01:45.157 *******
fatal: [192.168.0.20]: FAILED! => {"changed": true, "cmd": "kubeadm join --token b0f7b8.8d1767876297d85c --discovery-token-unsafe-skip-ca-verification 192.168.0.38:6443\n", "delta": "0:05:00.4}

.......
........
PLAY RECAP ****************************************************************************************************************************************************************
192.168.0.17 : ok=20 changed=3 unreachable=0 failed=0 skipped=28 rescued=0 ignored=0
192.168.0.20 : ok=21 changed=14 unreachable=0 failed=1 skipped=27 rescued=0 ignored=0
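
The five-minute delta before the failure suggests kubeadm join timed out talking to the API server rather than being rejected outright. Two standard checks that are independent of this playbook: confirm the new node can reach the master on 6443, and make sure the join token is still valid (the default token TTL is 24 hours):

# from the new node: is the API server reachable?
$ nc -vz 192.168.0.38 6443
# on the master: list tokens and mint a fresh one if the configured token has expired
$ kubeadm token list
$ kubeadm token create --print-join-command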

init kubernetes cluster task fails

I ran pre-install.sh manually, and it failed at the init task with this error:

fatal: [192.16.35.12]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.8.3 --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 192.16.35.12 --token b0f7b8.8d1767876297d85c ", "delta": "0:00:00.284910", "end": "2017-12-21 18:53:55.383813", "msg": "non-zero return code", "rc": 2, "start": "2017-12-21 18:53:55.098903", "stderr": "\t[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.05.0-ce. Max validated version: 17.03\n\t[WARNING FileExisting-crictl]: crictl not found in system path\n[preflight] Some fatal errors occurred:\n\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.9.0" Control plane version: "1.8.3"\n\t[ERROR Swap]: running with swap on is not supported. Please disable swap\n[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...", "stderr_lines": ["\t[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.05.0-ce. Max validated version: 17.03", "\t[WARNING FileExisting-crictl]: crictl not found in system path", "[preflight] Some fatal errors occurred:", "\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.9.0" Control plane version: "1.8.3"", "\t[ERROR Swap]: running with swap on is not supported. Please disable swap", "[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=..."], "stdout": "[init] Using Kubernetes version: v1.8.3\n[init] Using Authorization modes: [Node RBAC]\n[preflight] Running pre-flight checks.", "stdout_lines": ["[init] Using Kubernetes version: v1.8.3", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks."]}
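
The [ERROR Swap] line has a straightforward manual workaround on the host, assuming the playbook has not already disabled swap there; the KubeletVersion error is the same skew as in the reports above and is fixed by aligning the installed kubelet with the version passed to --kubernetes-version:

$ sudo swapoff -a
$ sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab   # keep swap disabled across reboots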

Can't use pod-network-cidr with /25 netmask

I'm trying to specify a pod-network-cidr with a /25 netmask. I first had to specify node-cidr-mask-size by switching to using kubeadm-init.yml instead of CLI args. The cluster now starts and the nodes join just fine. On my master, I can see that the netmask is correctly specified (IP redacted):

$  kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
X.X.X.X/25

However, when I try to actually deploy anything, it fails and I get the message below:

  Warning  FailedCreatePodSandBox  9m25s                    kubelet, node02   Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "211a1decea926872a991ecd80eb45f70fdadf7085c69323c733950d9bb84cd0a" network for pod "nginx-deployment-6dd86d77d-xtv2w": NetworkPlugin cni failed to set up pod "nginx-deployment-6dd86d77d-xtv2w_default" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          9m22s (x12 over 9m33s)   kubelet, node02   Pod sandbox changed, it will be killed and re-created.

Any assistance would be appreciated :)
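
The missing /run/flannel/subnet.env points at flannel itself rather than kubeadm: the Network in flannel's net-conf.json still defaults to 10.244.0.0/16, which no longer contains the /25 pod CIDR, so the flannel pod exits before it ever writes subnet.env. A hedged sketch of the matching change in the kube-flannel-cfg ConfigMap (excerpt only; the real ConfigMap also carries cni-conf.json, and newer manifests use the kube-flannel namespace):

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  net-conf.json: |
    {
      "Network": "X.X.X.X/25",
      "Backend": {
        "Type": "vxlan"
      }
    }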

ip forwarding getting disabled after swappiness was set

It may just be my machine, but I noticed that IP forwarding was getting set to 0 after swappiness was set to 0 in the playbook. Since forwarding was already set to 0 in the config file, the task that sets swappiness was reloading /etc/sysctl.conf and re-applying it, so I added this line to keep forwarding enabled:

  • { name: 'net.ipv4.ip_forward', value: '1' }

I believe forwarding is required for docker to work.

My Ansible version is 2.4.3.0 and I was running Ubuntu 16.04 with the latest patches.

Does adding a new value to the sysctl.conf file really have to reload the whole file?

I also noticed that installing Docker turns forwarding on, but not permanently via sysctl.conf (after a reboot, forwarding will probably only come back on when Docker starts).
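
For what it's worth, the same fix can be expressed as an explicit task with Ansible's sysctl module instead of relying on the sysctl.conf reload; a minimal sketch (where exactly it would live in this playbook is an assumption):

- name: Ensure IP forwarding stays enabled when sysctl settings are reloaded
  sysctl:
    name: net.ipv4.ip_forward
    value: "1"
    state: present
    sysctl_set: yes
    reload: yes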

issue with vagrant and ubuntu 16.04 (xenial64)

I am using the playbook to deploy three nodes using Vagrant and ubuntu/xenial64 (I have the same issue using bento/ubuntu16.04).
The playbook stops while trying to install helm, as shown in the gist https://gist.github.com/upolymorph/abbf9a1465ae9987e998e19adc4caf84

Accessing the master node, I have the following output:

kubectl -n kube-system get nodes
NAME STATUS ROLES AGE VERSION
k8sceph-01 Ready master 53m v1.9.8

kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
etcd-k8sceph-01 1/1 Running 0 1h
kube-apiserver-k8sceph-01 1/1 Running 0 1h
kube-controller-manager-k8sceph-01 1/1 Running 0 1h
kube-dns-6f4fd4bdf-xlxmz 3/3 Running 0 1h
kube-flannel-ds-p5qw6 1/1 Running 0 54m
kube-proxy-gtl7q 1/1 Running 0 1h
kube-scheduler-k8sceph-01 1/1 Running 0 1h
tiller-deploy-7ccf99cd64-tm9cp 0/1 Pending 0 54m

kubectl -n kube-system describe pod tiller-deploy-7ccf99cd64-tm9cp
Name: tiller-deploy-7ccf99cd64-tm9cp
Namespace: kube-system
Node:
Labels: app=helm
name=tiller
pod-template-hash=3779557820
Annotations:
Status: Pending
IP:
Controlled By: ReplicaSet/tiller-deploy-7ccf99cd64
Containers:
tiller:
Image: gcr.io/kubernetes-helm/tiller:v2.9.1
Ports: 44134/TCP, 44135/TCP
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-947p2 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-947p2:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-947p2
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type     Reason            Age                  From               Message
----     ------            ----                 ----               -------
Warning  FailedScheduling  20s (x191 over 55m)  default-scheduler  0/1 nodes are available: 1 PodToleratesNodeTaints.

Can you please provide me with some guidance?
Thanks in advance
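
The scheduler message ("0/1 nodes are available: 1 PodToleratesNodeTaints") indicates the only node still carries the master NoSchedule taint, so tiller has nowhere to run. On a single-node cluster the usual workaround, plain kubectl rather than anything in the playbook, is to remove that taint using the node name from the output above:

$ kubectl taint nodes k8sceph-01 node-role.kubernetes.io/master:NoSchedule-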

can't join a node

When the playbook reaches the "Join to kubernetes cluster" task it fails with:

failed to check server version: Get https://192.168.0.70:6443/version: x509: certificate has expired or is not yet valid

I am new to kubernetes so I am probably doing something wrong, any help would be appreciated!
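
An "x509: certificate has expired or is not yet valid" error during a join is very often clock skew between the joining node and the master rather than a playbook problem; a quick check on both machines (standard systemd tooling, nothing specific to this repo):

$ timedatectl status            # compare the reported time on master and node
$ sudo timedatectl set-ntp true # enable time synchronisation if it is off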

cannot join network of a non running container

Installed on Ubuntu 16.04.

After running Ansible, I have the following message:

~# kubectl get po -n kube-system
NAME                                    READY     STATUS                                                                                                                                                                                                                                                                                RESTARTS   AGE
etcd-vps343998                          1/1       Running                                                                                                                                                                                                                                                                               0          11h
kube-apiserver-vps343998                1/1       Running                                                                                                                                                                                                                                                                               0          11h
kube-controller-manager-vps343998       1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-dns-2425271678-bxlmx               0/3       rpc error: code = 2 desc = failed to start container "64bdfddb7e49d85b49222d0e81288c581543bcb86015e04785041bac875c2934": Error response from daemon: {"message":"cannot join network of a non running container: 5f2c0d0b4b2aae1c80adc073a1cea37c9e240b3225a69b4a3794eb3f84e6b86c"}   135        11h
kube-flannel-ds-9w491                   1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      139        11h
kube-flannel-ds-bhkm2                   1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      138        11h
kube-flannel-ds-cshf7                   1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      138        11h
kube-flannel-ds-kn0zq                   1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      139        11h
kube-flannel-ds-ms9z6                   1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      138        11h
kube-proxy-5lz6m                        1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-proxy-mprw8                        1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-proxy-spc8b                        1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-proxy-td03n                        1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-proxy-xcv6j                        1/1       Running                                                                                                                                                                                                                                                                               1          11h
kube-scheduler-vps343998                1/1       Running                                                                                                                                                                                                                                                                               2          11h
kubernetes-dashboard-3044843954-1cbtx   0/1       rpc error: code = 2 desc = failed to start container "face0f1dc264f30c9cc540b7a86fff613b610fa3de90884bb73ff6a85a2b464a": Error response from daemon: {"message":"cannot join network of a non running container: f5d4722cebe93e3256f555b14a52e92f832510a23877f10124d98fec85e1731d"}   60         11h
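
Since kube-dns and the dashboard only fail because they cannot join the pod network, the crash-looping flannel pods are the place to start; their logs usually name the root cause (a common one is flannel binding to the wrong network interface). Plain kubectl, with the pod name taken from the listing above and the container name matching the stock flannel manifests:

$ kubectl -n kube-system logs kube-flannel-ds-9w491 -c kube-flannel --previous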

Error with the "kubelet version is higher than the control plane version"

I got this fatal error:

[preflight] Some fatal errors occurred:\n\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.14.1" Control plane version: "1.13.0"\n[preflight]

Where could I change the control plane version or kubelet version?

Thanks,

Norman Su
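
The control plane version is whatever the playbook passes to kubeadm init as --kubernetes-version, and that value comes from group_vars/all.yml; the kubelet version is simply whichever package the pre-install role installs. A hedged sketch of the setting to align them, using the value from the error above (the variable name kube_version is an assumption about this playbook):

# group_vars/all.yml
kube_version: v1.14.1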

Check for Firewall

It looks like the playbook doesn't check whether a firewall is running. I had issues deploying on CentOS 7 until I stopped the firewall.

The ports should be opened automatically if needed (e.g. 6443).
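
Until the playbook manages this itself, a manual workaround on CentOS 7 is to open the ports kubeadm needs on the master instead of disabling firewalld entirely (port list per the upstream kubeadm installation docs of that era):

$ sudo firewall-cmd --permanent --add-port=6443/tcp
$ sudo firewall-cmd --permanent --add-port=2379-2380/tcp
$ sudo firewall-cmd --permanent --add-port=10250-10252/tcp
$ sudo firewall-cmd --reload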

I tried using this repo and I'm getting a syntax error; would you mind reviewing this?

cat inventory

[master]
10.145.74.94

[node]
10.145.74.95
10.145.74.96
10.145.74.97
10.145.74.98

[kube-cluster:children]
master
node

ansible-playbook site.yml -vvv

Using /root/boltz/kubeadm-ansible/ansible.cfg as config file
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.

The error appears to have been in '/root/boltz/kubeadm-ansible/roles/docker/tasks/main.yml': line 3, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  • name: Install Docker container engine
    ^ here

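
"no action detected in task" usually means the control machine's Ansible is too old to recognise a module used by the role, rather than an actual YAML syntax problem in the task shown. Checking that the deployment environment meets the playbook's minimum Ansible version is a reasonable first step (upgrading via pip is one option among several):

$ ansible --version
$ pip install --upgrade ansible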

TASK [kubernetes/master : Init Kubernetes cluster] - Port 2379 is in use

cat inventory
[master]
16.200.8.123
[node]
16.200.0.106
16.200.8.122
[kube-cluster:children]
master
node

ansible-playbook site.yaml
yields the error:
TASK [kubernetes/master : Init Kubernetes cluster] ********************************************************************************************************************************
fatal: [16.200.8.123]: FAILED! => {"changed": true, "cmd": "kubeadm init --service-cidr 10.96.0.0/12 --kubernetes-version v1.9.3 --pod-network-cidr 10.244.0.0/16 --token b0f7b8.8d1767876297d85c ", "delta": "0:00:00.252457", "end": "2018-03-15 03:44:54.458903", "msg": "non-zero return code", "rc": 2, "start": "2018-03-15 03:44:54.206446", "stderr": "\t[WARNING FileExisting-crictl]: crictl not found in system path\n[preflight] Some fatal errors occurred:\n\t[ERROR Port-2379]: Port 2379 is in use\n\t[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty\n[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...", "stderr_lines": ["\t[WARNING FileExisting-crictl]: crictl not found in system path", "[preflight] Some fatal errors occurred:", "\t[ERROR Port-2379]: Port 2379 is in use", "\t[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty", "[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=..."], "stdout": "[init] Using Kubernetes version: v1.9.3\n[init] Using Authorization modes: [Node RBAC]\n[preflight] Running pre-flight checks.", "stdout_lines": ["[init] Using Kubernetes version: v1.9.3", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks."]}
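
Port 2379 being in use together with a non-empty /var/lib/etcd almost always means an earlier kubeadm init left state behind on the master. One way to clean it up before re-running site.yaml (the repo's reset playbook covers the same ground; newer kubeadm releases need -f or an interactive confirmation):

$ sudo kubeadm reset
$ sudo rm -rf /var/lib/etcd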
