
kairen / kube-ansible


Build a Kubernetes cluster via Ansible playbook. :wrench: :wrench: :wrench:

License: Apache License 2.0

kubernetes ceph vagrant virtualbox ansible k8s-conformance

kube-ansible's Introduction


Kubernetes Ansible

A collection of playbooks for deploying, managing, and upgrading a Kubernetes cluster. They provide fully automated commands to bring up a Kubernetes cluster on bare metal or VMs.


Feature list:

  • Supports Kubernetes v1.10.0+.
  • Highly available Kubernetes cluster.
  • Installation from upstream binaries.
  • Kubernetes addons:
    • Prometheus monitoring.
    • EFK logging.
    • Metrics Server.
    • NGINX Ingress Controller.
    • Kubernetes Dashboard.
  • Supported container networks:
    • Calico.
    • Flannel.
  • Supported container runtimes:
    • Docker.
    • NVIDIA-Docker (requires NVIDIA driver and CUDA 9.0+).
    • containerd.
    • CRI-O.

Quick Start

In this section you will deploy a cluster via Vagrant.

Prerequisites:

  • Ansible version: v2.5 (or newer).
  • Vagrant: >= 2.0.0.
  • VirtualBox: >= 5.0.0.
  • On macOS, install the sshpass tool:
$ brew install http://git.io/sshpass.rb

The getting started guide will use Vagrant with VirtualBox to deploy a Kubernetes cluster onto virtual machines. You can deploy the cluster with a single command:

$ ./hack/setup-vms
Cluster Size: 1 master, 2 worker.
  VM Size: 1 vCPU, 2048 MB
  VM Info: ubuntu16, virtualbox
  CNI binding iface: eth1
Start to deploy?(y):
  • You can also use the sudo ./hack/setup-vms -p libvirt -i eth1 command to deploy the cluster onto KVM.

If you want to access the API anonymously, you need to create an RBAC object that defines the role's permissions. For example, using the cluster-admin role:

$ kubectl create clusterrolebinding open-api --clusterrole=cluster-admin --user=system:anonymous
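The same binding can also be expressed declaratively and applied with kubectl apply -f; a sketch of the equivalent manifest:

```yaml
# Equivalent of the imperative command above: bind the built-in
# cluster-admin ClusterRole to the anonymous user.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: open-api
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
```

Note that binding cluster-admin to system:anonymous opens the entire API to unauthenticated clients, so treat this as suitable for test clusters only.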

Logging in to the Dashboard addon:

As of release 1.7, the Dashboard no longer has full admin privileges granted by default, so you need to create a token to access the resources:

$ kubectl -n kube-system create sa dashboard
$ kubectl create clusterrolebinding dashboard --clusterrole cluster-admin --serviceaccount=kube-system:dashboard
$ kubectl -n kube-system get sa dashboard -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: 2017-11-27T17:06:41Z
  name: dashboard
  namespace: kube-system
  resourceVersion: "69076"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/dashboard
  uid: 56b880bf-d395-11e7-9528-448a5ba4bd34
secrets:
- name: dashboard-token-vg52j

$ kubectl -n kube-system describe secrets dashboard-token-vg52j
...
token:      eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtdG9rZW4tdmc1MmoiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTZiODgwYmYtZDM5NS0xMWU3LTk1MjgtNDQ4YTViYTRiZDM0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZCJ9.bVRECfNS4NDmWAFWxGbAi1n9SfQ-TMNafPtF70pbp9Kun9RbC3BNR5NjTEuKjwt8nqZ6k3r09UKJ4dpo2lHtr2RTNAfEsoEGtoMlW8X9lg70ccPB0M1KJiz3c7-gpDUaQRIMNwz42db7Q1dN7HLieD6I4lFsHgk9NPUIVKqJ0p6PNTp99pBwvpvnKX72NIiIvgRwC2cnFr3R6WdUEsuVfuWGdF-jXyc6lS7_kOiXp2yh6Ym_YYIr3SsjYK7XUIPHrBqWjF-KXO_AL3J8J_UebtWSGomYvuXXbbAUefbOK4qopqQ6FzRXQs00KrKa8sfqrKMm_x71Kyqq6RbFECsHPA

Copy and paste the token into the Dashboard login screen.

Manual deployment

In this section you will manually deploy a cluster on your machines.

Prerequisites:

  • Ansible version: v2.5 (or newer).
  • Linux distributions: Ubuntu 16+/Debian/CentOS 7.x.
  • All masters and nodes must be reachable via password-less SSH from the deploy node.
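One way to satisfy the password-less requirement is ssh-copy-id from the deploy node. A dry-run sketch (the echo only prints each command; remove it to execute, and adjust the user and hostnames to your environment):

```shell
# Push the deploy node's public key to every master/node.
# Hostnames below are the examples used later in this guide.
for host in k8s-m1 k8s-n1 k8s-n2 k8s-n3; do
  echo ssh-copy-id "vagrant@${host}"
done
```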

An example set of machines:

IP Address    Role    CPU  Memory
172.16.35.9   vip     -    -
172.16.35.10  k8s-m1  4    8G
172.16.35.11  k8s-n1  4    8G
172.16.35.12  k8s-n2  4    8G
172.16.35.13  k8s-n3  4    8G

Add the machine info gathered above into a file called inventory/hosts.ini. An example inventory:

[etcds]
k8s-m1
k8s-n[1:2]

[masters]
k8s-m1
k8s-n1

[nodes]
k8s-n[1:3]

[kube-cluster:children]
masters
nodes
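If you script your setup, the same inventory can be written in one step (the path is where the playbooks expect it; hostnames are the examples from the table above):

```shell
# Write the example inventory to inventory/hosts.ini.
mkdir -p inventory
cat > inventory/hosts.ini <<'EOF'
[etcds]
k8s-m1
k8s-n[1:2]

[masters]
k8s-m1
k8s-n1

[nodes]
k8s-n[1:3]

[kube-cluster:children]
masters
nodes
EOF
grep -c '^\[' inventory/hosts.ini   # → 4 (section headers)
```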

Set the variables in group_vars/all.yml to reflect the options you need. For example:

# override kubernetes version (default: 1.10.6)
kube_version: 1.11.2

# container runtime, supported: docker, nvidia-docker, containerd.
container_runtime: docker

# container network, supported: calico, flannel.
cni_enable: true
container_network: calico
cni_iface: ''

# highly available variables
vip_interface: ''
vip_address: 172.16.35.9

# etcd variables
etcd_iface: ''

# kubernetes extra addons variables
enable_dashboard: true
enable_logging: false
enable_monitoring: false
enable_ingress: false
enable_metric_server: true

# monitoring grafana user/password
monitoring_grafana_user: "admin"
monitoring_grafana_password: "p@ssw0rd"
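A quick parse check of the variables file can catch YAML typos before a long playbook run; a sketch using PyYAML (already a dependency of Ansible):

```shell
# Parse group_vars/all.yml and echo one key back, or report if it is missing.
python3 - <<'EOF'
import pathlib
import yaml

p = pathlib.Path('group_vars/all.yml')
if p.exists():
    cfg = yaml.safe_load(p.read_text())
    print('kube_version:', cfg.get('kube_version'))
else:
    print('group_vars/all.yml not found')
EOF
```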

Deploy a Kubernetes cluster

If everything is ready, run the cluster.yml playbook to deploy the cluster:

$ ansible-playbook -i inventory/hosts.ini cluster.yml

Then run addons.yml to create the addons:

$ ansible-playbook -i inventory/hosts.ini addons.yml

Verify cluster

To verify the deployment, check the cluster with the following command:

$ kubectl -n kube-system get po,svc

NAME                                 READY     STATUS    RESTARTS   AGE       IP             NODE
po/haproxy-master1                   1/1       Running   0          2h        172.16.35.10   k8s-m1
...

Reset cluster

Finally, if you want to clean up the cluster and redeploy, you can reset it with the reset-cluster.yml playbook:

$ ansible-playbook -i inventory/hosts.ini reset-cluster.yml

Contributing

Pull requests are always welcome! I am thrilled to receive them.

kube-ansible's People

Contributors

box9527, denis256, florentflament, kairen, khann-adill, pryorda, twillert


kube-ansible's Issues

storageclass example?

First of all, I love your project; it was quite helpful for getting k8s with hyperconverged Ceph running via Ansible. Do you have an example StorageClass based on this project? I have been trying to create one, but I'm not having luck: it gives an error about incorrect secrets, and I have tried a number of different id and secret names without success. I attached one version of the example below.

---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: normal
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.10.73.3:6789,10.10.44.4:6789,10.10.39.4:6789
  adminId: admin
  adminSecretName: ceph-conf-combined
  adminSecretNamespace: ceph
  pool: kube
  userId: client
  userSecretName: ceph-client-key

Appreciate any help you can give.

timeout to create kube-apiserver to kubelet RBAC

TASK [k8s-setup : Create kube-apiserver to kubelet RBAC] **************************************************************************************************************************************************
Tuesday 20 April 2021 20:30:08 +0800 (0:00:01.632) 0:03:44.899 *********
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (10 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (9 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (8 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (7 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (6 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (5 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (4 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (3 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (2 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (1 retries left).
fatal: [10.85.246.115 -> 10.85.246.115]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:30.385037", "end": "2021-04-20 20:36:06.896236", "msg": "non-zero return code", "rc": 1, "start": "2021-04-20 20:35:36.511199", "stderr": "Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout", "stderr_lines": ["Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout"], "stdout": "", "stdout_lines": []}

CentOS 7: 'dict object' has no attribute u'ansible_eth1'

Trying to run a Vagrant build based on CentOS 7; after a few minutes this error keeps appearing.

TASK [etcd : Copy Etcd conf template file] *************************************
fatal: [172.16.35.10]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: https://{{ etcd_ip_addr }}:{{ etcd_peer_port }}: {% if etcd_iface != '' %}{{ hostvars[inventory_hostname]['ansible_' + etcd_iface].ipv4.address }} {%- else %}{{ hostvars[inventory_hostname].ansible_default_ipv4.address }}{% endif %}: 'dict object' has no attribute u'ansible_eth1'"}

Running the Vagrant build with Ubuntu as the OS completes without errors.

Unable to install addons properly

On the master branch, with this inventory/group_vars/all.yml:

---

kube_version: 1.13.4

# Container runtime,
# Supported: docker, nvidia-docker, containerd.
container_runtime: docker

# Container network,
# Supported: calico, flannel.
cni_enable: true
container_network: calico

# Kubernetes HA extra variables.
vip_interface: ""
vip_address: 192.168.3.100

# Kubernetes extra addons
enable_ingress: true
enable_dashboard: false
enable_logging: true
enable_monitoring: false
enable_metric_server: true

grafana_user: "admin"
grafana_password: "admin"

error message:
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'namespace'\n\nThe error appears to have been in '/root/kube-ansible/roles/k8s-addon/tasks/main.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check {{ addon.name }} addon dependencies status\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n"}

Kubernetes version update/upgrade issue.

  • kube-ansible can be upgraded without problems to kube_version: 1.14.7 in kube-ansible/inventory/group_vars/all.yml.

  • I cannot upgrade to the (latest) stable release, nor to kube_version: 1.15.X.

Upgrade existing cluster

Hello,

It's less an issue than a question: what are the best practices for upgrading a cluster, given that I used your playbook to deploy it?

Thank you

API server certificate

Hi, I am running a couple of applications that check the API server certificate, and your Ansible playbooks don't seem to include the right hostnames (the default API hostname is kubernetes.default.svc, but your certificate seems to have only kubernetes.default). Where could I change that? Thanks

Questions about keepalived master/backup

Hello, and thanks for this project. I have a few small questions from setting it up; pointers would be much appreciated.

  1. This project uses keepalived to load-balance the apiserver, but the references I have consulted all say that the node with the larger priority value becomes the master and the others become backups, whereas in this project only one node has the smallest priority value.
     The project's priority configuration:
     keepalived_priority: "{% if inventory_hostname == groups['masters'][0] %}100{% else %}150{% endif %}"

     (I also consulted the related documentation.)
  2. In this project keepalived is deployed to all master nodes, but I don't see any master/backup failover settings in the keepalived configuration, nor any probes for the keepalived pod. Is there any documentation on keepalived master/backup failover in this project?

Any pointers would be greatly appreciated. Thanks.
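On the priority question: in VRRP, the instance with the highest priority wins the election and holds the VIP; ties break in favor of the highest IP address. With the template above, every master except the first gets 150 and the first gets 100, so one of the 150-priority nodes becomes MASTER. For reference, a minimal, hypothetical vrrp_instance (interface, router id, and VIP here are illustrative assumptions, not the project's exact template):

```conf
vrrp_instance VI_1 {
    state BACKUP            # start as BACKUP; the VRRP election promotes one MASTER
    interface eth1          # assumption: the interface that carries the VIP
    virtual_router_id 51    # must match on all nodes in the same VRRP group
    priority 150            # highest priority wins; ties break on highest IP
    advert_int 1
    virtual_ipaddress {
        172.16.35.9         # the vip_address from group_vars
    }
}
```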

Add new features for v1.9.x

Todo list:

  • Add new integration test.
  • Update legacy addons and CNIs.
  • Support offline install.
  • Refactor some code.

Dashboard access

Setup completed with zero errors on CentOS 7 VMs running in VMware.
kubectl retrieves everything: get pods, services, describe, etc.
With kubectl proxy running,

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
gets me this:
Error: 'dial tcp 10.244.1.6:8443: connect: network is unreachable'
Trying to reach: 'https://10.244.1.6:8443/'

kubectl -n kube-system get po,svc
NAME READY STATUS RESTARTS AGE
pod/calico-node-7gvv8 2/2 Running 4 6h
pod/calico-node-gqxz2 2/2 Running 4 6h
pod/coredns-6d98b868c7-slk7m 1/1 Running 1 4h
pod/elasticsearch-logging-0 1/1 Running 1 3h
pod/fluentd-es-8tw67 1/1 Running 1 3h
pod/fluentd-es-q6fj6 1/1 Running 1 3h
pod/kibana-logging-56c4d58dcd-blcxh 1/1 Running 1 3h
pod/kube-proxy-7vqh5 1/1 Running 3 6h
pod/kube-proxy-vb2fs 1/1 Running 3 6h
pod/kubernetes-dashboard-6948bdb78-wb59s 1/1 Running 1 4h
pod/metrics-server-86bd9d7667-lwpg6 1/1 Running 1 4h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/calico-typha ClusterIP 10.111.173.6 5473/TCP 6h
service/elasticsearch-logging ClusterIP 10.111.176.201 9200/TCP 3h
service/kibana-logging ClusterIP 10.111.132.176 5601/TCP 3h
service/kube-controller-manager ClusterIP None 10252/TCP 1h
service/kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP,9153/TCP 6h
service/kube-scheduler ClusterIP None 10251/TCP 1h
service/kubelet ClusterIP None 10250/TCP 3h
service/kubernetes-dashboard ClusterIP 10.106.191.94 443/TCP 6h
service/metrics-server ClusterIP 10.103.82.224 443/TCP 6h

What else can I check?

nginx ingress controller

Hello,

first of all thank you for this amazing work,
I have a question about the ingress controller, I always get this:
curl -H "Host: game.domain1.io" 192.168.232.133
curl: (7) Failed connect to 192.168.232.133:80; Connection refused

thank you

kibana-logging service cannot access

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "services \"kibana-logging\" is forbidden: User \"system:anonymous\" cannot get services/proxy in the namespace \"kube-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "kibana-logging",
    "kind": "services"
  },
  "code": 403
}

cluster.yml

Hello,

The Ansible playbook cluster.yml cannot be run twice; otherwise the Kubernetes cluster falls into an error state.

Regards,
Maxence

VMs not reboot persistent - kubelet not starting because of swap

Hi,
thanks for sharing this!

After setting up a working cluster, the individual VMs are not reboot-safe. After a reboot, the kubelet would not start because swap was enabled. The playbooks do a 'swapoff -a', which is not persistent.

One way of disabling swap persistently would be to create a systemd service that turns it off before starting the kubelet. Like this:

# /etc/systemd/system/swapoff.service

[Unit]
Description=Swapoff, kubelet requirement
After=network.target
Before=kubelet.service

[Service]
Type=oneshot
ExecStart=/sbin/swapoff -a
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
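Alongside the systemd unit above, another persistent approach is to comment out the swap entry in /etc/fstab. Sketched here against a sample file so it is safe to run anywhere; on a real node, point the same sed (with sudo) at /etc/fstab:

```shell
# Build a sample fstab with a swap entry, then comment that entry out.
printf '%s\n' '/dev/sda1 / ext4 defaults 0 1' '/swapfile none swap sw 0 0' > /tmp/fstab.sample
sed -i '/ swap / s/^/#/' /tmp/fstab.sample
grep swap /tmp/fstab.sample   # → #/swapfile none swap sw 0 0
```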

Where would be the best place in the code to implement this? I would be happy to give a pull request a shot.

Cheers,
Thomas

Request

Hi there

This is an amazing Repository, thank you very much.

May I ask if you will ever add the option to use Weave as the CNI?

Cheers

Cert Sign Approval

Hi

When I use your repo to start a cluster, everything works great, but I have one issue.

When the cert signing for the master node happens, the request gets created but stays in a pending state; the script continues and completes.
When I log into the cluster, all the nodes are there except the master, the one whose cert signing is pending.
I reset the cluster and started the playbook again, this time watching for when the cert signing request was created, approved the request by hand, and that fixed the issue: I could see all nodes in the cluster.

What could cause the cert signing to be stuck in pending, so that I have to approve it by hand?

Thank you for the awesome work so far; any help will be greatly appreciated.

Cheers

Dashboard timeout error

After installing k8s, I try to open the dashboard (https://server-ip:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/) and get an error:
Error: 'dial tcp 10.244.0.24:8443: i/o timeout' Trying to reach: 'https://10.244.0.24:8443/'
Can you tell me what the problem might be?

Test cluster:
$ kubectl -n kube-system get po,svc
pod/calico-node-cwpt9 2/2 Running 0 14m
pod/coredns-896d9f87d-2nlfs 1/1 Running 0 15m
pod/coredns-autoscaler-58784cd54d-svlwv 1/1 Running 0 15m
pod/kube-proxy-lrj6c 1/1 Running 0 14m
pod/kubernetes-dashboard-57df4db6b-6w49h 1/1 Running 0 10m
pod/metrics-server-68d85f76bb-7n2n5 1/1 Running 0 8m15s

service/calico-typha ClusterIP 10.106.84.71 5473/TCP 14m
service/kube-controller-manager ClusterIP None 10252/TCP 8m37s
service/kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP,9153/TCP 15m
service/kube-scheduler ClusterIP None 10251/TCP 8m37s
service/kubelet ClusterIP None 10250/TCP 6m37s
service/kubernetes-dashboard ClusterIP 10.103.103.89 443/TCP 10m
service/metrics-server ClusterIP 10.106.81.208 443/TCP 8m15s

Prometheus alerts

KubeControllerManager and KubeScheduler work normally, but there are critical alerts.

(alert screenshot omitted)

Generation of certificates locally

Hello,

When there are several masters, it is absolutely necessary to generate the SSL certificates on the deploy host (localhost) and then distribute them to the different servers.
Otherwise, we end up with totally different certificates on the masters, and it does not work.

Maxence

One task is failing: Set taint to effect NoSchedule. Please help.

TASK [k8s-setup : Set taint to effect NoSchedule] **********************************************************************************************************************
Monday 15 October 2018 13:09:52 +0200 (0:00:00.353) 0:01:34.555 ********
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
fatal: [foundery02]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery02", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.088008", "end": "2018-10-15 13:10:14.851340", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:14.763332", "stderr": "Error from server (NotFound): nodes "Foundery02" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery02" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery03]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery03", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.085448", "end": "2018-10-15 13:10:15.293709", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:15.208261", "stderr": "Error from server (NotFound): nodes "Foundery03" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery03" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery04]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery04", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.086700", "end": "2018-10-15 13:10:16.130539", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:16.043839", "stderr": "Error from server (NotFound): nodes "Foundery04" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery04" not found"], "stdout": "", "stdout_lines": []}
...ignoring

etcd not work

Hello,

The etcd cluster does not work.

root@master1:~# etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
; error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

Maxence

CoreDNS cannot access external domain names.

I installed the cluster with the default parameters, but CoreDNS is not working properly.

Node OS: ubuntu 18.04

kubectl logs coredns-7945bc8d5c-qp5s7 -n kube-system --tail=100 -f

......
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. A: unreachable backend: read udp 10.244.2.27:34504->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:40291->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:42891->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 elasticsearch-logging. AAAA: unreachable backend: read udp 10.244.2.27:55194->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:36631->10.96.0.10:53: i/o timeout
......
kubectl describe pod coredns-7945bc8d5c-qp5s7 -n kube-system

......
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  55m               default-scheduler  Successfully assigned kube-system/coredns-7945bc8d5c-qp5s7 to k8s-n2
  Normal   Pulled     54m               kubelet, k8s-n2    Container image "192.168.21.29:5000/coredns:1.1.3" already present on machine
  Normal   Created    54m               kubelet, k8s-n2    Created container
  Normal   Started    54m               kubelet, k8s-n2    Started container
  Warning  Unhealthy  4m (x9 over 51m)  kubelet, k8s-n2    Liveness probe failed: Get http://10.244.2.27:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

ETCD enable and started failed

Hello,
I'm trying to use your code to set up an HA k8s cluster, but unfortunately I get this error:
fatal: [Master-k8s-2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded. See "systemctl status etcd.service" and "journalctl -xe" for details.\n"}

and when I check the logs I see: error: tls: bad certificate

do you have any idea why ?

Thank you for your reply

Ansible issue with if statement

Thanks for taking the time to put this together. I ran into this error:

fatal: [192.168.1.20]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}): '>=' not supported between instances of 'str' and 'int'"}

Ansible does like the comparison in this statement:

"{% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"

I believe it should be like this:

"{% if num_nodes | int >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"
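The failure can be reproduced outside Ansible: in Python 3 (which Jinja2 runs on), comparing a string to an int raises TypeError, which is exactly why the | int filter is needed before the comparison. A small demonstration:

```shell
python3 - <<'EOF'
num_nodes = "3"                      # extra-vars and some facts arrive as strings
try:
    num_nodes >= 0                   # what the original template effectively does
except TypeError as err:
    print("comparison failed:", err)
print("with the int filter:", int(num_nodes) >= 0)
EOF
```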

x509: certificate signed by unknown authority issue after reset cluster and reinstall

Hi, I hit the errors below. The first install succeeded; after resetting the cluster and installing again, I get:

fatal: [192.168.4.91 -> 192.168.4.91]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:02.393257", "end": "2019-07-16 09:45:25.181932", "msg": "non-zero return code", "rc": 1, "start": "2019-07-16 09:45:22.788675", "stderr": "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")\nunable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "stderr_lines": ["unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"], "stdout": "", "stdout_lines": []}

etcd SSL cert creation fails

TASK [cert : Create etcd SSL certificate key files] ***********************************************************************************************************************
Tuesday 12 February 2019 15:11:02 -0500 (0:00:00.194) 0:00:10.056 ******
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_default_ipv4'\n\nThe error appears to have been in '/home/mike/kube-ansible/roles/cert/tasks/create-etcd-certs.yml': line 49, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create etcd SSL certificate key files\n ^ here\n"}

Upgrade existing Cluster

Hello,

It's less an issue than a question: what are the best practices for upgrading a cluster, given that I used your playbook to deploy it?

Thank you

Deployment failure with ansible issue

I'm getting this error while deploying:

TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019  13:17:18 +0200 (0:00:00.666)       0:04:54.832 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_eth1'"}

I tried to fix this by adding ansible_eth1: eth1 to the group_vars. But no I'm getting this error:

TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019  13:30:26 +0200 (0:00:00.679)       0:04:42.554 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.parsing.yaml.objects.AnsibleUnicode object' has no attribute 'ipv4'"}

Please advise. Ansible used:

ansible 2.8.5
  config file = /Users/tdeutsch/git/kube-ansible/ansible.cfg
  configured module search path = ['/Users/tdeutsch/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible/2.8.5/libexec/lib/python3.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.7.4 (default, Sep  7 2019, 18:27:02) [Clang 10.0.1 (clang-1001.0.46.4)]

Could not start Kubernetes core components

Upon running the shell script ./hack/setup-vms, I get this error:

TASK [k8s-setup : Wait for Kubernetes core component start] ********************
Thursday 02 May 2019  14:12:40 -0400 (0:00:00.353)       0:03:15.687 **********
failed: [192.168.1.10] (item=6443) => {"changed": false, "elapsed": 300, "item": 6443, "msg": "Timeout when waiting for 127.0.0.1:6443"}

PLAY RECAP *********************************************************************
192.168.1.10 : ok=140 changed=77 unreachable=0 failed=1
192.168.1.12 : ok=28 changed=19 unreachable=0 failed=0
192.168.1.13 : ok=28 changed=19 unreachable=0 failed=0

Thursday 02 May 2019 14:19:43 -0400 (0:07:03.147) 0:10:18.835 **********

k8s-setup : Wait for Kubernetes core component start ------------------ 423.15s
download/package : Downloading kubelet file ---------------------------- 81.90s
download/package : Downloading kubectl file ---------------------------- 28.81s
download/package : Downloading docker file ----------------------------- 15.01s
download/package : Downloading cni file --------------------------------- 4.37s
download/package : Downloading cfssl file ------------------------------- 4.25s
download/package : Extract docker file ---------------------------------- 3.69s
download/package : Downloading etcd file -------------------------------- 3.27s
cert : Generate Kubernetes SSL certificate json files ------------------- 2.05s
cert : Create Kubernetes SSL certificate key files ---------------------- 1.82s
common/copy-files : Write the content of files -------------------------- 1.67s
k8s-setup : Copy Kubernetes manifest and config files into cluster ------ 1.49s
cert : Delete unnecessary Kubernetes files ------------------------------ 1.48s
download/package : Downloading cfssljson file --------------------------- 1.43s
common/copy-files : Check the files already exists ---------------------- 1.41s
common/copy-files : Read the config files ------------------------------- 1.28s
download/package : Extract cni file ------------------------------------- 1.24s
download/package : Symlinks docker to /usr/local/bin -------------------- 1.16s
download/package : Extract etcd file ------------------------------------ 0.99s
etcd : Enable and restart etcd ------------------------------------------ 0.93s
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
==> k8s-n3: The previous process exited with exit code 1.
==> k8s-n2: The previous process exited with exit code 1.
==> k8s-n1: The previous process exited with exit code 1.
==> k8s-m1: The previous process exited with exit code 1.
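The `wait_for` failure above means nothing answered on 127.0.0.1:6443 within 300 seconds, i.e. kube-apiserver never bound its port; the next step is usually checking the kubelet/apiserver container logs on the master. The probe Ansible performs is just a TCP connect, which can be sketched in plain Python (a hypothetical helper, not part of the playbook) to re-check the port by hand:

```python
import socket

def wait_for_port(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout,
    roughly what Ansible's wait_for module does for each listed port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out -- the service is not listening.
        return False
```

Running `wait_for_port("127.0.0.1", 6443)` on the master right after the failure tells you whether the API server came up late or never started at all.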

TLS issue

Hi

Firstly, thank you for the awesome work with this repo.

I am stuck in my process of creating a cluster.

The script runs fine until it gets to the "TASK [k8s-setup : Create kube-apiserver to kubelet RBAC]" part.

The error I'm getting:
"fatal: [172.24.49.100 -> 172.24.49.100]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:00.074493", "end": "2018-09-06 14:30:49.764857", "msg": "non-zero return code", "rc": 1, "start": "2018-09-06 14:30:49.690364", "stderr": "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config\nunable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "stderr_lines": ["unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"], "stdout": "", "stdout_lines": []}"

I'm also getting this error whenever I try to use kubectl:

Unable to connect to the server: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config

So it would seem that the issue is that nothing can successfully invoke kubectl against the API server.

Please, any help will be greatly appreciated.
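Note the empty host in `Get https://:8443/api` in the error above: the kubeconfig's `server:` URL has no address at all, which is exactly what triggers the "either ServerName or InsecureSkipVerify must be specified" TLS failure, and it suggests the API server/VIP address variable was empty when admin.conf was rendered. A small hedged sketch (generic stdlib Python, not part of this repo) for spotting such a URL:

```python
from urllib.parse import urlparse

def diagnose_server_url(url):
    """Return a short verdict on a kubeconfig `server:` URL."""
    parsed = urlparse(url)
    if not parsed.hostname:
        # e.g. "https://:8443" -- the apiserver/VIP address variable was empty
        return "empty host"
    return "host ok"
```

Checking the `server:` line in /etc/kubernetes/admin.conf and re-running the play with the cluster VIP variable set is the usual fix for this symptom.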

Default deployment using setup tool without prompt?

Is there a way to deploy the default environment (1b, 2w, 1c, 2048m...) without the "Start deploying? (y)" prompt?
I tried ./tools/setup --force, but it didn't work:

./tools/setup -f
./tools/func-vars: line 109: 2: unbound variable

I'd like to run kube-ansible from my shell script and deploy automatically.
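The `2: unbound variable` message is bash's `set -u` aborting because the script reads a second positional argument (`$2`) that was never passed; the usual fix inside the script is a `${2:-}` style default expansion. The same defaulting pattern, sketched in Python purely for illustration (the function name is hypothetical):

```python
def arg_or_default(argv, index, default=""):
    """Like bash's ${N:-default}: return argv[index] if present,
    otherwise a default, instead of aborting on a missing argument."""
    return argv[index] if len(argv) > index else default
```

Until the script guards its positional parameters this way, `-f`/`--force` invocations that supply fewer arguments than the prompt-driven path will keep tripping `set -u`.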

Upgrade existing cluster

Hello,

It's not so much an issue as a question: what is the best practice for upgrading a cluster, given that I used your playbook to deploy it?
@kairen
Thank you

Certificate error

Hello,

I had a certificate error when I use kubectl.
I do not know what went wrong; could it be something on my end?

(attached screenshot, 2017-10-04 00:16:44)

Regards,
Maxence

Dashboard isn't installed

Hello,

I tried this quick start method with 1 k8s master and 2 k8s nodes. The Kubernetes dashboard isn't available even though I executed the addons playbook as well.

```
root@master1:~# kubectl get nodes
NAME      STATUS    ROLES     AGE   VERSION
master1   Ready     master    33m   v1.9.1
node1     Ready     <none>    32m   v1.9.1
node2     Ready     master    33m   v1.9.1

root@master1:~# kubectl cluster-info
Kubernetes master is running at https://172.16.35.9:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

root@master1:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
kube-system   calico-node-92nfd                          2/2       Running   0          32m
kube-system   calico-node-gb8gv                          2/2       Running   0          32m
kube-system   calico-node-nbll8                          2/2       Running   0          32m
kube-system   calico-policy-controller-fb675cfbc-s9kpl   1/1       Running   0          32m
kube-system   haproxy-master1                            1/1       Running   0          32m
kube-system   haproxy-node2                              1/1       Running   0          32m
kube-system   keepalived-master1                         1/1       Running   0          32m
kube-system   keepalived-node2                           1/1       Running   0          32m
kube-system   kube-apiserver-master1                     1/1       Running   0          32m
kube-system   kube-apiserver-node2                       1/1       Running   0          33m
kube-system   kube-controller-manager-master1            1/1       Running   0          32m
kube-system   kube-controller-manager-node2              1/1       Running   0          32m
kube-system   kube-dns-74bf5c4b94-vhcc5                  3/3       Running   0          32m
kube-system   kube-proxy-7mjrf                           1/1       Running   0          32m
kube-system   kube-proxy-hm2bj                           1/1       Running   0          32m
kube-system   kube-proxy-ncqfs                           1/1       Running   0          32m
kube-system   kube-scheduler-master1                     1/1       Running   0          32m
kube-system   kube-scheduler-node2                       1/1       Running   0          32m
```
Could you guide me on where I am making a mistake, so that I can fix it and bring up the Kubernetes dashboard?

Trouble with setting up cluster

Hi there

During the Ansible run to create the cluster, I get stuck at "[k8s-setup : Wait for Kubernetes core component start]" with this error:
"11:14:24.197345 11914 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://:8443/api/v1/nodes?fieldSelector=metadata.name%3Dcptk8spoc01.capetown.fwslash.net&limit=500&resourceVersion=0: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"

I also found that the docker network is on 172.x.x.x, and so is our environment.

Any idea what this could be?
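The 172.x overlap mentioned above is a plausible contributor: docker's default bridge (172.17.0.0/16) colliding with the host LAN can make node-to-apiserver traffic route into the bridge and vanish. A quick, hedged way to check two CIDRs for overlap, using only Python's stdlib (not part of the playbook):

```python
import ipaddress

def overlaps(cidr_a, cidr_b):
    """True if two networks overlap, e.g. docker0's default
    172.17.0.0/16 against a host LAN somewhere in 172.x."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    return a.overlaps(b)
```

If they do overlap, moving the docker bridge (the `bip` option in `/etc/docker/daemon.json`) or choosing non-conflicting cluster CIDRs is the usual remedy. The empty-host `https://:8443` part of the error, though, matches the kubeconfig rendering problem reported in the TLS issue above.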
