
kubeadm-ha's People

Contributors

cruse123, fossabot, hxdnshx, lentil1016


kubeadm-ha's Issues

Grafana deployment

Hi, I deployed Grafana following your branch. How do I install plugins? Installing a plugin requires a restart, but once the pod is deleted and recreated the plugin is gone.

How is CoreDNS installed?

I don't see any CoreDNS files in the project, yet CoreDNS ends up installed when I test. How is that done? Could you write a few docs explaining it?
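For what it's worth, since v1.11 kubeadm deploys CoreDNS as the default cluster DNS add-on during kubeadm init, so the project does not need to ship a CoreDNS manifest of its own; a quick way to confirm this on a kubeadm-built cluster:

kubectl -n kube-system get deployment coredns
kubectl -n kube-system get configmap coredns -o yaml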

Unknown calico problem breaks NodePort; Traefik unusable

I completely deleted and recreated the VMs each time, and after three reinstalls the NodePort and Traefik failures still occur (I had installed many times before without this problem; it started with one of yesterday's reinstalls and has persisted ever since):

  1. NodePort: every node can be seen listening on the allocated NodePort, but the NodePort only works on the node where the pod runs; it does not work on any other node.
  2. Traefik behaves the same way: the service is only reachable when the domain name points to the node hosting the pod.

I suspect calico, but all pods including calico are Running with no visible problems, so I tried switching to flannel. Unfortunately, deleting calico with the project's yaml and installing flannel directly failed with sandbox errors. Looking closely at the project script, calico is applied before the HA setup is assembled, so I guessed that is why flannel could not be used. I deleted and rebuilt the VMs again, changed the script's apply of calico to apply flannel, and then NodePort and ingress both worked normally.

I'd like to ask the project owner: when you run k8s on a private/public cloud, which master should NodePort and Ingress point to for access from outside the cluster? Since a slave can easily go down, I don't think we should point at a slave, unless on a public cloud an SLB or other LoadBalancer points at all slaves with health checks.
Your solution uses 3 masters, so pointing at a single master won't do either. How do you handle it? Do you put the keepalived virtual IP on the public network, or place another haproxy in front of k8s?

How do I restart the services on each node?

How do I restart the services on each node?
If a server node goes down, how do I start it back up? After a restart, dashboard.multi.io reports a Gateway Timeout error.

Question about kube-proxy periodically refreshing IPVS rules in ipvs mode

A question about kube-proxy periodically refreshing the IPVS rules in ipvs mode:
I manually added some LVS rules and kube-proxy seems to clean them up. Why are they removed?
Doesn't kube-proxy only refresh part of the LVS rules? If I need to add some LVS rules by hand, how should I do it?
Thanks in advance!

kubernetes-dashboard cannot be accessed

I successfully deployed a 3-master k8s cluster following this method, but the dashboard cannot be accessed.
1. The kubernetes-dashboard logs are as follows:

# kubectl logs  kubernetes-dashboard-94b488ddb-tn9cg -n kube-system
2018/12/25 05:41:39 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/25 05:42:09 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/25 05:42:39 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.

2. The metrics-server logs:

# kubectl logs metrics-server-5bff75b59f-54t27 -n kube-system
Error from server (BadRequest): a container name must be specified for pod metrics-server-5bff75b59f-54t27, choose one of: [metrics-server metrics-server-nanny]
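That second error only means the metrics-server pod has two containers, so kubectl needs to be told which one; a minimal example, assuming you want the metrics-server container's logs:

# kubectl logs metrics-server-5bff75b59f-54t27 -n kube-system -c metrics-server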

1.14: dashboard is reachable but login fails

I installed version 1.14 and the installation went smoothly. The dashboard is reachable but I cannot log in. I also performed the deletion steps described in the article, still no luck. The browser console shows the request being sent. Is it an authentication problem? I logged in with the dashboard-admin-token and also tried .kube/config. Please help.

Question about certificates in the HA cluster

Sorry to bother you again with a question about HA cluster certificates. Back on v1.11, I generated 100-year certificates for each component (apiserver, etcd, and so on) on every master myself, and then used kubeadm alpha phase to mark the nodes as masters and the like.

Now that HA is done with kubeadm join, do I still need to do all that? I also used to add each master's etcd to the cluster by hand; does this command now take care of that for you?

kubernetes: master init node localhost.localdomain not found

Hi,
I am using kubeadm init to init one master node on a VM, but during the process the node "localhost.localdomain" not found problem always shows up.
My /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.5.130 localhost.localdomain

The Docker version is 18.06-ce, the Kubernetes version is v1.13.4.
The kernel is based on linux-4.9.137, self-compiled. These are all the modules the kernel has:
(screenshot of the kernel module list omitted)
Some solutions say that Kubernetes uses its own DNS to resolve the master name, not /etc/resolv.conf. Is that true?
(screenshot omitted)
This is the output of systemctl status kubelet -l:

Loaded: loaded (/kind/systemd/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-05-14 10:40:31 UTC; 4min 7s ago
Docs: http://kubernetes.io/docs/
Main PID: 1212 (kubelet)
CGroup: /system.slice/kubelet.service
└─1212 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --fail-swap-on=false

May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.097781 1212 kubelet.go:2266] node "localhost.localdomain" not found
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155161 1212 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155183 1212 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155192 1212 kuberuntime_manager.go:662] createPodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155223 1212 pod_workers.go:190] Error syncing pod a9b7dd24584050387fb803d4e69019ca ("kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown"
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168221 1212 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168243 1212 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168252 1212 kuberuntime_manager.go:662] createPodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168281 1212 pod_workers.go:190] Error syncing pod 4b52d75cab61380f07c0c5a69fb371d4 ("kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)"), skipping: failed to "CreatePodSandbox" for "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown"
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.198628 1212 kubelet.go:2266] node "localhost.localdomain" not found

The command is "kubeadm init --ignore-preflight-errors=all --config=kubeadm.conf --skip-token-print --v=6"

The content of kubeadm.conf is:
apiServer:
  certSANs:
  - localhost
apiVersion: kubeadm.k8s.io/v1beta1
clusterName: simple
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.13.4
metadata:
  name: config
---
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 6443
metadata:
  name: config
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
metadata:
  name: config
---
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
metadata:
  name: config
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metadata:
  name: config

Hello, running your script fails with an error when joining the cluster through the VIP

error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://10.16.10.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 10.16.10.50:6443: connect: connection refused

Then I checked the keepalived log and found the following errors:

Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: Remote Web server [10.16.10.91]:6443 succeed on service.
Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: Adding service [10.16.10.91]:6443 to VS [10.16.10.50]:6443
Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: IPVS (cmd 1159, errno 2): No such file or directory

I updated the kernel following your 1.14 setup post and enabled IPVS (see the module check after the cluster-info file).
This is my cluster-info file:

CP0_IP=10.16.10.91
CP1_IP=10.16.10.92
CP2_IP=10.16.10.93
VIP=10.16.10.50
NET_IF=eth0
CIDR=10.244.0.0/16
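The "IPVS (cmd 1159, errno 2): No such file or directory" line from keepalived usually means the ip_vs kernel modules are not actually loaded on that host; a minimal check, assuming a stock CentOS 7 shell on the master:

uname -r
lsmod | grep ip_vs               # expect ip_vs plus scheduler modules such as ip_vs_rr
modprobe ip_vs
modprobe ip_vs_rr
modprobe nf_conntrack_ipv4       # on 4.19+ kernels the module is named nf_conntrack
cat /proc/net/ip_vs              # only readable once ip_vs is loaded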

This error occurs with dual network interfaces

[root@test111 kubeadm-ha]# cat cluster-info

CP0_IP=192.168.56.104
CP0_HOSTNAME=test111
CP1_IP=192.168.56.105
CP1_HOSTNAME=test222
CP2_IP=192.168.56.107
CP2_HOSTNAME=test333
VIP=192.168.56.199
NET_IF=enp0s3
CIDR=192.168.0.0/16

(error screenshots omitted)

Dashboard unreachable after the master01 node is rebooted

Thank you very much for helping me a few days ago with accessing the dashboard console through ingress. Now I've hit a new problem: while testing master-node high availability, I found that after rebooting master01 the dashboard pod drifted to a worker node, and the dashboard could no longer be reached via the dashboard.multi.io domain. Is this a CoreDNS resolution problem?

Questions about using a k8s cluster built with this project

If I build a k8s cluster with this project and host a website on it, which IP should the domain name resolve to? The public IP of the machine holding the VIP? Is LVS only for high availability inside the cluster? Or should I set up another nginx/LVS outside to forward requests and point the domain at that nginx/LVS server's IP? Any suggestions?

Does keepalived + LVS actually work correctly?

I deployed keepalived + LVS following the author's setup.
1. persistence_timeout 50: isn't 50 seconds rather long, so that load balancing is never actually achieved?
2. Also, with curl -k https://10.130.29.83:6443 I only ever seem to reach the apiserver at 10.130.29.80; the other two are never hit, so round-robin does not happen (see the sketch below).
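With LVS persistence enabled, repeated connections from the same client IP are pinned to one real server until the persistence template expires, which would explain curl always landing on the same apiserver; a small way to inspect this, assuming ipvsadm is available on the director:

ipvsadm -Ln     # the virtual server line shows "persistent 50" when persistence is set
ipvsadm -Lnc    # connection table, including persistence template entries and their expiry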

kubectl commands take about half a minute to return results

Has anyone run into this? For example kubectl get nodes and kubectl get po --all-namespaces.
Environment: CentOS 7.5.1804, three VMs, each 4C8T.
All three masters behave this way. I looked around the k8s repo and nobody has reported this, so I'm asking here.
A v1.13.0 cluster does not have this problem.

How do I set the time zone for the k8s cluster?

Mainly for the containers the k8s system itself uses,
such as the scheduler, controller-manager and apiserver.
I tried configuring /etc/localtime on top of the official images, but it doesn't work very well.
Is there a simpler way?

A question about LVS in this project

ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 192.168.43.11:6443 Masq 1 1 0
-> 192.168.43.12:6443 Masq 1 0 0
-> 192.168.43.13:6443 Masq 1 3 0
TCP 10.96.0.10:53 rr
-> 10.233.0.2:53 Masq 1 0 0
-> 10.233.1.2:53 Masq 1 0 0
TCP 10.98.26.247:80 rr
-> 192.168.43.11:8080 Masq 1 0 0
-> 192.168.43.12:8080 Masq 1 0 0
-> 192.168.43.13:8080 Masq 1 0 0
TCP 10.99.198.122:443 rr
-> 10.233.1.5:8443 Masq 1 0 0
TCP 10.100.48.93:80 rr
-> 10.233.1.4:8082 Masq 1 0 0
TCP 10.100.136.4:5473 rr
TCP 10.105.169.229:80 rr
-> 10.233.2.2:3000 Masq 1 0 0
TCP 10.107.12.250:8083 rr
-> 10.233.2.2:8083 Masq 1 0 0
TCP 10.107.12.250:8086 rr
-> 10.233.2.2:8086 Masq 1 0 0
UDP 10.96.0.10:53 rr
-> 10.233.0.2:53 Masq 1 0 0
-> 10.233.1.2:53 Masq 1 0 0
My question about LVS in this project: in LVS DR mode, forwarding requires the same subnet and the same port, yet here LVS forwards across different subnets and different ports. How is that achieved? If you know, please reply. Thanks.

Errors even though my files and steps match your article

name: Invalid value: "master_1": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is 'a-z0-9?(.a-z0-9?)*')
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
failed to load admin kubeconfig: open /root/.kube/config: no such file or directory
/etc/kubernetes/pki/ca.crt: No such file or directory
/etc/kubernetes/pki/ca.key: No such file or directory
/etc/kubernetes/pki/sa.key: No such file or directory
/etc/kubernetes/pki/sa.pub: No such file or directory
/etc/kubernetes/pki/front-proxy-ca.crt: No such file or directory
/etc/kubernetes/pki/front-proxy-ca.key: No such file or directory
/etc/kubernetes/pki/etcd/ca.crt: No such file or directory
/etc/kubernetes/pki/etcd/ca.key: No such file or directory
/etc/kubernetes/admin.conf: No such file or directory
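The first error above is the root cause: the hostname master_1 contains an underscore and is not a valid DNS-1123 subdomain, so kubeadm aborts before any of the files listed afterwards are created. A minimal fix before re-running the script, assuming CentOS 7 with systemd:

hostnamectl set-hostname master-1   # repeat on each node with its own, underscore-free name
# then update /etc/hosts and the CPx_HOSTNAME values in cluster-info to match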

The CIDR setting is not appropriate

The author sets the pod network CIDR to CIDR=172.168.0.0/16,
but this range is not a private network range. There are three private ranges:
Class A: 10.0.0.0/8
Class B: 172.16.0.0/12 --> 172.16.0.0/16 to 172.31.0.0/16
Class C: 192.168.0.0/16
172.168.0.0/16 is not a subset of any of these, which means it overlaps with public Internet addresses, so pods will be unable to reach some public addresses.
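A minimal sketch of a corrected cluster-info entry, assuming the commonly used default pod range:

CIDR=10.244.0.0/16    # any subnet inside 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 works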

How to stop and start kube-scheduler on a master

I deployed three VMs with this project and everything works. On one of them I ran systemctl stop docker, then
ps aux | grep etc
and I still see processes whose names start with kube-scheduler and kube-controller-manager.
How do I shut them down properly? And after I kill the pid, how do I start them again?
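For context, kube-scheduler and kube-controller-manager run as static pods that the kubelet manages from /etc/kubernetes/manifests, so killing the process only makes the kubelet restart it; a sketch of stopping and starting one cleanly, assuming the default manifest path:

mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/    # kubelet notices and stops the static pod
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/    # kubelet starts it again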

kubelet problem on version 1.14.0

kubelet problem on 1.14.0: the system log fills up with large numbers of messages like the following.
Active: active (running) since Tue 2019-04-16 22:54:30 CST; 1h 6min ago
Docs: https://kubernetes.io/docs/
Main PID: 859 (kubelet)
Tasks: 22
Memory: 130.2M
CGroup: /system.slice/kubelet.service
└─859 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1

Apr 17 00:01:08 k8s-m1 kubelet[859]: W0417 00:01:08.884515 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_22369_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
Apr 17 00:01:08 k8s-m1 kubelet[859]: W0417 00:01:08.884549 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22369_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22369_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.693640 859 container.go:523] Failed to update stats for container "/libcontainer_22469_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_22469_systemd_test_default.slice/cpuacct.stat: no such file or directory, continuing to push stats
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.717462 859 container.go:523] Failed to update stats for container "/libcontainer_22475_systemd_test_default.slice": failed to parse memory.memsw.usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_22475_systemd_test_default.slice/memory.memsw.usage_in_bytes: no such device, continuing to push stats
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.748654 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.749910 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752770 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752807 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752824 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:18 k8s-m1 kubelet[859]: W0417 00:01:18.882369 859 container.go:523] Failed to update stats for container "/libcontainer_22530_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_22530_systemd_test_default.slice/cpuacct.usage_percpu: no such file or directory, continuing to push stats

How can this be resolved?

A question about port 443 in this project

ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.2.55:6443           Masq    1      2          0         
  -> 192.168.2.56:6443           Masq    1      1          0         
  -> 192.168.2.57:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.205.1.3:53                Masq    1      0          0         
  -> 10.205.1.4:53                Masq    1      0          0

LVS forwards port 443 to port 6443 on the other nodes.
Traefik also uses ports 80 and 443; won't that conflict?
Where does the port 443 used by LVS come from, and how can it be changed?
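For reference, 10.96.0.1:443 is not a host port: it is the ClusterIP of the default kubernetes Service, which kube-proxy in ipvs mode programs as an IPVS virtual server forwarding to the apiservers on 6443, so it does not collide with Traefik binding 80/443 on the hosts; a quick way to see where it comes from, assuming kubectl access:

kubectl get svc kubernetes -n default          # ClusterIP 10.96.0.1, port 443
kubectl get endpoints kubernetes -n default    # the :6443 backends seen in ipvsadm -Ln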

install-cni: Docker container won't start

ls: /calico-secrets: No such file or directory
Wrote Calico CNI binaries to /host/opt/cni/bin
CNI plugin version: v3.5.3
/host/secondary-bin-dir is non-writeable, skipping
Using CNI config template from CNI_NETWORK_CONFIG environment variable.
CNI config: {
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "master1",
      "mtu": 1440,
      "ipam": {
        "type": "host-local",
        "subnet": "usePodCidr"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}
Created CNI config 10-calico.conflist
Done configuring CNI.  Sleep=false

Certificate error when installing the HA masters

I'm doing a fully offline install, with the script downloaded and run locally. Installing the HA masters fails with:
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.240.200:6443"
[discovery] Failed to request cluster info, will try again: [Get https://192.168.240.200:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 192.168.240.200:6443: connect: connection refused]

A local curl shows a certificate error:
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).

[root@localhost ~]# cat cluster-info

CP0_IP=192.168.240.201
CP1_IP=192.168.240.202
CP2_IP=192.168.240.203
VIP=192.168.240.200
NET_IF=ens33
CIDR=10.244.0.0/16

What could be the cause?

The 4.19 kernel makes it impossible to enable IPVS

Because the latest stable 4.19 kernel renamed nf_conntrack_ipv4 to nf_conntrack, the current kube-proxy cannot enable ipvs on a 4.19 kernel.
For details see kubernetes/kubernetes#70304.
The fix was only merged into the main branch on October 30, so no Kubernetes release containing it has been published yet.
Readers can either install the 4.18 kernel I provide, or leave IPVS disabled.
4.18 kernel RPM download link: https://pan.baidu.com/s/1dCeozuMRQ96MBBjGpf0cjA (extraction code: 3nqg)
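A minimal way to see which side of the rename a node is on, assuming shell access:

uname -r
# before 4.19 this module exists; from 4.19 onward it was merged into nf_conntrack
modprobe nf_conntrack_ipv4 || modprobe nf_conntrack
lsmod | grep nf_conntrack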

Why is the 4.18 kernel required?

Is it because kube-proxy can only support ipvs on a 4.18 kernel?
The CentOS Linux 7 (Core) default kernel is 3.10.0-693.el7.x86_64.
Is that one unsupported? Would patching 3.10.0-693.el7.x86_64 also work?

One more question: I don't see traefik in the cluster architecture summary. Is it on the load-balancing layer together with LVS?

How to add an apiserver IP after the k8s cluster is installed

A question: the k8s cluster is already installed and there is a new apiserver IP to add. How do I add the apiserver IP?
kubernetesVersion: v1.12.1
apiServerCertSANs:
- ${CP0_IP}
- ${CP1_IP}
- ${CP2_IP}
- ${CP0_HOSTNAME}
- ${CP1_HOSTNAME}
- ${CP2_HOSTNAME}
- ${VIP}

Or is the only option to reset the k8s cluster and reinitialize? There should be a way to add an apiserver IP directly, right?
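One possible approach (a sketch, not the project's documented procedure) is to regenerate only the apiserver serving certificate with the extra SAN; kubeadm v1.13+ exposes this as kubeadm init phase certs apiserver, while on v1.12 the equivalent lives under kubeadm alpha phase certs. The kubeadm-config.yaml below is a hypothetical file holding your ClusterConfiguration with the new entry added to the cert SANs:

mkdir -p /root/pki-backup
mv /etc/kubernetes/pki/apiserver.{crt,key} /root/pki-backup/
kubeadm init phase certs apiserver --config kubeadm-config.yaml   # hypothetical config with the new SAN
# restart the apiserver static pod so it picks up the new certificate (repeat on every master)
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/ && sleep 5 \
  && mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/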

How can traefik be replaced with nginx? Please advise

How can traefik be replaced with nginx? I'd like to swap the traefik-ingress-controller for ingress-nginx; is that possible?
Or could the project support choosing between traefik and nginx?

kubeadm join "VIP" question

Hi, for a cluster generated by this script, how can the VIP be used as the join IP for masters? As shown below, after the script finishes, the printed join command uses the IP of one of the masters. Can the IP used to join the masters be changed to the VIP, and if so, how? Thanks.

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

kubeadm join 10.23.0.100:6443 --token hl1tr9.kbdvliosudxkv0ad --discovery-token-ca-cert-hash sha256:3045a014cbf0958669d1a0023ac7fc5f5795836948f00ddbadaa58a080bd7915

Waiting for etcd bootup...

When running join, it reports that the configmap "cluster-info" is forbidden

Script location: https://github.com/Lentil1016/kubeadm-ha/blob/b2c2e5be3056babb3389b47db6f9a9b11041c3d1/kubeha-gen.sh#L172

Error output:

front-proxy-ca.crt                                     100% 1038   761.9KB/s   00:00
front-proxy-ca.key                                     100% 1679   861.7KB/s   00:00
ca.crt                                                 100% 1017   543.4KB/s   00:00
ca.key                                                 100% 1679   659.8KB/s   00:00
admin.conf                                             100% 5444     2.0MB/s   00:00
admin.conf                                             100% 5444     3.4MB/s   00:00
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.1.1.8:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.1.1.8:6443"
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]

After the error:

  1. Running kubectl get all -n kube-public shows that there are no resources in that namespace.
  2. In /var/lib/kubelet/config.yaml (generated while the script runs) there is an anonymous: false setting; an excerpt:
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt

So kube-public has no cluster-info, and anonymous access is not allowed either. Did you run into this while creating the cluster, and is there a solution?
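For reference, kubeadm init normally creates the cluster-info ConfigMap in kube-public together with RBAC rules that let anonymous users read it (the anonymous: false in /var/lib/kubelet/config.yaml is the kubelet's own setting and does not affect join, which talks to the apiserver); a quick way to check what actually exists, assuming admin access on the first master:

kubectl -n kube-public get configmap cluster-info -o yaml
kubectl -n kube-public get role,rolebinding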

kubeadm-ha: problem verifying DNS and the pod network

Cluster nodes

[root@k8s-m1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
[root@k8s-m1 ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
k8s-m1   Ready    master   115m   v1.13.0
k8s-m2   Ready    master   115m   v1.13.0
k8s-m3   Ready    master   114m   v1.13.0
k8s-n1   Ready    <none>   19m    v1.13.0

[root@k8s-m1 ~]# kubectl get pods -n kube-system -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP               NODE     NOMINATED NODE   READINESS GATES
calico-node-4q6ls                      2/2     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
calico-node-5xcg6                      2/2     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
calico-node-9jn4z                      2/2     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
calico-node-ngm2t                      2/2     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
coredns-86c58d9df4-nk9jl               1/1     Running   0          115m   10.244.0.2       k8s-m1   <none>           <none>
coredns-86c58d9df4-z8wtn               1/1     Running   0          115m   10.244.0.3       k8s-m1   <none>           <none>
etcd-k8s-m1                            1/1     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
etcd-k8s-m2                            1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
etcd-k8s-m3                            1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-apiserver-k8s-m1                  1/1     Running   0          114m   192.168.88.111   k8s-m1   <none>           <none>
kube-apiserver-k8s-m2                  1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-apiserver-k8s-m3                  1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-controller-manager-k8s-m1         1/1     Running   1          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-controller-manager-k8s-m2         1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-controller-manager-k8s-m3         1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-proxy-49ds9                       1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-proxy-grbhk                       1/1     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
kube-proxy-n6pxn                       1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-proxy-vdgwk                       1/1     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-scheduler-k8s-m1                  1/1     Running   1          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-scheduler-k8s-m2                  1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-scheduler-k8s-m3                  1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kubernetes-dashboard-94b488ddb-qns8h   1/1     Running   0          114m   10.244.0.5       k8s-m1   <none>           <none>
metrics-server-5bff75b59f-mhs8d        2/2     Running   0          78m    10.244.2.2       k8s-m3   <none>           <none>
traefik-ingress-controller-859dl       1/1     Running   0          114m   192.168.88.111   k8s-m1   <none>           <none>
traefik-ingress-controller-fnt49       1/1     Running   0          114m   192.168.88.112   k8s-m2   <none>           <none>
traefik-ingress-controller-tctl4       1/1     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
traefik-ingress-controller-tmqn2       1/1     Running   0          113m   192.168.88.113   k8s-m3   <none>           <none>

Test procedure

kubectl create deployment nginx --image=nginx:alpine

kubectl get pods -l app=nginx -o wide

kubectl scale deployment nginx --replicas=2

kubectl expose deployment nginx --port=80 --type=NodePort

kubectl get services nginx

NodePort load balancing works fine.

Verifying DNS and the pod network, however, fails.

[root@k8s-m1 ~]#  kubectl run -it curl --image=radial/busyboxplus:curl
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-66959f6557-q9964:/ ]$ nslookup nginx
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 198.18.4.134
[ root@curl-66959f6557-q9964:/ ]$ curl http://198.18.4.134
curl: (56) Recv failure: Connection reset by peer
[ root@curl-66959f6557-q9964:/ ]$ curl http://nginx
curl: (56) Recv failure: Connection reset by peer
[ root@curl-66959f6557-q9964:/ ]$ exit
Session ended, resume using 'kubectl attach curl-66959f6557-q9964 -c curl -i -t' command when the pod is running

Is that 198.18 address coming from calico??

When running the 1.12.1 deployment script with podSubnet set to 10.244.0.0/16, containers sometimes come up in the 172.17.0.0/16 range

This problem can randomly lead to the following consequences:

  • the calico-node containers never become ready
  • containers such as dashboard and heapster only become reachable after being restarted
  • coredns is started in the 172.17 range, so service names cannot be resolved

A typical occurrence of the failure looks like this:

kubectl get pods -o wide -n kube-system                                  
NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE
calico-node-hc6k7                                 1/2     Running   0          3m33s   10.130.29.82   centos-7-x86-64-29-82   <none>
calico-node-jksjw                                 1/2     Running   0          3m33s   10.130.29.81   centos-7-x86-64-29-81   <none>
calico-node-xbblx                                 1/2     Running   1          3m33s   10.130.29.80   centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-mb5xc                          1/1     Running   0          5m45s   10.244.0.4     centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-tzw72                          1/1     Running   0          5m45s   10.244.0.5     centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-80                        1/1     Running   0          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-81                        1/1     Running   0          102s    10.130.29.81   centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-82                        1/1     Running   0          2m2s    10.130.29.82   centos-7-x86-64-29-82   <none>
heapster-v1.5.4-65ff99c48d-25xvw                  2/2     Running   0          3m17s   172.17.0.2     centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-80              1/1     Running   0          2m51s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-81              1/1     Running   3          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-82              1/1     Running   0          106s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-controller-manager-centos-7-x86-64-29-80     1/1     Running   2          4m48s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-81     1/1     Running   1          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-controller-manager-centos-7-x86-64-29-82     1/1     Running   0          104s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-4l7fg                                  1/1     Running   0          3m55s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-proxy-hd2k5                                  1/1     Running   0          3m39s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-mgttp                                  1/1     Running   0          5m45s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-80              1/1     Running   2          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-81              1/1     Running   1          3m17s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-scheduler-centos-7-x86-64-29-82              1/1     Running   0          107s    10.130.29.82   centos-7-x86-64-29-82   <none>
kubernetes-dashboard-6b475b66b5-d7d78             1/1     Running   0          28s     10.244.0.6     centos-7-x86-64-29-80   <none>
monitoring-influxdb-grafana-v4-65cc9bb8c8-2zcbz   2/2     Running   0          3m17s   172.17.0.3     centos-7-x86-64-29-81   <none>
traefik-ingress-controller-2wkn9                  1/1     Running   0          3m22s   10.130.29.82   centos-7-x86-64-29-82   <none>
traefik-ingress-controller-jkxvg                  1/1     Running   0          3m22s   10.130.29.81   centos-7-x86-64-29-81   <none>
traefik-ingress-controller-kb9v4                  1/1     Running   0          2m19s   10.130.29.80   centos-7-x86-64-29-80   <none>
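The 172.17.0.x addresses are Docker's default docker0 bridge subnet, which suggests those pods were started before the CNI configuration was ready on that node; a recovery sketch once calico-node is Running, using the pod names from the listing above:

kubectl -n kube-system delete pod heapster-v1.5.4-65ff99c48d-25xvw monitoring-influxdb-grafana-v4-65cc9bb8c8-2zcbz
kubectl -n kube-system get pods -o wide | grep 172.17    # should return nothing once the pods are recreated on the pod network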

Switching the network module to flannel

I replaced calico with flannel in kubeha-gen.sh. In a two-node cluster, one master is Ready and the other stays NotReady. On inspection, the flannel pod is only created and running on the healthy node. The flannel yml file is below.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d

Shouldn't a DaemonSet create a corresponding pod on every node?
I have tried creating flannel again in the already-built cluster, but it always errors out saying the resources already exist.
Is there any way to solve this?
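A couple of hedged diagnostic commands that usually narrow this down, checking the DaemonSet's own status/events and whether the NotReady node carries a taint or an arch label that does not match the nodeSelector (replace <not-ready-node> with the actual node name):

kubectl -n kube-system get daemonset kube-flannel-ds -o wide
kubectl -n kube-system describe daemonset kube-flannel-ds | grep -A 5 Events
kubectl describe node <not-ready-node> | grep -iE 'taint|beta.kubernetes.io/arch'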

In a 3-node k8s HA setup, is there an impact when two nodes go down?

According to the HA kubernetes 1.11.0/1.12.1 cluster architecture diagram, with 3 masters, losing two should have no impact. In my test the dashboard page returns Bad Gateway. Shouldn't everything fail over to the healthy node?
The VIP did fail over to the healthy node.

keepalived test: the VIP does not fail over properly; as soon as the primary node is up it holds the VIP

[root@node2 ~]# ip addr|grep inet|grep ens32
    inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens32
[root@node2 ~]# for i in 10 11 12 15; do echo -n 192.168.0.$i"-->" ;curl https://192.168.0.$i:6443/healthz -k -m 3 -s  ;echo; done
192.168.0.10-->
192.168.0.11-->
192.168.0.12-->ok
192.168.0.15-->ok
[root@node2 ~]# ip addr|grep inet|grep ens32
    inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens32
[root@node2 ~]# ssh 192.168.0.11 ip addr|grep inet|grep ens32
    inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens32
    inet 192.168.0.10/32 scope global ens32
[root@node2 ~]# ssh 192.168.0.11 lsof -i:6443
[root@node2 ~]# lsof -i:6443 |head -1
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
[root@node2 ~]# lsof -i:6443 |head -3
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kube-apis 5112 root    3u  IPv6  78963      0t0  TCP *:sun-sr-https (LISTEN)
kube-apis 5112 root    6u  IPv6 225893      0t0  TCP node:sun-sr-https->node2:45511 (ESTABLISHED)
[root@node2 ~]#

A question about the installation images

I read the author's kubeadm HA master (v1.11.0) cluster setup guide, which mentions that
Harbor only supports {hostname}/{project}/{image}:{tag}, so images like k8s.gcr.io/pause:3.1 cannot be pushed into it.
Could the author's install script and yaml manifests be changed to the {hostname}/{project}/{image}:{tag} format?
Is there a way to do it? If not, why not? Is it a restriction in the upstream install yaml? I don't understand this part, please advise.

initial master timeout 151.101.132.133:443: i/o timeout

Strangely, while the script was running it seemed to access this server, apparently a public IP. Does the script need to fetch something from this IP?

[markmaster] Marking the node k8s-master03 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node k8s-master03 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
Cluster create finished.

Email Address []:[email protected]
secret/ssl created
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
^C

K8S HA failover test problem

I deployed the cluster following the docs.
1. Shutting a management node down with power off or shutdown: the apiserver fails over within 10 s.
2. Cutting power to the physical machine directly, or powering off a VM from the console: the apiserver becomes unavailable for more than 5 minutes, although the virtual IP fails over normally (this happens when powering off the node that kube-controller-manager and kube-scheduler elected as leader). Please help confirm.

With the script-based deployment, only one master node comes up.

Cluster node configuration:

[root@k8s-m1 kubeadm-ha]# ./kubeha-gen.sh

cluster-info:
  master-01:        192.168.88.111
  master-02:        192.168.88.112
  master-02:        192.168.88.113
  VIP:              192.168.88.110
  Net Interface:    eth1
  CIDR:             10.244.0.0/16

Logs

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

keepalived-1.conf                                                                                                100% 1266     1.1MB/s   00:00
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[preflight] running pre-flight checks
[reset] no etcd config found. Assuming external etcd
[reset] please manually reset etcd to prevent further issues
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

keepalived-2.conf                                                                                                100% 1266   658.7KB/s   00:00
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[preflight] running pre-flight checks
[reset] no etcd config found. Assuming external etcd
[reset] please manually reset etcd to prevent further issues
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

[init] Using Kubernetes version: v1.13.0
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m1 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m1 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 21.505018 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-m1" as an annotation
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: r7p6k3.6rifm9nykjsqabh4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.88.110:6443 --token r7p6k3.6rifm9nykjsqabh4 --discovery-token-ca-cert-hash sha256:937c26b6ffa84b341ac95122fedc1fe3e5aeba4677ad36cf6b4adb9cd6c693d8

clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
configmap/calico-config created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
ca.crt                                                                                                           100% 1025   494.9KB/s   00:00
ca.key                                                                                                           100% 1675     1.3MB/s   00:00
sa.key                                                                                                           100% 1675     1.3MB/s   00:00
sa.pub                                                                                                           100%  451   354.7KB/s   00:00
front-proxy-ca.crt                                                                                               100% 1038     1.1MB/s   00:00
front-proxy-ca.key                                                                                               100% 1675     1.4MB/s   00:00
ca.crt                                                                                                           100% 1017   835.0KB/s   00:00
ca.key                                                                                                           100% 1675     1.5MB/s   00:00
admin.conf                                                                                                       100% 5454     1.5MB/s   00:00
admin.conf                                                                                                       100% 5454     4.7MB/s   00:00
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Failed to connect to API Server "192.168.88.110:6443": token id "8zioq9" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Failed to connect to API Server "192.168.88.110:6443": token id "8zioq9" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Requesting info from "https://192.168.88.110:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.88.110:6443"
[discovery] Successfully established connection with API Server "192.168.88.110:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[join] Running pre-flight checks before initializing the new control plane instance
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m2 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m2 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m2 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Using existing up-to-date kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Checking Etcd cluster health
error syncing endpoints with etc: dial tcp 10.0.2.15:2379: connect: connection refused
ca.crt                                                                                                           100% 1025   705.4KB/s   00:00
ca.key                                                                                                           100% 1675     1.6MB/s   00:00
sa.key                                                                                                           100% 1675     1.0MB/s   00:00
sa.pub                                                                                                           100%  451   410.0KB/s   00:00
front-proxy-ca.crt                                                                                               100% 1038   887.2KB/s   00:00
front-proxy-ca.key                                                                                               100% 1675     1.2MB/s   00:00
ca.crt                                                                                                           100% 1017   676.7KB/s   00:00
ca.key                                                                                                           100% 1675     1.6MB/s   00:00
admin.conf                                                                                                       100% 5454     3.2MB/s   00:00
admin.conf                                                                                                       100% 5454     3.6MB/s   00:00
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Requesting info from "https://192.168.88.110:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.88.110:6443"
[discovery] Successfully established connection with API Server "192.168.88.110:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[join] Running pre-flight checks before initializing the new control plane instance
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m3 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m3 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m3 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Using existing up-to-date kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Checking Etcd cluster health
error syncing endpoints with etc: dial tcp 10.0.2.15:2379: connect: connection refused
Cluster create finished.
Generating a 4096 bit RSA private key
..............++
..++
writing new private key to '/root/ikube/tls/tls.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) []:CN
State or Province Name (full name) []:Beijing
Locality Name (eg, city) []:Haidian
Organization Name (eg, company) []:Channelsoft
Organizational Unit Name (eg, section) []:R & D Department
Common Name (eg, your name or your server's hostname) []:*.multi.io
Email Address []:[email protected]
secret/ssl created
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
configmap/metrics-server-config created
deployment.extensions/metrics-server created
service/metrics-server created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-key-holder created
serviceaccount/kubernetes-dashboard-admin created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-admin created
clusterrole.rbac.authorization.k8s.io/cluster-watcher created
deployment.apps/kubernetes-dashboard created
service/kubernetes-dashboard created
ingress.extensions/dashboard created
Plugin install finished.
Plugin install finished.
Waiting for all pods into 'Running' status. You can press 'Ctrl + c' to terminate this waiting any time you like.

NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
NAME     STATUS   ROLES    AGE    VERSION
k8s-m1   Ready    master   119s   v1.13.0
NAME                                   READY   STATUS    RESTARTS   AGE
calico-node-gnfsn                      2/2     Running   0          100s
coredns-86c58d9df4-nfdf4               1/1     Running   0          100s
coredns-86c58d9df4-qdsz5               1/1     Running   0          100s
etcd-k8s-m1                            1/1     Running   0          66s
kube-apiserver-k8s-m1                  1/1     Running   0          51s
kube-controller-manager-k8s-m1         1/1     Running   0          59s
kube-proxy-ht6pq                       1/1     Running   0          100s
kube-scheduler-k8s-m1                  1/1     Running   0          44s
kubernetes-dashboard-94b488ddb-7nznf   1/1     Running   0          42s
metrics-server-64d45d7bd7-fkxz6        2/2     Running   0          10s

Question: what to do when one node in the k8s cluster fails

In a three-node k8s cluster, if one node fails and cannot be recovered, how do I add a replacement master?
Do I have to reinitialize and regenerate the certificates? That would surely affect the applications running on the cluster.
I'm on v1.12.X; is there a good way to handle this on v1.12.X or v1.13.X?
