
kubeadm-ha's People

Contributors

cruse123, fossabot, hxdnshx, lentil1016


kubeadm-ha's Issues

Grafana deployment

Hi, I deployed Grafana following your branch. How do I install plugins? Installing a plugin requires a restart, but once the pod is deleted and recreated the plugin is gone.

How is CoreDNS installed?

I don't see any CoreDNS files in the project, yet CoreDNS ends up installed when I test. How is that done? Could you write a few docs explaining it?
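For what it's worth, since v1.11 kubeadm deploys CoreDNS as the default cluster DNS add-on during kubeadm init, so the project does not need to ship a CoreDNS manifest of its own; a quick way to confirm this on a kubeadm-built cluster:

kubectl -n kube-system get deployment coredns
kubectl -n kube-system get configmap coredns -o yaml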

Unknown calico problem breaks NodePort; Traefik unusable

I completely deleted and recreated the VMs each time, and after three reinstalls the NodePort and Traefik failures still occur (I had installed many times before without this problem; it started with one of yesterday's reinstalls and has persisted ever since):

  1. NodePort: every node can be seen listening on the allocated NodePort, but the NodePort only works on the node where the pod runs; it does not work on any other node.
  2. Traefik behaves the same way: the service is only reachable when the domain name points to the node hosting the pod.

I suspect calico, but all pods including calico are Running with no visible problems, so I tried switching to flannel. Unfortunately, deleting calico with the project's yaml and installing flannel directly failed with sandbox errors. Looking closely at the project script, calico is applied before the HA setup is assembled, so I guessed that is why flannel could not be used. I deleted and rebuilt the VMs again, changed the script's apply of calico to apply flannel, and then NodePort and ingress both worked normally.

I'd like to ask the project owner: when you run k8s on a private/public cloud, which master should NodePort and Ingress point to for access from outside the cluster? Since a slave can easily go down, I don't think we should point at a slave, unless on a public cloud an SLB or other LoadBalancer points at all slaves with health checks.
Your solution uses 3 masters, so pointing at a single master won't do either. How do you handle it? Do you put the keepalived virtual IP on the public network, or place another haproxy in front of k8s?

How do I restart the services on each node?

How do I restart the services on each node?
If a server node goes down, how do I start it back up? After a restart, dashboard.multi.io reports a Gateway Timeout error.

Question about kube-proxy periodically refreshing IPVS rules in ipvs mode

A question about kube-proxy periodically refreshing the IPVS rules in ipvs mode:
I manually added some LVS rules and kube-proxy seems to clean them up. Why are they removed?
Doesn't kube-proxy only refresh part of the LVS rules? If I need to add some LVS rules by hand, how should I do it?
Thanks in advance!

kubernetes-dashboard cannot be accessed

I successfully deployed a 3-master k8s cluster following this method, but the dashboard cannot be accessed.
1. The kubernetes-dashboard logs are as follows:

# kubectl logs  kubernetes-dashboard-94b488ddb-tn9cg -n kube-system
2018/12/25 05:41:39 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/25 05:42:09 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/25 05:42:39 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.

2. The metrics-server logs:

# kubectl logs metrics-server-5bff75b59f-54t27 -n kube-system
Error from server (BadRequest): a container name must be specified for pod metrics-server-5bff75b59f-54t27, choose one of: [metrics-server metrics-server-nanny]
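That second error only means the metrics-server pod has two containers, so kubectl needs to be told which one; a minimal example, assuming you want the metrics-server container's logs:

# kubectl logs metrics-server-5bff75b59f-54t27 -n kube-system -c metrics-server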

1.14: dashboard is reachable but login fails

I installed version 1.14 and the installation went smoothly. The dashboard is reachable but I cannot log in. I also performed the deletion steps described in the article, still no luck. The browser console shows the request being sent. Is it an authentication problem? I logged in with the dashboard-admin-token and also tried .kube/config. Please help.

Question about certificates in the HA cluster

Sorry to bother you again with a question about HA cluster certificates. Back on v1.11, I generated 100-year certificates for each component (apiserver, etcd, and so on) on every master myself, and then used kubeadm alpha phase to mark the nodes as masters and the like.

Now that HA is done with kubeadm join, do I still need to do all that? I also used to add each master's etcd to the cluster by hand; does this command now take care of that for you?

kubernetes: master init node localhost.localdomain not found

Hi,
I am using kubeadm init to init one master node on a VM, but during the process the node "localhost.localdomain" not found problem always shows up.
My /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.5.130 localhost.localdomain

The Docker version is 18.06-ce, the Kubernetes version is v1.13.4.
The kernel is based on linux-4.9.137, self-compiled. These are all the modules the kernel has:
(screenshot of the kernel module list omitted)
Some solutions say that Kubernetes uses its own DNS to resolve the master name, not /etc/resolv.conf. Is that true?
(screenshot omitted)
This is the output of systemctl status kubelet -l:

Loaded: loaded (/kind/systemd/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-05-14 10:40:31 UTC; 4min 7s ago
Docs: http://kubernetes.io/docs/
Main PID: 1212 (kubelet)
CGroup: /system.slice/kubelet.service
└─1212 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --fail-swap-on=false

May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.097781 1212 kubelet.go:2266] node "localhost.localdomain" not found
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155161 1212 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155183 1212 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155192 1212 kuberuntime_manager.go:662] createPodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.155223 1212 pod_workers.go:190] Error syncing pod a9b7dd24584050387fb803d4e69019ca ("kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-apiserver-localhost.localdomain_kube-system(a9b7dd24584050387fb803d4e69019ca)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown"
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168221 1212 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168243 1212 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168252 1212 kuberuntime_manager.go:662] createPodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.168281 1212 pod_workers.go:190] Error syncing pod 4b52d75cab61380f07c0c5a69fb371d4 ("kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)"), skipping: failed to "CreatePodSandbox" for "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-scheduler-localhost.localdomain_kube-system(4b52d75cab61380f07c0c5a69fb371d4)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost.localdomain": Error response from daemon: cgroups: cgroup deleted: unknown"
May 14 10:44:39 localhost.localdomain kubelet[1212]: E0514 10:44:39.198628 1212 kubelet.go:2266] node "localhost.localdomain" not found

The command is "kubeadm init --ignore-preflight-errors=all --config=kubeadm.conf --skip-token-print --v=6"

The content of kubeadm.conf is:
apiServer:
  certSANs:
  - localhost
apiVersion: kubeadm.k8s.io/v1beta1
clusterName: simple
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.13.4
metadata:
  name: config
---
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 6443
metadata:
  name: config
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
metadata:
  name: config
---
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
metadata:
  name: config
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metadata:
  name: config

Hello, running your script fails with an error when joining the cluster through the VIP

error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://10.16.10.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 10.16.10.50:6443: connect: connection refused

Then I checked the keepalived log and found the following errors:

Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: Remote Web server [10.16.10.91]:6443 succeed on service.
Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: Adding service [10.16.10.91]:6443 to VS [10.16.10.50]:6443
Apr 02 11:47:49 k8s-master1 Keepalived_healthcheckers[28292]: IPVS (cmd 1159, errno 2): No such file or directory

I updated the kernel following your 1.14 setup post and enabled IPVS (see the module check after the cluster-info file).
This is my cluster-info file:

CP0_IP=10.16.10.91
CP1_IP=10.16.10.92
CP2_IP=10.16.10.93
VIP=10.16.10.50
NET_IF=eth0
CIDR=10.244.0.0/16
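The "IPVS (cmd 1159, errno 2): No such file or directory" line from keepalived usually means the ip_vs kernel modules are not actually loaded on that host; a minimal check, assuming a stock CentOS 7 shell on the master:

uname -r
lsmod | grep ip_vs               # expect ip_vs plus scheduler modules such as ip_vs_rr
modprobe ip_vs
modprobe ip_vs_rr
modprobe nf_conntrack_ipv4       # on 4.19+ kernels the module is named nf_conntrack
cat /proc/net/ip_vs              # only readable once ip_vs is loaded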

This error occurs with dual network interfaces

[root@test111 kubeadm-ha]# cat cluster-info

CP0_IP=192.168.56.104
CP0_HOSTNAME=test111
CP1_IP=192.168.56.105
CP1_HOSTNAME=test222
CP2_IP=192.168.56.107
CP2_HOSTNAME=test333
VIP=192.168.56.199
NET_IF=enp0s3
CIDR=192.168.0.0/16

(error screenshots omitted)

Dashboard unreachable after the master01 node is rebooted

Thank you very much for helping me a few days ago with accessing the dashboard console through ingress. Now I've hit a new problem: while testing master-node high availability, I found that after rebooting master01 the dashboard pod drifted to a worker node, and the dashboard could no longer be reached via the dashboard.multi.io domain. Is this a CoreDNS resolution problem?

Questions about using a k8s cluster built with this project

If I build a k8s cluster with this project and host a website on it, which IP should the domain name resolve to? The public IP of the machine holding the VIP? Is LVS only for high availability inside the cluster? Or should I set up another nginx/LVS outside to forward requests and point the domain at that nginx/LVS server's IP? Any suggestions?

Does keepalived + LVS actually work correctly?

I deployed keepalived + LVS following the author's setup.
1. persistence_timeout 50: isn't 50 seconds rather long, so that load balancing is never actually achieved?
2. Also, with curl -k https://10.130.29.83:6443 I only ever seem to reach the apiserver at 10.130.29.80; the other two are never hit, so round-robin does not happen (see the sketch below).
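With LVS persistence enabled, repeated connections from the same client IP are pinned to one real server until the persistence template expires, which would explain curl always landing on the same apiserver; a small way to inspect this, assuming ipvsadm is available on the director:

ipvsadm -Ln     # the virtual server line shows "persistent 50" when persistence is set
ipvsadm -Lnc    # connection table, including persistence template entries and their expiry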

kubectl commands take about half a minute to return results

Has anyone run into this? For example kubectl get nodes and kubectl get po --all-namespaces.
Environment: CentOS 7.5.1804, three VMs, each 4C8T.
All three masters behave this way. I looked around the k8s repo and nobody has reported this, so I'm asking here.
A v1.13.0 cluster does not have this problem.

How do I set the time zone for the k8s cluster?

Mainly for the containers the k8s system itself uses,
such as the scheduler, controller-manager and apiserver.
I tried configuring /etc/localtime on top of the official images, but it doesn't work very well.
Is there a simpler way?

A question about LVS in this project

ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 192.168.43.11:6443 Masq 1 1 0
-> 192.168.43.12:6443 Masq 1 0 0
-> 192.168.43.13:6443 Masq 1 3 0
TCP 10.96.0.10:53 rr
-> 10.233.0.2:53 Masq 1 0 0
-> 10.233.1.2:53 Masq 1 0 0
TCP 10.98.26.247:80 rr
-> 192.168.43.11:8080 Masq 1 0 0
-> 192.168.43.12:8080 Masq 1 0 0
-> 192.168.43.13:8080 Masq 1 0 0
TCP 10.99.198.122:443 rr
-> 10.233.1.5:8443 Masq 1 0 0
TCP 10.100.48.93:80 rr
-> 10.233.1.4:8082 Masq 1 0 0
TCP 10.100.136.4:5473 rr
TCP 10.105.169.229:80 rr
-> 10.233.2.2:3000 Masq 1 0 0
TCP 10.107.12.250:8083 rr
-> 10.233.2.2:8083 Masq 1 0 0
TCP 10.107.12.250:8086 rr
-> 10.233.2.2:8086 Masq 1 0 0
UDP 10.96.0.10:53 rr
-> 10.233.0.2:53 Masq 1 0 0
-> 10.233.1.2:53 Masq 1 0 0
My question about LVS in this project: in LVS DR mode, forwarding requires the same subnet and the same port, yet here LVS forwards across different subnets and different ports. How is that achieved? If you know, please reply. Thanks.

Errors even though my files and steps match your article

name: Invalid value: "master_1": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is 'a-z0-9?(.a-z0-9?)*')
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
failed to load admin kubeconfig: open /root/.kube/config: no such file or directory
/etc/kubernetes/pki/ca.crt: No such file or directory
/etc/kubernetes/pki/ca.key: No such file or directory
/etc/kubernetes/pki/sa.key: No such file or directory
/etc/kubernetes/pki/sa.pub: No such file or directory
/etc/kubernetes/pki/front-proxy-ca.crt: No such file or directory
/etc/kubernetes/pki/front-proxy-ca.key: No such file or directory
/etc/kubernetes/pki/etcd/ca.crt: No such file or directory
/etc/kubernetes/pki/etcd/ca.key: No such file or directory
/etc/kubernetes/admin.conf: No such file or directory
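The first error above is the root cause: the hostname master_1 contains an underscore and is not a valid DNS-1123 subdomain, so kubeadm aborts before any of the files listed afterwards are created. A minimal fix before re-running the script, assuming CentOS 7 with systemd:

hostnamectl set-hostname master-1   # repeat on each node with its own, underscore-free name
# then update /etc/hosts and the CPx_HOSTNAME values in cluster-info to match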

The CIDR setting is not appropriate

The author sets the pod network CIDR to CIDR=172.168.0.0/16,
but this range is not a private network range. There are three private ranges:
Class A: 10.0.0.0/8
Class B: 172.16.0.0/12 --> 172.16.0.0/16 to 172.31.0.0/16
Class C: 192.168.0.0/16
172.168.0.0/16 is not a subset of any of these, which means it overlaps with public Internet addresses, so pods will be unable to reach some public addresses.
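A minimal sketch of a corrected cluster-info entry, assuming the commonly used default pod range:

CIDR=10.244.0.0/16    # any subnet inside 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 works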

How to stop and start kube-scheduler on a master

I deployed three VMs with this project and everything works. On one of them I ran systemctl stop docker, then
ps aux | grep etc
and I still see processes whose names start with kube-scheduler and kube-controller-manager.
How do I shut them down properly? And after I kill the pid, how do I start them again?
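For context, kube-scheduler and kube-controller-manager run as static pods that the kubelet manages from /etc/kubernetes/manifests, so killing the process only makes the kubelet restart it; a sketch of stopping and starting one cleanly, assuming the default manifest path:

mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/    # kubelet notices and stops the static pod
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/    # kubelet starts it again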

kubelet problem on version 1.14.0

kubelet problem on 1.14.0: the system log fills up with large numbers of messages like the following.
Active: active (running) since Tue 2019-04-16 22:54:30 CST; 1h 6min ago
Docs: https://kubernetes.io/docs/
Main PID: 859 (kubelet)
Tasks: 22
Memory: 130.2M
CGroup: /system.slice/kubelet.service
└─859 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1

Apr 17 00:01:08 k8s-m1 kubelet[859]: W0417 00:01:08.884515 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_22369_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
Apr 17 00:01:08 k8s-m1 kubelet[859]: W0417 00:01:08.884549 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22369_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22369_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.693640 859 container.go:523] Failed to update stats for container "/libcontainer_22469_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_22469_systemd_test_default.slice/cpuacct.stat: no such file or directory, continuing to push stats
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.717462 859 container.go:523] Failed to update stats for container "/libcontainer_22475_systemd_test_default.slice": failed to parse memory.memsw.usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_22475_systemd_test_default.slice/memory.memsw.usage_in_bytes: no such device, continuing to push stats
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.748654 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.749910 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752770 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752807 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:14 k8s-m1 kubelet[859]: W0417 00:01:14.752824 859 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_22481_systemd_test_default.slice: no such file or directory
Apr 17 00:01:18 k8s-m1 kubelet[859]: W0417 00:01:18.882369 859 container.go:523] Failed to update stats for container "/libcontainer_22530_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_22530_systemd_test_default.slice/cpuacct.usage_percpu: no such file or directory, continuing to push stats

How can this be resolved?

A question about port 443 in this project

ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.2.55:6443           Masq    1      2          0         
  -> 192.168.2.56:6443           Masq    1      1          0         
  -> 192.168.2.57:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.205.1.3:53                Masq    1      0          0         
  -> 10.205.1.4:53                Masq    1      0          0

LVS forwards port 443 to port 6443 on the other nodes.
Traefik also uses ports 80 and 443; won't that conflict?
Where does the port 443 used by LVS come from, and how can it be changed?
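For reference, 10.96.0.1:443 is not a host port: it is the ClusterIP of the default kubernetes Service, which kube-proxy in ipvs mode programs as an IPVS virtual server forwarding to the apiservers on 6443, so it does not collide with Traefik binding 80/443 on the hosts; a quick way to see where it comes from, assuming kubectl access:

kubectl get svc kubernetes -n default          # ClusterIP 10.96.0.1, port 443
kubectl get endpoints kubernetes -n default    # the :6443 backends seen in ipvsadm -Ln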

install-cni: Docker container won't start

ls: /calico-secrets: No such file or directory
Wrote Calico CNI binaries to /host/opt/cni/bin
CNI plugin version: v3.5.3
/host/secondary-bin-dir is non-writeable, skipping
Using CNI config template from CNI_NETWORK_CONFIG environment variable.
CNI config: {
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "master1",
      "mtu": 1440,
      "ipam": {
        "type": "host-local",
        "subnet": "usePodCidr"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}
Created CNI config 10-calico.conflist
Done configuring CNI.  Sleep=false

Certificate error when installing the HA masters

I'm doing a fully offline install, with the script downloaded and run locally. Installing the HA masters fails with:
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.240.200:6443"
[discovery] Failed to request cluster info, will try again: [Get https://192.168.240.200:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 192.168.240.200:6443: connect: connection refused]

A local curl shows a certificate error:
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).

[root@localhost ~]# cat cluster-info

CP0_IP=192.168.240.201
CP1_IP=192.168.240.202
CP2_IP=192.168.240.203
VIP=192.168.240.200
NET_IF=ens33
CIDR=10.244.0.0/16

What could be the cause?

The 4.19 kernel makes it impossible to enable IPVS

Because the latest stable 4.19 kernel renamed nf_conntrack_ipv4 to nf_conntrack, the current kube-proxy cannot enable ipvs on a 4.19 kernel.
For details see kubernetes/kubernetes#70304.
The fix was only merged into the main branch on October 30, so no Kubernetes release containing it has been published yet.
Readers can either install the 4.18 kernel I provide, or leave IPVS disabled.
4.18 kernel RPM download link: https://pan.baidu.com/s/1dCeozuMRQ96MBBjGpf0cjA (extraction code: 3nqg)
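A minimal way to see which side of the rename a node is on, assuming shell access:

uname -r
# before 4.19 this module exists; from 4.19 onward it was merged into nf_conntrack
modprobe nf_conntrack_ipv4 || modprobe nf_conntrack
lsmod | grep nf_conntrack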

Why is the 4.18 kernel required?

Is it because kube-proxy can only support ipvs on a 4.18 kernel?
The CentOS Linux 7 (Core) default kernel is 3.10.0-693.el7.x86_64.
Is that one unsupported? Would patching 3.10.0-693.el7.x86_64 also work?

One more question: I don't see traefik in the cluster architecture summary. Is it on the load-balancing layer together with LVS?

How to add an apiserver IP after the k8s cluster is installed

A question: the k8s cluster is already installed and there is a new apiserver IP to add. How do I add the apiserver IP?
kubernetesVersion: v1.12.1
apiServerCertSANs:
- ${CP0_IP}
- ${CP1_IP}
- ${CP2_IP}
- ${CP0_HOSTNAME}
- ${CP1_HOSTNAME}
- ${CP2_HOSTNAME}
- ${VIP}

Or is the only option to reset the k8s cluster and reinitialize? There should be a way to add an apiserver IP directly, right?
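One possible approach (a sketch, not the project's documented procedure) is to regenerate only the apiserver serving certificate with the extra SAN; kubeadm v1.13+ exposes this as kubeadm init phase certs apiserver, while on v1.12 the equivalent lives under kubeadm alpha phase certs. The kubeadm-config.yaml below is a hypothetical file holding your ClusterConfiguration with the new entry added to the cert SANs:

mkdir -p /root/pki-backup
mv /etc/kubernetes/pki/apiserver.{crt,key} /root/pki-backup/
kubeadm init phase certs apiserver --config kubeadm-config.yaml   # hypothetical config with the new SAN
# restart the apiserver static pod so it picks up the new certificate (repeat on every master)
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/ && sleep 5 \
  && mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/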

How can traefik be replaced with nginx? Please advise

How can traefik be replaced with nginx? I'd like to swap the traefik-ingress-controller for ingress-nginx; is that possible?
Or could the project support choosing between traefik and nginx?

kubeadm join "VIP" question

Hi, for a cluster generated by this script, how can the VIP be used as the join IP for masters? As shown below, after the script finishes, the printed join command uses the IP of one of the masters. Can the IP used to join the masters be changed to the VIP, and if so, how? Thanks.

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

kubeadm join 10.23.0.100:6443 --token hl1tr9.kbdvliosudxkv0ad --discovery-token-ca-cert-hash sha256:3045a014cbf0958669d1a0023ac7fc5f5795836948f00ddbadaa58a080bd7915

Waiting for etcd bootup...

When running join, it reports that the configmap "cluster-info" is forbidden

Script location: https://github.com/Lentil1016/kubeadm-ha/blob/b2c2e5be3056babb3389b47db6f9a9b11041c3d1/kubeha-gen.sh#L172

Error output:

front-proxy-ca.crt                                     100% 1038   761.9KB/s   00:00
front-proxy-ca.key                                     100% 1679   861.7KB/s   00:00
ca.crt                                                 100% 1017   543.4KB/s   00:00
ca.key                                                 100% 1679   659.8KB/s   00:00
admin.conf                                             100% 5444     2.0MB/s   00:00
admin.conf                                             100% 5444     3.4MB/s   00:00
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.1.1.8:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.1.1.8:6443"
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]

After the error:

  1. Running kubectl get all -n kube-public shows that there are no resources in that namespace.
  2. In /var/lib/kubelet/config.yaml (generated while the script runs) there is an anonymous: false setting; an excerpt:
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt

So kube-public has no cluster-info, and anonymous access is not allowed either. Did you run into this while creating the cluster, and is there a solution?
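For reference, kubeadm init normally creates the cluster-info ConfigMap in kube-public together with RBAC rules that let anonymous users read it (the anonymous: false in /var/lib/kubelet/config.yaml is the kubelet's own setting and does not affect join, which talks to the apiserver); a quick way to check what actually exists, assuming admin access on the first master:

kubectl -n kube-public get configmap cluster-info -o yaml
kubectl -n kube-public get role,rolebinding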

kubeadm-ha: problem verifying DNS and the pod network

Cluster nodes

[root@k8s-m1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
[root@k8s-m1 ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
k8s-m1   Ready    master   115m   v1.13.0
k8s-m2   Ready    master   115m   v1.13.0
k8s-m3   Ready    master   114m   v1.13.0
k8s-n1   Ready    <none>   19m    v1.13.0

[root@k8s-m1 ~]# kubectl get pods -n kube-system -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP               NODE     NOMINATED NODE   READINESS GATES
calico-node-4q6ls                      2/2     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
calico-node-5xcg6                      2/2     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
calico-node-9jn4z                      2/2     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
calico-node-ngm2t                      2/2     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
coredns-86c58d9df4-nk9jl               1/1     Running   0          115m   10.244.0.2       k8s-m1   <none>           <none>
coredns-86c58d9df4-z8wtn               1/1     Running   0          115m   10.244.0.3       k8s-m1   <none>           <none>
etcd-k8s-m1                            1/1     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
etcd-k8s-m2                            1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
etcd-k8s-m3                            1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-apiserver-k8s-m1                  1/1     Running   0          114m   192.168.88.111   k8s-m1   <none>           <none>
kube-apiserver-k8s-m2                  1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-apiserver-k8s-m3                  1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-controller-manager-k8s-m1         1/1     Running   1          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-controller-manager-k8s-m2         1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-controller-manager-k8s-m3         1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-proxy-49ds9                       1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kube-proxy-grbhk                       1/1     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
kube-proxy-n6pxn                       1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-proxy-vdgwk                       1/1     Running   0          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-scheduler-k8s-m1                  1/1     Running   1          115m   192.168.88.111   k8s-m1   <none>           <none>
kube-scheduler-k8s-m2                  1/1     Running   0          115m   192.168.88.112   k8s-m2   <none>           <none>
kube-scheduler-k8s-m3                  1/1     Running   0          114m   192.168.88.113   k8s-m3   <none>           <none>
kubernetes-dashboard-94b488ddb-qns8h   1/1     Running   0          114m   10.244.0.5       k8s-m1   <none>           <none>
metrics-server-5bff75b59f-mhs8d        2/2     Running   0          78m    10.244.2.2       k8s-m3   <none>           <none>
traefik-ingress-controller-859dl       1/1     Running   0          114m   192.168.88.111   k8s-m1   <none>           <none>
traefik-ingress-controller-fnt49       1/1     Running   0          114m   192.168.88.112   k8s-m2   <none>           <none>
traefik-ingress-controller-tctl4       1/1     Running   0          19m    192.168.88.114   k8s-n1   <none>           <none>
traefik-ingress-controller-tmqn2       1/1     Running   0          113m   192.168.88.113   k8s-m3   <none>           <none>

Test procedure

kubectl create deployment nginx --image=nginx:alpine

kubectl get pods -l app=nginx -o wide

kubectl scale deployment nginx --replicas=2

kubectl expose deployment nginx --port=80 --type=NodePort

kubectl get services nginx

NodePort load balancing works fine.

Verifying DNS and the pod network, however, fails.

[root@k8s-m1 ~]#  kubectl run -it curl --image=radial/busyboxplus:curl
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-66959f6557-q9964:/ ]$ nslookup nginx
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 198.18.4.134
[ root@curl-66959f6557-q9964:/ ]$ curl http://198.18.4.134
curl: (56) Recv failure: Connection reset by peer
[ root@curl-66959f6557-q9964:/ ]$ curl http://nginx
curl: (56) Recv failure: Connection reset by peer
[ root@curl-66959f6557-q9964:/ ]$ exit
Session ended, resume using 'kubectl attach curl-66959f6557-q9964 -c curl -i -t' command when the pod is running

Is that 198.18 address coming from calico??

When running the 1.12.1 deployment script with podSubnet set to 10.244.0.0/16, containers sometimes come up in the 172.17.0.0/16 range

This problem can randomly lead to the following consequences:

  • the calico-node containers never become ready
  • containers such as dashboard and heapster only become reachable after being restarted
  • coredns is started in the 172.17 range, so service names cannot be resolved

A typical occurrence of the failure looks like this:

kubectl get pods -o wide -n kube-system                                  
NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE
calico-node-hc6k7                                 1/2     Running   0          3m33s   10.130.29.82   centos-7-x86-64-29-82   <none>
calico-node-jksjw                                 1/2     Running   0          3m33s   10.130.29.81   centos-7-x86-64-29-81   <none>
calico-node-xbblx                                 1/2     Running   1          3m33s   10.130.29.80   centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-mb5xc                          1/1     Running   0          5m45s   10.244.0.4     centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-tzw72                          1/1     Running   0          5m45s   10.244.0.5     centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-80                        1/1     Running   0          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-81                        1/1     Running   0          102s    10.130.29.81   centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-82                        1/1     Running   0          2m2s    10.130.29.82   centos-7-x86-64-29-82   <none>
heapster-v1.5.4-65ff99c48d-25xvw                  2/2     Running   0          3m17s   172.17.0.2     centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-80              1/1     Running   0          2m51s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-81              1/1     Running   3          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-82              1/1     Running   0          106s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-controller-manager-centos-7-x86-64-29-80     1/1     Running   2          4m48s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-81     1/1     Running   1          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-controller-manager-centos-7-x86-64-29-82     1/1     Running   0          104s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-4l7fg                                  1/1     Running   0          3m55s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-proxy-hd2k5                                  1/1     Running   0          3m39s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-mgttp                                  1/1     Running   0          5m45s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-80              1/1     Running   2          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-81              1/1     Running   1          3m17s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-scheduler-centos-7-x86-64-29-82              1/1     Running   0          107s    10.130.29.82   centos-7-x86-64-29-82   <none>
kubernetes-dashboard-6b475b66b5-d7d78             1/1     Running   0          28s     10.244.0.6     centos-7-x86-64-29-80   <none>
monitoring-influxdb-grafana-v4-65cc9bb8c8-2zcbz   2/2     Running   0          3m17s   172.17.0.3     centos-7-x86-64-29-81   <none>
traefik-ingress-controller-2wkn9                  1/1     Running   0          3m22s   10.130.29.82   centos-7-x86-64-29-82   <none>
traefik-ingress-controller-jkxvg                  1/1     Running   0          3m22s   10.130.29.81   centos-7-x86-64-29-81   <none>
traefik-ingress-controller-kb9v4                  1/1     Running   0          2m19s   10.130.29.80   centos-7-x86-64-29-80   <none>
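The 172.17.0.x addresses are Docker's default docker0 bridge subnet, which suggests those pods were started before the CNI configuration was ready on that node; a recovery sketch once calico-node is Running, using the pod names from the listing above:

kubectl -n kube-system delete pod heapster-v1.5.4-65ff99c48d-25xvw monitoring-influxdb-grafana-v4-65cc9bb8c8-2zcbz
kubectl -n kube-system get pods -o wide | grep 172.17    # should return nothing once the pods are recreated on the pod network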

Switching the network module to flannel

I replaced calico with flannel in kubeha-gen.sh. In a two-node cluster, one master is Ready and the other stays NotReady. On inspection, the flannel pod is only created and running on the healthy node. The flannel yml file is below.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d

Shouldn't a DaemonSet create a corresponding pod on every node?
I have tried creating flannel again in the already-built cluster, but it always errors out saying the resources already exist.
Is there any way to solve this?
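A couple of hedged diagnostic commands that usually narrow this down, checking the DaemonSet's own status/events and whether the NotReady node carries a taint or an arch label that does not match the nodeSelector (replace <not-ready-node> with the actual node name):

kubectl -n kube-system get daemonset kube-flannel-ds -o wide
kubectl -n kube-system describe daemonset kube-flannel-ds | grep -A 5 Events
kubectl describe node <not-ready-node> | grep -iE 'taint|beta.kubernetes.io/arch'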

In a 3-node k8s HA setup, is there an impact when two nodes go down?

According to the HA kubernetes 1.11.0/1.12.1 cluster architecture diagram, with 3 masters, losing two should have no impact. In my test the dashboard page returns Bad Gateway. Shouldn't everything fail over to the healthy node?
The VIP did fail over to the healthy node.

keepalived test: the VIP does not fail over properly; as soon as the primary node is up it holds the VIP

[root@node2 ~]# ip addr|grep inet|grep ens32
    inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens32
[root@node2 ~]# for i in 10 11 12 15; do echo -n 192.168.0.$i"-->" ;curl https://192.168.0.$i:6443/healthz -k -m 3 -s  ;echo; done
192.168.0.10-->
192.168.0.11-->
192.168.0.12-->ok
192.168.0.15-->ok
[root@node2 ~]# ip addr|grep inet|grep ens32
    inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens32
[root@node2 ~]# ssh 192.168.0.11 ip addr|grep inet|grep ens32
    inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens32
    inet 192.168.0.10/32 scope global ens32
[root@node2 ~]# ssh 192.168.0.11 lsof -i:6443
[root@node2 ~]# lsof -i:6443 |head -1
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
[root@node2 ~]# lsof -i:6443 |head -3
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kube-apis 5112 root    3u  IPv6  78963      0t0  TCP *:sun-sr-https (LISTEN)
kube-apis 5112 root    6u  IPv6 225893      0t0  TCP node:sun-sr-https->node2:45511 (ESTABLISHED)
[root@node2 ~]#

A question about the installation images

I read the author's kubeadm HA master (v1.11.0) cluster setup guide, which mentions that
Harbor only supports {hostname}/{project}/{image}:{tag}, so images like k8s.gcr.io/pause:3.1 cannot be pushed into it.
Could the author's install script and yaml manifests be changed to the {hostname}/{project}/{image}:{tag} format?
Is there a way to do it? If not, why not? Is it a restriction in the upstream install yaml? I don't understand this part, please advise.

initial master timeout 151.101.132.133:443: i/o timeout

Strangely, while the script was running it seemed to access this server, apparently a public IP. Does the script need to fetch something from this IP?

[markmaster] Marking the node k8s-master03 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node k8s-master03 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
Cluster create finished.

Email Address []:[email protected]
secret/ssl created
Unable to connect to the server: dial tcp 151.101.132.133:443: i/o timeout
^C

K8S HA failover test problem

I deployed the cluster following the docs.
1. Shutting a management node down with power off or shutdown: the apiserver fails over within 10 s.
2. Cutting power to the physical machine directly, or powering off a VM from the console: the apiserver becomes unavailable for more than 5 minutes, although the virtual IP fails over normally (this happens when powering off the node that kube-controller-manager and kube-scheduler elected as leader). Please help confirm.

With the script-based deployment, only one master node comes up.

Cluster node configuration:

[root@k8s-m1 kubeadm-ha]# ./kubeha-gen.sh

cluster-info:
  master-01:        192.168.88.111
  master-02:        192.168.88.112
  master-02:        192.168.88.113
  VIP:              192.168.88.110
  Net Interface:    eth1
  CIDR:             10.244.0.0/16

Logs

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

keepalived-1.conf                                                                                                100% 1266     1.1MB/s   00:00
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[preflight] running pre-flight checks
[reset] no etcd config found. Assuming external etcd
[reset] please manually reset etcd to prevent further issues
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

keepalived-2.conf                                                                                                100% 1266   658.7KB/s   00:00
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[preflight] running pre-flight checks
[reset] no etcd config found. Assuming external etcd
[reset] please manually reset etcd to prevent further issues
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

[init] Using Kubernetes version: v1.13.0
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m1 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m1 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 21.505018 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-m1" as an annotation
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: r7p6k3.6rifm9nykjsqabh4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.88.110:6443 --token r7p6k3.6rifm9nykjsqabh4 --discovery-token-ca-cert-hash sha256:937c26b6ffa84b341ac95122fedc1fe3e5aeba4677ad36cf6b4adb9cd6c693d8

clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
configmap/calico-config created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
ca.crt                                                                                                           100% 1025   494.9KB/s   00:00
ca.key                                                                                                           100% 1675     1.3MB/s   00:00
sa.key                                                                                                           100% 1675     1.3MB/s   00:00
sa.pub                                                                                                           100%  451   354.7KB/s   00:00
front-proxy-ca.crt                                                                                               100% 1038     1.1MB/s   00:00
front-proxy-ca.key                                                                                               100% 1675     1.4MB/s   00:00
ca.crt                                                                                                           100% 1017   835.0KB/s   00:00
ca.key                                                                                                           100% 1675     1.5MB/s   00:00
admin.conf                                                                                                       100% 5454     1.5MB/s   00:00
admin.conf                                                                                                       100% 5454     4.7MB/s   00:00
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Failed to connect to API Server "192.168.88.110:6443": token id "8zioq9" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Failed to connect to API Server "192.168.88.110:6443": token id "8zioq9" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Requesting info from "https://192.168.88.110:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.88.110:6443"
[discovery] Successfully established connection with API Server "192.168.88.110:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[join] Running pre-flight checks before initializing the new control plane instance
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m2 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m2 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m2 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Using existing up-to-date kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Checking Etcd cluster health
error syncing endpoints with etc: dial tcp 10.0.2.15:2379: connect: connection refused
ca.crt                                                                                                           100% 1025   705.4KB/s   00:00
ca.key                                                                                                           100% 1675     1.6MB/s   00:00
sa.key                                                                                                           100% 1675     1.0MB/s   00:00
sa.pub                                                                                                           100%  451   410.0KB/s   00:00
front-proxy-ca.crt                                                                                               100% 1038   887.2KB/s   00:00
front-proxy-ca.key                                                                                               100% 1675     1.2MB/s   00:00
ca.crt                                                                                                           100% 1017   676.7KB/s   00:00
ca.key                                                                                                           100% 1675     1.6MB/s   00:00
admin.conf                                                                                                       100% 5454     3.2MB/s   00:00
admin.conf                                                                                                       100% 5454     3.6MB/s   00:00
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[discovery] Trying to connect to API Server "192.168.88.110:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.88.110:6443"
[discovery] Requesting info from "https://192.168.88.110:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.88.110:6443"
[discovery] Successfully established connection with API Server "192.168.88.110:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[join] Running pre-flight checks before initializing the new control plane instance
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-m3 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-m3 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m3 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15 192.168.88.110 192.168.88.111 192.168.88.112 192.168.88.113 192.168.88.110]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Using existing up-to-date kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Checking Etcd cluster health
error syncing endpoints with etc: dial tcp 10.0.2.15:2379: connect: connection refused
Cluster create finished.
Generating a 4096 bit RSA private key
..............++
..++
writing new private key to '/root/ikube/tls/tls.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) []:CN
State or Province Name (full name) []:Beijing
Locality Name (eg, city) []:Haidian
Organization Name (eg, company) []:Channelsoft
Organizational Unit Name (eg, section) []:R & D Department
Common Name (eg, your name or your server's hostname) []:*.multi.io
Email Address []:[email protected]
secret/ssl created
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
configmap/metrics-server-config created
deployment.extensions/metrics-server created
service/metrics-server created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-key-holder created
serviceaccount/kubernetes-dashboard-admin created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-admin created
clusterrole.rbac.authorization.k8s.io/cluster-watcher created
deployment.apps/kubernetes-dashboard created
service/kubernetes-dashboard created
ingress.extensions/dashboard created
Plugin install finished.
Plugin install finished.
Waiting for all pods into 'Running' status. You can press 'Ctrl + c' to terminate this waiting any time you like.

NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
NAME     STATUS   ROLES    AGE    VERSION
k8s-m1   Ready    master   119s   v1.13.0
NAME                                   READY   STATUS    RESTARTS   AGE
calico-node-gnfsn                      2/2     Running   0          100s
coredns-86c58d9df4-nfdf4               1/1     Running   0          100s
coredns-86c58d9df4-qdsz5               1/1     Running   0          100s
etcd-k8s-m1                            1/1     Running   0          66s
kube-apiserver-k8s-m1                  1/1     Running   0          51s
kube-controller-manager-k8s-m1         1/1     Running   0          59s
kube-proxy-ht6pq                       1/1     Running   0          100s
kube-scheduler-k8s-m1                  1/1     Running   0          44s
kubernetes-dashboard-94b488ddb-7nznf   1/1     Running   0          42s
metrics-server-64d45d7bd7-fkxz6        2/2     Running   0          10s

Question: what to do when one node in the k8s cluster fails

In a three-node k8s cluster, if one node fails and cannot be recovered, how do I add a replacement master?
Do I have to reinitialize and regenerate the certificates? That would surely affect the applications running on the cluster.
I'm on v1.12.X; is there a good way to handle this on v1.12.X or v1.13.X?
