zhangguanzhang / kubernetes-ansible
:christmas_tree: One-click deployment of highly available Kubernetes (systemd) on multi-NIC machines with ansible
Hello Mr. Zhang, I need help with the following problem. When you have time, could you help analyze what went wrong?
failing or missing response from https://10.96.18.80:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.18.80:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
v1beta1.metrics.k8s.io kube-system/metrics-server False (FailedDiscoveryCheck) 7m20s
[root@ CoreAddons]#kubectl get all -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/coredns-59fc9fcd9b-4fcgl 1/1 Running 0 11h 192.1.0.12 10.249.13.160 <none> <none>
pod/coredns-59fc9fcd9b-rms4b 1/1 Running 0 11h 192.1.3.19 10.249.13.162 <none> <none>
pod/metrics-server-576f8588d9-fcbhv 1/1 Running 0 7m46s 192.1.3.21 10.249.13.162 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kube-controller-manager ClusterIP 10.96.13.224 <none> 10252/TCP 11d <none>
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 11h k8s-app=kube-dns
service/kube-scheduler ClusterIP 10.96.112.106 <none> 10251/TCP 11d <none>
service/metrics-server ClusterIP 10.96.18.80 <none> 443/TCP 7m46s k8s-app=metrics-server
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/coredns 2/2 2 2 11h coredns 10.249.12.47/k8sv17/coredns:1.6.5 k8s-app=kube-dns
deployment.apps/metrics-server 1/1 1 1 7m46s metrics-server 10.249.12.47/k8sv17/metrics-server-amd64:v0.3.6 k8s-app=metrics-server
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/coredns-59fc9fcd9b 2 2 2 11h coredns 10.249.12.47/k8sv17/coredns:1.6.5 k8s-app=kube-dns,pod-template-hash=59fc9fcd9b
replicaset.apps/metrics-server-576f8588d9 1 1 1 7m46s metrics-server 10.249.12.47/k8sv17/metrics-server-amd64:v0.3.6 k8s-app=metrics-server,pod-template-hash=576f8588d9
Kubernetes-ansible/roles/master/tasks/main.yml
when: inventory_hostname in groups['Master']
Kubernetes-ansible/roles/master/templates/kube-apiserver.service.j2
--apiserver-count={{ groups['Master'] | length }} \
Using the master branch:
Error:
["changed": false, "msg": "Could not find or access 'common/time/ntp.conf.j2']
Cause:
Path: [/tasks/time/chrony.yml] or [ntp.yml]
- name: Send ntp configuration file
  template: src=common/time/ntp.conf.j2 dest=/etc/ntp.conf
Change it to:
- name: Send ntp configuration file
  template: src=ntp.conf.j2 dest=/etc/ntp.conf
name is chrony
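With your links set up like this, and everything depending on branch after branch, there is really no way to push anything new to you. I had wanted to make some changes and send them to you.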
Nov 26 13:41:09 master01 sshd[16199]: pam_unix(sshd:session): session opened for user root by (uid=0)
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.001237 6675 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://172.16.128.24
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.004250 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://172.16.128.240:84
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.009359 6675 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://172.16.128
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.010773 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://172.16.128.24
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.013051 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://172.16.128.240:8443/
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.344709 6675 controller.go:125] failed to ensure node lease exists, will retry in 7s, error: Get https://172.16.128.240:8443/apis/coor
Nov 26 13:41:10 master01 kube-apiserver[2256]: E1126 13:41:10.411097 2256 authentication.go:65] Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: e
Nov 26 13:41:10 master01 flanneld[15541]: E1126 13:41:10.828927 15541 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Unauthorized
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: metrics-server
      name: metrics-server
    spec:
      containers:
      - args:
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        image: zhangguanzhang/metrics-server:v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /etc/localtime
          name: host-time
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metrics-server
      serviceAccountName: metrics-server
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /etc/localtime
          type: ""
        name: host-time
      - emptyDir: {}
        name: tmp-dir
Events:
Type Reason Age From Message
Normal Scheduled 46m default-scheduler Successfully assigned kube-system/coredns-58b448b5d9-hhlxt to 158.143.70.21
Warning FailedCreatePodSandBox 33m (x19 over 46m) kubelet, 158.143.70.21 Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "gcr.azk8s.cn/google_containers/pause-amd64:3.1": Error response from daemon: Get http://gcr.azk8s.cn/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning FailedCreatePodSandBox 33m kubelet, 158.143.70.21 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "aa57fe03f3b22c54a17783490252498e67bcdd7bb4c226c7055df91625d4551c" network for pod "coredns-58b448b5d9-hhlxt": NetworkPlugin cni failed to set up pod "coredns-58b448b5d9-hhlxt_kube-system" network: failed to find plugin "loopback" in path [/opt/cni/bin], failed to clean up sandbox container "aa57fe03f3b22c54a17783490252498e67bcdd7bb4c226c7055df91625d4551c" network for pod "coredns-58b448b5d9-hhlxt": NetworkPlugin cni failed to teardown pod "coredns-58b448b5d9-hhlxt_kube-system" network: failed to find plugin "portmap" in path [/opt/cni/bin]]
Normal SandboxChanged 11m (x104 over 33m) kubelet, 158.143.70.21 Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 3m9s (x26 over 8m43s) kubelet, 158.143.70.21 Pod sandbox changed, it will be killed and re-created.
[root@node1 .kube]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-58b448b5d9-hhlxt 0/1 ContainerCreating 0 47m
coredns-58b448b5d9-mptg5 0/1 ContainerCreating 0 47m
metrics-server-86c9cbd9f5-zblwz 0/1 ContainerCreating 0 47m
[root@node1 .kube]#
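The FailedCreatePodSandBox event above reports that the "loopback" and "portmap" plugins could not be found in /opt/cni/bin. A minimal sketch of a check that could be run on the affected node follows. The function name missing_cni_plugins is hypothetical, and the list of required plugins is only what the event mentions, not a complete list for any particular network add-on.

```python
import os

# Plugin names taken from the FailedCreatePodSandBox event above.
# This is an assumption: other plugins may also be needed on the node.
REQUIRED_PLUGINS = ("loopback", "portmap")


def missing_cni_plugins(bin_dir, required=REQUIRED_PLUGINS):
    """Return the required plugin names that have no regular file in bin_dir."""
    missing = []
    for name in required:
        if not os.path.isfile(os.path.join(bin_dir, name)):
            missing.append(name)
    return missing
```

Calling missing_cni_plugins("/opt/cni/bin") on the node returns an empty list when both files are present. A non-empty result would suggest that the CNI plugin binaries were never copied to that node.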
Searching for interface using 10.249.21.198
Using interface with name eth0 and address 10.249.21.198
Using 10.249.21.198 as external address
Waiting 10m0s for node controller to sync
Starting kube subnet manager
Node controller sync successful
Created subnet manager: Kubernetes Subnet Manager - 10.249.21.198
Installing signal handlers
Found network config - Backend type: vxlan
VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
VXLAN device already exists
Returning existing device
Error registering network: failed to acquire lease: node "10.249.21.198" p
Stopping shutdownHandler...
Start healthz server on 10.249.21.198:8471
ansible-playbook deploy.yml --tags tls fails (I am running ansible from a separate machine, not master0). The output is as follows:
TASK [tls : apiserver-etcd-client --- part.1] *************************************************************************************************************************************************
changed: [localhost]
TASK [tls : apiserver-etcd-client --- part.2] *************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "openssl x509 -in apiserver-etcd-client.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out apiserver-etcd-client.crt -days 10000\n", "delta": "0:00:00.013557", "end": "2019-07-10 15:03:09.043126", "msg": "non-zero return code", "rc": 1, "start": "2019-07-10 15:03:09.029569", "stderr": "Error Loading extension section v3_req_etcd\n140280250644368:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=etcd001.k8s.local\n140280250644368:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_names_etcd", "stderr_lines": ["Error Loading extension section v3_req_etcd", "140280250644368:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=etcd001.k8s.local", "140280250644368:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_names_etcd"], "stdout": "", "stdout_lines": []}
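The stderr above shows openssl rejecting value=etcd001.k8s.local as a "bad ip address" while loading subjectAltName from the alt_names_etcd section. That suggests a hostname was written into an IP.n entry, although the generated openssl.cnf is not shown, so this is an assumption. The sketch below shows one way to separate hostnames from literal IP addresses when building such a section. The function name san_entries is hypothetical and is not part of the playbook.

```python
import ipaddress


def san_entries(names):
    """Build alt_names lines, using IP.n only for literal IP addresses."""
    lines = []
    dns_index = 0
    ip_index = 0
    for name in names:
        try:
            # Raises ValueError for anything that is not an IPv4/IPv6 literal.
            ipaddress.ip_address(name)
        except ValueError:
            dns_index += 1
            lines.append(f"DNS.{dns_index} = {name}")
        else:
            ip_index += 1
            lines.append(f"IP.{ip_index} = {name}")
    return lines
```

With the hostname from the error message, san_entries(["etcd001.k8s.local", "127.0.0.1"]) yields a DNS.1 entry for the hostname and an IP.1 entry for the address, which is the form openssl accepts for subjectAltName.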
Following your guide, I tried several times and it fails at flannel every time.
TASK [KubernetesCoreAddons : 开机并启动flanneld] *********************************************************************
fatal: [192.168.1.17]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.18]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.38]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.28]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.27]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
[root@vm17 CoreAddons]# journalctl -xe |grep flanneld
Jul 08 16:36:05 vm17.suibian.int flanneld[32209]: E0708 16:36:05.632948 32209 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Unauthorized
ifconfig
[root@vm17 CoreAddons]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:8d:03:72:29 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.17 netmask 255.255.255.0 broadcast 192.168.1.255
ether 52:54:00:4c:eb:28 txqueuelen 1000 (Ethernet)
RX packets 4056110 bytes 543595368 (518.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3506639 bytes 782355415 (746.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 809083 bytes 110526361 (105.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 809083 bytes 110526361 (105.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Files are missing under both roles and scripts.
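A question:
Why is the kubelet certificate valid for only one year, when the parameter file for the ansible install is configured for 10 years?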
/etc/kubernetes/pki/kubelet.crt
notBefore=Aug 20 11:52:15 2020 GMT
notAfter=Aug 20 11:52:15 2021 GMT
The other certificates are fine. Also, what is admin.crt used for? Sometimes it is also valid for only one year.
/etc/kubernetes/pki/front-proxy-client.crt
notBefore=Aug 20 12:47:08 2020 GMT
notAfter=Jun 5 12:47:08 2294 GMT
/etc/kubernetes/pki/kube-scheduler.crt
notBefore=Aug 20 12:47:09 2020 GMT
notAfter=Jun 5 12:47:09 2294 GMT
/etc/kubernetes/pki/sa.crt
notBefore=Aug 20 12:47:09 2020 GMT
notAfter=Jun 5 12:47:09 2294 GMT
/etc/kubernetes/pki/admin.crt
notBefore=Aug 20 12:47:10 2020 GMT
notAfter=Jun 5 12:47:10 2294 GMT
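To compare the lifetimes listed above, the notBefore and notAfter values can be turned into a number of days. This is a small sketch that assumes the dates are in the format printed above (as produced by openssl x509 -dates, which is an assumption about how they were obtained). The function name validity_days is hypothetical.

```python
from datetime import datetime

# Matches strings such as "Aug 20 11:52:15 2020 GMT".
DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def validity_days(not_before, not_after):
    """Return the whole number of days between two openssl-style dates."""
    start = datetime.strptime(not_before, DATE_FORMAT)
    end = datetime.strptime(not_after, DATE_FORMAT)
    return (end - start).days


# kubelet.crt from the listing above
print(validity_days("Aug 20 11:52:15 2020 GMT", "Aug 20 11:52:15 2021 GMT"))  # 365
# front-proxy-client.crt from the listing above
print(validity_days("Aug 20 12:47:08 2020 GMT", "Jun 5 12:47:08 2294 GMT"))  # 100000
```

By this calculation kubelet.crt is valid for 365 days, while the certificates expiring in 2294 are valid for 100000 days, which is roughly 274 years rather than 10.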
Ubuntu 18.04.4 does not have this directory. How about using "/etc/systemd/system" instead?
fatal: [192.168.11.172]: FAILED! => {"changed": false, "msg": "No package matching 'docker-ce-3:18.09.7-3.el7' found available, installed or updated", "rc": 126, "results": ["No package matching 'docker-ce-3:18.09.7-3.el7' found available, installed or updated"]}
After running git clone, the roles directories are all empty. What is going on? Please help, many thanks.