Personal deployment notes (kubeasy issue, closed)

ztdawang commented on July 27, 2024
Personal deployment notes

from kubeasy.

Comments (33)

ztdawang commented on July 27, 2024

Also: I suggest enabling eBPF.

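buxiaomo commented on July 27, 2024

1. Files whose names contain spaces

On openSUSE, that is the default name produced by {{ ansible_distribution_file_variety }}, so it will not be changed.

2. Adding AlmaLinux 9.0 support:

make runtime decides what to do based on the package manager, not on the OS version. If you get an error, please post a screenshot of the error message.

3. metrics error:

Did you change any of the default K8S parameters?

4. Calico issue:

Does the host have multiple NICs?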
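
Item 2 says make runtime chooses its install path by package manager rather than by OS version. kubeasy appears to be Ansible-based, and in Ansible the gathered fact ansible_pkg_mgr serves this purpose. The shell sketch below only illustrates the idea and is not kubeasy's code. The function name detect_pkg_mgr and the probe order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not kubeasy's implementation: choose how to install
# the container runtime from the package manager present on the host, so a
# new RHEL-like release (e.g. AlmaLinux 9.0) needs no per-version branch.
detect_pkg_mgr() {
  local mgr
  # Probe dnf before yum: on RHEL 8/9 derivatives, yum is only a shim for dnf.
  for mgr in dnf yum apt-get zypper; do
    if command -v "$mgr" >/dev/null 2>&1; then
      echo "$mgr"
      return 0
    fi
  done
  echo "unknown"
  return 1
}

# Usage:
#   case "$(detect_pkg_mgr)" in
#     dnf|yum) echo "RPM-based host" ;;
#     apt-get) echo "Debian-based host" ;;
#     zypper)  echo "SUSE-based host" ;;
#     *)       echo "unsupported host" >&2 ;;
#   esac
```

Branching on the tool rather than on a name/version pair means a newly released rebuild of RHEL falls into the existing dnf branch automatically.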

buxiaomo commented on July 27, 2024

> Also: I suggest enabling eBPF

eBPF has kernel requirements, so it is disabled by default. If you need it, enable it through custom configuration.

buxiaomo commented on July 27, 2024

metrics error: fixed.

buxiaomo commented on July 27, 2024

Calico issue: the Calico project has not tested against Kubernetes 1.25. Please wait for official Calico support or use a different network plugin.

ztdawang commented on July 27, 2024

No wonder my Calico deployment kept failing.

# I'm clearly using IPIP mode, yet it reports a BGP AS number error
2022-11-17 04:51:01.723 [INFO][9] startup/startup.go 701: No AS number configured on node resource, using global value

# The readiness probe never passes
Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

Readiness probe failed: 2022-11-17 04:16:55.353 [INFO][229] confd/health.go 180: Number of node(s) with BGP peering established = 2

Readiness probe failed: 2022-11-17 04:55:05.392 [INFO][1029] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.126.

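ztdawang commented on July 27, 2024

make download only downloads some programs into /tmp/k8s-component, which doesn't seem very useful.
I suggest having users download the required programs and images before installing the cluster (i.e. flesh out make download). Then distribute them to each node, specifying what the master nodes need and what the worker nodes need. Only start the remaining installation steps once everything is confirmed complete.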

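buxiaomo commented on July 27, 2024

I originally planned to use make download to package the required binaries. That would also have removed the duplicated code in scripts/version.py and plugins/filter/extension.py. On second thought, every release is packaged by a script anyway, so I kept download.yml. The command is not meant for users.

For an actual deployment, download the required packages from the release page.
Steps: https://github.com/buxiaomo/kubeasy#used-kubeasy-binary-or-kubeasy-registry-file
With those steps, a cluster can be deployed in about 5 minutes, with no long wait.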

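ztdawang commented on July 27, 2024

Thanks!
Following the steps above, I redeployed the cluster.
I switched the network plugin to flanneld, but its pods keep crashing too, with the error below. Before redeploying the cluster, I had already installed flanneld once and got the same error: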
reflector.go:347] github.com/flannel-io/flannel/subnet/kube/kube.go:421: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

ztdawang commented on July 27, 2024

ip link shows:
neither of the two worker nodes has a cni0 device or any veth virtual devices.

ip route shows:
no entries like 10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1

Is there something special about networking on AlmaLinux 9.0?
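
One way to narrow down the missing device and routes is to compare the pod subnet flannel leased to the node with what actually exists on the host. Flannel writes its lease as KEY=VALUE lines to /run/flannel/subnet.env; the "Wrote subnet file" lines in the logs in this thread show this. The helper name flannel_subnet is hypothetical.

Two caveats apply. First, cni0 is typically created by the bridge CNI plugin only when the first non-host-network pod is set up on that node, so a node running no such pods may legitimately lack it. Second, with DirectRouting=true (as in these logs), routes to nodes on the same L2 subnet go through the host NIC rather than flannel.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pod subnet flannel leased to this node.
# The subnet file holds lines such as FLANNEL_SUBNET=10.244.1.1/24.
flannel_subnet() {
  local file="${1:-/run/flannel/subnet.env}"
  sed -n 's/^FLANNEL_SUBNET=//p' "$file"
}

# Compare the lease with what exists on the node (run as root):
#   flannel_subnet                 # this node's lease
#   ip -d link show flannel.1      # VXLAN device created by the vxlan backend
#   ip -4 addr show cni0           # bridge; appears once a pod is scheduled here
#   ip route | grep '^10\.244\.'   # routes to other nodes' pod subnets
```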

buxiaomo commented on July 27, 2024

Errors appear a while after installation:

[root@localhost ~]# kubectl get no -o wide
NAME        STATUS   ROLES                  AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION                CONTAINER-RUNTIME
localhost   Ready    control-plane,master   4m3s   v1.25.4   172.16.0.205   <none>        AlmaLinux 9.0 (Emerald Puma)   5.14.0-70.13.1.el9_0.x86_64   containerd://1.6.8
[root@localhost ~]# kubectl get po -o wide -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE     IP             NODE        NOMINATED NODE   READINESS GATES
kube-system   coredns-77767ff8f8-2p8j8          1/1     Running   0             3m18s   10.244.0.2     localhost   <none>           <none>
kube-system   kube-flannel-ds-pz2fm             1/1     Running   0             3m20s   172.16.0.205   localhost   <none>           <none>
kube-system   metrics-server-74bbf889cb-pzl6k   1/1     Running   1 (63s ago)   3m19s   10.244.0.4     localhost   <none>           <none>
[root@localhost ~]# kubectl logs  -n kube-system kube-flannel-ds-pz2fm
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I1117 08:22:37.777669       1 main.go:204] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W1117 08:22:37.777751       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1117 08:22:37.786291       1 kube.go:126] Waiting 10m0s for node controller to sync
I1117 08:22:37.786319       1 kube.go:420] Starting kube subnet manager
I1117 08:22:38.787780       1 kube.go:133] Node controller sync successful
I1117 08:22:38.787823       1 main.go:224] Created subnet manager: Kubernetes Subnet Manager - localhost
I1117 08:22:38.787827       1 main.go:227] Installing signal handlers
I1117 08:22:38.787879       1 main.go:467] Found network config - Backend type: vxlan
I1117 08:22:38.787894       1 match.go:206] Determining IP address of default interface
I1117 08:22:38.788175       1 match.go:259] Using interface with name ens33 and address 172.16.0.205
I1117 08:22:38.788242       1 match.go:281] Defaulting external address to interface address (172.16.0.205)
I1117 08:22:38.788314       1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=true
I1117 08:22:38.816397       1 main.go:416] Current network or subnet (10.244.0.0/16, 10.244.0.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I1117 08:22:38.977946       1 main.go:342] Setting up masking rules
I1117 08:22:38.979644       1 main.go:364] Changing default FORWARD chain policy to ACCEPT
I1117 08:22:38.981301       1 main.go:379] Wrote subnet file to /run/flannel/subnet.env
I1117 08:22:38.981362       1 main.go:383] Running backend.
I1117 08:22:39.076378       1 vxlan_network.go:61] watching for new subnet leases
I1117 08:22:39.077515       1 main.go:404] Waiting for all goroutines to exit
I1117 08:22:39.095622       1 iptables.go:219] bootstrap done
I1117 08:22:39.095788       1 iptables.go:219] bootstrap done

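ztdawang commented on July 27, 2024

I had taken snapshots of my VMs and restored them to their initial state.

Shouldn't you test with several machines to get a more accurate result? For example, 1 master + 2 nodes.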

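buxiaomo commented on July 27, 2024

The number of machines doesn't matter. Multiple machines only come into play for cross-host communication, and yours doesn't even start. Please paste the kubectl logs output directly, and describe your network environment: do the hosts have multiple NICs?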

ztdawang commented on July 27, 2024

No, there is only one NIC.

/etc/NetworkManager/system-connections/ens192-4ef846e1-e2c5-4ba3-8f05-bacc65c6c029.nmconnection
# I renamed this file and changed the UUID, both in the file name and in its contents

kubectl -n kube-system logs -f kube-flannel-ds-6mdv6

Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I1117 08:30:19.478370 1 main.go:204] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[ens192] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W1117 08:30:19.478485 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1117 08:30:19.582909 1 kube.go:126] Waiting 10m0s for node controller to sync
I1117 08:30:19.583004 1 kube.go:420] Starting kube subnet manager
I1117 08:30:20.583661 1 kube.go:133] Node controller sync successful
I1117 08:30:20.583721 1 main.go:224] Created subnet manager: Kubernetes Subnet Manager - v-k8s-kubeasy-185
I1117 08:30:20.583729 1 main.go:227] Installing signal handlers
I1117 08:30:20.583874 1 main.go:467] Found network config - Backend type: vxlan
I1117 08:30:20.584225 1 match.go:259] Using interface with name ens192 and address 10.126.12.185
I1117 08:30:20.584271 1 match.go:281] Defaulting external address to interface address (10.126.12.185)
I1117 08:30:20.584347 1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=true
I1117 08:30:20.584901 1 main.go:342] Setting up masking rules
I1117 08:30:20.878246 1 main.go:364] Changing default FORWARD chain policy to ACCEPT
I1117 08:30:21.074158 1 main.go:379] Wrote subnet file to /run/flannel/subnet.env
I1117 08:30:21.074195 1 main.go:383] Running backend.
I1117 08:30:21.074323 1 vxlan_network.go:61] watching for new subnet leases
I1117 08:30:21.083003 1 main.go:404] Waiting for all goroutines to exit
I1117 08:30:21.173312 1 iptables.go:219] bootstrap done
I1117 08:30:21.475115 1 iptables.go:219] bootstrap done
I1117 08:31:24.378890 1 watch.go:39] context canceled, close receiver chan
I1117 08:31:24.378920 1 main.go:451] shutdownHandler sent cancel signal...
I1117 08:31:24.378920 1 vxlan_network.go:75] evts chan closed
I1117 08:31:24.379069 1 main.go:407] Exiting cleanly...

kubectl -n kube-system logs -f kube-flannel-ds-sfjmb
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I1117 08:31:50.886697 1 main.go:204] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[ens192] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W1117 08:31:50.886973 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1117 08:31:51.081319 1 kube.go:126] Waiting 10m0s for node controller to sync
I1117 08:31:51.081381 1 kube.go:420] Starting kube subnet manager
I1117 08:31:52.081585 1 kube.go:133] Node controller sync successful
I1117 08:31:52.081642 1 main.go:224] Created subnet manager: Kubernetes Subnet Manager - v-k8s-kubeasy-182
I1117 08:31:52.081651 1 main.go:227] Installing signal handlers
I1117 08:31:52.081894 1 main.go:467] Found network config - Backend type: vxlan
I1117 08:31:52.082390 1 match.go:259] Using interface with name ens192 and address 10.126.12.182
I1117 08:31:52.082470 1 match.go:281] Defaulting external address to interface address (10.126.12.182)
I1117 08:31:52.082565 1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=true
I1117 08:31:52.083096 1 main.go:342] Setting up masking rules
I1117 08:31:52.383284 1 main.go:364] Changing default FORWARD chain policy to ACCEPT
I1117 08:31:52.483265 1 main.go:379] Wrote subnet file to /run/flannel/subnet.env
I1117 08:31:52.483298 1 main.go:383] Running backend.
I1117 08:31:52.483657 1 vxlan_network.go:61] watching for new subnet leases
I1117 08:31:52.581093 1 main.go:404] Waiting for all goroutines to exit
I1117 08:31:52.781155 1 iptables.go:219] bootstrap done
I1117 08:31:52.786050 1 iptables.go:219] bootstrap done
I1117 08:40:18.603543 1 main.go:451] shutdownHandler sent cancel signal...
W1117 08:40:18.603896 1 reflector.go:347] github.com/flannel-io/flannel/subnet/kube/kube.go:421: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I1117 08:40:18.603935 1 watch.go:39] context canceled, close receiver chan
I1117 08:40:18.604014 1 vxlan_network.go:75] evts chan closed
I1117 08:40:18.604082 1 main.go:407] Exiting cleanly...

ztdawang commented on July 27, 2024

I searched online, and it seems to be related to the vxlan kernel module:
grep -i vxlan /boot/config-5.14.0-70.30.1.el9_0.x86_64
CONFIG_OPENVSWITCH_VXLAN=m
CONFIG_VXLAN=m

buxiaomo commented on July 27, 2024

grep -i vxlan /boot/config-$(uname -r)

flannel-io/flannel#1028
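
The check suggested above can be wrapped in a small helper. It reports whether the running kernel has VXLAN built in (y), available as a module (m), or not configured at all. This is only a sketch, and the function name kernel_vxlan_config is hypothetical. It reads nothing but the kernel build config, so a result of m still needs a check that the module actually loads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report CONFIG_VXLAN for the running kernel (or a given file).
# Prints "y" (built in), "m" (loadable module) or "not set".
kernel_vxlan_config() {
  local cfg="${1:-/boot/config-$(uname -r)}"
  local val
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg" >&2
    return 1
  fi
  # Match only CONFIG_VXLAN=, not e.g. CONFIG_OPENVSWITCH_VXLAN=.
  val=$(sed -n 's/^CONFIG_VXLAN=//p' "$cfg")
  echo "${val:-not set}"
}

# On a node, as root, when the result is "m":
#   modprobe vxlan && lsmod | grep -w vxlan
```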

ztdawang commented on July 27, 2024

Still not solved, sigh...

buxiaomo commented on July 27, 2024

Try recompiling the kernel module, or switch to Cilium ^.^

buxiaomo commented on July 27, 2024

I tried Cilium, and it doesn't work either. Try a different OS.

[root@vm118011 ~]# kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS                 RESTARTS        AGE   IP              NODE       NOMINATED NODE   READINESS GATES
kube-system   cilium-6ddmd                       0/1     CrashLoopBackOff       7 (3m20s ago)   22m   172.16.118.13   vm118013   <none>           <none>
kube-system   cilium-hnqsh                       0/1     CrashLoopBackOff       8 (4m49s ago)   22m   172.16.118.12   vm118012   <none>           <none>
kube-system   cilium-n5qx4                       0/1     CrashLoopBackOff       10 (110s ago)   22m   172.16.118.11   vm118011   <none>           <none>
kube-system   cilium-operator-5bd8b7c976-bq8g7   0/1     CrashLoopBackOff       7 (4m5s ago)    22m   172.16.118.12   vm118012   <none>           <none>
kube-system   cilium-operator-5bd8b7c976-glvk8   0/1     CrashLoopBackOff       7 (3m50s ago)   22m   172.16.118.13   vm118013   <none>           <none>
kube-system   hubble-relay-77b67649b5-bzlvn      0/1     CreateContainerError   4 (4m52s ago)   22m   10.244.0.188    vm118012   <none>           <none>
kube-system   hubble-ui-7ff788f7f4-ql42p         0/2     Error                  0               22m   10.244.0.48     vm118012   <none>           <none>

ztdawang commented on July 27, 2024

I don't want to use CentOS 8; I want the latest 9.0 release.

ztdawang commented on July 27, 2024

https://www.cnblogs.com/vyunc/p/16526849.html
Judging by his deployment, it seems to work fine.

buxiaomo commented on July 27, 2024

Please use Rocky Linux 9.0 instead of AlmaLinux 9.0.
You can deploy K8S 1.24 by running git checkout v1.24.

ztdawang commented on July 27, 2024

Found the cause; the problem is solved!

buxiaomo commented on July 27, 2024

What was the cause?

ztdawang commented on July 27, 2024

cgroup v2

buxiaomo commented on July 27, 2024

Could you post the fix? It would help others who hit the same problem.

ztdawang commented on July 27, 2024

vim /etc/containerd/config.toml
# in the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] section:
SystemdCgroup = true
systemctl restart containerd
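
The manual edit above can be scripted. This sketch assumes a containerd 1.x config generated by containerd config default, where SystemdCgroup = false already sits under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]. It also relies on GNU sed for -i and -E. The function name enable_systemd_cgroup is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch runc's cgroup driver to systemd in a containerd
# 1.x config that already contains a SystemdCgroup line.
enable_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}"
  if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=' "$cfg"; then
    echo "SystemdCgroup not found in $cfg; add it under the runc options table" >&2
    return 1
  fi
  # Rewrite the value in place, keeping the original indentation.
  sed -i -E 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*).*/\1true/' "$cfg"
}

# Usage (as root):
#   enable_systemd_cgroup && systemctl restart containerd
```

The kubelet's cgroup driver (cgroupDriver in its configuration file) should also be systemd, so that the kubelet and containerd agree.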

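buxiaomo commented on July 27, 2024

OK, I'll see whether I can improve the containerd configuration. From the test I just ran, it looks like it might be a containerd issue, but I haven't had time to test it fully yet. 👍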

ztdawang commented on July 27, 2024

You just need to add a check for it: the RHEL 9 family uses cgroup v2 by default.
https://zhuanlan.zhihu.com/p/555252832
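
Such a check can look at the host directly instead of matching distribution names. On a host booted with the unified cgroup v2 hierarchy, /sys/fs/cgroup is a cgroup2 filesystem. Hosts using cgroup v1 (or the hybrid layout) mount a tmpfs there instead. The following is a minimal sketch using GNU stat, and the function name cgroup_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which cgroup hierarchy the host booted with.
cgroup_version() {
  local mnt="${1:-/sys/fs/cgroup}"
  # GNU stat: -f reports on the filesystem, %T prints its type name.
  case "$(stat -fc %T "$mnt" 2>/dev/null)" in
    cgroup2fs) echo v2 ;;      # unified hierarchy (RHEL 9 family default)
    tmpfs)     echo v1 ;;      # legacy or hybrid layout
    *)         echo unknown ;;
  esac
}

# Example:
#   [ "$(cgroup_version)" = v2 ] && echo "cgroup v2: set SystemdCgroup = true for containerd"
```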

buxiaomo commented on July 27, 2024

Yes, I tested cgroup v2 issues on newer versions before. One person can't test everything, though, so I probably missed this.

ztdawang commented on July 27, 2024

Doing this alone really isn't easy. Thank you for all your work!

buxiaomo commented on July 27, 2024

It may have been caused by the following setting in /etc/systemd/system/containerd.service. After removing it, everything works:

Slice=podruntime.slice
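
On an install whose unit file still carries this directive, the same change can be applied by hand. In this sketch, the function name remove_unit_slice is hypothetical. It deletes only an exact Slice=podruntime.slice line from the unit file named above. systemd must then reload its units before the restart picks up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the Slice=podruntime.slice line from a unit file,
# leaving every other directive untouched.
remove_unit_slice() {
  local unit="${1:-/etc/systemd/system/containerd.service}"
  sed -i '/^Slice=podruntime\.slice[[:space:]]*$/d' "$unit"
}

# Usage (as root):
#   remove_unit_slice && systemctl daemon-reload && systemctl restart containerd
```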

buxiaomo commented on July 27, 2024

Fixed.
