
Race condition when scripting kubeadm deb installs, kubelet initializes normally before it uses the kubeadm dropin (about release, CLOSED, 29 comments)

errordeveloper avatar errordeveloper commented on July 24, 2024
Race condition when scripting kubeadm deb installs, kubelet initializes normally before it uses the kubeadm dropin

from release.

Comments (29)

dgoodwin avatar dgoodwin commented on July 24, 2024 1

Yeah it's the rpm for sure, it even claims ownership of it; not sure how this was missed. Regardless, given the complaints about cloud providers requiring /etc/kubernetes/cloud-config.json, I'm going to make both the pre-flight checks and reset stop assuming ownership of all of /etc/kubernetes and instead just clean out manifests, pki, and delete the admin/kubelet.conf files.
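The targeted cleanup described above could be sketched roughly as follows (illustrative only; the real logic lives in kubeadm's Go code, and the directory argument exists only so the sketch can be exercised against a scratch directory):

```shell
#!/bin/sh
# Sketch of a reset that removes only kubeadm-managed files instead of
# assuming ownership of all of /etc/kubernetes. The directory argument
# is for illustration/testing; the real path is /etc/kubernetes.
kubeadm_reset_etc() {
    kube_dir="${1:-/etc/kubernetes}"
    # Directories kubeadm itself populates.
    rm -rf "$kube_dir/manifests" "$kube_dir/pki"
    # Kubeconfig files kubeadm writes.
    rm -f "$kube_dir/admin.conf" "$kube_dir/kubelet.conf"
    # Anything else in the directory (e.g. cloud-config.json) is left alone.
}
```

Run against a scratch directory, this removes manifests, pki, and the two kubeconfig files while leaving unrelated files such as cloud-config.json in place.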


dgoodwin avatar dgoodwin commented on July 24, 2024 1

This was part of how kubeadm was designed to work: we need to configure kubelet before it can fully launch, services tend to start on install in Debian, and we didn't really want kubeadm execution to do things specific to systemd mid-way through. There was a lot of discussion, but we landed at crash looping while waiting for our config to appear. That is the behavior out of the box with the packages; if someone is rolling their own installation but trying to use kubeadm, they could indeed start hitting problems like this. But IMO we do kind of need to configure the system and systemd in specific ways for a kubeadm opinionated deployment, and the distro packages were a big part of that.

Perhaps we could do an additional pre-flight check ensuring kubelet is crash looping and not running however?
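Such a check could be sketched like this (hypothetical; kubeadm's actual preflight checks are written in Go, and the state strings are the ones systemd reports, e.g. via systemctl show kubelet -p ActiveState -p SubState):

```shell
#!/bin/sh
# Hypothetical preflight helper: given the ActiveState and SubState that
# systemd reports for kubelet.service, decide whether the unit looks like
# it is crash looping (restart holdoff) rather than fully running.
kubelet_is_crashlooping() {
    active_state="$1"   # e.g. "activating", "active", "failed"
    sub_state="$2"      # e.g. "auto-restart", "running"
    case "$active_state/$sub_state" in
        activating/auto-restart) return 0 ;;  # waiting to restart: crash loop
        failed/*)                return 0 ;;  # exited and gave up: not running
        *)                       return 1 ;;  # anything else, incl. active/running
    esac
}
```

A preflight check along these lines could then refuse to proceed when the unit is in active/running state, since that suggests kubelet started without the kubeadm dropin.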


pesho avatar pesho commented on July 24, 2024

The current check (directory is not empty) is too broad; I've had to skip preflight checks because of it. A better approach would be to check only for the presence of files which kubeadm generates, and ignore other files within these directories.


errordeveloper avatar errordeveloper commented on July 24, 2024

@dgoodwin I'm seeing this on CentOS also, I think it's kubelet and not the packages... We should probably allow empty directory for the time being.

@pesho I see what you are saying, but I don't see a good reason to do this right now, as files we manage tend to evolve.


dgoodwin avatar dgoodwin commented on July 24, 2024

@marun hit this. We thought it was newer versions of rpm causing the difference between Fedora, where the problem was surfacing, and CentOS, where it was not.

Agreed, we probably should be more tolerant in pre-flight checks. Given a point raised on Slack today that cloud-config.json by default must live in /etc/kubernetes, this too causes a problem, so I'm tempted to agree with @pesho that we should be checking more specifically for the files and directories we will write.

@errordeveloper can you paste output of rpm --version and cat /etc/redhat-release for me?


dgoodwin avatar dgoodwin commented on July 24, 2024

Will try to look into this first thing tomorrow.


dgoodwin avatar dgoodwin commented on July 24, 2024

@errordeveloper I can reproduce, but it does appear to be the rpm and not kubelet. I.e. I can yum install and /etc/kubernetes/manifests appears; if I remove it and start kubelet, it does not reappear. Re-install the rpms and it's back. All tests done with 1.5.0-1.alpha.1.409.714f816a349e79 from unstable.


errordeveloper avatar errordeveloper commented on July 24, 2024

@dgoodwin as you asked:

[vagrant@k8s1 ~]$ rpm --version
RPM version 4.11.3
[vagrant@k8s1 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core) 
[vagrant@k8s1 ~]$ 


errordeveloper avatar errordeveloper commented on July 24, 2024

@dgoodwin I also looked at the kubelet code, and it doesn't appear to call mkdir for the manifest path; it does for some other things, but not this.


errordeveloper avatar errordeveloper commented on July 24, 2024

SGTM



luxas avatar luxas commented on July 24, 2024

@dgoodwin Great if you fix the preflight checks, be sure to ping me on it when ready!


luxas avatar luxas commented on July 24, 2024

This will be fixed with kubernetes/kubernetes#35632


dgoodwin avatar dgoodwin commented on July 24, 2024

PR merged, this can be closed now.


luxas avatar luxas commented on July 24, 2024

Thanks


codablock avatar codablock commented on July 24, 2024

I'm currently working with a custom built master version of kubeadm and the pre-flight checks still fail for /var/lib/kubelet.

When doing a "kubeadm reset", the contents of /var/lib/kubelet disappear as expected.
"kubeadm init" then complains that kubelet is not running (it was running before the reset), which requires me to start the kubelet service manually. After starting the kubelet service, /var/lib/kubelet contains some empty directories, and thus "kubeadm init" fails again.

Should I open a new issue?


luxas avatar luxas commented on July 24, 2024

Just wait some hours; we will probably release the new version today, which has these fixes in it.


codablock avatar codablock commented on July 24, 2024

@luxas I'm using a custom built kubernetes master from today. Is the kubeadm tool developed in a different repository? As kubernetes/kubernetes#35632 got merged a few days ago, I'd expect the fix to be in my local build already?


codablock avatar codablock commented on July 24, 2024

Some additional info: I'm on CentOS 7.2. Kubelet is also from latest master.

Contents of /var/lib/kubelet after service start:
[root@ma-kub8ms0 devops]# find /var/lib/kubelet/
/var/lib/kubelet/
/var/lib/kubelet/pods
/var/lib/kubelet/plugins


dgoodwin avatar dgoodwin commented on July 24, 2024

/var/lib/kubelet was still expected to be empty; if a crash looping kubelet is now creating those directories, that will indeed still trigger a pre-flight check error.

However I cannot reproduce, I compiled kubelet off master this morning:

(root@centos1 ~) $ kubeadm reset                          
Running pre-flight checks
Stopping the kubelet service...
Unmounting directories in /var/lib/kubelet...
Deleting the stateful directories: [/var/lib/kubelet /var/lib/etcd /etc/kubernetes]
Stopping all running docker containers...
failed to stop the running containers
(root@centos1 ~) $ ll /var/lib/kubelet
ls: cannot access /var/lib/kubelet: No such file or directory
(root@centos1 ~) $ systemctl start kubelet
(root@centos1 ~) $ journalctl -fu kubelet
-- Logs begin at Tue 2016-11-01 08:14:10 EDT. --
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: Unit kubelet.service entered failed state.
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: kubelet.service failed.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Started Kubernetes Kubelet Server.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Starting Kubernetes Kubelet Server...
Nov 01 08:16:14 centos1.aos.example.com kubelet[3742]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Unit kubelet.service entered failed state.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service failed.
^C
(root@centos1 ~) $ ll /var/lib/kubelet      
ls: cannot access /var/lib/kubelet: No such file or directory

Init then works fine.

Are you using the standard systemd config where we crash loop until kubeadm writes a config, or is your kubelet actually running?


codablock avatar codablock commented on July 24, 2024

kubelet was indeed running while I had these problems. At this time I did not have the kubeadm RPM package installed. Now I'm installing it before overwriting the kubeadm binary with my custom built binary.

After checking the systemd config as hinted by @dgoodwin, I assume the missing /etc/systemd/system/kubelet.service.d/10-kubeadm.conf is the cause, due to the RPM package not being installed. 10-kubeadm.conf makes the kubelet service crash loop, which makes the pre-flight checks work again.

Is the dependency on a crash looping kubelet service really a good thing? I can imagine other people also having problems figuring out what's wrong.
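For reference, the dropin in question works roughly like this (a simplified sketch; the exact flags in the packaged 10-kubeadm.conf have varied between releases):

```ini
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (simplified sketch)
[Service]
# Point kubelet at the kubeconfig that kubeadm will write. Until that
# file exists, kubelet exits immediately and systemd keeps restarting
# it, producing the intended crash loop.
Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf"
# Clear the packaged ExecStart, then replace it.
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS
```

Without this dropin, kubelet starts with no arguments, stays running, and creates the directories under /var/lib/kubelet that then trip the preflight checks.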


codablock avatar codablock commented on July 24, 2024

Additional pre-flight checks would have helped me. So, thumbs up :)


jbeda avatar jbeda commented on July 24, 2024

We just hit this (@captainshar) when trying to script ubuntu. Using latest packages (v1.5.1) kubeadm failed due to empty directories under /var/kubelet. It is unclear what created those directories. No mention in the kubelet logs (it is crash looping as expected).

We will work around this by having the scripted stuff do a kubeadm reset before running kubeadm init. But it shouldn't be necessary.

We'll update if we get more details about what is going on and if we start seeing this more.


luxas avatar luxas commented on July 24, 2024

@jbeda It's a race condition with the debs when scripting the install, it seems.
Somehow the kubelet service has just enough time to start kubelet normally (/usr/sbin/kubelet with no params); that makes kubelet create its directories before the kubeadm-specific service file kicks in and makes kubelet crash loop.

@dgoodwin Did you ever find a solution to this?
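One workaround when scripting the deb install is to wait for the dropin to exist and then force kubelet through a daemon-reload and restart, so it cannot keep running with the no-argument command line. A sketch (the wait_for_file helper is illustrative, not part of any kubeadm tooling):

```shell
#!/bin/sh
# Illustrative workaround for the install race: wait until the kubeadm
# dropin exists, then reload systemd and restart kubelet so it picks up
# the dropin instead of continuing to run with no arguments.
wait_for_file() {
    path="$1"
    timeout="${2:-30}"   # seconds to wait before giving up
    i=0
    while [ ! -e "$path" ]; do
        i=$((i + 1))
        [ "$i" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}

# Usage in an install script (not executed here):
#   apt-get install -y kubelet kubeadm
#   wait_for_file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf \
#       && systemctl daemon-reload \
#       && systemctl restart kubelet
```

After the restart, kubelet crash loops on the missing /etc/kubernetes/kubelet.conf as intended, and kubeadm init can proceed.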


luxas avatar luxas commented on July 24, 2024

(What we're talking about now is not the same issue as above, but since people have chosen this thread to talk about it, let's continue)


GheRivero avatar GheRivero commented on July 24, 2024

There is a "race condition" when kubelet starts: most of the time, it dies before doing anything because of the missing /etc/kubernetes/kubelet.conf (expected behaviour):

Jan 30 15:46:00 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 15:46:00 node3 kubelet[21771]: I0130 15:46:00.455213 21771 feature_gate.go:181] feature gates: map[]
Jan 30 15:46:00 node3 kubelet[21771]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Unit entered failed state.
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Failed with result 'exit-code'.

But other times, it gets further into initialization before dying:
Jan 30 11:24:02 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.420597 3851 feature_gate.go:181] feature gates: map[]
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.420652 3851 server.go:400] No API client: no api servers specified
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.494685 3851 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.494703 3851 docker.go:376] Start docker client with request timeout=2m0s
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.499088 3851 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.504346 3851 manager.go:143] cAdvisor running in container: "/system.slice/kubelet.service"
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.534432 3851 manager.go:151] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.535897 3851 fs.go:117] Filesystem partitions: map[/dev/vda1:{mountpoint:/var/lib/docker/aufs major:253 minor:1 fsType:ext
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.536966 3851 manager.go:198] Machine: {NumCores:4 CpuFrequency:3491914 MemoryCapacity:8371175424 MachineID:861e2114926fbf9
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.537199 3851 manager.go:204] Version: {KernelVersion:4.4.0-59-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.541948 3851 container_manager_linux.go:205] Running with swap on is not supported, please disable swap! This will be a fa
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.542056 3851 server.go:669] No api server defined - no events will be sent to API server.
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.543399 3851 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.543417 3851 kubelet.go:477] Hairpin mode set to "hairpin-veth"
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.545391 3851 docker_manager.go:257] Setting dockerRoot to /var/lib/docker
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.545400 3851 docker_manager.go:260] Setting cgroupDriver to cgroupfs
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546041 3851 server.go:770] Started kubelet v1.5.2
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546103 3851 server.go:123] Starting to listen on 0.0.0.0:10250
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.546261 3851 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.546547 3851 kubelet.go:1224] No api server defined - no node status update will be sent.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546918 3851 kubelet_node_status.go:204] Setting node annotation to enable volume controller attach/detach
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.547936 3851 kubelet.go:1634] Failed to check if disk space is available for the runtime: failed to get fs info for "runti
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.548080 3851 kubelet.go:1642] Failed to check if disk space is available on the root partition: failed to get fs info for
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548794 3851 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548942 3851 status_manager.go:125] Kubernetes client is nil, not starting status manager.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.549079 3851 kubelet.go:1714] Starting kubelet main sync loop.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.549505 3851 kubelet.go:1725] skipping pod synchronization - [container runtime is down]
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548983 3851 volume_manager.go:242] Starting Kubelet Volume Manager
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551250 3851 factory.go:295] Registering Docker factory
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.551270 3851 manager.go:247] Registration of the rkt container factory failed: unable to communicate with Rkt api service:
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551277 3851 factory.go:54] Registering systemd factory
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551414 3851 factory.go:86] Registering Raw factory
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.553917 3851 manager.go:1106] Started watching for new ooms in manager
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.554957 3851 oomparser.go:185] oomparser using systemd
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.555505 3851 manager.go:288] Starting recovery of all containers
Jan 30 11:24:02 node3 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Jan 30 11:24:02 node3 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 kubelet[3897]: I0130 11:24:02.657147 3897 feature_gate.go:181] feature gates: map[]
Jan 30 11:24:02 node3 kubelet[3897]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory

It's in this case that it creates the /var/lib/kubelet/{plugins,pods} dirs, causing the kubeadm preflight checks to fail.


fejta-bot avatar fejta-bot commented on July 24, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale


fejta-bot avatar fejta-bot commented on July 24, 2024

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale


errordeveloper avatar errordeveloper commented on July 24, 2024

@luxas should we keep this open? if so, does it still belong to this repo?


fejta-bot avatar fejta-bot commented on July 24, 2024

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


