Yeah, it's the rpm for sure; it even claims ownership of the directory, so I'm not sure how this was missed. Regardless, given the complaints about cloud providers requiring /etc/kubernetes/cloud-config.json, I'm going to make both the pre-flight checks and reset stop assuming ownership of all of /etc/kubernetes, and instead just clean out manifests, pki, and delete the admin/kubelet.conf files.
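The narrowed cleanup described above could be sketched like this (a sketch only; `reset_kubeadm_files` is a hypothetical helper, and the path list is illustrative rather than kubeadm's exact implementation):

```shell
#!/bin/sh
# Sketch of a narrowed "reset": instead of deleting all of /etc/kubernetes,
# remove only the pieces kubeadm itself wrote, leaving foreign files such as
# cloud-config.json untouched. (Hypothetical helper, not kubeadm's code.)
reset_kubeadm_files() {
    etc="$1"   # normally /etc/kubernetes
    rm -rf "$etc/manifests" "$etc/pki"
    rm -f "$etc/admin.conf" "$etc/kubelet.conf"
}
```

This keeps a cloud provider's config file in place while still clearing everything a subsequent `kubeadm init` would want to regenerate.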
This was part of how kubeadm was designed to work: we need to configure the kubelet before it can fully launch, services tend to start on install on Debian, and we didn't really want kubeadm to do systemd-specific things mid-execution. There was a lot of discussion, but we landed on crash looping until our config appears. That is the out-of-the-box behavior with the packages. If someone is rolling their own installation but trying to use kubeadm, they could indeed start hitting problems like this, but IMO we do need to configure the system and systemd in specific ways for a kubeadm-opinionated deployment, and the distro packages were a big part of that.
Perhaps we could add a pre-flight check ensuring the kubelet is crash looping rather than running, however?
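Such a check might be sketched as follows (`kubelet_is_crashlooping` is a hypothetical helper; it takes the systemd state strings as arguments so the logic can be exercised without a live systemd — on a real node they would come from `systemctl show kubelet -p ActiveState -p SubState`):

```shell
#!/bin/sh
# Hypothetical preflight helper: decide whether the kubelet is "crash
# looping" (systemd keeps restarting it waiting for config) rather than
# fully running, which is the state kubeadm init expects to find.
kubelet_is_crashlooping() {
    active_state="$1"   # e.g. "activating" or "active"
    sub_state="$2"      # e.g. "auto-restart" or "running"
    if [ "$active_state" = "activating" ] && [ "$sub_state" = "auto-restart" ]; then
        return 0  # restart loop in progress, as expected before init
    fi
    return 1      # running normally or stopped: preflight should warn
}
```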
The current check (directory is not empty) is too broad; I've had to skip pre-flight checks because of it. A better approach would be to check only for the presence of files that kubeadm generates, and to ignore other files within these directories.
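A narrower check along those lines might look like this (a sketch only; `kubeadm_managed_conflicts` is a hypothetical helper, and the file list is illustrative, not the authoritative set kubeadm generates):

```shell
#!/bin/sh
# Sketch of a narrower preflight check: instead of failing when the
# directory is merely non-empty, fail only if paths kubeadm itself would
# write are already present. Foreign files (e.g. cloud-config.json) pass.
kubeadm_managed_conflicts() {
    dir="$1"   # normally /etc/kubernetes
    conflicts=""
    for f in manifests pki admin.conf kubelet.conf; do
        [ -e "$dir/$f" ] && conflicts="$conflicts $f"
    done
    echo "$conflicts"       # report what clashed, if anything
    [ -z "$conflicts" ]     # succeed only when nothing clashed
}
```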
@dgoodwin I'm seeing this on CentOS also; I think it's the kubelet and not the packages. We should probably allow an empty directory for the time being.
@pesho I see what you are saying, but I don't see a good reason to do this right now, as the files we manage tend to evolve.
@marun hit this; we thought newer versions of rpm caused the difference between Fedora, where the problem was surfacing, and CentOS, where I was not seeing it.
Agreed, we should probably be more tolerant in the pre-flight checks. Given a point raised on Slack today that cloud-config.json must by default live in /etc/kubernetes, which also causes a problem, I'm tempted to agree with @pesho that we should check specifically for the files and directories we will write.
@errordeveloper can you paste the output of rpm --version and cat /etc/redhat-release for me?
Will try to look into this first thing tomorrow.
@errordeveloper I can reproduce, but it does appear to be the rpm, and not the kubelet. I.e. I can yum install and /etc/kubernetes/manifests appears; if I remove it and start the kubelet, it does not reappear. Re-install the rpms and it's back. All tests done with 1.5.0-1.alpha.1.409.714f816a349e79 from unstable.
@dgoodwin as you asked:
[vagrant@k8s1 ~]$ rpm --version
RPM version 4.11.3
[vagrant@k8s1 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[vagrant@k8s1 ~]$
@dgoodwin I also looked at the kubelet code, and it doesn't appear to call mkdir for the manifest path; it does for some other things, but not this one.
SGTM
@dgoodwin Great, if you fix the pre-flight checks, be sure to ping me when it's ready!
This will be fixed with kubernetes/kubernetes#35632
PR merged, this can be closed now.
Thanks
I'm currently working with a custom built master version of kubeadm and the pre-flight checks still fail for /var/lib/kubelet.
When doing a "kubeadm reset", the contents of /var/lib/kubelet disappear as expected.
"kubeadm init" then complains that kubelet is not running (it was running before reset) which requires me to start the kubelet service manually. After starting the kubelet service, the /var/lib/kubelet contains some empty directories and thus "kubeadm init" is failing again.
Should I open a new issue?
Just wait a few hours; we will probably release the new version today, which has these fixes in it.
@luxas I'm using a custom-built kubernetes master from today. Is the kubeadm tool developed in a different repository? Since kubernetes/kubernetes#35632 was merged a few days ago, I'd expect the fix to already be in my local build?
Some additional info: I'm on CentOS 7.2. Kubelet is also from latest master.
Contents of /var/lib/kubelet after service start:
[root@ma-kub8ms0 devops]# find /var/lib/kubelet/
/var/lib/kubelet/
/var/lib/kubelet/pods
/var/lib/kubelet/plugins
/var/lib/kubelet was still expected to be empty; if the crash-looping kubelet is now creating those directories, that will indeed still trigger a pre-flight check error.
However, I cannot reproduce; I compiled the kubelet off master this morning:
(root@centos1 ~) $ kubeadm reset
Running pre-flight checks
Stopping the kubelet service...
Unmounting directories in /var/lib/kubelet...
Deleting the stateful directories: [/var/lib/kubelet /var/lib/etcd /etc/kubernetes]
Stopping all running docker containers...
failed to stop the running containers
(root@centos1 ~) $ ll /var/lib/kubelet
ls: cannot access /var/lib/kubelet: No such file or directory
(root@centos1 ~) $ systemctl start kubelet
(root@centos1 ~) $ journalctl -fu kubelet
-- Logs begin at Tue 2016-11-01 08:14:10 EDT. --
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: Unit kubelet.service entered failed state.
Nov 01 08:16:04 centos1.aos.example.com systemd[1]: kubelet.service failed.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Started Kubernetes Kubelet Server.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Starting Kubernetes Kubelet Server...
Nov 01 08:16:14 centos1.aos.example.com kubelet[3742]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: Unit kubelet.service entered failed state.
Nov 01 08:16:14 centos1.aos.example.com systemd[1]: kubelet.service failed.
^C
(root@centos1 ~) $ ll /var/lib/kubelet
ls: cannot access /var/lib/kubelet: No such file or directory
Init then works fine.
Are you using the standard systemd config where we crash loop until kubeadm writes a config, or is your kubelet actually running?
kubelet was indeed running while I had these problems. At that time I did not have the kubeadm RPM package installed; now I install it before overwriting the kubeadm binary with my custom-built one.
After checking the systemd config as hinted by @dgoodwin, I assume the cause was the missing /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, due to the RPM package not being installed. 10-kubeadm.conf makes the kubelet service crash loop, which makes the pre-flight checks work again.
Is the dependency on a crash-looping kubelet service really a good thing? I can imagine other people also having trouble figuring out what's wrong.
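For context, the drop-in the packages install looks roughly like this (an approximate reconstruction, not an exact copy of the shipped file; the exact flags varied between releases). Pointing the kubelet at a kubeconfig that does not exist yet is what produces the expected crash loop until kubeadm writes it:

```
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (approximate)
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true
```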
Additional pre-flight checks would have helped me. So, thumbs up :)
We just hit this (@captainshar) when trying to script Ubuntu. Using the latest packages (v1.5.1), kubeadm failed due to empty directories under /var/lib/kubelet. It is unclear what created those directories; there is no mention in the kubelet logs (it is crash looping as expected).
We will work around this by having the scripted setup do a kubeadm reset before running kubeadm init, but it shouldn't be necessary.
We'll update if we get more details about what is going on, or if we start seeing this more.
@jbeda It seems to be a race condition with the debs when scripting the install. Somehow the kubelet service has just enough time to start the kubelet normally (/usr/sbin/kubelet with no params), which lets the kubelet create its directories before the kubeadm-specific service file kicks in and makes it crash loop.
@dgoodwin Did you ever find a solution to this?
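A scripted guard against the stale directories described in this thread might be sketched as follows (`clean_stale_kubelet_dirs` is a hypothetical helper; the pods/plugins names are the ones reported above, and only empty directories are touched):

```shell
#!/bin/sh
# Hypothetical workaround for the deb race: if the kubelet started once
# with default flags and left only empty directories under its state dir,
# remove them so kubeadm's preflight check passes. rmdir refuses to delete
# a non-empty directory, so real state is left alone.
clean_stale_kubelet_dirs() {
    statedir="$1"   # normally /var/lib/kubelet
    [ -d "$statedir" ] || return 0
    for d in "$statedir/pods" "$statedir/plugins"; do
        [ -d "$d" ] && rmdir "$d" 2>/dev/null
    done
    return 0
}
```

Running this between stopping the kubelet and invoking kubeadm init would cover the race without resorting to a full kubeadm reset.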
(What we're talking about now is not the same issue as above, but since people have chosen this thread to talk about it, let's continue.)
There is "race condition" when kubelet starts: Most of the times, it dies before doing anything because of the missing /etc/kubernetes/kubelet.conf (expected behaviour)
Jan 30 15:46:00 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 15:46:00 node3 kubelet[21771]: I0130 15:46:00.455213 21771 feature_gate.go:181] feature gates: map[]
Jan 30 15:46:00 node3 kubelet[21771]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Unit entered failed state.
Jan 30 15:46:00 node3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
But other times, it gets further into startup before dying:
Jan 30 11:24:02 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.420597 3851 feature_gate.go:181] feature gates: map[]
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.420652 3851 server.go:400] No API client: no api servers specified
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.494685 3851 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.494703 3851 docker.go:376] Start docker client with request timeout=2m0s
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.499088 3851 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.504346 3851 manager.go:143] cAdvisor running in container: "/system.slice/kubelet.service"
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.534432 3851 manager.go:151] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.535897 3851 fs.go:117] Filesystem partitions: map[/dev/vda1:{mountpoint:/var/lib/docker/aufs major:253 minor:1 fsType:ext
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.536966 3851 manager.go:198] Machine: {NumCores:4 CpuFrequency:3491914 MemoryCapacity:8371175424 MachineID:861e2114926fbf9
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.537199 3851 manager.go:204] Version: {KernelVersion:4.4.0-59-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.541948 3851 container_manager_linux.go:205] Running with swap on is not supported, please disable swap! This will be a fa
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.542056 3851 server.go:669] No api server defined - no events will be sent to API server.
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.543399 3851 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.543417 3851 kubelet.go:477] Hairpin mode set to "hairpin-veth"
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.545391 3851 docker_manager.go:257] Setting dockerRoot to /var/lib/docker
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.545400 3851 docker_manager.go:260] Setting cgroupDriver to cgroupfs
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546041 3851 server.go:770] Started kubelet v1.5.2
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546103 3851 server.go:123] Starting to listen on 0.0.0.0:10250
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.546261 3851 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.546547 3851 kubelet.go:1224] No api server defined - no node status update will be sent.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.546918 3851 kubelet_node_status.go:204] Setting node annotation to enable volume controller attach/detach
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.547936 3851 kubelet.go:1634] Failed to check if disk space is available for the runtime: failed to get fs info for "runti
Jan 30 11:24:02 node3 kubelet[3851]: E0130 11:24:02.548080 3851 kubelet.go:1642] Failed to check if disk space is available on the root partition: failed to get fs info for
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548794 3851 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548942 3851 status_manager.go:125] Kubernetes client is nil, not starting status manager.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.549079 3851 kubelet.go:1714] Starting kubelet main sync loop.
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.549505 3851 kubelet.go:1725] skipping pod synchronization - [container runtime is down]
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.548983 3851 volume_manager.go:242] Starting Kubelet Volume Manager
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551250 3851 factory.go:295] Registering Docker factory
Jan 30 11:24:02 node3 kubelet[3851]: W0130 11:24:02.551270 3851 manager.go:247] Registration of the rkt container factory failed: unable to communicate with Rkt api service:
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551277 3851 factory.go:54] Registering systemd factory
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.551414 3851 factory.go:86] Registering Raw factory
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.553917 3851 manager.go:1106] Started watching for new ooms in manager
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.554957 3851 oomparser.go:185] oomparser using systemd
Jan 30 11:24:02 node3 kubelet[3851]: I0130 11:24:02.555505 3851 manager.go:288] Starting recovery of all containers
Jan 30 11:24:02 node3 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Jan 30 11:24:02 node3 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 30 11:24:02 node3 kubelet[3897]: I0130 11:24:02.657147 3897 feature_gate.go:181] feature gates: map[]
Jan 30 11:24:02 node3 kubelet[3897]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
It's in this case that it creates the /var/lib/kubelet/{plugins,pods} dirs, causing the kubeadm preflight checks to fail.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with a /lifecycle frozen comment.
If this issue is safe to close now, please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now, please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
@luxas should we keep this open? If so, does it still belong in this repo?
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close