
[Release-1.27] - etcd snapshot controller thrashes on etcdsnapshotfile management when server is run with `--disable-agent` (k3s, closed)

brandond commented on July 20, 2024
[Release-1.27] - etcd snapshot controller thrashes on etcdsnapshotfile management when server is run with `--disable-agent`


Comments (1)

aganesh-suse commented on July 20, 2024

Validated on release-1.27 branch with commit 2d48b19

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
disable-agent: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server
etcd-snapshot-retention: 2
etcd-snapshot-schedule-cron: "* * * * *"
etcd-s3: true
etcd-s3-access-key: xxxx
etcd-s3-secret-key: xxxx
etcd-s3-bucket: xxxx
etcd-s3-folder: xxxx
etcd-s3-region: xxxx

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='2d48b19624efec5082f1864e64f3a13ca4124354' sh -s - server
  3. Check the journal logs for reconciliation error messages:
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
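
The two checks in the last step can be wrapped in a small helper so the same filter runs against both the replication and the validation build. This is only a sketch: the function name `count_snapshot_errors` is hypothetical (it is not part of k3s), and it assumes the journal text is piped in on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of k3s: counts the snapshot-related log
# lines that the two grep commands in the testing steps look for.
# Reads journal text on stdin and prints "<requeue_count> <error_count>".
count_snapshot_errors() {
    awk '
        # Same string as the first grep (ConfigMap reconcile requeue).
        /Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing/ { requeue++ }
        # Same filter as "grep error | grep snapshot" (case-sensitive).
        /error/ && /snapshot/ { errors++ }
        END { printf "%d %d\n", requeue, errors }
    '
}

# Usage (assumes the unit is named k3s, as in the steps above):
#   sudo journalctl -u k3s --no-pager | count_snapshot_errors
```

Like the original `grep error | grep snapshot`, the second counter also picks up debug-level lines that merely contain the word "error", so a non-zero count still needs a manual look at the matching lines.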

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.27.12+k3s1 (78ad5756)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
Apr 12 01:44:41 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:41Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 01:45:13 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:45:13Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 01:45:44 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:45:44Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 01:46:17 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:46:17Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 12 01:44:30 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:30Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-97\" not found"
Apr 12 01:44:33 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:33Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-97-1712886237-c40c0f': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-97-1712886237-c40c0f\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-97-1712886237-c40c0f, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d9e9ecf9-1d5d-4b1f-a4cb-72225dcc335a, UID in object meta: , requeuing"
Apr 12 01:44:34 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:34Z" level=error msg="Failed to delete ETCDSnapshotFile for non-etcd node ip-172-31-16-97: etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-97-1712886245-cac01b\" not found"
Apr 12 01:44:34 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:34Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-97\" not found"
Apr 12 01:44:37 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:37Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-97-1712886183-a12559': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-etcd-snapshot-ip-172-31-16-97-1712886183-a12559\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-etcd-snapshot-ip-172-31-16-97-1712886183-a12559, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 7d241bd7-5e6a-4872-83ee-9de4d29ae4bf, UID in object meta: , requeuing"
Apr 12 01:44:38 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:38Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-97-1712886253-3eaa6e': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-97-1712886253-3eaa6e\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-97-1712886253-3eaa6e, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 7d203a64-9484-4707-8549-83ce4a5000c4, UID in object meta: , requeuing"
Apr 12 01:44:38 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:38Z" level=error msg="Failed to delete ETCDSnapshotFile for non-etcd node ip-172-31-16-97: etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-97-1712886245-cac01b\" not found"
Apr 12 01:44:38 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:38Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-97\" not found"
Apr 12 01:44:40 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:40Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-97-1712886245-cac01b': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-97-1712886245-cac01b\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-97-1712886245-cac01b, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 89c4f52e-be79-46e6-affb-a0c143165f98, UID in object meta: , requeuing"
Apr 12 01:44:43 ip-172-31-16-97 k3s[15476]: time="2024-04-12T01:44:43Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-97\" not found"

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.27.12+k3s-2d48b196 (2d48b196)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 11 23:41:01 ip-172-31-18-168 k3s[38207]: time="2024-04-11T23:41:01Z" level=debug msg="Error encountered attempting to retrieve extra metadata from k3s-etcd-snapshot-extra-metadata ConfigMap, error: configmaps \"k3s-etcd-snapshot-extra-metadata\" not found"
