
[Release-1.28] - etcd snapshot controller thrashes on etcdsnapshotfile management when server is run with `--disable-agent` (k3s issue, CLOSED)

brandond avatar brandond commented on July 20, 2024
[Release-1.28] - etcd snapshot controller thrashes on etcdsnapshotfile management when server is run with `--disable-agent`

Comments (1)

aganesh-suse avatar aganesh-suse commented on July 20, 2024

Validated on release-1.28 branch with commit feb211d

Environment Details


  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m

Cluster Configuration:

HA: 3 servers / 1 agent


token: xxxx
cluster-init: true
disable-agent: true
write-kubeconfig-mode: "0644"
- k3s-upgrade=server
etcd-snapshot-retention: 2
etcd-snapshot-schedule-cron: "* * * * *"
etcd-s3: true
etcd-s3-access-key: xxxx
etcd-s3-secret-key: xxxx
etcd-s3-bucket: xxxx
etcd-s3-folder: xxxx
etcd-s3-region: xxxx

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s
curl -sfL | sudo INSTALL_K3S_COMMIT='feb211d3ce0c41ee8d02dfc9164bb9c7dd97533c' sh -s - server
  3. Check the journal logs for reconciliation error messages:
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
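The two checks above can be folded into one pass that prints counts, which makes the replication and validation runs easier to compare. This is a minimal sketch, not part of k3s: the function name `count_snapshot_messages` is hypothetical, and it reads log text on stdin so it can be fed from journalctl or from a saved log file. It narrows the second check to lines carrying `level=error`, whereas the plain `grep error | grep snapshot` above also matches debug lines that merely contain the word "error".

```shell
#!/bin/sh
# Hypothetical helper (not part of k3s): count the two message classes
# checked in the testing steps. Reads log lines on stdin.
count_snapshot_messages() {
    awk '
        # Requeue message logged while no node has reconciled ETCDSnapshotFile resources
        /Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing/ { requeue++ }
        # Error-level lines that mention snapshots (stricter than grep error | grep snapshot)
        /level=error/ && /snapshot/ { errors++ }
        END { printf "requeue=%d error=%d\n", requeue, errors }
    '
}
```

Possible usage, assuming the function has been defined in the current shell: `sudo journalctl -u k3s --no-pager | count_snapshot_messages`. A fixed build would be expected to report zero for both counters.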

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.28.8+k3s1 (653dd61a)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
Apr 12 18:16:01 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:16:01Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:16:34 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:16:34Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:17:05 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:17:05Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:17:37 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:17:37Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 12 18:15:42 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:42Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 589f2cfb-de3c-48ca-b937-edf509c99b29, UID in object meta: , requeuing"
Apr 12 18:15:43 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:43Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:44 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:44Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 9fae5bc1-2147-48c1-9779-aee5305ed898, UID in object meta: , requeuing"
Apr 12 18:15:46 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:46Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 27b49a1f-5b6e-40fa-8797-19ccd0e55fe4, UID in object meta: , requeuing"
Apr 12 18:15:47 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:47Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:48 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:48Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 20905e61-ae7f-41c9-b695-c4f361a47500, UID in object meta: , requeuing"
Apr 12 18:15:50 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:50Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: bf4466e5-2ed4-4f8a-aa4d-2ef1aec15419, UID in object meta: , requeuing"
Apr 12 18:15:51 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:51Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:52 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:52Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: cfcc2ed3-d048-4d59-a676-490f12ee1961, UID in object meta: , requeuing"
Apr 12 18:15:54 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:54Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: e9ffec4f-ed1a-41dd-a3bd-665657646d79, UID in object meta: , requeuing"
Apr 12 18:15:55 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:55Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:56 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:56Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a30b91e5-e9ab-46a6-a436-8267a4f98b85, UID in object meta: , requeuing"
Apr 12 18:15:57 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:57Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: f2dc8516-49fb-4522-a183-d1995104e316, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 816f76c6-3937-4f53-bcc9-4439731529bf, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 87fdb54e-1d91-43cb-8948-353c31236080, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: cf28b383-3d46-49fe-b63d-609df1517dee, UID in object meta: , requeuing"
Apr 12 18:15:59 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:59Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 3526385f-d60b-4c9c-9ccf-f656e8781a61, UID in object meta: , requeuing"
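To see which snapshot objects the controller keeps retrying, the `error syncing '<name>'` lines above can be grouped by name. This is a rough sketch under the assumption that the log format matches the lines shown here; `thrashing_snapshots` is a hypothetical name, not a k3s command.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <snapshot name>" for every name that
# appears in an "error syncing '<name>'" log line, most frequent first.
thrashing_snapshots() {
    # Keep only the quoted name from matching lines, then count duplicates.
    sed -n "s/.*error syncing '\([^']*\)'.*/\1/p" |
        sort | uniq -c |
        awk '{ print $1, $2 }' |   # normalize the padding that uniq -c adds
        sort -k1,1nr -k2,2         # highest count first, ties ordered by name
}
```

Possible usage: `sudo journalctl -u k3s --no-pager | thrashing_snapshots`. Names that recur many times within a few seconds point to the requeue loop described in the issue title.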

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.28.8+k3s-feb211d3 (feb211d3)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 12 17:04:39 ip-172-31-26-137 k3s[2555]: time="2024-04-12T17:04:39Z" level=debug msg="Error encountered attempting to retrieve extra metadata from k3s-etcd-snapshot-extra-metadata ConfigMap, error: configmaps \"k3s-etcd-snapshot-extra-metadata\" not found"
