kubernetes-sigs / cli-utils
This repo contains binaries built from libraries in cli-runtime.
License: Apache License 2.0
WaitTask currently ignores failed applies and failed prunes, but doesn't ignore resources skipped by filters in the ApplyTask or WaitTask.
Because WaitTasks are only added when there are dependencies (depends-on, CRDs, namespaces, etc.), this hasn't been much of a problem yet, but when using a sufficiently large/complex set of resources, it's feasible for the WaitTask to wait until timeout and fail because it's waiting for one of the following:
Suggested fix:
When calling the Applier consecutively in a controller, sometimes we need the Applier to skip some resources in the set: when a skipped resource exists in the cluster, the Applier skips applying it, and when it doesn't exist in the cluster, the Applier skips creating it.
Change the Applier.Run from
func (a *Applier) Run(ctx context.Context, objects []*unstructured.Unstructured,
options Options) <-chan event.Event {}
to
func (a *Applier) Run(ctx context.Context, inventory inventory.Inventory, objects []*unstructured.Unstructured,
options Options) <-chan event.Event {}
I am using Rancher as part of my continuous deployment. AFAIU Rancher uses kstatus to figure out the current status of deployments and then reports back to its UI.
However, it seems that the following lines are triggered even though all deployments are in a Running state.
cli-utils/pkg/kstatus/status/core.go
Lines 120 to 123 in 8200fe5
My current assumption is that because there is more than one deployment within the package being installed, kstatus sees N deployments and N-1 successes, because one of them wasn't changed at all and therefore didn't trigger an update.
I was wondering if this could be a bug, and what would be the best way to verify that?
Thank you
Shaked
In cmd/main.go we should use https://pkg.go.dev/os/signal#NotifyContext to handle SIGTERM and SIGINT properly vs just exiting abruptly.
We currently require that references to namespaced resources contain the namespace (i.e. config.kubernetes.io/depends-on: apps/namespaces/default/Deployment/wordpress-mysql). This can become a bit awkward if we have a set of resources that doesn't have the namespace set, but relies on it being resolved from the kube context. We could consider allowing the namespace to be omitted, and resolving it to the namespace of the resource in which the annotation is set.
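A sketch of the proposed resolution rule, assuming the group/namespaces/namespace/Kind/name annotation format shown above (the helper name is hypothetical, not a cli-utils API):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveDependsOn parses a depends-on reference. A full reference looks like
// "apps/namespaces/default/Deployment/wordpress-mysql"; when the
// "namespaces/<ns>" segment is omitted ("apps/Deployment/wordpress-mysql"),
// the namespace is inherited from the object carrying the annotation.
func resolveDependsOn(ref, objNamespace string) (group, ns, kind, name string, err error) {
	parts := strings.Split(ref, "/")
	switch {
	case len(parts) == 5 && parts[1] == "namespaces":
		return parts[0], parts[2], parts[3], parts[4], nil
	case len(parts) == 3:
		// Proposed behavior: default to the referencing object's namespace.
		return parts[0], objNamespace, parts[1], parts[2], nil
	default:
		return "", "", "", "", fmt.Errorf("invalid depends-on reference %q", ref)
	}
}

func main() {
	_, ns, _, _, _ := resolveDependsOn("apps/Deployment/wordpress-mysql", "team-a")
	fmt.Println(ns) // namespace inherited from the annotated resource
}
```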
As a new user to this code base (through kpt & Config Sync), I was initially really confused about the difference between destroy, prune, and delete. And as a contributor, I have a hard time documenting/explaining the waiting behavior after deletion.
Solution:
Both of these changes would be interface changes, for options and events, but I think the improved clarity will be worth it.
Destroy is also another synonym for delete, but in this context it means to delete a whole inventory, so we don't need to change that. I think it's easy to describe how destroy is different from delete/prune/elimination.
Currently we consider a PVC to be Current when its phase becomes Bound. But this means that if a PVC ends up in an earlier group than the Deployment/StatefulSet that uses the PVC (because of mutations or depends-on), then it will not reach Current status.
Alternatively, we can try to determine which workloads are using the PVC, and then make sure in the solver that they always end up in the same group.
The Reconciling condition seems to encode whether the controller is currently still attempting to reconcile the resource (at its current observed generation), i.e. Reconciling==False (done), Reconciling!=False (potentially not done), and doesn't say anything about success/failure. The "Stalled" terminology seems to have some overlap with this, where Stalled==True (done and failed), Stalled==False (not done and/or not failed). It seems normalizing it so that one condition signals "done / not done reconciling" and another signals "error(s) encountered" could be clearer: Stalled==True && Reconciling==True seems to be invalid. For example, instead of "Stalled" it could be "ErrorEncountered" or just "Error". The condition Reason/Message could be used to give details of the error(s). Ideally multiple errors could be signaled independently by adding some metadata to conditions, e.g. error: true or severity: error, as mentioned in kubernetes/community#4521 (comment), but that may be more difficult to gain traction on.
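The proposed split can be summarized as a truth table; a sketch, using the hypothetical ErrorEncountered condition name suggested above:

```go
package main

import "fmt"

// reconcileState interprets the two proposed conditions: one signals
// "done / not done reconciling", the other "error(s) encountered".
func reconcileState(reconciling, errorEncountered bool) (string, error) {
	switch {
	case !reconciling && !errorEncountered:
		return "Done", nil
	case !reconciling && errorEncountered:
		return "DoneWithErrors", nil
	case reconciling && !errorEncountered:
		return "InProgress", nil
	default:
		// Mirrors the observation above: "still reconciling" combined with
		// "terminally failed" is a contradictory signal.
		return "", fmt.Errorf("invalid condition combination")
	}
}

func main() {
	s, _ := reconcileState(true, false)
	fmt.Println(s)
}
```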
Currently, the baserunner handles context cancellation, but the only task it can interrupt is WaitTask, which it does by calling private methods (eww).
As a user who uses the global context for cancellation/timeout, I want to be able to interrupt the applier/destroyer and have it exit quickly, without finishing the ApplyTask/PruneTask.
Solution:
In some distant SIG-CLI meeting a long time ago (https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit#heading=h.cok3ckor4epp) it was discussed to bring cli-utils into krew. As far as I remember, it was one of several options to make these tools public.
I wonder what the plans are for these utils, as the development has somewhat abated. Maybe it would be a good idea to get some user input, which is something where krew could help. WDYT?
cc @ahmetb
Status printing shows extra output for the apply and preview commands.
To reproduce (example) for preview:
$ kapply preview ~/testdata/testdata1
configmap/inventory-1e5824fb configured (preview)
namespace/test-namespace configured (preview)
pod/pod-a configured (preview)
pod/pod-b configured (preview)
pod/pod-c configured (preview)
5 resource(s) applied. 0 created, 0 unchanged, 5 configured (preview)
pod/pod-a is Current: Pod is Ready (preview)
pod/pod-b is Current: Pod is Ready (preview)
pod/pod-c is Current: Pod is Ready (preview)
configmap/inventory-1e5824fb is Current: Resource is always ready (preview)
namespace/test-namespace is Current: Resource is current (preview)
0 resource(s) pruned (preview)
Some of the lines of output are extraneous. The same happens for the apply command.
I was looking at #482 and it occurred to me - can someone craft an annotation to pull e.g. the contents of a Secret from another namespace? If there are users of cli-utils that use it in a sensitive context, this feature could create a security issue for them. E.g. cli-utils could be used by a privileged process that accepts input from users and applies it to the cluster. It may perform some checks to ensure it's OK to apply the resources (e.g. perhaps it validates all namespace fields are set to something valid/restricted). This new feature may circumvent the restrictions in place.
When the set of applied resources contains a CRD and a CR of that new kind, the applier seems to get stuck in a retry loop. I think it's waiting for the REST mapper to be updated with the new mapping from the CRD, but that never happens because nothing invalidates the REST mapper.
I wonder if this is a bug/missing feature, or if I'm just not using the library correctly?
FYI this is related to kubernetes/kubernetes#64391
taskrunner.NewTaskContext creates a new TaskChannel, but never closes it.
The tests don't seem to care, but it means a consumer listening on the TaskChannel never sees it close; in particular, for tr := range taskContext.TaskChannel() { never terminates, so users of the TaskChannel need some other mechanism to stop listening.
In general, the function or goroutine that sends (or manages sending) on a channel should close it to signal that no more values will be sent. This implies that the TaskRunner should close the TaskChannel, since it manages the tasks that send events to the shared channel.
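A minimal illustration of the convention (names are illustrative, not the cli-utils types): the sending goroutine closes the channel, so a for range loop on the consumer side terminates on its own.

```go
package main

import "fmt"

// runTasks sends one event per task and closes the channel when done, so
// consumers can simply range over it without a separate stop mechanism.
func runTasks(tasks []string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch) // sender closes: the range loop below terminates
		for _, t := range tasks {
			ch <- t + ": completed"
		}
	}()
	return ch
}

func main() {
	for ev := range runTasks([]string{"apply", "wait", "prune"}) {
		fmt.Println(ev)
	}
}
```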
As a follow-up to #281, we can update the printer to give more information based on the new fields in Apply/Prune/DeleteEvent.
Hi 👋
We're using kpt live preview for PR feedback purposes. We recently ran into a resource-naming issue where the name was not matching the expected Kubernetes regex for metadata.name. This would've been caught earlier if preview supported server-side dry-run.
It'd be great if the server-side dry-run for preview could echo the semantics of kubectl apply --dry-run=server.
My guess is that implementing something like this would require 1) updating the kubectl dependency and 2) adding a server-side dry-run to preview that delegates to the server-side dry-run strategy in the updated kubectl dependency. Does that sound about right?
I'd be willing to give this a try, but I'm not familiar with how to update dependencies (i.e. using go get locally adds a ton of new transitive dependencies).
This function was added in #316; we need to add unit tests for it.
In #143, we taught diff and its sibling commands to interpret an absent directory argument as implying that it should read from standard input. The common.processPaths function detects this case and synthesizes a single argument: the hyphen. However, diff then fails later when it tries to inspect a file named -:
% echo '
apiVersion: v1
kind: Namespace
metadata:
name: another
---
apiVersion: v1
kind: ConfigMap
metadata:
name: something
namespace: another
data:
key: value
---
' | cat - inventory-template.yaml | kpt live diff
error: lstat -: no such file or directory
Note the error at the end calling lstat on -. I think that's happening inside of kio.LocalPackageReader.Read as it attempts to walk the path _-, called from common.ExpandPackageDir.
Note that kpt live preview and kpt live apply seem to work well enough in that same position on the command line.
After resources as []*unstructured.Unstructured are passed to the Applier, each resource already contains the inventory-id in its annotations. The apply/prune functions need to check whether the current operation can go through by comparing the inventory-id in the passed object with that in the cluster object. When the inventory-ids from the two places match, the operation can go through. To force all operations, we add a flag InventoryPolicy to control it. This is needed for an inventory object to take ownership from another inventory object.
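A sketch of the described check (the policy names and helper are assumptions for illustration, not the actual cli-utils API):

```go
package main

import "fmt"

type InventoryPolicy int

const (
	// PolicyMustMatch only allows the operation when the cluster object is
	// already owned by this inventory (or not owned at all).
	PolicyMustMatch InventoryPolicy = iota
	// PolicyAdoptAll forces the operation, taking ownership from another
	// inventory if necessary.
	PolicyAdoptAll
)

// canApply compares the inventory-id on the object being applied with the
// one recorded on the live cluster object.
func canApply(localID, clusterID string, policy InventoryPolicy) bool {
	if policy == PolicyAdoptAll {
		return true
	}
	return clusterID == "" || localID == clusterID
}

func main() {
	fmt.Println(canApply("inv-a", "inv-b", PolicyMustMatch)) // owned elsewhere: blocked
	fmt.Println(canApply("inv-a", "inv-b", PolicyAdoptAll))  // forced: take ownership
}
```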
Updating the inventory (sending inventory events) before exiting should make the inventory more up to date when applier.Run is cancelled (e.g. from a global timeout) or errors out (e.g. from a poller error).
If needed, this feature could be optional with an UpdateInventoryOnError option, or similar.
/cc @haiyanmeng @Liujingfang1 @janetkuo (Config Sync stakeholders)
Running kapply init . produces a grouping object with a label of the form cli-utils.sigs.k8s.io/inventory-id: .-167716. Labels are not allowed to have values that start with ., so this causes kapply apply . to fail.
Add an option AbortOnErrors in the ApplyOption. When it is false, the apply doesn't return on an error immediately. Instead, it should try to apply all resources. For each resource, the returned event should capture the error, if any.
Delete and Prune are used to mean the same thing. I think we should combine all the prune code into the delete code. This would reduce complexity and code to maintain. AFAICT their behavior is identical.
For apply/prune event for a single object, we can include the duration for applying/pruning that object.
Hi!
I'm curious about the reasoning behind
cli-utils/pkg/kstatus/status/core.go
Lines 488 to 490 in 535f781
"it will have the Current status while the job is running and after it has been completed successfully".
Why isn't InProgress the returned status for a job that is currently in progress? Wouldn't that be more intuitive if you want to poll until the job has either succeeded (status: Current) or failed (status: Failed)?
Attempting to use cli-utils@v0.25.0 (the latest release at time of writing) with k8s.io/api@latest results in the following:
$ go mod tidy
go: finding module for package sigs.k8s.io/cli-utils/pkg/manifestreader
go: found sigs.k8s.io/cli-utils/pkg/manifestreader in sigs.k8s.io/cli-utils v0.25.0
go: finding module for package k8s.io/api/batch/v2alpha1
gke-internal.googlesource.com/addon-manager-v2/pkg/prune tested by
gke-internal.googlesource.com/addon-manager-v2/pkg/prune.test imports
sigs.k8s.io/cli-utils/pkg/manifestreader imports
k8s.io/kubectl/pkg/cmd/util imports
k8s.io/kubectl/pkg/scheme imports
k8s.io/api/batch/v2alpha1: module k8s.io/api@latest found (v0.22.2), but does not contain package k8s.io/api/batch/v2alpha1
The master branch has a go.mod configured to use 1.21 libraries, but the latest release is still configured for 1.20 libraries. (Is the GitHub Actions bot broken?)
Running go get sigs.k8s.io/cli-utils@master fixes the build.
Expected behavior: Building a project with cli-utils@latest and k8s.io/api@latest succeeds.
I am working on integrating cli-utils with different projects, specifically the applier seems to be very useful. However recently I have hit an issue where we have objects with different namespaces in the same package supplied to applier, and application of resources doesn't even start because of namespace validation here: https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/apply/applier.go#L188-L190
It seems like this limitation has a purpose, but reading the code I couldn't find any reason why it is in place.
Maybe we should reconsider how we use the applier, but before that, it would be really great if I could understand why the limitation is there in the first place, so we don't go sideways with the architecture and design behind cli-utils or misuse the library.
Or maybe the limitation is not needed there; any input would be greatly appreciated, thanks in advance.
PS: I am not sure that the issue tracker is the correct place to discuss this, so please close it if it's the wrong place, and I will try to reach out some other way, thanks in advance.
The Applier and Destroyer each have WaitTasks used to wait until objects are Current (after apply) or NotFound (after prune/delete), but this behavior is a confusing "wait for some but not others" that is hard to describe and depends not on configuration but on the types of resources being applied together.
Then in addition to that inconsistency, WaitTasks use DeletePropagationBackground by default, which doesn't wait until child objects are deleted before deleting the parent object. So things like ReplicaSets and Pods may still exist after a Deployment is confirmed to be NotFound.
As a user of cli-utils, I would rather that the applier/destroyer either waited for every resource to be reconciled/deleted or not wait at all.
In the context of kpt and Config Sync, I think the default behavior should be to wait forever until cancelled by the caller (they can implement their own global timeout using context.WithTimeout or other cancellation using context.WithCancel), now that both the Applier and Destroyer accept context as input:
I also think the default should be changed to DeletePropagationForeground so that the applier/destroyer don't exit until deletes are fully propagated.
make test is failing on master.
Error:
--- FAIL: TestApplier (0.16s)
--- FAIL: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory (0.02s)
applier_test.go:639:
Error Trace: applier_test.go:639
Error: Not equal:
expected: "StatusEventResourceUpdate"
actual : "StatusEventCompleted"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-StatusEventResourceUpdate
+StatusEventCompleted
Test: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory
applier_test.go:639:
Error Trace: applier_test.go:639
Error: Not equal:
expected: "StatusEventCompleted"
actual : "StatusEventResourceUpdate"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-StatusEventCompleted
+StatusEventResourceUpdate
Test: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory
FAIL
coverage: 53.5% of statements
FAIL sigs.k8s.io/cli-utils/pkg/apply 0.406s
https://github.com/GoogleContainerTools/kpt/blob/master/docs/ROADMAP.md#4-live-apply says there are some issues with ConfigMap as an inventory object. Are they documented somewhere? I've found kptdev/kpt#708, but there is no explanation.
As a cli-utils user, I'd like to avoid any issues with ConfigMap and use the ResourceGroup CR. I wonder if it would be possible to make it part of cli-utils?
This repo contains CLI commands which include some rather complex output formats. I think going forward we want this project to focus primarily on providing a library for actuation, rather than a CLI tool. Having some simple CLI commands is useful for development, but I think we should consider dropping support for the table format output and making the kapply CLI tool just a thin wrapper around the applier/destroyer.
Hi! I'm thinking of starting to use this project and I have some questions.
Thanks. If it works well for me (which I think it will, based on what I see in the code), I'd like to start contributing to the project.
p.s. thanks a lot for building this stuff as a reusable library!
Resources:
Output (from kpt using cli-utils master):
$ kpt live apply
namespace/test unchanged
1 resource(s) applied. 0 created, 1 unchanged, 0 configured, 0 failed
pod/pod-a created
pod/pod-b apply failed: failed to mutate "test_pod-b__Pod" with "ApplyTimeMutator": failed to read field ($.status.podIP) from source resource (/namespaces/test/Pod/pod-a): expected 1 match, but found 0)
2 resource(s) applied. 1 created, 0 unchanged, 0 configured, 1 failed
error: 1 resources failed
Problem:
The applier/destroyer pipelines add Apply/Prune tasks with Wait tasks, waiting for each apply/prune group to become Current (after apply) or NotFound (after prune). This behavior coupled with complex resource sets with multiple dependency branches has the following impact:
For example, here's an infra dependency chain with branches (just deploying two GKEs with a CC cluster):
If any cluster fails to apply (ex: error in config), then both node pools are blocked.
Another example is just using the same apply for multiple namespaces or CRDs and namespace:
If any CRD or any namespace fails to apply (ex: blocked by policy webhook or config error), then all the deployments and everything in the namespaces are blocked.
We are looking to use this library to understand the status of various k8s objects. I noticed that this library reports status as Current when a Job is in progress.
ref:
cli-utils/pkg/kstatus/status/core.go
Lines 532 to 535 in a051c61
I think this should be reported as InProgress so that it is possible to determine when a job is complete.
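A sketch of the distinction being asked for, based on the batch/v1 Job status counters (a simplified decision, not the actual kstatus logic; real Jobs also involve backoffLimit and conditions):

```go
package main

import "fmt"

// jobPhase classifies a Job from its status counters: a Job with work still
// running would report InProgress rather than Current.
func jobPhase(active, succeeded, failed, completions int32) string {
	switch {
	case failed > 0:
		return "Failed"
	case completions > 0 && succeeded >= completions:
		return "Current"
	default:
		return "InProgress"
	}
}

func main() {
	fmt.Println(jobPhase(1, 0, 0, 1)) // running pod, nothing finished yet
	fmt.Println(jobPhase(0, 1, 0, 1)) // completed successfully
}
```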
Creating a resource with underscores using kpt live apply will lead to a config map that breaks all actions with that inventory.
How to reproduce
$ mkdir kubernetes && kpt live init kubernetes
$ cat <<EOF > kubernetes/ksa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: my_service_account
EOF
$ kpt live apply kubernetes
which gives the following output:
configmap/inventory-eca1922 created
Fatal error: error when creating "kubernetes/ksa.yaml": ServiceAccount "my_service_account" is invalid: metadata.name: Invalid value: "my_service_account": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
The inventory is created before the file is rejected by the Kubernetes resource name constraint. The inventory data looks like this (with some generated fields removed for brevity):
apiVersion: v1
kind: ConfigMap
metadata:
name: inventory-eca1922
data:
default_my_service_account__ServiceAccount: ""
Let's try to create a valid resource instead.
$ cat <<EOF > kubernetes/ksa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-service-account
EOF
$ kpt live apply kubernetes
with the following output:
configmap/inventory-d6583e75 created
serviceaccount/my-service-account created
2 resource(s) applied. 2 created, 0 unchanged, 0 configured
Fatal error: unable to decode inventory: default_my_service_account__ServiceAccount
Now the inventory is tainted, as any action targeting the config map will fail with "unable to decode inventory". The solution for now is to manually delete the offending data point in the config map.
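One way to catch this before the inventory is created would be to validate names client-side against the same DNS-1123 subdomain rule quoted in the API server error above; a sketch (the helper name is ours):

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the regex from the API server's validation error above.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidResourceName reports whether a metadata.name would pass server-side
// DNS-1123 subdomain validation (max length 253).
func isValidResourceName(name string) bool {
	return len(name) > 0 && len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	fmt.Println(isValidResourceName("my_service_account")) // false: underscore
	fmt.Println(isValidResourceName("my-service-account")) // true
}
```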
Prune can erroneously delete applied resources under the following conditions:
Steps to reproduce:
$ tree ~/testdata/testdata-namespace-1
/home/sean/testdata/testdata-namespace-1
├── inventory-template.yaml
├── Kptfile
├── namespace.yaml
├── pod-a.yaml
├── pod-b.yaml
└── pod-c.yaml
$ cat ~/testdata/testdata-namespace-1/inventory-template.yaml
apiVersion: v1
kind: ConfigMap
metadata:
namespace: default
name: inventory-47026972
labels:
cli-utils.sigs.k8s.io/inventory-id: 283ff5a8-1e74-4b59-85cd-a09e59b7c066
$ kapply apply ~/testdata/testdata-namespace-1
namespace/test-namespace created
pod/pod-a created
pod/pod-b created
pod/pod-c created
4 resource(s) applied. 4 created, 0 unchanged, 0 configured
0 resource(s) pruned, 0 skipped
$ rm ~/testdata/testdata-namespace-1/namespace.yaml
$ tree ~/testdata/testdata-namespace-1
/home/sean/testdata/testdata-namespace-1
├── inventory-template.yaml
├── Kptfile
├── pod-a.yaml
├── pod-b.yaml
└── pod-c.yaml
0 directories, 5 files
$ kapply apply ~/testdata/testdata-namespace-1
pod/pod-a unchanged
pod/pod-b unchanged
pod/pod-c unchanged
3 resource(s) applied. 0 created, 3 unchanged, 0 configured
namespace/test-namespace pruned
1 resource(s) pruned, 0 skipped
$ kubectl get po -n test-namespace
No resources found in test-namespace namespace.
The preprocess function was added in #316; we need to add unit tests for it.
/assign @karlkfi
It would be very useful if the Applier could be used as a library with a custom polling interface, for example by allowing it to be injected via the constructor as an option, by making the statusPoller field public here, or by making a setter for it.
We have custom wait logic for numerous CRDs, from different projects, that we have no control over. The default status readers really can't properly handle waiting for those CRDs to reach the desired state, since these CRDs don't follow the recommended status conditions described in the kpt documentation.
So we annotate these CRDs with a special annotation, which contains the logic for how to check this status, for example:
metadata:
annotations:
airshipit.org/status-check: |
[
{
"status": "failed",
"condition": "@.status.state==\"failed\""
},
{
"status": "current",
"condition": "@.status.state==\"current\""
}
]
Based on this we can implement our own StatusReader for these CRDs, inject it into the poller interface here, and add this poller interface to the applier.
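Decoding the annotation payload into check rules is straightforward; a sketch (the struct and function names are ours, and evaluating the JSONPath-style condition expressions is left out):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusCheck mirrors one entry of the airshipit.org/status-check payload.
type statusCheck struct {
	Status    string `json:"status"`
	Condition string `json:"condition"`
}

// parseStatusChecks decodes the annotation value into check rules that a
// custom StatusReader could then evaluate against the live object.
func parseStatusChecks(annotation string) ([]statusCheck, error) {
	var checks []statusCheck
	if err := json.Unmarshal([]byte(annotation), &checks); err != nil {
		return nil, fmt.Errorf("invalid status-check annotation: %w", err)
	}
	return checks, nil
}

func main() {
	payload := `[{"status":"failed","condition":"@.status.state==\"failed\""},
	             {"status":"current","condition":"@.status.state==\"current\""}]`
	checks, err := parseStatusChecks(payload)
	fmt.Println(len(checks), err)
}
```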
I'm trying to build a small CLI to solve the problem of waiting on Kubernetes resources to reach their desired state, for example when deploying a new version of an application to a cluster.
I saw that the kstatus code lives here and in the kustomize repo as well. Can someone shed light on which one is the officially maintained code? Are they exactly the same code? Which one is recommended to import in a third-party project?
Thanks in advance! 🙇
/assign @karlkfi
Since the apply process has changed to continue on error, we should also update the wait tasks. Any resources that failed during apply/prune should be skipped in the wait tasks. Otherwise, the tasks would wait for conditions that are not going to happen.
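The filtering step can be sketched as follows (a hypothetical helper, not the actual taskrunner code):

```go
package main

import "fmt"

// pruneWaitSet removes objects whose apply/prune failed from the set the
// WaitTask would otherwise block on, so the task never waits for a
// condition that cannot be reached.
func pruneWaitSet(waitFor []string, failed map[string]bool) []string {
	var remaining []string
	for _, id := range waitFor {
		if !failed[id] {
			remaining = append(remaining, id)
		}
	}
	return remaining
}

func main() {
	waitFor := []string{"deploy/a", "deploy/b", "deploy/c"}
	failed := map[string]bool{"deploy/b": true} // apply of b failed
	fmt.Println(pruneWaitSet(waitFor, failed))
}
```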
In kstatus the poller sends an event type with the event to indicate whether an error has occurred or a value has been updated, etc.
The following is stated about the event type CompletedEvent:
cli-utils/pkg/kstatus/polling/event/event.go
Lines 21 to 23 in 5a53649
As I understand it, this means that when all of the resources being polled have reached the CurrentStatus, the CompletedEvent will be sent and the channel is closed. However, my tests with polling a single Deployment have resulted in only receiving ResourceUpdateEvent, even after the Deployment has reached a ready state.
Is the documentation incorrect, or have I misunderstood the expected behavior?
Currently, StatusPoller does not have a way to inject custom StatusReaders when running Poll. This is problematic for downstream projects making use of CRDs that want to supply custom readers to kstatus. One of the issues this causes is described in fluxcd/kustomize-controller#387.
Besides the main apply function
Applier.Run(Context, InventoryInfo, []*Unstructured, option)
we also need a function to apply a single resource:
Applier.DetachResourceFromInventory(Context, InventoryInfo, *Unstructured)
It removes the owning-inventory annotation from the resource and applies it. It also removes the identifier for that resource from the inventory list.