kubernetes-sigs / cli-utils
This repo contains binaries built from libraries in cli-runtime.
License: Apache License 2.0
WaitTask currently ignores failed applies and failed prunes, but doesn't ignore resources skipped by filters in the ApplyTask or WaitTask.
Because WaitTasks are only added when there are dependencies (depends-on, CRDs, namespaces, etc.), this hasn't been much of a problem yet, but when using a sufficiently large/complex set of resources, it's feasible for the WaitTask to wait until timeout and fail because it's waiting for one of the following:
Suggested fix:
When calling the Applier consecutively in a controller, sometimes we need the Applier to skip some resources in the set: when a skipped resource exists in the cluster, the Applier skips applying it, and when it doesn't exist in the cluster, the Applier skips creating it.
Change the Applier.Run from
func (a *Applier) Run(ctx context.Context, objects []*unstructured.Unstructured,
options Options) <-chan event.Event {}
to
func (a *Applier) Run(ctx context.Context, inventory inventory.Inventory, objects []*unstructured.Unstructured,
options Options) <-chan event.Event {}
I am using Rancher as part of my continuous deployment. AFAIU Rancher uses kstatus to figure out the current status of deployments and then reports back to its UI.
However, it seems that the following lines are triggered even though all deployments are in a Running state.
cli-utils/pkg/kstatus/status/core.go
Lines 120 to 123 in 8200fe5
My current assumption is that because there is more than one deployment within the package being installed, kstatus sees N deployments and N-1 successes, because one of them wasn't changed at all and therefore didn't trigger an update.
I was wondering if this could be a bug, and what would be the best way to verify that?
Thank you
Shaked
In cmd/main.go we should use https://pkg.go.dev/os/signal#NotifyContext to handle SIGTERM and SIGINT properly vs just exiting abruptly.
We currently require that references to namespaced resources contain the namespace (i.e. config.kubernetes.io/depends-on: apps/namespaces/default/Deployment/wordpress-mysql). This can become a bit awkward if we have a set of resources that doesn't have the namespace set, but relies on it being resolved from the kube context. We could consider allowing the namespace to be omitted, and resolving it to the namespace of the resource in which the annotation is set.
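A sketch of the proposed resolution rule, assuming the group/namespaces/namespace/Kind/name annotation format shown above (the helper name is hypothetical, not a cli-utils API):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveDependsOn parses a depends-on reference. A full reference looks like
// "apps/namespaces/default/Deployment/wordpress-mysql"; when the
// "namespaces/<ns>" segment is omitted ("apps/Deployment/wordpress-mysql"),
// the namespace is inherited from the object carrying the annotation.
func resolveDependsOn(ref, objNamespace string) (group, ns, kind, name string, err error) {
	parts := strings.Split(ref, "/")
	switch {
	case len(parts) == 5 && parts[1] == "namespaces":
		return parts[0], parts[2], parts[3], parts[4], nil
	case len(parts) == 3:
		// Proposed behavior: default to the referencing object's namespace.
		return parts[0], objNamespace, parts[1], parts[2], nil
	default:
		return "", "", "", "", fmt.Errorf("invalid depends-on reference %q", ref)
	}
}

func main() {
	_, ns, _, _, _ := resolveDependsOn("apps/Deployment/wordpress-mysql", "team-a")
	fmt.Println(ns) // namespace inherited from the annotated resource
}
```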
As a new user to this code base (through kpt & Config Sync), I was initially really confused about the difference between destroy, prune, and delete. And as a contributor, I have a hard time documenting/explaining the waiting behavior after deletion.
Solution:
Both of these changes would be interface changes, for options and events, but I think the improved clarity will be worth it.
Destroy is also another synonym for delete, but in this context it means to delete a whole inventory, so we don't need to change that. I think it's easy to describe how destroy is different from delete/prune/elimination.
Currently we consider a PVC to be Current when its phase becomes Bound. But this means that if a PVC ends up in an earlier group than the Deployment/StatefulSet that uses the PVC (because of mutations or depends-on), then it will not reach Current status.
Alternatively, we can try to determine which workloads are using the PVC, and then make sure in the solver that they always end up in the same group.
The Reconciling condition seems to encode whether the controller is currently still attempting to reconcile the resource (at its current observed generation), i.e. Reconciling==False (done), Reconciling!=False (potentially not done), and doesn't say anything about success/failure. The "Stalled" terminology seems to have some overlap with this, where Stalled==True (done and failed), Stalled==False (not done and/or not failed). It seems normalizing it so that one condition signals "done / not done reconciling" and another signals "error(s) encountered" could be clearer: Stalled==True && Reconciling==True seems to be invalid. For example, instead of "Stalled" it could be "ErrorEncountered" or just "Error". The condition Reason/Message could be used to give details of the error(s). Ideally multiple errors could be signaled independently by adding some metadata to conditions, e.g. error: true or severity: error, as mentioned in kubernetes/community#4521 (comment), but that may be more difficult to gain traction on.
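The proposed split can be summarized as a truth table; a sketch, using the hypothetical ErrorEncountered condition name suggested above:

```go
package main

import "fmt"

// reconcileState interprets the two proposed conditions: one signals
// "done / not done reconciling", the other "error(s) encountered".
func reconcileState(reconciling, errorEncountered bool) (string, error) {
	switch {
	case !reconciling && !errorEncountered:
		return "Done", nil
	case !reconciling && errorEncountered:
		return "DoneWithErrors", nil
	case reconciling && !errorEncountered:
		return "InProgress", nil
	default:
		// Mirrors the observation above: "still reconciling" combined with
		// "terminally failed" is a contradictory signal.
		return "", fmt.Errorf("invalid condition combination")
	}
}

func main() {
	s, _ := reconcileState(true, false)
	fmt.Println(s)
}
```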
Currently, the baserunner handles context cancellation, but the only task it can interrupt is WaitTask, which it does by calling private methods (eww).
As a user who uses the global context for cancellation/timeout, I want to be able to interrupt the applier/destroyer and have it exit quickly, without finishing the ApplyTask/PruneTask.
Solution:
In some distant SIG-CLI meeting a long time ago (https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit#heading=h.cok3ckor4epp) it was discussed to bring cli-utils into krew. As far as I remember, it was one of several options to make these tools public.
I wonder what the plans are for these utils, as the development has somewhat abated. Maybe it would be a good idea to get some user input, which is something where krew could help. WDYT?
cc @ahmetb
Status printing shows extra output for the apply and preview commands.
To reproduce (example) for preview:
$ kapply preview ~/testdata/testdata1
configmap/inventory-1e5824fb configured (preview)
namespace/test-namespace configured (preview)
pod/pod-a configured (preview)
pod/pod-b configured (preview)
pod/pod-c configured (preview)
5 resource(s) applied. 0 created, 0 unchanged, 5 configured (preview)
pod/pod-a is Current: Pod is Ready (preview)
pod/pod-b is Current: Pod is Ready (preview)
pod/pod-c is Current: Pod is Ready (preview)
configmap/inventory-1e5824fb is Current: Resource is always ready (preview)
namespace/test-namespace is Current: Resource is current (preview)
0 resource(s) pruned (preview)
Some of the lines of output are extraneous. The same happens for the apply command.
I was looking at #482 and it occurred to me - can someone craft an annotation to pull e.g. the contents of a Secret from another namespace? If there are users of cli-utils that use it in a sensitive context, this feature could create a security issue for them. E.g. cli-utils could be used by a privileged process that accepts input from users and applies it to the cluster. It may perform some checks to ensure it's OK to apply the resources (e.g. perhaps it validates all namespace fields are set to something valid/restricted). This new feature may circumvent the restrictions in place.
When the set of applied resources contains a CRD and a CR of that new kind, the applier seems to get stuck in a retry loop. I think it's waiting for the REST mapper to be updated with the new mapping from the CRD, but that never happens because nothing invalidates the REST mapper.
I wonder if this is a bug/missing feature, or if I'm just not using the library correctly?
FYI this is related to kubernetes/kubernetes#64391
taskrunner.NewTaskContext creates a new TaskChannel, but never closes it.
The tests don't seem to care, but it means a consumer listening on the TaskChannel never sees it close; in particular, for tr := range taskContext.TaskChannel() { never terminates, so users of the TaskChannel need some other mechanism to stop listening.
In general, the function or goroutine that sends (or manages sending) on a channel should close it to signal that no more values will be sent. This implies that the TaskRunner should close the TaskChannel, since it manages the tasks that send events to the shared channel.
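A minimal illustration of the convention (names are illustrative, not the cli-utils types): the sending goroutine closes the channel, so a for range loop on the consumer side terminates on its own.

```go
package main

import "fmt"

// runTasks sends one event per task and closes the channel when done, so
// consumers can simply range over it without a separate stop mechanism.
func runTasks(tasks []string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch) // sender closes: the range loop below terminates
		for _, t := range tasks {
			ch <- t + ": completed"
		}
	}()
	return ch
}

func main() {
	for ev := range runTasks([]string{"apply", "wait", "prune"}) {
		fmt.Println(ev)
	}
}
```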
As a follow-up to #281, we can update the printer to give more information based on the new fields in Apply/Prune/DeleteEvent.
Hi 👋
We're using kpt live preview for PR feedback purposes. We recently ran into a resource-naming issue where the name was not matching the expected Kubernetes regex for metadata.name. This would've been caught earlier if preview supported server-side dry-run.
It'd be great if the server-side dry-run for preview could echo the semantics of kubectl apply --dry-run=server.
My guess is that implementing something like this would require 1) updating the kubectl dependency and 2) adding a server-side dry-run to preview that delegates to the server-side dry-run strategy in the updated kubectl dependency. Does that sound about right?
I'd be willing to give this a try, but I'm not familiar with how to update dependencies (i.e. using go get locally adds a ton of new transitive dependencies).
This function was added in #316; we need to add unit tests for it.
In #143, we taught diff and its sibling commands to interpret an absent directory argument as implying that it should read from standard input. The common.processPaths function detects this case and synthesizes a single argument: the hyphen. However, diff then fails later when it tries to inspect a file named -:
% echo '
apiVersion: v1
kind: Namespace
metadata:
name: another
---
apiVersion: v1
kind: ConfigMap
metadata:
name: something
namespace: another
data:
key: value
---
' | cat - inventory-template.yaml | kpt live diff
error: lstat -: no such file or directory
Note the error at the end calling lstat on -. I think that's happening inside of kio.LocalPackageReader.Read as it attempts to walk the path _-, called from common.ExpandPackageDir.
Note that kpt live preview and kpt live apply seem to work well enough in that same position on the command line.
After resources as []*unstructured.Unstructured are passed to the Applier, each resource already contains the inventory-id in its annotations. The apply/prune functions need to check whether the current operation can go through by comparing the inventory-id in the passed object with that in the cluster object. When the inventory-ids from the two places match, the operation can go through. To force all operations, we add a flag InventoryPolicy to control it. This is needed for an inventory object to take ownership from another inventory object.
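A sketch of the described check (the policy names and helper are assumptions for illustration, not the actual cli-utils API):

```go
package main

import "fmt"

type InventoryPolicy int

const (
	// PolicyMustMatch only allows the operation when the cluster object is
	// already owned by this inventory (or not owned at all).
	PolicyMustMatch InventoryPolicy = iota
	// PolicyAdoptAll forces the operation, taking ownership from another
	// inventory if necessary.
	PolicyAdoptAll
)

// canApply compares the inventory-id on the object being applied with the
// one recorded on the live cluster object.
func canApply(localID, clusterID string, policy InventoryPolicy) bool {
	if policy == PolicyAdoptAll {
		return true
	}
	return clusterID == "" || localID == clusterID
}

func main() {
	fmt.Println(canApply("inv-a", "inv-b", PolicyMustMatch)) // owned elsewhere: blocked
	fmt.Println(canApply("inv-a", "inv-b", PolicyAdoptAll))  // forced: take ownership
}
```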
Updating the inventory (sending inventory events) before exiting should make the inventory more up to date when applier.Run is cancelled (e.g. from a global timeout) or errors out (e.g. from a poller error).
If needed, this feature could be optional with an UpdateInventoryOnError option, or similar.
/cc @haiyanmeng @Liujingfang1 @janetkuo (Config Sync stakeholders)
Running kapply init . produces a grouping object with a label of the form cli-utils.sigs.k8s.io/inventory-id: .-167716. Labels are not allowed to have values that start with ., so this causes kapply apply . to fail.
Add an option AbortOnErrors in the ApplyOption. When it is false, the apply doesn't return on an error immediately. Instead, it should try to apply all resources. For each resource, the returned event should capture the error, if any.
Delete and Prune are used to mean the same thing. I think we should combine all the prune code into the delete code. This would reduce complexity and code to maintain. AFAICT their behavior is identical.
For apply/prune event for a single object, we can include the duration for applying/pruning that object.
Hi!
I'm curious about the reasoning behind
cli-utils/pkg/kstatus/status/core.go
Lines 488 to 490 in 535f781
"it will have the Current status while the job is running and after it has been completed successfully".
Why isn't InProgress the returned status for a job that is currently in progress? Wouldn't that be more intuitive if you want to poll until the job has either succeeded (status: Current) or failed (status: Failed)?
Attempting to use cli-utils@v0.25.0 (the latest release at time of writing) with k8s.io/api@latest results in the following:
$ go mod tidy
go: finding module for package sigs.k8s.io/cli-utils/pkg/manifestreader
go: found sigs.k8s.io/cli-utils/pkg/manifestreader in sigs.k8s.io/cli-utils v0.25.0
go: finding module for package k8s.io/api/batch/v2alpha1
gke-internal.googlesource.com/addon-manager-v2/pkg/prune tested by
gke-internal.googlesource.com/addon-manager-v2/pkg/prune.test imports
sigs.k8s.io/cli-utils/pkg/manifestreader imports
k8s.io/kubectl/pkg/cmd/util imports
k8s.io/kubectl/pkg/scheme imports
k8s.io/api/batch/v2alpha1: module k8s.io/api@latest found (v0.22.2), but does not contain package k8s.io/api/batch/v2alpha1
The master branch has a go.mod configured to use 1.21 libraries, but the latest release is still configured for 1.20 libraries. (Is the GitHub Actions bot broken?)
Running go get sigs.k8s.io/cli-utils@master fixes the build.
Expected behavior: Building a project with cli-utils@latest and k8s.io/api@latest succeeds.
I am working on integrating cli-utils with different projects, specifically the applier seems to be very useful. However recently I have hit an issue where we have objects with different namespaces in the same package supplied to applier, and application of resources doesn't even start because of namespace validation here: https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/apply/applier.go#L188-L190
It seems like this limitation has a purpose, but reading the code I couldn't find any reason why it is in place.
Maybe we should reconsider how we use the applier, but before that, it would be really great if I could understand why the limitation is there in the first place, so we don't go sideways with the architecture and design behind cli-utils or misuse the library.
Or maybe the limitation is not needed there; any input would be greatly appreciated, thanks in advance.
PS: I am not sure that the issue tracker is the correct place to discuss this, so please close it if it's the wrong place, and I will try to reach out some other way, thanks in advance.
The Applier and Destroyer each have WaitTasks used to wait until objects are Current (after apply) or NotFound (after prune/delete), but this behavior is a confusing "wait for some but not others" that is hard to describe and depends not on configuration but on the types of resources being applied together.
Then in addition to that inconsistency, WaitTasks use DeletePropagationBackground by default, which doesn't wait until child objects are deleted before deleting the parent object. So things like ReplicaSets and Pods may still exist after a Deployment is confirmed to be NotFound.
As a user of cli-utils, I would rather that the applier/destroyer either waited for every resource to be reconciled/deleted or not wait at all.
In the context of kpt and Config Sync, I think the default behavior should be to wait forever until cancelled by the caller (they can implement their own global timeout using context.WithTimeout or other cancellation using context.WithCancel), now that both the Applier and Destroyer accept context as input:
I also think the default should be changed to DeletePropagationForeground so that the applier/destroyer don't exit until deletes are fully propagated.
make test is failing on master.
Error:
--- FAIL: TestApplier (0.16s)
--- FAIL: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory (0.02s)
applier_test.go:639:
Error Trace: applier_test.go:639
Error: Not equal:
expected: "StatusEventResourceUpdate"
actual : "StatusEventCompleted"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-StatusEventResourceUpdate
+StatusEventCompleted
Test: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory
applier_test.go:639:
Error Trace: applier_test.go:639
Error: Not equal:
expected: "StatusEventCompleted"
actual : "StatusEventResourceUpdate"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-StatusEventCompleted
+StatusEventResourceUpdate
Test: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory
FAIL
coverage: 53.5% of statements
FAIL sigs.k8s.io/cli-utils/pkg/apply 0.406s
https://github.com/GoogleContainerTools/kpt/blob/master/docs/ROADMAP.md#4-live-apply says there are some issues with ConfigMap as an inventory object. Are they documented somewhere? I've found kptdev/kpt#708, but there is no explanation.
As a cli-utils user, I'd like to avoid any issues with ConfigMap and use the ResourceGroup CR. I wonder if it would be possible to make it part of cli-utils?
This repo contains CLI commands which include some rather complex output formats. I think going forward we want this project to focus primarily on providing a library for actuation, rather than a CLI tool. Having some simple CLI commands is useful for development, but I think we should consider dropping support for the table format output and making the kapply CLI tool just a thin wrapper around the applier/destroyer.
Hi! I'm thinking of starting to use this project and I have some questions.
Thanks. If it works well for me (which I think it will, based on what I see in the code), I'd like to start contributing to the project.
p.s. thanks a lot for building this stuff as a reusable library!
Resources:
Output (from kpt using cli-utils master):
$ kpt live apply
namespace/test unchanged
1 resource(s) applied. 0 created, 1 unchanged, 0 configured, 0 failed
pod/pod-a created
pod/pod-b apply failed: failed to mutate "test_pod-b__Pod" with "ApplyTimeMutator": failed to read field ($.status.podIP) from source resource (/namespaces/test/Pod/pod-a): expected 1 match, but found 0)
2 resource(s) applied. 1 created, 0 unchanged, 0 configured, 1 failed
error: 1 resources failed
Problem:
The applier/destroyer pipelines add Apply/Prune tasks with Wait tasks, waiting for each apply/prune group to become Current (after apply) or NotFound (after prune). This behavior coupled with complex resource sets with multiple dependency branches has the following impact:
For example, here's an infra dependency chain with branches (just deploying two GKEs with a CC cluster):
If any cluster fails to apply (ex: error in config), then both node pools are blocked.
Another example is just using the same apply for multiple namespaces or CRDs and namespace:
If any CRD or any namespace fails to apply (ex: blocked by policy webhook or config error), then all the deployments and everything in the namespaces are blocked.
We are looking to use this library to understand the status of various k8s objects. I noticed that this library reports status as Current when a Job is in progress.
ref:
cli-utils/pkg/kstatus/status/core.go
Lines 532 to 535 in a051c61
I think this should be reported as InProgress so that it is possible to determine when a job is complete.
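A sketch of the distinction being asked for, based on the batch/v1 Job status counters (a simplified decision, not the actual kstatus logic; real Jobs also involve backoffLimit and conditions):

```go
package main

import "fmt"

// jobPhase classifies a Job from its status counters: a Job with work still
// running would report InProgress rather than Current.
func jobPhase(active, succeeded, failed, completions int32) string {
	switch {
	case failed > 0:
		return "Failed"
	case completions > 0 && succeeded >= completions:
		return "Current"
	default:
		return "InProgress"
	}
}

func main() {
	fmt.Println(jobPhase(1, 0, 0, 1)) // running pod, nothing finished yet
	fmt.Println(jobPhase(0, 1, 0, 1)) // completed successfully
}
```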
Creating a resource with underscores using kpt live apply will lead to a config map that breaks all actions with that inventory.
How to reproduce
$ mkdir kubernetes && kpt live init kubernetes
$ cat <<EOF > kubernetes/ksa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: my_service_account
EOF
$ kpt live apply kubernetes
which gives the following output:
configmap/inventory-eca1922 created
Fatal error: error when creating "kubernetes/ksa.yaml": ServiceAccount "my_service_account" is invalid: metadata.name: Invalid value: "my_service_account": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
The inventory is created before the file is rejected by the Kubernetes resource name constraint. The inventory data looks like this (with some generated fields removed for brevity):
apiVersion: v1
kind: ConfigMap
metadata:
name: inventory-eca1922
data:
default_my_service_account__ServiceAccount: ""
Let's try to create a valid resource instead.
$ cat <<EOF > kubernetes/ksa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-service-account
EOF
$ kpt live apply kubernetes
with the following output:
configmap/inventory-d6583e75 created
serviceaccount/my-service-account created
2 resource(s) applied. 2 created, 0 unchanged, 0 configured
Fatal error: unable to decode inventory: default_my_service_account__ServiceAccount
Now the inventory is tainted, as any action targeting the config map will fail with "unable to decode inventory". The solution for now is to manually delete the offending data point in the config map.
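One way to catch this before the inventory is created would be to validate names client-side against the same DNS-1123 subdomain rule quoted in the API server error above; a sketch (the helper name is ours):

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the regex from the API server's validation error above.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidResourceName reports whether a metadata.name would pass server-side
// DNS-1123 subdomain validation (max length 253).
func isValidResourceName(name string) bool {
	return len(name) > 0 && len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	fmt.Println(isValidResourceName("my_service_account")) // false: underscore
	fmt.Println(isValidResourceName("my-service-account")) // true
}
```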
Prune can erroneously delete applied resources under the following conditions:
Steps to reproduce:
$ tree ~/testdata/testdata-namespace-1
/home/sean/testdata/testdata-namespace-1
├── inventory-template.yaml
├── Kptfile
├── namespace.yaml
├── pod-a.yaml
├── pod-b.yaml
└── pod-c.yaml
$ cat ~/testdata/testdata-namespace-1/inventory-template.yaml
apiVersion: v1
kind: ConfigMap
metadata:
namespace: default
name: inventory-47026972
labels:
cli-utils.sigs.k8s.io/inventory-id: 283ff5a8-1e74-4b59-85cd-a09e59b7c066
$ kapply apply ~/testdata/testdata-namespace-1
namespace/test-namespace created
pod/pod-a created
pod/pod-b created
pod/pod-c created
4 resource(s) applied. 4 created, 0 unchanged, 0 configured
0 resource(s) pruned, 0 skipped
$ rm ~/testdata/testdata-namespace-1/namespace.yaml
$ tree ~/testdata/testdata-namespace-1
/home/sean/testdata/testdata-namespace-1
├── inventory-template.yaml
├── Kptfile
├── pod-a.yaml
├── pod-b.yaml
└── pod-c.yaml
0 directories, 5 files
$ kapply apply ~/testdata/testdata-namespace-1
pod/pod-a unchanged
pod/pod-b unchanged
pod/pod-c unchanged
3 resource(s) applied. 0 created, 3 unchanged, 0 configured
namespace/test-namespace pruned
1 resource(s) pruned, 0 skipped
$ kubectl get po -n test-namespace
No resources found in test-namespace namespace.
The preprocess function was added in #316; we need to add unit tests for it.
/assign @karlkfi
It would be very useful if the Applier could be used as a library with a custom polling interface, for example by allowing it to be injected via the constructor as an option, by making the statusPoller field public here, or by making a setter for it.
We have custom wait logic for numerous CRDs, from different projects, that we have no control over. The default status readers really can't properly handle waiting for those CRDs to reach the desired state, since these CRDs don't follow the recommended status conditions described in the kpt documentation.
So we annotate these CRDs with a special annotation, which contains the logic for how to check this status, for example:
metadata:
annotations:
airshipit.org/status-check: |
[
{
"status": "failed",
"condition": "@.status.state==\"failed\""
},
{
"status": "current",
"condition": "@.status.state==\"current\""
}
]
Based on this we can implement our own StatusReader for these CRDs, inject it into the poller interface here, and add this poller interface to the applier.
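Decoding the annotation payload into check rules is straightforward; a sketch (the struct and function names are ours, and evaluating the JSONPath-style condition expressions is left out):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusCheck mirrors one entry of the airshipit.org/status-check payload.
type statusCheck struct {
	Status    string `json:"status"`
	Condition string `json:"condition"`
}

// parseStatusChecks decodes the annotation value into check rules that a
// custom StatusReader could then evaluate against the live object.
func parseStatusChecks(annotation string) ([]statusCheck, error) {
	var checks []statusCheck
	if err := json.Unmarshal([]byte(annotation), &checks); err != nil {
		return nil, fmt.Errorf("invalid status-check annotation: %w", err)
	}
	return checks, nil
}

func main() {
	payload := `[{"status":"failed","condition":"@.status.state==\"failed\""},
	             {"status":"current","condition":"@.status.state==\"current\""}]`
	checks, err := parseStatusChecks(payload)
	fmt.Println(len(checks), err)
}
```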
I'm trying to build a small CLI to solve the problem of waiting on Kubernetes resources to reach their desired state, for example when deploying a new version of an application to a cluster.
I saw that the kstatus code lives here and in the kustomize repo as well. Can someone shed light on which one is the officially maintained code? Are they exactly the same code? Which one is recommended to import in a third-party project?
Thanks in advance! 🙇
/assign @karlkfi
Since the apply process has changed to continue on error, we should also update the wait tasks. Any resources that failed during apply/prune should be skipped in the wait tasks. Otherwise, the tasks would wait for conditions that are not going to happen.
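The filtering step can be sketched as follows (a hypothetical helper, not the actual taskrunner code):

```go
package main

import "fmt"

// pruneWaitSet removes objects whose apply/prune failed from the set the
// WaitTask would otherwise block on, so the task never waits for a
// condition that cannot be reached.
func pruneWaitSet(waitFor []string, failed map[string]bool) []string {
	var remaining []string
	for _, id := range waitFor {
		if !failed[id] {
			remaining = append(remaining, id)
		}
	}
	return remaining
}

func main() {
	waitFor := []string{"deploy/a", "deploy/b", "deploy/c"}
	failed := map[string]bool{"deploy/b": true} // apply of b failed
	fmt.Println(pruneWaitSet(waitFor, failed))
}
```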
In kstatus the poller sends an event type with the event to indicate whether an error has occurred or a value has been updated, etc.
The following is stated about the event type CompletedEvent:
cli-utils/pkg/kstatus/polling/event/event.go
Lines 21 to 23 in 5a53649
As I understand it, this means that when all of the resources being polled have reached the CurrentStatus, the CompletedEvent will be sent and the channel is closed. However, my tests with polling a single Deployment have resulted in only receiving ResourceUpdateEvent, even after the Deployment has reached a ready state.
Is the documentation incorrect, or have I misunderstood the expected behavior?
Currently, StatusPoller does not have a way to inject custom StatusReaders when running Poll. This is problematic for downstream projects making use of CRDs that want to supply custom readers to kstatus. One of the issues this causes is described in fluxcd/kustomize-controller#387.
Besides the main apply function
Applier.Run(Context, InventoryInfo, []*Unstructured, option)
we also need a function to apply a single resource:
Applier.DetachResourceFromInventory(Context, InventoryInfo, *Unstructured)
It removes the owning-inventory annotation from the resource and applies it. It also removes the identifier for that resource from the inventory list.