
kubernetes-sigs / boskos


Boskos is a resource management service that provides reservation and lifecycle management of a variety of different kinds of resources.

License: Apache License 2.0


boskos's Issues

Boskos client ReleaseOne is not usable if process is restarted

Originally filed as kubernetes/test-infra#15910 by @chemikadze

What happened:

ReleaseOne performs a local check for whether the resource was previously allocated by the same client, and fails if it was not. After a successful release, it makes the client forget that association.
Release performs no such check, and also does not ensure the association is removed.

At the same time, all Allocate methods add associations. This means there is no consistent way to release a resource when the client has been recreated for some reason (for example, a process restart). One known workaround that ensures the object is released without a memory leak is to fall back to Release when ReleaseOne fails, covering the restart case, but this is quite error-prone without knowledge of the client internals.

What you expected to happen:

ReleaseOne should not fail if the client state is out of sync, or Release should clean up the client state.

How to reproduce it (as minimally and precisely as possible):

Create two boskos clients, allocate in one client, and run release from another client.
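For illustration, here is a minimal Python sketch of the bookkeeping described above (the real client is Go, and all names here are illustrative, not the actual client API):

```python
class BoskosClient:
    """Toy model of the client-side ownership bookkeeping (illustrative only)."""

    def __init__(self):
        self.owned = set()  # resources this client instance believes it acquired

    def acquire(self, name):
        self.owned.add(name)  # every acquire records a local association
        return name

    def release_one(self, name):
        # ReleaseOne consults local state first, so a restarted client
        # (with an empty `owned` set) can never release its old resources.
        if name not in self.owned:
            raise RuntimeError(f"no resource name {name}")
        self.owned.discard(name)
        # ...server-side release would happen here...

    def release(self, name):
        # Release skips the local check but also never discards the
        # association, which is the asymmetry this issue points out.
        # ...server-side release would happen here...
        pass
```

A restarted client starts with an empty association set, so its ReleaseOne always fails even though the server still holds the lease.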

Updates silently fail if configuration file is renamed

Boskos should fail on startup if the path to the config file is invalid.

We've discovered a deployment where Boskos is not updating its configuration.

What I think happened: the --config flag pointed to a valid configuration file, but at some point the filename inside the ConfigMap changed, which resulted in the original file being deleted.

It appears that viper simply silently stops its file watch when a file is deleted, so there's no good way to detect if this error has occurred until the pod restarts.

I don't have a good idea for how to resolve this - should we periodically check the config file and crash if it's no longer valid? That'd also help a bit with #20. (If we didn't crash, we'd at least get error messages that might be helpful for users trying to figure out why their configs aren't being updated.)

cc @coryrc
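One hedged sketch of the periodic-check idea (Python for illustration; the real implementation would live in Boskos's Go config-reload path, and the `on_missing` hook is hypothetical):

```python
import os
import sys
import threading
import time

def config_is_present(path):
    """Single-shot check: is the config file still there and readable?"""
    return os.path.isfile(path) and os.access(path, os.R_OK)

def start_config_watchdog(path, interval_seconds=60, on_missing=lambda: sys.exit(1)):
    """Periodically re-check the config path and crash (by default) if it
    disappears, so the pod restarts instead of silently serving stale config."""
    def loop():
        while True:
            if not config_is_present(path):
                on_missing()
                return
            time.sleep(interval_seconds)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Crashing via the hook is what makes the failure visible to Kubernetes; merely logging would reproduce the current silent behavior.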

Logger is not configured correctly

Looking at logs from a recent boskos build (from this repo) reveals that some things are not configured appropriately:

boskos:

{"component":"unset","file":"/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314","func":"github.com/sirupsen/logrus.(*Entry).Logf","handler":"handleUpdate","level":"info","msg":"From 10.44.2.234:42282","time":"2020-06-05T23:01:09Z"}

cleaner:

{"component":"unset","file":"/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:192","func":"github.com/sirupsen/logrus.(*Logger).Log","level":"info","msg":"Cleaner started","time":"2020-06-05T22:47:27Z"}

etc.

The component is unset because we aren't setting the variables in k8s.io/test-infra/prow/version.
We can fairly easily add linker flags for this, though it'll require a bit of work in the Makefile (or maybe we'll just want to write a wrapper script).

I'm not sure why we're getting useless file and function annotations, though.

/kind bug

REST client without depending on kubernetes staging libraries

This is a feature request asking that we consider publishing a go module that does not import Kubernetes staging libraries, at least for the purposes of talking to the boskos API.

We need a package like this somewhere for projects like kubetest (kubernetes/test-infra#20422); having importable packages that depend on these is a bit of a nightmare.

Also: Having an independent client module might help resolve circular dependencies with any tools in test-infra that talk to boskos while boskos is importing test-infra ...

gcp_janitor deletes instances first causing some recreates

Originally filed as kubernetes/test-infra#16965 by @oxddr

What happened:
gcp_janitor deleted all instances first. Some of them belonged to managed instance groups, so they were recreated and then deleted a few seconds later once the instance group manager (IGM) itself was deleted.

What you expected to happen:
Either only non-managed instances should be deleted first, or managed instance groups should be deleted before instances.

How to reproduce it (as minimally and precisely as possible):
Create clusters on GCE using kube-up and clean them up with gcp_janitor.

Please provide links to example occurrences, if any:
n/a

Anything else we need to know?:
-
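A possible shape for the fix, as a hedged Python sketch (the real janitor logic lives in gcp_janitor.py; the type names and ordering list here are illustrative): delete managed instance groups before bare instances, so a MIG cannot recreate its instances mid-cleanup.

```python
# Hypothetical deletion ordering: parents before children.
DELETION_ORDER = [
    "instance-groups managed",  # MIGs first: they own and recreate instances
    "instances",                # then any remaining unmanaged instances
]

def ordered_cleanup(resource_types):
    """Sort dirty resource types so parents are deleted before children;
    unknown types keep their relative order at the end (sorted is stable)."""
    rank = {r: i for i, r in enumerate(DELETION_ORDER)}
    return sorted(resource_types, key=lambda r: rank.get(r, len(DELETION_ORDER)))
```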

GCP janitor failing when trying to clean up logging sinks

At some point, it seems like GCP added two new Cloud Logging logs router sinks to projects:

  • _Default
  • _Required

These cannot be deleted, and this recently started causing cleanup of projects to fail, with error messages like the following:

ERROR: (gcloud.logging.sinks.delete) PERMISSION_DENIED: Sink _Default cannot be deleted. Consider disabling instead
ERROR: (gcloud.logging.sinks.delete) PERMISSION_DENIED: Sink _Required cannot be deleted
Error try to delete resources sinks: CalledProcessError()
Error try to delete resources sinks: CalledProcessError()
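A minimal sketch of a possible workaround (assuming the janitor first lists sink names): skip the two GCP-managed sinks instead of attempting to delete them.

```python
# GCP-managed logs-router sinks; deleting them always fails with
# PERMISSION_DENIED, so the janitor should skip them up front.
UNDELETABLE_SINKS = {"_Default", "_Required"}

def sinks_to_delete(sink_names):
    """Filter out the built-in sinks that GCP refuses to delete."""
    return [s for s in sink_names if s not in UNDELETABLE_SINKS]
```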

reaper: support different expiration times based on state

Currently, the reaper only resets resources in state busy, cleaning, or leased. Furthermore, it uses the same expiration time for each.

One use case that isn't supported by this model is human inspection of failed resources. For example, if a test fails, a team might want to look at the state of the resource before cleaning it up. The tests could move this resource into a new state (perhaps purgatory), but then it will never be cleaned up. Ideally we'd be able to set a longer expiration time on this new state.

Tangential note: why do we need a separate reaper binary at all? Would it be simpler to have a setting in the main boskos configuration map that controls whether leases expire, and have boskos core do that itself? Putting configuration there would allow easy per-resource overrides, too.
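To make the proposal concrete, a small Python sketch of per-state expiration (state names and TTL values are illustrative):

```python
# Hypothetical per-state expiration table.
EXPIRATION_BY_STATE = {
    "busy": 2 * 3600,
    "cleaning": 2 * 3600,
    "leased": 2 * 3600,
    "purgatory": 72 * 3600,  # keep failed resources around for human inspection
}

def is_expired(state, age_seconds):
    """A resource expires only if its state has a TTL and the lease is older."""
    ttl = EXPIRATION_BY_STATE.get(state)
    return ttl is not None and age_seconds > ttl
```

States without an entry (e.g. free) would simply never expire, which also answers how new user-defined states like purgatory could opt in.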

Boskos client does not distinguish between incorrect resource type and no resources available

Currently, when acquiring a resource using AcquireWait(), the client makes no distinction between a resource type that does not exist (which could arise from a user typo) vs. a situation where all resources are busy.

Unfortunately, the Boskos server sends the 404 status code in both situations - see code here. The Boskos client then looks at the status code here.

One easy and backwards-compatible fix would be to add a new option to the client that, when set, distinguishes between the two situations based on the text returned by the HTTP call. The response text for the two situations is:

  • Acquire failed: resource type "my-resource-type" does not exist
  • Acquire failed: no available resource my-resource-type, try again later

The current problem is that if the user accidentally asks for a resource type that does not exist, they will be stuck in a retry loop forever, with no hope of unblocking except a Context timeout, which, in my opinion, is not an acceptable situation.
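A sketch of the proposed text-based disambiguation (purely illustrative; the message strings are the ones quoted above):

```python
def classify_acquire_failure(status_code, body):
    """Hypothetical client-side disambiguation of Boskos 404s by message text."""
    if status_code != 404:
        return "other"
    if "does not exist" in body:
        return "unknown-type"  # user typo: stop retrying immediately
    if "try again later" in body:
        return "busy"          # all resources leased: keep polling
    return "other"
```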

GCP janitor: support arbitrary cleanup commands instead of simple delete

Some gcloud cleanup commands are not standard delete commands (https://cloud.google.com/sdk/gcloud/reference/beta/compute/shared-vpc/associated-projects/remove is one example), and thus cannot simply be added to the map in https://github.com/kubernetes-sigs/boskos/blob/master/cmd/janitor/gcp_janitor.py#L32-L91.

One option is to add custom functions, similar to clean_gke_cluster, but there's probably a better approach?

/cc @ixdy
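One possible shape for this, sketched in Python (command layout and registry names are hypothetical, not the actual gcp_janitor.py structure):

```python
def remove_shared_vpc_association(project, host_project):
    """Custom cleanup that is not a plain `gcloud ... delete` invocation
    (the host project would come from configuration in practice)."""
    return ["gcloud", "beta", "compute", "shared-vpc",
            "associated-projects", "remove", project,
            "--host-project", host_project]

# Hypothetical registry: resource kinds with non-standard cleanup map to a
# command builder; everything else falls back to the delete-style default.
CUSTOM_CLEANERS = {
    "shared-vpc-association": remove_shared_vpc_association,
}

def cleanup_command(kind, *args):
    builder = CUSTOM_CLEANERS.get(kind)
    if builder is not None:
        return builder(*args)
    return ["gcloud", kind, "delete", args[0], "--quiet"]
```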

Allow running Boskos in HA mode

Currently it is not possible to run Boskos with more than one replica, because it maintains an in-memory FIFO queue to ensure leases are handed out in the order they were requested. This makes it impossible to run in an HA configuration, so the whole service goes down whenever the single replica does, for whatever reason.

Deleting static resource may not take effect until next config update or container restart

Static resources are deleted only during "SyncConfig", which is triggered by a config map update, and deletion of an in-use resource is deferred until the next SyncConfig. But a sync happens only on container restart or config change - so if a resource was in use, its deletion may be significantly delayed.

Dynamic resources, however, are updated both at the same time as static resources and every dynamic-resource-update-period (10 minutes by default). Should static resources be updated on a similar cadence?

Release blocking tests cannot acquire project from boskos

gce-cos-master-scalability-100 release blocking tests are failing with:

2022/03/29 11:16:38 main.go:331: Something went wrong: failed to prepare test environment: --provider=gce boskos failed to acquire project: Post "http://boskos.test-pods.svc.cluster.local./acquire?dest=busy&owner=ci-kubernetes-e2e-gci-gce-scalability&request_id=eda4a4c1-65ce-460e-8906-35595e3e8d6f&state=free&type=scalability-project": dial tcp 10.35.241.148:80: connect: connection refused

Handwritten CRD DeepCopy methods are not deepcopying

For some reason we have handwritten rather than generated DeepCopy methods on our CRDs, and they definitely do not deep-copy. For example, the following is incorrect, because nested pointers are simply copied over, so if someone changes the pointed-to value, it changes in both the original and the "deepcopied" version:

out.Spec = in.Spec

/kind bug
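The aliasing problem in miniature (the real code is Go; this Python sketch shows the same effect with a shared mutable field):

```python
import copy

class Spec:
    def __init__(self, tags):
        self.tags = tags  # nested mutable data, like a pointer field in Go

def broken_deepcopy(in_spec):
    # What `out.Spec = in.Spec` effectively does: the nested reference is
    # shared, so mutations show up in both "copies".
    return in_spec

def correct_deepcopy(in_spec):
    # A real deep copy duplicates nested data as well.
    return copy.deepcopy(in_spec)
```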

GCP Janitor: support clean up GCP resources in the additional zones

Currently there is a list of GCP zones configured in the GCP Janitor - https://github.com/kubernetes-sigs/boskos/blob/master/cmd/janitor/gcp_janitor.py#L94-L184. When the script runs, the Janitor cleans up all GCP resources in these zones. However, the list is not exhaustive, and we cannot add zones that are not publicly launched (e.g. us-east1-a), since cleanups would fail for projects that cannot access those zones.

One solution would be to add an extra --additional-zones flag to gcp_janitor.py that allows extending the list of zones. Janitor instances that manage internal GCP projects could pass the internal zones, so that GCP resources in those zones are also cleaned up.

/assign
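A sketch of the proposed flag (argparse-based, names illustrative; the built-in zone list here is a stand-in for the real one):

```python
import argparse

# Stand-in for the hardcoded zone list in gcp_janitor.py.
DEFAULT_ZONES = ["us-central1-a", "us-central1-b"]

def parse_args(argv):
    """Sketch of the proposed --additional-zones flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--additional-zones",
        default="",
        help="comma-separated zones appended to the built-in zone list")
    return parser.parse_args(argv)

def effective_zones(args):
    extra = [z for z in args.additional_zones.split(",") if z]
    return DEFAULT_ZONES + extra
```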

janitor fails to clean up some resources in a timely manner if dirty rates are unequal

Originally filed as kubernetes/test-infra#15925

Creating a one-sentence summary of this issue is hard, but the basic bug is fairly easy to understand.

Assume a Boskos instance has three resource types, A, B, and C. A has 5 resources, B has 10, and C has 100. A Boskos janitor has been configured to clean all three types.

Currently, the janitor loops through all resource types, iteratively cleaning one resource of each type. If the janitor finds that one of the types has no dirty resources, it stops querying that resource type until all resources have been cleaned, at which point it waits a minute and then starts over with the complete list again.

In our hypothetical case (as well as observed in practice), what this means is that the janitor will finish cleaning resources of type A (and possibly B), while still having many more C resources to clean. Additionally, given that C is such a large pool, there will likely be many jobs making more C resources dirty. As a result, it will be quite some time before the janitor attempts to clean A resources, and the pool will probably fill up with dirty resources.

Possible ways to mitigate the issue (in increasing complexity):

  • increase the number of janitor replicas
  • segment the janitors (i.e. have separate janitors for each type)
  • remove the optimization in the janitor loop, continuing to attempt to acquire all resource types (this will likely result in more /acquire RPCs to Boskos)
  • use Boskos metrics to select which resources to attempt to clean. This could even be prioritized (e.g. focus on whichever type is closest to running out of resources), though that might lead to different issues with starvation. Additionally, a failing cleanup could mean the janitor might get completely stuck.
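As a sketch of the last (metrics-based) mitigation, assuming per-type free/dirty/busy counts are available (field names are illustrative):

```python
def pick_next_type(metrics):
    """Clean whichever resource type is closest to running out of free
    resources. `metrics` maps type -> {"free": n, "dirty": n, "busy": n}."""
    def free_fraction(counts):
        total = counts["free"] + counts["dirty"] + counts["busy"]
        return counts["free"] / total if total else 1.0
    return min(metrics, key=lambda t: free_fraction(metrics[t]))
```

As the issue notes, pure prioritization can starve large pools and a permanently failing cleanup could pin the janitor; some rotation or backoff would still be needed.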

Update controller-runtime to v0.15.0

Controller-runtime v0.15.0 was released on 5/23/2023:

https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.15.0

This new version has many breaking changes, and for anyone who imports this repo as a dependency, it blocks bumping controller-runtime to v0.15.0:

could not import sigs.k8s.io/boskos/crds (-: # sigs.k8s.io/boskos/crds
vendor/sigs.k8s.io/boskos/crds/client.go:89:24: cannot use func(_ *rest.Config) (meta.RESTMapper, error) {…} (value of type func(_ *rest.Config) (meta.RESTMapper, error)) as func(c *rest.Config, httpClient *http.Client) (meta.RESTMapper, error) value in struct literal
vendor/sigs.k8s.io/boskos/crds/client.go:94:15: cannot use func(_ cache.Cache, _ *rest.Config, _ ctrlruntimeclient.Options, _ ...ctrlruntimeclient.Object) (ctrlruntimeclient.Client, error) {…} (value of type func(_ "sigs.k8s.io/controller-runtime/pkg/cache".Cache, _ *rest.Config, _
 client.Options, _ ...client.Object) (client.Client, error)) as client.NewClientFunc value in struct literal) (typecheck)
        "sigs.k8s.io/boskos/crds" 

Unexpected behavior of boskos in acquirestate when destination and current state are both free.

acquirestate loses a resource's state when the destination state and the current state of the resource are both free. This means that if the same acquirestate request is issued twice in a row, the second request returns an error, and acquirestate only works again after a release request. Two proposed solutions:

  1. return an error when the destination and current state are the same (recommended)
  2. disallow free as a destination state.
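Proposed solution 1 in miniature (illustrative, not the actual server code):

```python
def acquire_by_state(current_state, dest_state):
    """Reject requests whose destination equals the current state, rather
    than silently losing track of the resource."""
    if current_state == dest_state:
        raise ValueError(
            f"destination state {dest_state!r} must differ from current state")
    return dest_state
```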

gcp_janitor.py isn't thread-safe: multiple threaded invocations can corrupt GCloud config file

In one of my Boskos instances, I've observed failures when invoking gcp_janitor.py of the following form:

failed to clean up project asm-boskos-shared-vpc-svc-188, error info: ERROR: gcloud failed to load: Source contains parsing errors: '/root/.config/gcloud/configurations/config_default'
	[line 13]: 'oogleapis.com/\n'
    parsed_config.read(properties_path)
    self._read(fp, filename)
    raise e
	[line 13]: 'oogleapis.com/\n'

My analysis of gcp_janitor.py makes me believe it's not thread-safe, specifically when running gcloud config set.

The first place we run that is in line 511, where we run:

gcloud config set billing/quota-project <xyz>

The second is in line 588, where we run:

gcloud config set api_endpoint_overrides/gkehub https://<gkehub-url>/

The janitor itself, janitor.go, invokes gcp_janitor.py inside Goroutines, which run in parallel; I believe that if multiple threads attempt to run gcloud config set simultaneously, it can corrupt the GCloud config file (in my case, /root/.config/gcloud/configurations/config_default), which is shared among all threads. This renders any future attempts to run gcp_janitor.py futile, because the GCloud config file is irrevocably corrupted.

I believe I have a fix for this, involving setting os.environ rather than running gcloud config set. Specifically, you can replace the commands with environment variables like this:

gcloud config set billing/quota-project <xyz>
-> CLOUDSDK_BILLING_QUOTA_PROJECT=<xyz>

gcloud config set api_endpoint_overrides/gkehub https://<gkehub-url>/
-> CLOUDSDK_API_ENDPOINT_OVERRIDES_GKEHUB=https://<gkehub-url>

Looks as though the code already makes use of environment variables like these in line 436, where we do:

os.environ['CLOUDSDK_API_ENDPOINT_OVERRIDES_CONTAINER'] = endpoint

I'll put up a PR to modify those gcloud config set commands with os.environ assignments.
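The essence of the fix, sketched in Python: build a per-invocation environment instead of mutating the shared config file. (CLOUDSDK_BILLING_QUOTA_PROJECT and CLOUDSDK_API_ENDPOINT_OVERRIDES_GKEHUB follow gcloud's CLOUDSDK_SECTION_PROPERTY convention, as quoted above.)

```python
import os

def gcloud_env(quota_project=None, gkehub_endpoint=None):
    """Build a per-invocation environment for a gcloud call. Each thread
    gets its own copy, so the shared config file is never rewritten."""
    env = dict(os.environ)
    if quota_project:
        env["CLOUDSDK_BILLING_QUOTA_PROJECT"] = quota_project
    if gkehub_endpoint:
        env["CLOUDSDK_API_ENDPOINT_OVERRIDES_GKEHUB"] = gkehub_endpoint
    return env
```

The resulting dict would then be passed as the `env` argument to the subprocess call that runs gcloud, instead of running `gcloud config set` beforehand.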

Support multiple config files

Currently, Boskos reads a single configuration file, but maintaining a single config file can be painful, and there has been some desire to support multiple configuration files.

Internally, Boskos is using viper, which only supports one configuration file per instance. We could experiment to see if multiple instances would be a feasible approach, or if there is some other way to address this request.

/kind feature
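One possible approach, sketched in Python on already-parsed configs (structure and key names are illustrative): load each file separately and merge the resource lists, rejecting duplicates.

```python
def merge_configs(parsed_configs):
    """Concatenate the `resources` sections of several parsed config files
    into one effective config; duplicate resource types are an error."""
    merged, seen = {"resources": []}, set()
    for cfg in parsed_configs:
        for res in cfg.get("resources", []):
            if res["type"] in seen:
                raise ValueError(f"duplicate resource type {res['type']!r}")
            seen.add(res["type"])
            merged["resources"].append(res)
    return merged
```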

static resources removed from the configuration may never be deleted

Originally filed as kubernetes/test-infra#17282

I mentioned this tangentially in kubernetes/test-infra#16047 (comment), but I want to pull it out into a separate issue to highlight it more easily.

Boskos doesn't delete static resources that are removed from the configuration if they are in use, to ensure that jobs don't fail, and to ensure that such resources are properly cleaned up by the janitor.

Originally, this was a reasonable decision, since Boskos periodically synced its storage against the configuration, and most likely such resources would eventually be free and thus deleted from storage.

After kubernetes/test-infra#13990, Boskos only syncs its storage against the configuration when the configuration changes (or when Boskos restarts). As a result, it may take a long time for static resources to be deleted, if ever.

There was a similar issue for DRLCs that I addressed in kubernetes/test-infra#16021, effectively by putting the DRLCs into lame-duck mode.

There isn't a clear way to indicate that static resources are in lame-duck mode, though.

Possible ways to address this bug, in increasing order of complexity:

  1. Just delete static resources, regardless of what state they're in.
  2. Periodically sync storage against the config. It's probably less expensive now, due to the improvements around locking.
  3. Somehow indicate that resources are in lame-duck mode to prevent them from being leased, and then delete them once free:
    a. Add a field into the UserData for static resources. (Currently UserData is not used for static resources.)
    b. Set an ExpirationDate on static resources. (Currently ExpirationDate is not used for static resources.)
    c. Add a new field on the ResourceStatus indicating resources are in lame-duck mode.

Workaround until this bug is fixed: admins with access to the cluster where Boskos is running can just delete the resources manually using kubectl.
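Option 3b in miniature (Python sketch; field names are illustrative, not the actual CRD schema):

```python
from datetime import datetime, timedelta, timezone

def mark_lame_duck(resource, grace=timedelta(hours=1)):
    """Stamp a removed-from-config static resource with an expiration
    instead of deleting it immediately."""
    resource["expiration"] = datetime.now(timezone.utc) + grace
    return resource

def should_delete(resource, now=None):
    """Delete only once the resource is free and past its expiration."""
    now = now or datetime.now(timezone.utc)
    expiration = resource.get("expiration")
    return resource["state"] == "free" and expiration is not None and now >= expiration
```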

GCP projects stuck in Cleaning block other projects from cleaning

If I understand the logic correctly, the janitor tries to acquire and clean all dirty resources sequentially, type by type; acquisition is also throttled by the size of a channel. If cleaning fails, the janitor returns the resource to the dirty state, and if acquisition was throttled, the loop will try to get the same resource again. So if the janitor(s) put resources back into dirty fast enough, it's possible for a janitor to get stuck on one resource type and do nothing for the other types.

This was previously observed with Filestore cleanups in a private installation: multiple resource types were affected after one type accumulated more projects stuck in cleaning than the aggregate channel capacity of all janitor instances.

AWS Janitor: Add support for ECR Public

In CAPA e2e CI jobs, we create a short-lived public ECR repository to deploy a container image from the codebase onto a created EC2 instance. These registries should be mopped up after use.

Note that ecr-public is its own API, distinct from normal ecr.

Migrate aws-janitor to use Go AWS SDK v2

The Go AWS SDK v1 (which is used by this project) is moving to maintenance mode in July 2024 and will be completely out of support in July 2025 [1].

We should update aws-janitor (and any other Boskos-related code) to use the v2 SDK.

AWS has published a migration guide [2] that we can use to understand the changes needed.

Footnotes

  1. https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-go-v1-on-july-31-2025/

  2. https://aws.github.io/aws-sdk-go-v2/docs/migrating/

aws-janitor overly eagerly deleting IAM Role

On AWS, IAM roles are identified only by their name (there is no unique UUID). They do, however, have a creation timestamp.

Our test jobs create IAM roles that reuse the same name. The aws-janitor runs periodically, and if the timing works out "just so", it will observe different IAM roles under the same name for the entire TTL window. It then deletes an IAM role, thinking it is no longer in use, when in fact it has been observing multiple different IAM roles that shared that name.

I propose using CreationTimestamp to differentiate.
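The proposed fix in miniature (Python sketch; RoleName and CreateDate are the IAM API field names, while the key format is illustrative):

```python
def role_identity(role):
    """Key the janitor's mark set on name plus creation timestamp, so a
    recreated role with the same name is treated as a new resource and
    gets a fresh TTL."""
    return f"{role['RoleName']}/{role['CreateDate']}"
```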

flaky test: cleaner TestRecycleResources/noLeasedResources

Seen failing here: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_boskos/24/pull-boskos-build-test-verify/1270814151844827142

=== FAIL: cleaner TestRecycleResources/noLeasedResources (0.05s)
time="2020-06-10T20:26:08Z" level=info msg="Cleaner started"
time="2020-06-10T20:26:08Z" level=info msg="Resource dynamic_2 is being recycled"
time="2020-06-10T20:26:08Z" level=info msg="Resource dynamic_2 is being recycled"
time="2020-06-10T20:26:08Z" level=info msg="Resource dynamic_2 is being recycled"
time="2020-06-10T20:26:08Z" level=info msg="Released dynamic_2 as tombstone"
time="2020-06-10T20:26:08Z" level=info msg="Released dynamic_2 as tombstone"
time="2020-06-10T20:26:08Z" level=info msg="Resource dynamic_2 is being recycled"
time="2020-06-10T20:26:08Z" level=info msg="Released dynamic_2 as tombstone"
time="2020-06-10T20:26:08Z" level=error msg="Release failed" error="owner mismatch request by cleaner, currently owned by "
time="2020-06-10T20:26:08Z" level=error msg="failed to release dynamic_2 as tombstone" error="owner mismatch request by cleaner, currently owned by "
time="2020-06-10T20:26:08Z" level=info msg="Resource dynamic_2 is being recycled"
    TestRecycleResources/noLeasedResources: cleaner_test.go:212: resource dynamic_2 state cleaning does not match expected tombstone
time="2020-06-10T20:26:08Z" level=info msg="Stopping Cleaner"
time="2020-06-10T20:26:08Z" level=info msg="Exiting recycleAll Thread"
time="2020-06-10T20:26:08Z" level=info msg="Released dynamic_2 as tombstone"
time="2020-06-10T20:26:08Z" level=info msg="Exiting recycleAll Thread"
time="2020-06-10T20:26:08Z" level=info msg="Exiting recycleAll Thread"
time="2020-06-10T20:26:08Z" level=info msg="Exiting recycleAll Thread"
time="2020-06-10T20:26:08Z" level=info msg="Exiting recycleAll Thread"
time="2020-06-10T20:26:08Z" level=info msg="Cleaner stopped"
    --- FAIL: TestRecycleResources/noLeasedResources (0.05s)
=== FAIL: cleaner TestRecycleResources (0.21s)

/kind bug

janitor: track when cleanup fails repeatedly for the same resource

Originally filed as kubernetes/test-infra#15866

Due to programming errors, the janitor may continuously fail to clean up a resource. Two examples I just discovered:

possibly an order-of-deletion issue:

{"error":"exit status 1","level":"info","msg":"failed to clean up project kube-gke-upg-1-2-1-3-upg-clu-n, error info: Activated service account credentials for: [[email protected]]\nERROR: (gcloud.compute.networks.delete) Could not fetch resource:\n - The network resource 'projects/kube-gke-upg-1-2-1-3-upg-clu-n/global/networks/jenkins-e2e' is already being used by 'projects/kube-gke-upg-1-2-1-3-upg-clu-n/global/routes/default-route-92807148d5aa60d1'\n\nError try to delete resources networks: CalledProcessError()\n[=== Start Janitor on project 'kube-gke-upg-1-2-1-3-upg-clu-n' ===]\n[=== Activating service_account /etc/service-account/service-account.json ===]\n[=== Finish Janitor on project 'kube-gke-upg-1-2-1-3-upg-clu-n' with status 1 ===]\n","time":"2020-01-10T21:03:14Z"}

likely incorrect flags (gcloud changed but we didn't?):

{"error":"exit status 1","level":"info","msg":"failed to clean up project k8s-jkns-e2e-gke-ci-canary, error info: Activated service account credentials for: [[email protected]]\nERROR: (gcloud.compute.disks.delete) unrecognized arguments: --global \n\nTo search the help text of gcloud commands, run:\n  gcloud help -- SEARCH_TERMS\nError try to delete resources disks: CalledProcessError()\nERROR: (gcloud.compute.disks.delete) unrecognized arguments: --region=https://www.googleapis.com/compute/v1/projects/k8s-jkns-e2e-gke-ci-canary/regions/us-central1 \n\nTo search the help text of gcloud commands, run:\n  gcloud help -- SEARCH_TERMS\nError try to delete resources disks: CalledProcessError()\n[=== Start Janitor on project 'k8s-jkns-e2e-gke-ci-canary' ===]\n[=== Activating service_account /etc/service-account/service-account.json ===]\n[=== Finish Janitor on project 'k8s-jkns-e2e-gke-ci-canary' with status 1 ===]\n","time":"2020-01-10T21:18:55Z"}

It'd be good to have some way of detecting when we're repeatedly failing to clean up a resource.
Not sure yet what the best way would be to track that.
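One simple shape for such tracking, as a Python sketch (thresholds and names are illustrative):

```python
from collections import Counter

class FailureTracker:
    """Count consecutive cleanup failures per resource; a success resets
    the counter, and crossing the threshold signals a stuck resource."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = Counter()

    def record(self, name, succeeded):
        if succeeded:
            self.failures.pop(name, None)
        else:
            self.failures[name] += 1
        return self.failures[name] >= self.threshold  # True => raise an alert
```

The alert signal could feed a metric or a log line; the hard part (what to do with stuck resources) is left open, as in the issue.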

Release blocking tests cannot acquire project from boskos

Similar to: #118

gce-cos-master-scalability-100 release blocking tests are failing with:

2022/04/06 10:36:20 main.go:331: Something went wrong: failed to prepare test environment: --provider=gce boskos failed to acquire project: Post "http://boskos.test-pods.svc.cluster.local./acquire?dest=busy&owner=ci-kubernetes-e2e-gci-gce-scalability&request_id=e01699c7-977c-4045-81e5-2d8825e132c6&state=free&type=scalability-project": dial tcp 10.35.241.148:80: connect: connection refused

Finish setting up this repository

A number of tasks remain. In no particular order:

aws-janitor: ensure IDs are unique across resources and regions for set.Mark

Some AWS resources don't have an ARN, so we're currently just using the ID of the resource, which is not necessarily globally unique for the same resource type across regions and/or across resource types.

For several resource types, we generate a fake ARN which includes the resource type and region. We should ensure all resource types are doing something similar.

/kind cleanup
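A sketch of the fake-ARN idea (the format is illustrative and deliberately not a real ARN):

```python
def fake_arn(resource_type, region, resource_id):
    """Synthesize a globally unique key for resources without a real ARN by
    folding in the resource type and region."""
    return f"fake-arn:{region}:{resource_type}/{resource_id}"
```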

Boskos CRD uses deprecated API group to be removed in v1.22

https://kubernetes.io/docs/reference/using-api/deprecation-guide/#customresourcedefinition-v122

The apiextensions.k8s.io/v1beta1 API version of CustomResourceDefinition will no longer be served in v1.22.

  • Migrate manifests and API clients to use the apiextensions.k8s.io/v1 API version, available since v1.16.
  • All existing persisted objects are accessible via the new API
  • Notable changes:
    • spec.scope is no longer defaulted to Namespaced and must be explicitly specified
    • spec.version is removed in v1; use spec.versions instead
    • spec.validation is removed in v1; use spec.versions[*].schema instead
    • spec.subresources is removed in v1; use spec.versions[*].subresources instead
    • spec.additionalPrinterColumns is removed in v1; use spec.versions[*].additionalPrinterColumns instead
    • spec.conversion.webhookClientConfig is moved to spec.conversion.webhook.clientConfig in v1
    • spec.conversion.conversionReviewVersions is moved to spec.conversion.webhook.conversionReviewVersions in v1
    • spec.versions[*].schema.openAPIV3Schema is now required when creating v1 CustomResourceDefinition objects, and must be a structural schema
    • spec.preserveUnknownFields: true is disallowed when creating v1 CustomResourceDefinition objects; it must be specified within schema definitions as x-kubernetes-preserve-unknown-fields: true
    • In additionalPrinterColumns items, the JSONPath field was renamed to jsonPath in v1 (fixes #66531)

changing type of a static resource in config doesn't update storage

Originally filed as kubernetes/test-infra#16047

What happened:
We renamed the type of some of our static resources to work around a bug in the janitor.
(That bug: we had a group of projects whose names didn't end in -project, and thus the janitor was passing the wrong flag: https://github.com/kubernetes/test-infra/blob/761c11f53ddb7dde3fcc4073a7e3b9015554fe7f/boskos/janitor/janitor.go#L92-L99)

After applying the config, the old type still remained in storage (in the Kubernetes objects).

What you expected to happen:
Boskos would update storage (in the Kubernetes objects) reflecting the new type.

How to reproduce it (as minimally and precisely as possible):
I wrote a simple unit test that reproduces this failure: ixdy/kubernetes-test-infra@d6714a6

It looks like when updating static resources, we only check whether the resources specified in the config exist in storage and vice versa, comparing resource names alone. We do not consider that other metadata (such as the type) may have changed.
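A sketch of the fix (Python, with dict-shaped resources for illustration; the real sync code operates on CRD objects):

```python
def sync_static(config_resources, stored_resources):
    """When names match but the type differs, update the stored object
    in place instead of leaving it untouched; returns the updated names."""
    stored_by_name = {r["name"]: r for r in stored_resources}
    updated = []
    for res in config_resources:
        stored = stored_by_name.get(res["name"])
        if stored is not None and stored["type"] != res["type"]:
            stored["type"] = res["type"]
            updated.append(stored["name"])
    return updated
```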
