Corrected March 13th Burndown report:
Status as of noon, PDT, March 13th.
We still have 19 issues open. Some of these don't show up in GitHub searches because they are not labelled correctly; this is being resolved.
We are waiting for a bunch of changes to go into 1.9 to fix downgrade tests, which should clear up some of the pending issues list. This does point to needing a change in how we do downgrade tests in the future.
There are several major regressions without resolution right now, sufficient to delay the end of code freeze, and possibly the final release, per this morning's burndown meeting.
Red
Issues which are blockers, or whose status is unknown but looks serious, without a good PR.
Performance issues in analysis/progress:
Failing tests with currently unknown causes:
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [job failed] 1.9-master upgrade|downgrade jobs
- [test failed] gci-gce-alpha-features
Regression in progress, but fix untested. Issue was accidentally dropped by SIG and just picked up again:
Yellow
Issues which are blockers, with a good PR. Also undecided issues, with or without PRs, which may end up not being considered 1.10 bugs.
Test Fails in Progress
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
- "CreateContainerConfigError: failed to prepare subPath for volumeMount" error with configMap volume
- Subpath tests don't work in multizone GCE
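As context for the two subPath items above: the failing shape, as reported, is a configMap volume mounted with a subPath. Below is a minimal sketch of a pod of that shape using the Go client types; all names here ("subpath-demo", "my-config", the paths) are invented for illustration and are not taken from the issues.

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// demoPod builds a pod that mounts a single key of a configMap via
// subPath, the combination reported to fail with
// CreateContainerConfigError in the item above.
func demoPod() *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "subpath-demo"},
		Spec: v1.PodSpec{
			Volumes: []v1.Volume{{
				Name: "cfg",
				VolumeSource: v1.VolumeSource{
					ConfigMap: &v1.ConfigMapVolumeSource{
						LocalObjectReference: v1.LocalObjectReference{Name: "my-config"},
					},
				},
			}},
			Containers: []v1.Container{{
				Name:  "app",
				Image: "busybox",
				VolumeMounts: []v1.VolumeMount{{
					Name:      "cfg",
					MountPath: "/etc/app/config.yaml",
					SubPath:   "config.yaml", // mount one file out of the volume
				}},
			}},
		},
	}
}
```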
1.9 downgrade tests issues
These issues are waiting for code on 1.9 in order to fix the downgrade tests.
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
- pull-kubernetes-kubemark-e2e-gce is failing
Green
Non-blocker issues.
- zsh completion throws error v1.10.0-beta.2
- Controller-manager sees higher mem-usage when load test runs before density
- HostPath mounts failing with "Path is not a shared or slave mount"
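On the HostPath item above: the "shared or slave mount" requirement comes from mount propagation, which went beta in 1.10. A minimal sketch, assuming standard client-go types, of a mount that requests propagation; names and paths are invented:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// propagationPod asks for HostToContainer propagation on a hostPath
// mount. Kubelet validates the backing host mount: if /mnt/data is not
// part of a shared or slave mount on the node, pod setup fails with the
// error quoted in the item above.
func propagationPod() *v1.Pod {
	hostToContainer := v1.MountPropagationHostToContainer
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "propagation-demo"},
		Spec: v1.PodSpec{
			Volumes: []v1.Volume{{
				Name: "host",
				VolumeSource: v1.VolumeSource{
					HostPath: &v1.HostPathVolumeSource{Path: "/mnt/data"},
				},
			}},
			Containers: []v1.Container{{
				Name:  "app",
				Image: "busybox",
				VolumeMounts: []v1.VolumeMount{{
					Name:             "host",
					MountPath:        "/data",
					MountPropagation: &hostToContainer,
				}},
			}},
		},
	}
}
```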
Tracking Issues
1.10 is out now, closing.
Status as of 2/21, 1 day into Code Slush.
Detail behind this report can be found in the tracking spreadsheet.
One day after Code Slush, here is the issue status.
Bump
Recommend bumping these issues from the milestone based on inactivity and lack of responsiveness by SIGs/issue owners: 11 issues.
- CronJob controller should use shared informers
- move kube-proxy into a daemonset
- Document what docker image (Dockerfile) features we support
- e2e tests for cloud-controller-manager
- Review Cluster Autoscaler code for Azure
- Graduate the kubeletconfig API to beta
- RFC: Validate Docker against the Docker API versions
- Remove docker dependency on Kubelet startup
- Support out-of-tree authentication providers
- Communicate etcd2 support deprecation timeline
- Kubelet often fails on AWS spot instances
- Advanced Auditing 1.10 umbrella bug
- Don't duplicate status in audit events
- audit.Event.RequestObject underspecified for patch requests
- Remove the PersistentVolumeLabel Admission Controller
Wait
Status of issues/PRs has been queried, currently waiting for response from owners: 9 issues
- Request to add namespace name and namespace UUID to metadata of on-disk log file
- Support a Vault based KMS provider for envelope encryption of resources in a cluster
- ConfigMaps and Secrets mounted with subPath do not update when changed
- kubernetes GPU device plugin for Ubuntu and Debian images
- Delete in-tree support for Nvidia GPUs
- Job backoff limit workings when parallelism > 1
- RFC: Some thoughts about node e2e tests
- Document upgrade and downgrade steps for etcd 3.2 upgrade
Keep
1.10 issues which appear to be in progress and headed towards resolution by Code Freeze: 19 issues
- CRI: Support CRI log stats
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- Use of resourceVersion=0 in reflectors for initial sync breaks pod safety when more than one api server
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- Add support for accessing Ceph RBD with the rbd-nbd client
- [audit] Restore audit logging in the scalability tests
- After 1.8, scheduler could reject unknown extended resource names
- Kubelet: Resolve paths from dynamic config payloads against unpacked config file location
- add PV resize support for azure disk
- Kubernetes Container Runtime Interface (CRI) doesn't support WindowsContainerConfig and WindowsContainerResources
- Graduate the Lease Endpoint Reconciler to beta in v1.10
- Add a subresource for Node.ConfigSource?
- CRI: Implement container log rotation
- Invalid capacity 0 on windows image filesystem
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- Node's providerID is wrong for azure vmss when using instance metadata
- [e2e test failure] each node by dropping all outbound packets for a while and ensure they function afterwards
- [test failure] verify.openapi-spec
- Cut and Vendor cAdvisor v0.29.0
Tracking
These are tracking issues. In a few cases, they link to multiple issues which are NOT currently marked with the 1.10 milestone. 6 issues.
- v1.10 known issues / FAQ accumulator
- CRI: support non-builtin container runtimes
- Move Priority and Preemption to Beta
- Bringing server-side printing to beta summary
- Standalone Azure cloud provider
- Move KubeControllerManagerConfiguration to pkg/controller/apis/
Graduate the kubeletconfig API to beta was completed in 1.10.
Support out-of-tree authentication providers: the design is merged; the implementing PR is under review and planned to merge this week.
All SIG-Azure issues have been curated.
Status as of 2/23. All issue owners have been reminded of the need to add status/approved-for-milestone. 11 issues were removed from the milestone or resolved. 3 new issues were added. Total of 40 issues.
Bump
Recommend bumping these issues from the milestone based on inactivity and lack of responsiveness by SIGs/issue owners, or on request by the issue owner that it be bumped: 9 issues.
- Document what docker image (Dockerfile) features we support
- e2e tests for cloud-controller-manager
- Review Cluster Autoscaler code for Azure
- Remove the PersistentVolumeLabel Admission Controller
- RFC: Validate Docker against the Docker API versions
- kubernetes GPU device plugin for Ubuntu and Debian images
- Remove docker dependency on Kubelet startup
- add PV resize support for azure disk
- Kubelet often fails on AWS spot instances
Wait
Issue Owner/SIG has been queried, waiting for response. 10 issues
- Add support for accessing Ceph RBD with the rbd-nbd client
- ConfigMaps and Secrets mounted with subPath do not update when changed
- After 1.8, scheduler could reject unknown extended resource names
- Delete in-tree support for Nvidia GPUs
- Graduate the Lease Endpoint Reconciler to beta in v1.10
- Request to add namespace name and namespace UUID to metadata of on-disk log file
- RFC: Some thoughts about node e2e tests
- Don't duplicate status in audit events
- audit.Event.RequestObject underspecified for patch requests
- Device Plugin failure handling in kubelet is racy
Keep
Issue is approved by SIG for 1.10 or looks likely to be approved/completed, or looks like a blocking bug against 1.10: 16 issues.
- [audit] Restore audit logging in the scalability tests
- Kubelet: Resolve paths from dynamic config payloads against unpacked config file location
- Support out-of-tree authentication providers
- Kubernetes Container Runtime Interface (CRI) doesn't support WindowsContainerConfig and WindowsContainerResources
- Job backoff limit workings when parallelism > 1
- Add a subresource for Node.ConfigSource?
- Document upgrade and downgrade steps for etcd 3.2 upgrade
- Invalid capacity 0 on windows image filesystem
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- v1.10 known issues / FAQ accumulator
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- Use of resourceVersion=0 in reflectors for initial sync breaks pod safety when more than one api server
- [test failure] verify.openapi-spec
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- Windows containers creation failed because of rslave mounts
- Move Taints and Tolerations to GA
Tracking
Tracking issues which have features/bugs in 1.10:
- Advanced Auditing 1.10 umbrella bug
- CRI: support non-builtin container runtimes
- Move Priority and Preemption to Beta
- Move KubeControllerManagerConfiguration to pkg/controller/apis/
- Bringing server-side printing to beta summary
Burndown report as of scheduled Code Freeze 2/26
The complete issue tracker is in a spreadsheet.
Red
These issues represent what look like blockers for 1.10, and do not have a PR. One has had no attention at all.
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- Use of resourceVersion=0 in reflectors for initial sync breaks pod safety when more than one api server
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
Yellow
These issues have one or more PRs. However, either the PRs aren't in good shape (such as failing tests or pending review) or they don't completely resolve the issue.
- After 1.8, scheduler could reject unknown extended resource names (see the sketch after this list)
- Remove docker dependency on Kubelet startup
- Support out-of-tree authentication providers
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- Device Plugin failure handling in kubelet is racy
- OSS gke tests fails check APIReachability
- kubernetes/kubernetes#60381
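On the extended-resource item flagged above: extended resources are requested under vendor-namespaced names in a pod's resource limits. A hypothetical request of that kind (the name example.com/widget is invented) illustrates what the scheduler change would start rejecting when no node advertises the name:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// extendedResourceContainer requests one unit of an extended resource.
// The scheduler change discussed above would reject pods like this when
// the resource name is unknown to (not advertised by) any node.
func extendedResourceContainer() v1.Container {
	return v1.Container{
		Name:  "app",
		Image: "busybox",
		Resources: v1.ResourceRequirements{
			Limits: v1.ResourceList{
				v1.ResourceName("example.com/widget"): resource.MustParse("1"),
			},
		},
	}
}
```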
Green
These issues have a PR in good shape and are expected to merge and close within a couple of days.
- [audit] Restore audit logging in the scalability tests
- Kubernetes Container Runtime Interface (CRI) doesn't support WindowsContainerConfig and WindowsContainerResources
- Document upgrade and downgrade steps for etcd 3.2 upgrade
- [test failure] verify.openapi-spec
- DaemonSet should ignore the unschedulable field of a node
- Move Taints and Tolerations to GA
Tracking
These are tracking issues for features being dealt with in multiple steps/releases/PRs.
- Move KubeControllerManagerConfiguration to pkg/controller/apis/
- Bringing server-side printing to beta summary
- Advanced Auditing 1.10 umbrella bug
- v1.10 known issues / FAQ accumulator
Bump
These issues are expected to get bumped out of 1.10 once automation kicks in tomorrow, as they are not approved or are low-priority.
- Document what docker image (Dockerfile) features we support
- e2e tests for cloud-controller-manager
- Remove the PersistentVolumeLabel Admission Controller
- Kubelet often fails on AWS spot instances
- Move KubeControllerManagerConfiguration to pkg/controller/apis/
- Bringing server-side printing to beta summary
- Request to add namespace name and namespace UUID to metadata of on-disk log file
- RFC: Some thoughts about node e2e tests
- Image Manager should return a copy of image list to avoid data race.
- Add support for accessing Ceph RBD with the rbd-nbd client
Thank you so much! This takes the organizational cake.
As of today, we have 27 issues open against 1.10, although some of them (4) would have been removed by automation if we hadn't run out of GitHub tokens. I don't have a good comparison against prior releases because I've discovered math errors in the devstats charts. Working on that. Closest comparison is 20 open issues 4 days after Code Freeze for 1.9.
Tracking spreadsheet is here and is up to date as of this afternoon. Note the three "NA" issues; these are recent bugs and/or test failures which look like 1.10 issues, but have not been confirmed by their respective SIGs.
Red
Issues with no PR, or no complete PR, which cannot be easily kicked out of 1.10.
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- Kubernetes is vulnerable to stale reads, violating critical pod safety guarantees
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- Apiserver CPU/Mem usage bumped to 1.5-2x in big clusters
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- RFC: Some thoughts about node e2e tests
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- Improve k8s support for multizone PDs
- DaemonSet should ignore the unschedulable field of a node
- kubernetes/kubernetes#60381
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
- [test flakes] master-scalability suites
- [failing test] Nginx should conform to Ingress spec
- Failing to list pods with selector spec.nodeName != "" (see the sketch after this list)
- Custom resources with finalizers can "deadlock" customresourcecleanup.apiextensions.k8s.io finalizer
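A sketch of the failing query shape from the field-selector item above, assuming ordinary client-go usage of this era (this is not code from the issue):

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listScheduledPods lists only pods already bound to a node by excluding
// an empty spec.nodeName. The empty right-hand side of "!=" means "not
// the empty string"; this is the selector reported as failing above.
func listScheduledPods(cs kubernetes.Interface) error {
	_, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(metav1.ListOptions{
		FieldSelector: "spec.nodeName!=",
	})
	return err
}
```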
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
- Document upgrade and downgrade steps for etcd 3.2 upgrade
- [audit] Restore audit logging in the scalability tests
- Support out-of-tree authentication providers
- Kubelet often fails on AWS spot instances
- [test failure] verify.openapi-spec
- kubectl completion failed to list file names
- Server side print returns generic information for deployments
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug
- Bringing server-side printing to beta summary
Kick
Issues which are waiting for automation to kick them out of the milestone.
- e2e tests for cloud-controller-manager
- Request to add namespace name and namespace UUID to metadata of on-disk log file
Burndown report as of 3/1 around 1pm Pacific
The complete issue tracker is in a spreadsheet.
We currently have around 21 "real" issues, excluding a few which will be dropped by automation as soon as the patched munger catches up.
Red
These issues represent what look like blockers for 1.10, and do not have a PR.
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- Kubernetes is vulnerable to stale reads, violating critical pod safety guarantees
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- DaemonSet should ignore the unschedulable field of a node
- Apiserver CPU/Mem usage bumped to 1.5-2x in big clusters
- Failing to list pods with selector spec.nodeName != ""
Yellow
These issues have one or more PRs. However, either the PRs aren't yet in good shape (failing tests or pending review) or they don't completely resolve the issue.
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- Improve k8s support for multizone PDs
- kubernetes/kubernetes#60381
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
- kubectl completion failed to list file names
- [failing test] Nginx should conform to Ingress spec
- [test flakes] master-scalability suites
- [job flake] kubelet-master
- Mount propagation moved to beta, comment not updated
Green
These issues have a PR in good shape and are expected to merge and close within a couple of days.
Tracking
These are tracking issues for features being dealt with in multiple steps/releases/PRs.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug
- Bringing server-side printing to beta summary
As of today, we have 21 issues open against the 1.10 milestone, which is the same as yesterday. However, several of those issues have moved from yellow to green status because of PRs being approved/fixed, so we can expect a drop in the number of issues over the weekend.
On the down side, several issues are of special concern, as they represent severe problems which may throw off the release schedule. These are all in the Red section and detailed there.
Tracking spreadsheet is here and is up to date as of this afternoon.
Red
Issues with no PR, or no complete PR, which cannot be easily taken out of 1.10 or represent major regressions.
The Stale Reads issue is a potentially major issue, affecting all supported versions of Kubernetes, without a clear solution that doesn't produce a major performance regression. Depending on how fixing it goes, we may have to punt on it for 1.10.0 and wait for a fix in 1.10.1.
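For readers new to the mechanics: the hazard comes from listing with resourceVersion=0, which client-go reflectors used for their initial sync. A minimal sketch, assuming standard client-go usage of this era (not code from the issue):

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listPods contrasts the two list modes at issue. With ResourceVersion
// "0" the apiserver may answer from its watch cache, which can lag
// behind etcd (a stale read); with no ResourceVersion it performs a
// quorum read from etcd, which is consistent but more expensive at scale.
func listPods(cs kubernetes.Interface) error {
	// Fast but possibly stale: may be served from the watch cache.
	if _, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(metav1.ListOptions{ResourceVersion: "0"}); err != nil {
		return err
	}
	// Consistent quorum read: omit ResourceVersion entirely.
	_, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(metav1.ListOptions{})
	return err
}
```

The bind, per the reports here, is that moving reflectors to quorum reads closes the stale-read hole, but that is exactly the change suspected of costing performance at scale.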
These two are really the same issue, and show what may be a very large performance regression in 1.10 even without a Stale Reads fix.
Waiting on the test framework to find out how bad the problem is; it's related to some serious test flakes, though, so it's undetermined whether it's a substantial issue or a problem with the test.
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- kubectl completion failed to list file names
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
- Support out-of-tree authentication providers
- Kubelet often fails on AWS spot instances
- DaemonSet should ignore the unschedulable field of a node
- kubernetes/kubernetes#60381
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
- Failing to list pods with selector spec.nodeName != ""
- [failing test] Nginx should conform to Ingress spec
- [job flake] kubelet-master
- Mount propagation moved to beta, comment not updated
Kick
Issues just waiting for the grace period to elapse before being kicked out of 1.10.
- Request to add namespace name and namespace UUID to metadata of on-disk log file
- Improve k8s support for multizone PDs
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug
- Bringing server-side printing to beta summary
As of around 10am PST today, we have 25 issues open against the 1.10 milestone, which is an increase of 4 from Friday. Most of the new issues are actually breakouts of a larger test fail issue, in order to have one issue per SIG for resolution (see test fails below).
Tracking spreadsheet is here and is up to date as of this morning.
Red
Issues with no PR, or no complete PR, which cannot be easily taken out of 1.10 or represent major regressions.
The Stale Reads issue is a potentially major issue, affecting all supported versions of Kubernetes, without a clear solution that
doesn't produce a major performance regression. Depending on how fixing it goes, we may have to punt on it for 1.10.0 and wait for
a fix in 1.10.1.
This shows what may be a very large performance regression in 1.10 even without a Stale Reads fix. Test flakes are fixed, so hopefully we can get confirmation (or not).
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- Advanced Auditing 1.10 umbrella bug
- Bringing server-side printing to beta summary
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
- kubeadm v1.10 release tracking issue
New Test Fails
We've added a bunch of test fail tracking over the weekend. These represent related test fails across several test suites, with individual assignees. All are considered Yellow right now, as they either have fixes in progress, or are too new to be considered stuck. Most of these are being tracked from issue #60003.
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- HPA refuses to autoscale if DesiredReplicas > MaxReplicas (but I want it to scale up to MaxReplicas)
- [test failed] Services should work after restarting apiserver
- [test failed] regular resource usage tracking resource tracking for 0 pods per node
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [test failed] [1.10 upgrade] Servers with support for Table transformation
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
- [test failed] [1.10 upgrade] Proxy version v1
- Flexvolume e2e tests failing
- Advanced Audit tests flaking
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
- Kubelet often fails on AWS spot instances
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
- [test flakes] master-scalability suites
- Mount propagation moved to beta, comment not updated
- Failure in Conformance test - Daemon set [Serial] should update pod when spec was updated and update strategy is RollingUpdate [Conformance]
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug
- Bringing server-side printing to beta summary
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
Kick
Issues just waiting for the grace period to elapse before being kicked out of 1.10.
- Request to add namespace name and namespace UUID to metadata of on-disk log file
- Improve k8s support for multizone PDs
As of around 11am PST today, we have 26 issues open against the 1.10 milestone, which is an increase of 1 from yesterday. Most issues are various test failures, as non-test-fail issues are getting resolved.
Tracking spreadsheet is here and is up to date as of this morning.
Red
Issues with no PR, or no complete PR, which cannot be easily taken out of 1.10 or represent major regressions.
The Stale Reads issue is a potentially major issue, affecting all supported versions of Kubernetes, without a clear solution that doesn't produce a major performance regression. It does not look to be headed towards resolution on a reasonable timeline for 1.10, so we may need to talk about it for 1.10.1.
This shows what may be a very large performance regression in 1.10 even without a Stale Reads fix. SIG thinks we have a major regression here, even though test flakes are making it hard to verify.
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- Improve k8s support for multizone PDs (this is an exception)
Test Fails In Progress
The vast majority of issues open now are test fails. All of the below are in progress in some way, but don't yet have a clear resolution. Many of these are the usual upgrade test failures. In at least one case, we need to unfreeze the 1.9 tree to fix the test. Not clear at this point whether we have a general upgrade issue the way we did in 1.9.
- [job failure] ci-kubernetes-e2e-kubeadm-gce
- [job failure] periodic-kubernetes-e2e-kubeadm-gce-selfhosting
- [test flakes] master-scalability suites
- Advanced Audit tests flaking
- Flexvolume e2e tests failing
- [test failed] Services should work after restarting apiserver
- [test failed] regular resource usage tracking resource tracking for 0 pods per node
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [job failed] 1.9-master upgrade|downgrade jobs
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
- Kubelet often fails on AWS spot instances
- [failing test] daemonset-upgrade in 1.9-master upgrade jobs
- Mount propagation moved to beta, comment not updated
- Failure in Conformance test - Daemon set [Serial] should update pod when spec was updated and update strategy is RollingUpdate [Conformance]
- [test failed] [1.10 upgrade] Servers with support for Table transformation
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (just waiting on fixing the tests)
- Bringing server-side printing to beta summary (should be done?)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
As of around 4pm PST today, we have 22 issues open against the 1.10 milestone, which is a decrease of 4 from yesterday. Most issues are various test failures, as non-test-fail issues are getting resolved.
Tracking spreadsheet is here and is up to date as of this afternoon.
Critical concerns right now are:
- Potential major performance regression
- Issue with stale reads affecting scalability
- Test fails not getting enough attention from SIGs
- Test fails which require patches against 1.9 to resolve.
Red
Issues with no PR, or no complete PR, which cannot be easily taken out of 1.10 or represent major regressions.
The Stale Reads issue is a potentially major issue, affecting all supported versions of Kubernetes, without a clear solution that doesn't produce a major performance regression. It does not look to be headed towards resolution on a reasonable timeline for 1.10, so we may need to talk about it for 1.10.1.
This shows what may be a very large performance regression in 1.10 even without a Stale Reads fix. SIG thinks we have a major regression here, even though test flakes are making it hard to verify.
It may also be related to issue #60762, below.
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
These two failing tests are receiving zero attention from their respective SIGs, 3 days after notice. SIGs bothered on Slack.
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- Improve k8s support for multizone PDs (this is an exception)
Test Fails In Progress
The vast majority of issues open now are test fails. All of the below are in progress in some way, but don't yet have a clear resolution. Many of these are the usual upgrade test failures.
- [test flakes] master-scalability suites
- Flexvolume e2e tests failing
- [test failed] Services should work after restarting apiserver
- [test failed] regular resource usage tracking resource tracking for 0 pods per node
- [job failed] 1.9-master upgrade|downgrade jobs
- [test failed] [1.10 upgrade] Proxy version v1
- FlexVolume probe race condition potentially crashes kubelet
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
These two test fails seem to require modifying tests/code for 1.9 in order to fix. We need a hotfix to allow the owners to do this, and then we need a better procedure for handling upgrade tests in the future so that it doesn't lead to needing to patch tests on an older, frozen version.
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
This issue is unconfirmed and not yet assigned to 1.10, but could be causing some of the test failures above:
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (just waiting on fixing the tests)
- Bringing server-side printing to beta summary (should be done?)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
As of around 4pm PST today, we have 18 issues open against the 1.10 milestone, which is a decrease of 4 from Wednesday, and a great trajectory to be on. Most issues are various test failures, as non-test-fail issues are getting resolved, and all test fails are now getting attention.
Tracking spreadsheet is here and is up to date as of this afternoon.
Critical concerns right now are:
- Potential major performance regression
- Opening the 1.9 tree in order to patch the upgrade tests (see Yellow)
Red
Issues with no PR, or no complete PR, which cannot be easily taken out of 1.10 or represent major regressions.
This shows what may be a very large performance regression in 1.10 even without a Stale Reads fix. SIG thinks we have a major regression here, even though test flakes are making it hard to verify.
The Stale Reads issue is still a potentially major issue, affecting all supported versions of Kubernetes, without a clear solution that doesn't produce a major scalability regression. However, it looks highly unlikely to be resolved in the next 2 weeks, so we are recommending taking it out of the 1.10 milestone.
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
Test Fails In Progress
The vast majority of issues open now are test fails. All of the below are in progress in some way, but don't yet have a clear resolution. Many of these are the usual upgrade test failures.
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [test failed] regular resource usage tracking resource tracking for 0 pods per node
- [job failed] 1.9-master upgrade|downgrade jobs
- k8s.io/kubernetes/examples test failed
These four test fails seem to require modifying tests/code for 1.9 in order to fix. We need a hotfix to allow the owners to do this, and then we need a better procedure for handling upgrade tests in the future so that it doesn't lead to needing to patch tests on an older, frozen version. Either that, or we have to decide to ignore the upgrade tests for 1.10.
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation.
- Flexvolume e2e tests failing
- [test failed] Services should work after restarting apiserver
- FlexVolume probe race condition potentially crashes kubelet
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (just waiting on fixing the tests)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
As of around 1pm PDT today, we have 15 accepted issues open against the 1.10 milestone, and four possibles, which is a decrease of 3 from Friday. Further, four issues are likely to close in the next 2-5 hours as fixed tests pass.
However, we have a couple of major blocker issues which look possible to delay the release, see Red below.
Tracking spreadsheet is here and is up to date as of 1pm PDT.
Red
The big potential release-derailer is the major performance regressions possibly due to unidentified performance changes in etcd:
At this point, it is unclear if the etcd issues account for all of the problems, or if changing etcd version/settings will fix the issues.
This is an apparently unrelated increase on memory used by the API server. @shyamjvs has been hard at work bisecting for this, and may have found a culprit:
IMHO, these two performance regressions are significant enough to warrant a release delay.
We also have two test fails which have been receiving no attention. While neither looks that bad, I'm flagging them because we don't actually know what's causing them:
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [job failed] 1.9-master upgrade|downgrade jobs
The Stale Reads issue has been removed from 1.10.
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
- None. Yay! But ...
Test Fails In Progress
All of the below are in progress in some way, but don't yet have a clear resolution.
- [test failed] regular resource usage tracking resource tracking for 0 pods per node
- pull-kubernetes-kubemark-e2e-gce is failing
Possibles
These issues may be 1.10 issues; they were recently reported and seem related to other issues with 1.10. However, none of them have been examined by the SIGs yet. None look like release-blockers.
- k8s.io/kubernetes/examples test failed
- zsh completion throws error v1.10.0-beta.2
- KUBE_GIT_VERSION contains extraneous comma when building from git archive source.tgz
- Controller-manager sees higher mem-usage when load test runs before density
Green
Issues with an approved PR which is just waiting for labels, release notes, or automation. A bunch of these are tests which are being fixed now that code was merged into 1.9, just waiting on them to pass.
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
- FlexVolume probe race condition potentially crashes kubelet
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (waiting on docs merge)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
Burndown report as of 10am PDT March 13
DELETED because issues with Google Sheets caused it to be inaccurate. New burndown shortly.
Status as of noon, PDT, March 14th.
We have three issues which were re-opened yesterday, because they were closed in advance of verifying the tests.
Several of the upgrade/downgrade tests have been fixed, but we are waiting on all tests to pass before we actually clear them, since several test fails have been closed and reopened. A big thanks to @liggitt for pursuing those.
Overall status is "Crimson". We have multiple unclosed issues, any of which are sufficient to block release, and two of which (performance/scalability and ) have no specific timeline for resolution. We also have one trailing feature of unknown status. Further delaying the release seems more likely than not.
Red
Issues which are blockers, or whose status is unknown but looks serious, without a good PR.
Performance issues in analysis/progress. Some of the performance and scalability issues have been resolved, and others have been broken out into more specific issues. There are some issues (see Green below) which are not expected to be resolved for 1.10, but are regarded as non-blockers. Many thanks to @shyamjvs for diving into these regressions!
- [test flakes] master-scalability suites
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller
Failing tests with currently unknown causes:
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [test failed] gci-gce-alpha-features
Regression in progress, but fix not passing tests:
Orphaned feature, awaiting response from SIG:
Yellow
Issues which are blockers, with a good PR. Also undecided issues, with or without PRs, which may end up not being considered 1.10 bugs.
Test Fails in Progress
These are currently all upgrade tests:
- [test failed] [1.10 upgrade] Servers with support for Table transformation
- [test failed] [1.10 upgrade][gci-gke] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
Green
Non-blocker issues (expected to remain broken for 1.10, need to add release note):
- Flaky timeouts while waiting for RC pods to be running in density test
- Controller-manager sees higher mem-usage when load test runs before density
- HostPath mounts failing with "Path is not a shared or slave mount"
Resolved, pending having all tests passing:
- Apiserver CPU/Mem usage bumped to 1.5-2x in big clusters
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- pull-kubernetes-kubemark-e2e-gce is failing
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
- "CreateContainerConfigError: failed to prepare subPath for volumeMount" error with configMap volume
- Subpath tests don't work in multizone GCE
Resolved, waiting for automation:
- zsh completion throws error v1.10.0-beta.2
Tracking Issues
As of around noon PDT today, we have 12 accepted issues open against the 1.10 milestone, which is a decrease of 3 from yesterday. Further, four issues are likely to close in the next 2-5 hours as fixed tests pass.
However, we have a couple of major blocker issues which look possible to delay the release, see Red below.
Tracking spreadsheet is here and is up to date as of 1pm PDT.
Red
The big potential release-derailer is the major performance regressions:
- [test flakes] master-scalability suites
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller
- Flaky timeouts while waiting for RC pods to be running in density test (non-blocker)
These two performance regressions are considered significant enough to warrant a release delay. Work on them, including git bisect and scalability testing, has been ongoing; it is slow going due to the relatively small number of folks who understand kube scalability and the scalability tests. Are you a performance geek who wants to get involved with Kubernetes? We could use you.
This test fail may be related to the fluentd performance issues, but root cause unknown:
Pod deletion has a problem with race conditions; work is in progress, but the initial patch attempt needs work:
Yellow
Issues with a PR that is not approved, or issues with no PR which look possible to ignore/kick in 1.10.
The problem with being unable to delete PVCs on downgrade is in progress, in the form of manual downgrade docs and a patch for the tests, with an actual fix due in 1.9.5:
This GCE deprecated flag issue has a PR in progress and near approval. It is currently breaking a lot of unrelated tests:
The rest of the Daemonset Scheduling work looks likely to be postponed until 1.11, but it's unclear at this point what would be required to back out committed work:
Green
Issues that are non-blockers or expected regressions and are expected to remain issues after 1.10.0 release:
- HostPath mounts failing with "Path is not a shared or slave mount"
- Apiserver CPU/Mem usage bumped to 1.5-2x in big clusters
- Controller-manager sees higher mem-usage when load test runs before density
Issues with an approved PR which is just waiting for labels, release notes, or automation:
Test fails which have been fixed but we're waiting for a couple days of green before we stop watching them:
- Advanced Audit tests flaking
- [test failed] [1.10 upgrade] Servers with support for Table transformation
- [test failed] [1.10 upgrade] Dynamic Provisioning DynamicProvisioner
- [test failed] [1.10 upgrade][gci-gke] Kubernetes Dashboard should check that the kubernetes-dashboard instance is alive
- [test failed] [1.10 upgrade] Cadvisor should be healthy on every node
- [test failed] [1.10 upgrade] Proxy version v1
- pull-kubernetes-kubemark-e2e-gce is failing
- "Cluster level logging implemented by Stackdriver should ingest events" fails for GKE Regional Clusters
- Subpath tests don't work in multizone GCE
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (waiting on docs merge)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
As of around noon PDT today, we have 8 accepted issues open against the 1.10 milestone, which is a decrease of 4 from yesterday.
At this point, we have three outstanding areas of work, which relate to multiple issues: the performance regressions, PVC protection downgrade, and Daemonset scheduling. Everything else known is resolved.
Tracking spreadsheet is here and is up to date as of 11am PDT.
Red
Performance regressions are in progress, but still not completely nailed down. Bisect has revealed a candidate issue which is possibly due to an already-reverted PR, and as such the release team wants to get an RC built so that they can really test the tweaks already made:
- [test flakes] master-scalability suites
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller
- Flaky timeouts while waiting for RC pods to be running in density test (non-blocker)
SIG-Storage is working to fix the failing test for downgrade of protected PVCs. @liggitt is working on a test patch to implement the manual instructions so that we can complete the downgrade tests. Risk: some of the other downgrade tests start failing now that they can finish running.
- [job failed] 1.9-master upgrade|downgrade jobs
Yellow
Daemonset scheduling feature is cleared to go into 1.10. All code updates have been merged, although one refactoring PR is deferred to 1.11. The remaining open PR is docs, plus release notes are needed. Risk: we may break new tests with the merge this morning.
- Schedule DaemonSet Pods by default scheduler
- [failing test] should restart all nodes and ensure all nodes and pods recover (also relates to fluentd issues)
- [test failed] gci-gce-alpha-features
We also have one more miscellaneous test fail in progress:
Special Issues
Primarily tracking issues.
- v1.10 known issues / FAQ accumulator
- Advanced Auditing 1.10 umbrella bug (waiting on docs merge)
- [job failure] ci-kubernetes-e2e-gci-gke|gce-serial
Status as of noon, PDT, March 19th.
Overall status is "saffron" (yellow with some organge). While the majority of bugs are either closed or have short-timeline plans for closure, we still have outstanding performance issue(s) whose cause is unknown.
Red
Issues which are blockers, or whose status is unknown but looks serious, without a good PR.
Performance issues in analysis/progress. With fluentd patches, performance issues have been addressed within acceptable tolerances (there is increased resource usage in this version of Kubernetes, period). Except this one, whose cause is still unknown:
- [test flakes] master-scalability suites
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller (this was patched, but the patch did not improve test results)
Yellow
Issues which are blockers, with a good PR. Also undecided issues, with or without PRs, which may end up not being considered 1.10 bugs.
Features
- Schedule DaemonSet Pods by default scheduler (merged except for Docs/Release Notes)
PVC protection:
This is being dealt with as a documentation bug with a documented workaround. There is a patch in progress against 1.9 that will make PVC downgrade work, to come out with 1.9.6. In the meantime, users who have a lot of PVCs should be encouraged to wait to upgrade until 1.9.6 is out.
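For reference, the documented workaround amounts to clearing the protection finalizer on PVCs stuck in Terminating after a downgrade, since a pre-1.9.6 cluster doesn't run the controller that removes it. A hedged sketch with client-go; the helper and its arguments are invented for illustration, and the authoritative steps are the documented workaround, not this code:

```go
package main

import (
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// clearPVCFinalizers strategically merges away all finalizers on one PVC,
// mirroring the manual kubectl-patch style workaround for claims stuck in
// Terminating on a downgraded cluster.
func clearPVCFinalizers(cs kubernetes.Interface, namespace, name string) error {
	patch := []byte(`{"metadata":{"finalizers":null}}`)
	_, err := cs.CoreV1().PersistentVolumeClaims(namespace).Patch(name, types.StrategicMergePatchType, patch)
	return err
}
```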
Bugs
- Current image in glbc.manifest points to alpha version (go-ahead given at Release Burndown)
Test Fails in Progress
- [failing test] should restart all nodes and ensure all nodes and pods recover
- [test failed] gci-gce-alpha-features
Both of these test fails are related to Daemonset scheduling, and should turn green soon now that PRs are merged. Hoping!
Green
Non-blocker issues (expected to remain broken for 1.10, generally need to add release note):
- Flaky timeouts while waiting for RC pods to be running in density test
- Controller-manager sees higher mem-usage when load test runs before density
- HostPath mounts failing with "Path is not a shared or slave mount"
- kubeadm: etcd certs missing in self-hosted deploy (will be fixed in point release)
Tracking Issues
Status as of 11am, PDT, March 20th.
Overall status is "tangerine" (trending red). While the majority of bugs are either closed, we still have outstanding performance issue(s) whose cause is unknown and may delay the release. We also have several other unrelated issues which need fixing.
Tracking sheet is here as always.
Red
Release Blockers without a resolution timeline of less than 24 hours.
Performance issues in analysis/progress. There are two, which may be related, causing unacceptable performance on GCE. The cause of these may be in some way related to fluentd, but that doesn't make them a non-blocker:
- [test flakes] master-scalability suites
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller (this was patched, but the patch did not improve test results)
GKE tests are no longer running due to some issue with GKE, but we need this resolved before we release:
Yellow
Blocker issues which are expected to resolve within 24 hours. Also undecided issues, with or without PRs, which may end up not being considered 1.10 bugs.
Daemonset Scheduling feature needs to be reverted, or more accurately neutralized by disabling the alpha gate, before release:
- DaemonSet scheduling conflates hostname and nodename
- [test failed] gci-gce-alpha-features
- Schedule DaemonSet Pods by default scheduler
PVC protection workaround for 1.10.0, with fix pending for 1.9. This shouldn't be a blocker anymore:
Green
Non-blocker issues (expected to remain broken for 1.10, generally need to add release note):
- Flaky timeouts while waiting for RC pods to be running in density test
- Controller-manager sees higher mem-usage when load test runs before density
- HostPath mounts failing with "Path is not a shared or slave mount"
- kubeadm: etcd certs missing in self-hosted deploy (will be fixed in point release)
- Mounting socket files from subPaths fail (will be fixed in point release)
Tracking Issues
Status as of 11am, PDT, March 21st. Happy Nowruz! "zardi ye man az to, sorkhi ye to az man" (roughly: "my sickly yellow to you, your healthy red to me") seems particularly appropriate here.
Overall status is "straw" (light yellow). At this point, everthing is resolved or resolving in the next few hours except for #60589, which suffers from having to make a potentially painful tradeoff.
Red
Release Blockers without a resolution timeline of less than 24 hours.
This issue has been traced to a commit which was also a bugfix. At this point, we need opinions from multiple SIG leads about what to do on a reversion:
Yellow
Blocker issues which are expected to resolve within 24 hours. Also undecided issues, with or without PRs, which may end up not being considered 1.10 bugs.
Issue with subpaths which would prevent someone from upgrading from specific versions of Kubernetes; it has a patch just waiting to be cherry-picked:
Green
The Fluentd scaler issue has been fixed sufficiently to no longer be a blocker for 1.10. There are still effects of it which will need fixing in future point releases:
- Fluentd-scaler causing fluentd pod deletions and messes with ds-controller
GKE tests are now running.
Daemonset scheduling and PVC protection issues have been resolved. Important release note for PVC protection regarding downgrades.
Non-blocker issues (expected to remain broken for 1.10, generally need to add release note):
- Flaky timeouts while waiting for RC pods to be running in density test
- Controller-manager sees higher mem-usage when load test runs before density
- HostPath mounts failing with "Path is not a shared or slave mount"
- kubeadm: etcd certs missing in self-hosted deploy (will be fixed in point release)
- Mounting socket files from subPaths fail