groundnuty / k8s-wait-for
A simple script that allows you to wait for a k8s service, job, or pods to enter a desired state
License: MIT License
We are running a k8s-wait-for init container in AWS EKS. The Kubernetes server version is:
"serverVersion": {
"major": "1",
"minor": "23+",
"gitVersion": "v1.23.17-eks-0a21954",
"gitCommit": "cd5c12c51b0899612375453f7a7c2e7b6563f5e9",
"gitTreeState": "clean",
"buildDate": "2023-04-15T00:32:27Z",
"goVersion": "go1.19.6",
"compiler": "gc",
"platform": "linux/amd64"
}
But still, the output of the describe job command is:
...
Completion Mode: NonIndexed
Start Time: Mon, 26 Jun 2023 10:55:00 +0000
Pods Statuses: 1 Active / 0 Succeeded / 0 Failed
Pod Template:
...
which means that the script will silently fail to wait for the job.
#63 will solve the immediate problem, but it may be a good idea to be more defensive and exit with an error when the first sed command yields an empty string, as sketched below.
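A minimal sketch of that guard (the variable names are illustrative, not the script's actual ones):
counts=$(kubectl describe job "$job_name" $KUBECTL_ARGS | sed -n 's/^Pods Statuses:[[:space:]]*//p')
if [ -z "$counts" ]; then
    echo "error: could not parse pod statuses for job $job_name" >&2
    exit 1
fi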
I stumbled across this issue when I was not passing a --namespace ... to the script args. My init pods were reporting success:
state:
terminated:
containerID: docker://754eff26e250a9cab94d6a5f49e118bb8975fe93807bc05cbae4bcec6104f722
exitCode: 0
finishedAt: "2019-11-14T20:39:09Z"
reason: Completed
startedAt: "2019-11-14T20:39:08Z"
and the logs from the init container contained:
parse error: Invalid numeric literal at line 1, column 6
service qa-postgres is ready.
In reality, it was querying a service that does not exist. The output from the kubectl command would have been:
$ kubectl get service "qa-postgres" -ojson
Error from server (NotFound): services "qa-postgres" not found
which, when piped to jq, results in:
$ kubectl get service "qa-postgres" -ojson 2>&1 | jq -cr
parse error: Invalid numeric literal at line 1, column 6
However, that failure (as seen in the pod log output) results in the sh script returning 0 (success). It should cause the init pod to fail.
The above is true whenever kubectl outputs non-JSON:
/usr/local/bin # kubectl get pods -n qa
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:qa:default" cannot list resource "pods" in API group "" in the namespace "qa"
/usr/local/bin # wait_for.sh service qa-postgress -n qa && echo "*** returns success"
parse error: Invalid numeric literal at line 1, column 6
[2019-11-14 23:12:49] service qa-postgress is ready.
*** returns success
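A minimal sketch of a guard for this (illustrative, not the script's current code): check kubectl's exit status before handing anything to jq, so a NotFound or Forbidden error fails the init pod instead of being parsed as JSON.
out=$(kubectl get service "$name" $KUBECTL_ARGS -ojson 2>&1)
if [ $? -ne 0 ]; then
    echo "error: kubectl failed: $out" >&2
    exit 1
fi
echo "$out" | jq -cr '.spec.selector'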
The script does not seem to find pods or services in other namespaces. Adding --all-namespaces to the kubectl commands in the script solves the issue. Due to this change, kubernetes/kubernetes#60210, the Completed pods will then need to be filtered manually.
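For reference, plain kubectl can do both in one call: --all-namespaces widens the search and a field selector excludes the Completed pods (a sketch, not the script's current code):
kubectl get pods --all-namespaces --field-selector=status.phase!=Succeeded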
When starting the pod I got the following error:
Error from server (Forbidden): jobs.batch is forbidden: User system:serviceaccount:blabla:default cannot get resource jobs in API group batch in the namespace blabla
What is the difference between the wait_for.sh shell script and the kubectl wait command?
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#wait
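For comparison, the built-in command for the job case looks like this (standard kubectl usage):
kubectl wait --for=condition=complete job/my-job --timeout=300s
One apparent difference, judging from this page, is that wait_for.sh can also wait on a service by resolving its selector to the backing pods.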
I've got a job that normally succeeds, and we don't see any problems with k8s-wait-for. However, I'm observing a case where the job sometimes fails and retries with a new pod. The job's status ends up like this:
status:
completionTime: "2022-03-03T01:25:56Z"
conditions:
- lastProbeTime: "2022-03-03T01:25:56Z"
lastTransitionTime: "2022-03-03T01:25:56Z"
status: "True"
type: Complete
failed: 1
startTime: "2022-03-03T01:25:14Z"
succeeded: 1
You can see it completed but has one failure. This is fine because the job only needs to succeed once. However, right now k8s-wait-for does not consider this OK and continues to spin forever.
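A possible direction (a sketch using standard kubectl and jq, not the tool's current logic): treat the job as done as soon as its Complete condition is True, regardless of the failed counter.
kubectl get job my-job -ojson \
  | jq -e '.status.conditions[]? | select(.type == "Complete" and .status == "True")' \
  > /dev/null && echo "job my-job is ready."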
Here are some sample logs when using k8s-wait-for:v1.5.1 to wait for a k8s job to complete:
Waiting for job my-job...
Waiting for job my-job...
Waiting for job my-job..
[2021-09-27 14:02:52] job my-job is ready.
It would be awesome if every intermediate log line leading up to the "is ready" line also included a timestamp, so someone watching logs in real time could understand how long the pod has been waiting.
Code xrefs
https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L216
https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L207
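A minimal sketch of such a helper, matching the timestamp format already used for the final line:
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
log "Waiting for job my-job..."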
Currently the image groundnuty/k8s-wait-for:no-root-v2.0 has several security vulnerabilities. Running the command docker scout cves groundnuty/k8s-wait-for:no-root-v2.0 lists all of these.
Here is the summary at the end:
67 vulnerabilities found in 8 packages
LOW 5
MEDIUM 30
HIGH 29
CRITICAL 3
The Trivy scan for this repo has been failing for some time too:
https://github.com/groundnuty/k8s-wait-for/actions/workflows/trivy.yml 💥
I have not looked into this in depth, but maybe the old Alpine base image is part of this?
FROM alpine:3.16.2
Hi,
How can I debug an initContainer with k8s-wait-for? I only see something like this:
Waiting for pod -lapp=xxxx...
Waiting for pod -lapp=xxxx...
Waiting for pod -lapp=xxxx...
Is there any debug switch?
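No debug switch is shown anywhere on this page, but plain shell tracing should work from a shell inside the container (the script's location is taken from the session earlier on this page):
/usr/local/bin # sh -x wait_for.sh pod -lapp=xxxx
This prints every kubectl and jq invocation as it runs, which shows which query keeps returning a not-ready state.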
Instead of multiple initContainers, can the script handle more than one job/service? E.g., it would stop waiting only when all jobs are done (even if some failed and some succeeded); see the workaround sketched after the snippet:
args: [
"job", "job-1",
"job", "job-2",
...
]
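Until something like that exists, a single init container can loop over the targets with a small wrapper (a sketch; the job names are illustrative):
for target in job-1 job-2; do
    wait_for.sh job "$target" || exit 1
done
Note that this waits for the targets sequentially rather than watching them all at once.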
The wait-for script cannot distinguish failed jobs when waiting unless there are 10 or more failed pods shown when describing the job.
$ kubectl get jobs,pods
NAME COMPLETIONS DURATION AGE
job.batch/pi 0/1 22h 22h
NAME READY STATUS RESTARTS AGE
pod/pi-j64gc 0/1 Error 0 22h
When using the script without the -we flag:
# wait_for.sh job pi
[2020-07-03 08:11:50] job pi is ready.
The error is caused by the regex at https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L172. It should be changed to:
sed_reg='-e s/^[1-9][[:digit:]]*:[[:digit:]]+:[[:digit:]]+$/1/p -e s/^0:[[:digit:]]+:[1-9][[:digit:]]*$/1/p'
The final part of the second regex, [1-9][[:digit:]]+, will only match two or more digits because of the + sign. The regex above changes the + to a * on that second part, allowing the extra digits to be absent and thus matching 0:0:1 as well as 0:0:10.
After the fix:
/ # wait_for.sh job pi
Waiting for job pi...
Waiting for job pi...
/ # wait_for.sh job-we pi
[2020-07-03 08:59:20] job pi is ready.
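The change is easy to verify in isolation (a quick check of the old and new patterns with sed's -E):
$ echo "0:0:1"  | sed -nE 's/^0:[[:digit:]]+:[1-9][[:digit:]]+$/1/p'   # old pattern: no match
$ echo "0:0:1"  | sed -nE 's/^0:[[:digit:]]+:[1-9][[:digit:]]*$/1/p'   # new pattern
1
$ echo "0:0:10" | sed -nE 's/^0:[[:digit:]]+:[1-9][[:digit:]]*$/1/p'   # new pattern
1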
Love the tool, use it extensively.
Would you consider a move to hosting your images on ghcr.io instead of at Docker Hub? Docker Hub has some quite onerous pull limits nowadays.
Hi. I'm new to this tool and to k8s in general, and I had some problems getting started.
The example in the README does not seem to work with later versions of kubectl:
$ kubectl run --generator=run-pod/v1 k8s-wait-for --rm -it --image groundnuty/k8s-wait-for:v1.3 --restart Never --command /bin/sh
Error: unknown flag: --generator
See 'kubectl run --help' for usage.
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
The release notes indicate that the --generator flag was deprecated and then removed in 1.21:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md
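For kubectl 1.21 and later, simply dropping the flag appears to be enough (an untested suggestion based on current kubectl run syntax):
$ kubectl run k8s-wait-for --rm -it --image groundnuty/k8s-wait-for:v1.3 --restart Never --command -- /bin/sh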
Sometimes pods enter the 'Evicted' state, but the operation of the cluster is not affected.
Could you consider adding this feature? Thank you!
I was expecting this app to wait until a job completed successfully, but it only waited for the job to be ready. Am I misunderstanding something?
This is a portion of my deployment resource and I have verified that my job runs to completion and exits with a status code of 0.
initContainers:
- name: data-migration-init
image: 'groundnuty/k8s-wait-for:v1.7'
args:
- job
- my-data-migration-job
Hi,
It would be very nice to add an option to wait until all pods in a specific namespace are running or have failed,
something like:
wait_for.sh namespace my-namespace
Tx,
Roee.
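As a partial workaround with plain kubectl (this waits for readiness only and does not account for failed pods):
kubectl wait --for=condition=Ready pods --all -n my-namespace --timeout=300s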
Line 17 in 6507200
Is it possible to add a version with a non-root user? This seems similar to #39; however, that issue provides no solution.
I am running k8s-wait-for as an init container in a deployment, waiting for a job to complete.
The container starts fine, but then exits with:
Error from server (Forbidden): jobs.batch "app-migration-a4c7915a7495153bcae396e7bd9e3d66c" is forbidden: User "system:serviceaccount:default:default" cannot get resource "jobs" in API group "batch" in the namespace "default"
It seems to me that the wait-for container is lacking the permissions required to access the API with kubectl. However, I was unable to find any information in the documentation on how to set up the proper credentials.
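A minimal RBAC setup for this case could look like the following (a sketch using standard kubectl; the role name is illustrative, and pods may need to be added to --resource if the script also inspects the job's pods):
kubectl create role k8s-wait-for -n default \
    --verb=get,list,watch --resource=jobs
kubectl create rolebinding k8s-wait-for -n default \
    --role=k8s-wait-for --serviceaccount=default:default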
I have a scenario in which I need to wait for at least one pod to be ready instead of all.
Hi @groundnuty,
Your wait-for service is amazing, but I have a feature request that maybe you can help with.
Issue:
I have a use case where sometimes (depending on the server resources) my job will create a pod that fails, but thanks to backoffLimit it will create a second pod that runs successfully. The thing is that waiting for a job waits for all pods to succeed, not just the last pod of the job.
Expected behavior:
Add the ability to wait for a job where only the last pod needs to succeed, rather than all of the pods.
How to reproduce:
backoffLimit: 1
I got wrecked by #60 because I was using an old image (v1.6). It took me a while to realize that I had just copied the example in the README, but the latest was at v2.0.
Maybe it'd be helpful to update the README, or set up something to update the README when a new package is released?
Or maybe even better, since there are specific k8s versions this package is compatible with, wait_for.sh could detect this and fail loudly?
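A sketch of such a check (the jq field names follow the kubectl version JSON shown at the top of this page; the threshold 23 is purely illustrative):
minor=$(kubectl version -o json | jq -r '.serverVersion.minor' | tr -d '+')
if [ "$minor" -lt 23 ]; then
    echo "error: unsupported Kubernetes server version" >&2
    exit 1
fi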
Currently one container can only wait for one pod or job. Could it support multiple, such as:
wait_for.sh pod -lapp=app -lapp=app2
I'm getting this error:
No resources found.
FalseError from server (BadRequest): Unable to find "/v1, Resource=pods" that match label selector "app=cloud-services,-ltier=postgres", field selector "": unable to parse requirement: invalid label key "-ltier": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')
This is on container image tag v1.2-5-g92c083e and the current script in master.
This appears to be caused by:
get_service_state_selectors=$(kubectl get service "$get_service_state_name" $KUBECTL_ARGS -ojson 2>&1 | jq -cr 'if . | has("items") then .items[]
else . end | [ .spec.selector | to_entries[] | "-l\(.key)=\(.value)" ] | join(",") ')
when the service has more than one selector:
"selector": {
"app": "cloud-services",
"tier": "postgres"
}
The to_entries[] | "-l\(.key)=\(.value)" ] | join(",") appears to be incorrect: it produces
-lapp=cloud-services,-ltier=postgres
when it should not have the additional -l. It should be
-lapp=cloud-services,tier=postgres
The code below that is:
for get_service_state_selector in $get_service_state_selectors ; do
get_service_state_selector=$(echo "$get_service_state_selector" | tr ',' ' ')
get_service_state_state=$(get_pod_state "$get_service_state_selectors")
get_service_state_states="${get_service_state_states}${get_service_state_state}" ;
done
echo "$get_service_state_states"
which looks like it would then replace that comma with a space to make
-lapp=cloud-services -ltier=postgres
except for two issues: 1) get_service_state_selector is never used, only assigned to; 2) I don't think the kubectl command will allow multiple -l arguments anyway.
I think this could be fixed by doing:
get_service_state() {
    get_service_state_name="$1"
    get_service_state_selectors=$(kubectl get service "$get_service_state_name" $KUBECTL_ARGS -ojson 2>&1 | jq -cr 'if . | has("items") then .items[]
    else . end | [ .spec.selector | to_entries[] | "\(.key)=\(.value)" ] | join(",") ')
    get_service_state_states=""
    # one selector string per service item; use the loop variable, not the full list
    for get_service_state_selector in $get_service_state_selectors ; do
        get_service_state_state=$(get_pod_state "-l$get_service_state_selector")
        get_service_state_states="${get_service_state_states}${get_service_state_state}"
    done
    echo "$get_service_state_states"
}
But I'm not sure of all the cases that this has to handle.
If the job first fails and is retried by Kubernetes, k8s-wait-for is not able to recover from it. It waits for the job forever, even after the retry has succeeded.
./wait_for.sh service mongodb-replicaset
expr: syntax error
error: name cannot be provided when a selector is specified
We have hit a problem.
We use k8s-wait-for in initContainers in the workload to wait for database migrations to complete, and we have HPA running.
If ttlSecondsAfterFinished on the Job expires and the Job (along with its pods) is removed, then the new workload pods fail the job check.
What options do we have to solve the problem?
Docker Hub throttles traffic, and for a critical init container it'd be nice if there were a more stable alternative. I'm working around the issue in other ways (re-hosting the image), but just throwing it out there as a suggested improvement.
Line 13 in 1c9c1a6
Hi, thanks for this helpful code! 🚀
We are very interested in using this utility. Still, the current version uses a base image that contains some vulnerabilities classified as critical, which makes it impossible to use in some projects due to security policies.
I would like to know if it is possible to update the Alpine base image version from 3.12.1 to 3.12.19 or the latest Alpine version.
Thanks,
I was wondering why my wait-for was never finishing, and while debugging in the container I saw that my API server was returning a warning message on top of the pod info:
I0624 10:31:01.428761 375 request.go:668] Waited for 1.181153768s due to client-side throttling, not priority and fairness, request: GET:https://<server>/apis/rbac.authorization.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE
<pod> 1/1 Running 0 173m
<pod> 1/1 Running 0 173m
<pod> 1/1 Running 0 173m
The script seems to be parsing the first line and ends up not detecting the pod status. I might have some time to prepare a quick fix, but for now I will leave an issue report here.
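One possible quick fix (a sketch, assuming the throttling notice arrives with klog's leading-letter prefix, as in the output above): drop such lines before the status parsing.
kubectl get pods 2>&1 | grep -vE '^[IWEF][0-9]{4} '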
The container fails on Error from server (Forbidden): jobs.batch "init-data-job" is forbidden: User "system:serviceaccount:default:default" cannot get jobs.batch in the namespace "default": Unknown user "system:serviceaccount:default:default"
With kubectl it can be done with something like this: kubectl wait --for=delete pod --selector=<label>=<value>
I'm experiencing the following error when using the latest release v1.5:
/usr/local/bin/wait_for.sh: line 183: syntax error: unexpected "else" (expecting "}")
I think it is related to an unexpected fi statement.
Line 182 in 30e860b
❯ docker pull ghcr.io/groundnuty/k8s-wait-for:v2.0
Error response from daemon: Head "https://ghcr.io/v2/groundnuty/k8s-wait-for/manifests/v2.0": denied: denied
Were the permissions for this changed recently?