Comments (22)

ravisantoshgudimetla avatar ravisantoshgudimetla commented on July 19, 2024

@wjiangjay We didn't put up a table of supported descheduler/Kubernetes versions on GitHub, but as of now the descheduler works with 1.7 and 1.8.

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@wjiangjay thanks for reporting this; it would be good to move to using metadata.ownerReferences going forward. But as @ravisantoshgudimetla said, the descheduler is mainly tested with kube 1.7/1.8, and the hope is that it should work with later versions too. Generally these issues are handled during the rebase to higher kube versions, but having a compatibility table between versions would, I think, be helpful.
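For illustration, here is a minimal sketch (not the descheduler's actual code; the ownerKey helper is made up) of deriving a pod's owning controller from metadata.ownerReferences rather than the removed created-by annotation:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownerKey returns "Kind/Name" of the controller that owns the pod,
// or "" for a bare pod with no controller owner reference.
func ownerKey(pod *v1.Pod) string {
	if ref := metav1.GetControllerOf(pod); ref != nil {
		return ref.Kind + "/" + ref.Name
	}
	return ""
}

func main() {
	ctrl := true
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "nginx-5ddcbff849-abcde",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "apps/v1",
				Kind:       "ReplicaSet",
				Name:       "nginx-5ddcbff849",
				Controller: &ctrl,
			}},
		},
	}
	fmt.Println(ownerKey(pod)) // prints: ReplicaSet/nginx-5ddcbff849
}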

wjiangjay avatar wjiangjay commented on July 19, 2024

Yeah, it would be better to have a compatibility table that declares this kind of thing, so users can focus on supported features and avoid hitting this issue like I did.

I tried with the latest k8s (1.9.2) and found this strategy not working.
First I went through the README to make sure I did the right steps.
Then I increased the descheduler log level to get more output and pinpoint the issue.
I saw that no pod eviction requests were being sent to k8s, which left two possibilities: either the strategy was not working (low probability), or the pods were not in the target pod list.
I went through the code and found that it filters pods by the created-by annotation, which turned out to be the root cause.
Comparing the result of kubectl run nginx --replicas 10 --image=nginx between k8s 1.8 and 1.9, I found that k8s 1.9 lacks the annotation.
After checking the k8s 1.9 release notes, I drafted this issue.
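For anyone who wants to verify the same difference, roughly (the pod name is a placeholder):

# Present on 1.8, gone on 1.9:
kubectl get pod <pod-name> -o yaml | grep created-by

# The owning controller is still recorded in ownerReferences on both versions:
kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'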

ravisantoshgudimetla avatar ravisantoshgudimetla commented on July 19, 2024

@wjiangjay I have raised #71 for the compatibility matrix. I am working to make sure that descheduler works with kube 1.9.

ravisantoshgudimetla avatar ravisantoshgudimetla commented on July 19, 2024

#72 should fix this.

cesartl avatar cesartl commented on July 19, 2024

I use k8s 1.9.3 and RemoveDuplicates is not working. Not sure whether the version is the reason, though.

The pods that should be removed come from a Deployment.
I can see the strategy running in the logs:

I0215 17:23:17.316534       1 duplicates.go:49] Processing node: "a.eu-west-1.compute.internal"
I0215 17:23:17.391900       1 duplicates.go:49] Processing node: "b.eu-west-1.compute.internal"
I0215 17:23:17.403305       1 duplicates.go:49] Processing node: "c.eu-west-1.compute.internal"
I0215 17:23:17.417449       1 duplicates.go:49] Processing node: "d.eu-west-1.compute.internal"

So here it doesn't do much.

After this has run I can still find at least one duplicate, for example:

kubectl get pods -n test -l app=revenue-modeler-data-store --output=jsonpath='{range .items[*]}{.metadata.ownerReferences[0].name}{" "}{.metadata.ownerReferences[0].kind}{" "}{.spec.nodeName}{"\n"}{end}'
revenue-modeler-data-store-v2-0-5ddcbff849 ReplicaSet a.eu-west-1.compute.internal
revenue-modeler-data-store-v2-0-5ddcbff849 ReplicaSet b.eu-west-1.compute.internal
revenue-modeler-data-store-v2-0-5ddcbff849 ReplicaSet a.eu-west-1.compute.internal

This issue is marked as fixed. Could it be a regression, or is there something else I've missed?

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@cesartl thanks for reporting this. I am reopening it.

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@cesartl though the labels are the same, do the pods belong to the same deployment, or to different deployments that use the same labels?

cesartl avatar cesartl commented on July 19, 2024

They belong to the same deployment.

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@cesartl I will try to reproduce it locally and get back to you as soon as I get a chance (it might not be too fast).

cesartl avatar cesartl commented on July 19, 2024

No worries, it's not urgent :)
Let me know if you need anything else from me.

cesartl avatar cesartl commented on July 19, 2024

It's strange; it seems to work only partially. I've just seen this in the logs:

I0216 09:25:10.659899       1 duplicates.go:49] Processing node: "a.eu-west-1.compute.internal"
I0216 09:25:10.719866       1 duplicates.go:49] Processing node: "b.eu-west-1.compute.internal"
I0216 09:25:10.731252       1 duplicates.go:53] "ReplicaSet/rm-web-frontend-nodejs-6c65dffd5d"
I0216 09:25:10.744129       1 duplicates.go:61] Evicted pod: "rm-web-frontend-nodejs-6c65dffd5d-sgnl9" (<nil>)
I0216 09:25:10.744148       1 duplicates.go:49] Processing node: "c.eu-west-1.compute.internal"
I0216 09:25:10.759840       1 duplicates.go:49] Processing node: "d.eu-west-1.compute.internal"
I0216 09:25:10.809597       1 duplicates.go:53] "ReplicaSet/rabbitmq-679ddd6659"
I0216 09:25:10.828210       1 duplicates.go:61] Evicted pod: "rabbitmq-679ddd6659-9wd6m" (<nil>)

But there are still duplicates after this has run. They all use the same deployment method.

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@cesartl can you enable higher log verbosity (--v=4) and check whether it shows anything? Also, have you set any PDB (PodDisruptionBudget) or any other sort of minimum limit on the number of pods that might be preventing a pod from being evicted?
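For example, a quick way to check whether any PDBs apply (names are placeholders):

kubectl get pdb --all-namespaces
kubectl describe pdb <pdb-name> -n <namespace>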

cesartl avatar cesartl commented on July 19, 2024

I'm already on level 5.

I don't think I'm doing any of that. Here is the deployment YAML:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: revenue-modeler-data-store
    version: _IMAGE_VERSION_
    apiVersion: "_API_VERSION_"
  name: revenue-modeler-data-store-v_K8S_SUFFIX_NAME_
  namespace: _ENVIRONMENT_
spec:
  replicas: 3
  selector:
    matchLabels:
      app: revenue-modeler-data-store
      apiVersion: "_API_VERSION_"
  strategy:
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: revenue-modeler-data-store
        version: _IMAGE_VERSION_
        apiVersion: "_API_VERSION_"
        hystrix.enabled: "true"

    spec:
      containers:
      - image: ***
        imagePullPolicy: Always
        name: revenue-modeler-data-store
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 210
          timeoutSeconds: 10
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 210
          timeoutSeconds: 10
          periodSeconds: 60
          successThreshold: 1
          failureThreshold: 3
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 700m
            memory: 1100Mi
          limits:
            cpu: 2000m
            memory: 1100Mi
        terminationMessagePath: /dev/termination-log
        env:
          - name: JAVA_OPTS
            value: ***
          - name: MONGO_URI
            valueFrom:
              secretKeyRef:
                name: mongo-revenue-modeller-secret
                key: uri
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 10

ravisantoshgudimetla avatar ravisantoshgudimetla commented on July 19, 2024

@cesartl - How many replicas do you have on each node? The RemoveDuplicates strategy ensures that only one copy runs on each node. So, on b.eu-west-1.compute.internal, if you have 2 pods from the rm-web-frontend-nodejs-6c65dffd5d ReplicaSet, one of them will be evicted. The same goes for the other pods.

But there are still duplicates after this has run. They all use the same deployment method.

Do you mean there is more than one replica of rm-web-frontend-nodejs-6c65dffd5d on each node?
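For reference, a rough sketch of that per-node duplicate detection idea (illustrative only, not the actual duplicates.go code; the duplicatesOnNode helper is made up):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// duplicatesOnNode takes the pods of a single node and returns every pod that
// shares an owning controller ("Kind/Name") with an earlier pod on that node;
// those extras are the eviction candidates.
func duplicatesOnNode(pods []*v1.Pod) []*v1.Pod {
	seen := map[string]bool{}
	var extras []*v1.Pod
	for _, p := range pods {
		ref := metav1.GetControllerOf(p)
		if ref == nil {
			continue // bare pods with no controller are ignored
		}
		key := ref.Kind + "/" + ref.Name
		if seen[key] {
			extras = append(extras, p)
		}
		seen[key] = true
	}
	return extras
}

func main() {
	fmt.Println(len(duplicatesOnNode(nil))) // 0 duplicates for an empty node
}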

cesartl avatar cesartl commented on July 19, 2024

In total I have 3 replicas of each (except the frontend, which has 2).

No, I mean I had other deployments that were duplicated on some nodes. For some reason they are not being picked up by the strategy.

So it seems to be working for some of them but not all, and I can't see why. Here is a full export of the test namespace:

test edge-service-58f4ffdf8 ReplicaSet a.eu-west-1.compute.internal
test edge-service-58f4ffdf8 ReplicaSet a.eu-west-1.compute.internal
test edge-service-58f4ffdf8 ReplicaSet b.eu-west-1.compute.internal
test entitlement-service-fb7f88b99 ReplicaSet c.eu-west-1.compute.internal
test entitlement-service-fb7f88b99 ReplicaSet b.eu-west-1.compute.internal
test entitlement-service-fb7f88b99 ReplicaSet a.eu-west-1.compute.internal
test hazelcast-867f49794f ReplicaSet d.eu-west-1.compute.internal
test hazelcast-867f49794f ReplicaSet c.eu-west-1.compute.internal
test hazelcast-867f49794f ReplicaSet b.eu-west-1.compute.internal
test redis-sentinel StatefulSet b.eu-west-1.compute.internal
test redis-sentinel StatefulSet a.eu-west-1.compute.internal
test redis-sentinel StatefulSet c.eu-west-1.compute.internal
test redis-server StatefulSet b.eu-west-1.compute.internal
test redis-server StatefulSet c.eu-west-1.compute.internal
test redis-server StatefulSet a.eu-west-1.compute.internal
test revenue-modeler-data-store-v2-0-64885f9d67 ReplicaSet b.eu-west-1.compute.internal
test revenue-modeler-data-store-v2-0-64885f9d67 ReplicaSet c.eu-west-1.compute.internal
test revenue-modeler-data-store-v2-0-64885f9d67 ReplicaSet d.eu-west-1.compute.internal
test rm-web-frontend-nodejs-7d45b8d998 ReplicaSet a.eu-west-1.compute.internal
test rm-web-frontend-nodejs-7d45b8d998 ReplicaSet c.eu-west-1.compute.internal

Here they are mostly all good, except edge-service. But I think revenue-modeler-data-store being fine is just by chance, as I have had situations where it was duplicated after the descheduler had run.

If I do the same export in a different namespace I get duplicates with redis-server and entitlement-service. Any idea, from the code, what could make it work for some deployments but not for others? Could it be a bug in 1.9.3?

aveshagarwal avatar aveshagarwal commented on July 19, 2024

@cesartl did you check whether a pod got evicted but then, when it was recreated (a new instance of the killed pod), landed on the same node again?

cesartl avatar cesartl commented on July 19, 2024

No, I don't think so.

I've got 2 pods of the same type on one node whose ages are more than a few hours, and the descheduler job runs every hour.
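For context, an hourly descheduler run looks roughly like the following CronJob (a sketch, not my exact manifest; the image, binary path, ConfigMap, and service account names are placeholders, while --policy-config-file and --v are real descheduler flags):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler
  namespace: kube-system
spec:
  schedule: "0 * * * *"            # once an hour
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler        # placeholder; needs RBAC to list and evict pods
          restartPolicy: Never
          containers:
          - name: descheduler
            image: descheduler:latest            # placeholder image
            command:
            - /bin/descheduler
            - --policy-config-file=/policy-dir/policy.yaml
            - --v=5
            volumeMounts:
            - name: policy-volume
              mountPath: /policy-dir
          volumes:
          - name: policy-volume
            configMap:
              name: descheduler-policy           # placeholder ConfigMap holding the policy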

fejta-bot avatar fejta-bot commented on July 19, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot avatar fejta-bot commented on July 19, 2024

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot avatar fejta-bot commented on July 19, 2024

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot avatar k8s-ci-robot commented on July 19, 2024

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
