Comments (7)

AndrewFarley avatar AndrewFarley commented on June 6, 2024

@rkdutta Can you grab logs from the pod to show me? Or from your log aggregation system if your pod has since died/been restarted? Also, are you using the latest version, or what version are you running?

from kubernetes-volume-autoscaler.

rkdutta avatar rkdutta commented on June 6, 2024

@AndrewFarley Thanks for responding. I hope the following information helps. If you need more details, please let me know.

Restart: The pod has been running for more than 3 days and has never been restarted.
Version: volume-autoscaler-1.0.6 (using helm) - just default installation with the configurations mentioned in the ticket.
image: devopsnirvana/kubernetes-volume-autoscaler:1.0.6
repo:

devops-nirvana      	https://devops-nirvana.s3.amazonaws.com/helm-charts/
➜  ~ helm search repo devops-nirvana
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION
devops-nirvana/argo-cronjob     	1.0.32       	           	The Universal Argo Cronjob/CronWorkflow Helm Chart
devops-nirvana/cronjob          	1.0.32       	           	The Universal Cronjob Helm Chart
devops-nirvana/cronjob-multi    	1.0.32       	           	The Universal Cronjob Multi Helm Chart, to spin...
devops-nirvana/deployment       	1.0.32       	           	The Universal Deployment Helm Chart
devops-nirvana/statefulset      	1.0.32       	           	The Universal Statefulset Helm Chart
devops-nirvana/volume-autoscaler	1.0.6        	1.0.6      	Volume Autoscaler scales Kubernetes volumes up ...

Logs related to above alerts:

Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 1 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 2 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 3 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 4 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 5 period(s) which needs to at least 5 period(s) to scale
  AND we need to scale it immediately, it has never been scaled previously
  RESIZING disk from 3G to 4G
  Desired New Size: 4000000000
  Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `300 seconds`
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 6 period(s) which needs to at least 5 period(s) to scale
  AND we need to scale it immediately, it has never been scaled previously
  RESIZING disk from 3G to 4G
  Desired New Size: 4000000000
  Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `360 seconds`
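
For readers following the log, the decision it reports can be sketched roughly as below. This is only an illustration reconstructed from the log lines above, not the project's actual code: the function names are hypothetical, and rounding up to a whole gigabyte is an assumption chosen because it matches the 3G to 4G step shown in the log.

```python
GB = 10**9  # the log reports sizes in decimal gigabytes (4G == 4000000000)


def should_scale(periods_in_alert, required_periods=5):
    """Return True once the volume has been in alert for enough periods.

    The default of 5 matches "needs to at least 5 period(s) to scale".
    """
    return periods_in_alert >= required_periods


def next_size_bytes(current_bytes, scale_up_percent=20):
    """Grow the size by a percentage, then round up to a whole gigabyte.

    The round-up step is an assumption; it reproduces 3G -> 4G for 20%.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    grown = current_bytes * (100 + scale_up_percent) // 100
    return -(-grown // GB) * GB  # ceiling division to the next whole GB


if __name__ == "__main__":
    print(should_scale(4), should_scale(5))
    print(next_size_bytes(3 * GB))
```

Note that this rule on its own would request the same resize again on every later period while the reported size stays at 3G, which is the repetition visible in the last two log entries.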

rkdutta avatar rkdutta commented on June 6, 2024

Can anyone help or advise?

AndrewFarley avatar AndrewFarley commented on June 6, 2024

@rkdutta I've reviewed some of the code and nothing stands out as a change that I can make. I will try to improve some of the logging in a release I'm making for this service later today or tomorrow. If you can, please try the new version and let me know if the issue still persists. Thanks. I'll let you know when I release it...

AndrewFarley avatar AndrewFarley commented on June 6, 2024

I think I'm going to add some debounce logic to internally prevent it from trying to modify the same volume repeatedly in quick succession. It seems like maybe your volume didn't update properly in Kubernetes somehow, even though it didn't report any error. Can you tell me which Kubernetes provider you're on (cloud or self-hosted), and which storage controller you're using? @rkdutta

AndrewFarley avatar AndrewFarley commented on June 6, 2024

I've added a debounce in 51d1848 and will be releasing this shortly and closing this bug. After I release the new version, please try it and report back if your issue still persists. It shouldn't happen any more if the cause was what I suspect, which is just your storage controller taking a while to fully update Kubernetes.

AndrewFarley avatar AndrewFarley commented on June 6, 2024

@rkdutta There's an improvement in 1.0.7, which was just released and has been published to the Helm Chart repository. Please update your deployment and let me know if this happens again. There is now debounce logic that prevents the engine from retrying the same volume resize for at least 10 intervals. I believe that may help your situation, and it generally shouldn't be harmful for anyone else.

Closing issue as resolved. Please re-open or open a new one if there are any issues with this or if you have further information. Thanks!
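
To illustrate the kind of debounce described here, a minimal sketch follows. It is not the code from 51d1848; the class and method names are hypothetical, and counting time as an integer interval index is an assumption made to keep the example self-contained. Only the 10-interval cooldown comes from the comment above.

```python
class ResizeDebouncer:
    """Remembers when each volume was last resized and blocks quick retries."""

    def __init__(self, cooldown_intervals=10):
        # Number of intervals that must pass before the same volume
        # may be resized again (10 per the 1.0.7 release note above).
        self.cooldown_intervals = cooldown_intervals
        # Maps volume name -> interval index of the last resize request.
        self._last_resize = {}

    def may_resize(self, volume, current_interval):
        """Return True if the volume was never resized or the cooldown has passed."""
        last = self._last_resize.get(volume)
        return last is None or current_interval - last >= self.cooldown_intervals

    def record_resize(self, volume, current_interval):
        """Note that a resize was requested for this volume at this interval."""
        self._last_resize[volume] = current_interval


if __name__ == "__main__":
    debouncer = ResizeDebouncer()
    for interval in range(5, 17):
        if debouncer.may_resize("test-claim1", interval):
            debouncer.record_resize("test-claim1", interval)
            print(f"interval {interval}: resize requested")
        else:
            print(f"interval {interval}: skipped, still in cooldown")
```

With a guard like this, the second resize request seen at period 6 in the log above would be skipped, giving the storage controller time to report the new size back to Kubernetes.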
