All volume expansion alerts arriving at same timestamp in slack. WHY? (kubernetes-volume-autoscaler, closed issue)

devops-nirvana commented on June 6, 2024

All volume expansion alerts arriving at same timestamp in slack. WHY?

from kubernetes-volume-autoscaler.

Comments (7)

AndrewFarley commented on June 6, 2024

@rkdutta Can you grab logs from the pod to show me? Or from your log aggregation system if your pod has since died/been restarted? Also, are you using the latest version, or what version are you running?

rkdutta commented on June 6, 2024

@AndrewFarley Thanks for responding. Hope the following information helps. If you need more input, please let me know.

Restart: The pod has been running for more than 3 days and has never restarted.
Version: volume-autoscaler-1.0.6 (installed with Helm), a default installation with the configuration mentioned in the ticket.
image: devopsnirvana/kubernetes-volume-autoscaler:1.0.6
repo:

devops-nirvana      	https://devops-nirvana.s3.amazonaws.com/helm-charts/

➜  ~ helm search repo devops-nirvana
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION
devops-nirvana/argo-cronjob     	1.0.32       	           	The Universal Argo Cronjob/CronWorkflow Helm Chart
devops-nirvana/cronjob          	1.0.32       	           	The Universal Cronjob Helm Chart
devops-nirvana/cronjob-multi    	1.0.32       	           	The Universal Cronjob Multi Helm Chart, to spin...
devops-nirvana/deployment       	1.0.32       	           	The Universal Deployment Helm Chart
devops-nirvana/statefulset      	1.0.32       	           	The Universal Statefulset Helm Chart
devops-nirvana/volume-autoscaler	1.0.6        	1.0.6      	Volume Autoscaler scales Kubernetes volumes up ...

Logs related to above alerts:

Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 1 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 2 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 3 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 4 period(s) which needs to at least 5 period(s) to scale
  BUT need to wait for 5 intervals in alert before considering to scale
  FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 5 period(s) which needs to at least 5 period(s) to scale
  AND we need to scale it immediately, it has never been scaled previously
  RESIZING disk from 3G to 4G
  Desired New Size: 4000000000
  Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `300 seconds`
Volume test-claim1 is 100% in-use of the 3G available
  BECAUSE it is above 80% used
  ALERT has been for 6 period(s) which needs to at least 5 period(s) to scale
  AND we need to scale it immediately, it has never been scaled previously
  RESIZING disk from 3G to 4G
  Desired New Size: 4000000000
  Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `360 seconds`
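
For reference, the arithmetic these log lines trace can be sketched roughly as below. This is only an illustration, not the autoscaler's actual code: the function names `should_scale` and `next_size_bytes` are hypothetical, and the round-up-to-a-whole-GB rule is an assumption inferred from `Desired New Size: 4000000000` (3G plus 20% is 3.6G, which the log reports as 4G).

```python
GB = 1_000_000_000  # decimal gigabyte, matching "Desired New Size: 4000000000"


def should_scale(periods_in_alert: int, required_periods: int = 5) -> bool:
    """Hypothetical check: scale only after enough consecutive alert periods."""
    return periods_in_alert >= required_periods


def next_size_bytes(current_bytes: int, scale_up_percent: int = 20) -> int:
    """Grow by scale_up_percent, then round up to a whole GB (assumed rule)."""
    # Integer math only, so there are no float rounding surprises.
    grown_times_100 = current_bytes * (100 + scale_up_percent)
    # Negated floor division gives a ceiling division.
    whole_gb = -(-grown_times_100 // (100 * GB))
    return whole_gb * GB


# Periods 5 and 6 both pass the check, and as long as the PVC still reports
# 3G, both compute the same 4G target. That matches the two identical
# "RESIZING disk from 3G to 4G" entries above.
for period in (4, 5, 6):
    print(period, should_scale(period), next_size_bytes(3 * GB))
```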

rkdutta commented on June 6, 2024

Can anyone help or advise?

AndrewFarley commented on June 6, 2024

@rkdutta I've reviewed some of the code and nothing stands out as a change that I can make. I will try to improve some of the logging in a release I'm making for this service later today or tomorrow. Could you try the new version once it's out and let me know if the issue still persists? Thanks. I'll let you know when I release it...

AndrewFarley commented on June 6, 2024

I think I'm going to add some debounce logic to prevent it internally from trying to modify a volume more than once in quick succession. It seems like maybe your volume didn't update properly in Kubernetes somehow, without telling you about it. Can you tell me which Kubernetes provider you're on (cloud or self-hosted), and what storage controller you're using? @rkdutta

AndrewFarley commented on June 6, 2024

I've added a debounce in 51d1848 and will be releasing this shortly and closing this bug. After I release the new version, please try it and report back if your issue still persists. It shouldn't happen any more if the cause is what I suspect, which is just your storage controller taking a while to fully update Kubernetes.

AndrewFarley commented on June 6, 2024

@rkdutta There's an improvement in 1.0.7, which was just released and published to the Helm chart repository. Please update your deployment and let me know if this happens again. There is now debounce logic that prevents the engine from retrying the same volume resize for at least 10 intervals. I believe that may help your situation, and it should generally not be harmful for anyone else.
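
To illustrate the idea, a debounce like the one described above could look roughly like this. It is a minimal sketch, not the code from 51d1848: the class `ResizeDebouncer` and its methods are hypothetical names, and it assumes the main loop keeps a simple counter of intervals.

```python
from typing import Dict

DEBOUNCE_INTERVALS = 10  # "at least 10 intervals", per the comment above


class ResizeDebouncer:
    """Hypothetical helper: remembers the interval of each volume's last resize."""

    def __init__(self, debounce_intervals: int = DEBOUNCE_INTERVALS) -> None:
        self.debounce_intervals = debounce_intervals
        self._last_resize: Dict[str, int] = {}

    def may_resize(self, volume: str, interval: int) -> bool:
        """Allow a resize if none is recorded or enough intervals have passed."""
        last = self._last_resize.get(volume)
        return last is None or interval - last >= self.debounce_intervals

    def record_resize(self, volume: str, interval: int) -> None:
        """Note that a resize was requested for this volume at this interval."""
        self._last_resize[volume] = interval


# Simulate a volume that stays in alert from interval 5 through 16.
debouncer = ResizeDebouncer()
requests = []
for interval in range(5, 17):
    if debouncer.may_resize("test-claim1", interval):
        requests.append(interval)
        debouncer.record_resize("test-claim1", interval)

print(requests)  # [5, 15]
```

With this in place, the second resize request seen at period 6 in the logs would be skipped, and a retry would only happen if the volume were still in alert 10 intervals later.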

Closing issue as resolved. Please re-open or open a new one if there are any issues with this or you have further information. Thanks!
