Comments (7)
@rkdutta Can you grab logs from the pod to show me? Or from your log aggregation system if your pod has since died/been restarted? Also, are you using the latest version, or what version are you running?
from kubernetes-volume-autoscaler.
@AndrewFarley Thanks for responding. I hope the following information helps. If you need more input, please let me know.
Restart: The pod has been running for more than 3 days and has never been restarted.
Version: volume-autoscaler-1.0.6 (using helm) - just default installation with the configurations mentioned in the ticket.
image: devopsnirvana/kubernetes-volume-autoscaler:1.0.6
repo:
devops-nirvana https://devops-nirvana.s3.amazonaws.com/helm-charts/
➜ ~ helm search repo devops-nirvana
NAME                              CHART VERSION  APP VERSION  DESCRIPTION
devops-nirvana/argo-cronjob       1.0.32                      The Universal Argo Cronjob/CronWorkflow Helm Chart
devops-nirvana/cronjob            1.0.32                      The Universal Cronjob Helm Chart
devops-nirvana/cronjob-multi      1.0.32                      The Universal Cronjob Multi Helm Chart, to spin...
devops-nirvana/deployment         1.0.32                      The Universal Deployment Helm Chart
devops-nirvana/statefulset        1.0.32                      The Universal Statefulset Helm Chart
devops-nirvana/volume-autoscaler  1.0.6          1.0.6        Volume Autoscaler scales Kubernetes volumes up ...
Logs related to above alerts:
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 1 period(s) which needs to at least 5 period(s) to scale
BUT need to wait for 5 intervals in alert before considering to scale
FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 2 period(s) which needs to at least 5 period(s) to scale
BUT need to wait for 5 intervals in alert before considering to scale
FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 3 period(s) which needs to at least 5 period(s) to scale
BUT need to wait for 5 intervals in alert before considering to scale
FYI this has desired_size 3G and current size 3G
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 4 period(s) which needs to at least 5 period(s) to scale
BUT need to wait for 5 intervals in alert before considering to scale
FYI this has desired_size 3G and current size 3G
Querying and found 16 valid PVCs to assess in prometheus
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 5 period(s) which needs to at least 5 period(s) to scale
AND we need to scale it immediately, it has never been scaled previously
RESIZING disk from 3G to 4G
Desired New Size: 4000000000
Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `300 seconds`
Volume test-claim1 is 100% in-use of the 3G available
BECAUSE it is above 80% used
ALERT has been for 6 period(s) which needs to at least 5 period(s) to scale
AND we need to scale it immediately, it has never been scaled previously
RESIZING disk from 3G to 4G
Desired New Size: 4000000000
Actual New Size: 4000000000
Successfully requested to scale up `test-claim1` by `20%` from `3G` to `4G`, it was using more than `80%` disk space over the last `360 seconds`
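To make the pattern in these logs easier to follow, here is a simplified model of the decision they trace. It is only a sketch and not the autoscaler's actual code: the function name `assess` and both constants are hypothetical, with values taken from the log lines ("above 80% used", "at least 5 period(s) to scale"). It assumes the reported size and usage never change after the first resize request, which is what the logs show.

```python
SCALE_ABOVE_PERCENT = 80   # assumption: taken from "above 80% used" in the logs
SCALE_AFTER_INTERVALS = 5  # assumption: taken from "at least 5 period(s) to scale"


def assess(used_percent_by_interval):
    """Return the interval numbers at which a resize would be requested.

    Hypothetical helper: counts consecutive intervals above the threshold
    and requests a resize once the count reaches SCALE_AFTER_INTERVALS.
    """
    in_alert_for = 0
    resize_requested_at = []
    for interval, used in enumerate(used_percent_by_interval, start=1):
        if used > SCALE_ABOVE_PERCENT:
            in_alert_for += 1
        else:
            in_alert_for = 0  # usage dropped, so the alert streak resets
        if in_alert_for >= SCALE_AFTER_INTERVALS:
            resize_requested_at.append(interval)
    return resize_requested_at


# The volume stays at 100% for six intervals, as in the logs above.
requests = assess([100] * 6)
print(requests)  # [5, 6]
```

With nothing to reset the streak after interval 5, the same 3G to 4G resize is requested again at interval 6, which matches the repeated "RESIZING disk" lines.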
Can anyone help or advise?
@rkdutta I've reviewed some of the code and nothing stands out as a change that I can make. I will try to improve some of the logging in a release I'm making for this service later today or tomorrow. Could you try the new version once it's out and let me know if the issue still persists? Thanks. I'll let you know when I release it...
I think I'm going to add some debounce logic to prevent it from trying to modify the same volume again too soon after a resize. It seems like your volume may not have updated properly in Kubernetes, even though nothing reported a failure. Can you tell me which Kubernetes provider you're on (cloud or self-hosted) and which storage controller you're using? @rkdutta
I've added a debounce in 51d1848 and will be releasing it shortly and closing this bug. After I release the new version, please try it and report back if your issue still persists. It shouldn't happen any more if the cause is what I suspect: your storage controller taking a while to fully update Kubernetes.
@rkdutta There's an improvement in 1.0.7, which was just released and has been published to the Helm Chart repository. Please update your deployment and let me know if this happens again. There is now debounce logic that prevents the engine from retrying the same volume resize for at least 10 intervals. I believe that may help in your situation, and it generally shouldn't be harmful for anyone else.
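For anyone curious what a debounce like this looks like, here is a minimal sketch. It is not the code from the release: the class `ResizeDebouncer` and its method names are hypothetical, and the only value taken from this thread is the 10-interval cooldown.

```python
from dataclasses import dataclass, field


@dataclass
class ResizeDebouncer:
    """Hypothetical helper: remembers when each volume was last resized."""

    cooldown_intervals: int = 10  # assumption: "at least 10 intervals" from this thread
    _last_resize: dict = field(default_factory=dict)

    def can_resize(self, volume: str, current_interval: int) -> bool:
        # A volume that was never resized is always eligible.
        last = self._last_resize.get(volume)
        return last is None or current_interval - last >= self.cooldown_intervals

    def record_resize(self, volume: str, current_interval: int) -> None:
        self._last_resize[volume] = current_interval


debouncer = ResizeDebouncer()
attempts = []
# Suppose the volume is eligible to scale at every interval from 5 to 16.
for interval in range(5, 17):
    if debouncer.can_resize("test-claim1", interval):
        debouncer.record_resize("test-claim1", interval)
        attempts.append(interval)

print(attempts)  # [5, 15]
```

In this sketch the resize at interval 5 goes through, the repeats at intervals 6 through 14 are suppressed, and a retry is only allowed again at interval 15, which gives a slow storage controller time to report the new size.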
Closing issue as resolved. Please re-open it or open a new one if there are any issues with this or you have further information. Thanks!
Related Issues (13)
- Customer-reported issue: Is not detecting updated/resized max size HOT 3
- Multiarch image? HOT 2
- Add support for custom headers in calls to Prometheus API HOT 4
- Customize Slack message HOT 2
- Random feature ideas to consider (see here if you wish to contribute)
- Trigger on inodes count HOT 5
- kubelet_volume_stats_available_bytes metric is not available in prometheus HOT 1
- Can't use multiplier suffix (Gi, Ti) on `SCALE_UP_MAX_SIZE` HOT 1
- Autoscaling size below current size and PVC size not human readable. HOT 9
- Exception while trying to describe all PVCs HOT 3
- Support victoriametrics instead of prometheus
- Grafana Dashboard