Comments (15)
Really interesting point!
On my setups, I try to have most of my pods in the 50-80% range, I then consider them to be correctly sized.
In my experience, you can start having reliability issues and weird behaviors above 80% resource usage.
I also consider pods running under 50% usage to be over-sized.
I decided to go for a "standard" color scheme for these because I think it's what makes sense for most users.
We need to keep in mind that requests could also go above 100% if the limit is higher, so you could have something like red > yellow > green > red
and I think it can be really confusing for users. We could also argue about the thresholds themselves; this depends on everyone's use cases and policies.
Other ideas would be to use a single color, or another color scheme (not green, yellow and red), but I think it's just a little bit weird...
Users like you that know what's best for their use-cases will just ignore the color anyway, so it's not a big deal in my opinion.
Keeping it this way is maybe safer for most users, what do you think?
If anyone wants to comment with thoughts or ideas, I think it's a good topic! 😊
from grafana-dashboards-kubernetes.
For the second part:
- Yes, it's a good idea to add the real usage in the table; I will make a PR this week to add this
- On Kubernetes, resources are set per container, not per pod, so I think it can only be "by container".
If you have a pod with more than one container, you should have one plot line per container like this:
- For the last point, I know it could be confusing depending on the pod/container configuration, but I didn't find a way to make it more readable than this.
Good to know:
- I mostly use this dashboard to size my pods based on average or peak usage
- The table can really help you understand what's wrong with your setup (see screenshot above)
- Gauges could be hard to read if requests and limits are not set the same way on all containers
- The requests gauges can disappear if no requests are set
A nice (but old) thread by @thockin on limits: https://www.reddit.com/r/kubernetes/comments/all1vg/comment/efgyygu/
On my setups, I try to have most of my pods in the 50-80% range, I then consider them to be correctly sized. In my experience, you can start having reliability issues and weird behaviors above 80% resource usage. I also consider pods running under 50% usage to be over-sized.
It's not clear whether you target your pods to be at 50-80% of the LIMIT or the REQUEST. I try to target within 20% of the REQUEST as ideal. If it's constantly over the request (20%+), then I would bump that up when tuning, as clearly the request I asked for was too low. For the LIMIT, I want usage within 50%-70% as a starting point to avoid OOM kills and leave wiggle room.
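To make those targets concrete, here is a rough sketch of the check I have in mind. It assumes usage, request and limit are all in the same unit; the function name `check_sizing` and the "under" / "ok" / "over" labels are made up for illustration and are not part of the dashboard.

```python
def check_sizing(usage: float, request: float, limit: float) -> dict:
    """Compare usage against the rule-of-thumb targets from this comment.

    Assumed targets: usage within 20% of the request, and usage at
    50%-70% of the limit. Labels are hypothetical, for illustration only.
    """
    if request <= 0 or limit <= 0:
        raise ValueError("request and limit must be positive")

    request_ratio = usage / request
    limit_ratio = usage / limit

    # Within +/- 20% of the request counts as "ok".
    if request_ratio > 1.2:
        request_status = "over"
    elif request_ratio < 0.8:
        request_status = "under"
    else:
        request_status = "ok"

    # 50%-70% of the limit counts as "ok".
    if limit_ratio > 0.7:
        limit_status = "over"
    elif limit_ratio < 0.5:
        limit_status = "under"
    else:
        limit_status = "ok"

    return {"request": request_status, "limit": limit_status}


if __name__ == "__main__":
    # Example: 100Mi used, 100Mi requested, 160Mi limit.
    print(check_sizing(100, 100, 160))
```

An "over" on the request side is the case where I would bump the request up when tuning.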
I decided to go for a "standard" color scheme for these because I think it's what makes sense for most users. We need to keep in mind that requests could also go above 100% if the limit is higher, so you could have something like
red > yellow > green > red
and I think it can be really confusing for users. We could also argue about the thresholds themselves; this depends on everyone's use cases and policies.
I don't think that is confusing. The request number should be the center point of GREEN; left and right of center is an arbitrary number we pick that feels right... +/- 25% from center ??. This defines the green area. Then 20% either side of that would be yellow and the last 5% either side is red. If you are significantly under or over the request, that is a problem.
I think it's more confusing now, as new users will see a very good request value as RED, be confused, and alter the values to get it GREEN, which really is not what they should be doing.
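As a rough sketch of the centered scheme, assuming the bands are measured on the usage/request ratio with the request sitting at 1.0: green within +/- 25%, yellow for the next 20% on each side, and red for anything further out (the sketch does not cap red at "the last 5%"). The name `request_color` and the default widths are only illustrative.

```python
def request_color(usage: float, request: float,
                  green_width: float = 0.25,
                  yellow_width: float = 0.20) -> str:
    """Color for a requests gauge where the request itself is the center.

    The widths are the arbitrary "feels right" numbers from the comment,
    expressed as fractions of the request (assumption).
    """
    if request <= 0:
        raise ValueError("request must be positive")

    # Distance from the center point, in either direction.
    deviation = abs(usage / request - 1.0)

    if deviation <= green_width:
        return "green"
    if deviation <= green_width + yellow_width:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for used in (30, 60, 100, 110, 140, 180):
        print(used, request_color(used, 100))
```

With this shape, being slightly over the request stays green, and only a value far below or far above the request turns red.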
Other ideas would be to use a single color, or another color scheme (not green, yellow and red), but I think it's just a little bit weird... Users like you that know what's best for their use-cases will just ignore the color anyway, so it's not a big deal in my opinion.
I've been trying to use Goldilocks to get an idea for requests and limits and its values are all over the map. Pretty much every time you hit refresh you get a different recommendation. I found using your dashboard to be WAY easier to tune with. It's just the request colors are off, you need to know that, and not use the colors to base your tuning. But if we can correct the colors, I think it would be an excellent tool for this.
Keeping it this way is maybe safer for most users, what do you think?
I think no color vs. the current color pattern is safer. The way it is now, I think, encourages the wrong action to make it green. But I don't want no color :(
This is how I think it should look:
You're a bit over, still ok, should not be red:
Significantly under should indicate you can improve:
For above, changes I made to graph:
- Standard Options
- Min: auto (but zero looks good too, not sure of the difference)
- Max: 2
- Decimals: 1
I'd also like to see a timeline graph of CPU and RAM usage, each plotted with its respective request / limit lines. This would allow an overall view over time (Last 1 hour, 6 hours, 2 days, etc).
Thank you for this @reefland, you just shared many good points and ideas!
I'm still unsure about requests, to be honest, because it highly depends on how you manage your Kubernetes resources (requests = limits, requests < limits...). So I would still keep them neutral for now, but we can still iterate on this.
I just created a new version (didn't commit yet):
- Switched to blue color for requests (pod total) and left limits with green, yellow & red
- Added "Used" CPU & Memory in the table
- Added 2 new panels with % usage on requests & limits with thresholds as colored areas
The rest of the dashboard is left unchanged.
I used 20%, 30%, 70% & 80% as thresholds, as I think it's pretty conservative.
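For reference, here is a small sketch of how those four thresholds split the percentage axis into five areas. The color assigned to each area is my assumption here (red at both extremes, green in the middle), and `usage_band` is just an illustrative name, not something from the dashboard JSON.

```python
from bisect import bisect_right

# Threshold values from the comment above, in percent.
THRESHOLDS = [20, 30, 70, 80]

# Assumed color per area, from lowest to highest usage.
COLORS = ["red", "yellow", "green", "yellow", "red"]


def usage_band(percent: float) -> str:
    """Return the assumed color of the area a usage percentage falls into.

    A value equal to a threshold belongs to the area that starts at it.
    """
    return COLORS[bisect_right(THRESHOLDS, percent)]


if __name__ == "__main__":
    for value in (10, 25, 50, 75, 95):
        print(value, usage_band(value))
```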
What do you think?
Yeah! These look neat! Look forward to trying them.
Just pushed the new version, try it and let me know what you think.
Maybe we can do a pros/cons list for the requests colors?
ok, I'll check it out this weekend!
Do you have any way to determine whether request = limit and then make it blue, and otherwise use a color scale like the one I suggested?
I need to figure out this missing `image=` key. As-is, I get nothing. I'll have to re-work each gauge to remove that reference.
*sigh* another issue: besides not having the `image=` key, I also do not have `container=`.
The query `container_cpu_usage_seconds_total{namespace="mosquitto", pod="mosquitto-mqtt-0"}` yields:
container_cpu_usage_seconds_total{cpu="total", endpoint="https-metrics", id="/kubepods/burstable/podcc153a2a-d87e-4b18-b37b-159fa6907cd4", instance="k3s02", job="kubelet", metrics_path="/metrics/cadvisor", namespace="mosquitto", node="k3s02", pod="mosquitto-mqtt-0", service="prometheus-kubelet"}
Which returns an empty set using `by (container)`:
sum(rate(container_cpu_usage_seconds_total{namespace="mosquitto", pod="mosquitto-mqtt-0"}[1m])) by (container)
OK, I think it's time to run a copy of your k3s setup to solve both of these.
I'll do my best to do it this week or during the weekend.
Will keep you updated, hopefully with a fix.
We'll keep this issue on topic.
Investigation on missing labels will be in #18
Did you manage to test the latest version?
I think it now includes most of what we discussed in this issue.
Let me know.
Nah... without the container level metrics I can't really test it properly.
Hope you will find a solution to get this working on your setup 🤞
Thanks again for your time and ideas on this!
Closing this issue.