Comments (2)
Please answer the following questions to get better assistance:
What happened?
Tell us what happened and provide as many details as possible, including logs.
What did you expect to happen?
Tell us about expected behaviour.
What is the GPU model?
Tell us about the hardware configuration of the GPU, including the output of 'nvidia-smi'.
What is the environment?
Is DCGM-Exporter running on bare metal or in a virtual environment, container, pod, etc?
How did you deploy the dcgm-exporter and what is the configuration?
Tell us how you deployed DCGM-Exporter. Did you use helm, build from source or use the GPU Operator?
How can we reproduce the issue?
Clear and concise steps to reproduce an issue can help everyone by allowing us to identify and fix problems more quickly.
What is the version?
Tell us about the DCGM-Exporter version.
from dcgm-exporter.
@lengrongfu, GPU Feature Discovery (https://github.com/NVIDIA/gpu-feature-discovery) provides the "nvidia.com/mig.strategy" node label. Do you want to see this label as part of the metric output?
Can you tell us your use case?