plambe / zabbix-nvidia-smi-multi-gpu
A zabbix template using nvidia-smi. Works with multiple GPUs on Windows and Linux.
License: Other
I added the template and the required scripts and config. I have changed the Type of every prototype to Zabbix Agent (active) and am now getting this error in the Zabbix UI:
Invalid discovery rule value: cannot parse as a valid JSON object: invalid object format, expected opening character '{' or '[' at: 'The syntax of the command is incorrect. C:\windows\system32><!DOCTYPE html>'
Could be me just being silly though, who knows.
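The error above says the discovery value must start with '{' or '[', which means the agent returned something other than JSON (here, a shell error followed by HTML). As a minimal sketch for checking a captured discovery value before Zabbix sees it, the Python below parses the text and looks for the {#GPUINDEX} macro. The function name check_discovery_output is hypothetical and not part of this repository, and the accepted shapes are an assumption based on the error message and on the {"data": [...]} output shown elsewhere on this page.

```python
import json

def check_discovery_output(text):
    """Return (ok, message) for a string meant to be a Zabbix discovery value.

    Hypothetical helper for illustration; not part of the template.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg} at position {exc.pos}"
    # Assumption: accept a bare array or an object wrapping a "data" array.
    rows = parsed.get("data") if isinstance(parsed, dict) else parsed
    if not isinstance(rows, list):
        return False, "expected a JSON array or an object with a 'data' array"
    for row in rows:
        if not isinstance(row, dict) or "{#GPUINDEX}" not in row:
            return False, f"row without a {{#GPUINDEX}} macro: {row!r}"
    return True, f"{len(rows)} GPU(s) discovered"

good = '{"data":[{"{#GPUINDEX}":"0", "{#GPUUUID}":"GPU-UUID"}]}'
bad = "The syntax of the command is incorrect. C:\\windows\\system32><!DOCTYPE html>"
print(check_discovery_output(good))
print(check_discovery_output(bad))
```

Running the script or BAT file by hand and feeding its output to a check like this shows whether the problem is in the script itself or in how the agent invokes it.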
Hi,
I need help understanding and fixing a problem.
I followed the instructions, installed the script in /etc/zabbix/scripts/
and added /etc/zabbix/zabbix_agentd.d/userparameter_nvidia-smi.conf
I also imported the template and assigned it to the server. I tried running the commands manually to make sure I hadn't missed something.
The problem is that information is missing in Zabbix, and I see a problem in how the number of GPUs is calculated.
This is the result when I run the commands locally.
root@hostname:/etc/zabbix# sudo -u zabbix zabbix_agentd -t gpu.number
gpu.number [t|9]
root@hostname:/etc/zabbix# sudo -u zabbix zabbix_agentd -t gpu.discovery
gpu.discovery [t|{
"data":[
{"{#GPUINDEX}":"0", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"1", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"2", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"3", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"4", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"5", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"6", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"7", "{#GPUUUID}":"GPU-UUID"},
{"{#GPUINDEX}":"8", "{#GPUUUID}":"GPU-UUID"}
]
}]
The BAT file points into Program Files but in fact the exe is now in a system path. Removing the path fixes the issue.
UserParameter=gpu.utilization.dec.min[*].....
UserParameter=gpu.utilization.dec.max[*].....
UserParameter=gpu.utilization.enc.min[*].....
UserParameter=gpu.utilization.enc.max[*].....
are missing for Windows hosts and will show up as "unsupported item" in Zabbix monitoring.
I want to monitor GPU resources on a VM using Zabbix. I followed the instructions in the README, but the VM outputs the error shown below. On bare metal, it worked.
If you know a solution, please let me know.
■ Environment
Host OS: vSphere ESXi 7.0U3
GPU: A40
Guest OS: Windows 10 Pro
GPU profile (Guest OS): NVIDIA GRID vGPU nvidia_a40-8q
GPU driver (Guest OS & Host OS): 510.47.03
Hi, this is great work, and I hope we can make it easier to use.
In my environment, I can replace the get_gpus_info.sh script with the following Perl one-liner:
nvidia-smi -L | perl -le 'while(<>){push @a,qq|{"{#GPUINDEX}":"$1", "{#GPUUUID}":"$2"}| if(/GPU (\d+).*UUID\: (.*)\)$/);} print qq|{"data":[\n| . join(",\n", @a) . qq|\n]}|;'
So we can replace the UserParameter with:
UserParameter=gpu.discovery,nvidia-smi -L | perl -le 'while(<>){push @a,qq|{"{#GPUINDEX}":"$1", "{#GPUUUID}":"$2"}| if(/GPU (\d+).*UUID\: (.*)\)$/);} print qq|{"data":[\n| . join(",\n", @a) . qq|\n]}|;'
Note that the PATH environment variable must be set so that the nvidia-smi and perl commands can be found.
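For readers who find the one-liner hard to follow, here is a rough Python sketch of the same idea: take the text printed by nvidia-smi -L, pull the index and UUID out of each device line, and emit the {"data": [...]} structure. The function name discovery_json and the sample UUIDs are made up for illustration, and the regular expression only assumes the "GPU N: name (UUID: ...)" line format shown on this page.

```python
import json
import re

# Matches lines such as: GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-xxxx)
GPU_LINE = re.compile(r"^GPU (\d+):.*\(UUID: (.+)\)\s*$")

def discovery_json(listing):
    """Build a discovery JSON string from `nvidia-smi -L` text (hypothetical helper)."""
    rows = []
    for line in listing.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            rows.append({"{#GPUINDEX}": match.group(1), "{#GPUUUID}": match.group(2)})
    return json.dumps({"data": rows})

# Sample text with placeholder UUIDs, not real device output.
sample = """GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-aaaa)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-bbbb)"""
print(discovery_json(sample))
```

In practice the listing would come from running nvidia-smi -L, for example through subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout. Letting json.dumps handle the quoting avoids assembling the braces and commas by hand.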
Heyo,
I am having an issue with the bash script you provide.
It says: get_gpus_info.sh: 23: get_gpus_info.sh: Syntax error: redirection unexpected
Can you help me fix it?
This part of the Linux config file:
UserParameter=gpu.number,/usr/bin/nvidia-smi -L | /bin/grep GeForce | /usr/bin/wc -l
makes Zabbix detect only GeForce cards, whereas Tesla cards, for example, are ignored.
Why grep anyway?
[root@host~]# /usr/bin/nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-yyyyyyyyy-xxxx-yyyyy-xxxx-yxyxyxyxyxyxy)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-xxxxxxxxx-yyyy-xxxx-yyyy-xxxxyyyyyxxx)
It should be possible to grep for "GPU" and count those lines.
FYI:
[root@host~]# nvidia-installer -v
nvidia-installer: version 396.26 (buildmeister@swio-display-x64-rhel04-19) Mon Apr 30 18:40:31 PDT 2018
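To illustrate the difference between filtering on "GeForce" and counting every device line, here is a small Python sketch run against the two lines of nvidia-smi -L output quoted above. The function name count_gpus is hypothetical; it only assumes that each device line starts with "GPU ".

```python
def count_gpus(listing):
    """Count device lines in `nvidia-smi -L` output, whatever the model name."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))

listing = """GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-yyyyyyyyy-xxxx-yyyyy-xxxx-yxyxyxyxyxyxy)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-xxxxxxxxx-yyyy-xxxx-yyyy-xxxxyyyyyxxx)"""

print(count_gpus(listing))  # 2

# Filtering on the model name, as the current UserParameter does, finds nothing here.
geforce_only = sum(1 for line in listing.splitlines() if "GeForce" in line)
print(geforce_only)  # 0
```

Matching on the line prefix rather than the product name keeps the count independent of which card family is installed.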
Hi,
many thanks for this useful plugin!
Could you please add a LICENSE to your project to allow contributions and "safe" usage of it?
Thanks! 😄
Hi,
I need help understanding and fixing a problem.
I followed the instructions, installed the script in /etc/zabbix/scripts/ and added /etc/zabbix/zabbix_agentd.d/userparameter_nvidia-smi.conf
I also imported the template and assigned it to the server. I tried running the commands manually to make sure I hadn't missed something.
Can you help me? Thanks.
Hi, thank you for your work.
Could you make a Windows script for GPU discovery? I tried using awk, but awk for Windows doesn't handle quotes very well.
Hello,
I got an error while importing zbx_nvidia-smi-multi-gpu.xml:
Import failed
Cannot read XML: (41) Specification mandates value for attribute data-pjax-transient [Line: 44 | Column: 91]
Zabbix is the latest version.