d34dc3n73r / netdata-glibc
netdata with glibc package for use with nvidia-docker2
License: GNU General Public License v3.0
Any idea what I'm doing wrong? I've set up an account and added these options:
--runtime=nvidia --cap-add SYS_PTRACE --security-opt apparmor=unconfined -e NETDATA_CLAIM_TOKEN=XXX -e NETDATA_CLAIM_URL=https://app.netdata.cloud
It spams the log, but Netdata says it doesn't get any data. Might that be because the container gets a new unique name after each restart (and thus a new claim token)? What have I been doing wrong?
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1526/3946 bytes -61%, prep/sent/total = 0.15/0.15/0.30 ms) 200 '/api/v1/data?chart=system.net&_=1678401198273&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=sent'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1534/3885 bytes -61%, prep/sent/total = 0.12/0.12/0.24 ms) 200 '/api/v1/data?chart=system.io&_=1678401198276&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=out'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1689/4267 bytes -60%, prep/sent/total = 0.47/0.11/0.58 ms) 200 '/api/v1/data?chart=system.cpu&_=1678401198279&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1554/4056 bytes -62%, prep/sent/total = 0.09/0.10/0.19 ms) 200 '/api/v1/data?chart=system.net&_=1678401198282&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=received'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 993/3028 bytes -67%, prep/sent/total = 0.12/0.08/0.20 ms) 200 '/api/v1/data?chart=system.io&_=1678401198285&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=in'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1451/4258 bytes -66%, prep/sent/total = 0.30/0.10/0.41 ms) 200 '/api/v1/dat
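On the restart question: a container that is recreated without persistent state may show up as a new node each time. The following is a minimal sketch of one arrangement that is often suggested, assuming the claim state lives under /var/lib/netdata as in the stock netdata image. The helper name, the hostname "netdata-gpu" and the volume name "netdatalib" are made up for illustration, and the helper only prints options; it does not start anything.

```shell
# netdata_run_args is a hypothetical helper: it prints one docker option per line.
# Hostname "netdata-gpu" and volume "netdatalib" are assumptions for illustration.
netdata_run_args() {
    token="$1"
    printf '%s\n' \
        "--runtime=nvidia" \
        "--cap-add=SYS_PTRACE" \
        "--security-opt=apparmor=unconfined" \
        "--hostname=netdata-gpu" \
        "--volume=netdatalib:/var/lib/netdata" \
        "--env=NETDATA_CLAIM_TOKEN=${token}" \
        "--env=NETDATA_CLAIM_URL=https://app.netdata.cloud"
}

# Possible usage (image name assumed from the repository name):
#   docker run -d --name=netdata $(netdata_run_args "XXX") d34dc3n73r/netdata-glibc
```

Keeping the hostname fixed and the data directory on a named volume is meant to let the node identity survive a container restart; whether that resolves the missing data here is not confirmed by the thread.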
I'm having trouble getting netdata to work with NVIDIA. I am able to run nvidia-smi on the host machine (openmediavault) as well as in another Docker container (Plex Media Server). I was getting the same error in the Plex container as in netdata; editing config.toml to use ldconfig = "/sbin/ldconfig.real" fixed the issue for Plex but doesn't help netdata.
Here are my kernel version and Docker version:
Linux 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10) x86_64 GNU/Linux
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
Go version: go1.16.12
Git commit: e91ed57
Built: Mon Dec 13 11:45:37 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.12
API version: 1.41 (minimum version 1.12)
Go version: go1.16.12
Git commit: 459d0df
Built: Mon Dec 13 11:43:46 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
nvidia:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
I'm getting this error when running nvidia-smi in the container:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
I'm also seeing errors like this in the error log:
2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] Get "http://127.0.0.1:9445/metrics": dial tcp 127.0.0.1:9445: connect: connection refused
2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] check failed
2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] Get "http://127.0.0.1:9454/metrics": dial tcp 127.0.0.1:9454: connect: connection refused
2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] check failed
2022-01-13 21:05:35: python.d INFO: plugin[main] : [nvidia_smi] built 1 job(s) configs
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-smi' (disk '_usr_bin_nvidia-smi', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-smi') is not a directory. (errno 22, Invalid argument)
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-debugdump' (disk '_usr_bin_nvidia-debugdump', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-debugdump') is not a directory.
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ml.so.460.73.01' (disk '_usr_lib64_libnvidia-ml.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.73.01') is not a directory. (errno 22, Invalid argument)
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libcuda.so.460.73.01' (disk '_usr_lib64_libcuda.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.73.01') is not a directory.
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ptxjitcompiler.so.460.73.01' (disk '_usr_lib64_libnvidia-ptxjitcompiler.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.73.01') is not a directory. (errno 22, Invalid argument)
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidiactl' (disk '_dev_nvidiactl', filesystem 'devtmpfs', root '/nvidiactl') is not a directory.
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm' (disk '_dev_nvidia-uvm', filesystem 'devtmpfs', root '/nvidia-uvm') is not a directory. (errno 22, Invalid argument)
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm-tools' (disk '_dev_nvidia-uvm-tools', filesystem 'devtmpfs', root '/nvidia-uvm-tools') is not a directory.
2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia0' (disk '_dev_nvidia0', filesystem 'devtmpfs', root '/nvidia0') is not a directory. (errno 22, Invalid argument)
2022-01-13 21:06:06: python.d ERROR: nvidia_smi[nvidia_smi] : xml parse failed: "b"NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n"", error: syntax error: line 1, column 0
2022-01-13 21:06:06: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed
Can you elaborate on what the step below means?
python.d.conf is the original with nvidia-smi=yes uncommented.
The conf file in https://github.com/coraxx/netdata_nv_plugin does not contain anything related to nvidia-smi, which makes me think this is the wrong folder to volume-mount.
Hello,
First of all, thanks for figuring out a way to get NVIDIA GPU benchmarking working by just extending the base netdata image.
I followed the instructions on the DockerHub page.
I can start the container and then access the web server running at :19999.
However, I can't see any section hinting at GPU / nvidia-smi benchmarking.
Not seeing any stats, I thought that maybe there was an issue with the execution of nvidia-smi (if netdata uses it internally).
I tried executing nvidia-smi in the container:
docker exec netdata nvidia-smi
but received this error:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
The only way I found to get nvidia-smi to execute successfully via docker exec was the following:
docker exec netdata bash -c 'LD_PRELOAD=$(find /usr/lib64/ -name "libnvidia-ml.so.*") nvidia-smi'
based on this StackOverflow answer
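The workaround above can be wrapped in a small helper that fails clearly when no library is found, instead of running nvidia-smi with an empty LD_PRELOAD. This is only a sketch: find_nvml_lib is a hypothetical name, and /usr/lib64 as the default search root is taken from the command above.

```shell
# find_nvml_lib is a hypothetical helper, not part of netdata or the NVIDIA tools.
# It prints the first libnvidia-ml.so.* found under the given root (default /usr/lib64).
find_nvml_lib() {
    root="${1:-/usr/lib64}"
    lib="$(find "$root" -name 'libnvidia-ml.so.*' 2>/dev/null | head -n 1)"
    # Fail clearly instead of running nvidia-smi with an empty LD_PRELOAD.
    [ -n "$lib" ] || return 1
    printf '%s\n' "$lib"
}

# Possible usage inside the container:
#   lib="$(find_nvml_lib)" && LD_PRELOAD="$lib" nvidia-smi
```

Taking only the first match avoids passing several paths at once if more than one versioned copy of the library is mounted.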
Any clues about how this issue could be solved?
Maybe I'll take a peek at netdata's sources to see if I can "patch" the system (supposing that the solution is indeed using LD_PRELOAD).
Best regards.
Thanks for integrating nvidia-smi into netdata.
I tried many times to reproduce it, but no luck.
Neither docker run / nvidia-docker run with the --runtime option nor docker-compose worked for me.
I am getting "unknown runtime specified nvidia".
I have also added the configuration to daemon.json.
Any ideas?
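An "unknown runtime" message usually means the Docker daemon has no runtime registered under that name, for example because daemon.json was edited but the daemon has not been restarted or reloaded since. A minimal sketch for checking this on the host follows; has_nvidia_runtime is a hypothetical helper that inspects the JSON printed by docker info.

```shell
# has_nvidia_runtime is a hypothetical helper: it reads the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# on stdin and succeeds only if a runtime named "nvidia" is listed as a key.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:'
}

# Possible usage on the Docker host:
#   if docker info --format '{{json .Runtimes}}' | has_nvidia_runtime; then
#       echo "nvidia runtime registered"
#   else
#       echo "nvidia runtime missing: check daemon.json and restart the Docker daemon"
#   fi
```

If the runtime is missing from that output, the daemon.json entry is not being picked up, and the problem lies on the host rather than in this image.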
Is the automation broken again?
I see the latest versioned tag on this image is v1.31.0 and the latest tag uses netdata v1.32.1-7-nightly, while the netdata/netdata image has v1.33.0. Where are the latest releases?
Hello,
I just updated my Netdata container this morning after upgrading to v6.10-RC3 of Unraid and noticed the Nvidia Graphs were no longer loading. Looking in the Netdata container logs I am seeing this plugin load error:
Error relocating /usr/bin/nvidia-smi: __strtok_r: symbol not found
Error relocating /usr/bin/nvidia-smi: __strdup: symbol not found
2022-03-17 15:24:04: python.d ERROR: nvidia_smi[nvidia_smi] : failed to invoke 'nvidia-smi' binary
2022-03-17 15:24:04: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed
I'm pretty sure the v6.10-RC3 update to Unraid didn't affect this, as Netdata was working after the RC3 update. This started happening after clicking the "apply update" button for the Netdata container in Unraid.
Potentially unrelated: I notice this container image says there is an update almost every day in Unraid. Is that normal?