d34dc3n73r / netdata-glibc

netdata with glibc package for use with nvidia-docker2

License: GNU General Public License v3.0

Dockerfile 100.00%
netdata nvidia-docker nvidia-container-toolkit docker

netdata-glibc's People

Contributors

d34dc3n73r, joly0

netdata-glibc's Issues

netdata cloud?

Any idea what I'm doing wrong? I've set up an account and added

--runtime=nvidia --cap-add SYS_PTRACE --security-opt apparmor=unconfined -e NETDATA_CLAIM_TOKEN=XXX -e NETDATA_CLAIM_URL=https://app.netdata.cloud

It spams the log, but netdata says it doesn't get data. Might that have to do with the container getting a new unique name after each restart (and thus a new claim token)? What have I been doing wrong?

2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1526/3946 bytes -61%, prep/sent/total = 0.15/0.15/0.30 ms) 200 '/api/v1/data?chart=system.net&_=1678401198273&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=sent'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1534/3885 bytes -61%, prep/sent/total = 0.12/0.12/0.24 ms) 200 '/api/v1/data?chart=system.io&_=1678401198276&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=out'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1689/4267 bytes -60%, prep/sent/total = 0.47/0.11/0.58 ms) 200 '/api/v1/data?chart=system.cpu&_=1678401198279&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1554/4056 bytes -62%, prep/sent/total = 0.09/0.10/0.19 ms) 200 '/api/v1/data?chart=system.net&_=1678401198282&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=received'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 993/3028 bytes -67%, prep/sent/total = 0.12/0.08/0.20 ms) 200 '/api/v1/data?chart=system.io&_=1678401198285&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=in'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1451/4258 bytes -66%, prep/sent/total = 0.30/0.10/0.41 ms) 200 '/api/v1/dat
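One possible cause, consistent with the guess in the question but not confirmed in this thread: Docker uses the container ID as the hostname unless one is set, and anything the agent stores under /var/lib/netdata is lost when the container is re-created without a volume. The sketch below pins both. The image name d34dc3n73r/netdata-glibc, the hostname netdata-gpu, and the volume name netdatalib are assumptions for illustration, and any existing port or mount options would still need to be added. The function only prints the docker arguments, so nothing is started.

```shell
#!/usr/bin/env bash
# Sketch: keep the agent's hostname and state directory stable across
# container restarts and re-creations.
# Assumptions (not from the thread): image name, hostname, volume name.
build_run_args() {
  local token="$1"
  printf '%s\n' \
    run -d --name netdata \
    --hostname netdata-gpu \
    --runtime=nvidia \
    --cap-add SYS_PTRACE \
    --security-opt apparmor=unconfined \
    -v netdatalib:/var/lib/netdata \
    -e "NETDATA_CLAIM_TOKEN=${token}" \
    -e NETDATA_CLAIM_URL=https://app.netdata.cloud \
    d34dc3n73r/netdata-glibc
}

# Dry run: print the arguments one per line without starting anything.
build_run_args XXX

# To actually start the container (bash 4+), something like:
#   mapfile -t args < <(build_run_args "$YOUR_TOKEN") && docker "${args[@]}"
```

Printing the arguments first makes it easy to compare them with the command currently in use before changing anything.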

libnvidia-ml.so

I'm having trouble getting netdata to work with NVIDIA. I am able to run nvidia-smi on the host machine (openmediavault), as well as in another Docker container (Plex Media Server). I was getting the same error in the Plex container as in netdata; editing config.toml to use ldconfig = "/sbin/ldconfig.real" fixed the issue with Plex, but it doesn't help netdata.
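To double-check that the edit described above is the active setting, a small helper can print the uncommented ldconfig line from the config file. This is only a sketch: the post does not say which config.toml was edited, so the default path /etc/nvidia-container-runtime/config.toml is an assumption, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the active (uncommented) ldconfig setting from the NVIDIA
# container runtime config file.
# Assumption: the default path below; pass another path as $1 if needed.
show_ldconfig_setting() {
  local config="${1:-/etc/nvidia-container-runtime/config.toml}"
  # Lines starting with '#' are comments and are deliberately not matched.
  grep -E '^[[:space:]]*ldconfig[[:space:]]*=' "$config"
}

# Example (run on the Docker host, not inside the container):
#   show_ldconfig_setting
```

If the command prints nothing, the setting is either commented out or lives in a different file.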

Here's my kernel version and Docker version:
Linux 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10) x86_64 GNU/Linux

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:37 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:46 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 nvidia:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I'm getting this error when running nvidia-smi in the container:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

As well as errors like this in the error log:

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] Get "http://127.0.0.1:9445/metrics": dial tcp 127.0.0.1:9445: connect: connection refused

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] check failed

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] Get "http://127.0.0.1:9454/metrics": dial tcp 127.0.0.1:9454: connect: connection refused

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] check failed

2022-01-13 21:05:35: python.d INFO: plugin[main] : [nvidia_smi] built 1 job(s) configs

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-smi' (disk '_usr_bin_nvidia-smi', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-smi') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-debugdump' (disk '_usr_bin_nvidia-debugdump', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-debugdump') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ml.so.460.73.01' (disk '_usr_lib64_libnvidia-ml.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.73.01') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libcuda.so.460.73.01' (disk '_usr_lib64_libcuda.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.73.01') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ptxjitcompiler.so.460.73.01' (disk '_usr_lib64_libnvidia-ptxjitcompiler.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.73.01') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidiactl' (disk '_dev_nvidiactl', filesystem 'devtmpfs', root '/nvidiactl') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm' (disk '_dev_nvidia-uvm', filesystem 'devtmpfs', root '/nvidia-uvm') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm-tools' (disk '_dev_nvidia-uvm-tools', filesystem 'devtmpfs', root '/nvidia-uvm-tools') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia0' (disk '_dev_nvidia0', filesystem 'devtmpfs', root '/nvidia0') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:06:06: python.d ERROR: nvidia_smi[nvidia_smi] : xml parse failed: "b"NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n"", error: syntax error: line 1, column 0

2022-01-13 21:06:06: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed

Can't run nvidia-smi in container

Hello

First of all, thanks for figuring out a way to have NVIDIA GPU benchmarking working by just extending the base netdata image 🙏

I followed the instructions given on the DockerHub page.
I can start the container and then access the web server running at :19999.
However, I can't see any section hinting at GPU / nvidia-smi benchmarking.

Not seeing any stats, I thought that maybe there was some issue with the execution of nvidia-smi (if they use it internally in netdata).

I tried executing nvidia-smi in the container:

docker exec netdata  nvidia-smi

but received this error:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

The only way that I found for having nvidia-smi successfully executing via docker exec was the following:

docker exec netdata bash -c 'LD_PRELOAD=$(find /usr/lib64/ -name "libnvidia-ml.so.*")  nvidia-smi'

based on this StackOverflow answer
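The workaround above passes every match of find to LD_PRELOAD. As far as I know, the glibc loader splits LD_PRELOAD on spaces and colons only, so several matches separated by newlines may not load as intended. A minimal sketch that picks a single file is shown here; the function name find_nvml is hypothetical, and /usr/lib64 is simply the directory used in the command above.

```shell
#!/usr/bin/env bash
# Print the path of one libnvidia-ml.so.* file under a directory
# (default: /usr/lib64, the directory searched in the workaround).
find_nvml() {
  local dir="${1:-/usr/lib64}"
  # Sort so the choice is deterministic when several versions exist,
  # then keep only the first match.
  find "$dir" -name 'libnvidia-ml.so.*' 2>/dev/null | sort | head -n 1
}

# Example, once the function is defined inside the container:
#   LD_PRELOAD="$(find_nvml)" nvidia-smi
```

This only makes the workaround more predictable; it does not explain why the library is not found without LD_PRELOAD in the first place.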

Any clues about how this issue could be solved?

Maybe I'll try to give a peek at netdata's sources to see if I can "patch" the system (supposing that the solution is indeed using LD_PRELOAD).

Best regards.

Unknown nvidia runtime

Thanks for integrating nvidia-smi into netdata.
I tried many times to reproduce the setup, but had no luck.

Neither docker run / nvidia-docker run with the --runtime option nor docker-compose worked for me.
I am getting "Unknown runtime specified nvidia".

I have also added the configuration to daemon.json.
Any ideas?
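A quick sanity check for this situation is to confirm that the daemon.json Docker actually reads declares a runtime named nvidia. The sketch below does a crude text match rather than real JSON parsing; the function name is hypothetical and /etc/docker/daemon.json is assumed as the file location.

```shell
#!/usr/bin/env bash
# Return 0 if the given daemon.json appears to declare a runtime named
# "nvidia", 1 if it does not, and 2 if the file cannot be read.
# This is a crude text match, not a JSON parser.
has_nvidia_runtime() {
  local daemon_json="${1:-/etc/docker/daemon.json}"
  [ -r "$daemon_json" ] || return 2
  # '"nvidia":' matches the runtime entry itself, but not a value such
  # as "default-runtime": "nvidia".
  grep -Eq '"nvidia"[[:space:]]*:' "$daemon_json"
}

if has_nvidia_runtime; then
  echo "daemon.json declares an nvidia runtime"
else
  echo "no nvidia runtime found in daemon.json (or file unreadable)"
fi
```

Changes to daemon.json take effect only after the Docker daemon is restarted, and the Runtimes line in the output of docker info shows which runtimes the running daemon knows about.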

latest netdata releases not working?

I see the latest versioned tag on this image is v1.31.0 and the latest tag uses netdata v1.32.1-7-nightly, while the netdata/netdata image has v1.33.0. Where are the latest releases?

Symbol not found /usr/bin/nvidia-smi

Hello,

I just updated my Netdata container this morning after upgrading to v6.10-RC3 of Unraid and noticed the Nvidia Graphs were no longer loading. Looking in the Netdata container logs I am seeing this plugin load error:

Error relocating /usr/bin/nvidia-smi: __strtok_r: symbol not found
Error relocating /usr/bin/nvidia-smi: __strdup: symbol not found
2022-03-17 15:24:04: python.d ERROR: nvidia_smi[nvidia_smi] : failed to invoke 'nvidia-smi' binary
2022-03-17 15:24:04: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed

I'm pretty sure the v6.10-RC3 update to Unraid didn't affect this, as Netdata was working after the RC3 update. This started happening after clicking the "apply update" button for the Netdata container in Unraid.
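The "Error relocating ...: symbol not found" wording is what the musl dynamic loader prints, and __strtok_r and __strdup are glibc symbol names. That suggests nvidia-smi was loaded without the glibc compatibility layer in the updated image, although this is an inference and not confirmed in the thread. When reporting such a failure, it helps to list exactly which symbols are missing. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# List the symbol names reported as missing in "Error relocating" lines
# read from standard input, one symbol per line.
missing_symbols() {
  grep -oE 'Error relocating [^:]+: [^:]+: symbol not found' \
    | sed -E 's/^Error relocating [^:]+: ([^:]+): symbol not found$/\1/'
}

# The two errors quoted in the report:
missing_symbols <<'EOF'
Error relocating /usr/bin/nvidia-smi: __strtok_r: symbol not found
Error relocating /usr/bin/nvidia-smi: __strdup: symbol not found
EOF
```

The same function can be fed the container log, for example by piping the output of docker logs netdata into it.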

Potentially unrelated: I notice this container image says there is an update almost every day in Unraid. Is that normal?
