libnvidia-ml.so (netdata-glibc issue, closed)

d34dc3n73r commented on May 27, 2024
libnvidia-ml.so

from netdata-glibc.

Comments (12)

cryptoDevTrader commented on May 27, 2024

I got it to work. I used the following guide on OMV 6 to install nvidia-drivers and nvidia-docker2.

https://forum.openmediavault.org/index.php?thread/31206-how-to-setup-nvidia-in-plex-docker-for-hardware-transcoding/

The guide says that ldconfig should be set to /sbin/ldconfig.real in /etc/nvidia-container-runtime/config.toml. However, leaving it set to @/sbin/ldconfig (the default after I installed) works for both the Plex container and netdata.
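
For anyone comparing setups: commented-out ldconfig lines in config.toml are easy to misread, so it can help to print only the active one. A minimal sketch, where `show_ldconfig_setting` is a made-up helper name and the default path is the one quoted in this thread:

```shell
# Print the active (uncommented) ldconfig setting from the NVIDIA container
# runtime config. show_ldconfig_setting is a hypothetical helper name.
show_ldconfig_setting() {
    # Default to the path quoted in this thread; pass another path as $1.
    config_file="${1:-/etc/nvidia-container-runtime/config.toml}"
    # Lines starting with '#' are commented out and are not matched.
    grep -E '^[[:space:]]*ldconfig[[:space:]]*=' "$config_file"
}
```

Running `show_ldconfig_setting` with no arguments reads the default path. If it prints nothing, no ldconfig line is active in that file.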

D34DC3N73R commented on May 27, 2024

@cryptoDevTrader you may also want to give the dev image & instructions a try. We'll be moving to that with the next netdata release.
image: d34dc3n73r/netdata-glibc:dev
instructions: https://github.com/D34DC3N73R/netdata-glibc/tree/dev

When the official release happens you'll have to change the image to :stable or :latest depending on your preference.

D34DC3N73R commented on May 27, 2024

I haven't tested or run openmediavault before, but this sounds similar to issue #3.
Does it work if you run the following?
docker exec netdata bash -c 'LDCONFIG=$(find /usr/lib64/ -name libnvidia-ml.so.*) nvidia-smi'

oamster commented on May 27, 2024

Here's the output:

~# docker exec netdata bash -c 'LDCONFIG=$(find /usr/lib64/ -name libnvidia-ml.so.*) nvidia-smi'
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.

My libnvidia on the host machine is in:
/usr/lib/x86_64-linux-gnu/

Not sure if that's the reason it's not working. But my other containers are working fine with it. Right now I've resorted to grafana.
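
To compare where the library actually lives on the host and inside the container, a small search helper may be useful. This is only a sketch: `find_nvml` is a hypothetical name, and the directories to search are whatever you pass in.

```shell
# List libnvidia-ml shared objects under the given directories.
# find_nvml is a hypothetical helper name, not part of any NVIDIA tooling.
find_nvml() {
    # Quote the pattern so the shell does not expand it before find sees it;
    # errors for missing or unreadable directories are discarded.
    find "$@" -name 'libnvidia-ml.so*' 2>/dev/null
}
```

On the host this could be run as `find_nvml /usr/lib/x86_64-linux-gnu /usr/lib64`. Inside the container, the equivalent is `docker exec netdata find /usr/lib64 /usr/lib -name 'libnvidia-ml.so*'`. If the container search prints nothing, the runtime did not mount the library into the container.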

D34DC3N73R commented on May 27, 2024

/usr/lib/x86_64-linux-gnu/ is also where libnvidia is on my host system (ubuntu 20.04). But in the container, it should be in /usr/lib64/. What steps did you take to install the nvidia container toolkit and the nvidia drivers?

Edit: I also found this regarding OMV + Nvidia:
https://forum.openmediavault.org/index.php?thread/40883-nvidia-working-with-omv-6/

Also see this if you're running OMV 5
https://forum.openmediavault.org/index.php?thread/39413-nvidia-smi-couldn-t-find-libnvidia-ml-so-library-in-your-system-please-make-sure/

oamster commented on May 27, 2024

I actually used this guide to set everything up, both the drivers and the nvidia container toolkit.
https://forum.openmediavault.org/index.php?thread/38013-howto-nvidia-hardware-transcoding-on-omv-5-in-a-plex-docker-container/

I removed and reinstalled the drivers, but did not manually remove /usr/lib/x86_64-linux-gnu/ or anything in that directory. Maybe I should give that a try.
It's just strange that everything else works with the GPU, just not the official netdata image, or yours.

Edit: Maybe it's an issue with /etc/nvidia-container-runtime/config.toml. Mine is:
#ldconfig = "@/sbin/ldconfig"
#ldconfig = "/sbin/ldconfig"
ldconfig = "/sbin/ldconfig.real"

Edit: But plex and other containers error when ldconfig is set to anything other than ldconfig.real.

D34DC3N73R commented on May 27, 2024

My config.toml is the default:

$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"

Did reinstalling help at all?

oamster commented on May 27, 2024

Tried reinstalling, didn't help. Changed my config.toml to ldconfig = "@/sbin/ldconfig" and got this error when deploying the container:

OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: open failed: /sbin/ldconfig.real: no such file or directory: unknown

There's no error when using ldconfig = "/sbin/ldconfig.real", but I still get the python.d error.
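
The hook error above says /sbin/ldconfig.real could not be opened. As far as I know, a leading @ in the ldconfig setting makes nvidia-container-cli use the host's binary, and ldconfig.real is an Ubuntu arrangement that Debian-based hosts such as OMV usually lack. Treat both points as assumptions to verify. A quick way to see which candidates exist on the host, with `check_ldconfig_paths` being a made-up helper name:

```shell
# Report which of the given ldconfig candidates exist and are executable.
# check_ldconfig_paths is a hypothetical helper name.
check_ldconfig_paths() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "found: $candidate"
        else
            echo "missing: $candidate"
        fi
    done
}
```

For example, `check_ldconfig_paths /sbin/ldconfig /sbin/ldconfig.real` run on the host shows which path an @-prefixed setting could point at.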

I resorted to using prometheus, nvidia smi exporter and grafana which works. But still cannot get it to work with netdata.

cryptoDevTrader commented on May 27, 2024

Any update/progress on this? I'm having the same exact issue on OMV 6.

cryptoDevTrader commented on May 27, 2024

Note that I also downgraded the nvidia packages as per this post. With up-to-date nvidia packages, the plex container does not work with the configuration noted above. The netdata-glibc container does work.

https://forums.developer.nvidia.com/t/issue-with-setting-up-triton-on-jetson-nano/248485/2

cryptoDevTrader commented on May 27, 2024

> @cryptoDevTrader you may also want to give the dev image & instructions a try. We'll be moving to that with the next netdata release.
> image: d34dc3n73r/netdata-glibc:dev
> instructions: https://github.com/D34DC3N73R/netdata-glibc/tree/dev
>
> When the official release happens you'll have to change the image to :stable or :latest depending on your preference.

This was hugely helpful!

I am running both netdata-glibc and plex via docker-compose. netdata-glibc was already working properly with the previous config using the NVIDIA_VISIBLE_DEVICES env and nvidia runtime. Plex, however, was not working with the same configuration and the latest version of nvidia packages (older versions worked fine). Upgrading the nvidia packages to the latest versions and using the deploy method described in the dev branch worked for both deployments.

D34DC3N73R commented on May 27, 2024

closing this, but feel free to reopen if it can be reproduced with the newest updates.
