Proxmox / LXC: Unable to start docker container, nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1 (stable-diffusion-webui-docker, closed)

abdbarho avatar abdbarho commented on August 17, 2024
Proxmox / LXC: Unable to start docker container, nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1

from stable-diffusion-webui-docker.

Comments (6)

shodanx2 avatar shodanx2 commented on August 17, 2024 2

Hi,

I've run out of time for tonight, but I'm excited to continue working on this tomorrow.

To be clear, I have installed nvidia-docker2 in the Ubuntu 22.04 LXC running on Proxmox.

I now suspect that I need some things on the Proxmox host as well. I was really hoping not to have to touch it,
because I have a ton of active software on it. Oh well, time for an update.

I tried following the NVIDIA Container Toolkit install instructions, but they failed early:

root@proxmox:~# distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
         | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list \
         | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
         | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
File '/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg' exists. Overwrite? (y/N) y
# Unsupported distribution!
# Check https://nvidia.github.io/libnvidia-container
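
The "Unsupported distribution!" text appears to be what the repository serves when no package list exists for the $distribution string built from /etc/os-release. A minimal sketch for checking that, assuming a POSIX shell; the helper names distribution_id and is_unsupported_list are hypothetical and not part of any NVIDIA tooling:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of the NVIDIA toolkit.

# Print the distribution string the install command builds from an
# os-release file (defaults to /etc/os-release).
distribution_id() {
    os_release="${1:-/etc/os-release}"
    # Source in a subshell so ID and VERSION_ID do not leak into the caller.
    ( . "$os_release"; echo "${ID}${VERSION_ID}" )
}

# Succeed when a downloaded repo list is only the
# "Unsupported distribution!" placeholder seen in the output above.
is_unsupported_list() {
    grep -q '^# Unsupported distribution!' "$1"
}
```

Running distribution_id on the host shows which string was substituted into the URL, and is_unsupported_list /etc/apt/sources.list.d/nvidia-container-toolkit.list tells you whether the file written by tee is only the placeholder and should be removed before running apt update.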

I have a couple more leads to read through, but I'm done for tonight:


https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
https://stackoverflow.com/questions/64197626/nvidia-docker-initialization-error-nvml-error-driver-not-loaded
https://old.reddit.com/r/Proxmox/comments/s02d66/nvidia_gpu_passthrough_in_lxc_problem/
https://forum.proxmox.com/threads/nvidia-container-runtime-in-lxc-container.57786/
https://github.com/Saberwolf64/Proxmox-Nvidia-LXC-
https://old.reddit.com/r/Proxmox/comments/q4d0w5/gpu_access_in_a_docker_container_in_lxc/


AbdBarho avatar AbdBarho commented on August 17, 2024

@shodanx2 It seems that you are on the right path with your debugging.

A 1080 Ti should be enough; I am running on a laptop 1060.

Driver version 370 is not just old but ancient for the deep learning world. As far as I know, Docker / NVIDIA require at least 418 or something similar.

Do you have the NVIDIA Container Toolkit installed? Yes, you did install nvidia-docker2.

Did you restart the machine after the install (so the kernel modules get loaded)?


greycubesgav avatar greycubesgav commented on August 17, 2024

Hi,
I have a similar setup (Proxmox -> LXC Ubuntu 22.04.1 LTS -> Docker 20.10.17).
It took a fair amount of careful work to get the GPU passed through all the way to a Docker container.

My advice would be to focus on one step at a time, i.e. make sure the NVIDIA drivers are installed and running fine on the Proxmox host, then work on the LXC container, then finally get the Docker NVIDIA runtime working. Also make sure the driver version used in the LXC container matches the Proxmox host version exactly.

Before trying to run any complex python/ml/stable-diff Docker containers, make sure you can run nvidia-smi within the NVIDIA CUDA Docker image.

e.g.
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
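
The one-step-at-a-time approach can be sketched as two small checks, assuming a POSIX shell. The function names check_driver and check_docker are hypothetical; the Docker image tag is the one from the command above:

```shell
#!/bin/sh
# Sketch only: run check_driver on the Proxmox host first, then inside the
# LXC container; run check_docker inside the LXC once both of those pass.

check_driver() {
    # nvidia-smi exits non-zero when the driver is not loaded or not visible.
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; return 1; }
    nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi failed" >&2; return 1; }
    echo "driver OK"
}

check_docker() {
    # Same smoke test as the docker run line above, with output suppressed.
    docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi >/dev/null 2>&1 \
        || { echo "docker GPU test failed" >&2; return 1; }
    echo "docker GPU OK"
}
```

If check_driver fails on the host, there is little point in debugging the LXC or Docker layers yet; fix the failing layer before moving to the next one.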

I've shared my rough notes to get the Nvidia-runtime working within an LXC container under proxmox here:
https://gist.github.com/greycubesgav/8f77ff3b6411a868bf4a0365c71d064b

The main gotchas I would say are:

  • Use exactly the same version of the NVIDIA drivers within the LXC as on the host
  • The lxc.cgroup.devices setting depends on your local group numbers (the device major numbers shown by ls -l /dev/nvidia*), and these may change on reboot
  • Set the no-cgroups config option (no-cgroups = true) to allow Docker within the LXC to attach to the graphics card
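
The first two gotchas can be checked with a few lines of shell. This is a rough sketch assuming Linux with GNU stat; the helper names driver_version, versions_match and device_major are hypothetical:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Print the driver version reported by nvidia-smi on the current system.
driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Succeed only when both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Print the major number of a device node in decimal.
# GNU stat prints it in hex with %t, so convert with printf.
device_major() {
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}
```

On the Proxmox host, something like versions_match "$(driver_version)" "$(pct exec 100 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" compares host and container, where 100 is a placeholder container ID. Running device_major /dev/nvidia0 and device_major /dev/nvidia-uvm after each reboot shows whether the numbers in the lxc.cgroup.devices entries still match.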

Thanks


github-actions avatar github-actions commented on August 17, 2024

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.


github-actions avatar github-actions commented on August 17, 2024

This issue was closed because it has been stalled for 7 days with no activity.


futhgar avatar futhgar commented on August 17, 2024

I'd like to document this here:

I was getting errors when trying to use this repo. The error was related to cgroups, with the following message:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

Thanks to @greycubesgav's response, I found that I was just missing the change in the NVIDIA config file to uncomment no-cgroups = true.
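
For anyone applying the same fix, a minimal sketch of that edit follows. It assumes the config file lives at /etc/nvidia-container-runtime/config.toml inside the LXC (the toolkit's usual location, but check your install); the function name enable_no_cgroups is hypothetical:

```shell
#!/bin/sh
# Sketch only: the default path is an assumption, pass another as $1 if needed.

enable_no_cgroups() {
    config="${1:-/etc/nvidia-container-runtime/config.toml}"
    # Replace an optionally commented-out no-cgroups line with the enabled
    # setting, keeping a backup copy next to the file as <file>.bak.
    sed -i.bak -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$config"
}
```

After running enable_no_cgroups as root inside the LXC, restarting Docker (for example with systemctl restart docker) and re-running the nvidia-smi test container is a reasonable way to confirm the change took effect.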

