
cuda's Introduction

System76 CUDA SDK + Tensorflow Packaging

Due to issues with how NVIDIA and Canonical package the CUDA toolkit, System76 offers better packaging for our users. This repository provides all of the Debian packaging information needed to build multiple versions of the CUDA toolkit in parallel, so that the resulting packages can be installed alongside each other. Users who install multiple toolkits only need update-alternatives to switch between versions of the toolkit.

This repository is designed around our debrep tool, which generates and maintains apt repositories from a TOML spec and a file system hierarchy. debrep merges the contents of the assets and debian directories, builds the result with sbuild, and stores the packages in a repository pool.

tensorflow-{cuda,cpu}

These packages are built with C, C++, and Python support. As with the CUDA packaging, you can switch between different versions of TensorFlow. We backport fixes to older versions of TensorFlow when possible. Our build system is based on FloopCZ's CMake build system for TensorFlow.

system76-cuda Metapackage

The cuda directory contains the metapackage required by each of the toolkits. This installs the required shared development dependencies, as well as some system configuration files to get toolkits working out of the box. This package should be built and installed first on the build server.

system76-cuda-X.Y Packages

The cuda-X.Y directories contain the specific versions of the toolkit, each of which depends on the system76-cuda metapackage built from the cuda directory. These can be built and installed in parallel, because they contain no conflicting files. After installation, update-alternatives adds a new entry for the /usr/lib/cuda symlink. Each cuda-X.Y directory contains its own Makefile, which downloads the installer and patches for that release if they are not already present in the directory.
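For illustration, a post-install step that registers a toolkit with update-alternatives could look roughly like the sketch below. It is hypothetical and not taken from this repository's maintainer scripts: the function name `register_cuda_alternative`, the `DRY_RUN` switch, and the example priority are assumptions. Only the command form `update-alternatives --install <link> <name> <path> <priority>` is the standard Debian alternatives interface.

```bash
#!/usr/bin/env bash
# Hypothetical postinst-style helper (not this repository's actual script).
# Registers /usr/lib/cuda-X.Y as a candidate target of the /usr/lib/cuda
# symlink under the alternative name "cuda".
register_cuda_alternative() {
    local version="$1"    # toolkit version, e.g. 11.2
    local priority="$2"   # assumed value; the highest priority wins in automatic mode
    local cmd=(update-alternatives --install /usr/lib/cuda cuda "/usr/lib/cuda-${version}" "$priority")

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (useful without root).
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running `DRY_RUN=1 register_cuda_alternative 11.2 112` prints the command instead of executing it. Because every toolkit registers under the same name `cuda`, installing several of them simply adds more candidates for one symlink rather than causing file conflicts.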

Listing & Switching Between Toolkits

The update-alternatives command may be used to view installed toolkits, and switch between them.

sudo update-alternatives --list cuda
sudo update-alternatives --config cuda

You may verify that you have the correct toolkit active by checking the version.txt file associated with that release of the toolkit:

$ cat /usr/lib/cuda/version.txt
CUDA Version 9.2.88
CUDA Patch Version 9.2.88.1
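If you want the active version in a script rather than reading the file by eye, a small helper along these lines may be convenient. It is a sketch: the function name `active_cuda_version` is hypothetical, and it assumes the `CUDA Version ...` line format shown above. Newer toolkit releases may ship a version.json file instead of version.txt, in which case this helper reports that the file cannot be read.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the version recorded in a toolkit's version.txt.
# Defaults to the toolkit currently selected behind the /usr/lib/cuda symlink.
active_cuda_version() {
    local file="${1:-/usr/lib/cuda/version.txt}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Matches "CUDA Version 9.2.88" but not "CUDA Patch Version ...".
    sed -n 's/^CUDA Version //p' "$file"
}
```

With the file shown above, `active_cuda_version` prints `9.2.88`. Running `readlink -f /usr/lib/cuda` additionally shows which cuda-X.Y directory the symlink currently resolves to.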

cuda's People

Contributors

acxz, alelopezperez, ids1024, jackpot51, jacobgkau, machielg, mmstick


cuda's Issues

system76-cuda-latest tensorflow-cuda-latest always requires upgrade

Hi,

After installing these packages everything works fine, but the system always reports that they can be upgraded.

2 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  system76-cuda-latest tensorflow-cuda-latest
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,424 B of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://apt.pop-os.org/proprietary bionic/main all system76-cuda-latest all 9.2 [2,200 B]
Get:2 http://apt.pop-os.org/proprietary bionic/main all tensorflow-cuda-latest all 1.9 [2,224 B]
Fetched 4,424 B in 1s (7,313 B/s)                  
(Reading database ... 484455 files and directories currently installed.)
Preparing to unpack .../system76-cuda-latest_9.2_all.deb ...
Unpacking system76-cuda-latest (9.2) over (9.2) ...
Preparing to unpack .../tensorflow-cuda-latest_1.9_all.deb ...
Unpacking tensorflow-cuda-latest (1.9) over (1.9) ...
Setting up system76-cuda-latest (9.2) ...
Setting up tensorflow-cuda-latest (1.9) ...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
➜  ~ sudo update-alternatives --list cuda
/usr/lib/cuda-9.2
➜  ~ cat /usr/lib/cuda/version.txt
CUDA Version 9.2.88
CUDA Patch Version 9.2.88.1
➜  ~ 

tensorflow-gpu warnings with cuda-11.2

Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda-11.2/lib

Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/lib64

These warnings appeared after installing the NVIDIA CUDA Toolkit with the following commands:
sudo apt install system76-cuda-latest
sudo apt install system76-cudnn-10.2
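To check which of the libraries named in these warnings can actually be resolved, a diagnostic along these lines may help. It is a sketch rather than part of the packaging: the function name `find_shared_lib` is hypothetical, and it only approximates the dynamic loader by searching the directories in `LD_LIBRARY_PATH` first and then the `ldconfig -p` cache.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: report where a shared library such as
# libcudnn.so.8 would be found. This approximates, but does not fully
# replicate, the dynamic loader's search order.
find_shared_lib() {
    local name="$1" dir hit
    local -a dirs
    # 1. Directories listed in LD_LIBRARY_PATH, in order.
    IFS=':' read -r -a dirs <<< "${LD_LIBRARY_PATH:-}"
    for dir in "${dirs[@]}"; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    # 2. The system linker cache, if ldconfig is on PATH.
    if command -v ldconfig >/dev/null 2>&1; then
        hit="$(ldconfig -p | awk -v n="$name" '$1 == n { print $NF; exit }')"
        if [ -n "$hit" ]; then
            echo "$hit"
            return 0
        fi
    fi
    echo "$name: not found" >&2
    return 1
}
```

For example, `for lib in libcusolver.so.10 libcudnn.so.8; do find_shared_lib "$lib"; done` lists which of the two libraries are visible in the current environment.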

add cuDNN 11.2

I'd like to use CUDA 11.2, but the cuDNN package for 11.2 is missing, even though cuDNN for 11.2 is now available.

Why /usr/lib instead of /usr/local ?

This package installs CUDA to /usr/lib. However, the default location seems to be /usr/local. For example, the mnistCUDNN sample from the code samples available here only compiles after editing its Makefile to point at this path.

I've tried torch and it works fine, but some packages rely on /usr/local as the installation directory. JAX, for example, only worked after I symlinked /usr/lib/cuda to /usr/local/cuda.

Is there a reason for this installation directory?
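The symlink workaround from this issue can be written as a small helper. This is a hedged sketch: the function name `link_cuda_prefix` is hypothetical, and the default paths are the ones from the issue. Pointing /usr/local/cuda at /usr/lib/cuda, rather than at a specific /usr/lib/cuda-X.Y directory, means it follows whichever toolkit update-alternatives currently selects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expose the alternatives-managed toolkit at a second
# prefix for tools that expect CUDA under /usr/local/cuda.
link_cuda_prefix() {
    local target="${1:-/usr/lib/cuda}"
    local link="${2:-/usr/local/cuda}"
    # Refuse to clobber an existing directory, file, or (possibly dangling) symlink.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists; not overwriting" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

With the defaults, this is equivalent to `sudo ln -s /usr/lib/cuda /usr/local/cuda` and must be run as root.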

The following packages have unmet dependencies: tensorflow-1.12-cuda-10.0

I'm using Pop!_OS 18.10. When I tried to install TensorFlow, I got the following errors:

sudo apt install tensorflow-1.12-cuda-10.0
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
tensorflow-1.12-cuda-10.0 : Depends: python3-keras (>= 2.2.2) but it is not going to be installed
Depends: python3-keras-applications (>= 1.0.5) but it is not installable
Depends: python3-keras-preprocessing (>= 1.0.3) but it is not installable
E: Unable to correct problems, you have held broken packages.

findgllib.mk does not work in pop for cuda programs that use GL

The findgllib.mk file is executed by make in every sample that uses GL for visualization. It tries to find libGL.so and libGLU.so on the system, but only does so for supported distributions (Ubuntu, Fedora, RHEL, CentOS and SUSE). When these programs are compiled on Pop!_OS, make reports that it cannot find those libraries and fails.

This can be fixed by changing the default findgllib.mk file that ships with this repository. One option is to add a case for Pop!_OS; for the version that ships with cuda-11.2, add the following after line 62:

POP   = $(shell echo $(DISTRO) | grep -i 'pop' >/dev/null 2>&1; echo $$?)
ifeq ("$(POP)","0")
      GLPATH    ?= /usr/lib
      GLLINK    ?= -L/usr/lib
      DFLT_PATH ?= /usr/lib
endif

This works because these files are installed with the CUDA toolkit under /usr/lib/cuda/.

Alternatively, a much simpler solution is to move line 89 (DFLT_PATH ?= /usr/lib) after all of the if statements that check for distributions, right before lines 114 and 115, which actually search for libGL.so and libGLU.so. I believe this is what the author of the file intended, since /usr/lib is the last place to look when the distribution cannot be recognized. The header files are already found this way, which works because that search starts from a default, distribution-independent path.
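The library search that happens once these paths are set can be approximated in shell as below. This is an assumption-based sketch, not code copied from findgllib.mk, and the function name `find_gl_libs` is hypothetical. It illustrates why a distribution-independent default such as /usr/lib is enough: on Ubuntu-based systems the GL development libraries typically live in a multiarch subdirectory such as /usr/lib/x86_64-linux-gnu, and `find` searches recursively.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of the GL library search in findgllib.mk:
# recursively look (following symlinks) for libGL.so and libGLU.so under
# the given directories and print the first match for each.
find_gl_libs() {
    local lib path status=0
    # Fall back to the distribution-independent default when no path is given.
    [ "$#" -gt 0 ] || set -- /usr/lib
    for lib in libGL.so libGLU.so; do
        path="$(find -L "$@" -name "$lib" -print 2>/dev/null | head -n 1)"
        if [ -n "$path" ]; then
            echo "$lib: $path"
        else
            echo "$lib: not found" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `find_gl_libs` with no arguments searches /usr/lib and returns a non-zero status if either library is missing.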

CUDA 10.1 missing

Packages exist for cuda-10.2 and cuda-10.0, but not for cuda-10.1.
Some users may want to avoid cuda-10.2 because of known issues between PyTorch and cuda-10.2.

I'm running 20.04.

CUDA 11.3?

Hi all,

Thanks for your work packaging CUDA in an easy way for system76 machines!

PyTorch has moved up to CUDA 11.3 (see https://pytorch.org/get-started/locally/); does system76 expect to keep these releases up to date with NVIDIA releases, or should I install directly from NVIDIA if I need newer CUDA?

Thanks!
