
kaixhin / dockerfiles

503 stars, 32 watchers, 128 forks, 422 KB

Compilation of Dockerfiles with automated builds enabled on the Docker Registry

Home Page: https://hub.docker.com/u/kaixhin/

License: MIT License

Shell 1.92% Dockerfile 98.08%
docker dockerfiles machine-learning cuda vnc deep-learning

dockerfiles's People

Contributors

kaixhin, nakosung, natsuki14, scott-vsi


dockerfiles's Issues

Different Nvidia drivers for each image?

Is it not possible to use the same NVIDIA driver version across all of your images? Otherwise, as it stands, I need to create a completely different AWS image for each Docker container I want to run, given that the containers only seem to work when the host has the exact point release of the NVIDIA driver.

CNMeM support?

Will there be CNMeM support in the Lasagne image eventually?

error: implicit declaration of function 'THLongStorage_calculateExpandGeometry'

The error was:

/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]

The last part of the build log was:

Scanning dependencies of target THC
[ 81%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 82%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
[ 83%] Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
[ 84%] Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
[ 86%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
[ 86%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
[ 87%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
[ 88%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
In file included from generic/THCTensor.c:1:0,
                 from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCGenerateAllTypes.h:17,
                 from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newExpand':
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]
   THLongStorage_calculateExpandGeometry(tensor->size,

This occurred while building the GPU CUDA 8.0 Docker image.
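One plausible cause, not confirmed here: the cutorch rock being built is newer than the torch7 (TH) library already installed in the image, so the TH symbol THLongStorage_calculateExpandGeometry does not exist yet. A hedged workaround sketch, assuming a torch distro checkout at /root/torch with its update.sh script:

# Untested sketch: refresh torch7 so TH exports the newer symbol,
# then rebuild cutorch against it.
cd /root/torch && git pull && ./update.sh
luarocks install cutorch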

[Theano] ValueError: Invalid value ("cpu") for configuration variable "gpu". Valid options start with one of "device", "opencl", "cuda"

I run the image as follows:

sudo nvidia-docker run -it kaixhin/cuda-theano:8.0

and when I try to test Theano as follows:

python -c "import theano"

I get the following error:

root@57ec910ade69:/# python2 -c "import theano"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 67, in <module>
    from theano.configdefaults import config
  File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 113, in <module>
    in_c_key=False)
  File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 285, in AddConfigVar
    configparam.get(root, type(root), delete_key=True)
  File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 333, in get
    self.set(cls, val_str)
  File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 344, in set
    self.val = self.filter(val)
  File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 100, in filter
    % (self.default, val, self.fullname)))
ValueError: Invalid value ("cpu") for configuration variable "gpu". Valid options start with one of "device", "opencl", "cuda"
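A hedged guess at a workaround: newer Theano releases dropped the old device=gpu backend in favour of libgpuarray's device=cuda, and the image's baked-in THEANO_FLAGS may predate that change. Overriding the flags at container start might help (untested; the flag values are assumptions):

# Untested: override the image's THEANO_FLAGS with the new-style
# device name when starting the container.
sudo nvidia-docker run -it -e THEANO_FLAGS='device=cuda,floatX=float32' kaixhin/cuda-theano:8.0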

Add documentation for cuda-caffe

When running nvidia-docker run -it kaixhin/cuda-caffe:8.0, one ends up in a ~/caffe folder, but Caffe does not seem to be available via the caffe command. It also looks like Caffe is not actually built. How is one supposed to use this image?
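Until documentation lands, a hedged sketch of how one might proceed, assuming a standard Makefile-based Caffe checkout with a Makefile.config already in place (not verified against this image):

# Untested: build Caffe and its Python bindings inside the container.
cd ~/caffe
make all -j"$(nproc)" && make pycaffe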

cuda-torch/cuda_v8.0 fails

Two warnings are treated as errors by the compiler when compiling THC. Flags allowing these warnings through might help (a hedged sketch follows the build log below).

Scanning dependencies of target THC
[ 82%] [ 83%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 84%] [ 86%] [ 88%] [ 88%] [ 89%] [ 90%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:17,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:3: error: implicit declaration of function 'THLongStorage_newInferSize' [-Werror=implicit-function-declaration]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:18,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaCharTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:19,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaShortTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:20,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaIntTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:21,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaLongTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:22,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaHalfTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:23,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:24,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaDoubleTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
cc1: some warnings being treated as errors
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THCTensor.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.
Installing https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install

The command '/bin/sh -c luarocks install cutorch && luarocks install cunn && luarocks install cudnn' returned a non-zero code: 1
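Following the suggestion above, a hedged workaround sketch, assuming the rock's CMake build picks up CFLAGS from the environment (untested):

# Untested: demote the implicit-declaration error back to a warning
# while luarocks builds cutorch.
CFLAGS="-Wno-error=implicit-function-declaration" luarocks install cutorch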

cudnn is where exactly?

Hi,

I'm using your cuda-torch container via docker hub, but see no indication that cudnn is installed (libcudnn.so.* not found anywhere). Was this introduced only in a later version?

Thanks,
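As an aside, two quick ways to check whether cuDNN is present in a running container (standard tools, nothing image-specific):

# List the libraries the dynamic linker knows about, filtered for cuDNN.
ldconfig -p | grep -i cudnn
# Fall back to a filesystem search.
find / -name 'libcudnn*' 2>/dev/null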

Problem running digits server: 'No module named digits'

Running docker run --rm -p 8080:5000 kaixhin/digits gives me /usr/bin/python: No module named digits.

Running

docker run -it --rm kaixhin/digits bash
cd /root/digits/
pip install -r requirements.txt
/root/digits/digits-devserver

fixes the problem, and the DIGITS server starts.
So is there a problem with the digits image?

could not find boost

When I build another Caffe using your environment, I get this error. Can you tell me where you put the Boost library?
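A hedged guess, not confirmed against this image: Caffe's build finds Boost via the system development packages, so installing them may resolve the error.

# Untested: install the Boost development headers and libraries.
apt-get update && apt-get install -y libboost-all-dev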

Torch installs without itorch

For some reason it seems that itorch isn't installed together with torch (I believe that it was installed in earlier versions):

max@max-UX31A:~/$ sudo docker run --rm -it -p 8888:8888 kaixhin/cuda-torch
Unable to find image 'kaixhin/cuda-torch:latest' locally
latest: Pulling from kaixhin/cuda-torch
bbe1c4256df3: Pull complete 
911d09728ffd: Pull complete 
615765bc0d9f: Pull complete 
a3ed95caeb02: Pull complete 
f6c40ea017da: Pull complete 
a53854637f3f: Pull complete 
1cd0c8506d8b: Pull complete 
687b23b1ba76: Pull complete 
73a547b0c44e: Pull complete 
964cc0d8070b: Pull complete 
c146c215733f: Pull complete 
d0ba2846eec7: Pull complete 
Digest: sha256:2e22615195b4ebb19bd633a9afca997e083625b70e457f2ad1a847f96aec7ad7
Status: Downloaded newer image for kaixhin/cuda-torch:latest
root@e80bdeb5c974:~/torch# ls
CMakeLists.txt  README.md  clean.sh  exe    install       install.sh  test.sh
LICENSE.md      build      cmake     extra  install-deps  pkg         update.sh
root@e80bdeb5c974:~/torch# itorch
bash: itorch: command not found

Since the Jupyter notebook is one of the most natural ways of interacting with a Docker container, I believe it would be beneficial to add iTorch to the build, or to have separate builds that include it.
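A hedged sketch of installing iTorch manually inside the container, following iTorch's own install instructions (the clone path is an assumption; untested against this image):

# Untested: clone iTorch and build it with luarocks; the image already
# ships Jupyter, which iTorch's kernel registration needs.
git clone https://github.com/facebook/iTorch.git /root/iTorch
cd /root/iTorch && luarocks make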

kaixhin/digits has a problem

In the Docker registry image Ubuntu Core 14.04 + Pycaffe + DIGITS (CPU-only) (https://hub.docker.com/r/kaixhin/digits/), it seems libdc1394 is missing.

Here is the run log:
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.wtf is deprecated, use flask_wtf instead.
.format(x=modname), ExtDeprecationWarning
libdc1394 error: Failed to initialize libdc1394
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.socketio is deprecated, use flask_socketio instead.
.format(x=modname), ExtDeprecationWarning
Traceback (most recent call last):
  File "/root/digits/digits-devserver", line 42, in <module>
    from digits.webapp import app, socketio, scheduler
  File "/root/digits/digits/webapp.py", line 38, in <module>
    import digits.views
  File "/root/digits/digits/views.py", line 538, in <module>
Default value for torch_root "" invalid:
torch binary not found in PATH
    app.register_error_handler(code, handle_error)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1186, in register_error_handler
    self._register_error_handler(None, code_or_exception, f)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 65, in wrapper_func
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1201, in _register_error_handler
    exc_class, code = self._get_exc_class_and_code(code_or_exception)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1121, in _get_exc_class_and_code
    exc_class = default_exceptions[exc_class_or_code]
KeyError: 300
Exception KeyError: KeyError(140559450766864,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
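Two separate things seem to be going on here. The libdc1394 message is a well-known warning in containers and can usually be silenced (hedged, untested here):

# Untested: give libdc1394 a harmless device node to open.
ln -s /dev/null /dev/raw1394

The crash itself (KeyError: 300 in register_error_handler) looks more like a Flask version incompatibility with the DIGITS code than anything to do with libdc1394, though that is an inference from the traceback, not a confirmed diagnosis.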

Migrate to 16.04

All images should move from Ubuntu 14.04 LTS to 16.04 LTS, except for CUDA images where versions <= 7.5 should remain with 14.04 and versions >= 8.0 should migrate (see NVIDIA/nvidia-docker#110).

  • brainstorm
  • caffe
  • cuda-brainstorm
  • cuda-caffe
  • cuda-digits
  • cuda-keras
  • cuda-lasagne
  • cuda-mxnet
  • cuda-neon
  • cuda-pylearn2
  • cuda-ssh
  • cuda-theano
  • cuda-torch
  • cuda-vnc
  • digits
  • fglab
  • fgmachine
  • keras
  • lasagne
  • localtunnel
  • mxnet
  • neon
  • neuron (LXTerminal not working)
  • pylearn2
  • ros
  • samba
  • spearmint
  • ssh
  • sshx
  • theano
  • torch
  • vnc (LXTerminal not working)
  • vnc-ros (LXTerminal not working)

Theano couldn't use cudnn

Using Docker 1.10.3 on Ubuntu 14.04 to run docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 kaixhin/cuda-keras:7.0, then starting Python inside the container and importing Theano, it says "CuDNN not available". Trying to force cuDNN by adding "optimizer_including=cudnn" shows that Theano cannot find cudnn.h.

It looks to me like, in addition to installing libcudnn4 in the Dockerfile, you also need to install libcudnn4-dev to provide the header file.
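A hedged Dockerfile sketch of that suggestion, assuming libcudnn4-dev is available from the same repository that provides libcudnn4 (untested):

# Untested: install the cuDNN headers alongside the runtime library.
RUN apt-get update && apt-get install -y libcudnn4-dev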

Migrate to NVIDIA Docker

The NVIDIA Docker project seems reasonably stable and migrating will allow a range of drivers to be used (closing #5 and #7). The following need to be built and tested over all supported CUDA versions:

  • cuda-brainstorm
  • cuda-caffe
  • cuda-digits
  • cuda-keras
  • cuda-lasagne
  • cuda-mxnet
  • cuda-neon
  • cuda-pylearn2
  • cuda-ssh
  • cuda-theano
  • cuda-torch
  • cuda-vnc

difference between cuda-mxnet and cuda-mxnet:7.0

What is the difference between cuda-mxnet and cuda-mxnet:7.0 image?
The MXNet Python demo works properly on cuda-mxnet:7.0, but fails on cuda-mxnet.

Run demo in cuda-mxnet:7.0

nvidia-docker run -it --rm kaixhin/cuda-mxnet:7.0 python example/image-classification/train_mnist.py --network lenet --gpus 0

This command returns no errors.

Run demo in cuda-mxnet

nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0

This one returns an error. These are the error messages:

Archive:  mnist.zip
  inflating: t10k-images-idx3-ubyte  
  inflating: t10k-labels-idx1-ubyte  
  inflating: train-images-idx3-ubyte  
  inflating: train-labels-idx1-ubyte  
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 60000 images, shuffle=1, shape=(128,1,28,28)
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 10000 images, shuffle=1, shape=(128,1,28,28)
2016-10-03 06:07:46,795 Node[0] Start training with [gpu(0)]
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
  what():  [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Problem with six version and cuda-theano

Hi Kaixhin,
first, thanks for the great images! I want to report a problem that I have with the cuda-theano image though. I am still having the issue which was marked as resolved here.

Here are my specs:

  • Ubuntu 14.04
  • NVIDIA driver version: 361.93.02
  • GPU: Quadro K6000

With the images kaixhin/cuda-theano:7.5 and kaixhin/cuda-theano:8.0 I get the following error after importing theano:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
    from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
    from theano.scan_module import scan_op
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
    from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from

I have six-1.5.2 installed in /usr/lib/python2.7 and six-1.11.0 in /usr/local/lib/python2.7.

If I force Python to use the six version in /usr/local/lib/python2.7 (by putting /usr/local/lib/python2.7/dist-packages at the beginning of my sys.path), I get another error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
    from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 60, in <module>
    from theano import tensor, scalar
ImportError: cannot import name tensor

If I install six-1.11.0 manually using setup.py from https://pypi.python.org/pypi/six#downloads, it seems to work. At least I get a different error now:

Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
In file included from /tmp/try_flags_ZG8wmH.c:4:0:
/usr/include/cudnn.h:63:26: fatal error: driver_types.h: No such file or directory
 #include "driver_types.h"
                          ^
compilation terminated.

Mapped name None to device cuda: Quadro K6000 (0000:03:00.0)

However, with the kaixhin/theano image (without cuda), I don't get the error and everything works fine.

Any idea what is wrong here?

Thanks!
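On the final error only, a hedged observation: cudnn.h includes driver_types.h, which ships with the CUDA toolkit, so pointing the compiler at the CUDA headers may help (untested; the path below is the usual default and an assumption here):

# Untested: make the CUDA toolkit headers visible to the compiler.
export CPATH=/usr/local/cuda/include:$CPATH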

Building cuda images locally fail

Hi,

I have been trying to build the cuda-torch v8.0 image locally. I have concatenated all your Dockerfiles like below, but I am encountering errors.

The obvious ones are easy to fix (install-deps: line 151: sudo: command not found), but I then hit Failed copying contents of 'lua' directory, which I am unable to fix.

Any ideas?

FROM nvidia/cuda:8.0-cudnn5-devel

# Install git, apt-add-repository and dependencies for iTorch
RUN apt-get update && apt-get install -y \
  git \
  software-properties-common \
  ipython3 \
  libssl-dev \
  libzmq3-dev \
  python-zmq \
  python-pip

# Install Jupyter Notebook for iTorch
RUN pip install notebook ipywidgets

# Run Torch7 installation scripts (dependencies only)
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
  cd /root/torch && \
  bash install-deps

# Run Torch7 installation scripts
RUN cd /root/torch && \
# Run without nvcc to prevent timeouts
  sed -i 's/path_to_nvcc=$(which nvcc)/path_to_nvcc=$(which no_nvcc)/g' install.sh && \
  sed -i 's,path_to_nvcc=/usr/local/cuda/bin/nvcc,path_to_nvcc=,g' install.sh && \
  ./install.sh

# Export environment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
ENV LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
ENV PATH=/root/torch/install/bin:$PATH
ENV LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH
ENV DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH

# Restore Torch7 installation script
RUN cd /root/torch && \
  sed -i 's/path_to_nvcc=$(which no_nvcc)/path_to_nvcc=$(which nvcc)/g' install.sh

# Install CUDA libraries
RUN luarocks install cutorch && \
  luarocks install cunn && \
  luarocks install cudnn
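For the first error, a hedged fix: the nvidia/cuda base image does not ship sudo, which install-deps expects. Adding it to the apt-get list above (or as its own step) should get past that point (untested):

# Untested: provide the sudo binary that install-deps calls.
RUN apt-get update && apt-get install -y sudo

The "Failed copying contents of 'lua' directory" error is left as reported; its cause is not clear from the log.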

Latest CUDA 7.5

It seems that the latest version of CUDA 7.5 (352.63), which fixes a critical bug on EC2, is only available as a .deb installer, not as a .run installer. Hence the Dockerfile in this repo will not install the latest version of CUDA 7.5.

I'm not sure this is an "issue" that should be fixed, but I think it's worth pointing out in case someone else runs into the same situation.

add wget in mxnet Dockerfile

The MXNet Python package documentation includes a demo for a quick test; however, this demo requires wget to download the dataset.

It would be better to add wget to the dependencies so that users can run the test directly with:

nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0
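A hedged sketch of the requested change (untested; it could equally be folded into the image's existing apt-get layer):

# Untested: add wget so the MXNet demo can fetch its dataset.
RUN apt-get update && apt-get install -y wget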

cuda-theano:7.5 import error

I run the cuda-theano image using the command:

sudo nvidia-docker run -it --name theano kaixhin/cuda-theano:7.5 /bin/bash

Inside the container, I test GPU Theano and get this error:

root@cd170ec8e8f9:~# python -c "import theano; theano.test()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 79, in <module>
    from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
    from theano.scan_module import scan_op
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
    from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from

I'm sure nvidia-docker works correctly on my machine.
Could I get some help?
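Based on the six-version issue reported above, a hedged workaround to try first (untested against this tag):

# Untested: make sure the newer six (which provides raise_from) wins.
pip install --upgrade six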

Cuda-Digits:8.0 with Multi GPU

I am getting an error while training in DIGITS:

"ERROR: USE_NCCL := 1 must be specified for multi-GPU"

I guess Caffe should be compiled with USE_NCCL := 1.

How can I do this? I am new to Docker.
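A hedged sketch with several assumptions: that Caffe lives at ~/caffe in this image, that it uses a Makefile.config-based build, and that NCCL itself is already installed (untested):

# Untested: enable NCCL and rebuild Caffe inside the container.
cd ~/caffe
echo 'USE_NCCL := 1' >> Makefile.config
make clean && make all -j"$(nproc)" && make pycaffe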

an env question of cuda-torch

I pulled the cuda-torch container per the instructions. However, I find that neither /root/.luarocks/config-5.1.lua nor /usr/local/share/lua/ exists in the container. Why do we need to set LUA_PATH and LUA_CPATH to include paths under /root/.luarocks/ and /usr/local/share/lua/?

Fix builds

Images in bold have builds disabled because their linked dependent repository was removed.

  • cuda-torch
  • keras
  • cuda-keras
  • pylearn2
  • cuda-pylearn2
  • neon
  • cuda-neon

cuda-ssh does not allow root logins on cuda_v8.0 tag

The cuda-ssh Dockerfile appears to have an error where sshd_config is modified: root logins remain prohibited when the container is built from the cuda_v8.0 tag.

The line:

# Allow root login with password
sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \

should be the following (I think):

# Allow root login with password
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \

Rebuilding the container from the modified dockerfile allowed root login via ssh for me.
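A hedged variant that would cover both the 14.04 and 16.04 defaults in one expression (untested):

# Untested: match either sshd default before rewriting it.
sed -i 's/PermitRootLogin \(without-password\|prohibit-password\)/PermitRootLogin yes/' /etc/ssh/sshd_config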

Spearmint script?

Spearmint requires several steps to get started; this could probably be reduced to make things smoother (for FGLab for example). Additionally it should be reasonably easy to have a separate MongoDB container. The disadvantage is keeping Spearmint documentation within this project, which requires keeping up to date with any potential API changes.

Pinging @gngdb for feedback on how to approach this.

VNC caffe cuda error

I built an image with Caffe, CUDA 8.0, cuDNN 5, and a VNC server.

Via the command line,

docker exec -it container_name bash

I ran py-faster-rcnn with Caffe, and it worked well.

But via VNC, I used Jump Desktop to connect to the same container and ran the same demo. It reported an error:

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 E1228 06:30:11.450963  3340 common.cpp:104] Cannot create Cublas handle. Cublas won't be available.
 E1228 06:30:11.451587  3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
 E1228 06:30:11.451587  3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
 F1228 06:30:11.452177  3340 common.cpp:142] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA run\
 time version

cuda-mxnet doesn't work anymore

I tried to build the image, but it failed with this error message:

g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-4.8/README.Bugs for instructions.
make: *** [build/src/operator/tensor/control_flow_op.o] Error 4
The command '/bin/sh -c cd /workspace && git clone --recursive https://github.com/dmlc/mxnet && cd mxnet && cp make/config.mk config.mk && sed -i 's/USE_BLAS = atlas/USE_BLAS = openblas/g' config.mk && sed -i 's/USE_CUDA = 0/USE_CUDA = 1/g' config.mk && sed -i 's/USE_CUDA_PATH = NONE/USE_CUDA_PATH = /usr/local/cuda/g' config.mk && sed -i 's/USE_CUDNN = 0/USE_CUDNN = 1/g' config.mk && sed -i 's/USE_DIST_KVSTORE = 0/USE_DIST_KVSTORE = 1/g' config.mk && make -j"$(nproc)"' returned a non-zero code: 2
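"internal compiler error: Killed" usually means the kernel's OOM killer terminated cc1plus, i.e. the build ran out of memory rather than hitting a compiler bug. A hedged workaround is to lower the parallelism of the make step in the Dockerfile (untested):

# Untested: check for OOM kills after a failed build...
dmesg | grep -i 'killed process'
# ...and, if confirmed, replace make -j"$(nproc)" with e.g.:
make -j2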

Failed to initialize NVML: GPU access blocked by the operating system

Hi, first of all, thanks for sharing these Dockerfiles. I've been trying to use your kaixhin/cuda, but I can't access the GPUs within the container. I'm fairly certain both the host and container are running the same CUDA versions, 7.0.28. But nvidia-smi always outputs Failed to initialize NVML: GPU access blocked by the operating system. Also nvidia-smi -a produces the same error, so I can't find a way to get more information about this error. Do you have any ideas what this could be caused by?

Thanks!

Brendan

Within the docker container:

$ docker run -ti -v `pwd`/NVIDIA_CUDA-7.0_Samples:/cudasamples --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidia2:/dev/nvidia2 --device /dev/nvidia3:/dev/nvidia3 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm kaixhin/cuda /bin/bash
root@9279fc160f42:/# nvidia-smi 
Failed to initialize NVML: GPU access blocked by the operating system
root@9279fc160f42:/# /cudasamples/1_Utilities/deviceQuery/deviceQuery 
/cudasamples/1_Utilities/deviceQuery/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
root@9279fc160f42:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27

On the host:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
$ modinfo nvidia | grep version
version:        346.47
vermagic:       3.16.0-31-generic SMP mod_unload modversions 
$ nvidia-smi
Wed Apr  8 23:47:44 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.47     Driver Version: 346.47         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:04:00.0     Off |                  N/A |
| 26%   28C    P8    14W / 250W |     15MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  On   | 0000:08:00.0     Off |                  N/A |
| 26%   28C    P8    14W / 250W |     15MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  On   | 0000:85:00.0     Off |                  N/A |
| 26%   29C    P8    13W / 250W |     15MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  On   | 0000:89:00.0     Off |                  N/A |
| 26%   28C    P8    14W / 250W |     15MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

exec format error

I tried to install this in my Hass.io environment (Home Assistant) and got the following error when starting the container:
standard_init_linux.go:211: exec user process caused "exec format error"
Any ideas?
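"exec format error" typically indicates an architecture mismatch: these images are built for x86_64, while Home Assistant installs often run on ARM hardware. A quick check (the image name below is an example, not a specific diagnosis):

# Compare the host architecture with the image's.
uname -m
docker inspect --format '{{.Architecture}}' kaixhin/cuda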

ENV issue

ENV PATH=/root/torch/install/bin:$PATH
should be ENV PATH /root/torch/install/bin:$PATH
in dockerfiles/cuda-torch-plus/Dockerfile

See here zeromq/jzmq@5558dc0

Official docker image & multi-arch

Hi,

I want to check whether there are any plans to make these Docker images official. The steps for doing that are described here: https://docs.docker.com/docker-hub/official_repos/#how-do-i-create-a-new-official-repository

Eventually I am interested in creating and publishing multi-arch images on Docker Hub (especially for ppc64le), as described at https://github.com/docker-library/official-images#multiple-architectures; the first step for that is to have an official Intel image on Docker Hub.

VNC image local build error

Hey there, great images. I am having some issues building them locally.

Step 4/10 : ENV USER root
 ---> Running in 6b2c881d3107
 ---> 9c91f0c8818e
Removing intermediate container 6b2c881d3107
Step 5/10 : COPY password.txt .
 ---> 71f859bc9d2d
Removing intermediate container 7a7772cc3d2a
Step 6/10 : RUN cat password.txt password.txt | vncpasswd &&   rm password.txt
 ---> Running in 10ebdba5fef9
Using password file /root/.vnc/passwd
VNC directory /root/.vnc does not exist, creating.
Password: Warning: password truncated to the length of 8.
Verify:   Passwords do not match. Please try again.

Password: Password too short
The command '/bin/sh -c cat password.txt password.txt | vncpasswd && rm password.txt' returned a non-zero code: 1

I tried setting a longer password, but that didn't work.
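A hedged diagnosis: vncpasswd reads the password and its confirmation as two separate lines, so if password.txt lacks a trailing newline, cat password.txt password.txt produces one long run-together line, which is truncated to 8 characters while the confirmation read comes up empty ("Password too short"). If that is the cause, writing the file with an explicit newline should fix it (untested; the password value is a placeholder):

# Untested: 6-8 characters, with a trailing newline.
printf 'secretpw\n' > password.txt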
