kaixhin / dockerfiles
Compilation of Dockerfiles with automated builds enabled on the Docker Registry
Home Page: https://hub.docker.com/u/kaixhin/
License: MIT License
README.md
... (including iTorch)
But:
root@f34323292132:~/torch# itorch notebook
bash: itorch: command not found
Is it not possible to use the same NVIDIA driver version across all of your images? As it stands, I need to create a completely different AWS image for each Docker container I want to run, since the containers only seem to work when the host has the exact point version of the NVIDIA drivers.
Will there be CNMeM support in the Lasagne image eventually?
The error was
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]
The last stack trace was
Scanning dependencies of target THC
[ 81%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 82%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
[ 83%] Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
[ 84%] Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
[ 86%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
[ 86%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
[ 87%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
[ 88%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCGenerateAllTypes.h:17,
from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newExpand':
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]
THLongStorage_calculateExpandGeometry(tensor->size,
when building the GPU CUDA 8.0 Docker image.
I run the image as follows:
sudo nvidia-docker run -it kaixhin/cuda-theano:8.0
and when I try to test Theano as follows:
python -c "import theano"
I get the following error:
root@57ec910ade69:/# python2 -c "import theano"
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/theano/init.py", line 67, in
from theano.configdefaults import config
File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 113, in
in_c_key=False)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 285, in AddConfigVar
configparam.get(root, type(root), delete_key=True)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 333, in get
self.set(cls, val_str)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 344, in set
self.val = self.filter(val)
File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 100, in filter
% (self.default, val, self.fullname)))
ValueError: Invalid value ("cpu") for configuration variable "gpu". Valid options start with one of "device", "opencl", "cuda"
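The message comes from Theano's newer gpuarray backend, where the old `device=gpu` value was renamed. A hedged fix sketch, assuming the default config file location, is to request the new device name in `~/.theanorc`:

```ini
# ~/.theanorc -- sketch only: the gpuarray backend expects "cuda"
# (or "cuda0" for a specific GPU) where the old backend took "gpu"
[global]
device = cuda
floatX = float32
```

The same flag can be passed per invocation via the THEANO_FLAGS environment variable instead of the config file.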
When running nvidia-docker run -it kaixhin/cuda-caffe:8.0
one ends up in a ~/caffe folder, but Caffe does not seem to be available via the caffe command. It also looks like Caffe is not actually built. How is one supposed to use this image?
Two warnings are being treated as errors by the compiler when compiling THC. Flags that allow these warnings through might help.
Scanning dependencies of target THC
[ 82%] [ 83%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 84%] [ 86%] [ 88%] [ 88%] [ 89%] [ 90%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:17,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:3: error: implicit declaration of function 'THLongStorage_newInferSize' [-Werror=implicit-function-declaration]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:18,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaCharTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:19,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaShortTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:20,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaIntTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:21,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaLongTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:22,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaHalfTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:23,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:24,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaDoubleTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
cc1: some warnings being treated as errors
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THCTensor.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
Error: Build error: Failed building.
Installing https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
The command '/bin/sh -c luarocks install cutorch && luarocks install cunn && luarocks install cudnn' returned a non-zero code: 1
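A hedged workaround sketch: the missing TH symbol suggests the torch core checkout is older than the cutorch rock, so refreshing the checkout before reinstalling may help (update.sh ships in the torch/distro repo; the path is the image's default):

```dockerfile
# Sketch only: refresh the torch core so its TH headers match the
# cutorch rock being installed, then retry the CUDA rocks
RUN cd /root/torch && \
    git pull && git submodule update --init --recursive && \
    ./update.sh && \
    luarocks install cutorch && luarocks install cunn && luarocks install cudnn
```

This assumes the build failure is a version skew between torch core and cutorch rather than a compiler issue.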
Hi,
I'm using your cuda-torch container via Docker Hub, but I see no indication that cuDNN is installed (libcudnn.so.* is not found anywhere). Was this introduced only in a later version?
Thanks,
Running docker run --rm -p 8080:5000 kaixhin/digits
gives me /usr/bin/python: No module named digits.
Running
docker run -it --rm kaixhin/digits bash
cd /root/digits/
pip install -r requirements.txt
/root/digits/digits-devserver
fixes the problem and the DIGITS server starts.
So is there a problem with the digits image?
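The manual steps above could be baked into the image; a minimal sketch, assuming the DIGITS checkout lives at /root/digits as in the report:

```dockerfile
# Sketch only: install DIGITS' Python requirements at build time so the
# devserver starts without manual intervention (path is from the report above)
RUN cd /root/digits && pip install -r requirements.txt
```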
Can you push an update to DIGITS (https://hub.docker.com/r/kaixhin/digits/) to 6.0, or is there a reason you are sticking with 5?
When I build another Caffe variant with your environment, I get this error. Can you tell me where you put your Boost library?
For some reason it seems that itorch isn't installed together with torch (I believe that it was installed in earlier versions):
max@max-UX31A:~/$ sudo docker run --rm -it -p 8888:8888 kaixhin/cuda-torch
Unable to find image 'kaixhin/cuda-torch:latest' locally
latest: Pulling from kaixhin/cuda-torch
bbe1c4256df3: Pull complete
911d09728ffd: Pull complete
615765bc0d9f: Pull complete
a3ed95caeb02: Pull complete
f6c40ea017da: Pull complete
a53854637f3f: Pull complete
1cd0c8506d8b: Pull complete
687b23b1ba76: Pull complete
73a547b0c44e: Pull complete
964cc0d8070b: Pull complete
c146c215733f: Pull complete
d0ba2846eec7: Pull complete
Digest: sha256:2e22615195b4ebb19bd633a9afca997e083625b70e457f2ad1a847f96aec7ad7
Status: Downloaded newer image for kaixhin/cuda-torch:latest
root@e80bdeb5c974:~/torch# ls
CMakeLists.txt README.md clean.sh exe install install.sh test.sh
LICENSE.md build cmake extra install-deps pkg update.sh
root@e80bdeb5c974:~/torch# itorch
bash: itorch: command not found
Since the Jupyter notebook is one of the most natural ways of interacting with a Docker container, I believe it would be beneficial to add iTorch to the build, or to have separate builds that include iTorch.
$ nvidia-docker run -it kaixhin/cuda-torch:8.0
Tag 8.0 not found in repository docker.io/kaixhin/cuda-torch
Getting the following error when building cudnn on the 7.0 tag release.
CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find CUDA: Found unsuitable version "7.0", but required is at
least "7.5" (found /usr/local/cuda)
You can see this error is generated on your automated build as well: Docker Hub Build Log
If using the official CUDA installer, I can find cuda-install-samples-7.0.sh.
http://docs.nvidia.com/cuda/cuda-samples/index.html#getting-cuda-samples
However, after building from your Dockerfile, I cannot find this script.
How can I include the CUDA samples in the Docker image? I want to test CUDA on the GPU using the deviceQuery sample.
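One hedged way to get deviceQuery into the image, assuming the toolkit install places the samples under /usr/local/cuda/samples (true for the .run installer; layouts vary):

```dockerfile
# Sketch only: copy and build the deviceQuery sample so the GPU can be
# smoke-tested from inside the container
RUN cp -r /usr/local/cuda/samples /root/cuda-samples && \
    cd /root/cuda-samples/1_Utilities/deviceQuery && \
    make
# at run time: /root/cuda-samples/1_Utilities/deviceQuery/deviceQuery
```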
In the Docker registry image Ubuntu Core 14.04 + Pycaffe + DIGITS (CPU-only) at https://hub.docker.com/r/kaixhin/digits/
it seems libdc1394 is missing.
Here is the run log:
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.wtf is deprecated, use flask_wtf instead.
.format(x=modname), ExtDeprecationWarning
libdc1394 error: Failed to initialize libdc1394
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.socketio is deprecated, use flask_socketio instead.
.format(x=modname), ExtDeprecationWarning
Traceback (most recent call last):
File "/root/digits/digits-devserver", line 42, in
from digits.webapp import app, socketio, scheduler
File "/root/digits/digits/webapp.py", line 38, in
import digits.views
File "/root/digits/digits/views.py", line 538, in
Default value for torch_root "" invalid:
torch binary not found in PATH
app.register_error_handler(code, handle_error)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1186, in register_error_handler
self._register_error_handler(None, code_or_exception, f)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 65, in wrapper_func
return f(self, _args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1201, in _register_error_handler
exc_class, code = self._get_exc_class_and_code(code_or_exception)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1121, in _get_exc_class_and_code
exc_class = default_exceptions[exc_class_or_code]
KeyError: 300
Exception KeyError: KeyError(140559450766864,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
All images should move from Ubuntu 14.04 LTS to 16.04 LTS, except for CUDA images where versions <= 7.5 should remain with 14.04 and versions >= 8.0 should migrate (see NVIDIA/nvidia-docker#110).
Hi, it would be cool to add support like in the default NVIDIA DIGITS GPU Dockerfile:
https://gitlab.com/nvidia/digits/blob/master/5.0/Dockerfile
VOLUME /data
VOLUME /jobs
ENV DIGITS_JOBS_DIR=/jobs
ENV DIGITS_LOGFILE_FILENAME=/jobs/digits.log
great work! regards
RQ
Using Docker 1.10.3 on Ubuntu 14.04, I run docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 kaixhin/cuda-keras:7.0, then inside the container I start Python and try to import Theano; it says "CuDNN not available". If I try to force cuDNN by adding "optimizer_including=cudnn", Theano reports it couldn't find cudnn.h.
It looks to me that in addition to installing libcudnn4 in the Dockerfile, you also need to install libcudnn4-dev to provide the header file.
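A minimal Dockerfile sketch of the suggested fix, using the package names from the report (exact names depend on the cuDNN version shipped in the repository):

```dockerfile
# Sketch only: install the cuDNN runtime *and* the -dev package so that
# cudnn.h is available when Theano compiles its cuDNN ops
RUN apt-get update && \
    apt-get install -y libcudnn4 libcudnn4-dev && \
    rm -rf /var/lib/apt/lists/*
```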
The NVIDIA Docker project seems reasonably stable and migrating will allow a range of drivers to be used (closing #5 and #7). The following need to be built and tested over all supported CUDA versions:
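The practical difference in invocation, sketched for a single-GPU host (image name assumed):

```shell
# Before migration: every device node mounted by hand, and the image's
# driver files must match the host's point version exactly
docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm \
  --device /dev/nvidia0 kaixhin/cuda

# After migration: nvidia-docker injects the host's driver files as a
# volume at run time, so one image spans a range of host driver versions
nvidia-docker run -it kaixhin/cuda
```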
What is the difference between the cuda-mxnet and cuda-mxnet:7.0 images?
The mxnet Python demo works properly on cuda-mxnet:7.0, but it fails on cuda-mxnet.
Run demo in cuda-mxnet:7.0
nvidia-docker run -it --rm kaixhin/cuda-mxnet:7.0 python example/image-classification/train_mnist.py --network lenet --gpus 0
This command returns no errors.
Run demo in cuda-mxnet
nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0
This one returns an error. These are the error messages:
Archive: mnist.zip
inflating: t10k-images-idx3-ubyte
inflating: t10k-labels-idx1-ubyte
inflating: train-images-idx3-ubyte
inflating: train-labels-idx1-ubyte
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 60000 images, shuffle=1, shape=(128,1,28,28)
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 10000 images, shuffle=1, shape=(128,1,28,28)
2016-10-03 06:07:46,795 Node[0] Start training with [gpu(0)]
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
what(): [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
From this page, it seems that the dockerfile fails to compile recent mxnet updates:
https://hub.docker.com/r/kaixhin/cuda-mxnet/builds/
Hi Kaixhin,
first, thanks for the great images! I want to report a problem that I have with the cuda-theano image though. I am still having the issue which was marked as resolved here.
Here are my specs:
With the images kaixhin/cuda-theano:7.5
and kaixhin/cuda-theano:8.0
I get the following error after importing theano:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
from theano.scan_module import scan_op
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from
I have six-1.5.2 installed in /usr/lib/python2.7 and six-1.11.0 in /usr/local/lib/python2.7.
If I force Python to use the six version in /usr/local/lib/python2.7 (by putting /usr/local/lib/python2.7/dist-packages at the beginning of my sys.path), I get another error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 60, in <module>
from theano import tensor, scalar
ImportError: cannot import name tensor
If I install six-1.11.0 manually using setup.py from https://pypi.python.org/pypi/six#downloads, it seems to work. At least I get a different error now:
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
In file included from /tmp/try_flags_ZG8wmH.c:4:0:
/usr/include/cudnn.h:63:26: fatal error: driver_types.h: No such file or directory
#include "driver_types.h"
^
compilation terminated.
Mapped name None to device cuda: Quadro K6000 (0000:03:00.0)
However, with the kaixhin/theano image (without CUDA), I don't get the error and everything works fine.
Any idea what is wrong here?
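A hedged configuration sketch for the include-path failure above: Theano's dnn.* flags can point its cuDNN probe at the CUDA headers (paths assume the usual /usr/local/cuda layout; adjust if the toolkit lives elsewhere):

```ini
# ~/.theanorc -- sketch only: tell Theano where the CUDA headers and
# libraries live so the cuDNN compilation check can find driver_types.h
[dnn]
include_path = /usr/local/cuda/include
library_path = /usr/local/cuda/lib64
```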
Thanks!
Hi,
I have been trying to build the cuda-torch v8.0 image locally. I have concatenated all your Dockerfiles as below, but I am encountering errors.
The obvious ones are easy to fix (install-deps: line 151: sudo: command not found), but I get to the point where I am unable to fix Failed copying contents of 'lua' directory.
Any ideas?
FROM nvidia/cuda:8.0-cudnn5-devel
# Install git, apt-add-repository and dependencies for iTorch
RUN apt-get update && apt-get install -y \
git \
software-properties-common \
ipython3 \
libssl-dev \
libzmq3-dev \
python-zmq \
python-pip
# Install Jupyter Notebook for iTorch
RUN pip install notebook ipywidgets
# Run Torch7 installation scripts (dependencies only)
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
cd /root/torch && \
bash install-deps
# Run Torch7 installation scripts
RUN cd /root/torch && \
# Run without nvcc to prevent timeouts
sed -i 's/path_to_nvcc=$(which nvcc)/path_to_nvcc=$(which no_nvcc)/g' install.sh && \
sed -i 's,path_to_nvcc=/usr/local/cuda/bin/nvcc,path_to_nvcc=,g' install.sh && \
./install.sh
# Export environment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
ENV LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
ENV PATH=/root/torch/install/bin:$PATH
ENV LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH
ENV DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH
# Restore Torch7 installation script
RUN cd /root/torch && \
sed -i 's/path_to_nvcc=$(which no_nvcc)/path_to_nvcc=$(which nvcc)/g' install.sh
# Install CUDA libraries
RUN luarocks install cutorch && \
luarocks install cunn && \
luarocks install cudnn
It seems that the latest version of CUDA 7.5 (352.63), which fixes a critical bug on EC2, is only available as a .deb installer and not as a .run installer. Hence the Dockerfile in this repo will not install the latest version of CUDA 7.5.
I'm not sure if this is an "issue" that should be fixed, but I think it's worth pointing out in case someone else runs into the same situation.
The mxnet Python package documentation includes a demo for a quick test. However, this demo requires wget to download the dataset.
It would be better to add wget as a dependency so that users can test directly with:
nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0
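The suggested addition is a one-liner in the Dockerfile; a sketch:

```dockerfile
# Sketch only: one extra package so the MNIST demo can fetch its dataset
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    rm -rf /var/lib/apt/lists/*
```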
I run this cuda-theano image with the command:
sudo nvidia-docker run -it --name theano kaixhin/cuda-theano:7.5 /bin/bash
In the Theano container, I test Theano on the GPU and get this error:
root@cd170ec8e8f9:~# python -c "import theano; theano.test()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 79, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
from theano.scan_module import scan_op
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from
I'm sure nvidia-docker works well on my computer.
Could I get some help?
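A hedged workaround sketch: raise_from first appeared in six 1.9.0, so forcing an upgrade of whichever six copy Python resolves first may unblock the import (to be run inside the container):

```shell
# Sketch only: replace the stale six that shadows the newer install
pip install --upgrade --force-reinstall six
python -c "from six import raise_from; print('six OK')"
```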
I am getting an error while training in DIGITS:
"ERROR: USE_NCCL := 1 must be specified for multi-GPU"
I guess Caffe should be compiled with USE_NCCL := 1.
How can I do this? I am new to Docker.
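A hedged sketch of enabling NCCL in a Makefile-based Caffe build; the checkout path and the commented-out flag are assumptions about the image's layout:

```dockerfile
# Sketch only: uncomment USE_NCCL in Makefile.config and rebuild Caffe
RUN cd /root/caffe && \
    sed -i 's/^# USE_NCCL := 1/USE_NCCL := 1/' Makefile.config && \
    make clean && make -j"$(nproc)" && make pycaffe
```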
Images in bold have builds disabled because their linked source repository was removed.
I ran into an issue with a minor mismatch in the CUDA versions after installing CUDA on the host using these instructions and then trying to run the kaixhin/cuda Docker image.
Any advice here?
This is possible (although it may not be compatible with the current approach for building images). Should be investigated if time permits.
The cuda-ssh Dockerfile appears to have an error where sshd_config is modified. Right now root logins remain prohibited when the container is built on the CUDA v8.0 base.
The line:
# Allow root login with password
sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
should be the following (I think).
# Allow root login with password
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
Rebuilding the container from the modified dockerfile allowed root login via ssh for me.
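A sed that covers both spellings (older OpenSSH writes without-password, newer prohibit-password) might make the Dockerfile robust across base images; a sketch, demonstrated on a copy of the file:

```shell
# Sketch only: normalise either OpenSSH default to an explicit "yes".
# Demonstrated on a demo file; in the Dockerfile the target is /etc/ssh/sshd_config.
printf '#PermitRootLogin prohibit-password\n' > sshd_config.demo
sed -i -E 's/^#?PermitRootLogin (without-password|prohibit-password)/PermitRootLogin yes/' \
  sshd_config.demo
grep '^PermitRootLogin' sshd_config.demo   # -> PermitRootLogin yes
```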
Spearmint requires several steps to get started; this could probably be reduced to make things smoother (for FGLab for example). Additionally it should be reasonably easy to have a separate MongoDB container. The disadvantage is keeping Spearmint documentation within this project, which requires keeping up to date with any potential API changes.
Pinging @gngdb for feedback on how to approach this.
I built an image with caffe, CUDA 8.0, cuDNN 5, and a VNC server.
From the command line,
docker exec -it container_name bash
I run py-faster-rcnn with caffe, and it works well.
But over VNC, using jumpdesktop to connect to the same container, running the same demo reports an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1228 06:30:11.450963 3340 common.cpp:104] Cannot create Cublas handle. Cublas won't be available.
E1228 06:30:11.451587 3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
E1228 06:30:11.451587 3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
F1228 06:30:11.452177 3340 common.cpp:142] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
I tried to build the image but it failed with the following error message:
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-4.8/README.Bugs for instructions.
make: *** [build/src/operator/tensor/control_flow_op.o] Error 4
The command '/bin/sh -c cd /workspace && git clone --recursive https://github.com/dmlc/mxnet && cd mxnet && cp make/config.mk config.mk && sed -i 's/USE_BLAS = atlas/USE_BLAS = openblas/g' config.mk && sed -i 's/USE_CUDA = 0/USE_CUDA = 1/g' config.mk && sed -i 's/USE_CUDA_PATH = NONE/USE_CUDA_PATH = /usr/local/cuda/g' config.mk && sed -i 's/USE_CUDNN = 0/USE_CUDNN = 1/g' config.mk && sed -i 's/USE_DIST_KVSTORE = 0/USE_DIST_KVSTORE = 1/g' config.mk && make -j"$(nproc)"' returned a non-zero code: 2
Hi, first of all, thanks for sharing these Dockerfiles. I've been trying to use your kaixhin/cuda image, but I can't access the GPUs within the container. I'm fairly certain both the host and container are running the same CUDA version, 7.0.28, but nvidia-smi always outputs Failed to initialize NVML: GPU access blocked by the operating system. nvidia-smi -a produces the same error, so I can't find a way to get more information. Do you have any ideas what could be causing this?
Thanks!
Brendan
Within the docker container:
$ docker run -ti -v `pwd`/NVIDIA_CUDA-7.0_Samples:/cudasamples --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidia2:/dev/nvidia2 --device /dev/nvidia3:/dev/nvidia3 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm kaixhin/cuda /bin/bash
root@9279fc160f42:/# nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
root@9279fc160f42:/# /cudasamples/1_Utilities/deviceQuery/deviceQuery
/cudasamples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
root@9279fc160f42:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
On the host:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
$ modinfo nvidia | grep version
version: 346.47
vermagic: 3.16.0-31-generic SMP mod_unload modversions
$ nvidia-smi
Wed Apr 8 23:47:44 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.47 Driver Version: 346.47 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... On | 0000:04:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... On | 0000:08:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... On | 0000:85:00.0 Off | N/A |
| 26% 29C P8 13W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... On | 0000:89:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I use crnn but it shows the above error.
Then I run nvidia-docker run -it kaixhin/cuda-torch
and it also shows:
Unable to find image 'kaixhin/cuda-torch:latest' locally
docker: Error response from daemon: manifest for kaixhin/cuda-torch:latest not found.
See 'docker run --help'
My machine runs Ubuntu 16.04.
I tried to install this in my hassio environment (home assistant) and I got the following error when starting the container:
standard_init_linux.go:211: exec user process caused "exec format error"
Any ideas?
ENV PATH=/root/torch/install/bin:$PATH
should be ENV PATH /root/torch/install/bin:$PATH
in dockerfiles/cuda-torch-plus/Dockerfile.
See zeromq/jzmq@5558dc0.
Hi,
Want to check if there are any plans to make these Docker images official? The steps for doing that are described here: https://docs.docker.com/docker-hub/official_repos/#how-do-i-create-a-new-official-repository
Eventually I am interested in creating and publishing multi-arch images on Docker Hub (especially for ppc64le), as described at https://github.com/docker-library/official-images#multiple-architectures; the first step for that is to have an official Intel image on Docker Hub.
Hey there, great images. I am having some issues building them locally.
Step 4/10 : ENV USER root
---> Running in 6b2c881d3107
---> 9c91f0c8818e
Removing intermediate container 6b2c881d3107
Step 5/10 : COPY password.txt .
---> 71f859bc9d2d
Removing intermediate container 7a7772cc3d2a
Step 6/10 : RUN cat password.txt password.txt | vncpasswd && rm password.txt
---> Running in 10ebdba5fef9
Using password file /root/.vnc/passwd
VNC directory /root/.vnc does not exist, creating.
Password: Warning: password truncated to the length of 8.
Verify: Passwords do not match. Please try again.
Password: Password too short
The command '/bin/sh -c cat password.txt password.txt | vncpasswd && rm password.txt' returned a non-zero code: 1
I tried setting a longer password, but that didn't work either.
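One possible cause, sketched below: cat password.txt password.txt only feeds vncpasswd two lines if password.txt ends with a newline; without one, the two copies fuse into a single over-long line, which would match both the truncation warning and the verification mismatch. This is an assumption about the file's contents, not a confirmed diagnosis:

```shell
# Without a trailing newline the two copies fuse into one line
printf 'secret12' > password.txt
cat password.txt password.txt | wc -l     # -> 0 (no complete lines)

# With a trailing newline vncpasswd would see password + verification
printf 'secret12\n' > password.txt
cat password.txt password.txt | wc -l     # -> 2
rm password.txt
```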