kaixhin / dockerfiles
Compilation of Dockerfiles with automated builds enabled on the Docker Registry
Home Page: https://hub.docker.com/u/kaixhin/
License: MIT License
README.md
... (including iTorch)
But:
root@f34323292132:~/torch# itorch notebook
bash: itorch: command not found
Is it not possible to use the same NVIDIA driver version across all of your images? As it stands, I need to create a completely different AWS image for each Docker container I want to run, since the containers only seem to work when the host has the exact point version of the NVIDIA drivers.
Will there be CNMeM support in the Lasagne image eventually?
The error was
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]
The last stack trace was
Scanning dependencies of target THC
[ 81%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 82%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
[ 83%] Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
[ 84%] Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
[ 86%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
[ 86%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
[ 87%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
[ 88%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCGenerateAllTypes.h:17,
from /tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newExpand':
/tmp/luarocks_cutorch-scm-1-6613/cutorch/lib/THC/generic/THCTensor.c:304:3: error: implicit declaration of function 'THLongStorage_calculateExpandGeometry' [-Werror=implicit-function-declaration]
THLongStorage_calculateExpandGeometry(tensor->size,
when building the GPU CUDA 8.0 Docker image.
I run the image as follows:
sudo nvidia-docker run -it kaixhin/cuda-theano:8.0
and when I try to test Theano as follows:
python -c "import theano"
I get the following error:
root@57ec910ade69:/# python2 -c "import theano"
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/theano/init.py", line 67, in
from theano.configdefaults import config
File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 113, in
in_c_key=False)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 285, in AddConfigVar
configparam.get(root, type(root), delete_key=True)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 333, in get
self.set(cls, val_str)
File "/usr/local/lib/python2.7/dist-packages/theano/configparser.py", line 344, in set
self.val = self.filter(val)
File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 100, in filter
% (self.default, val, self.fullname)))
ValueError: Invalid value ("cpu") for configuration variable "gpu". Valid options start with one of "device", "opencl", "cuda"
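The message comes from Theano's newer gpuarray backend, where the old `device=gpu` value was renamed. A hedged fix sketch, assuming the default config file location, is to request the new device name in `~/.theanorc`:

```ini
# ~/.theanorc -- sketch only: the gpuarray backend expects "cuda"
# (or "cuda0" for a specific GPU) where the old backend took "gpu"
[global]
device = cuda
floatX = float32
```

The same flag can be passed per invocation via the THEANO_FLAGS environment variable instead of the config file.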
When running nvidia-docker run -it kaixhin/cuda-caffe:8.0
one ends up in a ~/caffe folder, but Caffe does not seem to be available via the caffe command. It also looks like Caffe is not actually built. How is one supposed to use this image?
Two warnings are being treated as errors by the compiler when compiling THC. Flags that allow these warnings through might help.
Scanning dependencies of target THC
[ 82%] [ 83%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingAllocator.cpp.o
[ 84%] [ 86%] [ 88%] [ 88%] [ 89%] [ 90%] Building CXX object lib/THC/CMakeFiles/THC.dir/THCCachingHostAllocator.cpp.o
Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCStream.cpp.o
Building CXX object lib/THC/CMakeFiles/THC.dir/THCTensorRandom.cpp.o
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:17,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaByteTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:3: error: implicit declaration of function 'THLongStorage_newInferSize' [-Werror=implicit-function-declaration]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:18,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaCharTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:19,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaShortTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:20,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaIntTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:21,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaLongTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:22,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaHalfTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:23,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
In file included from generic/THCTensor.c:1:0,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCGenerateAllTypes.h:24,
from /tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/THCTensor.c:7:
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c: In function 'THCudaDoubleTensor_newView':
/tmp/luarocks_cutorch-scm-1-2857/cutorch/lib/THC/generic/THCTensor.c:231:34: warning: initialization makes pointer from integer without a cast [enabled by default]
THLongStorage *inferred_size = THLongStorage_newInferSize(size, numel);
^
cc1: some warnings being treated as errors
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THCTensor.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
Error: Build error: Failed building.
Installing https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
The command '/bin/sh -c luarocks install cutorch && luarocks install cunn && luarocks install cudnn' returned a non-zero code: 1
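A hedged workaround sketch: the missing TH symbol suggests the torch core checkout is older than the cutorch rock, so refreshing the checkout before reinstalling may help (update.sh ships in the torch/distro repo; the path is the image's default):

```dockerfile
# Sketch only: refresh the torch core so its TH headers match the
# cutorch rock being installed, then retry the CUDA rocks
RUN cd /root/torch && \
    git pull && git submodule update --init --recursive && \
    ./update.sh && \
    luarocks install cutorch && luarocks install cunn && luarocks install cudnn
```

This assumes the build failure is a version skew between torch core and cutorch rather than a compiler issue.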
Hi,
I'm using your cuda-torch container via Docker Hub, but I see no indication that cuDNN is installed (libcudnn.so.* is not found anywhere). Was this introduced only in a later version?
Thanks,
Running docker run --rm -p 8080:5000 kaixhin/digits
gives me /usr/bin/python: No module named digits.
Running
docker run -it --rm kaixhin/digits bash
cd /root/digits/
pip install -r requirements.txt
/root/digits/digits-devserver
fixes the problem and the DIGITS server starts.
So is there a problem with the digits image?
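The manual steps above could be baked into the image; a minimal sketch, assuming the DIGITS checkout lives at /root/digits as in the report:

```dockerfile
# Sketch only: install DIGITS' Python requirements at build time so the
# devserver starts without manual intervention (path is from the report above)
RUN cd /root/digits && pip install -r requirements.txt
```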
Can you push an update to DIGITS (https://hub.docker.com/r/kaixhin/digits/) to 6.0, or is there a reason you are sticking with 5?
When I build another Caffe variant with your environment, I get this error. Can you tell me where you put your Boost library?
For some reason it seems that itorch isn't installed together with torch (I believe that it was installed in earlier versions):
max@max-UX31A:~/$ sudo docker run --rm -it -p 8888:8888 kaixhin/cuda-torch
Unable to find image 'kaixhin/cuda-torch:latest' locally
latest: Pulling from kaixhin/cuda-torch
bbe1c4256df3: Pull complete
911d09728ffd: Pull complete
615765bc0d9f: Pull complete
a3ed95caeb02: Pull complete
f6c40ea017da: Pull complete
a53854637f3f: Pull complete
1cd0c8506d8b: Pull complete
687b23b1ba76: Pull complete
73a547b0c44e: Pull complete
964cc0d8070b: Pull complete
c146c215733f: Pull complete
d0ba2846eec7: Pull complete
Digest: sha256:2e22615195b4ebb19bd633a9afca997e083625b70e457f2ad1a847f96aec7ad7
Status: Downloaded newer image for kaixhin/cuda-torch:latest
root@e80bdeb5c974:~/torch# ls
CMakeLists.txt README.md clean.sh exe install install.sh test.sh
LICENSE.md build cmake extra install-deps pkg update.sh
root@e80bdeb5c974:~/torch# itorch
bash: itorch: command not found
Since the Jupyter notebook is one of the most natural ways of interacting with a Docker container, I believe it would be beneficial to add iTorch to the build, or to have separate builds that include iTorch.
$ nvidia-docker run -it kaixhin/cuda-torch:8.0
Tag 8.0 not found in repository docker.io/kaixhin/cuda-torch
Getting the following error when building cudnn on the 7.0 tag release.
CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find CUDA: Found unsuitable version "7.0", but required is at
least "7.5" (found /usr/local/cuda)
You can see this error is generated on your automated build as well: Docker Hub Build Log
If using the official CUDA installer, I can find cuda-install-samples-7.0.sh.
http://docs.nvidia.com/cuda/cuda-samples/index.html#getting-cuda-samples
However, after building from your Dockerfile, I cannot find this script.
How can I include the CUDA samples in the Docker image? I want to test CUDA on the GPU using the deviceQuery sample.
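One hedged way to get deviceQuery into the image, assuming the toolkit install places the samples under /usr/local/cuda/samples (true for the .run installer; layouts vary):

```dockerfile
# Sketch only: copy and build the deviceQuery sample so the GPU can be
# smoke-tested from inside the container
RUN cp -r /usr/local/cuda/samples /root/cuda-samples && \
    cd /root/cuda-samples/1_Utilities/deviceQuery && \
    make
# at run time: /root/cuda-samples/1_Utilities/deviceQuery/deviceQuery
```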
In the Docker registry image Ubuntu Core 14.04 + Pycaffe + DIGITS (CPU-only) at https://hub.docker.com/r/kaixhin/digits/
it seems libdc1394 is missing.
Here is the run log:
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.wtf is deprecated, use flask_wtf instead.
.format(x=modname), ExtDeprecationWarning
libdc1394 error: Failed to initialize libdc1394
/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.socketio is deprecated, use flask_socketio instead.
.format(x=modname), ExtDeprecationWarning
Traceback (most recent call last):
File "/root/digits/digits-devserver", line 42, in
from digits.webapp import app, socketio, scheduler
File "/root/digits/digits/webapp.py", line 38, in
import digits.views
File "/root/digits/digits/views.py", line 538, in
Default value for torch_root "" invalid:
torch binary not found in PATH
app.register_error_handler(code, handle_error)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1186, in register_error_handler
self._register_error_handler(None, code_or_exception, f)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 65, in wrapper_func
return f(self, _args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1201, in _register_error_handler
exc_class, code = self._get_exc_class_and_code(code_or_exception)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1121, in _get_exc_class_and_code
exc_class = default_exceptions[exc_class_or_code]
KeyError: 300
Exception KeyError: KeyError(140559450766864,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
All images should move from Ubuntu 14.04 LTS to 16.04 LTS, except for CUDA images where versions <= 7.5 should remain with 14.04 and versions >= 8.0 should migrate (see NVIDIA/nvidia-docker#110).
Hi, it would be cool to add support like in the default NVIDIA DIGITS GPU Dockerfile:
https://gitlab.com/nvidia/digits/blob/master/5.0/Dockerfile
VOLUME /data
VOLUME /jobs
ENV DIGITS_JOBS_DIR=/jobs
ENV DIGITS_LOGFILE_FILENAME=/jobs/digits.log
great work! regards
RQ
Using Docker 1.10.3 on Ubuntu 14.04, I run docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 kaixhin/cuda-keras:7.0, then inside the container I start Python and try to import Theano; it says "CuDNN not available". If I try to force cuDNN by adding "optimizer_including=cudnn", Theano reports it couldn't find cudnn.h.
It looks to me that in addition to installing libcudnn4 in the Dockerfile, you also need to install libcudnn4-dev to provide the header file.
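A minimal Dockerfile sketch of the suggested fix, using the package names from the report (exact names depend on the cuDNN version shipped in the repository):

```dockerfile
# Sketch only: install the cuDNN runtime *and* the -dev package so that
# cudnn.h is available when Theano compiles its cuDNN ops
RUN apt-get update && \
    apt-get install -y libcudnn4 libcudnn4-dev && \
    rm -rf /var/lib/apt/lists/*
```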
The NVIDIA Docker project seems reasonably stable and migrating will allow a range of drivers to be used (closing #5 and #7). The following need to be built and tested over all supported CUDA versions:
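The practical difference in invocation, sketched for a single-GPU host (image name assumed):

```shell
# Before migration: every device node mounted by hand, and the image's
# driver files must match the host's point version exactly
docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm \
  --device /dev/nvidia0 kaixhin/cuda

# After migration: nvidia-docker injects the host's driver files as a
# volume at run time, so one image spans a range of host driver versions
nvidia-docker run -it kaixhin/cuda
```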
What is the difference between the cuda-mxnet and cuda-mxnet:7.0 images?
The mxnet Python demo works properly on cuda-mxnet:7.0, but it fails on cuda-mxnet.
Run demo in cuda-mxnet:7.0
nvidia-docker run -it --rm kaixhin/cuda-mxnet:7.0 python example/image-classification/train_mnist.py --network lenet --gpus 0
This command returns no errors.
Run demo in cuda-mxnet
nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0
This one returns an error. These are the error messages:
Archive: mnist.zip
inflating: t10k-images-idx3-ubyte
inflating: t10k-labels-idx1-ubyte
inflating: train-images-idx3-ubyte
inflating: train-labels-idx1-ubyte
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 60000 images, shuffle=1, shape=(128,1,28,28)
[06:07:46] src/io/iter_mnist.cc:94: MNISTIter: load 10000 images, shuffle=1, shape=(128,1,28,28)
2016-10-03 06:07:46,795 Node[0] Start training with [gpu(0)]
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
[06:10:16] /root/mxnet/dmlc-core/include/dmlc/logging.h:235: [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
what(): [06:10:16] src/engine/./threaded_engine.h:306: [06:10:16] /root/mxnet/mshadow/mshadow/./././dot_engine-inl.h:524: Check failed: (err) == (CUBLAS_STATUS_SUCCESS) Cublas: Sgemm fail
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
From this page, it seems that the dockerfile fails to compile recent mxnet updates:
https://hub.docker.com/r/kaixhin/cuda-mxnet/builds/
Hi Kaixhin,
first, thanks for the great images! I want to report a problem that I have with the cuda-theano image though. I am still having the issue which was marked as resolved here.
Here are my specs:
With the images kaixhin/cuda-theano:7.5
and kaixhin/cuda-theano:8.0
I get the following error after importing theano:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
from theano.scan_module import scan_op
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from
I have six-1.5.2 installed in /usr/lib/python2.7 and six-1.11.0 in /usr/local/lib/python2.7.
If I force Python to use the six version in /usr/local/lib/python2.7 (by putting /usr/local/lib/python2.7/dist-packages at the beginning of my sys.path), I get another error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 60, in <module>
from theano import tensor, scalar
ImportError: cannot import name tensor
If I install six-1.11.0 manually using setup.py from https://pypi.python.org/pypi/six#downloads, it seems to work. At least I get a different error now:
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
In file included from /tmp/try_flags_ZG8wmH.c:4:0:
/usr/include/cudnn.h:63:26: fatal error: driver_types.h: No such file or directory
#include "driver_types.h"
^
compilation terminated.
Mapped name None to device cuda: Quadro K6000 (0000:03:00.0)
However, with the kaixhin/theano image (without CUDA), I don't get the error and everything works fine.
Any idea what is wrong here?
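A hedged configuration sketch for the include-path failure above: Theano's dnn.* flags can point its cuDNN probe at the CUDA headers (paths assume the usual /usr/local/cuda layout; adjust if the toolkit lives elsewhere):

```ini
# ~/.theanorc -- sketch only: tell Theano where the CUDA headers and
# libraries live so the cuDNN compilation check can find driver_types.h
[dnn]
include_path = /usr/local/cuda/include
library_path = /usr/local/cuda/lib64
```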
Thanks!
Hi,
I have been trying to build the cuda-torch v8.0 image locally. I have concatenated all your Dockerfiles as below, but I am encountering errors.
The obvious ones are easy to fix (install-deps: line 151: sudo: command not found), but I get to the point where I am unable to fix Failed copying contents of 'lua' directory.
Any ideas?
FROM nvidia/cuda:8.0-cudnn5-devel
# Install git, apt-add-repository and dependencies for iTorch
RUN apt-get update && apt-get install -y \
git \
software-properties-common \
ipython3 \
libssl-dev \
libzmq3-dev \
python-zmq \
python-pip
# Install Jupyter Notebook for iTorch
RUN pip install notebook ipywidgets
# Run Torch7 installation scripts (dependencies only)
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
cd /root/torch && \
bash install-deps
# Run Torch7 installation scripts
RUN cd /root/torch && \
# Run without nvcc to prevent timeouts
sed -i 's/path_to_nvcc=$(which nvcc)/path_to_nvcc=$(which no_nvcc)/g' install.sh && \
sed -i 's,path_to_nvcc=/usr/local/cuda/bin/nvcc,path_to_nvcc=,g' install.sh && \
./install.sh
# Export environment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
ENV LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
ENV PATH=/root/torch/install/bin:$PATH
ENV LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH
ENV DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH
# Restore Torch7 installation script
RUN cd /root/torch && \
sed -i 's/path_to_nvcc=$(which no_nvcc)/path_to_nvcc=$(which nvcc)/g' install.sh
# Install CUDA libraries
RUN luarocks install cutorch && \
luarocks install cunn && \
luarocks install cudnn
It seems that the latest version of CUDA 7.5 (352.63), which fixes a critical bug on EC2, is only available as a .deb installer and not as a .run installer. Hence the Dockerfile in this repo will not install the latest version of CUDA 7.5.
I'm not sure if this is an "issue" that should be fixed, but I think it's worth pointing out in case someone else runs into the same situation.
The mxnet Python package documentation includes a demo for a quick test. However, this demo requires wget to download the dataset.
It would be better to add wget as a dependency so that users can test directly with:
nvidia-docker run -it --rm kaixhin/cuda-mxnet python example/image-classification/train_mnist.py --network lenet --gpus 0
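The suggested addition is a one-liner in the Dockerfile; a sketch:

```dockerfile
# Sketch only: one extra package so the MNIST demo can fetch its dataset
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    rm -rf /var/lib/apt/lists/*
```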
I run this cuda-theano image with the command:
sudo nvidia-docker run -it --name theano kaixhin/cuda-theano:7.5 /bin/bash
In the Theano container, I test Theano on the GPU and get this error:
root@cd170ec8e8f9:~# python -c "import theano; theano.test()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 79, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_opt.py", line 71, in <module>
from theano.scan_module import scan_op
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 58, in <module>
from six import iteritems, integer_types, raise_from
ImportError: cannot import name raise_from
I'm sure nvidia-docker works well on my computer.
Could I get some help?
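A hedged workaround sketch: raise_from first appeared in six 1.9.0, so forcing an upgrade of whichever six copy Python resolves first may unblock the import (to be run inside the container):

```shell
# Sketch only: replace the stale six that shadows the newer install
pip install --upgrade --force-reinstall six
python -c "from six import raise_from; print('six OK')"
```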
I am getting an error while training in DIGITS:
"ERROR: USE_NCCL := 1 must be specified for multi-GPU"
I guess Caffe should be compiled with USE_NCCL := 1.
How can I do this? I am new to Docker.
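A hedged sketch of enabling NCCL in a Makefile-based Caffe build; the checkout path and the commented-out flag are assumptions about the image's layout:

```dockerfile
# Sketch only: uncomment USE_NCCL in Makefile.config and rebuild Caffe
RUN cd /root/caffe && \
    sed -i 's/^# USE_NCCL := 1/USE_NCCL := 1/' Makefile.config && \
    make clean && make -j"$(nproc)" && make pycaffe
```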
Images in bold have builds disabled because their linked source repository was removed.
I ran into an issue with a minor mismatch in the CUDA versions after installing CUDA on the host using these instructions and then trying to run the kaixhin/cuda Docker image.
Any advice here?
This is possible (although it may not be compatible with the current approach for building images). Should be investigated if time permits.
The cuda-ssh Dockerfile appears to have an error where sshd_config is modified. Right now root logins remain prohibited when the container is built on the CUDA v8.0 base.
The line:
# Allow root login with password
sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
should be the following (I think).
# Allow root login with password
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
Rebuilding the container from the modified dockerfile allowed root login via ssh for me.
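A sed that covers both spellings (older OpenSSH writes without-password, newer prohibit-password) might make the Dockerfile robust across base images; a sketch, demonstrated on a copy of the file:

```shell
# Sketch only: normalise either OpenSSH default to an explicit "yes".
# Demonstrated on a demo file; in the Dockerfile the target is /etc/ssh/sshd_config.
printf '#PermitRootLogin prohibit-password\n' > sshd_config.demo
sed -i -E 's/^#?PermitRootLogin (without-password|prohibit-password)/PermitRootLogin yes/' \
  sshd_config.demo
grep '^PermitRootLogin' sshd_config.demo   # -> PermitRootLogin yes
```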
Spearmint requires several steps to get started; this could probably be reduced to make things smoother (for FGLab for example). Additionally it should be reasonably easy to have a separate MongoDB container. The disadvantage is keeping Spearmint documentation within this project, which requires keeping up to date with any potential API changes.
Pinging @gngdb for feedback on how to approach this.
I built an image with caffe, CUDA 8.0, cuDNN 5, and a VNC server.
From the command line,
docker exec -it container_name bash
I run py-faster-rcnn with caffe, and it works well.
But over VNC, using jumpdesktop to connect to the same container, running the same demo reports an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1228 06:30:11.450963 3340 common.cpp:104] Cannot create Cublas handle. Cublas won't be available.
E1228 06:30:11.451587 3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
E1228 06:30:11.451587 3340 common.cpp:111] Cannot create Curand generator. Curand won't be available.
F1228 06:30:11.452177 3340 common.cpp:142] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
I tried to build the image but it failed with the following error message:
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-4.8/README.Bugs for instructions.
make: *** [build/src/operator/tensor/control_flow_op.o] Error 4
The command '/bin/sh -c cd /workspace && git clone --recursive https://github.com/dmlc/mxnet && cd mxnet && cp make/config.mk config.mk && sed -i 's/USE_BLAS = atlas/USE_BLAS = openblas/g' config.mk && sed -i 's/USE_CUDA = 0/USE_CUDA = 1/g' config.mk && sed -i 's/USE_CUDA_PATH = NONE/USE_CUDA_PATH = /usr/local/cuda/g' config.mk && sed -i 's/USE_CUDNN = 0/USE_CUDNN = 1/g' config.mk && sed -i 's/USE_DIST_KVSTORE = 0/USE_DIST_KVSTORE = 1/g' config.mk && make -j"$(nproc)"' returned a non-zero code: 2
Hi, first of all, thanks for sharing these Dockerfiles. I've been trying to use your kaixhin/cuda image, but I can't access the GPUs within the container. I'm fairly certain both the host and container are running the same CUDA version, 7.0.28, but nvidia-smi always outputs Failed to initialize NVML: GPU access blocked by the operating system. nvidia-smi -a produces the same error, so I can't find a way to get more information. Do you have any ideas what could be causing this?
Thanks!
Brendan
Within the docker container:
$ docker run -ti -v `pwd`/NVIDIA_CUDA-7.0_Samples:/cudasamples --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidia2:/dev/nvidia2 --device /dev/nvidia3:/dev/nvidia3 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm kaixhin/cuda /bin/bash
root@9279fc160f42:/# nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
root@9279fc160f42:/# /cudasamples/1_Utilities/deviceQuery/deviceQuery
/cudasamples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
root@9279fc160f42:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
On the host:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
$ modinfo nvidia | grep version
version: 346.47
vermagic: 3.16.0-31-generic SMP mod_unload modversions
$ nvidia-smi
Wed Apr 8 23:47:44 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.47 Driver Version: 346.47 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... On | 0000:04:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... On | 0000:08:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... On | 0000:85:00.0 Off | N/A |
| 26% 29C P8 13W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... On | 0000:89:00.0 Off | N/A |
| 26% 28C P8 14W / 250W | 15MiB / 6143MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I use crnn but it shows the above error.
Then I run nvidia-docker run -it kaixhin/cuda-torch
and it also shows:
Unable to find image 'kaixhin/cuda-torch:latest' locally
docker: Error response from daemon: manifest for kaixhin/cuda-torch:latest not found.
See 'docker run --help'
My machine runs Ubuntu 16.04.
I tried to install this in my hassio environment (home assistant) and I got the following error when starting the container:
standard_init_linux.go:211: exec user process caused "exec format error"
Any ideas?
ENV PATH=/root/torch/install/bin:$PATH
should be ENV PATH /root/torch/install/bin:$PATH
in dockerfiles/cuda-torch-plus/Dockerfile.
See zeromq/jzmq@5558dc0.
Hi,
Want to check if there are any plans to make these Docker images official? The steps for doing that are described here: https://docs.docker.com/docker-hub/official_repos/#how-do-i-create-a-new-official-repository
Eventually I am interested in creating and publishing multi-arch images on Docker Hub (especially for ppc64le), as described at https://github.com/docker-library/official-images#multiple-architectures; the first step for that is to have an official Intel image on Docker Hub.
Hey there, great images. I am having some issues building them locally.
Step 4/10 : ENV USER root
---> Running in 6b2c881d3107
---> 9c91f0c8818e
Removing intermediate container 6b2c881d3107
Step 5/10 : COPY password.txt .
---> 71f859bc9d2d
Removing intermediate container 7a7772cc3d2a
Step 6/10 : RUN cat password.txt password.txt | vncpasswd && rm password.txt
---> Running in 10ebdba5fef9
Using password file /root/.vnc/passwd
VNC directory /root/.vnc does not exist, creating.
Password: Warning: password truncated to the length of 8.
Verify: Passwords do not match. Please try again.
Password: Password too short
The command '/bin/sh -c cat password.txt password.txt | vncpasswd && rm password.txt' returned a non-zero code: 1
I tried setting a longer password, but that didn't work either.
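One possible cause, sketched below: cat password.txt password.txt only feeds vncpasswd two lines if password.txt ends with a newline; without one, the two copies fuse into a single over-long line, which would match both the truncation warning and the verification mismatch. This is an assumption about the file's contents, not a confirmed diagnosis:

```shell
# Without a trailing newline the two copies fuse into one line
printf 'secret12' > password.txt
cat password.txt password.txt | wc -l     # -> 0 (no complete lines)

# With a trailing newline vncpasswd would see password + verification
printf 'secret12\n' > password.txt
cat password.txt password.txt | wc -l     # -> 2
rm password.txt
```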