benoitsteiner / tensorflow-opencl
OpenCL support for TensorFlow
License: Apache License 2.0
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 14.04.3 (Trusty)
TensorFlow installed from (source or binary): Source
TensorFlow version (use command below): 1.0 (steps: downloaded TensorFlow from https://github.com/benoitsteiner/tensorflow-opencl, then ran ./configure to configure the project)
Bazel version (if compiling from source): 0.4.5
CUDA/cuDNN version: NA
OPENCL Version:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.11)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
GPU model and memory:
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Board name: AMD Radeon (TM) R5 M335
Memory: 4096M
Exact command to reproduce:
Run the Python script: ipython keras_code.py
G++/GCC version:
g++-4.9 (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
I have compiled C++ programs with it, and they work fine.
Python: I am using the Anaconda distribution of Python 2.7.2 (Anaconda 2.4.3).
I compiled and installed TensorFlow without any issues. When I try to run the code, I get the following error:
2017-04-23 14:01:15.180795: W ./tensorflow/core/common_runtime/sycl/sycl_util.h:44] No OpenCL GPU found that is supported by ComputeCpp, trying OpenCL CPU
2017-04-23 14:01:15.180843: F ./tensorflow/core/common_runtime/sycl/sycl_util.h:53] No OpenCL GPU nor CPU found that is supported by ComputeCpp
Aborted (core dumped)
I have attached the code file. Please note this is a simplified version of the file. The logic is:
tensorflow-code-throwing-error.txt
Please let me know if there are any fixes or if I can do something to get round this issue.
Thanks and regards
Sayantan
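When comparing what the OpenCL driver reports with what ComputeCpp accepts, it can help to reduce the clinfo output to just the device types and board names. The following is a minimal sketch, not part of TensorFlow or ComputeCpp; the function name summarize_clinfo is hypothetical, and the field labels are assumed to match the clinfo excerpt quoted in this report.

```python
import re


def summarize_clinfo(text):
    """Return (device_type, board_name) pairs found in clinfo-style text."""
    devices = []
    for line in text.splitlines():
        type_match = re.match(r"\s*Device Type:\s*(\S+)", line)
        if type_match:
            # A "Device Type" line starts a new device entry.
            devices.append([type_match.group(1), ""])
            continue
        name_match = re.match(r"\s*Board name:\s*(.*)", line)
        if name_match and devices:
            devices[-1][1] = name_match.group(1).strip()
    return [tuple(entry) for entry in devices]


# Excerpt copied from the report above.
sample = """\
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Board name: AMD Radeon (TM) R5 M335
"""

print(summarize_clinfo(sample))
```

If clinfo lists a GPU here but ComputeCpp still reports none, the device is visible to OpenCL but is presumably not one that ComputeCpp accepts; the report itself does not establish why.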
Hey everyone,
The post is mostly self-explanatory: I've been struggling to get OpenCL working on an AMD R9 390X while following the tutorial here. I get this error when I try to run clinfo:
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='fglrx'
modprobe: ERROR: could not insert 'fglrx': Function not implemented
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='fglrx'
modprobe: ERROR: could not insert 'fglrx': Function not implemented
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 156 ()
Minor opcode of failed request: 19
Serial number of failed request: 12
Current serial number in output stream: 12
It was a pain just to get the fglrx drivers to stop being buggy and throwing errors in general, which I'm sure is related somehow. What worked for me were the purge and installation steps from this post. Everything else I followed directly from the tutorial.
Does anyone have any ideas where I should start looking? I know I have fglrx installed, so I'm really not sure what is wrong. Related but less important: my 3 displays are not working properly either.
Please let me know which logs would be useful so I can provide some helpful info. Thank you!
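Since modprobe reports that it cannot find a module named fglrx, one quick check is whether the kernel currently has that module loaded at all. Below is a small sketch that reads text in the /proc/modules format (module name as the first field). The helper names are hypothetical, and the sample lines are made up for illustration; they are not taken from the poster's machine.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_text):
    """True if the named kernel module appears in the given text."""
    return name in loaded_modules(proc_modules_text)


# Made-up sample in the /proc/modules layout, for illustration only.
sample = """\
e1000e 245760 0 - Live 0x0000000000000000
snd_hda_intel 40960 3 - Live 0x0000000000000000
"""

print(is_module_loaded("fglrx", sample))

# On a real system the text would come from the kernel:
# with open("/proc/modules") as f:
#     print(is_module_loaded("fglrx", f.read()))
```

A negative result only says the module is not loaded right now; it does not say whether the package is installed or whether the module was built for the running kernel.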
This is a follow-up on a previous message. I am encountering build errors and can't seem to find their source.
I followed the steps that I believe your colleague posted here:
https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl
I deviated from these instructions in the following ways:
I did not execute the following steps:
$ sudo apt-get install linux-image-3.19.0-79-generic linux-image-extra-3.19.0-79-generic linux-headers-3.19.0-79-generic
$ sudo apt-get remove linux-image-4.2.0-42-generic
$ sudo update-grub -
I was not sure why it is important to switch to that particular kernel, so I did not change the kernel. This is the version of Ubuntu I am using:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
I am using the following kernel, which is part of this standard Ubuntu 14.04.5 build:
3.13.0-116-generic
I used Python 3.5 inside a conda environment instead of Python 2.7
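To make the kernel deviation concrete, the running kernel (3.13.0-116-generic) can be compared with the one the tutorial installs (3.19.0-79-generic). This is only a sketch for comparing release strings of that shape; the function name is hypothetical, and it says nothing about whether the older kernel actually causes the build error.

```python
import re


def kernel_release_tuple(release):
    """Turn a string like '3.13.0-116-generic' into (3, 13, 0, 116)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


running = kernel_release_tuple("3.13.0-116-generic")   # kernel in this report
tutorial = kernel_release_tuple("3.19.0-79-generic")   # kernel from the tutorial

print(running, tutorial, running < tutorial)
```

On a live machine, platform.release() from the standard library returns the running kernel's release string, which can be passed to the same function.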
clinfo gives the following info:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1912.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon(TM) R7 Graphics
Device Topology: PCI[ B#0, D#1, F#0 ]
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 720Mhz
Address bits: 64
Max memory allocation: 215482368
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 861929472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 215482368
Max global variable size: 193934080
Max global variable preferred total size: 861929472
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f1c77535a18
Name: Spectre
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 1912.5 (VM)
Profile: FULL_PROFILE
Version: OpenCL 2.0 AMD-APP (1912.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3700Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 7182524416
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2147483648
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f1c77535a18
Name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.2
Driver version: 1912.5 (sse2,avx,fma4)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1912.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
/usr/local/computecpp/bin/computecpp_info gives the following output
********************************************************************************
ComputeCpp Info (CE 0.1.3)
********************************************************************************
Toolchain information:
GLIBCXX: 20150426
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 1 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Device not tested on this OS
CL_DEVICE_NAME : Spectre
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 1912.5 (VM)
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
********************************************************************************
********************************************************************************
********************************************************************************
I note here that, unlike in the tutorial mentioned above, the CPU was not detected.
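The observation that only the GPU shows up can be checked mechanically by parsing the computecpp_info text. The sketch below assumes the layout quoted above ("Device N:" headers followed by "key : value" lines); parse_computecpp_info is a hypothetical helper, not something shipped with ComputeCpp.

```python
import re


def parse_computecpp_info(text):
    """Return one dict per 'Device N:' block in computecpp_info-style text."""
    devices = []
    for line in text.splitlines():
        if re.match(r"\s*Device \d+:\s*$", line):
            devices.append({})
        elif devices and " : " in line:
            # Split on the first ' : ' so values may contain further text.
            key, _, value = line.partition(" : ")
            devices[-1][key.strip()] = value.strip()
    return devices


# Excerpt copied from the output above.
sample = """\
Device Info:
Discovered 1 devices matching:
platform : <any>
device type : <any>
Device 0:
Device is supported : UNTESTED - Device not tested on this OS
CL_DEVICE_NAME : Spectre
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 1912.5 (VM)
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
"""

devices = parse_computecpp_info(sample)
device_types = [d.get("CL_DEVICE_TYPE") for d in devices]
print(device_types)
print("CPU listed:", "CL_DEVICE_TYPE_CPU" in device_types)
```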
After configuring with default options, I ran the following command:
$ bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 --config=sycl //tensorflow/tools/pip_package:build_pip_package --verbose_failures
I am encountering the following error:
INFO: Found 1 target...
INFO: From Executing genrule //tensorflow/cc:array_ops_genrule:
2017-04-18 22:42:10.696714: W tensorflow/core/framework/op_gen_lib.cc:194] Squeeze can't find input squeeze_dims to rename
ERROR: /home/anthonyle/Projects/tensorflow-opencl/tensorflow/core/kernels/BUILD:2616:1: C++ compilation of rule '//tensorflow/core/kernels:pooling_ops' failed: computecpp failed: error executing command
(cd /home/anthonyle/.cache/bazel/_bazel_anthonyle/1b4b305bac04d7a568c973de167c2cf3/execroot/tensorflow-opencl && \
exec env - \
external/local_config_sycl/crosstool/computecpp -Wall -msse3 -g0 -O2 -DNDEBUG -mavx -msse4.1 -msse4.2 '-std=c++11' -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.d '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DTENSORFLOW_USE_JEMALLOC -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/local_config_sycl -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl -iquote external/jemalloc -iquote bazel-out/local_linux-py3-opt/genfiles/external/jemalloc -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/local_config_sycl/sycl -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl/sycl -isystem external/local_config_sycl/sycl/include 
-isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl/sycl/include -isystem external/jemalloc/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/jemalloc/include -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/gif_archive/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions -msse3 -pthread -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/core/kernels/pooling_ops_3d.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from tensorflow/core/kernels/pooling_ops_3d.cc:26:
./tensorflow/core/kernels/eigen_pooling.h:354:9: error: cannot compile this builtin function yet
pequal(p, pset1<Packet>(-Eigen::NumTraits<T>::highest()));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./tensorflow/core/kernels/eigen_pooling.h:337:22: note: expanded from macro 'pequal'
#define pequal(a, b) _mm256_cmp_ps(a, b, _CMP_EQ_UQ)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/computecpp/bin/../lib/clang/3.6.0/include/avxintrin.h:421:11: note: expanded from macro '_mm256_cmp_ps'
(__m256)__builtin_ia32_cmpps256((__v8sf)__a, (__v8sf)__b, (c)); })
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 20.702s, Critical Path: 20.03s
Did skipping some of the steps outlined above really lead to these errors? What did I do wrong?
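With Bazel output this long, the actual compiler diagnostic is easy to miss among the include flags. Here is a small sketch that pulls only the "file:line:col: error:" lines out of a log; the regular expression and function name are my own, and the sample lines are copied from the log above.

```python
import re

# Matches clang/gcc style diagnostics such as "path/file.h:354:9: error: message".
ERROR_RE = re.compile(
    r"^(?P<file>[^\s:][^:]*):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def extract_errors(log_text):
    """Return (file, line, column, message) for each compiler error line."""
    hits = []
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            hits.append((
                match.group("file"),
                int(match.group("line")),
                int(match.group("col")),
                match.group("msg"),
            ))
    return hits


log = """\
In file included from tensorflow/core/kernels/pooling_ops_3d.cc:26:
./tensorflow/core/kernels/eigen_pooling.h:354:9: error: cannot compile this builtin function yet
./tensorflow/core/kernels/eigen_pooling.h:337:22: note: expanded from macro 'pequal'
1 error generated.
"""

print(extract_errors(log))
```

In this log the single error points at the _mm256_cmp_ps macro in eigen_pooling.h, an AVX intrinsic, and the build command passes --copt=-mavx. Whether dropping that flag avoids the error is an assumption that the report does not confirm.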
Operating System: Antergos x64
Here is the error:
ERROR: /home/samyr/programming/tools/tensorflow-opencl/tensorflow/core/kernels/BUILD:881:1: C++ compilation of rule '//tensorflow/core/kernels:gather_functor' failed (Exit 1).
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:141:0,
from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
from ./tensorflow/core/kernels/gather_functor.h:19,
from tensorflow/core/kernels/gather_functor.cc:50:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h: In constructor 'Eigen::TensorEvaluator<const Eigen::TensorGeneratorOp<Generator, XprType>, Device>::TensorEvaluator(const XprType&, const Device&)':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h:100:38: error: class 'Eigen::TensorEvaluator<const Eigen::TensorGeneratorOp<Generator, XprType>, Device>' does not have any field named 'm_argImpl'
: m_generator(op.generator()), m_argImpl(op.expression(), device)
^~~~~~~~~
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h:102:20: error: 'm_argImpl' was not declared in this scope
m_dimensions = m_argImpl.dimensions();
^~~~~~~~~
I installed from source with OpenCL enabled. The GCC version is 7.1.1.
I have tried downloading the source a second time, and I have also tried installing g++, as Arch Linux only comes with GCC.
Is the Windows build dead?
What options need to be selected when running ./configure in the tensorflow-opencl directory? I'm not sure what to enter to get TensorFlow to recognize and use the AMD GPU/APU.
`~/tensorflow-opencl ~/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
Using python library path: /usr/lib/python3/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
`
I'm getting a 404 Not Found error when trying to access the nightly build download:
https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-0.12.1-py2-none-any.whl
Here's the error:
HTTP ERROR 404
Problem accessing /view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-0.12.1-py2-none-any.whl. Reason:
Not Found
Powered by Jetty://
Linked from: https://github.com/benoitsteiner/tensorflow-opencl
People who are a little more adventurous can also try our nightly binaries:
Linux CPU-only: Python 2 (build history) / Python 3.4 (build history) / Python 3.5 (build history)
Linux GPU: Python 2 (build history) / Python 3.4 (build history) / Python 3.5 (build history)
Mac CPU-only: Python 2 (build history) / Python 3 (build history)
--> Mac GPU: Python 2 (build history) / Python 3 (build history)
Android: demo APK, native libs (build history)
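To check which of the README's nightly links are actually dead, a HEAD request per URL is enough. The following is a rough sketch using only the standard library; http_status is a hypothetical helper, and the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error codes such as 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


# Example call (needs network access, so it is left commented out):
# print(http_status("https://github.com/benoitsteiner/tensorflow-opencl"))
```

Connection failures such as DNS errors raise urllib.error.URLError and are deliberately not caught here, so they stay distinguishable from a 404.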
Hi,
I am trying to build TensorFlow with SYCL,
but I get the error below:
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:342:3: error: multiple overloads of 'global_ptr' instantiate to the same signature 'void (pointer_t)' (aka 'void (__attribute__((address_space(1))) float *)')
global_ptr(pointer_t ptr) : Base(ptr) {}
^
...
^
tensorflow/core/kernels/adjust_contrast_op.cc:427:44: note: in instantiation of member function 'tensorflow::functor::AdjustContrastv2Eigen::SyclDevice::operator()' requested here
functor::AdjustContrastv2()(
^
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:334:3: note: previous declaration is here
global_ptr(dataType *ptr);
^
It seems global_ptr is being redefined. How can I fix it?
My build command is:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
The error is the same with Python 2.7 and 3.6.
The error is the same with gcc 4.85 and 5.x.
Thanks a lot!
My build goes fine, but when I import tensorflow I get the error below. I'm doing this on Ubuntu 16.04, but I have seen this type of message on Gentoo when one package was compiled with GCC 4.x and the one being compiled uses GCC 5.x. I just checked, and the version I compiled without OpenCL works fine. Any ideas?
ImportError: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKhiPPKcSt10shared_ptrINS0_6detail7contextEE
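The mangled name itself hints at which libstdc++ string ABI the caller was built against. Under the Itanium C++ ABI, "Ss" is the standard abbreviation for the old std::string, while the newer GCC 5 ABI spells it out under the __cxx11 namespace. The sketch below is a crude substring heuristic, not a demangler, and the function name is hypothetical. The first symbol is the one from this report; the second is from another import error reported further down this page.

```python
def std_string_abi(mangled):
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only: plain substring checks on an Itanium-mangled name.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


symbol_this_report = (
    "_ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx1112"
    "basic_stringIcSt11char_traitsIcESaIcEEEPKhiPPKcSt10shared_ptrINS0_"
    "6detail7contextEE"
)
symbol_other_report = (
    "_ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKc"
    "St10shared_ptrINS0_6detail7contextEEb"
)

print(std_string_abi(symbol_this_report))
print(std_string_abi(symbol_other_report))
```

If the TensorFlow extension asks for a __cxx11 symbol and the SYCL runtime library only exports the older form (an assumption, since the report does not show the library's exports), building with -D_GLIBCXX_USE_CXX11_ABI=0, the libstdc++ macro that selects the old ABI, is one thing to try.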
I've been trying to install this by following the guide here. Now the issue seems to be with installing the fglrx driver, but my GPU is no longer supported (RX460). I tried to install the fglrx driver anyway on Ubuntu 14.04.1 and 14.04.5 but as expected, it didn't work. And following the guide while skipping the fglrx installation part still doesn't work.
Can someone please confirm whether there is any way to get this to work on an AMD GPU that doesn't support fglrx? So far I've spent two days trying to get this to work.
tensorflow-opencl$ bazel clean
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
inferno@hmstr:/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl$ ./configure
/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Using python library path: /usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] Y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/bin/g++
Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/bin/gcc
Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
..........................
____Loading package: tensorflow/contrib/deprecated
____Loading package: tensorflow/core/platform/default/gpu
____Loading package: tensorflow/core/ops/compat
____Loading package: tensorflow/contrib/cudnn_rnn
____Loading package: tensorflow/contrib/seq2seq
____Loading package: tensorflow/tensorboard/components
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 65,536 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 280,012 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 353,748 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 784,820 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,225,818 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,422,604 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,741,654 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,052,512 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,242,524 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,424,028 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,568,664 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,721,808 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,883,460 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,047,948 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,201,092 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,366,998 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,537,158 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,711,572 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,887,404 bytes
INFO: All external dependencies fetched successfully.
Configuration finished
inferno@hmstr:/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl$ nice bazel build --jobs 1 -c opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package
INFO: Found 1 target...
INFO: From Executing genrule //tensorflow/core:version_info_gen [for host]:
fatal: No names found, cannot describe anything.
INFO: From Executing genrule //tensorflow/core:version_info_gen:
fatal: No names found, cannot describe anything.
INFO: From ProtoCompile tensorflow/core/protobuf/master.pb.cc:
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
INFO: From ProtoCompile tensorflow/core/kernels/reader_base.pb.cc:
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/core/kernels/BUILD:282:1: undeclared inclusion(s) in rule '//tensorflow/core/kernels:reader_base_proto_cc':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/kernels/reader_base.pb.cc':
'/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/core/kernels/reader_base.pb.h'.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 737.955s, Critical Path: 11.33s
Ubuntu 16.04, bazel 0.4.2
NOTE: Only file GitHub issues for bugs and feature requests. All other topics will be closed.
For general support from the community, see StackOverflow.
To make bugs and feature requests more easy to find and organize, we close issues that are deemed
out of scope for GitHub Issues and point people to StackOverflow.
For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.
Operating System:
Ubuntu 16.04
Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
If installed from binary pip package, provide:
python -c "import tensorflow; print(tensorflow.__version__)"
If installed from source, provide:
git rev-parse HEAD
bazel version
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKcSt10shared_ptrINS0_6detail7contextEEb
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 72, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKcSt10shared_ptrINS0_6detail7contextEEb
(If logs are large, please upload as attachment or provide link).
Hi all,
Looking to see whether more than one SYCL device is supported at the moment. I can see that, in a from-source build of tensorflow-opencl, the SYCL device factory adds devices incrementally, but physical_device_desc: "device: 0, name SYCL, pci bus id: 0" is hardcoded, which makes me think only one SYCL device is currently supported.
All my GPUs show up fine in both computecpp_info and clinfo.
Does anyone know?
Thanks!
Operating System: Ubuntu 14.04
Computecpp version: 0.1.4
Thanks for the great work on tensorflow-opencl. It's really great.
I'm getting a runtime error for almost all tensorflow programs:
2017-03-18 12:52:52.241954: F ./tensorflow/core/framework/tensor.cc:488] Check failed: IsAligned()
Aborted (core dumped)
I have an Intel HD Graphics 5500 GPU with the Intel Broadwell i5 CPU x64. I'm using Intel's OpenCL drivers from here: https://software.intel.com/en-us/articles/opencl-drivers .
The OS is Ubuntu 16.04 LTS.
Python version is 3.5. It's running in a conda environment using Anaconda's versions of python, numpy, scipy, pyyaml, h5py, pandas, and jupyter.
However, the tensorflow pip package was compiled using Ubuntu's version of everything as per the compile from source instructions. I disabled Anaconda by removing it from ~/.bashrc, compiled the pip package, re-enabled Anaconda, activated the conda environment, and installed the pip package into the conda environment.
Here's the only tensorflow program I tried that did not fail:
import random
import sys
import tensorflow as tf
import time

random_number_generator = random.SystemRandom()
NUM_ROWS = 1024
NUM_COLUMNS = 1024

# Note: range(1, n - 1) yields n - 2 values, two fewer than the tensor holds.
a_array = []
for i in range(1, (NUM_ROWS * NUM_COLUMNS) - 1):
    a_array.append(random_number_generator.random())
b_array = []
for i in range(1, (NUM_ROWS * NUM_COLUMNS) - 1):
    b_array.append(random_number_generator.random())

# Creates a graph.
with tf.device('/device:SYCL:0'):
    a = tf.constant(a_array, shape=[NUM_ROWS, NUM_COLUMNS], name='a', dtype=tf.float64)
    b = tf.constant(b_array, shape=[NUM_COLUMNS, NUM_ROWS], name='b', dtype=tf.float64)
    c = tf.matmul(a, b)

sess = tf.Session()
start = time.time()
sess.run(c)
Changing NUM_ROWS and NUM_COLUMNS to even 1200 resulted in the error above.
I also installed keras into the same conda environment using pip install keras and ran this script: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py . This resulted in the same error: Check failed: IsAligned(). The error is displayed after Build model... is printed to the console.
git rev-parse HEAD: dda6b4ee253ca3016841ff60b16df4be40b5b052
...........
Build label: 0.4.5
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 16 12:19:38 2017 (1489666778)
Build timestamp: 1489666778
Build timestamp as int: 1489666778
Number of platforms 1
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL
Number of devices 2
Device Name Intel(R) HD Graphics
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.0
Driver Version r4.0.59481
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 24
Max clock frequency 900MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types by <unknown> (0x7FF200000000)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 13231777383 (12.32GiB)
Error Correction support No
Max memory allocation 4294959103 (4GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 4294959103 (4GiB)
Global Memory cache type Read/Write
Global Memory cache size 589824
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268434943 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 4294959103 (4GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 131072 (128KiB)
Max size 67108864 (64MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel
Motion Estimation accelerator version (Intel) 2
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_driver_diagnostics cl_intel_media_block_io cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp16 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir
Device Name Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.0 (Build 400)
Driver Version 1.2.0.400
Device OpenCL C Version OpenCL C 2.0
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 2200MHz
Device Partition (core)
Max number of sub-devices 4
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 16550207488 (15.41GiB)
Error Correction support No
Max memory allocation 4137551872 (3.853GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 0 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 65536 (64KiB)
Global Memory cache type Read/Write
Global Memory cache size 262144
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 258596992 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Max number of read/write image args 480
Max number of pipe args 16
Max active pipe reservations 65535
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 131072 (128KiB)
Max number of constant args 480
Max size of kernel argument 3840 (3.75KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 4294967295 (4GiB)
Max size 4294967295 (4GiB)
Max queues on device 4294967295
Max events on device 4294967295
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
********************************************************************************
ComputeCpp Info (CE 0.1.2)
********************************************************************************
Toolchain information:
GLIBCXX: 20150426
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 1 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Device not tested on this OS
CL_DEVICE_NAME : Intel(R) HD Graphics
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : r4.0.59481
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
********************************************************************************
********************************************************************************
********************************************************************************
Hey guys,
Perhaps it's a naive question, but could you tell me the difference between the CPU and GPU versions of this particular TF build? Which version do I need if I want to use TF with Intel CPU-based OpenCL? I have an i7 on Kaby Lake, which supports OpenCL 2.0.
I tried the GPU version first and, oddly, it wants libcublas.so.9.0 from Nvidia CUDA 9.0. The CPU version seems to be a plain CPU build.
Hello,
I followed this guide from end to end: https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl . It more or less worked (I tested multiplying small matrices), but when I run any of the TensorFlow examples ( https://github.com/aymericdamien/TensorFlow-Examples ) I get NaN for the training loss everywhere, and batch computation time is very long.
I have an AMD Radeon R9 290 on Ubuntu 14.04.
tensorflow-opencl$ ./configure
/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Using python library path: /usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] Y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/lib/ccache/g++
Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/lib/ccache/gcc
Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
........................
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:17:3: //external:eigen_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:17:3: //external:eigen_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:54:3: //external:gemmlowp: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:54:3: //external:gemmlowp: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:80:3: //external:highwayhash: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:80:3: //external:highwayhash: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:90:3: //external:nasm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:90:3: //external:nasm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:101:3: //external:jpeg: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:101:3: //external:jpeg: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:112:3: //external:png_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:112:3: //external:png_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:123:3: //external:gif_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:123:3: //external:gif_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:135:3: //external:six_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:135:3: //external:six_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:151:3: //external:protobuf: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:151:3: //external:protobuf: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:161:3: //external:gmock_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:161:3: //external:gmock_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:187:3: //external:pcre: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:187:3: //external:pcre: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:198:3: //external:swig: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:198:3: //external:swig: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:222:3: //external:grpc: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:222:3: //external:grpc: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:245:3: //external:linenoise: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:245:3: //external:linenoise: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:258:3: //external:llvm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:258:3: //external:llvm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:285:3: //external:boringssl: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:285:3: //external:boringssl: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:295:3: //external:nanopb_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:295:3: //external:nanopb_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:311:3: //external:zlib_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:311:3: //external:zlib_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': error loading package 'external': Could not load //external package.
ERROR: missing fetch expression. Type 'bazel help fetch' for syntax and help.
My system is Ubuntu 16.04
Bazel version 0.3.2
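The "no such attribute 'urls'" errors above are consistent with a Bazel release that predates the urls attribute on the http_archive rules; the other reports on this page were made with Bazel 0.4.3 and 0.4.5. A minimal sketch of checking the installed release before running ./configure, assuming the "Build label:" line format shown in the bazel version output elsewhere on this page. The helper names and the minimum version are hypothetical and chosen only for illustration; the real requirement should be taken from the repository's build instructions.

```python
import re

# Hypothetical threshold for illustration only; not taken from the repository.
HYPOTHETICAL_MINIMUM = (0, 4, 0)


def parse_bazel_label(version_output):
    """Return the release from a 'Build label:' line as a tuple of ints, or None."""
    match = re.search(r"^Build label:\s*(\d+(?:\.\d+)*)", version_output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(version_output, minimum=HYPOTHETICAL_MINIMUM):
    """Return True when the parsed release is at least `minimum`."""
    label = parse_bazel_label(version_output)
    return label is not None and label >= minimum


# Sample text copied from the 'bazel version' output format used on this page.
sample = "Build label: 0.4.5\nBuild timestamp: 1489666778\n"
print(parse_bazel_label(sample))
print(is_new_enough("Build label: 0.3.2\n"))
```

In practice the text passed to parse_bazel_label would be the captured output of running bazel version.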
env:
Installing the wheel fails with this error:
(py3) C:\Users\Kasim\Desktop>pip install tf_nightly_gpu-1.head-cp35-cp35m-win_amd64.whl
Processing c:\users\kasim\desktop\tf_nightly_gpu-1.head-cp35-cp35m-win_amd64.whl
Exception:
Traceback (most recent call last):
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\basecommand.py", line 215, in main
status = self.run(options, args)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\commands\install.py", line 335, in run
wb.build(autobuilding=True)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\wheel.py", line 749, in build
self.requirement_set.prepare_files(self.finder)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\req\req_set.py", line 620, in _prepare_file
session=self.session, hashes=hashes)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\download.py", line 809, in unpack_url
unpack_file_url(link, location, download_dir, hashes=hashes)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\download.py", line 715, in unpack_file_url
unpack_file(from_path, location, content_type, link)
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\utils\__init__.py", line 599, in unpack_file
flatten=not filename.endswith('.whl')
File "D:\Anaconda3\envs\py3\lib\site-packages\pip\utils\__init__.py", line 484, in unzip_file
zip = zipfile.ZipFile(zipfp, allowZip64=True)
File "D:\Anaconda3\envs\py3\lib\zipfile.py", line 1026, in __init__
self._RealGetContents()
File "D:\Anaconda3\envs\py3\lib\zipfile.py", line 1094, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
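The BadZipFile error means pip could not read the .whl file as a zip archive. One possible cause is an incomplete or failed download, for example an error page saved under the wheel's file name; that is an assumption, since the report does not say how the file was obtained. A minimal sketch for inspecting such a file with the standard library before passing it to pip (the function names are hypothetical):

```python
import zipfile


def looks_like_wheel(path):
    """Return True if `path` ends in .whl and is a readable zip archive."""
    return path.endswith(".whl") and zipfile.is_zipfile(path)


def describe_download(path, peek_bytes=64):
    """Return a short hint about what a downloaded .whl file actually contains."""
    if looks_like_wheel(path):
        return "valid zip archive"
    # Peek at the first bytes to see whether an HTML page was saved instead.
    with open(path, "rb") as handle:
        head = handle.read(peek_bytes)
    if head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return "HTML page saved under a .whl name"
    return "not a zip archive"
```

Calling describe_download on the file from the report would show whether re-downloading the wheel is the next thing to try.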
NOTE: Only file GitHub issues for bugs and feature requests. All other topics will be closed.
For general support from the community, see StackOverflow.
To make bugs and feature requests more easy to find and organize, we close issues that are deemed
out of scope for GitHub Issues and point people to StackOverflow.
When running convolutional.py in the mnist folder with Python3, I get the output
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': double free or corruption (fasttop): 0x00007f97ec005670 ***
*** Error in `python3': malloc(): memory corruption (fast): 0x00000000043b5ce0 ***
Aborted
When immediately running the script a second time I get another error:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': malloc(): memory corruption (fast): 0x00007fac377fd0d0 ***
*** Error in `python3': Segmentation fault
This looks like a memory-management problem that manifests nondeterministically. An uninitialized pointer, perhaps?
Does anyone know what causes this, or how I can find out?
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
This happens when importing tensorflow.
I'm working on Debian stable, where only cublas8 is available. I saw a similar issue for the tensorflow-gpu package, where the advice was to install version 1.4 instead. Is there a similar version of your package?
Anyway, thanks for everything you've made and you're doing! :D
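The ImportError names the exact shared library file the binary expects, so one quick check is whether the dynamic loader can find that file at all. A minimal sketch using ctypes from the standard library; the helper names are hypothetical, and the second library name in the usage note is only an assumed example of what a CUDA 8 install might provide.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def missing_libraries(names):
    """Return the subset of `names` that the loader cannot open."""
    return [name for name in names if not can_load(name)]
```

For example, missing_libraries(["libcublas.so.9.0", "libcublas.so.8.0"]) would list whichever of the two the loader cannot find on the current machine.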
Hi,
I've tried to build TensorFlow from source because I need OpenCL support.
The environment was set up successfully as described in the guide, but the build did not finish correctly.
I'm running Ubuntu 16.04 and have tried building with both GCC and Clang. The output provided here is from Clang.
If you need additional information, please ask; I really need TF to work on this computer.
Thank you
$ bazel version
Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885
Errors.txt
Hi, from the documentation I saw that OpenCL support is Linux-only and mainly tested on AMD GPUs. I'm wondering how much effort would be needed to (1) support ARM GPUs such as the Mali series and (2) run tensorflow-opencl on Android?
Please go to Stack Overflow for help and support:
https://stackoverflow.com/questions/tagged/tensorflow
If you open a GitHub issue, here is our policy:
Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
Hello again,
Unfortunately I'm still unable to run tensorflow. A simple code like:
import tensorflow as tf
tf.Session()
gives me:
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
terminate called after throwing an instance of 'cl::sycl::exception'
what(): Error: [ComputeCpp:RT0106] Device not found
Aborted (core dumped)
This is the output of computecpp_info:
ComputeCpp Info (CE 0.1.2)
********************************************************************************
Toolchain information:
GLIBCXX: 20150426
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 1 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
CL_DEVICE_VENDOR : AMD
CL_DRIVER_VERSION : 11.2.0
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
I believe "Device does not support SPIR" is bad news, although I hope not. Would you have any hints?
Should I move this issue to ComputeCpp's repository instead?
I'm running Ubuntu 16.04, and the output of bazel version is:
Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885
Thank you again :)
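For anyone comparing their own computecpp_info report with the one above, here is a minimal sketch that pulls out the devices not reported as supported. It assumes the "Device N:" and "key : value" layout shown in this report, and that a supported device's status starts with "YES" (an assumption, since only the "NO" case appears here). The function names are hypothetical.

```python
import re


def parse_devices(report):
    """Return one dict per 'Device N:' block found in the report text."""
    devices = []
    current = None
    for line in report.splitlines():
        line = line.strip()
        if re.match(r"^Device \d+:$", line):
            # A new device block starts here.
            current = {}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def unsupported(devices):
    """Return (device name, status text) for devices not marked supported."""
    result = []
    for dev in devices:
        status = dev.get("Device is supported", "")
        # Assumption: a supported device reports a status starting with "YES".
        if not status.upper().startswith("YES"):
            result.append((dev.get("CL_DEVICE_NAME", "<unknown>"), status))
    return result


# Device block copied from the report above.
SAMPLE = """\
Device 0:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
CL_DEVICE_VENDOR : AMD
CL_DRIVER_VERSION : 11.2.0
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
"""

if __name__ == "__main__":
    for name, reason in unsupported(parse_devices(SAMPLE)):
        print(name, "->", reason)
```

To check a real machine, the text captured from computecpp_info could be passed to parse_devices in place of SAMPLE.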
HTTP ERROR 404
Problem accessing /view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.0.0-cp35-cp35m-linux_x86_64.whl. Reason:
Not Found
The download link for the Linux GPU build is broken.
I followed the instructions as closely as possible (I'm using Python 3 and have an Intel GPU, if you can call it that). When I run the MNIST example (python3 convolutional.py from the terminal), I get this output:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-26 22:30:09: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-26 22:30:09: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
terminate called after throwing an instance of 'cl::sycl::cl_exception'
what(): Error: [ComputeCpp:RT0408] Error querying the number of OpenCL platforms in the system
Aborted
Why do I get this error (googling for the error message yields zero hits so I don't know where it is generated) and what can I do to fix it?
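One thing that may be worth checking for an error like this: on Linux, the standard OpenCL ICD loader discovers platforms through the .icd files in /etc/OpenCL/vendors. Whether a missing or broken ICD file is the cause here is an assumption, not something the log confirms. A minimal sketch that lists what is registered (the function name is hypothetical):

```python
import glob
import os


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the driver library it points at."""
    found = {}
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path) as handle:
            # Each .icd file holds the name or path of a vendor's OpenCL library.
            found[os.path.basename(path)] = handle.read().strip()
    return found


if __name__ == "__main__":
    icds = list_icds()
    if not icds:
        print("No ICD files found, so the loader has no platforms to report.")
    for name, library in icds.items():
        print(name, "->", library)
```

If the directory is empty, installing the vendor's OpenCL runtime (for Intel GPUs, for example, Beignet) would normally add an entry there.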
Compiles fine with
bazel build --copt=-march=native --local_resources 2048,.5,1.0 -c opt //tensorflow/tools/pip_package:build_pip_package
but when I run TensorFlow it warns that my CPU supports AVX2 and the build does not use it, so I run
bazel build --copt=-mavx2 --copt=-march=native --local_resources 2048,.5,1.0 -c opt //tensorflow/tools/pip_package:build_pip_package
And I get the following error:
ERROR: /home/ben/tensorflow-opencl/tensorflow/core/kernels/BUILD:828:1: C++ compilation of rule '//tensorflow/core/kernels:gather_functor' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-canonical-system-headers ... (remaining 51 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from ./tensorflow/core/framework/numeric_types.h:25:0,
from ./tensorflow/core/framework/type_traits.h:22,
from ./tensorflow/core/kernels/gather_functor.h:22,
from tensorflow/core/kernels/gather_functor.cc:50:
./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:42:52: fatal error: src/Tensor/TensorContractionThreadPool.h: No such file or directory
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
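To see which of the instruction sets named in those warnings the CPU actually reports, a minimal sketch that reads the "flags" line of /proc/cpuinfo may help. It assumes an x86 Linux machine, where the kernel lists features under the names sse4_2, avx, avx2 and fma. The function names are hypothetical.

```python
import os

# Kernel flag names for SSE4.2, AVX, AVX2 and FMA on x86 Linux.
WANTED = ("sse4_2", "avx", "avx2", "fma")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=WANTED):
    """Map each wanted flag name to True if the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        for name, present in report(handle.read()).items():
            print(name, "yes" if present else "no")
```

With GCC, -march=native already enables every instruction set the build machine supports, so adding -mavx2 on top of it should not normally be needed.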
I have compiled amdgpu-pro for Gentoo, installed the latest eselect-opencl for the OpenCL 1.2 headers, and compiled and ran some OpenCL samples (they worked!). I then configured TensorFlow for OpenCL with g++-5.4.0 and gcc-5.4.0 and ran the command:
bazel test -c opt --config=sycl --verbose_failures --test_timeout 3600 //tensorflow/python/kernel_tests:basic_gpu_test
and after compilation it failed with an error:
.terminate called after throwing an instance of 'cl::sycl::cl_exception'
what(): Error: [ComputeCpp:RT0107] Failed to create program from binary
external/bazel_tools/tools/test/test-setup.sh: line 114: 3209 Aborted (core dumped) "${TEST_PATH}" "$@"
Amazing work! Will you guys also support GPU (OpenCL) support for Mac?
What's the status of this? I'm building it from source (since I'm guessing that's the only option), but I'm not sure I'm doing it correctly. I know you have to add --config=cuda when building for CUDA, but I'm not sure whether there is a special config for OpenCL. Also, I'm planning to use this with my Intel graphics GPU; will that be supported if I use Beignet?
I wonder if you could help me with the following error when trying to import tensorflow freshly built from commit 49359bd.
ImportError: libComputeCpp.so: cannot open shared object file: No such file or directory
Everything seemed to work just fine during building and installing. Something like:
./configure
> Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.5
> Please specify optimization flags to use during compilation [Default is -march=native]:
> Do you wish to use jemalloc as the malloc implementation? [Y/n]
> jemalloc enabled
> Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
> No Google Cloud Platform support will be enabled for TensorFlow
> Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
> No Hadoop File System support will be enabled for TensorFlow
> Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
> No XLA JIT support will be enabled for TensorFlow
> Found possible Python library paths:
> /usr/local/lib/python3.5/dist-packages
> /usr/lib/python3/dist-packages
> Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
>
> Using python library path: /usr/local/lib/python3.5/dist-packages
> Do you wish to build TensorFlow with OpenCL support? [y/N] y
> OpenCL support will be enabled for TensorFlow
> Do you wish to build TensorFlow with CUDA support? [y/N] N
> No CUDA support will be enabled for TensorFlow
> Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/bin/gcc
> Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/bin/gcc
> Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]:
bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip3 install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp35-cp35m-linux_x86_64.whl
I'm running Ubuntu 16.04, and the output of bazel version is:
Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885
Thanks!
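An ImportError like the one above usually means the dynamic loader cannot find the shared library at run time, even though the build found it. A minimal sketch for checking where libComputeCpp.so is visible follows. It only looks at LD_LIBRARY_PATH plus any extra directories passed in; it does not consult the ldconfig cache. The path /usr/local/computecpp is the default offered by ./configure above, while the "lib" subdirectory is an assumption about the install layout. The function name is hypothetical.

```python
import os


def find_library(name, extra_dirs=()):
    """Return the directories, in search order, that contain `name`."""
    env = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in env.split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return [d for d in dirs if os.path.isfile(os.path.join(d, name))]


if __name__ == "__main__":
    # Assumption: the library lives in the "lib" subdirectory of the
    # ComputeCpp install location given to ./configure.
    hits = find_library("libComputeCpp.so", ["/usr/local/computecpp/lib"])
    if hits:
        print("Found in:", ", ".join(hits))
    else:
        print("libComputeCpp.so not found in LD_LIBRARY_PATH or the default location")
```

If the library only shows up in the extra directory, adding that directory to LD_LIBRARY_PATH before starting Python is one way to make it visible to the loader.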
System Info:
OS: Ubuntu 16.04
CUDA version: None
GPU: Advanced Micro Devices, Inc. [AMD/ATI] Richland [Radeon HD 8650G]
TensorFlow Build: Source
I ran the bazel gpu test with the command:
bazel test -c opt --verbose_failures --test_timeout 3600 //tensorflow/python/kernel_tests:basic_gpu_test
which it passed.
However, I am trying to run someone else's code found here:
https://github.com/cysmith/neural-style-tf
The command I run is:
bash stylize_image.sh ./image_input/lion.jpg ./styles/kandinsky.jpg
which gives me the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Variable': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Variable = VariableV2[container="", dtype=DT_FLOAT, shape=[1,512,512,3], shared_name="", _device="/device:GPU:0"]()]]
How should I change the code to get it to work with my AMD GPU?
Can you add a Dockerfile for an easier build?
Similar to this:
https://github.com/benoitsteiner/tensorflow-opencl/tree/master/tensorflow/tools/docker
I would like to use my RX 480 GPU with TensorFlow, but there are no prebuilt pip packages for installing tensorflow-opencl on Windows 10, and building from source looks very hard. Could someone make a prebuilt pip package for Windows 10? That would be awesome.
Very confused. How does one install and test this? It's basically just a big old directory with lots of files, and the README is just a copy of the TensorFlow one, which tells you how to install TensorFlow, not tensorflow-opencl.
Hi,
I am trying to compile the TensorFlow source with the OpenCL 'sycl' config. I got the following error:
In file included from <command-line>:0:0:
./bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/relu_op/tensorflow/core/kernels/relu_op.pic.sycl:8149:22: note: 'const bool cl::sycl::detail::{anonymous}::arg_used [16]' previously defined here
const bool kernel_info< ::Eigen::TensorSycl::ExecExprFunctorKernel<const ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor<float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_product_op<const float, const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorConversionOp<float, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_cmp_op<const float, const float, ::Eigen::internal::ComparisonName::cmp_GT>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseNullaryOp< ::Eigen::internal::scalar_constant_op<const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer> > > > > >, ::Eigen::TensorSycl::internal::FunctorExtractor< ::Eigen::TensorEvaluator<const ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor<float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_product_op<const float, const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorConversionOp<float, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_cmp_op<const float, const float, ::Eigen::internal::ComparisonName::cmp_GT>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseNullaryOp< ::Eigen::internal::scalar_constant_op<const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer> > > > > >, const ::Eigen::SyclDevice> >, ::utility::tuple::Tuple< ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read_write, ::cl::sycl::access::target::global_buffer>, ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer>, 
::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer>, ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer> > > >::arg_used[] = {
^
In file included from <command-line>:0:0:
./bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/relu_op/tensorflow/core/kernels/relu_op.pic.sycl:8012:83: error: template parameter 'int cmp'
template <typename LhsScalar, typename RhsScalar, Eigen::internal::ComparisonName cmp> struct scalar_cmp_op;
^
In file included from external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:423:0,
from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14,
from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
from ./tensorflow/core/kernels/relu_op.h:23,
from tensorflow/core/kernels/relu_op.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/functors/BinaryFunctors.h:190:77: error: redeclared here as 'Eigen::internal::ComparisonName cmp'
template<typename LhsScalar, typename RhsScalar, ComparisonName cmp> struct scalar_cmp_op;
^
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4754.029s, Critical Path: 191.56s
I have used the following command to initiate compilation:
bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
I have an AMD Radeon 7670M GPU. I am using Ubuntu 14.04.5 LTS and have the latest ComputeCpp. My AMD drivers are working correctly (the sole reason why I am using 14.04.5 :) ).
~/TensorFlow/Packages/tensorflow$ glxinfo | grep OpenGL
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7600M Series
OpenGL core profile version string: 4.3.13416 Core Profile Context 15.302
OpenGL core profile shading language version string: 4.40
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5.13416 Compatibility Profile Context 15.302
OpenGL shading language version string: 4.40
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
$ bazel test //tensorflow/python/kernel_tests:basic_gpu_test
INFO: Found 1 test target...
Target //tensorflow/python/kernel_tests:basic_gpu_test up-to-date:
bazel-bin/tensorflow/python/kernel_tests/basic_gpu_test
INFO: Elapsed time: 2.213s, Critical Path: 1.83s
//tensorflow/python/kernel_tests:basic_gpu_test PASSED in 1.8s
Executed 1 out of 1 test: 1 test passes.
Any help will be appreciated.
Thanks.
I get the following error from bazel when trying to compile tensorflow with OpenCL support. Using bazel with -c opt and --config=sycl. Other details:
ERROR: /home/smistad/workspace/tensorflow/tensorflow/cc/BUILD:120:1: C++ compilation of rule '//tensorflow/cc:scope' failed: computecpp failed: error executing command external/local_config_sycl/crosstool/computecpp -Wall '-std=c++11' -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/cc/_objs/scope/tensorflow/cc/framework/scope.pic.d ... (remaining 97 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
tensorflow/cc/framework/scope.cc:26:8: error: constructor for 'tensorflow::Scope' must explicitly initialize the const member 'colocation_constraints_'
Scope::Scope(Graph* graph, Status* status, Scope::NameMap* name_map,
^
./tensorflow/cc/framework/scope.h:255:36: note: 'colocation_constraints_' declared here
const std::unordered_set<string> colocation_constraints_;
^
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu):14.04.3-->Trusty
TensorFlow installed from (source or binary):Source
TensorFlow version (use command below):1.0 (Steps-> Downloaded tensorflow from https://github.com/benoitsteiner/tensorflow-opencl, ./configure - to configure project)
Bazel version (if compiling from source):0.5.4
CUDA/cuDNN version:NA
OPENCL Version:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.11)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
GPU model and memory:
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Board name: AMD Radeon (TM) R5 M335
Memory: 4096M
Exact command to reproduce:
run the python script -- ipython keras_code.py
G++/GCC version:
g++-4.8
I have compiled CPP programs, they work fine.
ComputeCPP: 0.3.4
Python: I am using the Anaconda distribution of Python 2.7.2 (Anaconda 2.4.3).
Describe the problem:
I have tried installing as per the steps given in:
https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl
I have cloned the following Git repo:
git clone https://github.com/benoitsteiner/tensorflow-opencl.git
Everything works fine until the compile step, where I use the following command:
bazel build --local_resources 2048,.5,1.0 -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
================
Error
ERROR: /home/sayantan/Downloads/tensorflow-opencl/tensorflow/core/kernels/BUILD:2512:1: C++ compilation of rule '//tensorflow/core/kernels:softmax_op' failed (Exit 1).
In file included from tensorflow/core/kernels/softmax_op.cc:20:
In file included from ./tensorflow/core/kernels/softmax_op.h:23:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14:
In file included from external/eigen_archive/Eigen/Core:299:
In file included from external/local_config_sycl/crosstool/../sycl/include/SYCL/sycl.hpp:20:
In file included from external/local_config_sycl/crosstool/../sycl/include/SYCL/sycl_interface.h:54:
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:342:3: error: multiple overloads of 'global_ptr' instantiate to the same signature 'void (pointer_t)' (aka 'void (__attribute__((address_space(1))) float *)')
global_ptr(pointer_t ptr) : Base(ptr) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:42:74: note: in instantiation of template class 'cl::sycl::global_ptr<__attribute__((address_space(1))) float>' requested here
EvalToLHSConstructor(const utility::tuple::Tuple<Params...> &t) : expr(ConvertToActualTypeSycl(typename Eigen::internal::remove_all::type, utility::tuple::get(t))) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceSycl.h:31:88: note: expanded from macro 'ConvertToActualTypeSycl'
#define ConvertToActualTypeSycl(Scalar, buf_acc) reinterpret_cast<typename cl::sycl::global_ptr::pointer_t>((&(*buf_acc.get_pointer())))
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:226:1: note: in instantiation of member function 'Eigen::TensorSycl::internal::EvalToLHSConstructor<__attribute__((address_space(1))) float *, 0, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::write, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t> >::EvalToLHSConstructor' requested here
EVALTO(const)
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:223:41: note: expanded from macro 'EVALTO'
: nestedExpression(funcD.xprExpr, t), buffer(t), expr(buffer.expr, nestedExpression.expr) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:506:10: note: in instantiation of function template specialization 'Eigen::TensorSycl::internal::ExprConstructor<const Eigen::TensorEvalToOp<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakeGlobalPointer>, MakeGlobalPointer>, MakeGlobalPointer>, const Eigen::TensorEvalToOp<const Eigen::TensorSycl::internal::PlaceHolder<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakePointer>, MakePointer>, 1>, MakePointer>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::write, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t> >::ExprConstructor<Eigen::TensorSycl::internal::FunctorExtractor<Eigen::TensorEvaluator<const Eigen::TensorEvalToOp<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakePointer>, MakePointer>, MakePointer>, const Eigen::SyclDevice> > >' requested here
return ExprConstructor<OrigExpr, IndexExpr, Params...>(funcD, t);
.....
2 errors generated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3715.231s, Critical Path: 167.06s
Can you please help?
Thanks and regards
Sayantan Raha