benoitsteiner / tensorflow-opencl

OpenCL support for TensorFlow

License: Apache License 2.0

Python 44.45% C++ 48.26% C 0.31% CMake 1.02% Objective-C 0.01% Objective-C++ 0.10% Makefile 0.06% Shell 0.59% Java 0.69% Jupyter Notebook 2.87% Go 1.56% Batchfile 0.01% LLVM 0.01% Ruby 0.01% Perl 0.01% PureBasic 0.04%

tensorflow-opencl's Issues

Issue with running Tensorflow with OpenCL - Ubuntu 14.04.3 (Trusty) - AMD R5 Radeon M335 GPU

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 14.04.3 (Trusty)
TensorFlow installed from (source or binary): Source
TensorFlow version (use command below): 1.0 (steps: downloaded TensorFlow from https://github.com/benoitsteiner/tensorflow-opencl, then ran ./configure to configure the project)
Bazel version (if compiling from source): 0.4.5
CUDA/cuDNN version: NA
OpenCL version:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.11)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
GPU model and memory:
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Board name: AMD Radeon (TM) R5 M335
Memory: 4096M
Exact command to reproduce:
Run the Python script: ipython keras_code.py
G++/GCC version:
g++-4.9 (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4

I have compiled C++ programs, and they work fine.

ComputeCpp: 0.1.1

Python: I am using the Anaconda distribution of Python 2.7.2 (Anaconda 2.4.3).

Describe the problem

I have compiled TensorFlow and deployed it with no issues. When I try to run the code, I run into the following error:

2017-04-23 14:01:15.180795: W ./tensorflow/core/common_runtime/sycl/sycl_util.h:44] No OpenCL GPU found that is supported by ComputeCpp, trying OpenCL CPU
2017-04-23 14:01:15.180843: F ./tensorflow/core/common_runtime/sycl/sycl_util.h:53] No OpenCL GPU nor CPU found that is supported by ComputeCpp
Aborted (core dumped)
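
The two log lines above follow a `timestamp: severity file:line] message` layout, where `W` marks a warning and `F` a fatal error. A minimal sketch, assuming every line of interest keeps that layout, for pulling out the severity and source location when sifting through longer logs (`parse_tf_log_line` is a hypothetical helper, not part of TensorFlow):

```python
import re

# Pattern assumed from the two log lines quoted in this issue:
# "<date> <time>: <severity letter> <file>:<line>] <message>"
_LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<sev>[IWEF]) (?P<file>\S+):(?P<line>\d+)\] (?P<msg>.*)$"
)

_SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_tf_log_line(line):
    """Return a dict for a TensorFlow-style log line, or None if it does not match."""
    m = _LOG_RE.match(line.strip())
    if m is None:
        return None
    return {
        "severity": _SEVERITY[m.group("sev")],
        "file": m.group("file"),
        "line": int(m.group("line")),
        "message": m.group("msg"),
    }
```

Lines that do not fit the pattern, such as `Aborted (core dumped)`, yield `None` and can be skipped.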

I have attached the code file. Please note this is a simplified version of the file. The logic is:

Read data from files.
Pass it through a neural network.
I am using Keras as the functional programming API on top of TensorFlow.

tensorflow-code-throwing-error.txt
Please let me know if there are any fixes or if I can do something to get round this issue.
Thanks and regards
Sayantan

Issues getting R9 390x working with OpenCL using the Codeplay tutorial.

Hey everyone,

The post is mostly self-explanatory: I've been struggling to get OpenCL working on the AMD R9 390x while following the tutorial here. I get this error when I try to run clinfo:

modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='fglrx'
modprobe: ERROR: could not insert 'fglrx': Function not implemented
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='fglrx'
modprobe: ERROR: could not insert 'fglrx': Function not implemented
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 156 ()
Minor opcode of failed request: 19
Serial number of failed request: 12
Current serial number in output stream: 12

Was a pain to finally get even the fglrx drivers to not be buggy as hell and stop throwing up errors in general, which I'm sure is going to be related somehow. What worked for me was the purging and installation methods found at this post. Everything else I followed directly from the tutorial.

Does anyone have any ideas as to where I should start looking? I know I have fglrx installed, so I'm really not sure what is up. Related, but less important: my 3 displays are not working properly either.

Please let me know which logs would be useful so I can provide some helpful info. Thank you!

Unsuccessful Build on A10-7850K, please help!

This is a follow-up to a previous message. I am encountering build errors and can't seem to find their source.

I followed the steps that I believe your colleague posted here:

https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl

I deviated from these instructions in the following ways:

I did not execute the following steps:

$ sudo apt-get install linux-image-3.19.0-79-generic linux-image-extra-3.19.0-79-generic linux-headers-3.19.0-79-generic 
$ sudo apt-get remove linux-image-4.2.0-42-generic 
$ sudo update-grub -

I was not sure why it is important to move to that particular kernel, so I did not upgrade the kernel. This is the version of Ubuntu I am using:

Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

I am using the following kernel, which is part of this standard Ubuntu 14.04.5 build:

3.13.0-116-generic

I used Python 3.5 inside a conda environment instead of Python 2.7
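
For reference, the tutorial's commands install kernel 3.19.0-79, while this machine runs 3.13.0-116. A small sketch of comparing such Ubuntu kernel release strings numerically, since a plain string comparison can mis-order them. `kernel_tuple` is a hypothetical helper and assumes the `major.minor.patch-abi[-flavour]` form shown above:

```python
import re


def kernel_tuple(release):
    """Turn '3.13.0-116-generic' into (3, 13, 0, 116) for numeric comparison."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


running = kernel_tuple("3.13.0-116-generic")   # kernel reported in this issue
tutorial = kernel_tuple("3.19.0-79-generic")   # kernel installed by the tutorial
print("running kernel is older than the tutorial's:", running < tutorial)
```

On a live system, `platform.release()` from the standard library returns the string to feed into `kernel_tuple`.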

clinfo gives the following info:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon(TM) R7 Graphics  
  Device Topology:				 PCI[ B#0, D#1, F#0 ]
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 720Mhz
  Address bits:					 64
  Max memory allocation:			 215482368
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 64
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 861929472
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 215482368
  Max global variable size:			 193934080
  Max global variable preferred total size:	 861929472
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7f1c77535a18
  Name:						 Spectre
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 1912.5 (VM)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 AMD-APP (1912.5)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes 


  Device Type:					 CL_DEVICE_TYPE_CPU
  Vendor ID:					 1002h
  Board name:					 
  Max compute units:				 4
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 1024
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 8
  Preferred vector width double:		 4
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 8
  Native vector width double:			 4
  Max clock frequency:				 3700Mhz
  Address bits:					 64
  Max memory allocation:			 2147483648
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 64
  Max image 2D width:				 8192
  Max image 2D height:				 8192
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 4096
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 7182524416
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 32768
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 2147483648
  Max global variable size:			 1879048192
  Max global variable preferred total size:	 1879048192
  Max read/write image args:			 64
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 1
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f1c77535a18
  Name:						 AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
  Vendor:					 AuthenticAMD
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 1912.5 (sse2,avx,fma4)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (1912.5)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
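
The listing above is long, and the part that matters for this issue is which devices exist and of what type. A minimal sketch, assuming `clinfo` output in the `Key: value` layout shown, that extracts each device's type and name (`summarize_clinfo` is a hypothetical helper, and the sample text is abbreviated from the listing):

```python
def summarize_clinfo(text):
    """Return a list of (device_type, name) pairs from clinfo-style output."""
    devices = []
    current_type = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # no "Key: value" structure on this line
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            current_type = value
        elif key == "Name" and current_type is not None:
            # "Name" closes a device block; "Platform Name" and
            # "Board name" have different keys and are ignored.
            devices.append((current_type, value))
            current_type = None
    return devices


sample = """\
  Platform Name:   AMD Accelerated Parallel Processing
  Device Type:     CL_DEVICE_TYPE_GPU
  Board name:      AMD Radeon(TM) R7 Graphics
  Name:            Spectre
  Device Type:     CL_DEVICE_TYPE_CPU
  Name:            AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
"""
for device_type, name in summarize_clinfo(sample):
    print(device_type, "->", name)
```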

/usr/local/computecpp/bin/computecpp_info gives the following output

********************************************************************************

ComputeCpp Info (CE 0.1.3)

********************************************************************************

Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Device not tested on this OS
  CL_DEVICE_NAME                          : Spectre
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1912.5 (VM)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
********************************************************************************

********************************************************************************

********************************************************************************

I note here that the CPU was somehow not detected, which differs from the tutorial mentioned above.
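
To make that comparison concrete: `computecpp_info` lists only the GPU (Spectre), marked UNTESTED, whereas `clinfo` also lists a CPU device. A sketch, assuming the `KEY : value` layout shown in the output above, that collects each device's support status so it can be set against the `clinfo` device list (`computecpp_devices` is a hypothetical name):

```python
def computecpp_devices(text):
    """Map each CL_DEVICE_NAME to the first word of its 'Device is supported' value."""
    result = {}
    status = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Device is supported":
            # Keep only the leading word, e.g. "UNTESTED".
            status = value.split()[0] if value else ""
        elif key == "CL_DEVICE_NAME" and status is not None:
            result[value] = status
            status = None
    return result


report = """\
Device 0:

  Device is supported                     : UNTESTED - Device not tested on this OS
  CL_DEVICE_NAME                          : Spectre
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
"""
print(computecpp_devices(report))
```

Any device name that `clinfo` reports but that is missing from this mapping is one ComputeCpp did not discover.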

After configuring with default options, I run the following command:

$ bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 --config=sycl //tensorflow/tools/pip_package:build_pip_package --verbose_failures

I am encountering the following error:

INFO: Found 1 target...
INFO: From Executing genrule //tensorflow/cc:array_ops_genrule:
2017-04-18 22:42:10.696714: W tensorflow/core/framework/op_gen_lib.cc:194] Squeeze can't find input squeeze_dims to rename
ERROR: /home/anthonyle/Projects/tensorflow-opencl/tensorflow/core/kernels/BUILD:2616:1: C++ compilation of rule '//tensorflow/core/kernels:pooling_ops' failed: computecpp failed: error executing command 
  (cd /home/anthonyle/.cache/bazel/_bazel_anthonyle/1b4b305bac04d7a568c973de167c2cf3/execroot/tensorflow-opencl && \
  exec env - \
  external/local_config_sycl/crosstool/computecpp -Wall -msse3 -g0 -O2 -DNDEBUG -mavx -msse4.1 -msse4.2 '-std=c++11' -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.d '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DTENSORFLOW_USE_JEMALLOC -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/local_config_sycl -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl -iquote external/jemalloc -iquote bazel-out/local_linux-py3-opt/genfiles/external/jemalloc -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/local_config_sycl/sycl -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl/sycl -isystem 
external/local_config_sycl/sycl/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl/sycl/include -isystem external/jemalloc/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/jemalloc/include -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/gif_archive/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions -msse3 -pthread -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/core/kernels/pooling_ops_3d.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/pooling_ops/tensorflow/core/kernels/pooling_ops_3d.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from tensorflow/core/kernels/pooling_ops_3d.cc:26:
./tensorflow/core/kernels/eigen_pooling.h:354:9: error: cannot compile this builtin function yet
        pequal(p, pset1<Packet>(-Eigen::NumTraits<T>::highest()));
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./tensorflow/core/kernels/eigen_pooling.h:337:22: note: expanded from macro 'pequal'
#define pequal(a, b) _mm256_cmp_ps(a, b, _CMP_EQ_UQ)
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/computecpp/bin/../lib/clang/3.6.0/include/avxintrin.h:421:11: note: expanded from macro '_mm256_cmp_ps'
  (__m256)__builtin_ia32_cmpps256((__v8sf)__a, (__v8sf)__b, (c)); })
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 20.702s, Critical Path: 20.03s

Did skipping some of the steps outlined above really lead to these errors? What did I do wrong?
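
One observation from the log itself: the failing expression expands to `_mm256_cmp_ps`, an AVX intrinsic, and the build was invoked with `--copt=-mavx`. An experiment worth trying, offered as an assumption rather than a confirmed fix, is to rerun the same build without that flag. A sketch of filtering it from the command quoted above (`drop_flags` is a hypothetical helper; it only removes flags that are already in the original command):

```python
import shlex


def drop_flags(command, unwanted):
    """Return the command line with every argument listed in `unwanted` removed."""
    args = shlex.split(command)
    kept = [arg for arg in args if arg not in unwanted]
    return " ".join(shlex.quote(arg) for arg in kept)


original = (
    "bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 "
    "--config=sycl //tensorflow/tools/pip_package:build_pip_package "
    "--verbose_failures"
)
# Assumption: only the AVX flag is implicated by the error above.
print(drop_flags(original, {"--copt=-mavx"}))
```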

Compilation fails at Eigen, compiler finds an error

Environment info

Operating System: Antergos x64

Here is the error:
ERROR: /home/samyr/programming/tools/tensorflow-opencl/tensorflow/core/kernels/BUILD:881:1: C++ compilation of rule '//tensorflow/core/kernels:gather_functor' failed (Exit 1).
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:141:0,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
                 from ./tensorflow/core/kernels/gather_functor.h:19,
                 from tensorflow/core/kernels/gather_functor.cc:50:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h: In constructor 'Eigen::TensorEvaluator<const Eigen::TensorGeneratorOp<Generator, XprType>, Device>::TensorEvaluator(const XprType&, const Device&)':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h:100:38: error: class 'Eigen::TensorEvaluator<const Eigen::TensorGeneratorOp<Generator, XprType>, Device>' does not have any field named 'm_argImpl'
      : m_generator(op.generator()), m_argImpl(op.expression(), device)
                                     ^~~~~~~~~
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h:102:20: error: 'm_argImpl' was not declared in this scope
    m_dimensions = m_argImpl.dimensions();
                   ^~~~~~~~~
Installation has been done from source, using OpenCL. GCC version is 7.1.1

What other attempted solutions have you tried?

I have tried to download the source a second time, I have also tried to install g++, as Arch Linux only comes with GCC.

What's up with the Windows build?

Is the Windows build dead?

How to compile for AMD GPU?

What options need to be selected when running ./configure in the tensorflow-opencl directory? Here's where I'm not sure what to enter in order to get TensorFlow to recognize and use the AMD GPU/APU.
`~/tensorflow-opencl ~/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]

Using python library path: /usr/lib/python3/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
`
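
In the transcript above, both the OpenCL and the CUDA prompts were answered `y`. CUDA targets NVIDIA hardware, so for an AMD GPU it is presumably the OpenCL answer that matters. A sketch that summarises which features a `./configure` transcript enabled, assuming the confirmation lines keep the wording shown above (`enabled_features` is a hypothetical helper):

```python
import re

# Matches the confirmation lines printed after each prompt, e.g.
# "OpenCL support will be enabled for TensorFlow" or
# "No Hadoop File System support will be enabled for TensorFlow".
_CONFIRM_RE = re.compile(r"^(No )?(.+?) support will be enabled for TensorFlow$")


def enabled_features(transcript):
    """Return {feature name: True/False} from a ./configure transcript."""
    features = {}
    for line in transcript.splitlines():
        m = _CONFIRM_RE.match(line.strip())
        if m:
            features[m.group(2)] = m.group(1) is None
    return features


transcript = """\
No Google Cloud Platform support will be enabled for TensorFlow
No Hadoop File System support will be enabled for TensorFlow
OpenCL support will be enabled for TensorFlow
CUDA support will be enabled for TensorFlow
"""
print(enabled_features(transcript))
```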

404 Error Downloading Nightly

I'm getting a 404 Not Found error when trying to access the nightly build download:

https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-0.12.1-py2-none-any.whl

Here's the error:

HTTP ERROR 404

Problem accessing /view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-0.12.1-py2-none-any.whl. Reason:

    Not Found

Powered by Jetty://

Linked from: https://github.com/benoitsteiner/tensorflow-opencl

People who are a little more adventurous can also try our nightly binaries:

Linux CPU-only: Python 2 (build history) / Python 3.4 (build history) / Python 3.5 (build history)
Linux GPU: Python 2 (build history) / Python 3.4 (build history) / Python 3.5 (build history)
Mac CPU-only: Python 2 (build history) / Python 3 (build history)

--> Mac GPU: Python 2 (build history) / Python 3 (build history)

Android: demo APK, native libs (build history)

build tensorflow with sycl

Hi,
I am trying to build TensorFlow with SYCL, but I get the error below:

external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:342:3: error: multiple overloads of 'global_ptr' instantiate to the same signature 'void (pointer_t)' (aka 'void (__attribute__((address_space(1))) float *)')
global_ptr(pointer_t ptr) : Base(ptr) {}
^
...
^
tensorflow/core/kernels/adjust_contrast_op.cc:427:44: note: in instantiation of member function 'tensorflow::functor::AdjustContrastv2Eigen::SyclDevice::operator()' requested here
functor::AdjustContrastv2()(
^
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:334:3: note: previous declaration is here
global_ptr(dataType *ptr);
^
It seems global_ptr is being redefined. How can I fix it?

My build command is:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

The error is the same with Python 2.7 and 3.6.

The error is the same with gcc 4.85 and 5.x.

Thanks a lot!

ImportError: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKhiPPKcSt10shared_ptrINS0_6detail7contextEE

My build goes fine but when I import tensorflow I get the following. I'm doing this on Ubuntu 16.04 but I see this type of message on Gentoo when one of the packages has been compiled with gcc4.x and the one you're compiling was done with gcc5.x. I just checked and the version I compile without OpenCL works just fine. Any ideas?

ImportError: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKhiPPKcSt10shared_ptrINS0_6detail7contextEE
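
The mangled name itself carries a hint: it contains `__cxx11`, the namespace libstdc++ uses for the `std::string` layout introduced with GCC 5 (the ABI selected by `_GLIBCXX_USE_CXX11_ABI=1`). That would be consistent with the GCC 4.x versus 5.x mismatch suspected above, though it does not prove it. A minimal sketch that reads the leading nested name out of such a symbol and flags the ABI marker. It handles only the simple `_ZN<len><name>...` prefix, not the full Itanium mangling scheme, and `describe_symbol` is a hypothetical helper:

```python
import re


def describe_symbol(mangled):
    """Return (qualified_name, uses_cxx11_abi) for a simple nested mangled name."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    parts = []
    while True:
        # Each component is encoded as <decimal length><identifier>.
        m = re.match(r"\d+", mangled[pos:])
        if m is None:
            break
        length = int(m.group())
        start = pos + m.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts), "__cxx11" in mangled


symbol = (
    "_ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx11"
    "12basic_stringIcSt11char_traitsIcESaIcEEEPKhiPPKcSt10shared_ptr"
    "INS0_6detail7contextEE"
)
name, cxx11 = describe_symbol(symbol)
print(name, "| references the __cxx11 string ABI:", cxx11)
```

For a complete demangling, the `c++filt` tool from binutils is the usual choice.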

Any way to run on the AMDGPU-PRO driver?

I've been trying to install this by following the guide here. Now the issue seems to be with installing the fglrx driver, but my GPU is no longer supported (RX460). I tried to install the fglrx driver anyway on Ubuntu 14.04.1 and 14.04.5 but as expected, it didn't work. And following the guide while skipping the fglrx installation part still doesn't work.

Can someone please confirm whether there is any way to get this to work on an AMD GPU that doesn't support fglrx? So far I've spent two days trying to get this to work.

Can't build pip_package

tensorflow-opencl$ bazel clean
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
inferno@hmstr:/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl$ ./configure 
/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]

Using python library path: /usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] Y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/bin/g++
Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/bin/gcc
Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]: 
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
..........................
____Loading package: tensorflow/contrib/deprecated
____Loading package: tensorflow/core/platform/default/gpu
____Loading package: tensorflow/core/ops/compat
____Loading package: tensorflow/contrib/cudnn_rnn
____Loading package: tensorflow/contrib/seq2seq
____Loading package: tensorflow/tensorboard/components
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 65,536 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 280,012 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 353,748 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 784,820 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,225,818 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,422,604 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,741,654 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,052,512 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,242,524 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,424,028 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,568,664 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,721,808 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,883,460 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,047,948 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,201,092 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,366,998 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,537,158 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,711,572 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,887,404 bytes
INFO: All external dependencies fetched successfully.
Configuration finished
inferno@hmstr:/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl$ nice bazel build --jobs 1 -c opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package
INFO: Found 1 target...
INFO: From Executing genrule //tensorflow/core:version_info_gen [for host]:
fatal: No names found, cannot describe anything.
INFO: From Executing genrule //tensorflow/core:version_info_gen:
fatal: No names found, cannot describe anything.
INFO: From ProtoCompile tensorflow/core/protobuf/master.pb.cc:
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
INFO: From ProtoCompile tensorflow/core/kernels/reader_base.pb.cc:
bazel-out/local-py3-opt/genfiles/external/protobuf/src: warning: directory does not exist.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/core/kernels/BUILD:282:1: undeclared inclusion(s) in rule '//tensorflow/core/kernels:reader_base_proto_cc':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/kernels/reader_base.pb.cc':
  '/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/core/kernels/reader_base.pb.h'.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 737.955s, Critical Path: 11.33s

Ubuntu 16.04, bazel 0.4.2

python import error (undefined symbol)

NOTE: Only file GitHub issues for bugs and feature requests. All other topics will be closed.

For general support from the community, see StackOverflow.
To make bugs and feature requests more easy to find and organize, we close issues that are deemed
out of scope for GitHub Issues and point people to StackOverflow.

For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

Environment info

Operating System:
Ubuntu 16.04

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

If installed from binary pip package, provide:

A link to the pip package you installed:
The output from python -c "import tensorflow; print(tensorflow.__version__)".

If installed from source, provide

The commit hash (git rev-parse HEAD)
The output of bazel version

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKcSt10shared_ptrINS0_6detail7contextEEb

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: _ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKcSt10shared_ptrINS0_6detail7contextEEb
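
To see which C++ function the loader could not resolve, the mangled name in the ImportError can be partly decoded by hand. The sketch below is only an illustration: the helper names `undefined_symbol` and `nested_name` are made up here, and the decoder handles nothing beyond the plain `<length><identifier>` components of an Itanium-mangled `_ZN...E` name. It ignores the parameter types that follow; a real demangler such as `c++filt` is needed for the full signature.

```python
import re


def undefined_symbol(message):
    """Return the symbol named in an 'undefined symbol: ...' message, or None."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


def nested_name(mangled):
    """Decode the qualified name of an Itanium-mangled '_ZN...E' symbol.

    Only plain <length><identifier> components are handled; this is not a
    full demangler and it ignores the argument types after the name.
    """
    if not mangled.startswith("_ZN"):
        return None
    pos = 3
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


# The loader message copied from the traceback above.
message = (
    "/usr/local/lib/python3.5/dist-packages/tensorflow/python/"
    "_pywrap_tensorflow.so: undefined symbol: "
    "_ZN2cl4sycl7program30create_program_for_kernel_implESsPKhiPKPKc"
    "St10shared_ptrINS0_6detail7contextEEb"
)
symbol = undefined_symbol(message)
print(nested_name(symbol))  # prints cl::sycl::program::create_program_for_kernel_impl
```

The decoded name lives in the `cl::sycl` namespace, so the missing symbol is one that `_pywrap_tensorflow.so` expects a SYCL runtime library to provide.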

What other attempted solutions have you tried?

Logs or other output that would be helpful

(If logs are large, please upload as attachment or provide link).

More than one SYCL device supported?

Hi all,

I'm looking to see whether more than one SYCL device is possible at the moment. After building tensorflow-opencl from source, I can see that the SYCL device factory adds devices incrementally, but the physical_device_desc: "device: 0, name SYCL, pci bus id: 0" is hardcoded, which makes me think only one SYCL device is currently supported.

All my GPUs show up fine in both computecpp_info and clinfo.
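
One way to compare what computecpp_info reports with what TensorFlow actually registers is to list TensorFlow's local devices and keep the SYCL ones. This is a rough sketch, assuming the TF 1.x `device_lib.list_local_devices()` call and assuming the fork reports the device type as "SYCL". The `Device` records and the second SYCL device name below are made-up stand-ins for illustration only.

```python
from collections import namedtuple


def sycl_device_names(devices, device_type="SYCL"):
    """Return the names of all devices whose device_type matches."""
    return [d.name for d in devices if d.device_type == device_type]


def local_sycl_devices():
    """Ask TensorFlow (1.x) for its local devices and keep the SYCL ones."""
    # Imported lazily so the filter above works without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return sycl_device_names(device_lib.list_local_devices())


# Stand-in records for illustration; real entries come from TensorFlow.
Device = namedtuple("Device", ["name", "device_type"])
example = [
    Device("/cpu:0", "CPU"),
    Device("/device:SYCL:0", "SYCL"),
    Device("/device:SYCL:1", "SYCL"),  # hypothetical second device
]
print(sycl_device_names(example))
```

If `local_sycl_devices()` returns a single entry on a machine where computecpp_info lists several GPUs, that would be consistent with only one SYCL device being registered.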

Does anyone know?

Thanks!

Environment info

Operating System: Ubuntu 14.04
Computecpp version: 0.1.4

Runtime error: Check failed: IsAligned()

Thanks for the great work on tensorflow-opencl. It's really great.

Summary

I'm getting a runtime error for almost all tensorflow programs:

2017-03-18 12:52:52.241954: F ./tensorflow/core/framework/tensor.cc:488] Check failed: IsAligned()
Aborted (core dumped)

Environment Description

I have an Intel HD Graphics 5500 GPU with the Intel Broadwell i5 CPU x64. I'm using Intel's OpenCL drivers from here: https://software.intel.com/en-us/articles/opencl-drivers .

The OS is Ubuntu 16.04 LTS.

Python version is 3.5. It's running in a conda environment using Anaconda's versions of python, numpy, scipy, pyyaml, h5py, pandas, and jupyter.

However, the tensorflow pip package was compiled using Ubuntu's version of everything as per the compile from source instructions. I disabled Anaconda by removing it from ~/.bashrc, compiled the pip package, re-enabled Anaconda, activated the conda environment, and installed the pip package into the conda environment.

Steps to Reproduce

Here's the only tensorflow program I tried that did not fail:

import random
import sys
import tensorflow as tf
import time

random_number_generator = random.SystemRandom()

NUM_ROWS = 1024
NUM_COLUMNS = 1024

# One random value per matrix element.
a_array = []
for i in range(NUM_ROWS * NUM_COLUMNS):
    a_array.append(random_number_generator.random())

b_array = []
for i in range(NUM_ROWS * NUM_COLUMNS):
    b_array.append(random_number_generator.random())

# Creates a graph.
with tf.device('/device:SYCL:0'):
    a = tf.constant(a_array, shape=[NUM_ROWS, NUM_COLUMNS], name='a', dtype=tf.float64)
    b = tf.constant(b_array, shape=[NUM_COLUMNS, NUM_ROWS], name='b', dtype=tf.float64)
    c = tf.matmul(a, b)

sess = tf.Session()

start = time.time()
sess.run(c)

Increasing NUM_ROWS and NUM_COLUMNS to just 1200 triggers the error above.
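
To narrow down the size at which the failure starts, the script can be wrapped in a function that takes the matrix size as a parameter. This is only a sketch of the same matmul under the TF 1.x graph API used above; the names `random_values` and `run_matmul` are introduced here for illustration.

```python
import random


def random_values(rows, cols, rng=None):
    """Return exactly rows * cols random floats in [0, 1)."""
    rng = rng or random.SystemRandom()
    return [rng.random() for _ in range(rows * cols)]


def run_matmul(size, device="/device:SYCL:0"):
    """Multiply two size x size float64 matrices on the given device."""
    # Imported lazily so random_values can be used without TensorFlow.
    import tensorflow as tf

    with tf.device(device):
        a = tf.constant(random_values(size, size), shape=[size, size],
                        name="a", dtype=tf.float64)
        b = tf.constant(random_values(size, size), shape=[size, size],
                        name="b", dtype=tf.float64)
        c = tf.matmul(a, b)
    with tf.Session() as sess:
        return sess.run(c)
```

Calling `run_matmul(1024)` and then `run_matmul(1200)` in separate processes would reproduce the two cases described here. Separate processes matter because the failed check aborts the interpreter.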

I also installed keras into the same conda environment using pip install keras and ran this script: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py . This resulted in the same error: Check failed: IsAligned(). The error appears after Build model... is printed to the console.

Commit Hash (`git rev-parse HEAD`)

dda6b4ee253ca3016841ff60b16df4be40b5b052

Bazel Version

...........
Build label: 0.4.5
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 16 12:19:38 2017 (1489666778)
Build timestamp: 1489666778
Build timestamp as int: 1489666778

clinfo

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.0 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL
Number of devices                                 2
  Device Name                                     Intel(R) HD Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0 
  Driver Version                                  r4.0.59481
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               24
  Max clock frequency                             900MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     by <unknown> (0x7FF200000000)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 1 / 1       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              13231777383 (12.32GiB)
  Error Correction support                        No
  Max memory allocation                           4294959103 (4GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             4294959103 (4GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        589824
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            268434943 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        4294959103 (4GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2 
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel
  Motion Estimation accelerator version	(Intel)   2
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_driver_diagnostics cl_intel_media_block_io cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp16 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir 

  Device Name                                     Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0 (Build 400)
  Driver Version                                  1.2.0.400
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             2200MHz
  Device Partition                                (core)
    Max number of sub-devices                     4
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Preferred work group size multiple              128
  Preferred / native vector sizes                 
    char                                                 1 / 32      
    short                                                1 / 16      
    int                                                  1 / 8       
    long                                                 1 / 4       
    half                                                 0 / 0        (n/a)
    float                                                1 / 8       
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16550207488 (15.41GiB)
  Error Correction support                        No
  Max memory allocation                           4137551872 (3.853GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            258596992 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    65535
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [INTEL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

computecpp_info

********************************************************************************

ComputeCpp Info (CE 0.1.2)

********************************************************************************

Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Device not tested on this OS
  CL_DEVICE_NAME                          : Intel(R) HD Graphics
  CL_DEVICE_VENDOR                        : Intel(R) Corporation
  CL_DRIVER_VERSION                       : r4.0.59481
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
********************************************************************************

********************************************************************************

********************************************************************************

Will it work on Intel CPU OpenCL?

Hey guys,

Perhaps it's a naive question, but could you tell me the difference between the CPU and GPU versions in this particular version of TF? Which version would I need if I want to use TF with Intel CPU-based OpenCL? I have an i7 on Kaby Lake. It supports OpenCL 2.0.

I tried the GPU version first and, oddly, it wants libcublas.so.9.0 from Nvidia CUDA 9.0. The CPU version seems to be just a plain CPU build.
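
A quick way to see which of these runtime libraries the system can locate is Python's `ctypes.util.find_library`. This is a minimal sketch, and the helper name `discoverable_libraries` is made up here. Note that `find_library` looks libraries up by unversioned name, so a hit for `cublas` does not by itself prove that the 9.0 version is present.

```python
from ctypes.util import find_library


def discoverable_libraries(names=("OpenCL", "cublas")):
    """Map each library name to what find_library reports (None if absent)."""
    return {name: find_library(name) for name in names}


for name, path in discoverable_libraries().items():
    print(name, "->", path or "not found")
```

On a machine with an OpenCL ICD loader but no CUDA toolkit, one would expect `OpenCL` to resolve and `cublas` to be reported as not found.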

Getting NaN everywhere

Hello,
I followed this guide from end to end: https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl . It kind of worked (I tested multiplying small matrices), but when I run any of the TensorFlow examples ( https://github.com/aymericdamien/TensorFlow-Examples ) I get NaN for the training loss everywhere, and the batch computation time is very long.
I have an AMD Radeon R9 290 on Ubuntu 14.04.
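
When reporting this kind of problem, it helps to say at which training step the loss first turns into NaN. A small, framework-independent sketch follows. The function name `first_nan_step` and the sample loss values are invented for illustration, not taken from the examples repository.

```python
import math


def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if there is none."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None


# Illustrative loss values only.
print(first_nan_step([2.31, 1.87, float("nan"), float("nan")]))  # prints 2
```

A result of 0 would suggest the very first forward pass already produces NaN on the device, while a later step would point towards values diverging during training.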

Can't configure package w/o CUDA

tensorflow-opencl$ ./configure 
/media/Compressed/Drivers_bios/src/dev/tensorflow-opencl /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]

Using python library path: /usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] Y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/lib/ccache/g++
Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/lib/ccache/gcc
Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]: 
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
........................
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:17:3: //external:eigen_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:17:3: //external:eigen_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:54:3: //external:gemmlowp: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:54:3: //external:gemmlowp: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:80:3: //external:highwayhash: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:80:3: //external:highwayhash: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:90:3: //external:nasm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:90:3: //external:nasm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:101:3: //external:jpeg: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:101:3: //external:jpeg: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:112:3: //external:png_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:112:3: //external:png_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:123:3: //external:gif_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:123:3: //external:gif_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:135:3: //external:six_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:135:3: //external:six_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:151:3: //external:protobuf: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:151:3: //external:protobuf: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:161:3: //external:gmock_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:161:3: //external:gmock_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:187:3: //external:pcre: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:187:3: //external:pcre: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:198:3: //external:swig: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:198:3: //external:swig: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:222:3: //external:grpc: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:222:3: //external:grpc: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:245:3: //external:linenoise: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:245:3: //external:linenoise: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:258:3: //external:llvm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:258:3: //external:llvm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:285:3: //external:boringssl: no such attribute 'urls' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:285:3: //external:boringssl: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:295:3: //external:nanopb_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:295:3: //external:nanopb_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:311:3: //external:zlib_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /media/Compressed/Drivers_bios/src/dev/tensorflow-opencl/tensorflow/workspace.bzl:311:3: //external:zlib_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': error loading package 'external': Could not load //external package.
ERROR: missing fetch expression. Type 'bazel help fetch' for syntax and help.

My system is Ubuntu 16.04
Bazel version 0.3.2
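
The `no such attribute 'urls'` errors above suggest that the installed Bazel predates the rule attributes this workspace uses. A minimal sketch of a version check follows, assuming `bazel version` prints a `Build label:` line as it does in the other reports on this page. The `MINIMUM` value is only a placeholder taken from those reports, not a confirmed requirement, and the helper names are hypothetical.

```python
import re


def parse_build_label(version_output):
    """Return the Bazel version as a tuple of ints, or None if no label is found."""
    match = re.search(r"^Build label:\s*(\d+(?:\.\d+)*)", version_output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(installed, minimum):
    """Compare version tuples; an unparsed version counts as too old."""
    return installed is not None and installed >= minimum


# Placeholder minimum (assumption): the version other reporters on this page use.
MINIMUM = (0, 4, 3)

# Sample text shaped like `bazel version` output for the version reported above.
sample = "Build label: 0.3.2\nBuild target: (elided)\n"
installed = parse_build_label(sample)
print(installed, "new enough" if is_at_least(installed, MINIMUM) else "too old")
```

In practice the text would come from running `bazel version` and the minimum from whatever the repository's own version check asks for.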

zipfile.BadZipFile: File is not a zip file

Environment:

Windows 7 x64
Python 3.5 in Anaconda

Installation fails with this error:

(py3) C:\Users\Kasim\Desktop>pip install tf_nightly_gpu-1.head-cp35-cp35m-win_amd64.whl
Processing c:\users\kasim\desktop\tf_nightly_gpu-1.head-cp35-cp35m-win_amd64.whl
Exception:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\commands\install.py", line 335, in run
    wb.build(autobuilding=True)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\req\req_set.py", line 620, in _prepare_file
    session=self.session, hashes=hashes)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\download.py", line 809, in unpack_url
    unpack_file_url(link, location, download_dir, hashes=hashes)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\download.py", line 715, in unpack_file_url
    unpack_file(from_path, location, content_type, link)
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\utils\__init__.py", line 599, in unpack_file
    flatten=not filename.endswith('.whl')
  File "D:\Anaconda3\envs\py3\lib\site-packages\pip\utils\__init__.py", line 484, in unzip_file
    zip = zipfile.ZipFile(zipfp, allowZip64=True)
  File "D:\Anaconda3\envs\py3\lib\zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "D:\Anaconda3\envs\py3\lib\zipfile.py", line 1094, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Will tensorflow-opencl be built for Windows?
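
A wheel is a zip archive, so `BadZipFile` usually means the downloaded file is not a complete archive (for example a truncated download or a saved HTML page). That cause is an assumption about this report. The sketch below checks a file before handing it to pip, using only the standard `zipfile` module; `describe_wheel` is a hypothetical helper name.

```python
import zipfile


def describe_wheel(path):
    """Return a short verdict on whether `path` looks like a usable zip archive."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive (possibly a truncated or HTML download)"
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the first member with a bad CRC, or None if all pass.
        bad_member = archive.testzip()
    if bad_member is not None:
        return "corrupt member: " + bad_member
    return "ok"


# Example call (file name taken from the report above):
# print(describe_wheel("tf_nightly_gpu-1.head-cp35-cp35m-win_amd64.whl"))
```

If the verdict is anything other than `ok`, downloading the wheel again is a reasonable first step.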

NOTE: Only file GitHub issues for bugs and feature requests. All other topics will be closed.

For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

Environment info

Operating System:

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

If installed from binary pip package, provide:

A link to the pip package you installed:
The output from python -c "import tensorflow; print(tensorflow.__version__)".

If installed from source, provide

The commit hash (git rev-parse HEAD)
The output of bazel version

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

What other attempted solutions have you tried?

Logs or other output that would be helpful

(If logs are large, please upload as attachment or provide link).

Memory management fault when running the mnist tutorial in Python3

When running convolutional.py in the mnist folder with Python3, I get the output

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': double free or corruption (fasttop): 0x00007f97ec005670 ***
*** Error in `python3': malloc(): memory corruption (fast): 0x00000000043b5ce0 ***
Aborted

When immediately running the script a second time I get another error:

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': malloc(): memory corruption (fast): 0x00007fac377fd0d0 ***
*** Error in `python3': Segmentation fault

There seems to be a memory management problem that manifests nondeterministically. An uninitialized pointer, perhaps?

Does anyone know what causes this, or how I can find out?
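
One low-cost way to narrow this down is Python's standard `faulthandler` module, which prints the Python-level traceback when the process receives a fatal signal such as SIGSEGV or SIGABRT. This is a sketch of a first diagnostic step, not a fix. It shows which Python call was active, not the faulty native code; `enable_crash_tracebacks` is a hypothetical helper name.

```python
import faulthandler
import sys


def enable_crash_tracebacks():
    """Dump the traceback of every thread to stderr on a fatal signal."""
    faulthandler.enable(file=sys.stderr, all_threads=True)


# Call this at the very top of the script, before importing tensorflow.
enable_crash_tracebacks()
```

The same effect is available without editing the script by running `python3 -X faulthandler convolutional.py`.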

libcublas9 missing error

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

When importing tensorflow.

I'm working on Debian stable, where only cublas8 is available. I saw a similar issue for the tensorflow-gpu package, where the advice was to install version 1.4 instead. Is there a similar version of your package?
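
To see which cuBLAS version the dynamic loader can actually find, a small probe with the standard `ctypes` module may help. This is a sketch; `can_load` is a hypothetical helper, and the `libcublas.so.8.0` name is an assumption about how the cublas8 package names its library.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# The 9.0 name comes from the ImportError above; the 8.0 name is an assumption.
for name in ("libcublas.so.9.0", "libcublas.so.8.0"):
    print(name, "loadable" if can_load(name) else "missing")
```

If only the 8.0 library loads, the installed TensorFlow binary was built against a newer CUDA than the system provides.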

Anyway, thanks for everything you've made and you're doing! :D

Not Building

Hi,

I've tried to build TensorFlow from source because I need OpenCL support.

The environment was set up successfully as described in the guide, but the build did not finish correctly.

I'm running Ubuntu 16.04 and have tried building with both GCC and Clang. The output provided here is from Clang.

If you need additional information, please ask; I really need TF to work on this computer.

Thank you

$ bazel version
Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885
Errors.txt

ARM GPU or Android support

Hi, from the documentation I saw that OpenCL support is Linux-only and is mainly tested on AMD GPUs. How much effort would be needed to (1) support ARM GPUs such as the Mali series and (2) run tensorflow-opencl on Android?

How to compile it for NDK?

Please go to Stack Overflow for help and support:

https://stackoverflow.com/questions/tagged/tensorflow

If you open a GitHub issue, here is our policy:

It must be a bug or a feature request.
The form below must be filled out.
It shouldn't be a TensorBoard issue. Those go here.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
TensorFlow installed from (source or binary):
TensorFlow version (use command below):
Python version:
Bazel version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory:
Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

Will the project be built for Windows?

Device not found by ComputeCpp

Hello again,

Unfortunately, I'm still unable to run TensorFlow. Even a simple snippet like:

import tensorflow as tf
tf.Session()

gives me:

2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-30 00:17:17: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
terminate called after throwing an instance of 'cl::sycl::exception'
  what():  Error: [ComputeCpp:RT0106] Device not found
Aborted (core dumped)

This is the output of computecpp_info:

ComputeCpp Info (CE 0.1.2)

********************************************************************************

Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
  CL_DEVICE_VENDOR                        : AMD
  CL_DRIVER_VERSION                       : 11.2.0
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU

I believe "Device does not support SPIR" is bad news, although I hope not. Do you have any hints?
Should I move this issue to ComputeCpp's repository instead?
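
For reports like this one, it can help to pull the unsupported devices and their stated reasons out of the `computecpp_info` output automatically. The sketch below assumes the output keeps the `key : value` layout shown above; `unsupported_devices` is a hypothetical helper and not part of ComputeCpp.

```python
def unsupported_devices(info_text):
    """Return (device_name, reason) pairs for devices reported as unsupported."""
    results = []
    reason = None
    for line in info_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Device is supported":
            # Keep the text after "NO -" as the reason; reset for supported devices.
            reason = value[len("NO"):].lstrip(" -") if value.startswith("NO") else None
        elif key == "CL_DEVICE_NAME" and reason is not None:
            results.append((value, reason))
            reason = None
    return results


sample = """\
Device 0:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
  CL_DEVICE_VENDOR                        : AMD
"""
print(unsupported_devices(sample))
```

An empty result would mean every listed device is reported as supported, in which case the `Device not found` exception would need a different explanation.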

I'm running Ubuntu 16.04 and the output of bazel version is:

Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885

Thank you again :)

404 not found

HTTP ERROR 404
Problem accessing /view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.0.0-cp35-cp35m-linux_x86_64.whl. Reason:
Not Found

The page for the Linux GPU version returns an error.

Error querying the number of OpenCL platforms in the system

After following the instructions as closely as possible (I'm using Python3 and I have an Intel GPU, if you can call it that), when I run the mnist example (by running python3 convolutional.py from the terminal) I get the output

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-26 22:30:09: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-26 22:30:09: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
terminate called after throwing an instance of 'cl::sycl::cl_exception'
  what():  Error: [ComputeCpp:RT0408] Error querying the number of OpenCL platforms in the system
Aborted

Why do I get this error (googling for the error message yields zero hits so I don't know where it is generated) and what can I do to fix it?
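
On Linux, the standard OpenCL ICD loader discovers platforms through `.icd` files in `/etc/OpenCL/vendors`, so an empty or missing directory is one thing worth ruling out. Whether that is what triggers RT0408 here is an assumption. The sketch below lists the registered ICD files and the library each one names; `list_icd_files` is a hypothetical helper.

```python
import glob
import os


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library named inside it} for a vendors directory."""
    entries = {}
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path) as handle:
            entries[os.path.basename(path)] = handle.read().strip()
    return entries


registered = list_icd_files()
print(registered or "no OpenCL ICD files found")
```

If nothing is listed, installing the vendor's OpenCL runtime (for an Intel GPU, a driver such as beignet) would be the next thing to check.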

--copt=-mavx2 causes build error

It compiles fine with
bazel build --copt=-march=native --local_resources 2048,.5,1.0 -c opt //tensorflow/tools/pip_package:build_pip_package
but when I then run TensorFlow, it warns that my CPU has AVX2 but it isn't enabled, so I run
bazel build --copt=-mavx2 --copt=-march=native --local_resources 2048,.5,1.0 -c opt //tensorflow/tools/pip_package:build_pip_package

And I get the following error:

ERROR: /home/ben/tensorflow-opencl/tensorflow/core/kernels/BUILD:828:1: C++ compilation of rule '//tensorflow/core/kernels:gather_functor' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-canonical-system-headers ... (remaining 51 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from ./tensorflow/core/framework/numeric_types.h:25:0,
                 from ./tensorflow/core/framework/type_traits.h:22,
                 from ./tensorflow/core/kernels/gather_functor.h:22,
                 from tensorflow/core/kernels/gather_functor.cc:50:
./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:42:52: fatal error: src/Tensor/TensorContractionThreadPool.h: No such file or directory
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
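
The runtime warning mentioned above names CPU features the binary was not compiled for. The sketch below derives the matching `--copt` flags from `/proc/cpuinfo`-style text instead of guessing them. It assumes the Linux `flags` line format and GCC's usual `-m` option names, and it does not address the missing-header error itself; `copt_flags` is a hypothetical helper.

```python
def copt_flags(cpuinfo_text, wanted=("sse4_2", "avx", "avx2", "fma")):
    """Map CPU feature flags found in cpuinfo text to bazel --copt options."""
    gcc_flag = {"sse4_2": "-msse4.2", "avx": "-mavx", "avx2": "-mavx2", "fma": "-mfma"}
    present = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the colon is a space-separated feature list.
            present.update(line.partition(":")[2].split())
    return ["--copt=" + gcc_flag[name] for name in wanted if name in present]


# Illustrative input; on a real machine, read the text from /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse4_2 avx avx2 fma\n"
print(" ".join(copt_flags(sample)))
```

The resulting options can be appended to the `bazel build` command line in place of hand-picked flags.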

Error: [ComputeCPP:RT0107]

I compiled amdgpu-pro for Gentoo, installed the latest eselect-opencl for the OpenCL 1.2 headers, and compiled and ran some OpenCL samples (they worked!).
I then configured TensorFlow for OpenCL with g++-5.4.0 and gcc-5.4.0 and ran the command:
bazel test -c opt --config=sycl --verbose_failures --test_timeout 3600 //tensorflow/python/kernel_tests:basic_gpu_test
After compilation it failed with this error:

.terminate called after throwing an instance of 'cl::sycl::cl_exception'
what(): Error: [ComputeCpp:RT0107] Failed to create program from binary
external/bazel_tools/tools/test/test-setup.sh: line 114: 3209 Aborted (core dumped) "${TEST_PATH}" "$@"

OSX GPU

Amazing work! Will you guys also support GPU (OpenCL) on Mac?

Status?

What's the status of this? I'm building it from source (since I'm guessing that's the only option), but I'm not sure I'm doing it correctly. I know you have to add "config=cuda" if you're building for CUDA, but I'm not sure whether there's a special config for OpenCL. Also, I'm planning to use this with my Intel gfx GPU; will that be supported if I use beignet?

Can't find libComputeCpp.so when importing tensorflow

I wonder if you could help me with the following error when trying to import tensorflow freshly built from commit 49359bd.

ImportError: libComputeCpp.so: cannot open shared object file: No such file or directory

Everything seemed to work just fine during building and installing. Something like:

./configure

> Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.5
> Please specify optimization flags to use during compilation [Default is -march=native]: 
> Do you wish to use jemalloc as the malloc implementation? [Y/n] 
> jemalloc enabled
> Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
> No Google Cloud Platform support will be enabled for TensorFlow
> Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
> No Hadoop File System support will be enabled for TensorFlow
> Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
> No XLA JIT support will be enabled for TensorFlow
> Found possible Python library paths:
>   /usr/local/lib/python3.5/dist-packages
>   /usr/lib/python3/dist-packages
> Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]
> 
> Using python library path: /usr/local/lib/python3.5/dist-packages
> Do you wish to build TensorFlow with OpenCL support? [y/N] y
> OpenCL support will be enabled for TensorFlow
> Do you wish to build TensorFlow with CUDA support? [y/N] N
> No CUDA support will be enabled for TensorFlow
> Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/bin/gcc
> Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/bin/gcc
> Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]: 

bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip3 install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp35-cp35m-linux_x86_64.whl

I'm running Ubuntu 16.04 and the output of bazel version is:

Build label: 0.4.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 12:31:25 2016 (1482409885)
Build timestamp: 1482409885
Build timestamp as int: 1482409885

Thanks!
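
An `ImportError` of this kind usually means the dynamic loader cannot find the library at run time, even though the build found it. The sketch below checks whether any directory on `LD_LIBRARY_PATH` contains `libComputeCpp.so`. The loader also consults other locations such as the ldconfig cache, so this is only a partial check; `find_on_library_path` is a hypothetical helper.

```python
import os


def find_on_library_path(file_name, search_path):
    """Return the directories in a path-separated search path that contain file_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, file_name)):
            hits.append(directory)
    return hits


hits = find_on_library_path("libComputeCpp.so", os.environ.get("LD_LIBRARY_PATH", ""))
print(hits or "libComputeCpp.so not found on LD_LIBRARY_PATH")
```

If nothing is found, adding the ComputeCpp library directory to `LD_LIBRARY_PATH` is worth trying. With the default install location from the configure step that would presumably be `/usr/local/computecpp/lib`, though the `lib` subdirectory is an assumption.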

AMD GPU device not found, but GPU test passed

System Info:

OS: Ubuntu 16.04
CUDA version: None
GPU: Advanced Micro Devices, Inc. [AMD/ATI] Richland [Radeon HD 8650G]
TensorFlow Build: Source

I ran the bazel gpu test with the command:

bazel test -c opt --verbose_failures --test_timeout 3600 //tensorflow/python/kernel_tests:basic_gpu_test

which it passed.

However, I am trying to run someone else's code found here:

https://github.com/cysmith/neural-style-tf

The command I run is:

bash stylize_image.sh ./image_input/lion.jpg ./styles/kandinsky.jpg

which gives me the error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Variable': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Variable = VariableV2container="", dtype=DT_FLOAT, shape=[1,512,512,3], shared_name="", _device="/device:GPU:0"]]

How should I change the code to get it to work with my AMD GPU?
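
The error message lists only a CPU device, so the explicit `/device:GPU:0` pin in the script cannot be satisfied by this build. The test command quoted above also does not pass `--config=sycl`, so it may not have exercised an OpenCL device at all; that is a guess. The sketch below extracts the available devices from such a message and picks a fallback. Both helper names are hypothetical, and no TensorFlow API is involved.

```python
import re


def available_devices(error_message):
    """Extract the device names listed in a 'Cannot assign a device' message."""
    match = re.search(r"available devices are \[(.*?)\]", error_message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def pick_device(requested, devices):
    """Return the requested device if it is available, else the first available one."""
    for name in devices:
        if name.endswith(requested):
            return name
    return devices[0] if devices else None


message = (
    "Operation was explicitly assigned to /device:GPU:0 but available devices "
    "are [ /job:localhost/replica:0/task:0/device:CPU:0 ]."
)
print(pick_device("/device:GPU:0", available_devices(message)))
```

Within TensorFlow 1.x itself, creating the session with `tf.ConfigProto(allow_soft_placement=True)` lets the runtime perform a similar fallback, although the script would then run on the CPU rather than on the AMD GPU.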

Build Dockerfile

Can you add a dockerfile for easier build?

Pre-built tensorflow-opencl for Windows 10

I would like to use my Rx480 GPU with TensorFlow, but there are no pre-built pip packages for installing tensorflow-opencl on Windows 10. Building from source looks very hard. Could someone make pre-built pip packages for Windows 10? That would be awesome.

No open-cl documentation

Very confused. How does one install and test this? It's basically just a big old directory with lots of files, and the README is just a copy of the TensorFlow one, which tells you how to install tensorflow, not tensorflow-opencl.

Tensorflow compilation fails with error "redeclared here as 'Eigen::internal::ComparisonName cmp'"

Hi,

I am trying to compile the TensorFlow source with the OpenCL 'sycl' flag. I got the following error:

In file included from <command-line>:0:0:
./bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/relu_op/tensorflow/core/kernels/relu_op.pic.sycl:8149:22: note: 'const bool cl::sycl::detail::{anonymous}::arg_used [16]' previously defined here
 const bool           kernel_info< ::Eigen::TensorSycl::ExecExprFunctorKernel<const ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor<float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_product_op<const float, const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorConversionOp<float, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_cmp_op<const float, const float, ::Eigen::internal::ComparisonName::cmp_GT>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseNullaryOp< ::Eigen::internal::scalar_constant_op<const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer> > > > > >, ::Eigen::TensorSycl::internal::FunctorExtractor< ::Eigen::TensorEvaluator<const ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor<float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_product_op<const float, const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorConversionOp<float, const ::Eigen::TensorCwiseBinaryOp< ::Eigen::internal::scalar_cmp_op<const float, const float, ::Eigen::internal::ComparisonName::cmp_GT>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer>, const ::Eigen::TensorCwiseNullaryOp< ::Eigen::internal::scalar_constant_op<const float>, const ::Eigen::TensorMap< ::Eigen::Tensor<const float, 1, 1, long>, 16, ::Eigen::MakePointer> > > > > >, const ::Eigen::SyclDevice> >, ::utility::tuple::Tuple< ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read_write, ::cl::sycl::access::target::global_buffer>, ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer>, 
::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer>, ::cl::sycl::accessor<unsigned char, 1, ::cl::sycl::access::mode::read, ::cl::sycl::access::target::global_buffer> > > >::arg_used[] = {
                      ^
In file included from <command-line>:0:0:
./bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/relu_op/tensorflow/core/kernels/relu_op.pic.sycl:8012:83: error: template parameter 'int cmp'
 template <typename LhsScalar, typename RhsScalar, Eigen::internal::ComparisonName cmp> struct scalar_cmp_op;
                                                                                   ^
In file included from external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:423:0,
                 from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/core/kernels/relu_op.h:23,
                 from tensorflow/core/kernels/relu_op.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/functors/BinaryFunctors.h:190:77: error: redeclared here as 'Eigen::internal::ComparisonName cmp'
 template<typename LhsScalar, typename RhsScalar, ComparisonName cmp> struct scalar_cmp_op;
                                                                             ^
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4754.029s, Critical Path: 191.56s

I have used the following command to initiate compilation:

bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

I have an AMD Radeon 7670M GPU. I am using Ubuntu 14.04.5 LTS and have the latest ComputeCpp module. My AMD drivers are working correctly (the sole reason why I am using 14.04.5 :) )

~/TensorFlow/Packages/tensorflow$ glxinfo | grep OpenGL
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7600M Series
OpenGL core profile version string: 4.3.13416 Core Profile Context 15.302
OpenGL core profile shading language version string: 4.40
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5.13416 Compatibility Profile Context 15.302
OpenGL shading language version string: 4.40
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:

$ bazel test //tensorflow/python/kernel_tests:basic_gpu_test
INFO: Found 1 test target...
Target //tensorflow/python/kernel_tests:basic_gpu_test up-to-date:
  bazel-bin/tensorflow/python/kernel_tests/basic_gpu_test
INFO: Elapsed time: 2.213s, Critical Path: 1.83s
//tensorflow/python/kernel_tests:basic_gpu_test                          PASSED in 1.8s

Executed 1 out of 1 test: 1 test passes.

Any help will be appreciated.

Thanks.

tensorflow 2.8.0 requires tf-estimator-nightly==2.8.0.dev2021122109, but you have tf-estimator-nightly 2.9.0.dev2022022309 which is incompatible.


constructor for 'tensorflow::Scope' must explicitly initialize the const member 'colocation_constraints_'

I get the following error from bazel when trying to compile tensorflow with OpenCL support. Using bazel with -c opt and --config=sycl. Other details:

ComputeCpp-CE 0.1.1
Ubuntu 16.04

ERROR: /home/smistad/workspace/tensorflow/tensorflow/cc/BUILD:120:1: C++ compilation of rule '//tensorflow/cc:scope' failed: computecpp failed: error executing command external/local_config_sycl/crosstool/computecpp -Wall '-std=c++11' -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/cc/_objs/scope/tensorflow/cc/framework/scope.pic.d ... (remaining 97 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
tensorflow/cc/framework/scope.cc:26:8: error: constructor for 'tensorflow::Scope' must explicitly initialize the const member 'colocation_constraints_'
Scope::Scope(Graph* graph, Status* status, Scope::NameMap* name_map,
       ^
./tensorflow/cc/framework/scope.h:255:36: note: 'colocation_constraints_' declared here
  const std::unordered_set<string> colocation_constraints_;
                                   ^

Issue with compiling Tensorflow

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No

OS Platform and Distribution (e.g., Linux Ubuntu):14.04.3-->Trusty

TensorFlow installed from (source or binary):Source

TensorFlow version (use command below):1.0 (Steps-> Downloaded tensorflow from https://github.com/benoitsteiner/tensorflow-opencl, ./configure - to configure project)

Bazel version (if compiling from source):0.5.4

CUDA/cuDNN version:NA

OPENCL Version:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.11)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

GPU model and memory:
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Board name: AMD Radeon (TM) R5 M335
Memory: 4096M

Exact command to reproduce:
run the python script -- ipython keras_code.py

** G++/GCC version**:
g++-4.8

I have compiled CPP programs, they work fine.

ComputeCPP: 0.3.4
-- ** Python**: I am using Anaconda distribution Python for 2.7.2. (Anaconda - 2.4.3)

Describe the problem:

I followed the installation steps given at:
https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl

I cloned the following Git repo:
git clone https://github.com/benoitsteiner/tensorflow-opencl.git

Everything works fine until the compile step, where I use the following command:
bazel build --local_resources 2048,.5,1.0 -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

================

Error

ERROR: /home/sayantan/Downloads/tensorflow-opencl/tensorflow/core/kernels/BUILD:2512:1: C++ compilation of rule '//tensorflow/core/kernels:softmax_op' failed (Exit 1).
In file included from tensorflow/core/kernels/softmax_op.cc:20:
In file included from ./tensorflow/core/kernels/softmax_op.h:23:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14:
In file included from external/eigen_archive/Eigen/Core:299:
In file included from external/local_config_sycl/crosstool/../sycl/include/SYCL/sycl.hpp:20:
In file included from external/local_config_sycl/crosstool/../sycl/include/SYCL/sycl_interface.h:54:
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:342:3: error: multiple overloads of 'global_ptr' instantiate to the same signature 'void (pointer_t)' (aka 'void (attribute((address_space(1))) float *)')
global_ptr(pointer_t ptr) : Base(ptr) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:42:74: note: in instantiation of template class 'cl::sycl::global_ptr<attribute((address_space(1))) float>' requested here
EvalToLHSConstructor(const utility::tuple::Tuple<Params...> &t) : expr(ConvertToActualTypeSycl(typename Eigen::internal::remove_all::type, utility::tuple::get(t))) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceSycl.h:31:88: note: expanded from macro 'ConvertToActualTypeSycl'
#define ConvertToActualTypeSycl(Scalar, buf_acc) reinterpret_cast<typename cl::sycl::global_ptr::pointer_t>((&(*buf_acc.get_pointer())))
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:226:1: note: in instantiation of member function 'Eigen::TensorSycl::internal::EvalToLHSConstructor<__attribute__((address_space(1))) float *, 0, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::write, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t> >::EvalToLHSConstructor' requested here
EVALTO(const)
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:223:41: note: expanded from macro 'EVALTO'
: nestedExpression(funcD.xprExpr, t), buffer(t), expr(buffer.expr, nestedExpression.expr) {}
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:506:10: note: in instantiation of function template specialization 'Eigen::TensorSycl::internal::ExprConstructor<const Eigen::TensorEvalToOp<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakeGlobalPointer>, MakeGlobalPointer>, MakeGlobalPointer>, const Eigen::TensorEvalToOp<const Eigen::TensorSycl::internal::PlaceHolder<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakePointer>, MakePointer>, 1>, MakePointer>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::write, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t>, cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read, cl::sycl::access::target::global_buffer, cl::sycl::codeplay::access::placeholder::false_t> >::ExprConstructor<Eigen::TensorSycl::internal::FunctorExtractor<Eigen::TensorEvaluator<const Eigen::TensorEvalToOp<const Eigen::TensorReductionOp<Eigen::internal::MaxReducer, const Eigen::IndexList<Eigen::type2index<1>>, const Eigen::TensorMap<Eigen::Tensor<const float, 2, 1, long>, 16, MakePointer>, MakePointer>, MakePointer>, const Eigen::SyclDevice> > >' requested here
return ExprConstructor<OrigExpr, IndexExpr, Params...>(funcD, t);
.....

2 errors generated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3715.231s, Critical Path: 167.06s
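
Since the output above is mostly template instantiation notes, it may help to reduce such a log to its actual `error:` lines before comparing it with other reports. The following is only a rough sketch: it assumes the log uses the usual clang/gcc `path:line:col: error: message` layout, and the names `DIAGNOSTIC`, `extract_errors` and `sample` are made up for illustration, not part of Bazel or TensorFlow.

```python
import re

# Assumed layout of a compiler diagnostic: path:line:col: error: message
# Bazel's own "ERROR: /path/BUILD:..." lines and "note:" lines do not match.
DIAGNOSTIC = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def extract_errors(log_text):
    """Return a (path, line, col, message) tuple for each compiler error line."""
    errors = []
    for raw in log_text.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            errors.append(
                (
                    match.group("path"),
                    int(match.group("line")),
                    int(match.group("col")),
                    match.group("msg"),
                )
            )
    return errors


# A shortened excerpt of the build output, used only to exercise the helper.
sample = """\
ERROR: /home/sayantan/Downloads/tensorflow-opencl/tensorflow/core/kernels/BUILD:2512:1: C++ compilation of rule '//tensorflow/core/kernels:softmax_op' failed (Exit 1).
In file included from tensorflow/core/kernels/softmax_op.cc:20:
external/local_config_sycl/crosstool/../sycl/include/SYCL/multi_pointer.h:342:3: error: multiple overloads of 'global_ptr' instantiate to the same signature 'void (pointer_t)'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorSyclExprConstructor.h:42:74: note: in instantiation of template class requested here
"""

for path, line, col, msg in extract_errors(sample):
    print(f"{path}:{line}:{col}: {msg}")
```

Running the same helper over the complete build output (for example, saved to a file and read with `open(...).read()`) should list both of the errors the compiler counted, including the one cut off by the `.....` above.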

Can you please help?

Thanks and regards
Sayantan Raha
