pocl / pocl

pocl - Portable Computing Language

Home Page: http://portablecl.org

License: MIT License

CMake 2.20% Shell 0.29% C 59.09% C++ 14.41% LLVM 22.91% Python 0.75% Ruby 0.22% Cuda 0.04% Gnuplot 0.03% Dockerfile 0.05%
opencl heterogeneous-parallel-programming

pocl's Introduction

Portable Computing Language (PoCL)

PoCL is being developed towards an efficient implementation of the OpenCL standard which can be easily adapted for new targets.

Official web page

Full documentation

Building

This section contains instructions for building PoCL in its default configuration and a subset of driver backends. You can find the full build instructions including a list of available options in the user guide.

Requirements

In order to build PoCL, you need the following support libraries and tools:

  • Latest released version of LLVM & Clang
  • development files for LLVM & Clang + their transitive dependencies (e.g. libclang-dev, libclang-cpp-dev, libllvm-dev, zlib1g-dev, libtinfo-dev...)
  • CMake 3.9 or newer
  • GNU make or ninja
  • pkg-config
  • pthread (should be installed by default)
  • hwloc v1.0 or newer (e.g. libhwloc-dev) - optional
  • python3 (for support of LLVM bitcode with SPIR target; optional but enabled by default)
  • llvm-spirv (version-compatible with LLVM) and spirv-tools (optional; required for SPIR-V support in CPU / CUDA; Vulkan driver supports SPIR-V through clspv)

On Ubuntu or Debian based distros you can install the relevant packages with

export LLVM_VERSION=<major LLVM version>
apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 \
    cmake git pkg-config libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} \
    llvm-${LLVM_VERSION} make ninja-build ocl-icd-dev \
    ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils \
    libxml2-dev libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} \
    llvm-${LLVM_VERSION}-dev

If your distro does not package the version of LLVM you wish to build against you might want to set up the upstream LLVM package repository.

If LLVM is linked to PoCL statically (-DSTATIC_LLVM=ON cmake option), then the libpolly-${LLVM_VERSION}-dev and libzstd-dev packages might also be needed (at least on Ubuntu 22.04 with packages from apt.llvm.org).

Configure & Build

Building PoCL follows the usual CMake workflow, i.e.:

cd <directory-with-pocl-sources>
mkdir build
cd build
cmake ..
make
# and optionally
make install

Supported LLVM Versions

PoCL aims to support the latest LLVM version at the time of PoCL release, plus the previous LLVM version. All older LLVM versions are supported on a "best effort" basis; there might not be build bots continuously testing the code base nor anyone fixing their possible breakage.

OpenCL 3.0 support

If you want PoCL built with ICD and OpenCL 3.0 support at the platform level, you will need a sufficiently new ocl-icd (2.3.x). For Ubuntu, it can be installed from this PPA (https://launchpad.net/~ocl-icd/+archive/ubuntu/ppa). Additionally, if you want the CPU device to report OpenCL 3.0, you will need LLVM 14 or newer.

GPU support on different architectures

PoCL can be used to provide an OpenCL driver on several architectures where the hardware manufacturer does not ship one, such as Nvidia Tegra (ARM) or IBM Power servers. For PPC64le servers, README.PPC64le contains specific build instructions. See also the "PoCL with CUDA driver" section for prebuilt binaries.

Windows

Windows support has been unmaintained for a long time, and building on Windows may or may not work. There are old instructions for building with Visual Studio in README.Windows. With the built-in CMake support of more recent Visual Studio versions (2019+), however, it might be enough to install the dependencies (e.g. with winget), open the main CMakeLists.txt file in Visual Studio, and let it work its magic.

Contributions for improving compatibility with Windows and more detailed and up to date build steps are welcome!

Notes

Building on ARM platforms is possible but lacks a maintainer and there are some gotchas.

If you are a distro maintainer, check README.packaging for recommendations on build settings for packaged builds.

Binary packages

Linux distros

PoCL with CPU device support is available through the package managers of many Linux distributions. See latest packaged version(s)

PoCL with CUDA driver

PoCL with CUDA driver support for Linux x86_64, aarch64 and ppc64le is available from the conda-forge distribution and can be installed with

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh   # install mambaforge

To install pocl with cuda driver

mamba install pocl-cuda

To install all drivers

mamba install pocl

macOS

Homebrew

PoCL with CPU driver support for Intel and Apple Silicon chips is available from Homebrew and can be installed with

brew install pocl

Note that this installs an ICD loader from the Khronos Group, and the built-in OpenCL implementation will be invisible when your application is linked to this loader.

Conda

PoCL with CPU driver support for Intel and Apple Silicon chips is available from the conda-forge distribution and can be installed with

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh

To install the CPU driver

mamba install pocl

Note that this installs an ICD loader from the Khronos Group, and the built-in OpenCL implementation will be invisible when your application is linked to this loader. To make both pocl and the built-in OpenCL implementation visible, do

mamba install pocl ocl_icd_wrapper_apple

License

PoCL is distributed under the terms of the MIT license. Contributions are expected to be made with the same terms.

pocl's People

Contributors

0charleschen0, anbe42, ardacoskunses, csanchezdll, dsandersllvm, elhigu, eschnett, fodinabor, franz, isuruf, jansol, jrprice, kolanich, koskinel, krrishnarraj, larsmans, linehill, loganchien, matrix, matthiasdiener, mattwala, new2f7, oblomov, pjaaskel, rabijl, topileppanen, victoroliv2, vinsteri, vkorhonen, xantares

pocl's Issues

no NDEBUG when using LLVM

llvm-config does not pass NDEBUG to pocl if LLVM is built without debugging symbols. http://llvm.org/bugs/show_bug.cgi?id=18253
This breaks pocl, as LLVM has a lot of code in its headers, which get included in pocl and compiled without -DNDEBUG, causing all sorts of errors.

Temporary work-around in 49af9a9 is to compile all LLVM-related code with an explicit -DNDEBUG.
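
As a rough illustration of why a mismatched NDEBUG setting can break code that includes header-defined types, here is a minimal sketch in C. The struct is hypothetical and is not an actual LLVM class. It only assumes that some type defined in a header gains extra members when assertions are enabled, so that a library built with -DNDEBUG and a client built without it would disagree about the object's size and layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header-defined type (not taken from LLVM): it carries an
   extra bookkeeping field only when assertions are enabled. */
struct use_list {
  void *head;
#ifndef NDEBUG
  unsigned long checks_done; /* present only in builds without -DNDEBUG */
#endif
};

/* Size of the struct as seen by THIS translation unit. Code compiled with
   a different NDEBUG setting would compute a different value. */
size_t use_list_size(void)
{
  return sizeof(struct use_list);
}

/* The assert below disappears entirely when the file is built with -DNDEBUG. */
int use_list_is_empty(const struct use_list *list)
{
  assert(list != NULL);
  return list->head == NULL;
}
```

If two object files that include this header are built with different NDEBUG settings and then linked together, each side uses its own idea of sizeof(struct use_list), which is the kind of silent disagreement the work-around tries to avoid.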

Regression between 0.8 and 0.9 when compiling with -Os optimization level.

I noticed a while after the release of 0.9 that the Homebrew package of pocl failed unexpectedly in some cases, but I was only recently able to pinpoint the problem.

Homebrew automatically changes all optimization switches to -Os, and I hadn't been testing with that setting myself earlier. Now, when I configure the pocl 0.9 package with:

CXXFLAGS=-Os ../pocl-git/configure --disable-debug --enable-direct-linkage --disable-icd --enable-testsuites= --prefix=/Users/mikaelle/Projects/Vincit/webcl/pocl-testing/install-0.9

Some test cases fail with this message (test 4 is the first to fail because of it):

Assertion failed: (VP.getPointer() == Next->VP.getPointer() && "Added to wrong list?"), function AddToExistingUseList, file Value.cpp, line 498.

The same examples worked on 0.8, so I ran a bisect to find out when it broke and got:

49af9a9094290aa7b8b59ba7b4712ba7086b6a40 is the first bad commit
commit 49af9a9094290aa7b8b59ba7b4712ba7086b6a40
Author: Kalle Raiskila <[email protected]>
Date:   Sat Dec 14 21:11:23 2013 +0200

    Add work-arounds for LLVM bug 18253.

:040000 040000 c836d7c1b897d702d3d56f71fe94bd8b01e4ebf1 617d7bcd1d99aafa21fce6569a71c82fef495a3d M  lib
Mikaels-MacBook-Pro:pocl-git mikaelle

It basically hard-coded -DNDEBUG flags into a few Makefile.am files:

-AM_CXXFLAGS = -I@top_srcdir@/fix-include -I@top_srcdir@/include `@LLVM_CONFIG@ --cxxflags` -fno-rtti
+#the explicit '-DNDEBUG' is a work-around for llvm bug 18253
+AM_CXXFLAGS = -I@top_srcdir@/fix-include -I@top_srcdir@/include `@LLVM_CONFIG@ --cxxflags` -fno-rtti -DNDEBUG

Would it be possible to detect somehow whether the -DNDEBUG flag is required?

Adapt Csmith to OpenCL

Csmith tests C99 compilers for correctness by using randomized inputs. We should adapt it to OpenCL.

https://github.com/csmith-project/csmith

This should be straightforward, as the semantic differences between the subset of C99 that Csmith tests and OpenCL are small. It would be interesting, though, to extend Csmith to test vector data types as well.

Implement events

gocl is a GLib wrapper for OpenCL which heavily uses events.

Currently, simple examples fail because pocl doesn't implement events:

$ ./hello-world
Failed to create GPU context (-1): Device not found.
Trying with CPU context... Context created
Num devices: 1
Max compute units: 4
Program created
Program built
Kernel created
Max work group size: 1024
Buffer created
pocl error: encountered unimplemented part of the OpenCL specs in clSetEventCallback.c:9

But I must admit that the gocl sync-example is working.

opencv: some tests crash or fail

Hi,
In December 2013 I tested opencv with pocl and there were many problems. There has been progress since then, and the following bug report is now smaller:

GNU/Debian Jessie (testing)
opencv: git master (last commit: f269a89792292e65d7df10cf386bc2ec04b757a8)

cd opencv_extra.git/testdata
../../opencv-git-master-build/bin/opencv_test_core --gtest_filter="OCL*"


without opencl:

[----------] Global test environment tear-down
[==========] 5969 tests from 56 test cases ran. (19331 ms total)
[  PASSED  ] 5969 tests.
40.61user 0.27system 0:19.59elapsed 208%CPU (0avgtext+0avgdata 62052maxresident)k
0inputs+0outputs (0major+157766minor)pagefaults 0swaps

opencl intel: 4.4.0.117

Build type: debug
Parallel framework: openmp
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx
OpenCL Platforms: 
    Intel(R) OpenCL
        CPU:        Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz (OpenCL 1.2 (Build 44))
Current OpenCL device: 
    Type = CPU
    Name =        Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    Version = OpenCL 1.2 (Build 44)
    Compute units = 4
    Max work group size = 8192
    Local memory size = 32 kB 
    Max memory allocation size = 974 MB 1010 kB 
    Double support = Yes
    Host unified memory = Yes
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 1
    Preferred vector width short = 1
    Preferred vector width int = 1
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1

...

[----------] Global test environment tear-down
[==========] 5969 tests from 56 test cases ran. (525778 ms total)
[  PASSED  ] 5969 tests.
577.34user 7.99system 8:49.08elapsed 110%CPU (0avgtext+0avgdata 1910284maxresident)k
3600inputs+0outputs (35major+1923277minor)pagefaults 0swaps

opencl amd: 14.6-beta-v1.0-may23

Parallel framework: openmp
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx
OpenCL Platforms: 
    AMD Accelerated Parallel Processing
        CPU: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz (OpenCL 1.2 AMD-APP (1526.3))
Current OpenCL device: 
    Type = CPU
    Name = Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    Version = OpenCL 1.2 AMD-APP (1526.3)
    Compute units = 4
    Max work group size = 1024
    Local memory size = 32 kB 
    Max memory allocation size = 2 GB 
    Double support = Yes
    Host unified memory = Yes
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 8
    Preferred vector width double = 4

...

[----------] Global test environment tear-down
[==========] 5969 tests from 56 test cases ran. (369619 ms total)
[  PASSED  ] 5969 tests.
408.51user 20.96system 6:10.04elapsed 116%CPU (0avgtext+0avgdata 152132maxresident)k
288inputs+338816outputs (3major+6572443minor)pagefaults 0swaps

llvm: 3.4.1-4
pocl: git master (last commit: ce87f87)

Build type: debug
Parallel framework: openmp
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx
OpenCL Platforms: 
    Portable Computing Language
        unknown: pthread-Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz (OpenCL 1.2 pocl)
Current OpenCL device: 
    Type = unknown
    Name = pthread-Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    Version = OpenCL 1.2 pocl
    Compute units = 4
    Max work group size = 1024
    Local memory size = 974 MB 1010 kB 
    Max memory allocation size = 974 MB 1010 kB 
    Double support = Yes
    Host unified memory = Yes
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2

...

Compared to the above: very slow, with growing memory consumption!

...
[----------] 14 tests from OCL_Channels/MixChannels
[ RUN      ] OCL_Channels/MixChannels.Accuracy/0
*** Error in `../../opencv-git-master-build/bin/opencv_test_core': free(): corrupted unsorted chunks: 0x000000000e341080 ***
Command terminated by signal 6
195.62user 15.59system 3:31.02elapsed 100%CPU (0avgtext+0avgdata 490660maxresident)k
488inputs+17960outputs (0major+15199320minor)pagefaults 0swaps

Details:

These tests crash:
OCL_Channels/MixChannels.*

These tests crash (depending on whether they are run together or one at a time):
OCL_Arithm/Sum.MAT/*
OCL_Arithm/Norm.*
OCL_Arithm/UMatDot.*

There are some failed tests, especially in:
OCL_Arithm/CountNonZero.MAT/*
OCL_Arithm/Sum.MAT/*
OCL_Arithm/Norm.*

The remaining tests (those not starting with OCL) all pass with pocl:

cd opencv_extra.git/testdata
../../opencv-git-master-build/bin/opencv_test_core --gtest_filter="-OCL*"

without opencl:

[----------] Global test environment tear-down
[==========] 2878 tests from 109 test cases ran. (284399 ms total)
[  PASSED  ] 2878 tests.

  YOU HAVE 438 DISABLED TESTS

310.30user 0.90system 4:45.25elapsed 109%CPU (0avgtext+0avgdata 246956maxresident)k
6624inputs+928outputs (52major+625642minor)pagefaults 0swaps

opencl intel:

[----------] Global test environment tear-down
[==========] 2878 tests from 109 test cases ran. (292669 ms total)
[  PASSED  ] 2878 tests.

  YOU HAVE 438 DISABLED TESTS

319.52user 2.69system 4:53.33elapsed 109%CPU (0avgtext+0avgdata 352980maxresident)k
32inputs+928outputs (1major+1488611minor)pagefaults 0swaps

opencl amd:

[----------] Global test environment tear-down
[==========] 2878 tests from 109 test cases ran. (298827 ms total)
[  PASSED  ] 2878 tests.

  YOU HAVE 438 DISABLED TESTS

321.95user 1.88system 4:59.11elapsed 108%CPU (0avgtext+0avgdata 290368maxresident)k
456inputs+7648outputs (3major+591166minor)pagefaults 0swaps

pocl:

[----------] Global test environment tear-down
[==========] 2878 tests from 109 test cases ran. (348391 ms total)
[  PASSED  ] 2878 tests.

  YOU HAVE 438 DISABLED TESTS

367.80user 7.38system 5:48.79elapsed 107%CPU (0avgtext+0avgdata 2664204maxresident)k
368inputs+9456outputs (5major+4847686minor)pagefaults 0swaps

Do you plan to add opencv to the set of pocl regular tests?

Pocl 0.9 RC1: building opencl C fails if include directory contains -g in its name.

This was really strange

mikaelle$ opencl-validator -I/dir < empty.cl 
opencl-validator: Compiling input for Portable Computing Language/pthread.
mikaelle$ opencl-validator -I/dir-g < empty.cl 
opencl-validator: Compiling input for Portable Computing Language/pthread.
opencl-validator: Can't build OpenCL program.
mikaelle$ 

opencl-validator only builds the program that is passed to it on stdin.

Cannot configure pocl

When I configure pocl, I receive the error

config.status: error: cannot find input file: `examples/ViennaCL/custom-context.stdout.in'

Is this file missing from the repo?

Resource leak in /lib/poclu/misc.c

Fix:
It is a good practice to free resources, even if the OS does clean up after the program as it terminates.


---
 lib/poclu/misc.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/poclu/misc.c b/lib/poclu/misc.c
index 9355d40..2504417 100644
--- a/lib/poclu/misc.c
+++ b/lib/poclu/misc.c
@@ -97,8 +97,11 @@ poclu_read_file(char *filename)
   fseek( file, 0, SEEK_END);
   size = ftell(file);
   src = (char*)malloc(size+1);
-  if (src == NULL)
+  if (src == NULL) {
+    /* do not leak resources, even in fatal case */
+    fclose(file);
     return NULL;
+  }

   fseek(file, 0, SEEK_SET);
   fread(src, size, 1, file);
--
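
The same idea can be taken one step further by routing every exit through a single cleanup point. The following is a minimal sketch, assuming a hypothetical helper named read_whole_file; it is not the actual poclu_read_file, and it also checks the return values of fseek, ftell and fread, which the patch above does not touch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not PoCL's poclu_read_file): reads a whole file into
   a NUL-terminated heap buffer. Every exit path after a successful fopen()
   goes through the "done" label, so the FILE handle is never leaked. */
char *read_whole_file(const char *path)
{
  char *src = NULL;
  long size = 0;
  FILE *file;

  assert(path != NULL);
  file = fopen(path, "rb");
  if (file == NULL)
    return NULL;

  if (fseek(file, 0, SEEK_END) != 0)
    goto done;
  size = ftell(file);
  if (size < 0 || fseek(file, 0, SEEK_SET) != 0)
    goto done;

  src = malloc((size_t)size + 1);
  if (src == NULL)
    goto done;

  if (fread(src, 1, (size_t)size, file) != (size_t)size) {
    free(src); /* short read: drop the partial buffer */
    src = NULL;
    goto done;
  }
  src[size] = '\0';

done:
  fclose(file);
  return src;
}
```

The caller owns the returned buffer and releases it with free(). A NULL result covers every failure case, including a missing file.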

clpeak fails on x86_64

Hey,

Because of the mailing-list announcement of the Android port, I discovered clpeak. Sadly, it fails on my desktop machine:

$ ./clpeak 

Platform: Portable Computing Language
  Device: pthread-Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
    Driver version  : 0.9 (Linux x64)
    Compute units   : 4
    Clock frequency : 3400 MHz

    Global memory bandwidth (GBPS)
      float   : 7.35
      float2  : 10.13
      float4  : 10.97
      float8  : 12.49
      float16 : 15.42

    Single-precision compute (GFLOPS)
      float   : 2.06
      float2  : 2.87
      float4  : 8.44
      float8  : 3.18
      float16 : 1.39

    Double-precision compute (GFLOPS)
      double   : 2.14
      double2  : 6.22
      double4  : 1.17
      double8  : Speicherzugriffsfehler (Speicherabzug geschrieben)
                       ^^ segfault


$ gdb ./clpeak
...
(gdb) bt
#0  0x00007ffff488d7db in _compute_dp_v8 () from /tmp/poclDH1p5F/pthread/compute_dp_v8/256-1-1.0-0-0/parallel.so
#1  0x00007ffff72ae47e in workgroup_thread () from /usr/lib64/libpocl.so.1.2.0
#2  0x00007ffff6c37c53 in start_thread (arg=0x7ffff549b700) at pthread_create.c:308
#3  0x0000003b08ef5dbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)

llvm 3.5 support

The current code fails to compile with LLVM 3.5, because some headers and structures were changed in 3.5.
Clang 3.5 gained support for the integrated assembler on ARM. This would be a good starting point for an Android port, since the gcc dependency (for assembling) is no longer required.

I can look into the Android port once pocl is LLVM 3.5 ready :)

enhancement

The spir64 tests should probably not run on a 32-bit architecture

I am building pocl on my Raspberry PI, a 32-bit ARMv6 architecture. The test "example1: dot product (SPIR64)" fails with the log output below. Should this test even be tried, given that a 64-bit ABI won't work on a 32-bit system?

#                             -*- compilation -*-
3. testsuite.at:48: testing example1: dot product (SPIR64) ...
/xfs1/eschnetter/remote/water/src/pocl/tests/testsuite.at:60: $abs_top_builddir/examples/example1-spir64/example1-spir
stderr:
WARNING: Linking two modules of different data layouts!
WARNING: Linking two modules of different target triples: /xfs1/eschnetter/remote/water/src/pocl-build/lib/kernel/host/kernel-armv6-unknown-linux-gnueabihf.bc: 'armv6-unknown-linux-gnueabihf' and 'spir64-unknown-unknown'
lt-example1-spir: /xfs1/eschnetter/remote/water/src/llvm-3.4/lib/IR/Instructions.cpp:1084: void llvm::StoreInst::AssertOK(): Assertion `getOperand(0)->getType() == cast<PointerType>(getOperand(1)->getType())->getElementType() && "Ptr must be a pointer to Val type!"' failed.
/xfs1/eschnetter/remote/water/src/pocl-build/tests/testsuite.dir/at-groups/3/test-source: line 37:  4828 Aborted                 $abs_top_builddir/examples/example1-spir64/example1-spir
--- expout      2014-06-24 10:52:04.394644352 -0400
+++ /xfs1/eschnetter/remote/water/src/pocl-build/tests/testsuite.dir/at-groups/3/stdout 2014-06-24 10:52:04.435640200 -0400
@@ -1,5 +0,0 @@
-(0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
-(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 4.000000
-(2.000000, 2.000000, 2.000000, 2.000000) . (2.000000, 2.000000, 2.000000, 2.000000) = 16.000000
-(3.000000, 3.000000, 3.000000, 3.000000) . (3.000000, 3.000000, 3.000000, 3.000000) = 36.000000
-OK
/xfs1/eschnetter/remote/water/src/pocl/tests/testsuite.at:60: exit code was 134, expected 0
3. testsuite.at:48: 3. example1: dot product (SPIR64) (testsuite.at:48): FAILED (testsuite.at:60)
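
One possible guard for such a test is sketched below in C. It rests on the assumption that comparing the host pointer width with the SPIR address width is the right criterion; the function names are hypothetical and are not part of the pocl test suite.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Pointer width of the host in bits (for example 32 on ARMv6, 64 on x86-64). */
unsigned host_pointer_bits(void)
{
  return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical test guard: a SPIR test is attempted only when its address
   width matches the host. Returns 1 to run the test, 0 to skip it. */
int should_run_spir_test(unsigned spir_address_bits)
{
  assert(spir_address_bits == 32 || spir_address_bits == 64);
  return spir_address_bits == host_pointer_bits();
}
```

With such a guard, the SPIR64 variant of example1 would be skipped on a 32-bit host rather than reported as a failure.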

Vector assigning with single values in conditional operator

I'm not sure if this is legal syntax but it works with AMD's OpenCL and not with POCL:

float4 foo = (float4){0.0f, 1.0f, 2.0f, 3.0f};
foo = foo > 1.0f ? 1.0f : 0.5f;
printf("%2.2v4hlf\n", foo);
//Should output:
//0.50,0.50,1.00,1.00

POCL kernel compiler gives this error:

error: assigning to 'float4' from incompatible type 'int __attribute__((ext_vector_type(4)))'

This works:

float4 foo = (float4){0.0f, 1.0f, 2.0f, 3.0f};
foo = foo > 1.0f ? (float4){1.0f, 1.0f, 1.0f, 1.0f} : 0.5f;
printf("%2.2v4hlf\n", foo);
//Output:
//0.50,0.50,1.00,1.00
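
The behaviour the report expects can be spelled out in plain C. This is only a sketch of the component-wise semantics (a vector comparison yielding -1 or 0 per component, followed by a per-component selection); the function names are made up for illustration, and it does not settle whether the all-scalar form is legal OpenCL C.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH 4 /* number of components in a float4 */

/* Component-wise "v > threshold": OpenCL C vector comparisons yield -1
   (all bits set) for true and 0 for false in each component. */
void vec_greater(const float v[WIDTH], float threshold, int mask[WIDTH])
{
  assert(v != NULL && mask != NULL);
  for (size_t i = 0; i < WIDTH; ++i)
    mask[i] = (v[i] > threshold) ? -1 : 0;
}

/* Component-wise selection with both branches given as scalars: each scalar
   is treated as if widened to all components, then chosen per component by
   the sign bit of the mask. */
void vec_select(const int mask[WIDTH], float if_true, float if_false,
                float out[WIDTH])
{
  assert(mask != NULL && out != NULL);
  for (size_t i = 0; i < WIDTH; ++i)
    out[i] = (mask[i] < 0) ? if_true : if_false;
}
```

For the input (0, 1, 2, 3) with threshold 1, the mask is (0, 0, -1, -1) and the selection gives (0.5, 0.5, 1.0, 1.0), which matches the output listed in the report.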

rust-opencl tests fail

Environment

System: Arch Linux (up-to-date)
pocl: HEAD (56b8f47)
LLVM/Clang: 3.4
Used payload: luqmana/rust-opencl (2d4b1f7edef5) tests

Minimal Testcase

none, see below

Description

I get some random segfaults and problems during the buffer allocation of the test suite (clCreateBuffer => CL_INVALID_BUFFER_SIZE), even for small sizes of 64 bytes and with enough system memory left. Sadly, I am not able to produce a minimal test case, because single tests that fail when started together with the entire test suite succeed when run alone. I experimented with a small C program but was not able to reproduce the problem.

Ideas

It could be that the Rust garbage collector produces some problems here because the buffers of the test suite are freed when marked as garbage. If you wish I can cross-post an issue to the rust-opencl tracker.
I was not able to disable the region allocator because of #75

test_as_type fail on avx

The kernel below causes a run-time segfault when compiled for corei7-avx on LLVM 3.4 (as of today). It works when the processor is corei7, so AVX is probably the crux again. This is a reduction of the 'test_as_type' test.
It looks like an LLVM 3.4 regression, but the pure LLVM test case is still a bit big.

__attribute__((aligned(128)))
constant char data[128];
kernel void test_as_type()
{
  union { uint16 value; char raw[sizeof(uint16)]; } src;
  src.value = *((constant uint16*)data);
  for (size_t i = 0; i < 16; ++i)
    printf("%d - %x \n", i, src.raw[i]);
}

'ceil' broken on powerpc32

The following kernel code:
float8 foo = (float8)(1.1f,2.2f,3.3f,4.4f,5.5f,6.6f,7.7f,8.8f);
float8 bar = ceil(foo);
for(i=0; i<8; i++) printf("%f, ", bar[i] ); printf("\n");

prints this:
2.000000, 3.000000, 2.000000, 3.000000, 2.000000, 3.000000, 2.000000, 3.000000,

on PowerPC.

LLVM opt segfaults with pocl-0.9/examples/standalone

This could very well be a case of PEBKAC but it might still be reasonable to document this in the issues. This could also be an issue with clang itself, since it actually crashes the opt tool.

Also this sounds like it could have something to do with: http://llvm.org/bugs/show_bug.cgi?id=12945
However, given that the bug report was originally for llvm 3.1, I don't see this
as an obvious conclusion. Also the test case (hello.cpp) given in the bug report
does not seem to crash on clang. This leads me to believe that this is a
separate bug.

First of all the LLVM / Clang (version 3.4.2) which was used here was configured as follows:
../llvm-3.4.2/configure
--prefix=/usr --enable-polly --enable-cxx11 --enable-clang-static-analyzer
--enable-clang-rewriter --enable-optimized --enable-profiling --enable-assertions
--enable-expensive-checks --disable-debug-runtime --disable-debug-symbols
--disable-keep-symbols --enable-jit --enable-docs --enable-threads
--enable-pthreads --enable-zlib --enable-pic --enable-shared
--enable-timestamps --enable-backtraces --enable-crash-overrides
--enable-targets=all --enable-bindings=all --enable-ltdl-install
--with-optimize-option=-O2 CC=gcc-4.7 CXX=g++-4.7

LLVM was built with gcc-4.7.2

pocl (release 0.9) was configured as follows:
../pocl-0.9/configure CC="clang" CXX="clang++"

The make runs fine until it encounters standalone.cl in examples/standalone.

make[3]: Entering directory `/home/kmort/workspace/build-pocl-0.9/examples/standalone'
../../scripts/pocl-standalone -h standalone.h -o standalone.bc ../../../pocl-0.9/examples/standalone/standalone.cl
x86_64-unknown-linux-gnu
0 libLLVM-3.4.so 0x00002b9c46f1a1fb llvm::sys::PrintStackTrace(IO_FILE) + 43
1 libLLVM-3.4.so 0x00002b9c46f1a92c
2 libpthread.so.0 0x00002b9c4846d030
3 libc.so.6 0x00002b9c4952ee3b mcount + 43
Stack dump:
0. Program arguments: /usr/bin/opt -relocation-model=pic -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 -load=/home/kmort/workspace/build-pocl-0.9/lib/llvmopencl/.libs/llvmopencl.so -generate-header -disable-output -header=standalone.h ./.pocl8788/kernel.bc
Segmentation fault
make[3]: *** [standalone.bc] Error 139

The bug is reproducible with the development version of pocl (pulled today, top commit: c429022).

The problem also persists with --enable-icd --disable-direct-linkage added to
configure in pocl. I have not yet tried the other way around.

Run "llc kernel*.bc" as part of the tests

Test code generation by creating machine code for all kernel library functions. This can be done by calling "llc kernel*.bc" in lib/kernel/hosts. If this fails, then there are kernel functions that cannot be compiled, which will probably lead to failures if they are used in application code.

Compiling a kernel function fails if we either use constructs that are not supported in LLVM, or if there is a code generation bug in LLVM.

does not build with clang-3.2-spir

The official SPIR producer version of LLVM would be nice to support. It seems not to be compatible with the upstream 3.2 (it probably has more OpenCL C support), thus fails with the type defines:

llvm-3.2-spir/bin/clang -target x86_64-unknown-linux-gnu -march=penryn -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o abs.cl.bc ../../../lib/kernel/abs.cl -include /home/visit0r/src/pocl/include/_kernel.h
In file included from :155:
In file included from :3:
In file included from /home/visit0r/src/pocl/include/_kernel.h:33:
In file included from /home/visit0r/src/pocl/include/_kernel_c.h:34:
/home/visit0r/src/pocl/include/pocl_types.h:48:9: error: cannot combine with previous 'type-name' declaration specifier
typedef SIZE_TYPE size_t;
^
:56:37: note: expanded from macro 'SIZE_TYPE'
#define SIZE_TYPE long unsigned int
                                ^

In file included from :155:
In file included from :3:
In file included from /home/visit0r/src/pocl/include/_kernel.h:33:
In file included from /home/visit0r/src/pocl/include/_kernel_c.h:34:
/home/visit0r/src/pocl/include/pocl_types.h:48:9: error: 'type-name' cannot be signed or unsigned
typedef SIZE_TYPE size_t;
^
:56:28: note: expanded from macro 'SIZE_TYPE'
#define SIZE_TYPE long unsigned int
                       ^

In file included from :155:
In file included from :3:
In file included from /home/visit0r/src/pocl/include/_kernel.h:33:
In file included from /home/visit0r/src/pocl/include/_kernel_c.h:34:
/home/visit0r/src/pocl/include/pocl_types.h:49:9: error: cannot combine with previous 'type-name' declaration specifier
typedef PTRDIFF_TYPE ptrdiff_t;
^
:52:31: note: expanded from macro 'PTRDIFF_TYPE'
#define PTRDIFF_TYPE long int
                          ^

In file included from :155:
In file included from :3:
In file included from /home/visit0r/src/pocl/include/_kernel.h:33:
/home/visit0r/src/pocl/include/_kernel_c.h:161:13: error: cannot combine with previous 'int' declaration specifier
typedef int sampler_t;

.... etc ...

It seems the pocl build thinks that Clang 3.2 does not provide those types (as upstream doesn't) and tries to #define them to something sensible, while Clang-SPIR 3.2 does provide them. https://github.com/KhronosGroup/SPIR

debug build of host causes error in kernel results

Building the host library with --enable-debug causes kernel result errors in the pthread and basic devices. This happens only when statically linking LLVM in API mode, and is probably a race. Only some tests fail; they seem to be those that use a buffer that the kernel writes.
What I tried:
- set either CFLAGS or CXXFLAGS to -O1 and the other to -O0: then there are no errors
- save the temp dirs and copy a known-good parallel.so into them: still causes the error (I checked in a debugger that the cached parallel.so is actually used)
- debug the ICD: the correct library is loaded without problems

example1 output:
$ ./examples/example1/example1
(0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 0.000000

non-pointer struct kernel arguments fail due to varying ABIs

pocl assumes that kernel arguments are mapped as they appear in the sources. Structs are problematic because different ABIs map them differently. E.g. on AMD64/Linux the struct sometimes seems to be split into two integers, which causes the kernel launch to fail because the arguments are mapped to the wrong positions.

This is not trivial to fix. Some possibilities:

  • force the kernel ABI to be fixed at Clang side
  • create a more sophisticated work group launcher that converts from the pocl argument table to the correct types and then calls the kernel with the correct ABI

The first is cleaner but possibly hard to get through.
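
The second option can be sketched in C as follows. All names here are hypothetical, and the argument-table layout (one pointer per argument, each pointing at the argument's value) is an assumption made for illustration rather than pocl's actual launcher interface. The point is that the stub calls the kernel with its real C parameter types, so the compiler applies whatever struct-passing convention the platform ABI prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* A small struct passed BY VALUE to the kernel. */
struct pair {
  int a;
  int b;
};

/* Stand-in for a compiled kernel taking a non-pointer struct argument. */
void sum_kernel(struct pair p, int *out)
{
  *out = p.a + p.b;
}

/* Hypothetical per-kernel launcher stub. args[i] points at the value of
   argument i; the stub converts each entry to the declared parameter type
   and makes an ordinary call, leaving the ABI details to the compiler. */
void sum_kernel_launcher(void **args)
{
  assert(args != NULL && args[0] != NULL && args[1] != NULL);
  struct pair p = *(struct pair *)args[0];
  int *out = *(int **)args[1];
  sum_kernel(p, out);
}
```

A host would fill the table with the addresses of the stored argument values and call the stub instead of the kernel itself. The cost of this approach is one generated stub per kernel signature.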

https://bugs.launchpad.net/pocl/+bug/987905

parallel build of AMD SDK tests fails

Some makefile prerequisite is missing:
make[7]: *** No rule to make target `../../../../../samples/opencl/SDKUtil/build/debug/x86_64//libSDKUtil.a', needed by `build/debug/x86_64/SobelFilterImage'. Stop.

Happens when "make -j N" is used. Several (possibly all) tests suffer from this.

Android port

Good news: pocl can be ported to Android. I have an intermediate version of pocl running on my Moto G.

root@falcon_umtsds:/data/local/tmp # ./clpeak --compute-sp

Platform: Portable Computing Language
Device: pthread-ARMv7 Processor rev 3 (v7l)
Driver version : 0.10-pre (Android)
Compute units : 5
Clock frequency : 1190 MHz

Single-precision compute (GFLOPS)
  float   : 1.03
  float2  : 2.04
  float4  : 3.91
  float8  : 3.87
  float16 : 4.01

llvm/clang 3.4 is used.

TODO:

  1. Code cleanup is still to be done. The build scripts are not yet in shape for upload.
  2. I have written an Android app that installs the pocl libraries, clang etc. to a standard location, /data/data/org.pocl.libs/files/ . The APK itself is ~25 MB, and when extracted it expands to ~90 MB of space in ROM :|
  3. Temporary files are put in /sdcard/pocl/tmp, because Android doesn't provide /tmp :/ Junk accumulates in /sdcard/pocl/tmp if clReleaseKernel is not called. Can we cache kernels based on a CRC of the source? This could significantly reduce build time.
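
The caching idea in item 3 could look roughly like the sketch below. It is only an illustration: `crc32_bytes` and `kernel_cache_path` are hypothetical helpers, not pocl functions, and the key is computed from the source text alone. A real cache key would presumably also need to cover the build options and the target device, and a CRC can collide, so a stronger hash may be preferable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the zlib variant). */
static uint32_t crc32_bytes(const void *data, size_t len) {
  const unsigned char *p = (const unsigned char *)data;
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

/* Writes "<cache_root>/<crc as 8 hex digits>" into `out`.
   Returns 0 on success and -1 if the buffer is too small. */
static int kernel_cache_path(const char *cache_root, const char *source,
                             char *out, size_t out_size) {
  assert(cache_root != NULL && source != NULL && out != NULL);
  uint32_t key = crc32_bytes(source, strlen(source));
  int n = snprintf(out, out_size, "%s/%08x", cache_root, (unsigned)key);
  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With such a path, the program build could first check whether the directory already exists and reuse its contents instead of recompiling.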

enhancement

clSetKernelArg does not verify cl_mem objects

void *foo;
clSetKernelArg( krn, 0, sizeof(cl_mem), &foo );

should return CL_INVALID_MEM_OBJECT, according to specification.

Currently, pocl does one of:

  1. freeze on clRetainMemObject's mutex
  2. assert with __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)
  3. continue happily
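
One possible way to reject such a handle without touching it is to keep a registry of live memory objects and only compare pointers. The sketch below assumes exactly that; `register_mem_object` and `is_valid_mem_object` are hypothetical names, the table is fixed-size, and locking and unregistration are left out for brevity. It is not how pocl validates handles.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIVE_MEM_OBJECTS 64

/* Hypothetical registry of buffer handles that are currently alive. */
static const void *live_mem_objects[MAX_LIVE_MEM_OBJECTS];
static size_t live_mem_count = 0;

/* Would be called when a buffer is created. Returns 0 on success,
   -1 if the table is full. */
static int register_mem_object(const void *handle) {
  assert(handle != NULL);
  if (live_mem_count == MAX_LIVE_MEM_OBJECTS)
    return -1;
  live_mem_objects[live_mem_count++] = handle;
  return 0;
}

/* Returns 1 if `handle` was registered, 0 otherwise. The candidate pointer
   is only compared, never dereferenced, so garbage values are harmless. */
static int is_valid_mem_object(const void *handle) {
  for (size_t i = 0; i < live_mem_count; ++i)
    if (live_mem_objects[i] == handle)
      return 1;
  return 0;
}
```

In clSetKernelArg the value to check would be the handle read from arg_value; if the lookup fails, the function can return CL_INVALID_MEM_OBJECT before any mutex of the supposed object is touched.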

--disable-region-allocator compilation fails

Environment

System: Arch Linux (up-to-date)
pocl: HEAD (56b8f47)
LLVM/Clang: 3.4

Description

Disabling the region allocator via ./configure --disable-region-allocator ends up with the compilation error pthread.c:92:3: error: unknown type name 'ba_lock_t'.

Pocl breaks on OSX after commit 351ca0a3cdeb584cab7e2ea5e3bb845a103f4cbd

After this commit, our test cases start giving the following kinds of errors:

clEnqueueNDRangeKernel failed with code -5

clSetKernelArg failed for arg 2
clSetKernelArg failed for array length parameter
clEnqueueNDRangeKernel failed with code -5

And the pocl test suite starts failing:

Subject: [pocl 0.9-pre] testsuite: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 52 53 54 55 56 57 58 59 60 61 62 66 67 68 69 70 71 72 74 75 76 79 80 82 85 87 failed

Build fails with `error: unknown target CPU 'pentium-m'unknown target CPU 'pentium-m'`

Building a recent git snapshot of pocl for Fedora leads to:


mkdir -p ./
/usr/bin/clang -target x86_64-redhat-linux-gnu -march=pentium-m -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o abs.cl.bc ../../../lib/kernel/abs.cl -include /builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/include/_kernel.h
/usr/bin/clang -target x86_64-redhat-linux-gnu -march=pentium-m -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o abs_diff.cl.bc ../../../lib/kernel/abs_diff.cl -include /builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/include/_kernel.h
mkdir -p ./
mkdir -p ./
/usr/bin/clang -target x86_64-redhat-linux-gnu -march=pentium-m -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o add_sat.cl.bc ../../../lib/kernel/add_sat.cl -include /builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/include/_kernel.h
/usr/bin/clang -target x86_64-redhat-linux-gnu -march=pentium-m -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o all.cl.bc ../../../lib/kernel/all.cl -include /builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/include/_kernel.h
mkdir -p ./
/usr/bin/clang -target x86_64-redhat-linux-gnu -march=pentium-m -D_CL_DISABLE_LONG -Xclang -ffake-address-space-map -emit-llvm -ffp-contract=off -x cl -D__OPENCL_VERSION__=120 -fsigned-char -c -o any.cl.bc ../../../lib/kernel/any.cl -include /builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/include/_kernel.h
error: unknown target CPU 'pentium-m'
error: unknown target CPU 'pentium-m'
error: unknown target CPU 'pentium-m'
make[4]: *** [abs_diff.cl.bc] Error 1
make[4]: *** Waiting for unfinished jobs....
make[4]: *** [abs.cl.bc] Error 1
make[4]: *** [add_sat.cl.bc] Error 1
error: error: unknown target CPU 'pentium-m'unknown target CPU 'pentium-m'
make[4]: *** [any.cl.bc] Error 1
make[4]: Leaving directory `/builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/lib/kernel/host'
make[4]: *** [all.cl.bc] Error 1
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/lib/kernel'
make[2]: Leaving directory `/builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1/lib'
make[2]: *** [all-recursive] Error 1
make[1]: Leaving directory `/builddir/build/BUILD/pocl-f5f8e278d6c5fe9a0b8ea1fa25ca0d1ef308a7c1'
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.HkM6HV (%build)
    Bad exit status from /var/tmp/rpm-tmp.HkM6HV (%build)
Child return code was: 1
EXCEPTION: Command failed. See logs for output.
 # ['bash', '--login', '-c', 'rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/pocl.spec']
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mockbuild/trace_decorator.py", line 70, in trace
    result = func(*args, **kw)
  File "/usr/lib/python2.7/site-packages/mockbuild/util.py", line 361, in do
    raise mockbuild.exception.Error, ("Command failed. See logs for output.\n # %s" % (command,), child.returncode)
Error: Command failed. See logs for output.
 # ['bash', '--login', '-c', 'rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/pocl.spec']
LEAVE do --> EXCEPTION RAISED

Whole log is here: http://kojipkgs.fedoraproject.org//work/tasks/2774/6222774/build.log

PARTITION_MAX_SUB_DEVICES is unknown to clGetDeviceInfo

The clinfo from https://github.com/Oblomov/clinfo queries properties based on the exposed OpenCL version (1.2 in the case of pocl). It seems that PARTITION_MAX_SUB_DEVICES is missing/unknown in pocl, but it should probably be known:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 0.8
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread
  Device Vendor                                   unknown
  Device Version                                  OpenCL 1.2 pocl
  Driver Version                                  0.8
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     Default
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             3400MHz
  Device Partition                                (core)
printDeviceInfo:335: get PARTITION_MAX_SUB_DEVICES : error -30
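
Error -30 is CL_INVALID_VALUE, which is what clGetDeviceInfo returns for a parameter name it does not recognise. A minimal sketch of handling the query follows. It is self-contained, so the constants are copied from the OpenCL 1.2 headers under `SKETCH_` names, and `sketch_device` and `sketch_get_device_info` are hypothetical stand-ins rather than pocl code. Reporting the compute-unit count is an assumption; the specification only requires that the value not exceed CL_DEVICE_MAX_COMPUTE_UNITS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from the OpenCL 1.2 headers so the sketch builds without them. */
#define SKETCH_CL_SUCCESS 0
#define SKETCH_CL_INVALID_VALUE (-30)
#define SKETCH_CL_DEVICE_MAX_COMPUTE_UNITS 0x1002
#define SKETCH_CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043

/* Hypothetical device record. */
typedef struct {
  unsigned max_compute_units;
} sketch_device;

/* Follows the usual clGetDeviceInfo contract: copy the value if a buffer is
   given, report its size if requested, reject unknown parameter names. */
static int sketch_get_device_info(const sketch_device *dev, unsigned param_name,
                                  size_t value_size, void *value,
                                  size_t *value_size_ret) {
  assert(dev != NULL);
  unsigned result;
  switch (param_name) {
  case SKETCH_CL_DEVICE_MAX_COMPUTE_UNITS:
  case SKETCH_CL_DEVICE_PARTITION_MAX_SUB_DEVICES:
    /* Assumption: at most one sub-device per compute unit. */
    result = dev->max_compute_units;
    break;
  default:
    return SKETCH_CL_INVALID_VALUE;
  }
  if (value != NULL) {
    if (value_size < sizeof(result))
      return SKETCH_CL_INVALID_VALUE;
    memcpy(value, &result, sizeof(result));
  }
  if (value_size_ret != NULL)
    *value_size_ret = sizeof(result);
  return SKETCH_CL_SUCCESS;
}
```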

pthreads on FreeBSD freeze on mutex

The pthread device will not work on FreeBSD "out of the box". The issue is that, on BSD, a library may not initialize threading by itself. This causes pocl to stall on some uninitialized internal mutex.
http://www.freebsd.org/cgi/query-pr.cgi?pr=163512

A simple work-around is to compile the OpenCL application with "-pthread", but this of course cannot be enforced from pocl, especially if an ICD loader is used.
The internal testsuite does run if "-pthread" is passed to ./configure in CFLAGS and CXXFLAGS, even if an ICD loader is used.

Assigning for loop variable to private variable makes it local

The kernel compiler turns int hitIndex into a local variable even though it should be private in the following kernel code. If I change the for loop's condition to i < 968, the whole kernel works as it should.

__kernel void draw(const __global int *faceCount) {

  int hitIndex = -1;

  for(int i = 0; i < *faceCount; ++i){
    if(i == 854 && get_global_id(0) == 67599){
      hitIndex = i;
      if(get_global_id(0) > 67595 && get_global_id(0) < 67600){
        printf("Changed value at global_id: %d, local_id %d, group_id %d, to: %i\n", get_global_id(0), get_local_id(0), get_group_id(0), i);
      }
    }
  }

  if(hitIndex > -1){
    // (This should only print if first print is printed with the same id)
    if(get_global_id(0) > 67595 && get_global_id(0) < 67600){
      printf("Value is changed at global_id: %d, local_id %d, group_id %d, to: %i\n", get_global_id(0), get_local_id(0), get_group_id(0), hitIndex);
    }
  }
}

The output of the kernel:

Changed value at global_id: 67599, local_id 3, group_id 16899, to: 854
Value is changed at global_id: 67596, local_id 0, group_id 16899, to: 854
Value is changed at global_id: 67597, local_id 1, group_id 16899, to: 854
Value is changed at global_id: 67598, local_id 2, group_id 16899, to: 854
Value is changed at global_id: 67599, local_id 3, group_id 16899, to: 854

And this is how I am calling the kernel:

  int ret;
  cl_mem faceCount_mem_obj_ = clCreateBuffer(context_, CL_MEM_READ_ONLY, 
        sizeof(int), NULL, &ret);
  int faceCount_ = 968;
  ret = clEnqueueWriteBuffer(command_queue_, faceCount_mem_obj_, CL_TRUE, 0, 
        sizeof(int), &faceCount_, 0, NULL, NULL);

  cl_kernel kernel = clCreateKernel(program_, "draw", &ret);

  ret = clSetKernelArg(kernel, 0, sizeof(cl_mem),(void*)&faceCount_mem_obj_);

  size_t global_item_size = 110880;
  size_t workGroupSize = 4;
  ret = clEnqueueNDRangeKernel(command_queue_, kernel, 1, NULL, 
          &global_item_size, &workGroupSize, 0, NULL, 
          NULL);

  ret = clReleaseKernel(kernel);

Test 96 fails strangely on OSX 10.9.1 Xcode 5.0.2 llvm 3.3 and llvm 3.4

This fails only when llvmapi is enabled. If configured with --disable-llvmapi all tests run fine. I tested this on both llvm 3.3 and 3.4.

mikaels-mbp:build-0.9 mikaelle$ cat tests/testsuite.dir/096/
run            testsuite.log  
mikaels-mbp:build-0.9 mikaelle$ cat tests/testsuite.dir/096/testsuite.log 
#                             -*- compilation -*-
96. testsuite-runtime.at:4: testing Host runtime tests ...
../../pocl-git/tests/testsuite-runtime.at:6: $abs_top_builddir/tests/runtime/test_clGetDeviceInfo
../../pocl-git/tests/testsuite-runtime.at:7: $abs_top_builddir/tests/runtime/test_clEnqueueNativeKernel
../../pocl-git/tests/testsuite-runtime.at:8: $abs_top_builddir/tests/runtime/test_clGetEventInfo
../../pocl-git/tests/testsuite-runtime.at:9: $abs_top_builddir/tests/runtime/test_clCreateProgramWithBinary
../../pocl-git/tests/testsuite-runtime.at:11: cd $abs_top_srcdir/tests/runtime/; $abs_top_builddir/tests/runtime/test_clBuildProgram
--- /dev/null   2014-01-15 23:28:57.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/build-0.9/tests/testsuite.dir/at-groups/96/stderr    2014-01-15 23:28:58.000000000 +0200
@@ -0,0 +1,13 @@
+/usr/include/module.map:36:14: error: header 'float.h' not found
+      header "float.h" // note: supplied by compiler
+             ^
+/usr/include/module.map:81:14: error: header 'stdarg.h' not found
+      header "stdarg.h" // note: supplied by the compiler
+             ^
+/usr/include/module.map:113:14: error: header 'tgmath.h' not found
+      header "tgmath.h" // note: supplied by the compiler
+             ^
+/tmp/poclGT8bPO//program.cl:2:10: fatal error: 'test_kernel_src_in_pwd.h' file not found
+#include "test_kernel_src_in_pwd.h"
+         ^
+4 errors generated.
../../pocl-git/tests/testsuite-runtime.at:11: exit code was 1, expected 0
96. testsuite-runtime.at:4: 96. Host runtime tests (testsuite-runtime.at:4): FAILED (testsuite-runtime.at:11)
mikaels-mbp:build-0.9 mikaelle$

Missing mutex delete function

All basic locking in the pocl runtime environment is based on these three macros:

#define POCL_LOCK(__LOCK__) pthread_mutex_lock (&(__LOCK__))
#define POCL_UNLOCK(__LOCK__) pthread_mutex_unlock (&(__LOCK__))
#define POCL_INIT_LOCK(__LOCK__) pthread_mutex_init (&(__LOCK__), NULL)

What I miss, though, is POCL_DESTROY_LOCK: initializing locks without ever destroying them creates a massive memory leak.

Please add that functionality (even though I'm pretty sure it's not just a trivial modification).
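
The macro itself could mirror the existing three, as in the sketch below. `POCL_DESTROY_LOCK` is the name proposed above; `sketch_object` and its two functions are hypothetical and only show where the call would sit in a reference-counted teardown path. The non-trivial part is presumably finding every place in the runtime where a locked object is freed.

```c
#include <assert.h>
#include <pthread.h>

#define POCL_LOCK(__LOCK__) pthread_mutex_lock (&(__LOCK__))
#define POCL_UNLOCK(__LOCK__) pthread_mutex_unlock (&(__LOCK__))
#define POCL_INIT_LOCK(__LOCK__) pthread_mutex_init (&(__LOCK__), NULL)
/* Proposed addition: release whatever the mutex holds internally. */
#define POCL_DESTROY_LOCK(__LOCK__) pthread_mutex_destroy (&(__LOCK__))

/* Hypothetical reference-counted runtime object. */
typedef struct {
  pthread_mutex_t lock;
  int refcount;
} sketch_object;

static void sketch_object_init(sketch_object *obj) {
  int err = POCL_INIT_LOCK(obj->lock);
  assert(err == 0);
  (void)err;
  obj->refcount = 1;
}

/* Drops one reference. When the last one is gone, the lock is destroyed.
   Returns 1 if the object was torn down, 0 otherwise. */
static int sketch_object_release(sketch_object *obj) {
  POCL_LOCK(obj->lock);
  int remaining = --obj->refcount;
  POCL_UNLOCK(obj->lock);
  if (remaining > 0)
    return 0;
  /* The mutex must be unlocked before it is destroyed. */
  int err = POCL_DESTROY_LOCK(obj->lock);
  assert(err == 0);
  (void)err;
  return 1;
}
```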

Assertion failure with __attribute__((reqd_work_group_size))

The NVIDIA OpenCL example oclRadixSort from https://developer.nvidia.com/opencl (and possibly others) fails with

oclRadixSort: ../../../lib/CL/pocl_llvm_api.cc:400: int pocl_llvm_get_kernel_arg_metadata(const char*, llvm::Module*, cl_kernel): Assertion `((arg_num-1) == kernel->num_args) && "Kernel argument count doesn't fit metadata arg count"' failed.

Removing __attribute__((reqd_work_group_size(WORKGROUP_SIZE, 1, 1))) from the kernel stops this assertion from being triggered.

Here’s a backtrace:

#0  0x00007ffff5123535 in raise () from /lib64/libc.so.6
#1  0x00007ffff51249b8 in abort () from /lib64/libc.so.6
#2  0x00007ffff511c5f2 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff511c6a2 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff604496a in pocl_llvm_get_kernel_arg_metadata (
    kernel_name=0x418414 "scanExclusiveLocal1", input=0x66bb70, kernel=0xb50b70)
    at ../../../lib/CL/pocl_llvm_api.cc:400
#5  0x00007ffff6045f60 in pocl_llvm_get_kernel_metadata (program=0x639160, kernel=0xb50b70, 
    device_i=0, kernel_name=0x418414 "scanExclusiveLocal1", 
    device_tmpdir=0x7fffffffd430 "/tmp/pocl0p43AY/XXX", 
    descriptor_filename=0x7fffffffd030 "/tmp/pocl0p43AY/XXX/scanExclusiveLocal1/descriptor.so", 
    errcode=0x7fffffffd02c) at ../../../lib/CL/pocl_llvm_api.cc:726
#6  0x00007ffff603145e in POclCreateKernel (program=0x639160, 
    kernel_name=0x418414 "scanExclusiveLocal1", errcode_ret=0x7fffffffd8bc)
    at ../../../lib/CL/clCreateKernel.c:88
#7  0x0000000000406c49 in Scan::Scan (this=0x639118, GPUContext=0x628bb0, CommandQue=0x628d80, 
    numElements=512, 
    path=0x7fffffffdec8 "XXX/OpenCL/./bin/linux/debug//oclRadixSort") at src/Scan.cpp:49
#8  0x0000000000405bb9 in RadixSort::RadixSort (this=0x6390b0, GPUContext=0x628bb0, 
    CommandQue=0x628d80, maxElements=8192, 
    path=0x7fffffffdec8 "XXX/OpenCL/./bin/linux/debug//oclRadixSort", ctaSize=128, keysOnly=true) at src/RadixSort.cpp:31
#9  0x0000000000404e73 in main (argc=1, argv=0x7fffffffdb48) at src/oclRadixSort.cpp:112
(gdb) print arg_num
$1 = 4
(gdb) print kernel->num_args
$2 = 4

I’m using LLVM 3.3.

Assigning for loop variable to private variable makes it local 2

I was able to construct a small example kernel which shows that there is still a bug similar to #94. The misbehavior occurs with both POCL_WORK_GROUP_METHODs:

__kernel void draw(const __global int *limit, __global int *result) {
  int hitIndex = -1;

  for(int i = 0; i < *limit; ++i){
    if(i == 3 && get_global_id(0) == 6){
      hitIndex = i;
      printf("changing the value at global_id: %d, local_id %d, group_id %d, to: %i\n", get_global_id(0), get_local_id(0), get_group_id(0), hitIndex);
    }
  }

  if(hitIndex != -1){
    // (This should print if first print is printed with the same id)
    printf("value is changed at global_id: %d, local_id %d, group_id %d, to: %i\n", get_global_id(0), get_local_id(0), get_group_id(0), hitIndex);
    result[get_global_id(0)] = 1;
  }else{
    printf("VALUE NOT CHANGED AT: %d\n", get_global_id(0));
    result[get_global_id(0)] = 0;
  }
}
//Cleaned output when global_work_size is 8 and local_work_size is 4 :
//VALUE NOT CHANGED AT: 0
//VALUE NOT CHANGED AT: 1
//VALUE NOT CHANGED AT: 2
//VALUE NOT CHANGED AT: 3
//changing the value at global_id: 6, local_id 2, group_id 1, to: 3
//VALUE NOT CHANGED AT: 4
//VALUE NOT CHANGED AT: 5
//VALUE NOT CHANGED AT: 6
//VALUE NOT CHANGED AT: 7

//Cleaned output when global_work_size is 8 and local_work_size is 1 :
//VALUE NOT CHANGED AT: 0
//VALUE NOT CHANGED AT: 1
//VALUE NOT CHANGED AT: 2
//VALUE NOT CHANGED AT: 3
//VALUE NOT CHANGED AT: 4
//changing the value at global_id: 6, local_id 0, group_id 6, to: 3
//VALUE NOT CHANGED AT: 5
//value is changed at global_id: 6, local_id 0, group_id 6, to: 3
//VALUE NOT CHANGED AT: 7

Some tests are failing: release_0_9 branch on OSX Mavericks 10.9.1 XCode 5.0.2 with llvm-3.3 and deps from homebrew

Some tests are failing: 1 3 30 36 96 failed

Mikaels-MacBook-Pro:pocl-build mikaelle$  cat tests/testsuite.dir/001/testsuite.log 
#                             -*- compilation -*-
1. testsuite.at:30: testing example1: dot product ...
../../pocl-git/tests/testsuite.at:39: $abs_top_builddir/examples/example1/example1
--- /dev/null   2014-01-14 15:02:46.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/1/stderr    2014-01-14 15:02:46.000000000 +0200
@@ -0,0 +1 @@
+error: unknown argument: '-ffp-co'
--- expout  2014-01-14 15:02:46.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/1/stdout    2014-01-14 15:02:46.000000000 +0200
@@ -1,5 +1,4 @@
+ERROR
 (0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
-(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 4.000000
-(2.000000, 2.000000, 2.000000, 2.000000) . (2.000000, 2.000000, 2.000000, 2.000000) = 16.000000
-(3.000000, 3.000000, 3.000000, 3.000000) . (3.000000, 3.000000, 3.000000, 3.000000) = 36.000000
-OK
+(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 0.000000
+FAIL
../../pocl-git/tests/testsuite.at:39: exit code was 255, expected 0
1. testsuite.at:30: 1. example1: dot product (testsuite.at:30): FAILED (testsuite.at:39)
Mikaels-MacBook-Pro:pocl-build mikaelle$ 
Mikaels-MacBook-Pro:pocl-build mikaelle$  cat tests/testsuite.dir/003/testsuite.log 
#                             -*- compilation -*-
3. testsuite.at:48: testing example2a: matrix transpose (automatic locals) ...
../../pocl-git/tests/testsuite.at:51: $abs_top_builddir/examples/example2a/example2a
--- /dev/null   2014-01-14 15:02:46.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/3/stderr    2014-01-14 15:02:48.000000000 +0200
@@ -0,0 +1 @@
+error: unknown argument: '-ffp-co'
--- -   2014-01-14 15:02:48.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/3/stdout    2014-01-14 15:02:48.000000000 +0200
@@ -1,2 +1 @@
-OK

../../pocl-git/tests/testsuite.at:51: exit code was 255, expected 0
3. testsuite.at:48: 3. example2a: matrix transpose (automatic locals) (testsuite.at:48): FAILED (testsuite.at:51)
Mikaels-MacBook-Pro:pocl-build mikaelle$ 
Mikaels-MacBook-Pro:pocl-build mikaelle$  cat tests/testsuite.dir/036/testsuite.log 
#                             -*- compilation -*-
36. testsuite-workgroup.at:137: testing loop with two paths to the latch (full replication) ...
../../pocl-git/tests/testsuite-workgroup.at:141: POCL_DEVICES=basic POCL_WORK_GROUP_METHOD=workitemrepl $abs_top_builddir/tests/workgroup/run_kernel for_bug.cl 1 2 1 1
--- /dev/null   2014-01-14 15:05:33.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/36/stderr   2014-01-14 15:05:33.000000000 +0200
@@ -0,0 +1 @@
+error: unknown argument: '-target core-avx-i'
--- -   2014-01-14 15:05:33.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/36/stdout   2014-01-14 15:05:33.000000000 +0200
@@ -1,36 +1 @@
-gid_x 0 after barrier at iteration 16
-gid_x 1 after barrier at iteration 16
-gid_x 0 after barrier at iteration 15
-gid_x 1 after barrier at iteration 15
-gid_x 0 after barrier at iteration 14
-gid_x 1 after barrier at iteration 14
-gid_x 0 after barrier at iteration 13
-gid_x 1 after barrier at iteration 13
-gid_x 0 after barrier at iteration 12
-gid_x 1 after barrier at iteration 12
-gid_x 0 after barrier at iteration 11
-gid_x 1 after barrier at iteration 11
-gid_x 0 after barrier at iteration 10
-gid_x 1 after barrier at iteration 10
-gid_x 0 after barrier at iteration 9
-gid_x 1 after barrier at iteration 9
-gid_x 0 after barrier at iteration 8
-gid_x 1 after barrier at iteration 8
-gid_x 0 after barrier at iteration 7
-gid_x 1 after barrier at iteration 7
-gid_x 0 after barrier at iteration 6
-gid_x 1 after barrier at iteration 6
-gid_x 0 after barrier at iteration 5
-gid_x 1 after barrier at iteration 5
-gid_x 0 after barrier at iteration 4
-gid_x 1 after barrier at iteration 4
-gid_x 0 after barrier at iteration 3
-gid_x 1 after barrier at iteration 3
-gid_x 0 after barrier at iteration 2
-gid_x 1 after barrier at iteration 2
-gid_x 0 after barrier at iteration 1
-gid_x 1 after barrier at iteration 1
-gid_x 0 after barrier at iteration 0
-gid_x 1 after barrier at iteration 0
-OK

../../pocl-git/tests/testsuite-workgroup.at:141: exit code was 255, expected 0
36. testsuite-workgroup.at:137: 36. loop with two paths to the latch (full replication) (testsuite-workgroup.at:137): FAILED (testsuite-workgroup.at:141)
Mikaels-MacBook-Pro:pocl-build mikaelle$ 
Mikaels-MacBook-Pro:pocl-build mikaelle$  cat tests/testsuite.dir/096/testsuite.log 
#                             -*- compilation -*-
96. testsuite-runtime.at:4: testing Host runtime tests ...
../../pocl-git/tests/testsuite-runtime.at:6: $abs_top_builddir/tests/runtime/test_clGetDeviceInfo
../../pocl-git/tests/testsuite-runtime.at:7: $abs_top_builddir/tests/runtime/test_clEnqueueNativeKernel
../../pocl-git/tests/testsuite-runtime.at:8: $abs_top_builddir/tests/runtime/test_clGetEventInfo
../../pocl-git/tests/testsuite-runtime.at:9: $abs_top_builddir/tests/runtime/test_clCreateProgramWithBinary
../../pocl-git/tests/testsuite-runtime.at:11: cd $abs_top_srcdir/tests/runtime/; $abs_top_builddir/tests/runtime/test_clBuildProgram
--- /dev/null   2014-01-14 15:06:41.000000000 +0200
+++ /Users/mikaelle/Projects/Vincit/webcl/pocl-testing/pocl-build/tests/testsuite.dir/at-groups/96/stderr   2014-01-14 15:06:45.000000000 +0200
@@ -0,0 +1,4 @@
+/tmp/poclIEzav6//program.cl:2:10: fatal error: 'test_kernel_src_in_pwd.h' file not found
+#include "test_kernel_src_in_pwd.h"
+         ^
+1 error generated.
../../pocl-git/tests/testsuite-runtime.at:11: exit code was 1, expected 0
96. testsuite-runtime.at:4: 96. Host runtime tests (testsuite-runtime.at:4): FAILED (testsuite-runtime.at:11)
Mikaels-MacBook-Pro:pocl-build mikaelle$ 

I had configured pocl like this:

mkdir pocl-install
mkdir pocl-build
git clone --single-branch https://github.com/pocl/pocl.git -b release_0_9 pocl-git
cd pocl-git
./autogen.sh
cd ../pocl-build
../pocl-git/configure --prefix=$PWD/../pocl-install --disable-icd && make -j8 && make check

shuffle tests taking a lot of time in the test suite

Isn't it a bit overkill to spend 10-15+ minutes testing a couple of built-in functions in the test suite? Perhaps only a few edge cases should be tested, or at least multiple tests could be combined into a single test kernel?

saturating arithmetic errors on x86_64 (due to AVX2?)

Tests 5 and 11 fail unexpectedly on one x86_64 Ubuntu 12.04 installation. The only difference from several similar, working installations that I can think of is that this machine is a Haswell, i.e. it has AVX2 instructions.
Tested with LLVM 3.3 only.
Marked as XFAIL in 2150cd1.

clpeak fails on x86_64

clpeak fails for me on x86_64:

$ ./clpeak

Platform: Portable Computing Language
  Device: pthread-Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
    Driver version  : 0.9 (Linux x64)
    Compute units   : 4
    Clock frequency : 3400 MHz

    Global memory bandwidth (GBPS)
      float   : 8.79
      float2  : 10.33
      float4  : 11.40
      float8  : 11.18
      float16 : 15.41

    Single-precision compute (GFLOPS)
      float   : 2.73
      float2  : 3.76
      float4  : 11.95
      float8  : 2.86
      float16 : 1.51

    Double-precision compute (GFLOPS)
      double   : 3.02
      double2  : 5.84
      double4  : 1.16
      double8  : Segmentation fault (core dumped)

OpenGL(mesa)+OpenCL applications segfault due to multiple LLVMs

This is one of the problems already recorded in issue #44 and partially fixed in d86c335.

When an application links against mesa and pocl (even via ICD), and the mesa backend used is LLVMpipe, mesa will jump to the LLVM symbols introduced when LLVM was linked to pocl.

Commit d86c335 added the -Bsymbolic flag to the linker (on GNU systems). This fixed the inverse issue, i.e. libpocl jumping to the LLVM symbols found in mesa.

Sample backtrace:
#0 0x00007ffff6079475 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff607c6f0 in *__GI_abort () at abort.c:92
#2 0x00007ffff6072621 in *__GI___assert_fail (assertion=0x7ffff1923ca0 "findOption(Name) == Values.size() && "Option already exists!"", file=, line=675, function=0x7ffff1f77660 "void llvm::cl::parser::addLiteralOption(const char_, const DT&, const char_) [with DT = llvm::ScheduleDAGSDNodes* ()(llvm::SelectionDAGISel, llvm::CodeGenOpt::Level); DataType = llvm::Sche"...) at assert.c:81
#3 0x00007ffff07cab7c in void llvm::cl::parser<llvm::ScheduleDAGSDNodes* ()(llvm::SelectionDAGISel, llvm::CodeGenOpt::Level)>::addLiteralOption<llvm::ScheduleDAGSDNodes* (*)(llvm::SelectionDAGISel_, llvm::CodeGenOpt::Level)>(char const_, llvm::ScheduleDAGSDNodes* (* const&)(llvm::SelectionDAGISel_, llvm::CodeGenOpt::Level), char const_) clone .part.609 from /opt/llvm/3.4//lib/libLLVM-3.4.so
#4 0x00007ffff159618e in llvm::RegisterPassParserllvm::RegisterScheduler::NotifyAdd(char const_, void_ ()(), char const) () from /opt/llvm/3.4//lib/libLLVM-3.4.so
#5 0x00007fffeeff8236 in llvm::MachinePassRegistry::Remove(llvm::MachinePassRegistryNode*) () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so

printf() broken for multi-AS machines

The current printf() assumes all address spaces map to the same default one. This fails with multi-AS machines like TCE. It manifests itself as a crash in TargetAddressSpaces where it fails to convert the bitcast from the AS0 cl_printf to AS3 cl_printf.

The proper fix for this would be to provide a full printf implementation in pocl that takes its format string from the __constant address space, as the OpenCL specification requires. There are some open implementations with permissive licenses which we might be able to use as a basis: http://www.sparetimelabs.com/tinyprintf/tinyprintf.php

This problem started to appear with LLVM 3.4 which places the constant strings to the __constant address space, thus the libc printf() does not by default work anymore.

missing symbols ilogb, ldexp in vecmathlib

The following kernel fails to link when pocl has vecmathlib enabled, works when it is disabled:

kernel void test_ilogb()
{
    float f=1.0f;
    printf("ldexp: %f\n", ldexp(f,3));
    printf("ilogb: %d\n", ilogb(f)); /* ilogb returns an int */
}
$ ./tests/kernel/kernel test_ilogb
Running test test_ilogb...
/home/kraiskil/opencl/pocl/build/tests/kernel/.libs/lt-kernel: symbol lookup error: /tmp/pocliXT1H1/pthread/test_ilogb/1-1-1.0-0-0/parallel.so: undefined symbol: _Z6ldexp_fi

get_image_dim(image2d_t img) builtin seems to be missing

mikaels-mbp:build-3.4-makefiles mikaelle$ bin/kernel-runner --kernel test --image 10 10  < image_dim_test.cl 
Platform: Portable Computing Language
Device pthread, version OpenCL 1.2 pocl
/tmp/pocl8v0D3H//program.cl:3:17: warning: implicit declaration of function 'get_image_dim' is invalid in C99
    int2 size = get_image_dim(arg0);
                ^
1 warning generated.
clSetKernelArg failed for cl_mem (image).
Undefined symbols for architecture x86_64:
  "_get_image_dim", referenced from:
      __test in parallel.so.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Assertion failed: (error == 0), function llvm_codegen, file ../../../../pocl-git/lib/CL/devices/common.c, line 116.
Abort trap: 6

convert_uint(float) implementation defined

ARM fails test_convert_type checks with lines like:
FAIL: convert_uint(float) - sample#: 2 element#: 0 original: -1.5 expected: 0xffffffff actual: 00000000
(e.g. http://tce.cs.tut.fi:8010/builders/ARM-wheezy-3.3/builds/2/steps/checks/logs/test.log )

i.e. convert_uint(-1.5f)
OpenCL-1.2, section 6.2.3.3
"Out of Range Behavior and Saturated Conversions" says "When converting from a floating point type to integer type, the behavior is implementation defined."

Is there any reason to try to shoe-horn LLVM to consistent behaviour
or should we just remove any tests whose results are implementation defined?
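
For comparison, the saturated variant of the conversion is fully specified: out-of-range values clamp to the nearest representable value and NaN converts to 0. The sketch below models that behaviour in plain C. `sketch_convert_uint_sat` is a hypothetical host-side reference, not a pocl function. The unguarded C cast `(unsigned)-1.5f` is undefined behaviour, which is consistent with ARM and x86 giving different answers in the non-saturated test.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of convert_uint_sat(float): clamp to [0, UINT32_MAX],
   map NaN to 0, and round toward zero inside the range. */
static uint32_t sketch_convert_uint_sat(float x) {
  if (!(x > 0.0f))          /* negative values, zero and NaN */
    return 0u;
  if (x >= 4294967296.0f)   /* 2^32 and above, including +infinity */
    return UINT32_MAX;
  return (uint32_t)x;       /* in range: the C cast truncates toward zero */
}

/* Usage: the cases relevant to the failing test sample. */
static void sketch_convert_examples(void) {
  assert(sketch_convert_uint_sat(-1.5f) == 0u);
  assert(sketch_convert_uint_sat(1.5f) == 1u);
  assert(sketch_convert_uint_sat(5e9f) == UINT32_MAX);
}
```

If the test suite only compared results for in-range inputs, or used the `_sat` conversions for out-of-range samples, it would not depend on implementation-defined behaviour.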
