Coder Social home page Coder Social logo

trisycl / sycl Goto Github PK

View Code? Open in Web Editor NEW
106.0 22.0 19.0 1.4 GB

SYCL for Vitis: Experimental fusion of triSYCL with Intel SYCL oneAPI DPC++ up-streaming effort into Clang/LLVM

License: Other

xilinx-fpga xilinx-vitis trisycl oneapi-dpc sycl-compilation accelerators clang llvm dpc-toolchain cpp20

sycl's Introduction

triSYCL

1   ACAP++: C++ extensions for AMD Versal ACAP AIE1 architecture

See tests/acap for some code samples.

Look at doc/acap.rst to know more about how to install/use the ACAP++ environment.

2   Introduction

triSYCL is a research project to experiment with the specification of the SYCL standard and to give feedback to the Khronos Group SYCL_committee and also to the ISO C++ committee.

Because of lack of resources this SYCL implementation is very incomplete and should not be used by a normal end-user. Fortunately there are now many other implementations of SYCL available, including some strong implementations like ComputeCpp, DPC++ or hipSYCL that can be used on various targets.

This implementation is mainly based on C++23 features backed with OpenMP or TBB for parallel execution on the CPU, with Boost.Compute for the non single-source OpenCL interoperability layer and with an experimental LLVM/Clang version for the device compiler (from 2017-2018 which is now obsolete) providing full single-source SYCL experience, typically targeting a SPIR device. Since in SYCL there is a host fall-back, this CPU implementation can be seen as an implementation of this fall-back too.

Since around 2018 Intel has put a lot of effort in their own oneAPI DPC++ SYCL project to up-stream SYCL into LLVM/Clang, there is another project about merging the oneAPI DPC++ SYCL implementation with triSYCL at https://github.com/triSYCL/sycl to give a greater user experience for Xilinx FPGA instead of using our obsolete experimental clunky device compiler. But this is still very experimental because the Xilinx tool-chain is based on old incompatible versions of LLVM/Clang and nothing of these is supported by the Xilinx product teams.

Most of our efforts are focused on extensions, such as targeting Xilinx FPGA and Versal ACAP CGRA with internal developments on https://gitenterprise.xilinx.com/rkeryell/acappp.

triSYCL has been used to experiment and provide feedback for SYCL 1.2, 1.2.1, 2.2, 2020 and even the OpenCL C++ 1.0 kernel language from OpenCL 2.2.

This is provided as is, without any warranty, with the same license as LLVM/Clang.

Technical lead: Ronan at keryell point FR. Developments started first at AMD, then was mainly funded by Xilinx and now again by AMD since Xilinx has been bought by AMD in 2022.

It is possible to have a paid internship around triSYCL, if you have some skills related to this project. Contact the technical lead about this. AMD is also hiring in this area... :-)

3   SYCL

SYCL is a single-source modern C++-based DSEL (Domain Specific Embedded Language) and open standard from Khronos aimed at facilitating the programming of heterogeneous accelerators by leveraging existing concepts inspired by OpenCL, CUDA, C++AMP, OpenMP...

A typical kernel with its launch looks like this pure modern C++ code:

queue {}.submit([&](handler &h) {
    auto accA = bufA.get_access<access::mode::read>(h);
    auto accB = bufB.get_access<access::mode::write>(h);
    h.parallel_for<class myKernel>(myRange, [=](item i) {
        accA[i] = accB[i] + 1;
    });
});

Look for example at https://github.com/triSYCL/triSYCL/blob/master/tests/examples/demo_parallel_matrix_add.cpp for a complete example.

SYCL is developed inside the Khronos SYCL committee and thus, for more information on SYCL, look at https://www.khronos.org/sycl

Note that even if the concepts behind SYCL are inspired by OpenCL concepts, the SYCL programming model is a very general asynchronous task graph model for heterogeneous computing targeting various frameworks and API and has no relation with OpenCL itself, except when using the OpenCL API interoperability mode, like any other target.

For the SYCL ecosystem, look at https://sycl.tech

4   Documentation

4.1   Some reasons to use SYCL

Please see about SYCL to have some context, a list of presentations, some related projects.

4.2   Installation & testing

SYCL is a template library, so no real installation is required.

There are some examples you can build however.

See Testing.

4.3   Architecture of triSYCL runtime and compiler

Architecture of triSYCL runtime and compiler describes the code base with some high-level diagrams but also how it was possible to compile and use the obsolete device compiler on some Xilinx FPGA for example. Now look at https://github.com/triSYCL/sycl instead.

4.4   CMake infrastructure

Some details about CMake configuration and organization can be found in CMake.

4.5   Pre-processor macros used in triSYCL

Yes, there are some macros used in triSYCL! Look at Pre-processor macros used in triSYCL to discover some of them.

4.6   Environment variables used in triSYCL

See Environment variables with triSYCL.

4.7   Possible futures

See Possible futures.

4.8   triSYCL code documentation

The documentation of the triSYCL implementation itself can be found in https://trisycl.github.io/triSYCL/Doxygen/triSYCL/html and https://trisycl.github.io/triSYCL/Doxygen/triSYCL/triSYCL-implementation-refman.pdf

There are also some internal documentation at https://pages.gitenterprise.xilinx.com/rkeryell/acappp/Doxygen/acappp/html

5   News

  • 2023/06/09: merge the 5-year old branch experimenting with ACAP++ SYCL CPU model extensions for AMD Versal ACAP AIE1 CGRA like the XCVC1902 used in VCK190 or VCK5000 boards.
  • 2018/03/12: the long-going device compiler branch has been merged in to provide experimental support for SPIR-df friendly devices, such as PoCL or Xilinx FPGA. This is only for the brave for now.
  • 2018/02/01: there is now some documentation about the architecture of triSYCL on GPU and accelerators with its device compiler based on Clang/LLVM in doc/architecture.rst. While this is wildly experimental, there is a growing interest around it and it is always useful to get started as a contributor.
  • 2018/01/05: there are some internship openings at Xilinx to work on triSYCL for FPGA https://xilinx.referrals.selectminds.com/jobs/compiler-engineer-intern-on-sycl-for-fpga-4685 and more generally Xilinx is hiring in compilation, runtime, C++, SYCL, OpenCL, machine-learning...
  • 2017/12/06: the brand-new SYCL 1.2.1 specification is out and triSYCL starts moving to it
  • 2017/11/17: the presentations and videos from SC17 on SYCL and triSYCL are now online https://www.khronos.org/news/events/supercomputing-2017
  • 2017/09/19: there is a prototype of device compiler based on Clang/LLVM generating SPIR 2.0 "de facto" (SPIR-df) and working at least with PoCL and Xilinx SDx xocc for FPGA.
  • 2017/03/03: triSYCL can use CMake & ctest and works on Windows 10 with Visual Studio 2017. It works also with Ubuntu WSL on Windows. :-) More info
  • 2017/01/12: Add test case using the Xilinx compiler for FPGA
  • 2016/11/18: If you missed the free SYCL T-shirt on the Khronos booth during SC16, you can always buy some on https://teespring.com/khronos-hpc (lady's sizes available, so no excuse! :-) )
  • 2016/08/12: OpenCL kernels can be run with OpenCL kernel interoperability mode now.
  • 2016/04/18: SYCL 2.2 provisional specification is out. This version implement SYCL 2.2 pipes and reservations plus the blocking pipe extension from Xilinx.

sycl's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sycl's Issues

llvm/sycl/test/xocc_tests/disabled/edge_detection example does not work

The example stated in sycl/doc/GettingStartedAlveo.md:

  • sw_emu:
    It compiles and runs (indefinitely?) while displaying

    EXE: /home/rkeryell/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection/edge_detection
    [XRT] WARNING: unaligned host pointer '0x7fe590429040' detected, this leads to extra memcpy
    /opt/xilinx/Vitis/2020.1/data/emulation/unified/cpu_em/generic_pcie/model/genericpciemodel: symbol lookup error: /home/rkeryell/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection/.run/907981/sw_emu/device0/binary_0/dltmp: undefined symbol: llvm.sideeffect
    

    Is there an issue because the old internal LLVM chokes on newer llvm.sideeffect?

  • hw_emu:

    INFO: [v++ 60-585] Compiling for hardware emulation target
    INFO: [v++ 60-423]   Target device: xilinx_u200_xdma_201830_2
    INFO: [v++ 60-242] Creating kernel: 'xSYCL13730816572338433693'
    ERROR: [v++ 60-300] Failed to build kernel(ip) xSYCL13730816572338433693, see log for details: /home/rkeryell/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection/_x/sycl-xocc.Nytewv_xSYCL13730816572338433693/xSYCL13730816572338433693/vitis_hls.log
    ERROR: [v++ 60-599] Kernel compilation failed to complete
    ERROR: [v++ 60-592] Failed to finish compilation
    INFO: [v++ 60-1653] Closing dispatch client.
    clang-12: error: sycl-link-xocc command failed with exit code 1 (use -v to see invocation)
    
  • hw:

    INFO: [v++ 60-242] Creating kernel: 'xSYCL13730816572338433693'
    ERROR: [v++ 60-300] Failed to build kernel(ip) xSYCL13730816572338433693, see log for details: /home/rkeryell/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection/_x/sycl-xocc.03eRpw_xSYCL13730816572338433693/xSYCL13730816572338433693/vitis_hls.log
    ERROR: [v++ 60-599] Kernel compilation failed to complete
    ERROR: [v++ 60-592] Failed to finish compilation
    INFO: [v++ 60-1653] Closing dispatch client.
    clang-12: error: sycl-link-xocc command failed with exit code 1 (use -v to see invocation)
    rkeryell@rk-xsj:~/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection (sycl/unified/next)$ m /home/rkeryell/sycl_workspace/llvm/sycl/test/xocc_tests/disabled/edge_detection/_x/sycl-xocc.03eRpw_xSYCL13730816572338433693/xSYCL13730816572338433693/vitis_hls.log
    

Compiler crash with named function object

I am trying to compile the attached code (simple N-body) with the current versions of the compiler (up-to-date sycl/unified/master) and SDAccel (2018.3) and a self-compiled XRT (up-to-date 2018.3), OS is Ubuntu 18.10. After starting the compilation the compiler crashed almost immediately with the following message:

$ clang++ -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice nbody.cpp -o nbody.sw_emu -lOpenCL
Stack dump:
0.	Program arguments: /home/jan/software/sycl/bin/opt -O3 -asfix -globaldce -inSPIRation -globaldce -kernelNameGen /tmp/nbody-4bd090.o -o /tmp/nbody_kernels-optimized.bc 
1.	Running pass 'ASFixer' on module '/tmp/nbody-4bd090.o'.
 #0 0x0000562b936c72ca llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/jan/software/sycl/bin/opt+0x20932ca)
 #1 0x0000562b936c5194 llvm::sys::RunSignalHandlers() (/home/jan/software/sycl/bin/opt+0x2091194)
 #2 0x0000562b936c5315 SignalHandler(int) (/home/jan/software/sycl/bin/opt+0x2091315)
 #3 0x00007f7f8a12fdd0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12dd0)
 #4 0x0000562b930e1fa3 llvm::Value::getContext() const (/home/jan/software/sycl/bin/opt+0x1aadfa3)
 #5 0x0000562b930e395a llvm::ValueHandleBase::AddToUseList() (/home/jan/software/sycl/bin/opt+0x1aaf95a)
 #6 0x0000562b93701113 llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >::operator[](llvm::Value const* const&) (.isra.418) (/home/jan/software/sycl/bin/opt+0x20cd113)
 #7 0x0000562b937048bc llvm::CloneFunctionInto(llvm::Function*, llvm::Function const*, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, bool, llvm::SmallVectorImpl<llvm::ReturnInst*>&, char const*, llvm::ClonedCodeInfo*, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) (/home/jan/software/sycl/bin/opt+0x20d08bc)
 #8 0x0000562b9391d1fb (anonymous namespace)::createNewFunction(llvm::Function*, llvm::FunctionType*, llvm::DenseMap<llvm::Value*, llvm::Value*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::Value*> >&, llvm::DenseMap<llvm::User*, llvm::SmallVector<std::pair<unsigned int, llvm::Value*>, 32u>, llvm::DenseMapInfo<llvm::User*>, llvm::detail::DenseMapPair<llvm::User*, llvm::SmallVector<std::pair<unsigned int, llvm::Value*>, 32u> > >&) (/home/jan/software/sycl/bin/opt+0x22e91fb)
 #9 0x0000562b93920637 (anonymous namespace)::doReplace(llvm::DenseMap<llvm::Value*, llvm::Value*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::Value*> >&, llvm::DenseMap<llvm::User*, llvm::SmallVector<std::pair<unsigned int, llvm::Value*>, 32u>, llvm::DenseMapInfo<llvm::User*>, llvm::detail::DenseMapPair<llvm::User*, llvm::SmallVector<std::pair<unsigned int, llvm::Value*>, 32u> > >&, llvm::DenseMap<llvm::Function*, llvm::FunctionType*, llvm::DenseMapInfo<llvm::Function*>, llvm::detail::DenseMapPair<llvm::Function*, llvm::FunctionType*> >&) (/home/jan/software/sycl/bin/opt+0x22ec637)
#10 0x0000562b93923ed1 (anonymous namespace)::ASFixer::runOnModule(llvm::Module&) (/home/jan/software/sycl/bin/opt+0x22efed1)
#11 0x0000562b9308fa02 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jan/software/sycl/bin/opt+0x1a5ba02)
#12 0x0000562b91c4e70f main (/home/jan/software/sycl/bin/opt+0x61a70f)
#13 0x00007f7f897d509b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:308:16
#14 0x0000562b91cdc51a _start (/home/jan/software/sycl/bin/opt+0x6a851a)
/home/jan/software/sycl/bin/sycl-xocc: line 76:  9118 Segmentation fault      (core dumped) $OPT -O3 -asfix -globaldce -inSPIRation -globaldce -kernelNameGen "$4" -o "$5/$3_kernels-optimized.bc"
/home/jan/software/sycl/bin/llvm-link: /tmp/nbody_kernels-optimized.bc: error: Could not open input file: No such file or directory
/home/jan/software/sycl/bin/llvm-link: error:  loading file '/tmp/nbody_kernels-optimized.bc'
/home/jan/software/sycl/bin/sycl-xocc: line 87: /tmp/KernelNames_nbody.txt: No such file or directory
ERROR: No input file specified.
Allowed options:
[...]
/usr/bin/ld: /tmp/nbody-d900a3.o: file not recognized: file truncated
clang-9: error: sycl-link-xocc command failed with exit code 1 (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

I suppose something is still wrong with my code but I assume that a compiler crash isn't the intended behaviour here ;-)
nbody.zip

[SYCL][Driver][sycl-xocc] Allow filtering of Clang arguments down to xocc compile and link stages

For example when a user adds -g or -O3 for debugging or optimization this should be filtered down through the xocc ToolChain to the sycl-xocc script and passed to the xocc compiler.

There are possibly other examples of arguments that Clang and xocc share that should be passed down as well. And perhaps allowing a user to specify certain xocc or sycl-xocc only arguments would also be useful in someway.

Requires a second invocation of make to build sycl-xocc

Currently need to invoke make a second time after the initial compile described in the GetStartedWithSYCLCompiler.md for the sycl-xocc script to be properly "built" into the build/bin directory. Unsure why at the moment, seems to be required for other sycl related things like sycl-check as well.

std::make_tuple inside a lambda does not compile

Problem when compiling a call to std::make_tuple inside a lambda in a kernel.

The following code don't compile :

cgh.single_task<class array_add>([=]() {
      auto f = [&] (auto i) {buff_w[0] = std::get<0>(std::make_tuple(0));};
      f(0);
 });

and xocc emits the following error message :

ERROR: [XOCC 200-41] in function 'xSYCL12199936647150665982': unsupported pointer reinterpretation from type 'i64*' to type 'i32*'.
ERROR: [XOCC 200-71] in function 'xSYCL12199936647150665982': function 'FORWARD_REFERENCE&& std::forward<int>(std::remove_reference<FORWARD_REFERENCE>::type&)' has no function body.
ERROR: [XOCC 200-71] in function 'xSYCL12199936647150665982': function 'std::tuple_element<FORWARD_REFERENCE, std::tuple<FORWARD_REFERENCE' has no function body.
ERROR: [XOCC 200-71] in function 'xSYCL12199936647150665982': function 'FORWARD_REFERENCE&& std::forward<int&&>(std::remove_reference<FORWARD_REFERENCE>::type&)' has no function body.
ERROR: [XOCC 200-70] Synthesizability check failed.
ERROR: [XOCC 60-300] Failed to build kernel(ip) xSYCL12199936647150665982, see log for details: /var/tmp/victorl/result/issue_tuple_lambda/hw/2019-07-24_14_10/_x/xSYCL12199936647150665982/xSYCL12199936647150665982/vivado_hls.log
ERROR: [XOCC 60-599] Kernel compilation failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation

The following code is compiling and working without any issue :

cgh.single_task<class array_add>([=]() {
      auto f = [&] () {buff_w[0] = std::get<0>(std::make_tuple(0));};
      f();
});

Full compilation fails

All the full compilation of sycl/unified/next, Ralender/Merge2 and sycl/unified/next+Ralender/Merge2 fail with various mileage.
@Ralender Could you first merge sycl/unified/next into your latest Ralender/Merge2 and test the following compilation recipe to be sure we are not breaking the original big picture?

# Pick some place where SYCL has to be compiled:
export SYCL_HOME=~/sycl_workspace
mkdir $SYCL_HOME
cd $SYCL_HOME
git clone --branch sycl/unified/next [email protected]:triSYCL/sycl.git llvm
python $SYCL_HOME/llvm/buildbot/configure.py
python $SYCL_HOME/llvm/buildbot/compile.py

[SYCL] Common functionality header for InSPIRation and SYCL runtime

At the moment the InSPIRation LLVM pass and SYCL runtime share some common code that should be refactored into a shared header that can be reused across the LLVM pass and runtime. This will make it easier to maintain long term.

Current Examples are:
(0) program_manager.cpp: static std::string getUniqueName(const char *KernelName)
(0) InSPIRation.cpp: void setUniqueName(Function &F)
(1) kernel_properties.hpp: static std::vector<size_t> get_reqd_work_group_size(std::string mangledKernelName)
(1) InSPIRation.cpp: SmallVector<llvm::Metadata *, 8> getReqdWorkGroupSize(const std::string& demangledName, LLVMContext &Ctx)

Another option would be to rework these components in the future so that the LLVM Pass and runtime are less intrinsically linked.

Related to discussion in: #19

[SYCL] SDAccel Example Ports

Port some further SDAccel Examples to SYCL Examples such as the smithwaterman example and wip median_filter.

This helps us find issues in the current infrastructure and showcase what SYCL can do versus a traditional OpenCL model.

Simple xocc test fails to compile in hw_emu mode

I followed the build instructions in GetStartedWithSYCLCompiler.md and tried to compile the test case mentioned in XilinxFPGACompilation.md. This produces the following error message:

$ /home/ubuntu/software/sycl/compiler/bin/clang++ -std=c++2a -fsycl -fsycl-xocc-device \ 
  single_task_vector_add.cpp -o single_task_vector_add -lOpenCL
In file included from single_task_vector_add.cpp:6:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl.hpp:11:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:14:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:10:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl.hpp:15:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/detail/queue_impl.hpp:12:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/detail/scheduler/scheduler.h:15:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/device.hpp:12:
In file included from /home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/CL/sycl/detail/device_impl.hpp:13:
/home/ubuntu/software/sycl/compiler/lib/clang/9.0.0/include/sycl_wrappers/algorithm:498:47: error: non-constexpr declaration of 'min_element'
      follows constexpr declaration
_LIBCPP_CONSTEXPR_AFTER_CXX11 ForwardIterator min_element(ForwardIterator first,
                                              ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_algo.h:5610:12: note: previous declaration is here
    inline min_element(_ForwardIterator __first, _ForwardIterator __last)

Similar messages are following for lines 502:47, 507:40, 510:40, 513:52, 516:33, 527:47, 531:47, 536:40, 539:40, 542:52, 545:33, 549:1, 553:1, 556:58, 561:1, 564:42, and 567:42 in sycl_wrappers/algorithm.

The system is a fresh Ubuntu 18.10 installation with SDAccel 2018.3 and a self-compiled XRT (2018.3 branch). I'm using the sycl/unified/master branch which is up-to-date as of writing this issue.

[v++] v++ seems to infinite loop

This seems to be a bug in hls not triSYCL

v++ from 2020.1 with this command seems to infinte loop.
v++ --target sw_emu --platform xilinx_u200_xdma_201830_2 --xp param:compiler.hlsDataflowStrictMode=off --save-temps -c -k xSYCL17601996295371491662 -o out.xo tmp.xpirbc
tmp.xpirbc is inside this archive tmp.zip

[SYCL] xclbinutil crashes with segmentation fault in sw_emu mode

I thought I'd give the n-body example from #28 another shot and see how things improved. This caused a segmentation fault in xclbinutil during linking in sw_emu mode. Versions used: up-to-date master (c3c6023) for the compiler, up-to-date master for XRT (Xilinx/XRT@feb6849), SDAccel 2019.1.1, Ubuntu 19.04.

Environment:

$: echo $XCL_EMULATION_MODE
sw_emu
$: echo $XILINX_PLATFORM
xilinx_u200_xdma_201830_2
$: apt search xdma
xilinx-u200-xdma/now 201830.2-2580015 all [installed,local]
xilinx-u200-xdma-dev/now 201830.2-2580015 amd64 [installed,local]

Command line:

clang++ -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice nbody.cpp -o nbody.sw_emu -lOpenCL -I/opt/xilinx/xrt/include

Error message:

[previous messages snipped]
****** xocc v2019.1.1 (64-bit)
  **** SW Build 2580384 on Sat Jun 29 08:04:45 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl
INFO: [XOCC 60-1306] Additional information associated with this xocc link can be found at:
	Reports: /home/jan/workspace/sycl-nbody/_x/reports/link
	Log files: /home/jan/workspace/sycl-nbody/_x/logs/link
INFO: [XOCC 60-629] Linking for software emulation target
INFO: [XOCC 60-1316] Initiating connection to rulecheck server, at Thu Aug 15 15:55:15 2019
Running Rule Check Server on port:43061
INFO: [XOCC 60-1315] Creating rulecheck session with output '/home/jan/workspace/sycl-nbody/_x/reports/link/xocc_link_nbody-0e0111_guidance.html', at Thu Aug 15 15:55:16 2019
INFO: [XOCC 60-895]   Target platform: /opt/xilinx/platforms/xilinx_u200_xdma_201830_2/xilinx_u200_xdma_201830_2.xpfm
INFO: [XOCC 60-423]   Target device: xilinx_u200_xdma_201830_2
ERROR: [XOCC 60-399] xclbinutil failed, please see log file for detail: '/home/jan/workspace/sycl-nbody/_x/link/int/nbody-0e0111_xclbinutil.log'
ERROR: [XOCC 60-626] Kernel link failed to complete
ERROR: [XOCC 60-703] Failed to finish linking
/home/jan/software/sycl/xilinx/bin/sycl-xocc: line 179: 23871 Segmentation fault      (core dumped) $XILINX_XRT/bin/xclbinutil --info --input "$OUTPUT_FILE_NAME" &> "$OUTPUT_FILE_NAME.xclbinutil.dump"
/usr/bin/ld: /tmp/nbody-3310d8.o: file not recognized: file truncated
clang-9: error: sycl-link-xocc command failed with exit code 139 (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

Unfortunately the log file mentioned in the output is present but empty.

[SYCL] Invalid record when comparing reference to value

Example code below. This bug isn't present in the Intel compiler (or has been fixed already). After the initial error occurs compilation resumes and produces an executable which doesn't crash.

Command line and subsequent output:

$ LC_ALL="C" clang++ -O3 -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice main.cpp -o invalid_record -lOpenCL -I/opt/xilinx/xrt/include

/home/jan/software/sycl/xilinx/bin/opt: /tmp/main-bee693.o: error: Invalid record (Producer: 'LLVM9.0.0svn' Reader: 'LLVM 9.0.0svn')
warning: Linking two modules of different data layouts: '/opt/xilinx/SDx/2019.1/bin/../lnx64/lib/libspir64-39-hls.bc' is 'e-m:e-i64:64-i128:128-i256:256-i512:512-i1024:1024-i2048:2048-i4096:4096-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas 'llvm-link' is 'e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024'

warning: Linking two modules of different target triples: /opt/xilinx/SDx/2019.1/bin/../lnx64/lib/libspir64-39-hls.bc' is 'fpga64-xilinx-none' whereas 'llvm-link' is 'spir64'

Invoking xocc Kernel Compilation
xocc: /opt/xilinx/SDx/2019.1/bin/xocc
--target: sw_emu
--platform: xilinx_u200_xdma_201830_2
Compiling kernel: xSYCL9318603472913316893
Outputting file to: /tmp/xSYCL9318603472913316893.xo
Input file is: /tmp/main_kernels-linked.xpirbc
Option Map File Used: '/opt/xilinx/SDx/2019.1/data/sdx/xocc/optMap.xml'

****** xocc v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:47:09 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl
INFO: [XOCC 60-1306] Additional information associated with this xocc compile can be found at:
	Reports: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/reports/xSYCL9318603472913316893
	Log files: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/logs/xSYCL9318603472913316893
INFO: [XOCC 60-585] Compiling for software emulation target
INFO: [XOCC 60-1316] Initiating connection to rulecheck server, at Fri Sep 20 17:24:53 2019
Running Rule Check Server on port:45325
INFO: [XOCC 60-1315] Creating rulecheck session with output '/home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/reports/xSYCL9318603472913316893/xocc_compile_xSYCL9318603472913316893_guidance.html', at Fri Sep 20 17:24:54 2019
INFO: [XOCC 60-895]   Target platform: /opt/xilinx/platforms/xilinx_u200_xdma_201830_2/xilinx_u200_xdma_201830_2.xpfm
INFO: [XOCC 60-423]   Target device: xilinx_u200_xdma_201830_2
INFO: [XOCC 60-242] Creating kernel: 'xSYCL9318603472913316893'
INFO: [XOCC 60-594] Finished kernel compilation
INFO: [XOCC 60-586] Created /tmp/xSYCL9318603472913316893.xo
INFO: [XOCC 60-791] Total elapsed time: 0h 0m 9s
Option Map File Used: '/opt/xilinx/SDx/2019.1/data/sdx/xocc/optMap.xml'

****** xocc v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:47:09 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl
INFO: [XOCC 60-1306] Additional information associated with this xocc link can be found at:
	Reports: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/reports/link
	Log files: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/logs/link
INFO: [XOCC 60-629] Linking for software emulation target
INFO: [XOCC 60-1316] Initiating connection to rulecheck server, at Fri Sep 20 17:25:04 2019
Running Rule Check Server on port:45591
INFO: [XOCC 60-1315] Creating rulecheck session with output '/home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/reports/link/xocc_link_main-261146_guidance.html', at Fri Sep 20 17:25:05 2019
INFO: [XOCC 60-895]   Target platform: /opt/xilinx/platforms/xilinx_u200_xdma_201830_2/xilinx_u200_xdma_201830_2.xpfm
INFO: [XOCC 60-423]   Target device: xilinx_u200_xdma_201830_2
INFO: [XOCC 60-586] Created /tmp/main-261146.out
INFO: [XOCC 60-1307] Run completed. Additional information can be found in:
	Guidance: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/reports/link/xocc_link_main-261146_guidance.html
	Steps Log File: /home/jan/workspace/sycl-bugreports/xilinx/invalid_record/_x/logs/link/link.steps.log

INFO: [XOCC 60-791] Total elapsed time: 0h 0m 14s

Example code:

#include <CL/sycl.hpp>

struct kernel;

auto cmp(std::size_t& large, std::size_t small)
{
    auto val = (large > small) ? small : large;
    return val;
}

auto main() -> int
{
    auto queue = cl::sycl::queue{};

    auto s_buf = cl::sycl::buffer<std::size_t, 3>{
                    cl::sycl::range<3>{42, 42, 42}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto s = s_buf.get_access<cl::sycl::access::mode::read_write>(cgh);

        cgh.single_task<kernel>([=]()
        {
            const auto id = cl::sycl::id<3>{0, 0, 0};
            s[id] = cmp(s[id], 40);
        });
                                                
    });
    queue.wait();

    return 0;
}

[SYCL] fpga_pipes.cpp pipe ambiguity error with unistd.h pipe

Seems to be something incorrect in the Clang/LLVM/libsycl.so build process or some incorrect system inclusion as part of the SYCL compilation process causing unistd.h to be added (could just be an out of order inclusion). This isn't a problem when using an unmodified Intel SYCL compiler (although technically were behind a few patches so maybe a follow up patch I don't recall fixed it).

It results in the following:

/storage/ogozillo/intel-sycl/sycl/build/lib/clang/9.0.0/include/CL/sycl/pipes.hpp:18:68: note: candidate found by name lookup is 'cl::sycl::pipe'
template <class name, class dataT, int32_t min_capacity = 0> class pipe {
^
fpga_pipes.cpp:141:11: error: reference to 'pipe' is ambiguous
pipe<class pipe_type_for_lambdas, int>::read(SuccessCode);
^
/usr/include/unistd.h:417:12: note: candidate found by name lookup is 'pipe'
extern int pipe (int __pipedes[2]) __THROW __wur;
^

Workaround is just to always be specific that you're using SYCL's pipe i.e. longhand cl::sycl::pipe.

[SYCL] Optionally compile all tests for Xilinx FPGA

Extend the existing SYCL test infrastructure to optionally compile and execute all of the current SYCL tests (other than tests with Intel specific extensions) for Xilinx FPGA via the xocc ToolChain when make check-all is invoked.

This should be made optional/opt-in in some way as compiling all of the tests for FPGA will take a significant amount of time. This would 1) allow us to continue using the build-bot in pull requests and 2) allow people to just test the basic SYCL runtime functionality without Xilinx alterations.

Allowing the ability to choose any composition of sw emulation/hw emulation/hardware emulation would be ideal, but to start with just sw emulation would also be acceptable.

The end goal here is to have an easy to way to check all the existing tests to see what's functioning and what isn't for Xilinx FPGA's and to have a simple way to check for regressions.

[SYCL] Tests that require fixes for SDAccel 2019.1.1 hw_emu

The following tests have difficulty compiling for hw_emu in SDAccel 2019.1.1 at the moment:
vector_math.cpp (ICE in xocc: "Bitcode for HLS" pass)
edge_detection.cpp (ICE in xocc: 'Function Pass Manager' and 'Global Value Numbering')

The following tests compile but have run time problems for hw_emu in SDAccel 2019.1.1 at the moment:
id_mangle.cpp (XTLM Error-007:: Data Pointer)
reqd_work_group_size.cpp (XTLM Error-007:: Data Pointer)

[SYCL][Runtime] Hierarchical Parallelism/parallel_for_workgroup port for Xilinx FPGA

Currently parallel_for_workgroup is implemented for intel devices, but will not work on Xilinx devices. It should hopefully be a very easy fix, but it's low priority for us at the time of writing this issue (the construct has limited value for FPGA).

For the most part it should just take some work to find SPIR equivalents to the SPIRV intrinsics inside of group (should just be barriers and fences): https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/include/CL/sycl/group.hpp#L36

And then implement the change in a way that will not break the Intel implementation, while allowing it to work for Xilinx FPGA. This can and should probably be done similarly to the existing SPIR SIMT intrinsics (get_global_id/get_global_size etc.).

Most of the existing legwork should be there, should just be a job of understanding how we do the existing spir-df intrinsics and learning to do the same for the barrier intrinsics! That's at the time of writing, this may change over time... and if you can see a better way of doing the intrinsics than currently exists, by all means feel free to improve it (we're not married to the current implementation, it's a means to an end!).

SYCL Vitis 2020.1 tests do not work out-of-the-box

I have updated the recipe to be independent from make b26fd4a
There are still some issues with the fact we compile a lot of stuff (like level-0) while injecting Xilinx code relying on C++20 but the global library is not compiled with C++20. Put C++20 in the library compilation flags?

[SYCL][sycl-xocc] Cleanup intermediate files sycl-xocc generates and places in /tmp/

Not the biggest problem and rather useful for debugging right now, but we shouldn't really be filling a users /tmp/ directory with garbage files and not cleaning them up!

This could be linked into #25 so that when a user passes --save-temps to Clang, we do not clean up our intermediate files. However, there was a bug with using --save-temps when compiling for Intel devices and there is an open Issue on the Intel/SYCL repository relating to this it. Although, it appears that when altering the driver to use our own ToolChain and fpga/xilinx triple it may have inadvertently bypassed it

[SYCL][Driver] Refactor xocc driver script back into the LLVM Driver/ToolChain

I'm not sure how interested you'd (@keryell) be in changing this, but I thought I'd open up an issue in any case as a general reminder that it can be done. Feel free to close it though.

It seems the Intel SYCL team had a similar problem to us in that the Clang driver doesn't really play well with unknown number of outputs from a tool step as it can't predict the future and the commands are set in stone before the commands are executed. So if your tools don't support File Lists (which xocc doesn't seem to for the moment) as an input you have a problem. So like you've noticed they made this tool: intel/llvm#793

I think you could probably use this to get rid of at least part of the xocc driver script if you ever wanted to. Part of the reason the script exists is because you don't know how many xocc compilation invocations you need to make until you've executed the optimization passes on the module (as you need to execute one per kernel) so you can't generate the right number of ConstructJob calls to xocc at the time the Clang Driver wants to know what jobs need to be done.

You could maybe circumvent that issue by generating a list of files from the Optimization passes (or another tool) and feeding it to a ConstructJob of Intel's new tool that will call an xocc invocation per item in the list of files (kernel names or paths to each kernel).

Refactoring the linker step out of the script and back into the Clang driver is likely trivial in comparison.

I think I previously mentioned something less well thought out than this, so it's nice to see a possible solution to it! Maybe there is a more trivial solution I'm overlooking though or I'm misunderstanding the tools purpose.

Perhaps a worthy avenue to look at if the Xilinx SYCL FPGA ToolChain ever became less research oriented and was looking at upstreaming as I imagine using a (hopefully) pre-existing tool would go down better than adding the dependency of a script.

[SYCL] Refactor Kernel Renaming

At the moment we translate kernel names to a hashed (SHA-1) representation during the InSPIRation pass and during execution of the SYCL runtime for a Xilinx device we unhash the name to its original name to correctly call the right kernel through the OpenCL runtime.

This little idiosyncrasy is to work around the fact that xocc requires kernel names passed to it for compilation (for looking up the kernel you wish compiled among other things) but dislikes these names having odd characters that are unfortunately present in normal C++ manglings e.g. _$.

It would be ideal to find a more optimal solution for this, working out what that is will most likely be the complex component of this task.

[SYCL] Support Split Device Compilation

Currently we only really support single step compilation of SYCL source code to end binary output. However, it's entirely possible to support two step compilation if we wish, this could allow generation of a SPIR binary or XCL binary (xocc output binary after compilation and linking).

I'm not sure how important the XCL binary use-case is, but the SPIR binary output is pretty useful e.g. SYCL offloading to POCL. It also lets you debug the SPIR output a little easier and run additional passes without needing to alter the compiler too much.

There was functionality in place for two step compilation originally in the SYCL up-streaming implementation e.g. in the Build and Test a simple SYCL program section: https://github.com/intel/llvm/blob/e7219557e61e03d59db89346cf43a53e103973ed/sycl/doc/GetStartedWithSYCLCompiler.md

It seems to still exist but it's no longer documented for the time being, so it may require some maintenance. The above device side step can be hooked into with an optional -fsycl-use-bitcode flag to generate LLVM-IR. Although, without the InSPIRation pass being ran on the output the LLVM-IR still contains SPIRV builtins.

To truly get some form of SPIR-df you would need to run the LLVM-IR through our InSPIRation pass at the moment. Having this done cleanly through a compiler invocation is the tricky part. With this final output it should be possible to use the SPIR-df output with a POCL runtime using the second command (I had some success with this in the past, although there are some issues with certain builtins at the moment: pocl/pocl#698 this should be tackled in a separate issue however)

XCL binary support is probably a different beast, as it requires you to invoke just the xocc component of the SYCL ToolChain without any additional offloading and wrapping steps into a final binary.

[SYCL] Experiment with Compile Time Regular Expressions in SYCL

Perhaps an interesting little blue skies research project for C++20. We rely a lot on regex's at the moment, so a very interesting C++ project for the runtime properties (and perhaps the compiler in some way) may be: https://github.com/hanickadot/compile-time-regular-expressions by Hana Dusíková

As we currently regex at runtime over the constexpr mangled kernel name to rip properties from it, doing this at compile time may offset a lot of the run-time impact that we incur.

[SYCL] What is the purpose of the reqd_work_group_size extension?

I noticed that there is a Xilinx-specific extension for reqd_work_group_size. I'm struggling to grasp the purpose of this extension since the SYCL specification already defines a function attribute in section 6.7. Is the extension simply for convenience or does it have some other implications I'm not aware of?

[SYCL] Compiler crashes with no input files

This is somewhat obscure. The command line

clang++ -fsycl foo.cpp -lbla

will produce a segmentation fault if the compiler can't find foo.cpp. This only occurs when trying to link a library, the command line without -lbla will just tell me that there is no foo.cpp.

Error message:

clang-9: error: no such file or directory: 'foo.cpp'                       
Stack dump:                                                                                                                                                                                                
0.      Program arguments: clang++ -fsycl foo.cpp -lbla
1.      Compilation construction
2.      Building compilation jobs                                                                                                                                                                          
3.      Building compilation jobs                                                                 
4.      Building compilation jobs
5.      Building compilation jobs
6.      Computing output path                   
 #0 0x00005610f535c83a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/jan/software/sycl/xilinx/bin/clang-9+0x257e83a)                                                                               
 #1 0x00005610f535a554 llvm::sys::RunSignalHandlers() (/home/jan/software/sycl/xilinx/bin/clang-9+0x257c554)
 #2 0x00005610f535a6d5 SignalHandler(int) (/home/jan/software/sycl/xilinx/bin/clang-9+0x257c6d5)
 #3 0x00007fb6ace00f40 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13f40)
 #4 0x00007fb6ac9ec0c7 /build/glibc-KRRWSm/glibc-2.29/string/../sysdeps/x86_64/multiarch/strlen-avx2.S:96:0
 #5 0x00005610f58c5c45 clang::driver::Driver::GetNamedOutputPath(clang::driver::Compilation&, clang::driver::JobAction const&, char const*, llvm::StringRef, bool, bool, llvm::StringRef) const (/home/jan/s
oftware/sycl/xilinx/bin/clang-9+0x2ae7c45)
 #6 0x00005610f58c9a50 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*
, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action co
nst*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_trait
s<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aeba50)
 #7 0x00005610f58ca8f7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::
map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, s
td::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>
, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aec8f7)
 #8 0x00005610f58c8a56 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*
, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action co
nst*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_trait
s<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aeaa56)
 #9 0x00005610f58ca8f7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::
map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, s
td::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>
, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aec8f7)
#10 0x00005610f58ccd34 void llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)>::callback_fn<clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Com
pilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::c
har_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >,
 std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Ac
tion::OffloadKind) const::'lambda0'(clang::driver::Action*, clang::driver::ToolChain const*, char const*)>(long, clang::driver::Action*, clang::driver::ToolChain const*, char const*) (/home/jan/software/s
ycl/xilinx/bin/clang-9+0x2aeed34)
#11 0x00005610f59c7c54 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const (/home/jan/softw
are/sycl/xilinx/bin/clang-9+0x2be9c54)
#12 0x00005610f58c824a clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*
, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action co
nst*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_trait
s<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aea24a)
#13 0x00005610f58ca8f7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::
map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, s
td::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>
, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aec8f7)
#14 0x00005610f58cae50 clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aece50)
#15 0x00005610f58cc6a5 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (/home/jan/software/sycl/xilinx/bin/clang-9+0x2aee6a5)
#16 0x00005610f391f467 main (/home/jan/software/sycl/xilinx/bin/clang-9+0xb41467)
#17 0x00007fb6ac88cb6b __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:342:3
#18 0x00005610f39a5aca _start (/home/jan/software/sycl/xilinx/bin/clang-9+0xbc7aca)
[1]    25391 segmentation fault (core dumped)  clang++ -fsycl foo.cpp -lbla

I'm on the latest commit of the master branch (commit c3c6023).

I don't have the Intel compiler available right now, so I can't confirm if this is specific to this compiler or is an upstream issue.

[SYCL] SYCL SLAMBench Port

Port the existing SLAMBench implementation over to see how feasible it is to run a large application with the compiler at the moment: https://github.com/agozillon/syclslambench

This should mostly be working out how to integrate the compiler appropriately with the existing CMake and syclcc script infrastructure. However, integrating it could unearth some problems with the existing compiler infrastructure or existing runtime bugs.

This should start with Integrating the Intel Compilation flow into SYCL SLAMBench and then adding the Xilinx xocc compilation flow to it. Doing it this way should hopefully make life a little easier.

sycl/build/bin/opt: error: expected top-level entity

I am currently trying to integrate the sycl/unified/next compiler into alpaka's (https://github.com/alpaka-group/alpaka) CMake infrastructure. Compiling a simple test case (axpy) works fine but once the link stage is reached it fails with the following error:

[100%] Linking CXX executable axpy
cd /home/jan/workspace/zoo/alpaka/build/test/integ/axpy && /snap/cmake/703/bin/cmake -E cmake_link_script CMakeFiles/axpy.dir/link.txt --verbose=1
/home/jan/sycl-workspace/sycl/build/bin/clang++ -O3 -DNDEBUG -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice CMakeFiles/axpy.dir/src/axpy.cpp.o -o axpy  ../../common/libcommon.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libOpenCL.so -lsycl ../../catch_main/libCatchMain.a 
++ pwd
+ CWD=/home/jan/workspace/zoo/alpaka/build/test/integ/axpy
+ export VERBOSE=1
+ VERBOSE=1
+ [[ -z /home/jan/software/xilinx/Vitis/2020.1/bin ]]
+ [[ -z /home/jan/sycl-workspace/sycl/build/bin ]]
+ [[ -z axpy.cpp-37c5af ]]
+ [[ -z /tmp/axpy.cpp-37c5af.txt ]]
+ [[ -z /tmp ]]
+ [[ -z /tmp/axpy-e9e831.bc ]]
+ [[ -z /tmp/sycl-xocc-args2712-cbcee1 ]]
+ [[ -z /tmp/sycl-xocc-args2716-8d3ef5 ]]
+ [[ -z sw_emu ]]
+ SDX_BIN_PATH_DIR=/home/jan/software/xilinx/Vitis/2020.1/bin
+ [[ ! -d /home/jan/software/xilinx/Vitis/2020.1/bin ]]
+ SDX_LIB_SPIR=/home/jan/software/xilinx/Vitis/2020.1/bin/../lnx64/lib/libspir64-39-hls.bc
+ [[ ! -f /home/jan/software/xilinx/Vitis/2020.1/bin/../lnx64/lib/libspir64-39-hls.bc ]]
+ SDX_CLANG_LLVM_BIN=/home/jan/software/xilinx/Vitis/2020.1/bin/../lnx64/tools/clang-3.9-csynth/bin
+ [[ ! -f /home/jan/software/xilinx/Vitis/2020.1/bin/../lnx64/tools/clang-3.9-csynth/bin/llvm-as ]]
+ SDX=/home/jan/software/xilinx/Vitis/2020.1/bin/v++
+ [[ ! -f /home/jan/software/xilinx/Vitis/2020.1/bin/v++ ]]
+ DRIVER_PATH_DIR=/home/jan/sycl-workspace/sycl/build/bin
+ [[ ! -d /home/jan/sycl-workspace/sycl/build/bin ]]
+ OPT=/home/jan/sycl-workspace/sycl/build/bin/opt
+ [[ ! -f /home/jan/sycl-workspace/sycl/build/bin/opt ]]
+ LLVM_LINK=/home/jan/sycl-workspace/sycl/build/bin/llvm-link
+ [[ ! -f /home/jan/sycl-workspace/sycl/build/bin/llvm-link ]]
+ [[ ! -d /tmp ]]
+ SOURCE_FILE_NAME=axpy.cpp-37c5af
+ INPUT_FILE_NAME=/tmp/axpy.cpp-37c5af.txt
++ mktemp /tmp/sycl-xocc.XXXXXX
+ TMP_PATH=/tmp/sycl-xocc.s4Ko3U
+ OUTPUT_FILE_NAME=/tmp/axpy-e9e831.bc
++ cat /tmp/sycl-xocc-args2712-cbcee1
+ ADDITIONAL_COMPILE_ARGS=
++ cat /tmp/sycl-xocc-args2716-8d3ef5
+ ADDITIONAL_LINK_ARGS=
+ XCL_EMULATION_MODE=sw_emu
+ KERNELPROP=/tmp/sycl-xocc.s4Ko3U_KernelProperties_axpy.cpp-37c5af.bash
+ cp /tmp/axpy.cpp-37c5af.txt /tmp/sycl-xocc.s4Ko3U_axpy.cpp-37c5af_kernels.bc
+ /home/jan/sycl-workspace/sycl/build/bin/opt -asfix -globaldce -O3 -globaldce --infer-address-spaces -flat-address-space=4 -globaldce -inSPIRation -globaldce -kernelPropGen --sycl-kernel-propgen-output /tmp/sycl-xocc.s4Ko3U_KernelProperties_axpy.cpp-37c5af.bash /tmp/axpy.cpp-37c5af.txt -o /tmp/sycl-xocc.s4Ko3U_axpy.cpp-37c5af_kernels-optimized.bc
/home/jan/sycl-workspace/sycl/build/bin/opt: /tmp/axpy.cpp-37c5af.txt:1:1: error: expected top-level entity
/tmp/axpy-4e596d-e13720.devo
^
clang-12: error: sycl-link-xocc command failed with exit code 1 (use -v to see invocation)
make[3]: *** [test/integ/axpy/CMakeFiles/axpy.dir/build.make:107: test/integ/axpy/axpy] Error 1
make[3]: Leaving directory '/home/jan/workspace/zoo/alpaka/build'
make[2]: *** [CMakeFiles/Makefile2:1710: test/integ/axpy/CMakeFiles/axpy.dir/all] Error 2
make[2]: Leaving directory '/home/jan/workspace/zoo/alpaka/build'
make[1]: *** [CMakeFiles/Makefile2:1717: test/integ/axpy/CMakeFiles/axpy.dir/rule] Error 2
make[1]: Leaving directory '/home/jan/workspace/zoo/alpaka/build'
make: *** [Makefile:610: axpy] Error 2

Is there anything I can do to prevent or further investigate this error?

Compiler crash in hw_emu mode during Aggressive Dead Code Elimination

This is not related to #22. I'm using the code from #28 with the following modifications:

  • all discard_write accesses are changed to write (see #31)
  • the call to cl::sycl::rsqrt in line 68 is changed to 1.f / cl::sycl::sqrt as there seems to be a missing symbol for the Alveo U200 emulation:
/tools/Xilinx/SDx/2018.3/data/emulation/unified/cpu_em/generic_pcie/model/genericpciemodel: symbol lookup error: /home/jstephan/Downloads/.run/8220/sw_emu/device0/binary_1/dltmp: undefined symbol: rsqrtf

With the current compiler and OpenCL runtime I am able to successfully run the program in sw_emu mode. When changing to hw_emu the compilation will crash:

$ clang++ -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice nbody.cpp -o nbody.hw_emu -lOpenCL -I/opt/xilinx/xrt/include/
warning: Linking two modules of different data layouts: '/tools/Xilinx/SDx/2018.3/bin/../lnx64/lib/libspir64-39-hls.bc' is 'e-m:e-i64:64-i128:128-i256:256-i512:512-i1024:1024-i2048:2048-i4096:4096-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas 'llvm-link' is 'e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024'

warning: Linking two modules of different target triples: /tools/Xilinx/SDx/2018.3/bin/../lnx64/lib/libspir64-39-hls.bc' is 'fpga64-xilinx-none' whereas 'llvm-link' is 'spir64'


****** xocc v2018.3 (64-bit)
  **** SW Build 2405991 on Thu Dec  6 23:36:41 MST 2018
    ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl jstephan@ite196  ~/Downloads 
INFO: [XOCC 60-1306] Additional information associated with this xocc compile can be found at:
	Reports: /home/jstephan/Downloads/_x/reports/x13642406682781150725
	Log files: /home/jstephan/Downloads/_x/logs/x13642406682781150725
INFO: [XOCC 60-585] Compiling for hardware emulation target
Running SDx Rule Check Server on port:40935
INFO: [XOCC 60-895]   Target platform: /opt/xilinx/platforms/xilinx_u200_xdma_201830_1/xilinx_u200_xdma_201830_1.xpfm
WARNING: [XOCC 74-49] Failed to read in file /tools/Xilinx/SDx/2018.3/platforms/achromatic during platform validation.
INFO: [XOCC 60-423]   Target device: xilinx_u200_xdma_201830_1
INFO: [XOCC 60-242] Creating kernel: 'x13642406682781150725'
ERROR: [XOCC 17-1309] Gcc: #18 0x00007f47c721409b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:308:16
ERROR: [XOCC 60-398] clang failed
ERROR: [XOCC 60-599] Kernel compilation failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation

The log file reports the following:

clang: warning: argument unused during compilation: '-include /tools/Xilinx/Vivado/2018.3/bin/../include/clc.h' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-include /tools/Xilinx/Vivado/2018.3/bin/../include/etc/autopilot_ssdm_op.h' [-Wunused-command-line-argument]
warning: overriding the module target triple with spir64-unknown-unknown [-Woverride-module]
#0 0x00000000018d831a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x18d831a)
#1 0x00000000018d645e llvm::sys::RunSignalHandlers() (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x18d645e)
#2 0x00000000018d6582 SignalHandler(int) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x18d6582)
#3 0x00007fdb2daabdd0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12dd0)
#4 0x00000000016be28a (anonymous namespace)::AggressiveDeadCodeElimination::collectLiveScopes(llvm::DILocation const&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x16be28a)
#5 0x00000000016beeb7 (anonymous namespace)::AggressiveDeadCodeElimination::markLive(llvm::Instruction*) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x16beeb7)
#6 0x00000000016bfa39 (anonymous namespace)::AggressiveDeadCodeElimination::markLiveInstructions() (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x16bfa39)
#7 0x00000000016c78e3 (anonymous namespace)::ADCELegacyPass::runOnFunction(llvm::Function&) [clone .part.347] (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x16c78e3)
#8 0x0000000001461e52 llvm::FPPassManager::runOnFunction(llvm::Function&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1461e52)
#9 0x0000000001461ee3 llvm::FPPassManager::runOnModule(llvm::Module&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1461ee3)
#10 0x00000000014627f4 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x14627f4)
#11 0x0000000001a9978a clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1a9978a)
#12 0x00000000022939ad clang::CodeGenAction::ExecuteAction() (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x22939ad)
#13 0x0000000001e55ef6 clang::FrontendAction::Execute() (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1e55ef6)
#14 0x0000000001e22d56 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1e22d56)
#15 0x0000000001eea523 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x1eea523)
#16 0x0000000000a5ed08 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0xa5ed08)
#17 0x00000000009e43fe main (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0x9e43fe)
#18 0x00007fdb2cd5709b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:308:16
#19 0x0000000000a5c399 _start (/tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang+0xa5c399)
Stack dump:
0.      Program arguments: /tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/bin/clang -cc1 -triple spir64-unknown-unknown -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -discard-value-names -main-file-name -linked.xpirbc -mrelocation-model static -mthread-model posix -fhls -mhls-ir -disable-O0-optnone -mdisable-fp-elim -no-integrated-as -mconstructor-aliases -dwarf-column-info -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -coverage-notes-file /home/jstephan/Downloads/_x/x13642406682781150725/x13642406682781150725/x13642406682781150725/x13642406682781150725.clc.00.gcno -resource-dir /tools/Xilinx/Vivado/2018.3/lnx64/tools/clang-3.9-csynth/lib/clang/7.0.0 -Wall -std=CL1.2 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/jstephan/Downloads/_x/x13642406682781150725/x13642406682781150725/x13642406682781150725 -ferror-limit 19 -fmessage-length 0 -cl-kernel-arg-info -fgnu89-inline -fobjc-runtime=gcc -fdiagnostics-show-option -mhls-ir -mlink-opencl-bitcode /tools/Xilinx/Vivado/2018.3/lnx64/lib/libspir64-39-hls.bc -mllvm -hls-bitcode-version=3.1 -mllvm -hls-top-function-name=x13642406682781150725 -disable-llvm-optzns -mllvm -xcl-xmlinfo=/home/jstephan/Downloads/_x/x13642406682781150725/x13642406682781150725/x13642406682781150725/kernel.xml -o /home/jstephan/Downloads/_x/x13642406682781150725/x13642406682781150725/x13642406682781150725/x13642406682781150725.clc.00.bc -x ir /tmp/-linked.xpirbc
1.      Per-module optimization passes
2.      Running pass 'Function Pass Manager' on module '/tmp/-linked.xpirbc'.
3.      Running pass 'Aggressive Dead Code Elimination' on function '@x13642406682781150725'
clang: error: unable to execute command: Segmentation fault (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 7.0.0
Target: spir64-unknown-unknown
Thread model: posix
InstalledDir: /home/jstephan/software/sycl/bin
clang: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

[SYCL] Refactor the SYCL xocc driver implementation

This is a preliminary list of things that I believe need to be done, they may or may not hold true in actual implementation.

Modifications that need to be done:

  • We need to define target triples for our devices, whether that's at least 1 target triple for all Xilinx devices or multiple triples for all targets is up for debate.
  • xocc Assembler and Linker should be made into a separate ToolChain and divorced from SYCL.h/SYCL.cpp
  • Decide where our sycl-xocc shell should reside, possibly rename it sycl-x or so if we wish to make this shell script handle all targets.

Optional that may need some more thought or experimentation (or another issue after investigation), but generally make things neater:

  • There are some construct jobs like constructing opt's and linking that occur inside the xocc Assembler, these can in theory be moved into the shell script and simplify the C++ code. But it makes things a little more obscure for people initially working with the ToolChain. Less requirement on them understanding how ToolChains work however.
  • xocc Assembler and Linker merged into just a Linker, easily achievable and aligns ourselves closer to the SYCL.cpp. Currently there is a hack we do to emit-llvm in the pre-Assembly stage of the driver similar to how the Intel implementation forces emit-spirv (now emit-llvm-bc) pre-Linker, but it should be possible to tap into this and to tell the driver we wish to output LLVM IR and not llvm-bc.
  • We are stuck using a wrapper shell for xocc, which is fine, do we have 1 master script or multiple scripts

This is related to: intel/llvm#53

[SYCL] Refactor SPIR implementation

This is related to: intel/llvm#44

The current way we support SPIR builtins is as follows:

We attempt to translate any function in the cl::_spirv (after the Reflower pass this namespace looks like: spirv_ocl) namespace to its SPIR builtin mangling. This occurs in our LLVM InSPIRation pass and works by removing the namespace mangling from the builtin and altering its Z value (Z representing the number of characters in a functions mangled name excluding arguments). There is currently no check to make sure they exist or do not exist as SPIR functions at compile time, if it doesn't exist you'll get a runtime ABI exception (can't find X function) as it won't have linked properly with xocc's SPIR builtin library.

Currently id builtins (get_global_id) are defined in spirv_vars.hpp alongside the SPIRV equivalents and based on a define the compiler creates based on our -fsycl-xocc-device the SPIR builtins are swapped with their SPIRV equivalents. For the math functions we hook into the exact same math builtins as SPIRV inside builtins.hpp without any requirement to optionally swap builtins out.

Short Term Goal:

Move SPIR specific builtins from a cl::__spirv namespace to an equivalent cl::__spir namespace and separate files/folders where necessary. This will lead to something a little cleaner and easier to understand and also divorce the SPIR builtins from transformation by the Reflower making it perhaps a little more stable. As currently any changes to the Reflower's translation of the cl::__spirv namespace is felt by the InSPIRation pass.

It may be more worthwhile to ignore this short term goal and wait for the long term goal based on our priorities.

Long Term Goal:

The idea would be to transform what we currently do, to the newer builtins offloading method where the ToolChain dictates what library functions are used in place of certain placeholder builtins (as explained by Alexey briefly in the linked issue). I believe Alexey is currently looking into this, but once this has been merged into the main implementation we should consider refactoring our implementation to align with this. This may take a reasonable amount of reworking.

[SYCL][Runtime] Fix problem with 2D/3D accessor handler copy's

There is an existing error in handler Copy for Xilinx FPGA emulation, XRT complains and throws errors at runtime when 2D/3D accessors are used with the device Handlers copy:

[XRT] ERROR: src_origin,region,src_row_pitch,src_slice_pitch out of range
[XRT] ERROR: src_origin,region,src_row_pitch,src_slice_pitch out of range
[XRT] ERROR: src_origin,region,src_row_pitch,src_slice_pitch out of range
OpenCL API failed. /storage/ogozillo/intel-sycl/sycl/sycl/source/detail/
memory_manager.cpp:282: OpenCL API returns: -30 (CL_INVALID_VALUE)
[XRT] ERROR: src_origin,region,src_row_pitch,src_slice_pitch out of range
[XRT] ERROR: Internal error. cl_mem doesn't map to buffer object
[XRT] ERROR: Internal error. cl_mem doesn't map to buffer object
[XRT] ERROR: Internal error. cl_mem doesn't map to buffer object
OpenCL API failed. /storage/ogozillo/intel-sycl/sycl/sycl/source/detail/
memory_manager.cpp:243: OpenCL API returns: -6 (CL_OUT_OF_HOST_MEMORY)
[XRT] ERROR: Internal error. cl_mem doesn't map to buffer object
terminate called after throwing an instance of 'cl::sycl::runtime_error'
free(): corrupted unsorted chunks

Two examples that reproduces this are: https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/test/basic_tests/handler/handler_mem_op.cpp#L280
and https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/test/basic_tests/handler/handler_mem_op.cpp#L309

It seems to stem from the usage of clEnqueueCopyBufferRect, XRT doesn't appear too pleased, it's the first thrown OpenCL API error at line 282 of the memory_manager.cpp, the subsequent failures are most likely related to XRT disliking the clEnqueueCopyBufferRect call.

[SYCL] Missing and/or Incorrect xocc SPIR Math builtins

Some math manglings from xocc's SPIR libraries appear to be incorrect or missing. In the sense that math functions correctly translated to their SPIR mangled names (found in: https://github.com/KhronosGroup/SPIR-Tools/wiki/SPIR-2.0-built-in-functions) result in runtime errors caused by missing symbols.

Two known examples are the mad(float, float, float) (_Z3madfff) function used in the test math_mangle.cpp and fmin(float2, float2) (_Z4fminDv2_fDv2_f) found in the test vector_math.cpp.

In at least fmin's case it appears to be that the mangling of the arguments type is different within xocc's SPIR library.

This issue will require some further investigation, but a preliminary look at the issue makes it seem like it will require modifications to the InSPIRation pass to support these alternate SPIR manglings in certain cases.

[SYCL] POCL SPIR Builtin Translation

As an extension to #9 there are currently some incorrectly mangled SPIR builtins in POCL (pocl/pocl#698) that will need to be handled for clean execution of SYCL spir-df output by a POCL runtime.

The incorrect mangling appears to be caused by some type aliasing via using declarations in POCL's _builtin_renames.h file and currently affects a significant amount of math functions, some atomics and printf.

I can currently see this being handled in one of two ways:

  • Rework the existing POCL implementation to use the correct function names and then put in a pull request to try and change the upstream implementation. This would be the ideal long-term avenue, but may be very difficult as there will most likely be some reason behind the renaming of these builtins that could be a far reaching problem.
  • Alter the InSPIRation pass to optionally translate the correct SPIR mangled names to the altered POCL renamed builtins, effectively by passing the need to alter POCL. The easier of the two options but inflicts a certain amount of code debt on us and if the POCL issue gets fixed then we'll have to alter the InSPIRation pass again.

[SYCL] Can't cancel sycl-xocc driver script

One side affect of using a script to help drive our ToolChain at the moment is that CTRL-C inside of a terminal to cancel the compilation will not always cancel the currently executing script process, it will continue to fire in the background until complete. This is particularly notable when compiling a lot of kernels in a single source file e.g. the reqd_work_group_size.cpp test. A solution to this will require further investigation.

[SYCL] Compiler forgets typedefs

In sw_emu mode (didn't test the others) clang will forget the integer typedefs inside namespace std after xocc has completed its tasks.

Example code:

#include <cstdint>
#include <CL/sycl.hpp>

template <std::uint32_t Var>
struct foo
{
    cl::sycl::accessor<int, 1, cl::sycl::access::mode::read_write> acc;

    auto operator()()
    {
        for(auto i = 0; i < 1024; ++i)
            acc[i] = Var;
    }
};

struct xilinx_selector : public cl::sycl::device_selector
{
    auto operator()(const cl::sycl::device& dev) const -> int
    {
        const auto vendor = dev.get_info<cl::sycl::info::device::vendor>();
        return (vendor.find("Xilinx") != std::string::npos) ? 1 : -1;
    }
};

auto main() -> int
{
    auto queue = cl::sycl::queue{xilinx_selector{}};

    auto buf = cl::sycl::buffer<int>{cl::sycl::range<1>{1024}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);

        auto kernel = foo<42>{acc};
        cgh.single_task<foo<42>>(kernel);
    });
    queue.wait();
}

Command line:

clang++ -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice  test.cpp -lOpenCL

Output:

[xocc output snipped]
In file included from <built-in>:1:
/tmp/test-9a9f3e.h:6:16: error: no type named 'uint32_t' in namespace 'std'
template <std::uint32_t Var> struct foo;
          ~~~~~^
test.cpp:5:25: error: template non-type parameter has a different type 'std::uint32_t' (aka 'unsigned int') in template redeclaration
template <std::uint32_t Var>
                        ^
/tmp/test-9a9f3e.h:6:25: note: previous non-type template parameter with type 'int' is here
template <std::uint32_t Var> struct foo;
                        ^
test.cpp:36:23: error: implicit instantiation of undefined template 'foo<42>'
        auto kernel = foo<42>{acc};
                      ^
/tmp/test-9a9f3e.h:6:37: note: template is declared here
template <std::uint32_t Var> struct foo;
                                    ^
3 errors generated.

[SYCL] hw_emu: Syn check fail when using C-style arrays inside kernel

During one of my misguided attempts to use burst reads/writes (see the example at the bottom) I managed to break the compiler in hw_emu mode. Error log mentioned in console output:

clang: warning: argument unused during compilation: '-include /opt/xilinx/Vivado/2019.1/bin/../include/clc.h' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-include /opt/xilinx/Vivado/2019.1/bin/../include/etc/autopilot_ssdm_op.h' [-Wunused-command-line-argument]
warning: overriding the module target triple with spir64-unknown-unknown [-Woverride-module]
error: Syn check fail!

Command line and output:

$ clang++ -O3 -std=c++2a -fsycl -fsycl-targets=fpga64-xilinx-unknown-sycldevice main.cpp -o evil_array -lOpenCL -I/opt/xilinx/xrt/include

warning: Linking two modules of different data layouts: '/opt/xilinx/SDx/2019.1/bin/../lnx64/lib/libspir64-39-hls.bc' is 'e-m:e-i64:64-i128:128-i256:256-i512:512-i1024:1024-i2048:2048-i4096:4096-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas 'llvm-link' is 'e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024'

warning: Linking two modules of different target triples: /opt/xilinx/SDx/2019.1/bin/../lnx64/lib/libspir64-39-hls.bc' is 'fpga64-xilinx-none' whereas 'llvm-link' is 'spir64'

Invoking xocc Kernel Compilation
xocc: /opt/xilinx/SDx/2019.1/bin/xocc
--target: hw_emu
--platform: xilinx_u200_xdma_201830_2
Compiling kernel: xSYCL9318603472913316893
Outputting file to: /tmp/xSYCL9318603472913316893.xo
Input file is: /tmp/main_kernels-linked.xpirbc
Option Map File Used: '/opt/xilinx/SDx/2019.1/data/sdx/xocc/optMap.xml'

****** xocc v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:47:09 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl
INFO: [XOCC 60-1306] Additional information associated with this xocc compile can be found at:
	Reports: /home/jan/workspace/sycl-bugreports/xilinx/local_array/_x/reports/xSYCL9318603472913316893
	Log files: /home/jan/workspace/sycl-bugreports/xilinx/local_array/_x/logs/xSYCL9318603472913316893
INFO: [XOCC 60-585] Compiling for hardware emulation target
INFO: [XOCC 60-1316] Initiating connection to rulecheck server, at Fri Sep 20 17:45:27 2019
Running Rule Check Server on port:46287
INFO: [XOCC 60-1315] Creating rulecheck session with output '/home/jan/workspace/sycl-bugreports/xilinx/local_array/_x/reports/xSYCL9318603472913316893/xocc_compile_xSYCL9318603472913316893_guidance.html', at Fri Sep 20 17:45:28 2019
INFO: [XOCC 60-895]   Target platform: /opt/xilinx/platforms/xilinx_u200_xdma_201830_2/xilinx_u200_xdma_201830_2.xpfm
INFO: [XOCC 60-423]   Target device: xilinx_u200_xdma_201830_2
INFO: [XOCC 60-242] Creating kernel: 'xSYCL9318603472913316893'
ERROR: [XOCC 60-399] clang failed, please see log file for detail: '/home/jan/workspace/sycl-bugreports/xilinx/local_array/_x/xSYCL9318603472913316893/xSYCL9318603472913316893/xSYCL9318603472913316893/xSYCL9318603472913316893_clang.log'
ERROR: [XOCC 60-599] Kernel compilation failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation
Option Map File Used: '/opt/xilinx/SDx/2019.1/data/sdx/xocc/optMap.xml'

****** xocc v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:47:09 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

ERROR: [XOCC 60-602] Source file does not exist: /tmp/xSYCL9318603472913316893.xo
ERROR: [XOCC 60-623] Unsupported input file type specified.
/usr/bin/ld: /tmp/main-9bcdb9.o: file not recognized: file truncated
clang-9: error: sycl-link-xocc command failed with exit code 255 (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

Code:

#include <CL/sycl.hpp>

struct kernel;

auto loop(cl::sycl::accessor<std::size_t, 3, cl::sycl::access::mode::read_write> s)
{
    for(std::size_t z = 0; z < 42; ++z)
    {
        for(std::size_t y = 0; y < 42; ++y)
        {
            std::size_t in_row[42];
            std::size_t out_row[42];

            for(std::size_t x = 0; x < 42; ++x)
            {
                const auto id = cl::sycl::id<3>{x, y, z};
                in_row[x] = s[id];
            }

            for(std::size_t x = 0; x < 42; ++x)
                out_row[x] = in_row[x] + 42;

            for(std::size_t x = 0; x < 42; ++x)
            {
                const auto id = cl::sycl::id<3>{x, y, z};
                s[id] = out_row[x];
            }
        }
    }
}

auto main() -> int
{
    auto queue = cl::sycl::queue{};

    auto s_buf = cl::sycl::buffer<std::size_t, 3>{
                    cl::sycl::range<3>{42, 42, 42}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto s = s_buf.get_access<cl::sycl::access::mode::read_write>(cgh);

        cgh.single_task<kernel>([=]()
        {
            loop(s);
        });
                                                
    });
    queue.wait();

    return 0;
}

[SYCL][Driver] Driver Cleanup/Refactor Round 2

Perhaps worth delaying this for a while until we have use cases for it, as the upstream driver isn't cemented and ours works "fine" for the moment.

But there is now a nice clean way to integrate backends for ahead of time compilation via BC, if we want to go down that route: https://github.com/intel/llvm/pull/262/files

So it is possible to integrate our XOCC ToolChain back in at some point if we keep it simple and script based. It's worth seeing what we need to do with other backends we're interested in first though before we fully commit to that I think. As we could, keep them all separate each having its own ToolChain (maybe not as easy to maintain but easier if they're vastly different implementation wise), integrate them into SYCL (all in one place, maybe a little difficult for us when pulling from upstream but we are force fed good new driver changes) or have a parallel to the SYCL ToolChain for Xilinx tools (no pulling from upstream impact, we can adopt changes when and how we want, but we have everything we need in the one spot).

Other than that there are a lot of driver options we do not support right now or if we do support them, it's by chance and I've had no use case to test it yet. One example is targeting several devices for ahead-of-time compilation at once.

sycl-xocc: Wildcard in script not working correctly

The wildcards in the lines below fail for me after the upgrade to Vitis 2020.2. I have to manually edit the sycl-xocc script and replace 2* with 2020.2.

SDX_LIB_SPIR="$SDX_BIN_PATH_DIR/../../../Vitis_HLS/2*/lnx64/lib/libspir64-39-hls.bc"

SDX_CLANG_LLVM_BIN="$SDX_BIN_PATH_DIR/../../../Vitis_HLS/2*/lnx64/tools/clang-3.9-csynth/bin"

If I don't do this I end up with the following error:

++ pwd
+ CWD=/home/jan/workspace/zoo/alpaka/build/test/integ/axpy
+ export VERBOSE=1
+ VERBOSE=1
+ [[ -z /home/jan/software/xilinx/Vitis/2020.2/bin ]]
+ [[ -z /home/jan/sycl-workspace/SplitCompilation/sycl/build/bin ]]
+ [[ -z axpy.cpp-2f6f39 ]]
+ [[ -z /tmp/axpy.cpp-2f6f39.txt ]]
+ [[ -z /tmp ]]
+ [[ -z /tmp/axpy-671ac1.bc ]]
+ [[ -z /tmp/sycl-xocc-args2712-3d2a4b ]]
+ [[ -z /tmp/sycl-xocc-args2716-4c91c5 ]]
+ [[ -z sw_emu ]]
+ SDX_BIN_PATH_DIR=/home/jan/software/xilinx/Vitis/2020.2/bin
+ [[ ! -d /home/jan/software/xilinx/Vitis/2020.2/bin ]]
+ SDX_LIB_SPIR=/home/jan/software/xilinx/Vitis/2020.2/bin/../lnx64/lib/libspir64-39-hls.bc
+ [[ ! -f /home/jan/software/xilinx/Vitis/2020.2/bin/../lnx64/lib/libspir64-39-hls.bc ]]
+ SDX_LIB_SPIR='/home/jan/software/xilinx/Vitis/2020.2/bin/../../../Vitis_HLS/2*/lnx64/lib/libspir64-39-hls.bc'
+ [[ ! -f /home/jan/software/xilinx/Vitis/2020.2/bin/../../../Vitis_HLS/2*/lnx64/lib/libspir64-39-hls.bc ]]
+ usage 9 'could not find SDx'\''s libspir64-39-hls.bc'
+ echo /home/jan/sycl-workspace/SplitCompilation/sycl/build/bin/sycl-xocc: error: could not find 'SDx'\''s' libspir64-39-hls.bc
/home/jan/sycl-workspace/SplitCompilation/sycl/build/bin/sycl-xocc: error: could not find SDx's libspir64-39-hls.bc
+ exit 9
clang-12: error: sycl-link-xocc command failed with exit code 9 (use -v to see invocation)

I first assumed that this was because my default shell is zsh (version 5.8) but it also occurs if I run it from bash (version 5.0.17).

[SYCL] hls's libspir contains unoptimized IR and some buggy IR

This is an HLS bug not a trisycl bug

the library is located at $XILINX_ROOT/Vitis/2020.1/lnx64/lib/libspir64-39-hls.bc and is shipped as part of Vitis
it isn't optimized, which increases build time and it contains some buggy functions like

; Function Attrs: alwaysinline nounwind
define spir_func float @erfc_impl(float %x) #29 {
  %retval = alloca float, align 4
  %x.addr = alloca float, align 4
  store float %x, float* %x.addr, align 4
  call void @llvm.trap()
  unreachable

1:                                                ; No predecessors!
  %2 = load float, float* %retval, align 4
  ret float %2
}

[SYCL] Merge KernelNameGen pass and InSPIRation pass

KernelNameGen is a fairly simple LLVM pass at the moment that generates a list of kernel names to be used by the sycl-xocc shell script, it should be possible to refactor the functionality from here into InSPIRation to make maintenance simpler in the long term.

Although optionally, in the long term this pass could be expanded upon to generate other optimization data on a per kernel basis that needs to be linked to a specific kernel name and fed to the xocc compiler (compute unit details etc.).

[SYCL][XilinxFPGA] Known Issues

This is a non-exhaustive list of some larger problems relating to XIlinx FPGA compilation and runtime execution (some with more information than others) that need some thought long term:


Problem: If you create buffers and try to use some SYCL functionality that makes use of some underlying OpenCL functionality to modify the buffers as cl_mem objects before a kernel is invoked (single_task/parallel_for etc.) you'll incur XRT runtime errors e.g.:

[XRT] ERROR: Internal error. cl_mem doesn't map to buffer object

You should be able to see this in action if you comment out the "noop" SYCL kernel in the accessor_copy.cpp test.

Reason: XRT will not consider a device as "active" and use-able until you've loaded a binary as it can't query most of the information it needs, one unfortunate side affect of this is that cl_mem buffers are not appropriately assigned to a device and whenever you try to use something like a handler copy with a sycl accessor/buffer the underlying XRT OpenCL call will not be able to find the buffer in relation to a device (it queries devices for buffers, if they're not found, XRT is not pleased).

Possible fix Ideas:

  • Eager binary program loading as we know we'll only use pre-compiled binaries with our FPGAs
  • Lazy OpenCL buffer creation/operations, only do this after we know XRT will be happy i.e. binary loaded and good to go

Work around: Force start the device by using a noop kernel, not ideal and while it works around the issue on hw/sw emu I'm not too sure how real hardware will appreciate this.


Problem: Only 1 queue to a Xilinx FPGA device can exist at once, if you accidentally generate more than one XRT will not be happy.


Problem: All kernels require at least 1 accessor, 0 accessors will cause a compile error in xocc relating to no argument being bound to AXI_GMEM. Not too sure how many use cases there are for no accessors in a kernel but perhaps it shouldn't emit an error like this.


Problem: Cannot compile for Xilinx FPGA with -g, this prevents users using debug mode on the SYCL runtime not just the kernel.

Reason: I believe this is because we're generating a kernel file with a lot of debug code and trying to pipe that through xocc which doesn't really know how to handle all of the debug information.

Possible fix Ideas:

  • A long shot, but quick fix if it worked, add -g to the xocc compilation components of the sycl-xocc script. Perhaps it will realize that the kernel may come attached with some debug information in this case and handle it better. I find this unlikely to work, but it's low hanging fruit if it does...
  • More surefire fix: Do not compile the device compilation component of SYCL with -g, only compile the host component with -g and remove the -g from being pushed onto the device compilation. This will create a much simpler SPIR-df kernel to pass to xocc. Instead -g should be applied to the xocc compilation and linker commands to get the debug kernel information. This should circumvent any issues with debug information breaking xocc kernel compilation whilst still giving kernel debug information and SYCL runtime debug information. Shouldn't be too hard, just requires some driver tweaks.

Problem: Related to issue: #32 mixing structures inside of kernels can cause ICE's in one of xocc's passes: aggressive dead code elimination (AGDCE). The relevant minimal triggering SYCL test case for this is issue_related/agdce_ice.cpp.

Reason: Seems to be a problem relating to address space casting from a structure that is implemented outside of a kernel (ergo no address space) and when you try to index an accessor containing several passed in instantiations of the class/struct it will explode the AGDCE pass inside of the compiler as it will try to address space cast.

Status: I am led to believe it's a bug with xocc/Vivado HLS, so it appears to be outside of our jurisdiction, I have forwarded this issue onto someone on their team but it's low priority. May take a while for a fix without some follow up.


Problem: Boost Hana's times::with_index in conjunction with it's overloaded + operator will kill hw_emu and very likely hw as it will not completely optimize and inline with -03 as you would expect (and as it does in a non-SYCL -O3 pass). This leaves some external declarations and calls to functions but no definitions of the functions, so partial optimization/in-lining. Which is a little odd as the definitions exist prior to the -O3 pass and other boost hana functionality is appropriately inlined.

Reason: Current best guess is that the required index argument passed into the lambda passed to and invoked by with_index is the probable cause. It seems like it could be another address space cast related issue. The minimal test case for this issue is the example: boost_hana_functor_arg.cpp inside of the issue_related directory.

  cgh.single_task<class array_add>([=]() {
     boost::hana::int_<5>::times.with_index([&](const auto i) {
          a_rw[i+1] = 6;
      });
   });

So to highlight the issue, In the example there is an variable passed into the lambda from an externally defined function, this then gets used with the + operator. This + operator is overloaded inside of Boost Hana to support compile time usage. This snippet of code should be unrolled and inlined removing all of the Boost Hana related functionality. This doesn't happen and it seems to be because the argument passed into the function and being incremented and used with the value 1 will trigger an address space cast which I think will prevent the appropriate and required optimizations.

This is an an assumption though, based on the fact that the below code works fine:

  cgh.single_task<class array_add>([=]() {
          int i = 0;
           boost::hana::int_<N>::times([&] {
             a_rw[i+0] = 6;
             ++i;
         });
   });

The index variable and the random constant variable now exist in the same address space and the world all seems to be fine as far as the compiler is concerned.


Some of these "problems" are peculiarities in our FPGA compilation pipeline and non-standard OpenCL implementation and may not necessarily be "problems".

Tensorflow + Xilinx SYCL

Just poking around OpenCL/SYCL support in TF. Im wondering if I can use this fork of sycl as a backend to TF (and use 2019.2 v++ in the intermediate stage) and target the Alveo U2xx devices ? Thx

[SYCL] Add POCL support

It should be very simple to add support for POCL. Our kernels are already SPIR-df when we feed our FPGA target through. We just have some slight strangeness that we need to deal with for our HLS backend that's mixed in with some of the things we need for our SPIR-df transformations (mainly the kernel name hashing we do, we probably don't want or need that for POCL, but this is tied into the SYCL program_manager.cpp and the inSPIRation pass).

So briefly at a high level what should be done:

  • Create a new Clang driver flag or someway to notify the Clang driver to use a different ToolChain or Tool
  • This flag should also set up at least the define __SYCL_SPIR_DEVICE__, this is already done for the Xilinx FPGA target, you can grep over the Clang and SYCL directories for it.
  • Create a new Tool or ToolChain for POCL
  • Create a new script for POCL, this script will be much simpler than the sycl-xocc script. All it needs to do is run some Opt passes, mainly the ASFixer (this may change in the future) and the InSPIRation pass

And that should be it, but there may be one or two things that need some minor tweaking in the SYCL runtime as the XILINX_* specific flags are turned off.

Removing weird kernel name modifications:

  • Remove kernel name hashing from the InSPIRation pass and put it into it's own separate pass
  • This new pass should be added to the list of Opt passes ran in sycl-xocc or you'll probably break FPGA compilation
  • Alter the SYCL runtimes program_manager.cpp to only check for weird hashed names for kernels when it's a Xilinx device (perhaps even just a Xilinx FPGA device if that's possible).

However if you don't really care about one step compilation using a driver, I think all you'd probably need to do to get a SPIR-df kernel working with POCL using 2 step compilation is to compile the device side kernel with the __SYCL_SPIR_DEVICE__ macro defined and run the ASfixer and InSPIRation pass over them. There may have to be 1 or 2 other tweaks in the runtime as the XILINX_ related macros aren't defined and it's not likely to be a Xilinx device that's targeted but they shouldn't be too big of a problem.

If someone intends to implement this and there's any more questions on it, feel free to ping me in a comment and I'll get back to you ASAP I'm aware it's not incredibly detailed and I'm happy to ping people to the right segments of code if I still can...

How to specify host compiler for xocc?

I am currently facing some troubles while getting the SYCL compiler to run on a cluster equipped with Alveo U200 FPGAs. The cluster is running CentOS; unfortunately all default system tools are so ancient that they'd actually belong into a museum. The cluster's module system allows me to load more recent compilers and libraries, so I was able to compile the SYCL compiler (with some changes to sycl-clang's include files and overriding system include directories).

Running it opens up another can of worms since the Xilinx tools don't seem to play nice with the module system. xocc doesn't even consider using the compiler defined by the environment variables, instead it always wants to call /usr/bin/g++ through the xcpp wrapper. It also doesn't use the include directories I gave to the sycl-clang frontend, which causes the called g++ to fail because it doesn't find the standard library includes.

So far I tried adjusting the environment variables for the compiler (CPATH, CPLUS_INCLUDE_PATH, LD_LIBRARY_PATH) as well as copying xcpp to a local PATH and modifying the call to gcc, all to no avail or even more weird problems with header files and libraries.

is there some obvious way I'm missing that doesn't require any system-wide changes? I can't write to the SDAccel installation directory or any other system directory since I don't have root privileges.

[SYCL] Add build-bot and test support

Add build-bot and test support to the repository for Intel devices/tests to make sure pull requests are not breaking existing functionality. There should already be some functionality added for this that Intel added and use at the moment: https://github.com/triSYCL/sycl/tree/sycl/unified/master/buildbot

It would be nice to have FPGA compilation support for the build-bot tests, but it does not seem to be feasible as the compilation takes too long and the SDAccel package isn't an Ubuntu package (also the size of it would probably kill the Travis environment).

Missing access mode: discard_write

Using the same N-body code as in #28 I receive the following failed assertion once the control flow reaches the first host-to-device copy:

nbody.sw_emu: /home/jstephan/software/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl.hpp:519: size_t cl::sycl::detail::buffer_impl<std::allocator<char> >::convertSycl2OCLMode(cl::sycl::access::mode) [AllocatorT = std::allocator<char>]: Assertion `0 && "Unhandled conversion from Sycl access mode to OCL one."' failed.

And the mode is indeed missing from buffer_impl.hpp. Is this a feature waiting for implementation? I haven't found anything related in the issues here or the original triSYCL repository.

Edit: Ah, I should've read @agozillon's message more thoroughly. I suppose this is the "runtime assertion" failing. This already seems to be fixed in Intel's implementation:

template <typename AllocatorT>
size_t
buffer_impl<AllocatorT>::convertSycl2OCLMode(cl::sycl::access::mode mode) {
  switch (mode) {
  case cl::sycl::access::mode::read:
    return CL_MEM_READ_ONLY;
  case cl::sycl::access::mode::write:
  case cl::sycl::access::mode::discard_write:
    return CL_MEM_WRITE_ONLY;
  case cl::sycl::access::mode::read_write:
  case cl::sycl::access::mode::discard_read_write:
  case cl::sycl::access::mode::atomic:
    return CL_MEM_READ_WRITE;
  default:
    assert(0 && "Unhandled conversion from Sycl access mode to OCL one.");
    return 0;
  }
}

Array Partitioning

Problems with ssdm

The ssdm functions work with 1D arrays but when applied to 2D
it doesn't work. By looking in the LLVM-IR we found that the
ssdm function is always called on a 1D array. I didn't found
a way to work around this problem

Meta programming

The solution for partitioning without using the ssdm functions
is to use template meta programmation to implement in c++ the
behaviour of a partitioned array.

We had two ideas to implement this :

  • recursive template structure
  • tuples using some boost::hana stuff

Problems with tuples

The try with tuples did not go far because there is an error
with current sycl implementation when using some functions like
std::make_tuple inside lambdas. This error make tuples unusable for
the partitioning problem for the moment.

Problems with recursive template structure

The first problem with recursive template structure is that if the
index is not known at compile time the subscript operator has a o(N)
complexity because all cases have to be tested and performance would
even worse than before. So to be effective all the loop using the array
have to be unrolled

Problems with loop unrolling

The implementation we found for loop unrolling use tuples so it suffers
the same problem with lambdas. With this implementation we can unroll a loop
but we cannot unroll a loop which is inside an unrolled one. By discussing
with Alexandre we learned that to be effective with partitioned array we
need to unroll the loop (for all implementations). So it is mandatory to
find a solution to this problem in order to use partitioned array.

How to know if partitioning works

We discussed this with Alexandre and the conclusion is that the only way
to know if the array is partitioned is to look at the resource use and
timing of the hardware result. This method is purely empirical, so finding
the good examples is essential. The problem is if the example is to simple
the compiling process will be able to perform too much optimizations.
For example the table below describes a case where when the loop is unrolled,
the compiler seems to do the partitioning by itself. This example don't
allow any conclusion about the array partitioning implementations. I tried
with a more complex example, a cache for the edge detection algorithm on FPGA.
But I didn't achieve to make it working due to the problems above.

classic for loop FF LUT DSP BRAM URAM Cycle
reference 3785 5766 36 10 0 4270
ssdm 5685 4198 20 2 0 470
struct inherit array 35663 8000 36 2 0 3270
meta_programing 1862 11021 4 4 0 5147
unrolled for loop FF LUT DSP BRAM URAM Cycle
reference 1605 2008 8 2 0 148
ssdm compilation error
struct inherit array compilation error
meta_programing 1605 2008 8 2 0 148

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.