
Intercept Layer for Debugging and Analyzing OpenCL Applications

License: MIT License


opencl-intercept-layer's Introduction

Intercept Layer for OpenCL™ Applications


The Intercept Layer for OpenCL Applications is a tool that can intercept and modify OpenCL calls for debugging and performance analysis. Using the Intercept Layer for OpenCL Applications requires no application or driver modifications.

To operate, the Intercept Layer for OpenCL Applications masquerades as the OpenCL ICD loader (usually) or as an OpenCL implementation (rarely) and is loaded when the application intends to load the real OpenCL ICD loader. As part of the Intercept Layer for OpenCL Application's initialization, it loads the real OpenCL ICD loader and gets function pointers to the real OpenCL entry points. Then, whenever the application makes an OpenCL call, the call is intercepted and can be passed through to the real OpenCL with or without changes.
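
The pass-through pattern described above can be sketched in a few lines. This is a minimal illustration only, not the project's actual code: realGetPlatformCount, Dispatch, and Interceptor are hypothetical names, and the "real" entry point is a local stand-in rather than a pointer obtained from the real OpenCL ICD loader.

```cpp
#include <string>
#include <vector>

// Stand-in for a "real" entry point. In the actual layer this pointer would
// come from the real OpenCL ICD loader; here it is a local function.
static int realGetPlatformCount(unsigned* count) {
    if (count) *count = 2;
    return 0; // 0 plays the role of CL_SUCCESS
}

// Table of function pointers to the real implementation (hypothetical layout).
struct Dispatch {
    int (*getPlatformCount)(unsigned*) = nullptr;
};

struct Interceptor {
    Dispatch dispatch;
    std::vector<std::string> log;

    // Initialization: resolve the real entry points once.
    void init() { dispatch.getPlatformCount = &realGetPlatformCount; }

    // The intercepted call: record it, forward it unchanged, record the result.
    int getPlatformCount(unsigned* count) {
        log.push_back(">>>> getPlatformCount");
        int err = dispatch.getPlatformCount(count);
        log.push_back("<<<< getPlatformCount -> " + std::to_string(err));
        return err;
    }
};
```

An application-facing call such as icpt.getPlatformCount(&n) returns exactly what the real function returned, while the layer gets a chance to log or alter the call on the way through.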

Intercept Layer Architecture

This project adheres to the Intercept Layer for OpenCL Application's code of conduct. By participating, you are expected to uphold this code.

Documentation

All controls are documented here.

Instructions to build the Intercept Layer for OpenCL Applications can be found here.

Instructions to use the Intercept Layer for OpenCL Applications Loader (cliloader) can be found here.

Instructions for the old loader (cliprof) can still be found here.

Instructions to install the Intercept Layer for OpenCL Applications can be found here.

Troubleshooting steps and answers to frequently asked questions can be found here.

Detailed instructions:

Tutorial

A tutorial demonstrating common usages of the Intercept Layer for OpenCL Applications can be found here.

License

The Intercept Layer for OpenCL Applications is licensed under the MIT License.

Notes:

Attached Licenses

The Intercept Layer for OpenCL Applications uses third-party code licensed under the following licenses:

Support

Please file a GitHub issue to report an issue or ask questions. Private or sensitive issues may be submitted via email to this project's maintainer (Ben Ashbaugh - ben 'dot' ashbaugh 'at' intel 'dot' com), or to any other Intel GitHub maintainer (see profile for email address).

How to Contribute

Contributions to the Intercept Layer for OpenCL Applications are welcomed and encouraged. Please see CONTRIBUTING for details on how to contribute to the project.


OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

* Other names and brands may be claimed as the property of others.

Copyright (c) 2018-2024, Intel(R) Corporation

opencl-intercept-layer's People

Contributors

al42and, alexbatashev, alisanikiforova, baryluk, bashbaug, dependabot[bot], echeresh, haraldservat, isanghao, ivvenevt, jwlawson, karolherbst, msriram, novermars, nsdhaman, ph0b, philck, ppabis-intel, tapplencourt, trbauer


opencl-intercept-layer's Issues

Add Basic Travis CI Format Checks

Observed Behavior

It's easy for new contributors to inadvertently introduce formatting inconsistencies into this codebase.

Desired Behavior

Add automated checks for basic formatting inconsistencies, such as use of tabs vs. spaces, or trailing whitespace.

Steps to Reproduce

Create a pull request with tabs or trailing whitespace. Travis CI and Appveyor both run but only verify that the build succeeds, and no formatting checks are performed.
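
As a rough idea of what such a check could look for, here is a small sketch that scans text for the two problems named above. It is an illustration only: FormatIssue and findFormatIssues are hypothetical names, and a real CI check would more likely be a script run over the changed files.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct FormatIssue {
    std::size_t line;   // 1-based line number
    std::string what;   // "tab" or "trailing whitespace"
};

// Report every line that contains a tab or ends in a space or tab.
std::vector<FormatIssue> findFormatIssues(const std::string& text) {
    std::vector<FormatIssue> issues;
    std::istringstream in(text);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        // Tolerate CRLF line endings so '\r' is not reported as whitespace.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.find('\t') != std::string::npos)
            issues.push_back({lineNo, "tab"});
        if (!line.empty() && (line.back() == ' ' || line.back() == '\t'))
            issues.push_back({lineNo, "trailing whitespace"});
    }
    return issues;
}
```

A CI step could fail the build whenever the returned list is non-empty for any changed file.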

Opencl Intercept layer not working (segmentation fault)

Hello

I have built and installed the OpenCL Intercept Layer on my Linux platform with kernel version 4.13.0. However, I couldn't get the intercept layer to work. I will describe the steps I have taken in detail, hoping to get it working. I am trying to get some metrics out of my OpenCL application. First, I downloaded the whole opencl-intercept-layer repo into a folder and followed the CMake instructions as provided, in release mode, and enabled other flags like ENABLE_CLIPROF, ENABLE_MDAPI, ENABLE_CLILOADER, followed by make and make install.
The OpenCL library was created successfully inside the folder where I downloaded the repo. After that, I tried to use it for a particular application rather than doing a global install, so I believe I set the environment properly.
Specifically, the environment setting steps are as follows:
export LD_LIBRARY_PATH=/home/user/Desktop/OpenclIntercept/:$LD_LIBRARY_PATH
export CLI_DLLName=/opt/intel/opencl/SDK/lib64/libOpenCL.so

Setting other environment variables like:
export CLI_CallLogging=1 CLI_DumpProgramSource=1 CLI_DevicePerfCounterCustom=ComputeBasic CLI_DevicePerfCounterTiming=1

After setting all this, when I try to run either cliloader or cliprof on my application, I get a segmentation fault.

CliLoader details:

sudo ./cliloader --debug --call-logging --dump-source /path/to/application/Application_itself

Output:

[cliloader debug] full path to executable is: /home/User/Desktop/opencl-intercept-layer/cliloader/cliloader
[cliloader debug] pProcessName is non-NULL: /cliloader
[cliloader debug] process directory is /home/User/Desktop/opencl-intercept-layer/cliloader
[cliloader debug] New LD_PRELOAD is /home/User/Desktop/opencl-intercept-layer/cliloader/../libOpenCL.so
[cliloader debug] New LD_LIBRARY_PATH is /home/User/Desktop/opencl-intercept-layer/cliloader/..
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /home/User/Desktop/opencl-intercept-layer/cliloader/../libOpenCL.so
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.1-111-g331ce28
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 331ce28
CLIntercept optional features:
cliloader(supported)
cliprof(supported)
kernel overrides(NOT supported)
ITT tracing(NOT supported)
MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Trying to load dispatch from: ./real_libOpenCL.so
Couldn't load library from: ./real_libOpenCL.so
Trying to load dispatch from: /usr/lib/x86_64-linux-gnu/libOpenCL.so
Couldn't get exported function pointer to: clSetProgramReleaseCallback
Couldn't get exported function pointer to: clSetProgramSpecializationConstant
... success!
CallLogging is set to a non-default value!
ReportToStderr is set to a non-default value!
DumpProgramSource is set to a non-default value!
Timer Started!
... loading complete.

clGetPlatformIDs
<<<< clGetPlatformIDs -> CL_SUCCESS
Number of Platforms: 2
clGetPlatformIDs
<<<< clGetPlatformIDs -> CL_SUCCESS
clGetDeviceIDs: platform = [ Intel(R) OpenCL HD Graphics ], device_type = CL_DEVICE_TYPE_GPU (4)
<<<< clGetDeviceIDs -> CL_SUCCESS
clGetDeviceIDs: platform = [ Intel(R) OpenCL HD Graphics ], device_type = CL_DEVICE_TYPE_GPU (4)
<<<< clGetDeviceIDs -> CL_SUCCESS
clCreateContext: properties = [ NULL ], num_devices = 1, devices = [ Intel(R) Gen9 HD Graphics NEO (CL_DEVICE_TYPE_GPU) ]
<<<< clCreateContext: returned 0x1c7ee50 -> CL_SUCCESS
Context created successfully
Segmentation fault (core dumped)

cliprof details:

sudo ./cliprof --debug --verbose /path/to/application/Application_itself

Output:
[cliprof debug] full path to executable is: /home/user/Desktop/opencl-intercept-layer/cliprof/cliprof
[cliprof debug] pProcessName is non-NULL: /cliprof
[cliprof debug] process directory is /home/user/Desktop/opencl-intercept-layer/cliprof
[cliprof debug] New LD_PRELOAD is /home/user/Desktop/opencl-intercept-layer/cliprof/../libOpenCL.so
[cliprof debug] New LD_LIBRARY_PATH is /home/user/Desktop/opencl-intercept-layer/cliprof/..
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /home/user/Desktop/opencl-intercept-layer/cliprof/../libOpenCL.so
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.1-111-g331ce28
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 331ce28
CLIntercept optional features:
cliloader(supported)
cliprof(supported)
kernel overrides(NOT supported)
ITT tracing(NOT supported)
MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Trying to load dispatch from: ./real_libOpenCL.so
Couldn't load library from: ./real_libOpenCL.so
Trying to load dispatch from: /usr/lib/x86_64-linux-gnu/libOpenCL.so
Couldn't get exported function pointer to: clSetProgramReleaseCallback
Couldn't get exported function pointer to: clSetProgramSpecializationConstant
... success!
ReportToStderr is set to a non-default value!
DevicePerformanceTiming is set to a non-default value!
Timer Started!
... loading complete.
Number of Platforms: 2
Context created successfully
Segmentation fault (core dumped)

Also, when I try to launch the application directly after setting up the environment variables, I get a segmentation fault. I don't know whether the CLIntercept_Dump directory would be created automatically; I created it manually and didn't see any dump inside it, even after enabling logging and dumping. So I am not sure how to make this work or what I should do. I have one more question: how can I use the control APIs mentioned here?

I would really appreciate it if someone could help me out with the errors.
Thank you.

Upgrade to CMake 3.1 or Newer

I'd like to move this project to a newer version of CMake shortly - most likely CMake 3.1, but possibly CMake 3.4.3. I am not planning to move to anything newer than that at this time.

CMake 3.1 is still fairly old, but it is newer than the CMake version provided by Ubuntu 14.04. So, Ubuntu 14.04 users will need to upgrade manually to a newer version of CMake, if they haven't already.

I believe that moving to the newer version of CMake is worth losing out-of-the-box support for Ubuntu 14.04, but if this will cause problems for Ubuntu 14.04 users or any other operating systems, please let me know via this issue. Thanks!

unable to dump ISA with cliloader

hi,

I tried following https://github.com/intel/opencl-intercept-layer/blob/master/docs/cliloader.md, but ran into issues. Could you please help? Thanks. My system is Ubuntu 18.04.

Here are my steps

$ git clone https://github.com/intel/opencl-intercept-layer.git
$ cd opencl-intercept-layer/

$ mkdir build
$ cd build
$ cmake -DENABLE_CLILOADER=1 ..
$ make

// then try to dump
$ export DumpKernelISABinaries=1
$ ./cliloader/cliloader clinfo
// only an empty file is generated
$ ls ~/CLIntercept_Dump/clinfo/clintercept_report.txt -s
0 /home/yguo18/CLIntercept_Dump/clinfo/clintercept_report.txt

// i tried with other program, still no ISA dumped.
// i'm sure that OpenCL kernels are executed in this program.
$ /work/gfx/opencl-intercept-layer/build/cliloader/cliloader ./ffmpeg ...
$ cat ~/CLIntercept_Dump/ffmpeg/clintercept_report.txt 
Total Enqueues: 72

thanks

Inconsistency in HostPerformanceTiming and DevicePerformanceTiming Reports

Observed Behavior

The HostPerformanceTiming report currently consists of a section for time, and another different section for ticks. This is different than DevicePerformanceTiming, which has a single section for time.

Desired Behavior

It would be nice if the HostPerformanceTiming report were more consistent with the DevicePerformanceTiming report.

Steps to Reproduce

Simply set HostPerformanceTiming and DevicePerformanceTiming and observe the difference in report formats.

aubcapture

Observed Behavior

No aub file found.

Desired Behavior

aub file is expected when AubCapture is set to 1.

Steps to Reproduce

set AubCapture to 1
run apps

  • Any Intercept Layer for OpenCL Applications controls you are setting.
  • The application you ran. For large or proprietary applications, it will likely be easier to address your issue if you include a simpler reproducer.
  • The OpenCL implementation and device you tested on.
  • Your operating system, and any other relevant system information.

(Some of this information may automatically be included if you attach your Intercept Layer for OpenCL Applications log, particularly if you set CLInfoLogging.)

CLIProf Support for ISA Kernel Binaries

Observed Behavior

Dumping ISA kernel binaries currently requires "manual" setup via environment variables or registry keys. The cliconfig app can help with the manual setup, but only on Windows.

Desired Behavior

A cliprof option to simplify collection of ISA kernel binaries. The cliprof option would set up the appropriate environment variable to dump ISA kernel binaries without any manual setup.

Invalid Params Crash

Observed Behavior

In our unit testing we have a lot of tests that try to weed out bad parameters. When run through the intercept layer, these bad parameters can crash the application inside the intercept layer itself.

clCreateProgramWithSource with bad params crashes on these lines.

// intercept.cpp
size_t length = 0;
if( ( lengths == NULL ) ||
  ( lengths[i] == 0 ) )
{  
  length = strlen( strings[i] );
}

clCreateSubBuffer with junk crashes on pRegion->origin

// intercept.cpp
{
  cl_buffer_region* pRegion = (cl_buffer_region*)createInfo;
  ss << "origin = "
  << pRegion->origin
  << " size = "
  << pRegion->size;
}

With clGetExtensionFunctionAddress*, a nullptr gets passed as func_name:

void* CLIntercept::getExtensionFunctionAddress(cl_platform_id platform, const std::string& func_name);

Desired Behavior

Not crashing.
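
One possible way to harden the logging paths quoted above is to check for NULL before dereferencing. The sketch below is an assumption about how that could look, not the project's fix: safeSourceLength and safeFuncName are hypothetical helpers. A NULL check cannot catch arbitrary junk pointers (as in the clCreateSubBuffer case); it only covers the NULL inputs.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length of source string i, tolerating a NULL `strings` array, NULL entries,
// and a NULL `lengths` array. A zero length means "null-terminated string".
std::size_t safeSourceLength(const char* const* strings,
                             const std::size_t* lengths,
                             std::size_t i) {
    if (strings == nullptr || strings[i] == nullptr) return 0;
    if (lengths == nullptr || lengths[i] == 0) return std::strlen(strings[i]);
    return lengths[i];
}

// Convert a possibly-NULL C string to std::string. Constructing a
// std::string directly from a null pointer is undefined behavior.
std::string safeFuncName(const char* funcName) {
    return funcName ? std::string(funcName) : std::string();
}
```

With helpers like these, the layer would skip logging the bad argument and still forward the call, so the real implementation can return its own error code.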

Steps to Reproduce

  • Passing junk values to the above entry points.
  • Internal OCL

DevicePerfCounterTiming changes the behaviour of clGetEventProfilingInfo

Observed Behavior

I have a very simple OpenCL application that reports wrong kernel profiling timings if executed using the intercept layer.

Desired Behavior

The intercept layer shouldn't alter the query results of clGetEventProfilingInfo.

Steps to Reproduce

  • Build this application, which is a single file that uses OpenCL to do some useless work: https://gist.github.com/mcleary/5915b184ada922d6739710d1ad54e575
  • Build the Intercept Layer with MDAPI support.
  • Use the following controls
    • CLI_DevicePerfCounterCustom = ComputeBasic
    • CLI_DevicePerfCounterTiming = 1
    • CLI_CLInfoLogging = 1
  • Run the application
  • Verify the output results
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: C:\Users\Thales\source\repos\SimpleOpenCLApp\x64\Release\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.1-111-g331ce28
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 331ce28ff998ea83620317aa1dfc64b0c06f6587
CLIntercept optional features:
    cliloader(NOT supported)
    cliprof(NOT supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library from: real_opencl.dll
Trying to load dispatch from: C:\WINDOWS/syswow64/opencl.dll
Couldn't load library from: C:\WINDOWS/syswow64/opencl.dll
Trying to load dispatch from: C:\WINDOWS/system32/opencl.dll
... success!
CLInfoLogging is set to a non-default value!
DevicePerfCounterCustom is set to a non-default value!
DevicePerfCounterTiming is set to a non-default value!
Metric Discovery initialized.
Timer Started!
... loading complete.

Enumerated 2 platforms.

Platform 0:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.0
        Profile:        FULL_PROFILE
        Extensions:
                cl_intel_dx9_media_sharing
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_d3d11_sharing
                cl_khr_depth_images
                cl_khr_dx9_media_sharing
                cl_khr_gl_sharing
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_spir
                14 Platform Extensions Found
        Platform has 2 devices.

Device 0:
        Name:           Intel(R) HD Graphics 530
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.0
        Driver Version: 20.19.15.4352
        Type:           CL_DEVICE_TYPE_GPU
        Extensions:
                cl_intel_accelerator
                cl_intel_advanced_motion_estimation
                cl_intel_ctz
                cl_intel_d3d11_nv12_media_sharing
                cl_intel_dx9_media_sharing
                cl_intel_motion_estimation
                cl_intel_simultaneous_sharing
                cl_intel_subgroups
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_khr_depth_images
                cl_khr_dx9_media_sharing
                cl_khr_fp16
                cl_khr_gl_depth_images
                cl_khr_gl_event
                cl_khr_gl_msaa_sharing
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_gl_sharing
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_khr_spir
                28 Device Extensions Found

Device 1:
        Name:           Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.0 (Build 10094)
        Driver Version: 5.2.0.10094
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_intel_exec_by_local_thread
                cl_khr_spir
                cl_khr_dx9_media_sharing
                cl_intel_dx9_media_sharing
                cl_khr_d3d11_sharing
                cl_khr_gl_sharing
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                16 Device Extensions Found

Platform 1:
        Name:           AMD Accelerated Parallel Processing
        Vendor:         Advanced Micro Devices, Inc.
        Driver Version: OpenCL 2.1 AMD-APP (2766.5)
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_icd
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_khr_dx9_media_sharing
                cl_amd_event_callback
                cl_amd_offline_devices
                6 Platform Extensions Found
        Platform has 1 device.

Device 0:
        Name:           Baffin
        Vendor:         Advanced Micro Devices, Inc.
        Version:        OpenCL 2.0 AMD-APP (2766.5)
        Driver Version: 2766.5
        Type:           CL_DEVICE_TYPE_GPU
        Extensions:
                cl_khr_fp64
                cl_amd_fp64
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_fp16
                cl_khr_gl_sharing
                cl_khr_gl_depth_images
                cl_amd_device_attribute_query
                cl_amd_vec3
                cl_amd_printf
                cl_amd_media_ops
                cl_amd_media_ops2
                cl_amd_popcnt
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_khr_dx9_media_sharing
                cl_khr_image2d_from_buffer
                cl_khr_spir
                cl_khr_subgroups
                cl_khr_gl_event
                cl_khr_depth_images
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_amd_liquid_flash
                cl_amd_planar_yuv
                31 Device Extensions Found

[0]: Intel(R) OpenCL
        [0]: Intel(R) HD Graphics 530
        [1]: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
[1]: AMD Accelerated Parallel Processing
        [0]: Baffin

Running on Intel(R) HD Graphics 530
Initializing ... done
Copying ... done
Queued : 19136003988800
Submit : 19136003996383
Start  : 19136005131628
End    : 19136005131628

Queued : 19136009502800
Submit : 19136009511549
Start  : 19136009679799
End    : 19136009679799

Queued : 19136014366200
Submit : 19136014375283
Start  : 19136014543532
End    : 19136014543532

Queued : 19136019566000
Submit : 19136019574499
Start  : 19136019741749
End    : 19136019741749

Queued : 19136024386900
Submit : 19136024394733
Start  : 19136024560482
End    : 19136024560482

Copying back ...done

CLIntercept is shutting down...
... shutdown complete.
  • Operating system
    • Windows 10

Running the application without the intercept layer returns numbers that make sense:

[0]: Intel(R) OpenCL
        [0]: Intel(R) HD Graphics 530
        [1]: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
[1]: AMD Accelerated Parallel Processing
        [0]: Baffin

Running on Intel(R) HD Graphics 530
Initializing ... done
Copying ... done
Queued : 19305726216800
Submit : 19305726221799
Start  : 19305727164462
End    : 19305727314962

Queued : 19305732613900
Submit : 19305732618566
Start  : 19305732747232
End    : 19305732897648

Queued : 19305737749500
Submit : 19305737754916
Start  : 19305737886499
End    : 19305738037498

Queued : 19305743947600
Submit : 19305743952183
Start  : 19305744083182
End    : 19305744233265

Queued : 19305750804600
Submit : 19305750809433
Start  : 19305750984182
End    : 19305751134098

Copying back ...done

check for elevated privileges or paranoid mode when MDAPI perf counters are enabled

Observed Behavior

When MDAPI perf counters are enabled but the application is not executed with elevated privileges or with paranoid mode enabled, the intercept layer generates error messages saying that perf counters could not be collected, but it is not clear why the errors are occurring.

Desired Behavior

Detect and generate a more meaningful error message ("please enable paranoid mode or run with elevated privileges") when MDAPI perf counters are enabled.
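
A sketch of such a check follows, under stated assumptions: it targets Linux with the i915 driver, where the setting is assumed to be exposed as the sysctl dev.i915.perf_stream_paranoid (0 lifts the restriction). The function names and the message text are hypothetical, and the caller is expected to determine whether the process is elevated.

```cpp
#include <fstream>
#include <string>

// Assumption: Linux i915 exposes the setting at this path.
// Returns the value read, or -1 if the file is missing or unreadable.
int readPerfStreamParanoid(
    const char* path = "/proc/sys/dev/i915/perf_stream_paranoid") {
    std::ifstream f(path);
    int value = -1;
    if (!(f >> value)) return -1;
    return value;
}

// Returns an empty string when perf counters should be accessible,
// otherwise a hint explaining the likely cause.
std::string perfCounterAccessHint(bool elevated, int paranoid) {
    if (elevated || paranoid == 0) return "";
    return "MDAPI perf counters may be unavailable: run with elevated "
           "privileges or set dev.i915.perf_stream_paranoid=0";
}
```

The hint could be printed once at startup when MDAPI is enabled, instead of a generic error for each failed collection.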

Steps to Reproduce

Execute an app with MDAPI enabled (cliloader --mdapi-ebs) but without elevated privileges or enabling paranoid mode.

Unable to collect MD metrics from given adapter (>0) in multi adapter scenario.

Observed Behavior

Unable to collect MD metrics from given adapter in multi adapter scenario.

Desired Behavior

Collecting MD metrics from any adapter.

Steps to Reproduce

  • Use multi adapter environment
  • Use application that works on subsequent adapter
  • MD will not be able to collect metrics, since it collects metrics from the first adapter

Hook for `clLinkProgram` options.

Observed Behavior

No option seems to be present to dump the clLinkProgram options. Apologies if one exists and I missed it.

Desired Behavior

It would be nice to have a hook to dump the clLinkProgram options. I have a program that hangs at link time, and being able to get the link options without parsing the log would be a big plus.

Add Error String to Call Logging Exit

Observed Behavior

CallLogging currently doesn't log any error information. If ErrorLogging is enabled, it will log when an error occurs, but on a separate line:

>>>> clSetKernelArg( GenerateJuliaSet ): kernel = 0717BDA8, index = 4, size = 4
ERROR! clSetKernelArg returned CL_INVALID_ARG_INDEX (-49)
<<<< clSetKernelArg

Desired Behavior

Consider logging the error information as part of CallLogging when the function exits. Something like:

>>>> clSetKernelArg( GenerateJuliaSet ): kernel = 0717BDA8, index = 4, size = 4
ERROR! clSetKernelArg returned CL_INVALID_ARG_INDEX (-49)
<<<< clSetKernelArg returned CL_INVALID_ARG_INDEX

It could be useful to print the error information even if there were no errors, for example:

>>>> clSetKernelArg( GenerateJuliaSet ): kernel = 0717BDA8, index = 3, size = 4
<<<< clSetKernelArg returned CL_SUCCESS
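
The proposed exit line could be produced by a small formatting helper. This is a sketch with hypothetical names (errorName, formatCallExit); it only names the two error codes that appear in the excerpts above and falls back to the numeric value for everything else.

```cpp
#include <string>

// Only the two codes shown in the log excerpts are named here;
// any other code is printed as a number.
std::string errorName(int errorCode) {
    switch (errorCode) {
    case 0:   return "CL_SUCCESS";
    case -49: return "CL_INVALID_ARG_INDEX";
    default:  return std::to_string(errorCode);
    }
}

// Build the call-exit line in the proposed "<<<< name returned CODE" form.
std::string formatCallExit(const std::string& functionName, int errorCode) {
    return "<<<< " + functionName + " returned " + errorName(errorCode);
}
```

For example, formatCallExit("clSetKernelArg", -49) yields the second desired line shown above.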

Steps to Reproduce

Set CallLogging and ErrorLogging and run a program that generates an error.

support injecting link options

Observed Behavior

We can currently inject build options and compile options but not link options.

Desired Behavior

Support injecting link options also.

Note that this is a little trickier (or at least different) than injecting build and compile options because we don't have a program object yet when we need to check for link options to inject.

Steps to Reproduce

  1. Run a program that compiles and links a program separately and provides link options.
  2. Dump link options.
  3. Copy the link options to the Inject folder and enable program injection.
  4. No link options are injected.

In API Trace Syntactic Consistency

Observed Behavior

For example, clGetPlatformIDs omits its arguments and uses a , separator instead of the : used by all other API calls:

>>>> clGetPlatformIDs, EnqueueCounter: 1
                     ^ uses comma
Host Time for call 1: clGetPlatformIDs = 1114623933
<<<< clGetPlatformIDs
....
>>>> clGetDeviceIDs: platform = [ Intel(R) OpenCL ], device_type = CL_DEVICE_TYPE_GPU (4), EnqueueCounter: 1
Host Time for call 3: clGetDeviceIDs = 292
<<<< clGetDeviceIDs

Probably the , belongs to the EnqueueCounter suffix.

  1. Another way to view this is that clGetPlatformIDs omits its arguments.
  2. Perhaps we should change the API trace to a more consistent syntax so that it's more consumable by tools? See below for a suggestion.

Desired Behavior

Perhaps we could use a syntax like the following.

<API-CALL> ::= <API-CALLING> <API-BODY> <API-CALL-RETURN>
<API-BODY> ::= anything but '<<<<'
<API-CALLING> ::= '>>>>' <THREAD-ID-ETC>? <API-NAME> ':' <ARG-LIST> <EXT-INFO>
<API-CALL-RETURN> ::= '<<<<' <THREAD-ID-ETC>? <API-NAME> ' returned ' <RETURN-VALUE> (':' <CALL-RETURN-VALUES>)?
<ARG-LIST> ::= <empty-string> | <ARG-NAME> ' = ' <ARG-VALUE> (',' <ARG-NAME> ' = ' <ARG-VALUE>)*
<CALL-RETURN-VALUES> ::= <ARG-LIST> // values returned via pointer or return value
<EXT-INFO> ::= <empty-string> | ';' <OTHER-STUFF-LIKE-ENQ-COUNTER>
<ARG-VALUE> ::= <ARG-SCALAR> | <ARG-PTR>
<ARG-SCALAR> ::= <HEX-VAL> | '{' <HEX-VAL> (',' <HEX-VAL>)* '}' // structs for things like float4's
<ARG-PTR> ::= '[' <HEX-VAL> ']' // means the value returned is indirect

This would give us.

>>>> clGetPlatformIDs: num_entries = 10, platforms = [0x12345...], num_platforms = [0x22345...]; EnqueueCounter:1
....
<<<< clGetPlatformIDs returned CL_SUCCESS: platforms[] = {...platform_id's...}, *num_platforms = 2
// note the return by ptr values are decoded to a minimal extent using API domain specific knowledge (e.g. num_platforms is a single int)

Note, I am not tied to the exact syntax, but am just proposing something consistent for tool consumability. Probably there are a lot of improvements we could apply to the above.
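
To illustrate the tool-consumability argument, here is a rough sketch of how a tool might parse a call-return line in the proposed form. It is not part of the proposal: CallReturn and parseCallReturn are hypothetical names, and the optional thread-ID prefix from the grammar is deliberately not handled.

```cpp
#include <cstddef>
#include <string>

struct CallReturn {
    std::string api;     // e.g. "clGetPlatformIDs"
    std::string result;  // e.g. "CL_SUCCESS"
};

// Parse "<<<< NAME returned VALUE[: extra...]". Returns false if the line
// does not match; the optional thread-ID prefix is not supported here.
bool parseCallReturn(const std::string& line, CallReturn& out) {
    const std::string prefix = "<<<< ";
    const std::string marker = " returned ";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;
    const std::size_t m = line.find(marker, prefix.size());
    if (m == std::string::npos) return false;
    out.api = line.substr(prefix.size(), m - prefix.size());
    const std::size_t start = m + marker.size();
    const std::size_t colon = line.find(':', start);
    out.result = (colon == std::string::npos)
                     ? line.substr(start)
                     : line.substr(start, colon - start);
    return true;
}
```

Because every return line would carry the API name and the status in fixed positions, a few string searches are enough; the current format, where some calls omit their status, offers no such guarantee.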

Regarding APIs that return a pointer (e.g. clCreateProgram) and return the status via pointer, we might consider untangling them logically to always present the cl_int (status) as the "returned" value and the new object allocated (e.g. program) as a pseudo argument like new_program. Or we just leave it alone and decode the status as a regular "return by pointer" argument (but decode it). I see benefits either way.

Steps to Reproduce

Enable the API trace (with enqueue counters) and observe clintercept_report.txt calls.

MDAPI not supported on macOS

Observed Behavior

On macOS, the intercept layer compiles successfully with MDAPI functionality enabled. However, the code for Darwin operating systems is incorrect. See:

#elif defined(__linux__) || defined(__APPLE__)
static const char* cMDLibFileName = "libmd.so";

Although there's some kind of MDAPI implementation on macOS, it has a different library name:

objdump -t /System/Library/Extensions/AppleIntelKBLGraphicsGLDriver.bundle/Contents/MacOS/libigdmd.dylib

/System/Library/Extensions/AppleIntelKBLGraphicsGLDriver.bundle/Contents/MacOS/libigdmd.dylib:	file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000005614542 l    d  *UND*	radr://5614542
0000000000089760 g     F __TEXT,__text	_CloseMetricsDevice
00000000002355b0 g     F __TEXT,__text	_ClosePerformanceInterface
0000000000089e40 g     F __TEXT,__text	_CloseTraceDevice
000000000008a000 g     F __TEXT,__text	_CreateMDAPIObjectFactory

One could simply change the library name to the correct one, but then we run into another kind of problem:

cl_int errorCode = dispatch().clGetEventProfilingInfo(
event,
CL_PROFILING_COMMAND_PERFCOUNTERS_INTEL,
reportSize,
pReport,
&outputSize );

CL_PROFILING_COMMAND_PERFCOUNTERS_INTEL is not supported by Apple's implementation (in fact, it never will be, as OpenCL is now deprecated on macOS).

Proposed solution

Disable MDAPI on macOS for good and document this functionality as unsupported there to avoid confusion.

Without support from the OpenCL runtime, I don't know of any way to retrieve profiling data for a specific kernel. If there is one, it would be great to know.

Include Program Build Options as Part of the Hash

Observed Behavior

Currently the hashes used in DevicePerformanceTimeHashTracking do not include the program build options in their calculation. This makes it impossible to distinguish between programs that differ only in build options. Many OpenCL projects heavily (ab)use the C preprocessor (and thus build options), and different options can radically change program behavior.

Two examples:

#ifdef USE_IMAGES
   pix = read_imageui(...);
#else // use buffers
   pix = buffer[...];
#endif

Another example is programs that hardcode the workgroup size for things like loop unrolling.

 local tile[TILE_SIZE];
 for (int i = 0; i < TILE_SIZE; i++)
     ...

Presumably we want the program to be distinguished based on the TILE_SIZE chosen.

Desired Behavior

This issue proposes we simply include the build options as part of the hash. In the intercept log, the hash will be part of the same .cl file dumped and the user can track down the options the program was built with as needed.
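
A minimal sketch of the idea follows. The hash function the Intercept Layer actually uses is not shown in this issue, so 64-bit FNV-1a stands in for it here, and fnv1a and hashProgram are hypothetical names. The point is only that the options are fed into the same hash as the source, with a separator between the two.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a over a byte string, continuing from `hash`.
// FNV-1a is a stand-in; the layer's real hash function may differ.
std::uint64_t fnv1a(const std::string& bytes,
                    std::uint64_t hash = 14695981039346656037ULL) {
    for (unsigned char c : bytes) {
        hash ^= c;
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hash the program source together with its build options.
std::uint64_t hashProgram(const std::string& source,
                          const std::string& options) {
    std::uint64_t h = fnv1a(source);
    // A NUL separator keeps the source/options boundary part of the hash.
    h = fnv1a(std::string(1, '\0'), h);
    return fnv1a(options, h);
}
```

With this scheme, the same kernel source built with -DTILE_SIZE=8 and with -DTILE_SIZE=4 produces two different hashes, so the two builds show up as separate entries in the timing report.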

Steps to Reproduce

Enable DevicePerformanceTimeHashTracking and DevicePerformanceTiming, build two programs using the same input and different build options, and invoke kernels from both programs. In the DevicePerformance output file (clintercept_report.txt), note that the hash part for each program does not differ (the program and build counts still differ).

"too long" instead of event_list in clWaitForEvents

Observed Behavior

>>>> TID = 4596 clRetainEvent: [ ref count = 1 ] event = 00000000052AA270
<<<< TID = 4596 clRetainEvent: [ ref count = 2 ]
>>>> TID = 4596 clWaitForEvents: too long
<<<< TID = 4596 clWaitForEvents
>>>> TID = 4596 clReleaseEvent: [ ref count = 2 ] event = 00000000052AAE10
<<<< TID = 4596 clReleaseEvent: [ ref count = 1 ]

Desired Behavior

>>>> TID = 4596 clRetainEvent: [ ref count = 1 ] event = 00000000052AA270
<<<< TID = 4596 clRetainEvent: [ ref count = 2 ]
>>>> TID = 4596 clWaitForEvents: event_list = (size = 500 <# or more #>) [ 00000000052AA270, 00000000052AA271, ... ]
<<<< TID = 4596 clWaitForEvents
>>>> TID = 4596 clReleaseEvent: [ ref count = 2 ] event = 00000000052AAE10
<<<< TID = 4596 clReleaseEvent: [ ref count = 1 ]
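
One way to get the desired output is to truncate a long list rather than replace it entirely. The Python sketch below shows the idea. `format_event_list` and its `max_shown` limit are hypothetical names chosen for illustration, not part of the Intercept Layer.

```python
def format_event_list(events, max_shown=8):
    """Format event handles as 'event_list = (size = N) [ a, b, ... ]'.

    Hypothetical helper: prints at most max_shown handles, then an ellipsis,
    rather than dropping the whole list when it is long.
    """
    shown = [f"{e:016X}" for e in events[:max_shown]]
    if len(events) > max_shown:
        shown.append("...")
    return f"event_list = (size = {len(events)}) [ {', '.join(shown)} ]"


events = [0x52AA270 + i for i in range(500)]
line = format_event_list(events, max_shown=2)
print(line)
```

The size is always reported in full, so the reader can still see how many events were passed even when only the first few handles are printed.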

Steps to Reproduce

To reproduce the issue, please use the GITS stream from mwalkowi\clIntercept-repros (on the Intel internal IGK share; please contact me for details). I have also put the full log and the registry keys I used (.reg file) there.

To run it you need gitsPlayer from mwalkowi\gits (from the share). I am using version 186 combined with the fix from mwalkowi\gits\hotfix-gits-169 (to support device-only stuff correctly).

The stream was run on SKL GT4e. clIntercept was built today:

CLIntercept (64-bit) is loading...
CLintercept file location: C:\Windows\SYSTEM32\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2-55-g6a016b9
CLIntercept git refspec: refs/heads/embargo
CLInterecpt git hash: 6a016b9c6a7d13e14e8b620dbf3cc99f5ee311bb
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT

API trace clCreateBuffer omits context argument

Observed Behavior

For example:

>>>> clCreateBuffer: flags = CL_MEM_READ_WRITE (1), size = 8192, host_ptr = 0000000000000000, EnqueueCounter: 1
<<<< clCreateBuffer: returned 0000029F58C24880

But clCreateBuffer takes a context argument as well.

Desired Behavior

We'd like to see the pointer value.

>>>> clCreateBuffer: context = 0000029F50FAD910, flags = CL_MEM_READ_WRITE (1), size = 8192, host_ptr = 0000000000000000, EnqueueCounter: 1
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
<<<< clCreateBuffer: returned 0000029F58C24880

Steps to Reproduce

Enable call logging on a program that calls clCreateBuffer.

Feature request: Replay specific extracted enqueued OpenCL kernel(s) (ranges)

Introduction

For some scenarios where you don't have access to a program's sources, it can be very convenient to be able to replay part of the program's execution and compare it between platforms/devices.

It is already possible to determine where there is divergence between two runs of the same program on different platforms/devices, by extracting the buffers after each kernel launch via the CLI and comparing them (e.g. via a hash).

The next step would be to specifically look at this kernel and to run it in isolation to further debug any issues (such as code-gen), by comparing the results of running the kernel on different platforms/devices.

Currently, this is a manual process, from extracting the individual parts (kernel source, input buffers, input arguments etc) to writing a small C, C++ or Python program to be compiled and executed.

It should be possible to fully automate this process, since all the information is already extracted by the CLI and just has to be put together properly.
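
The hash-based buffer comparison mentioned above can be sketched in a few lines of Python. This is only an illustration: it assumes each run's buffers were dumped into its own directory and that file names sort in enqueue order, and `hash_dumps` and `first_divergence` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def hash_dumps(dump_dir):
    """Map each dumped file name to the SHA-256 digest of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(dump_dir).iterdir())
        if path.is_file()
    }


def first_divergence(dir_a, dir_b):
    """Return the first file name (in sorted order) whose contents differ.

    Returns None when all files present in both directories match.
    Assumes file names sort in enqueue order.
    """
    hashes_a, hashes_b = hash_dumps(dir_a), hash_dumps(dir_b)
    for name in sorted(hashes_a.keys() & hashes_b.keys()):
        if hashes_a[name] != hashes_b[name]:
            return name
    return None
```

The file returned by `first_divergence` points at the kernel enqueue that would be the natural candidate for a generated replay program.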

Loose requirements

  • Be able to specify an enqueued kernel number for which a standalone program (C or C++ source file + CMakeLists.txt, or Python script) with the input buffers and arguments should be generated
  • Possibly extend to a continuous range of kernels
  • All the files should be put into a separate folder for convenient sharing between different systems

Potential design choices

  • C/C++/Python for the replay-program
  • Generate replay program sources directly in the CLI, or just generate some meta-data and have a Python program do this

Visible change to the CLI

  • Add a control which allows the user to specify an enqueued kernel number for which a replay program should be generated

I'm more than willing to do the work for this one and to make a pull request, but if this would be a 10 min job for you then I won't stop you :)

Local install for Linux

Could you please add a section about how a user can install the tool without root access to the real ICD?

Thanks

Race condition in Chrome Tracing

Observed Behavior

When using cliloader to capture a Chrome trace JSON file, I observe a potential race condition in the JSON output. Multiple lines of output are interleaved, resulting in an invalid JSON file.

Example lines:

...
{"ph":"X", "pid":74961, "tid":-140266891780736, "name":"iclEnqueueMapBuffer", "ts":218627, "dur":0},
{"ph":"X", "pid":74961, "tid":1998309, "name":"iclReleaseKernel", "ts":{"ph":"X", "pid":21865274961, "tid":1998272, "name":"},
EnqueueMapBuffer, "", "ts":218682, "dur":16},
{"ph":"X", "pid":74961, "tid":-140266891780736, "name":"iclEnqueueMapBuffer", "ts":218682, "dur":0},
...

Desired Behavior

Expecting a correct JSON file.
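
A common way to avoid this kind of interleaving is to build each record completely and then emit it with a single write while holding a lock. The Python sketch below illustrates that pattern only. `TraceWriter` is a hypothetical class, and nothing here is taken from the Intercept Layer's actual C++ implementation.

```python
import io
import json
import threading


class TraceWriter:
    """Writes one complete JSON record per line, guarded by a lock."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_event(self, name, pid, tid, ts, dur):
        record = {"ph": "X", "pid": pid, "tid": tid,
                  "name": name, "ts": ts, "dur": dur}
        # Build the full line first, then emit it in one locked write so
        # records from different threads cannot interleave.
        line = json.dumps(record) + ",\n"
        with self._lock:
            self._stream.write(line)


def worker(writer, tid):
    for i in range(100):
        writer.write_event("clEnqueueMapBuffer", 74961, tid, i, 0)


stream = io.StringIO()
writer = TraceWriter(stream)
threads = [threading.Thread(target=worker, args=(writer, t)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = stream.getvalue().splitlines()
print(len(lines))  # 400
```

Every line produced this way parses as a standalone JSON object once the trailing comma is removed, regardless of how the threads are scheduled.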

Steps to Reproduce

clinfo program args... with the controls as defined in clIntercept.conf stated below.

Platform: Mac OSX Catalina 10.15.4 / 2015 15" MBP
OpenCL device: CPU (OSX OpenCL framework)

clIntercept.conf:

ChromeCallLogging=1
ChromePerformanceTiming=1
ChromePerformanceTimingInStages=0 # bug is prevalent with this setting on or off
CLInfoLogging=1

stdout

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /Users/ncgalopp/code/opencl-intercept-layer/build/intercept/libOpenCL.1.2.dylib
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-11-g77f84bd
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 77f84bd3a84b500de78639479c47a17b333279dc
CLIntercept optional features:
    cliloader(supported)
    cliprof(supported)
    kernel overrides(NOT supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Dispatch table initialized.
Control ChromeCallLogging is set to non-default value: true
Control CLInfoLogging is set to non-default value: true
Control ReportToStderr is set to non-default value: true
Control ChromePerformanceTiming is set to non-default value: true
Timer Started!
... loading complete.

[application stdout omitted]

Enumerated 1 platform.

Platform 0:
	Name:           Apple
	Vendor:         Apple
	Driver Version: OpenCL 1.2 (Feb 29 2020 00:40:07)
	Profile:        FULL_PROFILE
	Extensions:
		cl_APPLE_SetMemObjectDestructor
		cl_APPLE_ContextLoggingFunctions
		cl_APPLE_clut
		cl_APPLE_query_kernel_names
		cl_APPLE_gl_sharing
		cl_khr_gl_event
		6 Platform Extensions Found
	Platform has 3 devices.

Device 0:
	Name:           Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
	Vendor:         Intel
	Version:        OpenCL 1.2
	Driver Version: 1.1
	Type:           CL_DEVICE_TYPE_CPU
	Extensions:
		cl_APPLE_SetMemObjectDestructor
		cl_APPLE_ContextLoggingFunctions
		cl_APPLE_clut
		cl_APPLE_query_kernel_names
		cl_APPLE_gl_sharing
		cl_khr_gl_event
		cl_khr_fp64
		cl_khr_global_int32_base_atomics
		cl_khr_global_int32_extended_atomics
		cl_khr_local_int32_base_atomics
		cl_khr_local_int32_extended_atomics
		cl_khr_byte_addressable_store
		cl_khr_int64_base_atomics
		cl_khr_int64_extended_atomics
		cl_khr_3d_image_writes
		cl_khr_image2d_from_buffer
		cl_APPLE_fp64_basic_ops
		cl_APPLE_fixed_alpha_channel_orders
		cl_APPLE_biased_fixed_point_image_formats
		cl_APPLE_command_queue_priority
		20 Device Extensions Found

Device 1:
	Name:           Intel(R) UHD Graphics 630
	Vendor:         Intel Inc.
	Version:        OpenCL 1.2
	Driver Version: 1.2(Mar 15 2020 21:29:48)
	Type:           CL_DEVICE_TYPE_GPU
	Extensions:
		cl_APPLE_SetMemObjectDestructor
		cl_APPLE_ContextLoggingFunctions
		cl_APPLE_clut
		cl_APPLE_query_kernel_names
		cl_APPLE_gl_sharing
		cl_khr_gl_event
		cl_khr_global_int32_base_atomics
		cl_khr_global_int32_extended_atomics
		cl_khr_local_int32_base_atomics
		cl_khr_local_int32_extended_atomics
		cl_khr_byte_addressable_store
		cl_khr_image2d_from_buffer
		cl_khr_gl_depth_images
		cl_khr_depth_images
		cl_khr_3d_image_writes
		15 Device Extensions Found

Device 2:
	Name:           AMD Radeon Pro 560X Compute Engine
	Vendor:         AMD
	Version:        OpenCL 1.2
	Driver Version: 1.2 (Mar  5 2020 22:38:07)
	Type:           CL_DEVICE_TYPE_GPU
	Extensions:
		cl_APPLE_SetMemObjectDestructor
		cl_APPLE_ContextLoggingFunctions
		cl_APPLE_clut
		cl_APPLE_query_kernel_names
		cl_APPLE_gl_sharing
		cl_khr_gl_event
		cl_khr_global_int32_base_atomics
		cl_khr_global_int32_extended_atomics
		cl_khr_local_int32_base_atomics
		cl_khr_local_int32_extended_atomics
		cl_khr_byte_addressable_store
		cl_khr_image2d_from_buffer
		cl_khr_depth_images
		cl_APPLE_command_queue_priority
		cl_APPLE_command_queue_select_compute_units
		cl_khr_fp64
		16 Device Extensions Found

[application stdout omitted]

Total Enqueues: 3478

CLIntercept is shutting down...
... shutdown complete.

Enable ARM/Linux builds

Observed Behavior

Builds fail on ARM/Linux due to binary kernel dependencies

Desired Behavior

Builds succeed on ARM/Linux

Steps to Reproduce

Run cmake/make on an ARM/Linux platform

Patch

Here is the proposed patch:
arm_linux.txt

add decoding for CL_SEMAPHORE_EXPORT_HANDLE_TYPES

Observed Behavior

There is currently no support for decoding CL_SEMAPHORE_EXPORT_HANDLE_TYPES when creating a semaphore for export.

Desired Behavior

Best case: decode the handle types for CL_SEMAPHORE_EXPORT_HANDLE_TYPES. At minimum, properly skip this property, since it is a list of variable length.
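
To make the "variable length" point concrete, here is a Python sketch of walking such a property list. It assumes the handle-type list is terminated by 0 and that the outer list is also 0-terminated. The numeric constants are placeholders for illustration only (the real values are in the OpenCL extension headers), and `decode_semaphore_properties` is a hypothetical helper.

```python
# Placeholder values for illustration only; real values come from cl_ext.h.
CL_SEMAPHORE_TYPE = 0x1001
CL_SEMAPHORE_EXPORT_HANDLE_TYPES = 0x1002
LIST_END = 0  # assumed terminator for the handle-type list


def decode_semaphore_properties(props):
    """Decode a zero-terminated property list into (name, value) pairs.

    Most properties are a (key, value) pair, but the export-handle-types
    property is followed by a list that runs until its own terminator.
    """
    decoded = []
    i = 0
    while i < len(props) and props[i] != 0:
        key = props[i]
        i += 1
        if key == CL_SEMAPHORE_EXPORT_HANDLE_TYPES:
            handle_types = []
            while i < len(props) and props[i] != LIST_END:
                handle_types.append(props[i])
                i += 1
            i += 1  # step over the list terminator
            decoded.append(("CL_SEMAPHORE_EXPORT_HANDLE_TYPES", handle_types))
        else:
            decoded.append((hex(key), props[i]))
            i += 1
    return decoded
```

Treating this property as an ordinary key/value pair would leave the cursor in the middle of the handle-type list, so every property after it would be misread.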

Steps to Reproduce

Create a semaphore with the CL_SEMAPHORE_EXPORT_HANDLE_TYPES property with CallLogging enabled.

How to use DevicePerfCounterTimeBasedSampling?

Observed Behavior

Profiling my application with CLI_DevicePerfCounterTimeBasedSampling=1 gives me the message E:[MDAPI]: enable timer mode escape failed MDAPI Helper: OpenIoStream failed 42

Desired Behavior

I would like to be able to sample the performance counters of my Intel GPU when running my OpenCL application.

Steps to Reproduce

I am just running a small OpenCL application with the intercept layer. Is this because my device doesn't support this feature?

Here are the values for all controls I am using:

"CLI_ChromeCallLogging": 0,
"CLI_DevicePerformanceTimelineLogging": 1,
"CLI_DevicePerformanceTimeLogging": 1,
"CLI_DevicePerformanceTiming ": 1,
"CLI_DevicePerfCounterTimeBasedSampling": 1,
"CLI_CLInfoLogging": 1,
"CLI_DevicePerfCounterEventBasedSampling": 0,
"CLI_ReportToFile": 1,
"CLI_EventCallbackLogging": 1,
"CLI_HostPerformanceTiming": 1,
"CLI_DevicePerfCounterTiming": 1,
"CLI_DevicePerfCounterCustom": "ComputeExtended"
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: C:\Users\Thales\Desktop\syclone\out\build\x64-Debug\test\system\tiled_convolution\Debug\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-7-g08e6ba8
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 08e6ba842d66643edc8b06f119316f732c28b604
CLIntercept optional features:
    cliloader(NOT supported)
    cliprof(NOT supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library: real_opencl.dll
Trying to load dispatch from: C:\Windows/syswow64/opencl.dll
Couldn't load library: C:\Windows/syswow64/opencl.dll
Trying to load dispatch from: C:\Windows/system32/opencl.dll
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
... success!
EventCallbackLogging is set to a non-default value!
CLInfoLogging is set to a non-default value!
HostPerformanceTiming is set to a non-default value!
DevicePerformanceTimeLogging is set to a non-default value!
DevicePerformanceTimelineLogging is set to a non-default value!
DevicePerfCounterTimeBasedSampling is set to a non-default value!
DevicePerfCounterCustom is set to a non-default value!
DevicePerfCounterTiming is set to a non-default value!
Metric Discovery initialized.
Timer Started!
... loading complete.
Running [tiled_convolution]. Executing on [intel:gpu].

Enumerated 2 platforms.

Platform 0:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.1 WINDOWS
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_khr_il_program
                cl_intel_unified_shared_memory_preview
                cl_intel_exec_by_local_thread
                cl_intel_vec_len_hint
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                17 Platform Extensions Found
        Platform has 1 device.

Device 0:
        Name:           Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 (Build 0)
        Driver Version: 2020.10.3.0.04
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_khr_il_program
                cl_intel_unified_shared_memory_preview
                cl_intel_exec_by_local_thread
                cl_intel_vec_len_hint
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                17 Device Extensions Found

Platform 1:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.1
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_fp64
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_spir
                11 Platform Extensions Found
        Platform has 2 devices.

Device 0:
        Name:           Intel(R) HD Graphics 520
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 NEO
        Driver Version: 26.20.100.7463
        Type:           CL_DEVICE_TYPE_GPU
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_fp16
                cl_khr_depth_images
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_intel_subgroups
                cl_intel_required_subgroup_size
                cl_intel_subgroups_short
                cl_khr_spir
                cl_intel_accelerator
                cl_intel_media_block_io
                cl_intel_driver_diagnostics
                cl_khr_priority_hints
                cl_khr_throttle_hints
                cl_khr_create_command_queue
                cl_khr_fp64
                cl_khr_subgroups
                cl_khr_il_program
                cl_intel_spirv_device_side_avc_motion_estimation
                cl_intel_spirv_media_block_io
                cl_intel_spirv_subgroups
                cl_khr_spirv_no_integer_wrap_decoration
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_intel_unified_shared_memory_preview
                cl_intel_planar_yuv
                cl_intel_packed_yuv
                cl_intel_motion_estimation
                cl_intel_device_side_avc_motion_estimation
                cl_intel_advanced_motion_estimation
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_gl_sharing
                cl_khr_gl_depth_images
                cl_khr_gl_event
                cl_khr_gl_msaa_sharing
                cl_intel_dx9_media_sharing
                cl_khr_dx9_media_sharing
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_intel_d3d11_nv12_media_sharing
                cl_intel_simultaneous_sharing
                47 Device Extensions Found

Device 1:
        Name:           Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 (Build 0)
        Driver Version: 7.6.0.0814
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_intel_exec_by_local_thread
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                cl_intel_vec_len_hint
                13 Device Extensions Found

Can not trace DeviceTimeLine line to corresponding kernel when KernelInfoTracking is enabled

Observed Behavior

When DevicePerformanceTimeKernelInfoTracking is enabled along with DevicePerformanceTimelineLogging, the timeline output contains, for example, lines like:

>>>> clEnqueueNDRangeKernel( kernel_name ): queue = <qptr>, kernel = <kernptr>, global_work_size = < gws >, local_work_size = < lws >, EnqueueCounter: <enqcounter>

Device Timeline for call <n> to kernel_name  SIMD<x> SPILL=<yy> SLM=<zz> GWS[<gws>] LWS[<lws>] = <t1> ns (queued), <t2> ns (submit), <t3> ns (start), <t4> ns (end)

To identify which clEnqueueNDRangeKernel call corresponds to a timeline line, one needs to count the calls and find the n'th call. This counting is with respect to the full key, so it should only include calls to that kernel/gws/lws combination with the same SIMD<x> SPILL=<yy> SLM=<zz> GWS[ <gws> ] LWS[ <lws> ].

However, when clEnqueueNDRangeKernel is executed, only the kernel name, GWS, and LWS are known, so the corresponding log line mentions only these three components. It is therefore not easy to identify which calls could possibly match the detailed kernel info.

Desired Behavior

The details provided in the device timeline log should be easily traceable to the corresponding API calls. Such details could, for instance, be the absolute API counter. Counting the absolute API counter is unambiguous.

Ideally, the tracing details should appear not only in the device timeline line, but also in the line describing the clEnqueueNDRange as well, as shown below.

  • Also note that the format of GWS and LWS currently differs between the two lines. Ideally, GWS/LWS should be reported using a single format anywhere they appear in the log. More generally, any unit of information should appear consistently throughout the log. This consistency aspect also relates to #28.
  • The global_work_size format in the clEnqueueNDRangeKernel line uses ',' to separate the different dimensions. This overloads the ',' character, since it also separates the other elements of the line. The 'axb' format avoids using the line-level field separator to separate sub-fields and hence is unambiguous.

>>>> clEnqueueNDRangeKernel( kernel_name ): queue = <qptr>, kernel = <kernptr>, GWS[<gws>], LWS = [lws], EnqueueCounter: <enqcounter>, APICounter: <apicounter>


Device Timeline for API counter <m> call <n> to kernel_name  SIMD<x> SPILL=<yy> SLM=<zz> GWS[ <gws> ] LWS[ <lws> ] = <t1> ns (queued), <t2> ns (submit), <t3> ns (start), <t4> ns (end)

Steps to Reproduce

  • Enable DevicePerformanceTimeKernelInfoTracking
  • Enable DevicePerformanceTimelineLogging

Can't get dumped ISA binaries using DumpKernelISABinaries

I followed this doc https://github.com/intel/opencl-intercept-layer/blob/master/docs/kernel_isa_gpu.md to dump an OpenCL kernel. However, no ***.isabin file seems to be generated. The output is shown below:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: C:\workspace\Release\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.1-145-g412f826
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 412f8264bf2d9744595c41a25f15194964f02565
CLIntercept optional features:
    cliloader(supported)
    cliprof(supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library from: real_opencl.dll
Trying to load dispatch from: C:\WINDOWS/syswow64/opencl.dll
Couldn't load library from: C:\WINDOWS/syswow64/opencl.dll
Trying to load dispatch from: C:\WINDOWS/system32/opencl.dll
... success!
BuildLogging is set to a non-default value!
CLInfoLogging is set to a non-default value!
DevicePerformanceTiming is set to a non-default value!
DumpKernelISABinaries is set to a non-default value!
Timer Started!
... loading complete.

Enumerated 1 platform.

Platform 0:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.1
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_fp64
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_spir
                11 Platform Extensions Found
        Platform has 2 devices.

Device 0:
        Name:           Intel(R) UHD Graphics 630
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 NEO
        Driver Version: 26.20.100.7000
        Type:           CL_DEVICE_TYPE_GPU
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_fp16
                cl_khr_depth_images
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_intel_subgroups
                cl_intel_required_subgroup_size
                cl_intel_subgroups_short
                cl_khr_spir
                cl_intel_accelerator
                cl_intel_media_block_io
                cl_intel_driver_diagnostics
                cl_intel_device_side_avc_motion_estimation
                cl_khr_priority_hints
                cl_khr_throttle_hints
                cl_khr_create_command_queue
                cl_khr_fp64
                cl_khr_subgroups
                cl_khr_il_program
                cl_intel_spirv_device_side_avc_motion_estimation
                cl_intel_spirv_media_block_io
                cl_intel_spirv_subgroups
                cl_khr_spirv_no_integer_wrap_decoration
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_intel_planar_yuv
                cl_intel_packed_yuv
                cl_intel_motion_estimation
                cl_intel_advanced_motion_estimation
                cl_khr_gl_sharing
                cl_khr_gl_depth_images
                cl_khr_gl_event
                cl_khr_gl_msaa_sharing
                cl_intel_dx9_media_sharing
                cl_khr_dx9_media_sharing
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_intel_d3d11_nv12_media_sharing
                cl_intel_simultaneous_sharing
                44 Device Extensions Found

Device 1:
        Name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 (Build 0)
        Driver Version: 7.6.0.0228
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_intel_exec_by_local_thread
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                cl_intel_vec_len_hint
                13 Device Extensions Found

Platforms (1):
    [0] Intel(R) OpenCL [Selected]
Devices (1; filtered by type gpu):
    [0] Intel(R) UHD Graphics 630 [Selected]
Build Info for program 000001CF90D41490 (0000_1E103682_0000_7BC09F9A) for 1 device(s):
    Build finished in 2525.65 ms.
Build Status for device 0 = Intel(R) UHD Graphics 630 (OpenCL C 2.0 ): CL_BUILD_SUCCESS
-------> Start of Build Log:
<------- End of Build Log

-----------------------------------------
matrix size: ( 1024x1024 ) * ( 1024x1024 )
Algorithm                       Avg kernel GFlops               Avg Host GFlops                 Peak Kernel GFlops
L3_SLM_8x8_4x16         294.15                          280.354                         296.47
CLIntercept is shutting down...
... shutdown complete.

After some investigation with the help of @bashbaug, I found that the test needs administrator permission to run so that it can write ***.isabin to the C: drive. Otherwise, the write fails.
Based on my case, I think a few things could be improved.

  1. If writing the file fails, print a log message to tell the end user. Currently, nothing is printed.
  2. State in kernel_isa_gpu.md that administrator privileges are needed, in case someone runs into a similar issue.
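
The first suggestion amounts to checking the result of the write and reporting any failure. A minimal Python sketch of that pattern follows. `dump_binary` is a hypothetical helper, and the message format is made up for illustration.

```python
import sys


def dump_binary(path, data, log=sys.stderr):
    """Write data to path; report a failure instead of failing silently.

    Returns True on success and False if the file could not be written.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as err:
        # Covers permission errors as well as missing directories.
        print(f"Failed to write {path}: {err}", file=log)
        return False
    return True
```

With a message like this in the log, a permissions problem would be visible immediately instead of showing up only as a missing file.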

Crash when running OpenCL conformance test

Observed Behavior

OpenCL conformance test math_brute_force crashes with a segmentation fault. Application runs fine without the intercept layer. Other conformance tests run fine without crashing.

Desired Behavior

Application does not crash.

Steps to Reproduce

  1. Build from source: OpenCL-SDK, OpenCL-CTS, opencl-intercept-layer
  2. Run ./cliloader -h -d -f ./test_bruteforce -w
  3. Observe a crash. Time to crash is intermittent and shortens when debugging is enabled.

See attached logs for additional information.
clintercept_log.txt
math_bruteforce_output.txt

Incomplete details in API log

Observed Behavior

When logging API calls, some information is missing from the log.

Current instances:

  • clCreateCommandQueue log does not show cl_context
  • clBuildProgram log does not have build knobs (knobs need to be inferred from the source dump)
  • clCreateUserEvent log does not show cl_context
  • clSetUserEventStatus log does not show the status set
  • clEnqueueWriteImage.* logs do not show cl_queue

Desired Behavior

All the relevant parameters to each API call should be logged.

Steps to Reproduce

Run an OpenCL application using the above API calls under the control of CLIntercept

print the platform and device that a captured kernel is replayed on

Observed Behavior

Running the replay script "run.py" prints a fair amount of information about what the script is doing, but it doesn't print the platform and device name. This can be useful especially if a system has two platforms with the same name, say for a dGPU and an iGPU.

Desired Behavior

Print the platform and device name from the "run.py" replay script.

Steps to Reproduce

Run the replay script. Observe that no platform and device name is printed.

Add NaN checker control for fp buffers and images

Add a control which checks all fp buffers/images for NaNs and reports the first kernel which causes such a NaN to appear. The check could possibly run both before and after a kernel enqueue, to rule out garbage in == garbage out.

For debugging, it can be nice to detect the first time a NaN appears, so that this kernel and its inputs can be captured and replayed for further analysis.

This probably needs kernel argument reflection, to prevent false positives in int buffers or in OpenCL images which have an integer as their underlying data type.
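
The core of such a check is a scan for the first NaN in a buffer that is known to hold floating-point data. The Python sketch below assumes a host-side copy of the buffer containing little-endian 32-bit floats. `first_nan_index` is a hypothetical helper, not an existing control.

```python
import math
import struct


def first_nan_index(buffer: bytes):
    """Return the index of the first NaN in a buffer of 32-bit floats.

    Returns None if the buffer holds no NaN. The buffer is assumed to be
    little-endian fp32; other element types would need their own decoding.
    """
    count = len(buffer) // 4
    values = struct.unpack(f"<{count}f", buffer[: count * 4])
    for index, value in enumerate(values):
        if math.isnan(value):
            return index
    return None


clean = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
dirty = struct.pack("<4f", 1.0, float("nan"), 3.0, float("nan"))
print(first_nan_index(clean), first_nan_index(dirty))  # None 1
```

Running the scan on a kernel's inputs before the enqueue and on its outputs afterwards would separate kernels that produce NaNs from kernels that merely propagate them.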

Give me a few days and I'll prepare a merge request :)

System-wide Config File

Observed Behavior

The Intercept Layer can currently read from a config file or environment variables, but both of these are per-user controls (the config file is in the user's home directory). There is currently no way to set up a system-wide config file instead.

Desired Behavior

Consider adding support for a system-wide config file in addition to the existing per-user controls. For example, a system-wide config file in /etc/OpenCL could be searched if a control is not specified by the existing per-user environment variables or config file.
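
The proposed fallback could look roughly like the Python sketch below. It assumes the `Name=Value` config format with `#` comments and the `CLI_` environment prefix. Checking the environment before the per-user file is an assumption made for this example, and `read_config` and `get_control` are hypothetical helpers.

```python
import os
from pathlib import Path


def read_config(path):
    """Parse 'Name=Value' lines; '#' starts a comment. Missing file -> {}."""
    controls = {}
    try:
        text = Path(path).read_text()
    except OSError:
        return controls
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            controls[name.strip()] = value.strip()
    return controls


def get_control(name, user_conf, system_conf, env=None):
    """Look up a control: environment first, then per-user, then system-wide.

    The environment-before-file order is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    if "CLI_" + name in env:
        return env["CLI_" + name]
    for conf in (user_conf, system_conf):
        controls = read_config(conf)
        if name in controls:
            return controls[name]
    return None
```

Because the system-wide file is consulted last, any per-user setting still overrides it, which keeps existing setups working unchanged.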

Steps to Reproduce

N/A, new feature request.

Add Hash to Call Logging Kernel Name

Observed Behavior

CallLogging frequently includes a decoded kernel name in the log, for example:

>>>> clSetKernelArg( GenerateJuliaSet ): kernel = 0717BDA8, ...

To differentiate between multiple kernels with the same name, DevicePerformanceTimeHashTracking will append hash information in the report:

                       Function Name,  Calls,     ...
GenerateJuliaSet(0000_3F40E1CD_0001),      4,     ...

The hash is not included in the log, though, which makes it difficult to differentiate between calls affecting kernels with the same name.

Desired Behavior

Include the program hash in the log, when DevicePerformanceTimeHashTracking is enabled. For example:

>>>> clSetKernelArg( GenerateJuliaSet(0000_3F40E1CD_0001) ): kernel = 0717BDA8, ...

Steps to Reproduce

Set CallLogging, DevicePerformanceTiming, and DevicePerformanceTimeHashTracking, and run a program that enqueues a kernel.

Param Mutations

Observed Behavior

This is not a huge problem, but some parameters are mutated, presumably for hashing and general ease. In our negative testing, clBuildProgram tests (and others) would fail because const char *options = nullptr would silently be transformed into "" (partly due to the changes you made for me a few months ago to stop null strings from being accessed), so by the time the call reached the implementation the parameters would be different.

Is this desired? We test bad parameters a lot; I know the CTS doesn't really touch this.

Desired Behavior

Mutations such as count in clCreateProgramWithSource going from n to 1 are fine in our testing, as both values are valid, but it is less clear to us whether a mutation from invalid to valid is desirable. From code inspection, this looks like it will impact some of the caching.
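One way to avoid the invalid-to-valid mutation is to keep a safe copy for the layer's own use while forwarding the caller's pointer untouched. The sketch below only illustrates that idea: BuildFn and interceptBuild are hypothetical stand-ins, not the real clBuildProgram signature or the layer's code.

```cpp
#include <string>

// Hypothetical stand-in for the real entry point; only the options
// parameter is modeled here.
typedef int (*BuildFn)(const char* options);

// Logs the options using a null-safe copy, then forwards the original
// pointer so that nullptr is still nullptr when the implementation sees it.
int interceptBuild(BuildFn realBuild, const char* options, std::string& logLine)
{
    // Safe copy used only by the layer itself (logging, hashing).
    const std::string safeOptions = options ? options : "";
    logLine = "clBuildProgram: options = \"" + safeOptions + "\"";

    // Forward exactly what the application passed.
    return realBuild(options);
}
```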

Setting CLI_FinishAfterEnqueue gives prints but does not actually call clFinish

Observed Behavior

Setting CLI_FinishAfterEnqueue=1 prints "Calling clFinish after clEnqueueNDRangeKernel" but does not actually call clFinish. This occurs with a Release build of OpenCL.dll.

Desired Behavior

Setting CLI_FinishAfterEnqueue=1 should actually make the clFinish call for all the Enqueue APIs.
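The expected behavior can be sketched as below. Dispatch and interceptEnqueue are hypothetical stand-ins for the real dispatch table and intercepted entry point; the point is only that the message and the call belong together on the same code path.

```cpp
#include <cstdio>

// Hypothetical stand-ins for the real entry points reached through the
// dispatch table. A return value of 0 means success in this sketch.
struct Dispatch {
    int (*enqueueKernel)(void* queue);
    int (*finish)(void* queue);
};

// Forwards the enqueue, then optionally blocks until the queue drains.
// Returns the error code of the enqueue itself.
int interceptEnqueue(const Dispatch& dispatch, void* queue, bool finishAfterEnqueue)
{
    const int errorCode = dispatch.enqueueKernel(queue);
    if (finishAfterEnqueue) {
        std::fprintf(stderr, "Calling clFinish after clEnqueueNDRangeKernel\n");
        // Printing the message is not enough: the call must be made too.
        dispatch.finish(queue);
    }
    return errorCode;
}
```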

Steps to Reproduce

  • Set CLI_FinishAfterEnqueue=1 (along with other call logging settings).
  • Run a rendering application.
  • Windows 10 RS2.

Crash with SYCL

Observed Behavior

When I try to intercept OpenCL calls coming from the intel/llvm project (specifically the "simple SYCL application" as described here), I get an assertion failure, with the following stack trace:
#0 __GI_raise (sig=5) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff7b987eb in dummyGetPlatformIDs (num_entries=2, platforms=0x7fffffffc240,
num_platforms=0x0) at /rr-home/istvan/opencl-intercept-layer/intercept/src/stubs.cpp:34
#2 0x00007ffff7aafe9a in clGetPlatformIDs (num_entries=2, platforms=0x7fffffffc240,
num_platforms=0x0) at /rr-home/istvan/opencl-intercept-layer/intercept/src/dispatch.cpp:43
#3 0x00007ffff6f76d7c in cl::sycl::detail::pi::OclpiPlatformsGet(unsigned int, _pi_platform**, unsigned int*) () from /rr-home/shared/istvan/intel_llvm2/lib/libsycl.so
#4 0x00007ffff6f8e3d8 in cl::sycl::detail::platform_impl_pi::get_platforms() ()
from /rr-home/shared/istvan/intel_llvm2/lib/libsycl.so
#5 0x00007ffff6fb87ae in cl::sycl::platform::get_platforms() ()
from /rr-home/shared/istvan/intel_llvm2/lib/libsycl.so
#6 0x00007ffff6fb6418 in cl::sycl::device::get_devices(cl::sycl::info::device_type) ()
from /rr-home/shared/istvan/intel_llvm2/lib/libsycl.so
#7 0x00007ffff6fb6838 in cl::sycl::device_selector::select_device() const ()
from /rr-home/shared/istvan/intel_llvm2/lib/libsycl.so
#8 0x0000000000406120 in cl::sycl::queue::queue(cl::sycl::device_selector const&, cl::sycl::property_list const&) ()

Desired Behavior

I'd like to see OpenCL kernels and API calls logged

Steps to Reproduce

Tested on several platforms, including Ubuntu 18 and Debian 9.9, with CPU and Intel GPU targets.
cliloader -d ./a.out
Running the code without cliloader works fine.

Documentation lacks build step for OpenCL

The documentation lacks build steps.
It was not obvious what was wrong after my first call to cliloader with my app.
The documentation should include the make cliloader/cliprof steps (the obvious ones) and the make OpenCL step, or a simple make step.

Enqueue Counter Race Condition

Observed Behavior

With a multi-threaded application there is a chance that two "enqueue" API calls may be logged with the same "enqueue counter". This is because the main enqueue counter is used throughout the API call and only updated at the end of the API call, so if two threads happen to execute an API call at approximately the same time, they may both execute using the same main enqueue counter.

Desired Behavior

Ensure that each API call is logged with a unique enqueue counter. One way to accomplish this is to safely assign and increment the main enqueue counter at the start of the API, and then to only refer to the assigned enqueue counter while the API call is executing.
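A minimal sketch of the suggested fix, using std::atomic so that the assign-and-increment is a single indivisible step. EnqueueCounter and logEnqueue are hypothetical names, and the log text is illustrative rather than the layer's real format.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hands out enqueue counter values. fetch_add is atomic, so two threads
// entering an API call at the same time can never receive the same value.
class EnqueueCounter {
public:
    uint64_t reserve()
    {
        return m_Next.fetch_add(1, std::memory_order_relaxed);
    }

private:
    std::atomic<uint64_t> m_Next{1};
};

// Illustrative intercepted call: the counter is assigned once at entry and
// only that local copy is used for the rest of the call.
std::string logEnqueue(EnqueueCounter& counter, const std::string& apiName)
{
    const uint64_t myCounter = counter.reserve();
    // ... the real API call would execute here, referring only to myCounter ...
    return apiName + ": enqueue counter = " + std::to_string(myCounter);
}
```

Relaxed ordering is sufficient here because only the uniqueness of the values matters, not their ordering relative to other memory operations.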

Steps to Reproduce

Because this issue depends on timing and multiple threads, it does not reproduce 100% of the time, but it should be possible to see multiple API calls with the same enqueue counter with a sufficiently long-running multi-threaded application.

fix MacOS warnings

Observed Behavior

A few warnings have crept into the MacOS builds. We should fix these:

/Users/runner/work/opencl-intercept-layer/opencl-intercept-layer/intercept/src/intercept.cpp:10290:21: warning: variable 'nameSize' set but not used [-Wunused-but-set-variable]
            size_t  nameSize = 0;
                    ^
/Users/runner/work/opencl-intercept-layer/opencl-intercept-layer/intercept/src/intercept.cpp:10368:21: warning: variable 'nameSize' set but not used [-Wunused-but-set-variable]
            size_t  nameSize = 0;
                    ^
/Users/runner/work/opencl-intercept-layer/opencl-intercept-layer/intercept/src/intercept.cpp:10488:21: warning: variable 'nameSize' set but not used [-Wunused-but-set-variable]
            size_t  nameSize = 0;
                    ^

Desired Behavior

No warnings.

Steps to Reproduce

See the automated MacOS builds.

[windows] app exits immediately and no logs are produced

Observed Behavior

OpenCL applications exit immediately when run from a directory that contains the clintercept OpenCL.dll on Windows. This happens for all OpenCL apps on Windows for me. I have some older versions of clintercept and they work correctly. The issue is observed with the 3.0.2 release and on the master branch.

I've also tested with the very simple clinfo.exe app, which just prints some basic GPU information, and it doesn't work either. I've tested both with OpenCL.dll in the working directory and with cliloader.exe.

Desired Behavior

Application should finish as usual and produce logs in C:\Intel\CLintercept

Steps to Reproduce

Build clintercept from source from release 3.0.2
copy OpenCL.dll to application working directory
Invoke app from command line
Application exits immediately, no logs are created

Uploading intercept built by me. This is built from release 3.0.2
clintercept.zip

Flags used

"LogToFile"=dword:00000001
"BuildLogging"=dword:00000001
"CallLogging"=dword:00000001
"ErrorLogging"=dword:00000001
"SimpleDumpProgram"=dword:00000001
"DumpProgramsScript"=dword:00000001
"DumpProgramsInject"=dword:00000001
"InjectPrograms"=dword:00000001
"InjectProgramSource"=dword:00000001
"CallLoggingThreadId"=dword:00000001
"FinishAfterEnqueue"=dword:00000000
"AppendFiles"=dword:00000000

profiling overhead question

The times for running a SYCL program, without and with profiling, are listed below. Is the long profiling time expected? Thank you for your suggestions.

user@s001-n192:~/oneAPI-DirectProgramming/scan-sycl$ time SYCL_BE=PI_OPENCL make run
./main
PASS

real    0m6.716s
user    0m2.364s
sys     0m3.412s


user@s001-n192:~/oneAPI-DirectProgramming/scan-sycl$ time SYCL_BE=PI_OPENCL cliloader -h -d -q make run
./main
PASS
Total Enqueues: 100001


Host Performance Timing Results:

Total Time (ns): 33540256246

                                                                                         Function Name,  Calls,     Time (ns), Time (%),  Average (ns),      Min (ns),                                                                                                                       Max (ns)
                                                                                      clCompileProgram,      6,         71334,    0.00%,         11889,         11333,                                                                                                                          13000
                                                                                        clCreateBuffer,      2,        746000,    0.00%,        373000,         24625,                                                                                                                         721375
                                                                    clCreateCommandQueueWithProperties,      1,         11333,    0.00%,         11333,         11333,                                                                                                                          11333
                                                                                       clCreateContext,      1,         24709,    0.00%,         24709,         24709,                                                                                                                          24709
                                                                                        clCreateKernel,      1,         25334,    0.00%,         25334,         25334,                                                                                                                          25334
                                                                                 clCreateProgramWithIL,      6,         99332,    0.00%,         16555,         11333,                                                                                                                          25333
clEnqueueNDRangeKernel( _ZTSZZ7runTestIfEvPKT_PS0_iENKUlRN2cl4sycl7handlerEE24_14clES7_E10scan_block ), 100000,    6275220967,   18.71%,         62752,         39000,                                                                                                                         505250
                                                                                   clEnqueueReadBuffer,      1,     118655277,    0.35%,     118655277,     118655277,                                                                                                                      118655277
                                                                                      clGetContextInfo,     20,        533545,    0.00%,         26677,         11292,                                                                                                                         318209
                                                                                        clGetDeviceIDs,      8,         90751,    0.00%,         11343,         11333,                                                                                                                          11375
                                                                                       clGetDeviceInfo,     28,        351499,    0.00%,         12553,         11291,                                                                                                                          32000
                                                              clGetExtensionFunctionAddressForPlatform,      3,         35792,    0.00%,         11930,         11375,                                                                                                                          13042
                                                                                       clGetKernelInfo, 100000,    1162555235,    3.47%,         11625,         11291,                                                                                                                         119667
                                                                                      clGetPlatformIDs,      2,     119243652,    0.36%,      59621826,         11333,                                                                                                                      119232319
                                                                                     clGetPlatformInfo,     20,        226666,    0.00%,         11333,         11292,                                                                                                                          11375
                                                                                         clLinkProgram,      1,     141313449,    0.42%,     141313449,     141313449,                                                                                                                      141313449
                                                                                 clReleaseCommandQueue,      1,         11333,    0.00%,         11333,         11333,                                                                                                                          11333
                                                                                      clReleaseContext,      1,         11333,    0.00%,         11333,         11333,                                                                                                                          11333
                                                                                        clReleaseEvent, 100001,    1159952614,    3.46%,         11599,          5000,                                                                                                                         500291
                                                                                       clReleaseKernel,      1,         11333,    0.00%,         11333,         11333,                                                                                                                          11333
                                                                                    clReleaseMemObject,      2,         70333,    0.00%,         35166,         34000,                                                                                                                          36333
                                                                                      clReleaseProgram,      7,        151334,    0.00%,         21619,         11333,                                                                                                                          69334
                                                                                        clSetKernelArg, 1300000,   15267126476,   45.52%,         11743,          5000,                                                                                                                         427875
                                                                                   clSetKernelExecInfo, 300000,    3487335058,   10.40%,         11624,          5000,                                                                                                                         423250
                                                                                       clWaitForEvents, 100011,    5806381557,   17.31%,         58057,         39000,                                                                                                                     1145250387

Device Performance Timing Results for Intel(R) Graphics Gen9 [0x3e96] (24CUs, 1150MHz):

Total Time (ns): 876254175

                                                               Function Name,  Calls,     Time (ns), Time (%),  Average (ns),      Min (ns),      Max (ns)
_ZTSZZ7runTestIfEvPKT_PS0_iENKUlRN2cl4sycl7handlerEE24_14clES7_E10scan_block, 100000,     876236425,  100.00%,          8762,          8083,         34000
                                                         clEnqueueReadBuffer,      1,         17750,    0.00%,         17750,         17750,         17750

real    0m59.670s
user    0m3.917s
sys     0m54.593s

Documentation for Linux "Targeted Usage" is incorrect

Observed Behavior

Passing CLI_DLLName= doesn't work, because Linux is case-sensitive and the code looks for CLI_DllName.

The documentation in install.md lists the upper-case spelling, CLI_DLLName.

Desired Behavior

Change the documentation to demonstrate the lower-case version (or perhaps change the preferred variable name to the upper-case version, or check for both).
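The "check for both" option could look like the sketch below. The function name getDllNameFromEnv is hypothetical; the two variable spellings are the ones discussed above, and preferring CLI_DllName when both are set is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Looks up the control under the spelling the code currently uses, then
// falls back to the spelling shown in the documentation. Returns an empty
// string if neither variable is set.
std::string getDllNameFromEnv()
{
    const char* value = std::getenv("CLI_DllName");
    if (value == nullptr) {
        // Environment variable names are case-sensitive on POSIX systems,
        // so the documented spelling has to be queried separately.
        value = std::getenv("CLI_DLLName");
    }
    return value ? std::string(value) : std::string();
}
```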

Steps to Reproduce

Set CLI_DLLName=, and opencl-intercept-layer will not find it with getenv.

This should be an issue with any POSIX OS, regardless of the application or other variables set. Having something set in ~/clintercept.conf might change the behavior; I haven't tried that.

VS 2019 has no 32bit build

Observed Behavior

On Windows 10 (64-bit), the Visual Studio 2019 generator only produces a 64-bit build.

Desired Behavior

A 32-bit build is needed as well.

Steps to Reproduce

cd <root-clintercept>
mkdir _bin64
cd _bin64
cmake -G "Visual Studio 16 2019" ..
cmake --build . --config RelWithDebInfo --target install

This gives a 64-bit build. Note that the following produces a CMake error:
cmake -G "Visual Studio 16 2019 win32" ..

Buffer Hashes / Compressed buffer dumps

Observed Behavior

If one enables buffer dumps for multiple (maybe all) buffers, the dumps occupy a large amount of space, running into several gigabytes. The need for so much disk space can rule out performing such a run on many systems. Compressing the dumped files after the entire run is also impractical.

Desired Behavior

  1. If it is possible to output buffer hashes (e.g. sha512) as an independently selectable knob, it can provide a way to collect a low-footprint signature of all the buffers.
  2. Buffer Compression as an independently selectable knob will provide another equally useful way to collect a medium-footprint signature of all the buffers.
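As a rough illustration of the first option, a buffer's contents can be reduced to a short signature instead of being written to disk. The sketch below uses 64-bit FNV-1a purely because it is compact; it is a stand-in for a stronger hash such as SHA-512, and hashBuffer is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Computes a 64-bit FNV-1a hash over a buffer's bytes. This is not a
// cryptographic hash; it is only a small stand-in for one.
uint64_t hashBuffer(const void* data, size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 1099511628211ULL;             // FNV-1a 64-bit prime
    }
    return hash;
}
```

Logging one such value per buffer per enqueue would need only a few bytes per entry, at the cost of not being able to inspect the buffer contents afterwards.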

Steps to Reproduce

Execute an OpenCL application under CLIntercept control, enabling all buffer dumps.

Symbolicated disassembly

I have followed this guide to get the disassembled OpenCL kernel.

I retrieved the .isabin file from my Intel AlderLake-S GT1 using:

$ cliloader --dump-output-binaries --dump-kernel-isa-binaries ./viewer fs=0

And then I disassembled with:

$ iga64 -d -p 12p1 CLI_0000_27223D7A_0000_00000000_GPU_gib_raytest.isabin

I get the disassembled code, which is great. However, I find it hard to read.

There are no variable names, no source-code line-numbers.

How can I get a disassembled OpenCL kernel where the symbol names or at least the line numbers are shown?

Update

I added -g as an option to clBuildProgram(), but that does not seem to have made a difference. The resulting .bin dump is a lot larger with that option, but the .isabin files are roughly the same size.

[call logging] Missing call information for clSetUserEventStatus.

Observed Behavior

No call information (event, execution_status) is logged for the clSetUserEventStatus call:

>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853B020
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853B020
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 000000003853B020
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853BDB0
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853BDB0
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 000000003853BDB0
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 0000000038537600
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 0000000038537600
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 0000000038537600
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853A480
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853A480
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus
<<<< TID = 6856 clSetUserEventStatus

Desired Behavior

>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853B020
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853B020
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus: event = 000000003853B020, execution_status = CL_COMPLETE
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 000000003853B020
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853BDB0
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853BDB0
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus: event = 000000003853BDB0, execution_status = CL_COMPLETE
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 000000003853BDB0
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 0000000038537600
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 0000000038537600
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus: event = 0000000038537600, execution_status = CL_COMPLETE
<<<< TID = 6856 clSetUserEventStatus
>>>> TID = 6856 clReleaseEvent: [ ref count = 2 ] event = 0000000038537600
<<<< TID = 6856 clReleaseEvent: [ ref count = 1 ]
>>>> TID = 6856 clCreateUserEvent
<<<< TID = 6856 clCreateUserEvent: returned 000000003853A480
>>>> TID = 6856 clRetainEvent: [ ref count = 1 ] event = 000000003853A480
<<<< TID = 6856 clRetainEvent: [ ref count = 2 ]
>>>> TID = 6856 clSetUserEventStatus: event = 000000003853A480, execution_status = CL_COMPLETE
<<<< TID = 6856 clSetUserEventStatus
...
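The desired log line can be sketched as follows. The function names are hypothetical, and plain C++ types stand in for cl_event and cl_int so the example compiles without OpenCL headers. The numeric values of the execution status names are the ones defined by the OpenCL headers.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Maps an execution status to its symbolic name. Other values (for example
// negative error codes) are printed numerically.
std::string executionStatusToString(int32_t status)
{
    switch (status) {
    case 0: return "CL_COMPLETE";
    case 1: return "CL_RUNNING";
    case 2: return "CL_SUBMITTED";
    case 3: return "CL_QUEUED";
    default: return std::to_string(status);
    }
}

// Formats the call information in the style of the log above: the event
// handle as zero-padded upper-case hex, then the decoded status.
std::string formatSetUserEventStatus(const void* event, int32_t executionStatus)
{
    char handle[2 * sizeof(void*) + 1];
    std::snprintf(handle, sizeof(handle), "%0*" PRIXPTR,
                  static_cast<int>(2 * sizeof(void*)),
                  reinterpret_cast<uintptr_t>(event));
    return std::string("clSetUserEventStatus: event = ") + handle +
           ", execution_status = " + executionStatusToString(executionStatus);
}
```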

Steps to Reproduce

To reproduce the issue, please use the GITS stream from mwalkowi\clIntercept-repros (on the Intel internal IGK share; please contact me for details). I have also put the full log and the registry keys I used (.reg file) there.

To run it you need gitsPlayer from mwalkowi\gits (from the share). I am using version 186 combined with the fix from mwalkowi\gits\hotfix-gits-169 (to support device-only stuff correctly).

Cannot get cliloader result for python app with virtualenv on Windows

Observed Behavior

  • (my_env) cliloader.exe python [app_name.py] => No cliloader log (Total Enqueues: 0)
  • (my_env) cliloader.exe /actual/path/to/python [app_name.py] => cliloader log is obtained as expected

Desired Behavior

  • Even with a Python virtualenv, cliloader is expected to produce a log.

Steps to Reproduce

  • C:\Users\xxxx\AppData\Local\Programs\Python\Python310\python.exe -m venv my_env
  • Run my_env\Scripts\activate
  • Run the Python app under cliloader:

(my_env) HOME_NAME> cliloader.exe python run_app.py -d GPU
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLIntercept file location: ..\opencl.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v3.0.3-56-g10a578e
CLIntercept git refspec: refs/heads/main
CLIntercept git hash: 10a578e
CLIntercept optional features:
cliloader(supported)
cliprof(supported)
kernel overrides(supported)
ITT tracing(NOT supported)
MDAPI(supported)
Demangling(NOT supported)
clock(steady_clock)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library: real_opencl.dll
Trying to load dispatch from: C:\windows/syswow64/opencl.dll
Couldn't load library: C:\windows/syswow64/opencl.dll
Trying to load dispatch from: C:\windows/system32/opencl.dll
Couldn't get exported function pointer to: clGetGLContextInfoKHR
... success!
Control AppendFiles is set to non-default value: true
Control LogIndent is set to non-default value: 48
Control CallLogging is set to non-default value: true
Control CallLoggingEnqueueCounter is set to non-default value: true
Control CallLoggingThreadId is set to non-default value: true
Control CallLoggingThreadNumber is set to non-default value: true
Control ChromeCallLogging is set to non-default value: true
Control EventCallbackLogging is set to non-default value: true
Control QueueInfoLogging is set to non-default value: true
Control EventChecking is set to non-default value: true
Control LeakChecking is set to non-default value: true
Control USMChecking is set to non-default value: true
Control DumpDir is set to non-default value: D:/mylog/DumpCLI
Control ReportToStderr is set to non-default value: true
Control HostPerformanceTiming is set to non-default value: true
Control HostPerformanceTimeLogging is set to non-default value: true
Control DevicePerformanceTimeLogging is set to non-default value: true
Control DevicePerformanceTimelineLogging is set to non-default value: true
Control ChromePerformanceTiming is set to non-default value: true
Control ChromePerformanceTimingInStages is set to non-default value: true
Control ChromePerformanceTimingPerKernel is set to non-default value: true
Timer Started!
... loading complete.
Total Enqueues: 0

Leak Checking:

dlopen libOpenCL.so crash

Observed Behavior

Dear developer:
I ran into an issue on Android armeabi-v7a. I used CMake (NDK r21) to build libOpenCL.so from the latest repo; the resulting library is 14.6M.
The build step is the following:
cmake ..
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake
ANDROID_NDK points to android-ndk-r21. I renamed the real libOpenCL.so to real_libOpenCL.so and then pushed the generated libOpenCL.so to the /system/vendor/lib directory. However, when I call dlopen("libOpenCL.so", RTLD_NOW | RTLD_LOCAL), it crashes.
The crash log is:
08-11 14:21:50.135 13890 13890 F DEBUG : Revision: '0'
08-11 14:21:50.135 13890 13890 F DEBUG : ABI: 'arm'
08-11 14:21:50.135 13890 13890 F DEBUG : SYSVMTYPE: Maple
08-11 14:21:50.135 13890 13890 F DEBUG : APPVMTYPE: Unknown
08-11 14:21:50.136 13890 13890 F DEBUG : Timestamp: 2020-08-11 14:21:50+0800
08-11 14:21:50.136 13890 13890 F DEBUG : pid: 13887, tid: 13887,
08-11 14:21:50.136 13890 13890 F DEBUG : uid: 0
08-11 14:21:50.136 13890 13890 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffffff4
08-11 14:21:50.136 13890 13890 F DEBUG : r0 00000000 r1 00000000 r2 00000041 r3 00000051
08-11 14:21:50.136 13890 13890 F DEBUG : r4 efcf8340 r5 ffa85770 r6 00000001 r7 ffa85768
08-11 14:21:50.136 13890 13890 F DEBUG : r8 f013c4c0 r9 00000041 r10 f010a0f0 r11 f08af17c
08-11 14:21:50.136 13890 13890 F DEBUG : ip 20000000 sp ffa85730 lr efc735db pc efc98004
08-11 14:21:50.141 13890 13890 F DEBUG :
08-11 14:21:50.141 13890 13890 F DEBUG : backtrace:
08-11 14:21:50.141 13890 13890 F DEBUG : #00 pc 0007d004 /vendor/lib/libOpenCL.so (_ZNSt6__ndk124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_j+40) (BuildId: 06dfb7dbfd7865f319ddeff45062aa64aebe7fbd)
08-11 14:21:50.141 13890 13890 F DEBUG : #1 pc 000585d7 /vendor/lib/libOpenCL.so (CLIntercept::log(std::__ndk1::basic_string<char, std::__ndk1::char_traits, std::__ndk1::allocator> const&)+290) (BuildId: 06dfb7dbfd7865f319ddeff45062aa64aebe7fbd)
08-11 14:21:50.141 13890 13890 F DEBUG : #2 pc 0005360d /vendor/lib/libOpenCL.so (CLIntercept::init()+8172) (BuildId: 06dfb7dbfd7865f319ddeff45062aa64aebe7fbd)
Do you have any idea how to fix it? Is it caused by my build steps?
Thanks

DevicePerformanceTimelineLogging outputs values as 32-bit instead of 64-bit

Observed Behavior

When the performance control DevicePerformanceTimelineLogging is set to 1, this layer outputs the timeline as 32-bit values

Desired Behavior

Since the OpenCL performance counters are 64-bit, the full 64-bit value should be output
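The difference can be illustrated with the sketch below. It shows one plausible way such truncation happens (narrowing to 32 bits before formatting); whether this is the actual cause in the layer is an assumption. Both function names are hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a timestamp with its full 64-bit range.
std::string formatTimestamp(uint64_t nanoseconds)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%" PRIu64, nanoseconds);
    return buffer;
}

// Formats the same timestamp after narrowing it to 32 bits, which silently
// discards the upper half of the value.
std::string formatTimestampTruncated(uint64_t nanoseconds)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%" PRIu32,
                  static_cast<uint32_t>(nanoseconds));
    return buffer;
}
```

A 32-bit nanosecond value wraps after roughly 4.29 seconds, so any timeline longer than that loses information when truncated.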

Steps to Reproduce

  • Set DevicePerformanceTimelineLogging=1 and run any OpenCL application.
  • Observed on an Intel Skylake system running Windows 10.
