oneapi-src / unified-runtime

Home Page: https://oneapi-src.github.io/unified-runtime/

License: Other

CMake 1.75% Batchfile 0.07% CSS 0.01% HTML 0.01% Python 2.51% Mako 1.31% C 10.73% C++ 83.63%

unified-runtime's Introduction

Unified Runtime

Contents of the repo

This repo contains the following:

API specification in YAML
API programming guide in RST
Loader and a null adapter implementation (partially generated)
Example applications
API C/C++ header files (generated)
API Python module (generated)
Sample C++ wrapper (generated)
Sample C/C++ import library (generated)

Integration

The recommended way to integrate this project into another is via CMake's FetchContent, for example:

include(FetchContent)

FetchContent_Declare(
    unified-runtime
    GIT_REPOSITORY https://github.com/oneapi-src/unified-runtime.git
    GIT_TAG main  # This will pull the latest changes from the main branch.
)
FetchContent_MakeAvailable(unified-runtime)

add_executable(example example.cpp)
target_link_libraries(example PUBLIC unified-runtime::headers)

Weekly tags

Each Friday at 23:00 UTC, a prerelease tag of the form weekly-YYYY-MM-DD is created. Downstream projects that want to track development closely, while staying on a fixed point in history to avoid pulling potentially breaking changes from the main branch, should use these tags.

Third-Party tools

Tools can be acquired via instructions in third_party.

Building

The requirements and instructions below are for building the project from source without any modifications. To make modifications to the specification, please see the Contribution Guide for more detailed instructions on the correct setup.

Requirements

Required packages:

C++ compiler with C++17 support
CMake >= 3.20.0
Python v3.6.6 or later

Windows

Generate a Visual Studio project. Executables and binaries will be in build/bin/{build_config}.

$ mkdir build
$ cd build
$ cmake {path_to_source_dir} -G "Visual Studio 15 2017 Win64"

Linux

Executables and binaries will be in build/bin.

$ mkdir build
$ cd build
$ cmake {path_to_source_dir}
$ make

CMake standard options

List of options provided by CMake:

Name	Description	Values	Default
UR_BUILD_EXAMPLES	Build example applications	ON/OFF	ON
UR_BUILD_TESTS	Build the tests	ON/OFF	ON
UR_BUILD_TOOLS	Build tools	ON/OFF	ON
UR_FORMAT_CPP_STYLE	Format code style	ON/OFF	OFF
UR_DEVELOPER_MODE	Treat warnings as errors and enable additional checks	ON/OFF	OFF
UR_ENABLE_FAST_SPEC_MODE	Enable fast specification generation mode	ON/OFF	OFF
UR_USE_ASAN	Enable AddressSanitizer	ON/OFF	OFF
UR_USE_TSAN	Enable ThreadSanitizer	ON/OFF	OFF
UR_USE_UBSAN	Enable UndefinedBehaviorSanitizer	ON/OFF	OFF
UR_USE_MSAN	Enable MemorySanitizer (clang only)	ON/OFF	OFF
UR_ENABLE_TRACING	Enable XPTI-based tracing layer	ON/OFF	OFF
UR_ENABLE_SANITIZER	Enable device sanitizer layer	ON/OFF	ON
UR_CONFORMANCE_TARGET_TRIPLES	SYCL triples to build CTS device binaries for	Comma-separated list	spir64
UR_CONFORMANCE_AMD_ARCH	AMD device target ID to build CTS binaries for	string	`""`
UR_CONFORMANCE_ENABLE_MATCH_FILES	Enable CTS match files	ON/OFF	ON
UR_BUILD_ADAPTER_L0	Build the Level-Zero adapter	ON/OFF	OFF
UR_BUILD_ADAPTER_OPENCL	Build the OpenCL adapter	ON/OFF	OFF
UR_BUILD_ADAPTER_CUDA	Build the CUDA adapter	ON/OFF	OFF
UR_BUILD_ADAPTER_HIP	Build the HIP adapter	ON/OFF	OFF
UR_BUILD_ADAPTER_NATIVE_CPU	Build the Native-CPU adapter	ON/OFF	OFF
UR_BUILD_ADAPTER_ALL	Build all currently supported adapters	ON/OFF	OFF
UR_HIP_PLATFORM	Build HIP adapter for AMD or NVIDIA platform	AMD/NVIDIA	AMD
UR_ENABLE_COMGR	Enable comgr lib usage	AMD/NVIDIA	AMD
UR_DPCXX	Path of the DPC++ compiler executable to build CTS device binaries	File path	`""`
UR_DEVICE_CODE_EXTRACTOR	Path of the `clang-offload-extract` executable from the DPC++ package, required for CTS device binaries	File path	`"${dirname(UR_DPCXX)}/clang-offload-extract"`
UR_DPCXX_BUILD_FLAGS	Build flags to pass to DPC++ when compiling device programs	Space-separated options list	`""`
UR_SYCL_LIBRARY_DIR	Path of the SYCL runtime library directory to build CTS device binaries	Directory path	`""`
UR_HIP_ROCM_DIR	Path of the default ROCm HIP installation	Directory path	`$ENV{ROCM_PATH}` or `/opt/rocm`
UR_HIP_INCLUDE_DIR	Path of the ROCm HIP include directory	Directory path	`${UR_HIP_ROCM_DIR}/include`
UR_HIP_HSA_INCLUDE_DIRS	Path of the ROCm HSA include directory	Directory path	`${UR_HIP_ROCM_DIR}/hsa/include;${UR_HIP_ROCM_DIR}/include`
UR_HIP_LIB_DIR	Path of the ROCm HIP library directory	Directory path	`${UR_HIP_ROCM_DIR}/lib`

Additional make targets

To run automated code formatting, configure CMake with the UR_FORMAT_CPP_STYLE option enabled and then run the custom cppformat target:

$ make cppformat

If you've made modifications to the specification, you can also run the custom generate target prior to building. It generates the source code and runs automated code formatting:

$ make generate

This target has additional dependencies which are described in the Build Environment section of the Contribution Guide.

Contributions

If you intend to contribute to the project, please read our Contribution Guide for more information.

Adapter naming convention

To maintain consistency and clarity in naming adapter libraries, it is recommended to use the following naming convention:

On Linux platforms, use libur_adapter_[name].so.
On Windows platforms, use ur_adapter_[name].dll.

Source code generation

Code is generated using included Python scripts.

Documentation

Documentation is generated from source code using Sphinx - see scripts dir for details.

Release Process

Unified Runtime releases are aligned with oneAPI releases. Once all changes planned for a release have been accepted, the release process is defined as:

1. Create a new release branch based on the main branch taking the form v<major>.<minor>.x, where x is a placeholder for the patch version. This branch will always contain the latest patch version for a given release.
2. Create a PR to increment the CMake project version on the main branch, and merge it before accepting any other changes.
3. Create a new tag based on the latest commit on the release branch taking the form v<major>.<minor>.<patch>.
4. Create a new GitHub release using the tag created in the previous step.
   - Prior to version 1.0, check the Set as a pre-release tick box.
5. Update downstream projects to use the release tag. If any issues arise from integration, apply any necessary hot fixes to the v<major>.<minor>.x branch and go back to step 3.

unified-runtime's Issues

Consider semantic parameter verification/valid usage for arguments

Unified Runtime currently supports parameter verification for some arguments based on their type. For example, pointer and handle (which is technically still a pointer) arguments are automatically checked against NULL. The generation of the error language in the spec appears to be automatic and based only on the type of the argument.

(Aside: In some cases this is too strict, e.g. the out parameter for the event argument on enqueue APIs should probably be optional: https://github.com/oneapi-src/unified-runtime/blob/main/include/zer_api.h#L859.)

More generally, we should consider whether (like OpenCL) we want to have verification for the semantics of arguments where appropriate, e.g. checking that a read/write command will not overflow a buffer. In OpenCL the clEnqueue(Read|Write)Buffer APIs require that the entry point returns:

CL_INVALID_VALUE if the region being read or written specified by (offset, size) is out of bounds or if ptr is a NULL value.

We need to decide whether we want parameter verification of this kind since in total for all the APIs there will be a lot of it. We will also need to write negative tests for all these cases once we have a test suite.
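
As a rough illustration of what semantic verification of this kind could look like, here is a minimal sketch of a bounds check on an (offset, size) region. The result codes and the function name are hypothetical and are not taken from the UR spec; only the rule being checked comes from the OpenCL wording quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the names are illustrative only. */
typedef enum {
    EXAMPLE_RESULT_SUCCESS = 0,
    EXAMPLE_RESULT_ERROR_INVALID_NULL_POINTER,
    EXAMPLE_RESULT_ERROR_INVALID_SIZE
} example_result_t;

/* Validate a read or write of `size` bytes starting at `offset` in a buffer
 * of `buffer_size` bytes, in the spirit of the CL_INVALID_VALUE rule. */
example_result_t example_validate_region(size_t buffer_size, size_t offset,
                                         size_t size, const void *ptr) {
    if (ptr == NULL) {
        return EXAMPLE_RESULT_ERROR_INVALID_NULL_POINTER;
    }
    /* Compare against `buffer_size - offset` so `offset + size` cannot wrap. */
    if (offset > buffer_size || size > buffer_size - offset) {
        return EXAMPLE_RESULT_ERROR_INVALID_SIZE;
    }
    return EXAMPLE_RESULT_SUCCESS;
}
```

Each entry point with a region argument would need a check of this shape plus a matching negative test, which is where the volume of work mentioned above comes from.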

Update CMakeLists.txt to better support FetchContent

As noted in #29, downstream projects want to use FetchContent to pull unified-runtime into their builds. Currently the CMakeLists.txt does not follow best practices for use with FetchContent, e.g. no targets are exported.

We should properly support FetchContent, and include usage examples in README.md. The main use case, for this initial support, should be to provide a target which only provides the include path for the header files with all other parts of the build disabled.

Fix ur.py syntax error

When attempting to import the ur.py module with ipython, the following syntax error occurred:

[ins] In [1]: import ur
Traceback (most recent call last):

  File ~/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3378 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In [1], line 1
    import ur

  File ~/Projects/oneapi-src/unified-runtime/include/ur.py:1815
    _urDeviceSelectBinary_t = WINFUNCTYPE( ur_result_t, ur_device_handle_t, POINTER(c_ubyte*), c_ulong, POINTER(c_ulong) )
                                                                                            ^
SyntaxError: invalid syntax

It looks like arguments of pointer-to-pointer type are not being handled correctly. There may be other syntax issues hidden by this error as well.

Fix the syntax errors, and make sure the ur.py module can, at the very least, be imported. Ideally this would also introduce CI testing to avoid regressions happening in future. More extensive testing, actually using the library, will probably have to wait though.

Unified Runtime Command Buffers

Command buffers are a tried and tested solution to scheduling bottlenecks when dispatching commands to hardware (see Vulkan, Level Zero, Metal and cl_khr_command_buffers). Unified Runtime is low level enough that it could support command buffers as a core feature (potentially replacing the regular enqueue APIs completely). This has been raised in working group discussions and was met with a generally favorable response.
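
To make the record-then-submit idea concrete, here is a toy sketch of a command buffer. It is not a proposed UR interface: every name is hypothetical, and "commands" are plain function pointers run on the host rather than work dispatched to a device.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_COMMANDS 8

/* A "command" here is just a function applied to some state. */
typedef void (*example_command_fn)(int *state);

typedef struct {
    example_command_fn commands[EXAMPLE_MAX_COMMANDS];
    size_t count;
} example_command_buffer_t;

/* Record a command. Returns 0 on success, -1 if the buffer is full. */
int example_command_buffer_append(example_command_buffer_t *cb,
                                  example_command_fn fn) {
    assert(cb != NULL && fn != NULL);
    if (cb->count == EXAMPLE_MAX_COMMANDS) {
        return -1;
    }
    cb->commands[cb->count++] = fn;
    return 0;
}

/* Replay every recorded command, in order, against `state`. */
void example_command_buffer_submit(const example_command_buffer_t *cb,
                                   int *state) {
    assert(cb != NULL && state != NULL);
    for (size_t i = 0; i < cb->count; ++i) {
        cb->commands[i](state);
    }
}

void example_increment(int *state) { *state += 1; }
void example_double(int *state) { *state *= 2; }
```

The point of the pattern is that the recording cost is paid once, while the same buffer can be submitted repeatedly.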

Remove first parameter from ur{Object}CreateWithNativeHandle APIs

The Problem

The various ur{Object}CreateWithNativeHandle APIs seem to take an unnecessary first argument of the type they are going to create.

Example:

urPlatformCreateWithNativeHandle has the following signature:

UR_APIEXPORT ur_result_t UR_APICALL
urPlatformCreateWithNativeHandle(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform instance
    ur_native_handle_t hNativePlatform,             ///< [in] the native handle of the platform.
    ur_platform_handle_t* phPlatform                ///< [out] pointer to the handle of the platform object created.
    );

It is unclear what the first hPlatform parameter means. This API is designed to create a ur_platform_handle_t from some native handle, so it is confusing that it also needs to accept one as an input parameter. This also doesn't match the corresponding API in PI, piextPlatformCreateWithNativeHandle:

__SYCL_EXPORT pi_result piextPlatformCreateWithNativeHandle(
    pi_native_handle nativeHandle, pi_platform *platform);

which has no such leading argument.

Some create-with-native APIs in PI accept a context (e.g. piextProgramCreateWithNativeHandle); these are noted in the list below. Some APIs in PI also have an argument for ownership which is missing in UR (although this is beyond the scope of this ticket).

Task

Remove these unnecessary arguments.

The full list of APIs is:

urContextCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument.
urEventCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urMemCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urQueueCreateWithNativeHandle - Takes unnecessary ur_queue_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urSamplerCreateWithNativeHandle - Takes unnecessary ur_sampler_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urKernelCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urModuleCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
urPlatformCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument.
urProgramCreateWithNativeHandle - Takes unnecessary ur_program_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
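
For illustration, the shape being asked for (native handle in, new handle out, no leading handle argument) can be sketched as follows. The types and the function are stand-ins with hypothetical names, not the real UR declarations, and the int return value stands in for ur_result_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types; the real UR handle types are opaque and defined elsewhere. */
typedef uintptr_t example_native_handle_t;
typedef struct example_platform_handle_t_ {
    example_native_handle_t native;
} *example_platform_handle_t;

/* Create a platform handle from a native handle.
 * Returns 0 on success, 1 for a NULL out pointer, 2 on allocation failure. */
int examplePlatformCreateWithNativeHandle(
    example_native_handle_t hNativePlatform, /* [in] the native handle */
    example_platform_handle_t *phPlatform    /* [out] the created handle */
) {
    if (phPlatform == NULL) {
        return 1;
    }
    example_platform_handle_t platform = malloc(sizeof(*platform));
    if (platform == NULL) {
        return 2;
    }
    platform->native = hNativePlatform;
    *phPlatform = platform;
    return 0;
}
```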

Redesign entry points taking in/out parameter

Issue

Some Unified Runtime entry points take in/out parameters as pointers and treat the argument as an in or an out parameter depending on the value at the address held in the pointer. Whilst this approach lets the spec reduce the number of arguments passed to these APIs by one, the semantics are confusing, especially when coming from PI and OpenCL, where the equivalent APIs have dedicated out parameters. It is also error prone: if the caller accidentally leaves the value uninitialized, the result could be indeterminate behavior at runtime.

Note that this pattern seems to have been adopted inconsistently throughout the UR spec: some entry points use it and some use the PI/OpenCL style (see below).

Example

urDeviceGet has the following documentation:

urDeviceGet(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform instance
    ur_device_type_t DevicesType,                   ///< [in] the type of the devices.
    uint32_t* pCount,                               ///< [in,out] pointer to the number of devices.
                                                    ///< If count is zero, then the call shall update the value with the total
                                                    ///< number of devices available.
                                                    ///< If count is greater than the number of devices available, then the
                                                    ///< call shall update the value with the correct number of devices available.
    ur_device_handle_t* phDevices                   ///< [out][optional][range(0, *pCount)] array of handle of devices.
                                                    ///< If count is less than the number of devices available, then platform
                                                    ///< shall only retrieve that number of devices.
    );

where pCount is treated as an out parameter if the value at that address is zero, and otherwise as the size of the array passed via the phDevices parameter.

Compare this to the piDevicesGet entry point:

__SYCL_EXPORT pi_result piDevicesGet(pi_platform platform,
                                     pi_device_type device_type,
                                     pi_uint32 num_entries, pi_device *devices,
                                     pi_uint32 *num_devices);

This has a dedicated num_entries parameter representing the size of the array passed in devices, and a num_devices parameter which returns to the caller the number of devices of type device_type in the platform.

This is also the style used in OpenCL with clGetDeviceIDs:

cl_int clGetDeviceIDs(
    cl_platform_id platform,
    cl_device_type device_type,
    cl_uint num_entries,
    cl_device_id* devices,
    cl_uint* num_devices);

Other APIs, such as urPlatformGetInfo, unconditionally treat parameters as in/out, updating the value at the address passed after first reading it:

UR_APIEXPORT ur_result_t UR_APICALL
urPlatformGetInfo(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform
    ur_platform_info_t PlatformInfoType,            ///< [in] type of the info to retrieve
    size_t* pSize,                                  ///< [in,out] pointer to the number of bytes needed to return info queried.
                                                    ///< the call shall update it with the real number of bytes needed to
                                                    ///< return the info
    void* pPlatformInfo                             ///< [out][optional] array of bytes holding the info.
                                                    ///< if *pSize is not equal to the real number of bytes needed to return
                                                    ///< the info then the ::UR_RESULT_ERROR_INVALID_SIZE error is returned and
                                                    ///< pPlatformInfo is not used.
    );

In this case the value at pSize is unconditionally updated by the entry point; however, its value as an in parameter is only used when pPlatformInfo is non-null.
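
A minimal sketch of the dedicated in and out parameter style, modelled on the piDevicesGet and clGetDeviceIDs signatures shown above. The function name, the int stand-in for a device handle, and the fixed device list are all hypothetical; the actual replacement signatures would be decided as part of this task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_DEVICE_COUNT 3u

/* Pretend device list; an int stands in for a device handle. */
static const int example_devices[EXAMPLE_DEVICE_COUNT] = {10, 20, 30};

/* NumEntries  [in]            capacity of the phDevices array
 * phDevices   [out][optional] receives up to NumEntries devices
 * pNumDevices [out][optional] receives the total number of devices
 * Returns 0 on success, 1 on invalid arguments. */
int exampleDeviceGet(uint32_t NumEntries, int *phDevices,
                     uint32_t *pNumDevices) {
    if (phDevices == NULL && pNumDevices == NULL) {
        return 1; /* nothing was requested */
    }
    if (phDevices != NULL && NumEntries == 0) {
        return 1; /* an output array with no capacity */
    }
    if (pNumDevices != NULL) {
        *pNumDevices = EXAMPLE_DEVICE_COUNT;
    }
    if (phDevices != NULL) {
        uint32_t n = NumEntries < EXAMPLE_DEVICE_COUNT ? NumEntries
                                                       : EXAMPLE_DEVICE_COUNT;
        for (uint32_t i = 0; i < n; ++i) {
            phDevices[i] = example_devices[i];
        }
    }
    return 0;
}
```

With this shape the caller never has to initialize a value that the entry point may or may not read: the first call passes (0, NULL, &count) to query the total, and the second passes (count, array, NULL) to fetch the devices.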

Task

Replace the in/out parameters with dedicated in and out parameters for the following entry points:

Allocation functions, memory transfers and context

We've been investigating changing the PI interface for memory allocations and also, to some extent, for memory transfers, which in turn also changes some of the meaning of the PI context. A lot of the reasoning for these changes is based on how the SYCL DPC++ runtime currently works, but it would be good to consider them for the Unified Runtime.

The changes are:

Add a pi_device argument to buffer and image allocation entry points (piMemBufferCreate, piMemImageCreate). It doesn't necessarily mean that the allocation will only be usable on that device, but it's helpful for backends that don't natively support context style allocations. For the DPC++ SYCL runtime this makes a lot of sense because we already do lazy allocation so when we call these functions we always already know the exact device targeted and not just the context (the SYCL context_bound property is not currently implemented in DPC++).
Add a new query piextGetMemoryConnection that takes two pairs of (pi_device, pi_context), and returns information on how the memory can or should be handled between the two pairs. It currently has three options:
- PI_MEMORY_CONNECTION_NONE: memory in the first (context, device) pair cannot be used or migrated by the plugin into the second (context, device) pair, copies through host are necessary.
- PI_MEMORY_CONNECTION_MIGRATABLE: memory in the first (context, device) pair cannot be used directly by the second (context, device) pair, but the plugin can handle migrating data between the two (piEnqueueMemBufferCopy).
- PI_MEMORY_CONNECTION_UNIFIED: memory in the first (context, device) pair is usable in the second pair.

With these two changes, a backend that doesn't natively support context-style allocations doesn't have to emulate them anymore: it can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. A device that does support context-style allocations can ignore the pi_device passed to the allocation functions and simply report PI_MEMORY_CONNECTION_UNIFIED when the contexts are identical, and PI_MEMORY_CONNECTION_NONE when the contexts are different. In addition, plugins can inform us when they are able to optimize memory copies between different contexts by reporting PI_MEMORY_CONNECTION_MIGRATABLE, which would mean that piEnqueueMemBufferCopy is supported between the two contexts and may be more efficient than doing a copy through host.

And so to circle back to the initial motivation, CUDA doesn't have context-style memory allocations like OpenCL or PI, and so to support having multiple CUDA devices in the same pi_context we would have to roll out our own memory manager in the CUDA plugin (which I believe the LevelZero plugin also does), but since the SYCL runtime already has a memory manager, these PI plugin changes allow us to simply defer the management of memory allocations within the same context for the CUDA plugin to the SYCL runtime.

You can see more discussions and initial implementations of this on the following PR:

intel/llvm#6446
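
The following sketch shows how two kinds of backend might answer the proposed memory connection query. The enum mirrors the three options listed above, but all names are hypothetical, integer ids stand in for pi_context and pi_device, and the per-device backend's answers for the same-device and cross-context cases are assumptions made for the example.

```c
#include <assert.h>

typedef enum {
    EXAMPLE_MEMORY_CONNECTION_NONE,       /* copies through host are needed */
    EXAMPLE_MEMORY_CONNECTION_MIGRATABLE, /* the plugin can migrate the data */
    EXAMPLE_MEMORY_CONNECTION_UNIFIED     /* memory is directly usable */
} example_memory_connection_t;

/* A (context, device) pair, with ids standing in for the PI handles. */
typedef struct {
    int context_id;
    int device_id;
} example_context_device_t;

/* Backend with native context-style allocations: the device is ignored. */
example_memory_connection_t
example_context_backend_connection(example_context_device_t from,
                                   example_context_device_t to) {
    (void)from.device_id;
    (void)to.device_id;
    return from.context_id == to.context_id
               ? EXAMPLE_MEMORY_CONNECTION_UNIFIED
               : EXAMPLE_MEMORY_CONNECTION_NONE;
}

/* Backend that allocates per device. Assumptions: memory is unified only on
 * the same device, and this backend cannot copy across contexts. */
example_memory_connection_t
example_device_backend_connection(example_context_device_t from,
                                  example_context_device_t to) {
    if (from.context_id != to.context_id) {
        return EXAMPLE_MEMORY_CONNECTION_NONE;
    }
    return from.device_id == to.device_id
               ? EXAMPLE_MEMORY_CONNECTION_UNIFIED
               : EXAMPLE_MEMORY_CONNECTION_MIGRATABLE;
}
```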

Stop using _<name> reserved identifiers

Unified Runtime appears to have followed the example of OpenCL and used _<name> in typedefs of the opaque API objects e.g. _ur_module_handle_t.

This is not permitted by either the C or the C++ standard: identifiers beginning with an underscore are reserved for the implementation at file scope (C) and in the global namespace (C++).

We should rename the structs in this pattern.
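
One possible renaming, sketched under the assumption that a trailing underscore is an acceptable convention for the struct tag (the tag name below is hypothetical, and the struct body exists only so the example compiles):

```c
#include <assert.h>
#include <stddef.h>

/* Before (tag is reserved at file scope):
 *   typedef struct _ur_module_handle_t *ur_module_handle_t;
 */

/* After: the tag carries a trailing underscore instead of a leading one. */
typedef struct ur_module_handle_t_ *ur_module_handle_t;

/* An implementation would define the struct privately; this body is a
 * placeholder for illustration. */
struct ur_module_handle_t_ {
    int ref_count;
};
```

The public typedef name is unchanged, so code that only uses ur_module_handle_t would not need to change.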

Introduce a clGetDeviceAndHostTimer analogue

intel/llvm#7526 introduces the ability to query the device timestamp, this is an analogue of clGetDeviceAndHostTimer. We should also introduce this into the UR spec.

Missing device queries

The following list of device queries are present in PI and missing in Unified Runtime. If we are to have parity with PI we should consider whether we need to add these queries:

PI_DEVICE_INFO_DEVICE_ID
PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE
PI_DEVICE_INFO_BUILD_ON_SUBDEVICE
PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY
PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE
PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES
PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU
PI_DEVICE_INFO_BACKEND_VERSION
PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D
PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER

Redefinition of typedefs in ur_api.h

Issue

The ur_api.h header redefines typedefs. Redefinition of typedefs is a C11 feature. Presumably we want to support earlier C versions.

Example

To reproduce, compile the ur_api.h header on Linux with -Werror and -Wtypedef-redefinition. This should give something like:

ur_api.h:274:39: error: redefinition of typedef 'ur_base_properties_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_properties_t ur_base_properties_t;
                                      ^
ur_api.h:241:3: note: previous definition is here
} ur_base_properties_t;
/source/ur/external/unified-runtime/include/ur_api.h:278:33: error: redefinition of typedef 'ur_base_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_desc_t ur_base_desc_t;
                                ^
ur_api.h:250:3: note: previous definition is here
} ur_base_desc_t;
  ^
ur_api.h:282:35: error: redefinition of typedef 'ur_rect_offset_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_offset_t ur_rect_offset_t;
                                  ^
ur_api.h:260:3: note: previous definition is here
} ur_rect_offset_t;
  ^
ur_api.h:286:35: error: redefinition of typedef 'ur_rect_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_region_t ur_rect_region_t;
                                  ^
ur_api.h:270:3: note: previous definition is here
} ur_rect_region_t;
  ^
ur_api.h:1702:3: error: redefinition of typedef 'ur_image_format_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_format_t;
  ^
ur_api.h:290:36: note: previous definition is here
typedef struct _ur_image_format_t ur_image_format_t;
                                   ^
ur_api.h:1720:3: error: redefinition of typedef 'ur_image_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_desc_t;
  ^
ur_api.h:294:34: note: previous definition is here
typedef struct _ur_image_desc_t ur_image_desc_t;
                                 ^
ur_api.h:1848:3: error: redefinition of typedef 'ur_buffer_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_buffer_region_t;
  ^
ur_api.h:298:37: note: previous definition is here
typedef struct _ur_buffer_region_t ur_buffer_region_t;
                                    ^
ur_api.h:2211:3: error: redefinition of typedef 'ur_sampler_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_sampler_property_value_t;
  ^
ur_api.h:302:46: note: previous definition is here
typedef struct _ur_sampler_property_value_t ur_sampler_property_value_t;
                                             ^
ur_api.h:2816:3: error: redefinition of typedef 'ur_device_partition_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_device_partition_property_value_t;
  ^
ur_api.h:306:55: note: previous definition is here
typedef struct _ur_device_partition_property_value_t ur_device_partition_property_value_t;

Task

Remove the redundant typedef redefinitions for the above types.
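
One way to keep a forward declaration without a second typedef is to declare the typedef once and then define the struct by its tag alone. The sketch below uses ur_rect_offset_t from the log above; the tag spelling and the member list are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Forward declaration: the only typedef for this name. */
typedef struct ur_rect_offset_t ur_rect_offset_t;

/* Definition by tag alone, so the typedef is not repeated and the header
 * also compiles cleanly before C11. Members are illustrative. */
struct ur_rect_offset_t {
    uint64_t x;
    uint64_t y;
    uint64_t z;
};
```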

Entry points should not use `size_t` parameters.

Currently the UR spec uses a mix of size_t and uint32_t parameters.

For example, urDevicePartition uses uint32_t (https://spec.oneapi.io/unified-runtime/latest/core/api.html#zerdevicegetinfo),
while urDeviceInfo uses size_t (https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/device.yml#L280).

We should make these all uint32_t, as size_t is platform dependent.

Add piEventSetCallback entry point

Issue

PI supports a piEventSetCallback entry point. Although this entry point has no uses in the DPC++ runtime, it is quite a useful API for compute runtimes to support, since it allows a callback to trigger work when a command or set of commands completes.

Unified Runtime currently has no urEventSetCallback entry point.

Task

Add urEventSetCallback to the Unified Runtime spec.
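
As a rough sketch of the behaviour such an entry point provides, the following host-only mock registers a callback on an event and fires it on completion. None of these names or signatures come from the UR spec or from PI; the event struct and the completion function stand in for what a real runtime would do internally.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EXAMPLE_EVENT_SUBMITTED,
    EXAMPLE_EVENT_COMPLETE
} example_event_status_t;

typedef void (*example_event_callback_t)(example_event_status_t status,
                                         void *user_data);

typedef struct {
    example_event_status_t status;
    example_event_callback_t callback;
    void *user_data;
} example_event_t;

/* Register a callback on an event. Returns 0 on success, 1 on bad input. */
int exampleEventSetCallback(example_event_t *event,
                            example_event_callback_t callback,
                            void *user_data) {
    if (event == NULL || callback == NULL) {
        return 1;
    }
    event->callback = callback;
    event->user_data = user_data;
    return 0;
}

/* Stands in for the runtime marking a command complete. */
void example_event_complete(example_event_t *event) {
    assert(event != NULL);
    event->status = EXAMPLE_EVENT_COMPLETE;
    if (event->callback != NULL) {
        event->callback(event->status, event->user_data);
    }
}

/* Example callback: counts completions through user_data. */
void example_count_completion(example_event_status_t status, void *user_data) {
    if (status == EXAMPLE_EVENT_COMPLETE) {
        *(int *)user_data += 1;
    }
}
```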

Add urMemGetInfo entry point

Issue

PI supports a piMemGetInfo entry point which has usage in the DPC++ runtime.

Unified Runtime currently doesn't have a urMemGetInfo entry point.

Task

Add a urMemGetInfo entry point to the Unified Runtime spec.

Make Parameter Naming Consistent

In general the spec appears to be following the convention that pointer parameters are given the prefix p and opaque UR API object handle parameters are given the prefix h.

For example in the urDeviceGet entry point:

UR_APIEXPORT ur_result_t UR_APICALL urDeviceGet(ur_platform_handle_t hPlatform, ur_device_type_t DevicesType, uint32_t *pCount, ur_device_handle_t *phDevices)

the platform parameter of type ur_platform_handle_t has the name hPlatform and the device count (which is an out parameter of type uint32_t*) is called pCount.

However this convention doesn't appear to have been followed consistently throughout the spec.

For example urEventCreate takes a context parameter of type ur_context_handle_t with the name context rather than hContext: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/event.yml#L52

And the urEnqueueUSMMemcpy entry point takes an out parameter of type ur_event_handle_t* called eventWaitList not pEventWaitList: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/enqueue.yml#L963

In general these typos show up in quite a lot of places. We should do a pass through the spec and make them consistent.

Fix duplicate API value name substrings

Some enumeration values have the same substring repeated twice, e.g. UR_EVENT_INFO_EVENT_INFO_COMMAND_QUEUE. We should figure out why this is happening and stop it: the duplicate name doesn't add any information and just makes things verbose.

Consider interface to expose P2P capabilities

Some devices have peer-to-peer capabilities for memory transfers and/or USM buffers. We may need to consider a way to expose this in the Unified Runtime.

There's currently an extension proposal in DPC++ to handle this, with some discussion on it:

intel/llvm#6104

It's currently only a SYCL level extension but I suspect implementing it would require some changes in PI or UR, so we should consider it.

Add urQueueFlush

PI contains an API function, piQueueFlush, which has not yet been added to Unified Runtime. We should add this to have feature parity with PI.

https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/pi.h#L1128

There is an equivalent OpenCL function: https://registry.khronos.org/OpenCL/sdk/1.0/docs/man/xhtml/clFlush.html

Decide if urEventSetStatus entry point is required

Issue

PI supports a piEventSetStatus entry point. This entry point doesn't seem to have any uses in the DPC++ runtime and would only really be of value if we intend to support user events which is something we've previously discussed not including in the Unified Runtime spec due to the burdensome device <-> host synchronization requirement they place on an adapter.

Task

Decide if a urEventSetStatus entry point is required in the Unified Runtime spec. If it is, add it; if it is not, no action is required.

Add cmake machinery to regenerate generated files as part of build when generator sources change

Currently we commit the ur_api.h header to this repo. ur_api.h (and the other generated files not currently committed) needs to be manually regenerated any time the YAML source files for the generator, or the generator scripts themselves, are updated.

It would be useful to have a dependency on the generator sources, so that anyone who changes those files and uses the CMake targets to integrate the headers (for example) into their build (perhaps in a downstream project) doesn't need to manually keep track of when to regenerate the headers. Regeneration would just happen automatically as part of their build whenever any of the sources change.

I think in CMake this could be done with the use of add_custom_command and add_custom_target.

Consider typedefing ur_bool

Any boolean values passed to APIs in Unified Runtime are currently of bool type. In C bool has to be large enough to hold the true and false values, other than that there are no restrictions on its size.

Other compute frameworks like OpenCL typically define their own typedefs of the various integer types in C to avoid parameters to entry points having different sizes. For example, OpenCL defines typedef cl_uint cl_bool (where cl_uint is always 32 bits).

PI defines a pi_bool type and the global values PI_TRUE/PI_FALSE for comparisons. A ur_bool_t type has also been added to Unified Runtime (although it doesn't appear to be used in any entry points).

We should consider whether we want to replace bool usage with ur_bool_t in Unified Runtime and define some global UR_TRUE/UR_FALSE values.

Alternatively we could follow the Vulkan example of using the predefined platform independent C types defined in stdint.h and replace bool usage with a type of known width such as int8_t and specify non-zero values as truthy.

This would also address the issues raised in: #3
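
A minimal sketch of the fixed-width approach, using hypothetical example_ names so as not to prejudge what UR should call these. The 8-bit width is an assumption chosen to match the int8_t suggestion above, using the unsigned variant.

```c
#include <assert.h>
#include <stdint.h>

/* A boolean with a known width on every platform that provides uint8_t. */
typedef uint8_t example_bool_t;

#define EXAMPLE_TRUE ((example_bool_t)1)
#define EXAMPLE_FALSE ((example_bool_t)0)

/* Normalize any non-zero value to EXAMPLE_TRUE, so callers that follow the
 * "non-zero is truthy" rule can still compare against the constants. */
example_bool_t example_to_bool(int value) {
    return value != 0 ? EXAMPLE_TRUE : EXAMPLE_FALSE;
}
```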

Restructure Directory Hierarchy

Issues

Following discussions on #22 it has been decided that this repository will act as a monorepo for the various components of Unified Runtime: headers, spec, loader, utilities, adapters, tests, etc.

Since this repo currently only contains the spec and headers it will need to be logically restructured to host the other components in a sensible way.

Task

Design and agree on a directory hierarchy for this repo making sure it includes logical locations for the following components:

spec (and spec generation tools)
headers
loader
utility libraries (e.g. command buffer software implementation)
test suite
adapters

Actually restructure the repo.

Decide if piEnqueueNativeKernel entry point is required

Issue

PI supports a piEnqueueNativeKernel entry point. This entry point doesn't seem to have any uses in the DPC++ runtime, so its value is questionable from the perspective of implementing SYCL; however, it may be of use to other language runtimes.

Task

Decide whether a urEnqueueNativeKernel entry point is required in the Unified Runtime spec. If it is, add it; if not, no action is required.

Add CI to ensure the header is up to date

When the spec is updated the API header should also be updated. Add a GitHub Actions workflow to ensure PRs update the header when the spec is changed.

Add get-last-error functionality

The PI plugin layer has an API for querying the last error from a plugin implementation/adapter, e.g. https://github.com/intel/llvm/blob/sycl/sycl/plugins/opencl/pi_opencl.cpp#L87. To get parity with PI we will need to add something like this to Unified Runtime.

Initially we could just have an API that matches that in PI i.e.

ur_result_t zerGetLastError(char **message);

where the returned value is the last reported error code and message is an out parameter which is set to point at some adapter specific string containing a detailed message about the context of the error value.

This approach isn't ideal. The memory returned is owned by the adapter, so any caller would need to make a copy of the returned string if they want it to remain consistent. Concurrent accesses may also be problematic, and in the above declaration there is no way to get the size of the memory without assuming the string is null terminated. These things would all need to be documented in the spec. In general, though, this is a tricky problem to solve with a C API.
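To make the ownership concern concrete, here is a minimal sketch of both sides of such an API. Everything apart from the zerGetLastError shape is hypothetical: the result values, the adapter_set_last_error helper, the copy_last_error wrapper, and the choice of thread-local storage (one possible way to reduce problems with concurrent access) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in result type; the values are illustrative only. */
typedef enum { UR_RESULT_SUCCESS = 0, UR_RESULT_ERROR_UNKNOWN = 1 } ur_result_t;

/* Hypothetical adapter-side state: one message slot per thread. */
static _Thread_local char g_last_message[256];
static _Thread_local ur_result_t g_last_result = UR_RESULT_SUCCESS;

/* Hypothetical helper an adapter would call when an operation fails. */
static void adapter_set_last_error(ur_result_t result, const char *message) {
    g_last_result = result;
    strncpy(g_last_message, message, sizeof(g_last_message) - 1);
    g_last_message[sizeof(g_last_message) - 1] = '\0';
}

/* Same shape as the declaration above; the storage stays owned by the adapter. */
static ur_result_t zerGetLastError(char **message) {
    assert(message != NULL);
    *message = g_last_message;
    return g_last_result;
}

/* Caller side: copy the message immediately, relying on null termination. */
static ur_result_t copy_last_error(char *buffer, size_t buffer_size) {
    char *message = NULL;
    ur_result_t result = zerGetLastError(&message);
    if (buffer_size > 0) {
        strncpy(buffer, message, buffer_size - 1);
        buffer[buffer_size - 1] = '\0';
    }
    return result;
}
```

The caller-side copy is exactly the burden described above: without it, a later failure on the same thread would overwrite the message the caller is still holding.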

What SPIR-V dialect does Unified Runtime consume?

The urProgramCreate API seems to mandate that ur_module_handle_ts are SPIR-V modules:

Create Program from input SPIR-V modules.

However we don't specify which SPIR-V dialect we can consume. It should probably either reference the OpenCL environment spec or a new Unified Runtime dialect.

`urPlatformGet` & `urDeviceGet` should return valid error code.

urPlatformGet currently says that "If phPlatforms is not NULL, then NumEntries must be greater than zero." However, we do not specify what should be returned when this condition is violated.

We should specify that urPlatformGet and urDeviceGet will return UR_RESULT_ERROR_INVALID_SIZE or something similar.
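The check being proposed could look roughly like the mock below. The function name, its signature, and the numeric enum values are assumptions for this sketch; only the phPlatforms/NumEntries rule and the UR_RESULT_ERROR_INVALID_SIZE name come from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; the enum values are illustrative only. */
typedef enum {
    UR_RESULT_SUCCESS = 0,
    UR_RESULT_ERROR_INVALID_SIZE = 1
} ur_result_t;
typedef struct ur_platform_handle_t_ *ur_platform_handle_t;

/* Hypothetical mock with no platforms behind it; only the argument check matters. */
static ur_result_t mock_urPlatformGet(uint32_t NumEntries,
                                      ur_platform_handle_t *phPlatforms,
                                      uint32_t *pNumPlatforms) {
    /* An output array was supplied but the caller said it has zero entries. */
    if (phPlatforms != NULL && NumEntries == 0) {
        return UR_RESULT_ERROR_INVALID_SIZE;
    }
    if (pNumPlatforms != NULL) {
        *pNumPlatforms = 0; /* this mock reports no platforms */
    }
    return UR_RESULT_SUCCESS;
}
```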

Missing _ur_event_status enumeration

Unified Runtime is missing the enumeration of values for querying and setting event statuses.

PI has the following (which matches OpenCL):

typedef enum {
  PI_EVENT_COMPLETE = 0x0,
  PI_EVENT_RUNNING = 0x1,
  PI_EVENT_SUBMITTED = 0x2,
  PI_EVENT_QUEUED = 0x3
} _pi_event_status;

Unified Runtime already has the ability to query this property via urEventGetInfo with the UR_EVENT_INFO_EVENT_INFO_COMMAND_EXECUTION_STATUS query, so we should add an enumeration for the values the returned argument can be set to.
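One possible shape for the missing enumeration, mirroring the PI values listed above, is sketched here. The ur_event_status_t name, the UR_EVENT_STATUS_* enumerators, and the ur_event_status_name helper are hypothetical and not taken from the spec.

```c
/* Hypothetical UR counterpart of _pi_event_status, keeping the same numeric values. */
typedef enum ur_event_status_t {
    UR_EVENT_STATUS_COMPLETE = 0x0,
    UR_EVENT_STATUS_RUNNING = 0x1,
    UR_EVENT_STATUS_SUBMITTED = 0x2,
    UR_EVENT_STATUS_QUEUED = 0x3
} ur_event_status_t;

/* Hypothetical helper that names a status value, e.g. for logging. */
static const char *ur_event_status_name(ur_event_status_t status) {
    switch (status) {
    case UR_EVENT_STATUS_COMPLETE:  return "COMPLETE";
    case UR_EVENT_STATUS_RUNNING:   return "RUNNING";
    case UR_EVENT_STATUS_SUBMITTED: return "SUBMITTED";
    case UR_EVENT_STATUS_QUEUED:    return "QUEUED";
    }
    return "UNKNOWN";
}
```

Keeping the numeric values identical to PI (and OpenCL) would let adapters pass status values through without translation.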

Add urQueueFinish

PI contains an API function piQueueFinish which has not yet been added to Unified Runtime. We should add this to have feature parity with PI.

https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/pi.h#L1126

There is an equivalent OpenCL function: https://registry.khronos.org/OpenCL/sdk/1.0/docs/man/xhtml/clFinish.html

urModuleCreate and the void **pfnNotify parameter

urModuleCreate is declared as:

UR_APIEXPORT ur_result_t UR_APICALL urModuleCreate(ur_context_handle_t hContext, const void *pIL, uint32_t length, const char *pOptions, void **pfnNotify, void *pUserData, ur_module_handle_t *phModule)

It accepts a pfnNotify argument that is supposed to be called when program compilation is complete, this is confusing for two reasons:

urModuleCreate doesn't have to do any compilation (it seems to just be an in memory representation of a SPIR-V string program, although maybe this is a misunderstanding), this may be handled further down the pipeline when you create the program, so why pass a callback here? In OpenCL clBuildProgram takes a callback to be called when the program is built. clBuildProgram takes a cl_program object that has already been created via clCreateProgramWith(Source|IL|Binary). The clCreateProgramWith.* APIs do not take a callback, if these are the analogous entry points then should urModuleCreate take a callback?
The type of pfnNotify is void **, which is not a function pointer. This is probably a typo and should be void (*pfnNotify) or void * (*pfnNotify) although the latter is unlikely since callbacks don't often return values, so the second * may also be a typo.
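For comparison, a conventional C callback parameter is usually declared through a function pointer typedef. The sketch below assumes a callback that returns nothing and receives the module handle plus the user data; the ur_module_notify_fn_t name, that parameter list, and the mock functions are all hypothetical.

```c
#include <stddef.h>

typedef struct ur_module_handle_t_ *ur_module_handle_t;

/* Hypothetical callback type: returns nothing, receives the module and user data. */
typedef void (*ur_module_notify_fn_t)(ur_module_handle_t hModule, void *pUserData);

/* Mock stand-in for module creation; it does no real work and
 * reports completion immediately if a callback was supplied. */
static void mock_module_create(ur_module_notify_fn_t pfnNotify, void *pUserData) {
    ur_module_handle_t hModule = NULL; /* no real module in this sketch */
    if (pfnNotify != NULL) {
        pfnNotify(hModule, pUserData);
    }
}

/* Example callback that counts how many times it was invoked. */
static void count_notifications(ur_module_handle_t hModule, void *pUserData) {
    (void)hModule;
    *(int *)pUserData += 1;
}
```

With a typedef like this, passing anything other than a matching function is diagnosed by the compiler, which a void ** parameter cannot offer.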

Start a Migration Guide

As part of Unified Runtime we need to provide a migration guide documenting the mapping from the PI plugin API to UR. This should consist of a set of mappings between the various entry points, types, enumerations, etc., and any divergences in semantics.

i.e. something like

Functions

PI	UR	Differences
piDeviceGetInfo	urDeviceGetInfo	None
...	...	...
...	...	...

Types

PI	UR	Differences
pi_context	ur_context_handle_t	None
...	...	...
...	...	...

Enumerations

PI	UR	Differences
pi_device_info	ur_device_info	(list of removed queries)
...	...	...
...	...	...

The initial proposal is to do this in markdown and have a living document in this repo that can be updated as spec changes are made.

ur: add enqueue fill image

PI currently has an API function piEnqueueMemImageFill which is analogous to the OpenCL version, clEnqueueFillImage.

We should add this to Unified Runtime.

Missing device info enumeration values

The _pi_device_info enumeration in PI has the following enumeration values which do not have equivalents in Unified Runtime:

PI_DEVICE_INFO_DEVICE_ID - used here
PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE - used here
PI_DEVICE_INFO_BUILD_ON_SUBDEVICE - used here
PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY - used here
PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE - used here
PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH - used here
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES - used here
PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU - used here
PI_DEVICE_INFO_BACKEND_VERSION - used here
PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16 - used here
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS - used here
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D - used here
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D - used here
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D - used here
PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER - used here

We should add these to the UR spec.

Remove ZER_DEVICE_INFO_PARENT_DEVICE

ZER_DEVICE_INFO_PARENT_DEVICE was presumably copied from CL_DEVICE_PARENT_DEVICE which has the following description:

Returns the cl_device_id of the parent device to which this sub-device belongs. If device is a root-level device, a NULL value is returned.

Unified Runtime doesn't have an API analogous to clCreateSubDevices and hence has no concept of a sub-device.

We should probably remove the ZER_DEVICE_INFO_PARENT_DEVICE enumeration value since it has no use.

Consider relaxing ur_module_handle_t SPIR-V requirement

There seems to be an implicit requirement that ur_module_handle_t objects are SPIR-V modules. From the urProgramCreate entry point:

Create Program from input SPIR-V modules.

Do we want to restrict the ur_module_handle_t objects to contain SPIR-V only? Or is it possible that higher level language runtimes sitting on top of Unified Runtime could feed it non-SPIR-V intermediate representations for compilation to target-specific executables in the adapter via the urModuleCreate -> urProgramCreate path?

Support sycl_ext_oneapi_memcpy2d extension

intel/llvm#7370 introduces support for the sycl_ext_oneapi_memcpy2d extension by adding 3 entry points to PI: piextUSMEnqueueFill2D, piextUSMEnqueueMemset2D, and piextUSMEnqueueMemcpy2D, along with supporting enumerations.

Add support to the unified runtime spec, the relevant changes for the spec are in the sycl/include/sycl/detail/pi.h header.

Automatically add opened issues to project planning board

Currently new issues do not automatically get added to the project planning board. This is possible using the actions/add-to-project action. Look into enabling this action so that triage of opened issues is more streamlined.

should urEnqueueMemBufferCopy have offsets?

urEnqueueMemBufferCopy has the following declaration:

UR_APIEXPORT ur_result_t UR_APICALL urEnqueueMemBufferCopy(ur_queue_handle_t hQueue, ur_mem_handle_t hBufferSrc, ur_mem_handle_t hBufferDst, size_t size, uint32_t numEventsInWaitList, const ur_event_handle_t *eventWaitList, ur_event_handle_t *event)

It does not have offset parameters for the hBufferSrc and hBufferDst arguments, whereas OpenCL has offsets in the analogous API:

cl_int clEnqueueCopyBuffer(
    cl_command_queue command_queue,
    cl_mem src_buffer,
    cl_mem dst_buffer,
    size_t src_offset,
    size_t dst_offset,
    size_t size,
    cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list,
    cl_event* event);

Should Unified Runtime?
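To show what offset parameters would add, the sketch below models a buffer copy on plain host memory. The mock_buffer_t type, the mock_buffer_copy function, and the srcOffset/dstOffset names are hypothetical; a real adapter would operate on ur_mem_handle_t objects and enqueue the copy on a queue.

```c
#include <stddef.h>
#include <string.h>

/* Host-memory stand-in for a buffer object. */
typedef struct {
    unsigned char *data;
    size_t size;
} mock_buffer_t;

/* Copies `size` bytes from src+srcOffset to dst+dstOffset.
 * Returns 0 on success, -1 if either range falls outside its buffer. */
static int mock_buffer_copy(const mock_buffer_t *src, mock_buffer_t *dst,
                            size_t srcOffset, size_t dstOffset, size_t size) {
    /* Written as subtractions so the range checks cannot overflow. */
    if (srcOffset > src->size || size > src->size - srcOffset) {
        return -1;
    }
    if (dstOffset > dst->size || size > dst->size - dstOffset) {
        return -1;
    }
    memmove(dst->data + dstOffset, src->data + srcOffset, size);
    return 0;
}
```

Without offsets, a caller wanting to copy a sub-range would have to create sub-buffers first or fall back to a rectangular copy, which is the gap the question above points at.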

Reconsider urEventCreate

urEventCreate appears to be inspired by the clCreateUserEvent API.

OpenCL user events can be painful to implement since they essentially require bidirectional host-device synchronization, which may not be something a platform can efficiently support. We should consider whether user events are really required in the Unified Runtime API and if not remove this entry point.

Add Missing urProgramGetBuildInfo

Issue

PI has a piProgramGetBuildInfo entry point which has uses in the DPC++ runtime.

For Unified Runtime to replace PI as the API on top of which DPC++ is implemented we will need an equivalent urProgramGetBuildInfo entry point.

Task

Define an urProgramGetBuildInfo entry point and add it to the spec.

Missing context info enum values

PI_CONTEXT_INFO_PLATFORM - It's useful information, used by some backends to make certain decisions (device->platform->name). But this can also be done by urDeviceGetInfo(UR_DEVICE_INFO_PLATFORM)->urGetPlatformInfo(PLATFORM_NAME). So we can skip this.
PI_CONTEXT_INFO_PROPERTIES - Can't find its usage.
PI_CONTEXT_INFO_REFERENCE_COUNT - Can't find its usage either. UR has the concept of context acquire/release, but no way to query its reference count. This is inconsistent: for devices, which also have acquire/release, we have UR_DEVICE_INFO_REFERENCE_COUNT. We can add it to make it consistent.
PI_CONTEXT_INFO_ATOMIC_MEMORY_[ORDER|SCOPE]_CAPABILITIES - Used to populate SYCL device info for order/scope capabilities. Should they be moved to device info instead of context?

Reconsider urKernelSetArg

urKernelSetArg appears to have been heavily influenced by the clSetKernelArg OpenCL API.

The clSetKernelArg entry point is problematic for two reasons:

It is one of the only non-thread safe APIs in OpenCL, this behavior seems to have been inherited by urKernelSetArg:

   * The application must not call this function from simultaneous threads with the same kernel handle.

   * The implementation of this function should be lock-free.

It only sets one argument at a time.

We should be careful about repeating history here. First, the API should be thread safe unless there is a valid reason it can't be. Second, we should support an API that sets multiple kernel arguments at once, since this may be more efficient on some platforms.
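A batched setter could take an array of argument descriptors, as in the sketch below. The mock_kernel_arg_t and mock_kernel_t types, the mock_kernel_set_args function, and the fixed limits are hypothetical, and thread safety is deliberately not shown; the sketch only illustrates setting several arguments in one call with all-or-nothing validation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MAX_ARGS 4
#define MOCK_MAX_ARG_SIZE 16

/* Hypothetical descriptor for one kernel argument. */
typedef struct {
    uint32_t index;     /* argument position */
    size_t size;        /* size of the value in bytes */
    const void *pValue; /* pointer to the value to copy */
} mock_kernel_arg_t;

/* Hypothetical kernel object that stores argument values by copy. */
typedef struct {
    unsigned char storage[MOCK_MAX_ARGS][MOCK_MAX_ARG_SIZE];
    size_t sizes[MOCK_MAX_ARGS];
} mock_kernel_t;

/* Sets several arguments in one call. Every entry is validated before any
 * is stored, so a failure leaves the kernel unchanged.
 * Returns 0 on success, -1 on an invalid entry. */
static int mock_kernel_set_args(mock_kernel_t *kernel, uint32_t numArgs,
                                const mock_kernel_arg_t *args) {
    for (uint32_t i = 0; i < numArgs; ++i) {
        if (args[i].index >= MOCK_MAX_ARGS ||
            args[i].size > MOCK_MAX_ARG_SIZE || args[i].pValue == NULL) {
            return -1;
        }
    }
    for (uint32_t i = 0; i < numArgs; ++i) {
        memcpy(kernel->storage[args[i].index], args[i].pValue, args[i].size);
        kernel->sizes[args[i].index] = args[i].size;
    }
    return 0;
}
```

Validating the whole batch before committing means a caller never observes a kernel with only some of the requested arguments applied.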

Fix "zer" naming

The fact that Unified Runtime is also referred to as "level zero runtime" and uses the zer.* prefix for entry points and objects in the API is very confusing considering it is not level zero and sits above it in the stack. This naming convention should be replaced with ur.*.

Remove nullptr references

The API spec has references to nullptr (e.g. in error conditions), which is a C++-specific feature. As a C API, Unified Runtime probably shouldn't refer to nullptr; we should replace these references with NULL.

Missing _ur_image_info_t enumeration

PI has a _pi_image_info enumeration that is typedef'd here, and the typedef is then used here in the DPC++ runtime.

Unified Runtime is missing an equivalent enum so we should add one.

Missing _ur_mem_info_t enumerations.

Currently in UR we have an enumeration _ur_mem_info_t. However, this relates to USM allocation. We should rename this to _ur_mem_alloc_info_t, which will be used in the corresponding query urMemGetMemAllocInfo, similar to how it is named in PI: _pi_mem_alloc_info.

For the urMemGetInfo entry point we should define a _ur_mem_info_t enumeration in order to complete #62. Equivalent in PI.

PI only has two enumerations:

PI_MEM_CONTEXT
PI_MEM_SIZE

OpenCL has several more options. TBD - how many of these we should add.

Both queries appear to be needed as they are both used by the DPC++ runtime.
piGetMemInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/sycl_mem_obj_t.cpp/#L49
piextUSMGetMemAllocInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/usm/usm_impl.cpp/#L534

Add urMemImageGetInfo entry point

Issue

PI supports a piMemImageGetInfo entry point which has usage in the DPC++ runtime.

Unified Runtime currently doesn't have a urMemImageGetInfo entry point.

Task

Add a urMemImageGetInfo entry point to the Unified Runtime spec.

Clarify UR_PLATFORM_INIT_FLAG_LEVEL_ZERO platform init flag

urInit accepts a ur_platform_init_flags_t platform_flags argument. Currently the only valid value for this flag is UR_PLATFORM_INIT_FLAG_LEVEL_ZERO.

The documentation for this flag isn't particularly clear:

initialize Unified Runtime platform drivers

But its name makes it sound like it is specific to a level zero adapter (although perhaps this is just an artifact of Unified Runtime previously being called Level Zero Runtime?)

Either way, we should either rename this value, or clarify its meaning.

Repo Naming

There appears to be an empty repo for the Unified Runtime spec: https://github.com/oneapi-src/unified-runtime-spec

Should we:
a. Move the headers/spec from here to that repo? (we will lose all the pull requests and issues)
b. Delete https://github.com/oneapi-src/unified-runtime-spec and rename this repo to unified-runtime-spec
c. Delete https://github.com/oneapi-src/unified-runtime-spec and do not rename this repo.

Allow optional null arguments in API generation

Currently any pointer argument to an entry point in the API automatically has an error condition stating that the API should return UR_RESULT_ERROR_INVALID_NULL_POINTER; see here.

Whilst this makes sense for most pointer parameters, some are optional e.g. return events. We should generalize this logic to allow for optional pointer arguments.
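The distinction between required and optional pointers can be sketched as follows. The mock_enqueue function, the mock event type, and the numeric enum values are hypothetical; only the UR_RESULT_ERROR_INVALID_NULL_POINTER name comes from the text above.

```c
#include <stddef.h>

/* Stand-in result type; the values are illustrative only. */
typedef enum {
    UR_RESULT_SUCCESS = 0,
    UR_RESULT_ERROR_INVALID_NULL_POINTER = 1
} ur_result_t;

/* Hypothetical event object and handle. */
typedef struct mock_event_t_ { int id; } *mock_event_handle_t;
static struct mock_event_t_ g_mock_event = {1};

/* pRequired must not be NULL; phEventOptional may be NULL when the
 * caller does not want an event returned. */
static ur_result_t mock_enqueue(const void *pRequired,
                                mock_event_handle_t *phEventOptional) {
    if (pRequired == NULL) {
        return UR_RESULT_ERROR_INVALID_NULL_POINTER;
    }
    if (phEventOptional != NULL) {
        *phEventOptional = &g_mock_event; /* only written when requested */
    }
    return UR_RESULT_SUCCESS;
}
```

In a generator, this would correspond to marking a parameter as optional so that the automatic null-pointer error condition is emitted only for the required ones.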
