
oneapi-src / unified-runtime


Home Page: https://oneapi-src.github.io/unified-runtime/

License: Other

CMake 1.61% Batchfile 0.09% CSS 0.01% HTML 0.01% Python 2.35% Mako 1.40% C 12.07% C++ 82.49%

unified-runtime's People

Contributors

0x12cc, aarongreig, allanzyne, bensuo, callumfare, ewanc, fabiomestre, franklandjack, georgeweb, hdelan, igchor, jackakirk, kbenzie, kswiecicki, lukaszstolarczuk, maarquitos14, martygrant, mfrancepillois, nrspruit, omarahmed1111, patkamin, pbalcer, pietroghg, rdeodhar, smaslov-intel, steffenlarsen, uwedolinsky, veselypeta, wlemkows, zhaomaosu

unified-runtime's Issues

Unified Runtime Command Buffers

Command buffers are a tried and tested solution to scheduling bottlenecks when dispatching commands to hardware (see Vulkan, Level Zero, Metal and cl_khr_command_buffer). Unified Runtime is low level enough that it could support command buffers as a core feature (potentially replacing the regular enqueue APIs completely). This has been raised in working group discussions and was met with a generally favorable response.

What SPIR-V dialect does Unified Runtime consume?

The urProgramCreate API seems to mandate that ur_module_handle_ts are SPIR-V modules:

Create Program from input SPIR-V modules. 

However, we don't specify which SPIR-V dialect we can consume. The spec should probably reference either the OpenCL environment specification or a new Unified Runtime dialect.

Allow optional null arguments in API generation

Currently, any pointer argument to an entry point in the API automatically gets an error condition stating that the API should return UR_RESULT_ERROR_INVALID_NULL_POINTER (see here).

Whilst this makes sense for most pointer parameters, some are optional e.g. return events. We should generalize this logic to allow for optional pointer arguments.
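As a rough illustration of the distinction, the sketch below validates a required pointer but tolerates a NULL optional out parameter. Every name in it (ex_result_t, ex_enqueue, ex_event_t) is a hypothetical stand-in, not part of the Unified Runtime API, and the enum values are illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real UR types; values are illustrative. */
typedef enum {
    EX_RESULT_SUCCESS = 0,
    EX_RESULT_ERROR_INVALID_NULL_POINTER = 1
} ex_result_t;

typedef struct ex_event { int id; } ex_event_t;

/* pData is required; phEvent is an optional out parameter and may be NULL. */
ex_result_t ex_enqueue(const void *pData, ex_event_t **phEvent) {
    static ex_event_t event = { 1 };
    if (pData == NULL) {
        return EX_RESULT_ERROR_INVALID_NULL_POINTER; /* required pointer */
    }
    if (phEvent != NULL) {
        *phEvent = &event; /* only written when the caller asked for it */
    }
    return EX_RESULT_SUCCESS;
}
```

A spec generator could emit the NULL check only for parameters that are not tagged as optional, which is the generalization the issue asks for.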

Support sycl_ext_oneapi_memcpy2d extension

intel/llvm#7370 introduces support for the sycl_ext_oneapi_memcpy2d extension by adding 3 entry points to PI: piextUSMEnqueueFill2D, piextUSMEnqueueMemset2D, and piextUSMEnqueueMemcpy2D, along with supporting enumerations.

Add support to the Unified Runtime spec. The relevant changes are in the sycl/include/sycl/detail/pi.h header.

Add urMemGetInfo entry point

Issue

PI supports a piMemGetInfo entry point which has usage in the DPC++ runtime.

Unified Runtime currently doesn't have a urMemGetInfo entry point.

Task

Add a urMemGetInfo entry point to the Unified Runtime spec.

Remove nullptr references

The API spec has references to nullptr (e.g. in error conditions), which is a C++-specific feature. As a C API, Unified Runtime probably shouldn't refer to nullptr; we should replace these references with NULL.

urModuleCreate and the void **pfnNotify parameter

urModuleCreate is declared as:

UR_APIEXPORT ur_result_t UR_APICALL urModuleCreate(ur_context_handle_t hContext, const void *pIL, uint32_t length, const char *pOptions, void **pfnNotify, void *pUserData, ur_module_handle_t *phModule)

It accepts a pfnNotify argument that is supposed to be called when program compilation is complete. This is confusing for two reasons:

  1. urModuleCreate doesn't have to do any compilation (it seems to just be an in-memory representation of a SPIR-V string program, although maybe this is a misunderstanding). Compilation may be handled further down the pipeline when you create the program, so why pass a callback here? In OpenCL, clBuildProgram takes a callback to be called when the program is built, and it operates on a cl_program object that has already been created via clCreateProgramWith(Source|IL|Binary). The clCreateProgramWith.* APIs do not take a callback. If these are the analogous entry points, should urModuleCreate take a callback?
  2. The type of pfnNotify is void **, which is not a function pointer. This is probably a typo and should be void (*pfnNotify) or void * (*pfnNotify) although the latter is unlikely since callbacks don't often return values, so the second * may also be a typo.
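To make the second point concrete, here is a minimal sketch of what a real function pointer callback parameter looks like in C. The typedef name ex_notify_fn_t and the ex_build function are hypothetical; the actual signature the spec should adopt is not stated in this issue.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked once with the caller's user data. */
typedef void (*ex_notify_fn_t)(void *pUserData);

/* Stand-in for an entry point that accepts a completion callback. */
void ex_build(ex_notify_fn_t pfnNotify, void *pUserData) {
    /* ... compilation work would happen here ... */
    if (pfnNotify != NULL) {
        pfnNotify(pUserData);
    }
}

/* Example callback: records that it ran. */
void ex_on_done(void *pUserData) {
    *(int *)pUserData = 1;
}
```

By contrast, a void ** parameter is just a pointer to a data pointer, and calling through it is not valid C.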

Add get-last-error functionality

The PI plugin layer has an API for querying the last error from a plugin implementation/adapter e.g. https://github.com/intel/llvm/blob/sycl/sycl/plugins/opencl/pi_opencl.cpp#L87. To get parity with PI we will need to add something like this to unified runtime.

Initially we could just have an API that matches that in PI i.e.

ur_result_t zerGetLastError(char **message);

where the returned value is the last reported error code and message is an out parameter which is set to point at some adapter specific string containing a detailed message about the context of the error value.

This approach isn't ideal. The memory returned is owned by the adapter, so any caller would need to copy the returned string if they want it to remain consistent. Concurrent accesses may also be problematic. Finally, in the above declaration there is no way to get the size of the memory without assuming the string is null terminated. All of these things would need to be documented in the spec. In general, though, this is a tricky problem to solve with a C API.
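One possible alternative, sketched here purely as an assumption and not as a proposal from the issue, is to have the caller own the buffer and query the required size first. The function name ex_get_last_error, its parameters, and the stored message and code are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical adapter-side state; a real adapter would likely keep this
 * per thread to avoid the concurrency problem mentioned above. */
static const char *ex_last_message = "device lost";
static int ex_last_code = 7; /* illustrative error code */

/*
 * Copies the last error message into a caller-owned buffer.
 * - pSizeRet, if non-NULL, receives the size needed including the terminator.
 * - pBuffer may be NULL to query the size only.
 * Returns the last error code.
 */
int ex_get_last_error(size_t bufferSize, char *pBuffer, size_t *pSizeRet) {
    size_t needed = strlen(ex_last_message) + 1;
    if (pSizeRet != NULL) {
        *pSizeRet = needed;
    }
    if (pBuffer != NULL && bufferSize > 0) {
        size_t n = needed < bufferSize ? needed : bufferSize;
        memcpy(pBuffer, ex_last_message, n - 1);
        pBuffer[n - 1] = '\0'; /* always terminate, truncating if needed */
    }
    return ex_last_code;
}
```

This trades an extra call for clear ownership: the caller's copy cannot change underneath it, and the size is explicit rather than implied by a terminator.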

Fix ur.py syntax error

When attempting to import the ur.py module with ipython, the following syntax error occurred:

[ins] In [1]: import ur
Traceback (most recent call last):

  File ~/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3378 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In [1], line 1
    import ur

  File ~/Projects/oneapi-src/unified-runtime/include/ur.py:1815
    _urDeviceSelectBinary_t = WINFUNCTYPE( ur_result_t, ur_device_handle_t, POINTER(c_ubyte*), c_ulong, POINTER(c_ulong) )
                                                                                            ^
SyntaxError: invalid syntax

It looks like arguments of pointer-to-pointer type are not being handled correctly. There may be other syntax issues hidden by this error.

Fix the syntax errors, and make sure the ur.py module can, at the very least, be imported. Ideally this would also introduce CI testing to avoid regressions happening in future. More extensive testing, actually using the library, will probably have to wait though.

ur: add enqueue fill image

PI currently has an API function piEnqueueMemImageFill, which is analogous to the OpenCL function clEnqueueFillImage.

We should add this to Unified Runtime.

Remove ZER_DEVICE_INFO_PARENT_DEVICE

ZER_DEVICE_INFO_PARENT_DEVICE was presumably copied from CL_DEVICE_PARENT_DEVICE which has the following description:

Returns the cl_device_id of the parent device to which this sub-device belongs. If device is a root-level device, a NULL value is returned.

Unified Runtime doesn't have an API analogous to clCreateSubDevices and hence has no concept of a sub-device.

We should probably remove the ZER_DEVICE_INFO_PARENT_DEVICE enumeration value since it has no use.

should urEnqueueMemBufferCopy have offsets?

urEnqueueMemBufferCopy has the following declaration:

UR_APIEXPORT ur_result_t UR_APICALL urEnqueueMemBufferCopy(ur_queue_handle_t hQueue, ur_mem_handle_t hBufferSrc, ur_mem_handle_t hBufferDst, size_t size, uint32_t numEventsInWaitList, const ur_event_handle_t *eventWaitList, ur_event_handle_t *event)

It does not have offset parameters for the hBufferSrc and hBufferDst arguments. OpenCL has offsets in the analogous API:

cl_int clEnqueueCopyBuffer(
    cl_command_queue command_queue,
    cl_mem src_buffer,
    cl_mem dst_buffer,
    size_t src_offset,
    size_t dst_offset,
    size_t size,
    cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list,
    cl_event* event);

Should Unified Runtime have them too?
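For illustration, the following host-only model shows what offsets buy the caller: the ability to copy a sub-range of one buffer into an arbitrary position of another. The ex_buffer_t type and ex_buffer_copy function are hypothetical and only mimic the parameter order of clEnqueueCopyBuffer; they are not part of Unified Runtime.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of a buffer object. */
typedef struct {
    unsigned char *data;
    size_t size;
} ex_buffer_t;

/* Copies `size` bytes from src+srcOffset to dst+dstOffset.
 * Returns 0 on success, -1 if either range falls outside its buffer.
 * The checks are written to avoid overflow in offset + size. */
int ex_buffer_copy(const ex_buffer_t *src, ex_buffer_t *dst,
                   size_t srcOffset, size_t dstOffset, size_t size) {
    if (srcOffset > src->size || size > src->size - srcOffset) return -1;
    if (dstOffset > dst->size || size > dst->size - dstOffset) return -1;
    memmove(dst->data + dstOffset, src->data + srcOffset, size);
    return 0;
}
```

Without offsets, the same effect would require creating sub-buffers or copying whole buffers.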

Restructure Directory Hierarchy

Issues

Following discussions on #22, it has been decided that this repository will act as a monorepo for the various components of Unified Runtime: headers, spec, loader, utilities, adapters, tests, etc.

Since this repo currently only contains the spec and headers it will need to be logically restructured to host the other components in a sensible way.

Task

  1. Design and agree on a directory hierarchy for this repo, making sure it includes logical locations for the following components:
    • spec (and spec generation tools)
    • headers
    • loader
    • utility libraries (e.g. command buffer software implementation)
    • test suite
    • adapters
  2. Actually restructure the repo.

Fix "zer" naming

The fact that Unified Runtime is also referred to as "level zero runtime" and uses the zer.* prefix for entry points and objects in the API is very confusing considering it is not level zero and sits above it in the stack. This naming convention should be replaced with ur.*.

Consider interface to expose P2P capabilities

Some devices have peer to peer capabilities for memory transfers and/or USM buffers, we may need to consider a way to expose this in the unified runtime.

There's currently an extension proposal in DPC++ to handle this, with some discussion on it:

It's currently only a SYCL level extension but I suspect implementing it would require some changes in PI or UR, so we should consider it.

Remove first parameter from ur{Object}CreateWithNativeHandle APIs

The Problem

The various ur{Object}CreateWithNativeHandle APIs seem to take an unnecessary first argument of the type they are going to create.

Example:

urPlatformCreateWithNativeHandle has the following signature:

UR_APIEXPORT ur_result_t UR_APICALL
urPlatformCreateWithNativeHandle(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform instance
    ur_native_handle_t hNativePlatform,             ///< [in] the native handle of the platform.
    ur_platform_handle_t* phPlatform                ///< [out] pointer to the handle of the platform object created.
    );

It is unclear what the first hPlatform parameter means. This API is designed to create a ur_platform_handle_t from some native handle, so it is confusing that it also needs to accept one as an input parameter. It also doesn't match the corresponding API in PI, piextPlatformCreateWithNativeHandle:

__SYCL_EXPORT pi_result piextPlatformCreateWithNativeHandle(
    pi_native_handle nativeHandle, pi_platform *platform);

which has no first argument.

Some create-with-native APIs in PI accept a context (e.g. piextProgramCreateWithNativeHandle); these have been documented below. Some APIs in PI also have an argument for ownership which is missing in UR (although this is beyond the scope of this ticket).

Task

Remove these unnecessary arguments.

The full list of APIs are:

Allocation functions, memory transfers and context

We've been investigating changing the PI interface for memory allocations and also, to some extent, for memory transfers, which in turn changes some of the meaning of the PI context. A lot of the reasoning for these changes is based on how the SYCL DPC++ runtime currently works, but it would be good to consider them for the Unified Runtime.

The changes are:

  1. Add a pi_device argument to buffer and image allocation entry points (piMemBufferCreate, piMemImageCreate). It doesn't necessarily mean that the allocation will only be usable on that device, but it's helpful for backends that don't natively support context style allocations. For the DPC++ SYCL runtime this makes a lot of sense because we already do lazy allocation so when we call these functions we always already know the exact device targeted and not just the context (the SYCL context_bound property is not currently implemented in DPC++).
  2. Add a new query piextGetMemoryConnection that takes two pairs of (pi_device, pi_context), and returns information on how the memory can or should be handled between the two pairs. It currently has three options:
    • PI_MEMORY_CONNECTION_NONE: memory in the first (context, device) pair cannot be used or migrated by the plugin into the second (context, device) pair, copies through host are necessary.
    • PI_MEMORY_CONNECTION_MIGRATABLE: memory in the first (context, device) pair cannot be used directly by the second (context, device) pair, but the plugin can handle migrating data between the two (piEnqueueMemBufferCopy).
    • PI_MEMORY_CONNECTION_UNIFIED: memory in the first (context, device) pair is usable in the second pair.

With these two changes, a backend that doesn't natively support context-style allocations doesn't have to emulate them anymore, and can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. A device that does support context-style allocations can ignore the pi_device passed to the allocation functions and then simply report PI_MEMORY_CONNECTION_UNIFIED when the contexts are identical, and PI_MEMORY_CONNECTION_NONE when the contexts are different. In addition, plugins can inform us when they can optimize memory copies between different contexts by reporting PI_MEMORY_CONNECTION_MIGRATABLE, which would mean that piEnqueueMemBufferCopy is supported between the two contexts and may be more efficient than doing a copy through host.
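The two backend behaviours described above can be sketched as a pair of query functions. This is a simplified model under stated assumptions: contexts and devices are plain integers, all ex_ names are hypothetical, the enum values are illustrative, and the per-device backend here chooses to report no connection across contexts even though a real plugin could report MIGRATABLE there.

```c
#include <stdbool.h>

/* Mirrors the three options described above; numeric values are illustrative. */
typedef enum {
    EX_MEMORY_CONNECTION_NONE,
    EX_MEMORY_CONNECTION_MIGRATABLE,
    EX_MEMORY_CONNECTION_UNIFIED
} ex_memory_connection_t;

/* Hypothetical (context, device) pair identified by plain integers. */
typedef struct {
    int context;
    int device;
} ex_ctx_dev_t;

/* What a backend with native context-style allocations might report:
 * unified within one context, no connection across contexts. */
ex_memory_connection_t ex_context_backend_query(ex_ctx_dev_t a, ex_ctx_dev_t b) {
    return a.context == b.context ? EX_MEMORY_CONNECTION_UNIFIED
                                  : EX_MEMORY_CONNECTION_NONE;
}

/* What a per-device backend might report: unified only on the same device,
 * migratable between devices of one context, none otherwise (an assumption). */
ex_memory_connection_t ex_device_backend_query(ex_ctx_dev_t a, ex_ctx_dev_t b) {
    if (a.context != b.context) return EX_MEMORY_CONNECTION_NONE;
    if (a.device == b.device) return EX_MEMORY_CONNECTION_UNIFIED;
    return EX_MEMORY_CONNECTION_MIGRATABLE;
}

/* Runtime-side decision: does the data need an explicit copy through host? */
bool ex_needs_host_copy(ex_memory_connection_t c) {
    return c == EX_MEMORY_CONNECTION_NONE;
}
```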

And so to circle back to the initial motivation, CUDA doesn't have context-style memory allocations like OpenCL or PI, and so to support having multiple CUDA devices in the same pi_context we would have to roll out our own memory manager in the CUDA plugin (which I believe the LevelZero plugin also does), but since the SYCL runtime already has a memory manager, these PI plugin changes allow us to simply defer the management of memory allocations within the same context for the CUDA plugin to the SYCL runtime.

You can see more discussions and initial implementations of this on the following PR:

Add piEventSetCallback entry point

Issue

PI supports a piEventSetCallback entry point. Although this entry point has no uses in the DPC++ runtime, it is quite a useful API for compute runtimes to support, since it allows the callback to trigger work when a command or set of commands completes.

Unified Runtime currently has no urEventSetCallback entry point.

Task

Add urEventSetCallback to the Unified Runtime spec.

Missing _ur_event_status enumeration

Unified Runtime is missing the enumeration of values for querying and setting event statuses.

PI has the following (which matches OpenCL):

typedef enum {
  PI_EVENT_COMPLETE = 0x0,
  PI_EVENT_RUNNING = 0x1,
  PI_EVENT_SUBMITTED = 0x2,
  PI_EVENT_QUEUED = 0x3
} _pi_event_status;

Unified Runtime already has the ability to query this property via urEventGetInfo with the UR_EVENT_INFO_EVENT_INFO_COMMAND_EXECUTION_STATUS query, so we should add an enumeration for the values the returned argument can be set to.
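A possible shape for such an enumeration is sketched below. The type and enumerator names are assumptions made for illustration; only the numeric values are taken from the PI enumeration quoted above. The helper ex_event_status_name is likewise hypothetical.

```c
/* Hypothetical UR counterpart of _pi_event_status; the names are assumptions,
 * the values mirror the PI enumeration quoted above. */
typedef enum ur_event_status_t {
    UR_EVENT_STATUS_COMPLETE = 0x0,
    UR_EVENT_STATUS_RUNNING = 0x1,
    UR_EVENT_STATUS_SUBMITTED = 0x2,
    UR_EVENT_STATUS_QUEUED = 0x3
} ur_event_status_t;

/* Returns a printable name for a status value, or "UNKNOWN". */
const char *ex_event_status_name(ur_event_status_t status) {
    switch (status) {
    case UR_EVENT_STATUS_COMPLETE:  return "COMPLETE";
    case UR_EVENT_STATUS_RUNNING:   return "RUNNING";
    case UR_EVENT_STATUS_SUBMITTED: return "SUBMITTED";
    case UR_EVENT_STATUS_QUEUED:    return "QUEUED";
    }
    return "UNKNOWN";
}
```

Keeping the values identical to PI and OpenCL would let adapters pass the native status through without translation.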

Missing _ur_mem_info_t enumerations.

Currently in UR we have an enumeration _ur_mem_info_t. However, this relates to USM allocation. We should rename it to _ur_mem_alloc_info_t, which will be used in the corresponding query urMemGetMemAllocInfo, similar to how it is named in PI (_pi_mem_alloc_info).

For the urMemGetInfo entry point we should define a _ur_mem_info_t enumeration in order to complete #62. Equivalent in PI.

PI only has two enumerations:

  • PI_MEM_CONTEXT
  • PI_MEM_SIZE

OpenCL has several more options. TBD - how many of these we should add.

Both queries appear to be needed as they are both used by the DPC++ runtime.
piGetMemInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/sycl_mem_obj_t.cpp/#L49
piextUSMGetMemAllocInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/usm/usm_impl.cpp/#L534

Redesign entry points taking in/out parameter

Issue

Some Unified Runtime entry points are taking in/out parameters as pointers and treating the argument as an in or out parameter depending on the value at the address held in the pointer. Whilst this approach does allow the spec to reduce the number of arguments passed to these APIs by one, the semantics is confusing especially when coming from PI and OpenCL where the equivalent APIs have dedicated out parameters. It is also error prone if the caller accidentally leaves the value uninitialized which could result in indeterminate behavior at runtime.

Note that this pattern seems to have been adopted inconsistently throughout the UR spec: some entry points use it and some use the PI/OpenCL style (see below).

Example

urDeviceGet has the following documentation:

urDeviceGet(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform instance
    ur_device_type_t DevicesType,                   ///< [in] the type of the devices.
    uint32_t* pCount,                               ///< [in,out] pointer to the number of devices.
                                                    ///< If count is zero, then the call shall update the value with the total
                                                    ///< number of devices available.
                                                    ///< If count is greater than the number of devices available, then the
                                                    ///< call shall update the value with the correct number of devices available.
    ur_device_handle_t* phDevices                   ///< [out][optional][range(0, *pCount)] array of handle of devices.
                                                    ///< If count is less than the number of devices available, then platform
                                                    ///< shall only retrieve that number of devices.
    );

where pCount is treated as an out parameter if the value at that address is zero; otherwise it is the size of the array that will be passed via the phDevices parameter.

Compare this to the piDevicesGet entry point:

__SYCL_EXPORT pi_result piDevicesGet(pi_platform platform,
                                     pi_device_type device_type,
                                     pi_uint32 num_entries, pi_device *devices,
                                     pi_uint32 *num_devices);

This has a dedicated num_entries parameter representing the size of the array passed in devices, and a num_devices parameter which returns to the caller the number of devices of type device_type in the platform.

This is also the style used in OpenCL with clGetDeviceIDs:

cl_int clGetDeviceIDs(
    cl_platform_id platform,
    cl_device_type device_type,
    cl_uint num_entries,
    cl_device_id* devices,
    cl_uint* num_devices);

Other APIs such as urPlatformGetInfo unconditionally treat parameters as in/out updating the value at the address passed after first reading it:

UR_APIEXPORT ur_result_t UR_APICALL
urPlatformGetInfo(
    ur_platform_handle_t hPlatform,                 ///< [in] handle of the platform
    ur_platform_info_t PlatformInfoType,            ///< [in] type of the info to retrieve
    size_t* pSize,                                  ///< [in,out] pointer to the number of bytes needed to return info queried.
                                                    ///< the call shall update it with the real number of bytes needed to
                                                    ///< return the info
    void* pPlatformInfo                             ///< [out][optional] array of bytes holding the info.
                                                    ///< if *pSize is not equal to the real number of bytes needed to return
                                                    ///< the info then the ::UR_RESULT_ERROR_INVALID_SIZE error is returned and
                                                    ///< pPlatformInfo is not used.
    );

In this case the value at pSize is unconditionally updated by the entry point however its value as an in parameter is only used in the case that pPlatformInfo is non-null.
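The PI/OpenCL style with dedicated parameters can be sketched as follows. This is a self-contained model with a hypothetical fixed device list; ex_device_get and its integer device ids are stand-ins and do not reflect the real urDeviceGet signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed device list standing in for a real platform. */
static const int ex_devices[] = { 10, 11, 12 };
#define EX_DEVICE_COUNT ((uint32_t)(sizeof ex_devices / sizeof ex_devices[0]))

/*
 * numEntries:  [in]  capacity of phDevices (ignored when phDevices is NULL)
 * phDevices:   [out][optional] receives up to numEntries device ids
 * pNumDevices: [out][optional] receives the total number of devices
 * Returns 0 on success, -1 if phDevices is non-NULL but numEntries is zero.
 */
int ex_device_get(uint32_t numEntries, int *phDevices, uint32_t *pNumDevices) {
    if (phDevices != NULL && numEntries == 0) return -1;
    if (pNumDevices != NULL) *pNumDevices = EX_DEVICE_COUNT;
    if (phDevices != NULL) {
        uint32_t n = numEntries < EX_DEVICE_COUNT ? numEntries : EX_DEVICE_COUNT;
        for (uint32_t i = 0; i < n; ++i) phDevices[i] = ex_devices[i];
    }
    return 0;
}
```

The caller first passes NULL for phDevices to learn the count, then calls again with an array of that size. Neither call reads an uninitialized value through a pointer, which is the failure mode the in/out style invites.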

Task

Replace the in/out parameters with dedicated in and out parameters for the following entry points:

Redefinition of typedefs in ur_api.h

Issue

The ur_api.h header redefines typedefs. Redefinition of typedefs is a C11 feature. Presumably we want to support earlier C versions.

Example

To reproduce, compile the ur_api.h header on Linux with -Werror and -Wtypedef-redefinition. This should give something like:

ur_api.h:274:39: error: redefinition of typedef 'ur_base_properties_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_properties_t ur_base_properties_t;
                                      ^
ur_api.h:241:3: note: previous definition is here
} ur_base_properties_t;
  ^
/source/ur/external/unified-runtime/include/ur_api.h:278:33: error: redefinition of typedef 'ur_base_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_desc_t ur_base_desc_t;
                                ^
ur_api.h:250:3: note: previous definition is here
} ur_base_desc_t;
  ^
ur_api.h:282:35: error: redefinition of typedef 'ur_rect_offset_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_offset_t ur_rect_offset_t;
                                  ^
ur_api.h:260:3: note: previous definition is here
} ur_rect_offset_t;
  ^
ur_api.h:286:35: error: redefinition of typedef 'ur_rect_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_region_t ur_rect_region_t;
                                  ^
ur_api.h:270:3: note: previous definition is here
} ur_rect_region_t;
  ^
ur_api.h:1702:3: error: redefinition of typedef 'ur_image_format_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_format_t;
  ^
ur_api.h:290:36: note: previous definition is here
typedef struct _ur_image_format_t ur_image_format_t;
                                   ^
ur_api.h:1720:3: error: redefinition of typedef 'ur_image_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_desc_t;
  ^
ur_api.h:294:34: note: previous definition is here
typedef struct _ur_image_desc_t ur_image_desc_t;
                                 ^
ur_api.h:1848:3: error: redefinition of typedef 'ur_buffer_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_buffer_region_t;
  ^
ur_api.h:298:37: note: previous definition is here
typedef struct _ur_buffer_region_t ur_buffer_region_t;
                                    ^
ur_api.h:2211:3: error: redefinition of typedef 'ur_sampler_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_sampler_property_value_t;
  ^
ur_api.h:302:46: note: previous definition is here
typedef struct _ur_sampler_property_value_t ur_sampler_property_value_t;
                                             ^
ur_api.h:2816:3: error: redefinition of typedef 'ur_device_partition_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_device_partition_property_value_t;
  ^
ur_api.h:306:55: note: previous definition is here
typedef struct _ur_device_partition_property_value_t ur_device_partition_property_value_t;

Task

Remove the redefined typedefs for the above types.
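One way to avoid the redefinition, shown here as a minimal sketch with a hypothetical type name, is to emit the typedef once as a forward declaration and later complete the struct without repeating the typedef.

```c
#include <stdint.h>

/* Forward declaration: the one and only typedef for this name. */
typedef struct ex_rect_offset_t ex_rect_offset_t;

/* Later definition: complete the struct without repeating the typedef,
 * so the header stays valid for pre-C11 compilers. */
struct ex_rect_offset_t {
    uint64_t x;
    uint64_t y;
    uint64_t z;
};

/* The typedef name works wherever the complete type is visible. */
uint64_t ex_rect_offset_sum(const ex_rect_offset_t *offset) {
    return offset->x + offset->y + offset->z;
}
```

The alternative is to drop the forward declarations entirely, which works as long as no declaration in the header uses a type before its definition.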

Clarify UR_PLATFORM_INIT_FLAG_LEVEL_ZERO platform init flag

urInit accepts a ur_platform_init_flags_t platform_flags argument. Currently the only valid value for this flag is UR_PLATFORM_INIT_FLAG_LEVEL_ZERO.

The documentation for this flag isn't particularly clear:

initialize Unified Runtime platform drivers

But its name makes it sound like it is specific to a level zero adapter (although perhaps this is just an artifact of Unified Runtime previously being called Level Zero Runtime?)

Either way, we should either rename this value, or clarify its meaning.

Update CMakeLists.txt to better support FetchContent

As noted in #29, downstream projects want to use FetchContent to pull unified-runtime into their builds. Currently the CMakeLists.txt files do not follow best practices for use with FetchContent, e.g. no targets are exported.

We should properly support FetchContent, and include usage examples in README.md. The main use case, for this initial support, should be to provide a target which only provides the include path for the header files with all other parts of the build disabled.

Reconsider urKernelSetArg

urKernelSetArg appears to have been heavily influenced by the clSetKernelArg OpenCL API.

The clSetKernelArg entry point is problematic for two reasons:

  1. It is one of the only non-thread-safe APIs in OpenCL. This behavior seems to have been inherited by urKernelSetArg:
     * The application must not call this function from simultaneous threads with the same kernel handle.
     * The implementation of this function should be lock-free.
  2. It only sets one argument at a time.

We should be careful about repeating history here. First, the API should be thread safe unless there is a valid reason it can't be. Second, we should support an API that sets multiple kernel arguments at once, since this may be more efficient on some platforms.
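As a rough sketch of the second point, a batched entry point could take an array of argument descriptors. Everything here is hypothetical (ex_kernel_arg_t, ex_kernel_t, ex_kernel_set_args, and the fixed limits), and the model stores arguments in host memory only. It does not address thread safety; a real implementation would need to synchronize access to the kernel object.

```c
#include <stddef.h>
#include <string.h>

#define EX_MAX_ARGS 4
#define EX_MAX_ARG_SIZE 16

/* Hypothetical descriptor for one kernel argument. */
typedef struct {
    unsigned index;     /* argument position */
    size_t size;        /* size of the value in bytes */
    const void *pValue; /* pointer to the value to copy */
} ex_kernel_arg_t;

/* Hypothetical host-only kernel object that stores argument bytes. */
typedef struct {
    unsigned char storage[EX_MAX_ARGS][EX_MAX_ARG_SIZE];
    size_t sizes[EX_MAX_ARGS];
} ex_kernel_t;

/* Validates every descriptor first, then applies them all, so a failing
 * call leaves the kernel unchanged. Returns 0 on success, -1 on error. */
int ex_kernel_set_args(ex_kernel_t *kernel, size_t numArgs,
                       const ex_kernel_arg_t *pArgs) {
    for (size_t i = 0; i < numArgs; ++i) {
        if (pArgs[i].index >= EX_MAX_ARGS || pArgs[i].size > EX_MAX_ARG_SIZE ||
            pArgs[i].pValue == NULL) {
            return -1;
        }
    }
    for (size_t i = 0; i < numArgs; ++i) {
        memcpy(kernel->storage[pArgs[i].index], pArgs[i].pValue, pArgs[i].size);
        kernel->sizes[pArgs[i].index] = pArgs[i].size;
    }
    return 0;
}
```

Validating before applying gives all-or-nothing behaviour, which is easier to reason about than a call that fails halfway through a list of arguments.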

Missing device queries

The following list of device queries are present in PI and missing in Unified Runtime. If we are to have parity with PI we should consider whether we need to add these queries:

  • PI_DEVICE_INFO_DEVICE_ID
  • PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE
  • PI_DEVICE_INFO_BUILD_ON_SUBDEVICE
  • PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY
  • PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE
  • PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH
  • PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES
  • PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU
  • PI_DEVICE_INFO_BACKEND_VERSION
  • PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D
  • PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER

Start a Migration Guide

As part of Unified Runtime we need to provide a migration guide documenting the mapping from the PI plugin API to UR. This should consist of a set of mappings between the various entry points, types, enumerations, etc., and any divergences in semantics.

i.e. something like

Functions

PI               UR               Differences
piDeviceGetInfo  urDeviceGetInfo  None
...              ...              ...
...              ...              ...

Types

PI          UR                   Differences
pi_context  ur_context_handle_t  None
...         ...                  ...
...         ...                  ...

Enumerations

PI              UR              Differences
pi_device_info  ur_device_info  (list of removed queries)
...             ...             ...
...             ...             ...

The initial proposal is to do this in markdown and have a living document in this repo that can be updated as spec changes are made.

Reconsider urEventCreate

urEventCreate appears to be inspired by the clCreateUserEvent API.

OpenCL user events can be painful to implement since they essentially require bidirectional host-device synchronization, which may not be something a platform can efficiently support. We should consider whether user events are really required in the Unified Runtime API and if not remove this entry point.

Decide if urEventSetStatus entry point is required

Issue

PI supports a piEventSetStatus entry point. This entry point doesn't seem to have any uses in the DPC++ runtime and would only really be of value if we intend to support user events which is something we've previously discussed not including in the Unified Runtime spec due to the burdensome device <-> host synchronization requirement they place on an adapter.

Task

Decide if a urEventSetStatus entry point is required in the Unified Runtime spec. If it is, add it; if it is not, no action is required.

Consider relaxing ur_module_handle_t SPIR-V requirement

There seems to be an implicit requirement that ur_module_handle_t objects are SPIR-V modules. From the urProgramCreate entry point:

Create Program from input SPIR-V modules. 

Do we want to restrict ur_module_handle_t objects to contain SPIR-V only? Or is it possible that higher level language runtimes sitting on top of Unified Runtime could feed it non-SPIR-V intermediate representations for compilation to target-specific executables in the adapter via the urModuleCreate -> urProgramCreate path?

Missing device info enumeration values

The _pi_device_info enumeration in PI has the following enumeration values which do not have equivalents in Unified Runtime:

  • PI_DEVICE_INFO_DEVICE_ID - used here
  • PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE - used here
  • PI_DEVICE_INFO_BUILD_ON_SUBDEVICE - used here
  • PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY - used here
  • PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE - used here
  • PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH - used here
  • PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES - used here
  • PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU - used here
  • PI_DEVICE_INFO_BACKEND_VERSION - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16 - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D - used here
  • PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER - used here

We should add these to the UR spec.

Stop using _<name> reserved identifiers

Unified Runtime appears to have followed the example of OpenCL and used _<name> in typedefs of the opaque API objects e.g. _ur_module_handle_t.

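This is problematic in both C and C++: identifiers beginning with an underscore are reserved for the implementation at file (global) scope, and identifiers beginning with an underscore followed by an uppercase letter or a second underscore are reserved everywhere.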

We should rename the structs in this pattern.
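A minimal sketch of one non-reserved naming scheme follows. The trailing-underscore tag is just one possible convention, chosen here as an assumption; the ex_ names are hypothetical and the issue does not prescribe a replacement pattern.

```c
#include <stddef.h>

/* Opaque handle: callers only ever see a pointer to an incomplete type.
 * The tag has no leading underscore, so it is not a reserved identifier. */
typedef struct ex_module_handle_t_ *ex_module_handle_t;

/* The adapter's translation unit would complete the type. */
struct ex_module_handle_t_ {
    size_t length;
};

/* Example accessor showing the handle in use. */
size_t ex_module_length(ex_module_handle_t hModule) {
    return hModule->length;
}
```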

`urPlatformGet` & `urDeviceGet` should return valid error code.

urPlatformGet currently says that "If phPlatforms is not NULL, then NumEntries must be greater than zero." However, we do not specify what should be returned when this requirement is violated.

We should specify that urPlatformGet and urDeviceGet will return UR_RESULT_ERROR_INVALID_SIZE or something similar.

Consider semantic parameter verification/valid usage for arguments

Unified Runtime currently supports parameter verification for some arguments based on their type. For example, pointer and handle (which is technically still a pointer) arguments are automatically checked against NULL. The generation of the error language in the spec appears to be automatic and based only on the type of the argument.

(Aside: In some cases this is too strict, e.g. the out parameter for the event argument on enqueue APIs should probably be optional: https://github.com/oneapi-src/unified-runtime/blob/main/include/zer_api.h#L859.)

More generally, we should consider whether (like OpenCL) we want to have verification for the semantics of arguments where appropriate e.g. checking that a read/write command will not overflow a buffer. In OpenCL the clEnqueue(Read|Write)Buffer APIs require that the entry point returns:

CL_INVALID_VALUE if the region being read or written specified by (offset, size) is out of bounds or if ptr is a NULL value.

We need to decide whether we want parameter verification of this kind since in total for all the APIs there will be a lot of it. We will also need to write negative tests for all these cases once we have a test suite.

Missing context info enum values

  • PI_CONTEXT_INFO_PLATFORM - Useful information, used by some backends to make certain decisions (device->platform->name). But the same can be done with urDeviceGetInfo(UR_DEVICE_INFO_PLATFORM)->urGetPlatformInfo(PLATFORM_NAME), so we can skip this.
  • PI_CONTEXT_INFO_PROPERTIES - Can't find any usage.
  • PI_CONTEXT_INFO_REFERENCE_COUNT - Can't find any usage of this either. UR has the concept of context acquire/release, but no way to query the reference count. This is inconsistent: devices, which also have acquire/release, have UR_DEVICE_INFO_REFERENCE_COUNT. We can add it for consistency.
  • PI_CONTEXT_INFO_ATOMIC_MEMORY_[ORDER|SCOPE]_CAPABILITIES - Used to populate SYCL device info for order/scope capabilities. Should they be moved to device info instead of context?

Consider typedefing ur_bool

Any boolean values passed to APIs in Unified Runtime are currently of bool type. In C, bool only has to be large enough to hold the values true and false; other than that there are no restrictions on its size.

Other compute frameworks like OpenCL typically define their own typedefs of the various integer types in C to avoid parameters to entry points having different sizes. For example, OpenCL defines typedef cl_uint cl_bool (where cl_uint is always 32 bits).

PI defines a pi_bool type and the global values PI_TRUE/PI_FALSE for comparisons. A ur_bool_t type has also been added to Unified Runtime (although it doesn't appear to be used in any entry points).

We should consider whether we want to replace bool usage with ur_bool_t in Unified Runtime and define some global UR_TRUE/UR_FALSE values.

Alternatively we could follow the Vulkan example of using the predefined platform independent C types defined in stdint.h and replace bool usage with a type of known width such as int8_t and specify non-zero values as truthy.

This would also address the issues raised in: #3
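For illustration, a fixed-width boolean along the lines discussed above might look like the following. The names and the choice of uint32_t (mirroring cl_bool) are assumptions, not the existing ur_bool_t definition.

```c
#include <stdint.h>

/* Hypothetical fixed-width boolean type: the same size on every platform. */
typedef uint32_t example_bool_t;

#define EXAMPLE_TRUE ((example_bool_t)1)
#define EXAMPLE_FALSE ((example_bool_t)0)

/* Treat any non-zero value as true, so callers passing values other than
 * EXAMPLE_TRUE are still interpreted consistently. */
static inline int example_is_true(example_bool_t value) {
    return value != EXAMPLE_FALSE;
}
```

The int8_t variant from the Vulkan-style alternative would look the same apart from the underlying type.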

Make Parameter Naming Consistent

In general the spec appears to be following the convention that pointer parameters are given the prefix p and opaque UR API object handle parameters are given the prefix h.

For example in the urDeviceGet entry point:

UR_APIEXPORT ur_result_t UR_APICALL urDeviceGet(ur_platform_handle_t hPlatform, ur_device_type_t DevicesType, uint32_t *pCount, ur_device_handle_t *phDevices)

the platform parameter of type ur_platform_handle_t has the name hPlatform and the device count (which is an out parameter of type uint32_t*) is called pCount.

However this convention doesn't appear to have been followed consistently throughout the spec.

For example urEventCreate takes a context parameter of type ur_context_handle_t with the name context rather than hContext: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/event.yml#L52

And the urEnqueueUSMMemcpy entry point takes an out parameter of type ur_event_handle_t* called eventWaitList not pEventWaitList: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/enqueue.yml#L963

In general these typos show up in quite a lot of places. We should do a pass through the spec and make them consistent.

Add cmake machinery to regenerate generated files as part of build when generator sources change

Currently we commit the ur_api.h header to this repo. ur_api.h (and the other generated files not currently committed) needs to be manually regenerated any time the yaml source files for the generator, or the generator scripts themselves, are updated.

It would be useful to have a build dependency on the generator sources. Anyone who changes those files and uses the CMake targets to integrate the headers into their build (perhaps in a downstream project) would then not need to keep track of when to regenerate the headers. It would just happen automatically as part of their build whenever any of the sources change.

I think in CMake this could be done with the use of add_custom_command and add_custom_target.
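A rough sketch of how that could fit together. The script path, the glob patterns and the target name are all hypothetical, and the generator's real command line is not shown here, so the COMMAND line would need adjusting to the actual scripts.

```cmake
find_package(Python3 REQUIRED COMPONENTS Interpreter)

# Hypothetical locations; adjust to the real repository layout.
set(UR_GENERATOR_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/scripts/generate.py)
set(UR_API_HEADER ${CMAKE_CURRENT_SOURCE_DIR}/include/ur_api.h)

# Every yaml spec file and generator script counts as an input.
file(GLOB_RECURSE UR_GENERATOR_INPUTS CONFIGURE_DEPENDS
  ${CMAKE_CURRENT_SOURCE_DIR}/scripts/*.yml
  ${CMAKE_CURRENT_SOURCE_DIR}/scripts/*.py)

# Re-run the generator whenever any input is newer than the header.
# The generator's arguments (if any) are omitted from this sketch.
add_custom_command(
  OUTPUT ${UR_API_HEADER}
  COMMAND ${Python3_EXECUTABLE} ${UR_GENERATOR_SCRIPT}
  DEPENDS ${UR_GENERATOR_INPUTS}
  WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
  COMMENT "Regenerating ur_api.h from the generator sources"
  VERBATIM)

# A named target that other targets can depend on.
add_custom_target(generate-ur-api DEPENDS ${UR_API_HEADER})
```

Targets that consume the header would then call add_dependencies(<target> generate-ur-api) so the regeneration step runs before they are compiled.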
