oneapi-src / unified-runtime
Home Page: https://oneapi-src.github.io/unified-runtime/
License: Other
PI supports a piEnqueueNativeKernel entry point. This entry point doesn't seem to have any uses in the DPC++ runtime, so its value is questionable from the perspective of implementing SYCL; however, it may be of use to other language runtimes.
Decide if a urEnqueueNativeKernel entry point is required in the Unified Runtime spec. If it is, add it; if it is not, no action is required.
Command buffers are a tried and tested solution to scheduling bottlenecks when dispatching commands to hardware (see Vulkan, Level Zero, Metal and cl_khr_command_buffer). Unified Runtime is low level enough that it could support command buffers as a core feature (potentially replacing the regular enqueue APIs completely). This has been raised in working group discussions and was met with a generally favorable response.
The urProgramCreate API seems to mandate that ur_module_handle_t objects are SPIR-V modules:
Create Program from input SPIR-V modules.
However we don't specify which SPIR-V dialect we can consume. It should probably either reference the OpenCL environment spec or a new Unified Runtime dialect.
Currently any pointer argument to an entry point in the API automatically gets an error condition stating that the API should return UR_RESULT_ERROR_INVALID_NULL_POINTER.
Whilst this makes sense for most pointer parameters, some are optional, e.g. return events. We should generalize this logic to allow for optional pointer arguments.
intel/llvm#7370 introduces support for the sycl_ext_oneapi_memcpy2d extension by adding 3 entry points to PI: piextUSMEnqueueFill2D, piextUSMEnqueueMemset2D, and piextUSMEnqueueMemcpy2D, along with supporting enumerations.
Add support to the Unified Runtime spec; the relevant changes are in the sycl/include/sycl/detail/pi.h header.
PI supports a piMemGetInfo entry point which has usage in the DPC++ runtime.
Unified Runtime currently doesn't have a urMemGetInfo entry point.
Add a urMemGetInfo entry point to the Unified Runtime spec.
The API spec has references to nullptr (e.g. in error conditions), which is a C++ specific feature. As a C API, Unified Runtime probably shouldn't make reference to nullptr; we should replace these references with NULL.
urModuleCreate is declared as:
UR_APIEXPORT ur_result_t UR_APICALL urModuleCreate(ur_context_handle_t hContext, const void *pIL, uint32_t length, const char *pOptions, void **pfnNotify, void *pUserData, ur_module_handle_t *phModule)
It accepts a pfnNotify argument that is supposed to be called when program compilation is complete. This is confusing for two reasons:
1. urModuleCreate doesn't have to do any compilation (it seems to just be an in memory representation of a SPIR-V string program, although maybe this is a misunderstanding); compilation may be handled further down the pipeline when you create the program, so why pass a callback here? In OpenCL, clBuildProgram takes a callback to be called when the program is built. clBuildProgram takes a cl_program object that has already been created via clCreateProgramWith(Source|IL|Binary). The clCreateProgramWith.* APIs do not take a callback; if these are the analogous entry points, should urModuleCreate take a callback?
2. pfnNotify is void **, which is not a function pointer. This is probably a typo and should be void (*pfnNotify) or void * (*pfnNotify), although the latter is unlikely since callbacks don't often return values, so the second * may also be a typo.
When the spec is updated the API header should also be updated. Add a GitHub Action workflow to ensure PRs update the header when the spec is changed.
The PI plugin layer has an API for querying the last error from a plugin implementation/adapter e.g. https://github.com/intel/llvm/blob/sycl/sycl/plugins/opencl/pi_opencl.cpp#L87. To get parity with PI we will need to add something like this to unified runtime.
Initially we could just have an API that matches that in PI i.e.
ur_result_t zerGetLastError(char **message);
where the returned value is the last reported error code and message is an out parameter which is set to point at some adapter specific string containing a detailed message about the context of the error value.
This approach isn't ideal. The memory returned is owned by the adapter, so any caller would need to make a copy of the returned string if they want it to remain consistent. Concurrent accesses may also be problematic, and in the above declaration there is no way to get the size of the memory without assuming the string is null terminated. These things would all need to be documented in the spec. In general though this is a tricky problem to solve with a C API.
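One possible way around the ownership and size concerns is the two-call pattern common in OpenCL-style query APIs, where the caller owns the buffer and the entry point reports how many bytes are needed. The following is a minimal sketch under that assumption; every name in it (example_get_last_error, example_set_last_error, the result codes) is hypothetical and not part of PI or Unified Runtime, and it does not attempt to solve the concurrency problem.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
typedef enum {
    EXAMPLE_RESULT_SUCCESS = 0,
    EXAMPLE_RESULT_ERROR_INVALID_SIZE = 1
} example_result_t;

/* Adapter-side storage for the most recent error (plain statics, so this
 * sketch is NOT thread safe). */
static example_result_t g_last_code = EXAMPLE_RESULT_SUCCESS;
static char g_last_message[256] = "";

/* Record an error; the message is truncated to fit the internal buffer. */
static void example_set_last_error(example_result_t code, const char *message) {
    g_last_code = code;
    strncpy(g_last_message, message, sizeof(g_last_message) - 1);
    g_last_message[sizeof(g_last_message) - 1] = '\0';
}

/* Copy the last error message into caller-owned memory.
 *  - pSizeRet (optional) receives the bytes required, including the terminator.
 *  - pMessage (optional) receives the string when `size` is large enough.
 *  - pCode    (optional) receives the last recorded error code.
 * Returns INVALID_SIZE if a buffer was supplied but is too small. */
static example_result_t example_get_last_error(size_t size, char *pMessage,
                                               size_t *pSizeRet,
                                               example_result_t *pCode) {
    size_t required = strlen(g_last_message) + 1;
    if (pSizeRet) *pSizeRet = required;
    if (pCode) *pCode = g_last_code;
    if (pMessage) {
        if (size < required) return EXAMPLE_RESULT_ERROR_INVALID_SIZE;
        memcpy(pMessage, g_last_message, required);
    }
    return EXAMPLE_RESULT_SUCCESS;
}
```

A caller would first pass a NULL buffer to learn the required size, allocate, then call again to copy the message. The trade-off is an extra call in exchange for the caller never holding a pointer into adapter-owned memory.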
When attempting to import the ur.py module with ipython, the following syntax error occurred:
[ins] In [1]: import ur
Traceback (most recent call last):
File ~/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3378 in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
Cell In [1], line 1
import ur
File ~/Projects/oneapi-src/unified-runtime/include/ur.py:1815
_urDeviceSelectBinary_t = WINFUNCTYPE( ur_result_t, ur_device_handle_t, POINTER(c_ubyte*), c_ulong, POINTER(c_ulong) )
^
SyntaxError: invalid syntax
It looks like arguments which are pointer-to-pointer types are not being handled correctly. There may be other syntax issues hidden by this error as well.
Fix the syntax errors, and make sure the ur.py module can, at the very least, be imported. Ideally this would also introduce CI testing to avoid regressions in future. More extensive testing, actually using the library, will probably have to wait though.
PI currently has an API function piEnqueueMemImageFill which is analogous to the OpenCL version clEnqueueFillImage.
We should add this to Unified Runtime.
Some enumeration values have the same substring repeated twice, e.g. UR_EVENT_INFO_EVENT_INFO_COMMAND_QUEUE. We should figure out why this is happening and stop it; the duplicated name doesn't add any information and just makes things verbose.
ZER_DEVICE_INFO_PARENT_DEVICE was presumably copied from CL_DEVICE_PARENT_DEVICE which has the following description:
Returns the cl_device_id of the parent device to which this sub-device belongs. If device is a root-level device, a NULL value is returned.
Unified Runtime doesn't have an API analogous to clCreateSubDevices and hence has no concept of a sub-device.
We should probably remove the ZER_DEVICE_INFO_PARENT_DEVICE enumeration value since it has no use.
urEnqueueMemBufferCopy has the following declaration:
UR_APIEXPORT ur_result_t UR_APICALL urEnqueueMemBufferCopy(ur_queue_handle_t hQueue, ur_mem_handle_t hBufferSrc, ur_mem_handle_t hBufferDst, size_t size, uint32_t numEventsInWaitList, const ur_event_handle_t *eventWaitList, ur_event_handle_t *event)
It does not have offset parameters for the hBufferSrc and hBufferDst arguments. OpenCL has offsets in the analogous API:
cl_int clEnqueueCopyBuffer(
cl_command_queue command_queue,
cl_mem src_buffer,
cl_mem dst_buffer,
size_t src_offset,
size_t dst_offset,
size_t size,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event);
Should Unified Runtime have them too?
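To make the effect of the offset parameters concrete, here is a small host-only sketch of a copy that takes a source and destination offset and rejects out-of-bounds regions. It is an illustration under stated assumptions, not the Unified Runtime API: example_buffer_t stands in for a memory handle, and example_buffer_copy and the status codes are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for a buffer object. */
typedef struct {
    unsigned char *data;
    size_t size;
} example_buffer_t;

typedef enum {
    EXAMPLE_SUCCESS = 0,
    EXAMPLE_ERROR_INVALID_VALUE = 1
} example_status_t;

/* True when [offset, offset + size) lies inside the buffer.
 * Written to avoid overflow in offset + size. */
static int example_range_ok(const example_buffer_t *b, size_t offset, size_t size) {
    return offset <= b->size && size <= b->size - offset;
}

/* Copy `size` bytes from src at srcOffset to dst at dstOffset. */
static example_status_t example_buffer_copy(const example_buffer_t *src,
                                            example_buffer_t *dst,
                                            size_t srcOffset, size_t dstOffset,
                                            size_t size) {
    if (!example_range_ok(src, srcOffset, size) ||
        !example_range_ok(dst, dstOffset, size)) {
        return EXAMPLE_ERROR_INVALID_VALUE;
    }
    /* memmove tolerates src and dst referring to overlapping storage. */
    memmove(dst->data + dstOffset, src->data + srcOffset, size);
    return EXAMPLE_SUCCESS;
}
```

Without offsets, a caller wanting to copy a sub-range would have to create sub-buffers first, which is the extra step the OpenCL signature avoids.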
PI has a piProgramGetBuildInfo entry point which has uses in the DPC++ runtime.
For Unified Runtime to replace PI as the API on top of which DPC++ is implemented we will need an equivalent urProgramGetBuildInfo entry point.
Define a urProgramGetBuildInfo entry point and add it to the spec.
Following discussions on #22 it has been decided that this repository will act as a monorepo for the various components of Unified Runtime: headers, spec, loader, utilities, adapters, tests etc.
Since this repo currently only contains the spec and headers it will need to be logically restructured to host the other components in a sensible way.
The fact that Unified Runtime is also referred to as "level zero runtime" and uses the zer.* prefix for entry points and objects in the API is very confusing considering it is not Level Zero and sits above it in the stack. This naming convention should be replaced with ur.*.
PI contains an API function piQueueFinish which has not yet been added to Unified Runtime. We should add this to have feature parity with PI.
https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/pi.h#L1126
There is an equivalent OpenCL function: https://registry.khronos.org/OpenCL/sdk/1.0/docs/man/xhtml/clFinish.html
Some devices have peer to peer capabilities for memory transfers and/or USM buffers, we may need to consider a way to expose this in the unified runtime.
There's currently an extension proposal in DPC++ to handle this, with some discussion on it:
It's currently only a SYCL level extension but I suspect implementing it would require some changes in PI or UR, so we should consider it.
The various ur{Object}CreateWithNativeHandle APIs seem to take an unnecessary first argument of the type they are going to create.
urPlatformCreateWithNativeHandle has the following signature:
UR_APIEXPORT ur_result_t UR_APICALL
urPlatformCreateWithNativeHandle(
ur_platform_handle_t hPlatform, ///< [in] handle of the platform instance
ur_native_handle_t hNativePlatform, ///< [in] the native handle of the platform.
ur_platform_handle_t* phPlatform ///< [out] pointer to the handle of the platform object created.
);
It is unclear what the meaning of the first hPlatform parameter is. This API is designed to create a ur_platform_handle_t from some native handle, so why it needs to accept one as an input parameter is confusing. This also doesn't match the corresponding API in PI, piextPlatformCreateWithNativeHandle:
__SYCL_EXPORT pi_result piextPlatformCreateWithNativeHandle(
pi_native_handle nativeHandle, pi_platform *platform);
which has no first argument.
Some create-with-native APIs in PI accept a context (e.g. piextProgramCreateWithNativeHandle); these have been documented below. Some APIs in PI also have an argument for ownership which is missing in UR (although this is beyond the scope of this ticket).
Remove these unnecessary arguments.
The full list of APIs is:
- urContextCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument.
- urEventCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urMemCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urQueueCreateWithNativeHandle - Takes unnecessary ur_queue_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urSamplerCreateWithNativeHandle - Takes unnecessary ur_sampler_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urKernelCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urModuleCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
- urPlatformCreateWithNativeHandle - Takes unnecessary ur_platform_handle_t as its first argument.
- urProgramCreateWithNativeHandle - Takes unnecessary ur_program_handle_t as its first argument. (Although should maybe take a ur_context_handle_t)
We've been investigating changing the PI interface for memory allocations and also to some extent for memory transfers, which in turn also changes some of the meaning of the PI context. A lot of the reasoning for these changes is based on how the SYCL DPC++ runtime currently works, but it would be good to consider them for the Unified Runtime.
The changes are:
- A pi_device argument to buffer and image allocation entry points (piMemBufferCreate, piMemImageCreate). It doesn't necessarily mean that the allocation will only be usable on that device, but it's helpful for backends that don't natively support context style allocations. For the DPC++ SYCL runtime this makes a lot of sense because we already do lazy allocation, so when we call these functions we always already know the exact device targeted and not just the context (the SYCL context_bound property is not currently implemented in DPC++).
- A piextGetMemoryConnection entry point that takes two pairs of (pi_device, pi_context), and returns information on how the memory can or should be handled between the two pairs. It currently has three options:
  - PI_MEMORY_CONNECTION_NONE: memory in the first (context, device) pair cannot be used or migrated by the plugin into the second (context, device) pair; copies through host are necessary.
  - PI_MEMORY_CONNECTION_MIGRATABLE: memory in the first (context, device) pair cannot be used directly by the second (context, device) pair, but the plugin can handle migrating data between the two (piEnqueueMemBufferCopy).
  - PI_MEMORY_CONNECTION_UNIFIED: memory in the first (context, device) pair is usable in the second pair.
With these two changes, a backend that doesn't natively support context-style allocations doesn't have to emulate them anymore, and can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. A device that does support context-style allocations can ignore the pi_device passed to the allocation functions and then simply report PI_MEMORY_CONNECTION_UNIFIED when the contexts are identical, and PI_MEMORY_CONNECTION_NONE when the contexts are different. In addition, it means that we can let plugins inform us if they can optimize memory copies between different contexts by reporting PI_MEMORY_CONNECTION_MIGRATABLE, which would mean that piEnqueueMemBufferCopy is supported between the two contexts and may be more efficient than doing a copy through host.
And so to circle back to the initial motivation, CUDA doesn't have context-style memory allocations like OpenCL or PI, and so to support having multiple CUDA devices in the same pi_context
we would have to roll out our own memory manager in the CUDA plugin (which I believe the LevelZero plugin also does), but since the SYCL runtime already has a memory manager, these PI plugin changes allow us to simply defer the management of memory allocations within the same context for the CUDA plugin to the SYCL runtime.
You can see more discussions and initial implementations of this on the following PR:
PI supports a piEventSetCallback entry point. Despite the fact this entry point has no uses in the DPC++ runtime, it is quite a useful API for compute runtimes to support, allowing the callback to trigger work when a command or set of commands completes.
Unified Runtime currently has no urEventSetCallback entry point.
Add urEventSetCallback to the Unified Runtime spec.
intel/llvm#7526 introduces the ability to query the device timestamp, this is an analogue of clGetDeviceAndHostTimer. We should also introduce this into the UR spec.
Unified Runtime is missing the enumeration of values for querying and setting event statuses.
PI has the following (which matches OpenCL):
typedef enum {
PI_EVENT_COMPLETE = 0x0,
PI_EVENT_RUNNING = 0x1,
PI_EVENT_SUBMITTED = 0x2,
PI_EVENT_QUEUED = 0x3
} _pi_event_status;
Unified Runtime already has the ability to query this property via urEventGetInfo with the UR_EVENT_INFO_EVENT_INFO_COMMAND_EXECUTION_STATUS query, so we should add an enumeration for the values the returned argument can be set to.
Currently in UR we have an enumeration _ur_mem_info_t. However, this relates to USM allocation. We should rename this to _ur_mem_alloc_info_t, which will be used in the corresponding query urMemGetMemAllocInfo, similar to how it is named in PI: _pi_mem_alloc_info.
For the urMemGetInfo entry point we should define a _ur_mem_info_t enumeration in order to complete #62.
PI only has two enumerations:
OpenCL has several more options. TBD - how many of these we should add.
Both queries appear to be needed as they are both used by the DPC++ runtime.
piMemGetInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/sycl_mem_obj_t.cpp/#L49
piextUSMGetMemAllocInfo - https://github.com/intel/llvm/blob/sycl/sycl/source/detail/usm/usm_impl.cpp/#L534
Some Unified Runtime entry points are taking in/out parameters as pointers and treating the argument as an in or out parameter depending on the value at the address held in the pointer. Whilst this approach does allow the spec to reduce the number of arguments passed to these APIs by one, the semantics is confusing especially when coming from PI and OpenCL where the equivalent APIs have dedicated out parameters. It is also error prone if the caller accidentally leaves the value uninitialized which could result in indeterminate behavior at runtime.
Note that this pattern seems to have been adopted inconsistently throughout the UR spec, some entry points use it and some use the PI/OpenCL style (see below)
urDeviceGet has the following documentation:
urDeviceGet(
ur_platform_handle_t hPlatform, ///< [in] handle of the platform instance
ur_device_type_t DevicesType, ///< [in] the type of the devices.
uint32_t* pCount, ///< [in,out] pointer to the number of devices.
///< If count is zero, then the call shall update the value with the total
///< number of devices available.
///< If count is greater than the number of devices available, then the
///< call shall update the value with the correct number of devices available.
ur_device_handle_t* phDevices ///< [out][optional][range(0, *pCount)] array of handle of devices.
///< If count is less than the number of devices available, then platform
///< shall only retrieve that number of devices.
);
where pCount is treated as an out parameter if the value at that address is zero, and otherwise is the size of the array that will be passed via the phDevices parameter.
Compare this to the piDevicesGet entry point:
__SYCL_EXPORT pi_result piDevicesGet(pi_platform platform,
pi_device_type device_type,
pi_uint32 num_entries, pi_device *devices,
pi_uint32 *num_devices);
This has a dedicated num_entries parameter representing the size of the array passed in devices, and a num_devices parameter which will return the number of devices in the platform of type device_type to the caller.
This is also the style used in OpenCL with clGetDeviceIDs:
cl_int clGetDeviceIDs(
cl_platform_id platform,
cl_device_type device_type,
cl_uint num_entries,
cl_device_id* devices,
cl_uint* num_devices);
Other APIs such as urPlatformGetInfo unconditionally treat parameters as in/out, updating the value at the address passed after first reading it:
UR_APIEXPORT ur_result_t UR_APICALL
urPlatformGetInfo(
ur_platform_handle_t hPlatform, ///< [in] handle of the platform
ur_platform_info_t PlatformInfoType, ///< [in] type of the info to retrieve
size_t* pSize, ///< [in,out] pointer to the number of bytes needed to return info queried.
///< the call shall update it with the real number of bytes needed to
///< return the info
void* pPlatformInfo ///< [out][optional] array of bytes holding the info.
///< if *pSize is not equal to the real number of bytes needed to return
///< the info then the ::UR_RESULT_ERROR_INVALID_SIZE error is returned and
///< pPlatformInfo is not used.
);
In this case the value at pSize is unconditionally updated by the entry point; however, its value as an in parameter is only used in the case that pPlatformInfo is non-null.
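The PI/OpenCL style with dedicated in and out parameters can be sketched as follows. This is a host-only illustration: example_device_get, the integer "handles" and the result codes are hypothetical, and returning an invalid-size error when an array is supplied with zero entries is an assumption made for the example rather than something the spec defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an opaque device handle. */
typedef int example_device_t;

typedef enum {
    EX_SUCCESS = 0,
    EX_ERROR_INVALID_SIZE = 1
} ex_result_t;

/* Pretend the platform exposes three devices. */
static const example_device_t g_devices[3] = {10, 20, 30};

/* numEntries  [in]            capacity of phDevices
 * phDevices   [out][optional] receives up to numEntries handles
 * pNumDevices [out][optional] receives the total number available
 * No parameter is both read and written, so nothing depends on the
 * caller having initialized an output location. */
static ex_result_t example_device_get(uint32_t numEntries,
                                      example_device_t *phDevices,
                                      uint32_t *pNumDevices) {
    const uint32_t available = 3;
    if (phDevices && numEntries == 0) return EX_ERROR_INVALID_SIZE;
    if (pNumDevices) *pNumDevices = available;
    if (phDevices) {
        uint32_t n = numEntries < available ? numEntries : available;
        for (uint32_t i = 0; i < n; ++i) phDevices[i] = g_devices[i];
    }
    return EX_SUCCESS;
}
```

The usual calling sequence is one call with a NULL array to obtain the count, followed by a second call with an array of that size.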
Replace the in/out parameters with dedicated in and out parameters for the following entry points:
The ur_api.h header redefines typedefs. Redefinition of typedefs is a C11 feature. Presumably we want to support earlier C versions.
To reproduce, compile the ur_api.h header on Linux with -Werror -Wtypedef-redefinition; this should give something like:
ur_api.h:274:39: error: redefinition of typedef 'ur_base_properties_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_properties_t ur_base_properties_t;
^
ur_api.h:241:3: note: previous definition is here
} ur_base_properties_t;
/source/ur/external/unified-runtime/include/ur_api.h:278:33: error: redefinition of typedef 'ur_base_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_base_desc_t ur_base_desc_t;
^
ur_api.h:250:3: note: previous definition is here
} ur_base_desc_t;
^
ur_api.h:282:35: error: redefinition of typedef 'ur_rect_offset_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_offset_t ur_rect_offset_t;
^
ur_api.h:260:3: note: previous definition is here
} ur_rect_offset_t;
^
ur_api.h:286:35: error: redefinition of typedef 'ur_rect_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _ur_rect_region_t ur_rect_region_t;
^
ur_api.h:270:3: note: previous definition is here
} ur_rect_region_t;
^
ur_api.h:1702:3: error: redefinition of typedef 'ur_image_format_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_format_t;
^
ur_api.h:290:36: note: previous definition is here
typedef struct _ur_image_format_t ur_image_format_t;
^
ur_api.h:1720:3: error: redefinition of typedef 'ur_image_desc_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_image_desc_t;
^
ur_api.h:294:34: note: previous definition is here
typedef struct _ur_image_desc_t ur_image_desc_t;
^
ur_api.h:1848:3: error: redefinition of typedef 'ur_buffer_region_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_buffer_region_t;
^
ur_api.h:298:37: note: previous definition is here
typedef struct _ur_buffer_region_t ur_buffer_region_t;
^
ur_api.h:2211:3: error: redefinition of typedef 'ur_sampler_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_sampler_property_value_t;
^
ur_api.h:302:46: note: previous definition is here
typedef struct _ur_sampler_property_value_t ur_sampler_property_value_t;
^
ur_api.h:2816:3: error: redefinition of typedef 'ur_device_partition_property_value_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
} ur_device_partition_property_value_t;
^
ur_api.h:306:55: note: previous definition is here
typedef struct _ur_device_partition_property_value_t ur_device_partition_property_value_t;
Remove the redundant typedef redefinitions for the above types.
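One way to keep a forward declaration without repeating the typedef is to declare the typedef once and later define the struct through its tag only. The sketch below shows the pattern with a hypothetical type; example_rect_offset_t and its fields are illustrative and not copied from ur_api.h.

```c
#include <stdint.h>

/* Forward declaration: the one and only typedef for this name. */
typedef struct example_rect_offset_t example_rect_offset_t;

/* The later definition names the struct tag only. Because the typedef is
 * not repeated, this is valid before C11 as well. */
struct example_rect_offset_t {
    uint64_t x;
    uint64_t y;
    uint64_t z;
};

/* Illustrative use of the completed type. */
static uint64_t example_offset_sum(const example_rect_offset_t *offset) {
    return offset->x + offset->y + offset->z;
}
```

The alternative is to drop the forward typedef and keep only the `typedef struct { ... } name;` form, which works when nothing needs the name before the definition.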
There appears to be an empty repo for the Unified Runtime spec: https://github.com/oneapi-src/unified-runtime-spec
Should we:
a. Move the headers/spec from here to that repo? (we will lose all the pull requests and issues)
b. Delete https://github.com/oneapi-src/unified-runtime-spec and rename this repo to unified-runtime-spec
c. Delete https://github.com/oneapi-src/unified-runtime-spec and do not rename this repo.
PI supports a piMemImageGetInfo entry point which has usage in the DPC++ runtime.
Unified Runtime currently doesn't have a urMemImageGetInfo entry point.
Add a urMemImageGetInfo entry point to the Unified Runtime spec.
urInit accepts a ur_platform_init_flags_t platform_flags argument. Currently the only valid value for this flag is UR_PLATFORM_INIT_FLAG_LEVEL_ZERO.
The documentation for this flag isn't particularly clear:
initialize Unified Runtime platform drivers
But its name makes it sound like it is specific to a level zero adapter (although perhaps this is just an artifact of Unified Runtime previously being called Level Zero Runtime?)
Either way, we should either rename this value, or clarify its meaning.
As noted in #29, downstream projects want to use FetchContent to pull unified-runtime into their builds. Currently the CMakeLists.txt files do not follow best practices for use with FetchContent, e.g. no targets are exported.
We should properly support FetchContent, and include usage examples in README.md. The main use case, for this initial support, should be to provide a target which only provides the include path for the header files, with all other parts of the build disabled.
urKernelSetArg appears to have been heavily influenced by the clSetKernelArg OpenCL API.
The clSetKernelArg entry point is problematic for two reasons:
urKernelSetArg:
* The application must not call this function from simultaneous threads with the same kernel handle.
* The implementation of this function should be lock-free.
We should be careful about repeating history here. Firstly, the API should be thread safe unless there is a valid reason it can't be. Secondly, we should support an API that sets multiple kernel arguments at once, since this may be more optimal on some platforms.
The following list of device queries are present in PI and missing in Unified Runtime. If we are to have parity with PI we should consider whether we need to add these queries:
PI_DEVICE_INFO_DEVICE_ID
PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE
PI_DEVICE_INFO_BUILD_ON_SUBDEVICE
PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY
PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE
PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES
PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU
PI_DEVICE_INFO_BACKEND_VERSION
PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D
PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER
PI contains an API function piQueueFlush which has not yet been added to Unified Runtime. We should add this to have feature parity with PI.
https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/pi.h#L1128
There is an equivalent OpenCL function: https://registry.khronos.org/OpenCL/sdk/1.0/docs/man/xhtml/clFlush.html
As part of Unified Runtime we need to provide a migration guide documenting the mapping from the PI plugin API to UR. This should consist of a set of mappings between the various entry points, types, enumerations etc. and any divergences in semantics.
i.e. something like
PI | UR | Differences |
---|---|---|
piDeviceGetInfo | urDeviceGetInfo | None |
... | ... | ... |
... | ... | ... |
PI | UR | Differences |
---|---|---|
pi_context | ur_context_handle_t | None |
... | ... | ... |
... | ... | ... |
PI | UR | Differences |
---|---|---|
pi_device_info | ur_device_info | (list of removed queries) |
... | ... | ... |
... | ... | ... |
The initial proposal is to do this in markdown and have a living document in this repo that can be updated as spec changes are made.
urEventCreate appears to be inspired by the clCreateUserEvent API.
OpenCL user events can be painful to implement since they essentially require bidirectional host-device synchronization, which may not be something a platform can efficiently support. We should consider whether user events are really required in the Unified Runtime API and if not remove this entry point.
PI supports a piEventSetStatus entry point. This entry point doesn't seem to have any uses in the DPC++ runtime and would only really be of value if we intend to support user events, which is something we've previously discussed not including in the Unified Runtime spec due to the burdensome device <-> host synchronization requirement they place on an adapter.
Decide if a urEventSetStatus entry point is required in the Unified Runtime spec. If it is, add it; if it is not, no action is required.
There seems to be an implicit requirement that ur_module_handle_t objects are SPIR-V modules. From the urProgramCreate entry point:
Create Program from input SPIR-V modules.
Do we want to restrict ur_module_handle_t objects to contain SPIR-V only? Or is it possible that higher level language runtimes sitting on top of Unified Runtime could feed it non-SPIR-V intermediate representations for compilation to target specific executables in the adapter via the urModuleCreate -> urProgramCreate path?
The _pi_device_info enumeration in PI has the following enumeration values which do not have equivalents in Unified Runtime:
PI_DEVICE_INFO_DEVICE_ID
PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE
PI_DEVICE_INFO_BUILD_ON_SUBDEVICE
PI_EXT_INTEL_DEVICE_INFO_FREE_MEMORY
PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE
PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES
PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU
PI_DEVICE_INFO_BACKEND_VERSION
PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D
PI_EXT_ONEAPI_DEVICE_INFO_CUDA_ASYNC_BARRIER
We should add these to the UR spec.
PI has a _pi_image_info enumeration, which is typedef'd, and the typedef is then used in the DPC++ runtime.
Unified Runtime is missing an equivalent enum, so we should add one.
Currently new issues do not automatically get added to the project planning board. This is possible using the actions/add-to-project action. Look into enabling this action so that triage of opened issues is more streamlined.
Unified Runtime appears to have followed the example of OpenCL and used _<name> in typedefs of the opaque API objects, e.g. _ur_module_handle_t.
This is actually not permitted by either the C or C++ standards: identifiers beginning with an underscore are reserved for the implementation at file scope (the global namespace in C++), and those beginning with an underscore followed by an uppercase letter or a second underscore are reserved everywhere.
We should rename the structs in this pattern.
urPlatformGet currently says that "If phPlatforms is not NULL, then NumEntries must be greater than zero." However, we do not specify what should be returned in this case.
We should specify that urPlatformGet and urDeviceGet will return UR_RESULT_ERROR_INVALID_SIZE or something similar.
Unified Runtime currently supports parameter verification for some arguments based on their type. For example, pointer and handle (which is technically still a pointer) arguments are automatically checked against NULL. The generation of the error language in the spec appears to be automatic and based only on the type of the argument.
(Aside: In some cases this is too strict, e.g. the out parameter for the event argument on enqueue APIs should probably be optional: https://github.com/oneapi-src/unified-runtime/blob/main/include/zer_api.h#L859.)
More generally, we should consider whether (like OpenCL) we want to have verification for the semantics of arguments where appropriate, e.g. checking that a read/write command will not overflow a buffer. In OpenCL the clEnqueue(Read|Write)Buffer APIs require that the entry point returns:
CL_INVALID_VALUE if the region being read or written specified by (offset, size) is out of bounds or if ptr is a NULL value.
We need to decide whether we want parameter verification of this kind since in total for all the APIs there will be a lot of it. We will also need to write negative tests for all these cases once we have a test suite.
Currently the UR spec uses a mix of size_t and uint32_t params.
For example urDevicePartition uses uint32_t: https://spec.oneapi.io/unified-runtime/latest/core/api.html#zerdevicegetinfo
while urDeviceInfo uses size_t: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/device.yml#L280
We should make these all uint32_t, as size_t is platform dependent.
Any boolean values passed to APIs in Unified Runtime are currently of bool type. In C, bool has to be large enough to hold the true and false values; other than that there are no restrictions on its size.
Other compute frameworks like OpenCL typically define their own typedefs of the various integer types in C to avoid parameters to entry points having different sizes. For example, OpenCL defines a typedef cl_uint cl_bool (where cl_uint is always 32 bits).
PI defines a pi_bool type and the global values PI_TRUE/PI_FALSE for comparisons. A ur_bool_t type has also been added to Unified Runtime (although it doesn't appear to be used in any entry points).
We should consider whether we want to replace bool usage with ur_bool_t in Unified Runtime and define some global UR_TRUE/UR_FALSE values.
Alternatively we could follow the Vulkan example of using the predefined platform independent C types defined in stdint.h and replace bool usage with a type of known width such as int8_t, specifying non-zero values as truthy.
This would also address the issues raised in: #3
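For illustration, a fixed-width boolean in the OpenCL/PI style could be sketched as below. The names example_bool_t, EXAMPLE_TRUE and EXAMPLE_FALSE are hypothetical, and the 32-bit width is an assumption chosen to match cl_bool; it is not a statement of how ur_bool_t is defined.

```c
#include <stdint.h>

/* Hypothetical fixed-width boolean: the size is the same on every
 * platform and compiler, unlike C's bool. */
typedef uint32_t example_bool_t;

#define EXAMPLE_TRUE ((example_bool_t)1)
#define EXAMPLE_FALSE ((example_bool_t)0)

/* Normalize any non-zero integer to EXAMPLE_TRUE so that comparisons
 * against the constants behave predictably. */
static example_bool_t example_to_bool(int value) {
    return value ? EXAMPLE_TRUE : EXAMPLE_FALSE;
}
```

Normalizing at the API boundary matters if callers compare against the TRUE constant; under the "any non-zero value is truthy" convention, callers should test against the FALSE constant instead.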
In general the spec appears to be following the convention that pointer parameters are given the prefix p and opaque UR API object handle parameters are given the prefix h.
For example in the urDeviceGet entry point:
UR_APIEXPORT ur_result_t UR_APICALL urDeviceGet(ur_platform_handle_t hPlatform, ur_device_type_t DevicesType, uint32_t *pCount, ur_device_handle_t *phDevices)
the platform parameter of type ur_platform_handle_t has the name hPlatform and the device count (which is an out parameter of type uint32_t*) is called pCount.
However this convention doesn't appear to have been followed consistently throughout the spec.
For example urEventCreate takes a context parameter of type ur_context_handle_t with the name context rather than hContext: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/event.yml#L52
And the urEnqueueUSMMemcpy entry point takes a pointer parameter of type ur_event_handle_t* called eventWaitList, not pEventWaitList: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/core/enqueue.yml#L963
In general these inconsistencies show up in quite a lot of places. We should do a pass through the spec and make them consistent.
Currently we commit the ur_api.h header to this repo. ur_api.h (and the other generated files not currently committed) need to be manually regenerated any time the yaml source files for the generator, or the generator scripts themselves, are updated.
It would be useful to have a dependency on the generator sources so that anyone changing those files who is using the CMake targets to integrate the headers (for example) into their build (perhaps in a downstream project) doesn't need to manually keep track of when they need to regenerate the headers; it will just happen automatically as part of their build if they change any of the sources.
I think in CMake this could be done with the use of add_custom_command and add_custom_target.