
WebGPU Headers

This repository contains C headers equivalent to the WebGPU API and documentation on the native specificities of the headers.

This header is NOT STABLE yet, and the documentation is very much a work in progress!

All of the API is defined in the webgpu.h header file. Read the documentation here!

Why?

While WebGPU is a JavaScript API made for the Web, it strikes a good balance between ergonomics, efficiency, and portability as a graphics API. Almost all of its concepts are not specific to the Web platform, and the headers replicate them exactly while adding capabilities to interact with native concepts (like windows).

Implementations of this header include:

  • Dawn, the C++ WebGPU implementation used in Chromium
  • wgpu-native, C bindings to wgpu, the Rust WebGPU implementation used in Firefox
  • Emscripten, which translates webgpu.h calls to JavaScript WebGPU calls when compiling to WASM

Details

Here are some details about the structure of this repository.

Main files

  • webgpu.h is the one and only header file that defines the WebGPU C API. Only this file needs to be integrated into a C project that links against a WebGPU implementation.

  • webgpu.yml is the main machine-readable source of truth for the C API and its documentation (in YAML format). It is used to generate the official webgpu.h header present in this repository, (will be used) to generate the official documentation, and may be used by any other third party to design tools and wrappers around WebGPU-Native.

  • schema.json is the JSON schema that formally specifies the structure of webgpu.yml.

Generator

  • Makefile defines the rules to automatically generate webgpu.h from webgpu.yml and check the result.

  • gen/ and the go.* files are the source code of the generator called by the Makefile.

  • tests/compile is used to check that the generated C header is indeed valid C/C++ code.

Workflows

  • .github/workflows defines the automated processes that run on new commits/PRs, to check that changes in webgpu.yml and webgpu.h are consistent.

Contributing

Important: when submitting a change, one must modify both the webgpu.yml and webgpu.h files in a consistent way. First edit webgpu.yml (the source of truth), then run make gen to update webgpu.h, and finally commit both changes together.

webgpu-headers's People

Contributors

austineng, beaufortfrancois, dj2, eliemichel, hocheung-chromium, jiawei-shao, kainino0x, kangz, kvark, lokokung, radgerayden, rajveermalviya, shrekshao, simeks

webgpu-headers's Issues

Figuring out lifetime management

Lifetime management is the only issue left before webgpu.h can be useful (that, and device creation and swapchains, but those can live in dawn/wgpu-rs private headers for some time).

Here are three options that I believe cover most of the design space; all are useful and reasonable:

1: Single call delete (no refcounting)

With this, there is a single function per object type that, when called, makes the WebGPU object handle invalid (but won't implicitly call wgpuBufferDestroy or wgpuTextureDestroy).

WGPUBuffer wgpuDeviceCreateBuffer(WGPUDevice device,
                                  const WGPUBufferDescriptor* descriptor);
void wgpuBufferDelete(WGPUBuffer buffer);

2: Refcounting, "ComPtr" way.

Object lifetime is controlled by a refcount, which starts at 1 and can be increased or decreased by 1. When it reaches 0, the object handle becomes invalid (but there is no implicit call to wgpuBufferDestroy or wgpuTextureDestroy)

WGPUBuffer wgpuDeviceCreateBuffer(WGPUDevice device,
                                  const WGPUBufferDescriptor* descriptor);
void wgpuBufferReference(WGPUBuffer buffer);
void wgpuBufferRelease(WGPUBuffer buffer);
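
As a sketch of how option 2's semantics behave, here is a tiny self-contained model (all `Toy*` names are invented for illustration and are not part of webgpu.h): the refcount starts at 1, `Reference` adds one, and `Release` subtracts one and frees the object at zero, with no implicit Destroy.

```c
#include <stdlib.h>

/* Hypothetical model of ComPtr-style refcounting. Not real webgpu.h API. */
typedef struct ToyBufferImpl {
    int refcount;
} *ToyBuffer;

ToyBuffer toyCreateBuffer(void) {
    ToyBuffer b = malloc(sizeof(*b));
    b->refcount = 1; /* creation hands out the first reference */
    return b;
}

void toyBufferReference(ToyBuffer b) { b->refcount++; }

/* Returns the refcount after the release, for illustration only. */
int toyBufferRelease(ToyBuffer b) {
    int rc = --b->refcount;
    if (rc == 0) free(b); /* handle becomes invalid; no implicit Destroy */
    return rc;
}
```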

3: Refcounting, shallow-cloning way.

This one is more experimental, in particular objects that are shallow-clones of each other may have the same handle representation in some implementations and not others, which could break apps if they try to compare them.

Objects start with a refcount of 1. The refcount can be increased by returning a handle that represents that reference. Deleting a handle decreases the refcount by 1 and makes that handle invalid. When the refcount reaches 0, there are no more valid handles (but there is no implicit call to wgpuBufferDestroy or wgpuTextureDestroy).

WGPUBuffer wgpuDeviceCreateBuffer(WGPUDevice device,
                                  const WGPUBufferDescriptor* descriptor);
WGPUBuffer wgpuBufferReference(WGPUBuffer buffer);
void wgpuBufferRelease(WGPUBuffer buffer);

Discussion

As described previously, I think we should have some form of refcounting because it is already present in all implementations, and helps when making large apps. You could wrap WebGPU objects in shared_ptr but that adds an extra unnecessary indirection and allocation.

I remember there was some concern due to the structure of wgpu-rs which I didn't fully understand or remember, but maybe option 3 helps with it?

memset/memcpy for mapped buffers in WASM

@juj on the implementation of these two functions, which involve a malloc, Uint8Array.set, and free.

emscripten-core/emscripten#11737 (comment)

Oh, these two functions wgpuBufferGetConstMappedRange and wgpuBufferGetMappedRange are horrible 🤮 😞

I suppose mapping a GPU buffer directly to wasm heap address space will be a hard no-go? (has that been discussed with Wasm folks? CC @lukewagner @dschuff )

Assuming it is not possible, I wonder if we should consider changing up the API completely. Given that there is only asynchronous mapping support, we should make this a callback-based API, where a cb function gets called when the map happens and receives a handle ID; then have dedicated functions:

void wgpuMemset8ToMappedBuffer(uint32_t destinationMappedBufferHandleId, uint32_t destinationOffset, uint32_t destinationSize, uint8_t fillValue);
void wgpuMemset16ToMappedBuffer(uint32_t destinationMappedBufferHandleId, uint32_t destinationOffset, uint32_t destinationSize, uint16_t fillValue);
void wgpuMemset32ToMappedBuffer(uint32_t destinationMappedBufferHandleId, uint32_t destinationOffset, uint32_t destinationSize, uint32_t fillValue);
void wgpuMemcpyToMappedBuffer(uint32_t destinationMappedBufferHandleId, uint32_t destinationOffset, uint32_t destinationSize, const void *sourcePtr);
void wgpuMemcpyFromMappedBuffer(void *destinationPtr, uint32_t destinationOffset, uint32_t destinationSize, uint32_t sourceMappedBufferHandleId);

This way the caller will be in control whether there will need to be malloc/free in effect, and avoid any redundant memsets and memcpys?
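
To make the handle-based proposal concrete, here is a hypothetical toy model of the copy-to-mapped-buffer direction: an id-to-pointer table stands in for the JS/WASM-side mapping registry. All names are invented; nothing here is the actual Emscripten design.

```c
#include <stdint.h>
#include <string.h>

/* Toy registry mapping mapped-buffer handle ids to host memory. */
#define TOY_MAX_MAPPINGS 4
static unsigned char *toyMappings[TOY_MAX_MAPPINGS];

static void toyRegisterMapping(uint32_t id, unsigned char *base) {
    toyMappings[id] = base;
}

/* Copy into the mapped region identified by dstId, at dstOffset. */
static void toyMemcpyToMappedBuffer(uint32_t dstId, uint32_t dstOffset,
                                    uint32_t size, const void *src) {
    memcpy(toyMappings[dstId] + dstOffset, src, size);
}
```

The point of the design is visible even in the toy: the caller decides when bytes move, so no intermediate malloc/Uint8Array.set/free round-trip is forced on every access.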

Determine how to return CompilationInfo

CompilationInfo is a little unique because it's one of the only places where a structure of data is returned out of the API. AdapterProperties seems to be the only close analog, but because CompilationInfo is queried with a promise in the JS API, it would need to be supplied by a callback here.

So a first stab at this suggests we'd want an API surface like the following:

typedef enum WGPUCompilationMessageType {
    WGPUCompilationMessageType_Undefined = 0x00000000,
    WGPUCompilationMessageType_Error = 0x00000001,
    WGPUCompilationMessageType_Warning = 0x00000002,
    WGPUCompilationMessageType_Info = 0x00000003,
} WGPUCompilationMessageType;

typedef struct WGPUCompilationMessage {
    WGPUChainedStruct const * nextInChain; // Are these extensible?
    char const * message;
    WGPUCompilationMessageType type;
    uint64_t lineNum;
    uint64_t linePos;
} WGPUCompilationMessage;

typedef struct WGPUCompilationInfo {
    WGPUChainedStruct const * nextInChain; // Are these extensible?
    WGPUCompilationMessage * messages;
    uint32_t messageCount;
} WGPUCompilationInfo;

typedef void (*WGPUCompilationInfoCallback)(WGPUCompilationInfo* compilationInfo, void * userdata);

WGPU_EXPORT void wgpuShaderModuleGetCompilationInfo(WGPUShaderModule shaderModule, WGPUCompilationInfoCallback callback, void * userdata);

Or, recognizing that WGPUCompilationInfo is just a list of messages, if we weren't worried about extensibility of that particular structure the callback signature could simply be:

typedef void (*WGPUCompilationInfoCallback)(WGPUCompilationMessage* messages, uint32_t messageCount, void * userdata);

I'm also not sure what sort of guarantees the library gives about the lifetime of objects like this: is it expected to be valid only for the duration of the callback, or is the library expected to keep it alive, owned by the shader module, after being queried?
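
A sketch of how an application might consume the proposed callback, under the callback-duration lifetime assumption. `Toy*` names and the stub producer are stand-ins invented here; the real signatures are still under discussion.

```c
#include <stdint.h>

/* Toy stand-ins for the proposed compilation-info callback shape. */
typedef struct ToyCompilationMessage {
    const char *message;
    uint64_t lineNum;
} ToyCompilationMessage;

typedef void (*ToyCompilationInfoCallback)(const ToyCompilationMessage *messages,
                                           uint32_t messageCount, void *userdata);

/* Stub producer: synthesizes one message and invokes the callback. The
 * message array is only valid for the duration of the callback, matching
 * the lifetime question raised above. */
static void toyGetCompilationInfo(ToyCompilationInfoCallback callback,
                                  void *userdata) {
    ToyCompilationMessage msg = { "unknown identifier", 12 };
    callback(&msg, 1, userdata);
}

/* Example consumer: counts messages into the userdata pointer. */
static void countMessages(const ToyCompilationMessage *messages,
                          uint32_t messageCount, void *userdata) {
    (void)messages;
    *(uint32_t *)userdata += messageCount;
}
```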

Defining undefined on WGPULimits

In #104 the WGPULimits struct was added. To specify "undefined" (i.e. the default), the value to use depends on the type:

webgpu-headers/webgpu.h

Lines 57 to 58 in 4b4dbc3

#define WGPU_LIMIT_U32_UNDEFINED (0xffffffffUL)
#define WGPU_LIMIT_U64_UNDEFINED (0xffffffffffffffffULL)

I suppose this is fine if your brain is wired for Rust, but my Python-brain finds this rather scary. In other words: not all code that consumes wgpu-native has a type checker that will prevent setting a massive limit where the default was intended. I fear this can cause very annoying issues.

Is there still room for discussion here? Some options:

  • Use zeros as undefined. For many limits zero makes no sense anyway. However, it does for some, so probably not a good idea.
  • Use signed integers, and consider <0 as undefined.
  • Use uint32 for all limits; it looks like wgpu-core also uses u32 for all limits. Then at least it's consistent.
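
Under the current sentinel scheme, a binding layer would resolve "undefined" before applying a default, roughly like this (a minimal sketch; `resolveU32Limit` is a hypothetical helper and the default value is arbitrary):

```c
#include <stdint.h>

/* Sentinel quoted from webgpu.h above. */
#define WGPU_LIMIT_U32_UNDEFINED (0xffffffffUL)

/* Hypothetical helper: treat the sentinel as "use the default". */
static uint32_t resolveU32Limit(uint32_t requested, uint32_t defaultValue) {
    return requested == WGPU_LIMIT_U32_UNDEFINED ? defaultValue : requested;
}
```

The footgun described above is that nothing stops a caller from passing a large value that was meant to be "unset"; only this explicit sentinel comparison distinguishes the two.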

Swapchain / Surface questions

I've started to implement swapchains in Dawn and ran into a bunch of questions.

Q: How do we report surface creation errors?
Tentative-A: We return a nullptr surface? But it doesn't allow reporting a string message to explain what went wrong. Alternatively surface creation can never go wrong and errors are always exposed when the swapchain is created on the surface.

Q: Can we have multiple swapchains on the same surface?
A: No, there is always a single current swapchain. Creating a new swapchain on the surface invalidates the previous one (this includes destroying the "current texture").

Q: Can we have subsequent swapchains on a surface be on different backends?
Tentative-A: No, it is a validation error. (this is to not deal with backend-compatibility stuff)

Q: Do we have the equivalent of GPUCanvasContext.getPreferredSwapChainFormat?
A: Maaaybe? But then is it synchronous or async? Or we could just have a list of formats that we guarantee always work (and do blits)?

Q: How do we handle window resizes and minimization?
A: No clue, Vulkan has the "outdated" swapchain concept, but there's no clear way to do this in webgpu.h that would also match the Web where the application sets the size of the canvas directly. Also we should take care on Windows, where Vulkan for example requires the size of the swapchain to match exactly the size of the window.

Q: Do we do things for the user like resize blits, format blits etc?

LoadOp and StoreOp have different order

LoadOp and StoreOp have different orders: {clear, load} vs {store, clear}.
This isn't inherently a problem but if you accidentally use storeOp=WGPULoadOp_Clear or loadOp=WGPUStoreOp_Clear then the result could be confusing.

Arguably WGPUStoreOp_Store should be 0 because it's the default value in JS, but semantically that's part of GPURenderPassColorAttachmentDescriptor, not GPUStoreOp. Depending on how closely we want to match upstream, we could:

  • Ignore that fact and flip WGPULoadOp
  • Change the enums to be 1 and 2, with 0 being invalid
  • ^, and add WGPUStoreOp_Undefined = 0 which applies the default in WGPURenderPassColorAttachmentDescriptor
typedef enum WGPULoadOp {
    WGPULoadOp_Clear = 0x00000000,
    WGPULoadOp_Load = 0x00000001,
    WGPULoadOp_Force32 = 0x7FFFFFFF
} WGPULoadOp;

typedef enum WGPUStoreOp {
    WGPUStoreOp_Store = 0x00000000,
    WGPUStoreOp_Clear = 0x00000001,
    WGPUStoreOp_Force32 = 0x7FFFFFFF
} WGPUStoreOp;

Tracking issue to write spec / explainers for native-specific parts of webgpu.h

Most of the behavior of webgpu.h is the same as the WebGPU API (normative reference), but there are a couple of places where webgpu.h has non-trivial differences from WebGPU, and it also has native-specific APIs. This is a tracking issue to create initial explainers / specs for these parts.

Things that are very different in native:

  • #24: surface creation and swapchains
  • Instance / adapters and device creation
  • Event loop

Things that use callbacks instead of promises:

  • Buffer mapping
  • #9: lifetime management
  • Error reporting
  • Device loss
  • Fence

Other:

  • WGPUBindGroupBindingDescriptor (no sum types in C)
  • How the regular part of webgpu.h relates to the WebGPU IDL
  • #160 How to make holes in WebGPU sparse arrays that are plain C arrays.

WGPU_WHOLE_SIZE should probably be WGPU_REMAINING_SIZE

When we introduced this constant, defaulting the size of the GPUBindGroupEntry in the JS API made it take the whole buffer, but we then changed it to be the rest of the buffer. The semantic is different and the name should probably be changed.

Readonly vs ReadOnly

If we take a direct casing conversion from JS/IDL's "readonly-storage" to camel case ReadonlyStorage then it looks maybe a little funny. Should we case it as ReadOnlyStorage?

Note this also comes up again soon in WGPUStorageTextureAccess.

Should we keep consistency with IDL (Readonly), break with it (ReadOnly), or change the IDL ("read-only-storage")?

cc @toji

Meaning of TextureViewDescriptor mipLevelCount==0

In JS, mipLevelCount of 0 really means 0. If you want "auto" (baseMipLevel..end) then you have to specify undefined.

In C, should 0 mean "auto" or should it mean 0 (and WGPU_WHOLE_SIZE means "auto")?

EDIT: Also applies to arrayLayerCount.

C Blocks API

C blocks are a more convenient way of handling callbacks. They automatically capture the relevant variables from the enclosing scope.

This:

WGPU_EXPORT void wgpuAdapterRequestDevice(WGPUAdapter adapter, WGPUDeviceDescriptor const * descriptor, WGPURequestDeviceCallback callback, void * userdata);

could turn into this:

WGPU_EXPORT void wgpuAdapterRequestDevice(WGPUAdapter adapter, WGPUDeviceDescriptor const * descriptor, WGPURequestDeviceCallback callback, void * userdata);
#if defined(__BLOCKS__)
WGPU_EXPORT void wgpuAdapterRequestDeviceWithBlock(WGPUAdapter adapter, WGPUDeviceDescriptor const * descriptor, void (^callback)(WGPURequestDeviceStatus status, WGPUDevice device, char const * message));
#endif

(Of course, typedefs could make this more readable.)

Return a texture instead of a view by the swapchain

As a follow-up to #88, we should start aligning this API closer to upstream.
Switching to a texture from a view is one such step.

typedef WGPUTextureView (*WGPUProcSwapChainGetCurrentTextureView)(WGPUSwapChain swapChain);

Should be:

typedef WGPUTexture (*WGPUProcSwapChainGetCurrentTexture)(WGPUSwapChain swapChain);

WebGPU-Native -> WASI

As part of the discussion here WebAssembly/WASI#276

There seems to be enough interest to bring WebGPU native to WASI (on the WASI side). It would be a welcome development that could let webgpu-native run on WASM VMs beyond the web environment. Just want to bring this to the attention of this community, and see if it is of interest to the webgpu-native community (@lachlansneff already volunteered).

WGPUVertexState's buffers array can't be sparse

WebGPU has

sequence<GPUVertexBufferLayout?> buffers = [];

So each item inside the array can be null.

However, these headers have:

typedef struct WGPUVertexBufferLayout {
    uint64_t arrayStride;
    WGPUVertexStepMode stepMode;
    uint32_t attributeCount;
    WGPUVertexAttribute const * attributes;
} WGPUVertexBufferLayout;
...
WGPUVertexBufferLayout const * buffers;

This array appears to have to be dense.

Questions: Shader stages, texture formats, vertex formats, spirv

I have the following questions:

  • Shader stages only contain vert, frag and comp. This means geom, tesc and tese are not available, and extended shader types like mesh or RT shaders aren't either. Will this be updated when the spec is more mature?

  • If this standard is going to be implemented on mobile, it will need ASTC texture compression support since BC won't work on there. And ways to query the support of things like formats and extensions. Is this in progress?

  • A lot of vertex formats are missing, such as RGB 11_11_10 and other compression related formats. 3 and 1 attribute vertex formats are also missing, though I guess that could be accomplished by rounding up to the next and then discarding an attribute. But this is still quite inconvenient. Will this be added?

  • I thought SPIRV was not the choice for webgpu, or is this still a debate? Or is this handled by transpiling spir-v to an intermediate and then compiling it again. In which case I'd rather have the build system take care of that.

Callback order and threading is unspecified

If someone calls the same asynchronous function twice, with the second one being before the first one's callback is called, is the first one's callback guaranteed to be called before the second one's callback?

Are the callbacks allowed to be called on different threads than the source call was made on?

How to represent "undefined" in bytesPerRow/rowsPerImage?

The obvious choices are 0 and 0xFFFF'FFFF. Zero is nice because it's the default in C, where you can't specify other default values.

The problem with 0 is that you can't implement all of the validation behind the C API: undefined can be valid when 0 would not be valid. In order to implement the JS API on top of the C API, you have to treat all 0s as undefineds in C, then implement additional validation in front of the C API to reject 0s if they wouldn't be valid.

The problem with either is that you're narrowing 2**32+1 values (uint32_t + undefined) into 2**32 values (uint32_t). However in practice 0xFFFF'FFFF is (probably) never a valid value for either one, so you can implement the JS API in front of the C API a bit more simply: just if (value == UINT32_MAX) { injectDeviceError }.

Alternatives are:

  • 0
  • UINT32_MAX
  • uint32_t* bytesPerRow (gross)
  • bool hasBytesPerRow; uint32_t bytesPerRow; (also gross)
  • uint64_t bytesPerRow; or int64_t bytesPerRow; (eh)
  • Change rules in JS
  • ?
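
The UINT32_MAX route implies a small shim in the JS-over-C layer, roughly as sketched below: decode the sentinel as "undefined" and flag it so the JS layer can inject a device error instead of passing a bogus value through. The names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical decoded form of bytesPerRow/rowsPerImage. */
typedef struct ToyBytesPerRow {
    int isUndefined; /* 1 if the caller passed the sentinel */
    uint32_t value;
} ToyBytesPerRow;

/* Treat UINT32_MAX as "undefined", per the reasoning above. */
static ToyBytesPerRow toyDecodeBytesPerRow(uint32_t raw) {
    ToyBytesPerRow out;
    out.isUndefined = (raw == UINT32_MAX);
    out.value = raw;
    return out;
}
```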

Making FragmentState an Object in RenderPipelineDescriptor instead of a ptr.

Right now in RenderPipelineDescriptor, VertexState is an object, not a pointer since it is easier to just modify it (with it already allocated). IIUC, FragmentState is a pointer right now only because it is optional. It would be nice if FragmentState was also an already allocated object in the descriptor, and we could use an internal member's null-ness to determine if we have any FragmentState. (i.e. we could check if FragmentState.module is null to determine if we have an actual FragmentState).

This way, Vertex and Fragment state can be consistent with one another.
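
The suggestion can be sketched with toy types: the optional state is embedded by value, and a null `module` member signals its absence. All types here are invented stand-ins, not the real webgpu.h structs.

```c
#include <stddef.h>

/* Toy stand-in for an opaque shader module handle. */
typedef struct ToyShaderModuleImpl { int unused; } *ToyShaderModule;

typedef struct ToyFragmentState {
    ToyShaderModule module; /* NULL means "no fragment state" */
} ToyFragmentState;

typedef struct ToyRenderPipelineDescriptor {
    ToyFragmentState fragment; /* by value, like vertex state */
} ToyRenderPipelineDescriptor;

/* The null-ness check proposed above. */
static int toyHasFragmentState(const ToyRenderPipelineDescriptor *desc) {
    return desc->fragment.module != NULL;
}
```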

Issues with C / C++ bool type.

This is an issue that someone (graphite?) raised on Discord but didn't file here. The bool types in C and C++ are kinda weird and may not match, especially when placed back to back in a struct. webgpu.h shouldn't rely on this type, and should instead do something like Vulkan's VkBool32.
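
A fixed-width boolean in the style of VkBool32 might look like the following (the names are illustrative, not the header's): a 32-bit unsigned integer with explicit TRUE/FALSE constants, so C and C++ callers agree on size and ABI regardless of how their native bool is defined.

```c
#include <stdint.h>

/* Illustrative VkBool32-style boolean: fixed 32-bit size and ABI. */
typedef uint32_t ToyBool32;
#define TOY_TRUE  1u
#define TOY_FALSE 0u
```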

Synchronous validation in getMappedRange

How should synchronous validation errors be exposed in getMappedRange? It can return null, but that doesn't tell the caller what went wrong. This would probably be fine, except e.g. Chromium needs to know what string to attach to the DOMException it generates. To do that now it needs to duplicate all of the validation that's already in getMappedRange.

Signaling allocation failure for buffers with `mapAtCreation==true`

Following gpuweb#872, we need a way for buffer creation with mapAtCreation==true to fail.

I can think of a couple options, none of which I love, so I would be open to other ideas.

Option 1

wgpuDeviceCreateBuffer returns nullptr. This would match the JS API best (throws an exception, and the GPUBuffer would be undefined), but has the unfortunate consequence that it is the only part of the C API that returns something null.

Option 2

wgpuDeviceCreateBuffer works, but wgpuBufferGetMappedRange returns nullptr. This would be a little more natural as wgpuBufferGetMappedRange is what gives you the pointer, and developers would be more used to checking if a void* is non-null. The drawbacks are as follows:

  • It would not match the JS API. You would still get back a WGPUBuffer when using the C API.
  • Implementing it would involve some magic in the WASM-to-JS layer (e.g. Emscripten) to try {} catch {} around the call to wgpuDeviceCreateBuffer and then give back some proxy object which does nothing in the other methods and returns nullptr from getMappedRange

Thinking about this after I wrote it, Option 2 may have way too much magic involved with other places a WGPUBuffer is used like making bind groups and stuff. Maybe it is workable if the implementation reserves a handle for the "not actually allocated" WGPUBuffer object but it doesn't seem like a good idea. Only Option 1 looks feasible.

Compatibility with iOS simulator

@CurryPseudo found out that iOS simulator doesn't support comparison sampler states, see gfx-rs/wgpu#1715 (comment)
This means we can't run WebGPU proper on it, and it's quite sad for development.

Possible solutions:

  • draft a native-only feature for immutable samplers
  • add "downlevel" features to make it more compatible with GL, DX11, and things like iOS simulator. It would have a flag for comparison samplers, basically.
  • push Apple to make the iOS simulator more capable
  • live with the problem forever

Change "FromHTMLCanvasId" to "FromHTMLSelector"?

On emscripten-core/emscripten#11361 (comment) @juj suggests that we select using document.querySelector instead of document.getElementById as it's more flexible and matches the behavior of the rest of Emscripten.

I originally considered doing this but stuck with getElementById to begin with for simplicity. However querySelector is probably the most "modern web" way to do this. It returns 0-1 results (only the first thing matching the selector) in a standard/reliable way.

WDYT?

Holding pass dependencies alive

Render passes are expected to hold the majority of the function calls coming from the most complex applications. It's in everybody's best interest to make encoding render passes light and performant.

One of the aspects of the JS spec that complicates it is tracking dependencies. Something used in a render pass may be deleted before it has a chance to be used, and it's the responsibility of the spec implementor to make sure the render pass as well as command encoder holds those dependencies alive.

For native targets though, we could make encoding simpler if this responsibility was put on the users' shoulders. For example, in Rust it's easy to enforce the dependency lifetimes at compile time, thus avoiding the overhead completely.

[DISCUSS] Synchronization or Lightweight Sync to Async Abstraction

One important characteristic of WebGPU is that it is asynchronous, which means synchronization can only be done through callbacks.

Such API design forces async semantics on the application, and it totally makes sense in a host execution environment such as JavaScript. For example, in our recent work to support machine learning, we simply wrap the async interface as a JS future and use JavaScript's await for synchronization.

On a native only wasm application though, the async nature of the API puts quite a lot of burdens on the application itself. For example, it is extremely painful to directly use the webgpu-native C API to write a C/C++ application, because there is no built in async/await mechanism in the language.

I want to use this thread to gather thoughts in this direction, as there are quite a few design choices. They relate to both the WASM execution environment and the header definition. I will list them below:

  • C0: Only keep the async C API, and rely on async/await support in the languages that compile to WASM
    • Explanation: it is certainly easier to target the current C API using Rust, because the language has native async/await support.
  • C1: Introduce a sync API, and sync support on native
    • Most of the downstream APIs (Metal, Vulkan) do have a synchronization primitive, and we could just expose them as an API
  • C2: Introduce a sync API, think about async-ization
    • Same as C1, but we acknowledge the fact that async is the nature of WebGPU. Because the synchronization (blocking) happens at the WASM-to-system boundary, there are certainly techniques (with limitations) to turn the synchronous call into an async version. However, such a feature depends on either the compiler or the WASM VM (runtime). As a simple example, if we place a restriction that the async system call can only resume at the call site, then we could simply "freeze" the state of the WASM VM, do other jobs, and then re-enter without any backup of the stack (because the stack is already in the linear memory of the WASM); this removes the overhead of a pause/resume, but requires the support of the WASM runtime.

My current take is that C2 is the most ideal one, as it enables applications to be written as if native but still deploy to most platforms; however, there are certain gaps in runtimes (related to standardization) to make that happen.
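
The C1 direction can be sketched as a blocking facade over the existing callback API: issue the request, then pump a tick function until the callback fires. Everything here (`toyTick`, `toyRequestAsync`) is a stub invented to stand in for a real implementation's event processing, not an actual webgpu.h mechanism.

```c
/* Toy callback shape: an integer result plus opaque userdata. */
typedef void (*ToyCallback)(int result, void *userdata);

static ToyCallback pendingCallback;
static void *pendingUserdata;

/* Stub async request: completion is deferred to a later tick. */
static void toyRequestAsync(ToyCallback cb, void *userdata) {
    pendingCallback = cb;
    pendingUserdata = userdata;
}

/* Stub event pump: delivers any pending callback. */
static void toyTick(void) {
    if (pendingCallback) {
        ToyCallback cb = pendingCallback;
        pendingCallback = 0;
        cb(42, pendingUserdata);
    }
}

static void storeResult(int result, void *userdata) {
    *(int *)userdata = result;
}

/* Synchronous facade: issue the request, then pump until it completes. */
static int toyRequestSync(void) {
    int result = 0;
    toyRequestAsync(storeResult, &result);
    while (pendingCallback) toyTick();
    return result;
}
```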

How to pass features in the device descriptor?

In the JS API, the device descriptor takes a sequence of strings which are the feature names. The list of features (as strings) can also be queried on the adapter.

In the C API I see two options to represent this. (actually three, but the last one seems bad)

  1. List of null-terminated strings. This is how Vulkan does device extensions. We'd have a list of feature names somewhere and ask people not to make extensions with conflicting names.
  2. List of enums - we'd ask people making extensions to make a PR to webgpu-headers to reserve their enum values. This is what SPIR-V does for OpCodes introduced by an extension. Emscripten or whatever translates C -> JS would need to have a table mapping the enums to strings for the JS API.
  3. A struct where each feature is a boolean member. This is not easily extensible.

@kvark @Kangz what do you think?

I have a slight preference for (1) since it's the most direct translation to JS, so it would be less maintenance in the C->JS layer. And, I don't think it's that important to optimize passing the list of features since it only happens on device creation.

Edit: Nevermind, I remembered we have string enums in JS anyway so there need to be enum tables maintained anyway. I prefer (2) then.
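
Option (2) implies the C-to-JS layer maintains an enum-to-string table, which might be sketched like this. The enum members and values below are examples invented for illustration, not a reserved list; the strings are the JS-side feature names.

```c
#include <string.h>

/* Example feature enum; values would be reserved via webgpu-headers PRs. */
typedef enum ToyFeatureName {
    ToyFeatureName_DepthClipControl = 0x00000001,
    ToyFeatureName_TimestampQuery   = 0x00000002,
} ToyFeatureName;

/* Table the C->JS layer would maintain to translate enums to JS strings. */
static const char *toyFeatureNameToString(ToyFeatureName f) {
    switch (f) {
    case ToyFeatureName_DepthClipControl: return "depth-clip-control";
    case ToyFeatureName_TimestampQuery:   return "timestamp-query";
    default:                              return 0; /* unknown/extension */
    }
}
```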

Add swapchains!

Right now the header doesn't have any swapchain interfaces. Dawn's swapchain interface is OMG-level of horrible but I heard wgpu-native has something nice. @kvark could you contribute that interface to this repo?

Figure out when callbacks are called, and on what thread

Currently in Dawn, we have to call device.Tick() repetitively in order to make progress and receive callbacks. If there are any pending callbacks, they're resolved on the same thread that you call Tick(). We've been thinking about removing Tick and checking callback completion on a separate thread. This has opened up a couple of possible ways forward, and we should have the behavior consistent for users of webgpu.h

1. Use Tick

  • Calling this in a loop is particularly wasteful.
  • It's easy to misuse -- calling it too often or not at all
  • It isn't part of the JavaScript API.
  • It does however let the application most explicitly control when callbacks are delivered
  • May require an application to do their own cross-thread posting if they want to process the callback somewhere else

2. Call callbacks on a separate thread

  • May require an application to do their own cross-thread posting if they want to process the callback somewhere else
  • The application receives the callback on a thread it didn't create -- exposing an implementation detail

3. Forward callbacks to the originating thread

  • May require an application to do their own cross-thread posting if they want to process the callback somewhere else
  • Requires the implementation to check at every / most API calls if there's a callback to process
  • This is probably where an application wants to receive a callback, so it's nice if we do it for them.

I lean toward option (2). Option (1) is fairly inefficient. I don't want to have the implementation do extra work to implement (3), especially if an application already manages cross-thread tasks itself.

Have all enums have "undefined" as value 0?

This would allow zero-initializing to be as close to an empty dictionary as possible in JS. It would also help if the JS API decides, after version 1, to make some enum members optional, for example when adding defaults.
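
The appeal can be shown with toy types (invented here, not the header's enums): a zero-initialized descriptor then reads as "all members undefined", mirroring an empty JS dictionary.

```c
#include <string.h>

/* Toy enum with "undefined" reserved as value 0. */
typedef enum ToyLoadOp {
    ToyLoadOp_Undefined = 0x00000000,
    ToyLoadOp_Clear     = 0x00000001,
    ToyLoadOp_Load      = 0x00000002,
} ToyLoadOp;

typedef struct ToyAttachment {
    ToyLoadOp loadOp;
} ToyAttachment;
```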

Formulation of features doesn't match WebGPU API

In WebGPU, they're exposed as:

interface GPUSupportedFeatures {
    readonly setlike<DOMString>;
};
...
readonly attribute GPUSupportedFeatures features;

This setlike interface doesn't match these headers, which just have

WGPU_EXPORT bool wgpuAdapterHasFeature(WGPUAdapter adapter, WGPUFeatureName feature);

How to support WGSL?

Now that WebGPU has decided to ingest WGSL, webgpu.h needs to change to reflect that. However, I don't think we want to remove the possibility of ingesting SPIR-V, because that would make a lot of people in native very sad.

How about having webgpu.h support both? Either by putting both shading languages in WGPUShaderModuleDescriptor or alternatively by making WGPUShaderModuleDescriptor like WGPUSurfaceDescriptor, mostly an empty interface that only has extension structs?
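
The extension-struct option could be sketched with a chained-struct walk like the one below. The types and sType values are invented for illustration and do not match the final header; the idea is that the descriptor itself is nearly empty and the shader source arrives via a chained struct whose sType says which language it carries.

```c
#include <stddef.h>

/* Toy sType tags identifying which extension struct follows. */
typedef enum ToySType {
    ToySType_ShaderSourceWGSL  = 0x00000001,
    ToySType_ShaderSourceSPIRV = 0x00000002,
} ToySType;

typedef struct ToyChainedStruct {
    const struct ToyChainedStruct *next;
    ToySType sType;
} ToyChainedStruct;

typedef struct ToyShaderSourceWGSL {
    ToyChainedStruct chain; /* chain.sType = ToySType_ShaderSourceWGSL */
    const char *code;
} ToyShaderSourceWGSL;

/* Mostly empty descriptor: the source lives in the extension chain. */
typedef struct ToyShaderModuleDescriptor {
    const ToyChainedStruct *nextInChain;
} ToyShaderModuleDescriptor;

/* Walk the chain looking for a WGSL source. */
static const char *toyFindWGSL(const ToyShaderModuleDescriptor *desc) {
    const ToyChainedStruct *c = desc->nextInChain;
    for (; c; c = c->next)
        if (c->sType == ToySType_ShaderSourceWGSL)
            return ((const ToyShaderSourceWGSL *)c)->code;
    return NULL;
}
```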

usability of swapchain in a native context

At Deno, we are currently working on adding swapchain support, which would allow rendering to a window. The problem is that Deno strives to be browser compatible, but that is difficult because swapchain usage wasn't designed for a native context.
