immersive-web / depth-sensing Goto Github PK

Specification: https://immersive-web.github.io/depth-sensing/ Explainer: https://github.com/immersive-web/depth-sensing/blob/main/explainer.md

License: Other

Makefile 0.81% Bikeshed 99.19%

webxr incubation augmented-reality depth-sens

depth-sensing's Introduction

Depth Sensing

This is the repository for depth-sensing.

Taking Part

Read the code of conduct
See if your issue is being discussed in the issues, or if your idea is being discussed in the proposals repo.
We will be publishing the minutes from the bi-weekly calls.
You can also join the working group to participate in these discussions.

Specifications

Related specifications

WebXR Device API - Level 1: Main specification for JavaScript API for accessing VR and AR devices, including sensors and head-mounted displays.

Relevant Links

Communication

Maintainers

Tests

For normative changes, a corresponding web-platform-tests PR is highly appreciated. Typically, both PRs will be merged at the same time. Note that a test change that contradicts the spec should not be merged before the corresponding spec change. If testing is not practical, please explain why and if appropriate file a web-platform-tests issue to follow up later. Add the type:untestable or type:missing-coverage label as appropriate.

License

Per the LICENSE.md file:

All documents in this Repository are licensed by contributors under the W3C Software and Document License.

Summary

For more information about this proposal, please read the explainer and issues/PRs.

depth-sensing's People

Contributors

Stargazers

Watchers

Forkers

alfaomegagrafx himorin raymondxie maksims autokagami aasim-syed raananw isabella232 ungtb10d drumath2237 c05m0ch405 cabanier seanpm2001 rigotamas bkardell

depth-sensing's Issues

`normDepthBufferFromNormView` in XRDepthInformation is not a rigid transform

We should look into changing the type of normDepthBufferFromNormView since it can be a non-rigid transform - rigid transforms cannot apply scaling, and this is what can be performed by this transformation.

What we'd need is a general-purpose matrix type in the IDL. There is a DOMMatrix which I think could work here, but this is going to unfortunately be a breaking change. 😢

Allow data to be accessed directly by the GPU

Splitting @cabanier's question into a new issue:

Is there a way to directly push this data to the GPU? Having an intermediate JS buffer will create overhead (and more GC). If the data is going to be used by the GPU, it seems more appropriate to surface it there.

To me, the main question to answer here is: do we want the data to be accessible by the CPU as well? I'd expect so, because if not, we'll lose out on most (all?) use cases for physics.

Broken references in WebXR Depth Sensing Module

While crawling WebXR Depth Sensing Module, the following links to other specifications were detected as pointing to non-existing anchors, which should be fixed:

_{This issue was detected and reported semi-automatically by Strudy based on data collected in webref.}

WebXR Depth Sensing Module: What is DepthInfo.normDepthBufferFromNormView matrix?

Inside WebXR Depth Sensing Module :
What is the difference between "normalized view coordinates" and "depth buffer’s coordinate system" ?
Does normDepthBufferFromNormView provides the projection matrix to transform depth buffer’s array to normalized view coordinates?

Handling of depth buffers for stereoscopic systems

Splitting @cabanier's question into a new issue:

How is a single depth buffer going to work with stereo? Will each eye have its own depth buffer?

The way I thought about it is that XRDepthInformation that we return must be relevant to the XRView that was used to retrieve it. In case of a stereo system w/ only one depth buffer, there would be 2 options: either reprojecting the buffer so that each of XRViews gets the appropriate XRDepthInformation, or exposing an additional XRView that would be used only to obtain the single depth buffer (but then it'd be up to the app to reproject, there are some XRViews for which XRDepthInformation would be null, and we are creating a synthetic XRView so maybe not ideal). If we were to require the implementation to reproject the depth buffer, how big of a burden would that be?

Depth Sensing up to 65m since ARCore May 2022 Update

ARCore SKD was updated to version 1.31 in May 2022.

With this update, they increased the Depth Sensing from 8m to 65m: https://developers.google.com/ar/develop/depth/changes#android-ndk-c

Old function: ArFrame_acquireDepthImage()
New function: ArFrame_acquireDepthImage16Bits()

However, WebXR seems to be using the old function until now. Because I am only able to retrieve the Depth until 8m.
This is probably because WebXR has not been updated since May.

Will this be updated very soon?
And what does WebXR return? Does it return the raw depth values or the depth values?

Thank you very much for your replies.

Early feedback request for the API shape

Hey all,

I'd like to ask people to take a look at the initial version of the explainer and let me know if there are any major problems with the current approach (either as a comment under this issue, or by filing a new issue). I'm looking mostly for feedback around API ergonomics / general usage, and possible challenges related to implementation of the API on various different kinds of hardware / in different browsers / etc.

+@toji, @thetuvix, @grorg, @cabanier, @mrdoob, @elalish

Potentially incorrect wording in the specification

When going over the spec for issue #43, I have realized that we may have a mismatch between what the specification says, and what we do in our ARCore-backed implementation in Chrome.

Namely, the spec says that in the buffer that we return, "each entry corresponding to distance from the view's near plane to the users' environment".

ARCore's documentation seems to have a conflicting phrasing:

In ArFrame_acquireDepthImage(), we have "Each pixel contains the distance in millimeters to the camera plane".
In Developer Guide, we have "Given point A on the observed real-world geometry and a 2D point a representing the same point in the depth image, the value given by the Depth API at a is equal to the length of CA projected onto the principal axis".

If ARCore returns data according to 1), then I think it'd be acceptable to leave the spec text as-is, but then our implementation may not be correct (namely, I think we may run into the same issue that causes @cabanier to need to expose at the very least the near plane distance that ARCore internally uses?).

If ARCore returns data according to 2), then the values in the buffer we return are not going to depend on the near plane. In this case, we are not going to be compliant with the spec (we don't have a distance from near plane to user's environment), and the only way to be compliant will require us to adjust each entry in the buffer - this may be expensive given that this'll happen on CPU. IMO the best way to fix this would be to change the spec prose here, but I think this may be considered a breaking change, so we'll need to discuss how to move forward.

I'm going to try to confirm with ARCore what is actually their behavior, I'm not sure if this issue is actionable until that happens.

Failed to read the 'dataFormatPreference' property from 'XRDepthStateInit'

Hello,
When I try to request an XR session with depth support using the code provided in explainer.md, namely:

const session = await navigator.xr.requestSession("immersive-ar", {
  requiredFeatures: ["depth-sensing"],
  depthSensing: {
    usagePreference: ["cpu-optimized", "gpu-optimized"],
    formatPreference: ["luminance-alpha", "float32"]
  }
});

the browser (Chrome Canary 98 on Android 11) complains that:

TypeError: Failed to execute 'requestSession' on 'XRSystem': Failed to read the 'depthSensing' property from 'XRSessionInit': Failed to read the 'dataFormatPreference' property from 'XRDepthStateInit': Failed to read the 'dataFormatPreference' property from 'XRDepthStateInit': Required member is undefined.

There seems to have been a change in naming conventions.
The fix is to replace the line:

formatPreference: ["luminance-alpha", "float32"]

with the following:

dataFormatPreference: ["luminance-alpha", "float32"]

I haven't studied the behavior for other browsers / browser versions.
Could you update the explainer.md file accordingly?
Thank you!

And thank you for your great work and documentation! There is no document that explains the Depth API as clearly as explainer.md does (and by far).

Exposing confidence levels through the API

The current API shape does not expose confidence levels even though both ARKit and ARCore provide this information, in 2 different ways:

in ARKit, the confidence map is a separate attribute available on ARDepthData
in ARCore, the confidence values are packed in 3 most significant bits of depth buffer entries, moreover, they are currently documented as being equal to 000s.

It seems to me that the cleanest way to expose confidence levels via the depth sensing API is to add a separate attribute to XRDepthInformation (an Uint8Array?). This would mean that for ARCore implementation, we'd need to make sure to always copy 3 most significant bits of depth buffer entries into a separate buffer (if we decide to expose this information), and mask them in the depth buffer to always be set to 0 in case the underlying ARCore code changes. It would also mean that we'd have to surface this as 1 more opaque WebGLTexture so that it's available for GPU consumption (assuming that we want to optimize for GPU access, see my comment). Additionally, for devices that support depth, but do not offer confidence values, it would be simple to just return null to communicate this.

If the above sounds acceptable, the current API shape allows us to add the separate attribute once that's something that's needed & I'm not worried. Alternatively, if we were to pack confidence levels in the depth buffer entries (ARCore-style), this is something we need to decide on now to ensure we're not making breaking changes later.

How to estimate focal length of an android phone ToF camera?

I need to estimate the focal length of the ToF camera for the calculation of depth surface normal. I was going through the documentation of "WebXR Depth Sensing Module" but I couldn't find any info regarding the ToF intrinsic or field of view.

Any comments and suggestions will be appreciated!!!

Revisit a way to surface depth API to the app?

Quote from twitter's @mrmaxm:
https://twitter.com/mrmaxm/status/1333516895975305218

"Or look into similar approach of Hand Tracking API with providing allocated array into function, which will fill it with data.
Allocations - is very important issue with realtime apps."

Close distance measurement at screen center

I've been trough a bit of documentation, and I have a couple of questions I cannot find answers to. Hoping someone here can assist.

What is the closest distance that can be measured. I am not talking about the depthNear clipping, which I understand, But say I have an object 15cm away in the center of the screen, will the depth sensing be able to report that distance?

Also, how do you get the distance in the center of the screen, 'getDepthInMeters()' where x and y are both '0.5' should be right in the middle of the image, right? I this is the case, I keep getting a distance between 0.5m and 0.7m (or 0m) when aiming at a surface about 20cm away? If I am misunderstanding what the depthInMeters() is meant to return, please enlighten me

/agenda can we make XRDepthStateInit optional?

Relationship / intersection with Media Capture Depth

Moved from my personal version of depth sensing repository - @dontcallmedom wrote:

The WebRTC Working Group developed a few years ago an extension to Media Capture and Streams to be able to capture depth data from depth cameras:
https://w3c.github.io/mediacapture-depth/

I understand there is an experimental implementation of that spec in Chromium (cc @riju).

There is likely some overlap between this proposal and that API that might be useful to explore.