
An API that lets developers transfer the ability to use restricted APIs to any target window in the frame tree.

License: Other



Capability Delegation

Transferring the ability to use restricted APIs to another window.

(Draft specification)

Author

Mustaq Ahmed (github.com/mustaqahmed)

Participate

Github repository: WICG/capability-delegation
Issue tracker: WICG/capability-delegation/issues

Introduction

"Capability delegation" means allowing a frame to relinquish its ability to call a restricted API and transfer that ability to another (sub)frame it trusts. The focus here is a dynamic delegation mechanism that exposes the capability to the target frame in a time-constrained manner (unlike the <iframe allow=...> attribute, which is not time-constrained).

The API proposed here is based on postMessage(), where the sender frame uses a new WindowPostMessageOptions member to specify the capability it wants to delegate.

Motivating use-cases

Here are some practical scenarios that are enabled by the Capability Delegation API.

Secure PaymentRequest processing in a subframe

Many merchant websites perform payment processing through a Payment Service Provider (PSP) site (e.g. Stripe) to comply with security and regulatory complexities around card payments. When the end-user clicks on the "Pay" button on the merchant website, the merchant website sends a message to a cross-origin iframe from the PSP website to initiate payment processing, and then the iframe uses the Payment Request API to complete the task.

But sites are only allowed to call the Payment Request API after transient user activation (a recent click or other interaction) to prevent malicious attempts like unattended or repeated payment requests. Since the user probably clicked on the main site, and not the PSP iframe, this would prevent the PSP from using the Payment Request API at all. Browsers today support such payment processing by ignoring the user activation requirement altogether (see crbug.com/1114218)!

The Capability Delegation API provides a way to support this use-case while letting the browser enforce the user activation requirement, as follows:

// Top-frame (merchant website) code
checkout_button.onclick = () => {
    targetWindow.postMessage("process_payment", {targetOrigin: "https://example.com",
                                                 delegate: "payment"
                                                });
};

// Sub-frame (PSP website) code
// The handler must be async because it awaits show().
window.onmessage = async () => {
    const payment_request = new PaymentRequest(...);
    const payment_response = await payment_request.show();
    ...
};

Allowing fullscreen from opener Window click

This is a work-in-progress in Chrome.

Consider a presentation/slide website where the main "control panel" window has spawned a few presentation windows, and the user wants to selectively make one presentation window fullscreen by clicking on the appropriate button on the main window (a feature request from a developer). Clicking on the "control panel" button does not make the user activation available to the presentation window, so this does not work today.

The Web does not support this use-case today, but the Capability Delegation API provides a solution:

// Main window ("control panel") code
let win1 = open("presentation1.html");
let win2 = open("presentation2.html");

button1.onclick = () => win1.postMessage("msg", {targetOrigin: "https://example.com",
                                                 delegate: "fullscreen"});
button2.onclick = () => win2.postMessage("msg", {targetOrigin: "https://example.com",
                                                 delegate: "fullscreen"});

// Presentation window code
window.onmessage = () => document.body.requestFullscreen();

Allowing display capture from cross-origin iframe click

Consider a web app in which you want to add video-conferencing capabilities. You turn to a third party solution that can be embedded in a cross-origin iframe. There's a lot of logic behind the scenes, but UX-wise, maybe you work out a scheme where it's mostly the video which is user-facing in the video-conferencing iframe, and the user-facing controls - mute, leave, share-screen - are all part of the web app, and receive its specific UX styling. When those buttons are pressed, some messages are exchanged between the web app and the embedded video-conferencing solution.

To let the third-party iframe prompt the user to share a tab, a window, or a screen, the top frame would delegate the mediaDevices.getDisplayMedia() capability to the iframe as follows:

// In the top frame, user clicks the "Share My Screen" button.
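// targetOrigin must name the iframe's origin; without it the message is
// only delivered to a same-origin target.
button.onclick = () =>
  frames[0].postMessage("msg", { targetOrigin: "https://example.com",
                                 delegate: "display-capture" });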

// In the cross-origin video-conferencing iframe, prompt the user
// to share a tab, a window, or a screen.
window.onmessage = () => navigator.mediaDevices.getDisplayMedia();

Other similar scenarios

A web service that does not care about user location except for a "branch locator" functionality provided by a third-party map-provider app can delegate its own location access capability to the map iframe in a temporary manner right after the "branch locator" button is clicked.
An authentication provider may wish to show a popup to complete the authentication flow before returning a token to the host site.
A website may want a third-party chat app in an iframe to be able to vibrate the phone on message receipt, even when the user is not active in the iframe.

Non-goals

This explainer is not about delegation of user activation (i.e., allowing the iframe to choose from all of the things the top frame could do after a user click or other interaction). See Considered Alternatives below for more details.
This explainer does not determine which APIs could possibly support capability delegation. If any API needs the support, the designers of the API would decide details of delegated behavior. The PaymentRequest API case presented here (in collaboration with the owners of that API) serves as a guide for similar changes in other API specifications.

Using capability delegation

Developers would use Capability Delegation by just initiating the delegation appropriately, as shown in the example code snippets above. In short, when a browsing context wants to delegate a capability to another browsing context, it sends a postMessage() to the second browsing context with an extra WindowPostMessageOptions member called delegate specifying the capability.

After a successful delegation, the "user API" (the restricted API being delegated) just works when called at the right moment. The general idea is calling the restricted API in a MessageEvent handler or soon afterwards. In the examples above, the restricted APIs are payment_request.show(), element.requestFullscreen(), and mediaDevices.getDisplayMedia() respectively.
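
The receiving side can be sketched as follows. This is a minimal sketch and not part of the proposal: the helper name makeDelegationHandler, the origin "https://merchant.example", and the "go_fullscreen" message are illustrative assumptions. The handler checks the sender's origin and then calls the restricted API directly inside the message handler, so the call happens while the delegated capability is presumably still valid.

```javascript
// Hypothetical helper: builds a "message" handler that runs a restricted
// API call only for the expected message from a trusted origin.
function makeDelegationHandler(trustedOrigin, expectedMessage, callRestrictedApi) {
  return async function onMessage(event) {
    // Ignore messages from origins we do not trust.
    if (event.origin !== trustedOrigin) return false;
    // Ignore unrelated messages.
    if (event.data !== expectedMessage) return false;
    // Call the restricted API right away, inside the handler.
    await callRestrictedApi();
    return true;
  };
}

// Browser wiring with assumed names; skipped outside a browser.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  window.addEventListener("message", makeDelegationHandler(
    "https://merchant.example",
    "go_fullscreen",
    () => document.body.requestFullscreen()
  ));
}
```

Passing the restricted call in as a callback keeps the origin check in one place, whichever capability was delegated.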

Demo

Payment Request API: To see how this API works with Payment Request, run Chrome with the command-line flag: --enable-blink-features=PaymentRequestRequiresUserActivation, then open this demo.
Fullscreen API: Work in progress.
Screen Capture API: Work in progress.

Considered alternatives

Delegating user activation instead of a specific capability

It may appear that we can delegate user activation to solve the same use-cases and thus avoid specifying a feature in the postMessage() call. We attempted this direction in the past from a few different perspectives, and decided not to pursue this. In particular, user activation controls many Web APIs, so delegating user activation for any of the mentioned use-cases is impossible without causing problems with unrelated APIs. See the TAG discussion with one past attempt.

Using a delegation-specific method instead of postMessage()

Instead of piggy-backing the delegation request as a WindowPostMessageOptions entry, we considered adding a new delegation-specific interface on the Window object. While the latter may look cleaner from a developer’s perspective, to support cross-origin communication this solution would require adding the new method on the WindowProxy wrapper, which HTML's editor strongly disliked.

Stakeholder feedback/opposition

We will track the overall status through this Chrome Status entry.

Acknowledgements

Many thanks for valuable feedback and advice from:

Anne van Kesteren (github.com/annevk)
Jeffrey Yasskin (github.com/jyasskin)
Robert Flack (github.com/flackr)


capability-delegation's Issues

Requiring the postMessage origin not to be a wildcard?

If websites don't use COOP: same-origin, they could unknowingly have an opener/openee relationship with a malicious window. This relationship can persist for a long time, across several navigations in both windows.

With postMessage, we recommend developers not to use targetOrigin="*".
With capability delegation, it might be wise to require it instead of recommending it.
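
Until such a requirement exists, a site could enforce it on its own side. The following is a minimal sketch; delegateCapability is a hypothetical wrapper, not part of the proposal.

```javascript
// Hypothetical wrapper around postMessage() that refuses to delegate
// when the target origin is missing or is the "*" wildcard.
function delegateCapability(targetWindow, message, targetOrigin, capability) {
  if (!targetOrigin || targetOrigin === "*") {
    throw new TypeError("Delegation needs a specific targetOrigin, not a wildcard");
  }
  targetWindow.postMessage(message, { targetOrigin, delegate: capability });
}
```

A call such as delegateCapability(targetWindow, "process_payment", "https://example.com", "payment") then behaves like the examples above, while a wildcard origin fails before any message is sent.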

Why it is safe to take timestamp when delegation has been received

https://wicg.github.io/capability-delegation/spec.html#monkey-patch-to-html-tracking-delegation

"If delegate is not null, AND the user agent supports delegating delegate, then set DELEGATED_CAPABILITY_TIMESTAMPS[delegate] to current high resolution time."

But if the main thread of the target has been busy, that current time might be well after the time the message was sent. Why is that OK, or am I missing something?
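
The concern can be illustrated with a toy model. This is only a sketch of the quoted bookkeeping, not the spec algorithm: the class name, the 5000 ms expiry, and the timings are arbitrary values assumed for illustration.

```javascript
// Assumed expiry duration, for illustration only.
const EXPIRY_MS = 5000;

// Toy model of the receiver-side timestamp map.
class DelegationModel {
  constructor() {
    this.timestamps = new Map(); // capability -> time the delegation was received
  }
  // Called when the target processes the delegating message.
  receive(capability, now) {
    this.timestamps.set(capability, now);
  }
  // True while the delegated capability has not yet expired.
  isValid(capability, now) {
    const received = this.timestamps.get(capability);
    return received !== undefined && now - received <= EXPIRY_MS;
  }
}

// The message is sent at t=0, but a busy main thread processes it at t=3000.
const model = new DelegationModel();
const sentAt = 0;
const receivedAt = 3000;
model.receive("payment", receivedAt);

// Measured from the receive time, the capability is still valid at t=7000.
console.log(model.isValid("payment", 7000)); // true
// Measured from the send time, it would have expired at t=5000.
console.log(sentAt + EXPIRY_MS >= 7000); // false
```

In this model the validity window simply shifts by however long the message waited in the queue.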

Make postMessage()'s delegate an enum

Given that we want to throw for unknown values, it might as well be an IDL enum? That would simplify some things for #9, too.
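
For comparison, here is roughly what an IDL enum would do, emulated in script. This is a sketch: the value list is taken from the examples in this explainer and the final set of enum values is an assumption, as is the helper name toDelegateEnum.

```javascript
// Capability names used in this explainer's examples (assumed value set).
const DELEGATE_VALUES = Object.freeze(["payment", "fullscreen", "display-capture"]);

// Mimics Web IDL enum conversion: unknown strings raise a TypeError.
function toDelegateEnum(value) {
  const text = String(value);
  if (!DELEGATE_VALUES.includes(text)) {
    throw new TypeError(`"${text}" is not a valid delegate value`);
  }
  return text;
}
```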

Why consumption of user gesture in targetWindow

It seems to me that in https://wicg.github.io/capability-delegation/spec.html#monkey-patch-to-html-initiating-delegation you want to consume the gesture in incumbentSettings's global or some such. That's the window that is transferring its activation, right?

(Also, if you change an existing variable you cannot use let, you need to use set. See https://infra.spec.whatwg.org/#variables.)

Fullscreen and other APIs

The specification talks about fullscreen, but it seems only payments is supported? Why is that?

Should it be possible to delegate capabilities over a MessagePort?

Should it be possible to delegate capabilities over a MessagePort?
Should it be possible to do so even cross-tab?

How many layers of delegation

It's somewhat common for sites to have multiple layers of nested documents. How would that work here? In the current setup it seems the top-layer would have to be aware of each of them so it can message directly to the innermost document that might be responsible for fullscreen or some such, but is that ideal? (It doesn't seem ideal.)

Normative text on how the sender frame relinquishes the capability

This was originally reported in Chrome intent thread:

We need to expand more on the relinquishing aspect and how regaining the capability happens. We don't have any normative text in the spec that explains how it happens.

Clarifying the algorithm for feature detection

Ideally, the spec should provide an example (and clarify the window post message algorithm's monkey-patch) to elucidate how a site can detect this feature.

Ideally, the site could call postMessage on its own window with a capability to delegate, without user activation, and check for a NotAllowedError (or similar) to detect the user agent's ability to delegate that capability.

The algorithm may also want to consider clarifying the behavior when the destination doesn't have a supporting feature policy.
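
The detection idea above could look like the following sketch. It is not confirmed behavior: probeDelegationSupport is a hypothetical helper, and it assumes that a supporting user agent throws NotAllowedError when there is no user activation, and that a user agent without support ignores the delegate member and throws nothing.

```javascript
// Hypothetical probe: tries to delegate `capability` to the window itself,
// without user activation, and classifies the outcome.
function probeDelegationSupport(win, capability) {
  try {
    win.postMessage("delegation-probe", {
      targetOrigin: win.location.origin,
      delegate: capability,
    });
    // No exception: the `delegate` member was probably ignored.
    return false;
  } catch (error) {
    // Assumed signal: the request was understood but refused because
    // the window had no user activation.
    return Boolean(error) && error.name === "NotAllowedError";
  }
}
```

The probe has to run outside any user gesture; with an active gesture a supporting user agent would presumably perform the delegation and throw nothing. Where the delegate member is ignored, the probe message is also delivered to the page's own message listeners.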

Is "token" the best term to use here?

We got early feedback that the proposed term "token" is confusing: it seems to suggest a "transferable object". In our case the "token" would be "non-transferable by design". Not sure what could be a better alternative here. Any suggestions welcome.

Why not use sandbox?

Sorry if this is a silly question, but I'm wondering... why not just use something like sandbox=? Say:

<iframe sandbox="allow-transient-activation" allow="payments">

Then the regular transient activation expiry time still applies to the remote origin and there is no need to do any capability delegation (it's handled by the allow= permissions policy).

Open question: do cross-origin iframes have their own transient-activation timer, or is it global across processes? (I think I know the answer... but).

Sync proposed change to Payment Request

Our proposed monkey patch to Payment Request is out of sync with the target spec after w3c/payment-request#961.

FYI @marcoscaceres

How does this work with permission-gated capabilities and permission prompts?

What happens if the frame delegating the capability does not have the necessary permission to actually use it? In the case of subframe the usage seems close enough to permission/feature delegation but when it comes to popup windows this delegation seems confusing.

Would it be reasonable to enforce that the top-level frame can only delegate capabilities that it already has the permission to make use of?

Architecture thoughts

Looking at https://wicg.github.io/capability-delegation/spec.html#monkey-patch-to-payment-req it seems that the current setup is quite involved for participating specifications.

It seems to me the contract could be simpler. Whereby a participating specification provides an identifier and a global and a shared algorithm then returns whether it can proceed (previously known as "has transient activation").

(Perhaps a bit more is needed to address the variety of use cases, I haven't looked at this in detail, but in general we should strive for making adoption easy and put the bulk of the logic in the base specification.)

When does a delegated Payment Request capability expire?

The proposed monkey-patch to Payment Request spec needs to clarify when a delegated payment request capability expires.

The current text seems to suggest the same expiry time as transient user activation through the link from "expired" to HTML spec, but technically it doesn't really work because we are talking about a different timestamp field here.

@stephenmcgruer Before I change it, does "the same expiry time as transient user activation" make sense to Payment Request team? Or you want a different expiry?

Clarifying the behavior for consuming the user activation and delegated capability

The APIs that consume the user activation and the delegated capability, Fullscreen and Payment, behave differently. If the global has both valid transient activation and a delegated capability, the Payment API only consumes the transient activation, which means the Payment API can be called again because the delegated capability is still valid; the Fullscreen API, however, seems to consume both. Is this intentional?
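
The difference can be modeled in a few lines. This is a toy model of the behavior as described in this issue, not the spec algorithms; the function names are hypothetical, and the fallback where Payment consumes the delegated capability on a later call is an assumption.

```javascript
// Toy state for one window that holds both sources of permission.
function makeState() {
  return { transientActivation: true, delegated: true };
}

// Payment as described above: consume only the transient activation when
// it is available; otherwise fall back to the delegated capability (assumed).
function consumeForPayment(state) {
  if (state.transientActivation) {
    state.transientActivation = false;
    return true;
  }
  if (state.delegated) {
    state.delegated = false;
    return true;
  }
  return false;
}

// Fullscreen as described above: a single call consumes both.
function consumeForFullscreen(state) {
  if (!state.transientActivation && !state.delegated) return false;
  state.transientActivation = false;
  state.delegated = false;
  return true;
}
```

Starting from makeState(), consumeForPayment succeeds twice in a row while consumeForFullscreen succeeds only once, which is the asymmetry the question is about.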

Fix "browsing context" vs "document" mixup

The delegation should be to a target document, not to a target browsing context!

Thanks @annevk for pointing this out.

Require a non-* targetOrigin

This is to prevent giving access to a navigated child frame. You should know the origin of a trusted partner.

Consider extending MessageEvent

It may be useful to pass additional information to the receiver via the MessageEvent, so a developer can know that delegation failed, or was denied or something similar.

window.addEventListener("message", e => {
  if (e.delegate == false) {
    // do something useful, rather than hope the API i would have called has a rejection handler (or w/e)
 }
});

Serialize a string?

Let delegate be the serialization of options["delegate"].

There's nothing to serialize here, right?

Delegating a capability not acquired yet.

By providing the capability delegation as a string, developers may delegate features they don't have access to yet.

For instance. What would happen if:

postMessage("msg", {delegate: "geolocalisation"});

is sent before asking the users to allow geolocalisation on the document?

One alternative would have been to pass some "Capability/Token" object that can be constructed after getting access to a feature. This way, you could only delegate capabilities you already have access to.

If strings are used, this raises interesting questions:

Do delegations apply retroactively to the user's permission prompt?
If the permission prompt happens after the delegation, where do you show the permission prompt? On both windows?
Do you have race conditions? How does the specification deal with them?

+CC @mikewest

Demo postMessage options cleanup

The demo currently has both delegate: "paymentrequest" and createToken: "paymentrequest" to make it testable with older Chrome. We need to remove the latter one when appropriate.

Also, the capability should be mentioned as "payment" as per our proposed spec draft.

Examples lack targetOrigin

It's not clear to me how the examples work at the moment. In the same-origin case this isn't needed as the relevant windows would already have transient activation. In the cross-origin case you need to supply targetOrigin.

Relation to permissions policy

The specification lists Permissions Policy as a normative dependency, but never references it. At a minimum, that's an editorial problem.

More substantively, I don't understand how the relationship between the two is supposed to be conceptualized. Is it only limited to those cases where there are transient activations involved? What if the top-level site delegates a capability that is not otherwise transient; does it become transient as a consequence of the delegation?

Interaction with Permissions / Feature Policy

How does this interact with Permissions Policy? For example, in the example and demo shown:

checkout_button.onclick = () => {
    targetWindow.postMessage("process_payment", {delegate: "payment"});
};

The top-level frame knows the origin of targetWindow so it seems reasonable that it might have set a Permissions Policy to enable or restrict the feature for that origin. As in, it feels like a site might want to prevent the ability for a malicious script to insert an entry like this to delegate a capability.

Likewise, though this might just be my own confusion / wish-list, it would be nice if the capabilities were consistent in naming with Permissions Policy too… but I don't know if that's a valid thing to want.

USVString -> DOMString

I don't think there's a need for USVString here.