Welcome to the WICG Proposals Repo!

This is the WICG proposals repo, a place for ideas to start their incubation journey. Use this repo's issue tracker to submit and discuss new proposals, much as the now-archived Discourse threads were used previously.

Please note that all proposals in this repo (including those in separate markdown documents) have no official status in the WICG as incubations.

What does a proposal look like?

A proposal outlines a particular problem or challenge on the web and offers a potential concrete solution. Without being too prescriptive, you know you've got a proposal when you can articulate a specific way (procedurally, algorithmically, declaratively) that a new or current web technology solves an existing problem or challenge. If the problem is unclear, or the potential solution is too abstract, you're not quite there yet.

For example, this is not a proposal:

"Websites have lots of bugs because they don't get updated or because browsers change their behavior over time. There should be a universal way for users to report bugs to websites so that they get fixed."

Instead, a proposal might be:

"Websites have lots of bugs because they don't get updated or because browsers change their behavior over time. One way to solve this is to create an experience in the browser that allows users to record a set of steps that reproduce the problem, and then standardize the format for these replay instructions, and provide an API to allow sites to capture this feedback or an HTTP header to post this feedback back to their site. Below I describe the proposed format and API... <snip>"

Proposals in issues vs. separate markdown documents?

If you would like to make a proposal as a separate markdown document (if, for example, it was developed in a separate repo), you are welcome to link to it from the proposal issue; just provide a bit of high-level context on the problem and proposed solution in the issue as well. Alternatively, if you'd like to use the proposals repository itself to host a separate markdown document, you are welcome to submit a PR for it; we still ask that you file an issue to track your proposal and provide context as previously mentioned. This will make it easier for our entire community to use the issue tracker to search through all proposals.

Getting Started

Is your proposal unique? Head over to the issues list and search for it; if it was suggested by someone else give it your support with a 👍 or leave a comment. If you don't find anything there, consider starting a new issue. (You are also welcome to visit Discourse and search there as well, especially while this repo is first getting populated).

Search proposals by category

The proposals are grouped by category (as discussions are on Discourse):

Label | Short description | Discourse category
APIs | All proposals about JS APIs. | APIs
HTML | HTML-related proposals (not only the HTML standard but any markup-related ideas). | HTML
CSS | CSS-related proposals. | CSS
Uncategorized | Proposals that don't fit into any other existing category. | Uncategorized
Meta | Proposals/ideas/discussions about this proposals repo, its organization, how it works, and how we can improve it. | meta
JS | JS language proposals. | JS
Security | Proposals with a focus on web security, client-side protections, improved site security, etc. | Security
Protocols | Proposals for anything relating to protocols such as HTTP, Web Sockets, and JSON-based protocols. | protocols
WASM | Proposals for the WebAssembly language. Also consider reviewing the WebAssembly Community Group's issue tracker. | asm.js
Architecture | For proposals related to web architecture or architectural components. | Architecture
Media & RTC | For Media (video/audio) and Real-Time Communications proposals. | Media and Real-Time Communications
Web Components | Proposals for web components. Also consider reviewing the webcomponents incubation issue tracker. | Web Components
Web Apps | Proposals related to bringing app-like behavior to the web. | n/a

Evolving from Discourse

Previously, we used Discourse discussions for anything that wasn't an official incubation (i.e., an incubation with its own repo). That forum was host to all discussions, early explorations, suggestions, and proposals for the web platform. The goal was to help establish a community where those who participated in developing web standards and those who didn't have the time but still wanted to influence and advance the web could mingle and share ideas. In short, it brought together the broadest possible community of interested developers to give and receive feedback. This goal was largely met.

As time passed, the developer ecosystem shifted. GitHub now hosts a larger community of interested web developers than ever before, is the primary host of specification development for major web standards organizations, and has great tools and integration for project and issue management. We also see an opportunity to provide more direct guidance on how to start an incubation, by creating this dedicated home for future incubations to be proposed that is distinct from other ideas, explorations, and discussions about the web platform.

This proposals repo will meet these changing needs. Here in GitHub you can easily extend your existing repo and issue monitoring techniques to keep track of what's being proposed. Additionally, this gives us a chance to clarify the WICG process for starting new incubations: rather than asking that proposals for new incubations be started on Discourse (intermingled with all other Discourse conversations), ideas should be filed as issues here in this repo, making it clear that these proposal issues are intended to begin life as incubations. Our expectations for evaluating new proposals remain the same: as soon as sufficient interest is shown in the proposal's issue thread (notably from potential implementers), the WICG chairs will enable a team of editors to manage the proposal, and those team members can begin work in a new repo or move ownership of an existing GitHub repo to WICG. For more information about the evaluation and incubation process, see the admin repo's README file.

Issues

Document Policy

Introduction

This was originally proposed on Discourse, but this seems to be the new place for proposals.

Document Policy is a configuration mechanism for the web platform, allowing site authors (for instance) to enable or disable the use of platform APIs, improve their performance by setting thresholds for the sizes or efficiency of images, or sandbox individual frames.

It was split from Feature Policy, as a means of configuring features other than permissions, and so currently lives in a W3C WebAppSec WG repo. The spec is already in progress there, but the consensus of the WG is that it should be incubated at WICG while it is completed and while we gather implementer interest.

Read the complete Explainer or the spec

Document Policy is under review by TAG.
It has been partially shipped in Chrome to support the "Scroll-to-text-fragment" feature.

Feedback

Please provide all feedback below.

Close signals

Introduction

Modals are UI components that are layered on top of all other content and take interaction focus. Some examples are:

  • a <dialog> element, especially the showModal() API;
  • a sidebar menu;
  • a lightbox;
  • a custom picker input (e.g. date picker);
  • a custom context menu;
  • fullscreen mode.

An important common feature of these modals is that they are designed to be easy to close, with a uniform interaction mechanism for doing so. Typically, this is the Esc key on desktop platforms, and the back button on some mobile platforms (notably Android). Game consoles also tend to use a specific button as their "close/back" button. Another case is VoiceOver users on iOS, who have a special dismiss gesture.

We define a close signal as a platform-mediated interaction that's intended to close a modal. This is distinct from page-mediated interactions, such as clicking on an "x" or "Done" button, or clicking on the backdrop outside of the modal.

Currently, web developers have no good way to handle these close signals for their own modals. This is especially problematic on Android devices, where the back button is the traditional close signal. Imagine a user filling in a twenty-field form, with the last item being a custom date picker modal. The user might click the back button hoping to close the date picker, like they would in a native app. But instead, the back button navigates the web page's history tree, likely closing the whole form and losing the filled information.

This explainer proposes a new API to enable web developers, especially component authors, to better handle these close signals. It also contemplates an alternate proposal that does not involve introducing a specific new API for close signals, but instead bundles these semantics with new higher-level APIs for modals, which would hopefully solve other problems like top-layer behavior or focus trapping. (But, the explainer does not itself tackle those problems.)
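
For a concrete sense of the shape such an API might take, here is a minimal sketch in the spirit of the proposal (names and exact semantics are illustrative; see the explainer for the actual API):

// Hypothetical close-signal watcher for a custom sidebar (names illustrative).
const watcher = new CloseWatcher();

watcher.onclose = () => {
  // Fired on a close signal: Esc on desktop, the back button on Android, etc.
  hideMySidebar();
};

// If the sidebar is closed by page-mediated means (e.g. an "x" button),
// release the watcher so it no longer intercepts close signals.
function hideMySidebar() {
  document.querySelector('#sidebar').hidden = true;
  watcher.destroy();
}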

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Managed configuration API

Introduction

On devices that are managed by an organization, there is often a need to thoroughly set up the environment for web applications before use. This API provides a way for web applications to access administrator-provided configuration.

This API is proposed to be added under a new navigator.device namespace, which will be available to highly trusted applications only.
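
As a rough illustration, usage could look something like the sketch below (method and key names are hypothetical; the explainer defines the actual surface):

// Hypothetical sketch: read administrator-provided configuration.
// Method and key names are illustrative, not the explainer's exact API.
const config = await navigator.device.getManagedConfiguration([
  'kioskMode',
  'allowedOrigins',
]);
if (config.kioskMode) {
  enterKioskLayout(); // application-defined
}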

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Cross-Origin-Embedder-Policy: credentialless

The problem

Sites that wish to continue using SharedArrayBuffer must opt into cross-origin isolation. Among other things, cross-origin isolation will block the use of cross-origin resources and documents unless those resources opt into inclusion via either CORS or CORP. This behavior ships today in Firefox, and Chrome aims to ship it as well in 2021.

The opt-in requirement is generally positive, as it ensures that developers have the opportunity to adequately evaluate the rewards of being included cross-site against the risks of potential data leakage via Spectre. It poses adoption challenges, however, as it requires developers to adjust their servers to send an explicit opt-in. This is challenging in cases where there's not a single developer involved, but many. Google Earth, for example, includes user-generated content in sandboxed frames, and it seems somewhat unlikely that they'll be able to ensure that all the resources typed in by all their users over the years will do the work to opt into being loadable.

Cases like Earth are, likely, outliers. Still, it seems clear that adoption of any opt-in mechanism is going to be limited (metrics). From a deployment perspective (especially with an eye towards changing default behaviors), it would be ideal if we could find an approach that provided robust-enough protection against accidental cross-process leakage without requiring an explicit opt-in.

The proposal

The goal of the existing opt-in is to block interesting data that an attacker wouldn't otherwise have access to from flowing into a process they control. It might be possible to obtain a similar result by minimizing the risk that outgoing requests will generate responses personalized to a specific user: extending COEP to support a new credentialless mode which strips credentials (cookies, client certs, etc.) by default for no-cors subresource requests.
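
Concretely, a document wanting cross-origin isolation without requiring opt-in from all of its subresources would send response headers along these lines (credentialless taking the place of the stricter require-corp value):

Cross-Origin-Embedder-Policy: credentialless
Cross-Origin-Opener-Policy: same-origin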

Read the complete Explainer & Proposed specification

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the HTML spec (topic: coep-credentialless).

+CC @mikewest @camillelamy @annevk @whatwg/cross-origin-isolation

URLPattern

Introduction

Service worker scopes currently use a very simplistic URL matching mechanism. We have heard from a number of sites that scopes could benefit from a more expressive pattern syntax. In addition, web developers often need to match URLs in order to implement routing systems.

This proposal introduces a URL matching primitive that can be used both directly in JavaScript and in web platform APIs like service workers.
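
As a rough sketch of the kind of primitive proposed (see the explainer for the exact shape):

// Match URLs with named groups instead of hand-rolled regexes.
const pattern = new URLPattern({ pathname: '/books/:id' });

pattern.test('https://example.com/books/123');  // true

const result = pattern.exec('https://example.com/books/123');
console.log(result.pathname.groups.id);         // "123"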

Read the complete Explainer.

This proposal has been reviewed by the TAG.

There is also a detailed design doc that goes into greater API and chromium implementation details.

This proposal was discussed at TPAC 2019 and a virtual call around TPAC 2020.

The group decided at the latest meeting that URLPattern should be spec'd under WICG to start. I'd like to move the explainer repo to WICG and rename it to urlpattern. Eventually the service worker bits will move out to the service worker WG repo.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Region Capture: Cropping API for Video Tracks

Summary

Pre-Summary: Status

There is a detailed spec draft, and Chrome is implementing this for an origin trial.

Problem Overview

Recall that applications may currently obtain a capture of the tab in which they run using getDisplayMedia, either with or without preferCurrentTab. Moreover, another API, getViewportMedia, will soon allow similar functionality. In either case, the application may then also wish to crop the resulting video track so as to remove some content from it (typically before sharing it remotely). We introduce a performant and robust API for cropping a self-capture video track.

Core Challenges

Layout can change asynchronously when the user scrolls, zooms or resizes the window. The application cannot robustly react to such changes without risking mis-cropping the video track on occasion. The browser therefore needs to step in and help.

Sample Use Case

Consider a combo-application consisting of two major parts: a video-conferencing application and a productivity-suite application co-existing in a single tab. Assume the video-conferencing application uses existing/upcoming APIs such as getDisplayMedia and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes, and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.

Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).

Goals and Non-Goals

Goals

  • The new API we introduce allows an application which is already in possession of a self-capture video track, to crop that track to the contours of its desired element.
  • The API allows this to be done performantly, consistently and robustly.

Non-Goals

  • This API does not introduce new ways to obtain a self-capture video track.
  • This API does not introduce mechanisms by which a captured document may control what the capturing document can see.

Solution

Solution Overview

A two-pronged solution is presented:

  • Crop-ID production: A mechanism for tagging an HTMLElement as a potential target for the cropping mechanism.
  • Cropping mechanism: A mechanism for instructing the user agent to start cropping a video track to the contours of a previously tagged HTMLElement, or to stop such cropping and revert a track to its uncropped state.

Crop-ID production

We introduce navigator.mediaDevices.produceCropId().

partial interface MediaDevices {
  Promise<DOMString> produceCropId(
      (HTMLDivElement or HTMLIFrameElement) target);
};

Given an HTMLElement, produceCropId() produces a UUID that can uniquely identify that element to our second mechanism - the cropping mechanism.
(The Promise returned by produceCropId() is only resolved when the ID is ready for use, allowing the browser time to set up prerequisites and propagate state cross-process.)

Cropping mechanism

We introduce a cropTo() method, which we expose on all video tracks derived from tab-capture.

[Exposed = Window]
interface BrowserCaptureMediaStreamTrack : FocusableMediaStreamTrack {
  Promise<undefined> cropTo(DOMString cropTarget);
};

Given a UUID, cropTo() starts cropping the video track to the contours of the referenced HTMLElement.
Given an empty string, cropTo() reverts a video track to its uncropped state.
"On-the-fly" changing of crop-targets is possible.

Code Samples

/////////////////////////////////
// Code in the capture-target: //
/////////////////////////////////

const mainContentArea = document.getElementById('mainContentArea');
const cropId = await navigator.mediaDevices.produceCropId(mainContentArea);
sendCropId(cropId);

function sendCropId(cropId) {
  // Can send the crop-ID to another document in this browsing context
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}

/////////////////////////////////////
// Code in the capturing-document: //
/////////////////////////////////////

async function startCroppedCapture(cropId) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (!track.cropTo) {  // cropTo is not supported by this browser
    handleError(stream);
    return;
  }
  await track.cropTo(cropId);
  transmitVideoRemotely(track);
}

Spec draft

Please take a look at the proposed spec. (Easily missed, so repeated.)

New history event proposal

preface: this was first introduced in whatwg/html#5562 and, at the time of writing, has a bit of positive "reaction" feedback (for a proposal, anyway 😄). Since this is the new place for proposals, I'm reopening the issue here. The original text is copy/pasted below, in hopes that 1) I'm doing this in the right spot, and 2) it gains some traction. Please let me know if I've done something wrong - I'm open to feedback

Proposal

Add an event called statechange or historychange that will fire on any change to the history stack, whether that be through the browser's back button, or window.history.pushState or other methods.

This proposed event would be similar to popstate, except that it would fire on all route changes regardless of the source, much like hashchange fires on all hash changes regardless of the source.

Current Problems

hashchange events allowed JavaScript router libraries (e.g. React Router, vue-router) to easily respond to any routing event when the application is using hash routing.

However, with the HTML5 History API, there is no equivalent event that JavaScript routers can listen to.

This means that routers have the following limitations/problems:

  • They assume that they're the only router on the page: routers require all code to call into them whenever making a URL change
  • Users cannot call window.history.pushState directly and must only use the Router's custom methods
  • Third party libraries that may want to change the URL or cause a Router to update have a difficult time since they can't call native APIs

References:

remix-run/react-router#6304

Example

window.addEventListener('historychange', (event) => {
  console.log('changed')
})

history.pushState(null, null, '/path')  // logs "changed"
location.hash = "sub" // logs "changed"

Side notes

This was created in collaboration with @joeldenning

[Proposal] Handwriting Recognition API

Introduction

Hi WICG,

I’d like to propose an API for web applications to utilize handwriting recognition services available on operating systems.

This would make it easier for developers to integrate handwriting recognition support in their applications, without having to use a third-party library or service.

Use cases

Note-taking web app. Users use a stylus to take notes, and the app converts the drawings to text in real time.
Custom input fields. Developers can create custom handwriting input areas (e.g. "Please sign here"), without using a pop-up virtual keyboard. This offers a more immersive user experience.

Objective

We want to have an API that takes in a handwriting drawing and tells us what characters were written, in real time.

Web apps can import handwriting recognition libraries via WebAssembly, but this would not allow the use of more advanced proprietary handwriting libraries (e.g. those available on the operating system). This topic of "WebAssembly vs. Web API" was previously discussed for the Shape Detection and Text-to-Speech APIs (to which the same constraints apply).

Given that handwriting recognition capability already exists in many OSes, we think web apps should have a way to use them:

  • How can we expose this capability to web apps?
  • How can we design a Web API that works well with different implementations?
  • How can we design an extensible way for web apps to fine-tune the recognizer?
  • What should we do if the operating system has no native handwriting recognition support?

Proposal

  • Add classes necessary to represent handwriting drawings in the JavaScript world.
  • Add a class so the app can instantiate and use handwriting recognizers provided by the OS or browser.
  • Add a way for apps to query which features are supported on the platform (e.g. languages).
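
To make this more concrete, here is a hypothetical sketch of what such usage might look like (all names are illustrative; see the explainer below for the actual proposal):

// Hypothetical sketch; names are illustrative.
const recognizer = await navigator.createHandwritingRecognizer({
  languages: ['en'],
});

const drawing = recognizer.startDrawing({ recognitionType: 'text' });
drawing.addStroke(stroke);            // stroke: points captured from pointer events
const predictions = await drawing.getPrediction();
console.log(predictions[0].text);     // best candidate for the written text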

Explainer

Here's the first attempt: https://github.com/wacky6/web-handwriting-recognition/blob/main/explainer.md

Feedback

Please provide all feedback below.

Forms lack basic functionality

Device Attributes API

Introduction

It is a common requirement for operating systems to provide applications with the highest degree of trust access to highly-privileged data and functions. One popular requirement in commercial systems is the ability for a highly-trusted application to access device-identification information (serial number, asset ID). This API tries to bring this ability to the Web, while restricting it to a small configurable subset of highly trusted applications.

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Share Button Type

Introduction

The Web Share API provides a means “for sharing text, URLs and images to an arbitrary destination of the user's choice”. The most basic sharing use case is sharing the current page (especially important in progressive web apps that launch with a display value of “standalone” or “fullscreen”). A declarative option that meets this basic use case would enable authors to provide this functionality without requiring knowledge of JavaScript.

A value of “share” for the type attribute of the button element would allow authors to provide an interface element for sharing the current web page.

The content model for the button element allows for its type attribute to be extended in this way while allowing backwards-compatibility with non-supporting browsers. Support for a value of "share" can be tested in the same way that supported values for input types can.
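
For example, feature detection could work like the following sketch, mirroring how unsupported input types fall back:

// Detect support for <button type="share">.
// An unsupported type value falls back to the default, "submit".
const btn = document.createElement('button');
btn.type = 'share';
const shareButtonSupported = (btn.type === 'share');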

Here's a polyfill.

Read the complete Explainer.

Feedback

I'm currently gathering feedback on existing usage of the Web Share API to verify that sharing the current page URL is a common use case.

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Support currentColor in SVG displayed in HTML5 img tag

Introduction

Modern web sites often embed logotypes and other glyphs as vector graphics, whether as icon fonts, inline SVG, or img elements linking to SVG files.
One problem is that modern web sites want to support both light and dark modes, which means black areas of SVG images become hard to see in dark mode. Developers currently work around this by applying the filter: invert CSS property under dark theme conditions, but this approach has a number of drawbacks, such as the CPU cost of rasterization and incorrect colors.

Use Cases (Recommended)

One-color SVG website logotypes.

Goals (Optional)

Add an attribute that makes the black color (#000) equal to the currentColor value. An img logo with this attribute would then display its black areas in dark mode as white, or as whatever currentColor is set to.

Proposed Solution

Add an attribute on the img tag, color=glyph, which will substitute the color #000 or black with the currentColor value, as if the image data were embedded in an SVG tag whose fill or border color had been set to currentColor.

Examples (Recommended)

...
<img src="/images/logo.svg" colorscheme="glyph">
...

Alternative solution

An alternative solution would be for browsers to support the SVG spec's extended color values like currentColor, since the SVG format is used on more platforms. This could allow such color-scheme-aware image resources to work well with iOS and Android platforms' dynamic color features, such as iOS's .tint and Android's vector drawables.

Privacy & Security concerns

This functionality could make it possible for scripts to track whether a user has toggled dark mode.

Privacy Factor for Form Fields

In our line of work, we provide online tech support, often using popular screen sharing software like Google Hangouts, Microsoft Teams, Zoom, and many others. The request can be from a user stuck on a form that includes sensitive information. Consider the use case of a disabled person stuck on an inaccessible ecommerce order form where the user and vendor are both quite interested in the successful placement of an order, perhaps the final form in the process.

Such users often have sensitive information such as a credit card number as a value displayed on the form. Sharing that screen as-is presents a security liability that neither the user nor we as support people wish to accept, even when the problem to address has nothing to do with such private information.

We are open to any solution to this problem, but lacking the W3C braintrust, we have come up with a method to assign and use privacy factors to hide such private field values. The web designer assigns default integer values from 1 (low vulnerability) to 9 (high vulnerability) to each field. A first name may get a 1 for example, but a credit card number or social security number may get a 9. Atop such a form is a Private/Public toggle button. Clicking it temporarily displays asterisks in place of values above a 5 threshold. The display thus becomes suitable for screen sharing.

It is also suitable for users who wish to hide fields from passing eyes, perhaps filling in medical forms.

We have posted a demo of the above at
http://privacy.bizwaredev.com
with optional features to change the threshold from the default 5 for users who want more or less security overall. It also allows users to raise the security factor of a particular field if it has a sensitive value in their case, or to reduce it for their circumstances, for example, SS#: None.
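
To illustrate the idea (the attribute name and markup are hypothetical, not a proposed standard syntax), a minimal sketch might look like:

// Hypothetical sketch: mask fields whose privacy factor exceeds the threshold.
// Markup (illustrative): <input name="cc" data-privacy-factor="9">
function setPrivateMode(enabled, threshold = 5) {
  for (const field of document.querySelectorAll('input[data-privacy-factor]')) {
    const factor = Number(field.dataset.privacyFactor);
    // 'password' masks the value (with bullets rather than asterisks).
    field.type = (enabled && factor > threshold) ? 'password' : 'text';
  }
}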

There are many ways to achieve the above goal. If the W3C doesn't produce a standard, those many ways will each have their followers. Better would be if all web designers, remote support providers, and screen sharing app developers had a standard on which they could count.

Computing multi-line and formatted text layout for non-DOM scenarios

Hello WICG,

Microsoft has put together an explainer for a method to leverage the UA's ability to compute line-breaking of formatted text runs in scenarios where the DOM is not directly usable or available. The API takes advantage of the UA's layout engine to address many of the subtle complexities of text layout that make implementing line-breaking of formatted text a complex task in JavaScript. For example, properly handling international text, bidi, text shaping, etc. (see the explainer for more detail).

We would like to have the WICG community join us in reviewing this proposal, and would like to move it soon into an incubation as it is generating interest from web developers and some partners that originally suggested the idea.

The proposal currently is targeted for Canvas text layout scenarios, but we anticipate generalizing some of the concepts to harmonize and be potentially shared with the Houdini Layout API and potentially other platform areas in which this capability could be useful in the future.

HapticsDevice API

Introduction

In today's device ecosystem, there are several types of haptic-enabled surfaces:

  • In-built haptic engines (e.g. mobile devices)
  • Laptop/external touchpads
  • Game/XR controllers
  • Peripheral hardware such as Surface Dial

While solutions such as navigator.vibrate() and GamepadHapticActuator aim to expose a limited set of these haptic capabilities to the web, web developers today do not have the ability to harness the majority of these surfaces as they do on native platforms. This prevents them from building tactile experiences that physically engage users, help them understand when critical activities have succeeded or failed, or immerse them by simulating virtual textures/actions.

Goals

  1. Provide web developers with access to more haptic-enabled devices and features during user interaction
  2. Give developers a mechanism to leverage both pre-defined and custom haptic waveforms on hardware/platforms that support them
  3. Define a flexible enough API surface to enable support for extensions in the future (see Potential Extensions)

Featured Use Case

A new generation of gaming controllers is built on buffered haptics and Linear Resonant Actuators (LRAs). Notable devices are the Nintendo Switch Joy-Con, PlayStation's DualSense, and the HTC Vive wands. Using the existing haptics APIs for Gamepad, there is no way to fully take advantage of the haptic capabilities of these devices. This new Haptics API would provide an extensible interface to allow developers to create rich XR and gaming experiences on the web.

Read the complete Explainer

Feedback

Please provide all feedback below.

I welcome feedback in this thread, but encourage you to file bugs against HapticsDevice API Explainer.

Client-side A/B testing

Introduction

Client-side A/B testing refers to the method of applying experimentation-related changes to a web application in the browser — sometimes without integrating code changes into the actual application source code. This is popular in the industry, as the method usually helps cut down the resources required for experimentation and scale A/B testing involving external teams. This is a proposal to explore ways of achieving the same outcome while avoiding or minimizing the performance penalty associated with the techniques used today.

Please read the complete Explainer. The associated prototype and code can be found here.

You can view the W3PerfWG March 17 2022 Presentation and Notes here.

Longer version

A/B testing on the web involves creating variations of a web application that can be selected for a sample of traffic, in order to verify a hypothesis. With infinitely scalable engineering resources, every experiment could happen right inside the application engineering team and application source code, and could modify the application structurally to suit its needs.

However, in reality, due to engineering resource constraints, service provider boundaries, and application architecture choices, modifying the application for every experiment can be difficult or cost-prohibitive.

Some use-cases and arguments that make Client-side A/B testing approach attractive are:

  1. Marketing or Research teams, external to the web application engineering team, want to conduct experimentation.
  2. Product management personnel want to conduct experimentation with minimal engineering bandwidth spent.
  3. Potential to reduce technical debt incurred through integrating experimental changes.
  4. Less engineering bandwidth spent on experimentation translates to a larger number of experiments, more hypotheses being tested.

In order to meet these use cases, combined with the flexibility available in web application architectures, the industry has taken a de facto approach to A/B testing: applying cosmetic changes to a web application. What do we mean by "cosmetic"? It means the experimentation-related changes aren't baked into the original application's source code. Instead, changes are applied in the browser, after the application source and binaries have been fetched from its server.

This type of testing enables the above use-cases to function at scale, and usually employs JavaScript as the de facto tool to perform the modifications. And therein lie the tradeoffs: employing JavaScript in this manner, as it is done today, comes with a few drawbacks:

  1. Until the experimentation script has had a chance to make the necessary modifications to the web application, the experience isn’t final or presentable to the user. The user might see the non-adjusted variant and start interacting with it, or could be the victim of a jarring experience of the changes being made — most of them resulting in layout shifts.
  2. To counter the layout shifts introduced by these cosmetic modifications, experimentation providers typically block the page from being rendered using styling, which is removed once the experiment-related changes have been applied. This improves layout stability; however, it ends up delaying rendering, resulting in a performance degradation.
  3. If a network request for the experimentation script and its changes is inserted before the body of the application, it introduces render blocking that delays document parsing and the loading of critical resources — incurring significant delays in performance metrics and thus creating missed opportunities in business and user experience.
  4. This also introduces a potentially unnecessary dependency on Javascript, even for static pages. It wastes computational resources on each client, even in cases where the test outcome can be pre-computed once and cached.

This is a proposal to collaborate on and create better, performant means to conduct client-side A/B testing.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Resource bundles

Introduction

A web site is composed of multiple resources, such as HTML, CSS, JavaScript and images. When a web application is loaded, the web browser first fetches the resources referenced by the page, and ultimately renders the web page.

The traditional way of building and deploying web sites is to use separate files for code organization purposes, and allow the browser to fetch them separately.

This model is well-supported by browsers and web specifications, but does not perform well in real-world applications, which frequently organize their code into hundreds or even thousands of files (even small websites quickly accumulate dozens of files).

In an attempt to address these performance issues without losing the ability to organize their code reasonably, developers have historically built tools that group source files together in various ad-hoc ways:

  • CSS concatenation.
  • Image spriting.
  • Bundling multiple JavaScript files together. Developers have used script concatenators for decades. More recently, developers have begun to use semantics-preserving module bundlers that combine many standard JavaScript modules into a single script or module.
  • In recent years, developers have begun to bundle resources such as images and styles together with their JavaScript. In the case of CSS, this is accomplished by imperatively inserting styles into the DOM. In the case of images, it is accomplished by Base64-encoding the image, and then decoding the images at runtime using JavaScript and imperatively inserting them into the DOM.

Developers have also found ways to bundle newer file types (such as WebAssembly) with their JavaScript by base64 encoding them and including them in the combined JavaScript files that are created by build tools.
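
For instance, the base64 pattern described above boils down to something like this sketch of the common ad-hoc technique (not a proposed API):

// An image shipped inside a JavaScript bundle as a base64 string
// and imperatively inserted into the DOM at runtime.
const PIXEL = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'; // 1x1 GIF
const img = new Image();
img.src = 'data:image/gif;base64,' + PIXEL;
document.body.append(img);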

Modern tools that automate these ad-hoc strategies are known as "bundlers". Some popular bundlers include webpack, rollup, Parcel and esbuild.

Each bundler ecosystem is effectively a walled garden. Their bundling strategies are implementation details that are non-standard and not interoperable. In other words, there is no way for an application bundle that was created using webpack to access an image inside of an application bundle that was created using Parcel.

This proposal aims to create a first-class bundling API for the web that would satisfy the use-cases that motivated today's bundler ecosystem, while allowing resources served as part of a bundle to behave like individual resources once they are used in a page.

Read the complete Explainer

Relationship to Web Bundles

I've been working closely with Jeffrey Yasskin, Yoav Weiss and others who are involved with the WICG/webpackage repository. The idea is that the two repositories coexist with different focuses/scopes, but work towards a unified solution where there is overlap. We plan to keep working together (both in WICG and in the IETF WPACK WG) to iron out any open questions.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Add preferCurrentTab to getDisplayMedia

Introduction

Sites often wish to self-capture. For example, a slides deck application might wish to let the user stream the presentation to a virtual conference.

Calling getDisplayMedia offers the user a wide selection of possible capture-sources. What if the application really just wants the current tab? It could be hard for the user to hunt down the specific tab out of all the tabs they have open.

Ideally, the application would be able to present a confirmation-only dialog to the user: share the current tab, yes/no? Standardization efforts for this feature as getViewportMedia are underway.

However, getViewportMedia will be gated by (1) cross-origin isolation and (2) an opt-in header. That will limit adoption, at least initially.

We therefore extend getDisplayMedia in a way that allows the application to inform the browser that it prefers the current tab. getDisplayMedia currently accepts a single parameter of type MediaStreamConstraints (a dictionary). We extend that dictionary with a new member called preferCurrentTab. This new member is a boolean defaulting to false. When set to true, the browser presents the current tab as the most prominent option.
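
Usage would look like:

// Hint that the current tab should be the most prominent capture option.
async function captureThisTab() {
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    preferCurrentTab: true,
  });
  return stream;
}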

This is an imperfect solution; a compromise between two needs:

  • Applications need a way to signal preference for the current tab. Possibly even exclusive need of the current tab.
  • getViewportMedia is a long way off, and the security requirements gating it will need a long time to gain widespread adoption.

Feedback

Please provide all feedback below.

Explainer: preferCurrentTab

Allow browsers to report the CPU architecture

Introduction

It's quite crazy that in 2021 I have to make a proposal about such an ancient problem, but here we are, and the problem is growing at an increasingly fast pace.
There is no reliable way to distinguish between 32-bit and 64-bit x86 CPUs; more importantly, there is no reliable way to even distinguish between ARM and x86 CPUs...
The web browser makers have procrastinated on this need for decades, since the amalgam of Android/iOS == ARM and PC/macOS == x86 didn't work too badly (except Android x86 is a thing)... until now.

Use Cases

By far the main use case is to serve the right binary to download for a given user. ARM's PC market share is growing at an increasingly fast pace. All new MacBooks use the M1 ARM processor, all (?) Chromebooks have used ARM for a long time (and hold ~10%! of laptop market share), people have built ARM Linuxes for decades, and Windows-on-ARM "Always Connected" laptops will follow the path that Apple has paved. ARM is by design more energy efficient than x86, which makes it more suitable for laptops, AND it is becoming as fast as the latest x86 CPUs.

The issue with this paradigm shift is that compiled languages that have no bytecode/VM generate arch-dependent binaries (C, C++, Rust, Go, others).
Asking users to choose their processor version adds friction and, more importantly, is just not a reasonable solution, since a huge chunk of users will be unaware of what x86 or ARM even means...
Sure, FatELF is a thing, but it is 1) fat, a.k.a. not ecological, and 2) doesn't work on Windows.

Solution: Write Once, Run Anywhere

There seems to be a reliable (?) API to query the platform:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/runtime/PlatformArch
and it is supported by ALL browsers! Except it was placed under the WebExtensions umbrella...
The proposal is to make this API accessible to regular websites too.
Optional: at the same time, this might be the occasion to extend support to other archs (MIPS, RISC-V, etc.). (I hope browsers can internally leverage an API to automatically get the CPU arch for any arch; it should be doable.)
Optional: expose hardware support levels with things like HWCAPS levels. The most useful hardware support to detect would be SIMD support (and vector length): https://v8.dev/features/simd. Apparently it's possible to detect, but I'm not sure it reports the max supported vector length, nor how accurate it is (ARM NEON, SVE, etc.). Moreover, one can hope that one day, at least for strong ecological reasons, JS will support SIMD natively like every other programming language in existence; when that day comes, such feature detection will be necessary.
Optional: distinguish between ARM 32 and 64 bit.
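
A sketch of how a download page might use such an API, assuming a hypothetical navigator-level analogue of the WebExtensions runtime.getPlatformInfo():

// Hypothetical API; mirrors WebExtensions runtime.getPlatformInfo().
const { arch } = await navigator.getPlatformInfo(); // e.g. 'x86-64', 'arm64'
const download = arch.startsWith('arm')
  ? '/downloads/myapp-arm64.tar.gz'   // illustrative paths
  : '/downloads/myapp-x86_64.tar.gz';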

Privacy & Security Considerations

"No considerable privacy or security concerns are expected, but we welcome community feedback."
Native programs have had access to such information since the beginning of time, and it has not been a cause of significant security flaws.
As for browser fingerprinting, that is not a real concern; the discriminative power of knowing the arch is extremely low, especially compared to the regular low-hanging fruit used in fingerprinting. The same applies to the optional extension of exposing vector support, since any CPU from the last decade supports vector instructions, and vector lengths are increasingly reaching a maximal cap (512), leading to a progressive normalization of those values.

A new history API, "app history"

Introduction

The web's existing history API is problematic for a number of reasons, which makes it hard to use for web applications. This proposal introduces a new window.appHistory API, which is more directly usable by web application developers to address the use cases they have for history introspection, mutation, and observation/interception.

The proposed API layers on top of the existing API and specification infrastructure, with well-defined interaction points. The main differences are that it is scoped to the current origin and frame, and it is designed to be pleasant to use instead of being a historical accident with many sharp edges. Some notable features:

  • Allow easy conversion of cross-document navigations into single-page app same-document navigations, without fragile hacks like a global click handler.
  • Provide a uniform way to signal single-page app navigations, including their duration.
  • Provide a reliable system to tie application and UI state to history entries. (Better than the existing history.state.)
  • Provide reliable events for notifying the application about navigations through the list of history entries, which it can use to synchronize application or UI state. (Better than the existing popstate and hashchange events.)
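
For example, converting cross-document navigations into same-document ones might look like this sketch (event and method names follow the explainer's draft and may change):

// Sketch: intercept navigations and handle them in-page.
appHistory.addEventListener('navigate', (e) => {
  if (e.canRespond) {
    // renderPage() is an application-defined routine returning a promise.
    e.respondWith(renderPage(e.destination.url));
  }
});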

Read the complete Explainer.

Feedback

We'd love to hear from the community of developers as to whether this sounds like a worthwhile direction to explore, and in particular whether we should migrate it to the WICG for further discussion in its own dedicated repository.

For any general feedback or questions on the idea, especially any support for developing the idea, please use this thread. For specific technical questions or suggestions on the API, file issues on WICG/app-history where we can work through them in more detail. (Hopefully such questions can move to a dedicated WICG repository once we get general support.)

Compute Pressure API

Introduction

Web applications are today being used to provide a wide range of solutions, including some use-cases that push the capabilities of the device they are running on, e.g. video gaming and video-conferencing.

It is a testament to the web's power as a platform that one application will run on devices ranging from low-powered devices all the way to the most powerful machines.

However, a one-size-fits-all approach does not work well for computationally intensive applications; a missing capability of the web is allowing developers to tailor the experience to the available computational capacity.

Compute-intensive web apps that want to provide a good experience across their user base rely on browser extensions to access system metrics.

The Compute Pressure API will let developers write web applications that anticipate and react to system load changes to provide a great experience for users regardless of their device's compute prowess.
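
An observer-style sketch of how an app might consume such signals (names are illustrative; see the explainer for the proposed API):

// Hypothetical observer sketch: adapt quality when the CPU is under pressure.
const observer = new PressureObserver((records) => {
  const latest = records[records.length - 1];
  if (latest.state === 'critical') {
    reduceVideoQuality(); // application-defined
  }
});
observer.observe('cpu');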

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Scripting Policy.

Introduction

XSS is bad. CSP's syntax is obtuse, and it's trying to do too many things. What if we could just target XSS?

Read the complete Explainer. Also the spec.

Feedback

Please provide all feedback below.

Visual Contrast and Readability methods and guidelines

Introduction

Hello, I'd like to open a discussion regarding the development of standards related to readability for internet content.

Executive Summary

WCAG 3 is years in the future, and will probably embody a few minimum standards for readability. But there is room for a greater depth of non-normative recommendations and best practices to improve readability. These can, and arguably should, be a separate, independent set of guidelines: eventually a superset of guidelines beyond the normative minimums adopted by any given standard.

Visual Readability Contrast

Nearly three years ago, I began thread #695 on the WCAG GitHub, where I pointed out several significant problems with the WCAG_2 contrast guidelines, including 1.4.3 and 1.4.11, and especially the WCAG_2 maths and methods for estimating contrast. As that progressed, I took a proactive approach to solving these issues by leading the research and development of solutions.

The International Reading Crisis

The internet destroyed the print industry. Major cities were once filled with newsstands. We had a massive newsstand here in Hollywood near my home that was a block long and densely packed with magazines and newspapers from around the world. But as the internet became popular, this newsstand shrank over the years, eventually closing completely some years ago. The same is true of the several area bookstores that have long since shut their doors.

Some estimates indicate that reading in general (internet and print) is down 40% over the last two decades. Certainly the growth of the internet is not the only factor, but there is a related concern: with the shift to mobile devices, it has become difficult to read for extended periods. Contrast is often too low, making fatigue high. There is a great deal of misinformation and misunderstanding regarding best practices for readability. And with the rapid advance of technology, the WCAG_2 contrast guidelines became a part of this misunderstanding.

Let There (Not) be Light: Genesis of an Error

The WCAG_2 contrast guidelines were first developed circa 2005, before the birth of the iPhone and the subsequent mobile media revolution. In the early 2000s, web content was typically set with core web fonts, with black text on a light grey or white background, often HTML with no CSS, for viewing on (predominantly) CRT type displays.

Today, of course, the landscape is completely different. CRTs are museum pieces, and web content is viewed on mobile devices under much wider ambient conditions. The content itself is CSS-styled using readily available font services like Google Fonts, which offer any number of ultra-thin styles, too often combined with unreadably low contrast colors.

To be sure, there were known problems with the WCAG_2 contrast guidelines at the time of adoption, but WCAG_2 was more focused on accessibility needs such as ARIA and related technologies. Modern technologies are not so forgiving. Designers are unhappy being forced to use contrast math results that are known to be wrong, while accessibility experts are in a difficult spot being forced to explain why these visibly wrong results have to be adhered to.

And the losers in this tug-of-war? All sighted users. The wrong math and inappropriate guidelines, with the resultant misunderstandings surrounding color choices and contrast for readability, have created a decidedly unreadable web experience.

The Perceptual Solution

As a result of some years of research (which is continuing), I authored the contrast section of Silver/WCAG_3 and also created the APCA (Advanced Perceptual Contrast Algorithm). APCA is a perceptually accurate method of predicting contrast for readability, and it is demonstrating superior performance for readability compared to the WCAG_2 contrast guidelines. Here is a side-by-side comparison which illustrates the problem of false passes that WCAG_2 1.4.3 can create:

A Constant Contrast Comparison

Each column is set to a specific contrast. In the top half of the table, each row has the same text color; in the bottom half, each row shares the same background color. Pink areas indicate out of range.

[Image: ColumnCompareAll400, a comparison table of WCAG 2 and APCA contrast methods]

All sample fonts are at 400 weight. For the APCA columns, sizes are changed per the APCA guidelines. To demonstrate extended range, cells that don't quite hit the target have the text enlarged per the APCA guideline, and have that cell's LC value listed in the pink area with an arrow.

While WCAG_2 degenerates to unreadably low contrast as colors get darker, APCA maintains readability across the visible range. You may notice a slight increase in APCA contrast for darker colors; this addresses display flare, misadjusted black levels in monitors, and certain common mobile-use environments.

The Need for a New Standard

While the APCA was originally, and is still, aimed toward WCAG 3, that standard is still some years away from becoming the recommendation. Due to the charter for WCAG_2, changes must be backwards compatible (though there has been some discussion of relaxing that for a possible future version). It is difficult to make something that is perceptually correct, like APCA, backwards compatible with an old method that is not perceptually accurate, like WCAG_2. This is being partly addressed with "Bridge PCA", an interim APCA-based method and guideline that is compatible with WCAG_2.

While I have been working to create standards and guidance for WCAG_3, and in some cases WCAG_2.x, APCA is also being developed for other scopes and other guideline uses. The problem of readability is significant, and APCA and other related research needs to move forward to help designers, developers, accessibility advocates, etc. make better design choices that meet the needs of all sighted users.

In other words, as research progressed over the last three years, and in conversations with designers and developers, early adopters, and beta testers, it has become abundantly clear that the scope is well beyond accessibility. 100% of sighted users have visual needs in terms of readability and distinguishability. And those needs are met with a comprehensive set of guidelines, beyond just contrast. For instance, visual fatigue is as yet unaddressed, and it is a significant part of why reading for extended periods on devices is challenging for most people.

APCA is not just new contrast math; it is a complete set of guidance for content presentation, weighted for readability, and based in modern vision science, empirical studies, consideration of current and emerging technologies, and most importantly the needs of users, and by users I include designers and developers as well.

A Deeper Dive

Rather than list many links here, this linktree collects the key links for a deeper dive into the work that has been done thus far.

And for a more complete catalog of APCA related resources, please see:

https://git.myndex.com

Discussion and questions are welcome at the main SAPC-APCA repo in the Discussions tab: https://github.com/Myndex/SAPC-APCA/discussions

Beta Packages

APCA is presently in public beta; beta tools and sample code are available at the GitHub repos and on npm. The base version is:

 npm i apca-w3 

And the WCAG 2 compatible version, with alternate guidelines, is Bridge PCA:

npm i bridge-pca

Next, my question to you is how best to move forward here? And please let me know of any thoughts or questions, I am at your disposal.

Thank you,

Andy

Andrew Somers
W3 / AGWG Invited Expert

𝐓𝐇𝐄 𝐑𝐄𝐕𝐎𝐋𝐔𝐓𝐈𝐎𝐍 𝐖𝐈𝐋𝐋 𝐁𝐄 𝐑𝐄𝐀𝐃𝐀𝐁𝐋𝐄

PARAKEET for privacy-preserving-ads

Introduction

[From the intro to the explainer]

At Microsoft, we are committed to fostering a healthy web ecosystem where everyone can thrive – consumers, publishers, advertisers, and platforms alike. Protecting user privacy is foundational to that commitment and is built into Microsoft Edge with features like Tracking Prevention, Microsoft Defender SmartScreen, and InPrivate browsing. We also support an ad-funded web because we don't want to see a day where all quality content has moved behind paywalls, accessible to only those with the financial means.

Through this proposal, we believe we can substantially improve end-user privacy while retaining the ability for sites to sustain their businesses through ad funding. We propose a new set of APIs that leverage a browser-trusted service to develop a sufficient understanding of user interests and therefore enable effective ad targeting without allowing ad networks to passively track users across the web. This approach removes the need for cross-site identity tracking, enables new privacy enabling methods to support monetization, and maintains ad auction functionality within existing ad tech ecosystems.

Today, we're pleased to further this discussion among the web community with the contribution of one possible approach. We will continue to collaborate with the community to iterate on this and other proposals with the goal of landing on a set of interoperable standards that work in multiple browsers. Together with the web community, we have an opportunity to share ideas, learn from each other, and create a better future for the web.

Overview

This proposal defines an internet-hosted service trusted by the browser that assists in user interest inference and anonymization in the ad ecosystem. This service replaces the current model where ad networks store cross-site identifiers for interest-based ad targeting.

The service will use Private and Anonymized Requests for Ads that Keep Efficacy and Enhance Transparency ("PARAKEET").

PARAKEET is a service for anonymizing ad requests and supporting user interest inference. It has many advantages, including strong privacy contracts based on differential privacy and ad request anonymization, user transparency and control over ad interest membership and granularity, and monetization that leverages both site context and ad interest information in a unified request.

The service aims to offer significant privacy improvements with fewer changes (relative to other proposals in this space) required on the part of website owners, which will help accelerate adoption in the ad ecosystem.

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Web Instruction Set (WISE)

Introduction

What if we could compile Web content, XML and HTML5 documents, into Web instructions for faster transmission, decoding, and page loads?

What if these Web instructions could support functionalities from the DOM and also other APIs such as the CSSOM, Custom Elements, CSS Animations, WebAssembly, Canvas, and WebGPU?

Example

Given the following XML document:

 <?xml version="1.0" encoding="UTF-8"?>
 <DocumentElement param="value">
     <FirstElement>
         &#xb6; Some Text
     </FirstElement>
     <?some_pi some_attr="some_value"?>
     <SecondElement param2="something">
         Pre-Text <Inline>Inlined text</Inline> Post-text.
     </SecondElement>
 </DocumentElement>

when it is passed through a SAX parser, it will generate a sequence of events resembling the following:

  1. XML Element start, named DocumentElement, with an attribute param equal to value
  2. XML Element start, named FirstElement
  3. XML Text node, with data equal to &#xb6; Some Text
  4. XML Element end, named FirstElement
  5. Processing Instruction event, with the target some_pi and data some_attr equal to some_value
  6. XML Element start, named SecondElement, with an attribute param2 equal to something
  7. XML Text node, with data equal to Pre-Text
  8. XML Element start, named Inline
  9. XML Text node, with data equal to Inlined text
  10. XML Element end, named Inline
  11. XML Text node, with data equal to Post-text
  12. XML Element end, named SecondElement
  13. XML Element end, named DocumentElement

Another sequence could be based on the DOM JS API, resembling the following:

  1. var el1 = document.createElement('DocumentElement');
  2. el1.setAttribute('param', 'value');
  3. var el2 = document.createElement('FirstElement');
  4. el2.textContent = "&#xb6; Some Text";
  5. var el3 = document.createProcessingInstruction('some_pi', 'some_attr="some_value"');
  6. var el4 = document.createElement('SecondElement');
  7. el4.setAttribute('param2', 'something');
  8. var txt1 = document.createTextNode('Pre-Text ');
  9. var el5 = document.createElement('Inline');
  10. el5.textContent = 'Inlined text';
  11. var txt2 = document.createTextNode(' Post-text.');
  12. el4.appendChild(txt1);
  13. el4.appendChild(el5);
  14. el4.appendChild(txt2);
  15. el1.appendChild(el2);
  16. el1.appendChild(el3);
  17. el1.appendChild(el4);
  18. document.append(el1);

A Stack-based Virtual Machine

Considering a stack-based virtual machine with support for one or more local variables and a string table, and given an appendChild2() extension method on Element which returns the appended-to node instead of the appended node (sketched at the end of this section), a sequence of instructions might resemble:

  1. var text = ['DocumentElement', 'param', 'value', 'FirstElement', '&#xb6; Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
  2. var local = [null];
  3. stack.push(document.createElement(text[0]));
  4. stack.top.setAttribute(text[1], text[2]);
  5. stack.push(document.createElement(text[3]));
  6. stack.top.textContent = text[4];
  7. stack.push(document.createProcessingInstruction(text[5], text[6]));
  8. stack.push(document.createElement(text[7]));
  9. stack.top.setAttribute(text[8], text[9]);
  10. stack.push(document.createElement(text[10]));
  11. stack.top.textContent = text[11];
  12. local[0] = stack.pop();
  13. stack.top.appendChild(document.createTextNode(text[12]));
  14. stack.top.appendChild(local[0]);
  15. stack.top.appendChild(document.createTextNode(text[13]));
  16. stack.reverse();
  17. stack.push(stack.pop().appendChild2(stack.pop()));
  18. stack.push(stack.pop().appendChild2(stack.pop()));
  19. stack.push(stack.pop().appendChild2(stack.pop()));
  20. document.append(stack.pop());

Which, towards a binary serialization of virtual machine instructions, might resemble:

  1. SPDCE(0)
  2. STSA(1, 2)
  3. SPDCE(3)
  4. STTC(4)
  5. SPDCPI(5, 6)
  6. SPDCE(7)
  7. STSA(8, 9)
  8. SPDCE(10)
  9. STTC(11)
  10. SETLP(0)
  11. STACDCTN(12)
  12. STACL(0)
  13. STACDCTN(13)
  14. SREV()
  15. SPSPAC2SP()
  16. SPSPAC2SP()
  17. SPSPAC2SP()
  18. DASP()

Or, perhaps, might resemble:

  1. LDTEXT.0
  2. CALL DOCUMENT_CREATEELEMENT
  3. DUP
  4. LDTEXT.1
  5. LDTEXT.2
  6. CALL ELEMENT_SETATTRIBUTE
  7. LDTEXT.3
  8. CALL DOCUMENT_CREATEELEMENT
  9. DUP
  10. LDTEXT.4
  11. CALL ELEMENT_SETTEXTCONTENT
  12. LDTEXT.5
  13. LDTEXT.6
  14. CALL DOCUMENT_CREATEPROCESSINGINSTRUCTION
  15. LDTEXT.7
  16. CALL DOCUMENT_CREATEELEMENT
  17. DUP
  18. LDTEXT.8
  19. LDTEXT.9
  20. CALL ELEMENT_SETATTRIBUTE
  21. ...

A Register-based Virtual Machine

Next, considering a register-based virtual machine with a list of registers, r, a sequence of instructions might resemble:

  1. var text = ['DocumentElement', 'param', 'value', 'FirstElement', '&#xb6; Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
  2. r[0] = document.createElement(text[0]);
  3. r[0].setAttribute(text[1], text[2]);
  4. r[1] = document.createElement(text[3]);
  5. r[1].textContent = text[4];
  6. r[2] = document.createProcessingInstruction(text[5], text[6]);
  7. r[3] = document.createElement(text[7]);
  8. r[3].setAttribute(text[8], text[9]);
  9. r[4] = document.createElement(text[10]);
  10. r[4].textContent = text[11];
  11. r[5] = document.createTextNode(text[12]);
  12. r[6] = document.createTextNode(text[13]);
  13. r[3].appendChild(r[5]);
  14. r[3].appendChild(r[4]);
  15. r[3].appendChild(r[6]);
  16. r[0].appendChild(r[1]);
  17. r[0].appendChild(r[2]);
  18. r[0].appendChild(r[3]);
  19. document.append(r[0]);

Which, towards a binary serialization of virtual machine instructions, might resemble:

  1. DCE(0, 0)
  2. SA(0, 1, 2)
  3. DCE(1, 3)
  4. TC(1, 4)
  5. DCPI(2, 5, 6)
  6. DCE(3, 7)
  7. SA(3, 8, 9)
  8. DCE(4, 10)
  9. TC(4, 11)
  10. DCTN(5, 12)
  11. DCTN(6, 13)
  12. AC(3, 5)
  13. AC(3, 4)
  14. AC(3, 6)
  15. AC(0, 1)
  16. AC(0, 2)
  17. AC(0, 3)
  18. DA(0)

Considered Uses

Web instructions would have multiple uses, including, but not limited to:

  1. Static Web content could be compiled into instructions for more efficient storage, transmission, and reconstruction.
  2. Dynamic Web content could be streamed using these instructions. Streams of Web instructions needn't conclude upon the loading and presentation of initial Web content. Streams could continue, providing dynamic and unfolding instructions including in response to user-input events and navigation.
  3. Web synchronization and cobrowsing scenarios.
  4. Tracks of Web instructions could enable other interactive hypervideo scenarios.

Optimizations

Potential optimizations include that well-known element names and attribute names, e.g., those of HTML5, could have reserved indices in the text array and would not need to be stored or transmitted.
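
For a sketch of how such reserved indices might work (the split table layout and the 1024 cutoff here are purely illustrative assumptions, not part of the proposal):

// Hypothetical split string table: indices below RESERVED_LIMIT name
// well-known strings (e.g., HTML5 element and attribute names) that are
// never transmitted; document-specific strings are offset above them.
const RESERVED = ['html', 'head', 'body', 'div', 'span', 'p' /* ... */];
const RESERVED_LIMIT = 1024;

function lookupText(documentStrings, index) {
    // Resolve an instruction's text operand against both tables.
    return index < RESERVED_LIMIT
        ? RESERVED[index]
        : documentStrings[index - RESERVED_LIMIT];
}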

Discussion

Any thoughts on these ideas? Is there any interest in incubating a Web Instruction Set?

Fenced frames

Introduction

In a web that has its cookies and storage partitioned by top-frame site, there are occasions (such as Interest group based advertising or Conversion Lift Measurements) when it would be useful to display content from different partitions in the same page. This can only be allowed if the documents that contain data from different partitions are isolated from each other such that they're visually composed on the page, but unable to communicate with each other. Iframes cannot be used for this purpose since they have several communication channels with their embedding frame (e.g., postMessage, URLs, size attribute, name attribute, etc.). We propose fenced frames, a new element to embed documents on a page, that explicitly prevents communication between the embedder and the frame.

Overview

The fenced frame enforces a boundary between the embedding page and the embedded document such that the user's identity/information visible to the two sites cannot be joined together and exfiltrated.

The different use cases and their privacy model are discussed as the different fenced frame modes here.

Fenced frames are embedded contexts that have the following characteristics:

  • They’re not allowed to communicate with the embedder and vice-versa, except for certain information such as limited size information.
  • They may have access to browser-managed, limited unpartitioned user data, for example, turtledove interest group.

The idea is that the fenced frame should not have access to both of the following pieces of information and be able to exfiltrate a join on those:

  • User information on the embedding site
    • Accessible via communication channels
  • Information from other top-site partitions
    • Accessible via an API (e.g., Turtledove) or via access to unpartitioned storage

A primary use case for a fenced frame is to have read-only access to some other partition’s storage, for example, in Turtledove, it is the interest-based ad to be loaded which was added while visiting another site. The URL of the ad is sufficient to give away the interest group of the user to the embedding site. Therefore the URL for the ad creative is an opaque url in this mode of fenced frame (details here) — which can be used for rendering, but cannot be inspected directly.

Fenced frame API

The proposed fenced frame API is to have a new element type and treat it as a top-level browsing context.
In this approach, a fenced frame behaves as a top-level browsing context that is embedded in another page. This is aligned with the model that a fenced frame is similar to a “tab” since it has minimal communication with the embedding context and is the root of its frame tree and all the frames within the tree can communicate normally with each other.
Having said that, since fenced frames are embedded frames, they also behave like iframes in many ways. For example:

  • Browser extensions will access a fenced frame as an iframe, e.g., for ad blocking.
  • Browser features like accessibility, developer tools etc. will access a fenced frame like an iframe.
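
Putting this together, embedding might look roughly like the following sketch (the element name follows the explainer; the opaque urn:uuid URL is a placeholder for a value produced by an API such as Turtledove):

<!-- Hypothetical markup: the browser resolves the opaque URL internally,
     so the embedder never learns which interest-based ad was rendered. -->
<fencedframe src="urn:uuid:585a011f-2ead-4f6f-bf82-e2e53e99d0c6"></fencedframe>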

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the explainer.

Anonymous iframe

Introduction

Deploying COEP is difficult for some developers, because of third party iframes. Here is the typical scenario:

  1. End users need performant websites.
  2. Some developers get performant websites by using multithreading/SharedArrayBuffer in their top-level document.
  3. To mitigate Spectre attacks, browser vendors like Chrome, Firefox, and Safari gate SharedArrayBuffer usage behind the crossOriginIsolated capability. This requires deploying both COEP and COOP.
  4. The COEP requirement is recursive: to use COEP, every <iframe> must also use COEP.
  5. Waiting for third parties to deploy COEP is painful for developers. This is almost always out of their control.

Beyond performance, there are additional features gated behind the crossOriginIsolated capability: high resolution timers, getViewportMedia, etc.

Deploying COEP is challenging in cases where there's not a single developer involved, but many.

Anonymous iframe gives developers a way to load documents in a third party iframe from a new and ephemeral context, scoped to the current page. In return, the Cross-Origin-Embedder-Policy (COEP) embedding rules can be lifted.

This way, developers using COEP can now embed third party iframes that do not set COEP.
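
As a sketch, the opt-in might look like this (treating the attribute name, which follows the proposal repository, as illustrative):

<!-- The embedder deploys COEP, but this third-party frame does not need to:
     it loads in a new, ephemeral context scoped to the current page. -->
<iframe anonymous src="https://third-party.example/widget.html"></iframe>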

See repository

Feedback

Zoom, StackBlitz, and Google Display Ads are supportive.
For instance, the latter loads ad content in iframes. The content can be served from third parties, which is out of their direct control. It would take an industry-wide change, opting in every resource, for ads to work properly, and it seems somewhat unlikely that they'll be able to ensure that all ad creators will do the work. Implementing Anonymous iframe would allow all publishers to get out of the SAB reverse origin trial.

Twitter:
Twitter is very close to shipping COEP:credentialless, modulo patching React and completing a few de-iframing tasks, so they will probably not need anonymous iframes to enable crossOriginIsolation. I will get more detailed feedback soon. For now:

Generally though, along the lines of the same-origin-allow-popups, if it can be done securely, this seems like a reasonable thing to support. My only hesitation would be that adding it means it's likely people will use this and never actually lean on the iframed site to upgrade.


+CC @mikewest @camillelamy @annevk @whatwg/cross-origin-isolation

User Preference Media Features Client Hints Headers

Introduction

HTTP Client Hints defines an Accept-CH response header that servers can use to advertise their use of request headers for proactive content negotiation. This proposal introduces a set of user preference media features client hints headers like Sec-CH-Prefers-Color-Scheme, which notify the server of user preferences that will meaningfully alter the requested resource, like, for example, through the currently preferred color scheme. These client hints will commonly also be used as critical client hints via the Critical-CH header.
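
For illustration, a minimal exchange might resemble the following sketch (header names as above; the value is illustrative). The server advertises:

Accept-CH: Sec-CH-Prefers-Color-Scheme
Critical-CH: Sec-CH-Prefers-Color-Scheme

and the browser, on a subsequent (or retried, per Critical-CH) request, sends:

Sec-CH-Prefers-Color-Scheme: dark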

Read the complete Explainer or the spec draft.

Feedback

We welcome feedback in this thread, but encourage you to file bugs in the repo.

EyeDropper API

Introduction

Currently on the web, creative application developers are unable to implement an eyedropper, a tool that allows users to select a color from the pixels on their screen, including the pixels rendered outside of the web page requesting the color data. This explainer proposes an API that enables developers to use a browser-supplied eyedropper in the construction of custom color pickers.
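
For illustration, a custom color picker might invoke it like this (a sketch assuming the EyeDropper constructor and open() method described in the explainer, running in an async context):

const eyeDropper = new EyeDropper();
try {
    // Opens the browser-supplied eyedropper; resolves with the selected color.
    const { sRGBHex } = await eyeDropper.open();
    document.querySelector('#swatch').style.backgroundColor = sRGBHex; // '#swatch' is a stand-in element
} catch (err) {
    // The user dismissed the eyedropper without selecting a color.
}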

Read the complete Explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Standardize JS API to encode/decode CBOR

Introduction

Concise Binary Object Representation, a.k.a. CBOR (RFC 8949), is a data format widely deployed on the web. However, there is no standardized JS API to encode/decode it.

Use Cases

A consensus around a standard CBOR JS API specification could be adopted by web browsers or any other JS implementation to expose native CBOR support [1].

Goals

The goal of this proposal is to define a standard JS API to encode and decode the CBOR data format.

Non-goals

This proposal does not consider RFC 8742, "Concise Binary Object Representation (CBOR) Sequences".

Proposed Solution

Based on existing implementations, the proposed solution defines the simple JS API of the form:

encoded = CBOR.encode(data)
data = CBOR.decode(encoded)

Encoding CBOR

The CBOR.encode() method converts a JavaScript object or value to its CBOR representation in an ArrayBuffer object.

This method follows the recommendations of RFC 8949 Section 6.2, "Converting from JSON to CBOR", to encode JS types into their CBOR representation.

Syntax

CBOR.encode(value)

Decoding CBOR

The CBOR.decode() method converts the CBOR representation within an ArrayBuffer object to a JavaScript object or value.

This method follows the recommendations of RFC 8949 Section 6.1, "Converting from CBOR to JSON", to decode JS types from their CBOR representation.

Syntax

CBOR.decode(value)
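
For illustration, usage would mirror the JSON object (a sketch assuming a global CBOR object as proposed above):

// Encode a JS value into an ArrayBuffer holding its CBOR representation,
// then decode it back.
const data = { name: "example", values: [1, 2, 3] };
const encoded = CBOR.encode(data); // ArrayBuffer
const decoded = CBOR.decode(encoded);
console.log(decoded.name); // "example"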

Let’s Discuss

A few remaining questions:

  • Is this the right place for this kind of proposal? [2]
  • Should we map (some) CBOR tags to their equivalents in JS? For instance, date/time, URI, ...

Previous discussion on discourse: https://discourse.wicg.io/t/proposal-native-cbor-or-messagepack-support/2011

Footnotes

  1. CBOR is already implemented in some browsers (Chrome, Firefox, and Edge) thanks to the new W3C Specification Web Authentication (WebAuthn).

  2. A Request for Mozilla Position has been filed here.

Screen Brightness API

Introduction

When displaying scannable images like barcodes or QR codes, readability by software is assisted by temporarily maximising the screen brightness.

Browser-based Web apps currently cannot do this, which negatively impacts user experience and accuracy should the screen be needed in tandem with the front-facing camera to illuminate or scan something.

Native apps have the ability to set screen brightness with relative ease:

iOS (possibly with M1 MacBooks?) has a settable brightness property:

UIScreen.mainScreen().brightness = YourBrightnessValue

Android has this three-liner:

final WindowManager.LayoutParams lp = getWindow().getAttributes();
lp.screenBrightness = WindowManager.LayoutParams.BRIGHTNESS_OVERRIDE_FULL; // force full brightness for this window
getWindow().setAttributes(lp);

More discussion around why screen brightness control is needed, and how it may interact with other sensors and APIs, is here:


Use Cases (Recommended)

This ability would benefit:

  • travel applications, where a user must scan a boarding pass
  • locker applications, where a user can scan a barcode to open a locker door containing their purchase
  • remote medical, where increasing the screen brightness could assist in remote examination
  • biometric security apps, where increasing the screen brightness can help illuminate features to get better imagery from the front facing camera
  • increasing contrast for readability for the visually impaired
  • gaming and further legitimising WebGL and WebXR
  • make up mirror style apps
  • basically, any instance where maximising brightness or contrast would be useful

Goals (Optional)

  • Provide ability to read current screen brightness from the navigator's current host display.
  • Provide ability to request maximum brightness on navigator's current host display, either indefinitely or for a period of time.
  • Provide ability to release the request so that the device's brightness returns to its pre-request value (i.e. hand back control to OS).
  • Provide ability to emit specific errors to handle cases where such requests are denied or not possible.

Proposed Solution

Add object to navigator called screenBrightness:

  • brightness: Float returns current brightness percentage in the range of 0.0..1.0
  • override({ brightness, maxDuration }): BrightnessLock async function, returning a BrightnessLock when the brightness target is met, agnostic to implementation (instant or transition).
    • brightness: Float requested screen brightness
    • maxDuration: Number maximum time the lock will be in effect; if unspecified, it defaults to a vendor value. The BrightnessLock should be released by the requesting userland code before this time elapses, otherwise the vendor must release it.

Add BrightnessLock implementation:

  • brightness: Float returns the requested brightness
  • release: Promise<Float> reverts to the OS controlled brightness, resolving with the new brightness level which may differ from the state prior to the BrightnessLock
  • expires: DOMHighResTimestamp the expiry time according to maxDuration

Add brightness to Feature Policy for use in nested browsing contexts.

Examples (Recommended)

const requestScreenBrightness = async (brightness, maxDuration) => {
    try {
        // Fulfils after the device transitions to the requested brightness,
        // leaving implementation up to vendors: could be instantaneous, could transition.
        return await navigator.screenBrightness.override({ brightness, maxDuration })
    } catch (err) {
        // err could be due to battery save mode, possible vendor permissions gating, out of range, etc.
        console.error("Screen brightness change failed:", err.message)
        throw err
    }
}

// Hold 100% screen brightness for 3 seconds, then release:
if ('screenBrightness' in navigator) {
    console.log("Requesting increase to 100% from current brightness:", navigator.screenBrightness.brightness)
    const brightnessLock = await requestScreenBrightness(1.0)
    console.log("Brightness changed! Expires in:", brightnessLock.expires - performance.now())
    setTimeout(async () => {
        await brightnessLock.release()
        console.log("Brightness reverted to:", navigator.screenBrightness.brightness)
    }, 3000);
}

Alternate Approaches (Optional)

Extending the Wake Lock API with an option for screenBrightness:

// The wake lock sentinel.
let wakeLock = null;

// Function that attempts to request a screen wake lock.
const requestWakeLock = async () => {
    wakeLock = await navigator.wakeLock.request({
        screenBrightness: 1.0, // 100% brightness
    });
};

// Request a screen wake lock with maximum brightness:
await requestWakeLock();
// …and release it again after 5s.
window.setTimeout(() => {
  wakeLock.release();
  wakeLock = null;
}, 5000);

Privacy & Security Considerations

What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
Possible fingerprinting, although the screen brightness should change over time independently of the requesting site, according to device ambient light sensor feedback or user control.

Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
Yes.

How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?
No PII.

How do the features in your specification deal with sensitive information?
No sensitive information.

Do the features in your specification introduce new state for an origin that persists across browsing sessions?
No, any brightness lock should be implicitly released when a page is hidden or closed.

Do the features in your specification expose information about the underlying platform to origins?
Yes, but to a minimal extent. The presence of the API may reveal some hardware information, and the time it takes to transition between brightness values may reveal hardware and software information.

Do features in this specification allow an origin access to sensors on a user’s device?
Not directly. An argument could be made that reading the untouched value of the brightness may be used as a proxy value for accessing an ambient light sensor, but this wouldn't be very precise.

What data do the features in this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
The device's screen brightness, specifically that of the display hosting the requesting application.

Do features in this specification enable new script execution/loading mechanisms?
No.

Do features in this specification allow an origin to access other devices?
No.

Do features in this specification allow an origin some measure of control over a user agent’s native UI?
No.

What temporary identifiers do the features in this specification create or expose to the web?
Presence of a specific screen brightness value. If it is felt that this is significant, this could be mitigated by a maximum lifetime on the brightness lock, enforced by browser vendors.

How does this specification distinguish between behavior in first-party and third-party contexts?
By default this API should be enabled on top level browsing contexts and by feature policy for nested browsing contexts.

How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
Ideally the same, but the brightness getter could reduce precision or return 0.75 by default unless it is overridden with a BrightnessLock.

Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Yes. Security is an issue due to handing control over a high-battery-usage component to the web, and could be mitigated by browser vendors and operating systems alike, either by denying access for battery reasons, or adding a maximum lock duration.

Do features in your specification enable downgrading default security characteristics?
No.

Let’s Discuss (Optional)

  • Does this even need to be readable? If there’s no readable data, there’s less fingerprintable area. Looking at the use cases, settable seems enough.
  • Shall we attempt to tackle multiple monitors?
  • How should multiple brightness requests be handled?
  • Should this be permissions-gated?

Proposal: Shared Element Transitions

Introduction

Currently, a smooth transition between activities is hard to accomplish in a Single-Page App, and impossible in a Multi-Page App. This is because, although the animation language is expressive and powerful, coordinating different animations and DOM mutations is a difficult task.

We propose an API that is easy to adopt, with explicit "prepare" and "transition" steps that allow the user-agent to do the bulk of the work of saving the current visual state and transitioning to the new state with an animated effect.

Please see the complete explainer.

Feedback

I welcome feedback in this thread, but encourage you to file bugs in this repo in order to keep better track of any concerns raised.

Interactive Video

Introduction

Interactive videos are videos which support user interaction. Interactive videos can navigate to, or branch to, video content depending upon: users’ interactions with menus, users’ settings and configurations, models of users, other variables or data, random numbers, and program logic.

Educational uses of interactive video include, but are not limited to: instructional materials, how-to videos, and interactive visualizations.

Some chatbots or dialogue systems can be stored as interactive videos.

Interactive films are interactive videos or cinematic videogames where one or more viewers can interact to influence the courses of unfolding stories. Interactive films can be described as video forms of choose-your-own-adventure stories, or gamebooks. Contemporary examples of interactive films include: “Black Mirror: Bandersnatch”, “Puss in Book: Trapped in an Epic Tale”, and “Minecraft Story Mode”.

One day, some AI systems could be trained using large collections of interactive films.

A Standard Runtime Environment

As envisioned, interactive videos contain JavaScript scripts and/or WebAssembly (WASM) modules. These scripts and modules should be provided with a runtime environment different from the one provided by Web browsers for Web documents. The runtime environment provided for interactive video scripts and modules should include functionality for:

  1. navigation (e.g., seeking to clips, segments, or chapters in interactive videos)
  2. prefetching resources (e.g., files in videos, or video attachments)
    a. there should be a way, e.g., using URL fragments, to locate and prefetch files in videos, or video attachments
  3. opening resources (e.g., files in videos, or video attachments)
    a. there should be a way, e.g., using URL fragments, to locate and retrieve files in videos, or video attachments
  4. presenting menus
    a. perhaps also functionality for presenting image maps or “hotspots” atop video
  5. accessing users’ settings and configurations
  6. storing and accessing local data
  7. storing and accessing remote data
    a. e.g., learner, player, and user models
  8. parsing JSON, XML, RDF, and other data formats

There is a need for one standard runtime environment for use by multiple interactive video formats. With a standard runtime environment, interactive video player software (e.g., Web browsers) could more readily play multiple formats of interactive video.

Security Considerations and User Permissions

Only the standard runtime environment intended for interactive videos’ JavaScript scripts and WASM modules should be available to them. Containing documents could, perhaps, permit otherwise (see also: <iframe>).

Interactive videos could contain hashes for and/or digital signatures of files in videos, or video attachments. WASM modules could be digitally signed.

Interactive video players (e.g., Web browsers) could make use of user permissions systems to protect users’ data privacy while providing users with features.

Menus and Accessibility

As envisioned, the presentation and display of menus is handled by interactive video players through the runtime environment API. Menus could be presented to users by invoking a present function.

Promise<MenuResponse> present(any menu, optional any options);

Spoken Language Interaction and Dialogue Systems

Interactive videos could utilize speech recognition grammars (e.g., SRGS) to enable users to select menu options via spoken natural language. In Web browsers, this functionality could be provided via the Web Speech API.

Perhaps interactive videos could also utilize remote speech recognition services.

Internationalization

Different versions of files in videos, or of video attachments, could exist in interactive videos for specific languages (e.g., menus, speech recognition grammars).

Documents Containing Interactive Videos

A document element for interactive video (e.g., <video>) could have an event upon its interface to be raised whenever a menu is to be presented to a user. In this way, the presentation of a menu could be intercepted by a containing document. The containing document could then process the arguments passed to the present function, display a menu to a user in a stylized manner, and provide a response back to the interactive video’s scripts or WASM modules, for example resulting in navigation to, or seeking to, a video clip, segment, or chapter.

Interactivity via Animated Colored Silhouettes in Secondary Video Tracks

One could use secondary video tracks to provide arbitrarily-shaped interactive regions. These secondary video tracks could each contain multiple animated colored silhouettes, overlay regions, or “hotspots”. The colors of these silhouettes would correspond with arbitrarily-shaped interactive elements. The color black, however, would be reserved for indicating the absence of a silhouette.

Consider an interactive video of an automobile engine, under a hood. Envision a secondary video track where there is a colored silhouette for each part of the engine. While the primary video track would be visible to a user, using the secondary video track(s), the user could, with their mouse, hover over and click on the parts of the engine.

Animated colored silhouettes could also be rectangular and mirror the motion of text or images in videos. This would facilitate traditional hyperlinks in videos.
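
A minimal sketch of how a page might hit-test against such a silhouette track (all names are illustrative; this assumes the silhouette track is exposed as a decodable video element with the same dimensions as the displayed video, and ignores scaling for brevity):

function hitTestSilhouette(silhouetteVideo, event) {
    // Draw the current silhouette frame to an offscreen canvas and read
    // the pixel under the pointer.
    const canvas = document.createElement('canvas');
    canvas.width = silhouetteVideo.videoWidth;
    canvas.height = silhouetteVideo.videoHeight;
    const ctx = canvas.getContext('2d');
    ctx.drawImage(silhouetteVideo, 0, 0);
    const [r, g, b] = ctx.getImageData(event.offsetX, event.offsetY, 1, 1).data;
    // Black is reserved for "no silhouette"; any other color identifies a region.
    return (r === 0 && g === 0 && b === 0) ? null : { r, g, b };
}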

Semantics and Metadata

With semantics and metadata, one could describe the contents of videos, the objects and events which occur in them, and place this information in semantic tracks. One could utilize semantic graphs as well as “semantic deltas”, or “semantic diffs”, which indicate instantaneous changes to semantic graphs.

As envisioned, the animated colored silhouettes in secondary video tracks have unique identifiers and are URI-addressable. In this way, semantics and metadata could more readily describe the silhouetted regions which map to visual contents in videos.

With semantic tracks, users could, for example, utilize queries, via user interfaces, upon the contents of videos and observe the query results, e.g., objects or events, visually selected, outlined, or highlighted in the videos. Also possible is that query results could be presented to users with storyboards or other visualizations using relevant images from videos.

JavaScript and WebVTT

Below is a syntax example for embedding JavaScript in WebVTT text tracks. The example provides two lambda functions for a cue: one to be called when the cue is entered and the other to be called when the cue is exited.

05:10:00.000 --> 05:12:15.000
enter:()=>{...}
exit:()=>{...}
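
For illustration, a polyfill might wire such cues up through the existing text track API (a sketch; parseCueScript is a hypothetical parser that extracts the enter/exit lambdas from the cue payload, and video is the media element):

const track = video.addTextTrack('metadata');
// 05:10:00.000 and 05:12:15.000 expressed in seconds.
const cue = new VTTCue(18600, 18735, 'enter:()=>{...}\nexit:()=>{...}');
const { enter, exit } = parseCueScript(cue.text); // hypothetical parser
cue.onenter = enter;
cue.onexit = exit;
track.addCue(cue);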

Polyfills and Prototyping

It appears that to implement a polyfill to prototype interactive video functionality, the HTML5 media API would need to surface the capability to access files in videos, or video attachments.

As envisioned, a polyfill would load JavaScript scripts and WASM modules from videos, implement and provide the standard runtime environment for those scripts and modules, and then run the videos, e.g., perhaps by calling a function like main or raising an event.

Another approach could make use of HTML5 custom elements. A custom element:

<custom-ivideo src="X"></custom-ivideo>

could utilize an <iframe> to load a generated HTML5 document:

<iframe src="ivideo.php?src=url_encode(X)" allowfullscreen="true" />

with that generated HTML5 document resembling:

<html>
  <head>
    <script src="ivideo-polyfill.js"></script>
  </head>
  <body>
    <video src="X"></video>
  </body>
</html>

This could also be achieved utilizing the srcdoc attribute of the <iframe> element.

Attachments

Attachments in videos are additional files, such as "related cover art, font files, transcripts, reports, error recovery files, picture or text-based annotations, copies of specifications, or other ancillary files".

One can refer to a comparison of video container formats to see which video container formats presently support attachments.

Attachments in videos are a means for adding JavaScript scripts and WASM modules to videos. By placing utilized scripts and modules in interactive videos, the videos can be self-contained and portable.

Interfaces for inspecting and accessing attachments could resemble:

partial interface HTMLMediaElement
{
  [SameObject] readonly attribute AttachmentList attachments;
};
[Exposed=Window]
interface AttachmentList : EventTarget {
  readonly attribute unsigned long length;
  getter Attachment (unsigned long index);

  attribute EventHandler onchange;
  attribute EventHandler onaddattachment;
  attribute EventHandler onremoveattachment;
};

In theory, one could provide arguments including MIME type and natural language when opening a video attachment, per content negotiation.

partial interface AttachmentList
{
  Attachment? getAttachment(DOMString name, optional DOMString type, optional DOMString lang);
};

or

partial interface AttachmentList
{
  Promise<Attachment?> getAttachment(DOMString name, optional DOMString type, optional DOMString lang);
};

for example getAttachment("main", "application/wasm").

Related specifications include the File API and BufferSource. The BufferSource interface is utilized for loading WASM modules.
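
For illustration, loading and running a WASM module from an attachment might then look like this (a sketch; the arrayBuffer() accessor on Attachment and the runtimeImports object are assumptions):

// Fetch the main WASM module from the video's attachments and run it.
const attachment = await video.attachments.getAttachment('main', 'application/wasm');
const bytes = await attachment.arrayBuffer(); // assumed, by analogy to Blob
const { instance } = await WebAssembly.instantiate(bytes, runtimeImports);
instance.exports.main();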

Alternatively, mechanisms for inspecting and accessing video attachments could be encapsulated by Web browsers and the standard runtime environment for interactive video.

Conclusion

A standard runtime environment for interactive videos and standard formats for interactive videos are needed. With new standards, interactive videos would be readily authored, self-contained, portable, secure, accessible, interoperable, readily analyzed, and readily indexed and searched.

With standard interactive video formats (and perhaps open file formats for project files), extensible content authoring tools could be more readily developed. Authoring interactive stories and producing interactive films is difficult and, with new software tools, we could expect much more content.

With standard interactive video formats, interactive videos would be self-contained and portable.

With a standard runtime environment and standard interactive video formats, interactive videos would be more secure and would access user data and other resources only in accordance with user permissions.

With a standard runtime environment and standard interactive video formats, interactive videos would be accessible.

With standard interactive video formats, interactive videos would be interoperable with other technologies. For example, interactive videos could be played in Web documents and EPUB digital textbooks.

With standard interactive video formats, interactive videos could be better analyzed.

With standard interactive video formats, large collections of interactive videos could be better indexed and searched.

Thank you. I look forward to discussing these ideas with you.

References

[HTML5 Media] https://html.spec.whatwg.org/multipage/media.html
[PLS] https://www.w3.org/TR/pronunciation-lexicon/
[SMIL] https://www.w3.org/TR/SMIL/
[SRGS] https://www.w3.org/TR/speech-grammar/
[V8] https://v8.dev/
[V8 Isolates, Contexts, Worlds and Frames] https://chromium.googlesource.com/chromium/src/+/refs/heads/main/third_party/blink/renderer/bindings/core/v8/V8BindingDesign.md#A-relationship-between-isolates_contexts_worlds-and-frames
[WASM] https://webassembly.org/
[Web Animations] https://www.w3.org/TR/web-animations-1/
[Web Speech API] https://wicg.github.io/speech-api/
[WebVTT] https://www.w3.org/TR/webvtt1/

Proposal to define privacy-enhanced prefetching and prerendering

tl;dr: We'd like to define how to prefetch and prerender content in a more privacy-focused way, and we think we'll need a mechanism for authors to become eligible, since it is likely to break existing content otherwise.

Proposed solution explainer: https://github.com/jeremyroman/alternate-loading-modes


In order to make the experience of loading on the web faster, user agents employ prefetching and prerendering techniques. However, making cookies and other credentials available to the origin server or script may be inconsistent with the privacy objectives of the user or of the referring site.

First, consider the fetch of the resource. User agents would ideally prefetch the content in a way that does not identify the user. For example, the user agent could:

  • send a request without credentials (e.g., no Cookie or Authorization request header)
  • establish the connection from a different client IP address (e.g., using a proxy server or virtual private network, if available)
  • use a previously fetched response, including one previously fetched by a third party if it can be authenticated

However, because this (intentionally) obscures the user's identity, the response document cannot be personalized for the user. If it is used when the user navigates, the user will notice that they are not logged in (even if they should be), along with other surprising behavior. A page designed with this in mind could "upgrade" itself when it loads, by personalizing the page based on data in unpartitioned storage and by fetching personalized content from the server.

Second, consider prerendering the page. User agents would ideally allow HTML parsing, subresource fetching, and script execution in a way that does not identify the user or cause user-visible annoyance. For example, the user agent could:

  • apply mitigations as above to subresource and scripted fetches
  • deny scripted access to unpartitioned storage, such as cookies and IndexedDB
  • deny permission to invoke window.alert, autoplay audio, and other APIs inappropriate at this time

In this case, not only is the HTML resource not personalized, but script will observe restrictions that would not ordinarily apply until navigation actually occurs. A page designed with this in mind could tolerate this at prerender time, and "upgrade" itself on navigation by accessing storage or fetching from the network.

Since existing web pages are unlikely to behave well with these restrictions today, and it is impractical for user agents to distinguish such pages, we propose a lightweight way for a page to declare that it is prepared for this and will, if necessary, upgrade itself when it gains access to unpartitioned storage and other privileges.
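
For illustration, such a declaration could be as small as a response header on the page; the header name and tokens below are a sketch, not a settled design (see the explainer for the concrete mechanism):

Supports-Loading-Mode: uncredentialed-prefetch, uncredentialed-prerender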

There has been previous discussion along these lines in w3c/resource-hints#82. (It also proposes a new prenavigate hint; defining triggers for these loading modes is not yet part of this proposal.)

Secure local and remote access to self hosted devices without manual setup

Introduction

In the IoT era, it is more and more common to need remote access to a device located in a user's home or business. The current method for achieving this is almost entirely cloud-centric, which has notable disadvantages, especially in underprivileged areas or for critical equipment.

I propose a new protocol, which would allow for greater security and convenience in both remote and local access.

I have prototyped and refined this protocol in limited form as a standalone app, and believe it to have good usability and security.

As I've never written a proposal here before and this spec is rather complicated, I had trouble balancing generality and detail, but I believe this to be sufficient information to evaluate it.

Use Cases (Recommended)

  • Remotely accessing a consumer grade home automation system or smart camera, without any middleman

  • Setting up devices such as routers without relying on unencrypted HTTP or the cloud

  • Serving local content on a private network at an event, accessible via QR code.

  • Self-hosting more traditional applications such as chat servers or personal file storage.

  • Allowing a traditionally-hosted web app to communicate with a locally-hosted service, such as a management site for a smart TV.

Goals

  • Allow a user to discover applications, and connect to them.

  • Provide similar security as other trusted modern protocols without requiring a certificate authority or extensive manual setup.

  • Allow local connection to open source devices without trusting, or even needing to access, any device outside LAN.

Non-goals (Optional)

This protocol is not intended to specify any particular "Web 3.0" features, and is only meant to be an extension of the use cases generally covered by a dynamic DNS service and a free SSL certificate.

It is also not in any way intended to replace a VPN, it just makes individual services remotely accessible.

Proposed Solution

A new protocol scheme would be created to handle URLs that embed the server's public SSL key used for transport, along with a discovery scheme to resolve such URLs.

This would allow one to use a self-signed certificate (which may be generated onboard a device, for total E2E security), as the URL itself specifies the trust.

This kind of approach is already widely used in rather critical applications, in the form of "Anyone with this link" sharing.
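
For illustration, such a URL might compose the pieces described in this proposal (the scheme name, key encoding, and parameter name are placeholders, not part of any spec):

p2p://<base32-public-key-hash>@discovery.example.com/?pw=<random-password>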

Discovery

Discovery would have several modes, both built-in and pluggable.

The primary discovery mode would be a privacy-enhanced MDNS service type that would hash the special URL with the current time, rounded to the hour, to produce a discovery string, which would become part of an MDNS name.

This would ensure that users could not be tracked on public WiFi by observing their device's attempt to contact their home hub.
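
A sketch of producing such a rolling discovery string with Web Crypto (the exact hash construction, truncation, and mDNS service type are assumptions):

// Hypothetical rolling discovery name: hash the secret URL together with
// the current hour, so the advertised name changes every hour.
async function discoveryName(serviceUrl) {
    const hourBucket = Math.floor(Date.now() / 3600000); // current time, rounded to the hour
    const data = new TextEncoder().encode(serviceUrl + '|' + hourBucket);
    const digest = await crypto.subtle.digest('SHA-256', data);
    const hex = [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, '0')).join('');
    return hex.slice(0, 32) + '._p2p._tcp.local'; // service type is illustrative
}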

Implementing this alone would already cover a large number of use cases, meaning features could be added in an incremental way.

Upon connecting in this way, an additional in-band step would allow the device to supply a set of public IP addresses.

These public IP addresses would be cached long term (weeks to months) for later use in connecting when outside the network. The host or user would be responsible for opening ports or whatever else was needed to make these URLs work.

As even dynamic IPs rarely change, this would enable remote access most of the time after connecting once locally, even if other discovery methods failed.

In the case of some mesh networks, this would be enough to be reliable indefinitely as IPs are based on cryptographic identifiers.

However, since IPs do change, and it is the intent of this proposal to enable sharing to anyone the user wishes, a secondary discovery method would be used, wherein the URL of a discovery server could be embedded directly in the main URL.

This discovery server URL would be passed the same rolling-hash string discussed in local discovery, and would be expected to return a list of IP addresses or traditional DNS domain names at which the device could be accessed.

As P2P technologies such as distributed hash tables are still extremely unstandardized, it does not make sense to attempt to add anything of the sort to this specification.

Instead, software and hardware vendors would create their own remote discovery servers, use public DHT key-value gateways, or perhaps implement some future true P2P method using service workers.

One important note is that the discovery server can be another self-hosted P2P URL. This is important as the system should be fully usable in an offline mesh scenario.

Traditional DNS is undesirable here due to the very high rate of change in the lookup key, caused by the use of the rolling code.

URL Portability

It is important to note that this additional information contained in the URL bar should not change the unique origin associated with the URL.

This means that a device owner could easily take a device URI for a no-longer-supported hardware device, open a port manually, and create a new URL to access it via some other means, without compromising the security provided by the embedded key itself.

As a convenience, a URL could even directly include an IP address for the host, providing a quick way to manually make a connection in case the discovery server fails.

UI Considerations

As with Bluetooth, the concept of "Pairing" could be used to enhance security. Pairing would be done via a new UI menu that could scan for devices.

Access to unpaired devices via this method could be treated as an insecure context.

Services could be associated with a title, which would be shown in the URL bar for that service once paired.

For enhanced security, users could be prompted to enter their own title, preventing bad services from showing trustworthy-looking things in the URL bar to people who just click through.

Security Enhancements

In addition, as an extra layer of security, not intended to replace logins, URLs would embed a strong random password which the server would require for connection, sent in a small JSON negotiation step after SSL setup before HTTP traffic begins.

While this password could be visible during discovery, it could not be found by an attacker with MITM or sniffing capabilities on networks other than the one containing the server, making attacks much harder.

For enhanced security, discovery could be entirely disabled by the host, keeping this password secret and providing full protection.

This password would not affect unique origin and could be freely changed.

Even in cases where the device itself had poor security, the pool of possible attackers would be mostly limited to those who already had access to the user's network, providing at least as good of security as a traditional unencrypted device admin page, protecting users from poor quality firmware.

Beyond this, as the URI scheme does not currently exist, it can be treated differently with respect to CORS: disallowing any POSTs from any other server without a preflight, plus any other strengthened restrictions that may be desired in a "fresh start".

As CORS may be a highly uncommon use case, even GET could require a preflight, to further guard against the dime-a-dozen low quality devices on the market.

Similarly, non-preflighted requests FROM pages using the new scheme could be forbidden, to protect in cases of unrelated vulnerable services on the public internet.

Corporate Environments

Many situations call for a secure intranet, and this proposal could not be used unmodified by any company that did not trust employees not to discover and pair with random things that a threat actor could sneak onto the network.

This issue is mitigated by browser management rules, which could disable the feature entirely, restrict it to a certain set of URLs, mark certain URLs as approved, or simply disable discovery and use out-of-band means to distribute URLs.

Examples

Theme Park

A park would like to make information such as maps available to all guests. Some of this information may be highly dynamic such as current ride closure conditions.

Router setup

A router has advanced features such as NAS functionality. A user plugs in the router and opens an instruction manual, where he is guided to connect to the router's hotspot and visit the "Nearby Hosts" page of his browser.

He opens the browser page and sees that there is a service labeling itself as a router.

Clicking it, he is asked to confirm the pairing. The prompt informs the user that you should only pair a device if you are on a network you trust.

Once paired, he is taken to the router admin page, and enters the password from the bottom of his router as normal.

For remote access, the router manufacturers would of course need to run a discovery server, and embed code into the router to discover via this server.

As the connection is now a secure context, this router may provide advanced personal cloud features like offline-first service workers and WebRTC chat.

With enhanced security

In an enhanced security scenario, using a router with more advanced capability, the router could disable discovery and provide an NFC or QR code via LCD containing the secure URL.

In this scenario the user is partially protected even if they reuse a password, or have attackers that can intercept traffic right on their network.

Privacy & Security Considerations

The primary consideration I see is that someone may make a malicious server with whatever title they want.

However, this same attack vector exists in Bluetooth, using the same pairing-based trust model, and it seems to be within generally accepted industry practice.

In addition, it is mitigated by the URL-based nature, which allows for secure out of band transfer that does not rely on discovery.

The use cases covered by this proposal are currently covered by things like plain HTTP or Bluetooth itself, so there should not be a decrease in security.

As I see it, issues like tracking via the lookup resolvers are similar to the issues presented by any other current web tech.

There may be concerns with harmful or spam-filled sites on public networks, which may be harder for law enforcement to trace as there is no domain owner.

However, it is a fairly poor vector that does little one could not already do with a local IP and a QR code; if it were considered an issue, it could be solved simply by removing the discovery feature.

One possible scenario where it could be an issue is a user creating a fake service having the same title as one run by a business, and advertising it via local discovery from a hidden device.

Again, considering typically poor physical security in the world, this doesn't do much that could not be done by putting up a QR code or any other malicious signage.

A compromise mitigation might be to exclude general discovery from the browser, allowing technical users to download general scanner apps, and others to use NFC setup or manufacturer-specific apps that will only respond to signed devices.

Another potential issue is confusion caused by sending people links to sites with malicious titles, and getting them to trust them via social engineering.

This threat should be the same as any other malicious link. To reduce it further, URL titles could carry restrictions, such as not being able to end with a standard TLD.

One thing that would seem to largely mitigate the risk is that discovery is entirely driven by the user. No notifications would be shown, preventing use in spam.

As mentioned above, allowing only user-selected titles prevents one clicking through a pairing process without thinking, and then seeing what appears to be a bank URL or such in the URL bar.

Furthermore, as the intention is to support locally hosted services that do not depend on the internet, payment-related APIs could be disabled on these devices.

Another notable consideration is that someone with physical access to a machine could set a malicious title for a paired device, tricking the user when they return.

This could already be done rather easily through a normal phishing URL, but in this case could be harder to prosecute. This scenario is partially mitigated by restricting titles.

Only accept signed code for WebUSB

Note: This is my first time trying to write a proposal and I'm not a security expert. It may contain many missing pieces, but any feedback is welcomed.

Introduction

The new WebUSB draft API is an amazing feature that may change how we think about webpages and webapps, but it lacks security. Right now, the proposal says that a permission prompt is displayed to request access to a USB device. This is not completely safe, and privacy and security for a feature like this are crucial.

Use Cases

Suppose, for example, that a website has an XSS bug that may introduce WebUSB device requests from an untrusted source. If that website usually requests access to your USB device, the user may grant USB access to untrusted code.

Goals

  • Only trusted code can have access to WebUSB API
  • Have a more clear prompt when requesting USB access

Proposed Solution

To solve this issue, the JavaScript file that is going to be executed and request USB access must be signed with the origin's SSL public/private key pair.

Signing the source code allows the browser to automatically reject USB access for code that is not signed by the origin. It also allows the browser to display a prompt with more information, like "Foo Bar LLC is requesting access to your USB device".

Examples

Mock example:

webusb_js = WebUSB.requestDevice("https://www.samesite.com/webust_app.js.ssl")
webusb_js.run()

Privacy & Security Considerations

  • What to do with the data obtained from the USB? Should it outlive the sandbox?

<transcript> element inside <video> and <audio> for deafblind users

Transcripts of multimedia are essential for deafblind users

I previously submitted this here, but was told it would be better here.

Deafblind multimedia users need everything in a machine-readable text format to be displayed via a screen reader in a refreshable braille device. Closed captions in videos are not usable text for them, because, though some modern braille readers can display the captions, the captions change too quickly to read them in real time as the video plays. Also, there is no easy way to access the captions separately from the video, even if the captions are in a text format. A separate transcript to audio or video is the only way that multimedia content can be made accessible to deafblind users.

Use Cases

There are ways to create transcripts already, like placing transcript content into a separate markup container after the multimedia content. But this technique has accessibility and usability issues:

  • There is no universal role for transcripts: anything can be a transcript and not be named as such. For deafblind users, it's hard to find a transcript for a multimedia element. The closeness to the multimedia element and the naming by authors are the only indicators for something that works as a "transcript".
  • Transcripts are neither connected directly nor semantically to the <video> or <audio>.
  • A connection via aria-describedby to the multimedia content is not usable for deafblind users, because it does not allow pausing or navigating the text in a screen reader. It makes the screen reader read the whole thing at once.
  • A large transcript would need a skip link placed before it to make it possible for other users to skip the content.
  • There should be an accessible way to show or hide the transcript for all users, like there is for closed captions (e.g., through buttons in the UI of the multimedia element).
  • The duty to prepare a transcript rests solely with the author. The necessity for a transcript is not obvious to developers/authors.

Goals

  • Allow a <transcript> element inside the <video> or <audio>, which ensures a semantic connection and controllability via the multimedia player. It should be possible to place sectioning content and flow content inside, like in a <section>.
  • To make this usable for all users, a <transcript> should reflect a button in the multimedia player to show/hide the transcript, like there already is for closed captions.
  • It should be possible to reference a whole HTML document as a transcript, maybe something like this: <transcript src="/transcript.html">. An embedded solution like an <iframe> could be possible, but would have the same security and accessibility issues.
  • This would allow referencing a <section> of the current document as a transcript like this: <transcript src="#my-custom-transcript">.
  • A reference outside the multimedia element like this should allow new ARIA roles like <div role="transcript" id="my-custom-transcript">. A custom transcript outside the multimedia element would allow custom styling.
  • A native HTML element would highlight the importance of a transcript for deafblind users who can't access multimedia in another way than text.
  • Also search engines and their users could benefit from a semantically correct transcript.
  • Missing <transcript> elements could be automatically detected via automated testing tools like HTML Validator, Lighthouse, AXE, WAVE etc.

Examples

Example 1: embedded <transcript>

<video>
    <source src="/video.mp4" type="video/mp4">
    <transcript>
        <h1>Transcript for my Video</h1>
        <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
    </transcript>
    Sorry, your browser doesn't support embedded videos.
</video>

Example 2: embedded <transcript> with reference in the document

<video>
    <source src="/video.mp4" type="video/mp4">
    <transcript src="#my-transcript" />
    Sorry, your browser doesn't support embedded videos.
</video>
<div role="transcript" id="my-transcript">
    <h1>Transcript for my Video</h1>
    <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
</div>

Privacy & Security Considerations

Privacy

Whether a <transcript> was viewed could be monitored and tracked if it is toggled via a control in the multimedia element. But that does not reveal data about the person viewing it; it could be a robot as well as a human.

Security

Assuming that <transcript> can work like an <iframe>, it could have the same security issues.

Add an HTMLElement.focus smooth scroll option

When we focus an element in the DOM, the default behaviour is to autoscroll to the element.
The scrolling can be disabled with the option preventScroll: true.

The manual way to scroll would be to call
https://developer.mozilla.org/en-US/docs/Web/API/Element/scrollIntoView
scrollIntoView has a behavior option which allows defining smooth scrolling instead of the default (instant "teleport" scrolling).

Since smooth scrolling is often the desired behaviour, and since we very often want scrolling on focus, it would be nice to provide a smooth-scrolling behaviour option for focus() directly, instead of having to work around the issue (and I suspect the easiest workarounds hinder scrolling performance/framerate).
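
For illustration, here is the current workaround next to a sketch of what the proposed option might look like. The option name scrollBehavior is purely hypothetical; this proposal does not prescribe a name:

const element = document.querySelector("#target");

// Current workaround: suppress the default instant scroll, then scroll smoothly.
element.focus({ preventScroll: true });
element.scrollIntoView({ behavior: "smooth", block: "nearest" });

// Hypothetical proposed option (name is illustrative only):
element.focus({ scrollBehavior: "smooth" });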

Searchable dynamic datalist with JSON list without Javascript

When there are many options to choose from, instead of using <select>, an input combined with a <datalist> dynamically filled by searching a data source can be used. JavaScript is currently used to do this, but it is so basic that the browser could do it without the need for JavaScript; it would only be necessary to indicate the source of the data.

<input list="article-list" name="article" type="text" />
<datalist id="article-list" src="/article?lang=en&view=json" />

The browser would query the URL with a search parameter appended containing the content of the input: "/article?lang=en&view=json&search=Ne"
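
The proposal does not specify a response format. As an assumption, the endpoint might return a simple JSON array of option values, for example:

[
  { "value": "Neptune" },
  { "value": "Neon" },
  { "value": "New Horizons" }
]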

A faster, parallelizable querySelectorAll

Speeding up DOM accesses would eliminate bottlenecks for a wide range of applications.
I recently discovered that the spec mandates querySelectorAll() to return elements in document order.
This behavior is useful and is a nice default.
However, if I am not mistaken, the algorithm for implementing querySelectorAll is in essence a recursive (DOM) tree search, and while a regular breadth-first search, depth-first search, or hybrid tree search can be easily parallelized/multithreaded, the constraint of retrieving elements in document order either:

  1. Mostly prevents parallelism, or
  2. Induces slowdowns through locking/syncing and the effort of maintaining and joining results in order during the search.

I understand that web browsers have clever implementations, but it seems to me that the constraint of retrieving elements in document order inherently limits parallelism/performance to some extent.
Therefore it seems to make sense to create a variant, named for example querySelectorAllUnordered(), that would behave exactly the same as querySelectorAll except that the ordering constraint would be lifted, allowing implementors to provide a faster, more parallel implementation of the tree search.
This has the potential to make the web platform faster and seems like low-hanging fruit.
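
A minimal sketch of how the hypothetical variant might be used; since the result order would be unspecified, the per-element work must not depend on it:

// Hypothetical API; querySelectorAllUnordered() does not exist in any engine today.
const nodes = document.querySelectorAllUnordered(".item");
for (const node of nodes) {
  node.classList.add("processed"); // order-independent work
}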

Document Services

Introduction

Document services are a generalization over services for documents such as spelling checking, grammar checking, proofreading, fact checking, and mathematical proof and argumentation checking. Document services are relevant to both document authoring and document reviewing scenarios. Imagine being able to check, in real time, whether a document has any informational, warning, or error messages with respect to its factuality or any steps of its reasoning. Tools for authoring and reviewing documents in these regards would be useful across sectors, including industry, academia, the military, and government, with specific applicability to journalism, encyclopedias, digital textbooks, and science.

Presented, herein, is an approach for declaring and describing document services utilizing document metadata.

Varieties of Document Services

Thus far, three varieties of document services have been considered. Firstly, there are services which adhere to an informational-message/warning/error pattern. Secondly, there are services which offer corrections, recommendations, or options for users. Thirdly, there are services which provide metadata about documents, document elements, or ranges (e.g. word count, reading level).

User Interface Discussion

Users could make use of application menus to have entire documents processed by document services. Users could also utilize context menus on specific document elements.

For those services which return informational messages, warnings, or errors about documents, document elements, or ranges, there could be table views or grid views (see also: software development IDEs) for collecting together and presenting multi-source informational messages, warnings, and errors.

For those services which offer corrections, recommendations, or options for users, interactive contextual panels or widgets could be of use.

For those services which return metadata about documents, document elements, or ranges, the visualization of such metadata is also a user interface topic.

Styling Documents for Document Services

It may be possible for document authors to style their documents for interoperation with various document services.

Various text decorations and background colors could be utilized.

Inline graphical symbols such as green checkmark symbols, white information icon symbols, yellow warning symbols, and red error symbols could be utilized.

See also: https://www.w3.org/TR/css-highlight-api-1/

Author-recommended and Reader-specified Services

When reviewing documents, there is an envisioned interplay between author-recommended and reader-specified services. For example, for the fact-checking scenario, document authors could indicate recommended fact-checking services to make use of and readers could have their own services configured.

Multiple Simultaneous Services

Multiple document services could be utilized simultaneously. The informational messages, warnings, and errors from multiple services could be merged. Similarly, corrections, recommendations, or options from multiple services could be merged. Similarly, metadata about documents, document elements, or ranges from multiple services could be merged.

URI-addressability of Document Content

Content of interest in documents could be URI-addressable in a number of ways.

  1. https://www.example.org/document.xhtml#fact-123
  2. https://www.example.org/document.xhtml#xpointer(...)
  3. https://www.example.org/document.xhtml#:~:text=...

Firstly, document authors could make use of the id attribute. Secondly, XPointer could be utilized to address document content with URI. Thirdly, text fragments could be utilized with URI.

Document Metadata and Selectors

It is possible to utilize document metadata to declare document services without having to specify how document authors should use document markup.

Namespace-prefixable Selectors

There are numerous ways that document authors might use markup to indicate facts or claims. For example, some are:

  1. new markup elements (e.g. <fact> or <claim>)
  2. extensible markup elements (e.g. <ext:fact xmlns:ext="...">)
  3. class names (e.g. <span class="fact">)
  4. role attribute (e.g. <span role="fact">)
  5. EPUB type attribute (e.g. <span epub:type="fact">)

For each way, one can select the facts or claims in a document.

With a CSS-based syntax:

  1. fact
  2. @namespace ext url(...); ext|fact
  3. .fact
  4. [role~='fact']
  5. @namespace epub url(...); [epub|type~='fact']

With an XPath-based syntax:

  1. //fact
  2. xmlns(ext=...) //ext:fact
  3. //*[contains(concat(' ',normalize-space(@class),' '),' fact ')]
  4. //*[contains(concat(' ',normalize-space(@role),' '),' fact ')]
  5. xmlns(epub=...) //*[contains(concat(' ',normalize-space(@epub:type),' '),' fact ')]

Using the namespace-prefixable selectors, above, one could use document metadata to indicate which document elements in a document were facts or claims.

With a CSS-based syntax:

  1. <meta name="fact-checking-selector" content="fact" />
  2. <meta name="fact-checking-selector" content="@namespace ext url(...); ext|fact" />
  3. <meta name="fact-checking-selector" content=".fact" />
  4. <meta name="fact-checking-selector" content="[role~='fact']" />
  5. <meta name="fact-checking-selector" content="@namespace epub url(...); [epub|type~='fact']" />

With an XPath-based syntax:

  1. <meta name="fact-checking-selector" content="//fact" />
  2. <meta name="fact-checking-selector" content="xmlns(ext=...) //ext:fact" />
  3. <meta name="fact-checking-selector" content="//*[contains(concat(' ',normalize-space(@class),' '),' fact ')]" />
  4. <meta name="fact-checking-selector" content="//*[contains(concat(' ',normalize-space(@role),' '),' fact ')]" />
  5. <meta name="fact-checking-selector" content="xmlns(epub=...) //*[contains(concat(' ',normalize-space(@epub:type),' '),' fact ')]" />

Namespace-prefixable Attribute Selectors

There is also the matter of using document metadata to indicate which attributes, if any, are utilized by a document author to reference inline or external resources on selected document elements.

With a CSS-based syntax:

  1. attr(something url)
  2. @namespace ext url(...); attr(ext|something url)

With an XPath-based syntax:

  1. @something
  2. xmlns(ext=...) @ext:something

One could indicate which attributes on those document elements were for specifying resources by using document metadata.

With a CSS-based syntax:

  1. <meta name="fact-checking-resource" content="attr(something url)" />
  2. <meta name="fact-checking-resource" content="@namespace ext url(...); attr(ext|something url)" />

With an XPath-based syntax:

  1. <meta name="fact-checking-resource" content="@something" />
  2. <meta name="fact-checking-resource" content="xmlns(ext=...) @ext:something" />

Document Service Providers

One could indicate which document service providers were recommended by a document author using document metadata.

  1. <link rel="fact-checking-service-provider" href="https://www.wikidata.org/wiki/Special:FactCheck" />

Examples

A number of document metadata examples are provided.

Spelling Checking

<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <link rel="spelling-checking-service-provider" href="https://www.services.org/spelling-checking" />
  </head>
  <body>
    <p>HTML and MathML Content</p>
  </body>
</html>

Grammar Checking

<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <link rel="grammar-checking-service-provider" href="https://www.services.org/grammar-checking" />
  </head>
  <body>
    <p>HTML and MathML Content</p>
  </body>
</html>

Proofreading

<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <link rel="proofreading-service-provider" href="https://www.services.org/proofreading" />
  </head>
  <body>
    <p>HTML and MathML Content</p>
  </body>
</html>

Fact Checking

<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="fact-checking-selector" content="[role~='fact']" />
    <link rel="fact-checking-service-provider" href="https://www.wikidata.org/wiki/Special:FactCheck" />
  </head>
  <body>
    <span role="fact">HTML and MathML content</span>
    <div  role="fact">HTML and MathML content</div>
  </body>
</html>

Metadata

<html xmlns:ext="http://www.namespace.org/extensibility#">
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="metadata-selector" content="@namespace ext url('http://www.namespace.org/extensibility#'); [ext|meta]" />
    <meta name="metadata-resource" content="@namespace ext url('http://www.namespace.org/extensibility#'); attr(ext|meta url)" />
    <script id="inline-metadata-123" type="...">...</script>
  </head>
  <body>
    <span ext:meta="#inline-metadata-123">HTML and MathML Content</span>
    <div  ext:meta="external-metadata-124.php">HTML and MathML Content</div>
  </body>
</html>

Provenance

<html xmlns:ext="http://www.namespace.org/extensibility#">
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="provenance-selector" content="@namespace ext url('http://www.namespace.org/extensibility#'); [ext|provo]" />
    <meta name="provenance-resource" content="@namespace ext url('http://www.namespace.org/extensibility#'); attr(ext|provo url)" />
    <script id="inline-provenance-123" type="...">...</script>
  </head>
  <body>
    <span ext:provo="#inline-provenance-123">HTML and MathML Content</span>
    <div  ext:provo="external-provenance-124.php">HTML and MathML Content</div>
  </body>
</html>

Mathematical Proof

<html xmlns:ext="http://www.namespace.org/extensibility#">
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="proof-selector" content="@namespace ext url('http://www.namespace.org/extensibility#'); [ext|proof]" />
    <meta name="proof-resource" content="@namespace ext url('http://www.namespace.org/extensibility#'); attr(ext|proof url)" />
    <script id="inline-proof-123" type="...">...</script>
  </head>
  <body>
    <math ext:proof="#inline-proof-123">MathML Content</math>
    <math ext:proof="external-proof-124.php">MathML Content</math>
  </body>
</html>
<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="proof-selector" content="math.proveable" />
    <link rel="proof-service-provider" href="https://www.services.org/proof" />
  </head>
  <body>
    <math class="proveable">MathML Content</math>
    <math class="proveable">MathML Content</math>
  </body>
</html>

Mathematical Proof Checking

<html>
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="proof-selector" content="math.proveable" />
    <meta name="proof-resource" content="attr(data-proof url)" />
    <link rel="proof-checking-service-provider" href="https://www.services.org/proof-checking" />
    <script id="inline-proof-123" type="...">...</script>
  </head>
  <body>
    <math class="proveable" data-proof="#inline-proof-123">MathML Content</math>
    <math class="proveable" data-proof="external-proof-124.php">MathML Content</math>
  </body>
</html>

Argumentation

<html xmlns:ext="http://www.namespace.org/extensibility#">
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="argumentation-selector" content="@namespace ext url('http://www.namespace.org/extensibility#'); [ext|argu]" />
    <meta name="argumentation-resource" content="@namespace ext url('http://www.namespace.org/extensibility#'); attr(ext|argu url)" />
    <script id="inline-argu-123" type="...">...</script>
  </head>
  <body>
    <span ext:argu="#inline-argu-123">HTML and MathML Content</span>
    <div  ext:argu="external-argu-124.php">HTML and MathML Content</div>
  </body>
</html>

Argumentation Checking

<html xmlns:ext="http://www.namespace.org/extensibility#">
  <head>
    <base href="https://www.example.org/document.xhtml" />
    <meta name="argumentation-selector" content="@namespace ext url('http://www.namespace.org/extensibility#'); [ext|argu]" />
    <meta name="argumentation-resource" content="@namespace ext url('http://www.namespace.org/extensibility#'); attr(ext|argu url)" />
    <link rel="argumentation-checking-service-provider" href="https://www.services.org/argumentation-checking" />
    <script id="inline-argu-123" type="...">...</script>
  </head>
  <body>
    <span ext:argu="#inline-argu-123">HTML and MathML Content</span>
    <div  ext:argu="external-argu-124.php">HTML and MathML Content</div>
  </body>
</html>

Types of Resources

A number of types of resources could be involved in document services scenarios.

Service-specific Data Formats

Data could be served in data formats specific to document services, e.g. mathematical proofs and argumentation, and these service-specific formats could be consumed by other document services, e.g. mathematical proof checking and argumentation checking.

Hypertext-embedded Data Formats

Data could be served embedded in HTML documents, e.g. with RDFa or microdata, for simultaneous machine-utilizability and human-readability.

Markup for Informational Messages, Warnings, and Errors

<messages xmlns="..." xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <m kind="info" type="..." about="https://www.example.org/document.xhtml#fact-123">This is an informative message.</m>
  <m kind="info" type="..." about="https://www.example.org/document.xhtml#xpointer(...)">This is an informative message.</m>
  <m kind="info" type="..." about="https://www.example.org/document.xhtml#:~:text=...">This is an informative message.</m>
</messages>

or

<messages xmlns="..." xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <m kind="info" type="..." start="https://www.example.org/document.xhtml#xpointer(...)" end="https://www.example.org/document.xhtml#xpointer(...)">This is an informative message.</m>
</messages>

See also: https://dom.spec.whatwg.org/#concept-range

Document and Document Element Metadata Formats

Information about documents or document elements could be provided in service response data. Example scenarios include providing the word count or reading level of a document or portion of a document.

<response xmlns="...">
  <metadata type="application/rdf+xml">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ext="...">
      <rdf:Description rdf:about="https://www.example.org/document.xhtml">
        <ext:wordCount rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1234</ext:wordCount>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</response>

URL-formulation Formats

Technologies such as OpenSearch utilize XML to provide URL-addressable services, using a curly-brackets-based syntax for specifying how URLs should be formed from data. OpenSearch description documents are served with the MIME type application/opensearchdescription+xml.

An example of OpenSearch markup:

<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/" xmlns:moz="http://www.mozilla.org/2006/browser/search/">
  <ShortName>Wikipedia (en)</ShortName>
  <Description>Wikipedia (en)</Description>
  <Image height="16" width="16" type="image/x-icon">https://en.wikipedia.org/static/favicon/wikipedia.ico</Image>
  <Url type="text/html" method="get" template="https://en.wikipedia.org/w/index.php?title=Special:Search&amp;search={searchTerms}" />
  <Url type="application/x-suggestions+json" method="get" template="https://en.wikipedia.org/w/api.php?action=opensearch&amp;search={searchTerms}&amp;namespace=0" />
  <Url type="application/x-suggestions+xml" method="get" template="https://en.wikipedia.org/w/api.php?action=opensearch&amp;format=xml&amp;search={searchTerms}&amp;namespace=0" />
  <moz:SearchForm>https://en.wikipedia.org/wiki/Special:Search</moz:SearchForm>
</OpenSearchDescription>

Web Services Description Language

Web Services Description Language (WSDL) is an XML language for describing Web services. It is served with the MIME type of application/wsdl+xml.

Conclusion

Some ideas were presented towards facilitating document services, such as real-time fact-checking and reasoning-checking, for HTML documents.

These ideas are also expressed in a document available at: https://www.w3.org/community/argumentation/wiki/Document_Services.

There is also, presently, a Document Services Community Group in a proposal stage. If working on these or related ideas is of interest to you, please feel free to support the creation of the group and to join the group (https://www.w3.org/community/groups/proposed/#services).

I look forward to discussing these ideas with you. Feedback, comments, ideas, suggestions are welcomed. Thank you.

Saving, Noting and Scrapbooking Webpages and/or Any Objects Embedded in Them

Introduction

Users could save, note or scrapbook webpages and/or any objects embedded in them, storing these contents for later use. Objects can be embedded in webpages in a number of ways: via semantic annotation, Web Schema annotation, or Web Components custom elements.

Resembling the buttons for backward navigation, forward navigation and reloading, a button could be added to Web browser user interfaces for saving, noting or scrapbooking webpages and/or any objects embedded in them. Similarly, a context menu item could be added for users to easily access such Web browser functionality. Similarly, a keyboard shortcut could be made available for these functionalities.

Local Storage

Users could choose to store webpages and/or any objects embedded in them locally. Object-based storage could be organized into folders. For instance, objects annotated by a Recipe schema could go into a “Recipes” folder while objects annotated by NewsArticle schema could go into a “News Articles” folder.

Features available from the extensibility of operating systems’ shells or explorer folders could be utilized to provide user experiences for object-based storage.

Cloud-based Storage

Users could choose to store webpages and/or any objects embedded in them on cloud-based storage such as Microsoft OneDrive, Google Drive, or Apple iCloud.

There is a set of interesting and worthwhile services that software developers, including third-party developers, could provide for users, should users choose to save, note or scrapbook webpages and/or any objects embedded in them to their cloud-based storage. Apps, plugins and services could be developed to analyze the contents of collections of stored webpages and/or objects to provide features for users.

Services for News Articles Stored on the Cloud

Should users choose to save, note or scrapbook news articles that they encounter to their cloud-based storage, a number of advanced services could be provided. Examples of such services include, but are not limited to:

  1. notifying end-users of the distribution of the sources of their collected articles,
  2. notifying end-users whether their collected articles contain any misinformation or disinformation,
  3. notifying end-users whether any new articles supersede any collected articles,
  4. indicating to end-users the distribution of topics or keywords in their collected articles,
  5. indicating to end-users the results of processing sentiment analysis tools upon their collected articles,
  6. indicating to end-users any spin or persuasion in their collected articles,
  7. indicating to end-users the comprehensiveness of their news search and gathering processes for given topics,
  8. providing end-users with multi-document summarization,
  9. providing end-users with multi-document Q&A systems,
  10. providing end-users with news recommendation systems which, in a configurable manner, could recommend news articles and editorials to end-users to mitigate potential cognitive biases (e.g. confirmation biases) evident in end-users' collections, and
  11. providing end-users with other features and services.

Conclusion

In summary, Web browsers could offer built-in functionality (a toolbar button, a context menu item, and a keyboard shortcut) for saving, noting, and scrapbooking webpages and/or any objects embedded in them to local or cloud-based storage, and such stored collections would open the door to a rich ecosystem of apps, plugins, and services.

Secure Curves in WebCrypto

Introduction

The Web Cryptography API currently only specifies the NIST P-256, P-384, and P-521 curves, and does not specify any "safe curves". Among the safe curves, Curve25519 and Curve448 have gained the most traction, and have been specified for use in TLS 1.3, for example. They have also been recommended by the Crypto Forum Research Group (CFRG) of the Internet Research Task Force (IRTF), and are expected to be approved by NIST.

In addition, Node.js has implemented a nonstandard extension to Web Crypto, adding Curve25519 and Curve448 under a vendor-prefixed name. We would like to avoid other implementations doing the same, and to encourage interoperability going forward by providing a standard specification.

Today, web developers are getting around the unavailability of Curve25519 and Curve448 in the browser either by using less secure curves, or by including an implementation of Curve25519 and/or Curve448 in JavaScript or WebAssembly. In addition to wasting bandwidth shipping algorithms that are already included in browsers that implement TLS 1.3, this practice also has security implications, e.g. side-channel attacks as studied by Daniel Genkin et al.

We propose solving the above problem by adding support for Curve25519 and Curve448 algorithms in the Web Cryptography API, namely the signature algorithms Ed25519 and Ed448, and the key agreement algorithms X25519 and X448.
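
As a minimal sketch, usage under the draft specification (which identifies these algorithms by name strings such as "Ed25519") might look like this:

const data = new TextEncoder().encode("example message");

// Generate an Ed25519 key pair, then sign and verify, per the draft specification.
const keyPair = await crypto.subtle.generateKey("Ed25519", false, ["sign", "verify"]);
const signature = await crypto.subtle.sign("Ed25519", keyPair.privateKey, data);
const valid = await crypto.subtle.verify("Ed25519", keyPair.publicKey, signature, data);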

Read the complete Explainer.

See the Draft Specification.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.

Multi-capture

Introduction

Some applications wish to concurrently capture multiple surfaces.

Capturing multiple surfaces is doable using existing APIs - it is possible to call getDisplayMedia() multiple times. However, this is not very ergonomic, and creates serious friction for the user:

  1. The user has to interact with the browser's media-picker multiple times.
  2. The user has to interact with the application multiple times, signaling that they want to capture yet another surface, and providing a new transient activation each time.
  3. The user is liable to make mistakes when trying to remember which surfaces they've already started capturing, and which surfaces remain for them to capture.

Ideally, a single transient activation could be used for a single API invocation, presenting the user with a media-picker with functionality akin to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would be allowed to choose all of the display surfaces that they want to capture, then click OK once. It would be clear from context that these are all of the surfaces the user was aiming to capture, and that no additional calls to gDM or the like are necessary.

Illustration: (mock-up image of a multi-select media-picker)

Use Cases

Use-case 1: Streamers presenting multiple surfaces (dynamic receivers)

Consider an instructor presenting multiple tabs to several students.

  • The instructor streams multiple tabs to an SFU.
  • Individual students independently choose which tab to view at any given moment.

With a single click, the instructor can start capturing all the relevant tabs.

Use-case 2: Streamers presenting changing surface (dynamic sender)

The video conferencing software asks the user to choose all the tabs they wish to share.
The application captures all of these surfaces but, at any given moment, relays only a single tab to the SFU. Which tab is relayed depends on app-specific logic. (For instance, maybe only the last-active tab.)

Use-case 3: Record N screens/windows/tabs

Recording for compliance/training/billing reasons.

Use-case 4: Record and compose

Record multiple windows.
Redraw them to a canvas to produce a video of a virtual desktop.
This virtual desktop contains only the captured windows, which greatly improves privacy over what users currently do - sharing the entire (real) desktop just to share a handful of windows.
(An additional, orthogonal API for learning the position and size of windows would be needed to make this truly powerful.)

Goals

Provide an API which allows multiple screen-captures to be initiated. It should only require a single transient activation. Ideally, the user agent should present to the user a UX which would render certain user-mistakes impossible (e.g. capturing the same surface multiple times).

Proposed Solution

Possible API 1: New method (getDisplayMediaSet)

partial interface MediaDevices {
  Promise<sequence<MediaStream>> getDisplayMediaSet(
    optional MediaStreamConstraints constraints = {});
}

Possible API 2: Overloaded return type for getDisplayMedia

Add a new, optional parameter to getDisplayMedia called maxSurfaces.
Its default value is 1, with which the existing behavior is manifested.
For values greater than 1, the new behavior is manifested (multi-picker), and the return type changes to Promise<sequence<MediaStream>>.
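
A minimal sketch of how the second option might be used; maxSurfaces is a proposed parameter, not a shipped one, and handleStream stands in for application-defined logic:

// Hypothetical usage; maxSurfaces does not exist in getDisplayMedia today.
const streams = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  maxSurfaces: 3,
});
for (const stream of streams) {
  handleStream(stream); // application-defined per-surface handling
}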

Examples

See the mock-up illustration above.

Let’s Discuss

  • Which of the APIs proposed above is preferable? (Or anything else...?)
  • Any unforeseen issues with allowing multiple display surfaces of different types?
  • Audio - global or per-surface?

Form submission in the background without Javascript

Currently the only way to submit a form in the background and display the result of the action in a dialog (for example, "Action executed correctly") is to use JavaScript. If you have different buttons with different actions, it is necessary to detect which one has been pressed, add its value as a parameter to the form, use FormData to read the parameters, use XMLHttpRequest to send the request and process its result, and show a dialog with "Accept" and other buttons while blocking the background.

All that added complexity for the developer could be eliminated. With an attribute on the form, the browser could take care of all this, just as it does for foreground submission.
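
As a sketch of the idea, a hypothetical attribute (the name "background" below is purely illustrative) could tell the browser to submit the form without navigating and to present the server's response in a dialog:

<!-- Hypothetical markup; no such attribute exists today. -->
<form action="/subscribe" method="post" background>
  <input name="email" type="email" required />
  <button>Subscribe</button>
</form>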

Add boolean input element with "true" and "false" values

Currently, one way to enter a boolean value in a form is to use a checkbox. If the checkbox is selected the field is sent, and if it is not selected it is not sent. Sent means true; not sent means false.

It may be the case that, in a given form, you do not want to include that parameter at all. The server then cannot tell whether the parameter is absent because it was not in the form or because the intent was to give it the false value.

Right now what is usually done is to have a checkbox and a hidden input with the same name. The checkbox has the value true and the hidden input the value false. When the checkbox is checked, both parameters are sent and the value of the first one, the checkbox, is taken, which is true. When the checkbox is not checked, only the value of the hidden input is sent, which is false, and its value is taken. If neither the checkbox nor the hidden input is in the form, the server knows the parameter is not included.

<input name="boolean-parameter" type="checkbox" value="true" />
<input name="boolean-parameter" type="hidden" value="false" />

Also, in databases a boolean field can be null in addition to true or false. In languages like Java, the Boolean type can be null. With the above method we cannot differentiate between absent and null; we would have to use, for example, radio inputs.

<input name="boolean-parameter" type="radio" value="true" /> True
<input name="boolean-parameter" type="radio" value="false" /> False
<input name="boolean-parameter" type="radio" value="null" /> Null

It would be quite useful to have an input, with the appearance of a checkbox or radio inputs, that had the value true when checked and false when unchecked. It would also be useful for it to have some mechanism to give it the null value. In summary: an input that allows differentiating between the true, false, null, and absent states.

<input name="boolean-parameter" type="boolean" />
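
For illustration, the submitted values might look like this (an assumption; the proposal does not pin down the serialization or the mechanism for null):

<!-- Hypothetical serialization of the proposed input type: -->
<input name="subscribed" type="boolean" />          <!-- unchecked submits subscribed=false -->
<input name="subscribed" type="boolean" checked />  <!-- checked submits subscribed=true -->
<!-- Some third, indeterminate state could submit subscribed=null. -->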

Conditional Focus (When Display-Capture Starts)

Conditional Focus

Problem

When an application starts capturing a display-surface, the user agent faces a decision - should the captured display-surface be brought to the forefront, or should the capturing application retain focus?

The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.

In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading displaySurface and/or using Capture Handle, the capturing application can learn about the captured display-surface, driving an even more informed decision.

Sample Use Case

For example, a video conferencing application may wish to:

  • Focus a captured application that users typically interact with during the call, like a text editor.
  • Retain focus for itself when the captured display-surface is non-interactive content, like a video.
    • (Using Capture Handle, the capturing application may even allow the user to remotely start/pause the video.)

Proposed Solution

  • Recall that getDisplayMedia() returns a Promise<MediaStream>, and that said MediaStream is guaranteed to contain at least one video track.
  • When getDisplayMedia() is called and the user approves the capture of a tab or a window, the video track will be of type FocusableMediaStreamTrack, subclassing MediaStreamTrack.
  • FocusableMediaStreamTrack exposes a focus() method.
  • This method may only be called on the microtask on which the aforementioned Promise was resolved. Later invocations of focus() produce an error.
  • Calls to focus() that occur more than 1s after capture started are silently ignored, preventing the application from performing a busy-wait on the aforementioned microtask and then calling focus() later.
  • Calling focus("no-focus-change") leads to focus being retained by the capturing application. Calling focus("focus-captured-surface") immediately switches focus to the captured tab/window.
  • Not calling focus() at all, or calling it too late, leaves the decision in the hands of the user agent.

Suggested-spec

See spec-draft for the full description of the suggested solution.

Demo

  • This solution is implemented in Chrome starting with m95. It is gated by --enable-blink-features=ConditionalFocus. (Or enable Experimental Web Platform features on chrome://flags.)
  • A demo is available. It works with Chrome m95 and up.

Sample Code

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
track.focus(ShouldFocus(track) ? "focus-captured-surface" : "no-focus-change");

function ShouldFocus(track) {
  if (sampleUsesCaptureHandle) {
    // Assume logic discriminating focusability by origin,
    // for instance focusing anything except https://collaborator.com.
    const captureHandle = track.getCaptureHandle();
    return ShouldFocusOrigin(captureHandle && captureHandle.origin);
  } else {  // Assume Capture Handle is not a thing.
    // Assume the application is only interested in focusing tabs, not windows.
    return track.getSettings().displaySurface == 'browser';
  }
}

Security Concerns

One noteworthy security concern is that allowing focus to be switched at an arbitrary moment could enable clickjacking attacks. The suggested spec addresses this concern by limiting the time when focus-switching may be triggered/suppressed.

Alternate Approaches

Alternate Approach 1: Extra Parameter to getDisplayMedia()

This would allow an application to always/never switch focus, but it would not allow the application to make a divergent decision based on which display-surface was selected by the user. The sample use case presented earlier showcases why that limitation would be undesirable.

Alternate Approach 2: Focus Hand-Off (Capturer->Captured)

Allowing capturer->captured handoff of focus was considered and pitched to the WebRTC WG. This option is a bit scarier from a security perspective, as the capturing application could try to clickjack the user into pressing something problematic on the captured application at an inopportune moment. The risk seems limited, but it's still greater than with the current proposal.

Let's Discuss

Discussion welcome either here or on the relevant thread in the WebRTC WG.

Proposal: NativeIO

Introduction

We would like to collaborate on NativeIO, a performant and generic storage API.

Please see the Explainer for more information.

Feedback

I welcome feedback in this thread, but encourage you to file bugs against the Explainer.
