fetch's Issues

Difference to WHATWG fetch

As fetch() is a fairly well-known API, and a bunch of folks are used to consulting the WHATWG Fetch spec, would it make sense to have a section in the WinterCG spec that summarizes the differences between the two specs?

I think that would help a couple of decision makers feel more comfortable following the WinterCG spec where appropriate.

(This is probably generally true for WinterCG specs that are forks of existing web specs.)

Cookies and fetch() on servers

The fetch() spec forbids reading and storing cookies, because Cookie and Set-Cookie are forbidden header names (see https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name).
How should cookies be handled in server environments?

Most implementations allow them in some way, but the behavior differs between implementations:

I think it would be good to create a shared standard for this behavior.
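
To illustrate the divergence, here is a minimal sketch (the endpoint is hypothetical, and the exact behavior varies per runtime). In a browser, the Cookie request header is silently dropped as a forbidden header name and Set-Cookie is filtered out of response headers, while many server runtimes pass both through:

const res = await fetch("https://example.com/api", {
  // Dropped by browsers (forbidden header name); typically sent as-is by
  // server runtimes such as Node.js (undici) and Deno.
  headers: { Cookie: "session=abc123" },
});

// Hidden by browsers; typically readable in server runtimes.
console.log(res.headers.get("set-cookie"));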

Standardize supported subset of `fetch`

We should discuss things like CORS behavior - there are questions/suggestions about this in nodejs/undici#1315 (comment)

I think https://deno.land/manual/runtime/web_platform_apis#spec-deviations is a good baseline but I would request the following deviations for what we standardize:

  1. The server user agent does not have a cookie jar. As such, the set-cookie header on a response is not processed, nor filtered from the visible response headers.
  2. Servers do not follow the same-origin policy, because the HTTP agent currently does not have a concept of origins, and it does not have a cookie jar. This means servers do not need to protect against leaking authenticated data cross-origin. Because of this, servers do not implement the following sections of the WHATWG Fetch specification:
  • Section 3.1. 'Origin' header.
  • Section 3.2. CORS protocol.
  • Section 3.5. CORB.
  • Section 3.6. 'Cross-Origin-Resource-Policy' header.
  • Atomic HTTP redirect handling.
  • The opaqueredirect response type.
  3. A fetch with a redirect mode of manual will return a basic response rather than an opaqueredirect response (a quick sketch follows this list).
  4. The request and response header guards are implemented, but unlike browsers they do not place any constraints on which header names are allowed.
  5. The referrer, referrerPolicy, mode, credentials, cache, integrity, keepalive, and window properties and their relevant behaviours in RequestInit are not implemented. The relevant fields are not present on the Request object.
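
A sketch of the manual-redirect deviation from item 3 (the URLs are hypothetical; browser values shown for contrast):

const res = await fetch("https://example.com/old", { redirect: "manual" });
console.log(res.type);                    // "basic" (browser: "opaqueredirect")
console.log(res.status);                  // e.g. 301 (browser: 0)
console.log(res.headers.get("location")); // e.g. "https://example.com/new" (browser: null)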

Of course, this would need to be bike-shedded and written more formally. Please suggest any more deviations we'd want here.

Note this list omits the handling of file: URLs. Node.js does not wish to implement file: URL support at the moment because of security concerns. People (@mcollina for example) have raised good concerns that it would be too easy to get a file: URL from a user and pass it to fetch. I think it's probably fine for servers/edge to deviate on this?

cc @lucacasonato

For compressed responses, omit `Content-Length` and `Content-Encoding` headers

The fetch API is becoming the standard HTTP client for server usage. When proxying a fetched resource back to the user-agent, here's what I'd intuitively do:

async function myEndpoint(request: Request): Promise<Response> {
   return await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
}

The URL in this example responds with a gzip-compressed SVG. As specced in 16.1.1.1 of 15.6 HTTP - network fetch, the body stream contains an uncompressed version of the SVG, making compression transparent to the user. This works well for consuming the SVG.

However, the spec does not require the headers to reflect this decompression. Although the Response has an uncompressed body, the Content-Length header shows the compressed length and the Content-Encoding header claims the body is still compressed. In nodejs/undici#2514 (the issue I originally opened), I've included a repro of what this means if the Response is proxied to the user-agent:

  1. The body ends up being longer than the headers describe. This is not allowed by the HTTP spec, and leads clients like cURL to warn.
  2. The user-agent will try to decompress a body that isn't compressed.

The current workaround is to manually alter headers on responses from fetch:

async function myEndpoint(request: Request): Promise<Response> {
   let resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
   if (resp.headers.get("content-encoding")) {
     // Strip the headers that no longer match the (decompressed) body.
     const headers = new Headers(resp.headers);
     headers.delete("content-encoding");
     headers.delete("content-length");
     resp = new Response(resp.body, {
       status: resp.status,
       statusText: resp.statusText,
       headers,
     });
   }
   return resp;
}

This is cumbersome and should live in library code. It can't live in frameworks, because it's impossible to differentiate a response produced by fetch (content-encoding header, but uncompressed body) from a compressed Response created in user code (same content-encoding header, but compressed body).

Instead of this workaround, fetch should require the Content-Length and Content-Encoding headers to be deleted when the response body is decompressed.

Some implementors already do this, including Deno and workerd. Netlify might implement the same for Netlify Functions 2.0.

I've checked undici (Node.js), node-fetch, Chrome and Safari - all of them expose the behaviour explained above, where the headers don't match the body.

An alternative solution for this would be whatwg#1524 - that way, frameworks can use the decompressed body field to tell compressed responses from uncompressed ones.


Summary:

  • What's the problem? Compressed responses keep their Content-Encoding header but carry a decompressed body, which produces an incoherent Response.
  • What's the use case? Mostly reverse proxying.
  • Is it relevant to upstream? Yes, potentially, because of service workers.
  • What's my suggestion for a fix? Upon decompression, the Content-Length and Content-Encoding headers should be deleted.
  • How do engines deal with this? Most have the bug, but Deno and workerd implemented the fix I propose.

Work on standardizing multipart/form-data parsing (for `Request.prototype.formData`)

The fetch spec includes APIs for interacting with form submissions. For example, the Request and Response constructors accept URLSearchParams and FormData objects as the request/response body, which is generally useful and is expected to be part of the common minimum API.

However, the fetch spec also defines the formData() method of the Body interface mixin, which is included in Request and Response. This method parses the HTTP body as a form submission enctype (either application/x-www-form-urlencoded or multipart/form-data) and returns a FormData object. Since form submission bodies only generally make sense as requests, and it's rarely useful to parse a request body from an HTTP client, it wouldn't make much sense to include this method as part of the common minimum API – but it is certainly useful for fetch-based HTTP server APIs, as Deno and CFW have.

For multipart/form-data parsing, however, this method leaves things almost completely unspecified. While there is a formal definition of this format (in RFC 7578, which relies on the multipart definitions in RFC 2046), it is in the form of an ABNF grammar rather than a parsing algorithm, and so implementations differ in how they parse some inputs.

What's more, browsers have not always escaped field names and filenames in multipart/form-data payloads in the same way. For example, until last year Firefox escaped double quotes by prepending a backslash and turned newlines into spaces, while Chromium and WebKit used percent-encoding. And while this percent-encoding behavior was added to the HTML spec (whatwg/html#6282), and Firefox's behavior was fixed in turn, no implementation of the parsing that I'm aware of (including Chromium and WebKit!) decodes the percent-encoding escapes:

const original = new FormData();
original.set('a"b', "");
original.set('c"d', new File([], 'e"f'));
log(original);  // a"b c"d e"f

const parsed = await new Response(original).formData();
log(parsed);  // a%22b c%22d e%22f
// (In CFW it's a%22b c%22d undefined, because it seems like files are not
// distinguished from non-file values when parsing.)

function log(formdata) {
  // FormData is pair-iterable.
  const entries = [...formdata];
  const firstEntryName = entries[0][0];
  const secondEntryName = entries[1][0];
  const secondEntryFilename = entries[1][1].name;
  console.log(firstEntryName, secondEntryName, secondEntryFilename);
}
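
A parser that wanted to round-trip the serialization from whatwg/html#6282 would have to decode those escapes when reading names and filenames. A minimal sketch (note the decoding is inherently ambiguous: a literal %22 in an original name is indistinguishable from an escaped quote):

function decodeFormDataName(name: string): string {
  // Reverses the percent-escapes the HTML serializer emits for CR, LF and '"'.
  return name
    .replaceAll("%0D", "\r")
    .replaceAll("%0A", "\n")
    .replaceAll("%22", '"');
}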

For browsers, specifying multipart/form-data parsing is not a big priority, since there are not many use cases for it, and the formData() method has been broken for 8 years or so. But for WinterCG runtimes with a fetch-based HTTP server API, being able to parse form submissions with the existing fetch API is crucial, and being able to accurately parse the form submissions that all browser engines currently submit is a large part of that. So this seems like a very interesting issue to tackle as part of the WinterCG project.

`Response.redirect` with relative URLs

This is sort of similar to #9, but I think it has some interesting semantics that might be worth looking into separately.

The current spec is very clear that the URL passed to Response.redirect MUST be resolved against the current URL. This is rather unfortunate by itself, since server runtimes don't really have a concept of a "current URL". Beyond that, redirects (especially when paired with status code 307) are incredibly common for things like OAuth flows.

Currently, doing:

return Response.redirect("/home", 307);

just throws in Node and Deno (without the --location flag, in Deno's case). The correct way to do this currently is

return new Response(null, {
    status: 307,
    headers: {
        "Location": "/home"
    }
});

which I think is suboptimal. Perhaps this issue should be raised upstream, because this could actually work in a web-standard Response.redirect. Ideally, Response.redirect would just use a relative location for status codes that support it. I'm hoping to champion this change in WinterCG and eventually land it upstream later.
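
In the meantime, a userland shim is straightforward. A minimal sketch (the helper name redirectTo is made up here):

function redirectTo(location: string, status = 302): Response {
  // A relative Location works because the client resolves the header
  // against the request URL; the server never needs a "current URL".
  return new Response(null, { status, headers: { Location: location } });
}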

Mutual TLS (mTLS)

Proposal

I would like to propose that the CG pursue standardization of Mutual TLS authentication in the fetch API.

This is a feature that is not likely to be implemented by browser runtimes but is in my opinion missing in non-browser runtimes where fetch is the only interoperable HTTP client.

The use-case I have in mind is implementation of OAuth mTLS Client Authentication and Client Certificate-Bound Access Tokens.

Prior Art

Node.js - the https module has options to provide cert, key, crl, passphrase, pfx, and ca.

Deno - using deno --unstable there's Deno.createHttpClient, the result of which can be passed as a client property to fetch's init argument. This method accepts certChain, privateKey, and caCerts options.
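
For reference, a sketch of the Deno prior art, using the option names listed above (the file paths and URL are hypothetical; the API is unstable and requires the --unstable flag):

const client = Deno.createHttpClient({
  certChain: await Deno.readTextFile("./client.pem"),
  privateKey: await Deno.readTextFile("./client.key"),
  caCerts: [await Deno.readTextFile("./ca.pem")],
});

// The client is passed through fetch's (non-standard) init property.
const res = await fetch("https://mtls.example.com/", { client });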

Request/Response browser differences

Currently, both the Request and Response classes contain a bunch of properties that likely have no effect server-side.

Request:

  1. request.destination
  2. request.referrer
  3. request.referrerPolicy
  4. request.mode *
  5. request.cache "...indicating how the request will interact with the browser’s cache when fetching"
  6. request.integrity "A cryptographic hash of the resource to be fetched by request."
  7. request.isReloadNavigation
  8. request.isHistoryNavigation

RequestInit

(options that are passed to the Request constructor)

  1. RequestInit.referrer: "A string whose value is a same-origin URL"
  2. RequestInit.referrerPolicy: "A referrer policy to set request’s referrerPolicy."
  3. RequestInit.mode: "A string to indicate whether the request will use CORS, or will be restricted to same-origin URLs. Sets request’s mode. If input is a string, it defaults to "cors"."
  4. RequestInit.credentials: see request.credentials *
  5. RequestInit.cache: see request.cache
  6. RequestInit.integrity: see request.integrity
  7. RequestInit.window: "Can only be null. Used to disassociate request from any Window."

Response

  1. response.type **

* omit and include may be useful for developers; same-origin is not.
** cors should be omitted from this type.

Implementations:

Aligning Behavior

As you can see, each environment differs in which properties it supports, which can cause cross-platform confusion. It's made more confusing by the fact that these platforms typically leave unsupported properties in their typings but don't document which ones are ignored (unless you go looking for it on Google).

Potential Solutions:

  1. Choose default values to return for the useless flags (for example, Node.js's Request class will always return false for request.isHistoryNavigation); a sketch follows the examples below. Default flags would also be needed for RequestInit, as the spec heavily defines fetch's behavior in terms of certain flags being set.
  2. Fork the fetch spec and remove mentions of these flags, along with conditions that would no longer be possible with those flags removed. This would be a lot of work.
  3. Create a new document that would supersede certain steps/behaviors in the fetch spec. For example:
<-- Original fetch spec -->

# some title
1. If request's mode is "cors" then:
   ...
2. Perform scheme fetch.
3. If request's `referrer` is not this's current settings origin url then:
   1. Abort this request.
   2. Return a network error.
// and so on
<-- Server environment spec -->

# some title
1. Ignore step 1
3. Ignore step 3
// and so on
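
For a concrete feel of solution 1, a sketch of hard-coded defaults in a server runtime (the exact values are illustrative):

const req = new Request("https://example.com/");
console.log(req.isHistoryNavigation); // always false
console.log(req.isReloadNavigation);  // always false
console.log(req.referrer);            // always "about:client" (the spec default)
console.log(req.mode);                // always "cors" (the constructor default)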

Specification outline

As promised a few calls ago, I have been working on drafting the initial specification for WinterCG Fetch. I've had many discussions with multiple folks and have arrived at two options for us. I'd like us to decide on one of them as the organizational structure for our specification. Once we agree, I will continue #11 and get our baseline specification published.

Option 1

The first option is to create a fork of whatwg/fetch here in wintercg. We will utilize aspects of the Bikeshed language (which is what whatwg/fetch is written in) to omit sections and include notes/extensions for aspects that we want to modify.

We will be responsible for rebasing our modifications every time Fetch lands a change to the specification. This could be partially automated: a bot could watch the whatwg/fetch repo and, any time new commits are merged to main, open a branch and attempt the necessary git operations. Of course, if there are merge conflicts, they would need to be settled by a contributor here in WinterCG.

This will ensure our specification is always up to date with the latest whatwg version.

This option has a long-term maintenance cost, as members of WinterCG would be responsible for managing the rebasing over time. As stated, it could be automated, but that wouldn't be a perfect solution, as someone would still have to spend time fixing conflicts whenever they arise.

Option 2

The second option is to start with an essentially empty specification that states something along the lines of: "Unless otherwise specified in this document, WinterCG Fetch is compatible with the latest edition of the WHATWG Fetch specification". Then, over time, as we agree on modifications to whatwg/fetch, we will create new sections within our document that state the necessary changes. For example, let's pretend we agree to get rid of the entire concept of "Forbidden Headers". Our specification might include a section such as:

Please note this is purely for demonstration purposes. The WinterCG has made no decisions regarding modifications to the WHATWG Fetch API, and the content in the following example is purely hypothetical. Do not use this issue thread to discuss the nuance of the example.

### Headers

#### Modification of Forbidden Headers List

Section [2.2.2 Headers #forbidden-header-name](https://fetch.spec.whatwg.org/#forbidden-header-name) of the whatwg/fetch specification states a list of header names that are considered "forbidden". During runtime execution of the Fetch API, usage of a forbidden header results in an early return such as in the [Concept Headers append](https://fetch.spec.whatwg.org/#concept-headers-append) section.

WinterCG Fetch deviates from this section by stating that there are **no** forbidden headers. A WinterCG Fetch API will not return early if it encounters one of these headers.

This option has less maintenance burden, as it could essentially stagnate while remaining "up to date": the catch-all statement says that WinterCG Fetch is WHATWG Fetch unless otherwise noted. WHATWG Fetch could land changes, and unless we need to deviate from them, we don't have to modify our specification.

Unfortunately, this also means that if we are not on top of changes to WHATWG Fetch, we could incorrectly end up supporting something they add that we want to deviate from. Arguably, implementations don't generally move as quickly as standards, so even if there is a bit of a lag before we come to a decision on a hypothetical change to WHATWG Fetch, many implementers would already be a part of the conversation and it wouldn't have much impact.


With these two options, please react to this post with the one you prefer, to give us a sense of where folks stand. We will also be discussing this at upcoming WinterCG calls. When we come to a majority decision, I will create the initial proposal draft. In the meantime, we can begin making API decisions for WinterCG Fetch - capturing the results in issues - and when we eventually get our proposal created, I can add those decisions to the initial draft. Also, please feel free to use this issue to discuss details of either option.

Thank you!

Option 1 - react with: 😄

Option 2 - react with: 🚀

RRS: Resolvable Resource Specifier

The concept of an RRS maps directly to the MOJO::IPC dispatcher concept of ResourceIds, which get resolved back to the real resources.

Why?

  • see #12 - the comment about URLs as a replacement

How?

  • the serviceWorker is the main browser-context entrypoint for system-module builtins; it offers APIs for that
  • modifying the URL implementation inside V8, making it RRS-compliant with backward compatibility with WHATWG URL parsing

Use cases

Problem: Allow the user to load and use local code within the browser context

It is mainly a concept of exposing dynamic module loading to userland via a file picker or other methods.

Problem: integrating an external graphics context directly into a canvas, iframe, video, or image element

Allow native HTML elements to interact with dynamically registered native components.

Creating a fullscreen snapshot, if there were a component called gpu::0, via a direct memcpy of the GPU buffer:

<img src="gpu::0">

Internals

The compositor (shell, browser) is able to expose IPC channels to the serviceWorker / Node.js / Deno, which can translate the gpu::0 string to a real resourceId inside the IPC system and get a handle back to access it.

The serviceWorker also handles write locks and the like for multiple context instances.

Successor to the fetch proposal

This allows implementing fetch as-is, with all its context restrictions, since a non-restricted fetch would simply be done via a net::https://domain.com/target.file call, like it is already done internally on the C++ side.

Browser implementations need to request permissions for dynamic module loads, while host runtimes do not if they don't want to; it depends on the runtime shell.

Think of these as resolvable URLs that do not get resolved by the network stack: they are internal resolvable references defined as string specifiers.

This defines a Winterop component system using RRS as specifier identifiers.

Resolving a module via reference inside V8

// The result can be used afterwards via MessageChannel ports sent to the context.
// The result can be used to supply fetch responses.
serviceWorker.load(fileHandle) // browser-secure way without permissions, as permissions come from the fileHandle request; local files only
serviceWorker.load() // browser-secure way without permissions, as permissions come from the fileHandle request
serviceWorker.load("net::https")
serviceWorker.load("fs::/home/path/shared.so")

Error conditions, retry, and Error.cause

Within Workers, we have been discussing how to communicate to users, via errors, that the conditions leading to an error are temporary and that the user should retry their operation. The how and when of retrying is not important here.

For example, a fetch() promise can fail for many reasons. The network path could temporarily be down, the URL could be blocked, a header could be malformed, etc. We want to be able to clearly indicate that the user can/should retry their operation without requiring the user to resort to parsing the error message.

We have several possible paths forward, all of which have the same fundamental problem. We'd like to get consensus on which approach folks would find the most agreeable.

Option 1: New error types

const err = new Error('an error occurred');
Object.defineProperty(err, 'name', { value: 'RetriableError' });

Option 2: Non-standard own properties on Error

const err = new Error('an error occurred');
err.retriable = true;

Option 3: Using cause

const err = new Error('an error occurred', { cause: { retriable: true } })

Option 4: Using AggregateError

// The first element is always an Error; the additional entries communicate
// the extra structured information we want.
const err = new AggregateError([
  new Error('an error occurred'),
  { retriable: true }
])

Option 5: ??

Other ideas?

Current Thinking

My current thinking here is to prefer Option 3, using the cause property.

Specifically, stepping back to a logical level: the purpose of cause is to communicate the reason for this error. That reason might be that another Error was thrown, or it might be that some other condition occurred. For instance, the network was down, or there was an internal error, etc. So let's differentiate between an Error and a Condition.

If I have a transient condition and want to communicate that the user should retry their operation, then I could logically do something like:

const condition = {
  // The condition is temporary....
  transient: true,
  // The operation is retriable...
  retriable: true,
};
const err = new Error('oops that failed', { cause: condition });

The challenge with this, of course, is interoperability. If Workers chooses to use cause in this way but other fetch() implementations choose to use cause in other ways, then we run into interop issues. To be clear, ALL of the options suffer from this exact problem.

Proposal

The proposal I would like to make is to define a new ErrorCondition interface specifically for use with cause.

Essentially (treat this as a discussion example to express intent... the actual proposal can be refined):

dictionary ErrorConditionInit {
  boolean transient = false;
  boolean retriable = false;
  DOMString name = "";
};

interface ErrorCondition {
  constructor(optional DOMString message = "", optional ErrorConditionInit init = {});
  readonly attribute boolean transient;
  readonly attribute boolean retriable;
  readonly attribute DOMString name;
  readonly attribute DOMString message;
};

Note that this interface intentionally mimics DOMException with the inclusion of name and message accessors.

Example use (assuming the proposal to add cause to DOMException goes through):

const err = new DOMException('The operation failed', {
  name: 'NETWORK_ERR',
  cause: new ErrorCondition('The network path is down', {
    transient: true,
    retriable: true,
  })
});

console.log(err.cause.transient);  // true
console.log(err.cause.retriable); // true

To be clear, I don't really have strong opinions on exactly how we solve this use case. My only requirement is that we have a mechanism for reliably communicating transient/retriable conditions that is interoperable across runtimes.
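
To make that concrete, here is a sketch of how a caller might consume such a condition, assuming runtimes agree on a retriable field on cause (the retry policy itself is simplified):

async function fetchWithRetry(url: string, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fetch(url);
    } catch (err) {
      // Only retry when the runtime marked the failure as retriable.
      const cause = (err as Error).cause as { retriable?: boolean } | undefined;
      if (!cause?.retriable || i === attempts - 1) throw err;
    }
  }
  throw new Error("unreachable");
}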

Some questions

  1. How are retriable errors like this handled elsewhere on the web?

Passing platform-specific meta information with the `Request`/`Response` instances

Platforms like Cloudflare Workers or Shopify Oxygen may need to add additional information to requests/responses. Cloudflare Workers has a proprietary .cf property which includes fields like geolocation data etc. Shopify's Oxygen runtime has similar needs but uses custom HTTP headers to pass the data.

Since the standard doesn't offer any way to pass extra meta information along with Request/Response instances, it would be great to explore these and similar scenarios to see how the extra metadata may be added. There were likely no use cases for this in the browser world, but on the server side the situation is different.

Using extra custom HTTP headers might be the most obvious approach, but it has clear downsides: headers have tight size limits, aren't very suitable for holding complex data, and can't contain anything that isn't directly serialisable into a string.

A custom property like .meta or .metadata (which should hold a JavaScript object, leaving its fields up to the vendor implementation) would be more flexible but needs to be standardised.
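
As a discussion aid, here is a sketch of what such a property might look like (the shape and field names are entirely hypothetical):

interface RequestWithMeta extends Request {
  // Vendor-defined structured metadata; never sent over the wire.
  readonly meta?: Record<string, unknown>;
}

function getCountry(request: RequestWithMeta): unknown {
  // Cloudflare Workers exposes comparable data today as request.cf?.country.
  return request.meta?.country;
}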

Standardize access to cookie headers

As outlined at whatwg/fetch, there is often a need in server environments to access cookie headers.

For example editing Cookie headers:

const h = new Headers();
h.append("Set-Cookie", "a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT");
h.append("Set-Cookie", "b=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT");

h.get("Set-Cookie")
// a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT, b=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT

Headers.prototype.getAll would solve this problem, but it was removed from the browser specification. Should we standardize on adding it back? Or should we define a new method altogether? Ideally we should align with node-fetch and undici as well.
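
For reference, the removed getAll returned every value for a name as an array instead of a single combined string - roughly:

// With the headers from the example above:
h.getAll("Set-Cookie");
// ["a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT",
//  "b=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT"]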

Looking for private communication channel

Hello,

I would like to report a potential vulnerability in the fetch spec.
Is there a private email list or an internal issue tracker that I can submit
the details to?

Thanks in advance and happy new year!
