Comments (23)

asutherland commented on May 13, 2024

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

It makes sense for your background-fetch API to expose a list of the pending downloads and their progress. This needs to be tracked anyway, and there are UX benefits for the user.

It seems like adding this introspection for all fetches is just asking for trouble. The requests in the SW repo for the ability to introspect pending requests seem motivated by a lack of understanding of, or confidence in, the HTTP cache. The SW spec could likely do with more references to http://httpwg.org/specs/rfc7234.html or similar to make it clear that the HTTP cache exists, knows how to unify requests, and is generally very clever.

For the movie use-case, knowing the download is 90% complete should provide confidence that the HTTP cache is sufficiently primed that straightforward use of the online URL can occur. Because of the range requests issue, it seems like providing a Response from background-fetch may be the wrong answer until the file is entirely complete. I suspect it may be worth involving media/network experts for this specific scenario.

jakearchibald commented on May 13, 2024

I decided against this because I didn't want to create yet another storage system in the browser, and instead lean on the request/response store we already have.

Another question that came up is "can we give access to the in-progress response?" for cases when you have enough of a podcast to play. Having this feature in the cache API would be cool too, so saves us having to define it twice.

wanderview commented on May 13, 2024

I would prefer we spec'd this something like:

  1. Requests are downloaded by the browser in the background
  2. Background downloads are stored on disk and count against the domain quota limits
  3. Responses are provided to the script in bgfetchcomplete.
  4. Responses are deleted from the background download disk location after bgfetchcomplete's waitUntil() resolves. The js script has to store them somewhere if it wants to keep them. The Cache API is a natural choice (see the sketch after this list).
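
As a rough illustration of steps 3 and 4, a handler might look something like this (the bgfetchcomplete name comes from the proposal above, but the event shape and the `fetches` property are invented for the example):

```js
// Hypothetical sketch of steps 3 and 4: the event shape and the `fetches`
// property are made up here purely for illustration.
addEventListener('bgfetchcomplete', event => {
  event.waitUntil((async () => {
    const cache = await caches.open('movies'); // illustrative cache name
    // The script decides what to keep; once waitUntil() settles, the browser
    // drops the responses from the background-download disk area.
    for (const { request, response } of event.fetches) {
      await cache.put(request, response);
    }
  })());
});
```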

I would like this approach for both implementation and spec reasons.

From an implementation point of view we probably don't want to write directly to Cache API anyway. Cache API does not support restarting downloads. It would make more sense to download to http cache or another disk area in chunks. We can then restart the download at the last chunk if we need to. At the end we stitch it all together and send it where it needs to go.
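
A sketch of the restartable chunked download idea, just to make the technique concrete (the `store` helpers are placeholders, not real APIs, and no browser necessarily does it this way):

```js
// Download a resource in fixed-size ranges so an interrupted download can be
// resumed at the last completed chunk. `bytesDownloaded` and `appendChunk`
// are placeholder helpers standing in for whatever disk area is used.
async function downloadInChunks(url, store, chunkSize = 1024 * 1024) {
  let offset = await store.bytesDownloaded(url);
  while (true) {
    const res = await fetch(url, {
      headers: { Range: `bytes=${offset}-${offset + chunkSize - 1}` },
    });
    const buf = await res.arrayBuffer();
    await store.appendChunk(url, buf);
    // A non-206 response means the server ignored the Range header and sent
    // the whole body; a short chunk means this was the final range.
    if (res.status !== 206 || buf.byteLength < chunkSize) break;
    offset += buf.byteLength;
  }
}
```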

From a spec perspective, writing to Cache API would raise these questions:

  1. When is the named Cache created? At the beginning of the bg download or at the end?
  2. When is the Cache.put() operation initiated? I believe we want to have ordered writes, so this impacts the js script.
  3. What happens if the js script calls caches.delete() with the same cache name as the background download? I assume it would still write to the Cache object and then it would be deleted after the Cache DOM reflector is GC'd. (This is what happens if js does this to itself; see the sketch after this list.)
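
For question 3, the existing behaviour being referred to looks roughly like this (a sketch; the cache name and URL are illustrative):

```js
// A held Cache object keeps working after caches.delete(): writes to the
// handle still succeed, the cache is just no longer reachable by name and
// goes away once the handle is garbage collected.
const cache = await caches.open('bg-download');    // illustrative name
await caches.delete('bg-download');                // removed from the name list
await cache.put('/movie.mp4', new Response('…'));  // still succeeds on the handle
console.log(await caches.has('bg-download'));      // false
```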

I imagine we would probably spec things to open and do the Cache.put() when the download is complete. If we are going to do that, we might as well let the js script decide what to do with the Response.

Anyway, just my initial thoughts.

jakearchibald commented on May 13, 2024

I imagine we would probably spec things to open and do the Cache.put() when the download is complete.

Agreed. And this means my "in-progress" response idea doesn't really work. We'd be better off making a general way to get pending fetches from same-origin fetch groups.

If we are going to do that, we might as well let the js script decide what to do with the Response.

I started off with background-fetch and thought I was simplifying standardisation and implementation by rolling it into the cache. If it isn't doing that, I'm happy to split it back up. Background-fetch is a more meaningful name too.

My gut instinct is developers won't much care about the extra step for adding to the cache.

wanderview commented on May 13, 2024

We'd be better off making a general way to get pending fetches from same-origin fetch groups.

Why do we need this?

jakearchibald commented on May 13, 2024

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

wanderview commented on May 13, 2024

I guess I'd rather put a getter on the background download registration to get a Response for the in-progress fetch.
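
Purely as a hypothetical shape for discussion (none of these names exist anywhere):

```js
// Hypothetical only: `backgroundDownloads` and `getCurrentResponse()` are
// invented names illustrating "a getter on the background download
// registration", not real or proposed APIs.
const dl = await registration.backgroundDownloads.get('movie');
const partial = await dl.getCurrentResponse();  // snapshot of what's fetched so far
const bytesSoFar = (await partial.arrayBuffer()).byteLength;
console.log(`have ${bytesSoFar} bytes of the movie so far`);
```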

I don't think a window or worker would be in the same "fetch group" as this background thing (per my understanding of gecko load groups, anyway).

jakearchibald commented on May 13, 2024

Yeah, that's why I said "same-origin fetch groups". The reason I'm pondering making this general is that we've seen a few requests for knowing about general in-progress fetches in the service worker repo.

FWIW I think we can make the 90% playback case v2 (but the kind of v2 we actually do).

jakearchibald commented on May 13, 2024

This feedback is great. Interested to hear from other implementers, but leaning towards making this background-fetch rather than background-cache.

asutherland commented on May 13, 2024

I've raised a (hopefully!) coherent request for feedback from Firefox/Gecko network and media experts on the Mozilla dev-platform list at https://groups.google.com/forum/#!topic/mozilla.dev.platform/C2CwjW9oPFM

wanderview commented on May 13, 2024

My testing suggests that the Firefox http cache does not re-use in-progress requests. See:

Edit: Don't click this unless you want to download 200+ MB!

https://people.mozilla.org/~bkelly/fetch/http-cache/

wanderview commented on May 13, 2024

Andrew pointed out my file was too big. We have some size thresholds in our http cache that were preventing the in-progress request sharing from working. I've updated it now to use a 10MB file, which does get the request sharing:

(downloads 30MB on FF and maybe 50MB on other browsers with fetch)

https://people.mozilla.org/~bkelly/fetch/http-cache/
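
Roughly, a test along these lines can show whether an in-progress download gets shared (this is my own sketch, not the code behind that link):

```js
// Sketch only: fetch the same large URL twice in parallel, then look at
// Resource Timing. If the in-progress download is shared, the second entry's
// transferSize should be close to 0, since it was served from the partially
// written cache entry rather than a second network download.
const url = new URL('/10MB.bin', location).href;  // hypothetical test resource
await Promise.all([
  fetch(url).then(r => r.arrayBuffer()),
  fetch(url).then(r => r.arrayBuffer()),
]);
for (const entry of performance.getEntriesByName(url)) {
  console.log(entry.name, entry.transferSize);
}
```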

jakearchibald commented on May 13, 2024

Sooooo this kind of thing isn't good for video/podcasts?

wanderview commented on May 13, 2024

Well, it means a getter on the background download request is a good idea, both for this reason and for requests restarted after browser shutdown, etc.

The http cache heuristics are tuned for the common request cases.

wanderview commented on May 13, 2024

Maybe one of the network people will comment, but I think the size threshold is there due to the constrained cache size. If any single resource is a large enough percentage of the total http cache, then the cache becomes much less useful in general. You don't want to evict 25% of the cache for a single video file.

I think anyway.

jakearchibald commented on May 13, 2024

Yeah, a getter would solve this, and it's something we can add later as long as we keep it in mind. I'm just worried that we're going to end up needing to create the same thing for the cache API.

jduell commented on May 13, 2024

I think the size threshold is there due to the constrained cache size....
You don't want to evict 25% of the cache for a single video file.

Exactly. We have a rule of thumb right now that we don't store resources larger than 50 MB in the HTTP cache. (Back in the days when the entire HTTP cache was 50 MB max, the rule was nothing larger than 1/8 of the entire cache, and IIRC that's still true for mobile if the cache there is set to be small enough). It's quite likely that we could bump that limit up, possibly by a lot, if it's useful.

The old HTTP cache couldn't start reading a resource that was being written until the write ended. I know we put a lot of effort into fixing that in the new cache (I also seem to recall that there are at least some cases where we still can't do it, but I think most of the time we can--I can check with the cache folks).

We don't have an API right now that lets you know when, for instance, enough of a video file has been stored in the cache to make playing the video possible. But we could add one if needed.

The HTTP cache right now doesn't count towards quota limits--that might be an issue?

Happy to talk more about this, or you can contact Honza Bambas and/or Michal Novotny directly.

wanderview commented on May 13, 2024

The HTTP cache right now doesn't count towards quota limits--that might be an issue?

That's not a problem. This background-fetch thing is different from the normal http cache. It could be implemented in the http cache, but that's not necessary.

The question was more whether we need an API to "get in-progress requests" in general. For most requests I think this is overkill and the http cache semantics already DTRT.

mayhemer commented on May 13, 2024

Wait... what are you talking about here? One of the goals stated is:

"Allow the OS to handle the fetch, so the browser doesn't need to continue running"

Then I don't understand why Necko should at all be involved in such a fetch or upload and why we are testing behavior of the Necko HTTP cache at all.

Also remember that the DOM Cache (service worker APIs) is completely separate from Necko's HTTP cache. It uses a different storage area (disk folder) and a different storage format. What I mean is that moving data from the http cache to the dom cache might not be a trivial task.

But if the above-mentioned goal is something "in the stars", then I still don't think you should rely on the HTTP cache. The response and the physical data have to end up in the dom cache. We had a similar discussion when the DOM cache was being developed, and the final and only logical :) conclusion was not to use/rely on HTTP caching at all.

jakearchibald commented on May 13, 2024

Ok, so we'd likely add a "get in-progress" API for background fetch. Are we likely to need this for the cache API too, and does that warrant merging these APIs? We could look at this at TPAC.

asutherland commented on May 13, 2024

Then I don't understand why Necko should at all be involved in such a fetch or upload and why we are testing behavior of the Necko HTTP cache at all.

I've been raising the HTTP cache issue because:

  • I don't think we want to encourage Service Worker authors to duplicate functionality that HTTP caches already provide. In issues like w3c/ServiceWorker#959 there's been discussion of exposing in-flight DOM Cache/fetch requests for use cases that I believe are already covered by the HTTP cache.
  • Playback of actively-downloading media files seems potentially much more complex than only providing the completed download. Specifically, I would expect the desired UX is to allow random-access seeking as if the file were entirely served from the network. The DOM Cache currently has no concept of files that are still streaming in. A scenario where the user seeks well beyond the current download position seems like something Gecko's HTTP cache (and others) is more likely to handle well, or at least a better place to handle it than duplicating large swathes of similar logic. So I wanted feedback about this.

@mayhemer It's sounding like the answer is indeed to stay out of the HTTP cache for background-fetch, but I figured it was worth asking rather than assuming. And it would be great if we could determine whether Firefox/Gecko might need to do something like "the background-fetch in-progress Response snapshots the existing download and new bytes won't magically show up until you call the getter again" or not. If the answer is going to be very Gecko-specific and doesn't have spec implications, maybe we should take this to the Mozilla dev-platform thread.

"Allow the OS to handle the fetch, so the browser doesn't need to continue running"

I've been reading requirements like this as a combination of:

  • Indicating that the SW should not need to be alive/active for the download.
  • Reflecting the implementation desires of browsers like MS Edge where the browser vendor also is the operating system vendor and the architecture leverages that. For example, MS has expressed a desire to be able to service push notifications in a SW in a non-browser process that is not the same SW instance that would service "fetch" requests issued in a browser context. (Or at least that's my interpretation.)

I would expect that in Firefox/Gecko we would implement this entirely in the browser and expose the downloads via browser chrome using the existing downloads UI.

rocallahan commented on May 13, 2024

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

Authors could use MSE for playback and break the resource into chunks. It sounds like that would solve this problem.
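
For example, something along these lines, assuming the movie is stored as MSE-compatible segments (e.g. fragmented MP4); the chunk URLs, codec string and chunk count are all made up for illustration:

```js
// Sketch of the chunked MSE approach: append whichever segments are already
// cached (e.g. by a completed background fetch), falling back to the network
// for the rest.
const video = document.querySelector('video');
const totalChunks = 40;                                // illustrative
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  const cache = await caches.open('movies');
  for (let i = 0; i < totalChunks; i++) {
    const chunkUrl = `/movie/chunk-${i}.mp4`;          // illustrative chunk naming
    const res = (await cache.match(chunkUrl)) || (await fetch(chunkUrl));
    const data = await res.arrayBuffer();
    // appendBuffer is asynchronous; wait for updateend before the next append.
    await new Promise(resolve => {
      sb.addEventListener('updateend', resolve, { once: true });
      sb.appendBuffer(data);
    });
  }
  mediaSource.endOfStream();
});
```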

jakearchibald commented on May 13, 2024

Done ead8574
