# Overview & Purpose

## The Problem
For live/"DVR" content, it's common to show some indication of whether or not the viewer is currently playing "at the live edge". However, due to the nature of HTTP Adaptive Streaming (HAS), the live edge cannot be represented as a simple point/moment in the media's timeline. This is for a few reasons:
- At the manifest/playlist level, the client typically learns which live segments are available by periodically re-fetching the manifest/playlist according to in-specification polling rules. These files may also be updated by the server in a discontiguous manner as segments become ready for streaming. Since the client does not know when segments will be added by the server, the advertised "true live edge" will "jump" discontiguously through this process, and must therefore be modeled as a plausible "range" or "window" for what counts as the live edge.
- HAS provides segmented media via a client-side, pull-based model (most typically, e.g., a GET request), where each segment has a duration. This means that a client must first "see" the segment (via the process described above), then GET and buffer it, and then (eventually) play it, starting at its start time. Here again, this entails a discontiguous, per-segment update of the timeline, which again must be modeled as a "range" or "window" rather than a discrete point.
- In order to avoid live edge stalls, both MPEG-DASH and HLS have a concept of a "holdback" or "offset" that informs client players that they should not attempt to fetch/play some set of segments from the end of a playlist/manifest. Luckily, this can be treated as an independent offset calculation applied to, e.g., the `seekable.end(0)` of a media element, which can then be used as a reference for any other live edge window computation.
(Visual representation may help here)
### A concrete sub-optimal (not worst case) but in-spec example: HLS
Let's say a client player fetches a live HLS media playlist just before the server is about to update it with the following values:
```
# ...
# Unfortunately, EXT-X-TARGETDURATION is only an upper limit (>= any EXTINF duration after rounding to the nearest integer)
#EXT-X-TARGETDURATION:5
# Client-side "LIVE EDGE" will be 5.46 seconds into the segment below, aka 3 * 5 (target duration) = 15 seconds back from the playlist's end duration (20.46s)
# NOTE: Assume playback begins at the beginning of the segment below, since some client players choose to do this to avoid stalling/rebuffering, meaning playback starts -5.46 seconds from the "LIVE EDGE"
#EXTINF:5.49,
#EXTINF:4.99,
#EXTINF:4.99,
#EXTINF:4.99,
```
The server then updates the playlist with two larger-duration segments (in spec, and not unheard of under sub-optimal conditions) before the client re-requests the playlist after 4.99 seconds (the minimum amount of time the player must wait) and continues fetching the newly available segments, yielding an updated playlist of:
```
# ...
#EXT-X-TARGETDURATION:5
# NOTE: Current playhead will be 4.99 seconds into the segment below, assuming optimal buffering and playback conditions at 1x playback speed
#EXTINF:5.49,
#EXTINF:4.99,
#EXTINF:4.99,
# New client-side "LIVE EDGE" will be 0.97 seconds into the segment below, aka 3 * 5 (target duration) = 15 seconds back from the playlist's end duration (31.44s)
#EXTINF:4.99,
#EXTINF:5.49,
#EXTINF:5.49,
```
In this example, playback started 5.46 seconds behind the computed "LIVE EDGE" and, after a single reload of the playlist, ended up 11.45 seconds behind the next computed "LIVE EDGE" without any stalls/rebuffering. Note that, even in this example, we do not account for round trip times (RTT) for fetches, time to parse playlists, time to buffer segments, initial seeking of the player's playhead/`currentTime`, and the like. Note also that, even without those considerations, the playhead still ends up > 2 * TARGETDURATION behind the "LIVE EDGE".
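The drift in this example can be reproduced with a few lines of arithmetic. This is a rough sketch; the helper names are hypothetical, and only the 3 * `EXT-X-TARGETDURATION` distance from the playlist end comes from the example above.

```typescript
// Sketch reproducing the example's arithmetic. Helper names are hypothetical.
const round2 = (x: number): number => Math.round(x * 100) / 100;

// Computed "LIVE EDGE": total advertised duration minus 3x the target duration.
function liveEdge(segmentDurations: number[], targetDuration: number): number {
  const total = segmentDurations.reduce((sum, d) => sum + d, 0);
  return round2(total - 3 * targetDuration);
}

const first = [5.49, 4.99, 4.99, 4.99]; // playlist as first fetched
const updated = [...first, 5.49, 5.49]; // after the server appends two segments

// Playback starts at the beginning of the segment containing the live edge
// (t = 0), so the initial drift is the live edge value itself.
const initialDrift = liveEdge(first, 5); // 5.46
// After 4.99s of playback, measured against the updated playlist's live edge:
const driftAfterReload = round2(liveEdge(updated, 5) - 4.99); // 11.45
```

Note how a single in-spec playlist update more than doubles the drift without any stall on the client's side.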
## The solution
Since this information can be derived from a media element's "playback engine" or by parsing the relevant playlists or manifests, the extended media element should have an API to advertise the live edge window for a given live HAS media source. Call this the "live window offset". Additionally, per the third consideration above, we should treat `seekable.end(0)` as the end time of a live stream already accounting for the per-specification "holdback" or "delay".
# Proposed API

## Constrained meaning of `seekable.end(0)` as "live edge" (with `HOLD-BACK`/etc.) for HAS
To account for the distinction between the live edge of the media stream as advertised by the playlist or manifest vs. the latest time a client player should try to play, based on per-specification rules and additional information also provided in the playlist or manifest, extended media elements SHOULD set the `seekable.end(0)` value to account for this offset. This shall be assumed for all computations of the "live edge window", where `seekable.end(0)` will be the presumed "end" of the window/range, already taking the aforementioned offset into account. With these offsets presumed, `seekable.end(0)` may be treated as synonymous with a client player's "live edge", and these terms should be treated as interchangeable in this initial proposal.
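As a minimal sketch of this constrained meaning (the function and variable names here are illustrative, not part of the proposed API), `seekable.end(0)` becomes the advertised end of the timeline minus the applicable holdback:

```typescript
// Illustrative only: an extended media element would surface this value via
// seekable.end(0); it would not be computed by the page author.
function constrainedSeekableEnd(advertisedEnd: number, holdBack: number): number {
  // Clamp so very short live playlists never report a negative end time.
  return Math.max(0, Math.round((advertisedEnd - holdBack) * 100) / 100);
}

// e.g. the first playlist from the example above: 20.46s advertised,
// with HOLD-BACK defaulting to 3 * EXT-X-TARGETDURATION = 15s
const liveEdgeValue = constrainedSeekableEnd(20.46, 15); // 5.46
```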
### For RFC8216bis12 (aka HLS)

- "Standard Latency" Live

  `seekable.end(0)` should be based on the inferred or explicit `HOLD-BACK` attribute value, where:
  > HOLD-BACK
  >
  > The value is a decimal-floating-point number of seconds that indicates the server-recommended minimum distance from the end of the Playlist at which clients should begin to play or to which they should seek, unless PART-HOLD-BACK applies. Its value MUST be at least three times the Target Duration.
  >
  > This attribute is OPTIONAL. Its absence implies a value of three times the Target Duration. It MAY appear in any Media Playlist.
- Low Latency Live

  `seekable.end(0)` should be based on the explicit `PART-HOLD-BACK` (REQUIRED) attribute value, where:
  > PART-HOLD-BACK
  >
  > The value is a decimal-floating-point number of seconds that indicates the server-recommended minimum distance from the end of the Playlist at which clients should begin to play or to which they should seek when playing in Low-Latency Mode. Its value MUST be at least twice the Part Target Duration. Its value SHOULD be at least three times the Part Target Duration. If different Renditions have different Part Target Durations then PART-HOLD-BACK SHOULD be at least three times the maximum Part Target Duration.
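The two holdback rules above can be sketched as follows. The `ServerControl` shape is hypothetical (not tied to any particular parser); only the fallback and minimum-value rules come from the quoted spec text.

```typescript
// Hypothetical pre-parsed playlist values; not from any particular parser.
interface ServerControl {
  targetDuration: number;      // EXT-X-TARGETDURATION
  holdBack?: number;           // HOLD-BACK (OPTIONAL)
  partTargetDuration?: number; // EXT-X-PART-INF:PART-TARGET
  partHoldBack?: number;       // PART-HOLD-BACK (REQUIRED in Low-Latency Mode)
}

// "Standard Latency": explicit HOLD-BACK when present, else 3x the Target Duration.
function standardHoldBack(sc: ServerControl): number {
  return sc.holdBack ?? 3 * sc.targetDuration;
}

// Low Latency: PART-HOLD-BACK is REQUIRED and MUST be at least twice the
// Part Target Duration; surface violations rather than guessing a value.
function lowLatencyHoldBack(sc: ServerControl): number {
  if (sc.partHoldBack === undefined || sc.partTargetDuration === undefined) {
    throw new Error("PART-HOLD-BACK and PART-TARGET are required in Low-Latency Mode");
  }
  if (sc.partHoldBack < 2 * sc.partTargetDuration) {
    throw new Error("PART-HOLD-BACK MUST be at least twice the Part Target Duration");
  }
  return sc.partHoldBack;
}
```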
### For ISO/IEC 23009-1 (aka "MPEG-DASH")

- "Standard Latency" Live

  `seekable.end(0)` should be based on the explicit `MPD@suggestedPresentationDelay` (OPTIONAL) attribute, when present; otherwise it may be whatever the client chooses based on its implementation rules. Per the spec:

  > it specifies a fixed delay offset in time from the presentation time of each access unit that is suggested to be used for presentation of each access unit... When not specified, then no value is provided and the client is expected to choose a suitable value.
  - From §5.3.1.2 Table 3 - Semantics of the `MPD` element

  (NOTE: there may be additional suggestions/recommendations available via the DASH IOP)
- Low Latency Live

  `seekable.end(0)` should be based on the `ServiceDescription` -> `Latency@target` attribute. Note that this value is an offset not on the manifest timeline, but rather on the (presumed NTP or similarly synchronized) wallclock time. Per the spec:

  > The service provider's preferred presentation latency in milliseconds compared to the producer reference time. Indicates a content provider's desire for the content to be presented as close to the indicated latency as is possible given the player's capabilities and observations.
  >
  > This attribute may express latency that is only achievable by low-latency players under favourable network conditions.
  (NOTE: This implies that the value could change marginally over time based on precision and other wallclock updates in the runtime environment. However, since these differences should be minor, it's likely fine to treat this value as static for the purposes of this document, and it can likely be implemented as such in an extended media element.)
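One way to sketch the two DASH cases (the `MpdLatencyInfo` shape and its field names are assumptions for illustration, not the spec's data model):

```typescript
// Hypothetical pre-parsed MPD values; field names are illustrative.
interface MpdLatencyInfo {
  suggestedPresentationDelay?: number; // MPD@suggestedPresentationDelay, seconds
  latencyTargetMs?: number;            // ServiceDescription -> Latency@target, ms
  clientDefaultDelay: number;          // implementation-chosen fallback, seconds
}

function presentationDelaySeconds(info: MpdLatencyInfo): number {
  // Low latency: Latency@target is expressed in milliseconds of wallclock latency.
  if (info.latencyTargetMs !== undefined) return info.latencyTargetMs / 1000;
  // Standard latency: use the OPTIONAL suggested delay when present, otherwise
  // fall back to the client's own implementation-defined choice.
  return info.suggestedPresentationDelay ?? info.clientDefaultDelay;
}
```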
## `liveWindowOffset`

### Definition

An offset or delta from the "live edge"/`seekable.end(0)`. An extended media element is playing "in the live window" iff: `mediaEl.currentTime > (mediaEl.seekable.end(0) - mediaEl.liveWindowOffset)`.
### Possible values

- `undefined` - Unimplemented
- `NaN` - "unknown" or "inapplicable" (e.g. for `streamType = "on-demand"`)
- `0 <= x <= Number.MAX_SAFE_INTEGER` - known, stable value for the current stream
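Putting the definition and the possible values together, the "in the live window" check might look like the following. This is a sketch only; in the proposal, the offset would live on the extended media element rather than being passed in as an argument.

```typescript
// Sketch of the predicate from the Definition above. In the actual proposal,
// liveWindowOffset would be a property of the extended media element.
function isInLiveWindow(
  currentTime: number,
  seekableEnd: number,
  liveWindowOffset: number | undefined
): boolean {
  // undefined => unimplemented; NaN => unknown/inapplicable (e.g. on-demand).
  // In both cases there is no live window to be "in".
  if (liveWindowOffset === undefined || Number.isNaN(liveWindowOffset)) return false;
  return currentTime > seekableEnd - liveWindowOffset;
}
```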
### Recommended computation for RFC8216bis12 (aka HLS)

- "Standard Latency" Live

  `liveWindowOffset = 3 * EXT-X-TARGETDURATION`

  Note that this is a cautious computation. In many stream + playback scenarios, `2 * EXT-X-TARGETDURATION` will likely be sufficient. However, with that less cautious value, there may be edge cases where standard playback will "hop in and out of the live edge," so we recommend the more cautious value here.
- Low Latency Live

  `liveWindowOffset = 2 * PART-TARGET`

  Unlike "standard" segments (`#EXTINF`s), parts' durations must be <= `#EXT-X-PART-INF:PART-TARGET` (without rounding). Also unlike "standard" playlists, low latency HLS servers must add each new Partial Segment to the playlist within 1 (instead of 1.5) Part Target Duration after adding the previous one. This means that, even under sub-optimal conditions, low latency HLS should end up with a much smaller `liveWindowOffset`.
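The two recommendations above reduce to a small computation. The playlist shape here is hypothetical; only the 3x / 2x multipliers come from this section.

```typescript
// Hypothetical parsed-playlist shape; only the multipliers come from the
// recommendations above.
function recommendedLiveWindowOffset(playlist: {
  targetDuration: number; // EXT-X-TARGETDURATION
  partTarget?: number;    // EXT-X-PART-INF:PART-TARGET, present for low latency
}): number {
  // Low latency playlists get the tighter bound, since parts' durations are
  // capped by PART-TARGET without rounding.
  return playlist.partTarget !== undefined
    ? 2 * playlist.partTarget
    : 3 * playlist.targetDuration;
}
```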
### Recommended computation for ISO/IEC 23009-1 (aka "MPEG-DASH")
TBD
# Open Questions
- What should we actually call the property?
  - In #4, we decided to call the numeric value representing a live "DVR" window the `targetLiveWindow`. Since this value represents a window for the "live edge" and not for "available live content to seek through/play", having both refer to the "live window" will likely be confusing. In the current related preliminary implementation in Media Chrome, we refer to the related attribute as `livethreshold`. Should that be the name here as well? Do we want the name to try to capture the fact that this is an "offset" value from the "live edge"/`seekable.end(0)`?
- Distinct event or repurposed event?
  - The above proposal makes no mention of a corresponding `livewindowoffsetchange` event. While we likely cannot rely on any of the built-in `HTMLMediaElement` events, we should be able to guarantee computation of the relevant values before dispatching the `streamtypechange` event, as documented in #3. Is this repurposing of the event acceptable? Should we consider a more generic event name that more clearly relates to states announced for stream type, DVR, live edge window offset, and potentially additional future properties/state?