
canvas-formatted-text's Introduction

Welcome!

This is the home for the Formatted Text incubation effort. There are several explainers for this feature:

  1. This readme is a general introduction to the problem space and the motivation for this effort.
  2. The input data model for the API is described in data model.
  3. The output data model, or text metrics.
  4. Rendering of the output data model.

Introduction & Challenges

Imperative multi-line text lacking in canvas

Applications like word processors, spreadsheets, PDF viewers, etc., face a choice when moving their legacy presentation algorithms to the web. The view layer of these applications can be designed to output HTML, SVG or use Canvas. Canvas is an expedient choice for some view models since it can easily present data models where the view model and the data model are not tightly coupled. Additionally, the Canvas APIs map well to the imperative APIs the legacy algorithms used for rendering content.

In gaming and mixed reality (VR and AR) scenarios today, Canvas is the only logical choice for presenting a rapidly-changing view.

In these scenarios, it is often necessary to present text to the user. The 2D Canvas API currently provides a relatively simplistic text rendering capability: A single run of text can be measured and rendered as a single line, optionally compressed to fit within a certain width. If the developer needs to present more than just a few words (e.g., a paragraph of text) then things get very complicated very quickly. Some options might include trying a hybrid approach of overlaying HTML elements with the Canvas to leverage HTML's native text rendering and wrapping capabilities, or attempting to write your own line breaking logic in JavaScript using the primitive Canvas text measuring and rendering APIs as a starting point. Neither of these options are very desirable, and the HTML hybrid approach may not be available depending on the scenario (e.g., in VR headsets).
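
For reference, the full extent of today's 2D canvas text support is roughly the following (standard CanvasRenderingContext2D APIs):

const ctx = document.querySelector( "canvas" ).getContext( "2d" );
ctx.font = "18px sans-serif";
const metrics = ctx.measureText( "A single run of text" ); // metrics for one run on one line
ctx.fillText( "A single run of text", 10, 30, 200 );       // optional maxWidth condenses the run to fit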

Lack of unified metrics for multi-line text in HTML

In other scenarios, separately from Canvas, applications are interested in getting more detailed text metrics for multi-line text (e.g., from previously rendered text in the DOM). Precise text metrics from the source text can help inform placement of overlays, annotations, translations, etc. Unfortunately, text metrics that help map source text placement in a given layout are not currently provided by the web platform. As a result a variety of workarounds are employed to try to approximate this data today often at the cost of slower performance for users.

It is our aspiration to bring a unified handling of multi-line text metrics to the web platform that will service the imperative data model for canvas scenarios as well as address many other use cases for text metrics in HTML.

Challenges for JavaScript implementations

Why would a JavaScript implementation of line-breaking and wrapping be hard? While this may seem trivial at first glance, there are a number of challenges especially for applications that must be designed to handle rendering of text in multiple languages. Consider the complexity involved in the following requirements for a robust line breaking algorithm:

  • Identify break opportunities between words or Graphemes. Break opportunities are based primarily on the Unicode Spec but also use dictionaries for languages like Thai and French that dictate additional line breaking rules.
  • Identify grapheme clusters. Grapheme clusters are character combinations (such as diacritics and ligatures) that result in a single glyph and hence should not be broken up. E.g.: g (Latin small letter G, U+0067) + ◌̈ (combining diaeresis, U+0308) = g̈ (a single grapheme cluster).
  • Handle Bidi text. For proper Bidi rendering the bidi level context needs to be considered across lines.
  • Text Shaping and Kerning. These features can affect the measured pixel length of a line.

JavaScript libraries could perform line breaking, but as noted above, this is an arduous task. It gets more complicated still if text with different formatting characteristics (e.g., size, bold, italic) is needed.
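
Even the first step, identifying grapheme clusters, needs Unicode-aware support. The built-in Intl.Segmenter (available in recent browsers) can help with graphemes, as sketched below, but line break opportunities, bidi resolution, and shaping remain out of reach for script authors:

const seg = new Intl.Segmenter( "en", { granularity: "grapheme" } );
const clusters = [...seg.segment( "g\u0308a" )].map( s => s.segment );
// clusters is ["g̈", "a"]: the combining diaeresis stays attached to the "g"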

Goals

The browser already has a powerful line-breaking and text-shaping component used for regular HTML and SVG layout. Unfortunately, this component is tightly coupled to the browser's rendering pipeline and not exposed in an imperative way. Furthermore, HTML's DOM provides only a very limited set of metrics for text elements and almost no insight into the post-layout details of how the source text was formatted. SVG's DOM is slightly more helpful in that it provides some additional text metrics.

Our goal is to create an abstraction that allows authors to collect multi-line formatted text into a data model that is independent of the HTML/SVG data model but can still leverage the power of the browser's native line breaking and text shaping component. We think it is valuable to re-use formatting principles from CSS as much as possible. Given the data model, we also want to provide a way to initiate layout and rendering of that data model to a canvas, and likewise provide a way to initiate layout and then read-back the text metrics that describe how the text was formatted and laid out given prescribed layout constraints (without requiring the text to be rendered at all).
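
As a rough sketch of the kind of usage we have in mind (the names below follow the examples later in this document, e.g. FormattedText.format and drawFormattedText, and are not final API):

let formatted = FormattedText.format( "The quick brown fox jumps over the lazy dog.",
                                      "font: 18px system-ui",
                                      { width: 250 } );
canvasCtx.drawFormattedText( formatted, 20, 20 ); // render the wrapped lines to a canvas...
// ...or read back the resulting line/fragment metrics without rendering at all.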

Related Work

This proposal builds on a variety of existing and proposed specifications in the web platform, for whose efforts we are very grateful, and from whom we expect to get lots of feedback.

Implementations

Open issues and questions

Please review and comment on our existing open issues.

Alternatives Considered

Fixed Data Model Objects

A previous iteration of this proposal used WebIDL interfaces as containers for a group of text (FormattedText objects), which contained an array of text run objects (FormattedTextRun) with properties for style, etc.

The prior reason for having an explicit data model was to enable persistent text (e.g., similar to DOM's Text nodes) to be used for memory-resident updates. However, experience with DOM Text nodes helps us understand that in most cases, the Text node object itself is irrelevant. Instead, what is needed is the JavaScript string from the Text node in order to mutate, aggregate, and change the text. In the DOM, such string changes need to be presented via document layout (e.g., "update the rendering"), and the only way to do that is to re-add changed strings into DOM Text nodes in an attached document. For Canvas-based scenarios, rendering is not done through document layout, but with explicit drawing commands (which then paint when rendering is updated). Therefore, having a DOM Text node-like data model really did not add much value.

Furthermore, simplifying the data model was advantageous for performance-critical scenarios, which include the creation time of the data model in addition to layout and rendering optimizations.

Measure/render pattern for intermediate line objects

In a previous iteration of this proposal, we called for a "simple" and "advanced" model for rendering formatted text to the canvas. The advanced model allowed authors to place arbitrary lines by alternately measuring a part of the data model and then rendering the resulting "line metrics" object.

Note: a newer version of this measure/render design has been integrated into the latest iteration of this proposal, although it uses an iterable design pattern. Scenarios that potentially require flowing text through multiple containers (such as implementing custom line positioning for CSS regions / pagination / multicolumn) were the motivation for bringing this capability back despite some of the downsides noted below.

Under that design, we were assuming that authors would want to cache and re-use the line objects that were produced. If authors would not re-use these objects, then no optimization could be made for performance (since we were letting authors do the fragmentation themselves). For simple use cases where the canvas will be re-painted without changes, having authors cache and reuse line objects seemed a reasonable request. However, if authors decide to only re-paint the canvas when things change, then to recapture performance, authors are left to implement their own line invalidation logic or to just throw away all the lines and start from scratch--the worst-case scenario for performance.

We thought about addressing this concern by adding a "dirty" flag to lines that have been invalidated to help the author create an efficient invalidation scheme. But line analysis and caching logic has never been exposed to authors before in the web platform, and we didn't want to create a feature with this foot-gun. The advanced use case was primarily about enabling occlusions and supporting things like floats obstructing lines, and CSS already has standards for those scenarios--when we decided to embrace more CSS constructs for this feature, it was decided that the advanced use case could be dropped entirely.

Note: many of the advanced features possible with this iteration of the proposal require new CSS features that are only recently starting to be interoperably implemented (E.g., CSS Shapes).

Imperative model

The proposal here addresses two separate problems: one of styling ranges of text and having an object model, and another of auto-wrapping text.

An alternative design considered was to support auto wrapping without requiring that the developer provides all the text upfront. Similar to canvas path, the developer would call setTextWrapWidth( availableWidth ) and follow through with multiple calls to fillText on the canvas context that renders text and advances a cursor forward.

Such an imperative-style API was not pursued for two reasons. With bidi text, the entire width of a right-to-left segment needs to be determined before any of the fillText calls can be rendered. This issue could be addressed by adding a finalization step, say finalizeFillText(). Even so, such an imperative API adds a performance cost in terms of recreating the formatted text when the developer is simply trying to redraw the same text content for a changing available width.
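
For illustration, the rejected imperative style would have looked roughly like the following (setTextWrapWidth and finalizeFillText are the hypothetical names from the description above):

ctx.setTextWrapWidth( availableWidth );
ctx.fillText( "First sentence of a paragraph. ", 0, 0 );  // advances an internal cursor
ctx.fillText( "Second sentence of a paragraph.", 0, 0 );  // wraps automatically at availableWidth
ctx.finalizeFillText();                                    // finalization step so bidi segments can be resolved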

Privacy Considerations

HTML5 canvas is a browser fingerprinting vector (see canvas fingerprinting). Fingerprinting happens through APIs like getImageData, toDataURL, etc. that allow readback of rendered content, exposing machine-specific rendering artifacts. This proposal adds the ability to render multiple lines of text, and potential differences in text wrapping across browsers could contribute to additional fingerprinting. The existing implementer mitigations in some user agents that prompt users on canvas readback continue to work here.

We are currently evaluating whether this API would increase fingerprinting surface area and will update this section with our findings. We welcome any community feedback.

Contributors:

(In alphabetical order)

dlibby-, ijprest, sushraja-msft, travisleithead


canvas-formatted-text's Issues

Why is `format()` async?

It is unclear to me why it is necessary to make the format() function async. Can we get more information on the rationale?

@nhelfman

assigned style vs computed style

It may be good to differentiate between CSS properties assigned by the author, and whether those properties were applied in any way. For example, to support testing of what properties are/will be applied, and which are ignored by a given implementation.

Would ECMA402 Intl.Segmenter v2 proposal help/conflict/be redundant w/ this work?

I am proposing an enhancement of the Intl.Segmenter API (adding line break support) to TC39 for ECMA402 (see https://github.com/tc39-transfer/proposal-intl-segmenter-v2 ). The API exposes "line break opportunities" in text, implementing the Unicode® Standard Annex #14 UNICODE LINE BREAKING ALGORITHM (see https://www.unicode.org/reports/tr14/ ), so that developers can use it with:

  • multiple lines of text rendering in <CANVAS>
  • jsPDF and other contexts which need to lay out text into multiple lines
  • SVG multiple lines of text

During the stage advancement discussion, one delegate suggested that I reach out to Houdini to see which of the following is better:

  • Instead of adding that to ECMA402 as a low-level API, leave that part of the job (breaking text into multiple lines of text) to your API, empowering developers to use it on <CANVAS>, SVG, or for jsPDF, so there is no need to add it to ECMA402, OR
  • TC39 adds such a low-level API to ECMA402 to allow developers to use it with your APIs, OR
  • TC39 adds such a low-level API to ECMA402 so that your API can depend on Intl.Segmenter for the job of finding linguistic line break opportunities.

Please comment so we can move forward better in TC39 / ECMA402. Thanks

Add an example of using `shape-inside` to handle shape-based line wrapping

An example from the original explainer shows how lines could be made to wrap around a shape (box, logically floated left). This should be very possible with CSS Shapes Level 2 (shape-inside property). Though support is not great at the moment, an example of how this should work would be ideal to cover this common use case.

More efficient way to get total line height

Copied from: MicrosoftEdge/MSEdgeExplainers#370

Based on some early experimentation, developers have discovered that they almost always need to know the total line height (sum height of all lines), in order to calculate vertical placement of the formatted text on the canvas. The simple usage of fillFormattedText does not provide this information, and the advanced API requires iteratively measuring lines (which is slightly cumbersome and has potential performance overhead for all the other unneeded metrics calculations).

It would be nice to have a more convenient way of getting the total height needed for placement.

--

As previously noted in the explainer:

  • Should fillFormattedText (the single shot API) return the total height consumed after drawing lines?

This could be one way of getting the height, but ideally it can be obtained prior to drawing the text on the canvas.

Selection scenarios (render-only styling changes)

In brainstorming sessions around how an author would implement selection of text rendered out through a canvas using this feature, we determined that some additional functionality for rendering a selection is advantageous. This was also pointed out in #27 as a performance drawback:

More expensive text style animations: consider a piece of text with animating color. If the color is supplied through the data model, when the value of the color changes the developer would need to reconstruct the text with new color and re-lay it out. This is more expensive than rerendering previously built and laid out text.

The web platform has taken an interesting turn with CSS Custom Highlighting, in which the notion of arbitrary ranges can be used as markers upon which the web platform may draw a highlight (supporting a very restricted set of CSS properties for styling that don't require shaping or formatting) in which it is not necessary to adjust the DOM by adding spans, etc., to attach the styling. This direction is also important for the FormattedText objects, because (like DOM), there are important use cases for supporting highlighting of ranges or other effects that should not require a full re-format of the input data in order to draw.

But why not just leave selection/highlighting as an exercise to the author?

It seems trivially easy to implement selection in a web app by drawing a semi-transparent rectangle over the text and calling it good. However, native selection often handles a variety of edge cases that aren't usually considered and are much more tricky/involved to do with author-provided JavaScript. Specifically,

  1. Font color inversion: to improve color-contrast, many selections will not only paint a background behind the given text, but will also invert the font-color of the selected text. In order to effect a font-color change, in the FormattedText API as currently specified, the entire text would need to be reformatted with an object representing the new highlighted text surface with alternate font color. This is a heavyweight operation for a simple highlight effect. Complicating matters, breaking the source text up into a separate run for styling may also impact ligatures that were on the boundary of adjacent character input which could result in text formatting size changes causing text to "jump" slightly, or for different glyphs to be selected (non-ligatures).
  2. Overlapping glyphs and ligatures: selection of strongly italicized glyphs or ligatures will often apply a clipping rect in order to mark the selection of a partial ligature or the rectangular bounds of a glyph that extends past its regular advance. The selection thus only paints a part of the glyph and not the entire glyph. If text color is changed (see the previous point), then oddness between the selection rect and the font color clipping may result.

In order to make the above complexities easier, to support a variety of highlight effects (a la Custom Highlights) without needing to reformat already formatted text, and to improve the performance of selection scenarios over the text, we think it would be a good idea to provide a primitive to apply "override" selection styles onto a "range" of already formatted text.
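
Purely as an illustration of that direction (no such primitive exists yet; setRangeStyle below is an invented name for this sketch), the idea is a render-time override applied to a character range of already formatted text, similar in spirit to CSS Custom Highlights:

let formatted = FormattedText.format( sourceText, "font: 16px serif", { width: 400 } );
// Hypothetical: style characters 10..25 as "selected" without re-formatting the text.
formatted.setRangeStyle( 10, 25, "color: white; background-color: highlight" );
canvasCtx.drawFormattedText( formatted, 0, 0 );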

On Font preferences and handling missing external fonts

(This is internal Microsoft feedback from a review of the spec by resident expert @PeterCon. Re-posting here with permission.)


Looking at the set of CSS properties supported: you have the shorthand font property, but none of the basic properties (font-family, etc.). You do have font-feature-settings, which is good, but not font-variation-settings.

Speaking of basic font properties…

mike: the font id is useless if I don't know which font it was resolved to.

In the Web world, fonts have always been vague in the sense that the declarative content can indicate preferences, but only the UA actually knows what is displayed. I would suspect for apps that might want to use these APIs that there would be scenarios in which they need to know what font is actually selected in the context. (Ok, this is where my Web knowledge gets sketchy.) IIUC, Canvas works pretty much the same as HTML wrt handling of font styling, so a (not @font-face) font might be assumed to be present on a platform but turn out not to be present, and even an @font-face font might not be available when rendering occurred. Wouldn’t developers at times want to confirm what font is actually being used to compute the format or to render?

And what happens if the font isn’t available when format() is called but then is available milliseconds later when drawFormattedText() is called? Or is that guaranteed not to happen?

mike: might need something more than just font-family (good note)

… font-family will be a unique identifier…

Font family alone probably won’t be sufficient for all scenarios, given the indeterminacy allowed in the CSS Fonts font matching logic.

Handle n:n relationships between glyphs and character input

From CG meeting on 8/18

Glyph relationships with respect to the source text is a many-to-many relationship. Graphemes not covered. This is a lot of complexity (especially for editing scenarios and caret movement logic). Consider simplifying and focusing on rendering needs only (e.g., size and position) if possible.

In most N-to-N mappings, knowing why they are mapped that way is important (Arabic decomposition different from emoji)

what are the defaults for styles not explicitly specified in the data model?

The explainer documents don't explicitly state this anywhere, and probably should.

I suspect we want the default "font" properties (size, face, etc.) to come from the canvas' 2d context, or at least as much as possible.

What are the defaults for other CSS properties? Do we inherit values from the <canvas> tag? From the body/document? Other?

Thinking aloud: Taking values from any existing element might (?) pose invalidation issues.

Naming of `FormattedText`

(This is internal Microsoft feedback from a review of the spec by resident expert @PeterCon. Re-posting here with permission.)


There’s an oddness to the name of this interface in that these objects need to be created before text is formatted, and even after the format() method is called these objects don’t carry any info that reflects that formatting. It seems like StyledText might be more fitting.

DWrite does have the IDWriteTextFormat interface that is somewhat similar in relation to carrying styling info (but not the text itself). It includes several aspects of paragraph styling relevant to layout, as well as the font styling. But it doesn’t include the layout width. In DWrite, the text content and the layout dimensions are input parameters along with IDWriteTextFormat when an IDWriteTextLayout object is created. So, creating an IDWTL is akin to calling FormattedText.format() and getting back

Supplying fonts for WebGL use-case

When rendering to WebGL, fallback fonts provided by the browser are not available. The application must provide its own fonts. Let's say the app starts with a set of Uint8Arrays containing TTF, perhaps downloaded using fetch():

  • What is the sequence of steps that an app must perform to get to a FormattedTextFragment that's laid out using the provided font set?
  • What happens if no glyph is found in the font set?
  • How is the font chosen for a particular FormattedTextFragment communicated back to the application such that the application is able to render it?

FormattedText size and text decorations

FormattedText provides a width/height of the line, which has two potential meanings in the presence of text decorations like shadows, underlines etc.

In standard web contexts, text decorations do not contribute to text size for the purposes of computing overflow (scrollbars) or otherwise have any impact on how the text is laid out. This is an important principle: text decorations do not affect layout. This argues for not including decorations in the metrics, if only because it would make it possible to reproduce standard HTML line layout when rendered.

But when rendering lines in a canvas, maybe you want to know how much bigger the painted area is when shadows and decorations are applied. In particular, you may wish to prevent visual overlap. Such information could be provided; it is the ink overflow.

The obvious options are to define it as one or the other, or provide both, or make it a parameter. I lean toward defining it as the scrollable overflow size.

Vertical Writing Modes in Canvas Formatted Text

Copied (with discussion) from: MicrosoftEdge/MSEdgeExplainers#372

Text drawing needs to be aware of vertical writing mode to rotate glyphs in some fonts / languages while rendering a line.

The topic needs further investigation.

  • Should metrics be flipped automatically? Are new APIs needed?

(Note existing text APIs in Canvas do not handle this scenario either--a solution would ideally be comprehensive.)


@Malvoz :
/cc @bdon


@kojiishi :

Would it make sense for this to match what CSS renders, by making Canvas aware of writing-mode and text-orientation properties?


@travisleithead

I think finding a way to incorporate those properties into the input to this feature is likely the best way to achieve vertical writing mode.

Currently some values, e.g., "width", and "height" would need to be generalized to make sense in vertical writing modes. I like the idea of reusing some of the terminology in CSS Layout API to do this (as well as aligning on a few other concepts as well).

[IDEA] Hook for Houdini layout worklet callback

As the format operation will trigger layout of the text, it would be symmetric with HTML if calls to format could also trigger a layout worklet to run at the per-fragment level (synchronously, but off thread) during the API call, so that layout results can be modified on a per-line basis (as with HTML inline-level layout of the same) when the results are returned.

Make wrapWidth on measureFormattedText optional

Copied from: MicrosoftEdge/MSEdgeExplainers#371

In scenarios where text metrics are desired for formatted text, but text wrapping is not needed, the measureFormattedText API must be called with a wrapWidth second parameter of sufficiently large size so that the API doesn't attempt to wrap the text. This is because the scenarios for getting formatted text metrics and for calculating wrapping positions are tied together in this single API call.

Early experimentation suggests it may be useful to make the 2nd parameter to measureFormattedText optional and have it default (when not provided) to +Infinity, such that no line-breaking is attempted when computing line metrics.
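
A sketch of the suggested change (measureFormattedText and its parameters are taken from the description in this issue and are not final API; the object it hangs off of is left out here):

const wrapped   = measureFormattedText( formattedText, 300 ); // wrap lines at 300px
const unwrapped = measureFormattedText( formattedText );      // proposed default: +Infinity, i.e. no line breaking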

Apply styles via 'class' or stylesheet?

(This is internal Microsoft feedback from a review of the spec by resident expert @PeterCon. Re-posting here with permission.)


Might there be any scenarios for setting text styling from a stylesheet using class (or any other potentially relevant selectors)?

efficient raw data for Metrics output

The Metrics output describes how an entire paragraph is broken down into

  • FormattedTextParagraph
  • ... composed of a set of FormattedTextLines
  • ... each composed of a set of FormattedTextFragments

Motivation

For clients that want to draw the Fragments themselves, and/or want to perform hit-testing and editing/selection themselves, more explicit data is needed. Below is a proposal for the necessary data to perform this drawing and/or editing.

interface Typeface {
    // Number or opaque object: Whatever is needed for the client to know exactly
    // what font-resource (e.g. file, byte-array, etc.) is being used.
    // Without this, the glyph IDs would be meaningless.
    //
    // This interface is really an “instance” of the font-resource. It includes
    // any font-wide modifiers that the client (or the shaper) may have requested:
    //    e.g. variations, synthetic-bold, …
    //
    // Factories to create Typeface can be described elsewhere. The point here
    // is that such a unique identifier exists for each font-asset-instance,
    // and that they can be passed around (in/out of the browser), and compared
    // to each other.
};

interface FormattedTextFragment {
    // Information to know which font-resource (typeface) to use,
    // and at what transformation (size, etc.) to use it.
    //
    readonly attribute Typeface typeface;
    readonly attribute double size;
    readonly attribute double scaleX?;   // 1.0 if not specified
    readonly attribute double skewX?;    // 0.0 if not specified (could be a bool)

    // Information to know what positioned glyphs are in the run,
    // and what the corresponding text offsets are for those glyphs.
    // These “offsets” are not needed to correctly draw the glyphs, but are needed
    // during selections and editing, to know the mapping back to the original text.
    //
    readonly attribute sequence<unsigned short> glyphIDs;   // N glyphs
    readonly attribute sequence<float> positions;           // N+1 x,y pairs
    readonly attribute sequence<TextIndex> offsets;         // N+1 offsets
};

Descriptions

Each fragment includes a specific Typeface and scaling information. The Typeface is needed, so the client can refer to the exact font asset that was used by the Shaping process, since the positioned glyphs are only meaningful with that asset. The specifics of what a Typeface is are left intentionally vague (for now). The choice likely will need to be coordinated with how "requested" typefaces are specified in the Data Model. The key feature however, is that somehow a Typeface object is sufficient for the client to identify the font assets per fragment.

The glyph IDs and positions (x,y pairs) are straight-forward. This is sufficient to know precisely where to draw each glyph, and sufficient (given other utilities such as glyph-bounds, not provided by this API) to perform hit-testing.

Offsets are not required for drawing, but are needed for any selections or editing, as they provide a back-mapping for each glyph to the (first) unichar that generated that glyph. With this information, and proper grapheme knowledge (e.g. from ICU) a client can properly identify corresponding graphemes and glyphemes (logical clusters for purposes of selections).

Efficiency

The choice to use homogeneous arrays (for glyphIDs, positions, offsets) is a deliberate one. Especially for clients calling from WASM modules, having fast access to the underlying data is very important, so the hope is that JavaScript would use TypedArrays to return this data.
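
As a sketch of how a client might consume this data (names like line.fragments and myRenderer.drawGlyph are placeholders; only the fields from the interface above come from this proposal):

for ( const fragment of line.fragments ) {
  const { typeface, size, glyphIDs, positions } = fragment;
  for ( let i = 0; i < glyphIDs.length; i++ ) {
    const x = positions[ 2 * i ];       // x,y pairs, as described above
    const y = positions[ 2 * i + 1 ];
    myRenderer.drawGlyph( typeface, size, glyphIDs[ i ], x, y ); // app-provided drawing routine
  }
}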

Background details for this proposal.

[IDEA] Support inline-block 'spacers' in the data model

In the Houdini layout API, the metrics infra will already need to handle inline-block content in the flow of inline text. Thus a representation of this kind of content will be needed for alignment of the output of format with what is already possible in HTML.

With that, it makes sense to provide inline-block 'spacers' as input to format so that similar in-flow content can be laid out, and later found-and-replaced with actual content, once the metrics are queried. I wouldn't expect the spacers to render anything (they would just reserve space in the flow and participate in wrapping as a whole).

Note: fragmentation of the spacer would need to be considered.

Meet & Discuss - Formatted Text: current state and future?

Hi folks,

With the recent explainer updates to the Formatted Text proposal, I'd like to issue a general call to see if interested parties in the WICG might want to get together for a virtual meeting to discuss/review this API in its current shape and raise issues and thoughts?

Right now only the data model is fairly decently fleshed out; there's been some thought put into the rendering of the data model (primarily for Canvas), and almost nothing proposed for the text metrics part (which is arguably the most interesting bit). With that in mind, I still see value in hearing folks' thoughts on the direction this is headed and even brainstorming a bit into what comes next. For certain, there are many varied use cases and the recent integration of CSS into the data model (I believe) helps to support many of the text layout and rendering use cases that parties have. This seems like a great moment to stop and listen to the community.

Some folks that might be interested in joining:
@bfgeek @fserb @sushraja-msft @tantek @mysteryDate @hober @cynthia @atanassov @tabatkins @rocallahan @stpeter @gregwhitworth @litherum

I imagine sometime in June would be great--I'm available nearly every week Wed-Fri and can make most times work. I suspect perhaps 1 to 1.5 hours would be a time box.

Positions referencing text runs may lead to confusion

FormattedTextPosition - the source references a FormattedTextRun, which is a mutable object (or does it reference a read-only copy?). This means it might point to a run which is not the run used to format the paragraph. I think this might lead to confusion and potential bugs. Is it possible to consider a design which follows the immutability principle? I assume the reason it is done like that is to reduce potential GC load, but is it really such a concern compared to the risk?

@nhelfman

Next Meeting of Formatted Text

After our first successful meeting (#13), there is still lots to discuss and having another meeting on the calendar will be a good forcing function to make progress on the design of the metrics explainer.

Please mark your calendars for August 11, at 1500 UTC (that's 8am Pacific). Meeting page with details is here

@bfgeek @fserb @sushraja-msft @tantek @mysteryDate @hober @cynthia @atanassov @tabatkins @rocallahan @stpeter @gregwhitworth @litherum @yjbanov @nhelfman

Supporting Colour fonts? (Black & White and Colour glyphs?)

(This is internal Microsoft feedback from a review of the spec by resident expert @PeterCon. Re-posting here with permission.)


Rendering: something I don’t see mentioned at all is colour fonts. A given OpenType font can potentially support BW and colour glyphs, and could even support alternate formats for colour glyphs (COLR table, SVG table, …). The choice of colour vs. BW doesn’t affect layout or line metrics (though it might affect glyph bounding boxes); but obviously it affects rendering. Canvas 2D context doesn’t seem to handle this currently itself, so does that constrain you? I may be overlooking something already there, but it seems like a gap.

Bringing back line-at-a-time formatting

By popular demand, and from several internal customer requests (both at Microsoft and at Google), we should support line-at-a-time rendering (see also: #41 which could be addressed by this.)

In recent discussion, it was suggested that something like a

FormattedText.formatLine(...)

might be a good starting point.

Access to font MATH tables and curves for larger glyph variants

I wanted to document an additional use-case - math typesetting. This requires access to the MATH font table which includes, among other things:

  • A large number of math-typesetting constants. E.g. scriptPercentScaleDown denotes how much to scale down when changing from the base text size into a superscript or subscript.
  • A list of larger variants for a glyph. E.g. Σ is rendered larger with the same font size when used as summation.
  • Glyph assemblies for constructing arbitrarily tall or wide versions of a glyph (e.g. stretchy { or ).

Standard text rendering APIs are insufficient for rendering both of these because they consume strings (i.e. a sequence of code points), and:

  • rendering the string only renders the base glyph (smallest variant) and not any of the larger variants, and
  • some glyph assemblies include glyphs that have no corresponding code points.

In both of these cases, we need access to the actual paths that constitute the glyphs so that they may be rendered to the client target (SVG, Canvas, etc).

[IDEA] Support late-binding styles

As currently envisioned, the API operates on styles and text completely independently of any other context. As a common use case is to render the FormattedText to a canvas, and because the canvas context will have some drawing state already configured (in the form of the font, strokeStyle, etc.) it might be useful to "defer" some style values to whatever is configured on the current canvas. Such values would be specified as "placeholders" and then late-bound to actual canvas state at the time of rendering.

Possible scenarios:

  • animations of color/opacity without needing the formatted text content to be re-formatted with the different styles (display-only changes that don't really impact formatting).

Note there's a small subset of things that wouldn't impact layout of the content, and these would need to be clearly documented.

N:M mappings and other thoughts

(This is internal Microsoft feedback from a review of the spec by resident expert @PeterCon. Re-posting here with permission.)


Graphemes are character combinations… that result in a single glyph and hence should not be broken up.

It’s not true that graphemes always result in a single glyph. It’s very often the case they don’t, and for some scripts (e.g., Indic) it’s usually not the case.

I think the real significance of grapheme clusters, rather, is that they are minimal units for selection, and maybe also for editing. In particular, if you wanted to select a sub-string within a cluster, you would encounter problems with how to display the selection, and need to start supporting split carets.

(Instead of a single caret line, a split caret has two half carets: one to show what the current insertion point precedes, and another to show what it follows. Back in the 1990s, some apps using Apple’s QuickDraw GX APIs implemented split carets, but they have usability problems and didn’t survive.)

Moreover, sometimes multiple grapheme clusters might get displayed as a single glyph. That’s not uncommon for Arabic, for example. OpenType Layout tables support font data that can inform (with certain limitations) where a single ligature glyph can be divided into separate grapheme clusters—so that the app can know where within a ligature it can display a caret.

There was some related discussion in your recent wicg meeting:

Julia: need to consider early in the design process that one character can produce multiple glyphs…
Travis: trying to avoid M:N mapping between characters and glyphs…

You really can’t avoid the M:N mapping. In GDI, the ScriptShape function has an out param that returns a logical cluster array: for each character, what is the index for the corresponding glyph in the result glyph sequence. In DWrite, IDWriteTextAnalyzer::GetGlyphs similarly returns a char/glyph cluster mapping. IDWriteTextLayout is higher level; instead, it has methods for hit testing, such as HitTestTextPosition.

fserb: need to know the reason for the n-to-n mapping…
yjbanov_: Folks may want to render the accent marks with different fonts!

I’ve never heard that as a requirement, and it’s inherently problematic: one font doesn’t have information about how to position its glyphs relative to glyphs from another font.

fserb: there are languages in which the n-to-n mapping doesn't exist.

I’m not sure what was meant; and I’m not sure what it could possibly mean.

Language and direction metadata

I just read the three documents of this incubation experiment. Leveraging the power of the CSS layout engine sounds like a useful way to style text in Canvas.

I wonder if there is a way to associate language and direction metadata with FormattedText and/or FormattedTextRun? See string-meta for more information.

What should constraining the block-progression direction do?

The explainer's data model allows height and width to be specified. Depending on the writing-mode specified on the context object, either of these dimensions could become the block-progression direction. We know that we want the inline direction to wrap content. But what should happen when the limit specified in the block direction is reached?

  • Is there a "fit" model that choses when to overflow the box, already specified (or in draft) in CSS fragmentation? (Partial lines would be "pushed" to a following container/dropped alltogether?)
  • Should content be clipped exactly at that pixel boundary (relevant when rendering)
  • Should content just continue in the block-progression dimension inrrespective of the limit specified (as if it renders offscreen)--effectively ignoring the limit?

This was brought up in review of PR 39

How to handle leading/trailing whitespace?

Copied from: MicrosoftEdge/MSEdgeExplainers#373

How should white space collapsing behavior (like collapsing trailing or leading whitespaces around a line break) be handled? CSS uses white-space and related properties. How might these configuration options be applied to this API?


Current behavior appears to be:

  • Newlines are preserved
  • Spaces are preserved
  • Tabs are collapsed
  • End-of-line spaces are preserved
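
For illustration, if the data model accepts CSS as proposed, one might expect something along these lines to control collapsing (hypothetical; whether white-space is honored here is exactly the open question in this issue):

const metrics = FormattedText.format( "first line  \n  second line",
                                      "white-space: pre-wrap",
                                      { width: 200 } );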

Suggestion: Ship polyfill that can already be used

As far as I understand the following thing would be very tricky to implement with the existing canvas API:

  • have a very long text that one wants to render at a fixed width of 300px
  • if doing so in plain HTML, the browser automatically breaks the lines based on font information & CSS
  • now wanting to render the paragraph identically in canvas without manually specifying the line breaks would be pretty much impossible with existing APIs

All the platform currently exposes, if one were to parse the original paragraph, is the boundingClientRects for each line, but there is no way of telling which character/word belongs to which line.

The proposed API would pretty much solve the problem, yet will probably take quite a long time before it's adopted by all major browsers.

Perhaps, if possible, one could already ship a polyfill for the library that would allow its functionality to be used before it ships natively in browsers, especially with the focus on making line breaking behave identically to how the browser determines it.

Constraint-only changes

From comment raised by @yjbanov in PR #39:

Have the implementors evaluated the performance implications of this API for the case when the text contents remain the same while the constraints change rapidly? In this case the application would be passing the same array of text runs but pass a new width/height. If caching could provide performance benefits, then the text engine would also have to perform the following extra computation to produce a cache hit:

  • Make a defensive copy of the previously formatted text.
  • Fully traverse the new input and compare it to the previous snapshot.

The engine has to do these things because the text is expressed using plain mutable JavaScript objects (arrays, dictionaries). It's not possible to tell if something inside changed without the extra work.

A common use-case is resizing a window, which transitively resizes text inside. While all the strings and styles remain the same the value of width would change on every frame for multiple paragraphs of text in the UI. Some apps support resizing within the app, such as columns and rows of a spreadsheet.

And my comment in reply:

This is a very good use case to consider and to test for. My expectation at this early stage is that this case would be sub-optimal and that no caching/comparing of deltas between iterations of format are maintained.

These were all good reasons to have a retained data model supported by the platform--changes to the data model could trigger invalidation (or not) and make formatting faster. However, we're making the trade-off with this new approach to remove any retained platform data model and rely on JS strings, and as a consequence, this issue is now very much a potential concern.

There may be an opportunity for a new feature here: given an existing retained metrics object, allow it to be "re-calculated" based only on changing the constraints--no changes to style or text content. Since no new input is needed except for the new constraint it seems possible to make a fast adjustment to the existing formatted metrics and output (or in-place update) new metrics.

let width = 5;
let renderMe = FormattedText.format( "some content...", null, { width: width } );
function renderLoop() {
   if ( width < 100 ) {
      canvasCtx.drawFormattedText( renderMe, 50, 50 );
      width += 5;
      renderMe = renderMe.reflow( { width: width } ); // New API to take the existing rendered content and re-flow it given new constraints
      requestAnimationFrame( renderLoop );
   }
}
requestAnimationFrame( renderLoop );

Concerns about metrics lifecycle

I have some concerns about the metric lifetime recommendation. It feels non-intuitive to invalidate the format() return structure without the user requesting to invalidate it. Can we provide an invalidate() API so that authors can have control over the lifecycle? If this is a must, how about adding an API to check if the object is valid, so that the author can decide whether it can be used, instead of using try/catch?

@nhelfman

Consider stronger separation of concerns in the data model

The data model proposes to use CSS as the source of text properties for layout and styling. Some of the CSS properties only take effect later in the formatting and rendering pipeline. Specifying them too early has a number of ergonomic and performance disadvantages. The runtime semantics of CSS will also likely increase the complexity of implementation.

Ergonomic disadvantages

  • CSS is scoped to support features that are useful when rendering via HTML. For applications that render using non-HTML methods some of these features may be superfluous. This may create confusion when a developer specifies a CSS property but the renderer of their choice ignores it.
  • Conversely, CSS cannot express all possible text styling features that may be needed by non-HTML renderers, such as WebGL. This would require a second method of supplying renderer-specific properties. Having two ways to achieve the same thing would make the API unnecessarily more complex.
  • CSS has its own language (e.g. "color: yellow; font: 15pt Verdana") and types (e.g. StylePropertyMap, and the upcoming Typed OM), that will not match those used by some UI toolkits and will require conversion.
  • Web worker support is important because both WebGL and Canvas2D support them. It is unclear what CSS would mean on web workers, where document is not available and there's nothing for the styles to "cascade" through. This means that only a subset of CSS would be supported, which needs extra documentation and more brain cycles to understand.
  • CSS deals with coordinates and units of length. Not carrying it over to the Formatted Text API would simplify it and make it easier to reason about.

Performance disadvantages

  • More expensive text style animations: consider a piece of text with animating color. If the color is supplied through the data model, when the value of the color changes the developer would need to reconstruct the text with new color and re-lay it out. This is more expensive than rerendering previously built and laid out text.
  • CSS language needs to be parsed. If the app already has text styling data computed, converting it to CSS and then back would waste CPU cycles.
  • CSS style recalculation is unnecessary work in some use-cases (e.g. in Flutter where the renderer receives pre-computed style information).

Complexity of implementation

  • Parts of the CSS engine (language parser; style recalc engine) are required.
  • Extra complexity from having to bring the CSS engine into web workers.

Proposal

tl;dr Start with a pure JavaScript API emphasizing performance over convenience, with no dependency on CSS, DOM, or anything else that implies being on the main UI thread. Add CSS and/or DOM integration as a layer on top of the core JS API. This second layer is available in the main UI thread where document and the CSS engine are present, it adds developer conveniences, such as ability to apply styles via CSS selectors, and compatibility with existing HTML-based web frameworks.

Core layer: pure JavaScript API

This layer is sufficient for the following use-cases:

  • Flutter (and other cross-platform UI toolkits, such as Qt)
  • Browser-based games
  • Maps
  • WebGL-based UI (e.g. Google Earth)

A common theme in these use-cases is that none of them use CSS and HTML as the rendering technology for the UI. For cases that render into WebGL, the code may actually run in a web worker and be composited via an OffscreenCanvas.

Formatting text has two distinct parts: layout and rendering. The layout part is useful on its own, without rendering. For example, the output of layout can be used to hit-test the text, and to compute the mapping of a scrollbar position to the scroll offset in the content. Rendering does depend on layout, but multiple rendering backends are possible (DOM, canvas, WebGL, server-side), and some of them may want to support alternative layout systems, as well as temporally disconnect layout from rendering (e.g. across threads/workers, across events, shared cache). Let's separate the layout concern from the rendering concern.

The layout API does not have any opinion about rendering properties, whether HTML, Canvas2D, or WebGL. Its API only specifies properties relevant to text layout. The layout API should support annotating text runs with arbitrary data. This data is passed through the layout process and is assigned to the laid-out text fragments. A particular UI toolkit annotates text runs with data as it sees fit.

Example:

let text = new FormattedText();
let run = new FormattedTextRun("hello");

// Strictly typed properties relevant to layout
let style = new FormattedTextStyle();
style.fontSize = 12;
style.fontWeight = "bold";
style.letterSpacing = 4;
run.style = style;

// Attach object that's interesting to the rendering system to
// be used to paint this text, but has no effect on text layout.
// The app and/or UI toolkit decides what goes here. It's optional.
run.annotation = {
  color: new MyColor({ red: 0, green: 255, blue: 0, opacity: 0.5}),
};

text.textruns.push(run);

This is more ergonomic because of the simplicity of the API. It is clear about what's relevant to layout and what's not. There's no confusion about the subset of CSS features supported by this API (e.g. CSS matchers, "cascading properties", "property inheritance", "computed style").

This is more performant because it does not require a CSS engine or data conversion. Animation can be cheaply expressed by attaching the animation object instead of a specific color value. The text does not need to be rebuilt or relaid out. FormattedTextStyle contains the layout properties (and only layout properties) of a text. For extra efficiency, the same FormattedTextStyle object can be reused across multiple text runs, and across multiple instances of FormattedText.

The implementation is simpler (and smaller) as there is no need for a CSS engine, particularly in the web worker use-case. An additional benefit of not requiring a CSS engine is that this makes this API viable outside the browser, e.g. Node.js.

CSS layer: integration with HTML DOM on UI thread

This layer can be used by apps that use HTML and CSS for UI rendering. In this case, the input into the system is the CSS/HTML DOM created by the app (typically with the assistance of an HTML-based framework such as React, Vue, Angular, etc). In this mode it is safe to assume the presence of an HTML/CSS engine, as well as the document top-level variable.

When running on the main UI thread, and when building the UI on top of HTML and CSS, an additional API layer is available. This layer allows applying styles that cascade through the document. The API would need to support supplying styles using the CSS syntax (as described in the current proposal).

In addition, the API should allow reading back a FormattedTextParagraph from an HTML element containing text (triggering the necessary style recalc and page layout in the process, similar to how getBoundingClientRect does it).
