During one of our design reviews, one of our colleagues questioned why we name this API as


Should "breakType" be renamed to "segmentType"? (proposal-intl-segmenter, CLOSED)

FrankYFTang commented on September 23, 2024

Should "breakType" be renamed to "segmentType"?

from proposal-intl-segmenter.

Comments (5)

littledan commented on September 23, 2024

Good idea. (Or, should we call it, Intl.Breaker???) Added to the October 2018 Intl meeting agenda. https://github.com/tc39/ecma402/blob/master/meetings/agenda-2018-10-18.md


gibson042 commented on September 23, 2024

UAX #29 employs the following vocabulary:

[significant] text element: a sentence, word, or user-perceived character
segmentation: the process of boundary determination
boundary: a transition point between two segments
segment: synonym for [significant] text element
break: synonym for boundary
grapheme cluster: an algorithmically-defined approximation of a user-perceived character

UAX #14 adds:

line break: a position in text where one line ends
[line] break opportunity: a position in text where a line is allowed to end
mandatory break: a character property that requires an immediately following line break
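To make the segment/boundary distinction above concrete, here is a minimal sketch. It assumes the `Intl.Segmenter` API as it later shipped in ECMAScript, not the draft interface discussed in this thread, and the helper name `boundariesOf` is hypothetical.

```javascript
// Hypothetical helper: list the segments of a text and the boundaries between them.
// Assumes the Intl.Segmenter API as shipped, not the draft discussed in this thread.
function boundariesOf(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segmentData = [...segmenter.segment(text)];
  // Every segment starts at a boundary; the end of the text is the final boundary.
  const boundaries = segmentData.map((s) => s.index);
  boundaries.push(text.length);
  return { segments: segmentData.map((s) => s.segment), boundaries };
}

const helloResult = boundariesOf("Hello world", "word");
console.log(helloResult.segments, helloResult.boundaries);
```

In this model a text with n segments has n + 1 boundaries, which is why a property such as a "type" attaches more naturally to a segment than to a boundary.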

Since this proposal is derived from those technical reports, it would be nice if the interface introduced by it hewed as closely to them as practical. ICU demonstrates that there is value in providing detail beyond the mere position of boundaries, but taking its interface (which targets low-level languages and has grown organically in specialized directions) as gospel seems like a mistake. And model accuracy is also important... boundaries don't have properties of their own, but their preceding segments do (and in combination rather than as partitioners, cf. getRuleStatusVec)—even mandatory vs. optional line break opportunities (or "hard" vs. "soft" in ICU vocabulary) are determined by whether or not the last character of the preceding segment is a terminator.

I'm not sure this proposal should include reflection of segment characteristics, but if it does then we should avoid the singular "type" altogether, in anticipation of future extensions describing segments by multiple dimensions (e.g., a word being foreign to the segmenter locale, having code points from multiple general categories, etc.). Do we want granularity-specific properties (e.g., mandatory: true or terminatingPunctuation: "!")? Or perhaps an array or set that is always present and contains granularity-specific values (e.g., segmentTags: ["word"])? But if you're worried about performance, it might be best to leave such determinations out of the implementation itself, or make them opt-in at iterator construction time.
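As a rough illustration of the "array of tags" idea, the sketch below computes tags on demand for a segment. Everything in it is hypothetical: `describeSegment`, the `segmentTags` property, and the tag names are illustrative assumptions and not part of any specification.

```javascript
// Hypothetical sketch: neither describeSegment nor segmentTags exists in any spec.
// Tags combine rather than partition, so one segment can carry several of them.
function describeSegment(segment) {
  const segmentTags = [];
  if (/\p{L}/u.test(segment)) segmentTags.push("letter");
  if (/\p{N}/u.test(segment)) segmentTags.push("number");
  if (/^\s+$/u.test(segment)) segmentTags.push("whitespace");
  if (/[.?!]$/u.test(segment)) segmentTags.push("terminator");
  return { segment, segmentTags };
}

console.log(describeSegment("B2B").segmentTags); // ["letter", "number"]
```

Because the caller invokes the helper only when it wants the tags, the classification cost is opt-in, in line with the performance concern raised above.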


gibson042 commented on September 23, 2024

The current text does a poor job of defining what breakType is. Possible values seem to describe segments rather than boundaries, and it is not specified to which boundary-adjacent segment they correspond. This is especially confusing for backwards iteration: what is the proper value of breakType after (new Intl.Segmenter("fr", {granularity: "word"})).segment("Ceci n'est pas une pipe").preceding(8)? There's also the issue of a missing definition for "numbers, letters, kana characters, ideographic characters, etc" and "sentence terminator ('.', '?', '!', etc.)".
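For comparison, the sketch below runs the same text through the `Intl.Segmenter` API as it later shipped in ECMAScript. That API has no `preceding` method; this example assumes its `containing(index)` lookup instead, which returns data describing a segment rather than a boundary. The exact segmentation depends on the engine's locale data, so no output is asserted here.

```javascript
// Assumes the shipped Intl.Segmenter API, which differs from the draft quoted above.
const frenchWords = new Intl.Segmenter("fr", { granularity: "word" });
const pipeSegments = frenchWords.segment("Ceci n'est pas une pipe");

// containing(8) returns data for the segment that covers code unit index 8.
const hit = pipeSegments.containing(8);

// hit.segment is the segment text, hit.index is where it starts,
// and hit.isWordLike is a boolean describing the segment itself.
console.log(hit.segment, hit.index, hit.isWordLike);
```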

I am in favor of removing breakType because it is easy for consumers to check the break-preceding code unit at index - 1 on their own, but if breakType or a renamed equivalent remains then it needs a better and more complete specification.
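The consumer-side check mentioned above could look roughly like the following sketch. The helper name `endsWithTerminator` and the terminator set are illustrative assumptions, not a list taken from the proposal.

```javascript
// Hypothetical helper; the terminator set is an illustrative assumption.
const SENTENCE_TERMINATORS = new Set([".", "?", "!"]);

// Reports whether the code unit just before a boundary is a sentence terminator.
function endsWithTerminator(text, boundaryIndex) {
  if (boundaryIndex <= 0 || boundaryIndex > text.length) return false;
  return SENTENCE_TERMINATORS.has(text[boundaryIndex - 1]);
}

console.log(endsWithTerminator("Stop! Go.", 5)); // true, text[4] is "!"
console.log(endsWithTerminator("Stop! Go.", 3)); // false, text[2] is "o"
```

One caveat: a sentence segment typically includes its trailing whitespace, so a caller working with real sentence boundaries may need to skip back over spaces before applying a check like this.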


gibson042 commented on September 23, 2024

The initial question of this issue was resolved by 242ce14.


littledan commented on September 23, 2024

Closing per #72

