It's a bit strange for each segmentIterator to expose some but not all of the data ret

Well, none of the ECMA built-in iterators expose any state at all beyond a <code

One reason is that we need preceding and following methods, see <a class="issue-link j

I don't actually understand what's incomplete about <a class="issue-link

Cc <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Incomplete amount of exposed state on %SegmentIteratorPrototype% about proposal-intl-segmenter HOT 10 CLOSED

tc39 commented on June 22, 2024

Incomplete amount of exposed state on %SegmentIteratorPrototype%

from proposal-intl-segmenter.

Comments (10)

littledan commented on June 22, 2024

Could you say why (beyond that it's "strange")? I'd like to make that iterator return objects that are not invalidated by future next() calls.

from proposal-intl-segmenter.

gibson042 commented on June 22, 2024

Well, none of the ECMAScript built-in iterators expose any state at all beyond a next method returning ephemeral results, and that pattern should only be broken with good cause. There's a rough analog in the form of lastIndex on RegExp instances (which predates ES iterators), but even that is limited to next start position and includes no information about the last match.

What makes this API so special that it demands auto-memoization of next results, and only covering a subset of their data at that?

from proposal-intl-segmenter.

littledan commented on June 22, 2024

One reason is that we need preceding and following methods, see #9 . Another is performance concerns (I am having trouble finding that thread). Please, you can disagree, but don't assume that all of this is just in place by accident.

from proposal-intl-segmenter.

gibson042 commented on June 22, 2024

%SegmentIteratorPrototype%.breakType was added without explanation in 7f8b345 two years ago, and it's hard to debate unstated reasons. But I don't assume that it was accidental, and I'm sorry if i gave that impression... I'm just suggesting that behavior shouldn't diverge from analogous APIs like %StringIterator% without explicit justification.

I don't dispute that arbitrary-index preceding and following methods can be useful, although to be honest I'd have a hard time coming up with sufficient justification and would love to see a realistic application in the FAQ. But although even next requires internal position tracking, none of the three methods need or even benefit from exposing it on the iterator itself (as opposed to only in the iterator results), let alone taking the further step of memoizing breakType.

As for performance, I don't want to get into that guessing game but will observe that if bypassing object allocations were a strong concern (which I'd argue against anyway), then all the result properties should be mirrored as accessors on the iterator, rather than just two of them (in particular, segment itself is currently absent).

from proposal-intl-segmenter.

littledan commented on June 22, 2024

If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.

Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.

PRs welcome to improve the documentation to summarize the result of #9.

from proposal-intl-segmenter.

gibson042 commented on June 22, 2024

If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.

Updated to replace "incorrect" with the more accurate "incomplete". But I try not to phrase issues as questions because the resolution shouldn't be an answer, it should be either an update to the explainer or an update to the spec text—and it's impossible to determine which without an issue to capture discussion. I'm happy to adapt to whatever patterns you prefer, though... where would you like to see such questions?

PRs welcome to improve the documentation to summarize the result of #9.

You keep asking me to submit PRs explaining decisions, but I can't do that for decisions that didn't come with reasons. #9 requested preceding and following, but there is no example code showing how those methods pay for theirselves in realistic situations. And this issue isn't even about that, it's mostly about %SegmentIteratorPrototype%.breakType (which sprang into existence with no GitHub discussion at all).

Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.

The iterator object has internal slots for position and break type, and everything else is derived from those—the iteration result currently has explicit segment, breakType, and position data properties, and (the topic of this issue) the iterator itself has position and breakType getters. Both of those accessors seem to be convenience properties, but no allocations take place until they are invoked (which seems to be moot anyway, since CreateIterResultObject itself necessitates allocations).

I'll open some PRs to clarify what I'm talking about.

from proposal-intl-segmenter.

littledan commented on June 22, 2024

I'm not trying to put the burden on you to make PRs, though I'd really appreciate your help. If no one gets around to it, I hope to eventually come back and do it.

I don't actually understand what's incomplete about #9, or what kind of thing would make them "pay for themselves". What makes them expensive?

About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.

from proposal-intl-segmenter.

gibson042 commented on June 22, 2024

I don't actually understand what's incomplete about #9

Nothing. This issue is totally unrelated to #9. It is about duplicating information from iteration results on the iterator itself—specifically, breakType and position but not segment.

or what kind of thing would make them "pay for themselves". What makes them expensive?

They are expensive in terms of the cognitive burden and spec complexity of Intl segment iterators being different from ES string iterators and every other built-in iterator, none of which directly expose state.

About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.

That sounds like premature optimization, which someone once called "the root of all evil (or at least most of it) in programming". The benefit of avoiding object allocations comes at the cost of introducing significant internal inconsistency in the form of properties that have no analogues on otherwise similar iterators and—in the case of position—an entirely different meaning from the common and already-established convention of identifying an index after the last match.

Couldn't we at least try the simple conventional interface first? If this complexity is actually worthwhile, then perhaps it should be added across the board rather than limited to a single built-in iterator.

from proposal-intl-segmenter.

littledan commented on June 22, 2024

Cc @sebmarkbage who raised the performance issue IIRC

from proposal-intl-segmenter.

littledan commented on June 22, 2024

I removed segment, so this issue should be fixed.

from proposal-intl-segmenter.

Incomplete amount of exposed state on %SegmentIteratorPrototype% about proposal-intl-segmenter HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent