Comments (10)
Could you say why (beyond that it's "strange")? I'd like to make that iterator return objects that are not invalidated by future next() calls.
from proposal-intl-segmenter.
Well, none of the ECMAScript built-in iterators expose any state at all beyond a next method returning ephemeral results, and that pattern should only be broken with good cause. There's a rough analog in the form of lastIndex on RegExp instances (which predates ES iterators), but even that is limited to the next start position and includes no information about the last match.
What makes this API so special that it demands auto-memoization of next results, and only covering a subset of their data at that?
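For readers following along, the contrast drawn above can be checked directly. This is only a small sketch using standard built-ins; the variable names are illustrative.

```javascript
// A built-in string iterator carries no observable state of its own:
// the only way to get data out of it is to call next().
const it = "ab"[Symbol.iterator]();
const ownKeys = Reflect.ownKeys(it); // no own properties at all
const protoKeys = Reflect.ownKeys(Object.getPrototypeOf(it)); // includes "next"

// Each call returns a fresh, ephemeral result object.
const first = it.next();  // value "a", done false
const second = it.next(); // value "b", done false

// The rough analog: a global RegExp records only where the next match starts.
const re = /\w+/g;
const match = re.exec("foo bar");
const nextStart = re.lastIndex; // index just past "foo"; nothing else about the match is kept

console.log(ownKeys.length, protoKeys.includes("next"), first.value, second.value, nextStart);
```

Note that lastIndex says nothing about what was matched; that information lives only in the result returned by exec.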
from proposal-intl-segmenter.
One reason is that we need preceding and following methods, see #9. Another is performance concerns (I am having trouble finding that thread). Please, you can disagree, but don't assume that all of this is just in place by accident.
from proposal-intl-segmenter.
%SegmentIteratorPrototype%.breakType was added without explanation in 7f8b345 two years ago, and it's hard to debate unstated reasons. But I don't assume that it was accidental, and I'm sorry if I gave that impression... I'm just suggesting that behavior shouldn't diverge from analogous APIs like %StringIterator% without explicit justification.
I don't dispute that arbitrary-index preceding and following methods can be useful, although to be honest I'd have a hard time coming up with sufficient justification and would love to see a realistic application in the FAQ. But although even next requires internal position tracking, none of the three methods need or even benefit from exposing it on the iterator itself (as opposed to only in the iterator results), let alone taking the further step of memoizing breakType.
As for performance, I don't want to get into that guessing game but will observe that if bypassing object allocations were a strong concern (which I'd argue against anyway), then all the result properties should be mirrored as accessors on the iterator, rather than just two of them (in particular, segment itself is currently absent).
from proposal-intl-segmenter.
If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.
Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.
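As a minimal sketch of that derivation: the helper name segmentBetween and the hard-coded break positions below are hypothetical, chosen only for illustration, and are not part of the proposal.

```javascript
// Hypothetical helper: recover a segment from the original string and the
// break positions on either side of it.
function segmentBetween(string, positionBefore, positionAfter) {
  return string.slice(positionBefore, positionAfter);
}

const text = "Hello world";
// Assumed break positions for this example; a real segmenter would compute them.
const breaks = [0, 5, 6, 11];

const segments = [];
for (let i = 1; i < breaks.length; i++) {
  segments.push(segmentBetween(text, breaks[i - 1], breaks[i]));
}

console.log(segments); // the three pieces: "Hello", " ", "world"
```

The substring is only created when the caller asks for it, which is the allocation being avoided.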
PRs welcome to improve the documentation to summarize the result of #9.
from proposal-intl-segmenter.
If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.
Updated to replace "incorrect" with the more accurate "incomplete". But I try not to phrase issues as questions because the resolution shouldn't be an answer, it should be either an update to the explainer or an update to the spec text—and it's impossible to determine which without an issue to capture discussion. I'm happy to adapt to whatever patterns you prefer, though... where would you like to see such questions?
PRs welcome to improve the documentation to summarize the result of #9.
You keep asking me to submit PRs explaining decisions, but I can't do that for decisions that didn't come with reasons. #9 requested preceding and following, but there is no example code showing how those methods pay for themselves in realistic situations. And this issue isn't even about that, it's mostly about %SegmentIteratorPrototype%.breakType (which sprang into existence with no GitHub discussion at all).
Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.
The iterator object has internal slots for position and break type, and everything else is derived from those. The iteration result currently has explicit segment, breakType, and position data properties, and (the topic of this issue) the iterator itself has position and breakType getters. Both of those accessors seem to be convenience properties, but no allocations take place until they are invoked (which seems to be moot anyway, since CreateIterResultObject itself necessitates allocations).
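To make the shape being described concrete, here is a toy model. It is a sketch under loud assumptions: the class name ToySegmentIterator is made up, it splits on whitespace rather than doing real Unicode segmentation, and the break type labels are placeholders. It is not the proposal's spec text.

```javascript
// Toy model only: whitespace-based splitting with placeholder break types.
class ToySegmentIterator {
  #string;
  #position = 0;          // stands in for the internal position slot
  #breakType = undefined; // stands in for the internal break type slot

  constructor(string) {
    this.#string = string;
  }

  // The accessors under discussion: they mirror state that is already
  // present on the most recent iteration result.
  get position() { return this.#position; }
  get breakType() { return this.#breakType; }

  next() {
    const s = this.#string;
    const start = this.#position;
    if (start >= s.length) return { value: undefined, done: true };

    const isSpace = (ch) => /\s/.test(ch);
    const startsWithSpace = isSpace(s[start]);
    let end = start + 1;
    while (end < s.length && isSpace(s[end]) === startsWithSpace) end++;

    this.#position = end;
    this.#breakType = startsWithSpace ? "none" : "letter"; // placeholder labels
    return {
      value: { segment: s.slice(start, end), breakType: this.#breakType, position: end },
      done: false,
    };
  }

  [Symbol.iterator]() { return this; }
}

const iter = new ToySegmentIterator("hi there");
const first = iter.next();
// The same two values are now readable from the iterator itself.
const mirrored = { position: iter.position, breakType: iter.breakType };
console.log(first.value, mirrored);
```

In this model the result object is allocated on every next() call regardless, so the getters save nothing for callers who use the iteration protocol.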
I'll open some PRs to clarify what I'm talking about.
from proposal-intl-segmenter.
I'm not trying to put the burden on you to make PRs, though I'd really appreciate your help. If no one gets around to it, I hope to eventually come back and do it.
I don't actually understand what's incomplete about #9, or what kind of thing would make them "pay for themselves". What makes them expensive?
About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.
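A rough sketch of the usage pattern this rationale seems to have in mind follows. Everything here is an assumption for illustration: the class ToyLineBreakIterator is made up, the break rules are a crude stand-in for real line breaking, and the semantics of following (move to the first break after the given index, return true once the end is reached) are assumed rather than taken from the spec text.

```javascript
// Toy model only: a break is allowed after a space ("soft") and is required
// after a newline or at the end of the text ("hard").
class ToyLineBreakIterator {
  #string;
  #position = 0;
  #breakType = undefined;

  constructor(string) {
    this.#string = string;
  }

  get position() { return this.#position; }
  get breakType() { return this.#breakType; }

  // Assumed semantics: advance to the first break strictly after `from`
  // (default: the current position) and report whether the end was reached.
  following(from = this.#position) {
    const s = this.#string;
    let i = Math.min(from + 1, s.length);
    while (i < s.length && s[i - 1] !== " " && s[i - 1] !== "\n") i++;
    this.#position = i;
    this.#breakType = i === s.length || s[i - 1] === "\n" ? "hard" : "soft";
    return i >= s.length;
  }
}

const lines = new ToyLineBreakIterator("ab cd\nef");
const found = [];
let done = false;
while (!done) {
  // No iteration result objects are created; state is read from the accessors.
  done = lines.following();
  found.push([lines.position, lines.breakType]);
}
console.log(found);
```

Whether avoiding the per-step result object is worth the extra surface area is exactly what this thread is debating.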
from proposal-intl-segmenter.
I don't actually understand what's incomplete about #9
Nothing. This issue is totally unrelated to #9. It is about duplicating information from iteration results on the iterator itself: specifically, breakType and position but not segment.
or what kind of thing would make them "pay for themselves". What makes them expensive?
They are expensive in terms of the cognitive burden and spec complexity of Intl segment iterators being different from ES string iterators and every other built-in iterator, none of which directly expose state.
About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.
That sounds like premature optimization, which someone once called "the root of all evil (or at least most of it) in programming". The benefit of avoiding object allocations comes at the cost of introducing significant internal inconsistency in the form of properties that have no analogues on otherwise similar iterators and, in the case of position, an entirely different meaning from the common and already-established convention of identifying an index after the last match.
Couldn't we at least try the simple conventional interface first? If this complexity is actually worthwhile, then perhaps it should be added across the board rather than limited to a single built-in iterator.
from proposal-intl-segmenter.
Cc @sebmarkbage who raised the performance issue IIRC
from proposal-intl-segmenter.
I removed segment, so this issue should be fixed.
from proposal-intl-segmenter.