Comments (10)
Sure, PRs welcome!
from proposal-intl-segmenter.
I can't PR this one myself because "TODO: define possible values, not part of UTS" doesn't provide sufficient information for me to understand the behavior. That's part of why I want a more complete demonstration. 😉
Now, the values of breakType are written in the specification, so that TODO is out of date.
I saw that, but I am still not clear on how everything is intended to manifest or what exactly breakType conveys for all granularities. Is this correct?
let segmenter = new Intl.Segmenter("fr", {granularity: "word"});
console.log(...segmenter.segment("Ceci n'est pas une pipe"));
// logs the following to console:
// { segment: "Ceci", index: 0, breakType: "word" }
// { segment: " ", index: 4, breakType: "none" }
// { segment: "n'est", index: 5, breakType: "word" }
// { segment: " ", index: 10, breakType: "none" }
// { segment: "pas", index: 11, breakType: "word" }
// { segment: " ", index: 14, breakType: "none" }
// { segment: "une", index: 15, breakType: "word" }
// { segment: " ", index: 18, breakType: "none" }
// { segment: "pipe", index: 19, breakType: "word" }
I thought it would be the other way around, with breakType indicating the type of break that was found and what caused it, but I am not positive. By the way, you can test this in Chrome Canary or Node nightly builds if you pass a flag to enable the feature.
I really don't care what any given early implementation does; I care about what the proposal intends. The current README is both incomplete and in disagreement with the spec text (which doesn't allow for breakType: "letter"), and I seem to have guessed wrong in my attempt to divine that intent. Could you please update whatever is necessary to resolve this confusion?
Usually the break type of every segment will be the break type of the iterator itself. That is, iterators of breakType: word produce word breaks. In the example @gibson042 gives above, the spaces have word breaks (see UAX#29 here).
OK, if word breaks will always give "word", we should probably just use undefined as the breakType. The place where the breakType is really needed is soft vs hard line breaks; I just included it in the others out of consistency and because ICU does it.
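For comparison, here is a minimal sketch against the Intl.Segmenter API as it later shipped in engines. That is an assumption relative to this thread, which discusses the earlier breakType design: in the shipped API, word-granularity segment data objects carry a boolean isWordLike instead of breakType. The helper name describeSegments is hypothetical and used only for illustration.

```javascript
// Sketch only: uses the shipped Intl.Segmenter API (isWordLike),
// not the breakType field discussed in this thread.
// describeSegments is a hypothetical helper name.
function describeSegments(text, locale = "fr") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // segment() returns an iterable of segment data objects.
  return Array.from(segmenter.segment(text), ({ segment, index, isWordLike }) => ({
    segment,
    index,
    isWordLike,
  }));
}

const segments = describeSegments("Ceci n'est pas une pipe");
for (const s of segments) {
  console.log(s);
}
```

In this model, whitespace segments report isWordLike: false, which covers the word/non-word distinction that the example above tried to express with breakType.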
The reason I suggested testing it in Chrome is because @FrankYFTang implemented this version with care and based on his subject matter expertise, so I believe he chose good details for these sorts of semantic edge cases.
hi @aphillips long time no see (10 years ?) :)