Comments (10)

littledan commented on September 24, 2024

Sure, PRs welcome!

from proposal-intl-segmenter.

gibson042 commented on September 24, 2024

I can't PR this one myself because "TODO: define possible values, not part of UTS" doesn't provide sufficient information for me to understand the behavior. That's part of why I want a more complete demonstration. 😉

littledan commented on September 24, 2024

Now, the values of breakType are written in the specification, so that TODO is out of date.

gibson042 commented on September 24, 2024

I saw that, but I am still not clear on how everything is intended to manifest or what exactly breakType conveys for all granularities. Is this correct?

let segmenter = new Intl.Segmenter("fr", {granularity: "word"});
console.log(...segmenter.segment("Ceci n'est pas une pipe"));
// logs the following to console:
// { segment: "Ceci", index: 0, breakType: "word" }
// { segment: " ", index: 4, breakType: "none" }
// { segment: "n'est", index: 5, breakType: "word" }
// { segment: " ", index: 10, breakType: "none" }
// { segment: "pas", index: 11, breakType: "word" }
// { segment: " ", index: 14, breakType: "none" }
// { segment: "une", index: 15, breakType: "word" }
// { segment: " ", index: 18, breakType: "none" }
// { segment: "pipe", index: 19, breakType: "word" }
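
For comparison, here is a minimal sketch against Intl.Segmenter as it was later standardized in ECMA-402. It is not the draft discussed in this thread: in the shipped API, word-granularity segment data carries a boolean isWordLike instead of a breakType string. The helper name describeSegments is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper (not part of the proposal): collects the fields that the
// standardized Intl.Segmenter reports for each segment at word granularity.
function describeSegments(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // segment() returns an iterable; each item has segment, index, input and,
  // for word granularity only, isWordLike.
  return Array.from(segmenter.segment(text), ({ segment, index, isWordLike }) => ({
    segment,
    index,
    isWordLike,
  }));
}

const parts = describeSegments("Ceci n'est pas une pipe", "fr");
for (const part of parts) {
  console.log(part);
}

// Keep only the segments the implementation considers word-like,
// which drops the whitespace between words.
const words = parts.filter((part) => part.isWordLike).map((part) => part.segment);
console.log(words);
```

Filtering on isWordLike covers the use case the question above was aiming at, which is telling words apart from the whitespace and punctuation between them.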

littledan commented on September 24, 2024

I thought it would be the other way around: breakType would indicate the type of break that was found and what caused it, but I am not positive. Btw, you can test this in Chrome Canary or Node nightly versions if you pass a flag to enable the feature.

gibson042 commented on September 24, 2024

I really don't care what any given early implementation does; I care about what the proposal intends. The current README is both incomplete and in disagreement with the spec text (which doesn't allow for breakType: "letter"), and I seem to have guessed wrong in my attempt to divine that intent. Could you please update whatever is necessary to resolve this confusion?

aphillips commented on September 24, 2024

Usually the break type of every segment will be the break type of the iterator itself. That is, iterators of breakType: word produce word breaks. In the example @gibson042 gives above, the spaces have word breaks (see UAX#29 here).

littledan commented on September 24, 2024

OK, if word breaks will always give "word", we should probably just use undefined as the breakType. The place where the breakType is really needed is soft vs. hard line breaks; I just included it in the others for consistency and because ICU does it.

littledan commented on September 24, 2024

The reason I suggested testing it in Chrome is that @FrankYFTang implemented this version with care and based on his subject matter expertise, so I believe he chose good details for these sorts of semantic edge cases.

FrankYFTang commented on September 24, 2024

hi @aphillips long time no see (10 years ?) :)
