
Comments (7)

aphillips commented on June 13, 2024

There are really two choices that occur to me here. One is to use zxx. The other would be und (Undetermined). The und tag is usually imputed to content with no language tag and it is used in CLDR and locale systems (such as JS's Intl.Locale) to mean the "root" locale. This might be more like what you intend. Regardless of what primary language subtag you choose, you should not use invalid tags such as zxx-shape. You might use a private-use tag, though, such as zxx-x-shape or und-x-symbols.
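As a rough illustration of the private-use forms suggested above, the sketch below parses a tag with the built-in `Intl.Locale` and splits off anything after the `-x-` singleton. The helper name `describeTag` is hypothetical. Note the limitation: `Intl.Locale` only checks that a tag is well-formed and does not consult the IANA subtag registry, so it cannot by itself detect an unregistered variant such as the one in zxx-shape.

```typescript
// Hypothetical helper: split a BCP 47 tag into its primary language subtag
// and its private-use part (everything after the "-x-" singleton), if any.
// Intl.Locale throws a RangeError for tags that are not well-formed, but it
// does NOT check subtags against the IANA registry.
interface TagParts {
  language: string;          // primary language subtag, e.g. "zxx" or "und"
  canonical: string;         // tag as serialized back by Intl.Locale
  privateUse: string | null; // subtags after "-x-", or null when absent
}

function describeTag(tag: string): TagParts {
  const locale = new Intl.Locale(tag);
  const canonical = locale.toString();
  const marker = canonical.indexOf("-x-");
  return {
    language: locale.language,
    canonical,
    privateUse: marker === -1 ? null : canonical.slice(marker + 3),
  };
}

// Usage: the two private-use tags mentioned in the comment above.
const shapeTag = describeTag("zxx-x-shape");
const symbolTag = describeTag("und-x-symbols");
console.log(shapeTag.language, shapeTag.privateUse);
console.log(symbolTag.language, symbolTag.privateUse);
```

Keeping the private-use part separate lets a caller match on the primary subtag (zxx or und) first and treat the suffix as an application-specific hint.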

from handwriting-recognition.

wacky6 commented on June 13, 2024

I agree. zh-CN is not the technically correct way of unambiguously specifying a language, but it's arguably more commonly used on the Web (Accept-Language, navigator.languages).

So I'd keep it in the example. In the meantime, I've created a PR to include a script subtag in the example.

See #8 (in conjunction with #2)


r12a commented on June 13, 2024

I think the time when it was commonly used on the Web as a way to refer to Simplified Chinese was many years ago. Nowadays zh-Hans works fine pretty much everywhere. And by using zh-CN prominently in your example you only promote the incorrect usage. So I still think you should change it. I can refer this issue to the i18n WG if you like.

I think it's fine to mention zh-CN as something that a user may type in, but which should be interpreted to mean zh-Hans, which you do in #8. But I think it should be framed to look like the recogniser is correcting incorrect input (which it is, since the actual script/orthography is very important for handwriting). I see an implication in the quoted text above (esp. because it doesn't even mention zh-Hans) that zh-CN is an appropriate way of referring to Simplified Chinese (SC). It's really not. It's only appropriate if the language tag ignores script information and actually focuses on the region, which it may do, for example, when what's important is the spoken language (although that's problematic with respect to zh too, unless there's an implicit association of zh with cmn), or the locale (e.g. for location services, legal reasons, etc.)


wacky6 commented on June 13, 2024

I wouldn't say using "zh-CN" here is incorrect, given:

  1. The attribute is language, not script. Language is a broad term. "zh-CN" basically means "Chinese used in Mainland China".

    • In fact, Simplified Chinese, Traditional Chinese and the Latin alphabet are all used in Mainland China.
    • If we promote "zh-Hans", I'm not sure whether "Hans" would exclude Latin-alphabet characters from the recognizer.
    • We don't want the API to say "you need to unambiguously specify all the scripts". That's probably more confusing than just specifying the region.
  2. The script can be determined by using some established rules (e.g. Unicode likely subtags). "zh-CN" is interpreted as "zh-Hans-CN", though this precise interpretation may be undesirable (see the point above).

    • The recognizer may have to include more scripts (i.e. Latn + Hans / Hani).
  3. From an API ergonomics point of view, we don't want to give developers the impression that they need to (or should) convert "zh-CN" to "zh-Hans" in order to use the API correctly.

    • I don't know of a simple way to get the script for any language tag in the browser. My feeling is developers will use "zh-CN" (even if it's technically incorrect for a script), as long as it works (i.e. if the browser interprets it reasonably).
    • It's perfectly okay for a website to target users in a region and not worry about the exact script being used (and let the browser deal with it). The recognizer is free to (and should) find out the scripts appropriate for that region, and include all of them.
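The "likely subtags" rule from point 2 can be sketched with `Intl.Locale.prototype.maximize()`, which applies the Unicode "Add Likely Subtags" algorithm in runtimes that ship `Intl.Locale`. The helper name `likelyScript` is hypothetical, the result depends on the CLDR data bundled with the runtime, and the sketch only yields one most-likely script; whether a recognizer should also accept Latn alongside it is not addressed here.

```typescript
// Hypothetical helper: return the script subtag for a language tag.
// An explicit script in the tag wins; otherwise the script is inferred
// with the likely-subtags data available to Intl.Locale.
function likelyScript(tag: string): string | undefined {
  const locale = new Intl.Locale(tag);
  return locale.script ?? locale.maximize().script;
}

// Usage: region-only tags get an inferred script, explicit scripts are kept.
console.log(new Intl.Locale("zh-CN").maximize().toString()); // "zh-Hans-CN"
console.log(likelyScript("zh-CN"));      // "Hans"
console.log(likelyScript("zh-TW"));      // "Hant"
console.log(likelyScript("zh-Hant-CN")); // "Hant" (explicit script is kept)
```

This keeps the caller-facing tag unchanged ("zh-CN" stays "zh-CN") while giving an implementation one way to pick a default script.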


wacky6 commented on June 13, 2024

Hi @r12a, we have a question about language tags for non-standard "languages".

We have handwriting models for recognizing geometric shapes and/or user gestures (e.g. a square). What language tag could we use for this case?

I see there is a "zxx" primary tag for "No linguistic content; Not applicable". Is it suitable? For example, we could use "zxx-Shape" for the above recognizer. Or are private-use subtags more suitable?


r12a commented on June 13, 2024

I think it's best to avoid private subtags if at all possible, and zxx may indeed be what you need, but I refer this question to @aphillips, since he's a co-author of BCP 47.


wacky6 commented on June 13, 2024

Closing this issue.

zh-CN and zh-Hans convey different meanings: "zh-CN" means "Chinese as used in mainland China", while "zh-Hans" means "Simplified Chinese regardless of where it's used". Web applications should choose whichever is more suitable for their use cases.

We allow the browser implementation and the underlying recognizer to make reasonable assumptions about the script (considering that different handwriting recognizer implementations identify their models differently).


For shape / user gesture models, we will use a zxx private-use tag ("zxx-x-shape"), following this precedent: MLKit shape detection models.

