Comments (12)
yeah sgtm
from proposal-intl-locale.
I'll wait for the stakeholders I CC'ed here, and file an issue against ecma402 based on this. Thank you!
To me this sounds like we can basically specify a parsing + emitting algorithm in the spec, with the ability to override specific options.
What's the benefit of it? We have the syntax specified via EBNF in Unicode LDML - https://unicode.org/reports/tr35/tr35.html#Unicode_language_identifier
What's the benefit of copying that into our spec instead of referencing? How would it make it any cleaner for you?
However, the spec is very vague about the parsing, and it only spells out the Unicode extension, not the others.
How does it spell out unicode extensions differently from other subtags?
Besides, in order to know if, e.g., a language matches the unicode_language_subtag production of a unicode_locale_id, we have to parse it anyway.
That's true, but I don't understand how it is an issue. Everywhere the spec specifies that a value has to match something, you need to parse it to know whether it matches, no?
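To make the "match the production" step concrete, here is a minimal sketch of checking a string against the unicode_language_subtag production from UTS 35 (two to three letters, or five to eight letters). The function name and regex are illustrative, not from the spec text:

```typescript
// Illustrative check: does a string match the UTS 35
// unicode_language_subtag production (alpha{2,3} | alpha{5,8})?
function isUnicodeLanguageSubtag(subtag: string): boolean {
  return /^([a-zA-Z]{2,3}|[a-zA-Z]{5,8})$/.test(subtag);
}
```

Even this simplest kind of "does it match" question already implies running (part of) a parser over the input.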
during reconstruction implementers seem to be expected to re-parse specific pieces.
What do you mean by this? Reconstruction of an Intl.Locale requires parsing of the input string, yes.
I don't mean copying the EBNF into the spec. What I mean is: during the constructor, we have to parse the locale, which results in some form of data structure, e.g.:
```typescript
interface PuExtension {
  type: 'x';
  value: string;
}

interface Keyword {
  key: string;
  value: string;
}

interface TransformedExtension {
  type: 't';
  fields: string[];
  lang?: UnicodeLanguageId;
}

interface UnicodeExtension {
  type: 'u';
  keywords: Keyword[];
  attributes?: string[];
}

interface UnicodeLanguageId {
  lang: string;
  script?: string;
  region?: string;
  variants?: string[];
}

interface UnicodeLocaleId {
  lang: UnicodeLanguageId;
  unicodeExtension: UnicodeExtension;
  transformedExtension: TransformedExtension;
  puExtension: PuExtension;
  otherExtensions: Record<string, string>;
}
```
The data structure also inherently encodes structural-integrity checks, like no duplicate singletons and such.
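For example, the "no duplicate singletons" rule mentioned above could be checked with something like the following sketch (the function name is illustrative; a real parser would enforce this while building the structure rather than rescanning the tag):

```typescript
// Illustrative structural-integrity check: a unicode_locale_id must not
// repeat an extension singleton (a single-character subtag like u, t, x).
function hasDuplicateSingleton(tag: string): boolean {
  const singletons = tag
    .toLowerCase()
    .split("-")
    .filter((s) => s.length === 1);
  return new Set(singletons).size !== singletons.length;
}
```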
At this point I know it's structurally valid, along with all the components in the locale. However, in the spec, most modification algorithms involve replacing a well-formed substring with another substring, which in my head means:
- Re-parse the old substring
- Validate the new substring
- Replace certain pieces in the old substring w/ the new substring
- Serialize the result
But if I already parsed the original input into a data structure, why do I have to re-parse to conform to the substring language of the spec? IMO it might be easier to define the internal slots with a data structure like tc39/proposal-unified-intl-numberformat#26 (comment), have any option that overrides replace that slot, and at the end specify a serialization algorithm (or reference one). So the flow in my head is something like:
Constructor -> parse the locale into Internal Slots -> applyOptions -> serialize
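The flow above can be sketched in miniature. This handles only the unicode_language_id part (extensions omitted) and all names are illustrative, not from the ECMA-402 spec:

```typescript
// Minimal sketch of: parse -> apply options over slots -> serialize.
// Only language/script/region/variants are handled; extensions are ignored.
interface UnicodeLanguageId {
  lang: string;
  script?: string;
  region?: string;
  variants: string[];
}

function parseLanguageId(tag: string): UnicodeLanguageId {
  const subtags = tag.split("-");
  const id: UnicodeLanguageId = { lang: subtags.shift() ?? "", variants: [] };
  // unicode_script_subtag: exactly four letters
  if (/^[a-zA-Z]{4}$/.test(subtags[0] ?? "")) id.script = subtags.shift();
  // unicode_region_subtag: two letters or three digits
  if (/^([a-zA-Z]{2}|\d{3})$/.test(subtags[0] ?? "")) id.region = subtags.shift();
  id.variants = subtags; // remaining subtags treated as variants here
  return id;
}

function applyOptions(
  id: UnicodeLanguageId,
  options: { language?: string; region?: string }
): void {
  // Each option simply overwrites one field of the data structure.
  if (options.language !== undefined) id.lang = options.language;
  if (options.region !== undefined) id.region = options.region;
}

function serialize(id: UnicodeLanguageId): string {
  return [id.lang, id.script, id.region, ...id.variants]
    .filter((s): s is string => s !== undefined)
    .join("-");
}
```

With this shape, no step after the initial parse ever touches the tag as a string until final serialization.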
Does that make sense?
I think I understand now what area your concern is around!
Re-parse the old substring
I don't understand what makes you see the replacements as requiring re-parsing of the substring.
Implementations are free to store an intermediate representation of the data for use in the algorithms. All implementers do this across their code, and usually not in internal slots as defined by the spec.
In my mind, the internal slots are mostly useful for defining the input data for the algorithms in an implementation-independent model (usually supplied by CLDR, in our case).
@anba, @littledan , @sffc, @jswalden - thoughts?
To be transparent - we're aiming to request advancement of this proposal to Stage 4 during the ongoing TC39 meeting.
I'd also appreciate the position of all stakeholders (esp. @longlho) on whether this issue should cause us to drop this advancement request from the agenda.
I don't wanna hold Stage 4 back, and since most ECMA-402 implementations use ICU, I think the end result of the API will be correct :) From a non-ICU implementer's perspective, though, this is fairly non-straightforward.
I understand that the intermediate representation is up to implementers, but based on the current language of the spec, the intermediate representations being passed around in abstract operations are all String-based (per the language of "replacing a substring with another substring"), so it's becoming an implicit requirement for implementers.
Take the language getter, for example. loc.[[Locale]] is a string, since we return "the substring of locale corresponding to the unicode_language_subtag production of the unicode_language_id". But given that we already parsed the input and applied options to it, this seems to implicitly mean: parse, apply options, store the result as a string, and then, when the getter gets triggered, re-parse/re-validate that string and return the correct unicode_language_subtag substring.
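In other words, the substring-based spec language seems to imply a getter shaped like the sketch below, which re-derives the language subtag from the stored string on every call. The helper name is illustrative, and the "first subtag" shortcut glosses over edge cases such as "und" and grandfathered-style tags:

```typescript
// What a string-based [[Locale]] slot seems to imply for the `language`
// getter: re-extract the unicode_language_subtag from the stored string.
function getLanguage(localeSlot: string): string {
  // Re-parse: the first subtag of a unicode_locale_id is the
  // unicode_language_subtag (ignoring edge cases for this sketch).
  return localeSlot.split("-")[0] ?? "";
}
```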
I think from a non-ICU implementer perspective this is fairly non-straightforward.
I did write a very early polyfill back in 2016, as well as a Rust implementation, and this has not been an area of concern for me when reading the spec. I wonder if that's because of my implicit assumptions and experience with ECMA-402?
But given we already parsed it and apply options to it already, it seems to implicitly mean that we parse, apply options, store it as string, then when the getter gets triggered, reparse/revalidate that, and then return the correct unicode_language_subtag substring.
I also assume that, but I'm not sure what the value is of including an exact structure for the intermediate data stored by implementations.
I think the value of it is turning the language getter into just return loc.[[Language]] (given that [[Language]] is an internal slot). Having an intermediate data structure, as I mentioned, can also correctly reflect structural integrity: no duplicate singletons, no duplicate variant subtags, and the like.
I did take a look at your early polyfill :) I'd say that if we just had constructor + toString, the intermediate representation would stay an impl detail, because all you need is parse -> <some data structure> -> serialize. With the getters it becomes parse -> <some data structure> -> get a field from the data structure, but the data structure is actually not specified.
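The alternative being argued for here can be sketched as follows: with a structured slot record, the getter is a plain field read and no string re-parsing is involved. The slot and function names are illustrative, not the spec's:

```typescript
// Sketch of structured internal slots: the `language` getter becomes a
// field read instead of a substring extraction from a stored tag string.
interface LocaleSlots {
  language: string;
  script?: string;
  region?: string;
}

function languageGetter(slots: LocaleSlots): string {
  return slots.language; // just read the slot; no re-parse needed
}
```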
Side question: does this spec effectively get rid of grandfathered locales?
Thanks for the Rust impl; I took a look as well. I think it's not fully following the ECMA spec (https://github.com/zbraniecki/unic-locale/blob/master/unic-locale-impl/src/lib.rs#L255), and it seems like there is an internal impl data structure.
I think, all in all, the spec can be implemented, but it'd be a lot clearer/easier with more structure to it.
Side question: does this spec effectively get rid of grandfathered locales?
I believe our switch to Unicode BCP47 Locale Identifiers did.
I think all in all, the spec can be implemented but it'd be a lot clearer/easier having more structure to it.
My position is that it's a tradeoff between a "clearer" spec and overspecification that attempts to describe what internal logic should do. At best, implementers will diverge without any observable impact; at worst, we'll have some observable impact from such internal fields being defined.
I'm open to making such a change as a separate PR against the ECMA-402 spec after merging this into the spec, if other stakeholders agree with you.
Is that an acceptable way forward for you?
I agree with @zbraniecki that this specification was designed to permit more straightforward internal representations that don't imply reparsing. I'm open to editorial PRs to make this change. I think these PRs should land in the ecma402 repo, not here, given that Intl.Locale has already been merged into the main spec.