tc39 / proposal-intl-locale

`Intl.Locale` specification [draft]

Home Page: https://tc39.github.io/proposal-intl-locale/

HTML 97.92% Shell 2.08%

proposal-intl-locale's Introduction

`Intl.Locale` API Specification [draft]

Overview

Motivation

The JavaScript Intl library (ECMA 402) has used strings to identify locales since the beginning. This works well for many simple cases, and is lightweight and user-friendly. ICU uses a Locale class instead. Defining a Locale class allows the following:

Parsing and manipulating the language, region and script of a locale
Reading or writing the Unicode extension tags in a locale
A serializable, standard format to store user locale preferences for use in Intl APIs rather than a combination of language and options bag.
In follow-on proposals, a Locale class can be used for an interface to get at various kinds of locale data, including likely subtags, first day of the week, various display names, etc.

Intl.Locale has a toString method which represents the complete contents of the locale. This method allows Locale instances to be provided as an argument to existing Intl constructors, serialized in JSON, or any other context where an exact string representation is useful.

Intl.Locale is proposed to be the class that HTML uses to expose the current locale to the Web. Currently, HTML supports only navigator.languages, but with navigator.locales, an Array of Intl.Locale instances, browsers may expose user preferences for calendar, numbering system, and more to Progressive Web applications.

Usage examples

The following example shows how to use Intl.Locale:

let loc = new Intl.Locale("pl-u-hc-h12", {
  calendar: 'gregory'
});
console.log(loc.language); // "pl"
console.log(loc.hourCycle); // "h12"
console.log(loc.calendar); // "gregory"
console.log(loc.toString()); // "pl-u-ca-gregory-hc-h12"
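As a small sketch of the serialization use case described in the motivation, the snippet below stores a locale preference as JSON and later hands the restored locale to an existing Intl constructor. The variable names are illustrative only, and it assumes a runtime that ships Intl.Locale.

```javascript
// Build a locale from a tag plus an options bag, as in the example above.
const prefLocale = new Intl.Locale("pl-u-hc-h12", { calendar: "gregory" });

// Store the preference as a plain string. The Locale object has no own
// enumerable properties, so call toString() explicitly before serializing.
const stored = JSON.stringify({ locale: prefLocale.toString() });

// Restore it later and pass it to an existing Intl constructor.
const restored = new Intl.Locale(JSON.parse(stored).locale);
const formatter = new Intl.DateTimeFormat(restored, { hour: "numeric" });

console.log(stored); // {"locale":"pl-u-ca-gregory-hc-h12"}
console.log(restored.calendar, restored.hourCycle);
console.log(formatter.format(new Date(0)));
```

The formatted output depends on the locale data and time zone available in the runtime, so it is not shown here.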

Implementation Status

Stage 4

Implementation Progress

Backpointers

tc39/ecma402#106

Authors

Zibi Braniecki (@zbraniecki)
Daniel Ehrenberg (@littledan)

Reviewers

TBD

Proposal

Spec

You can view the spec as text or rendered as HTML.

Prior Art

Development

Render Spec

npm install
npm run build
open index.html

proposal-intl-locale's People

Contributors

littledan anba ms2ger gsathya neotim dalavancloud romulocintra mathiasbynens jswalden bocoup isabella232 badges-bot

proposal-intl-locale's Issues

Resolved value for Unicode extension values when "true" is removed?

new Intl.Locale("en-u-kf-true").toString() is "en-u-kf", because of step 9 in https://tc39.github.io/proposal-intl-locale/#sec-apply-unicode-extension-to-tag:

Let newExtension be the canonicalized Unicode BCP 47 U Extension based on attributes and keywords as defined in UTS 35 section 3.6.

But it looks like new Intl.Locale("en-u-kf-true").caseFirst is allowed to be either "true" or "", depending on whether or not CanonicalizeLanguageTag removes "true". From https://tc39.github.io/ecma402/#sec-canonicalizelanguagetag

The specifications for extensions to BCP 47 language tags, such as RFC 6067, may include canonicalization rules for the extension subtag sequences they define that go beyond the canonicalization rules of RFC 5646 section 4.5. Implementations are allowed, but not required, to apply these additional rules.

If we interpret this section to also allow UTS 35 section 3.6, "" is allowed to be returned from caseFirst.

Applying Likely-Subtags algorithms may add non-IANA registered subtags

CLDR contains private use region entries for Kosovo (XK) and Outlying Oceania (QO) which are not official IANA assigned region subtags. That means for example when applying the "Add Likely Subtags" algorithm, the returned language tag may contain non-IANA registered subtags:

js> new Intl.Locale("aln").maximize().toString() 
"aln-Latn-XK"

Do we care about this case? Is it acceptable to return non-IANA registered subtags or do we need to filter them out?
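If filtering were wanted, one possible sketch is to test the region of the maximized locale against the private-use region ranges reserved by BCP 47 (AA, QM-QZ, XA-XZ, ZZ). The helper name isPrivateUseRegion is hypothetical and not part of the proposal, and the result of maximize() depends on the CLDR data shipped with the runtime.

```javascript
// Hypothetical helper: true for region codes in the private-use ranges
// AA, QM-QZ, XA-XZ and ZZ (which include XK and QO).
function isPrivateUseRegion(region) {
  return /^(AA|Q[M-Z]|X[A-Z]|ZZ)$/.test(region);
}

const maximized = new Intl.Locale("aln").maximize();
console.log(maximized.toString());

// Report whether the likely-subtags step produced a private-use region.
const region = maximized.region;
console.log(region !== undefined && isPrivateUseRegion(region));
```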

Using Unicode locale ID vs BCP 47 in our spec

@littledan this is a proposal we could work into our Locale spec, if we can get the group to agree on the change.

The current spec (and most of the constructors) expects a BCP 47 locale ID. A cleaner approach would be to use a Unicode locale ID; see here for the differences:

http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#BCP_47_Conformance

It does not allow for the full syntax of [BCP47]:

No irregular or BCP47 grandfathered tags are allowed
No extlang subtags are allowed
A tag must not start with the subtag "x". Thus a privateuse (eg x-abc) can only be after a language subtag like "und"

It allows for certain additions:

For field separator characters, the "_" character can be used as well as the "-" used in [BCP47].
"root" to indicate the generic locale used as the parent of all languages in the CLDR data model.
Certain codes that are private-use in BCP-47 and ISO are given semantics by LDML.
Each macrolanguage has an identified primary encompassed language. That encompassed language is treated as an alias for the macrolanguage, and thus is replaced when canonicalizing.
The language tag may begin with a script rather than a language (specialized use only).

There are multiple problems with bcp-47 tags, from slightly annoying grandfathered tags (source of most Locale bugs in v8), to script mapping.

For example:

Canonicalization of zh, zh-cmn - https://unicode-org.atlassian.net/browse/ICU-7656
ICU strict bcp-47 implementation (given migration to unicode language ids, team is hesitant to do this) - https://unicode-org.atlassian.net/browse/ICU-20167

Display name of a locale

It seems this proposal has no property for revealing the display name of a locale. Has this been purposely omitted?

Need to test for SameValue(r.[[kn]], "") in addition to SameValue(r.[[kn]], "true")

As currently specified we have:

js> new Intl.Locale("en-u-kn", {numeric:true}).numeric
true
js> new Intl.Locale("en-u-kn-true").numeric
false
js> new Intl.Locale("en-u-kn").numeric      
false

Intl.Locale, step 36 should be changed from:

If relevantExtensionKeys contains "kn", then
a. Set locale.[[Numeric]] to ! SameValue(r.[[kn]], "true").

To:

If relevantExtensionKeys contains "kn", then
a. If ! SameValue(r.[[kn]], "true") is true or ! SameValue(r.[[kn]], "") is true, then

Let numeric be true.

b. Else, let numeric be false.
c. Set locale.[[Numeric]] to numeric.

Or alternatively change step 25

If kn is not undefined, set kn to ! ToString(kn).

To:

If kn is not undefined, then
a. If kn is true, set kn to the empty string; else set kn to "false".

And step 36 to:

If relevantExtensionKeys contains "kn", then
a. Set locale.[[Numeric]] to ! SameValue(r.[[kn]], "").
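A quick way to compare the forms discussed in this issue is to print the numeric getter for each of them. This is only a probe, assuming a runtime that ships Intl.Locale; the tag-only results are printed rather than asserted because they are exactly what the issue is about.

```javascript
// Passing the option as a boolean.
const viaOptionTrue = new Intl.Locale("en", { numeric: true });
const viaOptionFalse = new Intl.Locale("en", { numeric: false });
console.log(viaOptionTrue.numeric, viaOptionTrue.toString());
console.log(viaOptionFalse.numeric, viaOptionFalse.toString());

// Tag-only forms: "kn" without a value, with "true", and with "false".
for (const tag of ["en-u-kn", "en-u-kn-true", "en-u-kn-false"]) {
  console.log(tag, new Intl.Locale(tag).numeric);
}
```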

Include motivation, explanation in README

A good explainer should be relatively self-contained, including an explanation of the problem attempted to be solved, some use cases, and how the API solved the problem. This information is likely to partly duplicate the bugs and polyfills linked, but putting the information in one place makes it easier for reviewers to evaluate.

Should _ be supported (equivalent to -)

The current spec parses on -, supporting en-US but not en_US. This matches RFC 5646, but not the extensions included by UTS 35. I'm pretty sure we should always normalize to -, but should we parse _ or throw some sort of error?
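Until that question is settled, callers who receive underscore-separated identifiers can normalize them before constructing a locale. The sketch below uses a hypothetical helper, localeFromLooseTag, and simply replaces every "_" with "-"; it does not implement any other UTS 35 extension.

```javascript
// Hypothetical helper: accept "_" as a separator by normalizing it to "-".
function localeFromLooseTag(tag) {
  return new Intl.Locale(tag.replace(/_/g, "-"));
}

// Probe how the constructor treats the raw underscore form.
let rawResult;
try {
  rawResult = new Intl.Locale("en_US").toString();
} catch (error) {
  rawResult = error.name;
}
console.log(rawResult);

console.log(localeFromLooseTag("en_US").toString()); // "en-US"
```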

Getting the default locale

This is more of a question. Since the tag parameter is required per #15, is the following expression the correct way to get the default locale into an Intl.Locale object?

new Intl.Locale(navigator.language);

If so, I would like to add that to MDN, because I feel this is a common use case.
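A sketch of that idea, written so it also runs outside a browser: the hypothetical helper defaultLocale below prefers navigator.language when it exists and otherwise falls back to the locale resolved by a default Intl.DateTimeFormat. The fallback is an assumption about what "default locale" should mean, not something the proposal defines.

```javascript
// Hypothetical helper: wrap the environment's default locale in an Intl.Locale.
function defaultLocale() {
  const tag =
    (typeof navigator !== "undefined" && navigator.language) ||
    new Intl.DateTimeFormat().resolvedOptions().locale;
  return new Intl.Locale(tag);
}

const current = defaultLocale();
console.log(current.toString(), current.language);
```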

Use internal slots of Locale object from Intl constructors

Suggestion from @jswalden: the observable effect is that, even if you override toString on a Locale object, it should still "work" in an Intl constructor. There's some benefit for optimization, but the bigger benefit may be for evolution--if Locale objects grow features that can't be serialized to a BCP47 string, then using internal slots can allow those features to be passed to Intl constructor.
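The observable effect can be probed as follows. The sketch assumes an implementation where CanonicalizeLocaleList reads the internal slot of a Locale object instead of calling its toString method.

```javascript
// Shadow toString on one instance with a misleading value.
const shadowed = new Intl.Locale("de");
shadowed.toString = () => "fr";

// If the internal slot is used, the overridden toString is ignored.
const canonical = Intl.getCanonicalLocales([shadowed]);
console.log(canonical); // [ 'de' ] when the internal slot is used
```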

Validation of 'calendar', 'collation', and 'numberingSystem' options

The 'calendar', 'collation', and 'numberingSystem' options need to be validated, otherwise we could end up with invalid language tags, for example new Intl.Locale("en", {numberingSystem: "!@-#asdf"}).toString() should not return en-u-nu-!@-#asdf.

Three possible choices:

Perform complete validation, similar to the existing validation for 'hourCycle' and 'caseFirst':

Let calendar be ? GetOption(options, "calendar", "string", undefined, undefined).
If calendar is not undefined, then
1. If calendar is not the name of a calendar type in Unicode Technical Standard 35, throw a RangeError.
Set opt.[[ca]] to calendar.

Only validate the input matches (3*8alphanum) *("-" (3*8alphanum)) (in RFC 5646's ABNF) resp. the type production per UTS35.

Let calendar be ? GetOption(options, "calendar", "string", undefined, undefined).
If calendar is not undefined, then
1. If calendar does not match the [(3*8alphanum) *("-" (3*8alphanum))] sequence, throw a RangeError exception.
Set opt.[[ca]] to calendar.

- Or change the (currently incorrect) assertion in ApplyUnicodeExtensionToTag:
~~1. Assert: ! IsStructurallyValidLanguageTag(locale) is true.~~
1. If ! IsStructurallyValidLanguageTag(locale) is false, throw a RangeError.
(See Edit 1 and Edit 2 below.)

Edit 1:
The third proposal (calling IsStructurallyValidLanguageTag in ApplyUnicodeExtensionToTag for validation) is probably not the right choice, because it may let new Intl.Locale("en", {numberingSystem: "latn-ca-gregory"}) slip through.
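The second option and the counterexample from Edit 1 can be sketched with a regular expression that mirrors the (3*8alphanum) *("-" (3*8alphanum)) sequence. The names UNICODE_TYPE and isWellFormedType are illustrative and not spec text.

```javascript
// One or more subtags of 3 to 8 alphanumerics, separated by "-".
const UNICODE_TYPE = /^[a-z0-9]{3,8}(-[a-z0-9]{3,8})*$/i;

// Hypothetical helper: structural check only, no lookup in UTS 35 data.
function isWellFormedType(value) {
  return UNICODE_TYPE.test(value);
}

console.log(isWellFormedType("latn"));            // true
console.log(isWellFormedType("islamic-civil"));   // true
console.log(isWellFormedType("!@-#asdf"));        // false
console.log(isWellFormedType("latn-ca-gregory")); // false, "ca" has only two characters
```

A structural check like this rejects "latn-ca-gregory" because the key "ca" is too short to be a type subtag, which is the case that validating the whole tag would let through.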

Edit 2:
Yup, just tested that this'll be a non-starter:

andre@VBdev:~/hg/mozilla-inbound/js/src/build-debug-opt-obj$ dist/bin/js                             
js> addIntlExtras(Intl)
js> new Intl.Locale("de",{numberingSystem:"latn-ca-gregory"}).toString()
"de-u-nu-latn-ca-gregory"
js> new Intl.Locale("de",{numberingSystem:"latn-ca-gregory"}).numberingSystem
"latn-ca-gregory"

Docs(MDN) Documentation for Intl.Locale

Create documentation for Intl.Locale

Review Readme documentation and examples
Create MDN Main Docs Page Link

MDN Pages :

prototype
constructor
methods
properties

Interactive Examples MDN :

Locale Generic Usage
Locale.prototype.maximize ()
Locale.prototype.minimize ()
Locale.prototype.toString ()

Browser compat-data :

Locale Generic Usage
Locale.prototype.maximize ()
Locale.prototype.minimize ()
Locale.prototype.toString ()

transfer ownership

@bterlson can you accept the transfer pls?

Should baseName have privateuse extensions?

I wasn't clear what the result of the last meeting was. The current draft spec leaves out privateuse extensions from baseName, but should they be included? They are included in the [[dataLocale]] in ResolvedOptions, so maybe they make sense here too. cc @zbraniecki @nciric @srl295

Move "get caseFirst" after "get calendar"

Similar to #53

Canonicalization between ApplyOptionsToTag and ApplyUnicodeExtensionToTag with grandfathered tags

Consider this example:

var loc = new Intl.Locale("en-gb-oed", {region: "US", calendar: "gregory"});
print(loc.toString()); // "en-GB-oxendict-u-ca-gregory"

If a grandfathered tag has a modern replacement, calling CanonicalizeLanguageTag will change the grandfathered tag into a normal langtag language tag. And that means options applied in ApplyOptionsToTag are ignored, but options applied in ApplyUnicodeExtensionToTag are used. Do we want to support this behaviour or do we rather want to ignore all options if the original input was a grandfathered language tag?

Include additional Unicode tags and transforms

If a user uses the Locale constructor with tags and transforms which are not recognized, they will be dropped. This behavior is not very future-compatible because if more tags are added later, the output of new Intl.Locale("en-u-xx").toString() will change. Modify the constructor to preserve un-interpreted tags and transforms in an internal slot, to be appended to the output of toString().

Return DefaultLocale() when calling Intl.Locale() with an absent/undefined tag argument?

Should we support Intl.Locale() as a way to retrieve the default language as per DefaultLocale()?

Proposal of adding languageNameOf and regionNameOf methods to Locale

The Intl project started mostly because collation requires large amounts of data. Language and region name translations also carry a steep size penalty for developers, so it would be nice to expose this data through an API for, say, language and region pickers or labeling maps.

var t = new Intl.Locale('sr')
t.languageNameOf('en')   // Енглески
t.regionNameOf('GB')     // Велика Британија

var x = new Intl.Locale('sr-Latn')
x.languageNameOf('en')   // Engleski
x.regionNameOf('GB')     // Velika Britanija

For languageNameOf we can accept either a string literal or an already created Locale object.

For regionNameOf we would accept only a string literal with a two-letter country code. One possibility, which would make implementation harder, is to also accept a Locale object and expand the region using maximize if it's missing.

Question: what do we return if a translation is not supported or available? We can return the default English names, return undefined, or throw.
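For comparison, similar lookups can be written against Intl.DisplayNames, a separate API that is not part of this proposal. The sketch assumes the runtime ships Intl.DisplayNames with Serbian locale data; the returned strings depend on that data, so no expected output is shown.

```javascript
// Language names in Serbian (default script).
const languageNames = new Intl.DisplayNames(["sr"], { type: "language" });

// Region names in Serbian written in Latin script.
const regionNames = new Intl.DisplayNames(["sr-Latn"], { type: "region" });

console.log(languageNames.of("en"));
console.log(regionNames.of("GB"));
```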

"numeric" option is processed as a boolean, but returned as a string

"numeric" option is processed as a boolean, but returned as a string, which leads to results like:

js> var loc = new Intl.Locale("de-u-kn-false"); 
js> loc.numeric
"false"
js> new Intl.Locale(loc, {numeric: loc.numeric}).numeric
"true"

Maybe save it internally as a boolean, too? Also see InitializeCollator, step 22, for a similar case for Intl.Collator:

If relevantExtensionKeys contains "kn", then
a. Set collator.[[Numeric]] to ! SameValue(r.[[kn]], "true").
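The round trip from the transcript above can be written as a check. The sketch assumes an implementation that stores the value as a boolean, in which case copying the numeric value into a new locale should not flip it.

```javascript
const original = new Intl.Locale("de-u-kn-false");

// Feed the value read from the getter back in through the options bag.
const copy = new Intl.Locale(original, { numeric: original.numeric });

console.log(typeof original.numeric, original.numeric);
console.log(typeof copy.numeric, copy.numeric);
```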

Allow for parsing/serializing of the locale code components

Currently, the proposal focuses on extension keys. It allows the user to parse a string, such as sr-Cyrl-RU-u-hc-h12, into an object for locale sr-Cyrl-RU with hourCycle=h12.

The missing part is that it doesn't help us handle the core part of the language tag - language, script, region, variant etc.

I think it would be a missed opportunity if we designed this core class around extensions and did not let it also handle the core operations.

My initial idea would be to handle something like this:

let locale = new Intl.Locale('sr-Cyrl-RU');
locale.language === 'sr';
locale.script === 'Cyrl';
locale.region === 'RU';

And this would of course work in reverse too:

let locale = new Intl.Locale('sr');
locale.region = 'RU';
locale.toString() === 'sr-RU';

I followed the convention for parsing/serializing from RFC4646 - https://tools.ietf.org/html/rfc4646

The reason I'm bringing this up early is that I'd like us to consider how it would affect our handling of extension keys.
In the proposal as of today, we change the codes (hc, nu, etc.) to names (hourCycle, numberingSystem) and apply them as properties on the main object.
But that could either cause collisions or confuse the user. Why does hourCycle end up being part of the Unicode extension keys, while region becomes part of the language tag?

So, and this is pure brainstorming, we should somehow differentiate, either by namespacing the language tag parts, or the extension keys. Systematically, it seems that extension keys are a better option:

let locale = new Intl.Locale('sr-Cyrl-RU-u-hc-h12-nu-arab');
locale.language === 'sr';
locale.script === 'Cyrl';
locale.region === 'RU';
locale.extensions.u.hourCycle === 'h12';
locale.extensions.u.numberingSystem === 'arab';

The issue with this, is that it makes us lose the value of being able to pass the locale object to Intl API constructors:

let locale = new Intl.Locale('en-US', {
  hourCycle: 'h12'
});
let dtf = new Intl.DateTimeFormat(locale);

let dtf = new Intl.DateTimeFormat('en-US', {
  hourCycle: 'h12'
});
let locale = new Intl.Locale('en-US', dtf.resolvedOptions());

I'm not sure how to resolve that.

Opinions, ideas? @caridy, @littledan, @rxaviers ?
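For reference, a flat layout can be exercised as below: language tag components and extension keys are read from the same object, and a component is changed by constructing a new locale with an options bag rather than by assignment. This is a sketch of one possible shape, assuming a runtime that ships Intl.Locale with these getters.

```javascript
const serbian = new Intl.Locale("sr-Cyrl-RU-u-hc-h12-nu-arab");

// Language tag components and extension keys side by side.
console.log(serbian.language, serbian.script, serbian.region); // sr Cyrl RU
console.log(serbian.hourCycle, serbian.numberingSystem);       // h12 arab

// Changing a component: build a new locale from the old one plus options.
const withRegion = new Intl.Locale("sr", { region: "RU" });
console.log(withRegion.toString()); // "sr-RU"
```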

In the constructor, should the options argument have a null check as well?

We have an undefined check (mostly because it's an optional arg), but should we add a null check as well? Otherwise, the ToObject in the following line throws.
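The difference between the two cases can be observed directly. The sketch assumes the constructor applies ToObject to any options value other than undefined, as described above.

```javascript
// undefined is treated as "no options".
const withUndefined = new Intl.Locale("en", undefined);
console.log(withUndefined.toString()); // "en"

// null reaches ToObject, which throws a TypeError.
let nullOptionsError = null;
try {
  new Intl.Locale("en", null);
} catch (error) {
  nullOptionsError = error;
}
console.log(nullOptionsError instanceof TypeError);
```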

Differences between RFC 6067 and UTR 35 Unicode extension canonicalization

When comparing https://tools.ietf.org/html/rfc6067#section-2.1.1 against https://www.unicode.org/reports/tr35/#u_Extension, UTR 35 contains the following two additional canonicalization steps:

All keys and types use the canonical form (from the name attribute; see Section 3.6.4 U Extension Data Files).
Type value "true" is removed.

The current proposal only implements canonicalization per RFC 6067. We should document the difference between RFC 6067 and UTR 35.

I'm not sure if we want to apply the canonicalization steps from UTR 35, because while removing "true" type values is easy, the requirement to replace deprecated keys and types requires more thought. For example when the time zone tz Unicode extension key is canonicalized, we may want to ensure the result is consistent with 6.4.2 CanonicalizeTimeZoneName.

"baseName" accessor should be moved before "calendar" accessor for alphabetical ordering

Noticed by @jswalden in https://bugzilla.mozilla.org/show_bug.cgi?id=1433303#c23

no validation of value of extensions in tag?

I am not sure whether my observation below is correct, so I need some help.
It seems to me that, with the current spec, we validate the values of options. For example, new Intl.Locale("en", {caseFirst: "true"}) or new Intl.Locale("en", {caseFirst: "mom"}) throws a RangeError, because we only pass « "upper", "lower", "false" » to GetOption in https://tc39.es/proposal-intl-locale/#sec-Intl.Locale, which does not contain "true" or "mom".

However, what will happen if we call
new Intl.Locale("en-u-kf-true") or new Intl.Locale("en-u-kf-mom")?
I cannot see anything in the spec that would cause us to throw a RangeError.
Is that true, or did I miss something?

What should the expected result of
(new Intl.Locale("en-u-kf-true")).caseFirst and
(new Intl.Locale("en-u-kf-mom")).caseFirst ?
@littledan @sffc @Ms2ger @zbraniecki @anba
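The four cases in this question can be probed side by side. The helper attempt is hypothetical; it only records whether a call succeeded and, if not, the name of the error. The tag-based results are printed rather than asserted, since they are the open question.

```javascript
// Hypothetical helper: run fn and report either its result or the error name.
function attempt(fn) {
  try {
    return { ok: true, value: fn() };
  } catch (error) {
    return { ok: false, value: error.name };
  }
}

// Values passed through the options bag are checked by GetOption.
const optionMom = attempt(() => new Intl.Locale("en", { caseFirst: "mom" }));
const optionTrue = attempt(() => new Intl.Locale("en", { caseFirst: "true" }));

// The same values embedded in the tag.
const tagMom = attempt(() => new Intl.Locale("en-u-kf-mom").caseFirst);
const tagTrue = attempt(() => new Intl.Locale("en-u-kf-true").caseFirst);

console.log(optionMom, optionTrue);
console.log(tagMom, tagTrue);
```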

Unnecessary exception check for IsStructurallyValidLanguageTag in InsertUnicodeExtension

Assert: ! IsStructurallyValidLanguageTag(locale) is true.

I don't think there's a need for the !.

Modified CanonicalizeLocaleList does not accept Intl.Locale instances when provided as the only argument

This is because step 3-4 still expects an object argument to be array-like. Is this intentional?
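The case in question looks like this from script. The sketch assumes an implementation in which a lone Intl.Locale argument is treated like a lone string, that is, wrapped in a one-element list.

```javascript
// A Locale instance passed directly, not inside an array.
const single = Intl.getCanonicalLocales(new Intl.Locale("en-us"));
console.log(single); // [ 'en-US' ] when a lone Locale is accepted
```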

What is the InitializedLocale slot used for?

I see that this proposal is re-using the convention from the ecma402 spec, so maybe this is a broader question, but I happen to be reviewing the implementation of Intl.Locale in V8, so I'm asking here.

Why do we check for the InitializedLocale slot and then proceed to use some other internal slot? Shouldn't we check for the slot that we want to use?

If we don't want to do that, can the spec be cleaned up to have all the brand checking in a common IsLocale() operation like in ecma262?

Incorrect double validation for language tag in ApplyOptionsToTag

In ApplyOptionsToTag,

4. If language is not undefined, then
       a. If language does not match the language production, throw a RangeError exception.
       b. If language matches the grandfathered production, throw a RangeError exception.

which means that language can not match a grandfathered production after this point.

Later in ApplyOptionsToTag,

 10. If language is not undefined,
        a. If tag matches the privateuse or grandfathered production,
            i. Set tag to language.
           ii. If tag matches the grandfathered production,
                1. Set tag to CanonicalizeLanguageTag(tag).

We've already checked to see if the language tag matches a grandfathered production in step 4. I don't see why we require another validation and canonicalization in steps 10. a. ii and 10. a. ii. 1

No intrinsic defined for %LocalePrototype%

The value of Intl.Locale.prototype is %LocalePrototype%.

InsertUnicodeExtension when called from ApplyUnicodeExtensionToTag cannot fail

ApplyUnicodeExtensionToTag, step 10.a

If newExtension is not the empty String, then
a. Let locale be ? InsertUnicodeExtension(locale, newExtension).

But InsertUnicodeExtension can only fail when locale is either a grandfathered or private-use only tag, see InsertUnicodeExtension, step 3.

If locale matches the privateuse or the grandfathered production, throw a RangeError exception.

And when called from ApplyUnicodeExtensionToTag, locale is never a grandfathered or private-use only tag, because of ApplyUnicodeExtensionToTag, step 2.

If tag matches the privateuse or the grandfathered production, then
...
d. Return result.

Does hourCycle exist unconditionally?

https://tc39.github.io/proposal-intl-locale/#sec-Intl.Locale.prototype.hourCycle does NOT say:

This property only exists if [...].

However, in the V8 implementation, e.g. new Intl.Locale("pl", { calendar: 'gregory' }).hourCycle === undefined.

Does hourCycle exist unconditionally? If so, we should fix the V8 implementation. If not, we should update the spec to add this line.
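The two readings can be told apart from script: an accessor that always exists on the prototype but returns undefined is different from a property that is absent. A minimal probe, assuming a runtime that ships Intl.Locale:

```javascript
const noHourCycle = new Intl.Locale("pl", { calendar: "gregory" });

// The accessor is defined on the prototype regardless of the tag.
console.log("hourCycle" in Intl.Locale.prototype); // true

// With no hc keyword and no hourCycle option, the getter yields undefined.
console.log(noHourCycle.hourCycle); // undefined
```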

@@toStringTag value

Intl.Locale seems to be the first ECMA-402 object which has a meaningful @@toStringTag :tada: . But that also means the decision to use a meaningful value should be coordinated with the rest of ECMA-402 (for example with the other stage 3 proposals: Intl.ListFormat and Intl.RelativeTimeFormat both use "Object" while they probably shouldn't; Intl.Segmenter doesn't define a @@toStringTag property at all).

Also see tc39/ecma402#176.
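The tag is observable through Object.prototype.toString, which is how the difference from the other Intl objects shows up in practice. A short check, assuming a runtime that ships Intl.Locale:

```javascript
const tagged = new Intl.Locale("en");

console.log(Intl.Locale.prototype[Symbol.toStringTag]); // "Intl.Locale"
console.log(Object.prototype.toString.call(tagged));    // "[object Intl.Locale]"
```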

Always retrieve 'language', 'script', and 'region' options

The 'language', 'script', and 'region' options should always be retrieved and validated, even when unused, for consistency with Unicode extension options and for consistency with other Intl constructors, e.g. compare to InitializeNumberFormat which gets and validates the currency and currencyDisplay options even when style is not "currency".

Add a way to remove subtags through options?

The Intl.Locale constructor currently only allows to add subtags to a language tag, but it's not possible to remove existing subtags. Does it make sense to add support to remove specific subtags?

var en = new Intl.Locale("en");
var enUS = new Intl.Locale(en, {region: "US"});
// no way to remove the region subtag when calling `new Intl.Locale(enUS, {...})`.
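Without such support, the only route is to rebuild the tag by hand. The helper withoutRegion below is a hypothetical workaround, not a proposed API: it keeps the language, the script and everything after baseName, and it drops variant subtags along with the region, which may or may not be what a caller wants.

```javascript
// Hypothetical helper: rebuild a locale without its region subtag.
function withoutRegion(loc) {
  const base = loc.script ? `${loc.language}-${loc.script}` : loc.language;
  // Everything after baseName (extensions, private use) is carried over.
  const rest = loc.toString().slice(loc.baseName.length);
  return new Intl.Locale(base + rest);
}

console.log(withoutRegion(new Intl.Locale("en-US")).toString());               // "en"
console.log(withoutRegion(new Intl.Locale("sr-Cyrl-RS-u-hc-h12")).toString()); // "sr-Cyrl-u-hc-h12"
```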

Error behaviour not defined for Add Likely Subtags algorithm

https://www.unicode.org/reports/tr35/#Likely_Subtags has under step 3:

If there is no match, either return

an error value, or

the match for "und" (in APIs where a valid language tag is required).

FWIW ICU's uloc_addLikelySubtags seems to perform neither of the two options from above, instead it returns the input language tag.

Dealing with key-values & options, and duplicates

I am working on v8 implementation of locale, and there are some questions about actual behavior.

If I do this:

let loc = new Intl.Locale("sr-Cyrl-u-hc-h12", {
  calendar: 'gregory'
});
console.log(loc.locale); // "sr-Cyrl"
console.log(loc.hourCycle); // "h12"
console.log(loc.calendar); // "gregory"
console.log(loc.toString()); // "sr-Cyrl-u-ca-gregory-hc-h12"

Each key-value from -u- section will be present as property on the object (hourCycle, calendar,...)
What do we do with -t- section? There is only value, no key (as in it-t-ja)
What happens with -x- key-values? Esp. if they match names in -u-?

What about this:

let loc = new Intl.Locale("sr-u-ca-buddhist", {
  calendar: 'gregory'
});
console.log(loc.locale); // "sr"
console.log(loc.calendar); // ?
console.log(loc.toString()); // "sr-u-ca-gregory-ca-buddhist" ?

ICU canonicalization does sort keys, but it doesn't remove duplicates
If specified in locale string, what do we do with dupes?
If specified in options, does the key-value overwrite the same key in the locale string (in this case, would we overwrite buddhist with gregory, or would we append)?
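The last question can be checked against an implementation. The sketch assumes the behavior where a value from the options bag replaces the same key in the tag rather than being appended; the duplicate-key and -t- cases are left out because their outcome is what this issue asks about.

```javascript
const overridden = new Intl.Locale("sr-u-ca-buddhist", { calendar: "gregory" });

console.log(overridden.calendar);   // "gregory" when options take precedence
console.log(overridden.toString()); // "sr-u-ca-gregory" when options take precedence
```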

Missing "be" in UnicodeExtensionComponents, step 6.c

Noticed by @jswalden in https://bugzilla.mozilla.org/show_bug.cgi?id=1433303#c23

Let subtag the String value equal to the substring [...].

Missing "be" between "subtag" and "the":

Let subtag be the String value equal to the substring [...].

Use `type` production from UTS35

In https://tc39.es/proposal-intl-locale/#sec-Intl.Locale, (3*8alphanum) *("-" (3*8alphanum)) can be replaced with type from 3.2 Unicode Locale Identifier.

Decide on behavior for Intl.Locale if the locale is unsupported

The current spec draft allows parsing any locale into an Intl.Locale object, even if there's no data in the system. An alternative is to only permit locales which have some data in the system, and throw an exception for other, unknown locales. One complication here is that the set of supported locales is not universal but modelled by Intl as individual per service (in the [[AvailableLocales]] internal slot). We could specify this by saying the set of supported locales must include at least the union of all of the services, for example.

I heard arguments from TC39 members in both directions. @bterlson argued it would be confusing to have a Locale object pointing to a non-existent locale. @kverrier was saying that it might be useful for his application, on the other hand. The current policy was decided based on discussions with @zbraniecki which seemed to point in the direction of supporting it.

One first step would be trying out other Intl libraries and seeing what they do. When I was looking at them before, I didn't see their documentation listing this point one way or another.

This was the main thing that the committee raised to be answered for Stage 2.

Simplified the change to 2.1 CanonicalizeLocaleList

I think with the current shape of the spec, the changes to step 7.c.iii and iv
"
If Type(kValue) is Object and kValue has an [[InitializedLocale]] internal slot, then
Let tag be kValue.[[Locale]].
Else,
"
are no longer needed and could be removed. If we keep the old steps without this change, the algorithm simply executes "Let tag be ? ToString(kValue).",
which eventually calls Intl.Locale.prototype.toString(),
and that still returns loc.[[Locale]].

The recent change at the top, on the other hand, is necessary, I think.

Throw an error if options cannot be applied instead of silently ignoring them?

Does it make sense to throw an error instead of silently ignoring options when they cannot be applied? For example new Intl.Locale("x-private", {language: "en"}).toString() does not produce "en-x-private" as some may expect, but instead returns "x-private".

Consider performing complete Unicode extension canonicalization per RFC6067

That means sorting all keys and attributes per https://tools.ietf.org/html/rfc6067#section-2.1.1.

Like, we're already halfway there (deduplication and sorting of known keys), so maybe we should also take the last step into full canonicalization per RFC6067?

And maybe also remove duplicate attributes, which are considered irrelevant per https://tools.ietf.org/html/rfc6067#section-2.1, so that we handle them consistently with duplicate keys:

Only the first occurrence of an attribute or key conveys meaning in a
language tag. When interpreting tags containing the Unicode locale
extension, duplicate attributes or keywords are ignored in the
following way: ignore any attribute that has already appeared in the
tag and ignore any keyword whose key has already occurred in the tag.
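The key-sorting part of that canonicalization is easy to observe. The sketch assumes an implementation that sorts keywords by key when serializing; handling of duplicate attributes is not exercised here.

```javascript
// Keywords given out of order: "nu" before "ca".
const unsorted = new Intl.Locale("en-u-nu-latn-ca-gregory");

// After canonicalization the keys appear in alphabetical order.
console.log(unsorted.toString()); // "en-u-ca-gregory-nu-latn"
```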

CanonicalizeLanguageTag should remove duplicate attributes/keywords in a Unicode extension, consistent with Intl.Locale

Canonicalization performed by CanonicalizeLanguageTag and that performed by Intl.Locale differ in two intended ways.

CanonicalizeLanguageTag doesn't remove duplicated attributes or keywords, e.g. "en-u-attr-attr" and "en-u-co-dict-co-phonebk" are both considered to be canonical. Intl.Locale does (and almost necessarily must, to integrate keywords in the input tag with keywords specified through the options bag).
CanonicalizeLanguageTag doesn't replace aliased subtags in Unicode locale extension sequences with their preferred forms, e.g. "en-u-ms-imperial" is canonical according to CanonicalizeLanguageTag, but Intl.Locale will transform it to "en-u-ms-uksystem". (This latter behavior doesn't exist in the current spec because of changes to TR35 upstream. See #77 for dealing with that change.)

On the call last week I had thought the latter TR35 upstream change was something we had accepted, and I didn't understand that the first problem still remained, so I was fine with this proposal moving forward. But the latter change was unintentional (#77 will deal with it), and the first problem is real. We need to fix both of these to move this proposal forward, IMO. :-(

I have a patch that augments this proposal with changes to the existing CanonicalizeLanguageTag algorithm such that duplicate attributes and keywords are removed. I am not sure that this is the most elegant way to implement deduplication. But it gets the job done, and of course implementations will choose whatever approach works best for them in reality. I'll create a PR once I've gotten this issue filed and have an issue number to refer to.

API for component validation/canonicalization

When operating on a Locale object we want to be able to operate on its components. For example:

let loc = new Intl.Locale("fr-CH-u-hc-h12");
loc.language === "fr";
loc.region === "CH";

but what if we want to modify one of the components:

let loc = new Intl.Locale("fr-CH-u-hc-h12");
loc.region = "ca";

When should we validate/canonicalize the component?

I'd suggest we do this on input, so that ca in the example above gets canonicalized to CA and if you provide an invalid input it throws:

let loc = new Intl.Locale("fr-CH-u-hc-h12");
loc.region = "2!"; // throws

Does it make sense to throw in the setter, or should we switch to get_region/set_region?
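Validation on input can be illustrated with the constructor's options bag in place of a setter. The sketch assumes a runtime that ships Intl.Locale with a region option; it shows canonicalization of a lowercase region and a RangeError for a malformed one.

```javascript
// A lowercase region supplied on input is canonicalized to upper case.
const swissToCanadian = new Intl.Locale("fr-CH-u-hc-h12", { region: "ca" });
console.log(swissToCanadian.region);     // "CA"
console.log(swissToCanadian.toString()); // "fr-CA-u-hc-h12"

// A malformed region is rejected at construction time.
let regionError = null;
try {
  new Intl.Locale("fr-CH-u-hc-h12", { region: "2!" });
} catch (error) {
  regionError = error;
}
console.log(regionError instanceof RangeError); // true
```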

Directly use unicode_language_id in "get baseName"

get Intl.Locale.prototype.baseName

Step 4 can be removed, because locale always matches unicode_locale_id:

If locale does not match the unicode_locale_id production, return locale.

Step 5 is unnecessarily complicated:

Return the substring of locale corresponding to the language ["-" script] ["-" region] *("-" variant) subsequence of the unicode_language_id grammar.

It should be changed to:

Return the substring of locale corresponding to the unicode_language_id production.
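In practice the getter returns the language, script, region and variant part of the tag and leaves out the extensions, as this short check shows (assuming a runtime that ships Intl.Locale):

```javascript
const swissFrench = new Intl.Locale("fr-CH-u-hc-h12");
const serbianCyrillic = new Intl.Locale("sr-Cyrl-RS-u-ca-gregory");

console.log(swissFrench.baseName);     // "fr-CH"
console.log(serbianCyrillic.baseName); // "sr-Cyrl-RS"
```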

Spec questions/errors

ApplyOptionsToTag

Step 3 should read:
If tag matches the langtag production and does not match grandfathered,
to cover the case of regular grandfathered language tags.
There are a few copy-paste errors in sub-steps of step 3 (wrong production names, wrong variable names).

FindExtension

Step 7 has the same error as the main 402 spec (cf. discussion at http://logs.libuv.org/tc39/2017-12-21#22:33:18.344)

Intl.Locale

I don't understand why ResolveLocale is used here. For example if you step through the relevant steps of ResolveLocale and BestFitMatcher, you'll realise that using ResolveLocale will always remove the script and region subtags, which doesn't really seem to be the expected result. 😉

Internal slots

all two- or three-character strings with code points in the range "a" through "z" doesn't cover all possible strings which can be generated from the language production.
Do we want to keep the same restrictions for "co-standard" and "co-search" which are applied for Intl.Collator?
Do we want to restrict the input for Intl.Locale if certain Unicode extensions values aren't supported in Intl.{Collator, NumberFormat, DateTimeFormat}?

Use Intl.Locale as the first step of instantiating formatters

In the April 2018 Intl call, the group decided to attempt to base Intl formatters on creating an Intl.Locale object. This would have a few observable changes to existing functionality, for example resolvedOptions().locale would be fully canonicalized and not leave out unsupported options. Given the diversity of behavior of existing implementations, we're hoping this change will be web compatible. Patch to come to make the details more clear.

Update references to match current UTS 35 spec

The current draft spec was written against UTS 35, version 34, but UTS 35 is now at version 35, which contained many changes for Unicode BCP 47 locale identifiers. I'd suggest checking the complete Intl.Locale spec to verify that it still matches what's currently in UTS 35.

For example:

Things like "Use the subtag matching unicode_language_subtag" are now ambiguous, because unicode_language_subtag is not only used in unicode_language_id, but also in the tlang production.
Step 8 in ApplyUnicodeExtensionToTag:

Let newExtension be the canonicalized Unicode BCP 47 U Extension based on attributes and keywords as defined in UTS #35 section 3.6.

The text refers to what was in http://www.unicode.org/reports/tr35/tr35-53/tr35.html#u_Extension, but that's now part of http://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers (cf. canonical syntax and canonical form in that section).

Not all RFC 5646 production names updated

There are still some references to RFC 5646 BCP-47 language tag production names, e.g. region instead of unicode_region_subtag in https://tc39.es/proposal-intl-locale/#sec-Intl.Locale.prototype.region. Also in the "script" and "baseName" getters, maybe more elsewhere.

I would like to expose other user settings. Can this proposal accommodate me?

As someone who uses the en-US locale, but sets their date format to YYYY-MM-DD, time format to HH:mm:ss, temperature unit to Celsius, distance unit to km, and thousands separator to a thin space, I am often poorly served when apps use the defaults for my locale, instead of reading from my OS settings.

One other environment I am familiar with, .NET, encompasses at least some of these settings in a CultureInfo, which they say is "called a locale for unmanaged code development". It seems to largely overlap with this proposal, but also have DateTimeFormat and NumberFormat to address my use cases. (It does not have units of measure; I guess that is separate.)

There isn't any description of the list of things exposed, except in the spec, which only lists their names. So it's hard to tell if e.g. numeric formatting info is exposed. But at least according to whatwg/html#3046 , date formats are not supported.

I realize this proposal may be working off some predefined set of things to expose, apparently going under the name "Unicode extension tags". I'm not clear if that means it will never support my use cases, or if there are Unicode extension tags for my use cases that aren't exposed yet, or if you eventually plan on going beyond Unicode extension tags...

If we don't plan to support at least retrieving the user's configured date and number formatting, through a combination of this and the navigator.locales proposal, then I am worried this proposal might not be a good idea, since we'll just need to work on another, very similar proposal for addressing those use cases later, and things would get confusing.