There are two concepts that could be applied:
- Matching, which looks for the nearest match of the requested locale (called "lookup" in Section 3.4 of RFC4647).
- Fallbacks ("filtering" in the spec), in which a list of acceptable alternatives (as in the Accept-Language header) is used to filter and select a locale to use.
I think in the first instance the most useful implementation would be matching (aka lookup). Fallbacks (filtering) could be considered in a second round.
Matching requires defining a language tag syntax
The current Gettext locale name has no formal syntax (matching is just a simple string comparison). Performing a locale lookup will require a language tag definition. I suggest a limited subset of the RFC5646 language tag.
The basic structure of a language tag is: language-extlang-script-region-variant-extension-privateuse. I suggest the following limitations be defined to simplify the matching requirements, simplify backwards compatibility and simplify the implementation.
- Extensions and private use subtags should not be supported.
- Variants should not be supported.
- Extlang should not be supported.
- Only define the use of the following subtags: language, script, region
Proposed language tag syntax
Based upon the proposal above, the ABNF of the supported language tag would be:
    langtag  = language
               ["-" script]
               ["-" region]
    language = 2*3ALPHA   ; shortest ISO 639 code
             / 4ALPHA     ; or reserved for future use
             / 5*8ALPHA   ; or registered language subtag
    script   = 4ALPHA     ; ISO 15924 code
    region   = 2ALPHA     ; ISO 3166-1 code
             / 3DIGIT     ; UN M.49 code
Since validation of language tags is not in scope of Gettext, the format of language, script and region could be relaxed to be simply 1*ALPHA.
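As a rough illustration of the restricted syntax above, here is a minimal sketch of a parser for it. The module name LangTagSketch and its parse/1 function are hypothetical and not part of Gettext; the sketch assumes "-" as the only separator and follows the strict ABNF rather than the relaxed 1*ALPHA form.

```elixir
defmodule LangTagSketch do
  # Hypothetical helper, not part of Gettext.
  # Accepts only language[-script][-region] as in the ABNF above.
  defp pattern do
    ~r/\A(?<language>[A-Za-z]{2,8})(?:-(?<script>[A-Za-z]{4}))?(?:-(?<region>[A-Za-z]{2}|[0-9]{3}))?\z/
  end

  @doc "Returns {:ok, map} for a supported tag, :error otherwise."
  def parse(tag) when is_binary(tag) do
    case Regex.named_captures(pattern(), tag) do
      nil ->
        :error

      %{"language" => language, "script" => script, "region" => region} ->
        {:ok, %{language: language, script: blank_to_nil(script), region: blank_to_nil(region)}}
    end
  end

  # Optional groups that did not take part in the match come back as "".
  defp blank_to_nil(""), do: nil
  defp blank_to_nil(value), do: value
end
```

Tags that use extlang, variant, extension or private-use subtags (for example de-CH-x-informal) are rejected by this sketch, which matches the limitations listed above.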
Lookup process
- Perform a string comparison (current process). If the locale exists, use it.
- Downcase the requested locale and the match candidates (although there are conventions, language tags are defined to be case insensitive)
- Split the language tag on ["-", "_"]. Could be limited to "_" if required. BCP47 specifies "-" but POSIX (Gettext) uses "_". I propose accepting either, but that might be considered a breaking change.
- Look for a match in the following order:
- language-script-territory
- language-territory
- language-script
- language
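The lookup steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: LocaleLookupSketch and lookup/2 are hypothetical names, the list of known locales is passed in explicitly rather than read from a Gettext backend, and subtags are classified by length alone (4 characters is treated as a script, 2 or 3 as a territory).

```elixir
defmodule LocaleLookupSketch do
  # Hypothetical helper, not part of Gettext.

  @doc "Returns {:ok, known_locale} in its original spelling, or :error."
  def lookup(requested, known) when is_binary(requested) and is_list(known) do
    if requested in known do
      # Step 1: plain string comparison (the current behaviour).
      {:ok, requested}
    else
      # Index known locales by a normalised key (downcased, "_" separator).
      index = Map.new(known, fn locale -> {normalise(locale), locale} end)

      requested
      |> candidates()
      |> Enum.find_value(:error, fn key ->
        case Map.fetch(index, key) do
          {:ok, locale} -> {:ok, locale}
          :error -> nil
        end
      end)
    end
  end

  defp normalise(locale) do
    locale |> String.downcase() |> String.replace("-", "_")
  end

  # Builds the candidate keys in the proposed order:
  # language-script-territory, language-territory, language-script, language.
  defp candidates(requested) do
    [language | rest] = requested |> String.downcase() |> String.split(["-", "_"])
    script = Enum.find(rest, &(String.length(&1) == 4))
    territory = Enum.find(rest, &(String.length(&1) in [2, 3]))

    [
      [language, script, territory],
      [language, territory],
      [language, script],
      [language]
    ]
    |> Enum.reject(&(nil in &1))
    |> Enum.map(&Enum.join(&1, "_"))
    |> Enum.uniq()
  end
end
```

With this sketch, a request for en-AU resolves to en when only en is known, while de-CH does not resolve to de-DE, because only less specific forms of the requested tag are tried.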
from gettext.
I think the job of breaking "de_CH@informal" into multiple locales should be decoupled from Gettext. What Gettext can help with is matching that against the locales it knows.
from gettext.
@josevalim I think we are set then.
- Get available locales: Gettext.known_locales/1
- Choose a locale: Gettext.put_locale/2
That should allow an external library to choose a locale from the list of known ones and then select it.
from gettext.
Some related standards for this issue:
I guess we're looking for a solution that applies a Filtering Matching Scheme to all available Gettext languages and then tries to look up messages according to the distance and priority of the match.
I think you are already familiar with that @kipcole9. Does that sound about correct?
from gettext.
Awesome @kipcole9 !
Perhaps this is a bit out of scope for Gettext, especially because it can be used outside of Gettext?
from gettext.
@kipcole9 Thanks for those details. I have a few follow-up questions:
In German we have formal / informal language. It is quite common to have a corresponding gettext translation for each. How would you represent that in a language tag?
Why would you not support extlang?
Does a lookup normally try to find the closest match even if none match strictly or do you compare strictly?
Example:
- Gettext Translations provided for: de-DE, en-US
- User language: de-CH
Would we identify the de-DE language in this scenario or would we request from the user that he provides a translation for “de” itself?
Currently, gettext calls languages “locales” internally. I believe this to be an incorrect term since there’s not necessarily a region involved which would make it a locale. Would you rename where possible to “language”?
Do you think BCP-47 is a good fit (including supporting underscores) even though it specifies ISO639 as the standard for the Language header?
https://www.gnu.org/software/gettext/manual/html_node/Header-Entry.html
from gettext.
@josevalim ex_cldr implements this proposal (more completely than the proposal) so it's available outside of Gettext now. The reason for including this capability in Gettext would be to more easily support cases where the user's locale is en-AU but the site only implements translations for en. Currently no translation would be found. There are 108 (!) en-* locales alone, 6 de, etc. Given that it's not uncommon to set the locale based upon what the browser sends, being able to match to an available Gettext locale is helpful, I believe.
This could of course be implemented in Gettext.handle_missing_translation/5, so one approach might be to provide a function that a user could delegate to in their own MyApp.Gettext.handle_missing_translation/5 that implements the proposal but otherwise doesn't interfere with default Gettext behaviour?
from gettext.
@maennchen good questions as always:
Why would you not support extlang?
extlang is there to support legacy language tag formats. From the spec:
Language+extlang combinations are provided to accommodate legacy language tag forms
In German we have formal / informal language. It is quite common to have a corresponding gettext translation for each. How would you represent that in a language tag?
In BCP47 that would be handled with a private-use extension. For example, de-CH-x-informal. Or you could try to have a variant subtag added to the IETF subtag registry :-)
Does a lookup normally try to find the closest match even if none match strictly or do you compare strictly?
In this proposal it's a strict match. So in your example de-CH would not resolve to de-DE. Fallback chains (filtering) would cater for this requirement but are more complex and likely outside the scope of Gettext. ex_cldr does this when processing Accept-Language headers.
Do you think BCP-47 is a good fit
Based upon your reference, maybe not. But the principles can still apply. de-CH@Latn could be considered the same as de-Latn-CH.
Currently, gettext calls languages “locales” internally.
I believe the appropriate terms are:
- Language tag is the string used to denote the collection of localised data
- Locale is the set of localised data
In the Gettext context I think the right descriptions are:
- Language tag: "de-CH"
- Locale: the translations making up the data for "de-CH"
from gettext.
@kipcole9 There are some people who have thought about ISO639 & BCP47 conversion: https://wiki.openoffice.org/wiki/LocaleMapping
@josevalim I believe based on this that this whole thing is too complicated to handle in gettext as well.
Maybe it would be a good idea to provide a behaviour for the language selection and provide a default implementation.
The default implementation could be just: If the search language starts with the user language, it matches.
This could of course be implemented in Gettext.handle_missing_translation/5 so one approach might be to provide a function
Why would you implement this functionality at that level instead of earlier when selecting the language to read?
from gettext.
I believe based on this that this whole thing is too complicated to handle in gettext as well
I can extract some code from ex_cldr into a separate library that does the matching (and maybe filtering). If that's preferred then closing the issue is fine.
from gettext.
Agreed that at this point this is out of scope for Gettext. Let's focus on making sure that Gettext is extendable enough so that users who wish to do so can use more complex locale-discovery logic.
from gettext.
What do you all think about an approach like this?
defmodule Gettext.LanguageSelection do
  @callback language_parents(locale :: String.t()) :: [String.t()]
end
Our default (based on ISO639) would look something like this:
Gettext.LanguageSelection.Default.language_parents("de_CH@informal")
# => ["de_CH@informal", "de_CH", "de"]
Gettext would just use the first entry for which a .po file exists, or the default language if none match.
Libraries like CLDR could then provide their own implementation that works more along the lines of this:
Cldr.GettextLanguageSelection.language_parents("de-CH-latin-x-informal")
# => ["de-CH-latin-x-informal", "de-CH-latin", "de-CH-x-informal", "de-CH", "...", "de"]
This way, the default behaviour should be rather simple to implement and would allow for extension.
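To make the default behaviour concrete, here is a minimal sketch of what such a default could look like for POSIX-style names. The module LanguageParentsSketch and the select/3 helper are hypothetical and not part of Gettext; the sketch assumes the form language_TERRITORY@modifier and passes the available locales in as a plain list instead of checking for .po files.

```elixir
defmodule LanguageParentsSketch do
  # Hypothetical default, assuming POSIX-style names such as "de_CH@informal".

  @doc "Lists the locale followed by its progressively less specific parents."
  def language_parents(locale) when is_binary(locale) do
    # Drop the "@modifier" part, then everything after the language code.
    without_modifier = locale |> String.split("@", parts: 2) |> hd()
    language = without_modifier |> String.split("_", parts: 2) |> hd()

    Enum.uniq([locale, without_modifier, language])
  end

  @doc "Picks the first parent that is available, or the default."
  def select(locale, available, default) when is_list(available) do
    locale
    |> language_parents()
    |> Enum.find(default, &(&1 in available))
  end
end
```

For example, select("de_CH@informal", ["en", "de"], "en") walks the chain de_CH@informal, de_CH, de and settles on de, while an unknown locale such as fr_FR falls back to the default.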
from gettext.
@josevalim Ok, fair.
I think we decided to not take any action then and should close this issue. Is that correct or do you still see something that we want to do?
from gettext.
It depends if there is something to do on this part: "What Gettext can help with is matching that against the locales it knows."
from gettext.