
Comments (14)

kipcole9 commented on June 2, 2024

There are two concepts that could be applied:

  • Matching, which looks for the nearest match of the requested locale (described as "lookup" in Section 3.4 of RFC 4647).
  • Fallbacks ("filtering" in the spec), in which a list of acceptable alternatives (as in the Accept-Language header) is used to filter and select a locale to use.

I think in the first instance the most useful implementation would be matching (aka lookup). Fallbacks (filtering) could be considered in a second round.

Matching requires defining a language tag syntax

The current Gettext locale name has no formal syntax (it is matched with a simple string comparison). Performing a locale lookup will require a language tag definition. I suggest a limited subset of the RFC 5646 language tag.

The basic structure of a language tag is: language-extlang-script-region-variant-extension-privateuse. I suggest the following limitations be defined to simplify the matching requirements, simplify backwards compatibility and simplify the implementation.

Proposed language tag syntax

Based upon the proposal above, the ABNF of the supported language tag would be:

   langtag       = language
                   ["-" script]
                   ["-" region]

   language      = 2*3ALPHA            ; shortest ISO 639 code
                 / 4ALPHA              ; or reserved for future use
                 / 5*8ALPHA            ; or registered language subtag

   script        = 4ALPHA              ; ISO 15924 code

   region        = 2ALPHA              ; ISO 3166-1 code
                 / 3DIGIT              ; UN M.49 code

Since validation of language tags is not in scope of Gettext, the format of language, script and region could be relaxed to be simply 1*ALPHA.
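As a rough illustration of the restricted grammar above, the three subtags can be recognised with a single regular expression. This is only a sketch: the module name `LangTagSketch` and its `parse/1` function are hypothetical and not part of Gettext, and the three `language` alternatives in the ABNF are collapsed into "2 to 8 letters".

```elixir
defmodule LangTagSketch do
  # Hypothetical helper, not part of Gettext.
  # Mirrors: langtag = language ["-" script] ["-" region]
  defp pattern do
    ~r/\A(?<language>[A-Za-z]{2,8})(?:-(?<script>[A-Za-z]{4}))?(?:-(?<region>[A-Za-z]{2}|[0-9]{3}))?\z/
  end

  # Returns {:ok, map} for tags that fit the restricted syntax, :error otherwise.
  def parse(tag) when is_binary(tag) do
    case Regex.named_captures(pattern(), tag) do
      nil ->
        :error

      %{"language" => language, "script" => script, "region" => region} ->
        {:ok,
         %{
           language: language,
           script: blank_to_nil(script),
           region: blank_to_nil(region)
         }}
    end
  end

  # Optional groups that did not participate in the match come back as "".
  defp blank_to_nil(""), do: nil
  defp blank_to_nil(value), do: value
end
```

Under this sketch, `LangTagSketch.parse("de-Latn-CH")` yields all three subtags, while a tag with a private-use part such as `"de-CH-x-informal"` is rejected because it falls outside the restricted syntax.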

Lookup process

  • Perform a string comparison (current process). If the locale exists, use it.
  • Downcase the requested locale and the match candidates (although there are conventions, language tags are defined to be case insensitive)
  • Split the language tag on ["-", "_"]. This could be limited to _ if required: BCP 47 specifies - but POSIX (Gettext) uses _. I propose accepting either, but that might be considered a breaking change.
  • Look for a match in the following order:
    • language-script-territory
    • language-territory
    • language-script
    • language
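The lookup steps above could be sketched roughly as follows. This is a minimal illustration, not Gettext's API: the module `LocaleLookupSketch`, its `lookup/2` function, and the way script and territory subtags are told apart (four letters versus two letters or three digits) are all assumptions made for the example.

```elixir
defmodule LocaleLookupSketch do
  # Hypothetical module illustrating the proposed lookup; not part of Gettext.

  # Returns {:ok, known_locale} or :error.
  def lookup(requested, known_locales)
      when is_binary(requested) and is_list(known_locales) do
    if requested in known_locales do
      # Step 1: plain string comparison (the current behaviour).
      {:ok, requested}
    else
      # Steps 2 and 3: compare downcased tags with a unified separator.
      index = Map.new(known_locales, fn locale -> {normalize(locale), locale} end)

      # Step 4: try progressively less specific candidates.
      requested
      |> candidates()
      |> Enum.find_value(:error, fn candidate ->
        case Map.fetch(index, candidate) do
          {:ok, locale} -> {:ok, locale}
          :error -> nil
        end
      end)
    end
  end

  defp normalize(tag) do
    tag |> String.downcase() |> String.split(["-", "_"]) |> Enum.join("-")
  end

  defp candidates(requested) do
    [language | rest] = requested |> String.downcase() |> String.split(["-", "_"])
    script = Enum.find(rest, &script?/1)
    territory = Enum.find(rest, &territory?/1)

    [
      [language, script, territory],
      [language, territory],
      [language, script],
      [language]
    ]
    # Drop combinations that need a subtag the request does not have.
    |> Enum.reject(&Enum.member?(&1, nil))
    |> Enum.map(&Enum.join(&1, "-"))
    |> Enum.uniq()
  end

  defp script?(part), do: part =~ ~r/\A[a-z]{4}\z/
  defp territory?(part), do: part =~ ~r/\A(?:[a-z]{2}|[0-9]{3})\z/
end
```

With this sketch, a request for `"en-AU"` against known locales `["en", "de"]` resolves to `"en"`, whereas a request for `"de-CH"` against `["de_DE", "en_US"]` returns `:error`, since the lookup only drops subtags from the requested tag and never jumps to a sibling territory.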

from gettext.

josevalim commented on June 2, 2024

I think the job of breaking "de_CH@informal" into multiple locales should be decoupled from Gettext. What Gettext can help with is matching that against the locales it knows.

maennchen commented on June 2, 2024

@josevalim I think we are set then.

That should allow an external library to choose a locale from the list of known ones and then select it.

maennchen commented on June 2, 2024

Some related standards for this issue:

I guess we're looking for a solution that applies a Filtering Matching Scheme to all available Gettext languages and then tries to look up messages according to the distance and priority of the match.

I think you are already familiar with that @kipcole9. Does that sound about correct?

josevalim commented on June 2, 2024

Awesome @kipcole9 !

Perhaps this is a bit out of scope for Gettext, especially because it can be used outside of Gettext?

maennchen commented on June 2, 2024

@kipcole9 Thanks for those details. I have a few follow-up questions:

In German we have formal / informal language. It is quite common to have a corresponding gettext translation for each. How would you represent that in a language tag?


Why would you not support extlang?


Does a lookup normally try to find the closest match even if none match strictly or do you compare strictly?

Example:

  • Gettext Translations provided for: de-DE, en-US
  • User language: de-CH

Would we identify the de-DE language in this scenario or would we request from the user that he provides a translation for “de” itself?


Currently, gettext calls languages “locales” internally. I believe this to be an incorrect term since there’s not necessarily a region involved which would make it a locale. Would you rename where possible to “language”?


Do you think BCP 47 is a good fit (including supporting underscores) even though the GNU gettext manual specifies ISO 639 as the standard for the Language header?

https://www.gnu.org/software/gettext/manual/html_node/Header-Entry.html

kipcole9 commented on June 2, 2024

@josevalim ex_cldr implements this proposal (more completely than the proposal), so it's available outside of Gettext now. The reason for including this capability in Gettext would be to more easily support cases where the user's locale is en-AU but the site only implements translations for en. Currently no translation would be found. There are 108 (!) en-* locales alone, 6 for de, and so on. Given that it's not uncommon to set the locale based upon what the browser sends, being able to match to an available Gettext locale is helpful, I believe.

This could of course be implemented in Gettext.handle_missing_translation/5 so one approach might be to provide a function that a user could delegate to in their own MyApp.Gettext.handle_missing_translation/5 that implements the proposal but otherwise doesn't interfere with default Gettext behaviour?

kipcole9 commented on June 2, 2024

@maennchen good questions as always:

Why would you not support extlang?

extlang is there to support legacy language tag formats. From the spec:

Language+extlang combinations are provided to accommodate legacy language tag forms

In german we have formal / informal language. It is quite common to have a corresponding gettext translation each. How would you represent that in a language tag?

In BCP47 that would be handled with a private-use extension. For example, de-CH-x-informal. Or you could try to have a variant subtag added to the IETF subtag registry :-)

Does a lookup normally try to find the closest match even if none match strictly or do you compare strictly?

In this proposal it's a strict match. So in your example de-CH would not resolve to de-DE. Fallback chains (filtering) would cater for this requirement but are more complex and likely outside the scope of Gettext. ex_cldr does this when processing Accept-Language headers.

Do you think BCP-47 is a good fit

Based upon your reference, maybe not. But the principles can still apply. de-CH@Latn could be considered the same as de-Latn-CH.

Currently, gettext calls languages “locales” internally.

I believe the appropriate terms are:

  • Language tag is the string used to denote the collection of localised data
  • Locale is the set of localised data

In the Gettext context I think the right descriptions are:

  • Language tag: "de-CH"
  • Locale: the translations making up the data for "de-CH"

maennchen commented on June 2, 2024

@kipcole9 Some people have already thought about ISO 639 & BCP 47 conversion: https://wiki.openoffice.org/wiki/LocaleMapping

@josevalim I believe based on this that this whole thing is too complicated to handle in gettext as well.

Maybe it would be a good idea to provide a behaviour for the language selection and provide a default implementation.

The default implementation could be just: If the search language starts with the user language, it matches.


@kipcole9

This could of course be implemented in Gettext.handle_missing_translation/5 so one approach might be to provide a function

Why would you implement this functionality at that level instead of earlier when selecting the language to read?

kipcole9 commented on June 2, 2024

I believe based on this that this whole thing is too complicated to handle in gettext as well

I can extract some code from ex_cldr into a separate library that does the matching (and maybe filtering). If that's preferred, then closing the issue is fine.

whatyouhide commented on June 2, 2024

Agreed that at this point this is out of scope for Gettext. Let's focus on making sure that Gettext is extendable enough so that users who wish to do so can use more complex locale-discovery logic.

maennchen commented on June 2, 2024

What do you all think about an approach like this?

defmodule Gettext.LanguageSelection do
  @callback language_parents(locale :: String.t()) :: [String.t()]
end

Our default (based on ISO 639) would look something like this:

Gettext.LanguageSelection.Default.language_parents("de_CH@informal")
# => ["de_CH@informal", "de_CH", "de"]

Gettext would just use the first entry for which a .po file exists, or the default language if none match.

Libraries like CLDR could then provide their own implementation that works more along the lines of this:

Cldr.GettextLanguageSelection.language_parents("de-CH-latin-x-informal")
# => ["de-CH-latin-x-informal", "de-CH-latin", "de-CH-x-informal", "de-CH", "...", "de"]

This way, the default behaviour should be rather simple to implement and would allow for extension.
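A default implementation along these lines might look like the sketch below. Everything here is hypothetical: the `Gettext.LanguageSelection` behaviour is only proposed in this thread, so the module does not declare it, and `LanguageParentsSketch`, `select/3`, and the list of known locales standing in for "a .po file exists" are illustrative assumptions.

```elixir
defmodule LanguageParentsSketch do
  # Hypothetical default for the proposed language_parents/1 callback.
  # Assumes POSIX-style names: language[_territory][@modifier].
  def language_parents(locale) when is_binary(locale) do
    # Strip the "@modifier" part, then the "_territory" part.
    [base | _modifier] = String.split(locale, "@", parts: 2)
    [language | _territory] = String.split(base, "_", parts: 2)

    # Most specific first; uniq collapses duplicates for short names like "de".
    Enum.uniq([locale, base, language])
  end

  # Picks the first parent that is a known locale, else the default.
  # `known_locales` stands in for "locales that have a .po file".
  def select(locale, known_locales, default) do
    locale
    |> language_parents()
    |> Enum.find(default, &(&1 in known_locales))
  end
end
```

For example, `LanguageParentsSketch.select("de_CH@informal", ["de", "en"], "en")` would walk `"de_CH@informal"`, `"de_CH"`, `"de"` in turn and settle on `"de"`.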

maennchen commented on June 2, 2024

@josevalim Ok, fair.

I think we decided to not take any action then and should close this issue. Is that correct or do you still see something that we want to do?

josevalim commented on June 2, 2024

It depends if there is something to do on this part: "What Gettext can help with is matching that against the locales it knows."
