
Support CLDR JSON (d3-format, closed)

d3 avatar d3 commented on May 4, 2024
Support CLDR JSON

from d3-format.

Comments (9)

curran avatar curran commented on May 4, 2024

For the record, CLDR stands for "Common Locale Data Repository", and this seems to be its official home page http://cldr.unicode.org/

Are you suggesting that d3-format adds https://github.com/unicode-cldr/cldr-json as a dependency? There is no package.json there, also no JSON files that I can see. Here's an interesting segment of their README:

Because the CLDR is so large and contains so many different types of information, the JSON data
here is grouped into packages by functionality. For each type of functionality, there are two
available packages: The "[modern][]" packages, which contain the set of locales listed as modern
coverage targets by the CLDR subcommittee, and the "[full][]" packages, which contain the complete
set of locales, including those in the corresponding modern packages. The functional groups are:

  • [cldr-core][] – Basic CLDR supplemental data — only one package here, no "full" and "modern".
  • [cldr-dates][] – Data for date/time formatting, including data for Gregorian calendar.
    Requires that the corresponding [cldr-numbers][] package be installed as well.
  • cldr-cal-[type] – CLDR data for non-Gregorian calendars. [type] is one of the supported non-Gregorian calendar types in CLDR:
    [buddhist][], [chinese][], [coptic][], [dangi][], [ethiopic][], [hebrew][], [indian][], [islamic][], [japanese][], [persian][], or [roc][].
  • [cldr-localenames][] – Translated versions of locale display name elements: languages, scripts, territories, and variants.
  • [cldr-misc][] – Other CLDR data not defined elsewhere.
  • [cldr-numbers][] – Data for number formatting.
  • [cldr-rbnf][] – Rule Based Number Formatting data — only one package here, no "full" and "modern".
  • [cldr-segments][] – Line breaking data from Unicode's ULI project
  • [cldr-units][] – Data for units formatting.

Note that the links do not go anywhere.


thedavidmeister avatar thedavidmeister commented on May 4, 2024

@curran CLDR format is the JSON format for defining formatting, collation, etc. for many locales used by Unicode.

So, no I'm not really saying that CLDR is a dependency (it doesn't have to be).

I'm just saying that, for each of decimal, thousands, grouping and currency, d3-format could support the equivalent CLDR config options.

So, for a concrete example, let's say we want to declare how to format numbers in en-AU (because I'm Australian 😉). We know that we want 1000000 to look like "1,000,000" as a string.

We can go to CLDR numbers modern (as opposed to numbers full) https://github.com/unicode-cldr/cldr-numbers-modern/tree/master/main and then find the JSON for en-AU https://github.com/unicode-cldr/cldr-numbers-modern/blob/master/main/en-AU/numbers.json.

It looks like this:

{
  "main": {
    "en-AU": {
      "identity": {
        "version": {
          "_number": "$Revision: 13050 $",
          "_cldrVersion": "30.0.3"
        },
        "language": "en",
        "territory": "AU"
      },
      "numbers": {
        "defaultNumberingSystem": "latn",
        "otherNumberingSystems": {
          "native": "latn"
        },
        "minimumGroupingDigits": "1",
        "symbols-numberSystem-latn": {
          "decimal": ".",
          "group": ",",
          "list": ";",
          "percentSign": "%",
          "plusSign": "+",
          "minusSign": "-",
          "exponential": "e",
          "superscriptingExponent": "×",
          "perMille": "‰",
          "infinity": "∞",
          "nan": "NaN",
          "timeSeparator": ":"
        },
        "decimalFormats-numberSystem-latn": {
          "standard": "#,##0.###",
          "long": {
            "decimalFormat": {
              "1000-count-one": "0 thousand",
              "1000-count-other": "0 thousand",
              "10000-count-one": "00 thousand",
              "10000-count-other": "00 thousand",
              "100000-count-one": "000 thousand",
              "100000-count-other": "000 thousand",
              "1000000-count-one": "0 million",
              "1000000-count-other": "0 million",
              "10000000-count-one": "00 million",
              "10000000-count-other": "00 million",
              "100000000-count-one": "000 million",
              "100000000-count-other": "000 million",
              "1000000000-count-one": "0 billion",
              "1000000000-count-other": "0 billion",
              "10000000000-count-one": "00 billion",
              "10000000000-count-other": "00 billion",
              "100000000000-count-one": "000 billion",
              "100000000000-count-other": "000 billion",
              "1000000000000-count-one": "0 trillion",
              "1000000000000-count-other": "0 trillion",
              "10000000000000-count-one": "00 trillion",
              "10000000000000-count-other": "00 trillion",
              "100000000000000-count-one": "000 trillion",
              "100000000000000-count-other": "000 trillion"
            }
          },
          "short": {
            "decimalFormat": {
              "1000-count-one": "0K",
              "1000-count-other": "0K",
              "10000-count-one": "00K",
              "10000-count-other": "00K",
              "100000-count-one": "000K",
              "100000-count-other": "000K",
              "1000000-count-one": "0M",
              "1000000-count-other": "0M",
              "10000000-count-one": "00M",
              "10000000-count-other": "00M",
              "100000000-count-one": "000M",
              "100000000-count-other": "000M",
              "1000000000-count-one": "0B",
              "1000000000-count-other": "0B",
              "10000000000-count-one": "00B",
              "10000000000-count-other": "00B",
              "100000000000-count-one": "000B",
              "100000000000-count-other": "000B",
              "1000000000000-count-one": "0T",
              "1000000000000-count-other": "0T",
              "10000000000000-count-one": "00T",
              "10000000000000-count-other": "00T",
              "100000000000000-count-one": "000T",
              "100000000000000-count-other": "000T"
            }
          }
        },
        "scientificFormats-numberSystem-latn": {
          "standard": "#E0"
        },
        "percentFormats-numberSystem-latn": {
          "standard": "#,##0%"
        },
        "currencyFormats-numberSystem-latn": {
          "currencySpacing": {
            "beforeCurrency": {
              "currencyMatch": "[:^S:]",
              "surroundingMatch": "[:digit:]",
              "insertBetween": " "
            },
            "afterCurrency": {
              "currencyMatch": "[:^S:]",
              "surroundingMatch": "[:digit:]",
              "insertBetween": " "
            }
          },
          "standard": "¤#,##0.00",
          "accounting": "¤#,##0.00;(¤#,##0.00)",
          "short": {
            "standard": {
              "1000-count-one": "¤0K",
              "1000-count-other": "¤0K",
              "10000-count-one": "¤00K",
              "10000-count-other": "¤00K",
              "100000-count-one": "¤000K",
              "100000-count-other": "¤000K",
              "1000000-count-one": "¤0M",
              "1000000-count-other": "¤0M",
              "10000000-count-one": "¤00M",
              "10000000-count-other": "¤00M",
              "100000000-count-one": "¤000M",
              "100000000-count-other": "¤000M",
              "1000000000-count-one": "¤0B",
              "1000000000-count-other": "¤0B",
              "10000000000-count-one": "¤00B",
              "10000000000-count-other": "¤00B",
              "100000000000-count-one": "¤000B",
              "100000000000-count-other": "¤000B",
              "1000000000000-count-one": "¤0T",
              "1000000000000-count-other": "¤0T",
              "10000000000000-count-one": "¤00T",
              "10000000000000-count-other": "¤00T",
              "100000000000000-count-one": "¤000T",
              "100000000000000-count-other": "¤000T"
            }
          },
          "unitPattern-count-one": "{0} {1}",
          "unitPattern-count-other": "{0} {1}"
        },
        "miscPatterns-numberSystem-latn": {
          "atLeast": "{0}+",
          "range": "{0}–{1}"
        }
      }
    }
  }
}

There isn't actually an en-AU entry in the equivalent place in d3 - https://github.com/d3/d3-format/tree/master/locale but if we refer to the en-GB data for d3 (close enough, right?) we get this https://raw.githubusercontent.com/d3/d3-format/master/locale/en-GB.json:

{
  "decimal": ".",
  "thousands": ",",
  "grouping": [3],
  "currency": ["£", ""]
}

You can see that CLDR JSON has much more info than d3 wants/needs, but that what d3 needs is a subset of the information provided by CLDR JSON.

For apps that are already using CLDR JSON to configure other i18n tools, it would be handy to also use the same data for d3.
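To make the subset relationship concrete, here is a rough sketch that picks the d3-format fields out of a CLDR numbers object. The function name `cldrToD3Locale` is hypothetical. The currency symbol is passed in by the caller because it does not appear in the numbers.json excerpt above, and the grouping array is also supplied by the caller rather than derived from the data.

```javascript
// Hypothetical helper: not part of d3-format or of any CLDR package.
// `currencySymbol` and `grouping` are supplied by the caller (assumptions of this sketch).
function cldrToD3Locale(cldr, localeId, currencySymbol, grouping) {
  const numbers = cldr.main[localeId].numbers;
  const system = numbers.defaultNumberingSystem; // "latn" in the excerpt above
  const symbols = numbers["symbols-numberSystem-" + system];
  const currencyFormats = numbers["currencyFormats-numberSystem-" + system];

  // Keep only the positive sub-pattern, e.g. "¤#,##0.00".
  const positive = currencyFormats.standard.split(";")[0];
  const firstDigit = positive.search(/[#0]/);
  const lastDigit = Math.max(positive.lastIndexOf("#"), positive.lastIndexOf("0"));

  return {
    decimal: symbols.decimal,
    thousands: symbols.group,
    grouping: grouping,
    // Text before the first digit placeholder is the prefix, text after the last is the suffix.
    currency: [
      positive.slice(0, firstDigit).replace("¤", () => currencySymbol),
      positive.slice(lastDigit + 1).replace("¤", () => currencySymbol)
    ]
  };
}

// Trimmed copy of the en-AU data quoted above.
const enAU = {
  main: {
    "en-AU": {
      numbers: {
        defaultNumberingSystem: "latn",
        "symbols-numberSystem-latn": { decimal: ".", group: "," },
        "currencyFormats-numberSystem-latn": { standard: "¤#,##0.00" }
      }
    }
  }
};

const d3Locale = cldrToD3Locale(enAU, "en-AU", "$", [3]);
```

Here `d3Locale` has the same four keys as the en-GB definition above, and everything else in the CLDR file is simply ignored. An object of this shape is what `d3.formatLocale` accepts as a locale definition.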


thedavidmeister avatar thedavidmeister commented on May 4, 2024

also, from https://github.com/unicode-cldr/cldr-json#cldr-json

Installation

Installation using NPM:

$ npm install <package-name> , where <package-name> is one of the package names mentioned above, for example:

$ npm install cldr-dates-full

Installation using bower:

$ bower install <package-name> , where <package-name> is one of the package names mentioned above, for example:

$ bower install cldr-dates-full


thedavidmeister avatar thedavidmeister commented on May 4, 2024

hypothetically, if d3-format was CLDR compatible, then the solution to #21 (just as an example) would be to simply pull https://github.com/unicode-cldr/cldr-numbers-modern/blob/master/main/en-IN/numbers.json into the d3-format repo somehow and use it as-is


curran avatar curran commented on May 4, 2024

Ah I see what you mean. Thanks for the clarification. So this would require modification of d3-format to be an "engine" or "compiler" of sorts for the CLDR specification.

There already seem to be a number of implementations available that do exactly that:

You mentioned that you're already using CLDR for everything except D3 axes. It should be possible to adopt one of the above libraries, and pass their formatting function into axis.tickFormat. Would that solve your use case?
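A minimal sketch of that idea, using the built-in `Intl.NumberFormat` as a stand-in for whichever CLDR-backed library is already in use (the thread does not prescribe one). `makeTickFormat` is a hypothetical name, and the axis call is left as a comment because it assumes a d3 axis created elsewhere.

```javascript
// Hypothetical wrapper; Intl.NumberFormat stands in for any CLDR-backed formatter.
function makeTickFormat(locale, options) {
  const formatter = new Intl.NumberFormat(locale, options);
  // d3 axes call the tick format function with the tick value.
  return function (value) {
    return formatter.format(value);
  };
}

const tickFormat = makeTickFormat("en-AU");

// With a d3 axis that already exists (assumed, not defined here):
// axis.tickFormat(tickFormat);
```

The axis never needs to know where the locale data comes from; it only sees a function from a number to a string.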


thedavidmeister avatar thedavidmeister commented on May 4, 2024

@curran yes, i personally ended up using tickFormat to wrap the google closure lib's number formatting.

yes, i listed existing implementations of formatting/parsing as a potential way forward in this area if it was interesting to the d3 team.

i suppose the question is, why not just use an existing CLDR implementation and migrate away from the d3 i18n code in some future release?

there would be multiple benefits to this long term, as I listed earlier 😄


mbostock avatar mbostock commented on May 4, 2024

You’re welcome to use an alternative number or date formatting library instead of d3-format and d3-time-format; that’s one of the goals of D3’s module system introduced in 4.0.

I don’t think it makes sense to have this library read CLDR JSON directly. CLDR is more expressive than the limited configuration supported by this library, and adding full support for CLDR features would replicate the working of existing CLDR libraries. Wouldn’t it make more sense to just use a library intended to consume CLDR, like jQuery globalize or moment-cldr?

That said, here are two approaches that would be reasonable:

  1. Writing a script that automatically converts CLDR JSON to the subset that d3-format and d3-time-format supports, reviewing these new locale definitions, and replacing (& extending) the current locale definitions in d3-format and d3-time-format.

  2. Making a d3-format-cldr and/or d3-time-format-cldr plugin that automatically converts CLDR JSON to the respective locale definition for d3-format and d3-time-format. (This is the same as approach 1, but it’s done on-the-fly and only benefits people who use these plugins.)

With either approach I expect there will be several difficult decisions regarding how to represent the more expressive CLDR format in terms that can be understood by d3-format and d3-time-format. Thus it would be important to document explicitly what is lost in the conversion. It could also be reasonable to propose specific new features to d3-format and d3-time-format based on what CLDR supports, but we should evaluate those on a case-by-case basis rather than attempting to replicate all of CLDR’s features.
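Grouping is one example of such a decision. The sketch below derives a d3-format style grouping array from a CLDR decimal pattern. It assumes the usual reading of CLDR number patterns: the primary group size is the number of digit placeholders after the last comma of the integer part, and the secondary size is the number between the last two commas. Because d3-format cycles through its grouping array, a pattern with a distinct secondary size cannot be expressed as `[3, 2]`; the secondary size has to be repeated. The function name `groupingFromPattern` and the `maxDigits` default are assumptions of this sketch, not anything proposed in the thread.

```javascript
// Hypothetical helper: derives a d3-format style grouping array from a CLDR decimal pattern.
function groupingFromPattern(pattern, maxDigits = 21) {
  // Positive sub-pattern only, then the integer part, e.g. "#,##0".
  const integerPart = pattern.split(";")[0].split(".")[0];
  const groups = integerPart.split(",");

  // No grouping separator in the pattern: the caller has to decide what to do.
  if (groups.length < 2) return null;

  const countDigits = (s) => s.replace(/[^#0]/g, "").length;
  const primary = countDigits(groups[groups.length - 1]);
  const secondary = groups.length > 2 ? countDigits(groups[groups.length - 2]) : primary;

  if (secondary === primary) return [primary];

  // Repeat the secondary size until at least maxDigits digits are covered,
  // so that cycling back to the primary size never happens in practice.
  const grouping = [primary];
  for (let covered = primary; covered < maxDigits; covered += secondary) {
    grouping.push(secondary);
  }
  return grouping;
}

const western = groupingFromPattern("#,##0.###");
const indian = groupingFromPattern("#,##,##0.###");
```

With these inputs, `western` is `[3]` and `indian` is a 3 followed by nine 2s. What is lost in this conversion would still need to be written down: the converted array is only correct up to `maxDigits` integer digits, whereas the CLDR pattern has no such bound.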


mbostock avatar mbostock commented on May 4, 2024

I’ve also opened #35 to fix #21 by adding a locale definition for en-IN. Given that we don’t support decimal format for numbers greater than 1e21 anyway (see #24), I believe this is a reasonable locale definition.
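To illustrate why a finite grouping array can be enough under that limit, here is a small standalone sketch of cycled grouping. It is not d3-format's implementation; `groupDigits` is a hypothetical function, and the array used below (a 3 followed by nine 2s) is an illustrative assumption rather than a copy of the definition in #35. The sizes are applied from the least-significant digit and reused from the start once exhausted.

```javascript
// Hypothetical illustration of cycled digit grouping; not d3-format's own code.
function groupDigits(digits, grouping, separator) {
  const parts = [];
  let end = digits.length;
  let i = 0;
  while (end > 0) {
    // Take the next group size, cycling through the array as needed.
    const size = grouping[i % grouping.length];
    const start = Math.max(0, end - size);
    parts.unshift(digits.slice(start, end));
    end = start;
    i += 1;
  }
  return parts.join(separator);
}

// Assumed en-IN style grouping: 3 + 9 * 2 = 21 digits before the cycle restarts.
const indianGrouping = [3, 2, 2, 2, 2, 2, 2, 2, 2, 2];

const lakhs = groupDigits("1234567", indianGrouping, ",");   // "12,34,567"
const millions = groupDigits("1234567", [3], ",");           // "1,234,567"
```

Since the array covers 21 digits before wrapping around, any integer part below 1e21 is grouped without the cycle ever returning to the leading 3.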


thedavidmeister avatar thedavidmeister commented on May 4, 2024

@mbostock

> You’re welcome to use an alternative number or date formatting library instead of d3-format and d3-time-format; that’s one of the goals of D3’s module system introduced in 4.0.

Yup, it's awesome that I could provide my own formatter here. I do appreciate that 😄

> I don’t think it makes sense to have this library read CLDR JSON directly.

What is the reasoning behind d3-format if not formatting (and maybe parsing #20 ) things into strings in an i18n friendly way? If that is the goal, CLDR support makes perfect sense to me as it is the world's most comprehensive and standards compliant repository of l10n formatting patterns.

en-IN isn't the only problem, it's just an example of one case where things can get tricky without a more expressive DSL. I believe that scaling from around 20-30 languages/locales to the ~800 locales that CLDR supports would quickly reveal more edge cases not currently covered by the d3 config/format system (e.g. i note there's a slot for formatting percentages and percent signs in CLDR that seems totally relevant to d3).

This isn't even really touching on other i18n issues that must affect d3 but aren't really the domain of d3-format. eg. how to collate values to correctly provide an ordered list for an axis.

Maybe this is actually a discussion for d3 itself rather than d3-format?

as you said in #21

> I suppose it’d be nice to allow more arbitrary repeating sequences but without any other examples to go on, it’s hard to generalize. Trying to keep things simple.

that's all the CLDR format is supposed to be, a set of simple formats that are generalized enough to cover all locales, with tons of examples.
