
cldr-number-pm5's Introduction

Perl CLDR


NAME

CLDR::Number - Localized number formatters using the Unicode CLDR

VERSION

This document describes CLDR::Number v0.19, built with Unicode CLDR v29.

SYNOPSIS

use v5.10;  # enables say
use CLDR::Number;

# new object with 'es' (Spanish) locale
my $cldr = CLDR::Number->new(locale => 'es');

# decimals
my $decf = $cldr->decimal_formatter;

# when locale is 'es' (Spanish)
say $decf->format(1234.5);  # '1234,5'

# when locale is 'es-MX' (Mexican Spanish)
say $decf->format(1234.5);  # '1,234.5'

# when locale is 'ar' (Arabic)
say $decf->format(1234.5);  # '١٬٢٣٤٫٥'

# percents
my $perf = $cldr->percent_formatter;

# when locale is 'tr' (Turkish)
say $perf->format(0.05);  # '%5'

# currencies
my $curf = $cldr->currency_formatter(currency_code => 'USD');

# when locale is 'en' (English) and currency is USD (US dollars)
say $curf->format(9.99);  # '$9.99'

# when locale is 'en-CA' (Canadian English) and currency is USD
say $curf->format(9.99);  # 'US$9.99'

# when locale is 'fr-CA' (Canadian French) and currency is USD
say $curf->format(9.99);  # '9,99 $ US'

DEPRECATION

Using the locale method as a setter is deprecated. In the future the object’s locale will become immutable. Please see issue #38 for details and to submit comments or concerns.

DESCRIPTION

Software localization includes much more than just translations. Numbers, prices, and even percents should all be localized based on the user’s language, script, and region. Fortunately, the Unicode Common Locale Data Repository (CLDR) provides locale data and specifications for formatting numeric data to use with many of the world’s locales.

This class provides common attributes shared among the supported formatter classes as well as methods to instantiate decimal, percent, and currency formatter objects. The value of any attribute (such as locale or decimal_sign) is passed to the formatter objects on instantiation but can be overridden by passing another value for the attribute or by calling a setter method on the formatter object.

Methods

  • decimal_formatter

    Returns a decimal formatter, which is a CLDR::Number::Format::Decimal object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

  • percent_formatter

    Returns a percent formatter, which is a CLDR::Number::Format::Percent object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

  • currency_formatter

    Returns a currency formatter, which is a CLDR::Number::Format::Currency object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

Common Attributes

These attributes are common to this class and all formatter classes. All attributes other than locale, default_locale, and cldr_version have defaults that vary with the current locale. All string attributes are expected to be character strings, not byte strings.

  • locale

    Default: value of default_locale attribute if it exists, otherwise root

    Valid: Unicode locale identifiers

    Examples: es (Spanish), es-ES (European Spanish), es-419 (Latin American Spanish), zh-Hant (Traditional Chinese), zh-Hans (Simplified Chinese), chr (Cherokee)

    The locale is case-insensitive and can use either - (hyphen-minus) or _ (low line) as a separator.

  • default_locale

    Default: none

    Valid: Unicode locale identifiers

    Use this to fall back to a locale other than the generic root when the locale attribute is not set or is not valid.

  • numbering_system

    Valid: currently only decimal numbering systems are supported

    Examples: latn (Western Digits), arab (Arabic-Indic Digits), hanidec (Chinese Decimal Numerals), fullwide (Full Width Digits)

    In the future, algorithmic numbering systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), and roman (Roman Numerals) will be supported.

    The numbering system may alternately be provided as a Unicode locale extension subtag. For example, locale ja-u-nu-fullwide for the Japanese language (ja) with the numbering system (nu) set to Full Width Digits (fullwide).

  • decimal_sign

    Examples: . (full stop) for root, en; , (comma) for de, fr

  • group_sign

    Examples: , (comma) for root, en; . (full stop) for de;   (no-break space) for fr

  • plus_sign

    Examples: + (plus sign) for root, en, and most locales

  • minus_sign

    Examples: - (hyphen-minus) for root, en, and most locales

  • infinity

    Examples: ∞ (infinity) for root, en, and almost all locales

  • nan

    Examples: NaN for root, en, and most locales; many other variations for individual locales like не число for ru and 非數值 for zh-Hant

  • cldr_version

    Value: 29

    This is a read-only attribute that will always reflect the currently supported Unicode CLDR version.

NOTES

The Unicode private-use characters U+F8F0 through U+F8F4 are used internally and are therefore not supported in custom patterns and signs.

SEE ALSO

AUTHOR

Nova Patch <[email protected]>

This project is brought to you by Shutterstock. Additional open source projects from Shutterstock can be found at code.shutterstock.com.

COPYRIGHT AND LICENSE

© 2013–2016 Shutterstock, Inc.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Unicode is a registered trademark of Unicode, Inc., in the United States and other countries.

cldr-number-pm5's People

Contributors

mnlagrasta, oalders, patch


cldr-number-pm5's Issues

escaped quoting bug in Perl v5.8.8

Escaped quotes are being returned in formatted output as \xF7\xB0\x80\x84 (utf8-encoded \x{1F0004}) instead of a literal single quote ('). This happens in both CPAN Testers reports for Perl v5.8.8 and in no other versions. The other reports from v5.8.x are for v5.8.5 and v5.8.9, which do not have this problem.

Here are the related CPAN Testers’ reports:

Here are the three failing tests, which are the same in both reports:

#   Failed test 'single quote itself'
#   at t/from_uts35.t line 57.
#          got: '1 o÷°€„clock'
#     expected: '1 o'clock'
# Looks like you failed 1 test of 41.
t/from_uts35.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/41 subtests 

#   Failed test at t/quoting.t line 16.
#          got: '÷°€„123÷°€„'
#     expected: ''123''

#   Failed test at t/quoting.t line 17.
#          got: '#÷°€„#'
#     expected: '#'#'
# Looks like you failed 2 tests of 7.
t/quoting.t ........... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests

Moo::Role-related bug in Perl 5.8.1 through 5.8.3

Most releases of CLDR::Number have had inconsistent but common Moo::Role-related test failures on Perl v5.8.1 through v5.8.3. The oldest version of Perl not known to have this problem is v5.8.4, although there are very few reports for that version.

We should either figure out the problem and fix it, or raise the minimum version of Perl from v5.8.1 (September 2003) to v5.8.4 (April 2004), which I would not be against.

Test reports:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.19

Typical output:

Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Can't locate object method "is_role" via package "Moo::Role" at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
BEGIN failed--compilation aborted at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number/Role/Format.pm line 13.
Compilation failed in require at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Module/Runtime.pm line 313.
Compilation failed in require at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number.pm line 32.

load locale data for each locale from a different module

Suggested by @aarondcohen:

[13:43] Aaron Cohen: as an added benefit, you could break CLDR::Number::Data::* up by locale
[13:43] Aaron Cohen: so people would load less into memory if they aren't using the other locales

Although the number-related locale data is relatively small per locale, the aggregate is increasingly large with each CLDR release. Another idea is to remove any data from a locale that is the same as what would already be inherited.
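One way to sketch the second idea, assuming locale data is a hash of locale => { attribute => value } and that parents are found by plain subtag truncation (ignoring the parentLocale exceptions discussed in another issue). A value is dropped when it equals what the locale would inherit; because each value is compared with its nearest ancestor in the unpruned data, lookups through the pruned tree should still resolve to the same values. All function names here are hypothetical.

```perl
use strict;
use warnings;

# Parent by subtag truncation: es-419 -> es -> root (no parentLocale exceptions).
sub parent_of {
    my ($locale) = @_;
    return 'root' unless $locale =~ /-/;
    (my $parent = $locale) =~ s/-[^-]+\z//;
    return $parent;
}

# Nearest ancestor's value for $key in the unpruned data, or undef if none.
sub inherited_value {
    my ($data, $locale, $key) = @_;
    while ($locale ne 'root') {
        $locale = parent_of($locale);
        my $entry = $data->{$locale} or next;
        return $entry->{$key} if exists $entry->{$key};
    }
    return undef;
}

# Return a copy of $data without values a locale would already inherit.
sub prune_inherited {
    my ($data) = @_;
    my %pruned;
    for my $locale (keys %$data) {
        for my $key (keys %{ $data->{$locale} }) {
            my $value     = $data->{$locale}{$key};
            my $inherited = inherited_value($data, $locale, $key);
            next if defined $inherited && $inherited eq $value;
            $pruned{$locale}{$key} = $value;
        }
    }
    return \%pruned;
}
```

With this approach a locale such as es-ES whose values all match es would disappear from the stored data entirely.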

round half-even with rounding increment

By default we use Math::BigFloat for rounding and round in half-even mode. If a rounding increment greater than 1 is provided in the pattern or via the rounding_increment attribute, we instead use Math::Round::nearest because it supports rounding increments; however, it does not support half-even rounding, which I believe we should be performing along with rounding increments. We need to investigate alternatives and possibly ask for clarification on the CLDR mailing list.

See also issue #30.
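A minimal sketch of half-even rounding combined with a rounding increment: divide by the increment, round the quotient half-even to an integer, and multiply back. The helper name is hypothetical and this is not how CLDR::Number currently rounds. It uses plain floating point for brevity, so increments that are not exact binary fractions (such as 0.05) can misjudge ties; doing the same arithmetic with Math::BigFloat would avoid that.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Round $value to the nearest multiple of $increment; exact ties go to the
# even multiple (half-even). Hypothetical helper using floating point.
sub round_half_even_to_increment {
    my ($value, $increment) = @_;
    my $quotient = $value / $increment;
    my $lower    = floor($quotient);
    my $fraction = $quotient - $lower;

    my $multiple;
    if    ($fraction > 0.5) { $multiple = $lower + 1 }
    elsif ($fraction < 0.5) { $multiple = $lower }
    else {
        # Exactly halfway: pick whichever neighboring multiple is even.
        $multiple = $lower % 2 == 0 ? $lower : $lower + 1;
    }
    return $multiple * $increment;
}
```

For example, with an increment of 5, 12.5 rounds down to 10 (2.5 multiples, tie to even 2) while 17.5 rounds up to 20 (3.5 multiples, tie to even 4).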

improve docs for a broader audience

tl;dr: Let’s improve the docs! Please add doc requests or suggestions in the comments here.

The first goal of this project was to implement the standardized Unicode CLDR–based localized number formatting defined in UTS #35, Part 3: Numbers. Much of the CLDR::Number documentation, however, does not describe its functionality in enough detail for developers who are not already familiar with the CLDR. Using this project shouldn’t require external knowledge. One problem is that it allows for a lot of advanced customization that most developers will never need to use or know about, since they can instead rely on the defaults provided for the requested locale (and currency, for prices). Perhaps the docs should be split into a fully self-contained introductory level with more examples and an advanced level with all the gritty options and external references. These days I write much more documentation for developers than code, and while I have less time for maintaining my CPAN modules, I’d like to commit some time to improving these docs.

Thanks to @Ovid for bringing this to my attention:

use Math::BigFloat as much as possible

We’re already using Math::BigFloat in most situations for rounding, using the round_mode and ffround methods. Let’s continue to use it for any functionality we can, replacing existing code in CLDR::Number: is_nan, is_inf, is_pos, is_neg, etc.

locales should inherit from defined parent locales when available

Right now, the inheritance works like zh-Hant-MO → zh-Hant → zh → root, but Part 1 Core §4.1.1 Parent Locales defines exceptions in the LDML for different parents.

For example:

 <parentLocale parent="zh_Hant_HK" locales="zh_Hant_MO"/>

This would modify the inheritance to zh-Hant-MO → zh-Hant-HK → zh-Hant → zh → root.

Others are defined with a parent of root to skip normal steps altogether. The most notable problem with the current inheritance is that es-US (US Spanish), es-MX (Mexican Spanish), es-CR (Costa Rican Spanish), etc., inherit directly from es (European Spanish) instead of es-419 (Latin American Spanish).
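A minimal sketch of the lookup this issue asks for, assuming hyphen-separated locale IDs and a hand-written %PARENT_LOCALE table holding only the examples named above (a real table would be generated from the LDML parentLocale data). An explicit parent wins; otherwise the last subtag is dropped until root is reached. The table and function names are hypothetical.

```perl
use strict;
use warnings;

# Illustrative subset of explicit parents; a real table would be
# generated from the LDML <parentLocale> elements.
my %PARENT_LOCALE = (
    'zh-Hant-MO' => 'zh-Hant-HK',
    'es-MX'      => 'es-419',
    'es-US'      => 'es-419',
    'es-CR'      => 'es-419',
);

# Return the inheritance chain for $locale, ending with 'root'.
sub locale_chain {
    my ($locale) = @_;
    my @chain = ($locale);
    while ($locale ne 'root') {
        if (exists $PARENT_LOCALE{$locale}) {
            $locale = $PARENT_LOCALE{$locale};    # explicit parent wins
        }
        elsif ($locale =~ /-/) {
            $locale =~ s/-[^-]+\z//;              # truncate the last subtag
        }
        else {
            $locale = 'root';                     # bare language falls back to root
        }
        push @chain, $locale;
    }
    return @chain;
}
```

With this table, es-MX resolves through es-419 before es, which addresses the Latin American Spanish problem described above.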

remove Math::BigFloat for Inf/NaN checking

We started using Math::BigFloat in CLDR::Number v0.14 [issue #45] to check for infinity, NaN, and negatives, but this addition has created many failing test reports:

http://matrix.cpantesters.org/?dist=CLDR-Number+0.14

It turns out that Perl 5.22 overhauled infinity and NaN values to be more consistent across platforms and operations, including stringifying to Inf and NaN instead of the previous inf and nan; however, Math::BigFloat doesn’t understand those titlecased values and treats them both as NaN. We’re better off performing the checks ourselves for now, as well as submitting an issue for the Math::BigInt project.
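A minimal sketch of performing the checks ourselves in plain Perl, assuming IEEE 754 doubles: NaN is the only value not equal to itself, and 9**9**9 overflows to infinity (a long-standing idiom that also works before Perl 5.22). The helper names mirror is_nan, is_inf, and is_neg from an earlier issue, but these are standalone illustrations, not CLDR::Number's code.

```perl
use strict;
use warnings;

my $INF = 9**9**9;    # overflows to +Inf on IEEE 754 platforms

# NaN never compares equal to itself.
sub is_nan { my ($n) = @_; return $n != $n }

# Positive or negative infinity.
sub is_inf { my ($n) = @_; return $n == $INF || $n == -$INF }

# Negative, including -Inf but never NaN.
sub is_neg { my ($n) = @_; return !is_nan($n) && $n < 0 }
```

These comparisons work on numeric values regardless of whether Perl stringifies them as inf/nan or Inf/NaN.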

support CLDR v27

CLDR v25 was released today:
http://unicode-inc.blogspot.com/2014/03/cldr-version-25-released.html

The changes are primarily structural in nature and very few of these changes affect numbers, while none of these structural changes affect the implemented portions of CLDR::Number.

Here are the locale data changes that affect us:

  • new locales fy (West Frisian), fy-NL, ug (Uyghur), ug-Arab, ug-Arab-CN, prg (Prussian)
  • data improvements for official languages
  • number symbol fixes

Additionally there is "Better locale matching, with better fallbacks; likely subtags for regions; added scripts for various languages" but our locale matching and fallbacks were already rather minimal. We should obviously use the new version when implementing matching/fallback improvements.

deprecate mutable locales

The locale attribute being mutable has caused additional code, complexity, and bugs. The problem is that it is a rw attribute that sets a dozen or so other rw attributes. It's difficult to maintain these inherited attributes that should be lazy, publicly writable, and change based on changes to locale. The solution is to change locale from rw to ro. This is backward-incompatible, but there are no known real-world uses of a mutable locale other than convenience in unit tests and examples.

  1. Publicly announce upcoming deprecation of the locale method used as a setter and request feedback.
  2. Document the deprecation in the next release of CLDR::Number.
  3. Warn when mutating the locale in a further release.
  4. Finally, change the locale from rw to ro and remove related code.

Comments and suggestions highly appreciated!
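Step 3 can be sketched in plain Perl with Carp: the accessor keeps working as a setter but warns whenever it is called with an argument. The class name and warning text are hypothetical; CLDR::Number itself is built on Moo, so its real accessor would look different.

```perl
use strict;
use warnings;

# Hypothetical class illustrating step 3 (warn on mutation); not
# CLDR::Number's actual implementation.
package My::LocaleHolder;

use Carp qw(carp);

sub new {
    my ($class, %args) = @_;
    my $locale = defined $args{locale} ? $args{locale} : 'root';
    return bless { locale => $locale }, $class;
}

# Getter stays silent; calling it with an argument still sets but warns.
sub locale {
    my $self = shift;
    if (@_) {
        carp 'Using the locale method as a setter is deprecated';
        $self->{locale} = shift;
    }
    return $self->{locale};
}

package main;
```

Step 4 would then amount to deleting the `if (@_)` branch (or making it die) so that the locale is read-only.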

support spelled-out currencies

Add support for spelled-out currencies using the unitPattern and displayName with a count attribute. For example, 5000 JPY (Japanese Yen) in ja (Japanese) would be 5,000 円 (as opposed to ¥5,000), which uses the unitPattern {0} {1} and displayName with the count other. See UTS #35, Part 3, §4: Currencies for details.

Review the ICU API for this feature and determine what attribute should be used to enable it. Also consider how to best store and load the data because it will take much more memory than the other currency data.

This feature has been requested by users.
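A minimal sketch of the substitution step, assuming the caller has already formatted the number and chosen its plural count (real plural selection would come from the CLDR plural rules, which this skips). The %CURRENCY_DATA hash and spell_out_currency are hypothetical; the ja values follow the example above and the en values are included for contrast.

```perl
use strict;
use warnings;

# Hypothetical data shaped after LDML: a unitPattern and a currency
# displayName, each keyed by plural count.
my %CURRENCY_DATA = (
    ja => {
        unit_pattern => { other => '{0} {1}' },
        display_name => { JPY => { other => "\x{5186}" } },    # 円
    },
    en => {
        unit_pattern => { one => '{0} {1}', other => '{0} {1}' },
        display_name => { USD => { one => 'US dollar', other => 'US dollars' } },
    },
);

# Fill {0} with the formatted number and {1} with the display name,
# falling back to the 'other' count when a specific count is missing.
sub spell_out_currency {
    my ($locale, $code, $formatted_number, $count) = @_;
    my $data = $CURRENCY_DATA{$locale} or die "no currency data for $locale";
    my $pattern = $data->{unit_pattern}{$count} // $data->{unit_pattern}{other};
    my $names   = $data->{display_name}{$code} or die "no name for $code";
    my $name    = $names->{$count} // $names->{other};
    (my $result = $pattern) =~ s/\{0\}/$formatted_number/;
    $result =~ s/\{1\}/$name/;
    return $result;
}
```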

Using Locale::CLDR corrupts CLDR::Number

I have been using CLDR::Number in a project for quite some time to format numbers in the right locale.

Now I want to use Locale::CLDR to get country names in the correct language. However, as soon as I use Locale::CLDR, formatting an integer number via CLDR::Number fails with the message:
Can't locate object method "ffround" via package "Math::BigInt" at <path_to>/perllib/CLDR/Number/Role/Format.pm line 260

I can easily reproduce this using the following script:

#!/usr/bin/perl

use strict;

use CLDR::Number;
use Locale::CLDR;

my $cldr = CLDR::Number->new(locale => 'en');
my $formatter = $cldr->decimal_formatter(minimum_fraction_digits => 2, maximum_fraction_digits => 2);
print 'Success: ', $formatter->format(15.23), "\n";
print 'Fail: ', $formatter->format(42.0), "\n";

Here, the formatting of the number 42 will fail with the indicated message. As soon as I remove the line use Locale::CLDR, the formatting works as expected.

Do you know why using Locale::CLDR causes CLDR::Number to break? I know that the latter is a somewhat older module, but I do not want to let go of it. If there is a more up-to-date module with a similar interface as CLDR::Number, then I will definitely check it out.

support different rounding modes

As per the CLDR spec (see below), the default rounding mode is half-even. There is currently no way to change the rounding mode. We use Math::BigFloat, which supports the following modes: even, odd, +inf, -inf, zero, trunc, common. Let's add a rounding_mode attribute and decide whether we should use the same modes and names as Math::BigFloat.

An implementation may allow the specification of a rounding mode to determine how values are rounded. In the absence of such choices, the default is to round "half-even", as described in IEEE arithmetic. That is, it rounds towards the "nearest neighbor" unless both neighbors are equidistant, in which case, it rounds towards the even neighbor. Behaves as for round "half-up" if the digit to the left of the discarded fraction is odd; behaves as for round "half-down" if it's even. Note that this is the rounding mode that minimizes cumulative error when applied repeatedly over a sequence of calculations.
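A sketch of what a rounding_mode attribute could delegate to, assuming the mode names are passed straight through to Math::BigFloat's bfround (both the attribute and the round_with_mode helper are hypothetical). Values are passed as strings so that a decimal tie such as 0.25 is represented exactly.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Round $value to $places fraction digits with one of Math::BigFloat's
# modes: even, odd, +inf, -inf, zero, trunc, common.
sub round_with_mode {
    my ($value, $places, $mode) = @_;
    my $x = Math::BigFloat->new($value);    # string input keeps ties exact
    return $x->bfround(-$places, $mode);    # negative scale = digits after '.'
}
```

Passing 'even' reproduces the CLDR default, so 0.25 rounds to 0.2 while 'common' (half away from zero) gives 0.3.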

Tests fail (with latest Moo?)

There are new test failures — see http://www.cpantesters.org/cpan/report/487b0514-e060-11e5-a971-eac272d7c31d for a sample.

Statistical analysis from test failures generated on my machine suggests that the problem is caused by the latest Moo (negative theta is bad):

****************************************************************
Regression 'mod:Moo'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    30849180474401392.00
[1='eq_1.007000']             0.0000          0.0000       0.00
[2='eq_2.000001']             0.0000          0.0000       1.98
[3='eq_2.000002']             0.0000          0.0000       3.36
[4='eq_2.001000']            -1.0000          0.0000    -28977759259709780.00

R^2= 1.000, N= 74, K= 5
****************************************************************

maximum integer digits

Implement the functionality supplied by the maximum_integer_digits attribute, which already exists as a stub. There doesn’t appear to be a symbol associated with this.

UTS #35, Part 3, §3.3:

If the number of actual integer digits exceeds the maximum integer digits, then only the least significant digits are shown. For example, 1997 is formatted as 97 if the maximum integer digits is set to 2.
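A minimal sketch of the quoted rule, assuming the integer digits are already available as a string. The helper name is hypothetical, and it treats an undefined maximum as "no limit" and otherwise assumes a positive integer.

```perl
use strict;
use warnings;

# Keep only the least significant $max integer digits (UTS #35, Part 3, §3.3).
sub apply_maximum_integer_digits {
    my ($integer_digits, $max) = @_;
    return $integer_digits
        if !defined $max || length($integer_digits) <= $max;
    return substr($integer_digits, -$max);    # rightmost $max digits
}
```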

change internal placeholder non-Unicode codepoints to PUA

Change non-Unicode codepoints to Private Use Area codepoints. These are internally used as placeholders. We're currently using U+1F0000, U+1F0001, U+1F0002, U+1F0003, and U+1F0004, but this caused bug #20, which required a hacky workaround.
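To illustrate the placeholder technique, here is a simplified sketch of resolving escaped quotes ('') in a pattern's literal text with a Private Use Area placeholder. Assigning U+F8F0 to the quote is an assumption (the NOTES section only says U+F8F0 through U+F8F4 are used internally), and the helper name is hypothetical. Unlike U+1F0000, a PUA code point is valid Unicode, so it encodes as ordinary UTF-8 rather than the extended byte sequence seen in the Perl v5.8.8 quoting bug.

```perl
use strict;
use warnings;

# Private Use Area code point used as a temporary stand-in for ''.
my $QUOTE_PLACEHOLDER = "\x{F8F0}";

# Resolve CLDR pattern quoting: '' is a literal apostrophe, and other single
# quotes only delimit literal text. Simplified: real parsing must also keep
# quoted pattern characters such as # from being treated as digits.
sub unquote_literal {
    my ($text) = @_;
    $text =~ s/''/$QUOTE_PLACEHOLDER/g;    # protect escaped quotes first
    $text =~ s/'//g;                       # drop quoting delimiters
    $text =~ s/$QUOTE_PLACEHOLDER/'/g;     # restore literal apostrophes
    return $text;
}
```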

quiet down expected warnings in tests

Use Test::Warnings so we don't actually warn to STDERR while running tests.

Here's the only current problem:

t/inheritance.t ....... ok
default_locale 'xx' is unknown at (eval 36) line 44.

add FAQ about fallback for nonexistent locales

Users occasionally report that the wrong formatting is used for several nonexistent locales, including Mexican English (en-MX) and Brazilian Spanish (es-BR). We should document that, since these locales don’t exist, they fall back to English (en) and Spanish (es), respectively.

Note that I also plan on bringing up the issue with the CLDR Technical Committee that es-XX, where XX is any country within Latin America (419), should fall back to es-419 even if es-XX is not a valid locale. This would, however, require a new structure added to the LDML spec unless a locale was created for each combination of es with each remaining country within 419.

add algorithmic (non-decimal) numbering systems

We now support non-Latin (latn) numbering systems, but only decimal systems, not algorithmic systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), roman (Roman Numerals), etc.
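For a sense of what an algorithmic system involves, here is a hand-written Roman numeral converter for 1 through 3999. It is only a sketch: CLDR defines the roman system through rule-based number format (RBNF) rules, and a real implementation would presumably be driven by that data rather than by this hard-coded table. The function name is hypothetical.

```perl
use strict;
use warnings;

# Value/symbol pairs in descending order, including subtractive forms.
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
    [1,    'I'],
);

# Greedily subtract the largest value that still fits.
sub to_roman {
    my ($n) = @_;
    die "out of range: $n" unless $n =~ /\A\d+\z/ && $n >= 1 && $n <= 3999;
    my $out = '';
    for my $pair (@ROMAN) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) {
            $out .= $symbol;
            $n   -= $value;
        }
    }
    return $out;
}
```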

add minimum grouping digits

Minimum grouping digits were added to the spec in CLDR v26 (#33). LDML stores the related value as minimumGroupingDigits and we should add the minimum_grouping_digits attribute.

http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Elements

The minimumGroupingDigits can be used to suppress groupings below a certain value. This is used for languages such as Polish, where one would only write the grouping separator for values above 9999. The minimumGroupingDigits contains the default for the locale.

http://cldr.unicode.org/translation/numbering-systems

In some languages, the grouping separator is suppressed in certain cases. For example, see china-auf-wachstumskurs.gif, where there is a grouping separator in 12 080 but not in 4720. The minimumGroupingDigits determines what the default for a locale is.
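A minimal sketch of the rule, assuming a single group size and integer digits supplied as a string. Following the Polish example, with a group size of 3 and minimumGroupingDigits of 2, grouping only appears once the integer part has at least 3 + 2 = 5 digits, so 4720 stays ungrouped while 12 080 is grouped. The function name and parameters are hypothetical, and secondary grouping (as in Indian-style 12,34,567) is not handled.

```perl
use strict;
use warnings;

# Insert grouping separators, but only when the integer part has at least
# $group_size + $min_grouping digits (minimumGroupingDigits).
sub group_integer_digits {
    my ($digits, $separator, $group_size, $min_grouping) = @_;
    return $digits if length($digits) < $group_size + $min_grouping;
    # Repeatedly split off the rightmost ungrouped block of $group_size digits.
    1 while $digits =~ s/^(\d+)(\d{$group_size})/$1$separator$2/;
    return $digits;
}
```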

format inf, -inf, and nan

Perl treats inf, -inf, and nan as numbers; CLDR has formats for infinity, nan, and the negative sign; so let's format them appropriately.
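A sketch of the formatting step, assuming hypothetical per-locale symbols (the ∞ and NaN values match the infinity and nan examples in the attribute list, and the ru NaN string is не число). A real formatter would also apply the pattern's prefixes and suffixes; this only shows substituting the locale's symbols and returns undef for finite numbers so they can go through normal formatting.

```perl
use strict;
use warnings;

# Hypothetical symbols; real values would come from the CLDR locale data.
my %SYMBOLS = (
    en => { infinity => "\x{221E}", nan => 'NaN', minus_sign => '-' },
    ru => {
        infinity   => "\x{221E}",
        nan        => "\x{43D}\x{435} \x{447}\x{438}\x{441}\x{43B}\x{43E}",
        minus_sign => '-',
    },
);

# Return the localized string for Inf, -Inf, or NaN; undef otherwise.
sub format_special {
    my ($locale, $n) = @_;
    my $sym = $SYMBOLS{$locale} or die "no symbols for $locale";
    return $sym->{nan} if $n != $n;              # NaN is never equal to itself
    my $inf = 9**9**9;                           # overflows to +Inf
    return $sym->{infinity}                      if $n == $inf;
    return $sym->{minus_sign} . $sym->{infinity} if $n == -$inf;
    return undef;                                # finite: use the normal formatter
}
```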
