patch / cldr-number-pm5

Localized number formatters using the Unicode CLDR

Home Page: https://metacpan.org/pod/CLDR::Number

License: Other

Perl 100.00%

unicode cldr i18n perl5

cldr-number-pm5's Introduction

NAME

CLDR::Number - Localized number formatters using the Unicode CLDR

VERSION

This document describes CLDR::Number v0.19, built with Unicode CLDR v29.

SYNOPSIS

use CLDR::Number;
use feature 'say';  # say() requires Perl v5.10 or later

# new object with 'es' (Spanish) locale
my $cldr = CLDR::Number->new(locale => 'es');

# decimals
my $decf = $cldr->decimal_formatter;

# when locale is 'es' (Spanish)
say $decf->format(1234.5);  # '1234,5'

# when locale is 'es-MX' (Mexican Spanish)
say $decf->format(1234.5);  # '1,234.5'

# when locale is 'ar' (Arabic)
say $decf->format(1234.5);  # '١٬٢٣٤٫٥'

# percents
my $perf = $cldr->percent_formatter;

# when locale is 'tr' (Turkish)
say $perf->format(0.05);  # '%5'

# currencies
my $curf = $cldr->currency_formatter(currency_code => 'USD');

# when locale is 'en' (English) and currency is USD (US dollars)
say $curf->format(9.99);  # '$9.99'

# when locale is 'en-CA' (Canadian English) and currency is USD
say $curf->format(9.99);  # 'US$9.99'

# when locale is 'fr-CA' (Canadian French) and currency is USD
say $curf->format(9.99);  # '9,99 $ US'

DEPRECATION

Using the locale method as a setter is deprecated. In the future the object’s locale will become immutable. Please see issue #38 for details and to submit comments or concerns.

DESCRIPTION

Software localization includes much more than just translations. Numbers, prices, and even percents should all be localized based on the user’s language, script, and region. Fortunately, the Unicode Common Locale Data Repository (CLDR) provides locale data and specifications for formatting numeric data to use with many of the world’s locales.

This class provides common attributes shared among the supported formatter classes as well as methods to instantiate decimal, percent, and currency formatter objects. The value of any attribute (such as locale or decimal_sign) is passed to the formatter objects on instantiation but can be overridden by passing another value for the attribute to the formatter method or by calling a setter method on the formatter object.

Methods

decimal_formatter

Returns a decimal formatter, which is a CLDR::Number::Format::Decimal object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

percent_formatter

Returns a percent formatter, which is a CLDR::Number::Format::Percent object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

currency_formatter

Returns a currency formatter, which is a CLDR::Number::Format::Currency object instantiated with all of the attributes from your CLDR::Number object as well as any attributes passed to this method.

Common Attributes

These attributes are shared by this class and all formatter classes. All attributes other than locale, default_locale, and cldr_version have defaults that depend on the current locale. All string attributes are expected to be character strings, not byte strings.

locale

Default: value of default_locale attribute if it exists, otherwise root

Valid: Unicode locale identifiers

Examples: es (Spanish), es-ES (European Spanish), es-419 (Latin American Spanish), zh-Hant (Traditional Chinese), zh-Hans (Simplified Chinese), chr (Cherokee)

The locale is case-insensitive and can use either - (hyphen-minus) or _ (low line) as a separator.

default_locale

Default: none

Valid: Unicode locale identifiers

Use this to fall back to a locale other than the generic root when the locale attribute is not set or not valid.

numbering_system

Valid: currently only decimal numbering systems are supported

Examples: latn (Western Digits), arab (Arabic-Indic Digits), hanidec (Chinese Decimal Numerals), fullwide (Full Width Digits)

In the future, algorithmic numbering systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), and roman (Roman Numerals) will be supported.

The numbering system may alternatively be provided as a Unicode locale extension subtag. For example, the locale ja-u-nu-fullwide specifies the Japanese language (ja) with the numbering system (nu) set to Full Width Digits (fullwide).
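To illustrate how such a locale identifier breaks down, here is a minimal sketch that normalizes separators and case (the locale is case-insensitive and accepts - or _) and then pulls out the value of the nu keyword. The function name parse_numbering_system is hypothetical and not part of CLDR::Number; it lowercases every subtag and ignores all other extensions, which a real parser would not do.

```perl
use strict;
use warnings;

# Hypothetical helper: split a locale such as "ja-u-nu-fullwide" into its
# base locale and the value of the "nu" (numbering system) keyword.
# Sketch only: everything is lowercased and other extensions are ignored.
sub parse_numbering_system {
    my ($locale) = @_;
    (my $norm = lc $locale) =~ tr/_/-/;    # case-insensitive; "_" or "-" separators
    my ($base, $ext) = split /-u-/, $norm, 2;
    my $nu;
    if (defined $ext) {
        my @subtags = split /-/, $ext;
        for my $i (0 .. $#subtags - 1) {
            if ($subtags[$i] eq 'nu') {
                $nu = $subtags[$i + 1];    # the keyword's value follows the key
                last;
            }
        }
    }
    return ($base, $nu);
}

my ($base, $nu) = parse_numbering_system('ja-u-nu-fullwide');
print "$base / $nu\n";   # ja / fullwide
```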

decimal_sign

Examples: . (full stop) for root, en; , (comma) for de, fr

group_sign

Examples: , (comma) for root, en; . (full stop) for de; U+00A0 (no-break space) for fr

plus_sign

Examples: + (plus sign) for root, en, and most locales

minus_sign

Examples: - (hyphen-minus) for root, en, and most locales

infinity

Examples: ∞ (infinity) for root, en, and almost all locales

nan

Examples: NaN for root, en, and most locales; many other variations for individual locales like не число for ru and 非數值 for zh-Hant

cldr_version

Value: 29

This is a read-only attribute that always reflects the currently supported Unicode CLDR version.

NOTES

The Unicode private-use characters U+F8F0 through U+F8F4 are used internally and are therefore not supported in custom patterns and signs.

AUTHOR

Nova Patch <[email protected]>

This project is brought to you by Shutterstock. Additional open source projects from Shutterstock can be found at code.shutterstock.com.

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Unicode is a registered trademark of Unicode, Inc., in the United States and other countries.

cldr-number-pm5's People

redhotpenguin syspete d-e-f-e-a-t

cldr-number-pm5's Issues

move tests to Test::Class

integrate repo with Coveralls

https://coveralls.io/r/perl-cldr/cldr-number-pm5

escaped quoting bug in Perl v5.8.8

Escaped quotes are being returned in formats as \xF7\xB0\x80\x84 (UTF-8-encoded \x{1F0004}) instead of the proper '. This happens in both CPAN Testers reports for Perl v5.8.8 and in no other versions. The other reports from v5.8.x are for v5.8.5 and v5.8.9, which do not have this problem.

Here are the related CPAN Testers’ reports:

Here are the three failing tests, which are the same in both reports:

#   Failed test 'single quote itself'
#   at t/from_uts35.t line 57.
#          got: '1 oÃ·Â°Â€Â„clock'
#     expected: '1 o'clock'
# Looks like you failed 1 test of 41.
t/from_uts35.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/41 subtests 

#   Failed test at t/quoting.t line 16.
#          got: 'Ã·Â°Â€Â„123Ã·Â°Â€Â„'
#     expected: ''123''

#   Failed test at t/quoting.t line 17.
#          got: '#Ã·Â°Â€Â„#'
#     expected: '#'#'
# Looks like you failed 2 tests of 7.
t/quoting.t ........... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests

We need plurals data from CLDR.

other numbering systems (native, traditional, finance)

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

locale subtag attributes for use without locale attribute parsing

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

Moo::Role-related bug in Perl 5.8.1 through 5.8.3

Most releases of CLDR::Number have had inconsistent but common Moo::Role-related test failures in Perl v5.8.1 through v5.8.3. The oldest version of Perl that has not been known to have this problem is v5.8.4, although there are very few reports on that version.

We should either figure out the problem and fix it, or raise the minimum version of Perl from v5.8.1 (September 2003) to v5.8.4 (April 2004), which I would not be against.

Test reports:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.19

Typical output:

Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Can't locate object method "is_role" via package "Moo::Role" at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
BEGIN failed--compilation aborted at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number/Role/Format.pm line 13.
Compilation failed in require at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Module/Runtime.pm line 313.
Compilation failed in require at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number.pm line 32.

create package for common constants

So we can define $N, $P, $C, $M, and $Q all in one place.

currency spacing rules

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

fix failing tests from CPAN Testers reports

We've been getting a lot of reports like this with 3 failing tests since the first CPAN upload this morning:
http://www.cpantesters.org/cpan/report/f66b08b0-6612-11e3-8a8a-6b1ebd322218

In fact, they all seem to be failing:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.00_02

load locale data for each locale from a different module

Suggested by @aarondcohen:

[13:43] Aaron Cohen: as an added benefit, you could break CLDR::Number::Data::* up by locale
[13:43] Aaron Cohen: so people would load less into memory if they aren't using the other locales

Although the number-related locale data is relatively small per locale, the aggregate is increasingly large with each CLDR release. Another idea is to remove any data from a locale that is the same as what would already be inherited.

round half-even with rounding increment

By default we use Math::BigFloat for rounding and round in half-even mode. If a rounding increment greater than 1 is provided in the pattern or via the rounding_increment attribute, we instead use Math::Round::nearest because it supports rounding increments; however, it does not support half-even rounding, which I believe we should be performing along with rounding increments. We need to investigate alternatives and possibly ask for clarification on the CLDR mailing list.
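The behavior the issue asks for, rounding to the nearest multiple of an increment while breaking exact ties toward the even multiple, can be sketched by dividing by the increment, rounding the quotient half-even, and multiplying back. This is a floating-point sketch and the helper name is hypothetical; it is only exact for values representable in binary (such as the halves and quarters below), which is one reason a real implementation would keep doing the arithmetic in Math::BigFloat.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: round $x to the nearest multiple of $increment,
# breaking exact ties toward the even multiple (half-even).
# Floating-point sketch: exact only for binary-representable inputs.
sub round_half_even_to_increment {
    my ($x, $increment) = @_;
    my $q    = $x / $increment;     # position measured in increments
    my $lo   = floor($q);
    my $frac = $q - $lo;
    my $n = $frac < 0.5    ? $lo
          : $frac > 0.5    ? $lo + 1
          : $lo % 2 == 0   ? $lo        # exact tie: keep the even multiple
          :                  $lo + 1;
    return $n * $increment;
}

print round_half_even_to_increment(12.5, 5), "\n";      # 10 (2.5 increments -> 2)
print round_half_even_to_increment(17.5, 5), "\n";      # 20 (3.5 increments -> 4)
print round_half_even_to_increment(0.625, 0.25), "\n";  # 0.5
```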

improve docs for a broader audience

tl;dr: Let’s improve the docs! Please add doc requests or suggestions in the comments here.

The first goal of this project was to implement the standardized Unicode CLDR–based localized number formatting defined in UTS #35, Part 3: Numbers. Much of the CLDR::Number documentation, however, does not go into detail to describe functionality to developers without existing familiarity with the CLDR. This project shouldn’t require external knowledge in order to use it. One problem is that it allows for a lot of advanced customization that most developers will never need to use or know about when they can instead depend on the defaults provided for the requested locale (and currency for prices). Perhaps the docs should be split into 100% self-contained intro-level with more examples, and advanced-level with all the gritty options and external references. These days I write much more documentation for developers than actual coding, and while I have less time for maintaining my CPAN modules, I’d like to commit some time to improve these docs.

Thanks to @Ovid for bringing this to my attention:

upgrade to CLDR v29

The cldr29 branch was generated with the CLDR v29-beta1:
https://github.com/patch/cldr-number-pm5/compare/cldr29

Everything looks good and no tests were broken. When CLDR v29 is officially released, we can regenerate, document in Changes, and release to CPAN.

preparsed patterns for predefined locales

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

use Math::BigFloat as much as possible

We’re already using Math::BigFloat in most situations for rounding using the round_mode and ffround methods. Let’s continue to use it for any functionality we can, replacing existing code in CLDR::Number: is_nan, is_inf, is_pos, is_neg, etc.

locales should inherit from defined parent locales when available

Right now, the inheritance works like zh-Hant-MO → zh-Hant → zh → root, but Part 1 Core §4.1.1 Parent Locales defines exceptions in the LDML for different parents.

For example:

 <parentLocale parent="zh_Hant_HK" locales="zh_Hant_MO"/>

This would modify the inheritance to zh-Hant-MO → zh-Hant-HK → zh-Hant → zh → root.

Others are defined with a parent of root to skip normal steps altogether. The most notable problem with the current inheritance is that es-US (US Spanish), es-MX (Mexican Spanish), es-CR (Costa Rican Spanish), etc., inherit directly from es (European Spanish) instead of es-419 (Latin American Spanish).
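The proposed inheritance can be sketched as: consult an explicit parent-locale table first, otherwise drop the last subtag, and stop at root. The table below is a small hypothetical excerpt that mirrors the examples in this issue; the real list comes from CLDR's supplemental data and is much longer. An entry whose parent is root works without special handling, because the loop stops as soon as it reaches root.

```perl
use strict;
use warnings;

# Hypothetical excerpt of parent-locale overrides, mirroring this issue's examples.
my %parent_locale = (
    'zh-Hant-MO' => 'zh-Hant-HK',
    'es-MX'      => 'es-419',
    'es-US'      => 'es-419',
    'es-CR'      => 'es-419',
);

# Build the fallback chain for a locale, ending at root.
sub fallback_chain {
    my ($locale) = @_;
    my @chain = ($locale);
    while ($locale ne 'root') {
        if (exists $parent_locale{$locale}) {
            $locale = $parent_locale{$locale};   # explicit parent wins
        }
        elsif ($locale =~ /-/) {
            $locale =~ s/-[^-]+\z//;             # otherwise drop the last subtag
        }
        else {
            $locale = 'root';                    # bare language falls back to root
        }
        push @chain, $locale;
    }
    return @chain;
}

print join(' -> ', fallback_chain('zh-Hant-MO')), "\n";
# zh-Hant-MO -> zh-Hant-HK -> zh-Hant -> zh -> root
print join(' -> ', fallback_chain('es-MX')), "\n";
# es-MX -> es-419 -> es -> root
```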

parsed pattern caching

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

'accounting' currency format in addition to default 'standard'

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

significant digits

Add the significant_digits attribute, the @ symbol in patterns, and associated functionality described in UTS #35, Part 3, §3.5.
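As a rough illustration of the rounding half of this feature, a number can be rounded to a maximum number of significant digits with sprintf's %g conversion and then numified. This is only a sketch with a hypothetical helper name: it does not handle minimum significant digits (the trailing zeros a pattern such as @@@ would display), and ties are resolved by the C library on the binary double, not necessarily half-even.

```perl
use strict;
use warnings;

# Hypothetical helper: round $x to at most $n significant digits.
# "%.*g" takes the precision from the argument list; adding 0 turns
# results such as "1.23e+04" back into a plain number.
sub round_significant {
    my ($x, $n) = @_;
    die "need at least 1 significant digit\n" if $n < 1;
    return 0 + sprintf('%.*g', $n, $x);
}

print round_significant(12345, 3), "\n";      # 12300
print round_significant(0.0012345, 2), "\n";  # 0.0012
```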

superscripting exponent format for scientific notation

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

remove Math::BigFloat for Inf/NaN checking

We started using Math::BigFloat in CLDR::Number v0.14 [issue #45] to check for infinity, NaN, and negatives, but this addition has created many failing test reports:

http://matrix.cpantesters.org/?dist=CLDR-Number+0.14

It turns out that Perl 5.22 overhauled infinity and NaN values to be more consistent across platforms and operations, including stringifying to Inf and NaN instead of the previous inf and nan; however, Math::BigFloat doesn’t understand those titlecased values and treats them both as NaN. We’re better off performing the checks ourselves for now, as well as submitting an issue for the Math::BigInt project.
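Performing the checks ourselves can rely purely on numeric comparisons, which sidesteps the inf/Inf and nan/NaN stringification change entirely. A minimal sketch, assuming an IEEE 754 platform (see the separate issue about perls without inf/nan support); the function names match those listed in the "use Math::BigFloat as much as possible" issue, but these bodies are illustrative and not CLDR::Number's code.

```perl
use strict;
use warnings;

# 9**9**9 overflows to +infinity on IEEE 754 platforms (a common Perl idiom).
my $INF = 9**9**9;

# Numeric checks only, so they do not depend on how this perl stringifies
# infinity and NaN.
sub is_nan { my ($n) = @_; return $n != $n }          # NaN never equals itself
sub is_inf { my ($n) = @_; return abs($n) == $INF }   # +Inf or -Inf
sub is_neg { my ($n) = @_; return $n < 0 }            # false for NaN

for my $n (42, -$INF, $INF - $INF) {
    printf "nan=%d inf=%d neg=%d\n",
        is_nan($n) ? 1 : 0, is_inf($n) ? 1 : 0, is_neg($n) ? 1 : 0;
}
```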

To: cldr-users
Subject: Preliminary JSON available for release 28
From: John Emmons
Date: Tue, 1 Sep 2015 00:09:28 -0500

A preliminary version of the JSON for the upcoming CLDR release 28 is now
available on github for testing. Please see
https://github.com/unicode-cldr/cldr-json for details. Any errors or
omissions should be reported via CLDR trac by filing a new ticket at
http://unicode.org/cldr/trac/newticket

support CLDR v27

CLDR v25 was released today:
http://unicode-inc.blogspot.com/2014/03/cldr-version-25-released.html

The changes are primarily structural in nature and very few of these changes affect numbers, while none of these structural changes affect the implemented portions of CLDR::Number.

Here are the locale data changes that affect us:

new locales fy (West Frisian), fy-NL, ug (Uyghur), ug-Arab, ug-Arab-CN, prg (Prussian)
data improvements for official languages
number symbol fixes

Additionally there is "Better locale matching, with better fallbacks; likely subtags for regions; added scripts for various languages" but our locale matching and fallbacks were already rather minimal. We should obviously use the new version when implementing matching/fallback improvements.

deprecate mutable locales

The locale attribute being mutable has caused additional code, complexity, and bugs. The problem is that it is a rw attribute that sets a dozen or so other rw attributes. It's difficult to maintain these inherited attributes that should be lazy, publicly writable, and change based on changes to locale. The solution is to change locale from rw to ro. This is backward-incompatible, but there are no known real-world uses of a mutable locale other than convenience in unit tests and examples.

Publicly announce upcoming deprecation of the locale method used as a setter and request feedback.
Document the deprecation in the next release of CLDR::Number.
Warn when mutating the locale in a further release.
Finally, change the locale from rw to ro and remove related code.

Comments and suggestions highly appreciated!

currency symbol lengths

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

validate method arguments

Consider using Params::Validate. See also #22 for handling undef.

support spelled-out currencies

Add support for spelled-out currencies using the unitPattern and displayName with a count attribute. For example, 5000 JPY (Japanese Yen) in ja (Japanese) would be 5,000 円 (as opposed to ￥5,000), which uses the unitPattern {0} {1} and displayName 円 with the count other. See UTS #35, Part 3, §4: Currencies for details.
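Once the plural count and displayName have been selected, applying the unitPattern itself is a placeholder substitution: {0} is the formatted number and {1} is the currency name. A minimal sketch with a hypothetical helper name; it uses a single substitution pass so a replaced value is never substituted again, and it does not cover plural-category selection.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a unitPattern such as "{0} {1}" with the
# formatted number ({0}) and the currency displayName ({1}).
sub apply_unit_pattern {
    my ($pattern, $number, $display_name) = @_;
    my %arg = (0 => $number, 1 => $display_name);
    $pattern =~ s/\{([01])\}/$arg{$1}/g;   # one pass: values are not rescanned
    return $pattern;
}

binmode STDOUT, ':encoding(UTF-8)';
# ja, JPY, count "other": displayName U+5186
print apply_unit_pattern('{0} {1}', '5,000', "\x{5186}"), "\n";   # 5,000 円
```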

Review the ICU API for this feature and determine what attribute should be used to enable it. Also consider how to best store and load the data because it will take much more memory than the other currency data.

This feature has been requested by users.

Using Locale::CLDR corrupts CLDR::Number

In one project I have been using CLDR::Number for quite some time to format numbers in the right locale.

Now I want to use Locale::CLDR to get country names in the correct language. However, as soon as I use Locale::CLDR, formatting an integer number via CLDR::Number fails with the message:
Can't locate object method "ffround" via package "Math::BigInt" at <path_to>/perllib/CLDR/Number/Role/Format.pm line 260

I can easily reproduce this using the following script:

#!/usr/bin/perl

use strict;

use CLDR::Number;
use Locale::CLDR;

my $cldr = CLDR::Number->new(locale => 'en');
my $formatter = $cldr->decimal_formatter(minimum_fraction_digits => 2, maximum_fraction_digits => 2);
print 'Success: ', $formatter->format(15.23), "\n";
print 'Fail: ', $formatter->format(42.0), "\n";

Here, the formatting of the number 42 will fail with the indicated message. As soon as I remove the line use Locale::CLDR, the formatting works as expected.

Do you know why using Locale::CLDR causes CLDR::Number to break? I know that the latter is a somewhat older module, but I do not want to let go of it. If there is a more up-to-date module with a similar interface as CLDR::Number, then I will definitely check it out.

handle undef as method argument

Handle undef by warning and returning undef like core Perl functions.
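The proposed behavior could look like the following sketch. format_or_undef is a hypothetical wrapper, not a CLDR::Number method, and it takes any code reference as the formatter so the example stays self-contained; given undef, it warns from the caller's perspective via Carp and returns undef instead of formatting a bogus value.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical wrapper: warn and return undef when given undef,
# otherwise delegate to the supplied formatter code ref.
sub format_or_undef {
    my ($format, $number) = @_;
    if (!defined $number) {
        carp 'Use of uninitialized value in format';
        return undef;
    }
    return $format->($number);
}

my $two_places = sub { sprintf '%.2f', shift };
print format_or_undef($two_places, 3.14159), "\n";   # 3.14
```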

support different rounding modes

As per the CLDR spec (see below), default rounding is half-even. There is no current way to change the rounding mode. We use Math::BigFloat, which supports the following modes: even, odd, +inf, -inf, zero, trunc, common. Let's add a rounding_mode attribute and decide if we should use the same modes and names as Math::BigFloat.

An implementation may allow the specification of a rounding mode to determine how values are rounded. In the absence of such choices, the default is to round "half-even", as described in IEEE arithmetic. That is, it rounds towards the "nearest neighbor" unless both neighbors are equidistant, in which case, it rounds towards the even neighbor. Behaves as for round "half-up" if the digit to the left of the discarded fraction is odd; behaves as for round "half-down" if it's even. Note that this is the rounding mode that minimizes cumulative error when applied repeatedly over a sequence of calculations.
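A rounding_mode attribute mostly changes what happens at an exact tie. The sketch below rounds to an integer and delegates only the tie case to a per-mode rule. The mode names follow the Math::BigFloat list quoted in this issue, but the behavior assigned to each name is an assumption of the sketch and should be checked against the Math::BigFloat documentation; trunc is left out because it is not a tie-breaking rule.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Tie-breaking rules for a value exactly halfway between $lo and $lo + 1.
# Mode semantics here are assumptions, not Math::BigFloat's definitions.
my %tie_break = (
    even   => sub { my ($lo) = @_; $lo % 2 == 0 ? $lo : $lo + 1 },  # half-even (CLDR default)
    odd    => sub { my ($lo) = @_; $lo % 2 != 0 ? $lo : $lo + 1 },  # half-odd
    '+inf' => sub { my ($lo) = @_; $lo + 1 },                       # half toward +infinity
    '-inf' => sub { my ($lo) = @_; $lo },                           # half toward -infinity
    zero   => sub { my ($lo) = @_; $lo < 0 ? $lo + 1 : $lo },       # half toward zero
    common => sub { my ($lo) = @_; $lo < 0 ? $lo : $lo + 1 },       # half away from zero
);

# Round $x to an integer, using $mode only to break exact ties.
sub round_with_mode {
    my ($x, $mode) = @_;
    die "unknown rounding mode '$mode'\n" unless exists $tie_break{$mode};
    my $lo   = floor($x);
    my $frac = $x - $lo;
    return $lo     if $frac < 0.5;
    return $lo + 1 if $frac > 0.5;
    return $tie_break{$mode}->($lo);
}

for my $mode (qw(even odd +inf -inf zero common)) {
    printf "%-6s  2.5 -> %2d   -2.5 -> %2d\n",
        $mode, round_with_mode(2.5, $mode), round_with_mode(-2.5, $mode);
}
```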

Tests fail (with latest Moo?)

There are new test failures — see http://www.cpantesters.org/cpan/report/487b0514-e060-11e5-a971-eac272d7c31d for a sample.

Statistical analysis from test failures generated on my machine suggests that the problem is caused by the latest Moo (negative theta is bad):

****************************************************************
Regression 'mod:Moo'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    30849180474401392.00
[1='eq_1.007000']             0.0000          0.0000       0.00
[2='eq_2.000001']             0.0000          0.0000       1.98
[3='eq_2.000002']             0.0000          0.0000       3.36
[4='eq_2.001000']            -1.0000          0.0000    -28977759259709780.00

R^2= 1.000, N= 74, K= 5
****************************************************************

format lengths (full, long, medium, short, narrow)

CLDR::Number::Role::Base already has the length attribute, which is not currently used. Valid lengths are full, long, medium, short, and narrow.

The desired functionality is described in UTS #35:

We should create a new test file: t/length.t

infinity and NaN are not supported by all perls

Some older perls on some systems don’t support inf and nan. Here are a few failing test reports from CLDR::Number v0.12.

I think we should just test for support in the test file t/inf-nan.t and skip with a diag warning when not supported, as well as documenting that the feature depends on perl’s support for the given system.

maximum integer digits

Implement the functionality supplied by the maximum_integer_digits attribute, which already exists as a stub. There doesn’t appear to be a symbol associated with this.

UTS #35, Part 3, §3.3:

If the number of actual integer digits exceeds the maximum integer digits, then only the least significant digits are shown. For example, 1997 is formatted as 97 if the maximum integer digits is set to 2.
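Applied to the digits of the integer part, the rule quoted above amounts to keeping the rightmost digits. A minimal sketch with a hypothetical helper name; rejecting a maximum below 1 is an assumption of this sketch (among other things, substr with an offset of -0 would return the whole string).

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the $max least significant integer digits.
sub apply_max_integer_digits {
    my ($int_digits, $max) = @_;
    die "maximum_integer_digits must be at least 1\n" if $max < 1;   # sketch assumption
    return length($int_digits) > $max ? substr($int_digits, -$max) : $int_digits;
}

print apply_max_integer_digits('1997', 2), "\n";   # 97
```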

change internal placeholder non-Unicode codepoints to PUA

Change non-Unicode codepoints to Private Use Area codepoints. These are internally used as placeholders. We're currently using U+1F0000, U+1F0001, U+1F0002, U+1F0003, and U+1F0004, but this caused bug #20, which required a hacky workaround.
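To show the placeholder technique with a PUA code point, here is a sketch that resolves quoting in a pattern literal: '' stands for a literal apostrophe and 'text' marks a quoted run. Unlike U+1F0000, which lies above U+10FFFF, a code point such as U+F8F0 is a valid Unicode character. The NOTES section says U+F8F0 through U+F8F4 are used internally, but which of them stands for a quote is an assumption here, and unquote_literal is a hypothetical helper that handles only these simple cases.

```perl
use strict;
use warnings;

# Placeholder for an escaped quote while the pattern is processed
# (PUA code point; the specific choice is an assumption of this sketch).
my $Q = "\x{F8F0}";

# Hypothetical helper: '' is a literal apostrophe, 'text' is a quoted run.
sub unquote_literal {
    my ($pattern) = @_;
    $pattern =~ s/''/$Q/g;          # 1. protect escaped quotes
    $pattern =~ s/'([^']*)'/$1/g;   # 2. remove quotes around literal runs
    $pattern =~ s/$Q/'/g;           # 3. restore escaped quotes
    return $pattern;
}

print unquote_literal("o''clock"), "\n";   # o'clock
print unquote_literal("'#'#"), "\n";       # ##
```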

quiet down expected warnings in tests

Use Test::Warnings so we don't actually warn to STDERR while running tests.

Here's the only current problem:

t/inheritance.t ....... ok
default_locale 'xx' is unknown at (eval 36) line 44.

add FAQ about fallback for nonexistent locales

Users occasionally report that the wrong formatting is used for several nonexistent locales including Mexican English (en-MX) and Brazilian Spanish (es-BR). We should document that since these locales don’t exist, they fall back to English (en) and Spanish (es), respectively.

Note that I also plan on bringing up the issue to the CLDR Technical Committee that es-XX, where XX is any country within Latin America (419) should fall back to es-419 even if es-XX is not a valid locale. This would, however, require a new structure added to the LDML spec unless a locale was created for each combination of es with each remaining country within 419.

integrate repo with Travis CI

https://travis-ci.org/perl-cldr/cldr-number-pm5

add algorithmic (non-decimal) numbering systems

We now support non-Latin (latn) numbering systems, but only decimal systems, not algorithmic systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), roman (Roman Numerals), etc.

write FAQ

Started this in commit fdc3e57.

number parsers

Add number parsers under CLDR::Number::Parse as described in UTS #35, Part 3, §7.

default numbering systems

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

add minimum grouping digits

Minimum grouping digits were added to the spec in CLDR v26 (#33). LDML stores the related value as minimumGroupingDigits and we should add the minimum_grouping_digits attribute.

http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Elements

The minimumGroupingDigits can be used to suppress groupings below a certain value. This is used for languages such as Polish, where one would only write the grouping separator for values above 9999. The minimumGroupingDigits contains the default for the locale.

http://cldr.unicode.org/translation/numbering-systems

In some languages, the grouping separator is suppressed in certain cases. For example, see china-auf-wachstumskurs.gif, where there is a grouping separator in 12 080 but not in 4720. The minimumGroupingDigits determines what the default for a locale is.
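One reading of minimumGroupingDigits that matches the Polish example is: apply grouping only when the integer part has at least (group size + minimumGroupingDigits) digits. With group size 3 and a minimum of 2, 4720 stays ungrouped while 12080 becomes 12 080. A minimal sketch with a hypothetical helper name; it assumes a single group size of 3 (no secondary grouping such as #,##,##0) and uses a plain space in place of the locale's actual group sign.

```perl
use strict;
use warnings;

# Hypothetical helper: group a string of integer digits with $sep, but only
# when it has at least $group_size + $min_grouping digits.
sub group_integer {
    my ($digits, $sep, $min_grouping) = @_;
    my $group_size = 3;   # assumption: one fixed group size
    return $digits if length($digits) < $group_size + $min_grouping;
    # Repeatedly split off the rightmost ungrouped run of $group_size digits.
    1 while $digits =~ s/^(\d+)(\d{$group_size})/$1$sep$2/;
    return $digits;
}

print group_integer('4720',  ' ', 2), "\n";   # 4720
print group_integer('12080', ' ', 2), "\n";   # 12 080
print group_integer('4720',  ',', 1), "\n";   # 4,720
```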

possible Moo v1.000006 / v1.000007 bug

Three CPAN Testers’ reports are reporting massive test failures that may be related to Moo v1.000006 and v1.000007. More investigation is needed.

Here are the related CPAN Testers’ reports:

non-Latin (latn) numbering systems (thai, geor, hant, etc.)

May be easier to start out with numbering systems that have a @type of numeric as well as a value for @digits.

Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod

format inf, -inf, and nan

Perl treats inf, -inf, and nan as numbers; CLDR has formats for infinity, nan, and the negative sign; so let's format them appropriately.
