mike-fabian / langtable
License: GNU General Public License v3.0
It's documented in the README but doesn't seem to exist?
Note also that the 'test' target doesn't work on an unpacked release archive, since it depends on gzip and tries to compress the .xml files, which aren't included in the tarball.
I couldn't find any Norwegian data in langtable?
Is that expected or surprising?
$ python3
Python 3.9.5 (default, May 14 2021, 00:00:00)
[GCC 11.1.1 20210428 (Red Hat 11.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import langtable
>>> langtable.language_name(languageId='ks')
'کٲشُر'
>>> langtable.language_name(languageId='ks_Deva')
'कॉशुर'
>>> langtable.language_name(languageId='fr', languageIdQuery='ks')
'فرینچ'
>>> langtable.language_name(languageId='fr', languageIdQuery='ks_Deva')
'فرینچ'
>>>
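The last two results are in Arabic script even though `ks_Deva` was requested. A rough check like the following could turn this session into a regression test. It labels each letter with the first word of its Unicode character name, which is only a heuristic stand-in for a proper Unicode Script property lookup, and the helper name `scripts_used` is made up for this sketch.

```python
import unicodedata
from typing import Set


def scripts_used(text: str) -> Set[str]:
    """Rough script labels for the letters in text.

    Heuristic: the first word of each letter's Unicode name,
    e.g. "ARABIC LETTER FEH" -> "ARABIC".
    """
    labels: Set[str] = set()
    for ch in text:
        if ch.isalpha():  # skips combining vowel signs and other non-letters
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels
```

Given the output above, `assert scripts_used(langtable.language_name(languageId='fr', languageIdQuery='ks_Deva')) == {'DEVANAGARI'}` would currently fail.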
The South African languages Northern Sotho (nso), Tswana (tn) and Venda (ve) need the South African keyboard layout ("za" - a variant of en_US).
The default keyboard layout in Chinese (Taiwan) is actually "English (US)" layout.
People here don't know what a "cn" keyboard or a "zh" language means for a keyboard. What users know is that they are using an "English (US)" keyboard, even though the "cn" layout is the same as the "English (US)" layout.
Linux installers such as Anaconda in Fedora use langtable to decide the default keyboard layout for the user's region, and Anaconda automatically selects the "cn" keyboard for Chinese (Taiwan) users.
However, users in Chinese (Taiwan) typically use input methods (ibus-libzhuyin or ibus-chewing) to type Chinese characters (Hanzi), and switch back to the "English (US)" keyboard to type English.
The table lists the default keyboard layout for Chinese (Taiwan) as the "cn" keyboard (or the "zh" language), which confuses people. People here read "cn" or "zh" as a Chinese input keyboard and assume it is the default input method for Chinese. As a result, people keep complaining that there is no "en" keyboard for typing English and that the "cn" keyboard cannot type Chinese characters. People are used to having the "English (US)" keyboard as the default layout plus some Chinese input methods for typing Chinese. The Chinese (Taiwan) locale on Windows and Mac also gets the "English (US)" keyboard and some Chinese input methods by default after installation.
Macrolanguage subtag "zh" (Chinese) is used instead of "cmn" (Mandarin)
source: http://cldr.unicode.org/index/cldr-spec/picking-the-right-language-code#TOC-Caution-
So, I think the following part should be removed from data/languages.xml
<language>
<languageId>cmn</languageId>
...
</language>
The `list_keyboards` function in langtable.py has a variable called `skipTerritory` that gets set if the language DB has an entry for the combined languageId, scriptId and territoryId, or for the combined languageId and territoryId, but it is never used. We always take the `if territoryId in _territories_db:` path, regardless of whether `skipTerritory` is `True` or `False`.
I'd send a PR for this, but I'm not sure whether the desired fix would be to skip the territory path if `skipTerritory` is `True`, or to remove the setting of `skipTerritory` entirely.
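To illustrate the first option, here is a toy model of the lookup. It is a sketch, not langtable's code: the two dictionaries are made-up stand-ins for the language and territory databases, ranks are simply summed, and the languageId_scriptId_territoryId key is left out for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical toy data, NOT langtable's real databases:
# key -> {keyboard layout: rank}
LANGUAGE_DB: Dict[str, Dict[str, int]] = {
    "ar": {"ara": 1000},
    "ar_MA": {"ma": 1000},  # combined languageId_territoryId entry
}
TERRITORY_DB: Dict[str, Dict[str, int]] = {
    "MA": {"ma": 900, "fr": 100},
    "IN": {"in(eng)": 500},
}


def list_keyboards_sketch(language_id: str, territory_id: Optional[str] = None) -> List[str]:
    ranks: Dict[str, int] = {}
    skip_territory = False
    combined = f"{language_id}_{territory_id}" if territory_id else None
    if combined and combined in LANGUAGE_DB:
        # The language DB already has a territory-specific entry,
        # so the generic territory data is not needed.
        skip_territory = True
        entries = LANGUAGE_DB[combined]
    else:
        entries = LANGUAGE_DB.get(language_id, {})
    for layout, rank in entries.items():
        ranks[layout] = ranks.get(layout, 0) + rank
    # Option 1: honour skip_territory instead of always taking this path.
    if not skip_territory and territory_id in TERRITORY_DB:
        for layout, rank in TERRITORY_DB[territory_id].items():
            ranks[layout] = ranks.get(layout, 0) + rank
    # Highest combined rank first; sorted() is stable, so ties keep insertion order.
    return sorted(ranks, key=lambda layout: -ranks[layout])
```

With the flag honoured, `list_keyboards_sketch("ar", "MA")` returns only `['ma']`; without the check, the territory data would add `'fr'` as well.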
Currently, `in(eng)` is the first choice for the ar_IN locale:
$ python3
Python 3.10.6 (main, Aug 2 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import langtable
>>> for lang in ("ar_AE", "ar_BH", "ar_DZ", "ar_EG", "ar_IN", "ar_IQ", "ar_JO", "ar_KW", "ar_LB", "ar_LY", "ar_MA", "ar_OM", "ar_QA", "ar_SA", "ar_SD", "ar_SS", "ar_SY", "ar_TN", "ar_YE"):
...     print(f'Layout for {lang} is {langtable.list_keyboards(languageId=lang)}')
...
Layout for ar_AE is ['ara']
Layout for ar_BH is ['ara']
Layout for ar_DZ is ['ara(azerty)']
Layout for ar_EG is ['ara']
Layout for ar_IN is ['in(eng)', 'ara', 'ara(azerty)', 'iq', 'ma', 'sy']
Layout for ar_IQ is ['iq']
Layout for ar_JO is ['ara']
Layout for ar_KW is ['ara']
Layout for ar_LB is ['ara']
Layout for ar_LY is ['ara']
Layout for ar_MA is ['ma']
Layout for ar_OM is ['ara']
Layout for ar_QA is ['ara']
Layout for ar_SA is ['ara']
Layout for ar_SD is ['ara']
Layout for ar_SS is ['ara']
Layout for ar_SY is ['sy']
Layout for ar_TN is ['ara']
Layout for ar_YE is ['ara']
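A check along these lines could catch this kind of regression for all the Arabic locales at once. It is a sketch: the set of layouts counted as "Arabic" is taken from the output above and is an assumption, and the function name is made up.

```python
from typing import Callable, Dict, Iterable, List

# Layouts counted as "Arabic" for this check (assumption, taken from the output above).
ARABIC_LAYOUTS = {"ara", "ara(azerty)", "iq", "ma", "sy"}


def locales_with_non_arabic_default(
    locales: Iterable[str],
    list_keyboards: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Map each locale whose top-ranked layout is not an Arabic layout to that layout."""
    offenders: Dict[str, str] = {}
    for loc in locales:
        layouts = list_keyboards(loc)
        if layouts and layouts[0] not in ARABIC_LAYOUTS:
            offenders[loc] = layouts[0]
    return offenders
```

Against the real library it could be called as `locales_with_non_arabic_default(locales, lambda loc: langtable.list_keyboards(languageId=loc))`; with the results shown above it would report only ar_IN.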
Dear maintainer,
The bug can be summarised in a few lines of code:
$ python3
>>> import langtable
>>> langtable.list_locales(languageId='eo')
[]
>>> exit()
$ locale -a | grep eo
eo
The expected result is ['eo.UTF-8'].
Tested on Fedora 29 and Fedora Rawhide. Related to https://bugzilla.redhat.com/show_bug.cgi?id=1652708.
See also https://sourceware.org/bugzilla/show_bug.cgi?id=23857 for issues surrounding Esperanto/glibc, and a possible cause for the issue.
Thanks!
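For comparison with what glibc actually installs, the output of `locale -a` can be filtered by language code. This is a minimal sketch: the helper name `locales_for_language` is made up, and it only splits on the `_`, `.` and `@` separators of glibc locale names.

```python
import re
from typing import Iterable, List


def locales_for_language(locale_a_lines: Iterable[str], language_id: str) -> List[str]:
    """Return the installed locale names whose language part equals language_id."""
    result: List[str] = []
    for line in locale_a_lines:
        name = line.strip()
        # "eo" -> "eo", "en_US.utf8" -> "en", "sr_RS@latin" -> "sr"
        language = re.split(r"[_.@]", name, maxsplit=1)[0]
        if name and language == language_id:
            result.append(name)
    return result
```

For example, pass it `subprocess.run(["locale", "-a"], capture_output=True, text=True).stdout.splitlines()` and compare the result with `langtable.list_locales(languageId='eo')`.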
Hello Mike, could you please extend the API with a counterpart to `list_common_languages`, a new function that would return the underlying locales?
CC @OndrejZobal
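Until such a function exists, a caller could approximate it by mapping each language id to its best locale. This is a hypothetical sketch (`common_locales` is not a langtable function); it assumes the locale lister returns its results best-first, and it leaves open exactly how the list of language ids is obtained from `list_common_languages`.

```python
from typing import Callable, Iterable, List


def common_locales(
    language_ids: Iterable[str],
    list_locales: Callable[[str], List[str]],
) -> List[str]:
    """Return the top-ranked locale for each language id, skipping languages without one."""
    result: List[str] = []
    for language_id in language_ids:
        candidates = list_locales(language_id)
        if candidates:  # e.g. an empty list, as in the 'eo' report above
            result.append(candidates[0])
    return result
```

With langtable, `list_locales` could be `lambda lang: langtable.list_locales(languageId=lang)`.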
Hi Mike,
I'm happy to see your tool; it seems very valuable for French users :)
I discovered it through this comment:
https://bugzilla.redhat.com/show_bug.cgi?id=485137#c20
Must the territory ranks sum to 100?
If so, note that in this case the sum is 960.
https://github.com/mike-fabian/langtable/blob/master/data/keyboards.xml#L434
<language><languageId>fr</languageId><rank>1000</rank></language>
</languages>
<territories>
<territory><territoryId>FR</territoryId><rank>900</rank></territory>
<territory><territoryId>LU</territoryId><rank>50</rank></territory>
<territory><territoryId>SN</territoryId><rank>10</rank></territory>
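To make the arithmetic explicit, the territory part of the excerpt can be parsed and summed. This is a minimal sketch: the closing `</territories>` tag is added here so the fragment parses on its own, and since whether ranks are percentages or relative weights is exactly the open question, the shares computed below are only one possible reading.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Territory entries copied from the keyboards.xml excerpt above;
# the closing </territories> tag is added so the fragment parses.
SNIPPET = """<territories>
<territory><territoryId>FR</territoryId><rank>900</rank></territory>
<territory><territoryId>LU</territoryId><rank>50</rank></territory>
<territory><territoryId>SN</territoryId><rank>10</rank></territory>
</territories>"""


def territory_ranks(xml_text: str) -> Dict[str, int]:
    """Map each territoryId to its integer rank."""
    root = ET.fromstring(xml_text)
    return {
        t.findtext("territoryId"): int(t.findtext("rank"))
        for t in root.iter("territory")
    }


ranks = territory_ranks(SNIPPET)
total = sum(ranks.values())  # 900 + 50 + 10 = 960
# One possible reading: each rank as a share of the total.
shares = {tid: rank / total for tid, rank in ranks.items()}
```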
The current default keyboard layout for the Afrikaans language is the US layout which doesn't provide any of the diacritics that are in use in the language. The US layout is the common layout for hardware sold in South Africa, but Afrikaans can't be typed properly with it. I've noticed this incorrect default on Fedora before, but it is not my main system. It has always been some variant of US-intl on Mageia and its predecessors, which is a very good default (behaving mostly the same as the US layout anyway).
It seems that "us(altgr-intl)" would be a better default for Afrikaans, although I can't see precisely what that corresponds to on my system. On my system there is "English (US, alt. intl.)" and "English (US, intl, with dead keys)". They are pretty similar and either would be a better default than the plain US layout.
Feel free to ask if anything else is required.
Part of an IPython session:
In [85]: print langtable.language_name(languageId="mai", languageIdQuery="en")
Maithili
In [86]: print langtable.language_name(languageId="mai", languageIdQuery="mai")
In [87]:
Having that name would be useful for the Anaconda installer, which has translations for that language and tries to show both its native and its English name.
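In that session the query in Maithili itself prints nothing, so a caller that wants both names needs a fallback. Below is a hypothetical helper (`display_name` is not an Anaconda or langtable function) that takes any `language_name`-style callable and degrades gracefully when the native name is missing.

```python
from typing import Callable, Optional


def display_name(language_id: str, language_name: Callable[[str, str], Optional[str]]) -> str:
    """Return "native (English)" when both names are known, else whatever is available."""
    english = language_name(language_id, "en")
    native = language_name(language_id, language_id)
    if native and english and native != english:
        return f"{native} ({english})"
    return native or english or language_id
```

With `lambda lang, query: langtable.language_name(languageId=lang, languageIdQuery=query)`, the session above suggests `display_name("mai", ...)` would currently fall back to "Maithili".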