rspeer / langcodes
A Python library for working with and comparing language codes.
License: MIT License
I noticed an inconsistency in how langcodes does language replacement: zh-HK and zh_HK aren't normalized the same way:
>>> import langcodes
>>> langcodes.get("zh-HK")
Language.make(language='zh', script='Hant', region='HK')
>>> langcodes.get("zh_HK")
Language.make(language='zh', region='HK')
langcodes currently lowercases tags to do the normalization. I think this would be fixed by using tag_parser.normalize_characters() instead of just .lower(); then both forms would pick up the normalization.
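For reference, a minimal sketch of the normalization being suggested (this mirrors what tag_parser.normalize_characters() is described as doing; it is not the library's actual code):

```python
def normalize_characters(tag: str) -> str:
    """Lowercase the tag AND convert underscores to hyphens, so that
    'zh_HK' and 'zh-HK' normalize to the same string."""
    return tag.lower().replace('_', '-')

# Both spellings now normalize identically:
assert normalize_characters('zh-HK') == normalize_characters('zh_HK') == 'zh-hk'
```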
Great library! It has comprehensive docstrings, so it would be helpful to generate API documentation with Sphinx and publish it on Read the Docs. Published documentation is easier to browse, search, and link to.
https://www.sphinx-doc.org/en/master/
https://readthedocs.org/
Many other Python projects use Sphinx and Read the Docs, e.g. https://docs.python-requests.org/en/master/
If I understand correctly, there are currently two ways of getting a "language object": either through its code (Language.get('eng')), or via a search over its natural name (langcodes.find('English')).
However, langcodes.find('eng') raises a LookupError.
What about providing a "unified" function that accepts either, first matching the code and then, if no match is found, performing the fuzzy search over natural names?
This would remove the need for third-party code to (re)write such a function in each project. From experience, I know that user-facing code expecting a "language" parameter will receive values from both sets (natural names and codes), no matter how many times your documentation specifies that codes should be provided...
For example, the following would return the same "language object":
langcodes.find('en')
langcodes.find('eng')
langcodes.find('English')
langcodes.find('english')
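A sketch of such a unified helper, with the code lookup and the name search passed in so the fallback logic is explicit. With langcodes you would pass Language.get and langcodes.find; note that langcodes may parse some name-like strings as syntactically valid tags, so real code might also want a validity check before accepting the code-lookup result.

```python
from typing import Any, Callable

def lookup_language(query: str,
                    get: Callable[[str], Any],
                    find: Callable[[str], Any]) -> Any:
    """Try `query` as a language code first; if the code lookup fails,
    fall back to a fuzzy search over natural-language names."""
    try:
        return get(query)
    except (LookupError, ValueError):
        return find(query)
```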
>>> import langcodes
>>> langcodes.get('en-GB-oed')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 langcodes.get('en-GB-oed')
/opt/venv/local/lib/python2.7/site-packages/langcodes/__init__.pyc in get(tag, normalize)
173 data[typ] = value
174
--> 175 return LanguageData(**data)
176
177 def to_tag(self):
TypeError: __init__() got an unexpected keyword argument 'grandfathered'
Using Language.make(x) or Language.get(x) always returns a Language object. What is the best way to determine whether that object is a valid language?
For example, users often enter 'jp' instead of 'ja' for Japanese, but 'jp' is not a valid code.
>>> Language.get('jp')
Language.make(language='jp')
>>> Language.get('jp').language_name(min_score=100)
'jp'
Would the following be correct? Relying on this seems awkward at best.
def is_valid(lang, min_score=75):
    return lang.language_name(min_score=min_score) != lang.language
>>> Language.get("cu-Cyrs").is_valid()
False
Expected to return True. Fails on the Cyrs script subtag.
Context (wikipedia):
cu: Old Church Slavonic
Cyrs: Cyrillic (Old Church Slavonic variant)
Both of these subtags are part of https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
is_valid returns True for an invalid tag if all its subtags are prefixes of valid subtags.
>>> Language.get('aaj').is_valid()
True
>>> Language.get('en-Latnx').is_valid()
True
Given a language code of en-ca, the following currently happens:
code = "en-ca"
tag = langcodes.get(code) # Language.make(language='en', territory='CA')
display_name = tag.display_name() # 'English (Canada)'
tag = langcodes.find(display_name) # <- .find does not capture the territory, yielding Language.make(language='en')
Is there any way .find could correctly get the territory from the display name?
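One workaround in the meantime (a sketch, not part of langcodes): split the display name yourself before calling .find, since the territory is just the parenthesized part:

```python
import re

def split_display_name(display: str):
    """Split a display name like 'English (Canada)' into its language
    part and its territory part (or None if there are no parentheses)."""
    m = re.fullmatch(r'(.+?)\s*\((.+)\)', display.strip())
    if m:
        return m.group(1), m.group(2)
    return display.strip(), None
```

The two parts could then be resolved separately, e.g. the language part via langcodes.find and the territory part via a territory-name lookup.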
os.getenv('LANG') returns (on my Ubuntu system) a language code suffixed with .utf8. I would guess that other systems do this too. Would it be possible for langcodes.standardize_tags() to strip the .utf8 rather than throw an exception?
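As a workaround (a sketch, not langcodes API), the POSIX encoding/modifier suffix can be stripped before handing the value to langcodes:

```python
def strip_locale_suffix(locale_value: str) -> str:
    """Drop a trailing '.encoding' and/or '@modifier' from a POSIX
    locale string, e.g. 'en_GB.utf8' -> 'en_GB',
    'de_DE.UTF-8@euro' -> 'de_DE'."""
    return locale_value.split('.', 1)[0].split('@', 1)[0]
```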
Hello, I found your library while looking for a good way to get long language codes from short ones (cs -> cs_CZ). I tried the tag_match_score function, but for central European languages it does not work as expected. See the following table:
| Languages | Expected | Output |
|---|---|---|
| cs - sk | 86-95 | 16 |
| cs - pl | 76-85 | 16 |
| sk - pl | 76-85 | 16 |
| cs - hr | 76-85 | 16 |
Maybe add: 'Serbo-Croatian' in 'en' and 'Srpskohrvatski jezik' in 'sr-Latn'.
It's not working for language codes like "az_AZ_#Latn", but it's valid.
Currently the Python 2 and Python 3 code bases are separate, and each has its own PyPI package.
This situation leads to misalignments (e.g., on PyPI langcodes is at v1.3 while langcodes-py2 is at v1.2), and it is problematic in general.
Is there any interest in having a unified code base? At first glance, I do not see particular issues that impede unification --- but maybe I am wrong. I would be willing to invest some time in this if you are interested. (I think I have the required experience with NLP and with unified PY2/PY3 code bases.)
BTW, very nice package, thank you (Rob / Luminoso) for releasing it under the MIT license!
CLDR 30 (actually, 30.0.3) has been published: http://cldr.unicode.org/index/downloads/cldr-30
AFAIU, the current database has data from CLDR 29.
Hey, I noticed that v2.0.0 and v1.4.1 are released on PyPI but not tagged as releases in the repo: https://pypi.org/project/langcodes/#history
I was wondering whether there is a way of decoupling the code of langcodes from the actual language db, or, more precisely, to package langcodes with a subset of the language db.
My use case is the following: I want to have the machinery provided by langcodes (in particular, the fuzzy match of languages from a user-supplied string, and the hashable Language object), but on an extremely reduced subset of languages --- say only 100.
Currently, if I use langcodes in my application, I force the end-user to get 30+ MB of data from PyPI.
For example, for a project I am working on right now I coded this: https://github.com/pettarin/lachesis/blob/master/lachesis/language.py but I would be much happier if I could use langcodes (sans 30+ MB of data) instead.
One way to achieve this could be the following:
pip install langcodes => install the langcodes "code" and download all the CLDR data (e.g. from GitHub)
pip install langcodes[nodb] => install the langcodes "code" but do not download the data
In the second case, the client library/application would call the "register" function at runtime, providing the data for the recognized languages, (say) the subset of the CLDR of interest to that client.
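A hypothetical shape for such a runtime "register" API (everything below is invented for illustration; langcodes has no such interface today):

```python
class LanguageRegistry:
    """In-memory registry a client could populate with only the
    languages it cares about, instead of shipping the full CLDR data."""

    def __init__(self):
        self._names = {}  # code -> display name

    def register(self, code: str, name: str) -> None:
        self._names[code] = name

    def language_name(self, code: str) -> str:
        # Fall back to the raw code when the language wasn't registered,
        # mirroring how langcodes handles unknown codes.
        return self._names.get(code, code)
```

The client would call register() at startup for (say) its 100 supported languages, and the rest of the machinery would read only from this registry.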
I ran into the problem that
>>> Language("da") == Language.get("da")
True
>>> Language("da") in {Language.get("da")}
False
I know that the Language constructor's docstring says it's inefficient to call it directly, but it doesn't clearly say not to call it directly. And it's an easy mistake to make when it otherwise seems to work fine.
It seems that the problem could easily be fixed by updating
def __hash__(self) -> int:
    return hash(id(self))
to
def __hash__(self) -> int:
    return hash(self._str_tag)
but I have no idea whether other parts of the code also misbehave when the Language constructor is called directly.
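The underlying problem is generic Python: hashing by id() breaks the hash/eq contract, so equal objects can land in different hash buckets. A minimal standalone demonstration (not langcodes code):

```python
class ById:
    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return self.tag == other.tag

    def __hash__(self):
        return hash(id(self))  # two equal objects get different hashes

class ByValue(ById):
    def __hash__(self):
        return hash(self.tag)  # equal objects now hash alike

assert ById('da') == ById('da')           # equal...
assert ById('da') not in {ById('da')}     # ...but the set lookup misses
assert ByValue('da') in {ByValue('da')}   # fixed by hashing the value
```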
The environment setup is as follows:
The operating system - Windows 10.
Python version - 3.9.6
Pip version - pip 21.2.4 from C:\ProgramData\Miniconda3\lib\site-packages\pip (python 3.9)
Terminal command - pip install langcodes[data]
During the first installation, it throws the following error:
ERROR: Command errored out with exit status 1:
command: 'C:\ProgramData\Miniconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"'; __file__='"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\NUM\AppData\Local\Temp\pip-wheel-b1_103om'
cwd: C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\
Complete output (54 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib
creating build\lib\language_data
copying language_data\build_data.py -> build\lib\language_data
copying language_data\language_lists.py -> build\lib\language_data
copying language_data\names.py -> build\lib\language_data
copying language_data\name_data.py -> build\lib\language_data
copying language_data\population_data.py -> build\lib\language_data
copying language_data\registry_parser.py -> build\lib\language_data
copying language_data\util.py -> build\lib\language_data
copying language_data\__init__.py -> build\lib\language_data
running egg_info
writing language_data.egg-info\PKG-INFO
writing dependency_links to language_data.egg-info\dependency_links.txt
writing requirements to language_data.egg-info\requires.txt
writing top-level names to language_data.egg-info\top_level.txt
reading manifest file 'language_data.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'language_data\data\cldr-localenames-*'
writing manifest file 'language_data.egg-info\SOURCES.txt'
creating build\lib\language_data\data
copying language_data\data\extra_language_names.csv -> build\lib\language_data\data
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\setup.py", line 20, in <module>
setup(
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "C:\ProgramData\Miniconda3\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\site-packages\wheel\bdist_wheel.py", line 299, in run
self.run_command('build')
File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\distutils\command\build.py", line 135, in run
self.run_command(cmd_name)
File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 55, in run
self.build_package_data()
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 126, in build_package_data
srcfile in self.distribution.convert_2to3_doctests):
AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests'
Previously, a user raised a concern that langcodes' use of sqlite3 was not thread-safe. I thought this was fixed in modern versions of Python and SQLite, but apparently it isn't.
For reasons I don't understand, this manifests itself when the langcodes database is being accessed from a WSGI process.
I think it may be time to stop relying on sqlite3 for language data.
Please add this dependency to setup.cfg or setup.py.
First of all, thanks for the library.
I've just noticed that the ISO 639-3 code hyw (https://iso639-3.sil.org/code/hyw) is missing. Its "Effective Date" is 2018-01-23, but in the latest version of langcodes (https://pypi.org/project/langcodes/1.4.1/, 2018-03-07), langcodes.get('hyw').language_name() returns just 'hyw'.
Is that because the underlying "CLDR and the IANA subtag registry" (README.md) are not updated yet? (A relevant issue: #11.)
The documentation for normalize_characters says “BCP 47 is case-insensitive, and considers underscores equivalent to hyphens”, but BCP 47 doesn’t say anything about underscores. I think that’s a CLDR thing.
Tags with repeating variants or singletons are reported as valid.
>>> tag_is_valid('de-1901-1901')
True
>>> tag_is_valid('en-a-bbb-a-ccc')
True
BCP 47 says:
A tag is considered "valid" if it satisfies these conditions:
[...]
o There are no duplicate variant subtags.
o There are no duplicate singleton (extension) subtags.
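A rough standalone check for those two conditions (a sketch only, not a full BCP 47 parser; it ignores private-use sections and could misclassify unusual subtags):

```python
def has_duplicate_subtags(tag: str) -> bool:
    """Detect repeated variant subtags (5-8 chars, or 4 chars starting
    with a digit) and repeated singletons (1-char subtags other than 'x')."""
    subtags = tag.lower().split('-')
    singletons = [s for s in subtags if len(s) == 1 and s != 'x']
    variants = [s for s in subtags
                if 5 <= len(s) <= 8 or (len(s) == 4 and s[:1].isdigit())]
    return (len(singletons) != len(set(singletons))
            or len(variants) != len(set(variants)))
```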
I tried dc = langcodes.Dict.items(). It seems the library can only produce 2-letter codes. Is there a way to map a language to its ISO 639-2 3-letter code?
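Recent versions of langcodes do expose this (Language.get('en').to_alpha3() returns 'eng'; check your installed version). Conceptually it is just a lookup table from two-letter ISO 639-1 codes to three-letter ISO 639-2/3 codes; a toy excerpt for illustration:

```python
# Toy excerpt of the ISO 639-1 -> ISO 639-2/T mapping; the real table
# has several hundred entries.
ALPHA2_TO_ALPHA3 = {'en': 'eng', 'fr': 'fra', 'de': 'deu', 'ja': 'jpn'}

def to_alpha3(code: str) -> str:
    """Return the 3-letter code for a 2-letter code, else the input."""
    return ALPHA2_TO_ALPHA3.get(code, code)
```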
This project depends on marisa-trie, which is a Cython/C++ extension module. The latest release from 2.5 years ago only has pre-built binaries for Python3.6 on macOS, and there are no releases with binaries for a newer version of Python than 3.6. This means users must build it from source, which requires them to have a compiler, the Python development headers, etc. installed. I created this issue but haven't gotten a response from the marisa-trie devs.
Would you be receptive to using a more active project, like datrie, or a pure-Python one like pygtrie?
I'm wary of distributing software that depends on langcodes because of the problems with the marisa-trie dependency.
Hi, since you are using the Unicode Territory-Language Information table, I'd like to know whether you plan to make information accessible about the languages spoken in a given territory, e.g. as the babel library does with these two functions.
Thanks for the great library!
Language.get normalizes case and enforces syntactic validity, whereas Language.make doesn’t. This causes some discrepancies. I presume that a user is not supposed to have to worry about case or syntactic validity once they have a valid (in the sense of is_valid) Language object.
If a Language is valid, its tag should presumably be valid. This is not always true.
>>> lang = Language.make(language='Latn')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(script='Qaaa..Qabx')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(extensions='t-fr')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(extensions=['a'])
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(language='x-')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
Round-tripping a valid Language through its tag should presumably return an equivalent Language. This is not always true.
>>> lang = Language.make(language='FR')
>>> lang.is_valid()
True
>>> lang == Language.get(lang.to_tag(), normalize=False)
False
Alternatively, maybe it is the user’s responsibility to normalize case and check for syntactic validity before calling Language.make. I don’t think the documentation actually says that, though.
A singleton must be followed by at least one extension subtag for a tag to be well-formed. This library correctly detects ill-formed tags like und-a but misses the case where the singleton is not the final subtag. It should throw a LanguageTagError.
>>> Language.get('und-a-b-xyz')
Language.make(extensions=['a', 'b-xyz'])
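A standalone sketch of the missing well-formedness check (illustrative only; a real fix would live inside the tag parser):

```python
def singletons_well_formed(tag: str) -> bool:
    """Every singleton (a 1-character subtag other than 'x') must be
    followed by at least one longer subtag before the next singleton
    or the end of the tag."""
    subtags = tag.lower().split('-')
    for i, s in enumerate(subtags):
        if len(s) == 1 and s != 'x':
            nxt = subtags[i + 1] if i + 1 < len(subtags) else None
            if nxt is None or len(nxt) == 1:
                return False
    return True
```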
Hi, I just tried to pip install the package using the --only-binary :all: flag described in the pip documentation.
It failed with the following error.
ERROR: Could not find a version that satisfies the requirement langcodes[data]<4.0.0,>=3.1.0
ERROR: No matching distribution found for langcodes[data]<4.0.0,>=3.1.0
Then I went to the package on PyPI and found that there is only a source distribution there and no built wheel. I expected there to be a wheel, as is standard in Python packaging.
Package operations: 1 install, 1 update, 0 removals
• Downgrading marisa-trie (1.1.0 -> 0.7.8): Failed
ChefBuildError
Backend subprocess exited when trying to invoke build_wheel
running bdist_wheel
running build
running build_clib
building 'libmarisa-trie' library
creating build
creating build/temp.macosx-14-arm64-cpython-312
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/agent.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/mapper.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/reader.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/writer.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/louds-trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/tail.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/vector/bit-vector.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/keyset.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
/usr/bin/xcrun ar rcs build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
ranlib build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a
running build_ext
building 'marisa_trie' extension
creating build/temp.macosx-14-arm64-cpython-312/src
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/include -I/private/var/folders/m2/cl1wt_2j5qq5wlnsmywlz2yh0000gn/T/tmp1fmwnqww/.venv/include -I/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12 -c src/agent.cpp -o build/temp.macosx-14-arm64-cpython-312/src/agent.o
src/agent.cpp:1582:27: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
return likely(dict) ? __PYX_GET_DICT_VERSION(dict) : 0;
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1594:36: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
return (dictptr && *dictptr) ? __PYX_GET_DICT_VERSION(*dictptr) : 0;
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1598:56: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
if (unlikely(!dict) || unlikely(tp_dict_version != __PYX_GET_DICT_VERSION(dict)))
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
__PYX_PY_DICT_LOOKUP_IF_MODIFIED(
^
src/agent.cpp:1053:16: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
if (likely(__PYX_GET_DICT_VERSION(DICT) == __pyx_dict_version)) {\
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
__PYX_PY_DICT_LOOKUP_IF_MODIFIED(
^
src/agent.cpp:1057:30: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
__pyx_dict_version = __PYX_GET_DICT_VERSION(DICT);\
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1958:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2013:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2154:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2209:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2660:47: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)b)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
5 warnings and 5 errors generated.
error: command '/usr/bin/clang' failed with exit code 1
at ~/.local/pipx/venvs/poetry/lib/python3.12/site-packages/poetry/installation/chef.py:164 in _prepare
160│
161│ error = ChefBuildError("\n\n".join(message_parts))
162│
163│ if error is not None:
→ 164│ raise error from None
165│
166│ return path
167│
168│ def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:
Note: This error originates from the build backend, and is likely not a problem with poetry but with marisa-trie (0.7.8) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "marisa-trie (==0.7.8)"'.
This fork fixes it: https://github.com/Puyodead1/language_data (thanks @Puyodead1). I can't submit a PR.
Small issue in __init__.py, line 657: running
from langcodes import get
e = get('en-US')
e.variant_names()
results in a TypeError instead of an empty list.
for variant in self.variants:
    var_names = code_to_names('variant', variant)
    names.append(self._best_name(var_names, language, min_score))
This would not raise an error:
if self.variants:
    for variant in self.variants:
        var_names = code_to_names('variant', variant)
        names.append(self._best_name(var_names, language, min_score))
Here are some ill-formed tags that this library doesn’t throw exceptions for, and one well-formed (though invalid) tag that it does throw an exception for.
>>> Language.get('x-')
Language.make(language='x-')
>>> Language.get('x-123456789')
Language.make(language='x-123456789')
>>> Language.get('x-')
Language.make(language='x-\ue83f\ue857\ue852\ue83f')
>>> Language.get('und-u-')
Language.make(extensions=['u-'])
>>> Language.get('und-?-foo')
Language.make(extensions=['?-foo'])
>>> Language.get('ar-٠٠١')
Language.make(language='ar', territory='٠٠١')
>>> Language.get('zh-普通话')
Language.make(language='zh', extlangs=['普通话'])
>>> Language.get('non-ᚱᚢᚾᛟ')
Language.make(language='non', script='ᚱᚢᚾᛟ')
>>> Language.get('fr-1606thré')
Language.make(language='fr', variants=['1606thré'])
>>> Language.get('example')
langcodes.tag_parser.LanguageTagError: Expected a language code, got 'example'
Given a language code, I'd like to get a list of all territories where that language is used. For example, for the language it (Italian), it should return IT, SM, CH (Italy, San Marino, Switzerland).
Is there a way I could do that with this package?
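A sketch of how this could work given territory-language data (the table below is a toy excerpt invented for illustration; real data would come from CLDR's Territory-Language Information):

```python
# Toy excerpt of territory -> languages-spoken data.
TERRITORY_LANGUAGES = {
    'IT': ['it', 'de', 'fr'],
    'SM': ['it'],
    'CH': ['de', 'fr', 'it', 'rm'],
    'FR': ['fr'],
}

def territories_for_language(lang: str):
    """Invert the table: return all territories where `lang` appears."""
    return sorted(t for t, langs in TERRITORY_LANGUAGES.items()
                  if lang in langs)
```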
I'm attempting to match a language code 'pa' with another language code 'pa-PK'.
def test_language_less_than(self):
    spoken_language_1 = 'pa'
    spoken_language_2 = 'pa-PK'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

def test_language_more_than(self):
    spoken_language_1 = 'pa-PK'
    spoken_language_2 = 'pa'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])
This returns
('und', 1000)
('und', 1000)
I would expect this to return a match, not the 'und' fallback. When I debug the library, I see the following triples, for which the tuple_distance_cached function returns 54:
desired_triple = ('pa', 'Arab', 'PK')
supported_triple = ('pa', 'Guru', 'IN')
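What's happening here is likely-subtag maximization: before measuring distance, each tag is filled out with its most likely script and territory (per CLDR's likelySubtags data), and bare 'pa' maximizes to Gurmukhi script in India while 'pa-PK' maximizes to Arabic script. A toy illustration of that step (the table values follow CLDR; the code is a sketch, not langcodes internals):

```python
# Toy likely-subtags table; the real CLDR data covers thousands of tags.
LIKELY = {
    'pa': ('pa', 'Guru', 'IN'),
    'pa-PK': ('pa', 'Arab', 'PK'),
}

def maximize(tag: str):
    """Fill in the most likely (language, script, territory) triple."""
    return LIKELY.get(tag, (tag, None, None))

desired = maximize('pa-PK')    # ('pa', 'Arab', 'PK')
supported = maximize('pa')     # ('pa', 'Guru', 'IN')
# The script mismatch (Arab vs Guru) is what makes the distance large.
```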