rspeer / langcodes
A Python library for working with and comparing language codes.
License: MIT License
I noticed an inconsistency in how langcodes does language replacement: zh-HK and zh_HK aren't normalized the same way:
>>> import langcodes
>>> langcodes.get("zh-HK")
Language.make(language='zh', script='Hant', region='HK')
>>> langcodes.get("zh_HK")
Language.make(language='zh', region='HK')
langcodes currently lowercases tags to do the normalization. I think this would be fixed by using tag_parser.normalize_characters() instead of just .lower(); then both forms would pick up the normalization.
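For reference, a minimal sketch of the normalization being suggested (this mirrors what tag_parser.normalize_characters() is described as doing; it is not the library's actual code):

```python
def normalize_characters(tag: str) -> str:
    """Lowercase the tag AND convert underscores to hyphens, so that
    'zh_HK' and 'zh-HK' normalize to the same string."""
    return tag.lower().replace('_', '-')

# Both spellings now normalize identically:
assert normalize_characters('zh-HK') == normalize_characters('zh_HK') == 'zh-hk'
```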
Great library! It has comprehensive docstrings, so it would be helpful to generate API documentation with Sphinx and publish it on Read the Docs. Published documentation is easier to browse, search, and link to.
https://www.sphinx-doc.org/en/master/
https://readthedocs.org/
Many other Python projects use Sphinx and Read the Docs, e.g. https://docs.python-requests.org/en/master/
If I understand correctly, there are currently two ways of getting a "language object": either through its code (Language.get('eng')), or via a search over its natural name (langcodes.find('English')).
However, langcodes.find('eng') raises a LookupError.
What about providing a "unified" function that accepts either, first matching the code and then, if no match is found, performing the fuzzy search over natural names?
This would remove the need for third-party code to (re)write such a function in each project. From experience, I know that user-facing code expecting a "language" parameter will receive values from both sets (natural names and codes), no matter how many times your documentation specifies that codes should be provided...
For example, the following would return the same "language object":
langcodes.find('en')
langcodes.find('eng')
langcodes.find('English')
langcodes.find('english')
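A sketch of such a unified helper, with the code lookup and the name search passed in so the fallback logic is explicit. With langcodes you would pass Language.get and langcodes.find; note that langcodes may parse some name-like strings as syntactically valid tags, so real code might also want a validity check before accepting the code-lookup result.

```python
from typing import Any, Callable

def lookup_language(query: str,
                    get: Callable[[str], Any],
                    find: Callable[[str], Any]) -> Any:
    """Try `query` as a language code first; if the code lookup fails,
    fall back to a fuzzy search over natural-language names."""
    try:
        return get(query)
    except (LookupError, ValueError):
        return find(query)
```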
>>> import langcodes
>>> langcodes.get('en-GB-oed')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 langcodes.get('en-GB-oed')
/opt/venv/local/lib/python2.7/site-packages/langcodes/__init__.pyc in get(tag, normalize)
173 data[typ] = value
174
--> 175 return LanguageData(**data)
176
177 def to_tag(self):
TypeError: __init__() got an unexpected keyword argument 'grandfathered'
Using Language.make(x) or Language.get(x) always returns a Language object. What is the best way to determine whether that object is a valid language?
For example, users often enter 'jp' instead of 'ja' for Japanese, but 'jp' is not a valid code.
>>> Language.get('jp')
Language.make(language='jp')
>>> Language.get('jp').language_name(min_score=100)
'jp'
Would the following be correct? Relying on this seems awkward at best.
def is_valid(lang, min_score=75):
    return lang.language_name(min_score=min_score) != lang.language
>>> Language.get("cu-Cyrs").is_valid()
False
Expected to return True. Fails on the Cyrs script subtag.
Context (wikipedia):
cu: Old Church Slavonic
Cyrs: Cyrillic (Old Church Slavonic variant)
Both of these subtags are part of https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
is_valid returns True for an invalid tag if all its subtags are prefixes of valid subtags.
>>> Language.get('aaj').is_valid()
True
>>> Language.get('en-Latnx').is_valid()
True
Given a language code of en-ca, the following currently happens:
code = "en-ca"
tag = langcodes.get(code) # Language.make(language='en', territory='CA')
display_name = tag.display_name() # 'English (Canada)'
tag = langcodes.find(display_name) # <- .find does not capture the territory, yielding Language.make(language='en')
Is there any way .find could correctly get the territory from the display name?
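One workaround in the meantime (a sketch, not part of langcodes): split the display name yourself before calling .find, since the territory is just the parenthesized part:

```python
import re

def split_display_name(display: str):
    """Split a display name like 'English (Canada)' into its language
    part and its territory part (or None if there are no parentheses)."""
    m = re.fullmatch(r'(.+?)\s*\((.+)\)', display.strip())
    if m:
        return m.group(1), m.group(2)
    return display.strip(), None
```

The two parts could then be resolved separately, e.g. the language part via langcodes.find and the territory part via a territory-name lookup.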
os.getenv('LANG') returns (on my Ubuntu system) a language code suffixed with .utf8. I would guess that other systems do this too. Would it be possible for langcodes.standardize_tags() to strip the .utf8 rather than throw an exception?
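As a workaround (a sketch, not langcodes API), the POSIX encoding/modifier suffix can be stripped before handing the value to langcodes:

```python
def strip_locale_suffix(locale_value: str) -> str:
    """Drop a trailing '.encoding' and/or '@modifier' from a POSIX
    locale string, e.g. 'en_GB.utf8' -> 'en_GB',
    'de_DE.UTF-8@euro' -> 'de_DE'."""
    return locale_value.split('.', 1)[0].split('@', 1)[0]
```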
Hello, I found your library while looking for a good way to get long language codes from short ones (cs -> cs_CZ). I tried the tag_match_score function, but for central European languages it does not work as expected. See the following table:
| Languages | Expected | Output |
|---|---|---|
| cs - sk | 86-95 | 16 |
| cs - pl | 76-85 | 16 |
| sk - pl | 76-85 | 16 |
| cs - hr | 76-85 | 16 |
Maybe add: 'Serbo-Croatian' in 'en' and 'Srpskohrvatski jezik' in 'sr-Latn'.
It's not working for language codes like "az_AZ_#Latn", but it's valid.
Currently the Python 2 and Python 3 code bases are separate, and each has its own PyPI package.
This situation leads to misalignments (e.g., on PyPI langcodes is at v1.3 while langcodes-py2 is at v1.2), and it is problematic in general.
Is there any interest in having a unified code base? At first glance, I do not see particular issues that impede unification --- but maybe I am wrong. I would be willing to invest some time in this if you are interested. (I think I have the required experience with NLP and with unified PY2/PY3 code bases.)
BTW, very nice package, thank you (Rob / Luminoso) for releasing it under the MIT license!
CLDR 30 (actually, 30.0.3) has been published: http://cldr.unicode.org/index/downloads/cldr-30
AFAIU, the current database has data from CLDR 29.
Hey, I noticed that v2.0.0 and v1.4.1 are released on PyPI but not tagged as releases in the repo: https://pypi.org/project/langcodes/#history
I was wondering whether there is a way of decoupling the code of langcodes from the actual language db, or, more precisely, to package langcodes with a subset of the language db.
My use case is the following: I want to have the machinery provided by langcodes (in particular, the fuzzy match of languages from a user-supplied string, and the hashable Language object), but on an extremely reduced subset of languages --- say only 100.
Currently, if I use langcodes in my application, I force the end-user to get 30+ MB of data from PyPI.
For example, for a project I am working on right now I coded this: https://github.com/pettarin/lachesis/blob/master/lachesis/language.py but I would be much happier if I could use langcodes (sans 30+ MB of data) instead.
One way to achieve this could be the following:
pip install langcodes => install the langcodes "code" and download all the CLDR data (e.g. from GitHub)
pip install langcodes[nodb] => install the langcodes "code" but do not download the data
In the second case, the client library/application would call the "register" function at runtime, providing the data for the recognized languages, (say) the subset of the CLDR of interest to that client.
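A hypothetical shape for such a runtime "register" API (everything below is invented for illustration; langcodes has no such interface today):

```python
class LanguageRegistry:
    """In-memory registry a client could populate with only the
    languages it cares about, instead of shipping the full CLDR data."""

    def __init__(self):
        self._names = {}  # code -> display name

    def register(self, code: str, name: str) -> None:
        self._names[code] = name

    def language_name(self, code: str) -> str:
        # Fall back to the raw code when the language wasn't registered,
        # mirroring how langcodes handles unknown codes.
        return self._names.get(code, code)
```

The client would call register() at startup for (say) its 100 supported languages, and the rest of the machinery would read only from this registry.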
I ran into the problem that
>>> Language("da") == Language.get("da")
True
>>> Language("da") in {Language.get("da")}
False
I know that the Language constructor's docstring says it's inefficient to call it directly, but it doesn't clearly say not to call it directly. And it's an easy mistake to make when it otherwise seems to work fine.
It seems that the problem could easily be fixed by updating
def __hash__(self) -> int:
    return hash(id(self))
to
def __hash__(self) -> int:
    return hash(self._str_tag)
but I have no idea whether other parts of the code also misbehave when the Language constructor is called directly.
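The underlying problem is generic Python: hashing by id() breaks the hash/eq contract, so equal objects can land in different hash buckets. A minimal standalone demonstration (not langcodes code):

```python
class ById:
    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return self.tag == other.tag

    def __hash__(self):
        return hash(id(self))  # two equal objects get different hashes

class ByValue(ById):
    def __hash__(self):
        return hash(self.tag)  # equal objects now hash alike

assert ById('da') == ById('da')           # equal...
assert ById('da') not in {ById('da')}     # ...but the set lookup misses
assert ByValue('da') in {ByValue('da')}   # fixed by hashing the value
```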
The environment setup is as follows:
The operating system - Windows 10.
Python version - 3.9.6
Pip version - pip 21.2.4 from C:\ProgramData\Miniconda3\lib\site-packages\pip (python 3.9)
Terminal command - pip install langcodes[data]
During the first installation, it throws the following error:
ERROR: Command errored out with exit status 1:
command: 'C:\ProgramData\Miniconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"'; __file__='"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\NUM\AppData\Local\Temp\pip-wheel-b1_103om'
cwd: C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\
Complete output (54 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib
creating build\lib\language_data
copying language_data\build_data.py -> build\lib\language_data
copying language_data\language_lists.py -> build\lib\language_data
copying language_data\names.py -> build\lib\language_data
copying language_data\name_data.py -> build\lib\language_data
copying language_data\population_data.py -> build\lib\language_data
copying language_data\registry_parser.py -> build\lib\language_data
copying language_data\util.py -> build\lib\language_data
copying language_data\__init__.py -> build\lib\language_data
running egg_info
writing language_data.egg-info\PKG-INFO
writing dependency_links to language_data.egg-info\dependency_links.txt
writing requirements to language_data.egg-info\requires.txt
writing top-level names to language_data.egg-info\top_level.txt
reading manifest file 'language_data.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'language_data\data\cldr-localenames-*'
writing manifest file 'language_data.egg-info\SOURCES.txt'
creating build\lib\language_data\data
copying language_data\data\extra_language_names.csv -> build\lib\language_data\data
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\setup.py", line 20, in <module>
setup(
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "C:\ProgramData\Miniconda3\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\site-packages\wheel\bdist_wheel.py", line 299, in run
self.run_command('build')
File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\distutils\command\build.py", line 135, in run
self.run_command(cmd_name)
File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 55, in run
self.build_package_data()
File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 126, in build_package_data
srcfile in self.distribution.convert_2to3_doctests):
AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests'
Previously, a user raised a concern that langcodes' use of sqlite3 was not thread-safe. I thought this was fixed in modern versions of Python and SQLite, but apparently it isn't.
For reasons I don't understand, this manifests itself when the langcodes database is being accessed from a WSGI process.
I think it may be time to stop relying on sqlite3 for language data.
Please add this dependency to setup.cfg or setup.py.
First of all, thanks for the library.
I've just noticed that the ISO 639-3 code hyw (https://iso639-3.sil.org/code/hyw) is missing. Its "Effective Date" is 2018-01-23, but in the latest version of langcodes (https://pypi.org/project/langcodes/1.4.1/, 2018-03-07), langcodes.get('hyw').language_name() returns just 'hyw'.
Is that because the underlying "CLDR and the IANA subtag registry" (README.md) are not updated yet? (A relevant issue: #11.)
The documentation for normalize_characters says “BCP 47 is case-insensitive, and considers underscores equivalent to hyphens”, but BCP 47 doesn’t say anything about underscores. I think that’s a CLDR thing.
Tags with repeating variants or singletons are reported as valid.
>>> tag_is_valid('de-1901-1901')
True
>>> tag_is_valid('en-a-bbb-a-ccc')
True
BCP 47 says:
A tag is considered "valid" if it satisfies these conditions:
[...]
o There are no duplicate variant subtags.
o There are no duplicate singleton (extension) subtags.
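A rough standalone check for those two conditions (a sketch only, not a full BCP 47 parser; it ignores private-use sections and could misclassify unusual subtags):

```python
def has_duplicate_subtags(tag: str) -> bool:
    """Detect repeated variant subtags (5-8 chars, or 4 chars starting
    with a digit) and repeated singletons (1-char subtags other than 'x')."""
    subtags = tag.lower().split('-')
    singletons = [s for s in subtags if len(s) == 1 and s != 'x']
    variants = [s for s in subtags
                if 5 <= len(s) <= 8 or (len(s) == 4 and s[:1].isdigit())]
    return (len(singletons) != len(set(singletons))
            or len(variants) != len(set(variants)))
```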
I tried dc = langcodes.Dict.items(). It seems the library can only produce 2-letter codes. Is there a way to map a language to its ISO 639-2 3-letter code?
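Recent versions of langcodes do expose this (Language.get('en').to_alpha3() returns 'eng'; check your installed version). Conceptually it is just a lookup table from two-letter ISO 639-1 codes to three-letter ISO 639-2/3 codes; a toy excerpt for illustration:

```python
# Toy excerpt of the ISO 639-1 -> ISO 639-2/T mapping; the real table
# has several hundred entries.
ALPHA2_TO_ALPHA3 = {'en': 'eng', 'fr': 'fra', 'de': 'deu', 'ja': 'jpn'}

def to_alpha3(code: str) -> str:
    """Return the 3-letter code for a 2-letter code, else the input."""
    return ALPHA2_TO_ALPHA3.get(code, code)
```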
This project depends on marisa-trie, which is a Cython/C++ extension module. The latest release from 2.5 years ago only has pre-built binaries for Python3.6 on macOS, and there are no releases with binaries for a newer version of Python than 3.6. This means users must build it from source, which requires them to have a compiler, the Python development headers, etc. installed. I created this issue but haven't gotten a response from the marisa-trie devs.
Would you be receptive to using a more active project, like datrie, or a pure-Python one like pygtrie?
I'm wary of distributing software that depends on langcodes because of the problems with the marisa-trie dependency.
Hi, since you are using the Unicode Territory-Language Information table, I'd like to know whether you plan to make information accessible about the languages spoken in a given territory, e.g. as the babel library does with these two functions.
Thanks for the great library!
Language.get normalizes case and enforces syntactic validity, whereas Language.make doesn’t. This causes some discrepancies. I presume that a user is not supposed to have to worry about case or syntactic validity once they have a valid (in the sense of is_valid) Language object.
If a Language is valid, its tag should presumably be valid. This is not always true.
>>> lang = Language.make(language='Latn')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(script='Qaaa..Qabx')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(extensions='t-fr')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(extensions=['a'])
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>>
>>> lang = Language.make(language='x-')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
Round-tripping a valid Language through its tag should presumably return an equivalent Language. This is not always true.
>>> lang = Language.make(language='FR')
>>> lang.is_valid()
True
>>> lang == Language.get(lang.to_tag(), normalize=False)
False
Alternatively, maybe it is the user’s responsibility to normalize case and check for syntactic validity before calling Language.make. I don’t think the documentation actually says that, though.
A singleton must be followed by at least one extension subtag for a tag to be well-formed. This library correctly detects ill-formed tags like und-a but misses the case where the singleton is not the final subtag. It should throw a LanguageTagError.
>>> Language.get('und-a-b-xyz')
Language.make(extensions=['a', 'b-xyz'])
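A standalone sketch of the missing well-formedness check (illustrative only; a real fix would live inside the tag parser):

```python
def singletons_well_formed(tag: str) -> bool:
    """Every singleton (a 1-character subtag other than 'x') must be
    followed by at least one longer subtag before the next singleton
    or the end of the tag."""
    subtags = tag.lower().split('-')
    for i, s in enumerate(subtags):
        if len(s) == 1 and s != 'x':
            nxt = subtags[i + 1] if i + 1 < len(subtags) else None
            if nxt is None or len(nxt) == 1:
                return False
    return True
```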
Hi, I just tried to pip install the package using the --only-binary :all: flag described in the pip documentation.
It failed with the following error.
ERROR: Could not find a version that satisfies the requirement langcodes[data]<4.0.0,>=3.1.0
ERROR: No matching distribution found for langcodes[data]<4.0.0,>=3.1.0
Then I went to the package on PyPI and found that there is only a source distribution there and no built wheel. I expected there to be a wheel, as is standard in Python packaging.
Package operations: 1 install, 1 update, 0 removals
• Downgrading marisa-trie (1.1.0 -> 0.7.8): Failed
ChefBuildError
Backend subprocess exited when trying to invoke build_wheel
running bdist_wheel
running build
running build_clib
building 'libmarisa-trie' library
creating build
creating build/temp.macosx-14-arm64-cpython-312
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie
creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/agent.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/mapper.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/reader.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/writer.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/louds-trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/tail.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/vector/bit-vector.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/keyset.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
/usr/bin/xcrun ar rcs build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
ranlib build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a
running build_ext
building 'marisa_trie' extension
creating build/temp.macosx-14-arm64-cpython-312/src
clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/include -I/private/var/folders/m2/cl1wt_2j5qq5wlnsmywlz2yh0000gn/T/tmp1fmwnqww/.venv/include -I/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12 -c src/agent.cpp -o build/temp.macosx-14-arm64-cpython-312/src/agent.o
src/agent.cpp:1582:27: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
return likely(dict) ? __PYX_GET_DICT_VERSION(dict) : 0;
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1594:36: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
return (dictptr && *dictptr) ? __PYX_GET_DICT_VERSION(*dictptr) : 0;
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1598:56: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
if (unlikely(!dict) || unlikely(tp_dict_version != __PYX_GET_DICT_VERSION(dict)))
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
__PYX_PY_DICT_LOOKUP_IF_MODIFIED(
^
src/agent.cpp:1053:16: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
if (likely(__PYX_GET_DICT_VERSION(DICT) == __pyx_dict_version)) {\
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
__PYX_PY_DICT_LOOKUP_IF_MODIFIED(
^
src/agent.cpp:1057:30: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
__pyx_dict_version = __PYX_GET_DICT_VERSION(DICT);\
^
src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
#define __PYX_GET_DICT_VERSION(dict) (((PyDictObject*)(dict))->ma_version_tag)
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
Py_DEPRECATED(3.12) uint64_t ma_version_tag;
^
/opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
src/agent.cpp:1958:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2013:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2154:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2209:55: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)x)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
src/agent.cpp:2660:47: error: no member named 'ob_digit' in '_longobject'
const digit* digits = ((PyLongObject*)b)->ob_digit;
~~~~~~~~~~~~~~~~~~ ^
5 warnings and 5 errors generated.
error: command '/usr/bin/clang' failed with exit code 1
at ~/.local/pipx/venvs/poetry/lib/python3.12/site-packages/poetry/installation/chef.py:164 in _prepare
160│
161│ error = ChefBuildError("\n\n".join(message_parts))
162│
163│ if error is not None:
→ 164│ raise error from None
165│
166│ return path
167│
168│ def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:
Note: This error originates from the build backend, and is likely not a problem with poetry but with marisa-trie (0.7.8) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "marisa-trie (==0.7.8)"'.
This fork fixes it: https://github.com/Puyodead1/language_data (thanks @Puyodead1). I can't submit a PR.
Small issue in __init__.py, line 657: running
from langcodes import get
e = get('en-US')
e.variant_names()
results in a TypeError instead of an empty list.
for variant in self.variants:
    var_names = code_to_names('variant', variant)
    names.append(self._best_name(var_names, language, min_score))
This would not raise an error:
if self.variants:
    for variant in self.variants:
        var_names = code_to_names('variant', variant)
        names.append(self._best_name(var_names, language, min_score))
Here are some ill-formed tags that this library doesn’t throw exceptions for, and one well-formed (though invalid) tag that it does throw an exception for.
>>> Language.get('x-')
Language.make(language='x-')
>>> Language.get('x-123456789')
Language.make(language='x-123456789')
>>> Language.get('x-')
Language.make(language='x-\ue83f\ue857\ue852\ue83f')
>>> Language.get('und-u-')
Language.make(extensions=['u-'])
>>> Language.get('und-?-foo')
Language.make(extensions=['?-foo'])
>>> Language.get('ar-٠٠١')
Language.make(language='ar', territory='٠٠١')
>>> Language.get('zh-普通话')
Language.make(language='zh', extlangs=['普通话'])
>>> Language.get('non-ᚱᚢᚾᛟ')
Language.make(language='non', script='ᚱᚢᚾᛟ')
>>> Language.get('fr-1606thré')
Language.make(language='fr', variants=['1606thré'])
>>> Language.get('example')
langcodes.tag_parser.LanguageTagError: Expected a language code, got 'example'
Given a language code, I'd like to get a list of all territories where that language is used. For example, for the language it (Italian), it should return IT, SM, CH (Italy, San Marino, Switzerland).
Is there a way I could do that with this package?
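A sketch of how this could work given territory-language data (the table below is a toy excerpt invented for illustration; real data would come from CLDR's Territory-Language Information):

```python
# Toy excerpt of territory -> languages-spoken data.
TERRITORY_LANGUAGES = {
    'IT': ['it', 'de', 'fr'],
    'SM': ['it'],
    'CH': ['de', 'fr', 'it', 'rm'],
    'FR': ['fr'],
}

def territories_for_language(lang: str):
    """Invert the table: return all territories where `lang` appears."""
    return sorted(t for t, langs in TERRITORY_LANGUAGES.items()
                  if lang in langs)
```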
I'm attempting to match a language code 'pa' with another language code 'pa-PK'.
def test_language_less_than(self):
    spoken_language_1 = 'pa'
    spoken_language_2 = 'pa-PK'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

def test_language_more_than(self):
    spoken_language_1 = 'pa-PK'
    spoken_language_2 = 'pa'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])
This returns
('und', 1000)
('und', 1000)
I would expect this to return a match, not the 'und' fallback. When I debug the library, I see the following triples, for which the tuple_distance_cached function returns 54:
desired_triple = ('pa', 'Arab', 'PK')
supported_triple = ('pa', 'Guru', 'IN')
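What's happening here is likely-subtag maximization: before measuring distance, each tag is filled out with its most likely script and territory (per CLDR's likelySubtags data), and bare 'pa' maximizes to Gurmukhi script in India while 'pa-PK' maximizes to Arabic script. A toy illustration of that step (the table values follow CLDR; the code is a sketch, not langcodes internals):

```python
# Toy likely-subtags table; the real CLDR data covers thousands of tags.
LIKELY = {
    'pa': ('pa', 'Guru', 'IN'),
    'pa-PK': ('pa', 'Arab', 'PK'),
}

def maximize(tag: str):
    """Fill in the most likely (language, script, territory) triple."""
    return LIKELY.get(tag, (tag, None, None))

desired = maximize('pa-PK')    # ('pa', 'Arab', 'PK')
supported = maximize('pa')     # ('pa', 'Guru', 'IN')
# The script mismatch (Arab vs Guru) is what makes the distance large.
```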