
langcodes's People

Contributors

dswistowski, garyd203, hickford, ivuk, jlowryduda, joshua-chin, moss, rspeer, sheyvaert

langcodes's Issues

zh-HK and zh_HK are normalized differently

I noticed an inconsistency in how langcodes does language replacements: zh-HK and zh_HK are not normalized to the same result:

 >>> import langcodes
 >>> langcodes.get("zh-HK")
 Language.make(language='zh', script='Hant', region='HK')
 >>> langcodes.get("zh_HK")
 Language.make(language='zh', region='HK')

Langcodes currently lowercases the tags before normalizing them. I think this would be fixed if it used tag_parser.normalize_characters() instead of just .lower(); then both forms would pick up the same normalization.
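As a rough illustration of the suggestion above, the sketch below treats underscores as hyphens before lowercasing, so both spellings produce the same lookup key. `normalize_key` is a hypothetical helper written for this example; it is not the library's `tag_parser.normalize_characters()`, whose exact behaviour is not shown here.

```python
def normalize_key(tag: str) -> str:
    """Return a lookup key: lowercase, with '_' treated as '-'.

    Hypothetical helper for illustration only.
    """
    return tag.lower().replace("_", "-")


# Both spellings now map to the same key.
print(normalize_key("zh-HK") == normalize_key("zh_HK"))  # True

# Plain lowercasing keeps the underscore, so the keys differ.
print("zh-HK".lower() == "zh_HK".lower())  # False
```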

[Question] find() working for both language natural name and code

If I understand correctly, currently there are two ways of getting a "language object": either through its code (Language.get('eng')), or via a search over its natural name (langcodes.find('English')).

However, langcodes.find('eng'), for example, raises a LookupError.

What about providing a "unified" function that accepts either, first matching the code and then, if not found, performing the fuzzy search over the natural names?

This would remove the need for third-party code to (re)write such a function in each project. From experience, I know that user-facing code where a "language" parameter is expected will receive values from both sets (natural names and codes), no matter how many times your documentation specifies that (say) codes should be provided...

For example, the following would return the same "language object":

langcodes.find('en')
langcodes.find('eng')
langcodes.find('English')
langcodes.find('english')
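A minimal sketch of the "code first, then name" order proposed above. The two tables are toy data invented for this example, and `find_language` is a hypothetical name; a real version would presumably consult the library's own code tables and fuzzy name search instead of exact dictionary lookups.

```python
# Toy data for illustration only (not taken from langcodes).
CODES = {"en": "en", "eng": "en", "fr": "fr", "fra": "fr"}
NAMES = {"english": "en", "french": "fr"}


def find_language(text: str) -> str:
    """Resolve `text` as a code first, then as a natural-language name."""
    key = text.strip().casefold()
    if key in CODES:
        return CODES[key]
    if key in NAMES:
        return NAMES[key]
    raise LookupError(f"Can't find any language named {text!r}")


for query in ("en", "eng", "English", "english"):
    print(query, "->", find_language(query))
```

Trying codes before names keeps short inputs such as 'en' from being sent through a name search, where they could match something unintended.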

Exception when calling LanguageData.get with "en-gb-oed"

>>> import langcodes
>>> langcodes.get('en-GB-oed')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 langcodes.get('en-GB-oed')

/opt/venv/local/lib/python2.7/site-packages/langcodes/__init__.pyc in get(tag, normalize)
    173                 data[typ] = value
    174
--> 175         return LanguageData(**data)
    176
    177     def to_tag(self):

TypeError: __init__() got an unexpected keyword argument 'grandfathered'

[Question] Is there a way to check that a language code is valid?

Language.make(x) and Language.get(x) always return a Language object. What is the best way to determine whether that object represents a valid language?

e.g. Users often enter 'jp' instead of 'ja' for Japanese. But 'jp' is not a valid code.

>>> Language.get('jp')
Language.make(language='jp')

>>> Language.get('jp').language_name(min_score=100)
'jp'

Would the following be correct? It seems very awkward at best to rely on this.

def is_valid(lang, min_score=75):
    return lang.language_name(min_score=min_score) != lang.language

Don't lose region/territory when converting back from display name

Given a language code of en-ca, the following currently happens:

code = "en-ca"
tag = langcodes.get(code)  # Language.make(language='en', territory='CA')
display_name = tag.display_name()  # 'English (Canada)'
tag = langcodes.find(display_name)  # <- .find does not capture the territory, yielding Language.make(language='en')

Is there any way .find could correctly get the territory from the display name?
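One possible approach, sketched here without langcodes, is to split the display name into its language part and its parenthesized details, and then look each part up separately. `split_display_name` is a hypothetical helper, and the "Name (Detail, Detail)" layout is an assumption based on the 'English (Canada)' example above.

```python
import re

# "Language (detail, detail)" with the parenthesized part optional.
_DISPLAY_RE = re.compile(
    r"^\s*(?P<language>[^()]+?)\s*(?:\((?P<details>[^()]*)\))?\s*$"
)


def split_display_name(display_name: str):
    """Split 'English (Canada)' into ('English', ['Canada'])."""
    match = _DISPLAY_RE.match(display_name)
    if match is None:
        raise ValueError(f"Unrecognized display name: {display_name!r}")
    details = match.group("details") or ""
    parts = [part.strip() for part in details.split(",")]
    return match.group("language"), [part for part in parts if part]


print(split_display_name("English (Canada)"))
print(split_display_name("English"))
```

The language part could then go to a name search, while each detail would need its own lookup against territory (or script) names.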

remove .utf8

os.getenv('LANG') returns (on my Ubuntu system) a language code suffixed by .utf8. I would guess that other systems also do this. Would it be possible for langcodes.standardize_tags() to remove the .utf8 rather than throw an exception?
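Until something like this is handled by the library, the suffix can be removed before the value is passed on. The sketch below relies on the POSIX locale layout `language[_territory][.codeset][@modifier]`; `strip_codeset` is a hypothetical helper, not part of langcodes.

```python
import os


def strip_codeset(locale_name: str) -> str:
    """Drop the '.codeset' and '@modifier' parts of a POSIX locale name."""
    without_modifier = locale_name.split("@", 1)[0]
    return without_modifier.split(".", 1)[0]


print(strip_codeset("en_US.utf8"))        # en_US
print(strip_codeset("de_DE.UTF-8@euro"))  # de_DE

# Value from the environment; may be empty if LANG is unset.
print(strip_codeset(os.getenv("LANG", "")))
```

Values such as "C" or "POSIX" pass through unchanged and are not language tags, so callers would still need to handle them separately.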

tag_match_score for middle Europe needs improvements

Hello, I found your library while looking for a good source for getting long language codes from short ones (cs -> cs_CZ). I tried the tag_match_score function, but for middle European languages it does not work as expected. See the following table:

Languages   Expected   Outputted
cs - sk     86-95      16
cs - pl     76-85      16
sk - pl     76-85      16
cs - hr     76-85      16

[Question] Unified Python 2 + 3 code base

Currently the Python 2 and Python 3 code bases are separate, and each has its own PyPI package.

This situation leads to misalignments (e.g., on PyPI langcodes is at v1.3, while langcodes-py2 is at v1.2), and it is problematic in general.

Is there any interest in having a unified code base? At first glance, I do not see any particular issues that would impede the unification, but maybe I am wrong. I would be willing to invest some time in this, if you are interested. (I think I have the required experience with NLP and with unified PY2/PY3 code bases.)

BTW, very nice package, thank you (Rob / Luminoso) for releasing it under the MIT license!

[Long shot] Decoupling the library code from the language db

I was wondering whether there is a way of decoupling the code of langcodes from the actual language db, or, more precisely, to package langcodes with a subset of the language db.

My use case is the following: I want to have the machinery provided by langcodes (in particular, the fuzzy match of languages from a user-supplied string, and the hashable Language object), but on an extremely reduced subset of languages --- say only 100.

Currently, if I use langcodes in my application, I force the end-user to get 30+ MB of data from PyPI.

For example, for a project I am working on right now I coded this: https://github.com/pettarin/lachesis/blob/master/lachesis/language.py but I would be much happier if I could use langcodes (sans 30+ MB of data) instead.

One way to achieve this could be the following:

  1. add a "download" function to the package, able to fetch a language db from the Internet;
  2. add a "register" function to "add" the data for recognized languages;
  3. put some logic in setup.py, so that:
       pip install langcodes => installs the langcodes "code" and downloads all the CLDR data (e.g. from GitHub)
       pip install langcodes[nodb] => installs the langcodes "code" but does not download the data

In the second case, the client library/application would call the "register" function at runtime, providing the data for the recognized languages, (say) the subset of the CLDR of interest to that client.
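To make the "register" idea concrete, here is a small sketch of a registry that a client fills at runtime and then queries with a fuzzy name match. `LanguageRegistry`, its methods, and the sample data are all hypothetical; the fuzzy matching uses the standard library's `difflib` rather than whatever langcodes does internally.

```python
import difflib


class LanguageRegistry:
    """Hypothetical runtime registry of language names (illustration only)."""

    def __init__(self):
        self._code_by_name = {}

    def register(self, names_by_code):
        """Add names for each code, e.g. {'en': ['English']}."""
        for code, names in names_by_code.items():
            for name in names:
                self._code_by_name[name.casefold()] = code

    def find(self, text, cutoff=0.8):
        """Return the code whose registered name best matches `text`."""
        key = text.casefold()
        if key in self._code_by_name:
            return self._code_by_name[key]
        close = difflib.get_close_matches(
            key, list(self._code_by_name), n=1, cutoff=cutoff
        )
        if not close:
            raise LookupError(f"No registered language matches {text!r}")
        return self._code_by_name[close[0]]


registry = LanguageRegistry()
registry.register({"en": ["English"], "it": ["Italian", "italiano"]})
print(registry.find("english"))
print(registry.find("Italain"))  # misspelling, resolved by fuzzy match
```

With this shape, the package itself would ship no data, and each application would register only the subset of languages it cares about.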

Accidentally using `Language` constructor breaks hash values

I ran into the problem that

>>> Language("da") == Language.get("da")
True
>>> Language("da") in {Language.get("da")}
False

I know that the docstring of the Language constructor says "It's inefficient to call this directly", but it doesn't clearly say not to call it directly. And it's an easy mistake to make when it otherwise seems to work fine.

It seems the problem could easily be fixed by changing

    def __hash__(self) -> int:
        return hash(id(self))

to

    def __hash__(self) -> int:
        return hash(self._str_tag)

but I have no idea whether other parts of the code also fail to work as expected when the Language constructor is called directly.
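The effect can be reproduced without langcodes. The two toy classes below are hypothetical stand-ins, not the library's `Language` class: both compare by tag, but one hashes by identity and the other by the tag string. Python expects objects that compare equal to have equal hashes, and the identity-based version breaks that.

```python
class TagBase:
    """Toy stand-in that compares by tag (illustration only)."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return isinstance(other, TagBase) and self.tag == other.tag


class HashById(TagBase):
    def __hash__(self):
        # Equal objects get different hashes unless they are the same object.
        return hash(id(self))


class HashByTag(TagBase):
    def __hash__(self):
        # Equal objects always get equal hashes.
        return hash(self.tag)


print(HashById("da") == HashById("da"))     # True
print(HashById("da") in {HashById("da")})   # False
print(HashByTag("da") in {HashByTag("da")}) # True
```

Hashing by identity only behaves well if equal tags are always represented by the same object, which is why constructing instances directly exposes the problem.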

AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests' in Python 3.9 and Windows 10

The environment setup is as follows:
The operating system - Windows 10.
Python version - 3.9.6
Pip version - pip 21.2.4 from C:\ProgramData\Miniconda3\lib\site-packages\pip (python 3.9)
Terminal command - pip install langcodes[data]

During the first installation, it throws the following error:

ERROR: Command errored out with exit status 1:
   command: 'C:\ProgramData\Miniconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"'; __file__='"'"'C:\\Users\\NUM\\AppData\\Local\\Temp\\pip-install-4ppckwvi\\language-data_d7af6b72ebe74a9db4472d299a899d42\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\NUM\AppData\Local\Temp\pip-wheel-b1_103om'
       cwd: C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\
  Complete output (54 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib
  creating build\lib\language_data
  copying language_data\build_data.py -> build\lib\language_data
  copying language_data\language_lists.py -> build\lib\language_data
  copying language_data\names.py -> build\lib\language_data
  copying language_data\name_data.py -> build\lib\language_data
  copying language_data\population_data.py -> build\lib\language_data
  copying language_data\registry_parser.py -> build\lib\language_data
  copying language_data\util.py -> build\lib\language_data
  copying language_data\__init__.py -> build\lib\language_data
  running egg_info
  writing language_data.egg-info\PKG-INFO
  writing dependency_links to language_data.egg-info\dependency_links.txt
  writing requirements to language_data.egg-info\requires.txt
  writing top-level names to language_data.egg-info\top_level.txt
  reading manifest file 'language_data.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  no previously-included directories found matching 'language_data\data\cldr-localenames-*'
  writing manifest file 'language_data.egg-info\SOURCES.txt'
  creating build\lib\language_data\data
  copying language_data\data\extra_language_names.csv -> build\lib\language_data\data
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\NUM\AppData\Local\Temp\pip-install-4ppckwvi\language-data_d7af6b72ebe74a9db4472d299a899d42\setup.py", line 20, in <module>
      setup(
    File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "C:\ProgramData\Miniconda3\lib\distutils\core.py", line 148, in setup
      dist.run_commands()
    File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\ProgramData\Miniconda3\lib\site-packages\wheel\bdist_wheel.py", line 299, in run
      self.run_command('build')
    File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\ProgramData\Miniconda3\lib\distutils\command\build.py", line 135, in run
      self.run_command(cmd_name)
    File "C:\ProgramData\Miniconda3\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\ProgramData\Miniconda3\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 55, in run
      self.build_package_data()
    File "C:\ProgramData\Miniconda3\lib\site-packages\setuptools\command\build_py.py", line 126, in build_package_data
      srcfile in self.distribution.convert_2to3_doctests):
  AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests'

Sporadic "disk image is malformed" errors when used within WSGI

Previously, a user raised a concern that langcodes' use of sqlite3 was not thread-safe. I thought this was fixed in modern versions of Python and SQLite, but apparently it isn't.

For reasons I don't understand, this manifests itself when the langcodes database is being accessed from a WSGI process.

I think it may be time to stop relying on sqlite3 for language data.

Invalid tags with duplicate subtags are reported as valid

Tags with repeating variants or singletons are reported as valid.

>>> tag_is_valid('de-1901-1901')
True
>>> tag_is_valid('en-a-bbb-a-ccc')
True

BCP 47 says:

   A tag is considered "valid" if it satisfies these conditions:

   [...]

   o  There are no duplicate variant subtags.

   o  There are no duplicate singleton (extension) subtags.
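A sketch of how the two quoted conditions could be checked on the raw tag string. `duplicate_subtags` is a hypothetical helper; it classifies subtags by shape only (variants are 5 to 8 alphanumerics, or 4 characters starting with a digit) and does not consult the subtag registry or handle grandfathered tags.

```python
import re

# Variant subtags: 5-8 alphanumerics, or 4 characters starting with a digit.
_VARIANT = re.compile(r"(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})")


def duplicate_subtags(tag: str):
    """Return (duplicate variants, duplicate singletons) found in `tag`."""
    subtags = tag.lower().split("-")
    variants, singletons = [], []
    if subtags[0] == "x":
        return [], []  # entirely private use: nothing to check
    for subtag in subtags[1:]:
        if subtag == "x":
            break  # everything after 'x' is private use
        if len(subtag) == 1:
            singletons.append(subtag)
        elif not singletons and _VARIANT.fullmatch(subtag):
            # Variants can only appear before the first extension singleton.
            variants.append(subtag)

    def repeated(items):
        return sorted({item for item in items if items.count(item) > 1})

    return repeated(variants), repeated(singletons)


print(duplicate_subtags("de-1901-1901"))    # (['1901'], [])
print(duplicate_subtags("en-a-bbb-a-ccc"))  # ([], ['a'])
```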

Alternative trie dependency?

This project depends on marisa-trie, which is a Cython/C++ extension module. The latest release, from 2.5 years ago, only has pre-built binaries for Python 3.6 on macOS, and there are no releases with binaries for any version of Python newer than 3.6. This means users must build it from source, which requires them to have a compiler, the Python development headers, etc. installed. I created this issue but haven't gotten a response from the marisa-trie devs.

Would you be receptive to using a more active project, like datrie, or a pure-Python one like pygtrie?

I'm wary of distributing software that depends on langcodes because of the problems with the marisa-trie dependency.

Supposedly valid `Language` instances may have invalid tags

Language.get normalizes case and enforces syntactic validity whereas Language.make doesn’t. This causes some discrepancies. I presume that a user is not supposed to have to worry about case or syntactic validity once they have a valid (in the sense of is_valid) Language object.

If Language is valid, its tag should presumably be valid. This is not always true.

>>> lang = Language.make(language='Latn')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>> 
>>> lang = Language.make(script='Qaaa..Qabx')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>> 
>>> lang = Language.make(extensions='t-fr')
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>> 
>>> lang = Language.make(extensions=['a'])
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False
>>> 
>>> lang = Language.make(language='x-') 
>>> lang.is_valid()
True
>>> tag_is_valid(lang.to_tag())
False

Round-tripping a valid Language through its tag should presumably return an equivalent Language. This is not always true.

>>> lang = Language.make(language='FR')
>>> lang.is_valid()
True
>>> lang == Language.get(lang.to_tag(), normalize=False)
False

Alternatively, maybe it is the user’s responsibility to normalize case and check for syntactic validity before calling Language.make. I don’t think the documentation actually says that though.
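If case normalization is left to the caller, a small wrapper could apply the conventional BCP 47 casing (lowercase language, title-case script, uppercase region) before the values reach Language.make. `normalize_case` is a hypothetical helper, and the keyword names it returns are an assumption modelled on the `Language.make(language=..., territory=...)` output shown elsewhere on this page.

```python
def normalize_case(language=None, script=None, territory=None):
    """Return keyword arguments with conventional BCP 47 casing applied.

    Hypothetical helper: it fixes case only and does not check validity.
    """
    parts = {}
    if language is not None:
        parts["language"] = language.lower()
    if script is not None:
        parts["script"] = script.capitalize()
    if territory is not None:
        parts["territory"] = territory.upper()
    return parts


print(normalize_case(language="FR"))
print(normalize_case(language="ZH", script="hant", territory="hk"))
```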

Tags with singletons not followed by extensions should be rejected

A singleton must be followed by at least one extension subtag for a tag to be well-formed. This library correctly detects ill-formed tags like und-a but misses the case where the singleton is not the final subtag. It should throw a LanguageTagError.

>>> Language.get('und-a-b-xyz')
Language.make(extensions=['a', 'b-xyz'])
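The rule can be sketched as a scan over the subtags: every singleton other than 'x' must be followed by a subtag of 2 to 8 characters. `singletons_missing_extensions` is a hypothetical helper that checks lengths only; it does not verify that the characters are alphanumeric and is not how the library's parser is written.

```python
def singletons_missing_extensions(tag: str) -> list:
    """Return singletons in `tag` not followed by an extension subtag."""
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        # Entirely private use: only require something after the 'x'.
        return [] if len(subtags) > 1 and subtags[1] else ["x"]
    missing = []
    # The first subtag is the language, so start at index 1.
    for index in range(1, len(subtags)):
        subtag = subtags[index]
        if len(subtag) != 1:
            continue
        following = subtags[index + 1] if index + 1 < len(subtags) else ""
        if subtag == "x":
            # Private use: any non-empty subtag may follow, and nothing
            # after 'x' is interpreted as a singleton.
            if not following:
                missing.append(subtag)
            break
        # Extension subtags are 2 to 8 characters long.
        if not 2 <= len(following) <= 8:
            missing.append(subtag)
    return missing


print(singletons_missing_extensions("und-a-b-xyz"))      # ['a']
print(singletons_missing_extensions("und-a"))            # ['a']
print(singletons_missing_extensions("en-u-co-phonebk"))  # []
```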

No wheel in the package on PyPI when using pip install

Hi, I just tried to pip install the package using the --only-binary :all: flag described in the pip documentation.

It failed with the following error.

ERROR: Could not find a version that satisfies the requirement langcodes[data]<4.0.0,>=3.1.0
ERROR: No matching distribution found for langcodes[data]<4.0.0,>=3.1.0

Then I went to the package page on PyPI and found that there is only a source distribution there and no built wheel. I expected there to be a wheel, as is standard in Python packaging.

Fails to install on Python 3.11, outdated marisa-trie

Package operations: 1 install, 1 update, 0 removals

  • Downgrading marisa-trie (1.1.0 -> 0.7.8): Failed

  ChefBuildError

  Backend subprocess exited when trying to invoke build_wheel

  running bdist_wheel
  running build
  running build_clib
  building 'libmarisa-trie' library
  creating build
  creating build/temp.macosx-14-arm64-cpython-312
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie
  creating build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/agent.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/mapper.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/reader.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/io/writer.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/louds-trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/trie/tail.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/grimoire/vector/bit-vector.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/keyset.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/lib -Imarisa-trie/include -c marisa-trie/lib/marisa/trie.cc -o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
  /usr/bin/xcrun ar rcs build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/agent.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/mapper.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/reader.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/io/writer.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/louds-trie.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/trie/tail.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/grimoire/vector/bit-vector.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/keyset.o build/temp.macosx-14-arm64-cpython-312/marisa-trie/lib/marisa/trie.o
  ranlib build/temp.macosx-14-arm64-cpython-312/liblibmarisa-trie.a
  running build_ext
  building 'marisa_trie' extension
  creating build/temp.macosx-14-arm64-cpython-312/src
  clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Imarisa-trie/include -I/private/var/folders/m2/cl1wt_2j5qq5wlnsmywlz2yh0000gn/T/tmp1fmwnqww/.venv/include -I/opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12 -c src/agent.cpp -o build/temp.macosx-14-arm64-cpython-312/src/agent.o
  src/agent.cpp:1582:27: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
      return likely(dict) ? __PYX_GET_DICT_VERSION(dict) : 0;
                            ^
  src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
  #define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
                                                                  ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
      Py_DEPRECATED(3.12) uint64_t ma_version_tag;
      ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
  #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                       ^
  src/agent.cpp:1594:36: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
      return (dictptr && *dictptr) ? __PYX_GET_DICT_VERSION(*dictptr) : 0;
                                     ^
  src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
  #define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
                                                                  ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
      Py_DEPRECATED(3.12) uint64_t ma_version_tag;
      ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
  #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                       ^
  src/agent.cpp:1598:56: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
      if (unlikely(!dict) || unlikely(tp_dict_version != __PYX_GET_DICT_VERSION(dict)))
                                                         ^
  src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
  #define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
                                                                  ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
      Py_DEPRECATED(3.12) uint64_t ma_version_tag;
      ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
  #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                       ^
  src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
          __PYX_PY_DICT_LOOKUP_IF_MODIFIED(
          ^
  src/agent.cpp:1053:16: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
      if (likely(__PYX_GET_DICT_VERSION(DICT) == __pyx_dict_version)) {\
                 ^
  src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
  #define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
                                                                  ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
      Py_DEPRECATED(3.12) uint64_t ma_version_tag;
      ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
  #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                       ^
  src/agent.cpp:1657:9: warning: 'ma_version_tag' is deprecated [-Wdeprecated-declarations]
          __PYX_PY_DICT_LOOKUP_IF_MODIFIED(
          ^
  src/agent.cpp:1057:30: note: expanded from macro '__PYX_PY_DICT_LOOKUP_IF_MODIFIED'
          __pyx_dict_version = __PYX_GET_DICT_VERSION(DICT);\
                               ^
  src/agent.cpp:1046:65: note: expanded from macro '__PYX_GET_DICT_VERSION'
  #define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
                                                                  ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/cpython/dictobject.h:22:5: note: 'ma_version_tag' has been explicitly marked deprecated here
      Py_DEPRECATED(3.12) uint64_t ma_version_tag;
      ^
  /opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/include/python3.12/pyport.h:317:54: note: expanded from macro 'Py_DEPRECATED'
  #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                       ^
  src/agent.cpp:1958:55: error: no member named 'ob_digit' in '_longobject'
              const digit* digits = ((PyLongObject*)x)->ob_digit;
                                    ~~~~~~~~~~~~~~~~~~  ^
  src/agent.cpp:2013:55: error: no member named 'ob_digit' in '_longobject'
              const digit* digits = ((PyLongObject*)x)->ob_digit;
                                    ~~~~~~~~~~~~~~~~~~  ^
  src/agent.cpp:2154:55: error: no member named 'ob_digit' in '_longobject'
              const digit* digits = ((PyLongObject*)x)->ob_digit;
                                    ~~~~~~~~~~~~~~~~~~  ^
  src/agent.cpp:2209:55: error: no member named 'ob_digit' in '_longobject'
              const digit* digits = ((PyLongObject*)x)->ob_digit;
                                    ~~~~~~~~~~~~~~~~~~  ^
  src/agent.cpp:2660:47: error: no member named 'ob_digit' in '_longobject'
      const digit* digits = ((PyLongObject*)b)->ob_digit;
                            ~~~~~~~~~~~~~~~~~~  ^
  5 warnings and 5 errors generated.
  error: command '/usr/bin/clang' failed with exit code 1


  at ~/.local/pipx/venvs/poetry/lib/python3.12/site-packages/poetry/installation/chef.py:164 in _prepare
      160│
      161│                 error = ChefBuildError("\n\n".join(message_parts))
      162│
      163│             if error is not None:
    → 164│                 raise error from None
      165│
      166│             return path
      167│
      168│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with marisa-trie (0.7.8) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "marisa-trie (==0.7.8)"'.

This fork fixes it https://github.com/Puyodead1/language_data thanks @Puyodead1

I can't submit a PR

TypeError: 'NoneType' object is not iterable

There is a small issue in __init__.py:657.

Running

from langcodes import get
e = get('en-US')
e.variant_names()

results in a TypeError instead of an empty list.

for variant in self.variants:
    var_names = code_to_names('variant', variant)
    names.append(self._best_name(var_names, language, min_score))

This would not raise an error:

if self.variants:
    for variant in self.variants:
        var_names = code_to_names('variant', variant)
        names.append(self._best_name(var_names, language, min_score))
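An equivalent guard, shown here on a standalone toy function rather than the library's method, is to iterate over `variants or ()` so that `None` behaves like an empty sequence. `collect_variant_names` and its `lookup` parameter are hypothetical names used only for this sketch.

```python
def collect_variant_names(variants, lookup):
    """Return a name for each variant, treating None as 'no variants'."""
    names = []
    for variant in variants or ():
        names.append(lookup(variant))
    return names


print(collect_variant_names(None, str.upper))      # []
print(collect_variant_names(["1901"], str.upper))  # ['1901']
```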

More parsing problems

Here are some ill-formed tags that this library doesn’t throw exceptions for, and one well-formed (though invalid) tag that it does throw an exception for.

>>> Language.get('x-')
Language.make(language='x-')
>>> Language.get('x-123456789')
Language.make(language='x-123456789')
>>> Language.get('x-')
Language.make(language='x-\ue83f\ue857\ue852\ue83f')
>>> Language.get('und-u-')
Language.make(extensions=['u-'])
>>> Language.get('und-?-foo')
Language.make(extensions=['?-foo'])
>>> Language.get('ar-٠٠١')
Language.make(language='ar', territory='٠٠١')
>>> Language.get('zh-普通话')
Language.make(language='zh', extlangs=['普通话'])
>>> Language.get('non-ᚱᚢᚾᛟ')
Language.make(language='non', script='ᚱᚢᚾᛟ')
>>> Language.get('fr-1606thré')
Language.make(language='fr', variants=['1606thré'])
>>> Language.get('example')
langcodes.tag_parser.LanguageTagError: Expected a language code, got 'example'

List of territories by language code

Given a language code, I'd like to get a list of all territories where that language is used. For example, for language it (Italian), it should return: IT, SM, CH (Italy, San Marino, Switzerland).

Is there a way I could do that with this package?

Closest Match for Punjabi (Pakistan) Not Resolving Match

I'm attempting to match a language code 'pa' with another language code 'pa-PK'.

def test_language_less_than(self):
    spoken_language_1 = 'pa'
    spoken_language_2 = 'pa-PK'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

def test_language_more_than(self):
    spoken_language_1 = 'pa-PK'
    spoken_language_2 = 'pa'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

This returns

('und', 1000)
('und', 1000)

I would expect this to return a match rather than 'und'. When I debug the library, I see the following triples being compared, for which the tuple_distance_cached function returns 54:

desired_triple = ('pa', 'Arab', 'PK')
supported_triple = ('pa', 'Guru', 'IN')
