snowballstem / pystemmer
Python stemming library using snowball stemmers
Home Page: https://snowballstem.org/
License: Other
Is it possible to use custom .sbl stemmers with pystemmer?
Samples:
donates -> donat
nurse -> nurs
middle -> middl
beauty -> beauti
Windows 10
Python 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting pystemmer
Downloading PyStemmer-2.2.0.1.tar.gz (303 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 303.0/303.0 kB 585.6 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [59 lines of output]
Downloading https://snowballstem.org/dist/libstemmer_c-2.2.0.tar.gz... Traceback (most recent call last):
File "C:\Python\Python311\Lib\urllib\request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Python\Python311\Lib\http\client.py", line 1286, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Python\Python311\Lib\http\client.py", line 1332, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Python\Python311\Lib\http\client.py", line 1281, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Python\Python311\Lib\http\client.py", line 1041, in _send_output
self.send(msg)
File "C:\Python\Python311\Lib\http\client.py", line 979, in send
self.connect()
File "C:\Python\Python311\Lib\http\client.py", line 1458, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\ssl.py", line 1075, in _create
self.do_handshake()
File "C:\Python\Python311\Lib\ssl.py", line 1346, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1002)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\thier\AppData\Local\Temp\pip-install-owy6vr_b\pystemmer_345a507ab40e445fa467ecadf0ac198f\setup.py", line 126, in <module>
LIBRARY_SOURCE_CODE.download()
File "C:\Users\thier\AppData\Local\Temp\pip-install-owy6vr_b\pystemmer_345a507ab40e445fa467ecadf0ac198f\setup.py", line 110, in download
download_and_extract_tarball(
File "C:\Users\thier\AppData\Local\Temp\pip-install-owy6vr_b\pystemmer_345a507ab40e445fa467ecadf0ac198f\tarballfetcher.py", line 40, in download_and_extract_tarball
download_file(tarball_url, tarball_filename)
File "C:\Users\thier\AppData\Local\Temp\pip-install-owy6vr_b\pystemmer_345a507ab40e445fa467ecadf0ac198f\tarballfetcher.py", line 17, in download_file
urlretrieve(url, filename)
File "C:\Python\Python311\Lib\urllib\request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\urllib\request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1002)>
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
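The traceback above ends in a `URLError` that wraps an `ssl.SSLCertVerificationError`, so the failure happens while setup.py downloads the libstemmer tarball, before any compilation starts. A minimal sketch of how a download step could tell this case apart from other network errors follows. `is_certificate_failure` is a hypothetical helper for illustration and is not part of PyStemmer's setup.py.

```python
import ssl
from urllib.error import URLError


def is_certificate_failure(exc):
    """Return True if exc is a URLError caused by TLS certificate verification."""
    # urllib wraps the low-level socket/SSL error in URLError.reason.
    return isinstance(exc, URLError) and isinstance(
        exc.reason, ssl.SSLCertVerificationError
    )


# Rebuild the kind of exception shown in the traceback (no network access needed).
cert_error = ssl.SSLCertVerificationError(
    1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed"
)
wrapped = URLError(cert_error)
other = URLError(ConnectionRefusedError(111, "Connection refused"))
```

A caller could use such a check to print a targeted hint (for example, that the remote certificate or the local CA bundle is out of date) instead of the raw traceback.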
Got an error while installing with pip
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-g7ijdajv/pystemmer_f12c7427bc3a4c31a368b9d7e7fc3a36/setup.py", line 199, in <module>
['src/Stemmer.pyx'] + list(LIBRARY_SOURCE_CODE.source_code_paths()),
File "/tmp/pip-install-g7ijdajv/pystemmer_f12c7427bc3a4c31a368b9d7e7fc3a36/setup.py", line 85, in source_code_paths
for line in self.iter_manifest_lines():
File "/tmp/pip-install-g7ijdajv/pystemmer_f12c7427bc3a4c31a368b9d7e7fc3a36/setup.py", line 74, in iter_manifest_lines
with open(self.manifest_file_path) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'libstemmer_c-2.2.0/mkinc_utf8.mak'
I have checked the tarball that pip downloads. For some reason the file is located under the libstemmer_c path, not under libstemmer_c-2.2.0.
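One way to confirm this kind of mismatch is to list the top-level directory names inside the archive and compare them with the directory that setup.py expects. The sketch below does that with the standard `tarfile` module. The archive is built in memory purely for illustration, and `top_level_names` and `build_sample_tarball` are hypothetical helpers, not functions from PyStemmer.

```python
import io
import tarfile


def top_level_names(tar):
    """Return the sorted set of top-level path components in an open tarfile."""
    return sorted({member.name.split("/", 1)[0] for member in tar.getmembers()})


def build_sample_tarball(root):
    """Create an in-memory .tar.gz holding one file under the given root directory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        data = b"# manifest placeholder\n"
        info = tarfile.TarInfo(name=root + "/mkinc_utf8.mak")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# Hypothetical archive laid out as described in the report: no version suffix.
with tarfile.open(fileobj=build_sample_tarball("libstemmer_c"), mode="r:gz") as tar:
    roots = top_level_names(tar)

expected_root = "libstemmer_c-2.2.0"  # directory name from the FileNotFoundError
layout_matches = expected_root in roots
```

With a real download, the same `top_level_names` call on the opened tarball would show whether the directory carries the version suffix.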
Hi, does this library point to the latest Snowball version?
From my understanding, the latest Snowball version is 2.2.0, but this library's setup.py points to version 2.1.0 (DEFAULT_URI).
Is this intended?
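A quick way to compare the two versions is to parse the version out of the tarball URL and compare it numerically. This is only a sketch: the URL is the one that appears in the pip log earlier on this page, the value 2.1.0 is the one quoted in the question, and both helper names are made up for illustration.

```python
import re


def version_from_tarball_url(url):
    """Extract a dotted version such as '2.2.0' from a libstemmer_c tarball URL."""
    match = re.search(r"libstemmer_c-(\d+(?:\.\d+)*)\.tar\.gz$", url)
    return match.group(1) if match else None


def as_tuple(version):
    """Turn '2.2.0' into (2, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# URL taken from the pip log earlier on this page.
url = "https://snowballstem.org/dist/libstemmer_c-2.2.0.tar.gz"
bundled = "2.1.0"  # version the report says DEFAULT_URI points to
found = version_from_tarball_url(url)
is_behind = as_tuple(bundled) < as_tuple(found)
```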
src/Stemmer.c: In function '__Pyx_GetException':
src/Stemmer.c:3222:24: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_type'; did you mean 'curexc_type'?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
src/Stemmer.c:3223:25: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_value'; did you mean 'curexc_value'?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/Stemmer.c:3224:22: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_traceback'; did you mean 'curexc_traceback'?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/Stemmer.c:3225:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_type'; did you mean 'curexc_type'?
tstate->exc_type = local_type;
^~~~~~~~
curexc_type
src/Stemmer.c:3226:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_value'; did you mean 'curexc_value'?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
src/Stemmer.c:3227:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_traceback'; did you mean 'curexc_traceback'?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
src/Stemmer.c: In function '__Pyx_ExceptionSave':
src/Stemmer.c:3250:21: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_type'; did you mean 'curexc_type'?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
src/Stemmer.c:3251:22: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_value'; did you mean 'curexc_value'?
*value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/Stemmer.c:3252:19: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_traceback'; did you mean 'curexc_traceback'?
*tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/Stemmer.c: In function '__Pyx_ExceptionReset':
src/Stemmer.c:3264:24: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_type'; did you mean 'curexc_type'?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
src/Stemmer.c:3265:25: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_value'; did you mean 'curexc_value'?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/Stemmer.c:3266:22: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_traceback'; did you mean 'curexc_traceback'?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/Stemmer.c:3267:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_type'; did you mean 'curexc_type'?
tstate->exc_type = type;
^~~~~~~~
curexc_type
src/Stemmer.c:3268:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_value'; did you mean 'curexc_value'?
tstate->exc_value = value;
^~~~~~~~~
curexc_value
src/Stemmer.c:3269:13: error: 'PyThreadState {aka struct _ts}' has no member named 'exc_traceback'; did you mean 'curexc_traceback'?
tstate->exc_traceback = tb;
^~~~~~~~~~~~~
curexc_traceback
error: command 'gcc' failed with exit status 1
i.e., it looks like Cython is now a dependency for pip install PyStemmer==2.0.0.01
(As detailed on snowball-discuss at http://article.gmane.org/gmane.comp.search.snowball/1402)
The current version of PyStemmer on PyPI (PyStemmer-1.2.0) was built with Cython 0.11, and versions of Cython < 0.17 do not seem to know how to translate dict.iteritems calls into Python 3 compatible C code.
The result is that when I tried to use PyStemmer-1.2.0 on Python 3.3, it imported and basically worked for a few examples, but if I call it in a loop for a large number of words, I get the following error on the 10,000th iteration when the cache gets purged:
word = stemmer.stemWord(word)
File "Stemmer.pyx", line 195, in Stemmer.Stemmer.stemWord (src/Stemmer.c:1657)
File "Stemmer.pyx", line 159, in Stemmer.Stemmer.__purgeCache (src/Stemmer.c:1216)
AttributeError: 'dict' object has no attribute 'iteritems'
I think fixing this is as simple as rebuilding the PyStemmer sdist with a newer version of Cython.
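To illustrate why the error only shows up once the cache fills, here is a toy cache with the same shape of behaviour: lookups are counted, and a purge runs only when the size limit is reached. This is a sketch and not the code from Stemmer.pyx; the class name, the 80% retention rule and the small limit are assumptions chosen for the example. The purge uses `dict.items()`, which exists on both Python 2 and Python 3, whereas `dict.iteritems()` exists only on Python 2.

```python
class StemCache:
    """Toy word cache that drops the least recently used entries when full."""

    def __init__(self, max_size=10000):
        self.max_size = max_size
        self.counter = 0          # increases on every lookup
        self.entries = {}         # word -> (stem, counter value at last use)

    def get(self, word, compute):
        self.counter += 1
        if word in self.entries:
            stem = self.entries[word][0]
        else:
            stem = compute(word)
        self.entries[word] = (stem, self.counter)
        self._purge()
        return stem

    def _purge(self):
        if len(self.entries) < self.max_size:
            return
        # Keep roughly the most recent 80% of lookups.
        cutoff = self.counter - (self.max_size * 8) // 10
        # dict.items() exists on Python 2 and 3; dict.iteritems() is Python 2 only.
        self.entries = {
            word: item for word, item in self.entries.items() if item[1] > cutoff
        }


# A tiny limit makes the purge path run after only a few lookups.
cache = StemCache(max_size=10)
for i in range(25):
    cache.get("word%d" % i, lambda w: w.upper())
```

Because the purge branch is skipped until the limit is hit, a handful of manual calls never exercises it, which matches the report that a few examples worked before the loop failed.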
Here's a little test I did that verifies that using Cython >= 0.17 fixes the problem...
I'm using the tox.ini (from PR #3) and unittest tests (PR #4).
With Cython 0.16:
~/dev/pystemmer$ .tox/py26/bin/pip install Cython==0.16 && .tox/py26/bin/cython src/Stemmer.pyx && .tox/py26/bin/python setup.py sdist && .tox/py33/bin/pip uninstall -y PyStemmer && .tox/py33/bin/pip install dist/PyStemmer-1.2.0.tar.gz && .tox/py33/bin/nosetests -v
…
======================================================================
ERROR: test_stemWord_many_times (test_pystemmer.PyStemmerEnglishTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "Stemmer.pyx", line 184, in Stemmer.Stemmer.stemWord (src/Stemmer.c:1722)
KeyError: b'spiks'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vagrant/dev/pystemmer/tests/test_pystemmer.py", line 57, in test_stemWord_many_times
result = self.stemmer.stemWord(word)
File "Stemmer.pyx", line 195, in Stemmer.Stemmer.stemWord (src/Stemmer.c:1870)
File "Stemmer.pyx", line 159, in Stemmer.Stemmer.__purgeCache (src/Stemmer.c:1451)
AttributeError: 'dict' object has no attribute 'iteritems'
----------------------------------------------------------------------
Ran 11 tests in 0.056s
FAILED (errors=1)
With Cython 0.17:
~/dev/pystemmer$ .tox/py26/bin/pip install Cython==0.17 && .tox/py26/bin/cython src/Stemmer.pyx && .tox/py26/bin/python setup.py sdist && .tox/py33/bin/pip uninstall -y PyStemmer && .tox/py33/bin/pip install dist/PyStemmer-1.2.0.tar.gz && .tox/py33/bin/nosetests -v
…
~/dev/pystemmer$ .tox/py33/bin/nosetests -v
test_stemWord (test_pystemmer.PyStemmerEnglishTests) ... ok
test_stemWord_many_times (test_pystemmer.PyStemmerEnglishTests) ... ok
test_stemWords (test_pystemmer.PyStemmerEnglishTests) ... ok
test_stemWords_unicode_simple (test_pystemmer.PyStemmerEnglishTests) ... ok
test_stemWord (test_pystemmer.PyStemmerFrenchTests) ... ok
test_has_algorithms (test_pystemmer.PyStemmerGenericTests) ... ok
test_has_version (test_pystemmer.PyStemmerGenericTests) ... ok
test_import (test_pystemmer.PyStemmerGenericTests) ... ok
test_stemWord (test_pystemmer.PyStemmerGermanTests) ... ok
test_stemWord (test_pystemmer.PyStemmerHungarianTests) ... ok
test_stemWord (test_pystemmer.PyStemmerRussianTests) ... ok
----------------------------------------------------------------------
Ran 11 tests in 0.077s
OK
I already have Visual Studio installed and updated on my Windows 10 machine. The error I get is:
(env) C:\Users\JGC\Desktop\Python\Templates de bots\Reddit Template\reddit-karma-farming-bot>pip --version
pip 19.3.1 from C:\Users\JGC\.conda\envs\env\lib\site-packages\pip (python 3.6)
(env) C:\Users\JGC\Desktop\Python\Templates de bots\Reddit Template\reddit-karma-farming-bot>conda --version
conda 4.8.3
(env) C:\Users\JGC\Desktop\Python\Templates de bots\Reddit Template\reddit-karma-farming-bot>pip install PyStemmer
Collecting PyStemmer
Using cached https://files.pythonhosted.org/packages/55/b2/c3aeebfe4a60256ddb72257e750a94c26c3085f017b7e58c860d5aa91432/PyStemmer-2.0.1.tar.gz
Building wheels for collected packages: PyStemmer
Building wheel for PyStemmer (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\JGC\.conda\envs\env\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"'; __file__='"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\JGC\AppData\Local\Temp\pip-wheel-8i0j9des' --python-tag cp36
cwd: C:\Users\JGC\AppData\Local\Temp\pip-install-x21jqz9y\PyStemmer\
Complete output (8 lines):
running bdist_wheel
running build
running build_ext
cythoning src/Stemmer.pyx to src\Stemmer.c
C:\Users\JGC\.conda\envs\env\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using
2 for now (Py2). This will change in a later release! File: C:\Users\JGC\AppData\Local\Temp\pip-install-x21jqz9y\PyStemmer\src\Stemmer.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
building 'Stemmer' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
----------------------------------------
ERROR: Failed building wheel for PyStemmer
Running setup.py clean for PyStemmer
Failed to build PyStemmer
Installing collected packages: PyStemmer
Running setup.py install for PyStemmer ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\JGC\.conda\envs\env\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"'; __file__='"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\JGC\AppData\Local\Temp\pip-record-hhn20m10\install-record.txt' --single-version-externally-managed --compile
cwd: C:\Users\JGC\AppData\Local\Temp\pip-install-x21jqz9y\PyStemmer\
Complete output (6 lines):
running install
running build
running build_ext
skipping 'src\Stemmer.c' Cython extension (up-to-date)
building 'Stemmer' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
----------------------------------------
ERROR: Command errored out with exit status 1: 'C:\Users\JGC\.conda\envs\env\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0]
= '"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"'; __file__='"'"'C:\\Users\\JGC\\AppData\\Local\\Temp\\pip-install-x21jqz9y\\PyStemmer\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\JGC\AppData\Local\Temp\pip-record-hhn20m10\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.
I grabbed the latest snowball module and built it from scratch.
After that I copied the whole snowball folder into pystemmer. The installation succeeds this way and I have the new stemmers. However, the import gives the following error:
removing 'PyStemmer-1.3.0' (and everything under it)
**********************************************************************
File "docs/quickstart.txt", line 8, in quickstart.txt
Failed example:
import Stemmer
Exception raised:
Traceback (most recent call last):
File "/usr/lib/python2.7/doctest.py", line 1315, in __run
compileflags, 1) in test.globs
File "<doctest quickstart.txt[0]>", line 1, in <module>
import Stemmer
ImportError: /home/ubuntu/boostai/pystemmer/dist/PyStemmer-1.3.0/Stemmer.so: undefined symbol: arabic_UTF_8_create_env
Any ideas how to fix it? I'm basically trying to use the unreleased stemmers, and they compile fine in the snowball module.
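An undefined `arabic_UTF_8_create_env` symbol suggests that the extension references the Arabic stemmer but was linked without the C file that defines it. The sketch below cross-checks the two lists. It assumes that libstemmer's `modules.h` refers to each stemmer as `<name>_<ENCODING>_create_env` and that the manifest read by setup.py (`mkinc_utf8.mak` in the traceback earlier on this page) lists sources named `stem_<ENCODING>_<name>.c`. Both naming patterns, the sample text and the helper name are assumptions for illustration.

```python
import re

ENCODINGS = r"UTF_8|ISO_8859_\d+|KOI8_R"


def missing_stemmer_sources(modules_h_text, manifest_text):
    """Return stemmers referenced in modules.h whose C source is not in the manifest."""
    # e.g. "arabic_UTF_8_create_env" -> ("arabic", "UTF_8")
    referenced = set(
        re.findall(r"\b([a-z0-9]+)_(%s)_create_env\b" % ENCODINGS, modules_h_text)
    )
    # e.g. "src_c/stem_UTF_8_arabic.c" -> ("arabic", "UTF_8")
    listed = {
        (name, enc)
        for enc, name in re.findall(
            r"stem_(%s)_([a-z0-9]+)\.c" % ENCODINGS, manifest_text
        )
    }
    return sorted("%s_%s" % pair for pair in referenced - listed)


# Made-up excerpts: modules.h mentions Arabic, the manifest does not list its source.
modules_h = """
{arabic_UTF_8_create_env, arabic_UTF_8_close_env, arabic_UTF_8_stem},
{english_UTF_8_create_env, english_UTF_8_close_env, english_UTF_8_stem},
"""
manifest = """
src_c/stem_UTF_8_english.c
runtime/api.c
"""
missing = missing_stemmer_sources(modules_h, manifest)
```

If a check like this reports a stemmer, regenerating the manifest alongside the new sources would be the first thing to try.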
Installing pystemmer...
Error: An error occurred while installing pystemmer!
Error text: Collecting pystemmer
Using cached PyStemmer-2.0.1.tar.gz (559 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: pystemmer
Building wheel for pystemmer (setup.py): started
Building wheel for pystemmer (setup.py): finished with status 'error'
Running setup.py clean for pystemmer
Failed to build pystemmer
Installing collected packages: pystemmer
Running setup.py install for pystemmer: started
Running setup.py install for pystemmer: finished with status 'error'
1. Maybe "src/Stemmer.c" can't be found.
The reason is that there is actually no "Stemmer.c" in the directory. It has to be generated from "src/Stemmer.pyx" with Cython, so run "sudo apt-get install cython" and then "cython Stemmer.pyx".
2. Maybe you get the error "src/Stemmer.c:8:22: fatal error: pyconfig.h:......".
Run "sudo apt-get install python-dev" to resolve this problem.
3. Getting a "Permission denied" error?
Run "sudo python setup.py install" to resolve this problem.
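The three tips above can be turned into a small pre-flight check that runs before attempting a source build. This is a rough sketch using only the standard library. `preflight` is a hypothetical helper, and the messages simply restate the tips.

```python
import os
import sysconfig
import tempfile


def preflight(source_dir):
    """Return a list of human-readable problems likely to break a source build."""
    problems = []
    # Tip 1: the generated C file must exist, or Cython must produce it.
    if not os.path.isfile(os.path.join(source_dir, "src", "Stemmer.c")):
        problems.append("src/Stemmer.c is missing: run Cython on src/Stemmer.pyx")
    # Tip 2: pyconfig.h ships with the Python development headers.
    if not os.path.isfile(sysconfig.get_config_h_filename()):
        problems.append("pyconfig.h is missing: install the Python development headers")
    # Tip 3: installing into a directory you cannot write to needs elevated rights.
    if not os.access(sysconfig.get_paths()["purelib"], os.W_OK):
        problems.append("site-packages is not writable: run the install with sudo")
    return problems


# Demonstrate on an empty directory, which has no src/Stemmer.c.
with tempfile.TemporaryDirectory() as empty_dir:
    report = preflight(empty_dir)
```

Running `preflight` on an unpacked PyStemmer source tree would list whichever of the three conditions applies on that machine.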