afpy / pospell
`pospell` has migrated to an open-source forge: https://git.afpy.org/AFPy/pospell
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
Placeholders like `%s`, `%(foo)s`, or `{foo}` should be ignored.
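A minimal sketch of what ignoring such placeholders could look like, assuming they are blanked out before the text is handed to the spell checker. The names `PLACEHOLDER` and `strip_placeholders` are hypothetical and not part of pospell, and the pattern only covers a few common conversion characters:

```python
import re

# Hypothetical pattern: matches %s, %d, %(foo)s and {foo}-style placeholders.
# Only a handful of printf conversion characters are covered.
PLACEHOLDER = re.compile(r"%(?:\(\w+\))?[sdifr]|\{\w*\}")


def strip_placeholders(text):
    """Replace each placeholder with a space so it is never spell checked."""
    return PLACEHOLDER.sub(" ", text)


cleaned = strip_placeholders("Hello %s, %(foo)s and {foo}!")
print(cleaned.split())
```

Replacing with a space rather than an empty string keeps the neighbouring words from being glued together.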
Steps to reproduce:
1. Have a .po file containing `first - word`.
2. Run `pospell my_previous_file.po`.
pospell reports `firstword` as an invalid word, because it trims the hyphen together with the surrounding spaces. The problem seems to be this regex, which is too broad: https://github.com/JulienPalard/pospell/blob/master/pospell.py#L136
Perhaps we should add a flag that enables a more relaxed replacement policy?
Consider this simple `*.po` file:
#
msgid ""
msgstr ""
msgid "pub/sub"
msgstr "pub/sub"
Let's try to generate an error for `pub/sub` with `hunspell` using the `es_ES` dictionary:
$ echo 'pub/sub' | hunspell -d es_ES -l
pub
sub
If you run it, you can see that the words `pub` and `sub` are marked as incorrect, each on its own line.
If I try the same with `pospell`, I get the following result:
$ pospell --language es_ES prueba.po
... nothing is marked as incorrect.
Printing the result of the `subprocess.run` call to hunspell, I can see: CompletedProcess(args=['hunspell', '-d', 'es_ES', '-l'], returncode=0, stdout='pub\nsub\n'). So the stdout is `pub\nsub\n`.
After the call to hunspell, the source code does this:
line_of_words = defaultdict(set)
for line, text in enumerate(text_for_hunspell.split("\n"), start=1):
    for word in text.split():
        line_of_words[word].add(line)
for misspelled_word in set(output.stdout.split("\n")):
    for line_number in line_of_words[misspelled_word]:
        errors.append((po_file, line_number, misspelled_word))
So the code that looks up the line numbers assumes that hunspell doesn't split words when it finds characters such as `/`, `-`, ... With a few prints in the source code, this is easy to see:
print(line_of_words)
for misspelled_word in set(output.stdout.split("\n")):
    if misspelled_word not in line_of_words:
        print("---> mispelled word '%s' doesn't exists in line_of_words" % misspelled_word)
    for line_number in line_of_words[misspelled_word]:
        errors.append((po_file, line_number, misspelled_word))
The complete output is:
CompletedProcess(args=['hunspell', '-d', 'es_ES', '-l'], returncode=0, stdout='pub\nsub\n')
defaultdict(<class 'set'>, {'pub/sub': {6}})
---> mispelled word '' doesn't exists in line_of_words
---> mispelled word 'pub' doesn't exists in line_of_words
---> mispelled word 'sub' doesn't exists in line_of_words
So `pub` and `sub` are not added to the `errors` list because their line numbers are not found.
Compound-word behaviour is tied to hunspell's compounding options and depends on the dictionaries in use. Something like this may increase the number of true positives, but is a poor workaround:
import re
for line, text in enumerate(text_for_hunspell.split("\n"), start=1):
    for word in re.split(r' |/|-', text):
        line_of_words[word].add(line)
The ideal solution would be to parse the `.aff` file for the languages passed and build a set of compounding rules to split the words correctly.
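For reference, the workaround can be written as a self-contained sketch that also skips the empty string produced by the trailing newline in hunspell's output. The function names and the `SEPARATORS` pattern are hypothetical. It assumes hunspell splits on whitespace, `/` and `-` only, which in reality depends on the dictionary's `.aff` file:

```python
import re
from collections import defaultdict

# Assumption: hunspell breaks tokens on whitespace, "/" and "-".
# The real rules depend on the dictionary in use.
SEPARATORS = re.compile(r"[\s/-]+")


def index_words_by_line(text):
    """Map each word to the set of 1-based line numbers it appears on."""
    line_of_words = defaultdict(set)
    for line, content in enumerate(text.split("\n"), start=1):
        for word in SEPARATORS.split(content):
            if word:  # splitting can yield empty strings
                line_of_words[word].add(line)
    return line_of_words


def locate_misspellings(text, hunspell_stdout):
    """Return sorted (line, word) pairs for every word hunspell reported."""
    line_of_words = index_words_by_line(text)
    errors = []
    # str.split() without arguments drops the trailing empty entry.
    for word in sorted(set(hunspell_stdout.split())):
        for line in sorted(line_of_words.get(word, ())):
            errors.append((line, word))
    return errors


# "pub/sub" sits on line 6, as in the report above.
print(locate_misspellings("\n\n\n\n\npub/sub", "pub\nsub\n"))
```

Using `.get()` instead of indexing avoids silently inserting unknown words into the `defaultdict`.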
I sometimes use soft hyphens for long words in my translations. Example:
msgid "Inclusion/exclusion criteria"
msgstr "Ein-/Ausschlusskriterien"
This gets reported as an error:
some/path.po:842:kriterien
Even if I add "kriterien" to my personal dictionary, the error remains.
IMHO the correct way to deal with this would be to ignore the soft hyphen. Not sure if this is an issue in pospell or the underlying spell checker.
I see that pospell does check capitalized words, and hence the python-docs-fr dictionary is filled with first names and surnames like Farrugia, Catucci, Fredrik, Guido, Hettinger & co.
Maybe pospell should not check capitalized names?
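One way this could work is sketched below, assuming a capitalized word is only checked when it starts a sentence. The helper `drop_capitalized_words` is hypothetical and deliberately naive: it treats any `.`, `!` or `?` as a sentence boundary, and it is not how pospell itself handles the problem:

```python
import re


def drop_capitalized_words(text):
    """Remove capitalized words unless they start a sentence (naive sketch)."""
    kept = []
    sentence_start = True
    # Alternate between runs of word characters and runs of everything else.
    for token in re.findall(r"\w+|\W+", text):
        if re.match(r"\w", token):
            if token[0].isupper() and not sentence_start:
                continue  # likely a proper noun: skip it
            kept.append(token)
            sentence_start = False
        else:
            kept.append(token)
            if any(mark in token for mark in ".!?"):
                sentence_start = True
    return "".join(kept)


print(drop_capitalized_words("Merci à Guido et Fredrik. Particularité de Python."))
```

The trade-off is that a misspelled capitalized word in the middle of a sentence would no longer be reported.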
Traceback (most recent call last):
File "C:\Python34\Scripts\pospell-script.py", line 11, in <module>
load_entry_point('pospell==0.0.3', 'console_scripts', 'pospell')
File "C:\Python34\lib\site-packages\pospell.py", line 41, in main
(tmpdir / po_file.name).write_text(po_to_text(str(po_file)))
AttributeError: 'WindowsPath' object has no attribute 'write_text'
Any idea?
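`Path.write_text` was added in Python 3.5, which would explain the error on Python 3.4. A minimal sketch of a fallback that only relies on `Path.open`, which is available in 3.4; the helper name `write_text_compat` is hypothetical:

```python
import tempfile
from pathlib import Path


def write_text_compat(path, text, encoding="utf-8"):
    """Write text to path without relying on Path.write_text."""
    with path.open("w", encoding=encoding) as handle:
        handle.write(text)


# Round trip through a temporary directory to show the helper in use.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    write_text_compat(target, "Particularité de l'implémentation.\n")
    with target.open("r", encoding="utf-8") as handle:
        content = handle.read()

print(content)
```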
docutils 0.18 was released today and seems to break pospell:
File "/usr/local/lib/python3.9/dist-packages/pospell.py", line 119, in visit_Text
self.output.append(node.rawsource)
AttributeError: 'Text' object has no attribute 'rawsource'
On python-docs-fr:
sed -i s/Particularité/Partypolicularité/ sphinx.po
pospell -l fr -p dict sphinx.po → nothing
Specifically:
$ hunspell -d fr_FR -p dict -u3 test.txt
$ cat test.txt
Partypolicularité de l'implémentation.
Is there something I don't understand about `-u3`? I don't have the issue without it.
I'm getting the following error running v1.0.7 in `python-docs-es`:
Traceback (most recent call last):
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 310, in next_line
self.line = self.input_lines[self.line_offset]
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 1156, in __getitem__
return self.data[i]
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 233, in run
self.next_line()
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 313, in next_line
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mondeja/files/code/python-docs-es/venv/bin/pospell", line 8, in <module>
sys.exit(main())
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 381, in main
errors = spell_check(
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 296, in spell_check
texts_for_hunspell[po_file] = po_to_text(str(po_file), drop_capitalized)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 190, in po_to_text
buffer.append(clear(strip_rst(entry.msgstr), drop_capitalized, po_path=po_path))
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 138, in strip_rst
parser.parse(line, document)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/__init__.py", line 191, in parse
self.statemachine.run(inputlines, document, inliner=self.inliner)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 170, in run
results = StateMachineWS.run(self, input_lines, input_offset,
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 248, in run
result = state.eof(context)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 2712, in eof
self.blank(None, context, None)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 2703, in blank
paragraph, literalnext = self.paragraph(
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 418, in paragraph
textnodes, messages = self.inline_text(text, lineno)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 427, in inline_text
nodes, messages = self.inliner.parse(text, lineno,
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 646, in parse
before, inlines, remaining, sysmessages = method(self, match,
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 789, in interpreted_or_phrase_ref
nodelist, messages = self.interpreted(rawsource, escaped, role,
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 886, in interpreted
nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/roles.py", line 335, in code_role
inliner.document.settings.syntax_highlight)
AttributeError: 'Values' object has no attribute 'syntax_highlight'
msgid "Un rôle de code :code:`object().__str__`"
msgstr "A code role :code:`object().__str__`."
It seems that adding `"syntax_highlight": "none"`, `"syntax_highlight": "short"` or `"syntax_highlight": "long"` to `docutils.frontend.Values` fixes it, but I'm not entirely sure about the side effects: after updating to v1.0.7 and adding this setting, I'm getting 23751 errors in `python-docs-es`, against the 570 of v1.0.6.
Is the change in v1.0.7 a breaking change in the preprocessing step of pospell? Thanks for your work.
It would be nice to handle errors from polib properly instead of exiting with an exception.
See this build of python-docs-fr.