afpy / pospell

`pospell` has migrated to an open-source forge: https://git.afpy.org/AFPy/pospell

Python 100.00%

pospell's People

xi humitos christophenan mondeja kudrinyaroslav rtobar

pospell's Issues

Dependabot couldn't authenticate with https://pypi.python.org/simple/

Dependabot couldn't authenticate with https://pypi.python.org/simple/.

You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.

View the update logs.

Drop string interpolation placeholders

Placeholders like %s, %(foo)s, or {foo} should be ignored.
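
A minimal sketch of how such placeholders could be stripped before the text reaches the spell checker. The name `drop_placeholders` and the regular expression are illustrative assumptions, not pospell's actual implementation, and the pattern only covers the three forms named above.

```python
import re

# Hypothetical pattern (an assumption for illustration) covering the three
# placeholder forms named in this issue.
PLACEHOLDER = re.compile(
    r"""
      %\(\w+\)[a-zA-Z]   # named percent style, e.g. %(foo)s
    | %[a-zA-Z]          # positional percent style, e.g. %s or %d
    | \{\w*\}            # brace style, e.g. {foo} or {}
    """,
    re.VERBOSE,
)


def drop_placeholders(text):
    """Replace each placeholder with a space so neighbouring words are not joined."""
    return PLACEHOLDER.sub(" ", text)


print(drop_placeholders("Fichier %s introuvable dans {path}").split())
# ['Fichier', 'introuvable', 'dans']
```

Replacing with a space rather than an empty string is a deliberate choice here: it avoids gluing together the words on either side of a placeholder.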

Bug: Words separated by a hyphen with surrounding spaces are joined and shown as invalid

Steps to reproduce.

Create a valid .po file with the following content

first - word

Run pospell my_previous_file.po
Check that it reports firstword as an invalid word, because it strips the hyphen along with the surrounding spaces.

The problem seems to be this regex, https://github.com/JulienPalard/pospell/blob/master/pospell.py#L136, which is too broad. Perhaps we should allow a flag that enables a more relaxed replacement policy?
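
To illustrate the behaviour described in this report, the sketch below contrasts an aggressive hyphen replacement with a more relaxed one. Both patterns are hypothetical stand-ins chosen to reproduce the symptom; they are not copied from pospell.py.

```python
import re

text = "first - word"

# Hypothetical aggressive rule: drop every hyphen together with any
# surrounding whitespace, which glues the two neighbours together.
aggressive = re.sub(r"\s*-\s*", "", text)


def relax_hyphens(s):
    """Hypothetical relaxed rule: only a hyphen with whitespace on both sides
    is treated as punctuation and replaced by a single space."""
    return re.sub(r"\s+-\s+", " ", s)


relaxed = relax_hyphens(text)

print(aggressive)  # firstword
print(relaxed)     # first word
```

With the relaxed rule, in-word hyphens such as the one in "well-known" are left untouched, so how those are handled stays a separate decision.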

Compound words not correctly handled by pospell

Consider this simple *.po file:

#
msgid ""
msgstr ""

msgid "pub/sub"
msgstr "pub/sub"

Let's try to generate an error for pub/sub with hunspell using the es_ES dictionary:

$ echo 'pub/sub' | hunspell -d es_ES -l
pub
sub

If you run it, you can see that the words pub and sub are both reported as incorrect, one per line.

If I try the same with pospell, I get the following result:

$ pospell --language es_ES prueba.po

... nothing is marked as incorrect.

Problem

Printing the output of the call subprocess.run to hunspell, I can see: CompletedProcess(args=['hunspell', '-d', 'es_ES', '-l'], returncode=0, stdout='pub\nsub\n'). So the stdout is pub\nsub\n.

After the call to hunspell, the source code does this:

line_of_words = defaultdict(set)
for line, text in enumerate(text_for_hunspell.split("\n"), start=1):
    for word in text.split():
        line_of_words[word].add(line)
for misspelled_word in set(output.stdout.split("\n")):
    for line_number in line_of_words[misspelled_word]:
        errors.append((po_file, line_number, misspelled_word))

So the code that looks up the line numbers assumes that hunspell doesn't split words when it finds characters such as / or -. A few prints in the source code make this easy to see:

print(line_of_words)
for misspelled_word in set(output.stdout.split("\n")):
    if misspelled_word not in line_of_words:
        print("---> mispelled word '%s' doesn't exists in line_of_words" % misspelled_word)
    for line_number in line_of_words[misspelled_word]:
        errors.append((po_file, line_number, misspelled_word))

The complete output is:

CompletedProcess(args=['hunspell', '-d', 'es_ES', '-l'], returncode=0, stdout='pub\nsub\n')
defaultdict(<class 'set'>, {'pub/sub': {6}})
---> mispelled word '' doesn't exists in line_of_words
---> mispelled word 'pub' doesn't exists in line_of_words
---> mispelled word 'sub' doesn't exists in line_of_words

So pub and sub are not added to the errors list, because their line numbers are not found.

Possible workaround

Compound-word behaviour is tied to hunspell's compounding options and depends on the dictionaries in use. Something like this may increase the number of true positives, but it is a poor workaround:

import re

for line, text in enumerate(text_for_hunspell.split("\n"), start=1):
    for word in re.split(r' |/|-', text):
        line_of_words[word].add(line)

The ideal solution would be to parse the .aff file of each language passed and build a set of compounding rules to split the words correctly.
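
Short of parsing the .aff file, the lookup problem and the regex workaround can be sketched as a self-contained example. The function names `index_words` and `locate` are hypothetical, and the choice of separators (/ and -) is an assumption about where hunspell splits; the real behaviour depends on the dictionary.

```python
import re
from collections import defaultdict


def index_words(text_for_hunspell):
    """Map each token to the set of 1-based line numbers where it occurs."""
    line_of_words = defaultdict(set)
    for line, text in enumerate(text_for_hunspell.split("\n"), start=1):
        for word in text.split():
            line_of_words[word].add(line)
            # Assumption: hunspell may also split on "/" and "-",
            # so the individual parts are indexed as well.
            for part in re.split(r"[/-]", word):
                if part:
                    line_of_words[part].add(line)
    return line_of_words


def locate(hunspell_stdout, line_of_words):
    """Return sorted (line, word) pairs for the words hunspell reported."""
    errors = []
    for misspelled_word in set(hunspell_stdout.split("\n")):
        # .get avoids inserting empty entries, e.g. for the trailing "".
        for line_number in line_of_words.get(misspelled_word, ()):
            errors.append((line_number, misspelled_word))
    return sorted(errors)


# "pub/sub" sits on line 6, as in the output shown in this issue.
index = index_words("\n" * 5 + "pub/sub")
print(locate("pub\nsub\n", index))  # [(6, 'pub'), (6, 'sub')]
```

Indexing both the full token and its parts keeps the existing behaviour for dictionaries that do not split on these characters.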

Issues with soft hyphen

I sometimes use soft hyphens for long words in my translations. Example:

msgid "Inclusion/exclusion criteria"
msgstr "Ein-/Ausschlusskriterien"

This gets reported as an error:

some/path.po:842:kriterien

Even if I add "kriterien" to my personal dictionary, the error remains.

IMHO the correct way to deal with this would be to ignore the soft hyphen. Not sure if this is an issue in pospell or the underlying spell checker.
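
A minimal sketch of the "ignore the soft hyphen" idea: remove U+00AD from the text before it is handed to the spell checker. The function name `strip_soft_hyphens` is hypothetical, and the position of the soft hyphen in the sample string is an assumption, since the character is invisible in the report above.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, an invisible hyphenation hint


def strip_soft_hyphens(text):
    """Remove soft hyphens so a hyphenation hint does not split a word."""
    return text.replace(SOFT_HYPHEN, "")


# Assumed placement of the soft hyphen, for illustration only.
sample = "Ein-/Ausschluss\u00adkriterien"
print(strip_soft_hyphens(sample))  # Ein-/Ausschlusskriterien
```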

Feature request: -m to check only modified files

Should pospell check capitalized words?

I see that pospell does check capitalized words, and hence python-docs-fr's dict is filled with first names and surnames like Farrugia, Catucci, Fredrik, Guido, Hettinger & co.
Maybe pospell should not check capitalized names?
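
One possible heuristic, sketched under stated assumptions: treat a capitalized word that appears in the middle of a sentence as a probable proper noun and drop it before spell checking. The function name and the regular expression are hypothetical, and the pattern only recognises ASCII capitals, so a name starting with an accented capital would still be checked.

```python
import re

# Heuristic (an assumption, not pospell's rule): the lookbehind requires
# "<character that is not whitespace or sentence punctuation><space>" right
# before the word, so the first word of the text or of a sentence is kept.
MID_SENTENCE_CAPITALIZED = re.compile(r"(?<=[^\s.!?:] )[A-Z]\w+")


def drop_mid_sentence_capitalized(text):
    """Remove capitalized words that do not start a sentence."""
    return MID_SENTENCE_CAPITALIZED.sub("", text)


cleaned = drop_mid_sentence_capitalized("Merci à Guido van Rossum. Fredrik est là.")
print(cleaned.split())
# ['Merci', 'à', 'van', '.', 'Fredrik', 'est', 'là.']
```

A proper noun at the start of a sentence, such as "Fredrik" here, is still sent to the spell checker, so this reduces the size of a personal dictionary rather than replacing it.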

Sorry, Windows path

Traceback (most recent call last):
  File "C:\Python34\Scripts\pospell-script.py", line 11, in <module>
    load_entry_point('pospell==0.0.3', 'console_scripts', 'pospell')
  File "C:\Python34\lib\site-packages\pospell.py", line 41, in main
    (tmpdir / po_file.name).write_text(po_to_text(str(po_file)))
AttributeError: 'WindowsPath' object has no attribute 'write_text'

Any idea?
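
The traceback points at Python 3.4, where pathlib exists but `Path.write_text` does not (it was added in Python 3.5). A minimal sketch of a write that avoids that method follows; the helper name `write_text_compat` is hypothetical and this is not necessarily how pospell resolved the issue.

```python
import tempfile
from pathlib import Path


def write_text_compat(path, text, encoding="utf-8"):
    """Write text to a pathlib.Path without relying on Path.write_text."""
    # Path.open has been available since pathlib was introduced in Python 3.4.
    with path.open("w", encoding=encoding) as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    write_text_compat(target, "bonjour")
    with target.open(encoding="utf-8") as handle:
        print(handle.read())  # bonjour
```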

AttributeError on docutils 0.18

docutils 0.18 was released today and seems to break pospell:

  File "/usr/local/lib/python3.9/dist-packages/pospell.py", line 119, in visit_Text
    self.output.append(node.rawsource)
AttributeError: 'Text' object has no attribute 'rawsource'
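
A defensive sketch of one way to cope with the missing attribute: use `rawsource` when the node still has it and fall back to `astext()`, which docutils nodes provide, otherwise. The helper name `node_text` and the stand-in classes are hypothetical and exist only so the example runs without docutils installed. Note that `rawsource` and `astext()` are not guaranteed to return identical strings (escape sequences may differ), so the spell-checking output could change.

```python
def node_text(node):
    """Return the text of a docutils Text-like node across versions."""
    # Prefer rawsource when present and non-empty (older docutils),
    # otherwise fall back to astext().
    return getattr(node, "rawsource", "") or node.astext()


# Stand-in classes (assumptions for illustration, not docutils classes).
class OldText(str):
    rawsource = "raw *text*"

    def astext(self):
        return str(self)


class NewText(str):
    def astext(self):
        return str(self)


print(node_text(OldText("raw text")))  # raw *text*
print(node_text(NewText("plain")))     # plain
```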

Doesn't spot "Partypolicularité"

On python-docs-fr:

sed -i s/Particularité/Partypolicularité/ sphinx.po
pospell -l fr -p dict sphinx.po → nothing

Specifically:

$ hunspell -d fr_FR -p dict -u3 test.txt 
$ cat test.txt 
Partypolicularité de l'implémentation.

Is there something I don't understand about -u3? I don't have the issue without it.

'Values' object has no attribute 'syntax_highlight'

I'm getting the following error running v1.0.7 in python-docs-es:

Traceback (most recent call last):
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 310, in next_line
    self.line = self.input_lines[self.line_offset]
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 1156, in __getitem__
    return self.data[i]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 233, in run
    self.next_line()
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 313, in next_line
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mondeja/files/code/python-docs-es/venv/bin/pospell", line 8, in <module>
    sys.exit(main())
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 381, in main
    errors = spell_check(
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 296, in spell_check
    texts_for_hunspell[po_file] = po_to_text(str(po_file), drop_capitalized)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 190, in po_to_text
    buffer.append(clear(strip_rst(entry.msgstr), drop_capitalized, po_path=po_path))
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/pospell.py", line 138, in strip_rst
    parser.parse(line, document)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/__init__.py", line 191, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 170, in run
    results = StateMachineWS.run(self, input_lines, input_offset,
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/statemachine.py", line 248, in run
    result = state.eof(context)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 2712, in eof
    self.blank(None, context, None)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 2703, in blank
    paragraph, literalnext = self.paragraph(
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 418, in paragraph
    textnodes, messages = self.inline_text(text, lineno)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 427, in inline_text
    nodes, messages = self.inliner.parse(text, lineno,
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 646, in parse
    before, inlines, remaining, sysmessages = method(self, match,
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 789, in interpreted_or_phrase_ref
    nodelist, messages = self.interpreted(rawsource, escaped, role,
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/states.py", line 886, in interpreted
    nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
  File "/home/mondeja/files/code/python-docs-es/venv/lib/python3.8/site-packages/docutils/parsers/rst/roles.py", line 335, in code_role
    inliner.document.settings.syntax_highlight)
AttributeError: 'Values' object has no attribute 'syntax_highlight'

Minimal reproducible example

msgid "Un rôle de code :code:`object().__str__`"
msgstr "A code role :code:`object().__str__`."

It seems that adding "syntax_highlight": "none", "syntax_highlight": "short" or "syntax_highlight": "long" to docutils.frontend.Values fixes it, but I'm not entirely sure about the side effects: after updating to v1.0.7 and adding this setting, I get 23751 errors in python-docs-es, against 570 with v1.0.6.

Is the change in v1.0.7 a breaking change in the preprocessing step of pospell? Thanks for your work.

Errors from polib not handled

It would be nice to handle errors from polib properly instead of exiting with an exception.
See this build of python-docs-fr.