izimobil / polib

Pure python library to manipulate, create, modify gettext files (pot, po and mo files).

License: MIT License

Makefile 0.76% Python 99.08% Shell 0.15%
catalog gettext i18n l10n pofile python

polib's People

Contributors

arnaudlimbourg, boxed, dacodas, davvid, diegobz, encukou, erwinjunge, fpoirotte, gerundt, gumblex, izimobil, jakul, jezdez, jwilk, mestrelion, mgeisler, mondeja, petitlapin, pioverfour, samhocevar, techtonik, vsajip

polib's Issues

Generated MO file not the same as the one generated by msgfmt

Originally reported by: virtuo (Bitbucket: virtuo, GitHub: virtuo)


Hi,
I think there is a small bug in the to_binary method to write a mo file (http://bitbucket.org/izi/polib/src/78bebd0a7089/polib.py#cl-578)

Changing the code as such (offset of hash table) will generate the same MO file as msgfmt.

#!python

struct.pack("IIIIIII",
            0x950412de,        # Magic number
            0,                 # Version
            entries_len,       # # of entries
            7*4,               # start of key index
            7*4+entries_len*8, # start of value index
            0, keystart)       # size and offset of hash table
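
As a rough illustration of the header layout discussed above, the sketch below packs the seven 32-bit words in the proposed order and unpacks them again. The helper name `pack_mo_header` and the sample values are made up for the example, and the `<` prefix (explicit little-endian) is an assumption of this sketch, not something the report specifies.

```python
import struct

MO_MAGIC = 0x950412de

def pack_mo_header(entries_len, keystart):
    """Pack the 7-word MO header with an empty hash table (size 0)."""
    return struct.pack(
        "<7I",
        MO_MAGIC,                 # magic number
        0,                        # version
        entries_len,              # number of entries
        7 * 4,                    # start of key index
        7 * 4 + entries_len * 8,  # start of value index
        0, keystart)              # size and offset of hash table

# Two entries: both index tables take 2 * 8 bytes each after the 28-byte header.
header = pack_mo_header(entries_len=2, keystart=7 * 4 + 2 * 16)
fields = struct.unpack("<7I", header)
print(fields)
```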

Binary strings with MO data can't be read

Originally reported by: Frank Smit (Bitbucket: fsx, GitHub: fsx)


I would like to read/parse MO data from strings (when the source is the database and not a file), but after a quick test I discovered it's not possible. polib raises some exceptions, because it sees null bytes in the byte string.

Instead of accepting strings with the PO data, or a file path, accept a StringIO/BytesIO object or a string with the file path. This way the PO/MO data can be distinguished from a filepath.

I can make a pull request for this.

What do you think of this?
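
A minimal sketch of the idea, assuming a hypothetical helper `read_catalog_bytes` (not part of polib): anything with a `read()` method is treated as a file-like object, and everything else is treated as a path.

```python
import io

def read_catalog_bytes(source):
    """Return raw bytes from a binary file-like object or from a file path."""
    if hasattr(source, "read"):
        # File-like object (e.g. io.BytesIO wrapping data fetched from a database).
        return source.read()
    # Otherwise assume a filesystem path.
    with open(source, "rb") as handle:
        return handle.read()

# The first four bytes here are the little-endian MO magic number.
data = read_catalog_bytes(io.BytesIO(b"\xde\x12\x04\x95\x00\x00\x00\x00"))
print(data[:4])
```

With this split, null bytes in the data can never be mistaken for part of a file name.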


Line wrap improvement

Originally reported by: Anonymous


The default gettext tools will keep the space at the end of a line when wrapping, i.e. like this:

msgstr "This is a long message that will be wrapped "
"into multiple lines."

While polib (thanks to textwrap) will put those spaces at the beginning of the next line:

msgstr "This is a long message that will be wrapped"
" into multiple lines."

Although both technically work, it means that when you run a script based on polib you get a massive diff. And then you run msgmerge to update from the updated pot-file, and the same thing happens again... :-)

I think it would be good if the default wrapping behavior was the same as the gettext tools' default formatting.

The workaround is of course to run msgmerge after running your custom script, but before committing. But it's still kinda annoying...
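
For illustration, here is a small greedy wrapper that keeps the separating space on the earlier line, as in the first sample above. It is only a sketch: `wrap_keep_trailing_space` is a hypothetical name, and it does not claim to reproduce every rule the gettext tools apply.

```python
import re

def wrap_keep_trailing_space(text, width):
    """Greedy wrap where the space between two words stays on the earlier line."""
    # Each chunk is a word together with the whitespace that follows it.
    chunks = re.findall(r"\S+\s*|\s+", text)
    lines, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) > width:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines

message = "This is a long message that will be wrapped into multiple lines."
wrapped = wrap_keep_trailing_space(message, 45)
for line in wrapped:
    print('"%s"' % line)
```

Joining the resulting lines gives back the original string unchanged, which is what a PO writer needs.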


polib removed #| msgid previous-untranslated-string

Originally reported by: Sharuzzaman Ahmat Raslan (Bitbucket: sharuzzaman, GitHub: sharuzzaman)


In http://www.gnu.org/software/hello/manual/gettext/PO-Files.html

there is one type of comment:

#| msgid previous-untranslated-string

In the page, it shows that "Comment lines starting with #| contain the previous untranslated string for which the translator gave a translation. "

KDE is using this comment. While manipulating a KDE po file, I found out that polib removes this comment, because it was not read in the first place.

Please add the ability to read and write back this comment in polib.

A test Python script and the input and output po files showing the case are attached.

Thanks.


polib mutilates escape sequences

Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)


I've just updated to polib 1.0.1, after sticking around with an 0.5.x version for a long time. Great job, D-J! Sorry to re-raise this old issue, today with a slightly different (and hopefully more convincing) phrasing.

polib mutilates valid escape sequences.

To wit, here is a simple test case:

#!python

bash> cat t.po
#
msgid ""
msgstr ""

msgid "unicode: \u00ae; octal: \141; hex: \x61; control: \b \f \v \a"
msgstr ""
bash> python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import polib
>>> polib.pofile("t.po").save()
>>> quit()
bash> cat t.po
# 
msgid ""
msgstr ""

msgid "unicode: \\u00ae; octal: \\141; hex: \\x61; control: \\b \\f \\v \\a"
msgstr ""
bash>

All escape sequences unknown to polib (i.e. anything other than \t, \r and \n) get an additional '\' in front of them. This is particularly problematic for us in the case of unicode escapes, as they are frequently used to enter hard-to-type characters into msgids and msgstrs (like the "Registered" character in the sample).

The problem arises because polib unescapes strings on reads (which removes some '\', but leaves them in unknown sequences like '\u...') and escapes on writes/stringify (which unconditionally prefixes unknown escape sequences with another '\').

I thought a lot about it, but to keep a long story short, my resolution is to have polib leave unknown escape sequences untouched. We've run with this patch for a long time in several projects with good results. I will probably add more of my considerations as a separate comment.

Here is the pull request:
https://bitbucket.org/izi/polib/pull-request/8/removed-escaping-unescaping-of-unknown
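
To make the proposal concrete, here is a sketch of an unescape step that decodes only a fixed set of escapes and leaves every other backslash sequence as it is. This is not the code from the pull request; the function name `unescape_known` and the exact set of known escapes are assumptions of the example.

```python
import re

# Escapes this sketch decodes; anything else is passed through unchanged.
_KNOWN = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}

def unescape_known(text):
    """Decode only the escapes in _KNOWN and keep unknown sequences verbatim."""
    return re.sub(r'\\([ntr"\\])', lambda m: _KNOWN[m.group(1)], text)

raw = r'unicode: \u00ae; tab: \t; hex: \x61'
decoded = unescape_known(raw)
print(repr(decoded))
```

The `\u00ae` and `\x61` sequences keep their single backslash, so a matching write step would not need to add another one.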


msgctxt not correctly handled

Originally reported by: David Planella (Bitbucket: dpm, GitHub: dpm)


I've noticed that when reading from mo files the msgctxt is not correctly handled, and it is treated as part of the msgid of an entry.

See the attached test for details. In that particular case, I'm reading from an mo file and when printing the msgid of entries that have a msgctxt, the msgid contains the msgctxt and the msgctxt is empty. E.g. (printing an msgid)

#!python

msgctxt "Stock label"
msgid "_Open"
msgstr "_Obre"

>> print entry.msgid
>> Stock label\x04_Open
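
In MO files the context and the msgid are stored as one key, joined by the EOT character (`\x04`). A minimal sketch of splitting such a key, using a hypothetical helper name `split_context`:

```python
CONTEXT_SEPARATOR = "\x04"  # EOT byte that GNU gettext puts between msgctxt and msgid

def split_context(key):
    """Split an MO key into (msgctxt, msgid); msgctxt is None when absent."""
    if CONTEXT_SEPARATOR in key:
        msgctxt, msgid = key.split(CONTEXT_SEPARATOR, 1)
        return msgctxt, msgid
    return None, key

print(split_context("Stock label\x04_Open"))
print(split_context("_Open"))
```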

test_save_as_mofile failed on powerpc

Originally reported by: Anonymous


======================================================================
FAIL: test_save_as_mofile (main.TestPoFile)

Traceback (most recent call last):
File "tests/tests.py", line 465, in test_save_as_mofile
self.assertEqual(s1, s2)
AssertionError: '\xde\x12\x04\x95\x00\x00\x00\x00\x00\x00\x02 ...

\x00vista:\x00semana\x00semanas\x00a\xc3\xb1o\x00a\xc3\xb1os\x00s\xc3\xad,no,tal vez\x00'


Ran 56 tests in 0.591s

FAILED (failures=1)
error: Bad exit status from /var/tmp/rpm-tmp.JqyRls (%check)

OS: openSUSE Factory
Build log: https://build.opensuse.org/package/live_build_log?arch=ppc64&package=python-polib&project=openSUSE%3AFactory%3APowerPC&repository=standard
Version:1.0.1


polib can not process plural

Originally reported by: Anonymous


If there is an entry like this:

#. translators:
#. * The number of sound outputs on a particular device
#: ../gnome-volume-control/src/gvc-mixer-control.c:1094
#, c-format
msgid "%u Output"
msgid_plural "%u Outputs"
msgstr[0] "%u 输出"

polib will not return an entry for msgid = '%u Outputs',
and for msgid = '%u Output' it returns msgstr = ''.


Multiline entries are not getting updated

Originally reported by: Diego Búrigo Zacarão (Bitbucket: diegobz, GitHub: diegobz)


#!python

import polib
po = polib.pofile('tests/test_utf8.po')

po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")
<POEntry instance at dde810>

entry = po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")

entry.msgstr = '**' + entry.msgid + '**'

print entry.msgstr
**Ensure this value has at least %(min)d characters (it has %(length)d).**

print entry
#: newforms/fields.py:118
#, python-format
msgid "Ensure this value has at least %(min)d characters (it has %(length)d)."
msgstr ""
"Asegúrese de que su texto tiene al menos %(min)d caracteres (actualmente "
"tiene %(length)d)."

Here is a diff for testing it:

#!diff

diff -r ec920a7a1df8 polib.py
--- a/polib.py  Sat Oct 02 12:09:45 2010 +0200
+++ b/polib.py  Mon Oct 04 09:19:10 2010 -0300
@@ -112,6 +112,13 @@
     True
     >>> po.encoding == po_content.encoding
     True
+    >>> po = polib.pofile('tests/test_utf8.po')
+    >>> entry = entry = po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")
+    >>> entry.msgstr = entry.msgstr=entry.msgid + '**'
+    >>> '**' in entry.msgstr
+    True
+    >>> '**' in entry.__str__() # It's failing: ``print entry.__str__()`` to check the output. True is expected.
+    True
     """
     if kwargs.get('autodetect_encoding', True) == True:
         enc = detect_encoding(pofile)

It seems a pretty important bug, but I couldn't find a fix for it in the short amount of time. :/


POFile initialization from data doesn't work on 0.6

Originally reported by: Seraphim Mellos (Bitbucket: fim, GitHub: fim)


Hello,

I just tried to test the latest release and it seems like it's breaking the ability to initialize a POFile directly from data instead of a file. In the documentation it seems like it should be supported, as in the previous version:

``pofile`` string, full or relative path to the po/pot file or its content (data)

It seems like this check is the reason it's breaking:

https://bitbucket.org/izi/polib/src/39cb2d39ba1a/polib.py#cl-39


POFile.append() raises a duplicate exception when you try to add a new entry with the same msgid and a different msgctxt

Originally reported by: Sorin Sbarnea (Bitbucket: sorin, GitHub: sorin)


POFile.append() raises a duplicate exception when you try to add a new entry with the same msgid and a different msgctxt.

The PO specification says clearly: the unique key is (msgid, msgctxt). It is perfectly valid to have duplicate msgid entries as long as they have different msgctxt.

Note that this happens only when you enable the check for duplicates, so that is where the bug is.
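
A sketch of a duplicate check keyed on the (msgctxt, msgid) pair rather than on msgid alone. Entries are plain dicts here and `append_unique` is a hypothetical helper, so this only illustrates the rule and is not polib's implementation.

```python
def append_unique(entries, msgid, msgstr="", msgctxt=None):
    """Append an entry unless the same (msgctxt, msgid) pair already exists."""
    key = (msgctxt, msgid)
    if any((e["msgctxt"], e["msgid"]) == key for e in entries):
        raise ValueError("duplicate entry: %r" % (key,))
    entries.append({"msgid": msgid, "msgstr": msgstr, "msgctxt": msgctxt})

catalog = []
append_unique(catalog, "Shift", msgctxt="keyboard label")
append_unique(catalog, "Shift", msgctxt="gear change")  # accepted: different context
print(len(catalog))
```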


does not wrap msgid and msgstr

Originally reported by: Wagner Bruna (Bitbucket: wbruna, GitHub: wbruna)


Wrapping msgids and msgstrings doesn't seem to be working:

#!python

import polib
print 'version', polib.__version__
wrapwidth = 20
print 'wrapwidth is', wrapwidth
msg = 'a message that should be wrapped'
print 'msgid length is', len(msg)
trn = 'a translation that should be wrapped'
print 'msgstr length is', len(trn)
po = polib.POFile(wrapwidth=wrapwidth)
po.append(polib.POEntry(msgid=msg, msgstr=trn))
print 'po file:'
print str(po)

The output is:

version 0.5.2
wrapwidth is 20
msgid length is 32
msgstr length is 36
po file:
#
msgid ""
msgstr ""

msgid "a message that should be wrapped"
msgstr "a translation that should be wrapped"

PO parser uses strings as msgstr_plural keys

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


The PO parser uses strings as msgstr_plural keys:

>>> msgid = '%(size)d byte'
>>> file = polib.pofile('tests/test_utf8.po')
>>> file.find(msgid).msgstr_plural.keys()
[u'1', u'0']

This is surprising, and unlike what MO parser does:

>>> file = polib.mofile('tests/test_utf8.mo')
>>> file.find(msgid).msgstr_plural.keys()
[0, 1]
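
One way to get integer keys from the PO side is to convert the index while parsing the `msgstr[N]` line. The regex and the helper `parse_plural_line` below are assumptions made for this sketch, not polib code.

```python
import re

PLURAL_RE = re.compile(r'^msgstr\[(\d+)\]\s+"(.*)"$')

def parse_plural_line(line):
    """Return (index, text) for a msgstr[N] line, with the index as an int."""
    match = PLURAL_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

plural = dict([
    parse_plural_line('msgstr[0] "%(size)d byte"'),
    parse_plural_line('msgstr[1] "%(size)d bytes"'),
])
print(sorted(plural.keys()))
```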

Finding entries that use msgctxt attribute

Originally reported by: Diego Búrigo Zacarão (Bitbucket: diegobz, GitHub: diegobz)


For instance:

#!python
import polib
po = polib.pofile("msgctxt.pot")
po.find('Shift')
<POEntry instance at 25d3490>
po.find('Shift').msgctxt
u'keyboard label'

It seems that there's no way to access (by finding) the other msgid='Shift' entry with the different msgctxt. Only the first one that it's found in the POT/PO is returned, apparently.
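
What is being asked for could look roughly like the lookup below, which takes an optional context. It is a standalone sketch: the `Entry` tuple, the `find_entry` function and the second context "gear change" are invented for the example.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "msgid msgctxt msgstr")

def find_entry(entries, msgid, msgctxt=None):
    """Return the first entry matching msgid, and msgctxt too when one is given."""
    for entry in entries:
        if entry.msgid == msgid and (msgctxt is None or entry.msgctxt == msgctxt):
            return entry
    return None

entries = [
    Entry("Shift", "keyboard label", ""),
    Entry("Shift", "gear change", ""),
]
print(find_entry(entries, "Shift").msgctxt)
print(find_entry(entries, "Shift", msgctxt="gear change").msgctxt)
```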


untranslated_entries() also show fuzzy message

Originally reported by: Sharuzzaman Ahmat Raslan (Bitbucket: sharuzzaman, GitHub: sharuzzaman)


I found out that the untranslated_entries() function also returns fuzzy messages. This is because the selection criteria for untranslated_entries() do not exclude fuzzy entries.

The patch below fixes the issue.

--- polib.py.old        2010-02-02 21:50:26.000000000 +0800
+++ polib.py    2010-02-02 21:52:58.000000000 +0800
@@ -679,7 +679,7 @@
         >>> len(po.untranslated_entries())
         6
         """
-        return [e for e in self if not e.translated() and not e.obsolete]
+        return [e for e in self if not e.translated() and not e.obsolete and not 'fuzzy' in e.flags]

     def fuzzy_entries(self):
         """

incorrectly parses MO files with no header

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


The MO file parser assumes that the first entry in the MO file is the header entry. This is true but only if the header entry actually exists. If it doesn't, then polib parses the MO file incorrectly.

As a test case I attached a MO file that contains two translated strings, but no header:

$ msgunfmt no-header.mo
msgid "bar"
msgstr "rab"

msgid "foo"
msgstr "oof"

This is how polib parses the file:

>>> import polib
>>> print(polib.mofile('no-header.mo'))
msgid ""
msgstr "rab: \n"

msgid "foo"
msgstr "oof"
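
The check described above can be sketched as follows: only treat the first entry as the header when its msgid is the empty string. The input is assumed to be a list of already decoded (msgid, msgstr) pairs, and `split_header` is a hypothetical helper.

```python
def split_header(pairs):
    """Return (header_text_or_None, remaining_pairs) for decoded MO entries."""
    if pairs and pairs[0][0] == "":
        return pairs[0][1], pairs[1:]
    # No empty msgid first: there is no header, so every pair is a real entry.
    return None, pairs

header, entries = split_header([("bar", "rab"), ("foo", "oof")])
print(header, entries)
```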

Using unicode() function to get string with pofile contents - Getting a UnicodeDecode error

Originally reported by: markahern (Bitbucket: markahern, GitHub: markahern)


Here is how to reproduce:

#!python
Python 2.7.3 (default, May 19 2013, 04:22:38) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> po_text = ur"""
... msgid "áéíóú"
... msgstr ""
... """
>>> po_text
u'\nmsgid "\xe1\xe9\xed\xf3\xfa"\nmsgstr ""\n'
>>> import polib
>>> po_file = polib.pofile(po_text.encode('utf-8'), encoding=('utf-8'))
>>> print unicode(po_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "polib.py", line 602, in __unicode__
    return ret + _BaseFile.__unicode__(self)
  File "polib.py", line 297, in __unicode__
    ret.append(entry.__unicode__(self.wrapwidth))
  File "polib.py", line 988, in __unicode__
    ret.append(_BaseEntry.__unicode__(self, wrapwidth))
  File "polib.py", line 832, in __unicode__
    ret = u('\n').join(ret)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

It's very possible I'm using it wrong, but seems like a bug to me?


`msgfmt --check`-like checks by polib

Originally reported by: Transifex Sysadmin (Bitbucket: indifex, GitHub: Unknown)


msgfmt has this --check option which checks stuff that "break" po files (and builds of software using these PO files). Examples are different number of %s's between msgid and msgstr.

http://www.gnu.org/software/hello/manual/gettext/msgfmt-Invocation.html

It'd be cool if polib could check for these things. It's really important for Transifex, since a translator using our web translator could lose all his work this way.
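
As an example of the kind of check meant here, the sketch below compares the printf-style directives found in a msgid and its msgstr. The regex is deliberately simplified and the function names are made up; msgfmt's real checks are more thorough and depend on the format flag of the entry.

```python
import re

# Simplified: matches "%%" or directives such as %s, %d, %5.2f, %(name)s.
DIRECTIVE_RE = re.compile(r"%%|%(?:\([^)]+\))?[-#0+]*\d*(?:\.\d+)?[a-zA-Z]")

def directives(text):
    """Return the sorted list of format directives, ignoring the literal %%."""
    return sorted(m for m in DIRECTIVE_RE.findall(text) if m != "%%")

def same_directives(msgid, msgstr):
    """True when both strings use the same multiset of directives."""
    return directives(msgid) == directives(msgstr)

print(same_directives("%s of %s", "%s de %s"))
print(same_directives("%s of %s", "%s"))
```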


LC_ALL=C + python3 -> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


If you run setup.py under C locale with Python 3, it fails with UnicodeDecodeError:

$ python3 setup.py build
Traceback (most recent call last):
  File "setup.py", line 27, in <module>
    ''' % (open('README.rst').read(), open('CHANGELOG').read())
  File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 648: ordinal not in range(128)
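
A common way to avoid depending on the locale is to pass the encoding explicitly when reading the files. The sketch below assumes the files are UTF-8 (the 0xc3 byte in the traceback is consistent with that, but it is still an assumption) and uses a temporary file so that it runs on its own.

```python
import io
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()

# Demo: write a UTF-8 file containing a non-ASCII character, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".rst", delete=False) as tmp:
    tmp.write(u"caf\u00e9".encode("utf-8"))
try:
    content = read_text(tmp.name)
finally:
    os.remove(tmp.name)
print(len(content))
```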

Necessity of escape / unescape

Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)


David,

the longer I think about it the more I get the feeling I would like to have polib leave the entry strings unchanged, i.e. not apply unescape (on reads) and escape (on writes) to them. The main use case for gettext (and polib, for that matter) is handling strings that come from files - either source files or .po files. In those cases it would be easiest if the strings were left untouched, as they are. You read strings in and write them out, but you usually never need them in a "parsed" representation; they are just raw data, and that's fine. For example, the current implementation forces me to apply a polib.unescape() to a string collected from a source file, in order to use it properly as a POEntry.msgid.

The only exception from the general use case I can think of is when you want to bring Python string literals into the game (i.e. strings not read from files); and as polib is a Python module there might be occasions where you want to use it in this way. But then all you have to do is to use Python raw strings r'...' to populate POEntries. That would be the only concession for this use case, and an easy one at that, as it matches the po file format description (see gettext), which mandates C-style escapes.

What do you think?


Disappearing newline characters

Originally reported by: Seraphim Mellos (Bitbucket: fim, GitHub: fim)


Hello,

I just found what appears to be a bug in polib which affects all versions after 0.6. In msgids, there are cases where trailing newlines are removed from the pofile when using the str() method to convert the loaded PO/POT back to text. Here is an example:

I have a POT file with a single PO entry containing this:

#################################################

#, python-format
msgid ""
"There was an error running your transaction for the following reason: %s\n"
msgstr ""

#################################################

However when I load it in the python interpreter I get these results:

#################################################

>>> for entry in pot: entry.msgid
u'There was an error running your transaction for the following reason: %s\n'

#################################################

which seems correct but:

#################################################

>>> print pot.__str__()

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
# 
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: test\n"
"Report-Msgid-Bugs-To: [email protected]\n"
"POT-Creation-Date: 2011-02-10 11:42-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <[email protected]>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: \n"
"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"

#, python-format
msgid ""
"There was an error running your transaction for the following reason: "
"%s "
msgstr ""

#################################################

in which you can see that the trailing '\n' in the msgid is missing. I've tried it with 0.6 and 0.6.2 and both seem to be affected so it probably has to do with the changes from 0.5.5 to 0.6.

If you need any more info let me know.

Cheers


IOError on reading obsolete "previous msgid" entries

Originally reported by: David Planella (Bitbucket: dpm, GitHub: dpm)


I was trying to load a PO file from GNOME at http://l10n.gnome.org/POT/evolution.master/evolution.master.ca.po (attached), and I got the following error:

#!python

dpm@el-far:~$ ipython
In [1]: import polib

In [2]: po = polib.pofile('/home/dpm/evolution.master.ca.po')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)

/home/dpm/<ipython console> in <module>()

/home/dpm/polib.py in pofile(pofile, **kwargs)
    100         file (optional, default: ``False``).
    101     """
--> 102     return _pofile_or_mofile(pofile, 'pofile', **kwargs)
    103 
    104 # }}}


/home/dpm/polib.py in _pofile_or_mofile(f, type, **kwargs)
     71         check_for_duplicates=kwargs.get('check_for_duplicates', False)
     72     )
---> 73     instance = parser.parse()
     74     instance.wrapwidth = kwargs.get('wrapwidth', 78)
     75     return instance

/home/dpm/polib.py in parse(self)
   1261             else:
   1262                 raise IOError('Syntax error in po file %s (line %s)' % \
-> 1263                               (self.instance.fpath, i))
   1264 
   1265         if self.current_entry:

IOError: Syntax error in po file /home/dpm/evolution.master.ca.po (line 23110)

It seems polib is crashing on the following entry, in particular at the #~| msgid "" line:

#, fuzzy
#~| msgid ""
#~| "Error on %s\n"
#~| "%s"
#~ msgid ""
#~ "Error on %s: %s\n"
#~ "%s"
#~ msgstr ""
#~ "S'ha produït un error en %s:\n"
#~ "%s"

Looking at other files on the GNOME l10n site, I can see more instances of #~|. These seem to be generated automatically by a gettext tool (probably msgmerge) when marking "previous msgid" fuzzy entries as obsolete.

The documentation at http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files says nothing about the format of obsolete entries, so I understand that the docs leave a wee bit too much room for guessing in the implementation of a parser.

In any case, if it's generated by a gettext tool, it would be good if polib would either ignore or treat #~| instances as obsolete entries instead of raising an exception.

Thanks!


Escaping \

Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)


Hi David,

long time. I've just upgraded to polib 0.5.1. An issue I've been fighting with since 0.4 is the escaping and unescaping of \ (backslash) (I once wrote you an email about it).

Currently, polib adds an additional \ before another \ (provided that it is not \n, \t, etc.). See the attached escapes.po file and run it through 'pofile("escapes.po").save(fpath="out.po")'.

This means that neither msgids nor msgstrs can make use of legal Unicode escape sequences, such as \u00AE, as they are turned into \\u00AE. This is a problem in environments that have no other means to enter and manipulate such special characters than through their escape sequence.

What do you think about it?

Thomas


Support indented PO files

Originally reported by: Anonymous


Some of polib's checks to determine the context of a token are based on hard-coded strings like 'msgid "'.

Unfortunately, those checks cannot take into account indented PO files (-i option in gettext) where there may be more than a single space between "msgid" and the opening quote.

This is true for most tokens (msgctxt, msgid, msgstr, msgid_plural). I'm not sure about special comments like "#| msgid", though I'd suggest supporting multiple spaces there too, to make polib more lax about its inputs.
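
A regex-based match would tolerate any amount of whitespace between the keyword and the opening quote. The pattern below is a sketch with made-up names; it also accepts leading whitespace, which is an extra assumption, and it does not cover the "#| msgid" comments.

```python
import re

KEYWORD_RE = re.compile(
    r'^\s*(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"\s*$')

def parse_keyword_line(line):
    """Return (keyword, quoted_text) or None when the line is not a keyword line."""
    match = KEYWORD_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_keyword_line('msgid    "foo"'))
print(parse_keyword_line('msgstr[0]  "x"'))
```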


autodetect_encoding default value

Originally reported by: Vinay Sajip (Bitbucket: vinay.sajip, GitHub: Unknown)


It seems redundant to be forced to pass autodetect_encoding=False every time you want to pass an encoding - if not passed as False, the encoding passed in is never used! It's fine to have the default value be True, but that default should be ignored if an actual encoding is passed, and auto-detection never performed when an encoding is passed in.

What would it mean to pass in an encoding and have auto-detection enabled?
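
The precedence being proposed could be written as below. This is only a sketch: `resolve_encoding`, its `detector` callback and the "utf-8" fallback are assumptions for the example, not polib's actual signature or defaults.

```python
def resolve_encoding(encoding=None, autodetect_encoding=True, detector=None,
                     default="utf-8"):
    """Pick the encoding: an explicit value wins, then detection, then the default."""
    if encoding is not None:
        # An explicit encoding always takes priority; detection is skipped.
        return encoding
    if autodetect_encoding and detector is not None:
        return detector() or default
    return default

print(resolve_encoding("latin-1", True, lambda: "utf-8"))
print(resolve_encoding(None, True, lambda: "iso-8859-15"))
```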


easy_install doesn't work

Originally reported by: David Evans (Bitbucket: dave_e, GitHub: Unknown)


running: easy_install polib currently doesn't work as the current version is listed 0.5.4 which isn't available in the download section here.

Work around is to run easy_install polib==0.5.3 which does exist.


polib.py raises an IO exception when a po file has a blank comment line

Originally reported by: Angel Abad (Bitbucket: angelabad, GitHub: angelabad)


Hi Izi, I'm the Debian/Ubuntu polib maintainer. This bug was filed in the Ubuntu bug tracker:

If the po file contains an empty comment line, for example:

#. The next line is commented
#.
#. This also

the parser raises an IO exception.

Please see:

And patch:

If you think the patch is correct, I can apply it in my packages before you release a fixed version.

Cheers!


polib doesn't check unescaped quote

Originally reported by: James Ni (Bitbucket: jamesni, GitHub: jamesni)


Hi,
Currently, I use polib in our project to convert po files to JSON format. I found that if a msgstr or msgid contains an unescaped (illegal) quote, polib doesn't report an error and still treats the entry as an untranslated string.
I have created a patch to fix it. Basically, I want to use the eval() function to check Python escape semantics. The idea comes from msgfmt.py in python-tools. I have also attached a test po file. Thanks

Best Regards


doesn't check MO versions

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


The MO file format specification reads:

A program seeing an unexpected major revision number should stop reading the MO file entirely

But polib doesn't pay attention to versions at all.

As a test-case I attached a MO file with a bogus major revision number. msgunfmt correctly rejects such a file:

$ msgunfmt messages.mo 
msgunfmt: file "messages.mo" is not in GNU .mo format

Yet polib opens it happily:

#!python
>>> import polib
>>> polib.mofile('messages.mo')
[<polib.MOEntry object at 0xf6fdc14c>]
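
A sketch of such a check, assuming the revision word holds the major number in its upper 16 bits and the minor number in its lower 16 bits, and that majors 0 and 1 are the expected ones. The function name `check_mo_revision` is hypothetical.

```python
import struct

def check_mo_revision(data):
    """Return (major, minor) from an MO header, raising ValueError if unexpected."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0x950412de:
        fmt = "<I"   # file written in little-endian order
    elif magic == 0xde120495:
        fmt = ">I"   # file written in big-endian order
    else:
        raise ValueError("not a MO file (bad magic number)")
    revision = struct.unpack(fmt, data[4:8])[0]
    major, minor = revision >> 16, revision & 0xFFFF
    if major not in (0, 1):
        raise ValueError("unexpected major revision number: %d" % major)
    return major, minor

good = struct.pack("<II", 0x950412de, 0)
print(check_mo_revision(good))
```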

POFile.merge error when an entry is obsolete in a .po, reappears in the .pot, and the two are merged

Originally reported by: Olivier Olivier (Bitbucket: omansion, GitHub: omansion)


Here are the steps to reproduce:

Let A be an entry in a .pot

Merge the .pot with the .po - A is in .po

Remove A from the .pot

Merge the .pot with the .po - A is obsolete in .po

Add again A in .pot

Merge the .pot with the .po - A remains obsolete in .po


please make detecting header field duplicates feasible

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


I'd like to be able to detect header field duplicates in PO/MO files. For PO files, I can do it by monkey-patching polib code, but this technique doesn't work for the MO parser.

Could you perhaps move the header parser to a separate function? This way you could avoid code duplication, and make it possible to substitute the parser easily with a customized one. Thanks in advance.


reformatting of strings

Originally reported by: Anonymous


I have the following problem with Transifex (which claims that the root cause is polib).
I upload translations, make no changes, and download them back. What I get back is very often reformatted. E.g.:

#: ../src/up2date_client/rhnreg_constants.py:48 ../data/rh_register.glade.h:34
msgid ""
-"Access to the technical support experts at Red Hat or Red Hat's partners for "
-"help with any issues you might encounter with this system."
+"Access to the technical support experts at Red Hat or Red Hat's partners for"
+" help with any issues you might encounter with this system."

Note the space after "for" at the end of line.

This makes the diff in our git repo bigger for no reason. Glezos claims that the cause is polib (which Transifex uses). I could not verify this claim.


should split flags on "," rather than ", "

Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)


This is how the PO parser splits flags:

self.current_token[3:].split(', ')

This is not correct; there is no requirement that the splitting comma is followed by a space.

Please see the attachment for a test-case. msgfmt considers the message fuzzy:

$ msgfmt -v messages.po
0 translated messages, 1 fuzzy translation.

But there are no fuzzy messages according to polib:

>>> p = polib.pofile('messages.po')
>>> len(p.fuzzy_entries())
0
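
Splitting on the bare comma and stripping whitespace from each piece handles both spellings. A minimal sketch, with `parse_flags` as a hypothetical helper that receives the whole "#," comment line:

```python
def parse_flags(comment_line):
    """Parse a '#,' flag line into a list of flag names."""
    # Drop the leading "#," and split on commas; surrounding spaces are optional.
    return [flag.strip() for flag in comment_line[2:].split(",") if flag.strip()]

print(parse_flags("#,fuzzy,python-format"))
print(parse_flags("#, fuzzy, python-format"))
```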

os.path.exists still causes problems in 1.0.3

Originally reported by: Anonymous


I saw that between 1.0.2 and 1.0.3 you fixed one of the os.path.exists() occurrences to handle the case where the input is unicode text rather than a filename.

However, there are a few other places where os.path.exists is called that weren't fixed, and one was causing us problems (when calling polib.pofile(contents)).

I've attached a patch against 1.0.3 that fixes this. I did some cursory testing to verify that passing in a unicode file to polib.pofile() works now, as does passing in a filename.
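
This is not the attached patch, just a sketch of one possible heuristic for telling content from a filename before calling os.path.exists(). The helper name `is_existing_path` and the rule "anything containing a newline is content" are assumptions of the example.

```python
import os

def is_existing_path(value):
    """Guess whether value names an existing file rather than holding PO content."""
    # PO content spans several lines; a usable path contains no newline or NUL.
    if "\n" in value or "\0" in value:
        return False
    return os.path.exists(value)

print(is_existing_path('msgid ""\nmsgstr ""\n'))
print(is_existing_path(os.getcwd()))
```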


__str__() methods are returning unicode instead of str

Originally reported by: Anonymous


Hello,

I was looking at this issue in django-rosetta, which uses polib:
http://code.google.com/p/django-rosetta/issues/detail?id=75

I tracked it down to a bug in polib that caused various str() methods to return unicode instead of str. Calling str() on such an object causes an exception, when str() tries to encode the string using the default ascii code page.

This bug is caused by polib's assumption that codecs.open() returns a generator of str objects. In fact, it generates unicode when you give codecs.open() an encoding parameter (this is the case in Python 2.5). The unicode type then gets propagated in string formatting and joining till it's returned by a str() method.

The quick fix would be to encode the string as it comes out of codecs.open(). Something like this:

(Ignore the line numbers, the version I'm using is the one shipped with django-rosetta.)

#!python

--- polib.py.old        2010-06-15 10:31:03.000000000 +0100
+++ polib.py    2010-06-15 12:20:40.000000000 +0100
@@ -1110,6 +1110,7 @@
         """
         i, lastlen = 1, 0
         for line in self.fhandle:
+            line = line.encode("utf8")
             line = line.strip()
             if line == '':
                 i = i+1

Rick S

