izimobil / polib
Pure python library to manipulate, create, modify gettext files (pot, po and mo files).
License: MIT License
Originally reported by: James Ni (Bitbucket: jamesni, GitHub: jamesni)
Hi,
Currently, I use polib in our project to convert PO files to JSON format. I found that if a msgstr or msgid contains an unescaped (illegal) quote, polib doesn't report an error and still treats it as an untranslated string.
I have created a patch to fix it. Basically, I want to use the eval() function to check Python escape semantics. The idea comes from msgfmt.py in python-tools. I also attach a test PO file to test it. Thanks
Best Regards
Originally reported by: Anonymous
Some of polib's checks to determine the context of a token are based on hard-coded strings like 'msgid "'.
Unfortunately, those checks cannot take into account indented PO files (-i option in gettext) where there may be more than a single space between "msgid" and the opening quote.
This is true for most tokens (msgctxt, msgid, msgstr, msgid_plural). I'm not sure about special comments like "#| msgid", though I'd suggest supporting multiple spaces there too, to make polib more lax about its inputs.
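A whitespace-tolerant keyword match could look like the sketch below. It is not polib's parser; the pattern and the name `split_keyword` are assumptions made for illustration only.

```python
import re

# Keyword, then one or more spaces or tabs, then the quoted string.
_KEYWORD_RE = re.compile(
    r'^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)[ \t]+"(.*)"$'
)


def split_keyword(line):
    """Return (keyword, raw string content), or None if the line does not match."""
    match = _KEYWORD_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this approach `msgid "x"` and `msgid     "x"` are treated the same, which is what indented PO files need.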
Originally reported by: David Planella (Bitbucket: dpm, GitHub: dpm)
I've noticed that when reading from mo files the msgctxt is not correctly handled, and it is treated as part of the msgid of an entry.
See the attached test for details. In that particular case, I'm reading from an mo file and when printing the msgid of entries that have a msgctxt, the msgid contains the msgctxt and the msgctxt is empty. E.g. (printing an msgid)
#!python
msgctxt "Stock label"
msgid "_Open"
msgstr "_Obre"
>> print entry.msgid
>> Stock label\x04_Open
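For reference, GNU gettext stores a context-qualified key in MO files as the msgctxt, an EOT byte (`\x04`), and then the msgid. A minimal sketch of splitting such a key follows; `split_mo_key` is a hypothetical helper, not a polib function.

```python
CONTEXT_SEPARATOR = '\x04'  # EOT, placed between msgctxt and msgid in MO keys


def split_mo_key(key):
    """Split an MO key into (msgctxt, msgid); msgctxt is None when absent."""
    if CONTEXT_SEPARATOR in key:
        msgctxt, msgid = key.split(CONTEXT_SEPARATOR, 1)
        return msgctxt, msgid
    return None, key
```

Applied to the entry above, this would yield a msgctxt of `Stock label` and a msgid of `_Open`.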
Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)
I've just updated to polib 1.0.1, after sticking around with an 0.5.x version for a long time. Great job, D-J! Sorry to re-raise this old issue, today with a slightly different (and hopefully more convincing) phrasing.
polib mutilates valid escape sequences.
To wit, here is a simple test case:
#!python
bash> cat t.po
#
msgid ""
msgstr ""
msgid "unicode: \u00ae; octal: \141; hex: \x61; control: \b \f \v \a"
msgstr ""
bash> python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import polib
>>> polib.pofile("t.po").save()
>>> quit()
bash> cat t.po
#
msgid ""
msgstr ""
msgid "unicode: \\u00ae; octal: \\141; hex: \\x61; control: \\b \\f \\v \\a"
msgstr ""
bash>
All escape sequences unknown to polib (i.e. other than \t, \r and \n) get an additional '\' in front of them. This is particularly problematic for us in the case of unicode escapes, as they are frequently used to enter hard-to-type characters into msgids and msgstrs (like the "Registered" character in the sample).
The problem arises because polib unescapes strings on reads (which removes some '\', but leaves unknown sequences like '\u...' as they are) and escapes on writes/stringify (which unconditionally prefixes unknown escape sequences with another '\').
I thought a lot about it, but to keep a long story short, my resolution is to have polib leave unknown escape sequences untouched. We've run with this patch for a long time in several projects with good results. I will probably add more of my considerations as a separate comment.
Here is the pull request:
https://bitbucket.org/izi/polib/pull-request/8/removed-escaping-unescaping-of-unknown
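The idea of leaving unknown escape sequences untouched can be sketched as follows. This is not the code from the pull request; the function names and the exact set of handled sequences (`\n`, `\t`, `\r` and `\"`) are assumptions for illustration.

```python
import re

_UNESCAPE = {'n': '\n', 't': '\t', 'r': '\r', '"': '"'}
_ESCAPE = {'\n': '\\n', '\t': '\\t', '\r': '\\r', '"': '\\"'}


def unescape_known(text):
    """Decode only the known escapes; any other backslash pair is kept verbatim."""
    def replace(match):
        return _UNESCAPE.get(match.group(1), match.group(0))
    return re.sub(r'\\(.)', replace, text, flags=re.DOTALL)


def escape_known(text):
    """Encode only newline, tab, carriage return and the double quote."""
    return ''.join(_ESCAPE.get(char, char) for char in text)
```

Because backslashes that do not start a known sequence are never added or removed, a read followed by a write leaves strings such as `\u00ae` or `\x61` unchanged.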
Originally reported by: Olivier Olivier (Bitbucket: omansion, GitHub: omansion)
Here is the process to reproduce:
Originally reported by: Marat Valiev (Bitbucket: user2589, GitHub: user2589)
My code relies on the changeset 5b2fdb5a0a4c. https://github.com/user2589/django-rosetta
Can you please add a minor version/tag (e.g. 0.7.1) to this changeset or the last commit so I can add it to dependencies?
Originally reported by: Vinay Sajip (Bitbucket: vinay.sajip, GitHub: Unknown)
It seems redundant to be forced to pass autodetect_encoding=False every time you want to pass an encoding - if it is not passed as False, the encoding passed in is never used! It's fine to have the default value be True, but that default should be ignored if an actual encoding is passed, and auto-detection never performed when an encoding is passed in.
What would it mean to pass in an encoding //and// have auto-detection enabled?
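The requested precedence (an explicit encoding always wins, auto-detection only runs without one) can be sketched like this. `resolve_encoding` and its `detect` callback are hypothetical names, not polib's API.

```python
def resolve_encoding(encoding=None, autodetect=True, detect=None, default='utf-8'):
    """Pick the encoding to use when reading a file.

    encoding   -- explicit encoding from the caller, or None
    autodetect -- whether detection is allowed when no encoding is given
    detect     -- hypothetical callable returning a detected encoding or None
    """
    if encoding is not None:
        # An explicit encoding is never overridden by detection.
        return encoding
    if autodetect and detect is not None:
        detected = detect()
        if detected:
            return detected
    return default
```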
Originally reported by: Angel Abad (Bitbucket: angelabad, GitHub: angelabad)
Hi David, I'm the Debian polib maintainer. Please take a look at this bug in the Debian BTS.
Cheers,
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
This is how the PO parser splits flags:
self.current_token[3:].split(', ')
This is not correct; there is no requirement that the splitting comma is followed by a space.
Please see the attachment for a test-case. msgfmt considers the message fuzzy:
$ msgfmt -v messages.po
0 translated messages, 1 fuzzy translation.
But there are no fuzzy messages according to polib:
>>> p = polib.pofile('messages.po')
>>> len(p.fuzzy_entries())
0
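A more tolerant split would cut on the comma alone and strip whitespace afterwards. The sketch below is an illustration, with `parse_flags` as a hypothetical helper rather than polib code.

```python
def parse_flags(line):
    """Parse a '#,' flag line into a list of flag names."""
    if not line.startswith('#,'):
        raise ValueError('not a flag line: %r' % line)
    # Split on the comma only, then strip, so "a,b" and "a, b" behave alike.
    return [flag.strip() for flag in line[2:].split(',') if flag.strip()]
```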
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
The PO parser uses strings as msgstr_plural keys:
>>> msgid = '%(size)d byte'
>>> file = polib.pofile('tests/test_utf8.po')
>>> file.find(msgid).msgstr_plural.keys()
[u'1', u'0']
This is surprising, and unlike what MO parser does:
>>> file = polib.mofile('tests/test_utf8.mo')
>>> file.find(msgid).msgstr_plural.keys()
[0, 1]
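For consistency with the MO side, the index in `msgstr[N]` could be converted to an integer at parse time. A minimal sketch, assuming single-line plural entries; `collect_plurals` is a hypothetical name.

```python
import re

_PLURAL_RE = re.compile(r'^msgstr\[(\d+)\]\s+"(.*)"$')


def collect_plurals(lines):
    """Build a {plural index (int): translation} dict from msgstr[N] lines."""
    plurals = {}
    for line in lines:
        match = _PLURAL_RE.match(line.strip())
        if match is not None:
            plurals[int(match.group(1))] = match.group(2)
    return plurals
```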
Originally reported by: Anonymous
Traceback (most recent call last):
File "tests/tests.py", line 465, in test_save_as_mofile
self.assertEqual(s1, s2)
AssertionError: '\xde\x12\x04\x95\x00\x00\x00\x00\x00\x00\x02 ...
\x00vista:\x00semana\x00semanas\x00a\xc3\xb1o\x00a\xc3\xb1os\x00s\xc3\xad,no,tal vez\x00'
Ran 56 tests in 0.591s
FAILED (failures=1)
error: Bad exit status from /var/tmp/rpm-tmp.JqyRls (%check)
OS: openSUSE Factory
Build log: https://build.opensuse.org/package/live_build_log?arch=ppc64&package=python-polib&project=openSUSE%3AFactory%3APowerPC&repository=standard
Version:1.0.1
Originally reported by: David Planella (Bitbucket: dpm, GitHub: dpm)
I was trying to load a PO file from GNOME at http://l10n.gnome.org/POT/evolution.master/evolution.master.ca.po (attached), and I got the following error:
#!python
dpm@el-far:~$ ipython
In [1]: import polib
In [2]: po = polib.pofile('/home/dpm/evolution.master.ca.po')
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
/home/dpm/<ipython console> in <module>()
/home/dpm/polib.py in pofile(pofile, **kwargs)
100 file (optional, default: ``False``).
101 """
--> 102 return _pofile_or_mofile(pofile, 'pofile', **kwargs)
103
104 # }}}
/home/dpm/polib.py in _pofile_or_mofile(f, type, **kwargs)
71 check_for_duplicates=kwargs.get('check_for_duplicates', False)
72 )
---> 73 instance = parser.parse()
74 instance.wrapwidth = kwargs.get('wrapwidth', 78)
75 return instance
/home/dpm/polib.py in parse(self)
1261 else:
1262 raise IOError('Syntax error in po file %s (line %s)' % \
-> 1263 (self.instance.fpath, i))
1264
1265 if self.current_entry:
IOError: Syntax error in po file /home/dpm/evolution.master.ca.po (line 23110)
It seems polib is crashing on the following entry, in particular at the #~| msgid "" line:
#, fuzzy
#~| msgid ""
#~| "Error on %s\n"
#~| "%s"
#~ msgid ""
#~ "Error on %s: %s\n"
#~ "%s"
#~ msgstr ""
#~ "S'ha produït un error en %s:\n"
#~ "%s"
Looking at other files on the GNOME l10n site, I can see more instances of #~|. These seem to be generated automatically by a gettext tool (probably msgmerge) when marking "previous msgid" fuzzy entries as obsolete.
The documentation at http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files says nothing about the format of obsolete entries, so I understand that the docs leave a wee bit too much room for guessing in the implementation of a parser.
In any case, if it's generated by a gettext tool, it would be good if polib would either ignore #~| instances or treat them as obsolete entries instead of raising an exception.
Thanks!
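One way to avoid the exception is to classify lines by prefix, testing the longest prefixes first so that `#~|` is recognised before `#~` and `#`. The sketch below is an assumption about how such a classifier could look, not polib's implementation; the kind names are invented for illustration.

```python
# Longest prefixes first, so '#~|' is not mistaken for '#~' or '#'.
_PREFIXES = (
    ('#~|', 'obsolete-previous'),
    ('#~', 'obsolete'),
    ('#|', 'previous'),
    ('#.', 'extracted-comment'),
    ('#:', 'reference'),
    ('#,', 'flags'),
    ('#', 'translator-comment'),
)


def classify_line(line):
    """Return (kind, remainder) for a PO line based on its comment prefix."""
    stripped = line.strip()
    for prefix, kind in _PREFIXES:
        if stripped.startswith(prefix):
            return kind, stripped[len(prefix):].strip()
    return 'content', stripped
```

A parser built on this could then decide to skip `obsolete-previous` lines or to attach them to the obsolete entry.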
Originally reported by: Anonymous
such like
#!gettext
#. translators: reverse the order of these arguments
#. * if the kicked should come before the kicker in your locale.
#.
#: ../libempathy-gtk/empathy-chat.c:2729
#, c-format
msgid "%1$s was kicked by %2$s"
msgstr "%1$s 被 %2$s 踢出"
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
The MO file format specification reads:
A program seeing an unexpected major revision number should stop reading the MO file entirely
But polib doesn't pay attention to versions at all.
As a test-case I attached a MO file with a bogus major revision number. msgunfmt correctly rejects such a file:
$ msgunfmt messages.mo
msgunfmt: file "messages.mo" is not in GNU .mo format
Yet polib opens it happily:
#!python
>>> import polib
>>> polib.mofile('messages.mo')
[<polib.MOEntry object at 0xf6fdc14c>]
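A revision check on the MO header could be sketched as below. The revision word is assumed to carry the major number in its upper 16 bits and the minor number in its lower 16 bits, and major revisions 0 and 1 are assumed to be the acceptable ones; `check_mo_header` is a hypothetical helper, not part of polib.

```python
import struct

MAGIC = 0x950412de          # magic number as read from a little-endian file
MAGIC_SWAPPED = 0xde120495  # same magic when the file is big-endian


def check_mo_header(data):
    """Validate magic and revision of MO data; return (major, minor)."""
    if len(data) < 8:
        raise ValueError('data too short to be an MO file')
    magic = struct.unpack('<I', data[:4])[0]
    if magic == MAGIC:
        fmt = '<I'
    elif magic == MAGIC_SWAPPED:
        fmt = '>I'
    else:
        raise ValueError('invalid magic number')
    revision = struct.unpack(fmt, data[4:8])[0]
    major, minor = revision >> 16, revision & 0xffff
    if major not in (0, 1):  # assumption: only these major revisions are known
        raise ValueError('unexpected major revision number: %d' % major)
    return major, minor
```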
Originally reported by: Diego Búrigo Zacarão (Bitbucket: diegobz, GitHub: diegobz)
For instance:
#!python
import polib
po = polib.pofile("msgctxt.pot")
po.find('Shift')
<POEntry instance at 25d3490>
po.find('Shift').msgctxt
u'keyboard label'
It seems that there's no way to access (by finding) the other msgid='Shift' entry with the different msgctxt. Apparently only the first one found in the POT/PO is returned.
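A lookup that takes the context into account might look like this sketch. It works on a plain list of records rather than on polib objects; `Entry` and `find_entry` are hypothetical names.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'msgid msgstr msgctxt')


def find_entry(entries, msgid, msgctxt=None):
    """Return the first entry matching both msgid and msgctxt, or None."""
    for entry in entries:
        if entry.msgid == msgid and entry.msgctxt == msgctxt:
            return entry
    return None
```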
Originally reported by: Anonymous
If there is one like this:
#. translators:
#. * The number of sound outputs on a particular device
#: ../gnome-volume-control/src/gvc-mixer-control.c:1094
#, c-format
msgid "%u Output"
msgid_plural "%u Outputs"
msgstr[0] "%u 输出"
polib will not return an entry for msgid = '%u Outputs',
and for msgid = '%u Output' it returns msgstr = ''
Originally reported by: Sorin Sbarnea (Bitbucket: sorin, GitHub: sorin)
POFile.append() raises a duplicate exception when you try to add a new entry with the same msgid and a different msgctxt.
The PO specification says clearly: the unique key is (msgid, msgctxt) - it is perfectly valid to have duplicate msgid entries as long as they have different msgctxt.
Note, this happens only when you enable the check for duplicates, so that is where the bug is.
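A duplicate check keyed on the (msgctxt, msgid) pair could be sketched as follows. `Catalog` and `DuplicateEntryError` are hypothetical stand-ins for illustration and do not mirror polib's classes.

```python
class DuplicateEntryError(ValueError):
    """Raised when an entry with the same (msgctxt, msgid) already exists."""


class Catalog(object):
    def __init__(self):
        self._keys = set()
        self.entries = []

    def append(self, msgid, msgstr='', msgctxt=None):
        # The uniqueness key includes the context, not just the msgid.
        key = (msgctxt, msgid)
        if key in self._keys:
            raise DuplicateEntryError('duplicate entry: %r' % (key,))
        self._keys.add(key)
        self.entries.append({'msgid': msgid, 'msgstr': msgstr, 'msgctxt': msgctxt})
```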
Originally reported by: virtuo (Bitbucket: virtuo, GitHub: virtuo)
Hi,
I think there is a small bug in the to_binary method to write a mo file (http://bitbucket.org/izi/polib/src/78bebd0a7089/polib.py#cl-578)
Changing the code as such (offset of hash table) will generate the same MO file as msgfmt.
#!python
struct.pack("IIIIIII",
0x950412de, # Magic number
0, # Version
entries_len, # # of entries
7*4, # start of key index
7*4+entries_len*8, # start of value index
0, keystart) # size and offset of hash table
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
It would be nice if MOEntry instances had a flags attribute, always set to an empty list.
My software works with both PO and MO files, so keeping MOEntry and POEntry interfaces compatible would make my life slightly easier. :)
Originally reported by: Anonymous
Hello,
I was looking at this issue in django-rosetta, which uses polib:
http://code.google.com/p/django-rosetta/issues/detail?id=75
I tracked it down to a bug in polib that caused various str() methods to return unicode instead of str. Calling str() on such an object causes an exception, when str() tries to encode the string using the default ascii code page.
This bug is caused by polib's assumption that codecs.open() returns a generator of str objects. In fact, it generates unicode when you give codecs.open() an encoding parameter (this is the case in Python 2.5). The unicode type then gets propagated in string formatting and joining till it's returned by a str() method.
The quick fix would be to encode the string as it comes out of codecs.open(). Something like this:
(Ignore the line numbers, the version I'm using is the one shipped with django-rosetta.)
#!python
--- polib.py.old 2010-06-15 10:31:03.000000000 +0100
+++ polib.py 2010-06-15 12:20:40.000000000 +0100
@@ -1110,6 +1110,7 @@
"""
i, lastlen = 1, 0
for line in self.fhandle:
+ line = line.encode("utf8")
line = line.strip()
if line == '':
i = i+1
Rick S
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
If you run setup.py under C locale with Python 3, it fails with UnicodeDecodeError:
$ python3 setup.py build
Traceback (most recent call last):
File "setup.py", line 27, in <module>
''' % (open('README.rst').read(), open('CHANGELOG').read())
File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 648: ordinal not in range(128)
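The failure comes from `open()` falling back to the locale's encoding. A locale-independent read can be sketched with `io.open` and an explicit encoding; that the files are UTF-8 is an assumption here, and `read_text` is a hypothetical helper.

```python
import io


def read_text(path, encoding='utf-8'):
    """Read a text file with an explicit encoding, independent of the locale."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()
```

`io.open` behaves the same on Python 2.6+ and Python 3, so the same setup script could use it on both.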
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
I'd like to be able to detect header field duplicates in PO/MO files. For PO files, I can do it by monkey-patching polib code, but this technique doesn't work for the MO parser.
Could you perhaps move the header parser to a separate function? This way you could avoid code duplication, and make it possible to substitute the parser easily with a customized one. Thanks in advance.
Originally reported by: Tim Gerundt (Bitbucket: gerundt, GitHub: gerundt)
A translator can mark a translation as fuzzy if they think that it needs improvement. Unfortunately, POFile.merge() drops these fuzzy attributes from the PO file.
If I use Poedit (which uses the gettext utils in the background) to update a PO file from a POT file, it keeps my fuzzy attributes.
Greetings,
Tim Gerundt
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
The MO file parser assumes that the first entry in the MO file is the header entry. This is true but only if the header entry actually exists. If it doesn't, then polib parses the MO file incorrectly.
As a test case I attached a MO file that contains two translated strings, but no header:
$ msgunfmt no-header.mo
msgid "bar"
msgstr "rab"
msgid "foo"
msgstr "oof"
This is how polib parses the file:
>>> import polib
>>> print(polib.mofile('no-header.mo'))
msgid ""
msgstr "rab: \n"
msgid "foo"
msgstr "oof"
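A safer approach is to treat the first entry as the header only when its msgid is empty. The sketch below works on plain (msgid, msgstr) pairs; `split_header` is a hypothetical helper, not polib's parser.

```python
def split_header(entries):
    """Separate an optional header from a list of (msgid, msgstr) pairs.

    Returns (metadata dict, remaining entries). The first entry counts as
    the header only if its msgid is the empty string.
    """
    if entries and entries[0][0] == '':
        metadata = {}
        for line in entries[0][1].splitlines():
            if ':' in line:
                key, value = line.split(':', 1)
                metadata[key.strip()] = value.strip()
        return metadata, list(entries[1:])
    return {}, list(entries)
```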
Originally reported by: Sharuzzaman Ahmat Raslan (Bitbucket: sharuzzaman, GitHub: sharuzzaman)
I found out that the untranslated_entries() function also shows fuzzy messages. This is because the selection criteria for untranslated_entries() do not exclude fuzzy entries.
The patch below fixes the issue.
--- polib.py.old 2010-02-02 21:50:26.000000000 +0800
+++ polib.py 2010-02-02 21:52:58.000000000 +0800
@@ -679,7 +679,7 @@
>>> len(po.untranslated_entries())
6
"""
- return [e for e in self if not e.translated() and not e.obsolete]
+ return [e for e in self if not e.translated() and not e.obsolete and not 'fuzzy' in e.flags]
def fuzzy_entries(self):
"""
Originally reported by: Seraphim Mellos (Bitbucket: fim, GitHub: fim)
Hello,
I just found what appears to be a bug in polib which affects all versions after 0.6. In msgids, there are cases where trailing newlines are removed from the pofile when using the str() method to convert the loaded PO/POT back to text. Here is an example:
I have a POT file with a single PO entry containing this:
#################################################
#, python-format
msgid ""
"There was an error running your transaction for the following reason: %s\n"
msgstr ""
#################################################
However when I load it in the python interpreter I get these results:
#################################################
>>> for entry in pot: entry.msgid
u'There was an error running your transaction for the following reason: %s\n'
#################################################
which seems correct but:
#################################################
>>> print pot.__str__()
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: test\n"
"Report-Msgid-Bugs-To: [email protected]\n"
"POT-Creation-Date: 2011-02-10 11:42-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <[email protected]>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: \n"
"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"
#, python-format
msgid ""
"There was an error running your transaction for the following reason: "
"%s "
msgstr ""
#################################################
in which you can see that the trailing '\n' in the msgid is missing. I've tried it with 0.6 and 0.6.2 and both seem to be affected so it probably has to do with the changes from 0.5.5 to 0.6.
If you need any more info let me know.
Cheers
Originally reported by: Anonymous
Please fix the example:
entry.occurences = [('welcome.py', '12'), ('anotherfile.py', '34')]
Use "occurrences" instead
Originally reported by: Diego Búrigo Zacarão (Bitbucket: diegobz, GitHub: diegobz)
#!python
import polib
po = polib.pofile('tests/test_utf8.po')
po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")
<POEntry instance at dde810>
entry = po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")
entry.msgstr = '**' + entry.msgid + '**'
print entry.msgstr
**Ensure this value has at least %(min)d characters (it has %(length)d).**
print entry
#: newforms/fields.py:118
#, python-format
msgid "Ensure this value has at least %(min)d characters (it has %(length)d)."
msgstr ""
"Asegúrese de que su texto tiene al menos %(min)d caracteres (actualmente "
"tiene %(length)d)."
Here is a diff for testing it:
#!diff
diff -r ec920a7a1df8 polib.py
--- a/polib.py Sat Oct 02 12:09:45 2010 +0200
+++ b/polib.py Mon Oct 04 09:19:10 2010 -0300
@@ -112,6 +112,13 @@
True
>>> po.encoding == po_content.encoding
True
+ >>> po = polib.pofile('tests/test_utf8.po')
+ >>> entry = entry = po.find("Ensure this value has at least %(min)d characters (it has %(length)d).")
+ >>> entry.msgstr = entry.msgstr=entry.msgid + '**'
+ >>> '**' in entry.msgstr
+ True
+ >>> '**' in entry.__str__() # It's failing: ``print entry.__str__()`` to check the output. True is expected.
+ True
"""
if kwargs.get('autodetect_encoding', True) == True:
enc = detect_encoding(pofile)
It seems a pretty important bug, but I couldn't find a fix for it in the short amount of time. :/
Originally reported by: Azamat Hackimov (Bitbucket: winterheart, GitHub: winterheart)
Please provide a tarball for the new version or make a tag in the repository. I'm a distribution packager, so I need a tar.bz2 to create the package. Thank you.
Originally reported by: Seraphim Mellos (Bitbucket: fim, GitHub: fim)
Hello,
I just tried to test the latest release and it seems like it's breaking the ability to initialize a POFile directly from data instead of a file. In the documentation it seems like it should be supported, as in the previous version:
``pofile`` string, full or relative path to the po/pot file or its content (data)
It seems like this check is the reason it's breaking:
https://bitbucket.org/izi/polib/src/39cb2d39ba1a/polib.py#cl-39
Originally reported by: Anonymous
I saw that between 1.0.2 and 1.0.3 you fixed one of the os.path.exists() occurrences to handle the case the input might be unicode text rather than a filename.
However, there are a few other places where os.path.exists is called that weren't fixed, and one was causing us problems (when calling polib.pofile(contents)).
I've attached a patch against 1.0.3 that fixes this. I did some cursory testing to verify that passing in a unicode file to polib.pofile() works now, as does passing in a filename.
Originally reported by: Wagner Bruna (Bitbucket: wbruna, GitHub: wbruna)
Wrapping msgids and msgstrings doesn't seem to be working:
#!python
import polib
print 'version', polib.__version__
wrapwidth = 20
print 'wrapwidth is', wrapwidth
msg = 'a message that should be wrapped'
print 'msgid length is', len(msg)
trn = 'a translation that should be wrapped'
print 'msgstr length is', len(trn)
po = polib.POFile(wrapwidth=wrapwidth)
po.append(polib.POEntry(msgid=msg, msgstr=trn))
print 'po file:'
print str(po)
The output is:
version 0.5.2
wrapwidth is 20
msgid length is 32
msgstr length is 36
po file:
#
msgid ""
msgstr ""
msgid "a message that should be wrapped"
msgstr "a translation that should be wrapped"
Originally reported by: Sorin Sbarnea (Bitbucket: sorin, GitHub: sorin)
Please add __init__.py to the root of the hg repository so we can clone the repository directly and just import the library and use it.
Originally reported by: Anonymous
po = polib.pofile(fp)
There were some missing strings, which turned out to have embedded double quotes in the string.
Here is an example:
"Sufficient space is not available to import exams from device on "%s"."
Originally reported by: Angel Abad (Bitbucket: angelabad, GitHub: angelabad)
Hi Izi, I'm the Debian/Ubuntu polib maintainer. This bug was filed in the Ubuntu bug tracker:
If the po file contains an empty comment line, for example:
#. The next line is commented
#.
#. This also
The parser raises an IO exception.
Please see:
And patch:
If you think the patch is correct, I can apply it in my packages before you release a fixed version.
Cheers!
Originally reported by: Angel Abad (Bitbucket: angelabad, GitHub: angelabad)
Hi David, Debian maintainer here again. We received this bug in the Debian BTS, please take a look:
You can add comments to this bug report, writing emails to [email protected].
Cheers,
Originally reported by: Apostolis Bessas (Bitbucket: mpessas, GitHub: mpessas)
Hi,
polib v0.7 has an issue with pofiles which have empty comments.
It does not recognise them as comments, but it raises an IOError instead.
See for example the attached file, or http://trac.transifex.org/ticket/790.
Thanks,
Apostolis
Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)
David,
the longer I think about it the more I get the feeling I would like to have polib leave the entry strings unchanged, i.e. not apply unescape (on reads) and escape (on writes) to them. The main use case for gettext (and polib, for that matter) is handling strings that come from files - either source files or .po files. In those cases it would be easiest if the strings were left untouched, as they are. You read strings in and write them out, but you usually never need them in a "parsed" representation; they are just raw data, and that's fine. For example, the current implementation forces me to apply a polib.unescape() to a string collected from a source file, in order to use it properly as a POEntry.msgid.
The only exception from the general use case I can think of is when you want to bring Python string literals into the game (i.e. strings not read from files); and as polib is a Python module there might be occasions where you want to use it in this way. But then all you have to do is to use Python raw strings r'...' to populate POEntries. That would be the only concession for this use case, and an easy one at that, as it matches the po file format description (see gettext) which mandates C-style escapes.
What do you think?
Originally reported by: Anonymous
The default gettext tools keep the space at the end of a line when wrapping, i.e. like this:
msgstr "This is a long message that will be wrapped "
"into multiple lines."
While polib (thanks to textwrap) will put those spaces at the beginning of the next line:
msgstr "This is a long message that will be wrapped"
" into multiple lines."
Although both technically work, it means that when you run a script based on polib you get a massive diff. And then you run msgmerge to update from the updated pot-file, and the same thing happens again... :-)
I think it would be good if the default wrapping behavior was the same as the gettext tools' default formatting.
The workaround is of course to run msgmerge after running your custom script, but before committing. But it's still kinda annoying...
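Wrapping that keeps the space at the end of a line can be sketched with a simple greedy fill. This is not gettext's actual algorithm, and the width accounting is simplified (quotes and the keyword are not counted); `wrap_keep_trailing_space` is a hypothetical name.

```python
import re


def wrap_keep_trailing_space(text, width):
    """Greedy wrap where each word keeps its trailing whitespace."""
    # Each chunk is a word plus the whitespace after it; leading
    # whitespace, if any, becomes a chunk of its own.
    chunks = re.findall(r'\S+\s*|\s+', text)
    lines, current = [], ''
    for chunk in chunks:
        if current and len(current) + len(chunk) > width:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines
```

Since no character is dropped or moved across a word, joining the returned lines always gives back the original string.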
Originally reported by: Sharuzzaman Ahmat Raslan (Bitbucket: sharuzzaman, GitHub: sharuzzaman)
In http://www.gnu.org/software/hello/manual/gettext/PO-Files.html
there is one type of comment:
#| msgid previous-untranslated-string
In the page, it shows that "Comment lines starting with #| contain the previous untranslated string for which the translator gave a translation. "
KDE is using this comment. While manipulating a KDE po file, I found out that polib removes this comment, because it was not read in the first place.
Please add the ability to read and write back this comment in polib.
A test Python script and input and output po files demonstrating the case are attached.
Thanks.
Originally reported by: Transifex Sysadmin (Bitbucket: indifex, GitHub: Unknown)
msgfmt has a --check option which checks for things that "break" po files (and builds of software using these PO files). One example is a different number of %s's between msgid and msgstr.
http://www.gnu.org/software/hello/manual/gettext/msgfmt-Invocation.html
It'd be cool if polib could check for these things. It's really important for Transifex, since a translator using our web translator could lose all his work this way.
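A very rough version of one such check, comparing printf-style directives between msgid and msgstr, is sketched below. It covers only a subset of C format directives and is far less thorough than `msgfmt --check`; the regular expression and the function names are assumptions for illustration.

```python
import re

# Subset of printf syntax: optional position, flags, width, precision,
# length modifier and a conversion character. '%%' is matched too and
# filtered out afterwards.
_DIRECTIVE_RE = re.compile(
    r'%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfFgGcsp%]'
)


def directives(text):
    """Return the sorted list of format directives in text, ignoring '%%'."""
    found = _DIRECTIVE_RE.findall(text)
    return sorted(d for d in found if not d.endswith('%'))


def format_mismatch(msgid, msgstr):
    """True if a non-empty msgstr uses different directives than msgid."""
    if not msgstr:
        return False  # untranslated entries are not checked
    return directives(msgid) != directives(msgstr)
```

Sorting the directives means reordered positional arguments such as `%2$s ... %1$s` are not reported, which matches the reordering that translators are expected to do.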
Originally reported by: Frank Smit (Bitbucket: fsx, GitHub: fsx)
I would like to read/parse MO data from strings (when the source is the database and not a file), but after a quick test I discovered it's not possible. polib raises some exceptions, because it sees null bytes in the byte string.
Instead of accepting strings with the PO data, or a file path, accept a StringIO/BytesIO object or a string with the file path. This way the PO/MO data can be distinguished from a filepath.
I can make a pull request for this.
What do you think of this?
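The proposed distinction between file-like objects and paths could be sketched like this. `open_binary_source` is a hypothetical helper and does not reflect polib's API.

```python
import io


def open_binary_source(source):
    """Return (file object, should_close) for a path or a file-like object.

    Objects with a read() method (e.g. io.BytesIO) are used as they are;
    anything else is treated as a path and opened in binary mode.
    """
    if hasattr(source, 'read'):
        return source, False
    return io.open(source, 'rb'), True
```

With this split, MO data loaded from a database can be wrapped in `io.BytesIO(data)` and is never mistaken for a file name.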
Originally reported by: Rémy HUBSCHER (Bitbucket: natim, GitHub: natim)
Please make a python sdist upload so that we've got the package on PyPI and we can add it automatically to our mirror.
Originally reported by: defaultwombat (Bitbucket: defaultwombat, GitHub: Unknown)
The textwrap module of python 2.5 doesn't know the keyword "drop_whitespace" used in the method _BaseEntry._str_field
Originally reported by: Anonymous
It would be so good if the developer could make this useful library available for python 3.
thanks for great work
Originally reported by: markahern (Bitbucket: markahern, GitHub: markahern)
Here is how to reproduce:
#!python
Python 2.7.3 (default, May 19 2013, 04:22:38)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> po_text = ur"""
... msgid "áéíóú"
... msgstr ""
... """
>>> po_text
u'\nmsgid "\xe1\xe9\xed\xf3\xfa"\nmsgstr ""\n'
>>> import polib
>>> po_file = polib.pofile(po_text.encode('utf-8'), encoding=('utf-8'))
>>> print unicode(po_file)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "polib.py", line 602, in __unicode__
return ret + _BaseFile.__unicode__(self)
File "polib.py", line 297, in __unicode__
ret.append(entry.__unicode__(self.wrapwidth))
File "polib.py", line 988, in __unicode__
ret.append(_BaseEntry.__unicode__(self, wrapwidth))
File "polib.py", line 832, in __unicode__
ret = u('\n').join(ret)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
It's very possible I'm using it wrong, but seems like a bug to me?
Originally reported by: David Evans (Bitbucket: dave_e, GitHub: Unknown)
Running easy_install polib currently doesn't work, as the current version is listed as 0.5.4, which isn't available in the download section here.
The workaround is to run easy_install polib==0.5.3, which does exist.
Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)
Hi David,
long time. I've just upgraded to polib 0.5.1. An issue I've been fighting with since 0.4 is the escaping and unescaping of \ (backslash) (I once wrote you an email about it).
Currently, polib adds an additional \ before another \ (provided that it is not \n, \t, etc.). See the attached escapes.po file and run it through 'pofile("escapes.po").save(fpath="out.po")'.
This means that both msgids and msgstrs cannot make use of legal Unicode escape sequences, such as \u00AE, as they are turned into \\u00AE. This is a problem in environments that have no other means to enter and manipulate such special characters than through their escape sequence.
What do you think about it?
Thomas
Originally reported by: Anonymous
I have the following problem with Transifex (which claims that the root cause is polib).
I upload translations, make no changes, and download them back. What I get back is very often reformatted. E.g.:
#: ../src/up2date_client/rhnreg_constants.py:48 ../data/rh_register.glade.h:34
msgid ""
-"Access to the technical support experts at Red Hat or Red Hat's partners for "
-"help with any issues you might encounter with this system."
+"Access to the technical support experts at Red Hat or Red Hat's partners for"
+" help with any issues you might encounter with this system."
Note the space after "for" at the end of line.
This makes the diff in our git repo bigger for no reason. Glezos claims that the cause is polib (which Transifex uses). I could not verify this claim.
Originally reported by: Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk)
polib does this in a few places:
#!python
try:
    do something
except:
    handle exceptions
This is a bad idea, because it catches all exceptions, even those you're not expecting, e.g. KeyboardInterrupt or manifestations of bugs. See also: http://docs.python.org/2/howto/doanddont.html#except
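The usual alternative is to name the exception that is actually expected. The function below is a made-up example, not code from polib.

```python
def parse_line_number(text, default=None):
    """Convert text to an int, falling back to default on malformed input."""
    try:
        return int(text)
    except ValueError:
        # Only the failure expected from int() on bad text is handled;
        # KeyboardInterrupt, TypeError and genuine bugs still propagate.
        return default
```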