Comments (5)
The output of mdls isn't encoded text. It's a file format (the old NeXT plist format, by the looks of it), more akin to JSON.
Adding something like this to Workflow.decode() is inappropriate. Its job is to decode encoded text and normalise text, not parse file formats.
Sure, it would do some useful "magic" if the text just happens to be mdls output, but what if it's just a normal string that happens to have \U in it? It'd basically be breaking decode() to handle one weird edge case.
from alfred-workflow.
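To make the distinction in the comment above concrete, here is a minimal sketch of what "decode encoded text and normalise text" means. It is written for Python 3 as an assumption (the thread itself targets Python 2), and the standalone decode() below is a hypothetical stand-in, not the library's actual method.

```python
import unicodedata


def decode(data, encoding="utf-8", normalization="NFC"):
    """Hypothetical stand-in: decode bytes to str, then normalise."""
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return unicodedata.normalize(normalization, data)


# Real encoded text: "o" followed by U+0304 COMBINING MACRON, as UTF-8 bytes.
# NFC normalisation composes the pair into the single character U+014D.
composed = decode(b"To\xcc\x84ny")

# mdls-style text: the backslash escape is six plain ASCII characters,
# so decoding and normalising leave it exactly as it was.
literal = decode(b"To\\U0304ny")

print(repr(composed))
print(repr(literal))
```

The second call shows why the escape survives: from the decoder's point of view there is nothing encoded in it, only ASCII characters that happen to spell an escape sequence.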
The distinction between encoded text and file format is well taken, as is the point about breaking simple strings with \U in them. My point was merely that decode() does not decode the Unicode text gotten from subprocess in this edge case. As you state in the docs, subprocess is a module that will return strings that require some massaging to get into Pythonic form. I have no idea which OS X CLIs return old NeXT plist formats, but when someone writes a workflow using Alfred-Workflow, they won't have any idea what the hell is going on. And the solution is to .decode() the text, so it doesn't seem totally inappropriate to add this functionality to Workflow.decode().
Obviously, it's your package, so do as you see fit, and this will be my last-ditch effort. I've altered the function so that it handles strings with "\U" randomly in them. It will only use .decode('unicode-escape') if it finds a "\U" followed by 3 or more digits:
    import re
    import unicodedata

    def decode(text, encoding='utf-8', normalization='NFC'):
        """Return ``text`` as normalised unicode.

        :param text: string
        :type text: encoded or Unicode string. If ``text`` is already a
            Unicode string, it will only be normalised.
        :param encoding: The text encoding to use to decode ``text`` to
            Unicode.
        :type encoding: ``unicode`` or ``None``
        :param normalization: The normalisation form to apply to ``text``.
        :type normalization: ``unicode`` or ``None``
        :returns: decoded and normalised ``unicode``

        """
        # convert string to Unicode
        if isinstance(text, basestring):
            if not isinstance(text, unicode):
                text = unicode(text, encoding)
        # decode Cocoa/CoreFoundation Unicode to Python Unicode
        if re.search(r'\\U\d{3,}', text):
            text = text.replace('\\U', '\\u').decode('unicode-escape')
        return unicodedata.normalize(normalization, text)
This seems to me sufficiently safe, such that there is no downside to adding the functionality. But, as I say, I will drop it at this.
from alfred-workflow.
> My point was merely that decode() does not decode the Unicode text gotten from subprocess in this edge case

Yes, it does decode the text. 'To\U0304ny\U0308 Sta\U030ark' contains representations of Unicode codepoints within an encoded string, just as writing mystring = u'To\u0304ny' in your Python source code is a Unicode representation within an ASCII- or UTF-8-encoded text file.
What you're suggesting is hardcoding a second, mdls codec and corresponding decoding step in decode(). If you want to decode mdls output, then you should be specifying something other than utf-8 as the encoding, not hardcoding your mdls codec into decode().

> This seems to me sufficiently safe, such that there is no downside to adding the functionality.

It's not safe, it's broken.
Let's look at your Stack Overflow question as an example. Say you're using some hypothetical Stack Overflow command line tool via subprocess to grab your post and its comments, and then you run the output through decode() (as you should because it might contain non-ASCII characters).
Your question:

> I'm having issues reading Unicode text from the shell into Python. I have a test document with the following metadata attribute:
> kMDItemAuthors = ( "To\U0304ny\U0308 Sta\U030ark" )

Current decode() returns:

> I'm having issues reading Unicode text from the shell into Python. I have a test document with the following metadata attribute:
> kMDItemAuthors = ( "To\U0304ny\U0308 Sta\U030ark" )

Your modified decode() returns:

> I'm having issues reading Unicode text from the shell into Python. I have a test document with the following metadata attribute:
> kMDItemAuthors = ( "Tōnÿ Stårk" )

Do you see the problem?
from alfred-workflow.
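The failure described in the comment above can be reproduced with a rough Python 3 port of the proposed function. This is a sketch under assumptions: modified_decode() is a hypothetical name, and the encode("ascii", "backslashreplace") step stands in for the implicit ASCII encoding that Python 2 performs when .decode() is called on a unicode object.

```python
import re
import unicodedata


def modified_decode(text, encoding="utf-8", normalization="NFC"):
    """Rough Python 3 port of the proposed decode() (hypothetical name)."""
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # Proposed extra step: treat "\U" followed by 3+ digits as an escape.
    if re.search(r"\\U\d{3,}", text):
        text = (
            text.replace("\\U", "\\u")
            .encode("ascii", "backslashreplace")
            .decode("unicode-escape")
        )
    return unicodedata.normalize(normalization, text)


# Text that merely quotes mdls output, as in the Stack Overflow question.
quoted = 'kMDItemAuthors = ( "To\\U0304ny\\U0308 Sta\\U030ark" )'
result = modified_decode(quoted)

print(quoted)
print(result)
```

The input is already correct text about an escape format, yet the function rewrites it, so the reader no longer sees the \U sequences the author was asking about.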
Point taken.
from alfred-workflow.
Seeing as mdls apparently only outputs ASCII, you might want to look into adding a codec for mdls's Unicode escape format, which you could then pass to decode().
That might be total overkill, though.
from alfred-workflow.
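One way the codec suggestion could look is sketched below in Python 3, using the standard codecs.register() hook. Everything specific here is an assumption: the codec name "mdls", the helper names, and the guess that every escape is \U followed by exactly four hexadecimal digits (which is all the examples in this thread show). It is decode-only and does not try to handle escaped backslashes or characters outside the Basic Multilingual Plane.

```python
import codecs
import re
import unicodedata

# Assumption: escapes are "\U" plus exactly four hex digits, e.g. \U030a.
_ESCAPE = re.compile(r"\\U([0-9a-fA-F]{4})")


def mdls_decode(data, errors="strict"):
    """Decode ASCII bytes, expanding \\Uxxxx escapes into characters."""
    text = bytes(data).decode("ascii", errors)
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return decoded, len(data)


def mdls_encode(text, errors="strict"):
    """Encoding is out of scope for this decode-only sketch."""
    raise NotImplementedError("the hypothetical 'mdls' codec is decode-only")


def _search(name):
    """Codec search function: answer only for the hypothetical 'mdls' name."""
    if name == "mdls":
        return codecs.CodecInfo(
            name="mdls", encode=mdls_encode, decode=mdls_decode
        )
    return None


codecs.register(_search)

raw = b'kMDItemAuthors = ( "To\\U0304ny\\U0308 Sta\\U030ark" )'
text = raw.decode("mdls")                       # escapes become code points
normalised = unicodedata.normalize("NFC", text)  # then compose as usual

print(normalised)
```

With this arrangement the caller opts in by naming the encoding explicitly, so ordinary UTF-8 text that happens to contain \U is never touched.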