Unicode path handling (objection issue, closed)

sensepost commented on May 18, 2024
Unicode path handling

from objection.

Comments (7)

leonjza commented on May 18, 2024

Thanks. I imagine this is going to be a problem with certain locales as well. Is this IPA one you can share with me?

FlavSec commented on May 18, 2024

Sure! It's just a quick test app I threw together in Xcode. It should be attached to this comment.
unicode.ipa.zip

FlavSec commented on May 18, 2024

So I jumped down the rabbit hole - it's an issue with zipfile.ZipFile() guessing the wrong encoding for the extracted filenames. We'll see how GitHub handles the output below; hopefully it's legible:

NetSPIs-MacBook-Pro:test netspi$ unzip unicode.ipa
Archive:  unicode.ipa
<snip>
NetSPIs-MacBook-Pro:test netspi$ ls
Payload
NetSPIs-MacBook-Pro:test netspi$ cd Payload/
NetSPIs-MacBook-Pro:Payload netspi$ ls
Test®†µ˚¬.app # <- Correct name of app bundle, as set in Xcode.
NetSPIs-MacBook-Pro:test netspi$
NetSPIs-MacBook-Pro:test netspi$ python3
Python 3.6.2 (v3.6.2:5fd33b5926, Jul 16 2017, 20:11:06) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> file = zipfile.ZipFile('/Users/netspi/Desktop/test/unicode.ipa','r')
>>> file
<zipfile.ZipFile filename='/Users/netspi/Desktop/test/unicode.ipa' mode='r'>
>>> file.namelist()[1]
'Payload/Test┬«ΓÇá┬╡╦Ü┬¼.app/' # <- WRONG representation of filename
>>> file.namelist()[1].encode('cp437') # <- The deep corners of the internet suggested code page 437 is commonly used as a default encoding with the zip format
b'Payload/Test\xc2\xae\xe2\x80\xa0\xc2\xb5\xcb\x9a\xc2\xac.app/' #
>>> file.namelist()[1].encode('cp437').decode('utf-8')
'Payload/Test®†µ˚¬.app/' # <- Sure enough, reversing the cp437 decoding and re-decoding as UTF-8 fixes the problem.

So! I found a few Python issue tickets (such as https://bugs.python.org/issue28080), but it doesn't look like there's a simple way to programmatically guess the filename encoding used inside a zip file.
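For reference, here is a minimal sketch of the round trip from the session above, wrapped in a helper. It assumes the only two cases we care about are "UTF-8 flag set" and "UTF-8 bytes with the flag missing"; the name recover_name is made up for illustration and is not part of objection or zipfile.

```python
import zipfile

# General-purpose bit 11 in a zip entry header means "filename is UTF-8".
UTF8_FLAG = 0x800


def recover_name(info: zipfile.ZipInfo) -> str:
    """Hypothetical helper: undo a cp437 decode of bytes that were really UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        # zipfile already decoded this name as UTF-8, nothing to fix.
        return info.filename
    try:
        # Reverse zipfile's cp437 decode, then try the bytes as UTF-8.
        return info.filename.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The bytes were not valid UTF-8, so keep zipfile's original guess.
        return info.filename


# Simulate what zipfile hands back for an entry whose flag is not set.
original = "Payload/Test®†µ˚¬.app/"
mangled = original.encode("utf-8").decode("cp437")
entry = zipfile.ZipInfo(mangled)  # flag_bits defaults to 0
print(recover_name(entry) == original)
```

This only helps when the raw bytes happen to be valid UTF-8; names in other legacy encodings fall through unchanged. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument in read mode, which may make this kind of workaround unnecessary.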

The errors I initially posted in this issue stem from the fact that we use

self.app_binary = os.path.join(self.app_folder, info_plist['CFBundleExecutable'])

which gets the properly encoded name from the plist file, regardless of how the actual filename is decoded from the zip file. If the filename decodes properly from the zip file, this will work. If it doesn't, the binary referenced by the path above won't be present in the unzipped archive. Additionally, re-packaging an IPA with incorrectly encoded filenames will cause a similar mismatch when a user tries to install or run the modified IPA.

I'm not sure of the best way to proceed here. Compare the filename in the plist with the extracted filename, perhaps, and use that as a way to determine valid encodings?
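A rough sketch of that idea, assuming the CFBundleExecutable value shows up as a path component somewhere in the archive. The function name, the candidate list, and the NFC normalization step are all assumptions for illustration, not anything objection does today.

```python
import unicodedata
from typing import Iterable, Optional, Sequence


def guess_name_encoding(
    zip_names: Iterable[str],
    expected_component: str,
    candidates: Sequence[str] = ("cp437", "utf-8"),
) -> Optional[str]:
    """Hypothetical helper: find an encoding under which the plist name appears.

    zip_names are names as returned by ZipFile.namelist() for entries without
    the UTF-8 flag, i.e. raw bytes that zipfile decoded as cp437.
    """
    names = list(zip_names)
    # Assumption: compare in NFC so composed/decomposed forms still match.
    expected = unicodedata.normalize("NFC", expected_component)
    for encoding in candidates:
        for name in names:
            try:
                # Recover the raw bytes, then decode with the candidate.
                fixed = name.encode("cp437").decode(encoding)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue
            parts = unicodedata.normalize("NFC", fixed).split("/")
            if expected in parts:
                return encoding
    return None


good = "Payload/Test®†µ˚¬.app/Test®†µ˚¬"
mangled = good.encode("utf-8").decode("cp437")
print(guess_name_encoding(["Payload/", mangled], "Test®†µ˚¬"))
```

If nothing matches, the function returns None, and the caller would still need some fallback (for example, shelling out to unzip).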

leonjza commented on May 18, 2024

Thanks for looking into this. You rock! 🤘
I guess we can always fall back to using a delegator and unzip combo if that is going to make life easier.

I will also try a few things to see if I can find a way to handle unicode better.

pachoo commented on May 18, 2024

@leonjza I fixed this in a local branch by using the system's unzip instead of the zipfile module to extract the IPA. In my case, the problem was that the IPA had UTF-8 filenames in it, but the zip flag for that wasn't set for those entries, so zipfile decoded/munged the pathname as cp437 (e.g. https://github.com/python/cpython/blob/master/Lib/zipfile.py#L1305 ).
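Something along these lines is one way to shell out to unzip from Python. This is only a sketch: the function names are made up, it assumes an Info-ZIP style unzip is on PATH, and how that binary decodes unflagged names depends on the platform and build.

```python
import shutil
import subprocess
from typing import List


def unzip_command(ipa_path: str, dest_dir: str) -> List[str]:
    """Hypothetical helper: build the argument list for the system unzip."""
    # -q: quiet, -o: overwrite without prompting, -d: extraction directory
    return ["unzip", "-q", "-o", ipa_path, "-d", dest_dir]


def extract_with_system_unzip(ipa_path: str, dest_dir: str) -> None:
    """Extract an IPA with the system unzip instead of the zipfile module."""
    if shutil.which("unzip") is None:
        raise RuntimeError("unzip was not found on PATH")
    # check=True raises CalledProcessError if unzip exits non-zero.
    subprocess.run(unzip_command(ipa_path, dest_dir), check=True)


print(unzip_command("unicode.ipa", "extracted"))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with unusual characters in the IPA path itself.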

leonjza commented on May 18, 2024

Excellent @pachoo. Do you want to create a PR for those changes?

leonjza commented on May 18, 2024

Closing due to age.
