ponty / pyunpack
unpack archive files in python
License: BSD 2-Clause "Simplified" License
OS version: Windows 10 20H2
Python version: Python 3.9.5, and Anaconda3 2021.5 with Python 3.8.8 (64-bit)
Requirement patool-1.12: already satisfied
MWE:
import os
import sys
from pyunpack import Archive
path_to_file = 'xxx'
target_dir = 'xxx'
Archive(path_to_file).extractall(target_dir)
Problem: the program hangs at the last line and never extracts the compressed files. The archive is a *.tar.xz file of only 69350 KiB.
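For a .tar.xz archive specifically, one possible workaround is to bypass the external backend and use Python's standard-library tarfile module, which reads xz-compressed tarballs directly. This is only a sketch of that idea, not a description of what pyunpack does; the function name extract_tar_xz is made up for illustration.

```python
import tarfile
from pathlib import Path


def extract_tar_xz(archive_path, target_dir):
    """Extract an xz-compressed tarball using only the standard library."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # "r:xz" tells tarfile to expect xz/LZMA compression.
    with tarfile.open(archive_path, mode="r:xz") as tar:
        # Only unpack archives you trust: member names are not checked here.
        tar.extractall(path=target)
    # Return the names of the top-level entries that now exist in target.
    return sorted(p.name for p in target.iterdir())
```

If this works where Archive(...).extractall(...) hangs, the problem is more likely in the external tool invocation than in the archive itself.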
Prerequisite: patool!
The current release (version 0.3) does not include the __init__.py fix from GitHub that adds support for the patool.exe path:
#20
Thanks!
I'm writing a python script to process some old archives that include many .rar files. I can successfully extract data from them using Archive.extractall from pyunpack. However, occasionally I will hit a password protected file, and the program just hangs. I have the code in a try block, but it's not throwing any exception. Would it be possible to throw an exception in this circumstance so my script can skip the file and continue?
Here's the relevant portion of my code:
import os
import rarfile
from pyunpack import Archive
for subdir, dirs, files in os.walk(directoryToExtract):
    for file in files:
        if rarfile.is_rarfile(os.path.join(subdir, file)):
            print("Found rarfile " + os.path.join(subdir, file))
            try:
                Archive(os.path.join(subdir, file)).extractall(os.path.join(subdir, file[0:len(file)-4]), auto_create_dir=True)
            except Exception as e:
                print("ERROR: Couldn't unrar " + os.path.join(subdir, file))
                print(str(e))
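One way to turn a hang into an exception is to run the extraction tool in a subprocess with a timeout. The sketch below uses only the standard library; ExtractionTimeout, run_with_timeout and patool_extract_command are hypothetical names, and the patool arguments simply mirror the command line quoted elsewhere on this page. It assumes a patool executable is on PATH.

```python
import subprocess


class ExtractionTimeout(Exception):
    """Raised when the external command does not finish in time."""


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; raise ExtractionTimeout instead of blocking forever."""
    try:
        return subprocess.run(
            cmd,
            # stdin reads from the null device, so a prompt gets EOF
            # instead of waiting for keyboard input.
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as exc:
        raise ExtractionTimeout(
            "gave up after %s s: %r" % (timeout_seconds, cmd)
        ) from exc


def patool_extract_command(archive, outdir):
    """Build a patool command line (assumes patool is on PATH)."""
    return ["patool", "--non-interactive", "extract", archive, "--outdir=" + outdir]
```

Inside the loop above, run_with_timeout(patool_extract_command(src, dest), 60) could replace the Archive call, with ExtractionTimeout caught alongside other errors so the script skips the file and continues.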
Most archive formats support protecting the archive with a password.
It would be great if pyunpack could extract files with a given password.
NB: I have already asked the same question on Stack Overflow here:
https://stackoverflow.com/questions/60655910/use-pyunpack-inside-an-executable-file-made-with-pyinstaller-in-combination-with
I see strange behaviour from pyunpack, a package for unpacking archives, inside an executable.
I want to do the following:
I have a 7z file whose extension is .sent rather than .7z.
First I try to unzip it directly, which leads to an expected error that is caught.
Inside the error handler, I first add the .7z extension, then unzip the file properly into a folder called "grog", then give the zipped file its original name back.
Here is the code below:
# test.py
from os.path import abspath, join, exists, dirname
from os import rename, mkdir
from shutil import copy
import multiprocessing
import pyunpack
multiprocessing.freeze_support()
print(0)
name = "file_to_be_unzipped.sent"
print("a")
path = "C:\\Users\\myname\\eclipse-workspace-tms\\test_unzip_exe"
print(abspath("."))
print("b")
unzip_dest = join(path, "grog")
if not exists(unzip_dest):
    mkdir(unzip_dest)
print("c")
name = join(path, name)
print("d")
print("e")
try:
    print(1)
    pyunpack.Archive(name).extractall(unzip_dest)
    print(2)
except pyunpack.PatoolError as pterr:
    print(3)
    temp_f_name = name + ".7z"
    print(4)
    rename(name, temp_f_name)
    try:
        print(5)
        pyunpack.Archive(temp_f_name).extractall(unzip_dest)
        print(6)
        rename(temp_f_name, name)
        print(7)
    except pyunpack.PatoolError as pterr2:
        # removing useless 7z extension
        print(8)
        rename(temp_f_name, name)
        print(9)
        # Case when the file is already unzipped
        # ("in" is used because str.find() returns -1, which is truthy, when there is no match)
        if "Is not archive" in str(pterr2):
            print(10)
            copy(name, unzip_dest)
            print(11)
        print(12)
    except ValueError as v:
        print(13)
        print(v)
        print(14)
When I launch the script test.py, I get the expected behaviour:
0
a
C:\Users\myname\eclipse-workspace-tms\test_unzip_exe
b
c
d
e
1
3
4
5
6
7
then I build the executable with the following command line:
pyinstaller --log-level=DEBUG test.spec
and the following spec file:
# -*- mode: python ; coding: utf-8 -*-
block_cipher = None
import pyunpack
import patoolib
from pyunpack import Archive, PatoolError
from patoolib.programs import ar
from patoolib.programs import arc
from patoolib.programs import archmage
from patoolib.programs import arj
from patoolib.programs import bsdcpio
from patoolib.programs import bsdtar
from patoolib.programs import bzip2
from patoolib.programs import cabextract
from patoolib.programs import chmlib
from patoolib.programs import clzip
from patoolib.programs import compress
from patoolib.programs import cpio
from patoolib.programs import dpkg
from patoolib.programs import flac
from patoolib.programs import genisoimage
from patoolib.programs import gzip
from patoolib.programs import isoinfo
from patoolib.programs import lbzip2
from patoolib.programs import lcab
from patoolib.programs import lha
from patoolib.programs import lhasa
from patoolib.programs import lrzip
from patoolib.programs import lzip
from patoolib.programs import lzma
from patoolib.programs import lzop
from patoolib.programs import mac
from patoolib.programs import nomarch
from patoolib.programs import p7azip
from patoolib.programs import p7rzip
from patoolib.programs import p7zip
from patoolib.programs import pbzip2
from patoolib.programs import pdlzip
from patoolib.programs import pigz
from patoolib.programs import plzip
from patoolib.programs import py_bz2
from patoolib.programs import py_echo
from patoolib.programs import py_gzip
from patoolib.programs import py_lzma
from patoolib.programs import py_tarfile
from patoolib.programs import py_zipfile
from patoolib.programs import rar
from patoolib.programs import rpm
from patoolib.programs import rpm2cpio
from patoolib.programs import rzip
from patoolib.programs import shar
from patoolib.programs import shorten
from patoolib.programs import star
from patoolib.programs import tar
from patoolib.programs import unace
from patoolib.programs import unadf
from patoolib.programs import unalz
from patoolib.programs import uncompress
from patoolib.programs import unrar
from patoolib.programs import unshar
from patoolib.programs import unzip
from patoolib.programs import xdms
from patoolib.programs import xz
from patoolib.programs import zip
from patoolib.programs import zoo
from patoolib.programs import zopfli
from patoolib.programs import zpaq
# from pyunpack import Archive, PatoolError
a = Analysis(['test.py'],
             pathex=['C:\\Users\\myname\\eclipse-workspace-tms\\test_unzip_exe'],
             binaries=[],
             datas=[],
             hiddenimports=['pyunpack', 'patoolib',
                            'patoolib.programs.ar',
                            'patoolib.programs.arc',
                            'patoolib.programs.archmage',
                            'patoolib.programs.arj',
                            'patoolib.programs.bsdcpio',
                            'patoolib.programs.bsdtar',
                            'patoolib.programs.bzip2',
                            'patoolib.programs.cabextract',
                            'patoolib.programs.chmlib',
                            'patoolib.programs.clzip',
                            'patoolib.programs.compress',
                            'patoolib.programs.cpio',
                            'patoolib.programs.dpkg',
                            'patoolib.programs.flac',
                            'patoolib.programs.genisoimage',
                            'patoolib.programs.gzip',
                            'patoolib.programs.isoinfo',
                            'patoolib.programs.lbzip2',
                            'patoolib.programs.lcab',
                            'patoolib.programs.lha',
                            'patoolib.programs.lhasa',
                            'patoolib.programs.lrzip',
                            'patoolib.programs.lzip',
                            'patoolib.programs.lzma',
                            'patoolib.programs.lzop',
                            'patoolib.programs.mac',
                            'patoolib.programs.nomarch',
                            'patoolib.programs.p7azip',
                            'patoolib.programs.p7rzip',
                            'patoolib.programs.p7zip',
                            'patoolib.programs.pbzip2',
                            'patoolib.programs.pdlzip',
                            'patoolib.programs.pigz',
                            'patoolib.programs.plzip',
                            'patoolib.programs.py_bz2',
                            'patoolib.programs.py_echo',
                            'patoolib.programs.py_gzip',
                            'patoolib.programs.py_lzma',
                            'patoolib.programs.py_tarfile',
                            'patoolib.programs.py_zipfile',
                            'patoolib.programs.rar',
                            'patoolib.programs.rpm',
                            'patoolib.programs.rpm2cpio',
                            'patoolib.programs.rzip',
                            'patoolib.programs.shar',
                            'patoolib.programs.shorten',
                            'patoolib.programs.star',
                            'patoolib.programs.tar',
                            'patoolib.programs.unace',
                            'patoolib.programs.unadf',
                            'patoolib.programs.unalz',
                            'patoolib.programs.uncompress',
                            'patoolib.programs.unrar',
                            'patoolib.programs.unshar',
                            'patoolib.programs.unzip',
                            'patoolib.programs.xdms',
                            'patoolib.programs.xz',
                            'patoolib.programs.zip',
                            'patoolib.programs.zoo',
                            'patoolib.programs.zopfli',
                            'patoolib.programs.zpaq'],
             # hiddenimports=['Archive', 'PatoolError'],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
          cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          [],
          exclude_binaries=True,
          name='test',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=True )
coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas,
               strip=False,
               upx=True,
               upx_exclude=[],
               name='test')
After an unexpectedly long time, the executable prints the following:
0
a
C:\Users\myname\eclipse-workspace-tms\test_unzip_exe\dist\test
b
c
d
e
1
2
where the file in the destination ("grog") is not an unzipped file as wanted but simply a copy.
Does anybody have an idea of what is going wrong?
Thanks a lot
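Since the file is a 7z archive that merely carries a different extension, its content can be checked before deciding whether a rename is worth attempting. A minimal sketch, assuming the standard six-byte 7z signature; looks_like_7z is a hypothetical helper and not part of pyunpack or patool.

```python
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # first six bytes of a 7z archive


def looks_like_7z(path):
    """Return True if the file starts with the 7z signature."""
    with open(path, "rb") as fh:
        return fh.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC
```

In the script above, this check could replace the first, deliberately failing extractall call: rename and extract only when looks_like_7z(name) is true, and copy the file otherwise.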
Hello,
I'm trying to use pyunpack to open a RAR file.
I've installed patool (1.12) in order to add RAR support.
My pyunpack package is v.0.0.3
But using:
>>> from pyunpack import Archive
>>> Archive('my-file.rar').extractall('./')
I get the error: ImportError: No module named patoolib
I'm using Python 2.7 on Mac OS El Capitan.
What am I missing?
Thanks for any help
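An ImportError like this often means the package was installed for a different interpreter than the one running the script. The following sketch reports which interpreter is running and where (if anywhere) it finds a module. It is written for Python 3, whereas the report above uses Python 2.7, and module_location is a hypothetical helper name.

```python
import importlib.util
import sys


def module_location(name):
    """Return (interpreter path, module file or None) for the running Python."""
    spec = importlib.util.find_spec(name)
    return sys.executable, (spec.origin if spec is not None else None)
```

Calling module_location("patoolib") from the same interpreter that raises the ImportError shows whether that interpreter can see the package at all.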
Unpacking an archive that contains files with Chinese names gives wrong results. Could "gbk" decoding be supported for zip?
Archive:
test.zip
├─test
│ └─测试 (subfolder with chinese name)
│ └─test
│ test.txt
│ 测试.txt (file with chinese name)
After running
arch = pyunpack.Archive()
arch.extractall()
I get
└─test
│ └─▓Γ╩╘
│ └─test
│ test.txt
│ ▓Γ╩╘.txt
Every entry with a Chinese name comes out garbled!
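The garbled names above are what GBK-encoded bytes look like when decoded as cp437, which is how Python's zipfile module decodes entry names that are not flagged as UTF-8. A sketch of reversing that step follows; recode_zip_name is a hypothetical helper, and GBK is an assumption about how this particular archive was created.

```python
def recode_zip_name(name, source="cp437", target="gbk"):
    """Undo a cp437 decoding of a name that was really stored as GBK."""
    try:
        return name.encode(source).decode(target)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a mis-decoded name (for example already proper Unicode): keep it.
        return name
```

With zipfile, this would be applied to info.filename for each entry returned by ZipFile.infolist() whose flag_bits & 0x800 is zero, before choosing the output path.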
===========================
Also, pyunpack does not seem to handle 7z?
log:
patool can not unpack
patool error: error extracting could not find an executable program to extract format 7z; candidates are (7z,7za,7zr)
PS E:\Project*\test> & E:/Installed/Python/Python37-32/python.exe e:/Project//test/unzipfile1.py
Traceback (most recent call last):
  File "e:/Project/***/test/unzipfile1.py", line 7, in <module>
    Archive('软件 安排 编辑.rar').extractall('./module/temp')
  File "E:\Installed\Python\Python37-32\lib\site-packages\pyunpack\__init__.py", line 113, in extractall
    self.extractall_patool(directory, patool_path)
  File "E:\Installed\Python\Python37-32\lib\site-packages\pyunpack\__init__.py", line 74, in extractall_patool
    raise PatoolError("patool can not unpack\n" + str(p.stderr))
pyunpack.PatoolError: patool can not unpack
********** Oops, I did it again. *************
You have found an internal error in patool. Please write a bug report
at https://github.com/wummel/patool/issues/ and include at least the information below:
Not disclosing some of the information below due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .
<class 'UnicodeEncodeError'> 'charmap' codec can't encode characters in position 63-64: character maps to <undefined>
Traceback (most recent call last):
  File "E:\Installed\Python\Python37-32\Scripts\patool", line 213, in main
    res = globals()["run_%s" % args.command](args)
  File "E:\Installed\Python\Python37-32\Scripts\patool", line 33, in run_extract
    patoolib.extract_archive(archive, verbosity=args.verbosity, interactive=args.interactive, outdir=args.outdir)
  File "E:\Installed\Python\Python37-32\lib\site-packages\patoolib\__init__.py", line 683, in extract_archive
    util.log_info("Extracting %s ..." % archive)
  File "E:\Installed\Python\Python37-32\lib\site-packages\patoolib\util.py", line 516, in log_info
    print("patool:", msg, file=out)
  File "E:\Installed\Python\Python37-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 63-64: character maps to <undefined>
System info:
patool 1.12
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
sys.argv ['E:\Installed\Python\Python37-32\Scripts\patool', '--non-interactive', 'extract', 'E:\Project\**\test\\u8f6f\u4ef6 \u5b89\u6392 \u7f16\u8f91.rar', '--outdir=E:\Project\****\test\module\temp']
LANG = 'en_US.UTF-8'
******** patool internal error, over and out ********
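The traceback shows patool failing while printing the archive name through the cp1252 codec. One thing that may avoid this is forcing the child process to use UTF-8 for its standard streams via the PYTHONIOENCODING environment variable. This is a sketch under that assumption; run_with_utf8_io is a hypothetical helper, and whether it resolves this particular report is untested.

```python
import os
import subprocess


def run_with_utf8_io(cmd):
    """Run a Python-based command with its stdio forced to UTF-8."""
    env = dict(os.environ)
    # PYTHONIOENCODING overrides the encoding of stdin/stdout/stderr in the child.
    env["PYTHONIOENCODING"] = "utf-8"
    return subprocess.run(
        cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
```

The captured stdout and stderr are bytes and should then be decoded as UTF-8 by the caller.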
I added the --non-interactive option and a timeout.
My Archive subclass:
import sys
from easyprocess import EasyProcess
from pyunpack import Archive, PatoolError
# log and fullpath are assumed to be the helpers defined in pyunpack/__init__.py
class ArchiveWithTimeout(Archive):
    def __init__(self, filename, timeout=None):
        super(ArchiveWithTimeout, self).__init__(filename)
        self.timeout = timeout

    def extractall_patool(self, directory, patool_path):
        log.debug("starting backend patool")
        if not patool_path:
            patool_path = fullpath('patool')
        p = EasyProcess([
            sys.executable,
            patool_path,
            '--non-interactive',
            'extract',
            self.filename,
            '--outdir=' + directory,
            # '--verbose',
        ]).call(timeout=self.timeout)
        if p.return_code:
            raise PatoolError("patool can not unpack\n" + str(p.stderr))
Hi,
I saw that patool isn't called from Python directly. Is there a reason to call it in a subprocess?
I can provide a pull request.
pyunpack currently doesn't extract password-protected ZIP files with backend = "auto", but it is able to do so when "patool" is chosen manually as the backend.
Default config fails:
$ python -m pyunpack.cli -p "geheim" ./lz_input/pwd_protected.zip .
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/user/.cache/pypoetry/virtualenvs/mnxmpl-ToTEp6cO-py3.10/lib/python3.10/site-packages/pyunpack/cli.py", line 12, in <module>
def extractall(
File "/home/user/.cache/pypoetry/virtualenvs/mnxmpl-ToTEp6cO-py3.10/lib/python3.10/site-packages/entrypoint2/__init__.py", line 382, in entrypoint
return func(**kwargs)
File "/home/user/.cache/pypoetry/virtualenvs/mnxmpl-ToTEp6cO-py3.10/lib/python3.10/site-packages/pyunpack/cli.py", line 21, in extractall
Archive(filename, backend, password=password).extractall(
File "/home/user/.cache/pypoetry/virtualenvs/mnxmpl-ToTEp6cO-py3.10/lib/python3.10/site-packages/pyunpack/__init__.py", line 111, in extractall
self.extractall_zipfile(directory)
File "/home/user/.cache/pypoetry/virtualenvs/mnxmpl-ToTEp6cO-py3.10/lib/python3.10/site-packages/pyunpack/__init__.py", line 79, in extractall_zipfile
zipfile.ZipFile(self.filename).extractall(
File "/usr/lib/python3.10/zipfile.py", line 1645, in extractall
self._extract_member(zipinfo, path, pwd)
File "/usr/lib/python3.10/zipfile.py", line 1698, in _extract_member
with self.open(member, pwd=pwd) as source, \
File "/usr/lib/python3.10/zipfile.py", line 1571, in open
return ZipExtFile(zef_file, mode, zinfo, pwd, True)
File "/usr/lib/python3.10/zipfile.py", line 800, in __init__
self._decompressor = _get_decompressor(self._compress_type)
File "/usr/lib/python3.10/zipfile.py", line 699, in _get_decompressor
_check_compression(compress_type)
File "/usr/lib/python3.10/zipfile.py", line 679, in _check_compression
raise NotImplementedError("That compression method is not supported")
NotImplementedError: That compression method is not supported
$ ls .
This works:
$ python3 -m pyunpack.cli -p "geheim" -b "patool" ./lz_input/pwd_protected.zip .
$ ls .
file.txt
Info:
Python: 3.10.8
Patool: 1.12 4928f3f
Pyunpack: 0.3
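The traceback shows the standard-library zipfile backend raising NotImplementedError, while the patool backend handles the same file. One possible approach is to try zipfile first and hand over to another backend only when it reports an unsupported method. The sketch below illustrates that idea and is not pyunpack's actual logic; extract_zip and its fallback parameter are hypothetical.

```python
import zipfile


def extract_zip(path, dest, password=None, fallback=None):
    """Try the standard-library zipfile first; use fallback if it cannot cope.

    fallback is any callable taking (path, dest, password). It stands in for
    whatever external backend is available (an assumption of this sketch).
    """
    pwd = password.encode("utf-8") if password is not None else None
    try:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest, pwd=pwd)
        return "zipfile"
    except NotImplementedError:
        # zipfile raises this for compression methods it does not implement.
        if fallback is None:
            raise
        fallback(path, dest, password)
        return "fallback"
```

The return value only records which path was taken, which is handy when logging why an archive needed the external tool.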
Hi,
I want to use pyunpack to extract a cabinet (.cab) file.
The code is as follows:
import sys
from pyunpack import Archive
sys.executable = r"C:\Users\<myname>\AppData\Local\Continuum\anaconda3\python.exe"
Archive("folder1//A.cab").extractall("folder1")
Here is the traceback:
Traceback (most recent call last):
File "c:\users\<myname>\documents\repo\<programA>\test.py", line 70, in <module>
Archive("folder1//Alienware_Desktop_064C.cab").extractall("folder1")
File "C:\Users\<myname>\AppData\Local\Continuum\anaconda3\lib\site-packages\pyunpack\__init__.py", line 94, in extractall
self.extractall_patool(directory, patool_path)
File "C:\Users\<myname>\AppData\Local\Continuum\anaconda3\lib\site-packages\pyunpack\__init__.py", line 65, in extractall_patool
raise PatoolError("patool can not unpack\n" + str(p.stderr))
PatoolError: patool can not unpack
File "C:\Users\<myname>\AppData\Local\Continuum\anaconda3\Scripts\patool.exe", line 1
SyntaxError: Non-UTF-8 code starting with '\x90' in file C:\Users\<myname>\AppData\Local\Continuum\anaconda3\Scripts\patool.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I am trying to extract folder1//A.cab, but pyunpack appears to try to process patool.exe instead.
The other issue is in pyunpack's __init__.py: line 48 should be
patool_path = _exepath("patool.exe")
or there will be an error:
ValueError: patool not found! Please install patool!
even though I have already installed patool.
Environment:
Win10
spyder4.1.4
patool-1.12
pyunpack 0.2.1
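The SyntaxError above suggests that the Python interpreter was asked to read patool.exe as if it were a script. A sketch of building the command differently depending on what was found follows; patool_command is a hypothetical helper and not pyunpack's code, and the patool arguments mirror the command line quoted elsewhere on this page.

```python
import sys


def patool_command(patool_path, archive, outdir):
    """Build the argument list for running patool on archive."""
    args = ["--non-interactive", "extract", archive, "--outdir=" + outdir]
    if patool_path.lower().endswith(".exe"):
        # A .exe launcher is a native binary: run it directly.
        return [patool_path] + args
    # Otherwise treat it as a Python script and run it with this interpreter.
    return [sys.executable, patool_path] + args
```

The resulting list can be passed to subprocess.run or a similar process runner.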
I wonder if we could collaborate on extractcode at https://github.com/nexB/scancode-toolkit/tree/develop/src/extractcode since this is code with similar goals
It seems that there's a failure in _exepath that doesn't allow it to correctly identify patool.exe on a Windows system. Seems like that could be corrected with the following (or similar) addition:
 def _exepath(cmd: str) -> Optional[str]:
     for p in os.environ["PATH"].split(os.pathsep):
         fullp = os.path.join(p, cmd)
         if os.access(fullp, os.X_OK):
             return fullp
+        if os.access(fullp + ".exe", os.X_OK):
+            return fullp + ".exe"
     return None
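An alternative to extending the manual PATH search is the standard library's shutil.which, which already accounts for executable extensions on Windows. A small sketch follows; find_program is a hypothetical wrapper name, and this is a suggestion rather than the change that was actually merged.

```python
import shutil


def find_program(name, search_path=None):
    """Locate an executable the way the shell would, or return None."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT
    # (typically including .EXE), so "patool" can resolve to patool.exe.
    return shutil.which(name, path=search_path)
```

With search_path left as None, the PATH environment variable is used, which matches what _exepath searches.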
I wrote a Python script which searches for .zip and .rar files and unpacks them with pyunpack. At first I was puzzled when pyunpack threw this exception, since the path it showed was in fact valid:
filename.rar will be unpacked
Traceback (most recent call last):
  File "ungesichtetaut.py", line 66, in <module>
    main(working_dir, sortierung)
  File "ungesichtetaut.py", line 61, in main
    unpacking = unpack(working_dir)
  File "ungesichtetaut.py", line 47, in unpack
    auto_create_dir = True)
  File "/usr/local/lib/python2.7/dist-packages/pyunpack/__init__.py", line 66, in extractall
    "archive file does not exist:" + str(self.filename))
ValueError: archive file does not exist:/media/usb1/ungesichtet/filename.rar
The code that gave me the error was:
for file in test_to_unpack:
    if (".rar" in file or ".zip" in file or ".001" in file):
        print file, "will be unpacked"
        Archive(file).extractall(working_folder + "_unpack",
                                 auto_create_dir = True)
I corrected it to:
for file in test_to_unpack:
    if (".rar" in file or ".zip" in file or ".001" in file):
        print file, "will be unpacked"
        Archive(working_folder + '/' + file).extractall(working_folder + "_unpack",
                                                        auto_create_dir = True)
Now it works, given the complete path.
At first I assumed pyunpack knew the path, since the ValueError showed the whole path.
Nevertheless, thanks for your great work.
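The underlying point is that a directory listing yields bare file names, which only resolve correctly relative to the current working directory. A sketch of collecting full paths up front follows, written for Python 3. archive_paths is a hypothetical helper, and it matches on the file suffix rather than on a substring, which is a deliberate difference from the code above.

```python
import os

ARCHIVE_SUFFIXES = (".rar", ".zip", ".001")


def archive_paths(folder):
    """Return full paths of archive files directly inside folder."""
    found = []
    for name in sorted(os.listdir(folder)):
        # os.listdir yields bare names, so join them back onto the folder.
        if name.lower().endswith(ARCHIVE_SUFFIXES):
            found.append(os.path.join(folder, name))
    return found
```

Each returned path can then be passed straight to Archive(...), regardless of the directory the script is started from.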
Does password decompression currently only support ZIP format?
from pyunpack import Archive
Archive(thing).extractall(str(thing[0:thing.rfind('/')]))
This raises:
Traceback (most recent call last):
File "importItAll.py", line 33, in <module>
Archive(thing).extractall(str(thing[0:thing.rfind('/')]))
File "/usr/local/lib/python2.7/dist-packages/pyunpack/__init__.py", line 74, in extractall
self.extractall_patool(directory, patool_path)
File "/usr/local/lib/python2.7/dist-packages/pyunpack/__init__.py", line 41, in extractall_patool
'--outdir=' + directory,
File "/usr/local/lib/python2.7/dist-packages/easyprocess/__init__.py", line 108, in __init__
self.cmd_as_string = ' '.join(self.cmd) # TODO: not perfect
TypeError: sequence item 1: expected string, NoneType found