
unzip-http's Introduction

Welcome to Saulville

I live in the terminal, at the intersection of data and fun:

If you find this kind of stuff interesting, please reach out. Email me at [email protected], or come chat with us on IRC (libera.chat/#visidata) or on Discord. I'd love to talk with you!

unzip-http's People

Contributors

anjakefala, horw, porocyon, saulpw


unzip-http's Issues

ZIP64 format not supported 😅

For reference, ZIP64 Overview

For very large archives (over 4 GB), the ZIP format switches to the ZIP64 extension.
This is the case for the following archive: 'http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2001.ZIP'

The big advantage of this module is that it avoids downloading more than 4 GB of data when only a few kilobytes are needed.
However, because of the way it works, infolist() must support ZIP64 parsing, including the new structures with signatures PK\x06\x06 (ZIP64 end of central directory record) and PK\x06\x07 (ZIP64 end of central directory locator).
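A minimal sketch of the ZIP64 lookup requested here, following the record layouts in PKWARE's APPNOTE: the 20-byte ZIP64 locator (PK\x06\x07) sits immediately before the classic EOCD and holds the absolute offset of the ZIP64 EOCD record (PK\x06\x06), whose fixed 56 bytes carry 64-bit values for the entry count and the central directory's size and offset. The helper name `zip64_central_directory` and its inputs (`tail`, the last bytes of the archive, and `tail_offset`, their absolute position in the file) are hypothetical; this is not unzip-http's API. It assumes a single-disk archive whose comment does not contain the EOCD signature.

```python
import struct

EOCD_SIG = b'PK\x05\x06'
# ZIP64 end of central directory locator (20 bytes), stored just before the classic EOCD.
ZIP64_LOCATOR = struct.Struct('<4sIQI')
# Fixed 56-byte part of the ZIP64 end of central directory record.
ZIP64_EOCD = struct.Struct('<4sQHHIIQQQQ')
# Classic EOCD (22 fixed bytes); used here only to build the demo tail.
CLASSIC_EOCD = struct.Struct('<4sHHHHIIH')


def zip64_central_directory(tail: bytes, tail_offset: int) -> tuple:
    """Return (total_entries, cdir_size, cdir_offset) read from the ZIP64 records.

    tail holds the last bytes of the archive; tail_offset is the absolute
    file offset of tail[0]. Assumes a single-disk archive.
    """
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError('classic EOCD record not found in tail')
    loc = eocd - ZIP64_LOCATOR.size
    if loc < 0 or tail[loc:loc + 4] != b'PK\x06\x07':
        raise ValueError('no ZIP64 locator: not a ZIP64 archive')
    _sig, _disk, record_abs, _ndisks = ZIP64_LOCATOR.unpack_from(tail, loc)
    rec = record_abs - tail_offset  # index of the ZIP64 record inside tail
    if rec < 0:
        raise ValueError('ZIP64 record starts before the fetched tail; fetch more bytes')
    (sig, _size, _made_by, _needed, _disk_no, _cd_disk,
     _entries_here, entries, cd_size, cd_offset) = ZIP64_EOCD.unpack_from(tail, rec)
    if sig != b'PK\x06\x06':
        raise ValueError('bad ZIP64 EOCD record signature')
    return entries, cd_size, cd_offset


# Demo on a synthetic tail: ZIP64 record + locator + classic EOCD with saturated fields.
cd_offset, cd_size, entries = 5_000_000_000, 1_234, 70_000
record_abs = cd_offset + cd_size  # the ZIP64 record follows the central directory
tail = (
    ZIP64_EOCD.pack(b'PK\x06\x06', ZIP64_EOCD.size - 12, 45, 45, 0, 0,
                    entries, entries, cd_size, cd_offset)
    + ZIP64_LOCATOR.pack(b'PK\x06\x07', 0, record_abs, 1)
    + CLASSIC_EOCD.pack(EOCD_SIG, 0, 0, 0xFFFF, 0xFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0)
)
print(zip64_central_directory(tail, tail_offset=record_abs))  # (70000, 1234, 5000000000)
```

The ZIP64 record normally sits just before the locator, so it should fall inside even a small tail buffer; the central directory itself, which starts at `cdir_offset`, may still need a separate range request when it lies outside that buffer.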

Failing sample code:

import unzip_http

rzf = unzip_http.RemoteZipFile('http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2001.ZIP')
fp_contents = rzf.open("DATA/CURR_VERS_NAVI.TXT")

Traceback (most recent call last):
....
  File "C:\Work\dev\psa-maps\venv\lib\site-packages\unzip_http.py", line 60, in infolist
    struct.unpack_from(self.fmt_cdirentry, resp.data, offset=filehdr_index)
struct.error: offset -1514118702 out of range for 65536-byte buffer

Your version fails because it reads the classic EOCD (end of central directory) record, in which some values are set to 0xFFFFFFFF to indicate that the parser must read the ZIP64 EOCD record to find the correct values.
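A sketch of the check described above: parse the classic 22-byte EOCD record and treat any field saturated at 0xFFFF (16-bit fields) or 0xFFFFFFFF (32-bit fields) as the signal to consult the ZIP64 records instead. `classic_eocd_needs_zip64` is a hypothetical helper, not part of unzip-http; the demo uses the standard library's `zipfile` only to build a small, non-ZIP64 archive for comparison.

```python
import io
import struct
import zipfile

# Classic end of central directory record: 22 fixed bytes, then an optional comment.
EOCD = struct.Struct('<4sHHHHIIH')


def classic_eocd_needs_zip64(tail: bytes) -> bool:
    """True if any classic EOCD field holds the 0xFFFF/0xFFFFFFFF 'see ZIP64' marker."""
    pos = tail.rfind(b'PK\x05\x06')
    if pos < 0:
        raise ValueError('EOCD record not found')
    (_sig, disk, cd_disk, entries_here, entries,
     cd_size, cd_offset, _comment_len) = EOCD.unpack_from(tail, pos)
    return (0xFFFF in (disk, cd_disk, entries_here, entries)
            or 0xFFFFFFFF in (cd_size, cd_offset))


# A small archive written by the standard library does not need ZIP64 ...
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('DATA/CURR_VERS_NAVI.TXT', b'CID:013\r\n')
small = buf.getvalue()
# ... while a record with saturated fields defers to the ZIP64 records.
saturated = EOCD.pack(b'PK\x05\x06', 0, 0, 0xFFFF, 0xFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0)
print(classic_eocd_needs_zip64(small), classic_eocd_needs_zip64(saturated))  # False True
```

Using the raw 0xFFFFFFFF offset as if it were real is what would push a computed index far outside a 64 KiB buffer, which is consistent with the negative offsets in the tracebacks on this page.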

Any chance of supporting this extended format?

From conda feedstock

Can you please file an issue upstream so that the scripts tag is changed to console_scripts entry_points? That would allow it to become noarch and save CI resources.

offset out of range for 65536-byte buffer

While attempting to download files from an ultra-large zip archive (355 GB), I got the following:

Traceback (most recent call last):
  File "/Users/sergey.vilov/tmp/test/test.py", line 5, in <module>
    binfp = rzf.open('train_images/10005/18667/100.dcm')
  File "/Users/sergey.vilov/miniconda/envs/kaggle/lib/python3.9/site-packages/unzip_http.py", line 192, in open
    f = list(self.matching_files(fn))
  File "/Users/sergey.vilov/miniconda/envs/kaggle/lib/python3.9/site-packages/unzip_http.py", line 186, in matching_files
    for f in self.files.values():
  File "/Users/sergey.vilov/miniconda/envs/kaggle/lib/python3.9/site-packages/unzip_http.py", line 109, in files
    self._files = {r.filename:r for r in self.infoiter()}
  File "/Users/sergey.vilov/miniconda/envs/kaggle/lib/python3.9/site-packages/unzip_http.py", line 109, in <dictcomp>
    self._files = {r.filename:r for r in self.infoiter()}
  File "/Users/sergey.vilov/miniconda/envs/kaggle/lib/python3.9/site-packages/unzip_http.py", line 151, in infoiter
    struct.unpack_from(self.fmt_cdirentry, resp.data, offset=filehdr_index)
struct.error: offset -138557274 out of range for 65536-byte buff

The archive link can be obtained by downloading a Kaggle dataset from here.
Unfortunately, I can't provide a direct link without exposing my Kaggle credentials.

Extra 'P' char in file data after CRLF

First of all, thanks for this package. I was looking for such a feature a few months ago but found nothing; now it's finally here.

I'm trying to read text files from a large remote archive, to avoid downloading gigabytes when I only need a few kilobytes.
The target URL is: 'http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2051.ZIP'
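Fetching kilobytes instead of gigabytes generally relies on HTTP range requests: a ZIP archive's index lives at the end of the file, so a client can ask for just the tail. The sketch below, using only `urllib`, requests the last `nbytes` with a suffix range (`bytes=-N`) and reads the `Content-Range` response header to learn where that tail starts in the file. The helper names are hypothetical and this is not unzip-http's implementation; the 65536-byte default only mirrors the buffer size seen in the tracebacks on this page, and the code assumes the server answers with 206 Partial Content.

```python
import urllib.request


def tail_range_header(nbytes: int) -> dict:
    """Build a Range header asking for the last nbytes of a resource."""
    if nbytes <= 0:
        raise ValueError('nbytes must be positive')
    return {'Range': f'bytes=-{nbytes}'}


def parse_content_range(value: str) -> tuple:
    """Parse a 'bytes START-END/TOTAL' Content-Range value into integers."""
    unit, _, rest = value.partition(' ')
    span, _, total = rest.partition('/')
    start, _, end = span.partition('-')
    if unit != 'bytes':
        raise ValueError(f'unexpected range unit: {unit!r}')
    return int(start), int(end), int(total)


def fetch_tail(url: str, nbytes: int = 65536) -> tuple:
    """Return (absolute offset of the first byte, data) for the end of a remote file."""
    req = urllib.request.Request(url, headers=tail_range_header(nbytes))
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError(f'server ignored the Range header (HTTP {resp.status})')
        start, _end, _total = parse_content_range(resp.headers.get('Content-Range', ''))
        return start, resp.read()
```

The returned start offset is what a parser needs to turn absolute offsets found in the end records into indices within the downloaded buffer.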

I'm trying to read these two files:

ZIP_CONTENT_FILE = "DATA/CURR_VERS_NAVI.TXT"
ZIP_MAP_VERSION_FILE = "MAP.inf"

For both of them, when reading line by line with readline(), I get an extra 'P' character after the '\r\n'.

For example:

import unzip_http

rzf = unzip_http.RemoteZipFile('http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2051.ZIP')
fp_contents = rzf.open("DATA/CURR_VERS_NAVI.TXT")

print(fp_contents.data)
b'CID:013,118,0,4,2020,"11/11/2020","MIDDLE_EUROPE","HERE"\r\nP'
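One plausible explanation, not confirmed in the report: the reader returns one byte beyond the member's compressed size. In a ZIP archive a member's data is typically followed by another record (the next local file header, PK\x03\x04, or the central directory, PK\x01\x02), and each of those starts with 'P'. The sketch below reads a stored member by parsing its 30-byte local file header and slicing exactly `compress_size` bytes. `read_member_bytes` is a hypothetical helper, and the in-memory archive only imitates the file names from the report.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (signature PK\x03\x04).
LOCAL_HEADER = struct.Struct('<4sHHHHHIIIHH')


def read_member_bytes(archive: bytes, info: zipfile.ZipInfo) -> bytes:
    """Return exactly the compressed bytes of one member of an in-memory archive."""
    (sig, _version, _flags, _method, _time, _date, _crc, _csize, _usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(archive, info.header_offset)
    if sig != b'PK\x03\x04':
        raise ValueError('not a local file header')
    start = info.header_offset + LOCAL_HEADER.size + name_len + extra_len
    # Take compress_size from the central directory entry (info); one byte more
    # would pick up the 'P' that starts the next record's signature.
    return archive[start:start + info.compress_size]


# Two stored (uncompressed) members, mimicking the file names from the report.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_STORED) as zf:
    zf.writestr('DATA/CURR_VERS_NAVI.TXT', b'CID:013\r\n')
    zf.writestr('MAP.inf', b'v1\r\n')
archive = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.getinfo('DATA/CURR_VERS_NAVI.TXT')
data = read_member_bytes(archive, info)
print(data)  # b'CID:013\r\n'
```

The local header's extra-field length can differ from the central directory's, which is why the sketch reads `name_len` and `extra_len` from the local header itself.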

Exception: unknown compression method 9

Here is the command I'm trying to run:

unzip-http https://fbinter.stadt-berlin.de/lidar/Nordost.zip 3dm_33_400_5834_1_be.las > 3dm_33_400_5834_1_be.las

Which gives the following error:

Traceback (most recent call last):
  File "/home/rcm/miniconda3/envs/test/bin/unzip-http", line 62, in <module>
    main(*args)
  File "/home/rcm/miniconda3/envs/test/bin/unzip-http", line 53, in main
    fp = StreamProgress(rzf.open(f), name=f.filename, total=f.compress_size)
                        ^^^^^^^^^^^
  File "/home/rcm/miniconda3/envs/test/lib/python3.11/site-packages/unzip_http.py", line 209, in open
    error(f'unknown compression method {method}')
  File "/home/rcm/miniconda3/envs/test/lib/python3.11/site-packages/unzip_http.py", line 35, in error
    raise Exception(s)
Exception: unknown compression method 9

For more detail, I downloaded the zip and ran unzip -v, which produced the following table:

Archive:  Nordost.zip
    Length  Method       Size  Cmpr  Date        Time   CRC-32    Name
----------  ------  ---------  ----  ----------  -----  --------  ----
 502777261  Def64N  214255624   57%  2021-06-23  14:26  bb038a23  3dm_33_400_5826_1_be.las
 419140249  Def64N  177376842   58%  2021-06-23  14:34  3ac4c66a  3dm_33_400_5827_1_be.las
  62535253  Def64N   27337765   56%  2021-06-23  14:36  cf17bfe5  3dm_33_400_5828_1_be.las
  30990427  Def64N   14001832   55%  2021-06-23  15:00  b060d518  3dm_33_400_5832_1_be.las
 145758559  Def64N   66743943   54%  2021-06-23  15:04  749f0c26  3dm_33_400_5833_1_be.las
   7890037  Def64N    3611646   54%  2021-06-23  15:05  fef2b2ad  3dm_33_400_5834_1_be.las
 463060459  Def64N  180252954   61%  2021-06-24  06:16  fdd24b11  3dm_33_401_5826_1_be.las
 272314671  Def64N  104362563   62%  2021-06-24  06:09  e8303ed7  3dm_33_401_5827_1_be.las
 412374691  Def64N  169386853   59%  2021-06-28  15:04  f89d361c  3dm_33_402_5826_1_be.las
  40576687  Def64N   15047392   63%  2021-06-28  15:05  a28d963b  3dm_33_402_5827_1_be.las
  23602277  Def64N   10555268   55%  2021-06-29  07:13  b0ca03d9  3dm_33_403_5826_1_be.las
----------          ---------   ---                               -------
2381020571          982932682   59%                               11 files
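Method 9 is Deflate64 ("Enhanced Deflating using Deflate64" in PKWARE's APPNOTE), which `unzip -v` reports as Def64N. Python's `zlib` implements only standard deflate, so a reader built on it can handle method 8 but not method 9. A sketch of a dispatcher that supports a few methods available in the standard library and names the unsupported ones in its error; `decompress_member` and `METHOD_NAMES` are hypothetical, not unzip-http's code.

```python
import bz2
import zlib

# Compression method IDs from PKWARE's APPNOTE (only a few are listed here).
METHOD_NAMES = {0: 'stored', 8: 'deflate', 9: 'Deflate64', 12: 'bzip2', 14: 'LZMA'}


def decompress_member(method: int, data: bytes) -> bytes:
    """Decompress a member's raw bytes, or raise an error that names the method."""
    if method == 0:
        return data
    if method == 8:
        # ZIP stores raw deflate streams with no zlib header, hence negative wbits.
        return zlib.decompress(data, -15)
    if method == 12:
        return bz2.decompress(data)
    name = METHOD_NAMES.get(method, 'unknown')
    raise NotImplementedError(f'compression method {method} ({name}) is not supported')


raw = b'LAS point records ' * 100
packer = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, as used for method 8
deflated = packer.compress(raw) + packer.flush()
print(decompress_member(8, deflated) == raw)  # True
try:
    decompress_member(9, deflated)
except NotImplementedError as exc:
    print(exc)  # compression method 9 (Deflate64) is not supported
```

Supporting the Nordost.zip members would need a separate Deflate64 decoder; the sketch only makes the failure explicit about which method is missing.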
