xavierleroy / camlzip

Reading and writing zip and gzip files from OCaml

License: Other

Makefile 4.03% OCaml 86.09% C 8.58% Shell 1.30%

camlzip's Issues

API to work on buffers directly

Sometimes one needs to compress or decompress data that doesn't come directly from an {in,out}_channel (for example, as part of a bigger pipeline). Right now I can't find a way of doing that.

`NATIVE_COMPILER` variable doesn't exist in 4.09

The Makefile in master uses the NATIVE_COMPILER variable, which only appeared in ocaml/ocaml@987b081 and is therefore available only from 4.10 onwards.

OPAM Package

Hi,
Your library seems to be a good wrapper for zlib and it might be interesting to use it for Haxe. The next version of Haxe is planned to integrate with OCaml's package manager: OPAM.

What is the current state of this library with respect to OPAM? It seems that camlzip is on OPAM, but it was published by a third-party author. Do you know them, and is it a reliable source? It would be better if you were the person publishing the package, since this seems to be the home of the project.

Request to release 1.08 version on Opam

Is it possible to publish (opam-publish) a new release of camlzip that includes the recent bug fixes?

install .mli and .cmt{,i} files

Right now Merlin has no clue about the functions or their documentation.

Zip.add_entry_generator produces a garbage CRC in Store mode

The problem lies in this snippet:

https://github.com/xavierleroy/camlzip/blob/master/zip.ml#L582-L587

The crc reference is never updated by the callback.

Document/enforce forward slashes in entry names

The spec for the Zip format (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) says:

4.4.17.1 The name of the file, with optional relative path.
The path stored MUST NOT contain a drive or
device letter, or a leading slash. All slashes
MUST be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard
input, there is no file name field.

We just observed that using backslashes can indeed cause issues with some tools when unzipping on Linux (we couldn't check on Amiga, unfortunately). Such problems have also been reported elsewhere.

We could argue that it's the responsibility of Camlzip users to know about this constraint, but it seems harmless and useful to add a note to mention it in the docstrings of functions taking an entry path argument.

Going a step further, Camlzip could replace \ with / automatically in entry paths (and also fail on leading slashes or drive?).
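
A minimal sketch of what such automatic normalisation could look like. The function name `normalize_entry_name` is hypothetical and not part of Camlzip; the choice to raise `Invalid_argument` on a leading slash or drive letter is an assumption about the desired behaviour, not a decision made in this issue.

```ocaml
(* Hypothetical helper, not part of camlzip: normalise an entry name so that
   it follows section 4.4.17.1 of APPNOTE.TXT. *)
let normalize_entry_name (name : string) : string =
  (* Replace every backslash with a forward slash. *)
  let name = String.map (fun c -> if c = '\\' then '/' else c) name in
  let len = String.length name in
  (* Reject a leading slash. *)
  if len > 0 && name.[0] = '/' then
    invalid_arg "entry name must not start with a slash";
  (* Reject a drive letter such as "C:". *)
  let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') in
  if len >= 2 && is_letter name.[0] && name.[1] = ':' then
    invalid_arg "entry name must not contain a drive letter";
  name
```

With this sketch, `normalize_entry_name "dir\\sub\\file.txt"` yields `"dir/sub/file.txt"`, while `"/etc/passwd"` and `"C:\\tmp\\x"` are rejected.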

@xavierleroy : I'm happy to propose a PR implementing either of these variants if you tell me which one you prefer.

For the interested reader, here is some extra context. By default, Win32 restricts path lengths to about 256 characters. This restriction can be lifted with some global settings, but those cannot be expected on a typical Windows machine. The practical workaround is to prepend \\?\ to the path, which lifts the restriction; but if we do that, we have to use backslashes in the path. Forward slashes are normally allowed as well, but not when that prefix is used. This means that Windows applications that use Camlzip and support long paths need to juggle backslashes (for opening the file manually, or for passing an input file name to copy_file_to_entry) and forward slashes (for the entry path name).

Random read_entry "decompression error" exception with zlib 1.2.9+

Since the notorious zlib update, I've been having hard-to-pin-down but somewhat reproducible issues when processing large zip files (700 MB) while using a lot of memory. I can reproduce them every time with my core+camlzip processor on one of my huge files, which I am not able to share publicly, but I have been unable to reduce the problem to a smaller, reproducible case (even adding debug output may make the issue go away).

The problem manifests itself as suddenly being unable to unpack the zip file: Zlib.Error "decompression error" is thrown on random files, although the zip file itself is perfectly fine.

What I've been able to pinpoint so far is that it started occurring with zlib commit madler/zlib@b516b4b, in which Mark Adler added some sanity checks.

The exception is thrown in camlzip_inflateEnd, because zlib returns an error.

The reason is that inflateStateCheck checks the stream structure, which contains a "state" substructure holding a back pointer to the stream. The new check verifies that the two stream pointers are equal (madler/zlib@b516b4b#diff-327188edf18799ffbb5a51cc69f797e8R113), and suddenly they no longer are.

Here's my zlib debug info:

# let lines = Zip.read_entry z entry |> String.split ~on:'\n' in ...
inflatestatecheck failed
strm   0x7f7c7cd8f7b0
state  0x2a35070
state->strm 0x7f7c887b5100
state->mode 16203 (distext)
Uncaught exception:

Zip.Error("weather.zip", "wlask.min", "decompression error")

Called from file "src/exn.ml", line 90, characters 6-10

I suppose that the garbage collector, or something similar, sometimes moves things around so that the structure becomes invalid. Any ideas?

(Up-to-date 64-bit Arch Linux, OCaml 4.04.0, everything installed via opam)

file descriptor not closed when exception raised in Zip.open_in function

In zip.ml, replace the definition of open_in with:

    let open_in filename =
      let ic = Pervasives.open_in_bin filename in
      try
        let (cd_entries, cd_size, cd_offset, cd_comment) = read_ecd filename ic in
        let entries =
          read_cd filename ic cd_entries cd_offset (Int32.add cd_offset cd_size) in
        let dir = Hashtbl.create (cd_entries / 3) in
        List.iter (fun e -> Hashtbl.add dir e.filename e) entries;
        { if_filename = filename;
          if_channel = ic;
          if_entries = entries;
          if_directory = dir;
          if_comment = cd_comment }
      with exn ->
        (* Close the channel so the file descriptor is not leaked. *)
        Pervasives.close_in ic;
        raise exn
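
The same idea can be shown in isolation, without any of the zip.ml internals. The following is only a sketch: `open_in_guarded` is a hypothetical name, and it uses the standard library's `close_in_noerr` so that a failure while closing cannot mask the original exception.

```ocaml
(* Hypothetical helper: open [filename] in binary mode and hand the channel
   to [f]. If [f] raises, the channel is closed before the exception is
   re-raised; on success the channel is returned still open, which is what
   a function like Zip.open_in needs. *)
let open_in_guarded (filename : string) (f : in_channel -> 'a)
    : in_channel * 'a =
  let ic = open_in_bin filename in
  match f ic with
  | result -> (ic, result)
  | exception exn ->
      (* Do not leak the file descriptor on the error path. *)
      close_in_noerr ic;
      raise exn
```

On the success path the caller remains responsible for closing the returned channel.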

URL for zlib

The project README points to http://www.gzip.org/ . Wouldn't https://zlib.net/ be a better reference for zlib?

Add support for Zip64 format

Some external consumers (e.g. numpy) expect uncompressed_size to be correct, which is not the case in the current implementation for files larger than 4 GB. It is probably worth considering raising an exception on inputs exceeding 4 GB.
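
A sketch of what such a guard could look like. It assumes that the incorrect value comes from storing the size in a 32-bit field, as classic (non-Zip64) entries do; the names `max_zip32_size`, `check_zip32_size` and `truncated_size` are hypothetical and do not exist in Camlzip.

```ocaml
(* Largest value representable in the 32-bit size fields of a classic
   (non-Zip64) entry. *)
let max_zip32_size = 0xFFFF_FFFFL

(* Hypothetical guard: fail loudly instead of storing a truncated size. *)
let check_zip32_size (size : int64) : unit =
  if Int64.compare size 0L < 0 || Int64.compare size max_zip32_size > 0 then
    invalid_arg "entry too large for a non-Zip64 archive"

(* What a 32-bit field would hold for a given size: the low 32 bits only.
   This illustrates the assumed failure mode, not camlzip's actual code. *)
let truncated_size (size : int64) : int64 =
  Int64.logand size max_zip32_size
```

For example, a size of `0x1_0000_0005L` bytes (just over 4 GiB) would be stored as `5L` under this assumption, which is why a reader that trusts the field is misled.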

ZIP64 support

Camlzip does not support the ZIP64 extensions. We are currently running into the limit of 64k files per archive. Has anyone worked on adding ZIP64 support?

version in META file

Build issue while cross-compiling

Hello,
I'm currently trying to cross-compile your library properly, but I'm facing an issue.

I noticed the following in your README:

- Edit the three variables at the beginning of the Makefile to reflect
  the location where Zlib is installed on your system.  The defaults
  are OK for Linux.

I would really prefer to use environment variables instead of modifying the sources.
That makes it easier to integrate the library into a complete cross-compilation build system.
Could I suggest the following patch to avoid this issue?
Thanks,
Erwan

--- Makefile.ori	2017-11-07 11:41:26.375257045 +0100
+++ Makefile	2017-11-07 11:41:38.719314251 +0100
@@ -5,12 +5,12 @@
 
 # The directory containing the Zlib library (libz.a or libz.so)
 # Leave empty if libz is in a standard linker directory
-ZLIB_LIBDIR=
+ZLIB_LIBDIR?=
 # ZLIB_LIBDIR=/usr/local/lib
 
 # The directory containing the Zlib header file (zlib.h)
 # Leave empty if zlib.h is in a standard compiler directory
-ZLIB_INCLUDE=
+ZLIB_INCLUDE?=
 # ZLIB_INCLUDE=/usr/local/include
 
 # Where to install the library.  By default: sub-directory 'zip' of

Assert failing in zip file with a large number of files

Reading a large zip file that contains a huge number of smaller files fails:

    Assert_failure zip.ml:217:4

    assert((cd_bound = (LargeFile.pos_in ic)) &&
           (cd_entries = 65535 || !entrycnt = cd_entries));

After adding a debug dump, I see that entrycnt is not the same as cd_entries. The dump suggests that in this case cd_entries has been truncated: it is not #ffff but !entrycnt land 0xffff:

cdbound=34a57695, lpos=34a57695, cd_entries=284c, entrycnt=1284c

Suggested fix:

zip.ml
     assert((cd_bound = (LargeFile.pos_in ic)) &&
-           (cd_entries = 65535 || !entrycnt = cd_entries));
+           (cd_entries = 65535 || cd_entries = !entrycnt land 0xffff || !entrycnt = cd_entries));
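
The arithmetic behind the suggested fix can be checked in isolation. This is only a sketch: `counts_agree` is a hypothetical stand-alone function that mirrors the second half of the patched assertion, using the values from the debug dump above.

```ocaml
(* Mirrors the entry-count part of the patched assertion. The stored count
   is accepted if it is the 0xffff marker, if it equals the low 16 bits of
   the number of entries actually read, or if it matches exactly. *)
let counts_agree ~cd_entries ~entrycnt =
  cd_entries = 65535
  || cd_entries = entrycnt land 0xffff
  || entrycnt = cd_entries

let () =
  (* Values from the debug dump: 0x1284c land 0xffff = 0x284c. *)
  assert (0x1284c land 0xffff = 0x284c);
  assert (counts_agree ~cd_entries:0x284c ~entrycnt:0x1284c)
```

Note that for any count that fits in 16 bits the middle clause already covers the exact-match case, so the last clause is kept only to stay close to the proposed diff.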

Explicit flush behavior for out_buffer

Hi,
Is there a way to flush only the internal buffer of an out channel?
Thank you in advance!
