
camlzip's People

Contributors

alainfrisch, balat, bobot, brendanlong, dhouck, dra27, dweil, einars, leamingrad, nojb, rgrinberg, thelema, treinen, xavierleroy


camlzip's Issues

API to work on buffers directly

Sometimes one needs to compress or decompress data that does not come directly from an {in,out}_channel (for example, as part of a bigger pipeline). Right now I can't find a way of doing that.

OPAM Package

Hi,
Your library seems to be a good wrapper for zlib, and it might be interesting to use it for Haxe. The next version of Haxe is planned to integrate with OPAM, OCaml's package manager.

What is the current state of this library with respect to OPAM? It seems that camlzip is on OPAM, but it was published by a third-party author. Do you know them, and is it a reliable source? It would be better if you published the package yourself, since this seems to be the home of the project.

Document/enforce forward slashes in entry names

The spec for the Zip format (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) says:

4.4.17.1 The name of the file, with optional relative path.
The path stored MUST NOT contain a drive or
device letter, or a leading slash. All slashes
MUST be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard
input, there is no file name field.

We just observed that using backslashes can indeed cause issues with some tools when unzipping on Linux (for Amiga, we couldn't check, unfortunately). Such a problem has also been reported e.g. here

We could argue that it's the responsibility of Camlzip users to know about this constraint, but it seems harmless and useful to add a note to mention it in the docstrings of functions taking an entry path argument.

Going a step further, Camlzip could replace \ with / automatically in entry paths (and also fail on leading slashes or drive letters?).
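The automatic variant could look roughly like the sketch below. The names `normalize_entry_name` and `Invalid_entry_name` are hypothetical and not part of Camlzip's API, and the drive-letter test is only a simple heuristic (one ASCII letter followed by a colon).

```ocaml
(* Hypothetical helper, not part of Camlzip: make an entry name comply with
   section 4.4.17.1 of the Zip specification. *)
exception Invalid_entry_name of string

let normalize_entry_name (name : string) : string =
  (* Replace every backslash with a forward slash. *)
  let name = String.map (fun c -> if c = '\\' then '/' else c) name in
  let n = String.length name in
  (* Reject a leading slash. *)
  if n > 0 && name.[0] = '/' then
    raise (Invalid_entry_name "leading slash");
  (* Reject a drive letter: one ASCII letter followed by a colon. *)
  let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') in
  if n >= 2 && is_letter name.[0] && name.[1] = ':' then
    raise (Invalid_entry_name "drive letter");
  name
```

Raising on a leading slash or drive letter, rather than silently stripping it, keeps the caller aware that the stored name would differ from the one requested.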

@xavierleroy : I'm happy to propose a PR implementing either of these variants if you tell me which one you prefer.

For the interested reader, here is some extra context. By default, Win32 restricts path lengths to about 260 characters (MAX_PATH). This restriction can be lifted with some global settings, which cannot be expected on a typical Windows machine. The practical workaround is rather to prepend \\?\ to the path, which lifts the restriction; but if we do that, we have to use backslashes in the path. Forward slashes are normally allowed as well, but not when that prefix is used. This means that Windows applications using Camlzip and supporting long paths will need to juggle between backslashes (for opening the file manually, or passing an input file name to copy_file_to_entry) and forward slashes (for the entry path name).
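As an illustration of the file-system side of that juggling, here is a minimal sketch. `to_extended_path` is a hypothetical helper, not part of Camlzip; it assumes its argument is already an absolute drive-letter path and does not handle UNC paths.

```ocaml
(* Hypothetical helper, not part of Camlzip. Turn an absolute path that may
   contain forward slashes into a Win32 extended-length path: backslashes
   only, with the long-path prefix in front. *)
let to_extended_path (abs_path : string) : string =
  let backslashed =
    String.map (fun c -> if c = '/' then '\\' else c) abs_path in
  "\\\\?\\" ^ backslashed
```

The resulting string would be the one used to open the file on disk, while the entry name stored in the archive keeps its forward slashes.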

Random read_entry "decompression error" exception with zlib 1.2.9+

Since the notorious zlib update, I've been having hard-to-pin-down but somewhat reproducible issues when processing large zip files (700 MB) while using a lot of memory. I can reproduce them every time with my core+camlzip processor on one of my huge files, which I am not able to share publicly, but I have been unable to reduce it to a smaller, reproducible case (even adding debug output may make the issue go away).

The problem manifests itself as suddenly being unable to unpack a zip file: the Zlib.Error "decompression error" gets thrown on random files, whereas the zip file itself is perfectly fine.

What I've been able to pinpoint so far is that it started occurring with zlib commit madler/zlib@b516b4b, in which Mark Adler added some sanity checks.

The exception gets thrown in camlzip_inflateEnd, as zlib returns an error.

The reason for the exception is that inflateStateCheck checks the stream structure, and inside it there's a "state" substructure that has a back-pointer to the stream. The new check verifies that these two stream pointers are actually equal (madler/zlib@b516b4b#diff-327188edf18799ffbb5a51cc69f797e8R113), and suddenly they are not equal anymore.

Here's my zlib debug info:

# let lines = Zip.read_entry z entry |> String.split ~on:'\n' in ...
inflatestatecheck failed
strm   0x7f7c7cd8f7b0
state  0x2a35070
state->strm 0x7f7c887b5100
state->mode 16203 (distext)
Uncaught exception:

Zip.Error("weather.zip", "wlask.min", "decompression error")

Called from file "src/exn.ml", line 90, characters 6-10

I suppose that the garbage collector or something similar sometimes moves things around and the structure becomes invalid. Any ideas?

(Up-to-date 64-bit Arch Linux, OCaml 4.04.0, everything installed via opam.)
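To make the suspected failure mode concrete, here is a toy model in OCaml. It is not zlib's or Camlzip's actual data layout: the types `stream` and `state` and the functions `init`, `state_check` and `relocate` are invented for illustration. It only shows why a back-pointer comparison fails once the stream structure ends up at a different address than the one the state remembers.

```ocaml
(* Toy model, not zlib's real structures: a stream owns a state, and the
   state keeps a back-pointer to the stream it was initialised with. *)
type stream = { mutable avail_in : int; mutable state : state option }
and state = { strm : stream }

let init () : stream =
  let s = { avail_in = 0; state = None } in
  s.state <- Some { strm = s };
  s

(* Analogue of the back-pointer test: physical equality (same block). *)
let state_check (s : stream) : bool =
  match s.state with
  | Some st -> st.strm == s
  | None -> false

(* Simulate relocation: identical contents, but a freshly allocated block. *)
let relocate (s : stream) : stream = { s with avail_in = s.avail_in }
```

In this model `state_check (init ())` holds, while `state_check (relocate (init ()))` does not, because the state still points at the original block.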

file descriptor not closed when exception raised in Zip.open_in function

In zip.ml, replace "open_in" with:

    let open_in filename =
      let ic = Pervasives.open_in_bin filename in
      try
        let (cd_entries, cd_size, cd_offset, cd_comment) = read_ecd filename ic in
        let entries =
          read_cd filename ic cd_entries cd_offset (Int32.add cd_offset cd_size) in
        let dir = Hashtbl.create (cd_entries / 3) in
        List.iter (fun e -> Hashtbl.add dir e.filename e) entries;
        { if_filename = filename;
          if_channel = ic;
          if_entries = entries;
          if_directory = dir;
          if_comment = cd_comment }
      with exn ->
        (* Close the channel before re-raising so the descriptor is not leaked. *)
        Pervasives.close_in ic;
        raise exn
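The same close-on-failure pattern can be isolated in a small, self-contained sketch that uses only the standard library. `open_in_bin_checked` and its `parse` argument are hypothetical names standing in for the header-reading steps of Zip.open_in.

```ocaml
(* Hypothetical helper: open [filename] in binary mode and run [parse] on the
   channel. If [parse] raises, close the channel and re-raise. On success the
   channel is returned still open, because the caller keeps reading from it. *)
let open_in_bin_checked (filename : string) (parse : in_channel -> 'a)
  : in_channel * 'a =
  let ic = open_in_bin filename in
  try
    let result = parse ic in
    (ic, result)
  with exn ->
    (* close_in_noerr never raises, so the original exception is preserved. *)
    close_in_noerr ic;
    raise exn
```

Fun.protect is not used here on purpose: it would also close the channel on success, whereas the opened archive needs the channel to stay open.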

Add support for Zip64 format

Some external consumers (e.g. numpy) expect uncompressed_size to be correct, which is not the case in the current implementation for files larger than 4GB. It is probably worth considering raising an exception on inputs exceeding 4GB.
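A guard of that kind might look like the sketch below. `check_zip32_size`, `max_zip32_size` and `Entry_too_large` are hypothetical names, not part of Camlzip; the only assumption is that the size field of a classic (non-Zip64) header is an unsigned 32-bit integer.

```ocaml
(* Largest value that fits an unsigned 32-bit size field. *)
let max_zip32_size = 0xFFFF_FFFFL

exception Entry_too_large of int64

(* Hypothetical guard: fail loudly instead of storing a truncated size. *)
let check_zip32_size (size : int64) : int32 =
  if Int64.compare size 0L < 0 || Int64.compare size max_zip32_size > 0 then
    raise (Entry_too_large size)
  else
    (* The value fits in 32 bits, so keeping the low 32 bits loses nothing. *)
    Int64.to_int32 size
```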

ZIP64 support

Camlzip does not support ZIP64 extensions. We are currently running into the limit of 64k files. Has anyone worked on adding ZIP64 support?

Build issue while cross-compiling

Hello
I'm currently trying to properly cross compile your library, but I'm facing an issue.

I noticed the following in your README:

- Edit the three variables at the beginning of the Makefile to reflect
  the location where Zlib is installed on your system.  The defaults
  are OK for Linux.

But I would really prefer to use environment variables instead of modifying the sources.
That makes it easier to integrate the library into a complete cross-compilation build system.
May I suggest the following patch to avoid this issue?
Thanks
Erwan

--- Makefile.ori	2017-11-07 11:41:26.375257045 +0100
+++ Makefile	2017-11-07 11:41:38.719314251 +0100
@@ -5,12 +5,12 @@
 
 # The directory containing the Zlib library (libz.a or libz.so)
 # Leave empty if libz is in a standard linker directory
-ZLIB_LIBDIR=
+ZLIB_LIBDIR?=
 # ZLIB_LIBDIR=/usr/local/lib
 
 # The directory containing the Zlib header file (zlib.h)
 # Leave empty if zlib.h is in a standard compiler directory
-ZLIB_INCLUDE=
+ZLIB_INCLUDE?=
 # ZLIB_INCLUDE=/usr/local/include
 
 # Where to install the library.  By default: sub-directory 'zip' of

Assert failing in zip file with a large number of files

Reading a large zip file that contains a huge number of smaller files fails:

    Assert_failure zip.ml:217:4

    assert((cd_bound = (LargeFile.pos_in ic)) &&
           (cd_entries = 65535 || !entrycnt = cd_entries));

After adding a debug dump, I see that entrycnt is not the same as cd_entries. The dump suggests that in this case cd_entries is truncated: it is not 0xffff, but the low 16 bits of !entrycnt (!entrycnt land 0xffff):

cdbound=34a57695, lpos=34a57695, cd_entries=284c, entrycnt=1284c

Suggested fix:

zip.ml
     assert((cd_bound = (LargeFile.pos_in ic)) &&
-           (cd_entries = 65535 || !entrycnt = cd_entries));
+           (cd_entries = 65535 || cd_entries = !entrycnt land 0xffff || !entrycnt = cd_entries));
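The relaxed condition can be checked in isolation against the values from the debug dump. The function below is a stand-alone sketch, not code from zip.ml; `entry_count_consistent` is a hypothetical name, and it takes the counted number of entries as a plain int rather than a reference.

```ocaml
(* Stand-alone model of the relaxed check: the 16-bit count stored in the
   archive may be the true entry count truncated to its low 16 bits. *)
let entry_count_consistent ~(cd_entries : int) ~(entrycnt : int) : bool =
  cd_entries = 65535
  || cd_entries = entrycnt land 0xffff
```

With the dumped values, `entry_count_consistent ~cd_entries:0x284c ~entrycnt:0x1284c` holds. The third disjunct of the suggested fix (`!entrycnt = cd_entries`) is implied by the masked comparison whenever cd_entries fits in 16 bits, so it is omitted here.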
