fastzip's Introduction

fastzip

Fastzip is an opinionated Zip archiver and extractor with a focus on speed.

  • Archiving and extraction of files and directories can only occur within a specified directory.
  • Permissions, ownership (uid, gid on linux/unix) and modification times are preserved.
  • Buffers used for copying files are recycled to reduce allocations.
  • Files are archived and extracted concurrently.
  • By default, the excellent github.com/klauspost/compress/flate library is used for compression and decompression.
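
The buffer recycling mentioned in the list above can be illustrated with a minimal sketch built on sync.Pool and io.CopyBuffer. This is not fastzip's actual implementation; the names bufPool, copyPooled and copyString, as well as the 32 KiB buffer size, are hypothetical choices made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool hands out reusable copy buffers. The 32 KiB size is an
// illustrative assumption, not a value taken from fastzip.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 32*1024)
		return &b
	},
}

// copyPooled copies src to dst using a buffer borrowed from bufPool and
// returns the buffer to the pool afterwards.
func copyPooled(dst io.Writer, src io.Reader) (int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	// io.CopyBuffer bypasses the buffer when src implements io.WriterTo
	// or dst implements io.ReaderFrom.
	return io.CopyBuffer(dst, src, *bp)
}

// copyString pushes s through copyPooled and returns what arrived.
func copyString(s string) (string, error) {
	var out bytes.Buffer
	// Wrap both sides so that only Read and Write are visible, which
	// forces io.CopyBuffer to use the pooled buffer.
	_, err := copyPooled(struct{ io.Writer }{&out}, struct{ io.Reader }{strings.NewReader(s)})
	return out.String(), err
}

func main() {
	got, err := copyString("hello fastzip")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Storing a pointer to the slice in the pool avoids an extra allocation each time the buffer is returned with Put.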

Example

Archiver

// Create archive file
w, err := os.Create("archive.zip")
if err != nil {
	panic(err)
}
defer w.Close()

// Create new Archiver. Note that Go does not expand "~";
// substitute a real directory path.
a, err := fastzip.NewArchiver(w, "~/fastzip-archiving")
if err != nil {
	panic(err)
}
defer a.Close()

// Register a non-default level compressor if required
// a.RegisterCompressor(zip.Deflate, fastzip.FlateCompressor(1))

// Walk directory, adding the files we want to add
files := make(map[string]os.FileInfo)
err = filepath.Walk("~/fastzip-archiving", func(pathname string, info os.FileInfo, err error) error {
	// Stop on walk errors; info may be nil when err is set
	if err != nil {
		return err
	}
	files[pathname] = info
	return nil
})
if err != nil {
	panic(err)
}

// Archive
if err = a.Archive(context.Background(), files); err != nil {
	panic(err)
}

Extractor

// Create new extractor
e, err := fastzip.NewExtractor("archive.zip", "~/fastzip-extraction")
if err != nil {
  panic(err)
}
defer e.Close()

// Extract archive files
if err = e.Extract(context.Background()); err != nil {
  panic(err)
}

Benchmarks

Archiving and extracting a Go 1.13 GOROOT directory, 342M, 10308 files.

StandardFlate uses compress/flate and NonStandardFlate uses klauspost/compress/flate, both at level 5. The benchmarks were run on a 24-core server with an SSD. Each test was run with the WithArchiverConcurrency and WithExtractorConcurrency options set to 1, 2, 4, 8 and 16.

$ go test -bench Benchmark* -archivedir go1.13 -benchtime=30s -timeout=20m

goos: linux
goarch: amd64
pkg: github.com/saracen/fastzip
BenchmarkArchiveStore_1-24                            39         788604969 ns/op         421.66 MB/s     9395405 B/op     266271 allocs/op
BenchmarkArchiveStandardFlate_1-24                     2        16154127468 ns/op         20.58 MB/s    12075824 B/op     257251 allocs/op
BenchmarkArchiveStandardFlate_2-24                     4        8686391074 ns/op          38.28 MB/s    15898644 B/op     260757 allocs/op
BenchmarkArchiveStandardFlate_4-24                     7        4391603068 ns/op          75.72 MB/s    19295604 B/op     260871 allocs/op
BenchmarkArchiveStandardFlate_8-24                    14        2291624196 ns/op         145.10 MB/s    21999205 B/op     260970 allocs/op
BenchmarkArchiveStandardFlate_16-24                   16        2105056696 ns/op         157.96 MB/s    29237232 B/op     261225 allocs/op
BenchmarkArchiveNonStandardFlate_1-24                  6        6011250439 ns/op          55.32 MB/s    11070960 B/op     257204 allocs/op
BenchmarkArchiveNonStandardFlate_2-24                  9        3629347294 ns/op          91.62 MB/s    18870130 B/op     262279 allocs/op
BenchmarkArchiveNonStandardFlate_4-24                 18        1766182097 ns/op         188.27 MB/s    22976928 B/op     262349 allocs/op
BenchmarkArchiveNonStandardFlate_8-24                 34        1002516188 ns/op         331.69 MB/s    29860872 B/op     262473 allocs/op
BenchmarkArchiveNonStandardFlate_16-24                46         757112363 ns/op         439.20 MB/s    42036132 B/op     262714 allocs/op
BenchmarkExtractStore_1-24                            20        1625582744 ns/op         202.66 MB/s    22900375 B/op     330528 allocs/op
BenchmarkExtractStore_2-24                            42         786644031 ns/op         418.80 MB/s    22307976 B/op     329272 allocs/op
BenchmarkExtractStore_4-24                            92         384075767 ns/op         857.76 MB/s    22247288 B/op     328667 allocs/op
BenchmarkExtractStore_8-24                           165         215884636 ns/op        1526.02 MB/s    22354996 B/op     328459 allocs/op
BenchmarkExtractStore_16-24                          226         157087517 ns/op        2097.20 MB/s    22258691 B/op     328393 allocs/op
BenchmarkExtractStandardFlate_1-24                     6        5501808448 ns/op          23.47 MB/s    86148462 B/op     495586 allocs/op
BenchmarkExtractStandardFlate_2-24                    13        2748387174 ns/op          46.99 MB/s    84232141 B/op     491343 allocs/op
BenchmarkExtractStandardFlate_4-24                    21        1511063035 ns/op          85.47 MB/s    84998750 B/op     490124 allocs/op
BenchmarkExtractStandardFlate_8-24                    32         995911009 ns/op         129.67 MB/s    86188957 B/op     489574 allocs/op
BenchmarkExtractStandardFlate_16-24                   46         652641882 ns/op         197.88 MB/s    88256113 B/op     489575 allocs/op
BenchmarkExtractNonStandardFlate_1-24                  7        4989810851 ns/op          25.88 MB/s    64552948 B/op     373541 allocs/op
BenchmarkExtractNonStandardFlate_2-24                 13        2478287953 ns/op          52.11 MB/s    63413947 B/op     373183 allocs/op
BenchmarkExtractNonStandardFlate_4-24                 26        1333552250 ns/op          96.84 MB/s    63546389 B/op     373925 allocs/op
BenchmarkExtractNonStandardFlate_8-24                 37         817039739 ns/op         158.06 MB/s    64354655 B/op     375357 allocs/op
BenchmarkExtractNonStandardFlate_16-24                63         566984549 ns/op         227.77 MB/s    65444227 B/op     379664 allocs/op

fastzip's People

Contributors

kumbayo, saracen, streppel

fastzip's Issues

Prevent unnecessary directory in zip?

When using fastzip, by default, there will be a single directory at the top level of the zip, which contains everything.

Is there a way to not have this top level directory?
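
One possible explanation, assuming fastzip names each entry relative to the directory passed to NewArchiver: if that directory is the parent of the folder being walked, every entry keeps the folder name as a prefix. The sketch below only shows how the relative name changes with the chosen root; entryName is a hypothetical helper and not part of fastzip.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// entryName returns the name a file would get inside the archive if
// names are computed relative to root (an assumption about fastzip).
// It returns an empty string when no relative path can be computed.
func entryName(root, file string) string {
	rel, err := filepath.Rel(root, file)
	if err != nil {
		return ""
	}
	return filepath.ToSlash(rel)
}

func main() {
	file := filepath.Join("data", "project", "main.go")

	// Root is the parent directory: the entry keeps the "project" prefix.
	fmt.Println(entryName("data", file))

	// Root is the directory itself: no top-level folder in the name.
	fmt.Println(entryName(filepath.Join("data", "project"), file))
}
```

Under that assumption, passing the source directory itself (rather than its parent) to NewArchiver and to filepath.Walk would avoid the extra top-level directory.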

Zips created on Windows are not valid on Linux due to an invalid separator

Even if Archive is given a files map whose keys use "/" as the separator, the separator is changed to "\" by the system calls in ArchiveWithContext. When "\" is used as the separator, Linux sees an invalid folder structure.

Fix: in ArchiveWithContext change

fileInfoHeader(rel, fi, hdr)

to

fileInfoHeader(strings.ReplaceAll(rel, "\\", "/"), fi, hdr)

or some similar solution using os.PathSeparator
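
The zip format expects forward slashes in entry names, so the conversion proposed above can be sketched as a small standalone helper. zipEntryName is a hypothetical name used for illustration; it is not a fastzip function.

```go
package main

import (
	"fmt"
	"strings"
)

// zipEntryName converts a relative path into the form used for zip
// entry names by replacing backslashes with forward slashes.
// Caveat: on Unix a backslash is a legal filename character, so an
// unconditional replacement also rewrites such names.
func zipEntryName(rel string) string {
	return strings.ReplaceAll(rel, `\`, "/")
}

func main() {
	fmt.Println(zipEntryName(`util\sub\file.txt`)) // prints util/sub/file.txt
}
```

The standard library's filepath.ToSlash is an alternative that replaces only the host operating system's separator, so it converts "\" on Windows and leaves names untouched on Linux.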

zip: not a valid zip file

I copied the README code almost verbatim as far as I can tell, but I'm seeing this error

panic: zip: not a valid zip file

My code:

package main

import (
	"context"
	"os"
	"path/filepath"

	"github.com/saracen/fastzip"
)

func main() {
	sourceDir := "util/"
	zipFile := "test.zip"
	w, err := os.Create(zipFile)
	if err != nil {
		panic(err)
	}
	defer w.Close()

	a, err := fastzip.NewArchiver(w, sourceDir)
	if err != nil {
		panic(err)
	}
	defer a.Close()

	files := make(map[string]os.FileInfo)
	err = filepath.Walk(sourceDir, func(pathname string, info os.FileInfo, err error) error {
		files[pathname] = info
		return nil
	})

	if err = a.Archive(context.Background(), files); err != nil {
		panic(err)
	}

	e, err := fastzip.NewExtractor(zipFile, "./")
	if err != nil {
		panic(err) // PANICS
	}
	defer e.Close()

	if err = e.Extract(context.Background()); err != nil {
		panic(err)
	}
}

When I unzip the file in OSX Finder, it works just fine :-?
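
A likely cause, assuming fastzip's Archiver.Close is what finalizes the archive: in the code above, defer a.Close() only runs when main returns, so NewExtractor opens test.zip before the central directory has been written. Deferred calls still run while the panic unwinds, which would explain why the file is valid afterwards. The sketch below reproduces the effect with the standard library's archive/zip rather than fastzip; readable and demo are hypothetical helper names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// readable reports whether data parses as a complete zip archive.
func readable(data []byte) bool {
	_, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	return err == nil
}

// demo writes one entry and checks readability before and after the
// zip writer is closed.
func demo() (before, after bool, err error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)

	f, err := zw.Create("hello.txt")
	if err != nil {
		return false, false, err
	}
	if _, err = f.Write([]byte("hello")); err != nil {
		return false, false, err
	}
	if err = zw.Flush(); err != nil {
		return false, false, err
	}

	// The central directory has not been written yet.
	before = readable(buf.Bytes())

	// Close writes the central directory and finishes the archive.
	if err = zw.Close(); err != nil {
		return before, false, err
	}
	after = readable(buf.Bytes())
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("readable before Close:", before) // false
	fmt.Println("readable after Close:", after)   // true
}
```

If the assumption holds, calling a.Close() and w.Close() explicitly (and checking their errors) before creating the extractor should resolve the error.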

Decompressed file content is messed up

I'm archiving a folder containing a .git/config file with fastzip on Linux, following the example shown in the README.

When extracting said file, I'm expecting to see:

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
[remote "origin"]
        url = ...
[receive]
        denyNonFastForwards = false
        denyCurrentBranch = ignore
        denyDeleteCurrent = ignore

However, I get:

<..J.@^PE.._1.^CL..Z^H..>.(.%."..$;..Sfg+.{..>^....w,x0...3).yb.VO(.8A^KkSM^T0....JAS^MV.0....^E#+...fJ....^Dh..^^....N.....g.v]7........9....4w^O..5...^X.fR_.^[..^?I\.^Xy.qH..v.t~...Y.,?V.|...]^Q../b...^B....

This happens both with the Go SDK (using NewExtractor) and with the traditional unix tar command.
It is worth mentioning that this only happens with the default Deflate compression method. Using method 0 (Store) works fine (without compression, as expected).

I'd appreciate any help with this issue.

EDIT: Compressing only the file itself (as opposed to the folder containing it) seems to work fine.

add custom root folder during archival?

👋 I'm trying to compress a bunch of files and have everything inside a root folder (its name differs from the source directory).

So given

/tmp/source-dir
   /A.txt
   /B.txt

would create a ZIP file with a structure like this:

/myroot
   /A.txt
   /B.txt

I can't get the myroot root directory working without renaming the original source dir (not an option, unfortunately). I tried using symlinks (seems to compress symlink itself, not follow it?) and changing the file path in the filepath.Walk function (results in "file not found").

Is there a way to do this?

PS: I can't do this on the extract, since I have no control over that part
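
The source does not show a fastzip option for this, so the following is only a sketch of the general technique using the standard library's archive/zip: each entry name is built from a chosen root plus the path relative to the source directory, while the file contents are still read from the original location. zipWithRoot and demoNames are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"path"
	"path/filepath"
	"sort"
)

// zipWithRoot archives the regular files under srcDir, storing each
// one under root/ inside the archive.
func zipWithRoot(w io.Writer, srcDir, root string) error {
	zw := zip.NewWriter(w)
	err := filepath.Walk(srcDir, func(p string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.Mode().IsRegular() {
			return nil // skip directories, symlinks and other special files
		}
		rel, err := filepath.Rel(srcDir, p)
		if err != nil {
			return err
		}
		hdr, err := zip.FileInfoHeader(info)
		if err != nil {
			return err
		}
		// The archive name is independent of the on-disk location.
		hdr.Name = path.Join(root, filepath.ToSlash(rel))
		hdr.Method = zip.Deflate
		dst, err := zw.CreateHeader(hdr)
		if err != nil {
			return err
		}
		src, err := os.Open(p)
		if err != nil {
			return err
		}
		defer src.Close()
		_, err = io.Copy(dst, src)
		return err
	})
	if err != nil {
		zw.Close()
		return err
	}
	return zw.Close()
}

// demoNames archives a temporary directory holding A.txt and B.txt
// under "myroot" and returns the sorted entry names.
func demoNames() ([]string, error) {
	dir, err := os.MkdirTemp("", "source-dir")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for _, n := range []string{"A.txt", "B.txt"} {
		if err := os.WriteFile(filepath.Join(dir, n), []byte(n), 0o644); err != nil {
			return nil, err
		}
	}
	var buf bytes.Buffer
	if err := zipWithRoot(&buf, dir, "myroot"); err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := demoNames()
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [myroot/A.txt myroot/B.txt]
}
```

This sketch trades fastzip's concurrency for control over entry names, so it is a fallback rather than a drop-in replacement.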

Archiver modifies the last modified date of the folder to archive

Hi,

I'm on MS Windows, and when I use the "Archiver" example from the readme.md, the last modified date of the folder being archived is changed to now().

Instead of ~/fastzip-archiving I'm using "R:/tst_zip/zip me", which contains about 2000 files and 300 folders.

Is there a way to avoid this behavior? I know that the last modified date is not known on Linux, but it is on Windows, and it shouldn't be reset just because the folder gets zipped...
