
parallel-gzip's Introduction

For the curious, gzip-src contains our initial modifications using OpenMP. The real meat is in pgzip-src, which contains a parallel implementation of gzip using pthreads. The rest of this file describes the contents of the pgzip-src folder.

To try out the suite, navigate to parallel-gzip/pgzip-src and run ./test_suite.sh. This gets you all the binaries and tests them on a set of files. The timings and other info are written to tso.txt.

By default the test suite only covers file sizes of 20, 100, and 250 MB. To test 500, 750, 1000, or 2000 MB (or any other size), comment out or add the corresponding lines in test_suite.sh.

Last modified: 12/2/11.


----------------------------------------------------------------------------

This is a set of implementations, both sequential and parallel, of the gzip compression program.
There are four folders: gzip, pigz, quickzip, and pgzip. The folder gzip holds the standard implementation
of gzip found at www.gzip.org. Its init_script.sh downloads the source, builds it, and copies the
executable into the directory. The folder pigz holds the standard implementation of parallel gzip found
at www.zlib.net/pigz/. Its init_script.sh likewise downloads the source, builds it, and copies the
executable into the directory. These are the two standard implementations most people use.

The next folder, quickzip, implements pseudo gzip parallelism: it splits a file into chunks (with the unix
split command) and compresses each chunk in a background process. We could join the results into one stream by
stripping the trailing CRC and length fields from each compressed chunk, concatenating the compressed files, and
appending a single CRC field and a single length field. We would also need to remove the end-of-block codes of every
separate chunk except the last one to comply with the gzip file format. However, we do not join the chunks in this
manner, because we wanted a clean implementation and did not want to modify the end-of-chunk codes, which would have
to be done in the gzip source (see line 653 of deflate.c in the gzip source). That is why we call it pseudo gzip:
it is technically not RFC gzip, but it does the same job.
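
The split-and-compress idea can be sketched in a few lines of Python. This is an illustration only, not the quickzip scripts themselves: the function names are hypothetical, the data is held in memory rather than split on disk, and a thread pool stands in for the background processes. As in quickzip, the compressed chunks are kept separate rather than joined into one stream.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def split_bytes(data, chunk_size):
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunks(data, chunk_size=1 << 20, workers=4):
    """Compress every chunk independently; each result is a complete gzip file.

    Threads are an assumption made for this sketch (quickzip uses
    background processes). pool.map returns results in input order.
    """
    chunks = split_bytes(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip.compress, chunks))


def restore(parts):
    """Decompress each chunk on its own and glue the plain data back together."""
    return b"".join(gzip.decompress(part) for part in parts)
```

Because every chunk carries its own gzip header and trailer, restoring the original means decompressing the pieces one by one and concatenating the plain output.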

The next folder, pgzip, implements true gzip parallelism. It follows an approach similar to pigz, but absolutely no
source code is borrowed from, or taken as inspiration from, that project. It is developed purely from the gzip source
files. It creates a thread pool, sends chunks of data to free threads, then collects the finished work and
flushes it to the output file in the appropriate order. For the time being it only supports compression.
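
As a rough illustration of the thread-pool approach (not pgzip's actual C code), the Python sketch below compresses chunks in worker threads and writes them out in order as one gzip stream. The function names, chunk size, and worker count are assumptions made for the example. It relies on zlib's raw deflate mode: every chunk except the last ends with a sync flush so the pieces can be concatenated, and a single header and a single CRC/length trailer wrap the whole stream.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def _deflate_chunk(job):
    """Compress one chunk as raw deflate data (no gzip or zlib wrapper)."""
    chunk, is_last = job
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # negative wbits = raw deflate
    body = comp.compress(chunk)
    # Only the last chunk ends the deflate stream; the others end on a
    # byte boundary so the next chunk's output can follow directly.
    return body + comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)


def parallel_gzip(data, chunk_size=128 * 1024, workers=4):
    """Return data compressed as a single-member gzip stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit a valid, empty deflate stream
    jobs = [(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(_deflate_chunk, jobs))  # results keep input order
    # 10-byte header: magic, deflate method, no flags, mtime 0, XFL 0, OS unknown.
    header = b"\x1f\x8b\x08\x00" + struct.pack("<I", 0) + b"\x00\xff"
    # Trailer: CRC-32 and length (mod 2**32) of the uncompressed data.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)
    return header + b"".join(pieces) + trailer
```

The output can be checked with Python's gzip.decompress or with the standard gzip tool. Compressing each chunk with a fresh compressor costs some compression ratio, since a chunk cannot refer back to data in earlier chunks.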

----------------------------------------------------------------------------

Build Details:

    gzip: simply go to directory gzip and run init_script.sh
    pigz: simply go to directory pigz and run init_script.sh
    pgzip: simply go to directory pgzip and run make
    quickzip: requires gzip and python to be installed, but otherwise you're set

----------------------------------------------------------------------------

Contributors:

    https://github.com/a-wild-tigger
    https://github.com/SeanHogan
    https://github.com/norseboar

----------------------------------------------------------------------------
