crabz's People

Contributors

shnatsel, sstadick

crabz's Issues

Do not panic when piping the result and closing the pipe with -f bgzf.

$ crabz -p 4 -d -f bgzf test.fastq.gz | head
[2021-09-07T16:57:14Z INFO  crabz] Decompressing (bgzf) with 4 threads available.
@NB501171:702:H7Y55BGXH:4:11401:19233:1053 2:N:0:CACCGCACCA+ANTGACAGTC
NNNNNNTNAAAAATGCCCTAGCCCCCTTCAGAANACAAGGCAAA
+
######/#EEAEEEE/EE//E/E////EA////#AEE/E/E/E<
@NB501171:702:H7Y55BGXH:4:11401:23376:1053 2:N:0:CACCGCACCA+ANTGACAGTC
NNNNNNTNTTATGTAACTAATGCATCTTGCCCTNATCTCTTTGC
+
######E#EEEEEEEEEEEEEE<E/AEEAE/EA#EEEEEEEE/E
@NB501171:702:H7Y55BGXH:4:11401:8502:1053 2:N:0:CACCGCACCA+AATGACAGTC
NNNNNNTNGAAGGCAGACTGCATGGCTTAATTTNAAAAATCATT
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', ~/.cargo/registry/src/github.com-1ecc6299db9ec823/gzp-0.8.0/src/par/decompress.rs:158:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
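
The trace above points at an `unwrap()` on a failed channel send, which presumably happens once `head` exits and the consuming side goes away. A minimal sketch of the non-panicking alternative follows. It uses `std::sync::mpsc` as a stand-in for whatever channel gzp uses internally, and the names `send_chunks` and `read_first` are hypothetical, not part of gzp or crabz.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Sends chunks to the consumer until the data runs out or the receiving
/// side has been dropped. Returns how many chunks were delivered.
fn send_chunks(tx: SyncSender<Vec<u8>>, chunks: Vec<Vec<u8>>) -> usize {
    let mut sent = 0;
    for chunk in chunks {
        // `send` fails only when the receiver is gone, for example because
        // the downstream reader exited. Stop quietly instead of calling
        // `unwrap()` on the error.
        if tx.send(chunk).is_err() {
            break;
        }
        sent += 1;
    }
    sent
}

/// Runs a worker thread and reads at most `limit` chunks, then hangs up,
/// which mimics `head` closing the pipe early.
fn read_first(chunks: Vec<Vec<u8>>, limit: usize) -> (Vec<Vec<u8>>, usize) {
    let (tx, rx) = sync_channel(1);
    let worker = thread::spawn(move || send_chunks(tx, chunks));
    let received: Vec<Vec<u8>> = rx.iter().take(limit).collect();
    drop(rx); // the consumer goes away while the worker may still be sending
    let sent = worker.join().expect("worker should not panic");
    (received, sent)
}
```

With this shape the worker thread ends normally when the reader disappears, and the caller can decide whether an early hang-up deserves a message or a silent exit.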

Regular gzipped file created instead of a bgzf one, when 1 thread is requested.

crabz -p 1 -f bgzf -o test.csv.bgzf_threads_1.gz test.csv
crabz -p 2 -f bgzf -o test.csv.bgzf_threads_2.gz test.csv

$ file test.csv.bgzf_threads_1.gz test.csv.bgzf_threads_2.gz
test.csv.bgzf_threads_1.gz:                                    gzip compressed data
test.csv.bgzf_threads_2.gz:                                    gzip compressed data, extra field

# Try to decompress the files
$ crabz -p 1 -d -f bgzf test.csv.bgzf_threads_1.gz | wc -l
[2021-09-14T12:22:32Z INFO  crabz] Decompressing (bgzf) with 1 threads available.
Error: Invalid block header: Extra field flag not set
0

$ crabz -p 1 -d -f bgzf test.csv.bgzf_threads_2.gz | wc -l
[2021-09-14T12:23:18Z INFO  crabz] Decompressing (bgzf) with 1 threads available.
100000000
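
The difference `file` reports ("extra field") is the FEXTRA bit in the gzip header, which BGZF requires because each block stores its size in a `BC` extra subfield. As a rough illustration, the check below inspects the first bytes of a file. The function name `is_bgzf_header` is hypothetical and this is not how crabz or gzp validate blocks; it only mirrors the header layout from RFC 1952 and the BGZF section of the SAM specification.

```rust
/// Returns true if `buf` starts with a gzip member header that carries the
/// BGZF "BC" extra subfield.
fn is_bgzf_header(buf: &[u8]) -> bool {
    const FEXTRA: u8 = 0x04;
    // Fixed gzip header: ID1, ID2, CM, FLG, MTIME (4), XFL, OS, then XLEN (2).
    if buf.len() < 12 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8 {
        return false;
    }
    if (buf[3] & FEXTRA) == 0 {
        // Plain gzip: this is the "Extra field flag not set" situation.
        return false;
    }
    let xlen = u16::from_le_bytes([buf[10], buf[11]]) as usize;
    let extra = match buf.get(12..12 + xlen) {
        Some(extra) => extra,
        None => return false, // header is truncated
    };
    // Walk the subfields: SI1, SI2, SLEN (2 bytes, little endian), payload.
    let mut pos = 0;
    while pos + 4 <= extra.len() {
        let slen = u16::from_le_bytes([extra[pos + 2], extra[pos + 3]]) as usize;
        if extra[pos] == b'B' && extra[pos + 1] == b'C' && slen == 2 {
            return pos + 4 + slen <= extra.len();
        }
        pos += 4 + slen;
    }
    false
}
```

Running a check like this on the first 18 bytes of each output file would separate the `-p 1` result from the `-p 2` result without invoking a decompressor.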

Feature request: add --quiet flag

I wanted to pipe the output from crabz to another program, and noticed it always adds a header to the output.

It would be nice for crabz to detect if it's not running interactively, or at least have the --quiet flag.
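
One possible shape for this, sketched with the standard library's `IsTerminal` trait (Rust 1.70 or newer): print the informational line only when no quiet flag was given and the output is attached to a terminal. The function names and the `quiet` parameter are hypothetical; crabz's actual logging setup may look different.

```rust
use std::io::IsTerminal;

/// Decides whether the informational log line should be printed.
/// `quiet` stands for a hypothetical --quiet flag.
fn should_log_info(quiet: bool, output_is_terminal: bool) -> bool {
    !quiet && output_is_terminal
}

/// Convenience wrapper that checks the real stdout of the process.
fn should_log_info_now(quiet: bool) -> bool {
    should_log_info(quiet, std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes it easy to test, while the wrapper is the only place that touches the process's real stdout.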

gzip should be handled by ParDecompress

When doing parallel compression of a gzip file, crabz uses ZBuilder which instantiates a ParCompress when num_threads > 1.

However, decompression always uses single-threaded MultiGzDecoder. Why is it not using ParDecompress when num_threads > 1?

crabz failed to build on CentOS 7

I had a build error on CentOS 7 because the default cmake package is an older version (cmake 2). The steps below got crabz built and installed successfully on CentOS 7.

  1. Install cmake3
dnf install cmake3
alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake 10 --slave /usr/local/bin/ctest ctest /usr/bin/ctest --slave /usr/local/bin/cpack cpack /usr/bin/cpack --slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake --family cmake
alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake3 20 --slave /usr/local/bin/ctest ctest /usr/bin/ctest3 --slave /usr/local/bin/cpack cpack /usr/bin/cpack3 --slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake3 --family cmake
  2. Build crabz
cargo install crabz --force
  3. Revert to the default CentOS 7 cmake 2
alternatives --config cmake

Does not compile with Rust-only DEFLATE

I have tried to install crabz with Rust-only compressor implementations using the following command:

cargo install crabz --no-default-features --features=snap_default,deflate_rust

It fails to compile with the following error:

error[E0599]: no variant or associated item named `Zlib` found for enum `Format` in the current scope
   --> /home/shnatsel/.cargo/registry/src/github.com-1ecc6299db9ec823/crabz-0.7.2/src/main.rs:281:21
    |
170 | enum Format {
    | ----------- variant or associated item `Zlib` not found here
...
281 |             Format::Zlib => ("zz", string_set!["zz", "z", "gz"]),
    |                     ^^^^ variant or associated item not found in `Format`

For more information about this error, try `rustc --explain E0599`.

Compression level 1 error: "gzip: stdin: invalid compressed data--format violated" and "tar: Unexpected EOF in archive"

I created a tgz with crabz using a specific compression level and thread count:

git clone https://github.com/sstadick/crabz.git
mv crabz blah
time tar cf - blah/ | crabz --compression-level=1 --compression-threads=6 > ./blah.tgz
mkdir testcrabz
cd testcrabz/
mv ../blah.tgz .
tar zxf ./blah.tgz 

I expected success when extracting a tgz created with crabz.
Instead, I got the following error output:

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

NOTE: This error only appears when extracting with tar an archive that crabz created at compression level 1.
tar extracts as expected when crabz uses compression level 0 or levels 2 to 9.

Performance Tracking

@Shnatsel I'm moving performance tracking for crabz-related things here.

Run the same benchmarks as found here: https://github.com/zlib-ng/pigzbench with the different backends.

The zlib-ng benchmarks pretty clearly indicate that zlib-ng is the way to go as a backend, which matches what I see in my own benchmarks. The zlib and Rust backends for flate2 perform about the same.

Fails to build without the snap feature enabled

cargo auditable install crabz --no-default-features --features=deflate_rust results in the following error:

error[E0599]: no variant or associated item named `Snap` found for enum `Format` in the current scope
   --> /home/shnatsel/.cargo/registry/src/github.com-1ecc6299db9ec823/crabz-0.8.1/src/main.rs:264:21
    |
150 | enum Format {
    | ----------- variant or associated item `Snap` not found for this enum
...
264 |             Format::Snap => ("sz", string_set!["sz", "snappy"]),
    |                     ^^^^ variant or associated item not found in `Format`

For more information about this error, try `rustc --explain E0599`.
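
Both E0599 reports in this list have the same shape: a `Format` variant is compiled out by a feature flag while a `match` arm still names it. A minimal sketch of one way to keep the two in sync is to gate the arm with the same `cfg` as the variant. The feature name `snap` and the `extension` function here are hypothetical and are not copied from crabz's `main.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Gzip,
    // Hypothetical feature name; the variant exists only when it is enabled.
    #[cfg(feature = "snap")]
    Snap,
}

/// Maps a format to its preferred file extension.
fn extension(format: Format) -> &'static str {
    match format {
        Format::Gzip => "gz",
        // The arm carries the same cfg as the variant, so the match still
        // compiles and stays exhaustive when the feature is turned off.
        #[cfg(feature = "snap")]
        Format::Snap => "sz",
    }
}
```

Building the crate in CI once per feature combination (for example with `--no-default-features` plus each backend) is a common way to catch this class of error before release.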

Use ISA-L as a backend for decompression and the lowest compression levels

Have you tried https://github.com/intel/isa-l?

It can be installed through conda and then you get igzip. It is awesome.

  • Fastest decompression. (2x zlib)
  • Very fast level 0 (extra level they added), 1, 2 and 3. Level 1 is 5(!!!) times faster than zlib while compressing better.

ISA-L has library functions that can be called. Since crabz already chooses zlib-ng over zlib dynamically, adding ISA-L should be possible. The best mix in my opinion is:

  • decompression: ISA-L
  • compression level 1 and 2: ISA-L
  • compression level 3 and higher: zlib-ng or libdeflate (depending on format).
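
The proposed mix can be written down as a small selection function. This is only a sketch of the suggestion above: the types, the `pick_backend` name, and the `libdeflate_for_format` parameter are hypothetical, and treating level 0 like levels 1 and 2 is an assumption, since the list does not assign it.

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    IsaL,
    ZlibNg,
    Libdeflate,
}

#[derive(Debug, Clone, Copy)]
enum Operation {
    Decompress,
    Compress { level: u32 },
}

/// Picks a backend following the mix proposed in this issue.
/// `libdeflate_for_format` is a hypothetical switch standing in for the
/// "depending on format" part of the proposal.
fn pick_backend(op: Operation, libdeflate_for_format: bool) -> Backend {
    match op {
        Operation::Decompress => Backend::IsaL,
        // Assumption: level 0 is grouped with levels 1 and 2.
        Operation::Compress { level } if level < 3 => Backend::IsaL,
        Operation::Compress { .. } if libdeflate_for_format => Backend::Libdeflate,
        Operation::Compress { .. } => Backend::ZlibNg,
    }
}
```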

Thanks for the crabz project and have a nice day!

Feature request: add --no-name flag

By default the gzip header stores a NUL-terminated filename and a timestamp. However, including these makes the output non-reproducible for the same content.

Therefore pigz, gzip and igzip all feature a --no-name flag in order to not include the filename and set the timestamp in the gzip header to 0.

It would be great if crabz could have the same flag so that it can be included in xopen (https://github.com/pycompression/xopen). This library speeds up compression in Python by piping through external programs and, as of a few releases ago, always creates reproducible output by default.
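
To make the effect of such a flag concrete, here is a small sketch that builds only the header of a gzip member as laid out in RFC 1952. The function `gzip_header` is hypothetical and is not crabz or flate2 code; it assumes an ASCII file name and omits the other optional fields.

```rust
/// Builds the header of a gzip member (RFC 1952) for deflate data.
/// Passing `None` and an mtime of 0 corresponds to what --no-name does.
fn gzip_header(file_name: Option<&str>, mtime: u32) -> Vec<u8> {
    const FNAME: u8 = 0x08;
    let flags: u8 = if file_name.is_some() { FNAME } else { 0 };
    // ID1, ID2, CM = 8 (deflate), FLG
    let mut header = vec![0x1f, 0x8b, 8, flags];
    header.extend_from_slice(&mtime.to_le_bytes()); // MTIME, little endian
    header.push(0); // XFL: no extra flags
    header.push(255); // OS: unknown
    if let Some(name) = file_name {
        header.extend_from_slice(name.as_bytes());
        header.push(0); // the name is zero-terminated
    }
    header
}
```

With no name and a zero timestamp the header is always the same ten bytes, so identical input compressed with identical settings yields identical output.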
