flanglet / kanzi

Fast lossless data compression in Java

License: Apache License 2.0

Java 100.00%

compression java lossless-data-compression lz77 bwt huffman multithreading decompression

kanzi's Issues

BWTBlockCodec requires 1Gb+ of memory regardless of input or options

Running Kanzi on a 12 KB text file with the default codec allocates a buffer of at least 1 GB. See BWTBlockCodec.java:
@Override
public int getMaxEncodedLength(int srcLen)
{
    return srcLen + BWT_MAX_HEADER_SIZE + BWT.maxBlockSize();
}
Here BWT.maxBlockSize() returns a static final constant equal to 1 GB.
Is that amount of memory really required? The compressed stream should normally be of the same order of size as the uncompressed stream. Maybe there should be a Math.min() instead of a sum.
With Java 8 on my computer and the default memory options, the process fails with an out-of-memory exception. I need at least -Xmx4g to make it work.
Thanks
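
To make the suggestion concrete, here is a minimal sketch of the two bounds side by side. It is not the project's code: the class name, the header size of 4 bytes, and the `proposed` method are assumptions for illustration; only the shape of the current formula and the 1 GB constant come from the report above.

```java
// Hypothetical stand-in for the bound discussed in this issue.
final class MaxEncodedLengthSketch {
    // Assumption: placeholder value, the real header size may differ.
    static final int BWT_MAX_HEADER_SIZE = 4;
    // 1 GB, as reported for BWT.maxBlockSize().
    static final int MAX_BLOCK_SIZE = 1 << 30;

    // Bound as quoted in the issue: always at least 1 GB.
    public static int current(int srcLen) {
        return srcLen + BWT_MAX_HEADER_SIZE + MAX_BLOCK_SIZE;
    }

    // One possible reading of the Math.min() suggestion: the bound
    // follows the input size and is capped by the maximum block size.
    public static int proposed(int srcLen) {
        return Math.min(srcLen, MAX_BLOCK_SIZE) + BWT_MAX_HEADER_SIZE;
    }
}
```

With a 12 KB input, `current` asks for a little over 1 GB while `proposed` asks for 12 KB plus the header. Whether the transform can actually work within the smaller bound is something only the codec's implementation can confirm.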

extra space

The source has a lot of extra whitespace and line breaks.

Help with an example on image java!

Hi Kanzi, it seems like your code is very good and useful. However, I'm trying to use it on an image in Java and I'm having a lot of problems.
Is there an example of how to use it on an image, from reading to writing?
I would appreciate it a lot, thanks!!!
;)
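
As a rough illustration of the read, compress, write and read back flow asked about here, the sketch below uses the JDK's GZIP streams as a stand-in. It does not use Kanzi's API: the class and method names are hypothetical, and Kanzi's own stream classes would take the place of the GZIP wrappers. An image is handled as plain bytes, for example obtained with `Files.readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: round-trips raw bytes through a compressing stream.
final class StreamCopySketch {
    // Copies everything from 'in' to 'out' using a 32 KB buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[32768];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }

    // Compresses 'raw'; GZIP is a stand-in for any compressing OutputStream.
    public static byte[] compress(byte[] raw) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // Closing the wrapper flushes the compressed trailer into 'sink'.
            try (OutputStream out = new GZIPOutputStream(sink)) {
                copy(new ByteArrayInputStream(raw), out);
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the original bytes from the compressed form.
    public static byte[] decompress(byte[] packed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            copy(in, sink);
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important detail is closing the compressing stream before reading the result; otherwise the tail of the compressed data may never be written.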

no main manifest attribute, in kanzi-x.x.x.jar

Hi, I apologize in advance: I'm not good enough to compile this project myself and I'm not even a programmer, so I always ask my friend to compile it.
With every new version he compiles, I get this error when running it:
no main manifest attribute, in kanzi-x.x.x.jar.
My friend says it's because this line needs to be added to META-INF\MANIFEST.MF:
Main-Class: kanzi.app.Kanzi, which must be terminated by a newline.
I always have to unzip kanzi-x.x.x.jar, add this line, and zip the file back again.
Is it possible to fix this in the source code so I don't have to do this?
The second thing is that my friend said the latest version, 2.1, reports itself as 2.0.0, so the version string is probably not updated somewhere in the source code.
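
For reference, the manifest entry described above can be checked or produced with the JDK's `java.util.jar` classes. The sketch below is an illustration only, with a hypothetical class name; it builds an in-memory jar whose manifest carries a `Main-Class` attribute and reads the attribute back. It says nothing about how the project's build scripts are organized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper showing where the Main-Class attribute lives.
final class ManifestSketch {
    // Builds a jar (manifest only, no classes) with the given Main-Class.
    public static byte[] jarWithMainClass(String mainClass) {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        // Manifest-Version must be set, otherwise the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, mf)) {
            // No entries are added; the manifest is written by the constructor.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the Main-Class attribute, or null if the jar has none.
    public static String readMainClass(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest mf = jar.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A jar whose manifest returns null here is one that `java -jar` rejects with the "no main manifest attribute" error.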

"index out of range" error in Go BWT Forward transform

Kanzi --compress --input=bwt2.bin --entropy=none --transform=bwt --force

Kanzi 1.1 (C) 2017, Frederic Langlet
Encoding ...
panic: runtime error: index out of range

goroutine 5 [running]:
kanzi/transform.(*BWT).Forward(0xc42000e900, 0xc4201f2000, 0xcccc, 0xcccc, 0xc420200003, 0xcccd, 0xcccd, 0x3, 0xc420042c80, 0x40cd6d, ...)
/home/user/go/src/kanzi/transform/BWT.go:146 +0x249
kanzi/function.(*BWTBlockCodec).Forward(0xc42000c0a8, 0xc4201f2000, 0xcccc, 0xcccc, 0xc420200000, 0xccd0, 0xccd0, 0x100, 0x0, 0x101000000000000, ...)
/home/user/go/src/kanzi/function/BWTBlockCodec.go:73 +0x110
kanzi/function.(*ByteTransformSequence).Forward(0xc42000ad20, 0xc4201f2000, 0xcccc, 0xcccc, 0xc420200000, 0xccd0, 0xccd0, 0x0, 0x0, 0x0, ...)
/home/user/go/src/kanzi/function/ByteTransformSequence.go:85 +0x1eb
kanzi/io.(*EncodingTask).encode(0xc4200d82a0)
/home/user/go/src/kanzi/io/CompressedStream.go:466 +0xb97
created by kanzi/io.(*CompressedOutputStream).processBlock
/home/user/go/src/kanzi/io/CompressedStream.go:391 +0x39d

bwt2.bin.gz

Documentation: LZ4 release version?

Which LZ4 release is the code based on? Sorry if I missed it in the code, but I can't seem to find it.

Is the latest r129?

Performance X Size Compress

I have tested the Kanzi package with an image and it is fast; you really have done a great job.
But the gz output is faster to produce and more compressed than the snp output, by approximately 18 KB. Is it possible to reduce the size of the snp format? Or is that not the objective of the snp format?
In my test the buffer size used for .gz is the same as the one used for the snp format: 32768.

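// 'in' is an InputStream over the input file; its declaration was not included in the report.
byte[] buf = new byte[32768];
try (GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(outfile))) {
    int len;
    while ((len = in.read(buf)) > 0) {
        out.write(buf, 0, len);
    }
}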

I replaced CompressedOutputStream("None", "Snappy", out) with CompressedOutputStream(out) from emory-util-io.jar version 2.1, and Snappy is now faster than .gz, but the snp output is still bigger, by approximately 7 KB.

Maven dependency: edu.emory.mathcs.util : emory-util-io : 2.1

"index out of range" in BWT transform

Running kanzi on a truncated "bib" file from the Calgary test suite generates an error message.

./Kanzi -compress -input=bib-truncated -output=bib.kanzi -transform=bwt -entropy=none -overwrite

Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ...
panic: runtime error: index out of range

goroutine 5 [running]:
kanzi/transform.(*DivSufSort).ssMultiKeyIntroSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x2)
/home/user/go/src/kanzi/transform/DivSufSort.go:1261 +0x530
kanzi/transform.(*DivSufSort).ssSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x522d, 0x6046, 0x2, 0x104a0, 0x0)
/home/user/go/src/kanzi/transform/DivSufSort.go:452 +0x2ac
kanzi/transform.(*DivSufSort).sortTypeBstar(0xc420072980, 0xc42009f800, 0x100, 0x100, 0xc42020a000, 0x10000, 0x10000, 0x104a0, 0xc420072980)
/home/user/go/src/kanzi/transform/DivSufSort.go:280 +0x8c8
kanzi/transform.(*DivSufSort).ComputeSuffixArray(0xc420072980, 0xc4201e6000, 0x104a0, 0x104a0, 0x104a1, 0x104a1, 0x1b600)
/home/user/go/src/kanzi/transform/DivSufSort.go:112 +0xdb
kanzi/transform.(*BWT).Forward(0xc42006a8a0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8003, 0x104a1, 0x104a1, 0x0, 0xc420040c80, 0x40cd6d, ...)
/home/user/go/src/kanzi/transform/BWT.go:137 +0x12e
kanzi/function.(*BWTBlockCodec).Forward(0xc42000c0b0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x4396bb, 0x10, 0x10100000053d300, ...)
/home/user/go/src/kanzi/function/BWTBlockCodec.go:73 +0x110
kanzi/function.(*ByteTransformSequence).Forward(0xc42000ade0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x0, 0x0, 0x0, ...)
/home/user/go/src/kanzi/function/ByteTransformSequence.go:85 +0x1eb
kanzi/io.(*EncodingTask).encode(0xc4200cc380)
/home/user/go/src/kanzi/io/CompressedStream.go:466 +0xb97
created by kanzi/io.(*CompressedOutputStream).processBlock
/home/user/go/src/kanzi/io/CompressedStream.go:391 +0x39d

bib.zip

Kanzi not available on Maven Central?

Hello! I've got a Clojure project that I'd like to integrate Kanzi into. It uses Leiningen to manage dependencies, which requires it to pull from either a Maven repository (for Java projects, Maven Central is the default) or a Clojure one (for Clojure projects, Clojars is the default).

Is there a plan to get artifacts for Kanzi to be deployed into Maven Central?

when I tried to use the same algorithm on PDF and DOCX formats, the effect was not ideal.

For test files in text format, I tried the TPAQ codec with the transform X86+RLT+TEXT, which worked well. However, when I tried the same settings on PDF and DOCX files, the results were not ideal. How should I set the codec and transform for these formats?

HashMap<String, Object> ctx = new HashMap<>();
ctx.put("transform", "X86+RLT+TEXT");
ctx.put("codec", "TPAQ");
ctx.put("blockSize", 1024 * 1024);
ctx.put("checksum", false);
ctx.put("pool", pool); // not necessary if jobs = 1
ctx.put("jobs", 4);

Go code (benchmarks in particular) are not go-gettable.

It would be nice if this

go get -d -t -v github.com/flanglet/kanzi/go/src/kanzi/benchmark

obtained the Kanzi benchmarks and all their dependencies.
However, this is the result instead:

There was an error running 'go get', stderr = github.com/flanglet/kanzi (download)
package kanzi/bitstream: unrecognized import path "kanzi/bitstream" (import path does not begin with hostname)
package kanzi/entropy: unrecognized import path "kanzi/entropy" (import path does not begin with hostname)
package kanzi/function: unrecognized import path "kanzi/function" (import path does not begin with hostname)
package kanzi/io: unrecognized import path "kanzi/io" (import path does not begin with hostname)
package kanzi/transform: unrecognized import path "kanzi/transform" (import path does not begin with hostname)

I realize this may not be fixable; it is, however, an issue, and it prevents inclusion of the Kanzi benchmarks in a suite of benchmarks that is run somewhat more automatically. (See https://github.com/dr2chase/bent ).

Go BWT transform freezes on some input

Trying a BWT transform on a block from the abba file (from the gauntlet corpus) freezes.
abba-2.gz

./Kanzi -compress -input=abba-2 -transform=bwt -entropy=none -verbose=4 -overwrite -block=32000000

Kanzi 1.0 (C) 2017, Frederic Langlet
Input file name set to 'abba-2'
Output file name set to 'abba-2.knz'
Block size set to 32000000 bytes
Verbosity set to 4
Overwrite set to true
Checksum set to false
Using BWT transform (stage 1)
Using no entropy codec (stage 2)
Using 1 job
Encoding ...
{ "type":"BEFORE_TRANSFORM", "id":1, "size":3150180, "time":1496668411814}
^C

"index out of range" in Go BWT transform

Running the kanzi BWT transform on a block of the "tra1" file from the Calgary test suite generates an error message in the Go implementation (not in the C++ one).
Thanks for all your work !

./app -compress -input=tra1-truncated -output=tra1-cpp -transform=bwt -entropy=none -overwrite
Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ...
panic: runtime error: index out of range

goroutine 5 [running]:
kanzi/transform.(*DivSufSort).ssCompare3(0xc420070980, 0x2e69, 0x45ed, 0x2, 0x3)
/home/user/go/src/kanzi/transform/DivSufSort.go:523 +0x13b
kanzi/transform.(*DivSufSort).ssMergeForward(0xc420070980, 0x3264, 0x1001, 0x1260, 0x2001, 0x2001, 0x2)
/home/user/go/src/kanzi/transform/DivSufSort.go:863 +0x10f
kanzi/transform.(*DivSufSort).ssSwapMerge(0xc420070980, 0x3264, 0x1001, 0x1260, 0x2001, 0x2001, 0x6d5, 0x2)
/home/user/go/src/kanzi/transform/DivSufSort.go:735 +0x5d2
kanzi/transform.(*DivSufSort).ssSort(0xc420070980, 0x3264, 0x0, 0x26d6, 0x2fa7, 0x2bd, 0x2, 0x620b, 0x1)
/home/user/go/src/kanzi/transform/DivSufSort.go:444 +0x1c7
kanzi/transform.(*DivSufSort).sortTypeBstar(0xc420070980, 0xc42009d800, 0x100, 0x100, 0xc4201fa000, 0x10000, 0x10000, 0x620b, 0xc420070980)
/home/user/go/src/kanzi/transform/DivSufSort.go:280 +0x8c8
kanzi/transform.(*DivSufSort).ComputeSuffixArray(0xc420070980, 0xc4201e6000, 0x620b, 0x620b, 0x620d, 0x620d, 0xc420040b00)
/home/user/go/src/kanzi/transform/DivSufSort.go:112 +0xdb
kanzi/transform.(*BWT).Forward(0xc42006a8a0, 0xc4201e6000, 0x620b, 0x620b, 0xc4201eca82, 0x620d, 0x620d, 0x3, 0xc420040c80, 0x40cd6d, ...)
/home/user/go/src/kanzi/transform/BWT.go:137 +0x12e
kanzi/function.(*BWTBlockCodec).Forward(0xc42000c0b0, 0xc4201e6000, 0x620b, 0x620b, 0xc4201eca80, 0x620f, 0x620f, 0x4396bb, 0x10, 0x10000000053d420, ...)
/home/user/go/src/kanzi/function/BWTBlockCodec.go:73 +0x110
kanzi/function.(*ByteTransformSequence).Forward(0xc42000ade0, 0xc4201e6000, 0x620b, 0x620b, 0xc4201eca80, 0x620f, 0x620f, 0x0, 0x0, 0x0, ...)
/home/user/go/src/kanzi/function/ByteTransformSequence.go:85 +0x1eb
kanzi/io.(*EncodingTask).encode(0xc4200cc380)
/home/user/go/src/kanzi/io/CompressedStream.go:466 +0xb97
created by kanzi/io.(*CompressedOutputStream).processBlock
/home/user/go/src/kanzi/io/CompressedStream.go:391 +0x39d

tra1-truncated.zip

Go BWT Inverse on a particular file generates "index out of range" error

Kanzi --compress --input=bwt.bin --output=test.kanzi --transform=BWT --entropy=none --block="13000000" --force
Kanzi 1.1 (C) 2017, Frederic Langlet
Encoding ...

Encoding: 793 ms
Input size: 12078908
Output size: 12078929
Ratio: 1.000002
Throughput (KB/s): 14874

Kanzi --decompress --input=test.kanzi

Warning: the input file name does not end with the .KNZ extension
Kanzi 1.1 (C) 2017, Frederic Langlet
Decoding ...
runtime error: index out of range

bwt.bin.gz

add all the options to the help thing

wtf dude

The wiki pdf link is dead

It is available on archive.org tho
