Coder Social home page Coder Social logo

koeppl / low-lz78 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from rcanovas/low-lz78

0.0 2.0 0.0 238 KB

Computing LZ78 in low memory

Home Page: https://dl.acm.org/doi/10.1145/3481638

License: GNU General Public License v3.0

CMake 34.78% C++ 36.41% C 19.20% Makefile 0.27% Shell 9.34%
lz78 lz78-compression

low-lz78's Introduction

Low-LZ78

The Low-LZ78 project contains the C++11 and C codes associated with the following reference: D. Arroyuelo, R. Canovas, G. Navarro, and R. Raman. "LZ78 Compression in Low Main Memory Space". To appear in Proc. SPIRE'17

This work includes the following data structures: C++11:

  • hlz78: A LZ78 data structure that uses a fixed hash table.
  • ghlz78: A LZ78 data structure that uses a growing hash table.
  • mhlz78: A LZ78 data structure that uses multi-hash tables. C (32 bits):
  • lz78_min: The LZ78 compact representation of Arroyuelo and Navarro ("Space-efficient construction of Lempel-Ziv compressed text indexes". Information and Computation, 209(7):1070โ€“1102, 2011).
  • lz78_uc: The LZ78 uncompressd baseline representation used by Arroyuelo and Navarro.

Change in the header file include/SLZ78/defs.h the macro BONSAI_HASH_TABLE for a different hash table used for storing the displacement information of the used Bonsai hash table.

Compile

To be able to compile the codes:

  • Install sdsl-lite. Follow the installation guide here: (https://github.com/simongog/sdsl-lite)
  • Modify the location of the sdsl library in the CMakeLists.txt if necessary.
  • Require cmake version 3.5 or higher.
  • For the C++11 code, go to the build folder (create the folder if it does not exists) and run:
    • cmake ..
    • make
  • For the C codes, go to the lz78_c32 folder and run:
    • make

Methods

hlz78/ghlz78/mhlz78

-[CompressText] :

Use: ./CompressText file_name <opt> 
  		<file_name>: Name of the file to be compressed 
	<opt> : 
	  	-o output_name:  String containing the name of the output file. Default file_name
	  	-s sigma: Size of the alphabet of the input text. Default s = 256
	  	-f factor: Integer value indicating the overload factor used  for the Hash table. Default factor = 5%
	  	-d D_bits: Number of bits used to store each collision value. Default d=0 (meaning that it is compute internally)
		-w Index_type. Default = 0
    		    ---+--------------------
		     0 | HLZ78 using a map for the displacements 
		     1 | HLZ78 using a hash table and a sublayer for displacements
		     2 | MHLZ78 using maps for the displacements
		     3 | MHLZ78 using a hash table and a sublayer for displacements
		     4 | GHLZ78 using maps for the displacements
		     5 | GHLZ78 using a hash table and a sublayer for displacements
    		     

      	output:  <output_name>.index_type

	Example: ./CompressText file -w 1 -s 97 -f 10 -d 3
	output:  file.hlz78_hash

-[DecompressHLZ] :

Use: ./DecompressHLZ index_type <opt>
	<index_type>: Name of the compressed index file
	<opt> (index details are requiered): 
		-o output_name:  String containing the name of the output file. Default <index_name>_decom
		-w Index_type. Default = 0
		 # | Index_type
		 ---+--------------------
		  0 | HLZ78 using a map for the displacements
		  1 | HLZ78 using a hash table and a sublayer for displacements
		  2 | MHLZ78 using maps for the displacements
		  3 | MHLZ78 using a hash table and a sublayer for displacements
		  4 | GHLZ78 using maps for the displacements
		  5 | GHLZ78 using a hash table and a sublayer for displacements
 
	output:  output_name file

	Example: ./DecompressHLZ file.hlz78_hash -w 1 -o file   
	output: file (uncompress version of file.hlz78_hash)

lz78_min/lz78_uc

-[Clz78_(min/uc)] :

Use: ./Clz78_(min/uc) file_name
	file_name: Name of the file to be compressed 

	output:  file_name.lzt file

	Example: ./Clz78_min file 
	output: file.lzt (Compressed version of file)

-[Dlz78_(min/uc)] :

Use: ./Dlz78_(min/uc) file_name
	file_name: Name of the file to be decompressed (file_name.lzt must exist) 

	output:  file_name file

	Example: ./Dlz78_min file 
	output: file (Decompressed version of file.lzt. Note: the original file is overwrite)

The results presented in the paper "LZ78 Compression in Low Main Memory Space" were generated using the command line:

	./CompressText in_file -w type_index -s sigma -f factor
	using w = {1,3,5} and f = {5,10,20,40,60}, and files (with sigma):
		dblp.xml (s=97)
		proteins (s=27)
		english  (s=237) --> Note: we needed to erasa /0 characters from the original text before compressing
		DNA      (s=51)

For more information about the data structures and the test sets used please refer to the paper.

Note: These codes assume that the computer used has enough RAM to read and store the complete input.

low-lz78's People

Contributors

koeppl avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.