This project forked from gotthardtz/paq8gen

Experimental (genomic) sequencing data compressor based on the paq8* data compression software family

License: GNU General Public License v2.0


                              PAQ8GEN README

---------
COPYRIGHT
---------

    Copyright (C) 2008 Matt Mahoney, Serge Osnach, Alexander Ratushnyak,
    Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson,
    Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, Kaido Orav, Simon Berger,
    Neill Corlett, Márcio Pais, Andrew Epstein, Mauro Vezzosi, Zoltán Gotthardt,
    Sebastian Lehmann, Moisés Cardona, Byron Knoll

    We would like to express our gratitude for the endless support of many 
    contributors who encouraged PAQ8PX/PAQ8GEN development with ideas, testing, 
    compiling, debugging:
    LovePimple, Skymmer, Darek, Stephan Busch, m^2, Christian Schneider,
    pat357, Rugxulo, Gonzalo, a902cd23, pinguin2, Luca Biondi - and many others.

-------
LICENCE
-------

    See: LICENSE

    For a summary of your rights and obligations in plain language visit 
    https://tldrlegal.com/license/gnu-general-public-license-v2

-----
ABOUT
-----

PAQ is a family of experimental lossless data compression programs that, through
collaborative development, has reached top rankings on several benchmarks measuring
compression ratio (although at the expense of speed and memory usage).

PAQ8GEN (this branch) is one of the longest-living branches of the PAQ series, forked from
PAQ8PX v200 by Zoltán Gotthardt in 2021. This branch is specialized for (but not limited to)
(genomic) sequencing data. For the history and current state of development, visit
the encode.su forum thread (https://encode.su/threads/3549-paq8gen-sequence-compressor).

-----------------
HOW DOES IT WORK?
-----------------

Data compression is carried out by compressing the input file bit by bit using
context mixing, i.e. by mixing the predictions of many models into a single prediction
for each bit.
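
To make the idea of mixing predictions more concrete, here is a minimal sketch of a
logistic mixer. It is NOT the mixer used in PAQ8GEN: the class name ToyMixer, the
floating-point arithmetic and the learning rate are assumptions chosen for readability.
Each model supplies a probability that the next bit is 1; the mixer combines them as a
weighted sum in the logit ("stretch") domain, and after the real bit is known it shifts
weight toward the models that predicted it well.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// stretch(p) = ln(p / (1 - p)) is the logit function; squash is its inverse.
inline double stretch(double p) { return std::log(p / (1.0 - p)); }
inline double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Hypothetical toy mixer (illustration only, not PAQ8GEN code).
class ToyMixer {
public:
  explicit ToyMixer(std::size_t nModels, double learningRate = 0.02)
      : weights_(nModels, 1.0 / static_cast<double>(nModels)),
        inputs_(nModels, 0.0),
        lr_(learningRate),
        last_(0.5) {}

  // Combine per-model probabilities (each strictly between 0 and 1)
  // into one probability that the next bit is 1.
  double mix(const std::vector<double>& predictions) {
    assert(predictions.size() == weights_.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < weights_.size(); ++i) {
      inputs_[i] = stretch(predictions[i]);
      dot += weights_[i] * inputs_[i];
    }
    last_ = squash(dot);
    return last_;
  }

  // Once the actual bit is known, move each weight along the error gradient.
  void update(int bit) {
    const double err = static_cast<double>(bit) - last_;
    for (std::size_t i = 0; i < weights_.size(); ++i) {
      weights_[i] += lr_ * err * inputs_[i];
    }
  }

  const std::vector<double>& weights() const { return weights_; }

private:
  std::vector<double> weights_;  // one weight per model
  std::vector<double> inputs_;   // stretched predictions from the last mix() call
  double lr_;                    // learning rate (assumed value)
  double last_;                  // last mixed probability
};
```

In a real compressor the mixed probability would drive an arithmetic coder; that part is
omitted here. Calling mix() and then update() once per bit is the whole loop of this sketch.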

----------------
PAQ8GEN ARCHIVES
----------------

A PAQ8GEN archive stores genomic sequence contents in a highly compressed format.
Compression is not limited to genomic sequences: you can supply any file to PAQ8GEN
in any format.

You can recognize a PAQ8GEN archive by its file extension. The file extension reflects
the exact PAQ8GEN version that created the archive. You may also examine the first few
bytes of the file. If they read "paq8gen", then it is (most probably) a PAQ8GEN archive.
Exact version information cannot be inferred from the archive content.
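
The check on the first few bytes can be sketched as follows. The function name
looksLikePaq8genArchive is hypothetical and not part of PAQ8GEN; the sketch only
compares the start of the file with the text "paq8gen" as described above and makes no
assumption about what follows it in the archive.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Returns true when the first bytes of the file spell "paq8gen".
// A positive result means "most probably a PAQ8GEN archive", nothing more.
bool looksLikePaq8genArchive(const std::string& path) {
  static const std::string magic = "paq8gen";
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;  // file missing or unreadable
  }
  std::string head(magic.size(), '\0');
  in.read(&head[0], static_cast<std::streamsize>(head.size()));
  // gcount() tells how many bytes were actually read (short files read fewer).
  return static_cast<std::size_t>(in.gcount()) == magic.size() && head == magic;
}
```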

A PAQ8GEN archive may contain only a single file without any file attributes. Only the 
file content is preserved.  There is no built-in multiple file or folder compression 
support.

More notes:

  - Files and archives larger than 2 GB are not (yet) supported.
  - An archive can only be decompressed by the PAQ8GEN release that created it; archives
    are not compatible with earlier or later PAQ8GEN releases.
  - PAQ8GEN is tuned for the FASTA format.
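
Because of the 2 GB limit, it may be convenient to check the size of a file before
handing it to PAQ8GEN. The following is a sketch of such a pre-check, not PAQ8GEN code.
The function names are hypothetical, and reading "2 GB" as 2^31 bytes is an assumption:
this README does not state the exact cutoff.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Assumption: "2 GB" is taken to mean 2^31 bytes.
constexpr std::uintmax_t kAssumedLimit = std::uintmax_t{1} << 31;

// True when a size in bytes does not exceed the assumed limit.
bool withinAssumedSizeLimit(std::uintmax_t sizeInBytes) {
  return sizeInBytes <= kAssumedLimit;
}

// Opens the file positioned at its end and uses that offset as the file size.
// Returns false when the file cannot be opened or its size cannot be determined.
bool fileWithinAssumedSizeLimit(const std::string& path) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) {
    return false;
  }
  const std::streamoff size = in.tellg();
  return size >= 0 && withinAssumedSizeLimit(static_cast<std::uintmax_t>(size));
}
```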

----------
HOW TO USE
----------

This software is portable: you don't need to install it.

You may copy paq8gen.exe to the folder where your files to be compressed are located
and launch PAQ8GEN from the command line (cmd.exe in Windows, a terminal of your choice 
in Linux and macOS). 

----------------------
COMMAND LINE INTERFACE
----------------------

A graphical user interface is not provided; PAQ8GEN runs from the command line.
See the help screen for command line options.
To display the help screen, launch PAQ8GEN from the command line without any options.

--------------
HOW TO COMPILE
--------------

Building PAQ8GEN requires a C++17-capable compiler:
https://en.cppreference.com/w/cpp/compiler_support#cpp17

Windows
If you are a Windows user, you don't need to compile the source. Just grab the latest
executable from the GitHub repository: https://github.com/GotthardtZ/paq8gen/releases
If you would like to build an executable yourself, you may use the Visual Studio solution
file or, for Mingw-w64, the "build-mingw-w64-generic-publish.cmd" batch file
in the build subfolder.

Linux/macOS
gcc/clang users on Linux/macOS may use the following commands to build (the apt-get
line applies to Debian/Ubuntu-based systems only):

 sudo apt-get install build-essential
 cd build
 cmake ..
 make

The following compilers were tested and verified to compile/work correctly
in this release or in an earlier release (mostly paq8px):

  - Visual Studio 2019 Community Edition (Windows)
  - MinGW-w64 9.3.0 (Windows)
  - GCC 8.3.0 (Lubuntu)

Note:

 We build and test 64-bit releases; 32-bit releases are seldom built or tested. A known
 limitation of 32-bit releases is the 2 GB memory barrier.
 
--------------
FOR DEVELOPERS
--------------

When you make a new release:

  - Please update the version number in the "Versioning" section in the paq8gen.cpp source file.
  - Please append a short description of your modifications to the CHANGELOG file.
  - Always publish a static build (link the required library files statically).
  - Always publish a build for the general public (e.g. don't use -march=native).
  - Before publishing your build, please carry out some basic tests. Run your tests
    with asserts on (remove the "NDEBUG" option).
