
aaiezza / flick


FLiCK - Format LeveragIng Compression frameworK

License: The Unlicense

Shell 0.21% Java 99.79%
fasta compression-algorithm nucleotides biology chemistry compression dna genetics rna-seq danger

flick's Introduction

FLiCK

 [File] Format Leveraging Compression framework

  A Bioinformatics thesis project by Alessandro Aiezza II
    Defended on July 20, 2016 @ the Rochester Institute of Technology

 Committee
    Dr. Gary Skuse, Dr. Greg Babbitt, Dr. Larry Buckley

 Citation
Aiezza, A.,II. (2016). The FLiCK framework; enabling rapid development and performance benchmarking of compression applications for genetic data files (Order No. 10144070). Available from ProQuest Dissertations & Theses Global. (1825611935). Retrieved from http://search.proquest.com/docview/1825611935?accountid=13567

FLiCK is a Java framework that makes it easier to develop file compressors/decompressors by leveraging ab initio knowledge about a specific file format. FLiCK also runs independently as a file compressor and, by default, will ZIP any file it is given.

A developer can create a module in FLiCK for any file format. A module associates a file's format with one or many file extension names. (For example, the FASTA module will work on files with the extensions .fa, .fasta, and .fna.) When the classes or jar of a FLiCK module are found on the CLASSPATH at runtime, FLiCK checks each input file's name against the registered extensions and uses the matching module's compression algorithm as opposed to the default ZIP algorithm.
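The extension-based dispatch described above can be pictured with a small lookup table. The following is only an illustrative sketch: the class `ModuleLookupSketch`, its methods, and the use of plain module names as strings are hypothetical and are not part of the FLiCK API. Treating extensions as case-insensitive is also an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of extension-to-module dispatch with a ZIP fallback. */
final class ModuleLookupSketch {

    /** Name used when no module claims the file's extension. */
    static final String DEFAULT_MODULE = "ZIP";

    private final Map<String, String> moduleByExtension = new HashMap<>();

    /** Associates one module name with one or many file extensions. */
    void register(String moduleName, String... extensions) {
        for (String extension : extensions) {
            moduleByExtension.put(normalize(extension), moduleName);
        }
    }

    /** Returns the module registered for the file's extension, or the default. */
    String moduleFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_MODULE; // no extension at all
        }
        String extension = normalize(fileName.substring(dot + 1));
        return moduleByExtension.getOrDefault(extension, DEFAULT_MODULE);
    }

    /** Assumption: a leading dot is optional and case does not matter. */
    private static String normalize(String extension) {
        String bare = extension.startsWith(".") ? extension.substring(1) : extension;
        return bare.toLowerCase(Locale.ROOT);
    }
}
```

With `register("FASTA", "fa", "fasta", "fna")`, a call to `moduleFor("genome.fna")` yields `"FASTA"`, while `moduleFor("notes.txt")` falls back to `"ZIP"`.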

FLiCK comes preloaded with FASTA and FASTQ file format compression modules.


Usage - users

  1. Download the latest release from the FLiCK Releases page.
  2. Untarball/unzip the contents into a directory on your PATH:
  • flick.jar
  • flick (executable)
  • unflick (executable)
  3. You should be ready to go! See the FLiCK User tutorial and the unFLiCK User tutorial.

Usage - Developers (Module Creation)

  1. Download flick.jar from the releases page and add it to your CLASSPATH:
$ export CLASSPATH=path/to/other/jars:flick.jar
  2. Implement the five classes that make up a module:
  • FileDeflator: implementation of the file format compression algorithm.
  • FileInflator: implementation of the file format decompression algorithm.
  • DeflationOptionSet: options/flags available for altering the behavior of the algorithm responsible for file compression.
  • InflationOptionSet: options/flags available for altering the behavior of the algorithm responsible for file decompression.
  • FileArchiver: (1) holds aspects that are important to both the deflator and inflator; (2) connects the other 4 classes together; (3) declares the file extensions the module is appropriate for.
  3. Annotate the FileArchiver class with the RegisterFileDeflatorInflator annotation, which identifies the class names of the other 4 component classes and lists the file extensions the module should be used for.
      (It is recommended to jar your implementing classes for ease of use and portability of your module.)
  4. Place your classes (or jar) on the CLASSPATH so that they are visible to FLiCK at runtime.


FASTA and FASTQ File Format Modules come preloaded in FLiCK

The entirety of both these modules exists in the edu.rit.flick.genetics package. The FLiCK [platform] is fully functional and executable without this package, as the package serves as an outside module.

FASTA & FASTQ file format specification


Architecture of FLiCK

FLiCK UML Diagram

Example Module Registration for the FLiCK FASTA compression module

@RegisterFileDeflatorInflator (
    deflatedExtension = FastaFileArchiver.DEFAULT_DEFLATED_FASTA_EXTENSION,
    inflatedExtensions = { "fna", "fa", "fasta" },
    fileDeflator = FastaFileDeflator.class,
    fileInflator = FastaFileInflator.class,
    fileDeflatorOptionSet = FastaDeflationOptionSet.class,
    fileInflatorOptionSet = FastaInflationOptionSet.class )
public interface FastaFileArchiver extends FastFileArchiver
{ ...
    public static final String DEFAULT_DEFLATED_FASTA_EXTENSION       = ".flickfa";
... }

More details behind sample FASTA/FASTQ module implementations

The modules use a 2-bit compression algorithm for the nucleotides:

Nucleotide Mapped bits
A 00
C 01
G 10
T 11

Example: ACTGATTACA → 00011110001111000100 → 123844
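The mapping above can be reproduced with a few lines of bit shifting. This is a minimal sketch of generic 2-bit packing, not the code used by the FLiCK modules: the class `TwoBitSketch` and its methods are hypothetical, it only handles the four symbols in the table, and it limits a sequence to the 32 nucleotides that fit in one 64-bit `long`.

```java
/** Hypothetical sketch of 2-bit nucleotide packing; not FLiCK's implementation. */
final class TwoBitSketch {

    /** Maps A, C, G, T to 0..3 following the table above; rejects anything else. */
    static int code(char nucleotide) {
        switch (nucleotide) {
            case 'A': return 0b00;
            case 'C': return 0b01;
            case 'G': return 0b10;
            case 'T': return 0b11;
            default:
                throw new IllegalArgumentException("Unsupported symbol: " + nucleotide);
        }
    }

    /** Packs up to 32 nucleotides into one long, first nucleotide in the highest bits. */
    static long pack(String sequence) {
        if (sequence.length() > 32) {
            throw new IllegalArgumentException("At most 32 nucleotides fit in 64 bits");
        }
        long packed = 0L;
        for (int i = 0; i < sequence.length(); i++) {
            packed = (packed << 2) | code(sequence.charAt(i));
        }
        return packed;
    }

    /** Reverses pack(); the length is required because leading A's are zero bits. */
    static String unpack(long packed, int length) {
        char[] alphabet = { 'A', 'C', 'G', 'T' };
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = alphabet[(int) (packed & 0b11)];
            packed >>>= 2; // unsigned shift so a set top bit does not smear
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long packed = TwoBitSketch.pack("ACTGATTACA");
        System.out.println(packed);                          // 123844
        System.out.println(TwoBitSketch.unpack(packed, 10)); // ACTGATTACA
    }
}
```

Note that the sequence length has to be stored alongside the packed value: without it, `ACTGATTACA` and `CTGATTACA` would decode identically, since a leading `A` contributes only zero bits.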

FLiCK FASTQ 2-bit compression module performance analysis

Program                  Average Compression Ratio   Average Compression Runtime   Average Decompression Runtime
Path Encoding            90.9%                       -                             -
LW-FQZip                 80.5%                       44:39                         02:52
FLiCK (2-bit module)     77.3%                       31:55                         20:46
gzip                     75.6%                       19:03                         10:24
bzip2                    78.3%                       32:18                         16:33
Quip                     77.3%                       11:52                         01:57
LEON                     91.5%                       32:10                         07:52

FLiCK 2-bit module performance


flick's Issues

Want to see all the modules and all the options

  • There should be a list flag to list all of the file extensions and modules that exist.
  • Also another flag to show the option set for a given module.
  • Maybe it is OK to have multiple modules for the same extensions (multiple modules issue), but be able to make one of them default and select which one you wish to use with some flag.

Unflick checks for ZipHeaders

Zip4j crashes unflick when given a file that is not a zip file or, more appropriately, a flick file:

C:\git\FLiCK\test_resources>java -cp ..\build\libs\flick-0.4.1.jar edu.rit.flick.util.Unflick -v SRR390728_1.fq
net.lingala.zip4j.exception.ZipException: zip headers not found. probably not a zip file
    at net.lingala.zip4j.core.HeaderReader.readEndOfCentralDirectoryRecord(HeaderReader.java:122)
    at net.lingala.zip4j.core.HeaderReader.readAllHeaders(HeaderReader.java:78)
    at net.lingala.zip4j.core.ZipFile.readZipInfo(ZipFile.java:425)
    at net.lingala.zip4j.core.ZipFile.extractAll(ZipFile.java:475)
    at net.lingala.zip4j.core.ZipFile.extractAll(ZipFile.java:451)
    at edu.rit.flick.DefaultFlickFile.inflate(DefaultFlickFile.java:238)
    at edu.rit.flick.FlickFile.inflate(FlickFile.java:36)
    at edu.rit.flick.util.Unflick.main(Unflick.java:57)
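One possible guard is to look at the first four bytes of the input before handing it to the ZIP library. The sketch below uses only the JDK and is not taken from the FLiCK code base: the class `ZipSignatureSketch` and its methods are hypothetical, and the check is deliberately simple (it accepts the standard local-file-header and empty-archive signatures and would reject archives with data prepended, such as self-extracting ZIPs).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-check for a ZIP signature; not part of FLiCK or Zip4j. */
final class ZipSignatureSketch {

    /** True if the bytes begin with "PK\003\004" (local file header) or "PK\005\006" (empty archive). */
    static boolean startsWithZipSignature(byte[] head) {
        if (head.length < 4 || head[0] != 'P' || head[1] != 'K') {
            return false;
        }
        return (head[2] == 3 && head[3] == 4)
            || (head[2] == 5 && head[3] == 6);
    }

    /** Reads up to four bytes from the file and tests them for a ZIP signature. */
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than four bytes
                }
                filled += n;
            }
        }
        return filled == head.length && startsWithZipSignature(head);
    }
}
```

A caller could test `looksLikeZip(path)` first and print a short "not a flick file" message instead of letting the `ZipException` propagate; a FASTQ file such as the one in the trace starts with `@`, so it would be rejected immediately.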

Altering default compression

Currently, the FLiCK platform performs ZIP compression on files it has no module for. gzip and bzip2 could be integrated as alternatives, with a flag to select which type of compression to perform on these module-less files.
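For the gzip half of this proposal, the JDK already ships the necessary streams in `java.util.zip`; bzip2 is not in the JDK and would need a third-party library. The following is a rough sketch of what a gzip fallback could look like, under the assumption that data is handled as in-memory byte arrays. The class `GzipFallbackSketch` is hypothetical and does not exist in FLiCK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical gzip fallback for module-less files; not part of FLiCK. */
final class GzipFallbackSketch {

    /** Compresses the given bytes with gzip. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // closing gz above wrote the gzip trailer
    }

    /** Decompresses gzip data produced by gzip(). */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

A real integration would more likely stream from file to file than buffer everything in memory; the byte-array form is used here only to keep the sketch short.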
