Coder Social home page Coder Social logo

zhang-stephen / nxzip Goto Github PK

View Code? Open in Web Editor NEW
2.0 1.0 0.0 179 KB

The final project in my undergraduate life, a data compressor based on BWT, MTF and Huffman-coding.

License: MIT License

C++ 70.64% C 23.59% CMake 0.84% Assembly 4.93%
huffman-coding rle-compression-algorithm mtf bwt data-compressor cpp17

nxzip's Introduction

Preface


这是我本科阶段最后一个成熟项目,也是我的毕业设计:

设计一款基于BWT的数据压缩程序

思考

目前可供参考的项目是bzip2,但是bzip2并无算法相关文档,仅提供API手册,可供参考之处并不多

bzip2的压缩比大概在80%-85%,但是由于使用了BWT算法,处理速度要比gzip(LZ77/LZ78)等慢一些,此外,压缩时占用的内存大概是解压时的三到七倍

BWT算法唯一的难点在于内存,随着原数组的增长,原理算法占用的内存以指数形式增长,而且遍历原理算法的缓存矩阵的时间复杂度在最好的情况下应该是O($n^3$),因此主要针对BWT算法的时间复杂度和空间复杂度进行优化

文件

  • 后缀名暂定为zipx

工程

language Compiler Project Management Version Control Project Name
c++17(gnu++17) GCC v8.2.0 cmake 3.13.2 github nxzip

任务列表

  • 实现基本的BWT算法
  • 实现基本的IBWT算法
  • 使用quick sort替换bubble sort(注:实际使用merge sort)
  • 实现基本的MTF和IMTF算法
  • 实现Huffman Coding算法
  • 实现范式Huffman Coding算法
  • 实现仅利用STL的文件读写和其他file IO
  • 定义压缩文件的结构
  • 实现软件的CRC-32C校验算法
  • 实现基于SSE4.2指令集的CRC-32C算法

其他

2019-06-09 Freeze This Project 毕业设计业已完成,虽然程序仍有些许错误,但是眼下亦无时间修复订正,留待日后解决 算法的世界是很奇妙的,希望日后仍有机会遨游其中

已知问题:

  1. BWT算法的正变换和逆变换无法对应,导致解压时CRC32校验失败(Fatal Error)
  2. 文件结构中一些冗余成分未能删除

created by vscode, wrote with markdown(MPE), 2019.02.04

nxzip's People

Contributors

zhang-stephen avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.