sisong / hdiffpatch

1.4K 48.0 271.0 2.33 MB

A C/C++ library and set of command-line tools for diff and patch between binary files or directories (folders). It is cross-platform, runs fast, creates small delta (differential) files, and supports large files with a memory limit during both diff and patch.

License: Other

C++ 52.60% C 46.29% Makefile 0.90% Batchfile 0.01% Shell 0.01% Java 0.07% Objective-C 0.12%
diff patch bsdiff update delta xdelta patcher hdiffpatch differential binary

hdiffpatch's People

Contributors

housisong, jayxon, rpg3d, sisong, timgates42, tostis, uglym8, wenhailin

hdiffpatch's Issues

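Code formatting #rejected

I noticed that the code formatting is rather inconsistent; for example, many lines have trailing whitespace, and some commas are not followed by a space.
Consider using clang-format to format the code automatically. It has a rich set of options and can be configured for many code styles, and you can save your configuration in a .clang-format file so that other people use the same style.
All major IDEs also have clang-format plugins for automatic formatting.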

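Support a patch process with streaming decompression

Selectable compression algorithm: abstract the compression and decompression algorithms into interfaces, provided to the diff and patch processes respectively;
Possible use case: after obtaining a compressed patch, run the patch process directly without fully decompressing it first (which would take disk space or memory);
This requires designing a compressed patch package format; (in a real implementation, the decompression algorithm itself may also consume memory during patch)
Possible extra benefit: once this is implemented, the patch package may compress somewhat smaller;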

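p1 Upgrade (or extend) the library into a tool?

The code is currently provided as a diff/patch function library plus some demo programs;
the demo programs can only have their parameters adjusted at compile time, so they cannot really be used directly as tools;

Requirements for the tool:

  1. Replace the current compile-time parameters with command-line arguments
  2. Support diff/patch between directories
  3. Support diff/patch between zip packages (including apk, jar, etc.)
  4. Create a new code repository that references this one? The tool should support release on multiple operating systems;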

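hdiffz is killed by the system when using the -zlib option

Does the author have any recommended options for diffing SQLite or other kinds of db files?
Currently, running hdiffz -m -c-zlib on two db files gets killed by the system, each time when memory usage reaches 9648640 KB. The log is as follows: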

hdiffz run with compress plugin: "zlib"
oldDataSize : 394596352
newDataSize : 393560064
Command terminated by signal 11
real_time: 248.52(s)
max_rss: 9648640(Kb)
avg_rss: 0(Kb)
avg_mem: 0(Kb)

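Reduce the peak memory of the decompression streams

When the cache is large enough:
after patch caches coverClip, could it also try caching rle_loader.ctrlClip at the same time?
(and optionally rle_loader.rleCodeClip if memory remains?)
This would reduce decompression memory and thus the peak memory usage;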

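Better parameter combinations

1. Use an algorithm to automatically find a better (balanced) parameter combination for the current model;
2. Dynamically adjust to better-fitting parameters based on data statistics and the like?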

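p1 A diff algorithm with lower space complexity

At the cost of possibly larger patch data (for example, a suffix array built at sampling intervals would make this easy to implement; this is one possible implementation, but space usage is still at least O(oldSize), and it may also be slow)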
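
A minimal sketch of the sampled suffix array idea, under stated assumptions: build_sparse_sa and longest_match are hypothetical names, not part of hdiffpatch, and the sort is a plain std::sort rather than a real suffix sorter. Only every step-th suffix of the old data is indexed, so the index shrinks by a factor of step, but matches that begin at unsampled positions are missed, which is where the larger patch comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index only the suffixes starting at 0, step, 2*step, ... (hypothetical helper).
std::vector<std::size_t> build_sparse_sa(const std::string& text, std::size_t step) {
    assert(step > 0);
    std::vector<std::size_t> sa;
    for (std::size_t i = 0; i < text.size(); i += step) sa.push_back(i);
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Length of the common prefix of a[ai..] and b[bi..].
static std::size_t common_prefix(const std::string& a, std::size_t ai,
                                 const std::string& b, std::size_t bi) {
    std::size_t n = 0;
    while (ai + n < a.size() && bi + n < b.size() && a[ai + n] == b[bi + n]) ++n;
    return n;
}

// Longest match of pat[patPos..] among the sampled suffixes of text.
// Returns the match length and stores its start in *oldPos.
std::size_t longest_match(const std::string& text, const std::vector<std::size_t>& sa,
                          const std::string& pat, std::size_t patPos, std::size_t* oldPos) {
    // Binary search for the first sampled suffix that is not less than the pattern.
    auto it = std::lower_bound(sa.begin(), sa.end(), patPos,
        [&](std::size_t s, std::size_t p) {
            return text.compare(s, std::string::npos, pat, p, std::string::npos) < 0;
        });
    std::size_t best = 0;
    *oldPos = 0;
    // The longest common prefix is with one of the two neighbours of that point.
    if (it != sa.end()) {
        std::size_t n = common_prefix(text, *it, pat, patPos);
        if (n > best) { best = n; *oldPos = *it; }
    }
    if (it != sa.begin()) {
        --it;
        std::size_t n = common_prefix(text, *it, pat, patPos);
        if (n > best) { best = n; *oldPos = *it; }
    }
    return best;
}
```

With old data "abracadabra" and step 2, the pattern "cadab" is still found at offset 4 because that offset is sampled; with step 3 no sampled suffix starts with 'c', so the match is lost entirely.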

ugly hardcoded include paths

The source code contains ugly hardcoded #include paths like:

#   include "../lzma/C/LzmaEnc.h" // http://www.7-zip.org/sdk.html
#   include "../lzma/C/LzmaDec.h" // http://www.7-zip.org/sdk.html

#   include "../lz4/lib/lz4.h"      // https://github.com/lz4/lz4
#   include "../lz4/lib/lz4hc.h"  // https://github.com/lz4/lz4

#   include "../zstd/lib/zstd.h" // https://github.com/facebook/zstd

This is very ugly and cannot easily be integrated into other builds.

Instead, the #includes should be changed to simply

#   include "LzmaEnc.h" // http://www.7-zip.org/sdk.html
#   include "LzmaDec.h" // http://www.7-zip.org/sdk.html

#   include "lz4.h"      // https://github.com/lz4/lz4
#   include "lz4hc.h"  // https://github.com/lz4/lz4

#   include "zstd.h" // https://github.com/facebook/zstd

, and the Makefile should set the proper

   -Isome/header/search/path1
   -Isome/header/search/path2
   -Isome/header/search/path3
   ...

options.

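Provide a bsdiff_wrapper layer

Provide a bsdiff4 compatibility layer: diff can output patches in bsdiff format, and patch can read patch data in bsdiff format;
This lets systems that already use bsdiff switch to the hdiffpatch implementation;
To keep the patch code from becoming too complex, support for patch_decompress_repeat_out may be dropped;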

cppcheck warnings

The "Skipping configuration" entries can be ignored; the others are somewhat useful.

cppcheck --quiet --force --enable=all .
[builds\HDiffPatch\libHDiffPatch\HDiff\private_diff\suffix_string.h:73]: (style) Condition 'sizeof(long)>=sizeof(int32_t)' is always true
[builds\_private_searchBestParams.cpp:113]: (style) The scope of the variable 'have' can be reduced.
[builds\_private_searchBestParams.cpp:258]: (style) The scope of the variable 'bestZipDiffR' can be reduced.
[builds\_private_searchBestParams.cpp:299]: (style) Variable 'bestZipDiffR' is assigned a value that is never used.
[builds\HDiffPatch\libHDiffPatch\HDiff\private_diff\suffix_string.h:73]: (style) Condition 'sizeof(long)>=sizeof(int)' is always true
[libHDiffPatch\HDiff\private_diff\suffix_string.h:73]: (style) Condition 'sizeof(long)>=sizeof(int32_t)' is always true
[libHDiffPatch\HDiff\private_diff\suffix_string.h:73]: (style) Condition 'sizeof(long)>=sizeof(int)' is always true
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:41]: (style) The scope of the variable 'ISAb' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:41]: (style) The scope of the variable 'buf' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:46]: (style) The scope of the variable 'k' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:46]: (style) The scope of the variable 'bufsize' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:200]: (style) The scope of the variable 'c1' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:264]: (style) The scope of the variable 'c1' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:364]: (style) The scope of the variable 'm' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:364]: (style) The scope of the variable 'i' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:101]: (style) The scope of the variable 'd' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:101]: (style) The scope of the variable 'e' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:225]: (style) The scope of the variable 'e' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:225]: (style) The scope of the variable 'f' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:226]: (style) The scope of the variable 's' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:557]: (style) The scope of the variable 'first' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:559]: (style) The scope of the variable 'skip' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\trsort.c.inc.h:559]: (style) The scope of the variable 'unsorted' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:196]: (style) The scope of the variable 'd' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:196]: (style) The scope of the variable 'e' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:294]: (style) The scope of the variable 't' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:550]: (style) The scope of the variable 'r' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:601]: (style) The scope of the variable 'r' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:753]: (style) The scope of the variable 'curbuf' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:754]: (style) The scope of the variable 'curbufsize' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:54]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:55]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:65]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:69]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:81]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:84]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:85]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:86]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:87]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:88]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:97]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:100]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:134]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:135]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:136]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:172]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:173]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:175]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:176]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:177]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:180]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:184]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:185]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:205]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:207]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:219]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:220]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:269]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:271]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:283]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:284]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:343]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:344]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:371]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:372]: (information) Skipping configuration 'ALPHABET_SIZE' since the value of 'ALPHABET_SIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:111]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:763]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:766]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:771]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:777]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:778]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:780]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:789]: (information) Skipping configuration 'SS_BLOCKSIZE' since the value of 'SS_BLOCKSIZE' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\sssort.c.inc.h:324]: (information) Skipping configuration 'SS_INSERTIONSORT_THRESHOLD' since the value of 'SS_INSERTIONSORT_THRESHOLD' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:43]: (style) The scope of the variable 'curbuf' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:44]: (style) The scope of the variable 'l' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:49]: (style) The scope of the variable 'd0' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:49]: (style) The scope of the variable 'd1' can be reduced.
[libHDiffPatch\HDiff\private_diff\libdivsufsort\divsufsort.c.inc.h:50]: (style) The scope of the variable 'tmp' can be reduced.
[libHDiffPatch\HDiff\private_diff\sais.hxx:112]: (style) The scope of the variable 'diff' can be reduced.
[libHDiffPatch\HDiff\private_diff\suffix_string.cpp:107]: (style) Variable 'rt' is assigned a value that is never used.
[libHDiffPatch\HPatch\patch.c:170]: (style) Variable 'newPosBack' is assigned a value that is never used.
[libHDiffPatch\HPatch\patch.c:685]: (style) Variable 'newPosBack' is assigned a value that is never used.
[patch_demo.cpp:82]: (style) Condition 'sizeof(long)<=4' is always false
[patch_demo.cpp:201]: (style) C-style pointer casting
[patch_demo.cpp:187]: (style) Struct 'TFileStreamInput' has a constructor with 1 argument that is not explicit.
[builds\_private_searchBestParams.cpp:109]: (style) The function 'zip_decompress' is never used.
(information) Cppcheck cannot find all the include files (use --check-config for details)

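A patch implementation that makes good use of larger memory to optimize speed

Drawbacks of the current patch functions: either the whole oldFile is loaded into memory up front for maximum speed, which puts a floor on memory usage; or, to save memory, nothing is preloaded, which keeps memory usage very small but makes the speed somewhat unpredictable. So the idea is to implement a new strategy that uses the memory available in the current environment to decide intelligently which blocks of oldFile to cache in advance;

Implementation approach: enable this strategy when the memory passed to the patch*with_cache functions is large enough, so that no extra API is needed;
The algorithm is somewhat difficult: it has to compute the optimal caching strategy from the known cover lines and the memory limit (taking disk seek and read costs into account);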

undefined reference to `BZ2_bzCompressInit'

Hello, there are still some problems on Ubuntu 16.04:
g++ -O3 -lbz2 -lz hdiffz.cpp libhdiffpatch.a -o hdiffz
/tmp/cc07aHEx.o: In function `_bz2_compress_stream(hdiff_TStreamCompress const*, hpatch_TStreamOutput const*, hpatch_TStreamInput const*)':
hdiffz.cpp:(.text+0x8c): undefined reference to `BZ2_bzCompressInit'
hdiffz.cpp:(.text+0xaf): undefined reference to `BZ2_bzCompressEnd'
hdiffz.cpp:(.text+0x218): undefined reference to `BZ2_bzCompress'
hdiffz.cpp:(.text+0x29b): undefined reference to `BZ2_bzCompress'
hdiffz.cpp:(.text+0x2cb): undefined reference to `BZ2_bzCompressEnd'
hdiffz.cpp:(.text+0x2fe): undefined reference to `BZ2_bzCompressEnd'
/tmp/cc07aHEx.o: In function `_bz2_decompress_part(hpatch_TDecompress const*, void*, unsigned char*, unsigned char*)':
hdiffz.cpp:(.text+0x456): undefined reference to `BZ2_bzDecompress'
/tmp/cc07aHEx.o: In function `_bz2_open(hpatch_TDecompress*, unsigned long, hpatch_TStreamInput const*, unsigned long, unsigned long)':
hdiffz.cpp:(.text+0x516): undefined reference to `BZ2_bzDecompressInit'
/tmp/cc07aHEx.o: In function `_bz2_close(hpatch_TDecompress*, void*)':
hdiffz.cpp:(.text+0x58e): undefined reference to `BZ2_bzDecompressEnd'
collect2: error: ld returned 1 exit status
: recipe for target 'hdiffz' failed
make: *** [hdiffz] Error 1
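Changing the command as follows makes it link (the -l options have to come after the files that reference the libraries); could the maintainer please update the makefile?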
g++ -O3 hdiffz.cpp libhdiffpatch.a -lbz2 -lz -o hdiffz

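todo: A patch algorithm with lower space complexity (with a configurable limit on the memory it may use)

Idea: (diff data format unchanged)
Provide another patch API that abstracts the input and output buffers; the caller of patch can then use disk file reads and writes to avoid loading all the file data into memory; (patch may be slightly slower)
With this approach, memory usage during patch is independent of the file size and is almost negligible!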
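
The abstraction described above could look roughly like the following sketch. StreamInput, StreamOutput, copy_range and the memory-backed callbacks are hypothetical names chosen for illustration; they are not the library's API. The point shown is that a copy step of patch only needs a fixed caller-supplied buffer, whatever the size of the old file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical random-access input: the patcher only ever asks for a range.
struct StreamInput {
    void* handle;
    std::size_t size;
    // Reads [pos, pos+len) into dst; returns false on failure.
    bool (*read)(const StreamInput* self, std::size_t pos,
                 unsigned char* dst, std::size_t len);
};

// Hypothetical sequential output.
struct StreamOutput {
    void* handle;
    bool (*write)(const StreamOutput* self, const unsigned char* src, std::size_t len);
};

// Copies in[pos, pos+len) to out through a buffer of bufSize bytes, so the
// memory used does not depend on the size of the input.
bool copy_range(const StreamInput* in, std::size_t pos, std::size_t len,
                const StreamOutput* out, unsigned char* buf, std::size_t bufSize) {
    if (bufSize == 0 || pos > in->size || len > in->size - pos) return false;
    while (len > 0) {
        const std::size_t n = std::min(len, bufSize);
        if (!in->read(in, pos, buf, n)) return false;
        if (!out->write(out, buf, n)) return false;
        pos += n;
        len -= n;
    }
    return true;
}

// Memory-backed callbacks for demonstration; a file-backed version would
// seek and read from disk instead.
static bool mem_read(const StreamInput* self, std::size_t pos,
                     unsigned char* dst, std::size_t len) {
    const std::string* s = static_cast<const std::string*>(self->handle);
    if (pos > s->size() || len > s->size() - pos) return false;
    std::memcpy(dst, s->data() + pos, len);
    return true;
}

static bool str_write(const StreamOutput* self, const unsigned char* src, std::size_t len) {
    static_cast<std::string*>(self->handle)->append(
        reinterpret_cast<const char*>(src), len);
    return true;
}
```

A caller would fill a StreamInput with a handle to its old data and the mem_read callback (or a file-based equivalent), then call copy_range with a small buffer, for example 4 bytes, to copy any range.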

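A JavaScript patch implementation

The in-memory version is fairly easy to implement, but the streaming version is harder;
supporting decompression of multiple compression formats will also cause some trouble;

ps: versions for Java, C#, etc.?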

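Parallel suffix sort to optimize diff speed

Currently more than half of the time of a -m diff is spent sorting the suffix array; a parallel implementation could be considered.
One option is to add an interface that external code can call in parallel from a multithreaded environment; the other is built-in parallel code (the current idea is to support single-threaded, OpenMP, pthread, and C++11 thread as selectable options);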

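todo: Support diff on larger data

Currently oldSize and newSize must both be less than 2G. Using a 64-bit distance type would remove the limit, but memory usage would double! Consider implementing a 40-bit integer type (with a few basic operations) to support this (<512G); space usage would grow by only 25% (at the cost of being less convenient than a 64-bit implementation, and possibly slower). (The generated diff data format (patch package data) stays unchanged.)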
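
A rough sketch of such a 40-bit type, assuming a hypothetical UInt40 struct that is not part of the library. It stores 5 bytes instead of the 4 bytes of a 32-bit index, which is where the 25% figure comes from. This sketch is unsigned and therefore reaches 2^40; a signed variant would give the <512G bound mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 40-bit unsigned integer stored as 5 little-endian bytes.
struct UInt40 {
    unsigned char bytes[5];

    UInt40() { set(0); }
    explicit UInt40(std::uint64_t v) { set(v); }

    void set(std::uint64_t v) {
        assert(v < (std::uint64_t(1) << 40));  // value must fit in 40 bits
        for (int i = 0; i < 5; ++i)
            bytes[i] = static_cast<unsigned char>(v >> (8 * i));
    }

    std::uint64_t get() const {
        std::uint64_t v = 0;
        for (int i = 0; i < 5; ++i)
            v |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
        return v;
    }
};

// A few basic operations, implemented by widening to 64 bits.
inline bool operator<(const UInt40& a, const UInt40& b) { return a.get() < b.get(); }
inline bool operator==(const UInt40& a, const UInt40& b) { return a.get() == b.get(); }
inline UInt40 operator+(const UInt40& a, std::uint64_t d) { return UInt40(a.get() + d); }

// An array of N positions takes 5*N bytes instead of 8*N for 64-bit indices.
static_assert(sizeof(UInt40) == 5, "UInt40 is expected to occupy exactly 5 bytes");
```

Every access goes through byte-wise packing and unpacking, which is the likely source of the slowdown the issue anticipates compared with a native 64-bit type.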

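A better parameter model

Analyze the differences produced by changes to the current parameters, understand the scenarios in which each parameter takes effect or has no effect, and abstract a better algorithm model.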

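Support in-place update of old data

Controlled disk usage: the downloaded patch package may not fit on disk, so the old data also needs to be updated in place; (if disk space is available and patch runs quickly, the "patch while downloading" requirement is unlikely to exist on its own)
Controlled memory usage: start the patch process after only part of the patch has been downloaded, and complete it in stages;

Drawbacks:

  • The feasibility of an algorithm for in-place update of old data still needs research (preliminary results show it is feasible, but the implementation is fairly complex);
  • It may be better to define a new patch data format, which may compress less well (the existing format is still usable);
  • If the download or another step fails during patch, recovery is hard (after a failure, the application has to fall back to a full old-to-new replacement update and can no longer use patch update);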

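Improve automated compatibility testing

Automatically test all public features of the current version;
Automatically test compatibility with the initial version of each feature;
Automatically test compatibility across operating systems;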

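Provide a diff plugin for git

Provide an output format compatible with the binary patch format of git's diff --binary;
Provide a plugin for git, so that git gains a solution for diffing very large binaries;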

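P3 Provide a preprocessing algorithm to run before diff #rejected

. May produce smaller diff data;
. The old Delphi version had this algorithm; the new open-source version does not provide it;
. The algorithm resembles RLE (run-length encoding): the idea is to compress away consecutive repeats of N (N≥1) bytes in advance, and it could even try to support non-consecutive repeated byte sequences when N is large;
. The data generated during patch must go through the inverse algorithm;
. The data encoding format would differ: a new patch could handle an old diff, but an old patch could not handle a new diff; real deployments would need a gradual rollout;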
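
For reference, plain byte-level RLE, the baseline this preprocessing idea is compared to, can be sketched as below. This is not the Delphi-era algorithm, whose details are not given here; rle_encode and rle_decode are hypothetical names, and the (count, value) pair format is an assumption made for the example. It covers only the N = 1 case of consecutive repeats, together with the inverse step that patch would have to run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Encode the input as (count, value) pairs, with count in 1..255.
Bytes rle_encode(const Bytes& in) {
    Bytes out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<unsigned char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Inverse transform: expand each (count, value) pair back into a run.
Bytes rle_decode(const Bytes& in) {
    assert(in.size() % 2 == 0);  // input must consist of whole pairs
    Bytes out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    return out;
}
```

Note that this naive pair format doubles the size of data without runs, so a practical preprocessing step would need an escape mechanism for literal bytes.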

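A better cover-line algorithm #rejected

The cover lines selected by the current algorithm do not overlap (greedy algorithm);
Consider finding shorter cover lines and allowing overlap, then (taking encoding cost weights into account) optimizing with a path algorithm.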

make fails to compile

Compilation fails. Environment: Ubuntu LTS 16.04

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)

g++ -O3 -lbz2 -lz -c -o libHDiffPatch/HDiff/diff.o libHDiffPatch/HDiff/diff.cpp
In file included from libHDiffPatch/HDiff/diff.cpp:29:0:
libHDiffPatch/HDiff/diff.h:63:18: error: expected identifier before ‘*’ token
size_t (*maxCompressedSize)(const hdiff_TCompress* compressPlugin,size_t dataSize);
^
libHDiffPatch/HDiff/diff.h:63:76: error: ‘size_t’ has not been declared
size_t (*maxCompressedSize)(const hdiff_TCompress* compressPlugin,size_t dataSize);
^
libHDiffPatch/HDiff/diff.h:63:91: error: ISO C++ forbids declaration of ‘size_t’ with no type [-fpermissive]
size_t (*maxCompressedSize)(const hdiff_TCompress* compressPlugin,size_t dataSize);
^
libHDiffPatch/HDiff/diff.h:63:91: error: ‘size_t’ declared as function returning a function
libHDiffPatch/HDiff/diff.h:65:27: error: expected identifier before ‘*’ token
size_t (*compress)(const hdiff_TCompress* compressPlugin,
^
libHDiffPatch/HDiff/diff.h:67:93: error: ISO C++ forbids declaration of ‘size_t’ with no type [-fpermissive]
const unsigned char* data,const unsigned char* data_end);
^
libHDiffPatch/HDiff/diff.h:67:93: error: ‘size_t’ declared as function returning a function
libHDiffPatch/HDiff/diff.h:132:36: error: ‘size_t’ has not been declared
size_t kMatchBlockSize=kMatchBlockSize_default);
^
libHDiffPatch/HDiff/diff.cpp: In function ‘void {anonymous}::do_compress(std::vector&, const std::vector&, const hdiff_TCompress*)’:
libHDiffPatch/HDiff/diff.cpp:423:44: error: ‘const hdiff_TCompress {aka const struct hdiff_TCompress}’ has no member named ‘maxCompressedSize’
size_t maxCodeSize=compressPlugin->maxCompressedSize(compressPlugin,data.size());
^
libHDiffPatch/HDiff/diff.cpp:428:41: error: ‘const hdiff_TCompress {aka const struct hdiff_TCompress}’ has no member named ‘compress’
size_t codeSize=compressPlugin->compress(compressPlugin,
^
: recipe for target 'libHDiffPatch/HDiff/diff.o' failed
make: *** [libHDiffPatch/HDiff/diff.o] Error 1

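Optimize the speed of getBestMatch

The main slow scenario: regions where the old and new data have little correlation, typically when the input is random data or already-compressed data (such as zip or png);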

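A specialized optimization algorithm for executable files

I believe this would produce smaller diff data; but the code would then be coupled to the executable file formats of different platforms, for example on Windows the PE file structure has to be handled...

Principle: 1. When the system loads an executable into memory, it can load it at different base addresses, and the system then fixes up many address values; this fix-up mechanism is exactly what we can exploit (by implementing a preprocessing algorithm)...
2. For diff data caused by short jumps, try to identify segments with the same offset delta (gather statistics on the pattern; triggered by insertions or deletions) and preprocess them...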

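Optimize the output size of streaming diff

Streaming diff is currently implemented with a rolling-hash matching scheme, and the speed is OK; but no attempt has been made to optimize the output size, and some speed could be traded away.
Possible directions:

  1. Try more candidate match positions; the current implementation only tests the 2 hash-value positions likely to give the longest match;
  2. Try extending the resulting cover lines in both directions;
  3. Give priority to special matches at the very first and last positions to optimize speed;
  4. Consider whether merging link lines is effective?
  5. Matching is currently greedy; consider handling longer match positions first?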
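
As background for the rolling-hash matching mentioned above, here is a minimal sketch of the general technique. It is not hdiffpatch's actual hash or matcher; RollingHash, find_block and the base 257 are assumptions made for the example. The hash of a fixed-size window is updated in constant time per byte, and a hash hit is confirmed by a byte comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Polynomial rolling hash over a fixed-size window, arithmetic modulo 2^32.
class RollingHash {
public:
    RollingHash(const unsigned char* data, std::size_t window)
        : hash_(0), topPow_(1) {
        for (std::size_t i = 0; i < window; ++i) {
            hash_ = hash_ * kBase + data[i];
            if (i + 1 < window) topPow_ *= kBase;  // ends as kBase^(window-1)
        }
    }
    // Slide the window by one byte: drop `out` on the left, append `in`.
    void roll(unsigned char out, unsigned char in) {
        hash_ = (hash_ - out * topPow_) * kBase + in;
    }
    std::uint32_t value() const { return hash_; }

private:
    static constexpr std::uint32_t kBase = 257;  // arbitrary base for this sketch
    std::uint32_t hash_;
    std::uint32_t topPow_;
};

// Returns the first offset in oldData where `block` occurs, or npos.
std::size_t find_block(const std::string& oldData, const std::string& block) {
    const std::size_t w = block.size();
    if (w == 0 || oldData.size() < w) return std::string::npos;
    const unsigned char* o = reinterpret_cast<const unsigned char*>(oldData.data());
    const unsigned char* b = reinterpret_cast<const unsigned char*>(block.data());
    const std::uint32_t target = RollingHash(b, w).value();
    RollingHash h(o, w);
    for (std::size_t pos = 0;; ++pos) {
        // A hash hit is only a candidate; verify the bytes to rule out collisions.
        if (h.value() == target && oldData.compare(pos, w, block) == 0) return pos;
        if (pos + w >= oldData.size()) break;
        h.roll(o[pos], o[pos + w]);
    }
    return std::string::npos;
}
```

A real streaming diff would typically hash blocks of the old data into a table and roll over the new data, rather than scanning linearly for a single block as this sketch does; testing more candidate positions per hash value is the first direction listed above.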
