Coder Social home page Coder Social logo

detools's Introduction

nala_

About

Binary delta encoding in Python 3.6+ and C.

Based on http://www.daemonology.net/bsdiff/ and HDiffPatch, with the following features:

  • bsdiff, hdiffpatch and match-blocks algorithms.
  • sequential, hdiffpatch or in-place (resumable) patch types.
  • BZ2, LZ4, LZMA, Zstandard, heatshrink or CRLE compression.
  • Sequential patches allow streaming.
  • Maximum file size is 2 GB for the bsdiff algorithm. There is practically no limit for the hdiffpatch and match-blocks algorithms.
  • Incremental apply patch implemented in C, suitable for memory constrained embedded devices. Only the sequential patch type is supported.
  • SA-IS or divsufsort instead of qsufsort for bsdiff.
  • Optional experimental data format aware algorithm for potentially smaller patches. I don't recommend anyone to use this functionality as the gain is small in relation to memory usage and code complexity!

    There is a risk this functionality uses patent https://patents.google.com/patent/EP1988455B1/en. Anyway, this patent expires in August 2019 as I understand it.

    Supported data formats:

    • ARM Cortex-M4
    • AArch64

Project homepage: https://github.com/eerimoq/detools

Documentation: http://detools.readthedocs.org/en/latest

Installation

Statistics

Patch sizes, memory usage (RSS) and elapsed times when creating a patch from Python-3.7.3.tar (79M) to Python-3.8.1.tar (84M) for various algorithm, patch type and compression combinations.

See tests/benchmark.sh for details on how the data was collected.

Algorithm Patch type Compr. Patch size RSS Time
bsdiff sequential lzma

3,5M

662M 0:24.29
bsdiff sequential none

86M

646M 0:15.20
hdiffpatch hdiffpatch lzma

2,4M

523M 0:13.74
hdiffpatch hdiffpatch none

7,2M

523M 0:10.24
match-blocks sequential lzma

2,9M

273M 0:08.57
match-blocks sequential none

84M

273M 0:01.72
match-blocks hdiffpatch lzma

2,6M

212M 0:06.07
match-blocks hdiffpatch none

9,7M

212M 0:01.30

Same as above, but for MicroPython ESP8266 binary releases (from 604k to 615k).

Algorithm Patch type Compr. Patch size RSS Time
bsdiff sequential lzma

71K

46M

0:00.64
bsdiff sequential none

609K

27M

0:00.33
hdiffpatch hdiffpatch lzma

65K

42M

0:00.37
hdiffpatch hdiffpatch none

123K

25M

0:00.32
match-blocks sequential lzma

194K

46M

0:00.44
match-blocks sequential none

606K

25M

0:00.22
match-blocks hdiffpatch lzma

189K

43M

0:00.38
match-blocks hdiffpatch none

313K

24M

0:00.19

Example usage

Examples in C are found in c.

Command line tool

The create patch subcommand

Create a patch foo.patch from tests/files/foo/old to tests/files/foo/new.

$ detools create_patch tests/files/foo/old tests/files/foo/new foo.patch
Successfully created 'foo.patch' in 0.01 seconds!
$ ls -l foo.patch
-rw-rw-r-- 1 erik erik 127 feb  2 10:35 foo.patch

Create the same patch as above, but without compression.

$ detools create_patch --compression none \
      tests/files/foo/old tests/files/foo/new foo-no-compression.patch
Successfully created 'foo-no-compression.patch' in 0 seconds!
$ ls -l foo-no-compression.patch
-rw-rw-r-- 1 erik erik 2792 feb  2 10:35 foo-no-compression.patch

Create a hdiffpatch patch foo-hdiffpatch.patch.

$ detools create_patch --algorithm hdiffpatch --patch-type hdiffpatch \
      tests/files/foo/old tests/files/foo/new foo-hdiffpatch.patch
Successfully created patch 'foo-hdiffpatch.patch' in 0.01 seconds!
$ ls -l foo-hdiffpatch.patch
-rw-rw-r-- 1 erik erik 146 feb  2 10:37 foo-hdiffpatch.patch

Lower memory usage with --algorithm match-blocks algorithm. Mainly useful for big files. Creates slightly bigger patches than bsdiff and hdiffpatch.

$ detools create_patch --algorithm match-blocks \
      tests/files/foo/old tests/files/foo/new foo-hdiffpatch-64.patch
Successfully created patch 'foo-hdiffpatch-64.patch' in 0.01 seconds!
$ ls -l foo-hdiffpatch-64.patch
-rw-rw-r-- 1 erik erik 404 feb  8 11:03 foo-hdiffpatch-64.patch

Non-sequential but smaller patch with --patch-type hdiffpatch.

$ detools create_patch \
      --algorithm match-blocks --patch-type hdiffpatch \
      tests/files/foo/old tests/files/foo/new foo-hdiffpatch-sequential.patch
Successfully created 'foo-hdiffpatch-sequential.patch' in 0.01 seconds!
$ ls -l foo-hdiffpatch-sequential.patch
-rw-rw-r-- 1 erik erik 389 feb  8 11:05 foo-hdiffpatch-sequential.patch

The create in-place patch subcommand

Create an in-place patch foo-in-place.patch.

$ detools create_patch_in_place --memory-size 3000 --segment-size 500 \
      tests/files/foo/old tests/files/foo/new foo-in-place.patch
Successfully created 'foo-in-place.patch' in 0.01 seconds!
$ ls -l foo-in-place.patch
-rw-rw-r-- 1 erik erik 672 feb  2 10:36 foo-in-place.patch

The create bsdiff patch subcommand

Create a bsdiff patch foo-bsdiff.patch, compatible with the original bsdiff program.

$ detools create_patch_bsdiff \
      tests/files/foo/old tests/files/foo/new foo-bsdiff.patch
Successfully created 'foo-bsdiff.patch' in 0 seconds!
$ ls -l foo-bsdiff.patch
-rw-rw-r-- 1 erik erik 261 feb  2 10:36 foo-bsdiff.patch

The apply patch subcommand

Apply the patch foo.patch to tests/files/foo/old to create foo.new.

$ detools apply_patch tests/files/foo/old foo.patch foo.new
Successfully created 'foo.new' in 0 seconds!
$ ls -l foo.new
-rw-rw-r-- 1 erik erik 2780 feb  2 10:38 foo.new

The in-place apply patch subcommand

Apply the in-place patch foo-in-place.patch to foo.mem.

$ cp tests/files/foo/in-place-3000-500.mem foo.mem
$ detools apply_patch_in_place foo.mem foo-in-place.patch
Successfully created 'foo.mem' in 0 seconds!
$ ls -l foo.mem
-rw-rw-r-- 1 erik erik 3000 feb  2 10:40 foo.mem

The bsdiff apply patch subcommand

Apply the patch foo-bsdiff.patch to tests/files/foo/old to create foo.new.

$ detools apply_patch_bsdiff tests/files/foo/old foo-bsdiff.patch foo.new
Successfully created 'foo.new' in 0 seconds!
$ ls -l foo.new
-rw-rw-r-- 1 erik erik 2780 feb  2 10:41 foo.new

The patch info subcommand

Print information about the patch foo.patch.

$ detools patch_info foo.patch
Type:               sequential
Patch size:         127 bytes
To size:            2.71 KiB
Patch/to ratio:     4.6 % (lower is better)
Diff/extra ratio:   9828.6 % (higher is better)
Size/data ratio:    0.3 % (lower is better)
Compression:        lzma

Number of diffs:    2
Total diff size:    2.69 KiB
Average diff size:  1.34 KiB
Median diff size:   1.34 KiB

Number of extras:   2
Total extra size:   28 bytes
Average extra size: 14 bytes
Median extra size:  14 bytes

Contributing

  1. Fork the repository.
  2. Install prerequisites.

    pip install -r requirements.txt
  3. Implement the new feature or bug fix.
  4. Implement test case(s) to ensure that future changes do not break legacy.
  5. Run the tests.

    make test
  6. Create a pull request.

detools's People

Contributors

dependabot-preview[bot] avatar eerimoq avatar rgilton avatar tspivey avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

detools's Issues

In-place patching: `memory-size` and `segment-size`

Hello, @eerimoq!

  • Could you please explain the parameters memory-size and segment-size and their usage for the in-place patching mechanism?
  • On what basis do I select these values while creating an in-place patch?
  • Do I need to configure these values in any way while applying patches in the C implementation?
  • Could you also explain the usage of step_set and step_get callbacks in the C implementation?

Thanks a lot!

Memory leak while applying patches

There seems to be a memory leak while applying patches.
Tested on 0.51.0 from PyPI and latest master (3b0a0a7).

Test script:

from io import BytesIO
import detools

l = 50000000
data1 = b'a' * l
data2 = b'b' * l

patch = BytesIO()
detools.create_patch(BytesIO(data1), BytesIO(data2), patch, compression='zstd', patch_type='hdiffpatch', algorithm='hdiffpatch', use_mmap=False)
for i in range(50):
	new = BytesIO()
	print(i)
	patch.seek(0)
	detools.apply_patch(BytesIO(data1), patch, new)

On my system, this uses over 2 GB of memory once it completes (with python -i).

Adding free(temp_cache_p ); to the end of m_apply_patch in hdiffpatch.cpp reduces the usage to under 200 MB. Is this the correct solution?

Patch version info in metadata

Hello, @eerimoq!

I am using detools for patching embedded firmware. Could we make a provision for storing the source firmware version (from which the patch was created) in the patch header/metadata and add an API for fetching the same in the C source?

Thanks a lot!

In-place patching: Minimising the number of fwrite calls

Hello @eerimoq,

I am trying to investigate the integration of detools in an embedded systems project, where I need an in-place patching mechanism.

We are using an SPI (NOR) flash memory and the "flash" writes a whole binary (.bin) file on it. To write a new binary file on it, we were thinking of using the diff in-place patching mechanism provided by detools rather than the orthodox way of writing the whole new binary file.

Our main objective behind this is to reduce the number of fwrite calls needed. Below are the results of a basic analysis I carried out to check the number of fwrite calls needed in an in-place operation:

Created two in_place patches, one with minimum shift size 0 (patch size = 93179 B) and another with default minimum shift size (2*segment_size) (patch size = 11715 B).

(from_file size = 176128 B, to_file size = 174624 B.)

memory size segment size minimum shift size number of fwrite counts
176128 4096 0 4419
176128 4096 8192 2712

(number of fwrite counts was counted by adding a static count variable here)


I had the following questions regarding the same:
  1. Is there a way to reduce the number of fwrite counts? (probably writing just the diff data segments)
  2. I thought one way of reducing the fwrite counts could be by not shifting the memory by minimum-shift-size bytes (ref). Does changing the minimum shift size to 0 make sure that the initial memory shifting does not take place?
  3. Though I tried doing so, surprisingly I found the number of fwrite counts increased. (Or why does the patch size increase when the minimum shift size is 0 (patch size = 93179 B) as compared to when the minimum shift size is 8192 B (patch size = 11715 B) ?)

Could you please help me answer these questions? Also, please do let me know about any suggestions for this integration in my case.

Modify Heatshrink Compression Parameters

Thank you for this amazing project! I ported it for ESP32 (here) and it worked like a charm!

Is there a way to modify the Heatshrink compression parameters (window size, look ahead buffer) from the command-line?
Something like -
detools create_patch -c heatshrink --wbit 8 --lookahead 7...

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.