Coder Social home page Coder Social logo

rokath / tcobs Goto Github PK

View Code? Open in Web Editor NEW
22.0 2.0 1.0 2.05 MB

short messages compression with COBS framing using implicit run-length-encoding, optimized for data containing statistically a bit more 0 and FF bytes in a row, as data often carry 16, 32 or 64 bit numbers with small values.

License: MIT License

C 66.57% Go 33.43%
cobs rle compression framing-protocols rcobs rlecobs encoding decoding

tcobs's Introduction

TCOBS v1 & v2

Table of Contents

GitHub All Releases GitHub code size in bytes GitHub watchers Go Report Card PRs Welcome test Coverage Status GitHub issues

1. About The project

./docs/ref/COBSDataDisruption.svg

  • TCOBS is a variant of COBS combined with real-time RLE data compression especially for short messages containing integers.
  • The consistent overhead with TCOBS is 1 byte for each starting 31 bytes in the worst case, when no compression is possible. (Example: A 1000 bytes buffer can be encoded with max 33 additional bytes.) This is more compared to the original COBS with +1 byte for each starting 254 bytes, but if the data contain integer numbers, as communication packets often do, the encoded data will be statistically shorter with TCOBS compared to the legacy COBS.

1.1. Assumptions

  • Most messages like Trices consist of 16 or less bytes.
  • Some messages or user data are longer.
  • Several zeros in a row are a common pattern (example:00 00 00 05).
  • Several 0xFF in a row are a common pattern too (example -1 as 32 bit value).
  • Maybe some other bytes appear also in a row.
  • TCOBS does not know the inner data structure and is therefore usable on any user data.

(back to top)

2. Preface

  • TCOBS was originally developed as an optional Trice part and that's the T is standing for. It aims to reduce the binary trice data together with framing in one step.
    • T symbols also the joining of the 2 orthogonal tasks compression and framing.
    • Additionally, the usage of ternary and quaternary numbers in TCOBSv2 is reflected in the letter T.
  • TCOBSv2 is a better approach for TCOBSv1, suited perfect when long sequences of equal characters occur in the data stream.
    • The TCOBSv1 compression is expected to be not that good as with TCOBSv2.
  • About the data is assumed, that 00-bytes and FF-bytes occur a bit more often than other bytes.
  • The compression aim is more to get a reasonable data reduction with minimal computing effort, than reducing to an absolute minimum. The method shown here simply counts repeated bytes and transforms them into shorter sequences. It works well also on very short messages, like 2 or 4 bytes and on very long buffers. The compressed buffer contains no 00-bytes anymore what is the aim of COBS.
  • TCOBS is stand-alone usable in any project for package framing with data minimizing.
  • Use cases in mind are speed, limited bandwidth and long time data recording in the field.
  • TCOBS is inspired by rlercobs. The ending sigil byte idea comes from rCOBS. It allows a straight forward encoding avoiding lookahead and makes this way the embedded device code simpler.
  • TCOBS uses various chained sigil bytes to achieve an additional lossless compression if possible.
  • Each encoded package ends with a sigil byte.
  • 0 is usable as delimiter byte between the packages containing no 0 anymore. It is up to the user to insert the optional delimiters for framing after each or several packages.

2.1. Why not in 2 steps?

  • Usually it is better to divide this task and do compression and COBS encoding separately. This is good if size and time do not really matter.
  • The for TCOBS expected messages are typically in the range of 2 to 300 bytes, but not limited, and a run-length encoding then makes sense for real-time compression.
  • Separating compression and COBS costs more time (2 processing loops) and does not allow to squeeze out the last byte.
  • With the TCOBS algorithm, in only one processing loop a smaller transfer packet size is expected, combined with more speed.

(back to top)

3. Data Disruption Handling

  • In case of data disruption, the receiver will wait for the next 0-delimiter byte. As a result it will get a packet start and end of 2 different packages A and Z.

    Logo
  • For the decoder it makes no difference if the packages starts or ends with a sigil byte. In any case it will run into issues in such case with high probability and report a data disruption. But a false match is not excluded for 100%.

    • If the decoded data are structured, one can estimate the false match probability and increase the safety with an additional package CRC before encoding, if needed.
  • The receiver calls continuously a Read() function. The received buffer can contain 0-delimited packages and the receiver assumes them all to be valid because there is no known significant time delay between package start and end.

  • If a package start was received and the next package end reception is more than ~100ms away, a data disruption is likely and the receiver should ignore these data.

    • Specify a maximum inter-byte delay inside a single package like ~50ms for example.
    • To minimize the loss in case of data disruption, each message should get TCOBS encoded and 0-byte delimited separately.
    • The more often 0-byte delimiters are increasing the transmit overhead a bit on the other hand.
  • Of course, when the receiver starts, the first buffer can contain broken TCOBS data, but we have to live with that on a PC. Anyway there is a reasonable likelihood that a data inconsistency is detected as explained.

(back to top)

4. Current State

  • The TCOBSv1 & TCOBSv2 code is stable and ready to use without limitations.
Property TCOBSv1 TCOBSv2
Code amount 🟒 less 🟑 more
Speed assumption (not measured yet) 🟒 faster 🟒 fast
Compression on short messages from 2 bytes length 🟒 yes 🟒 yes
Compression on messages with many equal bytes in a row 🟑 good 🟒 better
Encoding C language support 🟒 yes 🟒 yes
Decoding C language support 🟒 yes 🟒 yes
Encoding Go language support 🟑 yes with CGO 🟑 yes with CGO
Decoding Go language support 🟒 yes 🟑 yes with CGO
Other language support πŸ†˜ No πŸ†˜ No

(back to top)

5. Use cases

  • Compression is a wide field and there is a lot of excellent code around.
  • But when it comes to very short messages like up to 100 bytes these algorithms fail for one of two reasons:
    • They rely on a case specific runtime generated dictionary, which must be packed into the compressed data as well.
    • They rely on a common dictionary on encoder and decoder side which then is not needed to be a part of the compressed data. An interesting example is SMAZ. But this method is not usable on arbitrary data.
  • If your packages contain many integers, they have statistically more 0xFF and 0x00 bytes: βœ… that is TCOBS is made for.
  • If your packages contain many equal bytes in a row: βœ… that is TCOBS is made for.
  • If your packages contain statistically mixed byte sequences, like encrypted data: πŸ›‘ that is TCOBS is NOT made for. Such data you frame better simply with COBS, even it is possible with TCOBS. A compression may make sense before the encryption.

(back to top)

6. TCOBS Specification

6.1. TCOBSv1 Specification

↩ See ./docs/TCOBSv1Specification.md.

6.2. TCOBSv2 Specification

↩ See ./docs/TCOBSv2Specification.md.

(back to top)

7. Getting Started

7.1. Folder Overview

Name Content
v1 This is a pure Go TCOBSv1 package. Only decoding is supported. Usable in Go apps, which only need to decode. One example is trice. The Go files inside this folder are copied from Cv1.
Cv1 This is the with C-sources and tests extended v1 folder. It provides a TCOBSv1 Go encoding and decoding using CGO. The C-files are direct usable in an embedded project.
Cv2 Here are the TCOBSv2 C-sources usable in an embedded project. The Go-files are the CGO adaption and the Go tests for TCOBSv2.

7.2. TCOBSv1 Go only decode

  • Add import "github.com/rokath/tcobs/v1" to your go source file.
    • Use function tcobs.Decode OR
    • use function tcobs.NewDecoder and then method Read. See read_test.go for an example.

7.3. TCOBSv1 and TCOBSv2 Go with CGO encode and decode

  • Add import "tcobs github.com/rokath/tcobs/Cv1" or import "tcobs github.com/rokath/tcobs/Cv2" to your go source file.
    • Use functions tcobs.CDecode and tcobs.CEncode OR
    • use functions tcobs.NewDecoder and tcobs.NewEncoder and then methods Read and Write. See read_test.go and write_test.go for an example.

7.4. TCOBSv1 and TCOBSv2 C encode and decode

  • Include the Cv1 or Cv2 C sources in your C project. Check tcobsTest.c for usage example.

(back to top)

8. Roadmap

  • Add Changelog
  • Add back to top links
  • Add Go Reader & Writer interface
  • Add generic CCTN & CCQN conversions to remove TCOBSv2 limitations.
  • Improve testing with groups of equal bytes.
  • Add fuzzing testing.
  • Compare efficiency TCOBSv2 with TCOBSv1.

See the open issues for a full list of proposed features (and known issues).

(back to top)

9. Future improvements?

❓ One could think that arbitrary byte buffer examples could be analyzed concerning the statistical usage of bytes and find that 0xFC...0xFF and 0x00...0x20 are used more often than 0xBD for example. This would allow to code some bytes with 5 bits and others with 11 bits creating a universal table, like huffman encoding. This table than is commonly used and the need to pack it into the compressed buffer disappears. Maybe some 2-byte sequences get also in this table and the table code could get enhanced with run-length codes.

❓ Several such "universal" tables are thinkable and during compression the encoder decides which "universal" table fits best for a specific short buffer. Then the table index must get into the compressed data.

❗ Because these "universal" tables then must reside together with the encoder and the decoder, this will increase the needed code space significantly. Alternatively these tables reside accessible outside the embedded device.

(back to top)

10. Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub

(back to top)

11. License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

12. Contact

Thomas HΓΆhenleitner - [email protected] Project Link: https://github.com/rokath/tcobs

13. Acknowledgments

14. Maybe Interesting Too

(back to top)


Logo

TCOBS

Common Object Byte Stuffing with optimized Run-Length Encoding
Explore docs Β»

v1 Code Β· Cv1 Code Β· Cv2 Code Β· Report Bug / Request Feature

tcobs's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

th-sabik

tcobs's Issues

Use positions instead of repetition to compact data better

Using groups of Z as kind of power-of-3 numbers allows further compression:

Z 1 2 3 11 12 13 21 22 23 31 32 33
count 1 2 3 4 5 6 7 8 9 10 11 12
V 0 1 2 00 01 02 10 11 12 20 21 22
value 0 1 2 0 1 2 3 4 5 6 7 8
  • count = 0
  • count = 1 + value, if 1 Z (1=1+0) = 1...3 (3)
  • count = 4 + value, if 2 Z (4=1+3) = 4...12 (9)
  • count = 13 + value if 3 Z (13=1+12) = 13...41 (27)
  • ...
  • Task: Convert a 0x00 count in a Z sequece.

Similar is possible with F and R. The 0xff acts as F1 and the 0xaa acts as R1.

Add reference test data for TCOBS compression benchmarking

TCOBS looks interesting. I would like to compare it against plain COBS, and other algorithms.
Can you please add some reference test data? For example, just a text file, where each line contains bytes in hex:

01 00 00 00
FF FF FF FF
01 02 01 02 AA BB AA BB
...

Of course I can just use a random data set (and this can also be included later for comparison), but reference test data are preferred, for which TCOBS is being developed in the first place. Such reference examples should at least roughly reflect typical data patterns, the approximate min/max length of messages, and serve as initial benchmark data for anyone interested.

p.s. If text files are inconvenient, the data can be saved in any other format more suitable in your opinion.

Add Reader Timeout option for data-disruption cases

Change

func NewDecoder(r io.Reader, size int, multi bool) (p *decoder) 

into

func NewDecoder(r io.Reader, size int, multi bool, timeOut time.Duration) (p *decoder) 

The timeOut, if 0, does not influence the current Read behaviour. Otherwise, if a successfully decoded package needed internally more than the timeOut, Read returns together with the data an error containing a data-disruption warning.

Inconsistend decode data can cause panic.

inconsistent TCOBSv1 buffer: [20 202 5 22 68 213 33 64 158 0]
inconsistent TCOBSv1 buffer: [32 209 0]
inconsistent TCOBSv1 buffer: [8 161 133 0]
inconsistent TCOBSv1 buffer: [8 249 89 0]
panic: runtime error: index out of range [-2]

goroutine 1 [running]:
github.com/rokath/tcobs/v1.repeatByte(...)
        C:/Users/ms/go/pkg/mod/github.com/rokath/tcobs@v0.0.0-20230205132339-8d91e904310e/v1/tcobsDecode.go:127
github.com/rokath/tcobs/v1.Decode({0xc00037a000, 0x10000, 0x10000}, {0xc00031af68, 0x1?, 0xf098})
        C:/Users/ms/go/pkg/mod/github.com/rokath/tcobs@v0.0.0-20230205132339-8d91e904310e/v1/tcobsDecode.go:95 +0x552
github.com/rokath/trice/internal/trexDecoder.(*trexDec).nextPackage(0xc0002fc000)
        C:/repos/trice/internal/trexDecoder/trexDecoder.go:148 +0x5d5
github.com/rokath/trice/internal/trexDecoder.(*trexDec).Read(0xc0002fc000, {0xc00034a000, 0x10000, 0x10000})
        C:/repos/trice/internal/trexDecoder/trexDecoder.go:205 +0x55
github.com/rokath/trice/internal/translator.decodeAndComposeLoop({0xc20420, 0xc000092a08}, 0xc00012dd50, {0xc20e18, 0xc0002fc000}, 0xc0000c7ce8?, 0x8add67?)
        C:/repos/trice/internal/translator/translator.go:143 +0x38c
github.com/rokath/trice/internal/translator.Translate({0xc20420?, 0xc000092a08}, 0xc20280?, 0xc000141830, 0xc0002785b8, 0xc000141980, {0x1f25271ee58?, 0xc0000da540})
        C:/repos/trice/internal/translator/translator.go:79 +0xaa7
github.com/rokath/trice/internal/args.logLoop({0xc20420?, 0xc000092a08}, 0xc000142600)
        C:/repos/trice/internal/args/handler.go:200 +0x730
github.com/rokath/trice/internal/args.Handler({0xc20580, 0xc0000ca008}, 0xc000142600, {0xc0000d4000, 0xc000142600?, 0x0?})
        C:/repos/trice/internal/args/handler.go:100 +0x305
main.doit({0xc20580, 0xc0000ca008}, 0xc00005c000?)
        C:/repos/trice/cmd/trice/main.go:44 +0x165
main.main()
        C:/repos/trice/cmd/trice/main.go:31 +0x50

TCOBSR

Analoguos COBS/R, the last sigil byte could be dropped, when it would be the only one and the then last byte interpreted as sigil byte would cause a sigil byte chain error. When decoding, such situation tells, that the whole message is without any sigil byte.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.