TCOBS v1 & v2

Table of Contents

TCOBS v1 & v2

1. About The project

TCOBS is a variant of COBS combined with real-time RLE data compression especially for short messages containing integers.
The consistent overhead with TCOBS is 1 byte for each starting 31 bytes in the worst case, when no compression is possible. (Example: A 1000 bytes buffer can be encoded with max 33 additional bytes.) This is more compared to the original COBS with +1 byte for each starting 254 bytes, but if the data contain integer numbers, as communication packets often do, the encoded data will be statistically shorter with TCOBS compared to the legacy COBS.

1.1. Assumptions

Most messages like Trices consist of 16 or less bytes.
Some messages or user data are longer.
Several zeros in a row are a common pattern (example:00 00 00 05).
Several 0xFF in a row are a common pattern too (example -1 as 32 bit value).
Maybe some other bytes appear also in a row.
TCOBS does not know the inner data structure and is therefore usable on any user data.

(back to top)

2. Preface

TCOBS was originally developed as an optional Trice part and that's the T is standing for. It aims to reduce the binary trice data together with framing in one step.
- T symbols also the joining of the 2 orthogonal tasks compression and framing.
- Additionally, the usage of ternary and quaternary numbers in TCOBSv2 is reflected in the letter T.
TCOBSv2 is a better approach for TCOBSv1, suited perfect when long sequences of equal characters occur in the data stream.
- The TCOBSv1 compression is expected to be not that good as with TCOBSv2.
About the data is assumed, that 00-bytes and FF-bytes occur a bit more often than other bytes.
The compression aim is more to get a reasonable data reduction with minimal computing effort, than reducing to an absolute minimum. The method shown here simply counts repeated bytes and transforms them into shorter sequences. It works well also on very short messages, like 2 or 4 bytes and on very long buffers. The compressed buffer contains no 00-bytes anymore what is the aim of COBS.
TCOBS is stand-alone usable in any project for package framing with data minimizing.
Use cases in mind are speed, limited bandwidth and long time data recording in the field.
TCOBS is inspired by rlercobs. The ending sigil byte idea comes from rCOBS. It allows a straight forward encoding avoiding lookahead and makes this way the embedded device code simpler.
TCOBS uses various chained sigil bytes to achieve an additional lossless compression if possible.
Each encoded package ends with a sigil byte.
0 is usable as delimiter byte between the packages containing no 0 anymore. It is up to the user to insert the optional delimiters for framing after each or several packages.

2.1. Why not in 2 steps?

Usually it is better to divide this task and do compression and COBS encoding separately. This is good if size and time do not really matter.
The for TCOBS expected messages are typically in the range of 2 to 300 bytes, but not limited, and a run-length encoding then makes sense for real-time compression.
Separating compression and COBS costs more time (2 processing loops) and does not allow to squeeze out the last byte.
With the TCOBS algorithm, in only one processing loop a smaller transfer packet size is expected, combined with more speed.

(back to top)

3. Data Disruption Handling

In case of data disruption, the receiver will wait for the next 0-delimiter byte. As a result it will get a packet start and end of 2 different packages A and Z.
For the decoder it makes no difference if the packages starts or ends with a sigil byte. In any case it will run into issues in such case with high probability and report a data disruption. But a false match is not excluded for 100%.
- If the decoded data are structured, one can estimate the false match probability and increase the safety with an additional package CRC before encoding, if needed.
The receiver calls continuously a Read() function. The received buffer can contain 0-delimited packages and the receiver assumes them all to be valid because there is no known significant time delay between package start and end.
If a package start was received and the next package end reception is more than ~100ms away, a data disruption is likely and the receiver should ignore these data.
- Specify a maximum inter-byte delay inside a single package like ~50ms for example.
- To minimize the loss in case of data disruption, each message should get TCOBS encoded and 0-byte delimited separately.
- The more often 0-byte delimiters are increasing the transmit overhead a bit on the other hand.
Of course, when the receiver starts, the first buffer can contain broken TCOBS data, but we have to live with that on a PC. Anyway there is a reasonable likelihood that a data inconsistency is detected as explained.

(back to top)

4. Current State

The TCOBSv1 & TCOBSv2 code is stable and ready to use without limitations.

Property	TCOBSv1	TCOBSv2
Code amount	🟢 less	🟡 more
Speed assumption (not measured yet)	🟢 faster	🟢 fast
Compression on short messages from 2 bytes length	🟢 yes	🟢 yes
Compression on messages with many equal bytes in a row	🟡 good	🟢 better
Encoding C language support	🟢 yes	🟢 yes
Decoding C language support	🟢 yes	🟢 yes
Encoding Go language support	🟡 yes with CGO	🟡 yes with CGO
Decoding Go language support	🟢 yes	🟡 yes with CGO
Other language support	🆘 No	🆘 No

(back to top)

5. Use cases

Compression is a wide field and there is a lot of excellent code around.
But when it comes to very short messages like up to 100 bytes these algorithms fail for one of two reasons:
- They rely on a case specific runtime generated dictionary, which must be packed into the compressed data as well.
- They rely on a common dictionary on encoder and decoder side which then is not needed to be a part of the compressed data. An interesting example is SMAZ. But this method is not usable on arbitrary data.
If your packages contain many integers, they have statistically more 0xFF and 0x00 bytes: ✅ that is TCOBS is made for.
If your packages contain many equal bytes in a row: ✅ that is TCOBS is made for.
If your packages contain statistically mixed byte sequences, like encrypted data: 🛑 that is TCOBS is NOT made for. Such data you frame better simply with COBS, even it is possible with TCOBS. A compression may make sense before the encryption.

(back to top)

6. TCOBS Specification

6.1. TCOBSv1 Specification

↩ See ./docs/TCOBSv1Specification.md.

6.2. TCOBSv2 Specification

↩ See ./docs/TCOBSv2Specification.md.

(back to top)

7. Getting Started

7.1. Folder Overview

Name	Content
v1	This is a pure Go TCOBSv1 package. Only decoding is supported. Usable in Go apps, which only need to decode. One example is trice. The Go files inside this folder are copied from Cv1.
Cv1	This is the with C-sources and tests extended v1 folder. It provides a TCOBSv1 Go encoding and decoding using CGO. The C-files are direct usable in an embedded project.
Cv2	Here are the TCOBSv2 C-sources usable in an embedded project. The Go-files are the CGO adaption and the Go tests for TCOBSv2.

7.2. TCOBSv1 Go only decode

Add import "github.com/rokath/tcobs/v1" to your go source file.
- Use function tcobs.Decode OR
- use function tcobs.NewDecoder and then method Read. See read_test.go for an example.

7.3. TCOBSv1 and TCOBSv2 Go with CGO encode and decode

Add import "tcobs github.com/rokath/tcobs/Cv1" or import "tcobs github.com/rokath/tcobs/Cv2" to your go source file.
- Use functions tcobs.CDecode and tcobs.CEncode OR
- use functions tcobs.NewDecoder and tcobs.NewEncoder and then methods Read and Write. See read_test.go and write_test.go for an example.

7.4. TCOBSv1 and TCOBSv2 `C` encode and decode

Include the Cv1 or Cv2 C sources in your C project. Check tcobsTest.c for usage example.

(back to top)

8. Roadmap

Add Changelog
Add back to top links
Add Go Reader & Writer interface
Add generic CCTN & CCQN conversions to remove TCOBSv2 limitations.
Improve testing with groups of equal bytes.
Add fuzzing testing.
Compare efficiency TCOBSv2 with TCOBSv1.

See the open issues for a full list of proposed features (and known issues).

(back to top)

9. Future improvements?

❓ One could think that arbitrary byte buffer examples could be analyzed concerning the statistical usage of bytes and find that 0xFC...0xFF and 0x00...0x20 are used more often than 0xBD for example. This would allow to code some bytes with 5 bits and others with 11 bits creating a universal table, like huffman encoding. This table than is commonly used and the need to pack it into the compressed buffer disappears. Maybe some 2-byte sequences get also in this table and the table code could get enhanced with run-length codes.

❓ Several such "universal" tables are thinkable and during compression the encoder decides which "universal" table fits best for a specific short buffer. Then the table index must get into the compressed data.

❗ Because these "universal" tables then must reside together with the encoder and the decoder, this will increase the needed code space significantly. Alternatively these tables reside accessible outside the embedded device.

(back to top)

10. Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub

(back to top)

11. License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

12. Contact

Thomas Höhenleitner - [email protected] Project Link: https://github.com/rokath/tcobs

13. Acknowledgments

14. Maybe Interesting Too

(back to top)

TCOBS

Common Object Byte Stuffing with optimized Run-Length Encoding
Explore docs »

v1 Code · Cv1 Code · Cv2 Code · Report Bug / Request Feature

Z	1	2	3	11	12	13	21	22	23	31	32	33
count	1	2	3	4	5	6	7	8	9	10	11	12
V	0	1	2	00	01	02	10	11	12	20	21	22
value	0	1	2	0	1	2	3	4	5	6	7	8

rokath / tcobs Goto Github PK

tcobs's Introduction

TCOBS v1 & v2

1. About The project

1.1. Assumptions

2. Preface

2.1. Why not in 2 steps?

3. Data Disruption Handling

4. Current State

5. Use cases

6. TCOBS Specification

6.1. TCOBSv1 Specification

6.2. TCOBSv2 Specification

7. Getting Started

7.1. Folder Overview

7.2. TCOBSv1 Go only decode

7.3. TCOBSv1 and TCOBSv2 Go with CGO encode and decode

7.4. TCOBSv1 and TCOBSv2 C encode and decode

8. Roadmap

9. Future improvements?

10. Contributing

11. License

12. Contact

13. Acknowledgments

14. Maybe Interesting Too

TCOBS

tcobs's People

Stargazers

Watchers

Forkers

tcobs's Issues

Coding

Testing

Coding

Testing

Recommend Projects

Recommend Topics

Recommend Org

7.4. TCOBSv1 and TCOBSv2 `C` encode and decode