catid / wirehair

Wirehair: O(N) Fountain Code for Large Data

Home Page: http://wirehairfec.com

License: BSD 3-Clause "New" or "Revised" License

C++ 93.76% C 4.55% CMake 0.59% Python 1.10%

wirehair's Introduction

Wirehair

Fast and Portable Fountain Codes in C

Wirehair produces a stream of error correction blocks from a data source using an erasure code. When enough of these blocks are received, the original data can be recovered.

Compared with similar libraries, Wirehair can produce an unlimited number of error correction blocks and supports much larger block counts. Its running time also grows as O(N) in the amount of input data, rather than O(N log N) like the Leopard block code or O(N^2) like the Fecal fountain code, so it is well suited to large data.

Wirehair is not an ideal MDS (maximum distance separable) code, so it will sometimes fail to recover N original data packets from N received packets; it may take N + 1, N + 2, or more. On average, recovery takes about N + 0.02 packets. Overall, the overhead from this inefficiency is low compared with LDPC and many other fountain codes.
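To put the ~0.02 average overhead in context, the following back-of-the-envelope sketch computes N for a message and estimates how many packets a sender would transmit on average when each packet is lost independently. The helper names are hypothetical (they are not part of the Wirehair API), and the independent-loss model is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Number of packets N needed to carry `messageBytes` in packets of
// `packetBytes` (the last packet may be partial).
uint32_t BlockCount(uint64_t messageBytes, uint32_t packetBytes)
{
    assert(packetBytes > 0);
    return static_cast<uint32_t>((messageBytes + packetBytes - 1) / packetBytes);
}

// Rough mean number of packets a sender must transmit so that a receiver
// collects about N + overhead packets when each packet is lost independently
// with probability `lossRate`. The 0.02 default is the average overhead
// quoted above; individual runs sometimes need N + 1, N + 2, or more.
double ExpectedPacketsToSend(uint32_t n, double lossRate, double overhead = 0.02)
{
    assert(lossRate >= 0.0 && lossRate < 1.0);
    return (n + overhead) / (1.0 - lossRate);
}
```

For the 1,000,333-byte message with 1400-byte packets used in the example program, N = 715, and at 10% loss the estimate is roughly 794.5 transmissions.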

A simple C API is provided to make it easy to incorporate into existing projects. No external dependencies are required.

Building: Quick Setup

The source code in this folder (gf256 and wirehair code) can be incorporated into your project without any other external dependencies.

To build the software in this repo:

On Windows, make sure CMake and Git Bash are installed. Open Git Bash and run:

git clone git@github.com:catid/wirehair.git
cd wirehair
mkdir build
cd build
cmake .. -G "Visual Studio 16 2019"
ls
explorer .

Then open the wirehair.sln file in Visual Studio Community Edition and build the software.

Example Usage

Here's an example program using Wirehair. It's included in the UnitTest project and demonstrates both the sender and the receiver, which are normally separate programs. For example, the sender might be a file server, and the receiver might be a client downloading a file from it.

#include <wirehair/wirehair.h>

#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

using namespace std;

static bool ReadmeExample()
{
    // Size of packets to produce
    static const int kPacketSize = 1400;

    // Note: Does not need to be an even multiple of packet size or 16 etc
    static const int kMessageBytes = 1000 * 1000 + 333;

    vector<uint8_t> message(kMessageBytes);

    // Fill message contents
    memset(&message[0], 1, message.size());

    // Create encoder
    WirehairCodec encoder = wirehair_encoder_create(nullptr, &message[0], kMessageBytes, kPacketSize);
    if (!encoder)
    {
        cout << "!!! Failed to create encoder" << endl;
        return false;
    }

    // Create decoder
    WirehairCodec decoder = wirehair_decoder_create(nullptr, kMessageBytes, kPacketSize);
    if (!decoder)
    {
        // Free memory for encoder
        wirehair_free(encoder);

        cout << "!!! Failed to create decoder" << endl;
        return false;
    }

    unsigned blockId = 0, needed = 0;

    for (;;)
    {
        // Select which block to encode.
        // Note: First N blocks are the original data, so it's possible to start
        // sending data while wirehair_encoder_create() is getting started.
        blockId++;

        // Simulate 10% packetloss
        if (blockId % 10 == 0) {
            continue;
        }

        // Keep track of how many pieces were needed
        ++needed;

        vector<uint8_t> block(kPacketSize);

        // Encode a packet
        uint32_t writeLen = 0;
        WirehairResult encodeResult = wirehair_encode(
            encoder, // Encoder object
            blockId, // ID of block to generate
            &block[0], // Output buffer
            kPacketSize, // Output buffer size
            &writeLen); // Returned block length

        if (encodeResult != Wirehair_Success)
        {
            cout << "wirehair_encode failed: " << encodeResult << endl;
            wirehair_free(encoder);
            wirehair_free(decoder);
            return false;
        }

        // Attempt decode
        WirehairResult decodeResult = wirehair_decode(
            decoder, // Decoder object
            blockId, // ID of block that was encoded
            &block[0], // Input block
            writeLen); // Block length

        // If decoder returns success:
        if (decodeResult == Wirehair_Success) {
            // Decoder has enough data to recover now
            break;
        }

        if (decodeResult != Wirehair_NeedMore)
        {
            cout << "wirehair_decode failed: " << decodeResult << endl;
            wirehair_free(encoder);
            wirehair_free(decoder);
            return false;
        }
    }

    vector<uint8_t> decoded(kMessageBytes);

    // Recover original data on decoder side
    WirehairResult decodeResult = wirehair_recover(
        decoder,
        &decoded[0],
        kMessageBytes);

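    // Free memory for encoder and decoder (no longer needed after recovery)
    wirehair_free(encoder);
    wirehair_free(decoder);

    if (decodeResult != Wirehair_Success)
    {
        cout << "wirehair_recover failed: " << decodeResult << endl;
        return false;
    }

    // Check that the recovered data matches the original message
    if (decoded != message)
    {
        cout << "!!! Recovered data does not match the original" << endl;
        return false;
    }

    return true;
}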

int main()
{
    const WirehairResult initResult = wirehair_init();

    if (initResult != Wirehair_Success)
    {
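        // SIAMESE_DEBUG_BREAK() appears to come from the UnitTest project's
        // SiameseTools helpers; remove it when building this example standalone
        SIAMESE_DEBUG_BREAK();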
        cout << "!!! Wirehair initialization failed: " << initResult << endl;
        return -1;
    }

    if (!ReadmeExample())
    {
        SIAMESE_DEBUG_BREAK();
        cout << "!!! Example usage failed" << endl;
        return -2;
    }
...
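In a real deployment the sender and receiver are separate programs, so each datagram has to carry the block ID alongside the encoded bytes for the receiver to pass to wirehair_decode(). The sketch below shows one hypothetical wire format (a 4-byte little-endian block ID followed by the payload); the format and the helper names FramePacket and ParsePacket are illustrative assumptions, not something Wirehair defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical datagram layout: [blockId: uint32 little-endian][block bytes]
std::vector<uint8_t> FramePacket(uint32_t blockId, const uint8_t* block, size_t blockBytes)
{
    std::vector<uint8_t> packet(4 + blockBytes);
    for (int i = 0; i < 4; ++i) {
        packet[i] = static_cast<uint8_t>(blockId >> (8 * i));
    }
    for (size_t i = 0; i < blockBytes; ++i) {
        packet[4 + i] = block[i];
    }
    return packet;
}

// Splits a received datagram back into block ID and payload.
// Returns false if the datagram is too short to contain the header.
bool ParsePacket(const uint8_t* data, size_t bytes,
                 uint32_t& blockId, const uint8_t*& payload, size_t& payloadBytes)
{
    if (bytes < 4) {
        return false;
    }
    blockId = 0;
    for (int i = 0; i < 4; ++i) {
        blockId |= static_cast<uint32_t>(data[i]) << (8 * i);
    }
    payload = data + 4;
    payloadBytes = bytes - 4;
    return true;
}
```

On the sending side, the payload length should be the writeLen reported by wirehair_encode(), since the last block of a message can be shorter than the packet size; on the receiving side, the parsed block ID, payload pointer, and length map onto the wirehair_decode() arguments used in the example.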

Benchmarks

Some quick comments:

Benchmarks on my PC do not mean a whole lot: right now it's clocked at 3 GHz with Turbo Boost on, etc. To run the test yourself, build and run the UnitTest project in Release mode.

For small N (below roughly 128), Wirehair is a fairly inefficient codec compared with the Fecal codec. Fecal is also a fountain code, but it is limited to repairing a small number of losses or to a small input block count.

For N = 2 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 11 usec (236.364 MBPS)
+ Average wirehair_encode() time: 0 usec (7435.7 MBPS)
+ Average wirehair_decode() time: 2 usec (476.205 MBPS)
+ Average overhead piece count beyond N = 0.0105
+ Average wirehair_recover() time: 0 usec (9319 MBPS)

For N = 4 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 8 usec (650 MBPS)
+ Average wirehair_encode() time: 0 usec (8353.43 MBPS)
+ Average wirehair_decode() time: 1 usec (695.102 MBPS)
+ Average overhead piece count beyond N = 0.0225
+ Average wirehair_recover() time: 0 usec (11219 MBPS)

For N = 8 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 13 usec (800 MBPS)
+ Average wirehair_encode() time: 0 usec (7916.2 MBPS)
+ Average wirehair_decode() time: 1 usec (704.359 MBPS)
+ Average overhead piece count beyond N = 0.0045
+ Average wirehair_recover() time: 1 usec (8973.25 MBPS)

For N = 16 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 27 usec (770.37 MBPS)
+ Average wirehair_encode() time: 0 usec (7993.4 MBPS)
+ Average wirehair_decode() time: 1 usec (707.211 MBPS)
+ Average overhead piece count beyond N = 0.036
+ Average wirehair_recover() time: 2 usec (9116.81 MBPS)

For N = 32 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 41 usec (1014.63 MBPS)
+ Average wirehair_encode() time: 0 usec (7062.93 MBPS)
+ Average wirehair_decode() time: 1 usec (908.097 MBPS)
+ Average overhead piece count beyond N = 0.0195
+ Average wirehair_recover() time: 5 usec (8057.33 MBPS)

For N = 64 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 81 usec (1027.16 MBPS)
+ Average wirehair_encode() time: 0 usec (7159.51 MBPS)
+ Average wirehair_decode() time: 1 usec (1033.95 MBPS)
+ Average overhead piece count beyond N = 0.017
+ Average wirehair_recover() time: 10 usec (7640.74 MBPS)

For N = 128 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 192 usec (866.667 MBPS)
+ Average wirehair_encode() time: 0 usec (5662.07 MBPS)
+ Average wirehair_decode() time: 1 usec (870.14 MBPS)
+ Average overhead piece count beyond N = 0.015
+ Average wirehair_recover() time: 25 usec (6419.38 MBPS)

For N = 256 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 319 usec (1043.26 MBPS)
+ Average wirehair_encode() time: 0 usec (6333.2 MBPS)
+ Average wirehair_decode() time: 1 usec (1018.77 MBPS)
+ Average overhead piece count beyond N = 0.022
+ Average wirehair_recover() time: 50 usec (6602.26 MBPS)

For N = 512 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 670 usec (993.433 MBPS)
+ Average wirehair_encode() time: 0 usec (6483.91 MBPS)
+ Average wirehair_decode() time: 1 usec (1028.85 MBPS)
+ Average overhead piece count beyond N = 0.022
+ Average wirehair_recover() time: 100 usec (6600.1 MBPS)

For N = 1024 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 1697 usec (784.443 MBPS)
+ Average wirehair_encode() time: 0 usec (5309.05 MBPS)
+ Average wirehair_decode() time: 1 usec (671.005 MBPS)
+ Average overhead piece count beyond N = 0.022
+ Average wirehair_recover() time: 207 usec (6404.05 MBPS)

For N = 2048 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 3227 usec (825.039 MBPS)
+ Average wirehair_encode() time: 0 usec (5202.3 MBPS)
+ Average wirehair_decode() time: 1 usec (683.141 MBPS)
+ Average overhead piece count beyond N = 0.021
+ Average wirehair_recover() time: 441 usec (6026.08 MBPS)

For N = 4096 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 7614 usec (699.343 MBPS)
+ Average wirehair_encode() time: 0 usec (4334.08 MBPS)
+ Average wirehair_decode() time: 2 usec (577.674 MBPS)
+ Average overhead piece count beyond N = 0.0215
+ Average wirehair_recover() time: 1208 usec (4405.65 MBPS)

For N = 8192 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 17208 usec (618.875 MBPS)
+ Average wirehair_encode() time: 0 usec (3277.17 MBPS)
+ Average wirehair_decode() time: 2 usec (521.665 MBPS)
+ Average overhead piece count beyond N = 0.075
+ Average wirehair_recover() time: 2916 usec (3651.35 MBPS)

For N = 16384 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 42512 usec (501.016 MBPS)
+ Average wirehair_encode() time: 0 usec (2646.89 MBPS)
+ Average wirehair_decode() time: 2 usec (435.173 MBPS)
+ Average overhead piece count beyond N = 0.015
+ Average wirehair_recover() time: 7282 usec (2924.63 MBPS)

For N = 32768 packets of 1300 bytes:
+ Average wirehair_encoder_create() time: 111287 usec (382.78 MBPS)
+ Average wirehair_encode() time: 0 usec (2378.29 MBPS)
+ Average wirehair_decode() time: 3 usec (342.556 MBPS)
+ Average overhead piece count beyond N = 0.0195
+ Average wirehair_recover() time: 16326 usec (2609.23 MBPS)
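The MBPS figures appear to be total bytes processed divided by elapsed microseconds (1 byte/usec = 1 MB/s with MB = 10^6 bytes): for example, 2 × 1300 bytes / 11 usec ≈ 236.364, matching the first row, and 32768 × 1300 bytes / 111287 usec ≈ 382.78 matches the last. This relation is inferred from the listed numbers rather than documented. A small sketch of that arithmetic, with hypothetical helper names, for comparing your own runs:

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MB/s (MB = 10^6 bytes) for processing n blocks of
// blockBytes each in `usec` microseconds: bytes per microsecond equals MB/s.
double ThroughputMBps(uint64_t n, uint64_t blockBytes, double usec)
{
    assert(usec > 0.0);
    return static_cast<double>(n * blockBytes) / usec;
}

// Per-block time in microseconds implied by a reported throughput, useful
// for rows where the printed time rounds down to 0 usec.
double UsecPerBlock(uint64_t blockBytes, double mbps)
{
    assert(mbps > 0.0);
    return static_cast<double>(blockBytes) / mbps;
}
```

Assuming each wirehair_encode() call produces one 1300-byte block, 7435.7 MBPS corresponds to about 0.17 usec per call, which would explain why those rows print 0 usec.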

Credits

Software by Christopher A. Taylor [email protected]

Please reach out if you need support or would like to collaborate on a project.

wirehair's People

shteou sheps dumpforjunk tmick0 jjfeing stevebernard sakridge chlangjou templeblock losynix whzhou drneurosurg danieagle foobar2019 chaser-wind girgitt rjharmon dorlan gitai feiyunwill 516025 wahern 3v1lw1th1n zeta1999 taoistking h-bo hanseuljun blairtyx wangjinzhouqw externpro ottopia-tech igorauad ehsanen manifoldx alan-ic eyecon pfect ochafik maatticlabs fortitudepub jacobgorm gholdzhang ycyang0508 xingangl alcsyooterranf 1f604 0xhellord xdcesc d9pouces politesse724 neatsys jamestiotio

wirehair's Issues

Add comprehensive error logging

Hello again. I've finally found some time to create a Kotlin wrapper around wirehair. Here is its first raw version: https://github.com/seniorjoinu/wirehair-wrapper

There is a test, https://github.com/seniorjoinu/wirehair-wrapper/blob/master/src/test/kotlin/net/joinu/wirehair/ReadmeTest.kt, that copies the test from wirehair's README.md.
The problem is that it doesn't work and I cannot debug it, because there are no detailed error logs. Right now it fails at:

val encoder = Wirehair.WRAPPER.wirehair_encoder_create(null, message, kMessageBytes, kPacketSize) // it returns null and doesn't say why

Is there a way to run on the ARM architecture?

Thank you for sharing a good implementation. I tried to compile the code on a Raspberry Pi but got "fatal error: tmmintrin.h: No such file or directory" in gf256.h. Is it possible to encode and decode blocks on the ARM architecture? Thanks!

Encoder/Decoder usage question (wirehair_test.cpp)

No issues

Recent ARM Neon detection change breaks IOS build

Commit 4e1c039 forces the definition of LINUX_ARM in gf256.h, which in turn triggers unconditional inclusion of elf.h and other Linux-specific header files, breaking the build on iOS.

This change would seem to fix the issue:

diff --git a/gf256.h b/gf256.h
index adc3a27..8396a44 100644
--- a/gf256.h
+++ b/gf256.h
@@ -54,7 +54,9 @@
 // Platform/Architecture
 
 #if defined(__ARM_ARCH) || defined(__ARM_NEON) || defined(__ARM_NEON__)
-    #define LINUX_ARM
+    #if !defined IOS
+        #define LINUX_ARM
+    #endif
 #endif
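An alternative guard, offered only as a hedged suggestion and not as the project's actual fix, would key the Linux-only code on the compiler's own __linux__ macro instead of a user-supplied IOS define, so Apple ARM targets never reach elf.h. In this sketch LINUX_ARM_SKETCH and BuildUsesLinuxArmPath are hypothetical names:

```cpp
#include <cassert>

// Define LINUX_ARM_SKETCH only for ARM builds that target Linux (including
// Android, whose toolchains also define __linux__). Apple toolchains do not
// define __linux__, so iOS builds skip the Linux-only headers.
#if (defined(__ARM_ARCH) || defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__linux__)
    #define LINUX_ARM_SKETCH 1
#else
    #define LINUX_ARM_SKETCH 0
#endif

// Reports at runtime which branch the preprocessor chose for this build.
bool BuildUsesLinuxArmPath()
{
    return LINUX_ARM_SKETCH != 0;
}
```

The design choice is to rely on macros the compiler always sets rather than on a define every iOS project must remember to pass.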

Struggling to build on Ubuntu 20.04

Hi,

I have tried to build the master branch at commit e42951f0b5601d867695921cdd5e8a403188df56

It fails with the following commands and output. Do you have an idea why?

➜  wirehair git:(master) ✗ cd build  
➜  build git:(master) ✗ cmake ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/REDACTED/git_repositories/wirehair/build
➜  build git:(master) ✗ make
Scanning dependencies of target gen_tables
[  4%] Building CXX object CMakeFiles/gen_tables.dir/test/SiameseTools.cpp.o
[  8%] Building CXX object CMakeFiles/gen_tables.dir/tables/TableGenerator.cpp.o
/home/REDACTED/git_repositories/wirehair/tables/TableGenerator.cpp: In function ‘void ShuffleDeck16(siamese::PCGRandom&, uint16_t*, uint32_t)’:
/home/REDACTED/git_repositories/wirehair/tables/TableGenerator.cpp:1070:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1070 |                 ++ii;
      |                 ^~~~
/home/REDACTED/git_repositories/wirehair/tables/TableGenerator.cpp:1071:13: note: here
 1071 |             case 2:
      |             ^~~~
/home/REDACTED/git_repositories/wirehair/tables/TableGenerator.cpp:1075:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1075 |                 ++ii;
      |                 ^~~~
/home/REDACTED/git_repositories/wirehair/tables/TableGenerator.cpp:1076:13: note: here
 1076 |             case 1:
      |             ^~~~
[ 12%] Building CXX object CMakeFiles/gen_tables.dir/tables/HeavyRowGenerator.cpp.o
/home/REDACTED/git_repositories/wirehair/tables/HeavyRowGenerator.cpp: In function ‘void PrintMatrix(const uint8_t*)’:
/home/REDACTED/git_repositories/wirehair/tables/HeavyRowGenerator.cpp:212:20: warning: unused variable ‘modulus’ [-Wunused-variable]
  212 |     const unsigned modulus = 8;
      |                    ^~~~~~~
[ 16%] Building CXX object CMakeFiles/gen_tables.dir/gf256.cpp.o
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_add_mem(void*, const void*, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:821:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
  821 |     case 3: x1[offset + 2] ^= y1[offset + 2];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:822:5: note: here
  822 |     case 2: x1[offset + 1] ^= y1[offset + 1];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:822:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
  822 |     case 2: x1[offset + 1] ^= y1[offset + 1];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:823:5: note: here
  823 |     case 1: x1[offset] ^= y1[offset];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_add2_mem(void*, const void*, const void*, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:941:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
  941 |     case 3: z1[offset + 2] ^= x1[offset + 2] ^ y1[offset + 2];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:942:5: note: here
  942 |     case 2: z1[offset + 1] ^= x1[offset + 1] ^ y1[offset + 1];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:942:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
  942 |     case 2: z1[offset + 1] ^= x1[offset + 1] ^ y1[offset + 1];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:943:5: note: here
  943 |     case 1: z1[offset] ^= x1[offset] ^ y1[offset];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_addset_mem(void*, const void*, const void*, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1096:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1096 |     case 3: z1[offset + 2] = x1[offset + 2] ^ y1[offset + 2];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1097:5: note: here
 1097 |     case 2: z1[offset + 1] = x1[offset + 1] ^ y1[offset + 1];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1097:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1097 |     case 2: z1[offset + 1] = x1[offset + 1] ^ y1[offset + 1];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1098:5: note: here
 1098 |     case 1: z1[offset] = x1[offset] ^ y1[offset];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_mul_mem(void*, const void*, uint8_t, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1260:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1260 |     case 3: z1[offset + 2] = table[x1[offset + 2]];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1261:5: note: here
 1261 |     case 2: z1[offset + 1] = table[x1[offset + 1]];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1261:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1261 |     case 2: z1[offset + 1] = table[x1[offset + 1]];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1262:5: note: here
 1262 |     case 1: z1[offset] = table[x1[offset]];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_muladd_mem(void*, uint8_t, const void*, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1489:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1489 |     case 3: z1[offset + 2] ^= table[x1[offset + 2]];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1490:5: note: here
 1490 |     case 2: z1[offset + 1] ^= table[x1[offset + 1]];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1490:28: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1490 |     case 2: z1[offset + 1] ^= table[x1[offset + 1]];
      |             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1491:5: note: here
 1491 |     case 1: z1[offset] ^= table[x1[offset]];
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp: In function ‘void gf256_memswap(void*, void*, int)’:
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1562:84: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1562 |     case 3: temp = x1[offset + 2]; x1[offset + 2] = y1[offset + 2]; y1[offset + 2] = temp;
      |                                                                     ~~~~~~~~~~~~~~~^~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1563:5: note: here
 1563 |     case 2: temp = x1[offset + 1]; x1[offset + 1] = y1[offset + 1]; y1[offset + 1] = temp;
      |     ^~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1563:84: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1563 |     case 2: temp = x1[offset + 1]; x1[offset + 1] = y1[offset + 1]; y1[offset + 1] = temp;
      |                                                                     ~~~~~~~~~~~~~~~^~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1564:5: note: here
 1564 |     case 1: temp = x1[offset]; x1[offset] = y1[offset]; y1[offset] = temp;
      |     ^~~~
In file included from /home/REDACTED/git_repositories/wirehair/gf256.h:70,
                 from /home/REDACTED/git_repositories/wirehair/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/9/include/tmmintrin.h: In function ‘void gf256_mul_mem(void*, const void*, uint8_t, int)’:
/usr/lib/gcc/x86_64-linux-gnu/9/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
  136 | _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1197:34: note: called from here
 1197 |             h0 = _mm_shuffle_epi8(table_hi_y, h0);
      |                  ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
In file included from /home/REDACTED/git_repositories/wirehair/gf256.h:70,
                 from /home/REDACTED/git_repositories/wirehair/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/9/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
  136 | _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1196:34: note: called from here
 1196 |             l0 = _mm_shuffle_epi8(table_lo_y, l0);
      |                  ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
In file included from /home/REDACTED/git_repositories/wirehair/gf256.h:70,
                 from /home/REDACTED/git_repositories/wirehair/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/9/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
  136 | _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1197:34: note: called from here
 1197 |             h0 = _mm_shuffle_epi8(table_hi_y, h0);
      |                  ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
In file included from /home/REDACTED/git_repositories/wirehair/gf256.h:70,
                 from /home/REDACTED/git_repositories/wirehair/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/9/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
  136 | _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
/home/REDACTED/git_repositories/wirehair/gf256.cpp:1196:34: note: called from here
 1196 |             l0 = _mm_shuffle_epi8(table_lo_y, l0);
      |                  ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/gen_tables.dir/build.make:102: CMakeFiles/gen_tables.dir/gf256.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:88: CMakeFiles/gen_tables.dir/all] Error 2
make: *** [Makefile:130: all] Error

Python demo script causes segfault with Python2

Running on a 12-thread i7 under 64-bit Linux (Ubuntu Bionic). I compiled and installed libwirehair-shared.so and ran python2 whirehair.py:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5d6937e in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
    at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
4051	    if ((uint16_t)block_id == _block_count - 1) {

GDB stack trace:

#0  0x00007ffff5d6937e in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
    at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
#1  0x00007ffff5d59af4 in wirehair_encode (codec=0x55b6ba50, blockId=1, blockDataOut=0x555555bb2ef0, outBytes=32, dataBytesOut=0x7ffff7ec9910)
    at /home/(redacted)/software/external/wirehair/wirehair.cpp:139
#2  0x00007ffff5f9bdae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#3  0x00007ffff5f9b71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#4  0x00007ffff61aead4 in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#5  0x00007ffff61ae4d5 in ?? () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#6  0x000055555564df9e in PyEval_EvalFrameEx ()
#7  0x0000555555646b0a in PyEval_EvalCodeEx ()
#8  0x0000555555646429 in PyEval_EvalCode ()
#9  0x00005555556764cf in ?? ()
#10 0x0000555555671442 in PyRun_FileExFlags ()
#11 0x00005555556708bd in PyRun_SimpleFileExFlags ()
#12 0x000055555562075b in Py_Main ()
#13 0x00007ffff7a05b97 in __libc_start_main (main=0x5555556200c0 <main>, argc=2, argv=0x7fffffffde28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffde18) at ../csu/libc-start.c:310
#14 0x000055555561ffda in _start ()

Works fine in Python 3. I am currently debugging.
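One detail in the trace may be worth checking, offered as a hypothesis rather than a confirmed diagnosis: codec=0x55b6ba50 is a 32-bit value, while other heap pointers in the same frame (e.g. block_out=0x555555bb2ef0) are 48-bit, which is what a pointer looks like after its upper 32 bits are discarded. Python's ctypes assumes a foreign function returns a C int unless its restype is set, so a codec handle returned that way could be truncated. The arithmetic, using a hypothetical original value 0x555555b6ba50:

```cpp
#include <cassert>
#include <cstdint>

// Simulates returning a 64-bit pointer through a 32-bit int, as happens when
// a foreign function's return type is declared (or defaulted) as C int:
// only the low 32 bits survive.
uint64_t TruncateToInt32Bits(uint64_t pointerValue)
{
    return static_cast<uint32_t>(pointerValue);
}

// A pointer survives the round trip only if its high 32 bits were zero.
bool SurvivesInt32RoundTrip(uint64_t pointerValue)
{
    return TruncateToInt32Bits(pointerValue) == pointerValue;
}
```

If this is the cause, setting restype to ctypes.c_void_p for the pointer-returning wirehair functions in the script would be the usual remedy; this alone does not explain why Python 3 behaves differently.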

Practicality of supporting more than 65535 blocks?

I've got a 900MB file that I want to break into UDP-sized blocks of ~1400 bytes. Thus, I need over 600,000 blocks, but the various block counters in the codec are limited to uint16_t. This actually fails silently, which is probably a bug, but I was wondering whether it would be possible to support this, or whether I'm asking for a horrible memory explosion by attempting it.
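Within 16-bit block counters, one common workaround, sketched here as an assumption rather than anything the library provides, is to split the file into independent segments that each stay under the block limit and to run a separate encoder/decoder pair per segment. For a 900,000,000-byte file in 1400-byte blocks that is about 642,858 blocks, or 10 segments at a hypothetical 65,535-block cap (the codec's real maximum may be lower; check the limit the library documents). The cost is that the receiver needs roughly N + overhead packets for every segment rather than for the file as a whole. Segment and SplitIntoSegments are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One independently encoded piece of a large file.
struct Segment {
    uint64_t offset;  // byte offset of the segment within the file
    uint64_t bytes;   // segment length in bytes
};

// Splits `totalBytes` into segments that each need at most
// `maxBlocksPerSegment` blocks of `blockBytes`, balancing block counts.
std::vector<Segment> SplitIntoSegments(uint64_t totalBytes, uint32_t blockBytes,
                                       uint32_t maxBlocksPerSegment)
{
    assert(blockBytes > 0 && maxBlocksPerSegment > 0);
    const uint64_t totalBlocks = (totalBytes + blockBytes - 1) / blockBytes;
    const uint64_t segmentCount =
        (totalBlocks + maxBlocksPerSegment - 1) / maxBlocksPerSegment;

    std::vector<Segment> segments;
    uint64_t offset = 0;
    for (uint64_t i = 0; i < segmentCount; ++i) {
        // Spread blocks evenly; earlier segments take any remainder.
        const uint64_t blocks = totalBlocks / segmentCount +
                                ((i < totalBlocks % segmentCount) ? 1 : 0);
        uint64_t bytes = blocks * blockBytes;
        if (offset + bytes > totalBytes) {
            bytes = totalBytes - offset;  // the last segment may end mid-block
        }
        segments.push_back({offset, bytes});
        offset += bytes;
    }
    return segments;
}
```

Each segment's offset and length would go to its own wirehair_encoder_create() call, and packets would then need a segment index in addition to the block ID.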

Android platform armv7-a-neon issue

09-18 18:54:44.590 11127 11127 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
09-18 18:54:44.590 11127 11127 F DEBUG : Build fingerprint: 'OneCloud/onething_onecloudpro/onething_onecloudpro:7.1.2/NHG47L/20180910:userdebug/test-keys'
09-18 18:54:44.590 11127 11127 F DEBUG : Revision: '0'
09-18 18:54:44.591 11127 11127 F DEBUG : ABI: 'arm'
09-18 18:54:44.591 11127 11127 F DEBUG : pid: 11124, tid: 11124, name: gen_dcounts >>> ./gen_dcounts <<<
09-18 18:54:44.591 11127 11127 F DEBUG : signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xea2b105d
09-18 18:54:44.591 11127 11127 F DEBUG : r0 ea2b105d r1 aad3c390 r2 00000000 r3 ea2b1059
09-18 18:54:44.591 11127 11127 F DEBUG : r4 0000000d r5 0d000000 r6 148db80c r7 0000df35
09-18 18:54:44.591 11127 11127 F DEBUG : r8 148db80c r9 0021df35 sl 00000095 fp ea2b105d
09-18 18:54:44.591 11127 11127 F DEBUG : ip 00000057 sp ff8db3f8 lr 000000ad pc aacd5170 cpsr 200f0010
09-18 18:54:44.623 11127 11127 F DEBUG :
09-18 18:54:44.623 11127 11127 F DEBUG : backtrace:
09-18 18:54:44.623 11127 11127 F DEBUG : #00 pc 00006170 /mnt/nfs/android/gen_dcounts (gf256_muladd_mem+272)
09-18 18:54:44.648 4014 4111 W NativeCrashListener: Couldn't find ProcessRecord for pid 11124
09-18 18:54:44.653 3559 3559 W : debuggerd: resuming target 11124
09-18 18:54:44.656 4014 4032 I BootReceiver: Copying /data/tombstones/tombstone_04 to DropBox (SYSTEM_TOMBSTONE)
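SIGBUS with code BUS_ADRALN means the CPU rejected a misaligned memory access, and the fault address 0xea2b105d is not a multiple of 4. Whether that is what happens inside gf256_muladd_mem is an assumption here. The usual portable way to read or update a word at an arbitrary byte address, shown as a general technique rather than the fix the project adopted, is to go through memcpy (LoadU32 and XorU32 are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit word from a possibly misaligned address. Dereferencing
// *(const uint32_t*)p is undefined behavior when p is misaligned and can
// raise SIGBUS (BUS_ADRALN) on strict-alignment CPUs; memcpy is always safe.
uint32_t LoadU32(const uint8_t* p)
{
    uint32_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// XORs a 32-bit word into a possibly misaligned destination.
void XorU32(uint8_t* dst, uint32_t value)
{
    const uint32_t current = LoadU32(dst) ^ value;
    std::memcpy(dst, &current, sizeof(current));
}
```

Compilers generally lower these small fixed-size memcpy calls to loads and stores that respect the target's alignment rules, so the aligned fast path costs little.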

wirehair compiler error

Hi!

First, many thanks for this library.

The full error report is here

My system is AMD64 Linux, Ubuntu 20.04 LTS, gcc-10.2/g++-10.2.

Many thanks in advance!

Best wishes, Daniel.

License?

Thanks for writing & releasing this!

Is there some information regarding the license? I'm looking at using your library in https://github.com/AIS-Bonn/nimbro_network, but I can't do that without a proper license. BSD-3 or compatible would be optimal...

Adversarial erasure

If my understanding is correct, https://pdfs.semanticscholar.org/356f/0eaa61a54eda1adc99a794b28a55f8acc956.pdf applies to wirehair. If this is the case, then the statement:

It may take N + 1 or N + 2 or more

Would deserve clarification: is it possible to erase a certain, rather "unfortunate" combination of symbols that makes the decoder fail even when it receives far more than N symbols? Finding the exact trapping sets appears to be non-trivial in practice, but even with simple brute force, RaptorQ can be made to fail with just a few hundred picked symbols, even at rate 1/2.

(solution seems to be to either keep PRNG secret from adversary, or use MDS)

How do I create a shared library on Linux? Thanks!

(Wishlist) How do I create a shared library on Linux AMD64?

And thanks for your excellent work! :-)

Regards, Daniel.

What needs to be done to be more Raptor-like (>N up to 2^24)

From what I understand, wirehair is LDPC over GF(256), instead of the textbook XORs in GF(2) used by Luby. The boatload of matrix algebra used to solve block dependencies is largely moon math to me, but it seems to involve some sort of Cauchy systematic code bolted onto Luby, so as to drastically improve recoverability and contain the combinatorial explosion of higher-order blocks seen in naive LT. So it is more or less RaptorQ, but with a bag of tricks to make it run fast.

So I surmise that N > 2^16 (with K < 2^16) could be possible here, just as with RaptorQ.

I want to check before I go on a misguided adventure of mass-replacing uint16_t's with uint32_t's and trying to bend the tabgens for larger Ns: is my thinking entirely misguided, and is there some hard limit on N even when K < 2^16 (for instance, the resulting compressed matrix for GE)? Are there any more gotchas I should look for in my ill-fated attempt?

Let me stroke your ego in exchange: Wirehair is hands down the best-performing LDPC codec, and the code documentation is top notch (even if a lot of the tricks still fly over my head). The limited N is the one wrinkle here, and it bars deployment as a drop-in replacement for RQ in high block loss scenarios.

Error compiling

Running into the following error while compiling a program that uses wirehair:

In file included from ./wirehair/cm256.h:32:0,
from ./fec.h:17,
from ./blockencodings.h:8,
from blockencodings.cpp:6:
./wirehair/gf256.h:70:14: fatal error: tmmintrin.h: No such file or directory
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8
^~~~~~~~~~~~~
compilation terminated.

This is on Ubuntu.
Any help or guidance is appreciated.

(Hint) If compiling without CMake, use "-msse3" and "-msse4.1" to compile successfully.

Hi Catid!

Many thanks for your FEC libraries!

For anyone trying to compile Wirehair without CMake:
just add "-msse3" and "-msse4.1" to compile successfully.

System: Linux Ubuntu AMD64
compiler: gcc-12
CPU: intel core i7

Best wishes,
Gratefully,
Daniel.
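_mm_shuffle_epi8, the intrinsic named in the build errors above, is an SSSE3 instruction, and GCC's -msse4.1 implies SSSE3, which is likely why the flags in this hint work (-mssse3 on its own should also suffice). On ARM, as in the Raspberry Pi report, <tmmintrin.h> does not exist at all, so no x86 flag can help there. A quick way to see which instruction sets a given set of flags enables is to inspect the compiler's predefined macros; EnabledSimd and CanUsePshufb are hypothetical names, and the macros are the GCC/Clang ones (MSVC uses different ones):

```cpp
#include <cassert>
#include <string>

// Lists the SIMD extensions the compiler targets for this translation unit,
// based on GCC/Clang predefined macros.
std::string EnabledSimd()
{
    std::string s;
#if defined(__SSE3__)
    s += "SSE3 ";
#endif
#if defined(__SSSE3__)
    s += "SSSE3 ";  // required by _mm_shuffle_epi8 in <tmmintrin.h>
#endif
#if defined(__SSE4_1__)
    s += "SSE4.1 ";
#endif
#if defined(__AVX2__)
    s += "AVX2 ";
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    s += "NEON ";
#endif
    return s.empty() ? "none" : s;
}

// True when code calling _mm_shuffle_epi8 can compile with these flags.
bool CanUsePshufb()
{
#if defined(__SSSE3__)
    return true;
#else
    return false;
#endif
}
```

Plain g++ on x86-64 typically reports none of these (the x86-64 baseline stops at SSE2), while g++ -msse4.1 should report SSE3, SSSE3, and SSE4.1.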

Wirehair_ExtraInsufficient

Thanks for your code!
What does this error mean?
Much appreciated.

API Edge Cases Undocumented

Some fun things callers need to be aware of:

You may not call wirehair_read twice with the same packet ID, this causes decode failure.
Memory lifetimes generally aren't documented: during encode, the memory you pass into wirehair_encode must remain available until the last time you call wirehair_write (and maybe later? probably not), whereas the same is not true for writing.

In other news, for my application I hardcoded the chunk size, assumed 16/32-byte alignment (depending on where it's used) for XOR, replaced the different-compile-unit XOR code with simple AVX/SSE-based intrinsics, and saw something like a 3x speedup :).
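Until these edge cases are documented, a receiver can guard against the duplicate-ID case itself. A minimal sketch, assuming 32-bit block IDs and a caller that forwards only first-seen IDs to the decoder (SeenBlockFilter is a hypothetical name, not part of the Wirehair API):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Remembers which block IDs have already been handed to the decoder, so that
// retransmitted or duplicated datagrams are dropped instead of being decoded
// twice.
class SeenBlockFilter {
public:
    // Returns true the first time an ID is seen, false for repeats.
    bool FirstTime(uint32_t blockId)
    {
        return seen_.insert(blockId).second;
    }

    // Forget all IDs, e.g. when starting to receive a new message.
    void Reset() { seen_.clear(); }

private:
    std::unordered_set<uint32_t> seen_;
};
```

Memory grows with the number of distinct IDs received, which for one message is roughly the number of packets accepted before recovery succeeds.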

Block duplication

When I try out the README example, I notice that the encoded blocks may have duplicated content, and the total number of blocks with unique content is bounded.

The following results use a 2^16-byte message and 2^6-byte blocks, generating blocks with consecutive IDs starting from an arbitrary ID.

- When the message content is all zero, all blocks are all zero.
- When the message content simply repeats one byte (e.g. through memset as in the README), there are 128 unique blocks, and each of them repeats one byte.
  - The mapping between block ID and content seems random, with a uniform distribution.
- If I partially overwrite that message content by repeating another byte, there may be 256 unique blocks by chance, but sometimes there are still 128 unique blocks.
- With fully random message content, no such duplication is observed.
- Recovery does not behave differently when decoding blocks with duplicated or non-duplicated content.

Is this expected? As defined on Wikipedia, fountain codes are "a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols". This library certainly can generate blocks for a potentially limitless range of block IDs (2^32 is good enough for me), but from the definition it's unclear whether I should expect a potentially limitless sequence of distinct block contents.

Also, do other rateless erasure codes have similar behavior? Is there any way to mitigate this? For example, is there any way to produce a sequence of block IDs for a given message so that those IDs generate blocks with distinct content? The general answer is probably no, since theoretically the sequence cannot go beyond 128 IDs, but my use case is k=80-120, so it would still be good to have. There is a strawman solution that blindly generates blocks first and then discards duplicates, but it would be better to achieve this more efficiently. Thanks.
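This behavior is what linearity would predict. Assuming, as the gf256_*_mem calls elsewhere on this page suggest, that every encoding step XORs or multiplies whole blocks by a single GF(256) coefficient, each byte position of the message is encoded independently with the same coefficients. If every input block is the byte v repeated, every output block is some field element c times v, repeated across the block, so at most 256 distinct blocks can ever appear, and an all-zero message can only yield zero blocks. A common mitigation for low-entropy input, offered here as a suggestion rather than a Wirehair feature, is to XOR the message with a pseudorandom keystream before encoding and remove it after decoding, so the encoder sees random-looking data. The sketch below uses xorshift64 with a seed both sides must agree on; whiten is a hypothetical name, and the generator is not cryptographic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre/post-processing step (not part of Wirehair):
   XOR the buffer with a xorshift64 keystream. Calling it twice with the
   same seed restores the original bytes. Not cryptographically secure. */
void whiten(uint8_t *data, size_t bytes, uint64_t seed)
{
    /* xorshift state must be nonzero */
    uint64_t x = seed ? seed : 0x9E3779B97F4A7C15ull;
    for (size_t i = 0; i < bytes; i += 8) {
        /* advance the generator once per 8 output bytes */
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        for (size_t k = 0; k < 8 && i + k < bytes; ++k)
            data[i + k] ^= (uint8_t)(x >> (8 * k));
    }
}
```

The sender would call whiten on its copy of the message before handing it to the encoder, and the receiver would call it with the same seed on the recovered message.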

Several cross-platform questions

Hello. Thank you for this amazing work.

I'm not an expert in FEC and the math behind it, but I'm encouraged to use Wirehair as part of my reliable UDP project in Kotlin. As you might know, Kotlin is very cross-platform: everywhere Java can run, Kotlin can run as well. I want to take advantage of this property and make my project run on Linux, Windows, and Android.
So, my questions are:

Do you think I can cross-compile Wirehair for Windows and Android (I'm on Ubuntu right now) using standard cross-compiling toolchains?
Do you think it will work there?
Do you have any advice related to this topic?
If there is no way to run Wirehair on Android, do you know of any other FEC implementations suitable for reliable UDP that work fast in networks with very frequent retransmissions and offer performance comparable to Wirehair?

Thanks in advance!

(wishlist) stream api

Hi!
Thank you very, very much for this library.

I am planning to use Wirehair with files ranging from hundreds of megabytes to tens of gigabytes or more, with anywhere from 1 to 300 files at the same time. This clearly does not fit in the server's RAM.

Is there any way to avoid loading entire files into RAM?

Thanks!
Best Wishes, Daniel.
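Short of a streaming API, one common workaround, sketched here as an assumption rather than a Wirehair feature, is to split each file into fixed-size segments and encode every segment independently, reading only the segment currently being encoded (for example with fseek/fread or a memory map). The hypothetical helpers below compute the segment layout; segment_bytes must be nonzero.

```c
#include <stdint.h>

/* Hypothetical segmentation scheme (not a Wirehair feature). */
typedef struct {
    uint64_t offset;  /* byte offset of the segment within the file */
    uint64_t bytes;   /* segment length; the last one may be shorter */
} segment;

/* Number of segments needed to cover file_bytes (segment_bytes > 0). */
uint64_t segment_count(uint64_t file_bytes, uint64_t segment_bytes)
{
    return (file_bytes + segment_bytes - 1) / segment_bytes;
}

/* Layout of segment `index`; bytes is 0 if index is past the end. */
segment segment_at(uint64_t file_bytes, uint64_t segment_bytes, uint64_t index)
{
    segment s;
    s.offset = index * segment_bytes;
    if (s.offset >= file_bytes) {
        s.bytes = 0;
    } else {
        uint64_t remaining = file_bytes - s.offset;
        s.bytes = remaining < segment_bytes ? remaining : segment_bytes;
    }
    return s;
}
```

The trade-off is that the receiver must collect enough blocks for every segment separately, and each transmitted block has to carry its segment index alongside its block ID.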

ARM64 Windows Support

At gf256.h lines 60-66, immintrin.h is included if the build uses MSVC and AVX2 is supported. The issue is that immintrin.h does not support build targets other than x86 or x64.

It is understandable to assume Windows runs only on x86 and x64 machines, but unfortunately there is already an ARM64 Windows device: HoloLens 2. The fix I can think of is to use GF256_TARGET_MOBILE as a guard against including immintrin.h, so the include can be avoided by manually defining GF256_TARGET_MOBILE.

I will soon submit a pull request based on the above fix; let me know if you think there is a better solution.
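The proposed guard could look roughly like the sketch below. It is not the actual gf256.h code: it assumes MSVC's predefined _M_ARM64 and _M_ARM macros identify ARM targets, defines GF256_TARGET_MOBILE there, and only then allows the x86-only immintrin.h include. selected_simd_path is a hypothetical helper that reports which branch this sketch took; on non-MSVC compilers it always reports "other".

```c
/* Sketch only: on MSVC ARM targets, behave as if GF256_TARGET_MOBILE
   were set so the x86-only header is never pulled in. */
#if defined(_MSC_VER) && (defined(_M_ARM64) || defined(_M_ARM))
#  ifndef GF256_TARGET_MOBILE
#    define GF256_TARGET_MOBILE
#  endif
#endif

#if defined(_MSC_VER) && !defined(GF256_TARGET_MOBILE)
#  include <immintrin.h>              /* x86/x64 only */
#  define SIMD_PATH "msvc-x86 (immintrin.h)"
#elif defined(_MSC_VER)
#  define SIMD_PATH "msvc-mobile (no immintrin.h)"
#else
#  define SIMD_PATH "other"
#endif

/* Hypothetical helper: which branch of the guard was compiled. */
const char *selected_simd_path(void)
{
    return SIMD_PATH;
}
```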

Segmentation fault

Hello. I'm experiencing this issue from time to time. Yesterday everything was fine (about 20 runs without any error), but today I only get something like 1 successful run out of 50.

Problem

Thread 2 "java" received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:423
423	../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.

The crash happens here, at WirehairCodec.cpp:4078:

// Copy from the original file data
memcpy(data_out, src, copyBytes);

Environment
I've compiled the master branch with c++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0 and CMake 3.13.1. I'm on Ubuntu 18.04 amd64 (as you might guess). I'm calling your code from OpenJDK 8u191 using JNA.

My guesses
I'm an absolute newbie to C and all this pointer stuff, but after executing this line (WirehairCodec.cpp:4075):

const uint8_t * GF256_RESTRICT src = _input_blocks + _block_bytes * block_id;

In the good scenario, the debugger says the value of src is "" (empty string).
In the bad scenario, the debugger says the value of src is 0x... (some address).
The values of _input_blocks, _block_bytes, and block_id always have the same types: address, int, and int, respectively.
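One plausible cause, not confirmed here: the crashing line reads from _input_blocks, which the comment says is the original file data, i.e. the message passed when the encoder was created. As noted in the "API Edge Cases Undocumented" issue above, that buffer must stay valid for as long as the encoder is used. As I understand JNA's documented behavior, a Java byte[] argument is copied into temporary native memory only for the duration of the call, so a pointer the library keeps afterwards can dangle, which would fit intermittent crashes. Passing a com.sun.jna.Memory buffer kept reachable on the Java side, or having a small native shim copy the message into memory it owns, avoids that. The owned_message type below is a hypothetical sketch of such a shim, not part of Wirehair.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical native-side copy of the message that lives as long as
   the encoder pointing into it. Free it only after the encoder is freed. */
typedef struct {
    uint8_t *data;
    uint64_t bytes;
} owned_message;

/* Returns 1 on success, 0 if allocation failed. */
int owned_message_init(owned_message *m, const void *src, uint64_t bytes)
{
    m->bytes = bytes;
    m->data = malloc(bytes ? (size_t)bytes : 1);
    if (!m->data)
        return 0;
    if (bytes)
        memcpy(m->data, src, (size_t)bytes);
    return 1;
}

void owned_message_free(owned_message *m)
{
    free(m->data);
    m->data = NULL;
    m->bytes = 0;
}
```

The shim would pass m.data to wirehair_encode and release the copy with owned_message_free only after wirehair_free has been called on that encoder.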

Compiler warning about parameters with restrict

Hey, thanks for this awesome library! I've been using it for some time now inside my ROS package https://github.com/AIS-Bonn/nimbro_network/tree/develop.

Since updating to gcc 9.3 I'm getting a compiler warning:

contrib/wirehair/WirehairCodec.cpp: In member function ‘void wirehair::Codec::BackSubstituteAboveDiagonal()’:
contrib/wirehair/WirehairCodec.cpp:2303:39: warning: passing argument 1 to restrict-qualified parameter aliases with argument 2 [-Wrestrict]
 2303 |                         gf256_div_mem(src, src, code_value, _block_bytes);
      |                                       ^~~  ~~~
contrib/wirehair/WirehairCodec.cpp:2387:35: warning: passing argument 1 to restrict-qualified parameter aliases with argument 2 [-Wrestrict]
 2387 |                     gf256_div_mem(src, src, code_value, _block_bytes);
      |                                   ^~~  ~~~
contrib/wirehair/WirehairCodec.cpp:2652:31: warning: passing argument 1 to restrict-qualified parameter aliases with argument 2 [-Wrestrict]
 2652 |                 gf256_div_mem(src, src, code_value, _block_bytes);
      |

I guess these should be fixed, since the compiler may optimize on the assumption that the arguments do not alias. Either the restrict qualifier should be removed, or the calling code should be modified to copy the array before doing the operation.
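The warning is worth taking seriously: in C, modifying an object through one restrict-qualified pointer while it is also accessed through another is undefined behavior, even if it happens to work today. One option, sketched below under stated assumptions, is a separate in-place entry point that takes a single pointer, so there is nothing to alias. The GF(256) helpers (gf_mul, gf_inv, gf_div_mem_inplace) are hypothetical stand-ins for gf256's functions, and the 0x11D reduction polynomial is an assumption; gf256.h may use a different one.

```c
#include <stdint.h>

/* Illustrative GF(2^8) multiply, reducing by x^8 + x^4 + x^3 + x^2 + 1
   (0x11D, an assumption for this sketch). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        uint8_t carry = a & 0x80;
        a <<= 1;
        if (carry)
            a ^= 0x1D;
        b >>= 1;
    }
    return p;
}

/* Inverse via a^254 (square-and-multiply); a must be nonzero. */
static uint8_t gf_inv(uint8_t a)
{
    uint8_t result = 1, base = a;
    for (unsigned e = 254; e; e >>= 1) {
        if (e & 1)
            result = gf_mul(result, base);
        base = gf_mul(base, base);
    }
    return result;
}

/* Hypothetical in-place division: one pointer, so no restrict aliasing.
   Caller must ensure y != 0. */
void gf_div_mem_inplace(uint8_t *data, uint8_t y, int bytes)
{
    const uint8_t inv = gf_inv(y);
    for (int i = 0; i < bytes; ++i)
        data[i] = gf_mul(data[i], inv);
}
```

Call sites such as gf256_div_mem(src, src, code_value, _block_bytes) would switch to the single-pointer form, while the restrict-qualified version stays available for distinct buffers.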

Multiple Instance Corruption

Wirehair encoders (and possibly decoders) are corrupted when multiple instances are used simultaneously. No Wirehair errors are reported, but memcmp fails after the decoder reconstructs the message.

Can be reproduced by inserting the following lines into your test case:
https://github.com/catid/wirehair/blob/master/tests/wirehair_test.cpp
#L132

    + wirehair_state encoder0 = wirehair_encode(encoder, message_in, bytes, block_bytes);
    + wirehair_state encoder1 = wirehair_encode(encoder, message_in, bytes, block_bytes);
    + wirehair_free(encoder1);
    encoder = wirehair_encode(encoder, message_in, bytes, block_bytes);
    assert(encoder);

#L192

    + wirehair_free(encoder0);
    delete []message_in;

Issue understanding matrix layout

In WirehairCodec.h we find:

    (1) Matrix Construction
        A = Original data blocks, N blocks long.
        D = Count of dense/heavy matrix rows (see below), chosen based on N.
        E = N + D blocks = Count of recovery set blocks.
        R = Recovery blocks, E blocks long.
        C = Matrix, with E rows and E columns.
        0 = Dense/heavy rows sum to zero.
        +---------+-------+   +---+   +---+
        |         |       |   |   |   |   |
        |    P    |   M   |   |   |   | A |
        |         |       |   |   |   |   |
        +---------+-----+-+ x | R | = +---+
        |    D    |  J  |0|   |   |   | 0 |
        +---------+-+---+-+   |   |   +---+
        |    0      |  H  |   |   |   | 0 |
        +-----------+-----+   +---+   +---+
        A and B are Ex1 vectors of blocks.
            A has N rows of the original data padded by H zeros.
            R has E rows of encoded blocks.
        C is the ExE hybrid matrix above:
            P is the NxN peeling binary submatrix.
                - Optimized for success of the peeling solver.
            M is the NxD mixing binary submatrix.
                - Used to mix the D dense/heavy rows into the peeling rows.
            D is the DxN dense binary submatrix.
                - Used to improve on recovery properties of peeling code.
            J is a DxD random-looking invertible submatrix.
            H is the 6x18 heavy GF(256) submatrix.
                - Used to improve on recovery properties of dense code.
            0 is a Dx6 zero submatrix.

My confusion lies in the M, D, J, and 0 parts. If M is NxD, D is DxN, and J is DxD, then J exactly fills the corner between M and D, and there should be no room for the 0 part. What gives?
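The dimension bookkeeping behind the question can be written out directly from the definitions quoted above (the D submatrix contributes D rows and H contributes 6):

```latex
\begin{aligned}
\operatorname{width}(P \mid M) &= N + D = E \\
\operatorname{width}(D \mid J \mid 0) &= N + D + 6 = E + 6 \\
\operatorname{rows}(C) &= \underbrace{N}_{P} + \underbrace{D}_{D} + \underbrace{6}_{H} = E + 6
\end{aligned}
```

One reading that makes the drawing consistent, where M spans the columns above both J and the 0 block, is that M is really N x (D + 6) and the matrix is (N + D + 6) square; that is an inference from the picture, not something the header states.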

Cannot compile wirehair with distributed compilation

I found that I cannot compile Wirehair with distributed compilation.
I suspect that some compiler options are missing from CMakeLists.txt.

I used sccache for distributed compilation.

$ cmake .. -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/gitai/gitai/wirehair/build
$ make -j1
Scanning dependencies of target gen_tables
[  4%] Building CXX object CMakeFiles/gen_tables.dir/test/SiameseTools.cpp.o
[  8%] Building CXX object CMakeFiles/gen_tables.dir/tables/TableGenerator.cpp.o
/home/gitai/gitai/wirehair/tables/TableGenerator.cpp: In function 'void ShuffleDeck16(siamese::PCGRandom&, uint16_t*, uint32_t)':
/home/gitai/gitai/wirehair/tables/TableGenerator.cpp:1070:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1070 |       __cxa_init_primary_exception(void *object, std::type_info *tinfo,
      |                 ^~~~
/home/gitai/gitai/wirehair/tables/TableGenerator.cpp:1071:13: note: here
 1071 |                 void ( *dest) (void *)) noexcept;
      |             ^
/home/gitai/gitai/wirehair/tables/TableGenerator.cpp:1075:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
 1075 |
      |                 ^
/home/gitai/gitai/wirehair/tables/TableGenerator.cpp:1076:13: note: here
 1076 |
      |             ^
[ 12%] Building CXX object CMakeFiles/gen_tables.dir/tables/HeavyRowGenerator.cpp.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:123,
                 from /home/gitai/gitai/wirehair/tables/../gf256.h:68,
                 from /home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp:39:
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h: In function 'void _directstoreu_u32(void*, unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h:41:3: error: '__builtin_ia32_directstoreu_u32' was not declared in this scope; did you mean '__builtin_ia32_bextr_u32'?
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h: In function 'void _directstoreu_u64(void*, long long unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h:48:3: error: '__builtin_ia32_directstoreu_u64' was not declared in this scope; did you mean '__builtin_ia32_bextr_u64'?
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:123,
                 from /home/gitai/gitai/wirehair/tables/../gf256.h:68,
                 from /home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp:39:
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h: In function 'void _movdir64b(void*, const void*)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/movdirintrin.h:67:3: error: '__builtin_ia32_movdir64b' was not declared in this scope; did you mean '__builtin_ia32_movnti64'?
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:129,
                 from /home/gitai/gitai/wirehair/tables/../gf256.h:68,
                 from /home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp:39:
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h: In function 'void _umonitor(void*)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h:41:3: error: '__builtin_ia32_umonitor' was not declared in this scope; did you mean '__builtin_ia32_monitor'?
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h: In function 'unsigned char _umwait(unsigned int, long long unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h:48:10: error: '__builtin_ia32_umwait' was not declared in this scope; did you mean '__builtin_ia32_mwait'?
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h: In function 'unsigned char _tpause(unsigned int, long long unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/waitpkgintrin.h:55:10: error: '__builtin_ia32_tpause' was not declared in this scope; did you mean '__builtin_ia32_pause'?
In file included from /home/gitai/gitai/wirehair/tables/../gf256.h:68,
                 from /home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp:39:
/usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h: In function 'void _ptwrite64(long long unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:289:3: error: '__builtin_ia32_ptwrite64' was not declared in this scope; did you mean '__builtin_ia32_mwaitx'?
/usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h: In function 'void _ptwrite32(unsigned int)':
/usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:297:3: error: '__builtin_ia32_ptwrite32' was not declared in this scope; did you mean '__builtin_ia32_mwaitx'?
/home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp: In function 'void PrintMatrix(const uint8_t*)':
/home/gitai/gitai/wirehair/tables/HeavyRowGenerator.cpp:212:20: warning: unused variable 'modulus' [-Wunused-variable]
  212 | typedef unsigned char uint_fast8_t;
      |                    ^~~~~~~
sccache: Compiler killed by signal 1
make[2]: *** [CMakeFiles/gen_tables.dir/build.make:89: CMakeFiles/gen_tables.dir/tables/HeavyRowGenerator.cpp.o] Error 254
make[1]: *** [CMakeFiles/Makefile2:88: CMakeFiles/gen_tables.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Clarify on memory safety

As partially mentioned in #2, I also noticed that the message buffer must outlive the encoder, but the block buffers do not need to outlive the decoder. Can you confirm this?

Also, I noticed that the encoder seems capable of encoding multiple blocks in parallel (i.e. through multithreading). (Decoding, on the other hand, feels inherently sequential, so I never attempted that.) Can you also clarify the thread-safety properties? I am building a Rust binding for this library (https://github.com/sgdxbc/wirehair), and the language requires explicit statements of lifetimes and thread safety. Thanks.