intel / hexl

Intel(R) Homomorphic Encryption Acceleration Library accelerates modular arithmetic operations used in homomorphic encryption.

Home Page: https://intel.github.io/hexl

License: Apache License 2.0

Topics: homomorphic-encryption, avx-512, privacy, cryptography

hexl's Introduction


Intel Homomorphic Encryption (HE) Acceleration Library

Intel(R) HE Acceleration Library is an open-source library that provides efficient implementations of integer arithmetic on Galois fields. Such arithmetic is prevalent in cryptography, particularly in homomorphic encryption (HE) schemes. Intel HE Acceleration Library targets integer arithmetic with word-sized primes, typically 30-60 bits. Intel HE Acceleration Library provides an API for 64-bit unsigned integers and targets Intel CPUs. For more details on Intel HE Acceleration Library, see our whitepaper. For tips on best performance, see Performance.

Contents

Introduction

Many cryptographic applications, particularly homomorphic encryption (HE), rely on integer polynomial arithmetic in a finite field. HE, which enables computation on encrypted data, typically uses polynomials with degree N a power of two, roughly in the range N = [2^{10}, 2^{17}]. The coefficients of these polynomials lie in a finite field with a word-sized prime modulus q of up to roughly 62 bits. More precisely, the polynomials live in the ring Z_q[X]/(X^N + 1). That is, when adding or multiplying two polynomials, each coefficient of the result is reduced by the prime modulus q. When multiplying two polynomials, the resulting polynomial of degree up to 2N - 2 is additionally reduced by taking the remainder modulo X^N + 1.
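
To make the reduction concrete, here is a minimal, unoptimized reference implementation of multiplication in Z_q[X]/(X^N + 1). This is a sketch for illustration only; NegacyclicMult is a hypothetical helper, not part of the library's API, and it uses the compiler's 128-bit integer extension to avoid overflow:

    #include <cstdint>
    #include <vector>

    // Schoolbook multiplication in Z_q[X]/(X^N + 1). The wrap-around uses
    // x^N = -1, so coefficients that wrap past degree N-1 are subtracted.
    std::vector<uint64_t> NegacyclicMult(const std::vector<uint64_t>& a,
                                         const std::vector<uint64_t>& b,
                                         uint64_t q) {
      size_t N = a.size();
      std::vector<uint64_t> c(N, 0);
      for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
          uint64_t prod = static_cast<uint64_t>(
              static_cast<__uint128_t>(a[i]) * b[j] % q);
          size_t k = i + j;
          if (k < N) {
            c[k] = (c[k] + prod) % q;  // regular term x^k
          } else {
            // x^k = x^{k-N} * x^N = -x^{k-N} (mod X^N + 1)
            c[k - N] = (c[k - N] + q - prod) % q;
          }
        }
      }
      return c;
    }

This O(N^2) approach is what the NTT-based method described next replaces with an O(N log N) algorithm.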

The primary bottleneck in many HE applications is polynomial-polynomial multiplication in Z_q[X]/(X^N + 1). For an efficient implementation, Intel HE Acceleration Library implements the negacyclic number-theoretic transform (NTT). To multiply two polynomials p_1(x), p_2(x) using the NTT, we perform the FwdNTT on the two input polynomials, then perform an element-wise modular multiplication, and finally perform the InvNTT on the result.
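
The multiplication flow might look like the following minimal sketch, assuming the NTT and EltwiseMultMod APIs declared in the public headers and the hexl/hexl.h umbrella header; the prime 12289 is an arbitrary NTT-friendly choice satisfying q = 1 mod 2N:

    #include <cstdint>
    #include <vector>

    #include "hexl/hexl.h"

    int main() {
      const uint64_t N = 1024;   // polynomial degree, a power of two
      const uint64_t q = 12289;  // NTT-friendly prime: q = 1 mod 2N

      std::vector<uint64_t> p1(N, 1);  // p1(x) = 1 + x + ... + x^{N-1}
      std::vector<uint64_t> p2(N, 2);  // p2(x) = 2 + 2x + ... + 2x^{N-1}
      std::vector<uint64_t> product(N, 0);

      intel::hexl::NTT ntt(N, q);
      // Forward NTT of both operands (in-place operation is allowed)
      ntt.ComputeForward(p1.data(), p1.data(), 1, 1);
      ntt.ComputeForward(p2.data(), p2.data(), 1, 1);
      // Element-wise modular multiplication in the NTT domain
      intel::hexl::EltwiseMultMod(product.data(), p1.data(), p2.data(), N, q, 1);
      // Inverse NTT yields the product in Z_q[X]/(X^N + 1)
      ntt.ComputeInverse(product.data(), product.data(), 1, 1);
      return 0;
    }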

Intel HE Acceleration Library implements the following functions:

  • The forward and inverse negacyclic number-theoretic transform (NTT)
  • Element-wise vector-vector modular multiplication
  • Element-wise vector-scalar modular multiplication with optional addition (see the sketch after this list)
  • Element-wise modular multiplication
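
As an illustration of the vector-scalar primitive, the following sketch computes result[i] = (input[i] * 3) mod q for each element. The EltwiseFMAMod signature here is inferred from the public headers and the usage shown in the issue reports below; treat the details as an assumption rather than normative documentation. Passing nullptr for the addend skips the optional element-wise addition:

    #include <cstdint>
    #include <vector>

    #include "hexl/hexl.h"

    int main() {
      const uint64_t n = 8;
      const uint64_t q = 769;
      std::vector<uint64_t> input = {0, 1, 2, 3, 4, 5, 6, 7};
      std::vector<uint64_t> result(n, 0);
      // result[i] = (input[i] * 3) mod q; the nullptr addend means no
      // element-wise addition is fused in
      intel::hexl::EltwiseFMAMod(result.data(), input.data(), 3, nullptr, n,
                                 q, 1);
      return 0;
    }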

For each function, the library implements one or several Intel(R) AVX-512 implementations, as well as a less performant but more readable native C++ implementation. Intel HE Acceleration Library automatically chooses the best implementation for the CPU's available Intel(R) AVX-512 feature set. In particular, when the modulus q is less than 2^{50}, the AVX512IFMA instruction set, available on Intel Ice Lake server and client processors, provides a more efficient implementation.

For additional functionality, see the public headers, located in include/hexl.

Building Intel HE Acceleration Library

Intel HE Acceleration Library can be built in several ways. It has been uploaded to the Microsoft vcpkg C++ package manager, which supports Linux, macOS, and Windows builds. See the vcpkg repository for instructions to build Intel HE Acceleration Library with vcpkg, e.g. run

vcpkg install hexl

There may be some delay before the latest release ports appear in vcpkg, so Intel HE Acceleration Library provides port files to build the latest version with vcpkg. For a static build, run

vcpkg install hexl --overlay-ports=/path/to/hexl/port/hexl --head

For a dynamic build, use the custom triplet file and run

vcpkg install hexl:hexl-dynamic-build --overlay-ports=/path/to/hexl/port/hexl --head --overlay-triplets=/path/to/hexl/port/hexl

For a detailed explanation, see the vcpkg instructions for building a port using overlays and for using a custom triplet.

Intel HE Acceleration Library also supports a build using the CMake build system. See below for the instructions to build Intel HE Acceleration Library from source using CMake.

Dependencies

We have tested Intel HE Acceleration Library on the following operating systems:

  • Ubuntu 20.04
  • macOS 10.15 Catalina
  • Microsoft Windows 10

Intel HE Acceleration Library requires the following dependencies:

Dependency Version
CMake >= 3.13 *
Compiler gcc >= 7.0, clang++ >= 5.0, MSVC >= 2019

* For Windows 10, you must check whether the version of CMake you have can generate the necessary Visual Studio project files; for example, only CMake 3.14 and later can generate MSVC 2019 project files.

Compile-time options

In addition to the standard CMake build options, Intel HE Acceleration Library supports several compile-time flags to configure the build. For convenience, they are listed below:

CMake option                 Values    Default  Description
HEXL_BENCHMARK               ON / OFF  ON       Enables the benchmark suite via Google Benchmark
HEXL_COVERAGE                ON / OFF  OFF      Enables a coverage report for unit tests
HEXL_SHARED_LIB              ON / OFF  OFF      Builds Intel HE Acceleration Library as a shared library
HEXL_DOCS                    ON / OFF  OFF      Enables building of documentation
HEXL_TESTING                 ON / OFF  ON       Enables building of unit tests
HEXL_TREAT_WARNING_AS_ERROR  ON / OFF  OFF      Treats all warnings as errors

Compiling Intel HE Acceleration Library

To compile Intel HE Acceleration Library from source code, first clone the repository and change directories to where the source has been cloned.

Linux and Mac

The instructions to build Intel HE Acceleration Library are common to Linux and macOS.

Then, to configure the build, call

cmake -S . -B build

adding the desired compile-time options with a -D flag. For instance, to use a non-standard installation directory, configure the build with

cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/path/to/install

Or, to build Intel HE Acceleration Library as a shared library, call

cmake -S . -B build -DHEXL_SHARED_LIB=ON

Then, to build Intel HE Acceleration Library, call

cmake --build build

This will build Intel HE Acceleration Library in the build/hexl/lib/ directory.

To install Intel HE Acceleration Library to the installation directory, run

cmake --install build

Windows

To compile Intel HE Acceleration Library on Windows using Visual Studio in Release mode, configure the build via

cmake -S . -B build -G "Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Release

adding the desired compile-time options with a -D flag (see Compile-time options). For instance, to use a non-standard installation directory, configure the build with

cmake -S . -B build -G "Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/install

To specify the desired build configuration, pass either --config Debug or --config Release to the build and install steps. For instance, to build Intel HE Acceleration Library in Release mode, call

cmake --build build --config Release

This will build Intel HE Acceleration Library in the build/hexl/lib/ or build/hexl/Release/lib/ directory.

To install Intel HE Acceleration Library to the installation directory, run

cmake --build build --target install --config Release

Performance

For best performance, we recommend using Intel HE Acceleration Library on a Linux system with the clang++-12 compiler. We also recommend using a processor with Intel AVX512DQ support, with best performance on processors supporting Intel AVX512-IFMA52. To determine if your processor supports AVX512-IFMA52, simply look for -- Setting HEXL_HAS_AVX512IFMA printed during the configure step.

See the below table for setting the modulus q for best performance.

Instruction Set Bound on modulus q
AVX512-DQ q < 2^30
AVX512-IFMA52 q < 2^50

Some speedup is still expected for moduli q > 2^30 using the AVX512-DQ instruction set.

Testing Intel HE Acceleration Library

To run a set of unit tests via Googletest, configure and build Intel HE Acceleration Library with -DHEXL_TESTING=ON (see Compile-time options). Then, run

cmake --build build --target unittest

The unit-test executable itself is located at build/test/unit-test on Linux and Mac, and at build\test\Release\unit-test.exe or build\test\Debug\unit-test.exe on Windows.

Benchmarking Intel HE Acceleration Library

To run a set of benchmarks via Google benchmark, configure and build Intel HE Acceleration Library with -DHEXL_BENCHMARK=ON (see Compile-time options). Then, run

cmake --build build --target bench

On Windows, run

cmake --build build --target bench --config Release

The benchmark executable itself is located at build/benchmark/bench_hexl on Linux and Mac, and at build\benchmark\Debug\bench_hexl.exe or build\benchmark\Release\bench_hexl.exe on Windows.

Using Intel HE Acceleration Library

The example folder has an example of using Intel HE Acceleration Library in a third-party project.

Debugging

For optimal performance, Intel HE Acceleration Library does not perform input validation. In many cases the time required for the validation would be longer than the execution of the function itself. To debug Intel HE Acceleration Library, configure and build Intel HE Acceleration Library with -DCMAKE_BUILD_TYPE=Debug (see Compile-time options). This will generate a debug version of the library, e.g. libhexl_debug.a, that can be used to debug the execution. In Debug mode, Intel HE Acceleration Library will also link against Address Sanitizer.

Note, enabling CMAKE_BUILD_TYPE=Debug will result in a significant runtime overhead.

To enable verbose logging for the benchmarks or unit-tests in a Debug build, add the log level as a command-line argument, e.g. build/benchmark/bench_hexl --v=9. See easyloggingpp's documentation for more details.

Threading

Intel HE Acceleration Library is single-threaded and thread-safe.

Community Adoption

Intel HE Acceleration Library has been integrated into the following homomorphic encryption libraries:

  • Microsoft SEAL
  • PALISADE
  • HElib

See also the Intel Homomorphic Encryption Toolkit for example use cases of Intel HE Acceleration Library.

Please let us know if you are aware of any other uses of Intel HE Acceleration Library.

Documentation

Intel HE Acceleration Library supports documentation via Doxygen. See https://intel.github.io/hexl for the latest Doxygen documentation.

To build documentation, first install doxygen and graphviz, e.g.

sudo apt-get install doxygen graphviz

Then, configure Intel HE Acceleration Library with -DHEXL_DOCS=ON (see Compile-time options) and run

cmake --build build --target docs

To view the generated Doxygen documentation, open the generated docs/doxygen/html/index.html file in a web browser.

Contributing

Intel HE Acceleration Library welcomes external contributions. To learn more about contributing, see CONTRIBUTING.md.

We encourage feedback and suggestions via GitHub Issues, as well as discussion via GitHub Discussions.

Repository layout

Public headers reside in the hexl/include folder. Private headers, e.g. those containing Intel(R) AVX-512 code, should not be put in this folder.

Citing Intel HE Acceleration Library

To cite Intel HE Acceleration Library, please use the following BibTeX entry.

Version 1.2

    @misc{IntelHEXL,
        author={Boemer, Fabian and Kim, Sejun and Seifu, Gelila and de Souza, Fillipe DM and Gopal, Vinodh and others},
        title = {{I}ntel {HEXL} (release 1.2)},
        howpublished = {\url{https://github.com/intel/hexl}},
        month = sep,
        year = 2021,
        key = {Intel HEXL}
    }

Version 1.1

    @misc{IntelHEXL,
        author={Boemer, Fabian and Kim, Sejun and Seifu, Gelila and de Souza, Fillipe DM and Gopal, Vinodh and others},
        title = {{I}ntel {HEXL} (release 1.1)},
        howpublished = {\url{https://github.com/intel/hexl}},
        month = may,
        year = 2021,
        key = {Intel HEXL}
    }

Version 1.0

    @misc{IntelHEXL,
        author={Boemer, Fabian and Kim, Sejun and Seifu, Gelila and de Souza, Fillipe DM and Gopal, Vinodh and others},
        title = {{I}ntel {HEXL} (release 1.0)},
        howpublished = {\url{https://github.com/intel/hexl}},
        month = apr,
        year = 2021,
        key = {Intel HEXL}
    }

Contributors

The Intel contributors to this project, sorted by last name, are

In addition to the Intel contributors listed, we are also grateful for contributions to this project that are not reflected in the Git history:

hexl's People

Contributors: ajagann, faberga, fboemer, fdiasmor, florentclmichel, gelilaseifu, hamishun, jlhcrawford, joserochh, sivanov-work, skmono, tgonzalez89-intel, ymeng-git


hexl's Issues

Issue in HEXL_CHECK_BOUNDS debug check in avx512 reduction when output_mod_factor is 2.

Hello,

I believe there is an issue in the check found at: hexl/eltwise/eltwise-reduce-mod-avx512.hpp:84. The code reads as follows:

if (input_mod_factor == modulus) {
  if (output_mod_factor == 2) {
    for (size_t i = 0; i < n_tmp; i += 8) {
      __m512i v_op = _mm512_loadu_si512(v_operand);
      v_op = _mm512_hexl_barrett_reduce64<BitShift, 2>(
          v_op, v_modulus, v_bf, v_bf_52, prod_right_shift, v_neg_mod);
      HEXL_CHECK_BOUNDS(ExtractValues(v_op).data(), 8, modulus,
                        "v_op exceeds bound " << modulus);
      _mm512_storeu_si512(v_result, v_op);
      ++v_operand;
      ++v_result;
    }
    ...

This check will fail even though the operation succeeds. I believe you should be checking against modulus << 1u instead of against modulus, since the output is expected to be within [0, 2*modulus). I'm not sure if there are other problems across the codebase with these checks, but it is difficult to debug issues when these checks fire spuriously.
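
Concretely, the check I would expect at that location is something like this (a sketch of the suggested fix, untested):

    HEXL_CHECK_BOUNDS(ExtractValues(v_op).data(), 8, modulus << 1u,
                      "v_op exceeds bound " << (modulus << 1u));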

EltwiseReduceMod fails on EltwiseReduceModAVX512 operation with a modulus that is between 51 and 52 bits.

The library does not reduce properly for any modulus between 51 and 52 bits. See this test:

TEST(ReduceFailure, ReduceFailure) {
  std::vector<uint64_t> tst = {
      41834706795195925, 4670328046076965,  17760383643480343,
      49435670237278413, 24379914680392825, 33362919182756282,
      14501318335168678, 31183658415687847};
  uint64_t mod = 4440955546175441;
  auto chk = tst;
  intel::hexl::EltwiseReduceMod(chk.data(), tst.data(), tst.size(), mod, mod,
                                1);
  auto exp = tst;
  for (auto& elem : exp) {
    elem %= mod;
  }
  ASSERT_EQ(exp, chk);
}

This is just an example. I have tried several different moduli of the same bit width and several different inputs. The reduction function does not match the modulo operation.

Add a port file to vcpkg?

FYI, you can create a port file in vcpkg so that we can add a feature to SEAL:

./vcpkg install seal       # installs SEAL without HEXL
./vcpkg install seal[hexl] # installs SEAL with HEXL

(Rust binding) Repeated invocation of EltwiseFMAModAVX512 (with different data) in loop has unexpected performance regression

I am writing Rust bindings for hexl here. I have added support for NTT operations and some element-wise operations. However, I am running into issues with element-wise operations with the prime (i.e., q) set to 50 bits. To see what's wrong, you can clone the repository and run cargo bench modulus/elwise_fma_mod. This will run the benches inside benches/modulus.rs with prefix elwise_fma_mod, which use EltwiseFMAModAVX512 internally, and will produce output like the following:

modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=1
                        time:   [40.942 µs 40.978 µs 41.017 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=3
                        time:   [122.65 µs 122.72 µs 122.80 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=5
                        time:   [205.28 µs 205.52 µs 205.76 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=15
                        time:   [616.00 µs 616.57 µs 617.19 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=1
                        time:   [9.6013 µs 9.6061 µs 9.6115 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=3
                        time:   [27.549 µs 27.647 µs 27.770 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=5
                        time:   [81.501 µs 81.550 µs 81.607 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=15
                        time:   [284.54 µs 287.81 µs 291.38 µs]

I have reduced the output to only necessary items: bench name and time.

bench modulus/elwise_fma_mod_2d/* benches this function. The function simply takes two 2-dimensional (row-major) matrices r0, r1 and a scalar, and calls elwise_fma_mod row-wise. elwise_fma_mod internally calls EltwiseFMAModAVX512 here.

n is the row size, fixed at 32768. logq is the number of bits in the prime, and mod_size is the number of rows in the matrix. For example, modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=1 calls elwise_fma_mod once (since there is only 1 row) with a 60-bit prime and vector size 32768, and modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=3 calls elwise_fma_mod three times for 3 different rows (since mod_size is 3) with the rest of the parameters the same. Hence we should expect the performance of modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=3 to be around 3x that of modulus/elwise_fma_mod_2d/n=32768/logq=60/mod_size=1. Indeed it is. The same holds for the other benches with n=32768, logq=60, and mod_size=5 / 15.

But things behave differently when logq is set to 50 bits (i.e., when EltwiseFMAModAVX512 uses IFMA instead of DQ). modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=3 is 3x of modulus/elwise_fma_mod_2d/n=32768/logq=50/mod_size=1 as expected, but the same pattern does not hold when mod_size is either 5 or 15 (for mod_size=5 it should be around 50µs but is 81µs, and for mod_size=15 it should be around 145µs but is 287µs). I have tried other mod_sizes and it gets worse as mod_size increases, that is, as the number of rows increases.

I am unable to determine what causes this for 50-bit primes. Do you have any pointers? Or is this expected with IFMA?

Thanks!

build fails on Mac (M1)

Running cmake --build build fails with the following on an M1 Mac:

[  1%] Building C object cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/unix_based_hardware_detection.dir/src/hwcaps.c.o
[  1%] Built target unix_based_hardware_detection
[  2%] Building C object cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/utils.dir/src/filesystem.c.o
[  4%] Building C object cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/utils.dir/src/stack_line_reader.c.o
[  5%] Building C object cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/utils.dir/src/string_view.c.o
[  5%] Built target utils
[  6%] Building C object cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/cpu_features.dir/src/cpuinfo_arm.c.o
In file included from /Users/janmajayamall/desktop/hexl/build/cmake/third-party/cpu-features/cpu-features-src/src/cpuinfo_arm.c:15:
/Users/janmajayamall/desktop/hexl/build/cmake/third-party/cpu-features/cpu-features-src/include/cpuinfo_arm.h:118:2: error: "Including cpuinfo_arm.h from a non-arm target."
#error "Including cpuinfo_arm.h from a non-arm target."
 ^
1 error generated.
make[2]: *** [cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/cpu_features.dir/src/cpuinfo_arm.c.o] Error 1
make[1]: *** [cmake/third-party/cpu-features/cpu-features-build/CMakeFiles/cpu_features.dir/all] Error 2
make: *** [all] Error 2

EltwiseFMAMod fails on EltwiseFMAModAVX512 operation with a modulus that is between 51 and 52 bits.

Hello,
I believe this is related to the following issue: #121. That issue is fixed and working in v1.2.5, but using the EltwiseFMAMod function to scale numbers when the modulus is 52 bits (a number between 2^51 and 2^52) sometimes fails. Could you confirm this behaviour?

Example case:

modulus = 4503599627370486
input = {1191078607011827, 1769260550218270, 4345204646426905, 1153813479460416, 2994576917176123, 2254124429352543, 2866174865142532, 3780255914740878}
factor = 3724197286470134
input_mod_factor = 1  

function call:

intel::hexl::EltwiseFMAMod(result, input.data(), factor, nullptr, input.size(), modulus, input_mod_factor);

Result:

{3669213812048074, 4438596699595472, 3062713067630738, 2441777770654938, 1332695674286306, 833480206939968, 2031613570657280, 229641522601742}

Expected (input * factor):

{3669213812048074, 4438596699595472, 3062713067630738, 2441777770654938, 1332695674286306, 833480206939968, 2031613570657280, 229641522601752}

Note the last element is not correct.
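
For reference, the expected values above are just the 128-bit product reduced by the modulus. A standalone check, using the values from this report and the compiler's 128-bit integer extension to avoid overflow, is:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
      const uint64_t modulus = 4503599627370486ULL;
      const uint64_t factor = 3724197286470134ULL;
      const std::vector<uint64_t> input = {
          1191078607011827, 1769260550218270, 4345204646426905,
          1153813479460416, 2994576917176123, 2254124429352543,
          2866174865142532, 3780255914740878};
      for (uint64_t x : input) {
        // expected = (x * factor) mod modulus, computed in 128 bits
        uint64_t expected = static_cast<uint64_t>(
            static_cast<__uint128_t>(x) * factor % modulus);
        std::printf("%llu\n", static_cast<unsigned long long>(expected));
      }
      return 0;
    }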

Tried to build the Dockerfile, getting this error

macOS 12.1
Docker version 4.6.1
hexl version v1.2.4

#7 393.9 [ 86%] Linking CXX executable ../bin/benchmark/poly-benchmark-16k
#7 394.7 [ 86%] Built target compare-bfvrns-vs-bfvrnsB
#7 394.8 [ 86%] Built target poly-benchmark-16k
#7 395.1 [ 87%] Linking CXX executable ../../bin/examples/pke/depth-bfvrns
#7 395.3 [ 87%] Built target depth-bfvrns
#7 395.6 [ 87%] Linking CXX executable ../bin/benchmark/poly-benchmark-4k
#7 395.8 [ 87%] Linking CXX executable ../../bin/examples/pke/evalatindex
#7 395.8 [ 88%] Linking CXX executable ../../bin/examples/pke/depth-bfvrns-b
#7 396.0 [ 88%] Built target poly-benchmark-4k
#7 396.3 [ 88%] Built target depth-bfvrns-b
#7 396.5 [ 88%] Built target evalatindex
#7 396.7 [ 88%] Linking CXX executable ../../bin/examples/pke/depth-bgvrns
#7 397.3 [ 88%] Built target depth-bgvrns
#7 399.4 [ 88%] Linking CXX executable ../bin/benchmark/lib-benchmark
#7 399.9 [ 88%] Built target lib-benchmark
#7 401.7 [ 89%] Linking CXX executable ../../bin/examples/pke/threshold-fhe
#7 402.2 [ 89%] Built target threshold-fhe
#7 402.7 [ 90%] Linking CXX executable ../bin/benchmark/VectorMath
#7 403.1 [ 90%] Built target VectorMath
#7 407.7 [ 91%] Linking CXX executable ../bin/benchmark/lib-hexl-benchmark
#7 408.2 [ 91%] Built target lib-hexl-benchmark
#7 443.1 [ 92%] Linking CXX executable ../bin/benchmark/serialize-ckks
#7 443.4 [ 92%] Built target serialize-ckks
#7 444.7 make[1]: *** [CMakeFiles/Makefile2:1406: src/pke/CMakeFiles/pke_tests.dir/all] Error 2
#7 445.3 [ 93%] Linking CXX executable ../../bin/examples/pke/simple-integers-serial-bgvrns
#7 445.5 [ 93%] Built target simple-integers-serial-bgvrns
#7 451.7 [ 93%] Linking CXX executable ../../bin/examples/pke/simple-integers-serial
#7 451.9 [ 93%] Built target simple-integers-serial
#7 451.9 make: *** [Makefile:149: all] Error 2

executor failed running [/bin/sh -c tar -zxvf libs.tar.gz && cd /libs/hexl && cmake -S . -B build && cmake --build build -j && cmake --install build && cd /libs/GSL && cmake -S . -B build -DGSL_TEST=OFF && cmake --build build -j && cmake --install build && cd /libs/zlib && cmake -S . -B build && cmake --build build -j && cmake --install build && cd /libs/zstd/build/cmake && cmake -S . -B build -DZSTD_BUILD_PROGRAMS=OFF -DZSTD_BUILD_SHARED=OFF -DZSTD_BUILD_STATIC=ON -DZSTD_BUILD_TESTS=OFF -DZSTD_MULTITHREAD_SUPPORT=OFF && cmake --build build -j && cmake --install build && cd /libs/SEAL && cmake -S . -B build -DSEAL_BUILD_DEPS=OFF -DSEAL_USE_INTEL_HEXL=ON -DHEXL_DIR=/usr/local/lib/cmake/hexl-1.2.3/ && cmake --build build -j && cmake --install build && cd /libs/palisade-release && cmake -S . -B build -DWITH_INTEL_HEXL=ON -DINTEL_HEXL_PREBUILT=ON -DINTEL_HEXL_HINT_DIR=/usr/local/lib/cmake/hexl-1.2.3/ && cmake --build build -j && cmake --install build && cd /libs/HElib && cmake -S . -B build -DUSE_INTEL_HEXL=ON -DHEXL_DIR=/usr/local/lib/cmake/hexl-1.2.3/ && cmake --build build -j && cmake --install build]: exit code: 2

Continue to have master branch

Since the branch was renamed to main, even relatively recent codebases that depend on hexl (from roughly a year ago) are now broken. Example:

git clone -b 3.6.6 https://github.com/microsoft/SEAL
cd SEAL
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$LIBDIR -DSEAL_USE_INTEL_HEXL=ON

It would have been a much smoother transition and a better experience for everyone if the master branch were kept in the repo, even if further development is done in main, or, as some projects do, if master mirrored main, which can be done with a very simple script.

CKKS multiply experimental feature doesn't work correctly when integrated with SEAL

When trying to integrate the HEXL experimental features into SEAL, I found that they are not giving me the correct results. I'm not sure if I'm doing something wrong or if there is an error in the CKKS multiply implementation of HEXL.

How to reproduce:

git clone https://github.com/tgonzalez89-intel/SEAL.git
cd SEAL
git checkout hexl-ckks-mult-bug
./build.sh
cd build/bin
./sealtest

Observe that the tests fail.

You can run git diff 3.7.2 to observe the changes I made to SEAL in order to integrate HEXL's CKKS multiply.

EltwiseReduceMod Fails with Larger Moduli on 1.2.2 With DQ Processor

I discovered the following when tracking a test failure on a DQ processor. When passing a larger modulus and at least 8 input elements to the EltwiseReduceMod function, incorrect results are returned.

I wrote a quick test in the test-eltwise-reduce-mod.cpp file to replicate the issue:

TEST(EltwiseReduceMod, LargeModError) {
  uint64_t num = 8;
  std::vector<uint64_t> op;
  for(uint64_t i = 0; i < num; i++) op.push_back(124498721298790);
  std::vector<uint64_t> exp_out;
  for(uint64_t i = 0; i < num; i++) exp_out.push_back(253924022517);
  std::vector<uint64_t> result;
  for(uint64_t i = 0; i < num; i++) result.push_back(0);

  const uint64_t modulus = 1099511480321;
  const uint64_t input_mod_factor = modulus;
  const uint64_t output_mod_factor = 1;
  EltwiseReduceMod(result.data(), op.data(), op.size(), modulus,
                   input_mod_factor, output_mod_factor);
  CheckEqual(result, exp_out);
}

When compiling and executing the above on my computer with no processor enhancements, it passes, but with a DQ processor it fails. For our testing we were simply using an AWS c5.4xlarge instance. I've tested on an IFMA processor and the test passes there.

We built the project using the flags: -DCMAKE_BUILD_TYPE=Debug -DHEXL_TESTING=ON.

HEXL slowing down some operations in Microsoft SEAL benchmarks

Hello HEXL team,

We are looking into enabling HEXL for Microsoft SEAL. To estimate whether we would see a performance increase, I launched an AWS c5.large instance (running Amazon Linux 2). It has avx512dq but not IFMA, so the HEXL paper led me to expect some performance improvements. Here's the lscpu output:

$ sudo lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             3400.103
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

I downloaded SEAL and built its current main branch (commit 97e4b8d). First I built without HEXL and ran SEAL's built-in performance tests for BFV (i.e. running ./build/bin/sealexamples and choosing option 7, then 1), getting the following results as a baseline:

+--------------------------------------------------------------------------+
|         BFV Performance Test with Degrees: 4096, 8192, and 16384         |
+--------------------------------------------------------------------------+
/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 4096
|   coeff_modulus size: 109 (36 + 36 + 37) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [2181 microseconds]
Generating Galois keys: Done [44914 microseconds]
Running tests .......... Done

Average batch: 93 microseconds
Average unbatch: 81 microseconds
Average encrypt: 1509 microseconds
Average decrypt: 413 microseconds
Average add: 14 microseconds
Average multiply: 4052 microseconds
Average multiply plain: 615 microseconds
Average square: 2902 microseconds
Average relinearize: 845 microseconds
Average rotate rows one step: 861 microseconds
Average rotate rows random: 2911 microseconds
Average rotate columns: 858 microseconds
Average serialize ciphertext: 13 microseconds
Average compressed (ZLIB) serialize ciphertext: 11545 microseconds
Average compressed (Zstandard) serialize ciphertext: 1642 microseconds

/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 8192
|   coeff_modulus size: 218 (43 + 43 + 44 + 44 + 44) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [12411 microseconds]
Generating Galois keys: Done [287076 microseconds]
Running tests .......... Done

Average batch: 147 microseconds
Average unbatch: 163 microseconds
Average encrypt: 4271 microseconds
Average decrypt: 1472 microseconds
Average add: 63 microseconds
Average multiply: 15933 microseconds
Average multiply plain: 2569 microseconds
Average square: 11515 microseconds
Average relinearize: 4309 microseconds
Average rotate rows one step: 4367 microseconds
Average rotate rows random: 16938 microseconds
Average rotate columns: 4332 microseconds
Average serialize ciphertext: 37 microseconds
Average compressed (ZLIB) serialize ciphertext: 42170 microseconds
Average compressed (Zstandard) serialize ciphertext: 1738 microseconds

/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 16384
|   coeff_modulus size: 438 (48 + 48 + 48 + 49 + 49 + 49 + 49 + 49 + 49) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [83935 microseconds]
Generating Galois keys: Done [2136795 microseconds]
Running tests .......... Done

Average batch: 274 microseconds
Average unbatch: 346 microseconds
Average encrypt: 14563 microseconds
Average decrypt: 6018 microseconds
Average add: 295 microseconds
Average multiply: 67476 microseconds
Average multiply plain: 10846 microseconds
Average square: 50479 microseconds
Average relinearize: 26285 microseconds
Average rotate rows one step: 26960 microseconds
Average rotate rows random: 127576 microseconds
Average rotate columns: 27025 microseconds
Average serialize ciphertext: 319 microseconds
Average compressed (ZLIB) serialize ciphertext: 181093 microseconds
Average compressed (Zstandard) serialize ciphertext: 6015 microseconds

Then I recompiled with -DSEAL_USE_INTEL_HEXL=ON and repeated the process and got the following:

+--------------------------------------------------------------------------+
|         BFV Performance Test with Degrees: 4096, 8192, and 16384         |
+--------------------------------------------------------------------------+
/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 4096
|   coeff_modulus size: 109 (36 + 36 + 37) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [2114 microseconds]
Generating Galois keys: Done [42659 microseconds]
Running tests .......... Done

Average batch: 222 microseconds
Average unbatch: 42 microseconds
Average encrypt: 1183 microseconds
Average decrypt: 902 microseconds
Average add: 5 microseconds
Average multiply: 3669 microseconds
Average multiply plain: 581 microseconds
Average square: 2765 microseconds
Average relinearize: 738 microseconds
Average rotate rows one step: 560 microseconds
Average rotate rows random: 2216 microseconds
Average rotate columns: 556 microseconds
Average serialize ciphertext: 15 microseconds
Average compressed (ZLIB) serialize ciphertext: 11515 microseconds
Average compressed (Zstandard) serialize ciphertext: 1683 microseconds

/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 8192
|   coeff_modulus size: 218 (43 + 43 + 44 + 44 + 44) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [10650 microseconds]
Generating Galois keys: Done [238692 microseconds]
Running tests .......... Done

Average batch: 310 microseconds
Average unbatch: 69 microseconds
Average encrypt: 3035 microseconds
Average decrypt: 3722 microseconds
Average add: 23 microseconds
Average multiply: 13892 microseconds
Average multiply plain: 2535 microseconds
Average square: 10526 microseconds
Average relinearize: 3145 microseconds
Average rotate rows one step: 2781 microseconds
Average rotate rows random: 11008 microseconds
Average rotate columns: 2742 microseconds
Average serialize ciphertext: 38 microseconds
Average compressed (ZLIB) serialize ciphertext: 42019 microseconds
Average compressed (Zstandard) serialize ciphertext: 1928 microseconds

/
| Encryption parameters :
|   scheme: BFV
|   poly_modulus_degree: 16384
|   coeff_modulus size: 438 (48 + 48 + 48 + 49 + 49 + 49 + 49 + 49 + 49) bits
|   plain_modulus: 786433
\

Generating secret/public keys: Done
Generating relinearization keys: Done [68194 microseconds]
Generating Galois keys: Done [1699540 microseconds]
Running tests .......... Done

Average batch: 578 microseconds
Average unbatch: 145 microseconds
Average encrypt: 9196 microseconds
Average decrypt: 15750 microseconds
Average add: 168 microseconds
Average multiply: 60583 microseconds
Average multiply plain: 10694 microseconds
Average square: 47020 microseconds
Average relinearize: 17176 microseconds
Average rotate rows one step: 16948 microseconds
Average rotate rows random: 80074 microseconds
Average rotate columns: 16925 microseconds
Average serialize ciphertext: 307 microseconds
Average compressed (ZLIB) serialize ciphertext: 181309 microseconds
Average compressed (Zstandard) serialize ciphertext: 5996 microseconds

We do see noticeable performance gains for most operations, but batching and decryption are taking >100% more time. Is there any guidance you can offer on why this might be the case, and if there's anything we can do to address it?

Thanks.

How does hexl perform against NFLLib?

I was wondering if your benchmarks could be tweaked a little to show the performance against NFLLib. It seems like NFLLib is more popular, and the results would give some clarification on which one to choose.
