(Rust binding) Repeated invocation of EltwiseFMAModAVX512 (with different data) in loop has unexpected performance regression

intel commented on August 27, 2024
(Rust binding) Repeated invocation of EltwiseFMAModAVX512 (with different data) in loop has unexpected performance regression

from hexl.

Comments (5)

joserochh commented on August 27, 2024

Hello @Janmajayamall. Unfortunately, I no longer have access to machines that can run HEXL at full capability (using AVX512). I can tell you that modular reduction works differently depending on the BitShift variable.

Look at the functions in fma_mod that depend on BitShift here: https://github.com/intel/hexl/blob/development/hexl/eltwise/eltwise-fma-mod-avx512.cpp

BitShift is defined here: https://github.com/intel/hexl/blob/development/hexl/eltwise/eltwise-fma-mod.cpp

Do you see the same behavior with logq = 48 or 46? Just curious.

Regards,
José Rojas

faberga commented on August 27, 2024

Hi @Janmajayamall,

You mentioned that you are trying to use the Intel Advanced Vector Extensions 512 Integer Fused Multiply Add (AVX512-IFMA52) instructions. These were introduced in the 3rd Gen Intel® Xeon® Scalable Processors (and onwards), so checking which CPU manufacturer and type you are using will be important.

The AVX512-IFMA52 should only be used for primes below 50–52 bits, assuming it suffices for your computation.
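As a rough illustration of that rule, the sketch below picks a reduction width from the modulus size. The names `choose_bit_shift` and `IFMA_MAX_MODULUS_BITS`, and the cutoff of 50 bits, are assumptions made for this example and are not HEXL's actual dispatch code; the linked HEXL sources are the authority on the real threshold.

```rust
/// Assumed upper bound (in bits) on moduli routed to the 52-bit IFMA path.
/// The thread only says "below 50 to 52 bits"; 50 is a conservative guess.
const IFMA_MAX_MODULUS_BITS: u32 = 50;

/// Number of bits needed to represent `modulus`.
fn bit_length(modulus: u64) -> u32 {
    u64::BITS - modulus.leading_zeros()
}

/// Hypothetical dispatch: 52 means "take the IFMA52 path",
/// 64 means "fall back to the 64-bit path".
fn choose_bit_shift(modulus: u64, cpu_has_ifma: bool) -> u32 {
    if cpu_has_ifma && bit_length(modulus) <= IFMA_MAX_MODULUS_BITS {
        52
    } else {
        64
    }
}

fn main() {
    // Example values chosen only for their size; they are not claimed to be prime.
    let q48 = (1u64 << 48) - 59;
    let q60 = (1u64 << 60) - 1;
    println!("48-bit modulus, IFMA available: {}", choose_bit_shift(q48, true));
    println!("48-bit modulus, no IFMA:        {}", choose_bit_shift(q48, false));
    println!("60-bit modulus, IFMA available: {}", choose_bit_shift(q60, true));
}
```

With this assumed cutoff, the logq=46 and logq=48 moduli discussed in this thread would both land on the 52-bit path when the CPU reports IFMA support.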

For more information on how HEXL uses the AVX512-IFMA52, please refer to:
https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-hexl.html

and

https://arxiv.org/pdf/2103.16400.pdf

Regards,
Flavio

faberga commented on August 27, 2024

@Janmajayamall
A description of the AVX512-IFMA52 intrinsics can be found here: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avx512techs=AVX512IFMA52&cats=Arithmetic
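For readers without IFMA hardware at hand, here is a minimal scalar model of what one 64-bit lane of the two IFMA52 multiply-add intrinsics computes, as I understand the Intrinsics Guide description. The helper names `product52`, `madd52lo` and `madd52hi` are made up for this sketch; the real intrinsics are `_mm512_madd52lo_epu64` and `_mm512_madd52hi_epu64`, which operate on eight lanes at once.

```rust
/// Mask selecting the low 52 bits of a 64-bit lane.
const MASK52: u64 = (1u64 << 52) - 1;

/// Full (up to 104-bit) product of the low 52 bits of `b` and `c`.
fn product52(b: u64, c: u64) -> u128 {
    u128::from(b & MASK52) * u128::from(c & MASK52)
}

/// Scalar model of one lane of `_mm512_madd52lo_epu64(a, b, c)`:
/// add the low 52 bits of the product to the accumulator `a`.
fn madd52lo(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add((product52(b, c) as u64) & MASK52)
}

/// Scalar model of one lane of `_mm512_madd52hi_epu64(a, b, c)`:
/// add bits 52..103 of the product to the accumulator `a`.
fn madd52hi(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add((product52(b, c) >> 52) as u64)
}

fn main() {
    let b = (1u64 << 51) + 3;
    let c = (1u64 << 50) + 7;
    let lo = madd52lo(0, b, c);
    let hi = madd52hi(0, b, c);
    // The two halves together reconstruct the full product of the 52-bit inputs.
    let rebuilt = (u128::from(hi) << 52) | u128::from(lo);
    assert_eq!(rebuilt, u128::from(b) * u128::from(c));
    println!("lo = {lo:#x}, hi = {hi:#x}");
}
```

Because only the low 52 bits of each input take part in the multiplication, operands and moduli have to fit comfortably below 52 bits, which is where the size limit on primes comes from.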

Janmajayamall commented on August 27, 2024

> Would you have the same behavior using logq = 48 or 46? just curious.

modulus/elwise_fma_mod_2d/n=32768/logq=48/mod_size=1
                        time:   [9.2788 µs 9.3188 µs 9.3523 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=48/mod_size=3
                        time:   [28.762 µs 28.882 µs 28.987 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=48/mod_size=5
                        time:   [76.355 µs 76.662 µs 76.946 µs]

modulus/elwise_fma_mod_2d/n=32768/logq=48/mod_size=15
                        time:   [273.11 µs 276.14 µs 279.43 µs]

Yes, it behaves the same for logq=48, and I can confirm the same for logq=46.
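To make the scaling in the numbers above easier to see, the following sketch divides the middle value of each reported interval by mod_size. The constant and function names are made up for this example; the timings are copied from the benchmark output in this comment.

```rust
/// (mod_size, middle value of the reported time interval in µs),
/// copied from the logq=48 benchmark output above.
const MIDDLE_US: [(u32, f64); 4] = [(1, 9.3188), (3, 28.882), (5, 76.662), (15, 276.14)];

/// Average time per modulus (one row of the 2D input), in µs.
fn per_modulus_us(mod_size: u32, total_us: f64) -> f64 {
    total_us / f64::from(mod_size)
}

fn main() {
    let baseline = per_modulus_us(MIDDLE_US[0].0, MIDDLE_US[0].1);
    for (mod_size, total_us) in MIDDLE_US {
        let per = per_modulus_us(mod_size, total_us);
        println!(
            "mod_size={:>2}: {:.2} µs per modulus ({:.2}x the mod_size=1 cost)",
            mod_size,
            per,
            per / baseline
        );
    }
}
```

If the cost were linear in mod_size, every row would show about the same per-modulus time. Instead it goes from roughly 9.3 µs at mod_size=1 to roughly 18.4 µs at mod_size=15, which is the regression this issue is about.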

I don't suspect this is due to calling the code from Rust (but I will still compare by implementing the same benchmark in C++).

If I understand correctly, the line here sets the BitShift value to 52 and uses IFMA, right?

> You mentioned that you are trying to use the Intel Advanced Vector Extensions 512 Integer Fused Multiply Add (AVX512-IFMA52) instructions. These were introduced in the 3rd Gen Intel® Xeon® Scalable Processors (and onwards), so checking which CPU manufacturer and type you are using will be important.

I am using a C3 machine on GCP (4th Gen Intel Xeon Scalable processor), which supports AVX512-IFMA. I don't think there are additional configs I need to enable for HEXL, or am I missing something?

Do you have any ideas about what could cause this?

Thanks!

faberga commented on August 27, 2024

Hi @Janmajayamall

The 4th Gen Intel Xeon Scalable processor does support AVX512-IFMA instructions. But, just in case, assuming you are using Linux, can you check with the "lscpu" command?
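The same check can be done from the Rust side. The sketch below reads /proc/cpuinfo, which carries the same CPU flag names that lscpu prints on Linux, and looks for the flag name `avx512ifma`. The helper `cpuinfo_has_flag` is a name invented for this example, and the sketch only tells you what the kernel reports, not which code path HEXL actually selects.

```rust
use std::fs;

/// Returns true if any "flags" line of a /proc/cpuinfo-style text lists `flag`
/// as a whole word.
fn cpuinfo_has_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        .filter(|line| line.starts_with("flags"))
        .filter_map(|line| line.split_once(':'))
        .any(|(_, flags)| flags.split_whitespace().any(|f| f == flag))
}

fn main() {
    match fs::read_to_string("/proc/cpuinfo") {
        Ok(text) => {
            for flag in ["avx512f", "avx512dq", "avx512ifma"] {
                println!("{}: {}", flag, cpuinfo_has_flag(&text, flag));
            }
        }
        // Not on Linux, or /proc is unavailable: report and move on.
        Err(err) => eprintln!("could not read /proc/cpuinfo: {}", err),
    }
}
```

On a cloud VM it is worth running this inside the guest, since the hypervisor decides which flags the guest sees.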

As for how to use HEXL in an FHE library, I would suggest studying the integration of HEXL with MS SEAL and/or OpenFHE.
