Coder Social home page Coder Social logo

Comments (1)

mratsim avatar mratsim commented on June 20, 2024 2

The current state-of-the-art is based on either Bernstein-Yang or Pornin's GCD.

Overview

Modular inverse can be computed either via Fermat's little theorem or a variant of Extended Euclid Algorithm to solve GCD (for example using binary shift/add/sub instead the canonical with quotients/division)

Binary Extended Euclid Algorithm (bEEA)

I'll use the algorithm by Niels Möller as the basis of my analysis (Niles is author of the GNU Nettle cryptographic library and a significant contributor to GMP, in particular the constant-time modular inverse).

Algorithm 5 in
Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine
Danilo Camara, Conrado P. L. Gouvea, Julio Lopez, and Ricardo Dahab
https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf

Input: integer x, odd integer n, x < n
Output: x−1 (mod n)
1:   function ModInv(x, n)
2:   (a, b, u, v) ← (x, n, 1, 1)
3:   ℓ ← ⌊log2 n⌋ + 1            ⮚ number of bits in n
4:   for i ← 0 to 2ℓ − 1 do
5:     odd ← a & 1
6:     if odd and a ≥ b then
7:       a ← a − b
8:     else if odd and a < b then
9:       (a, b, u, v) ← (b − a, a, v, u)
10:    a ← a >> 1
11:    if odd then u ← u − v
12:    if u < 0 then u ← u + n
13:    if u & 1 then u ← u + n
14:    u ← u >> 1
15:  return v

image

Constant-time algorithms make it easier to analyze speed because they always take the worst case time. As you can see, the worst case grows with iteration of 2l with l the bitsize of the modulus so 254 for BN254/BN256/alt_bn128.

Fermat's Little Theorem (FLT)

Using Little Fermat's Theorem, we can use exponentiation by p-2 and rely on a highly optimized modular multiplication.
However multiplication has an asymptotic complexity of O(n²) meaning as the prime modulus gets bigger, FLT gets slower relative to bEEA.

Addition chains can reduce the number of multiplications but not of squaring, In particular BN254 has a quite high hamming weight for a cryptographic prime.

Benching my old transition (2020) to addition chains, the speedup was only 30% (mratsim/constantine#80 )
image

The bottleneck

Both traditional EEA and FLT have a bottleneck, each operations are done on the full integer width (254 bits).

The big innovation in Bernstein-Yang and Pornin is that instead of working on the full integer width, you only work on:

  • the bottom 62 bits (Bernstein-Yang)
  • or top 31 and bottom 31 bits (Pornin)
    and those are enough to make all the if/else branch decisions of EEA.

Then you propagate those transitions based on 62 or 31/31 on the full integer width (254 bits) once every 31 or 62 iterations.
Hence, they are asymptotically about 254/62 = 4.1x faster.

References

For implementation details and gotchas, see mratsim/constantine#172 (comment)

Pornin's writeup: https://research.nccgroup.com/2020/09/28/faster-modular-inversion-and-legendre-symbol-and-an-x25519-speed-record/

Bernstein-Yang inversion:

Pornin's inversion:

Benchmarks

Both 255-bit Pornin's constant-time inversion from BLST and 255-bit BY's inversion from Constantine are in the 1100 to 1300ns on my machine:
image

For BN254 specifically, Bernstein-Yang variable-time
image

And Gnark's Pornin inversion (variable-time):
image

Note that BLST uses Assembly and Constantine/Gnark do not.

Legendre symbol

Both BY and Pornin's GCD method can be adapted to compute the Legendre symbol, see:

Legendre symbol is a bottleneck in Hashing to curve of BN254 (#47)

from halo2curves.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.