Comments (1)
The current state-of-the-art is based on either Bernstein-Yang or Pornin's GCD.
Overview
Modular inverse can be computed either via Fermat's little theorem or a variant of Extended Euclid Algorithm to solve GCD (for example using binary shift/add/sub instead the canonical with quotients/division)
Binary Extended Euclid Algorithm (bEEA)
I'll use the algorithm by Niels Möller as the basis of my analysis (Niles is author of the GNU Nettle cryptographic library and a significant contributor to GMP, in particular the constant-time modular inverse).
Algorithm 5 in
Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine
Danilo Camara, Conrado P. L. Gouvea, Julio Lopez, and Ricardo Dahab
https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf
Input: integer x, odd integer n, x < n
Output: x−1 (mod n)
1: function ModInv(x, n)
2: (a, b, u, v) ← (x, n, 1, 1)
3: ℓ ← ⌊log2 n⌋ + 1 ⮚ number of bits in n
4: for i ← 0 to 2ℓ − 1 do
5: odd ← a & 1
6: if odd and a ≥ b then
7: a ← a − b
8: else if odd and a < b then
9: (a, b, u, v) ← (b − a, a, v, u)
10: a ← a >> 1
11: if odd then u ← u − v
12: if u < 0 then u ← u + n
13: if u & 1 then u ← u + n
14: u ← u >> 1
15: return v
Constant-time algorithms make it easier to analyze speed because they always take the worst case time. As you can see, the worst case grows with iteration of 2l
with l the bitsize of the modulus so 254 for BN254/BN256/alt_bn128.
Fermat's Little Theorem (FLT)
Using Little Fermat's Theorem, we can use exponentiation by p-2 and rely on a highly optimized modular multiplication.
However multiplication has an asymptotic complexity of O(n²) meaning as the prime modulus gets bigger, FLT gets slower relative to bEEA.
Addition chains can reduce the number of multiplications but not of squaring, In particular BN254 has a quite high hamming weight for a cryptographic prime.
Benching my old transition (2020) to addition chains, the speedup was only 30% (mratsim/constantine#80 )
The bottleneck
Both traditional EEA and FLT have a bottleneck, each operations are done on the full integer width (254 bits).
The big innovation in Bernstein-Yang and Pornin is that instead of working on the full integer width, you only work on:
- the bottom 62 bits (Bernstein-Yang)
- or top 31 and bottom 31 bits (Pornin)
and those are enough to make all the if/else branch decisions of EEA.
Then you propagate those transitions based on 62 or 31/31 on the full integer width (254 bits) once every 31 or 62 iterations.
Hence, they are asymptotically about 254/62 = 4.1x faster.
References
For implementation details and gotchas, see mratsim/constantine#172 (comment)
Pornin's writeup: https://research.nccgroup.com/2020/09/28/faster-modular-inversion-and-legendre-symbol-and-an-x25519-speed-record/
Bernstein-Yang inversion:
- References:
- Original Bernstein-Yang paper, https://eprint.iacr.org/2019/266
- Formal verification by Hvass-Aranha-Spitters, https://eprint.iacr.org/2021/549
- Implementations:
- In formally-verified fiat-crypto:
- In Relic by Aranha: https://github.com/relic-toolkit/relic/blob/6d29b27/src/fp/relic_fp_inv.c#L553
- In Bitcoin's secp256k1:
- In Constantine: https://github.com/mratsim/constantine/blob/151f284/constantine/math/arithmetic/limbs_exgcd.nim
Pornin's inversion:
- References:
- Discussion on SIMD optimization supranational/blst#62
- Implementations:
Benchmarks
Both 255-bit Pornin's constant-time inversion from BLST and 255-bit BY's inversion from Constantine are in the 1100 to 1300ns on my machine:
For BN254 specifically, Bernstein-Yang variable-time
And Gnark's Pornin inversion (variable-time):
Note that BLST uses Assembly and Constantine/Gnark do not.
Legendre symbol
Both BY and Pornin's GCD method can be adapted to compute the Legendre symbol, see:
- Relic for BY: https://github.com/relic-toolkit/relic/blob/ddd1984/src/fp/relic_fp_smb.c
- Constantine for BY: https://github.com/mratsim/constantine/blob/151f284/constantine/math/arithmetic/limbs_exgcd.nim#L596-L639
- BLST for Pornin: https://github.com/supranational/blst/blob/689c11c/src/no_asm.h#L996-L1060
- https://github.com/pornin/x25519-cm0#legendre-symbol
Legendre symbol is a bottleneck in Hashing to curve of BN254 (#47)
from halo2curves.
Related Issues (20)
- Different output of bn256::G1::hash_to_curve between vanilla & asm branches HOT 11
- Refactor `BN` curves. HOT 1
- about serialization of bn254 curve point HOT 10
- MSRV broken again in CI by dependencies HOT 1
- Add function in new_curve_impl has a wrong result HOT 1
- Hitting import issues on x86_64-unknown-linux-gnu from maybe_rayon crate HOT 4
- Add more info in regards impossibility to compile the lib with `multicore` feature to `wasm` targets HOT 1
- Remove `multicore` feature from `msm` bench HOT 1
- Enabling serialization feature in `pasta_curves`
- subtle v2.4 causes compilation errors HOT 1
- Add typos tool to CI to automate typo detection
- Re-define & unify the serialization for all primitives here and in any downstream crates
- Support various hash functions at `expand_message` part of hashing to curve
- Bad subgroup check or cofactor clearing in G2 pluto
- Continue `multiexp_serial` skips doubling when all bits are zero.
- Cyclone MSM panic HOT 4
- Fix error in WithSmallOrderMulGroup impl for Bn254::Fr HOT 2
- Duplicated `mul_by_nonresidue` function in `Bn256` field extensions HOT 1
- Simplify `frobenius_map` in `bn256` HOT 1
- New crate release HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from halo2curves.