toposware / cheetah Goto Github PK

View Code? Open in Web Editor NEW

38.0 7.0 3.0 729 KB

A STARK-friendly elliptic curve defined over a sextic extension of a small prime field.

License: Apache License 2.0

Rust 100.00%

rust curve cryptography elliptic-curve

cheetah's People

Contributors

Stargazers

Watchers

Forkers

binadamu-isiyoonekana mfaulk rajeshrk18

cheetah's Issues

Add custom implementation of double() and square()

The Fp struct currently implements double() simply as self.add(self) and square() as self.mul(self).
The Scalar struct also implements double() as self.add(self).

A custom implementation could be beneficial, in particular as Fp6 operations are naturally slower than their FieldElement counter-part of the stark-curve repo.

Lookup tables with AffinePoint for Projective scalar multiplication

Mixed addition (AffinePoint + ProjectivePoint) is about 88% faster than regular ProjectivePoint addition.
This could lead to beneficial gains in scalar multiplication, by storing in the LookupTable multiples of the base point in Affine coordinates, rather than Projective coordinates. This can be done efficiently by creating first the array of ProjectivePoint multiples, and then doing a batch normalization to convert them all in Affine coordinates with 1 field inversion.

Unfortunately, we currently have a relatively expensive Fp6 inversion which makes this approach only beneficial for scalar multiplication with the hardcoded basepoint. But it may be worth investigating, and keep it under the hood in case inversion is improved.

Custom implementation of group operations

We currently rely on generic algorithms for computing curve group operations, with the only benefit of ignoring multiplications by a4 as it is equal to 1 in our setting (Weierstrass form y^2 = x^3 + x + B).

Having group algorithms more "tailored" for our specific curve could be beneficial.

Use towered extension internally

The latest version of cheetah uses a direct sextic extension Fp -> Fp6 defined as Fp[X] / (X^6 - 7), where 7 is a quadratic and cubic non-residue.

The current cost of multiplication in Fp6 is:

36 Fp multiplications
5 integer (u128) multiplications
30 integer (u128) additions
6 reductions of u96 (stored as u128) into Fp elements

For squaring in Fp6, it is:

15 Fp multiplications
6 Fp squarings
5 integer (u128) multiplications
15 integer (u128) additions
7 integer (u128) left shifts
6 reductions of u96 (stored as u128) into Fp elements

We may benefit from using a towered extension Fp -> Fp2 -> Fp6 internally. Inputs / outputs would still be given in canonical form of the direct sextic extension. For this, we have the correspondence between:
a in Fp6\Fp: a0 + a1.alpha + a2.alpha^2 + a3.alpha^3 + a4.alpha^4 + a5.alpha^5
and
b in Fp6\Fp2: (b00 + b01.beta) + (b10 + b11.beta).delta + (b20 + b21.beta).delta^2

In particular, if we take alpha^6 = delta^2 = beta^3 = 7 (i.e, Fp2 = Fp[X] / (X^2 - t) and Fp6\Fp2 = Fp2[X] / (X^3 - t)), then we can rewrite b as
b = b00 + t.b21.alpha + b10.alpha^2 + b01.alpha^3 + b20.alpha^4 + b11.alpha^5

Inversely, a can be rewritten as:
a = (a0 + a3.beta) + (a2 + a5.beta).delta + (a4 + 1/t.a1.beta).delta^2

What's important is that, given the 6 coordinates of an Fp6 element, conversion between the direct extension form and the towered extension form is almost costless. This may make towered extension approach interesting for:

reducing the number of operations in multiplication and squaring (by combination of Schoolbook / Karatsuba as was done before, perhaps other methods?)
allowing for much faster Fp6 inversion (the current implementation is not optimal, but the structure of the direct sextic extension imposes too much Fp operations, which makes it about twice slower than what it could be with towered construction). This would be helpful for #14.

Bring back Montgomery form?

We were originally using Montgomery form for representing base field and scalar field elements, which has then be removed in favour of a canonical encoding for the base prime field, to match the implementation on winterfell and miden.
However, if we stay away from the latter, we may want to go back to Montgomery form and use the same wrapping approach in winterfell for our own version of f64. While Fp6 multiplication and squaring may be slightly slower than it currently is, addition/subtraction/doubling should be faster. In addition, we wouldn't have to deal with canonicalization of elements before feeding them into a Hasher for instance, as all element representations are unique in Montgomery form.

Custom squaring in Fp

Squaring in Fp currently relies on the underlying Fp multiplication. We may want to investigate custom squaring possibilities to see if they bring any significant speed-up.

Add w-NAF scalar multiplication

In the continuation of #1, it may be interesting to include support for w-NAF scalar decomposition for faster scalar multiplication of group elements.

Integrate conditional winterfell-related feature?

Within winterfell, we're currently relying on a wrapper of the cheetah Fp type to implement the FieldElement and related Traits.
Because of the trait declaration of some methods in the Air trait, we cannot directly benefit from the speed improvement of toposware/hash#4 in our state-transition AIR program in certificate-stark (except from outside the circuit, i.e. when building the trace but this is almost negligible with respect to the rest of the prover computations).

Having the Fp struct implement directly FieldElement, StarkField, ExtensibleField behind a condition flag like #cfg[feature("air")] would reduce the overhead in type conversion, make the usage easier and allow for the kind of optimizations we benefit from the above mentioned PR for instance.

Introduce other point coordinate systems

We currently have only Affine and Projective coordinates logic for the curve arithmetic.
Being able to support other point coordinate systems may bring some performance benefits, for instance in the schnorr-sig crate.

For instance Jacobian coordinates are known for yielding relatively fast point doubling methods (about 2/3 of the running time of a point doubling in Projective coordinates) for a slightly slower addition algorithm. However, the use of Straus-Shamir's trick in signature verification reduces the impact of such change (as we have now N doublings and 2N additions).

Jacobian coordinates variants (like Chudnovsky) could also be investigated, though at the cost of a heavier point representation (240B instead of 144B for Jacobian/Projective).

Faster squaring in Fp

This paper introduces a faster way than Tonelli-Shanks to compute the square root of a field element if the field characteristic is highly 2-adic.
According to the table 1 page 12, the number of field operations in the case of Fp (with 2-adicity 32) would go from 311 to 134.5 when switching algorithm.

Faster Fp^6 square root

We currently use a constant-time version of Tonelli-Shanks for p^6 = 1 mod 16, which is quite heavy (including a whole Fp^6 exponentiation).
Implementing the Algorithm 10 from https://eprint.iacr.org/2012/685.pdf may yield significant improvement, and would be beneficial for point decompression.

Pacify clippy

The recent stable toolchain introduces noisy lint errors, mostly related to needless borrows.

Add hashing to curve

The current generator point, while not random (as coming from the "Topos" encoded string), still suffers from a non-standard generation procedure. In addition, having a hash-to-curve algorithm would be a great add-on as it can serve multiple purposes.

Following the draft-cfrg-hash-to-curve (section 6.6 p26), it seems that SWU would be the one to go for. The generator point would then be regenerated, following the proper hash-to-curve implementation.

Speed-up scalar multiplication in the group

Currently the scalar multiplication is done through a simple double-and-add algorithm, skipping the first MSB that is always zero.
We would gain from having other, more efficient methods implemented (not necessarily limited to only 1), among which:

variable based scalar mult
Shamir's simulteneous scalar mult (Algorithm 3.48, originally from Straus - can be generalized to more than 2 points, and either constant time with fixed window or variable time with sliding window. Could be beneficial in the Schnorr signature for computing h.P + r.G.)
Pippenger's algorithm

More here:

Generalization of Shamir's trick for N points

It could be nice to generalize the Shamir's trick for N points. It is currently only available for two points (and optimized if one is the base point).

Scalar reveals the secret value

Scalar reveals the limbs as the elements of Fixed size array implements Copy trait. Whenever they have to be moved, limbs are copied thus leaving the original array in memory until it is overwritten.

I have to change the scope of limbs to find this bug.

I have reproduced the issue here: https://gist.github.com/RajeshRk18/24b2f273ea689786c3bb5a19990c6166

Expected Behaviour

For the scalar Scalar([12, 34, 56, 78]),
when the given code is run,

Creation Address: 0x7ffe0a86a4b8 
Value before dropping: [11246162368984329641, 15720856220356058543, 2654058533781523010, 95027734021429075]
Returned Address: 0x7ffe0a86a4b8 // same location means that limbs are not moved
Value at Returned Address: [0, 0, 0, 0]
Value at Creation Address: [0, 0, 0, 0]

Actual Observed Behavior

Creation Address: 0x7ffe0a86a4b8 
Value before dropping: [11246162368984329641, 15720856220356058543, 2654058533781523010, 95027734021429075]
Returned Address: 0x7ffe0a86a470 // different location means that limbs are moved
Value at Returned Address: [0, 0, 0, 0]
Value at Creation Address: [11246162368984329641, 15720856220356058543, 2654058533781523010, 95027734021429075]

Investigate inline assembly

Now that inline assembly is stable with Rust 1.59.0 (https://blog.rust-lang.org/2022/02/24/Rust-1.59.0.html), it may be interesting to include conditional assembly feature to speed-up some arithmetic operations.