j08ny / pyecsca

Python Elliptic Curve Side-Channel Analysis toolkit.

Home Page: https://neuromancer.sk/pyecsca/

License: MIT License

Makefile 0.33% Python 99.67%
side-channel-attacks elliptic-curve-cryptography side-channel ecc sca

pyecsca's Introduction


Python Elliptic Curve cryptography Side-Channel Analysis toolkit.

For more info, see the docs.

Functionality

pyecsca aims to fill a gap in SCA tooling for Elliptic Curve Cryptography. It focuses on black-box implementations of ECC and presents a way to extract implementation information about them through side-channels. The main goal of pyecsca is to be able to reverse engineer the curve model, coordinate system, addition formulas, scalar multiplier, and even finite-field implementation details.

It currently provides:

pyecsca consists of three packages: the core package (pyecsca), the codegen package (pyecsca-codegen), and the notebook package (pyecsca-notebook).

Requirements

pyecsca contains data from the Explicit-Formulas Database by Daniel J. Bernstein and Tanja Lange. The data was partially changed to make working with it easier. It is available on GitHub at crocs-muni/efd.

It uses ChipWhisperer as one of its targets. It also supports working with Riscure Inspector trace sets, which use a proprietary format.

Testing & Development

See the Makefile for tests, performance measurement, code style and type checking commands. Use black for code formatting.

Docs

License

MIT License

Copyright (c) 2018-2024 Jan Jancar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Thanks a lot to the contributors: Tomas Jusko, Andrej Batora, Vojtech Suchanek, and to ChipWhisperer/NewAE.

Development was supported by the Masaryk University grant MUNI/C/1701/2018.

pyecsca's People

Contributors

andrr3j, deepsource-autofix[bot], deepsourcebot, j08ny, tomko10, vojtechsu


pyecsca's Issues

Fix broken Montgomery reduction in codegen

Currently Montgomery reduction is functionally broken in pyecsca-codegen.
The following is a reproducer:

builder build --platform HOST --red BARRETT shortw projective add-2016-rcb dbl-2016-rcb "ltr()" .
echo "Barrett"
client --platform HOST --fw ./pyecsca-codegen-HOST.elf shortw projective gen secg/secp128r1

builder build --platform HOST --red MONTGOMERY shortw projective add-2016-rcb dbl-2016-rcb "ltr()" .
echo "Montgomery"
client --platform HOST --fw ./pyecsca-codegen-HOST.elf shortw projective gen secg/secp128r1

builder build --platform HOST --red BASE shortw projective add-2016-rcb dbl-2016-rcb "ltr()" .
echo "Base"
client --platform HOST --fw ./pyecsca-codegen-HOST.elf shortw projective gen secg/secp128r1

prints

Barrett
(162938999268550597445809790209592423458, Point([[x=111810799217268384317536017529141796945, y=309320541414531923178173772704935971498]] in AffineCoordinateModel("affine" on short Weierstrass curves)))
0.015533685684204102
Montgomery
(162938999268550597445809790209592423458, Point([[x=207606955816995729794136774524546058183, y=283722461722771654824977452306297906912]] in AffineCoordinateModel("affine" on short Weierstrass curves)))
0.01282811164855957
Base
(162938999268550597445809790209592423458, Point([[x=111810799217268384317536017529141796945, y=309320541414531923178173772704935971498]] in AffineCoordinateModel("affine" on short Weierstrass curves)))
0.01994013786315918

The floats are not important (they are timings), but the generated point differs even though the private key is the same (the RNG state is constant).

To solve this, we likely need a simpler test case in the codegen repo that demonstrates the issue; it is probably somewhere in the encoding or decoding of the Montgomery representation.
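The suspected encoding/decoding problem can be pinned down against a minimal pure-Python reference of Montgomery arithmetic (this is just the textbook algorithm, not the codegen's C implementation): a value decoded without the final reduction step would come out wrong in exactly this silent way.

```python
# Reference sketch of Montgomery representation: values live as a*R mod n,
# and REDC maps t to t*R^-1 mod n. Not the codegen's actual code.
def montgomery_setup(n, word_bits=64):
    # R is the smallest power of 2^word_bits with R > n (n must be odd)
    k = ((n.bit_length() + word_bits - 1) // word_bits) * word_bits
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    return r, n_prime

def redc(t, n, r, n_prime):
    # Montgomery reduction: returns t * R^-1 mod n, for 0 <= t < n*R
    m = ((t % r) * n_prime) % r
    u = (t + m * n) // r            # exact division by construction
    return u - n if u >= n else u

def to_mont(a, n, r):
    return (a * r) % n              # encode: a -> a*R mod n

def from_mont(a_mont, n, r, n_prime):
    return redc(a_mont, n, r, n_prime)  # decode: a*R -> a
```

A round trip through to_mont/from_mont, and a Montgomery product matching plain modular arithmetic, are the invariants a simpler codegen test case could check for the secp128r1 prime.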

Investigate trace data streaming on CPU/GPU

Currently, unless the HDF5 trace set's "inplace" functionality is used, all trace data is loaded into memory, where it is operated on. This limits the size of the trace set. With HDF5, loading into memory can be delayed somewhat, but it will likely still happen when the data is read or computed on. Perhaps HDF5 or some other method of streaming the data could be used to allow operating on large trace sets on both the CPU and GPU, as trace set size seems to be the major bottleneck.
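As a minimal illustration of the streaming idea, a chunked reduction over any sliceable (n_traces, n_samples) array (h5py datasets and numpy memmaps both expose this interface) keeps peak memory at one chunk rather than the whole trace set:

```python
import numpy as np

def streamed_mean(traces, chunk=1024):
    # traces: any (n_traces, n_samples) array-like supporting row slicing,
    # e.g. an h5py dataset or np.memmap; only `chunk` rows are resident
    # in memory at a time.
    n, m = traces.shape
    acc = np.zeros(m, dtype=np.float64)
    for start in range(0, n, chunk):
        acc += np.asarray(traces[start:start + chunk]).sum(axis=0, dtype=np.float64)
    return acc / n
```

Many trace operations (means, variances, correlations) decompose into such per-chunk accumulations, which is what would make CPU/GPU streaming feasible.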

Fully symbolic formula evaluation fails on assumptions

Traceback (most recent call last):
  File "<ipython-input-15-689bb4a323c5>", line 7, in <module>
    res = formula(p, PmQ, P, Q, **{"a": SymbolicMod(a, p), "d": SymbolicMod(d, p)})[0]
  File ".../pyecsca/pyecsca/ec/formula.py", line 207, in __call__
    self.__validate_assumptions(field, params)
  File ".../pyecsca/pyecsca/ec/formula.py", line 160, in __validate_assumptions
    expr = expr.subs(curve_param, k(value))
  File "/usr/lib/python3.9/site-packages/sympy/polys/domains/domain.py", line 387, in __call__
    return self.new(*args)
  File "/usr/lib/python3.9/site-packages/sympy/polys/domains/domain.py", line 378, in new
    return self.dtype(*args)
  File "/usr/lib/python3.9/site-packages/sympy/polys/domains/modularinteger.py", line 29, in __init__
    self.val = self.dom.convert(val) % self.mod
  File "/usr/lib/python3.9/site-packages/sympy/polys/domains/domain.py", line 463, in convert
    raise CoercionFailed("can't convert %s of type %s to %s" % (element, type(element), self))
sympy.polys.polyerrors.CoercionFailed: can't convert a of type <class 'pyecsca.ec.mod.SymbolicMod'> to ZZ

Consider using different bignum library for constant-time execution

fiat-crypto could be used to generate known-constant-time and correct finite-field arithmetic for selected primes, which could then be used in the codegen subpackage. Using fiat-crypto instead of the current libtommath would help the analysis of the generated implementations: libtommath is not constant-time, leading to misalignment in the traces, whereas fiat-crypto is constant-time.

Doing this would mean some changes to the codegen process, as generated implementations were assumed to be curve-generic (a curve is set at runtime), while fiat-crypto needs the prime to generate the implementation. Also, fiat-crypto only implements finite-field arithmetic and not generic big numbers, so how to implement ECDH/ECDSA (the current codegen targets) would need to be investigated.

Handle formula assumptions

Several formulas in the EFD have assumptions; they also sometimes introduce parameters that are used in the rest of the formula:

parameter s
assume s = (1+r)/(1-r)
parameter r2
assume r2 = 2*r
assume Z1 = 1

The assumptions are generally of two types: assumptions that assign a value to a new formula parameter (assume r2 = 2*r), and assumptions that check that something holds about the inputs to the formula (assume Z1 = 1). We should handle both correctly.

The formula parameter assignment should be executed on each formula execution (but there might be a way to "pre-bake" a formula such that these parameters are computed only once for a given curve for it).

The input check should be performed on each formula call and should throw an exception if it fails. This exception should be suppressible by a configuration option (to allow intentionally breaking assumptions to simulate behavior).
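A rough sketch of both behaviors (the function shape and eval-based evaluation are purely illustrative, not pyecsca's API):

```python
class AssumptionError(Exception):
    """Raised when an input-check assumption (e.g. Z1 = 1) is violated."""

def apply_assumptions(assumptions, curve_params, inputs, raise_on_violation=True):
    # assumptions: list of (lhs, rhs) pairs, e.g. [("r2", "2*r"), ("Z1", "1")]
    env = dict(curve_params)
    for lhs, rhs in assumptions:
        value = eval(rhs, {}, env)            # rhs uses curve params/constants
        if lhs in inputs:                     # input check, e.g. Z1 = 1
            if inputs[lhs] != value and raise_on_violation:
                raise AssumptionError(f"assumption {lhs} = {rhs} violated")
        else:                                 # parameter assignment, e.g. r2 = 2*r
            env[lhs] = value
    return env
```

The parameter-assignment branch is what could be "pre-baked" once per curve, while the input-check branch runs on every call; `raise_on_violation=False` corresponds to the suppressing configuration option described above.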

Unify TraceSet API and user experience

Currently the different TraceSet classes have quite different APIs, with the h5py-backed one offering the most. This API should be unified somehow, e.g. with functions for adding traces, removing them, and reordering them (so it is actually a TraceList?).
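A list-like shape for the unified API could look roughly as follows (purely a sketch; none of these method names exist in pyecsca):

```python
class TraceList:
    """Sketch of a unified, mutable, list-like trace container."""
    def __init__(self, traces=()):
        self._traces = list(traces)

    def append(self, trace):
        self._traces.append(trace)

    def remove(self, index):
        del self._traces[index]

    def reorder(self, permutation):
        # permutation[i] gives the old index of the trace at new index i
        self._traces = [self._traces[i] for i in permutation]

    def __len__(self):
        return len(self._traces)

    def __getitem__(self, i):
        return self._traces[i]
```

Backends (in-memory, HDF5, proprietary formats) would then differ only in storage, not in the mutation interface.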

Implement correlation on GPU

Two things are meant here: (Pearson's) correlation coefficient computation for CPA, and cross-correlation computation for alignment and other uses. These are two quite different algorithms, grouped together because both are interesting and both use the word "correlation".

TODO

  • Pearson's correlation coefficient
  • Cross-correlation

Pearson's correlation coefficient

For CPA: https://www.iacr.org/archive/ches2004/31560016/31560016.pdf
where it is used to compute the (samplewise) correlation coefficient between hypothesized leakage values and the measured power leakage.

Implemented on GPU here: https://arxiv.org/abs/1412.7682
also here: https://link.springer.com/chapter/10.1007/978-3-642-27257-8_16
also here: https://dl.acm.org/doi/10.1145/2611765.2611775
also here: https://www.semanticscholar.org/paper/Improving-DPA-analysis-with-distributed-computing-Amaral-Inesc-Id/43c150d1dd42a2327e302202784d2b685347e7a4
CPU work here: https://crypto.fit.cvut.cz/sites/default/files/publications/fulltexts/pearson.pdf
Some interesting work: https://informatik.rub.de/wp-content/uploads/2021/11/MA_Klostermann.pdf
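Before any GPU port, the samplewise computation itself is straightforward with numpy (a CPU sketch): correlate one vector of hypothetical leakage values, one per trace, against every sample column.

```python
import numpy as np

def samplewise_pearson(traces, leakage):
    # traces: (n_traces, n_samples); leakage: (n_traces,) hypothesized values
    t = traces - traces.mean(axis=0)              # center each sample column
    l = leakage - leakage.mean()                  # center the hypothesis
    num = l @ t                                   # covariance numerator per sample
    den = np.sqrt((l @ l) * (t * t).sum(axis=0))  # product of norm terms
    return num / den                              # (n_samples,) correlations
```

A GPU version would follow the same formula, ideally in streaming/online form so traces need not all be resident at once.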

More resources:

Cross-correlation

For alignment, you take a reference trace and cross-correlate every trace from the trace set with it, then find the peak of the result and use its position as the shift to align the trace.
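A numpy sketch of that procedure (assuming simple static alignment, and using a circular shift for brevity):

```python
import numpy as np

def align_to_reference(reference, trace):
    # The peak position of the cross-correlation gives the shift of `trace`
    # relative to `reference`; undo it (circularly, for simplicity).
    corr = np.correlate(trace - trace.mean(), reference - reference.mean(),
                        mode="full")
    shift = int(np.argmax(corr)) - (len(reference) - 1)
    return np.roll(trace, -shift)
```

The cross-correlation is the expensive part; on the GPU it would typically be done via FFT for long traces.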

Integer constants in formula pollute the result coordinates

Hi,

Multiplying a point by the integer one gives floating-point numbers as point coordinates, at least in projective coordinates.

# define the curve model
secp128r1 = get_params("secg", "secp128r1", "projective")
base = secp128r1.generator
coords = secp128r1.curve.coordinate_model
add = coords.formulas["add-1998-cmo"]
dbl = coords.formulas["dbl-1998-cmo"]
scl = coords.formulas["z"]
mult = LTRMultiplier(add,
                     dbl,
                     scl,
                     always=False,
                     complete=False,
                     short_circuit=True)

scalar = 1
mult.init(secp128r1, base)
pt = mult.multiply(1)
for _ in range(5):
    mult.init(secp128r1, base)
    k = random.getrandbits(128)
    pt = mult.multiply(k)
    mult.init(secp128r1, pt)
    a = mult.multiply(1)
    print(a)

gives

[X=2.661526430782553e+38, Y=1.1530304631181755e+38, Z=1]
[X=2.0027768112441094e+38, Y=1.209118786786195e+37, Z=1]
[X=2.8478326037726995e+38, Y=1.1501732243886246e+38, Z=1]
[X=7.545866618401884e+37, Y=2.915152976106912e+38, Z=1]

Am I doing something wrong? (apart from multiplying by one)

thanks for your work, this library is very very nice.

Investigate using JAX

JAX while focused on machine learning is a versatile tool for computing on GPUs that has a numpy-compatible API. We could investigate using it for trace operations on GPUs.

Decide on dtype conventions

Trace processing functions can (and do) change the dtype of the traces they process: not the dtype of the inputs, but of the outputs. This should at least be documented, and a sensible strategy should be chosen to:

  • Make sure the results make sense and have enough precision.
  • Make sure the results are not unnecessarily large/precise.
  • (Potentially) Follow similar conventions as numpy/scipy to not surprise people, as we usually just wrap some of their functionality.
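A concrete instance of the drift in question, following numpy's promotion rules:

```python
import numpy as np

# uint8 input, but a moving-average style operation promotes the output:
# true division always yields float64, making each sample 8x larger.
raw = np.array([10, 20, 30, 40], dtype=np.uint8)
avg = (raw[:-1] + raw[1:]) / 2
# raw.dtype is uint8, avg.dtype is float64
```

A documented convention would decide, per function, whether such promotion is intended (for precision) or should be rounded back to a compact dtype.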

NonInvertibleError in Edwards affine addition

Issue:

Traceback (most recent call last):
  File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.9/unittest/case.py", line 593, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
  File ".../pyecsca/test/ec/test_regress.py", line 60, in test_issue_10
    curve.affine_add(one, other)
  File ".../pyecsca/pyecsca/ec/curve.py", line 78, in affine_add
    return self._execute_base_formulas(self.model.base_addition, one, other)
  File ".../pyecsca/pyecsca/ec/curve.py", line 56, in _execute_base_formulas
    exec(compile(line, "", mode="exec"), None, locls)
  File "", line 1, in <module>
  File ".../pyecsca/pyecsca/ec/mod.py", line 90, in method
    return func(self, other)
  File ".../pyecsca/pyecsca/ec/mod.py", line 182, in __truediv__
    return self * ~other
  File ".../pyecsca/pyecsca/ec/mod.py", line 158, in __invert__
    return self.inverse()
  File ".../pyecsca/pyecsca/ec/mod.py", line 432, in inverse
    raise_non_invertible()
  File ".../pyecsca/pyecsca/ec/error.py", line 20, in raise_non_invertible
    raise NonInvertibleError("Element not invertible.")
pyecsca.ec.error.NonInvertibleError: Element not invertible.

Repro:

model = EdwardsModel()
coords = model.coordinates["projective"]
p = 0x1d
neutral = Point(coords, X=Mod(0, p), Y=Mod(1, p), Z=Mod(0, p))
curve = EllipticCurve(model, coords, p, neutral, {"c": Mod(1, p), "d": Mod(0x1c, p)})
one = Point(AffineCoordinateModel(model), x=Mod(13, p), y=Mod(19, p))
other = Point(AffineCoordinateModel(model), x=Mod(22, p), y=Mod(21, p))
curve.affine_add(one, other)

Make Point, DomainParameters objects picklable

Currently a Point is not picklable, as it holds a reference to its coordinate model. If the model comes from the EFD, it has code objects as some of its attributes, which are not picklable. The same goes for DomainParameters and other objects holding references to EFD data, as well as, perhaps most importantly, the EFD model objects themselves.

The EFD objects could be made picklable in several ways:

  • Pickle just their name (or path, like "shortw/projective") and reconstruct them on the other end, assuming that the other end has the EFD set up and can unpickle correctly.
  • Pickle them, but replace the code objects with the string representation they were themselves built from. This may not always be possible/easy without storing more data on them.
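The first option maps naturally onto Python's `__reduce__` protocol; a toy sketch (the class name and path scheme here are illustrative only):

```python
import pickle

class EFDRef:
    """Toy stand-in for an EFD-backed object identified by a path."""
    def __init__(self, path):
        self.path = path          # e.g. "shortw/projective"
        # a real implementation would resolve EFD data from the path here

    def __reduce__(self):
        # Serialize only the path; the receiving end re-resolves it,
        # assuming it has the EFD available.
        return (EFDRef, (self.path,))
```

The trade-off is that unpickling requires a compatible EFD on the receiving side, whereas the string-representation option makes pickles self-contained.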

Add performance measurement of CPU vs GPU implementations (+ stacking)

Add performance measurement of the GPU-accelerated functions in StackedTraces, as well as of their CPU counterparts on unstacked traces.

  • Have deterministically generated pseudorandom data.
  • Have a configurable number of traces and samples.
  • Have a configurable data type (uint/float, 8/16).
  • Potentially have a configurable data distribution (uniform/normal?, just to see if there is an effect).

Also measure the amount of time the stacking takes.

On the GPU side, ideally measure the different steps required to compute separately (i.e. getting the traces into GPU memory, computing on them, getting the result back).

Get inspired by the perf code in the tests directory, but do not use the profiler defined there.
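The harness could take roughly this shape (a sketch; the function and parameter names are made up): deterministic pseudorandom data, configurable sizes and dtype, and timing around only the function under test.

```python
import time
import numpy as np

def bench(fn, n_traces=1000, n_samples=5000, dtype=np.float32, seed=0):
    rng = np.random.default_rng(seed)              # deterministic data
    data = rng.random((n_traces, n_samples)).astype(dtype)
    start = time.perf_counter()                    # time only the computation,
    result = fn(data)                              # not the data generation
    elapsed = time.perf_counter() - start
    return result, elapsed
```

On the GPU side, the same pattern would be applied per step (host-to-device copy, kernel execution, device-to-host copy), each timed separately, plus a separate timing of the stacking itself.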

Improve trace processing experience

During analysis, one usually starts off with a trace set (or several), then processes and combines the traces into one or more intermediate traces, and finally produces some final trace for display/distinguishing/etc. During this, it is usually not necessary to keep all of the intermediate traces; one is only interested in the result.

This should be made easier via some API. Not sure what the right pattern should be though.
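One candidate pattern is a small chaining wrapper that only materializes the final result (a sketch of the idea, not a proposal for the exact API):

```python
class Pipeline:
    """Chain trace operations; only the final result is kept around."""
    def __init__(self, trace):
        self._trace = trace
        self._ops = []

    def then(self, op):
        self._ops.append(op)
        return self              # enables fluent chaining

    def result(self):
        out = self._trace
        for op in self._ops:
            out = op(out)        # intermediates become garbage immediately
        return out
```

Deferring execution like this would also leave room for fusing or streaming the operations later.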

Error in Edwards base formulas

File "...", line 96, in fullorder_generator
    P_full = params.curve.affine_add(P, P4_affine)
  File ".../pyecsca/pyecsca/ec/curve.py", line 78, in affine_add
    return self._execute_base_formulas(self.model.base_addition, one, other)
  File ".../pyecsca/pyecsca/ec/curve.py", line 56, in _execute_base_formulas
    exec(compile(line, "", mode="exec"), None, locls)
  File "", line 1, in <module>
TypeError: 'GMPMod' object is not callable

Dynamic Mod class

Add dynamic dispatch to the Mod class. Make Mod a (non-abstract) class that dispatches dynamically to either RawMod or GMPMod. Then GMPMod only registers itself as an option. This should satisfy type-checking yet still leave the Mod class a real and instantiable class.
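The dispatch can be done in `__new__`, keeping Mod a real, instantiable class (a simplified sketch of the idea; the actual registration mechanism may differ):

```python
class Mod:
    _implementation = None       # backend class, set by registration

    def __new__(cls, value, modulus):
        # Dispatch Mod(...) to the registered backend; subclasses
        # construct themselves directly.
        if cls is Mod and Mod._implementation is not None:
            return object.__new__(Mod._implementation)
        return object.__new__(cls)

    def __init__(self, value, modulus):
        self.value = value % modulus
        self.modulus = modulus

class RawMod(Mod):
    """Pure-Python backend; GMPMod would register itself analogously."""

Mod._implementation = RawMod
```

Type checkers see ordinary subclasses of a concrete Mod, yet `Mod(x, n)` transparently yields the registered backend.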

Add leakage simulation via Rainbow

Rainbow: https://github.com/Ledger-Donjon/rainbow
can be used to simulate execution of a binary target (like an ARM Cortex-M4 or some x86). The simulated execution can then be used for leakage simulation by applying a given leakage function (plus noise).

Concretely, get inspired by: https://github.com/Ledger-Donjon/rainbow/blob/master/examples/CortexM_AES/cortexm_aes.py

# Given a configuration rendered for Platform.STM32F3 at with the elf at `elf_path` this simulates execution of the main method (which cycles indefinitely waiting for serial input).
# Note that the libc __init functions should be called before, etc..
from rainbow.devices.stm32 import rainbow_stm32f215
e = rainbow_stm32f215(sca_mode=True)
e.load(elf_path, typ=".elf")

e.mem_trace = True
e.trace_regs = True

e.start(e.functions["main"], 0)

As the actual chip that ChipWhisperer targets is an STM32F303 (and not the STM32F215 already present in Rainbow), a custom device class for it might be needed. The SVD data for it can be found here and extracted from the XML (.SVD) file. More data on the memory map can be found here, or in the LinkerScript included in the hal directory for the chip.

Handle coordinate system assumptions

The coordinate systems have assumptions, like the short Weierstrass projective-1 system, which assumes a = -1.
Most of the assumptions are of the form lhs = rhs, where lhs is a curve parameter and rhs is an expression evaluable from curve parameters and constants only. However, some assumptions in the Edwards model's yz/yzsquared coordinate systems are different. These coordinate systems introduce a coordinate parameter r and do:

name YZ coordinates with square d
parameter r
assume c = 1
assume d = r^2
variable Y
variable Z
satisfying r*y = Y/Z

Loading this fails, with the error below raised on the line where assume d = r^2 is evaluated:

----> 1 get_params("other", "E-521", "yz")

.../pyecsca/ec/params.py in get_params(category, name, coords, infty)
    182         raise ValueError("Curve {} not found in category {}.".format(name, category))
    183 
--> 184     return _create_params(curve, coords, infty)

.../pyecsca/ec/params.py in _create_params(curve, coords, infty)
     97             alocals: Dict[str, Union[Mod, int]] = {}
     98             compiled = compile(assumption, "", mode="exec")
---> 99             exec(compiled, None, alocals)
    100             for param, value in alocals.items():
    101                 if params[param] != value:

.../pyecsca/ec/params.py in <module>

NameError: name 'r' is not defined

We should handle this in a general way. In this case, it would mean assigning the modular square root of d to the r parameter and keeping this parameter on the curve object, perhaps separate from the main curve parameters.

Sympy sadly doesn't support finite fields, so it seems it will not be easy to use it here; see sympy/sympy#9544.
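For the suggested fix, the square root itself is easy without sympy whenever p ≡ 3 (mod 4) (a sketch; the general case needs Tonelli-Shanks):

```python
def mod_sqrt(d, p):
    # Only valid for p % 4 == 3; then sqrt(d) = d^((p+1)/4) mod p.
    assert p % 4 == 3, "sketch handles only p % 4 == 3"
    r = pow(d, (p + 1) // 4, p)
    if (r * r) % p != d % p:
        raise ValueError("d is not a quadratic residue mod p")
    return r
```

The non-residue branch matters here: whether it raises or silently continues ties into the error-handling configuration discussed in other issues.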

Sympy cannot workout "5/2" into GF(p)

Issue:

Traceback (most recent call last):
  File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.9/unittest/case.py", line 593, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
  File ".../pyecsca/test/ec/test_regress.py", line 50, in test_issue_9
    formula(base, **curve.parameters)
  File ".../pyecsca/pyecsca/ec/formula.py", line 160, in __call__
    poly = Poly(expr, symbols(param), domain=k)
  File ".../pyecsca/virt/lib/python3.9/site-packages/sympy/polys/polytools.py", line 162, in __new__
    return cls._from_expr(rep, opt)
  File ".../pyecsca/virt/lib/python3.9/site-packages/sympy/polys/polytools.py", line 292, in _from_expr
    return cls._from_dict(rep, opt)
  File ".../pyecsca/virt/lib/python3.9/site-packages/sympy/polys/polytools.py", line 239, in _from_dict
    rep[monom] = domain.convert(coeff)
  File ".../pyecsca/virt/lib/python3.9/site-packages/sympy/polys/domains/domain.py", line 148, in convert
    return self.from_sympy(element)
  File ".../pyecsca/virt/lib/python3.9/site-packages/sympy/polys/domains/finitefield.py", line 70, in from_sympy
    raise CoercionFailed("expected an integer, got %s" % a)
sympy.polys.polyerrors.CoercionFailed: expected an integer, got 5/2

Repro:

model = MontgomeryModel()
coords = model.coordinates["xz"]
p = 19
neutral = Point(coords, X=Mod(1, p), Z=Mod(0, p))
curve = EllipticCurve(model, coords, p, neutral, {"a": Mod(8, p), "b": Mod(1, p)})
base = Point(coords, X=Mod(12, p), Z=Mod(1, p))
formula = coords.formulas["dbl-1987-m-2"]
formula(base, **curve.parameters)
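The coercion fails because sympy will not map the rational 5/2 into GF(p), even though it is well defined there as 5 * 2^(-1) mod p:

```python
# What "5/2" means in GF(19): multiply by the modular inverse of 2.
p = 19
five_halves = (5 * pow(2, -1, p)) % p
assert (2 * five_halves) % p == 5   # multiplying back by 2 recovers 5
```

A workaround along these lines would be to pre-reduce rational constants in formula expressions to field elements before handing the polynomial to sympy.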

Figure out what copy/deepcopy should do

This is a mess and does not work for inplace HDF5 traces.

def __copy__(self):
    return Trace(copy(self.samples), copy(self.meta), copy(self.trace_set))

def __deepcopy__(self, memodict):
    return Trace(
        deepcopy(self.samples, memo=memodict),
        deepcopy(self.meta, memo=memodict),
        deepcopy(self.trace_set, memo=memodict),
    )

A formula call is slow

I am trying to use this framework to generate a set of guesses and reverse a scalar multiplication algorithm I am studying. The call of a formula (__call__) is very time-consuming. Is there a way to optimize or cache the formula and speed up the computation?

For context, I'm calling mult.multiply(guess) and mult._add(Point_a, Point_b). Am I doing this correctly, or is there a better way to get a fast multiplication and addition of two points?

Thanks in advance.

Add config class

Some things in pyecsca should be configurable at runtime, such as:

  • Whether some error states are detected and raise an exception. For example when computing a modular square root of a non-residue, or when computing a modular inverse of a non-invertible element. This is useful for simulating the behavior of implementations which contain these kinds of errors.
  • Whether formula assumptions are verified.
  • Maybe what Mod implementation is used? GMPY vs Raw.
  • ...
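The shape of such a class might be a simple dataclass (the field names below are illustrative only, not an implemented API):

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Raise on error states like non-invertible elements or non-residue
    # square roots, or silently continue to simulate faulty implementations.
    raise_on_errors: bool = True
    verify_formula_assumptions: bool = True
    mod_implementation: str = "gmp"   # or "raw"
```

A module-level instance (or a context-manager override) would then let simulations flip individual behaviors without threading flags through every call.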
