Coder Social home page Coder Social logo

sarah-ek / faer-rs Goto Github PK

View Code? Open in Web Editor NEW
1.6K 17.0 50.0 7.51 MB

Linear algebra foundation for the Rust programming language

Home Page: https://faer-rs.github.io

License: MIT License

Rust 99.31% Python 0.06% HTML 0.04% C++ 0.43% Just 0.01% TeX 0.16%
linear-algebra matrix rust

faer-rs's Introduction

faer

Documentation Crate

faer is a Rust crate that implements low level linear algebra routines and a high level wrapper for ease of use, in pure Rust. The aim is to provide a fully featured library for linear algebra with focus on portability, correctness, and performance.

See the official website and the docs.rs documentation for code examples and usage instructions.

Questions about using the library, contributing, and future directions can be discussed in the Discord server.

Contributing

If you'd like to contribute to faer, check out the list of "good first issue" issues. These are all (or should be) issues that are suitable for getting started, and they generally include a detailed set of instructions for what to do. Please ask questions on the Discord server or the issue itself if anything is unclear!

Minimum supported Rust version

The current MSRV is Rust 1.67.0.

Benchmarks

The benchmarks were run on an 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz with 12 threads.

  • nalgebra is used with the matrixmultiply backend
  • ndarray is used with the openblas backend
  • eigen is compiled with -march=native -O3 -fopenmp

All computations are done on column major matrices containing f64 values.

Matrix multiplication

Multiplication of two square matrices of dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       40ns       41ns      139ns       29ns       17ns
    8       77ns       80ns       63ns      161ns       85ns
   16      189ns      193ns      201ns      363ns      219ns
   32      1.1µs      1.1µs      1.1µs      1.5µs      1.2µs
   64      7.9µs      7.9µs      7.9µs     10.5µs      5.1µs
   96     27.5µs     11.2µs     26.2µs     34.9µs     10.1µs
  128     65.5µs     17.1µs     35.7µs     78.3µs     32.9µs
  192    216.6µs     54.4µs     57.3µs    260.7µs     51.7µs
  256    510.8µs    117.8µs    183.2µs    602.6µs    142.9µs
  384      1.7ms    339.1µs    575.8µs        2ms    327.9µs
  512        4ms    785.6µs      1.3ms      4.7ms        1ms
  640      7.9ms      1.6ms      2.3ms      9.2ms      1.9ms
  768     13.8ms      2.9ms      3.6ms       16ms      3.2ms
  896     22.2ms      4.6ms      6.5ms     25.7ms      5.9ms
 1024     33.9ms      6.6ms      9.7ms     39.1ms      8.3ms

Triangular solve

Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       20ns       19ns      755ns       39ns       65ns
    8      118ns      118ns      1.5µs      308ns      156ns
   16      498ns      502ns      3.3µs      1.5µs      671ns
   32      2.1µs      2.1µs      8.6µs      6.6µs      2.9µs
   64      9.7µs      9.8µs     25.9µs     34.2µs     13.8µs
   96     27.7µs     24.5µs     55.2µs    101.4µs     36.9µs
  128     56.4µs     39.9µs    145.2µs      232µs     81.7µs
  192    167.8µs       92µs    263.6µs    815.5µs    213.6µs
  256    367.7µs      163µs      660µs      1.9ms    488.1µs
  384      1.1ms    317.5µs      1.4ms      7.4ms      1.4ms
  512      2.6ms    662.7µs      3.5ms     17.2ms      3.3ms
  640      4.7ms      1.2ms      5.7ms     33.6ms      5.5ms
  768        8ms      2.3ms      9.4ms     56.2ms      9.3ms
  896     12.3ms      3.6ms     13.6ms     89.3ms       14ms
 1024     18.7ms      5.2ms     20.1ms    131.9ms     22.9ms

Triangular inverse

Computing A^-1 where A is a square triangular matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      162ns      5.2µs      771ns       38ns       65ns
    8      514ns      5.9µs      1.5µs      308ns      156ns
   16      1.6µs      7.7µs      3.4µs      1.5µs      672ns
   32      4.2µs     10.5µs      8.7µs      6.6µs      2.9µs
   64     12.5µs     18.1µs     25.7µs     34.2µs     13.8µs
   96     30.6µs     39.8µs       55µs    101.4µs     36.9µs
  128     42.7µs     51.9µs    144.9µs      232µs     81.6µs
  192      110µs     89.7µs    262.9µs    815.7µs    213.3µs
  256    191.7µs    138.3µs    645.5µs      1.9ms    486.9µs
  384    533.5µs    274.7µs      1.4ms      6.7ms      1.4ms
  512      1.1ms    449.4µs      3.5ms     15.6ms      3.3ms
  640        2ms    861.3µs      5.6ms     30.2ms      5.5ms
  768      3.2ms      1.2ms      9.3ms     51.8ms      9.3ms
  896      4.8ms      1.7ms     13.4ms     81.9ms       14ms
 1024      7.2ms      2.4ms     19.9ms    122.8ms     22.7ms

Cholesky decomposition

Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       49ns       49ns      149ns       52ns       43ns
    8      128ns      128ns      329ns       99ns      125ns
   16      408ns      408ns      950ns      412ns      376ns
   32      1.8µs      1.8µs      3.3µs      1.8µs      2.3µs
   64        7µs        7µs     34.6µs     10.5µs        9µs
   96       18µs     18.2µs     70.5µs     31.3µs       21µs
  128     30.1µs     30.4µs    202.2µs     77.4µs     40.3µs
  192     86.4µs     92.7µs    301.3µs    259.8µs    105.2µs
  256    161.7µs    149.4µs    711.5µs    607.4µs    216.6µs
  384    462.9µs    423.9µs      1.2ms      2.1ms    596.5µs
  512      1.1ms    619.5µs      3.8ms      5.4ms      1.3ms
  640      1.9ms      1.3ms      3.3ms     10.4ms      2.2ms
  768      3.3ms      1.8ms      5.4ms     17.9ms      3.7ms
  896        5ms      2.7ms      6.9ms     28.4ms      5.6ms
 1024      7.8ms      3.4ms     14.5ms     41.2ms      8.4ms

LU decomposition with partial pivoting

Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      103ns       99ns      180ns       77ns       98ns
    8      210ns      217ns      405ns      241ns      278ns
   16      649ns      625ns      1.4µs      859ns      880ns
   32      2.7µs      2.6µs      5.6µs      4.4µs      3.9µs
   64     12.4µs     12.5µs     17.4µs     22.9µs     15.6µs
   96     30.2µs     31.6µs     34.4µs     67.9µs     36.7µs
  128     61.3µs     60.7µs     97.4µs    159.4µs      126µs
  192    163.5µs    187.3µs    182.4µs    527.8µs    425.5µs
  256      352µs    360.9µs    491.1µs      1.3ms    824.9µs
  384    968.8µs    781.3µs    909.5µs      4.5ms      1.9ms
  512      2.1ms      1.5ms      1.5ms     11.1ms      4.3ms
  640      3.8ms      2.2ms      2.2ms     20.7ms      5.6ms
  768      6.2ms      3.2ms      3.4ms     35.8ms      8.6ms
  896      9.5ms      4.6ms      4.7ms     56.1ms     11.4ms
 1024     14.4ms      6.5ms      6.7ms       88ms     17.1ms

LU decomposition with full pivoting

Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      132ns      134ns          -      111ns      164ns
    8      386ns      415ns          -      418ns      493ns
   16      1.7µs      1.7µs          -      2.3µs      2.1µs
   32      5.9µs        6µs          -     14.7µs     12.2µs
   64     25.8µs     25.4µs          -    106.4µs     72.2µs
   96     67.7µs     67.9µs          -    347.3µs    206.3µs
  128    156.4µs    155.2µs          -    819.1µs    460.9µs
  192    463.4µs    460.6µs          -      2.8ms      1.4ms
  256      1.1ms      1.1ms          -      6.6ms      3.3ms
  384      3.8ms      3.8ms          -     22.1ms       11ms
  512     10.1ms      7.9ms          -     53.4ms     27.4ms
  640     17.7ms       12ms          -    102.5ms     50.7ms
  768     31.2ms     17.5ms          -    176.9ms     87.3ms
  896     47.3ms     25.1ms          -      280ms    136.1ms
 1024     76.1ms     33.9ms          -      431ms    207.9ms

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      132ns      132ns      758ns      138ns      273ns
    8      345ns      346ns      1.7µs      321ns      777ns
   16      1.1µs      1.1µs      4.8µs      1.3µs      2.2µs
   32      4.4µs      4.4µs     15.3µs      6.9µs      7.4µs
   64     30.5µs     30.1µs     61.7µs     43.4µs     45.2µs
   96     65.2µs     65.2µs    322.4µs    141.3µs     79.1µs
  128    118.4µs    118.3µs    842.4µs    320.9µs    154.3µs
  192    315.3µs    316.1µs      1.6ms      1.1ms    383.7µs
  256    643.8µs    693.4µs      2.8ms      2.4ms    794.6µs
  384      1.9ms      1.7ms      7.6ms      8.1ms      2.1ms
  512      4.1ms        3ms     16.1ms       19ms      4.5ms
  640      7.4ms      4.5ms     22.5ms     36.2ms        8ms
  768     12.2ms      6.6ms     34.7ms     62.1ms     13.2ms
  896     18.6ms      9.2ms     46.3ms     97.7ms     20.4ms
 1024     27.7ms     12.9ms     65.9ms      150ms     30.2ms

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      167ns      185ns          -      172ns      373ns
    8      430ns      433ns          -      552ns        1µs
   16      1.7µs      1.7µs          -      2.8µs      2.9µs
   32      5.9µs        6µs          -     17.6µs      9.5µs
   64     33.2µs     50.6µs          -    126.9µs     37.9µs
   96     85.6µs    104.7µs          -    421.8µs    104.7µs
  128    182.3µs    209.2µs          -    987.7µs    218.1µs
  192    548.2µs    600.4µs          -      3.3ms    628.1µs
  256      1.3ms      1.4ms          -      7.6ms      1.6ms
  384      4.6ms      3.5ms          -     25.4ms      5.6ms
  512     11.4ms      6.7ms          -       60ms     15.1ms
  640     22.2ms     10.5ms          -    116.2ms     26.6ms
  768     37.7ms     14.8ms          -    199.7ms     46.2ms
  896     60.7ms     20.1ms          -    317.9ms     71.1ms
 1024     90.2ms     30.7ms          -    488.3ms      114ms

Matrix inverse

Computing the inverse of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      795ns      7.5µs      534ns       77ns      381ns
    8      2.2µs      8.9µs      995ns      825ns      794ns
   16      5.3µs       12µs      2.9µs        4µs      2.7µs
   32     15.2µs     29.9µs     10.3µs       19µs     10.8µs
   64     49.8µs     66.2µs     40.5µs    101.2µs     45.9µs
   96    127.1µs    122.7µs    182.1µs    285.3µs    119.2µs
  128    199.9µs    172.7µs    314.9µs    661.3µs      341µs
  192      543µs    419.8µs    587.1µs      2.2ms    963.8µs
  256        1ms    668.3µs      1.1ms      5.6ms        2ms
  384      2.9ms      1.4ms      2.4ms     18.7ms      5.1ms
  512      6.2ms      2.6ms      4.6ms     44.2ms     11.9ms
  640     11.5ms      5.5ms      7.2ms       83ms     19.2ms
  768     19.2ms      8.7ms     11.2ms    142.3ms     30.9ms
  896     29.5ms     12.9ms     16.7ms    223.1ms     44.1ms
 1024     43.5ms     18.2ms     23.9ms    347.1ms     68.8ms

Square matrix singular value decomposition

Computing the SVD of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4        2µs      1.9µs        3µs      1.3µs      1.8µs
    8      9.7µs     24.4µs      8.2µs      3.9µs      9.1µs
   16       32µs     57.8µs     25.9µs     16.9µs     49.8µs
   32      107µs    132.1µs     90.3µs     95.9µs      222µs
   64    409.1µs    381.5µs    562.5µs      555µs    987.6µs
   96    903.9µs    913.1µs      1.7ms      1.7ms      2.7ms
  128      1.6ms      1.5ms      2.9ms      4.6ms      4.3ms
  192        4ms        4ms      6.7ms     14.8ms      9.9ms
  256      7.8ms        7ms     11.7ms     47.4ms     17.3ms
  384     20.9ms     15.1ms     25.8ms    121.1ms     42.9ms
  512     45.3ms     28.1ms       52ms    472.1ms     83.9ms
  640       80ms     44.5ms     79.1ms    665.7ms    133.8ms
  768    130.9ms     78.5ms    123.9ms      1.48s    208.9ms
  896    198.4ms    110.9ms    182.8ms      2.11s    295.4ms
 1024    297.8ms      152ms    253.8ms      3.95s    433.6ms

Thin matrix singular value decomposition

Computing the SVD of a rectangular matrix with shape (4096, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4     73.4µs     73.5µs      311µs    127.5µs     76.7µs
    8    170.8µs    180.7µs    813.8µs    364.3µs    302.3µs
   16    440.4µs      513µs      2.1ms      1.4ms    775.5µs
   32      1.2ms      1.2ms      5.3ms      5.2ms      3.1ms
   64      3.4ms      3.2ms     15.7ms     19.9ms        8ms
   96      6.8ms      5.4ms     30.1ms     44.5ms     17.2ms
  128     11.2ms      8.3ms     47.4ms     79.4ms     30.9ms
  192     23.6ms     16.1ms       63ms    182.2ms     60.7ms
  256     40.7ms     25.5ms       84ms    353.1ms    101.3ms
  384     90.7ms     48.3ms      133ms    904.4ms    219.7ms
  512    164.7ms     80.2ms    303.4ms      2.02s    400.7ms
  640    258.7ms    119.7ms      289ms      3.24s    646.8ms
  768    381.7ms      187ms    440.1ms      5.15s      952ms
  896    532.6ms    252.7ms    550.2ms      7.23s      1.33s
 1024    724.4ms      327ms    849.6ms     10.64s      1.75s

Hermitian matrix eigenvalue decomposition

Computing the EVD of a Hermitian matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.3µs      1.3µs      1.4µs      675ns        1µs
    8      3.9µs        4µs      6.6µs      2.3µs      3.4µs
   16     13.2µs     13.6µs     25.9µs     10.3µs     12.5µs
   32     50.9µs     51.1µs    167.1µs     50.8µs     49.7µs
   64    223.9µs    217.5µs      1.2ms    293.9µs    211.2µs
   96      519µs    518.2µs      2.6ms      876µs      518µs
  128    931.7µs    885.5µs      5.4ms      1.9ms      1.1ms
  192      2.2ms      2.1ms       16ms      5.8ms      3.1ms
  256      4.1ms      3.5ms     33.9ms     13.2ms      6.6ms
  384     10.5ms      8.8ms    105.5ms     42.7ms     21.2ms
  512     21.9ms     16.5ms      175ms     99.3ms     51.4ms
  640     37.6ms     26.5ms    266.2ms    187.4ms     94.2ms
  768     60.4ms     38.1ms    403.3ms    322.6ms    161.9ms
  896     90.4ms     52.2ms    615.3ms    502.5ms    249.9ms
 1024    132.1ms     68.4ms      909ms    764.1ms      392ms

Non Hermitian matrix eigenvalue decomposition

Computing the EVD of a matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      4.8µs      5.1µs      3.5µs          -      3.1µs
    8     15.6µs     16.7µs      9.6µs          -     10.5µs
   16     54.7µs     54.4µs     35.9µs          -     44.4µs
   32    270.7µs    235.6µs    172.6µs          -    199.3µs
   64      1.1ms      1.1ms        1ms          -      1.1ms
   96      2.7ms      2.9ms      5.5ms          -      3.1ms
  128      4.9ms      5.6ms     11.6ms          -      9.2ms
  192     14.4ms     14.3ms     22.4ms          -     26.9ms
  256     24.4ms     26.2ms     49.9ms          -     86.6ms
  384     56.4ms     62.6ms      107ms          -    246.1ms
  512    126.8ms    130.1ms    281.7ms          -    887.6ms
  640    205.8ms    192.6ms    415.6ms          -       1.2s
  768    323.5ms    285.6ms    547.2ms          -      2.84s
  896    438.1ms    375.8ms    704.3ms          -      3.67s
 1024    687.8ms    579.3ms    957.1ms          -         7s

faer-rs's People

Contributors

aentity avatar bernhard-musch avatar cramt avatar delicioushair avatar djduque avatar edyounis avatar geo-ant avatar kenken-neko avatar martineekgerhardsen avatar masahirokato71 avatar oojo12 avatar sarah-ek avatar tastaturtaste avatar zusez4 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

faer-rs's Issues

implement `core::ops::Add` for `Mat<T>`

implement addition for Mat<T>

the traits core::ops::Add<Mat<T>> and core::ops::Add<&Mat<T>> should be implemented for Mat<T> and &Mat<T>

T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and adds them

Add a symmetric rank k operation (e.g. like DSYRK in BLAS)

Is your feature request related to a problem? Please describe.
It would be great to have an equivalent to DSYRK in faer, to do the operations
C = alpha * A * A^T + beta * C
and
C = alpha * A^T * A + beta * C
This is useful for example when accumulating normal equations.

Describe the solution you'd like
A symmetric rank k routine.

Describe alternatives you've considered
The operation can be done with GEMM, but is significantly slower.

`Mat::eigenvalues()` may lead to a panic from a failing assertion

Describe the bug
Calling Mat::eigenvalues() may lead to a panic from a failing assertion. To reproduce this, generate a 4096x4096 matrix with random f64 elements of range 0 to 1 and call the eigenvalues funtion (See code below).
This will result in the following error:

thread 'main' panicked at /home/sthagel/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-0.18.1/src/linalg/evd/hessenberg.rs:816:26:
assertion `left == right` failed: iterators must have the same length
  left: 31
 right: 32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To Reproduce

use faer::mat::Mat;
use num::complex::Complex64;
use rand::prelude::*;

fn main() {
    let n = 4096;
    let n_sq = n * n;
    let mat = (0..n_sq)
        .map(|_| {
            let mut rng = rand::thread_rng();
            rng.gen()
        })
        .collect::<Vec<f64>>();
    let results = eigenvals_faer(&mat, n - 2, n);
    println!("{:#?}", results)
}

#[inline]
pub fn eigenvals_faer(
    mat: &[f64],
    nev: usize,
    n: usize,
) -> Result<Vec<Complex64>, &str> {
    if nev * n == 0 || nev > n - 2 {
        return Err("Invalid parameters!");
    }

    let faer_mat: Mat<f64> = Mat::from_fn(n, n, |i, j| mat[i + n * j]);

    let evs = faer_mat.eigenvalues();

    Ok(evs)
}

Expected behavior
One would expect the calculation to produce the desired eigenvalues

Desktop (please complete the following information):

  • OS: Linux Mint 21.3, Kernel 6.5.0-21-generic
  • Rust version 1.76.0 (stable)
  • Version 0.18.1

Additional info
On smaller matrices (e.g. 40x40 instead of 4096x4096) it works just fine.

implement `core::ops::{Add,Sub,Mul}` for `Mat<T>`

implement basic operations (addition, subtraction, matrix multiplication, multiplication by a scalar) for Mat<T>

T should satisfy the trait ComplexField for those operations to be implementable. for the implementation of addition, subtraction, and scalar multiplication, Mat::with_dims could be used, with a closure that takes two elements and adds or subtracts them

for matrix multiplication, we could create a matrix full of zeros with Mat::zeros, and then fill it up with the result of the computation with faer_core::mul::matmul

`MulAssign<E: Entity>`

Is your feature request related to a problem? Please describe.
I have a x: Mat<f64> and an n: f64and I need to update x = n * x.

Describe the solution you'd like
I think it would be convenient to implement the MulAssign trait such that we can do x *= n and it reuses the memory allocated for x.

Describe alternatives you've considered
Maybe it would be good to have a generic map_with method that transforms all the elements of a matrix in place. It could be used to implement this MulAssign trait, and it would probably be useful for other things as well.

implement the qr module

implement the qr decomposition with and without column pivoting, along with the solve/reconstruct/inverse/least_squares functions to complete the decomposition api

the qr module is also needed for the future svd module implementation, since it could be used as a preconditioner.

Rename fill_with_constant to fill?

I was skimming through the high level API PR, and saw renaming of set_constant to fill_with_constant. It's really a minor thing, but how about naming this method just fill as analogous to slice::fill? As for fill_with_zero, it does not have a counterpart in std (or I am now aware of it), but fill_zero sounds fine to me. This way, you can leave fill_with name for the method taking a filling closure, as is also a convention in std (slice::fill_with).

Matrix Addition and Subtraction using `+` and `-` Operators

Is your feature request related to a problem? Please describe.

Hey, this might be an embarrassingly dumb question, but I cannot see a way to add two matrices using + (or subtract using -). The official website does mention that the addition and subtraction operators should work as expected, but when I write a+b for e.g. two Mat<f64> instances, it won't compile. Looking at the faer_core crate I cannot find an implementation of core::ops::Add. What am I missing?

Describe the solution you'd like

It would be great if the + operator would just work for two suitable Mat<E> instances as well as one Mat<E> and the as a suitable MatRef instance. I can elaborate on what I mean by suitable, but roughly I mean if the entities can be added, then the matrix should be addable (given that the matrix dimensions match).

Describe alternatives you've considered
I could of course do element wise additions manually, but that would feel so much less ergonomic than having an Add trait implemented. Same for AddAssign.

Additional context

If I missed an obvious way to add matrices, I'd really appreciate a hint. If the Add and AddAssign, Sub, and SubAssign traits are indeed not implemented I'd be very happy to file a PR because this crate is really awesome. Please let me know and thanks for your work :)

Guidelines for efficient faer dynamic library

Hi!

I am really impressed by your colossal work on this math kernel!
I am writing a Julia wrapper to benchmark faer against OpenBLAS and MKL.

So far I have only studied the dense matrix-matrix multiplication. My preliminary results show faer approximately 50% slower than OpenBLAS and 25% slower than MKL on an AMD Ryzen 5 7640U on 8 threads.

This is basically my first Rust project and I want to be fair to faer: is this a reasonable dynamic library exposing faer inplace matrix multiplication using the C ABI?

I am not sure if opening an issue is the right way to ask but the faer documentation is very sparse at the moment on how to import external matrices.

use faer::modules::core::mul::matmul;
use faer::{mat, Parallelism};
use std::usize;

// inplace c = a * b
#[no_mangle]
pub unsafe extern "C" fn mult(
    c_ptr: *mut f64,
    c_nrows: u64,
    c_ncols: u64,
    c_row_stride: u64,
    c_col_stride: u64,
    a_ptr: *const f64,
    a_nrows: u64,
    a_ncols: u64,
    a_row_stride: u64,
    a_col_stride: u64,
    b_ptr: *const f64,
    b_nrows: u64,
    b_ncols: u64,
    b_row_stride: u64,
    b_col_stride: u64,
    nthreads: u32,
) {
    assert!(!c_ptr.is_null());
    assert!(!a_ptr.is_null());
    assert!(!b_ptr.is_null());

    let c = unsafe {
        mat::from_raw_parts_mut::<f64>(
            c_ptr,
            c_nrows as usize,
            c_ncols as usize,
            c_row_stride as isize,
            c_col_stride as isize,
        )
    };

    let a = unsafe {
        mat::from_raw_parts::<f64>(
            a_ptr,
            a_nrows as usize,
            a_ncols as usize,
            a_row_stride as isize,
            a_col_stride as isize,
        )
    };

    let b = unsafe {
        mat::from_raw_parts::<f64>(
            b_ptr,
            b_nrows as usize,
            b_ncols as usize,
            b_row_stride as isize,
            b_col_stride as isize,
        )
    };

    matmul(c, a, b, None, 1.0, Parallelism::Rayon(nthreads as usize));
}

Amazing Result vs Intel MKL ???

I have just seen your crate and ran a quick benchmark against intel mkl.

I got the following result

test bench_faer_rs_n        ... bench:  33,479,057 ns/iter (+/- 9,121,197)
test bench_faer_rs_t        ... bench:  32,854,984 ns/iter (+/- 7,885,723)
test bench_mkl_n            ... bench:  35,916,818 ns/iter (+/- 9,305,052)
test bench_mkl_t            ... bench:  35,681,294 ns/iter (+/- 10,449,638)

I tried it several times just to make sure it is not a luck result, or bug. Having on par performance with intel mkl library is amazing!!!.

Maybe you can confirm yourself and put a comparison with intelmkl to documentation since it is one of the fastest blas implementation, if not the fastest.

Low Level API with Pre Allocated Work Space Exposed

It would be great if there will be, for any function, a low level API which exposes the needed workspace to avoid any allocations of the function.

The work is impressive. Being competitive with the big guys is nothing short of amazing considering this is a single person show.

Unit test fails in `faer-core`

Sorry I don't have time currently to try and track and fix the issue in more detail:

Running cargo t in faer-core fails non-deterministically with the error:

failures:

---- mul::tests::triangular stdout ----
thread 'mul::tests::triangular' panicked at 'attempt to subtract with overflow', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-0.12.0/src/gemm.rs:960:35


failures:
    mul::tests::triangular

test result: FAILED. 12 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.07s

Apache arrow input

Apache Arrow is an aligned chunked array representation of columns. If my matrix is a dense matrix encoded as a chunked array of float64 numbers is it possible to use faer-rs on top of it in a zero-copy way? I assume further constraints (chunksize has to be a multiple of some batch size) will be required. Or will we have to combine the (relatively large) chunks into a continuous aligned array first before feeding it to faer?

Gemm slow on specific AMD epyic CPU

Describe the bug
Running the faer-bench benchmark on 2xAMD EPYC 7V13 64-Core Processor
is surprisingly slow. Both in absolute numbers, as well as in comparing the speedup of faer(par) over faer(seq).
This does not seem to be the case on other, larger AMD server cpu's

To Reproduce
Just to keep track for myself:
CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer

Expected behavior
Well, don't be slow.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

➜  ~ g++ --version                                                                                            
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

➜  ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)

Additional context

Our admin just got back to me, university admins probably won't adjust the perf settings for us, the machine is too busy so it would be a perf risk. But I got access to two other AMD machines, maybe we can use that for pinning the issue down.

PyArray2 into faer to simplify python interfaces to rust routines using faer

Currently my pyO3 bindings that call some rust routines that use faer look something like:

use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use pyo3::{exceptions::PyRuntimeError, pymodule, types::PyModule, PyResult, Python};
use faer::{prelude::*}
// ...

#[pymodule]
fn rust_lib<'py>(_py: Python<'py>, m: &'py PyModule)
    -> PyResult<()>
{
   #[pyfn(m)]
    fn rpca<'py>(py: Python<'py>, a_py: PyReadonlyArray2<'py, f64>, n_rank: usize, n_iters: usize, n_oversamples: usize)
        -> &'py PyArray2<f64>
    {
        let a_ndarray = a_py.as_array();
        let a_faer = a_ndarray.view().into_faer();
        // ... faer-rs math
        ndarray_result.into_pyarray(py)
    }
}

I'm unsure if I am using the best approach.

Desired Solution:

// ...

   #[pyfn(m)]
    fn rpca<'py>(py: Python<'py>, a_py: PyReadonlyArray2<'py, f64>, n_rank: usize, n_iters: usize, n_oversamples: usize)
        -> &'py PyArray2<f64>
    {
        let a_faer = a_py.into_faer();
        // ... faer-rs math
        faer_result.into_pyarray(py)
    }

Auto Generate Release Notes

Is your feature request related to a problem? Please describe.
Another maintenance task. Unlike #10 which is geared towards developers knowing what low level changes happened. This one is geared towards user facing changes so the average user knows what changes happened between releases without much hassle on the maintainers side.

Describe the solution you'd like
This is one I have to research a bit more. I forgot the means to do this but I know it's possible and seemed rather useful.

You can assign this one to me

Add PGO version to benchmarks

Is your feature request related to a problem? Please describe.
Hi, I ran the benchmarks yesterday on my machine (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz) and did a quick experiment with PGO. The wallclock results are even more impressive and are eventually worth adding to the benchmark section.

Describe the solution you'd like
Make user aware of PGO, link the section in Rust documentation and add some additional estimates of benchmarks.

Additional context

Two examples with least and most improvements. I have seen a consistent improvement for all benchmarks, especially for the parallelized version.

Matrix multiplication (f32)

with PGO w/o PGO
    n       faer  faer(par)    ndarray   nalgebra      eigen
   32        1µs      828ns      1.4µs      1.7µs      812ns
   64      4.8µs      4.8µs      7.3µs      9.5µs      4.6µs
   96     14.9µs     15.5µs     18.9µs     29.6µs      8.3µs
  128     36.6µs     21.8µs     43.3µs     67.5µs     21.7µs
  192      115µs       39µs     42.7µs    221.9µs     31.4µs
  256    272.5µs     80.2µs    133.7µs    513.3µs     81.6µs
  384    910.1µs    247.4µs    223.8µs      1.7ms    179.3µs
  512      2.3ms    530.9µs    610.9µs        4ms    574.6µs
  640      4.4ms    998.5µs      1.1ms      7.8ms    895.4µs
  768      7.5ms      1.6ms      1.4ms     13.4ms      1.3ms
  896     12.2ms      2.7ms      2.6ms     21.5ms      2.6ms
 1024     18.9ms        4ms      4.1ms     32.2ms      3.5ms
    n       faer  faer(par)    ndarray   nalgebra      eigen
   32      1.3µs      1.1µs      1.5µs      1.9µs      811ns
   64      5.5µs      5.4µs      7.4µs      9.9µs      4.3µs
   96     15.9µs     56.4µs     20.1µs     30.5µs      8.3µs
  128     38.9µs     64.6µs     43.7µs     69.3µs     21.7µs
  192    118.1µs     80.4µs     42.6µs    222.7µs     31.5µs
  256    278.5µs    114.1µs    133.3µs    512.1µs     81.2µs
  384    936.1µs    330.9µs    226.2µs      1.7ms    179.1µs
  512      2.3ms    636.4µs    590.5µs        4ms    570.3µs
  640      4.4ms      1.2ms      1.1ms      7.8ms      895µs
  768      7.5ms      1.9ms      1.4ms     13.4ms      1.3ms
  896     12.6ms      3.1ms      2.6ms     21.4ms      2.6ms
 1024     18.6ms      4.5ms      4.8ms     32.1ms      3.5ms

Thin matrix singular value decomposition (f32)

with PGO w/o PGO
    n       faer  faer(par)    ndarray   nalgebra      eigen
   32    942.2µs      1.3ms      5.8ms      2.8ms      1.7ms
   64      2.6ms      3.2ms       16ms     10.9ms      5.3ms
   96      5.2ms      6.2ms     29.6ms     24.6ms     11.4ms
  128      8.2ms        9ms     45.8ms     43.2ms     17.6ms
  192     16.6ms     16.1ms     64.3ms     98.3ms     36.1ms
  256     27.7ms     24.6ms     94.2ms    185.1ms     61.6ms
  384     57.8ms     44.5ms    147.2ms      445ms      118ms
  512    102.9ms     68.6ms    222.5ms    926.3ms    201.8ms
  640    160.9ms     93.9ms    295.2ms      1.53s    311.4ms
  768    236.9ms      127ms    368.9ms      2.49s    602.2ms
  896    328.2ms      166ms    469.9ms      3.65s    611.1ms
 1024    453.5ms    215.1ms    732.9ms      6.07s    808.6ms
    n       faer  faer(par)    ndarray   nalgebra      eigen
   32      1.5ms      2.5ms      5.7ms      3.1ms      1.7ms
   64      4.3ms      6.2ms     16.2ms     11.5ms      5.1ms
   96      8.3ms     13.9ms     29.6ms     25.5ms     10.8ms
  128     13.4ms     21.7ms     43.7ms     45.3ms     17.6ms
  192     27.4ms     43.9ms     73.1ms    102.8ms     34.4ms
  256     45.8ms       72ms     94.5ms    194.1ms     54.4ms
  384     94.6ms    128.7ms    147.9ms    451.9ms    112.9ms
  512    166.7ms    210.5ms    219.9ms    962.4ms    190.5ms
  640    258.1ms    313.3ms    288.2ms      1.53s      299ms
  768    385.5ms    443.6ms    367.1ms       2.5s      431ms
  896    526.4ms    578.5ms    466.4ms      3.61s      596ms
 1024    708.7ms      739ms    719.9ms      6.19s    789.8ms

f64_pgo.txt
f64_wo_pgo.txt
f32_wo_pgo.txt
f32_pgo.txt

More specific Cholesky error

In the LAPACK Cholesky factorization routines (e.g. DPOTRF) when it encounters an error it returns the details about where it failed (INFO > 0). Is it possible to add something similar to the faer Cholesky routines?

`no_std`

You do not support no_std, right?

`AddAssign` and `SubAssign` traits

Is your feature request related to a problem? Please describe.
I have a x: Mat<f64> and y: MatRef<'_, f64> and I need to update x = x - y.

Describe the solution you'd like
I think it would be convenient to have an impl SubAssign<Rhs = MatRef> for Mat such that it reuses the memory allocated for x.

Describe alternatives you've considered
I am currently solving my issue with Mat::with_dims to create a new matrix. Implementing these traits could be a bit more performant, and it would definitely be more readable.

faer calculates the eigenvalues of a nontrivial adjacency matrix as `0` and `+/- inf`

Describe the bug
The Vec<c64> output of eigenvalues() for the adjacency matrix of the following $20$-vertex tree are incorrectly reported as all zeroes.

Note: the path graph on $3$ vertices produces the wrong eigenvalues. Here's a failing unit test.

To Reproduce

/* use the smaller unit test linked above
#[test]
fn this_other_tree_has_correct_maximum_eigenvalue() {
    let edges = [(3, 2), (6, 1), (7, 4), (7, 6), (8, 5), (9, 4), (11, 2), (12, 2), (13, 2), (15, 6), (16, 2), (16, 4), (17, 8), (18, 0), (18, 8), (18, 2), (19, 6), (19, 10), (19, 14)];
    let mut a = faer::Mat::zeros(20, 20);
    for (v, u) in edges.iter() {
        a[(*v, *u)] = 1.0;
        a[(*u, *v)] = 1.0;
    }
    let eigs_complex: Vec<faer::complex_native::c64> = a.eigenvalues();
    println!("{eigs_complex:?}");
    let eigs_real = eigs_complex
        .iter()
        .map(|e| e.re)
        .collect::<Vec<_>>();
    println!("{eigs_real:?}");
    let lambda_1 = *eigs_real
        .iter()
        .max_by(|a, b| a.partial_cmp(&b).unwrap())
        .unwrap();
    let correct_lamba_1 = 2.6148611139728866;
    assert!(
        (lambda_1 - correct_lamba_1).abs() < 1e-10,
        "lambda_1 = {lambda_1}, correct_lamba_1 = {correct_lamba_1}",
    );
}
*/

Expected behavior

name: graphenv
# channels:
#   - defaults
# dependencies:
#   - python
#   - networkx
#   - matplotlib
#   - imageio
#   - msgpack-python
#   - scipy

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

# Create the graph
edges = [[3, 2], [6, 1], [7, 4], [7, 6], [8, 5], [9, 4], [11, 2], [12, 2], [13, 2], [15, 6], [16, 2], [16, 4], [17, 8], [18, 0], [18, 8], [18, 2], [19, 6], [19, 10], [19, 14]]
graph = nx.Graph()
graph.add_edges_from(edges)

# Get the adjacency matrix eigenvalues
adjacency_matrix = nx.adjacency_matrix(graph).todense()
eigenvalues = np.linalg.eigvals(adjacency_matrix)

# Print the eigenvalues in decreasing order
print("Eigenvalues in decreasing order:")
for eigenvalue in sorted(eigenvalues, reverse=True):
    print(eigenvalue)

# Plot the graph
nx.draw(graph, with_labels=True)
plt.show()

Methods to read and write matrices from and to CSV

Is your feature request related to a problem? Please describe.
Especially in debugging, it is useful to be able to inspect the matrix in a non-terminal environment. The CSV format is easily implemented and read from, but has not yet been implemented.

Describe the solution you'd like
Methods for writing and reading matrices, either as implementations for a given type, or as generic functions.

[Curiosity] Is there a future plan for linear programming?

Hello,

Just out of curiosity, are you planning on adding support for linear programming that could run on top of the algebraic operations provided by the project? Or do you think that such thing should be better provided by third-parties using faer-rs ?

Thanks

some missing macros / convenience functions

Is your feature request related to a problem? Please describe.
It makes sense to have the macros col! and row! in order to be consistent with mat!. This also seems a good opportunity to include a related macro (or function) blockwise!:

blockwise![[A, B], [C, D]];

// ┌───────┬──┐
// │       │  │
// │   A   │ B│
// │       │  │
// ├───────┼──┤
// │   C   │ D│
// └───────┴──┘

Automate Testing

Is your feature request related to a problem? Please describe.

At current there are test but no automated tests run via CI to ensure bugs aren't introduced as changes are made.

Describe the solution you'd like
A GitHub Action to run the test and output our current test coverage.

Describe alternatives you've considered
None seemed natural to use GitHub Actions since we're on GitHub. Believe travis-CI is a viable alternative though.

High-level api for computing eigenvalues and eigenvectors from square matrices?

Hi @sarah-ek,

Is there any plans on adding any high-level APIs for computing eigenvectors and eigenvalues of square, but not necessarily symmetric, matrices?

I really like the ergonomics of your library but it would be nice to just have a functional, simple API for providing matrices and getting out eigenvalues and their associated eigenvectors without having to hand roll some form of iterative/inverse iteration method in each new crate.

Wiki code `use` statements seem incorrect

The top line of each of the wiki's examples has using faer_core::mat; or something like that. I think it should be something like use faer::core::mat; and so forth.

Add support for matrix multiplication with `half::f16`/`half::bf16`

Is your feature request related to a problem? Please describe.
Currently none of the matrix multiplication crates support the half (fp16) datatype. There is the half crate (https://crates.io/crates/half) that includes a rust type for this.

This means for any crate that needs to support matrix multiplication on CPU with f16 datatype, they need to manually implement a matrix multiplication algorithm, which can be very slow.

Describe the solution you'd like
It'd be great if faer included configurable support for these data types.

Describe alternatives you've considered
I'm currently using this super naive matmul implementation:

fn naive_gemm<F: num_traits::Float + std::ops::AddAssign, M: Dim, K: Dim, N: Dim>(
    (m, k, n): (M, K, N),
    ap: *const F,
    a_strides: [usize; 2],
    bp: *const F,
    b_strides: [usize; 2],
    cp: *mut F,
    c_strides: [usize; 2],
) {
    for i_m in 0..m.size() {
        for i_k in 0..k.size() {
            for i_n in 0..n.size() {
                unsafe {
                    let a = *ap.add(a_strides[0] * i_m + a_strides[1] * i_k);
                    let b = *bp.add(b_strides[0] * i_k + b_strides[1] * i_n);
                    let c = cp.add(c_strides[0] * i_m + c_strides[1] * i_n);
                    *c += a * b;
                }
            }
        }
    }
}

Which is extremely slow. I'm not sure what the other alternatives are.

Additional context

I'm the author of dfdx. dfdx has both cuda/CPU support, and CUDA does have hardware acceleration for f16. It'd just be nice if f16 matmul on CPU was a bit faster!

faer-bench fails with size mismatch

I ran cargo run in the faer-bench dir and the result was a panic.
The system is a Arm MacOS with the newest nichtly.

leifeld@MacStudovonDirk faer-bench % cargo +nightly run --release --no-default-features
Finished release [optimized] target(s) in 0.08s
Running target/release/faer-bench
f32

Matrix multiplication

Multiplication of two square matrices of dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       33ns       32ns      142ns          -          -
    8       57ns       60ns      122ns          -          -
   16      206ns      211ns      303ns          -          -
   32      1.1µs      1.1µs      996ns          -          -
   64      8.1µs      8.1µs      6.7µs          -          -
   96     24.8µs     30.9µs      1.3ms          -          -
  128     59.8µs     39.1µs      1.2ms          -          -
  192    195.1µs     93.3µs      1.7ms          -          -
  256    479.7µs    202.1µs      1.9ms          -          -
  384      1.6ms    448.6µs      2.3ms          -          -
  512      3.7ms        1ms      2.6ms          -          -
  640      7.3ms      1.7ms        3ms          -          -
  768     12.5ms        3ms      5.1ms          -          -
  896     19.7ms      4.4ms      6.8ms          -          -
 1024     29.8ms      6.7ms      9.2ms          -          -

Triangular solve

Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       14ns       14ns     26.3µs          -          -
    8       90ns       90ns       27µs          -          -
   16      378ns      380ns     29.9µs          -          -
   32      1.4µs      1.4µs     35.9µs          -          -
   64      7.2µs      7.2µs     54.1µs          -          -
   96     20.4µs     27.2µs     93.5µs          -          -
  128     44.9µs     40.6µs    131.9µs          -          -
  192    131.7µs    117.5µs    290.3µs          -          -
  256    301.3µs    152.8µs    487.1µs          -          -
  384    913.6µs    450.8µs    996.1µs          -          -
  512      2.2ms      966µs      2.6ms          -          -
  640      4.1ms      1.8ms      2.7ms          -          -
  768      6.7ms      2.6ms      3.9ms          -          -
  896     10.8ms      3.6ms      5.3ms          -          -
 1024     17.7ms      5.5ms     12.6ms          -          -

Triangular inverse

Computing A^-1 where A is a square triangular matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      160ns      9.3µs     26.3µs          -          -
    8      468ns       12µs     27.1µs          -          -
   16      1.2µs     16.4µs     29.9µs          -          -
   32      3.4µs     24.1µs     35.9µs          -          -
   64      9.9µs     31.6µs     54.1µs          -          -
   96     23.2µs     39.9µs     89.1µs          -          -
  128     33.9µs     49.9µs    133.3µs          -          -
  192     86.4µs     94.7µs    310.4µs          -          -
  256    153.1µs    143.9µs    507.7µs          -          -
  384    421.7µs    281.3µs        1ms          -          -
  512    907.5µs    492.6µs      2.7ms          -          -
  640      1.6ms    816.2µs      2.7ms          -          -
  768      2.6ms        1ms      3.9ms          -          -
  896      4.1ms      1.5ms      5.4ms          -          -
 1024      6.8ms      2.4ms     12.5ms          -          -

Cholesky decomposition

Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       43ns       43ns      120ns          -          -
    8      154ns      154ns      354ns          -          -
   16      767ns      766ns      1.2µs          -          -
   32      2.1µs      2.1µs      4.8µs          -          -
   64      6.3µs      6.3µs       15µs          -          -
   96     15.8µs     15.8µs     34.2µs          -          -
  128     22.2µs     22.3µs    428.7µs          -          -
  192     59.7µs       87µs     12.5ms          -          -
  256    106.1µs    126.4µs      3.2ms          -          -
  384    292.4µs    331.5µs      5.3ms          -          -
  512    651.6µs    483.6µs        7ms          -          -
  640      1.1ms      1.1ms     11.4ms          -          -
  768        2ms      1.5ms     19.6ms          -          -
  896        3ms        2ms      1.82s          -          -
 1024      4.7ms      2.5ms     22.3ms          -          -

LU decomposition with partial pivoting

Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       92ns       89ns      154ns          -          -
    8      263ns      238ns      456ns          -          -
   16      895ns      905ns        2µs          -          -
   32      2.9µs      2.9µs        7µs          -          -
   64     11.9µs     12.2µs     22.8µs          -          -
   96     27.2µs     28.2µs     46.3µs          -          -
  128       54µs     53.7µs     88.3µs          -          -
  192    141.9µs    215.8µs    206.9µs          -          -
  256    285.8µs    384.5µs    749.8µs          -          -
  384    801.3µs    919.1µs      1.6ms          -          -
  512      1.8ms      1.7ms      2.4ms          -          -
  640      3.2ms      3.1ms      4.2ms          -          -
  768      5.3ms      4.2ms      5.8ms          -          -
  896      8.2ms      5.8ms      7.7ms          -          -
 1024     12.8ms      7.9ms      9.8ms          -          -

LU decomposition with full pivoting

Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      160ns      160ns          -          -          -
    8      388ns      403ns          -          -          -
   16      1.5µs      1.5µs          -          -          -
   32      7.9µs      7.8µs          -          -          -
   64     53.4µs       53µs          -          -          -
   96    174.2µs    173.9µs          -          -          -
  128    417.9µs      417µs          -          -          -
  192      1.5ms      1.5ms          -          -          -
  256      3.5ms      3.5ms          -          -          -
  384     11.8ms     11.3ms          -          -          -
  512     28.9ms     21.8ms          -          -          -
  640     55.6ms     33.2ms          -          -          -
  768       98ms     49.2ms          -          -          -
  896    158.7ms     72.1ms          -          -          -
 1024    242.3ms    103.1ms          -          -          -

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      102ns      100ns      729ns          -          -
    8      294ns      293ns      1.5µs          -          -
   16      1.1µs      1.2µs      4.7µs          -          -
   32      4.6µs      4.7µs     17.2µs          -          -
   64     25.3µs     25.6µs     65.4µs          -          -
   96     57.7µs     59.4µs    454.9µs          -          -
  128    110.2µs    109.9µs      4.4ms          -          -
  192    298.2µs    297.7µs     37.8ms          -          -
  256    615.9µs    796.9µs       59ms          -          -
  384      1.8ms      2.1ms    118.2ms          -          -
  512        4ms      3.7ms    197.2ms          -          -
  640      6.7ms      5.1ms    301.4ms          -          -
  768     11.2ms        8ms    429.2ms          -          -
  896     17.3ms     11.9ms    523.8ms          -          -
 1024     25.6ms     16.5ms    632.4ms          -          -

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      153ns      140ns          -          -          -
    8      414ns      410ns          -          -          -
   16      1.5µs      1.6µs          -          -          -
   32      6.5µs      6.5µs          -          -          -
   64     40.7µs     66.3µs          -          -          -
   96    123.3µs    167.8µs          -          -          -
  128    266.4µs    327.5µs          -          -          -
  192    802.5µs    862.3µs          -          -          -
  256      1.8ms      2.3ms          -          -          -
  384      5.6ms      6.8ms          -          -          -
  512     12.5ms     12.8ms          -          -          -
  640     24.6ms     19.9ms          -          -          -
  768     41.3ms     28.1ms          -          -          -
  896     65.1ms     38.3ms          -          -          -
 1024     94.5ms     51.7ms          -          -          -

Matrix inverse

Computing the inverse of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      726ns     14.3µs      412ns          -          -
    8      1.7µs     17.4µs      907ns          -          -
   16      4.4µs     25.3µs      3.9µs          -          -
   32       13µs     51.3µs     12.6µs          -          -
   64       42µs     84.9µs     49.4µs          -          -
   96    100.1µs      141µs    443.4µs          -          -
  128    168.5µs    209.1µs      2.8ms          -          -
  192      430µs    454.3µs      3.1ms          -          -
  256    837.1µs    711.4µs      7.3ms          -          -
  384      2.3ms      1.6ms     17.9ms          -          -
  512      5.3ms        3ms     19.4ms          -          -
  640      9.6ms        5ms     31.4ms          -          -
  768     15.2ms        7ms     34.6ms          -          -
  896       24ms      9.7ms     43.1ms          -          -
 1024     37.6ms     14.3ms     50.6ms          -          -

Square matrix singular value decomposition

Computing the SVD of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.1µs      1.1µs        2µs          -          -
    8      5.8µs     21.5µs      5.8µs          -          -
   16     20.3µs       54µs     19.5µs          -          -
   32       72µs    116.1µs     71.6µs          -          -
   64    270.6µs    360.5µs      3.9ms          -          -
   96    664.6µs    971.7µs      6.7ms          -          -
  128      1.3ms      1.7ms     44.4ms          -          -
  192      3.5ms      5.1ms    108.8ms          -          -
  256      7.3ms      9.3ms    173.6ms          -          -
  384     21.4ms     20.8ms    310.9ms          -          -
  512     47.5ms     35.9ms    466.5ms          -          -
  640     85.2ms     54.2ms    616.3ms          -          -
  768      143ms       80ms    830.5ms          -          -
  896    225.4ms    115.2ms      1.07s          -          -
 1024    327.3ms    160.4ms      1.23s          -          -

Thin matrix singular value decomposition

Computing the SVD of a rectangular matrix with shape (4096, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4     32.8µs     32.4µs    644.1µs          -          -
    8      112µs    150.6µs      1.4ms          -          -
   16    318.9µs    456.7µs      8.4ms          -          -
   32    947.4µs      1.2ms     15.5ms          -          -
   64      2.9ms      3.5ms     45.5ms          -          -
   96      5.3ms      5.8ms       61ms          -          -
  128      8.9ms        9ms    100.3ms          -          -
  192     18.7ms     17.5ms    248.2ms          -          -
  256     33.7ms     29.3ms    380.9ms          -          -
  384     75.2ms     54.5ms    657.6ms          -          -
  512    138.8ms       94ms      1.15s          -          -
  640    224.2ms    136.4ms      1.63s          -          -
  768    336.6ms    182.1ms      1.97s          -          -
  896    478.8ms    240.6ms      2.45s          -          -
 1024      656ms    327.2ms      2.68s          -          -

Hermitian matrix eigenvalue decomposition

Computing the EVD of a hermitian matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      992ns      890ns        1µs          -          -
    8      2.5µs      2.4µs      3.5µs          -          -
   16      9.2µs      8.7µs     12.1µs          -          -
   32     36.5µs     34.8µs     55.5µs          -          -
   64    161.4µs    194.4µs    421.2µs          -          -
   96    391.3µs    499.3µs      1.1ms          -          -
  128    725.3µs    841.1µs      4.3ms          -          -
  192      1.9ms      2.2ms     22.6ms          -          -
  256      3.7ms        4ms     45.9ms          -          -
  384     10.7ms     12.4ms    114.5ms          -          -
  512     23.2ms     22.7ms    206.6ms          -          -
  640     41.9ms     34.5ms    330.7ms          -          -
  768     69.5ms       51ms      471ms          -          -
  896    107.4ms     70.9ms      657ms          -          -
 1024    156.6ms     92.4ms    888.3ms          -          -

Non Hermitian matrix eigenvalue decomposition

Computing the EVD of a matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      3.9µs      4.2µs      2.4µs          -          -
    8     11.6µs       13µs      6.9µs          -          -
   16     43.7µs     42.8µs     27.4µs          -          -
   32    174.2µs    188.9µs    123.3µs          -          -
   64    806.5µs    834.3µs    658.5µs          -          -
   96      1.9ms        2ms      4.4ms          -          -
  128      3.2ms      3.4ms     31.6ms          -          -
  192      8.5ms        9ms    100.3ms          -          -
  256     16.4ms       17ms    228.4ms          -          -
  384     38.3ms     47.9ms    592.5ms          -          -
  512     82.8ms     96.1ms    798.3ms          -          -
  640    146.3ms    164.9ms      1.12s          -          -
  768    230.3ms    243.3ms      1.46s          -          -
  896    340.3ms    341.4ms      1.69s          -          -
 1024    540.2ms    539.9ms      2.45s          -          -

f64

Matrix multiplication

Multiplication of two square matrices of dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       32ns       32ns      128ns          -          -
    8       80ns       84ns      171ns          -          -
   16      311ns      315ns      345ns          -          -
   32      2.1µs      2.2µs      1.8µs          -          -
   64     14.8µs     14.8µs     12.1µs          -          -
   96     50.3µs     41.3µs      1.2ms          -          -
  128    124.1µs     69.5µs      1.4ms          -          -
  192    392.9µs    173.1µs      1.5ms          -          -
  256    943.6µs    315.2µs      1.3ms          -          -
  384      3.2ms    889.3µs      2.3ms          -          -
  512      7.5ms      1.9ms      3.7ms          -          -
  640     14.5ms      3.4ms      4.9ms          -          -
  768     25.7ms      5.8ms      8.5ms          -          -
  896       30ms      7.3ms     13.1ms          -          -
 1024     45.1ms       11ms     17.4ms          -          -

Triangular solve

Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       14ns       14ns     27.9µs          -          -
    8       89ns       90ns     27.7µs          -          -
   16      429ns      431ns     29.8µs          -          -
   32      1.9µs      1.9µs     35.5µs          -          -
   64     11.3µs     11.3µs       54µs          -          -
   96     33.9µs     35.3µs     81.9µs          -          -
  128     75.4µs     63.5µs      126µs          -          -
  192      235µs    149.1µs    264.2µs          -          -
  256    555.4µs    254.1µs    759.9µs          -          -
  384      1.7ms    841.4µs      1.2ms          -          -
  512      4.5ms      1.6ms      3.5ms          -          -
  640      7.8ms      2.8ms      3.4ms          -          -
  768     13.1ms      4.5ms      7.3ms          -          -
  896     20.9ms      6.5ms        7ms          -          -
 1024     35.4ms       10ms     18.3ms          -          -

Triangular inverse

Computing A^-1 where A is a square triangular matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      156ns     12.4µs       28µs          -          -
    8      479ns     11.4µs     27.7µs          -          -
   16      1.3µs       17µs     29.9µs          -          -
   32      3.7µs     24.1µs     34.2µs          -          -
   64     11.4µs       33µs     52.2µs          -          -
   96     28.8µs     45.4µs    212.9µs          -          -
  128     45.8µs     65.2µs    334.2µs          -          -
  192    124.9µs    130.4µs    405.3µs          -          -
  256      247µs      226µs      765µs          -          -
  384    712.3µs    395.6µs      1.2ms          -          -
  512      1.8ms    736.7µs      3.4ms          -          -
  640      2.9ms      1.3ms      3.3ms          -          -
  768      4.8ms        2ms      7.2ms          -          -
  896      7.6ms      2.6ms      7.1ms          -          -
 1024     13.5ms      3.9ms       18ms          -          -

Cholesky decomposition

Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       46ns       46ns      121ns          -          -
    8      146ns      145ns      393ns          -          -
   16      631ns      625ns      1.3µs          -          -
   32        2µs        2µs      5.1µs          -          -
   64      7.3µs      7.3µs      187µs          -          -
   96     18.3µs     18.3µs      1.4ms          -          -
  128     32.2µs     32.4µs      1.3ms          -          -
  192     88.2µs    100.1µs        6ms          -          -
  256    186.8µs    188.5µs      4.8ms          -          -
  384    520.7µs      465µs      8.8ms          -          -
  512      1.3ms    935.6µs     10.4ms          -          -
  640      2.2ms      1.6ms     12.7ms          -          -
  768      3.8ms      2.2ms     20.5ms          -          -
  896      5.7ms        3ms     24.7ms          -          -
 1024      9.5ms      4.6ms     26.2ms          -          -

LU decomposition with partial pivoting

Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       76ns       93ns      150ns          -          -
    8      225ns      213ns      493ns          -          -
   16      740ns      771ns      2.2µs          -          -
   32        3µs      2.9µs      6.6µs          -          -
   64     14.7µs     14.8µs     21.1µs          -          -
   96     37.7µs     37.7µs     46.5µs          -          -
  128     75.7µs     76.2µs    302.8µs          -          -
  192    218.3µs    276.5µs    620.6µs          -          -
  256    474.4µs      524µs    810.8µs          -          -
  384      1.4ms      1.3ms      1.5ms          -          -
  512      3.4ms      2.6ms        3ms          -          -
  640      5.9ms      4.4ms      4.6ms          -          -
  768      9.9ms      6.2ms      6.5ms          -          -
  896     15.5ms      8.5ms      8.7ms          -          -
 1024     24.8ms     12.8ms     11.3ms          -          -

LU decomposition with full pivoting

Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      119ns      132ns          -          -          -
    8      371ns      369ns          -          -          -
   16      1.4µs      1.4µs          -          -          -
   32      7.7µs      7.7µs          -          -          -
   64     53.4µs     53.4µs          -          -          -
   96    185.1µs    184.7µs          -          -          -
  128    460.1µs      459µs          -          -          -
  192      1.7ms      1.7ms          -          -          -
  256      4.3ms      4.3ms          -          -          -
  384     14.8ms       14ms          -          -          -
  512     38.2ms     25.8ms          -          -          -
  640     73.1ms       38ms          -          -          -
  768    128.8ms     56.5ms          -          -          -
  896    207.4ms     80.4ms          -          -          -
 1024    306.1ms    119.3ms          -          -          -

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      106ns      106ns      658ns          -          -
    8      295ns      296ns      1.6µs          -          -
   16        1µs        1µs      4.7µs          -          -
   32        5µs        5µs       18µs          -          -
   64     31.9µs     31.8µs     75.6µs          -          -
   96     79.7µs     79.7µs    526.7µs          -          -
  128    158.8µs    157.9µs      4.7ms          -          -
  192    451.3µs    450.1µs     49.9ms          -          -
  256    963.8µs      1.1ms     72.8ms          -          -
  384      2.9ms      2.9ms    148.4ms          -          -
  512      6.6ms      5.4ms    291.2ms          -          -
  640     12.4ms      8.9ms    337.5ms          -          -
  768     20.8ms     13.7ms    413.4ms          -          -
  896     31.6ms     20.7ms    546.6ms          -          -
 1024     45.6ms     28.3ms    734.9ms          -          -

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      175ns      162ns          -          -          -
    8      422ns      436ns          -          -          -
   16      1.6µs      1.6µs          -          -          -
   32        8µs      7.9µs          -          -          -
   64     58.7µs     93.9µs          -          -          -
   96      169µs    213.4µs          -          -          -
  128    358.2µs    422.9µs          -          -          -
  192      1.2ms      1.2ms          -          -          -
  256      2.7ms        3ms          -          -          -
  384      8.8ms      8.5ms          -          -          -
  512     20.5ms     16.5ms          -          -          -
  640     39.9ms     27.4ms          -          -          -
  768     67.8ms     43.4ms          -          -          -
  896    107.9ms     59.4ms          -          -          -
 1024    159.2ms       77ms          -          -          -

Matrix inverse

Computing the inverse of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      715ns     12.4µs      406ns          -          -
    8      1.6µs     16.7µs      991ns          -          -
   16      4.2µs     25.4µs      4.1µs          -          -
   32     13.6µs     53.5µs     12.8µs          -          -
   64     51.3µs     96.7µs     56.2µs          -          -
   96      130µs      169µs    450.3µs          -          -
  128    240.2µs    260.5µs      3.3ms          -          -
  192    659.7µs    589.3µs      6.1ms          -          -
  256      1.4ms      1.1ms      7.2ms          -          -
  384      4.1ms      2.4ms     15.8ms          -          -
  512      9.8ms      4.5ms     21.3ms          -          -
  640     17.3ms      7.5ms     28.9ms          -          -
  768     28.7ms     10.8ms       40ms          -          -
  896     44.9ms     15.3ms     52.6ms          -          -
 1024     74.2ms     23.2ms     70.7ms          -          -

Square matrix singular value decomposition

Computing the SVD of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.4µs      1.3µs      2.4µs          -          -
    8      7.4µs     23.1µs      7.4µs          -          -
   16     23.3µs     56.8µs     24.9µs          -          -
   32     85.2µs    130.1µs     86.7µs          -          -
   64      374µs    464.2µs      4.1ms          -          -
   96      953µs      1.3ms        7ms          -          -
  128      1.9ms      2.2ms     51.7ms          -          -
  192      5.3ms      6.3ms    135.2ms          -          -
  256     11.4ms       11ms    204.5ms          -          -
  384     34.1ms     27.3ms    312.2ms          -          -
  512     77.3ms     48.9ms      516ms          -          -
  640    144.7ms     78.2ms    688.1ms          -          -
  768    248.5ms    119.6ms    912.3ms          -          -
  896    384.6ms    176.2ms      1.16s          -          -
 1024    560.5ms    249.6ms      1.64s          -          -

Thin matrix singular value decomposition

Computing the SVD of a rectangular matrix with shape (4096, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4     43.6µs     43.7µs    578.8µs          -          -
    8    165.7µs    180.3µs      1.5ms          -          -
   16    471.3µs    573.1µs      7.8ms          -          -
   32      1.4ms      1.5ms     15.2ms          -          -
   64      4.6ms      4.5ms     58.3ms          -          -
   96      9.3ms      8.2ms     62.8ms          -          -
  128     16.1ms     12.8ms    103.5ms          -          -
  192       34ms     24.8ms    237.3ms          -          -
  256     60.9ms     41.3ms    383.9ms          -          -
  384    134.4ms     83.4ms    663.6ms          -          -
  512      248ms    142.1ms      1.05s          -          -
  640    402.7ms    204.7ms      1.52s          -          -
  768    616.6ms    298.7ms      1.81s          -          -
  896    874.1ms    399.3ms      2.44s          -          -
 1024      1.19s    532.8ms      2.87s          -          -

Hermitian matrix eigenvalue decomposition

Computing the EVD of a hermitian matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.1µs        1µs      1.2µs          -          -
    8        3µs      3.6µs      4.2µs          -          -
   16       12µs     11.8µs     15.5µs          -          -
   32     50.4µs     53.6µs       68µs          -          -
   64    224.6µs    245.4µs    437.8µs          -          -
   96    519.9µs    615.6µs      1.4ms          -          -
  128    989.7µs      1.1ms        6ms          -          -
  192      2.5ms      2.8ms     25.6ms          -          -
  256      5.1ms      4.8ms     51.1ms          -          -
  384     14.5ms     14.8ms    122.3ms          -          -
  512     31.6ms     27.8ms    232.1ms          -          -
  640     58.5ms     44.6ms    370.5ms          -          -
  768       97ms     65.1ms      554ms          -          -
  896    148.8ms     92.1ms    769.1ms          -          -
 1024    218.1ms    122.4ms      1.09s          -          -

Non Hermitian matrix eigenvalue decomposition

Computing the EVD of a matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      4.2µs      4.3µs      2.5µs          -          -
    8     14.7µs       14µs      8.6µs          -          -
   16     51.5µs       49µs     31.3µs          -          -
   32    228.1µs    225.8µs    146.1µs          -          -
   64        1ms        1ms    822.7µs          -          -
   96      2.8ms        3ms      5.1ms          -          -
  128      5.2ms      5.2ms     25.1ms          -          -
  192     14.2ms     14.4ms    121.7ms          -          -
  256     26.9ms     26.7ms    312.8ms          -          -
  384     69.1ms     78.5ms    624.7ms          -          -
  512    159.2ms    177.8ms      1.02s          -          -
  640    256.7ms    259.4ms      1.17s          -          -
  768    407.6ms    382.6ms      1.69s          -          -
  896    588.6ms    531.4ms      2.03s          -          -
 1024      1.03s    931.6ms      2.75s          -          -

f128

Matrix multiplication

Multiplication of two square matrices of dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      150ns      151ns          -          -          -
    8      1.3µs     16.2µs          -          -          -
   16      8.1µs     25.4µs          -          -          -
   32     60.4µs     40.6µs          -          -          -
   64    474.6µs    162.2µs          -          -          -
   96      1.6ms    519.4µs          -          -          -
  128      3.8ms    961.5µs          -          -          -
  192     12.8ms      2.6ms          -          -          -
  256     30.7ms      5.5ms          -          -          -
  384    103.2ms     16.5ms          -          -          -
  512    254.1ms     39.6ms          -          -          -
  640    481.4ms     70.9ms          -          -          -
  768    866.7ms    133.5ms          -          -          -
  896      1.34s      185ms          -          -          -
 1024      2.25s    311.7ms          -          -          -

Triangular solve

Solving AX = B in place where A and B are two square matrices of dimension n, and A
is a triangular matrix.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       73ns       73ns          -          -          -
    8      670ns      669ns          -          -          -
   16      5.1µs     24.7µs          -          -          -
   32     37.8µs    179.7µs          -          -          -
   64    269.6µs    471.8µs          -          -          -
   96    892.4µs    568.2µs          -          -          -
  128      2.1ms    926.1µs          -          -          -
  192      6.9ms      2.3ms          -          -          -
  256       16ms      4.4ms          -          -          -
  384     53.9ms       12ms          -          -          -
  512    126.2ms     25.5ms          -          -          -
  640    243.8ms     47.3ms          -          -          -
  768    419.6ms     71.8ms          -          -          -
  896    662.1ms    118.7ms          -          -          -
 1024      1.03s    170.5ms          -          -          -

Triangular inverse

Computing A^-1 where A is a square triangular matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      271ns       16µs          -          -          -
    8      965ns     13.8µs          -          -          -
   16      4.2µs     35.5µs          -          -          -
   32       22µs     85.3µs          -          -          -
   64    132.6µs    330.6µs          -          -          -
   96      389µs    566.4µs          -          -          -
  128    862.3µs        1ms          -          -          -
  192      2.7ms        2ms          -          -          -
  256      6.1ms        3ms          -          -          -
  384     19.6ms      7.6ms          -          -          -
  512       46ms       13ms          -          -          -
  640     96.4ms     25.3ms          -          -          -
  768      155ms     35.2ms          -          -          -
  896    235.6ms       54ms          -          -          -
 1024    349.3ms     75.2ms          -          -          -

Cholesky decomposition

Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      289ns      286ns          -          -          -
    8      692ns      696ns          -          -          -
   16      2.4µs      2.4µs          -          -          -
   32     18.9µs     58.2µs          -          -          -
   64    128.9µs    393.8µs          -          -          -
   96    368.2µs      844µs          -          -          -
  128    858.1µs      1.6ms          -          -          -
  192      2.7ms      3.6ms          -          -          -
  256      6.1ms      5.2ms          -          -          -
  384     20.1ms     12.8ms          -          -          -
  512     45.8ms       19ms          -          -          -
  640     88.4ms     36.8ms          -          -          -
  768    147.9ms     50.1ms          -          -          -
  896    231.4ms     69.7ms          -          -          -
 1024    348.7ms     89.7ms          -          -          -

LU decomposition with partial pivoting

Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix,
L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      214ns      186ns          -          -          -
    8      580ns      596ns          -          -          -
   16      2.4µs      2.5µs          -          -          -
   32     21.7µs     64.4µs          -          -          -
   64    173.7µs    369.3µs          -          -          -
   96    581.2µs    889.6µs          -          -          -
  128      1.3ms      1.5ms          -          -          -
  192      4.8ms      3.7ms          -          -          -
  256     10.9ms      6.4ms          -          -          -
  384     36.4ms     16.3ms          -          -          -
  512     85.5ms     33.5ms          -          -          -
  640    165.6ms       54ms          -          -          -
  768    284.5ms     80.2ms          -          -          -
  896    450.1ms      125ms          -          -          -
 1024    684.1ms    173.6ms          -          -          -

LU decomposition with full pivoting

Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are
permutation matrices, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      310ns      314ns          -          -          -
    8      1.2µs      1.1µs          -          -          -
   16      5.4µs      5.5µs          -          -          -
   32     35.5µs     35.6µs          -          -          -
   64    255.5µs      256µs          -          -          -
   96    843.7µs    840.9µs          -          -          -
  128        2ms        2ms          -          -          -
  192      6.7ms      6.7ms          -          -          -
  256     16.2ms     16.2ms          -          -          -
  384     52.2ms     47.8ms          -          -          -
  512    125.9ms     76.2ms          -          -          -
  640    241.1ms    114.2ms          -          -          -
  768    417.1ms    166.8ms          -          -          -
  896    661.6ms    232.6ms          -          -          -
 1024    996.3ms    326.2ms          -          -          -

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper
triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      698ns      699ns          -          -          -
    8      2.3µs      2.3µs          -          -          -
   16     10.9µs     10.7µs          -          -          -
   32     56.8µs     56.6µs          -          -          -
   64    519.8µs    518.4µs          -          -          -
   96      1.5ms      1.5ms          -          -          -
  128      3.4ms      3.5ms          -          -          -
  192     10.7ms     10.8ms          -          -          -
  256     24.7ms     18.5ms          -          -          -
  384     80.8ms       52ms          -          -          -
  512    198.4ms     97.7ms          -          -          -
  640    368.2ms    146.3ms          -          -          -
  768      624ms    219.9ms          -          -          -
  896    980.1ms    318.4ms          -          -          -
 1024      1.46s    422.8ms          -          -          -

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix,
Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      947ns      939ns          -          -          -
    8      3.6µs      3.6µs          -          -          -
   16     16.8µs     16.8µs          -          -          -
   32       97µs     97.4µs          -          -          -
   64    684.6µs    714.8µs          -          -          -
   96        2ms      1.9ms          -          -          -
  128      4.4ms      4.2ms          -          -          -
  192     13.7ms     13.1ms          -          -          -
  256     30.7ms     25.1ms          -          -          -
  384     99.1ms     49.1ms          -          -          -
  512    231.4ms       90ms          -          -          -
  640    449.1ms    151.6ms          -          -          -
  768    764.5ms    230.5ms          -          -          -
  896      1.21s    332.2ms          -          -          -
 1024      1.79s      462ms          -          -          -

Matrix inverse

Computing the inverse of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.4µs     15.1µs          -          -          -
    8      4.5µs     36.1µs          -          -          -
   16     20.4µs     63.4µs          -          -          -
   32    110.3µs    196.8µs          -          -          -
   64    694.5µs    777.2µs          -          -          -
   96      2.1ms      1.7ms          -          -          -
  128      4.7ms      3.1ms          -          -          -
  192     15.1ms      7.8ms          -          -          -
  256       35ms     13.6ms          -          -          -
  384    117.1ms     35.4ms          -          -          -
  512    267.1ms     64.2ms          -          -          -
  640    524.5ms    120.4ms          -          -          -
  768    899.5ms    179.5ms          -          -          -
  896       1.4s    273.8ms          -          -          -
 1024      2.17s    405.6ms          -          -          -

Square matrix singular value decomposition

Computing the SVD of a square matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4     11.6µs     11.7µs          -          -          -
    8     60.4µs     96.7µs          -          -          -
   16    259.8µs    333.2µs          -          -          -
   32      1.1ms      1.1ms          -          -          -
   64      5.9ms      5.6ms          -          -          -
   96     16.4ms     12.4ms          -          -          -
  128     31.5ms     22.2ms          -          -          -
  192     89.6ms       51ms          -          -          -
  256    194.7ms     89.3ms          -          -          -
  384    581.4ms    243.3ms          -          -          -
  512      1.32s    464.9ms          -          -          -
  640      2.48s      758ms          -          -          -
  768      4.13s      1.16s          -          -          -
  896      6.45s      1.72s          -          -          -
 1024      9.51s      2.27s          -          -          -

Thin matrix singular value decomposition

Computing the SVD of a rectangular matrix with shape (4096, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      1.1ms      2.1ms          -          -          -
    8      3.6ms      4.6ms          -          -          -
   16     10.2ms     13.2ms          -          -          -
   32     32.3ms     33.1ms          -          -          -
   64    120.3ms     86.5ms          -          -          -
   96    253.1ms    142.4ms          -          -          -
  128    444.6ms    216.4ms          -          -          -
  192    996.1ms    399.4ms          -          -          -
  256       1.8s    645.6ms          -          -          -
  384      4.05s      1.24s          -          -          -
  512      7.37s       2.1s          -          -          -
  640     11.67s      3.04s          -          -          -
  768     17.43s       4.6s          -          -          -
  896     23.81s      6.02s          -          -          -
 1024     32.59s      7.85s          -          -          -

Hermitian matrix eigenvalue decomposition

Computing the EVD of a hermitian matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      5.4µs      5.8µs          -          -          -
    8     21.9µs     25.7µs          -          -          -
   16      102µs     99.5µs          -          -          -
   32    515.3µs    497.8µs          -          -          -
   64      3.1ms      3.1ms          -          -          -
   96      8.6ms      7.2ms          -          -          -
  128     17.5ms     13.1ms          -          -          -
  192     51.1ms     33.1ms          -          -          -
  256    108.3ms     65.1ms          -          -          -
  384    339.3ms    152.3ms          -          -          -
  512    745.4ms    283.9ms          -          -          -
  640       1.4s    457.2ms          -          -          -
  768      2.36s    702.8ms          -          -          -
  896      3.72s      1.03s          -          -          -
 1024      5.45s      1.37s          -          -          -

Non Hermitian matrix eigenvalue decomposition

Computing the EVD of a matrix with shape (n, n).

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4     16.2µs     14.7µs          -          -          -
    8     68.3µs    101.5µs          -          -          -
   16    349.1µs      388µs          -          -          -
   32      1.8ms      1.8ms          -          -          -
   64     11.4ms     12.8ms          -          -          -
   96     46.5ms     60.2ms          -          -          -
  128     92.9ms    104.8ms          -          -          -
  192    299.1ms    221.5ms          -          -          -
  256    593.9ms    418.8ms          -          -          -
  384      1.61s    978.8ms          -          -          -
  512      3.46s         2s          -          -          -
  640      6.26s      3.63s          -          -          -
  768     10.17s      5.24s          -          -          -
  896     15.31s      7.25s          -          -          -
 1024     22.59s      9.99s          -          -          -

c32

Matrix multiplication

Multiplication of two square matrices of dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       61ns       61ns       97ns          -          -
    8      317ns      319ns      208ns          -          -
   16        2µs        2µs      659ns          -          -
   32     14.1µs     14.1µs        4µs          -          -
   64    107.4µs    107.4µs    837.1µs          -          -
   96      356µs    123.6µs      1.1ms          -          -
  128    834.9µs    285.1µs      1.3ms          -          -
  192      2.8ms    675.1µs      1.5ms          -          -
  256      6.6ms      1.4ms        2ms          -          -
  384     22.3ms      4.6ms      3.1ms          -          -
  512     52.6ms     10.6ms      6.7ms          -          -
  640    103.6ms     19.3ms      9.9ms          -          -
  768    178.3ms     32.5ms     16.9ms          -          -
  896    282.8ms     50.6ms     23.1ms          -          -
 1024    422.1ms     74.6ms     33.5ms          -          -

Triangular solve

Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       22ns       22ns     26.8µs          -          -
    8      203ns      202ns     27.6µs          -          -
   16      1.3µs      1.3µs     30.5µs          -          -
   32      8.8µs      8.8µs     40.6µs          -          -
   64     62.7µs     62.7µs     59.5µs          -          -
   96      206µs      132µs    332.9µs          -          -
  128    466.5µs      272µs    432.2µs          -          -
  192      1.5ms    558.7µs    602.8µs          -          -
  256      3.5ms      1.1ms    919.3µs          -          -
  384     11.7ms      2.9ms      1.8ms          -          -
  512     27.4ms      6.2ms      4.5ms          -          -
  640     53.3ms     11.7ms        5ms          -          -
  768     91.6ms     19.5ms     10.2ms          -          -
  896    144.6ms     29.4ms       12ms          -          -
 1024    215.7ms     43.6ms     25.1ms          -          -

Triangular inverse

Computing A^-1 where A is a square triangular matrix with dimension n.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      186ns     15.9µs     27.5µs          -          -
    8      534ns     12.7µs     27.6µs          -          -
   16      1.8µs     16.5µs     30.5µs          -          -
   32      7.1µs     27.2µs     40.7µs          -          -
   64     35.2µs     56.5µs     59.6µs          -          -
   96    101.2µs    146.3µs     85.4µs          -          -
  128    208.8µs    251.4µs    454.6µs          -          -
  192      636µs    576.1µs    597.7µs          -          -
  256      1.4ms        1ms      1.1ms          -          -
  384      4.4ms      1.8ms      1.8ms          -          -
  512     10.1ms      3.1ms      4.5ms          -          -
  640     19.3ms      5.3ms        5ms          -          -
  768     32.8ms      7.9ms     10.2ms          -          -
  896     51.7ms     11.2ms       12ms          -          -
 1024     76.5ms     16.3ms       25ms          -          -

Cholesky decomposition

Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4       45ns       45ns      124ns          -          -
    8      172ns      172ns      397ns          -          -
   16      925ns      925ns      1.3µs          -          -
   32      5.1µs      5.2µs      6.6µs          -          -
   64     30.3µs     30.4µs    176.3µs          -          -
   96     85.9µs       86µs      1.7ms          -          -
  128    192.3µs    192.3µs      1.5ms          -          -
  192    583.1µs    541.4µs        4ms          -          -
  256      1.3ms    934.9µs        5ms          -          -
  384      4.2ms      2.3ms       10ms          -          -
  512      9.8ms      3.8ms     11.5ms          -          -
  640     18.9ms      6.5ms     26.6ms          -          -
  768     32.2ms      9.9ms     20.1ms          -          -
  896     50.8ms     13.8ms     18.9ms          -          -
 1024     76.1ms     19.1ms     28.1ms          -          -

LU decomposition with partial pivoting

Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      105ns       91ns      176ns          -          -
    8      266ns      273ns      566ns          -          -
   16        1µs      1.1µs      2.3µs          -          -
   32      6.5µs      6.6µs      7.8µs          -          -
   64     45.5µs     45.4µs     30.9µs          -          -
   96    145.9µs    145.5µs       74µs          -          -
  128    327.9µs    326.5µs    360.5µs          -          -
  192      1.1ms    863.8µs      756µs          -          -
  256      2.4ms      1.7ms        1ms          -          -
  384        8ms      4.2ms      2.6ms          -          -
  512     18.7ms        8ms      4.7ms          -          -
  640     36.3ms     13.9ms      7.6ms          -          -
  768     62.6ms     20.9ms     10.8ms          -          -
  896     98.6ms     29.8ms     14.7ms          -          -
 1024    146.5ms       43ms       24ms          -          -

LU decomposition with full pivoting

Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      191ns      185ns          -          -          -
    8      573ns      516ns          -          -          -
   16      2.2µs      2.3µs          -          -          -
   32     12.4µs     12.3µs          -          -          -
   64     82.7µs     82.4µs          -          -          -
   96    262.9µs    262.7µs          -          -          -
  128    609.8µs    609.7µs          -          -          -
  192        2ms        2ms          -          -          -
  256      4.9ms      4.9ms          -          -          -
  384     16.4ms     15.6ms          -          -          -
  512     40.7ms     29.4ms          -          -          -
  640     77.3ms       41ms          -          -          -
  768    134.2ms     63.2ms          -          -          -
  896    215.1ms     99.3ms          -          -          -
 1024    316.8ms    136.3ms          -          -          -

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      164ns      164ns      879ns          -          -
    8      487ns      487ns      2.1µs          -          -
   16      2.5µs      2.5µs      6.9µs          -          -
   32     16.1µs     16.1µs     29.1µs          -          -
   64    132.5µs    132.5µs        1ms          -          -
   96    377.1µs    377.3µs      5.2ms          -          -
  128    815.1µs    815.2µs      9.7ms          -          -
  192      2.5ms      2.5ms     47.7ms          -          -
  256      5.7ms      4.7ms     68.9ms          -          -
  384     18.2ms     11.3ms    132.7ms          -          -
  512     41.9ms     21.9ms    176.6ms          -          -
  640     78.9ms     34.2ms      266ms          -          -
  768    134.3ms     52.8ms    350.6ms          -          -
  896    210.9ms     77.2ms    453.1ms          -          -
 1024    313.2ms    109.5ms    544.3ms          -          -

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.

thread 'main' panicked at 'cast>SizeMismatch', /Users/leifeld/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.13.1/src/internal.rs:32:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
leifeld@MacStudovonDirk faer-bench %  77.3ms       41ms          -          -          -
  768    134.2ms     63.2ms          -          -          -
  896    215.1ms     99.3ms          -          -          -
 1024    316.8ms    136.3ms          -          -          -

QR decomposition with no pivoting

Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.

    n       faer  faer(par)    ndarray   nalgebra      eigen
    4      164ns      164ns      879ns          -          -
    8      487ns      487ns      2.1µs          -          -
   16      2.5µs      2.5µs      6.9µs          -          -
   32     16.1µs     16.1µs     29.1µs          -          -
   64    132.5µs    132.5µs        1ms          -          -
   96    377.1µs    377.3µs      5.2ms          -          -
  128    815.1µs    815.2µs      9.7ms          -          -
  192      2.5ms      2.5ms     47.7ms          -          -
  256      5.7ms      4.7ms     68.9ms          -          -
  384     18.2ms     11.3ms    132.7ms          -          -
  512     41.9ms     21.9ms    176.6ms          -          -
  640     78.9ms     34.2ms      266ms          -          -
  768    134.3ms     52.8ms    350.6ms          -          -
  896    210.9ms     77.2ms    453.1ms          -          -
 1024    313.2ms    109.5ms    544.3ms          -          -

QR decomposition with column pivoting

Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.

thread 'main' panicked at 'cast>SizeMismatch', /Users/leifeld/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.13.1/src/internal.rs:32:3

`i32` overflow when calling `Faer::selfadjoint_eigenvalues()` on a `20` x `20` matrix

Describe the bug

For many graph adjacency matrices with small eigenvalues, faer crashes on an i32 overflow when computing eigenvalues as a self-adjoint matrix.

The precise line is iter += 1;.

My current workaround to avoid selfadjoint_eigenvalues() is

let eigs: Vec<faer::complex_native::c64> = a.eigenvalues();

To Reproduce

let mat: [[f64; 20]; 20] = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]];
let mut matt = faer::Mat::zeros(20, 20);
for i in 0..20 {
    for j in 0..20 {
        matt[(i, j)] = mat[i][j];
    }
}
let eigs = matt.selfadjoint_eigenvalues(faer::Side::Lower);

Expected behavior
Expected eigs to return a vector of $20$ eigenvalues.

For example,

# `bad_mat.py`
# run with `python bad_mat.py`

import numpy as np
mat = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
mat = np.array(mat)

eigs = np.linalg.eig(mat)[0]
print(eigs)

real_parts = [np.real(_) for _ in eigs]
print(real_parts)

''' output:
[ 3.69390168e+00+0.00000000e+00j -3.23272209e+00+0.00000000e+00j
 -2.52597629e+00+0.00000000e+00j  2.29766159e+00+0.00000000e+00j
  2.19984492e+00+0.00000000e+00j -2.25650189e+00+0.00000000e+00j
 -1.98531299e+00+0.00000000e+00j -1.48123145e+00+0.00000000e+00j
 -1.10237216e+00+0.00000000e+00j  1.47508155e+00+0.00000000e+00j
 -6.62467970e-01+0.00000000e+00j -5.32411898e-01+0.00000000e+00j
 -2.28816715e-01+0.00000000e+00j  4.97954067e-01+0.00000000e+00j
  7.40821370e-01+0.00000000e+00j  1.10254826e+00+0.00000000e+00j
  1.00000000e+00+0.00000000e+00j  1.00000000e+00+0.00000000e+00j
 -1.88640048e-17+9.57840098e-17j -1.88640048e-17-9.57840098e-17j]
[3.6939016833383165, -3.2327220859059014, -2.52597628501941, 2.2976615943259593, 2.199844923144114, -2.2565018926765257, -1.9853129913610488, -1.4812314534620308, -1.102372157787216, 1.4750815501253782, -0.6624679696406541, -0.5324118977580056, -0.22881671515516241, 0.49795406722182745, 0.7408213702790264, 1.1025482603313292, 1.0000000000000004, 1.0000000000000004, -1.886400483817679e-17, -1.886400483817679e-17]
'''

Additional context
Feel free to suggest a different method to use!

Since these matrices have a few very small eigenvalues, perhaps it is appropriate to tweak the high-level function to instead calculate a good guess for consider_zero_threshold in the higher-level API.

In this instance, the value equals 2.2250738585072014e-308.

pub fn compute_tridiag_real_evd_qr_algorithm<E: RealField>(
    diag: &mut [E],
    offdiag: &mut [E],
    u: Option<MatMut<'_, E>>,
    epsilon: E,
    consider_zero_threshold: E,
) {

Error `Assertion failed` when computing eigenvalues for a particular matrix

When this matrix

[
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I],
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I],
[0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
[2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
]

as a Mat<c64> is passed to eigenvalues or complex_eigenvalues, it throws an error.

numpy does compute eigenvalues:

ar = np.array([
[0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j],
[0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j],
[0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j],
[2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j],
])

In [1]: np.linalg.eigvals(ar)
Out[1]: 
array([ 1.11022302e-16-1.j, -2.22044605e-16+1.j,  1.11022302e-16-1.j,
       -2.22044605e-16+1.j])

The error messages are

Assertion failed at /home/lapeyre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-core-0.13.0/src/lib.rs:2093:13:
  assert!( row < this.nrows() )
with expansion:
  4 < 4

Several other matrices do give the same eigenvalues that numpy does.

Version: 0.13.1
Arch Linux

Geometric Algebra

Will this repository include topics such as Geometric Algerbra, such as G^n. This would be something I would be interested in helping contribute to.

Normal math operators can't be used with sparse matrices

Is your feature request related to a problem? Please describe.
Normal math operators like +, -, *, == etc. can't be used on sparse matrices. There are currently two bounds that can't be satisfied:

  1. The inner storage types of sparse matrices have to impl GenericMatrix, which they don't.
  2. The generic sparse types have to implement the matrix operation traits like MatMul, MatAdd etc. Right now MatMul is implemented for sparse types, but not any of the other operations.

Describe the solution you'd like
Right now, to multiply a sparse matrix with another matrix, you must manually invoke the MatMul implementation as so:

<SparseRowMat<_> as MatMul<DenseCol>>::mat_mul(lhs, rhs.as_ref());

And there's no way to add or subtract sparse matrices. It would be nice to be able to do something like lhs * &rhs and lhs + &rhs like normal. As well, it would also be nice to have a Debug and Index implementation for sparse matrices.

Improved ergonomics for `polars_to_faer_fxx`

The current iteration of faer::polars::polars_to_faer_fxx requires that a slice &[&str] of column names be passed to the function. While there is nothing wrong with this in an absolute sense, it does make for a somewhat awkward workflow when working with polars.

A pattern more aligned to how polars functions is to remove the &[&str] item from the function signature, and have the function create a Mat<E> from the entire frame. The function should also error if columns containing invalid data types (such as strings) are passed in as well.

Sparse Cholesky Solver

Is your feature request related to a problem? Please describe.
Some problems need a Sparse Cholesky solver, such as GraphSlam

Describe the solution you'd like
Implement a Sparse Cholesky Solver

Describe alternatives you've considered
In Rust, I used Russel which wraps SuiteSparse

Large matrix support

I saw square matrix sizes up to 1k in the benchmark section. Is it in scope of the project to maintain and optimize larger (50-200k) matrix sizes for multiplication (or other operations)?

implement `core::ops::Sub` for `Mat<T>`

implement subtraction for Mat<T>

the traits core::ops::Sub<Mat<T>> and core::ops::Sub<&Mat<T>> should be implemented for Mat<T> and &Mat<T>

T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and subtracts them

Specify MSRV

Specifying or documenting the MSRV somewhere would be helpful for consumers of faer.

This was discussed on discord.

Segfault in faer-mt-cplx-llt-512

Running cargo bench faer-mt-cplx-llt-512 leads to a segfault on my machine (the corresponding single threaded bench seems to run fine). I get the following traceback:

#0  0x000055c4371ca569 in core::ptr::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mod.rs:1354                                                                      
#1  core::ptr::mut_ptr::{impl#0}::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mut_ptr.rs:1449                                                                     
#2  gemm_common::pack_operands::pack_generic_inner_loop<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:163                                   
#3  gemm_common::pack_operands::pack_generic<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:256                                              
#4  gemm_common::pack_operands::pack_rhs::{closure#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:301        
#5  gemm_common::simd::x86::{impl#2}::vectorize<gemm_common::pack_operands::pack_rhs::{closure_env#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma>> (f=...)                                                           
    at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/simd.rs:50                                                                                                                                     
#6  0x000055c4371c74ef in gemm_common::pack_operands::pack_rhs<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> (n=<optimized out>, k=8192, dst=..., src=..., src_cs=48, src_rs=255, dst_stride=768)                      
    at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:299                                                                                                                           
#7  0x000055c4371c4992 in gemm_common::gemm::gemm_basic_generic::{closure#3}<gemm_common::simd::x86::Fma, num_complex::Complex<f64>, 2, 6, 3, 3, gemm_c64::gemm::f64::fma_cplx::gemm_basic_cplx::{closure_env#0}> (                    
    tid=<optimized out>) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/gemm.rs:395                                                                                                               
#8  0x000055c4371f88ea in core::ops::function::impls::{impl#0}::call<(usize), (dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)                            
    at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:263                                                                                                                                            
#9  core::ops::function::impls::{impl#1}::call_mut<(usize), &(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)                                             
    at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:274                                                                                                                                            
#10 core::iter::traits::iterator::Iterator::for_each::call::{closure#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (item=8192)                                              
    at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:849                                                                                                                                    
#11 core::iter::traits::iterator::Iterator::fold<core::ops::range::Range<usize>, (), core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::
Send + core::marker::Sync)>> (self=..., f=..., init=<optimized out>) at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:2477                                                                  
#12 core::iter::traits::iterator::Iterator::for_each<core::ops::range::Range<usize>, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (self=..., f=0x7fe657ffd740)                       
    at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:852                                                                                                                                    
#13 rayon::iter::for_each::{impl#1}::consume_iter<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync), usize, core::ops::range::Range<usize>> (self=..., iter=...)                            
    at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/for_each.rs:55                                                                                                                                   
#14 rayon::iter::plumbing::Producer::fold_with<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (self=...,     
    folder=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:110                                                                                                                  
#15 rayon::iter::plumbing::bridge_producer_consumer::helper<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> ( 
    len=<optimized out>, migrated=<optimized out>, splitter=..., producer=..., consumer=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:438                                     

OS: arch linux
CPU: AMD Threadripper 1920X

version: current main branch (88dd024)

Bump Polars Version to 0.34

Is your feature request related to a problem? Please describe.
Faer right now doesn't work with Polars 0.34. We should update it and make sure nothing breaks and the tests still pass.

Describe the solution you'd like

polars = { version = "0.34", optional = true, features = ["lazy"] }

and see if existing functions still work.

Describe alternatives you've considered
NA

Additional context
NA

Thanks in advance

Unexpected results using `faer_core::solve::solve_lower_triangular_in_place_with_conj`

First of all, thanks a lot for your work in faer, it is impressive. It has also been a lot of fun to play around with the library.

Describe the bug
The results of solving a lower triangular system are unexpected.

To Reproduce
The following code reproduces the error. It also shows how the equivalent code with nalgebra gets the correct solution:

# Cargo.toml

[package]
name = "test"
version = "0.1.0"
edition = "2021"

[dependencies]
faer-core = "0.9.1"
nalgebra = "0.32.2"
// main.rs

const RESPONSE: [f64; 3] = [-18.0, -60.0, -80.0];

// Toeplitz and lower triangular
fn r_element(i: usize, j: usize) -> f64 {
    if i >= j {
        let diff = i - j;
        if diff < RESPONSE.len() {
            RESPONSE[diff]
        } else {
            0.0
        }
    } else {
        0.0
    }
}

fn r_matrix_faer(n: usize) -> faer_core::Mat<f64> {
    faer_core::Mat::with_dims(n, n, |i, j| r_element(i, j))
}

fn r_matrix_nalgebra(n: usize) -> nalgebra::DMatrix<f64> {
    nalgebra::DMatrix::from_fn(n, n, |i, j| r_element(i, j))
}

// Solve for X in Y = R * X
fn solve_x_faer(r: &faer_core::Mat<f64>, y: &mut faer_core::Mat<f64>) {
    faer_core::solve::solve_lower_triangular_in_place_with_conj(
        r.as_ref(),
        faer_core::Conj::No,
        y.as_mut(),
        faer_core::Parallelism::None,
    );
}

// Solve for X in Y = R * X
fn solve_x_nalgebra(r: &nalgebra::DMatrix<f64>, y: &mut nalgebra::DMatrix<f64>) {
    r.solve_lower_triangular_mut(y);
}

fn main() {
    let (i, j) = (400, 1);

    let r = r_matrix_faer(i);
    let mut x = faer_core::Mat::zeros(i, j);
    x.write(0, 0, 150.0);

    let mut y = r.clone() * x;

    solve_x_faer(&r, &mut y); // The values in `y` blow up

    // Now, the same thing with nalgebra
    let r = r_matrix_nalgebra(i);
    let mut x = nalgebra::DMatrix::zeros(i, j);
    x[(0, 0)] = 150.0;

    let mut y = r.clone() * x;

    solve_x_nalgebra(&r, &mut y); // The values in `y` are correct
}

Expected behavior
I would expect the same results as nalgebra

A specialized Diagonal type?

I'm frequently using matrix-diagonal and diagonal-diagonal multiplication (which is a component-wise vector mul). Since populating a matrix with a diagonal does lead to wasted performance, I'm using a modified copy of div_by_s I found in faer source.

Does it make sense to flesh out a Diagonal type?

Auto Generate Changelog

Is your feature request related to a problem? Please describe.
This is more of a maintenance task to make it obvious and easy to see what changes have happened over the life of the project.

Describe the solution you'd like
I'm thinking of using an action to generate the Changelog on releases.

Describe alternatives you've considered
Considered GitHub Actions and some other functions for doing it. Decided to stick with native GitHub tooling.

Feel free to assign to me, as I think it would be fun to do.

Matrix Exponentiation

Is your feature request related to a problem? Please describe.
Matrix exponentation utilizing the standard double precision algorithm by Al-Mohy and Higham.

Describe the solution you'd like
A simple function that takes in a reference or matrix view (of some sorts not familiar with this library just yet) and yields a Result< ,> containing the exponentiated matrix if it worked and an error if not.

Describe alternatives you've considered
n/a

Additional context
I have implemented this function for ndarray-linalg but the maintainers have taken months and still no word. I am using my implementation for research purposes, using ndarray I have an error one-norm of about 20x scipy at 200x200 dense matrices and an error of machine precision for 1-sparse matrices up to 1024x1024. It might take some time but I'd be willing to port it over to this library to help move it along, would like to hear how this should best be done.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.