sciml / exponentialutilities.jl Goto Github PK

Fast and differentiable implementations of matrix exponentials, Krylov exponential matrix-vector multiplications ("expmv"), KIOPS, ExpoKit functions, and more. All your exponential needs in SciML form.

Home Page: https://docs.sciml.ai/ExponentialUtilities/stable/

License: Other

Julia 100.00%

julia high-performance exponential krylov krylov-methods krylov-subspace-methods sciml scientific-machine-learning differential-equations expokit

exponentialutilities.jl's Introduction

ExponentialUtilities

ExponentialUtilities is a package of utility functions for matrix functions of exponential type, including functionality for the matrix exponential and phi-functions. These methods are more numerically stable, generic (thus support a wider range of number types), and faster than the matrix exponentiation tools in Julia's Base. The tools are used by the exponential integrators in OrdinaryDiffEq. The package has no external dependencies, so it can also be used independently.

Tutorials and Documentation

For information on using the package, see the stable documentation. Use the in-development documentation for the version of the documentation, which contains the unreleased features.

Example

using ExponentialUtilities

A = rand(2, 2)
exponential!(A)

v = rand(2);
t = rand();
expv(t, A, v)

exponentialutilities.jl's People

Contributors

Stargazers

Watchers

exponentialutilities.jl's Issues

GPU performance of expv and expv_timestep could maybe be improved

See #101 . I don't think there was very rigorous testing of how some parts are on CPU and some parts are on GPU, though there is good justification. It would be nice to take a fine-toothed comb to this code and really optimize it, though the current code is likely close enough to optimal that it might not matter.

Output-sensitive Lanczos iteration

In [Algorithm 4, 1] the authors describe an "output-sensitive" Lanczos iteration scheme in the sense that the user can specify a projection matrix that's applied inside the iteration. To my non-expert understanding such option is not available in this library. In any case I'd be interested in contributing but some guidance would be very helpful.

We've used arnoldi!(Ks, Aᵀ, ℓ; m=m, ishermitian=hermitian, tol=tol) and cited ExponentialUtilities.jl in an upcoming paper. Being able to specify a projection matrix would give a huge boost in performance for large problems.

cc @jorgepz

[1] Bak, S., Tran, H. D., & Johnson, T. T. (2019, April). Numerical verification of affine systems with up to a billion dimensions. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (pp. 23-32).
https://arxiv.org/abs/1804.01583

What to expect from preallocated exp(tA)v? And what is the proper way of doing it.

What to expect from preallocated exp(tA)v❓

Context
The problem I am trying to solve is

v(t+dt) = exp(-i * H * dt) v(t),

where H is a Hermitian matrix, and v is a complex valued vector. I have solved this with my own implementation of calculating the krylov subspace, and then calculating the exponent in the krylov subspace. I am running into a problem where calculating the exponent uses too much memory (I am preallocating for the krylov subspace construction part).

I was searching for ways to calculate exp(-i * H * dt) with doing as little allocations as possible, and I found this package. I tried using lanczos! and expv! to solve my problem. And I also tried keeping my own krylov subspace calculation part, but using the exponent! function from this package. I preallocated for both of these solutions, and in both the memory usage is still substantial, and the calculation time is roughly the same in all three scenarios. The number of allocations was the same when I used lanczos! and expv!, and my own code, but with exponent! there were a lot less allocations. The memory allocated was same when I used lanczos! and expv!, and exponent!, so here I saw significant improvement over my own code. Sadly this doesnt translate to calculation time, since there is still a lot of memory being allocated (about 20GiB).

I tested the different solutions by running an actual simulation, so not in a MWE or anyting like that. There is definetly other allocations coming from the rest of the code, and the situation is complicated, so please take these results with a grain of salt.

I am surprised that the number of allocations were reduced significantly when using the exponent!, but not with using lancoz! and expv!. This makes me wonder if I am doing something wrong. I had a difficult time figuring out how to preallocate for lancoz! and expv!. From the documentation it was hard to find an explanation how to do this, but with trial and error and by reading the source code I got the code running. Anyways, what I did was

# here k is the dimension of the krylov subspace, d the dimension of the system
cache = ExpvCache{Float64}(k)
ks = KrylovSubspace{ComplexF64, Float64}(d, k)

lanczos!(ks, H, v)
expv!(v, -1.0im * t, ks; cache)

Actual Questions
Is this the proper way to preallocate for this problem?
Should I expect close to zero allocations for the lanczos! and expv! part?
Do you know a better way to solve this? I am open for suggestions.

expv gives wrong results for a specific matrix

using this 55x55 matrix with expv gives wrong results

cols = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55]
rows = [2, 3, 4, 6, 9, 14, 22, 35, 1, 5, 7, 10, 15, 23, 36, 1, 8, 11, 16, 24, 37, 1, 5, 12, 17, 25, 38, 2, 4, 5, 13, 18, 26, 39, 1, 7, 8, 19, 27, 40, 2, 6, 7, 20, 28, 41, 3, 6, 8, 21, 29, 42, 1, 10, 11, 12, 30, 43, 2, 9, 10, 13, 31, 44, 3, 9, 11, 32, 45, 4, 9, 12, 13, 33, 46, 5, 10, 12, 13, 34, 47, 1, 15, 16, 17, 19, 48, 2, 14, 15, 18, 20, 49, 3, 14, 16, 21, 50, 4, 14, 17, 18, 51, 5, 15, 17, 18, 52, 6, 14, 19, 20, 21, 53, 7, 15, 19, 20, 54, 8, 16, 19, 21, 55, 1, 23, 24, 25, 27, 30, 2, 22, 23, 26, 28, 31, 3, 22, 24, 29, 32, 4, 22, 25, 26, 33, 5, 23, 25, 26, 34, 6, 22, 27, 28, 29, 7, 23, 27, 28, 8, 24, 27, 29, 9, 22, 30, 31, 32, 33, 10, 23, 30, 31, 34, 11, 24, 30, 32, 12, 25, 30, 33, 34, 13, 26, 31, 33, 34, 1, 36, 37, 38, 40, 43, 48, 2, 35, 36, 39, 41, 44, 49, 3, 35, 37, 42, 45, 50, 4, 35, 38, 39, 46, 51, 5, 36, 38, 39, 47, 52, 6, 35, 40, 41, 42, 53, 7, 36, 40, 41, 54, 8, 37, 40, 42, 55, 9, 35, 43, 44, 45, 46, 10, 36, 43, 44, 47, 11, 37, 43, 45, 12, 38, 43, 46, 47, 13, 39, 44, 46, 47, 14, 35, 48, 49, 50, 51, 53, 15, 36, 48, 49, 52, 54, 16, 37, 48, 50, 55, 17, 38, 48, 51, 52, 18, 39, 49, 51, 52, 19, 40, 48, 53, 54, 55, 20, 41, 49, 53, 54, 21, 42, 50, 53, 55]
vals = [12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0, 12.566370614359172, 12.566370614359172, 12.566370614359172, 12.566370614359172, 0.0]

copy-paste the above code to REPL, and then

julia> M = sparse(rows, cols, vals)

julia> t = 0.9; st = rand(55);

julia> expv(-im*t, Matrix(H), st) ≈ exp(-im*t*Matrix(H)) * st
false

the error increases with t, thus I'm guessing there are something wrong with the convergence criteria?

Handling of different exp methods

There are several matrix exponential methods in the literature, suitable in different settings. At the moment we have Padé-based scaling and squaring. There is the direct diagonalization approach, but also polynomial based scaling and squaring (which might be better in a GPU setting), e.g. polynomial based of Sastre http://personales.upv.es/~jorsasma/software/expmpol.m or https://arxiv.org/abs/2107.12198. If we add more implementations I think we need a way for a user to select this. Sketch of a system design using dispatch:

struct ExpMethodHigham 
    do_balancing::Bool
end
ExpMethodHigham()=ExpMethodHigham(true);
struct ExpMethodDiagonalization
end
struct ExpMethodBase # Call Base.exp
end

....

Implementations could be done like this

function _exp!(A,method::ExpMethodDiagonalization,cache) 
    F=eigen!(A);
    copyto!(A,F.vectors*Diagonal(exp.(F.values))/F.vectors)
    return A
end

Functions that need exp can take the method object as a kwarg

function phi(z::T, k::Integer; cache=nothing,expmethod=ExpMethodHigham())
....
    P = _exp!(A,expmethod,cache)

The non-allocating can be handled with dispatch

function allocate_mem(A,::ExpMethodHigham)
     return [similar(A) for i=1:5];
end
function allocate_mem(A,::ExpMethodDiagonalization)
     return nothing;
end

The function allocate_mem can either be called by the user or automatically if caches=nothing.

(We are working on other matrix exponential codes and this package would be a natural place to put them, otherwise we would need to create a separate package.)

`phi!` allocates

The docstring of phi! states:

ExponentialUtilities.jl/src/phi.jl

Lines 115 to 119 in 0cd7738

    
           """ 
        
               phi!(out,A,k[;caches]) -> out 
        
           Non-allocating version of `phi` for non-diagonal matrix inputs. 
        
           """

Yet, it allocates:

using ExponentialUtilities, BenchmarkTools

d, q = 4, 3
L = rand(d, d)
out = ExponentialUtilities.phi(L, q);
caches = (zeros(d), zeros(d, q + 1), zeros(d + q, d + q))
expmethod = ExpMethodHigham2005()

@btime ExponentialUtilities.phi!($out, $L, $q; caches=$caches, expmethod=$expmethod);
# 21.998 μs (32 allocations: 10.00 KiB)

(and figuring out how to choose out and caches already required a bit of digging in the source code as this part is basically undocumented)

In comparison, non-allocating exponential! is very straight-forward:

cache = ExponentialUtilities.alloc_mem(L, expmethod);
@btime ExponentialUtilities.exponential!(copy($L), $expmethod, $cache) 
# 3.214 μs (2 allocations: 288 bytes)

The reason

Going through profilers and some code, the issue seems to be the combination of this call here

ExponentialUtilities.jl/src/phi.jl

Line 80 in 0cd7738

P = exponential!(cache, expmethod)

which then leads to some allocation here

ExponentialUtilities.jl/src/exp_noalloc.jl

Line 80 in 0cd7738

    
           function exponential!(A, method::ExpMethodHigham2005, _cache = alloc_mem(A, method))

Potential solution

Initialize this cache outside of the call to phi! and pass it as a function argument (e.g. expcache?). I tried this locally and such a change would remove most allocations:

expcache = ExponentialUtilities.alloc_mem(zeros(d + q, d + q), expmethod);
@btime ExponentialUtilities.phi!($out, $L, $q; caches=$caches, expmethod=$expmethod, expcache=$expcache,);
#  20.877 μs (4 allocations: 448 bytes)

(unfortunately it does not affect times a lot)

export `alloc_mem`

The method is not exported which causes the code example at https://docs.sciml.ai/ExponentialUtilities/stable/matrix_exponentials/ to fail.

Add ChainRules rules

From Slack:
@sethaxen:

Does ExponentialUtilities.jl play well with AD packages, in particular Zygote?

@ChrisRackauckas:

not fully with Zygote
it'll need adjoints
since it's doing a lot of scalar stuff
it's writing the kernels directly
the adjoints are easy though

I in particular need adjoints for expv. Zygote currently has an adjoint rule for exp(::AbstractMatrix) and exp(::Hermitian) using the eigendecomposition. I imagine though there's a better way to implement the adjoints for expv by looking at the underlying algorithm (I have not).

Switch to LBT for 1.7+

With the following PR targeting Julia 1.9: JuliaLang/julia@e84b06c

This line breaks:

ExponentialUtilities.jl/src/StegrWork.jl

Line 6 in 5460b95

const liblapack = Base.liblapack_name

As of 1.7 the libblastrampoline mechanism should be used, though I am not sure what this entails.

Error using expv with Turing.jl

Hi,

I am running a Bayesian inference on an ODE model using Turing.jl. To try and speed up things, I represented the model as a linear system and expressed the coefficients in the form of a matrix to solve the system using matrix exponential (ExponentialUtilities.expv()). When I run the inference, I keep getting the error below (this is only the head of the error but I can try to share the rest if necessary). The error message is associated to the following line but I am guessing all other calls of ExponetialUtilities.expv would produce it as well:

u0_eoi = inv(K)*ExponentialUtilities.expv(durs[i], K, tmp_rate) - inv(K)*tmp_rate

Using ExponentialAction.expv works but very slow so I wanted to compare the performance to ExponentialUtilities! Ideally, I would be able to even use ExponentialUtilities.expv_timestep to get the full matrix without looping through timesteps!

Any ideas on how to solve this! Also, I would greatly appreciate it if you could share any alternative more efficient method to carry out the analysis with a linear solver!

Thank you.
Ahmed.

PS: the full model structure is also below. It is a PBPK model where I am modeling a drug's pharmacokinetics in patients. The drug is introduced via IV infusion so the model is set to calculate the state variables using the relation u(t)=A^-1 * exp(At) * rate - A^-1 * rate. Once infusion duration is over I switch to u(t)=exp(At) * u0 The CLint parameter includes random effects relevant to interindividual variability. The function PBPK() would take in the parameter set and generate the full model coefficients matrix. Within this function there is a line:

K = zeros(Base.promote_eltype(p[1]), ncmt, ncmt);

This line promotes the input parameter type when creating the K matrix.
I haven't attached the full function description but I can if necessary!

The error message:

ERROR: MethodError: no method matching gebal!(::Char, ::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:σ, :ĈLint, :KbBR, :KbMU, :KbAD, :KbBO, :KbRB, :ω, :ηᵢ), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ, Setfield.IdentityLens}, Int64}, Vector{Truncated{Cauchy{Float64}, Continuous, Float64}}, Vector{AbstractPPL.VarName{:σ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ĈLint, Setfield.IdentityLens}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:ĈLint, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}},

The model structure:

@model function fitPBPK(data, nSubject, rates, times, durs, wts, VVBs, BP, starts, ends, ncmt) # data should be a Vector
    # priors
    σ ~ truncated(Cauchy(0, 0.5), 0.0, Inf) # residual error
    ĈLint ~ LogNormal(7.1,0.25)
    KbBR ~ LogNormal(1.1,0.25)
    KbMU ~ LogNormal(0.3,0.25)
    KbAD ~ LogNormal(2.0,0.25)
    KbBO ~ LogNormal(0.03, 0.25)
    KbRB ~ LogNormal(0.3, 0.25)
    ω ~ truncated(Cauchy(0, 0.5), 0.0, 1.0)
    ηᵢ ~ filldist(Normal(0.0, 1.0), nSubject)

    # individual params
    CLintᵢ = ĈLint .* exp.(ω .* ηᵢ)  # non-centered parameterization

    # solve
    tmp_rate = zeros(ncmt)
    predicted = []

    ps_tmp = [CLintᵢ[1], KbBR, KbMU, KbAD, KbBO, KbRB, wts[1]]
    K_tmp = PBPK(ps_tmp)
    u0_eoi_tmp = inv(K_tmp)*ExponentialAction.expv(durs[1], K_tmp, tmp_rate) - inv(K_tmp)*tmp_rate
    ut = zeros(Base.promote_eltype(times[1], K_tmp, u0_eoi_tmp), length(u0_eoi_tmp), length(data))

    for i in 1:nSubject
        tmp_rate[15] = rates[i]
        ps = [CLintᵢ[i], KbBR, KbMU, KbAD, KbBO, KbRB, wts[i]]
        K = PBPK(ps)
        #u0_eoi = inv(K)*ExponentialAction.expv(durs[i], K, tmp_rate) - inv(K)*tmp_rate
        u0_eoi = inv(K)*ExponentialUtilities.expv(durs[i], K, tmp_rate) - inv(K)*tmp_rate
        n = 1
        for j in starts[i]:ends[i]
            tmp_time = times[i][n]
            if tmp_time <= durs[i]
                #ut[:,j] = inv(K)*ExponentialAction.expv(tmp_time, K, tmp_rate) - inv(K)*tmp_rate
                ut[:,j] = inv(K)*ExponentialUtilities.expv(tmp_time,K,tmp_rate) - inv(K)*tmp_rate
            else
                #ut[:,j] = ExponentialAction.expv(tmp_time, K, u0_eoi)
                ut[:,j] = ExponentialUtilities.expv(tmp_time,K,u0_eoi)
            end  
            n = n + 1
        end
        tmp_sol = ut[15,starts[i]:ends[i]] ./ (VVBs[i]*BP/1000.0)
        append!(predicted, tmp_sol)
    end

    # likelihood
    for i = 1:length(predicted)
        data[i] ~ LogNormal(log.(max(predicted[i], 1e-12)), σ)
    end
end

exp_generic has low precision for BigFloats

julia> _x = range(-10, stop=10, length=200);

julia> maximum(abs, (x -> ExponentialUtilities.exp_generic(x)/exp(x) - 1).(_x))/eps(Float64)
10.5

julia> maximum(abs, (x -> ExponentialUtilities.exp_generic(x)/exp(x) - 1).(big.(_x)))/eps(BigFloat)
3.990156146456241410115233813819434452173187e+42

The number of terms in the Padé approximant probably needs to depend on the precision.

Performance of `phi!` on diagonal matrices

Hi, I'm on Julia 1.8.2 and I'm trying to use exponential solvers on a moderately-sized problem. When I construct the linear operator via Diagonal, I notice some performance issues. I reported this in the past in SciML/OrdinaryDiffEq.jl#1193. It's been a while, but now I have some renewed interest, so I decided to dig a little deeper and managed to localize the problem here.

This is the benchmark I used in the past, where the nonlinear part is zero so that I can test against the exact solution. The accuracy problems are gone, but performance issues remain.

using LinearAlgebra
using DifferentialEquations
using DiffEqOperators
using OrdinaryDiffEq
using ExponentialUtilities
using BenchmarkTools
using Profile


N = 256^2
Arr = rand(N)
A = DiffEqArrayOperator(Diagonal(Arr))
Amat = convert(AbstractMatrix, A)
f!(du, u, p, t) = nothing
t1 = 1.0
u0 = ones(eltype(Arr), N)
u1 = exp.(Arr*t1) .* u0  # Exact solution.
prob! = SplitODEProblem(A, f!, u0, t1, nothing)
def_kwargs = Dict(:save_everystep=>false, :save_start=>false, :dt=>0.05)

ϵ(u_sol) = √sum(abs2.(u_sol - u1))  # RMS error to check the accuracy.

function test(prob, alg; kwargs...)
    sol = solve(prob, alg; kwargs...)
    @show ϵ(sol[end])
    @benchmark solve($prob, $alg; $kwargs...)
end;

Here's a comparison between LawsonEuler and ETDRK4.

test(prob!, LawsonEuler(); def_kwargs...)

ϵ(sol[end]) = 5.769346133408569e-13

BenchmarkTools.Trial: 573 samples with 1 evaluation.
 Range (min … max):  6.919 ms … 19.627 ms  ┊ GC (min … max): 0.00% … 61.09%
 Time  (median):     8.102 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.711 ms ±  2.148 ms  ┊ GC (mean ± σ):  5.39% ± 11.74%

    ▂▇██▇▅▄▄▂▂▁                                               
  ▄▆████████████▆▇▆▇▄▅▄▄▆▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▁▆█▅▆▄▄▄▁▄▄ ▇
  6.92 ms      Histogram: log(frequency) by time     18.6 ms <

 Memory estimate: 6.50 MiB, allocs estimate: 69.

test(prob!, ETDRK4(); def_kwargs...)

ϵ(sol[end]) = 1.8824825842372246e-13

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  4.813 s …   4.831 s  ┊ GC (min … max): 0.33% … 0.37%
 Time  (median):     4.822 s              ┊ GC (median):    0.35%
 Time  (mean ± σ):   4.822 s ± 12.724 ms  ┊ GC (mean ± σ):  0.35% ± 0.03%

  █                                                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.81 s         Histogram: frequency by time        4.83 s <

 Memory estimate: 171.01 MiB, allocs estimate: 1310838.

So LawsonEuler wins by orders of magnitude, both in memory allocations and time of execution. I know ETDRK4 is a much more complex method, but I wouldn't expect such a gap in performance.

I managed to identify a part of the problem with memory. It turns out that while scalar phi and exponential! can use caches to save memory, these caches are reallocated for each matrix element. This can be fixed by creating the caches already in phi!. So I monkey-patched these two functions

function ExponentialUtilities.phi(z::T, k::Integer; cache = nothing,
             expmethod = ExponentialUtilities.ExpMethodHigham2005(), cache2 = nothing) where {T <: Number}
    # Construct the matrix
    if cache == nothing
        cache = fill(zero(T), k + 1, k + 1)
    else
        fill!(cache, zero(T))
    end
    cache[1, 1] = z
    for i in 1:k
        cache[i, i + 1] = one(T)
    end
    if cache2 == nothing
        cache2 = ExponentialUtilities.alloc_mem(cache, expmethod)
    end
    P = exponential!(cache, expmethod, cache2)
    return P[1, :]
end

function ExponentialUtilities.phi!(out::Vector{Diagonal{T, V}}, A::Diagonal{T, V}, k::Integer;
              caches = nothing, expmethod = ExponentialUtilities.ExpMethodHigham2005()) where {T <: Number, V <: AbstractVector{T}}
    cache = fill(zero(T), k + 1, k + 1)
    cache2 = ExponentialUtilities.alloc_mem(cache, expmethod)
    for i in 1:size(A, 1)
        phiz = phi(A[i, i], k; cache = cache, expmethod = expmethod, cache2 = cache2)
        for j in 1:(k + 1)
            out[j][i, i] = phiz[j]
        end
    end
    return out
end

and got the following result

test(prob!, ETDRK4(); def_kwargs...)

ϵ(sol[end]) = 1.8832000766134335e-13

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  4.923 s …    5.065 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.994 s               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.994 s ± 100.383 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                        █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.92 s         Histogram: frequency by time         5.06 s <

 Memory estimate: 40.01 MiB, allocs estimate: 262278.

Memory usage went down noticeably, but the method is as slow as before. Running the profiler on phi! I found that most of the time is spent on LU decompositions in ldiv_for_generated!, this line specifically:

ExponentialUtilities.jl/src/exp_noalloc.jl

Line 29 in 380125a

    
           F = lu!(A) # This allocation is unavoidable, due to the interface of LinearAlgebra

At this point I'm not familiar enough with the package to continue. Can anything be done to speed up this method? Perhaps the comment about unavoidable LU is not true anymore? Or have I run into the limits of performance of exponential solvers and they simply do not scale to problems of this size? Should I use another method? LawsonEuler, which seems to work fine in this simple toy example, unfortunately isn't accurate enough in the real problem. I also tried ETDRK4(krylov=true), but it's only 50% faster and seems to allocate slightly more memory, which maybe points to issues with caching there as well.

Sorry for such a long post, I kinda want this to work :)

Complex valued expv! gives MethodError in some cases

A = Diagonal(1 .* ones(ComplexF64,8))
b = ones(ComplexF64,8)

Ks = KrylovSubspace{ComplexF64,ComplexF64}(8,8)
arnoldi!(Ks,A,b) # After this like, Ks.H = [ 1 - epsilon + 0i ; epsilon + 0i ]
out = Vector{ComplexF64}(undef,8)
expv!(out,1.0,Ks) # Causes error

Since H[1:1,1] is hermitian, this branch goes through:

ExponentialUtilities.jl/src/krylov_phiv.jl

Lines 82 to 85 in 5460b95

    
           copyto!(cache, @view(H[1:m, :])) 
        
           if ishermitian(cache) 
        
               # Optimize the case for symtridiagonal H 
        
               F = eigen!(SymTridiagonal(cache))

but the efficient eigen! for SymTridiagonal matrices only works for reals (see JuliaLang/julia@fc4c69d) so you get this error:

ERROR: LoadError: MethodError: no method matching eigen!(::SymTridiagonal{ComplexF64, Vector{ComplexF64}})
Closest candidates are:
  eigen!(!Matched::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\tridiag.jl:280
  eigen!(!Matched::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}, !Matched::UnitRange) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\tridiag.jl:283
  eigen!(!Matched::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}, !Matched::Real, !Matched::Real) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\tridiag.jl:288
  ...

In the expv! code for complex t, there is this line:

ExponentialUtilities.jl/src/krylov_phiv.jl

Line 116 in 5460b95

F = eigen!(SymTridiagonal(real(cache)))

which just takes the real part to make it work, but this seems bad... how do you know the matrix is actually real?

no stable docs

GPU-Friendly `inplace_add!`

inplace_add! as currently implemented uses scalar indexing to update the diagonal of an array. Since CuArrays disallow scalar indexing outside the REPL, this method breaks in practice.

The current implementation, reproduced from here

@inline function inplace_add!(A,B::UniformScaling) # Called from generated code
    s = B.λ
    @inbounds for i in diagind(A)
        A[i] += s
    end
end

Avoid generated function in generic_exp

The current implementation relies on methods being defined for special number types before

ExponentialUtilities.jl/src/exp.jl

Lines 127 to 135 in fce7ae6

    
           @generated function exp_pade_p(x, ::Val{k}, ::Val{m}) where {k, m} 
        
               factorial = Base.factorial ∘ big 
        
               p = map(Tuple(0:k)) do j 
        
                   num = factorial(k + m - j) * factorial(k) 
        
                   den = factorial(k + m) * factorial(k - j)*factorial(j) 
        
                   (float ∘ eltype)(x)(num // den) * (x <: Number ? 1 : I) 
        
               end 
        
               :(@evalpoly(x, $(p...))) 
        
           end

is being called, see item 4 in https://docs.julialang.org/en/v1/manual/metaprogramming/#Generated-functions-1. If not then you'll hit a world age error. E.g.

julia> using ExponentialUtilities

julia> f = t -> exp_generic(t)
#1 (generic function with 1 method)

julia> f(0.3)
1.3498588075760032

julia> using ForwardDiff

julia> ForwardDiff.derivative(f, 0.2)
ERROR: MethodError: no method matching ForwardDiff.Dual{ForwardDiff.Tag{var"#1#2",Float64},Float64,1}(::Int64)
The applicable method may be too new: running in world age 27795, while current world is 27806.

Register

@JuliaRegistrator register()

support CUDA arrays?

I tried to port CuArrays CUSPARSE module to ExponentialUtilities to utilize GPU devices. I removed the array type restriction from KrylovSubspace, but still not working.

Wondering how much effort it is to write a GPU compatible expmv and does it worth a try?

Support for CUDA Arrays

Hello,

I'm interested on using these tools with CUDA Arrays like ComplexF64 dense CuArray or sparse CuSparseMatrix. Following the documentation, it only needs that the array supports the following functions:

Base.eltype(A)
Base.size(A, dim)
LinearAlgebra.mul!(y, A, x)
LinearAlgebra.opnorm(A, p=Inf) (I can insert manually this value)
LinearAlgebra.ishermitian(A)

To prove that this is the following code:

test = CUSPARSE.CuSparseMatrixCSC(sprand(ComplexF64, 100, 100, 0.01))
if Base.eltype(test) == ComplexF64
    println("OK")
end
if Base.size(test, 2) == 100
    println("OK")
end
dy = CUDA.zeros(ComplexF64, 100)
y = CUDA.rand(ComplexF64, 100)
LinearAlgebra.mul!(dy, test, y)
test2 = Array(test)
dy2 = Array(dy)
y2 = Array(y)
LinearAlgebra.mul!(dy2, test2, y2)
if Array(dy) == dy2
    println("OK")
end

But I can't calculate for example the function expv that it returns the error

expv(1.0, test, y, opnorm = opnorm(test2))

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.

The same problem happens for the dense matrix case.

`exponential!` crashes with some sparse array

The following code crashes

using SparseArrays
using ExponentialUtilities

A = sparse([1, 2, 1], [2, 1, 1], [1.0, 2.0, 3.0])
print(exponential!(A, ExpMethodGeneric()))

Stepping through a debugger suggests it crashes at @generated function naivemul!. Interestingly, everything runs fine if I change A to sparse([2, 2, 1], [2, 1, 1], [1.0, 2.0, 3.0]).

imaginary t is not working

MWE

using ExponentialUtilities, LinearAlgebra
A = Hermitian(rand(ComplexF64, 4, 4))
v = rand(ComplexF64, 4)
Ks = arnoldi(A, v);
w = similar(v)

expv!(w, -1e2 * im, Ks)

Got error:

ERROR: InexactError: Float64(0.0 + 56.73590474494471im)
Stacktrace:
 [1] setindex!(::Array{Float64,1}, ::Complex{Float64}, ::Int64) at ./complex.jl:37
 [2] macro expansion at /Users/roger/Documents/Repos/julia/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/generic.jl:101 [inlined]
 [3] macro expansion at ./simdloop.jl:73 [inlined]
 [4] lmul! at /Users/roger/Documents/Repos/julia/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/generic.jl:100 [inlined]
 [5] #expv!#20(::Nothing, ::Function, ::Array{Complex{Float64},1}, ::Complex{Float64}, ::KrylovSubspace{Float64,Complex{Float64},Float64}) at /Users/roger/.julia/dev/ExponentialUtilities/src/krylov_phiv.jl:86
 [6] expv!(::Array{Complex{Float64},1}, ::Complex{Float64}, ::KrylovSubspace{Float64,Complex{Float64},Float64}) at /Users/roger/.julia/dev/ExponentialUtilities/src/krylov_phiv.jl:73
 [7] top-level scope at none:0

This is because when A is hermitian, F is always real, but it uses lmul! on F.vectors here:

https://github.com/JuliaDiffEq/ExponentialUtilities.jl/blob/9304f45f4d63a80f4eaa263030666ff70580cd13/src/krylov_phiv.jl#L86

Is there anyway to avoid the allocation while supporting complex valued t ? Or just dispatch expv! to a special version for complex t?

Issue with ForwardDiff.jl

We're having difficulties using the latest patch release of ForwardDiff.jl in the following situation:

ForwardDiff.derivative(2.0) do x
    h = fill(zero(typeof(x)), (2, 2))
    F = ExponentialUtilities.exponential!(im * h, ExpMethodGeneric())
    return sum(F)
end

is returning a NaN.

One multiplication too many in `_exp!`

Analogous to the PR: JuliaLang/julia#40668 I believe the _exp!-function can be implemented with one multiplication less in several cases. More precisely, these lines can be implemented with one multiplication less:

ExponentialUtilities.jl/src/exp.jl

Lines 41 to 49 in 5460b95

    
           mul!(A2, A, A) 
        
           @. U = C[2] * P 
        
           @. V = C[1] * P 
        
           for k in 1:(div(size(C, 1), 2) - 1) 
        
               k2 = 2 * k 
        
               mul!(temp, P, A2); P, temp = temp, P # equivalent to P *= A2 
        
               @. U += C[k2 + 2] * P 
        
               @. V += C[k2 + 1] * P 
        
           end

Since when k=1, P is identity.

Extending expv using Saad to general phiv?

@jagot I have finally went through the paper by Saad. While in theory the algorithm can be extended to arbitrary phi orders and non-Hermitian matrices, I still have some doubts and would like to discuss them with you.

Applicability to non-hermitian operators?

Our idea is to controll the Krylov subspace size not by a priori error estimates ("happy-breakdown" being the simplest version), but by using a posteri error estimates as the computation proceeds (btw, I don't think Saad put forward this idea himself; can I assume it's your invention then ;)).

Now a problem with this approach is that the error estimates should not be expensive. For hermitian operators this is fine because tridiagonal Hm is fairly cheap to work with (direct diagonalization); on the other hand if Hm is just upper Hessenberg then having to calculate exp(Hm) every step is very expensive, and we're better off just doing a few more Arnoldi iterations with a fixed m.

If you agree with this, then we can restrict ourselves to implementing the algorithm only for Hermitian operators.

Error estimates

Now the good news is that all of Saad's derivations (in particular, the error formula (42) and (43)) can be easily adapted to arbitrary phis, and we can get the phi versions of Er1 and Er2. In fact, Er1 is the same error estimate used by Niesen & Wright in their phipm algorithm.

phiv! already has Er1 implemented and it is used by phiv_timestep!. The interface is a bit ugly at the moment: since to get Er1 estimate for phi_k you need phi_(k+1), the way I went is actually computing the error estimate of the second-to-last phi using the last term (so, as an example, if you want to compute phi_3(tA)b you need to actually call phiv! with k=4; this is really inconvenient and I may rewrite it in the future). But nevertheless it is present, so creating a termination condition using Er1 estimate is very much doable.

The problem I have is Er2 estimate, which the current expv! with Saad is using. Replacing exp with phi_1 is... kinda ok? But we certainly cannot do the same for phi_1's error estimate, since phi_1 and phi_2 don't even converge to the same value. Something has to be done to generalize Er2, and I'm not sure how.

Error: exponential! and exp_generic perform scalar indexing on gpu

Hey.

This happened as I tried to use the package for GPU-based matrix exponentiation (Base.exp can't do it). In constrast, expv works well. Though my julia-fu is not quite there yet, I would be happy to try and get this working if given some directions.

using CUDA, Flux, ExponentialUtilities

CUDA.allowscalar(false)

M = rand(Float32,10,10) |> gpu
v = rand(10) |> gpu 


ExponentialUtilities.exponential!(M) #ERROR: Scalar indexing is disallowed.
ExponentialUtilities.exp_generic(M) #ERROR: Scalar indexing is disallowed.
ExponentialUtilities.expv(1f0,M,v) #Works

The error message:

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore C:\Users\55619\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:103
  [3] getindex
    @ C:\Users\55619\.julia\packages\GPUArrays\t0LfC\src\host\indexing.jl:9 [inlined]
  [4] iterate
    @ .\abstractarray.jl:1220 [inlined]
  [5] iterate
    @ .\abstractarray.jl:1218 [inlined]
  [6] chkfinite(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra.LAPACK C:\Users\55619\AppData\Local\Programs\Julia-1.9.0\share\julia\stdlib\v1.9\LinearAlgebra\src\lapack.jl:84
  [7] gebal_noalloc!(job::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, scale::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ ExponentialUtilities C:\Users\55619\.julia\packages\ExponentialUtilities\x24Py\src\exp_noalloc.jl:56
  [8] exponential!(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, method::ExpMethodHigham2005, _cache::Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})
    @ ExponentialUtilities C:\Users\55619\.julia\packages\ExponentialUtilities\x24Py\src\exp_noalloc.jl:88
  [9] exponential!
    @ C:\Users\55619\.julia\packages\ExponentialUtilities\x24Py\src\exp_noalloc.jl:81 [inlined]
 [10] exponential!(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ ExponentialUtilities C:\Users\55619\.julia\packages\ExponentialUtilities\x24Py\src\exp.jl:15
 [11] top-level scope
    @ Untitled-2:10

Occasional LAPACKException(22)

Describe the bug 🐞

LAPACKException(22) appears. I do not know what it is.

Expected behavior

Error does not arise and evaluate the codes without errors.

Minimal Reproducible Example 👇

Without MRE( I can not symplify the codes. Sorry)

Error & Stacktrace ⚠️

ERROR: LoadError: LAPACKException(22)
Stacktrace:
  [1] chklapackerror
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:40 [inlined]
  [2] stegr!(jobz::Char, range::Char, dv::Vector{Float64}, ev::Vector{Float64}, vl::Float64, vu::Float64, il::Int64, iu::Int64)
    @ LinearAlgebra.LAPACK ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:3999
  [3] stegr!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:4066 [inlined]
  [4] eigen!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/tridiag.jl:252 [inlined]
  [5] expv!(w::Vector{ComplexF64}, t::ComplexF64, Ks::KrylovSubspace{ComplexF64, Float64, Float64, Matrix{ComplexF64}, Matrix{Float64}}; cache::Nothing, expmethod::ExpMethodHigham2005Base)
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:122
  [6] expv!
    @ ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:104 [inlined]
  [7] time_exe!(popu_g_e::Matrix{Float64}, psi0::Vector{ComplexF64}, H0_s::SparseMatrixCSC{Float64, Int64}, H1_s::SparseMatrixCSC{ComplexF64, Int64}, epsilont::Vector{Float64}, H_s::SparseMatrixCSC{ComplexF64, Int64}, Ks::KrylovSubspace{ComplexF64, Float64, Float64, Matrix{ComplexF64}, Matrix{Float64}})
    @ Main /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/src/1-constants-and-preparation.jl:68
  [8] tspan!(psi0::Vector{ComplexF64}, H0_s::SparseMatrixCSC{Float64, Int64}, H1_s::SparseMatrixCSC{ComplexF64, Int64}, epsilont::Vector{Float64})
    @ Main /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/src/1-constants-and-preparation.jl:79
  [9] macro expansion
    @ ./timing.jl:279 [inlined]
 [10] (::var"#4#6")()
    @ Main /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/main6-perf7.jl:41
 [11] (::Base.RedirectStdStream)(thunk::var"#4#6", stream::IOStream)
    @ Base ./stream.jl:1429
 [12] #3
    @ /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/main6-perf7.jl:40 [inlined]
 [13] open(::var"#3#5", ::String, ::Vararg{String}; kwargs::@Kwargs{})
    @ Base ./io.jl:396
 [14] open(::Function, ::String, ::String)
    @ Base ./io.jl:393
 [15] top-level scope
    @ /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/main6-perf7.jl:39
in expression starting at /media/sda1/sda1/H-2+n-weak_field/main-1s2p/tau50-scan_theta/tf0-dt0.02-w0.375-tau50-th0.1-step8/main6-perf7.jl:39

Environment (please complete the following information):

Output of using Pkg; Pkg.status()

Status 
 `~/.julia/environments/v1.10/Project.toml` 
  [6e4b80f9] BenchmarkTools v1.5.0 
  [d4d017d3] ExponentialUtilities v1.26.1

Output of using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)

Status `~/.julia/environments/v1.10/Manifest.toml` 
  [79e6a3ab] Adapt v4.0.4 
  [4fba245c] ArrayInterface v7.9.0 
  [6e4b80f9] BenchmarkTools v1.5.0 
  [d4d017d3] ExponentialUtilities v1.26.1 
  [46192b85] GPUArraysCore v0.1.6 
  [c145ed77] GenericSchur v0.5.4 
  [682c06a0] JSON v0.21.4 
  [69de0a69] Parsers v2.8.1 
  
 [aea7be01]  
PrecompileTools v1.2.1 
   
[21216c6a]  
Preferences v1.4.3 
   
[ae029012] Requires v1.3.0 
   
[56f22d72] Artifacts 
  
  
  
[ade2ca70]  
Dates 
   
[8f399da3] Libdl 
  
   
[37e2e46d]  
LinearAlgebra 
  
  
  
[56ddb016]  
Logging 
   
[a63ad114]  
Mmap 
  
  
[de0858da] Printf 
  
  
  
[9abbd945] Profile 
  
  
  
[9a3f8284] Random 
  
   
[ea8e919c]  
SHA v0.7.0 
  
  
[9e88b42a]  
Serialization 
  
  
[2f01184e]  
SparseArrays v1.10.0 
  
  
[10745b16]  
Statistics v1.10.0 
  
  
[4607b0f0] SuiteSparse 
  
  
 [fa267f1f]  
TOML v1.0.3 
  [cf7118a7] UUIDs 
  
   
[4ec0a83e] Unicode 
  
   
[e66e0078]  
CompilerSupportLibraries_jll v1.1.0+0 
  
  [4536629a]  
OpenBLAS_jll v0.3.23+4 
  [bea87d4a]  
SuiteSparse_jll v7.2.1+1 
  [8e850b90]  
libblastrampoline_jll v5.8.0+1

Output of versioninfo()

Julia Version 1.10.2 
  
Commit bd47eca2c8a (2024-03-01 10:14 UTC) 
  
Build Info: 
  
   
Official https://julialang.org/ release 
Platform Info: 
  
  OS:  
Linux 
 ( 
x86_64-linux-gnu 
) 
  
  CPU:  
20 
 × 12th Gen Intel(R) Core(TM) i7-12700 
  WORD_SIZE: 64 
  LIBM: libopenlibm 
  LLVM: libLLVM- 
15 
.0.7 (ORCJIT, alderlake) 
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores) 
Environment: 
  
  LD_LIBRARY_PATH = /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/itac/2021.8.0/slib:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp 
  JULIA_PKG_SERVER = https://mirrors.tuna.tsinghua.edu.cn/julia

Additional context

When I run the codes firstly:

open( "./data/out.log", "a") do io
    redirect_stdout(io) do
        @time global popu_g_e = tspan!(psi0, H0_s, H1_s, epsilont)
    end
end

The error will arises. But if I run the codes above secondly, the error will dispear and give me the right results. So now the work around method is to run the codes above two times...

I searched the related error on the google. I only got one item: @Jutho/KrylovKit.jl#57

Higher allocations using `MMatrix` and `exponential!`

The snippet below shows that for small matrices using MMatrix as argument of exponential! is faster than using a regular Matrix. However the MMatrix allocates more. Is this the expected behaviour?

using StaticArrays, ExponentialUtilities
import ExponentialUtilities.alloc_mem

method=ExpMethodHigham2005();

A = [3. 2; 3 4.];
cacheA = alloc_mem(A,method);
println("Matrix");
@time exponential!(A, method,cacheA);

B  = @MMatrix [1 2; 3 4.];
cacheB = alloc_mem(B,method);
println("MMatrix");
@time exponential!(B, method,cacheB);

Matrix
0.000037 seconds (1 allocation: 80 bytes)
MMatrix
0.000006 seconds (6 allocations: 416 bytes)

generic_exp fails in zero

julia> ExponentialUtilities.exp_generic(0.0)
ERROR: InexactError: trunc(Int64, -Inf)
Stacktrace:
 [1] trunc at ./float.jl:703 [inlined]
 [2] ceil at ./float.jl:365 [inlined]
 [3] exp_generic(::Float64, ::Val{13}) at /Users/andreasnoack/.julia/dev/ExponentialUtilities/src/exp.jl:119
 [4] exp_generic(::Float64) at /Users/andreasnoack/.julia/dev/ExponentialUtilities/src/exp.jl:118
 [5] top-level scope at REPL[64]:1

Classical Gram-Schmidt with DGKS re-orthogonalization criterion

In Gram-Schmidt iterations, re-orthogonalization is often required to enforce the obtained spanning space is correct up to some precision in many practical applications.

https://www.cs.cornell.edu/~bindel/class/cs6210-f16/lec/2016-11-16.pdf

I didn't find such procedure in this repo yet. Is it because it is not needed in implementing expmv, or is it simply not implemented yet or have I missed it? I am happy to submit a PR if re-orthogonalization is really needed.

Generic mutating exp!

The current implementation (as well as the one in LinearAlgebra) calls the balancing function

ExponentialUtilities.jl/src/exp.jl

Line 24 in dc1e104

ilo, ihi, scale = LAPACK.gebal!('B', A) # modifies A

which we don't have a generic implementation of. It would be great to have one. It shouldn't be a difficult function to implement but might be worth noticing that there is already one available here https://github.com/RalphAS/GenericSchur.jl/blob/master/src/balance.jl

Generalize faster exp_pade_p for all ArrayInterface.ismutable

See #55

KIOPS

See SciML/OrdinaryDiffEq.jl#243

Failure in precompiling in OrdinaryDiffEq

Hi, I just updated the packages I have installed in julia.
After that, I import OrdinaryDiffEq and the following errors pop out

ERROR: LoadError: ArgumentError: Package ExponentialUtilities does not have ArrayInterfaceGPUArrays in its dependencies:
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have ExponentialUtilities checked out for development and have
  added ArrayInterfaceGPUArrays as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with ExponentialUtilities

I have already tried Pkg.instantiate() and Pkg.resolve().
But it doesn't work

My package version info:
julia v1.8.0
OrdinaryDiffEq v6.31.2
ExponentialUtilities v1.22.0
ArrayInterfaceGPUArrays v0.2.2

Other algorithms

Is this package the place where we'd like to put other algorithms as well? Such as

Chebyshev (requires prior knowledge of spectral range of the operator)
Padé (explicit & implicit Euler, Crank–Nicolson).
Spectral (when the operator is diagonal in the basis chosen, or even if you have precomputed the diagonalization)

I imagine being able to do the following.

CrankNicolson{T} where T = Padé{1,1,T}

A = SymTridiagonal(...) # or whatever
# For a time-independent, this would precompute the factorizations necessary, that is I - A/2
# This requires fixed time-step, though
Ae = CrankNicolson(dt*A)

expv!(w, Ae, b) # Amounts to w = (I-A/2)\(I+A/2)*b

# Alternative interface, for time-dependent operators/adaptive time-steps, which cannot reuse previous factorizations
expv!(w, dt, A, P)

[Feature request] `exp_timestep` for Schordinger picture?

Is your feature request related to a problem? Please describe.

I want to evolve a quantum state by time-dependependent Schordinger equation in Schordinger picture:

$$ H(t)=H_0+H_s(t) $$

I find there is a function exp_timestep(ts,A,b[;adaptive,tol,kwargs...]) -> U :

In Schordinger picture the tA above shoud be A0 + tA. The A0 is time-independent. I hope there is a chance to develop another function to support it. Or should I give out more details?

Thx for this great Pkg anyway!

Drop `ishermitian` and `opnorm` requirement for custom operator types.

Currently to use a matrix-free operator we need to implement the LinearAlgebra.ishermitian and LinearAlgebra.opnorm (or provide a custom norm function). These slows down rapid prototyping and for some sparse types (e.g. SymTridiagonal) the default methods are not very efficient.

Since only scalar values are really needed, we should change the methods to not accept the functions themselves but their values with defaults (as suggested in #1).

For rapid prototyping & interfacing between packages, the end goal should be that only LinearAlgebra.mul! and Base.size needs to be implemented.

mode=:error_estimate gives wrong results for long times

Describe the bug 🐞

expv(t, A, b, mode=:error_estimate) gives vastly different results for longer time-steps.

Expected behavior

I would expect error_estimate and happy_breakdown results to at least have the same magnitudes

Minimal Reproducible Example 👇

A = I - sprand(100,100,.1) * .1
b = rand(100)
@assert expv(10, A,b) ≈ expv(10, A,b ,mode=:error_estimate)

Error & Stacktrace ⚠️

julia>  expv(10, A,b)
100-element Vector{Float64}:
  19372.458459100264
 -57030.13477800453
   3852.973450336655
  -7684.450057227177

julia>  expv(10, A,b ,mode=:error_estimate)
100-element Vector{Float64}:
 -3.1939130045290945e15
  4.314219301303509e15
 -1.6738861310296682e15
 -4.276422508317541e15

Environment (please complete the following information):

Output of using Pkg; Pkg.status()

[d4d017d3] ExponentialUtilities v1.25.0

Output of versioninfo()

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7502P 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 11 on 64 virtual cores
Environment:
  JULIA_HISTORY = ./.history.jl_

More matrix exponential methods

Wish-list / todo-list for more matrix exponential methods

An implementation of A. H. Al-Mohy and N. J. Higham, A new scaling and squaring algorithm
for the matrix exponential, SIAM J. Matrix Anal. Appl., 31(3), (2009) This is basically the same as Higham 2005, but with a more accurate backward error estimate which in practice leads to less multiplications. The method behind expm in recent versions of matlab.
expmpoly: http://personales.upv.es/~jorsasma/software/expmpol.m described in Boosting the computation of the matrix exponential, Appl. Math. Comput., 2018 Requires only multiplications - no inverse. Main computation code can be generated with graph_exp_sid in GraphMatFun.jl.
ExpMethodArbitraryPrecision. Arbitrary precision scaling-and-squaring: https://epubs.siam.org/doi/10.1137/18M1228876?mobileUi=0

`exp_generic` mutates

julia> A = rand(4,4).*2 .- 1
4×4 Matrix{Float64}:
  0.610162  -0.729739   0.353308    0.580706
  0.13283    0.81543   -0.101152   -0.42284
  0.102145  -0.632171  -0.0403294  -0.676109
 -0.589665  -0.525069   0.908199    0.0735095

julia> ExponentialUtilities.exp_generic(A)
4×4 Matrix{Float64}:
  1.49472   -1.9331     0.852891   0.850057
  0.438917   2.38049   -0.319651  -0.511578
  0.2644    -0.824442   0.801718  -0.420251
 -0.764627  -0.76571    0.730733   0.736535

julia> A
4×4 Matrix{Float64}:
  1.49472   -1.9331     0.852891   0.850057
  0.438917   2.38049   -0.319651  -0.511578
  0.2644    -0.824442   0.801718  -0.420251
 -0.764627  -0.76571    0.730733   0.736535

I know it is deprecated

julia> ExponentialUtilities.exp_generic(A)
ERROR: `exp_generic` is deprecated, use `exponential!` instead.
Stacktrace:
 [1] depwarn(msg::String, funcsym::Symbol; force::Bool)
   @ Base ./deprecated.jl:124
 [2] depwarn(msg::String, funcsym::Symbol)
   @ Base ./deprecated.jl:121
 [3] exp_generic(args::Matrix{Float64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ExponentialUtilities ./deprecated.jl:115
 [4] exp_generic(args::Matrix{Float64})
   @ ExponentialUtilities ./deprecated.jl:113
 [5] top-level scope
   @ REPL[3]:1

but perhaps it's worth defining an exponential(A) = exponential!(copy(A)), and either removing exp_generic or forwarding it to exponential instead?

Long compile times

We're seeing very long compile times for ExponentialUtilities over at ClimaAtmos.jl:

julia> @time_imports using ClimaAtmos
[ Info: Precompiling ClimaAtmos [b2c96348-7fb7-4fe0-8da9-78d88439e717]
      0.0 ms  IfElse
      0.1 ms  BitTwiddlingConvenienceFunctions
      0.1 ms  CommonSolve
      0.1 ms  CustomUnitRanges
      0.1 ms  DataValueInterfaces
      0.1 ms  GilbertCurves
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  JuliaNVTXCallbacks_jll
      0.1 ms  NVTX_jll
      0.1 ms  PrecompileTools
      0.1 ms  Reexport
      0.1 ms  SIMDTypes
      0.1 ms  SimpleUnPack
      0.1 ms  SnoopPrecompile
      0.1 ms  TableTraits
      0.1 ms  Tricks
      0.1 ms  TruncatedStacktraces
      0.1 ms  UnPack
      0.1 ms  ZygoteRules
      0.2 ms  AxisAlgorithms
      0.2 ms  CubedSphere
      0.2 ms  Elliptic
      0.2 ms  EnumX
      0.2 ms  ExprTools
      0.2 ms  JLLWrappers
      0.2 ms  MuladdMacro
      0.2 ms  NaNMath
      0.2 ms  Requires
      0.2 ms  StatsAPI
      0.2 ms  TensorCore
      0.2 ms  TextWrap
      0.3 ms  ArtifactWrappers
      0.3 ms  CommonSubexpressions
      0.3 ms  FastClosures
      0.3 ms  GaussQuadrature
      0.3 ms  Parameters
      0.3 ms  SortingAlgorithms
      0.4 ms  FastBroadcast
      0.4 ms  MPIPreferences
      0.4 ms  OpenSpecFun_jll
      0.4 ms  ProgressBars
      0.5 ms  ChangesOfVariables
      0.5 ms  DiffRules
      0.5 ms  Inflate
      0.5 ms  LogExpFunctions
      0.5 ms  RuntimeGeneratedFunctions
      0.6 ms  Compat
      0.6 ms  SymbolicIndexingInterface
      0.7 ms  FastGaussQuadrature
      0.7 ms  FFTW_jll
      0.7 ms  FunctionWrappersWrappers
      0.8 ms  AtmosphericProfilesLibrary
      0.8 ms  ConstructionBase
      0.8 ms  HDF5_jll
      0.8 ms  HypergeometricFunctions
      0.8 ms  Rmath_jll
      0.8 ms  SciMLNLSolve
      0.9 ms  DataAPI
      0.9 ms  Dierckx
      0.9 ms  InverseFunctions
      0.9 ms  OpenLibm_jll
      1.0 ms  Zlib_jll
      1.1 ms  ClimaComms
      1.1 ms  VertexSafeGraphs
      1.2 ms  CLIMAParameters
      1.2 ms  DensityInterface
      1.2 ms  OpenSSL_jll
      1.3 ms  MosaicViews
      1.3 ms  Polyester
      1.5 ms  TerminalLoggers
      1.6 ms  ArgParse
      1.6 ms  StaticArraysCore
      1.7 ms  DiffResults
      1.8 ms  Libiconv_jll
      1.8 ms  NetCDF_jll
      1.8 ms  UnsafeAtomicsLLVM
      1.9 ms  Calculus
      1.9 ms  libblastrampoline_jll
      2.0 ms  ManualMemory
      2.0 ms  Random123
      2.1 ms  CEnum
      2.1 ms  SimpleTraits
      2.2 ms  GPUArraysCore
      2.3 ms  TranscodingStreams
      2.3 ms  XML2_jll
      2.4 ms  Graphics
      2.4 ms  QuadGK
      2.5 ms  DocStringExtensions 50.76% compilation time
      2.7 ms  LLVMExtra_jll 45.65% compilation time
      2.8 ms  NLsolve
      2.8 ms  UnsafeAtomics
      2.9 ms  RootSolvers
      3.0 ms  CloseOpenIntervals
      3.0 ms  MappedArrays
      3.1 ms  CommonDataModel
      3.2 ms  StackViews
      3.3 ms  PaddedViews
      3.4 ms  KernelAbstractions
      3.6 ms  CloudMicrophysics
      3.6 ms  LayoutPointers
      3.8 ms  SLEEFPirates
      4.0 ms  MPICH_jll
      4.4 ms  BFloat16s
      4.7 ms  Thermodynamics
      4.9 ms  Sparspak
      5.0 ms  StatsFuns
      5.1 ms  ArnoldiMethod
      5.2 ms  IrrationalConstants
      5.5 ms  ProgressLogging
      5.5 ms  YAML
      5.7 ms  ComputationalResources
      6.0 ms  CompilerSupportLibraries_jll
      6.1 ms  StringEncodings
      6.3 ms  Distances
      6.3 ms  NLSolversBase
      6.3 ms  TriangularSolve 73.98% compilation time
      6.7 ms  WoodburyMatrices
      6.8 ms  LineSearches
      6.9 ms  AbstractFFTs
      6.9 ms  DiffEqCallbacks
      7.0 ms  LeftChildRightSiblingTrees
      7.1 ms  Missings
      7.4 ms  OrderedCollections
      8.0 ms  ArrayInterfaceCore
      8.0 ms  PreallocationTools 44.42% compilation time (100% recompilation)
      8.5 ms  LDLFactorizations
     10.2 ms  FiniteDiff 35.61% compilation time (100% recompilation)
     10.5 ms  FunctionWrappers
     11.3 ms  FastLapackInterface
     11.7 ms  Preferences
     12.2 ms  SpecialFunctions
     13.3 ms  PolyesterWeave 60.98% compilation time
     13.7 ms  CatIndices
     14.3 ms  FFTViews
     14.4 ms  TiledIteration
     14.6 ms  RandomNumbers 26.08% compilation time
     15.0 ms  SparseDiffTools 28.54% compilation time (100% recompilation)
     15.4 ms  PkgVersion
     15.7 ms  RRTMGP
     15.8 ms  Atomix
     17.8 ms  SurfaceFluxes
     18.1 ms  Krylov
     18.5 ms  IntervalSets
     18.5 ms  StatsBase
     20.3 ms  AMD 88.17% compilation time
     21.1 ms  Tables
     21.3 ms  CFTime 27.14% compilation time
     21.5 ms  Lazy
     21.6 ms  CPUSummary 14.48% compilation time
     23.4 ms  IterativeSolvers
     24.3 ms  GenericSchur
     25.7 ms  LambertW 52.18% compilation time
     26.3 ms  KLU
     28.6 ms  NVTX 80.48% compilation time
     30.2 ms  ClimaTimeSteppers
     30.9 ms  PDMats
     32.1 ms  Adapt 78.04% compilation time (5% recompilation)
     32.2 ms  ThreadingUtilities 68.93% compilation time
     36.6 ms  Interpolations 9.60% compilation time (100% recompilation)
     36.8 ms  RecursiveArrayTools 17.02% compilation time (100% recompilation)
     37.7 ms  Rmath 83.77% compilation time
     40.2 ms  AbstractTrees
     40.6 ms  MacroTools
     42.2 ms  Setfield
     42.9 ms  Static
     43.2 ms  StaticArrayInterface 38.71% compilation time
     43.5 ms  TimerOutputs 14.88% compilation time
     52.3 ms  Graphs
     53.8 ms  ChainRulesCore
     54.2 ms  LinearOperators
     55.0 ms  ColorTypes 20.95% compilation time
     57.6 ms  DiffEqBase 35.07% compilation time
     62.3 ms  ArrayInterface 69.37% compilation time (33% recompilation)
     63.4 ms  RecipesBase
     64.2 ms  DataStructures
     78.7 ms  DualNumbers
     83.1 ms  FixedPointNumbers
     89.4 ms  LLVM 37.41% compilation time (100% recompilation)
     96.4 ms  ForwardDiff
     98.8 ms  TaylorSeries 3.84% compilation time
    118.4 ms  Dierckx_jll 98.64% compilation time (100% recompilation)
    140.8 ms  BlockArrays
    142.2 ms  Colors
    165.3 ms  FillArrays
    177.3 ms  SciMLOperators
    194.5 ms  StrideArraysCore 1.13% compilation time
    195.9 ms  Distributions
    218.5 ms  OffsetArrays
    220.3 ms  NonlinearSolve
    226.7 ms  Ratios 91.79% compilation time (97% recompilation)
    235.2 ms  SciMLBase 1.95% compilation time
    244.1 ms  GPUCompiler 1.20% compilation time
    244.2 ms  FFTW 4.95% compilation time (100% recompilation)
    251.5 ms  LoopVectorization
    251.5 ms  RecursiveFactorization
    313.5 ms  KrylovKit 2.10% compilation time (100% recompilation)
    340.5 ms  SimpleNonlinearSolve 1.01% compilation time
    394.0 ms  NCDatasets 9.35% compilation time (100% recompilation)
    396.3 ms  HostCPUFeatures 9.16% compilation time (100% recompilation)
    407.9 ms  ClimaCore
    422.1 ms  StaticArrays
    534.5 ms  GPUArrays
    607.3 ms  ClimaAtmos
    649.1 ms  VectorizationBase
    650.2 ms  ColorVectorSpace 0.45% compilation time
    703.3 ms  HDF5 52.74% compilation time (87% recompilation)
    741.8 ms  MPI 92.03% compilation time (89% recompilation)
    755.7 ms  ImageBase
    763.1 ms  LinearSolve 0.40% compilation time (100% recompilation)
    857.7 ms  ImageFiltering 1.60% compilation time
    917.7 ms  ArrayLayouts
   1016.8 ms  CUDA 0.54% compilation time
   1022.6 ms  JLD2 1.06% compilation time (36% recompilation)
   1023.3 ms  FileIO 0.72% compilation time (100% recompilation)
   2690.1 ms  ImageCore
   2809.6 ms  OrdinaryDiffEq
   3991.6 ms  Insolation 90.09% compilation time (74% recompilation)
   5229.1 ms  ExponentialUtilities

Title credit to @trontrytel

To reproduce:

git clone https://github.com/CliMA/ClimaAtmos.jl
cd ClimaAtmos.jl
julia --project
@time_imports using ClimaAtmos

Add benchmarks

From slack:

we should post the benchmarks, but essentially if you want to compute the matrix exponential exp(t*A) for a large t, timestepping it can help improve the accuracy and calculation speed.

IIRC, using timestepping is almost always the right thing to do. @MSeeker1340 we should take the benchmarking code that was in the PRs and formalize those benchmarks so that we can easily point to the numerical accuracy and timings here.

The relevant PR is here: SciML/OrdinaryDiffEq.jl#372.

@ChrisRackauckas what's the standard practice of benchmarking a Julia repo? I'm thinking about adding a benchmark script (notebook?) to test and skip it in regular unit tests. The results will then be recorded in a separate markdown file.

expv does not work when used in NonlinearSolve.jl equations

Describe the bug 🐞

Solver fails when expv is in the function. exponential! kinda works.

Expected behavior

Optimize. See https://discourse.julialang.org/t/how-can-i-use-nonlinearsolve-jl-with-matrix-functions/112379

Minimal Reproducible Example 👇

using NonlinearSolve
using LinearAlgebra
using ExponentialUtilities

dim = 2
function f(u, p)
    v = rand(2)
    return expv(1, u, v) - v*2
end

u0 = rand(dim, dim) * 0.01
p = nothing
prob = NonlinearProblem(f, u0, p)
sol = solve(prob)
display(sol)
err = sol - zeros(dim, dim)
display(err)

Error & Stacktrace ⚠️

ERROR: MethodError: no method matching exponential!(::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 4}}, ::ExpMethodHigham2005Base)

Closest candidates are:
  exponential!(::StridedMatrix{T}, ::ExpMethodHigham2005Base, ::Any) where T<:Union{Float32, Float64, ComplexF64, ComplexF32}
   @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/exp_baseexp.jl:23
  exponential!(::StridedMatrix{T}, ::ExpMethodHigham2005Base) where T<:Union{Float32, Float64, ComplexF64, ComplexF32}
   @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/exp_baseexp.jl:23
  exponential!(::Any, ::ExpMethodNative, ::Any)
   @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/exp.jl:70
  ...

Stacktrace:
  [1] expv!(w::Vector{…}, t::Int64, Ks::KrylovSubspace{…}; cache::Nothing, expmethod::ExpMethodHigham2005Base)
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:95
  [2] _expv_hb(t::Int64, A::Matrix{…}, b::Vector{…}; expmethod::ExpMethodHigham2005Base, cache::Nothing, kwargs_arnoldi::@Kwargs{})
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:51
  [3] _expv_hb(t::Int64, A::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 4}}, b::Vector{Float64})
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:45
  [4] expv(t::Int64, A::Matrix{ForwardDiff.Dual{…}}, b::Vector{Float64}; mode::Symbol, kwargs::@Kwargs{})
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:38
  [5] expv(t::Int64, A::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 4}}, b::Vector{Float64})
    @ ExponentialUtilities ~/.julia/packages/ExponentialUtilities/xLH9y/src/krylov_phiv.jl:35
  [6] f(u::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{NonlinearSolve.NonlinearSolveTag, Float64}, Float64, 4}}, p::Nothing)
    @ Main ./Untitled-1:8
  [7] (::NonlinearFunction{…})(::Matrix{…}, ::Vararg{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/QSc1r/src/scimlfunctions.jl:2197
  [8] (::SciMLBase.JacobianWrapper{false, NonlinearFunction{…}, Nothing})(u::Matrix{ForwardDiff.Dual{…}})
    @ SciMLBase ~/.julia/packages/SciMLBase/QSc1r/src/function_wrappers.jl:97
  [9] vector_mode_dual_eval!(f::SciMLBase.JacobianWrapper{…}, cfg::ForwardDiff.JacobianConfig{…}, x::Matrix{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/apiutils.jl:24
 [10] vector_mode_jacobian!(result::Matrix{…}, f::SciMLBase.JacobianWrapper{…}, x::Matrix{…}, cfg::ForwardDiff.JacobianConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:145
 [11] jacobian!(result::Matrix{…}, f::Function, x::Matrix{…}, cfg::ForwardDiff.JacobianConfig{…}, ::Val{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:58
 [12] jacobian!(result::Matrix{…}, f::Function, x::Matrix{…}, cfg::ForwardDiff.JacobianConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:56
 [13] sparse_jacobian!(J::Matrix{…}, ::AutoForwardDiff{…}, cache::SparseDiffTools.ForwardDiffJacobianCache{…}, f::SciMLBase.JacobianWrapper{…}, x::Matrix{…})
    @ SparseDiffTools ~/.julia/packages/SparseDiffTools/qxnHN/src/highlevel/forward_mode.jl:60
 [14] (::NonlinearSolve.JacobianCache{…})(J::Matrix{…}, u::Matrix{…}, p::Nothing)
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/5yLII/src/internal/jacobian.jl:136
 [15] JacobianCache
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/internal/jacobian.jl:106 [inlined]
 [16] __step!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…}; recompute_jacobian::Nothing, kwargs::@Kwargs{})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/5yLII/src/core/generalized_first_order.jl:207
 [17] __step!
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/core/generalized_first_order.jl:203 [inlined]
 [18] #step!#210
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/core/generic.jl:55 [inlined]
 [19] step!
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/core/generic.jl:50 [inlined]
 [20] solve!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/5yLII/src/core/generic.jl:13
 [21] __solve(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/5yLII/src/core/generic.jl:4
 [22] __solve
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/core/generic.jl:1 [inlined]
 [23] macro expansion
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/default.jl:278 [inlined]
 [24] __solve(::NonlinearProblem{…}, ::NonlinearSolvePolyAlgorithm{…}; alias_u0::Bool, verbose::Bool, kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/5yLII/src/default.jl:248
 [25] __solve
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/default.jl:248 [inlined]
 [26] #__solve#329
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/default.jl:511 [inlined]
 [27] __solve
    @ ~/.julia/packages/NonlinearSolve/5yLII/src/default.jl:508 [inlined]
 [28] #__solve#61
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1371 [inlined]
 [29] __solve
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1364 [inlined]
 [30] #solve_call#34
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:609 [inlined]
 [31] solve_call
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:567 [inlined]
 [32] #solve_up#42
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1050 [inlined]
 [33] solve_up
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1044 [inlined]
 [34] #solve#41
    @ ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1038 [inlined]
 [35] solve(::NonlinearProblem{…})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/nc5nE/src/solve.jl:1028
 [36] top-level scope
    @ Untitled-1:14
Some type information was truncated. Use `show(err)` to see complete types.

Environment (please complete the following information):

Output of using Pkg; Pkg.status()

[667455a9] Cubature v1.5.1
  [31c24e10] Distributions v0.25.107
  [d4d017d3] ExponentialUtilities v1.26.1
  [f6369f11] ForwardDiff v0.10.36
⌃ [de52edbc] Integrals v4.3.0
⌃ [8913a72c] NonlinearSolve v3.8.3
⌃ [1dea7af3] OrdinaryDiffEq v6.70.1
⌃ [91a5bcdd] Plots v1.40.1
  [44d3d7a6] Weave v0.10.12

Output of using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)

⌃ [47edcb42] ADTypes v0.2.6
⌃ [7d9f7c33] Accessors v0.1.35
⌃ [79e6a3ab] Adapt v4.0.1
⌅ [ec485272] ArnoldiMethod v0.2.0
⌃ [4fba245c] ArrayInterface v7.7.0
⌃ [4c555306] ArrayLayouts v1.6.0
  [d1d4a3ce] BitFlags v0.1.8
  [62783981] BitTwiddlingConvenienceFunctions v0.1.5
  [2a0fbf3d] CPUSummary v0.2.4
  [49dc2e85] Calculus v0.5.1
  [fb6a15b2] CloseOpenIntervals v0.1.12
  [944b1d66] CodecZlib v0.7.4
  [35d6a980] ColorSchemes v3.24.0
  [3da002f7] ColorTypes v0.11.4
  [c3611d14] ColorVectorSpace v0.10.0
  [5ae59095] Colors v0.12.10
  [861a8166] Combinatorics v1.0.2
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.0
⌃ [34da2185] Compat v4.12.0
  [a33af91c] CompositionsBase v0.1.2
  [2569d6c7] ConcreteStructs v0.2.3
⌃ [f0e56b4a] ConcurrentUtilities v2.3.1
⌃ [187b0558] ConstructionBase v1.5.4
  [d38c429a] Contour v0.6.2
  [adafc99b] CpuId v0.3.1
  [667455a9] Cubature v1.5.1
  [9a962f9c] DataAPI v1.16.0
⌃ [864edb3b] DataStructures v0.18.16
  [e2d170a0] DataValueInterfaces v1.0.0
  [8bb1440f] DelimitedFiles v1.9.1
⌃ [2b5f629d] DiffEqBase v6.146.2
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [31c24e10] Distributions v0.25.107
  [ffbed154] DocStringExtensions v0.9.3
  [fa6b7ba4] DualNumbers v0.6.8
  [4e289a0a] EnumX v1.0.4
⌅ [f151be2c] EnzymeCore v0.6.5
  [460bff9d] ExceptionUnwrapping v0.1.10
  [d4d017d3] ExponentialUtilities v1.26.1
  [e2ba6199] ExprTools v0.1.10
  [c87230d0] FFMPEG v0.4.1
  [7034ab61] FastBroadcast v0.2.8
  [9aa1b823] FastClosures v0.3.2
⌃ [29a986be] FastLapackInterface v2.0.1
  [1a297f60] FillArrays v1.9.3
⌃ [6a86dc24] FiniteDiff v2.22.0
  [53c48c17] FixedPointNumbers v0.8.4
⌃ [59287772] Formatting v0.4.2
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
  [46192b85] GPUArraysCore v0.1.6
⌃ [28b8d3ca] GR v0.73.2
⌃ [c145ed77] GenericSchur v0.5.3
  [86223c79] Graphs v1.9.0
  [42e2da0e] Grisu v1.0.2
⌃ [19dc6840] HCubature v1.5.2
⌃ [cd3eb016] HTTP v1.10.1
  [eafb193a] Highlights v0.5.2
  [3e5b6fbb] HostCPUFeatures v0.1.16
  [34004b35] HypergeometricFunctions v0.3.23
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.4
  [18e54dd8] IntegerMathUtils v0.1.2
⌃ [de52edbc] Integrals v4.3.0
⌃ [3587e190] InverseFunctions v0.1.12
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [1019f520] JLFzf v0.1.7
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
⌅ [ef3ab10e] KLU v0.5.0
  [ba0b0d4f] Krylov v0.9.5
  [b964fa9f] LaTeXStrings v1.3.1
⌃ [23fbe1c1] Latexify v0.16.1
  [73f95e8e] LatticeRules v0.0.1
  [10f19ff3] LayoutPointers v0.1.15
  [50d2b5c4] Lazy v0.15.1
  [5078a376] LazyArrays v1.8.3
  [d3d80556] LineSearches v7.2.0
⌃ [7ed4a6bd] LinearSolve v2.23.2
⌃ [2ab3a3ac] LogExpFunctions v0.3.26
  [e6f89c97] LoggingExtras v1.0.3
⌃ [bdcacae8] LoopVectorization v0.12.166
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
  [a3b82374] MatrixFactorizations v2.1.0
⌃ [bb5d69b7] MaybeInplace v0.1.1
  [739be429] MbedTLS v1.1.9
  [442fdcdd] Measures v0.3.2
  [e1d29d7a] Missings v1.1.0
  [4886b29c] MonteCarloIntegration v0.2.0
  [46d2c3a1] MuladdMacro v0.2.4
  [ffc61752] Mustache v1.0.19
  [d41bc354] NLSolversBase v7.8.3
  [77ba4419] NaNMath v1.0.2
⌃ [8913a72c] NonlinearSolve v3.8.3
  [6fe1bfb0] OffsetArrays v1.13.0
⌃ [4d8831e6] OpenSSL v1.4.1
  [bac558e1] OrderedCollections v1.6.3
⌃ [1dea7af3] OrdinaryDiffEq v6.70.1
  [90014a1f] PDMats v0.11.31
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [b98c9c47] Pipe v1.3.0
  [ccf2f8ad] PlotThemes v3.1.0
⌃ [995b91a9] PlotUtils v1.4.0
⌃ [91a5bcdd] Plots v1.40.1
⌃ [f517fe37] Polyester v0.7.9
  [1d0040c9] PolyesterWeave v0.2.1
  [d236fae5] PreallocationTools v0.4.20
⌃ [aea7be01] PrecompileTools v1.2.0
⌃ [21216c6a] Preferences v1.4.1
⌃ [27ebfcd6] Primes v0.5.5
  [1fd47b50] QuadGK v2.9.4
  [8a4e6c94] QuasiMonteCarlo v0.3.3
  [3cdcf5f2] RecipesBase v1.3.4
  [01d81517] RecipesPipeline v0.6.12
⌃ [731186ca] RecursiveArrayTools v3.8.0
  [f2c3362d] RecursiveFactorization v0.2.21
  [189a3867] Reexport v1.2.2
  [05181044] RelocatableFolders v1.0.1
  [ae029012] Requires v1.3.0
  [79098fc4] Rmath v0.7.1
  [7e49a35a] RuntimeGeneratedFunctions v0.5.12
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.42
⌃ [0bca4576] SciMLBase v2.24.0
⌃ [c0aeaf25] SciMLOperators v0.3.7
  [6c6a2e73] Scratch v1.2.1
  [efcf1570] Setfield v1.1.1
  [992d4aef] Showoff v1.0.3
  [777ac1f9] SimpleBufferStream v1.1.0
⌃ [727e6d20] SimpleNonlinearSolve v1.4.1
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
  [ed01d8cd] Sobol v1.5.0
  [a2af1166] SortingAlgorithms v1.2.1
⌃ [47a9eef4] SparseDiffTools v2.16.0
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.3.1
  [aedffcd0] Static v0.8.10
  [0d7ed370] StaticArrayInterface v1.5.0
⌃ [90137ffa] StaticArrays v1.9.2
  [1e83bf80] StaticArraysCore v1.4.2
  [82ae8749] StatsAPI v1.7.0
  [2913bbd2] StatsBase v0.34.2
⌃ [4c63d2b9] StatsFuns v1.3.0
  [7792a7ef] StrideArraysCore v0.5.2
  [69024149] StringEncodings v0.3.7
⌃ [2efcf032] SymbolicIndexingInterface v0.3.5
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.11.1
  [62fd8b95] TensorCore v0.1.1
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.23
⌃ [3bb67fe8] TranscodingStreams v0.10.3
⌃ [d5829a12] TriangularSolve v0.1.20
  [410a4b4d] Tricks v0.1.8
  [781d530d] TruncatedStacktraces v1.4.0
  [5c2747f8] URIs v1.5.1
  [3a884ed6] UnPack v1.0.2
  [1cfade01] UnicodeFun v0.4.1
  [1986cc42] Unitful v1.19.0
  [45397f5d] UnitfulLatexify v1.6.3
  [41fe7b60] Unzip v0.2.0
  [3d5dd08c] VectorizationBase v0.21.65
  [19fa3120] VertexSafeGraphs v0.2.0
  [44d3d7a6] Weave v0.10.12
  [ddb6d928] YAML v0.4.9
  [6e34b625] Bzip2_jll v1.0.8+1
⌃ [83423d85] Cairo_jll v1.16.1+1
  [7bc98958] Cubature_jll v1.0.5+0
  [2702e6a9] EpollShim_jll v0.0.20230411+0
  [2e619515] Expat_jll v2.5.0+0
⌅ [b22a6f82] FFMPEG_jll v4.4.4+1
  [a3f928ae] Fontconfig_jll v2.13.93+0
  [d7e528f0] FreeType2_jll v2.13.1+0
  [559328eb] FriBidi_jll v1.0.10+0
  [0656b61e] GLFW_jll v3.3.9+0
⌅ [d2c73de3] GR_jll v0.73.2+0
  [78b55507] Gettext_jll v0.21.0+0
⌃ [7746bdde] Glib_jll v2.76.5+0
  [3b182d85] Graphite2_jll v1.3.14+0
  [2e76f6c2] HarfBuzz_jll v2.8.1+1
  [1d5cc7b8] IntelOpenMP_jll v2024.0.2+0
⌃ [aacddb02] JpegTurbo_jll v3.0.1+0
  [c1c5ebd0] LAME_jll v3.100.1+0
  [88015f11] LERC_jll v3.0.0+1
  [1d63c593] LLVMOpenMP_jll v15.0.7+0
  [dd4b983a] LZO_jll v2.10.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
  [d4300ac3] Libgcrypt_jll v1.8.7+0
  [7e76a0d4] Libglvnd_jll v1.6.0+0
  [7add5ba3] Libgpg_error_jll v1.42.0+0
  [94ce4f54] Libiconv_jll v1.17.0+0
⌃ [4b2f31a3] Libmount_jll v2.35.0+0
⌅ [89763e89] Libtiff_jll v4.5.1+1
⌃ [38a345b3] Libuuid_jll v2.36.0+0
  [856f044c] MKL_jll v2024.0.0+0
  [e7412a2a] Ogg_jll v1.3.5+1
⌃ [458c3c95] OpenSSL_jll v3.0.13+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [91d4177d] Opus_jll v1.3.2+0
  [30392449] Pixman_jll v0.42.2+0
  [c0090381] Qt6Base_jll v6.5.3+1
  [f50d1b31] Rmath_jll v0.4.0+0
  [a44049a8] Vulkan_Loader_jll v1.3.243+0
  [a2964d1f] Wayland_jll v1.21.0+1
  [2381bf8a] Wayland_protocols_jll v1.31.0+0
⌃ [02c8fc9c] XML2_jll v2.12.2+0
  [aed1982a] XSLT_jll v1.1.34+0
⌃ [ffd25f8a] XZ_jll v5.4.5+0
  [f67eecfb] Xorg_libICE_jll v1.0.10+1
  [c834827a] Xorg_libSM_jll v1.2.3+0
  [4f6342f7] Xorg_libX11_jll v1.8.6+0
  [0c0b7dd1] Xorg_libXau_jll v1.0.11+0
  [935fb764] Xorg_libXcursor_jll v1.2.0+4
  [a3789734] Xorg_libXdmcp_jll v1.1.4+0
  [1082639a] Xorg_libXext_jll v1.3.4+4
  [d091e8ba] Xorg_libXfixes_jll v5.0.3+4
  [a51aa0fd] Xorg_libXi_jll v1.7.10+4
  [d1454406] Xorg_libXinerama_jll v1.1.4+4
  [ec84b674] Xorg_libXrandr_jll v1.5.2+4
  [ea2f1a96] Xorg_libXrender_jll v0.9.10+4
  [14d82f49] Xorg_libpthread_stubs_jll v0.1.1+0
  [c7cfdc94] Xorg_libxcb_jll v1.15.0+0
  [cc61e674] Xorg_libxkbfile_jll v1.1.2+0
  [e920d4aa] Xorg_xcb_util_cursor_jll v0.1.4+0
  [12413925] Xorg_xcb_util_image_jll v0.4.0+1
  [2def613f] Xorg_xcb_util_jll v0.4.0+1
  [975044d2] Xorg_xcb_util_keysyms_jll v0.4.0+1
  [0d47668e] Xorg_xcb_util_renderutil_jll v0.3.9+1
  [c22f9ab0] Xorg_xcb_util_wm_jll v0.4.1+1
  [35661453] Xorg_xkbcomp_jll v1.4.6+0
  [33bec58e] Xorg_xkeyboard_config_jll v2.39.0+0
  [c5fb5394] Xorg_xtrans_jll v1.5.0+0
⌃ [3161d3a3] Zstd_jll v1.5.5+0
  [35ca27e7] eudev_jll v3.2.9+0
  [214eeab7] fzf_jll v0.43.0+0
  [1a1c6b14] gperf_jll v3.1.1+0
  [a4ae2306] libaom_jll v3.4.0+0
  [0ac62f75] libass_jll v0.15.1+0
  [2db6ffa8] libevdev_jll v1.11.0+0
  [f638f0a6] libfdk_aac_jll v2.0.2+0
  [36db933b] libinput_jll v1.18.0+0
⌃ [b53b4c65] libpng_jll v1.6.40+0
  [f27f6e37] libvorbis_jll v1.3.7+1
  [009596ad] mtdev_jll v1.1.6+0
  [1270edf5] x264_jll v2021.5.5+0
  [dfaa095f] x265_jll v3.5.0+0
  [d8fb68d0] xkbcommon_jll v1.4.1+1
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.0+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [efcefdf7] PCRE2_jll v10.42.0+1
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`

Output of versioninfo()

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code

Additional context

Add any other context about the problem here.

Improve docstrings

Such that makedocs warnonly = [:missing_docs, :docs_block] can be removed.

`phiv_timestep` performance problems for scalar equations

Describe the bug 🐞

I consider this a bug since the performance difference is so severe, but I could move this to a feature request if desired.

In general, functions like phiv_timestep and expv_timestep are very fast for moderately-sized matrices. However, I'm noticing orders of magnitude in degradation, both in performance and in allocations, when the matrices are size-1.

It is possible for applications to circumvent this by manually implementing the functionality for scalar equations, but it would clearly be more convenient if the performance of these methods for scalar problems could be improved.

Expected behavior

Size-1 equations should be as fast as, or even faster than, larger ones.

Minimal Reproducible Example 👇

using ExponentialUtilities
function time(N)
        A = randn(N, N)
        b = randn(N, 1)
        @time phiv_timestep(1, A, b);
end

time(1)   # Warm up
time(1)   # 3.109664 seconds (18.85 M allocations: 18.841 GiB, 7.80% gc time)
time(2)   # 0.003811 seconds (3.81 k allocations: 406.266 KiB)
time(100) # 0.002961 seconds (261 allocations: 543.359 KiB)

Environment (please complete the following information):

Output of using Pkg; Pkg.status()

[d4d017d3] ExponentialUtilities v1.25.0

Output of versioninfo()

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Enable preconditioning

Preconditioned Newton-Krylov solver seems provide better performance than Exponential-Krylov integrator without preconditioning. It might be a good idea to add preconditioning to expv using the SAI method.

[1] https://ris.utwente.nl/ws/portalfiles/portal/5092923/memo1992.pdf
[2] https://doi.org/10.1137/040605461

	"""
	phi!(out,A,k[;caches]) -> out

	Non-allocating version of `phi` for non-diagonal matrix inputs.
	"""

	copyto!(cache, @view(H[1:m, :]))
	if ishermitian(cache)
	# Optimize the case for symtridiagonal H
	F = eigen!(SymTridiagonal(cache))

	@generated function exp_pade_p(x, ::Val{k}, ::Val{m}) where {k, m}
	factorial = Base.factorial ∘ big
	p = map(Tuple(0:k)) do j
	num = factorial(k + m - j) * factorial(k)
	den = factorial(k + m) * factorial(k - j)*factorial(j)
	(float ∘ eltype)(x)(num // den) * (x <: Number ? 1 : I)
	end
	:(@evalpoly(x, $(p...)))
	end

	mul!(A2, A, A)
	@. U = C[2] * P
	@. V = C[1] * P
	for k in 1:(div(size(C, 1), 2) - 1)
	k2 = 2 * k
	mul!(temp, P, A2); P, temp = temp, P # equivalent to P *= A2
	@. U += C[k2 + 2] * P
	@. V += C[k2 + 1] * P
	end

sciml / exponentialutilities.jl Goto Github PK

exponentialutilities.jl's Introduction

ExponentialUtilities

Tutorials and Documentation

Example

exponentialutilities.jl's People

Contributors

Stargazers

Watchers

Forkers

exponentialutilities.jl's Issues

The reason

Potential solution

Applicability to non-hermitian operators?

Error estimates

Recommend Projects

Recommend Topics

Recommend Org