I'm currently trying to access the AD hessian for a very large model I have.
I've been investigating BridgeStan's AD Hessian for use in that large model, since I wasn't seeing any noticeable speed-up after compiling the model with `BRIDGESTAN_AD_HESSIAN=true` (see #52).
I decided to sketch out a simpler model with a non-trivial gradient and benchmark it, to see whether the compiler flag was doing anything at all. My benchmarking suggests that the AD Hessian computed with `BRIDGESTAN_AD_HESSIAN=true` is much *slower* than the default.
Here's the Julia code:
```julia
import Pkg; Pkg.activate(@__DIR__)
using BridgeStan
using JSON3
using BenchmarkTools
using Random
using Distributions
using LinearAlgebra

model_path = "gaussian.stan"
data_path = "gaussian-data.json"

# Make a data simulator to change the sample size
K = 10
N = 100
Random.seed!(1)
mu = randn(K)
sigma = rand(K)
draws = collect(eachrow(Matrix(rand(MvNormal(mu, Diagonal(sigma .^ 2)), N)')))
data = Dict("N" => N, "K" => K, "X" => draws)
open(data_path, "w") do io
    JSON3.write(io, data)
end

# Compile the BridgeStan model
lib_path = BridgeStan.compile_model(
    model_path;
    stanc_args = String["--O1"],
    # make_args = String["BRIDGESTAN_AD_HESSIAN=true"],
)

# Add data to the model
smb = StanModel(lib_path, data_path)

# Find out the number of unconstrained parameters
num_params = BridgeStan.param_unc_num(smb)

# Benchmark; interpolate the globals so BenchmarkTools measures the call
# itself rather than dynamic dispatch on non-const globals
theta = zeros(num_params)
bm = @benchmark BridgeStan.log_density_hessian($smb, $theta)
display(bm)
```
and the Stan model, stored as `gaussian.stan`:
```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
}
parameters {
  vector[K] mu;
  vector<lower=0>[K] Sigma;
  cholesky_factor_corr[K] Omega;
}
model {
  mu ~ normal(0, 1);
  Omega ~ lkj_corr_cholesky(1);
  Sigma ~ cauchy(0, 2.5);
  for (n in 1:N)
    X[n, :] ~ multi_normal_cholesky(rep_vector(0, K), diag_pre_multiply(Sigma, Omega));
}
```
The benchmarking is a little confusing, because I would have expected the AD Hessian to be much faster than finite differences. Perhaps the log density is simply very cheap to evaluate, so the finite-difference Hessian ends up quicker?
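For scale, here's a back-of-envelope count in plain Julia (no BridgeStan needed). The unconstrained dimension count assumes Stan's usual transforms, and the "2·D gradient evaluations" figure assumes the finite-difference Hessian is built from central differences of the gradient; both are my assumptions about the internals, not something I've verified:

```julia
# Unconstrained parameter count for this model, assuming Stan's usual
# transforms: K for mu, K for the log-transformed positive vector Sigma,
# and K*(K-1)/2 free entries for the Cholesky-factor correlation Omega.
K = 10
D = K + K + K * (K - 1) ÷ 2

# If the finite-difference Hessian uses central differences of the
# gradient (assumption), it costs roughly 2*D gradient evaluations.
fd_gradient_calls = 2 * D

println("D = ", D, ", ~", fd_gradient_calls, " gradient evaluations per FD Hessian")
```

So even at ~130 gradient calls per Hessian, the finite-difference version can still win if a single gradient is cheap enough.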
Anyway, here are the results from running it with `BRIDGESTAN_AD_HESSIAN=true`:
```
BenchmarkTools.Trial: 33 samples with 1 evaluation.
 Range (min … max):  134.849 ms … 164.949 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     153.700 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   151.588 ms ±   8.789 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃ ▃ ▃       ▃  ▃▃        ▃           █
  ▇▁▇▁▇▁█▁▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁█▇█▁▁▁▇▁▇▁▁▁▁▇▇▇▁█▁▁▁▁██▇▇▁█▁▇▁▇▇▇▁▁▁▇ ▁
  135 ms          Histogram: frequency by time          165 ms <

 Memory estimate: 34.33 KiB, allocs estimate: 6.
```
and without the flag, in the default mode:
```
BenchmarkTools.Trial: 194 samples with 1 evaluation.
 Range (min … max):  22.855 ms … 27.557 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     27.042 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.794 ms ±  1.804 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                      █▃▂
  ▃▃▆▅▆▅▅▄▂▂▃▁▂▁▁▂▁▂▂▂▁▂▁▁▂▂▁▂▁▁▁▁▂▁▁▂▂▁▁▁▁▁▁▃▁▁▁▁▁▁▃▃████▅▃▃ ▂
  22.9 ms         Histogram: frequency by time         27.5 ms <

 Memory estimate: 34.33 KiB, allocs estimate: 6.
```
Note that the `BRIDGESTAN_AD_HESSIAN=true` case takes nearly six times longer on average. Is this expected, or am I missing something?
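To put the two runs side by side (mean timings copied from the benchmark output above; the per-gradient figure assumes the finite-difference Hessian costs ~130 gradient calls, i.e. central differences over 65 unconstrained parameters, which is my guess at the implementation rather than something measured):

```julia
# Mean timings from the two benchmark runs above, in milliseconds.
ad_mean_ms = 151.588   # BRIDGESTAN_AD_HESSIAN=true
fd_mean_ms = 25.794    # default (finite differences)

# Slowdown of the AD Hessian relative to finite differences.
ratio = ad_mean_ms / fd_mean_ms

# Implied cost of a single gradient, IF the FD Hessian really does
# ~130 gradient calls (my assumption, not measured).
grad_ms = fd_mean_ms / 130

println("AD/FD ratio = ", round(ratio, digits = 1))
println("implied per-gradient cost ≈ ", round(grad_ms, digits = 2), " ms")
```

If that per-gradient estimate is in the right ballpark, the AD Hessian is costing the equivalent of several hundred gradient evaluations per call, which seems like a lot.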
Misc system notes:
- Pop!_OS 22.04
- CmdStan 2.31
- Julia 1.8