I wasn't sure if this should be an issue on CCL or here, but here's a quick timing exercise I've carried out on the latest version of CCL (well, actually the version in this PR LSSTDESC/CCL#614).
The script below generates the theoretical predictions for an LSST-like 3x2pt data vector, using Python's `time` module to time the different parts of the calculation. For this exercise I've taken 10 lens bins and 10 source bins, assumed that we want all possible auto- and cross-correlations, and that we compute each of them at 100 ell values between 10 and 3000 (the calculation doesn't depend much on the exact values anyway). In this case the results are (all numbers in seconds):
Timing summary:
- Cosmology and Pk: 1.669
- GC tracers: 0.098
- WL tracers: 0.366
- Cl gg: 0.930
- Cl gl: 1.345
- Cl ll: 0.707
--------------------------
Total: 5.115
so initializing CLASS's power spectrum takes around the same time as computing the C_ells (in fact, somewhat longer for the latter). Note that this is for a standard LCDM cosmology with no funny business (massive neutrinos etc.) that would make CLASS slower.
If we instead assume that we only need to evaluate the power spectra at 15 values of ell, then CLASS becomes the bottleneck (followed by the computation of the lensing kernels):
Timing summary:
- Cosmology and Pk: 1.706
- GC tracers: 0.096
- WL tracers: 0.354
- Cl gg: 0.139
- Cl gl: 0.212
- Cl ll: 0.101
--------------------------
Total: 2.607
If the answer is that we should speed this up, a quick exploration tells me that the time taken to compute the C_ells can't be reduced significantly simply by lowering the GSL integrator accuracy (I've only managed to reduce the C_ell time by a factor of ~1.5 by raising the tolerance by one order of magnitude), so we should actually look into other integration schemes.
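One reason the 15-ell scenario is interesting is that the C_ells are smooth enough in ell that a coarse grid plus interpolation can stand in for a dense evaluation. Here's a quick standalone sketch of that idea; it uses a hypothetical smooth mock spectrum (not an actual CCL output) and a cubic spline in log-log space over the same 15 log-spaced multipoles as above:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical smooth mock spectrum standing in for a real C_ell
# (illustration only -- not a CCL output).
def mock_cl(ell):
    return 1e-7 * (ell / 100.0) ** -1.2 / (1.0 + (ell / 1000.0) ** 2)

# Coarse grid: 15 log-spaced multipoles, as in the timing runs above.
l_coarse = np.logspace(np.log10(10), np.log10(3000), 15)

# Cubic spline in log(ell)-log(Cl), evaluated on a dense grid.
spline = CubicSpline(np.log(l_coarse), np.log(mock_cl(l_coarse)))
l_dense = np.logspace(np.log10(10), np.log10(3000), 100)
cl_interp = np.exp(spline(np.log(l_dense)))

# Fractional error of the interpolation vs. the exact mock spectrum;
# for a spectrum this smooth it comes out well below a percent.
max_frac_err = np.max(np.abs(cl_interp / mock_cl(l_dense) - 1.0))
```

Whether real 3x2pt spectra are smooth enough for this to hold at the required accuracy would of course need to be checked against the exact calculation.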
Anyway, sorry for the long issue. Here's the script I've used for this:
```python
import time

import numpy as np
from scipy.special import erf

import pyccl as ccl

# Prepare redshift distributions and bias
nbins = 10
z_edges = np.linspace(0, 2.5, nbins + 1)
z_sigmas = 0.05 * (1 + 0.5 * (z_edges[1:] + z_edges[:-1]))
numz = 512
zz = np.linspace(0, 3, numz)
nz_all = zz**2 * np.exp(-(zz / 0.8)**1.5)
nzs = 0.5 * (erf((z_edges[1:, None] - zz[None, :]) / np.sqrt(2 * z_sigmas[:, None]**2)) -
             erf((z_edges[:-1, None] - zz[None, :]) / np.sqrt(2 * z_sigmas[:, None]**2)))
nzs *= nz_all[None, :]
bz = 0.95 * (1 + zz)

start_all = time.time()

# Set up cosmology object and compute power spectrum
start = time.time()
cosmo = ccl.Cosmology(Omega_c=0.26066676,
                      Omega_b=0.048974682,
                      h=0.6766,
                      sigma8=0.8102,
                      n_s=0.9665)
s8 = ccl.sigma8(cosmo)  # make sure the power spectrum is computed within this timed block
time_cosmo = time.time() - start

# Create tracers
start = time.time()
gc_t = [ccl.NumberCountsTracer(cosmo, False, (zz, nz), (zz, bz)) for nz in nzs]
time_gc_tracers = time.time() - start
start = time.time()
wl_t = [ccl.WeakLensingTracer(cosmo, (zz, nz)) for nz in nzs]
time_wl_tracers = time.time() - start

# Compute power spectra (only the upper triangle of each symmetric block)
n_ell = 15
l_arr = np.logspace(np.log10(10), np.log10(3000), n_ell)
start = time.time()
cls = np.zeros([2 * nbins, 2 * nbins, n_ell])
for i1, t1 in enumerate(gc_t):
    for i2, t2 in enumerate(gc_t[i1:]):
        cls[i1, i1 + i2, :] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clgg = time.time() - start
start = time.time()
for i1, t1 in enumerate(gc_t):
    for i2, t2 in enumerate(wl_t):
        cls[i1, nbins + i2] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clgl = time.time() - start
start = time.time()
for i1, t1 in enumerate(wl_t):
    for i2, t2 in enumerate(wl_t[i1:]):
        cls[nbins + i1, nbins + i1 + i2] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clll = time.time() - start
time_all = time.time() - start_all

print("Timing summary:")
print(" - Cosmology and Pk: %.3f" % time_cosmo)
print(" - GC tracers: %.3f" % time_gc_tracers)
print(" - WL tracers: %.3f" % time_wl_tracers)
print(" - Cl gg: %.3f" % time_clgg)
print(" - Cl gl: %.3f" % time_clgl)
print(" - Cl ll: %.3f" % time_clll)
print("--------------------------")
print(" Total: %.3f" % time_all)
```
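Note that the loops in the script only fill the upper triangle of the symmetric gg and ll blocks, which is all that's needed for timing; if the full symmetric matrix were needed downstream, it could be recovered afterwards. A minimal standalone sketch, using a small mock array (hypothetical shapes, independent of the script above):

```python
import numpy as np

# Mock stack of spectra with only the upper triangle filled in,
# mimicking the i2 >= i1 loops in the script (shapes are arbitrary here).
nbins, n_ell = 4, 5
rng = np.random.default_rng(42)
cls = np.triu(np.ones((nbins, nbins)))[:, :, None] * rng.random((nbins, nbins, n_ell))

# Mirror the upper triangle into the lower one: C_ij(ell) = C_ji(ell).
i_low = np.tril_indices(nbins, k=-1)
cls[i_low] = cls.transpose(1, 0, 2)[i_low]
```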