I wasn't sure if this should be an issue on CCL or here, but here's a quick timing exercise I've carried out on the latest version of CCL (well, actually the version in this PR LSSTDESC/CCL#614).
The script below generates the theoretical predictions for an LSST-like 3x2pt data vector, using Python's `time` module to time the different parts of the calculation. For this exercise I've taken 10 lens bins and 10 source bins, assumed that we want all possible auto- and cross-correlations, and that we compute each of them at 100 ell values between 10 and 3000 (the calculation doesn't depend much on the exact values anyway). In this case the results are (all numbers in seconds):
Timing summary:
- Cosmology and Pk: 1.669
- GC tracers: 0.098
- WL tracers: 0.366
- Cl gg: 0.930
- Cl gl: 1.345
- Cl ll: 0.707
--------------------------
Total: 5.115
so initializing CLASS's power spectrum takes around the same time as computing the C_ells (in fact, somewhat longer for the latter). Note that this is for a standard LCDM cosmology with no funny business (massive neutrinos etc.) that would make CLASS slower.
If we instead assume that we only need to evaluate the power spectra at 15 values of ell, then CLASS becomes the bottleneck (followed by the computation of the lensing kernels):
Timing summary:
- Cosmology and Pk: 1.706
- GC tracers: 0.096
- WL tracers: 0.354
- Cl gg: 0.139
- Cl gl: 0.212
- Cl ll: 0.101
--------------------------
Total: 2.607
If the answer is that we should speed this up, a quick exploration tells me that the time taken to compute the C_ells can't be reduced significantly simply by lowering the GSL integrator accuracy (I've only managed to reduce the C_ell time by a factor of ~1.5 by raising the tolerance by one order of magnitude), so we should actually look into other integration schemes.
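One reason the 15-ell scenario is interesting is that the C_ells are smooth enough in ell that a coarse grid plus interpolation can stand in for a dense evaluation. Here's a quick standalone sketch of that idea; it uses a hypothetical smooth mock spectrum (not an actual CCL output) and a cubic spline in log-log space over the same 15 log-spaced multipoles as above:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical smooth mock spectrum standing in for a real C_ell
# (illustration only -- not a CCL output).
def mock_cl(ell):
    return 1e-7 * (ell / 100.0) ** -1.2 / (1.0 + (ell / 1000.0) ** 2)

# Coarse grid: 15 log-spaced multipoles, as in the timing runs above.
l_coarse = np.logspace(np.log10(10), np.log10(3000), 15)

# Cubic spline in log(ell)-log(Cl), evaluated on a dense grid.
spline = CubicSpline(np.log(l_coarse), np.log(mock_cl(l_coarse)))
l_dense = np.logspace(np.log10(10), np.log10(3000), 100)
cl_interp = np.exp(spline(np.log(l_dense)))

# Fractional error of the interpolation vs. the exact mock spectrum;
# for a spectrum this smooth it comes out well below a percent.
max_frac_err = np.max(np.abs(cl_interp / mock_cl(l_dense) - 1.0))
```

Whether real 3x2pt spectra are smooth enough for this to hold at the required accuracy would of course need to be checked against the exact calculation.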
Anyway, sorry for the long issue. Here's the script I've used for this:
```python
import time

import numpy as np
from scipy.special import erf

import pyccl as ccl

# Prepare redshift distributions and bias
nbins = 10
z_edges = np.linspace(0, 2.5, nbins + 1)
z_sigmas = 0.05 * (1 + 0.5 * (z_edges[1:] + z_edges[:-1]))
numz = 512
zz = np.linspace(0, 3, numz)
nz_all = zz**2 * np.exp(-(zz / 0.8)**1.5)
nzs = 0.5 * (erf((z_edges[1:, None] - zz[None, :]) / np.sqrt(2 * z_sigmas[:, None]**2)) -
             erf((z_edges[:-1, None] - zz[None, :]) / np.sqrt(2 * z_sigmas[:, None]**2)))
nzs *= nz_all[None, :]
bz = 0.95 * (1 + zz)

start_all = time.time()

# Set up cosmology object and compute power spectrum
start = time.time()
cosmo = ccl.Cosmology(Omega_c=0.26066676,
                      Omega_b=0.048974682,
                      h=0.6766,
                      sigma8=0.8102,
                      n_s=0.9665)
s8 = ccl.sigma8(cosmo)  # make sure the power spectrum is computed within this timed block
time_cosmo = time.time() - start

# Create tracers
start = time.time()
gc_t = [ccl.NumberCountsTracer(cosmo, False, (zz, nz), (zz, bz)) for nz in nzs]
time_gc_tracers = time.time() - start
start = time.time()
wl_t = [ccl.WeakLensingTracer(cosmo, (zz, nz)) for nz in nzs]
time_wl_tracers = time.time() - start

# Compute power spectra (only the upper triangle of each symmetric block)
n_ell = 15
l_arr = np.logspace(np.log10(10), np.log10(3000), n_ell)
start = time.time()
cls = np.zeros([2 * nbins, 2 * nbins, n_ell])
for i1, t1 in enumerate(gc_t):
    for i2, t2 in enumerate(gc_t[i1:]):
        cls[i1, i1 + i2, :] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clgg = time.time() - start
start = time.time()
for i1, t1 in enumerate(gc_t):
    for i2, t2 in enumerate(wl_t):
        cls[i1, nbins + i2] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clgl = time.time() - start
start = time.time()
for i1, t1 in enumerate(wl_t):
    for i2, t2 in enumerate(wl_t[i1:]):
        cls[nbins + i1, nbins + i1 + i2] = ccl.angular_cl(cosmo, t1, t2, l_arr)
time_clll = time.time() - start
time_all = time.time() - start_all

print("Timing summary:")
print(" - Cosmology and Pk: %.3f" % time_cosmo)
print(" - GC tracers: %.3f" % time_gc_tracers)
print(" - WL tracers: %.3f" % time_wl_tracers)
print(" - Cl gg: %.3f" % time_clgg)
print(" - Cl gl: %.3f" % time_clgl)
print(" - Cl ll: %.3f" % time_clll)
print("--------------------------")
print(" Total: %.3f" % time_all)
```
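Note that the loops in the script only fill the upper triangle of the symmetric gg and ll blocks, which is all that's needed for timing; if the full symmetric matrix were needed downstream, it could be recovered afterwards. A minimal standalone sketch, using a small mock array (hypothetical shapes, independent of the script above):

```python
import numpy as np

# Mock stack of spectra with only the upper triangle filled in,
# mimicking the i2 >= i1 loops in the script (shapes are arbitrary here).
nbins, n_ell = 4, 5
rng = np.random.default_rng(42)
cls = np.triu(np.ones((nbins, nbins)))[:, :, None] * rng.random((nbins, nbins, n_ell))

# Mirror the upper triangle into the lower one: C_ij(ell) = C_ji(ell).
i_low = np.tril_indices(nbins, k=-1)
cls[i_low] = cls.transpose(1, 0, 2)[i_low]
```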