
chippr's Introduction

chippr

Cosmological Hierarchical Inference with Probabilistic Photometric Redshifts

Motivation

This repository is the home of chippr, a Python package for estimating quantities of cosmological interest from surveys of photometric redshift posterior probability distributions.
It is a refactoring of previous work on using probabilistic photometric redshifts to infer the redshift distribution.

Examples

You can browse the demo notebook in the repository.

Documentation

Documentation can be found on ReadTheDocs.
The draft of the paper documenting the details of the method can be found here.

Disclaimer

As can be seen from the git history and Python version, this code is stale and should be understood as a prototype, originally scoped for applicability to SDSS DR10-era data of low dimensionality. It will need a major upgrade in flexibility and computational scaling before it can run on data sets like those of modern and future galaxy surveys.

People

License, Contributing etc

The code in this repo is available for re-use under the MIT license, which means that you can do whatever you like with it, just don't blame me. If you end up using any of the code or ideas you find here in your academic research, please cite me as Malz et al, in preparation\footnote{\texttt{https://github.com/aimalz/chippr}}. If you are interested in this project, please do drop me a line via the hyperlinked contact name above, or by opening an issue. To get started contributing to the chippr project, just fork the repo -- pull requests are always welcome!

chippr's People

Contributors

aimalz


chippr's Issues

Simplify/speed up analysis

Travis tests are taking a long time with the current demo notebook. I'm going to simplify the tests in the notebook and use separate, local scripts to run more meaningful tests. Similar to #16, I'm going to return the changes to the same branch.

Add math to the demo

The demo goes over use of the code without any connection to the math. In preparation for publication, I'm going to make the demo representative of the paper's content.

Proper multi-threading

My tests currently run in parallel via external scripts, not using an inherent chippr mode. The reason is that the function that evaluates the posterior over n(z) is not pickleable, so emcee multi-threading doesn't work. This issue is for either choosing a sampler that somehow doesn't care about this shortcoming (I suspect there may not be such a thing) or making the function pickleable. It's a low priority for the scope of paper 0 (the model and proof of concept) and paper 1 (demonstration on toy test cases) but essential for paper 2 (more realistic and/or real data at scale).
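For reference, a minimal sketch of the usual workaround, assuming the posterior can be restated as a module-level (and therefore picklable) function; `log_post_nz` and its toy arguments are illustrative stand-ins, not chippr's actual API:

```python
# Sketch: emcee multiprocessing requires a picklable log-posterior.
# Defining it at module level (rather than as a closure or bound
# method) is the standard fix.
import numpy as np
import emcee
from multiprocessing import Pool

def log_post_nz(hyperparams, interim_posteriors, interim_prior):
    """Toy stand-in for the posterior over n(z) hyperparameters."""
    if np.any(hyperparams < 0.):
        return -np.inf
    nz = hyperparams / np.sum(hyperparams)
    # marginalize each galaxy's interim posterior over the proposed n(z)
    weights = interim_posteriors @ (nz / interim_prior)
    return np.sum(np.log(weights))

ndim, nwalkers = 10, 32
rng = np.random.default_rng(42)
interim_prior = np.full(ndim, 1. / ndim)
interim_posteriors = rng.dirichlet(np.ones(ndim), size=100)  # toy catalog
start = rng.uniform(0.5, 1.5, size=(nwalkers, ndim))

if __name__ == "__main__":  # guard required for multiprocessing
    with Pool() as pool:
        sampler = emcee.EnsembleSampler(
            nwalkers, ndim, log_post_nz,
            args=(interim_posteriors, interim_prior), pool=pool)
        sampler.run_mcmc(start, 500)
```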

Improve file formats

I'm currently storing data in plaintext, which is obviously unacceptable for nontrivial survey sizes! I'll update that soon.
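For instance, a minimal sketch of one plausible replacement, HDF5 via h5py; the dataset names and layout here are an illustrative assumption, not an existing chippr schema:

```python
# Sketch: replacing plaintext catalogs with compressed HDF5.
import numpy as np
import h5py

z_grid = np.linspace(0., 1., 36)
interim_posteriors = np.random.rand(10**5, 35)  # one PDF per galaxy

with h5py.File("catalog.hdf5", "w") as f:
    f.create_dataset("z_grid", data=z_grid)
    f.create_dataset("interim_posteriors", data=interim_posteriors,
                     compression="gzip")
    f.attrs["n_galaxies"] = interim_posteriors.shape[0]

with h5py.File("catalog.hdf5", "r") as f:
    pdfs = f["interim_posteriors"][:1000]  # read only a slice, lazily
```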

Add plotting options

As with the vb keyword, I will add an option to automatically make informative plots throughout the catalog generation process and inference procedures.
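A minimal sketch of how such a keyword could sit alongside vb; the function and keyword names are illustrative, not chippr's existing interface:

```python
# Sketch: gating diagnostics behind keywords, mirroring the vb flag.
import numpy as np
import matplotlib.pyplot as plt

def estimate_nz(catalog, vb=True, plot=False, plot_dir="."):
    """Toy stand-in for an inference step with optional diagnostics."""
    result = catalog.mean(axis=0)  # placeholder computation
    if vb:
        print("estimate_nz: processed {} galaxies".format(len(catalog)))
    if plot:
        plt.plot(result)
        plt.xlabel("redshift bin")
        plt.ylabel("n(z) estimate")
        plt.savefig(plot_dir + "/nz_estimate.png")
        plt.close()
    return result

estimate_nz(np.random.rand(100, 35), plot=True)
```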

Respond to referee report

The referee requests the following:

  • Upgrade to Python 3.
  • Calculate and include concrete benchmarking values of computational expense.
  • Assess accuracy of MCMC error bars on distribution of n(z) samples.
  • Add back cosmology forecast.
  • Cite an appearance of the analytic n(z) of Eqn. 13.
  • Explain chippr's overestimate at redshift extrema.
  • Remove repeated sentence in conclusion.

I would like to take this opportunity to do the following:

  • Cite more recent n(z) estimation papers (including arXiv:2011.01836).
  • Update references (for publication status and formatting).
  • Rerun with lower Gelman-Rubin threshold (see the sketch after this list).
  • Calculate and plot Delta z.
  • Clean up demo notebook with CosmoLike propagation.
  • Put CosmoLike data files on GitHub so others can run forecasting notebook.
  • Eliminate redundant plotting code.
  • Refactor and write unit tests.
  • Isolate catalog generation code and migrate to zeppole.
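For the Gelman-Rubin item, a minimal sketch of the statistic itself, assuming chains stored as an array of shape (n_chains, n_steps, n_params); lowering the threshold just means requiring R-hat closer to 1:

```python
# Sketch: the Gelman-Rubin convergence statistic R-hat.
import numpy as np

def gelman_rubin(chains):
    """R-hat per parameter for chains of shape (m, n, n_params)."""
    m, n, _ = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(axis=0, ddof=1)      # between-chain variance
    W = chains.var(axis=1, ddof=1).mean(axis=0)  # within-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled variance estimate
    return np.sqrt(var_hat / W)

chains = np.random.randn(4, 1000, 10)  # toy: 4 chains, 10 parameters
print(gelman_rubin(chains))  # accept when all values fall below threshold
```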

Improve user interface with error checks

Currently the input file/parameter dictionary format is pretty sloppy. I want to introduce some sensible checks that halt execution if inputs don't make sense.

I'm also going to use this issue to implement more use of the vb keyword to print updates to screen.
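A minimal sketch of the kind of check I have in mind; the required keys are illustrative placeholders, not chippr's actual parameter schema:

```python
# Sketch: halting execution early on malformed input parameters.
def check_params(params):
    required = {"n_bins": int, "z_min": float, "z_max": float}
    for key, kind in required.items():
        if key not in params:
            raise KeyError("missing required parameter: " + key)
        if not isinstance(params[key], kind):
            raise TypeError("{} must be {}".format(key, kind.__name__))
    if params["z_min"] >= params["z_max"]:
        raise ValueError("z_min must be less than z_max")
    if params["n_bins"] < 1:
        raise ValueError("n_bins must be positive")
    return params

check_params({"n_bins": 35, "z_min": 0.0, "z_max": 1.0})
```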

Clean up test cases and plots

I made a big mess and am making a branch to simply clean up for my thesis. This issue can be closed when the scripts yield the plots I need for my thesis.

Literature review

The literature review from the old version is a good place to start on putting together an introduction for the chippr paper.

Scale up number of galaxies

chippr needs to be able to obtain posterior samples of n(z) for catalogs of ~1 million galaxies and will need an infrastructure overhaul to do so. In particular, multiprocessing will have to happen at the level of inference, not just in wrapper scripts for running different test conditions.

Fix the sampler

Something has gone wrong, and the mean of the samples is not approaching the MMLE.

Split/rescope paper

The paper is now overgrown and needs to be pruned, but the three points it makes are all valuable. This issue can be closed when @davidwhogg is satisfied with the three smaller papers into which I will split the current version. For reference, the papers will be scoped as follows:

  • Paper 1: A thorough presentation of the CHIPPR model and chippr prototype code, which are currently Sec. 2 and App. A, demonstrated on the three canonical forms of photo-z systematic error (bias, scatter, catastrophic outliers), which are currently Secs. 3.1, 3.2, 3.3.
  • Paper 2: A deeper exploration of the implicit prior and model misspecification, which is currently Secs. 3.4 and 4.2.
  • Paper 3: The propagation of CHIPPR results to cosmological parameter constraints, which is currently Sec. 4.1.

I wonder if there should also be another paper about the forward model of photo-z PDFs, which I've been meaning to separate out into a standalone package anyway...

New fiducial case

The new fiducial case must include the three effects about which LSST (and other surveys) are most concerned:

  • RMS scatter sigma_z / (1+z) ~ 0.05
  • 3-sigma outlier fraction ~ 0.1
  • bias (z_p - z_s) / (1+z_s) ~ 0.003

It should also have a smooth n(z) consistent with the CosmoLike analysis.
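A minimal sketch of drawing point estimates with those three effects at the quoted values; the true n(z) shape and the outlier population are toy assumptions, and relocating a flat ~10% of galaxies is a cruder notion than a 3-sigma outlier fraction:

```python
# Sketch: mock point estimates with bias, scatter, and outliers.
import numpy as np

rng = np.random.default_rng(0)
n_gal = 10**4
z_true = rng.gamma(shape=2., scale=0.2, size=n_gal)  # toy smooth n(z)

bias, sigma, f_out = 0.003, 0.05, 0.1
z_est = z_true + (1. + z_true) * (bias + sigma * rng.standard_normal(n_gal))

# relocate a random ~10% of galaxies to a catastrophic outlier population
is_out = rng.random(n_gal) < f_out
z_est[is_out] = rng.uniform(0., 2., size=is_out.sum())
```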

Paper catch-up

In general, I shouldn't have separate branches for the paper because it should always be updated along with the code and any test results. But, I still need to do some catching up to make that possible since #55 was only just resolved.

Quantile-quantile plot

Since I'm now doing inference of n(z), which is a normalized probability distribution function, instead of N(z), I could implement a quantile-quantile plot to compare different estimators to the truth.

This is related to #23 and #33 but I'm making it a new issue anyway because it's about content that wasn't in the old version.
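A minimal sketch, assuming both the truth and the estimator are normalized onto a shared redshift grid; the Gaussian shapes and the helper `cdf_quantiles` are toy stand-ins:

```python
# Sketch: quantile-quantile comparison of an estimated n(z) to truth.
import numpy as np
import matplotlib.pyplot as plt

def cdf_quantiles(z_grid, pdf, qs):
    """Redshifts at which the normalized CDF reaches each quantile q."""
    cdf = np.cumsum(pdf) / np.sum(pdf)
    return np.interp(qs, cdf, z_grid)

z_grid = np.linspace(0., 2., 200)
nz_true = np.exp(-0.5 * ((z_grid - 0.5) / 0.2) ** 2)   # toy truth
nz_est = np.exp(-0.5 * ((z_grid - 0.55) / 0.25) ** 2)  # toy estimator

qs = np.linspace(0.01, 0.99, 50)
plt.plot(cdf_quantiles(z_grid, nz_true, qs),
         cdf_quantiles(z_grid, nz_est, qs))
plt.plot([0, 2], [0, 2], "k--")  # perfect agreement lies on the diagonal
plt.xlabel("true n(z) quantiles")
plt.ylabel("estimated n(z) quantiles")
plt.savefig("qq.png")
```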

Optimize the optimizer

I'm going to try to break the optimizer with limiting cases that broke it in the old version. If those problems remain, I'll experiment with different optimization algorithms and step sizes (and possibly other parameters of those algorithms) to understand why those failures happen and hopefully find ways around them.
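A minimal sketch of that experiment loop using scipy; the objective is a toy stand-in for the real negative log-posterior, and the method list is just a starting set:

```python
# Sketch: comparing optimization algorithms on the same objective.
import numpy as np
from scipy.optimize import minimize

def neg_log_post(x):
    """Toy stand-in for the negative log-posterior being optimized."""
    return np.sum((x - 1.) ** 2) + 0.1 * np.sum(np.abs(x))

x0 = np.zeros(10)
for method in ["Nelder-Mead", "Powell", "L-BFGS-B", "CG"]:
    res = minimize(neg_log_post, x0, method=method,
                   options={"maxiter": 10000})
    print(method, res.success, res.fun)
```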

Finalize test conditions

Should the true n(z) in the test cases be kept as is, or should we use something else for the paper? The same goes for the prior probability distribution and the details of the test cases (i.e., at what redshift should outlier populations be located, etc.).

Epic: paper 0

I'm going to collect issues here now that this is progressing again.

  • Gather relevant references. (#65)
  • Document demo for publication. (#54)
  • Finish validating/interpreting propagation to cosmology (#49)
  • Scale up to more galaxies (#60); migrated to Future Analysis milestone.
  • Make a forecast for Euclid (#69); migrated to Future Analysis milestone.
  • Polish the sloppy plots.
  • Write a brief version of the math into an appendix.
  • Write a brief version of the mock data emulation procedure into an appendix.
  • Run sampler as well as optimizer.

Euclid forecast

Can you propagate the different n(z) estimates in your investigation to the predictions of shear 2PCF for a Euclid-like survey?

Refactor catalog production

The final test case in #25, that of catastrophic outliers as seen in empirical p(z) methods, suggests a different approach, one very similar to how the previous version worked at the very end.

All the test cases currently implemented in #25 begin with true redshifts, devise p(z_est | z_true, params), and use that to get p(z_true | z_est, params). What I should really be doing is devising a probability distribution in the space of p(z_true, z_est | params, n_true(z)), sampling pairs (z_true, z_est), and evaluating p(z_true | z_est, params) as the horizontal cuts through the z_true, z_est plot.

Here are some sketchy notes to remind me of how I want this to work, accounting for how the true n(z) and interim prior must enter into catalog construction in this way:

[Photo of handwritten notes: 20170627_131138]
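In code, a minimal sketch of that construction; the Gaussian form of the joint and the grid resolution are toy assumptions:

```python
# Sketch: define p(z_true, z_est | params) on a grid, draw pairs from
# it, and read off p(z_true | z_est) as cuts at fixed z_est.
import numpy as np

rng = np.random.default_rng(1)
z = np.linspace(0., 2., 100)
zt, ze = np.meshgrid(z, z, indexing="ij")  # axis 0: z_true, axis 1: z_est

nz_true = np.exp(-0.5 * ((z - 0.5) / 0.3) ** 2)     # toy true n(z)
joint = nz_true[:, None] * np.exp(
    -0.5 * ((ze - zt) / (0.05 * (1. + zt))) ** 2)   # p(z_true, z_est)
joint /= joint.sum()

# draw (z_true, z_est) pairs from the discretized joint
flat_idx = rng.choice(joint.size, size=10**4, p=joint.ravel())
it, ie = np.unravel_index(flat_idx, joint.shape)
pairs = np.column_stack([z[it], z[ie]])

# the cut at fixed z_est = z[j] (a horizontal cut through the
# z_true-z_est plot) gives the normalized posterior p(z_true | z_est)
j = 25
posterior = joint[:, j] / joint[:, j].sum()
```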

Pass nontrivial tests (extend support for challenging cases)

The old version of the code supported different physically-motivated test cases that must be re-implemented. EDIT: I will only implement the tests outlined in #55, but there are still some choices that must be made!

  • unfeatured true n(z) (SDSS interim prior as truth)
  • fiducial case (featured truth, constant standard deviations for each galaxy)
  • high intrinsic scatter (include (1+z) dependence in standard deviations?)
  • template-like catastrophic outliers (constant-ish likelihood components)
  • training-like catastrophic outliers (multimodal likelihood components)
  • template-like interim prior (multimodal interim prior)
  • training-like interim prior (low-z favoring interim prior)

Implement MCMC diagnostics

Currently, there are no diagnostics for the sampler. The sampler should check a burn-in condition to discard non-converged samples, and other convergence measures should be recorded.
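A minimal sketch using emcee's built-in autocorrelation-time estimate to set a burn-in length and thinning factor; the target density is a toy Gaussian, not chippr's posterior:

```python
# Sketch: autocorrelation-based burn-in and thinning with emcee.
import numpy as np
import emcee

def log_prob(x):
    return -0.5 * np.sum(x ** 2)  # toy standard-normal target

ndim, nwalkers = 5, 32
rng = np.random.default_rng(3)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(rng.standard_normal((nwalkers, ndim)), 5000)

tau = sampler.get_autocorr_time(tol=0)  # per-parameter autocorr time
burnin = int(2 * np.max(tau))           # discard a couple of tau as burn-in
thin = max(1, int(0.5 * np.min(tau)))
flat = sampler.get_chain(discard=burnin, thin=thin, flat=True)
print("tau =", tau, "-> kept", flat.shape[0], "samples")
```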

Generalize n(z)/p(z) parameterization

As qp shows, redshift posteriors and redshift density functions can be parameterized in many possible ways. chippr unfortunately was scoped out before I'd really thought about a catalog that might not use the piecewise constant parameterization of SDSS. KiDS has photo-z PDFs parameterized as Gaussians and wants n(z) with way more parameters than chippr can actually handle. This issue can be closed when:

  • chippr can accept Gaussian photo-z PDFs without forcing them onto a grid
  • chippr can yield n(z) in a different parameterization than the input photo-z PDFs
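A minimal sketch of the grid-free representation, using plain scipy Gaussians as a stand-in; this is not qp's interface, just the behavior chippr would need to support:

```python
# Sketch: carrying Gaussian photo-z PDFs as (mean, sigma) pairs and
# evaluating them only where needed, instead of gridding up front.
import numpy as np
from scipy.stats import norm

means = np.array([0.4, 0.7, 1.1])   # one Gaussian per galaxy
sigmas = np.array([0.05, 0.08, 0.1])

def evaluate_pdfs(z):
    """Evaluate every galaxy's PDF at redshifts z: shape (n_gal, n_z)."""
    return norm.pdf(z[None, :], loc=means[:, None], scale=sigmas[:, None])

# the output parameterization can differ from the input one, e.g.
# integrating the Gaussians over wide n(z) bins
edges = np.linspace(0., 2., 11)
bin_probs = norm.cdf(edges[None, 1:], means[:, None], sigmas[:, None]) \
          - norm.cdf(edges[None, :-1], means[:, None], sigmas[:, None])
```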

Epic: refactoring

This issue is for collecting programming issues that are driving me nuts but aren't actually holding back completion of the paper.

  • Establish consistent keyword/variable names (#15)
  • Improve user interface with error checks (#26)
  • Improve file formats (#51)
  • Diagnostic printouts to file (#52)
  • Eliminate the abundant magic numbers and give more control to the user via an improved config file interface.
  • Use qp (or pomegranate) for probability distributions to replace discrete.py, gauss.py, and gmix.py (#68)
