Coder Social home page Coder Social logo

clusters's Introduction

Documentation Status Code Health Travis CI build status (Linux)

WARNING: Package under development


Clusters

Python package wrapping up the ongoing cluster analysis of the LSST/DESC cluster group. For more info, see the two following github repositories:

See also the private Trello board that we use to share our work.

Installation

To install:

git clone https://github.com/nicolaschotard/Clusters.git
pip install Clusters/

To install in a local directory mypath, use:

pip install --prefix='mypath' Clusters/

and do not forget to add it to your PYTHONPATH.

To upgrade to a new version (after a git pull or a local modification), use:

pip install --upgrade (--prefix='mypath') Clusters/

To install a release version (no release version available yet):

pip install http://github.com/nicolaschotard/Cluster/archive/v0.1.tar.gz

Also works with the master:

pip install (--upgrade) https://github.com/nicolaschotard/Clusters/archive/master.zip

In the future, release versions will be listed at this location.

Package developers will want to run:

python setup.py develop

Dependencies

Clusters has for now the following dependencies (see the quick installs below):

Photometric redshift estimators:

Python

To install the python dependencies, simply do:

pip install -r requirements.txt

DM stack quick install

This four-step procedure should allow you to install and configure a light version of the DM stack, but complete enough to use the Clusters package. It should take ~10 minutes.

  • Get and install miniconda, if you do not have it already:

    wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh
    bash miniconda.sh -b -p $HOME/miniconda
    export PATH="$HOME/miniconda/bin:$PATH"
    conda config --set always_yes yes --set changeps1 no
    conda update -q conda
    
  • Install the needed part of the DM stack (we do not need the entire stack):

    conda config --add channels http://conda.lsst.codes/stack/0.12.0
    conda create -q -n lsst python=2.7
    source activate lsst
    conda install -q gcc lsst-daf-persistence lsst-log lsst-afw lsst-skypix lsst-meas-algorithms lsst-pipe-tasks
    
  • Install a compatible version of the obs_cfht package, which is not yet avalaible in the conda repository:

    setup daf_persistence
    git clone https://github.com/lsst/obs_cfht.git
    cd obs_cfht
    git checkout b7ab2c4
    setup -k -r .
    scons opt=3
    eups declare -r . -t yourname
    
  • To use this install of the DM stack, do not forget these following setups:

    export PATH="$HOME/miniconda/bin:$PATH"
    source activate lsst
    source eups-setups.sh
    setup daf_persistence
    setup obs_cfht -t yourname
    

If these steps went well, you should be able to use clusters_data.py on one of the outputs of the DM stack (see below to get some data).

LEPHARE quick install

You can download and install a pre-configured version of LEPHARE as followed:

  • for linux system:

    wget https://lapp-owncloud.in2p3.fr/index.php/s/MDaXObLSD9IVQ1B/download -O lephare.tar.gz
    tar zxf lephare.tar.gz
    
  • for mac:

    wget https://lapp-owncloud.in2p3.fr/index.php/s/bMTLiwfGK1SpOqE/download -O lephare.tar.gz
    tar zxf lephare.tar.gz
    

When the download is complete, exctract the lephare directory where it suits you (mypath in this example), and set the following environment variables (use setenv if needed):

export LEPHAREWORK="mypath/lephare/lephare_work"
export LEPHAREDIR="mypath/lephare/lephare_dev"
export PATH="$PATH:mypath/lephare/lephare_dev/source"

You should now be able to run clusters_zphot.py (only tested on linux systems).

BPZ quick install

The following steps can be copied/pasted in order to install and test BPZ quickly. It supposes that LEPHARE has been installed following the procedure shown in the previous section (you need $LEPHAREDIR/filt/cfht/megacam/\*.pb). Here are the official install instructions for BPZ.

Get BPZ:

export MYDIR="an install dir" # change that line
cd MYDIR
wget http://www.stsci.edu/~dcoe/BPZ/bpz-1.99.3.tar.gz
tar -xvf bpz-1.99.3.tar.gz

Create needed enironment vairables:

export BPZPATH="$MYDIR/bpz-1.99.3"
export PYTHONPATH=$PYTHONPATH:$BPZPATH
export NUMERIX=numpy

Create the filter files using the LEPHARE install:

cd $BPZPATH/FILTER/
cp $LEPHAREDIR/filt/cfht/megacam/*.pb .
for f in *.pb; do mv "$f" "CFHT_megacam_${f%.pb}.res"; done

Test the install and the megacam filter:

wget https://lapp-owncloud.in2p3.fr/index.php/s/FP1vSMB7emLxwwg/download -O megacam_bpz.columns
wget https://lapp-owncloud.in2p3.fr/index.php/s/HZbzCFLoy8Lcmwx/download -O megacam_bpz.in
python $BPZPATH/bpz.py megacam_bpz.in -INTERP 2

Configuration file

All the scripts will take the same input YAML file, which contains necessary informations for the analysis or simply for plotting purpose, such as the name of the studied cluster. Keys are listed below and are case-sensitive. Additional keys are simply ignored. You can find examples of these configuration files in the config directory, or clicking here for MACSJ2243.3-0935.

General keys Type Description [units]
"cluster" string Name of the cluster
"ra" float RA coordinate of the cluster [deg]
"dec" float DEC coordinate of the cluster [deg]
"redshift" float Cluster redshift
"butler" string Absolute path to the intput data (butler)
"filter" list List of filters to be considered, e.g., 'ugriz' (Megacam filters)
"patch" list List of patches to study

The following list of optional keys can also be added to the configuration file. They correspond to specific configurations of the different steps of the analysis. While the previous list will most likely stay unchanged, the following one will be completed with new keys as this analysis will progress.

Optional keys Type Description [units]
"keys" dict Dictionnary containing list of keys for the catalogs (see below)
"zpara" list List of paths to zphota configuration files (see below)
"zspectro_file" string File containing spectroz sample for LePhare training
  • keys is a dictionary having the name of the different catalogs like deepCoadd_meas, deepCoadd_forced_src and forced_src. The list of keys for a given catalog can include:
    • "the_full_name_of_a_key";
    • "*_a_part_of_a_key_name" or "an_other_part_of_a_key_name*" preceded or followed by a *;
    • a combination of all the above: ["key1", "ke*", "*ey"];
    • or a "*" to get all keys available in a catalog, which is the default value for all catalogs.

General usage

Clusters consists in several command-line executables that you have to run in the right order.

  • Get the input data and dump them in a hdf5 file containing astropy tables (see the data format section of the documentation for detail):

    clusters_data.py config.yaml (--output data.hdf5)
    

The memory you will need to load the data from the butler will for now depend on the number of catalogs (e.g. the forced_src catalog), patch, visits and CCD you will be loading. For instance, if you try to load ~10 patches for 5 filters, and want all the keys of several catalogs including the forced_src one (CCD-based), you could need up to 16GB of memory. The best practice would thus be to first check the list of existing keys of the catalogs you want to load (--show option), fill the configuration file with your selected list of keys using the keys parameter for each catalog, and finally run clusters_data.py using this configuration file. You can find an example for such cofiguration file there and some detail on how to use the keys in the previous section. This will allow you to adapt the content of the output file and work with lighter data files.

  • Data validation plots can for now be found in the several notebooks available in:

    https://github.com/nicolaschotard/Clusters/tree/master/notebooks
    
  • Correct the data for Milky Way extinction:

    clusters_extinction.py config.yaml data.hdf5 (--output extinction.hdf5)
    
  • Get the photometric redshift using LEPHARE:

    clusters_zphot.py config.yaml data.hdf5 (--extinction extinction.hdf5) (--output zphot.hdf5)
    

The configuration file(s) used in LEPHARE can be given with the option --zpara. The code will loop over the different files and run LEPHARE for each of them. All results are saved in the same hdf5 file. This list of configuration files can also be given in the CONFIG.yaml file (see above). --zpara will overwrite what is given in the configuration file.

  • Identify galaxies to be removed from the whole sample: Red sequence galaxies identified from color-color diagrams, foreground galaxies identified using photometric redshifts:

    clusters_getbackground.py config.yaml cat_data.hdf5 z_data.hdf5 (--zmin z_min)
                              (--zmax z_max) (--thresh_prob threshold)
    
    • cat_data.hdf5 is the catalogue from which magnitudes are read to produce the red sequence cut. After the background galaxies are identified, the astropy table 'deepCoadd_meas' is supplemented 3 boolean 'flag' columns (RS_flag, z_flag_hard, z_flag_pdz), corresponding to the RS cut and the hard and pdz redshift cuts. If True the object passed the cut and is to be kept. If --output is used, the table is also saved in a separate *_background.hdf5 file
    • z_data.hdf5 is the output file of clusters_zphot.py containing the pdz information.
    • z_min, z_max are used for a 'hard' redshift cut: all galaxies in [z_min, z_max] are flagged.
    • threshold: if the probability of a galaxy to be located at z < z_cluster + 0.1 is larger than threshold [%], the galaxy is flagged to be removed.
  • Compute the shear:

    clusters_shear config.yaml input.hdf5 output.hdf5
    
  • A pipeline script which run all the above step in a raw with standard options:

    clusters_pipeline config.yaml
    

With any command, you can run with -h or --help to see all the optional arguments, e.g., clusters_data.py -h.

Test the code

If you have installed all the dependencies previoulsy mentionned, download the following test data set:

wget https://lapp-owncloud.in2p3.fr/index.php/s/xG2AoS2jggbmP0k/download -O testdata.tar.gz
tar zxf testdata.tar.gz

The testdata directory contains a subset of the reprocessing data available for MACSJ2243.3-0935. It can be used as a test set of the code, but is not complete enough to run the full analysis. Here is the full structure and content of the directory, which has the exact same structure as a regulare DM stack output directory:

testdata/
├── input
│   ├── _mapper
│   └── registry.sqlite3
├── output
│   ├── coadd_dir
│   │   ├── deepCoadd
│   │   │   ├── g
│   │   │   │   └── 0
│   │   │   │       ├── 1,5
│   │   │   │       └── 1,5.fits
│   │   │   └── skyMap.pickle
│   │   ├── deepCoadd-results
│   │   │   └── g
│   │   │       └── 0
│   │   │           └── 1,5
│   │   │               ├── bkgd-g-0-1,5.fits
│   │   │               ├── calexp-g-0-1,5.fits
│   │   │               ├── detectMD-g-0-1,5.boost
│   │   │               ├── det-g-0-1,5.fits
│   │   │               ├── forced_src-g-0-1,5.fits
│   │   │               ├── meas-g-0-1,5.fits
│   │   │               ├── measMD-g-0-1,5.boost
│   │   │               └── srcMatch-g-0-1,5.fits
│   │   ├── forced
│   │   │   └── 08BO01
│   │   │       └── SCL-2241_P1
│   │   │           └── 2008-09-03
│   │   │               └── g
│   │   │                   └── 0
│   │   │                       ├── FORCEDSRC-1022175-00.fits
│   │   │                       ├── FORCEDSRC-1022175-09.fits
│   │   │                       ├── FORCEDSRC-1022176-00.fits
│   │   │                       ├── FORCEDSRC-1022176-09.fits
│   │   │                       ├── FORCEDSRC-1022177-00.fits
│   │   │                       ├── FORCEDSRC-1022177-09.fits
│   │   │                       ├── FORCEDSRC-1022178-00.fits
│   │   │                       ├── FORCEDSRC-1022178-09.fits
│   │   │                       ├── FORCEDSRC-1022179-00.fits
│   │   │                       ├── FORCEDSRC-1022179-09.fits
│   │   │                       ├── FORCEDSRC-1022180-00.fits
│   │   │                       └── FORCEDSRC-1022180-09.fits
│   │   └── _parent -> ../
│   └── _parent -> ../input/
└── travis_test.yaml

With this data set, you should be able to test most of the Clusters parts. You can start with the test suite available in the tests directory. To do so, use:

python setup.py test

It will use the testdata that you have downloaded previoulsy and run the tests. This is also usefull if your goal is to add new tests.

Get the data

Raw DM stack outputs

If you have installed Clusters but do not have any data to run it on, you can use one of our re-processing outputs for MACSJ2243.3-0935. The corresponding configuration file is stored under configs/. To use it, you either need to be connected at CC-IN2P3, or change the path to the butler inside the config file (if you already have a copy of this data). You could also mount sps on your personal computer (see this how to).

clusters_data.py output

The first step of the Clusters package is clusters_data.py, which will get the data from the DM butler, convert them into astropy tables and save them in a single hdf5 file. To do so, you need the LSST DM stack to be installed. If you want to skip this part and try the code whithout having to install the DM stack, you could also use the outputs of this first step that you can download from this repository, which contains the following files:

|-- CL0016
|   |-- [4.4G]  CL0016_data.hdf5                     # full data set
|   |-- [334M]  CL0016_filtered_data.hdf5            # only quality-filtered galaxies
|   `-- [ 312]  CL0016.yaml                          # configuration file
|-- MACSJ224330935
|   |-- [5.6G]  MACSJ2243.3-0935_data.hdf5           # full data set
|   |-- [367M]  MACSJ2243.3-0935_filtered_data.hdf5  # only quality-filtered galaxies
|   |-- [ 329]  MACSJ2243.3-0935.yaml                # configuration file

This short tutorial explains how to use these hdf5 files to start an analysis.

clusters's People

Contributors

combet avatar boutigny avatar humnaawan avatar deapplegate avatar rbliu avatar

Watchers

 avatar

clusters's Issues

Code ignores shear calibration

WtG used a size and snratio dependent shear calibration. Those lines of code have been commented out for the DMstack input, both in the I/O and the likelihood calculation.

I/O

Likelihood

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.