
WARNING: Package under development

Clusters

Python package wrapping up the ongoing cluster analysis of the LSST/DESC cluster group. For more information, see the following two GitHub repositories:

A collection of notebooks for LSST
The ReprocessingTaskForce repository

See also the private Trello board that we use to share our work.

Installation

To install:

git clone https://github.com/nicolaschotard/Clusters.git
pip install Clusters/

To install in a local directory mypath, use:

pip install --prefix='mypath' Clusters/

and do not forget to add it to your PYTHONPATH.

To upgrade to a new version (after a git pull or a local modification), use:

pip install --upgrade (--prefix='mypath') Clusters/

To install a release version (no release version available yet):

pip install http://github.com/nicolaschotard/Clusters/archive/v0.1.tar.gz

Also works with the master:

pip install (--upgrade) https://github.com/nicolaschotard/Clusters/archive/master.zip

In the future, release versions will be listed at this location.

Package developers will want to run:

python setup.py develop

Dependencies

Clusters currently has the following dependencies (see the quick installs below):

Python 2.7 and libraries listed in the requirements file
The LSST DM stack.

Photometric redshift estimators:

LEPHARE
BPZ

Python

To install the python dependencies, simply do:

pip install -r requirements.txt

DM stack quick install

This four-step procedure should allow you to install and configure a light version of the DM stack that is still complete enough to use the Clusters package. It should take ~10 minutes.

Get and install miniconda, if you do not have it already:

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda config --set always_yes yes --set changeps1 no
conda update -q conda

Install the needed part of the DM stack (we do not need the entire stack):

conda config --add channels http://conda.lsst.codes/stack/0.12.0
conda create -q -n lsst python=2.7
source activate lsst
conda install -q gcc lsst-daf-persistence lsst-log lsst-afw lsst-skypix lsst-meas-algorithms lsst-pipe-tasks

Install a compatible version of the obs_cfht package, which is not yet available in the conda repository:

setup daf_persistence
git clone https://github.com/lsst/obs_cfht.git
cd obs_cfht
git checkout b7ab2c4
setup -k -r .
scons opt=3
eups declare -r . -t yourname

To use this install of the DM stack, do not forget the following setup steps:

export PATH="$HOME/miniconda/bin:$PATH"
source activate lsst
source eups-setups.sh
setup daf_persistence
setup obs_cfht -t yourname

If these steps went well, you should be able to use clusters_data.py on one of the outputs of the DM stack (see below to get some data).

LEPHARE quick install

You can download and install a pre-configured version of LEPHARE as follows:

for Linux systems:

wget https://lapp-owncloud.in2p3.fr/index.php/s/MDaXObLSD9IVQ1B/download -O lephare.tar.gz
tar zxf lephare.tar.gz

for mac:

wget https://lapp-owncloud.in2p3.fr/index.php/s/bMTLiwfGK1SpOqE/download -O lephare.tar.gz
tar zxf lephare.tar.gz

When the download is complete, extract the lephare directory wherever suits you (mypath in this example), and set the following environment variables (use setenv if needed):

export LEPHAREWORK="mypath/lephare/lephare_work"
export LEPHAREDIR="mypath/lephare/lephare_dev"
export PATH="$PATH:mypath/lephare/lephare_dev/source"

You should now be able to run clusters_zphot.py (only tested on linux systems).
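
Before running clusters_zphot.py, it can help to check that the three settings above are in place. The following is a minimal sketch, not part of the Clusters package: the function name check_lephare_env is hypothetical, and it only assumes the variable names and the lephare_dev/source location given above.

```python
import os


def check_lephare_env(environ=None):
    """Return a list of problems found in the LEPHARE environment setup.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Both variables must be set and must point to existing directories.
    for name in ("LEPHAREDIR", "LEPHAREWORK"):
        value = environ.get(name)
        if not value:
            problems.append("{} is not set".format(name))
        elif not os.path.isdir(value):
            problems.append("{} points to a missing directory: {}".format(name, value))

    # The install instructions add $LEPHAREDIR/source to PATH.
    lephare_dir = environ.get("LEPHAREDIR")
    if lephare_dir:
        source_dir = os.path.normpath(os.path.join(lephare_dir, "source"))
        entries = [os.path.normpath(p) for p in environ.get("PATH", "").split(os.pathsep) if p]
        if source_dir not in entries:
            problems.append("{} is not on PATH".format(source_dir))

    return problems
```

Calling check_lephare_env() with no argument inspects the current environment; an empty list means the three settings look consistent.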

BPZ quick install

The following steps can be copied and pasted to install and test BPZ quickly. They assume that LEPHARE has been installed following the procedure in the previous section (you need $LEPHAREDIR/filt/cfht/megacam/*.pb). Here are the official install instructions for BPZ.

Get BPZ:

export MYDIR="an install dir" # change that line
cd "$MYDIR"
wget http://www.stsci.edu/~dcoe/BPZ/bpz-1.99.3.tar.gz
tar -xvf bpz-1.99.3.tar.gz

Create the needed environment variables:

export BPZPATH="$MYDIR/bpz-1.99.3"
export PYTHONPATH=$PYTHONPATH:$BPZPATH
export NUMERIX=numpy

Create the filter files using the LEPHARE install:

cd $BPZPATH/FILTER/
cp $LEPHAREDIR/filt/cfht/megacam/*.pb .
for f in *.pb; do mv "$f" "CFHT_megacam_${f%.pb}.res"; done
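
If you prefer to do this step from Python, the copy-and-rename above can be sketched as follows. This is an illustrative equivalent rather than something shipped with Clusters or BPZ: the function name install_megacam_filters is hypothetical, and the two arguments stand for the values of $LEPHAREDIR and $BPZPATH.

```python
import glob
import os
import shutil


def install_megacam_filters(lephare_dir, bpz_path):
    """Copy LEPHARE Megacam *.pb filters into BPZ's FILTER directory.

    Each file <name>.pb is written as CFHT_megacam_<name>.res, which is the
    same end state as the `cp` followed by the `mv` loop shown above.
    Returns the list of files created.
    """
    src_dir = os.path.join(lephare_dir, "filt", "cfht", "megacam")
    dst_dir = os.path.join(bpz_path, "FILTER")
    created = []
    for src in sorted(glob.glob(os.path.join(src_dir, "*.pb"))):
        stem = os.path.splitext(os.path.basename(src))[0]
        dst = os.path.join(dst_dir, "CFHT_megacam_{}.res".format(stem))
        shutil.copyfile(src, dst)
        created.append(dst)
    return created
```

A typical call would be install_megacam_filters(os.environ["LEPHAREDIR"], os.environ["BPZPATH"]).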

Test the install and the megacam filter:

wget https://lapp-owncloud.in2p3.fr/index.php/s/FP1vSMB7emLxwwg/download -O megacam_bpz.columns
wget https://lapp-owncloud.in2p3.fr/index.php/s/HZbzCFLoy8Lcmwx/download -O megacam_bpz.in
python $BPZPATH/bpz.py megacam_bpz.in -INTERP 2

Configuration file

All the scripts take the same input YAML file, which contains the information needed for the analysis or simply for plotting purposes, such as the name of the studied cluster. Keys are listed below and are case-sensitive. Additional keys are simply ignored. You can find examples of these configuration files in the config directory, or clicking here for MACSJ2243.3-0935.

| General keys | Type | Description [units] |
|---|---|---|
| `"cluster"` | string | Name of the cluster |
| `"ra"` | float | RA coordinate of the cluster [deg] |
| `"dec"` | float | DEC coordinate of the cluster [deg] |
| `"redshift"` | float | Cluster redshift |
| `"butler"` | string | Absolute path to the input data (butler) |
| `"filter"` | list | List of filters to be considered, e.g., 'ugriz' (Megacam filters) |
| `"patch"` | list | List of patches to study |
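
To make the expected shape of these general keys concrete, here is a small sketch that checks an already-loaded configuration (a plain dictionary, as a YAML loader would return) against the table above. It is not the validation done by Clusters itself: the function name and the coordinate, redshift and path values are placeholders, and accepting a plain string for "filter" is an assumption based on the 'ugriz' example.

```python
# Expected type(s) for each general key, following the table above.
# Numeric keys also accept integers, since YAML parses "340" as an int.
GENERAL_KEYS = [
    ("cluster", str),
    ("ra", (int, float)),
    ("dec", (int, float)),
    ("redshift", (int, float)),
    ("butler", str),
    ("filter", (list, str)),  # assumption: 'ugriz' may be given as a string
    ("patch", list),
]


def check_general_keys(config):
    """Return a list of messages for missing or mistyped general keys."""
    problems = []
    for key, expected in GENERAL_KEYS:
        if key not in config:
            problems.append("missing key: {}".format(key))
        elif not isinstance(config[key], expected):
            problems.append("wrong type for key: {}".format(key))
    return problems


# Placeholder values only; they do not describe a real cluster.
example_config = {
    "cluster": "EXAMPLE_CLUSTER",
    "ra": 0.0,
    "dec": 0.0,
    "redshift": 0.0,
    "butler": "/path/to/butler",
    "filter": "ugriz",
    "patch": ["1,5"],
}
```

Because key names are case-sensitive, a configuration that uses "RA" instead of "ra" would be reported as missing the "ra" key.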

The following optional keys can also be added to the configuration file. They correspond to specific configurations of the different steps of the analysis. While the previous list will most likely stay unchanged, this one will grow with new keys as the analysis progresses.

| Optional keys | Type | Description [units] |
|---|---|---|
| `"keys"` | dict | Dictionary containing the list of keys for the catalogs (see below) |
| `"zpara"` | list | List of paths to `zphota` configuration files (see below) |
| `"zspectro_file"` | string | File containing the spectroz sample for LePhare training |

keys is a dictionary whose entries are named after the different catalogs, such as deepCoadd_meas, deepCoadd_forced_src and forced_src. The list of keys for a given catalog can include:
- "the_full_name_of_a_key";
- part of a key name preceded or followed by a *: "*_a_part_of_a_key_name" or "an_other_part_of_a_key_name*";
- a combination of all the above: ["key1", "ke*", "*ey"];
- or a "*" to get all keys available in a catalog, which is the default value for all catalogs.

General usage

Clusters consists of several command-line executables that you have to run in the right order.

Get the input data and dump them into an hdf5 file containing astropy tables (see the data format section of the documentation for details):
```
clusters_data.py config.yaml (--output data.hdf5)
```

The memory needed to load the data from the butler currently depends on the number of catalogs (e.g. the forced_src catalog), patches, visits and CCDs you load. For instance, loading ~10 patches for 5 filters with all the keys of several catalogs, including the CCD-based forced_src catalog, could require up to 16GB of memory. The best practice is thus to first check the list of existing keys of the catalogs you want to load (--show option), fill the configuration file with your selected list of keys using the keys parameter for each catalog, and finally run clusters_data.py with this configuration file. You can find an example of such a configuration file there and some details on how to use the keys in the previous section. This lets you tailor the content of the output file and work with lighter data files.

Data validation plots can for now be found in the several notebooks available in:
```
https://github.com/nicolaschotard/Clusters/tree/master/notebooks
```

Correct the data for Milky Way extinction:

clusters_extinction.py config.yaml data.hdf5 (--output extinction.hdf5)

Get the photometric redshift using LEPHARE:

clusters_zphot.py config.yaml data.hdf5 (--extinction extinction.hdf5) (--output zphot.hdf5)

The configuration file(s) used in LEPHARE can be given with the option --zpara. The code will loop over the different files and run LEPHARE for each of them. All results are saved in the same hdf5 file. This list of configuration files can also be given in the config.yaml file with the zpara key (see above). --zpara overrides what is given in the configuration file.

Identify the galaxies to be removed from the whole sample: red sequence galaxies, identified from color-color diagrams, and foreground galaxies, identified using photometric redshifts:
```
clusters_getbackground.py config.yaml cat_data.hdf5 z_data.hdf5 (--zmin z_min)
                          (--zmax z_max) (--thresh_prob threshold)
```
- cat_data.hdf5 is the catalogue from which magnitudes are read to produce the red sequence cut. After the background galaxies are identified, the astropy table 'deepCoadd_meas' is supplemented with three boolean 'flag' columns (RS_flag, z_flag_hard, z_flag_pdz), corresponding to the RS cut and the hard and pdz redshift cuts. If True, the object passed the cut and is to be kept. If --output is used, the table is also saved in a separate *_background.hdf5 file.
- z_data.hdf5 is the output file of clusters_zphot.py containing the pdz information.
- z_min, z_max are used for a 'hard' redshift cut: all galaxies in [z_min, z_max] are flagged.
- threshold: if the probability that a galaxy lies at z < z_cluster + 0.1 is larger than threshold [%], the galaxy is flagged to be removed.
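
The two redshift cuts described above can be sketched in a few lines of plain Python. This is only an illustration of the stated rules, not the code of clusters_getbackground.py: the function names and the z_best argument are hypothetical, the pdz is assumed to be sampled on a uniform redshift grid, and "flagged" is read as "to be removed", so that True means "keep", in line with the flag convention above.

```python
def hard_cut_keep(z_best, z_min, z_max):
    """Hard cut: return True (keep) if z_best lies outside [z_min, z_max]."""
    return not (z_min <= z_best <= z_max)


def pdz_cut_keep(z_grid, pdz, z_cluster, threshold_percent, margin=0.1):
    """pdz cut: return True (keep) unless P(z < z_cluster + margin) > threshold.

    `z_grid` and `pdz` are sequences of equal length giving the redshift
    probability distribution on a uniform grid (an assumption of this sketch);
    the pdz does not need to be normalised.
    """
    total = sum(pdz)
    if total <= 0:
        raise ValueError("pdz must have a positive sum")
    z_limit = z_cluster + margin
    below = sum(p for z, p in zip(z_grid, pdz) if z < z_limit)
    prob_percent = 100.0 * below / total
    return prob_percent <= threshold_percent
```

With this reading, a galaxy whose pdz puts 20% of its probability below z_cluster + 0.1 is kept for a threshold of 50 and removed for a threshold of 10.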

Compute the shear:

clusters_shear config.yaml input.hdf5 output.hdf5

A pipeline script runs all the above steps in a row with standard options:
```
clusters_pipeline config.yaml
```

With any command, you can run with -h or --help to see all the optional arguments, e.g., clusters_data.py -h.

Test the code

If you have installed all the dependencies previously mentioned, download the following test data set:

wget https://lapp-owncloud.in2p3.fr/index.php/s/xG2AoS2jggbmP0k/download -O testdata.tar.gz
tar zxf testdata.tar.gz

The testdata directory contains a subset of the reprocessing data available for MACSJ2243.3-0935. It can be used as a test set for the code, but is not complete enough to run the full analysis. Here is the full structure and content of the directory, which has the exact same structure as a regular DM stack output directory:

testdata/
├── input
│   ├── _mapper
│   └── registry.sqlite3
├── output
│   ├── coadd_dir
│   │   ├── deepCoadd
│   │   │   ├── g
│   │   │   │   └── 0
│   │   │   │       ├── 1,5
│   │   │   │       └── 1,5.fits
│   │   │   └── skyMap.pickle
│   │   ├── deepCoadd-results
│   │   │   └── g
│   │   │       └── 0
│   │   │           └── 1,5
│   │   │               ├── bkgd-g-0-1,5.fits
│   │   │               ├── calexp-g-0-1,5.fits
│   │   │               ├── detectMD-g-0-1,5.boost
│   │   │               ├── det-g-0-1,5.fits
│   │   │               ├── forced_src-g-0-1,5.fits
│   │   │               ├── meas-g-0-1,5.fits
│   │   │               ├── measMD-g-0-1,5.boost
│   │   │               └── srcMatch-g-0-1,5.fits
│   │   ├── forced
│   │   │   └── 08BO01
│   │   │       └── SCL-2241_P1
│   │   │           └── 2008-09-03
│   │   │               └── g
│   │   │                   └── 0
│   │   │                       ├── FORCEDSRC-1022175-00.fits
│   │   │                       ├── FORCEDSRC-1022175-09.fits
│   │   │                       ├── FORCEDSRC-1022176-00.fits
│   │   │                       ├── FORCEDSRC-1022176-09.fits
│   │   │                       ├── FORCEDSRC-1022177-00.fits
│   │   │                       ├── FORCEDSRC-1022177-09.fits
│   │   │                       ├── FORCEDSRC-1022178-00.fits
│   │   │                       ├── FORCEDSRC-1022178-09.fits
│   │   │                       ├── FORCEDSRC-1022179-00.fits
│   │   │                       ├── FORCEDSRC-1022179-09.fits
│   │   │                       ├── FORCEDSRC-1022180-00.fits
│   │   │                       └── FORCEDSRC-1022180-09.fits
│   │   └── _parent -> ../
│   └── _parent -> ../input/
└── travis_test.yaml

With this data set, you should be able to test most parts of Clusters. You can start with the test suite available in the tests directory. To do so, use:

python setup.py test

It will use the testdata that you have downloaded previously and run the tests. This is also useful if your goal is to add new tests.

Get the data

Raw DM stack outputs

If you have installed Clusters but do not have any data to run it on, you can use one of our re-processing outputs for MACSJ2243.3-0935. The corresponding configuration file is stored under configs/. To use it, you either need to be connected to CC-IN2P3, or change the path to the butler inside the config file (if you already have a copy of this data). You could also mount sps on your personal computer (see this how to).

`clusters_data.py` output

The first step of the Clusters package is clusters_data.py, which gets the data from the DM butler, converts them into astropy tables and saves them in a single hdf5 file. This step requires the LSST DM stack to be installed. If you want to skip it and try the code without having to install the DM stack, you can instead use the outputs of this first step, which you can download from this repository and which contains the following files:

|-- CL0016
|   |-- [4.4G]  CL0016_data.hdf5                     # full data set
|   |-- [334M]  CL0016_filtered_data.hdf5            # only quality-filtered galaxies
|   `-- [ 312]  CL0016.yaml                          # configuration file
|-- MACSJ224330935
|   |-- [5.6G]  MACSJ2243.3-0935_data.hdf5           # full data set
|   |-- [367M]  MACSJ2243.3-0935_filtered_data.hdf5  # only quality-filtered galaxies
|   |-- [ 329]  MACSJ2243.3-0935.yaml                # configuration file

This short tutorial explains how to use these hdf5 files to start an analysis.
