Fermi High Energy Explorer (FHEE)

This is the repository for a Python tutorial for gamma-ray astronomers given by Axel Donath and Christoph Deil in November 2015 at the PyGamma15 workshop.

It's for advanced beginners, i.e. gamma-ray astronomers who have used FITS files and written a script that uses Python and Astropy before, but don't yet know much about Python functions, classes, modules, packages, tests, and docs.

If you want to follow along in the tutorial by running the code examples and doing the exercise at the end, please follow the instructions in the Preparation / Requirements section below and install the required software before the tutorial starts!

How does the tutorial work?

The basic idea of the tutorial is to start with some bad Python code and incrementally turn it into better Python code. We will go from buggy spaghetti code to a well-structured Python package with tests and docs and functionality that can be re-used (installed and imported from other packages and shared with colleagues as a tarball).

This will be mostly a demo, where we do live coding and explain what's going on. In the second half we'll introduce a bunch of Python development tools (e.g. pytest to run tests or sphinx to generate HTML documentation); that part is easier to follow along with by running the commands yourself.

We've structured the tutorial into a series of TODO steps and put the starting point for each step of this project into folders called vXX. You can also use these during or after the tutorial to try stuff out.

At the end we'll leave 30 minutes for an exercise where you get to apply the newly learned skills and extend the package by writing a function with docs and tests.

What's the fhee package?

In this tutorial we'll create the Fermi-LAT High Energy Explorer (fhee) package together. The goal is to write some code to find the highest-energy photons near 2FHL catalog sources.

The input data files are:

  • data/gll_psch_v08.fit.gz -- The Fermi-LAT 2FHL catalog (downloaded from FSSC)
  • data/2fhl_events.fits.gz -- The event list corresponding to 2FHL (obtained from Marco Ajello with permission to share publicly on November 12, 2015)

We've added soft links to some of the versions of this package to avoid duplicating the data files (a few MB) in the git repo. E.g. v01/gll_psch_v08.fit.gz is a soft link to data/gll_psch_v08.fit.gz.

Goals

The goals of this tutorial are:

  • Take your Python skills to the next level, from writing a script for yourself to writing re-usable, maintainable code that would be appropriate for a contribution to the open-source packages we'll be sprinting on at this workshop.
  • Introduce you to some Python developer tools that will help you if you decide to do more Python coding from now on.

Do all of this using a very small toy problem / package, which you can use as a playground during and after the tutorial.

Hopefully you'll have some fun and find the example we've chosen interesting!

Preparation / Requirements

If you want to follow along during the tutorial by coding and running commands yourself and doing the exercise, you should git clone the https://github.com/gammapy/fhee repo (or download it as a zip file and extract it).

And you should install the following software listed here.

We recommend you follow the instructions here (if there's no conda package, use pip) to install scientific Python versions 2.7 and 3.4 side by side (one of the things we'll explain is the differences between Python 2 and 3):

The following packages / tools we only use as command line tools, not Python packages, and they work the same whether you install them using Python 2 or 3. So it's enough to install them in one version of Python (the Python 3.4 version is fine):

We'll also demo PyCharm ... install it if you'd like to try it out, but if you have another editor you like for Python programming, that's OK, too:

  • PyCharm - The most intelligent Python IDE

Tutorial playbook

Intro

  • Clone the repository with git clone https://github.com/gammapy/fhee.git, if you haven't done so yet.

  • Take a few minutes to read the analysis script in v01/analyse_data.py and try to understand what it does. Some questions you might address are:

  • Is the code easy to understand?

  • Do you trust the results, or are there likely bugs in the code?

  • Is the code efficient enough?

  • Are the results easily reproducible?

  • Might the code (or some parts of the code) be useful for others?

v01 to v02 -- Improve code

Fix bugs:

  • Use e.g. from IPython import embed; embed() to interactively check the code.
  • Python uses zero-based indexing; fix lines 16, 17 and 56
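
As a reminder of what zero-based indexing means in practice, here is a minimal sketch. The energies and variable names are made up for illustration and are not taken from v01/analyse_data.py.

```python
# Hypothetical photon energies in GeV; not taken from the tutorial data.
energies = [52.0, 310.5, 87.3, 1204.8]

first = energies[0]                  # the first element has index 0, not 1
last = energies[len(energies) - 1]   # the last valid index is len - 1
also_last = energies[-1]             # negative indices count from the end

# range(n) yields 0 .. n-1, so it visits every element exactly once.
visited = [energies[i] for i in range(len(energies))]

# An off-by-one loop such as range(1, len(energies) + 1) skips the first
# element and then runs past the end of the list.
try:
    [energies[i] for i in range(1, len(energies) + 1)]
    off_by_one_ok = True
except IndexError:
    off_by_one_ok = False
```

Bugs of this kind are easy to spot interactively, which is why dropping into IPython right before the suspicious line is useful.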

Improve code style:

  • Move 'import' statements to the top of the file
  • Put whitespace around operators like +, -, *, /, =, ...
  • Put whitespace after ','
  • Don't use the names of Python builtins such as list, dict, etc. as variable names
  • Use inline comments to explain what the code does, but don't comment on obvious things
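
To see why shadowing a builtin name is more than a style issue, consider this small sketch. Both functions are hypothetical and only serve to illustrate the point.

```python
def count_unique_bad(values):
    # Shadowing the builtin: inside this function "list" is now a plain list
    # object, so calling it like the builtin raises a TypeError.
    list = [1, 2, 3]
    return len(list(set(values)))


def count_unique_good(values):
    # A descriptive name leaves the builtin untouched.
    unique_values = list(set(values))
    return len(unique_values)
```

Calling count_unique_good([1, 1, 2]) works as expected, whereas count_unique_bad fails with "'list' object is not callable", an error message that can be confusing when the shadowing assignment is many lines away.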

Use suitable, existing data structures:

  • Replace lines 6 - 13 by using an astropy.table.Table. Use self-explanatory variable names such as catalog_2fhl and event_list_2fhl.
  • Check table attributes and methods like .colnames, .sort() and .show_in_browser()
  • Replace lines 28 - 38 using an astropy.table.Table
  • Use astropy.units
  • Use astropy.coordinates

Improve code efficiency:

  • Replace the first Python loop (line 16 - 22) with a corresponding numpy expression
  • Replace lines 44 - 49 using table indexing/masking
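
To illustrate what replacing a Python loop with a numpy expression and a boolean mask looks like, here is a minimal sketch. The event positions, the source position and the flat-sky distance formula are assumptions chosen to keep the example short; they are not the computation in v01/analyse_data.py.

```python
import numpy as np

# Hypothetical event positions in degrees, not taken from the tutorial data.
event_lon = np.array([10.0, 10.2, 11.5, 9.9])
event_lat = np.array([-5.0, -5.1, -4.0, -5.3])
event_energy = np.array([60.0, 450.0, 120.0, 95.0])  # GeV

source_lon, source_lat = 10.0, -5.0
radius = 0.5  # deg

# Loop version: one Python-level iteration per event.
offsets_loop = []
for lon, lat in zip(event_lon, event_lat):
    dlon = (lon - source_lon) * np.cos(np.radians(source_lat))
    offsets_loop.append(np.sqrt(dlon ** 2 + (lat - source_lat) ** 2))

# Vectorised version: the same arithmetic applied to whole arrays at once.
dlon = (event_lon - source_lon) * np.cos(np.radians(source_lat))
offsets = np.sqrt(dlon ** 2 + (event_lat - source_lat) ** 2)

# Boolean mask: True for events inside the circle; indexing keeps only those.
mask = offsets < radius
selected_energy = event_energy[mask]
```

The same mask can be used to index an astropy table. For real sky positions the separation should come from astropy.coordinates rather than from the flat-sky approximation used here.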

Refactor code into classes / functions:

  • Write a 'Catalog' class that is initialized with the filename and stores the data in a 'self.table' attribute. Add a method 'get_source_by_name()' that returns the corresponding row.
  • Write an 'EventList' class with a method 'select_events_in_circle()', that returns a corresponding subset of the event list
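
The shape of these two classes could look roughly like the sketch below. To keep it runnable without data files or astropy, it makes several assumptions that differ from the tutorial: rows are passed in directly as dictionaries instead of being read from a filename into a table, the keys 'lon', 'lat' and 'energy' are hypothetical, and the offset uses a flat-sky approximation.

```python
import math


class Catalog(object):
    """Source catalog (sketch: rows are passed in directly, not read from a file)."""

    def __init__(self, rows):
        self.table = list(rows)

    def get_source_by_name(self, name):
        """Return the row whose 'Source_Name' matches ``name``."""
        for row in self.table:
            if row['Source_Name'] == name:
                return row
        raise KeyError('No source named {!r}'.format(name))


class EventList(object):
    """Event list (sketch: rows are dicts with 'lon', 'lat' in deg and 'energy')."""

    def __init__(self, rows):
        self.table = list(rows)

    def select_events_in_circle(self, lon, lat, radius):
        """Return a new EventList with the events within ``radius`` deg of (lon, lat)."""
        selected = []
        for event in self.table:
            # Flat-sky approximation, good enough for a sketch.
            dlon = (event['lon'] - lon) * math.cos(math.radians(lat))
            offset = math.hypot(dlon, event['lat'] - lat)
            if offset < radius:
                selected.append(event)
        return EventList(selected)


catalog = Catalog([
    {'Source_Name': 'source A', 'lon': 10.0, 'lat': -5.0},
    {'Source_Name': 'source B', 'lon': 50.0, 'lat': 20.0},
])
events = EventList([
    {'lon': 10.1, 'lat': -5.1, 'energy': 450.0},
    {'lon': 49.0, 'lat': 22.0, 'energy': 120.0},
])

source = catalog.get_source_by_name('source A')
nearby = events.select_events_in_circle(source['lon'], source['lat'], radius=0.5)
```

Returning a new EventList from select_events_in_circle(), rather than a bare list, means selections can be chained and the result keeps all the methods of the class.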

Make the classes reusable:

  • Add docstrings to the classes and methods using triple quotes.

  • Write a main function find_2fhl_highest_energy_events()

  • Add if __name__ == '__main__': to the script and call the main function with the corresponding parameters.

  • Now code should be roughly like v02
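
The docstring and main-guard pattern described above can be sketched as follows. The function find_highest_energy and its input values are hypothetical stand-ins for the tutorial's main function, which works on the catalog and event list.

```python
def find_highest_energy(energies):
    """Return the largest value in ``energies``.

    Hypothetical stand-in for the tutorial's main function.
    """
    return max(energies)


def main():
    """Entry point: run the analysis and print the result."""
    result = find_highest_energy([52.0, 310.5, 87.3])
    print('Highest energy: {}'.format(result))
    return result


# True only when the file is run as a script, not when it is imported
# from another module.
if __name__ == '__main__':
    main()
```

With the guard in place, other code can import the functions from this file without triggering the analysis as a side effect.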

v02 to v03 -- Add a setup.py

  • Start with v02
  • Put data files in data folder (adapt source code) and rename module to fhee.py:
$ tree .
.
├── data
│   ├── 2fhl_events.fits.gz -> ../../data/2fhl_events.fits.gz
│   └── gll_psch_v08.fit.gz -> ../../data/gll_psch_v08.fit.gz
├── fhee.py
└── setup.py
  • Write a setup.py so that the code can be installed. See here and here for an example how to write it:
from setuptools import setup

setup(
    name='fhee',
    version='0.1',
    py_modules=['fhee'],
)
  • Show how python setup.py install installs Python packages using a virtualenv:
$ pyvenv-3.4 --system-site-packages venv
$ source venv/bin/activate
$ python setup.py install
  • Now code should be roughly like v03.
  • One issue we still have is that the data files aren't installed, i.e. running the code from the installed version won't work. We'll fix this in the next version, after re-structuring the single-file module into a multi-file package.

v03 to v04 -- Restructure module to package

  • Start with v03
  • Restructure into a package:
.
├── Makefile
├── fhee
│   ├── __init__.py
│   ├── app.py
│   ├── catalog.py
│   ├── data
│   │   ├── 2fhl_events.fits.gz -> ../../../data/2fhl_events.fits.gz
│   │   └── gll_psch_v08.fit.gz -> ../../../data/gll_psch_v08.fit.gz
│   ├── event_list.py
│   └── tests
│       ├── __init__.py
│       ├── test_app.py
│       ├── test_catalog.py
│       └── test_event_list.py
└── setup.py
  • The Makefile is only there to clean up generated files; it's not related to setup.py or needed for Python.
$ cat Makefile
clean:
	rm -rf dist *.egg-info build
	find . -name "*.pyc" -exec rm {} \;
	find . -name __pycache__ | xargs rm -fr
  • The setup.py file has changed a bit: it now also installs the data files and declares some metadata about our package:
from setuptools import setup

setup(
    name='fhee',
    version='1.0',
    description='Fermi high-energy explorer',
    url='https://github.com/gammapy/fhee',
    packages=['fhee', 'fhee.tests'],
    install_requires=['numpy', 'astropy'],
    package_data={
        'fhee': ['data/*'],
    },
    license='MIT',
)
  • Explain imports
    • implicit relative (only works on Python 2, don't use this!)
    • explicit relative (OK)
    • absolute (OK)
  • Add 2 lines of boilerplate to every source file:
# Licensed under a 3-clause BSD style license - see LICENSE.rst
from __future__ import absolute_import, division, print_function, unicode_literals

The future import makes Python 2 behave more like Python 3. (All of our code was already Python 2 / 3 compatible, so it doesn't make a difference in this case, but it usually does.)
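
The three import styles listed above can be tried out with a throwaway package. The sketch below writes a tiny package to a temporary folder and imports its modules; the package name demo_pkg and all module names in it are hypothetical and have nothing to do with fhee.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (hypothetical name, not part of fhee).
root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'demo_pkg')
os.mkdir(package_dir)

files = {
    '__init__.py': '',
    'demo_catalog.py': "NAME = 'catalog'\n",
    # Explicit relative import: the leading dot means "this package".
    'app_relative.py': 'from .demo_catalog import NAME\n',
    # Absolute import: the full dotted path from the top-level package.
    'app_absolute.py': 'from demo_pkg.demo_catalog import NAME\n',
    # Implicit relative import: worked on Python 2 only.
    'app_implicit.py': 'from demo_catalog import NAME\n',
}
for filename, source in files.items():
    with open(os.path.join(package_dir, filename), 'w') as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

relative = importlib.import_module('demo_pkg.app_relative')
absolute = importlib.import_module('demo_pkg.app_absolute')

# On Python 3 the implicit form looks for a top-level module and fails.
try:
    importlib.import_module('demo_pkg.app_implicit')
    implicit_works = True
except ImportError:
    implicit_works = False
```

On Python 3 the first two imports succeed and the implicit one raises an ImportError, which is why the implicit form should be avoided in code meant to run on both versions.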

  • Now code should be roughly like v04

  • To run pytest:

py.test fhee
  • To make a coverage report pip install pytest-cov and run
py.test fhee --cov=fhee --cov-report html
open htmlcov/index.html
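
For readers who haven't written a test yet: py.test collects functions whose names start with test_ from files named test_*.py, and reports a failure whenever a plain assert is false. A minimal sketch, using a hypothetical helper highest_energy that is not part of fhee:

```python
def highest_energy(energies):
    """Return the largest energy (hypothetical helper, not part of fhee)."""
    if not energies:
        raise ValueError('energies must not be empty')
    return max(energies)


def test_highest_energy():
    # A plain assert is all pytest needs; on failure it shows both values.
    assert highest_energy([52.0, 310.5, 87.3]) == 310.5


def test_highest_energy_empty():
    # Check the error case without importing pytest.
    try:
        highest_energy([])
    except ValueError:
        return
    assert False, 'expected a ValueError for an empty list'
```

In a real test file the error case is usually written with the pytest.raises context manager; the try / except form is used here only to keep the sketch free of imports.
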
  • Finally, let's add some documentation for fhee. Like most Python projects, we'll use the Sphinx tool and the reStructuredText (RST) markup language.
  • We'll use sphinx-quickstart and answer some questions to generate an index.rst, conf.py and Makefile:
mkdir docs
cd docs
sphinx-quickstart # answer questions interactively
make html
open _build/html/index.html
  • Next we write some high-level docs in RST (titles, code blocks, references), re-running make html and refreshing the browser to see if the markup is OK.
  • To auto-generate API documentation we'll use the sphinx.ext.autodoc (include documentation from docstrings) and sphinx.ext.napoleon extension (support for Numpy style docstrings), and we'll tell Sphinx to add .. to sys.path, which implies that Sphinx finds and imports the fhee package from the source folder (which is one level up from the docs folder):
import os
import sys

sys.path.insert(0, os.path.abspath('..'))
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']

Now we can use the automodule directive at the place in the docs where we want the API docs:

.. automodule:: fhee
   :members:
  • Comment: Sphinx is approximately as good / bad as LaTeX or Doxygen. It's very powerful and extensible, but also very complex. There are some things we haven't set up yet, e.g. intersphinx, multi-page documentation, running the Sphinx build via python setup.py build_sphinx, adding plots, ... As an example of a relatively small package with relatively nice docs, check out https://github.com/astropy/astroplan.
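
For autodoc and napoleon to produce nicely formatted API docs, the docstrings need to follow the Numpy docstring layout. Here is a minimal sketch of that layout on a hypothetical function (select_above_energy is not part of fhee):

```python
def select_above_energy(energies, threshold):
    """Select the energies above a threshold.

    Hypothetical example function, not part of fhee.

    Parameters
    ----------
    energies : list of float
        Event energies in GeV.
    threshold : float
        Minimum energy in GeV.

    Returns
    -------
    selected : list of float
        The entries of ``energies`` that are larger than ``threshold``.

    Examples
    --------
    >>> select_above_energy([52.0, 310.5, 87.3], threshold=80.0)
    [310.5, 87.3]
    """
    return [energy for energy in energies if energy > threshold]
```

The underlined section headers (Parameters, Returns, Examples) are what napoleon recognises and converts to RST; a header with a typo or an underline of the wrong length is a typical source of Sphinx warnings.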

Code analysis and transformation tools

  • This is an optional part, will be skipped if we're short on time...
  • cd bad_code and see the bad.py file
  • python-modernize and six - Python 2 / 3 compatible code
  • pep8 and autopep8, static code analysis

Exercise

Mission: Possible!

Your mission, should you decide to accept it, is to use the rest of the tutorial (30 min?) to apply your newly learned skills to extend the fhee package with a new function that finds the 2FHL sources with the highest-energy event nearby.

  • Find a partner and do pair programming!

  • Decide who will be the "driver" and who will be the "observer".

  • Start with a clean version of the repo and the v04 folder and a new feature branch:

git status # should show no changes
git checkout -b most-energetic
cd v04

There are three main steps:

  1. Add code
  2. Add tests
  3. Add docs

Actually these steps can be done in any order, and there is a lot to be said for test-driven development (write the test first) or documentation-driven development (write the docs first).

In practice it's often a creative and iterative procedure ... write one first test, write some code to make it pass, write another test, write some more code, interactively debug and fix issues with IPython, add a docstring, do some more coding, add one more test, then the high-level docs and make a pull request.

Everyone has to find their own workflow that's effective for them, and different workflows work better or worse for different cases.

Here are detailed instructions on what to do:

  • Add a function find_most_energetic_2fhl_sources in the file fhee/app.py that takes arguments n_sources and radius and returns an astropy.table.Table with n_sources rows (sorted by highest-energy event near that source) and columns Source_Name, Event_Energy (TeV), Event_Offset (deg).

  • Add a test function test_find_most_energetic_2fhl_sources in the file fhee/tests/test_app.py that executes

table = find_most_energetic_2fhl_sources(n_sources=3, radius=0.5)

and then does a few assert statements on the result table:

assert len(table) == 4
source = table[2] # get the third row
assert source['Source_Name'] == 'spam'
assert source['Event_Energy'] == 42.4

All of these assert statements will fail because the reference values are incorrect. For floating-point assertions you should use numpy.testing.assert_allclose. It is instructive to see what failing test reports from pytest look like though, so please take some time to play with this. We'd even encourage you to add code that raises errors (e.g. 1/0 will raise a ZeroDivisionError) in various places to practice reading pytest error reports (e.g. in test_find_most_energetic_2fhl_sources, in find_most_energetic_2fhl_sources, as well as at the top level of the modules containing these functions, so that the error happens during test collection, not test running).
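
The reason for preferring numpy.testing.assert_allclose over == for floats can be seen in a short sketch. The numbers are arbitrary and the tolerances are assumptions; choose values that match the precision you actually expect.

```python
from numpy.testing import assert_allclose

computed = 0.1 + 0.2
expected = 0.3

# Exact comparison fails because of binary floating-point rounding.
exact_match = (computed == expected)

# assert_allclose passes when the values agree within the tolerance ...
assert_allclose(computed, expected, rtol=1e-7)

# ... and raises an AssertionError with a readable report when they do not.
try:
    assert_allclose(42.4, 42.0, rtol=1e-3)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

In a test function you would call assert_allclose directly, without the try / except, so that pytest prints the mismatch report.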

  • Check the code coverage for the new code you added.

  • Add a docstring to your new function and add it to the __all__ list at the top of the file. Then re-run the Sphinx build and see if the API docs for your function show up and are correctly formatted.

Sphinx errors can sometimes be hard to pinpoint and resolve, so please intentionally insert some reStructuredText and Numpy docstring formatting errors and try to understand the Sphinx warnings and errors.

Add a one-sentence description and a code example to the high-level narrative docs in docs/index.rst and include a link to the new function.

  • If you've never made a contribution on GitHub before, you can make a pull request with your code if you like.

    (This is just to practice git / GitHub a bit. We won't merge it anyway, so no worries if the code is unfinished or you didn't get around to writing tests or docs.)

Wrap-up

Some questions:

  • Who learned something new?
  • Did you accept / complete the mission?
  • Do you think the fhee package is good code now?
  • Do you think it is worth the extra effort to make the code modular, add tests, docs, package it up?
  • What factor in time do you think it takes to go from working Python script to production-quality code? 2, 5, 10, 20 times as long?

Some comments:

  • As you saw, it is possible to create and share a Python package via GitHub and PyPI within a few hours.
  • We think that it's great if you do that for code you've written that's useful for your colleagues!
  • But there's also a concern. There are thousands of small open-source Python packages written by one person that are somewhat useful, but aren't used much and are unmaintained, because the author moved on to another project or job.
  • So start your own project if you like, but please also consider contributing to an existing package! We think that fewer, higher-quality packages, each with a small community of users and developers / maintainers, are better, and starting such collaborations is an explicit goal of the PyGamma15 workshop.
