gamma-speed's Introduction

Gammapy

A Python Package for Gamma-ray Astronomy.

Gammapy is an open-source Python package for gamma-ray astronomy built on Numpy, Scipy and Astropy. It is used as the core library for the Science Analysis tools of the Cherenkov Telescope Array (CTA), is recommended by the H.E.S.S. collaboration for science publications, and is already widely used in the analysis of data from existing gamma-ray instruments such as MAGIC, VERITAS and HAWC.

Contributing Code, Documentation, or Feedback

The Gammapy Project is made both by and for its users, so we welcome and encourage contributions of many kinds. Our goal is to keep this a positive, inclusive, successful, and growing community by abiding with the Gammapy Community Code of Conduct.

The Gammapy project uses a mechanism known as a Developer Certificate of Origin (DCO). The DCO is a binding statement asserting that you are the creator of your contribution and that you wish to allow Gammapy to use your work, which will cite you as a contributor. More detailed information on contributing to the project or submitting feedback can be found on the Contributing page.

Licence

Gammapy is licensed under a 3-clause BSD style license - see the LICENSE.rst file.

Supporting the project

The Gammapy project is not sponsored; development is carried out by the staff of the supporting institutes during their research time. Contributions of any kind, whether occasional or regular, are therefore encouraged.

Status shields

(mostly useful for developers)

  • Codacy
  • GitHub actions CI

gamma-speed's People

Contributors: cdeil, ignatndr

gamma-speed's Issues

Pipeline issues

  1. The single-core pipeline takes much longer to execute than expected. Why?
  2. Why does the pipeline sometimes crash?

(plot: save_speed_up)

Check out basic monitoring tools

For a given process we want to measure time series of CPU, memory and disk usage, e.g. log these quantities to an ASCII file every second.

Please read the manuals or tutorials for these tools and see what kind of measurements they provide:

  • top and variants like atop and htop
  • iostat
  • ...

Things to pay attention to:

  • Is it possible to log to file? (If not it's pretty much useless for us.)
  • Is it possible to only monitor one process? How do you tell the tool which process you want to monitor?

Make a wiki page to document what you find out.
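A minimal sketch of such a per-process logger, assuming the third-party psutil package is available (the import is guarded so the helpers below still load without it; names are illustrative, not the repo's monitor.py):

```python
import time

try:
    import psutil  # third-party; assumed installed via pip
except ImportError:
    psutil = None

def format_sample(t, cpu_percent, rss_bytes):
    # One log line: elapsed seconds, CPU %, resident memory in MiB.
    return "%.1f %.1f %.1f" % (t, cpu_percent, rss_bytes / 1024.0 / 1024.0)

def monitor(pid, logfile, interval=1.0):
    # Log CPU and memory usage of one process to an ASCII file,
    # one sample per `interval` seconds, until the process exits.
    proc = psutil.Process(pid)
    start = time.time()
    with open(logfile, "w") as f:
        while proc.is_running():
            cpu = proc.cpu_percent(interval=interval)
            rss = proc.memory_info().rss
            f.write(format_sample(time.time() - start, cpu, rss) + "\n")
```

Unlike top or iostat, this trivially answers both questions above: it logs to file, and it watches exactly one process selected by PID.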

Some random links I found with Google that might be helpful:

Model and measure ctlike runtime

There should be simple approximate models for the ctlike runtime.

  • Unbinned: t = t(n_threads, n_obs, n_events)
  • Binned: t = t(n_threads, n_obs, n_bins)

E.g. a model for the unbinned case could be

t = A + B * (n_obs / n_threads) + C * (n_events / n_threads)

... or not ... we noticed that the unbinned ctlike runtime was the same in this case:

  • n_threads = 3, n_obs = 100, n_events = 1700k
  • n_threads = 3, n_obs = 100, n_events = 36k
    i.e. a factor of 50 in the number of events didn't matter.

In detail the runtime will of course also depend e.g. on the model and model parameter start values, but there should be regimes with simple runtime scaling behaviours.
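The unbinned model above can be sketched as a simple function; the coefficient values here are made-up placeholders that would have to be determined by fitting measured runtimes:

```python
def predict_runtime(n_threads, n_obs, n_events, A=1.0, B=0.05, C=1e-5):
    # Unbinned ctlike runtime model: t = A + B*(n_obs/n_threads) + C*(n_events/n_threads)
    # A, B, C are hypothetical coefficients, to be fit to measurements.
    return A + B * (n_obs / n_threads) + C * (n_events / n_threads)
```

The n_threads = 3, n_obs = 100 observation above would then suggest that the C term is negligible in that regime.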

Make monitor work on Mac

$ ./monitor.py "../scripts/use_cpu"
Traceback (most recent call last):
  File "./monitor.py", line 143, in <module>
    main()
  File "./monitor.py", line 140, in main
    cpuinterval=args.timeinterval)
  File "./monitor.py", line 46, in monitor
    self.process.get_io_counters()[0],
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/psutil/__init__.py", line 881, in __getattribute__
    %(self.__class__.__name__, name))
AttributeError: Popen instance has no attribute 'get_io_counters'
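The error is consistent with io_counters not being implemented on macOS in psutil (and with the get_io_counters → io_counters rename in psutil 2.0). A version- and platform-tolerant accessor could look like this (a sketch, not the project's actual fix):

```python
def get_io(proc):
    # Try the new psutil name first, then the old one; psutil's wrappers
    # raise AttributeError where a counter is unsupported (e.g. on macOS),
    # so fall back to None in that case.
    for name in ("io_counters", "get_io_counters"):
        fn = getattr(proc, name, None)
        if fn is not None:
            return fn()
    return None
```

monitor.py would then skip the disk I/O columns when get_io() returns None instead of crashing.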

Write Python prototype parallel likelihood function

Some parts of the evaluation of the likelihood function can be parallelised: e.g. as described on slides 13 to 15 here, the model has to be evaluated for a large number of bins, and the fit statistic then computed and summed over those bins.

We should create an IPython notebook that implements certain steps in parallel using the Python multiprocessing module (an example is here), so that we can do some prototyping and timing to find for which data sizes or model-function evaluation costs the splitting gives good speedups.
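As a sketch of the bin-splitting idea (using the Cash statistic as a stand-in fit statistic; function names are illustrative):

```python
import math
from multiprocessing import Pool

def cash_chunk(args):
    # Cash fit statistic for one chunk of bins: 2 * sum(m - n*ln(m)),
    # with n = observed counts and m = model prediction per bin.
    counts, model = args
    return 2.0 * sum(m - n * math.log(m) for n, m in zip(counts, model))

def parallel_cash(counts, model, n_workers=4):
    # Split the bins into chunks, evaluate each chunk in a worker
    # process, then sum the partial fit statistics.
    size = max(1, len(counts) // n_workers)
    chunks = [(counts[i:i + size], model[i:i + size])
              for i in range(0, len(counts), size)]
    with Pool(n_workers) as pool:
        return sum(pool.map(cash_chunk, chunks))
```

Timing parallel_cash against a serial sum for different bin counts would show where process startup and pickling overhead outweighs the speedup.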

install_ctools.py deadlocks

The script manages to download the github repositories but when it has to execute ./configure it deadlocks. Below is the output I get:

$ ./install_ctools.py -log=True
INFO - Creating installation: extralog_install
remote: Counting objects: 27847, done.
remote: Compressing objects: 100% (6177/6177), done.
remote: Total 27847 (delta 21970), reused 27469 (delta 21595)
Receiving objects: 100% (27847/27847), 82.90 MiB | 1.33 MiB/s, done.
Resolving deltas: 100% (21970/21970), done.
remote: Counting objects: 2953, done.
remote: Compressing objects: 100% (1331/1331), done.
remote: Total 2953 (delta 1626), reused 2927 (delta 1600)
Receiving objects: 100% (2953/2953), 3.47 MiB | 450 KiB/s, done.
Resolving deltas: 100% (1626/1626), done.
INFO - software successfuly downloaded
INFO - Entered GAMMALIB install
Switched to a new branch 'gammaspeed_extra_log'
configure.ac:74: installing `./config.guess'
configure.ac:74: installing `./config.sub'
configure.ac:35: installing `./install-sh'
configure.ac:35: installing `./missing'
configure: WARNING: Python wrapper(s) missing. Requires swig for wrapper generation.
config.status: WARNING:  'src/gammalib-setup.in' seems to ignore the --datarootdir setting
libtool: link: warning: `-version-info/-version-number' is ignored for convenience libraries
libtool: link: warning: `-version-info/-version-number' is ignored for convenience libraries
libtool: link: warning: `-version-info/-version-number' is ignored for convenience libraries
^CERROR - GAMMALIB install failed
make[3]: *** [GCOMSupport.lo] Error 1
make[2]: *** [all-recursive] Interrupt
make[1]: *** [all-recursive] Interrupt
make: *** [all] Interrupt

I killed the installer with ^C after it stopped making progress. Usage of ./install_ctools.py:

./install_ctools.py -log=True for the extra-logging version
./install_ctools.py -gen=True for the normal version

My best guesses are:

  1. Popen is trying to execute configure, make and make install at the same time, but I am not sure about that.
  2. Somehow, proc.wait() is sending the whole thing into a deadlock.
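A common cause of such hangs is a child process whose stdout/stderr PIPE fills up while the parent blocks in proc.wait(). One safe pattern (a sketch with illustrative names, not the script's actual code) is to redirect output to a log file, or to use communicate() instead of wait():

```python
import subprocess

def run_step(cmd, logfile):
    # Send child output to a file so no PIPE buffer can fill up and
    # block the child while the parent sits in proc.wait().
    with open(logfile, "a") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        return proc.wait()
```

Running configure, make and make install as three sequential run_step() calls, each checked for a zero return code, also rules out guess 1.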

Efficiency issues

The parallel parts of the ctools code do not achieve perfect efficiency. In theory the speedup should be linear, i.e. speedup = number of cores; in practice the efficiency is less than perfect.

Below are the efficiency and speedup of ctobssim for the parallel part of the code.
(plot: ctobssim_amdahl_paralle_speed_up)
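The ideal-vs-measured comparison follows Amdahl's law; a small helper to compute the expected speedup and efficiency for a given parallel fraction p (p itself has to be estimated from the measurements):

```python
def amdahl_speedup(p, n):
    # Amdahl's law: p is the parallelisable fraction of the runtime,
    # n is the number of cores.
    return 1.0 / ((1.0 - p) + p / n)

def efficiency(p, n):
    # Efficiency = speedup / cores; equals 1.0 only when p == 1.
    return amdahl_speedup(p, n) / n
```

E.g. even with 90% of the runtime parallelised, 4 cores give only about a 3.1x speedup (77% efficiency), which is the kind of curve the plot above shows.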

Measure FITS I/O performance

I'd like to measure some FITS I/O (raw, and how it's used in gammalib) to see whether the performance is good, and compare it to ROOT and HDF5.

I started some docs (they only contain some links at the moment).
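A minimal timing harness for such benchmarks (the astropy import is an assumption and is guarded; time_call itself works for any I/O call):

```python
import time

try:
    from astropy.io import fits  # third-party; assumed installed
except ImportError:
    fits = None

def time_call(func, *args, **kwargs):
    # Return (result, elapsed wall-clock seconds) for a single call.
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - t0

# Example (requires astropy and a test file):
#   hdus, dt = time_call(fits.open, "events.fits")
```

The same wrapper can time ROOT or HDF5 reads, so the three formats get compared with identical methodology.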

Check out profiling tools

This is related to issue #6; some tools do monitoring and/or profiling.

By "monitoring" CPU / memory / disk I/O usage I mean looking at a process as a whole.
By "profiling" I mean, in addition to looking at the total process, also looking at where in the code the CPU spends time (the main focus), allocates memory, and does disk I/O.

Here are some profiling tools:
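For the pure-Python parts of the pipeline, the standard library's cProfile already gives the per-function breakdown described above; a small wrapper:

```python
import cProfile
import io
import pstats

def profile(func, *args):
    # Run func under cProfile and return the top of the report,
    # sorted by cumulative time, as a string.
    pr = cProfile.Profile()
    pr.enable()
    func(*args)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
    return buf.getvalue()
```

For the C++ code in gammalib/ctools, OS-level profilers (e.g. gprof, perf, or Instruments on Mac) are needed instead.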

Understand optimizer algorithm

@ignatndr Before attempting to profile ctlike we should understand the optimizer method used, i.e. the Levenberg–Marquardt algorithm:
https://cta-redmine.irap.omp.eu/projects/gammalib/wiki/GOptimizerLM

The Wikipedia article seems like a good starting point, with tons of references:
http://en.wikipedia.org/wiki/Levenberg–Marquardt_algorithm
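To build intuition before profiling, here is a toy one-parameter Levenberg–Marquardt fit; it is a sketch of the damping idea only, not GOptimizerLM's implementation:

```python
def levenberg_marquardt_1d(xs, ys, a=0.0, lam=1e-3, n_iter=50):
    # Fit y = a*x by least squares, illustrating LM damping:
    # take a damped Gauss-Newton step, accept it only if chi2 improves,
    # and adapt the damping factor lam accordingly.
    def chi2(a_):
        return sum((y - a_ * x) ** 2 for x, y in zip(xs, ys))
    for _ in range(n_iter):
        g = sum(-x * (y - a * x) for x, y in zip(xs, ys))  # half of d(chi2)/da
        H = sum(x * x for x in xs)                         # Gauss-Newton Hessian
        a_new = a - g / (H * (1.0 + lam))                  # damped step
        if chi2(a_new) < chi2(a):
            a, lam = a_new, lam / 10.0   # accept step, reduce damping
        else:
            lam *= 10.0                  # reject step, increase damping
    return a
```

The accept/reject loop is the part that matters for runtime: each iteration needs a full model and fit-statistic evaluation, which is exactly the work ctlike parallelises.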

Some links:

Find/build tool for file access log

Since disk I/O is proving to be more difficult to monitor than was originally foreseen, it would probably be useful to find a tool that can show the order in which different files have been accessed (for read or write) by a certain process.
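At the OS level this is what strace (Linux) or dtruss (Mac) provide. For the Python side of a pipeline, CPython's audit hooks can record file opens in order within the current process; a sketch (requires Python 3.8+, and the hook cannot be removed once installed):

```python
import sys

def install_open_logger(log):
    # Append the path of every file opened by this Python process,
    # in access order, to the `log` list.
    def hook(event, args):
        if event == "open":
            log.append(args[0])
    sys.addaudithook(hook)
```

This only sees opens made by the Python interpreter itself, so for gammalib/ctools binaries the strace route is still needed.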
