perf-timer's Introduction


PerfTimer

An indispensable performance timer for Python

Background

Taxonomy

Three general tools should be employed to understand the CPU performance of your Python code:

  1. sampling profiler - measures the relative distribution of time spent among functions or lines of code during a program session. Limited by sampling resolution. Does not provide call counts, and results cannot easily be compared between sessions.
  2. microbenchmark timer (timeit) - accurately times a contrived code snippet by running it repeatedly
  3. instrumenting timer - accurately times a specific function or section of your code during a program session

PerfTimer is a humble instance of #3. It's the easiest way (least amount of fuss and effort) to get insight into call count and execution time of a function or piece of code during a real session of your program.
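The pattern behind such an instrumenting timer can be sketched in a few lines of stdlib Python. This is a toy illustration of the concept, not PerfTimer's implementation:

```python
import time

class ToyTimer:
    """Toy instrumenting timer: accumulates duration and call count
    for a code section wrapped in a `with` block."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total = 0.0

    def __enter__(self):
        self._t0 = time.perf_counter()

    def __exit__(self, *exc):
        self.total += time.perf_counter() - self._t0
        self.count += 1

timer = ToyTimer('demo')
for _ in range(3):
    with timer:
        sum(range(1000))
print(f'timer "{timer.name}": {timer.count} runs, {timer.total:.6f} s total')
```

PerfTimer builds on this idea with pluggable observers, low overhead, and automatic reporting.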

Use cases include:

  • check the effects of algorithm tweaks, new implementations, etc.
  • confirm the performance of a library you are considering under actual use by your app (as opposed to upstream's artificial benchmarks)
  • measure CPU overhead of networking or other asynchronous I/O (currently supported: OS threads, Trio async/await)

Yet another code timer?

It seems everyone has tried their hand at writing one of these timer utilities. Implementations can be found in public repos, snippets, and PyPI; there's even a Python feature request. That's not counting all the proprietary and one-off instances.

Features of this library:

  • flexible - use as a context manager or function decorator; pluggable logging, timer, and observer functions
  • low overhead (typically a few microseconds) - can be employed in hot code paths or even enabled on production deployments
  • async/await support (Trio only) - first of its kind! Periods when a task is sleeping, blocked by I/O, etc. will not be counted.
  • percentile durations - e.g. report the median and 90th percentile execution time of the instrumented code. Implemented with a bounded-memory, streaming histogram.

Usage

Typical usage is to create a PerfTimer instance at the global scope, so that aggregate execution time is reported at program termination:

from perf_timer import PerfTimer

_timer = PerfTimer('process thumbnail')

def get_thumbnail_image(path):
    img = cache.get_thumbnail(path)
    if not img:
        img = read_image(path)
        with _timer:
            img.decode()
            img.resize(THUMBNAIL_SIZE)
        cache.set_thumbnail(img)
    return img

When the program exits, assuming get_thumbnail_image was called several times, execution stats will be reported to stdout as follows:

timer "process thumbnail": avg 73.1 µs ± 18.0 µs, max 320.5 µs in 292 runs

decorator style

To instrument an entire function or class method, use PerfTimer as a decorator:

@PerfTimer('get thumbnail')
def get_thumbnail_image(path):
    ...

histogram statistics

By default PerfTimer will track the average, standard deviation, and maximum of observed values. Other available observers include HistogramObserver, which reports (customizable) percentiles:

import random
import time
from perf_timer import PerfTimer, HistogramObserver

_timer = PerfTimer('test', observer=HistogramObserver, quantiles=(.5, .9))
for _ in range(50):
    with _timer:
        time.sleep(random.expovariate(1/.1))

del _timer  # force stats to be reported now rather than at program exit

output:

timer "test": avg 117ms ± 128ms, 50% ≤ 81.9ms, 90% ≤ 243ms in 50 runs

custom logging

A custom logging function may be passed to the PerfTimer constructor:

import logging

_logger = logging.getLogger()
_timer = PerfTimer('process thumbnail', log_fn=_logger.debug)

OS thread support

To minimize overhead, PerfTimer assumes single-thread access. Use ThreadPerfTimer in multi-thread scenarios:

from perf_timer import ThreadPerfTimer

_timer = ThreadPerfTimer('process thumbnail')
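The extra cost of the thread-safe variant comes from serializing updates to the shared statistics. The idea can be sketched with a lock and per-thread start times; again a toy illustration, not the library's implementation:

```python
import threading
import time

class ToyThreadTimer:
    """Toy thread-safe timer: a lock guards the shared aggregates,
    and start times are kept in thread-local storage."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total = 0.0
        self._lock = threading.Lock()
        self._local = threading.local()  # per-thread start times

    def __enter__(self):
        self._local.t0 = time.perf_counter()

    def __exit__(self, *exc):
        elapsed = time.perf_counter() - self._local.t0
        with self._lock:  # serialize updates to shared state
            self.count += 1
            self.total += elapsed

timer = ToyThreadTimer('demo')

def work():
    for _ in range(100):
        with timer:
            sum(range(100))

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(timer.count)  # 400
```

The lock acquisition on every observation is why ThreadPerfTimer shows higher per-observation overhead in the benchmark below.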

async support

In the earlier thumbnail example, timing the entire function would include file I/O time, since PerfTimer measures wall time by default. For programs that do I/O via the Trio async/await library, use TrioPerfTimer, which measures time only while the current task is executing:

from perf_timer import TrioPerfTimer

@TrioPerfTimer('get thumbnail')
async def get_thumbnail_image(path):
    img = cache.get_thumbnail(path)
    if not img:
        img = await read_image(path)
        img.decode()
        img.resize(THUMBNAIL_SIZE)
        cache.set_thumbnail(img)
    return img

(Open challenge: support other async/await libraries)

trio_perf_counter()

This module also provides the trio_perf_counter() primitive. Following the semantics of the various performance counters in Python's time module, trio_perf_counter() provides high resolution measurement of a Trio task's execution time, excluding periods where it's sleeping or blocked on I/O. (TrioPerfTimer uses this internally.)

from perf_timer import trio_perf_counter

async def get_remote_object():
    t0 = trio_perf_counter()
    msg = await read_network_bytes()
    obj = parse(msg)
    print('task CPU usage (seconds):', trio_perf_counter() - t0)
    return obj

Installation

pip install perf-timer

Measurement overhead

Measurement overhead is important. The smaller the timer's overhead, the less it interferes with the normal timing of your program, and the tighter the code loop it can be applied to.
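You can get a rough feel for per-observation overhead on your own machine by timing a no-op instrumented block. This is a crude stdlib sketch (the project's own benchmark is benchmarks/overhead.py):

```python
import time
from contextlib import contextmanager

@contextmanager
def noop_timer():
    # minimal stand-in for an instrumenting timer: read the clock
    # on entry and exit, then discard the result
    t0 = time.perf_counter()
    yield
    _ = time.perf_counter() - t0

N = 100_000
t0 = time.perf_counter()
for _ in range(N):
    with noop_timer():
        pass
per_obs = (time.perf_counter() - t0) / N
print(f'approx. overhead per observation: {per_obs * 1e6:.2f} µs')
```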

The values below represent the typical overhead of one observation, as measured on ye old laptop (2014 MacBook Air 11 1.7GHz i7).

$ pip install -r test-requirements.txt
$ python benchmarks/overhead.py
compare observers:
    PerfTimer(observer=AverageObserver):         1.5 µs
    PerfTimer(observer=StdDevObserver):          1.8 µs  (default)
    PerfTimer(observer=HistogramObserver):       6.0 µs

compare types:
    PerfTimer(observer=StdDevObserver):          1.8 µs
    ThreadPerfTimer(observer=StdDevObserver):    9.8 µs
    TrioPerfTimer(observer=StdDevObserver):      4.8 µs

TODO

  • features
  • project infrastructure
    • code coverage integration
    • publish docs
    • type annotations and check

perf-timer's People

Contributors

belm0

perf-timer's Issues

subject to ref-counting bugs of other modules

If the decorator variant of PerfTimer is used in a program with ref-counting bugs, stats may never be emitted at program exit.

For example, there is apparently a ref-counting bug in numba (numba/numba#5799), and I've observed this bad interaction between PerfTimer and numba.

TODO: consider atexit

trio async timings should include child tasks

For perf-timer async support (Trio), I implemented the smallest possible thing: within a task, measure time excluding periods where the task is de-scheduled (sleeping, blocked on i/o or an event, etc.)

However, there is a strong use case for including the time spent in descendant tasks of the instrumented task (spawned by a nursery). This aggregate time would directly correlate to CPU usage. For example, if you measured the duration of async function foo() including child tasks as 5 s, while the total application runtime was 100 s, that means foo() was responsible for 5% of the CPU use.

PerfTimer with trio.to_thread.run_sync

Hello John,

I bumped into the issue below.

import trio
from perf_timer import PerfTimer


def dummy():
    pass


@PerfTimer("handle_events_01")
async def handle_events() -> None:
    # await trio.to_thread.run_sync(dummy)
    await trio.sleep(1)


async def async_main():
    with trio.move_on_after(10.0):
        while True:
            await handle_events()
            await trio.sleep(0)


trio.run(async_main)

PerfTimer (TrioPerfTimer) reports `timer "handle_events_01": average 1.01 s, max 1.02 s in 9 runs`, but when I uncomment the line `await trio.to_thread.run_sync(dummy)` it no longer shows anything at the end of the program.

Am I missing something?

Thanks for your help in advance.

Btw the lib is really helpful. 👍

feature request: min µs

from the examples in https://github.com/belm0/perf-timer:

timer "process thumbnail": avg 73.1 µs ± 18.0 µs, max 320.5 µs in 292 runs

  • average is massively cool
  • deviation is massively cool
  • max is massively cool
  • number of runs is also very cool

why is min missing?!? that makes no sense to me.

`time.thread_time` is not available on some OS X builds

It appears `time.thread_time` is not available on macOS (https://bugs.python.org/issue32093), and thus trying to use this package raises ImportError 😢

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-277-0f19c673cd74> in <module>
----> 1 from perf_timer import PerfTimer, HistogramObserver

~/opt/anaconda3/envs/py37/lib/python3.7/site-packages/perf_timer/__init__.py in <module>
----> 1 from ._impl import (PerfTimer, ThreadPerfTimer, AverageObserver,
      2                     StdDevObserver, HistogramObserver, measure_overhead)
      3 try:
      4     from ._trio import trio_perf_counter, TrioPerfTimer
      5 except ImportError:

~/opt/anaconda3/envs/py37/lib/python3.7/site-packages/perf_timer/_impl.py in <module>
      8 from inspect import iscoroutinefunction
      9 from multiprocessing import Lock
---> 10 from time import perf_counter, thread_time
     11 
     12 from perf_timer._histogram import ApproximateHistogram

ImportError: cannot import name 'thread_time' from 'time' (unknown location)
