Coder Social home page Coder Social logo

rolling-quantiles's Introduction

Rolling Quantiles for NumPy

Python tests

Hyper-efficient and composable filters.

  • Simple, clean, intuitive interface.
  • Streaming or batch processing.
  • Python 3 bindings for a lean library written in pure C.

A Quick Tour

Let me give you but a superficial overview of this module's elegance.

import numpy as np
import rolling_quantiles as rq

pipe = rq.Pipeline( # rq.Pipeline is the only stateful object
  # declare a cascade of filters by a sequence of immutable description objects
  rq.LowPass(window=201, portion=100, subsample_rate=2),
    # the above takes a median (101th element out of 201) of the most recent 200
    # points and then spits out every other one
  rq.HighPass(window=10, portion=3))
    # that subsampled rolling median is then fed into this filter that takes a
    # 30% quantile on a window of size 10, and subtracts it from its raw input

# the pipeline exposes a set of read-only attributes that describe it
pipe.lag # = 60.0, the effective number of time units that the real-time output
         #   is delayed from the input
pipe.stride # = 2, how many inputs it takes to produce an output
            #  (>1 due to subsampling)


input = np.random.randn(1000)
output = pipe.feed(input) # the core, singular exposed method

# every other output will be a NaN to demarcate unready values
subsampled_output = output[1::pipe.stride]

Example Signal

That may be a lot to take in, so let me break it down for you:

  • rq.Pipeline(description...) constructs a filter pipeline from one or more filter descriptions and initializes internal state.
  • .feed(*) takes in a Python number or np.array and its output is shaped likewise.
  • The two filter types are rq.LowPass and rq.HighPass that compute rolling quantiles and return them as is, and subtract them from the raw signal respectively. Compose them however you like!
  • NaNs in the output purposefully indicate missing values, usually due to subsampling. If you pass a NaN into a LowPass filter, it will slowly deplete its reserve and continue to return valid quantiles until the window empties completely.
  • rq.LowPass and rq.HighPass alternatively take in a quantile=q argument, 0<=q<=1. The filters would perform a linear interpolation in this case. In order to control the statistical characteristics of this quantile estimate, parameters alpha and beta are exposed as well with default values (1, 1). Refer to SciPy's documentation for details on this aspect.
interpolated_pipe = rq.Pipeline(
    # attempt to estimate the exact 40%-quantile by the
    # default linear interpolation with parameters (1, 1)
    rq.LowPass(window=30, quantile=0.4),
    # here, the estimate is "approximately unbiased" in
    # the case of Gaussian white noise
    rq.HighPass(window=10, quantile=0.3,
      alpha=3/8, beta=3/8))

See this Wikipedia section for an elucidating overview.

I also expose a convenience function rq.medfilt(signal, window_size) at the top-level of the package to directly supplant scipy.signal.medfilt.

That's it! I detailed the entire library. Don't let the size of its interface fool you!

Installation

Downloads

If you are running Linux, MacOS, or Windows with Python 3.8+ and NumPy ~1.20, execute the following:

pip install rolling-quantiles

These are the conditions under which binaries are built and sent to the Python Package Index, which holds pip's packages. Should the NumPy version be unsuitable, for instance, I suggest building the package from source. This is rather straightforward because the handful of source files in C have absolutely minimal dependencies.

Building from Source

The meat of this package is a handful of C files with no external dependencies, besides NumPy 1.16+ and Python 3.7+ for the bindings located in src/python.c. As such, you may build from source by running the following from the project's root directory:

  1. cd python
  2. Check pyproject.toml to make sure the listed NumPy version matches your desired target. The compiled package will be forward- but not backward-compatible.
  3. python -m build (make sure this invokes Python 3)
  4. pip install dist/<name_of_generated_wheel_file>.whl

Note of Caution on MacOS Big Sur

Make sure to specify MACOSX_DEPLOYMENT_TARGET=10.X as a prefix to the build command, e.g. python -m build. The placeholder X can be any MacOS version earlier than Big Sur (I use 9.) By default, the build system would attempt to build for MacOS 11 that is incompatible with current Python interpreters that have been compiled against a prior version.

Benchmarking a median filter on 100 million doubles.

I make use of binary heaps that impart desirable guarantees on their amortized runtime. Realistically, their performance may depend on the statistics of the incoming signal. I pummeled the filters with Gaussian Brownian motion to gauge their practical usability under a typical drifting stochastic process.

window rolling_quantiles [1] scipy [2] pandas [3]
4 14 seconds 22 seconds 25 seconds
10 21 seconds 47 seconds 31 seconds
20 28 seconds 95 seconds 35 seconds
30 30 seconds 140 seconds 37 seconds
40 34 seconds 190 seconds 40 seconds
50 36 seconds 242 seconds 40 seconds
1,000 61 seconds N/A 62 seconds

Likewise, with simulated Gaussian white noise (no drift in the signal):

window rolling_quantiles [1] scipy [2] pandas [3]
4 14 seconds 22 seconds 25 seconds
10 20 seconds 51 seconds 31 seconds
20 25 seconds 105 seconds 36 seconds
30 27 seconds 156 seconds 39 seconds
40 30 seconds 218 seconds 41 seconds
50 30 seconds 279 seconds 42 seconds
1,000 45 seconds N/A 70 seconds

Intel(R) Core(TM) i7-8700T CPU @ 2.40GHz, single-threaded performance on Linux. My algorithm looked even better (relative to pandas) on a 2020 MacBook Pro. Check out this StackOverflow answer for a particular use case.

[1] rq.Pipeline(...)

[2] scipy.signal.medfilt(...)

[3] pd.Series.rolling(*).quantile(...)

Brought to you by Myrl

rolling-quantiles's People

Contributors

hoverhell avatar marmarelis avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

rolling-quantiles's Issues

Feature request: Centered roll

As is, the roll is left-aligned, or 'real-time'. This is great for some uses, but it would be great if there was an option to have a 'centered' option such that the edges are rolled all the way to the end (decreasing number of elements in sliding window towards the edges). Such options exist in rolling stats packages like pandas.rolling(center=True, min_periods=1).quantile(q).
I'm aware that a centered method can be achieved by effectively lagging the output by pipe.lag. However, it is not possible to recover the right edge of the output. To be clear, when you compare the output traces from these two functions, you can clearly see how rolling_quantiles is not able to achieve the same kind of output.

import numpy as np
import pandas as pd
import rolling_quantiles as rq
import matplotlib.pyplot as plt

import multiprocessing as mp
from functools import partial

def rp_pandas(x_in, win_len, ptile=10):
    x_in_df = pd.Series(x_in)
    x_out_pd = x_in_df.rolling(window=win_len, center=True, min_periods=1).quantile(quantile=(ptile/100))
    return np.array(x_out_pd)

def rp_rq_centered(x_in, win_len, ptile=10):
    pipe = rq.Pipeline( rq.LowPass(window=win_len, quantile=(ptile/100)) )
    lag = int(np.floor(pipe.lag))
    return pipe.feed(x_in)[lag:]

def mp_rolling_quantile(x_in, win_len, ptile, function):
    pool = mp.Pool(processes=None)
    results = pool.map(partial(function , win_len=win_len, ptile=ptile), [x_in[ii] for ii in np.arange(x_in.shape[0])])
    pool.close()
    pool.join()
    return np.row_stack(results)

x_in = np.random.rand(100, 100)
win_len = 11
ptile = 50


output_pd = mp_rolling_quantile(x_in, win_len, ptile, rp_pandas)
output_rq = mp_rolling_quantile(x_in, win_len, ptile, rp_rq_centered)

%matplotlib notebook
plt.figure()
plt.plot(x_in[0])
plt.plot(output_pd[0][0:], linewidth=2)
plt.plot(output_rq[0][0:], linewidth=2)
plt.legend(['x_in', 'pd: centered=True, min_periods=1', 'rq: centered'])

image

Feature request: parallel processing implementation

Hi, love the package. Thank you for making it.

I noticed that there is a block of code in the 'illustrations.py' file (here) demonstrating some code that hasn't been implemented yet in the code base:

rq.LineUp(rq.Pipeline) # possibly parallelized execution of parallel pipelines
big_input = np.random.randn(100, 1000)
# broadcast. route one row to each pipeline.
big_output = pipes.feed(big_input) # respects Fortran or C ordering to preserve cache locality

I would greatly benefit from being able to parallelize these computations. Multithreading and numba don't seem to improve speed through their parallelization methods. A CPU process monitor shows that most CPU cores are not being utilized, and there are expected errors in attempting to parallelize the rq.Pipeline object. Adding the LineUp feature to your package would be greatly appreciated.

Thanks, Rich

Impossible to install with python 3.9.X

Context:
python 3.9.12
Ubuntu 22
pip 22.1.1

This packages is impossible to install using pip on python 3.9.X:

pip install rolling-quantiles
ERROR: Could not find a version that satisfies the requirement rolling-quantiles (from versions: none)
ERROR: No matching distribution found for rolling-quantiles

Installation using setup.py fails


 python rolling-quantiles-1.0.0/python/setup.py install
running install
running bdist_egg
running egg_info
creating UNKNOWN.egg-info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'triton' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
INFO: C compiler: gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC

creating build
creating build/temp.linux-x86_64-3.9
creating build/temp.linux-x86_64-3.9/src
INFO: compile options: '-I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/lib/python3.9/site-packages/numpy/core/include -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c'
extra options: '-O3'
INFO: gcc: src/filter.c
INFO: gcc: src/heap.c
INFO: gcc: src/python.c
INFO: gcc: src/quantile.c
cc1: fatal error: src/filter.c: No such file or directory
compilation terminated.
cc1: fatal error: src/heap.c: No such file or directory
compilation terminated.
cc1: fatal error: src/python.c: No such file or directory
compilation terminated.
cc1: fatal error: src/quantile.c: No such file or directory
compilation terminated.
error: Command "gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/lib/python3.9/site-packages/numpy/core/include -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c src/filter.c -o build/temp.linux-x86_64-3.9/src/filter.o -O3" failed with exit status 1

Installation using Building from Source section also fails:

pip install build
Collecting build
  Using cached build-0.8.0-py3-none-any.whl (17 kB)
Collecting pep517>=0.9.1
  Using cached pep517-0.12.0-py2.py3-none-any.whl (19 kB)
Collecting tomli>=1.0.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting packaging>=19.0
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting pyparsing!=3.0.5,>=2.0.2
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Installing collected packages: tomli, pyparsing, pep517, packaging, build
Successfully installed build-0.8.0 packaging-21.3 pep517-0.12.0 pyparsing-3.0.9 tomli-2.0.1
(test-lev) andrei@andrei-ubuntu:~/worspace/test-LEV/rolling-quantiles-1.0.0/python$ python -m build
* Creating venv isolated environment...
* Installing packages in isolated environment... (numpy~=1.20, setuptools>=54, wheel)
* Getting dependencies for sdist...
* Installing packages in isolated environment... (numpy ~= 1.20)
* Building sdist...
running sdist
running egg_info
creating rolling_quantiles.egg-info
writing rolling_quantiles.egg-info/PKG-INFO
writing dependency_links to rolling_quantiles.egg-info/dependency_links.txt
writing requirements to rolling_quantiles.egg-info/requires.txt
writing top-level names to rolling_quantiles.egg-info/top_level.txt
writing manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
reading manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '.DS_Store' found anywhere in distribution
warning: no previously-included files matching 'pypi-token.txt' found anywhere in distribution
adding license file '../LICENSE'
writing manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
running check
creating rolling_quantiles-1.0.0
creating rolling_quantiles-1.0.0/rolling_quantiles
creating rolling_quantiles-1.0.0/rolling_quantiles.egg-info
creating rolling_quantiles-1.0.0/src
copying files to rolling_quantiles-1.0.0...
copying MANIFEST.in -> rolling_quantiles-1.0.0
copying README.md -> rolling_quantiles-1.0.0
copying pyproject.toml -> rolling_quantiles-1.0.0
copying setup.cfg -> rolling_quantiles-1.0.0
copying setup.py -> rolling_quantiles-1.0.0
copying ../LICENSE -> rolling_quantiles-1.0.0/..
copying rolling_quantiles/__init__.py -> rolling_quantiles-1.0.0/rolling_quantiles
copying rolling_quantiles.egg-info/PKG-INFO -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying rolling_quantiles.egg-info/SOURCES.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying rolling_quantiles.egg-info/dependency_links.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying rolling_quantiles.egg-info/requires.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying rolling_quantiles.egg-info/top_level.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying rolling_quantiles.egg-info/zip-safe -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
copying src/filter.c -> rolling_quantiles-1.0.0/src
copying src/heap.c -> rolling_quantiles-1.0.0/src
copying src/python.c -> rolling_quantiles-1.0.0/src
copying src/quantile.c -> rolling_quantiles-1.0.0/src
Writing rolling_quantiles-1.0.0/setup.cfg
Creating tar archive
removing 'rolling_quantiles-1.0.0' (and everything under it)
* Building wheel from sdist
* Creating venv isolated environment...
* Installing packages in isolated environment... (numpy~=1.20, setuptools>=54, wheel)
* Getting dependencies for wheel...
* Installing packages in isolated environment... (numpy ~= 1.20, wheel)
* Building wheel...
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-39
creating build/lib.linux-x86_64-cpython-39/rolling_quantiles
copying rolling_quantiles/__init__.py -> build/lib.linux-x86_64-cpython-39/rolling_quantiles
running build_ext
building 'triton' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
INFO: C compiler: gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC

creating build/temp.linux-x86_64-cpython-39
creating build/temp.linux-x86_64-cpython-39/src
INFO: compile options: '-I/tmp/build-env-sv7dbz4o/lib/python3.9/site-packages/numpy/core/include -I/tmp/build-env-sv7dbz4o/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c'
extra options: '-O3'
INFO: gcc: src/filter.c
INFO: gcc: src/heap.c
INFO: gcc: src/python.c
INFO: gcc: src/quantile.c
src/heap.c:17:10: fatal error: heap.h: No such file or directory
   17 | #include "heap.h"
      |          ^~~~~~~~
compilation terminated.
src/filter.c:17:10: fatal error: filter.h: No such file or directory
   17 | #include "filter.h"
      |          ^~~~~~~~~~
compilation terminated.
src/quantile.c:17:10: fatal error: quantile.h: No such file or directory
   17 | #include "quantile.h"
      |          ^~~~~~~~~~~~
compilation terminated.
src/python.c:24:10: fatal error: filter.h: No such file or directory
   24 | #include "filter.h"
      |          ^~~~~~~~~~
compilation terminated.
error: Command "gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/tmp/build-env-sv7dbz4o/lib/python3.9/site-packages/numpy/core/include -I/tmp/build-env-sv7dbz4o/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c src/filter.c -o build/temp.linux-x86_64-cpython-39/src/filter.o -O3" failed with exit status 1

ERROR Backend subprocess exited when trying to invoke build_wheel

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.