andgoldschmidt / derivative


Optimal numerical differentiation of noisy time series data in python.

Home Page: https://derivative.readthedocs.io/en/latest/

License: Other

Python 100.00%

differentiation numerical-differentiation experimental-data

derivative's Introduction

Numerical differentiation of noisy time series data in python

derivative is a Python package for differentiating noisy data. The package showcases a variety of improvements that can be made over finite differences when data is not clean.

Want to see an example of how derivative can help? This package is part of PySINDy (github.com/dynamicslab/pysindy), a sparse-regression framework for discovering nonlinear dynamical systems from data.

This package binds common differentiation methods to a single, easily implemented differentiation interface to encourage user adoption. Numerical differentiation methods for noisy time series data in python include:

Symmetric finite difference schemes using arbitrary window size.
Savitzky-Golay derivatives (aka polynomial-filtered derivatives) of any polynomial order with independent left and right window parameters.
Spectral derivatives with optional filter.
Spline derivatives of any order.
Polynomial-trend-filtered derivatives generalizing methods like total variational derivatives.
Kalman derivatives find the maximum likelihood estimator for a derivative described by a Brownian motion.
Kernel derivatives smooth a random process defined by its kernel (covariance).
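To make one item of the list concrete, here is a minimal numpy-only sketch of an unfiltered spectral derivative. It does not use the package and is not its implementation; the function name `spectral_derivative` is an assumption made for illustration, and the approach assumes a periodic, uniformly sampled signal.

```python
import numpy as np


def spectral_derivative(x, dt):
    """Differentiate a periodic, uniformly sampled signal with the FFT."""
    n = x.size
    # Angular wavenumbers in the same ordering that numpy's FFT uses.
    k = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    # Differentiation in time is multiplication by i*k in frequency space.
    return np.real(np.fft.ifft(1j * k * np.fft.fft(x)))


# Sample exactly one period without repeating the endpoint.
n = 64
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
dx = spectral_derivative(np.sin(t), t[1] - t[0])
max_error = np.max(np.abs(dx - np.cos(t)))
```

A filter, as mentioned in the list, would damp or zero the high-wavenumber entries before the inverse transform.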

from derivative import dxdt
import numpy as np

t = np.linspace(0,2*np.pi,50)
x = np.sin(t)

# 1. Finite differences with central differencing using 3 points.
result1 = dxdt(x, t, kind="finite_difference", k=1)

# 2. Savitzky-Golay using cubic polynomials to fit in a centered window of length 1
result2 = dxdt(x, t, kind="savitzky_golay", left=.5, right=.5, order=3)

# 3. Spectral derivative
result3 = dxdt(x, t, kind="spectral")

# 4. Spline derivative with smoothing set to 0.01
result4 = dxdt(x, t, kind="spline", s=1e-2)

# 5. Total variational derivative with regularization set to 0.01
result5 = dxdt(x, t, kind="trend_filtered", order=0, alpha=1e-2)

# 6. Kalman derivative with smoothing set to 1
result6 = dxdt(x, t, kind="kalman", alpha=1)

# 7. Kernel derivative with smoothing set to 1
result7 = dxdt(x, t, kind="kernel", sigma=1, lmbd=.1, kernel="rbf")
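To illustrate why the smoothing options above matter, the following numpy-only sketch (independent of the package) compares plain central differences with a sliding-window cubic fit on a noisy sine. The helper `local_poly_derivative` is a hypothetical stand-in for a Savitzky-Golay-style method, not the package's code, and the noise level and window size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
x_noisy = np.sin(t) + 0.01 * rng.standard_normal(t.size)


def local_poly_derivative(x, t, half_width=10, order=3):
    """Fit a polynomial in a sliding window and differentiate the fit."""
    dx = np.empty_like(x)
    for i in range(x.size):
        lo, hi = max(0, i - half_width), min(x.size, i + half_width + 1)
        # Shift the window so the evaluation point sits at tau = 0.
        coeffs = np.polyfit(t[lo:hi] - t[i], x[lo:hi], order)
        # The derivative of the fit at tau = 0 is the linear coefficient.
        dx[i] = coeffs[-2]
    return dx


def rmse(estimate, truth):
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


# np.gradient uses central differences in the interior of the array.
err_fd = rmse(np.gradient(x_noisy, t), np.cos(t))
err_poly = rmse(local_poly_derivative(x_noisy, t), np.cos(t))
```

With this setup the windowed fit gives a noticeably smaller error than central differences, because differencing adjacent samples divides the noise by a small time step.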

Contributors:

Thanks to the members of the community who have contributed!

Jacob Stevens-Haas

Kalman derivatives #12, and more!

References:

[1] Numerical differentiation of experimental data: local versus global methods - K. Ahnert and M. Abel

[2] Numerical Differentiation of Noisy, Nonsmooth Data - Rick Chartrand

[3] The Solution Path of the Generalized LASSO - R.J. Tibshirani and J. Taylor

[4] A Kernel Approach for PDE Discovery and Operator Learning - D. Long et al.

Citing derivative:

The derivative package is a contribution to PySINDy; this work has been published in the Journal of Open Source Software (JOSS). If you use derivative in your work, please cite it using the following reference:

Kaptanoglu et al., (2022). PySINDy: A comprehensive Python package for robust sparse system identification. Journal of Open Source Software, 7(69), 3994, https://doi.org/10.21105/joss.03994

@article{kaptanoglu2022pysindy,
    doi = {10.21105/joss.03994},
    url = {https://doi.org/10.21105/joss.03994},
    year = {2022},
    publisher = {The Open Journal},
    volume = {7},
    number = {69},
    pages = {3994},
    author = {Alan A. Kaptanoglu and Brian M. de Silva and Urban Fasel and Kadierdan Kaheman and Andy J. Goldschmidt and Jared Callaham and Charles B. Delahunt and Zachary G. Nicolaou and Kathleen Champion and Jean-Christophe Loiseau and J. Nathan Kutz and Steven L. Brunton},
    title = {PySINDy: A comprehensive Python package for robust sparse system identification},
    journal = {Journal of Open Source Software}
    }


derivative's Issues

License is missing from the repository and the pypi source tarball

It is typical to package a license file with the distributed source code. It would be helpful for downstream packaging, for example.

doc requirements in [tool.poetry.dev-dependencies]?

Hey Andy, 👋, I'm starting to work on the Kalman stuff and trying to make sure the docs build correctly. Apparently, poetry automatically installs the dev dependencies but not optional dependencies, which must be specified individually. Can I copy all the documentation optional dependencies into the [tool.poetry.dev-dependencies] section of pyproject.toml?

Related: I can't build the docs even with the optional dependencies; I mention this in #12.

Also, I have no idea why my /env/ folder isn't showing up since there's no .gitignore to ignore it, but maybe some poetry magic?

Spatial derivatives in 2+ dimensions

The current package provides derivative methods for time series.

All the methods included here can be extended to apply to spatial derivatives of arbitrary dimension.

There is no current timeline for adding this feature.
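As a rough sketch of what such an extension could look like, a 1D routine can be applied along each axis of a gridded field to obtain partial derivatives. The helper `partial_derivative` is hypothetical and is not part of the package; `np.gradient` stands in for any 1D differentiation method.

```python
import numpy as np


def partial_derivative(u, coords, axis, diff_1d=np.gradient):
    """Apply a 1D differentiation routine along one axis of an N-D array.

    diff_1d is any callable f(values, coords) -> derivative for 1D inputs.
    """
    return np.apply_along_axis(lambda v: diff_1d(v, coords), axis, u)


x = np.linspace(0, 1, 101)
y = np.linspace(0, 2, 201)
X, Y = np.meshgrid(x, y, indexing="ij")
u = X**2 * Y  # test field u(x, y) = x^2 * y

du_dx = partial_derivative(u, x, axis=0)  # analytically 2 * x * y
du_dy = partial_derivative(u, y, axis=1)  # analytically x^2
```

This only covers methods that act independently on each 1D slice; global methods with coupled dimensions would need more thought.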

Hyperparameter optimization plugins?

Hey Andy, I've been trying out a few hyper-parameter optimization methods for Kalman smoothing from recent papers. While this isn't as established as the main methods of derivative, I'd love to be able to rely on this package for the main Kalman parts.

It feels like this would be best as an entry point. Entry points are a form of Dependency Inversion. That way, users (like me) could write the hyperparameter optimization code in a somewhat throw-away fashion, and derivative could call that code with only a few generic lines added to this package. No need to review code or determine whether an algorithm meets some acceptability criteria.

Not sure if you've grappled with this kind of a programming problem or architecture in the past, but I could draft a quick PR to show how it works?
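For readers unfamiliar with entry points, here is a minimal sketch of the discovery side using only the standard library. The group name "derivative.hyperparam_opt" and the function `load_optimizers` are assumptions made up for this illustration; nothing like them is confirmed to exist in the package.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that plugin packages would register under.
GROUP = "derivative.hyperparam_opt"


def load_optimizers(group=GROUP):
    """Return {name: loaded object} for every plugin registered in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


optimizers = load_optimizers()
```

A plugin package would advertise its function under the same group name in its own packaging metadata, so the host package never has to import or review the plugin code directly.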

Support for multidimensional data

Hi there, great library. Are you thinking of adding support for multivariate data at all?

Git tag version releases

Hey Andy, I'm trying to bump the requirement specifier for pysindy and looking for the first release that has a particular code change. The git repo only has tags for 0.3.1 and 0.4.2. Not sure what your release workflow is, but could you add a git tag when you release?

Run Jupyter notebooks in documentation

Current build config does not re-run the example jupyter notebook when building docs (It was causing an error that RTD ignores, see #30). This means the results from running notebook examples are no longer viewable in the docs, so this should be restored / the RTD error addressed or an alternative workflow should be introduced.

Create functional style api

To add backwards compatibility with derivative.py, a function-style api should be implemented on top of this project's derivative interface.

E.g. from derivative.py/derivative/derivative.py on 05/04/2020:

methods = {}


def register(name=""):
    def inner(f):
        n = name or f.__name__
        methods[n] = f
        return f
    return inner

from .loc import *
from .glob import *

def derivative(x, y, kind="finitediff", **kwargs):
    method = methods.get(kind)
    f = lambda y: method(x, y, **kwargs)
    if len(y.shape) > 1:
        return np.apply_along_axis(f, 0, y)
    else:
        return f(y)
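The registry pattern in the excerpt above can be exercised in isolation. The sketch below re-creates it in a self-contained form; the registered method `gradient_method` is a hypothetical stand-in based on `np.gradient`, and the explicit error for unknown names is an addition not present in the excerpt.

```python
import numpy as np

methods = {}


def register(name=""):
    """Decorator that stores a function in `methods` under `name`."""
    def inner(f):
        methods[name or f.__name__] = f
        return f
    return inner


@register("finitediff")
def gradient_method(x, y):
    """Hypothetical stand-in method: np.gradient of y with respect to x."""
    return np.gradient(y, x)


def derivative(x, y, kind="finitediff", **kwargs):
    if kind not in methods:
        raise ValueError(f"Unknown method {kind!r}; choose from {sorted(methods)}")
    f = lambda col: methods[kind](x, col, **kwargs)
    # For 2D input, differentiate each column (axis 0 is the sample axis).
    return np.apply_along_axis(f, 0, y) if y.ndim > 1 else f(y)


t = np.linspace(0, 1, 11)
dy = derivative(t, np.column_stack([t, 2 * t]))
```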

Pass or store smoothed coordinate values

Some derivative methods smooth the coordinate points while calculating the derivatives, and these smoothed coordinates can be useful for SINDy. I think the list of methods that smooth coordinates is:

Savitzky-Golay
spline
trend-filtered
Kalman

There are several options for the interface:

functional: return a tuple of x, x_dot from Derivative.compute_for() and Derivative.d(), and possibly dxdt().
object-oriented: Derivative.x is set when calling Derivative.compute_for(). Subclasses that do not smooth x should probably still set the value to maintain the interface. It could be set in Derivative._global(), but I'm not sure how to deal with Savitzky-Golay, since local methods don't need to compute the derivative for all points.

As I see it, the pros/cons boil down to "explicit is better than implicit" for the functional pattern vs. backwards compatibility for the object-oriented pattern. Setting a new object attribute after initialization is implicit; changing the return type is explicit. The object-oriented pattern requires that derivative objects remain public, and users like pysindy would need a more complicated calling convention, giving up

return dxdt(x, t, axis=0, **self.kwargs)

in favor of

deriv = derivative.methods[kind](**self.kwargs)
x_dot = deriv.d(x, t, axis)
x = deriv.x
return x, x_dot

That said, backwards compatibility is always important. I'm happy to build a PR of either.
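The two options can be contrasted with a toy example. Everything here is hypothetical: `moving_average`, `smooth_and_differentiate`, and `ToyDerivative` are illustrative names, and the moving average stands in for whatever smoothing a real method performs.

```python
import numpy as np


def moving_average(x, width=5):
    """Toy smoother standing in for a real method (hypothetical)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


# Option 1, functional: return both arrays explicitly.
def smooth_and_differentiate(x, t):
    x_smooth = moving_average(x)
    return x_smooth, np.gradient(x_smooth, t)


# Option 2, object-oriented: store the smoothed values on the instance.
class ToyDerivative:
    def __init__(self):
        self.x = None  # set as a side effect of d()

    def d(self, x, t):
        self.x = moving_average(x)
        return np.gradient(self.x, t)


t = np.linspace(0, 1, 50)
x = t**2
x_smooth, x_dot = smooth_and_differentiate(x, t)

deriv = ToyDerivative()
x_dot_oo = deriv.d(x, t)  # the smoothed values are now in deriv.x
```

Both routes yield the same numbers; the difference is whether the caller sees the smoothed values in the return type or has to know to read an attribute afterwards.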

Move away from poetry?

Hey @andgoldschmidt, what are your thoughts about moving away from poetry? At the very least, adding poetry.lock to .gitignore would prevent all the dependabot PRs that deal with dependencies of dependencies. Thanks to #14, I only use poetry as a backend on this project (and pip and venv for the rest). There are two related suggestions in line with removing the poetry backend: (a) changing the pyproject.toml project data to the format in PEP 621 and (b) changing the backend to setuptools_scm instead of poetry.core. The former adds flexibility for different backends, and the latter allows removing all the version strings in the code and extracting version information from git tags.

Relevant SO question

I understand this is very much a matter of personal workflows and preference, so I don't want to take a strong position on the toml stuff. But I definitely recommend removing poetry.lock from the git repo to limit the dependabot PRs.

axis default parameter

I noticed that the default value for axis throughout derivative.py is 1. What's the rationale for using 1 instead of 0? The only reason I ask is because we've got to set axis=0 when using derivative objects in PySINDy (to conform to sklearn's conventions and those in the original SINDy paper, the 0 axis corresponds to time and the 1 axis to features).
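The effect of the axis choice can be shown with plain numpy, independent of the package. In this sketch `np.gradient` stands in for a derivative method, and the data layout follows the sklearn convention described above.

```python
import numpy as np

t = np.linspace(0, 1, 100)

# sklearn-style layout: rows are time samples, columns are features.
X = np.column_stack([np.sin(t), np.cos(t)])  # shape (100, 2)

# Time on axis 0: differentiate down the rows.
dX_time_first = np.gradient(X, t, axis=0)

# The transposed layout needs axis=1 to produce the same numbers.
dX_time_last = np.gradient(X.T, t, axis=1)   # shape (2, 100)
```

With a default of axis=1, callers holding time-first data must either pass axis=0 on every call or transpose on the way in and out.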

Wavelet derivatives

Model the data as a sparse linear combination of wavelets and compute the derivative of the wavelet basis.

Might consider using SURE shrinkage from Donoho, D. L. and Johnstone, I. M. for the fit with some modifications to compute derivatives. Other options include skimage or pywavelets packages.

Add utilities for periodic boundary conditions

The functions should allow differentiation methods to accept a "periodic" flag that changes the boundary conditions used by a given method to periodic.

A pad for "mirror" boundaries may also be easy to add.

One possible solution is to add padding functions to the utilities (utils.py). In some of the global methods, it's not clear what the definition of a periodic boundary should be.
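A padding utility of this kind could be sketched with `np.pad`, as below. The function `periodic_central_difference` is a hypothetical example for a local method only; it does not address the open question of what periodicity means for the global methods.

```python
import numpy as np


def periodic_central_difference(x, dt):
    """Central differences with periodic boundaries via wrap-around padding."""
    # mode="wrap" copies the last sample before the first and vice versa.
    padded = np.pad(x, 1, mode="wrap")
    # After padding by one sample per side, the output length matches x.
    return (padded[2:] - padded[:-2]) / (2 * dt)


# One full period, endpoint excluded so the wrap is seamless.
n = 100
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
dt = t[1] - t[0]
dx = periodic_central_difference(np.sin(t), dt)
max_error = np.max(np.abs(dx - np.cos(t)))
```

For the "mirror" variant, `np.pad` also offers mode="reflect" and mode="symmetric", which differ in whether the edge sample itself is repeated.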

Numerical tests don't assert; also is there a better way of testing?

I noticed compare() doesn't assert the numerics, it just issues a warning. All tests currently pass with the change to assert, so I assume it's OK to submit a PR that would make them assert? It would be helpful to have this in place when changing the repo for #15.

I noticed #15 was causing issues with x having multiple time series (e.g. x.shape==(2,100)), so I also added some tests. ~~Some existing methods fail on either the median or residual tests even when they pass on the individual series.~~
EDIT: Solved, it was for very aperiodic tests and spectral derivatives.

That brings up another question: Is there a reason to test median and standard deviation of the residual, instead of inf norm, MAE, MSE?

EDIT: Okay, ran a lot of tests and median seems better than 1-norm/MAE for variable numbers of timepoints.
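To make the comparison of metrics concrete, here is a small sketch that computes the candidates side by side and asserts on two of them. It is not the repository's compare() function; `residual_metrics`, `assert_close`, and the tolerances are hypothetical, and taking the median of the absolute residual is an assumption about what "median" means here.

```python
import numpy as np


def residual_metrics(estimate, truth):
    """Summarize the residual between an estimated and a true derivative."""
    r = np.asarray(estimate) - np.asarray(truth)
    abs_r = np.abs(r)
    return {
        "median": float(np.median(abs_r)),  # assumption: median of |residual|
        "std": float(np.std(r)),
        "inf_norm": float(np.max(abs_r)),
        "mae": float(np.mean(abs_r)),
        "mse": float(np.mean(r**2)),
    }


def assert_close(estimate, truth, median_tol, std_tol):
    """Fail the test, rather than warn, when a tolerance is exceeded."""
    m = residual_metrics(estimate, truth)
    assert m["median"] < median_tol, f"median residual {m['median']:.3g}"
    assert m["std"] < std_tol, f"residual std {m['std']:.3g}"


t = np.linspace(0, 2 * np.pi, 200)
estimate = np.gradient(np.sin(t), t)
metrics = residual_metrics(estimate, np.cos(t))
assert_close(estimate, np.cos(t), median_tol=1e-2, std_tol=1e-2)
```

The median ignores a few large boundary errors that dominate the inf norm, which is one plausible reason it behaves more consistently as the number of time points changes.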

Derivatives of empty arrays?

From differentiation.dxdt:

An empty X results in an empty derivative.

In the interest of Zen of Python #10, "Errors should never pass silently," should dxdt() instead raise an error? It feels like returning an empty array instead of failing is an attempt to be cleverly helpful, but it is beyond the remit of these functions. Callers IMO should be responsible for the check:

if x.size:
  x_dot = dxdt(...)

Noticed when delving into an error in Derivative.x() in #15. I was refactoring the shape manipulation to be used by both x() and d(), and it gets (minorly) more verbose when handling empty arrays.
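The alternative behavior proposed here could look like the following sketch. `dxdt_strict` is a hypothetical wrapper, not a function in the package, and `np.gradient` stands in for the real differentiation call.

```python
import numpy as np


def dxdt_strict(x, t, differentiate=np.gradient):
    """Hypothetical wrapper that refuses empty input instead of echoing it."""
    x = np.asarray(x)
    if x.size == 0:
        raise ValueError("Cannot differentiate an empty array.")
    return differentiate(x, t)


t = np.array([0.0, 1.0, 2.0])
x_dot = dxdt_strict(2.0 * t, t)

try:
    dxdt_strict(np.array([]), np.array([]))
except ValueError as err:
    message = str(err)
```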

Differentiation in arrays where len(X.shape)>2?

Pertaining to this issue,

While trying to differentiate an array with len(X.shape) == 3, for example X.shape = (10, 100, 2) with t = np.arange(0, 10, 0.1), I get the error "Invalid Shape of X" because X is not 2-dimensional.
Without having to manipulate the shape of X, is it possible to implement differentiation of multidimensional arrays?
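Until something like this is supported natively, one workaround is to wrap a 2D-only routine so the reshaping happens in one place. The sketch below is an assumption-laden illustration: `differentiate_nd` and `toy_diff_2d` are hypothetical names, and `toy_diff_2d` stands in for a routine that accepts input of shape (n_series, n_time).

```python
import numpy as np


def differentiate_nd(X, t, axis, diff_2d):
    """Apply a routine that only handles (n_series, n_time) input to N-D data."""
    X = np.asarray(X)
    moved = np.moveaxis(X, axis, -1)           # put the time axis last
    flat = moved.reshape(-1, moved.shape[-1])  # collapse the other axes
    d_flat = diff_2d(flat, t)
    # Restore the collapsed axes, then move time back to where it was.
    return np.moveaxis(d_flat.reshape(moved.shape), -1, axis)


def toy_diff_2d(X2, t):
    """Stand-in for a 2D-only derivative routine (time along axis 1)."""
    return np.gradient(X2, t, axis=1)


t = np.arange(0, 10, 0.1)
X = np.random.default_rng(0).standard_normal((10, t.size, 2))
dX = differentiate_nd(X, t, axis=1, diff_2d=toy_diff_2d)
```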