cgobat / asymmetric_uncertainty


A package for handling numeric quantities with asymmetric uncertainties.

Home Page: https://github.com/cgobat/asymmetric_uncertainty/wiki

License: GNU General Public License v3.0

Languages: Python 31.56%, Jupyter Notebook 68.44%
Topics: uncertainty-propagation, error-propagation, python, error-analysis

asymmetric_uncertainty's Introduction


$\mathcal{I}$...

  • work in the Space Mission Directorate at SwRI–Boulder.
  • studied astrophysics at GWU.
  • develop operations software and data pipelines (both on-board and on the ground) for space science missions, and lead the development of
    • the PDS4 archive data preparation software for both the Lucy and New Horizons missions.
    • the PUNCH mission operations center's telemetry/CCSDS data processing pipeline.
  • do scientific programming, data analysis, and observational astronomy.
  • ❤ Python and free software.

Featured repositories:

My Pinned repositories section below showcases some notable community projects I've contributed to.


asymmetric_uncertainty's People

Contributors

cgobat, mikemoss3, sevenstarknight


asymmetric_uncertainty's Issues

Compatibility with `astropy.units`

The a_u object type is not currently compatible with astropy.units. Adding this functionality would greatly enhance the package's usefulness across the physical sciences, especially in astronomy & astrophysics, which is perhaps its most notable use case.

My initial thought is that the most straightforward way to do this will be to make a_u a subclass of astropy.units.Quantity.
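For reference, astropy.units.Quantity is itself a subclass of np.ndarray, so an a_u subclass would need to cooperate with NumPy's subclassing protocol (a `__new__`/`__array_finalize__` pair) for its extra attributes to survive slicing and arithmetic. Below is a minimal sketch of that machinery using a plain ndarray subclass as a stand-in (`TaggedArray` and its attributes are illustrative only, not part of either package):

```python
import numpy as np

class TaggedArray(np.ndarray):
    """Stand-in illustrating the subclassing protocol Quantity relies on.

    A hypothetical a_u subclass of Quantity would need this same
    __new__/__array_finalize__ pair so extra attributes (here `plus`
    and `minus`) survive slicing and arithmetic.
    """
    def __new__(cls, values, plus=0.0, minus=0.0):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.plus = plus
        obj.minus = minus
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # called for views, slices, and ufunc results: carry the
        # attributes over from the source array (with safe defaults)
        self.plus = getattr(obj, "plus", 0.0)
        self.minus = getattr(obj, "minus", 0.0)

x = TaggedArray([1.0, 2.0, 3.0], plus=0.1, minus=0.2)
print(x[1:].plus)     # attributes survive slicing
print((x * 2).minus)  # ...and arithmetic
```

Actually propagating the uncertainties through operations would additionally require `__array_ufunc__` (or operator overloads), which is where the asymmetric propagation rules would live.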

Add unit tests

Issue for tracking implementation/addition of unit tests and CI.

Not able to use the package

Hi! I have been trying to use the package in my code, but it keeps giving an error. I tried Jupyter notebooks, plain Python, and Google Colab. Please advise.

CDF does not behave as expected.

Currently, if the user calls cdf on a single value, the method always returns 1. This is because cdf is currently implemented as:
return np.cumsum(self.pdf(x))/np.sum(self.pdf(x))
but the cumulative sum should run from some point far to the left up to x, and the normalizing sum should be over the entire PDF, not just up to x. This is implemented properly in the cdfplot method, where neg_x and pos_x are defined:

neg_x = np.linspace(self.value-(num_sigma*self.minus),self.value,discretization)
pos_x = np.linspace(self.value,self.value+(num_sigma*self.plus),discretization)

So, you could change the cdf method to something like:

def cdf(self, x, num_sigma=5, discretization=100):
    neg_x = np.linspace(self.value-(num_sigma*self.minus), self.value, discretization)
    pos_x = np.linspace(self.value, self.value+(num_sigma*self.plus), discretization)
    x_arr = np.array(list(neg_x)+list(pos_x))

    if hasattr(x, '__len__'):
        ret = np.zeros(shape=len(x))
        for i in range(len(x)):
            ret[i] = np.sum(self.pdf(x_arr[:np.argmin(x[i] >= x_arr)])) / np.sum(self.pdf(x_arr))
    else:
        ret = np.sum(self.pdf(x_arr[:np.argmin(x >= x_arr)])) / np.sum(self.pdf(x_arr))

    return ret

(The if hasattr() check is how I typically handle methods that have to compare either a single value or an array of values to some array; I don't know if it's the best way and would love any improvement you may know of.)

Handling correlations between variables

This package currently has very limited capacity for correctly handling correlations between variables. For example, for two quantities with uncertainties $x\pm\sigma_x$ and $y\pm\sigma_y$, $(x-x) \not\equiv (x-y)$, even if $x=y$ and $\sigma_x = \sigma_y$. This is because $x-x\equiv0\pm0$, but $x-y=0\pm\sigma_{x-y}$, with $\sigma_{x-y}\neq0$.

The functionality to handle this specific subtraction case is already planned (i.e., a_u.__sub__(self, other) checks if other is self), but in general the propagation formulae implemented by this package assume uncorrelated operands. The uncertainties package for symmetric error propagation handles this correctly, and it would be nice if this package could too. Checks for identity should be fairly straightforward to implement, but users may also wish to indicate correlation between two variables that don't share an address in memory (i.e., self is other is False). Some other way of keeping track of (auto)correlation may therefore eventually be warranted.
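The identity check described above can be sketched as follows, using a hypothetical stand-in class rather than the package's actual a_u implementation (the quadrature pairing shown is the standard uncorrelated rule):

```python
import math

class A:
    """Hypothetical stand-in for a_u; attribute names mirror the snippets above."""
    def __init__(self, value, plus, minus):
        self.value, self.plus, self.minus = value, plus, minus

    def __sub__(self, other):
        if other is self:
            # x - x is identically zero: no uncertainty survives
            return A(0.0, 0.0, 0.0)
        # uncorrelated operands: combine in quadrature, pairing this
        # operand's upper error with the other's lower error (and vice
        # versa), since the other term enters with a minus sign
        return A(self.value - other.value,
                 math.hypot(self.plus, other.minus),
                 math.hypot(self.minus, other.plus))

x = A(5.0, 1.0, 1.0)
y = A(5.0, 1.0, 1.0)
print((x - x).plus)  # 0.0: identity detected
print((x - y).plus)  # ~1.414: equal values, but treated as uncorrelated
```

Handling correlations beyond simple identity would require something richer, e.g. the tagged-variable bookkeeping the uncertainties package uses.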

Issues with true division

I've got the following equation:
asymmetricResults_M = 1/asymmetricResults_lambda

where asymmetricResults_lambda = a(2.2819067612897336e-05, 6.594071993855804e-05, 2.0414838882371667e-05), which gives positive max/min values as expected,

but asymmetricResults_M comes out with a positive max and a negative min, which is unexpected (~83028 max, ~ -82813 min). The central value is correct, however (43823).

It appears that the plus/minus errors aren't getting swapped as part of the division process, but I have no idea why, since after reviewing your code that appears to be what it should be doing. Maybe Python is interpreting the operation as multiplication instead?

Any suggestions are welcome
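For reference, first-order propagation shows what the expected behavior is: for y = 1/x the derivative dy/dx = -1/x² is negative, so x's upward error should map to y's downward error and vice versa, with both reported errors staying positive. A minimal sketch (a hypothetical helper, not the package's actual implementation):

```python
def reciprocal_errors(value, plus, minus):
    """First-order propagation sketch for y = 1/x (hypothetical helper).

    dy/dx = -1/x**2 is negative, so x's upward error maps to y's
    downward error and vice versa: the plus/minus labels swap, and
    both results stay positive.
    """
    y = 1.0 / value
    deriv = 1.0 / value**2                 # magnitude of dy/dx
    return y, minus * deriv, plus * deriv  # (value, plus_y, minus_y)

y, plus_y, minus_y = reciprocal_errors(2.0, 0.5, 0.25)
print(y, plus_y, minus_y)  # 0.5 0.0625 0.125 -- both errors positive
```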

PDF normalization.

The normalization of the piecewise Normal distribution was based on Eq. 2 of 2008ApJ...672..433S and/or Eq. 2.1 of doi:10.1080/03610928208828279.

However, I do not believe the normalization is correct. If we test the integral of this function, we see that the integral of the currently implemented pdf() returns 2. Let's define a point at 200 with some asymmetric errors and then define a range of x values:

num_sigma, discretization = 5, 100  # values used for this test
datap = a_u(200,10,15)
neg_x = np.linspace(datap.value-(num_sigma*datap.minus),datap.value,discretization)
pos_x = np.linspace(datap.value,datap.value+(num_sigma*datap.plus),discretization)
x_arr = np.array(list(neg_x)+list(pos_x))

Then we find the PDF value at each point in x_arr using the equation currently implemented in pdf():

pdf = np.piecewise(x_arr, [x_arr<datap.value, x_arr>=datap.value],
	[lambda y : np.sqrt(2)/np.sqrt(np.pi)/(datap.plus+datap.minus) * np.exp(-1*(y-datap.value)**2 / (2*datap.minus**2)),
	lambda y : np.sqrt(2)/np.sqrt(np.pi)/(datap.plus+datap.minus) * np.exp(-1*(y-datap.value)**2 / (2*datap.plus**2))] ) 

We can take the sum of all the PDF values, but we also have to multiply by the size of each integration piece (i.e., Riemann-sum integration).

delta = (x_arr[-1] - x_arr[0])/discretization
norm = np.sum( pdf*delta )

norm will have a value of 2. The distribution can be normalized correctly if the np.sqrt(2) factors in the pdf definition are moved to the denominator (i.e., the same as dividing the whole thing by 2).

Maybe I missed something in my test above?

Raising to negative power does not flip errors as expected

In #8, @sevenstarknight brought to light the fact that the behavior of power operations with exponents < 0 is not consistent with the equivalent division operation. In other words, given x = a_u(...), 1/x currently produces a different result than x**(-1.0). These should, of course, be equivalent operations.

The implementations of the __pow__ and __rpow__ methods should be investigated and adjusted accordingly.
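The expected consistency can be sketched with first-order propagation on a hypothetical helper (not the package's actual implementation): a negative exponent makes the derivative negative, which must flip the plus/minus assignment exactly as division does.

```python
def pow_errors(value, plus, minus, n):
    """First-order propagation sketch for y = x**n with x > 0 (hypothetical).

    d/dx x**n = n*x**(n-1); a negative n makes the derivative negative,
    which must flip the plus/minus assignment -- the same swap that
    division performs.
    """
    y = value**n
    deriv = n * value**(n - 1)
    if deriv >= 0:
        return y, plus * deriv, minus * deriv
    return y, minus * -deriv, plus * -deriv

# consistency check: x**(-1) must propagate the same way as 1/x
assert pow_errors(2.0, 0.5, 0.25, -1) == (0.5, 0.0625, 0.125)
```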

Unused lines in pdfplot and cdfplot

In the pdfplot() and cdfplot() methods, the variables

p_neg = np.sqrt(2)/np.sqrt(np.pi)/(self.plus+self.minus) * np.exp(-1*(neg_x-self.value)**2 / (2*self.minus**2))
p_pos = np.sqrt(2)/np.sqrt(np.pi)/(self.plus+self.minus) * np.exp(-1*(pos_x-self.value)**2 / (2*self.plus**2))

are assigned but never used. I guess these lines could be removed.

Support for np.exp function?

Hi there~ really awesome package!

I was just wondering if support could be added for the np.exp function? I just get:
TypeError: loop of ufunc does not support argument 0 of type a_u which has no callable exp method

My current workaround is to just use the Taylor series expansion of e^x, but that seems a bit unnecessary xD
Just wanted to ask first and see if it's possible! Thank you~
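For context, NumPy's ufunc loop over object-dtype arrays falls back to calling a method of the same name on each element (hence the "no callable exp method" error), so giving a_u an exp method should be enough. A toy sketch with a stand-in class (`Val` is illustrative, not the real a_u; the error scaling shown is the first-order rule for y = eˣ):

```python
import numpy as np

class Val:
    """Toy stand-in for a_u, with just enough for np.exp to work."""
    def __init__(self, v, plus=0.0, minus=0.0):
        self.v, self.plus, self.minus = v, plus, minus

    def exp(self):
        # first-order propagation for y = e**x: both errors scale by e**x
        e = float(np.exp(self.v))
        return Val(e, self.plus * e, self.minus * e)

arr = np.array([Val(0.0), Val(1.0)], dtype=object)
out = np.exp(arr)  # works: the object-dtype ufunc loop calls each element's .exp()
```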

Generating a LaTeX table

I want to generate a LaTeX table, how do I limit the number of decimal places displayed? Thanks for your help.
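Pending a built-in option, one workaround is a small formatting helper built on the value/plus/minus attributes that the snippets elsewhere in these issues show a_u objects exposing (`to_latex` is a hypothetical helper, not part of the package's API):

```python
def to_latex(value, plus, minus, ndec=2):
    """Hypothetical helper, not part of the package's API: format a
    value with asymmetric errors to a fixed number of decimal places."""
    return (f"${value:.{ndec}f}"
            f"^{{+{plus:.{ndec}f}}}"
            f"_{{-{minus:.{ndec}f}}}$")

print(to_latex(200.0, 10.0, 15.0))  # $200.00^{+10.00}_{-15.00}$
```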
