fjosw / pyerrors

Error propagation and statistical analysis for Markov chain Monte Carlo simulations in lattice QCD and statistical mechanics using autograd

Home Page: https://fjosw.github.io/pyerrors/pyerrors.html

License: MIT License

Python 100.00%
autocorrelation autograd automatic-differentiation condensed-matter correlation data-analysis error-propagation lattice-field-theory lattice-qcd markov-chain monte-carlo particle-physics physics python qcd statistical-analysis statistical-mechanics

pyerrors's People

Contributors

fjosw, janneuendorf, jkuhl-uni, nils-ht, pialjp, s-kuberski


pyerrors's Issues

Read specific interval with read_ms5_xsf()

Hi,
read_ms5_xsf() in pyerrors/input/openQCD.py reads all configs/samples from the output file. It would be nice to be able to define start and stop values in order to read just a subinterval, e.g. configurations 10-20 instead of all of them, say 1-500.
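For illustration, the request amounts to something like the following (start and stop are purely hypothetical keywords that do not exist yet; the existing arguments of read_ms5_xsf are elided):

dat = pe.input.openQCD.read_ms5_xsf(..., start=10, stop=20)  # hypothetical keywords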

Issues with _filter_zeroes and Corr

I have noticed issues with _filter_zeroes. These appear when two observables with vastly different variances, defined on different subsets of an ensemble, are combined (e.g. by renormalizing a noisy correlator). In this case it can happen that configurations that belong only to the observable with the small variance are filtered out by chance. This is something that I did not have in mind when designing the method.
When this does happen to elements of correlators, the export to json fails because the Obs of the Corr are not defined on the same set of configurations anymore.

One can construct a case artificially:

import pyerrors as pe

li = [-pe.pseudo_Obs(.2, .1, 'a', samples=10) for i in range(6)]
for i in range(len(li)):
    li[i].idl['a'] = range(1, 21, 2)
c = pe.Corr(li)  # inferred: c must be a Corr built from li for the indexing below to work
b = pe.pseudo_Obs(1, 1e-11, 'a', samples=30)
c *= b
print(c[2].idl)
print(c[3].idl)
{'a': [1, 3, 5, 7, 9, 10, 11, 13, 15, 17, 19]}
{'a': [1, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 26, 30]}

This is certainly a rare event, but I have observed it a number of times in the past, and it is quite annoying.

A possible fix would be to remove the automatic filtering altogether and instead put the feature in a function that may be called by the user. In this case, exact cancellations would manifest themselves as vanishing deltas and would not be filtered out anymore. I tend to think that this is easier to live with than silent removals of configurations from the obs. A sketch of such a function follows below.
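A minimal sketch of such a user-callable filter (hypothetical helper, not the actual _filter_zeroes implementation):

import numpy as np

def filter_zero_deltas(deltas, idx, eps=1e-10):
    # Keep only configurations whose deltas do not vanish numerically
    scale = np.mean(np.abs(deltas))
    keep = [i for i in range(len(deltas)) if abs(deltas[i]) > eps * scale]
    return [deltas[i] for i in keep], [idx[i] for i in keep]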

If we want to keep the feature, I see a number of possible improvements (using the mean instead of the median in _filter_zeroes, lowering filter_eps, ...).

m_eff not working properly

The effective mass of a correlator cannot be determined for the keyword 'periodic' if one includes padding:

import numpy as np
import pyerrors as pe
T = 32
mass = pe.pseudo_Obs(.2, .02, 'mass')
dat = [(np.exp(- mass * i) + np.exp(-mass * (T - i))) for i in range(T)]
corr = pe.Corr(dat)
m = corr.m_eff('periodic')
pl = 2
pu = 4
corr_pad = pe.Corr(dat[pl:T-pu], padding=[pl, pu])
m_pad = corr_pad.m_eff('log')
m_pad = corr_pad.m_eff('arccosh')
m_pad = corr_pad.m_eff('periodic')

throws

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-45-9585941fa274> in <module>
      4 m_pad = corr_pad.m_eff('log')
      5 m_pad = corr_pad.m_eff('arccosh')
----> 6 m_pad = corr_pad.m_eff('periodic')

~/phd/git/pyerrors_github/pyerrors/correlators.py in m_eff(self, variant, guess)
    490                 raise Exception('m_eff is undefined at all timeslices')
    491 
--> 492             return Corr(newcontent, padding=[0, 1])
    493 
    494         elif variant == 'arccosh':

~/phd/git/pyerrors_github/pyerrors/correlators.py in __init__(self, data_input, padding, prange)
     58                 raise Exception("Items in data_input are not of identical shape." + str(noNull))
     59         else:
---> 60             raise Exception("data_input contains item of wrong type")
     61 
     62         self.tag = None

Exception: data_input contains item of wrong type

Obs and Covobs defined on the same ensemble

I just noticed that no warning is issued when one attempts to add a standard Obs and a cov_Obs defined on the same ensemble.
Example:

import pyerrors as pe
covobs = pe.cov_Obs(0.5, 0.002, 'test')
my_obs = pe.pseudo_Obs(2.3, 0.2, 'test')
summed_obs = my_obs + covobs
summed_obs.gamma_method()
summed_obs.details()

It looks like the deltas of the standard Obs are just dropped.
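A sketch of a check that could emit such a warning (the helper itself is hypothetical; idl and covobs are the attribute names that appear elsewhere in these issues):

import warnings

def warn_on_name_clash(obs_a, obs_b):
    # an ensemble name used for Monte Carlo data in one Obs and as a Covobs in the other
    clash = (set(obs_a.idl) & set(obs_b.covobs)) | (set(obs_b.idl) & set(obs_a.covobs))
    if clash:
        warnings.warn('Ensemble name(s) %s used for both an Obs and a Covobs' % clash)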

Exception when applying .symmetric() to Corr containing None

If you call .symmetric() on a Corr containing a None, an exception is raised.
Here is a minimal example.

import pyerrors as pe

assert pe.__version__ == "2.4.0"
example_obs = pe.Obs([[1, 1, 1, 1, 1, 1]], ["test"])
corr = pe.Corr([None, example_obs, example_obs, example_obs])
corr.symmetric()

I would expect this to work and give the correlator [None, obs, obs, None].
As a related question, why was .symmetric() disabled for matrix correlators?

Recover tags from arrays in json output

When I write an array of Obs with different tags to a json.gz file and extract it again, all Obs have the same tag, which is composed of the individual tags. How about using some delimiter in this case, from which the original tags can be reconstructed?

Example:

import numpy as np
import pyerrors as pe

tt1 = pe.Obs([np.random.rand(100)], ['t|r1'], idl=[range(2, 202, 2)])
tt1.tag = 'f1'
tt2 = pe.Obs([np.random.rand(100)], ['t|r1'], idl=[range(2, 202, 2)])
tt2.tag = 'f2'
tt3 = pe.Obs([np.random.rand(100)], ['t|r1'], idl=[range(2, 202, 2)])
tt3.tag = 'f3'

pe.input.json.dump_to_json(np.array([tt1, tt2, tt3]), 'test', 'My testfile')

q = pe.input.json.load_json('test.json.gz')

print([o.tag for o in q])

> ['f1\nf2\nf3\n', 'f1\nf2\nf3\n', 'f1\nf2\nf3\n']
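Given the output above, the combined tag is already newline-joined, so with a defined delimiter the round trip could be as simple as this sketch:

combined = 'f1\nf2\nf3\n'
tags = combined.splitlines()  # ['f1', 'f2', 'f3'], one tag per array element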

Automatic windowing method fails for gapped and irregular chains

The automatic windowing procedure fails in cases where the Monte Carlo history is gapped, e.g., with stride 2, and on top of this also contains holes. The main reason for this behavior is that we do not expand the chain if idl is a range, but we do so if it is a list. Minimal example, close to what is done in test_gamma_method_irregular:

import numpy as np
import pyerrors as pe

N = 10000
def gen_autocorrelated_array(inarr, rho):
    outarr = np.copy(inarr)
    for i in range(1, len(outarr)):
        outarr[i] = rho * outarr[i - 1] + np.sqrt(1 - rho**2) * outarr[i]
    return outarr

arr = np.random.normal(1, .2, size=N)
carr = gen_autocorrelated_array(arr, .8)
a = pe.Obs([carr], ['a'])
a.gamma_method()

arrt = [carr[i] for i in range(len(carr)) if i % 2 == 1]
idlt = [i for i in range(len(carr)) if i % 2 == 1]
for el in [int(e) for e in N * np.random.uniform(size=int(N/4))]:
    arrt = arrt[:el] + arrt[el + 1:]
    idlt = idlt[:el] + idlt[el + 1:]
at = pe.Obs([arrt], ['a'], idl=[idlt])
at.gamma_method(tau_exp=2)
at.plot_rho()

There are probably multiple ways to fix this behavior and I am not yet certain which one is the best. There is the possibility to determine the regular gaps based on the rho that we have calculated via

# detect holes:
gapsize = 1
if not all(isinstance(self.idl[r_name], range) for r_name in e_content[e_name]):
    for n in range(1, w_max):
        if abs(self.e_rho[e_name][n]) < 1.e-13:
            gapsize += 1
        else:
            break

before performing the automatic windowing procedure. This one would then be done via

# Standard automatic windowing procedure
tau = self.S[e_name] / np.log((2 * self.e_n_tauint[e_name][1:] + 1) / (2 * self.e_n_tauint[e_name][1:] - 1)) * gapsize
g_w = np.exp(- np.arange(1, w_max) / tau) - tau / np.sqrt(np.arange(1, w_max) * e_N)
for n in range(gapsize, w_max, gapsize):
    if n < w_max // 2 - 2:
        _compute_drho(n + 1)
    if (g_w[n - gapsize] < 0 or n >= w_max - 1):
        self.e_tauint[e_name] = self.e_n_tauint[e_name][n] * (1 + (2 * n + 1) / e_N) / (1 + 1 / e_N)  # Bias correction hep-lat/0306017 eq. (49)
        self.e_dtauint[e_name] = self.e_n_dtauint[e_name][n]
        self.e_dvalue[e_name] = np.sqrt(2 * self.e_tauint[e_name] * e_gamma[e_name][0] * (1 + 1 / e_N) / e_N)
        self.e_ddvalue[e_name] = self.e_dvalue[e_name] * np.sqrt((n + 0.5) / e_N)
        self.e_windowsize[e_name] = n
        break

and the critical slowing down analysis would have to be adapted accordingly.

This detection seems to be quite stable, but I have not tested it thoroughly. It could be possible that it fails in some corner cases or that the bound would have to be adapted. Before I investigate this further, I wanted to discuss if we should somehow handle this differently.

read_sfcf does not close opened files

When reading files with the new read_sfcf routine and all warnings enabled, I get the following warning:

ResourceWarning: unclosed file <_io.TextIOWrapper name='my_file_name_n1' mode='r' encoding='UTF-8'>

In the past I had used context managers to avoid this problem. Another solution would be to explicitly close all files accessed by the function. Can you take a look at this @jkuhl-uni?
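A sketch of the context-manager approach (the placement inside the read routine is hypothetical):

with open(file_path, 'r', encoding='utf-8') as fp:
    lines = fp.readlines()
# fp is closed automatically when the with block is left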

iminuit 2.0 breaks prior_fit

Minuit.from_array_func was removed in the new major release of iminuit which breaks the fits.prior_fit routine.
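For reference, a sketch of the interface change following the iminuit 2.0 migration notes (error handling and options omitted):

import numpy as np
from iminuit import Minuit

def fcn(p):  # function with array-valued parameters
    return np.sum((p - 1.) ** 2)

# iminuit < 2.0:
# m = Minuit.from_array_func(fcn, np.zeros(2), errordef=1)
# iminuit >= 2.0 accepts the start values directly:
m = Minuit(fcn, np.zeros(2))
m.errordef = 1
m.migrad()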

Fits using migrad 1.3.3 broken

The fits that are performed using migrad in test_prior_fit throw errors when iminuit 1.3.3 is used. Does this imply that there now is a minimal version requirement for iminuit?

tests/fits_test.py:291:


pyerrors/fits.py:116: in least_squares
return _prior_fit(x, y, func, priors, silent=silent, **kwargs)
pyerrors/fits.py:396: in _prior_fit
m.migrad()


???
E RuntimeError: exception was raised in user function
E User function arguments:
E p = +0,000000
E Original python exception in user function:
E TypeError: 'float' object is not subscriptable
E File "/home/simon/phd/git/pyerrors_github/pyerrors/fits.py", line 382, in chisqfunc
E model = func(p, x)
E File "/home/simon/phd/git/pyerrors_github/tests/fits_test.py", line 280, in f
E return a[0] + a[1] * x

iminuit/_libiminuit.pyx:769: RuntimeError

least_squares fit using migrad is broken

import pyerrors as pe
import numpy as np
x = np.arange(10)
y = [pe.pseudo_Obs(i+1, i*.2+.01, 'test') for i in x]
[o.gamma_method() for o in y]
def func(a, x):
    return a[0] + a[1] * x
pe.fits.least_squares(x, y, func)
pe.fits.least_squares(x, y, func, method='migrad')

results in

KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.6/site-packages/scipy/optimize/optimize.py in __getattr__(self, name)
    121         try:
--> 122             return self[name]
    123         except KeyError:

KeyError: 'nit'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-79beea198971> in <module>
      7     return a[0] + a[1] * x
      8 pe.fits.least_squares(x, y, func)
----> 9 pe.fits.least_squares(x, y, func, method='migrad')

~/phd/git/pyerrors_github/pyerrors/fits.py in least_squares(x, y, func, priors, silent, **kwargs)
    118         return _prior_fit(x, y, func, priors, silent=silent, **kwargs)
    119     else:
--> 120         return _standard_fit(x, y, func, silent=silent, **kwargs)
    121 
    122 

~/phd/git/pyerrors_github/pyerrors/fits.py in _standard_fit(x, y, func, silent, **kwargs)
    501         chisquare = fit_result.fun
    502 
--> 503         output.iterations = fit_result.nit
    504     else:
    505         output.method = 'Levenberg-Marquardt'

/opt/anaconda3/lib/python3.6/site-packages/scipy/optimize/optimize.py in __getattr__(self, name)
    122             return self[name]
    123         except KeyError:
--> 124             raise AttributeError(name)
    125 
    126     __setattr__ = dict.__setitem__

AttributeError: nit

using Python 3.6 and iminuit 2.8.4 (and 2.8.3)
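A defensive fix could read the attribute with a fallback, since scipy's OptimizeResult does not provide the nit field for every method (sketch of the affected line in _standard_fit):

output.iterations = getattr(fit_result, 'nit', None)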

Multi-dimensional fits

As anticipated, we have to polish the new fit routine: the recent commit #154 seems to lead to problems with fits in which several independent observables are used. Minimal example:

import numpy as np
import pyerrors as pe

x1 = np.arange(1, 10)
x = np.array([[xi, xi] for xi in x1]).T
y = [pe.pseudo_Obs(i + 2 / i, .1*i, 't') for i in x[0]]
[o.gm() for o in y]
def fitf(a, x):
    return a[0] * x[0] + a[1] / x[1]

pe.fits.least_squares(x, y, fitf)

gives me

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-2-b5c7b1c30017> in <module>
      6     return a[0] * x[0] + a[1] / x[1]
      7 
----> 8 pe.fits.least_squares(x, y, fitf)

~/phd/git/pyerrors_github/pyerrors/fits.py in least_squares(x, y, func, priors, silent, **kwargs)
    170         return _prior_fit(x, y, func, priors, silent=silent, **kwargs)
    171     else:
--> 172         return _combined_fit(x, y, func, silent=silent, **kwargs)
    173 
    174 

~/phd/git/pyerrors_github/pyerrors/fits.py in _combined_fit(x, y, func, silent, **kwargs)
    556             raise TypeError('func (key=' + key + ') is not a function.')
    557         if len(xd[key]) != len(yd[key]):
--> 558             raise Exception('x and y input (key=' + key + ') do not have the same length')
    559         for i in range(100):
    560             try:

Exception: x and y input (key=) do not have the same length

In the old version, the comparison was done using

if x.shape[-1] != len(y):
    raise Exception('x and y input have to have the same length')

so something along the lines of

if xd[key].shape[-1] != len(yd[key]):
    raise Exception('x and y input (key = %s) have to have the same length' % (key))

should fix this.

Unused way of initializing a covobs

In the tests we are running, the following line is never called:

self.covobs = covobs

as all covobs seem to be initialized via cov_Obs or derived_observable where the covobs attribute is set manually. Can this line and the accompanying parameter be removed?

Bug coming from difference in search methods in sfcf inputs

Hi,
I found a bug in the inputs: when one reads reweighting factors from openQCD and data from sfcf, it can happen that the data is correctly sorted by replica in the openQCD routines but not by the sfcf read method, as I haven't implemented the sorting there in PR #153.
Just wanted to make you aware of this. I'll try to fix it later this week, as I am currently working on refactoring the sfcf read method anyway.

Ordering inside key of mesons-dict

The keys for the dictionary that is returned by read_mesons are constructed via

result[(corr_name[c], corr_source[c]) + tuple(sorted(corr_kappa[c]))] = tmp_corr

I think that the sorting may be problematic if mass derivatives are considered, since it is important to know which is the first and which is the second quark in the correlation function. Otherwise, the matching of derivative and quark may be wrong. A change of the behavior would technically break backwards compatibility, but I would consider this a bug rather than a feature that is changed; a sketch of the change follows below.
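The sketch simply drops the sort to preserve the quark ordering:

result[(corr_name[c], corr_source[c]) + tuple(corr_kappa[c])] = tmp_corr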

Am I missing something @fjosw?

Warning in pandas tests

The recent changes in #174 and #184 seem to have introduced the following warning when calling the tests:

tests/pandas_test.py::test_nan_df_export_import
  /home/fjosw/Codes/pyerrors/pyerrors/input/pandas.py:85: UserWarning: nan value in column int will be replaced by None
    warnings.warn("nan value in column " + column + " will be replaced by None", UserWarning)

Can you please have a look at this @jkuhl-uni?

GEVP eigenvectors with errors

Just a small comment in case it is of interest to the developers. It would be useful to be able to get the GEVP eigenvectors with errors from the GEVP() routine, since these vectors can be used, among other things, to obtain the overlap factors that appear in the spectral decomposition of correlation functions, see e.g. https://arxiv.org/abs/1004.4930 below Eq. (3). These factors in turn are useful for things like spin identification of states, as explained in the same reference, and it is important to propagate the error there.

With the current functions available in pyerrors it is possible to calculate these vectors with errors rather straightforwardly, but if one also wants to include the sorting procedures that the GEVP() function uses internally (which might have to take into account the errors of the vectors for the determinant-based sorting), the code might become a bit too long compared to a single function call.
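For illustration, a minimal sketch of the straightforward calculation using pyerrors' linalg routines (standard Cholesky reduction of the GEVP; sorting and symmetrization are omitted; C0 and Ct are assumed to be numpy arrays of Obs at t0 and t):

import pyerrors as pe

L = pe.linalg.cholesky(C0)
Li = pe.linalg.inv(L)
w, v = pe.linalg.eigh(Li @ Ct @ Li.T)  # eigenvalues and eigenvectors carry errors
vecs = Li.T @ v  # eigenvectors of the original generalized problem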

Possible bug in Corr.dump(datatype="json.gz")

The .dump() function seems to have a bug for values being None.

I tried the following:

new_content=[(my_correlator.content[i] if i not in [6,8,9,12,14,15,20] else None ) for i in range(my_correlator.T) ] # We take a correlator and set some values to None

correlator_incomplete=pe.Corr(new_content) # We make a new correlator. 
#We can display it and it behaves like intended. 

correlator_incomplete.dump(file) #save it somewhere 

When I reload it, all values after the first None are missing as well, not just the ones I deleted.
The json file is much smaller, so it is probably the .dump method and not the loader.

`Corr.show()` draws prange in same color as error bars.

I was puzzled when I saw that a correlator had a large error in a plot from Corr.show() while print() gave me a small value.
It turns out that the markers for prange are identical to the error bars.
Here, in

if self.prange:
    ax1.axvline(self.prange[0], 0, 1, ls='-', marker=',')
    ax1.axvline(self.prange[1], 0, 1, ls='-', marker=',')

it would be helpful if color=... and maybe zorder were defined for the axvline calls, as sketched below.
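A possible fix (the color and zorder values are just an example):

if self.prange:
    ax1.axvline(self.prange[0], 0, 1, ls='-', marker=',', color='black', zorder=-1)
    ax1.axvline(self.prange[1], 0, 1, ls='-', marker=',', color='black', zorder=-1)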
Nevertheless, I really like the convenience of the Corr class!
Thank you :)

Duplicate data cause `gamma_method()` to fail with an unhelpful message

If an Obs is constructed with idl containing duplicate entries for a particular stream (e.g. because a correlator was measured twice due to fencepost issues in job preparation), then the Obs is constructed successfully, but gamma_method() fails with a relatively cryptic error.

Example:

>>> import pyerrors as pe
>>> obs = pe.Obs([[1.2, 0.7, 1.0, 1.0, 1.4, 1.3]], ["data"], idl=[[1, 2, 3, 3, 4, 5]])
>>> obs.gamma_method()

Output:

/opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:1727: RuntimeWarning: divide by zero encountered in scalar remainder
  if not np.all([gi % gap == 0 for gi in gaps]):
/opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:247: RuntimeWarning: divide by zero encountered in scalar floor_divide
  r_length.append((self.idl[r_name][-1] - self.idl[r_name][0] + 1) // gapsize)
/opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:1082: RuntimeWarning: divide by zero encountered in scalar floor_divide
  ret = np.zeros((idx[-1] - idx[0] + gapsize) // gapsize)
/opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:1084: RuntimeWarning: divide by zero encountered in scalar floor_divide
  ret[(idx[i] - idx[0]) // gapsize] = deltas[i]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 obs.gamma_method()

File /opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:256, in Obs.gamma_method(self, **kwargs)
    253 self.e_drho[e_name] = np.zeros(w_max)
    255 for r_name in e_content[e_name]:
--> 256     e_gamma[e_name] += self._calc_gamma(self.deltas[r_name], self.idl[r_name], self.shape[r_name], w_max, fft, gapsize)
    258 gamma_div = np.zeros(w_max)
    259 for r_name in e_content[e_name]:

File /opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:364, in Obs._calc_gamma(self, deltas, idx, shape, w_max, fft, gapsize)
    343 """Calculate Gamma_{AA} from the deltas, which are defined on idx.
    344    idx is assumed to be a contiguous range (possibly with a stepsize != 1)
    345
   (...)
    361     are found in idx, the data is expanded.
    362 """
    363 gamma = np.zeros(w_max)
--> 364 deltas = _expand_deltas(deltas, idx, shape, gapsize)
    365 new_shape = len(deltas)
    366 if fft:

File /opt/homebrew/Caskroom/miniconda/base/envs/su2_py311/lib/python3.11/site-packages/pyerrors/obs.py:1084, in _expand_deltas(deltas, idx, shape, gapsize)
   1082 ret = np.zeros((idx[-1] - idx[0] + gapsize) // gapsize)
   1083 for i in range(shape):
-> 1084     ret[(idx[i] - idx[0]) // gapsize] = deltas[i]
   1085 return ret

IndexError: index 0 is out of bounds for axis 0 with size 0

It would be helpful if pyerrors could check for this condition and raise an exception that provides more actionable detail (e.g. IndexError: idl contains a duplicate index 3 at positions 2 and 3).

An alternative could be to identify that the data and index both match, and delete the duplicate (with or without a warning), but that kind of cleaning may be better left to user code.
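A minimal sketch of such a check (hypothetical helper, not part of pyerrors), which could run during Obs construction:

def assert_no_duplicates(idl):
    for name, idx in idl.items():
        seen = {}
        for pos, i in enumerate(idx):
            if i in seen:
                raise ValueError("idl for '%s' contains a duplicate index %s at positions %d and %d"
                                 % (name, i, seen[i], pos))
            seen[i] = pos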

corr.m_eff() does not check for division by zero

I just updated to the newest release and a few of my scripts broke,
because corr.m_eff() computes corr.content[t] / corr.content[t+1] without checking whether corr.content[t+1] is zero.
I do not know if this is intentional, but I think it made sense the way it was before, where those timeslices were set to None in the effective mass.

numerical differentiation in derived_obs not working

Currently, the numerical derivative is not working in derived_obs. Minimal example:

import numpy as np
import pyerrors as pe

pe.derived_observable(np.exp, pe.cov_Obs(1, .1, 'test'), num_grad=True)

gives me

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-75-1c77dc9277a6> in <module>
----> 1 pe.derived_observable(np.exp, pe.cov_Obs(1, .1, 'test'), num_grad=True, base_step=.1)

~/phd/git/pyerrors_github/pyerrors/obs.py in derived_observable(func, data, array_mode, **kwargs)
   1186         values = np.vectorize(lambda x: x.value)(data)
   1187 
-> 1188     new_values = func(values, **kwargs)
   1189 
   1190     multi = int(isinstance(new_values, np.ndarray))

TypeError: 'num_grad' is an invalid keyword to ufunc 'exp'

The problem is the use of kwargs inside this function: they are passed on to func, although func might not be able to cope with them.

An easy fix would be to manually pop the keywords man_grad and num_grad if they are present. However, since the user could also pass all the kwargs of numdifftools.step_generators.MaxStepGenerator, this would not be sufficient (unless you explicitly exclude all of them as well; this should work, but could break if new keywords are defined in an upcoming version of that function). I tried to use the solution of https://stackoverflow.com/a/44052550, but this does not seem to work for numpy functions.
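A sketch of the explicit-exclusion approach (the keyword set is illustrative and would have to track all accepted options):

_INTERNAL_KWARGS = {'man_grad', 'num_grad', 'base_step', 'step_ratio', 'num_steps'}
func_kwargs = {k: v for k, v in kwargs.items() if k not in _INTERNAL_KWARGS}
new_values = func(values, **func_kwargs)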

Bug in JSON export together with Correlators and Covobs

The export of a derivative of a correlator that includes a Covobs fails. Minimal non-working example:

import pyerrors as pe

c = pe.Corr([pe.pseudo_Obs(i, .1, 'test') for i in range(10)])
c *= pe.cov_Obs(1., .1, '#ren')
c = c.deriv()
pe.input.json.dump_to_json(c, 'test')

gives
[...]

~/phd/git/pyerrors_github/pyerrors/misc.py in _assert_equal_properties(ol, otype)
     84             raise Exception("All Obs in list have to have the same property 'reweighted'.")
     85         if not ol[0].e_content == o.e_content:
---> 86             raise Exception("All Obs in list have to be defined on the same set of configs.")
     87         if not ol[0].idl == o.idl:
     88             raise Exception("All Obs in list have to be defined on the same set of configurations.")

Exception: All Obs in list have to be defined on the same set of configs.

This is based on several problems. In

def _nan_Obs_like(obs):
    samples = []
    names = []
    idl = []
    for key, value in obs.idl.items():
        samples.append([np.nan] * len(value))
        names.append(key)
        idl.append(value)
    my_obs = Obs(samples, names, idl)
    my_obs.reweighted = obs.reweighted
    my_obs.is_merged = obs.is_merged
    return my_obs

is missing a line

my_obs._covobs = obs._covobs

But this does not solve the problem, since the covobs are not known to e_content

pyerrors/obs.py, lines 168 to 175 (commit f515035):

@property
def e_content(self):
    res = {}
    for e, e_name in enumerate(self.e_names):
        res[e_name] = sorted(filter(lambda x: x.startswith(e_name + '|'), self.names))
        if e_name in self.names:
            res[e_name].append(e_name)
    return res

because self.names is not updated when the covobs are set manually. There are several possible fixes:

  • Add

    for name in obs._covobs:
        my_obs.names.append(name)

    to _nan_Obs_like

  • Write a method to set covobs manually that takes care of this in all cases where we do this. However, the user should not do this, in principle.
  • Construct e_content and mc_names differently by having a property that automatically generates names from self.names and self._covobs.keys()

What do you think?

linalg_test fails

When running pytest on the current develop branch, linalg_test fails:

tests/linalg_test.py:304:


pyerrors/linalg.py:283: in eig
w = derived_observable(lambda x, **kwargs: anp.real(anp.linalg.eig(x)[0]), obs)
pyerrors/obs.py:1126: in derived_observable
deriv = jacobian(func)(values, **kwargs)
../../../.local/lib/python3.6/site-packages/autograd/wrap_util.py:20: in nary_f
return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
../../../.local/lib/python3.6/site-packages/autograd/differential_operators.py:57: in jacobian
vjp, ans = _make_vjp(fun, x)
../../../.local/lib/python3.6/site-packages/autograd/core.py:10: in make_vjp
end_value, end_node = trace(start_node, fun, x)
../../../.local/lib/python3.6/site-packages/autograd/tracer.py:10: in trace
end_box = fun(start_box)
../../../.local/lib/python3.6/site-packages/autograd/wrap_util.py:15: in unary_f
return fun(*subargs, **kwargs)
pyerrors/linalg.py:283: in
w = derived_observable(lambda x, **kwargs: anp.real(anp.linalg.eig(x)[0]), obs)
../../../.local/lib/python3.6/site-packages/autograd/tracer.py:45: in f_wrapped
node = node_constructor(ans, f_wrapped, argvals, kwargs, argnums, parents)


self = <autograd.core.VJPNode object at 0x7fd135504b70>
value = (array([5.14144634, 0.55022181, 0.01581728, 0.08542992]), array([[ 0.34517284, 0.82409178, -0.39054664, -0.22181473],... [ 0.73745352, 0.05786009, 0.5835535 , 0.33508182],
[ 0.535441 , -0.52066857, -0.66188194, 0.06418314]]))
fun = <function primitive..f_wrapped at 0x7fd15acaebf8>
args = (array([[0.99286045, 0.3160893 , 1.32503081, 0.71702452],
[0.3160893 , 0.35664052, 0.81995292, 0.67152304],
[1.32503081, 0.81995292, 2.81293272, 2.00931819],
[0.71702452, 0.67152304, 2.00931819, 1.63048166]]),)
kwargs = {}, parent_argnums = (0,), parents = (<autograd.core.VJPNode object at 0x7fd1357711d0>,)

def __init__(self, value, fun, args, kwargs, parent_argnums, parents):
    self.parents = parents
    try:
        vjpmaker = primitive_vjps[fun]
    except KeyError:
        fun_name = getattr(fun, '__name__', fun)
        raise NotImplementedError("VJP of {} wrt argnums {} not defined"
                                .format(fun_name, parent_argnums))

E NotImplementedError: VJP of eig wrt argnums (0,) not defined

../../../.local/lib/python3.6/site-packages/autograd/core.py:35: NotImplementedError

No dobs-related functions from the input submodule can be used

Describe the issue:

None of the dobs-related reading/writing functions that are defined in the pyerrors/input/dobs.py file are recognized when called. This might be because the corresponding __init__.py does not import dobs.py?

Code example:

import pyerrors as pe
pe.input.dobs.read_pobs()

Error message:

AttributeError                            Traceback (most recent call last)
Cell In[62], line 1
----> 1 pe.input.dobs.read_pobs()

AttributeError: module 'pyerrors.input' has no attribute 'dobs'

Runtime information:

Python version: 3.10.10
pyerrors version: 2.6.0
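As a workaround, an explicit submodule import should work until dobs is imported in pyerrors/input/__init__.py (standard Python behavior; the file name is made up):

from pyerrors.input import dobs

p_obs = dobs.read_pobs('my_file.xml.gz')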

plot_history unexpected behaviour for gapped idl

Hi,
I just noticed that, when using gapped data, plot_history also shows the deltas that are set to 0. This may result in "lines" at the (replica) mean values of the ensemble. I guess this is supposed to happen according to the rules for gapped Monte Carlo chains, but maybe we could hide them in the history plots?

Numpy 1.25 breaks a few linalg functions

With the upcoming release of numpy 1.25 the return types of the functions eigh and svd change from tuple to a subclass of namedtuple, which breaks the computation of the jacobian in autograd. This will lead to two failing tests in our ci:

FAILED tests/linalg_test.py::test_matrix_functions - TypeError: Can't differentiate w.r.t. type <class 'numpy.linalg.linalg.EighResult'>
FAILED tests/mpm_test.py::test_mpm - TypeError: Can't differentiate w.r.t. type <class 'numpy.linalg.linalg.SVDResult'>

I already opened an issue in the autograd repo (HIPS/autograd#597) but did not have the time to find a fix for the problem.

For now that means that pyerrors is not fully compatible with numpy>=1.25 and we should either change the requirements or add version guards for the affected functions until a solution is found.
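A sketch of such a version guard (the alternative being a numpy<1.25 pin in the requirements):

import numpy as np

if tuple(int(p) for p in np.__version__.split('.')[:2]) >= (1, 25):
    raise RuntimeError('eigh/svd are currently incompatible with numpy>=1.25, '
                       'see HIPS/autograd#597')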

Gamma method in initialization of Corr

The fact that the gamma method is called each time a correlator is constructed can create significant overhead for an analysis. When I add two correlators, the call to the gamma method makes up about 60% of the total time. In many cases, several manipulations are made to a correlator (renormalization, improvement, GEVP, normalization, ...) before the errors are needed, e.g., in a fit or a plot (if they are needed at all).

In my opinion, it would be more reasonable to call the gamma method inside fit and plot routines (when the errors of the Obs are still zero) or to leave it to the user (as in the case of Obs), instead of calculating errors over and over again in an analysis.

Gamma_method() is broken for Obs that are NaN

Commit #152 breaks the gamma_method for Obs that are NaN.
Minimal example:

import numpy as np
import pyerrors as pe

o = pe.pseudo_Obs(1, .1, 'test')
o.gamma_method()
no = np.nan * o
no.gamma_method()

works fine in the old version but now throws an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-bd59fd2172b6> in <module>
      2 o.gamma_method()
      3 no = np.nan * o
----> 4 no.gamma_method()

~/phd/git/pyerrors_github/pyerrors/obs.py in gamma_method(self, **kwargs)
    323                     for n in range(1, w_max):
    324                         if g_w[n - 1] < 0 or n >= w_max - 1:
--> 325                             _compute_drho(gapsize * n)
    326                             n *= gapsize
    327                             self.e_tauint[e_name] = self.e_n_tauint[e_name][n] * (1 + (2 * n / gapsize + 1) / e_N) / (1 + 1 / e_N)  # Bias correction hep-lat/0306017 eq. (49)

~/phd/git/pyerrors_github/pyerrors/obs.py in _compute_drho(i)
    290 
    291             def _compute_drho(i):
--> 292                 tmp = self.e_rho[e_name][i + 1:w_max] + np.concatenate([self.e_rho[e_name][i - 1::-1], self.e_rho[e_name][1:w_max - 2 * i]]) - 2 * self.e_rho[e_name][i] * self.e_rho[e_name][1:w_max - i]
    293                 self.e_drho[e_name][i] = np.sqrt(np.sum(tmp ** 2) / e_N)
    294 

ValueError: operands could not be broadcast together with shapes (0,) (500,) 

This had not been the case before. However, if I revert the change, I can create a similar behavior when I add

o.idl['test'] = [1, 5] + list(range(7, 2002, 2))
o.gamma_method()
no = np.NaN * o
no.gamma_method()

i.e., when I use a gapped MC history with a hole. The following exception is thrown:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4fdb987095af> in <module>
      8 o.gamma_method()
      9 no = np.NaN * o
---> 10 no.gamma_method()

~/phd/git/pyerrors_github/pyerrors/obs.py in gamma_method(self, **kwargs)
    323                     for n in range(1, w_max):
    324                         if n < w_max // 2 - 2:
--> 325                             _compute_drho(gapsize * n + gapsize)
    326                         if g_w[n - 1] < 0 or n >= w_max - 1:
    327                             n *= gapsize

~/phd/git/pyerrors_github/pyerrors/obs.py in _compute_drho(i)
    290 
    291             def _compute_drho(i):
--> 292                 tmp = self.e_rho[e_name][i + 1:w_max] + np.concatenate([self.e_rho[e_name][i - 1::-1], self.e_rho[e_name][1:w_max - 2 * i]]) - 2 * self.e_rho[e_name][i] * self.e_rho[e_name][1:w_max - i]
    293                 self.e_drho[e_name][i] = np.sqrt(np.sum(tmp ** 2) / e_N)
    294 

ValueError: operands could not be broadcast together with shapes (499,) (500,) 

corr.dump method does not handle prange

It looks like the prange parameter does not get preserved after saving and reloading.
This would be very important to me, but I do not want to change it myself for fear of breaking the format.

Fit routines break r_values

import pyerrors as pe
def f(a, x):
    return a[0] + a[1] * x

a1 = pe.pseudo_Obs(1.1, .1, 'a|1')
a2 = pe.pseudo_Obs(1.2, .1, 'a|2')
a = pe.merge_obs([a1, a2])
b = pe.pseudo_Obs(2, .001, 'b')
c = pe.pseudo_Obs(3, .001, 'c')

y = [a, b, c]
[o.gamma_method() for o in y]
fitp = pe.fits.least_squares([1, 2, 3], y, f)
print(fitp)
for o in fitp:
    print(o.value, o.r_values)

gives

Fit with 2 parameters
Method: Levenberg-Marquardt
`xtol` termination condition is satisfied.
chisquare/d.o.f.: 4.5022309160171705
Goodness of fit:
χ²/d.o.f. = 4.502231
Fit parameters:
0	 0.0002(36)
1	 0.9999(14)

0.0002401189830877699 {'a|1': 1.1000000000000003, 'a|2': 1.2, 'b': 1.15, 'c': 1.15}
0.9999099553831112 {'a|1': 1.1000000000000003, 'a|2': 1.2, 'b': 1.15, 'c': 1.15}

which is clearly wrong.

The fit parameters are constructed in fits.py via

result.append(derived_observable(lambda x, **kwargs: x[0], list(y), man_grad=list(deriv[i])))

read_hd5 in pyerrors 2.9.0 not fully backwards compatible to <=2.8.2

Describe the issue:

Basically the title: reading a couple of files to put the data into a database fails if I try it with pyerrors==2.9.0 or pyerrors==2.10.0-dev. I guess that the separation of name stem and suffix got mangled a little.
For the error message, I left out all the custom functions I built on top.

Code example:

import pyerrors as pe

# files in path are called e.g. "meson_0_0_0_h0.20_h0.20.500.h5"

file_name = "meson_0_0_0_h0.20_h0.20"
corr = pe.input.hadrons.read_meson_hd5(path, file_name, ens_id=ens_name, gammas=entry).symmetric()

Error message:

File "/usr/local/python/current/lib/python3.11/site-packages/pyerrors/input/hadrons.py", line 177, in read_meson_hd5
    return read_hd5(filestem=path + "/" + filestem, ens_id=ens_id,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/current/lib/python3.11/site-packages/pyerrors/input/hadrons.py", line 93, in read_hd5
    files, idx = _get_files(path, filestem, idl)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/current/lib/python3.11/site-packages/pyerrors/input/hadrons.py", line 25, in _get_files
    files.sort(key=get_cnfg_number)
  File "/usr/local/python/current/lib/python3.11/site-packages/pyerrors/input/hadrons.py", line 22, in get_cnfg_number
    return int(n.replace(".h5", "")[len(filestem) + 1:])  # From python 3.9 onward the safer 'removesuffix' method can be used.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '20.100'

Runtime information:

system           Linux
python           3.11.5
pyerrors         2.8.2
numpy            1.24.4
scipy            1.11.1
matplotlib       3.7.2
pandas           2.0.3

projected not working for correlators with "None" elements

I think the reason for this problem is the same one that was solved in matrix_symmetric() recently. A minimal example is

import pyerrors as pe
import numpy as np

a = pe.pseudo_Obs(1.0, 0.1, 'a')
l = np.asarray([[a, a], [a, a]])
n = np.asarray([[None, None], [None, None]])
x = [l, n]
matr = pe.Corr(x)
matr.projected(np.asarray([1.0, 0.0]))

which results in

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [1], in <cell line: 9>()
      7 x = [l,n]
      8 matr = pe.Corr(x)
----> 9 matr.projected(np.asarray([1.0,0.0]))

File ~/.local/lib/python3.8/site-packages/pyerrors/correlators.py:158, in Corr.projected(self, vector_l, vector_r, normalize)
    156     if normalize:
    157         vector_l, vector_r = vector_l / np.sqrt((vector_l @ vector_l)), vector_r / np.sqrt(vector_r @ vector_r)
--> 158     newcontent = [None if (item is None) else np.asarray([vector_l.T @ item @ vector_r]) for item in self.content]
    160 else:
    161     # There are no checks here yet. There are so many possible scenarios, where this can go wrong.
    162     if normalize:

File ~/.local/lib/python3.8/site-packages/pyerrors/correlators.py:158, in <listcomp>(.0)
    156     if normalize:
    157         vector_l, vector_r = vector_l / np.sqrt((vector_l @ vector_l)), vector_r / np.sqrt(vector_r @ vector_r)
--> 158     newcontent = [None if (item is None) else np.asarray([vector_l.T @ item @ vector_r]) for item in self.content]
    160 else:
    161     # There are no checks here yet. There are so many possible scenarios, where this can go wrong.
    162     if normalize:

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

As far as I understand it, the part

None if (item is None) ...

in line 158 in correlators.py is not working as expected since

n = np.asarray([[None,None],[None,None]])
n is None

returns

False
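A sketch of a check that would cover this case (hypothetical helper; it treats an entry as missing if it is None or an object array filled entirely with None):

import numpy as np

def _is_none_entry(item):
    if item is None:
        return True
    arr = np.asarray(item)
    return arr.dtype == object and all(e is None for e in arr.flatten())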

Matrix instantiation of Corr objects does not work when elements are None

Instantiation of a multi-dimensional correlator matrix as an array of Corr objects results in an error if an entry of the correlators is None.

Example:

import numpy as np
import pyerrors as pe

dim = 64
x = np.arange(dim)
y = 2 * np.exp(-0.06 * x) + np.random.normal(0.0, 0.15, dim)
yerr = [0.1] * dim

oy = []
for i, item in enumerate(x):
    oy.append(pe.pseudo_Obs(y[i], yerr[i], 'test'))
    
corr = pe.Corr(oy)
corr = corr.deriv()
pe.Corr(np.array([[corr, corr], [corr, corr]]))

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_19547/2202667383.py in <module>
     10 corr = pe.Corr(oy)
     11 corr = corr.deriv()
---> 12 pe.Corr(np.array([[corr, corr], [corr, corr]]))

~/Codes/pyerrors/pyerrors/correlators.py in __init__(self, data_input, padding, prange)
     56             input_as_list = []
     57             for t in range(T):
---> 58                 if any([(item.content[t][0] is None) for item in data_input.flatten()]):
     59                     if not all([(item.content[t][0] is None) for item in data_input.flatten()]):
     60                         warnings.warn("Input ill-defined at different timeslices. Conversion leads to data loss!", RuntimeWarning)

~/Codes/pyerrors/pyerrors/correlators.py in <listcomp>(.0)
     56             input_as_list = []
     57             for t in range(T):
---> 58                 if any([(item.content[t][0] is None) for item in data_input.flatten()]):
     59                     if not all([(item.content[t][0] is None) for item in data_input.flatten()]):
     60                         warnings.warn("Input ill-defined at different timeslices. Conversion leads to data loss!", RuntimeWarning)

TypeError: 'NoneType' object is not subscriptable

Files keyword for multiple reps in read_sfcf

Hi there,
just wanted to make you aware of (and maybe have a small discussion about) an inconsistency in the input.sfcf.read_sfcf_multi() and therefore also in the input.sfcf.read_sfcf() method. There exists a keyword files, which takes a list of files to be read. I noticed that this only works for one replicum and not if you have multiple ones.
I can take care of this, and it seems not that hard to me. I'll make a PR when I am done. Before that, I just wanted to check in for one thing:
This functionality can be implemented either as a list of lists or, similar to idl, as a dict of lists; both shapes are sketched below.
There are arguments for both implementations in my opinion. On one hand, this kind of functionality is already implemented in this function via lists of lists. On the other hand, I think it is safer to implement it via a dict, as one can check whether the files are assigned to the right replicum.
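For concreteness (file names are made up):

# Option 1: list of lists, one inner list per replicum
files = [['data_r0_n1', 'data_r0_n2'], ['data_r1_n1', 'data_r1_n2']]
# Option 2: dict of lists keyed by replicum name, analogous to idl
files = {'r0': ['data_r0_n1', 'data_r0_n2'], 'r1': ['data_r1_n1', 'data_r1_n2']}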
It is possible to implement both and a third option which covers the current (false-ish) implementation, such that we do not break any backwards compatibility.

Constrained fit

Mathias made me take a closer look at constrained fits and I believe the constrained fits triggered by the keyword argument const_par do not work as expected. I attached a small example in which one can achieve the constraint also by subtracting a constant from the data. The errors for the unconstrained parameter differ by a relevant factor between the two methods. My assumption is that the two should agree in the limit in which the error of the constraint is small compared to the error on the data.

import numpy as np
import pyerrors as pe

dim = 10
x = np.arange(dim)
y = -0.06 * x + 5 + np.random.normal(0.0, 0.3, dim)
yerr = [0.3] * dim

oy = []
for i, item in enumerate(x):
    oy.append(pe.pseudo_Obs(y[i], yerr[i], 'test'))

def func(a, x):
    y = a[0] * x + a[1]
    return y

my_constant = pe.pseudo_Obs(5, 0.0001, 'test')

# Fit with constrained parameter
out = pe.least_squares(x, oy, func, const_par=[my_constant])

def alt_func(a, x):
    y = a[0] * x
    return y

alt_y = np.array(oy) - my_constant
[o.gamma_method() for o in alt_y]

# Fit with the constant subtracted from the data
alt_out = pe.least_squares(x, alt_y, alt_func)

print(out, alt_out)
print(out[0] == alt_out[0])

Redundancies in Corr class

When going over the Corr class I noticed that

  • The method projected seems to be able to do everything the method smearing does. Can we simply remove smearing?
  • The method Eigenvalue is undocumented and seems to follow a different workflow than the GEVP method. Is this method still needed or can it be removed or refactored?
  • smearing_symmetric seems to be only used within Eigenvalue.
  • The method sum is undocumented and not called within any other methods. Is this method still needed or can it be removed or refactored?

Can you take a look at this @JanNeuendorf ?

gamma method in corr routines

We need to add the gamma method to certain functions, because something like

corr.m_eff().some_func().show()

would stop working otherwise.
