metran,pastas

No solution when using engine="numba" for Kalman Filter

For some datasets, the optimization does not succeed due to a ZeroDivisionError in SciPy.

Can be reproduced with the following dataset: test.csv

import pandas
import metran

# the file name must be passed as a string
df = pandas.read_csv("test.csv", index_col=0, parse_dates=True)
mt = metran.Metran(df)
mt.solve()

The issue can be worked around by using SPKalmanFilter(engine="numpy"). I discussed this with @dbrakenhoff, and we suspect that engine="numpy" is more robust because NumPy automatically fills in inf or nan for invalid logarithms and fractions, while engine="numba" does not.

A proper fix would be to let the user specify the SPKalmanFilter engine, which is currently only possible by changing the source code.
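The suspected difference can be illustrated without metran. The sketch below is not the SPKalmanFilter code: the function names are made up for illustration, and it only contrasts NumPy's floating-point semantics with plain Python semantics (which Numba's default error model follows for division by zero).

```python
import math

import numpy as np


def numpy_style(numerator, denominator):
    """NumPy semantics: invalid operations yield inf/nan instead of raising."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.float64(numerator) / np.float64(denominator)
        log_value = np.log(np.float64(denominator))
    return ratio, log_value


def python_style(numerator, denominator):
    """Plain Python semantics: the same operations raise an exception."""
    return numerator / denominator, math.log(denominator)


ratio, log_value = numpy_style(1.0, 0.0)
print(ratio, log_value)  # inf -inf

try:
    python_style(1.0, 0.0)
except ZeroDivisionError as err:
    print("plain Python raised:", err)
```

If the optimizer can cope with an inf or nan objective value but not with an exception, this would explain why only one engine fails.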

Save/load metran models

It would be nice to have a .pas-like file format (see Pastas) to save models and load them again.

We could follow the Pastas style and give the relevant objects in a model a to_dict() method. As in Pastas, this method would take a series keyword argument to store the dictionary with or without the time series.

We can reuse the Pastas encoder to write certain data types to JSON when storing the model as a file. I guess we could use a .metran extension or something similar.
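A minimal sketch of the to_dict() idea, using only the standard library. ToyModel, dumps and loads are hypothetical names for illustration. They are not the metran or Pastas API, and the real implementation would reuse the Pastas encoder rather than the ISO-string conversion shown here.

```python
import json
from datetime import datetime


class ToyModel:
    """Hypothetical stand-in for a model object."""

    def __init__(self, name, parameters, series=None):
        self.name = name
        self.parameters = parameters  # parameter name -> value
        self.series = series or {}    # datetime -> observation

    def to_dict(self, series=True):
        data = {"name": self.name, "parameters": self.parameters}
        if series:
            # JSON keys must be strings, so timestamps are stored as ISO text
            data["series"] = {t.isoformat(): v for t, v in self.series.items()}
        return data

    @classmethod
    def from_dict(cls, data):
        raw = data.get("series", {})
        series = {datetime.fromisoformat(t): v for t, v in raw.items()}
        return cls(data["name"], data["parameters"], series)


def dumps(model, series=True):
    """Serialize a model to JSON text, with or without its time series."""
    return json.dumps(model.to_dict(series=series), indent=2)


def loads(text):
    """Rebuild a model from JSON text produced by dumps()."""
    return ToyModel.from_dict(json.loads(text))


model = ToyModel("demo", {"alpha": 10.0}, {datetime(2020, 1, 1): 1.5})
restored = loads(dumps(model))
print(restored.name, restored.parameters, restored.series)
```

Writing the text returned by dumps() to a file with a .metran extension would give the save/load behaviour described above.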

Metran seems to work!

Hi @wlberendrecht and @dbrakenhoff! I just ran a quick test, and I managed to install metran and run the notebook. I had to adjust a few methods at the end of the notebook, but otherwise everything works! 👍🏻

I still need to take a careful look at it, but it is nice that it already seems to work well technically.

Regards,
Raoul

Velicer's MAP Test results in 0 factors for Dynamic Factor Model notebook

David and I get different results depending on the machine. David gets 0 factors with Velicer's MAP test, while I get 1 (as originally intended). The code for Velicer's MAP test:

metran/metran/factoranalysis.py

Lines 220 to 312 in d8aef35

def _maptest(cov, eigvec, eigval):
    """Internal method to run Velicer's MAP test.

    Determines the number of factors to be used. This method includes
    two variations of the MAP test: the orginal and the revised MAP test.

    Parameters
    ----------
    cov : numpy.ndarray
        Covariance matrix.
    eigvec : numpy.ndarray
        Matrix with columns eigenvectors associated with eigenvalues.
    eigval : numpy.ndarray
        Vector with eigenvalues in descending order.

    Returns
    -------
    nfacts : integer
        Number factors according to MAP test.
    nfacts4 : integer
        Number factors according to revised MAP test.

    References
    ----------
    The original MAP test:

    Velicer, W. F. (1976). Determining the number of components
    from the matrix of partial correlations. Psychometrika, 41, 321-327.

    The revised (2000) MAP test i.e., with the partial correlations
    raised to the 4rth power (rather than squared):

    Velicer, W. F., Eaton, C. A., and Fava, J. L. (2000). Construct
    explication through factor or component analysis: A review and
    evaluation of alternative procedures for determining the number
    of factors or components. Pp. 41-71 in R. D. Goffin and
    E. Helmes, eds., Problems and solutions in human assessment.
    Boston: Kluwer.
    """
    nvars = len(eigval)
    fm = np.array([np.arange(nvars, dtype=float), np.arange(nvars, dtype=float)]).T
    np.put(
        fm,
        [0, 1],
        ((np.sum(np.sum(np.square(cov))) - nvars) / (nvars * (nvars - 1))),
    )
    fm4 = np.copy(fm)
    np.put(
        fm4,
        [0, 1],
        (
            (np.sum(np.sum(np.square(np.square(cov)))) - nvars)
            / (nvars * (nvars - 1))
        ),
    )
    for m in range(nvars - 1):
        biga = np.atleast_2d(eigvec[:, : m + 1])
        partcov = cov - np.dot(biga, biga.T)
        # exit function with nfacts=1 if diag partcov contains negatives
        if np.amin(np.diag(partcov)) < 0:
            return 1, 1
        d = np.diag((1 / np.sqrt(np.diag(partcov))))
        pr = np.dot(d, np.dot(partcov, d))
        np.put(
            fm,
            [m + 1, 1],
            ((np.sum(np.sum(np.square(pr))) - nvars) / (nvars * (nvars - 1))),
        )
        np.put(
            fm4,
            [m + 1, 1],
            (
                (np.sum(np.sum(np.square(np.square(pr)))) - nvars)
                / (nvars * (nvars - 1))
            ),
        )
    minfm = fm[0, 1]
    nfacts = 0
    minfm4 = fm4[0, 1]
    nfacts4 = 0
    for s in range(nvars):
        fm[s, 0] = s
        fm4[s, 0] = s
        if fm[s, 1] < minfm:
            minfm = fm[s, 1]
            nfacts = s
        if fm4[s, 1] < minfm4:
            minfm4 = fm4[s, 1]
            nfacts4 = s
    return nfacts, nfacts4

On my device:

eigvec = array([[ 0.96750358, -0.25285732],  [ 0.96750358,  0.25285732]])
eigvec[0,0] = 0.9675035797467857
eigvec[0,1] = -0.25285731782401605
eigvec[1,0] = 0.9675035797467855
eigvec[1,1] = 0.2528573178240161

Later on this results in:

minfm = 1.000000000000007
minfm4 = 1.0000000000000142

so for s = 1 the following comparison evaluates to True:

if fm[s, 1] < minfm:

because fm[s, 1] = 1.0, which is smaller than minfm by about 7e-15.
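The numbers above suggest that the outcome hinges on rounding noise. The sketch below reproduces only the selection loop with the two reported values. argmin_strict mirrors the strict comparison in _maptest, while argmin_with_tolerance is a hypothetical alternative (not part of metran) that treats nearly equal values as ties and resolves them in a fixed direction.

```python
import math


def argmin_strict(values):
    """Index of the minimum using a strict '<', as in the loop in _maptest."""
    best, idx = values[0], 0
    for s, v in enumerate(values):
        if v < best:
            best, idx = v, s
    return idx


def argmin_with_tolerance(values, rel_tol=1e-12, prefer_later=True):
    """Hypothetical variant: values within rel_tol count as a tie."""
    best, idx = values[0], 0
    for s, v in enumerate(values[1:], start=1):
        tied = math.isclose(v, best, rel_tol=rel_tol)
        if (tied and prefer_later) or (not tied and v < best):
            best, idx = v, s
    return idx


noisy = [1.000000000000007, 1.0]  # values reported on one machine
exact = [1.0, 1.0]                # assumed values on a machine without the noise

print(argmin_strict(noisy), argmin_strict(exact))                  # 1 0
print(argmin_with_tolerance(noisy), argmin_with_tolerance(exact))  # 1 1
```

Whether a tie should resolve to the later index (1 factor, as originally intended) or the earlier one is a design choice. The point is that a tolerance makes the result independent of the machine.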

Update testing routine

Add tests for newer versions of Python
Add black / isort formatting
Use tox for testing routine, similar to Pastas

Version Specifier Deprecation

DEPRECATION: metran 0.2.0 has a non-standard dependency specifier numpy>=1.16.5matplotlib>=3.0. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of metran or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
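The specifier numpy>=1.16.5matplotlib>=3.0 looks like two requirements fused together. One common way this happens in Python packaging metadata is a missing comma between adjacent string literals, which Python silently concatenates. This is only an assumption about metran 0.2.0, and the lists below are hypothetical.

```python
# Hypothetical requirement lists; the actual setup file of metran 0.2.0
# is not shown in the report.
broken = [
    "numpy>=1.16.5"  # missing comma: adjacent literals are concatenated
    "matplotlib>=3.0",
]

fixed = [
    "numpy>=1.16.5",
    "matplotlib>=3.0",
]

print(broken)  # ['numpy>=1.16.5matplotlib>=3.0']
print(fixed)   # ['numpy>=1.16.5', 'matplotlib>=3.0']
```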

lag structure and autocorrelation of residuals

Hi,
Good work.

I am wondering whether metran only assumes an AR(1) process.

statsmodels has more flexible DFM configurations (see here for an example), but it lacks the ability to determine the optimal lags and number of structures.

NumPy matrix deprecation

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
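The warning asks for regular ndarrays instead of np.matrix. A short sketch of the usual replacements follows. The arrays are made up for illustration and are not taken from metran.

```python
import numpy as np

# Hypothetical data; 2-D ndarrays replace np.matrix objects.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([[1.0], [1.0]])  # column vector kept as a 2-D ndarray

product = a @ x              # matrix product (what `*` does for np.matrix)
elementwise = a * a          # for ndarrays, `*` is element-wise
transposed = a.T             # same attribute as np.matrix
inverse = np.linalg.inv(a)   # replaces the `.I` attribute of np.matrix

print(product)
print(inverse @ a)
```

The main pitfall when migrating is the meaning of `*`, which changes from matrix multiplication to element-wise multiplication.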

NumPy MachAr deprecation

https://numpy.org/doc/stable/release/1.22.0-notes.html
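The linked release notes deprecate numpy.MachAr. Assuming the class was only used to look up floating-point limits such as the machine epsilon (the report does not say how metran uses it), np.finfo provides the same information:

```python
import sys

import numpy as np

info = np.finfo(float)

eps = info.eps     # machine epsilon for float64
tiny = info.tiny   # smallest positive normal float64
largest = info.max

print(eps, tiny, largest)
print(eps == sys.float_info.epsilon)  # True: matches the Python float limits
```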

Bug in kalmanfilter - decompose

metran/metran/kalmanfilter.py

Line 582 in ac3c74c

cdf_means = [[]] * ncdf

An error occurs here for ncdf > 1, because [[]] * ncdf creates ncdf references to the same inner list. As a result, the lists filled by the loop are ncdf times too long. Consider the following example:

a = [[]] * 2
a[0].append([0])
print(a)

I think the code in metran expects [[[0]], []] to be printed, but instead [[[0]], [[0]]] is printed.

In that case decompose_simulation retrieves cdf_means that are too long. Of course, it could also be that I don't understand the code.
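A minimal sketch of the aliasing and of the usual remedy, a list comprehension that builds a separate inner list per component. Only the variable name ncdf is taken from the report. The rest is illustrative.

```python
ncdf = 2

aliased = [[]] * ncdf                    # ncdf references to ONE list
independent = [[] for _ in range(ncdf)]  # ncdf separate lists

aliased[0].append([0])
independent[0].append([0])

print(aliased)                   # [[[0]], [[0]]]
print(independent)               # [[[0]], []]
print(aliased[0] is aliased[1])  # True: both entries are the same object
```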

Allow user to specify number of common dynamic components

It would be a nice feature to be able to override the automatic method to determine the number of common dynamic components.

For example:

import metran as mt

ml = mt.Metran(oseries, nfactors=2)
ml.solve()

Currently the FactorAnalysis class also has a maxfactors argument that can presumably be used to limit the number of factors. However, this argument is not exposed through the Metran model class. So perhaps we should also expose it in the Metran class?

Additionally it would be nice to test the current implementation for estimating the number of factors on a dataset that results in 2 (or more) common components.

So in short:

Allow manual setting for number of factors
Expose the maxfactors keyword argument of FactorAnalysis in the Metran class (if this makes sense)
Test metran with dataset that results in 2+ common dynamic components
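One way the first two points could interact is sketched below. resolve_nfactors is a hypothetical helper, not existing metran code, and the rule that a manual setting bypasses maxfactors is an assumption made for the example.

```python
def resolve_nfactors(estimated, nfactors=None, maxfactors=None):
    """Hypothetical helper combining an automatic estimate with user options.

    estimated  : number of factors found by the automatic method
    nfactors   : manual override; used as-is when given
    maxfactors : optional upper bound applied to the automatic estimate
    """
    if nfactors is not None:
        if nfactors < 0:
            raise ValueError("nfactors must be non-negative")
        return nfactors
    if maxfactors is not None:
        return min(estimated, maxfactors)
    return estimated


print(resolve_nfactors(1))                # 1: automatic estimate
print(resolve_nfactors(1, nfactors=2))    # 2: manual setting wins
print(resolve_nfactors(3, maxfactors=2))  # 2: estimate capped
```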

Todo list when Metran goes public

problem importing metran

Hi

I am getting this error when trying to load the package

AttributeError: module 'pastas' has no attribute 'stats'

