Comments (10)
Hi, you could take a look at the references inside https://cran.r-project.org/web/packages/dse/vignettes/Guide.pdf
from pydse.
@FlorianWilhelm
I've read the guide you sent, and also the references it contains. There's a part of the S code that is similar to your code, but I could not find (maybe I'm not a good searcher) a theoretical explanation of the log-likelihood calculation.
Probably there is some optimization/statistics theory I don't know, but Gilbert's publications don't seem to have an answer. A cost function built upon the eigenvalues of a matrix of correlations (between variables with no lag) puzzles me.
Thanks for your previous response; I hope you can help me.
@franchesoni, note that the product of all eigenvalues of a matrix equals its determinant. In the pdf of a multivariate normal distribution you need to calculate the determinant of the covariance matrix. Got it?
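This identity is easy to check numerically; a quick NumPy sketch (the matrix and variable names are purely illustrative):

```python
import numpy as np

# Any symmetric positive-definite matrix, e.g. a sample covariance.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
cov = x.T @ x / len(x)

eigvals = np.linalg.eigvalsh(cov)  # eigenvalues of the covariance
det = np.linalg.det(cov)

# product of eigenvalues == determinant, and hence
# log|cov| == sum of log-eigenvalues (the form used in the Gaussian log-likelihood)
print(np.prod(eigvals), det)
print(np.sum(np.log(eigvals)), np.log(det))
```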
I got that. This is what I understood:
I should assume that the residuals are independent and identically distributed according to a multivariate normal distribution with zero mean.
Then I can write down the negative log-likelihood and identify the terms 'like1' and 'const' in your code.
It's the other term that I don't understand yet. Somehow (no LaTeX here), res_i^T cov^{-1} res_i is the same as len(s).
Could you clarify that last step?
thanks!
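For reference, a minimal sketch of the textbook negative log-likelihood decomposition being discussed here: the quadratic term, a T*log|cov| term, and a constant. The term names below are illustrative only; whether they match pydse's internal naming exactly is an assumption. The result is cross-checked against SciPy's multivariate-normal logpdf:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
T, p = 200, 2
cov_true = np.array([[2.0, 0.5], [0.5, 1.0]])
res = rng.multivariate_normal(np.zeros(p), cov_true, size=T)

cov = res.T @ res / T  # sample covariance of the residuals (zero mean assumed)

# Negative log-likelihood of T iid N(0, cov) vectors, split into three pieces
# (names 'like1'/'const' are illustrative, not necessarily pydse's variables):
quad = np.einsum('ti,ij,tj->', res, np.linalg.inv(cov), res)  # sum_i res_i^T cov^{-1} res_i
like1 = T * np.log(np.linalg.det(cov))                         # T * log|cov|
const = T * p * np.log(2 * np.pi)                              # constant term
nll = 0.5 * (quad + like1 + const)

# Cross-check against SciPy's logpdf summed over the sample
ref = -multivariate_normal(np.zeros(p), cov).logpdf(res).sum()
print(nll, ref)
```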
It's actually only a bit of linear algebra. Just keep in mind that cov = res^T * res, do some transformations, and you arrive at the number of non-zero eigenvalues, which is len(s). Got it?
Thanks for the response!
Unfortunately, it does not seem immediate at all to me (and I've tried!). I would appreciate it if you could write out that bit of linear algebra I'm lacking. Sorry for the nitpicking; it's the final step!
Maybe I underestimated this... it's been quite some time since I programmed it, and I also peeked a bit into the implementation of DSE. I would say that in general res_i^T cov^{-1} res_i does not equal len(s), as can easily be seen by constructing a counterexample. So maybe it's just an approximation if you sum over all of them.
Also, let's assume we only have a univariate time series, so cov^{-1} is just a scalar that only equals 1 = len(s) if that is the std of res. In this case the sum over all res_i^T cov^{-1} res_i could be approximated with sampleT * len(s) due to the definition of cov. In the general case where cov != 1 for p = 1, or if we have eigenvalues not equal to 1 for p > 1, this is not the case. At least those are my findings after putting an hour of thought into it.
Could you tell me a bit more about your motivation for knowing this? Is this some student work, or why are you so eager to know this technical detail? Have you also tried to contact the original author of DSE? In general, PyDSE is more of a prototype and is not actively developed anymore...
@FlorianWilhelm , thanks for taking the time. I appreciate it.
My main motivation, aside from learning ambitions, comes from my job. I work in the Solar Energy Lab on forecasting through time series analysis, and I'm exploring the performance of ARMA models (right now I'm comparing static digital filtering with statsmodels' Kalman-filtering approach).
As I don't have anyone double-checking my work, I can't copy or implement things without making sure that they are correct (not that I doubt you; it's just that I can't leave giant bugs in, or even think about publishing unfounded research). Nevertheless, your code is great, just what I need.
As I found analytically, when p = 1 and res is non-zero, the equality holds (the squared L2 norms cancel, leaving only sampleT). As you said, the general case is not that obvious (at least to me). I will continue asking elsewhere, as you suggest, confident in the correctness of the method (I've made a few tests).
Thanks for answering all of it! Cheers
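The p = 1 case mentioned above can be checked in a couple of lines (a sketch; the variable names follow the thread's notation):

```python
import numpy as np

# p = 1 case: cov is the scalar (1/T) * sum res_i^2, so the quadratic
# term telescopes to exactly T = sampleT, independent of the data.
rng = np.random.default_rng(3)
T = 100
res = rng.standard_normal(T)

cov = res @ res / T          # scalar sample variance (zero mean)
quad = np.sum(res**2) / cov  # sum_i res_i * cov^{-1} * res_i
print(quad, T)
```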
How about that @franchesoni? Best you copy it into some LaTeX-aware editor.
We have, for $\mathrm{cov} = \frac{1}{T}\,\mathrm{res}^\top \mathrm{res}$ with $T = \mathrm{sampleT}$,
$$\sum_{i=1}^{T} \mathrm{res}_i^\top \mathrm{cov}^{-1} \mathrm{res}_i = \operatorname{tr}\!\left(\mathrm{res}\,\mathrm{cov}^{-1}\,\mathrm{res}^\top\right) = \operatorname{tr}\!\left(\mathrm{cov}^{-1}\,\mathrm{res}^\top \mathrm{res}\right) = T \operatorname{tr}\!\left(\mathrm{cov}^{-1}\,\mathrm{cov}\right) = T \operatorname{tr}(I_p).$$
Putting everything together, we have that
$$\sum_{i=1}^{T} \mathrm{res}_i^\top \mathrm{cov}^{-1} \mathrm{res}_i = \mathrm{sampleT} \cdot p,$$
where $p = \operatorname{tr}(I_p)$ is the number of non-zero eigenvalues of $\mathrm{cov}$, i.e. len(s).
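The trace identity above can be verified numerically; a quick sketch (names illustrative):

```python
import numpy as np

# Numerical check: with cov = res^T res / T (full rank),
# sum_i res_i^T cov^{-1} res_i = tr(cov^{-1} res^T res) = T * tr(I_p) = T * p.
rng = np.random.default_rng(2)
T, p = 150, 4
res = rng.standard_normal((T, p))

cov = res.T @ res / T
quad = np.einsum('ti,ij,tj->', res, np.linalg.inv(cov), res)
print(quad, T * p)  # the two agree up to floating-point error
```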
Great @FlorianWilhelm ! It was just some linear algebra indeed.
About the degenerate case: I asked on Cross Validated, and there is an intuitive explanation that relates to PCA. Basically, when some eigenvalue of the covariance is negligible, we can use the principal-component representation of the data and arrive at the case you just nicely covered.
I've not done the math yet, but the intuition is there. Thanks for the very helpful responses!
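That PCA intuition can be sketched numerically too: with rank-deficient residuals, replacing the inverse by the pseudo-inverse (i.e. working in the principal-component basis) gives sampleT times the number of non-zero eigenvalues. A sketch under that assumption, with illustrative names:

```python
import numpy as np

# Degenerate case: residuals confined to an r-dimensional subspace of R^p,
# so cov is singular. With the pseudo-inverse, the quadratic term becomes
# T * rank(cov), i.e. sampleT times the number of non-zero eigenvalues (len(s)).
rng = np.random.default_rng(4)
T, p, r = 120, 5, 3
res = rng.standard_normal((T, r)) @ rng.standard_normal((r, p))  # rank r

cov = res.T @ res / T
quad = np.einsum('ti,ij,tj->', res, np.linalg.pinv(cov), res)
rank = np.linalg.matrix_rank(cov)
print(quad, T * rank)
```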