Coder Social home page Coder Social logo

Covariate-dependent latent variables about hmsc HOT 19 OPEN

hmsc-r avatar hmsc-r commented on June 30, 2024
Covariate-dependent latent variables

from hmsc.

Comments (19)

gtikhonov avatar gtikhonov commented on June 30, 2024

Hei @aminorberg
The precise answer to your question very much depends on what you refer to as a full working example. As far as I am aware of, the functionality from the original methodological paper is properly implemented and anything not working with of that shall be treated as bug.
On the other hand, there are currently a lot of post processing utilities in the package, which bring the raw results of the MCMC model fitting closer to ecological interpretation. These are not guaranteed to work fine with variable latent loadings, both due to

  1. the fact that often it is improper to interprete them in similar manner as the constant latent loadings and the underlying math is yet to be designed,
  2. in some of those cases, when the interpretation is transferrable, the current implementation is deficient due to limited robustness caused by iterative development process.

In personal communication, @ovaskain claimed that they encountered multiple challenges with this class of models, both in terms of sensitivity of prior assumptions and/or sophisticated exploration of the posterior typical set using available MCMC algorithm.

from hmsc.

aminorberg avatar aminorberg commented on June 30, 2024

Thanks @gtikhonov!

By working example I mean e.g. an example using the data provided with the package, since I can't even get that working, and I think it's just that I don't know the syntax. This is what I tried:

studyDesign <- data.frame(sample = as.factor(1:50), 
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
rL1 <- HmscRandomLevel(xData = data.frame(x1 = rep(1, length(TD$X$x1)), 
                                          x2 = TD$X$x2))

m <- Hmsc(Y = TD$Y, XData = TD$X, 
          XFormula = ~x1+x2, 
          studyDesign = studyDesign, 
          ranLevels = list("sample" = rL1, "plot" = rL2))

ps <- sampleMcmc(m, samples = 1000)

And when attempting to sampleMcmc, I get an error:

Error in (Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[, r]), k]) %*%  : 
  non-conformable arguments
In addition: Warning message:
In Ops.factor(Eta[[r]][Pi[, r], ], rL[[r]]$x[as.character(dfPi[,  :*not meaningful for factors

from hmsc.

gtikhonov avatar gtikhonov commented on June 30, 2024

Looks like a bug. Will have a close look at this.

from hmsc.

jarioksa avatar jarioksa commented on June 30, 2024

Looks like this comes from updateZ in loop of lines 22 to 30, and exactly there in line 28:

   for(r in seq_len(nr)){
      if(rL[[r]]$xDim == 0){
         ...
      } else{
         LRan[[r]] = matrix(0,ny,ns)
         for(k in 1:rL[[r]]$xDim)
            LRan[[r]] = LRan[[r]] + (Eta[[r]][Pi[,r],]*rL[[r]]$x[as.character(dfPi[,r]),k]) %*% Lambda[[r]][,,k]
      }
   }

and basically the error is caused by this statement with innermost loop at k == 2

 rL[[r]]$x[as.character(dfPi[,r]),k]
 [1] o o o o o o o o o o o o o o o o o o o o o o o o o c c c c c c c c c c c c c
[39] c c c c c c c c c c c c
Levels: c o

which is a factor that is not meaningful in multiplication (like the warning said), and when used within

Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[,r]),k]

The result drops dimensions (50,2) and returns a vector of 100 NA values and this fails in the following matrix multiplication (%*%) with the error of non-conformable arguments. So it is about handling factor variables.

from hmsc.

gtikhonov avatar gtikhonov commented on June 30, 2024

Thanks, @jarioksa, I was going to mark the same thing. We have not yet implemented the XFormula type of data specification for this feature. However, there is one more issue - namely in the alignPosterior() function, which is be default called after the MCMC is done. I have not fixed it yet, but the following code is a fully operating example of model fitting for covariate-dependent associations:

library(Hmsc)
studyDesign <- data.frame(sample = as.factor(1:50),
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
xData = data.frame(x1=rep(1, length(TD$X$x1)), x2=as.numeric(TD$X$x2=="c"))
rL1 <- HmscRandomLevel(xData = xData)
m <- Hmsc(Y = TD$Y,XData = TD$X,
          XFormula = ~x1+x2,
          studyDesign = studyDesign,
          ranLevels = list("sample" = rL1, "plot" = rL2))
ps <- sampleMcmc(m, samples = 1000, alignPost=FALSE)

from hmsc.

jarioksa avatar jarioksa commented on June 30, 2024

@gtikhonov : It works in this case where x2 is a two-level factor which can be turned into one contrast variable. A more general solution is:

model.matrix(reformulate(names(TD$X[, -1])), TD$X)

model.matrix generates an (Intercept) which already is in TD$X and one of the replicated intercepts can be removed – this is one way, another way is to just drop the first column of the resulting model matrix. This solution will work when there is any number of factors with any number of levels. The generation of the model matrix could be incorporated in updateZ to allow flexible use of factor variables.

from hmsc.

oysteiop avatar oysteiop commented on June 30, 2024

Hi, I have also been exploring this recently. Despite the minor bugs the models seem to fit. Regarding how to compute the Lambdas per site as a function of the covariates, I have tried (continuing from Gleb's working example above):

#Covariates for plot-level latent variable
xmat = as.matrix(m$rL[[1]]$x)
dim(xmat)

#Residual correlations given observed covariates
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat)$mean
OmegaCor

#Residual correlations given observed covariates for first site
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat[1,])$mean
OmegaCor

from hmsc.

gtikhonov avatar gtikhonov commented on June 30, 2024

@oysteiop, this does not look like a valid approach:

  1. you repeat l variable (small L) both for the loop index and accumulated sum. At least my copy of your code indicates this, maybe you meant I and l. Such repetition would not bring you any good.
  2. you shall not sum over the l1 at all, and your output should have dimension [2,4,50] ([nf,ns,ny]).

The package tensorA features a function that can replace the whole loop with single line. Additionally, the Hmsc function getPostEstimate features parameter x that controls the condition for association matrix if you set parName="Omega".

from hmsc.

oysteiop avatar oysteiop commented on June 30, 2024

Thanks for spotting this typo. I have edited my comment above to you much simpler solution.

from hmsc.

jarioksa avatar jarioksa commented on June 30, 2024

I think you are on a dangerous path: these tricks work when you study a case with a two-level factor where the internal numeric coding will give the correct contrast. However, this approach fails when you have a factor with three levels, where the internal numeric coding just changes the factor into an arbitrary continuous variable. Most dangerously, it will fail silently: no error messages, but wrong results. You must study the model matrix which changes a three-level factor into two contrast variables (and p-level factor into p-1 contrast variables). Solving a margin case will bring trouble and grievance later in life.

from hmsc.

aminorberg avatar aminorberg commented on June 30, 2024

Hmm, @gtikhonov, running your example results in a new error:
Error in lambda * array(rep(rL[[r]]$x[unLdfPi[q], ], each = nf * ns), :
non-numeric argument to binary operator

from hmsc.

gtikhonov avatar gtikhonov commented on June 30, 2024

@jarioksa , I am not sure whom did you address, but here are some thoughts of mine regarding this aspect. To my mind, any of the package's internal functionality on for simplifying user's life when specifying the covariate matrix (e.g. XData + XFormula way) are potentially unsafe, since a user can easily define something very different that he/she desires without noticing it. However, I also do fully recognise that it is often nuch more gandy in this way + Otso had some ideas on using this formula-empowered way for more informative postprocessing (not implemented yet).

from hmsc.

gtikhonov avatar gtikhonov commented on June 30, 2024

@aminorberg is the problem still actual? As far as I understood from the responses by other collaborators on this branch, for them that example code was executing without issues...

from hmsc.

aminorberg avatar aminorberg commented on June 30, 2024

@gtikhonov I reinstalled Hmsc (master version), and now your example works without errors. So only the CRAN version gives the error. Thank you!

from hmsc.

aminorberg avatar aminorberg commented on June 30, 2024

@gtikhonov my problems with cov-dep latents variables continued as I tried to compute predicted values:

cv_partition <- createPartition(ps, nfolds = 2, column = "sample")
cv_preds <- computePredictedValues(ps, partition = cv_partition, expected = FALSE)

Error in LambdaPostMean[k, ] : incorrect number of dimensions

Related to the alignPost = FALSE ?

from hmsc.

jarioksa avatar jarioksa commented on June 30, 2024

Immediate reason for this is that in your example, internal item LambdaPostMean is a 3-dim array, and too few indices are given when it is referred: It should be referred as LambdaPostMean[k, , ] – that is, one comma more since three indices are needed.

from hmsc.

aminorberg avatar aminorberg commented on June 30, 2024

@gtikhonov Is there any chance that the predict function would be updated for cov-dep latent variables some time soon?

from hmsc.

rburner avatar rburner commented on June 30, 2024

Hi @aminorberg - I've been following this discussion and also have a few projects where I'm hoping to use the covariate-dependent association features.

Am I correct in understanding that covariate(s) must be categorical?

from hmsc.

Fabiogeography avatar Fabiogeography commented on June 30, 2024

Hi, I'm still struggling to get the covariate-dependent functionality to work.
The code that @gtikhonov provoided on 28th Jan 2019 makes a model, but computeAssociations fails:

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

This seems to be because in the postList object created by poolMcmcChains in computeAssociations, the element of Lambda that corresponds to the covariate-dependent random level (rL1) has two species * species matrices (see also @jarioksa comment on 9th Dec 2019):

postList[[1]]$Lambda

[[1]]
, , 1

[,1] [,2] [,3] [,4]
[1,] 0.0262100640 0.0186711924 5.804635e-02 0.1077498535
[2,] -0.0136396407 -0.0261227143 -5.455211e-03 -0.0122868940
[3,] 0.0013262262 0.0002285267 6.567342e-03 -0.0011980602
[4,] -0.0004133883 -0.0007863265 -3.854753e-05 0.0004773631

, , 2

[,1] [,2] [,3] [,4]
[1,] -0.2373420188 -0.0828967318 -0.066711818 -0.1553379441
[2,] -0.0478642459 0.0118648434 -0.032209240 -0.0578322325
[3,] 0.0001250210 0.0081198502 -0.003553992 0.0140523202
[4,] -0.0002287827 -0.0001740797 0.001141877 0.0001855759

[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.20153787 -0.10386425 -0.08746174 -0.04134730
[2,] 0.04805722 0.01782121 0.03143461 0.02981933

I can update computeAssociations to work around this, but I'm not certain how to use these two matrices to understand how the species associations are affected by the environmental variable. Am I right in thinking that the first matrix is the species associations independent of the environmental variable and the second matrix is the species associations dependent on the selected environmental variable?

Presumably this is explained in the Matlab code for the 2017 paper, but that's not accessible to me.

convertToCodaObject and the predict functions etc... also fails for the same reason I think. Again, I can fix for my data, but would need to understand what the two Lambda matrices are.

Thanks for any light you're able to shed!

from hmsc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.