hmsc's Issues

Limit number of cores for HMSC

Dear colleagues,

When generating the models, how can I limit the number of cores used by HMSC? I don't seem to find a parameter for that. When I set it running, it seems to use all the cores available on our server.

Here is the command that seems to be using all the cores:
sampleMcmc(m, samples = samples, thin = thin, adaptNf = rep(ceiling(0.4*samples*thin), 1), transient = ceiling(0.5*samples*thin), nChains = nChains, nParallel = nChains)

Many thanks in advance,
Daniel
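A hedged note on what usually controls core usage (a sketch, not an official answer): nParallel sets how many chains run as separate R processes, so it should never exceed nChains, and any cores used beyond that typically come from a multi-threaded BLAS rather than from Hmsc itself. Capping the BLAS threads with the RhpcBLASctl package (assumed to be installed on the server; it is not part of Hmsc) is one way to test this:

# Sketch: cap BLAS/OpenMP threading before calling sampleMcmc()
library(RhpcBLASctl)
blas_set_num_threads(1)   # limit the linear-algebra backend to one thread
omp_set_num_threads(1)    # limit OpenMP threading, if in use
# Then run sampleMcmc() as above; it starts one R process per chain,
# so the total core use should be roughly nParallel (<= nChains) processes.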

sampleMcmc() error on cluster

Hi - I mentioned this issue to Otso and he suggested I post here.

When I run the following code on an HPC cluster (with any value for nParallel) I get the error shown in the image:

sixmodelsHPC[[1]]
Hmsc object with 999 sampling units, 334 species, 6 covariates, 6 traits and 3 random levels

sixmodelsHPC[[1]] = sampleMcmc(sixmodelsHPC[[1]], thin = thin, samples = samples,
transient = transient, nChains = nChains, nParallel = 1,
verbose = verbose, initPar = "fixed effects")

(error message shown in the attached screenshot)

The same code, with the same model objects, runs fine on my PC (the PC's Hmsc installation is from August; the Hmsc installation on the HPC cluster is from today).

Secondarily, is it true that I can't use more cores than the number of chains (as per the warning)?

Thanks!

Ryan

(NNGP) Data non-numeric Error in Cross-validation

Hi,
I was running a dataset of 1,200 sites, 5 covariates, and 141 species. I found that the original spatially explicit model is too slow to run (I stopped my code after 10 hours), so I am now trying the NNGP approach.
When I was testing NNGP with far fewer MCMC samples, this error appeared:

################################
Loading required package: usethis
Loading required package: coda
[1] "n=1200"
[1] 1200 2
[1] 1200 5
[1] 1200 141
[1] "MCMC is done"
Cross-validation, fold 1 out of 2
Error in get.knnx(data, query, k, algorithm) : Data non-numeric
Calls: computePredictedValues ... predict.Hmsc -> predictLatentFactor -> knnx.index -> get.knnx
Execution halted
################################

Do you know what's going on there?
How is my data non-numeric?
Here is my code:

#install.packages("devtools")
library(devtools)
#install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual"))
library(Hmsc)
library(MASS)
set.seed(6)

path = "../DATA/"
exp = "birds"
num = "1"

t_S = read.csv(paste(path, sprintf("St_%s_%s.csv", num, exp), sep = ""), header = FALSE)
t_X = read.csv(paste(path, sprintf("Xt_%s_%s.csv", num, exp), sep = ""), header = FALSE)
t_Y = read.csv(paste(path, sprintf("Yt_%s_%s.csv", num, exp), sep = ""), header = FALSE)

v_S = read.csv(paste(path, sprintf("Sv_%s_%s.csv", num, exp), sep = ""), header = FALSE)
v_X = read.csv(paste(path, sprintf("Xv_%s_%s.csv", num, exp), sep = ""), header = FALSE)
v_Y = read.csv(paste(path, sprintf("Yv_%s_%s.csv", num, exp), sep = ""), header = FALSE)

n = dim(t_S)[1] * 2
sprintf("n=%d", n)

S = rbind(t_S, v_S)
X = rbind(t_X, v_X)
Y = rbind(t_Y, v_Y)

X = as.matrix(X, nrow=n)
Y = as.matrix(Y, nrow=n)

colnames(S) = c("x-coordinate","y-coordinate")
rownames(S) = 1:n

print(dim(S))
print(dim(X))
print(dim(Y))

colnames(X) = 1:dim(X)[2]

XData = data.frame(x1=X[,])
#print(XData[1,])

###########################################
#Knots = constructKnots(S, knotDist = 0.2, minKnotDist = 0.4)

studyDesign = data.frame(sample = as.factor(1:n))
rL.spatial = HmscRandomLevel(sData = S, sMethod = 'NNGP', nNeighbours = 10) # sKnot =Knots)
rL.spatial = setPriors(rL.spatial) #We limit the model to one latent variable for visualization purposes
m.spatial = Hmsc(Y=Y, X=X, #XFormula=~x1,
studyDesign=studyDesign, ranLevels=list("sample"=rL.spatial),distr="probit")

nChains = 1
thin = 1
samples = 100
transient = 400
verbose = 0

m.spatial = sampleMcmc(m.spatial, thin = thin, samples = samples, transient = transient,
nChains = nChains, verbose = verbose,updater=list(GammaEta=FALSE))
print("MCMC is done")

#preds.spatial = computePredictedValues(m.spatial)
#print(dim(preds.spatial))
#MF.spatial = evaluateModelFit(hM=m.spatial, predY=preds.spatial)
#print(MF.spatial)

#Predictive power
partition = c(rep(1, n%/%2), rep(2, n%/%2)) #createPartition(m.spatial, nfolds = 2, column = "sample")
#print(partition)
cvpreds.spatial = computePredictedValues(m.spatial, partition=partition, updater=list(GammaEta=FALSE))
print(dim(cvpreds.spatial))
cvMF.spatial = evaluateModelFit(hM=m.spatial, predY=cvpreds.spatial)
print(cvMF.spatial)

write.csv(cvpreds.spatial, sprintf("res_%s_%s.csv", num, exp))
print("END")

######################
The dataset is from this URL (reproducing the bug takes about 10 minutes):

https://doi.org/10.5281/zenodo.2637812 (paper "A comprehensive evaluation of predictive performance of 33 species distribution models at species and community levels")
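One thing that may be worth checking (an assumption about the cause, not a confirmed diagnosis): the failing call, get.knnx(), operates on the spatial coordinates, so the error would be consistent with S still containing non-numeric columns after read.csv() and rbind(). A minimal check before building the random level:

# Sketch: force the coordinates passed to HmscRandomLevel(sData = ...) into a numeric matrix
str(S)                                   # inspect the column types produced by read.csv
S = sapply(S, as.numeric)                # column-wise coercion (use as.character first if columns are factors)
colnames(S) = c("x-coordinate", "y-coordinate")
rownames(S) = 1:n
stopifnot(is.numeric(S), !anyNA(S))
rL.spatial = HmscRandomLevel(sData = S, sMethod = 'NNGP', nNeighbours = 10)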

Trying to identify efficiency bottlenecks/solutions

Hi,

First, just wanted to congratulate the developers--Hmsc is a really useful and flexible tool.

I had a general question about scalability/speed and different model components, and I apologize in advance for the rambling/uncertain nature of the question. I am attempting to fit probit models for about 25 species across about 1200 spatial replicates ('plots') nested within ~400 larger spatial clusters ('grids'). Each plot has been sampled multiple times (ranging from ~ 5 to ~300), such that at the finest temporal resolution, the Y matrix has dimension 25 species x 216,000 'samples'. The key questions relate to how species occurrence and co-occurrence respond to a few spatiotemporally varying covariates. The model structure I have been trying to pursue includes between 5-10 covariates in X, 2-3 trait effects upon Beta, and 2-3 covariates associated with Lambda. There are 3 random levels I've specified:

rL1=HmscRandomLevel(units=MyStudyDesign$plot)
rL2= HmscRandomLevel(sData=GridCoords, sMethod = 'NNGP')
rL3= HmscRandomLevel(xData=data.frame(x1=rep(1, 216000),x2=XDAT$X2, X3=XDAT$X3))

Not really surprising but I've found that incorporating rL3 really slows things down (maybe ~100 samples every several days). I can condense the data some by aggregating temporal replicates ('samples'), discarding species, or discarding locations. I've found that cutting the number of rows in the Y matrix to about 40000 speeds sampling up somewhat, but maybe not sufficiently for the particular pending deadlines.

I suppose my question is whether there are certain model components where dimension reduction is likely to exhibit the most pronounced gains? Or alternatively/additionally, is there a different general way to approach spatiotemporal environmental variation in co-occurrence that might be more effective? I've considered the A matrix described in the 2017 introductory paper, but am a little worried about drastically increasing the number of estimated coefficients (to be somewhat analogous to environmental variation in lambda would seem to require interactions between the A matrix and some covariates). I am also not quite sure how to deal with the ragged nature of the sampling--in many cases, there are gaps in the time-series, and it seems like the A matrix would have to be imputed?

Thanks again for the excellent software.

John
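Not an authoritative answer, but if aggregating temporal replicates is an acceptable form of dimension reduction, the coarser Y can be built with base R before redefining the study design. The sketch below is generic; 'design', 'plot' and 'year' are assumed names for a per-sample data frame, not objects from the post above.

# Sketch: collapse samples to plot-by-year units to shrink Y
unit <- interaction(design$plot, design$year, drop = TRUE)
Yagg <- rowsum(Y, group = unit)        # summed counts per plot-year
Yagg <- 1 * (Yagg > 0)                 # back to presence/absence for the probit model
designAgg <- unique(data.frame(plot = design$plot, year = design$year, unit = unit))
designAgg <- designAgg[match(rownames(Yagg), designAgg$unit), ]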

evaluateModelFit broken?

I downloaded 3.0-4 earlier this week, and now a function that used to work no longer does.

shosh.hmsc.rand.eval <- evaluateModelFit(hM=shosh.hmsc.rand,
+ predY=shosh.hmsc.rand.preds)
Error in roc.default(response, predictor, auc = TRUE, ...) : 
  No control observation.

This command worked fine recently. If I foolishly repeat the command after the first error, R crashes and I lose everything from that session.
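A hedged guess at the cause: the message comes from pROC::roc(), which needs both presences and absences to compute AUC, so it can appear when some column of the observed data used for evaluation contains only one class. A quick check on the fitted object (object names as in the post above):

# Sketch: find response columns with no 0s or no 1s, which would break AUC
Y <- shosh.hmsc.rand$Y
onlyOneClass <- apply(Y, 2, function(y) length(unique(y[!is.na(y)])) < 2)
colnames(Y)[onlyOneClass]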

How do you define a longitudinal model?

Hi again,

I have a temporal data set with no spatial information. I have a site with 3 plots that have been botanically surveyed for a number of years (1996-2015). My data looks like:

> head(XData)
  Year     meanGW Plot
1 1996  1.3192455    A
2 1996 -0.2007545    B
3 1996 -1.6307545    C
4 1997  1.1835752    A
5 1997 -0.3364248    B
6 1997 -1.7664248    C
> head(studyDesign)
  Year Plot Sample
1 1996    A   96.A
2 1996    B   96.B
3 1996    C   96.C
4 1997    A   97.A
5 1997    B   97.B
6 1997    C   97.C

With random effects

time <- as.matrix(data.frame(studyDesign$Year))
rL1 = HmscRandomLevel(sData = time)
rL2 = HmscRandomLevel(units = studyDesign$Plot)

I've then built the Hmsc model as follows:

m2 <- Hmsc(Y=Y2,
          XData = XData, XFormula = XFormula,
          TrData = TrData, TrFormula = TrFormula,
          distr= "lognormal poisson",
          studyDesign = studyDesign, ranLevels = list(Year=rL1, Plot=rL2))

where XFormula <- ~ poly(meanGW, degree=2, raw=TRUE). I have not included the trait information here. Running sampleMcmc() gives me the following error:

 m2 <- sampleMcmc(hM = m2, samples=samples, thin = thin,
                 transient = transient,
                 nChains = nChains, nParallel = nChains)
Error in chol.default(W) : 'a' must have dims > 0

Is it because my temporal data has only one dimension? How do I accurately reflect the longitudinal nature of the data set? Thanks,
Chris
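For reference, one way to encode a purely temporal structure is to give the random level a one-column sData matrix of time coordinates; the sketch below is a guess at a working specification, under the assumption that the matrix needs one row per level of the corresponding studyDesign column, with matching row names.

# Sketch: a temporally structured random level for Year, plus an unstructured Plot level
years <- sort(unique(studyDesign$Year))
time  <- matrix(as.numeric(as.character(years)), ncol = 1,
                dimnames = list(as.character(years), "year"))
rL1 <- HmscRandomLevel(sData = time)
rL2 <- HmscRandomLevel(units = levels(as.factor(studyDesign$Plot)))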

defining spatial structure in nested or spatially clustered sampling design

Hi,

Thanks for the excellent package and the many excellent papers describing the methods. I have been working to properly specify a spatial random effect in a nested sampling design. I think I have the random effects specification worked out, but wanted to pose this as a question (hopefully I'm not missing something obvious in the vignettes/help documentation). Similar issues appear to be raised in #47, and this may be similar in principle to the "islands in an ocean" model described in #33, though I did not see syntax there.

I have a nested sampling design, wherein multiple samples were taken per site, and each site has a single set of GPS coordinates, i.e. no coordinates for individual samples. (These are microbial communities of trees within ~1 ha plots/sites, where the sites are distributed from 200-1000 km distance from each other.)

I first tried specifying the spatial random effect so that each sample had a set of coordinates, with coordinates repeated for samples within a site, but this resulted in an "Error in chol.default(W) : the leading minor of order n is not positive definite" error. (Note: this works with jittered coordinates.) I next tried to specify the spatial effect at the level of site, with a site random effect (i.e., one coordinate row for each site, with row names of the spatial matrix designated by site), and this results in the function running.

Following code gives the error

library(Hmsc)
library(MASS)
set.seed(6)

n = 100
ns = 2

#coordinates for each site, and also coordinates for each sample (the same for all samples within a site)
plot_coords = matrix(runif(2*10), ncol = 2)
xycoords = matrix(rep(t(plot_coords),10),ncol=2, byrow = T)

colnames(xycoords) = c("x-coordinate","y-coordinate")
rownames(xycoords) = paste0("sample_", seq(1:n))
colnames(plot_coords) = c("x-coordinate","y-coordinate")
rownames(plot_coords) = paste0("site_", letters[seq(1:10)])

#toy data following vignette
beta1 = c(-1,1)
alpha = rep(0,ns)
beta = cbind(alpha,beta1)
x = cbind(rep(1,n),rnorm(n))
Lf = x%*%t(beta)
sigma.spatial = c(2)
alpha.spatial = c(0.35)
Sigma = sigma.spatial^2*exp(-as.matrix(dist(xycoords))/alpha.spatial)
eta1 = mvrnorm(mu=rep(0,n), Sigma=Sigma)
lambda1 = c(1,2)
Lr = eta1%*%t(lambda1)
L = Lf + Lr
yprob = 1*((L +matrix(rnorm(n*ns),ncol=ns))>0)
XData = data.frame(x1=x[,2])

#Study design, random effect, and model specification
studyDesign = data.frame(
    sample = as.factor(paste0("sample_", seq(1:n))),
    site = as.factor(rep(paste0("site_", letters[seq(1:10)]), 10)),
    spatial = as.factor(paste0("sample_", seq(1:n)))
)

rL.sample = HmscRandomLevel(units = as.factor(studyDesign$sample))
rL.site = HmscRandomLevel(units = levels(as.factor(studyDesign$site)))
rL.spatial = HmscRandomLevel(sData = xycoords)

m.spatial = Hmsc(Y=yprob, XData=XData, XFormula=~x1,
studyDesign=studyDesign, ranLevels=list("sample"=rL.sample, "site" = rL.site, "spatial" = rL.spatial),distr="probit")

m.spatial = sampleMcmc(m.spatial, thin = 1, samples = 10, transient = 5,
nChains = 2, verbose = 0,updater=list(GammaEta=FALSE))

#Error in chol.default(W) : 
#  the leading minor of order 3 is not positive definite

Respecifying the spatial rL and studyDesign works

#redefine spatial random effect on plots and redefine studyDesign
rL.spatial = HmscRandomLevel(sData = plot_coords)

studyDesign = data.frame(
sample = as.factor(paste0("sample_", seq(1:n))),
site = as.factor(rep(paste0("site_", letters[seq(1:10)]), 10)),
spatial = as.factor(rep(paste0("site_", letters[seq(1:10)]), 10))
)

m.spatial = Hmsc(Y=yprob, XData=XData, XFormula=~x1,
studyDesign=studyDesign, ranLevels=list("sample"=rL.sample, "site" = rL.site, "spatial" = rL.spatial),distr="probit")

m.spatial = sampleMcmc(m.spatial, thin = 1, samples = 10, transient = 5,
nChains = 2, verbose = 0,updater=list(GammaEta=FALSE))

So, questions:

  1. Is spatial effect at site level as done in the second code block the appropriate syntax for a nested or spatially clustered sampling design?

  2. It seems like row names of the various objects are important here. Is that the case, and at what levels are the row names used to associate data points within different objects? E.g., it clearly seems that the row names of plot_coords in the second code block need to be coded to site, but do the row names of the yprob matrix need to be coded to sample?

Thanks for any pointers!
~Eric

Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds

Hello again,

I am trying to run a model with spatial latent effects following Vignette 3 and the Manual, but getting this error. If I remove the spatial effect then the error disappears. I've checked the xycoords and it is fine - as far as I can see (I plotted it and checked the values). Below the error is the model code in full (and attached data). The xycoords are for plot locations. There are 2 samples per plot (at 1 m and at 18 m height).

m.pa.spatial = sampleMcmc(m.pa.spatial, thin = thin, samples = samples, transient = transient,
                          nChains = nChains, verbose = verbose)

Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds

traceback()
2: computeDataParameters(hM)
1: sampleMcmc(m.pa.spatial, thin = thin, samples = samples, transient = transient,
nChains = nChains, verbose = verbose)

xycoords <- matrix(c(plot.char$UTM_E,plot.char$UTM_N),ncol=2)
colnames(xycoords) <- c("UTM_E","UTM_N")
rownames(xycoords) <- plot.id
xycoords <- apply(xycoords,2, function(x) sapply(x, function(m) m - min(x)))

studyDesign = data.frame(plot = plot.id, sample = sample.id)
rL.spatial = HmscRandomLevel(sData = xycoords)
rL.spatial = setPriors(rL.spatial,nfMin=1,nfMax=1)
rL2 = HmscRandomLevel(units = studyDesign$sample)

#set presence-absence model
m.pa.spatial <- Hmsc(Y = sign(lh_sp), XData = plot.char, XFormula = ~ HEIGHT + ELEV + SLOPE + LAI, distr = "probit", studyDesign = studyDesign, ranLevels = list("plot" = rL.spatial))

plot.char.txt
lh_sp.txt
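A guess at what computeDataParameters() is tripping over (not a confirmed diagnosis): the failing expression indexes the sData matrix by levels(hM$dfPi[, r]), which suggests the row names of xycoords must be unique and must cover every level of studyDesign$plot; with two samples per plot, rownames(xycoords) <- plot.id can easily violate that. A quick check and one possible re-specification:

# Sketch: sData row names should be unique and match the plot levels
any(duplicated(rownames(xycoords)))                                  # should be FALSE
setdiff(levels(as.factor(studyDesign$plot)), rownames(xycoords))     # should be empty

# One coordinate row per plot, named by plot
xy.plot <- unique(data.frame(plot = plot.id, xycoords))
xycoords.plot <- as.matrix(xy.plot[, c("UTM_E", "UTM_N")])
rownames(xycoords.plot) <- xy.plot$plot
rL.spatial <- HmscRandomLevel(sData = xycoords.plot)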

constructGradient error and conditional estimates of species associations

Hello,

I am experimenting with the current Hmsc R package (from CRAN). I have a time-series (6 time points) abundance data set of terrestrial arthropods (100spp), 14 plots (w/ spatial structure), 102 sampling units, and 1 categorical trait variable (9 levels). I'm examining how species respond to 2 covariates: (1) time/julian date (continuous) and (2) Farm type (categorical with 2 levels: O & C).

My model structure is as follows:

hmsc_mod = Hmsc(Y = sp_comp, XData = XData, XFormula = XFormula,
                TrData = trait_data, TrFormula = TrFormula,
                studyDesign = studyDesign, 
                ranLevels = list(farm = randPlot, sample = randSample, space = randSpace),
                distr = 'poisson')

At the moment, I have 2 questions:

1) Error message from constructGradient()

When I try to use the constructGradient() function to create a new data frame for prediction, I get the following error message:

 Gradient <- constructGradient(output, focalVariable = "julian_date",
        non.focalVariables = list('farm_type'=list(3,"O")))
Error in `[.data.frame`(hM$XData, , vars[i]) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(hM$XData, , vars[i])
3: hM$XData[, vars[i]]
2: is.factor(hM$XData[, vars[i]])
1: constructGradient(output, focalVariable = "julian_date", non.focalVariables = list(farm_type = list(3, "O")))

I tried running the source code directly but still could not find where the issue is.
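One thing worth verifying for question 1 (a guess, since the traceback ends at hM$XData[, vars[i]]): the names given to non.focalVariables must match columns of the model's XData exactly.

# Sketch: check that the focal and non-focal variable names exist in the model's XData
colnames(output$XData)
setdiff(c("julian_date", "farm_type"), colnames(output$XData))   # should be character(0)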

2) Covariate-dependent species-to-species associations

I am aware that the Matlab version has this functionality (Tikhonov et al 2017). I'm wondering if there is a way to do this in the R package.

Any help on either of the two questions would be much appreciated. Thanks!

sampleMcmc Error in tcrossprod(x,y)

Hi Hmsc!

Thanks for a great package :)

I am getting a consistent error in some (not all) of the examples given in the vignettes, as well as with my own data. The error happens when running sampleMcmc(). For example, this code from Vignette 1 gives me the error:

> n = 50
> x = rnorm(n)
> alpha = 0
> beta = 1
> sigma = 1
> L = alpha + beta*x
> y = L + rnorm(n, sd = sigma)
> Y = as.matrix(y)
> XData = data.frame(x = x)
> m = Hmsc(Y = Y, XData = XData, XFormula = ~x)
> nChains = 2
> thin = 5
> samples = 1000
> transient = 500*thin
> verbose = 500*thin
> m = sampleMcmc(m, thin = thin, samples = samples, transient = transient,
+ nChains = nChains, verbose = verbose)

[1] "Computing chain 1"
Error in tcrossprod(x, y) : 
  requires numeric/complex matrix/vector arguments

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C               LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8    
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8    LC_PAPER=en_NZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] truncnorm_1.0-8 statmod_1.4.32  phytools_0.6-99 pROC_1.15.3     pdist_1.2       nnet_7.3-12     mvtnorm_1.0-11  MCMCpack_1.4-4 
 [9] Matrix_1.2-17   ggplot2_3.2.0   BayesLogit_0.6  abind_1.4-5     dplyr_0.8.3     fields_9.8-3    maps_3.3.0      spam_2.2-2     
[17] dotCall64_1.0-0 MASS_7.3-51.1   ape_5.3         corrplot_0.84   knitr_1.23      stringr_1.4.0   Hmsc_0.4.3.0    coda_0.19-3    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2              lattice_0.20-38         snow_0.4-3              gtools_3.8.1            assertthat_0.2.1       
 [6] R6_2.4.0                plyr_1.8.4              MatrixModels_0.4-1      pillar_1.4.2            rlang_0.4.0            
[11] lazyeval_0.2.2          rstudioapi_0.10         SparseM_1.77            phangorn_2.5.5          combinat_0.0-8         
[16] splines_3.6.1           igraph_1.2.4.1          munsell_0.5.0           xfun_0.8                compiler_3.6.1         
[21] numDeriv_2016.8-1.1     pkgconfig_2.0.2         mnormt_1.5-5            mgcv_1.8-28             mcmc_0.9-6             
[26] tidyselect_0.2.5        tibble_2.1.3            expm_0.999-4            quadprog_1.5-7          withr_2.1.2            
[31] crayon_1.3.4            nlme_3.1-140            gtable_0.3.0            magrittr_1.5            scales_1.0.0           
[36] stringi_1.4.3           scatterplot3d_0.3-41    fastmatch_1.1-0         tools_3.6.1             glue_1.3.1             
[41] purrr_0.3.2             yaml_2.2.0              plotrix_3.7-6           colorspace_1.4-1        animation_2.6          
[46] clusterGeneration_1.3.4 quantreg_5.42.1    

Any help would be greatly appreciated :)

Chris

Scaling the Y for normal observation model

In commit 81d4273#diff-d385a9a4c0892cbfdbaef4b4b3dc6fc7 @ovaskain introduced optional scaling for the columns of the Y matrix that are assigned a Gaussian observation model. The scaled object is constructed in the Hmsc() call and is used later during sampling and predictions.

HMSC/R/Hmsc.R, line 646 in afaf521: #scaling of response

But as far as I can judge based on the code, we never apply the inverse scaling. Is it a poorly documented feature or a bug?

Error when using random levels

Hello! I have a multivariate dataset of counts of sediment infauna organisms (65 spp, YData) that I would like to relate to (XData) distance to oyster reefs (continuous, in meters) and position in the reef (between reef patches or outside, 'InOut'). I also have 3 random levels: spatial coordinates (sData), the volume collected for each sample (in liters), and the percentage of silt in each sample.

So my studyDesign is: studyDesign = data.frame(sample = rownames(silt),vol=vol,silt=silt)

I have no problems when I run the model with sData only:
rL = HmscRandomLevel(sData = xycoord@coords)
m = Hmsc(Y = YData, XData = XData, XFormula = ~InOut*oysters, studyDesign = studyDesign, ranLevels=list("sample"=rL),distr="poisson")
m = sampleMcmc(m, thin = thin, samples = samples, transient = transient, nChains = nChains, verbose = verbose)

But when I add volume and/or silt (either of those), I get an error:
m = Hmsc(Y = YData, XData = XData, XFormula = ~InOut*oysters, studyDesign = studyDesign, ranLevels=list("sample"=rL,"vol"=rL2,"silt"=rL3),distr="poisson")
m = sampleMcmc(m, thin = thin, samples = samples, transient = transient, nChains = nChains, verbose = verbose)

[1] "Computing chain 1"
Error in (Eta[[r]][hM$Pi[, r], , drop = FALSE] * hM$rL[[r]]$x[as.character(hM$dfPi[, :
non-conformable arguments

I wonder if I am not specifying these random levels properly, as I have not found any examples of how to do this. My data looks like this:

head(vol)

vol
HR1 1
HR10 1
HR11 1
HR12 1
HR13 1
HR14 1

head(silt)

silt
HR1 89.334
HR10 56.859
HR11 78.656
HR12 72.245
HR13 13.270
HR14 84.430

rL2 = HmscRandomLevel(xData = vol)
rL3 = HmscRandomLevel(xData = silt)

Any help would be appreciated.
Thanks
Ana

installation in windows 10

I cannot install Hmsc on Windows 10. The message is as follows:

> install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual"))

Downloading GitHub repo hmsc-r/HMSC@master
Installing 52 packages: coda, abind, ape, fields, FNN, ggplot2, MCMCpack, mvtnorm, pdist, pROC, phytools, SparseM, statmod, truncnorm, spam, maps, gtable, lazyeval, reshape2,
scales, tibble, viridisLite, mcmc, quantreg, plyr, animation, clusterGeneration, combinat, 
expm, gtools, mnormt, numDeriv, phangorn, plotrix, scatterplot3d, dotCall64, labeling, 
munsell, RColorBrewer, colorspace, fansi, pillar, pkgconfig, utf8, vctrs, ellipsis, zeallot,
MatrixModels, magick, quadprog, igraph, fastmatch
Installing packages into ‘C:/Users/yuanheng/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
Error: Failed to install 'Hmsc' from GitHub:
  (converted from warning) unable to access index for repository jarioksa.github.io/drat/src/contrib:
  scheme not supported in URL 'jarioksa.github.io/drat/src/contrib/PACKAGES'

I'm in China. I've tried installing with or without VPN. The error message remains the same.
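Not a verified fix, only a guess based on the error text: the failing URL has no scheme, so explicitly registering the drat repository with an https:// prefix before installing might get around it.

# Sketch (untested workaround): add the drat repository with an explicit https:// scheme
options(repos = c(getOption("repos"),
                  drat = "https://jarioksa.github.io/drat/"))
install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual"))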

WAIC on lognormal Poisson models

Hi,

A quick question about calculating WAIC - the computeWAIC() function works fine on my probit models, but returns NaN for my lognormal Poisson models. Is this a bug, or does the WAIC function not work on these models?

Thanks again for developing the HMSC package and all your work improving it and answering questions.

Best,

Ryan

Numerical instability in a very uneven spatial study design

Dear @gtikhonov ,

I am having a similar problem trying to run sampleMcmc on a model with 2 random effects, one being the spatial coordinates and the other a regional categorical variable. Since I have >1000 community samples, the spatial method is set to GPP, using about 117 knots. It is a hurdle model, so I first fit the presence/absence data and then species abundances (individuals ha-1, from 0.001 to ~2000) and relative densities (0-1 values). I log-transformed abundances and relative densities to use distr = "normal".

All my coordinates are in meters, and I have jittered coordinates that were too close to each other to avoid problems related to the precision of the spatial coordinates. However, I still have distances between units that vary from 5 meters to 3e+06 meters. Anyway, I am not sure whether this spatial precision is indeed an issue when using GPP.

Notably, the error only appeared when I increased the thin x samples combination, and only for the abundance part of the model. The error appears by the end of chain 1, sometimes of chain 2:
Error in chol.default(iU) :
the leading minor of order 19 is not positive definite

I ran the tests you suggested for keilast for the computeDataParameters part of the sampleMcmc function. Here are the results:
alpha = 4257548
ag = 101
W is positive-definite and has no NAs
'distance' has no NAs but it is not positive-definite.
's' is a matrix with 2 columns of numbers and no NAs, with upper off-diagonal = 8916294 and lower off-diagonal = 5766795 9000554.

In conclusion, I think the problem is probably the small distance between some of my sites. But there may be an interaction between these distances and the data type (the problem does not arise with presence/absence data, only with abundances).

Any clues? You mentioned that you had more detailed advice on how to handle the whole data. I have spent the entire morning trying to solve this issue, so any help would be very welcome.

Thanks in advance,
Renato

Originally posted by @LimaRAF in #8 (comment)

NAs in Y matrix cause MCMC error: Error in if (any(y < 0))

Hi - I'm trying to fit a hurdle model and so have set all 0s in my Y matrix to NA.

However, when I try to use sampleMcmc() I get the following error:

m1 = sampleMcmc(t21L[[1]], thin = thin, samples = samples, transient = transient,
+ nChains = nChains, verbose = verbose, initPar = "fixed effects",nParallel = 1)

[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
[1] "Computing chain 1"
Hmsc::computeInitialParameter - initializing fixed effects with SSDM estimates

Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") :
missing value where TRUE/FALSE needed

With some testing it looks like this is true even if I introduce a single NA to an otherwise functional Y matrix. But when I put NAs into a simulated example it doesn't give the error. What am I overlooking?

I have the latest github installation (reinstalled today).

Thanks!

Conditional prediction: how to specify the set of species we want to condition on?

Hello,

I'm trying to compute conditional predictions. Let's say I want to predict the distribution of an unobserved species given the presence/absence of another species. To provide a reproducible example, I ran vignette_2 (multivariate low) and sampled the MCMC chains, contained in the object m.

My aim is to compute the distribution of species 2, given the observations of species 1 and some new environmental covariates XDataNew that I create as:

n_new=20
x1_new = rnorm(n_new)
x2_new = rnorm(n_new)
XDataNew = data.frame(x1=x1_new,x2=x2_new)

studyDesign_new = data.frame(sample = as.factor(1:nrow(XDataNew)))
rL_new = HmscRandomLevel(units = studyDesign_new$sample)

This allows me to compute the distribution of species in new sites:

Pred = predict(m, XData= XDataNew,study=studyDesign_new, ranLevels = list(sample=rL_new), expected=T)

Now, I want to provide the observations of species 1, to compute the distribution of species 2 given species 1. The help function says: "Yc : a matrix of the outcomes that are assumed to be known for conditional predictions."

Yc_new = as.matrix(rnorm(n_new))

However, if I run:

Pred = predict(m, XData= XDataNew,study=studyDesign_new, ranLevels = list(sample=rL_new), Yc= Yc_new, expected=T)
An error appears:

Error in predict.Hmsc(m, XData = XDataNew, study = studyDesign_new, ranLevels = list(sample = rL_new),  : 
  hMsc.predict: number of columns in Yc must be equal to ns

The function requires Yc to have the same number of columns as the number of species. I guess that this means the function computes the distribution of each species conditionally on all the other ones. However, this is not my aim, since I just want to compute the distribution of one (or more) species given the occurrences of one (or more) species that I can arbitrarily choose.

Is this possible in the package? How can I deal with it? I have looked in the vignette but conditional prediction is only used as an example to improve the predictive power of the model in CV.

Thanks a lot for your help, and sorry for the long message; I tried to make it as clear as possible.

Giovanni
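Based on the description of Yc in the help (a matrix of outcomes assumed known), one common pattern is to pass a full n x ns matrix in which the conditioning species carry their observed values and every other entry is NA; the sketch below assumes species 1 is observed and the rest are to be predicted (the toy values are illustrative only).

# Sketch: condition only on species 1, leaving all other entries NA
Yc_new <- matrix(NA, nrow = n_new, ncol = m$ns)
colnames(Yc_new) <- colnames(m$Y)
Yc_new[, 1] <- rbinom(n_new, 1, 0.5)      # observed presence/absence of species 1 (toy data)

Pred <- predict(m, XData = XDataNew, studyDesign = studyDesign_new,
                ranLevels = list(sample = rL_new), Yc = Yc_new, expected = TRUE)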

can't fix the number of latent factors

Hi,

I am trying to compare Hmsc models with different numbers of latent factors. I compute the models by fixing the number of latent factors. The performance of the models increases with 0, 1 and 2 latent factors, but when I try to run an Hmsc model with 3 or more latent factors, I don't see any difference from the Hmsc model with only 2 latent factors (I suppose that is the optimal number of latent factors to avoid modeling noise). I compare the models with AUC and Tjur's R2.

Here is my code:

thin = 30
samples = 1000
transient = 10000
verbose=500*thin
nbChains = 2

sample.id=1:761
sample.id = as.factor(sample.id)
studyDesign = data.frame(sample=sample.id)

rL = HmscRandomLevel(units = studyDesign$sample)

rL$nfMax=3
rL$nfMin = 3

model = Hmsc(Y = ydata, XFormula = ~.,
XData = xdata, distr = "probit",
studyDesign = studyDesign, ranLevels = list("sample"=rL))

res_HMSC = sampleMcmc(model, samples=samples,
transient = transient, thin = thin,
verbose = verbose,
nChains = nbChains,
adaptNf = rep(0, model$nr))

Can anybody help me?
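For comparison, the pattern used elsewhere in these issues is to constrain the factor count through setPriors() on the random level and to switch off factor adaptation in sampleMcmc(); a sketch of that variant, reusing the objects defined above:

# Sketch: fix the number of latent factors at exactly 3
rL = HmscRandomLevel(units = studyDesign$sample)
rL = setPriors(rL, nfMin = 3, nfMax = 3)

model = Hmsc(Y = ydata, XFormula = ~., XData = xdata, distr = "probit",
             studyDesign = studyDesign, ranLevels = list("sample" = rL))

res_HMSC = sampleMcmc(model, samples = samples, transient = transient, thin = thin,
                      verbose = verbose, nChains = nbChains,
                      adaptNf = rep(0, model$nr))   # no adaptation, so nf stays at 3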

Sporadic error during fitting or cross-validation in: sampleMcmc -> alignPosterior -> abind

Hi. When fitting several lists of 3-4 models I've been getting the following error, but not every time.

For example, when I fit the models using short chains (e.g. 50 samples with thin = 3) as a test, all three models will fit and cross-validate successfully.

But when I try with more samples (using the MCMC parameters below) I eventually get the error shown below. Sometimes the error comes up during model fitting, other times during cross-validation; sometimes on the first model in the list, sometimes on the second.

The models all have 100 sites, 100 species, 5 continuous environmental covariates, 3 trait covariates (one of which is a 3-level factor), and a phylogeny. Random effects include site (non-spatial), year, and project.

#read in models
redHPC1=readRDS("t02ns.rds")

#make list for cross validation results
crossFit=list("a")

#set fitting parameters
nChains = 2
thin = 5
samples = 1000
transient = 400*thin
verbose = 150*thin

for (i in 1:length(redHPC1)) {
#run the model in parallel
redHPC1[[i]] = sampleMcmc(redHPC1[[i]], thin = thin, samples = samples,
transient = transient, nChains = nChains, nParallel = 2,
verbose = verbose, initPar = "fixed effects")

#save results
saveRDS(redHPC1, file = "t02nsFIT.rds")

#do cross-validation
partition = createPartition(redHPC1[[i]], nfolds = 2, column = "Site")
preds = computePredictedValues(redHPC1[[i]],partition=partition,nParallel = 2)
crossFit[[i]] = evaluateModelFit(hM=redHPC1[[i]], predY=preds)

#save cross-validation results
saveRDS(crossFit, file = "t02nsCROSS.rds")
}

[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
Error in abind(cpL[[j]]$Delta[[r]], array(1, DeltaAddDim), along = 1) :
arg 'X2' has dims=1, 2; but need dims=X, 1
Calls: sampleMcmc -> alignPosterior -> abind

prediction for poisson model

Hi,

I tried running a model using a Poisson distribution (the responses are counts of individuals). However, when I compute predictions from the model I get continuous numbers. Instead of 2 or 3 individuals, the model predicts 0.764534 individuals, for example. Thus, in some sites I end up with a total abundance of 5 individuals spread over 10 species, which is hard to explain. Also, there is no site with a prediction of 0 individuals for any of the species.
Do you know how to interpret the predictions of the model, or better, how to get predictions that are integer numbers?

Thanks,
Aimara
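A hedged explanation rather than a definitive one: predictions made with expected = TRUE return the expected value of the count (a continuous number), whereas expected = FALSE draws from the posterior predictive distribution and therefore returns integer counts, including zeros (m below stands for the fitted model object).

# Sketch: expected counts vs. integer posterior predictive draws
predExpected <- computePredictedValues(m, expected = TRUE)    # continuous expected counts
predDraws    <- computePredictedValues(m, expected = FALSE)   # simulated integer counts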

sampleMcmc() error in chol.default(W)

I am trying to fit a model with 4 random effects: one is spatial coordinates to be specified with the sData argument, and three are hierarchical (Quadrat, Site, and Region). The input for the spatial random effect object (sData) is a matrix with two columns (lat and long) and a row for each observation. The spatial coordinates are in conventional lat and long to four decimal places, as some points are as close as 15 m apart. However, each row has a distinct set of coordinates; I have tested this. The object "Pi" below describes my three hierarchical random effects.

spat <- data.frame(spat = sprintf('spatial_%.2d',1:78))
studyDesign <- cbind(Pi, spat)

rL1 = HmscRandomLevel(units = studyDesign$Quadrat)
rL2 = HmscRandomLevel(units = studyDesign$Site)
rL3 = HmscRandomLevel(units = studyDesign$Region)
rL4 = HmscRandomLevel(sData = coords[,2:3])

m <- Hmsc(Y = Y, XData = X, XFormula = ~eelgrass_lai +  dissox + salinity + ph + sstmean + nitrate + chlomean, studyDesign = studyDesign, ranLevels = list(Quadrat = rL1, Site = rL2, Region = rL3, spat = rL4))

When I use the sampleMcmc() function I get the error:

Error in chol.default(W) : the leading minor of order 2 is not positive definite

I do not see this error when I attempt to fit the exact same model but with no spatial random effect (rL4 not included). I have also tried swapping the order of the rL's when specifying my Hmsc object, but this doesn't change anything.

Variance partitioning plot

Hello,
I was just going through the model outputs.
There is a small component missing in the plot function.
It does not calculate/show the R2 of traits when using the plotVariancePartitioning() function.

Why "tr1" was automatically added when construct Hmsc model?

Hi,

When I constructed the Hmsc object with the following code:
m_JSD <- Hmsc(Y = as.matrix(Y_JSD_f4), XData = XData_JSD, XFormula = XFormula_JSD, studyDesign = studyDesign_JSD, ranLevels = list(SampleID = rL_sample), distr = "lognormal poisson")
Obviously, I didn't include any trait data. But when I run m_JSD$Tr, I get

      tr1
 [1,]   1
 [2,]   1
 [3,]   1
 [4,]   1
 [5,]   1
 [6,]   1
 [7,]   1
 [8,]   1
 [9,]   1

After running sampleMcmc(m_JSD, thin = thin, samples = samples, transient = transient, nChains = nChains), I also get a Gamma matrix, which measures the influence of traits on species niches.

     G[(Intercept) (C1), tr1 (T1)] G[PC1 (C2), tr1 (T1)] G[PC2 (C3), tr1 (T1)] G[PC3 (C4), tr1 (T1)]
 [1,]                     -1.379801           0.115521989            -0.3555763            0.04924043
 [2,]                     -1.384214           0.071302217            -0.4061238           -0.01881592
 [3,]                     -1.350750           0.120859871            -0.3869807            0.09454399
 [4,]                     -1.331357           0.046136793            -0.3523373           -0.00651528
 [5,]                     -1.326145          -0.011303150            -0.4620844           -0.02800752
 [6,]                     -1.365764           0.123376021            -0.4005528            0.02288172

I feel confused about this. Is it a bug?
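A hedged explanation of why "tr1" shows up (not from the documentation, but consistent with the matrices above): when no TrData or TrFormula is supplied, the trait model falls back to an intercept-only design, so Tr becomes a single column of ones and the corresponding Gamma entries measure the community-level mean response to each covariate rather than the effect of any measured trait.

# Sketch: with no trait data, the trait design matrix is just an intercept column
head(m_JSD$Tr)     # single column of ones named "tr1"
ncol(m_JSD$Tr)     # 1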

Computing associations due to environmental covariates

Hello,

I see that Hmsc has a function computeAssociations to investigate the residual correlations, but I would also like to see the associations due to covariates, as these seem to be the shared environmental response. Is there any way to extract this information now?

Thanks,
Marcio
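Not an existing Hmsc function, but one rough way to look at covariate-driven co-variation (a sketch, under the assumption that correlating the fixed-effects part of the linear predictor across species is what is wanted; m stands for the fitted model):

# Sketch: correlations among species in the fixed-effects part of the linear predictor
postBeta <- getPostEstimate(m, parName = "Beta")
LFixed   <- m$X %*% postBeta$mean          # sites x species fixed-effects predictor
envCor   <- cor(LFixed)                    # shared environmental response, species x species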

Grouping Variance partitioning results by site

I have a spatially explicit data set of invertebrates and so I am using a classical site x species matrix.

Is there currently a way to group variance partitioning results by sites rather than species? I know the group feature currently in place allows you to aggregate variables together (i.e. multiple environmental variables), but I'm asking about something entirely different.

For instance, variance partitioning results are currently grouped at the species level and show how species are influenced by environmental and random effects. One can aggregate these scores and get an overall picture of how all species across sites are influenced by environmental and random effects. However, I've seen that some other JSDM packages allow you to group by sites, which gives an idea of how the importance of environmental and random effects varies among sampling locations (i.e. sites).

Thank you!

Hmsc::computePredictedValues - Error in Eta[[r]]

Hi everyone!

Thanks for the package! I get an error when I try to use the function Hmsc::computePredictedValues. The error message is: Error in Eta[[r]][as.character(dfPiNew[, r]), ]: no 'dimnames' attribute for array.

So I dug a bit into your code, and the problem comes from Hmsc:::predict.Hmsc(). At some point it calls the function Hmsc::predictLatentFactor to compute predPostEta[[r]], which is in turn used to compute Eta[[r]]. Hmsc::predictLatentFactor returns a list of arrays of dim 0x5.

I don't know if the issue is caused by my model specification, or by the package itself. So I will show you how I had constructed my model.

So here are my model specs: Y is a 172x92 matrix, X is a 172x92 data frame, and sampling_design is a 172x1 data frame of sample ids.

Here is the heart of the code:

rL <- HmscRandomLevel(units = sampling_design$sample)

model <- Hmsc(Y = as.matrix(Y), XData = as.data.frame(X),
              XFormula = ~ .,
              distr = "lognormal poisson",
              studyDesign = sampling_design,
              ranLevels = list(sample   = rL))

thin      <- 100

samples   <- 1000
transient <- ceiling(0.5 * thin * samples)
nChains   <- 4

hmsc_samp <- sampleMcmc(model,
                   samples = samples,
                   thin = thin,
                   transient = transient,
                   nChains = nChains,
                   nParallel = 1,
                   verbose = 1
)

I hope you find the solution to my problem.

Regards,

Clément.

Unclear need for extra return value

@ovaskain, I have a concern about this change you've implemented. This is about the L value that you changed to be output from the updateZ method and later saved to the posterior. This value is deterministically derived from other model parameters and data. It is also asymptotically the largest element saved to the posterior, and it makes the fitted model almost 2 times larger. Are you sure there is no way to avoid saving this non-essential information?

return(ZL)

error in getPostEstimate

I'm trying to better understand the species associations and attempting to look at the Eta coefficients. However, I get an error with getPostEstimate that I don't understand:

class(shosh.hmsc.rand)
[1] "Hmsc"
Eta <- getPostEstimate(shosh.hmsc.rand,'Eta')
Error in abind(..., along = 0) :
arg 'X1001' has dims=1, 150, 4; but need dims=X, 150, 5

Specification of fixed and random effects in hmsc 3.0.4

Hi,

First of all, thank you very much for developing the Hmsc package in R, and for the very clear papers and vignettes about this very interesting and useful framework.

I'm reporting the following error that popped up while I was estimating a model with a handful of random effects, while the model was working fine without them:

[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
[1] "Computing chain 1"
Error in Eta[[r]][hM$Pi[, r], ] : out of range index

Here is the traceback():

3: computeInitialParameters(hM, initPar)
2: sampleChain(chain)
1: sampleMcmc(m4, thin = thin, samples = samples, transient = transient, 
       nChains = nChains, verbose = verbose)

I would appreciate some help and feedback to figure out what is happening. One part of my struggle comes from deciding which categorical predictors to treat as random or fixed effects, and the other from how to specify these in the models. Reading the issue "Specification of random effects for nested+longitudinal study design #32" did not help. I provide some context below.

I'm intending to use Hmsc to study microbial communities. I'm currently working on a reduced set of 50 species (YData) to get some experience with Hmsc and figure out a script that works fine (typically, I could work with up to 10,000 species). I have environmental (XData) and phylogenetic correlation (CData) data in hand; trait data (TData) will come later.

  • The 108 sampling units have a unique sample.id and can be grouped in the following sets of observations: 3 Replicates X 2 Fractions X 3 Compartments X 6 Sites. Because I expect these sets share a common property, I would be inclined to treat them all as random effects. Am I right? What about the Sample.id?

  • I've defined each rL with HmscRandomLevel(), e.g. for the effect of Site, HmscRandomLevel(units = studyDesign$Site). Then I specified ranLevels = list("Site" = rL0, "Compartment" = rL1, "Fraction" = rL2, "Replicate" = rL3, "Sample.id" = rL4) in my model. Is that right, or am I missing something?

Thanks for reading this post. Feel free to ask more details if needed, I'll be happy to provide them.

Adrien

Hmsc.Rd blocked

Because Hmsc-package.Rd has an alias for Hmsc, when you request ?Hmsc you get the package help page and not the function. Since the function has numerous arguments it's essential to get Hmsc.Rd to load.

Hmsc 0.4.3.2
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 18.10
    .     .     .     .

Error in chol.default(iU) : the leading minor of order 2 is not positive definite

Hello,
I am getting the above error when running sampleMcmc (after "Computing the first chain"). My community is a set of 7 tree species measured by basal area and 13 beetle species whose distributions best fit a lognormal Poisson, sampled from 22 plots. When I replace the data with a presence/absence dataset and distr = "probit", the problem disappears. Any idea how I might get around this problem for the abundance dataset? I was also previously running models on just the beetle dataset without problems.

This is my code setting up the model:

## set model parameters
set.seed(1)
sample.id <- rownames(com)
studyDesign = data.frame(sample = sample.id)
rL = HmscRandomLevel(units = studyDesign$sample)
xData <- plot.char
distr = c(rep("normal",7), rep("lognormal poisson",13))

#set model
m.comNULL <- Hmsc(Y = com, XData = xData, XFormula = ~1,
             distr = distr, studyDesign = studyDesign, ranLevels = list("sample" = rL))

version in ubuntu

I installed Hmsc with the three lines of code given in this repo, but my Hmsc version in R is '0.4.1.0', not '0.4.3.2'. I have Ubuntu 18.04. Is it possible to solve this problem?
Thanks a lot!

mu estimates

Hello,
I have searched through the coda object etc. as best I can, but I can't seem to find the community-level mu estimates for the model. Where am I missing them? Thanks!
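A hedged pointer rather than a definitive answer: the community-level expectation of the species niches is carried by the Gamma parameters (the trait effects on Beta), so with an intercept-only trait model the Gamma estimates play the role of the community-level mean; they can be inspected through the coda object or getPostEstimate() (m below stands for the fitted model object).

# Sketch: community-level (Gamma) estimates
mpost <- convertToCodaObject(m)
summary(mpost$Gamma)
postGamma <- getPostEstimate(m, parName = "Gamma")
postGamma$mean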

XData cannot be a tibble

If XData is a tibble instead of a data.frame, the function fails with

Error in switch(class(XData), list = { : EXPR must be a length 1 vector

The error message comes from the call to switch in switch(class(XData), ...), as class(XData) gives [1] "tbl_df" "tbl" "data.frame", and thus a vector of length 3, if XData is a tibble.

reprex:

library(tibble)

Y = matrix(rpois(100, 2), nrow = 5)

XData = tibble(A = rnorm(20, 2,1),
               B = rnorm(20, 5, 2))

Hmsc(Y = Y, XData = XData, XFormula = ~A+B)

Just wanted to let you know as many people might use tibbles these days and the error message is pretty cryptic.
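Until the class check is made more robust, a simple workaround is to convert the tibble before the call:

# Sketch: coerce the tibble to a plain data.frame before calling Hmsc()
Hmsc(Y = Y, XData = as.data.frame(XData), XFormula = ~A+B)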

plotGradient cuts off original data

Hi,
I recently realized that the "plotGradient" function is cutting off original data. The problem is that the ylim is set to min and max of the predictions, not the original data.
Best
Jörg

(attached screenshots: the plotGradient output with points cut off, and a ggplot version of the same data for comparison)

Vignettes not installed

I installed the package using install_github("hmsc-r/HMSC") on an Ubuntu 16.04 LTS PC and the vignettes were not installed. The output of sessionInfo() is:

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=es_AR.UTF-8       LC_NUMERIC=C               LC_TIME=es_AR.UTF-8        LC_COLLATE=es_AR.UTF-8     LC_MONETARY=es_AR.UTF-8   
 [6] LC_MESSAGES=es_AR.UTF-8    LC_PAPER=es_AR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] devtools_1.13.6 Hmsc_0.4.1.0    snow_0.4-3      truncnorm_1.0-8 tensorA_0.36.1  phytools_0.6-60 ggplot2_3.1.1   pROC_1.14.0    
 [9] pdist_1.2       nnet_7.3-12     mvtnorm_1.0-10  MCMCpack_1.4-4  MASS_7.3-51.1   coda_0.19-2     Matrix_1.2-17   fields_9.6.1   
[17] maps_3.3.0      spam_2.2-2      dotCall64_1.0-0 BayesLogit_0.6  ape_5.3         abind_1.4-5    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1              lattice_0.20-38         assertthat_0.2.1        digest_0.6.18           R6_2.4.0               
 [6] plyr_1.8.4              MatrixModels_0.4-1      pillar_1.3.1            rlang_0.3.4             lazyeval_0.2.2         
[11] rstudioapi_0.7          SparseM_1.77            phangorn_2.5.3          combinat_0.0-8          igraph_1.2.4.1         
[16] munsell_0.5.0           compiler_3.5.3          numDeriv_2016.8-1       pkgconfig_2.0.2         mnormt_1.5-5           
[21] mcmc_0.9-6              tidyselect_0.2.4        tibble_2.1.1            expm_0.999-4            quadprog_1.5-5         
[26] crayon_1.3.4            dplyr_0.7.6             withr_2.1.2             nlme_3.1-139            gtable_0.3.0           
[31] magrittr_1.5            scales_1.0.0            scatterplot3d_0.3-41    bindrcpp_0.2.2          fastmatch_1.1-0        
[36] tools_3.5.3             glue_1.3.0              purrr_0.2.5             plotrix_3.7-5           yaml_2.2.0             
[41] colorspace_1.3-2        memoise_1.1.0           animation_2.6           clusterGeneration_1.3.4 bindr_0.1.1            
[46] quantreg_5.38    

Predicting species abundance

Thanks to the developers for this useful package and scientifically important contribution.
I would like to better understand if HMSC is the right tool for my task.

Briefly, I am trying to create maps of animal species density (individuals/km2) from distribution maps (presence/absence) and density estimates. The covariates will be species traits (body mass, diet, etc.) and environmental variables (NPP, temperature, etc.).
A model such as: Dens ~ trait1 + trait2+ env1^2 + ... etc.
A few more details:

  1. I am only interested in predicting density where a species is present, so the distribution maps are only used as a "mask" to avoid predicting over a global map.
    However I am not sure if the community component offered by jSDM is important because in my case NPP is related to species richness, so by adding NPP as a variable I already capture that aspect. How are the biotic interactions between species captured in this case?

  2. Sample size for single species can be quite low (<10 locations/grid cells) compared to a distribution of +500 grid cells.
    How well does HMSC work with small sample size? And is it a good idea to pool similar species using taxonomy or phylogeny or traits?

Some of these questions are rather complicated to answer but would appreciate any general feedback on the feasibility of HMSC to solve this task.
Thanks in advance.

Slow mixing for Poisson counts

Hello,

We are encountering some difficulties when modelling Poisson counts with the most recent version of the Hmsc package (https://github.com/hmsc-r/HMSC). Our data consist of highly overdispersed counts for 113 species in 2603 sites. The model includes 7 covariates and 3 random effects.

We are running a single MCMC chain, which shows poor mixing and takes a long time (several days) to complete the specified number of iterations (50K including 25K burn-in).

For comparison, we ran the same model using the earlier version of the package (HMSC; https://github.com/guiblanchet/HMSC), and the same data used in Hmsc. The resulting chains show better mixing and run considerably faster (see accompanying figure below). The figure shows output for species 113 only, but the same pattern holds for most other species as well.

We are wondering whether the default settings (e.g., for priors, or for proposal variances) in Hmsc are hindering its performance.

We would appreciate any suggestions on how to improve the mixing of our chain.

Our model specification is:

thin = 25
samples = 1000
nChains = 1
transient = samples*thin

hmsc_mod = Hmsc(Y=Y, X=X, XFormula =XFormula, C=C, distr="poisson", studyDesign=studyDesign, ranLevels=ranlevels)

hmsc_out = sampleMcmc(hmsc_mod, samples=samples, thin=thin, transient=transient, nChains=nChains, nParallel=nChains)

The inputs are:

> str(Y)
int [1:2603, 1:113] 1 1 3 2 1 1 3 3 2 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:113] "y1" "y2" "y3" "y4" ...

> str(X)
num [1:2603, 1:7] -0.559 -0.825 0.213 0.449 1.097 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:7] "x1" "x2" "x3" "x4" ...

> XFormula
~x1 + x2 + x3 + x4 + x5 + x6 + x7

> str(C)
num [1:113, 1:113] 1 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL

> str(studyDesign)
'data.frame': 2603 obs. of 3 variables:
$ re1: Factor w/ 10 levels "A","B","C",..: 6 6 1 1 1 1 1 1 1 1 ...
$ re2: Factor w/ 2123 levels "s1","s2",..: 227 230 39 38 60 48 46 1 19 6 ...
$ re3: Factor w/ 11 levels "year1","year2",..: 10 10 7 7 7 7 7 7 7 7 ...

(attached figure: output_HMSCvsHmsc, trace plots for species 113 comparing the earlier HMSC package and the current Hmsc package)

R CMD check issues

R CMD check reports several issues (ERROR, NOTE and WARNING). I can have a look at some of these, but here are a few that may need the developers' attention:

Both the Install and Check logs give this warning; it comes from 00install.out:

** byte-compile and prepare package for lazy loading
Note: possible error in 'combineParameters(Beta = Beta, ':
argument 11 matches multiple formal arguments 

The error comes from function samplePrior which has this call:

  sample = combineParameters(Beta=Beta,Gamma=Gamma,iV=iV,rho=rho,iSigma=iSigma,
                    Eta=Eta,Lambda=Lambda,Alpha=Alpha,Psi=Psi,Delta=Delta,
                    nc=hM$nc, XScalePar=hM$XScalePar, XInterceptInd=hM$XInterceptInd, nt=hM$nt, TrScalePar=hM$TrScalePar, TrInterceptInd=hM$TrInterceptInd, rhopw=hM$rhopw)

The error comes from item nc=hM$nc. Function combineParameters has three arguments matching nc*: ncNRRR, ncRRR and ncsel. The function is undocumented and I don't know which is intended (but R selects the first of these, ncNRRR).

There are WARNINGs on S3 generic/method consistency. These are easily fixed, but this can be done in different ways. I suggest you think how to do this. The rule is that you define a S3 method, its signature must match the definition of generic. So instead of print.Hmsc(hM) you should have print.Hmsc(x, ...) because this is how generic print is defined (and you also need the three dots even though you ignore the extra arguments). Similarly, predict is defined for arguments object and .... As to setPriors you have more choices, since you define both the generic and the method yourself, but these two should match. You can have new arguments that are not in the generic, but you should have all the arguments of the generic. That is, for print, the first argument should be x, and there also must be ... somewhere (usually at the end).
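For concreteness, a sketch of signatures that would satisfy the consistency check described above (the argument lists are illustrative, not the package's actual definitions):

# Sketch: S3 methods must repeat their generic's signature
print.Hmsc <- function(x, ...) {
  # existing body unchanged; 'x' and '...' are required by the print generic
}
predict.Hmsc <- function(object, ...) {
  # existing body unchanged; 'object' and '...' are required by the predict generic
}
setPriors <- function(object, ...) UseMethod("setPriors")   # generic and its methods must agree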

There are still several undefined variables. These are probably coding errors.

Undefined global functions or variables:
  ns ny supp

There is a huge list of undocumented arguments, code/documentation mismatches and reports of spelling errors in the documentation. These are all trivial to fix, but need janitorial work. I have not started this yet. Some of these can be fixed by roxygenizing the man pages: the changes were made only in the R files, but the man pages were not updated.

dependency on BayesLogit for macOS

This is not an issue per se, but just to let you know that the installation of BayesLogit can be very cumbersome on macOS. This has to do with the need to compile the package manually and with problems with Rcpp on Mac, as far as I know.

I made it work after ~2h following the advice below:

library not found for -lgfortran
macOS 'wchar.h' File Not Found

I saw that @jarioksa tried to fix the problems with BayesLogit so it could be resubmitted to CRAN, but the author seems to be unresponsive. It's a pity because users might be put off by the trouble that ensues.


# excerpt from session_info(): 

version  R version 3.5.3 (2019-03-11)
os       macOS Mojave 10.14          
system   x86_64, darwin15.6.0 1

PS: now that I have managed to install it anyway, I am really looking forward to starting to use it.

How to use XSelect for variable selection

Dear HMSC developers -- First, thank you. I am excited about using your package for my research. Before I start using the many tools available, I would like to be able to use some sort of model selection procedure to help identify which environmental variables are important to retain in the XData matrix.

Based on the language in Hmsc() help, it sounds like the XSelect argument may help, but I do not fully understand what this does. For instance, in the examples you provide what do the q values specify? Is that a penalty for additional parameters?

Any additional references/help would be greatly appreciated. Thanks!

installation error for vignettes

install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual")), produces this error:

✔ checking for file ‘/private/var/folders/4g/4wpmdzbd7tj0cwctcd99wz680000gp/T/RtmpQVtgTN/remotes102f511d44a89/hmsc-r-HMSC-077cb8c/DESCRIPTION’ ...
─ preparing ‘Hmsc’:
✔ checking DESCRIPTION meta-information ...
─ installing the package to build vignettes
E creating vignettes (10.2s)
duplicated vignette title:
‘R packages: Static PDF and HTML vignettes’

--- re-building ‘vignette_1_univariate.pdf.asis’ using asis
--- finished re-building ‘vignette_1_univariate.pdf.asis’

--- re-building ‘vignette_2_multivariate_low.pdf.asis’ using asis
--- finished re-building ‘vignette_2_multivariate_low.pdf.asis’

--- re-building ‘vignette_3_multivariate_high.pdf.asis’ using asis
--- finished re-building ‘vignette_3_multivariate_high.pdf.asis’

--- re-building ‘vignette_4_simulated.pdf.asis’ using asis
--- finished re-building ‘vignette_4_simulated.pdf.asis’

--- re-building ‘vignette_5_birds.pdf.asis’ using asis
--- finished re-building ‘vignette_5_birds.pdf.asis’

Error: Duplicate vignette titles.
Ensure that the %\VignetteIndexEntry lines in the vignette sources
correspond to the vignette titles.
Execution halted
Error in (function (command = NULL, args = character(), error_on_status = TRUE, :
System command error

These install successfully (no vignettes, but they can be downloaded from GitHub anyway):

install_github("hmsc-r/HMSC")
install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual", "--no-build-vignettes"))

Error during conditional cross-validation in a NNGP model

Hi,

I am reporting the following error when trying to run a conditional cross-validation on an NNGP model:

>   m.spatial.predsREAL.sp = computePredictedValues(m.spatial, partition=partition, partition.sp=1:ncol(Y), updater=list(GammaEta=FALSE), nParallel=nChains, expected=F)
[1] "Cross-validation, fold 1 out of 4"
Error in as(x, "CsparseMatrix") : 
  no method or default for coercingNULLtoCsparseMatrix

here is the traceback():

13: stop(gettextf("no method or default for coercing %s to %s", dQuote(thisClass), 
        dQuote(Class)), domain = NA)
12: as(x, "CsparseMatrix")
11: FUN(X[[i]], ...)
10: lapply(lst, as_Csp2)
9: lapply(lapply(lst, as_Csp2), as, "TsparseMatrix")
8: .bdiag(alis)
7: .class1(object)
6: as(.bdiag(alis), "CsparseMatrix")
5: bdiag(lapply(seq_len(nf), function(x) iWg[[alpha[x]]]))
4: updateEta(Y = Yc, Z = Z, Beta = sam$Beta, iSigma = 1/sam$sigma, 
       Eta = Eta, Lambda = sam$Lambda, Alpha = sam$Alpha, rLPar = object$rLPar, 
       X = X, Pi = PiNew, dfPi = dfPiNew, rL = rL)
3: predict.Hmsc(hM1, post = postList, X = XVal, XRRR = XRRRVal, 
       studyDesign = dfPi, Yc = Yc, mcmcStep = mcmcStep, expected = expected)
2: predict(hM1, post = postList, X = XVal, XRRR = XRRRVal, studyDesign = dfPi, 
       Yc = Yc, mcmcStep = mcmcStep, expected = expected)
1: computePredictedValues(m.spatial, partition = partition, partition.sp = 1:ncol(Y), 
       updater = list(GammaEta = FALSE), nParallel = nChains, expected = F)

Digging into the functions, I discovered that the issue appears when the updateEta function is called.

Any ideas?
Thanks in advance for your precious help.
Mirko

Specification of random effects for nested+longitudinal study design

Hi!

I'd appreciate your advice regarding the specification of random effects in Hmsc. My dataset consists of standardized, annually repeated surveys across an evenly spaced grid of sampling plots. The exact location of individuals within each plot is recorded, allowing me to derive species occurrences at finer grain sizes by subdividing the entire plot into a number of equally sized sub-plots. I am mostly interested in the change in species association patterns at different grain sizes, but I'm not quite sure whether my specification of random effects is correct.
I need random effects for three reasons: (1) to make Hmsc estimate pairwise residual correlations, (2) to account for the fact that sub-plots are nested within sampling plots, and (3) to account for the repeated sampling in different years. From my understanding, the specification of study design and random effects at the coarsest level, i.e. for entire sampling plots, should be as follows:

study_design

sampling_plot | year
--------------|--------
1             | 2010
1             | 2011
2             | 2010
2             | 2011
...           | ...

random effects

rnd_eff = list("sampling_plot" = HmscRandomLevel(units = study_design$sampling_plot),   # obtain residual correlations
               "year" = HmscRandomLevel(units = study_design$year))   # account for temporal autocorrelation

However, when I want to look at the sub-plot level, I am not quite sure if the following is correct:

study_design

sampling_plot |  sub_plot  | year
--------------|------------|--------
1             |     1      | 2010
1             |     2      | 2010
1             |     3      | 2010
1             |     4      | 2010
1             |     1      | 2011
1             |     2      | 2011
1             |     3      | 2011
1             |     4      | 2011
2             |     5      | 2010
...           |     ...    | ...

random effects

rnd_eff = list("sampling_plot" = HmscRandomLevel(units = study_design$sampling_plot),   # account for nested sampling,
               "sub_plot" = HmscRandomLevel(units = study_design$sub_plot),   # obtain residual correlations
               "year" = HmscRandomLevel(units = study_design$year))   # account for temporal autocorrelation

Any thoughts on that?
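For completeness, this is how I am currently wiring these pieces into the model constructor (a minimal sketch; the formula and the data objects are placeholders from my own setup):

# sub-plot level model: all three columns of study_design are factors and
# their names match the names of the elements of rnd_eff
study_design <- data.frame(
  sampling_plot = as.factor(study_design$sampling_plot),
  sub_plot      = as.factor(study_design$sub_plot),
  year          = as.factor(study_design$year)
)
m <- Hmsc(Y = Y, XData = XData, XFormula = ~ elevation + habitat,
          studyDesign = study_design, ranLevels = rnd_eff)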

New CRAN release

The current CRAN release fails in R-devel (to become R 4.0.0), and the CRAN maintainers have requested us to release a new version of Hmsc by December 18, 2019. I have fixed the R-devel issues, and from that point of view Hmsc is ready for release. However, some fine tuning is needed, and there is also room for some quick last-minute fixes. Still, I think we should move early next week to have some buffer before the CRAN deadline.

Primary problem: Class of matrix

There were numerous places where the code assumed that an object belongs to only one class, but an object can have an inheritance sequence of several classes. In the future version of R, the class of a matrix object will have such an inheritance and carry two classes, c("matrix", "array"). In general, a class should not be queried using if(class(x) == "foo"), but using if(inherits(x, "foo")). I have changed all class() queries to that idiom, or used the specific query functions is.matrix(), is.data.frame(), is.factor(). Most of the CRAN failures come from constructions switch(class(x), "matrix" = {...}, "data.frame" = {...}), which fail if x is of class c("matrix", "array"). I have made the minimal change: explicitly use only the first class, switch(class(x)[1L], ...). I think most of these cases would be better as if ... else if ... constructs, because there are usually only two alternatives (and no error-catching default or else): switch() is great for multiple choices, but for two choices if ... else if ... is more readable. Specifically, I would like the construct

if (is.matrix(x)) {...}
else if (inherits(x, "list")) {...} # or provide a specific test for a *list of matrices*
else {stop("coding error: contact developers")}

However, switch() is correct, and I have not fixed it because it was not broken.
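To illustrate why the old idiom breaks (the behaviour noted in the comments is what I see under R-devel, i.e. the future R 4.0.0):

m <- matrix(1:4, 2)
class(m)                    # c("matrix", "array")
class(m) == "matrix"        # c(TRUE, FALSE): a length-2 condition breaks if()
inherits(m, "matrix")       # TRUE, and always of length 1
is.matrix(m)                # TRUE, the most direct test
switch(class(m)[1L], matrix = "ok")   # fine; switch(class(m), ...) errors because EXPR must have length 1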

NEWS

I have added a file inst/NEWS.md for user-visible changes in release versions. This should be checked and updated (and approved by you).

Running time of CRAN tests

The first CRAN release had some delays because it was put under manual inspection. While this is normal for new packages, and things may be easier for updated versions, there was one point that was caught in the CRAN tests: running example(sampleMcmc) took nearly 10 sec, and we had to explain that it is needed. We had bad luck and a slow computer was used in the CRAN test farm, where the test took ca. 8 sec; but even on my tiny MacBook Air it runs in 6 sec, which is still above the 5 sec limit for a passing test. So we need to cut down to samples=50 even if that feels bad (with a warning comment) – setting nParallel=2, nChains=2 could also help (and even verbose = FALSE helps a bit).
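Something along these lines should stay well under the limit; this is only a hedged sketch with simulated toy data, not the actual example from the man page:

# simulate a tiny data set so that example(sampleMcmc) finishes in a few seconds
set.seed(1)
Y <- matrix(rbinom(50 * 4, 1, 0.5), 50, 4)
XData <- data.frame(x = rnorm(50))
m <- Hmsc(Y = Y, XData = XData, XFormula = ~ x, distr = "probit")
m <- sampleMcmc(m, samples = 50, thin = 1, transient = 25,
                nChains = 2, nParallel = 2, verbose = FALSE)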

Alternative README for the CRAN release

The current README.md mainly serves the GitHub site. However, it will also be copied to the README in CRAN, where some features are less useful or non-functional (such as download statistics, or instructions for GitHub installs). If we provide an alternative README in the inst/ folder, it will be copied over the current README in the package and in CRAN, and we could have different versions for CRAN and GitHub. That new inst/README.md is minimally the current one stripped of GitHub specifics. I think it might be better to have something similar to what @MelindadeJonge already had, which was largely incorporated in the package help (man/Hmsc-package.Rd).

Version number

The first release version was 3.0-2: we submitted 3.0-0 first, but we had to increase the patch level at each resubmission. The current GitHub version is 3.0-4, and I think we should aim at releasing at that number, and the documentation should reflect this. Personally, I don't see much reason to change a version number without a release (releases have meaningful numbers and they are tagged in git, but for other changes the version numbers have little use), but I don't see a large problem in having discontinuous numbering in CRAN either – as long as we make it clear to users that there are no releases between those numbers.

More detailed documentation/examples for Reduced Rank Regression?

Hmsc() has arguments XRRRData, XRRRFormula, XRRR, ncRRR, and XRRRScale for reduced rank regression, but this is not described in the details section of the Hmsc() documentation, nor in any of the vignettes. Is this described in detail anywhere?
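For reference, this is my current guess at how the arguments fit together, based only on their names (the data objects and formulas are placeholders, and I do not know whether this is the intended usage):

# ordinary fixed effects come from XData/XFormula, while the covariates in
# XRRRData/XRRRFormula are (I assume) compressed into ncRRR latent combinations
m <- Hmsc(Y = Y, XData = XData, XFormula = ~ habitat,
          XRRRData = climate, XRRRFormula = ~ bio1 + bio12 - 1,
          ncRRR = 2, distr = "probit")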

Unable to set non-zero-centered Gamma priors plus a few minor fixes

I have been experimenting with the Hmsc package for about a month now, and I really appreciate the package. Thank you for all the hard work you've put into it.

I have one issue that feels important and three that seem very minor. I hope it's okay that I am presenting them all in a single issue (let me know if I should have divided them up). I am attaching an MWE that illustrates all of them.

(1) Setting a prior on gamma that is not zero-centered.

Following up on a post from Issue #10, I tried to use priors to force Hmsc to treat a column of the X matrix as an offset, and I think Hmsc is implementing it incorrectly (or I am implementing Hmsc incorrectly).

If the offset (i.e. log(effort)) is column K of the X matrix, then my goal is to have Beta_{jK} ~ N(1,0) for all species j=1,...,J. To make this happen in the Hmsc framework, we require a prior on Gamma_{1K} ~ N(1,0), and we also need V[K,K] to be nearly zero.

However, my output (traceplot) behaves as though I have (a) set a Gamma_{1K} ~ N(0,0) prior and (b) set the initial MCMC value of Gamma_{1K} to 1. In other words, my effort to set mGamma=1 is being interpreted as changing the initial value instead of the prior mean.

In my attached toy example, the Trait matrix is just an intercept column, and K is the last column of X, so the code that makes sense to me (for a model setup called 'm.setup') is:

m.setup <- setPriors(m.setup,
                     V0 = diag(c(rep(1, K-1), 1e-8)),
                     f0 = K+1,
                     mGamma = c(rep(0, K-1), 1),
                     UGamma = diag(c(rep(1, K-1), 1e-8))
)

(2) Labels for Eta.

Assume I have a random effect with R = 5 levels that I defined using HmscRandomLevel() and named 'rL'. The column names for Eta appear to be set using the first R values of rL$pi -- rL$pi[1:R] -- instead of using the random effect levels -- levels(rL$pi)[1:R].

So, for example, if the values of my random effect factor are A, A, B, B, C, C, D, D, E, E, the model fit object has labels A, A, B, B, C instead of A, B, C, D, E.
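A quick illustration with made-up levels:

pi <- factor(c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"))
R  <- nlevels(pi)       # 5
pi[1:R]                 # A A B B C  -- what the Eta column names currently reflect
levels(pi)[1:R]         # "A" "B" "C" "D" "E"  -- what I would expect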

(3) Labels for Lambda

When I extract a posterior sample of the Lambda matrix for a random effect, the variable names correspond to building a matrix with byrow=TRUE, but the values in the vector correspond to building a matrix with byrow=FALSE. This is easier to explain in the attached MWE than verbally.

(4) Error for adaptNf > transient

I should know better than to accidentally set adaptNf > transient when defining my sampleMcmc() call, but that hasn't prevented me from doing so. It would be nice to add an error (or at least a warning) to highlight this mistake for the user.
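Something as simple as the following check near the start of sampleMcmc() would have saved me; this is only a sketch of the idea, not actual package code:

# hypothetical guard: adaptNf is the per-random-level vector, transient the scalar
if (any(adaptNf > transient))
  stop("adaptNf must not be greater than transient")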
MWE non-zero priors.txt
