Comments (6)
The runtimes you have observed look somewhat too large. Given that your number of species is small, I would expect that the spatial component is not performing well enough. Based on our internal performance comparisons with various data sizes, I would dare to say that your runs are approximately 10x slower than most equivalent tasks on my laptop, which is no HPC at all.
There are two potential reasons that I have in mind right now:
- If you are running chains in parallel (which you do), some R distribution-OS combinations are known to get convoluted due to the interplay between cross-chain parallelization (intended) and the default within-chain parallelization of the called linear algebra routines (not very much intended). Can you check your short runs with nChains=1 or with nParallel=1 and report whether they differ significantly in terms of sec/iteration or not?
- The NNGP approximation algorithm is sensitive to the order of spatial units. E.g. if you rename your spatial units so that their order is perturbed, the approximation will be different. The rule of thumb is that the order should be such that no neighbours are far apart in the order. If the order is random, then technically NNGP can be as slow as a full covariance GP. HMSC does not handle this aspect automatically. Could you please check whether your spatial units are ordered (in terms of factor/string values) in a somewhat reasonable way, e.g. along the longest axis of your study area?
The nParallel in predictions can be different from its value in the sampling phase.
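To compare the two settings on equal terms, it can help to time each short run and divide by the total number of MCMC iterations. A minimal base-R sketch, where secPerIteration and dummyRun are hypothetical helpers (not part of Hmsc) and the dummy workload only stands in for the actual sampling call:

```r
# Hypothetical helper (not part of Hmsc): wall-clock seconds per iteration
# for a function that performs nIter iterations in total.
secPerIteration = function(runFun, nIter) {
  elapsed = system.time(runFun())[["elapsed"]]
  elapsed / nIter
}

# Stand-in workload: nIter small linear solves, used here only so that
# the sketch runs without a fitted model.
dummyRun = function(nIter) {
  A = crossprod(matrix(rnorm(200 * 200), 200)) + diag(200)
  for (i in seq_len(nIter)) solve(A, rnorm(200))
  invisible(NULL)
}

nIter = 50
timing = secPerIteration(function() dummyRun(nIter), nIter)
print(timing)

# With an Hmsc model object m, the same helper could wrap the real call, e.g.
# secPerIteration(function() sampleMcmc(m, samples = 250, transient = 50,
#   thin = 1, nChains = 1, nParallel = 1), nIter = 300)
```

Running the wrapped call once per setting and comparing the two numbers gives the sec/iteration figures asked for above.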
from hmsc.
@MartinStjernman you need to rename the rows of sData so that their lexicographic order matches the desired one. Personally, I typically add a numerical prefix, like 0001_first_site_original_name, 0002_second_site_original_name. You would also need to update the corresponding column of studyDesign accordingly.
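The renaming step could look roughly like the sketch below. The site names, coordinates and the choice of ordering along x are made up for illustration; only the zero-padded prefix idea comes from the comment above.

```r
# Toy data: four sites with coordinates; names and values are made up.
sData = data.frame(x = c(3.0, 0.5, 2.0, 1.0), y = c(0.2, 0.1, 0.4, 0.3),
                   row.names = c("north", "alpha", "zeta", "beta"))
studyDesign = data.frame(
  site = factor(c("alpha", "north", "beta", "zeta", "alpha")))

# Desired position of each site; here simply the rank along x
# (an assumption for this toy example).
pos = rank(sData$x, ties.method = "first")

# Zero-padded prefix so that lexicographic order equals numeric order
# ("0010_..." sorts after "0002_...", unlike "10_..." and "2_...").
oldNames = rownames(sData)
newNames = sprintf("%04d_%s", pos, oldNames)
names(newNames) = oldNames

# Apply the same mapping to sData and to the studyDesign column
rownames(sData) = newNames
studyDesign$site = factor(unname(newNames[as.character(studyDesign$site)]))

print(sort(rownames(sData)))
print(levels(studyDesign$site))
```

Keeping the original name after the prefix makes it easy to map results back to the original sites later.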
I am quite sceptical whether TSP is best suited for this problem. First of all, you do not need to return to the origin in the NNGP scenario. Next, it is not the total distance that we are worried about, but that neighbours are not too far apart in the resulting order. My guess is that you can simply order along lon/lat in many cases. Preferably, you should project onto the leading eigenvector (first principal component) of your sites' coordinates.
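# Simulate 100 sites in a 2 x 1 rectangle
N = 100
X = cbind(2*runif(N), runif(N))
plot(X[,1], X[,2])
# Project the coordinates onto the first principal component
pc = prcomp(X)
proj = X %*% pc$rotation[,1]
# Position of each site in the order along the leading axis
optOrder = rank(proj)
# Label each site with its position in the order
plot(X[,1], X[,2], type="n")
text(X[,1], X[,2], optOrder)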
Of course, there are exceptions - if you are studying some coastal communities, then the best way would be to order along the coast.
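One rough way to check whether a given order keeps neighbours close is to look at how far apart, in order position, each site is from its spatially nearest site. The sketch below compares the principal-component order with a random one on simulated coordinates; meanGap is a hypothetical diagnostic made up for illustration, not something Hmsc computes.

```r
set.seed(1)
N = 100
X = cbind(2 * runif(N), runif(N))

# Index of the spatially nearest other site for every site
D = as.matrix(dist(X))
diag(D) = Inf
nearest = apply(D, 1, which.min)

# Mean gap in order position between each site and its nearest neighbour
# (hypothetical diagnostic, not something Hmsc computes)
meanGap = function(pos) mean(abs(pos - pos[nearest]))

pcPos = rank(prcomp(X)$x[, 1])   # position along the first principal component
randPos = sample(N)              # random order for comparison

print(c(pc = meanGap(pcPos), random = meanGap(randPos)))
```

A much smaller gap for the principal-component order than for the random one suggests that spatial neighbours also stay close in the order.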
Thanks heaps!!!
After spatially sorting the observations by nearest neighbour, the gain in processing speed is huge! Now with thin = 500 I can finish within 5 hours!! Thanks for keeping my PhD alive :D
But this only works on my personal Mac Studio. When I tried to deploy it on an HPC with a Linux system, spatial sorting did not seem to help. Would you happen to have any ideas? I might need to increase the number of sampling units and species later on, at which point my personal Mac may become a bottleneck.
Using nParallel = 1, nChains = 1, samples = 250, thin = 1 and transient = 50:
When I use my own Mac Studio, the running time is
[1] "MODEL START: Mon May 27 23:45:53 2024"
[1] "MODEL END: Mon May 27 23:45:58 2024"
5 seconds,
but the same thing on the HPC gives
[1] "MODEL START: Mon May 27 23:42:39 2024"
[1] "MODEL END: Mon May 27 23:48:22 2024"
nearly 6 minutes.
Below is the row order of the observations in my data:
Hi,
If I may chime in on this, as I also have problems with long running times and am looking for anything that can speed things up, I wonder about the sorting of observations suggested as one solution.
- What exactly is sorted: the XData/studyDesign objects, the object provided as sData when constructing the random level object, or both?
- It seems that the improvement achieved by LamuelCH when sorting by nearest neighbour used a Travelling Salesman Problem (TSP) "algorithm", and I wonder a) whether this is a good method to satisfy the requirements of the NNGP algorithm and b) what package/function was used to get the ordering according to TSP?
Any help is highly appreciated!
Thanks!
Thanks a lot Gleb!
I take it the reason I need to have names of my sites (i.e. rownames in sData) such that their lexicographic order matches the desired one is that sData is sorted "under the hood" when constructing the random level object using HmscRandomLevel() (i.e. the step: rL$pi = as.factor(sort(rownames(sData)))).
I will try this out although I think that my sites are already quite well sorted (site names are "sort of" coordinates).
I have, if I may, one additional question. My sites are aggregated in small clusters (cluster is also included as a non-spatial/unstructured random effect) and I have adjusted the alphapw prior for the site random effect to the scale of sites within clusters. With such a "local" prior, is it still of benefit (for speed) to spatially sort the clusters, or is it enough for the sites to be spatially sorted within clusters?
Thanks again for the excellent package and help!