sjevelazco / flexsdm
Useful tools for constructing species distribution models
Home Page: https://sjevelazco.github.io/flexsdm/
Hi everyone,
Is there a way to provide the rlayer argument to sample_background() when the data partition method is one of the conventional data partitioning methods (part_random) or environmental and spatial cross-validation (part_senv)? Those methods don't return a raster grid that can be passed into sample_background() as in the documentation.
I'm using flexsdm v1.3.2
Calculating extra_eval for many points takes a long time. When extrapolation for both the training environment and a projection environment is wanted, the function has to be run twice. I assume most of the running time of extra_eval comes from calculating the distances. Allowing for multiple environments in the function can speed up a total workflow considerably.
Hi there,
I am trying to use flexsdm to model my species of interest. I noticed there is no esm_raf function available.
Please correct me if I am wrong and point me to the relevant function.
Also, the fit_max and esm_max functions return results like the rest of the models, but the plot output of maxent shows just one number (e.g. 0.6345) instead of a predictive index over a range such as 1 - 100, like the rest of the models.
Cheers
Hi Santiago,
I found an error in the occfilt_geo() function. The function only works when the coordinate columns are named "x" and "y". If we run the function example, it works perfectly:
# Environmental variables
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)
plot(somevar)
# Species occurrences
data("spp")
spp
spp1 <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)
somevar[[1]] %>% plot()
points(spp1 %>% select(x, y))
#Change column names
spp2 <- spp1 %>% dplyr::rename(longitude = x, latitude = y)
# Using Moran method
filtered_1 <- occfilt_geo(
data = spp,
x = "x",
y = "y",
env_layer = somevar,
method = c("moran"),
prj = crs(somevar)
)
However, when I rename the columns with the coordinates to "longitude" and "latitude," I get the following error:
#Change column names
spp2 <- spp1 %>% dplyr::rename(longitude = x, latitude = y)
# Using Moran method
filtered_1 <- occfilt_geo(
data = spp2,
x = "longitude",
y = "latitude",
env_layer = somevar,
method = c("moran"),
prj = crs(somevar)
)
Error in `[.data.frame`(da, c(x, y)) : undefined columns selected
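Until the function accepts arbitrary column names, one possible workaround is to rename the coordinate columns to "x" and "y" before calling occfilt_geo() and restore the original names afterwards. A minimal base R sketch on a toy data frame (the values and the helper name rename_cols are made up for illustration; no flexsdm call is made here):

```r
# Hypothetical helper: rename columns of a data frame (old names -> new names).
rename_cols <- function(df, old, new) {
  stopifnot(length(old) == length(new), all(old %in% names(df)))
  names(df)[match(old, names(df))] <- new
  df
}

# Toy occurrence table with non-default coordinate names (made-up values).
occ <- data.frame(
  species = "sp1",
  longitude = c(-58.1, -57.9, -58.4),
  latitude = c(-27.3, -27.1, -26.8),
  pr_ab = 1
)

# Rename to the names the function currently expects ...
occ_xy <- rename_cols(occ, c("longitude", "latitude"), c("x", "y"))
# ... call occfilt_geo(data = occ_xy, x = "x", y = "y", ...) at this point ...
# ... and restore the original names on the result.
occ_back <- rename_cols(occ_xy, c("x", "y"), c("longitude", "latitude"))

names(occ_xy)
names(occ_back)
```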
terra provides the same functions as raster, but they are faster.
Interesting functions:
predict (compare speed between raster::predict and terra::predict)
crop
mask
Big problem: terra and raster treat factor layers differently.
List of functions used with raster that can be converted to terra
importFrom(raster,as.data.frame)
importFrom(raster,bind)
importFrom(raster,brick)
importFrom(raster,buffer)
importFrom(raster,cellFromXY)
importFrom(raster,cellStats)
importFrom(raster,coordinates)
importFrom(raster,crs)
importFrom(raster,extend)
importFrom(raster,extent)
importFrom(raster,extract)
importFrom(raster,mask)
importFrom(raster,match)
importFrom(raster,ncell)
importFrom(raster,nlayers)
importFrom(raster,plot)
importFrom(raster,predict)
importFrom(raster,projection)
importFrom(raster,rasterToPolygons)
importFrom(raster,res)
importFrom(raster,stack)
importFrom(raster,values)
importFrom(raster,writeRaster)
importFrom(raster,xyFromCell)
Issue in rev_jack
The function breaks when length(v) > 2:
if (v < median(x))
Possible solution using a for loop
if (any(z > t1)) {
  f <- which(z > t1)
  v <- x[f]
  for (vt in v) {
    if (vt < median(x)) {
      xa <- (v2 <= vt) * 1
      out <- out + xa
    }
    if (vt > median(x)) {
      xb <- (v2 >= vt) * 1
      out <- out + xb
    }
  }
} else {
  out <- out
}
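A self-contained sketch of the loop-based fix with made-up inputs (x, v, v2 and out below are stand-ins for the objects inside rev_jack, not the package's actual data). It shows why looping over each flagged value avoids passing a vector to if():

```r
# Made-up inputs standing in for the objects used inside rev_jack.
x <- c(1, 2, 3, 4, 5, 6, 7)  # values whose median separates low and high side
v <- c(2, 6)                 # several flagged values (length > 1)
v2 <- c(1, 2, 3, 5, 6, 7)    # values to be scored
out <- rep(0, length(v2))

# `if (v < median(x))` needs a single TRUE/FALSE. With length(v) > 1 it only
# warned in older R versions and is an error since R 4.2.0.
# Testing one flagged value at a time avoids the problem.
for (vt in v) {
  if (vt < median(x)) {
    out <- out + (v2 <= vt) * 1
  }
  if (vt > median(x)) {
    out <- out + (v2 >= vt) * 1
  }
}
out
```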
Hello, I am predicting the habitat of a fish using flexsdm with the variables sea surface temperature and chlorophyll. It works pretty neatly, except for the last step. I try to use the following method:
eglm <- esm_glm(
data = sdm_extract,
response = 'presence',
predictors = c('sst', 'chlor_a'),
partition = '.part',
thr = 'max_sens_spec'
)
With this I get the following error:
Error in data_ens[[2]] (fishi.R#134): subscript out of bounds
However, this works flawlessly:
mglm <- fit_glm(
data = sdm_extract,
response = 'presence',
predictors = c('sst', 'chlor_a'),
partition = '.part',
thr = 'max_sens_spec'
)
I just followed the example with my own data... Complete script is here: https://hastebin.com/share/lidadasejo.ruby
Why is there something out of bounds for the esm-methods, but not for the fit-functions?
For some functions, such as calib_area, sample_pseudoabs, and sample_background, among others, it would be useful to have the option of transforming data.frame occurrence data to a given CRS. I think that Brook can help with it :)
Can we calculate environmental variable importance and select some? I didn't find it in this package.
Search for R packages that perform all or some SDM modeling steps (biomod2, ENMTML, sdm, ZOON, ModEco, kuenm, Model-R, BiodiversityR, tuneSDM, blockCV) in order to offer other methods or improve existing ones. It is also important to highlight in the manuscript (as a table or text in the supporting information) the features of our package that are unique with respect to the others. It might also be interesting to create a set of functions that help report the most important modeling procedures.
Hi,
I tested the occfilt_env function and found out that it fails when environmental variables are rare. Within the function zero values are then produced which cause an error message:
Error in seq.default(ext1[1], ext1[2], by = res[i]) :
invalid '(to - from)/by'
As far as I can see, the error occurs in this part of the function code:
n <- ncol(env_layer)
res <- res <- apply(env_layer, 2, function(x) diff(range(x))) / nbins
for (i in 1:n) {
ext1 <- range(env_layer[, i])
ext1[2] <- ext1[2] + 0.05
ext1 <- c(ext1[1] - 0.000001, ext1[2] + 0.000001)
classes[[i]] <- seq(ext1[1], ext1[2], by = res[i])
classes[[i]][length(classes[[i]])] <- ext1[2]
}
I realized that one of my environmental variables produces a zero in res[i], so the seq function does not work.
Is there any solution for this problem?
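One possible guard, sketched in base R: compute the bin width first and fall back to a single bin when a variable has zero range, so seq() is never called with by = 0. The helper name make_breaks and the toy data are hypothetical, and this is not the package's code:

```r
# Hypothetical helper: class breaks for one variable, tolerating zero range.
make_breaks <- function(values, nbins) {
  rng <- range(values, na.rm = TRUE)
  step <- diff(rng) / nbins
  lower <- rng[1] - 1e-6
  upper <- rng[2] + 1e-6
  if (step == 0) {
    # Constant variable: one bin that contains the only value.
    return(c(lower, upper))
  }
  breaks <- seq(lower, upper, by = step)
  # Stretch the last break so the maximum value falls inside the last bin.
  breaks[length(breaks)] <- upper
  breaks
}

# Toy environmental table: one varying and one constant variable.
env <- data.frame(temp = c(10, 12, 15, 20), flat = c(3, 3, 3, 3))
classes <- lapply(env, make_breaks, nbins = 4)
classes
```

Whether a constant variable should be binned at all or dropped before filtering is a separate decision; the sketch only removes the error.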
On the website you state that the function correct_colinvar() with the pearson method returns 3 objects, the first one being a SpatRaster, but I cannot retrieve this first object, only 2 objects.
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)
var <- correct_colinvar(env_layer = somevar, method = c("pearson", th = "0.7"))
When fitting models, I generally like to compare predictive power (evaluation metrics over the testing datasets) against explanatory power (evaluation metrics over the training datasets). This is also an indicator of overfitting. I don't see any function in the package that allows calculating explanatory power, just predictive power in the model objects. Maybe I am missing something. If not, I would like to see this option in the package.
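To make the request concrete, here is a minimal base R sketch of the comparison on simulated data. It uses a plain glm and a rank-based AUC, not flexsdm functions, and all variable names and values are made up:

```r
# AUC from the rank-sum (Mann-Whitney) statistic; ties get average ranks.
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n <- 200
d <- data.frame(v1 = rnorm(n), v2 = rnorm(n))
d$pr_ab <- rbinom(n, 1, plogis(1.5 * d$v1 - d$v2))
d$.part <- sample(rep(1:5, length.out = n)) # 5 random folds

# For each fold, fit on the training rows and evaluate on both subsets.
res <- do.call(rbind, lapply(1:5, function(k) {
  train <- d[d$.part != k, ]
  test <- d[d$.part == k, ]
  m <- glm(pr_ab ~ v1 + v2, family = binomial, data = train)
  data.frame(
    fold = k,
    auc_train = auc(train$pr_ab, predict(m, train, type = "response")),
    auc_test = auc(test$pr_ab, predict(m, test, type = "response"))
  )
}))
res
colMeans(res[, c("auc_train", "auc_test")])
```

A training value that sits well above the testing value is the kind of overfitting signal described above.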
I'm trying to download the package without success (both on Windows and Linux).
Warning in install.packages :
package ‘flexsdm’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
I'm using R version 4.2.2, but I encountered the same problem on the previous version 4.2.0.
At the moment, the mean and sd of each performance metric are returned based on its values across cross-validation folds. It would be useful for meta-analysis to be able to extract the performance metrics for each individual fold.
For the tune_gbm function this would mean filtering eval.partial for the best fit and adding it to the results of the function when the user specifies an option to hold on to this info, similar to keep.fold.fit in the gbm.step function of the dismo package.
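A rough sketch of the requested behaviour in base R. The helper summarise_folds, its keep_folds argument and the element names of the returned list are hypothetical, and the metric values are made up:

```r
# Hypothetical helper: summarise fold-level metrics, optionally keeping them.
summarise_folds <- function(perf, keep_folds = FALSE) {
  metrics <- setdiff(names(perf), "fold")
  summary <- data.frame(
    metric = metrics,
    mean = unname(vapply(perf[metrics], mean, numeric(1))),
    sd = unname(vapply(perf[metrics], sd, numeric(1)))
  )
  if (keep_folds) {
    list(performance = summary, performance_part = perf)
  } else {
    summary
  }
}

# Made-up fold-level values for two metrics.
perf <- data.frame(
  fold = 1:3,
  AUC = c(0.80, 0.85, 0.90),
  TSS = c(0.50, 0.60, 0.70)
)
out <- summarise_folds(perf, keep_folds = TRUE)
out$performance
out$performance_part
```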
correct_colinvar(): correction of collinearity based on points and not only on rasters.
When using tune_max with presence-only data and background, the evaluation step fails since both the number of absences and presences have to be greater than zero in line 152 of sdm_eval: if (na == 0 | np == 0).
Is there a way to use tune_max with presence-only data?
Thank you so much for this wonderful package.
I would like to include a bias grid in a modelling workflow as suggested by Elith et al. 2011.
Is there functionality to explicitly include a bias grid that indicates the biases in the survey data for all algorithms?
There is an example in the MaxEnt tutorial, but I cannot understand how to implement this more generally in flexsdm.
If it is possible, I would like to ask if it could be added as a vignette showing how.
Many thanks,
Darren
Elith et al. 2011. A statistical explanation of MaxEnt for ecologists. https://doi.org/10.1111/j.1472-4642.2010.00725.x
I know this issue can be controlled to a certain extent by
i) choosing an appropriate resolution (extent and cell size),
ii) choosing appropriate pseudo-absences,
iii) environmental filtering of observation data.
Hello Santiago Velazco,
Firstly, I would like to congratulate you on the package. Besides being very useful, it is also very didactic. I always recommend the package's website to those who are starting to study niche modeling.
My question is about the projection of PCA in various scenarios. To demonstrate my point, I ran the correct_colinvar function with temperature variables from WorldClim cropped to the state of Paraná (Brazil). The coefficients on the first axis show that temperature variables (Bio11, Bio06, Bio01) have negative values.
  variable      PC1
1 Bio01    -0.452
2 Bio02    -0.100
3 Bio03    -0.177
4 Bio04     0.135
5 Bio05    -0.412
6 Bio06    -0.415
7 Bio07     0.00865
8 Bio10    -0.439
9 Bio11    -0.449
As a result, we can expect warmer locations to have lower score values on the first axis. When we plot the map, we can see that the warmer regions in Paraná have lower ("more negative") values:
So, if we project this PCA to a warmer future, the expected scores should be lower. However, the function returns a scenario with higher scores:
This happens, I believe, because you use the function terra::scale to scale the variables from the new scenario before predicting.
When we predict the PCA without scaling the variables, we get a map that appears to make more sense, with lower predicted scores in the warmer scenario.
I also tested the PCA prediction with the kuenm_rpca function from the kuenm package, and the results are identical to those obtained when the variables are not scaled before predicting:
So I was wondering if it is actually required to scale the variables of the new scenario before predicting, because the predictions without scale appear to make more sense to me.
If you want to reproduce the results, I've attached the files and code I used.
PCA_flexsdm.zip
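The general point can be reproduced with base R's prcomp on simulated data. This sketch does not use flexsdm or terra, and the variables and the uniform +3 shift are made up. It compares (a) standardising the new scenario with the training mean and sd against (b) standardising it with its own mean and sd:

```r
set.seed(42)
# Made-up "current" climate: three correlated temperature-like variables.
temp <- rnorm(500, mean = 20, sd = 3)
current <- data.frame(
  bio01 = temp + rnorm(500, sd = 0.5),
  bio05 = temp + 8 + rnorm(500, sd = 0.5),
  bio06 = temp - 8 + rnorm(500, sd = 0.5)
)
# Made-up "future" climate: the same cells, uniformly 3 degrees warmer.
future <- current + 3

pca <- prcomp(current, center = TRUE, scale. = TRUE)

# (a) Standardise the future data with the TRAINING mean and sd, then rotate.
#     This is what predict() does for a prcomp object.
fut_train_scaled <- scale(future, center = pca$center, scale = pca$scale)
pc1_a <- (fut_train_scaled %*% pca$rotation)[, 1]

# (b) Standardise the future data with ITS OWN mean and sd before rotating.
fut_self_scaled <- scale(future)
pc1_b <- (fut_self_scaled %*% pca$rotation)[, 1]

cur_pc1 <- pca$x[, 1]
c(current = mean(cur_pc1), future_a = mean(pc1_a), future_b = mean(pc1_b))
```

In this toy case (b) returns the current scores again, because re-centering removes the uniform shift, while (a) moves the scores along the axis. The sign of the shift depends on the sign of the loadings, which is arbitrary in PCA.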
Hi @sjevelazco !
I am exploring some R packages to build species distribution models, and I have been trying flexsdm over the last few days. I find the package very useful, but I am encountering an issue that I am not able to solve, relating to an out-of-memory error when using the function sdm_predict. This is the code:
# projecting
#For memory issue
mem.maxVSize(vsize = Inf)
mem.maxNSize(nsize = Inf)
Sys.setenv('R_MAX_VSIZE' = 120000000000000000)
Sys.setenv('R_MAX_NSIZE' = 120000000000000000)
Sys.setenv('R_MAX_MEM_SIZE' = Inf)
ensemble_map_1 <- sdm_predict(
models = ens_m,
pred = EnvVars,
thr = "max_sens_spec",
nchunk = 50,
con_thr = FALSE,
predict_area = NULL
)
and this is the error:
Predicting ensembles
Error: Cannot allocate a vector of size 22.1 Gb
Error: no more error handler available (recursive errors?); invoke reboot abort
This is the sessionInfo:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8 LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C LC_TIME=Italian_Italy.utf8
time zone: Europe/Rome
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.4 terra_1.7-65 flexsdm_1.3.4
loaded via a namespace (and not attached):
[1] randomForest_4.7-1.1 Matrix_1.6-1.1 gtable_0.3.4 compiler_4.3.2 maps_3.4.2 Rcpp_1.0.11 tidyselect_1.2.0 parallel_4.3.2
[9] splines_4.3.2 scales_1.3.0 lattice_0.22-5 ggplot2_3.4.4 R6_2.5.1 generics_0.1.3 patchwork_1.2.0 gbm_2.1.8.1
[17] knitr_1.45 kernlab_0.9-32 iterators_1.0.14 dotCall64_1.1-1 tibble_3.2.1 munsell_0.5.0 nnet_7.3-19 pillar_1.9.0
[25] rlang_1.1.2 sp_2.1-2 utf8_1.2.4 xfun_0.41 maxnet_0.1.4 doParallel_1.0.17 viridisLite_0.4.2 cli_3.6.2
[33] spThin_0.2.0 magrittr_2.0.3 mgcv_1.9-0 foreach_1.5.2 grid_4.3.2 rstudioapi_0.15.0 spam_2.10-0 lifecycle_1.0.4
[41] nlme_3.1-163 fields_15.2 vctrs_0.6.5 glue_1.6.2 raster_3.6-26 codetools_0.2-19 survival_3.5-7 fansi_1.0.6
[49] colorspace_2.1-0 tools_4.3.2 pkgconfig_2.0.3
The error is not only related to the ensemble, but to all the single models as well. As you can see, I already tried changing the memory settings, and I ran the code on different workstations, but even using the one with 128 GB of RAM didn't solve the issue. I am wondering if there are some other options that I could try to solve these errors.
Many thanks!
Hello everyone,
Is there any way to reduce the memory allocation of flexsdm? It's simply impossible for me to run some functions, like correct_colinvar() and sdm_predict(), since they eat up all the memory of my notebook (i7-7500, 16 GB RAM, Ubuntu 22). I'm using 13 cropped layers of 5400 x 6000 cells at ~1 km resolution. My notebook is not the best for the task, but I feel this memory issue can make some people simply drop the package, mostly students and people who cannot afford better setups.
For example, the raster and terra packages create temp files in a temp dir and allow the user to set the memory fraction manually with rasterOptions(memfrac = value) or terraOptions(memfrac = value). They can also process files in chunks if needed.
Sincerely,
Pedro Bittencourt
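The chunking idea mentioned above, reduced to a base R sketch on a plain data frame. This only illustrates the principle and is not how sdm_predict or terra handle it internally. The helper predict_chunked and the simulated data are hypothetical:

```r
set.seed(1)
# Made-up training data and a larger table of cells to predict on.
train <- data.frame(sst = rnorm(100), chl = rnorm(100))
train$pr_ab <- rbinom(100, 1, plogis(train$sst - train$chl))
cells <- data.frame(sst = rnorm(10000), chl = rnorm(10000))

m <- glm(pr_ab ~ sst + chl, family = binomial, data = train)

# Hypothetical helper: predict in nchunk pieces so that only one piece of the
# predictor table is passed to predict() at a time.
predict_chunked <- function(model, newdata, nchunk = 10) {
  n <- nrow(newdata)
  grp <- cut(seq_len(n), breaks = nchunk, labels = FALSE)
  out <- numeric(n)
  for (g in unique(grp)) {
    rows <- which(grp == g)
    out[rows] <- predict(model, newdata[rows, , drop = FALSE], type = "response")
  }
  out
}

p_chunked <- predict_chunked(m, cells, nchunk = 20)
p_full <- predict(m, cells, type = "response")
all.equal(p_chunked, unname(p_full))
```

With rasters the pieces would be read from disk one at a time rather than subset from an in-memory table, which is where the memory saving comes from.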
tune_glm and tune_gam will not necessarily tune the models. It is only to have the same function for fitting and validating the models.
Hi,
Any ideas why I am getting this error when trying to apply esm_max()? How can I solve it?
Code:
esm_max_t1 <- esm_max(
data = ex_spxy2,
response = "pr_ab",
predictors= c("PC1", "PC2"),
partition = ".part",
thr = NULL,
background = ex_bg2,
clamp = TRUE,
classes = "lq",
pred_type = "cloglog",
regmult = 2.5
)
Presences, background points and predictors are attached.
The rest of my libraries are updated, but I get the following error.
devtools::install_github('sjevelazco/flexsdm')
Downloading GitHub repo sjevelazco/flexsdm@HEAD
✓ checking for file ‘/tmp/RtmpDagyjX/remotes1c3fe0498f/sjevelazco-flexsdm-aaa7271/DESCRIPTION’ ...
─ preparing ‘flexsdm’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘flexsdm_0.0.2.tar.gz’
Installing package into ‘/home/eric/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
sessionInfo:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
When I run the code
presence.background.points.model <- rbind(presence.points.model, background.points.model)
maxent.posteriori.model <- msdm_posteriori(
records = presence.background.points.model,
x = "x",
y = "y",
pr_ab = "pr_ab",
cont_suit = maxent.predict.BR$max$max,
method = c("obr"),
thr = "max_sens_spec",
buffer = NULL
)
I get the message
Error in quantile.default(a, 0:1000/1000) :
missing values and NaN's not allowed if 'na.rm' is FALSE
When I change the code to use a previously calculated threshold, it works fine.
maxent.posteriori.model <- msdm_posteriori(
records = presence.background.points.model,
x = "x",
y = "y",
pr_ab = "pr_ab",
cont_suit = maxent.predict.BR$max$max,
method = c("obr"),
thr = threshold.val,
buffer = NULL
)
It seems as if it were necessary to remove the NAs from the SpatRaster with the continuous suitability predictions before using it.
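The error message itself can be reproduced in base R with any numeric vector that contains NA, which supports that reading. A small sketch with made-up suitability values:

```r
# Made-up suitability values with NA cells, e.g. cells outside a mask.
a <- c(0.10, 0.35, NA, 0.80, NA, 0.55)

# quantile() stops on NA unless na.rm = TRUE; capture the message.
res <- tryCatch(
  quantile(a, 0:1000 / 1000),
  error = function(e) conditionMessage(e)
)
res

# Passing na.rm = TRUE (or dropping the NA values first) avoids the error.
q <- quantile(a, 0:1000 / 1000, na.rm = TRUE)
range(q)
```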
This is a most useful package.
I have just started using it and I am having trouble fitting a GAM with random effects.
With my data, being able to include such options in the formula enables superior GAM models, e.g. deviance explained for a simple GAM = 33.5%, and deviance explained for a GAM with random effects = 48.1%.
I used the code from the fit_gam help to illustrate the issue - see below.
fit_gam fails with: Sorry, but it was not possible to fit the model
Am I specifying the formula correctly?
Many thanks for any help,
Darren
data(abies)
abies2 <- part_random(
data = abies,
pr_ab = "pr_ab",
method = c(method = "kfold", folds = 5)
)
require(mgcv)
gam_test <- gam(pr_ab ~ s(aet) + s(landform, bs = "re"), data = abies2)
summary(gam_test)
gam_t2 <- fit_gam(
data = abies2,
response = "pr_ab",
predictors = c("aet"),
predictors_f = c("landform"),
select_pred = FALSE,
partition = ".part",
thr = "max_sens_spec",
fit_formula = stats::formula(pr_ab ~ s(aet) + s(landform, bs = "re"))
)
This is a basic question.
How can I create a multi-band raster (.tif) file for ENM analysis?
I tried to create it with the gdal_merge function in QGIS, but it didn't work in flexsdm.
When I use names() in R, the raster file includes only one name.
Hi,
What is the default value of classes in https://sjevelazco.github.io/flexsdm/reference/esm_max.html?
Sorry if it is evident!
Thanks
I've just started using flexsdm, such a great package. Thanks for creating it.
I was wondering if there is a way to extract response curves for the predictors in the different models. I couldn't find it anywhere in the package documentation or vignettes.
Thanks.
Hi there. I am trying to run fit_gam and am getting an error. I get the same error using the abies data from the package examples. The example code from the package is below, with the error message at the bottom.
data("abies")
abies2 <- part_random(
data = abies,
pr_ab = "pr_ab",
method = c(method = "kfold", folds = 10)
)
abies2
gam_t1 <- fit_gam(
data = abies2,
response = "pr_ab",
predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
predictors_f = c("landform"),
select_pred = FALSE,
partition = ".part",
thr = "max_sens_spec"
)
Formula used for model fitting:
pr_ab ~ s(aet, k = -1) + s(ppt_jja, k = -1) + s(pH, k = -1) + s(awc, k = -1) + s(depth, k = -1) + landform
Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'unique': Problem while evaluating dplyr::starts_with(dplyr::all_of(partition))
.
Write code to calculate a default value for the size parameter in the nnet method.
It must be implemented in fit_nnet.
nnet::nnet(
formula1,
data = train[[i]],
size = 2, # revise and implement a formula to calculate it
rang = 0.1,
# decay = grid$decay[ii],
maxit = 200,
trace = FALSE
)
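One option, offered only as a starting point: a rule of thumb that places the number of hidden units halfway between the number of inputs and the single output unit. The helper default_size is hypothetical and the rule is an assumption for illustration, not something fit_nnet is known to use. It would still need to be checked against tuned values:

```r
# Hypothetical helper: rule-of-thumb default for the hidden-layer size,
# halfway between the number of predictors and the number of output units.
default_size <- function(n_predictors, n_outputs = 1) {
  stopifnot(n_predictors >= 1)
  max(1L, as.integer(ceiling((n_predictors + n_outputs) / 2)))
}

# Default sizes for models with 1, 2, 5 and 10 predictors.
sizes <- vapply(c(1, 2, 5, 10), default_size, integer(1))
sizes
```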
Any plan to make the p_pdp function work with ensemble models?