The flexsdm from sjevelazco

rlayer argument in sample_background() function

Hi everyone,
Is there a way to provide the rlayer argument to sample_background() when the data partition method is one of the conventional data partitioning methods (part_random) or environmental and spatial cross-validation (part_senv)? Those methods don't return a raster grid that can be passed to sample_background() as in the documentation.

I'm using flexsdm v1.3.2

Allow extra_eval to calculate extrapolation for multiple environments

Calculating extra_eval for many points takes a long time. When extrapolation is wanted for both the training environment and a projection environment, the function has to be run twice. I assume most of the running time of extra_eval comes from calculating the distances. Allowing multiple environments in a single call could speed up the overall workflow considerably.

esm_raf not present?

Hi there,
I am trying to use flexsdm to model my species of interest. I noticed that there is no esm_raf function available.
Please correct me if I am wrong and point me to the relevant function.

Also, the fit_max and esm_max functions return results like the other models, but the plotted Maxent output shows a single number (e.g. 0.6345) instead of a range of predicted values (say, 1 to 100) like the rest of the models.

Cheers

occfilt_geo() only works with "x" and "y" columns

Hi Santiago,

I found an error in the occfilt_geo() function. The function only works when the coordinate columns are named "x" and "y". If we run the function example, it works perfectly:

library(flexsdm)
library(dplyr)
library(terra)

# Environmental variables
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)

plot(somevar)

# Species occurrences
data("spp")
spp
spp1 <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)

somevar[[1]] %>% plot()
points(spp1 %>% select(x, y))

#Change column names
spp2 <- spp1 %>% dplyr::rename(longitude = x, latitude = y)

# Using Moran method
filtered_1 <- occfilt_geo(
  data = spp,
  x = "x",
  y = "y",
  env_layer = somevar,
  method = c("moran"),
  prj = crs(somevar)
)

However, when I rename the columns with the coordinates to "longitude" and "latitude," I get the following error:

#Change column names
spp2 <- spp1 %>% dplyr::rename(longitude = x, latitude = y)

# Using Moran method
filtered_1 <- occfilt_geo(
  data = spp2,
  x = "longitude",
  y = "latitude",
  env_layer = somevar,
  method = c("moran"),
  prj = crs(somevar)
)
Error in `[.data.frame`(da, c(x, y)) : undefined columns selected

Migrate from raster to terra

terra provides the same functions as raster, but they are faster.
Functions of interest:
predict (compare speed between raster::predict and terra::predict)
crop
mask

Big problem!
terra and raster handle factor layers differently

List of functions used with raster that can be converted to terra

importFrom(raster,as.data.frame)
importFrom(raster,bind)
importFrom(raster,brick)
importFrom(raster,buffer)
importFrom(raster,cellFromXY)
importFrom(raster,cellStats)
importFrom(raster,coordinates)
importFrom(raster,crs)
importFrom(raster,extend)
importFrom(raster,extent)
importFrom(raster,extract)
importFrom(raster,mask)
importFrom(raster,match)
importFrom(raster,ncell)
importFrom(raster,nlayers)
importFrom(raster,plot)
importFrom(raster,predict)
importFrom(raster,projection)
importFrom(raster,rasterToPolygons)
importFrom(raster,res)
importFrom(raster,stack)
importFrom(raster,values)
importFrom(raster,writeRaster)
importFrom(raster,xyFromCell)

Function for creating

rev_jack breaks when v has more than one element

#Issue in rev_jack

Function breaks when length(v) > 1
if (v < median(x))

Possible solution using a for loop
if (any(z > t1)) {
  f <- which(z > t1)
  v <- x[f]
  for (vt in v) {
    if (vt < median(x)) {
      xa <- (v2 <= vt) * 1
      out <- out + xa
    }
    if (vt > median(x)) {
      xb <- (v2 >= vt) * 1
      out <- out + xb
    }
  }
} else {
  out <- out
}
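
A minimal sketch of why the scalar test fails and how testing one value at a time avoids it. The inputs x, v and v2 are made-up stand-ins for the objects used inside rev_jack, and count_flags is a hypothetical helper, not part of flexsdm.

```r
# Made-up inputs standing in for the objects used inside rev_jack
x  <- c(1, 2, 3, 4, 100)
v  <- c(1, 100)   # two flagged values, so length(v) > 1
v2 <- x

# `if` needs a single TRUE/FALSE; a length-2 condition is an error in
# R >= 4.2 and a warning in older versions
msg <- tryCatch(
  if (v < median(x)) "below" else "above",
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)

# Hypothetical helper: test each flagged value on its own
count_flags <- function(x, v, v2) {
  out <- numeric(length(v2))
  med <- median(x)
  for (vt in v) {
    if (vt < med) {
      out <- out + (v2 <= vt)
    } else if (vt > med) {
      out <- out + (v2 >= vt)
    }
  }
  out
}

flags <- count_flags(x, v, v2)
```

Here the lowest and highest values are each flagged once, while the scalar `if` cannot decide which branch to take for two values at the same time.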

Subscript out of bounds

Hello, I am predicting the habitat of a fish using flexsdm with sea surface temperature and chlorophyll as variables. It works pretty neatly, except for the last step. I try to use the following method:

eglm  <- esm_glm(
  data = sdm_extract,
  response = 'presence',
  predictors = c('sst', 'chlor_a'),
  partition = '.part',
  thr = 'max_sens_spec'
)

With this I get the following error:
Error in data_ens[[2]] (fishi.R#134): subscript out of bounds

However, this works flawlessly:

mglm <- fit_glm(
  data = sdm_extract,
  response = 'presence',
  predictors = c('sst', 'chlor_a'),
  partition = '.part',
  thr = 'max_sens_spec'
)

I just followed the example with my own data... Complete script is here: https://hastebin.com/share/lidadasejo.ruby
Why is there something out of bounds for the esm-methods, but not for the fit-functions?
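
One possible explanation, stated as an assumption and not a confirmed diagnosis: the ESM approach fits bivariate models for every pair-wise combination of predictors and then ensembles them. With only two predictors there is a single pair, so any code that indexes a second element could fail. The sketch below only counts the pairs with base R; it does not call flexsdm, and the third predictor name is hypothetical.

```r
# Predictors from the call above
predictors <- c("sst", "chlor_a")

# All pair-wise combinations, one list element per bivariate model
pairs_two <- utils::combn(predictors, 2, simplify = FALSE)
n_two <- length(pairs_two)

# Adding a third (hypothetical) predictor gives three pairs
pairs_three <- utils::combn(c(predictors, "salinity"), 2, simplify = FALSE)
n_three <- length(pairs_three)
```

If this is the cause, supplying three or more predictors to the esm_ functions would be the thing to try, whereas fit_glm has no such requirement.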

possibility of defining crs for some functions

For some functions, such as calib_area, sample_pseudoabs, and sample_background, among others, it would be useful to be able to transform data.frame occurrence data to a given crs. I think that Brook can help with it :)

variable importance

Can we calculate environmental variable importance and select some? I didn't find it in this package.

flexsdm ms - some tasks to do

Search for R packages that perform all or some SDM modeling steps (biomod2, ENMTML, sdm, ZOON, ModEco, kuenm, Model-R, BiodiversityR, tuneSDM, blockCV) in order to offer other methods or improve existing ones. It is also important to highlight in the ms (as a table or text in the supporting information) the features that are unique to our package compared with the others. It might also be interesting to create a set of functions that help report the most important modeling procedures.

occfilt_env

Hi,
I tested the occfilt_env function and found that it fails when environmental variables are rare. Zero values are then produced within the function, which cause an error message:

Error in seq.default(ext1[1], ext1[2], by = res[i]) : 
  invalid '(to - from)/by'

As far as I can see, the error occurs in this part of the function code:

n <- ncol(env_layer)
res <- apply(env_layer, 2, function(x) diff(range(x))) / nbins
for (i in 1:n) {
  ext1 <- range(env_layer[, i])
  ext1[2] <- ext1[2] + 0.05
  ext1 <- c(ext1[1] - 0.000001, ext1[2] + 0.000001)
  classes[[i]] <- seq(ext1[1], ext1[2], by = res[i])
  classes[[i]][length(classes[[i]])] <- ext1[2]
}

I realized that one of my environmental variables produces a zero in res[i] so that the seq function does not work.

Is there any solution for this problem?
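
A minimal base R sketch that reproduces the zero step and shows one possible workaround: dropping variables with no variation before filtering. The data frame and nbins value are made up, and the guard is a suggestion, not something the package is known to do.

```r
# Made-up extracted values: `const` does not vary across occurrences
env <- data.frame(temp = c(10, 12, 15, 20), const = c(1, 1, 1, 1))
nbins <- 5

# Same bin-width calculation as in the excerpt above
res <- apply(env, 2, function(x) diff(range(x))) / nbins

# A zero bin width makes seq() fail with the reported message
err <- tryCatch(
  seq(1, 1.05, by = res[["const"]]),
  error = function(e) conditionMessage(e)
)

# Possible workaround: keep only variables with a non-zero range
keep   <- res > 0
env_ok <- env[, keep, drop = FALSE]
```

Checking `res` in this way before calling occfilt_env would at least identify which variable triggers the error.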

correct_colinvar() with the pearson method returns only 2 objects, not 3

The website states that correct_colinvar() with the pearson method returns 3 objects, the first being a SpatRaster, but I cannot retrieve this first object; I only get 2 objects.

somevar <- system.file("external/somevar.tif", package = "flexsdm")

somevar <- terra::rast(somevar)

var <- correct_colinvar(env_layer = somevar, method = c("pearson", th = "0.7"))

Evaluation over training dataset (explanatory power)

When fitting models, I generally like to compare predictive power (evaluation metrics over the testing datasets) against explanatory power (evaluation metrics over the training datasets). This is also an indicator of overfitting. I don't see any function in the package that allows calculating explanatory power; the model objects only report predictive power. Maybe I am missing something. If not, I would like to see this option in the package.
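
A minimal sketch of the comparison described above, using simulated data and a plain glm in place of flexsdm. The auc helper, the simulated predictor and the fold column are illustrative assumptions.

```r
# Rank-based AUC (equivalent to the Mann-Whitney statistic)
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated presence-absence data with a 5-fold partition column
set.seed(42)
d <- data.frame(x = rnorm(300))
d$pr_ab <- rbinom(300, 1, plogis(1.5 * d$x))
d$.part <- rep(1:5, length.out = 300)

# For each fold, evaluate the same model on training and testing data
res <- do.call(rbind, lapply(1:5, function(k) {
  train <- d[d$.part != k, ]
  test  <- d[d$.part == k, ]
  m <- glm(pr_ab ~ x, family = binomial, data = train)
  data.frame(
    fold      = k,
    auc_train = auc(predict(m, train, type = "response"), train$pr_ab),
    auc_test  = auc(predict(m, test, type = "response"), test$pr_ab)
  )
}))

# Gap between explanatory and predictive power, averaged over folds
mean_gap <- mean(res$auc_train - res$auc_test)
```

A large positive gap would point to overfitting; a gap near zero suggests the model generalizes about as well as it fits.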

Testing functions

Package download problem

I'm trying to download the package without success (both on Windows and Linux).

From CRAN I get :

Warning in install.packages :
package ‘flexsdm’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

From GitHub I get error:
Downloading GitHub repo sjevelazco/flexsdm@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/sjevelazco/flexsdm/tarball/HEAD' failed

I'm using R version 4.2.2, but I encountered the same problem with the previous version, 4.2.0.

Possibility to extract performance metrics per cross-validation fold

At the moment, the mean and sd of each performance metric are returned, based on its values across cross-validation folds. It would be useful for meta-analysis to be able to extract the performance metrics for each individual fold.
For the tune_gbm function, this would mean filtering eval.partial for the best fit and adding it to the results of the function when the user sets an option to keep this information, similar to keep.fold.fit in the gbm.step function of the dismo package.

New functions and function improvements

Write examples

Add an example that shows how to use the calib_area function with presence-absence data, i.e. define a calibration area and then constrain the presence-absence data for a given species.

sdm_eval fails for presence-only tune_max

When using tune_max with presence-only data and background, the evaluation step fails since both the number of absences and presences have to be greater than zero in line 152 of sdm_eval: if (na == 0 | np == 0).

Is there a way to use tune_max with presence-only data?

Survey data bias grid example?

Thank you so much for this wonderful package.
I would like to include a bias grid in a modelling workflow as suggested by Elith et al. 2011.
Is there functionality to explicitly include a bias grid that indicates the biases in the survey data for all algorithms?
There is an example in the MaxEnt tutorial, but I cannot understand how to implement this more generally in flexsdm.
If it is possible, I would like to ask if it could be added as a vignette showing how.
Many thanks,
Darren
Elith et al. 2011. A statistical explanation of MaxEnt for ecologists. https://doi.org/10.1111/j.1472-4642.2010.00725.x
I know this issue can be controlled to a certain extent by
i) choosing an appropriate resolution (extent and cell size),
ii) choosing appropriate pseudo-absences,
iii) environmental filtering of observation data

Predicting PCA

Hello Santiago Velazco,

Firstly, I would like to congratulate you on the package. Besides being very useful, it is also very didactic. I always recommend the package's website to those who are starting to study niche modeling.

My question is about the projection of PCA in various scenarios. To demonstrate my point, I ran the correct_colinvar function with temperature variables from WorldClim cropped to the state of Paraná (Brazil). The coefficients on the first axis show that temperature variables (Bio11, Bio06, Bio01) have negative values.

   variable      PC1
1 Bio01    -0.452  
2 Bio02    -0.100  
3 Bio03    -0.177  
4 Bio04     0.135  
5 Bio05    -0.412  
6 Bio06    -0.415  
7 Bio07     0.00865
8 Bio10    -0.439  
9 Bio11    -0.449

As a result, we can expect warmer locations to have lower score values on the first axis. When we plot the map, we can see that the warmer regions in Paraná have lower ("more negatives") values:

So, if we project this PCA to a warmer future, the expected scores should be lower. However, the function returns a scenario with higher scores:

This happens, I believe, because you use the function terra::scale to scale the variables from the new scenario before predicting.

When we predict the PCA without scaling the variables, we get a map that appears to make more sense, with lower predicted scores in the warmer scenario.

I also tested the PCA prediction with the kuenm_rpca function from the kuenm package, and the results are identical to those obtained when the variables are not scaled before predicting:

So I was wondering if it is actually required to scale the variables of the new scenario before predicting, because the predictions without scale appear to make more sense to me.

If you want to reproduce the results, I've attached the files and code I used.
PCA_flexsdm.zip
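
A minimal base R sketch of the effect described in this report, using simulated temperature-like variables and stats::prcomp in place of correct_colinvar. The variable names and the +3 shift are made up. It contrasts projecting the new scenario with the training centering and scaling against re-scaling the new scenario with its own mean and standard deviation.

```r
set.seed(1)
n    <- 200
temp <- rnorm(n, mean = 20, sd = 3)

# Simulated current climate: three correlated temperature variables
current <- data.frame(
  bio01 = temp + rnorm(n, sd = 0.5),
  bio05 = temp + 8 + rnorm(n, sd = 0.5),
  bio06 = temp - 8 + rnorm(n, sd = 0.5)
)

# PCA fitted on the current climate, variables centered and scaled
pca <- prcomp(current, center = TRUE, scale. = TRUE)

# Hypothetical warmer scenario: every variable shifted by +3
future <- current + 3

# (a) Projection with the training center and scale stored in `pca`
pc1_train_scaling <- predict(pca, newdata = future)[, 1]

# (b) Projection after re-scaling the scenario with its own statistics
pc1_own_scaling <- (scale(future) %*% pca$rotation)[, 1]

# With (b) the uniform warming is removed by the re-centering, so the
# scores equal the current ones; with (a) they shift along PC1
shift_a <- mean(pc1_train_scaling) - mean(pca$x[, 1])
shift_b <- mean(pc1_own_scaling) - mean(pca$x[, 1])
```

In this toy case the shift under (a) has the same sign as the PC1 loadings, so negative loadings give lower scores in the warmer scenario, while (b) shows no shift at all.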

Out of memory when using sdm_predict

Hi @sjevelazco !
I am exploring some R packages to build species distribution models, and I have been trying flexsdm over the last few days. I find the package very useful, but I am encountering an issue that I am not able to solve: an out-of-memory error when using the function sdm_predict. This is the code:

# projecting

#For memory issue
mem.maxVSize(vsize = Inf)
mem.maxNSize(nsize = Inf)

Sys.setenv('R_MAX_VSIZE' = 120000000000000000)
Sys.setenv('R_MAX_NSIZE' = 120000000000000000)

Sys.setenv('R_MAX_MEM_SIZE' = Inf)

ensemble_map_1 <- sdm_predict(
  models = ens_m,
  pred = EnvVars,
  thr = "max_sens_spec",
  nchunk = 50,
  con_thr = FALSE,
  predict_area = NULL
)

and this the error:

Predicting ensembles
Error: Cannot allocate a vector of size 22.1 Gb
Error: no more error handler available (recursive errors?); invoke reboot abort

This is the sessionInfo:

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8    LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C                   LC_TIME=Italian_Italy.utf8    

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4   terra_1.7-65  flexsdm_1.3.4

loaded via a namespace (and not attached):
 [1] randomForest_4.7-1.1 Matrix_1.6-1.1       gtable_0.3.4         compiler_4.3.2       maps_3.4.2           Rcpp_1.0.11          tidyselect_1.2.0     parallel_4.3.2      
 [9] splines_4.3.2        scales_1.3.0         lattice_0.22-5       ggplot2_3.4.4        R6_2.5.1             generics_0.1.3       patchwork_1.2.0      gbm_2.1.8.1         
[17] knitr_1.45           kernlab_0.9-32       iterators_1.0.14     dotCall64_1.1-1      tibble_3.2.1         munsell_0.5.0        nnet_7.3-19          pillar_1.9.0        
[25] rlang_1.1.2          sp_2.1-2             utf8_1.2.4           xfun_0.41            maxnet_0.1.4         doParallel_1.0.17    viridisLite_0.4.2    cli_3.6.2           
[33] spThin_0.2.0         magrittr_2.0.3       mgcv_1.9-0           foreach_1.5.2        grid_4.3.2           rstudioapi_0.15.0    spam_2.10-0          lifecycle_1.0.4     
[41] nlme_3.1-163         fields_15.2          vctrs_0.6.5          glue_1.6.2           raster_3.6-26        codetools_0.2-19     survival_3.5-7       fansi_1.0.6         
[49] colorspace_2.1-0     tools_4.3.2          pkgconfig_2.0.3

The error is not only related to the ensemble; it also occurs with all the single models. As you can see, I already tried changing the memory settings, and I ran the code on different workstations, but even using the one with 128 Gb of RAM didn't solve the issue. I am wondering if there are other options that I could try to solve these errors.
Many thanks!

Memory overflow

Hello everyone,

Is there any way to reduce the memory allocation of flexsdm? It's simply impossible for me to run some functions, like correct_colinvar() and sdm_predict(), since they eat up all the memory of my notebook (i7-7500, 16 GB RAM, Ubuntu 22), and I'm using 13 cropped layers of 5400 x 6000 cells at ~1 km resolution. My notebook is not the best for the task, but I feel this memory issue can make some people simply drop the package, mostly students and people who cannot afford better setups.

For example, the raster and terra packages create temp files in a temp dir and allow the user to set the memory fraction manually with rasterOptions(memfrac = value) or terraOptions(memfrac = value). They can also process files in chunks if needed.

Sincerely,

Pedro Bittencourt
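
A rough back-of-the-envelope estimate for the layer sizes quoted above, assuming every cell of every layer is held in memory at once as an 8-byte double. That assumption is for illustration only; how flexsdm actually loads the values is not stated here.

```r
n_rows   <- 5400
n_cols   <- 6000
n_layers <- 13
bytes_per_value <- 8   # R stores numeric values as 8-byte doubles

n_cells     <- n_rows * n_cols
total_bytes <- n_cells * n_layers * bytes_per_value
total_gib   <- total_bytes / 1024^3
```

That is about 3.1 GiB for a single copy of the values, so a few intermediate copies made during scaling or prediction would be enough to exhaust 16 GB of RAM.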

Create a tune_glm and tune_gam

tune_glm and tune_gam will not necessarily tune the models. It is only to have the same function for fitting and validating the models.

Error in data_ens[[2]] : subscript out of bounds

Hi,

Any ideas why I am getting this error when trying to apply esm_max()? How can I solve it?

Code:
esm_max_t1 <- esm_max(
  data = ex_spxy2,
  response = "pr_ab",
  predictors = c("PC1", "PC2"),
  partition = ".part",
  thr = NULL,
  background = ex_bg2,
  clamp = TRUE,
  classes = "lq",
  pred_type = "cloglog",
  regmult = 2.5
)

Presences, background points and predictors attached,

Thanks
background.txt
Presencias.txt
somevar.zip

Install fails

The rest of my libraries are up to date, but I get the following error.

devtools::install_github('sjevelazco/flexsdm')
Downloading GitHub repo sjevelazco/flexsdm@HEAD
✓ checking for file ‘/tmp/RtmpDagyjX/remotes1c3fe0498f/sjevelazco-flexsdm-aaa7271/DESCRIPTION’ ...
─ preparing ‘flexsdm’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘flexsdm_0.0.2.tar.gz’

Installing package into ‘/home/eric/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)

installing source package ‘flexsdm’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Error: package slot missing from signature for generic ‘coerce’
and classes Raster, SpatRaster
cannot use with duplicate class names (the package may need to be re-installed)
Execution halted
ERROR: lazy loading failed for package ‘flexsdm’
removing ‘/home/eric/R/x86_64-pc-linux-gnu-library/4.1/flexsdm’
Warning message:
In i.p(...) :
installation of package ‘/tmp/RtmpDagyjX/file1c3f6dd11570/flexsdm_0.0.2.tar.gz’ had non-zero exit status

sessionInfo:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

msdm_posteriori

When I run the code

presence.background.points.model <- rbind(presence.points.model, background.points.model)

maxent.posteriori.model <- msdm_posteriori(
  records = presence.background.points.model,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  cont_suit = maxent.predict.BR$max$max,
  method = c("obr"),
  thr = "max_sens_spec",
  buffer = NULL
)

I get the message

Error in quantile.default(a, 0:1000/1000) :
missing values and NaN's not allowed if 'na.rm' is FALSE

When I change the code to use a previously calculated threshold, it works fine.

maxent.posteriori.model <- msdm_posteriori(
  records = presence.background.points.model,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  cont_suit = maxent.predict.BR$max$max,
  method = c("obr"),
  thr = threshold.val,
  buffer = NULL
)

It seems as if the NAs have to be removed from the SpatRaster with continuous suitability predictions before using it.
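
A minimal base R sketch of the underlying error, using a made-up vector of suitability values in place of the values extracted from the SpatRaster. It shows that the same quantile call fails with an NA present and succeeds once missing values are dropped.

```r
# Made-up suitability values; the NA mimics a record falling on an empty cell
suit  <- c(0.1, 0.4, NA, 0.8, 0.9)
probs <- 0:1000 / 1000

# Default na.rm = FALSE: quantile() stops with the reported message
msg <- tryCatch(
  quantile(suit, probs),
  error = function(e) conditionMessage(e)
)

# Dropping missing values lets the same call run
q <- quantile(suit, probs, na.rm = TRUE)
```

A possible workaround, under the assumption that some records fall on NA cells, is to remove those records before calling msdm_posteriori.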

Can't fit random effects with fit_gam?

This is a most useful package;
I have just started using the package and I am having trouble fitting a gam with random effects.
With my data, being able to include such options in the formula enables superior GAM models, e.g. deviance explained for a simple GAM = 33.5%, and deviance explained for a GAM with random effects = 48.1%.
I used the code from the fit_gam help to illustrate the issue - see below.
fit_gam fails with: Sorry, but it was not possible to fit the model
Am I specifying the formula correctly?
Many thanks for any help,
Darren

data(abies)
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "kfold", folds = 5)
)

require(mgcv)
gam_test <- gam(pr_ab ~ s(aet) + s(landform, bs = "re"), data = abies2)
summary(gam_test)
gam_t2 <- fit_gam(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet"),
  predictors_f = c("landform"),
  select_pred = FALSE,
  partition = ".part",
  thr = "max_sens_spec",
  fit_formula = stats::formula(pr_ab ~ s(aet) + s(landform, bs = "re"))
)

how to create multi band raster

This is a basic question.
How can I create a multi-band raster (.tif) file for ENM analysis?

I tried to create it with the gdal_merge function in QGIS, but it didn't work in flexsdm.
When I use names() in R, the raster file includes only one name.
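
A minimal sketch of building a multi-band file directly in R with terra, assuming the single-band layers share the same extent and resolution. The two layers here are generated in memory and the layer names are made up; with files on disk, rast() also accepts a character vector of file paths.

```r
library(terra)

# Two made-up single-band rasters on the same grid
bio1 <- rast(
  nrows = 10, ncols = 10,
  xmin = 0, xmax = 10, ymin = 0, ymax = 10,
  vals = runif(100)
)
bio12 <- bio1 * 2

# c() stacks layers into one multi-layer SpatRaster
env <- c(bio1, bio12)
names(env) <- c("bio1", "bio12")

# Write all layers to a single multi-band GeoTIFF and read it back
out_file <- tempfile(fileext = ".tif")
writeRaster(env, out_file, overwrite = TRUE)
env_check <- rast(out_file)

n_bands <- nlyr(env_check)
```

If names() returns a single name for a file produced elsewhere, the file probably contains a single merged band instead of separate bands.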

classes default value in esm_max

Hi,

What is the default value of the classes argument in https://sjevelazco.github.io/flexsdm/reference/esm_max.html?

Sorry if it is evident!

Thanks

response curves

I've just started using flexsdm, such a great package. Thanks for creating it.

I was wondering if there is a way to extract response curves for the predictors in the different models. I couldn't find it anywhere in the package documentation or vignettes.

Thanks.

Error in (function (cond) and fit_gam

Hi there. I am trying to run fit_gam and am getting an error. I get the same error using the abies data from the package examples. Below is the example code from the package, with the error message at the bottom.

data("abies")

abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "kfold", folds = 10)
)
abies2

gam_t1 <- fit_gam(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  select_pred = FALSE,
  partition = ".part",
  thr = "max_sens_spec"
)

Formula used for model fitting:
pr_ab ~ s(aet, k = -1) + s(ppt_jja, k = -1) + s(pH, k = -1) + s(awc, k = -1) + s(depth, k = -1) + landform

Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'unique': Problem while evaluating dplyr::starts_with(dplyr::all_of(partition)).

Size parameter in the nnet method

Write code to calculate a default value for the size parameter in the nnet method.
It must be implemented in fit_nnet.

nnet::nnet(
  formula1,
  data = train[[i]],
  size = 2, # revise and implement a formula to calculate it
  rang = 0.1,
  # decay = grid$decay[ii],
  maxit = 200,
  trace = FALSE
)

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw#:~:text=However%2C%20neural%20networks%20with%20two,more%20than%20two%20hidden%20layers.
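
A minimal sketch of one common rule of thumb, which sets the hidden layer size to the mean of the number of input and output nodes. The helper default_nnet_size is hypothetical, rounding up is an arbitrary choice, and this is only one candidate for the default, not what fit_nnet is known to use.

```r
# Hypothetical helper: hidden layer size as the mean of the number of
# input nodes (predictors) and output nodes, rounded up, at least 1
default_nnet_size <- function(n_predictors, n_outputs = 1) {
  max(1, ceiling((n_predictors + n_outputs) / 2))
}

size_5 <- default_nnet_size(5)   # five predictors, one output
size_2 <- default_nnet_size(2)
size_1 <- default_nnet_size(1)
```

The result could then be passed as the size argument of nnet::nnet() whenever the user does not supply a value.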

Partial dependence plots for ensemble models

Any plan to make the p_pdp function work with ensemble models?
