I looked at it all again and it looks like the total time is not the problem after all

Random Error in Function Bsptime about bmstdr HOT 12 CLOSED

Hallo951 commented on June 20, 2024

Random Error in Function Bsptime

from bmstdr.

Comments (12)

Hallo951 commented on June 20, 2024

I have tested this with the pause inside the foreach loop but unfortunately this has no positive effect on the random error. So the problem must be somewhere else...

from bmstdr.

Hallo951 commented on June 20, 2024

So, I have now traced the error back to the function "Bsptime/BspTimer_sptime/spTimer::spT.Gibbs". It seems that the underlying package 'spTimer' causes the error...

from bmstdr.

Hallo951 commented on June 20, 2024

I extracted the problem function that causes the random error from the 'Bsptime/BspTimer_sptime' function and set it up to test with the same settings and data I used in my first post. I think this should help clarify the error.

library(spTimer)

data <- read.csv("S:/data.csv")  # Customize path

results <- spTimer::spT.Gibbs(formula = as.formula("y_spec ~ x_precipitation_HYRAS_std + x_DGM1 + x_air_temperature_mean_std + 
    x_radiation_global_mean + x_Auenlehmmaechtigkeit + x_air_temperature_mean_std:x_radiation_global_mean + 
    x_air_temperature_mean_std:x_Auenlehmmaechtigkeit"), 
                              data = data, 
                              model = "GPP", 
                              coords = as.matrix(unique(data[, which(colnames(data) %in% c("coord_x","coord_y"))])), 
                              nItr = 2000, 
                              nBurn = 1000,
                              distance.method = "euclidean", 
                              priors = spTimer::spT.priors(model = "GPP", inv.var.prior = Gamm(2,1), beta.prior = Norm(0, 1e-04^(-1))),
                              spatial.decay = spTimer::spT.decay(distribution = Gamm(2,1), 0.03), 
                              scale.transform = "NONE",
                              knots.coords = as.matrix(data.frame(x = c(314456.1,305362.6,308393.7,311424.9),y = c(5692110,5695567,5695567,5695567))), 
                              cov.fnc = "exponential", 
                              tol.dist = 0.005, 
                              time.data = NULL, 
                              newcoords = NULL,
                              newdata = NULL, 
                              truncation.para = list(at = 0, lambda = 2),
                              annual.aggrn = "NONE", 
                              report = 1)

I think this behavior of the 'spTimer' package occurs with all models, not just the GPP model. I have not tested that, though, because the GPP model currently gives the best results with my data.

Here is a short summary of what I want to use your package for.

I am trying to build habitat models for different plants. The uploaded example dataset for the plant 'Allium ursinum' shows what kind of data I have available. The dependent variable always lies in the range 0 <= y <= 1 (or, equivalently, 0% <= y <= 100% cover). The model I am currently using is the GPP model, as it has so far given the best results with a reasonable computation time. The goal is to find a regression model with which I can make spatial and/or temporal predictions.

I am not sure whether the truncatedGPP model would be better for my kind of data. I did some tests with it, but the results were always worse than with the GPP model. Also, the current implementation of the truncatedGPP model only lets you define one boundary, not two. For these reasons I decided to use the GPP model for now. What do you advise me to do?

Following your book and other habitat models I have already built, my current procedure for identifying the 'best' regression model is as follows:

1. Create different regression formulas.
2. Create different node (knot) configurations based on a regular grid within the study area.
3. Create a k-fold cross-validation dataset for validation.
4. Fit models for every combination of regression formula and node configuration on the cross-validation dataset.
5. Calculate the mean of each validation statistic across the cross-validation models, per regression formula and node configuration.
6. Select the best model by the smallest mean MAE and/or mean RMSE (would the CRPS statistic possibly be a better selection criterion here?).
7. Use the selected model for spatial/temporal prediction.
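The selection procedure listed above can be sketched in a few lines of base R. This is only an illustration under loud assumptions: `lm()` stands in for the spatio-temporal fit, the data are simulated, the node configurations are left out, and all names (`toy`, `formulas`, `cv_one`, `cv_table`) are hypothetical rather than taken from bmstdr or spTimer.

```r
# Minimal sketch of k-fold cross-validation over candidate formulas.
# ASSUMPTION: lm() is only a stand-in for the real model fit; data are simulated.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 0.5 + 0.8 * toy$x1 + rnorm(n, sd = 0.3)   # x2 carries no signal

formulas <- list(f1 = y ~ x1, f2 = y ~ x2, f3 = y ~ x1 + x2)
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))     # random fold label per row

# Mean MAE and RMSE over the k held-out folds for one formula.
cv_one <- function(f) {
  per_fold <- sapply(seq_len(k), function(i) {
    fit  <- lm(f, data = toy[fold != i, ])
    pred <- predict(fit, newdata = toy[fold == i, ])
    err  <- toy$y[fold == i] - pred
    c(MAE = mean(abs(err)), RMSE = sqrt(mean(err^2)))
  })
  rowMeans(per_fold)
}

cv_table <- t(sapply(formulas, cv_one))             # one row per formula
best <- rownames(cv_table)[which.min(cv_table[, "RMSE"])]
print(cv_table)
```

In a real run the inner `lm()` call would be replaced by the model fit, and the table would gain one row per formula and node configuration.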

For the implementation I use several loops in which I fit a variety of models with your package inside a tryCatch block, then compare them and select the best one. And exactly at this point, despite the tryCatch block, I keep getting a random error, so that the outermost loop fails and the whole script terminates.
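For reference, a loop of the kind described here might look like the sketch below. `fit_model()` is a hypothetical stand-in that fails on purpose for some inputs; it is not a bmstdr or spTimer function. Note that `tryCatch` only intercepts conditions signalled inside the expression it wraps, so an error raised later, for example when downstream code processes a result containing NA values, would still escape it.

```r
# Hypothetical stand-in for a model fit that sometimes fails.
fit_model <- function(id) {
  if (id %% 3 == 0) stop("simulated failure for model ", id)
  list(id = id, rmse = 1 / id)
}

results <- vector("list", 6)
errors  <- character(0)
for (i in seq_along(results)) {
  results[[i]] <- tryCatch(
    fit_model(i),
    error = function(e) {
      # Record the message and return a placeholder so the loop continues.
      errors <<- c(errors, conditionMessage(e))
      list(id = i, rmse = NA_real_)
    }
  )
}

rmse <- vapply(results, function(r) r$rmse, numeric(1))
ok   <- !is.na(rmse)                 # models that produced a usable result
```

Returning a placeholder list rather than `NULL` from the handler matters here, because assigning `NULL` with `[[<-` would delete the list element instead of storing it.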

from bmstdr.

Hallo951 commented on June 20, 2024

I think I am beginning to understand the problem. I think this is a classic convergence problem of the regression function which is not properly intercepted/handled. I mean that the respective regression model does not find a solution for the corresponding data. I don't know what the correct technical term for this is in the field of Bayesian statistics. I am not well versed in this branch of statistics.

My hypothesis is based on the example and dataset from the first post here. For testing, I standardized the independent variables in the dataset (using the mean and standard deviation) and then ran the example from the first post in a loop 100000 times. Result: not a single error!

After that I added a second loop over different model formulas. Unfortunately this test was not successful: one or more models inside the loop do not converge properly (do not find a solution), which causes an error and terminates the whole loop.

If my hypothesis is true, a quick solution would be a workaround that intercepts the non-converging regression model from the package "spTimer" (and, I think, the other packages as well), outputs an appropriate message, returns NA as the result, and thus prevents the current error.

This workaround could be extended so that a non-converging regression model is retried x times, and only then is a message output and NA returned for the non-converging model.
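A workaround of this shape could be sketched as follows. This is not the code from the fork; `fit_with_retry()`, `is_valid` and `flaky_fit()` are hypothetical names, and the validity check (all values finite) is an assumption that would need adapting to the components of a real fit object.

```r
# Hypothetical helper: call fit_fun up to max_tries times, fall back to NA.
# ASSUMPTION: is_valid decides what counts as "a solution was found".
fit_with_retry <- function(fit_fun,
                           is_valid = function(x) all(is.finite(x)),
                           max_tries = 3, ...) {
  for (attempt in seq_len(max_tries)) {
    out <- tryCatch(fit_fun(...), error = function(e) NULL)
    if (!is.null(out) && is_valid(out)) {
      return(out)
    }
    message("attempt ", attempt, " of ", max_tries, " did not yield a solution")
  }
  NA
}

# Toy fit that returns NaN on its first two calls and a number afterwards.
calls <- 0
flaky_fit <- function() {
  calls <<- calls + 1
  if (calls < 3) NaN else 0.42
}

res   <- fit_with_retry(flaky_fit, max_tries = 5)
never <- fit_with_retry(function() stop("no solution"), max_tries = 2)
```

Passing the validity check as a function keeps the retry logic independent of what the fitted object looks like.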

from bmstdr.

Hallo951 commented on June 20, 2024

I have made a fork of your repository and implemented the described work-around there. I am currently testing whether it works like this. But I won't have any results until Monday....if it should work like this, you can take a look at my commits and see if you want to adopt this work-around for the official repository...

from bmstdr.

Hallo951 commented on June 20, 2024

Hello,

A very general question:

Does it make sense to transform (or rather, normalise) the input data for the Bayesian regression in advance using Tukey's Ladder of Powers transformation, or is this not necessary for a Bayesian regression?

My tests have shown that a standardization (scale() in R) has a positive influence on the model building, but what about a transformation?
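To make the two operations in the question concrete, here is a small base R sketch on simulated data. It applies one common form of Tukey's ladder of powers, picks the exponent by the Shapiro-Wilk W statistic over a grid (one possible criterion, assumed here), and then standardizes with `scale()`. The names `tukey_power`, `lambdas` and `best_lambda` are hypothetical, and the sketch does not answer whether the transformation is needed for the model.

```r
# One common form of Tukey's ladder of powers (x must be positive).
tukey_power <- function(x, lambda) {
  if (lambda > 0) x^lambda else if (lambda == 0) log(x) else -(x^lambda)
}

set.seed(42)
x <- rlnorm(300)                       # simulated right-skewed covariate

# ASSUMPTION: choose lambda by maximizing the Shapiro-Wilk W statistic.
lambdas <- seq(-2, 2, by = 0.25)
w <- sapply(lambdas, function(l) unname(shapiro.test(tukey_power(x, l))$statistic))
best_lambda <- lambdas[which.max(w)]

# Standardize the transformed covariate to mean 0 and standard deviation 1.
x_std <- as.numeric(scale(tukey_power(x, best_lambda)))
```

The transformation changes the shape of the distribution, while `scale()` only shifts and rescales it, so the two steps address different things.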

from bmstdr.

sujit-sahu commented on June 20, 2024

Sorry, I have not been able to replicate the random error you reported in the foreach loop. Again I ran the code on both my Windows and Linux machines. Please can you try to run this example on a different machine and tell me if this is still a problem.

It will take me time to implement the power transformation. I will talk to you over email before adding this feature.

Please let me know if it is okay to close this issue.

from bmstdr.

Hallo951 commented on June 20, 2024

I have tested the example from the first post on different computers, and the error appears every now and then, whenever spTimer does not find a solution and produces NA values in the model. This then causes various subsequent errors, which I catch in the fork.

Please have a look at the change in the fork that solves the problem. It works very well and should be included in the main code if possible, unless you have a better solution.

from bmstdr.

sujit-sahu commented on June 20, 2024

Sorry, I am not able to see your fork or solution. I did not see your pull request either. Please can you post it again. I will have a look.

from bmstdr.

Hallo951 commented on June 20, 2024

Oh, my bad. I thought you could see the fork on GitHub. I will make a pull request on...

from bmstdr.

Hallo951 commented on June 20, 2024

Please take another look at the closed thread about the error in the prediction. I have written something else about it....

from bmstdr.

sujit-sahu commented on June 20, 2024

I could not see that thread. Please feel free to open a new issue.

from bmstdr.
