jtimonen / lgpr Goto Github PK

R-package for interpretable nonparametric modeling of longitudinal data using additive Gaussian processes. Contains functionality for inferring covariate effects and assessing covariate relevances. Various models can be specified using a convenient formula syntax.

Home Page: https://jtimonen.github.io/lgpr-usage/

R 91.31% C++ 0.01% Stan 8.31% CSS 0.10% Shell 0.27%

stan longitudinal-data gaussian-processes bayesian-inference r-packages

lgpr's People

Contributors

Stargazers

Watchers

Forkers

smukherj09

lgpr's Issues

the default value for argument 'reduce' can have wrong class on some systems

The functions relevances and pred have default value mean for the argument reduce. An error is thrown if class(reduce)[1] is not "function", but on some settings it can be standardGeneric

Scalability of lgpr

Is your feature request related to a problem? Please describe.
First of all, thank you very much for writing this package and making it publicly available. I was trying to use it to analyze time-series linguistic data. One problem with lgpr, it seems, is that it can't currently handle datasets with more than 300 observations. A typical dataset in linguistics that involves 20 individuals with repeated measurements in a couple of conditions can easily exceed 10,000 entries. My computer crashed when I was attempting to use lgpr to analyze the data.

Describe the solution you'd like
Is there a way to scale up the analytical capability of lgpr, so that it can handle bigger datasets efficiently?

Describe alternatives you've considered
I'm also trying to use other generalized additive mixed effects models to analyze the data.

Additional context
I'd like to be able to fit the following model:

fit <- lgp(VALUE ~ TIME + TIME|VOICING + TIME|TONE + HEIGHT + PLACE + GENDER + PERSON,
           data = d,
           iter = 1000,
           chains = 4,
           cores = 4,
           refresh = 500)

with the following simulated dataset:

begin_f0 <- function(gender) {
  if (gender == "M") {
    return(rnorm(n = 1, mean = 150, sd = 3))
  } else {
    return(rnorm(n = 1, mean = 250, sd = 3))
  }
}

end_f0 <- function(gender) {
  if (gender == "M") {
    return(rnorm(n = 1, mean = 120, sd = 3))
  } else {
    return(rnorm(n = 1, mean = 220, sd = 3))
  }
}

a_coeff <- function(f1, f2) {
  return((f1 - f2)/81)
}

b_coeff <- function(f1, f2) {
  return(20 * (f2 - f1) / 81)
}

c_coeff <- function(f1, f2) {
  return(f1 - (f1 - f2)/81 - 20 * (f2 - f1) / 81)
}

get_f0 <- function(f1, f2) {
  a_ <- a_coeff(f1, f2)
  b_ <- b_coeff(f1, f2)
  c_ <- c_coeff(f1, f2)
  t <- 1:10
  return(a_ * t^2 + b_ * t + c_)
}


set.seed(2021)
n_indiv <- 6
participants <- sample(c("M", "F"), size = n_indiv, replace = TRUE)

f_p_i_1 <- vector("numeric")
f_p_i_4 <- vector("numeric")
f_p_ai_1 <- vector("numeric")
f_p_ai_4 <- vector("numeric")

f_b_i_1 <- vector("numeric")
f_b_i_4 <- vector("numeric")
f_b_ai_1 <- vector("numeric")
f_b_ai_4 <- vector("numeric")

f_t_i_1 <- vector("numeric")
f_t_i_4 <- vector("numeric")
f_t_ai_1 <- vector("numeric")
f_t_ai_4 <- vector("numeric")

f_d_i_1 <- vector("numeric")
f_d_i_4 <- vector("numeric")
f_d_ai_1 <- vector("numeric")
f_d_ai_4 <- vector("numeric")

f_k_ai_1 <- vector("numeric")
f_k_ai_4 <- vector("numeric")
f_g_ai_1 <- vector("numeric")
f_g_ai_4 <- vector("numeric")

f_m_i_1 <- vector("numeric")
f_m_i_4 <- vector("numeric")
f_m_ai_4 <- vector("numeric")

f_n_i_4 <- vector("numeric")
f_n_ai_4 <- vector("numeric")

f0 <- vector("numeric")

for (p in participants) {
  perturbation_effect <- rnorm(1, -10, 5)
  f_e <- end_f0(p)
  f_b_voiceless <- begin_f0(p)
  f_b_voiced <- f_b_voiceless + perturbation_effect
  f_b_sonorant <- f_e + (f_b_voiceless - f_e) * 0.1

  voicelss_f0 <- get_f0(f_b_voiceless, f_e)
  voiced_f0 <- get_f0(f_b_voiced, f_e)
  sonorant_f0 <- get_f0(f_b_sonorant, f_e)

  height_diff <- runif(1, 0, 2)

  f_p_i_1 <- rep(voicelss_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_p_i_4 <- rep(voicelss_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_p_ai_1 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_p_ai_4 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_b_i_1 <- rep(voiced_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_b_i_4 <- rep(voiced_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_b_ai_1 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_b_ai_4 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_t_i_1 <- rep(voicelss_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_t_i_4 <- rep(voicelss_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_t_ai_1 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_t_ai_4 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_d_i_1 <- rep(voiced_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_d_i_4 <- rep(voiced_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_d_ai_1 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_d_ai_4 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_k_ai_1 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_k_ai_4 <- rep(voicelss_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_g_ai_1 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_g_ai_4 <- rep(voiced_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_m_i_1 <- rep(sonorant_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_m_i_4 <- rep(sonorant_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_m_ai_4 <- rep(sonorant_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f_n_i_4 <- rep(sonorant_f0, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)
  f_n_ai_4 <- rep(sonorant_f0 - height_diff, 3) + rep(rnorm(3, mean = 0, sd = 1), each = 10)

  f0 <- c(f0, f_p_i_1, f_p_i_4, f_p_ai_1, f_p_ai_4,
          f_b_i_1, f_b_i_4, f_b_ai_1, f_b_ai_4,
          f_t_i_1, f_t_i_4, f_t_ai_1, f_t_ai_4,
          f_d_i_1, f_d_i_4, f_d_ai_1, f_d_ai_4,
          f_k_ai_1, f_k_ai_4,
          f_g_ai_1, f_g_ai_4,
          f_m_i_1, f_m_i_4, f_m_ai_4,
          f_n_i_4, f_n_ai_4)
}

f0 <- matrix(f0, ncol = 10, byrow = TRUE)

tone <- rep(rep(c("Tone1", "Tone4", "Tone1", "Tone4",
                  "Tone1", "Tone4", "Tone1", "Tone4",
                  "Tone1", "Tone4", "Tone1", "Tone4",
                  "Tone1", "Tone4", "Tone1", "Tone4",
                  "Tone1", "Tone4",
                  "Tone1", "Tone4",
                  "Tone1", "Tone4", "Tone4",
                  "Tone4", "Tone4"), each = 3), n_indiv)

cons <- rep(rep(c("p", "p", "p", "p",
                  "b", "b", "b", "b",
                  "t", "t", "t", "t",
                  "d", "d", "d", "d",
                  "k", "k",
                  "g", "g",
                  "m", "m", "m",
                  "n", "n"), each = 3), n_indiv)

voicing <- rep(rep(c("voiceless", "voiceless", "voiceless", "voiceless",
                     "voiced", "voiced", "voiced", "voiced",
                     "voiceless", "voiceless", "voiceless", "voiceless",
                     "voiced", "voiced", "voiced", "voiced",
                     "voiceless", "voiceless",
                     "voiced", "voiced",
                     "nasal", "nasal", "nasal",
                     "nasal", "nasal"), each = 3), n_indiv)

height <- rep(rep(c("high", "high", "low", "low",
                    "high", "high", "low", "low",
                    "high", "high", "low", "low",
                    "high", "high", "low", "low",
                    "low", "low",
                    "low", "low",
                    "high", "high", "low",
                    "high", "low"), each = 3), n_indiv)

place <- rep(rep(c("bilabial", "bilabial", "bilabial", "bilabial",
                   "bilabial", "bilabial", "bilabial", "bilabial",
                   "alveolar", "alveolar", "alveolar", "alveolar",
                   "alveolar", "alveolar", "alveolar", "alveolar",
                   "velar", "velar",
                   "velar", "velar",
                   "bilabial", "bilabial", "bilabial",
                   "alveolar", "alveolar"), each = 3), n_indiv)

gender <- rep(participants, each = 75)
person <- rep(1:n_indiv, each = 75)

df <- cbind(person, gender, cons, tone, voicing, height, place, as.data.frame(f0))
colnames(df) <- c("PERSON", "GENDER", "CONS", "TONE", "VOICING", "HEIGHT", "PLACE", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10")

df %>%
  pivot_longer(cols = !c("PERSON", "GENDER", "CONS", "TONE", "VOICING", "HEIGHT", "PLACE"), names_to = "TIME", values_to = "F0") %>%
  mutate(TIME = as.integer(TIME),
         TRIAL = rep(1:(75*n_indiv), each = 10)) %>%
  ggplot(aes(x = TIME, y = F0, color = CONS, group = TRIAL)) +
  geom_line() +
  facet_wrap(~PERSON)

d <- df %>%
  pivot_longer(!c("PERSON", "GENDER", "CONS", "TONE", "VOICING", "HEIGHT", "PLACE"), names_to = "TIME", values_to = "VALUE") %>%
  mutate(PERSON = as.factor(PERSON),
         GENDER = as.factor(GENDER),
         CONS = as.factor(CONS),
         TONE = as.factor(TONE),
         VOICING = as.factor(VOICING),
         HEIGHT = as.factor(HEIGHT),
         PLACE = as.factor(PLACE),
         TIME = as.numeric(TIME))

d <- as.data.frame(d)

Again, appreciated!

speed ups in code

I believe it will be faster to do (will need rstan 2.26+ or recent cmdstanr)

lgpr/inst/stan/chunks/functions-posterior.stan

Line 69 in abdcbfd

matrix[n, n] Ky = STAN_matrix_array_sum(KX) + diag_matrix(sigma2_vec) +

// current code
  matrix[n, n] Ky = STAN_matrix_array_sum(KX) + diag_matrix(sigma2_vec) +
    diag_matrix(num_comps*delta_vec)

// proposed
  matrix[n, n] Ky = add_diag(STAN_matrix_array_sum(KX), sigma2_vec + num_comps*delta_vec);

Line 13 in abdcbfd

matrix[num_obs, num_obs] Delta = diag_matrix(delta_vec);

// current
  matrix[num_obs, num_obs] Delta = diag_matrix(delta_vec);
  matrix[num_obs, num_obs] KX[num_comps] = STAN_kernel_all(num_obs, num_obs,
      K_const, components, x_cont, x_cont, x_cont_unnorm, x_cont_unnorm,
      alpha, ell, wrp, beta, teff, 
      vm_params, idx_expand, idx_expand, teff_zero);
      
  for(j in 1:num_comps){
    f_latent[1, j] = cholesky_decompose(KX[j] + Delta) * eta[1, j];
  }

// proposed
  // matrix[num_obs, num_obs] Delta = diag_matrix(delta_vec);
  matrix[num_obs, num_obs] KX[num_comps] = STAN_kernel_all(num_obs, num_obs,
      K_const, components, x_cont, x_cont, x_cont_unnorm, x_cont_unnorm,
      alpha, ell, wrp, beta, teff, 
      vm_params, idx_expand, idx_expand, teff_zero);
      
  for(j in 1:num_comps){
    f_latent[1, j] = cholesky_decompose(add_diag(KX[j], delta_vec)) * eta[1, j];
  }

Better error message when x_cont | x_cat is wrong way around in formula

Currently, if the categorical and continuous covariate are wrong way around the vertical pipe, it gives

cannot have more than one gp(), gp_ns() or gp_vm() expression in one term! found = 2

because it translates to gp(x_cat)*gp(x_cont) but a better error message would be good.

Models using a standard GP with a continuous covariate fails

Environment

lgpr v1.0.12
rstantools 2.0.0
rstan 2.21.1
R 4.0.2

Describe the bug
Models using a standard GP with a continuous covariate fails

To Reproduce
Steps to reproduce the behavior:

fit <- lgp(y ~ gp(age) + zs(id)*gp(age) + gp(blood) + zs(sex)*gp(age) + gp_vm(dis_age),
             testdata_001,
             likelihood = "gaussian",
             refresh = 10,
             iter = 1000, 
             chains = 4,
             cores = 4,
             control = list(adapt_delta = 0.99),
             save_warmup = FALSE,
             verbose = T)


t <- seq(0, 36, by = 1)
x_pred <- new_x(testdata_001, t)
p <- pred(fit, testdata_001, reduce = mean)
plot_pred(fit, x = x_pred, pred = p, t_name="age", group_by="id")

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 148, 24

Expected behavior
plot_pred and plot_components should work here

Additional context
I can't get this to work with plot_f.create.df_line() either. Not sure if this is a user error or a bug - appreciate any help. thanks!

Cannot compile under Windows

I am using R 3.6.1 and Rtools 3.5 (latest version, reinstalled it today) but cannot compile, seems to be some kind of incompatibility with the compiler flags.

Messages from g++:

C:/Rtools/mingw_64/bin/g++ -O2 -Wno-unused-variable -Wno-unused-function  -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG -I"../inst/include" -I"C:/Users/sner0004/rpackages/StanHeaders/include/src" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -I"C:/Users/sner0004/rpackages/BH/include" -I"C:/Users/sner0004/rpackages/Rcpp/include" -I"C:/Users/sner0004/rpackages/RcppEigen/include" -I"C:/Users/sner0004/rpackages/rstan/include" -I"C:/Users/sner0004/rpackages/StanHeaders/include"   -march=core2     -O3 -Wno-unused-variable -Wno-unused-function -c stanExports_lgp.cc -o stanExports_lgp.o


In file included from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/type_traits:35:0,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/stan/math/prim/scal/meta/is_constant.hpp:4,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/stan/math/prim/arr/meta/is_constant_struct.hpp:4,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/stan/math/prim/mat/meta/is_constant_struct.hpp:4,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/stan/math/prim/mat.hpp:15,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/stan/math/rev/mat.hpp:12,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/src/stan/model/log_prob_grad.hpp:4,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/src/stan/model/test_gradients.hpp:7,
                 from C:/Users/sner0004/rpackages/StanHeaders/include/src/stan/services/diagnose/diagnose.hpp:10,
                 from C:/Users/sner0004/rpackages/rstan/include/rstan/stan_fit.hpp:35,
                 from C:/Users/sner0004/rpackages/rstan/include/rstan/rstaninc.hpp:3,
                 from stanExports_lgp.h:6,
                 from stanExports_lgp.cc:5:
C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support for the \
  ^

Avoiding large matrices when computing predictive variance

Currently P x P matrices are computed for the Gaussian posterior and predictive distributions, where P is the number of prediction points. This should be avoided if only their diagonal is needed. This will require writing a function that computes only the diagonal of each kernel matrix.

Tutorial about real data analysis

It would be useful to have a tutorial where for example proteomics data is analysed. Alternatively or additionally some classic longitudinal dataset like Orthodont or sleepstudy.

Supporting spherical coordinates?

I'm currently using mgcv::gam with spherical data where I can formulate a model as y ~ s(latitude,longitude,bs='sos',dim=2). Since, as far as I understand, the main thing that differentiates this from an isotropic smooth in cartesian coordinates is the use of great-circle distance rather than euclidean, it strikes me as something that should be fairly straightforward to implement?

dependent variable hardcoded for plotting model predictions

Environment

lgpr v1.0.4
rstantools 2.0.0
rstan 2.21.1
R 4.0.2

Describe the bug
If the dependent variable in your model is named something other than y then the plot_pred() function fails

To Reproduce

testdata_002_new <- testdata_002 %>% 
  dplyr::rename(time=age, depvar=y)

fit <- lgp(depvar ~ gp(time) + zs(id)*gp(time) + zs(sex)*gp(time) + gp_vm(diseaseAge) + zs(group),
           data     = testdata_002_new,
           prior    = list(wrp = log_normal(-0.7, 0.3)), 
           iter     = 200,
           chains   = 2,
           cores = 2,
           refresh  = 50)

t <- seq(1, 120, by = 1)
x_pred <- new_x(testdata_002_new, t, x = "time", x_ns = "diseaseAge")
p <- pred(fit, x_pred, reduce = mean)

This works
plot_f(fit, x = x_pred, pred = p, t_name = "time")

This fails

plot_pred(fit, x = x_pred, pred = p, t_name = "time")

Error in FUN(X[[i]], ...) : object 'y' not found

Expected behavior
That the dependent variable can be named something other than y

Thanks!
-shane

Some generated quantities can have zero variance and r_hat is therefore NaN

Masked effects are always zero for the masked group which is why this happens. Need to get rid of possible warnings or reporting largest r_hat as NaN due to this.

Need to add guidance on how to set prior for input warping steepness

Need to add guidance on how to set prior for input warping steepness in documentation or tutorials

Build failure on macOS in Macports (clang/gcc): error: cannot initialize a member subobject of type 'std::ostream ' with an lvalue of type 'SEXP' (aka 'SEXPREC ')

In file included from RcppExports.cpp:4:
  In file included from /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/RcppEigen.h:25:
  In file included from /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/RcppEigenForward.h:26:
  In file included from /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/RcppCommon.h:169:
  In file included from /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/Rcpp/as.h:25:
  /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/Rcpp/internal/Exporter.h:31:28: error: cannot initialize a member subobject of type 'std::ostream *' with an lvalue of type 'SEXP' (aka 'SEXPREC *')
                      Exporter( SEXP x ) : t(x){}
                                           ^ ~
  /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/Rcpp/as.h:87:41: note: in instantiation of member function 'Rcpp::traits::Exporter<std::ostream *>::Exporter' requested here
              ::Rcpp::traits::Exporter<T> exporter(x);
                                          ^
  /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/Rcpp/as.h:152:26: note: in instantiation of function template specialization 'Rcpp::internal::as<std::ostream *>' requested here
          return internal::as<T>(x, typename traits::r_type_traits<T>::r_category());
                           ^
  /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/Rcpp/InputParameter.h:34:38: note: in instantiation of function template specialization 'Rcpp::as<std::ostream *>' requested here
          inline operator T() { return as<T>(x) ; }
                                       ^
  RcppExports.cpp:23:54: note: in instantiation of member function 'Rcpp::InputParameter<std::ostream *>::operator std::ostream *' requested here
      rcpp_result_gen = Rcpp::wrap(STAN_var_mask(x, a, pstream__));
                                                       ^
  18 warnings and 1 error generated.

gcc-12 builds it fine, but clang-15 fails.
Complete log can be found at: https://github.com/macports/macports-ports/actions/runs/4634085782/jobs/8199927979?pr=18211

[macOS] error: 'stan' was not declared in this scope; did you mean 'tan'?

With the current versions of rstan and Rcpp it fails worse that before:

RcppExports.cpp:15:15: error: 'stan' was not declared in this scope; did you mean 'tan'?
   15 | Eigen::Matrix<stan::promote_args_t<stan::value_type_t<double>, double>, -1, 1> STAN_var_mask(const Eigen::Matrix<double, -1, 1>& x, const double& a, std::ostream* pstream__);
      |               ^~~~
      |               tan
RcppExports.cpp:15:78: error: wrong number of template arguments (1, should be at least 3)
   15 | Eigen::Matrix<stan::promote_args_t<stan::value_type_t<double>, double>, -1, 1> STAN_var_mask(const Eigen::Matrix<double, -1, 1>& x, const double& a, std::ostream* pstream__);
      |                                                                              ^

Then:

/opt/local/Library/Frameworks/R.framework/Versions/4.3/Resources/library/RcppEigen/include/Eigen/src/Core/util/ForwardDeclarations.h:70:9: note: provided for 'template<class _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols> class Eigen::Matrix'
   70 | > class Matrix;
      |         ^~~~~~
RcppExports.cpp:278:881: error: template argument 1 is invalid
  278 | std::vector<Eigen::Matrix<stan::promote_args_t<double, double, double, double, double>, -1, -1>> STAN_kernel_all(const int& n1, const int& n2, const std::vector<Eigen::Matrix<double, -1, -1>>& K_const, const std::vector<std::vector<int>>& components, const std::vector<Eigen::Matrix<double, -1, 1>>& x1, const std::vector<Eigen::Matrix<double, -1, 1>>& x2, const std::vector<Eigen::Matrix<double, -1, 1>>& x1_unnorm, const std::vector<Eigen::Matrix<double, -1, 1>>& x2_unnorm, const std::vector<double>& alpha, const std::vector<double>& ell, const std::vector<double>& wrp, const std::vector<Eigen::Matrix<double, -1, 1>>& beta, const std::vector<Eigen::Matrix<double, -1, 1>>& teff, const std::vector<double>& vm_params, const std::vector<int>& idx1_expand, const std::vector<int>& idx2_expand, const std::vector<Eigen::Matrix<double, -1, 1>>& teff_zero, std::ostream* pstream__);
      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ^
RcppExports.cpp:278:881: error: template argument 2 is invalid
RcppExports.cpp: In function 'SEXPREC* _lgpr_STAN_kernel_all(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP)':
RcppExports.cpp:301:34: error: 'STAN_kernel_all' was not declared in this scope; did you mean 'STAN_kernel_eq'?
  301 |     rcpp_result_gen = Rcpp::wrap(STAN_kernel_all(n1, n2, K_const, components, x1, x2, x1_unnorm, x2_unnorm, alpha, ell, wrp, beta, teff, vm_params, idx1_expand, idx2_expand, teff_zero, pstream__));
      |                                  ^~~~~~~~~~~~~~~
      |                                  STAN_kernel_eq

The earlier error is also there: #27

Need more colors for plot_components

The color scheme in plot_components has only 5 different colors so setting color_by doesn't give expected results when
there is a component involving six or more categories. A larger set of colors or some other way to visually separate the categories is needed when there are 6 or more categories.

Tutorial about prior predictive checking

Tutorial demonstrating prior predictive checking would be useful.

interactions for zs() or categ() variables

Hei,

Thanks for the nice software. I'm using lgpr v1.0.4 with rstantools 2.0.0, rstan 2.21.1, and R 4.0.2 I guess this is a feature request.

Is your feature request related to a problem? Please describe.
I have a longitudinal experiment with two treatment factors and I expect there to be an interaction between them. This is what I get when I create the model.

Invalid term with expressions: zs(predation)*zs(pseudomonas_hist). Note that each term can contain at most one categ() or zs() expression.

What is the reason why we can't have an interaction between two zs() or categ() kernels?

Describe alternatives you've considered
My workaround is to just manually encode the interaction as a dummy variable and then include it in the model with a zs(dummy_interaction)*gp(time) interaction. Is that the correct approach?

Describe the solution you'd like
Native support for categorical interactions if that is possible. Otherwise include in the support or tutorials a short section that explains why you can't have zs() interactions and what the correct approach should be if you have a categorical interaction in your experiment.

thanks for your help!
-shane

Easier/faster prior predictive checks

The current way to do a prior predictive check is to first run lgp() using the options sample_f=TRUE and prior_only=TRUE after which one can use ppc(). This is not any faster than posterior sampling, and it would probably be much faster to just draw from
the hyperparameter priors separately first, and then for each draw use lgp() with sample_f=TRUE, prior_only=TRUE and algorithm="Fixed_param".

data.table and tibble objects not accepted as data argument

Describe the bug
Pass anything other than a native R data frame to the data argument of create_model() and an error is (eventually) thrown,

Error in parse_response(data, likelihood, model_formula) : 
  <data> must be a data.frame! found = data.tabledata.frame

To Reproduce
Steps to reproduce the behavior:

library(lgpr)
dat <- data.table::data.table(x = 1:3, y = rnorm(3))
mod <- create_model(y ~ x, dat)

tbl <- tibble::as_tibble(dat)
mod <- create_model(y ~ x, tbl)

This actually throws a different, hard-to-interpret error, I suspect due to the way data.frames and data.tables do d[i,j] subset syntax differently:

Error in check_allowed(class(data[, j])[1], allowed) : 
  Error in data_types(data): The given value 'integer' for argument <class(data[, j])[1]> is invalid.
Allowed values are {factor, numeric}.

To get the first error (above), wrap something on the right-hand side in gp():

modgp <- create_model(y ~ gp(x), dat)

Expected behavior
Tibbles should work just like data frames, but the function lgrp:::parse_response essentially does a check of the form

if (class(data) != 'data.frame')
  stop('Error message')

when it should maybe be doing something like !inherits(data, 'data.frame').

On the other hand, data.tables work differently (and appear to cause problems with subsetting syntax in lgpr) so maybe it should just check the class of data earlier on and throw a more informative error message, or else call as.data.frame with a warning.

Additional context
Maybe the package shouldn't accept anything other than vanilla data.frames, but if so the checks/errors should be clearer.

For now the workaround for end-users is to explicitly coerce to a native data frame (i.e. as.data.frame(data)), before passing data to any lgpr functions.

Better documentation for get_pred.sampled

There are typos an not totally exact explanation in the documentation of get_pred.sampled.

Possible bug in get_teff_obs()

Describe the bug
get_teff_obs() accepts the argument group_by, but the call to pick_one_row_each() uses the default value "id". (While get_teff_obs() is not exported, it is called by functions that are, e.g., new_x().)

To Reproduce
Steps to reproduce the behavior:

library(lgpr)
#> Attached lgpr 1.0.12, using rstan 2.21.2.
dat <- data.frame(
  id = factor(rep(letters[1:2], each = 2)),
  age = c(16, 17, 36, 37),
  diseaseAge = c(1, 2, 4, 5)
)
lgpr:::get_teff_obs(dat)
#>  a  b 
#> 15 32
dat$subj_id <- dat$id
dat$id <- NULL
lgpr:::get_teff_obs(dat, group_by = "subj_id")
#> Error in dollar(data, fac): Variable with name 'id' not found in <data>! Found: {age, diseaseAge, subj_id}

Expected behavior
In the example above, I would expect get_teff_obs() to have the same return value in both cases, i.e., when group_by is not named "id".

cc1plus.exe: out of memory in installation from source

I get

cc1plus.exe: out of memory allocating 65536 bytes
make: *** [C:/PROGRA~1/R/R-40~1.2/etc/i386/Makeconf:227: stanExports_lgp_latent.o] Error 1

when trying to install on Windows

Numerical stability of negative binomial random numbers

When using do_yrng = TRUE with the negative binomial model,
and error can occur due to numerical problems (see here). This problem should addressed in a future release to allow more stable prior and posterior predictive sampling.

A common error message is

23] "  Exception: neg_binomial_2_log_rng: Random number that came from gamma distribution is 8.79585e+09, but must be less than 1.07374e+09  (in 'chunks/generated.stan' at line 34; included from 'model_lgp' at line 52)"
error occurred during calling the sampler; sampling not done

jtimonen / lgpr Goto Github PK

lgpr's People

Contributors

Stargazers

Watchers

Forkers

lgpr's Issues

Recommend Projects

Recommend Topics

Recommend Org