
ciTools: An R Package for Quick Uncertainty Intervals

As seen at RStudio::conf2018!

Overview

ciTools is an R package that makes working with model uncertainty as easy as possible. It gives the user easy access to confidence or prediction intervals for the fitted values of (log-) linear models, generalized linear models, and (log-) linear mixed models. Additionally, we provide functions to determine probabilities and quantiles of the conditional response distribution given each of these models.

Status

I no longer have time to actively maintain ciTools. There are some known bugs and limitations.

Why use ciTools?

Uncertainty intervals (confidence or prediction intervals) for fitted values are easy enough to obtain if one is working with linear models, but when models become more complex, e.g. generalized linear models or linear mixed models, the facilities for quantifying uncertainty in predictions are either more cumbersome to work with or nonexistent. We hope that ciTools provides uniform, model-invariant access to these statistics.

Main Components

This package has three main features that make it useful to R users:

  • Data frame first design - easily pipe commands together (great for ggplot users)
  • Uniform syntax - all commands have the same simple syntax and sane defaults
  • Generic design - one command for each statistical quantity

All commands in ciTools:

  • add_ci(data, fit, ...) - get confidence intervals and append to data
  • add_pi(data, fit, ...) - get prediction intervals and append to data
  • add_probs(data, fit, q, ...) - get conditional response probabilities, Pr(Response < q | Covariates)
  • add_quantile(data, fit, p, ...) - get conditional response quantiles at level p
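For example, applied to an ordinary linear model (a minimal sketch using the built-in mtcars data; each call simply appends new columns to the input data frame):

library(ciTools)

fit <- lm(mpg ~ wt + hp, data = mtcars)

add_ci(mtcars, fit, alpha = 0.1)        ## 90% confidence intervals for the fitted values
add_pi(mtcars, fit, alpha = 0.1)        ## 90% prediction intervals
add_probs(mtcars, fit, q = 20)          ## Pr(mpg < 20), given the model and covariates
add_quantile(mtcars, fit, p = 0.9)      ## 0.9-quantile of the conditional response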

Installation

Install from CRAN:

install.packages("ciTools")
library(ciTools)

Install ciTools from GitHub with devtools:

library(devtools)
devtools::install_github("jthaman/ciTools")
library(ciTools)

If you are behind a corporate firewall, you may have to instead run:

library(httr)
library(devtools)
httr::set_config(config(ssl_verifypeer = FALSE))
devtools::install_github("jthaman/ciTools")
httr::set_config(config(ssl_verifypeer = TRUE))
library(ciTools)

Usage

Here is a short example featuring a linear mixed model, but the idea is the same for any other model.

## Example with a linear mixed model

library(ciTools)
library(lme4)
library(dplyr)

dat <- sleepstudy ## data included with lme4
## Fit a random intercept model
fit <- lmer(Reaction ~ Days + (1 | Subject), data = dat)

## New data for predictions
## A random sample of 20 observations from sleepstudy
new_dat <- sleepstudy[sample(NROW(dat), 20),]

## Append (conditional) 50% CIs and PIs to the new data:
new_dat <- add_ci(new_dat, fit, alpha = 0.5) %>%
    add_pi(fit, alpha = 0.5)

## Append 50% CIs and PIs, but ignore the random effects:
add_ci(new_dat, fit, alpha = 0.5, includeRanef = FALSE,
       names = c("lcb_uncond", "ucb_uncond"), ## custom interval names
       yhatName = "pred_uncond") %>%
    add_pi(fit, alpha = 0.5, includeRanef = FALSE,
           names = c("lpb_uncond", "upb_uncond"))

## Determine the probability that a new Reaction time will be less
## than 300 (given the model and new data):
add_probs(new_dat, fit, 300)

## Or greater than:
add_probs(new_dat, fit, 300, comparison = ">")

## Determine the 0.9-quantile of the predictive distribution, given the
## model and new data:
add_quantile(new_dat, fit, 0.9)

The end result is a data frame that is amenable to plotting with ggplot2. See the documentation for more examples with other types of statistical regression models.
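Continuing the example above, the result pipes directly into ggplot2 (a sketch; interval column names are set explicitly here so the plot code does not depend on ciTools' default names):

library(ggplot2)

plot_dat <- add_ci(sleepstudy, fit, alpha = 0.1, names = c("lcb", "ucb")) %>%
    add_pi(fit, alpha = 0.1, names = c("lpb", "upb"))

ggplot(plot_dat, aes(x = Days, y = Reaction)) +
    geom_point() +
    geom_line(aes(y = pred)) +                                ## fitted values
    geom_ribbon(aes(ymin = lpb, ymax = upb), alpha = 0.2) +   ## prediction band
    geom_ribbon(aes(ymin = lcb, ymax = ucb), alpha = 0.4)     ## confidence band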

Scope of ciTools

Model                   Confidence Intervals   Prediction Intervals   Response Probabilities   Response Quantiles
Linear                  [X]                    [X]                    [X]                      [X]
Log-Linear              [X]                    [X]                    [X]                      [X]
GLM                     [X]                    [X]                    [X]                      [X]
Neg. Binomial           [X]                    [X]                    [X]                      [X]
Linear Mixed            [X]                    [X]                    [X]                      [X]
Log-Linear Mixed        [TODO]                 [X]                    [X]                      [X]
Survival (AFT model)    [X]                    [X]                    [X]                      [X]
GLMM                    [X]                    [X]                    [X]                      [X]

[X] = implemented, [TODO] = not yet implemented

Help us out?

We still have work to do. Submit an Issue if you run into a bug, or a PR if you think you can help us out (see the TODO file).

Authors

John Haman and Matt Avery

Copyright

ciTools (C) 2017 Institute for Defense Analyses

Contributors

billdenney, jthaman, matthewravery, znmeb


ciTools Issues

Implement a parametric method for GLM PIs

Currently we simulate from the model using a parametric bootstrap. We showed in the GLM vignette that this is valid if the model is "true", and the simulation is still relatively fast, but a parametric method would be more pleasing to me.

I'm not sure if there is any literature available on this.
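For reference, the simulation approach looks roughly like the following sketch for a Poisson GLM (an illustration of the idea, not ciTools' internal code; note this simplification ignores coefficient uncertainty, which the actual bootstrap also accounts for):

## Sketch: simulation-based prediction intervals for a Poisson GLM
sim_pi_pois <- function(fit, new_dat, alpha = 0.05, nSims = 2000) {
    lambda_hat <- predict(fit, newdata = new_dat, type = "response")
    ## simulate new responses from the fitted conditional distribution
    sims <- replicate(nSims, rpois(length(lambda_hat), lambda = lambda_hat))
    data.frame(new_dat,
               pred = lambda_hat,
               lpb = apply(sims, 1, quantile, probs = alpha / 2),
               upb = apply(sims, 1, quantile, probs = 1 - alpha / 2))
}

fit <- glm(count ~ spray, data = InsectSprays, family = poisson)
sim_pi_pois(fit, InsectSprays, alpha = 0.1)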

new regression related to bind_rows

Run the example code in add_ci.lmerMod.R:

dat <- lme4::sleepstudy
# Fit a linear mixed model (random intercept model)
fit <- lme4::lmer(Reaction ~ Days + (1|Subject), data = lme4::sleepstudy)
# Get the fitted values for each observation in dat, and
# append CIs for those fitted values to dat
add_ci(dat, fit, alpha = 0.5)

I'm getting a warning message related to bind_rows:

add_ci(dat, fit, alpha = 0.5)
Warning in bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
# A tibble: 180 x 6
   Reaction  Days Subject     pred  LCB0.25  UCB0.75
      <dbl> <dbl>  <fctr>    <dbl>    <dbl>    <dbl>
 1 249.5600     0     308 292.1888 283.0011 301.3766
 2 258.7047     1     308 302.6561 293.5979 311.7143
 3 250.8006     2     308 313.1234 304.1636 322.0832
 4 321.4398     3     308 323.5907 314.6971 332.4843
 5 356.8519     4     308 334.0580 325.1976 342.9183
 6 414.6901     5     308 344.5252 335.6649 353.3856
 7 382.2038     6     308 354.9925 346.0989 363.8861
 8 290.1486     7     308 365.4598 356.5000 374.4196
 9 430.5853     8     308 375.9271 366.8689 384.9853
10 466.3535     9     308 386.3944 377.2066 395.5821
# ... with 170 more rows

This has to do with some code in the helper functions file. If it's harmless we can just wrap the code in suppressWarnings(), but I'm not convinced it's entirely harmless.

need response variable in new data for add_pi

Very useful package. I have a minor suggestion: in add_pi (and possibly other functions - I haven't checked), an error is thrown if tb does not include a column for the response variable. The actual values in the column are ignored, but the column has to be present. I'd guess this is due to some internal code similar to:

X <- formula(fit) %>% model.matrix(data = tb)

used to get the design matrix for simulation-based prediction intervals. Using a few more steps should eliminate the need to include the response. Something like:

chr_formula <- formula(fit) %>% deparse() %>% strsplit(' ') %>% getElement(1)
X <- as.formula(paste(chr_formula[-1], collapse = " ")) %>% model.matrix(data = tb)

I noticed this specifically with a Poisson GLM. add_ci did not require the response to be in tb.
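An alternative sketch that avoids string manipulation entirely is to drop the response from the terms object (not tested against ciTools internals; factor levels from the original fit would still need separate handling):

## build the design matrix from the right-hand side of the model formula only
X <- model.matrix(delete.response(terms(fit)), data = tb)

## or, as a user-side workaround, add a placeholder response column to the new data
## (assumes a single-variable response such as y ~ ...)
tb[[all.vars(formula(fit))[1]]] <- 0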

Add nls

I've done some work in the past with CIs and PIs for nonlinear regression using the propagate package. Would you be interested in receiving a PR to add nls model support to this package?

Reuse simulation data

For certain methods like add_pi.glm and add_quantile.glm, a small parametric bootstrap is performed to generate the desired statistic. Presently if these two methods are used, a separate simulation is run for each method. If the number of replicates (nSims) and new data (tb) are not changed, it would be nice to store and reuse the simulation data.

The benefits of storing simulation data are twofold:

  1. We can speed up our computations if many methods are run in sequence
  2. add_pi, add_quantile, and add_probs will be consistent within an analysis --- each statistic calculated on the same simulation data.

Though we could just handle the second point by setting a seed.
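In the meantime, the consistency concern can be worked around by re-seeding before each call (a sketch with hypothetical dat and fit objects, assuming both methods consume the RNG in the same way when given the same nSims and new data):

set.seed(2017)
dat <- add_pi(dat, fit, alpha = 0.1, nSims = 2000)
set.seed(2017)
dat <- add_quantile(dat, fit, p = 0.9, nSims = 2000)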

add_ci.lmer chokes on "big data"

I'm finding that we cannot use add_ci.lmer for "big data". I tried an example from the mermod vignette with 200,000 observations and found that R couldn't put the new data frame into memory. Here's the example I tried:

## linear example

library(ciTools)
library(lme4)
library(dplyr)

x_gen_mermod <- function(ng = 8, nw = 5){
  n <- ng * nw
  x2 <- runif(n)
  group <- rep(as.character(1:ng), each = nw)
  return(tibble::tibble(x2 = x2,
                        group = group))
}

mm_pipe <- function(tb, ...){
  model.matrix(data = tb, ...)
}

get_validation_set <- function(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts){
  vm <- sample_n(tb, 5, replace = F)[rep(1:5, each = 100), ]
  vf <- bind_rows(vm, tb) %>%
    select(-group) %>%
    mm_pipe(~.*.)
  vf <- vf[1:500, ]
  vGroups <- if(!includeRanef) rnorm(500, 0, sigmaG) else groupIntercepts[as.numeric(vm$group)]
  vm[["y"]] <- vf %*% beta + vGroups + rnorm(500, mean = 0, sd = sigma)
  vm
}

y_gen_mermod <- function(tb, sigma = 1, sigmaG = 1, delta = 1, includeRanef = FALSE, validationPoints = FALSE){
  groupIntercepts <- rnorm(length(unique(tb$group)), 0, sigmaG)
  tf <- tb %>%
    dplyr::select(-group) %>%
    mm_pipe(~.*.)
  beta <- rep(delta, ncol(tf))
  if(validationPoints)  {
    vm <- get_validation_set(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts)
  }
  tb[["y"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] + rnorm(nrow(tb), mean = 0, sd = sigma)
  tb[["truth"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] * (includeRanef)
  if(validationPoints) return(list(tb = tb, vm = vm)) else return(tb)
}


tb <- x_gen_mermod(10, 20000) %>%
    y_gen_mermod()

fit2 <- lmer(y ~ x2 + (1|group) , data = tb)

tb %>% add_ci(fit2, type = "parametric", includeRanef = TRUE, names = c("LCB", "UCB"))

lmer works just fine on an example data set this large, but ciTools chokes and spits out

Error: cannot allocate vector of size 298.0 Gb

We need to re-examine how we are storing things in memory and see if we can do something more efficient. I'm not sure if this bug affects the other methods as well.

support betareg

Beta regression is a good alternative to binomial regression when the number of trials is unknown. I would like to support this model!

Rank-deficient models in lme4 produce errors

Warning message produced by lme4::lmer

fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients.

Result with add_ci/add_pi

Error in X %*% vcovBetaHat : non-conformable arguments

Likely what is happening is that lmer is taking your specified model, noticing that the covariance matrix won't properly invert, and is thus dropping some columns so that it behaves better. When ciTools tries to do the same thing, it attempts to use the original, un-reduced matrix and thus produces the above error.
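The symptom is easy to reproduce with a redundant fixed effect (a sketch; lme4 drops the collinear column, so the coefficient covariance matrix ends up smaller than the model matrix ciTools rebuilds):

library(lme4)
library(ciTools)

dat <- lme4::sleepstudy
dat$Days2 <- dat$Days           ## perfectly collinear copy of Days

## lmer warns that the fixed-effect model matrix is rank deficient and drops a column
fit <- lmer(Reaction ~ Days + Days2 + (1 | Subject), data = dat)

add_ci(dat, fit, alpha = 0.1)   ## Error in X %*% vcovBetaHat : non-conformable arguments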

Error with add_ci() fed with a glmer(...,family=Gamma) model

Dear John,

I am running GLMM models with family=Gamma and I want to use add_ci() to compute confidence intervals around predictions (many thanks for developing this by the way).
Unfortunately, add_ci() often encounters the following error:
Error in quantile.default(newX[, i], ...) : missing values and NaN's not allowed if 'na.rm' is FALSE

often but not always; I can re-run add_ci() and it might fail or not.

And it seems to be associated with this type of warning message:
In lme4::bootMer(fit, my_pred, nsim = nSims, type = "parametric", : some bootstrap runs failed (1/100)

Here is the code I’m running:
modfinal <- glmer(cumulative_duration ~ winter + month + s_cumul_dura_small + big_treat +
                      (1 | id_first_feederpair),
                  data = data_FRICOE, family = Gamma(link = "log"))

with cumulative_duration being the time (in seconds) spent by the common chaffinch (Fringilla coelebs) on a feeder.

newdatpred <- expand.grid(winter = as.factor("4"),
                          month = as.factor("Jan"),
                          s_cumul_dura_small = 0,
                          big_treat = as.factor(c("NONE", "PICPIC")),
                          id_first_feederpair = as.factor("0060564"))

add_ci(newdatpred, modfinal, alpha = 0.01, response = TRUE, includeRanef = FALSE,
       type = "boot", nSims = 100, names = c("lowBCI", "uppBCI"))
(Note: nSims set to 100 for speed reason, but planning on using 1000 for final results)

I hope I'm giving enough useful details on this. I'm happy to share more as needed.
I attach the data table data_FRICOE.txt in case it helps reproduce this error (several runs of add_ci() might be required, given that it does not always fail).

Equations in vignettes aren't rendering properly in RStudio

When viewed through the help file interface in RStudio, the equations don't render properly. They seem fine when accessed from the CRAN or other external sites, so I'm not sure what the problem is. Can you double-check this and let me know if it's only a problem on my end?

add citations

This package is quite an accomplishment. I would use it frequently in analyses for publication, but I would need to be able to cite the sources for the interval methods you used.

`add_ci.lmerMod` error when lmer given rank-deficient model

When lmer is given a rank-deficient model matrix, it may go ahead and fit the model anyway, dropping columns as necessary. It produces a message when it does this, visible when printing the model object (fit or summary(fit)). You can also see the messages via summary(fit)$fitMsgs.

When this takes place, add_ci.lmerMod produces an error:

Error in X %*% vcovBetaHat : non-conformable arguments

I have a fix for this in the latest push of my master fork if you want to grab it. Alternatively, over-write get_x_matrix_mermod from helper_functions.R with the following:

get_x_matrix_mermod <- function(tb, fit){

    ##This function is necessary to avoid cases where the new data upon
    ##which CIs are generated does not contain all levels for one or more
    ##factors from the original data set. New and old data sets are
    ##appended, the model matrix is generated, and the function returns
    ##only the rows corresponding to the new data.
    mm <- fit@frame
    rv <- names(mm)[1]
    mm[[rv]] <- as.numeric(mm[[rv]])
    
    for(i in names(mm)){
        if(is.factor(mm[[i]])) mm[[i]] <- as.character(mm[[i]])
    }

    X <- suppressWarnings(model.matrix(reformulate(attributes(terms(fit))$term.labels), 
                                       dplyr::bind_rows(mm, tb))[-(1:nrow(fit@frame)), ])
    
    #remove columns from X that weren't used in the model fit
    X[, colnames(X)[colnames(X) %in% colnames(model.matrix(fit))]]
}

Fail to install ciTools package

Hi,
I seem to have trouble installing the ciTools package. I've tried the three different ways suggested but keep getting the error message shown in the attached screenshot. Any suggestions are appreciated!

[screenshot of the installation error attached]

Many thanks
Joyce

Provide option to give shorter prediction intervals

Currently, when we form prediction intervals via simulation, we take the alpha/2 and 1 - alpha/2 quantiles of the simulations to compute the interval estimate. Another method would be to provide the shortest interval such that (1 - alpha)*M simulations lie inside, where M is the total number of simulations set by nSims.
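A sketch of what that alternative would look like for a single vector of simulated responses (a hypothetical helper, not part of ciTools):

## shortest interval containing a fraction (1 - alpha) of the simulated values
shortest_interval <- function(sims, alpha = 0.05) {
    sims <- sort(sims)
    M <- length(sims)
    k <- ceiling((1 - alpha) * M)                 ## number of simulations the interval must cover
    widths <- sims[k:M] - sims[1:(M - k + 1)]     ## width of each window of k consecutive order statistics
    i <- which.min(widths)                        ## left endpoint of the narrowest window
    c(lower = sims[i], upper = sims[i + k - 1])
}

## versus the current equal-tailed interval:
## quantile(sims, probs = c(alpha / 2, 1 - alpha / 2))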

add_ci.lmer is not robust to formula types

Trying add_ci on a model with a formula involving I(...), i.e.

library(ciTools)
library(lme4)

data <- data.frame(x = c(1, 2, 3, 4), y = c(1, 3, 10, 15), gr = c(1, 1, 2, 2))
mod <- lmer(y ~ I(x^2) + (1|gr), data)
add_ci(data, mod)

results in Error in x^2: non-numeric argument to binary operator. Attempting add_ci(data, mod, type = "parametric") results in Error in `[[<-.data.frame`(`*tmp*`, names[1], value = numeric(0)) : replacement has 0 rows, data has 4. The code completes in other cases, but returns spurious confidence intervals and the warning longer object length is not a multiple of shorter object length. At least in the parametric case, the problem appears to be in get_x_matrix_mermod.

Trying add_ci on a model with no fixed effects, i.e.

data <- data.frame(x = rep(seq(5), 2), y = c(rnorm(5), rnorm(5, mean = 0.5)), gr = c(rep(1, 5), rep(2, 5)))
mod <- lmer(y ~ (1 | gr), data)
data %>% add_ci(mod, type = "parametric")

results in Error in reformulate(attributes(terms(fit))$term.labels) : 'termlabels' must be a character vector of length at least one. Again, the problem appears to be in get_x_matrix_mermod.

Incorrect warning messages

Example:
newDF <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))

If test does not have the columns pred, LCB, and UCB, no warning appears. (Expected outcome: no warning [working as expected])
If I rerun the exact same line and am overwriting newDF, no warning appears (Expected outcome: should get warning)

If I run the following:
test <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))
newDF2 <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))

I get a warning that I am overwriting data in newDF2, even though that df is created in that line and does not yet exist. (Expected outcome: no warning)

It seems that the warning only occurs when the dataframe defined in the function has the columns of the same name even though the warning should occur when the dataframe written to has columns of the same name.

add_ci.lmer assumes the standard error of random effects is constant across groups

When we compute the SEs of random effects, we use

seRandom <- arm::se.ranef(fit)[[1]][,1]
which results in a scalar. This is fine if all groups are the same size. If the groups are not the same size, we should match the correct random effect SE to the group. In add_ci.glmer, we do this:

ranef_name <- names(fit@cnms)[1] ## just one random effect for now
seRandom <- arm::se.ranef(fit)[[1]][,1] ## get all random effect SEs
seRandom_vec <- rep(NA, length(tb[[ranef_name]])) 

seRandom_df <- tibble( ## convert to data frame
    group = names(seRandom),
    seRandom = seRandom
)
names(seRandom_df)[names(seRandom_df) == 'group'] <- ranef_name ## get the group names so SEs can be matched
seRandom_vec <- dplyr::left_join(tb, seRandom_df, by = ranef_name)[["seRandom"]] ## match RE SEs to each obs in tb using ranef_name as the matching criterion


This code is ugly and I'm not proud of it, but I think we need to do something about this because the documentation does not state that add_ci.lmer can only be used if the group sizes are equal.

my_pred_full_glmer_response - possible to not have newdata argument?

Hi. When I try to run add_ci with a glmerMod, I get the following error:

Error in [.data.frame(fr, vars) : undefined columns selected

After tracing and running in debug, it seems the error is being thrown by a call to lme4::bootMer and the issue is in the newdata argument.

Would it be incorrect to change my_pred_fixed_glmer_linear to drop the newdata argument? When I changed the my_pred function in bootstrap_ci_glmermod from

function (fit, lvl) { predict(fit, newdata = ciTools_data$tb_temp, re.form = NULL, family = fit@resp$family$family, type = "response") }

to

my_pred = function (fit, lvl) { predict(fit, re.form = NULL, family = fit@resp$family$family, type = "response") }

it seems to work. Since the data are contained inside the glmerMod object I don't see any need to have the newdata argument.

Implement testing

We lack an automated testing process that we can plug into the build process. We can implement some pretty good testing with the testthat package.
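A starting point might look like the following sketch (assuming the standard testthat layout under tests/testthat/):

library(testthat)
library(ciTools)

test_that("add_ci appends interval columns to the input data", {
    fit <- lm(dist ~ speed, data = cars)
    out <- add_ci(cars, fit, alpha = 0.1, names = c("lcb", "ucb"))

    expect_equal(nrow(out), nrow(cars))
    expect_true(all(c("lcb", "ucb") %in% names(out)))
    expect_true(all(out$lcb <= out$ucb))
})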

Failure due to update with add_ci(type="boot")

Thank you for the wonderful package!

I am a new user, and I'm having an issue with a model that I fit in one environment and need to use ciTools in another environment. My general use case is illustrated by the reprex below:

set.seed(5)
library(ciTools)
#> ciTools version 0.6.1 (C) Institute for Defense Analyses
library(tidyverse)

my_data <-
  data.frame(
    counts=c(18,17,15,20,10,20,25,13,12),
    outcome=rep(1:3, 3),
    treatment=rep(1:3, each=3)
  )

fit_my_data <- function(my_data_in_the_function) {
  link <- "log"
  glm.D93 <- glm(counts ~ outcome + treatment, data=my_data_in_the_function, family=gaussian(link=link))
  glm.D93
}

my_model <- fit_my_data(my_data)

# Works because it is not using `update()`
my_new_data <-
  my_data %>%
  add_ci(fit=my_model)

# Fails because `link` is not set in the current environment (it is only set in
# the function environment)
my_new_data <-
  my_data %>%
  add_ci(fit=my_model, type="boot")
#> Error in gaussian(link = link): object 'link' not found

# Works because now the call doesn't rely on the link from the current
# environment; the link comes from the model family.
my_model_update <- my_model
my_model_update$call$family <- my_model$family
my_new_data <-
  my_data %>%
  add_ci(fit=my_model_update, type="boot")
#> Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
#> prediction from a rank-deficient fit may be misleading

Created on 2021-05-17 by the reprex package (v2.0.0)

Error in predict add_pi.survreg

When I run add_pi on a survreg fit with the boot method, I get the following error:

> ## Define a data set.
> df <- survival::stanford2
> ## remove a covariate with missing values.
> df <- df[, 1:4]
> ## next, create the Surv object inside the survreg call:
> fit <- survival::survreg(survival::Surv(time, status) ~ age + I(age^2),
+                          data = df, dist = "lognormal")
> add_pi(df, fit, alpha = 0.1, names = c("lwr", "upr"),method="boot")
Error in MASS::mvrnorm(nSims, beta.hat, vcov.hat) : 
  incompatible arguments

I know what is causing it: basically, coef(fit) does not include the Log(scale) term that appears in vcov(fit), so the arguments passed to MASS::mvrnorm have incompatible dimensions.

> vcov(fit)
             (Intercept)           age      I(age^2)    Log(scale)
(Intercept)  3.453920115 -0.1844604713  2.304116e-03  0.0052810250
age         -0.184460471  0.0105711147 -1.379510e-04  0.0001468898
I(age^2)     0.002304116 -0.0001379510  1.854279e-06 -0.0000039567
Log(scale)   0.005281025  0.0001468898 -3.956700e-06  0.0049915276
> coef(fit)
 (Intercept)          age     I(age^2) 
 3.344231092  0.226092785 -0.003504315 

Data not overwritten when rerunning add_ci

If I run:
test <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))
test$pred <- NA
test$LCB <- NA
test$UCB <- NA

THEN:
newDF <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))
- or -
test <- add_ci(test, testModel, alpha = 0.1, names = c("LCB", "UCB"))

Both the newDF and test dataframes come up with the columns still set to NA. This occurs with values in the columns too, not only after setting the columns to NA. It does not always occur on the first attempt, but once it occurs it continues to occur.

How to use add_ci in presence of observation-level random effect

I have a binomial GLMM of the form:

fit <- glmer(cbind(strain, TOTAL - strain) ~ 0 + (1 | REGION/obs) + COUNTRY * SAMPLE_DATE,
             family = binomial(logit), data = data)
Now I would like to produce confidence intervals, taking into account my fixed factors COUNTRY & SAMPLE_DATE and random factor intercept REGION, but ignoring the observation-level random effect (put in to take into account overdispersion).
I would also like to be able to extrapolate my model beyond the original sample dates in my dataset. Is there any way to achieve something like this? Or is the only option for me to set includeRanef = FALSE and then ignore the random intercepts per region? (which would not be ideal for me)

Support gls models?

Would it also be possible for the package to support confidence and prediction intervals for gls models?

CI for quantile

Is there the capability to create a confidence interval around a quantile?

Use z-distribution for GLMs with dispersion parameter fixed to 1

At present, many GLM methods use t-distributions rather than z-distributions for GLMs where the dispersion parameter is fixed at 1. (See Section 2.6.2 of Applied generalized linear mixed models: Continuous and discrete data, 2010.) These methods should use a standard normal instead. See add_ci.glm, add_pi.glm, etc.
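Concretely, for a Poisson or binomial GLM the Wald interval on the link scale would use qnorm() rather than qt() (a sketch of the distinction, not ciTools' internal code):

fit <- glm(count ~ spray, data = InsectSprays, family = poisson)
pred <- predict(fit, se.fit = TRUE)                 ## on the link (log) scale

alpha <- 0.05
crit_z <- qnorm(1 - alpha / 2)                      ## appropriate when the dispersion is fixed at 1
crit_t <- qt(1 - alpha / 2, df = fit$df.residual)   ## what many methods currently use

## interval on the response scale, using the z critical value:
lcb <- exp(pred$fit - crit_z * pred$se.fit)
ucb <- exp(pred$fit + crit_z * pred$se.fit)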

No support for binomial regression

Binomial (as opposed to Bernoulli) regression is supported by glm and can be specified by using the weights argument to give the number of trials per observation, with the response variable giving the proportion of successes. Prediction intervals, quantile estimates, and probability estimates are sensible for such cases and could be estimated via simulation.
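For reference, fitting and simulating from such a model looks like this in base R (a sketch with hypothetical columns prop_success, trials, and dose in a data frame df):

## binomial (not Bernoulli) regression: the response is a proportion, weights carry the trial counts
fit <- glm(prop_success ~ dose, family = binomial, weights = trials, data = df)

## simulating new responses given the fit, as a basis for intervals:
p_hat <- predict(fit, newdata = df, type = "response")
sims  <- rbinom(nrow(df), size = df$trials, prob = p_hat) / df$trials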

Inverse Gaussian parametric CIs are broken

I have not tested this, but I am fairly sure add_ci.glm doesn't work if it is supplied with an inverse Gaussian fit. I added a bootstrap procedure to mitigate this issue, but it's definitely worth correcting. I would like to be able to calculate GLM CIs and PIs with complete coverage of the exponential family.

Add_pi does not work for Gaussian family

If a glm fit with family = gaussian is passed to add_pi, all the bounds are returned as zero vectors.

Since these prediction intervals can be solved parametrically, we should just pass to add_pi.lm if the Gaussian family is used in a glm model.
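Until that dispatch exists, a workaround sketch is to refit the same model with lm(), whose add_pi method computes the interval parametrically (this only applies when the identity link is used):

## a gaussian-family glm with the identity link is equivalent to lm()
fit_glm <- glm(dist ~ speed, data = cars, family = gaussian)
fit_lm  <- lm(dist ~ speed, data = cars)

add_pi(cars, fit_lm, alpha = 0.1)   ## parametric prediction intervals via the lm method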

Failure in bootstrapped CI due to missing factor levels

When factor levels are missing from a model, no surprise, it cannot predict rows with that factor level. The issue comes from the use of predict() in those models updated without all levels. The reprex below illustrates the issue.

set.seed(5)
library(ciTools)
#> ciTools version 0.6.1 (C) Institute for Defense Analyses
library(tidyverse)

my_data <-
  data.frame(
    counts=c(18,17,15,20,10,20,25,13,12),
    outcome=gl(3,1,9),
    treatment=gl(3,3)
  )

my_model <- glm(counts ~ outcome + treatment, data=my_data)

# Works because it is not using `predict()`
my_new_data <-
  my_data %>%
  add_ci(fit=my_model)

# Fails because data subsetting is not contingent on input factor levels
my_new_data <-
  my_data %>%
  add_ci(fit=my_model, type="boot")
#> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor treatment has new levels 2

Created on 2021-05-17 by the reprex package (v2.0.0)

Support gam and scam models?

Was also wondering if it would be hard to also support mgcv::gam (generalized additive models) and scam (shape constrained additive models) models? The recipe to calculate confidence and prediction intervals for these should be quite similar to that for GLMs, right?

missingness

Hello
I am new with ciTools but very interested in using it for post hoc analyses for my survreg models
I have a very naive question:
is there a way to omit NAs?

fit <- survreg(Surv(left, right, type = "interval2") ~ drug, dist = "weibull", data = mydata)
add_quantile(mydata, fit, p = 0.75, names = c("quant", "lwr", "upr"))

I get the error:

Error in add_quantile.survreg: Check df for missingness

thank you!
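One workaround sketch, pending a proper na.action option, is to drop rows with missing values in the model variables before fitting and calling add_quantile (column names are those from the question above):

library(survival)

vars <- c("left", "right", "drug")
mydata_cc <- mydata[complete.cases(mydata[, vars]), ]   ## keep only rows with no NAs in the model variables

fit <- survreg(Surv(left, right, type = "interval2") ~ drug, dist = "weibull", data = mydata_cc)
add_quantile(mydata_cc, fit, p = 0.75, names = c("quant", "lwr", "upr"))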

Zero inflated Poisson models

I would like to have some methods for zero-inflated Poisson and negative binomial models. I'm not sure how these models are commonly fit in R (what package or functions are used?). I think this issue is less important than #28 and #25.
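For reference, these models are commonly fit with the pscl package (a sketch with a hypothetical data frame dat; the part of the formula after | specifies the zero-inflation component):

library(pscl)

fit_zip  <- zeroinfl(y ~ x1 + x2 | x1, data = dat, dist = "poisson")   ## zero-inflated Poisson
fit_zinb <- zeroinfl(y ~ x1 + x2 | x1, data = dat, dist = "negbin")    ## zero-inflated negative binomial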
