
Comments (2)

jenniferthompson commented on August 17, 2024

Here's some additional info in case it's helpful:

  • I fit the original model, then ran robcov with cluster = individual site and called anova. That produced an error, which led me to try cluster = site type instead. (The error turned out to be related to needing a higher tol.)
  • With cluster = site type (dichotomous), there was no warning or error, but the anova results had negative SS and F statistics, and p-values of 1.0 for variables that were highly significant in the original model.
  • The same issues appeared when using fit.mult.impute(..., fitter = ols) on the same data.

from rms.

harrelfe commented on August 17, 2024

I modified your code to expose the problem a bit more:

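library(rms)   # provides ols, rcs, robcov, and the anova method used below

set.seed(56)
for(nsites in 7:2) {
  cat('nsites:', nsites, '\n')
  # New simulated data on every pass, so y, x1, x2 and site all change with nsites
  df <- data.frame(y = rnorm(n = 100),
                   x1 = rnorm(n = 100),
                   x2 = rnorm(mean = 5, sd = 0.5, n = 100),
                   site = sample(LETTERS[1:nsites], size = 100, replace = TRUE))

  # x = TRUE, y = TRUE store the design matrix and response, which robcov needs
  f <- ols(y ~ rcs(x1, 3) + rcs(x2, 3), data = df, x = TRUE, y = TRUE)
  g <- robcov(f, cluster = df$site)   # cluster sandwich covariance estimate
  print(anova(g))
}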

You'll see that it's OK for 7, 6, and 5 sites but is singular with 4 sites. [Don't adjust tol to try to avoid the problem.]

Just as a random effects model needs at least 3 clusters (and more are recommended), the sandwich estimator has a limit on how few clusters it can handle. This does need to be better understood, though.
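One possible piece of the explanation, offered as a sketch and not as a confirmed diagnosis of this issue: in a cluster sandwich estimator the middle ("meat") matrix is built from one summed score vector per cluster, and for a least-squares fit with an intercept those vectors sum to zero over all clusters. Its rank therefore cannot exceed the number of clusters minus 1, nor the number of coefficients. The base-R code below checks that bound directly. It assumes the usual cluster-sum form of the estimator, and it uses lm with poly(x, 2) as a stand-in for ols with rcs(x, 3), since both give 2 columns per predictor and 5 coefficients in total. The helper meat_rank is hypothetical and is not part of rms.

```r
# Base-R sketch (rms not required): rank of the cluster-robust "meat" matrix.
set.seed(56)
n  <- 100
df <- data.frame(y  = rnorm(n),
                 x1 = rnorm(n),
                 x2 = rnorm(n, mean = 5, sd = 0.5))

# Stand-in for rcs(x, 3): 2 basis columns per predictor, so 5 coefficients in all
fit <- lm(y ~ poly(x1, 2) + poly(x2, 2), data = df)
X   <- model.matrix(fit)   # n x 5 design matrix, including the intercept
e   <- residuals(fit)

# Hypothetical helper: numerical rank of the matrix of per-cluster score sums
meat_rank <- function(nsites) {
  # Assign sites in rotation so that every site is guaranteed to be non-empty
  site <- rep_len(LETTERS[1:nsites], n)
  U <- rowsum(X * e, group = site)   # one row of summed scores per cluster
  qr(U)$rank                         # rank of U equals rank of the meat t(U) %*% U
}

ranks <- sapply(7:2, meat_rank)
names(ranks) <- 7:2
print(ranks)
```

The printed ranks should not exceed min(nsites - 1, 5). Under that bound, a 4 d.f. test of all slopes needs at least 5 clusters, which is consistent with the first simulation above running for 5 or more sites and turning singular at 4.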

But the results are dataset-dependent. If we hold the data (y, x1, and x2) fixed no matter how site is generated, and set things up so that bootcov can also be used, we get:

afun <- robcov   # swap in a bootcov wrapper here to compare methods
set.seed(56)
# Simulate y, x1, x2 once so that only the site assignment changes with nsites
df <- data.frame(y = rnorm(n = 100),
                 x1 = rnorm(n = 100),
                 x2 = rnorm(mean = 5, sd = 0.5, n = 100))

for(nsites in 7:2) {
  cat('nsites:', nsites, '\n')
  df$site <- sample(LETTERS[1:nsites], size = 100, replace = TRUE)
  f <- ols(y ~ rcs(x1, 3) + rcs(x2, 3), data = df, x = TRUE, y = TRUE)
  g <- afun(f, cluster = df$site)
  print(anova(g))
}

Things work mathematically until nsites = 4, but the anova doesn't look quite right with nsites < 6.

If we instead use afun <- function(...) bootcov(..., B=500), the results are mathematically OK for nsites > 2 but don't look quite right for nsites = 3.
