I get a strange-sounding error when trying to use validate().

validate() for rms::ols: Error in lsfit(x, y) : only 0 cases, but 2 variables

harrelfe commented on August 17, 2024


Comments (5)

harrelfe commented on August 17, 2024

The behavior you saw is the intended behavior when the sample size does not support a large number of parameters. You'll need to reduce the number of parameters in the model.


harrelfe commented on August 17, 2024

You have too many parameters in the model.


harrelfe commented on August 17, 2024

Thanks for the report. There was a bug in validate() and calibrate() for ols fits: singular fits were returning NAs instead of setting fail=TRUE, which is what causes such a sample to be ignored. This is fixed for the next release.
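The failure described above can be reproduced with base R alone. The sketch below is an illustration under stated assumptions, not the rms source: it assumes that a resample in which a logical predictor never takes one of its values yields a singular fit with an NA coefficient, and that the resulting NA predictions are then passed to lsfit(), which the error message shows is called during validation. `safe_calibrate()` is a hypothetical helper showing the "set fail = TRUE and skip the sample" idea.

```r
# Illustration only (base R, not rms internals): a resample in which a logical
# predictor never takes the value TRUE produces a rank-deficient design matrix.
set.seed(1)
n <- 20
x1 <- rnorm(n)
flag <- rep(FALSE, n)                 # the TRUE level was never drawn
y <- 1 + 2 * x1 + rnorm(n)

X <- cbind(Intercept = 1, x1 = x1, flag = as.numeric(flag))
beta <- lm.fit(X, y)$coefficients     # the aliased 'flag' coefficient is NA

# Computing the linear predictor by matrix multiplication propagates the NA
# to every row (0 * NA is NA in R).
lp <- drop(X %*% beta)

# Regressing y on lp (as when estimating a calibration intercept and slope)
# leaves no complete cases, reproducing the message from the issue.
res <- tryCatch(suppressWarnings(lsfit(lp, y)),
                error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: flag the sample as failed instead of passing NAs on,
# so a resampling loop can skip it.
safe_calibrate <- function(lp, y) {
  if (any(!is.finite(lp))) return(list(fail = TRUE))
  cf <- coef(lm(y ~ lp))
  list(fail = FALSE, intercept = unname(cf[1]), slope = unname(cf[2]))
}

str(safe_calibrate(lp, y))    # fail = TRUE: this sample would be ignored
str(safe_calibrate(x1, y))    # a usable predictor gives fail = FALSE
```

Skipping failed samples avoids the crash, but if most samples fail, the few that remain give unstable estimates, which is what the next comment runs into.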


Deleetdk commented on August 17, 2024

After updating to the GitHub version, validate() no longer throws an error, but its output is useless for my use case because all 40 resamples failed:

> validate(ols_fit)

Divergence or singularity in 40 samples
          index.orig training test optimism index.corrected n
R-square       0.572      NaN  NaN      NaN             NaN 0
MSE            0.425      NaN  NaN      NaN             NaN 0
g              0.000      NaN  NaN      NaN             NaN 0
Intercept      0.000      NaN  NaN      NaN             NaN 0
Slope          1.000      NaN  NaN      NaN             NaN 0

In the iris example, the output is almost as useless: of 40 resamples, only 2 completed:

> validate(fit)

Divergence or singularity in 38 samples
          index.orig training   test optimism index.corrected n
R-square      0.5504   0.8728 -0.931    1.804         -1.2536 2
MSE           0.0848   0.0234  0.364   -0.341          0.4258 2
g             0.3504   0.4573  0.191    0.266          0.0845 2
Intercept     0.0000   0.0000  2.177   -2.177          2.1766 2
Slope         1.0000   1.0000  0.289    0.711          0.2886 2
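A rough way to see why so many resamples fail is to compute how often a bootstrap sample misses a rare predictor level entirely. This sketch assumes the failures come from logical predictors that become constant within a resample (collinearity among several predictors can also make a fit singular and is not covered here); `p_level_missing()` and `sim_level_missing()` are hypothetical helpers. If a level occurs in only k of n rows, a resample drawn with replacement misses it with probability (1 - k/n)^n, which is close to e^(-k) for large n, so a predictor with a single rare row is constant in roughly 37% of resamples.

```r
# Probability that a bootstrap resample of n rows (drawn with replacement)
# contains none of the k rows at a binary predictor's rarer level, i.e. the
# predictor is constant in that resample and its coefficient is not estimable.
p_level_missing <- function(n, k) (1 - k / n)^n

# Monte Carlo check of the same probability.
sim_level_missing <- function(n, k, reps = 10000) {
  rare <- c(rep(TRUE, k), rep(FALSE, n - k))
  drawn_none <- replicate(reps, !any(sample(rare, n, replace = TRUE)))
  mean(drawn_none)
}

round(p_level_missing(150, c(1, 2, 5, 10)), 3)
set.seed(2)
sim_level_missing(150, 1)
```

With several such predictors in one model, the chance that at least one of them is constant in a given resample grows quickly, so most of the 40 resamples can fail even when each predictor on its own looks harmless.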

My guess is the same as before: some special sampling scheme is needed to avoid the issue. As someone on Cross Validated suggested:

You could look into stratified sampling, i.e. constraining your train/test splits so that they have (approximately) the same relative frequencies for your predictor levels.

However, I think it is worth considering whether the current behavior is actually wanted: random splitting produces, with non-negligible frequency, sets that do not cover all predictor levels. Can such a set be considered representative for whatever the application is?
I have worked with such small sample sizes myself and went for stratified splitting. But I insist that thinking hard about the data and the consequences of working with such small samples is at least as necessary as fixing the purely computational error.
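The stratified resampling idea quoted above can be sketched in base R. `stratified_boot_index()` is a hypothetical helper, not an rms function: it draws bootstrap row indices separately within each stratum, so a rare level of a logical predictor keeps exactly its original count in every resample.

```r
# Hypothetical helper: draw bootstrap row indices separately within each
# stratum, so each stratum keeps exactly its original number of rows.
stratified_boot_index <- function(strata) {
  strata <- droplevels(as.factor(strata))
  by_stratum <- split(seq_along(strata), strata)
  idx <- lapply(by_stratum, function(rows) {
    # index into 'rows' rather than calling sample(rows) directly, which
    # misbehaves when a stratum contains a single row
    rows[sample.int(length(rows), length(rows), replace = TRUE)]
  })
  unlist(idx, use.names = FALSE)
}

# Example: a logical predictor with only 3 TRUE values out of 20 rows.
set.seed(4)
d <- data.frame(flag = c(rep(TRUE, 3), rep(FALSE, 17)), x = rnorm(20))
i <- stratified_boot_index(d$flag)
table(d$flag[i])   # still 3 TRUE and 17 FALSE
```

To stratify on several logical predictors at once, `interaction(..., drop = TRUE)` can build a joint stratum, although with many predictors the cells become so small that each resample is barely random, which is the caution raised in the quote. The rms help page for validate() mentions passing a `group` argument through to predab.resample() for stratified bootstrapping; check the documentation of the installed version before relying on it.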


Deleetdk commented on August 17, 2024

How do you recommend that I validate models that contain a large number of logical predictors without running into this issue?
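One way to act on the earlier advice to reduce the number of parameters, sketched here as an assumption rather than the rms author's recommendation, is to first list the logical predictors that have very few rows in one level; these are the ones most likely to be constant in a resample and could be pooled or dropped before fitting. The screen looks only at the predictors, not the outcome. `minority_counts()` and the cut-off of 2 rows are hypothetical.

```r
# Hypothetical screening helper: for each logical column, count the rows in its
# less frequent level (missing values ignored). Columns with a very small
# minority count are the ones most likely to be constant in a resample.
minority_counts <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  vapply(df[is_lgl], function(v) {
    v <- v[!is.na(v)]
    min(sum(v), sum(!v))
  }, integer(1))
}

d <- data.frame(a = c(TRUE, FALSE, FALSE, FALSE, FALSE),
                b = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                z = 1:5)
mc <- minority_counts(d)
mc
names(mc)[mc < 2]   # columns whose rare level has fewer than 2 rows (arbitrary cut-off)
```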

