harrelfe commented on August 17, 2024

The behavior you saw is the intended behavior when the sample size does not support a large number of parameters. You'll need to reduce the number of parameters in the model.

from rms.

harrelfe commented on August 17, 2024

You have too many parameters in the model.

harrelfe commented on August 17, 2024

Thanks for the report. There was a bug in ols for validate and calibrate: singular fits reported NAs instead of setting fail=TRUE, which would have caused that sample to be ignored. This is fixed for the next release.
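
To illustrate the idea of marking a singular resample as a failure rather than letting its NAs flow into the summary, here is a minimal base-R sketch. It is not the rms implementation: it uses lm() instead of ols(), estimates only the bootstrap optimism in R^2, and the function name boot_r2_optimism is hypothetical. A refit counts as failed when it has NA (aliased) coefficients, when lm() errors, or when it cannot predict the original data because a factor level was absent from the resample.

```r
# Hypothetical sketch in base R (not the rms code): bootstrap optimism of R^2
# for lm(), where failed resamples are counted and skipped instead of
# contributing NA to the average.
boot_r2_optimism <- function(formula, data, B = 40) {
  # R^2 of a fitted model evaluated on 'newdata'
  r2 <- function(fit, newdata) {
    y <- model.response(model.frame(formula, newdata))
    pred <- predict(fit, newdata)
    1 - sum((y - pred)^2) / sum((y - mean(y))^2)
  }
  optimism <- numeric(0)
  n_fail <- 0L
  for (b in seq_len(B)) {
    boot <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    fit_b <- tryCatch(lm(formula, data = boot), error = function(e) NULL)
    # A rank-deficient (singular) fit shows up as NA coefficients
    if (is.null(fit_b) || anyNA(coef(fit_b))) {
      n_fail <- n_fail + 1L
      next
    }
    # Predicting on the original data errors if a factor level was absent
    test <- tryCatch(r2(fit_b, data), error = function(e) NA_real_)
    if (is.na(test)) {
      n_fail <- n_fail + 1L
      next
    }
    # Optimism = apparent (training) R^2 minus R^2 on the original data
    optimism <- c(optimism, r2(fit_b, boot) - test)
  }
  list(optimism = mean(optimism), n_ok = length(optimism), n_fail = n_fail)
}
```

With a rare logical predictor, many resamples contain only one of its values, the refit becomes singular, and those resamples land in n_fail. If every resample fails, mean() of an empty vector returns NaN, analogous to the NaN entries validate prints when no sample succeeds.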

Deleetdk commented on August 17, 2024

After updating to the GitHub version, validate no longer throws an error, but its output is useless for my use case because all 40 runs failed:

> validate(ols_fit)

Divergence or singularity in 40 samples
          index.orig training test optimism index.corrected n
R-square       0.572      NaN  NaN      NaN             NaN 0
MSE            0.425      NaN  NaN      NaN             NaN 0
g              0.000      NaN  NaN      NaN             NaN 0
Intercept      0.000      NaN  NaN      NaN             NaN 0
Slope          1.000      NaN  NaN      NaN             NaN 0

The iris example is also nearly useless: of 40 runs, only 2 completed:

> validate(fit)

Divergence or singularity in 38 samples
          index.orig training   test optimism index.corrected n
R-square      0.5504   0.8728 -0.931    1.804         -1.2536 2
MSE           0.0848   0.0234  0.364   -0.341          0.4258 2
g             0.3504   0.4573  0.191    0.266          0.0845 2
Intercept     0.0000   0.0000  2.177   -2.177          2.1766 2
Slope         1.0000   1.0000  0.289    0.711          0.2886 2
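
A back-of-the-envelope calculation suggests why so few resamples succeed. Assuming ordinary bootstrap resampling of n rows with replacement, a predictor level (or the rarer value of a logical predictor) held by only k rows is absent from a given resample with probability

$$
P(\text{level absent}) = \left(1 - \frac{k}{n}\right)^{n} \approx e^{-k}
$$

so a level seen once (k = 1) is missing from roughly 37% of resamples, and one seen three times from about 5%. If several rare levels are missed roughly independently, the chance that a resample contains all of them shrinks multiplicatively; with five singleton levels it is about $(1 - e^{-1})^{5} \approx 0.10$. These figures are illustrative and are not computed from the data above.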

My guess is the same as before: one has to use special sampling to avoid the issue. As someone on Cross Validated suggested:

You could look into stratified sampling, i.e. constraining your train/test splits so that they have (approximately) the same relative frequencies for your predictor levels.

However, I think it is worth considering whether the current behavior is actually what is wanted: random splitting produces, with non-negligible frequency, sets that do not cover all predictor levels. Can you consider such a set representative for whatever the application is?
I've been working with such small sample sizes and went for stratified splitting. But I insist that thinking hard about the data and the consequences of working with such small samples is at least as necessary as fixing the pure computational error.
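
The stratified-sampling suggestion can be sketched in base R by drawing bootstrap rows separately within each stratum, so that every stratum present in the data appears in every resample with its original count. This is a minimal sketch under assumptions: stratified_boot_index is a hypothetical helper, not part of rms, and the strata are assumed to be the joint pattern of the logical predictors, built with interaction(). It could replace a plain sample(nrow(data), replace = TRUE) step in a hand-written bootstrap loop.

```r
# Hypothetical helper (not part of rms): bootstrap row indices drawn
# separately within each stratum, preserving every stratum's count.
stratified_boot_index <- function(strata) {
  strata <- factor(strata)  # factor() also drops unused levels
  groups <- split(seq_along(strata), strata)
  idx <- lapply(groups, function(rows) {
    # index via sample.int so a one-row stratum is not misread by sample()
    rows[sample.int(length(rows), length(rows), replace = TRUE)]
  })
  unlist(idx, use.names = FALSE)
}

# Example: stratify on the joint pattern of two logical predictors
d <- data.frame(a = c(TRUE, rep(FALSE, 19)), b = rep(c(TRUE, FALSE), 10))
strata <- interaction(d$a, d$b, drop = TRUE)
boot <- d[stratified_boot_index(strata), ]
table(boot$a)  # always 19 FALSE and 1 TRUE
```

One design caveat: with many logical predictors the interaction can create many tiny cells, and a one-row cell is always resampled as itself. That keeps every level present but likely understates sampling variability, which echoes the point above about thinking hard before relying on such small samples.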

Deleetdk commented on August 17, 2024

How do you recommend that I validate models that contain a large number of logical predictors without running into this issue?
