predict with missing values (ranger, 13 comments, closed)

imbs-hl commented on July 24, 2024
predict with missing values

from ranger.

Comments (13)

mayer79 commented on July 24, 2024

What version are you using? At least the current CRAN release (0.5.0) does not allow missing values in the training data. But certainly we would all be extremely happy if ranger could handle NAs ;).

jlevatic commented on July 24, 2024

I'm using version 0.4.2. Ranger did not complain about missing values in the training data; however, the predict function did.

If I may ask, what is preventing ranger from handling NAs? Tree-based methods can handle NAs nicely. For example, an observation with a missing value is usually passed down all branches of the tree. Alternatively, in a random forest you can choose one branch at random; this usually works about as well as the former approach and is faster.
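A minimal sketch of the two prediction strategies described above, for a single regression tree in plain Python (not ranger's code). The node layout and the `n_left`/`n_right` training counts are hypothetical bookkeeping; weighting the two children by their training share is an assumption, since "passed to all of the branches" could also mean equal weights.

```python
import math
import random
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: int
    threshold: float
    left: "Tree"
    right: "Tree"
    n_left: int   # hypothetical: training observations that went left
    n_right: int  # hypothetical: training observations that went right


Tree = Union[Leaf, Node]


def is_missing(x) -> bool:
    """Treat None and float NaN as missing (stand-ins for R's NA)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def predict_all_branches(tree: Tree, row: Sequence) -> float:
    """Send a row with a missing split variable down BOTH children and
    combine the two predictions, weighted by each child's training share."""
    if isinstance(tree, Leaf):
        return tree.value
    x = row[tree.feature]
    if is_missing(x):
        w_left = tree.n_left / (tree.n_left + tree.n_right)
        return (w_left * predict_all_branches(tree.left, row)
                + (1 - w_left) * predict_all_branches(tree.right, row))
    child = tree.left if x <= tree.threshold else tree.right
    return predict_all_branches(child, row)


def predict_random_branch(tree: Tree, row: Sequence, rng: random.Random) -> float:
    """Follow a single randomly chosen child whenever the split variable is missing."""
    if isinstance(tree, Leaf):
        return tree.value
    x = row[tree.feature]
    if is_missing(x):
        child = rng.choice([tree.left, tree.right])
    else:
        child = tree.left if x <= tree.threshold else tree.right
    return predict_random_branch(child, row, rng)
```

For a stump `Node(0, 5.0, Leaf(1.0), Leaf(3.0), 3, 1)`, a row with a missing first feature gets 0.75 * 1.0 + 0.25 * 3.0 = 1.5 from `predict_all_branches`, while `predict_random_branch` returns either 1.0 or 3.0. The random variant visits only one path per tree, which is why it is cheaper; averaging over many trees in a forest smooths out the randomness.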

mnwright commented on July 24, 2024

In training, the data is checked for missing values if the formula interface is used. If the alternative interface (dependent.variable.name) is used, this check is currently skipped. I'm not sure what exactly happens with the NAs in this case, so be careful.
We should measure the runtime of this check on huge datasets and probably add it (if possible with column information, as in #89).
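A minimal sketch, in plain Python rather than ranger's R/C++ code, of a missing-value check that reports column information. The function names and the error message are hypothetical; `None` and `NaN` stand in for R's `NA`. Each column is scanned at most once and the scan stops at the first missing entry, so the cost is bounded by one pass over the data.

```python
import math
from typing import Dict, List, Sequence


def is_missing(value) -> bool:
    """Treat None and float NaN as missing (stand-ins for R's NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_with_missing(columns: Dict[str, Sequence]) -> List[str]:
    """Return the names of columns that contain at least one missing value.

    any() short-circuits, so each column is scanned only up to its first NA.
    """
    return [name for name, values in columns.items()
            if any(is_missing(v) for v in values)]


def check_missing(columns: Dict[str, Sequence]) -> None:
    """Raise an error naming the offending columns (hypothetical message)."""
    bad = columns_with_missing(columns)
    if bad:
        raise ValueError("Missing data in columns: " + ", ".join(bad) + ".")
```

For example, `columns_with_missing({"x1": [1.0, 2.0], "x2": [None, 3.0]})` returns `["x2"]`, and `check_missing` on the same data would raise an error naming `x2`.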

In prediction, ranger always checks for missing values.

@jlevatic you are right, there are several options for handling NAs in random forests. We are still not sure which option works best in which case, so nothing is implemented yet. But I have a student working on that. ;)

mnwright commented on July 24, 2024

Better NA checks now included in #109.

nlapidot3 commented on July 24, 2024

@mnwright do you have any updates on implementing missing data imputation within ranger?

mnwright commented on July 24, 2024

Unfortunately not. We achieved good results with multiple imputation and adaptive tree imputation (Ishwaran 2008, http://dx.doi.org/10.1214/08-AOAS169). However, there is nothing implemented in ranger yet.

NamLQ commented on July 24, 2024

@nlapidot3
Maybe you need this https://github.com/mayer79/missRanger

nlapidot3 commented on July 24, 2024

@mnwright thanks for the reference.
@NamLQ thank you for the suggestion. I will take a look at it and see if it works for my purposes.

mayer79 commented on July 24, 2024

missRanger is built to impute missing values in a data set. Using such chained loops within ranger would be awfully inefficient. There must be much, much better ways to deal with missing predictor values in a random forest! One idea that could work with few missing values: if an observation with a missing x is to be split on x, send part of its case weight to the left and the remaining weight to the right. Do the same during prediction.
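A minimal sketch of the case-weight idea for one split, in plain Python (not ranger's code). The comment does not say how to choose the fraction sent left; here it is assumed to be the share of non-missing weight that goes left, similar to how C4.5 handles unknown values. Missing x is encoded as `float("nan")`.

```python
import math


def split_with_fractional_weights(xs, ys, ws, threshold):
    """Partition observations at `threshold` into (y, weight) lists.

    Observations with a missing x go to BOTH children: the left child gets
    the fraction p of their weight and the right child gets 1 - p, where p
    is the share of non-missing weight that goes left (an assumption).
    """
    known = [(x, w) for x, w in zip(xs, ws) if not math.isnan(x)]
    w_known = sum(w for _, w in known)
    w_left_known = sum(w for x, w in known if x <= threshold)
    p = w_left_known / w_known if w_known > 0 else 0.5
    left, right = [], []
    for x, y, w in zip(xs, ys, ws):
        if math.isnan(x):
            left.append((y, w * p))
            right.append((y, w * (1 - p)))
        elif x <= threshold:
            left.append((y, w))
        else:
            right.append((y, w))
    return left, right, p


def weighted_mean(pairs):
    """Weighted node prediction from (y, weight) pairs."""
    total = sum(w for _, w in pairs)
    return sum(y * w for y, w in pairs) / total
```

With xs `[1.0, 2.0, 8.0, 9.0, nan]`, unit weights and threshold 5.0, half of the known weight goes left, so the missing observation contributes weight 0.5 to each child and the total weight of 5 is preserved. At prediction time, the same p could be used to mix the predictions of the two children.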

samFarrellDay commented on July 24, 2024

I'm coming to this thread because I'm working on a homebrew package that uses ranger to impute missing values. Ranger is the fastest RF package I know of, so I was hoping I could bootleg the mice package and use ranger instead of randomForest.

I'm sure most of you know this, but xgboost handles missing values by sending them to the side of the split that reduces the loss function the most, which I think is an elegant solution.
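A minimal sketch of the "default direction" idea for one candidate split, in plain Python. xgboost's actual sparsity-aware split finding works with gradient and hessian statistics; this simplification assumes a regression tree scored by the sum of squared errors (SSE) and only tries both directions for a fixed threshold. The function names are hypothetical.

```python
import math


def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty node)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_default_direction(xs, ys, threshold):
    """Try sending all missing-x observations left, then right, and keep
    whichever direction gives the lower total SSE of the two children.

    Returns ("left" or "right", loss). Missing x is encoded as NaN.
    """
    left = [y for x, y in zip(xs, ys) if not math.isnan(x) and x <= threshold]
    right = [y for x, y in zip(xs, ys) if not math.isnan(x) and x > threshold]
    missing = [y for x, y in zip(xs, ys) if math.isnan(x)]
    loss_left = sse(left + missing) + sse(right)
    loss_right = sse(left) + sse(right + missing)
    if loss_left <= loss_right:
        return "left", loss_left
    return "right", loss_right
```

In a full learner, this check would run for every candidate threshold, the chosen direction would be stored in the node, and observations with missing x would follow that stored direction at prediction time.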

mnwright commented on July 24, 2024

In my understanding you don't need to predict with missing values in multiple imputation.

> I'm sure most of you know this, but xgboost handles missing values by sending them to the side of the split that reduces the loss function the most, which I think is an elegant solution.

I think that is a very promising solution. Still, I think we need a good comparison before we decide which missing value methods to implement in ranger.

Geoff-Kahn commented on July 24, 2024

Just wondering if there has been any progress in implementing a method for missing values? The suggestion above of putting them in the split that minimizes loss seems like a good one.

mnwright commented on July 24, 2024

Sorry, no progress.
