Comments (13)
What version are you using? At least the current CRAN release (0.5.0) does not allow missing values in the training data. But certainly we would all be extremely happy if ranger were able to handle NAs ;).
from ranger.
I'm using version 0.4.2. ranger did not complain about missing values in the training data; however, the predict function did.
If I may ask, what is hindering ranger from handling NAs? Tree-based methods can handle NAs nicely: an example with a missing value is usually passed down all branches of the tree. Alternatively, in a random forest you can choose one branch at random, which usually works about as well as the former approach and is faster.
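To make the two strategies concrete, here is a minimal Python sketch of prediction on a toy tree. It is not ranger code: the tree layout, the function names, and the choice to average both children with equal weight are assumptions made only for illustration.

```python
import math
import random

# Hypothetical toy tree: internal nodes are dicts, leaves are plain numbers.
TREE = {
    "feature": 0, "threshold": 5.0,
    "left": 1.0,
    "right": {"feature": 1, "threshold": 2.0, "left": 3.0, "right": 7.0},
}

def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def predict_all_branches(node, x):
    """Follow both children when the split value is missing and average them.

    Equal weights are an assumption; weighting by child size is another option.
    """
    if not isinstance(node, dict):
        return float(node)
    value = x[node["feature"]]
    if is_missing(value):
        left = predict_all_branches(node["left"], x)
        right = predict_all_branches(node["right"], x)
        return 0.5 * (left + right)
    child = node["left"] if value <= node["threshold"] else node["right"]
    return predict_all_branches(child, x)

def predict_random_branch(node, x, rng):
    """Pick one child at random when the split value is missing."""
    if not isinstance(node, dict):
        return float(node)
    value = x[node["feature"]]
    if is_missing(value):
        child = rng.choice([node["left"], node["right"]])
    else:
        child = node["left"] if value <= node["threshold"] else node["right"]
    return predict_random_branch(child, x, rng)
```

In a forest, the random choice is made independently in every tree, so averaging over many trees tends towards the result of following both branches, while each single tree only walks one path.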
In training, the data is checked for missing values if the formula interface is used. If the alternative interface (dependent.variable.name) is used, this check is currently skipped. I'm not sure what exactly happens with the NAs in this case, so be careful.
We should check the runtime of this check for huge datasets and probably add it (if possible with column information, as in #89).
In prediction, ranger always checks for missing values.
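A check that reports column information could look roughly like the following Python sketch. This is not ranger's implementation; the data layout (a dict mapping column names to lists) and the function names are hypothetical.

```python
import math

def count_missing_by_column(columns):
    """Return {column name: number of missing values} for affected columns only.

    `columns` is assumed to be a dict mapping column names to lists of values;
    None and NaN both count as missing.
    """
    counts = {}
    for name, values in columns.items():
        n_missing = sum(
            1 for v in values
            if v is None or (isinstance(v, float) and math.isnan(v))
        )
        if n_missing > 0:
            counts[name] = n_missing
    return counts

def check_no_missing(columns):
    """Raise an error that names every column containing missing values."""
    counts = count_missing_by_column(columns)
    if counts:
        details = ", ".join(f"{name} ({n})" for name, n in counts.items())
        raise ValueError(f"Missing data in columns: {details}.")
```

The check is a single pass over the data, so its cost grows linearly with the number of cells, which is the runtime that would need to be measured on huge datasets.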
@jlevatic you are right, there are some options to handle NAs in random forests. We are still not sure which option is best in which case, and therefore nothing is implemented yet. But I have a student working on that. ;)
Better NA checks now included in #109.
@mnwright do you have any updates on implementing missing data imputation within ranger?
Unfortunately not. We achieved good results with multiple imputation and adaptive tree imputation (Ishwaran 2008, http://dx.doi.org/10.1214/08-AOAS169). However, there is nothing implemented in ranger yet.
@nlapidot3 Maybe you need this: https://github.com/mayer79/missRanger
@mnwright thanks for the reference.
@NamLQ thank you for the suggestion. I will take a look at it and see if it works for my purposes.
missRanger is built to impute missing values in a data set. Using such chaining loops within ranger would be awfully inefficient. There must be much better ways to deal with missing predictor values in a random forest! One thing that could work when few values are missing: if an observation with missing x is to be split on x, send part of its case weight to the left and the remaining weight to the right. Do the same during prediction.
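The case-weight idea can be sketched in Python as follows. The comment above does not say how the weight should be divided, so sharing it in proportion to the observed weight on each side is an assumption here, as are all function names.

```python
import math

def split_with_fractional_weights(xs, ys, ws, threshold):
    """Split a node on x <= threshold, sharing missing-x observations.

    Returns two lists of (y, weight) pairs. An observation with missing x is
    placed in both children; its weight is divided in proportion to the
    observed weight on each side (an assumption, other rules are possible).
    """
    left, right, missing = [], [], []
    for x, y, w in zip(xs, ys, ws):
        if x is None or (isinstance(x, float) and math.isnan(x)):
            missing.append((y, w))
        elif x <= threshold:
            left.append((y, w))
        else:
            right.append((y, w))
    w_left = sum(w for _, w in left)
    w_right = sum(w for _, w in right)
    total = w_left + w_right
    # Fall back to an even share if no observed values reach this node.
    p_left = w_left / total if total > 0 else 0.5
    for y, w in missing:
        left.append((y, w * p_left))
        right.append((y, w * (1.0 - p_left)))
    return left, right

def weighted_mean(members):
    """Weighted mean response of a child node given (y, weight) pairs."""
    total = sum(w for _, w in members)
    return sum(y * w for y, w in members) / total
```

The total case weight is preserved by the split, so child statistics such as the weighted mean remain comparable to those of the parent node.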
I'm coming to this thread because I'm actually working on a homebrew package that uses ranger to impute missing values. Ranger is the fastest rf package I know of so I was hoping I could bootleg the mice package and use ranger instead of randomForest.
I'm sure most of you know this but xgboost handles missing variables by putting them in the split that minimizes the loss function the most, which I think is an elegant solution.
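As a rough illustration of that idea, the Python sketch below tries both directions for the missing observations at a fixed split and keeps the one with the lower loss. It is a simplification: plain squared error stands in for the gradient statistics xgboost actually uses, and the function names are hypothetical.

```python
import math

def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))

def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_default_direction(xs, ys, threshold):
    """Choose where missing x values go for the split x <= threshold.

    Both options are evaluated and the one with the smaller total squared
    error wins; ties go to the left. Returns (direction, loss).
    """
    left = [y for x, y in zip(xs, ys) if not _is_missing(x) and x <= threshold]
    right = [y for x, y in zip(xs, ys) if not _is_missing(x) and x > threshold]
    missing = [y for x, y in zip(xs, ys) if _is_missing(x)]
    loss_if_left = sse(left + missing) + sse(right)
    loss_if_right = sse(left) + sse(right + missing)
    if loss_if_left <= loss_if_right:
        return "left", loss_if_left
    return "right", loss_if_right
```

The chosen direction would be stored with the split, so an observation with a missing value at prediction time follows the same side without any extra computation.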
In my understanding you don't need to predict with missing values in multiple imputation.
"I'm sure most of you know this but xgboost handles missing variables by putting them in the split that minimizes the loss function the most, which I think is an elegant solution."
I think that is a very promising solution. Still, I think we need a good comparison before we decide which missing value methods to implement in ranger.
Just wondering if there has been any progress in implementing a method for missing values? The suggestion above of putting them in the split that minimizes loss seems like a good one.
Sorry, no progress.