Comments (7)
I just released a version (0.4.2) based on the new toolchain. As reported, the problem is solved there. In addition, multithreading is finally working! This version can also be installed on the current R version by using the binary; see https://github.com/imbs-hl/ranger/releases.
I hope it's solved with R-3.3.0!
from ranger.
No, this is not expected. I can reproduce the issue on Windows but not on Mac or Linux. I will check the code for Windows-specific problems.
from ranger.
The problem seems to be std::discrete_distribution<> with gcc 4.6.3. I tried the new 4.9.3 toolchain with R-devel, and it was fast.
Any idea how to solve this instead of waiting for a newer gcc?
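For context, here is a minimal sketch of weighted sampling with std::discrete_distribution, the class named above. The helper name draw_weighted and the choice of std::mt19937_64 are assumptions for illustration only; this is not taken from the ranger sources.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw `num_draws` sample indices with replacement, where index i is chosen
// with probability weights[i] / sum(weights).
// Hypothetical helper for illustration; not ranger's actual code.
std::vector<std::size_t> draw_weighted(const std::vector<double>& weights,
                                       std::size_t num_draws,
                                       std::mt19937_64& rng) {
  assert(!weights.empty());
  // The distribution normalizes the weights internally.
  std::discrete_distribution<> dist(weights.begin(), weights.end());
  std::vector<std::size_t> indices;
  indices.reserve(num_draws);
  for (std::size_t i = 0; i < num_draws; ++i) {
    indices.push_back(static_cast<std::size_t>(dist(rng)));
  }
  return indices;
}
```

With all weights equal to 1, as in case.weights = rep(1, times = n), every index is equally likely, so any slowdown comes from the sampling machinery rather than from the weights themselves.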
from ranger.
Using boost::random::discrete_distribution as a replacement helps:
before:
> system.time(fit.1 <- ranger(y ~ x))
   user  system elapsed
   9.27    0.13    9.41
> system.time(fit.3 <- ranger(y ~ x, case.weights = rep(1, times = n)))
   user  system elapsed
  93.02    0.07   93.19
after:
> system.time(fit.1 <- ranger(y ~ x))
   user  system elapsed
   8.76    0.16    8.96
> system.time(fit.3 <- ranger(y ~ x, case.weights = rep(1, times = n)))
   user  system elapsed
   8.98    0.09    9.09
from ranger.
Thanks! However, I'm reluctant to merge it into master because of the Boost dependency... ;)
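One way to avoid the Boost dependency might be to sample from cumulative weights with a binary search, using only the standard library. The sketch below is an assumption-laden illustration: the class name CumulativeSampler is hypothetical, weights are assumed non-negative with a positive sum, and whether this avoids the slowdown on gcc 4.6.3 has not been tested here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical dependency-free weighted sampler (not part of ranger or Boost).
// Assumes all weights are non-negative and at least one is positive.
class CumulativeSampler {
 public:
  explicit CumulativeSampler(const std::vector<double>& weights)
      : cumulative_(weights.size()) {
    assert(!weights.empty());
    // cumulative_[i] = weights[0] + ... + weights[i]
    std::partial_sum(weights.begin(), weights.end(), cumulative_.begin());
    assert(cumulative_.back() > 0.0);
  }

  // Returns index i with probability weights[i] / sum(weights).
  template <class Rng>
  std::size_t operator()(Rng& rng) const {
    std::uniform_real_distribution<double> unif(0.0, cumulative_.back());
    const double u = unif(rng);
    // First index whose cumulative weight is strictly greater than u,
    // so zero-weight entries are skipped.
    std::vector<double>::const_iterator it =
        std::upper_bound(cumulative_.begin(), cumulative_.end(), u);
    if (it == cumulative_.end()) {
      // Guard for u rounding up to the total: pick the first index that
      // reaches the total, which always has positive weight.
      it = std::lower_bound(cumulative_.begin(), cumulative_.end(),
                            cumulative_.back());
    }
    return static_cast<std::size_t>(it - cumulative_.begin());
  }

 private:
  std::vector<double> cumulative_;
};
```

Each draw costs one uniform random number plus a binary search over the cumulative weights, so construction is linear and sampling is logarithmic in the number of observations.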
from ranger.
That is a simple temporary solution while waiting for a newer gcc. I didn't do extensive testing, but a quick check showed very similar model performance (see below). That should at least make it feasible for me to run some prototyping with ranger on my Windows laptop, as I frequently need to use weights. And the real dependency is only for the Windows R version, which is already a neglected child with no multithreading :)
# with the original std::discrete_distribution
set.seed(111)
fit_std <- ranger(y ~ x, case.weights = rep(1, times = n), write.forest = TRUE)
pr_std <- predict(fit_std, data.frame(x = x))
# with boost::random::discrete_distribution
set.seed(111)
fit_boost <- ranger(y ~ x, case.weights = rep(1, times = n), write.forest = TRUE)
pr_boost <- predict(fit_boost, data.frame(x = x))
cor(pr_std$predictions, pr_boost$predictions)
[1] 0.9979446
gcc's <random> was based on Boost. But some over-engineering resulted in overhead and worse speed; I've seen a few discussions about that in the past. It wasn't just discrete_distribution: some other distributions were several times slower too. Maybe things have improved significantly in the latest releases (I didn't really follow), but I personally have more trust in boost::random.
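To check how a given toolchain behaves, a small timing harness along these lines could be used. This is only a sketch: the names DrawTiming and time_discrete_draws are hypothetical, and it times std::discrete_distribution in isolation rather than reproducing how ranger calls it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Result of one timing run (hypothetical struct, for illustration only).
struct DrawTiming {
  double seconds;        // wall-clock time spent constructing and drawing
  std::size_t checksum;  // sum of drawn indices, keeps the loop from being optimized away
};

// Time `num_draws` draws from std::discrete_distribution over `num_weights`
// equal weights, mirroring case.weights = rep(1, times = n) from the thread.
DrawTiming time_discrete_draws(std::size_t num_weights, std::size_t num_draws) {
  assert(num_weights > 0);
  const std::vector<double> weights(num_weights, 1.0);
  std::mt19937_64 rng(111);  // fixed seed so runs are comparable

  const std::chrono::steady_clock::time_point start =
      std::chrono::steady_clock::now();
  std::discrete_distribution<> dist(weights.begin(), weights.end());
  std::size_t checksum = 0;
  for (std::size_t i = 0; i < num_draws; ++i) {
    checksum += static_cast<std::size_t>(dist(rng));
  }
  const std::chrono::steady_clock::time_point stop =
      std::chrono::steady_clock::now();

  const std::chrono::duration<double> elapsed = stop - start;
  return DrawTiming{elapsed.count(), checksum};
}
```

Compiling the same file with the old and the new toolchain and comparing the reported seconds would show whether the standard library implementation is the bottleneck on that platform.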
It's your choice in the end. I'm just telling you what I know. I'm glad I noticed this discussion: my initial observations didn't agree with the claims of ranger being very fast, so I didn't even try it on a Linux server.
from ranger.
This is brilliant, thank you very much for these investigations. Even on the current R version, the issue seems to be fixed with ranger 0.4.2. Wow!
from ranger.