Comments (10)
Good suggestion. Since XGBoost and LightGBM are very similar in their architecture, I have also been trying to implement LightGBMLSS. It is working, but I haven't figured out how to properly select LightGBM's hyperparameters, as performance is way off compared to XGBoost. But maybe I need to revisit the concept.
from catboostlss.
Hey @StatMixedML
Normally when I use LightGBM the defaults work relatively well for most cases. In fact superior to XGBoost and Catboost defaults. The reason I asked is that most of our algorithms in production rely one way or another on LightGBM (it is just so fast and robust), now optimised with Optuna, and we use quantile regression to get confidence intervals. I would be more than happy to replace it with a proper probabilistic approach.
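For context on the quantile-regression setup mentioned above: intervals of this kind usually come from fitting one model per quantile with the pinball loss (LightGBM exposes this as its quantile objective). The snippet below is only a plain-Python sketch of that loss, not anyone's production code; `pinball_loss`, `best_constant` and the toy data are made up for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Pick the constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))


# Toy targets (hypothetical data): the loss minimiser sits at the empirical quantile.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
grid = [i / 2 for i in range(0, 23)]  # 0.0, 0.5, ..., 11.0
lower = best_constant(y, 0.1, grid)   # lands near the 10% quantile
upper = best_constant(y, 0.9, grid)   # lands near the 90% quantile
```

Minimising this loss at q = 0.1 and q = 0.9 gives the two ends of an 80% interval, but the two fits are independent and nothing ties them to a common distribution, which is the gap a distributional approach would close.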
@jrzaurin See also LightGBMLSS and ProbBoost. I was planning to do it anyways, so thanks for the issue.
I need to get working on that soon :-)
@StatMixedML you are a hero
Looking forward to seeing the progress (and using it!)
normally when I use LightGBM the defaults work relatively well for most cases. In fact superior to XGBoost and Catboost defaults.
Interesting. This is in fact contrary to what I have experienced, though mostly for regression tasks with a lot of categorical covariates (I haven't had many classification tasks). LightGBM appeared to be very sensitive to its hyper-parameters, and I had to do considerable hyper-parameter tuning to arrive at a decent accuracy.
@jrzaurin May I ask what set of parameters you usually set / optimize using LightGBM and what range you search over?
@StatMixedML Sure! Up until a couple of weeks ago we used hyperopt and this param space:
    from hyperopt import hp

    space = {
        "learning_rate": hp.uniform("learning_rate", 0.01, 0.3),
        "n_estimators": hp.quniform("n_estimators", 100, 1000, 50),
        "num_leaves": hp.quniform("num_leaves", 40, 400, 20),
        "min_child_samples": hp.quniform("min_child_samples", 20, 100, 20),
        "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
        "reg_alpha": hp.choice(
            "reg_alpha", [0.01, 0.05, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, 10.0]
        ),
        "reg_lambda": hp.choice(
            "reg_lambda", [0.01, 0.05, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, 10.0]
        ),
    }
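One practical detail about a space like this: hyperopt's documentation describes `hp.quniform(label, low, high, q)` as returning `round(uniform(low, high) / q) * q`, and the values come back as floats, so integer parameters such as `n_estimators` or `num_leaves` generally need a cast before they are handed to LightGBM. The sketch below mimics that behaviour with the standard `random` module only; it does not call hyperopt, and `quniform` and `sample_params` are illustrative names.

```python
import random


def quniform(rng, low, high, q):
    """Mimic hyperopt's documented quniform: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


def sample_params(rng):
    """Draw one configuration from a space shaped like the one above."""
    reg_grid = [0.01, 0.05, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, 10.0]
    return {
        "learning_rate": rng.uniform(0.01, 0.3),
        # Quantised draws are cast to int because LightGBM expects integers here.
        "n_estimators": int(quniform(rng, 100, 1000, 50)),
        "num_leaves": int(quniform(rng, 40, 400, 20)),
        "min_child_samples": int(quniform(rng, 20, 100, 20)),
        "colsample_bytree": rng.uniform(0.5, 1.0),
        "reg_alpha": rng.choice(reg_grid),
        "reg_lambda": rng.choice(reg_grid),
    }


rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [sample_params(rng) for _ in range(200)]
```

Looking at a few hundred draws like this is a cheap way to check that the quantisation steps and ranges cover what was intended before spending compute on real trials.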
Note that we do not use subsample or bagging_fraction. This is because our problem has a strong temporal component and we cannot sample rows at random. We also do not tune max_bin because, in all honesty, we do not really know how to "control" that param, so we leave it at its default value.
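The same temporal constraint usually applies to the validation split used during tuning: the validation rows should come after the training rows in time. A minimal sketch of an expanding-window split is shown here, assuming the rows are already sorted by timestamp; `expanding_window_splits` is a hypothetical helper, not part of any setup described in this thread.

```python
def expanding_window_splits(n_rows, n_folds, min_train):
    """Yield (train_idx, valid_idx) pairs where validation always follows training.

    Rows are assumed to be sorted by timestamp already.
    """
    fold_size = (n_rows - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        valid_end = train_end + fold_size
        # Train on everything up to train_end, validate on the next block.
        yield list(range(train_end)), list(range(train_end, valid_end))


splits = list(expanding_window_splits(n_rows=100, n_folds=4, min_train=20))
```

Each fold trains on a longer history than the previous one and never sees rows from its own validation block or later.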
These days I am thinking of picking 5-10 datasets and running tons of LightGBM experiments with different parameters to see how results change when you change, for example, reg_lambda. This is because I have not found many resources online that give a hint of which values are sensible.
It is for that reason that we recently changed to Optuna, which has LightGBM fully integrated. When optimising GBMs there is a hierarchy in parameters, i.e. some are more important than others (you guys being THE experts, I am sure you know all this). Optuna takes care of that and optimises following a certain hierarchy, so that you do not need to worry about the param space.
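To make the idea of tuning "following a certain hierarchy" concrete, here is a small stepwise search: one parameter is tuned at a time, its best value is frozen, and the search moves on to the next. This is only an illustration of the general idea and is not Optuna's code; the stage order, the candidate values and `toy_loss` (a stand-in for a real validation loss) are all assumptions.

```python
def stepwise_search(objective, defaults, stages):
    """Tune one parameter at a time, freezing the best value before moving on.

    `stages` is an ordered list of (name, candidate_values); earlier stages are
    the ones assumed to matter most.
    """
    best = dict(defaults)
    best_score = objective(best)
    for name, candidates in stages:
        for value in candidates:
            trial = dict(best, **{name: value})
            score = objective(trial)
            if score < best_score:
                best, best_score = trial, score
    return best, best_score


def toy_loss(params):
    """Hypothetical stand-in for a validation loss; a real one would train a model."""
    return (
        (params["colsample_bytree"] - 0.8) ** 2
        + ((params["num_leaves"] - 120) / 100.0) ** 2
        + (params["reg_lambda"] - 1.0) ** 2 / 10.0
    )


defaults = {"colsample_bytree": 1.0, "num_leaves": 31, "reg_lambda": 0.0}
stages = [
    ("colsample_bytree", [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    ("num_leaves", [40, 80, 120, 160, 200]),
    ("reg_lambda", [0.01, 0.1, 1.0, 10.0]),
]
best_params, best_loss = stepwise_search(toy_loss, defaults, stages)
```

The appeal is cost: the number of evaluations grows with the sum of the candidate counts rather than their product. The catch is that it can miss interactions between parameters, which a joint search would pick up.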
Let me know if this helps! :)
@jrzaurin Nice, thanks for sharing! The sensitivity analysis is definitely something of interest for the wider community.
Not sure if you know this site https://sites.google.com/view/lauraepp/parameters.
I am mostly using Bayesian optimization to arrive at sensible hyper-parameters. Basically, you specify an initial set of hyper-parameter combinations and evaluate the loss for each, which gives a first picture of the loss surface. A surrogate model is then trained to learn the relationship between hyper-parameters and the loss, and it suggests new values to try next. However, I still face some problems for LightGBMLSS, as you can see below.
Ideally, we should see something like this
Obviously, it doesn't learn the variance parameter well, which is odd, as the partial dependence plot shows that it gets it right.
Let me try and use your set of hyper-parameters.
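The Bayesian optimization loop described in this comment can be sketched in plain Python as follows. Everything here is an assumption made for illustration: a single hyper-parameter scaled to [0, 1], a Gaussian-process surrogate with an RBF kernel, a lower-confidence-bound rule for picking the next point, and `toy_loss` standing in for an actual model fit. It is not the tuning code used for LightGBMLSS.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_new, solve(gram, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_new, solve(gram, k_new)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(loss, init_xs, candidates, n_iter=10, kappa=2.0):
    """Minimise `loss` by repeatedly evaluating the lowest confidence bound."""
    xs = list(init_xs)
    ys = [loss(x) for x in xs]
    for _ in range(n_iter):
        mean_y = sum(ys) / len(ys)
        centred = [y - mean_y for y in ys]  # the surrogate assumes zero mean

        def lcb(x):
            mu, sd = gp_posterior(xs, centred, x)
            return mu - kappa * sd  # low predicted loss or high uncertainty

        untried = [c for c in candidates if c not in xs]
        x_next = min(untried, key=lcb)
        xs.append(x_next)
        ys.append(loss(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


def toy_loss(x):
    """Hypothetical validation loss over one hyper-parameter scaled to [0, 1]."""
    return (x - 0.3) ** 2 + 0.05 * math.sin(12 * x)


grid = [i / 50 for i in range(51)]
best_x, best_y, history = bayes_opt(toy_loss, [0.0, 0.5, 1.0], grid)
```

The `kappa` value controls the trade-off: larger values push the search towards unexplored regions, smaller values towards points the surrogate already predicts to be good.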
@StatMixedML Awesome. Let's see how it goes.
(this is still me replying from my work account :) )
And thanks, I did not know that site, it is good to have a full list in one place!
Let me read your 2019 paper at some point and see if I can be more useful.
@jrzaurin Let me close this issue and re-open it at the LightGBMLSS repo here.