The current code uses init_score to inject starting values, but what about modeling pr

Models with init_score about lightgbmlss HOT 8 CLOSED

statmixedml commented on August 25, 2024

Models with init_score

Comments (8)

StatMixedML commented on August 25, 2024

Thanks for your interest in the project.

The initial scores serve to initialize the boosting model, i.e., the model uses them as the starting values to boost from in the first iteration. Since we model all parameters of a distribution, LightGBMLSS currently estimates the unconditional parameter values from the data by minimizing the NLL with L-BFGS. Hence, there is no need to multiply them by a value, since they already reflect reasonable starting values.

So for a Poisson-distributed variable, the rate parameter is initialized with its unconditional estimate, and in the remaining iterations the rate parameter changes as a function of x.
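
To make the idea of an unconditional starting value concrete, here is a minimal sketch for the Poisson case. It fits a single intercept on the log scale by minimizing the NLL. Newton steps are used as a stand-in for L-BFGS so the example needs only the standard library; the function name `poisson_start_value` and the toy data are assumptions made for illustration, not LightGBMLSS code.

```python
import math

def poisson_start_value(y, n_iter=25):
    """Intercept-only Poisson fit on the log scale (rate = exp(eta)).

    Minimizes NLL(eta) = sum(exp(eta) - y_i * eta) with Newton steps.
    Newton is a stand-in for L-BFGS here; starting at eta = 0 is fine
    for this toy data but is not a robust general-purpose choice.
    """
    n = len(y)
    total = sum(y)
    eta = 0.0
    for _ in range(n_iter):
        grad = n * math.exp(eta) - total   # d NLL / d eta
        hess = n * math.exp(eta)           # d^2 NLL / d eta^2
        eta -= grad / hess
    return eta

y = [0, 2, 1, 3, 0, 4]           # hypothetical counts
eta0 = poisson_start_value(y)    # raw score every row would start from
rate0 = math.exp(eta0)           # same as the sample mean of y
```

For the Poisson distribution the minimizer has a closed form, the log of the sample mean, so the numerical fit only matters for distributions where no such shortcut exists.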

If you want to model a specific type of insurance data, it is best to transform the response prior to modeling, i.e., y/exposure as you suggested.

The weights are used to weight the gradients and Hessians, which are then used to update the parameter estimates.

Does that answer your question?

neverfox commented on August 25, 2024

I think the desired offset would take the place of the np.ones it uses now, but you'd have to know how it relates to the distribution parameters through the distribution's mean function, right? So for Gamma, you could multiply the desired init_score (offset) by the concentration starting value, but you'd have to invert it for the rate. For Gaussian, you'd multiply it by the starting mean but not the sigma, etc. That would require the ability to determine the correct operation to perform for each parameter of each distribution. Does that make sense?

neverfox commented on August 25, 2024

In a traditional context, one can either use weights + transformed response or no weights and an offset. In a non-LSS approach where predictions are in terms of the response value rather than distribution parameters, init_score serves as the place to bring in an offset. So, for example, I can predict loss severity by using losses as response and log(exposure) as init_score. My prediction will then be severity / exposure. Alternatively, I can use losses/exposure as my response and exposure as weights. That should produce similar models (though not identical in terms of the path training might take).
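
The two formulations can be compared in a small sketch. It assumes a Poisson NLL with a log link and an intercept-only model, and it uses hypothetical claim counts rather than loss amounts to keep the likelihood simple. The helper `newton_intercept` and the data are made up for illustration. In this setting both setups solve the same equation, so they agree exactly; with trees and other losses the agreement is only approximate.

```python
import math

counts   = [3, 0, 5, 1, 2]              # hypothetical responses
exposure = [1.0, 0.5, 2.0, 0.25, 1.0]   # hypothetical exposures

def newton_intercept(grad_hess, n_iter=50):
    """Fit a single intercept b by Newton steps on a given NLL."""
    b = 0.0
    for _ in range(n_iter):
        g, h = grad_hess(b)
        b -= g / h
    return b

def offset_setup(b):
    # Raw response, offset log(e_i): mean_i = e_i * exp(b), no weights.
    mu = [e * math.exp(b) for e in exposure]
    grad = sum(m - y for m, y in zip(mu, counts))
    return grad, sum(mu)

def weighted_setup(b):
    # Response y_i / e_i with weight e_i: mean = exp(b), no offset.
    mu = math.exp(b)
    grad = sum(e * (mu - y / e) for e, y in zip(exposure, counts))
    return grad, sum(e * mu for e in exposure)

b_offset = newton_intercept(offset_setup)
b_weighted = newton_intercept(weighted_setup)
rate_per_exposure = math.exp(b_offset)  # equals sum(counts) / sum(exposure)
```

Both intercepts recover the same rate per unit of exposure, which is the total count divided by the total exposure.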

As I understand your answer, LSS needs to use init_score for distribution parameters, not means, and so I'd have to resort to the losses/exposure + exposure weights approach. My initial response was based on the thought that if I were really committed to using my domain offset, I'd just need to relate it to the parameter init_scores in the proper way. For example, in Gamma the mean is alpha/beta. So if the init_scores were alpha*log(exposure) and beta/log(exposure), then I'd get parameter predictions for Gammas that were already in terms of a loss/exposure distribution rather than a raw loss distribution, without having to transform my response.
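
One possible way to relate an exposure offset to Gamma parameters is sketched below. It rests on assumptions that are not confirmed anywhere in this thread: losses scale proportionally with exposure, and both parameters use an exp response function so that raw scores live on the log scale. Under those assumptions the concentration is left alone and the rate's raw score is shifted by -log(exposure). This is a different mapping from the alpha*log(exposure) and beta/log(exposure) idea above, and it is not a LightGBMLSS feature. All names and numbers are hypothetical.

```python
import math

def gamma_mean(concentration, rate):
    """Mean of a Gamma distribution in the concentration/rate form."""
    return concentration / rate

# Hypothetical parameters of the loss-per-unit-exposure distribution.
alpha, beta = 2.0, 0.01
exposure = 0.5

# Raw scores, assuming an exp response function for both parameters.
raw_alpha = math.log(alpha)
raw_beta = math.log(beta)

# Offset: keep the concentration, shift the rate by -log(exposure).
# If Y / e ~ Gamma(alpha, beta), then Y ~ Gamma(alpha, beta / e).
raw_beta_with_offset = raw_beta - math.log(exposure)

mean_per_unit = gamma_mean(alpha, beta)
mean_total = gamma_mean(math.exp(raw_alpha), math.exp(raw_beta_with_offset))
```

With this mapping the mean of the raw loss distribution is the per-unit mean multiplied by exposure, which is the behavior an offset is meant to produce.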

Transforming the response and using weights isn't a big deal, but there are cases where you really would like to use an offset in a regression, like if I want to build a model off the prior model's predictions (even if they are distributional parameters) or if I want to start from (to use insurance again) my current rating plan's relativities.

So the thought behind the ticket is really how one might be able to take any init_score you might have for your problem space that you planned to use in a traditional LGBM context (just like you have a response variable that itself doesn't start out as a set of distribution parameters) and have it work in the same manner, transparently, with the LSS version of the model.

StatMixedML commented on August 25, 2024

@neverfox Thanks for the detailed explanation! I now better understand your problem/use-case.

I need to see how to incorporate an offset without transforming the response. I'm not sure if this is possible via init_score, since it is currently being used for the starting values of the distributional parameters.

StatMixedML commented on August 25, 2024

@neverfox I looked into it; however, I don't see how I can change the start values to a transformation of the response. So for now, the only way to do it would be to transform the response accordingly.

StatMixedML commented on August 25, 2024

Can I close this?

neverfox commented on August 25, 2024

You can. I am still thinking the issue through but it might take me some time.

StatMixedML commented on August 25, 2024

Feel free to re-open if necessary. Thanks.
