
search on hyper parameters (hep_ml, OPEN)

arogozhnikov commented on May 23, 2024
search on hyper parameters

from hep_ml.

Comments (5)

arogozhnikov commented on May 23, 2024

Hi, Simone.

You're doing something a bit strange and expecting the algorithm to do things it can't know about.

Cross-validation in machine learning is easy when you have a figure of merit (ROC AUC, MSE, classification accuracy). In that case evaluation is quite straightforward.

However, in the case of reweighting, correct validation requires 2 steps:

  • weak check: looking at 1-d distributions (or computing simple 1-d tests)
  • strong check: checking that the machine learning model used in the analysis can't discriminate between the two datasets after reweighting.

(Also, is there any reason to optimize parameters automatically?)
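
A minimal sketch of the weak check, assuming the weights predicted by the reweighter for the original sample are already available as an array. The helper name `weighted_ks` is hypothetical (it is not part of hep_ml), and the toy Gaussian samples with their exact density-ratio weights only stand in for real data and real reweighter output:

```python
import numpy as np


def weighted_ks(x_a, x_b, w_a=None, w_b=None):
    """Two-sample Kolmogorov-Smirnov distance between weighted 1-d samples."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    w_a = np.ones(len(x_a)) if w_a is None else np.asarray(w_a, dtype=float)
    w_b = np.ones(len(x_b)) if w_b is None else np.asarray(w_b, dtype=float)

    def ecdf_at(x, w, points):
        # Weighted empirical CDF of (x, w), evaluated at `points`.
        order = np.argsort(x)
        cumulative = np.concatenate(([0.0], np.cumsum(w[order]) / w.sum()))
        # Number of sample values <= each point.
        idx = np.searchsorted(x[order], points, side="right")
        return cumulative[idx]

    # Both CDFs are step functions, so the maximum gap occurs at a sample point.
    points = np.concatenate((x_a, x_b))
    gap = np.abs(ecdf_at(x_a, w_a, points) - ecdf_at(x_b, w_b, points))
    return float(gap.max())


# Toy data (an assumption for illustration): one feature, shifted Gaussians.
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=5000)
target = rng.normal(0.5, 1.0, size=5000)

# Exact density ratio N(0.5, 1) / N(0, 1); a real reweighter would estimate this.
weights = np.exp(0.5 * original - 0.125)

ks_before = weighted_ks(original, target)
ks_after = weighted_ks(original, target, w_a=weights)
print(f"KS before reweighting: {ks_before:.3f}, after: {ks_after:.3f}")
```

A small KS distance on every feature only says that the 1-d distributions agree; it does not replace the strong check.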


bifani commented on May 23, 2024

Hi,

OK, let me try to clarify what the situation is.

I have played a bit with the hyper parameters and ended up using the following configuration:

    GBReweighterPars = {"n_estimators"     : 200,
                        "learning_rate"    : 0.1,
                        "max_depth"        : 4,
                        "min_samples_leaf" : 1000,
                        "subsample"        : 1.0}

However, when I use different samples with much lower statistics, I am afraid the above are far from optimal (e.g. too many n_estimators), causing the reweighter to misbehave.
Rather than trying other settings by hand, I was wondering if there is an automated way to study this.

In particular, after creating the reweighter I compute the ROC AUC on a number of variables of interest, which I could use as a FoM.
Would that be useful?

Thanks


arogozhnikov commented on May 23, 2024

> Would that be useful?

Not really. 1-dimensional discrepancies are not the only discrepancies.

You can drive the 1-dimensional ROC AUCs to 0.5 with max_depth=1, but you won't cover any non-trivial difference between the distributions.

(Well, you can use it as a starting point and then check the results using step 2, the strong check, but this approach comes with no guarantees at all.)
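
To illustrate why 1-dimensional checks can pass while the distributions still differ, here is a toy sketch (not taken from the analysis discussed in this thread): two 2-d Gaussian samples with identical marginals but different correlation. The helper `roc_auc` is hypothetical and assumes continuous scores without ties; the correlation value 0.8 is an arbitrary choice.

```python
import numpy as np


def roc_auc(scores_a, scores_b):
    """Probability that a random score from b exceeds a random score from a.

    Rank-based (Mann-Whitney) estimate; assumes there are no tied scores.
    """
    scores = np.concatenate((scores_a, scores_b))
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_a, n_b = len(scores_a), len(scores_b)
    u_stat = ranks[n_a:].sum() - n_b * (n_b + 1) / 2
    return float(u_stat / (n_a * n_b))


rng = np.random.default_rng(0)
n = 20000

# "Original": x and y independent. "Target": x and y correlated (rho = 0.8).
# Both samples have standard normal marginals in x and in y.
original = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=n)
target = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)

# 1-d ROC AUCs, one variable at a time: close to 0.5.
auc_x = roc_auc(original[:, 0], target[:, 0])
auc_y = roc_auc(original[:, 1], target[:, 1])

# A feature that is sensitive to the correlation separates the samples.
auc_xy = roc_auc(original[:, 0] * original[:, 1], target[:, 0] * target[:, 1])

print(f"AUC(x) = {auc_x:.3f}, AUC(y) = {auc_y:.3f}, AUC(x*y) = {auc_xy:.3f}")
```

Here the per-variable AUCs are compatible with 0.5 even though no reweighting has been applied, while a multivariate model that can form something like x*y would still tell the samples apart.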


bifani commented on May 23, 2024

OK, so how do you suggest picking the hyper parameters?


arogozhnikov commented on May 23, 2024

If you really want to automate this process, you need to write an evaluation function that accounts for both steps 1) and 2) mentioned above. E.g. sum over i of KS(feature_i) + abs(ROC AUC of classifier - 0.5).
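
One possible sketch of such an evaluation function, under stated assumptions: the names `weighted_ks` and `reweighting_fom` are hypothetical, scikit-learn's `GradientBoostingClassifier` is only a stand-in for whatever model the analysis actually uses, and the two terms are summed with equal weight. Lower values are better.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def weighted_ks(x_a, x_b, w_a, w_b):
    """Two-sample KS distance between weighted 1-d samples."""
    def ecdf_at(x, w, points):
        order = np.argsort(x)
        cumulative = np.concatenate(([0.0], np.cumsum(w[order]) / w.sum()))
        return cumulative[np.searchsorted(x[order], points, side="right")]

    points = np.concatenate((x_a, x_b))
    return float(np.abs(ecdf_at(x_a, w_a, points) - ecdf_at(x_b, w_b, points)).max())


def reweighting_fom(original, target, original_weights, random_state=0):
    """Sum of per-feature KS distances plus abs(ROC AUC - 0.5) of a classifier."""
    original = np.asarray(original, dtype=float)
    target = np.asarray(target, dtype=float)
    w_orig = np.asarray(original_weights, dtype=float)
    w_orig = w_orig * len(w_orig) / w_orig.sum()  # mean weight 1
    w_targ = np.ones(len(target))

    # Step 1 (weak check): 1-d KS distance for every feature.
    ks_sum = sum(
        weighted_ks(original[:, i], target[:, i], w_orig, w_targ)
        for i in range(original.shape[1])
    )

    # Step 2 (strong check): can a classifier separate reweighted original
    # from target? Train on one half, compute the weighted AUC on the other.
    X = np.vstack((original, target))
    y = np.concatenate((np.zeros(len(original)), np.ones(len(target))))
    w = np.concatenate((w_orig, w_targ))
    X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
        X, y, w, test_size=0.5, random_state=random_state, stratify=y
    )
    clf = GradientBoostingClassifier(
        n_estimators=50, max_depth=3, random_state=random_state
    )
    clf.fit(X_tr, y_tr, sample_weight=w_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1], sample_weight=w_te)

    return ks_sum + abs(auc - 0.5)


# Toy data (an assumption for illustration): two features, shifted Gaussians.
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(2000, 2))
target = rng.normal(0.5, 1.0, size=(2000, 2))

# Exact density ratio for this toy case; a real reweighter would estimate it.
true_weights = np.exp(0.5 * original.sum(axis=1) - 0.25)

fom_unweighted = reweighting_fom(original, target, np.ones(len(original)))
fom_reweighted = reweighting_fom(original, target, true_weights)
print(f"FoM without weights: {fom_unweighted:.3f}, with weights: {fom_reweighted:.3f}")
```

In an automated search, a function like this would be evaluated once per candidate reweighter configuration, with `original_weights` taken from that reweighter's predictions.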

As for me: I pick a relatively small number of trees (30-50), select the leaf size and regularization according to the dataset, and play with depth (2-4) and learning rate (0.1-0.3). I stop when I see that I have significantly reduced the discrepancy between the datasets. There are many other errors to be accounted for in the analysis, and trying to drive only one of them to zero isn't a wise strategy.
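
For those who prefer to enumerate these ranges rather than explore them by hand, a small sketch of a candidate list. The specific grid points are assumptions picked inside the ranges quoted above; leaf size and regularization are deliberately left out because they depend on the dataset.

```python
from itertools import product

# Grid points chosen inside the ranges mentioned above (an assumption).
n_estimators_options = (30, 50)
max_depth_options = (2, 3, 4)
learning_rate_options = (0.1, 0.2, 0.3)

candidates = [
    {"n_estimators": n, "max_depth": depth, "learning_rate": rate}
    for n, depth, rate in product(
        n_estimators_options, max_depth_options, learning_rate_options
    )
]

for params in candidates:
    # Each dict holds keyword arguments for a reweighter; add a
    # dataset-specific min_samples_leaf before using it.
    print(params)
```

Each candidate still has to pass both the weak and the strong check; the list only narrows down where to look.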

