mlr-org / mlr3tuningspaces

Collection of search spaces for hyperparameter optimization in the mlr3 ecosystem

Home Page: https://mlr3tuningspaces.mlr-org.com


mlr3tuningspaces's Introduction

mlr3tuningspaces

Package website: release | dev


mlr3tuningspaces is a collection of search spaces for hyperparameter optimization in the mlr3 ecosystem. It features ready-to-use search spaces for many popular machine learning algorithms. The search spaces are taken from scientific articles and work well on a wide range of data sets. Currently, we offer tuning spaces from three publications.

Publication                            Learner   n Hyperparameters
Bischl et al. (2021)                   glmnet     2
                                       kknn       3
                                       ranger     4
                                       rpart      3
                                       svm        4
                                       xgboost    8
Kuehn et al. (2018)                    glmnet     2
                                       kknn       1
                                       ranger     8
                                       rpart      4
                                       svm        5
                                       xgboost   13
Binder, Pfisterer, and Bischl (2020)   glmnet     2
                                       kknn       1
                                       ranger     6
                                       rpart      4
                                       svm        4
                                       xgboost   10

Resources

There are several sections about hyperparameter optimization in the mlr3book.

  • Getting started with the book section on mlr3tuningspaces.
  • Learn about search spaces.

The gallery features a collection of case studies and demos about optimization.

  • Tune a classification tree with the default tuning space from Bischl et al. (2021).

Installation

Install the latest release from CRAN:

install.packages("mlr3tuningspaces")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3tuningspaces")

Example

Quick Tuning

A learner passed to the lts() function is augmented with the default tuning space from Bischl et al. (2021).

library(mlr3tuningspaces)

learner = lts(lrn("classif.rpart"))

# tune learner on pima data set
instance = tune(
  tnr("random_search"),
  task = tsk("pima"),
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# best performing hyperparameter configuration
instance$result
##    minsplit minbucket        cp learner_param_vals  x_domain classif.ce
## 1: 1.966882  3.038246 -4.376785          <list[4]> <list[3]>  0.2265625
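
The reported values for minsplit, minbucket, and cp are on the log scale, because the default space tunes them with logscale = TRUE. A short sketch for inspecting the back-transformed values, relying on the tuning instance's result_x_domain field:

# hyperparameter values after the exp() transformation, on the learner's scale
instance$result_x_domain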

Tuning Search Spaces

The mlr_tuning_spaces dictionary contains all tuning spaces.

library("data.table")

# print keys and tuning spaces
as.data.table(mlr_tuning_spaces)
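
The dictionary can also be queried programmatically, for example to list all spaces registered for one learner. A small sketch, assuming the overview table exposes a learner column:

# all tuning spaces registered for the classification tree
tab = as.data.table(mlr_tuning_spaces)
tab[learner == "classif.rpart"]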

A key passed to the lts() function returns the TuningSpace.

tuning_space = lts("classif.rpart.rbv2")
tuning_space
## <TuningSpace:classif.rpart.rbv2>: Classification Rpart with RandomBot
##           id lower upper levels logscale
## 1:        cp 1e-04     1            TRUE
## 2:  maxdepth 1e+00    30           FALSE
## 3: minbucket 1e+00   100           FALSE
## 4:  minsplit 1e+00   100           FALSE

Get the learner with the tuning space applied.

tuning_space$get_learner()
## <LearnerClassifRpart:classif.rpart>: Classification Tree
## * Model: -
## * Parameters: xval=0, cp=<RangeTuneToken>, maxdepth=<RangeTuneToken>,
##   minbucket=<RangeTuneToken>, minsplit=<RangeTuneToken>
## * Packages: mlr3, rpart
## * Predict Types:  [response], prob
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features, twoclass, weights

Adding New Tuning Spaces

We are looking forward to new collections of tuning spaces from peer-reviewed articles. You can suggest new tuning spaces in an issue or contribute a new collection yourself in a pull request. Take a look at an already implemented collection, e.g. our default tuning spaces from Bischl et al. (2021). A TuningSpace is added to the mlr_tuning_spaces dictionary with the add_tuning_space() function. Create a tuning space for each variant of the learner, e.g. for LearnerClassifRpart and LearnerRegrRpart.

vals = list(
  minsplit  = to_tune(2, 64, logscale = TRUE),
  cp        = to_tune(1e-04, 1e-1, logscale = TRUE)
)

add_tuning_space(
  id = "classif.rpart.example",
  values = vals,
  tags = c("default", "classification"),
  learner = "classif.rpart",
  label = "Classification Tree Example"
)
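
Once registered, the new space is available under its key like any other; classif.rpart.example is the id defined above.

# retrieve the newly registered space and apply it to a fresh learner
tuning_space = lts("classif.rpart.example")
learner = tuning_space$get_learner()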

Choose a name that is related to the publication and adjust the documentation.

The reference is added to the bibentries.R file:

bischl_2021 = bibentry("misc",
  key           = "bischl_2021",
  title         = "Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges",
  author        = "Bernd Bischl and Martin Binder and Michel Lang and Tobias Pielok and Jakob Richter and Stefan Coors and Janek Thomas and Theresa Ullmann and Marc Becker and Anne-Laure Boulesteix and Difan Deng and Marius Lindauer",
  year          = "2021",
  eprint        = "2107.05847",
  archivePrefix = "arXiv",
  primaryClass  = "stat.ML",
  url           = "https://arxiv.org/abs/2107.05847"
)

We are happy to help you with the pull request if you have any questions.

References

Binder, Martin, Florian Pfisterer, and Bernd Bischl. 2020. “Collecting Empirical Data about Hyperparameters for Data Driven AutoML.” https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_63.pdf.

Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, et al. 2021. “Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges.” https://arxiv.org/abs/2107.05847.

Kuehn, Daniel, Philipp Probst, Janek Thomas, and Bernd Bischl. 2018. “Automatic Exploration of Machine Learning Experiments on OpenML.” https://arxiv.org/abs/1806.10961.


mlr3tuningspaces's Issues

Release mlr3tuningspaces 0.5.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

Add Prefix Parameter to lts for Tuning Spaces in Pipelines?

I wanted to use a search space for a pipeline and tried to rename the tuning space:

task = tsk("boston_housing")
tuner = tnr("random_search", batch_size = 3)

# Potentially large ML pipeline
xgb = as_learner(po("encode") %>>% lrn("regr.xgboost"))

search_space = lts("regr.xgboost.default")
# My attempt to rename the tuning space so that it matches the expected parameter names:
names(search_space$values) = paste0("regr.xgboost.", names(search_space$values))

at = auto_tuner(
  tuner = tuner,
  learner = xgb,
  search_space = search_space,
  resampling = rsmp("cv", folds = 2),
  measure = msr("regr.mse"),
  term_time = 20
)

at$train(task)

Maybe it could be useful to allow passing a prefix to lts, such as lts("regr.xgboost.default", param_set_prefix = "regr.xgboost.")?

The workaround is to use xgb = as_learner(po("encode") %>>% lts(lrn("regr.xgboost"))); see the sketch below. Maybe this can be mentioned in the book here?
Currently, the book only mentions that this is possible without mentioning a use case: "We could also apply the default search spaces from Bischl et al. (2023) by passing the learner to [lts()]".
Maybe one could add 1-2 more sentences to highlight that this can be useful if one wants to tune the learner parameters when the learner is combined with other PipeOps?
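
A minimal sketch of this workaround, reusing the task and tuner from the snippet above (only the construction of xgb changes):

# attach the default tuning space to the learner *before* composing the
# pipeline; the graph learner then prefixes the parameter names automatically
xgb = as_learner(po("encode") %>>% lts(lrn("regr.xgboost")))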

leanify_package

The package does not call leanify_package(); I don't know off the top of my head whether it makes sense to call it, but there are R6 classes here, so I'd assume it does.

Release mlr3tuningspaces 0.3.5

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

Default tuning spaces vs. Bischl et al. 2021 article

Hello,
I have a question. I've noticed that several default tuning spaces don't seem to correspond to those in the article by Bischl et al. 2021. Here are a few examples:

  • GLMNET
    s: [log(10^-4); log(10000)] instead of [2^-12; 2^12]

  • RPART
    minsplit: [log(2); log(128)] instead of [2^1; 2^7]
    minbucket: [log(1); log(64)] instead of [2^0; 2^6]
    cp: [log(10^-4); log(0.1)] instead of [10^-4; 10^-1]

  • KNN
    k: [log(1); log(50)] instead of [1; 50]

  • SVM
    cost and gamma: [log(10^-4); log(10000)] instead of [2^-12; 2^12]

  • XGBOOST
    eta: [log(10^-4); log(1)] instead of [10^-4; 10^0]
    lambda and alpha: [log(10^-3); log(1000)] instead of [2^-10; 2^10]

Is there a reason for this? Sorry if I misunderstood something. Thank you for your assistance.
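
For context, a probable explanation (hedged, based on how to_tune() behaves): these spaces are defined with logscale = TRUE, which stores the bounds on the log scale and attaches an exp() transformation, so the values actually passed to the learner span the untransformed ranges. This accounts for the log() wrappers, e.g. for RPART and KNN; the remaining differences (such as 10^4 vs. 2^12 for SVM and GLMNET) concern the ranges themselves.

library(mlr3tuningspaces)

# bounds are stored as [log(2), log(128)]; exp() is applied before the value
# reaches the learner, so the effective range is [2, 128] = [2^1, 2^7]
lts("classif.rpart.default")$values$minsplit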


bibentry for default tuningspaces can be updated to the paper

@Article{bischl2023hyperparameter,
  title     = {Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges},
  author    = {Bischl, Bernd and Binder, Martin and Lang, Michel and Pielok, Tobias and Richter, Jakob and Coors, Stefan and Thomas, Janek and Ullmann, Theresa and Becker, Marc and Boulesteix, Anne-Laure and others},
  journal   = {Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
  volume    = {13},
  number    = {2},
  pages     = {e1484},
  year      = {2023},
  publisher = {Wiley Online Library}
}

regr.xgboost.rbv2 and classif.xgboost.rbv2 - nrounds

Description

The setting of nrounds for regr.xgboost.rbv2 and classif.xgboost.rbv2 deviates from Kuehn et al. (2018). It should be [1, 5000] but is defined as p_dbl(lower = 2, upper = 8, trafo = function(x) as.integer(round(exp(x)))).

Reproducible example

    library(mlr3tuningspaces)
    tuning_space_xgboost <- lts("regr.xgboost.rbv2")
    tuning_space_xgboost$values$nrounds
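
A sketch of how the definition could be written to match the paper's [1, 5000] range while keeping the log-scale transformation (hypothetical, not an official fix):

    # exp(0) = 1 and exp(log(5000)) = 5000, so transformed values cover [1, 5000]
    p_dbl(lower = log(1), upper = log(5000), trafo = function(x) as.integer(round(exp(x))))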

Tuning Spaces from rbv2

Get them from here, perhaps use them with the suffix rbv2 or something?
(You might have to scroll down a little to find them. The trainsize and repl params need to be removed.)

Release mlr3tuningspaces 0.5.1

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Release mlr3tuningspaces 0.4.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Tuning spaces for PipeOps

It would be useful to have tuning spaces for at least a few mlr3pipelines PipeOps as well.
The reason is that we could then specify an AutoML system by just writing down the graph and auto-generating the relevant tuning space from the individual PipeOps and learners, essentially eliminating the need to write down the tuning space by hand; see the sketch after the open questions below.

Open Questions:

  • Should this live here or in mlr3pipelines?
  • Does this require a proof of concept that demonstrates that it makes sense?
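
Today, predefined tuning spaces attach only to learners; the proposal would extend this to PipeOps. A minimal sketch of the contrast, where lts(po(...)) is hypothetical and shown only to illustrate the idea:

library(mlr3tuningspaces)
library(mlr3pipelines)

# works today: only the learner carries a predefined tuning space
graph_learner = as_learner(po("pca") %>>% lts(lrn("classif.rpart")))

# hypothetical extension sketched in this issue (not implemented):
# graph_learner = as_learner(lts(po("pca")) %>>% lts(lrn("classif.rpart")))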
