mlr-org / mlr3measures

Performance measures used in mlr3

Home Page: https://mlr3measures.mlr-org.com

License: GNU Lesser General Public License v3.0

Languages: R (100.0%)
Topics: mlr3, machine-learning, performance-evaluation, performance-measures, r, r-package

mlr3measures's Introduction

mlr3measures

Package website: release | dev


Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are. Internally, checkmate is used to check arguments efficiently; there are no other runtime dependencies.

The function reference gives a comprehensive overview of the implemented measures.

Note that explicitly loading this package is not required if you want to use any of these measures in mlr3. Also note that we advise against attaching the package via library() to avoid namespace clashes. Instead, load the namespace via requireNamespace() and use the :: operator.
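
For illustration, a minimal sketch of both patterns (the fields of the `measures` registry entry shown here are assumptions based on the package documentation):

requireNamespace("mlr3measures")

# Compute a measure directly via the :: operator
mlr3measures::mse(truth = c(1, 2, 3), response = c(1.1, 1.9, 3.4))

# Query meta information about a measure from the registry
info = mlr3measures::measures$mse
info$lower     # best possible score (0)
info$upper     # worst possible score (Inf)
info$minimize  # TRUE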

mlr3measures's People

Contributors

andreassot10, github-actions[bot], mb706, mllg, mvanhala, pat-s


mlr3measures's Issues

Instance-wise loss

Many measures aggregate over instances, e.g. via the weighted mean in MSE here:

wmean(.se(truth, response), sample_weights)

Could we have something like an aggregate argument (default TRUE) that returns the instance-wise loss instead of the aggregated loss when set to FALSE? This would be useful, e.g., for bips-hb/cpi#13. A sketch of the proposed signature is below.

Happy to create a PR for this.
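
A minimal sketch of what this could look like for mse (the aggregate argument is the proposal, not an existing parameter):

# Proposed (hypothetical) `aggregate` argument, sketched for mse:
mse = function(truth, response, sample_weights = NULL, aggregate = TRUE, ...) {
  loss = (truth - response)^2  # instance-wise squared error
  if (!aggregate) {
    return(loss)               # per-observation losses
  }
  if (is.null(sample_weights)) mean(loss) else weighted.mean(loss, sample_weights)
}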

Measures: Flexibility and extending to new task_types

Example:
In mlr3forecasting, we could essentially reuse regr.mse to score a forecast.

Currently, this is not allowed because assert_measure compares the task_type of measure and task.

I think we need a policy for how to solve such issues in general:

  • Open up measures to more types? This would require a reflection mechanism (sketched below).
  • Re-add all applicable measures for the new type?

Additionally, in measures.R (l. 20) I found the following assertion:
type = assert_choice(type, c("binary", "classif", "regr"))

I guess this does not extend naturally to other task types.
Do we want to add measures from other packages to the measures dictionary?
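
One possible direction for the first option, sketched under the assumption that each measure carries a (hypothetical) task_types field listing all compatible task types:

# Hypothetical reflection mechanism: a measure declares all compatible task
# types, and assert_measure checks membership instead of strict equality.
assert_measure = function(measure, task = NULL) {
  if (!is.null(task) && !(task$task_type %in% measure$task_types)) {
    stop(sprintf("Measure '%s' does not support task type '%s'",
      measure$id, task$task_type))
  }
  invisible(measure)
}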

Integer overflow in auc

In auc, if n_pos * (n_pos + 1L) or n_pos * n_neg exceeds the maximum integer size, the result will be NA due to integer overflow.

set.seed(123)
truth <- factor(sample(c("Y", "N"), 250000, replace = TRUE))
prob <- runif(250000)
mlr3measures::auc(truth, prob, "Y")
#> Warning in n_pos * (n_pos + 1L): NAs produced by integer overflow
#> Warning in n_pos * n_neg: NAs produced by integer overflow
#> [1] NA
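
A straightforward fix would be to promote the counts to double before multiplying. Variable names below follow the warning messages; the exact code inside auc() may differ:

# Sketch of a possible fix: use double arithmetic to avoid integer overflow
n_pos = as.numeric(sum(truth == positive))
n_neg = as.numeric(length(truth)) - n_pos
# n_pos * (n_pos + 1) / 2 and n_pos * n_neg now stay within double precision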

PR for new measures

Description

I am not sure whether you accept PRs for new measures and, if so, whether the code below is the right way to contribute one:

#' @title Linear-exponential Loss
#'
#' @details
#' Linear-exponential, or Linex, loss takes the form \deqn{
#'   L(e) = a_{1}(\exp(-a_{2} e) - a_{2} e - 1)
#' }
#'
#' @templateVar mid linex
#' @template regr_template
#'
#' @inheritParams regr_params
#' @template regr_example
#' @export
linex = function(truth, response, a1 = 1, a2 = -1) {
  assert_regr(truth, response = response)
  if (a2 == 0) stop("Argument 'a2' can't be 0.")
  if (a1 <= 0) stop("Argument 'a1' must be greater than 0.")
  e = truth - response
  a1 * (exp(-a2 * e) - a2 * e - 1)
}

#' @include measures.R
add_measure(linex, "Linear-exponential Loss", "regr", 0, Inf, TRUE)

The function implements the Linex regression measure. Should I open a PR?
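
For reference, a quick usage sketch. Note that, as written, linex() returns a vector of instance-wise losses, so an aggregating mean() may be intended before registering it as a scalar measure:

set.seed(1)
truth    = rnorm(10)
response = truth + rnorm(10, sd = 0.5)
mean(linex(truth, response))  # aggregate the instance-wise Linex losses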


Feature Request: Pairwise Jaccard Distances

If more than two sets are provided, the mean of all pairwise scores is calculated.

It would be great to be able to get a matrix of pairs, for tasks such as hierarchical clustering and pairwise distance calculations.
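
A minimal sketch of what such a pairwise variant could return (a hypothetical helper, not part of the package):

# Hypothetical helper: full matrix of pairwise Jaccard similarities
jaccard_matrix = function(sets) {
  n = length(sets)
  m = diag(1, n)
  dimnames(m) = list(names(sets), names(sets))
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      m[i, j] = m[j, i] =
        length(intersect(sets[[i]], sets[[j]])) /
        length(union(sets[[i]], sets[[j]]))
    }
  }
  m
}

jaccard_matrix(list(a = 1:3, b = 2:4, c = 3:5))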

Element with key 'classif.mauc_aunu' not found in DictionaryMeasure!

Description

rr[[2]]$score(msr("classif.mauc_aunu"))
Error: Element with key 'classif.mauc_aunu' not found in DictionaryMeasure!


All the measures below work, but classif.mauc_aunu does not:

rr[[2]]$score(msr("classif.acc")) # Classification Accuracy
rr[[2]]$score(msr("classif.bacc")) # Balanced Accuracy
rr[[2]]$score(msr("classif.ce")) # Classification Error
rr[[2]]$score(msr("classif.logloss")) # logloss
rr[[2]]$score(msr("classif.mbrier"))

Check regression measures

  • Carefully read all implementations and check for copy-paste errors
  • Also check formulas in PDF generated with devtools::build_manual()
  • Write tests:
    • trigger all functions
    • compare results with the Metrics package (see the sketch below)
    • check that na_value is returned correctly
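
For the comparison item, a minimal test sketch against the Metrics package (assuming measure and function names correspond one-to-one, as they do for mse):

library(testthat)

test_that("mse agrees with Metrics", {
  set.seed(42)
  truth = rnorm(100)
  response = rnorm(100)
  expect_equal(
    mlr3measures::mse(truth, response),
    Metrics::mse(actual = truth, predicted = response)
  )
})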

Parameters in Measure objects

Hi, I couldn't see this answered in the book or in this GitHub repository, but forgive me if it has been answered somewhere else. What's the correct way to include parameters in a Measure object? For example, I can see that the logloss formula has a parameter eps, but how would you include this in MeasureClassifLogloss (which I can't find implemented)? Thanks
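
A hedged sketch of the usual pattern in mlr3 (assuming eps is exposed in the measure's param_set in your version, mirroring the eps argument of mlr3measures::logloss()):

library(mlr3)

# Measures in mlr3 carry a ParamSet; parameters are set via $param_set$values
m = msr("classif.logloss")
m$param_set                      # inspect the available parameters
m$param_set$values$eps = 1e-12   # assumed parameter name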

Add Precision-Recall AUC measure to mlr3measures?

I was wondering if it would be possible to add a Precision-Recall AUC measure (e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432) to mlr3measures.

I've come up with a temporary and rather sloppy solution for use with my own data, but as I note below, there's one issue I haven't been able to deal with. My code should hopefully give you some ideas.

# Define four measures:
# 1. prc_micro: Precision-Recall Area Under the Curve with aggregation method set to 'micro'.
# 2. prc_macro: Precision-Recall Area Under the Curve with aggregation method set to 'macro'.
# 3. auc_micro: ROC Area Under the Curve with aggregation method set to 'micro'.
# 4. auc_macro: ROC Area Under the Curve with aggregation method set to 'macro'.

# Packages needed for this example
library(mlr3)          # msr(), tsk(), lrn(), resample()
library(mlr3tuning)    # AutoTuner, tnr(), term()
library(paradox)       # ParamSet, ParamInt, ParamDbl, ParamFct
library(dplyr)         # slice(), mutate()
library(data.table)

# 1. prc_micro
prc_micro <- msr('classif.auc')$clone(deep = TRUE)
prc_micro # Take a look- need to change a few things (id etc.)
prc_micro$id <- 'prc_micro'
prc_micro$average <- 'micro' # Aggregation method
prc_micro$packages <- 'PRROC'
prc_micro$man <- NA_character_
prc_micro # Take another look

# 2. prc_macro
prc_macro <- prc_micro$clone()
prc_macro$id <- 'prc_macro'
prc_macro$average <- 'macro'

# 3. auc_micro
auc_micro <- msr('classif.auc')$clone()
auc_micro$id <- 'auc_micro'
auc_micro$average <- 'micro'

# 4. auc_macro
auc_macro <- msr('classif.auc')$clone()
auc_macro$id <- 'auc_macro'
auc_macro$average <- 'macro' # added for symmetry with the others ('macro' should also be the default)

# Create dataset for binary classification
iris1 <- iris %>%
  slice(1:100) %>%
  mutate(Species = factor(Species)) %>%
  as.data.table

task_iris <- TaskClassif$new("iris1", iris1, 
  target = "Species", positive = "setosa")

# Hard-code the task inside prc_micro$fun where the PR AUC is calculated. See the comments at the end for why I've hard-coded this.
prc_micro$fun <- function(task = task_iris, prob, truth, na_value = NaN, ...) { # NOTE: task hard-coded to task_iris; commented on later.
  truth1 <- ifelse(truth == task$positive, 1, 0) # Package PRROC assumes class '1' is the positive class. I've set 'setosa' as the positive class, so it needs to be mapped to '1' here.
  PRROC::pr.curve(prob, weights.class0 = truth1)[[2]] # Area under the curve computed by integration of the piecewise function
}

# Define learner, parameters etc. and auto-tune
learner <- lrn("classif.xgboost", predict_type = "prob")
resampling_inner <- rsmp("cv", folds = 3)
measures <- list(prc_micro, prc_macro, auc_micro, auc_macro)
tuner = tnr("grid_search", resolution = 4)
terminator <- term("evals", n_evals = 5)
param_set <- ParamSet$new(list(
  ParamFct$new("booster", levels = "gbtree"), 
  ParamInt$new("nrounds", lower = 1, upper = 10), 
  ParamInt$new("max_depth", lower = 3, upper = 10), 
  ParamInt$new("min_child_weight", lower = 0, upper = 10), 
  ParamDbl$new("subsample", lower = 0, upper = 1), 
  ParamDbl$new("eta", lower = 0.1, upper = 0.6),
  ParamDbl$new("colsample_bytree", lower = 0.5, upper = 1),
  ParamInt$new("gamma", lower = 0, upper = 5) # Is it integer or real?
))

at = AutoTuner$new(
  learner, 
  resampling_inner,
  measures,
  param_set, 
  terminator, 
  tuner)

resampling_outer = rsmp("cv", folds = 2)

rr = resample(task = task_iris, learner = at, 
  resampling = resampling_outer, store_models = TRUE)

rr$aggregate(measures)

# prc_micro prc_macro auc_micro auc_macro 
# 0.8835709 0.7500000 0.8758000 0.7500000 

# The derived micro metric for PR AUC matches the one from PRROC::pr.curve
pred <- as.data.table(rr$prediction())
pred$truth <- ifelse(pred$truth == 'setosa', 1, 0) # Package PRROC assumes class '1' is the positive class. I've set 'setosa' as the positive class, so it needs to be mapped to '1' here.
pr.curve(pred$prob.setosa, weights.class0 = pred$truth, curve = TRUE)[[2]]

# [1] 0.8835709

Main issues

  1. Note that I've explicitly set task = task_iris in prc_micro$fun. That's because I need to retrieve the information in task_iris$positive in order to set the positive class to '1'. How this could be done without hard-coding is beyond my current understanding of R6 (see the sketch after this list).
  2. It is evident that PRROC::pr.curve calculates a micro AUC, i.e. prc_micro (see code), so prc_micro$fun works fine. But I don't have a way of confirming that the value from prc_macro$fun is actually correct.
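
One way around the hard-coding, sketched under the assumption that the mlr3 custom-measure API passes the task to the private $.score() method when the "requires_task" property is set:

library(R6)
library(mlr3)

# Hedged sketch: a proper Measure subclass receives the task in $.score(),
# so the positive class can be looked up instead of hard-coded.
MeasurePRAUC = R6Class("MeasurePRAUC",
  inherit = MeasureClassif,
  public = list(
    initialize = function() {
      super$initialize(
        id = "classif.prauc",
        range = c(0, 1),
        minimize = FALSE,
        predict_type = "prob",
        packages = "PRROC",
        properties = "requires_task"
      )
    }
  ),
  private = list(
    .score = function(prediction, task, ...) {
      positive = task$positive
      truth1 = as.integer(prediction$truth == positive)  # PRROC wants 1 = positive
      PRROC::pr.curve(prediction$prob[, positive], weights.class0 = truth1)[[2]]
    }
  )
)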

I hope this helps.

mlr_measures_classif.costs & predict_type = "prob": Change the prob threshold

The cost-sensitive measure mlr_measures_classif.costs requires a 'response' predict type.

msr("classif.costs")
#<MeasureClassifCosts:classif.costs>
#* Packages: -
#* Range: [-Inf, Inf]
#* Minimize: TRUE
#* Properties: requires_task
#* Predict type: response

This measure seems to be working even when a learner's predict_type is set to 'prob':

# get a cost sensitive task
task = tsk("german_credit")

# cost matrix as given on the UCI page of the german credit data set
# https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
costs = matrix(c(0, 5, 1, 0), nrow = 2)
dimnames(costs) = list(truth = task$class_names, predicted = task$class_names)
print(costs)

# mlr3 needs truth in columns, predictions in rows
costs = t(costs)

# create measure which calculates the absolute costs
m = msr("classif.costs", id = "german_credit_costs", costs = costs, normalize = FALSE)

# fit models and calculate costs
learner = lrn("classif.rpart", predict_type = "prob")
rr = resample(task, learner, rsmp("cv", folds = 3))
rr$aggregate(m)

#german_credit_costs 
#               341

Is this a bug, or does the measure internally convert probabilities into classes? I guess the threshold for predicting a class as positive or negative is internally set to 0.5. Can one change this threshold?
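
My understanding (hedged): with predict_type = "prob", mlr3 also materializes a response using the default 0.5 threshold, and that response is what classif.costs consumes. The threshold can be changed on the prediction object before scoring, e.g.:

# Re-threshold the probabilities and re-score the cost measure
p = rr$prediction()
p$set_threshold(0.7)     # PredictionClassif$set_threshold()
p$score(m, task = task)  # classif.costs requires the task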
