mlr-org / mlr3measures

Performance measures used in mlr3

Home Page: https://mlr3measures.mlr-org.com

License: GNU Lesser General Public License v3.0

Languages: R (100.0%)
Topics: mlr3, machine-learning, performance-evaluation, performance-measures, r, r-package

mlr3measures's Introduction

mlr3measures

Package website: release | dev


Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are. Internally, checkmate is used to check arguments efficiently; there are no other runtime dependencies.

The function reference gives a comprehensive overview of the implemented measures.

Note that explicitly loading this package is not required if you want to use any of these measures in mlr3. Also note that we advise against attaching the package via library() to avoid namespace clashes. Instead, load the namespace via requireNamespace() and use the :: operator.
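
For illustration, a minimal sketch of both patterns (the fields of the `measures` registry entry shown here are assumptions based on the package documentation):

requireNamespace("mlr3measures")

# Compute a measure directly via the :: operator
mlr3measures::mse(truth = c(1, 2, 3), response = c(1.1, 1.9, 3.4))

# Query meta information about a measure from the registry
info = mlr3measures::measures$mse
info$lower     # best possible score (0)
info$upper     # worst possible score (Inf)
info$minimize  # TRUE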

mlr3measures's People

Contributors

andreassot10, github-actions[bot], mb706, mllg, mvanhala, pat-s


mlr3measures's Issues

Instance-wise loss

Many measures aggregate over instances, e.g. via the weighted mean in MSE here:

wmean(.se(truth, response), sample_weights)

Could we have something like an aggregate argument (default TRUE) that returns the instance-wise loss instead of the aggregated loss when set to FALSE? This would be useful, e.g., for bips-hb/cpi#13. A sketch of the proposed signature is below.

Happy to create a PR for this.
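
A minimal sketch of what this could look like for mse (the aggregate argument is the proposal, not an existing parameter):

# Proposed (hypothetical) `aggregate` argument, sketched for mse:
mse = function(truth, response, sample_weights = NULL, aggregate = TRUE, ...) {
  loss = (truth - response)^2  # instance-wise squared error
  if (!aggregate) {
    return(loss)               # per-observation losses
  }
  if (is.null(sample_weights)) mean(loss) else weighted.mean(loss, sample_weights)
}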

Measures: Flexibility and extending to new task_types

Example:
In mlr3forecasting, we could essentially reuse regr.mse to score a forecast.

Currently, this is not allowed because assert_measure compares the task_type of measure and task.

I think we need a policy for how to solve such issues in general:

  • Open up measures to more types? This would require a reflection mechanism (sketched below).
  • Re-add all applicable measures for the new type?

Additionally, in measures.R (l. 20) I found the following assertion:
type = assert_choice(type, c("binary", "classif", "regr"))

I guess this does not extend naturally to other task types.
Do we want to add measures from other packages to the measures dictionary?
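
One possible direction for the first option, sketched under the assumption that each measure carries a (hypothetical) task_types field listing all compatible task types:

# Hypothetical reflection mechanism: a measure declares all compatible task
# types, and assert_measure checks membership instead of strict equality.
assert_measure = function(measure, task = NULL) {
  if (!is.null(task) && !(task$task_type %in% measure$task_types)) {
    stop(sprintf("Measure '%s' does not support task type '%s'",
      measure$id, task$task_type))
  }
  invisible(measure)
}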

Integer overflow in auc

In auc, if n_pos * (n_pos + 1L) or n_pos * n_neg exceeds the maximum integer size, the result will be NA due to integer overflow.

set.seed(123)
truth <- factor(sample(c("Y", "N"), 250000, replace = TRUE))
prob <- runif(250000)
mlr3measures::auc(truth, prob, "Y")
#> Warning in n_pos * (n_pos + 1L): NAs produced by integer overflow
#> Warning in n_pos * n_neg: NAs produced by integer overflow
#> [1] NA
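
A straightforward fix would be to promote the counts to double before multiplying. Variable names below follow the warning messages; the exact code inside auc() may differ:

# Sketch of a possible fix: use double arithmetic to avoid integer overflow
n_pos = as.numeric(sum(truth == positive))
n_neg = as.numeric(length(truth)) - n_pos
# n_pos * (n_pos + 1) / 2 and n_pos * n_neg now stay within double precision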

PR for new measures

Description

I am not sure whether you accept PRs for new measures and, if so, whether the code below is the right way to contribute one:

#' @title Linear-exponential Loss
#'
#' @details
#' Linear-exponential, or Linex, loss takes the form \deqn{
#'   L(e) = a_{1}(\exp(-a_{2} e) - a_{2} e - 1)
#' }
#'
#' @templateVar mid linex
#' @template regr_template
#'
#' @inheritParams regr_params
#' @template regr_example
#' @export
linex = function(truth, response, a1 = 1, a2 = -1) {
  assert_regr(truth, response = response)
  if (a2 == 0) stop("Argument 'a2' can't be 0.")
  if (a1 <= 0) stop("Argument 'a1' must be greater than 0.")
  e = truth - response
  a1 * (exp(-a2 * e) - a2 * e - 1)
}

#' @include measures.R
add_measure(linex, "Linear-exponential Loss", "regr", 0, Inf, TRUE)

The function implements the Linex regression measure. Should I open a PR?
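
For reference, a quick usage sketch. Note that, as written, linex() returns a vector of instance-wise losses, so an aggregating mean() may be intended before registering it as a scalar measure:

set.seed(1)
truth    = rnorm(10)
response = truth + rnorm(10, sd = 0.5)
mean(linex(truth, response))  # aggregate the instance-wise Linex losses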


Feature Request: Pairwise Jaccard Distances

If more than two sets are provided, the mean of all pairwise scores is calculated.

It would be great to be able to get a matrix of pairs, for tasks such as hierarchical clustering and pairwise distance calculations.
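
A minimal sketch of what such a pairwise variant could return (a hypothetical helper, not part of the package):

# Hypothetical helper: full matrix of pairwise Jaccard similarities
jaccard_matrix = function(sets) {
  n = length(sets)
  m = diag(1, n)
  dimnames(m) = list(names(sets), names(sets))
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      m[i, j] = m[j, i] =
        length(intersect(sets[[i]], sets[[j]])) /
        length(union(sets[[i]], sets[[j]]))
    }
  }
  m
}

jaccard_matrix(list(a = 1:3, b = 2:4, c = 3:5))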

Element with key 'classif.mauc_aunu' not found in DictionaryMeasure!

Description

rr[[2]]$score(msr("classif.mauc_aunu"))
Error: Element with key 'classif.mauc_aunu' not found in DictionaryMeasure!


All the measures below work, but classif.mauc_aunu does not:

rr[[2]]$score(msr("classif.acc")) # Classification Accuracy
rr[[2]]$score(msr("classif.bacc")) # Balanced Accuracy
rr[[2]]$score(msr("classif.ce")) # Classification Error
rr[[2]]$score(msr("classif.logloss")) # logloss
rr[[2]]$score(msr("classif.mbrier"))

Check regression measures

  • Carefully read all implementations and check for copy-paste errors
  • Also check formulas in PDF generated with devtools::build_manual()
  • Write tests:
    • trigger all functions
    • compare results with the Metrics package (see the sketch below)
    • check that na_value is returned correctly
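
For the comparison item, a minimal test sketch against the Metrics package (assuming measure and function names correspond one-to-one, as they do for mse):

library(testthat)

test_that("mse agrees with Metrics", {
  set.seed(42)
  truth = rnorm(100)
  response = rnorm(100)
  expect_equal(
    mlr3measures::mse(truth, response),
    Metrics::mse(actual = truth, predicted = response)
  )
})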

Parameters in Measure objects

Hi, I couldn't see this answered in the book or in this GitHub repository, but forgive me if it has been answered somewhere else. What's the correct way to include parameters in a Measure object? For example, I can see that the logloss formula has a parameter eps, but how would you include this in MeasureClassifLogloss (which I can't find implemented)? Thanks
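
A hedged sketch of the usual pattern in mlr3 (assuming eps is exposed in the measure's param_set in your version, mirroring the eps argument of mlr3measures::logloss()):

library(mlr3)

# Measures in mlr3 carry a ParamSet; parameters are set via $param_set$values
m = msr("classif.logloss")
m$param_set                      # inspect the available parameters
m$param_set$values$eps = 1e-12   # assumed parameter name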

Add Precision-Recall AUC measure to mlr3measures?

I was wondering if it would be possible to add a Precision-Recall AUC measure (e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432) to mlr3measures.

I've come up with a temporary and rather sloppy solution for use with my own data, but as I note below, there's one issue I haven't been able to deal with. My code should hopefully give you some ideas.

# Define four measures:
# 1. prc_micro: Precision-Recall Area Under the Curve with aggregation method set to 'micro'.
# 2. prc_macro: Precision-Recall Area Under the Curve with aggregation method set to 'macro'.
# 3. auc_micro: ROC Area Under the Curve with aggregation method set to 'micro'.
# 4. auc_macro: ROC Area Under the Curve with aggregation method set to 'macro'.

# Packages needed for this example
library(mlr3)          # msr(), tsk(), lrn(), resample()
library(mlr3tuning)    # AutoTuner, tnr(), term()
library(paradox)       # ParamSet, ParamInt, ParamDbl, ParamFct
library(dplyr)         # slice(), mutate()
library(data.table)

# 1. prc_micro
prc_micro <- msr('classif.auc')$clone(deep = TRUE)
prc_micro # Take a look- need to change a few things (id etc.)
prc_micro$id <- 'prc_micro'
prc_micro$average <- 'micro' # Aggregation method
prc_micro$packages <- 'PRROC'
prc_micro$man <- NA_character_
prc_micro # Take another look

# 2. prc_macro
prc_macro <- prc_micro$clone()
prc_macro$id <- 'prc_macro'
prc_macro$average <- 'macro'

# 3. auc_micro
auc_micro <- msr('classif.auc')$clone()
auc_micro$id <- 'auc_micro'
auc_micro$average <- 'micro'

# 4. auc_macro
auc_macro <- msr('classif.auc')$clone()
auc_macro$id <- 'auc_macro'
auc_macro$average <- 'macro' # added for symmetry with the others ('macro' should also be the default)

# Create dataset for binary classification
iris1 <- iris %>%
  slice(1:100) %>%
  mutate(Species = factor(Species)) %>%
  as.data.table

task_iris <- TaskClassif$new("iris1", iris1, 
  target = "Species", positive = "setosa")

# Hard-code the task inside prc_micro$fun where the PR AUC is calculated. See the comments at the end for why I've hard-coded this.
prc_micro$fun <- function(task = task_iris, prob, truth, na_value = NaN, ...) { # NOTE: task hard-coded to task_iris; commented on later.
  truth1 <- ifelse(truth == task$positive, 1, 0) # Package PRROC assumes class '1' is the positive class. I've set 'setosa' as the positive class, so it needs to be mapped to '1' here.
  PRROC::pr.curve(prob, weights.class0 = truth1)[[2]] # Area under the curve computed by integration of the piecewise function
}

# Define learner, parameters etc. and auto-tune
learner <- lrn("classif.xgboost", predict_type = "prob")
resampling_inner <- rsmp("cv", folds = 3)
measures <- list(prc_micro, prc_macro, auc_micro, auc_macro)
tuner = tnr("grid_search", resolution = 4)
terminator <- term("evals", n_evals = 5)
param_set <- ParamSet$new(list(
  ParamFct$new("booster", levels = "gbtree"), 
  ParamInt$new("nrounds", lower = 1, upper = 10), 
  ParamInt$new("max_depth", lower = 3, upper = 10), 
  ParamInt$new("min_child_weight", lower = 0, upper = 10), 
  ParamDbl$new("subsample", lower = 0, upper = 1), 
  ParamDbl$new("eta", lower = 0.1, upper = 0.6),
  ParamDbl$new("colsample_bytree", lower = 0.5, upper = 1),
  ParamInt$new("gamma", lower = 0, upper = 5) # Is it integer or real?
))

at = AutoTuner$new(
  learner, 
  resampling_inner,
  measures,
  param_set, 
  terminator, 
  tuner)

resampling_outer = rsmp("cv", folds = 2)

rr = resample(task = task_iris, learner = at, 
  resampling = resampling_outer, store_models = TRUE)

rr$aggregate(measures)

# prc_micro prc_macro auc_micro auc_macro 
# 0.8835709 0.7500000 0.8758000 0.7500000 

# The derived micro metric for PR AUC matches the one from PRROC::pr.curve
pred <- as.data.table(rr$prediction())
pred$truth <- ifelse(pred$truth == 'setosa', 1, 0) # Package PRROC assumes class '1' is the positive class. I've set 'setosa' as the positive class, so it needs to be mapped to '1' here.
pr.curve(pred$prob.setosa, weights.class0 = pred$truth, curve = TRUE)[[2]]

# [1] 0.8835709

Main issues

  1. Note that I've explicitly set task = task_iris in prc_micro$fun. That's because I need to retrieve the information in task_iris$positive in order to set the positive class to '1'. How this could be done without hard-coding is beyond my current understanding of R6 (see the sketch after this list).
  2. It is evident that PRROC::pr.curve calculates a micro AUC, i.e. prc_micro (see code), so prc_micro$fun works fine. But I don't have a way of confirming that the value from prc_macro$fun is actually correct.
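
One way around the hard-coding, sketched under the assumption that the mlr3 custom-measure API passes the task to the private $.score() method when the "requires_task" property is set:

library(R6)
library(mlr3)

# Hedged sketch: a proper Measure subclass receives the task in $.score(),
# so the positive class can be looked up instead of hard-coded.
MeasurePRAUC = R6Class("MeasurePRAUC",
  inherit = MeasureClassif,
  public = list(
    initialize = function() {
      super$initialize(
        id = "classif.prauc",
        range = c(0, 1),
        minimize = FALSE,
        predict_type = "prob",
        packages = "PRROC",
        properties = "requires_task"
      )
    }
  ),
  private = list(
    .score = function(prediction, task, ...) {
      positive = task$positive
      truth1 = as.integer(prediction$truth == positive)  # PRROC wants 1 = positive
      PRROC::pr.curve(prediction$prob[, positive], weights.class0 = truth1)[[2]]
    }
  )
)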

I hope this helps.

mlr_measures_classif.costs & predict_type = "prob": Change the prob threshold

The cost-sensitive measure mlr_measures_classif.costs requires a 'response' predict type.

msr("classif.costs")
#<MeasureClassifCosts:classif.costs>
#* Packages: -
#* Range: [-Inf, Inf]
#* Minimize: TRUE
#* Properties: requires_task
#* Predict type: response

This measure seems to be working even when a learner's predict_type is set to 'prob':

# get a cost sensitive task
task = tsk("german_credit")

# cost matrix as given on the UCI page of the german credit data set
# https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
costs = matrix(c(0, 5, 1, 0), nrow = 2)
dimnames(costs) = list(truth = task$class_names, predicted = task$class_names)
print(costs)

# mlr3 needs truth in columns, predictions in rows
costs = t(costs)

# create measure which calculates the absolute costs
m = msr("classif.costs", id = "german_credit_costs", costs = costs, normalize = FALSE)

# fit models and calculate costs
learner = lrn("classif.rpart", predict_type = "prob")
rr = resample(task, learner, rsmp("cv", folds = 3))
rr$aggregate(m)

#german_credit_costs 
#               341

Is this a bug, or does the measure internally convert probabilities into classes? I guess the threshold for predicting a class as positive or negative is internally set to 0.5. Can one change this threshold?
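
My understanding (hedged): with predict_type = "prob", mlr3 also materializes a response using the default 0.5 threshold, and that response is what classif.costs consumes. The threshold can be changed on the prediction object before scoring, e.g.:

# Re-threshold the probabilities and re-score the cost measure
p = rr$prediction()
p$set_threshold(0.7)     # PredictionClassif$set_threshold()
p$score(m, task = task)  # classif.costs requires the task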
