
MLmetrics's People

Contributors

yanyachen


MLmetrics's Issues

Wrong logloss if `y_pred` is not a matrix and does not contain all the possible classes

Example to reproduce:

x <- data.frame("C1" = c(0.1, 0.1, 0.3),
                "C2" = c(0.3, 0.3, 0.4),
                "C3" = c(0.6, 0.6, 0.3),
                "indexLabel" = c("C3", "C1", "C3"))
pred <- x[, 1:3]

MultiLogLossCustom(y_true = x$indexLabel, y_pred = pred)

we get the following internal y_true matrix:

  as.character.y_true.C1 as.character.y_true.C3
1                      0                      1
2                      1                      0
3                      0                      1

which results in a wrong y_true * log(y_pred) matrix: the one-hot encoding has no column for class C2, so its columns no longer line up with those of y_pred.
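
A possible workaround (a sketch, not MLmetrics code) is to one-hot encode y_true over the full set of classes in y_pred, so that classes absent from y_true still get a zero column and the two matrices stay aligned:

# Sketch: one-hot encode y_true over all columns of y_pred,
# so a class missing from y_true still yields a zero column.
x <- data.frame("C1" = c(0.1, 0.1, 0.3),
                "C2" = c(0.3, 0.3, 0.4),
                "C3" = c(0.6, 0.6, 0.3),
                "indexLabel" = c("C3", "C1", "C3"))
pred <- as.matrix(x[, 1:3])

classes <- colnames(pred)
y_true_onehot <- sapply(classes, function(cl) as.numeric(x$indexLabel == cl))

# multiclass log loss with the aligned one-hot matrix
-mean(rowSums(y_true_onehot * log(pred)))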

The defaults of the "positive" argument

When the positive argument is NULL, your package defaults to choosing as.character(Confusion_DF[1, 1]) as the value for positive.

Having used the package a bit carelessly, it took me a while to realize that it defaults to choosing "0" as the positive category when y_true and y_pred are supplied as vectors of 1s and 0s.

Since 1 tends to be coded as the positive category in most cases, I think users may expect the package to automatically default positive to 1 when numeric vectors of 0s and 1s are supplied.

A fair number of careless users may thus not realize that their recall or F1 scores are incorrect. Perhaps the package should print a message saying which category it chose as positive when that argument is left as NULL?
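
A minimal illustration of the pitfall (using made-up vectors); passing positive explicitly sidesteps the default entirely:

y_true <- c(0, 1, 1, 1)
y_pred <- c(0, 1, 1, 0)

# positive left NULL: the first level, "0", is silently
# treated as the positive class
MLmetrics::Recall(y_true, y_pred)

# explicit positive gives the recall most users expect
MLmetrics::Recall(y_true, y_pred, positive = "1")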

Error when all predicted values are 1 in the F1_Score function

Hi,

I encounter the error:

Error in FUN(X[[i]], ...): only defined on a data frame with all numeric variables
Traceback:

1. MLmetrics::F1_Score(y_true, y_pred_resp)
2. Precision(y_true, y_pred, positive)
3. Summary.data.frame(structure(list(Freq = integer(0)), row.names = integer(0), class = "data.frame"), 
 .     na.rm = FALSE)
4. lapply(args, function(x) {
 .     x <- as.matrix(x)
 .     if (!is.numeric(x) && !is.complex(x)) 
 .         stop("only defined on a data frame with all numeric variables")
 .     x
 . })
5. FUN(X[[i]], ...)
6. stop("only defined on a data frame with all numeric variables")

when I try to calculate the F1 score. The cause in my case was that all predicted values were 1.

Reproducible example:

# this works:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- sample(c(0, 1), size = 20, replace = TRUE)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
# this is a possible real-world scenario but produces the error:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- rep(1, 20)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
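
Until this is fixed, a count-based F1 (a sketch, not the package's implementation) tolerates degenerate predictions, because an empty confusion-table cell simply counts as zero:

# Sketch: F1 computed directly from counts, so an all-ones
# prediction vector cannot produce a malformed confusion table.
f1_manual <- function(y_true, y_pred, positive = 1) {
  tp <- sum(y_true == positive & y_pred == positive)
  fp <- sum(y_true != positive & y_pred == positive)
  fn <- sum(y_true == positive & y_pred != positive)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

y_true <- sample(c(0, 1), size = 20, replace = TRUE)
f1_manual(y_true, rep(1, 20))  # runs without error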

Inconsistent order of y_true and y_pred arguments

Would it be possible to implement a consistent order of the y_true and y_pred arguments in the various MLmetrics functions? The inconsistency can be quite frustrating when passing unnamed arguments to these functions.

For instance, MLmetrics::Accuracy() looks like this: MLmetrics::Accuracy(y_pred, y_true), while MLmetrics::Precision() looks like this: Precision(y_true, y_pred, positive = NULL).

I realize that this would be a breaking change.
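
In the meantime, always naming the arguments (shown here with hypothetical truth/pred vectors) makes the positional inconsistency harmless:

truth <- c(1, 0, 1, 1)
pred <- c(1, 0, 0, 1)

# named arguments cannot be silently swapped
MLmetrics::Accuracy(y_pred = pred, y_true = truth)
MLmetrics::Precision(y_true = truth, y_pred = pred, positive = "1")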

Inconsistencies and errors with `Precision()`

Hello,

Firstly, thanks for developing this package. I have found some bugs in the Precision() function when all of the true values or all of the predicted values equal the positive value. There is also unexpected behaviour when none of the true values is the positive value.

See below for a minimal example:

Precision(y_true = c(1, 1, 1, 1),
          y_pred = c(1, 0, 1, 0),
          positive = "1")
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Precision(y_true = c(1, 0, 1, 0),
          y_pred = c(1, 1, 1, 1),
          positive = "0")
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Precision(y_true = c(0, 0, 0, 0),
          y_pred = c(1, 0, 1, 0),
          positive = "1")
[1] NA
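
As with F1_Score above, a precision computed directly from counts (a sketch, not the package's code) behaves sensibly in all three cases:

precision_manual <- function(y_true, y_pred, positive) {
  tp <- sum(y_true == positive & y_pred == positive)
  fp <- sum(y_true != positive & y_pred == positive)
  if (tp + fp == 0) return(NA_real_)  # no positive predictions at all
  tp / (tp + fp)
}

precision_manual(c(1, 1, 1, 1), c(1, 0, 1, 0), positive = "1")  # 1
precision_manual(c(1, 0, 1, 0), c(1, 1, 1, 1), positive = "0")  # NA
precision_manual(c(0, 0, 0, 0), c(1, 0, 1, 0), positive = "1")  # 0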

Hope this helps,

R2_Score() is inconsistent with other methods

When trying to compute R-squared from predicted and actual values, the results from R2_Score() did not match other methods (they were very large and negative).

Here is some R code to reproduce this issue:

x <- rnorm(20)
y <- rnorm(20)

test <- lm(y ~ x)
summary(test)
y_pred <- predict(test)

# MLmetrics
R2_Score(y, y_pred)

# squared correlation
cor(y_pred, y)^2

# sums of squares formula
PRESS <- sum((y - y_pred)^2)
SS <- sum((y - mean(y))^2)
1 - (PRESS / SS)
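
This looks like the argument-order issue reported above: R2_Score() takes the prediction first, i.e. R2_Score(y_pred, y_true), so the call above passes the two vectors swapped and the total sum of squares is computed around the wrong vector. With the documented order the result should agree with the sums-of-squares formula:

# with the documented order R2_Score(y_pred, y_true),
# the result matches 1 - (PRESS / SS) above
MLmetrics::R2_Score(y_pred, y)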

F1 score output

Hi, I am wondering how you calculate the F1 score. Wikipedia says:

[image of the Wikipedia formula: F1 = 2 * precision * recall / (precision + recall)]

Here is how I would calculate F1 (using some dummy data), yet the Wikipedia version and the MLmetrics version give me different values.

[screenshot of the dummy-data calculation]

Thank you.
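
For reference, computing both versions on made-up data (a sketch; the mismatch is often the positive-class default discussed above) shows them agreeing once the positive class is pinned down:

y_true <- c(1, 1, 0, 1, 0, 1)
y_pred <- c(1, 0, 0, 1, 1, 1)

tp <- sum(y_true == 1 & y_pred == 1)
fp <- sum(y_true == 0 & y_pred == 1)
fn <- sum(y_true == 1 & y_pred == 0)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)

# Wikipedia formula
2 * precision * recall / (precision + recall)

# MLmetrics, with the positive class made explicit
MLmetrics::F1_Score(y_true, y_pred, positive = "1")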

Integer Overflow in AUC

When I use AUC with more than 100k observations, the calculation hits an integer overflow and returns NA; see also this SO question.

For reproducibility:

set.seed(15)
N <- 100000

true <- sample(0:1, N, replace = TRUE)
pred <- sample(0:1, N, replace = TRUE)

MLmetrics::AUC(true, pred)
[1] NA
Warning message:
In n_pos * n_neg : NAs produced by integer overflow
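
With 100k balanced observations, n_pos * n_neg is roughly 50000 * 50000 = 2.5e9, which exceeds .Machine$integer.max (2147483647), hence the overflow. A rank-based AUC in double precision (a sketch of the usual Mann-Whitney formulation, not the package's code) avoids it:

# Sketch: rank-based AUC with counts promoted to double
# before multiplying, so n_pos * n_neg cannot overflow.
auc_manual <- function(y_true, y_pred) {
  r <- rank(y_pred)
  n_pos <- as.numeric(sum(y_true == 1))
  n_neg <- as.numeric(sum(y_true == 0))
  (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(15)
N <- 100000
true <- sample(0:1, N, replace = TRUE)
pred <- sample(0:1, N, replace = TRUE)
auc_manual(true, pred)  # ~0.5, no overflow warning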
