yanyachen / MLmetrics
Machine Learning Evaluation Metrics
License: GNU General Public License v2.0
When I use AUC with more than 100k observations, the calculation hits an integer overflow and returns NA; see also this SO question.
For reproducibility:
set.seed(15)
N <- 100000
true <- sample(0:1, N, replace = TRUE)
pred <- sample(0:1, N, replace = TRUE)
MLmetrics::AUC(true, pred)
[1] NA
Warning message:
In n_pos * n_neg : NAs produced by integer overflow
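A workaround sketch (not MLmetrics code; just one way around the overflow): compute AUC via the rank-based Mann-Whitney statistic, coercing the counts to doubles so the n_pos * n_neg product never touches R's 32-bit integer arithmetic.
auc_rank <- function(y_true, y_prob) {
  r <- rank(y_prob)                      # mid-ranks, so tied scores are handled
  n_pos <- as.numeric(sum(y_true == 1))  # as doubles, n_pos * n_neg cannot overflow
  n_neg <- as.numeric(sum(y_true == 0))
  (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc_rank(true, pred)  # ~0.5 for the random data above, with no overflow warning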
Hello,
Firstly, thanks for developing this package. I have found some bugs in the Precision() function when all the true or predicted values equal the positive value. There is also unexpected behaviour when none of the true values are the "positive value". See below for some minimal examples:
Precision(y_true = c(1, 1, 1, 1),
y_pred = c(1, 0, 1, 0),
positive = "1")
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
Precision(y_true = c(1, 0, 1, 0),
y_pred = c(1, 1, 1, 1),
positive = "0")
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
Precision(y_true = c(0, 0, 0, 0),
y_pred = c(1, 0, 1, 0),
positive = "1")
[1] NA
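Until this is fixed, a possible workaround (a sketch, not part of MLmetrics) is to build the confusion table with explicit factor levels so it never degenerates when a class is absent from y_true or y_pred:
precision_safe <- function(y_true, y_pred, positive) {
  lv <- union(y_true, y_pred)  # every class seen in either vector
  tab <- table(factor(y_true, levels = lv), factor(y_pred, levels = lv))
  tp <- tab[positive, positive]    # true positives
  fp <- sum(tab[, positive]) - tp  # false positives
  if (tp + fp == 0) NA_real_ else tp / (tp + fp)
}
precision_safe(y_true = c(1, 1, 1, 1), y_pred = c(1, 0, 1, 0), positive = "1")  # 1
precision_safe(y_true = c(0, 0, 0, 0), y_pred = c(1, 0, 1, 0), positive = "1")  # 0, rather than NA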
Hope this helps,
First of all thanks for the great package.
I was wondering if it would be possible to have micro/macro statistics (precision, recall, F1 score) for multi-class classification in the future. For instance, as described here: https://sebastianraschka.com/faq/docs/multiclass-metric.html
Thanks!
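For reference, a minimal sketch of what a macro-averaged F1 could look like (macro_f1 is a hypothetical name, not an MLmetrics API; micro averaging would instead pool the per-class counts before computing the ratios):
macro_f1 <- function(y_true, y_pred) {
  classes <- union(y_true, y_pred)
  per_class <- sapply(classes, function(k) {
    tp <- sum(y_true == k & y_pred == k)
    fp <- sum(y_true != k & y_pred == k)
    fn <- sum(y_true == k & y_pred != k)
    p <- if (tp + fp == 0) 0 else tp / (tp + fp)
    r <- if (tp + fn == 0) 0 else tp / (tp + fn)
    if (p + r == 0) 0 else 2 * p * r / (p + r)
  })
  mean(per_class)  # unweighted mean of per-class F1 scores
}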
Example to reproduce:
x <- data.frame("C1" = c(0.1, 0.1, 0.3),
                "C2" = c(0.3, 0.3, 0.4),
                "C3" = c(0.6, 0.6, 0.3),
                "indexLabel" = c("C3", "C1", "C3"))
pred <- x[, 1:3]
MultiLogLossCustom(y_true = x$indexLabel, y_pred = pred)
we will get the following y_true:
  as.character.y_true.C1 as.character.y_true.C3
1                      0                      1
2                      1                      0
3                      0                      1
which results in a wrong y_true * log(y_pred) matrix: y_true has only two columns because "C2" never occurs in the labels, while y_pred has three.
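A workaround sketch (onehot is a hypothetical helper, not part of MLmetrics): encode y_true against the column names of y_pred, so a class that never occurs in the labels still gets an all-zero column and the two matrices stay aligned:
onehot <- function(labels, class_levels) {
  sapply(class_levels, function(k) as.numeric(labels == k))
}
y_true_mat <- onehot(x$indexLabel, colnames(pred))
y_true_mat
#      C1 C2 C3
# [1,]  0  0  1
# [2,]  1  0  0
# [3,]  0  0  1
-sum(y_true_mat * log(as.matrix(pred))) / nrow(pred)  # multi-class log loss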
Hello.
Why doesn't the MAPE function multiply by 100%?
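Assuming the question is about the missing factor of 100: the function appears to return a plain fraction and leaves the percentage scaling to the caller, e.g.:
y_true <- c(100, 200, 400)
y_pred <- c(110, 220, 440)
MLmetrics::MAPE(y_pred, y_true)        # 0.1 (a fraction)
MLmetrics::MAPE(y_pred, y_true) * 100  # 10  (as a percentage)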
Would it be possible to implement a consistent order of the y_true and y_pred arguments in the various MLmetrics functions? The inconsistency can be quite frustrating when passing unnamed arguments to these functions. For instance, MLmetrics::Accuracy() looks like this: MLmetrics::Accuracy(y_pred, y_true), while MLmetrics::Precision() looks like this: Precision(y_true, y_pred, positive = NULL).
I realize that this would be a breaking change.
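Until the signatures are unified, one defensive habit is to pass these arguments by name, which makes the positional order irrelevant:
true <- c(1, 0, 1, 1)  # illustrative data
pred <- c(1, 0, 0, 1)
MLmetrics::Accuracy(y_pred = pred, y_true = true)
MLmetrics::Precision(y_true = true, y_pred = pred, positive = "1")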
When trying to compute R-squared from predicted and actual values, the results from R2_Score() did not match other methods (they were very large and negative). Here is some R code to reproduce the issue:
set.seed(1)  # for reproducibility
x <- rnorm(20)
y <- rnorm(20)
test <- lm(y ~ x)
summary(test)
y_pred <- predict(test)
R2_Score(y, y_pred)
# reference calculations:
cor(y_pred, y)^2
PRESS <- sum((y - y_pred)^2)  # residual sum of squares
SS <- sum((y - mean(y))^2)    # total sum of squares
1 - (PRESS / SS)
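A possible explanation, assuming the documented argument order R2_Score(y_pred, y_true): the call above passes y first, so the denominator is computed from the fitted values rather than from y, which can make the result hugely negative. Passing the arguments by name should reproduce the textbook value:
R2_Score(y_pred = y_pred, y_true = y)  # should agree with 1 - (PRESS/SS)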
Hi,
I encounter the following error when trying to calculate the F1 score:
Error in FUN(X[[i]], ...): only defined on a data frame with all numeric variables
Traceback:
1. MLmetrics::F1_Score(y_true, y_pred_resp)
2. Precision(y_true, y_pred, positive)
3. Summary.data.frame(structure(list(Freq = integer(0)), row.names = integer(0), class = "data.frame"),
. na.rm = FALSE)
4. lapply(args, function(x) {
. x <- as.matrix(x)
. if (!is.numeric(x) && !is.complex(x))
. stop("only defined on a data frame with all numeric variables")
. x
. })
5. FUN(X[[i]], ...)
6. stop("only defined on a data frame with all numeric variables")
The cause in my case was that all predicted values were 1.
Reproducible example:
# this works:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- sample(c(0, 1), size = 20, replace = TRUE)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
# this is a possible real-world scenario but produces the error:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- rep(1, 20)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
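A workaround sketch that may avoid the error (this leans on an assumption about the internals, not a documented fix): supply both vectors as factors with a shared, explicit level set, so the internal confusion table keeps both classes even when only one is ever predicted:
lv <- c(0, 1)
f_one <- MLmetrics::F1_Score(factor(y_true, levels = lv),
                             factor(y_pred_resp, levels = lv),
                             positive = "1")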
Hey Yachen,
Nice work with MLmetrics :)
I noticed a typo in your CRAN documentation: you write "Possion" instead of "Poisson".
Regards,
Selim
When the positive argument is NULL, your package defaults to choosing as.character(Confusion_DF[1, 1]) as the value for positive.
From using your package a bit carelessly, it took me a while to realize it defaults to choosing "0" as the positive category when y_true and y_pred are supplied as vectors of 1s and 0s.
Since 1 tends to be coded as the positive category in most cases, I think users may expect your package to automatically default positive to 1 when numerical vectors consisting of 0s and 1s are supplied.
A fair number of careless users thus may not realize their recalls or F1 scores might be incorrect. Perhaps the package should print a message stating which category it chose as positive when that argument is left as NULL?
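A small demonstration of the silent default (the data here is illustrative):
y_true <- c(1, 1, 0, 1, 0)
y_pred <- c(1, 0, 0, 1, 1)
MLmetrics::Precision(y_true, y_pred)                  # silently scores "0" as positive
MLmetrics::Precision(y_true, y_pred, positive = "1")  # what most users likely intend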