
MLmetrics's People

Contributors

yanyachen


MLmetrics's Issues

Wrong logloss if `y_pred` is not a matrix and does not contain all the possible classes

Example to reproduce:

x <- data.frame("C1" = c(0.1, 0.1, 0.3),
                "C2" = c(0.3, 0.3, 0.4),
                "C3" = c(0.6, 0.6, 0.3),
                "indexLabel" = c("C3", "C1", "C3"))
pred <- x[, 1:3]

MultiLogLossCustom(y_true = x$indexLabel, y_pred = pred)

we get the following internal y_true matrix:

  as.character.y_true.C1 as.character.y_true.C3
1                      0                      1
2                      1                      0
3                      0                      1

which results in a wrong y_true * log(y_pred) matrix: the one-hot encoding has no column for class C2, so its columns no longer line up with those of y_pred.
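
A possible workaround (a sketch, not MLmetrics code) is to one-hot encode y_true over the full set of classes in y_pred, so that classes absent from y_true still get a zero column and the two matrices stay aligned:

# Sketch: one-hot encode y_true over all columns of y_pred,
# so a class missing from y_true still yields a zero column.
x <- data.frame("C1" = c(0.1, 0.1, 0.3),
                "C2" = c(0.3, 0.3, 0.4),
                "C3" = c(0.6, 0.6, 0.3),
                "indexLabel" = c("C3", "C1", "C3"))
pred <- as.matrix(x[, 1:3])

classes <- colnames(pred)
y_true_onehot <- sapply(classes, function(cl) as.numeric(x$indexLabel == cl))

# multiclass log loss with the aligned one-hot matrix
-mean(rowSums(y_true_onehot * log(pred)))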

The defaults of the "positive" argument

When the positive argument is NULL, your package defaults to choosing as.character(Confusion_DF[1, 1]) as the value for positive.

Having used the package a bit carelessly, it took me a while to realize that it defaults to choosing "0" as the positive category when y_true and y_pred are supplied as vectors of 1s and 0s.

Since 1 tends to be coded as the positive category in most cases, I think users may expect the package to automatically default positive to 1 when numeric vectors of 0s and 1s are supplied.

A fair number of careless users may thus not realize that their recall or F1 scores are incorrect. Perhaps the package should print a message saying which category it chose as positive when that argument is left as NULL?
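
A minimal illustration of the pitfall (using made-up vectors); passing positive explicitly sidesteps the default entirely:

y_true <- c(0, 1, 1, 1)
y_pred <- c(0, 1, 1, 0)

# positive left NULL: the first level, "0", is silently
# treated as the positive class
MLmetrics::Recall(y_true, y_pred)

# explicit positive gives the recall most users expect
MLmetrics::Recall(y_true, y_pred, positive = "1")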

Error when all predicted values are 1 in the F1_Score function

Hi,

I encounter the error:

Error in FUN(X[[i]], ...): only defined on a data frame with all numeric variables
Traceback:

1. MLmetrics::F1_Score(y_true, y_pred_resp)
2. Precision(y_true, y_pred, positive)
3. Summary.data.frame(structure(list(Freq = integer(0)), row.names = integer(0), class = "data.frame"), 
 .     na.rm = FALSE)
4. lapply(args, function(x) {
 .     x <- as.matrix(x)
 .     if (!is.numeric(x) && !is.complex(x)) 
 .         stop("only defined on a data frame with all numeric variables")
 .     x
 . })
5. FUN(X[[i]], ...)
6. stop("only defined on a data frame with all numeric variables")

when I try to calculate the F1 score. The cause in my case was that all predicted values were 1.

Reproducible example:

# this works:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- sample(c(0, 1), size = 20, replace = TRUE)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
# this is a possible real-world scenario but produces the error:
y_true <- sample(c(0, 1), size = 20, replace = TRUE)
y_pred_resp <- rep(1, 20)
f_one <- MLmetrics::F1_Score(y_true, y_pred_resp, positive = 1)
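
Until this is fixed, a count-based F1 (a sketch, not the package's implementation) tolerates degenerate predictions, because an empty confusion-table cell simply counts as zero:

# Sketch: F1 computed directly from counts, so an all-ones
# prediction vector cannot produce a malformed confusion table.
f1_manual <- function(y_true, y_pred, positive = 1) {
  tp <- sum(y_true == positive & y_pred == positive)
  fp <- sum(y_true != positive & y_pred == positive)
  fn <- sum(y_true == positive & y_pred != positive)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

y_true <- sample(c(0, 1), size = 20, replace = TRUE)
f1_manual(y_true, rep(1, 20))  # runs without error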

Inconsistent order of y_true and y_pred arguments

Would it be possible to implement a consistent order of the y_true and y_pred arguments in the various MLmetrics functions? The inconsistency can be quite frustrating when passing unnamed arguments to these functions.

For instance, MLmetrics::Accuracy() looks like this: MLmetrics::Accuracy(y_pred, y_true), while MLmetrics::Precision() looks like this: Precision(y_true, y_pred, positive = NULL).

I realize that this would be a breaking change.
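
In the meantime, always naming the arguments (shown here with hypothetical truth/pred vectors) makes the positional inconsistency harmless:

truth <- c(1, 0, 1, 1)
pred <- c(1, 0, 0, 1)

# named arguments cannot be silently swapped
MLmetrics::Accuracy(y_pred = pred, y_true = truth)
MLmetrics::Precision(y_true = truth, y_pred = pred, positive = "1")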

Inconsistencies and errors with `Precision()`

Hello,

Firstly, thanks for developing this package. I have found some bugs in the Precision() function when all of the true values or all of the predicted values equal the positive value. There is also unexpected behaviour when none of the true values is the positive value.

See below for a minimal example:

Precision(y_true = c(1, 1, 1, 1),
          y_pred = c(1, 0, 1, 0),
          positive = "1")
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Precision(y_true = c(1, 0, 1, 0),
          y_pred = c(1, 1, 1, 1),
          positive = "0")
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Precision(y_true = c(0, 0, 0, 0),
          y_pred = c(1, 0, 1, 0),
          positive = "1")
[1] NA
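
As with F1_Score above, a precision computed directly from counts (a sketch, not the package's code) behaves sensibly in all three cases:

precision_manual <- function(y_true, y_pred, positive) {
  tp <- sum(y_true == positive & y_pred == positive)
  fp <- sum(y_true != positive & y_pred == positive)
  if (tp + fp == 0) return(NA_real_)  # no positive predictions at all
  tp / (tp + fp)
}

precision_manual(c(1, 1, 1, 1), c(1, 0, 1, 0), positive = "1")  # 1
precision_manual(c(1, 0, 1, 0), c(1, 1, 1, 1), positive = "0")  # NA
precision_manual(c(0, 0, 0, 0), c(1, 0, 1, 0), positive = "1")  # 0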

Hope this helps,

R2_Score() is inconsistent with other methods

When trying to compute R-squared from predicted and actual values, the results from R2_Score() did not match other methods (they were very large and negative).

Here is some R code to reproduce this issue:

x <- rnorm(20)
y <- rnorm(20)

test <- lm(y ~ x)
summary(test)
y_pred <- predict(test)

# MLmetrics
R2_Score(y, y_pred)

# squared correlation
cor(y_pred, y)^2

# sums of squares formula
PRESS <- sum((y - y_pred)^2)
SS <- sum((y - mean(y))^2)
1 - (PRESS / SS)
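
This looks like the argument-order issue reported above: R2_Score() takes the prediction first, i.e. R2_Score(y_pred, y_true), so the call above passes the two vectors swapped and the total sum of squares is computed around the wrong vector. With the documented order the result should agree with the sums-of-squares formula:

# with the documented order R2_Score(y_pred, y_true),
# the result matches 1 - (PRESS / SS) above
MLmetrics::R2_Score(y_pred, y)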

F1 score output

Hi, I am wondering how you calculate the F1 score. Wikipedia says:

[image of the Wikipedia formula: F1 = 2 * precision * recall / (precision + recall)]

Here is how I would calculate F1 (using some dummy data), yet the Wikipedia version and the MLmetrics version give me different values.

[screenshot of the dummy-data calculation]

Thank you.
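
For reference, computing both versions on made-up data (a sketch; the mismatch is often the positive-class default discussed above) shows them agreeing once the positive class is pinned down:

y_true <- c(1, 1, 0, 1, 0, 1)
y_pred <- c(1, 0, 0, 1, 1, 1)

tp <- sum(y_true == 1 & y_pred == 1)
fp <- sum(y_true == 0 & y_pred == 1)
fn <- sum(y_true == 1 & y_pred == 0)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)

# Wikipedia formula
2 * precision * recall / (precision + recall)

# MLmetrics, with the positive class made explicit
MLmetrics::F1_Score(y_true, y_pred, positive = "1")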

Integer Overflow in AUC

When I use AUC with more than 100k observations, the calculation hits an integer overflow and returns NA; see also this SO question.

For reproducibility:

set.seed(15)
N <- 100000

true <- sample(0:1, N, replace = TRUE)
pred <- sample(0:1, N, replace = TRUE)

MLmetrics::AUC(true, pred)
[1] NA
Warning message:
In n_pos * n_neg : NAs produced by integer overflow
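
With 100k balanced observations, n_pos * n_neg is roughly 50000 * 50000 = 2.5e9, which exceeds .Machine$integer.max (2147483647), hence the overflow. A rank-based AUC in double precision (a sketch of the usual Mann-Whitney formulation, not the package's code) avoids it:

# Sketch: rank-based AUC with counts promoted to double
# before multiplying, so n_pos * n_neg cannot overflow.
auc_manual <- function(y_true, y_pred) {
  r <- rank(y_pred)
  n_pos <- as.numeric(sum(y_true == 1))
  n_neg <- as.numeric(sum(y_true == 0))
  (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(15)
N <- 100000
true <- sample(0:1, N, replace = TRUE)
pred <- sample(0:1, N, replace = TRUE)
auc_manual(true, pred)  # ~0.5, no overflow warning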
