benmack / oneclass

One-class classification in the absence of test data.

License: GNU Affero General Public License v3.0

R 17.65% Jupyter Notebook 82.33% Dockerfile 0.02%

oneclass's Issues

Enhance Intro notebook ...

Mention functions:

consistent_ocsvm
generateThresholds
get_params_if_best_at_gridborder

Reproduce example from Can I trust my one class classification

Hi Benjamin
I am facing a one-class classification problem and would like to use the approach you developed in your paper "Can I trust my one-class classification?". The current vignette does not seem to explain how to create the diagnostic plots (Fig. 1). Could you help me with that?
Cheers
Franz

Issue when running trainOcc in parallel

Hello,

When running trainOcc in parallel, the following warning appeared. It did not appear when running without parallel mode. A reproducible R script is attached for your reference. Thanks!

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

library(oneClass)
library(tidyverse)

# register the number of CPU cores to use
doParallel::registerDoParallel(7)

# get the banana dataset
library(imbalance)
data(banana)
input_data <- banana

# this is the default setting of trControl in trainOcc
cntrl <- trainControl(method = "cv",
                      number = 5,
                      summaryFunction = puSummary, #!
                      classProbs = TRUE, #!
                      savePredictions = TRUE, #!
                      returnResamp = "all", #!
                      allowParallel = TRUE)

tocc <- trainOcc(x = input_data[, -3], y = input_data[, 3], trControl = cntrl, method = "ocsvm")
Setting direction: controls > cases
Warning messages:
1: In .positiveLabel(y) : Positive label not given explicitly.
The positive class is assumed to be the one with smaller frequency.
2 (pos): 0 samples
2 (un): 2640 samples
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

tocc
one-class svm

2640 samples
2 predictor
2 classes: 'un', 'pos'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 2111, 2112, 2112, 2113, 2112
Resampling results across tuning parameters:

sigma nu tpr puP ppp puAuc puF puF1 pn
1e-03 0.01 NaN 0 NaN 0 0 0 NaN
1e-03 0.05 NaN 0 NaN 0 0 0 NaN
1e-03 0.10 NaN 0 NaN 0 0 0 NaN
1e-03 0.15 NaN 0 NaN 0 0 0 NaN
1e-03 0.20 NaN 0 NaN 0 0 0 NaN
1e-03 0.25 NaN 0 NaN 0 0 0 NaN
1e-02 0.01 NaN 0 NaN 0 0 0 NaN
1e-02 0.05 NaN 0 NaN 0 0 0 NaN
1e-02 0.10 NaN 0 NaN 0 0 0 NaN
1e-02 0.15 NaN 0 NaN 0 0 0 NaN
1e-02 0.20 NaN 0 NaN 0 0 0 NaN
1e-02 0.25 NaN 0 NaN 0 0 0 NaN
1e-01 0.01 NaN 0 NaN 0 0 0 NaN
1e-01 0.05 NaN 0 NaN 0 0 0 NaN
1e-01 0.10 NaN 0 NaN 0 0 0 NaN
1e-01 0.15 NaN 0 NaN 0 0 0 NaN
1e-01 0.20 NaN 0 NaN 0 0 0 NaN
1e-01 0.25 NaN 0 NaN 0 0 0 NaN
1e+00 0.01 NaN 0 NaN 0 0 0 NaN
1e+00 0.05 NaN 0 NaN 0 0 0 NaN
1e+00 0.10 NaN 0 NaN 0 0 0 NaN
1e+00 0.15 NaN 0 NaN 0 0 0 NaN
1e+00 0.20 NaN 0 NaN 0 0 0 NaN
1e+00 0.25 NaN 0 NaN 0 0 0 NaN
1e+01 0.01 NaN 0 NaN 0 0 0 NaN
1e+01 0.05 NaN 0 NaN 0 0 0 NaN
1e+01 0.10 NaN 0 NaN 0 0 0 NaN
1e+01 0.15 NaN 0 NaN 0 0 0 NaN
1e+01 0.20 NaN 0 NaN 0 0 0 NaN
1e+01 0.25 NaN 0 NaN 0 0 0 NaN
1e+02 0.01 NaN 0 NaN 0 0 0 NaN
1e+02 0.05 NaN 0 NaN 0 0 0 NaN
1e+02 0.10 NaN 0 NaN 0 0 0 NaN
1e+02 0.15 NaN 0 NaN 0 0 0 NaN
1e+02 0.20 NaN 0 NaN 0 0 0 NaN
1e+02 0.25 NaN 0 NaN 0 0 0 NaN

puF was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.001 and nu = 0.01.
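The warning above reports 0 samples for the positive class, which would explain the NaN columns: any metric that divides by the number of positives ends up as 0/0. The following is a minimal base-R sketch of that arithmetic. The label vector `y` is a hypothetical stand-in, not the banana data, and whether this is the actual cause inside trainOcc is an assumption.

```r
# Hypothetical label vector in which no sample is marked as positive,
# mirroring the "(pos): 0 samples" line in the warning above.
y <- factor(rep("un", 2640), levels = c("un", "pos"))
counts <- table(y)
print(counts)

# True-positive rate = TP / (TP + FN); with zero positives this is 0 / 0.
tp <- 0
fn <- 0
tpr <- tp / (tp + fn)
print(is.nan(tpr))
```

Running `table()` on the labels that are actually passed as `y` is a quick way to check whether the positive class is populated before training.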

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] oneClass_0.5.0 kernlab_0.9-29 pROC_1.16.2 caret_6.0-86 lattice_0.20-41 forcats_0.5.0
[7] stringr_1.4.0 dplyr_0.8.5 purrr_0.3.4 readr_1.3.1 tidyr_1.0.2 tibble_3.0.1
[13] ggplot2_3.3.0 tidyverse_1.3.0 imbalance_1.0.2.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 raster_3.1-5 xml2_1.3.1 magrittr_1.5 MASS_7.3-51.5
[6] splines_3.6.3 hms_0.5.3 rvest_0.3.5 tidyselect_1.0.0 colorspace_1.4-1
[11] R6_2.4.1 rlang_0.4.5 foreach_1.5.0 fansi_0.4.1 rgdal_1.4-8
[16] parallel_3.6.3 broom_0.5.6 dismo_1.1-4 gower_0.2.1 dbplyr_1.4.3
[21] modelr_0.1.6 withr_2.2.0 spatial.tools_1.6.2 ellipsis_0.3.0 iterators_1.0.12
[26] class_7.3-15 recipes_0.1.10 abind_1.4-5 assertthat_0.2.1 lifecycle_0.2.0
[31] Matrix_1.2-18 haven_2.2.0 mmap_0.6-19 sp_1.4-1 compiler_3.6.3
[36] cellranger_1.1.0 pillar_1.4.3 scales_1.1.0 backports_1.1.6 generics_0.0.2
[41] stats4_3.6.3 lubridate_1.7.8 jsonlite_1.6.1 pkgconfig_2.0.3 smotefamily_1.3.1
[46] rstudioapi_0.11 doParallel_1.0.15 munsell_0.5.0 prodlim_2019.11.13 httr_1.4.1
[51] plyr_1.8.6 tools_3.6.3 grid_3.6.3 nnet_7.3-12 ipred_0.9-9
[56] nlme_3.1-144 timeDate_3043.102 data.table_1.12.8 gtable_0.3.0 DBI_1.1.0
[61] cli_2.0.2 readxl_1.3.1 yaml_2.2.1 survival_3.1-12 crayon_1.3.4
[66] lava_1.6.7 reshape2_1.4.4 ModelMetrics_1.2.2.2 codetools_0.2-16 fs_1.4.1
[71] vctrs_0.2.4 rpart_4.1-15 glue_1.4.0 reprex_0.3.0 stringi_1.4.6

ONECLASS_ERROR.zip

evaluateOcc strange behaviour

I have multiple .R pipelines from 2018 and 2019 that relied on oneClass and are currently not working at the evaluateOcc() step.
In fact, I just tried to run the notebook example from the package in a fresh installation of R and got the same error.

> ocsvm.ev <- evaluateOcc(ocsvm.fit, te.u=te.x, te.y=te.y, allModels=TRUE, positive=1)

Error in if ((class(newdata) == "rasterTiled" | .is.raster(newdata)) & :
the condition has length > 1

I have been trying to debug this for a few days without success. Considering that the issue affects not only my datasets but also the examples provided by the package, I wonder whether this is a new incompatibility between the evaluateOcc function and the packages it relies on.
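One plausible cause, stated as an assumption rather than a confirmed diagnosis: since R 4.0.0 a matrix has the class vector c("matrix", "array"), so `class(newdata) == "rasterTiled"` yields two values, and since R 4.2.0 an `if()` condition of length greater than one is an error rather than a warning. The sketch below uses base R only; `newdata` is a hypothetical stand-in for the test data.

```r
# Hypothetical stand-in for the test predictors passed to evaluateOcc().
newdata <- matrix(1:4, nrow = 2)

# On R >= 4.0.0 class() of a matrix has two elements, so this comparison
# returns a logical vector with one entry per class name.
cmp <- class(newdata) == "rasterTiled"
print(length(cmp))

# inherits() always returns a single TRUE/FALSE, so it is safe inside if().
is_tiled <- inherits(newdata, "rasterTiled")
if (is_tiled) {
  message("tiled raster input")
} else {
  message("in-memory input")
}
```

If this is indeed the cause, running the old pipelines under an R version before 4.2.0 should turn the error back into a warning.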

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2     oneClass_0.5.0   
 [5] kernlab_0.9-31    pROC_1.18.0       caret_6.0-93      lattice_0.20-45  
 [9] e1071_1.7-12      zoo_1.8-11        ggplot2_3.3.6     maptools_1.1-5   
[13] raster_3.6-3      sp_1.5-0         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9           lubridate_1.8.0      listenv_0.8.0       
 [4] class_7.3-20         assertthat_0.2.1     digest_0.6.29       
 [7] ipred_0.9-13         utf8_1.2.2           parallelly_1.32.1   
[10] R6_2.5.1             plyr_1.8.7           hardhat_1.2.0       
[13] stats4_4.2.1         pillar_1.8.1         rlang_1.0.6         
[16] rstudioapi_0.14      data.table_1.14.4    rpart_4.1.19        
[19] Matrix_1.5-1         splines_4.2.1        rgdal_1.5-32        
[22] foreign_0.8-83       gower_1.0.0          stringr_1.4.1       
[25] munsell_0.5.0        proxy_0.4-27         compiler_4.2.1      
[28] pkgconfig_2.0.3      dismo_1.3-9          globals_0.16.1      
[31] nnet_7.3-18          mmap_0.6-19          tidyselect_1.2.0    
[34] gridExtra_2.3        tibble_3.1.8         prodlim_2019.11.13  
[37] codetools_0.2-18     viridisLite_0.4.1    fansi_1.0.3         
[40] future_1.28.0        dplyr_1.0.10         withr_2.5.0         
[43] MASS_7.3-58.1        recipes_1.0.2        ModelMetrics_1.2.2.2
[46] grid_4.2.1           nlme_3.1-160         gtable_0.3.1        
[49] lifecycle_1.0.3      DBI_1.1.3            magrittr_2.0.3      
[52] scales_1.2.1         future.apply_1.9.1   cli_3.4.1           
[55] stringi_1.7.8        reshape2_1.4.4       viridis_0.6.2       
[58] timeDate_4021.106    spatial.tools_1.6.2  generics_0.1.3      
[61] vctrs_0.4.2          lava_1.7.0           tools_4.2.1         
[64] glue_1.6.2           purrr_0.3.5          abind_1.4-5         
[67] survival_3.4-0       colorspace_2.0-3     terra_1.6-17

Best,
J

metric puF and trControl

Following the vignette example, I ran into an issue when adding a trainControl object from the caret package.

tr.control <-trainControl(method="repeatedCV",number=10,repeats=5)
ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="ocsvm", tuneGrid=tuneGrid, index=tr.index,trControl = tr.control)

Warning message: The metric puF was not in the result set. Accuracy will be used instead.
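A possible explanation, offered as an assumption: a plain trainControl object uses caret's default summary function, which does not compute puF. The sketch below adds the settings that the default trControl quoted in the parallel issue above marks with "#!". It assumes that trainOcc passes the control object through to caret unchanged, and it uses "repeatedcv", the spelling given in the caret documentation.

```r
library(caret)
library(oneClass)

# Same resampling scheme as in the issue, plus the settings from the
# default trainOcc control object shown earlier on this page.
tr.control <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5,
                           summaryFunction = puSummary,  # reports puF among other metrics
                           classProbs = TRUE,
                           savePredictions = TRUE,
                           returnResamp = "all")
```

The object can then be passed as `trControl = tr.control` in the trainOcc call above.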

Question for the tutorial

Hello, I have a question about the first example of "One-class classification in R with the oneClass package". Do you train a model with the 20 positive samples, use the 500 unlabeled samples to calculate the performance, and then select the best model?
Thank you

Why does maxent not work?

Hi Benjamin,

Following the example, I have a problem when using the maxent method. I just replaced "ocsvm" with "maxent", but it did not work.

ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="maxent", index=tr.index)
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 0,

Could you help me with that?
Cheers
Pulni

Questions about the parameters C+ and C-

Hi,
I have one question about the parameters you set in your code.

The bsvm method has three parameters, "sigma", "cNeg", and "cMultiplier", which are passed to the ksvm function as shown below:

fit = function(x, y, wts, param, lev, last, weights, 
                      classProbs, ...) {
         cPos <- param$cNeg*param$cMultiplier
         ksvm(x = as.matrix(x), y = y,
              kernel = rbfdot,
              kpar = list(sigma = param$sigma),
              C = 1,
              class.weights=c("un" = param$cNeg, "pos" = cPos),
              prob.model = FALSE, #=class.probs
              ...)

However, I see that the cost C is fixed at its default value of 1. C+ and C- seem to come from class.weights, with C+ = C- * cMultiplier. To my understanding, the class weight is the ratio "un"/"pos", which in your code reduces to 1/cMultiplier, so setting C+ and C- separately may only extend the range of class-weight values. Would it be possible to use a single parameter to cover more class-weight values in the grid search, rather than two parameters? I was also wondering whether the cost C itself should be tuned instead of being left at its default of 1.

In the paper "Building text classifiers using positive and unlabeled examples", the authors used the SVMlight package. They controlled C+ and C- through the parameters c (cost) and j (which I guess corresponds to class.weights), where c is C- and j = C+/C-. A few papers have also implemented BSVM with the e1071 package and tuned both "cost" and "class.weights", for example "Single-Species Detection With Airborne Imaging Spectroscopy Data: A Comparison of Support Vector Techniques".
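To make the comparison concrete, here is a small base-R sketch of the per-class costs implied by the two parameterisations. It assumes that each class weight multiplies the global cost C, as in LIBSVM-style weighting; that assumption is not confirmed for ksvm on this page. The helper `effective_costs` and the grid values are hypothetical.

```r
# Effective per-class costs implied by the bsvm tuning parameters,
# assuming each class weight multiplies the global cost C.
effective_costs <- function(cNeg, cMultiplier, C = 1) {
  c_minus <- C * cNeg                # cost for the unlabeled class
  c_plus  <- C * cNeg * cMultiplier  # cost for the positive class
  data.frame(cNeg = cNeg,
             cMultiplier = cMultiplier,
             C_minus = c_minus,
             C_plus = c_plus,
             j = c_plus / c_minus)   # SVMlight-style ratio C+/C-
}

# Hypothetical tuning grid.
grid <- expand.grid(cNeg = c(0.1, 1, 10), cMultiplier = c(1, 4, 16))
costs <- effective_costs(grid$cNeg, grid$cMultiplier)
print(costs)
```

Under that assumption, cNeg plays the role of SVMlight's c and cMultiplier the role of j, so the two-parameter grid already varies the absolute cost level as well as the ratio, even though C itself stays at 1.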

Looking forward to your reply!

Thanks in advance and best wishes
Pulni
