benmack / oneclass
One-class classification in the absence of test data.
License: GNU Affero General Public License v3.0
Mention functions:
consistent_ocsvm
generateThresholds
get_params_if_best_at_gridborder
Hi Benjamin
I am facing a one-class classification problem and would like to use the approach you developed in your paper "Can I trust my one-class classification?". The current vignette does not seem to explain how to create the diagnostic plots (Fig. 1). Could you help me with that?
Cheers
Franz
Hello,
When running trainOcc in parallel, the following warning was raised. It did not appear when running without parallel mode. A reproducible R example is attached for your reference. Thanks!
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
library(oneClass)
library(tidyverse)
# register number of CPU cores used
doParallel::registerDoParallel(7)
# get the banana dataset
library(imbalance)
data(banana)
input_data <- banana
# this is the default setting of trControl in trainOcc
cntrl <- trainControl(method = "cv",
                      number = 5,
                      summaryFunction = puSummary, #!
                      classProbs = TRUE, #!
                      savePredictions = TRUE, #!
                      returnResamp = "all", #!
                      allowParallel = TRUE)
tocc <- trainOcc(x = input_data[, -3], y = input_data[, 3], trControl = cntrl, method = "ocsvm")
Setting direction: controls > cases
Warning messages:
1: In .positiveLabel(y) : Positive label not given explicitly.
The positive class is assumed to be the one with smaller frequency.
2 (pos): 0 samples
2 (un): 2640 samples
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
tocc
one-class svm
2640 samples
2 predictor
2 classes: 'un', 'pos'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 2111, 2112, 2112, 2113, 2112
Resampling results across tuning parameters:
sigma nu tpr puP ppp puAuc puF puF1 pn
1e-03 0.01 NaN 0 NaN 0 0 0 NaN
1e-03 0.05 NaN 0 NaN 0 0 0 NaN
1e-03 0.10 NaN 0 NaN 0 0 0 NaN
1e-03 0.15 NaN 0 NaN 0 0 0 NaN
1e-03 0.20 NaN 0 NaN 0 0 0 NaN
1e-03 0.25 NaN 0 NaN 0 0 0 NaN
1e-02 0.01 NaN 0 NaN 0 0 0 NaN
1e-02 0.05 NaN 0 NaN 0 0 0 NaN
1e-02 0.10 NaN 0 NaN 0 0 0 NaN
1e-02 0.15 NaN 0 NaN 0 0 0 NaN
1e-02 0.20 NaN 0 NaN 0 0 0 NaN
1e-02 0.25 NaN 0 NaN 0 0 0 NaN
1e-01 0.01 NaN 0 NaN 0 0 0 NaN
1e-01 0.05 NaN 0 NaN 0 0 0 NaN
1e-01 0.10 NaN 0 NaN 0 0 0 NaN
1e-01 0.15 NaN 0 NaN 0 0 0 NaN
1e-01 0.20 NaN 0 NaN 0 0 0 NaN
1e-01 0.25 NaN 0 NaN 0 0 0 NaN
1e+00 0.01 NaN 0 NaN 0 0 0 NaN
1e+00 0.05 NaN 0 NaN 0 0 0 NaN
1e+00 0.10 NaN 0 NaN 0 0 0 NaN
1e+00 0.15 NaN 0 NaN 0 0 0 NaN
1e+00 0.20 NaN 0 NaN 0 0 0 NaN
1e+00 0.25 NaN 0 NaN 0 0 0 NaN
1e+01 0.01 NaN 0 NaN 0 0 0 NaN
1e+01 0.05 NaN 0 NaN 0 0 0 NaN
1e+01 0.10 NaN 0 NaN 0 0 0 NaN
1e+01 0.15 NaN 0 NaN 0 0 0 NaN
1e+01 0.20 NaN 0 NaN 0 0 0 NaN
1e+01 0.25 NaN 0 NaN 0 0 0 NaN
1e+02 0.01 NaN 0 NaN 0 0 0 NaN
1e+02 0.05 NaN 0 NaN 0 0 0 NaN
1e+02 0.10 NaN 0 NaN 0 0 0 NaN
1e+02 0.15 NaN 0 NaN 0 0 0 NaN
1e+02 0.20 NaN 0 NaN 0 0 0 NaN
1e+02 0.25 NaN 0 NaN 0 0 0 NaN
puF was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.001 and nu = 0.01.
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] oneClass_0.5.0 kernlab_0.9-29 pROC_1.16.2 caret_6.0-86 lattice_0.20-41 forcats_0.5.0
[7] stringr_1.4.0 dplyr_0.8.5 purrr_0.3.4 readr_1.3.1 tidyr_1.0.2 tibble_3.0.1
[13] ggplot2_3.3.0 tidyverse_1.3.0 imbalance_1.0.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 raster_3.1-5 xml2_1.3.1 magrittr_1.5 MASS_7.3-51.5
[6] splines_3.6.3 hms_0.5.3 rvest_0.3.5 tidyselect_1.0.0 colorspace_1.4-1
[11] R6_2.4.1 rlang_0.4.5 foreach_1.5.0 fansi_0.4.1 rgdal_1.4-8
[16] parallel_3.6.3 broom_0.5.6 dismo_1.1-4 gower_0.2.1 dbplyr_1.4.3
[21] modelr_0.1.6 withr_2.2.0 spatial.tools_1.6.2 ellipsis_0.3.0 iterators_1.0.12
[26] class_7.3-15 recipes_0.1.10 abind_1.4-5 assertthat_0.2.1 lifecycle_0.2.0
[31] Matrix_1.2-18 haven_2.2.0 mmap_0.6-19 sp_1.4-1 compiler_3.6.3
[36] cellranger_1.1.0 pillar_1.4.3 scales_1.1.0 backports_1.1.6 generics_0.0.2
[41] stats4_3.6.3 lubridate_1.7.8 jsonlite_1.6.1 pkgconfig_2.0.3 smotefamily_1.3.1
[46] rstudioapi_0.11 doParallel_1.0.15 munsell_0.5.0 prodlim_2019.11.13 httr_1.4.1
[51] plyr_1.8.6 tools_3.6.3 grid_3.6.3 nnet_7.3-12 ipred_0.9-9
[56] nlme_3.1-144 timeDate_3043.102 data.table_1.12.8 gtable_0.3.0 DBI_1.1.0
[61] cli_2.0.2 readxl_1.3.1 yaml_2.2.1 survival_3.1-12 crayon_1.3.4
[66] lava_1.6.7 reshape2_1.4.4 ModelMetrics_1.2.2.2 codetools_0.2-16 fs_1.4.1
[71] vctrs_0.2.4 rpart_4.1-15 glue_1.4.0 reprex_0.3.0 stringi_1.4.6
I have multiple .R pipelines from 2018 and 2019 that relied on oneClass and are currently not working at the evaluateOcc() step.
In fact, I just tried to run the notebook example from the package in a fresh installation of R and got the same error.
> ocsvm.ev <- evaluateOcc(ocsvm.fit, te.u=te.x, te.y=te.y, allModels=TRUE, positive=1)
Error in if ((class(newdata) == "rasterTiled" | .is.raster(newdata)) & :
the condition has length > 1
I have been trying to debug this for a few days without success. Since the issue affects not only my datasets but also the examples provided with the package, I wonder whether this is a new incompatibility between the evaluateOcc function and the packages it relies on.
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 oneClass_0.5.0
[5] kernlab_0.9-31 pROC_1.18.0 caret_6.0-93 lattice_0.20-45
[9] e1071_1.7-12 zoo_1.8-11 ggplot2_3.3.6 maptools_1.1-5
[13] raster_3.6-3 sp_1.5-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lubridate_1.8.0 listenv_0.8.0
[4] class_7.3-20 assertthat_0.2.1 digest_0.6.29
[7] ipred_0.9-13 utf8_1.2.2 parallelly_1.32.1
[10] R6_2.5.1 plyr_1.8.7 hardhat_1.2.0
[13] stats4_4.2.1 pillar_1.8.1 rlang_1.0.6
[16] rstudioapi_0.14 data.table_1.14.4 rpart_4.1.19
[19] Matrix_1.5-1 splines_4.2.1 rgdal_1.5-32
[22] foreign_0.8-83 gower_1.0.0 stringr_1.4.1
[25] munsell_0.5.0 proxy_0.4-27 compiler_4.2.1
[28] pkgconfig_2.0.3 dismo_1.3-9 globals_0.16.1
[31] nnet_7.3-18 mmap_0.6-19 tidyselect_1.2.0
[34] gridExtra_2.3 tibble_3.1.8 prodlim_2019.11.13
[37] codetools_0.2-18 viridisLite_0.4.1 fansi_1.0.3
[40] future_1.28.0 dplyr_1.0.10 withr_2.5.0
[43] MASS_7.3-58.1 recipes_1.0.2 ModelMetrics_1.2.2.2
[46] grid_4.2.1 nlme_3.1-160 gtable_0.3.1
[49] lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3
[52] scales_1.2.1 future.apply_1.9.1 cli_3.4.1
[55] stringi_1.7.8 reshape2_1.4.4 viridis_0.6.2
[58] timeDate_4021.106 spatial.tools_1.6.2 generics_0.1.3
[61] vctrs_0.4.2 lava_1.7.0 tools_4.2.1
[64] glue_1.6.2 purrr_0.3.5 abind_1.4-5
[67] survival_3.4-0 colorspace_2.0-3 terra_1.6-17
Best,
J
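A note on the "the condition has length > 1" error reported above: since R 4.0.0 a matrix carries two classes ("matrix" and "array"), and since R 4.2.0 an if() condition with more than one element is an error rather than a warning. A comparison such as class(newdata) == "rasterTiled" therefore yields two values when newdata is a matrix. The sketch below uses base R only and reproduces that behaviour on a toy matrix. It does not call any oneClass internals, and the guard at the end is a hypothetical illustration, not the package's actual code.

```r
# Base R only; no oneClass functions are called here.
newdata <- matrix(1:4, nrow = 2)

# In R >= 4.0.0 this is c("matrix", "array"), so the comparison has length 2.
cls_cmp <- class(newdata) == "rasterTiled"
length(cls_cmp)

# inherits() always returns a single TRUE/FALSE, which is safe inside if().
is_tiled <- inherits(newdata, "rasterTiled")

# Hypothetical guard mirroring the shape of the failing condition.
if (is_tiled) {
  message("tiled raster input")
} else {
  message("matrix or data frame input")
}
```

If this is indeed the cause, one possible workaround is to replace the class() comparison with inherits() in a local copy of the function. That is an assumption based on the error message alone.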
Following the vignette example, I found an issue when adding the trainControl option of the caret package.
tr.control <-trainControl(method="repeatedCV",number=10,repeats=5)
ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="ocsvm", tuneGrid=tuneGrid, index=tr.index,trControl = tr.control)
Warning message: The metric puF was not in the result set. Accuracy will be used instead.
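A possible explanation, offered as an assumption rather than a confirmed diagnosis: the puF metric is produced by the puSummary summary function, and a trainControl object built without it falls back to caret's default summary, which reports Accuracy. The sketch below extends the control object from this report with the settings that another report on this page describes as the trainOcc defaults (summaryFunction, classProbs, savePredictions, returnResamp). It uses "repeatedcv", the spelling given in the caret documentation, and it requires the caret and oneClass packages to be installed.

```r
library(caret)     # provides trainControl()
library(oneClass)  # provides puSummary, as used elsewhere on this page

# Repeated 10-fold CV, keeping the PU-specific settings so that puF is computed.
tr.control <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5,
                           summaryFunction = puSummary,
                           classProbs = TRUE,
                           savePredictions = TRUE,
                           returnResamp = "all")
```

The object can then be passed as trControl = tr.control in the trainOcc call shown above. Whether it interacts cleanly with the index argument has not been checked here.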
Hello, I have a question about the first example in "One-class classification in R with the oneClass package". Do you train a model with the 20 positive samples, use the 500 unlabeled samples to calculate the performance, and then select the best model?
Thank you
Hi Benjamin,
Following the example, I have a problem when using the maxent method. I just replaced "ocsvm" with "maxent", but it did not work.
ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="maxent", index=tr.index)
Error in { :
task 1 failed - "arguments imply differing number of rows: 0,
Could you help me with that?
Cheers
Pulni
Hi,
I have a question about the parameters you set in your code.
The bsvm method has three parameters, "sigma", "cNeg", and "cMultiplier", which are passed to the ksvm function as shown below:
fit = function(x, y, wts, param, lev, last, weights,
               classProbs, ...) {
  cPos <- param$cNeg * param$cMultiplier
  ksvm(x = as.matrix(x), y = y,
       kernel = rbfdot,
       kpar = list(sigma = param$sigma),
       C = 1,
       class.weights = c("un" = param$cNeg, "pos" = cPos),
       prob.model = FALSE, #=class.probs
       ...)
}
However, I saw that you set the cost C to the default value 1, so C+ and C- seem to come from class.weights, with C+ = C- * cMultiplier. To my understanding, what matters for class.weights is the ratio between "un" and "pos". In your code this ratio reduces to 1/cMultiplier, so perhaps the separate C+ and C- values only extend the range of class.weights values. What I mean is: would it be possible to use a single parameter to define the class.weights values in the grid search, rather than two? Moreover, I was wondering whether we also need to tune the cost C, instead of leaving it at the default of 1.
In the paper "Building text classifiers using positive and unlabeled examples", the authors used the SVMlight package. They controlled C+ and C- through the parameters c (cost) and j (I guess this corresponds to class.weights), where c is C- and j = C+/C-. A few papers have also implemented BSVM with the e1071 package, and they also tuned "cost" and "class.weights", for example "Single-Species Detection With Airborne Imaging Spectroscopy Data: A Comparison of Support Vector Techniques".
Looking forward to your reply!
Thanks in advance and best wishes
Pulni
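To make the parameter question above concrete, here is a small base R sketch of the arithmetic. It assumes that class.weights in ksvm scale the cost C per class, as class weights do in LIBSVM. Under that assumption the effective costs are C- = C * cNeg and C+ = C * cNeg * cMultiplier, so with C = 1 the pair (cNeg, cMultiplier) would cover the same space as SVMlight's (c, j). The grid values are hypothetical and chosen only for illustration.

```r
# Hypothetical tuning grid; values are illustrative only.
grid <- expand.grid(cNeg = 2^(-2:2), cMultiplier = 2^(1:3))

C <- 1  # fixed in the quoted fit function

# Effective per-class costs, ASSUMING class.weights multiply C per class.
grid$C_un  <- C * grid$cNeg
grid$C_pos <- C * grid$cNeg * grid$cMultiplier

# SVMlight-style parameters: c = C- and j = C+ / C-.
grid$c <- grid$C_un
grid$j <- grid$C_pos / grid$C_un

head(grid)
```

Under this reading, varying cNeg already changes the absolute cost level, so tuning C as well would rescale the same two quantities. Whether kernlab applies the weights exactly this way should be checked against its documentation.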