
oneclass's People

Contributors

benmack


oneclass's Issues

Reproduce example from Can I trust my one class classification

Hi Benjamin
I am facing a one-class classification problem and would like to use the approach you developed in your paper "Can I trust my one-class classification?". The current vignette does not seem to explain how to create the diagnostic plots (Fig. 1). Could you help me with that?
Cheers
Franz

Issue when running trainOcc in parallel

Hello,

When running trainOcc in parallel, the following warning is raised. It does not appear when parallel mode is off. A reproducible R script is attached for your reference. Thanks!

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.


library(oneClass)
library(tidyverse)

# register the number of CPU cores to use
doParallel::registerDoParallel(7)

# get the banana dataset

library(imbalance)
data(banana)
input_data <- banana

# this is the default setting of trControl in trainOcc
cntrl <- trainControl(method = "cv",
                      number = 5,
                      summaryFunction = puSummary, #!
                      classProbs = TRUE, #!
                      savePredictions = TRUE, #!
                      returnResamp = "all", #!
                      allowParallel = TRUE)

# columns 1-2 are the predictors, column 3 is the class label
tocc <- trainOcc(x = input_data[, -3], y = input_data[, 3], trControl = cntrl, method = "ocsvm")
Setting direction: controls > cases
Warning messages:
1: In .positiveLabel(y) : Positive label not given explicitly.
The positive class is assumed to be the one with smaller frequency.
2 (pos): 0 samples
2 (un): 2640 samples
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

tocc
one-class svm

2640 samples
2 predictor
2 classes: 'un', 'pos'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 2111, 2112, 2112, 2113, 2112
Resampling results across tuning parameters:

sigma nu tpr puP ppp puAuc puF puF1 pn
1e-03 0.01 NaN 0 NaN 0 0 0 NaN
1e-03 0.05 NaN 0 NaN 0 0 0 NaN
1e-03 0.10 NaN 0 NaN 0 0 0 NaN
1e-03 0.15 NaN 0 NaN 0 0 0 NaN
1e-03 0.20 NaN 0 NaN 0 0 0 NaN
1e-03 0.25 NaN 0 NaN 0 0 0 NaN
1e-02 0.01 NaN 0 NaN 0 0 0 NaN
1e-02 0.05 NaN 0 NaN 0 0 0 NaN
1e-02 0.10 NaN 0 NaN 0 0 0 NaN
1e-02 0.15 NaN 0 NaN 0 0 0 NaN
1e-02 0.20 NaN 0 NaN 0 0 0 NaN
1e-02 0.25 NaN 0 NaN 0 0 0 NaN
1e-01 0.01 NaN 0 NaN 0 0 0 NaN
1e-01 0.05 NaN 0 NaN 0 0 0 NaN
1e-01 0.10 NaN 0 NaN 0 0 0 NaN
1e-01 0.15 NaN 0 NaN 0 0 0 NaN
1e-01 0.20 NaN 0 NaN 0 0 0 NaN
1e-01 0.25 NaN 0 NaN 0 0 0 NaN
1e+00 0.01 NaN 0 NaN 0 0 0 NaN
1e+00 0.05 NaN 0 NaN 0 0 0 NaN
1e+00 0.10 NaN 0 NaN 0 0 0 NaN
1e+00 0.15 NaN 0 NaN 0 0 0 NaN
1e+00 0.20 NaN 0 NaN 0 0 0 NaN
1e+00 0.25 NaN 0 NaN 0 0 0 NaN
1e+01 0.01 NaN 0 NaN 0 0 0 NaN
1e+01 0.05 NaN 0 NaN 0 0 0 NaN
1e+01 0.10 NaN 0 NaN 0 0 0 NaN
1e+01 0.15 NaN 0 NaN 0 0 0 NaN
1e+01 0.20 NaN 0 NaN 0 0 0 NaN
1e+01 0.25 NaN 0 NaN 0 0 0 NaN
1e+02 0.01 NaN 0 NaN 0 0 0 NaN
1e+02 0.05 NaN 0 NaN 0 0 0 NaN
1e+02 0.10 NaN 0 NaN 0 0 0 NaN
1e+02 0.15 NaN 0 NaN 0 0 0 NaN
1e+02 0.20 NaN 0 NaN 0 0 0 NaN
1e+02 0.25 NaN 0 NaN 0 0 0 NaN

puF was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.001 and nu = 0.01.
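
Note on the output above: the warning reports 0 samples for the positive class, and every rate that depends on positives (tpr, ppp) is NaN. Whether this comes from the parallel backend or from how the labels are interpreted is not established in this report. As a minimal sketch in base R, using a hypothetical toy label vector rather than the banana data, the class counts and the label that the "smaller frequency" rule from the warning would pick can be checked before training:

```r
# Toy labels standing in for a real response vector (hypothetical data).
y <- factor(c(rep("negative", 8), rep("positive", 2)))

# Count samples per label before training.
label_counts <- table(y)
print(label_counts)

# The warning states that the less frequent label is assumed to be positive.
# This line mirrors that rule; it is not the package's own implementation.
assumed_positive <- names(label_counts)[which.min(label_counts)]

# A class with zero samples makes rates such as tpr undefined (0 / 0 is NaN).
n_pos <- sum(y == assumed_positive)
stopifnot(n_pos > 0)
```

If the count for the intended positive label is zero at this point, NaN performance values are expected regardless of whether the run is parallel.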

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] oneClass_0.5.0 kernlab_0.9-29 pROC_1.16.2 caret_6.0-86 lattice_0.20-41 forcats_0.5.0
[7] stringr_1.4.0 dplyr_0.8.5 purrr_0.3.4 readr_1.3.1 tidyr_1.0.2 tibble_3.0.1
[13] ggplot2_3.3.0 tidyverse_1.3.0 imbalance_1.0.2.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 raster_3.1-5 xml2_1.3.1 magrittr_1.5 MASS_7.3-51.5
[6] splines_3.6.3 hms_0.5.3 rvest_0.3.5 tidyselect_1.0.0 colorspace_1.4-1
[11] R6_2.4.1 rlang_0.4.5 foreach_1.5.0 fansi_0.4.1 rgdal_1.4-8
[16] parallel_3.6.3 broom_0.5.6 dismo_1.1-4 gower_0.2.1 dbplyr_1.4.3
[21] modelr_0.1.6 withr_2.2.0 spatial.tools_1.6.2 ellipsis_0.3.0 iterators_1.0.12
[26] class_7.3-15 recipes_0.1.10 abind_1.4-5 assertthat_0.2.1 lifecycle_0.2.0
[31] Matrix_1.2-18 haven_2.2.0 mmap_0.6-19 sp_1.4-1 compiler_3.6.3
[36] cellranger_1.1.0 pillar_1.4.3 scales_1.1.0 backports_1.1.6 generics_0.0.2
[41] stats4_3.6.3 lubridate_1.7.8 jsonlite_1.6.1 pkgconfig_2.0.3 smotefamily_1.3.1
[46] rstudioapi_0.11 doParallel_1.0.15 munsell_0.5.0 prodlim_2019.11.13 httr_1.4.1
[51] plyr_1.8.6 tools_3.6.3 grid_3.6.3 nnet_7.3-12 ipred_0.9-9
[56] nlme_3.1-144 timeDate_3043.102 data.table_1.12.8 gtable_0.3.0 DBI_1.1.0
[61] cli_2.0.2 readxl_1.3.1 yaml_2.2.1 survival_3.1-12 crayon_1.3.4
[66] lava_1.6.7 reshape2_1.4.4 ModelMetrics_1.2.2.2 codetools_0.2-16 fs_1.4.1
[71] vctrs_0.2.4 rpart_4.1-15 glue_1.4.0 reprex_0.3.0 stringi_1.4.6


ONECLASS_ERROR.zip

evaluateOcc strange behaviour

I have multiple .R pipelines from 2018 and 2019 that relied on oneClass and are currently not working at the evaluateOcc() step.
In fact, I just tried to run the notebook example from the package in a fresh installation of R and got the same error.

> ocsvm.ev <- evaluateOcc(ocsvm.fit, te.u=te.x, te.y=te.y, allModels=TRUE, positive=1)

Error in if ((class(newdata) == "rasterTiled" | .is.raster(newdata)) & :
the condition has length > 1

I have been trying to debug this for a few days without success. Since the issue affects not only my datasets but also the examples provided with the package, I wonder whether this is a new incompatibility between the evaluateOcc function and the packages it relies on.
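
Note: one plausible cause, not confirmed against the package source, is the `class(newdata) == "rasterTiled"` test quoted in the error message. Since R 4.0.0 a matrix has the two classes "matrix" and "array", so this comparison returns a vector of length 2, and since R 4.2.0 a condition of length greater than one inside `if()` is an error rather than a warning. The base R sketch below uses a hypothetical matrix in place of the real `newdata` to show the difference between the comparison and `inherits()`:

```r
# A plain matrix stands in for `newdata` (hypothetical test data).
newdata <- matrix(runif(6), ncol = 2)

# The comparison has one element per class of the object, so for a
# matrix in R >= 4.0.0 it is longer than one.
cmp <- class(newdata) == "rasterTiled"
print(length(cmp))

# inherits() always returns a single TRUE or FALSE and is safe in if().
is_tiled <- inherits(newdata, "rasterTiled")
if (is_tiled) {
  message("tiled raster input")
} else {
  message("in-memory input")
}
```

Under this assumption, passing a data.frame instead of a matrix, or patching the check to use `inherits()`, would avoid the error.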

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2     oneClass_0.5.0   
 [5] kernlab_0.9-31    pROC_1.18.0       caret_6.0-93      lattice_0.20-45  
 [9] e1071_1.7-12      zoo_1.8-11        ggplot2_3.3.6     maptools_1.1-5   
[13] raster_3.6-3      sp_1.5-0         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9           lubridate_1.8.0      listenv_0.8.0       
 [4] class_7.3-20         assertthat_0.2.1     digest_0.6.29       
 [7] ipred_0.9-13         utf8_1.2.2           parallelly_1.32.1   
[10] R6_2.5.1             plyr_1.8.7           hardhat_1.2.0       
[13] stats4_4.2.1         pillar_1.8.1         rlang_1.0.6         
[16] rstudioapi_0.14      data.table_1.14.4    rpart_4.1.19        
[19] Matrix_1.5-1         splines_4.2.1        rgdal_1.5-32        
[22] foreign_0.8-83       gower_1.0.0          stringr_1.4.1       
[25] munsell_0.5.0        proxy_0.4-27         compiler_4.2.1      
[28] pkgconfig_2.0.3      dismo_1.3-9          globals_0.16.1      
[31] nnet_7.3-18          mmap_0.6-19          tidyselect_1.2.0    
[34] gridExtra_2.3        tibble_3.1.8         prodlim_2019.11.13  
[37] codetools_0.2-18     viridisLite_0.4.1    fansi_1.0.3         
[40] future_1.28.0        dplyr_1.0.10         withr_2.5.0         
[43] MASS_7.3-58.1        recipes_1.0.2        ModelMetrics_1.2.2.2
[46] grid_4.2.1           nlme_3.1-160         gtable_0.3.1        
[49] lifecycle_1.0.3      DBI_1.1.3            magrittr_2.0.3      
[52] scales_1.2.1         future.apply_1.9.1   cli_3.4.1           
[55] stringi_1.7.8        reshape2_1.4.4       viridis_0.6.2       
[58] timeDate_4021.106    spatial.tools_1.6.2  generics_0.1.3      
[61] vctrs_0.4.2          lava_1.7.0           tools_4.2.1         
[64] glue_1.6.2           purrr_0.3.5          abind_1.4-5         
[67] survival_3.4-0       colorspace_2.0-3     terra_1.6-17

Best,
J

metric puF and trControl

Following the vignette example, I ran into an issue when adding a trainControl object from the caret package.

tr.control <-trainControl(method="repeatedCV",number=10,repeats=5)
ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="ocsvm", tuneGrid=tuneGrid, index=tr.index,trControl = tr.control)

Warning message: The metric puF was not in the result set. Accuracy will be used instead.
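
Note: the parallel-mode issue earlier on this page lists the settings described there as the default trControl of trainOcc, including `summaryFunction = puSummary` and `classProbs = TRUE`. A control object built without them falls back to caret's default summary, which would explain why puF is missing. The sketch below is an assumption based on that listing, not a confirmed fix. It collects the arguments in a plain list and only calls caret and oneClass if both are installed:

```r
# Arguments for repeated cross-validation plus the settings that the
# default control is reported to use ("repeatedcv" is caret's spelling).
control_args <- list(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     classProbs = TRUE,
                     savePredictions = TRUE,
                     returnResamp = "all")

# Build the control object only when both packages are available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("oneClass", quietly = TRUE)) {
  # puSummary computes the PU metrics (puF among them) per resample.
  control_args$summaryFunction <- oneClass::puSummary
  tr.control <- do.call(caret::trainControl, control_args)
}
```

The resulting `tr.control` can then be passed to trainOcc through `trControl` as in the call above.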

Question for the tutorial

Hello, I have a question about the first example in "One-class classification in R with the oneClass package". Do you train a model with the 20 positive samples, use the 500 unlabeled samples to calculate the performance, and then select the best model?
Thank you

Why does maxent not work?

Hi Benjamin,

Following the example, I have a problem when using the maxent method. I just replaced "ocsvm" with "maxent", but it did not work.

ocsvm.fit <- trainOcc(x=tr.x, y=tr.y, method="maxent", index=tr.index)
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 0, 

Could you help me with that?
Cheers
Pulni

Questions about the parameters C+ and C-

Hi,
I have a question about the parameters you set in your code.

The bsvm method has three parameters, "sigma", "cNeg", and "cMultiplier", which are passed on to the ksvm function, as shown below:

fit = function(x, y, wts, param, lev, last, weights,
               classProbs, ...) {
  # cost for the positive class, derived from the negative-class cost
  cPos <- param$cNeg * param$cMultiplier
  ksvm(x = as.matrix(x), y = y,
       kernel = rbfdot,
       kpar = list(sigma = param$sigma),
       C = 1,
       class.weights = c("un" = param$cNeg, "pos" = cPos),
       prob.model = FALSE, #=class.probs
       ...)
}

However, I see that the cost C is fixed at its default value of 1, and that C+ and C- seem to come from class.weights, with C+ = C- * cMultiplier. To my understanding, the class weight is the ratio "un"/"pos". In your code this ratio reduces to 1/cMultiplier, so perhaps setting both C+ and C- only extends the range of class-weight values. Would it be possible to use a single parameter to cover more class-weight values in the grid search, rather than two parameters? I was also wondering whether the cost C should be tuned as well, instead of being left at the default of 1.

In the paper "Building text classifiers using positive and unlabeled examples", the authors used the SVMlight package. They controlled C+ and C- through the parameters c (cost) and j (which I guess corresponds to class.weights), where c is C- and j = C+/C-. A few papers have also implemented BSVM with the e1071 package and tuned both "cost" and "class.weights", for example "Single-Species Detection With Airborne Imaging Spectroscopy Data: A Comparison of Support Vector Techniques".
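
Note: the two parametrizations can be compared directly. The sketch below assumes that class.weights scale C per class, as in the libsvm convention, so that with C = 1 the effective costs are C- = cNeg and C+ = cNeg * cMultiplier. The grid values and the column names `svmlight_c` and `svmlight_j` are hypothetical and chosen only for illustration:

```r
# Hypothetical tuning grid over the two bsvm cost parameters.
grid <- expand.grid(cNeg = 2^(-2:2), cMultiplier = 2^(0:3))

# Positive-class cost as computed in the fit function quoted above.
grid$cPos <- grid$cNeg * grid$cMultiplier

# The same grid in SVMlight-style terms: c = C-, j = C+ / C-.
grid$svmlight_c <- grid$cNeg
grid$svmlight_j <- grid$cPos / grid$cNeg

# j recovers cMultiplier for every row of the grid.
stopifnot(all(grid$svmlight_j == grid$cMultiplier))
print(head(grid))
```

Under that assumption, tuning (cNeg, cMultiplier) with C fixed at 1 covers the same space as tuning (c, j), so a separate cost parameter would be redundant with cNeg.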

Looking forward to your reply!

Thanks in advance and best wishes
Pulni
