popsom's People

Contributors

lutzhamel, meiger00

popsom's Issues

Topology of map

Hi, Thanks for the package.

Can you please let me know if the topology of the popsom map is toroidal or planar? Also, is the grid lattice rectangular or hexagonal?

Thanks!

Performance claims cannot be confirmed (using simple experiments)

I have difficulty confirming the performance claims (see the experiments below). However, the experiments may well be oversimplified or poorly specified, since they rely mainly on default settings.
It would be very helpful if you could provide instructions and examples that can be used to test the performance. I also suggest including an example that illustrates the performance improvements in the software paper.

library(popsom)
library(som)
library(kohonen)
library(MASS)
library(microbenchmark)
library(ggplot2)

# method wrappers

pop_wrp <- function(dat, ...) popsom::map(as.data.frame(dat), ...)
som_wrp <- function(dat, ...) som::som(dat, ...)
koh_wrp <- function(dat, ...) kohonen::som(as.matrix(dat), ...)

# data sets

# iris
data(iris)
df_iris <- subset(iris, select = -Species)

# wines from package kohonen
data(wines)
df_wines <- scale(wines)

# synthetic data with three clusters
p <- 10
n <- 500
siglarg <- diag(p) # p x p identity covariance matrix
means <- c(0, -50, 50)

clusts <- lapply(means, function(mu) mvrnorm(n = n, mu = rep(mu, p), Sigma = siglarg))
df_sim <- do.call(rbind, clusts)


datsets <- list(df_iris, df_wines, df_sim)
bmr <- lapply(datsets, 
              function(dat) microbenchmark(pop_wrp(dat, train = 1000),
                                           som_wrp(dat, xdim = 10, ydim = 5), # no default values for xdim and ydim; set to popsom defaults
                                           koh_wrp(dat)))

# datsets is ordered iris, wines, simulated, so bmr[[1]] is iris and bmr[[2]] is wines
ggplot2::autoplot(bmr[[1]]) + ggplot2::ggtitle("Iris data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.

ggplot2::autoplot(bmr[[2]]) + ggplot2::ggtitle("Wine data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.

ggplot2::autoplot(bmr[[3]]) + ggplot2::ggtitle("Simulated data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.3.5        microbenchmark_1.4-7 MASS_7.3-54         
#> [4] kohonen_3.0.10       som_0.3-5.1          popsom_5.2          
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        compiler_4.1.0    pillar_1.6.1      highr_0.9        
#>  [5] viridis_0.6.1     tools_4.1.0       dotCall64_1.0-1   digest_0.6.27    
#>  [9] viridisLite_0.4.0 evaluate_0.14     lifecycle_1.0.0   tibble_3.1.2     
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.11      reprex_2.0.1     
#> [17] cli_3.0.0         rstudioapi_0.13   yaml_2.2.1        spam_2.7-0       
#> [21] xfun_0.24         gridExtra_2.3     withr_2.4.2       stringr_1.4.0    
#> [25] dplyr_1.0.6       knitr_1.33        maps_3.3.0        fields_12.5      
#> [29] generics_0.1.0    fs_1.5.0          vctrs_0.3.8       tidyselect_1.1.1 
#> [33] grid_4.1.0        glue_1.4.2        R6_2.5.0          hash_2.2.6.1     
#> [37] fansi_0.5.0       rmarkdown_2.8     farver_2.1.0      purrr_0.3.4      
#> [41] magrittr_2.0.1    scales_1.1.1      htmltools_0.5.1.1 ellipsis_0.3.2   
#> [45] colorspace_2.0-2  utf8_1.2.1        stringi_1.6.2     munsell_0.5.0    
#> [49] crayon_1.4.1

Created on 2021-08-09 by the reprex package (v2.0.1)

Provide a high-level description of SOM

The summary does not contain a high-level description of the package functionality beyond stating that it provides an implementation of self-organising maps and that a self-organising map is an artificial neural network designed for unsupervised learning. This is insufficient to convey what SOMs are and what they are commonly used for.

Statement of need is missing

The authors' statement of need reads more like a results section. The only statement of need in this paragraph is: "Training a self-organizing map is time consuming." The authors should expand on why training SOMs is time consuming: in which application domains does this matter, and at what dataset dimensionality does execution become slow?

__gfortran_os_error_at Error installing popsom

I get this error when trying to install the popsom package on macOS Monterey 12.2.1:

** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘popsom’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/00LOCK-popsom/00new/popsom/libs/popsom.so':
  dlopen(/Library/Frameworks/R.framework/Versions/4.1/Resources/library/00LOCK-popsom/00new/popsom/libs/popsom.so, 0x0006): symbol not found in flat namespace '__gfortran_os_error_at'
Error: loading failed
Execution halted

I wonder if the correct FORTRAN library is being used? One search suggested the error can be caused by using the wrong version of the FORTRAN library. It appears that libgfortran.5.dylib is being used.

Version being installed: popsom_6.0.tar.gz

My sessionInfo():

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] conos_1.4.5     leidenAlg_1.0.1 igraph_1.2.11   Matrix_1.4-0   

loaded via a namespace (and not attached):
 [1] circlize_0.4.14       shape_1.4.6           GetoptLong_1.0.5      tidyselect_1.1.2      purrr_0.3.4           lattice_0.20-45      
 [7] colorspace_2.0-3      vctrs_0.3.8           generics_0.1.2        stats4_4.1.2          sccore_1.0.1          utf8_1.2.2           
[13] rlang_1.0.1           pillar_1.7.0          glue_1.6.2            DBI_1.1.2             BiocGenerics_0.40.0   RColorBrewer_1.1-2   
[19] matrixStats_0.61.0    foreach_1.5.2         lifecycle_1.0.1       munsell_0.5.0         Matrix.utils_0.9.8    gtable_0.3.0         
[25] GlobalOptions_0.1.2   codetools_0.2-18      ComplexHeatmap_2.10.0 IRanges_2.28.0        doParallel_1.0.17     parallel_4.1.2       
[31] fansi_1.0.2           Rcpp_1.0.8            BiocManager_1.30.16   scales_1.1.1          grr_0.9.5             S4Vectors_0.32.3     
[37] gridExtra_2.3         rjson_0.2.21          ggplot2_3.3.5         png_0.1-7             digest_0.6.29         Rtsne_0.15           
[43] dplyr_1.0.8           ggrepel_0.9.1         grid_4.1.2            clue_0.3-60           cli_3.2.0             tools_4.1.2          
[49] magrittr_2.0.2        tibble_3.1.6          cluster_2.1.2         crayon_1.5.0          pkgconfig_2.0.3       ellipsis_0.3.2       
[55] assertthat_0.2.1      iterators_1.0.14      R6_2.5.1              compiler_4.1.2       

Bug in function `map`

Running map sometimes triggers an error; see the following MREs.

(Some) error conditions this was observed for

  • smaller numbers of training iterations (see microbenchmarks)
  • default values for xdim and ydim
  • depends on random initialization

library(popsom)
#> 
#> Attaching package: 'popsom'
#> The following objects are masked from 'package:stats':
#> 
#>     fitted, predict
#> The following object is masked from 'package:base':
#> 
#>     summary

data(iris)
df <- subset(iris, select = -Species)
labels <- subset(iris, select = Species)

# triggers error
m <- map(df, labels, train = 100, seed = 10) 
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds

m <- map(df, labels, train = 10, seed = 1)
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds

# does not trigger error
m <- map(df, labels, train = 101, seed = 10)
m <- map(df, labels, train = 100, seed = 1) 
m <- map(df, labels, xdim = 15, ydim = 10, train = 100, seed = 10)
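Because the failure depends on both the number of training iterations and the seed, it can help to scan a small grid of settings and record which combinations fail. The following is a minimal sketch of that idea. The names `scan_failures` and `toy_fit` are hypothetical, and `toy_fit` is only a stand-in that fails for short training runs; it does not reproduce popsom's behaviour.

```r
# Run `fit` once per row of `grid` and record the error message (NA if none).
scan_failures <- function(fit, grid) {
  grid$error <- vapply(seq_len(nrow(grid)), function(i) {
    tryCatch({
      fit(train = grid$train[i], seed = grid$seed[i])
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
  grid
}

# Toy stand-in (assumption): fails for short training runs, NOT popsom behaviour.
toy_fit <- function(train, seed) {
  if (train < 100) stop("subscript out of bounds")
  invisible(NULL)
}

grid <- expand.grid(train = c(10, 100, 1000), seed = 1:3)
result <- scan_failures(toy_fit, grid)
```

To scan the real function, `toy_fit` could be replaced by `function(train, seed) map(df, labels, train = train, seed = seed)`, using the `df` and `labels` objects from the MRE above.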

Microbenchmarks

microbenchmark::microbenchmark(map(df, labels, train = 100))
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
microbenchmark::microbenchmark(map(df, labels, train = 451))
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
microbenchmark::microbenchmark(map(df, labels, train = 1000))
#> Unit: milliseconds
#>                           expr      min       lq     mean   median      uq
#>  map(df, labels, train = 1000) 299.2242 315.6117 331.4985 323.6654 334.924
#>       max neval
#>  463.2775   100

Session info

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] popsom_5.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.1.0       pillar_1.6.1         highr_0.9           
#>  [4] viridis_0.6.1        tools_4.1.0          dotCall64_1.0-1     
#>  [7] digest_0.6.27        viridisLite_0.4.0    evaluate_0.14       
#> [10] lifecycle_1.0.0      tibble_3.1.2         gtable_0.3.0        
#> [13] pkgconfig_2.0.3      rlang_0.4.11         reprex_2.0.1        
#> [16] cli_3.0.0            rstudioapi_0.13      microbenchmark_1.4-7
#> [19] yaml_2.2.1           spam_2.7-0           xfun_0.24           
#> [22] gridExtra_2.3        withr_2.4.2          stringr_1.4.0       
#> [25] dplyr_1.0.6          knitr_1.33           maps_3.3.0          
#> [28] fields_12.5          generics_0.1.0       fs_1.5.0            
#> [31] vctrs_0.3.8          tidyselect_1.1.1     grid_4.1.0          
#> [34] glue_1.4.2           R6_2.5.0             hash_2.2.6.1        
#> [37] fansi_0.5.0          rmarkdown_2.8        purrr_0.3.4         
#> [40] ggplot2_3.3.5        magrittr_2.0.1       scales_1.1.1        
#> [43] htmltools_0.5.1.1    ellipsis_0.3.2       colorspace_2.0-2    
#> [46] utf8_1.2.1           stringi_1.6.2        munsell_0.5.0       
#> [49] crayon_1.4.1

Created on 2021-08-09 by the reprex package (v2.0.1)

Reduce self-citations and refer to relevant work

The references in the manuscript fall into one of the following categories:

  • author's own publications (Hamel 2018, Hamel 2016, Hamel 2011, Hamel 2021, Tatoian 2018)
  • publications on which the author's work is directly based (Kohonen 2001, Ultsch 1990)
  • publications to which the author's work is directly compared (Yan 2016, Wehrens 2018)
  • references to applications of SOM in the field (Liakos 2018, Mathys 2019, Matić 2018, Miller 1996)

Please update the manuscript to address the following issues:

  • Reduce the share of self-citations to <25%. Currently 38% of citations are self-citations.
  • The citations which are meant to give a broad overview of the applications of SOM in the field are very superficial: "Self-organising maps have been applied in virtually every scientific discipline where some sort of data exploration or analysis is necessary." This is a very broad statement which is hard to verify. Please elaborate a bit on common applications of SOM. For example, FlowSOM uses SOMs to analyse Flow Cytometry data to gain insight into the expression profiles of smaller cell populations which might otherwise be missed (Van Gassen et al, Cytometry A, 2015).
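The 38% figure follows from the category counts listed above (5 of 13 references). A short check, with the counts taken from that list and the variable names chosen here for illustration:

```r
# Reference counts per category, as listed in this issue
citations <- c(own = 5, based_on = 2, compared_to = 2, applications = 4)

# Current share of self-citations
self_share <- citations[["own"]] / sum(citations)

# Smallest reference-list size for which 5 self-citations fall below 25%
min_total <- floor(citations[["own"]] / 0.25) + 1
```

Under these counts, keeping all five self-citations would require at least 21 references in total to drop below the 25% threshold.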

Uninformative error message if learning rate is not adequately specified

The documentation of map states that alpha, "the learning rate, should be a positive non-zero real number."
However, a non-zero positive learning rate can result in an error with a rather uninformative error message (see MRE below).
I suggest providing a more meaningful error message, for example (if this is indeed the cause) that the learning rate is too large.

library(popsom)
data(iris)
df <- subset(iris, select = -Species)
labels <- subset(iris, select = Species)

# triggers error
m <- popsom::map(df, labels, xdim = 15, ydim = 10, train = 10000, alpha = 2.3, seed = 42)
#> Error in ks.test(map.df[[i]], data.df[[i]]): not enough 'x' data

# does not trigger error
m <- popsom::map(df, labels, xdim = 15, ydim = 10, train = 10000, alpha = 2.2, seed = 42)

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] popsom_5.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.1.0       pillar_1.6.1         highr_0.9           
#>  [4] viridis_0.6.1        tools_4.1.0          dotCall64_1.0-1     
#>  [7] digest_0.6.27        viridisLite_0.4.0    evaluate_0.14       
#> [10] lifecycle_1.0.0      tibble_3.1.2         gtable_0.3.0        
#> [13] pkgconfig_2.0.3      rlang_0.4.11         reprex_2.0.1        
#> [16] cli_3.0.0            rstudioapi_0.13      microbenchmark_1.4-7
#> [19] yaml_2.2.1           spam_2.7-0           xfun_0.24           
#> [22] gridExtra_2.3        withr_2.4.2          stringr_1.4.0       
#> [25] dplyr_1.0.6          knitr_1.33           maps_3.3.0          
#> [28] fields_12.5          generics_0.1.0       fs_1.5.0            
#> [31] vctrs_0.3.8          tidyselect_1.1.1     grid_4.1.0          
#> [34] glue_1.4.2           R6_2.5.0             hash_2.2.6.1        
#> [37] fansi_0.5.0          rmarkdown_2.8        purrr_0.3.4         
#> [40] ggplot2_3.3.5        magrittr_2.0.1       scales_1.1.1        
#> [43] htmltools_0.5.1.1    ellipsis_0.3.2       colorspace_2.0-2    
#> [46] utf8_1.2.1           stringi_1.6.2        munsell_0.5.0       
#> [49] crayon_1.4.1

Created on 2021-08-09 by the reprex package (v2.0.1)
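One way to make the error reported in this issue more informative, without knowing its exact cause, is to catch the low-level error and re-raise it with the learning rate attached. The sketch below illustrates the pattern only. `with_informative_error` and `failing_fit` are hypothetical names, and `failing_fit` merely imitates the `ks.test` message shown above; whether a large `alpha` is the actual cause is an assumption.

```r
# Evaluate `expr`; on failure, re-raise with the learning rate in the message.
with_informative_error <- function(expr, alpha) {
  tryCatch(expr, error = function(e) {
    stop(sprintf(
      "map training failed (alpha = %g): %s. Consider a smaller learning rate.",
      alpha, conditionMessage(e)
    ), call. = FALSE)
  })
}

# Stand-in (assumption) that imitates the error message from the MRE.
failing_fit <- function() stop("not enough 'x' data")

msg <- tryCatch(
  with_informative_error(failing_fit(), alpha = 2.3),
  error = function(e) conditionMessage(e)
)
```

A successful call passes through unchanged, since `tryCatch` returns the value of `expr` when no error is raised.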

Split up package code to improve readability

The package consists of a single very large R file, which makes navigating the functions in the package difficult. Following common R practices, I suggest splitting this file up into multiple R files, ideally one file per function.

Please consider generating the man/* files using roxygen2. Keeping the documentation next to the code allows developers to understand each function more easily.
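As an illustration of the roxygen2 style, a function file might look like the sketch below. The function `euclid_dist` is a hypothetical example and is not part of popsom.

```r
#' Euclidean distance between two numeric vectors
#'
#' @param x A numeric vector.
#' @param y A numeric vector of the same length as `x`.
#' @return A single non-negative number.
#' @examples
#' euclid_dist(c(0, 0), c(3, 4))
#' @export
euclid_dist <- function(x, y) {
  # Validate inputs before computing the distance
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sqrt(sum((x - y)^2))
}
```

Running `roxygen2::roxygenise()` in the package directory would then generate the corresponding `man/euclid_dist.Rd` file and the `NAMESPACE` export from these comments.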

S3 generics get masked

Functions popsom::summary, popsom::predict and popsom::fitted mask S3 generics.

This is very inconvenient as it breaks method dispatch for objects of other classes when popsom is loaded.
Appropriate S3 methods should therefore be implemented.

With this in mind, I also suggest moving the starburst functionality into an S3 plot method and adding an S3 print method.
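A minimal sketch of the suggested approach follows. The class name `som_map` and the constructor `new_som_map` are assumptions made for illustration; popsom's actual object structure may differ.

```r
# Hypothetical constructor: a stand-in for the object returned by popsom::map()
new_som_map <- function(neurons, xdim, ydim) {
  structure(list(neurons = neurons, xdim = xdim, ydim = ydim),
            class = "som_map")
}

# S3 methods extend the existing generics instead of masking them
summary.som_map <- function(object, ...) {
  structure(list(xdim = object$xdim, ydim = object$ydim,
                 n_neurons = nrow(object$neurons)),
            class = "summary.som_map")
}

print.som_map <- function(x, ...) {
  cat("SOM with a", x$xdim, "x", x$ydim, "grid\n")
  invisible(x)
}

m <- new_som_map(matrix(0, nrow = 6, ncol = 4), xdim = 3, ydim = 2)
s <- summary(m)

# Dispatch for other classes keeps working, e.g. for lm objects
lm_summary <- summary(lm(dist ~ speed, data = cars))
```

In a package, these methods would additionally be registered in `NAMESPACE` with `S3method()` directives (or `@export` tags when using roxygen2).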

Elaborate on performance claims

The authors compare the performance of their method to the som and kohonen packages.

  • The authors claim measured speedups of 60× in comparison to som (Yan, 2016). However, according to the authors' own time measurements, this speedup is closer to 15×. Can you provide more insight into how
  • Can the authors also provide insight into the scalability of the algorithm, i.e. the execution time as a function of input dataset size (number of observations and number of features)?
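A scalability experiment of the kind requested could be sketched as follows. The helper `time_fit` is hypothetical, and `stats::prcomp` is used purely as a placeholder model so that the example runs without popsom; it says nothing about popsom's actual timings.

```r
# Time `fit` on random n x p data for each n and return a data frame.
time_fit <- function(fit, n_values, p = 10, seed = 1) {
  set.seed(seed)
  elapsed <- vapply(n_values, function(n) {
    dat <- matrix(rnorm(n * p), nrow = n, ncol = p)
    unname(system.time(fit(dat))["elapsed"])
  }, numeric(1))
  data.frame(n = n_values, p = p, elapsed = elapsed)
}

# Placeholder model (assumption); swap in the SOM implementation under test.
res <- time_fit(function(dat) stats::prcomp(dat),
                n_values = c(100, 1000, 10000))
```

To measure popsom itself, the placeholder could be replaced by a wrapper such as `function(dat) popsom::map(as.data.frame(dat), train = 1000)`, and the same loop repeated over `p` to cover the number of features.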

Improve quality of writing

Paragraphs should consist of full sentences. Partial sentences are used when referring to pieces of code or when citing articles.

Example of incomplete sentence due to code block:

This is easily verified with a scatter plot matrix of the iris dataset using
-code block-
and shown in Figure 2.

Example of incomplete sentence due to citation:

A number of R-packages exist that implement self-organizing maps including (Wehrens & Kruisselbrink, 2018) and (Yan, 2016).

If you omit everything between parentheses, the sentence that remains is incomplete:

A number of R-packages exist that implement self-organizing maps including and.
