lutzhamel / popsom
R package for self-organizing maps
License: GNU General Public License v3.0
Hi, Thanks for the package.
Can you please let me know if the topology of the popsom map is toroidal or planar? Also, is the grid lattice rectangular or hexagonal?
Thanks!
I have difficulties confirming the performance claims (see the experiments below). However, it may well be that the experiments performed are oversimplified and/or poorly specified (they are mainly based on default settings).
It would be very helpful if you could provide instructions and examples that can be used to test the performance. I also suggest including an example that illustrates the performance improvements in the software paper.
library(popsom)
library(som)
library(kohonen)
library(MASS)
library(microbenchmark)
library(ggplot2)
# method wrappers
pop_wrp <- function(dat, ...) popsom::map(as.data.frame(dat), ...)
som_wrp <- function(dat, ...) som::som(dat, ...)
koh_wrp <- function(dat, ...) kohonen::som(as.matrix(dat), ...)
# data sets
# iris
data(iris)
df_iris <- subset(iris, select = -Species)
# wines from package kohonen
data(wines)
df_wines <- scale(wines)
# synthetic data with three clusters
p <- 10
n <- 500
siglarg <- diag(rep(1, p * p), p, p)
means <- c(0, -50, 50)
clusts <- lapply(means, function(mu) mvrnorm(n = n, mu = rep(mu, p), Sigma = siglarg))
df_sim <- do.call(rbind, clusts)
datsets <- list(df_iris, df_wines, df_sim)
bmr <- lapply(datsets,
function(dat) microbenchmark(pop_wrp(dat, train = 1000),
som_wrp(dat, xdim = 10, ydim = 5), # som::som has no defaults for xdim and ydim; set to the popsom defaults
koh_wrp(dat)))
ggplot2::autoplot(bmr[[1]]) + ggplot2::ggtitle("Iris data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
ggplot2::autoplot(bmr[[2]]) + ggplot2::ggtitle("Wine data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
ggplot2::autoplot(bmr[[3]]) + ggplot2::ggtitle("Simulated data")
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.3.5 microbenchmark_1.4-7 MASS_7.3-54
#> [4] kohonen_3.0.10 som_0.3-5.1 popsom_5.2
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 compiler_4.1.0 pillar_1.6.1 highr_0.9
#> [5] viridis_0.6.1 tools_4.1.0 dotCall64_1.0-1 digest_0.6.27
#> [9] viridisLite_0.4.0 evaluate_0.14 lifecycle_1.0.0 tibble_3.1.2
#> [13] gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.11 reprex_2.0.1
#> [17] cli_3.0.0 rstudioapi_0.13 yaml_2.2.1 spam_2.7-0
#> [21] xfun_0.24 gridExtra_2.3 withr_2.4.2 stringr_1.4.0
#> [25] dplyr_1.0.6 knitr_1.33 maps_3.3.0 fields_12.5
#> [29] generics_0.1.0 fs_1.5.0 vctrs_0.3.8 tidyselect_1.1.1
#> [33] grid_4.1.0 glue_1.4.2 R6_2.5.0 hash_2.2.6.1
#> [37] fansi_0.5.0 rmarkdown_2.8 farver_2.1.0 purrr_0.3.4
#> [41] magrittr_2.0.1 scales_1.1.1 htmltools_0.5.1.1 ellipsis_0.3.2
#> [45] colorspace_2.0-2 utf8_1.2.1 stringi_1.6.2 munsell_0.5.0
#> [49] crayon_1.4.1
Created on 2021-08-09 by the reprex package (v2.0.1)
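The synthetic data in the benchmark above is drawn without a fixed seed, so timings and plots can differ between runs. The following is a minimal base-R sketch of a reproducible variant. The seed value is an arbitrary assumption, and because the covariance used above is the identity matrix, independent rnorm draws stand in for MASS::mvrnorm (same distribution, not the same numbers).

```r
# Reproducible variant of the three-cluster data (the seed is an arbitrary choice).
set.seed(42)
p <- 10   # number of variables
n <- 500  # observations per cluster
means <- c(0, -50, 50)

# With identity covariance, each coordinate is an independent N(mu, 1) draw.
clusts <- lapply(means, function(mu) {
  matrix(rnorm(n * p, mean = mu, sd = 1), nrow = n, ncol = p)
})
df_sim <- do.call(rbind, clusts)

dim(df_sim)
```

Fixing the seed makes the data identical across runs, which separates run-to-run timing noise from differences caused by the data itself.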
The summary does not contain a high-level description of the package functionality, other than that it provides an implementation of self-organising maps and that a self-organising map is an artificial neural network designed for unsupervised learning. This is insufficient information to get a high-level idea of what SOMs are and what they are commonly used for.
The authors' statement of need reads more like a results section. The only statement of need in this paragraph is: "Training a self-organizing map is time consuming." The authors should expand on why training SOMs is time consuming: in which application domains, and at what dataset dimensionality does execution become slow?
The documentation and the included examples already make the use of popsom very easy. To make the package even more accessible, I suggest including a brief explanation of the summary output and the starburst plot in the Usage section of the software paper.
I get this error when trying to install the popsom package on macOS Monterey 12.2.1:
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘popsom’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/00LOCK-popsom/00new/popsom/libs/popsom.so':
dlopen(/Library/Frameworks/R.framework/Versions/4.1/Resources/library/00LOCK-popsom/00new/popsom/libs/popsom.so, 0x0006): symbol not found in flat namespace '__gfortran_os_error_at'
Error: loading failed
Execution halted
I wonder if the correct FORTRAN library is being used? One search suggested the error can be caused by using the wrong version of the FORTRAN library. It appears that libgfortran.5.dylib is being used.
Version being installed: popsom_6.0.tar.gz
My sessionInfo():
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] conos_1.4.5 leidenAlg_1.0.1 igraph_1.2.11 Matrix_1.4-0
loaded via a namespace (and not attached):
[1] circlize_0.4.14 shape_1.4.6 GetoptLong_1.0.5 tidyselect_1.1.2 purrr_0.3.4 lattice_0.20-45
[7] colorspace_2.0-3 vctrs_0.3.8 generics_0.1.2 stats4_4.1.2 sccore_1.0.1 utf8_1.2.2
[13] rlang_1.0.1 pillar_1.7.0 glue_1.6.2 DBI_1.1.2 BiocGenerics_0.40.0 RColorBrewer_1.1-2
[19] matrixStats_0.61.0 foreach_1.5.2 lifecycle_1.0.1 munsell_0.5.0 Matrix.utils_0.9.8 gtable_0.3.0
[25] GlobalOptions_0.1.2 codetools_0.2-18 ComplexHeatmap_2.10.0 IRanges_2.28.0 doParallel_1.0.17 parallel_4.1.2
[31] fansi_1.0.2 Rcpp_1.0.8 BiocManager_1.30.16 scales_1.1.1 grr_0.9.5 S4Vectors_0.32.3
[37] gridExtra_2.3 rjson_0.2.21 ggplot2_3.3.5 png_0.1-7 digest_0.6.29 Rtsne_0.15
[43] dplyr_1.0.8 ggrepel_0.9.1 grid_4.1.2 clue_0.3-60 cli_3.2.0 tools_4.1.2
[49] magrittr_2.0.2 tibble_3.1.6 cluster_2.1.2 crayon_1.5.0 pkgconfig_2.0.3 ellipsis_0.3.2
[55] assertthat_0.2.1 iterators_1.0.14 R6_2.5.1 compiler_4.1.2
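One way to see which Fortran toolchain R was configured with is to query R CMD config from within R. This is only a diagnostic sketch: it does not fix the missing symbol, and the output depends entirely on the local installation.

```r
# Query the Fortran compiler and the Fortran link libraries R was configured with.
r_bin <- file.path(R.home("bin"), "R")
fc <- suppressWarnings(
  system2(r_bin, c("CMD", "config", "FC"), stdout = TRUE, stderr = TRUE)
)
flibs <- suppressWarnings(
  system2(r_bin, c("CMD", "config", "FLIBS"), stdout = TRUE, stderr = TRUE)
)

cat("Fortran compiler:", fc, "\n")
cat("Fortran libraries:", flibs, "\n")
```

Comparing the reported library paths with the libgfortran that is actually installed may help narrow down a version mismatch.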
Running map sometimes triggers an error; see the following MREs.
(Some) error conditions this was observed for: xdim and ydim
library(popsom)
#>
#> Attaching package: 'popsom'
#> The following objects are masked from 'package:stats':
#>
#> fitted, predict
#> The following object is masked from 'package:base':
#>
#> summary
data(iris)
df <- subset(iris, select = -Species)
labels <- subset(iris, select = Species)
# triggers error
m <- map(df, labels, train = 100, seed = 10)
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
m <- map(df, labels, train = 10, seed = 1)
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
# does not trigger error
m <- map(df, labels, train = 101, seed = 10)
m <- map(df, labels, train = 100, seed = 1)
m <- map(df, labels, xdim = 15, ydim = 10, train = 100, seed = 10)
Microbenchmarks
microbenchmark::microbenchmark(map(df, labels, train = 100))
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
microbenchmark::microbenchmark(map(df, labels, train = 451))
#> Error in map$unique.centroids[[cluster.ix]]: subscript out of bounds
microbenchmark::microbenchmark(map(df, labels, train = 1000))
#> Unit: milliseconds
#> expr min lq mean median uq
#> map(df, labels, train = 1000) 299.2242 315.6117 331.4985 323.6654 334.924
#> max neval
#> 463.2775 100
Session info
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] popsom_5.2
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.1.0 pillar_1.6.1 highr_0.9
#> [4] viridis_0.6.1 tools_4.1.0 dotCall64_1.0-1
#> [7] digest_0.6.27 viridisLite_0.4.0 evaluate_0.14
#> [10] lifecycle_1.0.0 tibble_3.1.2 gtable_0.3.0
#> [13] pkgconfig_2.0.3 rlang_0.4.11 reprex_2.0.1
#> [16] cli_3.0.0 rstudioapi_0.13 microbenchmark_1.4-7
#> [19] yaml_2.2.1 spam_2.7-0 xfun_0.24
#> [22] gridExtra_2.3 withr_2.4.2 stringr_1.4.0
#> [25] dplyr_1.0.6 knitr_1.33 maps_3.3.0
#> [28] fields_12.5 generics_0.1.0 fs_1.5.0
#> [31] vctrs_0.3.8 tidyselect_1.1.1 grid_4.1.0
#> [34] glue_1.4.2 R6_2.5.0 hash_2.2.6.1
#> [37] fansi_0.5.0 rmarkdown_2.8 purrr_0.3.4
#> [40] ggplot2_3.3.5 magrittr_2.0.1 scales_1.1.1
#> [43] htmltools_0.5.1.1 ellipsis_0.3.2 colorspace_2.0-2
#> [46] utf8_1.2.1 stringi_1.6.2 munsell_0.5.0
#> [49] crayon_1.4.1
Created on 2021-08-09 by the reprex package (v2.0.1)
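Because the error above depends on the combination of seed and train, it can help to scan a range of seeds systematically instead of trying values by hand. The sketch below uses a hypothetical helper, failing_seeds, and a toy fitting function that fails on purpose; neither is part of popsom.

```r
# Hypothetical helper: returns the seeds for which fit_fun(seed) signals an error.
failing_seeds <- function(fit_fun, seeds) {
  failed <- vapply(seeds, function(s) {
    tryCatch({
      fit_fun(s)
      FALSE
    }, error = function(e) TRUE)
  }, logical(1))
  seeds[failed]
}

# Toy stand-in for a model fit that fails for every third seed.
toy_fit <- function(seed) {
  if (seed %% 3 == 0) stop("subscript out of bounds")
  invisible(seed)
}

bad <- failing_seeds(toy_fit, 1:10)
print(bad)

# With popsom loaded and df/labels defined as in the MRE, the same scan would be:
# failing_seeds(function(s) map(df, labels, train = 100, seed = s), 1:20)
```

A list of failing seeds for a fixed train value makes the report easier to reproduce on other machines.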
The references in the manuscript fall into one of the following categories:
Please update the manuscript to address the following issues:
The documentation of map states that alpha, "the learning rate, should be a positive non-zero real number."
However, a non-zero positive learning rate can result in an error with a rather uninformative error message (see MRE below).
I suggest providing a more meaningful error message, for example (if this is the case) that the chosen learning rate is too large.
library(popsom)
data(iris)
df <- subset(iris, select = -Species)
labels <- subset(iris, select = Species)
# triggers error
m <- popsom::map(df, labels, xdim = 15, ydim = 10, train = 10000, alpha = 2.3, seed = 42)
#> Error in ks.test(map.df[[i]], data.df[[i]]): not enough 'x' data
# does not trigger error
m <- popsom::map(df, labels, xdim = 15, ydim = 10, train = 10000, alpha = 2.2, seed = 42)
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.2
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] popsom_5.2
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.1.0 pillar_1.6.1 highr_0.9
#> [4] viridis_0.6.1 tools_4.1.0 dotCall64_1.0-1
#> [7] digest_0.6.27 viridisLite_0.4.0 evaluate_0.14
#> [10] lifecycle_1.0.0 tibble_3.1.2 gtable_0.3.0
#> [13] pkgconfig_2.0.3 rlang_0.4.11 reprex_2.0.1
#> [16] cli_3.0.0 rstudioapi_0.13 microbenchmark_1.4-7
#> [19] yaml_2.2.1 spam_2.7-0 xfun_0.24
#> [22] gridExtra_2.3 withr_2.4.2 stringr_1.4.0
#> [25] dplyr_1.0.6 knitr_1.33 maps_3.3.0
#> [28] fields_12.5 generics_0.1.0 fs_1.5.0
#> [31] vctrs_0.3.8 tidyselect_1.1.1 grid_4.1.0
#> [34] glue_1.4.2 R6_2.5.0 hash_2.2.6.1
#> [37] fansi_0.5.0 rmarkdown_2.8 purrr_0.3.4
#> [40] ggplot2_3.3.5 magrittr_2.0.1 scales_1.1.1
#> [43] htmltools_0.5.1.1 ellipsis_0.3.2 colorspace_2.0-2
#> [46] utf8_1.2.1 stringi_1.6.2 munsell_0.5.0
#> [49] crayon_1.4.1
Created on 2021-08-09 by the reprex package (v2.0.1)
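The kind of argument check suggested above could look roughly like the following sketch. The helper check_alpha and its upper argument are hypothetical and not part of popsom; the MRE only shows that 2.2 works and 2.3 fails for one data set and seed, so no particular bound is assumed by default.

```r
# Hypothetical validator; `upper` is an assumed, caller-supplied bound.
check_alpha <- function(alpha, upper = Inf) {
  if (!is.numeric(alpha) || length(alpha) != 1L || is.na(alpha) || alpha <= 0) {
    stop("'alpha' must be a single positive number", call. = FALSE)
  }
  if (alpha > upper) {
    stop(sprintf("'alpha' = %g exceeds the upper bound %g; try a smaller learning rate",
                 alpha, upper), call. = FALSE)
  }
  invisible(alpha)
}

check_alpha(0.3)                    # passes silently
msg <- tryCatch(check_alpha(2.3, upper = 1),
                error = function(e) conditionMessage(e))
print(msg)
```

Failing early with a message that names the offending argument is easier to act on than an error raised deep inside ks.test.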
Implement a summary.map for the generic summary. TODO: what should this function display?
For future submissions insert references about the methods in the Description field in the form Authors (year) doi:10..... or arXiv:.....?
In my opinion, it would be good practice to include installation instructions in the Readme, for example:
Install the latest release from CRAN:
install.packages("popsom")
Moreover, I think a note on how to contribute/report bugs is missing.
The package consists of a very large R file which makes navigation of the functions in the package difficult. Following common R practices, I suggest splitting this file up into multiple R files, ideally one file per function.
Please consider generating the man/* files using roxygen2. Having the documentation close to the functions allows developers to easily interpret the functionality of each function.
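To illustrate what documentation next to the code looks like, here is a small roxygen2-style sketch. The function rescale_unit is a hypothetical helper invented for this example and is not part of popsom.

```r
#' Rescale the columns of a data frame to the unit interval
#'
#' Hypothetical helper used only to illustrate roxygen2 comments.
#' Constant columns are not handled and would produce NaN.
#'
#' @param df A data frame with numeric columns.
#' @return A data frame of the same shape with each column rescaled to [0, 1].
#' @examples
#' head(rescale_unit(iris[, 1:4]))
#' @export
rescale_unit <- function(df) {
  as.data.frame(lapply(df, function(x) (x - min(x)) / (max(x) - min(x))))
}

scaled <- rescale_unit(iris[, 1:4])
head(scaled)
```

Running roxygen2::roxygenise() in the package directory would then generate the corresponding man/ file and NAMESPACE entry from these comments.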
Functions popsom::summary, popsom::predict and popsom::fitted mask S3 generics.
This is very inconvenient as it breaks method dispatch for objects of other classes when popsom is loaded.
Appropriate S3 methods should therefore be implemented.
With this in mind, I also suggest moving the starburst functionality into an S3 plot method and adding an S3 print method.
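A minimal sketch of the S3 approach described above, using a toy class. The class name toy_map, its constructor and its fields are assumptions made for illustration; the actual popsom object has a different structure.

```r
# Hypothetical constructor for a toy object; not the real popsom structure.
new_toy_map <- function(data, xdim, ydim) {
  structure(list(data = data, xdim = xdim, ydim = ydim), class = "toy_map")
}

# print method: dispatched by base::print(), so nothing is masked.
print.toy_map <- function(x, ...) {
  cat("<toy_map>", x$xdim, "x", x$ydim, "grid,", nrow(x$data), "observations\n")
  invisible(x)
}

# summary method: returns an object with its own class and print method.
summary.toy_map <- function(object, ...) {
  structure(
    list(xdim = object$xdim, ydim = object$ydim, n_obs = nrow(object$data)),
    class = "summary.toy_map"
  )
}

print.summary.toy_map <- function(x, ...) {
  cat("Grid:", x$xdim, "x", x$ydim, "\nObservations:", x$n_obs, "\n")
  invisible(x)
}

m <- new_toy_map(iris[, 1:4], xdim = 15, ydim = 10)
print(m)
s <- summary(m)
print(s)

# Objects of other classes still dispatch normally; the generic is untouched.
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
class(summary(fit))
```

In a package, such methods are registered with S3method() directives in NAMESPACE instead of exporting functions named summary, predict or fitted, which is what avoids the masking messages at load time.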
fix the copy content on the map man pages:
The authors compare the performance of their method to the som and kohonen packages.
figure out why this masking is happening. I assume that fitted and predict are implementations of the generic functions fitted and predict.
Paragraphs should consist of full sentences. Partial sentences are used when referring to pieces of code or when citing articles.
Example of incomplete sentence due to code block:
This is easily verified with a scatter plot matrix of the iris dataset using
-code block-
and shown in Figure 2.
Example of incomplete sentence due to citation:
A number of R-packages exist that implement self-organizing maps including (Wehrens & Kruisselbrink, 2018) and (Yan, 2016).
If you omit everything between parentheses, the sentence that remains is incomplete:
A number of R-packages exist that implement self-organizing maps including and.
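The check described above can be automated with a short regular expression. The helper name strip_parentheticals is hypothetical, and the sketch does not handle nested parentheses.

```r
# Remove every parenthesised span (and the whitespace before it) from a sentence.
strip_parentheticals <- function(x) {
  gsub("\\s*\\([^)]*\\)", "", x)
}

sentence <- paste(
  "A number of R-packages exist that implement self-organizing maps including",
  "(Wehrens & Kruisselbrink, 2018) and (Yan, 2016)."
)
stripped <- strip_parentheticals(sentence)
print(stripped)
```

If the stripped sentence no longer reads as a complete sentence, the citation is doing grammatical work and the wording should be revised.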