Coder Social home page Coder Social logo

maldirppa's People

Contributors

japal avatar paoloribeca avatar sgibb avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

sgibb wt215 agodmer

maldirppa's Issues

Write unit tests for the most critical functions.

You use the package testthat in your DESCRIPTION file but doesn't create any unit test.
In my opinion it is very important for a scientific package to provide unit tests to ensure that all functions do what they are supposed to do.

Don't calculate summary statistics (median etc.) twice

Instead of calculating quantile, mean, median etc. twice to get each limit use temporary variables or vectorisation:

lim <- mean(est, na.rm=TRUE) + sd(est, na.rm=TRUE) + c(threshold, -threshold)

if (method == "boxplot"){
met <- "boxplot"
t <- quantile(est,probs=0.75,na.rm=T) + threshold*IQR(est,na.rm=T)
l <- quantile(est,probs=0.25,na.rm=T) - threshold*IQR(est,na.rm=T)}
if (method == "adj.boxplot"){
met <- "adj.boxplot"
t <- adjboxStats(est,coef=threshold)$fence[2]
l <- adjboxStats(est,coef=threshold)$fence[1]}
if (method == "ESD"){
met <- "ESD"
t <- mean(est,na.rm=T) + threshold*sd(est,na.rm=T)
l <- mean(est,na.rm=T) - threshold*sd(est,na.rm=T)}
if (method == "Hampel"){
met <- "Hampel"
t <- median(est,na.rm=T) + threshold*mad(est,na.rm=T)
l <- median(est,na.rm=T) - threshold*mad(est,na.rm=T)}
if (method == "RC"){
met <- "RC"
t <- median(est,na.rm=T) + threshold*Qn(est[!is.na(est)])
l <- median(est,na.rm=T) - threshold*Qn(est[!is.na(est)])}

Functions from mvoutlier

Why you copy&paste mvoutlier::arw and not just depend on the package and use the function from it?

arw <- function (x, m0, c0, alpha, pcrit)
{
n <- nrow(x)
p <- ncol(x)
if (missing(pcrit)) {
if (p <= 10)
pcrit <- (0.24 - 0.003 * p)/sqrt(n)
if (p > 10)
pcrit <- (0.252 - 0.0018 * p)/sqrt(n)
}
if (missing(alpha))
delta <- qchisq(0.975, p)
else delta <- qchisq(1 - alpha, p)
d2 <- mahalanobis(x, m0, c0)
d2ord <- sort(d2)
dif <- pchisq(d2ord, p) - (0.5:n)/n
i <- (d2ord >= delta) & (dif > 0)
if (sum(i) == 0)
alfan <- 0
else alfan <- max(dif[i])
if (alfan < pcrit)
alfan <- 0
if (alfan > 0)
cn <- max(d2ord[n - ceiling(n * alfan)], delta)
else cn <- Inf
w <- d2 < cn
if (sum(w) == 0) {
m <- m0
c <- c0
}
else {
m <- apply(x[w, ], 2, mean)
c1 <- as.matrix(x - rep(1, n) %*% t(m))
c <- (t(c1) %*% diag(w) %*% c1)/sum(w)
}
list(m = m, c = c, cn = cn, w = w)
}

(using your own version of code forces you to look for bugs and updates yourself)

Error in cmdscale (detectOutliers)

Hello,

Could you please help to solve this issue?

out <- detectOutliers(peaks, by = type$MosID)
Error in cmdscale(x.dist, k = floor(nrow(x)/2) - (i - 1)) :
'k' must be in {1, 2, .. n - 1}

Not sure if that's because my spectra is over the scale limit? Overall, I have 522 spectra but got left after screenSpectra about 483 spectra. I have no issues about this error with less number of spectra analysis.

I'm new here and appreciate with your help,

Thank you,

Best regards,
Rutaiwan

Pre-allocate memory

As I already described in #4.

Here are some examples that could be improved by pre-allocating memory:

d <- list()
for (i in 1:ncol(I)){
d[[i]] <- createMassSpectrum(mz,I[,i])
}

MALDIrppa/R/MSlist2.R

Lines 3 to 6 in 39adb98

d <- list()
for (i in 1:ncol(I)){
d[[i]] <- createMassPeaks(mz,I[,i])
}

MALDIrppa/R/MSlist.R

Lines 3 to 6 in 39adb98

d <- list()
for (i in 1:ncol(I)){
d[[i]] <- createMassSpectrum(mz,I[,i])
}

Already addressed by #4 :

peaks2 <- list()
for (i in 1:length(x)){
t_int <- intensity(x[[i]])
t_mas <- mass(x[[i]])
mini <- which(t_int < min)
t_int <- t_int[-mini]
t_mas <- t_mas[-mini]
peaks2[[i]] <- createMassPeaks(mass=t_mas,intensity=t_int)
}

Use your own functions

Is there a specific reason why you don't use your MSlist from MSlist.R in importSpectra but rewrite/copy&paste the code into the function?

Add a newline at the end of `summary.scSpectra`'s output

Just a minor point:

> summary(h)
(10 first mass spectra) 
          ID   A score   Class
1  160408F21 0.1683850 success
2  160408F22 0.1716699 success
3  160408F23 0.1887859 success
4  160408F24 0.1478541 success
5  160408G01 0.1479799 success
6  160408G02 0.1734508 success
7  160408G03 0.1772491 success
8  160408G04 0.1534341 success
9  160408G05 0.1716697 success
10 160408G06 0.1538116 success

----------------------------

Scale estimator: Q 
Method: adj.boxplot 
Threshold: 1.5 
Limits: [0.1196,0.2942] 
Deriv. order: 1 
Lambda: 0.5 
No. potentially faulty spectra: 19 (4.75 %)> plot(h)

expected output:

# ...
No. potentially faulty spectra: 19 (4.75 %)
> plot(h)

So there is just a \n needed.

Remove `==TRUE` and `==FALSE` in `if` statements

if (x==TRUE) is superfluous. if expects a condition to be TRUE or FALSE, that means if(x) is enough if x is logical (or numeric). Your current approach is evaluated twice. First x is compared to TRUE and secondly the if tests whether this result is TRUE.

Instead of if (x==FALSE) just use if (!x).

if (binary==TRUE) dist <- "binary"

Use if (binary) here.

if (binary==FALSE) {

Use if (!binary) here.

int[is.na(int)==TRUE] <- 0

Other lines with the same approach:

if (binary==TRUE) {

if (barplot==TRUE){

if (grid==TRUE){

if (barplot==TRUE){

if (inherits(x,"scSpectra")==FALSE) {

if (type%in%c("index","hist","casewise")==FALSE){

if (labels[1]==TRUE) {labels <- rownames(x$est.table)}

if (!all(unlist(lapply(x,function(x) inherits(x,"MassSpectrum")))==TRUE)){

if (method%in%c("boxplot","adj.boxplot","ESD","Hampel","RC")==FALSE){

if (estimator%in%c("Q","MAD")==FALSE){

if (inherits(object,"scSpectra")==FALSE) {

if (inherits(ncases,"numeric")==FALSE) {

if (!all(unlist(lapply(x,function(x) inherits(x,"MassSpectrum")))==TRUE)){

if ((binary==TRUE) | (format=="NEXUS")) {x <- (x!=0)*1}

if ((binary==TRUE) | (format=="FASTA")) {x <- (x!=0)*1}

Simplify `redResolution`

As mentioned in #4 MALDIquant offers the [ for MassSpectrum and MassPeaks objects. So the following code in redResolution could be simplified:

if (inherits(x,"list")){
for (i in 1:length(x)){
x[[i]]@intensity <- x[[i]]@intensity[seq(1,length(x[[i]]@intensity),by=by)]
x[[i]]@mass <- x[[i]]@mass[seq(1,length(x[[i]]@mass),by=by)]
}
}
else {x@intensity <- x@intensity[seq(1,length(x@intensity),by=by)]
x@mass <- x@mass[seq(1,length(x@mass),by=by)]
}

if (is.list(x)) {
   x <- lapply(x, function(xx)xx[seq(1, length(xx), by=by)])
} else
   x <- x[seq(1, length(xx), by=by)]
}

Use `&&` and `||` in `if` instead of `&` and `|`

&& is different from &, see ?"&&":

‘&’ and ‘&&’ indicate logical AND and ‘|’ and ‘||’ indicate
logical OR. The shorter form performs elementwise comparisons in
much the same way as arithmetic operators. The longer form
evaluates left to right examining only the first element of each
vector. Evaluation proceeds only until the result is determined.
The longer form is appropriate for programming control-flow and
typically preferred in ‘if’ clauses.

Lines:

if ((!is.null(by)) & (is.vector(by))) {

if ((!is.null(by)) & (is.data.frame(by) | is.matrix(by))) {

if ((!is.matrix(x)) & (!is.data.frame(x)) & (!is.list(x))) stop("x is not a valid object")

if ((!inherits(x,"list")) & (!inherits(x,"MassSpectrum"))) stop("x must be a MassSpectrum object")

if ((floor(by) != by) | (length(by)!=1)) stop("by must be an integer value")

if ((!is.null(meta)) & (is.vector(meta))){

if ((!is.null(meta)) & (is.data.frame(meta) | is.matrix(meta))){

if ((!inherits(x,"list")) & (!inherits(x,"MassSpectrum"))) stop("x must be a MassSpectrum object")

if (!is.matrix(x) && !is.data.frame(x)) stop("x must be matrix or data.frame.")

if ((binary==TRUE) | (format=="NEXUS")) {x <- (x!=0)*1}

if ((binary==TRUE) | (format=="FASTA")) {x <- (x!=0)*1}

ERROR: screenSpectra(spectra, meta = type)

Dear Javier,
reading your excellent vignette, I tried to follow your standard script and ran into an error:

library(MALDIrppa)
data(spectra)
data(type)
summarySpectra(spectra[1:10])
sc.results <- screenSpectra(spectra, meta = type)

Error in mcComp(x, doReflect, eps1 = eps1, eps2 = eps2, maxit = maxit, :
object 'mc_C' not found

traceback()

5: mcComp(x, doReflect, eps1 = eps1, eps2 = eps2, maxit = maxit, 
       trace.lev = trace.lev)
4: mc.default(x, ..., na.rm = TRUE)
3: mc(x, ..., na.rm = TRUE)
2: adjboxStats(est, coef = threshold)
1: screenSpectra(spectra, meta = type)

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MALDIrppa_1.0.1   lattice_0.20-35   wmtsa_2.0-3       robustbase_0.91-1 signal_0.7-6      MALDIquant_1.17

Do you see any problems at first sight? Am I using the wrong combination of packages?
Do the base packages interfere with dependencies of MALDIrppa?
This meassage from library(MALDIrppa) may be a hint...:

Attaching package: ‘signal’
The following objects are masked from ‘package:stats’:
    filter, poly

Attaching package: ‘robustbase’
The following object is masked from ‘package:stats’:
    sigma

Thanks for your support and with Best Wishes, Holger

Rename MSlist and MSlist2

IHMO MSlist and MSlist2 are bad function names. Just from reading the names I had no idea what the functions are doing. Even after reading the man page it is hard to remember that MSlist creates MassSpectrum and MSlist2 MassPeaks objects. Currently I have no good idea for a better name but maybe something like matrixToSpectra/matrixToPeaks ...

Don't use nested function definitions

Dear Javier,

please don't use nested function definitions. While it is valid R code it is generally considered as bad practice. Because the nested (private) functions life in the namespace/frame of the parent functions they are hard to debug, hard to test (unit tests) and could not be reused by other functions.

An example (functions nested in 2 levels):

detectOutliers <- function(x, by = NULL, binary = FALSE, ...){
if (!isMassPeaksList(x)) {
stop("x must be a list of MassPeaks class objects")
}
if ((!is.null(by)) & (is.vector(by))) {
if (length(by) != length(x)) {
stop("The number of elements in x and by do not match")
}
}
if ((!is.null(by)) & (is.data.frame(by) | is.matrix(by))) {
if (nrow(by) != length(x)) {
stop("The number of elements in x and by do not match")
}
}
if (binary==TRUE) dist <- "binary"
adapt.out <- function (x, quan = 1/2, alpha = 0.025, ...)
# Adaptive multivariate outlier detection: tailored version from mvoutlier package
{
arw <- function (x, m0, c0, alpha, pcrit)

Instead of hiding the functions inside another function you could hide your private functions by not exporting them in the NAMESPACE file.

Question regarding `importSpectra`

Do you know MALDIquantForeign::importCsv? Do you want to avoid to depend on MALDIquantForeign or are there any other reasons why you do not can/want to use MALDIquantForeign::importCsv?

Use `isMassSpectrumList`

There are still some pieces of code that could use isMassSpectrumList:

if ((!inherits(x,"list")) & (!inherits(x,"MassSpectrum"))) stop("x must be a MassSpectrum object")
if (inherits(x,"list")){
if (!all(unlist(lapply(x,function(x) inherits(x,"MassSpectrum")))==TRUE)){
stop("x must be a list of MassSpectrum objects")}
}

if ((!inherits(x,"list")) & (!inherits(x,"MassSpectrum"))) stop("x must be a MassSpectrum object")
if (inherits(x,"list")){
if (!all(unlist(lapply(x,function(x) inherits(x,"MassSpectrum")))==TRUE)){
stop("x must be a list of MassSpectrum objects")}
}

Obviously I don't catch all with #1 .

transformIntensity vs transfIntensity

While I totally understand why you create tranfIntensity I don't understand why you use

spectra <- transfIntensity(spectra, fun = function(x) sqrt(x))

instead of

spectra <- transformIntensity(spectra, "sqrt")

in the vignette?

In earlier versions of MALDIquant there was a fun argument in transformIntensity that accepted user-defined functions. We removed it because many users of MALDIquant don't understand the concept of anonymous functions like function(x)sqrt(x) (BTW: transfIntensity(spectra, fun=sqrt) should be sufficient here). Maybe it was not the best decision we made.

Don't directly access the slots of MALDIquant objects but use their accessors

It is good practice to use the accessor methods instead of the slots of a class. See for example: https://stackoverflow.com/questions/9900134/is-it-bad-practice-to-access-s4-objects-slots-directly-using and especially Martin Morgan's answer: https://stackoverflow.com/a/9900822

Beside being good practice it avoids errors (because the accessors add some additional checks), e.g.:

library("MALDIquant")

s <- createMassSpectrum(mass=1:5, intensity=1:5)

## won't work, checks for equal length
intensity(s) <- 1:10
# Error in `intensity<-`(`*tmp*`, value = 1:10) :
#  Lengths of intensity(5) and value (10) have to be equal.

## works, no additional checks
s@intensity <- 1:10

## but now the object is broken
validObject(s)
# Error in validObject(s) :
#  invalid class “MassSpectrum” object: Lengths of mass (5) and intensity (10) have to be equal.

So use mass(x), intensity(x), snr(x) and metaData(x) instead of x@mass, x@intensity, x@metaData, and x@snr wherever possible (the would be some edge cases where you have to access the slots directly).

x[[i]]@metaData[[pos]] <- as.character(metadata[i])

x[[i]]@intensity <- x[[i]]@intensity[seq(1,length(x[[i]]@intensity),by=by)]

x[[i]]@mass <- x[[i]]@mass[seq(1,length(x[[i]]@mass),by=by)]

else {x@intensity <- x@intensity[seq(1,length(x@intensity),by=by)]

x@mass <- x@mass[seq(1,length(x@mass),by=by)]

lapply(x,function(x) x@snr)

NmzVal <- sapply(x,function(x) length(x@mass))

RmzValmin <- sapply(x,function(x) round(min(x@mass),digits))

RmzValmax <- sapply(x,function(x) round(max(x@mass),digits))

RintValmin <- sapply(x,function(x) round(min(x@intensity),digits))

RintValmean <- sapply(x,function(x) round(mean(x@intensity),digits))

RintValsd <- sapply(x,function(x) round(sd(x@intensity),digits))

RintValmed <- sapply(x,function(x) round(median(x@intensity),digits))

RintValmad <- sapply(x,function(x) round(mad(x@intensity),digits))

RintValmax <- sapply(x,function(x) round(max(x@intensity),digits))

RsnrValmin <- sapply(x,function(x) round(min(x@snr),digits))

RsnrValmean <- sapply(x,function(x) round(mean(x@snr),digits))

RsnrValsd <- sapply(x,function(x) round(sd(x@snr),digits))

RsnrValmed <- sapply(x,function(x) round(median(x@snr),digits))

RsnrValmad <- sapply(x,function(x) round(mad(x@snr),digits))

RsnrValmax <- sapply(x,function(x) round(max(x@snr),digits))

NmzVal <- sapply(x,function(x) length(x@mass))

RmzValmin <- sapply(x,function(x) round(min(x@mass),digits))

RmzValmax <- sapply(x,function(x) round(max(x@mass),digits))

RintValmin <- sapply(x,function(x) round(min(x@intensity),digits))

RintValmean <- sapply(x,function(x) round(mean(x@intensity),digits))

RintValsd <- sapply(x,function(x) round(sd(x@intensity),digits))

RintValmed <- sapply(x,function(x) round(median(x@intensity),digits))

RintValmad <- sapply(x,function(x) round(mad(x@intensity),digits))

RintValmax <- sapply(x,function(x) round(max(x@intensity),digits))

x[[i]]@intensity <- fun(x[[i]]@intensity)

if (!all(is.finite(x[[i]]@intensity))) x[[i]]@intensity <- rep(0,length(x[[i]]@intensity)) # Deals with flats

else {x@intensity <- fun(x@intensity)

if (!all(is.finite(x@intensity))) x@intensity <- rep(0,length(x@intensity))

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.