lcolladotor / derfinder Goto Github PK

Annotation-agnostic differential expression analysis of RNA-seq data via expressed regions-level or single base-level approaches

Home Page: http://lcolladotor.github.io/derfinder

R 99.37% CSS 0.63%

rstats bioconductor derfinder rnaseq annotation-agnostic r

derfinder's Introduction

derfinder

Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach. This package contains two different implementations of this approach. The first one is the single base-level F-statistics implementation and the second one is via identifying expressed regions. For more information about derfinder check the vignettes here.

Documentation

For more information about derfinder check the vignettes through Bioconductor or at the documentation website.

Further documentation

You can generate HTML reports from the results using regionReport available here.

Installation instructions

Get the latest stable R release from CRAN. Then install derfinder from Bioconductor using the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("derfinder")

Citation

Below is the citation output from using citation('derfinder') in R. Please run this yourself to check for any updates on how to cite derfinder.

print(citation("derfinder"), bibtex = TRUE)
#> To cite package 'derfinder' in publications use:
#> 
#>   Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B,
#>   Irizarry RA, Leek JT, Jaffe AE (2017). "Flexible expressed region
#>   analysis for RNA-seq with derfinder." _Nucl. Acids Res._.
#>   doi:10.1093/nar/gkw852 <https://doi.org/10.1093/nar/gkw852>,
#>   <http://nar.oxfordjournals.org/content/early/2016/09/29/nar.gkw852>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Flexible expressed region analysis for RNA-seq with derfinder},
#>     author = {Leonardo Collado-Torres and Abhinav Nellore and Alyssa C. Frazee and Christopher Wilks and Michael I. Love and Ben Langmead and Rafael A. Irizarry and Jeffrey T. Leek and Andrew E. Jaffe},
#>     year = {2017},
#>     journal = {Nucl. Acids Res.},
#>     doi = {10.1093/nar/gkw852},
#>     url = {http://nar.oxfordjournals.org/content/early/2016/09/29/nar.gkw852},
#>   }
#> 
#>   Frazee AC, Sabunciyan S, Hansen KD, Irizarry RA, Leek JT (2014).
#>   "Differential expression analysis of RNA-seq data at single-base
#>   resolution." _Biostatistics_, *15 (3)*, 413-426.
#>   doi:10.1093/biostatistics/kxt053
#>   <https://doi.org/10.1093/biostatistics/kxt053>,
#>   <http://biostatistics.oxfordjournals.org/content/15/3/413.long>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Differential expression analysis of RNA-seq data at single-base resolution},
#>     author = {Alyssa C. Frazee and Sarven Sabunciyan and Kasper D. Hansen and Rafael A. Irizarry and Jeffrey T. Leek},
#>     year = {2014},
#>     journal = {Biostatistics},
#>     volume = {15 (3)},
#>     pages = {413-426},
#>     doi = {10.1093/biostatistics/kxt053},
#>     url = {http://biostatistics.oxfordjournals.org/content/15/3/413.long},
#>   }
#> 
#>   Collado-Torres L, Jaffe AE, Leek JT (2017). _derfinder:
#>   Annotation-agnostic differential expression analysis of RNA-seq data
#>   at base-pair resolution via the DER Finder approach_.
#>   doi:10.18129/B9.bioc.derfinder
#>   <https://doi.org/10.18129/B9.bioc.derfinder>,
#>   https://github.com/lcolladotor/derfinder - R package version 1.35.0,
#>   <http://www.bioconductor.org/packages/derfinder>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach},
#>     author = {Leonardo Collado-Torres and Andrew E. Jaffe and Jeffrey T. Leek},
#>     year = {2017},
#>     url = {http://www.bioconductor.org/packages/derfinder},
#>     note = {https://github.com/lcolladotor/derfinder - R package version 1.35.0},
#>     doi = {10.18129/B9.bioc.derfinder},
#>   }

Please note that the derfinder was only made possible thanks to many other R and bioinformatics software authors, which are cited either in the vignettes and/or the paper(s) describing this package.

DER Finder versions

The original implementation of the DER Finder approach as published in Frazee et al, Biostatistics 2014 is available via GitHub at derfinder.
The version implementing the single base-level approach via calculating F-stastics as described in the pre-print Collado-Torres et al, Nucleic Acids Research 2017 is available via Bioconductor at derfinder. The same package has the functions required for the expressed regions-level approach.

Code of Conduct

Please note that the derfinder project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Development tools

Continuous code testing is possible thanks to GitHub actions through usethis, remotes, sysreqs and rcmdcheck customized to use Bioconductor’s docker containers and BiocCheck.
Code coverage assessment is possible thanks to codecov and covr.
The documentation website is automatically updated thanks to pkgdown.
The code is styled automatically thanks to styler.
The documentation is formatted thanks to devtools and roxygen2.

For more details, check the dev directory.

This package was developed using biocthis.

derfinder's People

Contributors

Stargazers

Watchers

Forkers

8iterations jtleek alyssafrazee ttriche andrewejaffe arturochian ramsinghlab alexg9010 cyang-2014 c1au6i0 fallinwind

derfinder's Issues

create a UCSC Track Hub from derfinder results

it would eventually be nice to have code that takes adjusted coverage, derfinder results, and the genome-wide F-statistic used to find the DERs, and automatically create a UCSC Track Hub. Other inputs would maybe need to be genome build (although it could perhaps be inferred from the genome info on the DERs)

The general structure involves two files called hub.txt and genomes.txt then a folder named as the genome build, which contains BigWigs and bigBeds. These are from my track hub from our nature neuroscience paper: https://s3.amazonaws.com/DLPFC_n36/humanDLPFC/hub.txt . I don't think it would be that hard to at least make a skeleton Track Hub using a simple wrapper.

Example hub.txt

hub humanDLPFC
shortLabel LIBD Human DLPFC Development
longLabel RNAseq data across human brain development by age group from LIBD
genomesFile genomes.txt
email [email protected]
descriptionUrl http://www.libd.org

Example genomes.txt
genome hg19
trackDb hg19/trackDb.txt

Then there is a folder called hg19/ that contains trackDb.txt, example:
track DERs
bigDataUrl DERs.bigBed
shortLabel DERs
longLabel Differentially expressed regions (DERs) across brain development
type bigBed 5 .
priority 1
visibility dense

track sixGroup_F
bigDataUrl sixGroup_Fstat_DLPFC_LIBD.bw
shortLabel sixGroup_F
longLabel F-statistic for 36 samples across 6 age groups
type bigWig
priority 2
visibility full
maxHeightPixels 100:32:8
yLineOnOff on
yLineMark 20.51
viewLimitsMax 0:200
viewLimits 0:50
autoscale on

track sixGroupMulti
shortLabel sixGroupDLPFC
longLabel Comparison in human DLPFC across six age groups
container multiWig
aggregate none
showSubtrackColorOnUi on
type bigWig
priority 3
maxHeightPixels 100:32:8
visibility full
yLineOnOff on
yLineMark 5

    track FetalOverlay
    parent sixGroupMulti
    shortLabel Fetal DLPFC
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 fetal brains 
    color 27,158,119
    type bigWig
    priority 2
    bigDataUrl Fetal_DLPFC_LIBD_meanExp.bw

    track InfantOverlay
    shortLabel Infant DLPFC
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 infant brains 
    parent sixGroupMulti
    color 217,95,2
    type bigWig 
    priority 3
    bigDataUrl Infant_DLPFC_LIBD_meanExp.bw

    track ChildOverlay
    parent sixGroupMulti
    shortLabel Child DLPFC
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 child brains 
    color 117,112,179
    type bigWig 
    priority 4
    bigDataUrl Child_DLPFC_LIBD_meanExp.bw

    track TeenOverlay
    parent sixGroupMulti
    shortLabel Teen DLPFC
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 teen brains 
    color 231,41,138
    type bigWig
    priority 5
    bigDataUrl Teen_DLPFC_LIBD_meanExp.bw

    track AdultOverlay
    parent sixGroupMulti
    shortLabel Adult DLPFC 
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 adult brains 
    color 102,166,30
    type bigWig
    priority 6
    bigDataUrl Adult_DLPFC_LIBD_meanExp.bw

    track 50plusOverlay
    parent sixGroupMulti
    shortLabel 50plus DLPFC
    longLabel Normalized RNAseq coverage (Reads/80M) averaged across 6 elderly brains 
    color 230,171,2
    type bigWig
    bigDataUrl 50plus_DLPFC_LIBD_meanExp.bw
    priority 7

extra installation steps required

Depending on the user's current package configuration, installing the package directly from GitHub may fail. I had to do the following in order to install derfinder:

install Rcpp and RcppArmadillo using install.packages
install qvalue and ggbio using biocLite
install (from source) the devel version of bumphunter. Currently at 1.3.6, but minimum version required is 1.3.3
install gfortran (fortran compiler for Mac - Xcode does not come with one) - available here. gfortran is required by Rcpp.

After this, install_github('derfinder','lcolladotor') succeeds.

Is it possible to automate these installation steps or clarify in the docs that this stuff has to be pre-installed?

Problem with fullCoverage function

Hullo

I am trying to use derfinder to find differentially enriched regions in some human histone mark chip-seq data. I have bam files that were created with STAR and indexed with samtools. I have an odd issue that when I call fullCoverage on a file and specify chromosome "1" it works fine (it also works for chromosome "20", "21", "X" and "Y" but if I try chromosome "2" or "10" I get the following error

fulCov <- fullCoverage(bamFiles,"2",cutoff=2)
2015-03-03 13:15:30 fullCoverage: processing chromosome 2
2015-03-03 13:15:30 loadCoverage: finding chromosome lengths
Error: 1 errors; first error:
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges"): solving row 2: range cannot be determined from the supplied arguments (too many NAs)

For more information, use bplasterror(). To resume calculation, re-call
the function and set the argument 'BPRESUME' to TRUE or wrap the
previous call in bpresume().

First traceback:
20: fullCoverage(bamFiles, "2", cutoff = 2)
19: bplapply(seq_len(length(chrs)), loadChr, files = files, chrs = chrs,
bai = bai, chrlens = chrlens, outputs = outputs, cutoff = cutoff,
mc.cores.load = mc.cores.load, ..., BPPARAM = BPPARAM)
18: bplapply(seq_len(length(chrs)), loadChr, files = files, chrs = chrs,
bai = bai, chrlens = chrlens, outputs = outputs, cutoff = cutoff,
mc.cores.load = mc.cores.load, ..., BPPARAM = BPPARAM)
17: lapply(X, FUN, ...)
16: lapply(X, FUN, ...)
15: FUN(1L[[1L]], ...)
14: .try(FUN(...))
13: withRestarts(withCallingHandler

any advice would be gratefully received

Pietà

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
[4] LC_COLLATE=en_GB LC_MONETARY=en_GB LC_MESSAGES=en_GB
[7] LC_PAPER=en_GB LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel grDevices datasets splines graphics utils
[8] stats grid methods base

other attached packages:
[1] BiocInstaller_1.16.1 IRanges_1.99.31 S4Vectors_0.2.5
[4] BiocGenerics_0.11.5 derfinder_0.99.5 knitr_1.6
[7] Hmisc_3.14-5 Formula_1.1-2 survival_2.37-7
[10] lattice_0.20-29

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 AnnotationDbi_1.27.19 base64enc_0.1-2
[4] BatchJobs_1.4 BBmisc_1.7 Biobase_2.25.0
[7] BiocParallel_0.99.25 biomaRt_2.21.5 Biostrings_2.33.14
[10] bitops_1.0-6 brew_1.0-6 bumphunter_1.5.5
[13] checkmate_1.4 cluster_1.15.3 codetools_0.2-9
[16] DBI_0.3.1 derfinderHelper_0.99.2 digest_0.6.4
[19] doRNG_1.6 evaluate_0.5.5 fail_1.2
[22] foreach_1.4.2 foreign_0.8-61 formatR_1.0
[25] futile.logger_1.3.7 futile.options_1.0.0 GenomeInfoDb_1.1.25
[28] GenomicAlignments_1.1.30 GenomicFeatures_1.17.20 GenomicFiles_1.1.19
[31] GenomicRanges_1.17.46 iterators_1.0.7 lambda.r_1.1.6
[34] latticeExtra_0.6-26 locfit_1.5-9.1 Matrix_1.1-4
[37] matrixStats_0.10.0 nnet_7.3-8 pkgmaker_0.22
[40] qvalue_1.39.1 R.methodsS3_1.6.1 RColorBrewer_1.0-5
[43] RCurl_1.95-4.3 registry_0.2 rngtools_1.2.4
[46] rpart_4.1-8 Rsamtools_1.17.34 RSQLite_0.11.4
[49] rtracklayer_1.25.19 sendmailR_1.2-1 stringr_0.6.2
[52] tools_3.1.1 XML_3.98-1.1 xtable_1.7-4
[55] XVector_0.5.8 zlibbioc_1.11.1

Please see issue

Hi @lcolladotor,
Refer to the issue below:
leekgroup/derfinder#17

Coverage matrix for regions

Hi ,
Thank you so much for the package. I'm wondering how to load .bw files and calculate coverage matrix for specified custom regions. I couldn't find this in the documentation. It would be really helpful if I could know how to do this.

Thanks and regards,
Prakrithi

error in calculatePvalues() when fstats.output appears to be a list

2013-12-11 07:38:25 calculatePvalues: calculating F-statistics for permutation 309
Error in unlist(RleList(fstats.output), use.names = FALSE) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE = "IRanges") :
  Rle of type 'list' is not supported
Calls: RleList ... Rle -> .local -> Rle -> Rle -> .local -> .Call2 -> .Call
Calls: analyzeChr -> calculatePvalues -> unlist
Execution halted

Could be related to #9

error in calculatePvalues() when fstats.output is NULL

2013-12-11 03:58:31 calculatePvalues: calculating F-statistics for permutation 253
Error in unlist(RleList(fstats.output), use.names = FALSE) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in as(from, class) :
  no method or default for coercing "NULL" to "Rle"
Calls: RleList ... .coerceToList -> lapply -> lapply -> FUN -> setNames -> .as -> as
Calls: analyzeChr -> calculatePvalues -> unlist
Execution halted

doesn't load on R 3.3

something in VariantAnnotation broke our package :(

> library(derfinder)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/jhpce/shared/community/compiler/gcc/4.4.7/R/3.3.x/lib64/R/site-library/VariantAnnotation/libs/VariantAnnotation.so':
  /jhpce/shared/community/compiler/gcc/4.4.7/R/3.3.x/lib64/R/site-library/VariantAnnotation/libs/VariantAnnotation.so: undefined symbol: gzopen64
Error: package or namespace load failed for ‘derfinder’
> sessionInfo()
R version 3.3.1 Patched (2016-09-30 r71426)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                AnnotationDbi_1.36.0
 [3] XVector_0.14.0             GenomicRanges_1.26.1
 [5] BiocGenerics_0.20.0        zlibbioc_1.20.0
 [7] GenomicAlignments_1.10.0   IRanges_2.8.1
 [9] BiocParallel_1.8.1         BSgenome_1.42.0
[11] lattice_0.20-34            GenomeInfoDb_1.10.1
[13] tools_3.3.1                SummarizedExperiment_1.4.0
[15] parallel_3.3.1             grid_3.3.1
[17] Biobase_2.34.0             DBI_0.5-1
[19] digest_0.6.10              Matrix_1.2-7.1
[21] rtracklayer_1.34.1         S4Vectors_0.12.1
[23] bitops_1.0-6               biomaRt_2.30.0
[25] RCurl_1.95-4.8             memoise_1.0.0
[27] RSQLite_1.1                GenomicFeatures_1.26.0
[29] Biostrings_2.42.1          Rsamtools_1.26.1
[31] XML_3.98-1.5               stats4_3.3.1
>

Width of devtools::session_info()

Change the width so all the info appears on the same line instead of split in 2, which makes it hard to read when there are lots of packages loaded.

Improve doc: filterData() after fullCoverage()

Emphasize that you need to use filterData() after fullCoverage() (when using default cutoff=NULL) before using analyzeChr(). Otherwise you would attempt to calculate F-stats in all bases of the genome including those for which there is no data in any of the samples.

Adopt GenomeInfoDb

This new package http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomeInfoDb/inst/doc/GenomeInfoDb.pdf seems like it could be very useful to adopt to avoid the problem of dealing with different ways to specify the chromosomes.

Will most likely not adopt it before May

BiocParallel errors when running regionMat()

Hello,

Thank you for making this tool. I would like to use derfinder starting from a set of bam files. I have loaded the files using load coverage and it ran without problems:

lC <- loadCoverage(
files=files,
chr="chr21",
cutoff = NULL,
filter = "one",
chrlen = NULL,
output = NULL,
bai = NULL
)

I then tried to run regionMat without any parallel specification but still I get an error referring to BiocParallel:

regionMat <- regionMatrix(
fullCov=lC,
cutoff = 5,
L = rep(500,length(files)),
totalMapped = 8e+07,
targetSize = 8e+07,
runFilter = TRUE,
returnBP = TRUE
)

By using totalMapped equal to targetSize, regionMatrix() assumes that you have normalized the data already in fullCoverage(), loadCoverage() or filterData().
2023-10-27 02:32:30 regionMatrix: processing coverage
2023-10-27 02:32:30 filterData: normalizing coverage
2023-10-27 02:32:31 filterData: done normalizing coverage
2023-10-27 02:32:40 filterData: originally there were 48129895 rows, now there are 3229037 rows. Meaning that 93.29 percent was filtered.
2023-10-27 02:32:40 findRegions: identifying potential segments
2023-10-27 02:32:40 findRegions: segmenting information
2023-10-27 02:32:40 findRegions: identifying candidate regions
extendedMapSeqlevels: the 'seqnames' you supplied are currently not supported in GenomeInfoDb. Consider adding your genome by following the information at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomeInfoDb/inst/doc/Accept-organism-for-GenomeInfoDb.pdf
2023-10-27 02:32:40 findRegions: identifying region clusters
extendedMapSeqlevels: the 'seqnames' you supplied are currently not supported in GenomeInfoDb. Consider adding your genome by following the information at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomeInfoDb/inst/doc/Accept-organism-for-GenomeInfoDb.pdf
extendedMapSeqlevels: the 'seqnames' you supplied are currently not supported in GenomeInfoDb. Consider adding your genome by following the information at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomeInfoDb/inst/doc/Accept-organism-for-GenomeInfoDb.pdf
2023-10-27 02:32:41 getRegionCoverage: processing coverage
2023-10-27 02:32:45 getRegionCoverage: done processing coverage
2023-10-27 02:32:45 regionMatrix: calculating coverageMatrix
2023-10-27 02:32:48 regionMatrix: adjusting coverageMatrix for 'L'
2023-10-27 02:32:48 regionMatrix: processing position
2023-10-27 02:32:48 filterData: normalizing coverage
Error: BiocParallel errors
1 remote errors, element index: 2
0 unevaluated and other errors
first remote error: zero-length inputs cannot be mixed with those of non-zero length

Are you familiar with this error? If there is other info I should provide, please let me know.

All the best,
Ivan

namespace issues re: `tibble`?

It looks like the tibble package dependency is off somewhere - this was a fresh derfinder install where maybe 10 other packages were installed. manually installing the tibble package resolves the issues.
-a

> library(derfinder)
Error: package or namespace load failed for ‘derfinder’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 there is no package called ‘tibble’
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] derfinder_1.10.4           BiocInstaller_1.26.0       SummarizedExperiment_1.6.1
 [4] DelayedArray_0.2.2         matrixStats_0.52.2         Biobase_2.36.2            
 [7] GenomicRanges_1.28.1       GenomeInfoDb_1.12.0        IRanges_2.10.1            
[10] S4Vectors_0.14.1           BiocGenerics_0.22.0        edgeR_3.18.1              
[13] limma_3.32.2               jaffelab_0.99.10           rafalib_1.0.0             

loaded via a namespace (and not attached):
 [1] splines_3.4.0            foreach_1.4.3            GenomicFiles_1.12.0      Formula_1.2-1           
 [5] bumphunter_1.16.0        latticeExtra_0.6-28      doRNG_1.6.6              BSgenome_1.44.0         
 [9] GenomeInfoDbData_0.99.0  Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.0.5         
[13] lattice_0.20-35          digest_0.6.12            RColorBrewer_1.1-2       XVector_0.16.0          
[17] checkmate_1.8.2          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6         
[21] Matrix_1.2-10            plyr_1.8.4               XML_3.98-1.7             biomaRt_2.32.0          
[25] zlibbioc_1.22.0          xtable_1.8-2             scales_0.4.1             BiocParallel_1.10.1     
[29] htmlTable_1.9            tibble_1.3.0             pkgmaker_0.22            ggplot2_2.2.1           
[33] GenomicFeatures_1.28.0   nnet_7.3-12              lazyeval_0.2.0           survival_2.41-3         
[37] magrittr_1.5             memoise_1.1.0            foreign_0.8-68           tools_3.4.0             
[41] registry_0.3             data.table_1.10.4        stringr_1.2.0            munsell_0.4.3           
[45] locfit_1.5-9.1           rngtools_1.2.4           cluster_2.0.6            AnnotationDbi_1.38.0    
[49] Biostrings_2.44.0        compiler_3.4.0           grid_3.4.0               RCurl_1.95-4.8          
[53] iterators_1.0.8          VariantAnnotation_1.22.0 htmlwidgets_0.8          bitops_1.0-6            
[57] base64enc_0.1-3          derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15        
[61] DBI_0.6-1                reshape2_1.4.2           GenomicAlignments_1.12.1 gridExtra_2.2.1         
[65] knitr_1.15.1             rtracklayer_1.36.0       Hmisc_4.0-3              stringi_1.1.5           
[69] Rcpp_0.12.10             rpart_4.1-11             acepack_1.4.1

Add a way to set advanced options

Given the feedback from http://lcolladotor.github.io/2014/12/11/dots/#.VYDMQmRViko and with Dan via emails, I think that it would be best to have a way to set the values for the advanced options and document them there. This reduces the clutter on the other man pages, but still provides full documentation instead of how advancedArg() currently points users to GitHub.

plotCluster() example fails due to a ggbio::autoplot() error

Autoplot error described at lawremi/ggbio#47

`targetSize` and `totalMapped` should be required inputs for `regionMatrix`

in order to ensure the cutoff is comparable across samples...

add bigwig support?

should be fairly straightforward, instead of:

align = Rsamtools::readGAlignments(bam)
coverage(align)

you can just do:
rtracklayer::import.bw(bigwig, asRle=TRUE)

which is already a coverage RleList. Note there is also support for going chromosome by chromosome with the which input like Rsamtools.

it seems faster than reading in the bams as well

plotCluster() not working on R 3.0.2

This is because of an error in ggbio::plotIdeogram() in R 3.0.2

Check lawremi/ggbio#45 for more information

Missing derfinderPlot dependency

Waiting for leekgroup/derfinderPlot#1 to be resolved before adding derfinderPlot to the suggested packages.

[Help required] setting cutoff

Hello there
It's not really a feature request, more a "help required"

I'm using derfinder to reannotate the Rat genome using RNA-seq data. I'm running fullcoverage() and regionmatrix() to id all the genomic region corresponding to transcript. However, fixing a unique cutoff value is problematic:
For weakly expressed genes, I lose the regions
For highly expressed genes, pre-transcript reads are not filtered enough and are identified as new regions.

I would like to know if there is a way to identify only region with a burst or a drop in reads according to surrounding background ?

Best
David

Warnings when using coverageToExon()

Hm... I don't know how to deal with this warning. Specially since running the coverageToExon() code manually (that is, looping manually instead of using mclapply) does not reproduce the warnings. It could be a NAMESPACE issue.

I'll most likely need help from the Bioc-devel mailing list.

Anyhow, here are the warnings:

> example("coverageToExon", "derfinder")

cvrgTE> ## Obtain fullCov object
cvrgTE> fullCov <- list("21"=genomeDataRaw$coverage)

cvrgTE> ## Use only the first two exons
cvrgTE> smallGenomicState <- genomicState

cvrgTE> smallGenomicState$fullGenome <- smallGenomicState$fullGenome[ which(smallGenomicState$fullGenome$theRegion == "exon")[1:2] ]

cvrgTE> ## Finally, get the coverage information for each exon
cvrgTE> exonCov <- coverageToExon(fullCov=fullCov, genomicState=smallGenomicState, L=100)
2014-04-14 15:21:55 coverageToExon: processing chromosome 21
Warning messages:
1: In (function ()  :
  Symbol 'cc' resolved from calling frame; escape with .(cc) for safety.
2: In (function ()  :
  Symbol 'chr' resolved from calling frame; escape with .(chr) for safety.
> options(warn=2)
> example("coverageToExon", "derfinder")

cvrgTE> ## Obtain fullCov object
cvrgTE> fullCov <- list("21"=genomeDataRaw$coverage)

cvrgTE> ## Use only the first two exons
cvrgTE> smallGenomicState <- genomicState

cvrgTE> smallGenomicState$fullGenome <- smallGenomicState$fullGenome[ which(smallGenomicState$fullGenome$theRegion == "exon")[1:2] ]

cvrgTE> ## Finally, get the coverage information for each exon
cvrgTE> exonCov <- coverageToExon(fullCov=fullCov, genomicState=smallGenomicState, L=100)
2014-04-14 15:22:04 coverageToExon: processing chromosome 21
Error in (function ()  : 
  (converted from warning) Symbol 'cc' resolved from calling frame; escape with .(cc) for safety.
> traceback()
27: doWithOneRestart(return(expr), restart)
26: withOneRestart(expr, restarts[[1L]])
25: withRestarts({
        .Internal(.signalCondition(simpleWarning(msg, call), msg, 
            call))
        .Internal(.dfltWarn(msg, call))
    }, muffleWarning = function() NULL)
24: .signalSimpleWarning("Symbol 'cc' resolved from calling frame; escape with .(cc) for safety.", 
        quote((function () 
        {
            val <- get(g, enclos)
            warning("Symbol '", g, "' resolved from calling frame; ", 
                "escape with .(", g, ") for safety.")
            val
        })()))
23: warning("Symbol '", g, "' resolved from calling frame; ", "escape with .(", 
        g, ") for safety.")
22: (function () 
    {
        val <- get(g, enclos)
        warning("Symbol '", g, "' resolved from calling frame; ", 
            "escape with .(", g, ") for safety.")
        val
    })()
21: eval(expr, envir, enclos)
20: eval(expr, as.env(envir, enclos))
19: eval(expr, as.env(envir, enclos))
18: eval(expr, envir, enclos)
17: eval(expr, envir, enclos)
16: safeEval(substitute(subset), x, top_prenv(subset))
15: .local(x, ...)
14: subset(fullCov[[chrnum]], cc[[chr]])
13: subset(fullCov[[chrnum]], cc[[chr]])
12: FUN(1L[[1L]], ...)
11: lapply(X = X, FUN = FUN, ...)
10: mclapply(seq(along = fullCov), .coverageToExonChrStep, fullCov = fullCov, 
        cc = cc, e = e, L = L, verbose = verbose, mc.cores = nCores)
9: FUN(X[[1L]], ...)
8: lapply(X = X, FUN = FUN, ...)
7: mclapply(strandIndexes, .coverageToExonStrandStep, fullCov = fullCov, 
       etab = etab, L = L, nCores = nCores, verbose = verbose, mc.cores = nCores)
6: coverageToExon(fullCov = fullCov, genomicState = smallGenomicState, 
       L = 100) at Rex148844c31d37e#15
5: eval(expr, envir, enclos)
4: eval(ei, envir)
3: withVisible(eval(ei, envir))
2: source(tf, local, echo = echo, prompt.echo = paste0(prompt.prefix, 
       getOption("prompt")), continue.echo = paste0(prompt.prefix, 
       getOption("continue")), verbose = verbose, max.deparse.length = Inf, 
       encoding = "UTF-8", skip.echo = skips, keep.source = TRUE)
1: example("coverageToExon", "derfinder")
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] derfinder_0.0.53

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.25.18     BatchJobs_1.2             BBmisc_1.5                Biobase_2.23.6           
 [5] BiocGenerics_0.9.3        BiocParallel_0.5.20       biomaRt_2.19.3            Biostrings_2.31.22       
 [9] biovizBase_1.11.15        bitops_1.0-6              brew_1.0-6                BSgenome_1.31.13         
[13] bumphunter_1.3.8          cluster_1.15.2            codetools_0.2-8           colorspace_1.2-4         
[17] DBI_0.2-7                 dichromat_2.0-0           digest_0.6.4              doRNG_1.5.5              
[21] fail_1.2                  foreach_1.4.1             Formula_1.1-1             GenomeInfoDb_0.99.31     
[25] GenomicAlignments_0.99.38 GenomicFeatures_1.15.15   GenomicRanges_1.15.44     ggbio_1.11.16            
[29] ggplot2_0.9.3.1           grid_3.1.0                gridExtra_0.9.1           gtable_0.1.2             
[33] Hmisc_3.14-3              IRanges_1.21.45           iterators_1.0.6           labeling_0.2             
[37] lattice_0.20-29           latticeExtra_0.6-26       locfit_1.5-9.1            MASS_7.3-31              
[41] matrixStats_0.8.14        munsell_0.4.2             parallel_3.1.0            pkgmaker_0.20            
[45] plyr_1.8.1                proto_0.3-10              qvalue_1.37.3             R.methodsS3_1.6.1        
[49] RColorBrewer_1.0-5        Rcpp_0.11.1               RCurl_1.95-4.1            registry_0.2             
[53] reshape2_1.2.2            rngtools_1.2.4            Rsamtools_1.15.41         RSQLite_0.11.4           
[57] rtracklayer_1.23.22       scales_0.2.3              sendmailR_1.1-2           splines_3.1.0            
[61] stats4_3.1.0              stringr_0.6.2             survival_2.37-7           tcltk_3.1.0              
[65] tools_3.1.0               VariantAnnotation_1.9.51  XML_3.98-1.1              xtable_1.7-3             
[69] XVector_0.3.7             zlibbioc_1.9.0           
>

bumphunter as import

hi,

as you have the code in analyzeChr():

 if(!is.null(regions$regions) & runAnnotation) {
                library("bumphunter")
                annotation <- annotateNearest(regions$regions, subject)
        } else {
                annotation <- NULL
        }

I would suggest adding bumphunter to the Import field in DESCRIPTION

best,

Mike

Need help with makeModels() testvar error

I'm trying to run the F-statistic version of derfinder. I follow the example in the docs, with only difference that my fullcov equivalent has five chromosomes instead of just one, and only two samples instead of a dozen. I've reached a dead end at attempting to make the models.

cv <- c("fpa7", "wt") 
sampleDepths <- sampleDepth(collapseFullCoverage(allcov), 1)
sampleDepths

## fpa7-1.100%  wt2-1.100% 
##   24.53941    24.42730 

demodels <- makeModels(sampleDepths, testvars=cv, adjustvars=NULL)

Error in makeModels(sampleDepths, testvars = cv, adjustvars = ifelse(length(cf) ==  : 
  The length of 'testvars' and the number of sample library size adjustments do not match.

The length of my testvars is equal to the number of columns I got out of sampleDepth(), and equal to the number of columns in each of the chromosome coverage tables in allcov. From what I understand from the user guide example, testvar is the category value of each sample. Two samples, two values.
The function documentation makes reference to columns in a coverageInfo$coverage but there is no such variable in my session or in the user manual example, and there is no indication where to find it.
So from the available documentation and examples I cannot figure out what I'm doing wrong.

filterData(cutoff = 0) fails in v0.0.69

Taken from derfinder?fullCoverage

library('derfinder')

datadir <- system.file('extdata', 'genomeData', package='derfinder')
dirs <- makeBamList(datadir=datadir, samplepatt='*accepted_hits.bam$',
    bamterm=NULL)
## Shorten the column names
names(dirs) <- gsub('_accepted_hits.bam', '', names(dirs))

## Read and filter the data
fullCov2 <- fullCoverage(dirs=dirs, chrs=c(1:22, 'X', 'Y'), cutoff=0)

This leads to:

Error: 19 errors; first error:
  Error in Reduce("|", RleList(data) > cutoff): error in evaluating the argument 'x' in selecting a method for function 'Reduce': Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE = "S4Vectors") : 
  integer overflow while summing elements in 'lengths'

For more information, use bplasterror(). To resume calculation, re-call the function and set the argument 'BPRESUME' to TRUE or wrap the previous call in bpresume().

First traceback:
  18: fullCoverage(dirs = dirs, chrs = c(1:22, "X", "Y"), cutoff = 0)
  17: bplapply(seq_len(length(chrs)), loadChr, dirs = dirs, chrs = chrs, 
          bai = bai, chrlens = chrlens, outputs = outputs, verbose = verbose, 
          inputType = inputType, isMinusStrand = isMinusStrand, cutoff = cutoff, 
          filter = filter, returnMean = returnMean, returnCoverage = returnCoverage, 
          totalMapped = totalMapped, targetSize = targetSize, tilewidth = tilewidth, 
          mc.cores.load = mc.cor

Will revert the changes from f42c196

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] IRanges_1.99.25     S4Vectors_0.1.3     BiocGenerics_0.11.4 derfinder_0.0.69   

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.27.9     BatchJobs_1.3            BBmisc_1.7               Biobase_2.25.0           BiocParallel_0.99.11     biomaRt_2.21.1           Biostrings_2.33.13      
 [8] bitops_1.0-6             brew_1.0-6               bumphunter_1.5.5         checkmate_1.3            cluster_1.15.2           codetools_0.2-9          DBI_0.2-7               
[15] derfinderHelper_0.0.4    digest_0.6.4             doRNG_1.5.5              fail_1.2                 foreach_1.4.2            Formula_1.1-2            GenomeInfoDb_1.1.18     
[22] GenomicAlignments_1.1.26 GenomicFeatures_1.17.14  GenomicFiles_1.1.19      GenomicRanges_1.17.36    grid_3.1.1               Hmisc_3.14-4             iterators_1.0.7         
[29] lattice_0.20-29          latticeExtra_0.6-26      locfit_1.5-9.1           Matrix_1.1-4             matrixStats_0.10.0       pkgmaker_0.22            qvalue_1.39.1           
[36] R.methodsS3_1.6.1        RColorBrewer_1.0-5       Rcpp_0.11.2              RCurl_1.95-4.3           registry_0.2             rngtools_1.2.4           Rsamtools_1.17.33       
[43] RSQLite_0.11.4           rtracklayer_1.25.13      sendmailR_1.1-2          splines_3.1.1            stats4_3.1.1             stringr_0.6.2            survival_2.37-7         
[50] tools_3.1.1              XML_3.98-1.1             xtable_1.7-3             XVector_0.5.7            zlibbioc_1.11.1

filterData not providing feedback

UPDATE: The problem was when I had only one column. I got feedback correctly when I added my other datasets and tried filterData. END UPDATE

The example at here shows that there should be a number shown in now there are rows after are, before rows, such as ".., now there are 2256 rows." I am not seeing that. I see:

> filteredCov <- lapply(fullCov, filterData, cutoff = 10) 2017-05-02 18:35:37 filterData: originally there were 230218 rows, now there are rows.

I see that something happens when I run filterData as shown below. Maybe because I ran with only 1 column, the feedback mechanism is flawed? I just first obtained and used this from Bioconductor yesterday, so it should be a very recent version of derfinder.
What I see ( I removed spaces to get it to format as a code block):

filteredCov <- lapply(fullCov, filterData, cutoff = 1)
2017-05-02 18:34:36 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 0.01)
2017-05-02 18:34:53 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 10)
2017-05-02 18:35:37 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 100)
2017-05-02 18:35:41 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 10)
2017-05-02 18:37:37 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 2)
2017-05-02 18:38:06 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 1002)
2017-05-02 18:39:16 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 212193 with 121518 runs
Lengths: 1 7 ... 19
Values : 1067.69979332383 1220.22833522723 ... 1067.69979332383
$I$position
logical-Rle of length 230218 with 431 runs
Lengths: 1796 218 78 6 204 ... 7 5 98 6 1182
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE`

How Fstats are generated in analyzeChr()

Hi,

I am using derfinder to compare two groups of samples (31 vs 17) and I want to optimize (and understand) my statistics. However, it is not clear to me how this is built in analyzeChr().

As it is based on F-modelling, is cutoffFstat the alpha or the Fstat threshold itself?

In the example I also see that when you switch from theoretical to empirical the cutoffFstat switches from 1-08 to 0.99, and I don't understand why.

Additionally, I would like to know which statistical method is used to calculate FWER, as I cannot find it anywhere.

Could you clarify all these aspects to me, please?

Thank you in advance.

Mónica

Question about the need for species and chromosome style

Does derfinder work exclusively with what is available in GenomeInfoDb?
I am trying to run analyzeChr() using a TxDb object that I built from a custom annotation GTF on which the alignments are based as well. But analyzeChr() fails unless the species and style are available in GenomeInfoDb.
There should be no need to know the species or the chromosome style or to query GenomeInfoDb about anything when the appropriate TxDb is provided directly. Am I misunderstanding what values the txdb argument takes? Can I not use a custom TxDb?

This is the stack trace without a recognised species (defaulting to human):

extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2017-08-17 14:01:36 analyzeChr: Pre-processing the coverage data
 Hide Traceback
 
 Rerun with Debug
 Error: length(intersect(names(coverageInfo), c("coverage", "position"))) ==  .... is not TRUE 
9.
stop(msg, call. = FALSE, domain = NA) 
8.
stopifnot(length(intersect(names(coverageInfo), c("coverage", 
    "position"))) == 2) 
7.
preprocessCoverage(coverageInfo = coverageInfo, groupInfo = groupInfo, 
    cutoff = cutoffPre, lowMemDir = lowMemDir, ...) 
6.
(function (chr, coverageInfo, models, cutoffPre = 5, cutoffFstat = 1e-08, 
    cutoffType = "theoretical", nPermute = 1, seeds = as.integer(gsub("-", 
        "", Sys.Date())) + seq_len(nPermute), groupInfo, txdb = NULL, 
    writeOutput = TRUE, runAnnotation = TRUE, lowMemDir = file.path(chr,  ... 
5.
do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
    txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
    returnOutput = TRUE, verbose = verbose), analyzeArgs)) 
4.
do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
    txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
    returnOutput = TRUE, verbose = verbose), analyzeArgs)) 
3.
FUN(X[[i]], ...) 
2.
lapply(X = X, FUN = FUN, ...) 
1.
mclapply(chrs, function(x) {
    do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
        txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
        returnOutput = TRUE, verbose = verbose), analyzeArgs)) ...

This is the stack trace with a recognised species but not a recognised annotation:

extendedMapSeqlevels: the 'style' UCSC is currently not supported for the 'species' arabidopsis_thaliana in GenomeInfoDb. Check valid naming styles by running GenomeInfoDb::genomeStyles(species). If it's not present, consider adding your genome by following the information at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomeInfoDb/inst/doc/Accept-organism-for-GenomeInfoDb.pdf
2017-08-17 14:03:58 analyzeChr: Pre-processing the coverage data
 Hide Traceback
 
 Rerun with Debug
 Error: length(intersect(names(coverageInfo), c("coverage", "position"))) ==  .... is not TRUE 
9.
stop(msg, call. = FALSE, domain = NA) 
8.
stopifnot(length(intersect(names(coverageInfo), c("coverage", 
    "position"))) == 2) 
7.
preprocessCoverage(coverageInfo = coverageInfo, groupInfo = groupInfo, 
    cutoff = cutoffPre, lowMemDir = lowMemDir, ...) 
6.
(function (chr, coverageInfo, models, cutoffPre = 5, cutoffFstat = 1e-08, 
    cutoffType = "theoretical", nPermute = 1, seeds = as.integer(gsub("-", 
        "", Sys.Date())) + seq_len(nPermute), groupInfo, txdb = NULL, 
    writeOutput = TRUE, runAnnotation = TRUE, lowMemDir = file.path(chr,  ... 
5.
do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
    txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
    returnOutput = TRUE, verbose = verbose, species = experiment$organism), 
    analyzeArgs)) 
4.
do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
    txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
    returnOutput = TRUE, verbose = verbose, species = experiment$organism), 
    analyzeArgs)) 
3.
FUN(X[[i]], ...) 
2.
lapply(X = X, FUN = FUN, ...) 
1.
mclapply(chrs, function(x) {
    do.call(analyzeChr, c(list(chr = x, coverageInfo = fwdCov[[x]], 
        txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
        returnOutput = TRUE, verbose = verbose, species = experiment$organism),  ...

This is my session Info:

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicFeatures_1.28.4     AnnotationDbi_1.38.2       Rsamtools_1.28.0           Biostrings_2.44.2          XVector_0.16.0             treat_0.0.1                derfinder_1.10.5           DESeq2_1.16.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
[13] GenomicRanges_1.28.4       GenomeInfoDb_1.12.2        IRanges_2.10.2             S4Vectors_0.14.3           BiocGenerics_0.22.0        data.table_1.10.4         

loaded via a namespace (and not attached):
 [1] bit64_0.9-7              splines_3.4.1            foreach_1.4.3            GenomicFiles_1.12.0      Formula_1.2-2            bumphunter_1.16.0        latticeExtra_0.6-28      doRNG_1.6.6              blob_1.1.0               BSgenome_1.44.0          GenomeInfoDbData_0.99.0  RSQLite_2.0              backports_1.1.0         
[14] lattice_0.20-35          digest_0.6.12            RColorBrewer_1.1-2       checkmate_1.8.3          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6          Matrix_1.2-11            plyr_1.8.4               pkgconfig_2.0.1          XML_3.98-1.9             biomaRt_2.32.1           genefilter_1.58.1       
[27] zlibbioc_1.22.0          xtable_1.8-2             snow_0.4-2               scales_0.4.1             BiocParallel_1.10.1      annotate_1.54.0          tibble_1.3.3             htmlTable_1.9            pkgmaker_0.22            ggplot2_2.2.1            nnet_7.3-12              lazyeval_0.2.0           crayon_1.3.2            
[40] survival_2.41-3          magrittr_1.5             memoise_1.1.0            foreign_0.8-69           tools_3.4.1              registry_0.3             stringr_1.2.0            munsell_0.4.3            locfit_1.5-9.1           cluster_2.0.6            rngtools_1.2.4           compiler_3.4.1           rlang_0.1.2             
[53] grid_3.4.1               RCurl_1.95-4.8           iterators_1.0.8          VariantAnnotation_1.22.3 htmlwidgets_0.9          bitops_1.0-6             base64enc_0.1-3          testthat_1.0.2           derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15         DBI_0.7                  R6_2.2.2                
[66] reshape2_1.4.2           GenomicAlignments_1.12.1 gridExtra_2.2.1          knitr_1.17               rtracklayer_1.36.4       bit_1.1-12               Hmisc_4.0-3              stringi_1.1.5            Rcpp_0.12.12             geneplotter_1.54.0       rpart_4.1-11             acepack_1.4.1

Thanks!

rename "foldChange" => "log2FoldChange" for the column in the output regions

for example, limma and edgeR give a column "logFC", DESeq gives "log2FoldChange"

                ## Calculate fold coverage vs group 1
                if(length(regionGroupMean) > 1){
                        foldChange <- vector("list", length(regionGroupMean) - 1)
                        names(foldChange) <- names(regionGroupMean)[-1]
                        for(group in names(foldChange)) {
                                foldChange[[group]] <- log2(regionGroupMean[[group]] / regionGroupMean[[1]])
                        }
                }

plotCluster() broken with ggbio 1.12.0

For more details, check lawremi/ggbio#53

Can latest `derfinder` reproduce original (2013) analysis?

Can the analysis in the paper Differential expression analysis of RNA-seq data at single-base resolution be reproduced using the latest version of derfinder? I ask because I am having a difficult time doing so using the "beta" version of the code and analysis_code.R. Thanks for any help you can provide.

Speed up p-value calculation

It would be useful to speed up the p-value calculation step.

One quick solution would be to round the areas a bit, then find the p-values for the unique areas and then assign them to each candidate DER.

> length(fullRegions$area)
[1] 689543
## Not many unique areas
> length(unique(fullRegions$area))
[1] 689535
## By rounding to 2 digits the areas greatly simplify
> length(unique(round(fullRegions$area, 2)))
[1] 151811

handle strand-specific alignments?

is it possible to handle strand-specific alignments? (i.e. get counts and run the pipeline for each strand separately?)