yulab-smu / chipseeker Goto Github PK

:dart: ChIP peak Annotation, Comparison and Visualization

Home Page: https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585

R 95.14% Makefile 0.45% TeX 4.41%

annotation atac-seq chip-seq comparison epigenetics epigenomics visualization

chipseeker's Introduction

ChIPseeker: ChIP peak Annotation, Comparison, and Visualization

This package implements functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statstical methods for estimate the significance of overlap among ChIP peak data sets, and incorporate GEO database for user to compare the own dataset with those deposited in database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.

✍️ Authors

Guangchuang YU

School of Basic Medical Sciences, Southern Medical University

https://yulab-smu.top

If you use ChIPseeker in published research, please cite:

Q Wang^#, M Li^#, T Wu, L Zhan, L Li, M Chen, W Xie, Z Xie, E Hu, S Xu, G Yu^*. Exploring epigenomic datasets by ChIPseeker. Current Protocols, 2022, 2(10): e585.
G Yu^*, LG Wang, QY He^*. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparision and visualization. Bioinformatics. 2015, 31(14):2382-2383.

⏬ Installation

Get the released version from Bioconductor:

## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ChIPseeker")

Or the development version from github:

## install.packages("devtools")
devtools::install_github("YuLab-SMU/ChIPseeker")

Contributing

We welcome any contributions! By participating in this project you agree to abide by the terms outlined in the Contributor Code of Conduct.

chipseeker's People

Contributors

Stargazers

Watchers

Forkers

puriney maggiecam bioxiao ajjeon hjanime xtmgah klugem mdozmorov februaryfang changebio dayedepps huwenhuo y461650833y xiatingp woodhaha inambioinfo aiminy zefeng-wu wkl1990 souzadevinicius swebb1 tomas2mike wangdi2014 thomasdcy theocest dimitradimou liuyang0681 daniel-wells gravitytrope hmu-chaowang alexyfyf zorrodong venkoyoung acgtcoder xuzhougeng javrodriguez cblume kathrynrozengagnon asifbio huerqiang sarahmklee jorainer garyhuang1 1010stone morgenlefay yufanhui quanrd thesus bigtigr hnthirima mingli-929 ammawla memo17 sinaghadermarzi tsjzz jakelehle restlesstail ruitulyu treywea panktigosar lysaily marconotaro wook2014 josephlaic tzhu-bio computationalregulatorygenomicsicl bug1303 clearmind777 mowonhoo dingtiany wassimsalam01 minsmind yenchie

chipseeker's Issues

ChIPSeeker asks to contact the developer when annotating sacCer peaks

Hi Guangchuang,
ChIPSeeker asks to contact the developer when trying to annotate sacCer peaks:

R> ann <- lapply(peakFiles, annotatePeak, TxDb=txdb,
+               genomicAnnotationPriority=c("Promoter", "5UTR", "3UTR", "Exon",
+                                           "Intron", "Downstream", "Intergenic"),
+               annoDb="org.Sc.sgd.db", tssRegion=c(-500, 500), verbose=FALSE)
Loading required package: org.Sc.sgd.db

geneID type is not supported...	Please report it to developer...

The message is clear enough, but let me know if you need a minimal working example.

This is how the first 3 lines of the peaks file look like (tab separated, commented header output from MACS2 removed):

chr     start   end     length  abs_summit      pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
I       24      516     493     88      54.00   14.87746        3.45200 10.90845        scc2-4_peak_1
I       693     800     108     741     30.00   3.73770 1.94567 1.89969 scc2-4_peak_2
I       1507    1542    36      1520    29.00   3.40934 1.88291 1.64440 scc2-4_peak_3

And this is how the annotated data.frame looks like after calling annotatePeak:

R> head(as.data.frame(ann[[1]]), n=3)
  seqnames start  end width strand length abs_summit pileup X.log10.pvalue. fold_enrichment X.log10.qvalue.          name
1        I    25  516   492      *    493         88     54        14.87746         3.45200        10.90845 scc2-4_peak_1
2        I   694  800   107      *    108        741     30         3.73770         1.94567         1.89969 scc2-4_peak_2
3        I  1508 1542    35      *     36       1520     29         3.40934         1.88291         1.64440 scc2-4_peak_3
         annotation geneChr geneStart geneEnd geneLength geneStrand  geneId transcriptId distanceToTSS
1          Promoter       1       335     649        315          1 YAL069W      YAL069W             0
2          Promoter       1       538     792        255          1 YAL069W    YAL068W-A           156
3 Downstream (<1kb)       1      1807    2169        363          2 YAL068C      YAL068C           627

cheers,
Sergi

Error in downloading GEO data with the function "downloadGEObedFiles"

Hello,there
I am getting an annoying error while trying to download GEO data with chipseeker. I tried something like below:

require(ChIPseeker)
getGEOgenomeVersion()
hg19 <- getGEOInfo(genome="hg19", simplify=TRUE)
downloadGEObedFiles(genome="hg19", destDir="hg19")
gsm <- hg19$gsm[sample(nrow(hg19), 10)]
downloadGSMbedFiles(gsm, destDir="hg19")

and it gives me the following error:
Error in download.file(fnames[i], destfile = destfiles[i], mode = "wb") :
cannot open destfile 'hg19/GSM521889_UCSD.IMR90.H3K27me3.SK05.bed.gz', reason 'No such file or directory'

or:
Error in download.file(fnames[i], destfile = destfiles[i], mode = "wb") :
cannot open destfile 'hg19/GSM521933_UCSD.IMR90.Input.SK-I3.bed.gz', reason 'No such file or directory'

Can you please help me what is wrong here?

peak annotation in intron

Hi,
My peakfile.bed is like this:
chr1 110947536 110947848
chr1 124014307 124014709
chr1 160615139 160615537
chr1 165398758 165399111
chr1 176253795 176254107
chr1 179770058 179770376
chr6 39615514 39616024
chr8 65086609 65087047
chr9 85714003 85714422

annotated with ChIPseeker.
library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
peakfile = "peakfile.bed"
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
peakAnno = annotatePeak(peakfile,tssRegion = c(-3000,3000),TxDb = txdb,annoDb = "org.Mm.eg.db")

Within the annotation column ,after searching the peak location in UCSC, I find the No. of intron seems reversed, while exon not.

Limit distance for enrichAnnoOverlap

enrichAnnoOverlap uses the common nearest gene of two peaks as overlap measure, but the distance of the two peaks and/or the the distance to the TSS of the common nearest gene can be large e.g. several 10kb. Therefore it would be great to be able to specify a maximum distance between pairs of peaks and optionally a maximum distance of pairs of peaks and the TSS.

Colors of plotAvgProf

Is there any way I can give colors and linetype as an input to this function. The automated colors makes it very hard to distinguish different plots

reversed order of colors on AnnoBar and DistToTSS

Hi,

I seem to be having an issue with those two plots. I tried looking at your code and coming up with an explanation but it was fruitless. When I generate the plots they have a different order:

for AnnoBar: at yours first is Promoter regions, then UTR and so on, and on mine, we start with Distal Intergenic
and DistToTSS: at yours - 0-1kb, 1-3kb etc. while on mine >100kb, 10-100kb).

The colors and labels match the values listed when accessing Genomic Summary etc. but especially DistToTSS doesn't make sense with the TSS in the middle and the order reversed.

examples (I tried on different data and different Rs)
https://drive.google.com/file/d/0B21lLuavNb9vY3h4Zm1UQ1pCZVE/view?usp=sharing
https://drive.google.com/file/d/0B21lLuavNb9vUWpYYjREU0FoM2M/view?usp=sharing

I can provide you with full sessionInfo if that will help, but to make it easier to read I only put section of it.
R version 3.3.1 (2016-06-21)
ggplot2_2.2.0, ggplot2_2.2.0, plyr_1.8.4, magrittr_1.5

I would be grateful for any suggestions.

Best,
Kasia

How to redefine the categories in peakAnnotate

Dear Guangchuang,

I am using your ChIPseeker. It is very useful.
I would like to combine some of the annotation
categories though without calculating the sums
and recreating the graphs on my own.

For example I would like to use annotatePeak in
the following way:

Combine the Promoter 2-3kb & Promoter 1-2kb into Upstream <=3kb (to match the downstream counterpart)
Combine 1st Exon & Other Exon & 1st Intron & Other Intron > Coding region

Is there a way to do this and keep the percentages?

https://bioconductor.org/packages/release/bioc/vignettes/ChIPseeker/inst/doc/ChIPseeker.html#average-profiles

Essentially replicate your pie-chart or your stacked barplot but
with these newly defined categories?

Thanks.

Harris

Error message in getPromoters

When executing the script below in a Linux server, I got an error message and the run is aborted. I ran it interactively, and realized that the problem is when I try to execute the getPromoters function. The message is

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'metadata' for signature '"character"'

I used to have the same code executed in my Mac successfully. Could you please help? Thanks

The code is

 #!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)

 ## Load libraries
.libPaths( c( .libPaths(), "/home/rinaldo/R/x86_64-pc-linux-gnu-library/3.3", "<_shared_library_>") )
library(ChIPseeker)

if (args[1] == 'mm9') {
    require(TxDb.Mmusculus.UCSC.mm9.knownGene)
    txdb <- "TxDb.Mmusculus.UCSC.mm9.knownGene"
} else if (args[1] == 'mm10') {
    require(TxDb.Hsapiens.UCSC.mm10.knownGene)
    txdb <- "TxDb.Hsapiens.UCSC.mm10.knownGene"
} else if (args[1] == 'hg38') {
    require(TxDb.Hsapiens.UCSC.hg38.knownGene)
    txdb <- "TxDb.Hsapiens.UCSC.hg38.knownGene"
} else if (args[1] == 'hg19') {
    require(TxDb.Hsapiens.UCSC.hg19.knownGene)
    txdb <- "TxDb.Hsapiens.UCSC.hg19.knownGene"
}

organism.dir <- paste0("<path>", args[1])
setwd(args[3]);

 ## ChIP profiling
peakfile <- args[2];
peak <- readPeakFile(peakfile, header = FALSE)

 # ChIP peaks coverage plot
pdf('plot.cov.pdf', width = 10, height = 10)
    covplot(peak, weightCol="V5")
dev.off()

 # Profile of ChIP peaks binding to TSS regions
promoter <- getPromoters(TxDb = txdb, upstream = 3000, downstream = 3000, by = "gene")

pdf('tagMatrix.pdf', width = 8, height = 10)
    tagMatrix <- getTagMatrix(peak, windows = promoter)
dev.off()

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.22.3                   
 [2] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
 [3] GenomicFeatures_1.24.5                 
 [4] AnnotationDbi_1.34.4                   
 [5] Biobase_2.32.0                         
 [6] ReactomePA_1.16.2                      
 [7] clusterProfiler_3.0.4                  
 [8] DOSE_2.10.7                            
 [9] devtools_1.12.0                        
[10] ChIPpeakAnno_3.6.5                     
[11] VennDiagram_1.6.17                     
[12] futile.logger_1.4.3                    
[13] GenomicRanges_1.24.2                   
[14] GenomeInfoDb_1.8.3                     
[15] Biostrings_2.40.2                      
[16] XVector_0.12.1                         
[17] IRanges_2.6.1                          
[18] S4Vectors_0.10.2                       
[19] BiocGenerics_0.18.0                    
[20] ChIPseeker_1.8.9                       

loaded via a namespace (and not attached):
 [1] matrixStats_0.50.2                     
 [2] bitops_1.0-6                           
 [3] RColorBrewer_1.1-2                     
 [4] httr_1.2.1                             
 [5] UpSetR_1.2.3                           
 [6] tools_3.3.1                            
 [7] R6_2.1.2                               
 [8] KernSmooth_2.23-15                     
 [9] lazyeval_0.2.0                         
[10] DBI_0.4-1                              
[11] colorspace_1.2-6                       
[12] ade4_1.7-4                             
[13] withr_1.0.2                            
[14] graphite_1.18.0                        
[15] gridExtra_2.2.1                        
[16] graph_1.50.0                           
[17] SparseM_1.7                            
[18] labeling_0.3                           
[19] topGO_2.24.0                           
[20] rtracklayer_1.32.2                     
[21] caTools_1.17.1                         
[22] scales_0.4.0                           
[23] RBGL_1.48.1                            
[24] rappdirs_0.3.1                         
[25] stringr_1.0.0                          
[26] digest_0.6.10                          
[27] Rsamtools_1.24.0                       
[28] htmltools_0.3.5                        
[29] plotrix_3.6-3                          
[30] ensembldb_1.4.7                        
[31] limma_3.28.17                          
[32] BSgenome_1.40.1                        
[33] regioneR_1.4.2                         
[34] RSQLite_1.0.0                          
[35] shiny_0.13.2                           
[36] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[37] BiocParallel_1.6.4                     
[38] gtools_3.5.0                           
[39] GOSemSim_1.30.3                        
[40] dplyr_0.5.0                            
[41] RCurl_1.95-4.8                         
[42] magrittr_1.5                           
[43] GO.db_3.3.0                            
[44] Matrix_1.2-6                           
[45] Rcpp_0.12.6                            
[46] munsell_0.4.3                          
[47] stringi_1.1.1                          
[48] MASS_7.3-45                            
[49] SummarizedExperiment_1.2.3             
[50] zlibbioc_1.18.0                        
[51] gplots_3.0.1                           
[52] plyr_1.8.4                             
[53] qvalue_2.4.2                           
[54] AnnotationHub_2.4.2                    
[55] gdata_2.17.0                           
[56] DO.db_2.9                              
[57] lattice_0.20-33                        
[58] splines_3.3.1                          
[59] annotate_1.50.0                        
[60] multtest_2.28.0                        
[61] igraph_1.0.1                           
[62] boot_1.3-18                            
[63] seqinr_3.3-0                           
[64] reshape2_1.4.1                         
[65] biomaRt_2.28.0                         
[66] futile.options_1.0.0                   
[67] XML_3.98-1.4                           
[68] lambda.r_1.1.9                         
[69] idr_1.2                                
[70] httpuv_1.3.3                           
[71] tidyr_0.5.1                            
[72] gtable_0.2.0                           
[73] assertthat_0.1                         
[74] ggplot2_2.1.0                          
[75] gridBase_0.4-7                         
[76] mime_0.5                               
[77] xtable_1.8-2                           
[78] reactome.db_1.55.0                     
[79] survival_2.39-5                        
[80] tibble_1.1                             
[81] rvcheck_0.0.2                          
[82] GenomicAlignments_1.8.4                
[83] memoise_1.0.0                          
[84] interactiveDisplayBase_1.10.3          
[85] GSEABase_1.34.0

update to a newer version

hello ,GuangchuangYu
My ChIPseeker version is 1.5.11 now. I just want to update the version to 1.7.15 ,because my R environment can't support the lastest version(1.9.0). So, How can I install it on linux from the number 142 commit(https://codeload.github.com/Bioconductor-mirror/ChIPseeker/zip/f6ed99c7c95157a5bad32b0336078ec587b226bd) ?

question about annotation

hello GuangchuangYu,
ChIPseeker was really help me a lot.
Now, I have a question about the annotation result.I print the result after annotatePeak,the detail information are as follows:

seqnames start end width strand annotation geneChr geneStart geneEnd geneLength geneStrand geneId transcriptId distanceToTSS
1 3178055 3179048 994 * Intron (ENST00000511072/ENSG00000142611, intron 3 of 15) 1 3160696 3303482 142787 + ENSG00000142611 ENST00000463591 18352
1 7019270 7020286 1017 * Intron (ENST00000303635/ENSG00000171735, intron 3 of 22) 1 7068436 7074339 5904 - ENSG00000237365 ENST00000456701 55069
1 7507693 7508565 873 * Intron (ENST00000303635/ENSG00000171735, intron 5 of 22) 1 7501156 7501614 459 - ENSG00000225126 ENST00000440032 -6951
1 10010122 10011152 1031 * Intron (ENST00000403197/ENSG00000173614, intron 1 of 4) 1 10007376 10007694 319 ENSG00000202415 ENST00000365545 3776
1 145555716 145558806 3091 * Exon (ENST00000355594/ENSG00000198483, exon 2 of 14) 1 145549244 145561166 11923 + ENSG00000198483 ENST00000544626 9562

In the above results we can find that the 'annotation' column not only given the location information, but also provides a specific explanation.But the geneId and transcriptId from the ‘annotation’ column are not consistent with the following column named 'geneId' and 'transcriptId'.At this situation,how can I understand the meaning of 'annotation' column,especially the description of the brackets.

thanks for your help.

Covplot supporting grangeslist

Hello I was trying to test the code in the link below and this is the error I get:
http://guangchuangyu.github.io/2016/02/covplot-supports-grangeslist/

library(ChIPseeker)
require(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb<-TxDb.Hsapiens.UCSC.hg38.knownGene
files <- getSampleFiles()
peak=GRangesList(CBX6= readPeakFile(files[[4]]), CBX7= readPeakFile(files[[5]]))
p <- covplot(peak)
Error in file.exists(peak) : invalid 'file' argument

I also tested it on my own data but it fails with the same error. Do you know what the issue might be?

Error in annotatePeak with custom annotation

Hi,

I am getting an strange error while trying to annotate genomic sites (chr, start, end as the input) with chipseeker. I tried something like below:

library(biomaRt)
annot.txdb = makeTxDbFromBiomart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")

y <- with(genomic.sites, GRanges(seqnames = chr, ranges = IRanges(pos, pos)))
library("ChIPseeker")
library("org.Hs.eg.db")
ann = annotatePeak(y, TxDb = annot.txdb, annoDb = "org.Hs.eg.db", assignGenomicAnnotation = T, genomicAnnotationPriority = c("5UTR", "3UTR", "Exon", "Intron", "Intergenic"), addFlankGeneInfo = FALSE)

and it gives me the following error:

Error in [<-.data.frame(*tmp*, -genicIndex, "Intergenic", value = TRUE) :
replacement has 1 row, data has 0
In addition: Warning messages:
1: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
2: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
3: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
4: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
5: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)

Can you please help me what is wrong here?

Add option to specifiy the center label in plotTagMatrix

...here is a patch for it

--- plotTagMatrix.R     2015-09-21 05:19:40.000000000 +0200
+++ plotTagMatrix.R.new 2015-09-19 19:24:28.000000000 +0200
@@ -16,17 +16,18 @@
 plotAvgProf <- function(tagMatrix, xlim,
                         xlab="Genomic Region (5'->3')",
                         ylab = "Read Count Frequency",
+                        xCenterLab = "TSS",
                         conf,
                         facet="none", free_y = TRUE, ...) {
     conf <- ifelse(missingArg(conf), NA, conf)
     if (!(missingArg(conf) || is.na(conf))){
         p <- plotAvgProf.internal(tagMatrix, conf = conf, xlim = xlim, 
-                                  xlab = xlab, ylab = ylab,
+                                  xlab = xlab, ylab = ylab, xCenterLab,
                                   facet = facet, free_y = free_y
                                   )
     } else {
         p <- plotAvgProf.internal(tagMatrix, xlim = xlim,
-                                  xlab = xlab, ylab = ylab,
+                                  xlab = xlab, ylab = ylab, xCenterLab,
                                   facet = facet, free_y = free_y) 
     }
     return(p)
@@ -55,6 +56,7 @@
                          upstream = 1000, downstream = 1000,
                          xlab = "Genomic Region (5'->3')",
                          ylab = "Read Count Frequency",
+                         xCenterLab = "TSS",
                          conf,
                          facet = "none",
                          free_y = TRUE,
@@ -87,12 +89,12 @@                                                                                                                                                                                                                     
     if (!(missingArg(conf) || is.na(conf))){                                                                                                                                                                                           
         p <- plotAvgProf.internal(tagMatrix,                                                                                                                                                                                           
                                   xlim = c(-upstream, downstream),                                                                                                                                                                     
-                                  xlab = xlab, ylab = ylab, conf = conf,                                                                                                                                                               
+                                  xlab = xlab, ylab = ylab, xCenterLab, conf = conf,                                                                                                                                                   
                                   facet = facet, free_y = free_y)                                                                                                                                                                      
     } else {                                                                                                                                                                                                                           
         p <- plotAvgProf.internal(tagMatrix,                                                                                                                                                                                           
                                   xlim=c(-upstream, downstream),                                                                                                                                                                       
-                                  xlab=xlab, ylab=ylab,                                                                                                                                                                                
+                                  xlab=xlab, ylab=ylab, xCenterLab,                                                                                                                                                                    
                                   facet = facet, free_y = free_y)                                                                                                                                                                      
     }                                                                                                                                                                                                                                  
     return(p)                                                                                                                                                                                                                          
@@ -254,6 +256,7 @@                                                                                                                                                                                                                     
                                  xlim = c(-3000,3000),                                                                                                                                                                                 
                                  xlab = "Genomic Region (5'->3')",                                                                                                                                                                     
                                  ylab = "Read Count Frequency",                                                                                                                                                                        
+                                 xCenterLab = "TSS",                                                                                                                                                                                   
                                  facet="none", free_y = TRUE, ...) {                                                                                                                                                                   
                                                                                                                                                                                                                                        
     listFlag <- FALSE                                                                                                                                                                                                                  
@@ -305,7 +308,7 @@                                                                                                                                                                                                                     
                                        0,                                                                                                                                                                                              
                                        floor(xlim[2]/2), xlim[2]),                                                                                                                                                                     
                                    labels=c(xlim[1], floor(xlim[1]/2),                                                                                                                                                                 
-                                       "TSS",                                                                                                                                                                                          
+                                       xCenterLab,                                                                                                                                                                                     
                                        floor(xlim[2]/2), xlim[2]))                                                                                                                                                                     
     }

CI via Bootstrapping supports multiple inputs as tagMatrixList

Enable getTagCount function to use conf (confidence interval) and estimate CI via bootstrapping.
Enable plotAvgProf function to plot tagMatrixList
Enable plotAvgProf2 function to plot tagMatrixList

Error with annotatePeak and level="gene"

Hi Guangchuang,

Thanks for this nice package!

I am analysing ChIP-seq data with it, but I have encountered some problems with the annotatePeak function. In particular, when I run the annotatePeak function I get an error only when level="gene". Why does it happen? Am I doing something wrong?

Thanks for your help.

Kind regards,
Bianca

> peakAnnoList <- lapply(fil, annotatePeak, TxDb=txdb, level="transcript", assignGenomicAnnotation = TRUE,
+ genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron",
+ "Downstream", "Intergenic"), tssRegion=c(3000, 3000), verbose=T,
+ annoDb="org.Hs.eg.db", addFlankGeneInfo=T, flankDistance=5000)
>> loading peak file...              2015-11-18 16:27:30 
>> preparing features information...         2015-11-18 16:27:33 
>> identifying nearest features...       2015-11-18 16:27:33 
>> calculating distance from peak to TSS...  2015-11-18 16:27:36 
>> assigning genomic annotation...       2015-11-18 16:27:36 
>> adding gene annotation...             2015-11-18 16:27:40 
>> adding flank feature information from peaks...    2015-11-18 16:28:59 
>> assigning chromosome lengths          2015-11-18 16:29:12 
>> done...                   2015-11-18 16:29:12 
>> loading peak file...              2015-11-18 16:29:12 
>> preparing features information...         2015-11-18 16:29:13 
>> identifying nearest features...       2015-11-18 16:29:13 
>> calculating distance from peak to TSS...  2015-11-18 16:29:13 
>> assigning genomic annotation...       2015-11-18 16:29:13 
>> adding gene annotation...             2015-11-18 16:29:15 
>> adding flank feature information from peaks...    2015-11-18 16:29:20 
>> assigning chromosome lengths          2015-11-18 16:29:23 
>> done...                   2015-11-18 16:29:23 


> peakAnnoList <- lapply(fil, annotatePeak, TxDb=txdb, level="gene", assignGenomicAnnotation = TRUE,
+ genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron",
+ "Downstream", "Intergenic"), tssRegion=c(3000, 3000), verbose=T,
+ annoDb="org.Hs.eg.db", addFlankGeneInfo=T, flankDistance=5000)
>> loading peak file...              2015-11-18 16:29:32 
>> preparing features information...         2015-11-18 16:29:35 
>> identifying nearest features...       2015-11-18 16:29:35 
>> calculating distance from peak to TSS...  2015-11-18 16:29:36 
>> assigning genomic annotation...       2015-11-18 16:29:36 
>> adding gene annotation...             2015-11-18 16:29:39 
>> adding flank feature information from peaks...    2015-11-18 16:30:44 
Error in `$<-.data.frame`(`*tmp*`, "geneId", value = character(0)) : 
  replacement has 0 rows, data has 73501

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_0.5.0                           ggplot2_1.0.1                          
 [3] GO.db_3.1.2                             org.Hs.eg.db_3.1.2                     
 [5] XVector_0.8.0                           clusterProfiler_2.2.7                  
 [7] RSQLite_1.0.0                           DBI_0.3.1                              
 [9] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2 GenomicFeatures_1.20.6                 
[11] AnnotationDbi_1.30.1                    Biobase_2.28.0                         
[13] GenomicRanges_1.20.8                    GenomeInfoDb_1.4.3                     
[15] IRanges_2.2.9                           S4Vectors_0.6.6                        
[17] BiocGenerics_0.14.0                     ChIPseeker_1.4.7                       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1             png_0.1-7               Rsamtools_1.20.5        Biostrings_2.36.4      
 [5] gtools_3.5.0            assertthat_0.1          digest_0.6.8            R6_2.1.1               
 [9] plyr_1.8.3              futile.options_1.0.0    httr_1.0.0              gplots_2.17.0          
[13] zlibbioc_1.14.0         lazyeval_0.1.10         gdata_2.17.0            qvalue_2.0.0           
[17] proto_0.3-10            labeling_0.3            splines_3.2.2           BiocParallel_1.2.22    
[21] stringr_1.0.0           igraph_1.0.1            RCurl_1.95-4.7          biomaRt_2.24.1         
[25] munsell_0.4.2           rtracklayer_1.28.10     KEGGREST_1.8.1          XML_3.98-1.3           
[29] dplyr_0.4.3             GenomicAlignments_1.4.2 MASS_7.3-43             bitops_1.0-6           
[33] grid_3.2.2              gtable_0.1.2            magrittr_1.5            scales_0.3.0           
[37] KernSmooth_2.23-15      stringi_1.0-1           GOSemSim_1.26.0         reshape2_1.4.1         
[41] DO.db_2.9               futile.logger_1.4.1     boot_1.3-17             lambda.r_1.1.7         
[45] RColorBrewer_1.1-2      tools_3.2.2             plotrix_3.5-12          colorspace_1.2-6       
[49] caTools_1.17.1          DOSE_2.6.6

covplot xlim range

If one has a small GRanges object, on a fraction of a chromosome, the covplot plots only that fraction. Providing xlim has no effect.

It would be very helpful if default xlim (whole chromosome) works as expected.

covplot(peak.tm.min, weightCol = "V5", chrs = chromosome, lower = min(peak.tm.min$V5), xlim = c(0, 100000000))

peak.Rda.zip

Annotate peaks with strandness

Hi GuangChuang,

Thanks for making such a nice tool. I am using ChIPseeker to annotate my genomic intervals (breakpoints from whole genome sequencing). Those are not ChIP-seq peaks and the strandness for each interval is critical to me.
6 column bed file format:

chr1  10000 100200  breakpoint_1  .   +
chr2  20000 200100 breakpoint_2  .    -

So, when annotating those intervals using

annotatePeak(breakpointA, tssRegion=c(-3000, 3000), 
                         TxDb=txdb, annoDb="org.Hs.eg.db")

Is there a way to specify that strandness needs to be considered?
Note that, I am not saying the strandness of the genes (as you mentioned here http://guangchuangyu.github.io/2014/01/bug-of-r-package-chippeakanno/), but rather the strandness of the intervals.

Thanks,
Ming Tang

Tagheatmap color key

Hi,
Just have a request/ inquiry. I was able to make two heatmaps of different experimental conditions using the tagheatmap function. Is it at all possible to have a color key to go with the heatmaps? Also what does the intensity of the colors of the heatmaps represent? Peak height? Log pvalue? Just want to know what is being plotted.

Thank you!

Error in unlist(metadata(TXDB))

hi YGC,
I want to draw some pictures with ChIPseeker.But I met the problem at the beginning.I read your paper:ChIPseeker: an R package for ChIP peak Annotation,Comparision and Visualization.And I tried to use the command.The problem is as follows:

> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> txdb
TxDb object:
| Db type: TxDb
| Supporting package: GenomicFeatures
| Data source: UCSC
| Genome: hg19
| Organism: Homo sapiens
| UCSC Table: knownGene
| Resource URL: http://genome.ucsc.edu/
| Type of Gene ID: Entrez Gene ID
| Full dataset: yes
| miRBase build ID: GRCh37
| transcript_nrow: 82960
| exon_nrow: 289969
| cds_nrow: 237533
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2014-09-26 11:16:12 -0700 (Fri, 26 Sep 2014)
| GenomicFeatures version at creation time: 1.17.17
| RSQLite version at creation time: 0.11.4
| DBSCHEMAVERSION: 1.0
> promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
Error in unlist(metadata(TXDB)) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function 'metadata' for signature '"character"'

I realy need your help,thx

best
salviadr

Error in enrichPeakOverlap

I have been following the vignette with my own data, but cannot run enrichPeakOverlap. I got a message as below.

> enrichPeakOverlap(queryPeak     = files[[6]], 
+                   targetPeak    = unlist(files[1:5]), 
+                   TxDb          = txdb, 
+                   pAdjustMethod = "BH", 
+                   nShuffle      = 50, 
+                   chainFile     = NULL,
+                   mc.cores      = detectCores() - 1,
+                   verbose       = FALSE)
Error in file.exists(dir) : invalid 'file' argument

I tried to run with the supplied datafile, but got an error, too. What can it be the issue? Thanks!

New Error in enrichPeakOverlap

Guangchuang, I updated ChIPseeker to 1.8.7, but still not working. Below is the error I get (same files)

enrichPeakOverlap(queryPeak = files[[6]],
• targetPeak = unlist(files[1:5]),
• TxDb = txdb,
• pAdjustMethod = "BH",
• nShuffle = 50,
• chainFile = NULL,
• mc.cores = detectCores() - 1,
• verbose = FALSE)

Error in sub(".+/", "", targetFiles) : object 'targetFiles' not found

annotatePeak annoDb error

I can run annotatePeak(peaks, TxDb = txdb) fine. However, I wanted to add gene names, so I added annoDb parameter. That gave me an error.

> peak_anno = annotatePeak(peaks, TxDb = txdb, annoDb="org.Mm.eg.db")
>> preparing features information...         2016-09-19 09:32:45 PM 
>> identifying nearest features...       2016-09-19 09:32:45 PM 
>> calculating distance from peak to TSS...  2016-09-19 09:32:48 PM 
>> assigning genomic annotation...       2016-09-19 09:32:48 PM 
>> adding gene annotation...             2016-09-19 09:32:52 PM 
Error in grepl("^\\d+$", , peak.gr$geneId[1]) : 
  argument "x" is missing, with no default

I have a custom TxDb based on a GTF, but it has a geneId column with values like ENSMUSG00000102851, so it should be working.

I tried to troubleshoot this myself, but can't find that line anywhere in the ChIPseeker codebase.

Shuffle does not preserve feature width

`
library(GenomicRanges)
library(ChIPseeker)
gr <- GRanges(Rle(c("chr2L", "chr2L", "chrX", "chr3R"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)))

shuffle_gr <- shuffle(gr, BSgenome.Dmelanogaster.UCSC.dm3::BSgenome.Dmelanogaster.UCSC.dm3)
`

These values should be equal but they are not:
`

width(gr)
[1] 10 9 8 7 6 5 4 3 2 1
width(shuffle_gr)
[1] 11 10 9 8 5 4 3 2 7 6
`

Shuffle also does not correctly deal with granges objects that are not sorted by chromosome name (see also in my example how the width of the region output is not only off by 1 but is in the incorrect order. The 'order' function used to calculate ii should be replaced with seq_along() to avoid this error. Further, explicitly setting width = w[ii] in the RLE call will solve the width issue instead of using start[ii] + w[ii]. The end result here is that not only are the features the wrong size, but if the input is not a sorted list by chromosome, the incorrect chromosome size will be used to calculate the shuffled position.

image host

peakHeatmap / plotAvgProf2

Hi, Guangchuang:

Thanks again for the great work. Some minor issues regarding the peakHeatmap / plotAvgProf2 commands:

While passing the GRanges object "peakAnno" (from annotatePeak) to peakHeatmap / plotAvgProf2 commands, the following error is returned:

preparing promoter regions... 2014-12-03 02:43:15 PM
preparing tag matrix... 2014-12-03 02:43:16 PM
Error in file.exists(peak) : invalid 'file' argument
Calls: peakHeatmap -> getTagMatrix -> loadPeak -> file.exists
Execution halted

The same GRanges object works well with other commands plotAnnoPie / plotAnnoBar / plotDistToTSS.

Thanks a million.

ChIPseeker: 1.2.0
Bicoconductor 3.0
R 3.1.1

extend getPromoters()

getPromoters() function prepare a GRanges object of promoter regions by user specify upstream and downstream distance from Transcript Start Site (TSS).

Refer to the questions in Bioconductor support site: https://support.bioconductor.org/p/73069/#73157 and https://support.bioconductor.org/p/73377/, some users (studying RNA splicing I guess) may interesting in how protein binding to the start of intron/exon.

getPromoters() function will extend to support output a GRanges object of Intron/Exon start region similar to Transcript start region.

Maybe the function name should be changed.

error message of annotate

Hi,

you've got a great package here!
Unfortunately, I'm running into an error that I cannot seem to resolve:
I'd like to annotate a couple of CpG sites, which I have stuck into a GRanges object. annotatePeak starts off fine, but eventually it produces an error: Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'min_overlap_score' not found

> mm9.txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene 
> head(CpG.gr)
GRanges object with 6 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr1 [8812273, 8812273]      *
  [2]     chr1 [8812587, 8812587]      *
  [3]     chr1 [8812888, 8812888]      *
  [4]     chr1 [8813689, 8813689]      *
  [5]     chr1 [8813766, 8813766]      *
  [6]     chr1 [8814243, 8814243]      *

> CpGsites_annotated <- annotatePeak(CpG.gr, TxDb = mm9.txdb, tssRegion=c(2000,2000))
>> preparing features information...             2016-01-28 06:18:48 PM
>> identifying nearest features...               2016-01-28 06:18:49 PM
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'min_overlap_score' not found 

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
 [2] GenomicFeatures_1.21.31
 [3] AnnotationDbi_1.31.18
 [4] Biobase_2.29.1
 [5] ChIPseeker_1.6.6
 [6] BSgenome.Mmusculus.UCSC.mm9_1.4.0
 [7] BSgenome_1.37.5
 [8] rtracklayer_1.29.28
 [9] Biostrings_2.38.3
[10] XVector_0.9.4
[11] GenomicRanges_1.21.29
[12] GenomeInfoDb_1.5.16
[13] IRanges_2.4.6
[14] S4Vectors_0.8.7
[15] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1
 [2] RColorBrewer_1.1-2
 [3] futile.logger_1.4.1
 [4] plyr_1.8.3
 [5] bitops_1.0-6
 [6] futile.options_1.0.0
 [7] tools_3.2.1
 [8] zlibbioc_1.15.0
 [9] boot_1.3-17
[10] biomaRt_2.25.3
[11] digest_0.6.8
[12] gridBase_0.4-7
[13] RSQLite_1.0.0
[14] gtable_0.1.2
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
[16] DBI_0.3.1
[17] proto_0.3-10
[18] gridExtra_2.0.0
[19] dplyr_0.4.3
[20] UpSetR_1.0.2
[21] stringr_1.0.0
[22] caTools_1.17.1
[23] gtools_3.5.0
[24] grid_3.2.1
[25] R6_2.1.1
[26] plotrix_3.6-1
[27] XML_3.98-1.3
[28] BiocParallel_1.3.52
[29] gdata_2.17.0
[30] magrittr_1.5
[31] ggplot2_1.0.1
[32] reshape2_1.4.1
[33] lambda.r_1.1.7
[34] gplots_2.17.0
[35] MASS_7.3-44
[36] scales_0.3.0
[37] Rsamtools_1.21.19
[38] GenomicAlignments_1.5.18
[39] assertthat_0.1
[40] SummarizedExperiment_0.3.9
[41] colorspace_1.2-6
[42] KernSmooth_2.23-15
[43] stringi_0.5-5
[44] munsell_0.4.2
[45] RCurl_1.95-4.7

Any insights are greatly appreciated!
Thanks!

Error when using new parameter distanceToTSS_cutoff in enrichAnnoOverlap

enrichAnnoOverlap(myPeaks, GEOpeaks, TxDb = txdb, pAdjustMethod = "BH", distanceToTSS_cutoff = 10000)

produces:
Error in colnames<-(*tmp*, value = c("Feature", "Frequency")) :
'names' attribute [2] must be the same length as the vector [1]

covplot help

I am interested in creating color overlays using two covplot outputs (peaks from H3KK9me3 and H3K27me3). This is quite easy to do in photoshop when both covplot outputs are the same size (attached screenshot), but quite difficult if they are not. Is there a way to keep the covplot scaling the same regardless of the number of peaks that are present in the input file? I only run into the problem when I have a file with few (~100) differentially called peaks. I should mention I am only interested in the location (not size) of the peak, so my input file only consist of chromosome, start, and end columns.
Thank you for your time.

Adding '.id' to list of annotated peak files

Hi,
I am hoping to use a list of annotated peak files to make visual comparisons between treatments. I followed the guide using my own data. However, I have run into an id issue when I try to annotate a list of files. Is there anyway to add ids (perhaps in the initial "myFiles" list) to the peak files so that I can use ChIPseeker's comparison plots?

Sincerely,
Nora Peterson

myFiles <- list.files(pattern="*.txt")
peakAnnoList <- lapply(myFiles, annotatePeak, TxDb=txdb,
tssRegion=c(-3000, 3000), verbose=FALSE)
peakAnnoList
[[1]]
Annotated peaks generated by ChIPseeker
16319/16359 peaks were annotated
Genomic Annotation Summary:
Feature Frequency
9 Promoter (1-2kb) 1.5319566
10 Promoter (<=1kb) 70.9602304
11 Promoter (2-3kb) 1.3358662
4 5' UTR 0.4595870
3 3' UTR 0.8517679
1 1st Exon 0.7353392
7 Other Exon 1.4645505
2 1st Intron 1.7893253
8 Other Intron 4.9451560
6 Downstream (<=3kb) 0.4963539
5 Distal Intergenic 15.4298670
[[2]]
Annotated peaks generated by ChIPseeker
16845/16890 peaks were annotated
Genomic Annotation Summary:
Feature Frequency
9 Promoter (1-2kb) 1.6859602
10 Promoter (<=1kb) 69.2668448
11 Promoter (2-3kb) 1.3831998
4 5' UTR 0.4689819
3 3' UTR 0.9260908
1 1st Exon 0.8489166
7 Other Exon 1.5197388
2 1st Intron 2.0124666
8 Other Intron 5.4615613
6 Downstream (<=3kb) 0.4867913
5 Distal Intergenic 15.9394479

etc. etc. through 8 files

plotAnnoBar(peakAnnoList)
Error in eval(expr, envir, enclos) : object '.id' not found

Annotation of peaks that overlap multiple genes/TSS?

I really like this tool for multiple reasons, including nice visualisation. However, one thing that I need to know in order to be fully convinced, is how peaks are annotated that overlap multiple genes and/or TSS.

Let me take chr3:137,806,020-137,806,847 as an example. The region is only 800 bp long, yet it contains TSS regions for 2 genes: Mttp and Rg9mtd2. In my personal experiment (where the region is slightly larger but anyway covers both TSS), it is annotated as Mttp, but based on the shape of the peak it seems it actually tends to lean into the Rg9mtd2 gene body.

What is the rule during the annotation? Is it the based on the size of the overlap with the two genes (annotating the gene with larger overlap), or maybe it takes the first entry in the annoDb that matches? I think the best (and quite easy) solution would be to report both genes.

I would greatly appreciate a response because we can't use the results for a publication if we don't fully understand the annotation.

Thanks!

details of annotating ChIP-seq peaks

Hi Guangchuan,

I am looking into the details of how annotatePeak work and found it internally uses this function:
getNearestFeatureIndicesAndDistances

I saw

    ## nearest from peak start
    ps.idx <- precede(peaks, features)

    ## nearest from peak end
    pe.idx <- follow(peaks, features)

I understand for ChIP-seq peaks, there is no strandness.

precede will actually return nearest from peak end.
follow will actually return nearest from peak start.

See some of my testing here http://crazyhottommy.blogspot.com/2016/01/find-nearest-upstream-genes-using.html

Error in unlist(metadata(TXDB))

I have all the latest packages and I am not new to R!
There is a problem with ChIPSeeker talking to TxDb objects. The TxDb object can be manually queried but cannot be queried through ChIPSeeker:

    >promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
    Error in unlist(metadata(TXDB)) :
    error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘metadata’ for signature ‘"character"’
    >txdb
    TxDb object:
    Db type: TxDb
    Supporting package: GenomicFeatures
    Data source: UCSC
    Genome: hg19
    Organism: Homo sapiens
    UCSC Table: knownGene
    Resource URL: http://genome.ucsc.edu/
    Type of Gene ID: Entrez Gene ID
    Full dataset: yes
    miRBase build ID: GRCh37
    transcript_nrow: 82960
    exon_nrow: 289969
    cds_nrow: 237533
    Db created by: GenomicFeatures package from Bioconductor
    Creation time: 2015-03-19 13:55:51 -0700 (Thu, 19 Mar 2015)
    GenomicFeatures version at creation time: 1.19.32
    RSQLite version at creation time: 1.0.0
    DBSCHEMAVERSION: 1.1

    >sessionInfo(package="ChIPseeker")
    R version 3.2.2 (2015-08-14)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 14.04.1 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8

[7] LC_PAPER=en_US.UTF-8 LC_NAME=C

[9] LC_ADDRESS=C LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
character(0)

other attached packages:
[1] ChIPseeker_1.4.7

loaded via a namespace (and not attached):
[1] Rcpp_0.12.1

[2] GO.db_3.1.2

[3] png_0.1-7

[4] Rsamtools_1.20.5

[5] Biostrings_2.36.4

[6] gtools_3.5.0

[7] assertthat_0.1

[8] digest_0.6.8

[9] BSgenome.Mmusculus.UCSC.mm10_1.4.0

[10] R6_2.1.1

[11] org.Mm.eg.db_3.1.2

[12] GenomeInfoDb_1.4.3

[13] plyr_1.8.3

[14] futile.options_1.0.0

[15] stats4_3.2.2

[16] RSQLite_1.0.0

[17] httr_1.0.0

[18] ggplot2_1.0.1

[19] BiocInstaller_1.18.4

[20] utils_3.2.2

[21] gplots_2.17.0

[22] zlibbioc_1.14.0

[23] GenomicFeatures_1.20.6

[24] gdata_2.17.0

[25] S4Vectors_0.6.6

[26] qvalue_2.0.0

[27] proto_0.3-10

[28] splines_3.2.2

[29] BiocParallel_1.2.22

[30] stringr_1.0.0

[31] igraph_1.0.1

[32] RCurl_1.95-4.7

[33] biomaRt_2.24.1

[34] munsell_0.4.2

[35] TxDb.Mmusculus.UCSC.mm10.knownGene_3.1.2
[36] rtracklayer_1.28.10

[37] stats_3.2.2

[38] BiocGenerics_0.14.0

[39] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2
[40] KEGGREST_1.8.1

[41] IRanges_2.2.9

[42] grDevices_3.2.2

[43] XML_3.98-1.3

[44] dplyr_0.4.3

[45] GenomicAlignments_1.4.2

[46] MASS_7.3-44

[47] bitops_1.0-6

[48] grid_3.2.2

[49] gtable_0.1.2

[50] DBI_0.3.1

[51] magrittr_1.5

[52] datasets_3.2.2

[53] scales_0.3.0

[54] KernSmooth_2.23-15

[55] stringi_0.5-5

[56] GOSemSim_1.26.0

[57] XVector_0.8.0

[58] reshape2_1.4.1

[59] DO.db_2.9

[60] futile.logger_1.4.1

[61] clusterProfiler_2.2.7

[62] graphics_3.2.2

[63] boot_1.3-17

[64] base_3.2.2

[65] lambda.r_1.1.7

[66] RColorBrewer_1.1-2

[67] tools_3.2.2

[68] BSgenome_1.36.3

[69] Biobase_2.28.0

[70] plotrix_3.5-12

[71] parallel_3.2.2

[72] AnnotationDbi_1.30.1

[73] colorspace_1.2-6

[74] GenomicRanges_1.20.8

[75] caTools_1.17.1

[76] DOSE_2.6.6

[77] methods_3.2.2

Installing ChipSeeker

I am trying to Install ChipSeeker 1.10.0 because I was running into this error from the oder version on R/3.2.0
http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html
**pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) Error in row.names<-.data.frame(tmp, value = c(NA_character, NA_character_, : duplicate 'row.names' are not allowed In addition: There were 50 or more warnings (use warnings() to see the first 50)

the I tried seq2gene and it gives this error

gene <- seq2gene(peaks, tssRegion = c(-1000, 1000), flankDistance = 3000, TxDb=txdb) Error in .Method(..., deparse.level = deparse.level) : number of columns for arg 2 do not match those of first arg**_

and now I am getting this error, after installation

library(ChIPseeker)
Error : Function found when exporting methods from the namespace 'ChIPseeker' which is not S4 generic: 'upsetplot'
Error: package or namespace load failed for 'ChIPseeker'

Here is the sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.2 (Final)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.22.3 clusterProfiler_3.0.2 DOSE_2.10.2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.7
[2] lattice_0.20-34
[3] GO.db_3.3.0
[4] tidyr_0.6.0
[5] Rsamtools_1.24.0
[6] Biostrings_2.40.2
[7] gtools_3.5.0
[8] assertthat_0.1
[9] gridBase_0.4-7
[10] R6_2.2.0
[11] GenomeInfoDb_1.8.1
[12] plyr_1.8.4
[13] stats4_3.3.1
[14] RSQLite_1.0.0
[15] ggplot2_2.1.0
[16] gplots_3.0.1
[17] zlibbioc_1.18.0
[18] GenomicFeatures_1.24.2
[19] annotate_1.50.0
[20] gdata_2.17.0
[21] SparseM_1.72
[22] S4Vectors_0.10.3
[23] qvalue_2.4.2
[24] splines_3.3.1
[25] BiocParallel_1.6.2
[26] stringr_1.1.0
[27] topGO_2.24.0
[28] igraph_1.0.1
[29] RCurl_1.95-4.8
[30] biomaRt_2.28.0
[31] munsell_0.4.3
[32] rtracklayer_1.32.2
[33] BiocGenerics_0.18.0
[34] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[35] SummarizedExperiment_1.2.3
[36] tibble_1.2
[37] gridExtra_2.2.1
[38] IRanges_2.6.1
[39] matrixStats_0.50.2
[40] XML_3.98-1.4
[41] dplyr_0.5.0
[42] GenomicAlignments_1.8.4
[43] bitops_1.0-6
[44] grid_3.3.1
[45] xtable_1.8-2
[46] GSEABase_1.34.0
[47] gtable_0.2.0
[48] DBI_0.5-1
[49] magrittr_1.5
[50] scales_0.4.0
[51] graph_1.50.0
[52] KernSmooth_2.23-15
[53] stringi_1.1.2
[54] GOSemSim_1.30.2
[55] XVector_0.12.1
[56] reshape2_1.4.1
[57] DO.db_2.9
[58] boot_1.3-18
[59] RColorBrewer_1.1-2
[60] tools_3.3.1
[61] Biobase_2.32.0
[62] plotrix_3.6-3
[63] parallel_3.3.1
[64] AnnotationDbi_1.34.4
[65] colorspace_1.2-6
[66] UpSetR_1.2.0
[67] GenomicRanges_1.24.0
[68] caTools_1.17.1

covplot

I am with issues in covplot.
I have tried to run the command on your test data and get the following error message.
library(ChIPseeker)
files=getSampleFiles()
peak=GenomicRanges::GRangesList(CBX6=readPeakFile(files[[4]]), CBX7=readPeakFile(files[[5]]))
p=covplot(peak)
Error in file.exists(peak) : invalid 'file' argument

Error in sqliteSendQuery

Dear Guangchuang,

I am having issues running ChIPseeker annotatePeak command.

The error message I get is :

annotatePeak(Ann_K4Me1_DB_UP_TSS, TxDb=TxDb)
Error in unlist(metadata(TXDB)) :
erreur d'évaluation de l'argument 'x' lors de la sélection d'une méthode pour la fonction 'unlist' : Error in sqliteSendQuery(con, statement, bind.data) :
expired SQLiteConnection

In this example, I'm running the analysis using a custom TxDb object I created using biomaRt.
But I have the same error when trying to run annotatePeak on the example data set included in your package.

I think I need to deal with the installation of SQLite package - which seems to work properly til now.
But, I would like to know if you already faced such an error and have suggetions.

Many thanks for your help,

Pierre-François

Definition of genomic features

How do you define or quantify promoters, introns, exons , 5utr, 3utr, downstream and distal intergenic regions for a given set of peak?

[New] Add CI by bootstrapping

Confidence interval of the TSS coverage plot is probably one of the features that users need. I have implemented it by using bootstrapping via third-party package boot. See my fork: https://github.com/Puriney/ChIPseeker

Demo

plotAvgProf(tagMatrix, xlim = c(-3000, 3000), conf = 0.95)

My implementation is elegant for both users and you. Hopefully you will agree with me that I did not mess up your source codes.
Note
Currently CI feature can only support single one tagMatrix data. It cannot handle if user feeds it with a list of data. I am happy to implement it further but the reason why I hesitate is that the final figure will be messed if many data is plotted simultaneously. For example in your own plot here. If there were 10 proteins in the same one figure, I'm afraid that the science story behind figure becomes less straightforward, let alone with CI being added.

To address the problem, I am thinking of multiple plots in one page. But whether I will go in this way depends on whether we are ok with current design. More importantly, after all you are the author and this is my 1st time being here, so I will leave it later.

Anyway, I will make CI estimation compatible with multiple inputs of tagMatrix, hopefully in short time.

annotatepeak using txdb from genecode

Hi Guangchuang,

I want to annotate the peaks using genecodev19,
which has many non-coding genes.

library(GenomicFeatures)

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

genecode.txdb<- makeTxDbFromGFF("~/annotations/human/gencode_hg19.v19/gencode.v19.annotation.gtf.gz", format="gtf", organism = "Homo sapiens", chrominfo = seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene) )

annotatePeak(myGRs, tssRegion=c(-3000, 3000), 
                         TxDb=genecode.txdb, level = "transcript", annoDb="org.Hs.eg.db",
                         sameStrand = FALSE, ignoreOverlap = FALSE,
                         ignoreDownstream = TRUE, overlap = "all")

>> preparing features information...         2016-02-29 15:53:16 
>> identifying nearest features...       2016-02-29 15:53:17 
>> calculating distance from peak to TSS...  2016-02-29 15:53:18 
>> assigning genomic annotation...       2016-02-29 15:53:18 
>> adding gene annotation...             2016-02-29 15:53:40 
Error in if (type == "Entrez Gene ID") { : argument is of length zero

Thanks,
Ming

Upsetplot with vennpie: upsetplot legend colors don't correspond with vennpie legend

Hi,
Just wanted to add an issue here, could be that I made some mistake, but when I use the upsetplot with the vennpie=TRUE, the vennpie colors do not match up with the correct upsetplot colors on the set size axis. Thanks for designing a very nice tool package!

Cheers,
David

plotAvgProf weightCol not plotted

I am trying to plot the values of V5 in my data:

plotAvgProf2(peak, weightCol="V5", TxDb=txdb, upstream=3000, downstream=3000)

where "peak" is the GRange object that looks like this:

GRanges object with 100001 ranges and 4 metadata columns:

       seqnames               ranges strand   |          V4         V5

          <Rle>            <IRanges>  <Rle>   |    <factor>  <numeric>

   [1]     chr1     [ 10010,  10664]      *   |      Peak_1  4.4973750

   [2]     chr1     [713785, 714569]      *   |      Peak_2  5.4287500

   [3]     chr1     [762394, 763144]      *   |      Peak_3  3.3098750

   [4]     chr1     [815103, 817360]      *   |      Peak_4  7.0947500

But it defaults to always plotting the read frequencies count instead of the values on my V5 column. It seems to work with one peak only provided by the vignette but I can't see what the difference is between my peak and the peak in the Vignette... Is there maybe an additional specification I need to provide, such as ylim values? I think the problem is in the coordinates:

Error in plotAvgProf.internal(tagMatrix, xlim = xlim, xlab = xlab, ylab = ylab, :

please specify appropreate xcoordinations...

Calls: plotAvgProf -> plotAvgProf.internal

Also when I try the bootstrap I get this error:

"Error in boot(data = tagMatrix, statistic = getSgn, R = RESAMPLE_TIME, :

number of items to replace is not a multiple of replacement length"

Thank you for your help and great work!

annotation is wrong?

Hi Guangchuan,

if you can test this:

library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gr<- GRanges( seqnames ="chr10", ranges=IRanges(start= 34802689, end=  34802698), strand="+")

ap<- annotatePeak(gr, tssRegion=c(-3000, 3000), 
              TxDb=txdb, annoDb="org.Hs.eg.db",
              sameStrand = FALSE, ignoreOverlap = FALSE,
              ignoreDownstream = TRUE)

as.data.frame(ap)
  seqnames    start      end width strand        annotation geneChr geneStart  geneEnd geneLength geneStrand
1    chr10 34802689 34802698    10      + Distal Intergenic   chr10  34601188 34715659     114472          -
  geneId transcriptId distanceToTSS         ENSEMBL SYMBOL                             GENENAME
1  56288   uc001ixo.2        -87030 ENSG00000148498  PARD3 par-3 family cell polarity regulator



> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.2.3                      RSQLite_1.0.0                          
 [3] DBI_0.3.1                               BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [5] BSgenome_1.38.0                         rtracklayer_1.30.1                     
 [7] Biostrings_2.38.3                       XVector_0.10.0                         
 [9] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.22.7                 
[11] AnnotationDbi_1.32.3                    Biobase_2.30.0                         
[13] ggplot2_2.0.0                           tidyr_0.3.1                            
[15] dplyr_0.4.3                             GenomicInteractions_1.4.1              
[17] GenomicRanges_1.23.10                   GenomeInfoDb_1.7.3                     
[19] IRanges_2.5.19                          S4Vectors_0.9.18                       
[21] BiocGenerics_0.17.2                     ChIPseeker_1.7.5                       
[23] BiocInstaller_1.20.1                   

loaded via a namespace (and not attached):
 [1] httr_1.0.0                  splines_3.2.2               gtools_3.5.0               
 [4] Formula_1.2-1               assertthat_0.1              latticeExtra_0.6-26        
 [7] Rsamtools_1.22.0            lattice_0.20-33             biovizBase_1.18.0          
[10] chron_2.3-47                digest_0.6.8                RColorBrewer_1.1-2         
[13] colorspace_1.2-6            plyr_1.8.3                  XML_3.98-1.3               
[16] devtools_1.9.1              biomaRt_2.26.1              zlibbioc_1.16.0            
[19] scales_0.3.0                gdata_2.17.0                BiocParallel_1.4.3         
[22] UpSetR_1.0.2                SummarizedExperiment_1.1.13 nnet_7.3-11                
[25] Gviz_1.14.2                 survival_2.38-3             magrittr_1.5               
[28] memoise_0.2.1               gplots_2.17.0               foreign_0.8-66             
[31] data.table_1.9.6            tools_3.2.2                 matrixStats_0.50.1         
[34] gridBase_0.4-7              stringr_1.0.0               munsell_0.4.2              
[37] cluster_2.0.3               plotrix_3.6-1               lambda.r_1.1.7             
[40] caTools_1.17.1              futile.logger_1.4.1         grid_3.2.2                 
[43] RCurl_1.95-4.7              dichromat_2.0-0             VariantAnnotation_1.16.4   
[46] igraph_1.0.1                labeling_0.3                bitops_1.0-6               
[49] boot_1.3-17                 gtable_0.1.2                curl_0.9.4                 
[52] R6_2.1.1                    GenomicAlignments_1.7.11    gridExtra_2.0.0            
[55] knitr_1.12                  Hmisc_3.17-1                futile.options_1.0.0       
[58] KernSmooth_2.23-15          stringi_1.0-1               Rcpp_0.12.2                
[61] rpart_4.1-10                acepack_1.3-3.3

gr is in the intron of PARD3, not sure why the annotation is distal intergenic.

Thanks.
Ming

annotatePeak issue with overlapping genes

Hi, Dear Prof. Yu.

I was trying to use your package in my collaborative research, and I really like the general setup of the package.

However, I may have found a bug in annotatePeak function. One example where the bug may lead t o some scientific problems is as follows.

Let A and B be two genes that overlaps with each other in genomic locations,
and P be a peak in an exon/intron of A but not in the gene body of B.
If the TSS of B is closer to P, annotatePeak will output some of the gene information of B instead of A, yet the annotation column will still show that it is in the exon/intron of A.

One example of such genes are Xkr4 and AK149000 on mm9.

This may cause problem if one wants to use the output gene information to do gene set enrichment analysis.

I think the problem is that the output gene information retrieved from the output of getNearestFeatureIndicesAndDistances, which only consider distance to TSS but not the actual overlaps with the gene body, while the annotation of exon/intron is from the actual overlaps between peaks and features.

I have been trying to fix this since yesterday, and my current strategy is to let getGenomicAnnotation also output gene ID of the actual annotation, and use the gene ID to retrieve the gene information and re-calculate distance to TSS afterwards.

I figured out that you may have a better solution, but I will also post my solution once I have a working version.Thanks.

Qi Zhang

wrong annotation with annotatePeak

Hi,

I have two peaks (hg19):

chr1,19234321,19235388
chr1,40241808,40243451

The first one is annotated to ALDH4A1, which has the closest TSS, with a distance of -5028 bp to the TSS. With default settings, in column 'annotation' I would expect 'Distal Intergenic', but it is '3'UTR'. Keeping a look in the UCSC GenomeBrowser, it is clear why: ALDH4A1 has the closest TSS, but the peak overlaps with the 3'UTR of the next gene, which is IFFO2.
The second Peak is annotated to OXCT2, which is again the closest TSS. The distance to the TSS is -4788 bp, which should again be annotated as 'Distal Intergenic', but it is annotated to the 6th Intron of BMP8B. I think it is the same issue as with the first peak, but with another mistake: BMP8B is on the reverse strand, it is the first Intron, not the 6th.
I also tried to change genomicAnnotationPriority to prefer 'Intergenic' before 'Promoter', with no effect. I think both cases are not the expected behaviour. I used ChIPseeker-version 1.6.7

Best Regards

Shuffle and enrichPeakOverlap problems

Hi,

I cannot run the shuffle command as I receive the following errors and I think it's related to why I cannot run the enrichPeakOverlap command as well.

files
$peak1
[1] "peak1.bed"

$peak2
[1] "peak2.bed"
p <- GRanges(seqnames=c("1", "5"), ranges=IRanges(start=c(100, 100), end=c(200, 200)))

shuffle(p, TxDb = tair10)

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  solving row 1: range cannot be determined from the supplied arguments (too many NAs)

Here is the error code with the enrichPeakOverlap command

enrichPeakOverlap(queryPeak     = files[[2]], targetPeak    = unlist(files[1]), TxDb          = tair10,  pAdjustMethod = "BH", nShuffle      = 50, chainFile     = NULL, verbose       = FALSE)

qSample
peak1.bed
tSample
peak2.bed
qLen tLen N_OL pvalue p.adjust
peak2.bed 1252 4861  221      1        1

Warning message:
In mclapply(seq_along(idx), function(j) { : all scheduled cores encountered errors in user code

Then, when I set mc.cores = 1, here is the error that I receive

enrichPeakOverlap(queryPeak     = files[[2]], targetPeak    = unlist(files[1]), TxDb = tair10, pAdjustMethod = "BH", nShuffle      = 50, chainFile = NULL, verbose = FALSE, mc.cores      = 1)

Error in sample.int(length(x), size, replace, prob) : 
cannot take a sample larger than the population when 'replace = FALSE'

Please, let me know what else you would need from me to dissect this issue. Thanks!

question about ‘annotatePeak’

hi GCY,

I have two questions.As follows:
1.I had read Instruction manual about ‘annotatePeak’, the parameter 'tssRegion' is set to "c(-3000, 3000)". And 'tssRegion' is means "Region Range of TSS".Now I only hope peaks are annotate to promoter's upstream 2000bp and downstream 2000bp.I changed the parameter 'tssRegion' ,but I failed.The 'annotation' coloum in result file also including "Promoter (<=1kb)、Promoter (1-2kb)、Promoter (2-3kb)".
2.Genomic features of the peak in result have 11 categories.How should I understand ‘downstream’? I try to think it is means as downstream of a gene.But how you classify downstream to 3 categories?

Thank you for your help.
FF

getNearestFeatureIndicesAndDistances

hi,

the function "getNearestFeatureIndicesAndDistances" is documented but not exported to NAMESPACE.
Do I miss something?

Dietmar

question about strand

Hi,
I have a question about the strand (+/-) given after peak annotation. Regarding the strand given for each chr. coordinate, I want to select a specific window of the gene (-50 bp to 200 bp around geneStart).So, for example If I have the follow annotation

geneStart geneEnd strand
20156 23456 1
23584 46953 2

I've just wondering if I make sure that I'm subtracting and adding bp in the appropriate direction, or it could be : geneStart -50 geneStart + 200 in both cases (strand - and +)

Thank you very much in advance,

Best,

Peak Annotations

Hi
I have a peak file p1. I annotate files
peakAnno <- annotatePeak(p1,tssRegion=c(-3000, 3000),TxDb=txdb, annoDb="org.Hs.eg.db")
features = attributes(peakAnno)

I looked at head(features['detailGenomicAnnotation'][[1]])
for every peak it has multiple columns true. Why?
I want to know exactly, how do you assign promoter region to a peak.

No. of promoters in peaks / No of promoters in whole genome

Prerequisites

Have you read Feedback and follow the guide?
- make sure your are using the latest release version
- read the documents
- google your quesion/issue

Describe you issue

Make a reproducible example (e.g. 1)
your code should contain comments to describe the problem (e.g. what expected and actually happened?)

Ask in right place

for bugs or feature requests, post here (github issue)
for questions, please post to Bioconductor or Biostars with tag ChIPseeker