rcavalcante / annotatr

Package Homepage: http://bioconductor.org/packages/devel/bioc/html/annotatr.html
Bug Reports: https://support.bioconductor.org/p/new/post/?tag_val=annotatr

annotatr's Issues

Error in builtin_genomes() : could not find function "builtin_genomes"

Hi Raymond,
I am building a package that imports some functions from annotatr to annotate certain regions of the genome. I am using the function build_annotations to build these annotations, and I consistently get the following error:
Error in builtin_genomes() : could not find function "builtin_genomes"

I am using the current version of annotatr, 1.2.1. Do you have any insight into why I would get this error?

Thank you,
Divy

Problem building lncrna annotations

Hello, I've been struggling to build the lncRNA annotations using this package. Initially, after running the following:

annots = c(
  'hg38_basicgenes',
  'hg38_lncrna_gencode'
)

annotations = build_annotations(genome = 'hg38', annotations = annots)

I ran into this issue.

Error in get(txdb_name) : 
  object 'TxDb.Hsapiens.UCSC.hg38.knownGene' not found

After installing that package and loading it, I ran into the following error which I have been unable to resolve.

'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building introns...
snapshotDate(): 2021-05-18
Building lncRNA transcripts...
Error in .local(x, i, j = j, ...) : 'i' must be length 1

Oddly enough, this was working before, so I suspect a package incompatibility or something along those lines. The workarounds mentioned in a previous issue aren't helpful (unless I am mistaken), as I need the transcript IDs to be maintained.

This is my session info dump:

- Session info --------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_Australia.1252      
 ctype    English_Australia.1252      
 tz       Australia/Sydney            
 date     2021-07-06                  

- Packages ------------------------------------------------------------------------------------------------------------
 package                           * version  date       lib source        
 AnnotationDbi                     * 1.54.1   2021-06-08 [1] Bioconductor  
 AnnotationHub                     * 3.0.1    2021-06-20 [1] Bioconductor  
 annotatr                          * 1.18.0   2021-05-19 [1] Bioconductor  
 assertthat                          0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
 Biobase                           * 2.52.0   2021-05-19 [1] Bioconductor  
 BiocFileCache                     * 2.0.0    2021-05-19 [1] Bioconductor  
 BiocGenerics                      * 0.38.0   2021-05-19 [1] Bioconductor  
 BiocIO                              1.2.0    2021-05-19 [1] Bioconductor  
 BiocManager                       * 1.30.16  2021-06-15 [1] CRAN (R 4.1.0)
 BiocParallel                        1.26.1   2021-07-04 [1] Bioconductor  
 BiocVersion                         3.13.1   2021-03-08 [1] Bioconductor  
 biomaRt                             2.48.2   2021-07-01 [1] Bioconductor  
 Biostrings                          2.60.1   2021-06-06 [1] Bioconductor  
 bit                                 4.0.4    2020-08-04 [1] CRAN (R 4.1.0)
 bit64                               4.0.5    2020-08-30 [1] CRAN (R 4.1.0)
 bitops                              1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
 blob                                1.2.1    2020-01-20 [1] CRAN (R 4.1.0)
 BSgenome                            1.60.0   2021-05-19 [1] Bioconductor  
 cachem                              1.0.5    2021-05-15 [1] CRAN (R 4.1.0)
 callr                               3.7.0    2021-04-20 [1] CRAN (R 4.1.0)
 cli                                 3.0.0    2021-06-30 [1] CRAN (R 4.1.0)
 colorspace                          2.0-2    2021-06-24 [1] CRAN (R 4.1.0)
 crayon                              1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
 curl                                4.3.2    2021-06-23 [1] CRAN (R 4.1.0)
 DBI                                 1.1.1    2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr                            * 2.1.1    2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray                        0.18.0   2021-05-19 [1] Bioconductor  
 desc                                1.3.0    2021-03-05 [1] CRAN (R 4.1.0)
 devtools                            2.4.2    2021-06-07 [1] CRAN (R 4.1.0)
 digest                              0.6.27   2020-10-24 [1] CRAN (R 4.1.0)
 dplyr                               1.0.7    2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis                            0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
 fansi                               0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                             1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
 filelock                            1.0.2    2018-10-05 [1] CRAN (R 4.1.0)
 fs                                  1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
 generics                            0.1.0    2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb                      * 1.28.1   2021-07-01 [1] Bioconductor  
 GenomeInfoDbData                    1.2.6    2021-07-06 [1] Bioconductor  
 GenomicAlignments                   1.28.0   2021-05-19 [1] Bioconductor  
 GenomicFeatures                   * 1.44.0   2021-05-19 [1] Bioconductor  
 GenomicRanges                     * 1.44.0   2021-05-19 [1] Bioconductor  
 ggplot2                             3.3.5    2021-06-25 [1] CRAN (R 4.1.0)
 glue                                1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
 gtable                              0.3.0    2019-03-25 [1] CRAN (R 4.1.0)
 hms                                 1.1.0    2021-05-17 [1] CRAN (R 4.1.0)
 htmltools                           0.5.1.1  2021-01-22 [1] CRAN (R 4.1.0)
 httpuv                              1.6.1    2021-05-07 [1] CRAN (R 4.1.0)
 httr                                1.4.2    2020-07-20 [1] CRAN (R 4.1.0)
 interactiveDisplayBase              1.30.0   2021-05-19 [1] Bioconductor  
 IRanges                           * 2.26.0   2021-05-19 [1] Bioconductor  
 KEGGREST                            1.32.0   2021-05-19 [1] Bioconductor  
 later                               1.2.0    2021-04-23 [1] CRAN (R 4.1.0)
 lattice                             0.20-44  2021-05-02 [2] CRAN (R 4.1.0)
 lifecycle                           1.0.0    2021-02-15 [1] CRAN (R 4.1.0)
 magrittr                            2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
 Matrix                              1.3-3    2021-05-04 [2] CRAN (R 4.1.0)
 MatrixGenerics                      1.4.0    2021-05-19 [1] Bioconductor  
 matrixStats                         0.59.0   2021-06-01 [1] CRAN (R 4.1.0)
 memoise                             2.0.0    2021-01-26 [1] CRAN (R 4.1.0)
 mime                                0.11     2021-06-23 [1] CRAN (R 4.1.0)
 munsell                             0.5.0    2018-06-12 [1] CRAN (R 4.1.0)
 org.Hs.eg.db                      * 3.13.0   2021-07-06 [1] Bioconductor  
 pillar                              1.6.1    2021-05-16 [1] CRAN (R 4.1.0)
 pkgbuild                            1.2.0    2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig                           2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                             1.2.1    2021-04-06 [1] CRAN (R 4.1.0)
 plyr                                1.8.6    2020-03-03 [1] CRAN (R 4.1.0)
 png                                 0.1-7    2013-12-03 [1] CRAN (R 4.1.0)
 prettyunits                         1.1.1    2020-01-24 [1] CRAN (R 4.1.0)
 processx                            3.5.2    2021-04-30 [1] CRAN (R 4.1.0)
 progress                            1.2.2    2019-05-16 [1] CRAN (R 4.1.0)
 promises                            1.2.0.1  2021-02-11 [1] CRAN (R 4.1.0)
 ps                                  1.6.0    2021-02-28 [1] CRAN (R 4.1.0)
 purrr                               0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
 R6                                  2.5.0    2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs                            0.3.3    2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                                1.0.6    2021-01-15 [1] CRAN (R 4.1.0)
 RCurl                               1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
 readr                               1.4.0    2020-10-05 [1] CRAN (R 4.1.0)
 regioneR                            1.24.0   2021-05-19 [1] Bioconductor  
 remotes                             2.4.0    2021-06-02 [1] CRAN (R 4.1.0)
 reshape2                            1.4.4    2020-04-09 [1] CRAN (R 4.1.0)
 restfulr                            0.0.13   2017-08-06 [1] CRAN (R 4.1.0)
 rjson                               0.2.20   2018-06-08 [1] CRAN (R 4.1.0)
 rlang                               0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot                           2.0.2    2020-11-15 [1] CRAN (R 4.1.0)
 Rsamtools                           2.8.0    2021-05-19 [1] Bioconductor  
 RSQLite                             2.2.7    2021-04-22 [1] CRAN (R 4.1.0)
 rtracklayer                         1.52.0   2021-05-19 [1] Bioconductor  
 S4Vectors                         * 0.30.0   2021-05-19 [1] Bioconductor  
 scales                              1.1.1    2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo                         1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
 shiny                               1.6.0    2021-01-25 [1] CRAN (R 4.1.0)
 stringi                             1.6.2    2021-05-17 [1] CRAN (R 4.1.0)
 stringr                             1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
 SummarizedExperiment                1.22.0   2021-05-19 [1] Bioconductor  
 testthat                            3.0.4    2021-07-01 [1] CRAN (R 4.1.0)
 tibble                              3.1.2    2021-05-16 [1] CRAN (R 4.1.0)
 tidyselect                          1.1.1    2021-04-30 [1] CRAN (R 4.1.0)
 TxDb.Hsapiens.UCSC.hg38.knownGene * 3.13.0   2021-07-06 [1] Bioconductor  
 usethis                             2.0.1    2021-02-10 [1] CRAN (R 4.1.0)
 utf8                                1.2.1    2021-03-12 [1] CRAN (R 4.1.0)
 vctrs                               0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
 withr                               2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
 XML                                 3.99-0.6 2021-03-16 [1] CRAN (R 4.1.0)
 xml2                                1.3.2    2020-04-23 [1] CRAN (R 4.1.0)
 xtable                              1.8-4    2019-04-21 [1] CRAN (R 4.1.0)
 XVector                             0.32.0   2021-05-19 [1] Bioconductor  
 yaml                                2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
 zlibbioc                            1.38.0   2021-05-19 [1] Bioconductor  

[1] C:/Users/usr/OneDrive/Documents/R/win-library/4.1
[2] C:/Program Files/R/R-4.1.0/library

some functions were not found

Some functions of the annotatr package were not found. Why?
Can anyone help me, please?

build_hmm_annots()
Error in build_hmm_annots() :
could not find function "build_hmm_annots"

build_cpg_annots()
Error in build_cpg_annots() :
could not find function "build_cpg_annots"

build_enhancer_annots()
Error in build_enhancer_annots() :
could not find function "build_enhancer_annots"

ls("package:annotatr")
[1] "annotate_regions" "annotations" "annotatr_cache"
[4] "build_ah_annots" "build_annotations" "builtin_annotations"
[7] "builtin_genomes" "plot_annotation" "plot_categorical"
[10] "plot_coannotations" "plot_numerical" "plot_numerical_coannotations"
[13] "randomize_regions" "read_annotations" "read_regions"
[16] "summarize_annotations" "summarize_categorical" "summarize_numerical"
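
A likely explanation, offered as an assumption rather than a confirmed answer: ls("package:annotatr") lists only exported functions, and the three build_*_annots() helpers are absent from that list, so they appear to be internal helpers that build_annotations() calls on the user's behalf. The base-R sketch below (fn_status is a hypothetical helper name) checks whether a function is exported or merely present in a package namespace.

```r
# Hypothetical helper: report whether `fn` is exported from `pkg`
# and whether it exists anywhere in the package namespace.
fn_status <- function(fn, pkg) {
  ns <- asNamespace(pkg)  # loads the namespace if it is not loaded yet
  c(
    exported     = fn %in% getNamespaceExports(pkg),
    in_namespace = exists(fn, envir = ns, inherits = FALSE)
  )
}

# Works with any installed package; `stats` ships with R.
status <- fn_status("median", "stats")
print(status)

# With annotatr installed, the same check could be run as:
# fn_status("build_cpg_annots", "annotatr")
```

A function that reports exported = FALSE but in_namespace = TRUE can be reached with the `pkg:::fn` form, although calling build_annotations() with the matching annotation code is probably the intended route.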

hg19_cpgs negative width for inter-CGI

Hi, I was using the hg19_cpgs annotations for CGIs, shelves, shores and inter-CGI regions, but when I sum the size of each feature type I get a negative value for the inter-CGI regions. I am not sure how this is possible.

When I build a separate annotation for each feature category, sum the widths, and compare them with either hg19_cpgs or a manually merged GRangesList, I get the same width for all features except inter-CGI!

Do you know what could be the cause of this behaviour?

This is my code:

> annotations_CpGs.gr = build_annotations(genome = 'hg19', annotations = "hg19_cpgs")

# Make GRangesList object by splitting by type of annotation
annotations_CpGs.grl <- split(annotations_CpGs.gr, mcols(annotations_CpGs.gr)$type)
    names(annotations_CpGs.grl)

> sum(width(reduce(annotations_CpGs.grl)))
  hg19_cpg_inter hg19_cpg_islands hg19_cpg_shelves 
     -1368521925         21842742         87006497 
 hg19_cpg_shores 
       101866654 

    # Custom annotations:

>     annotations_CGI.gr = build_annotations(genome = 'hg19', annotations = "hg19_cpg_islands")
>     annotations_shores.gr = build_annotations(genome = 'hg19', annotations = "hg19_cpg_shores")
>     annotations_shelves.gr = build_annotations(genome = 'hg19', annotations = "hg19_cpg_shelves")
>     annotations_interCGI.gr = build_annotations(genome = 'hg19', annotations = "hg19_cpg_inter")

> sum(width(reduce(annotations_CGI.gr)))
[1] 21842742

> sum(width(reduce(annotations_shores.gr)))
[1] 101866654
> 
> sum(width(reduce(annotations_shelves.gr)))
[1] 87006497
> 
> sum(width(reduce(annotations_interCGI.gr)))
[1] 2926445371
> 

 # Merged
> annotations_CpGs.grl <- GRangesList("hg19_cpg_islands"=annotations_CGI.gr,"hg19_cpg_shores"=annotations_shores.gr,"hg19_cpg_shelves"=annotations_shelves.gr, "hg19_cpg_inter"=annotations_interCGI.gr )
> 
> # Size of each CGI feature
> sum(width(reduce(annotations_CpGs.grl)))
hg19_cpg_islands  hg19_cpg_shores hg19_cpg_shelves 
        21842742        101866654         87006497 
  hg19_cpg_inter 
     -1368521925
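
One explanation consistent with the numbers above (an assumption, not confirmed by the package author): the per-type total is accumulated as a 32-bit integer, whose maximum is 2^31 - 1 = 2147483647. The inter-CGI total of 2926445371 exceeds that, and wrapping it around by 2^32 gives exactly the negative value reported. The arithmetic can be checked in base R using doubles:

```r
int_max    <- .Machine$integer.max  # 2147483647 on current R builds
true_total <- 2926445371            # value reported for the stand-alone inter-CGI GRanges

# The true total does not fit in a 32-bit signed integer
exceeds <- true_total > int_max
print(exceeds)

# Wrapping around by 2^32 reproduces the negative number from the GRangesList sum
wrapped <- true_total - 2^32
print(sprintf("%.0f", wrapped))     # "-1368521925"
```

If that is the cause, converting the widths to double before summing, for example with something like `sapply(annotations_CpGs.grl, function(gr) sum(as.numeric(width(reduce(gr)))))`, should avoid the wraparound.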

installation requires lots of other packages

I installed annotatr and it downloaded many other bioconductor packages (like TxDb.Dmelanogaster.UCSC.dm3.ensGene, for example). I don't study flies, so this is just taking up space for me. So, a suggestion: make it so that I only have to download the annotations I am interested in, instead of sticking everything into the imports area.

Error message by plot_categorical

Hi there,
Thanks for developing annotatr.
I encountered the following warning when using plot_categorical:
Warning message: In subset_order_tbl(tbl = annotated_regions, col = fill, col_order = fill_order) : There are elements in col_order that are not present in the corresponding column. Check for typos, or this could be a result of 0 tallies.

Then I got the error message as below:
Error in seq.default(h[1], h[2], length.out = n) : 'to' must be a finite number.

$ table(df_dm_annotated[df_dm_annotated$DM_status=="hypo", ]$annot.type)

  mm10_cpg_inter mm10_cpg_islands mm10_cpg_shelves  mm10_cpg_shores
            1566               12              147              458
$ table(df_dm_annotated[df_dm_annotated$DM_status=="hyper", ]$annot.type)

  mm10_cpg_inter mm10_cpg_islands mm10_cpg_shelves  mm10_cpg_shores
             538               25               55              151
$ table(df_dm_annotated[df_dm_annotated$DM_status !="hyper" & df_dm_annotated$DM_status!="hyper", ]$annot.type)

$ table(df_dm_annotated[df_dm_annotated$DM_status !="hypo" & df_dm_annotated$DM_status!="hyper", ]$annot.type)

  mm10_cpg_inter mm10_cpg_islands mm10_cpg_shelves  mm10_cpg_shores
          259318            25014            22412           104207

Could you help me with this problem? Thanks.

summarize_annotations gives 'duplicate row.names' error

Hello,

I have successfully (I think) used the annotate_regions function to add CpG location category annotations (using the 'hg38_cpgs' shortcut in the build_annotations function) to regions of interest. Now I am trying to use the summarize_annotations function to count how many regions were in islands, shelves, shores, and open sea. However, this throws an error:

"Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x), :
duplicate row.names: 107641, 63635, 88662, 115649, ..."

I am using the following code. The GRanges object that contains my regions of interest is dmrs, and the error is thrown after the last line (call to summarize_annotations).

   annotations = build_annotations(genome = 'hg38', annotations = 'hg38_cpgs')
   dm_annotated = annotate_regions(regions = dmrs,
   	annotations = annotations,
   	ignore.strand = TRUE,
   	quiet = FALSE)
   dm_annsum = summarize_annotations(
	   annotated_regions = dm_annotated,
	   quiet = TRUE)

Could this be because some of the regions map to multiple categories (e.g. part of a region overlaps a shelf and another part of the same region overlaps a shore)? This is my best guess from the error message and from the fact that if I add the line dm_annotated = unique(dm_annotated) before the call, it runs fine. However, the documentation suggests that duplicates such as these should be fine.

I'd appreciate any insight you might have on this.

Best,
Keegan
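
The message itself comes from base R's data.frame(), which refuses duplicated row names. A plausible reading (an assumption, not verified against the annotatr source) is that dmrs carries names, annotate_regions() repeats a region once per overlapping annotation, and the repeated names then collide when the GRanges is converted to a data frame. The base-R mechanism can be reproduced in isolation with made-up data:

```r
# Toy data: the same region name appears twice, as it would if one
# region overlapped both a shelf and a shore (all values are made up).
region_names <- c("dmr1", "dmr1", "dmr2")

msg <- tryCatch(
  {
    data.frame(start = c(100L, 100L, 500L), row.names = region_names)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the names avoids the clash while keeping every row.
ok <- data.frame(start = c(100L, 100L, 500L), row.names = NULL)
print(nrow(ok))
```

If this is what is happening, running `names(dm_annotated) <- NULL` before summarize_annotations() may be a less lossy workaround than unique(), which can drop the repeated rows.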

build_annotations() error "argument to 'which' is not logical"

From a fresh start:

> library(annotatr)
Warning messages:
1: multiple methods tables found for ‘which’
2: multiple methods tables found for ‘which’
> mm10.annots <- build_annotations("mm10", c("mm10_basicgenes"))
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


'select()' returned 1:1 mapping between keys and columns
Building promoters...
Error in base::which(x, arr.ind, useNames, ...) :
  argument to 'which' is not logical

R 4.0.0, annotatr 1.15.0. I'd appreciate it if you could help, thanks!

Genome prefix must be the same for custom annotation

Hi,

I am trying to build a custom annotation for annotating lncRNAs, as below

lncRNAs_custom <- file.path("./","lncRNAs.bed")
extraCols_lncRNA = c(symbol = 'character')
custom_lncRNAs<- read_annotations(lncRNAs_custom, genome = "hg19", extraCols = extraCols_lncRNA, format = "bed", name = "lncRNAs")

head(custom_lncRNAs)
GRanges object with 6 ranges and 5 metadata columns:
seqnames ranges strand | id tx_id gene_id symbol type
|
[1] chr1 84267199-84326229 * | lncRNAs:1 LINC01725:44 hg19_custom_lncRNAs
[2] chr16 74226291-74249420 * | lncRNAs:2 lnc-ZFHX3-27:11 hg19_custom_lncRNAs

annots_custom = c('hg19_cpgs', 'hg19_basicgenes','hg19_enhancers_fantom', 'custom_lncRNAs')

All seems fine until I run the build_annotations command. I have got this error message:

annotations = build_annotations(genome = 'hg19', annotations = annots_custom)
Error in check_annotations(builtin_annotations) :
Error: genome prefix on all annotations must be the same.

I have checked in builtin_annotations, and I can see there are lncrnas:
lncrna_codes = c("hg19_lncrna_gencode", "hg38_lncrna_gencode",
"mm10_lncrna_gencode")

Do I need to incorporate this somehow into build_annotations? I think the error may come from the "name" option when running read_annotations, but I cannot make it work.

Many thanks,
Clara
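
Judging from the type column printed above, read_annotations() registered the custom track under the code hg19_custom_lncRNAs, not custom_lncRNAs, so the entry in annots_custom probably needs the full code. The sketch below is not annotatr's own check; it is a base-R illustration (genome_prefixes is a hypothetical helper) of why the shorter name would trip a same-prefix rule.

```r
# Hypothetical helper: take everything before the first underscore.
genome_prefixes <- function(codes) {
  vapply(strsplit(codes, "_", fixed = TRUE), `[`, character(1), 1)
}

as_posted <- c("hg19_cpgs", "hg19_basicgenes",
               "hg19_enhancers_fantom", "custom_lncRNAs")
with_full_code <- c("hg19_cpgs", "hg19_basicgenes",
                    "hg19_enhancers_fantom", "hg19_custom_lncRNAs")

# Two distinct prefixes ("hg19" and "custom") in the vector as posted
print(unique(genome_prefixes(as_posted)))

# A single prefix ("hg19") once the full custom code is used
print(unique(genome_prefixes(with_full_code)))
```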

No inbuilt function to calculate percentage Overlap

I am using this wonderful package, and it has made annotation work much easier. However, I have run into one issue: I need to keep only those annotations for which the overlap of the region of interest with the feature is at least 50%. I am not sure the minoverlap argument has that functionality. Also, what does minoverlap = 1L mean? I understand that 1L is the long (integer) form of 1, but what does it stand for: a minimum overlap of 1 nucleotide, or something else, such as a 10% overlap? What value would correspond to a 50% overlap?
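
For context: in R, `1L` is the integer literal 1, and in the GenomicRanges overlap functions `minoverlap` is counted in positions (base pairs), not percent, so `minoverlap = 1L` asks for at least one shared base. This sketch does not assume that annotate_regions() exposes a fractional threshold. Instead it shows, in base R with made-up coordinates, how a 50% filter could be computed afterwards from start and end columns (frac_overlap is a hypothetical helper).

```r
# Fraction of each region (1-based, closed intervals) covered by its
# paired annotation. All inputs are plain numeric vectors.
frac_overlap <- function(r_start, r_end, a_start, a_end) {
  shared <- pmax(0, pmin(r_end, a_end) - pmax(r_start, a_start) + 1)
  shared / (r_end - r_start + 1)
}

# Made-up region/annotation pairs; every region is 100 bp wide
pairs <- data.frame(
  r_start = c(100, 100, 100),
  r_end   = c(199, 199, 199),
  a_start = c(150, 180, 50),
  a_end   = c(300, 300, 400)
)
pairs$frac <- with(pairs, frac_overlap(r_start, r_end, a_start, a_end))

# Keep only pairs where at least half of the region is covered
kept <- pairs[pairs$frac >= 0.5, ]
print(kept)
```

With annotate_regions() output converted to a data frame, the region and annotation coordinates (columns such as start/end and annot.start/annot.end) would play the roles of r_* and a_*. Those column names are an assumption to verify against your own object.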

Annotations for just the main isoform of a gene?

Hello,

I was wondering if it's possible to retrieve the annotations for just the main isoform of each gene in the genome and not all the other isoforms? I think this would help refine downstream enrichment testing, since otherwise there is a fair amount of overlap between features, and having this sort of filtering option would reduce it.

Below is the current call that I'd like to refine:

genome <- "hg38"
# Annotation codes to build; FANTOM enhancers are requested only for hg38 and mm10
annot_codes <- c(
    paste0(genome, "_basicgenes"),
    paste0(genome, "_genes_intergenic"),
    paste0(genome, "_genes_intronexonboundaries"),
    if (genome %in% c("hg38", "mm10")) paste0(genome, "_enhancers_fantom")
)
# Build the annotations, then keep only the standard chromosomes
annotations <- build_annotations(genome = genome, annotations = annot_codes)
annotations <- GenomeInfoDb::keepStandardChromosomes(annotations, pruning.mode = "coarse")

Thanks,

Ben

Use AnnotationHub accession directly in lncRNA cases

In build_lncrna_annots() we're doing something a bit roundabout by querying and then looking for the genome. This worked when there was only one genome == GRCh38 but now there are two, and this function isn't going to work so well.

Unable to build annotations for hg38 CpG islands, though they are listed in builtin_annotations()

Hi there

I am trying to build the below annotations but seem to have an issue accessing the cpg island annotation data, though from what I can tell they are part of builtin_annotations()

If I omit the two cpg island annotation sets the rest of the workflow through to plotting and so on proceeds fine.

Thanks in advance for any help with this.
Best
Helen

annots = c("hg38_genes_promoters","hg38_genes_firstexons","hg38_basicgenes","hg38_cpg_islands","hg38_cpgs")
annotations = build_annotations(genome = 'hg38', annotations = annots)
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building first exons...
Building introns...
Building CpG islands...
Error in open.connection(4L, "rb") : Recv failure: Connection was reset

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_1.1.3 dplyr_1.0.7 org.Hs.eg.db_3.13.0
[4] TxDb.Hsapiens.UCSC.hg38.knownGene_3.13.0 GenomicFeatures_1.44.2 annotatr_1.18.1
[7] AnnotationDbi_1.54.1 Biobase_2.52.0 AnnotationHub_3.0.1
[10] BiocFileCache_2.0.0 dbplyr_2.1.1 GenomicRanges_1.44.0
[13] GenomeInfoDb_1.28.2 IRanges_2.26.0 S4Vectors_0.30.0
[16] BiocGenerics_0.38.0

loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.60.1 bit64_4.0.5 filelock_1.0.2 progress_1.2.2
[6] httr_1.4.2 tools_4.1.1 utf8_1.2.2 R6_2.5.1 DBI_1.1.1
[11] colorspace_2.0-2 tidyselect_1.1.1 prettyunits_1.1.1 bit_4.0.4 curl_4.3.2
[16] compiler_4.1.1 cli_3.0.1 xml2_1.3.2 DelayedArray_0.18.0 rtracklayer_1.52.1
[21] scales_1.1.1 readr_2.0.1 rappdirs_0.3.3 stringr_1.4.0 digest_0.6.27
[26] Rsamtools_2.8.0 XVector_0.32.0 pkgconfig_2.0.3 htmltools_0.5.2 MatrixGenerics_1.4.3
[31] fastmap_1.1.0 BSgenome_1.60.0 regioneR_1.24.0 rlang_0.4.11 RSQLite_2.2.8
[36] shiny_1.6.0 BiocIO_1.2.0 generics_0.1.0 vroom_1.5.4 BiocParallel_1.26.2
[41] RCurl_1.98-1.4 magrittr_2.0.1 GenomeInfoDbData_1.2.6 Matrix_1.3-4 Rcpp_1.0.7
[46] munsell_0.5.0 fansi_0.5.0 lifecycle_1.0.0 stringi_1.7.4 yaml_2.2.1
[51] SummarizedExperiment_1.22.0 zlibbioc_1.38.0 plyr_1.8.6 grid_4.1.1 blob_1.2.2
[56] promises_1.2.0.1 crayon_1.4.1 lattice_0.20-44 Biostrings_2.60.2 hms_1.1.0
[61] KEGGREST_1.32.0 pillar_1.6.2 rjson_0.2.20 reshape2_1.4.4 biomaRt_2.48.3
[66] XML_3.99-0.7 glue_1.4.2 BiocVersion_3.13.1 BiocManager_1.30.16 png_0.1-7
[71] vctrs_0.3.8 tzdb_0.1.2 httpuv_1.6.2 gtable_0.3.0 purrr_0.3.4
[76] assertthat_0.2.1 cachem_1.0.6 ggplot2_3.3.5 mime_0.11 xtable_1.8-4
[81] restfulr_0.0.13 later_1.3.0 tibble_3.1.4 GenomicAlignments_1.28.0 memoise_2.0.0
[86] ellipsis_0.3.2 interactiveDisplayBase_1.30.0

Randomizing Regions

Thanks for the package. It has been really useful!

If I have correctly understood, randomizing regions can allow comparison of how differentially methylated regions differ from other annotations.
I have seen that in the below link there are other files available for this purpose.
https://github.com/rcavalcante/annotatr/find/master

Would you recommend any of the files available? In my case I am working with human hg19.
Thanks!
Clara

Build a plot function similar to the bar plot for Orochi_bs.

Failing to check with dplyr 0.8.0 release candidate

As part of checking reverse dependencies for the dplyr 0.8.0 release candidate, we see this problem below.

I'm not sure what the problem is.

> revdepcheck::revdep_details(revdep = "annotatr")
══ Reverse dependency check ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ annotatr 1.6.0 ══

Status: BROKEN

── Still failing

✖ checking examples ... ERROR
✖ checking re-building of vignette outputs ... WARNING
✖ checking package dependencies ... NOTE
✖ checking R code for possible problems ... NOTE

── Newly failing

✖ checking tests ...

── Before ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ checking examples ... ERROR
  Running examples in ‘annotatr-Ex.R’ failed
  The error most likely occurred in:
  
  > ### Name: build_annotations
  > ### Title: A function to build annotations from TxDb.* and AnnotationHub
  > ###   resources
  > ### Aliases: build_annotations
  > 
  > ### ** Examples
  > 
  > # Example with hg19 gene promoters
  > annots = c('hg19_genes_promoters')
  > annots_gr = build_annotations(genome = 'hg19', annotations = annots)
  Error in build_gene_annots(genome = genome, annotations = gene_annotations) : 
    The package TxDb.Hsapiens.UCSC.hg19.knownGene is not installed, please install it via Bioconductor.
  Calls: build_annotations
  Execution halted

❯ checking re-building of vignette outputs ... WARNING
  Error in re-building vignettes:
    ...
  snapshotDate(): 2018-04-30
  Building annotation Gm12878 from AnnotationHub resource AH23256 ...
  require("rtracklayer")
  Warning: package 'rtracklayer' was built under R version 3.5.1
  Warning: package 'GenomicRanges' was built under R version 3.5.1
  Warning: package 'IRanges' was built under R version 3.5.1
  downloading 0 resources
  loading from cache 
      '/Users/romain//.AnnotationHub/28684'
  Quitting from lines 153-170 (annotatr-vignette.Rmd) 
  Error: processing vignette 'annotatr-vignette.Rmd' failed with diagnostics:
  The package TxDb.Hsapiens.UCSC.hg19.knownGene is not installed, please install it via Bioconductor.
  Execution halted

❯ checking package dependencies ... NOTE
  Packages suggested but not available for checking:
    ‘org.Dm.eg.db’ ‘org.Gg.eg.db’ ‘org.Hs.eg.db’ ‘org.Mm.eg.db’
    ‘org.Rn.eg.db’ ‘TxDb.Dmelanogaster.UCSC.dm3.ensGene’
    ‘TxDb.Dmelanogaster.UCSC.dm6.ensGene’
    ‘TxDb.Ggallus.UCSC.galGal5.refGene’
    ‘TxDb.Hsapiens.UCSC.hg19.knownGene’
    ‘TxDb.Hsapiens.UCSC.hg38.knownGene’
    ‘TxDb.Mmusculus.UCSC.mm9.knownGene’
    ‘TxDb.Mmusculus.UCSC.mm10.knownGene’
    ‘TxDb.Rnorvegicus.UCSC.rn4.ensGene’
    ‘TxDb.Rnorvegicus.UCSC.rn5.refGene’
    ‘TxDb.Rnorvegicus.UCSC.rn6.refGene’

❯ checking R code for possible problems ... NOTE
  plot_coannotations: no visible binding for global variable ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/old/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:176-178)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/old/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:463-480)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/old/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:466-471)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/old/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:473-478)
  Undefined global functions or variables:
    .

1 error ✖ | 1 warning ✖ | 2 notes ✖

── After ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ checking examples ... ERROR
  Running examples in ‘annotatr-Ex.R’ failed
  The error most likely occurred in:
  
  > ### Name: build_annotations
  > ### Title: A function to build annotations from TxDb.* and AnnotationHub
  > ###   resources
  > ### Aliases: build_annotations
  > 
  > ### ** Examples
  > 
  > # Example with hg19 gene promoters
  > annots = c('hg19_genes_promoters')
  > annots_gr = build_annotations(genome = 'hg19', annotations = annots)
  Error in build_gene_annots(genome = genome, annotations = gene_annotations) : 
    The package TxDb.Hsapiens.UCSC.hg19.knownGene is not installed, please install it via Bioconductor.
  Calls: build_annotations
  Execution halted

❯ checking tests ...
  See below...

❯ checking re-building of vignette outputs ... WARNING
  Error in re-building vignettes:
    ...
  snapshotDate(): 2018-04-30
  Building annotation Gm12878 from AnnotationHub resource AH23256 ...
  require("rtracklayer")
  Warning: package 'rtracklayer' was built under R version 3.5.1
  Warning: package 'GenomicRanges' was built under R version 3.5.1
  Warning: package 'IRanges' was built under R version 3.5.1
  downloading 0 resources
  loading from cache 
      '/Users/romain//.AnnotationHub/28684'
  Quitting from lines 153-170 (annotatr-vignette.Rmd) 
  Error: processing vignette 'annotatr-vignette.Rmd' failed with diagnostics:
  The package TxDb.Hsapiens.UCSC.hg19.knownGene is not installed, please install it via Bioconductor.
  Execution halted

❯ checking package dependencies ... NOTE
  Packages suggested but not available for checking:
    ‘org.Dm.eg.db’ ‘org.Gg.eg.db’ ‘org.Hs.eg.db’ ‘org.Mm.eg.db’
    ‘org.Rn.eg.db’ ‘TxDb.Dmelanogaster.UCSC.dm3.ensGene’
    ‘TxDb.Dmelanogaster.UCSC.dm6.ensGene’
    ‘TxDb.Ggallus.UCSC.galGal5.refGene’
    ‘TxDb.Hsapiens.UCSC.hg19.knownGene’
    ‘TxDb.Hsapiens.UCSC.hg38.knownGene’
    ‘TxDb.Mmusculus.UCSC.mm9.knownGene’
    ‘TxDb.Mmusculus.UCSC.mm10.knownGene’
    ‘TxDb.Rnorvegicus.UCSC.rn4.ensGene’
    ‘TxDb.Rnorvegicus.UCSC.rn5.refGene’
    ‘TxDb.Rnorvegicus.UCSC.rn6.refGene’

❯ checking R code for possible problems ... NOTE
  plot_coannotations: no visible binding for global variable ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:176-178)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:463-480)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:466-471)
  plot_numerical_coannotations: no visible binding for global variable
    ‘.’
    (/Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:473-478)
  Undefined global functions or variables:
    .

── Test failures ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── testthat ────

> library(testthat)
> library(annotatr)
> 
> test_check("annotatr")
── 1. Error: Test plot_numerical_coannotations() (@test_7_visualize.R#163)  ────
n < m
1: plot_numerical_coannotations(annotated_regions = dm_annots, x = "mu0", annot1 = "hg19_cpg_islands", 
       annot2 = "hg19_cpg_shores", bin_width = 5, plot_title = "Group 0 Perc. Meth. in CpG Islands and Promoters", 
       x_label = "Percent Methylation", legend_facet_label = "Perc. Methylation in annotation pair", 
       legend_cum_label = "Overall Perc. Methylation") at testthat/test_7_visualize.R:163
2: dplyr::do(dplyr::group_by_(sub_tbl, .dots = c("seqnames", "start", "end")), if (nrow(.) == 
       1) {
       as.data.frame(t(utils::combn(rep.int(as.character(.$annot.type), 2), 2)), stringsAsFactors = FALSE)
   } else {
       as.data.frame(t(utils::combn(sort(as.character(.$annot.type)), 2)), stringsAsFactors = FALSE)
   }) at /Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:463
3: do.grouped_df(dplyr::group_by_(sub_tbl, .dots = c("seqnames", "start", "end")), if (nrow(.) == 
       1) {
       as.data.frame(t(utils::combn(rep.int(as.character(.$annot.type), 2), 2)), stringsAsFactors = FALSE)
   } else {
       as.data.frame(t(utils::combn(sort(as.character(.$annot.type)), 2)), stringsAsFactors = FALSE)
   }) at /Users/romain/git/tidyverse/dplyr/R/do.r:91
4: eval_tidy(args[[j]], mask) at /Users/romain/git/tidyverse/dplyr/R/grouped-df.r:331
5: as.data.frame(t(utils::combn(sort(as.character(.$annot.type)), 2)), stringsAsFactors = FALSE) at /Users/romain/git/release/dplyr/revdep/checks.noindex/annotatr/new/annotatr.Rcheck/00_pkg_src/annotatr/R/visualize.R:473
6: t(utils::combn(sort(as.character(.$annot.type)), 2))
7: utils::combn(sort(as.character(.$annot.type)), 2)
8: stop("n < m", domain = NA)

══ testthat results  ═══════════════════════════════════════════════════════════
OK: 70 SKIPPED: 0 FAILED: 1
1. Error: Test plot_numerical_coannotations() (@test_7_visualize.R#163) 

Error: testthat unit tests failed
Execution halted

2 errors ✖ | 1 warning ✖ | 2 notes ✖
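
For readers following the test failure above: the `n < m` message is raised by `utils::combn()` when it is asked for pairs from a vector with fewer than two elements. The base R sketch below reproduces that message and shows one possible guard. The helper name `pair_annotations` and the example annotation names are illustrative assumptions, not code from annotatr or dplyr.

```r
# Base R only: utils::combn() stops with "n < m" when asked for pairs (m = 2)
# from a vector that has fewer than two elements.
annot_types <- c("hg19_cpg_islands")  # hypothetical group with one annotation

res <- tryCatch(
    utils::combn(sort(annot_types), 2),
    error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical defensive helper: pair a lone annotation with itself and
# return an empty table when there is nothing to pair.
pair_annotations <- function(types) {
    types <- sort(as.character(types))
    if (length(types) == 0) {
        return(data.frame(V1 = character(0), V2 = character(0),
                          stringsAsFactors = FALSE))
    }
    if (length(types) == 1) {
        types <- rep.int(types, 2)
    }
    as.data.frame(t(utils::combn(types, 2)), stringsAsFactors = FALSE)
}

single <- pair_annotations(annot_types)
double <- pair_annotations(c("hg19_cpg_shores", "hg19_cpg_islands"))
empty  <- pair_annotations(character(0))
```

Checking the length of the vector passed to `combn()` directly, rather than `nrow(.)`, keeps the guard independent of how the grouping function evaluates `.`.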

Problem with obtaining certain annotations

The code below, which I use to download annotations with annotatr, works only partially:

annotations <- c("hg19_cpgs","hg19_genes_promoters")
promoters.GR <- annotatr::build_annotations(genome='hg19', annotations=c(annotations[2]))
cpgs.GR <- annotatr::build_annotations(genome='hg19', annotations=c(annotations[1]))

Specifically, there is no problem getting the promoter annotations, but when I try to get the CpG annotations, the following error occurs:

Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

What could be the reason for this?

I'm using R version 3.6.3 and annotatr version 1.12.1

Can't get this working

Hi, I am having trouble running annotatr at the moment. This may be because I recently moved offices and now use a Windows machine, and I am not used to running R in a Windows environment.
Perhaps you can help?
Thanks
Tom

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.22.3

loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.4.1 assertthat_0.1 lazyeval_0.2.0 plyr_1.8.4 tools_3.3.3 gtable_0.2.0 tibble_1.2 Rcpp_0.12.10 ggplot2_2.2.1
[11] grid_3.3.3 munsell_0.4.3

lncRNA link not working

Hello, I don't know if this project is still being updated or not, but I was trying to annotate my DMRs using annotatr and even though most of the annotations that I wanted are working, the one for hg19_lncrna_gencode is throwing an error as it is not finding the URL.

It is currently trying to find the information from 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz' and since this leads to an empty page, it is not working anymore.

I have managed to download the .gz file from https://www.gencodegenes.org/human/release_19.html and I thought I could perhaps make a custom annotation, but I have not managed to. This is what I have tried:

> data.file <- '/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz'
> read_annotations(data.file, name='lncRNA', genome='hg19')
Error in .local(con, format, text, ...) : 
  unused argument (extraCols = character(0))

But I do not understand what the error is - I am not that fluent in R yet. I have also tried this:

annotationgr = build_annotations(genome='hg19', annotations='/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz')
Error: ‘/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz’ not in annotatr_cache

Could anyone help me with this?
Thanks,
Alejandra

Annotate from GTF

I use your annotation package, which is very practical and complete. We're developing a workflow that is supposed to work with many different species, so I would like to be able to start from any GTF file for gene annotation. I've seen that it is possible to use custom annotations, but I'm a bit lost among the different functions for doing so.
I thought of making my own TxDb using makeTxDbFromGFF, but I don't know how to create the custom annotations from it.

I would also need to add annotations of CGIs, which I have as a BED file. How do I combine those pieces of information for use with build_annotations?

Thank you in advance for your help.

Magali

Link island/shore/shelf IDs

Build custom annotation for bovine genome

Hi,
I'm planning to use annotatr to annotate differentially methylated regions in the bovine ARS-UCD1.2 genome assembly.
From my limited knowledge (after reading the vignette), I understand that annotatr can be customized and used to annotate genomes other than the ones already present in the package.

I would really appreciate it if you could suggest how to get started with this.

Thank you,
Suraj

Annotation for Arabidopsis thaliana ?

Dear Raymond,

I'd like to use annotatr for Arabidopsis thaliana, but I am not sure if this is possible.
I have created a regulatory catalogue from an integrative analysis of Arabidopsis thaliana ChIP-seq datasets (http://remap.univ-amu.fr/), and would like to be able to build genome annotations for Arabidopsis thaliana.

Here is what we could use for our human catalogue.

annots <- c("hg38_basicgenes", "hg38_genes_intergenic",
            "hg38_genes_promoters", "hg38_genes_5UTRs", "hg38_genes_3UTRs",
            "hg38_genes_exons", "hg38_genes_introns") 

annotations = build_annotations(genome = "hg38", annotations = annots)

Any advice on how to do that for Arabidopsis thaliana?

Thanks,
Ben

annotatr dependencies not loading automatically in Travis CI

Hi Raymond,
I am using one of the functions in annotatr in my Bioconductor package. It works without any problem in my local environment. My package also passes all the builds in the Bioconductor multiple-platform build. However, it produces errors in Travis CI when generating the vignette of my package. The exact error is as follows.

Quitting from lines 199-202 (my-vignette.Rmd) 
Error: processing vignette 'my-vignette.Rmd' failed with diagnostics:
The package TxDb.Hsapiens.UCSC.hg38.knownGene is not installed, please install it via Bioconductor.
Execution halted

Out of all the packages that I import in mine, annotatr is the only one that has TxDb.Hsapiens.UCSC.hg38.knownGene as a dependency. Shouldn't the dependencies of the packages that I import be loaded automatically in the environment?

I am wondering if you have encountered this issue previously when importing annotatr.

Thank you,
Divy
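
A note for anyone hitting the same message: the check output earlier on this page lists the TxDb.* packages under "Packages suggested but not available", which indicates they are declared under Suggests in annotatr. By default R installs only the Depends, Imports and LinkingTo dependencies of a package, so suggested packages have to be installed explicitly on a fresh CI machine. The base R sketch below shows one way to guard a vignette chunk or example. The variable names are illustrative, and whether to skip or fail when the package is missing is a choice left to the package author.

```r
# Check for a suggested annotation package before running code that needs it.
txdb_pkg <- "TxDb.Hsapiens.UCSC.hg38.knownGene"

# requireNamespace() returns TRUE or FALSE instead of raising an error.
have_txdb <- requireNamespace(txdb_pkg, quietly = TRUE)

if (have_txdb) {
    message(txdb_pkg, " is available; annotation-building code can run.")
} else {
    # BiocManager::install() is the usual installer for Bioconductor packages;
    # it is shown in the message only and is not called here.
    message(txdb_pkg, " is missing; install it with BiocManager::install('",
            txdb_pkg, "') or skip this step.")
}
```

In an R Markdown vignette the same flag can be passed as a chunk option (`eval = have_txdb`), and listing the TxDb package in the importing package's own Suggests lets CI systems that install suggested packages pick it up.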

Add dm6 CpG islands

http://hgdownload.cse.ucsc.edu/goldenpath/dm6/database/cpgIslandExt.txt.gz

Error in build_annotations

GRanges = build_annotations(genome='dm6', annotations=annotations)
produces the following error:
"The package Dm is not installed, please install it via Bioconductor."
Thanks for helping,
Tal

Refactor build annotation code

Components of this:

A function to generate a mapping table of gene IDs, transcript IDs, and gene symbols:

build_mapping_table(
    orgdb,
    columns
)

A function to append the annotation metadata:

append_annotation_metadata(
    gr,
    id_maps_df,
    id = NULL,
    type = NULL,
    gene_id_cols = c(gr = 'GENEID', id_maps_df = 'ENTREZID'),
    tx_id_col = 'TXNAME',
    symbol_col = 'SYMBOL'
)

A function to build annotations from data downloaded via URL. If tx_id, gene_id, and symbol are to be used, it would be necessary to get the data first, and write some code to create those vectors prior to using this function.

build_annotations_from_url(
    url, 
    id, 
    tx_id, 
    gene_id, 
    symbol, 
    type
)

A function to build annotations from AnnotationHub. There will be a function that does a single accession, and then the existing build_ah_annots() will handle multiple accessions if needed. As above, if tx_id, gene_id, and symbol are to be used, it would be necessary to get the data first, and write some code to create those vectors prior to using this function.

build_annotations_from_annotation_hub(
    ah_acc, 
    id, 
    tx_id, 
    gene_id, 
    symbol, 
    type
)

A function to build annotations from any TxDb object and a set of id_maps:

build_annotations_from_txdb(
    txdb, 
    id_maps, 
    distal_promoter, 
    distal_start = 4000, 
    distal_end = 1000, 
    proximal_promoter, 
    proximal_upstream = 1000, 
    proximal_downstream = 0,
    CDS, 
    5UTRs, 
    exons, 
    firstexons, 
    introns, 
    intronexonboundaries, 
    exonintronboundaries, 
    3UTRs,
    intergenic
)

Special cases for build_annotations_from_txdb() where gene ID columns are not ENSEMBL, REFSEQ, or ENTREZID. There are also cases where .1 is appended to gene names (Fly, I think). In other words, this is full of inconsistencies and edge cases, and we need a general solution to make this work.
- Arabidopsis uses a TAIR column
- C. elegans uses a WORMBASE column
- D. melanogaster uses a FLYBASE column

A function to build CpG-type annotations from any base GRanges object. The idea is that shores flank the edges of islands by 2000bp, shelves flank the shores by a further 2000bp, and interCGI is the space in between.

build_cpg_annots(
    genome,
    islands_gr,
    islands,
    shores,
    shelves,
    interCGI
)
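
The flanking rule described above can be sketched with plain data frames. This is only an arithmetic illustration under stated assumptions: the island coordinates are made up, `flank_upstream()` and `flank_downstream()` are hypothetical helpers, and the sketch does not trim flanks that run into neighbouring islands or past chromosome ends, which a real implementation on GRanges objects would need to handle.

```r
# Hypothetical CpG islands, 1-based closed coordinates.
islands <- data.frame(
    chr   = c("chr1", "chr1"),
    start = c(10000, 50000),
    end   = c(12000, 51000)
)
flank_width <- 2000

# Region of `width` bp immediately upstream of each input region.
flank_upstream <- function(regions, width) {
    data.frame(
        chr   = regions$chr,
        start = pmax(regions$start - width, 1),  # do not go below position 1
        end   = regions$start - 1
    )
}

# Region of `width` bp immediately downstream of each input region.
flank_downstream <- function(regions, width) {
    data.frame(
        chr   = regions$chr,
        start = regions$end + 1,
        end   = regions$end + width
    )
}

# Shores flank the islands; shelves flank the shores on their outer side.
shores_up    <- flank_upstream(islands, flank_width)
shores_down  <- flank_downstream(islands, flank_width)
shelves_up   <- flank_upstream(shores_up, flank_width)
shelves_down <- flank_downstream(shores_down, flank_width)

print(rbind(shelves_up, shores_up, islands, shores_down, shelves_down))
```

Everything not covered by islands, shores or shelves would then make up the interCGI set.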

When finished, #21 and #30 should implicitly be complete.

hg38 CGI annotation

Hi, I was wondering where the data for hg38 CpG islands come from. The intro online mentions it is obtained from the AnnotationHub package, but when I looked at the available annotations there it only includes hg18, hg19 and no hg38. How did you obtain it? Did you liftover from hg19?

What is the source of annotation data and can it be modified?

Greetings - I'd like to be able to create a custom version of the data being used as hg19_basicgenes. Is that possible, and if so, how? I searched through the source and docs, but came up empty. Thanks and best regards!

lncrna ftp link not working?

For the past few days, I've been having trouble getting the built in annotation of hg19_lncrna_gencode to work. I updated annotatr and here's the error:

> annotatr::build_annotations(genome = 'hg19', annotations = 'hg19_lncrna_gencode')
Building lncRNA transcripts...
trying URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'
Error in download.file(resource(con), destfile) : 
  cannot open URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'

I think it is the Sanger ftp web site since I can download it manually, albeit from a slightly different address:
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz
instead of
ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz

It seems like a simple fix inside of build_lncrna_annots but maybe something else is going on?

build_annotations(genome = 'hg19', annotations = 'hg19_genes_promoters') returns promoters with no gene_id or symbol

Hi,

I am using the build_annotations function to fetch the promoter (<1kb) regions of hg19 genes using:

build_annotations(genome = 'hg19', annotations = 'hg19_genes_promoters')

This gives me a GRanges object with 82,960 regions, 9,528 of which have missing values for both gene_id and symbol. Presumably, the promoters with no Entrez gene id or symbol associated with them are obtained from a larger set of genes than the ones used to obtain gene labels. Is there a way to return an alternate gene id (perhaps Ensembl?) associated with the promoters that do not have an Entrez id or symbol?

I am using annotatr_1.0.3 with R 3.3.1.

Thanks!
Keegan

Add support for danRer11

Clarify warning message for `subset_order_tbl()`

In particular, replace
There are elements in col_order that are not present in the corresponding column. Check for typos, or this could be a result of 0 tallies.
with a message that indicates how the selected annotations differ from what is present.

A large number of regions are missing without annotations.

## Question
A large number of regions are returned without annotations.
I have a BED file with 38685 lines, but fewer than 9549 of them have been annotated.

## Steps
Step 1: read the genomic regions
library(annotatr)
file = "38685.bed"
test_regions = read_regions(con = file, genome = 'hg38', format = 'bed')
test_regions = test_regions[1:2000]
print(test_regions)
Step 2: annotate the genomic regions
annots = c('hg38_cpgs', 'hg38_basicgenes')
annotations = build_annotations(genome = 'hg38', annotations = annots)
test_annotated = annotate_regions(regions = test_regions, annotations = annotations, ignore.strand = TRUE, quiet = FALSE)
print(test_annotated)
df_test_annotated = data.frame(test_annotated)
print(head(df_test_annotated))
write.table(df_test_annotated,"df_test_annotated.txt")
Step 3: summarize
test_annsum = summarize_annotations( annotated_regions = test_annotated, quiet = TRUE)
print(test_annsum)
Step 4: visualize
plot_annotation(annotated_regions = test_annotated)

Remove `randomize_regions()` function

You never really liked this function anyway because of its potential for misuse. Users should have to implement their own random / background regions so they are fully responsible for them.

plot_categorical drops overlapping loci

The plot_categorical example in the vignette for annotating user-supplied hyper-/hypo-methylated regions as CpG islands, shores, shelves, etc. works as expected. But if I instead pass regions_A and regions_B and there are identical regions in both A and B, then only one is reported. For example, if I want to use a BED file of individual CpGs detected in Cancer and Non-Cancer cohorts, and there is a subset of CpGs detected in both, then that subset only gets reported in one of the cohorts. In this example you obviously want to count the overlapping subset in both cohorts.

I traced back the data loss to this section of the plot_categorical code:

works OK:

>  annotated_regions = annotatr:::subset_order_tbl(tbl = annotated_regions, col = fill, col_order = fill_order)
> table(annotated_regions$DM_status)
   Cancer NonCancer 
   100      50

Non-cancer drops out entirely:

> annotated_regions = dplyr::distinct_(dplyr::ungroup(annotated_regions), .dots = c("seqnames", "start", "end", "annot.type"), .keep_all = TRUE)
> table(annotated_regions$DM_status)
Cancer 
100

Note that in this example all of the non-cancer CpGs were also detected in cancer, and as a result they were all omitted. I suppose users with disjoint sets will never run into this issue, but it seems likely that many users will have some overlap in their regions.

Thanks,
John
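
The behaviour reported above can be reproduced without annotatr. The base R sketch below uses a made-up table with the same column names and shows that de-duplicating on `seqnames`, `start`, `end` and `annot.type` alone removes the second cohort's copy of a shared locus, whereas adding the fill column (`DM_status`) to the key keeps it. This illustrates the mechanism only. It is not presented as the package's fix.

```r
# Hypothetical annotated regions: three CpGs seen in Cancer, two of which
# are also seen in NonCancer, all annotated as CpG islands.
annotated <- data.frame(
    seqnames   = "chr1",
    start      = c(100, 200, 300, 100, 200),
    end        = c(100, 200, 300, 100, 200),
    annot.type = "hg19_cpg_islands",
    DM_status  = c("Cancer", "Cancer", "Cancer", "NonCancer", "NonCancer"),
    stringsAsFactors = FALSE
)

key_cols <- c("seqnames", "start", "end", "annot.type")

# De-duplicating on position and annotation type only: rows from the second
# cohort are treated as duplicates of the first and dropped.
dedup_without_fill <- annotated[!duplicated(annotated[, key_cols]), ]
print(table(dedup_without_fill$DM_status))

# Including the fill column in the key keeps one row per locus per cohort.
dedup_with_fill <- annotated[!duplicated(annotated[, c(key_cols, "DM_status")]), ]
print(table(dedup_with_fill$DM_status))
```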

Is it possible to calculate distance to nearest gene using annotatr?

Hi,
I have differentially methylated regions and I would like to calculate the distance to the nearest gene. Is there a smart way to do this using annotatr?

Regards,
bishwaG

Document rtracklayer::import adding 1 to start of GRanges

This is a source of confusion, and results in no annotations found if the regions are single nucleotide CpGs.
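
Some background that may help here: BED files use 0-based, half-open coordinates, while GRanges objects use 1-based, closed coordinates, so importing a BED file adds 1 to each start. The base R sketch below works through the arithmetic with made-up positions. It also shows one plausible way single-nucleotide CpGs end up unannotated: if the file lists them with start equal to end, the converted range has width zero. The column names follow the BED convention and the data are hypothetical.

```r
# Single-nucleotide CpGs written correctly in BED (0-based start, exclusive end).
bed <- data.frame(
    chrom      = c("chr1", "chr1"),
    chromStart = c(999L, 1499L),
    chromEnd   = c(1000L, 1500L)
)

# Conversion to 1-based closed coordinates: start + 1, end unchanged.
granges_like <- data.frame(
    seqnames = bed$chrom,
    start    = bed$chromStart + 1L,
    end      = bed$chromEnd
)
granges_like$width <- granges_like$end - granges_like$start + 1L
print(granges_like)

# The same CpG written with start == end, as 1-based tools often export it.
bad_bed <- data.frame(chrom = "chr1", chromStart = 1000L, chromEnd = 1000L)
bad_width <- bad_bed$chromEnd - (bad_bed$chromStart + 1L) + 1L
print(bad_width)
```

Subtracting 1 from the start column before writing the file gives each single-base position a proper one-base BED interval.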

Txdb-package for build.annotation

hello,
how can I use a TxDb annotation-package (e.g. TxDb.Sscrofa.UCSC.susScr3.refGene) for annotatr, i.e. how can I build the annotation?

annots <- build_annotations(genome="susScr3", annotations= ? )

thank you,
dietmar

failed to load resource

Hi, I'm using annotatr v1.14 from Bioconductor, and when attempting to run

build_annotations(genome = 'hg38', annotations = "hg38_enhancers_fantom")

I get the following message

snapshotDate(): 2020-04-27
loading from cache
Error: failed to load resource
  name: AH14150
  title: hg19ToHg38.over.chain.gz
  reason: error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “ChainFile” object: undefined class for slot "resource" ("characterORconnection")

I'm surprised this is failing, since just last week the same command worked fine on the same machine. Any suggestions, please?

Add support for danRer11

read_regions does not work for "general" format .txt files

Hi,
I have a file that is saved in "General" format that I want to read and annotate. I am aware that the function read_regions only works for files that are saved in "text" format.
Since the file I am trying to annotate is quite big, if I make this change directly in Excel, not all of the data is kept when I save the file.
Is there any help with this, or anything that could solve this issue?

how can I change the annotation files in the annotatr_cache

Thanks for developing the package, which is very useful.
First, I am wondering how I can change the annotation files in the annotatr_cache. After I import the custom annotations:

print(annotatr_cache$get('mm10_custom_FuncElems'))
GRanges object with 1968 ranges and 5 metadata columns:
         seqnames            ranges strand |             id     tx_id   gene_id    symbol                  type
            <Rle>         <IRanges>  <Rle> |    <character> <logical> <logical> <logical>           <character>
     [1]     chr1   9648222-9650965      + |    FuncElems:1      <NA>      <NA>      <NA> mm10_custom_FuncElems
     [2]     chr1 12509175-12511893      + |    FuncElems:2      <NA>      <NA>      <NA> mm10_custom_FuncElems
     [3]     chr1 39945609-39950472      + |    FuncElems:3      <NA>      <NA>      <NA> mm10_custom_FuncElems
     [4]     chr1 56286329-56287543      + |    FuncElems:4      <NA>      <NA>      <NA> mm10_custom_FuncElems
     [5]     chr1 57329917-57330730      + |    FuncElems:5      <NA>      <NA>      <NA> mm10_custom_FuncElems
     ...      ...               ...    ... .            ...       ...       ...       ...                   ...
  [1964]    chr19 58371921-58373062      + | FuncElems:1964      <NA>      <NA>      <NA> mm10_custom_FuncElems
  [1965]    chr19 58419230-58420471      + | FuncElems:1965      <NA>      <NA>      <NA> mm10_custom_FuncElems
  [1966]    chr19 59267127-59268278      + | FuncElems:1966      <NA>      <NA>      <NA> mm10_custom_FuncElems
  [1967]    chr19 59423213-59425570      + | FuncElems:1967      <NA>      <NA>      <NA> mm10_custom_FuncElems
  [1968]    chr19 60015441-60016837      + | FuncElems:1968      <NA>      <NA>      <NA> mm10_custom_FuncElems
  -------
  seqinfo: 66 sequences from mm10 genome

> annotatr_cache$get('mm10_custom_FuncElems')$gene_id = funcElem.obj$geneIds
Error in annotatr_cache$get("mm10_custom_FuncElems")$gene_id = funcElem.obj$geneIds

Besides the issue above, I sometimes want to delete entries from the annotatr_cache.

print(annotatr_cache$list_env())
[1] "mm10_custom_FuncElems" "mm10_custom_test"

How can I do things like that?

Thanks so much.
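
The pattern being asked about can be illustrated with a small stand-in cache. The sketch below does not use annotatr: `make_cache()` is a hypothetical closure-based store whose `get()` and `list_env()` mirror the calls shown above, while `set()` and `remove()` are assumptions made for the illustration and may not match what the package exposes. The point is only that a value returned by `get()` is a copy, so it has to be modified and then stored again under the same name.

```r
# Hypothetical stand-in for a name/value cache backed by an environment.
make_cache <- function() {
    store <- new.env(parent = emptyenv())
    list(
        get      = function(name) base::get(name, envir = store),
        set      = function(name, value) assign(name, value, envir = store),
        list_env = function() ls(store),
        remove   = function(name) rm(list = name, envir = store)  # assumed helper
    )
}

mock_cache <- make_cache()
mock_cache$set("mm10_custom_FuncElems", data.frame(
    id      = c("FuncElems:1", "FuncElems:2"),
    gene_id = NA_character_,
    stringsAsFactors = FALSE
))
mock_cache$set("mm10_custom_test", data.frame())

# Get a copy, modify it, and write it back under the same name.
elems <- mock_cache$get("mm10_custom_FuncElems")
elems$gene_id <- c("geneA", "geneB")  # made-up gene IDs
mock_cache$set("mm10_custom_FuncElems", elems)

# Drop an entry that is no longer needed.
mock_cache$remove("mm10_custom_test")
print(mock_cache$list_env())
```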

annotation name for enhancers

Hi, the reference manual indicates that the package has enhancer annotations, but I am not sure how to load them using the build_annotations function. That is, I am not sure what to pass as the annotations argument in the following line in order to get the enhancer annotations: annotatr::build_annotations(genome = 'hg38',annotations = ?)

read regions of interest into R

Please excuse my lack of understanding of annotatr. I have three genomic regions I would like to annotate (intersect with available genomic features), and I don't know how to read the regions into R. The paper mentions the use of readr::read_csv(), but the R tutorial does not show an example. Could you please let me know how I can read regions of interest for use with the annotatr annotation tools?
In my case, the regions I'm interested in are just genes and the 500kb around each gene. I do not have a score (p-value), just simple coordinates for each region.
Thank you!
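
One way to approach this is to build the windows in a data frame and write them out as a BED file, which `read_regions()` accepts with `format = 'bed'` as used elsewhere on this page. The sketch below is base R only. The gene names, coordinates and genome are made up, the input coordinates are assumed to be 1-based, and the windows are not clipped at chromosome ends.

```r
# Hypothetical gene coordinates (1-based, closed).
genes <- data.frame(
    chrom = c("chr1", "chr2", "chr3"),
    start = c(1500000L, 250000L, 9000000L),
    end   = c(1520000L, 300000L, 9100000L),
    name  = c("geneA", "geneB", "geneC"),
    stringsAsFactors = FALSE
)
flank <- 500000L  # 500kb on each side

# BED is 0-based and half-open: subtract 1 from the start, keep the end,
# and never let the start go below 0.
bed <- data.frame(
    chrom      = genes$chrom,
    chromStart = pmax(genes$start - 1L - flank, 0L),
    chromEnd   = genes$end + flank,
    name       = genes$name,
    stringsAsFactors = FALSE
)

# Integer columns avoid scientific notation in the written file.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
print(readLines(bed_file))

# With annotatr installed, the file could then be read along these lines
# (the genome name is an assumption):
# regions <- annotatr::read_regions(con = bed_file, genome = 'hg19', format = 'bed')
```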

Where do the annotations come from?

Hi, thanks for the great package. I'm curious where the annotations/info were derived from. I have been asked this question a few times already and it would be nice to know the source, thanks.
A.

Annotations for hg18?

Hi,

I see from builtin_genomes() that hg19 and hg38 are supported. Is there a way to retrieve annotations for other human genomes such as hg18, even though they are not "built in"? Specifically hg18_cpgs and hg18_genes_cds?

Best,
Keegan

columns in plot_numerical

Hi!
I am trying to build a plot_numerical() plot with two categorical variables to facet over, but I am getting 5 columns in the plot instead of 3. My code is:

recomPlot <- plot_numerical(
  annotated_regions = te_annotated,
  x = 'Comeron',
  facet = c("annot.type", 'SFS_cat'),
  facet_order = list(c("dm6_genes_intergenic", "dm6_genes_1to5kb", "dm6_genes_3UTRs", "dm6_genes_5UTRs", "dm6_genes_introns", "dm6_genes_promoters", "dm6_genes_exons"),c('Rare','Polymorphic','Fixed')),
  bin_width = 0.5)

So, I would like my plot to show three columns ('Rare', 'Polymorphic', 'Fixed') for each "annot.type", but instead I am getting 5 columns. Is there any way to change this?
Thanks!
Gabriel

No access to FANTOM

I started getting an error this morning, which indicates that the paths in the FANTOM database have changed.
Code:

library(annotatr)
annot <- builtin_annotations()[grep("mm10", builtin_annotations())]
annotations <- build_annotations(genome = 'mm10', annotations = annot)

Error:

Building enhancers...
snapshotDate(): 2022-10-31
loading from cache
require("rtracklayer")
Error in url(x, open = "rb") : 
  cannot open the connection to 'http://fantom.gsc.riken.jp/5/datafiles/phase2.0/extra/Enhancers/mouse_permissive_enhancers_phase_1_and_2.bed.gz'

 sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: AlmaLinux 8.5 (Arctic Sphynx)

Matrix products: default
BLAS:   /home/opt/R/4.2.2/lib64/R/lib/libRblas.so
LAPACK: /home/opt/R/4.2.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8          LC_NUMERIC=C                  LC_TIME=en_GB.UTF-8          
 [4] LC_COLLATE=en_GB.UTF-8        LC_MONETARY=en_GB.UTF-8       LC_MESSAGES=en_GB.UTF-8      
 [7] LC_PAPER=en_GB.UTF-8          LC_NAME=en_GB.UTF-8           LC_ADDRESS=en_GB.UTF-8       
[10] LC_TELEPHONE=en_GB.UTF-8      LC_MEASUREMENT=en_GB.UTF-8    LC_IDENTIFICATION=en_GB.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Mm.eg.db_3.16.0                                 TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0          
 [3] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2             rtracklayer_1.58.0                                 
 [5] introdataviz_0.0.0.9003                             annotatr_1.24.0                                    
 [7] forcats_0.5.2                                       stringr_1.5.0                                      
 [9] purrr_0.3.5                                         readr_2.1.3                                        
[11] tidyr_1.2.1                                         tibble_3.1.8                                       
[13] tidyverse_1.3.2                                     RnBeads.mm10_2.6.0                                 
[15] xlsx_0.6.5                                          karyoploteR_1.24.0                                 
[17] regioneR_1.30.0                                     RnBeads.hg19_1.30.0                                
[19] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0 IlluminaHumanMethylationEPICmanifest_0.3.0         
[21] data.table_1.14.6                                   wateRmelon_2.4.0                                   
[23] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1  ROC_1.74.0                                         
[25] lumi_2.50.0                                         dplyr_1.0.10                                       
[27] pheatmap_1.0.12                                     RColorBrewer_1.1-3                                 
[29] ggrepel_0.9.2                                       RnBeads_2.16.0                                     
[31] plyr_1.8.8                                          methylumi_2.44.0                                   
[33] FDb.InfiniumMethylation.hg19_2.2.0                  org.Hs.eg.db_3.16.0                                
[35] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2             GenomicFeatures_1.50.0                             
[37] AnnotationDbi_1.60.0                                reshape2_1.4.4                                     
[39] scales_1.2.1                                        illuminaio_0.40.0                                  
[41] limma_3.54.0                                        gridExtra_2.3                                      
[43] gplots_3.1.3                                        ggplot2_3.4.0                                      
[45] fields_14.1                                         viridis_0.6.2                                      
[47] viridisLite_0.4.1                                   spam_2.9-1                                         
[49] ff_4.0.7                                            bit_4.0.5                                          
[51] cluster_2.1.4                                       MASS_7.3-58.1                                      
[53] readxl_1.4.1                                        minfi_1.44.0                                       
[55] bumphunter_1.40.0                                   locfit_1.5-9.6                                     
[57] iterators_1.0.14                                    foreach_1.5.2                                      
[59] Biostrings_2.66.0                                   XVector_0.38.0                                     
[61] SummarizedExperiment_1.28.0                         Biobase_2.58.0                                     
[63] MatrixGenerics_1.10.0                               matrixStats_0.63.0                                 
[65] GenomicRanges_1.50.2                                GenomeInfoDb_1.34.6                                
[67] IRanges_2.32.0                                      S4Vectors_0.36.1                                   
[69] BiocGenerics_0.44.0                                

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                bamsignals_1.30.0             ragg_1.2.4                   
  [4] bezier_1.1.2                  bit64_4.0.5                   knitr_1.41                   
  [7] DelayedArray_0.24.0           rpart_4.1.19                  KEGGREST_1.38.0              
 [10] RCurl_1.98-1.9                GEOquery_2.66.0               AnnotationFilter_1.22.0      
 [13] generics_0.1.3                preprocessCore_1.60.1         RSQLite_2.2.19               
 [16] tzdb_0.3.0                    httpuv_1.6.6                  lubridate_1.9.0              
 [19] xml2_1.3.3                    assertthat_0.2.1              gargle_1.2.1                 
 [22] xfun_0.35                     hms_1.1.2                     rJava_1.0-6                  
 [25] promises_1.2.0.1              evaluate_0.18                 fansi_1.0.3                  
 [28] restfulr_0.0.15               scrime_1.3.5                  progress_1.2.2               
 [31] caTools_1.18.2                dbplyr_2.2.1                  DBI_1.1.3                    
 [34] htmlwidgets_1.5.4             reshape_0.8.9                 googledrive_2.0.0            
 [37] ellipsis_0.3.2                backports_1.4.1               annotate_1.76.0              
 [40] biomaRt_2.54.0                deldir_1.0-6                  sparseMatrixStats_1.10.0     
 [43] vctrs_0.5.0                   ensembldb_2.22.0              cachem_1.0.6                 
 [46] withr_2.5.0                   BSgenome_1.66.0               checkmate_2.1.0              
 [49] GenomicAlignments_1.34.0      prettyunits_1.1.1             mclust_6.0.0                 
 [52] dotCall64_1.0-2               lazyeval_0.2.2                crayon_1.5.2                 
 [55] genefilter_1.80.2             labeling_0.4.2                pkgconfig_2.0.3              
 [58] nlme_3.1-160                  ProtGenerics_1.30.0           nnet_7.3-18                  
 [61] rlang_1.0.6                   lifecycle_1.0.3               nleqslv_3.3.3                
 [64] filelock_1.0.2                affyio_1.68.0                 BiocFileCache_2.6.0          
 [67] modelr_0.1.10                 AnnotationHub_3.6.0           dichromat_2.0-0.1            
 [70] cellranger_1.1.0              rngtools_1.5.2                base64_2.0.1                 
 [73] Matrix_1.5-3                  Rhdf5lib_1.20.0               reprex_2.0.2                 
 [76] base64enc_0.1-3               googlesheets4_1.0.1           png_0.1-8                    
 [79] rjson_0.2.21                  bitops_1.0-7                  KernSmooth_2.23-20           
 [82] rhdf5filters_1.10.0           blob_1.2.3                    DelayedMatrixStats_1.20.0    
 [85] doRNG_1.8.2                   nor1mix_1.3-0                 jpeg_0.1-10                  
 [88] memoise_2.0.1                 magrittr_2.0.3                zlibbioc_1.44.0              
 [91] compiler_4.2.2                BiocIO_1.8.0                  Rsamtools_2.14.0             
 [94] cli_3.4.1                     affy_1.76.0                   htmlTable_2.4.1              
 [97] Formula_1.2-4                 mgcv_1.8-41                   tidyselect_1.2.0             
[100] stringi_1.7.8                 textshaping_0.3.6             yaml_2.3.6                   
[103] askpass_1.1                   latticeExtra_0.6-30           VariantAnnotation_1.44.0     
[106] timechange_0.1.1              tools_4.2.2                   rstudioapi_0.14              
[109] foreign_0.8-83                farver_2.1.1                  digest_0.6.30                
[112] BiocManager_1.30.19           shiny_1.7.3                   quadprog_1.5-8               
[115] Rcpp_1.0.9                    siggenes_1.72.0               broom_1.0.1                  
[118] later_1.3.0                   BiocVersion_3.16.0            httr_1.4.4                   
[121] biovizBase_1.46.0             colorspace_2.0-3              rvest_1.0.3                  
[124] XML_3.99-0.12                 fs_1.5.2                      splines_4.2.2                
[127] xlsxjars_0.6.1                multtest_2.54.0               systemfonts_1.0.4            
[130] xtable_1.8-4                  jsonlite_1.8.3                R6_2.5.1                     
[133] Hmisc_4.7-2                   mime_0.12                     pillar_1.8.1                 
[136] htmltools_0.5.3               glue_1.6.2                    fastmap_1.1.0                
[139] BiocParallel_1.32.5           interactiveDisplayBase_1.36.0 beanplot_1.3.1               
[142] codetools_0.2-18              maps_3.4.1                    utf8_1.2.2                   
[145] lattice_0.20-45               curl_4.3.3                    gtools_3.9.4                 
[148] openssl_2.0.4                 interp_1.1-3                  survival_3.4-0               
[151] rmarkdown_2.18                munsell_0.5.0                 rhdf5_2.42.0                 
[154] GenomeInfoDbData_1.2.9        HDF5Array_1.26.0              haven_2.5.1                  
[157] gtable_0.3.1

Request for an example of chromatin state analysis for hg38 using annotatr

Hi @rcavalcante

It would be really helpful if the annotatr manual included a code chunk demonstrating chromatin state analysis for hg38. I am finding it difficult to figure out where to get the relevant hg38 files for all the states (for hg19 it's easy).

annotatr::build_annotations(genome = 'hg38', annotations = 'hg38_Gm12878-chromatin')
Error in check_annotations(builtin_annotations) : 
  Error: "hg38_chromatin_Gm12878-ActivePromoter, hg38_chromatin_Gm12878-WeakPromoter, hg38_chromatin_Gm12878-PoisedPromoter, hg38_chromatin_Gm12878-StrongEnhancer, hg38_chromatin_Gm12878-WeakEnhancer, hg38_chromatin_Gm12878-Insulator, hg38_chromatin_Gm12878-TxnTransition, hg38_chromatin_Gm12878-TxnElongation, hg38_chromatin_Gm12878-WeakTxn, hg38_chromatin_Gm12878-Repressed, hg38_chromatin_Gm12878-Heterochrom/lo, hg38_chromatin_Gm12878-Repetitive/CNV" is(are) not supported. See builtin_annotations().
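
The error message points to builtin_annotations(), so one way to see which annotation codes the installed annotatr version supports for a given genome is to filter that function's output. The following is a minimal sketch, assuming annotatr is installed; find_annotation_codes is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of annotatr): keep the annotation codes
# that start with "<genome>_" and contain the given keyword.
find_annotation_codes <- function(codes, genome, keyword) {
  for_genome <- codes[startsWith(codes, paste0(genome, "_"))]
  grep(keyword, for_genome, value = TRUE, fixed = TRUE)
}

# Only query annotatr if it is available in this R installation.
if (requireNamespace("annotatr", quietly = TRUE)) {
  codes <- annotatr::builtin_annotations()
  # An empty result means no chromatin-state codes are built in for that genome.
  print(find_annotation_codes(codes, "hg38", "chromatin"))
  print(find_annotation_codes(codes, "hg19", "chromatin"))
}
```

If the hg38 query comes back empty, the chromatin states are not available as built-in annotations for that genome in the installed version, which would be consistent with the "not supported" error above.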