snikumbh / seqarchrplus
Downstream analysis plots for (promoter) sequence (architecture) clusters
Hi Sarvesh,
To compare maize with barley, we also wanted to run CAGE data from maize through seqArchR, but I have run into a problem.
> seqArchR::viz_seqs_acgt_mat(as.character(promoter_seqs_Maize),
+ pos_lab = positionsMaize, save_fname = NULL)
Warning message:
In .seqs_to_mat(seqs = seqs, pos_lab = pos_lab) :
NAs introduced by coercion
promoter_seqs_Maize is a DNAStringSet, created from this FASTA file:
Maize_promoters.zip
promoter_seqs_Maize <- readDNAStringSet("/home/pavlu/R/SeqArchR_project/befiles/Maize_promoters.fa", format="fasta",
nrec=-1L, skip=0L, seek.first.rec=FALSE, use.names=TRUE)
> positionsMaize
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
[69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
This does not compromise the seqArchR run overall, and we were able to use seqArchRplus...
seqArchRplus::seqs_acgt_image(sname = sn,
seqs = results[[sn]]$rawSeqs,
seqs_ord = unlist(use_clusts),
pos_lab = -50:49, dir_path = use_dir)
! Directory exists: /home/pavlu/R/SeqArchR_project/output/Maize_results
RStudioGD
2
Warning message:
In .seqs_to_mat(seqs = seqs, pos_lab = pos_lab) :
NAs introduced by coercion
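A possible cause worth ruling out for this warning (an assumption on my side, not confirmed against the seqArchR source) is that some sequences contain symbols other than upper-case A, C, G and T, for example N or lower-case soft-masked bases. A minimal base-R sketch of such a check is below; the toy vector `seqs` stands in for `as.character(promoter_seqs_Maize)` and its contents are made up for illustration.

```r
# Toy sequences standing in for as.character(promoter_seqs_Maize).
seqs <- c(s1 = "ACGTACGT", s2 = "ACNNACGT", s3 = "acgtACGT", s4 = "ACGTRCGT")

# Keep only the characters that are NOT upper-case A, C, G or T.
leftover <- gsub("[ACGT]", "", seqs)

# Number of unexpected characters per sequence.
non_acgt <- nchar(leftover)
names(non_acgt) <- names(seqs)

# Sequences that contain at least one unexpected character.
offending <- non_acgt[non_acgt > 0]
print(offending)

# Which unexpected symbols occur overall, and how often.
bad_chars <- table(unlist(strsplit(leftover, "")))
print(bad_chars)
```

If `offending` is empty for the real data, the warning has a different origin and this check can be discarded.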
> seqArchRplus::write_seqArchR_cluster_track_bed(sname = sn,
+ use_q_bound = F,
+ use_as_names = "names",
+ clusts = use_clusts,
+ info_df = info_df,
+ one_zip_all = TRUE,
+ org_name = fname_prefix,
+ dir_path = use_dir,
+ include_in_report = FALSE,
+ strand_sep = FALSE)
ℹ Preparing cluster-wise BED for Maize
! Directory exists: /home/pavlu/R/SeqArchR_project/output/Maize_results
! Directory exists: /home/pavlu/R/SeqArchR_project/output/Maize_results/Cluster_BED_tracks
ℹ Writing cluster BED track files at: /home/pavlu/R/SeqArchR_project/output/Maize_results/Cluster_BED_tracks
Error in ans[npos] <- rep(no, length.out = len)[npos] :
replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL
There may be a problem in the structure of our starting BED file: the results are not our own, and we are missing some of the CAGEr result columns that seqArchR would expect...
Here is the BED file we used:
Maize_promoters_better.zip
Here is the info_df, maybe that also helps:
info_df <- read.delim(
file = bed_info_fname,
sep = "\t", header = TRUE,
col.names = c(
"chr", "start", "end", "names",
"score", "strand", "IQW"
)
)
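The error text above ("replacement has length zero", together with the warning from `rep(no, length.out = len)`) matches what base R's `ifelse()` produces when its `no` argument is `NULL`. That can happen when a data frame column that does not exist is accessed with `$`. The sketch below reproduces this outside of seqArchRplus. The column name `dominant_tpm` and the `required` vector are hypothetical placeholders, not the actual columns the package looks for, and the two rows of data are made up.

```r
# Toy table with the same column names as info_df above (values are made up).
info_df <- data.frame(
  chr = c("chr1", "chr1"), start = c(100L, 500L), end = c(150L, 560L),
  names = c("p1", "p2"), score = c(1, 1), strand = c("+", "-"),
  IQW = c(50L, 60L)
)

# `$` on a column that is not present silently returns NULL.
missing_col <- info_df$dominant_tpm   # hypothetical column name
print(is.null(missing_col))

# ifelse() with a NULL `no` branch then fails in the same way as the log above.
res <- tryCatch(
  suppressWarnings(ifelse(info_df$strand == "+", "plus", missing_col)),
  error = function(e) e
)
print(inherits(res, "error"))

# A simple guard: compare expected column names with the ones present.
required <- c("chr", "start", "end", "strand", "dominant_tpm")  # hypothetical
missing <- setdiff(required, colnames(info_df))
print(missing)
```

Running a check like the last three lines against the real `info_df` before calling the BED-writing function would show whether a missing column is involved at all.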
Since we were not able to create BED files for the clusters, we were also unable to check the heatmaps of certain motifs...
Here is our result; everything except the parts I mentioned worked. (Sadly, we don't have TPM values as of now.)
Maize_combined.pdf
Here is session info for both, hope you can help:
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=cs_CZ.UTF-8 LC_NUMERIC=C LC_TIME=cs_CZ.UTF-8 LC_COLLATE=cs_CZ.UTF-8 LC_MONETARY=cs_CZ.UTF-8
[6] LC_MESSAGES=cs_CZ.UTF-8 LC_PAPER=cs_CZ.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.48.3 AnnotationDbi_1.58.0 ggpubr_0.4.0 egg_0.4.5
[5] gridExtra_2.3 ggplot2_3.3.6 ChIPseeker_1.32.0 seqArchRplus_0.99.0.4
[9] dplyr_1.0.9 Biostrings_2.64.0 XVector_0.36.0 seqArchR_1.0.0
[13] CAGEr_2.2.0 MultiAssayExperiment_1.22.0 SummarizedExperiment_1.26.1 Biobase_2.56.0
[17] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2 IRanges_2.30.0 S4Vectors_0.34.0
[21] BiocGenerics_0.42.0 MatrixGenerics_1.8.0 matrixStats_0.62.0
loaded via a namespace (and not attached):
[1] backports_1.4.1 shadowtext_0.1.2 fastmatch_1.1-3
[4] VGAM_1.1-6 BiocFileCache_2.4.0 plyr_1.8.7
[7] igraph_1.3.2 lazyeval_0.2.2 splines_4.2.1
[10] operator.tools_1.6.3 BiocParallel_1.30.3 digest_0.6.29
[13] yulab.utils_0.0.4 GOSemSim_2.22.0 viridis_0.6.2
[16] GO.db_3.15.0 fansi_1.0.3 magrittr_2.0.3
[19] memoise_2.0.1 BSgenome_1.64.0 cluster_2.1.3
[22] graphlayouts_0.8.0 formula.tools_1.7.1 enrichplot_1.16.1
[25] prettyunits_1.1.1 colorspace_2.0-3 blob_1.2.3
[28] rappdirs_0.3.3 ggrepel_0.9.1 crayon_1.5.1
[31] RCurl_1.98-1.7 jsonlite_1.8.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[34] scatterpie_0.1.7 ape_5.6-2 glue_1.6.2
[37] polyclip_1.10-0 gtable_0.3.0 zlibbioc_1.42.0
[40] DelayedArray_0.22.0 car_3.0-13 abind_1.4-5
[43] scales_1.2.0 DOSE_3.22.0 som_0.3-5.1
[46] DBI_1.1.2 rstatix_0.7.0 Rcpp_1.0.8.3
[49] plotrix_3.8-2 viridisLite_0.4.0 progress_1.2.2
[52] gridGraphics_0.5-1 tidytree_0.3.9 bit_4.0.4
[55] httr_1.4.3 fgsea_1.22.0 gplots_3.1.3
[58] RColorBrewer_1.1-3 ellipsis_0.3.2 factoextra_1.0.7
[61] pkgconfig_2.0.3 XML_3.99-0.10 farver_2.1.0
[64] ggseqlogo_0.1 dbplyr_2.2.0 utf8_1.2.2
[67] labeling_0.4.2 ggplotify_0.1.0 tidyselect_1.1.2
[70] rlang_1.0.2 reshape2_1.4.4 munsell_0.5.0
[73] tools_4.2.1 cachem_1.0.6 cli_3.3.0
[76] generics_0.1.2 RSQLite_2.2.14 broom_0.8.0
[79] stringr_1.4.0 fastmap_1.1.0 yaml_2.3.5
[82] ggtree_3.4.0 bit64_4.0.5 tidygraph_1.2.1
[85] caTools_1.18.2 purrr_0.3.4 dendextend_1.15.2
[88] KEGGREST_1.36.2 ggraph_2.0.5 sparseMatrixStats_1.8.0
[91] nlme_3.1-157 aplot_0.1.6 DO.db_2.9
[94] xml2_1.3.3 biomaRt_2.52.0 compiler_4.2.1
[97] rstudioapi_0.13 filelock_1.0.2 curl_4.3.2
[100] png_0.1-7 ggsignif_0.6.3 treeio_1.20.0
[103] tibble_3.1.7 tweenr_1.0.2 stringi_1.7.6
[106] forcats_0.5.1 lattice_0.20-45 Matrix_1.4-1
[109] permute_0.9-7 vegan_2.6-2 vctrs_0.4.1
[112] stringdist_0.9.8 pillar_1.7.0 lifecycle_1.0.1
[115] data.table_1.14.2 cowplot_1.1.1 bitops_1.0-7
[118] patchwork_1.1.1 rtracklayer_1.56.0 qvalue_2.28.0
[121] R6_2.5.1 BiocIO_1.6.0 KernSmooth_2.23-20
[124] codetools_0.2-18 boot_1.3-28 MASS_7.3-57
[127] gtools_3.9.2.2 assertthat_0.2.1 rjson_0.2.21
[130] withr_2.5.0 GenomicAlignments_1.32.0 Rsamtools_2.12.0
[133] GenomeInfoDbData_1.2.8 mgcv_1.8-40 parallel_4.2.1
[136] hms_1.1.1 grid_4.2.1 ggfun_0.0.6
[139] tidyr_1.2.0 DelayedMatrixStats_1.18.0 carData_3.0-5
[142] ggforce_0.3.3 restfulr_0.0.14
Hi Sarvesh,
here is my annotation call:
merged_sample_names <- c("24DAP", "8DAP", "4DAG")
txdb <- makeTxDbFromGFF("/home/pavlu/Dokumenty/references/Hv_Morex.pgsb.Jul2020.gff3", format = "gff3")
for(sn in merged_sample_names){
sample_name <- paste0("morex_", sn)
annotations_oneplot_pl[[sn]] <-
seqArchRplus::per_cluster_annotations(
sname = sn,
clusts = use_clusts,
tc_gr = CAGEr::tagClustersGR(cager_obj,
sample = sample_name),
cager_obj = NULL,
qLow = 0.1, qUp = 0.9,
txdb_obj = txdb,
tss_region = c(-500,100),
orgdb_obj = NULL, dir_path = use_dir,
one_plot = TRUE,
txt_size = use_txt_size)
}
"cager_obj" is the CAGEexp object from our CAGEr analysis.
"use_clusts" is created according to your vignette.
This works when called for the whole BED file, but when called for a BED file containing only a subset of the sequences it does not work as intended. (In the zip, the "unannot" file contains all of the sequences and the "candidates" file contains just the subset.)
BEDs.zip
Here is the merged panel for the full BED file:
4DAG_combined_plots.pdf
Here is the merged panel for the subset (the first few clusters should obviously be annotated as intronic; the IQW is the same as in the previous panel):
4DAG_combined_plots.pdf
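One thing that might explain the difference (an assumption, not something verified against per_cluster_annotations) is an index mismatch: if the clusters in use_clusts hold positions within the subset, while tc_gr still holds all tag clusters, the same integers would pick different records. The base-R sketch below illustrates the effect and a name-based remapping on toy data. The helper `to_full_index()` and all names in it are hypothetical, and plain character vectors stand in for the GRanges.

```r
# Toy stand-ins: names of all tag clusters and of the subset that was clustered.
full_names   <- c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6")
subset_names <- c("tc2", "tc5", "tc6")

# Hypothetical clusters given as integer positions within the SUBSET.
clusts_subset <- list(c(1L, 3L), 2L)

# Applying subset positions directly to the full set selects the wrong records.
wrong <- lapply(clusts_subset, function(i) full_names[i])

# Translate subset positions into full-set positions by matching on names.
to_full_index <- function(clusts, subset_names, full_names) {
  pos <- match(subset_names, full_names)
  if (anyNA(pos)) stop("some subset names are missing from the full set")
  lapply(clusts, function(i) pos[i])
}

clusts_full <- to_full_index(clusts_subset, subset_names, full_names)
right <- lapply(clusts_full, function(i) full_names[i])

print(wrong)
print(right)
```

An equivalent alternative would be to subset the tag clusters to the candidate regions first, so that positions in the clusters and in the tag-cluster object refer to the same records.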
Here is session info:
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=cs_CZ.UTF-8 LC_NUMERIC=C LC_TIME=cs_CZ.UTF-8 LC_COLLATE=cs_CZ.UTF-8 LC_MONETARY=cs_CZ.UTF-8
[6] LC_MESSAGES=cs_CZ.UTF-8 LC_PAPER=cs_CZ.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_1.1.1 GenomicFeatures_1.48.0 AnnotationDbi_1.58.0
[4] BSgenome.Hvulgare2.GACR.MorexV3_3.0.0 BSgenome_1.64.0 rtracklayer_1.56.0
[7] ggpubr_0.4.0 egg_0.4.5 gridExtra_2.3
[10] ggplot2_3.3.6 Biostrings_2.64.0 XVector_0.36.0
[13] ChIPseeker_1.32.0 seqArchRplus_0.99.0.3 seqArchR_1.0.0
[16] CAGEr_2.2.0 MultiAssayExperiment_1.22.0 SummarizedExperiment_1.26.1
[19] Biobase_2.56.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.1
[22] IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0
[25] MatrixGenerics_1.8.0 matrixStats_0.62.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2 htmlwidgets_1.5.4
[4] RSQLite_2.2.14 grid_4.2.0 BiocParallel_1.30.0
[7] scatterpie_0.1.7 munsell_0.5.0 withr_2.5.0
[10] colorspace_2.0-3 GOSemSim_2.22.0 filelock_1.0.2
[13] rstudioapi_0.13 ggsignif_0.6.3 DOSE_3.22.0
[16] labeling_0.4.2 GenomeInfoDbData_1.2.8 polyclip_1.10-0
[19] bit64_4.0.5 farver_2.1.0 vctrs_0.4.1
[22] treeio_1.20.0 generics_0.1.2 BiocFileCache_2.4.0
[25] ggseqlogo_0.1 R6_2.5.1 graphlayouts_0.8.0
[28] VGAM_1.1-6 locfit_1.5-9.5 heatmaps_1.20.0
[31] bitops_1.0-7 cachem_1.0.6 fgsea_1.22.0
[34] gridGraphics_0.5-1 DelayedArray_0.22.0 assertthat_0.2.1
[37] BiocIO_1.6.0 scales_1.2.0 ggraph_2.0.5
[40] enrichplot_1.16.0 gtable_0.3.0 formula.tools_1.7.1
[43] tidygraph_1.2.1 rlang_1.0.2 splines_4.2.0
[46] rstatix_0.7.0 lazyeval_0.2.2 broom_0.8.0
[49] yaml_2.3.5 reshape2_1.4.4 abind_1.4-5
[52] backports_1.4.1 qvalue_2.28.0 tools_4.2.0
[55] ggplotify_0.1.0 ellipsis_0.3.2 gplots_3.1.3
[58] RColorBrewer_1.1-3 Rcpp_1.0.8.3 plyr_1.8.7
[61] sparseMatrixStats_1.8.0 progress_1.2.2 zlibbioc_1.42.0
[64] purrr_0.3.4 RCurl_1.98-1.6 prettyunits_1.1.1
[67] viridis_0.6.2 ggrepel_0.9.1 cluster_2.1.3
[70] factoextra_1.0.7 magrittr_2.0.3 data.table_1.14.2
[73] DO.db_2.9 fftwtools_0.9-11 hms_1.1.1
[76] patchwork_1.1.1 XML_3.99-0.9 jpeg_0.1-9
[79] compiler_4.2.0 biomaRt_2.52.0 tibble_3.1.7
[82] KernSmooth_2.23-20 crayon_1.5.1 shadowtext_0.1.2
[85] htmltools_0.5.2 tiff_0.1-11 ggfun_0.0.6
[88] mgcv_1.8-40 tidyr_1.2.0 aplot_0.1.4
[91] DBI_1.1.2 tweenr_1.0.2 dbplyr_2.1.1
[94] MASS_7.3-57 rappdirs_0.3.3 boot_1.3-28
[97] som_0.3-5.1 Matrix_1.4-1 car_3.0-13
[100] permute_0.9-7 cli_3.3.0 parallel_4.2.0
[103] igraph_1.3.1 forcats_0.5.1 pkgconfig_2.0.3
[106] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicAlignments_1.32.0 xml2_1.3.3
[109] ggtree_3.4.0 stringdist_0.9.8 yulab.utils_0.0.4
[112] stringr_1.4.0 digest_0.6.29 vegan_2.6-2
[115] fastmatch_1.1-3 tidytree_0.3.9 operator.tools_1.6.3
[118] dendextend_1.15.2 DelayedMatrixStats_1.18.0 restfulr_0.0.13
[121] curl_4.3.2 EBImage_4.38.0 Rsamtools_2.12.0
[124] gtools_3.9.2 rjson_0.2.21 lifecycle_1.0.1
[127] nlme_3.1-157 jsonlite_1.8.0 carData_3.0-5
[130] viridisLite_0.4.0 fansi_1.0.3 pillar_1.7.0
[133] lattice_0.20-45 KEGGREST_1.36.0 fastmap_1.1.0
[136] httr_1.4.3 plotrix_3.8-2 GO.db_3.15.0
[139] glue_1.6.2 png_0.1-7 bit_4.0.4
[142] ggforce_0.3.3 stringi_1.7.6 blob_1.2.3
[145] caTools_1.18.2 memoise_2.0.1 dplyr_1.0.9
[148] ape_5.6-2