cistopic's People

Contributors

amathelier, cbravo93, ghuls, s-aibar, wkopp

cistopic's Issues

selectModel plot when returnType = 'selectedModel' (cisTopic 0.2.1)

Hi,

This may well be user error, but I'm struggling to locate the likelihoods generated by runModels when returnType = 'selectedModel'.

[email protected]
Object of class "data.frame"
data frame with 0 columns and 0 rows

I'd love to be able to create a plot similar to what is generated by default with selectModel, but this only seems to work when returnType has the default value, and it sure would be handy to have for a larger set of tested models.

Thanks!

The choice of the bed file

Hi!
The tutorial says, for initializing the cisTopic object: "Starting from the bam files and predefined regions [Reference running time: 0.4 sec/cell]"
pathToBams <- 'data/bamfiles/'
bamFiles <- paste(pathToBams, list.files(pathToBams), sep='')
regions <- 'data/regions.bed'
and your paper says: "a BED file with candidate regulatory regions (for example, from peak calling on the aggregate or the bulk profile)."

So, if my single-cell data is for the H3K36me3 mark, should I call peaks on bulk H3K36me3 WT data to build the region BED file, or on the aggregated single-cell data?

topicsRcisTarget lseek error

Hi,

One more question:

Also 5k PBMC tutorial:

cisTopicObject <- topicsRcisTarget(cisTopicObject, genome='hg19', pathToFeather, reduced_database=FALSE, nesThreshold=3, rocthr=0.005, maxRank=20000, nCores=24)

gives me an error:

Error in openFeather(path) : IO error: lseek failed

Help would be appreciated!

topicsRcisTarget error

Hi,

I ran into the following issue:

library(feather)
cisTopicObject_d0 <- topicsRcisTarget(cisTopicObject_d0, genome='mm9', pathToFeather, reduced_database=FALSE, nesThreshold=3, rocthr=0.005, maxRank=20000, nCores=24)
[1] "Exporting data to clusters..."
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
24 nodes produced errors; first error: package or namespace load failed for ‘RcisTarget’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘feather’

As you can see, the 'feather' package loads without issues. However, topicsRcisTarget produces this error. I vaguely recall a conversation with the authors of the 'scenic' package about feather v0.3.3 (which is what I have installed) not being compatible and having to roll it back to v0.3.1, as far as I remember. Is this the case with cisTopic too?

Thank you!

Joe

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRblas.so
LAPACK: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.40.6 R.utils_2.9.0 R.oo_1.22.0 R.methodsS3_1.7.1
[5] Seurat_3.0.2 ggplot2_3.2.0 RcisTarget_1.5.0 feather_0.3.3
[9] cisTopic_0.2.1 BiocParallel_1.14.2 doParallel_1.0.15 iterators_1.0.12
[13] foreach_1.4.7 densityClust_0.3 org.Mm.eg.db_3.7.0 TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
[17] GenomicFeatures_1.34.8 AnnotationDbi_1.44.0 Biobase_2.42.0 ChIPseeker_1.18.0
[21] rGREAT_1.14.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[25] S4Vectors_0.20.1 BiocGenerics_0.28.0 data.table_1.12.2 fastcluster_1.1.25
[29] ComplexHeatmap_1.20.0 Rtsne_0.15 umap_0.2.2.0 Rsubread_1.32.4
[33] httpuv_1.5.1

loaded via a namespace (and not attached):
[1] reticulate_1.10 tidyselect_0.2.5 htmlwidgets_1.3 RSQLite_2.1.1
[5] munsell_0.5.0 codetools_0.2-16 ica_1.0-2 DT_0.8
[9] future_1.14.0 withr_2.1.2 colorspace_1.4-1 GOSemSim_2.8.0
[13] rstudioapi_0.10 ROCR_1.0-7 DOSE_3.8.2 gbRd_0.4-11
[17] listenv_0.7.0 Rdpack_0.11-0 urltools_1.7.3 GenomeInfoDbData_1.2.0
[21] polyclip_1.10-0 bit64_0.9-7 farver_1.1.0 vctrs_0.2.0
[25] R6_2.4.0 rsvd_1.0.2 bitops_1.0-6 fgsea_1.8.0
[29] gridGraphics_0.4-1 DelayedArray_0.8.0 assertthat_0.2.1 promises_1.0.1
[33] SDMTools_1.1-221.1 scales_1.0.0 ggraph_1.0.2 enrichplot_1.2.0
[37] gtable_0.3.0 npsurv_0.4-0 globals_0.12.4 rlang_0.4.0
[41] zeallot_0.1.0 GlobalOptions_0.1.0 splines_3.5.1 lazyeval_0.2.2
[45] europepmc_0.3 yaml_2.2.0 reshape2_1.4.3 backports_1.1.4
[49] qvalue_2.14.1 tools_3.5.1 ggplotify_0.0.4 gridBase_0.4-7
[53] gplots_3.0.1.1 RColorBrewer_1.1-2 ggridges_0.5.1 Rcpp_1.0.1
[57] plyr_1.8.4 progress_1.2.2 zlibbioc_1.28.0 purrr_0.3.2
[61] RCurl_1.95-4.12 prettyunits_1.0.2 pbapply_1.4-1 GetoptLong_0.1.7
[65] viridis_0.5.1 cowplot_1.0.0 zoo_1.8-6 SummarizedExperiment_1.10.1
[69] ggrepel_0.8.1 cluster_2.1.0 magrittr_1.5 DO.db_2.9
[73] circlize_0.4.6 triebeard_0.3.0 lmtest_0.9-37 RANN_2.6
[77] fitdistrplus_1.0-14 matrixStats_0.54.0 hms_0.5.0 lsei_1.2-0
[81] mime_0.7 xtable_1.8-4 XML_3.98-1.20 AUCell_1.7.1
[85] gridExtra_2.3 shape_1.4.4 compiler_3.5.1 biomaRt_2.36.1
[89] tibble_2.1.3 KernSmooth_2.23-15 crayon_1.3.4 htmltools_0.3.6
[93] later_0.8.0 snow_0.4-3 tidyr_0.8.2 DBI_1.0.0
[97] tweenr_1.0.1 MASS_7.3-51.4 boot_1.3-23 Matrix_1.2-17
[101] gdata_2.18.0 metap_1.1 igraph_1.2.2 pkgconfig_2.0.2
[105] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 rvcheck_0.1.3 GenomicAlignments_1.18.1 plotly_4.9.0
[109] xml2_1.2.0 annotate_1.60.1 lda_1.4.2 XVector_0.22.0
[113] bibtex_0.4.2 stringr_1.4.0 digest_0.6.19 tsne_0.1-3
[117] sctransform_0.2.0 graph_1.60.0 Biostrings_2.48.0 fastmatch_1.1-0
[121] GSEABase_1.42.0 shiny_1.3.2 Rsamtools_1.32.3 gtools_3.8.1
[125] rjson_0.2.20 nlme_3.1-141 jsonlite_1.6 viridisLite_0.3.0
[129] pillar_1.4.2 lattice_0.20-38 httr_1.4.1 plotrix_3.7-6
[133] survival_2.44-1.1 GO.db_3.6.0 glue_1.3.1 FNN_1.1.2.1
[137] UpSetR_1.4.0 png_0.1-7 bit_1.1-14 ggforce_0.2.2
[141] stringi_1.4.3 blob_1.2.0 doSNOW_1.0.18 caTools_1.17.1.1
[145] memoise_1.1.0 dplyr_0.8.1 irlba_2.3.3 future.apply_1.3.0
[149] ape_5.2
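
The "there is no package called 'feather'" message above is raised on the worker processes, not in the main session. One thing worth checking (an assumption, not a confirmed diagnosis) is whether the workers see the same library paths as the session in which library(feather) succeeded. A minimal sketch using only the base parallel package; the cluster size of 2 is arbitrary:

```r
library(parallel)

# Start a small cluster of worker R processes.
cl <- makeCluster(2)

# Ask every worker for its library search paths and whether 'feather'
# can be found from there.
workerInfo <- clusterEvalQ(cl, list(
  libPaths   = .libPaths(),
  hasFeather = requireNamespace("feather", quietly = TRUE)
))

stopCluster(cl)

# Compare each worker against the main session.
mainLibPaths <- .libPaths()
samePaths  <- vapply(workerInfo, function(w) identical(w$libPaths, mainLibPaths), logical(1))
hasFeather <- vapply(workerInfo, function(w) w$hasFeather, logical(1))
print(samePaths)
print(hasFeather)
```

If the workers report different paths, setting the R_LIBS environment variable before starting R is one way to make child R processes pick up the same library.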

add release tag

I'm experiencing a lot of issues with installing the dependencies for this package from Bioconductor and CRAN, so I would like to make a conda package for it.

According to this GitHub issue, that can only be done if the repository has a release tag. Could you please add one?

error when specifying topics for ontologyDotPlot

Hi, after computing GREAT I tried using ontologyDotPlot, but I keep getting an error when I try to specify the topics to plot. Below I've attached screenshots of my cisTopic object (specifically the GREAT portion) and of the different attempts I've made to specify topics.
[Two screenshots attached to the original issue]

How to load multiple 10X runs to cisTopic?

Hi,

I was wondering if you have a proposed way to analyse multiple 10X runs at once. I'm not very familiar with the R data structure that cisTopic uses; however, it would be good to have something like an equivalent of the concatenate function from anndata.

Any suggestions? :)
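
As an illustration of what such a concatenation could look like at the count-matrix level, here is a minimal sketch with the Matrix package. The two toy matrices, the run labels and the helper padToRegions are hypothetical. Padding with zeros is only meaningful when both runs were counted against a common region set; otherwise the runs would need to be re-counted on a merged peak set first.

```r
library(Matrix)

# Toy region-by-cell count matrices standing in for two 10X runs.
run1 <- Matrix(c(1, 0, 2,
                 0, 3, 0), nrow = 3, sparse = TRUE,
               dimnames = list(c("chr1:100-200", "chr1:300-400", "chr2:50-150"),
                               c("AAAC-1", "AAAG-1")))
run2 <- Matrix(c(0, 1, 4,
                 5, 0, 0), nrow = 3, sparse = TRUE,
               dimnames = list(c("chr1:100-200", "chr2:50-150", "chr3:10-90"),
                               c("AAAC-1", "AAAT-1")))

# Pad a matrix with all-zero rows so that it covers every region in
# 'allRegions', and return its rows in that order.
padToRegions <- function(mat, allRegions) {
  missing <- setdiff(allRegions, rownames(mat))
  if (length(missing) > 0) {
    pad <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                        dims = c(length(missing), ncol(mat)),
                        dimnames = list(missing, colnames(mat)))
    mat <- rbind(mat, pad)
  }
  mat[allRegions, , drop = FALSE]
}

allRegions <- union(rownames(run1), rownames(run2))

# Prefix barcodes with a run label so identical barcodes stay distinct.
colnames(run1) <- paste0("run1_", colnames(run1))
colnames(run2) <- paste0("run2_", colnames(run2))

combined <- cbind(padToRegions(run1, allRegions),
                  padToRegions(run2, allRegions))
dim(combined)
```

The combined matrix could then be handed to whichever cisTopic constructor accepts a precomputed count matrix.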

Number of topics and peak using questions. Thank you.

Hi cisTopic team,

Thank you for developing the cisTopic software. We're trying it on our scATAC data and have found some promising results, but we have several concerns about them. It would be great if you could provide some suggestions.

  1. We tested different numbers of topics, but in our data the more topics we use, the more stable the model is (attached figure 1). Do you have any idea why this happens?
    We also noticed that some of the topics are similar to each other. Is there a good way to merge similar topics? Is it OK to average the z-scores or probabilities of these topics? Or would it be better to manually select a lower number of topics in the selectModel() step?

  2. Do you have a sense of how many peaks/regions contribute meaningfully to topics in general? When the algorithm builds the region scores, almost all peaks seem to be used, yet some peaks contribute very little. After running binarizecisTopics() to binarize the topics, only about 20% of the peaks pass the cutoff, are saved in the results [email protected], and are used for downstream functional and pathway analysis. The remaining 80% of the peaks do not contribute meaningfully to any topic (attached figure 2). In addition, some peaks contribute to more than 15 different topics. Is that normal? How should we interpret this result?
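
Both questions can be quantified before deciding how to proceed. The sketch below uses toy data in base R; the score matrix, the binarised topic lists and all values are hypothetical, and it only measures topic similarity and region usage without saying whether merging topics is appropriate.

```r
# Toy region-by-topic score matrix: 5 regions, 3 topics.
scores <- matrix(c(0.9, 0.8, 0.1, 0.0, 0.1,
                   0.8, 0.9, 0.2, 0.1, 0.0,
                   0.0, 0.1, 0.1, 0.9, 0.8),
                 nrow = 5,
                 dimnames = list(paste0("region", 1:5), paste0("Topic", 1:3)))

# Pairwise correlation between topics; values near 1 flag topics that
# may be redundant.
topicCor <- cor(scores)
round(topicCor, 2)

# Toy binarised topics: for each topic, the regions that passed the cutoff.
binarised <- list(Topic1 = c("region1", "region2"),
                  Topic2 = c("region1", "region2"),
                  Topic3 = c("region4", "region5"))

# Number of topics each region belongs to; regions in no topic get 0.
usage <- table(factor(unlist(binarised), levels = rownames(scores)))
usage

# Fraction of regions assigned to at least one topic.
fractionUsed <- mean(usage > 0)
fractionUsed
```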

Thank you so much!!

cisTopic_issue_figure1
cisTopic_issue_figure2

cellTopicHeatmap - annotations

Dear cisTopic team / aertslab,

First of all, thank you for the great package. It's been working really well for me.

I am using cisTopic in a jupyter notebook. In this context, I had to modify cellTopicHeatmap to get the annotations to display well:

I changed

annotation <- ComplexHeatmap::HeatmapAnnotation(df = object.cell.data[,colorBy,drop=FALSE], col = colVars, which='column', width = unit(5, "mm"))

to either

annotation <- ComplexHeatmap::HeatmapAnnotation(df = object.cell.data[,colorBy,drop=FALSE], col = colVars, which='column')

or

annotation <- ComplexHeatmap::HeatmapAnnotation(df = object.cell.data[,colorBy,drop=FALSE], col = colVars, which='column', height = unit(5, "mm"))
(I am guessing it should be height?! )

I ran the same code just from the console with png() and pdf() and had the same issue. I might be missing something, but I am guessing 'width' should either be 'height' or removed? I haven't tried running cisTopic in RStudio. It might not be a problem there since your vignette displays everything just fine? Could possibly also be due to a different ComplexHeatmap version (sessionInfo below).

Thanks,
Christoph

sessionInfo():


R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: <path to conda env>/libopenblasp-r0.3.7.so

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] scatterplot3d_0.3-41 plotly_4.9.1         ggplot2_3.2.1       
[4] ComplexHeatmap_2.2.0 fastcluster_1.1.25   cisTopic_0.3.0      

loaded via a namespace (and not attached):
  [1] reticulate_1.14             R.utils_2.9.2              
  [3] tidyselect_1.0.0            RSQLite_2.2.0              
  [5] AnnotationDbi_1.48.0        htmlwidgets_1.5.1          
  [7] BiocParallel_1.20.1         Rtsne_0.15                 
  [9] munsell_0.5.0               codetools_0.2-16           
 [11] ica_1.0-2                   pbdZMQ_0.3-3               
 [13] future_1.16.0               withr_2.1.2                
 [15] RcisTarget_1.4.0            colorspace_1.4-1           
 [17] Biobase_2.46.0              uuid_0.1-2                 
 [19] Seurat_3.1.2                stats4_3.6.1               
 [21] ROCR_1.0-7                  gbRd_0.4-11                
 [23] listenv_0.8.0               Rdpack_0.11-0              
 [25] repr_1.1.0                  GenomeInfoDbData_1.2.2     
 [27] lgr_0.3.3                   farver_2.0.3               
 [29] bit64_0.9-7                 vctrs_0.2.1                
 [31] float_0.2-3                 BiocFileCache_1.10.2       
 [33] R6_2.4.1                    GenomeInfoDb_1.22.0        
 [35] clue_0.3-57                 rsvd_1.0.2                 
 [37] AnnotationFilter_1.10.0     bitops_1.0-6               
 [39] DelayedArray_0.12.2         assertthat_0.2.1           
 [41] promises_1.1.0              SDMTools_1.1-221.2         
 [43] scales_1.1.0                gtable_0.3.0               
 [45] npsurv_0.4-0                Cairo_1.5-10               
 [47] globals_0.12.5              seqLogo_1.52.0             
 [49] rlang_0.4.4                 zeallot_0.1.0              
 [51] GlobalOptions_0.1.1         text2vec_0.6               
 [53] splines_3.6.1               rtracklayer_1.46.0         
 [55] lazyeval_0.2.2              reshape2_1.4.3             
 [57] GenomicFeatures_1.38.2      backports_1.1.5            
 [59] httpuv_1.5.2                tools_3.6.1                
 [61] feather_0.3.5               gplots_3.0.1.2             
 [63] RColorBrewer_1.1-2          BiocGenerics_0.32.0        
 [65] ggridges_0.5.2              Rcpp_1.0.3                 
 [67] plyr_1.8.5                  base64enc_0.1-3            
 [69] progress_1.2.2              zlibbioc_1.32.0            
 [71] purrr_0.3.3                 RCurl_1.98-1.1             
 [73] prettyunits_1.1.1           openssl_1.4.1              
 [75] GetoptLong_0.1.8            pbapply_1.4-2              
 [77] cowplot_1.0.0               S4Vectors_0.24.3           
 [79] zoo_1.8-7                   SummarizedExperiment_1.16.1
 [81] ggrepel_0.8.1               cluster_2.1.0              
 [83] magrittr_1.5                data.table_1.12.8          
 [85] circlize_0.4.6              lmtest_0.9-37              
 [87] RANN_2.6.1                  mlapi_0.1.0                
 [89] fitdistrplus_1.0-14         matrixStats_0.55.0         
 [91] hms_0.5.3                   lsei_1.2-0                 
 [93] mime_0.9                    evaluate_0.14              
 [95] xtable_1.8-4                RhpcBLASctl_0.20-17        
 [97] XML_3.99-0.3                AUCell_1.6.1               
 [99] shape_1.4.4                 IRanges_2.20.2             
[101] gridExtra_2.3               compiler_3.6.1             
[103] biomaRt_2.42.0              tibble_2.1.3               
[105] KernSmooth_2.23-16          crayon_1.3.4               
[107] R.oo_1.23.0                 htmltools_0.4.0            
[109] later_1.0.0                 snow_0.4-3                 
[111] tidyr_1.0.0                 RcppParallel_4.4.4         
[113] DBI_1.1.0                   dbplyr_1.4.2               
[115] MASS_7.3-51.5               rappdirs_0.3.1             
[117] Matrix_1.2-18               R.methodsS3_1.8.0          
[119] gdata_2.18.0                parallel_3.6.1             
[121] metap_1.1                   igraph_1.2.4.2             
[123] GenomicRanges_1.38.0        pkgconfig_2.0.3            
[125] getPass_0.2-2               GenomicAlignments_1.22.1   
[127] rsparse_0.3.3.4             IRdisplay_0.7.0            
[129] foreach_1.4.8               annotate_1.64.0            
[131] lda_1.4.2                   XVector_0.26.0             
[133] bibtex_0.4.2                stringr_1.4.0              
[135] digest_0.6.24               sctransform_0.2.0          
[137] RcppAnnoy_0.0.14            tsne_0.1-3                 
[139] graph_1.64.0                Biostrings_2.54.0          
[141] leiden_0.3.3                uwot_0.1.5                 
[143] GSEABase_1.46.0             curl_4.3                   
[145] shiny_1.4.0                 Rsamtools_2.2.1            
[147] gtools_3.8.1                rjson_0.2.20               
[149] lifecycle_0.1.0             nlme_3.1-143               
[151] jsonlite_1.6.1              viridisLite_0.3.0          
[153] askpass_1.1                 BSgenome_1.54.0            
[155] pillar_1.4.3                lattice_0.20-38            
[157] fastmap_1.0.1               httr_1.4.1                 
[159] survival_3.1-8              glue_1.3.1                 
[161] png_0.1-7                   iterators_1.0.12           
[163] bit_1.1-15.1                stringi_1.4.5              
[165] blob_1.2.1                  doSNOW_1.0.18              
[167] caTools_1.18.0              memoise_1.1.0              
[169] IRkernel_1.1                dplyr_0.8.3                
[171] irlba_2.3.3                 future.apply_1.4.0         
[173] ape_5.3 

scATAC-seq time course experiment

Hi,

I have an ATAC-seq experiment with four time points, where the samples at each time point come from different subjects. All the subjects were treated the same at baseline, and some of the subjects were used for sample collection at three different time points, so I have a total of four time points including the baseline samples. I analyzed the four time points separately and created separate cisTopic objects.

My question is: can the four predMatSumByGene matrices generated in this process per the PBMC tutorials be 'combined' to get an estimation on gene accessibility changes over time? Or would you recommend a different method?

Thank you!

GREAT database connection issues

Sorry, one more issue I keep running into:

> TObject <- GREAT(TObject, genome='hg19', fold_enrichment=2, geneHits=1, sign=0.05, request_interval=10)
Error in download.file(url, destfile = file, quiet = TRUE) :
cannot open URL 'http://great.stanford.edu/public/cgi-bin/readJsFromFile.php?path=/scratch/great/tmp/results/20190604-public-3.0.0-hUvmbp.d/EnsemblGenes.js'
In addition: Warning message:
In download.file(url, destfile = file, quiet = TRUE) :
cannot open URL 'http://great.stanford.edu/public/cgi-bin/readJsFromFile.php?path=/scratch/great/tmp/results/20190604-public-3.0.0-hUvmbp.d/EnsemblGenes.js': HTTP status was '403 Forbidden'
failed to download, try after 30s
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRblas.so
LAPACK: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] RcisTarget_1.5.0 feather_0.3.3 cisTopic_0.2.1
[4] BiocParallel_1.14.2 doParallel_1.0.14 iterators_1.0.10
[7] foreach_1.4.4 densityClust_0.3 org.Hs.eg.db_3.6.0
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.34.8 AnnotationDbi_1.44.0
[13] Biobase_2.42.0 ChIPseeker_1.18.0 rGREAT_1.14.0
[16] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[19] S4Vectors_0.20.1 BiocGenerics_0.28.0 data.table_1.12.2
[22] fastcluster_1.1.25 ComplexHeatmap_1.20.0 Rtsne_0.15
[25] umap_0.2.2.0 Rsubread_1.32.4

loaded via a namespace (and not attached):
[1] snow_0.4-3 circlize_0.4.6 fastmatch_1.1-0 plyr_1.8.4 igraph_1.2.2
[6] lazyeval_0.2.2 GSEABase_1.42.0 splines_3.5.1 ggplot2_3.1.1 gridBase_0.4-7
[11] urltools_1.7.3 digest_0.6.18 htmltools_0.3.6 GOSemSim_2.8.0 viridis_0.5.1
[16] GO.db_3.6.0 gdata_2.18.0 lda_1.4.2 magrittr_1.5 memoise_1.1.0
[21] Biostrings_2.48.0 annotate_1.60.1 matrixStats_0.54.0 R.utils_2.8.0 enrichplot_1.2.0
[26] prettyunits_1.0.2 colorspace_1.4-1 blob_1.1.1 ggrepel_0.8.0 dplyr_0.7.8
[31] crayon_1.3.4 RCurl_1.95-4.12 jsonlite_1.6 graph_1.60.0 bindr_0.1.1
[36] survival_2.44-1.1 glue_1.3.1 polyclip_1.10-0 gtable_0.3.0 zlibbioc_1.28.0
[41] XVector_0.22.0 UpSetR_1.4.0 GetoptLong_0.1.7 DelayedArray_0.8.0 shape_1.4.4
[46] scales_1.0.0 DOSE_3.8.2 DBI_1.0.0 Rcpp_1.0.0 plotrix_3.7-5
[51] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2 gridGraphics_0.4-1 reticulate_1.10
[56] bit_1.1-14 europepmc_0.3 httr_1.4.0 fgsea_1.8.0 FNN_1.1.2.1
[61] gplots_3.0.1.1 RColorBrewer_1.1-2 R.methodsS3_1.7.1 pkgconfig_2.0.2 XML_3.98-1.19
[66] farver_1.1.0 later_0.7.5 ggplotify_0.0.3 tidyselect_0.2.5 rlang_0.3.4
[71] reshape2_1.4.3 munsell_0.5.0 tools_3.5.1 RSQLite_2.1.1 doMC_1.3.5
[76] ggridges_0.5.1 stringr_1.4.0 yaml_2.2.0 npsurv_0.4-0 bit64_0.9-7
[81] fitdistrplus_1.0-14 caTools_1.17.1.1 purrr_0.3.2 ggraph_1.0.2 bindrcpp_0.2.2
[86] mime_0.6 R.oo_1.22.0 DO.db_2.9 xml2_1.2.0 biomaRt_2.36.1
[91] compiler_3.5.1 rstudioapi_0.10 lsei_1.2-0 tibble_2.1.2 tweenr_1.0.1
[96] stringi_1.2.4 lattice_0.20-38 Matrix_1.2-17 pillar_1.4.1 triebeard_0.3.0
[101] GlobalOptions_0.1.0 cowplot_0.9.4 bitops_1.0-6 httpuv_1.4.5 AUCell_1.7.1
[106] rtracklayer_1.40.6 qvalue_2.14.1 R6_2.4.0 promises_1.0.1 KernSmooth_2.23-15
[111] gridExtra_2.3 codetools_0.2-16 boot_1.3-22 MASS_7.3-51.4 gtools_3.8.1
[116] assertthat_0.2.1 SummarizedExperiment_1.10.1 rjson_0.2.20 GenomicAlignments_1.18.1 Rsamtools_1.32.3
[121] GenomeInfoDbData_1.2.0 doSNOW_1.0.16 hms_0.4.2 rvcheck_0.1.3 ggforce_0.2.2
[126] shiny_1.2.0

Plotting cluster info signaturesHeatmap

Hello @cbravo93 !

I have gained new insights into my dataset using cisTopic! Great package. This time I would like to know whether it is possible to use signaturesHeatmap to display the cluster info (e.g. 'densityClust'), similar to cellTopicHeatmap.

Thanks in advance!

counts_Lake.Rds

Hello, is it possible to share counts_Lake.Rds file?
Thanks!

densityClust error

Hi,

Following the PBMC data analysis vignette, I ran into the error below:

dclust_d3 <- densityClust(DRdist_d3,gaussian=T)
Distance cutoff calculated to 2.378223
dclust_d3 <- findClusters(dclust_d3, rho = 50, delta = 2.5)
Error in cluster[i] <- cluster[higherDensity[which.min(findDistValueByRowColInd(x$distance, :
replacement has length zero

Any ideas why this may be happening? If I look at the dclust_d3 object, it has 'NA' in the 'clusters' slot. I assume this is why the error is produced; however, I am not sure why there is 'NA' in the 'clusters' slot in the first place.

Thanks,

Joe

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRblas.so
LAPACK: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] RcisTarget_1.5.0 feather_0.3.3
[3] cisTopic_0.2.1 BiocParallel_1.14.2
[5] doParallel_1.0.14 iterators_1.0.10
[7] foreach_1.4.4 densityClust_0.3
[9] org.Mm.eg.db_3.7.0 TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
[11] GenomicFeatures_1.34.8 AnnotationDbi_1.44.0
[13] Biobase_2.42.0 ChIPseeker_1.18.0
[15] rGREAT_1.14.0 GenomicRanges_1.34.0
[17] GenomeInfoDb_1.18.2 IRanges_2.16.0
[19] S4Vectors_0.20.1 BiocGenerics_0.28.0
[21] data.table_1.12.2 fastcluster_1.1.25
[23] ComplexHeatmap_1.20.0 Rtsne_0.15
[25] umap_0.2.2.0 Rsubread_1.32.4
[27] httpuv_1.5.1

loaded via a namespace (and not attached):
[1] snow_0.4-3 backports_1.1.4
[3] circlize_0.4.6 fastmatch_1.1-0
[5] plyr_1.8.4 igraph_1.2.2
[7] lazyeval_0.2.2 GSEABase_1.42.0
[9] splines_3.5.1 ggplot2_3.2.0
[11] gridBase_0.4-7 urltools_1.7.3
[13] digest_0.6.19 htmltools_0.3.6
[15] GOSemSim_2.8.0 viridis_0.5.1
[17] GO.db_3.6.0 gdata_2.18.0
[19] lda_1.4.2 magrittr_1.5
[21] memoise_1.1.0 Biostrings_2.48.0
[23] annotate_1.60.1 matrixStats_0.54.0
[25] R.utils_2.9.0 enrichplot_1.2.0
[27] prettyunits_1.0.2 colorspace_1.4-1
[29] blob_1.2.0 ggrepel_0.8.1
[31] dplyr_0.8.1 crayon_1.3.4
[33] RCurl_1.95-4.12 jsonlite_1.6
[35] graph_1.60.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[37] zeallot_0.1.0 survival_2.44-1.1
[39] glue_1.3.1 polyclip_1.10-0
[41] gtable_0.3.0 zlibbioc_1.28.0
[43] XVector_0.22.0 UpSetR_1.4.0
[45] GetoptLong_0.1.7 DelayedArray_0.8.0
[47] shape_1.4.4 scales_1.0.0
[49] DOSE_3.8.2 DBI_1.0.0
[51] Rcpp_1.0.1 plotrix_3.7-6
[53] xtable_1.8-4 viridisLite_0.3.0
[55] progress_1.2.2 gridGraphics_0.4-1
[57] reticulate_1.10 bit_1.1-14
[59] europepmc_0.3 DT_0.7
[61] htmlwidgets_1.3 httr_1.4.0
[63] fgsea_1.8.0 FNN_1.1.2.1
[65] gplots_3.0.1.1 RColorBrewer_1.1-2
[67] R.methodsS3_1.7.1 pkgconfig_2.0.2
[69] XML_3.98-1.20 farver_1.1.0
[71] ggplotify_0.0.3 tidyselect_0.2.5
[73] rlang_0.4.0 reshape2_1.4.3
[75] later_0.8.0 munsell_0.5.0
[77] tools_3.5.1 RSQLite_2.1.1
[79] ggridges_0.5.1 stringr_1.4.0
[81] yaml_2.2.0 npsurv_0.4-0
[83] bit64_0.9-7 fitdistrplus_1.0-14
[85] caTools_1.17.1.1 purrr_0.3.2
[87] ggraph_1.0.2 mime_0.7
[89] R.oo_1.22.0 DO.db_2.9
[91] xml2_1.2.0 biomaRt_2.36.1
[93] compiler_3.5.1 rstudioapi_0.10
[95] lsei_1.2-0 tibble_2.1.3
[97] tweenr_1.0.1 stringi_1.4.3
[99] lattice_0.20-38 Matrix_1.2-17
[101] vctrs_0.2.0 pillar_1.4.2
[103] triebeard_0.3.0 GlobalOptions_0.1.0
[105] cowplot_1.0.0 bitops_1.0-6
[107] AUCell_1.7.1 rtracklayer_1.40.6
[109] qvalue_2.14.1 R6_2.4.0
[111] promises_1.0.1 KernSmooth_2.23-15
[113] gridExtra_2.3 codetools_0.2-16
[115] boot_1.3-23 MASS_7.3-51.4
[117] gtools_3.8.1 assertthat_0.2.1
[119] SummarizedExperiment_1.10.1 rjson_0.2.20
[121] GenomicAlignments_1.18.1 Rsamtools_1.32.3
[123] GenomeInfoDbData_1.2.0 doSNOW_1.0.16
[125] hms_0.5.0 rvcheck_0.1.3
[127] ggforce_0.2.2 shiny_1.3.2
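
For the findClusters error above, one possible cause (an assumption, not verified against the densityClust source) is that no point exceeds both the rho and the delta threshold, so that no cluster centre exists for the remaining points to be assigned to. A minimal sketch of that check in base R; the vectors are toy values, the helper countCentres is hypothetical, and the strict ">" comparison is also an assumption:

```r
# Toy local densities (rho) and distances to the nearest denser point
# (delta). In a real analysis these would be taken from the densityClust
# result, assumed here to be available as plain numeric vectors.
rho   <- c(12.4, 30.1, 8.7, 45.9, 22.3)
delta <- c(0.4, 3.1, 0.2, 5.6, 0.9)

# Count the points that would qualify as cluster centres for the
# given thresholds.
countCentres <- function(rho, delta, rhoCut, deltaCut) {
  sum(rho > rhoCut & delta > deltaCut)
}

# With rho = 50 no toy point qualifies, because every rho is below 50.
countCentres(rho, delta, rhoCut = 50, deltaCut = 2.5)

# With a lower rho cutoff, two toy points qualify.
countCentres(rho, delta, rhoCut = 20, deltaCut = 2.5)

# The observed range of rho helps to pick a cutoff that fits the data.
summary(rho)
```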

ChIP-seq files for topic annotation & UCSC-style bams not working

Hi @cbravo93,

  1. Not so much about the library as about the analysis itself: to give the topics meaning, the tutorial uses 3 ChIP-seq peak files from (I assume) separate bulk ChIP-seq experiments. Are there any publicly available ChIP-seq files for a range of transcription factors that you would recommend?

  2. On reaching the later stages of the tutorial mentioned above, I realised my bam files were not UCSC-style (chromosomes were named 1,2,3,... instead of chr1,chr2,chr3,...). I fixed that and now have new bam files, but I can't read them to create a cisTopicObject.
    Some relevant info: the same code used to work with the non-UCSC bams, and
    I also made the aggregate pseudo-bulk UCSC-style for this.

Code:
pathToBams <- '/blabla/picard_bam_files_UCSC_style_test/'
bamFiles <- paste(pathToBams, list.files(pathToBams), sep='')
regions <- '/blabla/UCSC_style_peak_aggregated_scATAC_individual.narrowPeak'
cisTopicObject <- createcisTopicObjectFromBAM(bamFiles, regions, project.name='bla')

ERROR: invalid parameter: '/blabla/picard_bam_files_UCSC_style_test/sample_1.bam'
Error in Rsubread::featureCounts(bamfiles, annot.ext = regions_frame, :
No counts were generated.

(while trying to fix this I also ran this (Vignette packages):

source("https://bioconductor.org/biocLite.R")
biocLite(c('Rsubread', 'umap', 'Rtsne', 'ComplexHeatmap', 'fastcluster', 'data.table', 'rGREAT', 'ChIPseeker', 'TxDb.Hsapiens.UCSC.hg19.knownGene', 'org.Hs.eg.db'))

but that didn't help)

I'd be happy if you could help.

Thanks

Grouping regions into topics

Hi @cbravo93 !

Had a conceptual question: having read the paper, I still struggle to understand how the algorithm groups regions into topics. Based on what? Is it based on co-occurrence? Similar patterns across other regions? Basically, how can this relationship be interpreted?

Thank you.

getBigwigFiles functions error

Hi!
I tried to generate BigWig files after getting the region scores.

library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb<-TxDb.Mmusculus.UCSC.mm9.knownGene
getBigwigFiles(cisTopicObject, path='output/cisTopics_asBW', seqlengths=seqlengths(txdb))

However, I get the error message below:
Error in FUN(extractROWS(unlisted_X, IRanges(X_elt_start[i], X_elt_end[i])), :
BigWig ranges cannot overlap
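
The message suggests that some of the ranges being exported overlap. One way to look for overlapping input regions (an assumed cause, not a confirmed diagnosis) is sketched below in base R on a toy table. The column names and coordinates are hypothetical, and intervals are treated as closed, so for half-open BED coordinates the comparison would be "<" instead of "<=".

```r
# Toy regions; the second and third overlap on chr1.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100, 300, 350, 50),
  end   = c(200, 400, 450, 150),
  stringsAsFactors = FALSE
)

# Sort by chromosome and start position.
regions <- regions[order(regions$chr, regions$start), ]

# For every region, the furthest end seen so far among the earlier
# regions on the same chromosome (-Inf for the first region).
prevMaxEnd <- ave(regions$end, regions$chr,
                  FUN = function(e) c(-Inf, head(cummax(e), -1)))

# A region overlaps an earlier one if it starts at or before that end.
regions$overlapsPrevious <- regions$start <= prevMaxEnd
regions
```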

Looking forward to your suggestions!

getCistromeEnrichment parallel computing issues

Hi,

I am getting an error in the 5k PBMC tutorial with the line below:

cisTopicObject <- getCistromeEnrichment(cisTopicObject, topic=1, TFname='SPI1', aucellRankings = aucellRankings,
                                        aucMaxRank = 0.05*nrow(aucellRankings), plot=FALSE)

Using 0 cores.
Error in mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed, :
'mc.cores' must be >= 1

I tried a couple of possible solutions to this but could not get it to work.

Suggestions would be appreciated!
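
The "Using 0 cores" line indicates that a worker count of 0 reached mclapply. Whether getCistromeEnrichment exposes a core-count argument is not confirmed here, so the sketch below only shows a generic guard in base R; the helper safeCores is hypothetical:

```r
library(parallel)

# Return a usable worker count: at least 1, even when core detection
# fails or the requested value is 0, NA or NULL.
safeCores <- function(requested = NULL) {
  n <- if (is.null(requested)) detectCores() - 1L else requested
  if (length(n) != 1L || is.na(n) || n < 1) 1L else as.integer(n)
}

safeCores(0)   # 1
safeCores(NA)  # 1
safeCores(4)   # 4

# The guarded value is always valid for mclapply's mc.cores argument.
res <- mclapply(1:3, function(i) i^2, mc.cores = safeCores(0))
unlist(res)    # 1 4 9
```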

Thank you!

Motif Enrichment for mm9

Hi,

I noticed that the cisTarget database only has ranking by regions for hg19. If my accessibility data is mapped to mm9, is the best option to liftover to hg19 to do motif enrichment analysis?

Thank you!
E

Region score question

Hello,

First of all, thanks for publishing this work. It's really useful code and very helpful.

I was wondering about this line:

modelMat <- apply(normalizedTopics, 2, function(x) x * (log(x + 1e-05) - sum(log(x + 1e-05))/length(x)))

This looks a little different from what's in the Methods section of the paper, which has the sum of the logarithms in the denominator.
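
To make the quoted line concrete, the sketch below applies it to a toy matrix and rewrites it in an equivalent form. The values are hypothetical, and the layout (topics in rows, regions in columns, so that each column x holds one region's values across topics) is an assumption. Because sum(log(x))/length(x) is the mean of the logs, the score equals x times the log of x divided by the geometric mean of that column.

```r
# Toy topic-by-region probabilities; each row (topic) sums to 1.
normalizedTopics <- matrix(c(0.70, 0.20, 0.10,
                             0.05, 0.15, 0.80),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(c("Topic1", "Topic2"),
                                           paste0("region", 1:3)))

eps <- 1e-05

# The quoted line: for each column, x * (log(x) - mean of log(x)).
modelMat <- apply(normalizedTopics, 2,
                  function(x) x * (log(x + eps) - sum(log(x + eps)) / length(x)))

# Equivalent form: x * log((x + eps) / geometric mean of (x + eps)).
geoMean <- apply(normalizedTopics + eps, 2, function(x) exp(mean(log(x))))
altMat  <- normalizedTopics * log(sweep(normalizedTopics + eps, 2, geoMean, "/"))

all.equal(modelMat, altMat, check.attributes = FALSE)
round(modelMat, 3)
```

Read this way, a value is large when a region's probability in one topic is high both in absolute terms and relative to the same region's probability in the other topics.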

I also wondered if there was an explanation available for how to interpret region scores, perhaps a reference where this score is introduced? I haven't seen it before.

Thanks again.

Undefined functions

Hi,

it seems that several functions are currently not defined in the package, including runModels and runWrapLDAModels.

Best,
Wolfgang

Trouble with installation

[in]:

devtools::install_github("aertslab/cisTopic")

[out]:

Downloading GitHub repo aertslab/cisTopic@master


Skipping 15 packages ahead of CRAN: GenomicRanges, S4Vectors, SummarizedExperiment, BiocGenerics, IRanges, XVector, GenomeInfoDb, zlibbioc, GenomicAlignments, Biobase, annotate, AnnotationDbi, DelayedArray, BiocParallel, GenomeInfoDbData

✔  checking for file ‘/tmp/RtmpktT8Ub/remotes2f89111f2a0/aertslab-cisTopic-8fd1432/DESCRIPTION’
─  preparing ‘cisTopic’:
✔  checking DESCRIPTION meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘cisTopic_0.3.0.tar.gz’ (7.7s)
   

Installing package into ‘/home/mvinyard/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)

Error: Failed to install 'cisTopic' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpktT8Ub/file2f892524899b/cisTopic_0.3.0.tar.gz’ had non-zero exit status
Traceback:

1. devtools::install_github("aertslab/cisTopic")
2. pkgbuild::with_build_tools({
 .     ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action", 
 .         rlang::warn))
 .     {
 .         remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir, 
 .             auth_token = auth_token, host = host)
 .         install_remotes(remotes, auth_token = auth_token, host = host, 
 .             dependencies = dependencies, upgrade = upgrade, force = force, 
 .             quiet = quiet, build = build, build_opts = build_opts, 
 .             build_manual = build_manual, build_vignettes = build_vignettes, 
 .             repos = repos, type = type, ...)
 .     }
 . }, required = FALSE)
3. install_remotes(remotes, auth_token = auth_token, host = host, 
 .     dependencies = dependencies, upgrade = upgrade, force = force, 
 .     quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual, 
 .     build_vignettes = build_vignettes, repos = repos, type = type, 
 .     ...)
4. tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
 .     stop(remote_install_error(remotes[[i]], e))
 . })
5. tryCatchList(expr, classes, parentenv, handlers)
6. tryCatchOne(expr, names, parentenv, handlers[[1L]])
7. value[[3L]](cond)

Export the topic-signature matrix

Hi @cbravo93 ,

Is it possible to export the topic-signature matrix similar to how modelMatSelection exports either topic-cell or topic-region matrices?

Thank you.

Sincerely,
Anna Arutyunyan

how does the cisTopic LDA algorithm treat low-depth cells ?

The LDA algorithm is a creative and effective way to handle the high dimensionality of scATAC data. Unlike scRNA-seq, where the matrix is usually genes x cells, ATAC data contain hundreds of times more peaks than there are genes. However, in my own analysis I obtained separate clusters of cells (and, downstream, different developmental trajectories) that often also differed in sequencing depth. I wonder: 1) Does the cisTopic LDA algorithm treat low-depth cells properly? Is it possible that these clusters are separated only because of sequencing depth? 2) Should I filter out these low-depth cells (which, at first glance, are in fact not that low in depth)?

Any help or suggestions would be appreciated. I want to avoid false-positive results without missing true findings.

P.S. The results from Signac (the Seurat extension package), episcanpy and scanpy, and Cicero look similar.
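
One way to probe the depth question above is to correlate each topic's per-cell contribution with per-cell depth. The sketch below is a minimal illustration on simulated data in base R; `depth`, `topic_cell`, and the 0.5 cutoff are hypothetical placeholders rather than cisTopic output or a recommended threshold. With real data, the topic-by-cell matrix would come from the fitted model.

```r
# Toy check: is any topic strongly correlated with per-cell sequencing depth?
# All objects below are simulated placeholders, not cisTopic output.
set.seed(1)
n_cells <- 200
depth <- round(10^runif(n_cells, 3, 4.5))   # hypothetical fragments per cell

# Hypothetical topic-by-cell matrix (rows = topics, columns = cells).
topic_cell <- matrix(runif(5 * n_cells), nrow = 5,
                     dimnames = list(paste0("Topic", 1:5), NULL))

# Make Topic1 track depth on purpose so the check has something to find.
topic_cell["Topic1", ] <- as.numeric(scale(log10(depth))) + rnorm(n_cells, sd = 0.3)

# Spearman correlation of each topic with log10 depth.
depth_cor <- apply(topic_cell, 1, function(x) {
  cor(x, log10(depth), method = "spearman")
})
print(round(depth_cor, 2))

# Flag topics whose absolute correlation exceeds an arbitrary cutoff.
flagged <- names(depth_cor)[abs(depth_cor) > 0.5]
print(flagged)
```

A topic flagged this way is not necessarily an artifact, since depth can covary with cell type, but it is a reasonable candidate to inspect before interpreting clusters or trajectories.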

mm9 vs. mm10

The package is very intuitive and provides great insight.

I've noticed that the GREAT function gives an error with mm9 but not with mm10:

cisTopicObject <- GREAT(cisTopicObject, genome='mm9', fold_enrichment=2, geneHits=1, sign=0.05, request_interval=10)
Error in submitGreatJob(coord, species = genome, request_interval = request_interval, :
GREAT encountered a user error (message from GREAT web server)

Will I run into issues switching back and forth between mm9 and mm10 in the workflow as the feather file is mm9? i.e. can I run the other analysis with mm10 up until GREAT and then switch to mm9 for TF motif enrichment and formation of cistromes?

Alternatively, is there any reason why GREAT isn't accepting the above command with mm9 that can be fixed?

error in cisTopic(logLikelihoodByIter)

Hi,
I am following the tutorial below and get an error at step 24. Up to step 23 I get the same results as the tutorial, with no problems. I am using R 3.5.1 and also checked with R 3.6.1, with the same error message. Any help would be appreciated.
https://nbviewer.jupyter.org/github/pinellolab/scATAC-benchmarking/blob/master/Real_Data/Buenrostro_2018/run_methods/cisTopic/cisTopic_buenrostro2018.ipynb?flush_cache=true

step 24
logLikelihoodByIter(cisTopicObject, select=c(10, 20, 25, 30, 35, 40))

error
Error in scales::hue_pal(l = 60:100) : length(l) == 1 is not TRUE
Calls: logLikelihoodByIter ... unique -> col2rgb -> %in% -> -> stopifnot

Error in metadataFeather(path)


Hi,

I'm having difficulty accessing the regions feather link after the liftover from mm10 to mm9.
I need help getting past this.

Thanks!

topicsRcisTarget: Error in .column_indexes_feather(x, i)

Hi,

When running topicsRcisTarget on hg38 regions lifted over to hg19, I get this error.

Error in .column_indexes_feather(x, i) : undefined columns: chr1-reg496, chr1-reg497, chr1-reg498, chr1-reg500, chr1-reg976, chr1-reg977, chr1-reg978, chr1-reg979, chr1-reg980, chr1-reg1117, chr1-reg1119, chr1-reg1120, chr1-reg1121, chr1-reg2014, chr1-reg2016, chr1-reg2017, chr1-reg2018, chr1-reg2310, chr1-reg2312, chr1-reg2314, chr1-reg2315, chr1-reg2316, chr1-reg2317, chr1-reg2318, chr1-reg6260, chr1-reg6261, chr1-reg6262, chr1-reg6467, chr1-reg6468, chr1-reg6618, chr1-reg6619, chr1-reg6620, chr1-reg6621, chr1-reg6622, chr1-reg6623, chr1-reg6624, chr1-reg6634, chr1-reg6635, chr1-reg6637, chr1-reg6638, chr1-reg6639, chr1-reg6641, chr1-reg6642, chr1-reg6643, chr1-reg6644, chr1-reg6645, chr1-reg6891, chr1-reg6892, chr1-reg6893, chr1-reg6972, chr1-reg7710, chr1-reg7711, chr1-reg7712, chr1-reg8470, chr1-reg8776, chr1-reg8778, chr1-reg8779, chr1-reg8967, chr1-reg9123, chr1-reg9125, chr1-reg9127, chr1-reg9128, chr1-reg9609, chr1-reg9783, chr1-reg9784, chr1-reg9785, chr1-reg9787, chr1-reg9788, chr1-reg9789, chr1-reg9790,
Calls: topicsRcisTarget ... as_tibble -> [ -> [.feather -> .column_indexes_feather
Execution halted

Any guess on why this is happening?

Thank you!
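
An "undefined columns" error of this kind means that some requested region names are not among the columns of the table being indexed. A minimal base R sketch of how to list the mismatches follows; `topic_regions` and `db_columns` are hypothetical stand-ins for the lifted-over region names and the column names of the ranking database, and are not taken from the package.

```r
# Hypothetical region names selected for a topic (after liftover).
topic_regions <- c("chr1-reg496", "chr1-reg497", "chr1-reg12")

# Hypothetical column names of the ranking database.
db_columns <- c("features", "chr1-reg12", "chr1-reg13")

# Regions requested but absent from the database, and regions that can be used.
missing_regions <- setdiff(topic_regions, db_columns)
usable_regions <- intersect(topic_regions, db_columns)

cat(length(missing_regions), "of", length(topic_regions),
    "regions are not in the database\n")
```

If most regions turn out to be missing, a mismatch between the database version and the region identifiers is a more likely explanation than a problem with individual regions.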


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/jovyan/my-conda-envs/myenvSC/lib/libopenblasp-r0.3.7.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0
[2] GenomicFeatures_1.38.0
[3] org.Hs.eg.db_3.10.0
[4] AnnotationDbi_1.48.0
[5] Biobase_2.46.0
[6] rtracklayer_1.46.0
[7] GenomicRanges_1.38.0
[8] GenomeInfoDb_1.22.0
[9] IRanges_2.20.1
[10] S4Vectors_0.24.1
[11] BiocGenerics_0.32.0
[12] R.utils_2.9.2
[13] R.oo_1.23.0
[14] R.methodsS3_1.7.1
[15] cisTopic_0.2.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.55.0
[3] bit64_0.9-7 progress_1.2.2
[5] httr_1.4.1 tools_3.6.1
[7] backports_1.1.5 R6_2.4.1
[9] DBI_1.0.0 npsurv_0.4-0
[11] tidyselect_0.2.5 prettyunits_1.0.2
[13] curl_4.3 bit_1.1-14
[15] compiler_3.6.1 AUCell_1.8.0
[17] graph_1.64.0 RcisTarget_1.6.0
[19] DelayedArray_0.12.0 askpass_1.1
[21] rappdirs_0.3.1 stringr_1.4.0
[23] digest_0.6.23 Rsamtools_2.2.1
[25] XVector_0.26.0 pkgconfig_2.0.3
[27] htmltools_0.4.0 dbplyr_1.4.2
[29] fastmap_1.0.1 rlang_0.4.2
[31] RSQLite_2.1.4 shiny_1.4.0
[33] BiocParallel_1.20.0 dplyr_0.8.3
[35] RCurl_1.95-4.12 magrittr_1.5
[37] feather_0.3.5 GenomeInfoDbData_1.2.2
[39] Matrix_1.2-18 Rcpp_1.0.3
[41] lda_1.4.2 stringi_1.4.3
[43] MASS_7.3-51.4 SummarizedExperiment_1.16.0
[45] zlibbioc_1.32.0 plyr_1.8.5
[47] BiocFileCache_1.10.2 grid_3.6.1
[49] blob_1.2.0 promises_1.1.0
[51] crayon_1.3.4 doSNOW_1.0.18
[53] lattice_0.20-38 Biostrings_2.54.0
[55] splines_3.6.1 annotate_1.64.0
[57] hms_0.5.2 zeallot_0.1.0
[59] pillar_1.4.2 codetools_0.2-16
[61] biomaRt_2.42.0 XML_3.98-1.20
[63] glue_1.3.1 lsei_1.2-0
[65] data.table_1.12.8 vctrs_0.2.0
[67] httpuv_1.5.2 foreach_1.4.4
[69] openssl_1.4.1 purrr_0.3.3
[71] assertthat_0.2.1 mime_0.7
[73] xtable_1.8-4 later_1.0.0
[75] survival_3.1-8 tibble_2.1.3
[77] snow_0.4-3 iterators_1.0.10
[79] GenomicAlignments_1.22.1 memoise_1.1.0
[81] fitdistrplus_1.0-14 GSEABase_1.48.0

ComplexHeatmap argument cluster_columns in cellTopicHeatmap

Hi,

I'm having this issue when plotting the heatmap with cellTopicHeatmap (as in this tutorial):
Code:
cellTopicHeatmap(cisTopicObject, method='Probability', colorBy=c('celltype', 'TREATMENT'), cluster_rows = FALSE, cluster_columns = FALSE)
Error message:
Error in ComplexHeatmap::Heatmap(data.matrix(topic.mat), col = colorPal(20), :
formal argument "cluster_columns" matched by multiple actual arguments

My suspicion is that the argument cluster_columns is forced to TRUE, which is an issue for me since I would prefer the columns to be grouped by their status (either treatment or celltype) rather than clustered.

Thank you.

Installation Trouble from cisTopic/R/RunModels.R

I have installed all the dependencies cisTopic requires on R-3.5.2, but while trying to install the package from devtools or source I keep hitting this error on line 261 from an equal-sign operator.

Unsure if this is due to a change in the code-base causing some syntactical error on R.

> devtools::install_github("aertslab/cisTopic")
Downloading GitHub repo aertslab/cisTopic@master
Skipping 1 packages not available: text2vec
✔ checking for file ‘/tmp/Rtmp5FkfaK/remotese17e72b625e9/aertslab-cisTopic-3e3cd00/DESCRIPTION’ ...
─ preparing ‘cisTopic’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘cisTopic_0.3.0.tar.gz’ (6.3s)

Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)

* installing *source* package ‘cisTopic’ ...
** R
Error in parse(outFile) :
  /tmp/RtmpXwCkm6/R.INSTALLe1e87f993a10/cisTopic/R/RunModels.R:261:33: unexpected '='
260:
261: if (length(models) < 3 && type=
                                   ^
ERROR: unable to collate and parse R files for package ‘cisTopic’
* removing ‘/usr/lib64/R/library/cisTopic’

Arabidopsis genome

I am working on Arabidopsis, which is not a UCSC-supported genome. Any ideas on how to get this information:

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

getBigwigFiles(cisTopicObject, path='output/cisTopics_asBW', seqlengths=seqlengths(txdb))

as well as for other steps that need genome annotation info?
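
For genomes without a TxDb package, chromosome lengths can be read from a FASTA index (`.fai`) file, whose first two columns are the sequence name and its length. The sketch below builds a named vector in base R from an inline toy index; the sequence names and lengths are placeholders, not real Arabidopsis values, and whether `getBigwigFiles` accepts a plain named vector for `seqlengths` is an assumption that has not been checked here.

```r
# Toy .fai content: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated).
# Lengths are made-up placeholders; use the index of your own assembly instead.
fai_text <- paste(
  "Chr1\t1000\t6\t60\t61",
  "Chr2\t800\t1030\t60\t61",
  "ChrM\t50\t1850\t60\t61",
  sep = "\n"
)

fai <- read.table(text = fai_text, sep = "\t", stringsAsFactors = FALSE,
                  col.names = c("name", "length", "offset",
                                "linebases", "linewidth"))

# Named vector of chromosome lengths, in the same shape seqlengths() returns.
chrom_lengths <- setNames(fai$length, fai$name)
print(chrom_lengths)
```

With a real index, `read.table("genome.fa.fai", ...)` would replace the `text =` argument, where `genome.fa.fai` is a hypothetical file name.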

Some data missing for tutorials

Some files, such as cisTopicObject_pbmc.Rds from the 10X 5k PBMCs tutorial and the data/bamfiles/ folder from the melanoma tutorial, are missing.

Bulk Atac narrowpeak files from Corces 2016

Hi,

Would you guys mind sharing how to easily obtain the bulk ATAC narrowpeak files used in the PBMC tutorial as reference for enrichment testing (Corces et al. 2016 Nat Genet)?

Thanks,

Joe

Uploading data used in paper

Hello,

I was wondering whether it would be possible to upload some of the datasets used in the paper. Namely, I am interested in exploring the processed data (inputs to cisTopic) from the sections "Simulated epigenomes from FACS-sorted bulk ATAC-seq profiles from the hematopoietic system" and "scATAC-seq from FACS-sorted single-cell populations from the hematopoietic system"

Thanks!

Error of the 'runCGSModels' function in cisTopic v0.3.0

I noticed that this package has been updated to version 0.3.0. However, the function ‘runCGSModels’, which is said to be equivalent to ‘runModels’ in version 0.2.1, returns an error.

More specifically, I ran the tutorial on simulated single-cell epigenomes from melanoma cell line and replaced the function ‘runModels’ in the tutorial with ‘runCGSModels’. The following error message is returned:

cisTopicObject_tmp <- runCGSModels(cisTopicObject, topic=c(2, 5:15, 20, 25), seed=987, nCores=13, burnin = 120, iterations = 150, addModels=FALSE)

[1] "Formatting data..."
[1] "Exporting data..."
[1] "Running models..."
Error in do.ply(i) :
task 1 failed - "cannot coerce type 'closure' to vector of type 'double'"

Moreover, when I set the argument ‘nCores=1’, the error message is different:

cisTopicObject_tmp <- runCGSModels(cisTopicObject, topic=c(2, 5:15, 20, 25), seed=987, nCores=1, burnin = 120, iterations = 150, addModels=FALSE)

[1] "Formatting data..."
[1] "Running models..."

| | 0%
Error in lda.collapsed.gibbs.sampler(cellList, topic, regionList, num.iterations = iterations, :
object 'iterations' not found

There is therefore probably a bug in cisTopic v0.3.0. I hope these error messages help.

Accessing topics fits

Hi, I want to access the estimated topic distributions and probabilities. How do I do this from a cisTopic fit? The documentation is not clear on this.

Column 4 ['rep..TF_lowConf...length.peaks..'] of item 2 is missing in item 1.

cisTopicObject<- getCistromes(cisTopicObject, annotation = 'Both', nCores=1)
Column 4 ['rep..TF_lowConf...length.peaks..'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 for options to control this message.
Column 4 ['rep..TF_lowConf...length.peaks..'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 for options to control this message.
Column 4 ['rep..TF_lowConf...length.peaks..'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 for options to control this message.

This message keeps repeating.....

Suggestions?

Thanks,

plotFeatures and cellTopicHeatmap errors

Hi,

Sorry, one more issue to open. Once again, following the 10x vignette.

plotFeatures(cisTopicObject_d0, method='tSNE', target='cell', topic_contr=NULL, colorBy=c('nCounts', 'nAcc','densityClust', 'graphBasedClusters_CRA'), cex.legend = 0.8,
factor.max=.75, dim=2, legend=TRUE, col.low='darkgreen', col.mid='yellow', col.high='brown1', intervals=10)
Error in plotFeatures(cisTopicObject_d0, method = "tSNE", target = "cell", :
The variable graphBasedClusters_CRA is not included in the cell data. Please check and re-run.

This is strange because I did run

graphBasedClusters_CRA_d0 <- read.table(pathTographBasedClusters_CRA_d0, sep=',', header=TRUE, row.names = 1)
colnames(graphBasedClusters_CRA_d0) <- 'graphBasedClusters_CRA'
graphBasedClusters_CRA_d0[,1] <- as.factor(graphBasedClusters_CRA_d0[,1])
cisTopicObject_d0 <- addCellMetadata(cisTopicObject_d0, graphBasedClusters_CRA_d0)

which all ran just fine. Ideas would be appreciated.

Also,

cellTopicHeatmap(cisTopicObject_d0, method='Probability', colorBy=c('densityClust'))
Error in ComplexHeatmap::HeatmapAnnotation(df = object.cell.data[, colorBy, :
elements in col should be named vectors.

cisTopicObject_d0 <- addCellMetadata(cisTopicObject_d0, densityClust_d0)
ran without issues for this object.

Thank you,

Joe

Integration using Harmony as described in Drosophila eye-antennal disc preprint

Hi @cbravo93,

I have datasets from two conditions that I would like to compare. However, when I combined the peak count matrices and analyzed them using cisTopic, I noticed a batch effect. I'm thinking about projecting one sample into the existing topic space of the other.

And I find this in your preprint: "Additionally, we projected the FAC-sorted single cell profiles (Optix-GFP+ and sens-GFP+) with at least 70% of the fragments within regulatory regions into the existing topic space. Briefly, the topic-cell distributions of the new cells were estimated by multiplying the binary count matrix (cell-regions) by the region-topic distributions of the existing models. The estimated topic-cell contributions were merged with the topic-cell distributions of the original cells, normalized (by Z-Score) and batch effects were corrected with Harmony (v1.0)102."

Would it be possible to share the script you used for this analysis?

Thanks.
Jason
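
The projection step quoted from the preprint (binary cell-by-region matrix multiplied by the region-topic distributions, then merged and Z-scored) can be sketched in base R on toy data. This is only an illustration of the arithmetic under stated assumptions, not the authors' script: all matrices are simulated, the Z-score is assumed to be taken per topic across cells, and the Harmony step is left out.

```r
set.seed(42)
n_regions <- 50
n_topics <- 4

# Hypothetical region-topic distributions of an existing model
# (regions x topics); each topic column is normalised to sum to 1.
region_topic <- matrix(rgamma(n_regions * n_topics, shape = 0.5),
                       nrow = n_regions)
region_topic <- sweep(region_topic, 2, colSums(region_topic), "/")

# Hypothetical binary accessibility matrix of the new cells (cells x regions).
new_cells <- matrix(rbinom(10 * n_regions, size = 1, prob = 0.2), nrow = 10)

# Projection: estimated topic contributions of the new cells (cells x topics).
projected <- new_cells %*% region_topic

# Hypothetical topic contributions of the original cells (cells x topics).
original <- matrix(runif(30 * n_topics), nrow = 30)

# Merge both sets of cells and Z-score each topic column across all cells
# (the axis of the Z-score is an assumption; the preprint does not state it).
merged <- rbind(original, projected)
merged_z <- scale(merged)

# Batch label per cell, which a batch-correction tool would need as input.
batch <- rep(c("original", "projected"), times = c(nrow(original), nrow(projected)))
```

`merged_z` and `batch` would then be the inputs to the batch-correction step described in the quoted passage.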

10X input issue after Cell Ranger aggregation

Getting this error trying to load an aggregation of two samples with the new CellRanger

data_folder <- '/media/breunighp/SSD/ssrnaseq/scATAC/Aggr/filtered_peak_bc_matrix/'
metrics <- 'singlecell.csv'
cisTopicObjectt <- createcisTopicObjectFrom10Xmatrix(data_folder, metrics, project.name = "cisTopicProject")
Error in dimnamesGets(x, value) :
invalid dimnames given for “dgTMatrix” object

I can load it into cisTOPIC by using 10X's recommendation for creating a matrix and going through the workflow that way.

This is the first time I've had an issue with a 10X dataset and I'm wondering if it is due to the aggregation or the new version of CellRanger?

Errors when trying to fit models with less than three topic numbers

Hi,

selectModel fails when the model was fitted with runCGSModels using a single number as the topic argument (e.g. topic=c(30)).

The error message that I get is

Error in `$<-.data.frame`(`*tmp*`, "second_derivative", value = c(-Inf,  : 
  replacement has 2 rows, data has 1
Calls: selectModel -> $<- -> $<-.data.frame

I also get an error when running runCGSModels with only two topic numbers (e.g. topic=c(29,30)), but then the error is different:

Error in plot.window(...) : need finite 'xlim' values
Calls: selectModel ... plot -> plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Execution halted

When I run the model with more than two topic numbers (e.g. topic=c(29,30,31)), it seems to work.

Best,
Wolfgang
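
The `second_derivative` column named in the first error suggests that model selection uses a second derivative of the log-likelihood over topic numbers; that is an inference from the message, not something confirmed against the package source. A finite-difference second derivative needs at least three points, which would explain both failures. The base R sketch below, with a hypothetical helper and made-up log-likelihood values, shows the arithmetic and an explicit guard.

```r
# Hypothetical helper: second derivative of log-likelihood by finite differences.
second_derivative <- function(topics, loglik) {
  if (length(topics) < 3) {
    stop("need at least three topic numbers to estimate a second derivative")
  }
  # First derivative between consecutive topic numbers.
  first <- diff(loglik) / diff(topics)
  # Midpoints at which the first derivative is evaluated.
  mid <- (head(topics, -1) + tail(topics, -1)) / 2
  # Second derivative, defined only for the interior topic numbers.
  second <- diff(first) / diff(mid)
  data.frame(topic = topics[-c(1, length(topics))],
             second_derivative = second)
}

# Made-up log-likelihood values for three models.
res <- second_derivative(c(29, 30, 31), c(-100, -90, -85))
print(res)

# Two topic numbers trigger the guard instead of a downstream plotting error.
two <- tryCatch(second_derivative(c(29, 30), c(-100, -90)),
                error = function(e) conditionMessage(e))
print(two)
```

With three models there is exactly one interior point, so the result has a single row; with one or two models there is none, which matches the empty plotting range reported above.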

Comparing untreated vs treated cells

Good day!

I have been using this excellent package for analyzing tumor biology. I now have another type of dataset (four samples): two are control cells collected at two different time points (7 and 28 d), and the other two are treated cells collected at the same time points. I would like to know how to create a cisTopic object that contains all four datasets so that I can perform comparative analysis and topic modeling. I am interested in checking, in the same dimensional space, how the treatment affects chromatin accessibility (cell memory) and which features do not change.

I have checked your tutorial (cisTopic on simulated single-cell epigenomes from melanoma cell line), and it seems to be an approach that could work for my data. Still, I do not know how to create the object with the information for the four datasets, or whether it can be used for my purposes later in the downstream analysis.

Thank you in advance for your help!

getSignaturesRegions throwing error

Hi,

I am following the 5k PBMC tutorial. The line

cisTopicObject <- getSignaturesRegions(cisTopicObject, Bulk_ATAC_signatures, labels=labels, minOverlap = 0.4)

throws an error:
Error in getSignaturesRegions(cisTopicObject, Bulk_ATAC_signatures, labels = labels, :
There is at least a signature with the same label:Bcell, CD34-Bone-Marrow, CD34-Cord-Blood, CD4Tcell, CD8Tcell, Mono, NKcell. Please, rename it.

As far as I can tell, there are no duplicate labels in the signatures. Help would be appreciated!
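
Since the error lists every label, one possibility (an assumption, not verified against the package) is that the labels clash with signatures already stored from an earlier call rather than with each other. Both cases can be checked in base R; `existing` below is a hypothetical vector of signature names already present in the object.

```r
labels <- c("Bcell", "CD34-Bone-Marrow", "CD34-Cord-Blood",
            "CD4Tcell", "CD8Tcell", "Mono", "NKcell")

# Hypothetical: names of signatures already stored from a previous run.
existing <- c("Bcell", "Mono")

# Duplicates within the new labels themselves.
dup_within <- labels[duplicated(labels)]

# New labels that clash with names that are already present.
clash <- intersect(labels, existing)

print(dup_within)
print(clash)
```

If `dup_within` is empty but `clash` is not, re-running the call on a freshly loaded object or renaming the new labels would be the things to try.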

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRblas.so
LAPACK: /sc/wo/app/R/v3.5.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] plyr_1.8.4 fitdistrplus_1.0-14 npsurv_0.4-0 lsei_1.2-0
[5] survival_2.44-1.1 MASS_7.3-51.4 AUCell_1.7.1 scatterplot3d_0.3-41
[9] plotly_4.9.0 ggplot2_3.1.1 Matrix_1.2-17 cisTopic_0.2.1
[13] BiocParallel_1.14.2 doParallel_1.0.14 iterators_1.0.10 foreach_1.4.4
[17] densityClust_0.3 org.Hs.eg.db_3.6.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.34.8
[21] AnnotationDbi_1.44.0 Biobase_2.42.0 ChIPseeker_1.18.0 rGREAT_1.14.0
[25] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0 S4Vectors_0.20.1
[29] BiocGenerics_0.28.0 data.table_1.12.2 fastcluster_1.1.25 ComplexHeatmap_1.20.0
[33] Rtsne_0.15 umap_0.2.2.0 Rsubread_1.32.4

loaded via a namespace (and not attached):
[1] snow_0.4-3 circlize_0.4.6 fastmatch_1.1-0 igraph_1.2.2 lazyeval_0.2.2 GSEABase_1.42.0
[7] splines_3.5.1 feather_0.3.3 gridBase_0.4-7 urltools_1.7.3 digest_0.6.18 htmltools_0.3.6
[13] GOSemSim_2.8.0 viridis_0.5.1 GO.db_3.6.0 gdata_2.18.0 lda_1.4.2 magrittr_1.5
[19] memoise_1.1.0 Biostrings_2.48.0 annotate_1.60.1 matrixStats_0.54.0 R.utils_2.8.0 enrichplot_1.2.0
[25] prettyunits_1.0.2 colorspace_1.4-1 blob_1.1.1 ggrepel_0.8.0 dplyr_0.7.8 crayon_1.3.4
[31] RCurl_1.95-4.12 jsonlite_1.6 graph_1.60.0 bindr_0.1.1 glue_1.3.1 polyclip_1.10-0
[37] gtable_0.3.0 zlibbioc_1.28.0 XVector_0.22.0 UpSetR_1.3.3 GetoptLong_0.1.7 DelayedArray_0.8.0
[43] shape_1.4.4 scales_1.0.0 DOSE_3.8.2 DBI_1.0.0 Rcpp_1.0.1 plotrix_3.7-5
[49] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2 gridGraphics_0.4-1 reticulate_1.10 bit_1.1-14
[55] europepmc_0.3 DT_0.6 htmlwidgets_1.3 httr_1.4.0 fgsea_1.8.0 FNN_1.1.3
[61] gplots_3.0.1.1 RColorBrewer_1.1-2 R.methodsS3_1.7.1 pkgconfig_2.0.2 XML_3.98-1.19 farver_1.1.0
[67] later_0.7.5 ggplotify_0.0.3 tidyselect_0.2.5 rlang_0.3.4 reshape2_1.4.3 munsell_0.5.0
[73] tools_3.5.1 RSQLite_2.1.1 ggridges_0.5.1 stringr_1.4.0 yaml_2.2.0 bit64_0.9-7
[79] caTools_1.17.1.1 purrr_0.3.2 ggraph_1.0.2 bindrcpp_0.2.2 mime_0.6 R.oo_1.22.0
[85] DO.db_2.9 xml2_1.2.0 biomaRt_2.36.1 compiler_3.5.1 rstudioapi_0.10 tibble_2.1.1
[91] tweenr_1.0.1 stringi_1.2.4 lattice_0.20-38 pillar_1.4.0 triebeard_0.3.0 GlobalOptions_0.1.0
[97] cowplot_0.9.4 bitops_1.0-6 httpuv_1.4.5 rtracklayer_1.40.6 qvalue_2.14.1 R6_2.4.0
[103] promises_1.0.1 KernSmooth_2.23-15 gridExtra_2.3 RcisTarget_1.5.0 codetools_0.2-15 boot_1.3-22
[109] gtools_3.8.1 assertthat_0.2.1 SummarizedExperiment_1.10.1 rjson_0.2.20 withr_2.1.2 GenomicAlignments_1.18.1
[115] Rsamtools_1.32.3 GenomeInfoDbData_1.2.0 doSNOW_1.0.16 hms_0.4.2 tidyr_0.8.2 rvcheck_0.1.3
[121] ggforce_0.2.2 shiny_1.2.0

distinguishing of broken cells from true low depth cells

Hi all,
cisTopic is a novel tool for single-cell ATAC data analysis, which applies the latent Dirichlet allocation (LDA) algorithm to reduce dimensionality. It was reported that cisTopic can specifically handle cells at low depth (around 3k per cell). This is very useful given the technical difficulties of single-cell experiments and sequencing. Here is my question: if I understand correctly, the cells in the simulated data in the cisTopic paper (cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data, Nature Methods 2019) generally have low depth (Supplementary Figures 1, 2 and 4). But if a real dataset contains both high-depth and low-depth cells, will cisTopic treat them properly, without bias for depth level?
The rationale for my question is the concern about distinguishing broken cells from true low-depth cells. Some tools (like snapATAC and cellranger-atac) normalize all cells by their depth, while others encourage users to prefilter low-depth cells. How do you suggest treating scATAC datasets that contain both high- and low-depth cells?
