The honeybadger from jefworks-lab

Hi

Sorry to bother you,thanks for developing and maintaining HoneyBADGER! I get the following error when running the code and I can't find the reason.[I am repeating your work]

library("coda", lib.loc="E:/R-3.5.1/library")
library("rjags", lib.loc="E:/R-3.5.1/library")
Linked to JAGS 4.0.0
Loaded modules: basemod,bugs
results <- calcAlleleCnvProb(allele.mats$r.maf, allele.mats$n.sc,

                         allele.mats$l.maf, allele.mats$n.bulk, allele.mats$snps, geneFactor,

                         region=GenomicRanges::GRanges('chr10', IRanges::IRanges(0,1e9)), verbose=TRUE)

Assessing posterior probability of CNV in region ...
with 69 snps ... within 50 genes ...
converting to multi-dimensional arrays ... Aggregating data to list ...
Running model ...
Compiling model graph

Error in rjags::jags.model(modelFile, data = data, n.chains = 4, n.adapt = 100, :
Can't initialize. No nodes in graph.
Have you compiled the model?

In addition: There were 12 warnings (use warnings() to see them)
Thank you!

Warning in setGeneFactors and error in retestIdentifiedCnvs

Hi I tried to run the example in the getting started session and get warnings message and error using the data provided by HoneyBADGER. I followed this session exactly as shown in the website https://jef.works/HoneyBADGER/Getting_Started.html. This the warning messages I got were:
hb$setGeneFactors(txdb)

Warning messages:
1: In ifelse(strand(psF) == "+", 1, -1) * (start(peaks[!na.idx]) - :
longer object length is not a multiple of shorter object length
2: In ifelse(strand(peF) == "+", 1, -1) * (end(peaks[!na.idx]) - start(peF)) :
longer object length is not a multiple of shorter object length

I think that might be the reason that I got the following error when I ran

hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=FALSE)
The error is
Error in as.vector(x) : no method for coercing this S4 class to a vector

Could you please help to have a look why I got these messages when I ran the example? Thank you very much!

Error in summarizeResults for the allele-based approach

Dear prof. Fan,

I was trying the 'getting started' tutorial for your HoneyBADGER package, but I am running into an error after retesting the CNVs identified by the allele-based approach. Initially, HoneyBADGER identifies the following CNV boundaries:

When I retest these CNVs I get two warnings, which I think might result from the fact that I get different putative CNV regions than the ones listed in the tutorial:

If I then run the `summarizeResults command I get the following error:

I already tried to fix the error by setting min.num.cells in the hb$summarizeResults function to 0 as you suggested in one of the previous issues, but this doesn't seem to work...

Thanks a lot for your time!!

Here's my session info:

Allele-mode for selecting normal cells

Hi Jean,

I have some 10x 3'-end single cells from breast cancer and trying to separate a set of normal cells from this population as described in your paper (MM135). I have a few questions in related to this analysis.

In your paper, you'd selected normal cells based on deletions on Chr13. By looking at the supplementary Fig6. Chr13 doesn't look in particularly different from many other chromosomes. Is there a reason, statistical significance or pre-knowledge, for you to separate cells based on chr13?
By using 'getFastCellAllele' function in 'Preparing data', I got a list with three elements: 'refmat', 'altmat' and 'covmat'. For allele-based mode, there were two objects mentioned in 'Getting started' tutorial: 'data(r)' and 'data(cov.sc)', I want to ask should 'r' object corresponding to 'altmat' or 'altmat/refmat' ( ratios of alt and ref ) from preparing data step?

Thank you very much in advance!
Cheers, Hy

Error: dim(X) must have a positive length

Hi! I'm currently trying to use HoneyBADGER but run into an error when executing hb$plotGexpProfile().

Here is the set of commands I ran:

mart.obj <- useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset='mmusculus_gene_ensembl', host='jul2015.archive.ensembl.org')
hb <- new('HoneyBADGER', name='project')
hb$setGexpMats(matrix.sample, matrix.ref, mart.obj, filter=FALSE, scale=FALSE, verbose=TRUE)
# Initializing expression matrices ...
# Normalizing gene expression for 36544 genes and 2551 cells ...
# Done setting initial expression matrices!
hb$plotGexpProfile()
# Error in apply(d, 2, caTools::runmean, k = window.size, align = "center") :
#   dim(X) must have a positive length

The matrices matrix.sample and matrix.ref contain 2551 and 2456 cells, respectively, with 36544 genes for each cell.

Do you have an idea what the problem could be? I would greatly appreciate your help.

Thanks!

gene filtering different in HoneyBADGER object

Hi,
I have just installed HoneyBADGER, planning to use it for sc 10X analysis. I am seeing radically different filtering of genes between
gexp.mats <- setGexpMats(...)
and
hb$setGexpMats(...)
For example with my own data:

> hb <- new('HoneyBADGER', name=paste0( samplegroup, "LN1vTum1"))
> hb$setGexpMats( as.matrix(gexpOI), as.matrix(gexpRef), mart.obj,
+                 filter=TRUE, scale=TRUE )
Initializing expression matrices ... 
111 genes passed filtering ... 

> gexp.mats <- setGexpMats(as.matrix(gexpOI), as.matrix(gexpRef), mart.obj,
+                          filter=TRUE, scale=TRUE)
Initializing expression matrices ... 
15282 genes passed filtering ...

Or with the example data (which doesn't need filtering, but I used for a comparison test):

> hb <- new('HoneyBADGER', name='MGH31')
> testexample <- setGexpMats(gexp, ref, mart.obj, filter=TRUE, scale=FALSE, verbose=TRUE)
Initializing expression matrices ... 
3463 genes passed filtering ... 
Normalizing gene expression for 3463 genes and 75 cells ... 
Done setting initial expression matrices!                                                   
> hb$setGexpMats(gexp, ref, mart.obj, filter=TRUE, scale=FALSE, verbose=TRUE)
Initializing expression matrices ... 
39 genes passed filtering ...

Can you tell me what is wrong with the internal version?
Thanks!

showing NULL in the step of calcGexpCnvBoundaries - getting started tutorial

Hi HoneyBADGER developers,

Thank you for developing this tool! I tried the getting-statrted-tutorial to make sure that i can use the tool. However, i am stucked in the step of calcGexpCnvBoundaries. I used the demo data provided in the tool, but the results were not the same as yours.

But I can not find out why there is NULL. Could you give me some instructions on how to figure it out?
Thanks a lot for your time!!!

error in retestIdentifiedCnvs

Dear Jean,

When I run the following code hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=T), i get the following error:
Error in as.vector(x) : no method for coercing this S4 class to a vector

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] gridExtra_2.3 reshape2_1.4.3
[3] ggplot2_3.0.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[5] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0
[7] rjags_4-6 coda_0.19-1
[9] biomaRt_2.34.2 edgeR_3.20.9
[11] limma_3.34.9 HoneyBADGER_0.1
[13] SummarizedExperiment_1.8.1 DelayedArray_0.4.1
[15] matrixStats_0.52.2 Biobase_2.38.0
[17] GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[19] IRanges_2.12.0 S4Vectors_0.16.0
[21] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 locfit_1.5-9.1 lattice_0.20-35 Rsamtools_1.30.0
[5] Biostrings_2.46.0 prettyunits_1.0.2 assertthat_0.2.0 digest_0.6.15
[9] R6_2.2.2 HiddenMarkov_1.8-11 plyr_1.8.4 RSQLite_2.0
[13] httr_1.3.1 pillar_1.1.0 zlibbioc_1.24.0 rlang_0.2.0
[17] progress_1.1.2 lazyeval_0.2.1 curl_3.0 blob_1.1.0
[21] Matrix_1.2-12 BiocParallel_1.12.0 RMySQL_0.10.13 stringr_1.3.1
[25] RCurl_1.95-4.8 bit_1.1-12 munsell_0.5.0 rtracklayer_1.38.0
[29] compiler_3.4.3 pkgconfig_2.0.1 tidyselect_0.2.3 tibble_1.4.2
[33] GenomeInfoDbData_0.99.1 XML_3.98-1.11 withr_2.1.2 dplyr_0.7.5
[37] GenomicAlignments_1.14.1 bitops_1.0-6 grid_3.4.3 gtable_0.2.0
[41] DBI_0.7 magrittr_1.5 scales_1.0.0 stringi_1.1.7
[45] XVector_0.18.0 bindrcpp_0.2.2 RColorBrewer_1.1-2 tools_3.4.3
[49] bit64_0.9-7 glue_1.2.0 purrr_0.2.4 colorspace_1.3-2
[53] caTools_1.17.1 memoise_1.1.0 bindr_0.1.1

When should I use HoneyBADGER

Hey,

I'm having a fundamental question, I want to use this on a 10Xgenomics dataset so I want to know if this package can be useful or not?
one basic question is, in 10X data there is a huge bias towards the 3' in the reads so not all the snps going to be seen in the data. Can I call CNVs only based on the expression of proxy genes?
also we will have about 5000 cells per library so this package might be too slow?

How to prepare reference expression?

For a new data set how do you make ref in data(ref) ?

Parameter tweaking for no CNV identified

Hi, I tried the expression modeling of the HoneyBADGER for my single cell data.
Using plotGexpProfile with zlim = c(-1, 1), I can see some CNV expression pattern across the whole genome. However, when I calculate the CNV boundaries, there are no any CNV identified.
I am wondering that in this situation, can I trust the posterior probability with the default parameter? Or should I do some tweak for some parameters?
By the way, if I provide the allele information for the model, would it improve the power to identify CNV?
Is there any quantitative measure you suggest to identify malignant cell within a tumor according to the HoneyBadger's output?
Below is my gene expression visualization, please take a look.

how to parallelize "calcGexpCnvBoundaries" and "retestIdentifiedCnvs" ?

Hi there,

Thanks for putting out a nice package for the community!

I am in the middle of proessing ~ 3000 cells to call gene based CNVs. It seems that the pipeline is bottlenecked by the two functions taking up most of the time. I remember reading a thread that you mentioned there is a multi-core capability, but I failed to how to set the parameters. Could you please show me how?

Regards,

Error in summarizeResults

Hi,

I followed the "Getting Started" tutorial on the website to identify CNVs from expression data. I have no problem runing the functions until I run summarizeResults. The error is
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent

Could you please advise what could cause the error? Thank you so much.

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

Matrix products: default
BLAS: /data/nephron/ranli/tools/lib64/R/lib/libRblas.so
LAPACK: /data/nephron/ranli/tools/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rjags_4-8 coda_0.19-3 biomaRt_2.40.1 HoneyBADGER_0.1

loaded via a namespace (and not attached):
[1] progress_1.2.2 tidyselect_0.2.5 purrr_0.3.2
[4] reshape2_1.4.3 lattice_0.20-38 colorspace_1.4-1
[7] vctrs_0.2.0 stats4_3.6.0 blob_1.2.0
[10] XML_3.98-1.20 rlang_0.4.0 pillar_1.4.2
[13] glue_1.3.1 DBI_1.0.0 BiocGenerics_0.30.0
[16] bit64_0.9-7 GenomeInfoDbData_1.2.1 plyr_1.8.4
[19] zlibbioc_1.30.0 stringr_1.4.0 munsell_0.5.0
[22] gtable_0.3.0 caTools_1.17.1.2 memoise_1.1.0
[25] Biobase_2.44.0 HiddenMarkov_1.8-11 IRanges_2.18.1
[28] GenomeInfoDb_1.20.0 parallel_3.6.0 curl_3.3
[31] AnnotationDbi_1.46.0 Rcpp_1.0.1 scales_1.0.0
[34] backports_1.1.4 S4Vectors_0.22.0 XVector_0.24.0
[37] bit_1.1-14 gridExtra_2.3 ggplot2_3.2.0
[40] hms_0.5.0 digest_0.6.20 stringi_1.4.3
[43] dplyr_0.8.3 GenomicRanges_1.36.0 grid_3.6.0
[46] tools_3.6.0 bitops_1.0-6 magrittr_1.5
[49] lazyeval_0.2.2 RCurl_1.95-4.12 tibble_2.1.3
[52] RSQLite_2.1.1 crayon_1.3.4 pkgconfig_2.0.2
[55] zeallot_0.1.0 prettyunits_1.0.2 assertthat_0.2.1
[58] httr_1.4.0 R6_2.4.0 compiler_3.6.0

example data starting from counts (not scaled or centered)

Hi Jean,

It would be a huge help if you have some example data that start with the raw counts, which could be input through your standard initialization method that then does scaling and centering.

many thanks,

~brian

error in summarizeResults

Thanks for developing and maintaining HoneyBADGER! I get the following error when running this command (after preprocessing the data, analyzing it, and including the SNP info):

results <- hb$summarizeResults(geneBased=FALSE, alleleBased=TRUE)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 15, 6

the algorithm identified 15 alterations, as length(hb$cnvs$allele-based$all) is 15, but then further only found 6 as deletions with LOH, as length(hb$cnvs$allele-based$del.loh) is 6.

Thanks!

Error in calcAlleleCnvBoundaries

Dear Jean,

first, congratulations for this method and the paper.

I'm testing it on a set of Smart-seq2 data for which we have also WGS information available. I extracted the counts for the reference and allele count and follow your tutorial for the downstream steps. I'm able to get the allele profile, but when I try to run calcAlleleCnvBoundaries I get the following error:

hb_04$calcAlleleCnvBoundaries(init=TRUE, verbose=TRUE) ## HMM
ignore previously identified CNVs ... iterative HMM ... max vote:3
SNPS AFFECTED BY DELETION/LOH: chr11:36463422:36463422 chr11:68588015:68588015 chr11:107429423:107429423 chr11:108917891:108917891 chr11:114189367:114189367 chr11:134149641:134149641 chr12:909319:909319 chr12:10612961:10612961 chr12:27800374:27800374 chr12:54231632:54231632 chr12:54234519:54234519 chr12:56270842:56270842 chr12:96194399:96194399 chr12:109852372:109852372 chr12:120701764:120701764 chr12:131921676:131921676 chr13:21495681:21495681 chr13:23355937:23355937 chr13:27426307:27426307 chr13:32586535:32586535 chr13:39038397:39038397 chr13:52687852:52687852 chr14:74302884:74302884 chr14:92513948:92513948 chr14:95158667:95158667 chr14:103590772:103590772 chr15:50370390:50370390 chr15:63616479:63616479 chr15:64365021:64365021 chr15:64365340:64365340 chr16:11691753:11691753 chr16:24919799:24919799 chr16:71745491:71745491 chr17:5434694:5434694 chr17:7311357:7311357 chr17:28906269:28906269 chr17:32479946:32479946 chr17:43169904:43169904 chr17:50360705:50360705 chr17:63928898:63928898Assessing posterior probability of CNV in region ...
with 40 snps ... within 38 genes ...
converting to multi-dimensional arrays ... Error in n.sc.array[i, s, ] <- n.sc[snpst[s], ] :
incorrect number of subscripts

I don't get why this error with the dimensions is happening. If I run the same but saving it on a separate object it seems to work, but the method runs differently:

potentialCnvs <- calcAlleleCnvBoundaries(allele.mats$r.maf,allele.mats$n.sc,allele.mats$l.maf,allele.mats$n.bulk, allele.mats$snps, geneFactor, verbose = TRUE)
Running single round of HMM with tree cut height 3... max vote:3
[1] "Identified 20 potential CNVs"

Thanks a lot for your time.
Best.
Juan L.

Seurat and Honeybadger are not compatible

There is a compatibility issue here since you need R<3.4 for rjags in Honeybadger and R>3.4 for Seurat. This is making using this package hard for me. Any solution?

10X + Honeybadger question

Hello, thank you for this promising tool. I found some related issues regarding 10x data, but I am still having trouble.

In short: I have two single cell 3' expression data sets. One normal, one tumor. Is it possible to use Honeybadger on this data set?
What I have tried so far where "tumor" and "normal" are normalized expression matrices (rows are genes, cols are cells):
mart.obj <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = 'hsapiens_gene_ensembl', host = "jul2015.archive.ensembl.org")

hb <- new('HoneyBADGER', name='mybadger')
hb$setGexpMats(tumor, normal, mart.obj, filter=F, scale=F, verbose=TRUE)

Using conda

You can check out here on how to share r-packages via conda:
https://conda.io/docs/user-guide/tasks/use-r-with-conda.html

Applying bayesian hierarchical model in integrating analyses tutorial

Hello,

Thank you for developing and maintaining HoneyBADGER! I was working through the integrating analyses tutorial when I ran into the following issue. When I ran the following piece of code,

results <- do.call(rbind, lapply(potentialCnvs$region, function(region) { calcAlleleCnvProb(allele.mats$r.maf, allele.mats$n.sc, allele.mats$l.maf, allele.mats$n.bulk, allele.mats$snps, geneFactor, region=region, verbose=FALSE) }))

I got this error: Error: object 'region' not found

However, when I changed region=region to region=potentialCnvs$region, the code snippet was running for a while with no console output. Before running this, the plotAllelProfile function had the same image as what was in the tutorial. How can I resolve this issue?

UPDATE: I just needed to wait longer, doing the above step was the right solution

refCount and altCount matrixes are everywhere zero

Hi Jean,

I'm still at the initial phase of file preparation and it's weird that for several chromosomes I'm not getting any counts (counts=0) during the run of the function getSnpMats. That happens in both ref and alt, even though the coverage is not always zero. Do you think that is possible?

My VCF file is from Whole Genome Sequencing, so I expect all there are many more variants in the VCF than in bam files. I wanted to restrict to only axons but running these commands

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
exons <- exons(txdb)

withinRange <- function(rng)
function(x) x
filters <- FilterRules(list(isSNV = isSNV, withinRange = withinRange(exons)))

filt.vcf <- filterVcf(vcfFile, "hg19", "HaplotypeCaller_PASS_dbSNP_annot.vcf.bgz",index = TRUE ,filters=filters, verbose=TRUE)

didn't work on my VCF file. So, given I'm not usually working with VCF finally I used the entire VCF at the end.

Any suggestions is really appreciated.

Bests,

Elisabetta

Error: Failed to install 'HoneyBADGER' from GitHub

Hi,
I'm trying to install HoneyBADGER on my local machine (R4.0.2) and on cluster (R version 3.6.1 )
I had two kind of error,
Firts on cluster (R version 3.6.1 ):
sh: 1: /bin/gtar: not found sh: 1: /bin/gtar: not found Error: Failed to install 'HoneyBADGER' from GitHub: error in running command In addition: Warning messages: 1: In system(cmd) : error in running command 2: In utils::untar(tarfile, ...) : '/bin/gtar -xf '/tmp/RtmptXe1qc/file6039500f3705.tar.gz' -C '/tmp/RtmptXe1qc/remotes603946cf00e2'' returned error code 127
and the second (my local machine R4.0.2)

Downloading GitHub repo JEFworks/HoneyBADGER@HEAD Skipping 4 packages not available: TxDb.Hsapiens.UCSC.hg19.knownGene, AnnotationDbi, GenomicFeatures, GenomicRanges Installing 4 packages: coda, TxDb.Hsapiens.UCSC.hg19.knownGene, HiddenMarkov, rjags Installing packages into ‘/home/kermezly/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) Erreur : Failed to install 'HoneyBADGER' from GitHub: (converti depuis l'avis) package ‘TxDb.Hsapiens.UCSC.hg19.knownGene’ is not available (for R version 4.0.2)

Any solution on your side ?

Thank you.

error when connecting to BioMart

HI
I got an error when I tried to run HoneyBADGER using the example data:

Request to BioMart web service failed.
The BioMart web service you're accessing may be down.
Check the following URL and see if this website is available:
http://jul2015.archive.ensembl.org:80/biomart/martservice type=registry&requestid=biomaRt
Error in if (!grepl(x = registry, pattern = "^\n*")) { : argument is of length zero

It seems like something goes wrong when I connect to BioMart. Do you have any advice on solving this problem?
Many thanks!

Filtering of identified CNVs

Hi,

I've been using your tool to identify CNVs in 10x Genomics scRNA-seq data. However, I have one question regarding the filtering of identified CNVs. When I run the calcAlleleCnvProb function on a region I end up with the majority of the cells assigned a probability around 0.5 while very few cells are assigned a probability closer to either 0 or 1. In the integrated tutorial I see that you use 0.9 as a cut-off to filter out the CNVs with low probabilities but that this is based on full-transcript coverage. But I'm not sure where to set the cut-off in my dataset.

So my question is, is there any way I can improve the CNV probability calling to get a seperation between cells with and without a CNV? I have analyzed another dataset with higher coverage per cell and I see that the probabilities are better separated for this dataset so is this issue just a result of low coverage?

Thanks!

Gene expression normalization

Hi Jean,

I am trying HoneyBADGER on some GBM datasets that are very similar to the data used in your example, but I run into problems when normalizing gene expression matrices.

According to your tutorial, the gene expression should be transformed to log CPM. I first normalize reads to total reads of each cell and scale by 10^6 to get CPM. Then I natural log transform CPM. This is done by seurat (NormalizeData), but I double checked it manually.

I use the gene expression of normal reference that comes with the package. There are both positive and negative values in the reference, but should log CPM be all >= 0? After normalizing to the reference, all chromosome appear very red in the heatmap, indicating amplification. The script reported errors at results <- hb$summarizeResults(geneBased=TRUE, alleleBased=FALSE).

I also generated some normal reference myself using the same log CPM normalization strategy. However, with this reference, the script failed at hb$retestIdentifiedCnvs(retestBoundGenes = TRUE, retestBoundSnps = FALSE, verbose=TRUE), as there is no CNV boundaries detected by calcGexpCnvBoundaries().

I am wondering if I did something wrong with normalizing the data. Would you please let me know how I should normalize gene expression data so that it can be the input for HoneyBADGER?

Many thanks,
Li

no method for coercing this S4 class to a vector

Hi Dr. Fan,

Thank you so much for making this tool available. While running HoneyBADGER with my own .bam file and the reference provided, I ran into this error:
Error in jags.model(modelFile, data = data, inits = inits, n.chains = 4, :
no method for coercing this S4 class to a vector

while trying to generate HMM with
hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)

Here is my session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base

other attached packages:
[1] devtools_2.3.2
[2] usethis_1.6.3
[3] sos_2.0-2
[4] brew_1.0-6
[5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[6] VariantAnnotation_1.36.0
[7] Rsamtools_2.6.0
[8] Biostrings_2.58.0
[9] XVector_0.30.0
[10] SummarizedExperiment_1.20.0
[11] MatrixGenerics_1.2.0
[12] matrixStats_0.57.0
[13] sparseMatrixUtils_0.1.0
[14] Matrix_1.2-18
[15] dplyr_1.0.2
[16] Seurat_3.2.2
[17] rjags_4-10
[18] coda_0.19-4
[19] biomaRt_2.46.0
[20] HoneyBADGER_0.1
[21] GenomicFeatures_1.42.1
[22] AnnotationDbi_1.52.0
[23] Biobase_2.50.0
[24] GenomicRanges_1.42.0
[25] GenomeInfoDb_1.26.1
[26] IRanges_2.24.0
[27] S4Vectors_0.28.0
[28] BiocGenerics_0.36.0

loaded via a namespace (and not attached):
[1] BiocFileCache_1.14.0 plyr_1.8.6
[3] igraph_1.2.6 lazyeval_0.2.2
[5] splines_4.0.3 BiocParallel_1.24.1
[7] listenv_0.8.0 ggplot2_3.3.2
[9] digest_0.6.27 htmltools_0.5.0
[11] HiddenMarkov_1.8-11 fansi_0.4.1
[13] magrittr_2.0.1 memoise_1.1.0
[15] BSgenome_1.58.0 tensor_1.5
[17] cluster_2.1.0 ROCR_1.0-11
[19] remotes_2.2.0 globals_0.14.0
[21] askpass_1.1 prettyunits_1.1.1
[23] colorspace_2.0-0 blob_1.2.1
[25] rappdirs_0.3.1 ggrepel_0.8.2
[27] callr_3.5.1 crayon_1.3.4
[29] RCurl_1.98-1.2 jsonlite_1.7.1
[31] spatstat_1.64-1 spatstat.data_1.5-2
[33] survival_3.2-7 zoo_1.8-8
[35] glue_1.4.2 polyclip_1.10-0
[37] gtable_0.3.0 zlibbioc_1.36.0
[39] leiden_0.3.5 DelayedArray_0.16.0
[41] pkgbuild_1.1.0 future.apply_1.6.0
[43] abind_1.4-5 scales_1.1.1
[45] DBI_1.1.0 miniUI_0.1.1.1
[47] Rcpp_1.0.5 viridisLite_0.3.0
[49] xtable_1.8-4 progress_1.2.2
[51] reticulate_1.18 bit_4.0.4
[53] rsvd_1.0.3 htmlwidgets_1.5.2
[55] httr_1.4.2 RColorBrewer_1.1-2
[57] ellipsis_0.3.1 ica_1.0-2
[59] pkgconfig_2.0.3 XML_3.99-0.5
[61] uwot_0.1.9 deldir_0.2-3
[63] dbplyr_2.0.0 tidyselect_1.1.0
[65] rlang_0.4.9 reshape2_1.4.4
[67] later_1.1.0.1 munsell_0.5.0
[69] tools_4.0.3 cli_2.2.0
[71] generics_0.1.0 RSQLite_2.2.1
[73] ggridges_0.5.2 stringr_1.4.0
[75] fastmap_1.0.1 goftest_1.2-2
[77] fs_1.5.0 processx_3.4.5
[79] bit64_4.0.5 fitdistrplus_1.1-3
[81] caTools_1.18.0 purrr_0.3.4
[83] RANN_2.6.1 pbapply_1.4-3
[85] future_1.20.1 nlme_3.1-150
[87] mime_0.9 xml2_1.3.2
[89] compiler_4.0.3 rstudioapi_0.13
[91] plotly_4.9.2.1 curl_4.3
[93] png_0.1-7 testthat_3.0.0
[95] spatstat.utils_1.17-0 tibble_3.0.4
[97] stringi_1.5.3 ps_1.4.0
[99] desc_1.2.0 lattice_0.20-41
[101] vctrs_0.3.5 pillar_1.4.7
[103] lifecycle_0.2.0 BiocManager_1.30.10
[105] lmtest_0.9-38 RcppAnnoy_0.0.17
[107] data.table_1.13.2 cowplot_1.1.0
[109] bitops_1.0-6 irlba_2.3.3
[111] httpuv_1.5.4 patchwork_1.1.0
[113] rtracklayer_1.50.0 R6_2.5.0
[115] promises_1.1.1 KernSmooth_2.23-18
[117] gridExtra_2.3 parallelly_1.21.0
[119] sessioninfo_1.1.1 codetools_0.2-18
[121] pkgload_1.1.0 MASS_7.3-53
[123] assertthat_0.2.1 rprojroot_2.0.2
[125] openssl_1.4.3 withr_2.3.0
[127] GenomicAlignments_1.26.0 sctransform_0.3.1
[129] GenomeInfoDbData_1.2.4 mgcv_1.8-33
[131] hms_0.5.3 grid_4.0.3
[133] rpart_4.1-15 tidyr_1.1.2
[135] Rtsne_0.15 shiny_1.5.0

speed of running setAlleleMats step

Hi, thank you for providing such a useful tool.

My dataset has ~9k cells and I use all ~60k SNPs provided from HoneyBADGER package (ExAC database) to run setAlleleMats. I found running this step was very slow even it used over 10 cores in the server (it showed 7028 heterozygous SNPs identified). I then tested 9k cells only using 1 SNP with 1 core, it took over 10 minutes (I'm not sure whether it's a reasonable time). How could I speed up in this step? Thanks.

10x data normalization problem

I work with single-cell sequencing technologies, and have been troubleshooting using HoneyBADGER on some of my pancreatic adenocarcinoma scRNA data (generated using 10x genomics chemistry and an illumina nextseq500 platform) to infer CNVs. I've been struggling though, as I'm unsure what best methods to use for normalizing my data for the program, how you structured your reference data from GTex and how I can mimic that with my own "normal" pancreatic sequencing data, and how I might be able to actually get HoneyBADGER to acknowledge my files instead of defaulting to your pre-built references.

Any help you could offer would be greatly appreciated!

Showing error when trying Getting_Started.Rmd

Hi Dr. Fan,

Thank you for developing this useful tool.
I am really new to R language. It showed error when I ran the line "hb$setGexpMats(gexp, ref, mart.obj, filter=FALSE, scale=FALSE, verbose=TRUE)" in your Getting_Started.Rmd script. Do you have any thoughts to solve this problem?
Attached are the error description and my sessionInfo. Thank you

reference for gene expression

Hi Jean,

I'm a new user of this amazing tool.
I'm preparing data before starting with the analysis of CNVs in my CLL sample. I have a couple of questions related to the gene expression data I should use to infer the deviations between normal and cancer sample.
My questions are:

I found a bulk rna-seq dataset of memory B cells with only one sample, that I could use as reference. However, I'm not sure, if one sample is enough for this task.

As reference normal data, could I use single cell rna-seq (for example B cells) data from a public repository rather than bulk rna-seq?

What do you think?

Thank you in advance.

Bests,
Elisabetta

What is the reason for only including snps in HoneyBADGER?

To my understanding, indel infomation also has the potential capability of helping detect copy deletion regions.
Could you please explain the exclusion of germline indel sites in your demo code?

Many Thanks.
Pentayouth.

Error: subscript contains invalid names

What's wrong when I run the calcGexpCnvBoundaries using example data like https://jef.works/HoneyBADGER/Getting_Started.html?

Error in calcCombCnvProb

Hi Prof. Fan,
Thank you for your development of HoneyBADGER!
I tried hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=FALSE) and get the error: Error in as.vector(x) : no method for coercing this S4 class to a vector
I guess there can be something wrong in the calcCombCnvProb method, but I have no idea which step can call this error.
Could you help to give some instructions and help to solve the problem?

Thanks a lot!

Best regards,
Rongting

and part of the log:

Aggregating data to list ...
Initializing model ...
...Done!restricting to 538 genes in region
Aggregating data to list ...
Initializing model ...
...Done!restricting to 1341 genes in region
Aggregating data to list ...
Initializing model ...
...Done!restricting to 443 genes in region
Aggregating data to list ...
Initializing model ...
...Done!restricting to 192 genes in region
Aggregating data to list ...
Initializing model ...
...Done!Retesting bound snps ... Assessing posterior probability of CNV in region ...
with 1433 snps ... within 290 genes ...
converting to multi-dimensional arrays ... Aggregating data to list ...
Running model ...
Done modeling!WARNING! ONLY 4 SNPS IN REGION!
Assessing posterior probability of CNV in region ...
with 4 snps ... within 2 genes ...
converting to multi-dimensional arrays ... Aggregating data to list ...
Running model ...
Done modeling!Retesting bound snps and genes using joint modelError in as.vector(x) : no method for coercing this S4 class to a vector
Calls: -> union -> unique -> as.vector
Execution halted

read bam files

Hi I am trying to use HoneyBADGER to convert 10X genomic single cell RNA-seq data to CNV. to read bam file, I did:

files <- list.files(path = '/data/CellRanger/sample1/outs’)
bamFiles <- files[grepl('possorted_genome_bam.bam', files)]
bamFiles <- paste0(path, bamFiles)

Error in as.vector(x, "character") :
cannot coerce type 'closure' to vector of type 'character'

I am wondering how to fix this

Thanks

vcf files for heterozygous SNPs

Hi Jean,

I want to try honeyBadger with SNPs from GRch38 reference genome. So I downloaded vcf from ExAC and followed steps;

vcfFile <- "legacy-exacv1_downloads-release1-ExAC.r1.sites.vep.vcf.gz"
require(GenomicRanges)
testRanges <- GRanges(seqnames = "1", IRanges(1, 248956422))
require(VariantAnnotation)
param <- ScanVcfParam(which=testRanges)
vcf <- readVcf(vcfFile, "hg38", param = param)
snps <- rowData(vcf)

The result looks like

DataFrame with 939046 rows and 5 columns
paramRangeID REF

1:13372_G/C NA G
1:13380_C/G NA C
1:13382_C/G NA C
1:13402_G/C NA G
1:13404_G/A NA G
... ... ...
1:248903130_C/T NA C
1:248903149_G/GAGTCTAGGCAATCTTCCCAGAATGGAAACCCAATCCACTCTTACTA NA G
1:248903165_G/A NA G
rs73148524 NA C
1:248903190_C/T NA C
ALT QUAL

1:13372_G/C C 608.91
1:13380_C/G G 7829.15
1:13382_C/G G 320.4
1:13402_G/C C 89.66
1:13404_G/A A,T 136.58
... ... ...
1:248903130_C/T T 3402.57
1:248903149_G/GAGTCTAGGCAATCTTCCCAGAATGGAAACCCAATCCACTCTTACTA GAGTCTAGGCAATCTTCCCAGAATGGAAACCCAATCCACTCTTACTA 744.88
1:248903165_G/A A 11336.29
rs73148524 T 1602562.81
1:248903190_C/T T 592.42
FILTER

1:13372_G/C PASS
1:13380_C/G VQSRTrancheSNP99.60to99.80
1:13382_C/G VQSRTrancheSNP99.60to99.80
1:13402_G/C VQSRTrancheSNP99.60to99.80
1:13404_G/A VQSRTrancheSNP99.60to99.80
... ...
1:248903130_C/T PASS
1:248903149_G/GAGTCTAGGCAATCTTCCCAGAATGGAAACCCAATCCACTCTTACTA VQSRTrancheINDEL99.00to99.50
1:248903165_G/A PASS
rs73148524 PASS
1:248903190_C/T PASS

This format is different from what you provid in honeyBadger package. I also tried other vcf files from
https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/#table-1

Those results are even more different from yours.
Am I on right track? Could you tell me which vcf files I need to use for most updated SNPs from hg38?

Best,

Doogie

error in hb$summarizeResults

Hi I am running the HoneyBADGER (https://github.com/JEFworks-Lab/HoneyBADGER/blob/master/docs/Getting_Started.md) on simply test data and its not working. It should at least work at test data but giving error at hb$summarizeResults

results <- hb$summarizeResults(geneBased=TRUE, alleleBased=FALSE)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 14, 13

could you fix this tutorial

error about "getCellAlleleCount" command line

Hi,
I am trying to use HonryBADGER to call CNV from 10X Genomics RNAseq data. Initially, I used the snps matrix from the HoneyBadger package and loaded my own bam, bai and barcode file to run

results <- getCellAlleleCount(snps, bamFiles, indexFiles, cellBarcodes)
However, I got error like this:
Error in .io_bam(.c_Pileup, file, bamReverseComplement(scanBamParam), :
seqlevels(param) not in BAM header:
seqlevels: ‘NC_007605’.
I used the Rsamtools to check the header of BAM file and there was no "NC_007605" in the header. I have googled about NC_007605. It means "Human gammaherpesvirus 4, complete genome". So what does it work for here. How can I figure this out. Really appreciated if anyone can help. Thanks.

Add "next step" link to tutorial

This might be the least important issue ever but I was wondering if you could add a Next step link at the end of every tutorial.
For example, at the end of Preparing data it would be useful to have to possibility to quickly go to Getting started and so on.

Errors in getSnpMats, getSnpMats10X, and getCellAlleleCount: unable to find an inherited method for function ‘seqnames’ for signature ‘"DataFrame"’

I'm getting an unexpected error running through the prepare data tutorial on my 10X dataset. Is there any advice on where the source of the error might be and how to tackle it?

results <- getSnpMats(snps, bamFilename, indexFilename)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqnames’ for signature ‘"DataFrame"’
results <- getCellAlleleCount(snps, bamFilename, indexFilename, cellBarcodename)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqnames’ for signature ‘"DataFrame"’
results <- getSnpMats10X(snps, bamFilename, indexFilename)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqnames’ for signature ‘"DataFrame"’
session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.5.2 (2018-12-20)
os macOS Mojave 10.14.3
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2019-04-11

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
AnnotationDbi 1.44.0 2018-10-30 [1] Bioconductor
ape 5.3 2019-03-17 [1] CRAN (R 3.5.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.2)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.0)
bibtex 0.4.2 2017-06-30 [1] CRAN (R 3.5.0)
Biobase * 2.42.0 2018-10-30 [1] Bioconductor
BiocGenerics * 0.28.0 2018-10-30 [1] Bioconductor
BiocManager 1.30.4 2018-11-13 [1] CRAN (R 3.5.0)
BiocParallel * 1.16.6 2019-02-17 [1] Bioconductor
biomaRt 2.38.0 2018-10-30 [1] Bioconductor
Biostrings * 2.50.2 2019-01-03 [1] Bioconductor
bit 1.1-14 2018-05-29 [1] CRAN (R 3.5.0)
bit64 0.9-7 2017-05-08 [1] CRAN (R 3.5.0)
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0)
blob 1.1.1 2018-03-25 [1] CRAN (R 3.5.0)
BSgenome 1.50.0 2018-10-30 [1] Bioconductor
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.2)
caTools 1.17.1.2 2019-03-06 [1] CRAN (R 3.5.2)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.2)
cluster 2.0.7-1 2018-04-13 [1] CRAN (R 3.5.0)
coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.0)
codetools 0.2-16 2018-12-24 [1] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.2)
cowplot * 0.9.4 2019-01-08 [1] CRAN (R 3.5.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
data.table 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)
DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.0)
DelayedArray * 0.8.0 2018-10-30 [1] Bioconductor
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
devtools * 2.0.1 2018-10-26 [1] CRAN (R 3.5.2)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.0)
dplyr * 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2)
fitdistrplus 1.0-14 2019-01-23 [1] CRAN (R 3.5.2)
fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.2)
future 1.12.0 2019-03-08 [1] CRAN (R 3.5.2)
future.apply 1.2.0 2019-03-07 [1] CRAN (R 3.5.2)
gbRd 0.4-11 2012-10-01 [1] CRAN (R 3.5.0)
gdata 2.18.0 2017-06-06 [1] CRAN (R 3.5.0)
GenomeInfoDb * 1.18.2 2019-02-12 [1] Bioconductor
GenomeInfoDbData 1.2.0 2019-03-12 [1] Bioconductor
GenomicAlignments 1.18.1 2019-01-04 [1] Bioconductor
GenomicFeatures 1.34.7 2019-03-25 [1] Bioconductor
GenomicRanges * 1.34.0 2018-10-30 [1] Bioconductor
ggplot2 * 3.1.1 2019-04-07 [1] CRAN (R 3.5.2)
ggrepel 0.8.0 2018-05-09 [1] CRAN (R 3.5.0)
ggridges 0.5.1 2018-09-27 [1] CRAN (R 3.5.0)
globals 0.12.4 2018-10-11 [1] CRAN (R 3.5.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.2)
gplots 3.0.1.1 2019-01-27 [1] CRAN (R 3.5.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.5.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.2)
gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.0)
hdf5r * 1.1.1 2019-03-26 [1] CRAN (R 3.5.2)
hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.0)
HoneyBADGER * 0.1 2019-04-05 [1] Github (4eb9c06)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.5.0)
httr 1.4.0 2018-12-11 [1] CRAN (R 3.5.0)
ica 1.0-2 2018-05-24 [1] CRAN (R 3.5.0)
igraph 1.2.4 2019-02-13 [1] CRAN (R 3.5.2)
IRanges * 2.16.0 2018-10-30 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 3.5.2)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.0)
KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.5.2)
labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.2)
listenv 0.7.0 2018-01-21 [1] CRAN (R 3.5.0)
lmtest 0.9-36 2018-04-04 [1] CRAN (R 3.5.0)
lsei 1.2-0 2017-10-23 [1] CRAN (R 3.5.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.5.2)
Matrix * 1.2-17 2019-03-22 [1] CRAN (R 3.5.2)
matrixStats * 0.54.0 2018-07-23 [1] CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
metap 1.1 2019-02-06 [1] CRAN (R 3.5.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0)
nlme 3.1-137 2018-04-07 [1] CRAN (R 3.5.2)
npsurv 0.4-0 2017-10-14 [1] CRAN (R 3.5.0)
pbapply 1.4-0 2019-02-05 [1] CRAN (R 3.5.2)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.2)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0)
plotly 4.8.0 2018-07-20 [1] CRAN (R 3.5.2)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.0)
png 0.1-7 2013-12-03 [1] CRAN (R 3.5.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.2)
progress 1.2.0 2018-06-14 [1] CRAN (R 3.5.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.2)
R.methodsS3 1.7.1 2016-02-16 [1] CRAN (R 3.5.0)
R.oo 1.22.0 2018-04-22 [1] CRAN (R 3.5.0)
R.utils 2.8.0 2019-02-14 [1] CRAN (R 3.5.2)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
RANN 2.6.1 2019-01-08 [1] CRAN (R 3.5.2)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.5.0)
Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.2)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.5.2)
Rdpack 0.10-1 2018-10-04 [1] CRAN (R 3.5.0)
remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.0)
reshape2 * 1.4.3 2017-12-11 [1] CRAN (R 3.5.0)
reticulate 1.11.1 2019-03-06 [1] CRAN (R 3.5.2)
rjags 4-8 2018-10-19 [1] CRAN (R 3.5.0)
rlang 0.3.4 2019-04-07 [1] CRAN (R 3.5.2)
ROCR 1.0-7 2015-03-26 [1] CRAN (R 3.5.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
Rsamtools * 1.34.1 2019-01-31 [1] Bioconductor
RSQLite 2.1.1 2018-05-06 [1] CRAN (R 3.5.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.2)
rsvd 1.0.0 2018-11-06 [1] CRAN (R 3.5.0)
rtracklayer 1.42.2 2019-03-01 [1] Bioconductor
Rtsne 0.15 2018-11-10 [1] CRAN (R 3.5.0)
S4Vectors * 0.20.1 2018-11-09 [1] Bioconductor
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.0)
sctransform * 0.0.0.900 2019-04-08 [1] Github (satijalab/sctransform@4ab830c)
SDMTools 1.1-221 2014-08-05 [1] CRAN (R 3.5.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0)
Seurat * 3.0.0.9000 2019-03-30 [1] Github (satijalab/seurat@6d361c0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
SummarizedExperiment * 1.12.0 2018-10-30 [1] Bioconductor
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.5.2)
tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.2)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.5.2)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.0)
tsne 0.1-3 2016-07-15 [1] CRAN (R 3.5.0)
usethis * 1.4.0 2018-08-14 [1] CRAN (R 3.5.0)
VariantAnnotation * 1.28.13 2019-03-19 [1] Bioconductor
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
XML 3.98-1.19 2019-03-06 [1] CRAN (R 3.5.2)
XVector * 0.22.0 2018-10-30 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0)
zlibbioc 1.28.0 2018-10-30 [1] Bioconductor
zoo 1.8-5 2019-03-21 [1] CRAN (R 3.5.2)

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

Usage of HoneyBADGER on 10x data?

Hi! I installed the HoneyBADGER on MacOS with R 3.5.0. Afterwards went though tutorial and everything worked fine. My main question: is the package suitable for 3' focused 10X data? Are there some settings adjustment for such data analysis?
I tried to apply the tool usage on 10X sample with gene counts CPM normalized and log2 adjusted (~2500 cells). However, several issues were introduced. The visualisation function hb$plotGexpProfile() got stuck and did not work in RStudio. Then I tried to make a PDF and this was OK, however did not work for PNG or SVG generation. Further the analysis got stuck on function hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE), it took more than 3 hours and R session crashed. What could’ve been the problem? RAM limits? I am going to try rerunning it on Linux cluster as well, but thought maybe this could be improved from settings control.

Package exports object named ref

To reproduce:

ref <- c(1)
library(HoneyBadger)

Attaching package: ‘HoneyBADGER’

The following object is masked by ‘.GlobalEnv’:

ref

lt$setGexpDev

I am experiencing problem while executing the setGexpDev function with my data. I think the error might be at the step: gexp.sd <- sd(gexp.norm). The function, in standalone setting, works when I convert the gexp.norm into a matrix.

lt$setMvFit(verbose=TRUE)
Modeling expected variance ... Done!
lt$setGexpDev(verbose=TRUE)
Error in is.data.frame(x) :
(list) object cannot be coerced to type 'double'
setGexpDev(as.matrix(lt$gexp.norm))
1
0.1507096
lt$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)
Error in -dev : invalid argument to unary operator

Error in calcGexpCnvBoundaries when running with numeric chromosome names

Hello,

I am testing the Honeybadger package and we have mapping to an assembly that has chromosome names without the "chr" so I modified setGexpMats to not add on "chr" to the chrom names and then tested running:

hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE, chrs=as.character(1:22))

It failed and I found that the chrs parameter is not passed to the the function again at the recursion step, same thing for all params that are not default in the original call to the function. This is the part where it fails:

    ## Recursion

    ##print('Recursion for Group1')

    if(length(g1)>=3) {

        tryCatch({
            calcGexpCnvBoundaries(gexp.norm.sub=gexp.norm[, g1])
        }, error = function(e) { cat(paste0("ERROR: ", e)) })

    }

    ##print('Recursion for Group2')

    if(length(g2)>=3) {

        tryCatch({
            calcGexpCnvBoundaries(gexp.norm.sub=gexp.norm[, g2])
        }, error = function(e) { cat(paste0("ERROR: ", e)) })

    }

Another comment - I need to run the package on a secure server with no internet access, so it would be preferred to be able to pass a data frame with gene annotation instead of the mart object. I made my own function that takes the gos data.frame instead of the mart.obj as input and at the same time changed the chromosome names. Which led me to this new problem...

unable to install HoneyBADGER packge

Dear Jean,
I have problems to install HoneyBADGER packge. I tried using your recommended code lines:

require(devtools)
devtools::install_github('JEFworks/HoneyBADGER')

and get the following error:

> require(devtools)
Loading required package: devtools
Warning message:
package ‘devtools’ was built under R version 3.4.3 
> devtools::install_github('JEFworks/HoneyBADGER')
Downloading GitHub repo JEFworks/HoneyBADGER@master
from URL https://api.github.com/repos/JEFworks/HoneyBADGER/zipball/master
Installing HoneyBADGER
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  "C:/Users/Hassan/AppData/Local/Temp/RtmpEl8p7L/devtools18d0487b12d8/JEFworks-HoneyBADGER-28b51c9"  \
  --library="C:/Program Files/R/R-3.4.1/library" --install-tests 

ERROR: dependencies 'GenomicRanges', 'ChIPseeker' are not available for package 'HoneyBADGER'
* removing 'C:/Program Files/R/R-3.4.1/library/HoneyBADGER'
Installation failed: Command failed (1)

Do you have any suggestion why I am getting this?

Thanks a lot, HM

jefworks-lab / honeybadger Goto Github PK

honeybadger's People

Contributors

Stargazers

Watchers

Forkers

honeybadger's Issues

Recommend Projects

Recommend Topics

Recommend Org