Coder Social home page Coder Social logo

Comments (13)

JEFworks avatar JEFworks commented on August 19, 2024

Hi,

The error is caused by your snps variable containing a genomic ranges coordinate with seqlevel ‘NC_007605’ that is not present in your bam.

Can you please try the getSnpMats10X() function instead if you have not done so already? I have previously encountered this issue and thought I had addressed it by filtering to relevant seqlevels in getSnpMats10X().

Also, please double check that your bam is aligned to the same genome version as your snps. Note the built-in ExAC snps are from hg19.

Best,
Jean

from honeybadger.

helianthuszhu avatar helianthuszhu commented on August 19, 2024

Hi, Jean,
Thanks for your reply. Based on your suggestion, I tried to use getSnpMats10X() command line, unfortunately also failed and got the same errors. I am sure my bam file were aligned from hg19 genome version which means it can be matched with the snps from ExAC just in the honeyBADGER package. So, do you have any other suggestion for me? Thanks.

from honeybadger.

JEFworks avatar JEFworks commented on August 19, 2024

Hum, in that case, please try the following:

# load ExAC chromosome 1 snps
load(system.file("ExAC", "ExAC_chr1.RData", package = "HoneyBADGER"))
head(snps)
seqlevels(snps)

# filter for only those with seqlevels in canonical chromosomes
chrs <- c(as.character(1:22), "X", "Y")
seqlevels(snps) <- chrs
head(snps)
seqlevels(snps)

Now you should be able to do

results <- getSnpMats10X(snps, bamFiles, indexFiles, barcodes)

If it works, I can update the getSnpMats10X() to do this by default.

from honeybadger.

helianthuszhu avatar helianthuszhu commented on August 19, 2024

Hi, Jean,
Thanks again. This time I got a different error like this in the pileup step:
[1] "Getting pileup..."
Error in validObject(.Object) :
invalid class “ScanBamParam” object: 'tagFilter' must contain only non-NULL, non-NA, non-empty character or integer values.
Does it sound like the bam file is lack of something? Or should I filter or modify my bam file?
Thanks.

from honeybadger.

JEFworks avatar JEFworks commented on August 19, 2024

Did you use CellRanger to get your bam? Do you know which version of CellRanger was used? It looks like your reads may be missing the 'CB' tag, which is used to assign each read to a cell barcode. Or perhaps CellRanger updated their tag name.

For example, here is a read from a 10X bam. Note the CB tag.

> samtools view aml027_post_transplant_possorted_genome_bam.bam | less
NB500915:165:HYLMFBGXX:3:23402:11145:13415      16      1       10544   3       68M30S  *       0       0       AAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCCCCATGTACTCTGCGTTGATACCACTGCTT      <EE/EEE6EEE6EEAE6EEA/AEE<EEE/EAEEEEE<EEAAEEAEEAAEAEEE/EEEEEEEEEEEEEEEEEEE6EAEEEEEAE/EEEEEEEEEAAAA/      NH:i:2  HI:i:1  AS:i:64 nM:i:1  NM:i:1  CR:Z:TAAAGACTTCTAGG     CQ:Z:AAAAA6EEEEEEEE     CB:Z:TAAAGACTTCTAGG-2   UR:Z:TCCTGTCGTT UQ:Z:AAAAAEEE/A UB:Z:TCCTGTCGTT BC:Z:CTATACGC   QT:Z:EEEAAAAA

from honeybadger.

helianthuszhu avatar helianthuszhu commented on August 19, 2024

from honeybadger.

JEFworks avatar JEFworks commented on August 19, 2024

As long as the CB tag is present in all reads, it should be fine. To rule out the possibility of this being an issue with R/RSamtools, can you please try the python version for getting the allele counts information: https://github.com/barkasn/scAlleleCount

from honeybadger.

helianthuszhu avatar helianthuszhu commented on August 19, 2024

from honeybadger.

joeymays avatar joeymays commented on August 19, 2024

Hum, in that case, please try the following:

# load ExAC chromosome 1 snps
load(system.file("ExAC", "ExAC_chr1.RData", package = "HoneyBADGER"))
head(snps)
seqlevels(snps)

# filter for only those with seqlevels in canonical chromosomes
chrs <- c(as.character(1:22), "X", "Y")
seqlevels(snps) <- chrs
head(snps)
seqlevels(snps)

Now you should be able to do

results <- getSnpMats10X(snps, bamFiles, indexFiles, barcodes)

If it works, I can update the getSnpMats10X() to do this by default.

Following up on this part of the discussion, I was having a similar problem where the snps object contained genomic ranges corresponding to GL unlocalized sequences. Filtering on canonical seqlevels as suggested above solved the issue for me.

from honeybadger.

pangxueyu233 avatar pangxueyu233 commented on August 19, 2024

Hello, Jean
it's appreciated working to analysis CNV in single cell transcriptome level. and recently, I used getSnpMats10X() to import our 10x data, and I found some problems as following:

that's the code of getSnpMats10X():
getSnpMats10X <- function (snps, bamFiles, indexFiles, barcodes, n.cores = 1,
verbose = FALSE)
{
cov <- getCoverage(snps, bamFile, indexFile, verbose)
if (verbose) {
print("Snps with coverage:")
print(table(cov > 0))
}
vi <- cov > 0
if (sum(vi) == 0) {
print("ERROR: NO SNPS WITH COVERAGE. Check if snps and bams are using the same alignment reference.")
return(NULL)
}
......

And have you noticed that parameters are not paired in following function
getSnpMats10X <- function (snps, bamFiles, indexFiles, barcodes, n.cores = 1,
verbose = FALSE)
{....}

and following function like getCoverage() used bamFile and indexFile without 's'
cov <- getCoverage(snps, bamFile, indexFile, verbose)

So, I just fixed it up by remove 's' in code of getSnpMats10X. Hope that's right for following analysis.

from honeybadger.

ccruizm avatar ccruizm commented on August 19, 2024

Hello @JEFworks!
Thanks for this amazing tools. I have been trying to use it in my own data (scRNA 10x V3 - CellRanger 3.0.2) but have had some issues. In the beginning I use the function getCellAlleleCount and had the same issue posted by @zhuxqdoctor (also using the snps matrix from the HoneyBadger). Then after reading this thread I have followed your recommendations to read in 10x data. However, once I run the command getSnpMats10X, I got this error:

Error in getCoverage(snps, bamFile, indexFile, verbose): object 'bamFile' not found
Traceback:

1. getSnpMats10X(snps, bamFiles, indexFiles, barcodes, n.cores = 12)
2. getCoverage(snps, bamFile, indexFile, verbose)
3. pileup(file = bamFile, index = indexFile, scanBamParam = ScanBamParam(which = gr), 
 .     pileupParam = pp)

I really do not understand why it is happening because I am point out to the correct path to the file. Then, I checked the bam file generated by CellRanger and noticed that not all reads have the CB tag. Do you think this can be the source of the problem? Should I filter the bam file manually?

Thanks in advance for your help!

from honeybadger.

JEFworks avatar JEFworks commented on August 19, 2024

Hi @pangxueyu233 ,
Thanks so much for the catch. The bug has been fixed in commit 6c77f05

@ccruizm Please follow @pangxueyu233 's fix for the bug or copy the updated function with the correct parameter names:

getSnpMats10X <- function(snps, bamFile, indexFile, barcodes, n.cores=1, verbose=FALSE) {
  
  ## loop
  cov <- getCoverage(snps, bamFile, indexFile, verbose)
  
  ## any coverage?
  if(verbose) {
    print("Snps with coverage:")
    print(table(cov>0))
  }
  vi <- cov>0
  if(sum(vi)==0) {
    print('ERROR: NO SNPS WITH COVERAGE. Check if snps and bams are using the same alignment reference.')
    return(NULL)
  }
  cov <- cov[vi]
  snps.cov <- snps[vi,]
  
  if(verbose) {
    print("Getting allele counts...")
  }
  alleleCount <- getCellAlleleCount(snps.cov, bamFile, indexFile, barcodes, verbose=TRUE, n.cores=n.cores)
  refCount <- alleleCount[[1]]
  altCount <- alleleCount[[2]]
  
  ## check correspondence
  if(verbose) {
    print("altCount + refCount == cov:")
    print(table(altCount + refCount == cov))
    print("altCount + refCount < cov: sequencing errors")
    print(table(altCount + refCount < cov))
    ##vi <- which(altCount + refCount != cov, arr.ind=TRUE)
    ## some sequencing errors evident
    ##altCount[vi]
    ##refCount[vi]
    ##cov[vi]
  }
  
  results <- list(refCount=refCount, altCount=altCount, cov=cov)
  return(results)
}

from honeybadger.

davidcoffey avatar davidcoffey commented on August 19, 2024

As long as the CB tag is present in all reads, it should be fine. To rule out the possibility of this being an issue with R/RSamtools, can you please try the python version for getting the allele counts information: https://github.com/barkasn/scAlleleCount

I had the same issue as reported by @helianthuszhu. I am using Cellranger 3.0 and confirmed that the CB tags are present in my BAM files. I identified the problem to be within the getCellAlleleCount function. The tf variable must be a character vector within a named list. Instead, it is a factor within a named list. So to solve the problem, change tf <- list(cell) to tf <- list(as.character(cell)) within the getCellAlleleCount function.

from honeybadger.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.