Comments (9)
Hi, it might be related to the warning: "RuntimeWarning: divide by zero encountered in log". Before going further, you can try a few diagnostics:
- Switch to single-CPU mode by setting -p 1 and see whether the same error occurs.
- Try it in a Python 3 environment, although that seems unlikely to be the cause.
Alternatively, I wonder which species you are working on. If human, I would normally pile up a list of candidate SNPs, e.g., from the 1000 Genomes Project, to avoid potential contamination by RNA editing variants.
from cellsnp.
I tried all of these suggestions, and none of them helped. Would you mind explaining a little what exactly this part of the code does?
from cellsnp.
One possibility is that your reads have some bases with the lowest quality, hence Q=0:
https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/QualityScoreEncoding_swBS.htm
I'm not sure whether this is the case. Can you provide more details from the logs? Does it fail from the beginning, or does it run for a while and then fail in the middle?
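As a minimal sketch of the encoding described at the link above: in the standard Phred+33 scheme, a quality character decodes to a score of `ord(char) - 33`, and a score `s` corresponds to an error probability of `10 ** (-s / 10)`. The helper name `phred33_to_error_prob` is hypothetical and is not part of cellSNP; whether cellSNP's `Q` denotes the score or the error probability is not stated in this thread.

```python
def phred33_to_error_prob(qual_char):
    """Decode one Phred+33 quality character into (score, error probability)."""
    score = ord(qual_char) - 33           # '!' is ASCII 33, i.e. score 0
    p_err = 10.0 ** (-score / 10.0)       # score 0 gives an error probability of 1.0
    return score, p_err


# A few example quality characters, from the lowest score upward.
for ch in "!+5?I":
    score, p_err = phred33_to_error_prob(ch)
    print(ch, score, p_err)
```

Under this encoding, the lowest-quality base (score 0) has error probability 1.0, so a term such as log(1 - p) would hit log(0) for it.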
from cellsnp.
The log I provided at the very beginning is all the information it outputs. If Q=0 is the cause, would it be possible to release a hot-fix to guard against this situation?
from cellsnp.
OK, I have added a minQ to avoid a 0 value:
https://github.com/huangyh09/cellSNP/blob/master/cellSNP/utils/pileup_utils.py#L100
Could you re-install from this repo (not on PyPI yet) and see if it works? You may need to pip uninstall cellSNP first.
from cellsnp.
I double-checked the code: after pip uninstall and pip install, the installed copy still did not contain the line you modified in your branch. I added this line manually and will try again.
from cellsnp.
After trying various ways to update, and confirming that the code includes this patch, cellSNP still fails with the same message as at the very beginning:
[cellSNP] mode 1: fetch given SNPs in 23111 single cells.
[cellSNP] loading the VCF file for given SNPs ...
[cellSNP] fetching 36312488 candidate variants ...
Traceback (most recent call last):
File "/net/1000g/fanzhang/bin/miniconda2/bin/cellSNP", line 9, in <module>
load_entry_point('cellSNP==0.1.4', 'console_scripts', 'cellSNP')()
File "/net/1000g/fanzhang/bin/miniconda2/lib/python2.7/site-packages/cellSNP-0.1.4-py2.7.egg/cellSNP/cellSNP.py", line 209, in main
min_MAPQ, max_FLAG, min_LEN, doubletGL, True)
File "/net/1000g/fanzhang/bin/miniconda2/lib/python2.7/site-packages/cellSNP-0.1.4-py2.7.egg/cellSNP/utils/pileup_utils.py", line 240, in fetch_positions
qual_list, cell_list, UMIs_list, barcodes)
File "/net/1000g/fanzhang/bin/miniconda2/lib/python2.7/site-packages/cellSNP-0.1.4-py2.7.egg/cellSNP/utils/pileup_utils.py", line 309, in map_barcodes
qual_cells[_idx][BASE_IDX[_base]] += qual_vector(_qual)
File "/net/1000g/fanzhang/bin/miniconda2/lib/python2.7/site-packages/cellSNP-0.1.4-py2.7.egg/cellSNP/utils/pileup_utils.py", line 101, in qual_vector
RV = [np.log(1-Q), np.log(3/4 - 2/3*Q), np.log(1/2 - 1/3*Q), np.log(Q)]
RuntimeWarning: divide by zero encountered in log
I think this means Q should be kept away from more values than just 0. Is it possible to explain how these variables map to the formula in your paper?
Thanks!
from cellsnp.
That's very strange. I have set a lower bound of 0.00001. This is a feature under development and has been used for donor deconvolution.
In principle, you can disable parts of this code for diagnosis. Alternatively, you can send me a few reads from your BAM file, and I may understand this issue better. Thanks.
from cellsnp.
I mean the warning is not necessarily caused by the term np.log(Q). I think np.log(1-Q), np.log(3/4 - 2/3*Q), and np.log(1/2 - 1/3*Q) could also cause the problem, especially log(1-Q), because base quality 1 is also very common.
from cellsnp.
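To make the last point concrete, here is a hedged sketch of the four log terms from the traceback with `q` clipped on both sides. The function name `log_terms`, the `eps` default, and the upper clip at `1 - eps` are assumptions for illustration and are not cellSNP's actual patch. It uses `math.log` in place of `np.log` to stay dependency-free, and float literals so that division behaves the same on Python 2 and Python 3.

```python
import math


def log_terms(q, eps=1e-5):
    """Evaluate the four log terms from the traceback with q clipped to [eps, 1 - eps].

    eps and the two-sided clip are illustrative assumptions, not cellSNP's code.
    """
    q = min(max(q, eps), 1.0 - eps)
    return [
        math.log(1.0 - q),                     # zero argument only if q == 1
        math.log(3.0 / 4.0 - 2.0 / 3.0 * q),   # argument stays >= 1/12 for q in [0, 1]
        math.log(1.0 / 2.0 - 1.0 / 3.0 * q),   # argument stays >= 1/6 for q in [0, 1]
        math.log(q),                           # zero argument only if q == 0
    ]


# Both boundary values and a midpoint now give finite results.
for q in (0.0, 0.5, 1.0):
    print(q, log_terms(q))
```

With true division and `q` in [0, 1], only `log(q)` and `log(1 - q)` can reach log(0), so a lower bound alone leaves the `q = 1` case open. Separately, on Python 2 without `from __future__ import division`, integer expressions such as `3/4`, `2/3`, and `1/2` evaluate to 0, which would make the two middle terms log(0) for any `q`; whether the installed file has that import is not shown in this thread.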