Comments (4)
Hi Lukas,
Thanks for following up on the discussion here. It is indeed a good idea to test a synthetic mixture before the real experiments.
First, I think samtools merge can't resolve the problem that cell barcodes overlap between samples, so a merged cell barcode may refer to multiple cells from different donors. That was the main motivation for writing the Python simulator, which mainly manipulates the cell barcode by adding the donor id (as the lane id), so the barcodes become barcode-1, barcode-2, etc. If you already keep the donor id in the lane id, then I guess samtools should be fine.
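To illustrate the barcode relabelling described above, here is a minimal sketch that works on SAM text lines. It is not the actual synth_pool.py code: the helper name `retag_barcode` and the example reads are hypothetical, and it assumes the barcode is stored in the `CB:Z:` tag with a `-1` style suffix, as in Cell Ranger output.

```python
def retag_barcode(sam_line, donor_index):
    """Replace the suffix of the CB tag with the donor index (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; optional tags follow.
    for i in range(11, len(fields)):
        if fields[i].startswith("CB:Z:"):
            core = fields[i][len("CB:Z:"):].split("-")[0]
            fields[i] = f"CB:Z:{core}-{donor_index}"
    return "\t".join(fields)


# Two made-up reads from different donors that share the same barcode.
read_a = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCTGA-1"
read_b = "r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCTGA-1"

# After relabelling, the donor of each read is recoverable from the suffix.
merged = [retag_barcode(read_a, 1), retag_barcode(read_b, 2)]
for line in merged:
    print(line.split("\t")[-1])
```

Without this step, both reads would carry the same barcode after merging and would be treated as one cell.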
For our Python simulator, cell barcodes are required to avoid parsing the whole whitelist, as most entries are empty droplets. The input barcode files should be given one per BAM file. For example, if you are pooling 4 samples, it will be -b path/barcode_file1.tsv,path/barcode_file2.tsv,path/barcode_file3.tsv,path/barcode_file4.tsv in the same order as the BAM files. BTW, the barcode files need to be plain text, not gzipped (sorry, it was developed with Cell Ranger v2).
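As a rough sketch of preparing the -b value, the snippet below joins the barcode file paths in the given order and rejects any file that still looks gzipped. The function names `is_gzipped` and `build_barcode_arg` are hypothetical and not part of cellSNP or synth_pool.py; the check simply looks at the two gzip magic bytes.

```python
import gzip
import os
import tempfile


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def build_barcode_arg(barcode_files):
    """Return the comma-joined value for -b, keeping the given order.

    The caller is responsible for listing the files in the same order
    as the BAM files.
    """
    for path in barcode_files:
        if is_gzipped(path):
            raise ValueError(f"{path} looks gzipped; decompress it first")
    return ",".join(barcode_files)


# Small demonstration with temporary files (made-up barcodes).
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "barcode_file1.tsv")
    with open(plain, "w") as handle:
        handle.write("AAACCTGA-1\n")
    zipped = os.path.join(tmp, "barcode_file2.tsv.gz")
    with gzip.open(zipped, "wt") as handle:
        handle.write("AAACCTGA-1\n")
    print(is_gzipped(plain), is_gzipped(zipped))
```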
Once you have pooled the BAM files properly, you can run cellSNP with a list of SNPs and cell barcodes; it should then run in mode 1.
Let me know if anything is unclear.
Yuanhua
from cellsnp.
Ok, thank you! I will try to merge the BAM files correctly, and then get back to you again if I have any further questions.
Hi Yuanhua, thanks a lot for the reply, and for providing the synth_pool.py script. This makes sense -- so the way I was trying to merge the BAM files was too simple.
I have now tried running synth_pool.py together with the Cell Ranger BAM files and (gunzipped) filtered barcode lists per sample, to merge the BAM files, but now I am getting the following error:
Traceback (most recent call last):
  File "../synth_pool/synth_pool.py", line 266, in <module>
    main()
  File "../synth_pool/synth_pool.py", line 220, in main
    load_sample=False)
  File "/users/lweber/.local/lib/python3.7/site-packages/cellSNP/utils/vcf_utils.py", line 70, in load_VCF
    if vcf_file[-3:] == ".gz" or vcf_file[-4:] == ".bgz":
TypeError: 'NoneType' object is not subscriptable
If I understand this correctly, it is because I did not provide a VCF file to the --regionFile argument. However, if I provide a VCF file, then the script will return a BAM file containing only reads covering the given variants -- but I am trying to merge the BAM files and keep all reads.
Is there a way to run the script to merge BAM files without the --regionFile argument? (Sorry I am not so proficient at Python, otherwise I could probably just edit the script.)
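For context on the TypeError above: slicing `None` fails, which is what happens when no path reaches the extension check on line 70 of the traceback. The sketch below shows the same check with an explicit guard that fails with a clearer message. The function name `is_compressed_vcf` is hypothetical and this is not the cellSNP implementation; it does not make the script work without a region file.

```python
def is_compressed_vcf(vcf_file):
    """Return True if the path ends in .gz or .bgz (hypothetical helper)."""
    if vcf_file is None:
        # Fail early instead of hitting "'NoneType' object is not subscriptable".
        raise ValueError("no VCF path given; --regionFile appears to be required")
    return vcf_file.endswith((".gz", ".bgz"))


for candidate in ("snps.vcf.gz", "snps.vcf"):
    print(candidate, is_compressed_vcf(candidate))
```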
Hi Lukas, your understanding is correct. This code indeed requires --regionFile, as it parses reads by covering a list of SNPs (see the code). You could change this section of the code to cover the whole chromosome(s). However, this strategy is probably not fast, and it is difficult to handle many settings.
Alternatively, if you add the sample id to the initial cell barcodes when running cellranger (not sure if --sample or --lanes can do this?), then you can use samtools merge, which might be more efficient for pooling the whole BAM files.
Yuanhua