
Comments (4)

huangyh09 commented on June 16, 2024

Hi Lukas,

Thanks for following up on the discussion here. It is indeed a good idea to test a synthetic mixture before the real experiments.

First, I think samtools merge can't resolve the issue that cell barcodes overlap between samples, so a merged cell barcode may refer to multiple cells from different donors. That was the main motivation for writing the Python simulator, which mainly manipulates the cell barcode by adding the donor id (as the lane id), so it will be barcode-1, barcode-2, etc. If you already keep the donor id in the lane id, then I guess samtools should be fine.
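To illustrate the barcode collision described above, here is a minimal sketch in plain Python. It only manipulates barcode strings, not BAM records, and the function names `retag_barcode` and `pool_barcodes` are hypothetical; it is not taken from synth_pool.py and may differ from what that script does internally.

```python
def retag_barcode(barcode: str, donor_index: int) -> str:
    """Replace the trailing lane suffix of a cell barcode with a donor index."""
    # Drop an existing "-1" style suffix, if any, then append the donor index.
    core = barcode.rsplit("-", 1)[0]
    return f"{core}-{donor_index}"


def pool_barcodes(samples):
    """Pool per-sample barcode lists, tagging each barcode with its donor index."""
    pooled = []
    for donor_index, barcodes in enumerate(samples, start=1):
        pooled.extend(retag_barcode(bc, donor_index) for bc in barcodes)
    return pooled


# Two samples that happen to share one barcode sequence (made-up examples).
sample_a = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"]
sample_b = ["AAACCTGAGAAACCAT-1", "AAACGGGTCTTTACGT-1"]

naive = sample_a + sample_b                   # what a plain merge would see
pooled = pool_barcodes([sample_a, sample_b])  # donor id stored in the suffix

print(len(set(naive)), len(set(pooled)))
```

With a plain concatenation, the shared barcode collapses into one entry even though it comes from two different cells; after retagging, all four cells stay distinct.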

Our Python simulator requires cell barcodes so that it can avoid parsing the whole whitelist, as most entries are empty droplets. There should be one input barcode file per BAM file. For example, if you are pooling 4 samples, it will be -b path/barcode_file1.tsv,path/barcode_file2.tsv,path/barcode_file3.tsv,path/barcode_file4.tsv in the same order as the BAM files. BTW, the barcode files need to be plain text, not gzipped (sorry, it was developed with Cell Ranger v2).

Once you have pooled the BAM files properly, you can run cellSNP with a list of SNPs and cell barcodes; it should run in mode 1.

Let me know if anything is unclear.
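Since the barcode files have to be plain text and passed as one comma-separated value, a small helper can prepare them. This is a sketch using only the Python standard library; the helper names `gunzip_barcodes` and `barcode_argument` and the example paths are assumptions for illustration, not part of the simulator.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_barcodes(gz_path: Path) -> Path:
    """Write a plain-text copy of a gzipped barcode file next to the original."""
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # "barcodes.tsv.gz" -> "barcodes.tsv"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def barcode_argument(gz_paths) -> str:
    """Decompress each file and join the plain-text paths with commas.

    The order of gz_paths must match the order of the BAM files.
    """
    return ",".join(str(gunzip_barcodes(Path(p))) for p in gz_paths)
```

For example, `barcode_argument(["s1/barcodes.tsv.gz", "s2/barcodes.tsv.gz"])` would return `"s1/barcodes.tsv,s2/barcodes.tsv"`, which can then be passed to -b.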

Yuanhua

from cellsnp.

lmweber commented on June 16, 2024

Ok, thank you! I will try to merge the BAM files correctly, and then get back to you again if I have any further questions.


lmweber commented on June 16, 2024

Hi Yuanhua, thanks a lot for the reply, and for providing the synth_pool.py script. This makes sense -- so the way I was trying to merge the BAM files was too simple.

I have now tried running synth_pool.py together with the Cell Ranger BAM files and (gunzipped) filtered barcode lists per sample, to merge the BAM files, but now I am getting the following error:

Traceback (most recent call last):
  File "../synth_pool/synth_pool.py", line 266, in <module>
    main()
  File "../synth_pool/synth_pool.py", line 220, in main
    load_sample=False)
  File "/users/lweber/.local/lib/python3.7/site-packages/cellSNP/utils/vcf_utils.py", line 70, in load_VCF
    if vcf_file[-3:] == ".gz" or vcf_file[-4:] == ".bgz":
TypeError: 'NoneType' object is not subscriptable

If I understand this correctly, it is because I did not provide a VCF file to the --regionFile argument. However, if I provide a VCF file, then the script will return a BAM file containing only reads covering the given variants -- but I am trying to merge the BAM files and keep all reads.

Is there a way to run the script to merge BAM files without the --regionFile argument? (Sorry I am not so proficient at Python, otherwise I could probably just edit the script.)
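The traceback above can be reproduced in isolation. The failing line slices `vcf_file`, and slicing `None` raises exactly this TypeError. The sketch below is an illustration only: `is_gzipped_name` is a hypothetical helper that mirrors the condition shown in the traceback, and the explicit `None` check is an assumed guard, not code from cellSNP.

```python
from typing import Optional


def is_gzipped_name(vcf_file: Optional[str]) -> bool:
    """Mirror the suffix check from the traceback, with an explicit None guard."""
    if vcf_file is None:
        # Assumed guard: fail with a clearer message than the raw TypeError.
        raise ValueError("no VCF file given; the script expects --regionFile")
    return vcf_file[-3:] == ".gz" or vcf_file[-4:] == ".bgz"


# Reproduce the original failure: slicing None is not allowed.
vcf_file = None
try:
    vcf_file[-3:]
except TypeError as err:
    message = str(err)
    print(message)
```

A guard like this only improves the error message; it does not make the script work without a list of SNPs.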


huangyh09 commented on June 16, 2024

Hi Lukas, your understanding is correct. This code indeed requires --regionFile, as it parses only the reads that cover a list of SNPs (see the code). You could change that section of the code to cover whole chromosome(s) instead. However, this strategy is probably not fast, and it would be difficult to handle many settings.

Alternatively, if you can get the initial cell barcodes to carry the sample id when running cellranger (not sure if --sample or --lanes can do this?), then you can use samtools merge, which might be more efficient for pooling whole BAM files.

Yuanhua

