demuxlet's Introduction

Dependencies
------------

On Debian-type systems (including Ubuntu), add the following packages if they are not already installed (or have your admin add them if you do not have permission):

sudo apt-get install g++ libssl-dev libcurses-perl zlib1g-dev

Building
--------

To compile, from the top level directory, type: make
To test (after compiling), from the top level directory, type: make test

Under the main statgen repository, there are: 
* lib - the library code
** The library code is compiled into libStatGen.a which, after compiling, is located at: statgen/lib/libStatGen.a
** After compiling, library headers can all be found in: statgen/lib/include
* src - the tools we developed
** After compiling, the executables are found in: statgen/src/bin
* scripts - the scripts we developed

Makefiles
---------
statgen/Makefile.include should contain the definitions that you need for creating software using this library.

statgen/lib/Makefile.lib and statgen/lib/Makefile.src can be used as templates for creating Makefiles for new software.  If possible, just link to them.  They look for a file called Makefile.tool that should be all you need to update for your specific software.  (both Makefiles automatically include Makefile.include)

A similar setup should be used for test code, linking your Makefile to statgen/Makefile.test and defining Makefile.testtool for your specific test.
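For illustration, a hypothetical Makefile for an external tool built this way (paths follow the layout above; the tool name and the extra libraries linked are assumptions, not part of the templates):

```make
include statgen/Makefile.include      # shared definitions, as described above

CXXFLAGS += -I statgen/lib/include    # library headers (layout above)

# "mytool" is a made-up example target linked against the compiled library.
mytool: mytool.o
	$(CXX) $(CXXFLAGS) -o $@ $^ statgen/lib/libStatGen.a -lz -lssl -lcrypto
```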

Other Notes
-----------
* Typically the .o files are compiled into their own directory called obj.

demuxlet's People

Contributors

emccarthy23, gvaihir, hyunminkang, nh13, thyao, yimmieg


demuxlet's Issues

install error

Hi,

I keep getting the following error when running ./configure:

./configure: line 5240: syntax error near unexpected token `2.2'
./configure: line 5240: `LT_PREREQ(2.2)'

And I checked my config.log; the error occurred here:

gcc version 9.1.0 (Ubuntu 9.1.0-2ubuntu2~16.04)
configure:3112: $? = 0
configure:3101: g++ -V >&5
g++: error: unrecognized command line option '-V'
g++: fatal error: no input files
compilation terminated.
configure:3112: $? = 1
configure:3101: g++ -qversion >&5
g++: error: unrecognized command line option '-qversion'; did you mean '--version'?
g++: fatal error: no input files
compilation terminated.

Could you help figure out what might be the reason? Thanks.

FATAL ERROR - [bcf_filtered_reader.cpp:5 void BCFFilteredReader::init_params()] bcf_file_name is empty

What might be the reason for the following FATAL ERROR:

$ which demuxlet
/path-to/demuxlet-20180509/bin/centos/demuxlet
$ demuxlet

Available Options

The following parameters are available. Ones with "[]" are in effect:
   Options for input SAM/BAM/CRAM : --sam, --tag-group [CB], --tag-UMI [UB]
        Options for input VCF/BCF : --vcf, --field [GP], --geno-error [0.01],
                                    --min-mac [1], --min-callrate [0.50], --sm,
                                    --sm-list
                   Output Options : --out, --alpha, --write-pair,
                                    --doublet-prior [0.50],
                                    --sam-verbose [1000000],
                                    --vcf-verbose [10000]
           Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                    --min-MQ [20], --min-TD,
                                    --excl-flag [3844]
   Cell/droplet filtering options : --group-list, --min-total, --min-uniq,
                                    --min-snp

Run with --help for more detailed help messages of each argument.


FATAL ERROR - 
[bcf_filtered_reader.cpp:5 void BCFFilteredReader::init_params()] bcf_file_name is empty

terminate called after throwing an instance of 'pException'
  what():  Exception was thrown
Aborted

Naive question about VCF file

Hi everyone,

Maybe a very naive question, but do you think it is possible to generate the VCF file necessary for demultiplexing from bulk RNA-seq data for each patient?

Thanks a lot,
nicolas

autoreconf may not run as is

At least for me, autoreconf expects ~/tmp and ./m4 to exist; otherwise it gives errors. Maybe add an empty ./m4 folder and add a note to the installation instructions about ~/tmp.
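A minimal workaround sketch, assuming the errors really do come only from the missing directories:

```shell
mkdir -p ./m4 ~/tmp    # create the directories autoreconf expects
# autoreconf -vfi      # then regenerate the build scripts as usual
```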

gpAB has a size limit (causes seg faults for large # snps and genotypes)

Specifically, even on a 64-bit system there is a limit on the array sizes. For example, on typical Intel systems, if

scl.nsnps * nv * nv * 9

is greater than the maximum for a 32-bit integer (2^32, or about 4.3 billion, for unsigned), the allocation fails. For example, something like scl.nsnps = 3M and nv = 20 causes segmentation faults.

Output SNP information for the barcodes

Hello,

The N.SNP in the best output now shows the number of SNPs that are covered for each barcode/droplet. Is there a way to get the actual positions of these SNPs?

vcf file for figure 2

It seems the vcf file provided in demuxlet_paper_code/fig2 is for the 32 samples in figure 3, but not for the 4-sample (W1 & W2) or 8-sample (W3) splits...

version of demuxlet that does not require external genotypes

Hi, about a year ago @hyunminkang mentioned that there would be an upgrade to the genotyping strategy.

It seems nothing has happened (?)

Has anyone tried genotyping on 10x data to deconvolute individual genotypes from pooled samples, and would something like this tool work? 10.1038/s41467-018-07170-5

Any suggestion is welcome
thanks!

Segmentation fault

Thanks for developing this tool. I am trying to get the program working on a subset of my data before running it on the complete dataset. I have extracted some 10X genomics reads mapping to chromosome 22 (test.bam) and run the following command:

demuxlet --sam test.sam --tag-group CB --tag-UMI UB --vcf pilot2.vcf --field GT --geno-error 0.01 --out test_run

This throws the following error:

Available Options

The following parameters are available. Ones with "[]" are in effect:
   Options for input SAM/BAM/CRAM : --sam [test.sam], --tag-group [CB],
                                    --tag-UMI [UB]
        Options for input VCF/BCF : --vcf [pilot2.vcf], --field [GT],
                                    --geno-error [0.01], --min-mac [1],
                                    --min-callrate [0.50], --sm, --sm-list
                   Output Options : --out [test_run], --alpha, --write-pair,
                                    --doublet-prior [0.50],
                                    --sam-verbose [1000000],
                                    --vcf-verbose [10000]
           Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                    --min-MQ [20], --min-TD,
                                    --excl-flag [3844]
   Cell/droplet filtering options : --group-list, --min-total, --min-uniq,
                                    --min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2019/03/11 14:31:53] - Finished identifying 8 samples to load from VCF/BCF

FATAL ERROR -
Segmentation fault

Samtools has no problem with the bam/sam files so I'm not sure why I am getting a segmentation fault. Any ideas what is going on?

Excessive read filtering

I am having an issue (or lack of understanding) with the read filtering. The documentation states that RD.TOTL specifies the count of reads overlapping variants and RD.PASS the number of reads passing quality filters. However, when I set all quality thresholds to floor values (as well as a range of other integers/floats where noted), an absurd number of reads are tossed from the analysis. Here is a snapshot of the *.best file:

BARCODE RD.TOTL RD.PASS RD.UNIQ N.SNP
AAACGAAGTAACGGAC 1528 1 1 1
AAACTCGAGAGGTCCA 30 2 2 1
AAACTCGGTAACGGTG 650 2 2 2
AAACTGCAGTCACGCC 106578 2 2 2
AAACTGCAGTTGCTTG 1024 2 2 1
AAACTGCCAGCTGATT 13348 1 1 1

The parameters set were:
demuxlet --sam alignments.bam --tag-group CB --tag-UMI UB --vcf genotypeData.vcf --field GT --geno-error 0.00001 --min-mac 1 --min-callrate 0.0001 --min-BQ 0 --min-MQ 0 --out test --min-total 0 --min-uniq 0 --min-snp 0

I should point out that I am analyzing paired-end scATAC-seq data. To make the UMI error go away, I added a fake tag to the bam file (UB:Z:xxxxxxxx) with a random 8-mer for each read. I suspect the issue is with the bam file. I have more than 9 million variants well covered (75% of reads) by the PE data. I am also unclear what the 'callrate' refers to. Changing the genotype error, minor allele frequency, and read/variant filtering parameters has negligible effect. I have two genotypes in the pooled bam file. Any advice or suggestions would be fantastic.

Thank you.

terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc Aborted

I am trying to run demuxlet on a big VCF file (~500 GB), but I am getting the following error, which may be due to running out of memory. Can you please suggest how I can get rid of it?

Here is my command:
/home/demuxlet --sam /home/possorted_genome_bam.bam --vcf /home/picard_sorted.vcf --field GT --min-mac 10 --min-uniq 4 --out mut

Here is the error:
NOTICE [2019/02/25 17:12:47] - Reading 522000000 reads at Y:2865101 and skipping 292213112
NOTICE [2019/02/25 17:13:27] - WARNING: Suppressed a total of 217241 UMI warnings...
NOTICE [2019/02/25 17:13:27] - WARNING: Suppressed a total of 8688106 droplet/cell barcode warnings...
NOTICE [2019/02/25 17:13:27] - Finished reading 13 markers from the VCF file
NOTICE [2019/02/25 17:13:27] - Total number input reads : 547992973
NOTICE [2019/02/25 17:13:27] - Total number valid droplets observed : 631538
NOTICE [2019/02/25 17:13:27] - Total number valid SNPs observed : 13
NOTICE [2019/02/25 17:13:27] - Total number of read-QC-passed reads : 230131249
NOTICE [2019/02/25 17:13:27] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2019/02/25 17:13:27] - Total number of non-skipped reads with considered barcodes : 230056564
NOTICE [2019/02/25 17:13:27] - Total number of gapped/noninformative reads : 230013896
NOTICE [2019/02/25 17:13:27] - Total number of base-QC-failed reads : 706
NOTICE [2019/02/25 17:13:27] - Total number of redundant reads : 1213
NOTICE [2019/02/25 17:13:27] - Total number of pass-filtered reads : 40749
NOTICE [2019/02/25 17:13:27] - Total number of pass-filtered reads overlapping with multiple SNPs : 16738
NOTICE [2019/02/25 17:13:27] - Starting to prune out cells with too few reads...
NOTICE [2019/02/25 17:13:27] - Finishing pruning out 0 cells with too few reads...
NOTICE [2019/02/25 17:13:27] - Starting to identify best matching individual IDs
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

Request for raw counts of figure 3D

Nat Biotechnol. 2018 Jan;36(1):89-94. doi: 10.1038/nbt.4042.
Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.
Kang HM et al.
https://www.nature.com/articles/nbt.4042

I would like the raw counts you used to produce the figure 3D
probability graphs.

Could you please put a simple ascii csv of the data on github?

Thanks!

Tom

Thomas D. Schneider, Ph.D.
Senior Investigator
National Institutes of Health
National Cancer Institute
Center for Cancer Research
RNA Biology Laboratory
Biological Information Theory Group
Frederick, Maryland 21702-1201
https://schneider.ncifcrf.gov (current link)
https://alum.mit.edu/www/toms (permanent link)

turn off doublet detection

Is there a way to just infer sample identity and turn off doublet detection to save computational cost for experiments with higher numbers of samples?

Demuxlet without UMIs

Hi authors!

I have some single-cell ATAC-seq data from 10X on which I would like to use demuxlet; however, there is no UMI barcode in ATAC-seq results. Is there a way to specify that there is no UMI? Or should it work anyway?

The solution we thought about is to give the read sequence as the UMI barcode; then every read except duplicates would be considered. However, this might be heavy as the sequences are very long.

Thank you very much!

installing Demuxlet

Hello,

I am having some trouble installing demuxlet on my Mac. I followed the instructions step by step from README2.md and everything ran fine until I tried to run $make after downloading demuxlet. Here is the error message I got:

3 warnings generated.
mv -f .deps/filter.Tpo .deps/filter.Po
/usr/local/Cellar/llvm/7.0.1/bin/clang++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT tsv_reader.o -MD -MP -MF .deps/tsv_reader.Tpo -c -o tsv_reader.o tsv_reader.cpp
mv -f .deps/tsv_reader.Tpo .deps/tsv_reader.Po
/usr/local/Cellar/llvm/7.0.1/bin/clang++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT cmd_cram_demuxlet.o -MD -MP -MF .deps/cmd_cram_demuxlet.Tpo -c -o cmd_cram_demuxlet.o cmd_cram_demuxlet.cpp
mv -f .deps/cmd_cram_demuxlet.Tpo .deps/cmd_cram_demuxlet.Po
make[1]: *** No rule to make target `../htslib/libhts.a', needed by `demuxlet'. Stop.
make: *** [all] Error 2

Can someone please help me? Thanks very much!

Siyao

VCF file throws "Not a VCF/BCF file" error

Hello,

I am trying to figure out why running demuxlet with my vcf file, "sorted_kit1_1inFrontv3.vcf", is throwing this error.

After merging several vcf files together, I sorted the vcf files with vcftools (command: vcf-sort) and edited the header to reflect the numbering of the chromosomes, as per earlier suggestions. However, when I try to use demuxlet, I get this error:

Options for input SAM/BAM/CRAM : --sam [/data/H1_possorted_genome_bam.bam],
                                   --tag-group [CB], --tag-UMI [UB]
       Options for input VCF/BCF : --vcf [/data/sorted_kit1_1inFrontv3.vcf],
                                   --field [GT], --geno-error [0.01],
                                   --min-mac [1], --min-callrate [0.50], --sm,
                                   --sm-list
                  Output Options : --out [/data/16p_demuxlet_H1], --alpha,
                                   --write-pair, --doublet-prior [0.50],
                                   --sam-verbose [1000000],
                                   --vcf-verbose [10000]
          Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                   --min-MQ [20], --min-TD,
                                   --excl-flag [3844]
  Cell/droplet filtering options : --group-list, --min-total, --min-uniq,
                                   --min-snp

Run with --help for more detailed help messages of each argument.


FATAL ERROR -
[bcf_chunked_reader.cpp:162 bool BCFChunkedReader::open_current_file()] Not a VCF/BCF file: /data/sorted_kit1_1inFrontv3.vcf


terminate called after throwing an instance of 'pException'
 what():  Exception was thrown

Does anyone know what might be causing this? Thank you.

multithreading?

Hi,

Thanks for the great software!

Is it multi threaded / multi process enabled? If so how do I use this feature? I wasn't able to find any docs on it.

Thanks!

Demuxlet on Single Nuclei

Hi,
Has anyone tried Demuxlet on single nuclei using 10x? Does it work?

Also, what are people using for genotyping of human samples?

Doublets with the same sample ID

Good day,

I'm receiving a huge proportion of doublets from the same sample under the "BEST" column in demuxlet results.

I've combined bam files from two single-cell experiments (let's call them Sample1 and Sample2) for which we have genotyping data. The majority of cells are assigned as "DBL-Sample1-Sample1" or "AMB-Sample1-Sample1-Sample1". The top singlet (SNG.1ST) corresponds to the right sample in 99% of the cases.

What is the cause of the DBL-Sample1-Sample1 calls and is there a way to circumvent this?

Any help greatly appreciated.

Nenad

GL vs PL for genotype likelihood

Freebayes outputs VCFs with genotype likelihoods in the GL format (log10-scaled, not phred-scaled) instead of PL, but I found that demuxlet failed to produce reasonable results using those as the field input. Do you have a script that converts GL to PL, or what is the expected input / what platform was used to create the input VCF files? Thanks

Demuxlet filtering out vast majority of reads that 10X deems good

Hi!

I've been trying to use demuxlet to demultiplex a mixture of cells from two donors as a test run. I have 10x data which was processed with CellRanger version 2.1.1, and the summary stats I'm getting appear quite good (Q30 Bases in Barcode: 97.5%, Q30 Bases in RNA Read: 79.1%, Reads mapped confidently to transcriptome: 72.8%), however demuxlet filters the vast majority of my data out.

I'm running demuxlet with the following (mostly default) input:
   Options for input SAM/BAM/CRAM : --sam [...], --tag-group [CB],
                                    --tag-UMI [UB]
        Options for input VCF/BCF : --vcf [...], --field [GT],
                                    --geno-error [0.01], --min-mac [1],
                                    --min-callrate [0.50], --sm, --sm-list
                   Output Options : --out [test], --alpha, --write-pair,
                                    --doublet-prior [0.50],
                                    --sam-verbose [1000000],
                                    --vcf-verbose [10000]
           Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                    --min-MQ [20], --min-TD,
                                    --excl-flag [3844]
   Cell/droplet filtering options : --group-list, --min-total [500],
                                    --min-uniq [500], --min-snp

During the run, the read QC output from demuxlet is giving me the following:

NOTICE [2018/05/01 08:46:14] - Total number input reads : 767538903
NOTICE [2018/05/01 08:46:14] - Total number of read-QC-passed reads : 321803293
NOTICE [2018/05/01 08:46:14] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2018/05/01 08:46:14] - Total number of non-skipped reads with considered barcodes : 298996268
NOTICE [2018/05/01 08:46:14] - Total number of gapped/noninformative reads : 295557137
NOTICE [2018/05/01 08:46:14] - Total number of base-QC-failed reads : 250806
NOTICE [2018/05/01 08:46:14] - Total number of redundant reads : 707077
NOTICE [2018/05/01 08:46:14] - Total number of pass-filtered reads : 2481248
NOTICE [2018/05/01 08:46:14] - Total number of pass-filtered reads overlapping with multiple SNPs : 178168

While CellRanger tells me 72.8% of my reads are mapped confidently to the transcriptome, only 2.5 million of the 767.5 million reads demuxlet finds get through the read QC. Could anyone tell me why this may be the case? Should I be setting the read QC parameters to be more lenient?

It appears a lot of my reads are deemed 'noninformative'. Could someone tell me what this is likely to mean?

Another curious aspect of the whole thing is that demuxlet appears to be finding more reads than CellRanger (767 million total reads quoted in demuxlet vs 714 million total reads in CellRanger web summary output).

I would be very grateful for some advice or a point in the right direction.

How to enable --pileup mode/feature?

Thanks for the pointer in #14 !

However, when I try to activate this mode like the following, I get an error:

$demuxlet --pileup \
--sam /path/to/bam \
--field GT --geno-error 0.01 \
--sm-list samples.txt \
--group-list barcodes.tsv \
--out out_prefix \
--alpha 0 --alpha 0.5

---output truncated---

FATAL ERROR - 
[E:params.cpp:564 Status] Problems encountered parsing command line:

Command line parameter --pileup (#1) not recognized


terminate called after throwing an instance of 'pException'
  what():  Exception was thrown
Aborted (core dumped)

Thank you in advance!

Consider cleaning up the history of this repository

This repository is over 600MB in size when it should be significantly smaller.

du -ha | sort -h

reveals a very large .git folder. Using code from here, I see there used to be a large supplementary.material folder which no longer exists. I'd suggest you either republish this repository and discard the history, or follow GitHub's instructions for removing files and use bfg to prune away the old files weighing the repository down.

VCF file to reproduce "Multiplexing droplet-based single cell RNA-sequencing using natural genetic barcodes" analysis.

Hello,
I'm Sébastien Nin, currently working in the Centre d'Immunologie de Marseille Luminy in France.
I would like to use demuxlet to enable multiplexing of single-cell RNA-seq samples. To get started, I did the tutorial and found it very well built. To continue practicing with this tool, I would like to reproduce the analysis from the paper "Multiplexing droplet-based single cell RNA-sequencing using natural genetic barcodes". I already found the bam files on GEO but I can't find the vcf file you used.
Can you advise me on the way I can get the vcf file?

Have a nice day,
Best regards,

Sébastien

Recommendation for cleaning of a VCF file for demuxlet

We are trying to use genotyping arrays (Axiom) to get VCFs for demuxlet.

Are there any recommended steps / tutorials for SNP cleaning and filtering to optimise the demuxlet yield?

We are currently cleaning our VCF on a) snp missingness b) SNP duplicates c) SNP strand (flipping) before imputing on Michigan server, then lifting over to hg38 and taking only those variants that overlap with exons of protein_coding genes and lncRNAs.

Any help greatly appreciated! :)

Tag a release

It would be convenient for conda users if demuxlet was added to bioconda. In order to facilitate writing a conda recipe, would you be able to tag a release on GitHub? Also, since currently htslib has to be in the same directory as demuxlet during the installation, would you be able to include a copy of htslib in the tarball as well? This would be similar to the verifyBamID release tarball that includes the LibStatGen dependency: verifyBamIDLibStatGen.1.1.3.tgz

Please let me know if there is anything I can do to help.

Fail to open .single

I used the command below to run demuxlet; it ran smoothly for 3-4 days, and today it gave me this message:
demuxlet --sam SAMPLE.bam --vcf SAMPLE.vcf --field GT --out SAMPLE

Has anyone encountered this before?
the complete output is below:

NOTICE [2018/07/11 09:28:50] - Total number of gapped/noninformative reads : 283047021
NOTICE [2018/07/11 09:28:50] - Total number of base-QC-failed reads : 1195369
NOTICE [2018/07/11 09:28:50] - Total number of redundant reads : 32619271
NOTICE [2018/07/11 09:28:50] - Total number of pass-filtered reads : 23393737
NOTICE [2018/07/11 09:28:50] - Total number of pass-filtered reads overlapping with multiple SNPs : 10782274
NOTICE [2018/07/11 09:28:50] - Starting to prune out cells with too few reads...
NOTICE [2018/07/11 09:28:50] - Finishing pruning out 0 cells with too few reads...
NOTICE [2018/07/11 09:28:51] - Starting to identify best matching individual IDs
[E::hts_open_format] Failed to open file /PATH/SAMPLE.single

FATAL ERROR -
[E:cmd_cram_demuxlet.cpp:410 main] Cannot create /PATH/SAMPLE.single file

terminate called after throwing an instance of 'pException'
what(): Exception was thrown
Abort (core dumped)

Segmentation fault - core dumped

Hi Hyun and Jimmie,

We have sequenced cells that were pooled from 4 individuals. Now I am trying to deconvolute them using demuxlet with 120 GB of memory. However, I get a segmentation fault error at chr6.

NOTICE [2019/05/21 21:59:58] - Observed 298000 droplets with unique cell barcode
NOTICE [2019/05/21 22:00:07] - Reading 287000000 reads at chr6:16441802 and skipping 177642284
NOTICE [2019/05/21 22:00:15] - Observed 299000 droplets with unique cell barcode
NOTICE [2019/05/21 22:00:15] - Observed 299000 droplets with unique cell barcode
/opt/p6444/n048/job_scripts/21323893: line 9: 180888 Segmentation fault (core dumped) tools/demuxlet/bin/demuxlet --sam reads.bam --vcf genotype_info.vcf --sm-list sample.list --min-MQ 30 --field GT --out demux_out

As you suggested in the posts of other people, I already tried reducing the number of individuals from 4 to 2 using --sm-list, as well as taking only exonic SNPs, but I still get this error at Chr 6.

Could you please help?
Thanks a lot,
Jatin

five or eight samples, https://www.ebi.ac.uk/ena/data/view/PRJNA381100

I found that the paper "Multiplexed droplet single-cell RNA-sequencing using natural genetic variation" mentions eight samples, but there are only five fastqs in the EBI entry for SRP102802, and the five fastq.gz files have no left/right annotation (on the website https://www.ebi.ac.uk/ena/data/view/PRJNA381100).

And on the GEO website, no genes.tsv is found.

Thanks a lot. Also, do you do extra quality control of the fastq files, for example with software like SeqPrep or Sickle, as in bulk RNA-seq?

Disable doublet analysis

For one of our use cases, we use demuxlet to compare a single cell RNA-seq data set to a large number of samples in a VCF. In this instance, we don't care about doublet assignments, but just want to find which cells are singlets and which is the most likely sample. Unfortunately, we run into memory issues when demuxlet tries to find doublets as there are so many pairs of possible samples. It doesn't seem like an option exists to avoid this OOM crash (i.e., skip doublet searching). If it doesn't exist, would it be possible to implement a feature like this? I am happy to try myself and submit a PR, but it's not clear to me where in the codebase such a change would go. Any other tips for VCF files that have a large number of samples would also be appreciated! Thanks in advance.

using demuxlet for two mouse strain ( BL6 and BALBC)

Hi,

Demuxlet is an awesome tool! I'm wondering whether I can use it to demultiplex an scRNA experiment mixing mouse cells from the BL6 and BALB/c strains. If it works, how should I handle the VCF file? In this case, BL6 is the reference genome, and I only have one VCF file for the BALB/c strain.
Can demuxlet work with a VCF file containing variant information for only one sample?

Thank you very much!

running with no argument causes segmentation fault

Running with no arguments prints the help, but then it attempts to continue without a filename and segfaults.

FATAL ERROR -
[bcf_filtered_reader.cpp:5 void BCFFilteredReader::init_params()] bcf_file_name is empty

terminate called after throwing an instance of 'pException'
what(): Exception was thrown
Aborted (core dumped)

Different results for the tutorial

Hello,

firstly, thank you for the tool. I was going through the tutorial and got quite different results between the Docker instance and a version of demuxlet (commit 5548344) compiled on our server.

Using the Docker instance, there are 458 singlets and 42 doublets.

docker run -v /Users/dtang/data/demuxlet.tutorial/:/data yimmieg/demuxlet --sam /data/jurkat_293t_downsampled_n500_full_bam.bam --vcf /data/jurkat_293t_exons_only.vcf --field GT --out data/jurkat_293t_demuxlet

cat jurkat_293t_demuxlet.best | cut -f6 | cut -f1 -d'-' | sort | uniq -c
   1 BEST
  42 DBL
 458 SNG

However, using the version compiled on the server gives 494 singlets and 6 doublets.

demuxlet --sam jurkat_293t_downsampled_n500_full_bam.bam --vcf jurkat_293t_exons_only.vcf --field GT --out jurkat_293t_demuxlet

cat jurkat_293t_demuxlet.best | cut -f6 | cut -f1 -d'-' | sort | uniq -c
      1 BEST
      6 DBL
    494 SNG

Should I be concerned?

Cheers,

Dave

Shouldn't need to import htslib from local address

If the user has htslib installed properly, you should be able to use

#include <htslib/sam.h>

and such without needing the library in some neighbouring folder. You should be able to just add -lhts to the compile line.
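For illustration, a hypothetical Makefile fragment along those lines (target and object names are made up; assumes an htslib installation that ships its pkg-config file):

```make
# Link against the system htslib instead of a vendored ../htslib checkout.
CXXFLAGS += $(shell pkg-config --cflags htslib)
LDLIBS   += $(shell pkg-config --libs htslib)   # typically just -lhts

# "demux_tool" and its object file are illustrative names.
demux_tool: demux_tool.o
	$(CXX) $(CXXFLAGS) -o $@ $^ $(LDLIBS)
```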

QC of deconvoluted SNG

I've run 2 big experiments with demultiplexing so far.
Both of them have unsatisfying SNG numbers compared to the expected numbers from the publication, and certainly fewer than 90% of the total GEMs. My first experiment (8-sample pool) had 25,000 reads per cell and was 3' sequencing on 10x v2; the second (20-sample pool) has 50,000 reads per cell and is 5' sequencing on 10x v2. These differences result in a higher number of UMIs per cell in the second instance, but the SNP-per-cell distribution is still similar. Hence I'm throwing away a lot of sequencing in both cases.

The genotype data for both was produced using the Illumina Infinium Global Screening Array. I deliberately generated a VCF file with only the GT information to test the demultiplexing.

  1. Demuxlet is calibrated on 10x chemistry v1. What exactly is affected by that in the mixture model?
  2. I noticed that tweaking --geno-error from the default 0.01 to 0.05 dramatically increases the SNG over the DBL in the 5' seq experiment, but not in the 3' seq experiment (tested geno-error 0.0001, 0.001, 0.05 and 0.2). How does demuxlet calculate the geno-error from the GT info? (From the methods of the paper: P_sv(g) = Pr(g | Data_sv).)
  3. How many of the SNG would also be a "good cell" in the 10x pipeline (i.e. the upper part of the knee plot in the 10x report)? In my data the SNG and 10x pass-filter cells don't match.
  4. How many SNG are actually good-QC cells (% mitochondria, nUMI, complexity, # reads per cell...) in your data; is this mentioned in the paper?

many thanks

problem installing demuxlet on a server

Hello,

We are trying to install demuxlet on a server at work so that several people can access it, but we get error messages we don't understand, and I was wondering if you could help us. We tried on 2 different servers and got different error messages.

Server 1:

Server (1) works with modules that people need to load to use software. htslib was already installed on it. We put demuxlet in the same folder as htslib: /cm/shared/apps/htslib/1.9, which now contains - amongst others - a directory called demuxlet and another called htslib. We installed demuxlet there, but the make command gave the following error message:

[email protected] /cm/shared/apps/htslib/1.9/demuxlet # make
make  all-am
make[1]: Entering directory `/cm/shared/apps/htslib/1.9/demuxlet'
g++ -DHAVE_CONFIG_H -I.  -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2    -g -O2 -MT Error.o -MD -MP -MF .deps/Error.Tpo -c -o Error.o Error.cpp
mv -f .deps/Error.Tpo .deps/Error.Po
g++ -DHAVE_CONFIG_H -I.  -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2    -g -O2 -MT PhredHelper.o -MD -MP -MF .deps/PhredHelper.Tpo -c -o PhredHelper.o PhredHelper.cpp
mv -f .deps/PhredHelper.Tpo .deps/PhredHelper.Po
g++ -DHAVE_CONFIG_H -I.  -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2    -g -O2 -MT params.o -MD -MP -MF .deps/params.Tpo -c -o params.o params.cpp
mv -f .deps/params.Tpo .deps/params.Po
g++ -DHAVE_CONFIG_H -I.  -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2    -g -O2 -MT hts_utils.o -MD -MP -MF .deps/hts_utils.Tpo -c -o hts_utils.o hts_utils.cpp
In file included from hts_utils.cpp:24:0:
hts_utils.h:40:25: fatal error: htslib/kseq.h: No such file or directory
 #include "htslib/kseq.h"
                         ^
compilation terminated.
make[1]: *** [hts_utils.o] Error 1
make[1]: Leaving directory `/cm/shared/apps/htslib/1.9/demuxlet'
make: *** [all] Error 2

It seems that it didn't find htslib/kseq.h but if I type
[email protected] /cm/shared/apps/htslib/1.9/demuxlet $ ll ../htslib/, it's there:

total 281
-rw-r--r-x 1 root root 14797 Sep 19 13:09 bgzf.h
-rw-r--r-x 1 root root 15393 Sep 19 13:09 cram.h
-rw-r--r-x 1 root root  8972 Sep 19 13:09 faidx.h
-rw-r--r-x 1 root root 10077 Sep 19 13:09 hfile.h
-rw-r--r-x 1 root root  3336 Sep 19 13:09 hts_defs.h
-rw-r--r-x 1 root root 11446 Sep 19 13:09 hts_endian.h
-rw-r--r-x 1 root root 30459 Sep 19 13:09 hts.h
-rw-r--r-x 1 root root  3898 Sep 19 13:09 hts_log.h
-rw-r--r-x 1 root root  1996 Sep 19 13:09 hts_os.h
-rw-r--r-x 1 root root  4195 Sep 19 13:09 kbitset.h
-rw-r--r-x 1 root root  2716 Sep 19 13:09 kfunc.h
-rw-r--r-x 1 root root 21540 Sep 19 13:09 khash.h
-rw-r--r-x 1 root root  3990 Sep 19 13:09 khash_str2int.h
-rw-r--r-x 1 root root  5086 Sep 19 13:09 klist.h
-rw-r--r-x 1 root root  2850 Sep 19 13:09 knetfile.h
-rw-r--r-x 1 root root  9139 Sep 19 13:09 kseq.h
-rw-r--r-x 1 root root 10567 Sep 19 13:09 ksort.h
-rw-r--r-x 1 root root  7012 Sep 19 13:09 kstring.h
-rw-r--r-x 1 root root  5834 Sep 19 13:09 regidx.h
-rw-r--r-x 1 root root 29580 Sep 19 13:09 sam.h
-rw-r--r-x 1 root root 14954 Sep 19 13:09 synced_bcf_reader.h
-rw-r--r-x 1 root root  3118 Sep 19 13:09 tbx.h
-rw-r--r-x 1 root root  9386 Sep 19 13:09 thread_pool.h
-rw-r--r-x 1 root root 43671 Sep 19 13:09 vcf.h
-rw-r--r-x 1 root root  1602 Sep 19 13:09 vcf_sweep.h
-rw-r--r-x 1 root root  4819 Sep 19 13:09 vcfutils.h

Is there anything we are doing wrong? Is the demuxlet folder in the right place? Does it need to be in /cm/shared/apps/htslib/1.9 or /cm/shared/apps/htslib/ (but that folder contains 2 versions of htslib)? Could that be the problem, or is there something else?
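A likely explanation (a sketch under assumptions, not a confirmed fix): demuxlet compiles with `-I ../htslib/` and its sources include "htslib/kseq.h", so the preprocessor looks for `../htslib/htslib/kseq.h`. In a source checkout of htslib the headers live in a nested `htslib/` subdirectory, but the module install above has them directly in `../htslib/`. One workaround is to recreate the expected nesting with a symlink:

```shell
# Sketch, assuming the layout shown above, where headers such as kseq.h
# sit directly in /cm/shared/apps/htslib/1.9/htslib/. demuxlet is built
# with -I ../htslib/ and includes "htslib/kseq.h", so it needs
# ../htslib/htslib/kseq.h to exist. A self-referential symlink named
# "htslib" restores that path:
cd /cm/shared/apps/htslib/1.9/htslib
ln -sfn . htslib        # ../htslib/htslib/kseq.h now resolves
cd ../demuxlet
make
```

Alternatively, build demuxlet next to a full htslib source checkout, which is the sibling layout its Makefile assumes. Note that the link step will also expect `../htslib/libhts.a`, which a headers-and-binaries module install may not provide.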

Server 2

Server (2) runs Debian 9.6 (stretch). That server doesn't use environment modules, and htslib wasn't installed on it, so we installed the latest version of htslib and then demuxlet in the same folder.

In this case, the error message is different:

root@compute-1-16:/home/alemp/dev/demuxlet# make
make  all-am
make[1]: Entering directory '/home/alemp/dev/demuxlet'
/bin/bash ./libtool  --tag=CXX   --mode=link g++  -g -O2   -o demuxlet Error.o PhredHelper.o params.o hts_utils.o bcf_ordered_reader.o interval.o interval_tree.o utils.o genome_interval.o reference_sequence.o bcf_chunked_reader.o genomeChunk.o sam_filtered_reader.o sc_drop_seq.o bcf_filtered_reader.o filter.o tsv_reader.o cmd_cram_demuxlet.o ../htslib/libhts.a -lpthread -llzma -lz -lbz2 -lgomp -lcurl -lcrypto -lz 
libtool: link: g++ -g -O2 -o demuxlet Error.o PhredHelper.o params.o hts_utils.o bcf_ordered_reader.o interval.o interval_tree.o utils.o genome_interval.o reference_sequence.o bcf_chunked_reader.o genomeChunk.o sam_filtered_reader.o sc_drop_seq.o bcf_filtered_reader.o filter.o tsv_reader.o cmd_cram_demuxlet.o  ../htslib/libhts.a -lpthread -llzma -lbz2 -lgomp -lcurl -lcrypto -lz
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -lcrypto
collect2: error: ld returned 1 exit status
Makefile:462: recipe for target 'demuxlet' failed
make[1]: *** [demuxlet] Error 1
make[1]: Leaving directory '/home/alemp/dev/demuxlet'
Makefile:359: recipe for target 'all' failed
make: *** [all] Error 2

It doesn't seem to find -lcurl, but when we look for the library, it's there:

root@compute-1-16:/usr# find . -name 'libcur*'
./share/lintian/overrides/libcurl3
./share/lintian/overrides/libcurl3-gnutls
./share/doc/libcurl3
./share/doc/libcurl3-gnutls
./lib/x86_64-linux-gnu/libcurl.so.3
./lib/x86_64-linux-gnu/libcurl.so.4.4.0
./lib/x86_64-linux-gnu/libcurl-gnutls.so.4.4.0
./lib/x86_64-linux-gnu/libcurl.so.4
./lib/x86_64-linux-gnu/libcurl-gnutls.so.3
./lib/x86_64-linux-gnu/libcurl-gnutls.so.4

Could you help us with this? It would be great to be able to run demuxlet on one of these servers!

Thank you very much!

Best wishes,

Alice
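For the Server (2) link error, a likely cause (an assumption based on Debian packaging conventions, not verified on this machine): `-lcurl` makes the linker search for an unversioned `libcurl.so` symlink, but the runtime package `libcurl3` ships only the versioned `libcurl.so.3`/`libcurl.so.4` files listed above. The unversioned symlinks come from the development packages, and the same applies to `libcrypto.so`:

```shell
# Sketch, assuming Debian 9 (stretch) with root access. These packages
# install /usr/lib/x86_64-linux-gnu/libcurl.so and libcrypto.so, the
# files "cannot find -lcurl" / "cannot find -lcrypto" refer to.
sudo apt-get install libcurl4-openssl-dev libssl-dev
make    # re-run the failed link step
```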

FATAL ERROR: VCF/BCF and SAM/BAM/CRAM files have different ordering of chromosomes

Can someone please help me with this issue, its causes, and possible solutions?

This is my command:

./demuxlet/demuxlet --sam /home/outs/possorted_genome_bam.bam --vcf /home/new.vcf --field GT --min-mac 10 --min-uniq 4 --out C24

Available Options

The following parameters are available. Ones with "[]" are in effect:
Options for input SAM/BAM/CRAM : --sam [/home/outs/possorted_genome_bam.bam],
--tag-group [CB], --tag-UMI [UB]
Options for input VCF/BCF : --vcf [ /home/new.vcf ],
--field [GT], --geno-error [0.01],
--min-mac [10], --min-callrate [0.50],
--sm, --sm-list
Output Options : --out [C24], --alpha, --write-pair,
--doublet-prior [0.50],
--sam-verbose [1000000],
--vcf-verbose [10000]
Read filtering Options : --cap-BQ [40], --min-BQ [13],
--min-MQ [20], --min-TD,
--excl-flag [3844]
Cell/droplet filtering options : --group-list, --min-total, --min-uniq [4],
--min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2019/02/22 16:38:17] - Finished identifying 6667 samples to load from VCF/BCF

FATAL ERROR -
[E:int32_t main(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 1 before 10, but VCF/BCF file has 1 after 10

terminate called after throwing an instance of 'pException'
what(): Exception was thrown
Aborted (core dumped)
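The error says demuxlet requires the VCF contigs to appear in the same order as the BAM header's @SQ lines. A sketch of one way to re-sort an uncompressed VCF into the BAM's chromosome order (assuming samtools is on PATH; the file names are the ones from the command above, and positions within each chromosome are assumed to be already sorted):

```shell
# 1. Extract the chromosome order from the BAM header (@SQ lines).
samtools view -H /home/outs/possorted_genome_bam.bam \
  | awk -F'\t' '/^@SQ/{sub(/^SN:/,"",$2); print $2}' > bam_chrom_order.txt

# 2. Prefix every VCF line with its chromosome's rank in that order
#    (rank 0 for header lines), stable-sort on the rank, strip it again.
awk 'NR==FNR{rank[$1]=NR; next}
     /^#/ {print 0 "\t" $0; next}
     {print rank[$1] "\t" $0}' bam_chrom_order.txt /home/new.vcf \
  | sort -s -k1,1n | cut -f2- > /home/new.sorted.vcf
```

Note that `bcftools sort` orders records according to the VCF's own ##contig header lines, so if you use it instead you may also need to rewrite those header lines (e.g. with `bcftools reheader`) to match the BAM.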

Installing demuxlet

Hi, as it appears with some others, I too am getting errors when trying to install demuxlet on an Ubuntu machine.

I made sure htslib was downloaded and installed in the "sibling" folder, within the same parent directory as my downloaded demuxlet folder. After running the initial ./configure command (output pasted below), I ran make and saw the following error (also pasted below):

demuxlet$ sudo ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make sets $(MAKE)... (cached) yes
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking whether we are using the GNU C++ compiler... (cached) yes
checking whether g++ accepts -g... (cached) yes
checking dependency style of g++... (cached) gcc3
checking how to run the C++ preprocessor... g++ -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking stddef.h usability... yes
checking stddef.h presence... yes
checking for stddef.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking for unistd.h... (cached) yes
checking for stdbool.h that conforms to C99... no
checking for _Bool... no
checking for inline... inline
checking for int32_t... yes
checking for int64_t... yes
checking for C/C++ restrict keyword... __restrict
checking for size_t... yes
checking for uint16_t... yes
checking for uint32_t... yes
checking for uint64_t... yes
checking for uint8_t... yes
checking for ptrdiff_t... yes
checking for error_at_line... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible malloc... yes
checking whether sys/types.h defines makedev... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible realloc... yes
checking for working strtod... yes
checking for floor... yes
checking for gethostname... yes
checking for memset... yes
checking for pow... yes
checking for select... yes
checking for sqrt... yes
checking for strchr... yes
checking for strstr... yes
checking for strtol... yes
checking for gzopen in -lz... yes
checking for erf in -lm... yes
checking for erfc in -lm... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for shl_load... no
checking for shl_load in -ldld... no
checking for dlopen... no
checking for dlopen in -ldl... yes
checking whether a program can dlopen itself... yes
checking whether a statically linked program can dlopen itself... no
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands

demuxlet$ sudo make
make all-am
make[1]: Entering directory '/mnt/bin/demuxlet'
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT Error.o -MD -MP -MF .deps/Error.Tpo -c -o Error.o Error.cpp
mv -f .deps/Error.Tpo .deps/Error.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT PhredHelper.o -MD -MP -MF .deps/PhredHelper.Tpo -c -o PhredHelper.o PhredHelper.cpp
mv -f .deps/PhredHelper.Tpo .deps/PhredHelper.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT params.o -MD -MP -MF .deps/params.Tpo -c -o params.o params.cpp
params.cpp: In member function ‘void param::message(const char*, ...)’:
params.cpp:77:17: warning: format not a string literal and no format arguments [-Wformat-security]
::printf(buf);
^
mv -f .deps/params.Tpo .deps/params.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT hts_utils.o -MD -MP -MF .deps/hts_utils.Tpo -c -o hts_utils.o hts_utils.cpp
In file included from hts_utils.cpp:25:0:
../htslib/htslib/hfile.h: In function ‘ssize_t hwrite(hFILE*, const void*, size_t)’:
../htslib/htslib/hfile.h:261:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (fp->limit - fp->begin < nbytes){
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
mv -f .deps/hts_utils.Tpo .deps/hts_utils.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT bcf_ordered_reader.o -MD -MP -MF .deps/bcf_ordered_reader.Tpo -c -o bcf_ordered_reader.o bcf_ordered_reader.cpp
mv -f .deps/bcf_ordered_reader.Tpo .deps/bcf_ordered_reader.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT interval.o -MD -MP -MF .deps/interval.Tpo -c -o interval.o interval.cpp
mv -f .deps/interval.Tpo .deps/interval.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT interval_tree.o -MD -MP -MF .deps/interval_tree.Tpo -c -o interval_tree.o interval_tree.cpp
mv -f .deps/interval_tree.Tpo .deps/interval_tree.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT utils.o -MD -MP -MF .deps/utils.Tpo -c -o utils.o utils.cpp
mv -f .deps/utils.Tpo .deps/utils.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT genome_interval.o -MD -MP -MF .deps/genome_interval.Tpo -c -o genome_interval.o genome_interval.cpp
mv -f .deps/genome_interval.Tpo .deps/genome_interval.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT reference_sequence.o -MD -MP -MF .deps/reference_sequence.Tpo -c -o reference_sequence.o reference_sequence.cpp
mv -f .deps/reference_sequence.Tpo .deps/reference_sequence.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT bcf_chunked_reader.o -MD -MP -MF .deps/bcf_chunked_reader.Tpo -c -o bcf_chunked_reader.o bcf_chunked_reader.cpp
mv -f .deps/bcf_chunked_reader.Tpo .deps/bcf_chunked_reader.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT genomeChunk.o -MD -MP -MF .deps/genomeChunk.Tpo -c -o genomeChunk.o genomeChunk.cpp
mv -f .deps/genomeChunk.Tpo .deps/genomeChunk.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT sam_filtered_reader.o -MD -MP -MF .deps/sam_filtered_reader.Tpo -c -o sam_filtered_reader.o sam_filtered_reader.cpp
mv -f .deps/sam_filtered_reader.Tpo .deps/sam_filtered_reader.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT sc_drop_seq.o -MD -MP -MF .deps/sc_drop_seq.Tpo -c -o sc_drop_seq.o sc_drop_seq.cpp
mv -f .deps/sc_drop_seq.Tpo .deps/sc_drop_seq.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT bcf_filtered_reader.o -MD -MP -MF .deps/bcf_filtered_reader.Tpo -c -o bcf_filtered_reader.o bcf_filtered_reader.cpp
bcf_filtered_reader.cpp: In member function ‘void BCFFilteredReader::init_params()’:
bcf_filtered_reader.cpp:57:53: warning: ‘scol’ may be used uninitialized in this function [-Wmaybe-uninitialized]
std::string id = tsv_sex_map.str_field_at(scol);
^
mv -f .deps/bcf_filtered_reader.Tpo .deps/bcf_filtered_reader.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT filter.o -MD -MP -MF .deps/filter.Tpo -c -o filter.o filter.cpp
filter.cpp: In function ‘void filters_set_genotype_string(filter_t*, bcf1_t*, token_t*)’:
filter.cpp:649:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if ( str.l - plen > blen )
~~~~~~~~~~~~~^~~~~~
filter.cpp: In function ‘int filters_init1(filter_t*, char*, int, token_t*)’:
filter.cpp:1318:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i=0; i<tmp.l; i++)
~^~~~~~
filter.cpp:1443:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if ( end - tmp.s != strlen(tmp.s) )
~~~~~~~~~~~~^~~~~~~~~~~~~~~~
filter.cpp: In function ‘filter_t* filter_init(bcf_hdr_t*, const char*)’:
filter.cpp:1639:26: warning: ‘ival’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if ( out[ival].tok_type!=TOK_VAL || !out[ival].key )
^
filter.cpp:1617:38: warning: ‘ival’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if ( !strcasecmp(out[ival].key,"snp") || !strcasecmp(out[ival].key,"snps") ) { out[ival].threshold = VCF_SNP<<1; out[ival].is_str = 0; }
^
filter.cpp:1623:26: warning: ‘itok’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if ( out[itok].tok_type==TOK_LIKE || out[itok].tok_type==TOK_NLIKE ) out[itok].comparator = filters_cmp_bit_and;
^
mv -f .deps/filter.Tpo .deps/filter.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT tsv_reader.o -MD -MP -MF .deps/tsv_reader.Tpo -c -o tsv_reader.o tsv_reader.cpp
mv -f .deps/tsv_reader.Tpo .deps/tsv_reader.Po
g++ -DHAVE_CONFIG_H -I. -I ../htslib/ -I ../../htslib/htslib -pipe -D__STDC_LIMIT_MACROS -Wall -Wno-unused-local-typedefs -Wno-enum-compare -fpic -O2 -g -O2 -MT cmd_cram_demuxlet.o -MD -MP -MF .deps/cmd_cram_demuxlet.Tpo -c -o cmd_cram_demuxlet.o cmd_cram_demuxlet.cpp
mv -f .deps/cmd_cram_demuxlet.Tpo .deps/cmd_cram_demuxlet.Po
/bin/bash ./libtool --tag=CXX --mode=link g++ -g -O2 -o demuxlet Error.o PhredHelper.o params.o hts_utils.o bcf_ordered_reader.o interval.o interval_tree.o utils.o genome_interval.o reference_sequence.o bcf_chunked_reader.o genomeChunk.o sam_filtered_reader.o sc_drop_seq.o bcf_filtered_reader.o filter.o tsv_reader.o cmd_cram_demuxlet.o ../htslib/libhts.a -lpthread -llzma -lz -lbz2 -lgomp -lcurl -lcrypto -lz
libtool: link: g++ -g -O2 -o demuxlet Error.o PhredHelper.o params.o hts_utils.o bcf_ordered_reader.o interval.o interval_tree.o utils.o genome_interval.o reference_sequence.o bcf_chunked_reader.o genomeChunk.o sam_filtered_reader.o sc_drop_seq.o bcf_filtered_reader.o filter.o tsv_reader.o cmd_cram_demuxlet.o ../htslib/libhts.a -lpthread -llzma -lbz2 -lgomp /usr/lib/x86_64-linux-gnu/libcurl.so -lcrypto -lz -pthread
make[1]: Leaving directory '/mnt/bin/demuxlet'

Any input would be appreciated, as I try to get this up and running!

--alpha does not set up gridAlpha[] properly

It appears the assumption in the code is that gridAlpha[0] = 0.0. When using --alpha to set the alpha values, you will need to do

--alpha 0.0 --alpha 0.5

Otherwise doublet detection is broken. Just doing

--alpha 0.5

(as recommended in the help) causes no alpha value to ever be tested for doublets, because the first alpha value is never tested for doublets. The code probably needs to check or enforce gridAlpha[0] = 0.0.

Symptom is getting very weird doublet calls since jBest, kBest end up being set to -1.

Current workaround is to make sure you always use --alpha 0.0 before specifying the actual alpha values you want to test.

--alpha 0.3 for 3-sample pooled data?

Hi!
On your webpage, it says that for 2-sample pooled data we should use "--alpha 0 --alpha 0.5" to better estimate the doublets.
I wonder whether we should use "--alpha 0 --alpha 0.3" for 3-sample pooled data?

Ambiguity in the demuxlet tutorial

Hello and thanks a lot for the nice and very practical tool.

I have a data set of single-cell RNA-seq data produced by 10x Genomics technology. The data set is a combination of cells from two persons, and I would like to separate them.

My problem is that I do not know exactly which vcf file I should use. As you know, on the 1000 Genomes website the SNPs on the human genome are reported in VCF format. Each vcf file contains variants for around 3000 persons (the tutorial includes variants for only two persons). Should I run demuxlet with all 3000 individuals?

Because my sample is composed of only two persons, if I run demuxlet with 3000 individuals the final results assign more than two individuals to my sample, which I know is not true.

How do you suggest to solve this problem?

My second question regards the genomic positions of the variants. Is it necessary to include only exonic variants, or will the program also work with variants at all genomic positions?

Best regards
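One common approach (a sketch, assuming `bcftools` is available; the donor IDs and file names here are placeholders, not the real 1000 Genomes sample names) is to subset the VCF to just your two donors before running demuxlet, so only those two genotype columns are considered:

```shell
# Keep only the two donors and drop sites where neither carries an
# alternate allele (no longer informative for demultiplexing).
bcftools view -s donorA,donorB --min-ac 1 -Oz \
    -o two_donors.vcf.gz ALL.phase3.vcf.gz
bcftools index -t two_donors.vcf.gz
```

If bcftools is not available, the same column subset can be taken from an uncompressed VCF with awk (this fallback does not drop non-polymorphic sites):

```shell
# Pure-awk fallback: keep the 9 fixed VCF columns plus the two donors'
# genotype columns (donor IDs are placeholders).
awk -F'\t' -v OFS='\t' -v keep="donorA,donorB" '
  BEGIN{split(keep, k, ",")}
  /^##/ {print; next}
  /^#CHROM/ {for(i=10;i<=NF;i++) for(j in k) if($i==k[j]) cols[++n]=i}
  { line=$1; for(i=2;i<=9;i++) line=line OFS $i
    for(i=1;i<=n;i++) line=line OFS $cols[i]
    print line }' full.vcf > two_donors.vcf
```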

Likelihoods not calculated (10x Chromium)

I'm trying to run demuxlet on reads generated by the 10x Chromium pipeline. The bam files are generated using cellranger. On each lane we've pooled 6 individuals, so the input files (bam, vcf) contain the information of 6 individuals. For the group-list argument we use the corrected barcode output file produced by cellranger.

The complete command I'm using:

demuxlet \
      --sam possorted_genome_bam.bam \
      --tag-group CB \
      --tag-UMI UB \
      --field GP \
      --vcf sorted.vcf.gz \
      --out output_filtered_barcodes \
      --group-list barcodes.tsv

The software runs but the likelihoods in the generated output files are all set to 0.
output_filtered_barcodes.best:

BARCODE	RD.TOTL	RD.PASS	RD.UNIQ	N.SNP	BEST	SNG.1ST	SNG.LLK1	SNG.2ND	SNG.LLK2	SNG.LLK0	DBL.1ST	DBL.2ND	ALPHA	LLK12	LLK1	LLK2	LLK10	LLK20	LLK00	PRB.DBL	PRB.SNG1
AAACCTGGTGCCTTGG-1	19039	3240	1126	1016	AMB-sample1-sample1-sample1/sample1	sample1	0.0000	sample1	0.0000	nan	sample1sample1	0.000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	nan	nan
AAACGGGAGAGAACAG-1	12617	2276	721	669	AMB-sample1-sample1-sample1/sample1	sample1	0.0000	sample1	0.0000	nan	sample1sample1	0.000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	nan	nan

Error log:

NOTICE [2018/01/25 16:49:38] - Finished loading 4309 droplet/cell barcodes to consider
NOTICE [2018/01/25 16:49:38] - Finished identifying 6 samples to load from VCF/BCF
NOTICE [2018/01/25 16:49:38] - Reading 0 reads at 1:1 and skipping 0
NOTICE [2018/01/25 16:49:38] - WARNING: Cannot find Droplet/Cell tag UB from 748-th read K00296:61:HHVN5BBXX:5:1222:13007:17913 at 1:14187-566351. Treating all of them as a single group
...
NOTICE [2018/01/25 16:49:38] - WARNING: Cannot find Droplet/Cell tag UB from 15981-th read K00296:61:HHVN5BBXX:4:2224:6076:40367 at 1:134996-135133. Treating all of them as a single group
NOTICE [2018/01/25 16:49:38] - WARNING: Suppressing 10+ missing Droplet/Cell tag warnings...
NOTICE [2018/01/25 16:49:39] - WARNING: Cannot find UMI tag UB from 44270-th read K00296:61:HHVN5BBXX:6:2109:32491:25615 at 1:564485-564623. Treating all of them as a single UMI
[W::vcf_parse] FILTER 'GENOTYPED_ONLY' is not defined in the header
[W::vcf_parse] FILTER 'GENOTYPED' is not defined in the header
NOTICE [2018/01/25 16:49:44] - WARNING: Cannot find UMI tag UB from 88246-th read K00296:61:HHVN5BBXX:7:1111:22313:30943 at 1:565645-565783. Treating all of them as a single UMI
...
NOTICE [2018/01/25 16:49:59] - WARNING: Cannot find UMI tag UB from 108070-th read K00296:61:HHVN5BBXX:1:1117:10439:43040 at 1:565692-565830. Treating all of them as a single UMI
NOTICE [2018/01/25 16:49:59] - WARNING: Suppressing 10+ UMI warnings...
NOTICE [2018/01/25 16:50:33] - Reading 10000 variants at 1:1229225, Skipping 9210, Missing 0.
...
NOTICE [2018/01/26 13:59:12] - Reading 39230000 variants at 9:140680332, Skipping 33823488, Missing 0.
NOTICE [2018/01/26 13:59:36] - Reading 275000000 reads at MT:1987 and skipping 154598269
NOTICE [2018/01/26 13:59:47] - Reading 278000000 reads at MT:8002 and skipping 156520112
NOTICE [2018/01/26 14:00:15] - Reading 286000000 reads at X:48835761 and skipping 162389102
NOTICE [2018/01/26 14:00:18] - Reading 287000000 reads at X:71492517 and skipping 162877106
NOTICE [2018/01/26 14:00:22] - Reading 288000000 reads at X:71736339 and skipping 163745120
NOTICE [2018/01/26 14:01:51] - WARNING: Suppressed a total of 38387 UMI warnings...
NOTICE [2018/01/26 14:01:51] - WARNING: Suppressed a total of 4386863 droplet/cell barcode warnings...
NOTICE [2018/01/26 14:01:51] - Finished reading 5406968 markers from the VCF file
NOTICE [2018/01/26 14:01:51] - Total number input reads : 314454500
NOTICE [2018/01/26 14:01:51] - Total number of read-QC-passed reads : 126383207 
NOTICE [2018/01/26 14:01:51] - Total number of skipped reads with ignored barcodes : 20439398
NOTICE [2018/01/26 14:01:51] - Total number of non-skipped reads with considered barcodes : 105882246
NOTICE [2018/01/26 14:01:51] - Total number of gapped/noninformative reads : 89224135
NOTICE [2018/01/26 14:01:51] - Total number of base-QC-failed reads : 1489872
NOTICE [2018/01/26 14:01:51] - Total number of redundant reads : 9605337
NOTICE [2018/01/26 14:01:51] - Total number of pass-filtered reads : 5562902
NOTICE [2018/01/26 14:01:51] - Total number of pass-filtered reads overlapping with multiple SNPs : 630819
NOTICE [2018/01/26 14:01:51] - Starting to prune out cells with too few reads...
NOTICE [2018/01/26 14:01:51] - Finishing pruning out 0 cells with too few reads...
NOTICE [2018/01/26 14:01:51] - Starting to identify best matching individual IDs
NOTICE [2018/01/26 14:01:51] - Processing 10000 markers...
...
NOTICE [2018/01/26 14:01:55] - Processing 5400000 markers...
NOTICE [2018/01/26 14:01:55] - Identifying best-matching individual..
NOTICE [2018/01/26 14:01:55] - Processing 1000 droplets...
NOTICE [2018/01/26 14:01:55] - Processing 2000 droplets...
NOTICE [2018/01/26 14:01:55] - Processing 3000 droplets...
NOTICE [2018/01/26 14:01:55] - Processing 4000 droplets...
NOTICE [2018/01/26 14:01:55] - Finished processing 4309 droplets total
NOTICE [2018/01/26 14:02:09] - Processing 0 cells..
...
NOTICE [2018/01/26 14:04:16] - Processing 4300 cells..
NOTICE [2018/01/26 14:04:16] - Finished writing output files

Header and first 2 lines of the vcf file (this file is correctly sorted):

##fileformat=VCFv4.2
##filedate=2017.5.20
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##filedate=2016.5.11
##source=Minimac3
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##contig=<ID=1,length=249250621,assembly=b37>
##contig=<ID=10,length=135534747,assembly=b37>
##contig=<ID=11,length=135006516,assembly=b37>
##contig=<ID=12,length=133851895,assembly=b37>
##contig=<ID=13,length=115169878,assembly=b37>
##contig=<ID=14,length=107349540,assembly=b37>
##contig=<ID=15,length=102531392,assembly=b37>
##contig=<ID=16,length=90354753,assembly=b37>
##contig=<ID=17,length=81195210,assembly=b37>
##contig=<ID=18,length=78077248,assembly=b37>
##contig=<ID=19,length=59128983,assembly=b37>
##contig=<ID=2,length=243199373,assembly=b37>
##contig=<ID=20,length=63025520,assembly=b37>
##contig=<ID=21,length=48129895,assembly=b37>
##contig=<ID=22,length=51304566,assembly=b37>
##contig=<ID=3,length=198022430,assembly=b37>
##contig=<ID=4,length=191154276,assembly=b37>
##contig=<ID=5,length=180915260,assembly=b37>
##contig=<ID=6,length=171115067,assembly=b37>
##contig=<ID=7,length=159138663,assembly=b37>
##contig=<ID=8,length=146364022,assembly=b37>
##contig=<ID=9,length=141213431,assembly=b37>
##contig=<ID=X,length=155270560,assembly=b37>
##contig=<ID=Y,length=59373566,assembly=b37>
##contig=<ID=MT,length=16569,assembly=b37>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1	sample2	sample3	sample4	sample5	sample6
1	13380	1:13380	C	G	.	PASS	AC=0;AF=0.00003;AN=94;MAF=0.00003;R2=0.00000;SF=0,1	GT:DS:GP	0|0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000
1	16071	1:16071	G	A	.	PASS	AC=0;AF=0.00006;AN=94;MAF=0.00006;R2=0.00004;SF=0,1	GT:DS:GP	0|0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000	0/0:0.000:1.000,0.000,0.000

If I subset the VCF and BAM file for only chromosome 1, the likelihoods do seem to be correctly calculated:

BARCODE	RD.TOTL	RD.PASS	RD.UNIQ	N.SNP	BEST	SNG.1ST	SNG.LLK1	SNG.2ND	SNG.LLK2	SNG.LLK0	DBL.1ST	DBL.2ND	ALPHA	LLK12	LLK1	LLK2	LLK10	LLK20	LLK00	PRB.DBL	PRB.SNG1
AAACCTGGTGCCTTGG-1	1985	417	121	118	DBL-sample3-1_LLDeep_0938-0.100	sample3	-32.9460	sample1	-91.1793	-53.7561	sample3	sample4	0.100	-30.8391	-32.9460	-104.3739	-32.9155	-79.9862	-53.7624	0.345	1
AAACGGGAGAGAACAG-1	1127	204	67	66	SNG-sample2	sample2	-28.5922	LLDeep_1619	-87.9795	-44.3673	sample2	sample1	0.100	-27.3026	-28.5922	-87.9795	-27.3026	-87.9806	-44.3754	0.228	1

Do you have any idea what might cause this issue?
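One thing worth checking first (a guess based on the log, not a confirmed diagnosis): the VCF header above lists contigs as 1, 10, 11, ... while Cell Ranger BAM headers usually go 1, 2, 3, ..., and the very large "Skipping" counts in the log are consistent with an order mismatch. A quick check that the two contig orders agree:

```shell
# Sketch, assuming samtools and bcftools are on PATH. Any diff output
# means the BAM and VCF disagree on chromosome order.
samtools view -H possorted_genome_bam.bam \
  | awk -F'\t' '/^@SQ/{sub(/^SN:/,"",$2); print $2}' > bam_order.txt
bcftools view -h sorted.vcf.gz \
  | awk -F'[=,]' '/^##contig/{print $3}' > vcf_order.txt
diff bam_order.txt vcf_order.txt && echo "contig orders match"
```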

Downstream analysis

My question is more about how to proceed downstream. I have 2 pools and each pool has 6 samples: Pool1 with samples before treatment and Pool2 after treatment. I want to compare Sample1 (before treatment) from Pool1 to Sample1 (after treatment) from Pool2.

Do I subset the samples from each pool and merge them into one object using Seurat, or do I need to re-run Cell Ranger for each sample with its barcodes?
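A minimal sketch of the merge-based approach, assuming hypothetical barcode-to-sample dictionaries built from each pool's demuxlet .best singlet calls (the barcodes and samples below are made up for illustration):

```python
# Minimal sketch: combine singlet assignments from two pools so the same
# donor can be compared before/after treatment. pool1/pool2 stand in for
# barcode -> sample maps read from each pool's .best file (SNG calls only).
pool1 = {"AAACCTGG-1": "sample1", "AAACGGGA-1": "sample2"}  # before treatment
pool2 = {"AAACCTGG-1": "sample1", "TTTGGTTT-1": "sample3"}  # after treatment

merged = {}
for pool_name, assignments in [("pool1", pool1), ("pool2", pool2)]:
    for barcode, sample in assignments.items():
        # Suffix each barcode with its pool so identical barcodes from
        # different pools stay distinct after merging (Seurat's merge()
        # does the same via its add.cell.ids argument).
        merged[f"{barcode}_{pool_name}"] = (sample, pool_name)

# Cells from sample1 in both conditions:
sample1_cells = [bc for bc, (s, _) in merged.items() if s == "sample1"]
print(sample1_cells)  # ['AAACCTGG-1_pool1', 'AAACCTGG-1_pool2']
```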

Cannot find Droplet/Cell tag CB/UB warning messages

Hello,

We have been noticing the following warning messages during the analysis of data generated by Cell Ranger 3.

The following parameters are available. Ones with "[]" are in effect:
   Options for input SAM/BAM/CRAM : --sam [/share/ScratchGeneral/annsen/data/experimental_data/CLEAN/POAG_scRNA/POAG_scRNA_V2/181108_MD_003/181108_MD_003/outs/possorted_genome_bam.bam],
                                    --tag-group [CB], --tag-UMI [UB]
        Options for input VCF/BCF : --vcf [/share/ScratchGeneral/annsen/data/experimental_data/CLEAN/POAG_scRNA/POAG_scRNA.vcf],
                                    --field [GP], --geno-error [0.01],
                                    --min-mac [1], --min-callrate [0.50], --sm,
                                    --sm-list [/share/ScratchGeneral/annsen/repositories/POAG_scRNA/Pools/181108_MD_003.tsv]
                   Output Options : --out [/share/ScratchGeneral/annsen/analysis/POAG_scRNA/POAG_scRNA_Demuxlet/181108_MD_003],
                                    --alpha [0.00, 0.50], --write-pair,
                                    --doublet-prior [0.50],
                                    --sam-verbose [1000000],
                                    --vcf-verbose [10000]
           Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                    --min-MQ [20], --min-TD,
                                    --excl-flag [3844]
   Cell/droplet filtering options : --group-list [/share/ScratchGeneral/annsen/data/experimental_data/CLEAN/POAG_scRNA/POAG_scRNA_V2/181108_MD_003/181108_MD_003/outs/filtered_feature_bc_matrix/barcodes.tsv],
                                    --min-total, --min-uniq, --min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2019/05/24 17:19:52] - Finished loading 12569 droplet/cell barcodes to consider
NOTICE [2019/05/24 17:19:52] - Finished loading 7 IDs from /share/ScratchGeneral/annsen/repositories/POAG_scRNA/Pools/181108_MD_003.tsv
NOTICE [2019/05/24 17:19:52] - Finished identifying 7 samples to load from VCF/BCF
NOTICE [2019/05/24 17:19:55] - Reading 0 reads at 1:1 and skipping 0
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 12274-th read A00152:78:HHNG5DSXX:4:2103:4155:7310 at 1:17475-17573. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 26628-th read A00152:78:HHNG5DSXX:4:1669:22733:7764 at 1:128604-128702. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 26651-th read A00152:78:HHNG5DSXX:1:2140:16242:34976 at 1:128667-128765. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 26656-th read A00152:78:HHNG5DSXX:2:1412:32208:27398 at 1:128679-128777. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 26996-th read A00152:78:HHNG5DSXX:2:2268:17779:12868 at 1:134158-134256. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 26997-th read A00152:78:HHNG5DSXX:2:2526:30228:26318 at 1:134158-134256. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 29536-th read A00152:78:HHNG5DSXX:4:2647:3405:22701 at 1:135153-135251. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 29577-th read A00152:78:HHNG5DSXX:4:2253:7473:12117 at 1:138510-138608. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 35070-th read A00152:78:HHNG5DSXX:2:1137:1271:5259 at 1:157470-692645. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Cannot find Droplet/Cell tag CB from 35117-th read A00152:78:HHNG5DSXX:1:2460:9408:17973 at 1:157515-692690. Treating all of them as a single group
NOTICE [2019/05/24 17:19:55] - WARNING: Suppressing 10+ missing Droplet/Cell tag warnings...

This is the first time we have seen such issues; we did not encounter them with data generated by Cell Ranger 2. It is of great concern to us, as the results contain fewer cells.
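One way to gauge how widespread this is: reads whose cell barcode could not be confidently assigned carry no CB tag in the BAM, and those are what demuxlet warns about. A minimal sketch of counting such reads, run here on mock SAM-format text lines (in practice you would stream the output of `samtools view possorted_genome_bam.bam` through the same filter):

```python
# Minimal sketch: count reads that lack a CB (cell barcode) optional tag.
# The two records below are mock SAM lines; optional tags start at field 12.
sam_lines = [
    "read1\t0\t1\t17475\t60\t98M\t*\t0\t0\tACGT\tFFFF\tUB:Z:AACG",            # no CB
    "read2\t0\t1\t128604\t60\t98M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCTGG-1\tUB:Z:AACG",
]

missing_cb = sum(
    1
    for line in sam_lines
    if not any(field.startswith("CB:Z:") for field in line.split("\t")[11:])
)
print(f"{missing_cb} of {len(sam_lines)} reads lack a CB tag")
```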

This version of demuxlet was installed via anaconda. The conda environment is as follows:

# packages in environment at /share/ClusterShare/software/contrib/annsen/anaconda3/envs/demuxlet:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
curl                      7.64.0               h646f8bb_0    conda-forge
demuxlet                  1.0                  h7279bd8_1    bioconda
htslib                    1.9                  ha228f0b_7    bioconda
krb5                      1.16.3            hc83ff2d_1000    conda-forge
libcurl                   7.64.0               h01ee5af_0    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libgcc-ng                 8.2.0                hdf63c60_1    anaconda
libssh2                   1.8.0             h1ad7b7a_1003    conda-forge
libstdcxx-ng              8.2.0                hdf63c60_1    anaconda
libtool                   2.4.6             h14c3975_1002    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
openssl                   1.0.2r               h14c3975_0    conda-forge
samtools                  1.9                  h43f6869_9    bioconda
tk                        8.6.9             h84994c4_1001    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zlib                      1.2.11            h14c3975_1004    conda-forge

How can I prepare the VCF file?

Hi authors!

I now have 8 samples, each of which has single-cell RNA sequencing and bulk RNA sequencing data.
Can I prepare the VCF file by calling variants from the bulk RNA-seq data against the reference genome (e.g., with GATK)?
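Whichever caller produces the VCF, demuxlet reads the genotype field named by --field (GT, GP, or PL), so that key must appear in the FORMAT column. A minimal sanity-check sketch, using the first example VCF record from this page:

```python
# Minimal sketch: verify a VCF record carries the FORMAT key that demuxlet
# will read via --field. The record is the first example line on this page,
# truncated to one sample column.
record = ("1\t13380\t1:13380\tC\tG\t.\tPASS\t"
          "AC=0;AF=0.00003;AN=94;MAF=0.00003;R2=0.00000;SF=0,1\t"
          "GT:DS:GP\t0|0:0.000:1.000,0.000,0.000")

fields = record.split("\t")
format_keys = fields[8].split(":")  # column 9 is FORMAT
wanted = "GP"                       # the value passed to --field

print(wanted in format_keys)  # True
```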
