chaissonlab / danbing-tk Goto Github PK

View Code? Open in Web Editor NEW

21.0 3.0 3.0 56.42 MB

Toolkit for VNTR genotyping and repeat-pan genome graph construction

License: BSD 3-Clause "New" or "Revised" License

Python 51.02% C++ 47.62% Makefile 0.77% Shell 0.25% Dockerfile 0.33%

genome-graph vntr genotyping graph-aligner tandem-repeats

danbing-tk's Introduction

danbing-tk

A toolkit for variable number tandem repeats (VNTRs) analysis, which enables:

building repeat-pan genome graphs (RPGG) given haplotype-resolved assemblies for genome-wide profiling or simply VNTR alleles for targeted genotyping (referred to as danbing-tk build),
genotyping each VNTR as a set of (k-mer, count) given short-read sequencing (SRS) data (referred to as danbing-tk align), and
estimating VNTR or motif dosage from the genotype with bias correction (referred to as danbing-tk predict).

Our initial manuscript illustrates the key concept of this tool. The latest update details improvements on QC and bias correction, and extended applications on eQTL discoveries with motif compositions.

Download Releases

The latest release v1.3.1-manuscript comes with the lastest version of VNTR set, RPGG, and QC statistics.

	File	Input of	Output of
VNTR set	`tr.good.bed`	`danbing-tk build`
RPGG	`pan.tr.kmers`, `pan.kmerDBi.umap`, `pan.kmerDBi.vv`, `pan.graph.umap`	`danbing-tk align`	`danbing-tk build` or `vntr2kmers_thread`

Release v1.3.1-manuscript: provided QC statistics and all resources associated with the latest manuscript.
Release v1.3: Updated RPGG built from 35 HGSVC genomes.
Release v1.0: VNTR summary statistics and eGene discoveries are also included. Example analyses such as differential length/motif analysis, eQTL mapping, VNTR locus QC, sample QC are also included.

Building on Linux

git clone --recursive https://github.com/ChaissonLab/danbing-tk
cd danbing-tk && make -j 5

danbing-tk align

Decompress the RPGG RPGG.tar.gz in your working directory.

An example usage to genotype SRS sample using the RPGG:

samtools fasta -@2 -n $SRS.bam |
/$PREFIX/danbing-tk/bin/danbing-tk -gc 85 3 -ae -kf 4 1 -cth 45 -o $OUT_PREF -k 21 -qs pan -fa /dev/stdin -p $THREADS | gzip >$OUT_PREF.aln.gz

danbing-tk align takes ~12 cpu hours to genotype a 30x SRS sample. This will generate $OUT_PREF.tr.kmers and $OUT_PREF.aln.gz output with format specified in File Format.

Important note: If outputs of danbing-tk align are intended to be compared across individuals e.g. association studies, please check the bias_correction notebook before running.

danbing-tk build

Install Dependencies

For users intend to use danbing-tk align or the Scenario 1 of danbing-tk build, this step is not required.

The danbing-tk build pipeline and danbing-tk predict require several external packages. It is recommended to install all requirements using conda as follows:

conda create -n $MY_ENVIRONMENT -c conda-forge -c bioconda \
    python=3.11.4 snakemake=7.30.1 minimap2=2.26 samtools=1.17 bedtools=2.31.0 statsmodels=0.14.0 matplotlib=3.7.2
conda activate $MY_ENVIRONMENT

To check if everything is configured properly (tested on v1.3.2):

Go to /$PREFIX/danbing-tk/test/
Replace $PREFIX in goodPanGenomeGraph.json and input/genome.bam.tsv with the path to danbing-tk
Run snakemake -p -s ../pipeline/GoodPanGenomeGraph.snakefile -j 4 --forceall --output-wait 3

Running danbing-tk build

Scenario 1: building an RPGG for a single TR locus given VNTR alleles

Required inputs:
- VNTR alleles for each haplotype (one FASTA per haplotype)
Run vntr2kmers_thread with something like this:
- vntr2kmers_thread -g -k 21 -fs 700 -ntr 700 -on $NAME -fa $NUM_HAPLOTYPES $LIST_OF_FASTAS
- Note: at least 500 bp flanks are required for accurate mapping of pair-end reads, 700 bp was specified in the above example.
Index the graph as follows to use danbing-tk align later:
- /$PREFIX/danbing-tk/bin/ktools serialize $NAME

Scenario 2: building an RPGG for a VNTR set given assemblies

Required inputs:
- haplotype-resolved assemblies (FASTA)
- matched SRS data (BAM; optional)
- GRCh38 (FASTA; major chromosomes only without minor contigs)
- tandem repeat regions (BED; tr.good.bed from the release page or user-defined)
Copy /$PREFIX/danbing-tk/pipeline/goodPanGenomeGraph.json to your working directory and edit accordingly.
A config file /$INPUT_DIR/genome.bam.tsv with two columns, one for genome name and one for bam file path, is required if SRS data is available for graph pruning, e.g.

HG00514 /panfs/qcb-panasas/tsungyul/HG00514/HG00514.IL.srt.bam

Otherwise, set pruning in goodPanGenomeGraph.json to False and use a single column input for genome.bam.tsv.
Run the snakemake pipline with:

snakemake -p -s /$PREFIX/danbing-tk/pipeline/GoodPanGenomeGraph.snakefile -j 40\
    --cluster "{params.copts} -c {resources.cores} --mem={resources.mem}G -k" \
    --rerun-incomplete --restart-times 1 --output-wait 30

Submitting jobs to cluster is preferred as danbing-tk build is compute-intensive, ~1200 cpu hours for the original dataset. Otherwise, remove --cluster and its parameters to run jobs locally.

danbing-tk predict

Locus- and sample-specific biases are critical for normalizing the sum of k-mer counts to VNTR dosage (as a proxy for predicted length) and normalizing the average of k-mer counts to motif dosage. The bias for each locus in each sample is computed from the deviation of local read depth from the global mean given a set of invariant k-mers. Examples of this analysis can be found here

Automated bias correction has been added since v1.3.2. Invariant k-mer metadata ikmer.meta (human readable version ikmer.meta.txt) and example trkmers.meta.txt can be found in Assets.

Example usage:

/$PREFIX/danbing-tk/bin/danbing-tk-pred trkmers.meta.txt ikmer.meta corrected.gt.tsv bias.tsv

Caveat: Estimated k-mer dosage could be inaccurate if the bias term is too close to zero.

Miscellaneous

Leave-one-out analysis

To evaluate the quality of custom RPGG on matching SRS dataset, copy /$PREFIX/danbing-tk/pipeline/leaveOneOut.snakefile to your working directory and edit accordingly. Run the snakemake pipleine with:

snakemake -p -s /$PREFIX/pipeline/LeaveOneOut.snakefile -j 40 --cluster \
    "{params.copts} -c {resources.cores} --mem={resources.mem}G -k" \
    --rerun-incomplete --restart-times 1 --output-wait 30

Submitting jobs to cluster is preferred as this analysis is compute-intensive; otherwise, remove --cluster and its parameters to run jobs locally.

File Format

*.graph.kmers

>locus i
kmer0	out_edges0
kmer1	out_edges1
...
>locus i+1
...

out_edges denotes the presence of T/G/C/A as the next nucleotide encoded with 4 bits.

*.(tr|ntr).kmers

>locus i
kmer0	kmer_count0
kmer1	kmer_count1
...
>locus i+1
...

The second field is optional.

Important Note: the output of danbing-tk align do not contain locus info and the first field for minimal disk usage. The table can be reconstructed using the danbing_aln_output.tr_kmers.metadata.txt.gz from metadata.tar.gz on Zenodo

Alignment output (`-a` option)

Synopsis

 <src> <dest> <read_name> <read_seq/0> <read_seq/1> <ops/0> <annot/0> <ops/1> <annot/1>

src: source locus of a read pair (for simulation only)
dest: aligned locus for the read pair
ops: nucleotide-level operations to align the read to the graph. size = read_len + #del
- =: a match
- X[A|C|G|T]: a mismatch; letter in the graph is shown
- D[A|C|G|T]: a deletion in the read; letter in the graph is shown
- I: an insertion in the read
- *: unalinged nucleotide
annot: kmer-level VNTR annotations after applying ops to the read. size = read_len - ksize + 1 + #del - #ins
- =: a match in the repeat
- .: a match in the flank
- *: unaligned kmer

danbing-tk's People

Contributors

Stargazers

Watchers

Forkers

lijianying-arch asleonard ramayyala

danbing-tk's Issues

kmer decode

Hi
Thanks for this excellent tool, I tried to use this software to build graph for our VNTRs. After I got pan.tr.kmers and pan.graph.kmers, I find that kmers were coded into numbers. How can I decode kmers into DNA seqence, do you have any suggestion？

Best Wishes,
Lee

Using danbing with 250 bp PE WGS data

Hi,
How can I reliably decide upon -kf for 250 bp paired end reads? The tools help menu suggests 4,1 which are optimized for 150bp paired end reads.
Thanks.

Segmentation fault when checking the environment with testing samples

Dear ChaissonLab,
I am trying to check my environment by running on the testing samples with command:
snakemake -p -s ../pipeline/GoodPanGenomeGraph.snakefile -j 4 --rerun-incomplete --output-wait 5
and I encountered some problems in this line:
/home/txren/my_path/danbing_tk/danbing-tk/bin/ktools serialize /home/txren/my_path/danbing_tk/danbing-tk/test/output/pan
with the err:
reading kmers from /home/txren/my_path/danbing_tk/danbing-tk/test/output/pan.graph.kmers /usr/bin/bash: line 4: 1863431 Segmentation fault (core dumped) /home/txren/my_path/danbing_tk/danbing-tk/bin/ktools serialize /home/txren/my_path/danbing_tk/danbing-tk/test/output/pan
Could you help me with this issue?

danbing-tk: src/aQueryFasta_thread.cpp:882: int main(int, char**): Assertion `trFile and ntrFile' failed.

Hi,
I use the following command to align bam file to RPGG graph:
samtools fasta -@2 -n $SRS.bam |
/$PREFIX/danbing-tk/bin/danbing-tk -gc 80 -ae -kf 4 1 -cth 45 -o $OUT_PREF -k 21 -qs pan -fai /dev/stdin -p $THREADS

I encountered the error message:
danbing-tk: src/aQueryFasta_thread.cpp:882: int main(int, char**): Assertion `trFile and ntrFile' failed.

I checked the files in RPGG.tar.gz and found that there was no file endwith .ntr.kmers.

What should I do to make the command run successfully. Any suggestion will be appreciated, thanks.

Best Wishes,
Lee

error encountered in the example snakemake command

Dear ChaissonLab,

I tried running the test example command using snakemake. I encountered the error message:

KeyError in line 10 of GoodPangenomeGraph.snakefile:
'pruning'

There indeed appears no key "pruning" in the goodPanGenomeGraph.json file.

Thanks!

Missing RPGG.tar.gz file

Hi, I tried to run this soft. When I followed the guidance, I can 't find RPGG.tar.gz file in the package downloaded. Did I missed anything or this file is no longer needed? Thanks for ur help.

Best.
Abel

danbing-tk: src/aQueryFasta_thread.cpp:1194 int main(int, char**): Assertion ´tfFile´ failed

Hi,

I ran the sample command provided for danbing-tk as:
samtools fasta -@2 -n $SRS.bam |
/$PREFIX/danbing-tk/bin/danbing-tk -gc 80 -ae -kf 4 1 -cth 45 -o $OUT_PREF -k 21 -qs pan -fai /dev/stdin -p $THREADS

and have received:
danbing-tk: src/aQueryFasta_thread.cpp:1194 int main(int, char**): Assertion ´tfFile´ failed
Aborted (core dumped)

as soon as I run the command above with no other log/progress.

Any help would be appreciated to see why this may be happening.
Thank you,
Doruk

danbing-tk align empty *.kmer output

I conducted danbing-tk align with my short read-based bam file, which had been aligned against the GRCh37 reference using bwa mem.
The command, in the working directory containing the presence of the RPGG pan files, completed without any errors, but the generated kmers output file (NA12878.tr.kmers) was empty and the file size of aln.gz file was only 1.3 MB. Is there any solution for this trouble?

My command and output on the console are the follows:

samtools fasta -@ 2 -n NA12878.rmdup.bam | danbing-tk -gc 80 -ae -kf 4 1 -cth 45 -o NA12878 -k 21 -qs pan -fai /dev/stdin -p 6 | gzip > NA12878.aln.gz

use baitDB: 0
extract fasta: 0
interleaved: 1
sim mode: 0
trim mode: 0
augmentation mode: 0
graph threading mode: 1
output alignment: 1
output successfully aligned reads only: 1
k: 21
#of subsampled kmers in pre-filtering: 4
minimal # of matches in pre-filtering: 1
Cthreshold: 45
Rthreshold: 0.5
threading Cthreshold: 80
Running both step1 (kmer-based filtering) and step2 (threading)
fastx: /dev/stdin
query: pan.(tr/ntr).kmers

total number of loci in pan.tr.kmers: 80518
deserializing kmerDBi.umap
deserializing kmerDBi.vv
deserializing graph.umap
reading kmers from pan.tr.kmers
deserialized graph/index and read tr.kmers in 127 sec.
#unique kmers in kmerDBi: 147568763
creating data for each process...
threads created
Buffered reading 300000 300000 0
Buffered reading 300000 600000 0
Buffered reading 300000 900000 0
Buffered reading 300000 1200000 0
Buffered reading 300000 1500000 0
Buffered reading 300000 1800000 0

Thank you for your help.

Predicted length meanings

Dear ChaissonLab,
Thanks for your great work! I have got the estimated lengths of my dataset, and I'm a little confused of the result. Does the estimated length represent the mean allele length or the sum of two allele lengths of a locus?

Best,
Skye

danbingtk v1.3.2 align error

Hi,
When I test the "align" module of danbing-tk v1.3.2, there were errors occurred.
when using the command:
danbing-tk -gc 85 -ae -kf 4 1 -cth 45 -o $sample -k 21 -qs pan -fa $WORKDIR/${sample}.fasta -p 16
the error was:

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 730496712 reads
use baitDB: 0
extract fasta: 0
interleaved: 0
sim mode: 0
trim mode: 0
augmentation mode: 0
graph threading mode: 1
output alignment: 1
output successfully aligned reads only: 1
k: 21
# of subsampled kmers in pre-filtering: 4
minimal # of matches in pre-filtering: 1
Cthreshold: 45
Rthreshold: 0.5
threading Cthreshold: 85
Running both step1 (kmer-based filtering) and step2 (threading)
fastx: /shared/home/zhangsj/scripts/danbingtk-1.3.2/HG00405.fasta
query: pan.(tr/ntr).kmers

total number of loci in pan.tr.kmers: 80517
deserializing kmerDBi.umap
deserializing kmerDBi.vv
deserializing graph.umap
reading kmers from pan.tr.kmers
terminate called after throwing an instance of 'std::out_of_range'
  what():  stoul
Aborted (core dumped)

when using the command:
danbing-tk -gc 85 3 -ae -kf 4 1 -cth 45 -o $sample -k 21 -qs pan -fa $WORKDIR/${sample}.fasta -p 16
the error was:

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 730496712 reads
invalid option: 3
terminate called without an active exception
Aborted (core dumped)

Could you please help me this out? Many thanks!!

Build pipeline uses only ref VNTRs or all assemblies?

Hi,
I may be reading something wrong, but in the build pipeline, the config file seems to only take in VNTRs discovered from the reference (here), although the README seems ambiguous if it is ref-only (like tr.good.bed) or all assemblies.

Running danbing-tk build

Required inputs:
- haplotype-resolved assemblies (FASTA)
- matched SRS data (BAM; optional)
- reference genome (major chromosomes only without minor contigs)
- tandem repeat regions (BED; available on release page or user-defined)

while from the manuscript methods definitely sounds like all assemblies are used

TRF37 v4.09 (option: 2 7 7 80 10 50 500 -f -d -h) was used to roughly annotate the SSR regions of five PacBio assemblies (AK1, HG00514, HG00733, NA19240, NA24385).

Could you clarify if the build pipeline uses VTNRs discovered from each assembly or only the reference, and and how to include them if so?

Thanks,
Alex

Error in Build RPGG for my own datasets

Hi,
I use 35 hyplotypes to build RPGG for my VNTR sets, I find that the program always reports the following error when I run /soft/danbing-tk-1.1//script/preMBE.py /35Hyplto/genome.bam.tsv pan.tr.mbe.v0.bed:
beds[hi,m] = np.loadtxt(f"{g}/tmp1.{h}.bed", dtype=object, usecols=[0,1,2,6], ndmin=2)
ValueError: shape mismatch: value array of shape (101497,4) could not be broadcast to indexing result of shape (37920,4)
I checked f"{g}/tmp1.{h}.bed" file, it's seems that every hyplotype file have different number of mapping locations, what should I do to solve this problem, any suggestion will be appreciated.

Best wishes
Lee

Boundary expanded VNTRs

Dear ChaissonLab,

I was curious if the boundary expanded VNTRs are made available (or can be made available) as described in page 21 of your preprint https://www.biorxiv.org/content/10.1101/2020.08.13.249839v3.full.pdf

If not, can you pinpoint on how to perform this expansion on an assembly or hg38?

Thank you.
Doruk

Missing LSB.tsv file of release 1.3

Thanks for creating this extraordinary tool. Currently, our lab found a VNTR loci by long read sequencing data, and is really looking forward to validating our finding by genotyping it in sea-like NGS data. Sadly, I found this loci was in the v1.3 80,518 loci but not in the v1.0 32,138 loci. And by checking the download link of v1.3, I cannot find the LSB file and control bed of that release. I noticed the LSB file should have the loci number the same as RPGG loci number plus control region loci number, so those two missing files are required in the predict step. May I know if there is any place that I can download the LSB file and control bed of v1.3?

By the way, seems the current *.tr.kmers ouput of align step contains only kmer count but no kmer, and the loci lines which starts with > are lost, too.

Assertion `its.size()' failed error in danbing-tk align

Hi,

When testing a sample from the 1000 Genomes Project using danbing-tk align module, I encountered the error message:

danbing-tk: src/aQueryFasta_thread.cpp:128: void countDupRemove(std::vector<std::__detail::_Node_iterator<std::pair<const long unsigned int, long unsigned int>, false, false> >&, std::vector<std::__detail::_Node_iterator<std::pair<const long unsigned int, long unsigned int>, false, false> >&, std::vector<std::pair<unsigned char, unsigned char> >&): Assertion `its.size()' failed.

The commend I used is as follow:
samtools fasta -n HG00096.bam |./danbing-tk -gc 80 -ae -kf 4 0 -cth 45 -o test -k 21 -qs pan -fai /dev/stdin -p 2 |gzip >test.aln.gz

The complete log is pleased to be provided if you need.

Thank you for helping me out!

Best,
Skye

danbing-tk build doesn't appear to work in v1.3

I'm running the build pipeline using several assemblies, currently just trying it without the prune option, so my genome.bam.tsv looks like

OxO
BSW3
BSW4

The pipeline goes ahead fine until the GenPanGenomeGraph step, where it segfaults almost immediately.
It looks like the main issue is there are "0 loci" in my data when loading, and upon inspecting the ..kmers files are all just a single column list of numbers as far as I can tell. This doesn't match the listed file format like

>locus i
kmer0	kmer_count0

so I am assuming something has gone wrong causing there to be no loci found. However, in the stderr of the GenRawGenomeGraph there is "Using orthology map, total number of loci: 13703".

I'm currently rerunning the pipeline using v1.1 to see if it is a recent change, so will update on that later.

danbing-tk: src/aQueryFasta_thread.h:457: Assertion `fin' failed

Dear ChaissonLab,

I tried running the test example command using snakemake. I encountered the error message:
"danbing-tk: src/aQueryFasta_thread.h:457: void readBinaryIndex(kmerIndex_uint32_umap&, std::vector&, std::string&): Assertion `fin' failed."

If you could give me some suggestion about what should I do to make the command successfully, it will be appreciated.
The following is the complete log:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 GenPanGenomeGraph
2 GenPrunedGenomeGraph
2 GenRawGenomeGraph
1 GenSerializedGraphAndIndex
1 all
7

[Mon Feb 14 23:45:53 2022]
rule GenRawGenomeGraph:
input: /public/home/software/danbing-tk-1.3/test/output/HG00514.0.tr.fasta, /public/home/software/danbing-tk-1.3/test/output/HG00514.1.tr.fasta, /public/home/software/danbing-tk-1.3/test/input/HG00514.filtered.reads.bam, /public/home/software/danbing-tk-1.3/test/output/OrthoMap.v2.tsv
output: /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawIL.tr.kmers
jobid: 13
wildcards: genome=HG00514
priority: 95
resources: cores=24, mem=25

set -eu
ulimit -c 20000
cd /public/home/software/danbing-tk-1.3/test/output/
module load gcc

/public/home/software/danbing-tk-1.3//bin/vntr2kmers_thread -g -m <(cut -f $((0+1)),$((0+2)) /public/home/software/danbing-tk-1.3/test/output/OrthoMap.v2.tsv) -k 21 -fs 700 -ntr 700 -o HG00514.rawPB -fa 2 /public/home/software/danbing-tk-1.3/test/output/HG00514.0.tr.fasta /public/home/software/danbing-tk-1.3/test/output/HG00514.1.tr.fasta

if [ 1 == "1" ]; then
samtools fasta -@2 -n /public/home/software/danbing-tk-1.3/test/input/HG00514.filtered.reads.bam |
/public/home/software/danbing-tk-1.3//bin/bam2pe -fai /dev/stdin |
/public/home/software/danbing-tk-1.3//bin/danbing-tk -g 50 -k 21 -qs /public/home/software/danbing-tk-1.3/test/output//HG00514.rawPB -fai /dev/stdin -o HG00514.rawIL -p 24 -cth 45 -rth 0.5
fi

rule GenRawGenomeGraph:
input: /public/home/software/danbing-tk-1.3/test/output/HG00733.0.tr.fasta, /public/home/software/danbing-tk-1.3/test/output/HG00733.1.tr.fasta, /public/home/software/danbing-tk-1.3/test/input/HG00733.filtered.reads.bam, /public/home/software/danbing-tk-1.3/test/output/OrthoMap.v2.tsv
output: /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawIL.tr.kmers
jobid: 14
wildcards: genome=HG00733
priority: 95
resources: cores=24, mem=25

set -eu
ulimit -c 20000
cd /public/home/software/danbing-tk-1.3/test/output/
module load gcc

/public/home/software/danbing-tk-1.3//bin/vntr2kmers_thread -g -m <(cut -f $((2+1)),$((2+2)) /public/home/software/danbing-tk-1.3/test/output/OrthoMap.v2.tsv) -k 21 -fs 700 -ntr 700 -o HG00733.rawPB -fa 2 /public/home/software/danbing-tk-1.3/test/output/HG00733.0.tr.fasta /public/home/software/danbing-tk-1.3/test/output/HG00733.1.tr.fasta

if [ 1 == "1" ]; then
samtools fasta -@2 -n /public/home/software/danbing-tk-1.3/test/input/HG00733.filtered.reads.bam |
/public/home/software/danbing-tk-1.3//bin/bam2pe -fai /dev/stdin |
/public/home/software/danbing-tk-1.3//bin/danbing-tk -g 50 -k 21 -qs /public/home/software/danbing-tk-1.3/test/output//HG00733.rawPB -fai /dev/stdin -o HG00733.rawIL -p 24 -cth 45 -rth 0.5
fi

Using orthology map, total number of loci: Using orthology map, total number of loci: 11

building and counting building and counting /public/home/software/danbing-tk-1.3/test/output/HG00514.0.tr.fasta/public/home/software/danbing-tk-1.3/test/output/HG00733.0.tr.fasta kmers
kmers
building and counting /public/home/software/danbing-tk-1.3/test/output/HG00733.1.tr.fasta kmers
writing outputs
building and counting /public/home/software/danbing-tk-1.3/test/output/HG00514.1.tr.fasta kmers
writing outputs
fname: /dev/stdin
fname: /dev/stdin
use baitDB: 0
extract fasta: 0
interleaved: 1
sim mode: 0
trim mode: 0
augmentation mode: 0
graph threading mode: 1
output alignment: 0
use baitDB: 0
extract fasta: 0
interleaved: 1output successfully aligned reads only:
0sim mode:
0k:
21trim mode:
0# of subsampled kmers in pre-filtering:
4augmentation mode:
0minimal # of matches in pre-filtering:
1graph threading mode:
1Cthreshold:
45output alignment:
0Rthreshold:
0.5output successfully aligned reads only:
0threading Cthreshold:
50k:
21Running both step1 (kmer-based filtering) and step2 (threading)

of subsampled kmers in pre-filtering: fastx: 4/dev/stdin

minimal # of matches in pre-filtering: query: 1/public/home/software/danbing-tk-1.3/test/output//HG00514.rawPB
.(tr/ntr).kmersCthreshold:
45

total number of loci in Rthreshold: /public/home/software/danbing-tk-1.3/test/output//HG00514.rawPB.tr.kmers: 0.5
threading Cthreshold: 50
Running both step1 (kmer-based filtering) and step2 (threading)
fastx: /dev/stdin
query: /public/home/software/danbing-tk-1.3/test/output//HG00733.rawPB.(tr/ntr).kmers

total number of loci in /public/home/software/danbing-tk-1.3/test/output//HG00733.rawPB.tr.kmers: 0
0
deserializing kmerDBi.umap
deserializing kmerDBi.umap
danbing-tk: src/aQueryFasta_thread.h:457: void readBinaryIndex(kmerIndex_uint32_umap&, std::vector&, std::string&): Assertion fin' failed. danbing-tk: src/aQueryFasta_thread.h:457: void readBinaryIndex(kmerIndex_uint32_umap&, std::vector<unsigned int>&, std::string&): Assertion fin' failed.
/bin/bash: line 11: 220388 Broken pipe samtools fasta -@2 -n /public/home/software/danbing-tk-1.3/test/input/HG00514.filtered.reads.bam
220389 | /public/home/zhaoFL/software/danbing-tk-1.3//bin/bam2pe -fai /dev/stdin
220390 Aborted (core dumped) | /public/home/software/danbing-tk-1.3//bin/danbing-tk -g 50 -k 21 -qs /public/home/software/danbing-tk-1.3/test/output//HG00514.rawPB -fai /dev/stdin -o HG00514.rawIL -p 24 -cth 45 -rth 0.5
/bin/bash: line 11: 220385 Broken pipe samtools fasta -@2 -n /public/home/software/danbing-tk-1.3/test/input/HG00733.filtered.reads.bam
220386 | /public/home/software/danbing-tk-1.3//bin/bam2pe -fai /dev/stdin
220387 Aborted (core dumped) | /public/home/software/danbing-tk-1.3//bin/danbing-tk -g 50 -k 21 -qs /public/home/software/danbing-tk-1.3/test/output//HG00733.rawPB -fai /dev/stdin -o HG00733.rawIL -p 24 -cth 45 -rth 0.5
[Mon Feb 14 23:45:54 2022]
Error in rule GenRawGenomeGraph:
[Mon Feb 14 23:45:54 2022]
jobid: 13
Error in rule GenRawGenomeGraph:
output: /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawIL.tr.kmers
jobid: 14
shell:

set -eu
ulimit -c 20000
cd /public/home/software/danbing-tk-1.3/test/output/
module load gcc

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
output: /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawIL.tr.kmers

shell:

set -eu
ulimit -c 20000
cd /public/home/software/danbing-tk-1.3/test/output/
module load gcc

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job GenRawGenomeGraph since they might be corrupted:
/public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00514.rawIL.tr.kmers
Removing output files of failed job GenRawGenomeGraph since they might be corrupted:
/public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.tr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.ntr.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawPB.graph.kmers, /public/home/software/danbing-tk-1.3/test/output/HG00733.rawIL.tr.kmers
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Best wishes,
Huiying

tr.kmers format coversion problem

How could I covert tr.kmers file to ntr.kmers file when I use an RPGG built by myself.
By the way, if there is a script which can covert kmers count to vntr length with a tr.kmers file is available?

danbing-tk predict user manual

Hi there,

Thank you for this excellent tool. I'm trying to use it to genotype VNTRs in a diverse panel of WGS samples. I am able to use danbing-tk to align my reads to the 35 genome RPGG reference that you provide.

However, I am really struggling to use the bash script and python script provided to convert these kmer counts into coverage estimates. Even a tool to simply convert kmer counts into uncorrected depth estimates would be handy. I have tried performing this procedure with the test files provided, and the script is currently erroring out while loading the LSB.tsv file into memory. Could you provide a test dataset and test commands to get coverage estimates?

Thank you,
Joe

chaissonlab / danbing-tk Goto Github PK

danbing-tk's Introduction

danbing-tk

Download Releases

Building on Linux

danbing-tk align

danbing-tk build

Install Dependencies

Running danbing-tk build

Scenario 1: building an RPGG for a single TR locus given VNTR alleles

Scenario 2: building an RPGG for a VNTR set given assemblies

danbing-tk predict

Miscellaneous

Leave-one-out analysis

File Format

*.graph.kmers

*.(tr|ntr).kmers

Alignment output (-a option)

danbing-tk's People

Contributors

Stargazers

Watchers

Forkers

danbing-tk's Issues

Running danbing-tk build

of subsampled kmers in pre-filtering: fastx: 4/dev/stdin

Recommend Projects

Recommend Topics

Recommend Org

Alignment output (`-a` option)