Coder Social home page Coder Social logo

walaj / svaba Goto Github PK

View Code? Open in Web Editor NEW
225.0 17.0 44.0 3.84 MB

Structural variation and indel detection by local assembly

License: GNU General Public License v3.0

Shell 0.16% Makefile 18.06% R 6.42% M4 0.06% C++ 72.71% C 2.32% Dockerfile 0.04% CMake 0.23%
indels variants assembled-contigs structural-variations c-plus-plus

svaba's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

svaba's Issues

Is it possible to resume a prematurely stopped analysis?

Hi Jeremiah

I have an analysis that stopped ~80% of the way through, with a few chromosomes to go. Is there any way to continue the analysis from the point it got up to, and append the results from the last chromosomes to the existing .bps and other output files?

Best,
Kevin.

There is a somatic SV, but the alignment file is empty

Hi,

I was able to detect a single somatic SV in a sample (according to svaba.sv.vcf), but the alignment file (alignments.txt and therefore contigs.fa) is empty.
Can you explain when those files can be empty?

Thank you in advance,

AD larger than DP?

Hi,

When I check the vcf head infomation, I suppose DP > AD = SR + DR.

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth of coverage: Number of reads covering site.">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele depth: Number of reads supporting the variant">

While I found some AD value is larger than DP in a normal-tumor pair. Is this a bug?

GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/1:17:29:24.8:47.9,0,24.8:17:0:-48.36:48.36    1/1:31:28:8.4:92.4,8.4,0:31:0:-92.42:92.42

Specific types of BND

Hi, could you please provide a script for converting BND svtype to specific SV types, e.g. DEL, DUP. I don't think the one in #4 is accurate.
Thanks,

Example BAM file and output file are available?

Hi Jeremiah,

I wonder if you provide example BAM files and output file (.vcf) that I can test svaba and replicate the result. It is because I like to test whether I am running svaba correctly or not.

I downloaded BAM files from ICGC, and ran svaba. Please see the attached files that I got from svaba running;

  1. DO6350.svaba.somatic.sv.vcf.txt - This is filtered file.
  2. DO6350.svaba.unfiltered.somatic.sv.vcf.txt - This is unfiltered file.

The FILTERED file is empty. All of variant callings have low quality in the UNFILTERED vcf file. I didn't use any parameter in svaba running. Please see the command I used below;

svaba run -t 474e3b36be9aba5e893234f2d4891e6e.bam -n 3a27910d544adbeab95b2b73c20ab6e5.bam
bwa index ./BWA_DNA_RefSeq_Fasta_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-G ./BWA_DNA_RefSeq_Fasta_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf
-a DO6350

I don't think the BAM files I used have problem or wrong file format. Therefore, I like to test svaba with example bam files and see whether I can replicate the result, and see whether I can get FILTERED (high quality) variant callings. Could you help?

Sanghoon

DO6350.svaba.somatic.sv.vcf.txt
DO6350.svaba.unfiltered.somatic.sv.vcf.txt

Compiling v0.2.1 (bioconda recipe)

I am trying to make a bioconda recipe for SvABA v0.2.1, but I keep getting compiler errors:

bioconda/bioconda-recipes#12943

The errors are reproducible when I do the compilation locally. However, if I check out the current master branch, everthing works fine. Do you have any plans of making a new release incorporating the latest changes?

Comparing SvABA running result with ICGC StSM somatic.sv.vcf file

Hi,

I downloaded tumor and normal BAM files of ICGC donor ID, "DO48701" from ICGC. The tumor BAM file ID is FI44968, and normal BAM file ID is FI44967. I ran SvABA on the BAM files, and got "svaba.unfiltered.germline.sv.vcf (541 vcf calls)" and "svaba.unfiltered.somatic.sv.vcf (789 vcf calls)" Here is my command line;

svaba run -t 6abba3f621d5442ad1e893d1ea11e9c6.bam
-n 1ddf1b0c325527318767571b8b076e13.bam
bwa index ./Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-G ./Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf
-a DO48701

Question1. Bunt, "svaba.germline.sv.vcf" and "svaba.somatic.sv.vcf" files are empty. This means that all vcf calls were filtered out as they are low quality.. am I correct?

Question2. I downloaded "00c27940-c623-11e3-bf01-24c6515278c0.broad-snowman-10.20151218.germline.sv.vcf" file from ICGC. This file is also from the same donor ID, "DO48701". As you see the file name, it is analyzed by 'snowman'. I compared the vcf callings between this file and "svaba.unfiltered.germline.sv.vcf (541 vcf calls)". But, I couldn't see any common vcf callings. I expected my SvABA running result and ICCC StSM somatic.sv.vcf file show similar performance. Could you advise what I did wrong..?

Thank you,
Sanghoon

Any plan of replacing bwa-mem with minimap2?

Hi Jeremiah,
Thanks for your great work! Svaba is awesome! I wonder if there is any plan to replace bwa-mem with minimap2 which is compatible with paired end short reads and supports long reads. Furthermore, do you have any plan to support long reads for SV detection like Sniffle and Picky? Thanks!

LOD (LO)

Are there any materials that explain how LO works? I want to do some filtering by LO and I'm not sure what threshold I should put my filter at.

Thanks,

Yosef

Metadata ID Format

Hi,

I've noticed that the paths to the bam files I'm using in my svaba runs are used as the sample IDs in the vcf files. I'm wondering if this is the intended output, or if there will be an option to specify each sample ID for vcf format purposes? Having flags such as --tumor-sample-name and --normal-sample-name would be really helpful so that we don't run into any format issues due to the backslashes in the bam file paths.

Tumor only analysis

Hi @walaj,

I'm trying out svaba for the first time looking to call somatic mutations on tumor only data. I was wondering if there is a recommended way to run svaba in those cases?

Thanks,
Carlos

STCommon.h:99:46: error: left operand of shift expression

Trying to compile with GCC version gcc (GCC) 6.1.0

I got

../../../src/SGA/SuffixTools/STCommon.h:99:49: warning: left shift of negative value [-Wshift-negative-value] static const uint64_t HIGH_MASK = ~0 << POS_BITS; ^~~~~~~~ ../../../src/SGA/SuffixTools/STCommon.h:99:46: error: left operand of shift expression '(-1 << 28)' is negative [-fpermissive] static const uint64_t HIGH_MASK = ~0 << POS_BITS;

I changed the line to

static const uint64_t HIGH_MASK = (~((uint64_t)(0))) << POS_BITS;

It solved the compilation probleme.

VCF files incomplete?

Greetings, I was recently running SvABA on data from the ICGC and was comparing the vcf files to older data. I noticed that my indel vcf files show only analysis from Chromosome 1. The SV vcf files show other chromosomes, but also feel as if most of the file is dedicated to Chrosome 1 too. I'm not sure if it's a problem with the way I ran SvABA, the data that I'm analyzing, etc. From reading past issues I think the problem mostly lies on the parameters, but I honestly don't know which ones I should change and how. . I'm including the log from my run in case it's of any help for troubleshooting for anyone that would like to help.

Many thanks in advance.
svaba_DO48701_LiverC.log

LOWMATCHLEN is a FILTER value in sv.vcf but not described in header

There is no description of the LOWMATCHLEN included in the vcf header. Lines 738-741 in BreakPoint.cpp define the criteria for this FILTER value, but then a description is missing for LOWMATCHLEN in the sv_header.addFilterField function within vcf.cpp. This causes errors for bcftools:[W::vcf_parse] FILTER 'LOWMATCHLEN' is not defined in the header
Error: The filter is not defined in the header: LOWMATCHLEN

output directory

Hi Jeremiah,

would it be possible to add an option to specify where the outputs should be written instead of using the current directory ?
i don't think this maybe such a complicated task to do (maybe a bit time consuming I agree) but it would very help to make the software included in more broader pipeline.

Thank you

Error with simulated data

Hi Walaj,

I am working with some simulated paired-end data generated using wgsim (coverage=100X, read length=70bp, mean insert size=150 and sd insert size=15). When I run svaba, I come across the following errors:

!!!! WARNING. Multiple max mapq mixed: NC_001137.3_164083_164206_2--60 max possible 60

not enough paired-end reads to get insert-size distribution. skipping discordant analysis
       tead group: NC_001136.10_331155_331313_1 Insert-sizes sampled: 1 visited 2
 [M::bwa_idx_load_from_disk] read 0 ALT contigs
         --- Loaded non-read data. Starting detection pipeline

Also, no output files are generated. Trying the following commands:

module load svaba/intel/20170210 
svaba run -p 16 -G $ref -t $bam1

Can you help me figure out what could be going on and how to fix it? Also, let me know if you require any additional information.

Thanks,
Gunjan

Splitting input file to small files / Directing output directory

Hi,

I have two questions.

  1. Splitting input file to small files.
    My input files (bam files) are 100 ~ 130 GB. When I ran svaba first time in a super computer (16 cores multi-thread), I set waltime limit 72 hours, but the running exceeded 72 hours and terminated. I didn't know the running will take this long time. I wonder if I can split the BAM file to small files, and run svaba. I just googled whether I can split BAM file, and I found some information.;
    https://www.biostars.org/p/46327/ Once I can split the BAM file and run svaba for small input files, I think I can get multiple outputs from multiple runnings, and then I can combine the outputs. Would it be okay?

  2. Directing output directory.
    I tried to direct output directory. So, I tried to see if there is a flag that can enable to direct output files, but it seems not. "-a" is just output file name prefix. Does svaba has a flag that can direct output files?

Thank you,
Sanghoon

Example dbsnp_indel.vcf no longer available

The example dbsnp_indel.vcf is no longer available: wget "https://data.broadinstitute.org/snowman/dbsnp_indel.vcf" ## get a DBSNP known indel file

Is this known indel dbsnp available to download somewhere else? Thanks!

"Filed to load index for ..." error while running SvABA.

Hi Jeremiah,

I am testing SvABA but I got an error message about "loading index" and didn't get '.vcf' output files. The command line I used is like below.

svaba run -t ../54_dbGaP_RNA-seq_FusionCapture_BWA/FusionCaptureTest_ReadID_NewFormat_test_20180118/aln_fq_paired_SRR988437_1_200M.fastq.gz.bam -G ./BWA_Reference_fasta/GRCh38.primary_assembly.genome.fa -p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf -a SRR988437_200M

The log file is attached. If you see the log file, the main error messages are; "Failed to load index for....." and the last line is

svaba: run_svaba.cpp:1788: CountPair collect_mate_reads(WalkerMap&, const MateRegionVector&, int, SeqLib::GRC&): Assertion `w.second.SetMultipleRegions(gg)' failed.
/var/spool/slurmd/job624012/slurm_script: line 20: 15817 Aborted svaba run -t /pghbio/dbmi/xwangp/sal170/55_Snowman_54729-5_AIR/2_54729-5_WGS_AIR_BRCA_BAM/aln_fq_paired_SRR476765.bam -G ./BWA_Reference_fasta/GRCh38.primary_assembly.genome.fa -p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf -a SRR476765

As I read the SvABA manual again, it says "A BWA-MEM index reference genome must also be supplied with -G" But, as I see the example command line, "-G" needs refseq fasta file. So, I just used GRCh38.primary_assembly.genome.fa. Should I use BWA-MEM index ref genome with -G? Then, could you provide the BWA-MEM index ref geneom?

Thank you,
Sanghoon

svaba_tumorOnly_SRR476765.log

Run on exomes?

Hi, @walaj. This looks like something that I would like to try out. We have a project with about 50/50 mix of genomes and exomes. I understand that the exome SV/indel calls will be limited, but is there any particular reason that snowman could not be applied to exomes?

Annotate svaba output vcf

Hi,

Is there a way to annotate svaba output vcf to get gene names and genomic regions etc. I tried snpEFF but it failed to recognize svaba vcf format.

Many thanks,
Rahul

calculate Variant Allele Frequency (VAF)

Hello,

I was wondering what the correct way to calculate the VAF is? I tried AD/DP but a lot of the regions have a value greater than 1 and they also don't correspond with the VAF that some other callers are giving.

Thanks!

#29

Challenge of running svaba with FFPE tumor/normal pairs

Svaba is awesome! For frozen WGS samples, we got a genome done within 4-5 hours with low CPU/memory footage.

However, as we run tumor-normal pairs on FFPE samples,it is taking extra long time and not finishing. Is there a way to deal with the FFPE tumor-pair samples using svaba? Suggestions are really appreciated.

Here are some of the log information I have:
The first number is the line number of the sample.log:

930147 Ran GL000192.1:    441,001-    466,001 | T:  2856 N:   127 C:    94 | R: 49% M:  0% T:  5% C: 12% A: 31% P:  0% | CPU: 10282m53s Wall: 1326m25s
930148 ...BFC attempted correct 5314 train: 25402 kcov: 80.874680 kmer: 17
930149 ...assembled 227 contigs for c_84_465501_490501
930150 ...BFC attempted correct 6433 train: 33261 kcov: 100.696404 kmer: 17
930151 writing contigs etc on thread 47362852042496 with limit hit of 7520
930152 Ran GL000192.1:    465,501-    490,501 | T:  4418 N:   895 C:   159 | R: 51% M:  0% T:  6% C: 13% A: 27% P:  0% | CPU: 10283m01s Wall: 1326m27s
930153 ...assembled 268 contigs for c_84_490001_515001
930154 ...assembled 307 contigs for c_84_514501_539501
930155 writing contigs etc on thread 47362854143744 with limit hit of 7295
930156 Ran GL000192.1:    490,001-    515,001 | T:  4677 N:   637 C:   192 | R: 55% M:  0% T:  5% C: 12% A: 25% P:  0% | CPU: 10283m10s Wall: 1326m29s
930157 writing contigs etc on thread 47362839435008 with limit hit of 10967
930158 Ran GL000192.1:    514,501-    539,501 | T:  5463 N:   970 C:   203 | R: 60% M:  0% T:  4% C:  8% A: 25% P:  0% | CPU: 10283m11s Wall: 1326m29s
930159 writing contigs etc on thread 47362847840000 with limit hit of 20518
930160 ...assembled 1199 contigs for c_83_171501_196501
930161 Ran GL000225.1:    196,001-    211,173 | T:  7289 N:  4674 C:   388 | R: 53% M:  0% T:  7% C: 10% A: 27% P:  0% | CPU: 10283m14s Wall: 1326m31s
930162 writing contigs etc on thread 47362841536256 with limit hit of 51970
930163 Ran GL000225.1:    171,501-    196,501 | T: 12692 N:  8323 C:   732 | R: 58% M:  0% T:  6% C: 15% A: 18% P:  0% | CPU: 10283m24s Wall: 1326m41s

The error log of the job (the one killed before) looks like this:

...
21858     stopping read lookup at 83:85,871(+) in window 83:73,501-98,501(*) with 50,001 weird reads. Limit: 50,000
21859     stopping read lookup at 83:50,907(+) in window 83:49,001-74,001(*) with 50,001 weird reads. Limit: 50,000
21860     stopping read lookup at 83:131,670(-) in window 83:122,501-147,501(*) with 50,001 weird reads. Limit: 50,000
21861     stopping read lookup at 83:102,433(-) in window 83:98,001-123,001(*) with 50,001 weird reads. Limit: 50,000
21862     stopping read lookup at 83:126,115(+) in window 83:122,501-147,501(*) with 50,001 weird reads. Limit: 50,000
21863     stopping read lookup at 83:82,785(+) in window 83:73,501-98,501(*) with 50,001 weird reads. Limit: 50,000

calculate SR DR through alignment file

Hi,

This is the alignment information for a somatic SV event.

Global BP: : 1:1,038,901(-) to 1:1,039,297(+) SPAN 396 c_1_1029001_1054001_367C  n001:0 t000:6 ins_aginst_contig 0 del_against_contig 0  c_1_1029001_1054001_367C
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>................................................................................................................      C[0,1
14] G[1039184,1039297]  Local: 1        Aligned to: chr1:1039183(+) CIG: 114M112S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
..................................................................................................................>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>...........      C[114
,215] G[1038901,1039001]        Local: 1        Aligned to: chr1:1038900(+) CIG: 114S101M11S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT    c_1_102
9001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0514:191:H3YHNCCXY:4:1117:4350:58778--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACTCCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0516:172:H3H3YCCXY:7:1112:17929:2557--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
         TCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGTCCAGGCTGGTCTCAAATGCCTGG      TCACACCTGCGGTGAGCGGTGGCCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCC                                          t000_83_E00
514:191:H3YHNCCXY:4:2112:27113:50779--1:1039192 r2c CIGAR: 99M28S,n001_83_E00516:197:H5MGTCCXY:7:2224:26027:37612--1:1038871 r2c CIGAR: 29S22M1D76M, c_1_1029001_1054001_367C
                               GTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGG                                                                    t000_97_E00
514:191:H3YHNCCXY:4:2116:24383:72280--1:1039214 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                               GTGCACACCATTCCACCCAAGTAACGTCCTTTTCTTTTGTAGAGACTGGGTTGCCCAGGCTGCTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGC                                                                                          t000_65_E00
514:191:H3YHNCCXY:5:1201:31822:35766--1:1039214 r2c CIGAR: 105M22S, c_1_1029001_1054001_367C
                                                     AACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTAT                       t000_163_E0
0514:191:H3YHNCCXY:5:1112:19816:46437--1:1038900 r2c CIGAR: 150M, c_1_1029001_1054001_367C
                                                                                                   AATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT     t000_8
1_E00514:191:H3YHNCCXY:4:1105:28260:14125--1:1038900 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   n001_105_E0
0516:197:H5MGTCCXY:8:1222:3884:34834--1:1038876 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   t000_99_E00
514:191:H3YHNCCXY:4:1213:13027:38051--1:1038894 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATA                      t000_99_E00
514:191:H3YHNCCXY:5:1219:25631:63472--1:1038899 r2c CIGAR: 11S101M15S, c_1_1029001_1054001_367C

I want to know how "Tcount: 7 Ncount: 0" is counted?

Meanwhile, the vcf record for this variation is:

chr1    1038901 6183:1  c       ]chr1:1039297]c 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=60;EVDNC=ASDIS;MAPQ=60;MATEID=6183:2;MATENM=1;NM=0;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29
chr1    1039297 6183:2  a       a[chr1:1038901[ 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=53;EVDNC=ASDIS;MAPQ=60;MATEID=6183:1;MATENM=0;NM=1;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29

How SR=6, DR=7, and AD=8 is calculated based on the alignment information?
I didn't seen 7 discordant reads in alignment file.

Thanks a lot

a question about TSI?

Hi Walaj,

I wonder if you can give more explanation on TSI_L and TSI_G in the "bpx.txt".
The VCF file says:

TSI_L templated-sequence insertion (local, e.g. AB or BC of an ABC), TSI_G global (e.g. AC of ABC)

Here, could you more clarify what "AB, BC, AC, ABC" represents?
I guess that ABC means an assembled sequence is aligned onto three pieces (A-B-C), and the bp pairs (pos1 and pos2) represent AB, BC, or BC. Is this a correct interpretation?

Thank you in advance,

Problem with long deletion

Hi Walaj,

I tried to use SvABA on long deletion range from 1mbp to 150mbp in simulated bam file. SvABA works well when only small indel are included. However, when I start to include those long deletion into the bam SvABA failed to detect anything, not even the small indels.
I checked the contig bam file, and those contigs do include correct breaking points.
BTW, default germline settings were used for the analysis.

Reads supporting Variant call

Hello,

Please can you tell me how to get information about number of reads supporting the variant listed in the *sv.vcf output file?

I would really appreciate it. Thanks!

Strange output with in-house reference

Hi,
I try to use svaba to detect bacterial SVs in metagenomics data with a reference contains ~600 bacterial genomes. And in *discordant.txt.gz output, I notice the chromosome id 'chr1' and 'chr2' are range from 1 to 24; while in *.alignments.txt.gz output, BP information had some chromosomes named 'X', 'Y' and 'M', like:
Global BP: : X:2,968,914(+) to 457:3,952,360(-)
...
Global BP: : Y:349,942(-) to 104:1,266,871(-)
...
Global BP: : M:10,276(+) to 552:23,800(-)
which are not in my reference.

I guess this may caused by svaba treated the reference as human genome.
Is it caused by that? and if there any way I can do to make svaba can used with this reference?

Thanks a lot!

MAPQ zero in unfiltered file

Are these entries in the unfiltered vcf file with QUAL=LOWMAPQ and one mate with MAPQ=0 means that mate has multiple mapping?

Somatic rearrangement scores: `LO` or `SL`

Hi Jeremiah,

I have a question about the scores for somatic rearrangements. I was checking the detected variants in "*.svaba.somatic.sv.vcf" and was wondering whether "LO" in the FORMAT column is the correct one to look at.

Or should I use the scaled score "SL" for somatic rearrangement scores? However, I found "SL" is not included in the resulting vcf after running svaba run. Should I run svaba refilter to get the SL scores? Or would it be fine to compute (min(MAPQ, MATEMAPQ) - 2 * NM) / 60 * LO?

SVTYPE classification

Hi,

I have been testing out your new program which I was hoping would filling a niche in our tumor sv pipeline. After running svaba on a number of in-house benchmarking datasets, I've notice that although in your bioRxiv paper you've classified SVs as DUP, DEL etc in the SV vcf outputs all SVTYPE are all classified as BND. Are there any plans to further classify the outputted SVs by SVTYPE and/or do you already have a mechanism to do so?

Thanks in advance.

Best,

Calls Specific to High Coverage Data

Hi,

We have been running multiple SV callers on high coverage data (275x tumor / 156x normal). We have observed that SvABA seems to be making quite a few SV calls that are about 500bp in size and only have a couple supporting reads. SvABA does not make these calls on the same tumor/normal pair with lower coverage (80x tumor / 40x normal). The other SV tools (Manta and Lumpy) do not find these calls at high coverage or low coverage, and after looking at these variants in IGV they do not seem real. We are wondering if this might have something to do with the reassembly being affected by high coverage data?

Thanks,
Molly

Question about tumor bam file for germline SV calling

Dear author
I recently want to study the difference of whole level of SV structures for normal and tumor tissues. I know that somatic calling could serve the purpose for calling somatic SV structures. However, I wonder that if I can use 'SvABA run -a germline' on tumor and normal tissues separately, and extract the information from separate germline_run.svaba.sv.vcf to represent the SV situations for normal and tumor tissues?

about svaba version

Hi,
Has the svaba software been updated? Is there any information about the software version number?
Thank you!

Installation error

I am trying to compile your tool using GCC 4.8.5 and following error is reported:

make[2]: Entering directory svaba/SeqLib/htslib' gcc -g -Wall -O2 -I. -c -o thread_pool.o thread_pool.c thread_pool.c: In function ‘hts_tpool_init’: thread_pool.c:658:5: warning: implicit declaration of function ‘pthread_mutexattr_settype’ [-Wimplicit-function-declaration] pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); ^ thread_pool.c:658:38: error: ‘PTHREAD_MUTEX_RECURSIVE’ undeclared (first use in this function) pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); ^ thread_pool.c:658:38: note: each undeclared identifier is reported only once for each function it appears in make[2]: *** [thread_pool.o] Error 1 make[2]: Leaving directory svaba/SeqLib/htslib'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `svaba'
make: *** [all] Error 2

It seems the function has not been defined. Can you please suggest a way out of it.
Thanks

Deterministic Output?

Hi,

I'm wondering if svaba is expected to have deterministic output, or if it's possible to get deterministic output? I've run svaba with the same settings on the same tumor-normal pairing and am noticing a few differences between runs. For example, I found a structural variant that's listed in the PASS somatic calls in one run, but not in the PASS somatic calls in the second run. However, this SV is found in the germline PASS calls in the second run even though it shows a somatic genotype.

Another SV was missed due to a low alignment score in one run, but not the other.

Thanks,
Molly

Snowman VCF Visualisation in IGV

HI,

I am processsing ICGC broad snowman indel calls and I dont see genotype column filled. But when I check snv_mnv.vcf I see that each genotype has a information under it. When I remove those two genotype columns from the indelvcf, problem is solved( I visualised vcf in IGV).

My question is;
Do I loose any information by removing those two empty genotype columns in IGV ?? Also, are these columns left empty as a result of an error or something?

Best regards,

Tunc.

this is indel vcf sample.

Tuncs-MacBook-Pro:CallerComparison morova$ more /Users/morova/Desktop/CallerComparison/patient.somatic.indel.vcf
##fileformat=VCFv4.2
##fileDate=20151212
##source=snowmanSV
##reference=hg19
##INFO=<ID=NFRAC,Number=1,Type=String,Description="Normal allelic fraction at break. -1 for undefined">
##INFO=<ID=NSPLIT,Number=1,Type=Integer,Description="Number of normal reads with read-to-contig alignments supporting this indel">
##INFO=<ID=SPAN,Number=1,Type=Integer,Description="Size of the indel">
##INFO=<ID=TCOV,Number=1,Type=Integer,Description="Tumor coverage at break">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Mapping quality (BWA-MEM) of the assembled contig">
##INFO=<ID=TCIGAR,Number=1,Type=Integer,Description="Number of tumor reads with cigar strings supporting this indel">
##INFO=<ID=BLACKLIST,Number=1,Type=String,Description="Normal allelic fraction at break. -1 for undefined">
##INFO=<ID=NCIGAR,Number=1,Type=Integer,Description="Number of normal reads with cigar strings supporting this indel">
##INFO=<ID=REPSEQ,Number=1,Type=String,Description="Repeat sequence near the event">
##INFO=<ID=TSPLIT,Number=1,Type=Integer,Description="Number of tumor reads with read-to-contig alignments supporting this indel">
##INFO=<ID=NCOV,Number=1,Type=Integer,Description="Normal coverage at break">
##INFO=<ID=TFRAC,Number=1,Type=String,Description="Tumor allelic fraction at break. -1 for undefined">
##FILTER=<ID=PASS,Description="3+ split reads, 60 contig MAPQ">
##FILTER=<ID=REPEAT,Description="3+ split reads, 60 contig MAPQ">
##FILTER=<ID=WEAKCIGARMATCH,Description="For indels <= 5 bp, require 8+ split reads or 4+ and 3+ cigar matches">
##FILTER=<ID=WEAKASSEMBLY,Description="4 or fewer split reads">
##FILTER=<ID=LOWMAPQ,Description="Assembly contig has less than MAPQ 60">
##FORMAT=<ID=NALT,Number=1,Type=Integer,Description="Number of ALT support reads or pairs">
##FORMAT=<ID=NREF,Number=1,Type=Integer,Description="Number of REF support Reads">
##FORMAT=<ID=NALT_SR,Number=1,Type=Integer,Description="Number of ALT support Split Reads">
##FORMAT=<ID=NALT_RP,Number=1,Type=Integer,Description="Number of ALT support aberrant Read Pairs">
##FORMAT=<ID=READ_ID,Number=.,Type=String,Description="ALT supporting Read IDs">
##SAMPLE=<ID=###T>
##SAMPLE=<ID=###N>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ###N   ###T
1       3789178 1741:no_id      G       GGGCCCTGTCCCTGCCGC      .       PASS    BLACKLIST=1;EVDNC=INDEL;INSERTION=GGCCCTGTCCCTGCCGC;MAPQ=60;NCIGAR=0;NCOV=39;NDISC=0;NFRAC=0;NSPLIT=0;PONCOUNT=0;SCTG=c_1_3785001_3791001_48;SPAN=17;TCIGAR=37;TCOV=73;TDISC=0;TFRAC=0.506849;TSPLIT=31

snv_mnv vcf

Tuncs-MacBook-Pro:0db91c73-b2e7-4f00-8cea-0b7b3d48451e morova$ more /Users/morova/Desktop/CallerComparison/samepatient.somatic.snv_mnv.vcf
##fileformat=VCFv4.1
##oncotator_version=Oncotator_v1.8.0.0_|
##INFO=<ID=Verification_Status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Seq_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Validation_Method,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Seq_Allele1,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Score,Number=.,Type=String,Description="Unknown">
##INFO=<ID=sequencer,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Validation_Status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=tumor_f,Number=.,Type=String,Description="Unknown">
##INFO=<ID=BAM_file,Number=.,Type=String,Description="Unknown">
##INFO=<ID=source,Number=.,Type=String,Description="Unknown">
##INFO=<ID=t_lod_fstar,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Strand,Number=.,Type=String,Description="Unknown">
##INFO=<ID=status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=init_t_lod,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Center,Number=.,Type=String,Description="Unknown">
##INFO=<ID=PoN_remove,Number=.,Type=String,Description="Unknown">
##INFO=<ID=judgement,Number=.,Type=String,Description="Unknown">
##INFO=<ID=NCBI_Build,Number=.,Type=String,Description="Unknown">
##INFO=<ID=phase,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Tumor_Validation_Allele1,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Tumor_Validation_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Validation_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Validation_Allele1,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=alt_count,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=ref_count,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CPCG0206-B1-CPCG0206-F1
1       729633  .       G       A       .       PASS    PoN_remove=False;Validation_Method=none;sequencer=Illumina_GAIIx;Validation_Status=Untested;tumor_f=0.166667;source=WGS;Strand=+;status=Somatic;init_t_lod=19.297413;Center=broad.mit.edu;t_lod_fstar=26.240323;judgement=KEEP;NCBI_Build=37;phase=Phase_I      GT:alt_count:ref_count  ./.:11:55

Multi-threading not working

Hello,
I've installed svABA with gcc 4.9.2 on centos 6.2
If I run without -p option, I got:

-----------------------------------------------------------
--- Running svaba SV and indel detection on 1 threads ---
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
 Caught exception for lregion with reg 1:1-25,001(*)
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10396-10402
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10402-10439
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10408-10439
...

but with -p 8 option, program fails :

-----------------------------------------------------------
--- Running svaba SV and indel detection on 4 threads ---
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
 Caught exception for lregion with reg 1:49,001-74,001(*)
Floating point exception

Do you have any idea of the problem's origin ?
I've attached log files of configure and make commands if it can help.

Thanks in advance for any help !

Anne-Sophie

svaba_config.log
svaba_install.log

can't install on Mac

Hello
I have tried on two macs running El Capitan
it fails after the "make" command

../../depcomp: line 611: /Users/christos/svaba/compile: No such file or directory
../../depcomp: line 611: exec: /Users/christos/svaba/compile: cannot execute: No such file or directory
make[2]: *** [libseqlib_a-ssw.o] Error 126
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

any suggestions?
thanks
christos

this is the whole output in case it helps
Christoss-MacBook:~ christos$ cd svaba
Christoss-MacBook:svaba christos$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/Users/christos/svaba/missing: Unknown --is-lightweight' option Try /Users/christos/svaba/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc-5.3
checking whether we are using the GNU C compiler... no
checking whether gcc-5.3 accepts -g... no
checking for gcc-5.3 option to accept ISO C89... unsupported
checking whether gcc-5.3 understands -c and -o together... no
checking dependency style of /Users/christos/svaba/compile gcc-5.3... none
checking for ranlib... ranlib
checking how to run the C++ preprocessor... g++ -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for library containing gzopen... -lz
checking for library containing clock_gettime... no
checking google/sparse_hash_set usability... no
checking google/sparse_hash_set presence... no
checking for google/sparse_hash_set... no
checking google/sparse_hash_map usability... no
checking google/sparse_hash_map presence... no
checking for google/sparse_hash_map... no
checking unordered_map usability... yes
checking unordered_map presence... yes
checking for unordered_map... yes
checking tr1/unordered_map usability... no
checking tr1/unordered_map presence... no
checking for tr1/unordered_map... no
checking ext/hash_map usability... yes
checking ext/hash_map presence... yes
checking for ext/hash_map... yes
checking unordered_set usability... yes
checking unordered_set presence... yes
checking for unordered_set... yes
checking tr1/unordered_set usability... no
checking tr1/unordered_set presence... no
checking for tr1/unordered_set... no
checking ext/hash_set usability... yes
checking ext/hash_set presence... yes
checking for ext/hash_set... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating SeqLib/src/Makefile
config.status: creating src/SGA/Util/Makefile
config.status: creating src/SGA/SQG/Makefile
config.status: creating src/SGA/Bigraph/Makefile
config.status: creating src/SGA/Algorithm/Makefile
config.status: creating src/SGA/StringGraph/Makefile
config.status: creating src/SGA/SuffixTools/Makefile
config.status: creating src/SGA/SGA/Makefile
config.status: creating src/svaba/Makefile
config.status: creating config.h
config.status: executing depfiles commands
Christoss-MacBook:svaba christos$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
Making all in SeqLib/fermi-lite
gcc -c -g -Wall -O2 -Wno-unused-function kthread.c -o kthread.o
gcc -c -g -Wall -O2 -Wno-unused-function misc.c -o misc.o
gcc -c -g -Wall -O2 -Wno-unused-function bseq.c -o bseq.o
gcc -c -g -Wall -O2 -Wno-unused-function htab.c -o htab.o
gcc -c -g -Wall -O2 -Wno-unused-function bfc.c -o bfc.o
gcc -c -g -Wall -O2 -Wno-unused-function rle.c -o rle.o
gcc -c -g -Wall -O2 -Wno-unused-function rope.c -o rope.o
gcc -c -g -Wall -O2 -Wno-unused-function mrope.c -o mrope.o
gcc -c -g -Wall -O2 -Wno-unused-function rld0.c -o rld0.o
gcc -c -g -Wall -O2 -Wno-unused-function unitig.c -o unitig.o
gcc -c -g -Wall -O2 -Wno-unused-function mag.c -o mag.o
gcc -c -g -Wall -O2 -Wno-unused-function bubble.c -o bubble.o
gcc -c -g -Wall -O2 -Wno-unused-function f_ksw.c -o f_ksw.o
ar -csru libfml.a kthread.o misc.o bseq.o htab.o bfc.o rle.o rope.o mrope.o rld0.o unitig.o mag.o bubble.o f_ksw.o
gcc -c -g -Wall -O2 -Wno-unused-function example.c -o example.o
gcc -g -Wall -O2 -Wno-unused-function libfml.a example.o -o fml-asm -L. -lfml -lm -lz -lpthread
Making all in SeqLib/htslib
echo '/* Empty config.h generated by Makefile /' > config.h
gcc -g -Wall -O2 -I. -c -o kfunc.o kfunc.c
gcc -g -Wall -O2 -I. -c -o knetfile.o knetfile.c
gcc -g -Wall -O2 -I. -c -o kstring.o kstring.c
gcc -g -Wall -O2 -I. -c -o bgzf.o bgzf.c
gcc -g -Wall -O2 -I. -c -o errmod.o errmod.c
gcc -g -Wall -O2 -I. -c -o faidx.o faidx.c
gcc -g -Wall -O2 -I. -c -o hfile.o hfile.c
gcc -g -Wall -O2 -I. -c -o hfile_net.o hfile_net.c
echo '#define HTS_VERSION "1.3.2-147-g293a426"' > version.h
gcc -g -Wall -O2 -I. -c -o hts.o hts.c
hts.c:51:5: warning: unused function 'ks_getc' [-Wunused-function]
KSTREAM_INIT2(, BGZF
, bgzf_read, 65536)
^
./htslib/kseq.h:152:2: note: expanded from macro 'KSTREAM_INIT2'
__KS_INLINED(__read)
^
./htslib/kseq.h:68:20: note: expanded from macro '__KS_INLINED'
static inline int ks_getc(kstream_t *ks)
^
1 warning generated.
gcc -g -Wall -O2 -I. -c -o md5.o md5.c
gcc -g -Wall -O2 -I. -c -o probaln.o probaln.c
gcc -g -Wall -O2 -I. -c -o realn.o realn.c
gcc -g -Wall -O2 -I. -c -o regidx.o regidx.c
gcc -g -Wall -O2 -I. -c -o sam.o sam.c
gcc -g -Wall -O2 -I. -c -o synced_bcf_reader.o synced_bcf_reader.c
gcc -g -Wall -O2 -I. -c -o vcf_sweep.o vcf_sweep.c
gcc -g -Wall -O2 -I. -c -o tbx.o tbx.c
gcc -g -Wall -O2 -I. -c -o thread_pool.o thread_pool.c
gcc -g -Wall -O2 -I. -c -o vcf.o vcf.c
gcc -g -Wall -O2 -I. -c -o vcfutils.o vcfutils.c
gcc -g -Wall -O2 -I. -c -o cram/cram_codecs.o cram/cram_codecs.c
gcc -g -Wall -O2 -I. -c -o cram/cram_decode.o cram/cram_decode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_encode.o cram/cram_encode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_external.o cram/cram_external.c
gcc -g -Wall -O2 -I. -c -o cram/cram_index.o cram/cram_index.c
gcc -g -Wall -O2 -I. -c -o cram/cram_io.o cram/cram_io.c
gcc -g -Wall -O2 -I. -c -o cram/cram_samtools.o cram/cram_samtools.c
gcc -g -Wall -O2 -I. -c -o cram/cram_stats.o cram/cram_stats.c
gcc -g -Wall -O2 -I. -c -o cram/files.o cram/files.c
gcc -g -Wall -O2 -I. -c -o cram/mFILE.o cram/mFILE.c
gcc -g -Wall -O2 -I. -c -o cram/open_trace_file.o cram/open_trace_file.c
gcc -g -Wall -O2 -I. -c -o cram/pooled_alloc.o cram/pooled_alloc.c
gcc -g -Wall -O2 -I. -c -o cram/rANS_static.o cram/rANS_static.c
gcc -g -Wall -O2 -I. -c -o cram/sam_header.o cram/sam_header.c
gcc -g -Wall -O2 -I. -c -o cram/string_alloc.o cram/string_alloc.c
gcc -g -Wall -O2 -I. -c -o cram/vlen.o cram/vlen.c
gcc -g -Wall -O2 -I. -c -o cram/zfio.o cram/zfio.c
ar -rc libhts.a kfunc.o knetfile.o kstring.o bgzf.o errmod.o faidx.o hfile.o hfile_net.o hts.o md5.o probaln.o realn.o regidx.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/files.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/sam_header.o cram/string_alloc.o cram/vlen.o cram/zfio.o
ranlib libhts.a
gcc -dynamiclib -install_name /usr/local/lib/libhts.2.dylib -current_version 1.3.255 -compatibility_version 2 -o libhts.dylib kfunc.o knetfile.o kstring.o bgzf.o errmod.o faidx.o hfile.o hfile_net.o hts.o md5.o probaln.o realn.o regidx.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/files.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/sam_header.o cram/string_alloc.o cram/vlen.o cram/zfio.o -lz
ln -sf libhts.dylib libhts.2.dylib
gcc -g -Wall -O2 -I. -c -o bgzip.o bgzip.c
gcc -pthread -o bgzip bgzip.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o htsfile.o htsfile.c
gcc -pthread -o htsfile htsfile.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o tabix.o tabix.c
gcc -pthread -o tabix tabix.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/fieldarith.o test/fieldarith.c
gcc -pthread -o test/fieldarith test/fieldarith.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/hfile.o test/hfile.c
gcc -pthread -o test/hfile test/hfile.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/sam.o test/sam.c
gcc -pthread -o test/sam test/sam.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-regidx.o test/test-regidx.c
gcc -pthread -o test/test-regidx test/test-regidx.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test_view.o test/test_view.c
gcc -pthread -o test/test_view test/test_view.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-vcf-api.o test/test-vcf-api.c
gcc -pthread -o test/test-vcf-api test/test-vcf-api.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-vcf-sweep.o test/test-vcf-sweep.c
gcc -pthread -o test/test-vcf-sweep test/test-vcf-sweep.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
Making all in SeqLib/bwa
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS utils.c -o utils.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS kthread.c -o kthread.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS ksw.c -o ksw.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwt.c -o bwt.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bntseq.c -o bntseq.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwa.c -o bwa.o
bwa.c:147:18: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter of
type 'int' which may cause truncation of value [-Wabsolute-value]
w = (max_gap + abs(rlen - l_query) + 1) >> 1;
^
bwa.c:147:18: note: use function 'llabs' instead
w = (max_gap + abs(rlen - l_query) + 1) >> 1;
^~~
llabs
bwa.c:149:11: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter of
type 'int' which may cause truncation of value [-Wabsolute-value]
min_w = abs(rlen - l_query) + 3;
^
bwa.c:149:11: note: use function 'llabs' instead
min_w = abs(rlen - l_query) + 3;
^~~
llabs
2 warnings generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem.c -o bwamem.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem_pair.c -o bwamem_pair.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem_extra.c -o bwamem_extra.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS malloc_wrap.c -o malloc_wrap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS is.c -o is.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtindex.c -o bwtindex.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS rope.c -o rope.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS rle.c -o rle.o
ar -csru libbwa.a utils.o kthread.o ksw.o bwt.o bntseq.o bwa.o bwamem.o bwamem_pair.o bwamem_extra.o malloc_wrap.o is.o bwtindex.o rope.o rle.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwashm.c -o bwashm.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwase.c -o bwase.o
bwase.c:181:57: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen ...
^
bwase.c:181:57: note: use function 'llabs' instead
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen ...
^~~
llabs
bwase.c:181:88: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
...seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen - len) * 1.5, n_cigar, &...
^
bwase.c:181:88: note: use function 'llabs' instead
...seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen - len) * 1.5, n_cigar, &...
^~~
llabs
2 warnings generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwaseqio.c -o bwaseqio.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtgap.c -o bwtgap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtaln.c -o bwtaln.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bamlite.c -o bamlite.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwape.c -o bwape.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS kopen.c -o kopen.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS pemerge.c -o pemerge.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS maxk.c -o maxk.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_core.c -o bwtsw2_core.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_main.c -o bwtsw2_main.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_aux.c -o bwtsw2_aux.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwt_lite.c -o bwt_lite.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_chain.c -o bwtsw2_chain.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS fastmap.c -o fastmap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_pair.c -o bwtsw2_pair.o
bwtsw2_pair.c:239:13: warning: using floating point absolute value function 'fabs' when argument is of integer
type [-Wabsolute-value]
diff = fabs(G[0] - G[1]) / (opt->a + opt->b) / ((hits[i]->hits[...
^
bwtsw2_pair.c:239:13: note: use function 'abs' instead
diff = fabs(G[0] - G[1]) / (opt->a + opt->b) / ((hits[i]->hits[...
^~~~
abs
1 warning generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS main.c -o main.o
gcc -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwashm.o bwase.o bwaseqio.o bwtgap.o bwtaln.o bamlite.o bwape.o kopen.o pemerge.o maxk.o bwtsw2_core.o bwtsw2_main.o bwtsw2_aux.o bwt_lite.o bwtsw2_chain.o fastmap.o bwtsw2_pair.o main.o -o bwa -L. -lbwa -lm -lz -lpthread
Making all in SeqLib/src
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-FastqReader.o -MD -MP -MF .deps/libseqlib_a-FastqReader.Tpo -c -o libseqlib_a-FastqReader.o test -f 'FastqReader.cpp' || echo './'FastqReader.cpp
mv -f .deps/libseqlib_a-FastqReader.Tpo .deps/libseqlib_a-FastqReader.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-BFC.o -MD -MP -MF .deps/libseqlib_a-BFC.Tpo -c -o libseqlib_a-BFC.o test -f 'BFC.cpp' || echo './'BFC.cpp
mv -f .deps/libseqlib_a-BFC.Tpo .deps/libseqlib_a-BFC.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-ReadFilter.o -MD -MP -MF .deps/libseqlib_a-ReadFilter.Tpo -c -o libseqlib_a-ReadFilter.o test -f 'ReadFilter.cpp' || echo './'ReadFilter.cpp
mv -f .deps/libseqlib_a-ReadFilter.Tpo .deps/libseqlib_a-ReadFilter.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-SeqPlot.o -MD -MP -MF .deps/libseqlib_a-SeqPlot.Tpo -c -o libseqlib_a-SeqPlot.o test -f 'SeqPlot.cpp' || echo './'SeqPlot.cpp
mv -f .deps/libseqlib_a-SeqPlot.Tpo .deps/libseqlib_a-SeqPlot.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-jsoncpp.o -MD -MP -MF .deps/libseqlib_a-jsoncpp.Tpo -c -o libseqlib_a-jsoncpp.o test -f 'jsoncpp.cpp' || echo './'jsoncpp.cpp
mv -f .deps/libseqlib_a-jsoncpp.Tpo .deps/libseqlib_a-jsoncpp.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-ssw_cpp.o -MD -MP -MF .deps/libseqlib_a-ssw_cpp.Tpo -c -o libseqlib_a-ssw_cpp.o test -f 'ssw_cpp.cpp' || echo './'ssw_cpp.cpp
ssw_cpp.cpp:449:19: warning: unused parameter 'score_matrix_size' [-Wunused-parameter]
const int& score_matrix_size,
^
1 warning generated.
mv -f .deps/libseqlib_a-ssw_cpp.Tpo .deps/libseqlib_a-ssw_cpp.Po
source='ssw.c' object='libseqlib_a-ssw.o' libtool=no
DEPDIR=.deps depmode=none /bin/sh ../../depcomp
/Users/christos/svaba/compile gcc-5.3 -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -c -o libseqlib_a-ssw.o test -f 'ssw.c' || echo './'ssw.c
../../depcomp: line 611: /Users/christos/svaba/compile: No such file or directory
../../depcomp: line 611: exec: /Users/christos/svaba/compile: cannot execute: No such file or directory
make[2]: *** [libseqlib_a-ssw.o] Error 126
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

DP=0, AD>0 and Incorrect Genotypes?

Hi Jeremiah,

I've been seeing a few examples of somatic SVs where AD>0 and DP=0. I saw from issue #29 that AD can be greater than DP since discordant reads spanning the breakpoint are counted towards AD, but not necessarily DP. However, looking in IGV this variant seems to also have non-discordantly mapped reads lining up on the breakpoint, so I'm wondering why DP=0? I'm also curious why the genotype would be 1/1 here, and not 0/1?

screen shot 2018-09-04 at 9 55 59 am

Top track - tumor

Bottom track - normal
screen shot 2018-09-04 at 10 20 22 am
Top track - tumor
Bottom track - normal
screen shot 2018-09-04 at 10 20 38 am

I've also seen some other examples where the genotype doesn't seem to match what I'm seeing in IGV. For example, this variant is called as somatic with genotype of 0/0 in normal and 0/1 in tumor, but in IGV it looks like the genotype should be 1/1 in both the normal and the tumor.

screen shot 2018-09-04 at 10 25 13 am
Top track - tumor
Bottom track - normal
screen shot 2018-09-04 at 10 29 28 am

Thanks in advance!

Query regarding SV calls

Hello,

I ran svaba on few BAM files (similar samples) . I observed that a SV call is made in some samples and not in others, even though the discordant reads can be seen in the BAM files for all samples in IGV. Is there any filtering criteria or cut-off used to decide when a SV call is made and written to the *sv.vcf files? (Mode: tumor only )

I would appreciate your help.

Thanks!

Parameter for limitation of Weired read

Hi Jeremiah,

I am trying to apply SvABA to high depth targeted panel data (~1000X), especially highly fragmented FFPE tissue. Due to the numerous split reads of FFPE, SvABA printed out these messages when it ran.

breaking at 2:11,166,812(-) in window 2:11,166,378-11,167,271() with 401 weird reads. Limit: 400 breaking at 2:11,171,138(-) in window 2:11,170,834-11,171,424() with 401 weird reads. Limit: 400 breaking at 2:11,173,008(+) in window 2:11,172,546-11,173,220(*) with 401 weird reads. Limit: 400
.
.

I could not find the proper parameter to optimize these issues. How I optimize SvABA for FFPE samples with many weird reads? and does SvABA have any problems running without control data?

Thanks

Hyun-Tae

a question in 'alignment.txt'

Hi Walaj,

I have a question regarding interpreting the alignment file.
I am looking at one contig's alignment.

The "Global BP" line show presence of a breakpoint between - strand on 7:145,643,321 and + strand on 7:145,913,731:
Global BP: : 7:145,643,321(-) to 7:145,913,731(+) SPAN 270410

However, the corresponding alignment says the breakpoint is between both + strands:
image

Would you please give me a short description on what those strands mean?

Thank you in advance,

How the -k option affects SV call results

Hi Jeremiah,

I was wondering how the -k option affects SV call results. When specified, will local assembly results or any other stats derived from input data be affected? Or does the option only affect reporting?

In other words, will the results of the following runs be the same?

  1. Run SvABA with -k ${target.bed}
  2. Run SvABA without -k and then extract the SV calls overlapping with ${target.bed}

Thanks!
-Junko

[E::hts_open_format] fail to open file

Hello,

I'm sure this is something small and insignificant but I am trying to run SvABA for the first time and the following command gives me the following error return:

svaba/bin/svaba run -t /mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/ n- /mnt/scratch/users/40057929/MCL*/MCL*/12-2164*/12-2164-01_MRDneg_S1.bam/ -G /mnt/scratch/users/40057929/wg.fa/ -a svaba_Batch1_0309 -p 4

--- Running svaba SV and indel detection on 4 threads ---
--- (inspect *.log for real-time progress updates) ---

[E::hts_open_format] fail to open file '/mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/'
ERROR: Cannot open main bam file: /mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/

I've checked the file path for the main tumour bam file and it is definitely correct. Any other advice? Let me know if you need any other information.

Thanks,
rmurphy49

Chromosome name is 23 assertion error

Hi @walaj. I've encountered a problem when running svaba on non-human samples, namely that the program exits with an assertion error when it finds a breakpoint on chromosome 23:

svaba: BreakPoint.cpp:600: BreakEnd::BreakEnd(const SeqLib::BamRecord&): Assertion `chr_name != "23"' failed.

My samples are dog samples, mapped to CanFam3.1, so they have 38 chromosomes (+ X). What is the quickest way for me to get around this problem?

Thanks,
Kevin

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.