fritzsedlazeck / survivor Goto Github PK
View Code? Open in Web Editor NEWToolset for SV simulation, comparison and filtering
License: MIT License
Toolset for SV simulation, comparison and filtering
License: MIT License
Hi
The R-script which is supposed to visualize the summary output is not working.
Error in Math.factor(t[, 1]) : 'log10' not meaningful for factors
Which is not surprising as it takes the first column for the x-axis and transforms it into log10:
t=read.table(file,header=T)
plot(log10(t[,1]),t[,2],xlab="log10(length(bp))",ylab="# of SVs",type='l',col=cols[1],main=fie)
But the format of the file has changed (at least that's my guess):
pbsv.summary
Len Del Dup Inv INS TRA UNK
0-50bp 0 0 0 0 0 0
50-100bp 1038 0 0 2875 0 0
100-1000bp 1162 0 0 2507 0 0
1000-10000bp 241 0 0 108 0 0
10000+bp 0 0 0 0 0 0
As it is now a string and no number any-more the match function reports an error
000000F_002|arrow | 3616 | TRA002SUR | N | | . | PASS | SUPP=1;SUPP_VEC=01;AVGLEN=100000;SVTYPE=TRA;SVMETHOD=SURVIVORv2;CHR2=000000F_080|arrow;END=778870;STRANDS=+- | GT:LN:DR:ST:TY:CO | ./.:0:0,0:--:NaN:NaN | 1/1:6507834:0,15:+-:TRA:000000F_002|arrow_3616-000000F_080|arrow_778870
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
But venn shows both samples have this event
[jiwu2573@login4 ngmlr-0.2.6]$ head w53w104tow64r13_new_merged.vcf_venn
IdentifierW53tow64r13_new.vcf W104tow64r13_new.vcf Length
TRA002SUR 1 1 100000
Hey there Fritz,
In your newest release there is an issue with dropping of INS calls during the vcftobed conversion. This is related to issue #14 where you were fixing the names of the variants being lifted from VCF to the bedpe step.
Can we get an update where the INS are also represented within the bedpe output? Not sure exactly why there were lost.
Thanks,
Phil
Hi fritz.
I tried to make simulated reads using SURVIVOR simSV
I created SV parameter file as bellow:
$ cat parameter_file
PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 1
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2
and, I run this command:
SURVIVOR simSV ~/human_reference_genome/GRCh38_p12_chr22.fa parameter_file 0.1 0 simulated
This script takes more than 2 days for generating SVs.
Can you explain how SURVIVOR simSV
works and why it takes so long time?
I imaged that just modifying reference sequence doesn't take so long time.
Sorry for bothering you.
Thanks.
harazono
Hi,
This tools is quite good in merging different call sets. I have merged 336 samples into one vcf file effectively. However, there is still one problem for me. In the merged vcf file, it only shows the 0/1, 1/1 ./. genotypes without 0/0. The problem is how to determine the ./. genotypes belong to. Are they belong to 0/0 or they are definitely ./. type ? Do you have some resolutions on this?
Cheers,
Hi Fritz,
Despite this:
convert << ";AVGLEN=";
if(entry->type == 0){
convert << get_avglen(entry->caller_info)*-1;
}else if (entry->type != 3) {
convert << get_avglen(entry->caller_info);
} else {
convert << "0"; // TODO think about it.
}
I am running into occurences of merged deletions that have a positive AVGLEN.
This is with the latest 1.0.5 version.
Hi Fritz,
I came across a strange behavior today, when using SURVIVOR merge. This is the command I used:
SURVIVOR merge input_files 500 1 1 0 0 20 merged.vcf
Meaning that taking the SV type into account is turned on.
My input contains 2 sorted VCF files, and I know that the first one does not contain any SVTYPE=DUP, while the second does.
However, in the output I have 4 merged SVs with support vector 11 and SVTYPE=DUP. In fact from the VCF lines below it is visible that 3 INS and 1 DEL got merged with DUPs from the second set:
chr13 85730064 DUP0022605SUR N <DUP> . PASS SUPP=2;SUPP_VEC=11;AVGLEN=902;SVTYPE=DUP;SVMETHOD=SURVIVORv2;CHR2=chr13;END=85730228;CIPOS=0,164;CIEND=0,0;STRANDS=++ GT:PSV:LN:DR:ST:TY:CO 0/1:NA:1640:0,0:+-:INS:chr13_85730228-chr13_85730228 0/1:010:164:0,0:++:DUP:chr13_85730064-chr13_85730228
chr14 88530275 DUP0025745SUR N <DUP> . PASS SUPP=2;SUPP_VEC=11;AVGLEN=220.5;SVTYPE=DUP;SVMETHOD=SURVIVORv2;CHR2=chr14;END=88530492;CIPOS=0,217;CIEND=0,0;STRANDS=-+ GT:PSV:LN:DR:ST:TY:CO 0/1:NA:224:0,0:+-:INS:chr14_88530492-chr14_88530492 0/1:010:217:0,0:-+:DUP:chr14_88530275-chr14_88530492
chr3 195501349 DUP0060843SUR N <DUP> . PASS SUPP=2;SUPP_VEC=11;AVGLEN=-197;SVTYPE=DUP;SVMETHOD=SURVIVORv2;CHR2=chr3;END=195501705;CIPOS=0,318;CIEND=0,0;STRANDS=-+ GT:PSV:LN:DR:ST:TY:CO 0/1:NA:38:0,0:+-:DEL:chr3_195501667-chr3_195501705 0/1:001:356:0,0:-+:DUP:chr3_195501349-chr3_195501705
chr8 51696645 DUP0085793SUR N <DUP> . PASS SUPP=2;SUPP_VEC=11;AVGLEN=322;SVTYPE=DUP;SVMETHOD=SURVIVORv2;CHR2=chr8;END=51696870;CIPOS=0,225;CIEND=0,0;STRANDS=-+ GT:PSV:LN:DR:ST:TY:CO 1/1:NA:419:0,0:+-:INS:chr8_51696870-chr8_51696870 0/1:011:225:0,0:-+:DUP:chr8_51696645-chr8_51696870
Is there an explanation for this behavior or is it a bug?
Thanks :)
Marc
Hi,
Thank you so much for your effort to maintain this wonderful software.
I'd like to ask how to use ID for SURVIVOR option.
My goal is to convert Assemblytics output (bed) to a vcf file for SURVIVOR_ant, then use SURVIVOR_ant to annotate my structural variation. So, first I tried ./SURVIVOR bedtovcf assemblytics.output.bed DEL output.vcf
to convert bed to vcf, but it changed all variant types into deletion. I tried ALL, All, all, etc., but it showed
Bed file
Type
Output vcf file
repeatedly. I would be grateful if you added other explanations how to use it properly, which part should I change, or what Type
I can use, etc.
Second, I tried to use ./SURVIVOR ID
, so I replaced ID to the number 11: Convert SV calls from Assemblytics: ./SURVIVOR 11
However, it didn't show any informative error message, but showed the same message with typing ./SURVIVOR
. Could you explain how to use ./SURVIVOR
properly?
Yours sincerely,
Thank you.
Jun.
Hi Fritz,
As a follow up on fritzsedlazeck/Sniffles#61, I saw you suggested:
Step 3:
SURVIVOR merge files 1000 1 1 0 0 0 output_merged.vcf
Step 5:
SURVIVOR merge 1000 0 1 0 0 0 output_genotyped_merged.vcf
I assume in step 5 you missed the argument specifying the input vcfs and meant:
SURVIVOR merge files_genotyped 1000 0 1 0 0 0 output_genotyped_merged.vcf
But can you explain why in step 3 you recommend 1
for "Minimum number of supporting caller" and 0
in step 5? I'd expect both of them to be "1".
Cheers,
Wouter
Extend the info file for strand information if provided by any caller. Plus report it in the caller specific column.
Hello,
I used the SURVIVOR vcftobed function like so (with filenames removed)
./SURVIVOR vcftobed <input.vcf> 0 -1 <output.bed>
the output.bed file reported variants as single bases which did not account for the AVGLEN values in the INFO column of the input.vcf file. Is there a way to get vcftobed to use that length information when reporting start and end values in the output bed file?
Any help is greatly appreciated. Thank you
Alberto
Hi @fritzsedlazeck ,
I've noticed survivor usage in some recent publications analyzes ins/del at sizes 50 and higher -- note that the manta default is actually off by one from this value. If you haven't reconfigured it, Manta results will be 51 bases and higher, per the configuration file setting here:
This will have a minor impact overall, but I think it's worth noting/correcting in future if this is not already configured in the SURVIVOR pipeline.
Thanks! -c
Hi,
I am using the merge option to merge my SV vcf files. However, there are two sections that I do not quite understand in using SURVIVOR merge"SURVIVOR merge sample_files 1000 2 1 1 0 30 sample_merged.vcf".
1.What does the 1000 means? You explained it as maximum allowed distance of 1kb. Does that mean the start position of the two SV sites could be 1kb distance? For example, one SV start position is chr1:100 and the other SV start position is chr1:1100 , and they can be merged as one sv site if they are the same SV type and on the same strand.
2. What does the 0 mean? You explained it as Estimate distance based on the size of SV. What distance does it estimate?
Cheers
SURVIVOR simSV
To simulate SV:
1: Reference fasta file
2: Parameter file
3: SNP mutations frequency (0-1)
4: 0= simulated reads; 1= real reads
5: output prefix
Hi Fritz,
Could you please tell me what is the 4th parameter? My aim is to introduce svs on given genome but I really did not understand 4th parameter. This step only creates an altered ref with given parameter file as expected but I don't understand what does 4th parameter.
Thanks,
Mehmet
Hi,
First, SURVIVOR is really a very good tool. I used it for merging structure variations of multi samples. But we also want to define the reciprocal overlap of SVs when we merge data. I wonder is there any chance you consider add this option for SURVIVOR ?
Best,
Shuo
Hi,
Thanks for the solving the problem of scanreads funciton yesterday.
However, I have the similar problem with simreads module:
/home/wyzhang/opt/biosoftware/ONT_Analysis_Tools/SURVIVOR/Debug/SURVIVOR simreads ../../1.Deletion_Simulation/mm10_Chr1_Homo_DEL.fa ./ONT_error.txt 4 mm10_Chr1_Homo_DEL_ONT
I still got the help info on the screen:
No parameters provided:
1: Reference fasta file
2: error profile file (see scan)
3: Coverage
4: output prefix
I think I have given the program all the neccessary information, but it did not work again.
Best,
Wenyu
Thank you for developing such a great tool !
We use SURVIVOR to merge Manta calls across multiple samples and we love how simple it is to run.
We have a small enhancement request. Would it be possible to include the actual sample names in the output multi-sample VCF ?
Basically, we would like to have the sample name (extracted from the original VCF) to appear in the corresponding header column of the output multi-sample VCF.
We typically build a list of absolute paths to the input VCF files and we would like just the sample name, instead of the long absolute path, to be listed in the output file after merging.
Hi Fritz,
I want to compare several SV callers which use short read data including (pindel, delly and lumpy) to see their accuracy. I have used SURVIVOR to introduce artifical SVs, but I couldnt find any parameter to simulate ngs reads from simulated genome, I have only found long reads parameter.
Do I miss something or survivor only implemented to simulate long reads?
Hi,
I wonder the merge strategy of SURVIVOR. For different SVs, DEL, INS, DUP and others, SURVIOR merge them together or seperately?
Best,
Shuo
Hello,
is it possible to filter out SV's from an input vcf file based on the proportion of overlap with regions in a bedpe file (similar to the -f option in bedtools intersect). I attempted to filter out SV's whose intervals had a 50% overlap with SV's in an exclusion bed file with the following command:
SURVIVOR filter <input.vcf> <remove.bed> 0.5 -1 0.01 10 <out.vcf>
However the out.vcf file did not have the expected output; there are SV's in out.vcf that do intersect with intervals with remove.bed that were not actually removed.
Does entering a decimal value for min SV size the way i did accomplish what I am intending to do?
Hello,
I've been trying to run LRSIM to generate some simulated human reads, and I've noticed that the run is stalling at the initial SURVIVOR stage. (Ie. stuck there for > 12 hours).
After doing some tests, I suspect that it is due to my reference FASTA file lacking newlines for a given sequence.
Ex. for e. coli tests:
SURVIVOR 0 ./ecoli.fa ./test1.hap.parameter 0 ./test1.hap 1000
completes in under a second (fasta has 80 residues per line)
while
SURVIVOR 0 ./ecoli_noNewlines.fa ./test1.hap.parameter 0 ./test1.hap 1000
stalls, and I aborted it after ~4 minutes (fasta has all residues for a sequence on one line).
I'm just wondering if that is a known formatting requirement for running SURVIVOR?
Thanks!
Lauren
The output of SURVIVOR merge is not always sorted by chrom,POS when the input files are.
For a test see https://gist.github.com/amwenger/f3cfcfa2e144a7bea7623e520cb982fa.
$ SURVIVOR 2>&1 | head -n2
Program: SURVIVOR (Tools for Structural Variations in the VCF format)
Version: 1.0.5
$ SURVIVOR merge sort-error.fofn 500 1 1 0 0 30 merged.vcf
# input file is sorted
$ grep -vE "^#" sort-error.vcf | cut -f1-3
1 811404 var1
1 811404 var2
1 811680 var3
1 811680 var4
# output file is not sorted
$ grep -vE "^#" merged.vcf | cut -f1-3
1 811680 var3;var4
1 811404 var1;var2
I tried to merge a couple of vcf files with the following command:
SURVIVOR merge sample_files 100 1 0 0 0 30 survivor_merge.vcf
In the output vcf file there are those lines for example:
22 18502316 INV00936SUR N <INV> . PASS SUPP=8;SUPP_VEC=00001111111100;AVGLEN=690.1;SVTYPE=INV;SVMETHOD=SURVIVORv2;CHR2=22;END=18503011;CIPOS=0,6;CIEND=-5,0;STRANDS=++ GT:PSV:LN:DR:ST:TY:CO ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN 0/0:NA:688:0,7:++:INV:22_18502319-22_18503007 0/0:NA:689:0,3:++:INV:22_18502319-22_18503008 0/1:NA:694:0,0:++:INV,INV:22_18502317-22_18503011,22_18502316-22_18503009 0/1:NA:694:0,0:++:INV,INV:22_18502317-22_18503011,22_18502316-22_18503009 0/1:NA:695:0,0:++:INV:22_18502316-22_18503011 0/1:NA:695:0,0:++:INV:22_18502316-22_18503011 0/1:NA:686:0,0:++:INV,INV:22_18502319-22_18503005,22_18502322-22_18503005 0/1:NA:686:0,0:++:INV,INV:22_18502322-22_18503004,22_18502319-22_18503005 0/1:NA:687:0,20:++:INV:22_18502319-22_18503006 0/1:NA:687:0,14:++:INV:22_18502319-22_18503006 ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN
22 18502316 INV00938SUR N <INV> . PASS SUPP=4;SUPP_VEC=00000000100111;AVGLEN=641.25;SVTYPE=INV;SVMETHOD=SURVIVORv2;CHR2=22;END=18503009;CIPOS=0,75;CIEND=0,0;STRANDS=++ GT:PSV:LN:DR:ST:TY:CO ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN 0/1:NA:559:0,0:++:INV:22_18502391-22_18502950 ./.:NaN:0:0,0:--:NaN:NaN ./.:NaN:0:0,0:--:NaN:NaN 1/1:NA:620:0,2:--:INV:22_18502387-22_18503007 1/1:NA:693:0,0:++:INV:22_18502316-22_18503009 1/1:NA:693:0,0:++:INV:22_18502316-22_18503009```
Can you explain why those SV's are not merged into one?
At first, I skipped past this program because I thought it was just another SV detection tool for short reads. I heard from somewhere else that this tool does some format interchange and, going through the README now, became interested.
I strongly suggest that you change the repository description and the top of the README to better represent what this program can do.
Hi Fritz,
I used SURVIVOR simsv function with parameter file to introduce SVs in given genome.
For deletions and inversions I did not have any problem based on length. I introduced 1000 svs including indels duplications and inversions. Length of duplications are always greater than DUPLICATION_maximum_length. Could you please tell me how to solve this?
Best,
Mehmet
Hi,
I'm merging two VCF files from NanoSV using
SURVIVOR merge input_files 1000 1 0 0 0 10 TUMOR_NORMAL_merged.vcf
I found the merged VCF files with SVTYPE:INV added, although the SVTYPE is BND in original VCF files. Please see the following example:
TUMOR.vcf
chr1 | 179947 | 36 | T | T[chr1:181453[ | 45 | GAP;MapQual | SVTYPE=BND;SVMETHOD=NanoSV_v1.1.6;CIPOS=0,0;CIEND=0,0;SVLEN=1506;RT=0,1,0;GAP=997;MAPQ=60,60;PID=0.975,0.964;PLENGTH=0.536,0.412;RLENGTH=19764;DEPTHPVAL=0.558 | GT:DR:DV:GQ:HR:PL | 0/1:2,1:1,1:15:0,0:45,0,15
NORMAL.vcf
chr1 | 180588 | 1344 | C | C[chr1:181422[ | 60 | GAP;MapQual | SVTYPE=BND;SVMETHOD=NanoSV_v1.1.6;CIPOS=0,0;CIEND=0,0;SVLEN=834;RT=0,1,0;GAP=556;MAPQ=60,60;PID=0.941,0.932;PLENGTH=0.324,0.553;RLENGTH=4834;DEPTHPVAL=0.289 | GT:DR:DV:GQ:HR:PL | 1/1:0,0:1,1:6:0,0:60,6,0
TUMOR_NORMAL_merged.vcf
chr1 | 180588 | INV004SUR | N | | . | PASS | SUPP=2;SUPP_VEC=11;AVGLEN=1170;SVTYPE=INV;SVMETHOD=SURVIVORv2;CHR2=chr1;END=181422;CIPOS=-641,0;CIEND=-641,0;STRANDS=++ | GT:PSV:LN:DR:ST:TY:CO | 0/1:NA:1506:0,0:++:INV:chr1_179947-chr1_181453 | 1/1:NA:834:0,0:++:INV:chr1_180588-chr1_181422
ALL SVs with SVTYPE=BND in original VCF files are added as INV, except those with breakend in different chromosomes. How SURVIVOR infer the SVTYPE?
If I want to keep the SVTYPE in original VCF file, what should I do?
Thank you very much!
Hello,
I am trying to use survivor to generated simulated data using existing error profile NA12878_nano_error_profile_bwa.txt. Below is my command.
However the bed file is empty despite running vcf2bed again and the vcf file is huge. Could you please clarify if I am using this in the right way?
0 simulated1.bed 3.0G simulated1.fasta 0 simulated1.insertions.fa 31G simulated1.vcf
0 simulated1.bed 3.0G simulated1.fasta 0 simulated1.insertions.fa 31G simulated1.vcf
Thank you,
Ram
Hi,
I ran simSV
with SURVIVOR simSV $REFERENCE parameter_file 0 1 test
but got the following error:
# Chrs passed size threshold:24
generate SV
Terminate program as it could not find a non overlapping region
The reference I used was hg38 and the parameter file was like:
PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 50
DUPLICATION_maximum_length: 1000000
DUPLICATION_number: 2500
INDEL_minimum_length: 50
INDEL_maximum_length: 1000000
INDEL_number: 2500
TRANSLOCATION_minimum_length: 50
TRANSLOCATION_maximum_length: 1000000
TRANSLOCATION_number: 2500
INVERSION_minimum_length: 50
INVERSION_maximum_length: 1000000
INVERSION_number: 2500
INV_del_minimum_length: 50
INV_del_maximum_length: 1000000
INV_del_number: 2500
INV_dup_minimum_length: 50
INV_dup_maximum_length: 1000000
INV_dup_number: 2500
Do you know how this happened? Thanks in advance!
Hi,
I want to merge my vcf files using 'SURVIVOR merge 925.gt.vcf 924.gt.vcf 1000 1 1 1 50 merge.vcf'
But it has an error showing "VCF Parser: could not open file: ##fileformat=VCFv4.2"
How can I resolve this problem?
Cheers.
Hi,
I am trying to use the latest version of SURVIVOR (v1.02) to simulate ONT reads based on "true" alignment data. The first step is to generate error file from the alignment data. Using the command line suggested, I cannot get the program running, however:
samtools view ./MC_All_Albacore_trim_filtered_BWA.bam | ./SURVIVOR 2 scan 10000 error.txt
_Program: SURVIVOR (Tools for Structural Variations in the VCF format)
Version:
Usage: SURVIVOR [options]
Commands:
-- Simulation/ Evaluation
simSV Simulates SVs and SNPs on a reference genome.
scanreads Obtain error profiles form mapped reads for simulation.
simreads Simulates long reads (Pacio or ONT).
eval Evaluates a VCF file after SV calling over simulated data.
-- Comparison/filtering
merge Compare or merge VCF files to generate a consensus or multi sample vcf files.
filter Filter a vcf file based on size and/or regions to ignore
stats Report multipe stats over a VCF file
compMUMMer Annotates a VCF file with the breakpoints found with MUMMer (Show-diff).
-- Conversion
bincov Bins coverage vector to a bed file to filter SVs in low MQ regions
vcftobed Converts a VCF file to a bed file
bedtovcf Converts a bed file to a VCF file
smaptovcf Converts the smap file to a VCF file (beta version)_
As you can see, with the command line above, I get the "help" information on the screen, and did not have anything to output.
Also tried:
samtools view ./MC_All_Albacore_trim_filtered_BWA.bam | ./SURVIVOR scanreads 1000 1 error.txt
But still did not work.
Could you help to take a look at what is going on there?
Thanks!
Hi, Friz
First, thank you for developing such a great tools to evaluate SV callers.
I tried SURVIVOR scanreads
and it generated a file as bellow.
samtools view ~/20180705_aligner_and_SVcaller_evaluation/minialign/chr22.bam | head -5000 |SURVIVOR scanreads 1000 error.txt
Pos P(stop) P(match) P(mismatch) P(ins)] P(del)
I aligned NA12878 Nanopore(red 3) reads to GRCH38 with minialign,
bam file is as bellow:
$ samtools view ~/20180705_aligner_and_SVcaller_evaluation/minialign/chr22.bam | head -1
4730cf64-110d-4363-a502-2b940dd91a6d_Basecall_Alignment_template 256 CM000684.2 10510001 0 2730H41M1D35M1D5M1D1M1D18M1D18M1I1M1D6M1D2M1D52M1D64M2I4M2D33M2D27M1D10M1D2M1D8M2D2M1D2M2D1M7D35M2D41M1I30M3D1M1D2M1D11M2D12M1D16M5D10M1D39M3D1M1D9M1D18M2D30M1D12M2D25M1D4M1D35M1D24M1D11M9D17M1D5M1D4M1I2M1D3M2D1M1D4M1D17M1D1M1D2M1D7M3D6M1D7M1D18M1D33M3D8M1D6M2D5M1D4M2D22M1I20M1D7M1D50M1D15M1I25M1I3M3D44M1D12M3D16M1D7M1D11M1D21M2D1M1D5M1D35M1I1M1D12M2D5M2D4M1D23M1I3M1I2M2D21M1D5M2D12M1I1M1I22M1D6M1D17M4D2M1D2M1I29M1D9M1D6M1D1M1D1M1D7M1D5M1D4M1I1M1I38M1D4M1D2M4D5M1D3M1D1M1D32M1D9M1D9M1D9M2D2M1D9M9935H * 0 0 GAATTCTTGTGTTTATATAATAAGATGTCCTATAATTTCTGTTGGAATATAAAATCAGCAACTAATATGTATTTTCAAGCATTAAATACAGAGTGCTAAGTACTTCACTGTGAAATGTGATCATATAAAACATAATAATTATACTGGATTATTTTTAAATGGGCTGTCTAACATTATATTAAAGGTTTCATCAGTAATTCATTATATCAAAATGCTCCAGGCCAGGCGTGGTGGCTTATGCCTGTATAATCAGCACTTTGGGAGGTCGAAGTGGGCGGATCGCGAGGTCAGGAGATGGAGACTAGCCTGGCAACATGATGAACCCGTGTCATATTAAAAATTAGCTGGGTGTGGTGGTGGGCAGCTGTAATCAGCTAATCAGGAGGCTGAGGCAGAAGAATTGCTTGAACGATGGAAGGCAGAGTTTACAGTGTGCCAAGATACACCTACTCCAGCGGGTGACAGAGCAGACGCAGTCTCAAGGAAAAAGCTCGAAAATGTTTGCTTATTTTGGTAAAATTATTCATTGACTATATAAATCAACAAACTGTCCATATTTCATTTTGAAATTACATATTAAGGATCTAAAACAAGGATAAAATATGCAAAAATATTTGCATTTCTGGATAAGTGGTTCCTGATTTAACTCCTTGTTGACACTGAATCTGTTTAAATTAAACAAGGTGCTGATAGACAAAAGACAATTCAAAATAAAATTGAAGGTAAATCCAAACACAGAATAAAATTTCCTCTGACGGTAAAGAGATCATACAAAATCAAATTTATATAGTCTCTTTCAAAGGGTCATAAAACCAATCAGTTAATAGTTGTTTATGTGAAAAAGGCTGTAAAGAAAATAATGACACCAAAATCCCTTTTTAAGTTCAAAGAGTCATAAGTTTCAAAAATGTAATTCAGATAAGAAATTAATGTCAGCTACATTCTTCCACAGTAAGGTTTGCATCAACTTTTTCAGCTAATAGCTTTAGCATTCGACTAAACTTGTTTCAACCATTATTTTACTAGGTGTTATAACAGGGTATACATTCTTTCATTTTCTTCATTTATTAATGGAATATTTCTAAAAGACATTTCCTCCCAAATGTGTAGTTACACAAAACACAGTTTGGACAGGAAAAACAAGATAAATGAACAAATGCTTGATTTATGCCTCCCACTTTTCACAGTAATGAATTAATTAGCCAGCAAACTTTAATAGTGATTAAAGCATTTCCCATAATAATAAAATTTTAAGTTTTTGTAAGTAGTGATAATAATAATTTGTCCAGTCTGTAGTACCTATAACCAGCCAGGATCCATTCTCAACAATGTGGGAAAGGCCAGTTTAAATTCTGTCCAAGGATACCAGCAAGATGATGAAATAGAAAGCCTCAAAGCCCTCACCTCTCAAGGACACCAGCTCGATAATAATACACAGATAAATCCCTTCGTGAAAGTTCAGGGAGAATTTAGCTCTGGATT *
Hello,
I'm trying to use the SURVIVOR filter function to remove SV's from a vcf file. Here is the command I am using with the file names removed:
SURVIVOR filter <input.vcf> <remove.bed> 1 1000 0.01 10 <out.vcf>
I entered dummy values for min and max sv size and just used the values from the example usage on your github for min allele frequency and min number of reads. I am not entirely sure how to determine what values to input for these parameters.
The command executes successfully and produces an output but I do receive this warning/error message:
Warning: Max size threshold set, TRA wont be reported as their size cannot be assesst.
Additionally the <out.vcf> file has all SV's removed and only contains the header. There are 131704 SV's in my <input.vcf> and I receive the message SVs ignored: 131704
resulting in an empty output file.
Is this due to an incorrect use of parameters and if so how can I correct these settings? Any help is greatly appreciated.
Best,
Alberto
Hello,
First off, I love the tool and it is very useful to our SV analysis for merging call sets between individuals.
One question with regards to Command #5:
Is there some way to control reciprocal overlap? I know there is a parameter that controls "within ## BP of each of each other" for merging SVs, but I'm wondering how the logic behind it works.
From your paper:
"Two SVs were defined as overlapping if they occur on the same chromosome, their start and stop coordinates were within 1 kb,"
Is it as such:
SV1: chr1 1000 3000
SV2: chr1 1500 3500
Do you then compute: ( | SV1_start - SV2_start | + | SV1_end - SV2_end | ) ≤ 1000bp
( | 1000 - 1500 | + | 3000 - 3500 | ) = (500 + 500) ≥ 1000, so you count this as the same variant?
Or is it: | SV1_start - SV2_start | ≤ 1000 AND | SV1_end - SV2_end | ≤ 1000, then you count this variant.
Thanks for your help and this awesome tool!
-Phil Richmond @Phillip-a-richmond
Hi Fritz,
We sequenced a NA12878 run and used parliament2 to generate SVs. We tried to compare the parliament2 (>500bp) DEL results with Personalis' DELs, using "SURVIVOR merge", with parameters: 1000 2 1 1 0 500. We used the parliament2 combined vcf as well as the MANTA calls from the combined vcf (simply by grep MANTA). Here is what confused us:
We got more merged calls from the MANTA subset than the combined vcf. An example:
Personalis:
chr1 59582965 DEL00BED N . LowQual IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chr1;END=59583989;SVLEN=1024;PE=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV
Combined vcf:
chr1 59583072 DEL0086690SUR N . PASS SUPP=2;AVGLEN=971.5;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583990;STRANDS=+-;CALLERS=BREAKDANCER,LUMPY GT:SP 1/1:BREAKDANCER,LUMPY
chr1 59583302 DEL0086689SUR N . PASS SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583989;STRANDS=+-;CALLERS=MANTA GT:SP 1/1:MANTA
MANTA subset:
chr1 59583302 DEL0086689SUR N . PASS SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583989;STRANDS=+-;CALLERS=MANTA GT:SP 1/1:MANTA
In the above example, the combined vcf failed to merge the DEL with the Personalis' but the MANTA subset merged the DEL successfully.
Suspecting that the problem might be due to the overlapping calls in the combined vcf, we merged combined vcf against itself. In the above example, the two DELs from combined vcf indeed merged into one. However, the DEL from self merged combine-vcf still fail to merge with Personalis'.
chr1 59583072 DEL0087SUR N . PASS SUPP=2;SUPP_VEC=11;AVGLEN=971;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583990;CIPOS=0,230;CIEND=-1,0;STRANDS=+- GT:PSV:LN:DR:ST:TY:CO 1/1:2,:971:0,0:+-:DEL,DEL:chr1_59583072-1_59583990,chr1_59583302-1_59583989 1/1:2,:971:0,0:+-:DEL,DEL:chr1_59583072-1_59583990,chr1_59583302-1_59583989
The above call still failed to merge with Personalis.
Your thought on this is greatly appreciated. Thanks much.
Dear valuable authors,
Please consider adding comments describing function of each function, and possibly a short document describing the architecture and overall structure of the code.
This way, future researchers could more easily understand and improve your code.
Of course documentation has lower priority and you can postpone it. But please consider it in the last step.
Regards
Hi Fritz,
I tried to simulate the reads using simSV and notice that the par file has INDEL but no separate DEL or INS. I am wondering if there is a way to directly simulate large deletions, as many benchmark data are deletions. Thanks much!
Yue
Hello Fritz,
I tried to use SURVIVOR to simulated a few number of SVs in one of the yeast reference genomes. The command line was:
SURVIVOR 1 yeast.fa par_file 0 simulated
Some lines of the bed output is as below:
NC_001139.9 239105 NC_001139.9 246078 DUP
NC_001137.3 304464 NC_001137.3 305703 DEL
NC_001134.8 469961 NC_001134.8 474862 INS
NC_001141.2 186459 NC_001141.2 189325 INV
NC_001137.3 338964 NC_001134.8 210746 TRA
NC_001137.3 339552 NC_001134.8 211334 TRA
I think I understand the insertion, deletion, and inversion lines. For instance, there is an insertion event from 469,961 to 474,852 on NC_001134.8 chromosome, right?
But I am not quite sure how to interpret the translocation and duplication lines. Appreciate your help in advance!
Simo
Hello,
I've run SURVIVOR on the outputs of CNVnator, LUMPY, and Pindel, and I see that I have a significant number of SVs of type 'NA' in the output (~25%).
For example:
1 43171 NA004SUR N . PASS SUPP=1;SUPP_VEC=010;AVGLEN=295;SVTYPE=NA;SVMETHOD=SURVIVORv2;CHR2=1;END=43466;STRANDS=++ GT:LN:DR:ST:TY:CO ./.:0:0,0:--:NaN:NaN 0/0:295:0,0:++:NA:1_43171-1_43466 ./.:0:0,0:--:NaN:NaN
and
1 563387 NA0014SUR N . PASS SUPP=1;SUPP_VEC=010;AVGLEN=150;SVTYPE=NA;SVMETHOD=SURVIVORv2;CHR2=1;END=563537;STRANDS=++ GT:LN:DR:ST:TY:CO ./.:0:0,0:--:NaN:NaN 0/0:150:0,0:++:NA:1_563387-1_563537 ./.:0:0,0:--:NaN:NaN
What are these? Are they records with conflicting SV types?
Thanks for your help!
Madeline
Add names instead of options and get a better description and parameter passing
Hello Fritz,
Should not Line 141 and 142 be as follows?
cend.first = std::min(caller_info[i]->stops[j] - caller_info[index]->stops[0], cend.first);
cend.second = std::max(caller_info[i]->stops[j] - caller_info[index]->stops[0], cend.second);
SURVIVOR/src/merge_vcf/combine_svs.cpp
Line 141 in 5343702
SURVIVOR/src/merge_vcf/combine_svs.cpp
Line 142 in 5343702
Cheers,
Luca
Hi ,
I recently noticed that when merging two VCFs, all DUPs in my sample are merging into the NA type,
The command I ran is the following,
SURVIVOR merge vcf_files_raw_calls.txt 1000 1 1 -1 -1 -1 merged_SURVIVOR_1kbpdist_typesave.vcf
here's an example of the failed merge type
#merged line in SURVIVOR
chr1 10003 NA000SUR N . PASS SUPP=2;SUPP_VEC=110;AVGLEN=135;SVTYPE=NA;SVMETHOD=SURVIVORv2;CHR2=chr1;END=10199;CIPOS=0,75;CIEND=-49,0;STRANDS=++ GT:PSV:LN:DR:ST:TY:CO ./.:NA:73:0,2:++:NA:chr1_10078-chr1_10150 ./.:NA:197:0,3:++:NA:chr1_10003-chr1_10199 ./.:NaN:0:0,0:--:NaN:NaN
#the two entries that got merged together into this are the following
chr1 10078 TJ10668 N . PASS IMPRECISE;SVMETHOD=picky;END=10150;SVTYPE=DUP;RE=2;RNAMES=m151015_221903_42196_c100828522550000001823180911251523_s1_p0/91653/37610_37635,m150919_214537_42196_c100828492550000001823180911251585_s1_p0/30014/34259_34367;SVLEN=73;CIPOS=-47,47;CIEND=-76,76;NOTE=CIPOS_CIEND;ISVTYPE=TDSR(2) GT ./.
chr1 10003 TJ36088 N . PASS IMPRECISE;SVMETHOD=picky;END=10199;SVTYPE=DUP;RE=3;RNAMES=m150704_162046_42134_c100846082550000001823189212311577_s1_p0/140541/30105_30350,m150530_105730_00118_c100813682550000001823175310291582_s1_p0/88961/25088_25294,m150630_190056_00118_c100855592550000001823186312311594_s1_p0/131968/37141_37217;SVLEN=197;CIPOS=-2,69;CIEND=-1,90;NOTE=CIPOS_CIEND;ISVTYPE=TDSR(3) GT ./.
Please advise,
This will merge all the vcf files specified in sample_files together using a maximum allowed distance of 1kb, as measured pairwise between breakpoints (begin1 vs begin2, end1 vs end2). Furthermore we ask SURVIVOR only to report calls supported by 2 callers and they have to agree on the type (1) and on the strand (1) of the SV
Hi,@fritzsedlazeck,
I didn't understand the strand,can you explain it briefly?
Thanks
I am using SURVIVOR to simulate structural variations, but I am not sure about that the position of each SV is about source genome or generated genome.
I compared the insertion sequences generated by SURVIVOR with generated genome, but very confusing. Some insertion sequences are really there in generated genome, but some others are not.
What happened?
using ./SURVIVOR 10 ./test.tails 20 test.vcf
The input is:
#id chrKey uRef uBreak uMapq dRef dBreak dMapq remainSeq annot numReads numZMWs evidence
0 1_1 1 40015898 254 1 40030111 254 1 DEL 2 2 ->p=i->;<-i=p<-
The output:
1 40015898 DEL00Honey N <DEL> . LowQual IMPRECISE;SVTYPE=DEL;SVMETHOD=Honey_tails;CHR2=1;END=40030111;SVLEN=14213;PE=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV
Is this value DEL00Honey
is the default behavior?
also the coverage is not included, what is PE filed why it is always 1?
Thanks,
I'm getting a compiling error when trying to install the latest release.
Below the relevant log lines:
Building target: SURVIVOR
Invoking: Cross G++ Linker
g++ -o "SURVIVOR" ./src/vcfs/Annotate_vcf.o ./src/vcfs/Combine_3_VCF.o ./src/vcfs/Compoverlap_VCF.o ./src/vcfs/Detect_nested.o ./src/vcfs/Filter_vcf.o ./src/vcfs/Generate_distMat.o ./src/vcfs/Merge_VCF.o ./src/snp_overlap/Overlap_snps.o ./src/simulator/Error_scanner.o ./src/simulator/Eval_vcf.o ./src/simulator/Pac_Simulator.o ./src/simulator/SV_Simulator.o ./src/simulator/Sim_reads.o ./src/simulator/test_cov.o ./src/merge_vcf/IntervallTree.o ./src/merge_vcf/combine_svs.o ./src/convert/ConvertMQ0Bed.o ./src/convert/Convert_Assemblytics.o ./src/convert/Convert_Bionano.o ./src/convert/Convert_Honey_tails.o ./src/convert/Convert_MUMmer.o ./src/convert/Convert_Pindel.o ./src/convert/Convert_VCF_to_BED.o ./src/convert/Convert_hapcut2.o ./src/convert/Process_Coverage.o ./src/convert/Process_Lumpy.o ./src/analysis_sv/GIAB_summary.o ./src/analysis_sv/MT_identifier.o ./src/analysis_sv/MUMmer_overlap.o ./src/analysis_sv/Select_samples.o ./src/analysis_sv/Simplify_SVs.o ./src/analysis_sv/Summ_mat.o ./src/CorrectAllele.o ./src/DetectDif.o ./src/Extract_Seq.o ./src/SURVIVOR.o ./src/Summarize_SV.o
./src/convert/Convert_Assemblytics.o: In functionstd::string::_M_data() const': /home/cog/jvalleinclan/kloosterman/tools/SURVIVOR/Debug/../src/convert/Convert_Assemblytics.cpp:11: multiple definition of
print_header(std::string, _IO_FILE*&)'
./src/vcfs/Compoverlap_VCF.o:/home/cog/jvalleinclan/kloosterman/tools/SURVIVOR/Debug/../src/vcfs/Compoverlap_VCF.cpp:33: first defined here
./src/SURVIVOR.o: In functionofficial_interface(int, char**)': /home/cog/jvalleinclan/kloosterman/tools/SURVIVOR/Debug/../src/SURVIVOR.cpp:248: undefined reference to
parental_phasing(std::string, std::string, std::string)'
collect2: error: ld returned 1 exit status
make: *** [SURVIVOR] Error 1
Is it on my side or is there a mis-definition somewhere?
I get the following errors when installing SURVIVOR:
.
.
.
Finished building: ../src/Summarize_SV.cpp
Building target: SURVIVOR
Invoking: Cross G++ Linker
g++ -o "SURVIVOR" ./src/vcfs/Annotate_vcf.o ./src/vcfs/Combine_3_VCF.o ./src/vcfs/Compoverlap_VCF.o ./src/vcfs/Filter_vcf.o ./src/vcfs/Merge_VCF.o ./src/simulator/Eval_vcf.o ./src/simulator/Pac_Simulator.o ./src/simulator/SV_Simulator.o ./src/merge_vcf/IntervallTree.o ./src/merge_vcf/combine_svs.o ./src/convert/ConvertMQ0Bed.o ./src/convert/Convert_Assemblytics.o ./src/convert/Convert_Bionano.o ./src/convert/Convert_Honey_tails.o ./src/convert/Convert_Pindel.o ./src/convert/Convert_VCF_to_BED.o ./src/convert/Process_Lumpy.o ./src/CorrectAllele.o ./src/DetectDif.o ./src/Extract_Seq.o ./src/SURVIVOR.o ./src/Summarize_SV.o
./src/SURVIVOR.o: In function `main':
/home/sean/SURVIVOR/Debug/../src/SURVIVOR.cpp:292: undefined reference to `summary_giab(std::string, std::string)'
/home/sean/SURVIVOR/Debug/../src/SURVIVOR.cpp:309: undefined reference to `detect_nested(std::string, std::string)'
/home/sean/SURVIVOR/Debug/../src/SURVIVOR.cpp:273: undefined reference to `summarize_badcoverage(std::string, int, int, std::string)'
collect2: error: ld returned 1 exit status
make: *** [SURVIVOR] Error 1
Thanks!
Hello,
I use SURVIVOR merge to merge information from several software.
I wanted to know how DR information was calculated.
For exemple, with Manta this information is true, for Delly this information is false and for Lumpy we have only the information for the variant and not for the reference.
Thank you in Advance,
Regards,
Steven
I think one scenario which might often arise is that we have e.g.
Or alternatively that the assembly and the SV-caller would disagree with some assembly hypothesis and report SVs.
It would be great if an option would exist to define one set as background SVs which should be ignored, or flagged.
So essentially the opposite of "Consensus call from multipe SV calls (vcf files)" .
Cheers
As we discussed during the hackathon, insertions are represented in many different ways by different callers, particularly in how the end and size of the insertion is represented. It would be great to find a way to compare these appropriately.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.