
cutesv's People

Contributors

bnoyvert, corese, meltpinkg, npinter, tjianghit, wdecoster

cutesv's Issues

question about INFO

Hello,
When I run cuteSV on ngmlr.sort.bam, the INFO field of every output record, including the BND records, is IMPRECISE;SVTYPE=BND;RE=1;RNAMES=NULL
IMPRECISE denotes an imprecise structural variant. What is wrong? And why are there no CHR2 and END fields?

supporting reads?

I'm currently trying to replace sniffles in a workflow with cuteSV. One of the more useful features for complex SV confirmation is that sniffles outputs the read_names for the reads that support each SV in the INFO.RNAMES field. Might I request this as a feature?
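
For anyone who wants to consume the supporting read names downstream, a minimal Python sketch of pulling them out of an INFO string might look like the following. It assumes the names sit in a comma-separated `RNAMES` key and that `NULL` means none were reported, as in the records quoted elsewhere in these issues. `parse_info` and `supporting_reads` are hypothetical helper names, not part of cuteSV or sniffles, and the read names in the demo are made up.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def supporting_reads(info: str) -> list:
    """Return the read names in RNAMES, or an empty list for NULL/missing."""
    names = parse_info(info).get("RNAMES", "NULL")
    if names is True or names in ("NULL", "."):
        return []
    return names.split(",")


# Hypothetical INFO strings modelled on the records quoted in these issues.
print(supporting_reads("IMPRECISE;SVTYPE=BND;RE=1;RNAMES=NULL"))
print(supporting_reads("PRECISE;SVTYPE=DUP;RE=2;RNAMES=read_a,read_b"))
```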

execution error

Hello!!

I am trying to run cuteSV with my PacBio yeast data and I get this error:

cuteSV /Users/io/Documents/TB50/SVs/PacBio_alignedReads_pbmm2.sorted.bam /Users/io/Documents/TB50/SVs/Final_SV_calling/reference_genome/S288C_genome.fasta cuteSV_PB.vcf /Users/io/Documents/TB50/SVs/Final_SV_calling/cuteSV/results --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5
2021-09-08 10:15:41,613 [INFO] Running /Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/bin/cuteSV /Users/io/Documents/TB50/SVs/PacBio_alignedReads_pbmm2.sorted.bam /Users/io/Documents/TB50/SVs/Final_SV_calling/reference_genome/S288C_genome.fasta cuteSV_PB.vcf /Users/io/Documents/TB50/SVs/Final_SV_calling/cuteSV/results --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5
2021-09-08 10:15:41,661 [INFO] The total number of chromsomes: 17
Traceback (most recent call last):
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/bin/cuteSV", line 4, in <module>
__import__('pkg_resources').run_script('cuteSV==1.0.11', 'cuteSV')
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1448, in run_script
exec(code, namespace, namespace)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 802, in <module>
run(sys.argv[1:])
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 798, in run
main_ctrl(args, argv)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 627, in main_ctrl
analysis_pools = Pool(processes=int(args.threads))
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/spawn.py", line 183, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'

Any idea??
Thanks!
T
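
The last frames of the traceback show multiprocessing's `spawn` start method (the default on macOS since Python 3.8) failing while it reads `__spec__` from the `__main__` module. One workaround sometimes used for this error is to make sure `__main__` has a `__spec__` attribute before the pool is created. The sketch below only illustrates that idea on a stand-in object; `ensure_main_spec` is a hypothetical helper, it is not part of cuteSV, and it has not been tested against cuteSV itself.

```python
import sys
import types


def ensure_main_spec(main_module=None):
    """Give the module a __spec__ attribute if it lacks one.

    multiprocessing's spawn code reads main_module.__spec__; a value of
    None is enough for that attribute lookup to succeed.
    Returns True if the attribute had to be added.
    """
    if main_module is None:
        main_module = sys.modules["__main__"]
    if hasattr(main_module, "__spec__"):
        return False
    main_module.__spec__ = None
    return True


# Stand-in for a __main__ module that has lost its __spec__ attribute.
fake_main = types.SimpleNamespace(__name__="__main__")
print(ensure_main_spec(fake_main))
# This mirrors the lookup in the failing frame of the traceback.
print(getattr(fake_main.__spec__, "name", None))
```

Another option that may be worth trying is a fresh installation with pip or conda, so that the `cuteSV` launcher is not executed through `pkg_resources.run_script`.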

Add a wiki for SV calling based on diploid assembly alignment

Hello everyone,

A new script has been developed that post-processes the cuteSV callsets typically produced from assembly alignments.
It helps users generate diploid-assembly-based SV callsets.
Evaluations show F1 scores above 95% for both SV presence and genotype.
Please see the wiki for more details.

Best,
Tao

Wrong insertion allele?

Hi,

I think cuteSV might call the variant allele incorrectly for some insertions. I provide an example below. Could you please check whether this is a bug or expected behavior?

I believe the issue is only present in insertions, like the one shown on the attached IGV screenshot. The reference sequence is repeated and there is a 58 base insertion in about half of the reads. Although the insertion appears at different locations I believe it is the same heterozygous variant, and the difference in location is caused by different ways one can put a variant in repeated sequence.

CuteSV calls one SV in this region (the first variant in the attached vcf file):

6 102204264 cuteSV.INS.0 C CAAATTTTAGGTATTGCTTTCCTCCACTGTCTAAAACAGAATCTGGATAATGTATCTT

and it is a 57 base insertion. The problem is that the variant allele from the vcf file above is quite different from the sequence inserted in the reads from which cuteSV called the variant. A few examples of the sequences inserted in reads at different locations (extracted from the bam file using samtools mpileup):

"TACATTTTCCTAATCTACATTTATAATGATTACATTTTATAATATACATTTATAATAGC"
"TTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATTTATAAATGGTTACA"
"TATAAATGGTTACATTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATT"
"ATTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATTTATAAATGGTTAC"

I think the allele called by cuteSV cannot be traced to the read insertions above.

I have seen other examples similar to this one, can provide them if needed.

The bam and vcf files are attached.
The vcf was generated by this command:
cuteSV example.bam human_g1k_v37.fasta example.vcf ./ --genotype
The fasta file is the standard 1000 genomes hg19 reference file.
The version of software is cuteSV 1.0.12 cloned from this repository on 12 January 2022.

Many thanks,
Boris

example
example.zip
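
The idea that one insertion can be placed at different positions inside a repeat can be checked roughly: shifting an insertion through a tandem repeat turns its sequence into a cyclic rotation of itself. The sketch below is a minimal illustration under strong assumptions (equal-length sequences, substitutions only, no indels), which noisy long reads generally violate. `is_rotation` and `best_rotation_identity` are hypothetical helpers and the toy sequences are made up.

```python
def is_rotation(a: str, b: str) -> bool:
    """True if b can be obtained by cyclically shifting a."""
    return len(a) == len(b) and b in a + a


def best_rotation_identity(a: str, b: str) -> float:
    """Highest fraction of matching bases over all cyclic shifts of b.

    Compares equal-length sequences position by position, so it
    tolerates substitutions but not insertions or deletions.
    """
    if len(a) != len(b) or not a:
        return 0.0
    best = 0
    for shift in range(len(b)):
        rotated = b[shift:] + b[:shift]
        matches = sum(x == y for x, y in zip(a, rotated))
        best = max(best, matches)
    return best / len(a)


# Toy sequences, not taken from the attached BAM or VCF.
print(is_rotation("ACGTTT", "TTTACG"))
print(best_rotation_identity("ACGTTT", "TTAACG"))
```

A low best identity between the reported ALT allele and every read-level insertion would support the suspicion that the allele does not come from those reads.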

IndexError when force calling translocations

Hi,

I did the following with cuteSV 1.0.10:

  1. CuteSV SV calling
  2. merging the VCFs with Jasmine
  3. CuteSV force calling with -Ivcf

However, the last step generates the following error:

Traceback (most recent call last):
  File "/home/wdecoster/miniconda3/envs/cutesv/bin/cuteSV", line 4, in <module>
    __import__('pkg_resources').run_script('cuteSV==1.0.10', 'cuteSV')
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 665, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 801, in <module>
    run(sys.argv[1:])
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 797, in run
    main_ctrl(args, argv)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 671, in main_ctrl
    result = force_calling(args.input, args.Ivcf, args.output, temporary_dir,
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/cuteSV/cuteSV_forcecalling.py", line 334, in force_calling
    sv_type, chrom, sv_chr2, pos, sv_end, sv_strand = parse_record(record)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/cuteSV/cuteSV_forcecalling.py", line 73, in parse_record
    end = int(tra_alt.split(':')[1])
IndexError: list index out of range

I am afraid I can't share the VCF or BAM files to reproduce this. Based on the error I suspect it has something to do with translocations, and removing them from the Jasmine-merged file resolves the error. Here is what the first nine columns of the TRA records look like after merging with Jasmine:

chr11   18643255        0_cuteSV.BND.0  .       <TRA>   219.5   PASS    PRECISE;SVTYPE=TRA;RE=23;RNAMES=1d499825-dfb5-45ea-98a5-fe16cf6541f5,88f6b37c-5f2c-4321-9ce9-d7fa63fb1b3c,8f28177b-de81-4ecd-aba7-37263a2f7f05,de9e5e07-2fcf-4677-a7df-c53d3607a2ba,b5d8446a-be7a-41d1-8c66-6069fceae18e,b1816320-0708-44e8-a75c-72f3dd4685f6,d1359d10-8ddd-4ff2-87b5-aa46fac52d87,5172ee36-fe44-4a8a-974c-f624e56b2526,4320724c-87d4-4847-a1a8-6e92cb7d78f3,68cda33f-1af3-4138-88de-af9d9c8da685,10867622-47fd-40d5-8e7f-ac25ae380431,493637b1-03ae-489d-bbb2-632d3cf83369,7979d796-bf1b-4d18-beee-b1f993c2c287,04821c5c-d4da-4dc2-80f9-1442b5ddb72a,6ec55480-0de9-4e0d-b0e3-16b6d843e7bc,6931cc19-5fcd-4b0a-a535-42c48ec7ccaa,ff16f2ba-1beb-4785-a800-2077dcb4e1fb,0520b923-f1d1-4bcc-948f-9011d36607cc,032622cf-9222-43b3-a146-30b7a074c261,a96cca8a-18c7-4286-b116-77ccde9d5a4e,cde75c30-857d-44e5-8d45-148e30614c9f,b3ddb5cd-6054-489e-903d-2608821eefda,8b89de4d-980c-4a60-984a-ffb498abb866;IRIS_PROCESSED=1;IRIS_REFINED=0;CHR2=chr15;END=34376485;STRANDS=--;SVLEN=0;STARTVARIANCE=645871785781648.800000;ENDVARIANCE=1291743571563295.200000;AVG_LEN=0.000000;AVG_START=18643256.792899;AVG_END=34376486.769231;SUPP_VEC_EXT=101000110111110111000111111010011011111111011101110001000101011000010100001010000110000111100100011101110111011100011110101011111101111101100000000010010100100001011111111110111111101110110000110000100111100110110100011101101101110110100001110000111010110001000110111010001111011111101101111010000000001010;IDLIST_EXT=cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteS
V.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0;SUPP_EXT=169;SUPP_VEC=1010001101111101110001111110100110111111110111011100010001010110000101000010100001100001111001000111011101110111000111101010111111011111011000000000100101001000010111111111101111111011101100001100001001111001101101000111011011011101101000011100001110101100010001101110100011110111
11101101111010000000001010;SUPP=169;SVMETHOD=JASMINE;IDLIST=cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cut
eSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0        GT:IS:OT:OS:DV:DR       
chr11   18649472        0_cuteSV.BND.1  .       <TRA>   51.9    PASS    PRECISE;SVTYPE=TRA;RE=14;RNAMES=04821c5c-d4da-4dc2-80f9-1442b5ddb72a,8f28177b-de81-4ecd-aba7-37263a2f7f05,1d499825-dfb5-45ea-98a5-fe16cf6541f5,de9e5e07-2fcf-4677-a7df-c53d3607a2ba,10867622-47fd-40d5-8e7f-ac25ae380431,b5d8446a-be7a-41d1-8c66-6069fceae18e,6931cc19-5fcd-4b0a-a535-42c48ec7ccaa,88f6b37c-5f2c-4321-9ce9-d7fa63fb1b3c,032622cf-9222-43b3-a146-30b7a074c261,cde75c30-857d-44e5-8d45-148e30614c9f,b3ddb5cd-6054-489e-903d-2608821eefda,7979d796-bf1b-4d18-beee-b1f993c2c287,5172ee36-fe44-4a8a-974c-f624e56b2526,4320724c-87d4-4847-a1a8-6e92cb7d78f3;IRIS_PROCESSED=1;IRIS_REFINED=0;CHR2=chr15;END=34376063;STRANDS=++;SVLEN=0;STARTVARIANCE=3.312500;ENDVARIANCE=1611210068452230.000000;AVG_LEN=0.000000;AVG_START=18649474.859813;AVG_END=34376059.794393;SUPP_VEC_EXT=101000010011111111000011111000000000101111011100110000010101011000000100000000000000000100000100011100010111011100001010101011011000111100000000000010010100000001001111110000111010000110010000100000000110100100100000011001000001110100100000000000010010110001000010101010001001010011001101010010000000000010;IDLIST_EXT=cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,c
uteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1;SUPP_EXT=107;SUPP_VEC=101000010011111111000011111000000000101111011100110000010101011000000100000000000000000100000100011100010111011100001010101011011000111100000000000010010100000001001111110000111010000110010000100000000110100100100000011001000001110100100000000000010010110001000010101010001001010011001101010010000000000010;SUPP=107;SVMETHOD=JASMINE;IDLIST=cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,c
uteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1       GT:IS:OT:OS:DV:DR

Hope that helps!

Cheers,
Wouter
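
The failing line, `tra_alt.split(':')[1]`, expects a colon in the ALT column, while the merged records above carry the symbolic allele `<TRA>` and keep the mate locus in `CHR2` and `END`. A more tolerant parser could fall back to those INFO keys. The following is only a sketch of that idea, not cuteSV's actual code; `mate_locus` and the regular expression are hypothetical.

```python
import re

# Matches the mate locus inside breakend ALT strings such as N[chr15:34376485[
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def mate_locus(alt: str, info: dict):
    """Return (chrom, pos) of the second breakpoint.

    Uses breakend notation in ALT when present; otherwise falls back to
    the CHR2 and END keys that the Jasmine-merged records carry in INFO.
    """
    match = BND_MATE.search(alt)
    if match:
        return match.group(1), int(match.group(2))
    if "CHR2" in info and "END" in info:
        return info["CHR2"], int(info["END"])
    raise ValueError(f"cannot locate mate breakpoint for ALT={alt!r}")


print(mate_locus("N[chr15:34376485[", {}))
print(mate_locus("<TRA>", {"CHR2": "chr15", "END": "34376485"}))
```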

No SV found when aligning pacbio CLR reads to reference

Hi,

I ran cuteSV on PacBio CLR reads from maize, but the output VCF file contains only the header. When I checked the output directory by adding --retain_work_dir, there were only several empty sigs files. The signatures folder contains many bed files with lines starting with "DEL/DUP/INS/INV/TRA". There were no warnings or errors during the whole run. The accession we sequenced is an inbred line, so heterozygosity should not be a problem. Is something wrong such that this information was not passed on to the final VCF file? Or is it because the maize genome is highly repetitive, or because cuteSV does not work with plants? I would appreciate any suggestions. Many thanks!

Best,
Hui


I installed cuteSV via conda, mapped the CLR reads to the genome, and called variants with the following commands:

minimap2 -t${THREADS} -ax map-pb input/${REF} input/${READS} | samtools sort -@${THREADS} - > output/${SORTEDBAM}
cuteSV --threads ${THREADS} --max_cluster_bias_INS  100 \
    --retain_work_dir \
    --diff_ratio_merging_INS    0.3 \
    --max_cluster_bias_DEL    200 \
    --diff_ratio_merging_DEL    0.5 \
    output/${SORTEDBAM} input/${REF} output/${VCF} output/${VCFDIR}

The output VCF table was like the following:

DEL     chr1   670     30      m54065_170710_194541/48432050/101_9842
DEL     chr1    9756    30      m54065_170714_031530/22086563/0_31613
DEL     chr1    16544   71      m54167_170709_181157/53215835/44719_58683
DEL     chr1    14267   153     m54065_170713_170545/37618114/0_13180

The log file:

2021-05-19 14:03:46,328 [INFO] Running cuteSV --threads 20 --max_cluster_bias_INS 100 --retain_work_dir --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 output/AA.pacbio_clr.vs.BB.genome.bam input/BB.genome output/AA.pacbio_clr.vs.BB.genome.bam.vcf output/AA.pacbio_clr.vs.BB.genome.bam.cuteSV
2021-05-19 14:03:46,722 [INFO] The total number of chromsomes: 265
2021-05-19 14:04:12,055 [INFO] Finished 1:180000000-190000000.
2021-05-19 14:04:13,565 [INFO] Finished 1:110000000-120000000.
2021-05-19 14:04:14,647 [INFO] Finished 1:40000000-50000000.
2021-05-19 14:04:15,176 [INFO] Finished 1:20000000-30000000.
2021-05-19 14:04:15,968 [INFO] Finished 1:160000000-170000000.
2021-05-19 14:04:16,573 [INFO] Finished 1:60000000-70000000.
...
2021-05-19 14:10:09,849 [INFO] Rebuilding signatures of structural variants.
2021-05-19 14:10:13,971 [INFO] Clustering structural variants.
2021-05-19 14:10:14,137 [INFO] Writing to your output file.
2021-05-19 14:10:14,138 [INFO] Loading reference genome...
2021-05-19 14:10:27,169 [INFO] Finished in 400.84 seconds.

consulting about parameter settings on low depth ONT data

Hi,

I used the pipeline listed on https://github.com/tjiangHIT/sv-benchmark to call SVs on ~8X ONT data of HG002, and evaluated the calls with truvari against the GIAB Tier1 truth set.

Although the precision was high (~0.94), only ~6000 INSs and DELs were called in total, and the recall was relatively low (~0.1).

I followed the pipeline on that page, except that I used the current version of cuteSV (1.0.10), used the original 27G 8X HG002 sequencing data (https://ftp.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/GM24385_1.fastq.gz) as input, ran minimap without the MD tag, and ran cuteSV with the hs37d5 reference.

I noticed that the cuteSV version in this benchmark was 1.0.3, so I am wondering whether the internal parameter settings changed between versions in a way that lowers performance at low depth. If so, how can I adjust the parameters to improve calling on low-depth ONT data?

Otherwise, was there something wrong with my workflow that dampened the results? Maybe because of the MD tag and the input reference genome?
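
To put the two reported figures into a single number, the F1 score is the harmonic mean of precision and recall. A short sketch using the approximate values quoted above (`f1_score` is just an illustrative helper):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate precision and recall reported in this issue.
print(round(f1_score(0.94, 0.1), 3))
```

Because the harmonic mean is dominated by the smaller value, the low recall pulls F1 down to roughly 0.18 despite the high precision.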

END smaller than POS

Hi,

I noticed for quite some interchromosomal BND variants the END is smaller than the POS, which results in problems with bcftools and tabix.
In fact, it is against the VCF specification, which defines the END value as "End position on CHROM (used with symbolic alleles; see below)".

In more detail:

• END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele, so it can be derived from POS and the length of REF, and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown. This field is used to compute BCF’s rlen field (see 6.3.1) and is important when indexing VCF/BCF files to enable random access and querying by position.

So it turns out the END field contains the end position on the same chromosome as in CHROM, and not on the other chromosome for interchromosomal variants.

Cheers,
Wouter
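
A quick way to find the affected records before handing a file to bcftools or tabix is to scan for an INFO `END` that is smaller than `POS`. This is a minimal sketch that assumes tab-separated VCF lines with at least eight columns; `end_before_pos` is a hypothetical helper and the demo records are invented for illustration.

```python
def end_before_pos(vcf_lines):
    """Yield (chrom, pos, end, id) for records whose INFO END is below POS."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue
        chrom, pos, rec_id, info = cols[0], int(cols[1]), cols[2], cols[7]
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
                if end < pos:
                    yield chrom, pos, end, rec_id
                break


# Invented records: one interchromosomal BND with END < POS, one ordinary DEL.
demo = [
    "##fileformat=VCFv4.2",
    "\t".join(["chr15", "34376485", "bnd_1", "N", "<BND>", ".", "PASS",
               "SVTYPE=BND;CHR2=chr11;END=18643255"]),
    "\t".join(["chr1", "100", "del_1", "N", "<DEL>", ".", "PASS",
               "SVTYPE=DEL;END=200"]),
]
for hit in end_before_pos(demo):
    print(hit)
```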

Query regarding duplications from cuteSV output

Hi,

Could you please let me know why the ALT allele column is generally empty for duplication events? For example:

1 20949988 cuteSV.DUP.13 T 63.3 PASS PRECISE;SVTYPE=DUP;SVLEN=171;END=20950159;RE=16;STRAND=-+;RNAMES=NULL GT:DR:DV:PL:GQ 0/1:19:16:63,0,92:6

Also, since duplications are CNVs (essentially copy number gains), does the number of supporting reads correspond to the copy number?

Regards,
Prasun

Bug Report - Allele Frequency Output

Hi @tjiangHIT ,

For some of the SVs called by cuteSV, the AF value is not reported.

In these cases, should we expect an output of AF="."?
For example:

chrUn_KI270467v1        669     cuteSV.DUP.629  A       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVLEN=3252;END=3921;RE=26;STRAND=-+;RNAMES=f3030d0a-9f7c-40cd-808f-aa4d8a3ba60e,bee42230-7acd-4e6d-991f-26d41ed81135,b3f37141-7053-4519-a1b3-c713bac13bac,16853b17-cd33-4096-8f5f-c8507dbfd023,0e01e084-9696-4bb3-b7c8-49ed4e56310f,1f5f301a-409e-4aea-bbf5-10c8ea484f8a,425ca161-6b3e-4a94-a0e6-dfc71e130416,80b9eb5b-fe12-4c6c-bb90-22f830f7c0b5,89e05aa5-db35-4589-b2af-8febbd6ff464,cd773f8b-8ccb-4ee4-8411-13b34d28c475,fe9a7885-463e-46bf-afaa-f4687d4062cf,e857a183-f64c-469d-a0b6-e17aa4ac772c,b749eb34-8591-4416-b4be-b5a1350b9e7a,507f10b8-9bac-494d-adbc-13426c083437,3c98a620-c9e5-44e5-af41-5bd086776dca,2e27a945-3130-485a-bc2a-68510b5e3c18,92f920a9-8a70-4de6-bf45-ac5d9d550b83,6301d473-0c0f-4db6-b263-f8af65613ba5,ea0879e2-9020-4d4d-b2b6-871c5e2a8ac6,f54111a6-a317-4dcd-a838-4a3c035cd5e6,9689be54-7465-4279-8b31-4e8ac623df81,400410f9-cca5-4e97-99d7-672298b81dc5,6be3880d-2129-44dd-ab36-db5079d6cc03,8d28f5c1-2d71-40df-941c-301e1d18482e,03b22ec4-bd1b-41dd-bae4-4f7a3358582f,1ec8abf5-177c-490c-9531-96c1e2c7d842 GT:DR:DV:PL:GQ  ./.:.:26:.,.,.:.
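
For downstream handling, one option is to derive an allele frequency from the sample's DR and DV values and emit `.` when either is missing. The sketch below assumes AF means the fraction of variant-supporting reads, DV / (DR + DV); that definition is an assumption for illustration and may differ from how cuteSV computes AF. `allele_frequency` and `format_af` are hypothetical helpers.

```python
def allele_frequency(dr: str, dv: str):
    """Return DV / (DR + DV), or None if a value is missing or depth is 0."""
    if dr == "." or dv == ".":
        return None
    total = int(dr) + int(dv)
    if total == 0:
        return None
    return int(dv) / total


def format_af(af) -> str:
    """Render an AF value for a VCF field, using '.' for missing."""
    return "." if af is None else f"{af:.4f}"


# DR=19, DV=16 as in a genotyped DUP record quoted in another issue here.
print(format_af(allele_frequency("19", "16")))
# DR='.', DV=26 as in the record above.
print(format_af(allele_frequency(".", "26")))
```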

Known SV not detected in final output

Hello,

I'm using CuteSV on Nanopore data with 30X coverage. My sample has a known duplication that isn't making it into the final output but it is detected in the signatures file. I was wondering if you could recommend alternative parameters which could help me catch it. Here's what I've been using. I know it is supported by 7 reads all with a MAPQ of 60.

Thanks,
Melissa

cuteSV $BAM $REF ${PROJDIR}/CuteSV/${Sample}/${Sample}.CuteSV.vcf ${PROJDIR}/CuteSV/${Sample}/ \
			-S ${Sample} \
			--max_size 3000000 \
			--min_support 1 \
			--max_cluster_bias_INS 100 \
			--diff_ratio_merging_INS 0.3 \
			--max_cluster_bias_DEL 100 \
			--diff_ratio_merging_DEL 0.3 \
			 --retain_work_dir --report_readid
DUP chr7 138861117 140782817 9aec5e38-c0aa-4005-b0b6-3da9ac1da2fd
DUP chr7 138861117 140782817 e5e61d25-4c36-4bfb-8b8c-87ea6b6f42e1
DUP chr7 138861117 140782817 e97c7c35-5587-4ef1-94fd-46065cf536b0
DUP chr7 138861117 140782817 76cfe648-35c1-4662-b940-44436c05a5a5
DUP chr7 138861117 140782817 2900cb39-6002-4cd1-8cea-de34d070e6a5
DUP chr7 138861118 140782816 5473d280-3c0e-4e84-a20a-adaf4f5218d3
DUP chr7 138861121 140782815 c60e972c-0743-4478-b23d-395b5b82c600
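
To confirm how many reads agree on the same event, the signature lines can be grouped by near-identical coordinates. The sketch below is a simple greedy grouping, not cuteSV's clustering algorithm. It assumes whitespace-separated columns of type, chromosome, start, end and read ID, as in the DUP lines above; `group_signatures`, the `tolerance` parameter and the read names in the demo are hypothetical.

```python
def group_signatures(lines, tolerance=10):
    """Greedily group signature lines with start and end within tolerance bp."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue
        sv_type, chrom, start, end, read_id = parts[:5]
        records.append((sv_type, chrom, int(start), int(end), read_id))
    records.sort()

    groups = []
    for sv_type, chrom, start, end, read_id in records:
        last = groups[-1] if groups else None
        if (last is not None
                and last["type"] == sv_type
                and last["chrom"] == chrom
                and abs(start - last["start"]) <= tolerance
                and abs(end - last["end"]) <= tolerance):
            last["reads"].append(read_id)
        else:
            groups.append({"type": sv_type, "chrom": chrom,
                           "start": start, "end": end, "reads": [read_id]})
    return groups


# Coordinates follow the lines above; read names are placeholders.
demo = [
    "DUP chr7 138861117 140782817 read_1",
    "DUP chr7 138861118 140782816 read_2",
    "DUP chr7 138861121 140782815 read_3",
]
for group in group_signatures(demo):
    print(group["type"], group["chrom"], group["start"], group["end"],
          len(group["reads"]))
```

If all seven reads land in one group, the signal is present at the signature stage, which points to a later clustering or filtering step as the place where the call is lost.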

work_dir parameter error

Hey,

Your program looks very interesting, but I can't get it to run. What does the work_dir parameter mean? If I omit it, I get an error:

2020-09-03 16:45:01,175 [INFO] Running /home/guerrer/.local/bin/cuteSV mapped_sorted.bam ref.fasta mapped.vcf  --max_cluster_bias_INS 1000 --threads 12 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5
2020-09-03 16:45:01,198 [INFO] The total number of chromsomes: 267
Traceback (most recent call last):
  File "/home/guerrer/.local/bin/cuteSV", line 708, in <module>
    run(sys.argv[1:])
  File "/home/guerrer/.local/bin/cuteSV", line 704, in run
    main_ctrl(args, argv)
  File "/home/guerrer/.local/bin/cuteSV", line 562, in main_ctrl
    os.mkdir("%ssignatures"%temporary_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'mapped.vcf/signatures'

If I try to pass a directory, it's not recognized:

cuteSV mapped_sorted.bam ref.fasta mapped.vcf ../folder_where_I_am

cuteSV: error: unrecognized arguments: ../folder_where_I_am

Cheers,
Ricardo
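
The traceback shows `os.mkdir` being called on `<work_dir>signatures` with `mapped.vcf` as the work_dir value, so the last positional argument was taken as the working directory and no such directory existed. It may be worth checking `cuteSV --help` for the positional arguments your installed version expects. Independently of that, creating the working directory up front avoids the `FileNotFoundError`. A minimal sketch follows; `prepare_work_dir` is a hypothetical helper, not part of cuteSV.

```python
import os
import tempfile


def prepare_work_dir(path: str) -> str:
    """Create the directory if needed and return it with a trailing separator."""
    os.makedirs(path, exist_ok=True)
    return os.path.join(path, "")


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    work_dir = prepare_work_dir(os.path.join(base, "cutesv_work"))
    # Same string concatenation as in the traceback; it now succeeds
    # because the parent directory exists.
    os.mkdir("%ssignatures" % work_dir)
    print(os.path.isdir(os.path.join(work_dir, "signatures")))
```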

Only one type of SV in the output (SVTYPE=BND)

Dear developers,

I used cuteSV with Oxford Nanopore data (coverage 6.5x, ngmlr aligner). My aim was to identify inversions from 4 Mbp to 25 Mbp. I received many candidate calls, but they all have the same SV type: SVTYPE=BND. What does it mean? My command is below. Thank you!

cuteSV -t 10 -l 1000000 -L 1000000 -b 500000000 --min_support 3 --max_cluster_bias_INS 100 \
--diff_ratio_merging_INS 0.2 --diff_ratio_filtering_INS 0.6 \
--diff_ratio_filtering_DEL 0.7 BAM \
BAM.vcf BAM_folder

Breakpoint problem

Hi,
Thanks for providing the software. I have a question.
What do the brackets ('[[' and ']]') around the mate chromosome location represent in BND records?
And how should I stitch together the sequences to the left and right of the breakpoint when the alignment is on the negative strand of the reference genome?

Look forward to your answer
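
For reference, the VCF specification defines four breakend ALT forms: `t[p[` (the piece extending to the right of p is joined after t), `t]p]` (the reverse-complemented piece extending left of p is joined after t), `]p]t` (the piece extending to the left of p is joined before t) and `[p[t` (the reverse-complemented piece extending right of p is joined before t). The sketch below decodes an ALT string along those lines. It reflects the general VCF convention only, so whether cuteSV's BND output follows it in every case would need to be confirmed; `describe_bnd` is a hypothetical helper.

```python
import re

BND_ALT = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>.+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[^\[\]]*)$"
)


def describe_bnd(alt: str) -> dict:
    """Decode a VCF breakend ALT string into mate locus and orientation."""
    m = BND_ALT.match(alt)
    if m is None or m["b1"] != m["b2"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    bracket = m["b1"]
    base_first = bool(m["lead"])  # True for t[p[ and t]p]
    if base_first:
        reverse = bracket == "]"
    else:
        reverse = bracket == "["
    return {
        "mate_chrom": m["chrom"],
        "mate_pos": int(m["pos"]),
        "mate_piece_extends": "right" if bracket == "[" else "left",
        "joined": "after" if base_first else "before",
        "reverse_complemented": reverse,
    }


for alt in ("N[chr15:34376485[", "N]chr15:34376485]",
            "]chr15:34376485]N", "[chr15:34376485[N"):
    print(alt, describe_bnd(alt))
```

When `reverse_complemented` is true, the mate piece has to be reverse-complemented before it is concatenated with the local sequence.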

events don't merge with svanalyzer

I have been using svanalyzer with sniffles calls to compare SVs between members of a family. Sniffles calls merge when they have an ALT allele sequence, or when they are in cis and defined by a pair of points (e.g. deletions whose start and stop coordinates delineate the entire event). My cuteSV calls, however, do not merge between samples at all, even though in some cases they are clearly the same SV inherited from parent to child.

cuteSV on assembly: no DUP or BND

Hi,

I have run cuteSV in "assembly mode" following the instructions of the wiki. My input assembly is the CHM13 v1.0 assembly from the T2T consortium aligned to the reference genome GRCh38.p13. Since the CHM13 assembly is only from one haplotype, I did not haplotag the BAM file.

The VCF file I obtained from cuteSV contained no breakends (and hence no translocations) and only one duplication, on chrM. Is that expected behavior?

Thank you for the help.
Guillaume

Visualization of Structural Variants

Hello,

We are trying to run cuteSV and use the output VCF file to visualize the structural variants (SV).

If you are aware of any visualization tools for SVs please let us know. We greatly appreciate your help.

Thank you,
Kiran

Problems about VISOR in simulated alignments generation

cuteSV/simulate/README.md describes how to generate simulated alignments, and I ran this command:
VISOR LASeR -g reference_genome.fasta -s donor_genome_del -bed LASeR.bed -o data_del_5x -c 5 --threads 16 --noaddtag
When it finished, it gave me 3 files in data_del_5x: sim.srt.bam, sim.srt.bam.bai and VISOR_LASeR.log.
Then I ran cuteSV on sim.srt.bam with this command:
cuteSV sim.srt.bam ref.fasta cutesv.vcf ./
but got the error:
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False
It seems that there is something wrong with the .bam file, but I have no idea what. Could anyone help me with this?

Run VISOR LASeR using --noaddtag [Suggestion]

Hi @tjiangHIT,

it seems that you are using VISOR for your simulations. JFYI, you can speed things up by making VISOR LASeR skip haplotype and clone tagging of the generated BAM files if you do not need that information in the final BAM (simply ignore this if you need the HP and CL tags). Something like:

VISOR LASeR -g reference_genome.fasta -s donor_genome_del -bed LASeR.bed -o data_del_5x -c 5 --threads 16 --noaddtag

This prevents VISOR from spending time re-parsing the generated BAM files, which can be time-consuming at high coverage.
Good luck with your tool!

Best,

Davide

A question about -Ivcf

Hi, when I use the -Ivcf argument with data from different platforms, such as PB CLR, ONT or PB HiFi, do I need to adjust the platform-specific arguments (--max_cluster_bias_DEL, etc.)?

Unable genotype

Hi @tjiangHIT,

I used cuteSV (version 1.0.6) with human PacBio data (aligned with both ngmlr and pbmm2) using the command:

cuteSV -t 16 -l 50 --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.2 --diff_ratio_filtering_INS 0.6 --diff_ratio_filtering_DEL 0.7 $bam_file $sample.vcf $sample

However, in the resulting VCF the genotype of every detected SV is "./.". It seems cuteSV couldn't measure the DR (high-quality reference reads) information. Do you have any suggestions for solving this problem? Thank you!

-  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NULL
- 1       10338   cuteSV.DEL.0    N       <DEL>   .       PASS    PRECISE;SVTYPE=DEL;SVLEN=-76;END=10414;CIPOS=-19,19;CILEN=-6,6;RE=16;STRAND=+-  GT:DR:DV:PL:GQ  ./.:.:16:.,.,.:.
- 1       10402   cuteSV.INS.0    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=58;END=10403;CIPOS=-4,4;CILEN=-2,2;RE=16       GT:DR:DV:PL:GQ  ./.:.:16:.,.,.:.
- 1       10818   cuteSV.INS.1    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=100;END=10819;CIPOS=-21,21;CILEN=-7,7;RE=98    GT:DR:DV:PL:GQ  ./.:.:98:.,.,.:.
- 1       136960  cuteSV.INS.2    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=280;END=136961;CIPOS=-22,22;CILEN=-11,11;RE=20 GT:DR:DV:PL:GQ  ./.:.:20:.,.,.:.
- 1       227418  cuteSV.BND.0    N       ]2:243159580]N  .       PASS    PRECISE;SVTYPE=BND;CHR2=2;END=243159580;RE=9    GT:DR:DV:PL:GQ  ./.:.:9:.,.,.:.
- 1       227421  cuteSV.BND.1    N       [5:180900319[N  .       PASS    PRECISE;SVTYPE=BND;CHR2=5;END=180900319;RE=11   GT:DR:DV:PL:GQ  ./.:.:11:.,.,.:.
- ...

Conda installs v1.0.8 and not the latest (v1.0.12)

Hi,

I tried the conda installation method and the installed version is v1.0.8, instead of v1.0.12.

cuteSV --version
cuteSV 1.0.8

I then tried "conda update cuteSV", but it stays at v1.0.8 and does not update.

How can I solve this?

Thanks.

José

Nested deletion detection

Hi,
First of all thanks a lot for this super fast and performant tool.
I ran cuteSV on my ONT dataset with an average coverage of 30X. I know that the sequenced cell line has a nested deletion. One allele has a large deletion of about 30kb and the other allele has a 15kb deletion; both end at the same location. The coverage profile is 30X, then about 15X for 15kb, then no coverage for another 15kb, and then back to almost 30X.
When I run cuteSV, the large 30kb deletion is reported, but it is not split into heterozygous and homozygous parts. Can you recommend any parameters to improve the sensitivity here? Is cuteSV in general able to detect such nested events?
Thanks a lot!

query regarding QUAL

Hi,

I was just wondering if the variant quality (QUAL) in VCF is phred scaled. Its definition is absent in the vcf metadata (cuteSV-1.0.10). I am assuming it is, but just confirming. I observed that in the paper, variants were removed which had variant quality less than 5. Is it QUAL you are referring to?

Regards,
Prasun

Allele not matching the reference sequence

Hello,
I'm currently processing the variants called using HiFi sequences. I've got both short variants called using either DeepVariant or Clair and structural variants called using cuteSV. After calling the variants, I've combined the data using bcftools. However, since there is a level of overlap between the variants called by the short read caller and the SV caller (especially in the range 30-50bp) I'm trying to simplify the dataset using bcftools norm.
However, when I run the command as follows, I get this error:

bcftools norm -f ref.fa tmp.vcf > tmp2.vcf
Reference allele mismatch at 000001F:745885 .. REF_SEQ:'T' vs VCF:'A'

I've seen issue #43 and I understand the difficulty of defining the actual breakpoint, especially with ambiguous positioning. However, I think it would be good to have the software print out the allele matching the reference base, so that the VCF is consistent and can be used in downstream analyses without custom fixes. Would it be possible to patch this issue?

Alternatively, does this issue affect only insertions? If so, is it always that the nucleotide refers to POS-1?
Thank you in advance,
Andrea
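To find out which records are affected before running bcftools norm, the REF column can be compared against the reference directly. Below is a minimal sketch, assuming the reference is already loaded into a dict of sequences and the records are reduced to (chrom, pos, ref) tuples; the function name and the toy data are hypothetical and not part of cuteSV.

```python
def find_ref_mismatches(records, reference):
    """Return records whose REF allele disagrees with the reference genome.

    records:   iterable of (chrom, pos, ref), with pos 1-based as in VCF
    reference: dict mapping contig name to its sequence string (assumed layout)
    """
    mismatches = []
    for chrom, pos, ref in records:
        seq = reference[chrom]
        # VCF positions are 1-based, Python slices are 0-based.
        genome = seq[pos - 1 : pos - 1 + len(ref)].upper()
        if genome != ref.upper():
            mismatches.append((chrom, pos, ref, genome))
    return mismatches


# Toy data for illustration only.
toy_reference = {"ctg": "ACGTTA"}
toy_records = [("ctg", 2, "C"), ("ctg", 5, "A")]
print(find_ref_mismatches(toy_records, toy_reference))
```

Listing the mismatching records per SVTYPE would show whether only insertions are affected. bcftools norm also has a --check-ref option (exit, warn, exclude or set), which may be worth reading about in its documentation before applying any custom fix.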

Installation not working

Hello!!

I have tried both:

conda install -c bioconda cutesv
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

and
git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && python setup.py install

Cloning into 'cuteSV'...
remote: Enumerating objects: 940, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (91/91), done.
remote: Total 940 (delta 59), reused 73 (delta 32), pack-reused 817
Receiving objects: 100% (940/940), 2.98 MiB | 18.25 MiB/s, done.
Resolving deltas: 100% (515/515), done.
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
running install
running bdist_egg
running egg_info
creating src/cuteSV.egg-info
writing requirements to src/cuteSV.egg-info/requires.txt
writing src/cuteSV.egg-info/PKG-INFO
writing top-level names to src/cuteSV.egg-info/top_level.txt
writing dependency_links to src/cuteSV.egg-info/dependency_links.txt
writing manifest file 'src/cuteSV.egg-info/SOURCES.txt'
reading manifest file 'src/cuteSV.egg-info/SOURCES.txt'
writing manifest file 'src/cuteSV.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.13-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveTRA.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_forcecalling.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveINV.py -> build/lib/cuteSV
copying src/cuteSV/__init__.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveDUP.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_genotype.py -> build/lib/cuteSV
copying src/cuteSV/CommandRunner.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_Description.py -> build/lib/cuteSV
copying src/cuteSV/diploid_calling.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveINDEL.py -> build/lib/cuteSV
creating build/lib/benchmarks
copying src/benchmarks/eval_sim.py -> build/lib/benchmarks
copying src/benchmarks/sta_venn.py -> build/lib/benchmarks
copying src/benchmarks/__init__.py -> build/lib/benchmarks
copying src/benchmarks/eval_trio.py -> build/lib/benchmarks
copying src/benchmarks/vcf2bedpe.py -> build/lib/benchmarks
copying src/benchmarks/cmp_NA19240.py -> build/lib/benchmarks
copying src/benchmarks/multi_platform.py -> build/lib/benchmarks
creating build/bdist.macosx-10.13-x86_64
creating build/bdist.macosx-10.13-x86_64/egg
creating build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveTRA.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_forcecalling.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveINV.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/__init__.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveDUP.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_genotype.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/CommandRunner.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_Description.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/diploid_calling.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveINDEL.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
creating build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/eval_sim.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/sta_venn.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/__init__.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/eval_trio.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/vcf2bedpe.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/cmp_NA19240.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/multi_platform.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveTRA.py to cuteSV_resolveTRA.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_forcecalling.py to cuteSV_forcecalling.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveINV.py to cuteSV_resolveINV.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveDUP.py to cuteSV_resolveDUP.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_genotype.py to cuteSV_genotype.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/CommandRunner.py to CommandRunner.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_Description.py to cuteSV_Description.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/diploid_calling.py to diploid_calling.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveINDEL.py to cuteSV_resolveINDEL.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/eval_sim.py to eval_sim.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/sta_venn.py to sta_venn.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/eval_trio.py to eval_trio.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/vcf2bedpe.py to vcf2bedpe.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/cmp_NA19240.py to cmp_NA19240.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/multi_platform.py to multi_platform.pyc
installing package data to build/bdist.macosx-10.13-x86_64/egg
running install_data
copying LICENSE -> build/bdist.macosx-10.13-x86_64/egg/
creating build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
installing scripts to build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
running install_scripts
running build_scripts
creating build/scripts-2.7
copying and adjusting src/cuteSV/cuteSV -> build/scripts-2.7
changing mode of build/scripts-2.7/cuteSV from 644 to 755
creating build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/cuteSV -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
changing mode of build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts/cuteSV to 755
copying src/cuteSV.egg-info/PKG-INFO -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/SOURCES.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/dependency_links.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/not-zip-safe -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/requires.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/top_level.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
creating dist
creating 'dist/cuteSV-1.0.11-py2.7.egg' and adding 'build/bdist.macosx-10.13-x86_64/egg' to it
removing 'build/bdist.macosx-10.13-x86_64/egg' (and everything under it)
Processing cuteSV-1.0.11-py2.7.egg
removing '/usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg' (and everything under it)
creating /usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg
Extracting cuteSV-1.0.11-py2.7.egg to /usr/local/lib/python2.7/site-packages
cuteSV 1.0.11 is already the active version in easy-install.pth
Installing cuteSV script to /usr/local/bin

Installed /usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg
Processing dependencies for cuteSV==1.0.11
Searching for pyvcf
Reading https://pypi.org/simple/pyvcf/
Download error on https://pypi.org/simple/pyvcf/: unknown url type: https -- Some packages may not be found!
Couldn't find index page for 'pyvcf' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
Download error on https://pypi.org/simple/: unknown url type: https -- Some packages may not be found!
No local packages or working download links found for pyvcf
error: Could not find suitable distribution for Requirement.parse('pyvcf')

Pyvcf is installed
conda install -c conda-forge pyvcf
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

Thanks a lot!!
T

No INS in the results

Hi!

I ran cuteSV on PacBio CLR data with the following command:
cuteSV --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 reads.bam hg38.no_alts.fasta output.vcf ./output
And I got DEL, DUP, INV, BND in the output, but no INS.
Is this normal for a cuteSV output? Do you have ideas about why this is happening?
Thanks!

INS sequence in ALT field

In cases of compound heterozygosity, there may be two different insertions of the same size at the same location in the genome on the two different alleles. In order to distinguish those in downstream analysis, it would be useful to have an option where the ALT allele field is not <INS> but rather contains either an example or a consensus of the inserted sequence.

Reversed start and end of BND

If the start and end of two BND records (i.e. translocations) are reversed, does it mean they are actually the same translocation event?
E.g.

id       start        end
TRA-1:   chr1, 1      chr2, 100
TRA-2:   chr2, 100    chr1, 1
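One way to flag such candidate pairs is to compare the two ends regardless of the order in which they are written. The sketch below is only an illustration: the record layout (chrom1, pos1, chrom2, pos2), the function names and the tolerance parameter are assumptions. It checks coordinates only; matching coordinates is a necessary condition, not a sufficient one, because the breakend orientation in the ALT field also has to be compatible.

```python
def sorted_ends(record):
    """Return the two breakpoint ends of a record in a canonical order."""
    chrom1, pos1, chrom2, pos2 = record
    return sorted([(chrom1, pos1), (chrom2, pos2)])


def may_be_same_junction(rec_a, rec_b, tolerance=0):
    """True if both records join the same two positions, in either order.

    tolerance is the maximum allowed distance (in bp) between matching ends.
    Orientation is NOT checked here.
    """
    ends_a, ends_b = sorted_ends(rec_a), sorted_ends(rec_b)
    return all(
        chrom_a == chrom_b and abs(pos_a - pos_b) <= tolerance
        for (chrom_a, pos_a), (chrom_b, pos_b) in zip(ends_a, ends_b)
    )


tra_1 = ("chr1", 1, "chr2", 100)
tra_2 = ("chr2", 100, "chr1", 1)
print(may_be_same_junction(tra_1, tra_2))
```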

Increasing sensitivity of SV calling

As noted in Issue 42, I'm also having some issues detecting known tandem duplication and deletion calls. I was wondering which cuteSV parameters we could use to optimize sensitivity.

Thanks,
Jeremy

run cuteSV by chromosome

Dear @tjiangHIT

To speed up the process, can I run cuteSV in parallel, one chromosome at a time? I do not know whether reads with inter-chromosome alignments will be processed properly. Thanks for your attention.

samtools view -h -O BAM -o ${sample_id}.${chr}.bam ${in_bam} ${chr}
samtools index ${sample_id}.${chr}.bam ${sample_id}.${chr}.bai
cuteSV ${sample_id}.${chr}.bam ${ref} ${sample_id}.${chr}.cuteSV.vcf ${runDir}/${sample_id}/${chr} -s 5 -l 40 -L 1000000 -md 500 -mi 500 --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 --genotype --sample ${sample_id} || exit 1

Best regards,
Zheng zhuqing

cuteSV call SV with length =0 from haplotype-resolved assemblies

Hello,

When I use cuteSV to call SVs from haplotype-resolved assemblies, a strange SV is called, with a length of 0 and no REF sequence available:
chr1_KI270709v1_random 1 cuteSV.DEL.6890 G 11.8 PASS SVTYPE=DEL;SVLEN=0;END=1;RE=1;RNAMES= GT 0/1
Any idea what causes this? Thank you very much!

Translocations BND meaning. Inversions also?

Hello again,

I have small questions about translocations. Look at the following example (I'm still using 1.0.7, but I think it doesn't matter for the question):

NC_024459.2 115467803 cuteSV.BND.54 N ]NC_024463.2:150593897]N . PASS PRECISE;SVTYPE=BND;CHR2=NC_024463.2;END=150593897;RE=118;RNAMES=m64093_191209_130050/78512469/ccs_63,m64093_191209_130050/3606801/ccs_64,m64093_191209_130050/154338485 <clipped all read names out here> GT:DR:DV:PL:GQ ./.:.:118:.,.,.:.

So, does this translocation go from NC_024459.2 at position 115467803 to NC_024463.2 at position 150593897? Or the other way around? What does this notation:

]NC_024463.2:150593897]N

exactly mean? Specifically I mean the "]" sign before the contig name and "]N" afterwards.

Another question is: Are inversions identifiable? Are they noted as translocations with the same chromosome and inverted coordinates?

Kind regards,
Ricardo
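For reference, the VCF specification defines four breakend ALT forms: t[p[, t]p], ]p]t and [p[t, where t is the local base(s) and p is the mate position. The bracket direction says whether the joined piece extends to the right ("[") or to the left ("]") of p, and the placement of t says whether that piece is joined after or before the local base. The following is a minimal sketch of a parser for that notation; the function name and the returned keys are made up for illustration and are not cuteSV code.

```python
import re

# Breakend ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
_BND_ALT = re.compile(
    r"^(?P<lead>[A-Za-z.]*)(?P<b1>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[A-Za-z.]*)$"
)


def parse_bnd_alt(alt):
    """Decode a breakend ALT string into its components."""
    m = _BND_ALT.match(alt)
    if m is None or m.group("b1") != m.group("b2"):
        raise ValueError("not a breakend ALT: %r" % alt)
    lead, trail, bracket = m.group("lead"), m.group("trail"), m.group("b1")
    if bool(lead) == bool(trail):
        raise ValueError("local bases must be on exactly one side: %r" % alt)
    joined_after_local = bool(lead)
    # Per the specification, t]p] and [p[t join a reverse-complemented piece.
    reverse = (joined_after_local and bracket == "]") or (
        not joined_after_local and bracket == "["
    )
    return {
        "mate_chrom": m.group("chrom"),
        "mate_pos": int(m.group("pos")),
        # "]": the joined piece extends to the left of p; "[": to the right.
        "mate_extends": "left" if bracket == "]" else "right",
        # Whether the mate piece is joined after or before the local base.
        "joined": "after" if joined_after_local else "before",
        "mate_reverse_complemented": reverse,
    }


print(parse_bnd_alt("]NC_024463.2:150593897]N"))
```

Applied to the record above, this reads as: the piece of NC_024463.2 extending to the left of position 150593897 is joined before the base at NC_024459.2:115467803, without reverse complementing.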

Wrong position

Hello,

I've noticed that some variants are reported as PRECISE and with a PASS filter, but when I look at the alignment in IGV, the reported position differs from the actual position of the structural variant.
image

As can be seen, the VCF file reports an insertion at position 143202282 on chromosome 1, while the position actually observed is 143202274 on chromosome 1. In addition, the VCF states that the reference base is a T, while the base at the reported position is an A; at the actual position of the insertion, however, the reference is indeed a T. Why could this be happening?

EDIT: I'd also like to mention there are variants reported that don't actually appear in the BAM files.

Best regards,

Jonatan

force recall option

One of the things that is likely to be a heavily used workflow in the future is to look at inheritance patterns of SVs in clinical samples for pathogenic SVs. Doing so requires a merging of several individuals' variant sets and merging of associated genotypes and such. One way this can be done currently is to merge SVs between samples, then to force the SV caller to make an assessment for each SV in the resulting superset of SVs because this merging step is quite messy. Such a two-step approach is described here: SV calling for a population using sniffles + SURVIVOR to perform SV calling and merging respectively.

Would it be possible to add a --forceall option so that this kind of two-step workflow could be performed using CuteSV rather than sniffles?

Question about calling INV

Hi @tjiangHIT,

Thanks for developing cuteSV! It's really nice to have more options of SV caller!
I have a question about calling INV. I ran the alignment with lra (a long read aligner), but cuteSV does not call an INV at this site, although IGV clearly shows an inversion there. Do you have any idea what is causing that? A read containing an inversion is split into 3 alignments in the SAM output. Do those 3 alignments need to be output in a specific order for the INV to be called? Or could the primary/supplementary flag influence this? Of the 3 alignments, only one gets the primary flag and the other two are supplementary, so I'm guessing that which alignment gets which flag may influence INV calling. Any help is deeply appreciated!!
image

Best,
Jingwen

Skip generating empty signature file & Rewrite the job schedule

cuteSV uses the Linux cat and sort commands to merge and sort SV signatures. This step can be sped up considerably by reducing the number of signature files and discarding files that contain no signal. In addition, I adjusted the job schedule to accelerate SV signature clustering, especially when the reference genome contains tens of thousands of contigs.

The evaluation of SVs

I have a question about evaluating SV results. For example, on my simulated data I use minimap2 + SVIM to predict SV calls and then evaluate them with SURVIVOR, and the recall is always around 0.42. Why is that?

Output format

Hi,

Would it be possible to output the variants in BEDPE format with the number of supporting reads included in the output? I would like to do some post variant call filtering with bedtools so I need to convert the VCF to BEDPE. I currently am using SURVIVOR vcftobed to convert the VCF, but the output does not include the number of supporting reads for each call, which is part of the post filtering process.

Thanks,
Mike
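Until such an output option exists, the conversion can be done with a few lines of Python. This is a minimal sketch, assuming the cuteSV VCF carries the RE (supporting reads), END, CHR2 and SVTYPE INFO fields seen in records elsewhere on this page; the function name and the column layout (support count in the BEDPE score column, SVTYPE appended as an extra column) are choices made for illustration.

```python
def vcf_line_to_bedpe(line):
    """Convert one tab-separated VCF record into a BEDPE-style line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, sv_id, info = fields[0], int(fields[1]), fields[2], fields[7]

    # Parse INFO into a dict; flag entries such as PRECISE get an empty value.
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value

    chrom2 = info_map.get("CHR2", chrom)   # same chromosome unless CHR2 is set
    end = int(info_map.get("END", pos))
    support = info_map.get("RE", ".")      # number of supporting reads
    sv_type = info_map.get("SVTYPE", ".")

    # BEDPE intervals are 0-based and half-open; each breakpoint becomes 1 bp.
    columns = (chrom, pos - 1, pos, chrom2, end - 1, end,
               sv_id, support, ".", ".", sv_type)
    return "\t".join(str(c) for c in columns)


record = ("1\t227418\tcuteSV.BND.0\tN\t]2:243159580]N\t.\tPASS\t"
          "PRECISE;SVTYPE=BND;CHR2=2;END=243159580;RE=9\t"
          "GT:DR:DV:PL:GQ\t./.:.:9:.,.,.:.")
print(vcf_line_to_bedpe(record))
```

With the support count in the score column, filtering on it afterwards with awk or bedtools is straightforward.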

PL tag number is incorrect

##FORMAT=<ID=PL,Number=1,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer">

should be

##FORMAT=<ID=PL,Number=3,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer">
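As a stopgap for existing VCF files, the header line can be rewritten before passing the file to strict parsers. This is a minimal sketch with a hypothetical helper name; it touches only the PL FORMAT header line and leaves every other line unchanged. Note that the VCF specification itself declares PL as Number=G, which equals 3 for diploid, biallelic records, so the number parameter is left configurable.

```python
import re


def fix_pl_header(line, number="3"):
    """Rewrite the Number= value of the PL FORMAT header line."""
    if line.startswith("##FORMAT=<ID=PL,"):
        return re.sub(r"Number=[^,]+", "Number=" + number, line, count=1)
    return line


old = ('##FORMAT=<ID=PL,Number=1,Type=Integer,Description='
       '"# Phred-scaled genotype likelihoods rounded to the closest integer">')
print(fix_pl_header(old))
```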

KeyError when force calling

Hi,

I did the following with cuteSV 1.0.10:

  • CuteSV SV calling
  • merging the VCFs with Jasmine
  • CuteSV force calling with -Ivcf

However, the last step generates the following error:

Traceback (most recent call last):
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 788, in <module>
    run(sys.argv[1:])
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 784, in run
    main_ctrl(args, argv)
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 658, in main_ctrl
    result = force_calling(args.input, args.Ivcf, args.output, temporary_dir,
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/lib/python3.8/site-packages/cuteSV/cuteSV_forcecalling.py", line 350, i
    for strand_iter in sv_dict['INV'][chrom]:
KeyError: 'chr15'

Regards,
Wouter
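The traceback suggests that the per-type dictionary has no entry for a chromosome that carries no INV in the input VCF. The sketch below shows the general defensive pattern only; it assumes sv_dict maps SV type to chromosome to a list of candidates, which is inferred from the traceback line and not confirmed against the cuteSV source. The helper name is hypothetical, and this is not the actual fix.

```python
def candidates_for(sv_dict, sv_type, chrom):
    """Return the candidates for sv_type on chrom, or an empty list if absent."""
    return sv_dict.get(sv_type, {}).get(chrom, [])


# Toy layout for illustration: chr15 has no INV entry, as in the report above.
sv_dict = {"INV": {"chr1": ["++", "--"]}}
for strand_iter in candidates_for(sv_dict, "INV", "chr15"):
    print(strand_iter)  # never reached; the loop is simply skipped
```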
