
cutesv's Introduction

cuteSV



Getting Started

                                               __________    __       __
                                              |   ____   |  |  |     |  |
                          _                   |  |    |__|  |  |     |  |
 _______    _     _   ___| |___     ______    |  |          |  |     |  |
|  ___  |  | |   | | |___   ___|   / ____ \   |  |_______   |  |     |  |
| |   |_|  | |   | |     | |      / /____\ \  |_______   |  |  |     |  |
| |        | |   | |     | |      | _______|   __     |  |  \  \     /  /
| |    _   | |   | |     | |  _   | |     _   |  |    |  |   \  \   /  /
| |___| |  | |___| |     | |_| |  \ \____/ |  |  |____|  |    \  \_/  /
|_______|  |_______|     |_____|   \______/   |__________|     \_____/

Installation

$ pip install cuteSV
or
$ conda install -c bioconda cutesv
or 
$ git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && python setup.py install 

Introduction

Long-read sequencing enables the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high sensitivity and performance simultaneously due to the complex SV characteristics implied by noisy long reads. Therefore, we propose cuteSV, a sensitive, fast and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to analyze the signatures to implement sensitive SV detection. Benchmarks on real Pacific Biosciences (PacBio) and Oxford Nanopore Technology (ONT) datasets demonstrate that cuteSV has better yields and scalability than state-of-the-art tools.

The benchmark results of cuteSV on the HG002 human sample are below:

We used Truvari to calculate the recall, precision, and F-measure. For a more detailed walkthrough of the SV benchmarking, we show an example here.
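For illustration only (a minimal sketch, not the exact benchmark script), a Truvari comparison of a cuteSV callset against a truth set can be run roughly as follows; file names are placeholders and both VCFs are assumed to be sorted, bgzip-compressed, and tabix-indexed:

$ bgzip cuteSV_calls.vcf && tabix -p vcf cuteSV_calls.vcf.gz
$ truvari bench -b truth.vcf.gz -c cuteSV_calls.vcf.gz -o truvari_out
# recall, precision, and F-measure are summarized in the output directory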

Notice

A new wiki page about diploid-assembly-based SV detection using cuteSV has been established. For more details, please see here.

We also provide a new document describing the force-calling (regenotyping) benchmark here.


Dependencies

1. python3
2. scipy
3. pysam
4. Biopython
5. cigar
6. numpy
7. pyvcf3
8. scikit-learn

Usage

cuteSV <sorted.bam> <reference.fa> <output.vcf> <work_dir>

Suggestions

> For PacBio CLR data:
	--max_cluster_bias_INS		100
	--diff_ratio_merging_INS	0.3
	--max_cluster_bias_DEL	200
	--diff_ratio_merging_DEL	0.5

> For PacBio CCS(HIFI) data:
	--max_cluster_bias_INS		1000
	--diff_ratio_merging_INS	0.9
	--max_cluster_bias_DEL	1000
	--diff_ratio_merging_DEL	0.5

> For ONT data:
	--max_cluster_bias_INS		100
	--diff_ratio_merging_INS	0.3
	--max_cluster_bias_DEL	100
	--diff_ratio_merging_DEL	0.3
> For force calling:
	--min_mapq 			10
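For example, a typical discovery-calling run on ONT data that combines the usage line above with the ONT suggestions could look like this (the BAM, reference, and output paths are placeholders):

	cuteSV sample.sorted.bam reference.fa sample.cuteSV.vcf ./work_dir/ \
		--threads 16 --genotype \
		--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 \
		--max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3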
| Parameter | Description | Default |
| --threads | Number of threads to use. | 16 |
| --batches | Batch size of the genome segmentation interval. | 10,000,000 |
| --sample | Sample name/ID. | NULL |
| --retain_work_dir | Enable retaining the temporary folder and files. | False |
| --write_old_sigs | Enable output of temporary sig files. | False |
| --report_readid | Enable reporting of supporting read IDs for each SV. | False |
| --max_split_parts | Maximum number of split segments a read may be aligned with before it is ignored. All split segments are considered when set to -1. (-1 is recommended when applying assembly-based alignment.) | 7 |
| --min_mapq | Minimum mapping quality of an alignment to be taken into account. | 20 |
| --min_read_len | Ignore reads whose alignments are all shorter than this length (bp). | 500 |
| --merge_del_threshold | Maximum distance of deletion signals to be merged. | 0 |
| --merge_ins_threshold | Maximum distance of insertion signals to be merged. | 100 |
| --min_support | Minimum number of reads that support an SV for it to be reported. | 10 |
| --min_size | Minimum length of SV to be reported. | 30 |
| --max_size | Maximum size of SV to be reported. Full-length SVs are reported when set to -1. | 100000 |
| --genotype | Enable genotyping. | False |
| --gt_round | Maximum number of iteration rounds for alignment searching when genotyping. | 500 |
| --read_range | The interval range for counting the read distribution. | 1000 |
| -Ivcf | Optional input VCF file. Enables force calling. | NULL |
| --max_cluster_bias_INS | Maximum distance to cluster reads together for insertions. | 100 |
| --diff_ratio_merging_INS | Do not merge insertion breakpoints with base-pair identity greater than this ratio. | 0.3 |
| --max_cluster_bias_DEL | Maximum distance to cluster reads together for deletions. | 200 |
| --diff_ratio_merging_DEL | Do not merge deletion breakpoints with base-pair identity greater than this ratio. | 0.5 |
| --max_cluster_bias_INV | Maximum distance to cluster reads together for inversions. | 500 |
| --max_cluster_bias_DUP | Maximum distance to cluster reads together for duplications. | 500 |
| --max_cluster_bias_TRA | Maximum distance to cluster reads together for translocations. | 50 |
| --diff_ratio_filtering_TRA | Filter translocation breakpoints with base-pair identity less than this ratio. | 0.6 |
| --remain_reads_ratio | The ratio of reads retained in a cluster to generate the breakpoint. Set it lower to obtain more precise breakpoints when the alignment data are high quality, but values above 0.5 are recommended. | 1 |
| -include_bed | Optional BED file. Only detect SVs within the regions it defines. | NULL |
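As a sketch of force calling (regenotyping), the -Ivcf option can be combined with the suggested --min_mapq 10; file names below are placeholders:

	cuteSV sample.sorted.bam reference.fa sample.forcecall.vcf ./work_dir/ \
		--genotype --min_mapq 10 -Ivcf merged_population.vcf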

Datasets generated from cuteSV

We provide the SV callsets of the HG002 human sample produced by cuteSV from three different long-read sequencing platforms (i.e. PacBio CLR, PacBio CCS, and ONT PromethION).

You can download them at: DOI

Please cite the manuscript of cuteSV before using these callsets.


Changelog

cuteSV (v2.1.1)
1. Fix bugs in resolving reference genomes.
2. Modify several dependencies and remove some unnecessary ones.
3. Update several evaluation scripts.

cuteSV (v2.1.0)
1. Speed up both SV discovery calling and force calling comprehensively.
2. Upgrade the force calling module.
3. Modify the temporary files. The sig files are now only generated when the "write_old_sigs" parameter is set.
4. Update several rules in signature extraction.

cuteSV (v2.0.3):
1. Fix the error of missing min_size parameter.
2. Fix the missing signatures in duplication clustering.

cuteSV (v2.0.2):
1. Fix several errors in signature extraction.
2. Filter low quality reads in the statistics of reference reads.
3. Modify the rule of merging signatures on the same read.
4. Modify the cluster rule of insertions and deletions in force calling.

cuteSV (v2.0.1):
1. Fix an error in handling strand in force calling.
2. Speed up the genotype module of discovery calling. The comparison results on various datasets are as follows.
  | Dataset | cuteSV (previous) | cuteSV2 (latest) |
  | CCS     | 900.37s           | 261.77s          |
  | CLR     | 3620.00s          | 2644.94s         |
  | ONT     | 2893.08s          | 1264.26s         |

cuteSV (v2.0.0):
1. Upgrade the force calling module.
2. Add the --remain_reads_ratio parameter in order to generate highly accurate records by discarding a few signatures.
3. Fix several bugs in inversion and translocation calling.
4. Remove the redundant operations in the signature extraction and accelerate the whole analysis.
5. Streamline the translocation output when performing force-calling.
6. Modify the signature matching rule.
7. Modify the sequence of the inserted allele.

cuteSV (v1.0.13):
1. Modify the breakpoints of alternative allele and reference allele. 
2. Fix an initialization error that could produce wrong diploid-assembly-based SV calls.

cuteSV (v1.0.12):
1. Add Allele frequency (AF) info in the outputs.
2. Fix an index error when force calling BND variants.
3. Modify the --max_size parameter and enable reporting of full-length SVs.

cuteSV (v1.0.11):
1. Add a script for post-processing typical cuteSV callsets from assembly-based alignments to generate diploid-assembly-based SV calls.
2. Add a wiki page to help users perform assembly-based SV calling.
3. Improve retrieval of the inserted sequence from reads whose primary alignment contains hard clips.
4. Improve the performance of force calling.
5. Enable cuteSV to output allele sequences when performing force calling with the VCF generated from other callers. 
6. Fix bugs to avoid the error raised by abnormal SV type.
7. Update the sort commands used in cuteSV.  
8. Update the parameter of --max_split_parts.

cuteSV (v1.0.10):
1. Fix a bug that led to wrong TRA positions.
2. Add a file format conversion script that converts VCF files to BEDPE.
3. Apply several clustering-and-refinement strategies in the force calling function.
4. Assess the performance of force calling on the GIAB HG002 sample datasets (including the CLR, CCS, and ONT platforms).

cuteSV (v1.0.9):
1. Change 0-based pos into 1-based pos in DUP in order to support bcftools conversion.
2. Correct REF and ALT fields. Adjust END value of INS to make it equal to the value of POS.
3. Improve the description of errors.
4. Add usegalaxy.eu badge.
5. Remove CHR2 and the corresponding END position on the BND call.
6. Skip generating empty signature file and rewrite the job schedule.
7. Add force calling function and enable cuteSV to perform population-based SV calling.
8. Fix several minor bugs.

cuteSV (v1.0.8):
1. Rewrite the ins/del signature clustering function.
2. Update the recommended parameters for different sequencing datasets.
3. Replace <DEL>/<INS> with its variant allele sequence, which needs the reference genome sequence as input.
4. Fix several bugs.

cuteSV (v1.0.7):
1. Add read name list for each SV call.
2. Fix several descriptions in VCF header field.

cuteSV (v1.0.6):
1. Improve genotyping through likelihood calculation.
2. Add variant quality value, phred-scaled genotype likelihood, and genotype quality, in order to filter false positive SVs and support quality control.
3. Add the --gt_round parameter to control the number of read scans.
4. Add the variant strand for DEL/DUP/INV.
5. Fix several bugs.

cuteSV (v1.0.5):
1. Add new options for specifically setting the merging thresholds of deletion/insertion signals on the same read. The default parameters are 0 bp for deletions and 100 bp for insertions.
2. Remove the --merge_threshold parameter.
3. Fix bugs in inversion and translocation calling.
4. Add a new option for specifically setting the maximum size of SVs to be discovered. The default value is 100,000 bp.

cuteSV (v1.0.4):
1. Add a new option for specifically setting the threshold for merging SV signals on the same read. The default parameter is 500 bp. You can reduce it for high-quality sequencing datasets such as PacBio HiFi (CCS).
2. Make the genotyping function optional.
3. Enable users to set the SV allele frequency thresholds for homozygous/heterozygous calls.
4. Update the description of the recommended parameters for processing ONT data.

cuteSV (v1.0.3):
1. Refine the genotyping model.
2. Adjust the threshold value for heterozygous alleles.

cuteSV (v1.0.2):
1. Improve the genotyping performance and make it the default option.
2. Improve the parameter descriptions.
3. Modify the header descriptions of the VCF file.
4. Add two new indicators, i.e., BREAKPOINT_STD and SVLEN_STD, to further characterise deletions and insertions.
5. Remove a few redundant functions that reduced code readability.

Citation

Jiang T et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 21, 189 (2020). https://doi.org/10.1186/s13059-020-02107-y

Cao S et al. Re-genotyping structural variants through an accurate force-calling method. bioRxiv 2022.08.29.505534; doi: https://doi.org/10.1101/2022.08.29.505534


Contact

For advice, bug reports, and help, please post a GitHub Issue or contact [email protected].

cutesv's People

Contributors

bnoyvert, corese, fangli80, meltpinkg, npinter, tjianghit, wdecoster


cutesv's Issues

No INS in the results

Hi!

I ran cuteSV on pacbio CLR data with following command:
cuteSV --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 reads.bam hg38.no_alts.fasta output.vcf ./output
And I got DEL, DUP, INV, BND in the output, but no INS.
Is this normal for a cuteSV output? Do you have ideas about why this is happening?
Thanks!

IndexError when force calling translocations

Hi,

I did the following with cuteSV 1.0.10:

  1. CuteSV SV calling
  2. merging the VCFs with Jasmine
  3. CuteSV force calling with -Ivcf

However, the last step generates the following error:

Traceback (most recent call last):
  File "/home/wdecoster/miniconda3/envs/cutesv/bin/cuteSV", line 4, in <module>
    __import__('pkg_resources').run_script('cuteSV==1.0.10', 'cuteSV')
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 665, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 801, in <module>
    run(sys.argv[1:])
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 797, in run
    main_ctrl(args, argv)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/EGG-INFO/scripts/cuteSV", line 671, in main_ctrl
    result = force_calling(args.input, args.Ivcf, args.output, temporary_dir,
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/cuteSV/cuteSV_forcecalling.py", line 334, in force_calling
    sv_type, chrom, sv_chr2, pos, sv_end, sv_strand = parse_record(record)
  File "/home/wdecoster/miniconda3/envs/cutesv/lib/python3.8/site-packages/cuteSV-1.0.10-py3.8.egg/cuteSV/cuteSV_forcecalling.py", line 73, in parse_record
    end = int(tra_alt.split(':')[1])
IndexError: list index out of range

I am afraid I can't share the VCF or bam files with you to reproduce this. Based on the error I suspect it is something with translocations, and removing these from the jasmine-merged file solves the error. This is how the first nine columns of TRA records look after merging with Jasmine:

chr11   18643255        0_cuteSV.BND.0  .       <TRA>   219.5   PASS    PRECISE;SVTYPE=TRA;RE=23;RNAMES=1d499825-dfb5-45ea-98a5-fe16cf6541f5,88f6b37c-5f2c-4321-9ce9-d7fa63fb1b3c,8f28177b-de81-4ecd-aba7-37263a2f7f05,de9e5e07-2fcf-4677-a7df-c53d3607a2ba,b5d8446a-be7a-41d1-8c66-6069fceae18e,b1816320-0708-44e8-a75c-72f3dd4685f6,d1359d10-8ddd-4ff2-87b5-aa46fac52d87,5172ee36-fe44-4a8a-974c-f624e56b2526,4320724c-87d4-4847-a1a8-6e92cb7d78f3,68cda33f-1af3-4138-88de-af9d9c8da685,10867622-47fd-40d5-8e7f-ac25ae380431,493637b1-03ae-489d-bbb2-632d3cf83369,7979d796-bf1b-4d18-beee-b1f993c2c287,04821c5c-d4da-4dc2-80f9-1442b5ddb72a,6ec55480-0de9-4e0d-b0e3-16b6d843e7bc,6931cc19-5fcd-4b0a-a535-42c48ec7ccaa,ff16f2ba-1beb-4785-a800-2077dcb4e1fb,0520b923-f1d1-4bcc-948f-9011d36607cc,032622cf-9222-43b3-a146-30b7a074c261,a96cca8a-18c7-4286-b116-77ccde9d5a4e,cde75c30-857d-44e5-8d45-148e30614c9f,b3ddb5cd-6054-489e-903d-2608821eefda,8b89de4d-980c-4a60-984a-ffb498abb866;IRIS_PROCESSED=1;IRIS_REFINED=0;CHR2=chr15;END=34376485;STRANDS=--;SVLEN=0;STARTVARIANCE=645871785781648.800000;ENDVARIANCE=1291743571563295.200000;AVG_LEN=0.000000;AVG_START=18643256.792899;AVG_END=34376486.769231;SUPP_VEC_EXT=101000110111110111000111111010011011111111011101110001000101011000010100001010000110000111100100011101110111011100011110101011111101111101100000000010010100100001011111111110111111101110110000110000100111100110110100011101101101110110100001110000111010110001000110111010001111011111101101111010000000001010;IDLIST_EXT=cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,
cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0;SUPP_EXT=169;SUPP_VEC=101000110111110111000111111010011011111111011101110001000101011000010100001010000110000111100100011101110111011100011110101011111101111101100000000010010100100001011111111110111111101110110000110000100111100110110100011101101101110110100001110000111010110001000110111010001111011111101101111010000000001010;SUPP=169;SVMETHOD=JASMINE;IDLIST=cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0,cuteSV.BND.0        GT:IS:OT:OS:DV:DR       
chr11   18649472        0_cuteSV.BND.1  .       <TRA>   51.9    PASS    PRECISE;SVTYPE=TRA;RE=14;RNAMES=04821c5c-d4da-4dc2-80f9-1442b5ddb72a,8f28177b-de81-4ecd-aba7-37263a2f7f05,1d499825-dfb5-45ea-98a5-fe16cf6541f5,de9e5e07-2fcf-4677-a7df-c53d3607a2ba,10867622-47fd-40d5-8e7f-ac25ae380431,b5d8446a-be7a-41d1-8c66-6069fceae18e,6931cc19-5fcd-4b0a-a535-42c48ec7ccaa,88f6b37c-5f2c-4321-9ce9-d7fa63fb1b3c,032622cf-9222-43b3-a146-30b7a074c261,cde75c30-857d-44e5-8d45-148e30614c9f,b3ddb5cd-6054-489e-903d-2608821eefda,7979d796-bf1b-4d18-beee-b1f993c2c287,5172ee36-fe44-4a8a-974c-f624e56b2526,4320724c-87d4-4847-a1a8-6e92cb7d78f3;IRIS_PROCESSED=1;IRIS_REFINED=0;CHR2=chr15;END=34376063;STRANDS=++;SVLEN=0;STARTVARIANCE=3.312500;ENDVARIANCE=1611210068452230.000000;AVG_LEN=0.000000;AVG_START=18649474.859813;AVG_END=34376059.794393;SUPP_VEC_EXT=101000010011111111000011111000000000101111011100110000010101011000000100000000000000000100000100011100010111011100001010101011011000111100000000000010010100000001001111110000111010000110010000100000000110100100100000011001000001110100100000000000010010110001000010101010001001010011001101010010000000000010;IDLIST_EXT=cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1;SUPP_EXT=107;SUPP_VEC=101000010011111111000011111000000000101111011100110000010101011000000100000000000000000100000100011100010111011100001010101011011000111100000000000010010100000001001111110000111010000110010000100000000110100100100000011001000001110100100000000000010010110001000010101010001001010011001101010010000000000010;SUPP=107;SVMETHOD=JASMINE;IDLIST=cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.0,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.
1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1,cuteSV.BND.1       GT:IS:OT:OS:DV:DR

Hope that helps!

Cheers,
Wouter

Query regarding duplications from cuteSV output

Hi,

Could you please let me know why the ALT allele field is generally empty in the case of a duplication event? For example:

1 20949988 cuteSV.DUP.13 T 63.3 PASS PRECISE;SVTYPE=DUP;SVLEN=171;END=20950159;RE=16;STRAND=-+;RNAMES=NULL GT:DR:DV:PL:GQ 0/1:19:16:63,0,92:6

Also, I am assuming that since duplications are CNVs (essentially copy number gains), does the number of supporting reads correspond to the copy number?

Regards,
Prasun

events don't merge with svanalyzer

I have been using svanalyzer with sniffles calls to compare SVs between members of a family of individuals. While sniffles calls merge when they have an ALT allele sequence or are both in cis and defined by a pair of points (e.g. deletions with a start and stop coordinate delineating the entire event), my cuteSV calls do not merge between samples at all, despite in some cases clearly being the same SV inherited from parent to child.

force recall option

One of the things that is likely to be a heavily used workflow in the future is to look at inheritance patterns of SVs in clinical samples for pathogenic SVs. Doing so requires a merging of several individuals' variant sets and merging of associated genotypes and such. One way this can be done currently is to merge SVs between samples, then to force the SV caller to make an assessment for each SV in the resulting superset of SVs because this merging step is quite messy. Such a two-step approach is described here: SV calling for a population using sniffles + SURVIVOR to perform SV calling and merging respectively.

Would it be possible to add a --forceall option so that this kind of two-step workflow could be performed using CuteSV rather than sniffles?
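For reference, with current cuteSV versions this two-step workflow can be sketched roughly as follows using cuteSV's own -Ivcf force-calling mode; SURVIVOR is used here purely as an example merger, and all file names and merge parameters are placeholders:

# 1) per-sample discovery calling
cuteSV sample1.bam ref.fa sample1.vcf work1/ --genotype --report_readid
cuteSV sample2.bam ref.fa sample2.vcf work2/ --genotype --report_readid
# 2) merge the per-sample callsets into one superset (example: SURVIVOR merge)
ls sample1.vcf sample2.vcf > vcf_list.txt
SURVIVOR merge vcf_list.txt 1000 1 1 1 0 30 merged.vcf
# 3) force-call each sample against the merged superset
cuteSV sample1.bam ref.fa sample1.forced.vcf work1_fc/ --genotype -Ivcf merged.vcf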

Skip generating empty signature file & Rewrite the job schedule

cuteSV applies the cat and sort commands from Linux to merge and sort SV signatures. There is great potential to improve this performance by decreasing the number of signature files and discarding those files with no signal. In addition, I adjusted the job schedule to accelerate the SV signature clustering, especially when the reference genome contains tens of thousands of contigs.

Wrong position

Hello,

I've noticed that some variants are reported as PRECISE and with a PASS filter, but when taking a look at the alignment in IGV, the reported position is wrong compared to the actual position of the structural variant.
image

As can be observed, the VCF file reports an insertion at position 143202282 on chromosome 1, while the actual position that can be observed is 143202274 on chromosome 1. There's also the fact that the VCF states the reference is a T, while at the reported position it is an A; nonetheless, at the actual position of the insertion the reference is indeed a T. Why could this be happening?

EDIT: I'd also like to mention there are variants reported that don't actually appear in the BAM files.

Best regards,

Jonatan

execution error

Hello!!

I am trying to run cuteSV with my PacBio yeast data and I get this error:

cuteSV /Users/io/Documents/TB50/SVs/PacBio_alignedReads_pbmm2.sorted.bam /Users/io/Documents/TB50/SVs/Final_SV_calling/reference_genome/S288C_genome.fasta cuteSV_PB.vcf /Users/io/Documents/TB50/SVs/Final_SV_calling/cuteSV/results --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5
2021-09-08 10:15:41,613 [INFO] Running /Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/bin/cuteSV /Users/io/Documents/TB50/SVs/PacBio_alignedReads_pbmm2.sorted.bam /Users/io/Documents/TB50/SVs/Final_SV_calling/reference_genome/S288C_genome.fasta cuteSV_PB.vcf /Users/io/Documents/TB50/SVs/Final_SV_calling/cuteSV/results --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5
2021-09-08 10:15:41,661 [INFO] The total number of chromsomes: 17
Traceback (most recent call last):
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/bin/cuteSV", line 4, in
import('pkg_resources').run_script('cuteSV==1.0.11', 'cuteSV')
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/pkg_resources/init.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/pkg_resources/init.py", line 1448, in run_script
exec(code, namespace, namespace)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 802, in
run(sys.argv[1:])
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 798, in run
main_ctrl(args, argv)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/site-packages/cuteSV-1.0.11-py3.9.egg/EGG-INFO/scripts/cuteSV", line 627, in main_ctrl
analysis_pools = Pool(processes=int(args.threads))
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/io/Documents/TB50/SVs/Final_SV_calling/sv_callers/software/sv-callers/miniconda3/lib/python3.9/multiprocessing/spawn.py", line 183, in get_preparation_data
main_mod_name = getattr(main_module.spec, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'

Any idea??
Thanks!
T

Known SV not detected in final output

Hello,

I'm using CuteSV on Nanopore data with 30X coverage. My sample has a known duplication that isn't making it into the final output but it is detected in the signatures file. I was wondering if you could recommend alternative parameters which could help me catch it. Here's what I've been using. I know it is supported by 7 reads all with a MAPQ of 60.

Thanks,
Melissa

cuteSV $BAM $REF ${PROJDIR}/CuteSV/${Sample}/${Sample}.CuteSV.vcf ${PROJDIR}/CuteSV/${Sample}/ \
			-S ${Sample} \
			--max_size 3000000 \
			--min_support 1 \
			--max_cluster_bias_INS 100 \
			--diff_ratio_merging_INS 0.3 \
			--max_cluster_bias_DEL 100 \
			--diff_ratio_merging_DEL 0.3 \
			 --retain_work_dir --report_readid
DUP chr7 138861117 140782817 9aec5e38-c0aa-4005-b0b6-3da9ac1da2fd
DUP chr7 138861117 140782817 e5e61d25-4c36-4bfb-8b8c-87ea6b6f42e1
DUP chr7 138861117 140782817 e97c7c35-5587-4ef1-94fd-46065cf536b0
DUP chr7 138861117 140782817 76cfe648-35c1-4662-b940-44436c05a5a5
DUP chr7 138861117 140782817 2900cb39-6002-4cd1-8cea-de34d070e6a5
DUP chr7 138861118 140782816 5473d280-3c0e-4e84-a20a-adaf4f5218d3
DUP chr7 138861121 140782815 c60e972c-0743-4478-b23d-395b5b82c600

Only one type of SV in the output (SVTYPE=BND)

Dear developers,

I used cuteSV with Oxford Nanopore data (coverage 6.5x, ngmlr aligner). My aim was to identify inversions from 4 Mbp to 25 Mbp. I received many potential results, but they all had the same SV type: SVTYPE=BND. What does it mean? My code is below. Thank you!

cuteSV -t 10 -l 1000000 -L 1000000 -b 500000000 --min_support 3 --max_cluster_bias_INS 100
--diff_ratio_merging_INS 0.2 --diff_ratio_filtering_INS 0.6
--diff_ratio_filtering_DEL 0.7 BAM
BAM.vcf BAM_folder

Allele not matching the reference sequence

Hello,
I'm currently processing the variants called using HiFi sequences. I've got both short variants called using either DeepVariant or Clair and structural variants called using cuteSV. After calling the variants, I've combined the data using bcftools. However, since there is a level of overlap between the variants called by the short read caller and the SV caller (especially in the range 30-50bp) I'm trying to simplify the dataset using bcftools norm.
However, when I run the command as follow, I get the following error:

bcftools norm -f ref.fa tmp.vcf > tmp2.vcf
Reference allele mismatch at 000001F:745885 .. REF_SEQ:'T' vs VCF:'A'

I've seen issue #43 and I understand the trouble in defining the actual breakpoint, especially with ambiguous positionings. However, I think it would be good to have the software print out the allele matching the reference base, so that there is consistency and the VCF can be used in downstream analyses without having to apply custom fixes. Would it be possible to patch this issue?

Alternatively, does this issue affect only insertions? If so, is it always that the nucleotide refers to POS-1?
Thank you in advance,
Andrea

Output format

Hi,

Would it be possible to output the variants in BEDPE format with the number of supporting reads included in the output? I would like to do some post variant call filtering with bedtools so I need to convert the VCF to BEDPE. I currently am using SURVIVOR vcftobed to convert the VCF, but the output does not include the number of supporting reads for each call, which is part of the post filtering process.

Thanks,
Mike
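As a stopgap (not a built-in cuteSV or SURVIVOR feature), a BED-like table that keeps the supporting-read count can be pulled out of the VCF with bcftools query, using the RE INFO field that cuteSV reports; this is a rough sketch that assumes symbolic records with an END field and may need adjusting for BND calls:

bcftools query -f '%CHROM\t%POS\t%INFO/END\t%ID\t%INFO/SVTYPE\t%INFO/SVLEN\t%INFO/RE\n' sample.cuteSV.vcf > sample.cuteSV.tsv
# columns: chrom, start, end, SV id, type, length, number of supporting reads (RE)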

run cuteSV by chromosome

Dear @tjiangHIT

To speed up the process, can I run cuteSV in parallel by chromosome? I do not know whether the inter-chromosome aligned reads will be processed properly. Thanks for your attention.

samtools view -h -O BAM -o ${sample_id}.${chr}.bam ${in_bam} ${chr}
samtools index ${sample_id}.${chr}.bam ${sample_id}.${chr}.bai
cuteSV ${sample_id}.${chr}.bam ${ref} ${sample_id}.${chr}.cuteSV.vcf ${runDir}/${sample_id}/${chr} -s 5 -l 40 -L 1000000 -md 500 -mi 500 --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 --genotype --sample ${sample_id} || exit 1

Best regards,
Zheng zhuqing

Visualization of Structural Variants

Hello,

We are trying to run cuteSV and use the output VCF file to visualize the structural variants (SV).

If you are aware of any visualization tools for SVs please let us know. We greatly appreciate your help.

Thank you,
Kiran

Nested deletion detection

Hi,
First of all thanks a lot for this super fast and performant tool.
I ran cuteSV on my ONT dataset with an average coverage of 30X. I know that the sequenced cell line has a nested deletion. One allele has a large deletion of about 30 kb and the other allele shows a 15 kb deletion; both end at the same location. The coverage profile is about 30X, then about 15X for 15 kb, then no coverage for another 15 kb, and then back to almost 30X.
When I run cuteSV, the large 30 kb deletion is reported; however, it is not split into hetero- and homozygous parts. Can you recommend any parameters to improve the sensitivity here? Is cuteSV in general able to detect such nested events?
Thanks a lot!

INS sequence in ALT field

In cases of compound heterozygosity, there may be two different insertions of the same size at the same location in the genome on the two different alleles. In order to distinguish those in downstream analysis, it would be useful to have an option where the ALT allele field is not <INS> but rather contains either an example or a consensus of the inserted sequence.

Bug Report - Allele Frequency Output

Hi @tjiangHIT ,

For some of the SV values from cuteSV, the AF value is not being reported:

For these cases, would we expect an output of AF="."?
E.g

chrUn_KI270467v1        669     cuteSV.DUP.629  A       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVLEN=3252;END=3921;RE=26;STRAND=-+;RNAMES=f3030d0a-9f7c-40cd-808f-aa4d8a3ba60e,bee42230-7acd-4    e6d-991f-26d41ed81135,b3f37141-7053-4519-a1b3-c713bac13bac,16853b17-cd33-4096-8f5f-c8507dbfd023,0e01e084-9696-4bb3-b7c8-49ed4e56310f,1f5f301a-409e-4aea-bbf5-10c8ea484f8a,425ca161-6b3e-4a94-a0e6-    dfc71e130416,80b9eb5b-fe12-4c6c-bb90-22f830f7c0b5,89e05aa5-db35-4589-b2af-8febbd6ff464,cd773f8b-8ccb-4ee4-8411-13b34d28c475,fe9a7885-463e-46bf-afaa-f4687d4062cf,e857a183-f64c-469d-a0b6-e17aa4ac7    72c,b749eb34-8591-4416-b4be-b5a1350b9e7a,507f10b8-9bac-494d-adbc-13426c083437,3c98a620-c9e5-44e5-af41-5bd086776dca,2e27a945-3130-485a-bc2a-68510b5e3c18,92f920a9-8a70-4de6-bf45-ac5d9d550b83,6301d    473-0c0f-4db6-b263-f8af65613ba5,ea0879e2-9020-4d4d-b2b6-871c5e2a8ac6,f54111a6-a317-4dcd-a838-4a3c035cd5e6,9689be54-7465-4279-8b31-4e8ac623df81,400410f9-cca5-4e97-99d7-672298b81dc5,6be3880d-2129-    44dd-ab36-db5079d6cc03,8d28f5c1-2d71-40df-941c-301e1d18482e,03b22ec4-bd1b-41dd-bae4-4f7a3358582f,1ec8abf5-177c-490c-9531-96c1e2c7d842 GT:DR:DV:PL:GQ  ./.:.:26:.,.,.:.

cuteSV call SV with length =0 from haplotype-resolved assemblies

Hello,

When I use cuteSV to call SVs from haplotype-resolved assemblies, a strange SV was called, with the length equal to 0, and no Ref seqs are available:
chr1_KI270709v1_random 1 cuteSV.DEL.6890 G 11.8 PASS SVTYPE=DEL;SVLEN=0;END=1;RE=1;RNAMES= GT 0/1
Any idea with this case? Thank you very much!

Increasing sensitivity of SV calling

As noted in Issue 42, I'm also having some issues detecting known tandem duplication and deletion calls. I was wondering which cuteSV parameters we could use to optimize sensitivity.

Thanks,
Jeremy

cuteSV on assembly: no DUP or BND

Hi,

I have run cuteSV in "assembly mode" following the instructions of the wiki. My input assembly is the CHM13 v1.0 assembly from the T2T consortium aligned to the reference genome GRCh38.p13. Since the CHM13 assembly is only from one haplotype, I did not haplotag the BAM file.

The VCF file I obtained from cuteSV contained no break ends (and hence no translocations) and only one duplication, on chrM. Is that expected behavior?

Thank you for the help.
Guillaume

query regarding QUAL

Hi,

I was just wondering if the variant quality (QUAL) in VCF is phred scaled. Its definition is absent in the vcf metadata (cuteSV-1.0.10). I am assuming it is, but just confirming. I observed that in the paper, variants were removed which had variant quality less than 5. Is it QUAL you are referring to?

Regards,
Prasun

Breakpoint problem

Hi,
Thanks for the software, but I have a question.
What do the '[[' and ']]' notations in the reported chromosome positions represent when detecting BND?
And if the alignment is on the negative strand of the reference genome, how can I stitch together the sequences at the left and right ends of the breakpoint?

Look forward to your answer

END smaller than POS

Hi,

I noticed that for quite a few interchromosomal BND variants the END is smaller than the POS, which results in problems with bcftools and tabix.
In fact, this is against the VCF specification, which defines the END value as the end position on CHROM (used with symbolic alleles; see below).

In more detail:

• END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele, so it can be derived from POS and the length of REF, and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown. This field is used to compute BCF’s rlen field (see 6.3.1) and is important when indexing VCF/BCF files to enable random access and querying by position.

So it turns out the END field contains the end position on the same chromosome as in CHROM, and not on the other chromosome for interchromosomal variants.

Cheers,
Wouter

Question about calling INV

Hi @tjiangHIT,

Thanks for developing cuteSV! It's really nice to have more options for SV callers!
I have a question about calling INV. I ran the alignment with lra (a long-read aligner), but cannot call an INV at this site. In IGV, there is clearly an inversion there. Do you have any idea what is causing that? For a read containing an inversion, it would be split into 3 alignments in the SAM output. Is there any specific output order required for those 3 alignments in order to call an INV? Or could the primary/supplementary flags influence this? Since there are 3 alignments, only one of them gets the primary flag and the other two are supplementary. I'm just guessing whether which alignment gets the primary/supplementary flag influences INV calling. Any help is deeply appreciated!!
image

Best,
Jingwen

The evaluation of SVs

There is a question about the evaluation of SV results: for example, on my simulated data, I use minimap2 + SVIM to predict SV calls and then use SURVIVOR to evaluate, and I find that the recall is always around 0.42. Why is that?

Add a wiki for SV calling based on diploid assembly alignment

Hello everyone,

A new script that post-processes typical cuteSV callsets from assembly alignments has been developed.
This will help users generate diploid-assembly-based SV callsets.
The evaluations show that the F1 scores for SV presence and genotype reach over 95%.
For more details, please see the wiki.

Best,
Tao

supporting reads?

I'm currently trying to replace sniffles in a workflow with cuteSV. One of the more useful features for complex SV confirmation is that sniffles outputs the read_names for the reads that support each SV in the INFO.RNAMES field. Might I request this as a feature?
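Note: current cuteSV versions support this via the --report_readid flag listed in the parameter table above, which writes the supporting read names into the RNAMES INFO field; for example (paths are placeholders):

cuteSV sample.bam ref.fa sample.vcf ./work_dir/ --genotype --report_readid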

A question about -Ivcf

Hi, when I use the -Ivcf argument on different platforms, such as PB CLR, ONT or PB HiFi, do I need to adjust the platform-specific arguments (--max_cluster_bias_DEL, etc.)?

consulting about parameter settings on low depth ONT data

Hi,

I used the pipeline listed on https://github.com/tjiangHIT/sv-benchmark to detect ~8X ONT data of HG002, and evaluated using truvari and GIAB Tier1 truth-set.

Although the precision was high (~0.94), it only called a total of ~6000 INSs and DELs, and the recall was relatively low (~0.1).

I followed the whole pipeline listed on the mentioned website, except that I used the current version of cuteSV (1.0.10), used as input the original 27G 8X HG002 sequencing data from (https://ftp.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/GM24385_1.fastq.gz), used minimap without the MD tag, and ran cuteSV with the hs37d5 reference.

I noticed that the cuteSV version in this benchmark test was 1.0.3, so I am wondering whether the version transition changed the internal parameter settings such that performance at low depth became worse. In that case, how can I adjust the parameters to improve calling on low-depth ONT data?

Otherwise, was there something wrong with my workflow that dampened the results? Maybe the missing MD tag or the input reference genome?
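One adjustment commonly tried for low-coverage data (a guess rather than an official recommendation) is lowering --min_support from its default of 10 while keeping the ONT-suggested clustering parameters; file names below are placeholders:

cuteSV HG002_8x.bam hs37d5.fa HG002.cuteSV.vcf ./work_dir/ \
	--min_support 3 --genotype \
	--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 \
	--max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3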

Wrong insertion allele?

Hi,

I think cuteSV might call variant alleles in the wrong way for some insertions. I provide an example below. Would it be possible for you to check whether it is a bug or an expected call?

I believe the issue is only present in insertions, like the one shown in the attached IGV screenshot. The reference sequence is repeated and there is a 58-base insertion in about half of the reads. Although the insertion appears at different locations, I believe it is the same heterozygous variant, and the difference in location is caused by the different ways one can place a variant within a repeated sequence.

CuteSV calls one SV in this region (the first variant in the attached vcf file):

6 102204264 cuteSV.INS.0 C CAAATTTTAGGTATTGCTTTCCTCCACTGTCTAAAACAGAATCTGGATAATGTATCTT

and it is a 57 base insertion. The problem is that the variant allele from the vcf file above is quite different from the sequence inserted in the reads from which cuteSV called the variant. A few examples of the sequences inserted in reads at different locations (extracted from the bam file using samtools mpileup):

"TACATTTTCCTAATCTACATTTATAATGATTACATTTTATAATATACATTTATAATAGC"
"TTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATTTATAAATGGTTACA"
"TATAAATGGTTACATTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATT"
"ATTTATAATCTACATTTATAAATGGTTACATTTATAATCTACATTTATAAATGGTTAC"

I think the allele called by cuteSV cannot be traced to the read insertions above.

I have seen other examples similar to this one, can provide them if needed.

The bam and vcf files are attached.
The vcf was generated by this command:
cuteSV example.bam human_g1k_v37.fasta example.vcf ./ --genotype
The fasta file is the standard 1000 genomes hg19 reference file.
The version of software is cuteSV 1.0.12 cloned from this repository on 12 January 2022.

Many thanks,
Boris

example
example.zip

Reversed start and end of BND

If the start and end of two BNDs, i.e. a translocation, are reversed, does it mean they are actually the same translocation event?
E.g.

id            start                end
TRA-1:   chr1,         1,    chr2,   100
TRA-2:   chr2,   100,   chr1,         1  

question about INFO

Hello,
when I run cuteSV on ngmlr.sort.bam, the INFO of every output record, including BND, is IMPRECISE;SVTYPE=BND;RE=1;RNAMES=NULL.
IMPRECISE represents an imprecise structural variant; what is wrong? And why are there no CHR2 and END fields?

KeyError when force calling

Hi,

I did the following with cuteSV 1.0.10:

  • CuteSV SV calling
  • merging the VCFs with Jasmine
  • CuteSV force calling with -Ivcf

However, the last step generates the following error:

Traceback (most recent call last):
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 788, in <module>
    run(sys.argv[1:])
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 784, in run
    main_ctrl(args, argv)
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/bin/cuteSV", line 658, in main_ctrl
    result = force_calling(args.input, args.Ivcf, args.output, temporary_dir,
  File "/home/wdecoster/p200/workflow_results/locustype/.snakemake/conda/ab0f7b26/lib/python3.8/site-packages/cuteSV/cuteSV_forcecalling.py", line 350, i
    for strand_iter in sv_dict['INV'][chrom]:
KeyError: 'chr15'

Regards,
Wouter

Conda installs v1.0.8 and not the latest (v1.0.12)

Hi,

I tried the conda installation method and the installed version is v1.0.8, instead of v1.0.12.

cuteSV --version
cuteSV 1.0.8

I then tried "conda update cuteSV", but it keeps v1.0.8 and doesn't update it.

How can I solve this?

Thanks.

José

work_dir parameter error

Hey,

Your program looks very interesting, but I can't run it. What does the work_dir parameter mean? If I omit it, I get an error:

2020-09-03 16:45:01,175 [INFO] Running /home/guerrer/.local/bin/cuteSV mapped_sorted.bam ref.fasta mapped.vcf  --max_cluster_bias_INS 1000 --threads 12 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5
2020-09-03 16:45:01,198 [INFO] The total number of chromsomes: 267
Traceback (most recent call last):
  File "/home/guerrer/.local/bin/cuteSV", line 708, in <module>
    run(sys.argv[1:])
  File "/home/guerrer/.local/bin/cuteSV", line 704, in run
    main_ctrl(args, argv)
  File "/home/guerrer/.local/bin/cuteSV", line 562, in main_ctrl
    os.mkdir("%ssignatures"%temporary_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'mapped.vcf/signatures'

If I try to provide a directory, it's not recognized:

cuteSV mapped_sorted.bam ref.fasta mapped.vcf ../folder_where_I_am

cuteSV: error: unrecognized arguments: ../folder_where_I_am

Cheers,
Ricardo

PL tag number is incorrect

##FORMAT=<ID=PL,Number=1,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer">

should be

##FORMAT=<ID=PL,Number=3,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer">
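Until the header is corrected upstream, one workaround (a sketch only; adjust file names as needed) is to patch the FORMAT line before feeding the VCF to strict downstream tools:

sed 's/ID=PL,Number=1/ID=PL,Number=3/' sample.cuteSV.vcf > sample.cuteSV.fixed.vcf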

Installation not working

Hello!!

I have tried both:

conda install -c bioconda cutesv
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

and
git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && python setup.py install

Cloning into 'cuteSV'...
remote: Enumerating objects: 940, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (91/91), done.
remote: Total 940 (delta 59), reused 73 (delta 32), pack-reused 817
Receiving objects: 100% (940/940), 2.98 MiB | 18.25 MiB/s, done.
Resolving deltas: 100% (515/515), done.
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 147, in
globals()[__func_name] = __get_hash(__func_name)
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
running install
running bdist_egg
running egg_info
creating src/cuteSV.egg-info
writing requirements to src/cuteSV.egg-info/requires.txt
writing src/cuteSV.egg-info/PKG-INFO
writing top-level names to src/cuteSV.egg-info/top_level.txt
writing dependency_links to src/cuteSV.egg-info/dependency_links.txt
writing manifest file 'src/cuteSV.egg-info/SOURCES.txt'
reading manifest file 'src/cuteSV.egg-info/SOURCES.txt'
writing manifest file 'src/cuteSV.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.13-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveTRA.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_forcecalling.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveINV.py -> build/lib/cuteSV
copying src/cuteSV/init.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveDUP.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_genotype.py -> build/lib/cuteSV
copying src/cuteSV/CommandRunner.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_Description.py -> build/lib/cuteSV
copying src/cuteSV/diploid_calling.py -> build/lib/cuteSV
copying src/cuteSV/cuteSV_resolveINDEL.py -> build/lib/cuteSV
creating build/lib/benchmarks
copying src/benchmarks/eval_sim.py -> build/lib/benchmarks
copying src/benchmarks/sta_venn.py -> build/lib/benchmarks
copying src/benchmarks/init.py -> build/lib/benchmarks
copying src/benchmarks/eval_trio.py -> build/lib/benchmarks
copying src/benchmarks/vcf2bedpe.py -> build/lib/benchmarks
copying src/benchmarks/cmp_NA19240.py -> build/lib/benchmarks
copying src/benchmarks/multi_platform.py -> build/lib/benchmarks
creating build/bdist.macosx-10.13-x86_64
creating build/bdist.macosx-10.13-x86_64/egg
creating build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveTRA.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_forcecalling.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveINV.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/__init__.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveDUP.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_genotype.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/CommandRunner.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_Description.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/diploid_calling.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
copying build/lib/cuteSV/cuteSV_resolveINDEL.py -> build/bdist.macosx-10.13-x86_64/egg/cuteSV
creating build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/eval_sim.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/sta_venn.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/__init__.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/eval_trio.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/vcf2bedpe.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/cmp_NA19240.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
copying build/lib/benchmarks/multi_platform.py -> build/bdist.macosx-10.13-x86_64/egg/benchmarks
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveTRA.py to cuteSV_resolveTRA.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_forcecalling.py to cuteSV_forcecalling.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveINV.py to cuteSV_resolveINV.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveDUP.py to cuteSV_resolveDUP.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_genotype.py to cuteSV_genotype.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/CommandRunner.py to CommandRunner.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_Description.py to cuteSV_Description.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/diploid_calling.py to diploid_calling.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/cuteSV/cuteSV_resolveINDEL.py to cuteSV_resolveINDEL.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/eval_sim.py to eval_sim.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/sta_venn.py to sta_venn.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/eval_trio.py to eval_trio.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/vcf2bedpe.py to vcf2bedpe.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/cmp_NA19240.py to cmp_NA19240.pyc
byte-compiling build/bdist.macosx-10.13-x86_64/egg/benchmarks/multi_platform.py to multi_platform.pyc
installing package data to build/bdist.macosx-10.13-x86_64/egg
running install_data
copying LICENSE -> build/bdist.macosx-10.13-x86_64/egg/
creating build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
installing scripts to build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
running install_scripts
running build_scripts
creating build/scripts-2.7
copying and adjusting src/cuteSV/cuteSV -> build/scripts-2.7
changing mode of build/scripts-2.7/cuteSV from 644 to 755
creating build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/cuteSV -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts
changing mode of build/bdist.macosx-10.13-x86_64/egg/EGG-INFO/scripts/cuteSV to 755
copying src/cuteSV.egg-info/PKG-INFO -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/SOURCES.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/dependency_links.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/not-zip-safe -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/requires.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
copying src/cuteSV.egg-info/top_level.txt -> build/bdist.macosx-10.13-x86_64/egg/EGG-INFO
creating dist
creating 'dist/cuteSV-1.0.11-py2.7.egg' and adding 'build/bdist.macosx-10.13-x86_64/egg' to it
removing 'build/bdist.macosx-10.13-x86_64/egg' (and everything under it)
Processing cuteSV-1.0.11-py2.7.egg
removing '/usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg' (and everything under it)
creating /usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg
Extracting cuteSV-1.0.11-py2.7.egg to /usr/local/lib/python2.7/site-packages
cuteSV 1.0.11 is already the active version in easy-install.pth
Installing cuteSV script to /usr/local/bin

Installed /usr/local/lib/python2.7/site-packages/cuteSV-1.0.11-py2.7.egg
Processing dependencies for cuteSV==1.0.11
Searching for pyvcf
Reading https://pypi.org/simple/pyvcf/
Download error on https://pypi.org/simple/pyvcf/: unknown url type: https -- Some packages may not be found!
Couldn't find index page for 'pyvcf' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
Download error on https://pypi.org/simple/: unknown url type: https -- Some packages may not be found!
No local packages or working download links found for pyvcf
error: Could not find suitable distribution for Requirement.parse('pyvcf')

pyvcf is already installed (via conda):
conda install -c conda-forge pyvcf
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

Thanks a lot!!
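Note: the setup.py log above shows the build running under the Homebrew Python 2.7, whose hashlib/SSL modules fail to load, which is also why easy_install cannot reach PyPI over HTTPS ("unknown url type: https"). A minimal sketch of sidestepping both problems, assuming conda is available (the environment name and Python version below are illustrative):

# create an isolated Python 3 environment and install cuteSV plus its dependencies from bioconda/conda-forge
conda create -n cutesv_env -c bioconda -c conda-forge python=3.8 cutesv
conda activate cutesv_env
cuteSV --help   # confirm the command now resolves to the new environment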

Run VISOR LASeR using --noaddtag [Suggestion]

Hi @tjiangHIT,

it seems that you are using VISOR for your simulations. Just FYI, you can speed things up by forcing VISOR LASeR to skip haplotype and clone tagging of the generated BAM files if you do not need that information in the final BAM (simply ignore this if you need the HP and CL tags). Something like:

VISOR LASeR -g reference_genome.fasta -s donor_genome_del -bed LASeR.bed -o data_del_5x -c 5 --threads 16 --noaddtag

This prevents VISOR from spending time re-parsing the generated BAM files, which can be time-consuming at high coverage.
Good luck with your tool!

Best,

Davide

No SV found when aligning pacbio CLR reads to reference

Hi,

I was running cuteSV on PacBio CLR reads from maize, but the output VCF file contains only the header. When I checked the output directory (after adding --retain_work_dir), there were only a few empty sig files, while the signatures folder contained many bed files with lines starting with "DEL/DUP/INS/INV/TRA". There were no warnings or errors during the whole run. The accession we sequenced is an inbred line, so heterozygosity should not be an issue. Is something going wrong so that this information is not passed to the final VCF file? Or is it because the maize genome is highly repetitive, or does cuteSV simply not work with plants? I would appreciate any suggestions. Many thanks!

Best,
Hui


I installed cuteSV via conda, mapped the CLR reads to the genome, and called SVs with the following commands:

minimap2 -t${THREADS} -ax map-pb input/${REF} input/${READS} | samtools sort -@${THREADS} - > output/${SORTEDBAM}
cuteSV --threads ${THREADS} --max_cluster_bias_INS  100 \
    --retain_work_dir \
    --diff_ratio_merging_INS    0.3 \
    --max_cluster_bias_DEL    200 \
    --diff_ratio_merging_DEL    0.5 \
    output/${SORTEDBAM} input/${REF} output/${VCF} output/${VCFDIR}

The signature bed files in the work directory looked like the following:

DEL     chr1   670     30      m54065_170710_194541/48432050/101_9842
DEL     chr1    9756    30      m54065_170714_031530/22086563/0_31613
DEL     chr1    16544   71      m54167_170709_181157/53215835/44719_58683
DEL     chr1    14267   153     m54065_170713_170545/37618114/0_13180

The log file:

2021-05-19 14:03:46,328 [INFO] Running cuteSV --threads 20 --max_cluster_bias_INS 100 --retain_work_dir --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 output/AA.pacbio_clr.vs.BB.genome.bam input/BB.genome output/AA.pacbio_clr.vs.BB.genome.bam.vcf output/AA.pacbio_clr.vs.BB.genome.bam.cuteSV
2021-05-19 14:03:46,722 [INFO] The total number of chromsomes: 265
2021-05-19 14:04:12,055 [INFO] Finished 1:180000000-190000000.
2021-05-19 14:04:13,565 [INFO] Finished 1:110000000-120000000.
2021-05-19 14:04:14,647 [INFO] Finished 1:40000000-50000000.
2021-05-19 14:04:15,176 [INFO] Finished 1:20000000-30000000.
2021-05-19 14:04:15,968 [INFO] Finished 1:160000000-170000000.
2021-05-19 14:04:16,573 [INFO] Finished 1:60000000-70000000.
...
2021-05-19 14:10:09,849 [INFO] Rebuilding signatures of structural variants.
2021-05-19 14:10:13,971 [INFO] Clustering structural variants.
2021-05-19 14:10:14,137 [INFO] Writing to your output file.
2021-05-19 14:10:14,138 [INFO] Loading reference genome...
2021-05-19 14:10:27,169 [INFO] Finished in 400.84 seconds.
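One thing that is easy to check here (a hedged suggestion, not a confirmed diagnosis): cuteSV only reports an SV once enough reads support it, so on low-coverage CLR data the default --min_support threshold of 10 can leave the VCF empty even though per-read signatures were written to the work directory. A sketch of re-running with a relaxed threshold, reusing the parameters above (the value 3 is illustrative and should be tuned to the actual sequencing depth):

cuteSV --threads ${THREADS} --min_support 3 \
    --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 \
    --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5 \
    --retain_work_dir \
    output/${SORTEDBAM} input/${REF} output/${VCF} output/${VCFDIR}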

Unable to genotype

Hi @tjiangHIT,

I used cuteSV (version 1.0.6) on human PacBio data (aligned with both ngmlr and pbmm2) with the following command:

cuteSV -t 16 -l 50 --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.2 --diff_ratio_filtering_INS 0.6 --diff_ratio_filtering_DEL 0.7 $bam_file $sample.vcf $sample

However, in the resulting VCF the genotype of every detected SV is "./.". It seems cuteSV couldn't collect the DR (high-quality reference reads) information. Do you have any suggestions for solving this problem? Thank you!

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NULL
1       10338   cuteSV.DEL.0    N       <DEL>   .       PASS    PRECISE;SVTYPE=DEL;SVLEN=-76;END=10414;CIPOS=-19,19;CILEN=-6,6;RE=16;STRAND=+-  GT:DR:DV:PL:GQ  ./.:.:16:.,.,.:.
1       10402   cuteSV.INS.0    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=58;END=10403;CIPOS=-4,4;CILEN=-2,2;RE=16       GT:DR:DV:PL:GQ  ./.:.:16:.,.,.:.
1       10818   cuteSV.INS.1    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=100;END=10819;CIPOS=-21,21;CILEN=-7,7;RE=98    GT:DR:DV:PL:GQ  ./.:.:98:.,.,.:.
1       136960  cuteSV.INS.2    N       <INS>   .       PASS    PRECISE;SVTYPE=INS;SVLEN=280;END=136961;CIPOS=-22,22;CILEN=-11,11;RE=20 GT:DR:DV:PL:GQ  ./.:.:20:.,.,.:.
1       227418  cuteSV.BND.0    N       ]2:243159580]N  .       PASS    PRECISE;SVTYPE=BND;CHR2=2;END=243159580;RE=9    GT:DR:DV:PL:GQ  ./.:.:9:.,.,.:.
1       227421  cuteSV.BND.1    N       [5:180900319[N  .       PASS    PRECISE;SVTYPE=BND;CHR2=5;END=180900319;RE=11   GT:DR:DV:PL:GQ  ./.:.:11:.,.,.:.
...
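A hedged sketch of one likely fix, based on the cuteSV option list rather than on this specific report: in the 1.0.x releases genotyping has to be switched on explicitly with --genotype, and without it no reference-supporting reads (DR) are collected, so every GT is written as "./.". Re-running the same command with the flag added would look like:

# add --genotype so DR/GT/GQ are computed (flag name taken from the cuteSV parameter list)
cuteSV -t 16 -l 50 --genotype \
    --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.2 \
    --diff_ratio_filtering_INS 0.6 --diff_ratio_filtering_DEL 0.7 \
    $bam_file $sample.vcf $sample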

Translocations BND meaning. Inversions also?

Hello again,

I have a few small questions about translocations. Look at the following example (I'm still using 1.0.7, but I think it doesn't matter for the question):

NC_024459.2 115467803 cuteSV.BND.54 N ]NC_024463.2:150593897]N . PASS PRECISE;SVTYPE=BND;CHR2=NC_024463.2;END=150593897;RE=118;RNAMES=m64093_191209_130050/78512469/ccs_63,m64093_191209_130050/3606801/ccs_64,m64093_191209_130050/154338485 <clipped all read names out here> GT:DR:DV:PL:GQ ./.:.:118:.,.,.:.

So, this translocation goes from NC_024459.2 at position 115467803 to NC_024463.2 at position 150593897? Or the other way around? What does this notation:

]NC_024463.2:150593897]N

exactly mean? Specifically I mean the "]" sign before the contig name and "]N" afterwards.

Another question: are inversions identifiable? Are they reported as translocations with the same chromosome and inverted coordinates?

Kind regards,
Ricardo
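For reference, the bracket notation in the ALT field is the standard VCF breakend (BND) notation rather than anything cuteSV-specific. The four forms defined in the VCF specification are:

t[p[   piece extending to the right of p is joined after t
t]p]   reverse-complemented piece extending to the left of p is joined after t
]p]t   piece extending to the left of p is joined before t
[p[t   reverse-complemented piece extending to the right of p is joined before t

So ]NC_024463.2:150593897]N at POS NC_024459.2:115467803 is the ]p]t form: the sequence ending at NC_024463.2:150593897 is joined immediately before the base at NC_024459.2:115467803, with both segments on the forward strand; an inverted junction would instead use one of the reverse-complement forms.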

Problems about VISOR in simulated alignments generation

cuteSV/simulate/README.md describes how to generate simulated alignments, so I ran this command:
VISOR LASeR -g reference_genome.fasta -s donor_genome_del -bed LASeR.bed -o data_del_5x -c 5 --threads 16 --noaddtag
When it finished, it gave me 3 files in data_del_5x: sim.srt.bam, sim.srt.bam.bai and VISOR_LASeR.log.
Then I ran cuteSV on sim.srt.bam with a command like this:
cuteSV sim.srt.bam ref.fasta cutesv.vcf ./
but got the error:
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False
It seems that something is wrong with the .bam file, but I have no idea what. Could anyone help me with this?
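A hedged diagnostic sketch (plain samtools commands, not a cuteSV-specific fix): pysam raises "file has no sequences defined" when the BAM header contains no @SQ lines, so checking that the file and its header are intact is a reasonable first step:

samtools quickcheck sim.srt.bam && echo "BAM looks intact"
samtools view -H sim.srt.bam | grep -c '^@SQ'   # should match the number of reference sequences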
