eldariont / svim
Structural Variant Identification Method using Long Reads
License: GNU General Public License v3.0
Hi David,
Thanks for developing SVIM.
I am filtering the SVs called by SVIM and ran into an issue with the variants.vcf file.
There are DUP_TAN variants with a GT:CN:DP:AD FORMAT field, but CN is not defined in the header.
I checked the wiki and did not find any information about CN.
I am not sure what to do with these variants since they are PASS variants.
Would love to get your insights about it.
Thanks!
Best wishes,
Zih-Hua
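As a workaround until the header is fixed upstream, the missing FORMAT definition could be injected into the header before filtering. A minimal sketch; the Description text below is my assumption, not SVIM's official definition of CN:

```python
def add_cn_format(header_lines):
    """Insert a ##FORMAT definition for CN if it is missing from a VCF header."""
    cn_line = '##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number">'
    if any(l.startswith('##FORMAT=<ID=CN,') for l in header_lines):
        return header_lines
    fixed = []
    for line in header_lines:
        # place the new definition just before the #CHROM column header
        if line.startswith('#CHROM'):
            fixed.append(cn_line)
        fixed.append(line)
    return fixed

header = [
    '##fileformat=VCFv4.3',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample',
]
fixed = add_cn_format(header)
```

Tools like bcftools would then accept the records without complaining about the undefined key.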
In the output, I found that translocations (BND) are separated into two lines: position1 and position2 are swapped, and the svim.BND.xxx SV IDs also differ. I want to merge every translocation event into one line. Is there any recommended method?
I am using multiple SV callers and found that some callers report translocations the same way (e.g. Manta and SvABA), while others emit only one line per translocation (e.g. Sniffles and cuteSV).
chr1 53900 svim.BND.7 N N[chr2:11656999[ 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
chr2 11656999 svim.BND.156311 N ]chr1:53900]N 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
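One way to collapse the two mate lines is to parse the mate coordinates out of the breakend ALT notation and keep only the record whose own (CHROM, POS) sorts before its mate's. A rough plain-text sketch, not SVIM's internals, and matching mates by coordinate rather than by a MATEID field:

```python
import re

def parse_bnd_alt(alt):
    """Extract (chrom, pos) of the mate from a breakend ALT like 'N[chr2:11656999[' or ']chr1:53900]N'."""
    m = re.search(r'[\[\]]([^\[\]:]+):(\d+)[\[\]]', alt)
    if m is None:
        raise ValueError("not a breakend ALT: %s" % alt)
    return m.group(1), int(m.group(2))

def keep_first_mate(records):
    """Keep one line per breakend pair: the record that sorts before its mate."""
    kept = []
    for chrom, pos, alt in records:
        mate_chrom, mate_pos = parse_bnd_alt(alt)
        if (chrom, pos) <= (mate_chrom, mate_pos):
            kept.append((chrom, pos, alt))
    return kept

records = [
    ('chr1', 53900, 'N[chr2:11656999['),
    ('chr2', 11656999, ']chr1:53900]N'),
]
merged = keep_first_mate(records)
```

With the two records from the issue above, only the chr1 line survives; its ALT still encodes the chr2 side, so no information is lost.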
Hi David
I would like to explore the features of the insertions predicted by SVIM. Is there any option to extract the insertion sequences or the reads containing the insertions?
Thanks
Edison
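Assuming (as I understand the options) that running SVIM with --sequence_alleles makes the ALT column carry the literal inserted sequence, the sequences can be pulled out of variants.vcf with a few lines of Python. A sketch under that assumption; symbolic <INS> alleles carry no sequence and are skipped:

```python
def insertions_to_fasta(vcf_lines):
    """Collect SVTYPE=INS records whose ALT carries a literal sequence and format them as FASTA."""
    out = []
    for line in vcf_lines:
        if line.startswith('#'):
            continue
        fields = line.rstrip('\n').split('\t')
        chrom, pos, vid, ref, alt, info = fields[0], fields[1], fields[2], fields[3], fields[4], fields[7]
        if 'SVTYPE=INS' not in info:
            continue
        if alt.startswith('<'):  # symbolic allele, no sequence available
            continue
        seq = alt[len(ref):] if alt.startswith(ref) else alt
        out.append('>%s_%s_%s\n%s' % (vid, chrom, pos, seq))
    return out

vcf = [
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    'chr1\t100\tsvim.INS.1\tA\tATTTG\t10\tPASS\tSVTYPE=INS;END=100',
    'chr1\t200\tsvim.INS.2\tN\t<INS>\t10\tPASS\tSVTYPE=INS;END=200',
]
fasta = insertions_to_fasta(vcf)
```

For the supporting reads themselves, --read_names (if available in your version) reportedly records read IDs, which could then be fetched from the BAM.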
Hello,
I am just wondering how I should interpret the unphased genotypes in the 'variants.vcf' output file. Do those really matter?
Thanks so much for developing this tool and making it accessible to the research community!
As I was getting started with SVIM I noticed that in the example analysis you use options like --interspersed_duplications_as_insertions and --max_sv_size 1000000. I've tried to find a place where all options are listed (e.g. with svim --help) along with a description of what they do, but I haven't been able to find one. Does such documentation exist?
Thanks for your help!
Dear all,
According to some literature, I decided to use minimap2 for alignment and then Sniffles and SVIM to call SVs; variants detected by both callers will be retained. However, when I run SVIM on BAM files from a chromosome-level assembly aligned against another chromosome-level assembly, it always gives me an error like the following. Hope you can help me, thank you.
2020-05-02 03:19:04,661 [ERROR ] value too large to convert to uint32_t
Traceback (most recent call last):
File "svim", line 165, in <module>
sys.exit(main())
File "svim", line 87, in main
sv_signatures = analyze_alignment_file_coordsorted(aln_file, options)
File "SVIM_COLLECT.py", line 138, in analyze_alignment_file_coordsorted
supplementary_alignments = retrieve_other_alignments(current_alignment, bam)
File "SVIM_COLLECT.py", line 82, in retrieve_other_alignments
a.cigarstring = cigar
File "pysam/libcalignedsegment.pyx", line 1296, in pysam.libcalignedsegment.AlignedSegment.cigarstring.__set__
File "pysam/libcalignedsegment.pyx", line 2217, in pysam.libcalignedsegment.AlignedSegment.cigartuples.__set__
OverflowError: value too large to convert to uint32_t
Sincerely,
Zheng Zhuqing
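For context on the traceback above: the BAM format packs each CIGAR operation into 32 bits (4 bits for the operation code, 28 for its length), so a single operation longer than 2^28 - 1 (about 268 Mb) overflows exactly like this, and whole-chromosome alignments routinely produce soft-clips that big. A small diagnostic scan over CIGAR strings (from SAM lines or SA tags) can confirm whether that is the cause; this is a sketch, not part of SVIM:

```python
import re

MAX_CIGAR_OP_LEN = 2**28 - 1  # BAM packs each op length into 28 bits

def oversized_cigar_ops(cigar):
    """Return the operation lengths in a CIGAR string that exceed the 28-bit BAM limit."""
    return [int(n) for n, op in re.findall(r'(\d+)([MIDNSHP=X])', cigar)
            if int(n) > MAX_CIGAR_OP_LEN]

# a CIGAR with a ~300 Mb soft-clip, as a whole-chromosome alignment might produce
bad = oversized_cigar_ops('300000000S5000M')
ok = oversized_cigar_ops('100M20I80M')
```

If the scan finds such operations, the input is outside what the BAM encoding (and hence pysam) can represent, regardless of the caller.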
Hi @eldariont,
Thanks for working on this tool. I was trying to make a call using the svim reads mode as follows:
svim reads --min_mapq 30 <working-dir> <fastq.gz> <genome.fasta>
It just suddenly stopped, so I am sharing the entire .log file here:
2020-10-11 20:25:47,438 [INFO ] ****************** Start SVIM, version 1.4.2 ******************
2020-10-11 20:25:47,438 [INFO ] CMD: python3 /home/cgarci39/.conda/envs/svim/bin/svim reads --min_mapq 30 /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,438 [INFO ] WORKING DIR: /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim
2020-10-11 20:25:47,438 [INFO ] PARAMETER: sub, VALUE: reads
2020-10-11 20:25:47,438 [INFO ] PARAMETER: working_dir, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim
2020-10-11 20:25:47,439 [INFO ] PARAMETER: reads, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz
2020-10-11 20:25:47,439 [INFO ] PARAMETER: genome, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,439 [INFO ] PARAMETER: verbose, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: cores, VALUE: 1
2020-10-11 20:25:47,439 [INFO ] PARAMETER: aligner, VALUE: ngmlr
2020-10-11 20:25:47,439 [INFO ] PARAMETER: nanopore, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: min_mapq, VALUE: 30
2020-10-11 20:25:47,439 [INFO ] PARAMETER: min_sv_size, VALUE: 40
2020-10-11 20:25:47,439 [INFO ] PARAMETER: max_sv_size, VALUE: 100000
2020-10-11 20:25:47,439 [INFO ] PARAMETER: segment_gap_tolerance, VALUE: 10
2020-10-11 20:25:47,439 [INFO ] PARAMETER: segment_overlap_tolerance, VALUE: 5
2020-10-11 20:25:47,439 [INFO ] PARAMETER: all_bnds, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: partition_max_distance, VALUE: 1000
2020-10-11 20:25:47,439 [INFO ] PARAMETER: distance_normalizer, VALUE: 900
2020-10-11 20:25:47,439 [INFO ] PARAMETER: cluster_max_distance, VALUE: 0.3
2020-10-11 20:25:47,439 [INFO ] PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2020-10-11 20:25:47,439 [INFO ] PARAMETER: trans_sv_max_distance, VALUE: 500
2020-10-11 20:25:47,439 [INFO ] PARAMETER: skip_genotyping, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: minimum_score, VALUE: 3
2020-10-11 20:25:47,439 [INFO ] PARAMETER: homozygous_threshold, VALUE: 0.8
2020-10-11 20:25:47,439 [INFO ] PARAMETER: heterozygous_threshold, VALUE: 0.2
2020-10-11 20:25:47,439 [INFO ] PARAMETER: minimum_depth, VALUE: 4
2020-10-11 20:25:47,439 [INFO ] PARAMETER: sample, VALUE: Sample
2020-10-11 20:25:47,439 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP:TANDEM,DUP:INT,BND
2020-10-11 20:25:47,439 [INFO ] PARAMETER: sequence_alleles, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: insertion_sequences, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: tandem_duplications_as_insertions, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: interspersed_duplications_as_insertions, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: read_names, VALUE: False
2020-10-11 20:25:47,439 [INFO ] PARAMETER: zmws, VALUE: False
2020-10-11 20:25:47,439 [INFO ] ****************** STEP 1: COLLECT ******************
2020-10-11 20:25:47,439 [INFO ] MODE: reads
2020-10-11 20:25:47,439 [INFO ] INPUT: /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz
2020-10-11 20:25:47,439 [INFO ] GENOME: /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,439 [INFO ] Recognized reads file as gzipped FASTQ format.
2020-10-11 20:25:47,671 [INFO ] Starting alignment pipeline..
2020-10-11 20:27:58,503 [ERROR ] The alignment pipeline failed with exit code 1. Command was: set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam
Traceback (most recent call last):
File "/home/cgarci39/.conda/envs/svim/lib/python3.7/site-packages/svim/SVIM_alignment.py", line 52, in run_alignment
run(" ".join(command_align), shell=True, check=True, executable='/bin/bash')
File "/home/cgarci39/.conda/envs/svim/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/cgarci39/.conda/envs/svim/bin/svim", line 216, in <module>
sys.exit(main())
File "/home/cgarci39/.conda/envs/svim/bin/svim", line 85, in main
bam_path = run_alignment(options.working_dir, options.genome, options.reads, reads_type, options.cores, options.aligner, options.nanopore)
File "/home/cgarci39/.conda/envs/svim/lib/python3.7/site-packages/svim/SVIM_alignment.py", line 55, in run_alignment
raise AlignmentPipelineError('The alignment pipeline failed with exit code {0}. Command was: {1}'.format(e.returncode, e.cmd)) from e
svim.SVIM_alignment.AlignmentPipelineError: The alignment pipeline failed with exit code 1. Command was: set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam
After the job is executed it also gives me a coordsorted.bam file, which I suspect is good and complete, but the pipeline does not continue.
By the way, I installed svim via conda. If there is a way to solve this issue I would really appreciate it!
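Since the log only records the pipeline's exit code, one way to find the failing stage is to rerun the same `set -o pipefail` pattern with stderr captured and inspect the message. A sketch of that debugging step, using harmless stand-in commands in place of the real gunzip | ngmlr | samtools pipeline:

```python
import subprocess

def run_pipeline(cmd):
    """Run a bash pipeline with pipefail so a failure in any stage is reported; capture stderr."""
    result = subprocess.run(
        'set -o pipefail && ' + cmd,
        shell=True, executable='/bin/bash',
        capture_output=True, text=True,
    )
    return result.returncode, result.stderr

# stand-ins for: gunzip -c reads.fastq.gz | ngmlr ... | samtools sort ...
rc_ok, _ = run_pipeline('echo hello | cat')
rc_bad, err = run_pipeline('ls /nonexistent-dir-for-demo | cat')
```

With the real commands substituted in, the captured stderr should show whether ngmlr or samtools is the stage returning exit code 1.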
Also here is my conda list for the env:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
bzip2 1.0.8 h516909a_3 conda-forge
c-ares 1.16.1 h516909a_3 conda-forge
ca-certificates 2020.6.20 hecda079_0 conda-forge
certifi 2020.6.20 py37he5f6b98_2 conda-forge
cycler 0.10.0 py_2 conda-forge
decorator 4.4.2 py_0 conda-forge
freetype 2.10.3 he06d7ca_0 conda-forge
htslib 1.11 hd3b49d5_0 bioconda
jpeg 9d h516909a_0 conda-forge
k8 0.2.5 he513fc3_0 bioconda
kiwisolver 1.2.0 py37h99015e2_0 conda-forge
krb5 1.17.1 hfafb76e_3 conda-forge
lcms2 2.11 hbd6801e_0 conda-forge
ld_impl_linux-64 2.35 h769bd43_9 conda-forge
libblas 3.8.0 17_openblas conda-forge
libcblas 3.8.0 17_openblas conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libdeflate 1.6 h516909a_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
libgfortran-ng 7.5.0 hae1eefd_17 conda-forge
libgfortran4 7.5.0 hae1eefd_17 conda-forge
libgomp 9.3.0 h5dbcf3e_17 conda-forge
liblapack 3.8.0 17_openblas conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libopenblas 0.3.10 pthreads_hb3c22a3_5 conda-forge
libpng 1.6.37 hed695b0_2 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-ng 9.3.0 h2ae2ef3_17 conda-forge
libtiff 4.1.0 hc7e4089_6 conda-forge
libwebp-base 1.1.0 h516909a_3 conda-forge
lz4-c 1.9.2 he1b5a44_3 conda-forge
matplotlib-base 3.3.2 py37hd478181_0 conda-forge
minimap2 2.17 hed695b0_3 bioconda
ncurses 6.2 he1b5a44_1 conda-forge
networkx 2.5 py_0 conda-forge
ngmlr 0.2.7 he513fc3_2 bioconda
numpy 1.19.2 py37h7ea13bd_1 conda-forge
olefile 0.46 py_0 conda-forge
openssl 1.1.1h h516909a_0 conda-forge
pillow 7.2.0 py37h718be6c_1 conda-forge
pip 20.2.3 py_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pysam 0.16.0.1 py37hc334e0b_1 bioconda
python 3.7.8 h6f2ec95_1_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
readline 8.0 he28a2e2_2 conda-forge
samtools 1.11 h6270b1f_0 bioconda
scipy 1.5.2 py37hb14ef9d_1 conda-forge
setuptools 49.6.0 py37he5f6b98_2 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.33.0 h4cf870e_1 conda-forge
svim 1.4.2 py_0 bioconda
tk 8.6.10 hed695b0_1 conda-forge
tornado 6.0.4 py37h8f50634_1 conda-forge
wheel 0.35.1 pyh9f0ad1d_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1009 conda-forge
zstd 1.4.5 h6597ccf_2 conda-forge
Thanks in advance.
Camilo
Hi,
This looks like an interesting tool so I'm eager to give it a try.
Using a gzipped fastq file I get the following error:
Traceback (most recent call last):
File "svim/SVIM_COLLECT.py", line 383, in <module>
sys.exit(main())
File "svim/SVIM_COLLECT.py", line 353, in main
full_reads_path = create_full_file(options.working_dir, options.reads, reads_type)
File "svim/SVIM_COLLECT.py", line 200, in create_full_file
if line.startswith('@'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
I could solve this by changing the mode of gzip.open() in line 178 of SVIM_COLLECT.py from 'rb' to 'rt'.
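The underlying difference is that 'rb' yields bytes while 'rt' decodes to str, and str.startswith('@') only works on str. A self-contained demonstration of the fix, written against a temporary file rather than SVIM's actual code path:

```python
import gzip
import os
import tempfile

# write a tiny gzipped FASTQ-like file
path = os.path.join(tempfile.mkdtemp(), 'reads.fastq.gz')
with gzip.open(path, 'wt') as f:
    f.write('@read1\nACGT\n+\nIIII\n')

with gzip.open(path, 'rb') as f:   # bytes mode: readline() returns b'@read1\n'
    first_bytes = f.readline()
with gzip.open(path, 'rt') as f:   # text mode: readline() returns '@read1\n'
    first_text = f.readline()

# works in text mode; first_bytes.startswith('@') would raise the reported TypeError
is_header = first_text.startswith('@')
```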
Additionally, would it be possible that the script creates the working dir if it doesn't exist? Earlier I had the following error:
File "svim/SVIM_COLLECT.py", line 383, in <module>
sys.exit(main())
File "svim/SVIM_COLLECT.py", line 321, in main
fileHandler = logging.FileHandler("{0}/SVIM-COLLECT_{1}.log".format(options.working_dir, strftime("%y%m%d_%H%M%S", localtime())), mode="w")
File "/home/wdecoster/anaconda3/envs/svim/lib/python3.6/logging/__init__.py", line 1030, in __init__
StreamHandler.__init__(self, self._open())
File "/home/wdecoster/anaconda3/envs/svim/lib/python3.6/logging/__init__.py", line 1059, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
FileNotFoundError: [Errno 2] No such file or directory: '/home/wdecoster/temp/test_svim/test/SVIM-COLLECT_180620_125723.log'
which was solved by creating the 'test' directory.
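Creating the working directory up front is a one-line fix. A sketch of what the logging setup could do before opening the FileHandler; the paths and logger name here are made up for the demo:

```python
import logging
import os
import tempfile
from time import localtime, strftime

def make_collect_logger(working_dir):
    """Create the working directory if needed, then attach a file handler as SVIM_COLLECT does."""
    os.makedirs(working_dir, exist_ok=True)  # the missing step; tolerates pre-existing dirs
    log_path = os.path.join(
        working_dir,
        'SVIM-COLLECT_{0}.log'.format(strftime('%y%m%d_%H%M%S', localtime())))
    handler = logging.FileHandler(log_path, mode='w')
    logging.getLogger('svim_demo').addHandler(handler)
    return log_path

# a nested directory that does not exist yet, mimicking .../test_svim/test/
demo_dir = os.path.join(tempfile.mkdtemp(), 'test_svim', 'test')
log_path = make_collect_logger(demo_dir)
```

exist_ok=True also keeps reruns into the same directory from failing.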
Cheers,
Wouter
Hi!
All alignment visualizers rely on a coordinate-sorted alignment, yet your tool requires me to re-sort, which can be annoying when dealing with >100 GB BAM files. Why do you need a special sorting? If you need information about all supplementary alignments for each read, you could make use of the SA tag.
Armin
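For reference, the SA tag of a primary record already lists every supplementary alignment of that read, so all segments can in principle be recovered from a coordinate-sorted BAM without re-sorting. A sketch of decoding the tag value (plain string parsing, independent of SVIM; the example values are invented):

```python
def parse_sa_tag(sa_value):
    """Decode a SAM SA tag value into (rname, pos, strand, cigar, mapq, nm) tuples."""
    alignments = []
    for entry in sa_value.rstrip(';').split(';'):
        rname, pos, strand, cigar, mapq, nm = entry.split(',')
        alignments.append((rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return alignments

sa = 'chr2,500,+,40S60M,60,2;chr5,900,-,60S40M,55,1;'
supplementary = parse_sa_tag(sa)
```

This is essentially what SVIM's retrieve_other_alignments appears to do with pysam; the limitation is that the SA entries carry only coordinates and CIGARs, not the full records.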
SVIM works fine in most cases.
But I'm wondering whether it is possible to run it on a BAM with an empty SEQ field (the 10th column in the SAM format).
Currently it prompts the following for such a BAM, which suggests it accesses the sequence bases:
SVIM_intra.py", line 42, in analyze_alignment_indel
insertion_seq = alignment.query_sequence[pos_read:pos_read+length]
TypeError: 'NoneType' object is not subscriptable
But since the final result doesn't contain any information about the SEQ field, I think it would be more flexible if it could run on SAM/BAM files with limited information (such as the CIGAR only).
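A defensive version of the failing line would skip sequence extraction when SEQ is absent (pysam's query_sequence is None when column 10 is '*'). A sketch of the guard, not SVIM's actual patch:

```python
def extract_insertion_seq(query_sequence, pos_read, length):
    """Return the inserted bases, or None when the record carries no SEQ field."""
    if query_sequence is None:  # SEQ column was '*'
        return None
    return query_sequence[pos_read:pos_read + length]

with_seq = extract_insertion_seq('ACGTACGTAC', 2, 4)
without_seq = extract_insertion_seq(None, 2, 4)
```

The breakpoint coordinates come from the CIGAR alone, so a signature could still be emitted in the None case, just without an insertion sequence.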
Hi!
I am currently trying to integrate SVIM into a pipeline that requires BEDPE files as input for misassembly detection. At the moment I can manage converting the VCF to BEDPE myself, but I wonder if you might consider adding an option to directly output a BEDPE file from SVIM in the future.
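For intrachromosomal symbolic calls the conversion is mechanical: the two breakpoints come from POS and the END INFO key, written as 0-based half-open 1 bp intervals. A minimal sketch of such a converter (BND records and confidence intervals would need extra handling not shown here):

```python
def vcf_record_to_bedpe(line):
    """Convert one symbolic SVIM VCF record (with an END INFO key) to a BEDPE line."""
    chrom, pos, vid, _ref, _alt, _qual, _filt, info = line.split('\t')[:8]
    info_dict = dict(kv.split('=', 1) for kv in info.split(';') if '=' in kv)
    start, end = int(pos), int(info_dict['END'])
    # breakpoint1 and breakpoint2 as 1-bp intervals, converted to 0-based half-open
    return '\t'.join(map(str, [chrom, start - 1, start, chrom, end - 1, end, vid]))

record = 'chr3\t1000\tsvim.DEL.1\tN\t<DEL>\t12\tPASS\tSVTYPE=DEL;END=2000;SUPPORT=9'
bedpe = vcf_record_to_bedpe(record)
```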
Hi,
Not that I had issues with installing svim, but especially for novice users, adding your tool to PyPI and/or bioconda could make installation less painful. It would also make sure that your scripts are placed in $PATH.
Please let me know if you need some help to figure this out!
Cheers,
Wouter
Hello,
I used SVIM to call SVs from HiFi reads. In the resulting variants.vcf file, the GT is ./. . My command is:
SVIM/bin/svim alignment ./ ${BAM} ${Reference} --min_mapq 20 --min_sv_size 30 --max_sv_size 100000 --minimum_depth 3 --sample ${SampleName} --sequence_alleles --insertion_sequences
The HiFi coverage I used is 20X.
Do I need to set some additional parameters?
Best wishes~
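A quick first diagnostic is to check whether the ./. genotypes are confined to particular SV types; in my understanding, some SVIM versions leave duplication calls ungenotyped unless they are converted to insertions, in which case no parameter is at fault. A tally sketch over variants.vcf:

```python
from collections import Counter

def missing_gt_by_svtype(vcf_lines):
    """Count records with a missing genotype (./.) per SVTYPE."""
    missing = Counter()
    for line in vcf_lines:
        if line.startswith('#'):
            continue
        fields = line.rstrip('\n').split('\t')
        info, fmt, sample = fields[7], fields[8], fields[9]
        svtype = dict(kv.split('=', 1) for kv in info.split(';') if '=' in kv).get('SVTYPE', '?')
        gt = sample.split(':')[fmt.split(':').index('GT')]
        if gt == './.':
            missing[svtype] += 1
    return missing

vcf = [
    'chr1\t100\ts.DEL.1\tN\t<DEL>\t9\tPASS\tSVTYPE=DEL;END=200\tGT:DP\t0/1:20',
    'chr1\t300\ts.DUP.1\tN\t<DUP:TANDEM>\t9\tPASS\tSVTYPE=DUP:TANDEM;END=900\tGT:DP\t./.:.',
]
counts = missing_gt_by_svtype(vcf)
```

If ./. also dominates DEL/INS/INV calls, something else (e.g. a missing BAM index, which disables genotyping) is the more likely cause.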
Hi !
Do you have any tips? What should I use for the alignment, raw or corrected long reads?
Cheers,
Hello! Using svim alignment to genotype, all the tandem duplications show a missing genotype value.
Hi,
I am trying to use Sniffles and SVIM to search for complex SVs, and I am wondering if there are any recommended parameter settings for CCS reads.
Thanks!
Hi,
I am wondering whether I can run SVIM on a paired tumor-normal sample and obtain somatic SVs by subtracting the normal sample's calls from the tumor sample's calls.
Or is SVIM designed only for germline SV detection?
Thanks,
Charlie
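SVIM itself does not advertise a somatic mode, so tumor-minus-normal subtraction would be a post-processing step over the two call sets. A crude position-window sketch of the idea; the window size is an arbitrary illustration, and real pipelines also match on SV size and use dedicated tools:

```python
def subtract_normal(tumor_calls, normal_calls, window=500):
    """Keep tumor calls with no normal call of the same SVTYPE within `window` bp."""
    somatic = []
    for chrom, pos, svtype in tumor_calls:
        matched = any(
            n_chrom == chrom and n_svtype == svtype and abs(n_pos - pos) <= window
            for n_chrom, n_pos, n_svtype in normal_calls)
        if not matched:
            somatic.append((chrom, pos, svtype))
    return somatic

tumor = [('chr1', 1000, 'DEL'), ('chr2', 5000, 'INS')]
normal = [('chr1', 1100, 'DEL')]
somatic = subtract_normal(tumor, normal)
```

The caveat with naive subtraction is that calls missed in the normal (e.g. due to lower coverage) masquerade as somatic, so depth-aware filtering matters in practice.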
I am using SVIM to call SVs in a mammalian diploid genome with PacBio reads at around 60X coverage. The genome was assembled with the same data. Because I have good depth I am using a quality cut-off of 10 and calling SVs > 10 bp. This identifies 44K deletions but only 87 insertions with v1.3.
However, if I use an older version (1.1) with the same data it detects 46K deletions and 65K insertions.
I inspected a few INS from version 1.1 and they look correct to me.
I was wondering if this is an issue with the changes made in v1.3.1?
v1.3.1 command used:
svim alignment --min_sv_size 10 --max_sv_size 1000000 --insertion_sequences --sequence_alleles --interspersed_duplications_as_insertions --minimum_score 8 PB_to_BrahGenome20200102_N_1_D PB_to_BrahGenome20200102_N_1.default.bam BrahGenome20200102_N.fasta
v1.1 command used:
svim alignment --min_sv_size 10 --max_sv_size 1000000 --minimum_score 8 PB_to_BrahGenome20200102_N_1_D_svim1.1 PB_to_BrahGenome20200102_N_1.default.bam
Thanks.
minimap2 human_hs37d5.fasta GM24385_1[.chr1].fastq -a -z 600,200 -x map-ont --MD -Y -o out.sam
svim alignment . GM24385_1.chr1.sorted.bam human_hs37d5.fasta --min_sv_size 30
Reads - ftp://ftp.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/GM24385_1.fastq.gz
Reference - ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
2021-02-18 11:58:53,050 [INFO ] PARAMETER: tandem_duplications_as_insertions, VALUE: False
2021-02-18 11:58:53,050 [INFO ] PARAMETER: interspersed_duplications_as_insertions, VALUE: False
2021-02-18 11:58:53,050 [INFO ] PARAMETER: read_names, VALUE: False
2021-02-18 11:58:53,050 [INFO ] PARAMETER: zmws, VALUE: False
2021-02-18 11:58:53,050 [INFO ] ****************** STEP 1: COLLECT ******************
2021-02-18 11:58:53,050 [INFO ] MODE: alignment
2021-02-18 11:58:53,051 [INFO ] INPUT: GM24385_1.chr1.sorted.bam
[E::idx_find_and_load] Could not retrieve index file for 'GM24385_1.chr1.sorted.bam'
2021-02-18 11:58:53,826 [WARNING] Input BAM file is missing a valid index. Please generate with 'samtools index'. Continuing without genotyping for now..
2021-02-18 11:59:29,171 [INFO ] Processed read 10000
2021-02-18 11:59:40,869 [INFO ] Processed read 20000
2021-02-18 11:59:49,003 [INFO ] Processed read 30000
2021-02-18 11:59:56,724 [INFO ] Processed read 40000
2021-02-18 12:00:04,810 [INFO ] Processed read 50000
2021-02-18 12:00:18,820 [INFO ] Processed read 60000
2021-02-18 12:00:31,692 [INFO ] Processed read 70000
2021-02-18 12:00:39,897 [INFO ] Processed read 80000
2021-02-18 12:00:47,652 [INFO ] Processed read 90000
2021-02-18 12:00:55,886 [INFO ] Processed read 100000
2021-02-18 12:01:05,996 [INFO ] Processed read 110000
2021-02-18 12:01:23,571 [INFO ] Found 37092 signatures for deleted regions.
2021-02-18 12:01:23,572 [INFO ] Found 30256 signatures for inserted regions.
2021-02-18 12:01:23,572 [INFO ] Found 26 signatures for inverted regions.
2021-02-18 12:01:23,572 [INFO ] Found 778 signatures for tandem duplicated regions.
2021-02-18 12:01:23,572 [INFO ] Found 2356 signatures for translocation breakpoints.
2021-02-18 12:01:23,572 [INFO ] Found 12 signatures for inserted regions with detected region of origin.
2021-02-18 12:01:23,572 [INFO ] ****************** STEP 2: CLUSTER ******************
2021-02-18 12:01:26,864 [INFO ] Clustered deleted regions: 11499 partitions and 15039 clusters
2021-02-18 12:01:30,957 [INFO ] Clustered inserted regions: 7197 partitions and 11356 clusters
2021-02-18 12:01:32,281 [INFO ] Clustered inverted regions: 8 partitions and 9 clusters
2021-02-18 12:01:32,383 [INFO ] Clustered tandem duplicated regions: 127 partitions and 189 clusters
2021-02-18 12:01:32,521 [INFO ] Clustered translocation breakpoints: 1495 partitions and 1522 clusters
2021-02-18 12:01:32,646 [INFO ] Clustered inserted regions with detected region of origin: 6 partitions and 6 clusters
2021-02-18 12:01:32,648 [INFO ] Finished clustering. Writing signature clusters..
2021-02-18 12:01:33,483 [INFO ] ****************** STEP 3: COMBINE ******************
2021-02-18 12:01:33,489 [INFO ] Combine inserted regions with translocation breakpoints..
2021-02-18 12:01:33,532 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2021-02-18 12:01:34,030 [INFO ] Cluster interspersed duplication candidates one more time..
2021-02-18 12:01:34,031 [INFO ] Clustered interspersed duplication candidates: 6 partitions and 6 clusters
2021-02-18 12:01:34,031 [INFO ] Write SV candidates..
2021-02-18 12:01:34,031 [INFO ] Final deletion candidates: 15039
2021-02-18 12:01:34,031 [INFO ] Final inversion candidates: 9
2021-02-18 12:01:34,032 [INFO ] Final interspersed duplication candidates: 6
2021-02-18 12:01:34,032 [INFO ] Final tandem duplication candidates: 189
2021-02-18 12:01:34,032 [INFO ] Final novel insertion candidates: 11276
2021-02-18 12:01:34,032 [INFO ] Final breakend candidates: 1522
2021-02-18 12:01:35,506 [INFO ] Draw plots..
/software/bioinformatics/svim-1.4.2/lib/python3.6/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)
2021-02-18 12:01:48,380 [INFO ] Done.
Hi,
When I run SVIM using the following command:
svim alignment --sample darmor --min_mapq 30 --min_sv_size 10 variants/darmor/18 align/darmor/darmor.westar.sorted.REF_chrC08.bam westar.fa
it throws a MemoryError; SVIM often uses more than 1 TB of memory, while each of my BAM files is only about 400 MB. Are there any suggestions on how to reduce the memory usage?
Thank you in advance!
Hello SVIM,
I just have a quick question about the score of each variant call and the values used as cut-offs, e.g. q5 and q10.
In the wiki it mentions:
Since the publication of our manuscript, we changed the score formula to the formula described **here.** The current formula puts more emphasis on the number of supporting reads than on the other features.
However, I don't see the formula used for scoring. Am I missing something?
Kind Regards,
Alex Lim
Dear developers
I've tried to run svim alignment with a coordinate-sorted ngmlr BAM file which I'd indexed with samtools. The indices are in the same directory as the alignment files. At the genotyping stage an error informs me that no index was found; see the log file content below. The same error also occurs with my minimap2 alignments. It does produce a VCF file with variants, but no genotyping information for the sample.
(filenames and directories are anonymised)
2019-08-20 13:37:13,851 [INFO ] ****************** Start SVIM, version 1.2.0 ******************
2019-08-20 13:37:13,851 [INFO ] CMD: python3 /home/annew/.local/bin/svim alignment --min_sv_size 10 --minimum_depth 10 --sample sample . /scratch/alignments_to_ref/sample_nanopore_to_fungusref_ngmlr.sorted.bam /scratch/alignments_to_ref/fungusref.fasta
2019-08-20 13:37:13,852 [INFO ] WORKING DIR: /workdir/sample_SVIM-calls_nglmr
2019-08-20 13:37:13,852 [INFO ] PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2019-08-20 13:37:13,852 [INFO ] PARAMETER: segment_overlap_tolerance, VALUE: 5
2019-08-20 13:37:13,852 [INFO ] PARAMETER: duplications_as_insertions, VALUE: False
2019-08-20 13:37:13,852 [INFO ] PARAMETER: distance_normalizer, VALUE: 900
2019-08-20 13:37:13,853 [INFO ] PARAMETER: segment_gap_tolerance, VALUE: 10
2019-08-20 13:37:13,853 [INFO ] PARAMETER: minimum_depth, VALUE: 10
2019-08-20 13:37:13,853 [INFO ] PARAMETER: homozygous_threshold, VALUE: 0.8
2019-08-20 13:37:13,853 [INFO ] PARAMETER: minimum_score, VALUE: 3
2019-08-20 13:37:13,853 [INFO ] PARAMETER: read_names, VALUE: False
2019-08-20 13:37:13,853 [INFO ] PARAMETER: sample, VALUE: sample
2019-08-20 13:37:13,853 [INFO ] PARAMETER: heterozygous_threshold, VALUE: 0.2
2019-08-20 13:37:13,854 [INFO ] PARAMETER: max_sv_size, VALUE: 100000
2019-08-20 13:37:13,854 [INFO ] PARAMETER: trans_sv_max_distance, VALUE: 500
2019-08-20 13:37:13,854 [INFO ] PARAMETER: min_mapq, VALUE: 20
2019-08-20 13:37:13,854 [INFO ] PARAMETER: partition_max_distance, VALUE: 5000
2019-08-20 13:37:13,854 [INFO ] PARAMETER: skip_genotyping, VALUE: False
2019-08-20 13:37:13,854 [INFO ] PARAMETER: sequence_alleles, VALUE: False
2019-08-20 13:37:13,855 [INFO ] PARAMETER: trans_destination_partition_max_distance, VALUE: 1000
2019-08-20 13:37:13,855 [INFO ] PARAMETER: cluster_max_distance, VALUE: 0.3
2019-08-20 13:37:13,855 [INFO ] PARAMETER: sub, VALUE: alignment
2019-08-20 13:37:13,855 [INFO ] PARAMETER: bam_file, VALUE: /scratch/alignments_to_ref/sample_nanopore_to_fungusref_ngmlr.sorted.bam
2019-08-20 13:37:13,855 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP_TAN,DUP_INT,BND
2019-08-20 13:37:13,855 [INFO ] PARAMETER: working_dir, VALUE: /local/tmp/annew/sample_SVIM-calls_nglmr
2019-08-20 13:37:13,855 [INFO ] PARAMETER: insertion_sequences, VALUE: False
2019-08-20 13:37:13,856 [INFO ] PARAMETER: min_sv_size, VALUE: 10
2019-08-20 13:37:13,856 [INFO ] PARAMETER: genome, VALUE: /alignments_to_ref/fungusref.fasta
2019-08-20 13:37:13,856 [INFO ] PARAMETER: trans_partition_max_distance, VALUE: 200
2019-08-20 13:37:13,856 [INFO ] ****************** STEP 1: COLLECT ******************
2019-08-20 13:37:13,856 [INFO ] MODE: alignment
2019-08-20 13:37:13,856 [INFO ] INPUT: /scratch/alignments_to_ref/sample_nanopore_to_fungusref_ngmlr.sorted.bam
2019-08-20 13:37:15,281 [INFO ] Processed read 10000
(...)
2019-08-20 13:38:00,210 [INFO ] Processed read 450000
2019-08-20 13:38:02,670 [INFO ] Found 106970 signatures for deleted regions.
2019-08-20 13:38:02,671 [INFO ] Found 37542 signatures for inserted regions.
2019-08-20 13:38:02,671 [INFO ] Found 243 signatures for inverted regions.
2019-08-20 13:38:02,671 [INFO ] Found 698 signatures for tandem duplicated regions.
2019-08-20 13:38:02,671 [INFO ] Found 17433 signatures for translocation breakpoints.
2019-08-20 13:38:02,671 [INFO ] Found 116 signatures for inserted regions with detected region of origin.
2019-08-20 13:38:02,671 [INFO ] ****************** STEP 2: CLUSTER ******************
2019-08-20 13:38:09,548 [INFO ] Clustered deleted regions: 20332 partitions and 34426 clusters
2019-08-20 13:38:13,878 [INFO ] Clustered inserted regions: 4769 partitions and 6662 clusters
2019-08-20 13:38:14,528 [INFO ] Clustered inverted regions: 173 partitions and 175 clusters
2019-08-20 13:38:14,614 [INFO ] Clustered tandem duplicated regions: 203 partitions and 239 clusters
2019-08-20 13:38:14,645 [INFO ] Clustered inserted regions with detected region of origin: 14 partitions and 14 clusters
2019-08-20 13:38:14,683 [INFO ] Finished clustering. Writing signature clusters..
2019-08-20 13:38:15,385 [INFO ] ****************** STEP 3: COMBINE ******************
2019-08-20 13:38:15,386 [INFO ] Cluster translocation breakpoints..
2019-08-20 13:38:17,001 [INFO ] Combine inserted regions with translocation breakpoints..
2019-08-20 13:38:17,027 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2019-08-20 13:38:18,819 [INFO ] Cluster interspersed duplication candidates one more time..
2019-08-20 13:38:18,822 [INFO ] Clustered interspersed duplication candidates: 14 partitions and 15 clusters
2019-08-20 13:38:18,828 [INFO ] ****************** STEP 4: GENOTYPE ******************
2019-08-20 13:38:18,828 [INFO ] Genotyping deletions..
2019-08-20 13:38:18,828 [ERROR ] fetch called on bamfile without index
Traceback (most recent call last):
File "/home/annew/.local/bin/svim", line 165, in <module>
sys.exit(main())
File "/home/annew/.local/bin/svim", line 126, in main
genotype(deletion_candidates, aln_file, "DEL", options)
File "/home/annew/.local/lib/python3.5/site-packages/svim/SVIM_genotyping.py", line 49, in genotype
alignment_it = bam.fetch(contig=contig, start=max(0, start-1000), stop=min(contig_length, end+1000))
File "pysam/libcalignmentfile.pyx", line 1088, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetch called on bamfile without index
Kind regards,
A. Webb
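When pysam reports a missing index even though one was created, the usual causes are that the index is not where pysam looks (next to the BAM, named either file.bam.bai or file.bai) or that it is older than the BAM. A small pre-flight check one could run before SVIM; this is a diagnostic sketch only:

```python
import os
import tempfile

def check_bam_index(bam_path):
    """Report (index_path, is_stale) for a BAM, or (None, None) if no index is found beside it."""
    candidates = [bam_path + '.bai']
    if bam_path.endswith('.bam'):
        candidates.append(bam_path[:-4] + '.bai')
    for idx in candidates:
        if os.path.exists(idx):
            stale = os.path.getmtime(idx) < os.path.getmtime(bam_path)
            return idx, stale
    return None, None

# demo with empty placeholder files in a temp dir
d = tempfile.mkdtemp()
bam = os.path.join(d, 'sample.bam')
open(bam, 'w').close()
idx, stale = check_bam_index(bam)       # no index yet
open(bam + '.bai', 'w').close()
idx2, stale2 = check_bam_index(bam)     # index found, created after the BAM
```

A stale index (modified before the BAM) should simply be regenerated with samtools index.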
Dear David,
I'm running SVIM 1.2.0 on some PacBio samples and I noticed that on a few occasions the SVLEN field shows wrong values. For example, this tandem duplication should be ~486 kb but SVLEN shows 972 kb.
17 44165263 svim.DUP_TAN.511 N <DUP:TANDEM> 39 not_fully_covered SVTYPE=DUP:TANDEM;END=44651666;SVLEN=972806;SUPPORT=32;STD_SPAN=307.46;STD_POS=150.1 GT:DP:AD ./.:.:.,.
For other samples the SVLEN looks fine. Am I missing something?
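A quick cross-check is to recompute the span from POS and END and compare it to SVLEN: for the record above the span is END - POS = 486,403 while SVLEN is exactly twice that, which looks like the span being multiplied by an estimated copy number. That doubling interpretation is my guess, not a confirmed explanation. A sketch of the check:

```python
def svlen_vs_span(pos, info):
    """Compare a record's SVLEN INFO value against its END - POS span."""
    info_dict = dict(kv.split('=', 1) for kv in info.split(';') if '=' in kv)
    span = int(info_dict['END']) - pos
    svlen = abs(int(info_dict['SVLEN']))
    return span, svlen, svlen / span

info = 'SVTYPE=DUP:TANDEM;END=44651666;SVLEN=972806;SUPPORT=32'
span, svlen, ratio = svlen_vs_span(44165263, info)
```

Running this over all DUP:TANDEM records would show whether the affected ones share an integer ratio.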
Thanks!
Hi David
I mapped the long reads to the reference genome using minimap2, and then used SVIM to call SVs. The command line I used is "svim alignment $dir $bam". At the GENOTYPE stage, I got an error message: "RGBA values should be within 0-1 range". Any suggestions? Thanks
Edison
Hi,
Me again. svim ran successfully, and I'll do some more testing with bigger datasets (PromethION data).
Upon parsing the obtained vcf file (using cyvcf2) I get multiple warnings as below:
[W::vcf_parse] contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr12' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr13' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr14' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr14_GL000225v1_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr16' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17_GL000205v2_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr18' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr19' is not defined in the header. (Quick workaround: index the file with tabix.)
The header of the vcf file looks indeed quite empty:
##fileformat=VCFv4.3
##source=SVIMV0.1
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##ALT=<ID=INS,Description="Insertion">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
#CHROM POS ID REF ALT QUAL FILTER INFO
Cheers,
Wouter
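Until the header includes ##contig lines, they can be generated from the reference's .fai index, each line of which starts with the contig name and length, and spliced in just before the #CHROM line. A sketch with invented contig values:

```python
def contig_lines_from_fai(fai_lines):
    """Build ##contig header lines from 'samtools faidx' output (name<TAB>length<TAB>...)."""
    return ['##contig=<ID={0},length={1}>'.format(*line.split('\t')[:2])
            for line in fai_lines]

def add_contigs(header_lines, contig_lines):
    """Insert contig definitions just before the #CHROM line."""
    out = []
    for line in header_lines:
        if line.startswith('#CHROM'):
            out.extend(contig_lines)
        out.append(line)
    return out

fai = ['chr1\t248956422\t112\t70\t71', 'chr10\t133797422\t252513167\t70\t71']
header = ['##fileformat=VCFv4.3', '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO']
fixed = add_contigs(header, contig_lines_from_fai(fai))
```

With the contigs defined, cyvcf2/htslib parse the records without the per-contig warnings.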
Hi David,
Thank you for making your SVIM software available to the research community.
I am starting on a new project of trying to discover SV in assembled genomes of same plant species.
It appears to me, from the details at https://github.com/eldariont/svim and https://github.com/eldariont/svim/wiki, that the inputs have to be reads rather than fully assembled genomes. Or can I use entire genomes, in which case I would align the various genomes to one of them as a reference?
I mean, if I want to discover SVs across 20 individuals, I'd choose the FASTA assembly of 1 individual as my reference and the FASTA assemblies of the remaining 19 as queries? Is this possible in theory? And is it possible in practice?
Or could I break each of my 19 non-reference genomes into, let's say, 50 kb fake contiguous FASTA reads?
Could you please advise me? Thanks, in advance.
Sincerely,
Anand
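For reference, the chunking workaround Anand describes (splitting an assembly into fixed-size pseudo-reads) could be sketched as follows. This is plain string handling, not an endorsement of the approach, and the function name is hypothetical:

```python
def chunk_sequence(name, seq, chunk_size=50_000):
    """Split one assembly sequence into FASTA records of at most chunk_size bp."""
    records = []
    for i in range(0, len(seq), chunk_size):
        chunk = seq[i:i + chunk_size]
        # Encode the source coordinates in the read name for traceability
        records.append(f">{name}_{i}_{i + len(chunk)}\n{chunk}")
    return records
```

Note that fixed-size chunks lose the long-range information that makes long reads useful for SV calling, which is why assembly-to-assembly comparison is usually handled differently.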
Hi David,
I am reading the wiki page and found that NGMLR is a required dependency in addition to minimap2.
Now, since minimap2 also does alignment, I'd say it makes sense to give users the option to use minimap2 for mapping and alignment as well, therefore having one less dependency.
Of course, this totally depends on a few assumptions.
What do you think?
Steve
I have a 7x WGS ONT genome and my jobs crashed after 64 hours. Is this normal? I started with raw FASTQs, using default parameters except for the nanopore option.
I forgot to index my BAM (my mistake, too eager to get to the variants...) and fed it to SVIM.
SVIM happily accepted the BAM, but ultimately errored out
2020-04-01 17:28:51,738 [INFO ] ****************** STEP 3: COMBINE ******************
2020-04-01 17:28:51,767 [INFO ] Cluster translocation breakpoints..
2020-04-01 17:28:53,830 [INFO ] Combine inserted regions with translocation breakpoints..
2020-04-01 17:28:55,566 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2020-04-01 17:29:19,039 [INFO ] Cluster interspersed duplication candidates one more time..
2020-04-01 17:29:19,042 [INFO ] Clustered interspersed duplication candidates: 59 partitions and 70 clusters
2020-04-01 17:29:19,061 [INFO ] ****************** STEP 4: GENOTYPE ******************
2020-04-01 17:29:19,062 [INFO ] Genotyping deletions..
2020-04-01 17:29:19,062 [ERROR ] fetch called on bamfile without index
Traceback (most recent call last):
File "/home/shuang/miniconda3/envs/svim/bin/svim", line 165, in <module>
sys.exit(main())
File "/home/shuang/miniconda3/envs/svim/bin/svim", line 126, in main
genotype(deletion_candidates, aln_file, "DEL", options)
File "/home/shuang/miniconda3/envs/svim/lib/python3.7/site-packages/svim/SVIM_genotyping.py", line 49, in genotype
alignment_it = bam.fetch(contig=contig, start=max(0, start-1000), stop=min(contig_length, end+1000))
File "pysam/libcalignmentfile.pyx", line 1092, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetch called on bamfile without index
It would be nice to check for the existence of the index file (*.bam.bai and/or *.bai) upfront to save some time.
Thanks!
Steve
Hi,
I have recently aligned the CHM13 v1.0 assembly from the T2T consortium to the reference genome GRCh38.p13. I subsequently tried to detect SVs from the CHM13 alignment by using the following svim-asm command:
svim-asm haploid --max-sv-size 1000000 --sample CHM13 --reference_gap_tolerance 1000 --reference_overlap_tolerance 1000 --query_gap_tolerance 2000 --query_overlap_tolerance 2000 ./svim_chm13 chm13.bam hg38_p13.fa
This command results in only 4 interspersed duplications, which seems quite low to me given the completeness of CHM13. I am also a little confused by these duplications: the first 3 of them end at the same location on GRCh38.p13 but have very different lengths (all of them are PASS). Does that mean these are 3 different duplications occurring at different places in CHM13, or is it the same duplication but svim-asm cannot tell which of the 3 duplication lengths is correct?
Thank you for the help and the great software!
Guillaume
https://pypi.org/project/svim/#history lists svim 0.3.1, but that is missing from the Github releases of svim - which commit corresponds to 0.3.1?
Hi,
I'm using SVIM v1.4.1, and I notice that for some of the random and unplaced contigs (e.g. chr5_GL000208v1_random, chrUn_GL000226v1) I get surprisingly high coordinates, suspiciously always 4294967296, i.e. 2^32. Building a normal tbi tabix index breaks for variants like that.
Below are some examples, I can share the full VCF for this sample if you want:
chr5_GL000208v1_random 4294967296 svim.BND.130629 N ]chr5:47079731]N 7 PASS SVTYPE=BND;SUPPORT=6;STD_POS1=.;STD_POS2=1.63 GT:DP:AD ./.:.:.,.
chr17_KI270729v1_random 4294967296 svim.BND.311456 N [chrX:66676029[N 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
chr22_KI270736v1_random 4294967296 svim.BND.346279 N ]chr22_KI270736v1_random:111780]N 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
chrEBV 4294967296 svim.DUP_TANDEM.6691 N <DUP:TANDEM> 1 not_fully_covered SVTYPE=DUP:TANDEM;END=51803;SVLEN=51803;SUPPORT=1;STD_SPAN=.;STD_POS=. GT:CN:DP:AD ./.:2:.:.,.
chrUn_GL000226v1 4294967296 svim.DUP_TANDEM.6833 N <DUP:TANDEM> 1 PASS SVTYPE=DUP:TANDEM;END=15008;SVLEN=15008;SUPPORT=1;STD_SPAN=.;STD_POS=. GT:CN:DP:AD ./.:2:.:.,.
chrUn_KI270435v1 4294967296 svim.BND.347861 N [chrY:10657300[N 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
chrUn_KI270435v1 4294967296 svim.BND.347860 N ]chr16:34065991]N 2 PASS SVTYPE=BND;SUPPORT=2;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
chrUn_KI270590v1 4294967296 svim.DUP_TANDEM.6993 N <DUP:TANDEM> 4 PASS SVTYPE=DUP:TANDEM;END=2914;SVLEN=2914;SUPPORT=4;STD_SPAN=3.95;STD_POS=1.65 GT:CN:DP:AD ./.:2:.:.,.
I checked the length of chr5_GL000208v1_random and that's only 92kb. So something is off here :)
Cheers,
Wouter
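Until the underlying bug is fixed, records like these can be filtered out by checking POS against the contig lengths (a sketch over plain tab-separated VCF body lines; the helper name is hypothetical, and the lengths are assumed to come from a .fai index or the BAM header):

```python
def filter_valid_pos(vcf_body_lines, contig_lengths):
    """Keep only records whose POS falls within the contig's length."""
    kept = []
    for line in vcf_body_lines:
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Drop records on unknown contigs or with out-of-range positions
        if chrom in contig_lengths and 1 <= pos <= contig_lengths[chrom]:
            kept.append(line)
    return kept
```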
Right now SVIM seems to process reads in batches of 10K and keep them in memory (judging from the log and the memory footprint).
Would it be possible to expose a CLI option allowing users to specify a (bigger) batch size, hence speeding up the process?
Thanks!
Steve
Hello,
The software seems to have been rewritten in Python. The previous version could detect CNVs in small genomes, and I want to know whether the newest version can still detect CNVs. Also, which option controls the genotypes in the final VCFs? Thank you!
I have one bamfile for which svim repeatedly fails with the following error message:
2020-02-27 21:09:20,651 [INFO ] ****************** STEP 3: COMBINE ******************
2020-02-27 21:09:20,684 [INFO ] Cluster translocation breakpoints..
2020-02-27 21:09:25,173 [INFO ] Combine inserted regions with translocation breakpoints..
2020-02-27 21:09:25,720 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2020-02-27 21:12:28,804 [INFO ] Cluster interspersed duplication candidates one more time..
2020-02-27 21:12:28,809 [INFO ] Clustered interspersed duplication candidates: 26 partitions and 27 clusters
2020-02-27 21:12:28,844 [INFO ] ****************** STEP 4: GENOTYPE ******************
2020-02-27 21:12:28,844 [INFO ] Genotyping deletions..
2020-02-27 21:12:28,859 [ERROR ] truncated file
Traceback (most recent call last):
File "/mnt/SRV018/projects/external/promethion/sv_lr_svim/venv/bin/svim", line 165, in <module>
sys.exit(main())
File "/mnt/SRV018/projects/external/promethion/sv_lr_svim/venv/bin/svim", line 126, in main
genotype(deletion_candidates, aln_file, "DEL", options)
File "/mnt/SRV018/projects/external/promethion/sv_lr_svim/venv/lib/python3.7/site-packages/svim/SVIM_genotyping.py", line 58, in genotype
current_alignment = next(alignment_it)
File "pysam/libcalignmentfile.pyx", line 2092, in pysam.libcalignmentfile.IteratorRowRegion.__next__
OSError: truncated file
To me this appears to be a problem with the input data; however, other analyses on the same bamfile worked without problems and samtools quickcheck reports no issues with the file. Also, it is strange that svim fails at such a late stage, after analysing the whole file. Do you have an idea what might cause the problem?
I am using a venv with python 3.7.3, pysam 0.15.3 and svim 1.2.0
Hi,
I am wondering about recommended gene annotation tools to use after SV calling, to estimate the effect of SVs on human genes.
Hi, many thanks for developing such a useful tool.
After using and comparing different kinds of SV calling software, I found that svim is best for SV calling in my project. However, I can only use it for an individual sample. Is there any function for population calling in svim, for example like sniffles + SURVIVOR for population SV calling?
Many thanks for your help.
Best,
Lin
Hi,
I installed svim using pip but cannot make it work.
Hi,
I'm about to test out this tool via
conda install -c bioconda svim
but received the following errors.
Can you help me please?
Thank you!
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package biopython conflicts for:
svim -> biopython
Package ca-certificates conflicts for:
svim -> python=3.6 -> openssl=1.0 -> ca-certificates
python=3.7 -> openssl[version='>=1.1.1c,<1.1.2a'] -> ca-certificates
Package certifi conflicts for:
python=3.7 -> pip -> setuptools -> certifi[version='>=2016.09']
Package pip conflicts for:
svim -> python=3.6 -> pip
python=3.7 -> pip
Package wheel conflicts for:
svim -> python=3.6 -> pip -> wheel
python=3.7 -> pip -> wheel
Package setuptools conflicts for:
svim -> matplotlib -> setuptools
python=3.7 -> pip -> setuptools
Package python-dateutil conflicts for:
svim -> matplotlib -> python-dateutil
Package python conflicts for:
python=3.7
I followed the evaluation step by step and I got an error at step 9:
9. Run SVIM on BAM file with distance threshold=1.4 for signatures to be merged into clusters:
svim alignment --cluster_max_distance 1.4 svim_wd aln/hs37d5.all.querysorted.bam
I've got
svim alignment --cluster_max_distance 1.4 svim_wd aln/hs37d5.all.querysorted.bam
usage: svim alignment [-h] [--verbose] [--min_mapq MIN_MAPQ]
[--min_sv_size MIN_SV_SIZE] [--max_sv_size MAX_SV_SIZE]
[--segment_gap_tolerance SEGMENT_GAP_TOLERANCE]
[--segment_overlap_tolerance SEGMENT_OVERLAP_TOLERANCE]
[--partition_max_distance PARTITION_MAX_DISTANCE]
[--distance_normalizer DISTANCE_NORMALIZER]
[--cluster_max_distance CLUSTER_MAX_DISTANCE]
[--all_bnds]
[--del_ins_dup_max_distance DEL_INS_DUP_MAX_DISTANCE]
[--trans_sv_max_distance TRANS_SV_MAX_DISTANCE]
[--skip_genotyping] [--minimum_score MINIMUM_SCORE]
[--homozygous_threshold HOMOZYGOUS_THRESHOLD]
[--heterozygous_threshold HETEROZYGOUS_THRESHOLD]
[--minimum_depth MINIMUM_DEPTH] [--sample SAMPLE]
[--types TYPES] [--sequence_alleles]
[--insertion_sequences]
[--tandem_duplications_as_insertions]
[--interspersed_duplications_as_insertions]
[--read_names] [--zmws]
working_dir bam_file genome
svim alignment: error: the following arguments are required: genome
How do I handle it?
Hi David,
I'm trying to call large (>10kb) somatic structural variants in a tumour sample for which I have ~80X coverage of ONT reads.
From previous analysis conducted on this sample, I know there are at least 45 deletions, 10 inversions and 6 tandem duplications, all in the 10kb - 100Mb length range. The allele frequency of somatic variants is about 30% in this sample.
I have run SVIM (v0.5.0) with the following parameters:
svim alignment $WORKSPACE $BAM --sample $SAMPLE --min_mapq 1 --min_sv_size 10000 --max_sv_size 100000000
For the somatic deletions, the recall is not bad: 32 are detected. However only one of the inversions and none of the duplications are included in the results vcf.
I was wondering whether you had any suggestions for tweaking the parameters to improve sensitivity for the detection of large, somatic SVs?
Many thanks,
Hannah
Hi!
I encourage you to use the latest HG002 datasets to investigate your FP and FN numbers using truvari and the GIAB annotation.
PacBio offers a 15kb library with CCS: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_CCS_15kb/ and a >15kb long-read library: https://downloads.pacbcloud.com/public/dataset/HG002/Sequel-201810/
I benchmarked svim internally and didn't get great results. Just to give you a feeling, both public versions of sniffles and pbsv achieve >0.90 F1 score.
Happy developing!
Armin
Me again :)
bcftools complains:
[W::vcf_parse] INFO 'STD_SPAN' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'STD_POS' is not defined in the header, assuming Type=String
Cheers,
W
Hi there,
I am working with both ngmlr- and minimap-mapped reads, and although the ngmlr files are easily accepted, I find that the minimap BAMs sometimes run into this error during the combine step, depending on the read data set and whether it is sorted by name or not...
2019-02-06 10:38:27,976 [INFO ] ****************** STEP 3: COMBINE ******************
2019-02-06 10:38:27,976 [INFO ] Cluster translocation breakpoints..
2019-02-06 10:38:27,977 [INFO ] Combine inserted regions with translocation breakpoints..
2019-02-06 10:38:27,978 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
Traceback (most recent call last):
File "/usr/local/bin/svim", line 224, in <module>
sys.exit(main())
File "/usr/local/bin/svim", line 221, in main
combine_clusters(signature_clusters, options.working_dir, options, version, aln_file.references, aln_file.lengths, options.sample)
File "/usr/local/lib/python3.5/dist-packages/svim/SVIM_COMBINE.py", line 195, in combine_clusters
length2 = end2 - start2
UnboundLocalError: local variable 'end2' referenced before assignment
Attached in the log file also.
Any clues?
Thanks
Hi,
I am trying to run svim on some sam or bam files generated by minimap, but I always get the following error. Any idea how I can fix this?
2020-01-08 05:34:34,700 [INFO ] ****************** STEP 1: COLLECT ******************
2020-01-08 05:34:34,700 [INFO ] MODE: alignment
2020-01-08 05:34:34,700 [INFO ] INPUT: /staging/leuven/stg_00019/Nicolas/Nanopore_tatjana/nanopore1_minimap.bam
2020-01-08 05:34:34,758 [ERROR ] 'pysam.calignmentfile.AlignedSegment' object has no attribute 'get_cigar_stats'
Traceback (most recent call last):
File "/data/leuven/332/vsc33273/miniconda3/envs/NGS/bin/svim", line 165, in <module>
sys.exit(main())
File "/data/leuven/332/vsc33273/miniconda3/envs/NGS/bin/svim", line 87, in main
sv_signatures = analyze_alignment_file_coordsorted(aln_file, options)
File "/data/leuven/332/vsc33273/miniconda3/envs/NGS/lib/python3.6/site-packages/svim/SVIM_COLLECT.py", line 138, in analyze_alignment_file_coordsorted
supplementary_alignments = retrieve_other_alignments(current_alignment, bam)
File "/data/leuven/332/vsc33273/miniconda3/envs/NGS/lib/python3.6/site-packages/svim/SVIM_COLLECT.py", line 47, in retrieve_other_alignments
if main_alignment.get_cigar_stats()[0][5] > 0:
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'get_cigar_stats'
Hi,
I am wondering about the recommended sequencing depth for SVIM to detect SVs correctly.
Hi,
SVIM is a good SV caller and can be run easily. Can I run it with multithreading, and if so, how?
Hoping for your reply.
Thank you!
The Python run command uses /bin/sh by default. Not all Linux installations support "set -o pipefail". Consider updating the run commands with the executable flag set:
from subprocess import run
run(" ".join(command), shell=True, check=True, executable='/bin/bash')
I believe this part of your wiki is outdated (https://github.com/eldariont/svim/wiki#bam-input):
Currently, read alignments need to be sorted by read name (samtools sort -n).
Same for the command line help information:
bam_file SAM/BAM file with aligned long reads (should be
queryname-sorted with samtools sort -n)
In addition, the wiki contains python svim.py
usage, which can probably be replaced by just svim
Hi,
In your WIKI you describe five different SVTYPEs: DEL, INV, DUP:TANDEM, DUP:INT, and INS:NOVEL.
I get all the above plus:
]chr19:107955]N
N[chr19:107969[
N[chr19:111124[
N[chr6:69267[
]chr1:278176]N
all SVTYPE=BND
What are these? Rubbish to filter out?
Liam
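For context, records like these are not rubbish: ALT strings such as ]chr19:107955]N are the standard VCF breakend (BND) notation, where the bracketed part gives the mate's position and the bracket direction gives the joined strand. A small sketch for extracting the mate coordinate (the function name is hypothetical):

```python
import re

# Matches the mate coordinate inside any of the four VCF breakend ALT forms,
# e.g. N[chr19:107969[ or ]chr1:278176]N
BND_RE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")

def bnd_mate(alt):
    """Return (chrom, pos) of the mate breakend, or None if ALT is not a BND."""
    m = BND_RE.search(alt)
    if m is None:
        return None
    return m.group(1), int(m.group(2))
```

Whether to keep or discard BND calls depends on the analysis; they often represent translocations or breakends that could not be resolved into one of the five canonical SV types.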