CyclonRe-WGS is dedicated to MGI-cyclone/ONT triple sequencing technology and analyzes human sequencing data. Its functions include data quality control, mapping, small variant detection, structural variant detection, and variant evaluation.
- x86-64 compatible processors.
- At least 50 GB of RAM and 4 CPUs.
- CentOS 7.x 64-bit operating system (Linux kernel 3.10.0; higher software and hardware configurations are also compatible).
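The hardware minimums above can be checked before installation. The following is a minimal sketch, not part of CyclonRe-WGS: the function name `check_resources` is hypothetical, and the live check assumes a Linux host that exposes `/proc/meminfo` and `nproc`.

```shell
# Minimum values taken from the requirements list above.
MIN_RAM_GB=50
MIN_CPUS=4

# check_resources RAM_GB CPUS
# Prints "OK" and returns 0 only if both minimums are met (hypothetical helper).
check_resources() {
    if [ "$1" -lt "$MIN_RAM_GB" ]; then
        echo "insufficient RAM: $1 GB < ${MIN_RAM_GB} GB"
        return 1
    fi
    if [ "$2" -lt "$MIN_CPUS" ]; then
        echo "insufficient CPUs: $2 < ${MIN_CPUS}"
        return 1
    fi
    echo "OK"
}

# On Linux, read total memory (kB) from /proc/meminfo and the CPU count from nproc.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
    check_resources "$ram_gb" "$(nproc)" || true
fi
```

`MemTotal` is slightly lower than the installed RAM because the kernel reserves some memory, so a machine sold as 50 GB may report 48 or 49 here.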
Installing CyclonRe-WGS with conda is recommended.
unzip CyclonRe_WGS_V1.0.1.zip
cd CyclonRe_WGS_V1.0.1
conda env create -f CyclonRe_WGS_environment.yml
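After the environment is created it has to be activated by name. The name is set inside the YAML file, so the sketch below reads it from there instead of guessing. It assumes the file carries a top-level `name:` key, which is common for conda environment files but not guaranteed; `env_name_from_yml` is a hypothetical helper.

```shell
# env_name_from_yml FILE
# Prints the value of the first top-level "name:" key (hypothetical helper).
env_name_from_yml() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

yml=CyclonRe_WGS_environment.yml
if [ -f "$yml" ]; then
    env_name=$(env_name_from_yml "$yml")
    echo "Activate with: conda activate ${env_name}"
fi
```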
Requirements
| Software | Version | Link |
| --- | --- | --- |
| NanoStat | 1.6.0 | https://github.com/wdecoster/nanostat |
| NanoFilt | 2.8.0 | https://github.com/wdecoster/nanofilt |
| NanoPlot | 1.42.0 | https://github.com/wdecoster/NanoPlot |
| Sniffles2 | 2.2 | https://github.com/fritzsedlazeck/Sniffles |
| minimap2 | 2.17 | https://github.com/lh3/minimap2 |
| samtools | 1.7 | https://github.com/samtools/samtools |
| clair3 | 1.5 | https://github.com/HKU-BAL/Clair3 |
| pigz | 2.7 | https://github.com/madler/pigz |
| bcftools | 1.5 | https://github.com/samtools/bcftools |
| tabix | 0.2.6 | https://github.com/samtools/tabix |
| truvari | 1.5 | https://github.com/ACEnglish/truvari |
| gatk | 4.2 | https://github.com/broadinstitute/gatk |
| rtg-tools | 3.11 | https://github.com/RealTimeGenomics/rtg-tools |
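A quick way to confirm that the listed tools are reachable after activating the environment is to look each one up on `PATH`. This is a sketch only: the executable names in `TOOLS` are assumptions (for example, Clair3 is commonly launched as `run_clair3.sh` and rtg-tools as `rtg`), and `missing_tools` is a hypothetical helper.

```shell
# Executable names are assumptions; adjust them to match your installation.
TOOLS="NanoStat NanoFilt NanoPlot sniffles minimap2 samtools run_clair3.sh pigz bcftools tabix truvari gatk rtg"

# missing_tools LIST
# Prints every name in the space-separated LIST that is not found on PATH.
missing_tools() {
    for tool in $1; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools "$TOOLS")
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All tools found."
fi
```

This only checks presence, not versions; each tool's own version flag can be used to compare against the table.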
The software supports WGS analysis with 3 reference genomes (hg19/hg38/ecoli). Download the reference genomes in advance.
##download hg38
wget https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
##download hg19
wget https://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
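Build the index for the reference genome
For example:
##samtools faidx indexes only uncompressed or bgzip-compressed FASTA, so decompress a plain-gzip download first
gunzip Homo_sapiens.GRCh38.dna.toplevel.fa.gz
samtools faidx Homo_sapiens.GRCh38.dna.toplevel.fa
gatk CreateSequenceDictionary -R Homo_sapiens.GRCh38.dna.toplevel.fa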
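Once indexing has finished, it can be worth confirming that both index files exist and are non-empty. The sketch below assumes `samtools faidx` writes `<fasta>.fai` next to the FASTA and that GATK writes the dictionary with the last extension replaced by `.dict`; if your dictionary is named differently, pass its path as the second argument. `index_status` is a hypothetical helper.

```shell
# index_status FASTA [DICT]
# Reports whether FASTA.fai and the sequence dictionary exist and are non-empty.
# DICT defaults to FASTA with its last extension replaced by .dict (assumption).
index_status() {
    fasta=$1
    dict=${2:-${fasta%.*}.dict}
    status=0
    for f in "${fasta}.fai" "$dict"; do
        if [ -s "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Pass whichever FASTA path you indexed.
index_status Homo_sapiens.GRCh38.dna.toplevel.fa || echo "Index incomplete; rerun the indexing commands."
```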
When downloading a vcf.gz file, also download the corresponding bed and tbi files.
Download HG001 baseline
##download HG001 small-variants baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/NISTv4.2.1/GRCh37/HG001_GRCh37_1_22_v4.2.1_benchmark.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/NISTv4.2.1/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
Download HG002 baseline
##download HG002 small-variants baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh37/HG002_GRCh37_1_22_v4.2.1_benchmark.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
##download HG002 SV baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.012-20231107/GRCh38_HG002-T2TQ100-V1.0_stvar.vcf.gz
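The tbi index mentioned above conventionally sits next to its VCF under the same name plus `.tbi`. The sketch below derives that URL; it assumes this naming holds for the files listed here, and `tbi_url` is a hypothetical helper. The bed file names do not follow one fixed pattern, so look them up in the same directory instead of deriving them. The commands are only printed, not executed.

```shell
# tbi_url VCF_URL
# Prints the URL of the tabix index, assumed to be <VCF_URL>.tbi.
tbi_url() {
    case "$1" in
        *.vcf.gz) echo "$1.tbi" ;;
        *) echo "not a .vcf.gz URL: $1" >&2; return 1 ;;
    esac
}

vcf=https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz

# Print the download commands for review before running them.
echo "wget $vcf"
echo "wget $(tbi_url "$vcf")"
```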
Replace the software and file paths in the config file with local paths.
vim config
Usage
usage: run_cyclone_work -i <fq> -n <sample_name> -o <outputFile>
cyclone fq QC workflow
optional arguments:
-h, --help show this help message and exit
-i FASTQ, --fastq FASTQ
input fastq
-n SAMPLE_NAME, --sample_name SAMPLE_NAME
sample name
-o OUTPUTFILE, --outputFile OUTPUTFILE
output file path. Note that you need to add the "/" symbol at the end of the
folder, e.g. '/home/', '/home/result/'
-callSNP_indel {None,True}, --callsmallVariants {None,True}
Whether to perform SNP and InDel detection and evaluation.
Choose between None or True. (optional)
-t QCTHREAD, --qcthread QCTHREAD
Threads required for the QC program. (optional)
-topcrop TOPCROP, --topcrop TOPCROP
Trim n nucleotides from start of read. (optional)
-tailcrop TAILCROP, --tailcrop TAILCROP
Trim n nucleotides from end of read. (optional)
-q TRIM_QUALITY, --trim_quality TRIM_QUALITY
Filter on a minimum average read quality score. (optional)
-qctype {NanoPlot,NanoStat}, --QCtype {NanoPlot,NanoStat}
Select the software used in the QC step. Choose between NanoPlot or
NanoStat. (optional)
-standard STANDARDTYPE, --standardtype STANDARDTYPE
Select the standard sample file. Choose between HG002 or HG001. (optional)
-ref REF, --REF REF
Select the reference genome used in this analysis. Choose between hg19 or
hg38. (optional)
-t2 MAPPINGTHREAD, --mappingthread MAPPINGTHREAD
Threads required for the Mapping program. (optional)
-t3 SAMTOOLSTHREAD, --samtoolsthread SAMTOOLSTHREAD
Threads required for the samtools program. (optional)
-t4 CALLVARIANTTHREAD, --Callvariantthread CALLVARIANTTHREAD
Threads required for the Clair3 program. (optional)
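The `-o` option expects the folder path to end with "/". When the path comes from a variable, a small guard avoids passing it without the trailing slash. This is a sketch; `with_trailing_slash` is a hypothetical helper, not part of the software.

```shell
# with_trailing_slash PATH
# Prints PATH unchanged if it already ends with "/", otherwise appends one.
with_trailing_slash() {
    case "$1" in
        */) echo "$1" ;;
        *)  echo "$1/" ;;
    esac
}

outdir=$(with_trailing_slash "/home/result")
echo "$outdir"   # prints /home/result/
```

The resulting `$outdir` can then be passed as the value of `-o`.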
Example: Use the example data to initiate the analysis.
- Performs small-variant detection
- Requires a minimum average read quality of Q15
bin/run_cyclone_workflow \
-i /test/TB2000B73B-202403261655301_read.cut.fq.gz \
-n demo_test \
-o /result/ \
-callSNP_indel True \
-standard HG002 \
-ref hg19 \
-q 15
The software uses HG002 Cyclone data for variant detection and compares the small variants (SNP/InDel) against the baseline in the GIAB database (https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/) to obtain SNP/InDel evaluation values.
| Coverage | SNP number | Precision | Sensitivity | F-measure |
| --- | --- | --- | --- | --- |
| 10X | 3850830 | 0.9719 | 0.9258 | 0.9483 |
| 15X | 3923097 | 0.9761 | 0.9485 | 0.9621 |
| 30X | 4011627 | 0.9888 | 0.9807 | 0.9847 |

| Coverage | InDel number | Precision | Sensitivity | F-measure |
| --- | --- | --- | --- | --- |
| 10X | 710718 | 0.7175 | 0.3746 | 0.4922 |
| 15X | 841515 | 0.6596 | 0.4147 | 0.5092 |
| 30X | 909464 | 0.6844 | 0.5005 | 0.5782 |

| Coverage | SV number | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| 10X | 35866 | 0.7114 | 0.905 | 0.7966 |
| 15X | 37366 | 0.8022 | 0.9265 | 0.8599 |
| 30X | 32681 | 0.9018 | 0.9467 | 0.9237 |
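The F-measure (F1) columns are consistent with the usual definition, the harmonic mean of precision and sensitivity (recall): F = 2PR / (P + R). The sketch below recomputes it for a few rows; `f_measure` is a hypothetical helper, and using this definition is an assumption about how the evaluation tools report the value.

```shell
# f_measure PRECISION RECALL
# Prints the harmonic mean 2PR / (P + R), rounded to 4 decimal places.
f_measure() {
    awk -v p="$1" -v r="$2" 'BEGIN { printf "%.4f\n", 2 * p * r / (p + r) }'
}

f_measure 0.9719 0.9258   # 10X SNP row, prints 0.9483
f_measure 0.7175 0.3746   # 10X InDel row, prints 0.4922
f_measure 0.7114 0.905    # 10X SV row, prints 0.7966
```

The low InDel F-measure at every coverage comes mainly from sensitivity, which stays at or below about 0.50 even at 30X.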