
CyclonRe_WGS

1 Introduction

CyclonRe-WGS is dedicated to MGI-cyclone/ONT third-generation sequencing technology and analyzes human sequencing data. Its functions include data quality control, mapping, small variant detection, structural variant detection, and variant evaluation.

1.1 Software Workflow

[Figure: software workflow diagram]

1.2 Hardware/Software requirements

  • An x86-64 compatible processor.
  • At least 50 GB of RAM and 4 CPU cores.
  • CentOS 7.x 64-bit operating system (Linux kernel 3.10.0; newer software and higher hardware configurations are also compatible).
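
The minimums above can be checked before installation. The following is a minimal sketch, not part of CyclonRe-WGS: the function names are hypothetical, and it assumes a Linux host with `/proc/meminfo` and GNU `nproc` available.

```shell
# Minimum resources stated in the requirements list: 50 GB RAM and 4 CPU cores.
MIN_RAM_GB=50
MIN_CPUS=4

# meets_requirements RAM_GB CPUS   (hypothetical helper)
# Returns 0 when both values reach the documented minimums, 1 otherwise.
meets_requirements() {
    [ "$1" -ge "$MIN_RAM_GB" ] && [ "$2" -ge "$MIN_CPUS" ]
}

# Read total memory (reported in kB) from /proc/meminfo and print whole GB.
detect_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Print the number of available CPUs (nproc is part of GNU coreutils).
detect_cpus() {
    nproc
}
```

A possible call is `meets_requirements "$(detect_ram_gb)" "$(detect_cpus)" && echo "OK" || echo "below minimum"`. Note that `MemTotal` is usually slightly lower than the installed RAM, so a machine with nominally 50 GB may fail this strict check.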

2 Getting Started

2.1 Installation

Installing CyclonRe-WGS with conda is recommended.

unzip CyclonRe_WGS_V1.0.1.zip
cd CyclonRe_WGS_V1.0.1
conda env create -f CyclonRe_WGS_environment.yml

Requirements

Software   Version  Link
NanoStat   1.6.0    https://github.com/wdecoster/nanostat
NanoFilt   2.8.0    https://github.com/wdecoster/nanofilt
NanoPlot   1.42.0   https://github.com/wdecoster/NanoPlot
Sniffles2  2.2      https://github.com/fritzsedlazeck/Sniffles
minimap2   2.17     https://github.com/lh3/minimap2
samtools   1.7      https://github.com/samtools/samtools
clair3     1.5      https://github.com/HKU-BAL/Clair3
pigz       2.7      https://github.com/madler/pigz
bcftools   1.5      https://github.com/samtools/bcftools
tabix      0.2.6    https://github.com/samtools/tabix
truvari    1.5      https://github.com/ACEnglish/truvari
gatk       4.2      https://github.com/broadinstitute/gatk
rtg-tools  3.11     https://github.com/RealTimeGenomics/rtg-tools
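
Once the conda environment is activated, it can be useful to confirm that the listed tools are actually on `PATH`. This is a minimal sketch with a hypothetical helper; the executable names (for example `sniffles`, `run_clair3.sh`, `rtg`) are assumptions based on how these packages are commonly installed and may differ in your environment.

```shell
# Executable names are assumptions based on typical installs of the
# packages listed above; adjust them to match your environment.
REQUIRED_TOOLS="NanoStat NanoFilt NanoPlot sniffles minimap2 samtools run_clair3.sh pigz bcftools tabix truvari gatk rtg"

# missing_tools TOOL...   (hypothetical helper)
# Prints each tool name that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools $REQUIRED_TOOLS` prints nothing when every executable is found. It only checks presence, not whether the installed version matches the table.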

2.2 Preparation

2.2.1 Download reference genome

The software supports WGS analysis against three reference genomes (hg19, hg38, and ecoli). Download the reference genomes in advance.

##download hg38 
wget https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
##download hg19
wget https://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz

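Build the index for the reference genome

For example:

# The Ensembl download is plain gzip; samtools faidx needs an uncompressed or bgzip-compressed FASTA
gunzip Homo_sapiens.GRCh38.dna.toplevel.fa.gz
samtools faidx Homo_sapiens.GRCh38.dna.toplevel.fa
gatk CreateSequenceDictionary -R Homo_sapiens.GRCh38.dna.toplevel.fa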

2.2.2 Download standard baseline

When downloading each vcf.gz file, also download the corresponding bed and tbi files.

Download HG001 baseline

##download HG001 small-variants baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/NISTv4.2.1/GRCh37/HG001_GRCh37_1_22_v4.2.1_benchmark.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/NISTv4.2.1/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz

Download HG002 baseline

##download HG002 small-variants baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh37/HG002_GRCh37_1_22_v4.2.1_benchmark.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
##download HG002 SV baseline
wget https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.012-20231107/GRCh38_HG002-T2TQ100-V1.0_stvar.vcf.gz
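
The tbi index for each baseline can be fetched alongside its VCF. The sketch below uses hypothetical helper names and assumes the index sits next to the VCF with `.tbi` appended, which is the usual tabix convention; verify this against the directory listing. BED file names vary between releases, so they are not derived here.

```shell
# tbi_url VCF_URL   (hypothetical helper)
# Prints the URL of the tabix index, assuming it is published next to the
# VCF with ".tbi" appended.
tbi_url() {
    printf '%s.tbi\n' "$1"
}

# fetch_with_index VCF_URL   (hypothetical helper)
# Downloads a baseline VCF and then its index. The matching BED file is not
# handled: its name differs between releases and must be looked up by hand.
fetch_with_index() {
    wget "$1" && wget "$(tbi_url "$1")"
}
```

For example, `fetch_with_index` can be called with any of the vcf.gz URLs listed above.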

2.2.3 Modify the config file

Replace the software and file paths in the config file with local paths.

vim config

3 Run

Usage

usage: run_cyclone_work -i <fq>  -n <sample_name>  -o <outputFile>

cyclone fq QC workflow

optional arguments:
  -h, --help            show this help message and exit
  -i FASTQ, --fastq FASTQ
                        input fastq
  -n SAMPLE_NAME, --sample_name SAMPLE_NAME
                        sample name
  -o OUTPUTFILE, --outputFile OUTPUTFILE
                        output file path,Note that you need to add the "/" symbol at the end of the
                        folder,eg:'/home/','/home/result/'
  -callSNP_indel {None,True}, --callsmallVariants {None,True}
                        The process can autonomously choose whether or not to perform SNP and InDel testing
                        and evaluation.Choose between None or True.(optional)
  -t QCTHREAD, --qcthread QCTHREAD
                        Threads required for the QC program.(optional)
  -topcrop TOPCROP, --topcrop TOPCROP
                        Trim n nucleotides from start of read.(optional)
  -tailcrop TAILCROP, --tailcrop TAILCROP
                        Trim n nucleotides from end of read.(optional)
  -q TRIM_QUALITY, --trim_quality TRIM_QUALITY
                        Filter on a minimum average read quality score.(optional)
  -qctype {NanoPlot,NanoStat}, --QCtype {NanoPlot,NanoStat}
                        Select the software used in the QC step. Choose between NanoPlot or
                        NanoStat.(optional)
  -standard STANDARDTYPE, --standardtype STANDARDTYPE
                        Select the standard sample file . Choose between HG002 or HG001.(optional)
  -ref REF, --REF REF   Select the reference genome used in this analysis. Choose between hg19 or
                        hg38.(optional)
  -t2 MAPPINGTHREAD, --mappingthread MAPPINGTHREAD
                        Threads required for the Mapping program.(optional)
  -t3 SAMTOOLSTHREAD, --samtoolsthread SAMTOOLSTHREAD
                        Threads required for the samtools program.(optional)
  -t4 CALLVARIANTTHREAD, --Callvariantthread CALLVARIANTTHREAD
                        Threads required for the Clair3 program.(optional)

Example: Use the example data to start the analysis.

  • Performs small-variant detection.
  • Requires a minimum average read quality of Q15.

bin/run_cyclone_workflow \
  -i /test/TB2000B73B-202403261655301_read.cut.fq.gz \
  -n demo_test \
  -o /result/ \
  -callSNP_indel True \
  -standard HG002 \
  -ref hg19 \
  -q 15

4 Performance

Small variant evaluation

Variants were called from HG002 Cyclone data, and the small variants (SNP/InDel) were compared against the baseline in the GIAB database (https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/) to obtain the SNP/InDel evaluation metrics.

SNP evaluation

Coverage  SNP number  Precision  Sensitivity  F-measure
10X       3850830     0.9719     0.9258       0.9483
15X       3923097     0.9761     0.9485       0.9621
30X       4011627     0.9888     0.9807       0.9847

InDel evaluation

Coverage  InDel number  Precision  Sensitivity  F-measure
10X       710718        0.7175     0.3746       0.4922
15X       841515        0.6596     0.4147       0.5092
30X       909464        0.6844     0.5005       0.5782

SV evaluation

Coverage  SV number  Precision  Recall  F1
10X       35866      0.7114     0.905   0.7966
15X       37366      0.8022     0.9265  0.8599
30X       32681      0.9018     0.9467  0.9237
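
The F-measure and F1 values in the tables are consistent with the harmonic mean of precision and sensitivity (recall), 2PR / (P + R). The sketch below recomputes that value from a table row; the function name is hypothetical and it is not part of the CyclonRe-WGS pipeline.

```shell
# f_measure PRECISION SENSITIVITY   (hypothetical helper)
# Prints the harmonic mean 2PR / (P + R), rounded to four decimals.
# Both arguments being zero is not handled (division by zero).
f_measure() {
    awk -v p="$1" -v r="$2" 'BEGIN { printf "%.4f\n", 2 * p * r / (p + r) }'
}
```

For instance, `f_measure 0.9719 0.9258` reproduces the 10X SNP F-measure of 0.9483.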
