eclipsezhao / uvc-delins

This project forked from genetronhealth/uvc-delins

A tool for calling deletion-insertion variants (i.e., MNVs and complex InDels) released under the APACHE-2 license

Shell 6.44% C++ 91.74% Makefile 1.82%

uvc-delins's Introduction

UVC-delins is a very accurate and reasonably fast caller for delins variants, which include multiple-nucleotide variants (MNVs) and complex insertion-deletions (InDels). It works from the VCF file generated by UVC (available at https://github.com/genetronhealth/uvc). UVC already stores each haplotype and its information while calling simple small variants, so the VCF file it generates contains the haplotype information that UVC-delins uses to call deletion-insertion (delins) variants.

In fact, UVC-delins can process any VCF file whose variant records contain parenthesis-grouped haplotype IDs in their VCF tags/fields. For example, 'Hap=(haplotype1)(haplotype2)' (without the single quotes) means that the variant record is covered by haplotype1 and haplotype2. For more information about the forms of haplotype IDs used by UVC, please see the description of the bHap VCF tag in UVC.
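To make the parenthesis-grouped form concrete, here is a minimal Bash sketch that splits such a tag value into one haplotype ID per line. The function name extract_hap_ids is hypothetical and not part of UVC-delins, and the sketch only assumes the simple TAG=(id1)(id2) form shown in the example above; the actual bHap values written by UVC may carry more structure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UVC-delins): print one haplotype ID per line
# for a tag of the simple form TAG=(id1)(id2)... as in the example above.
extract_hap_ids() {
    # Drop everything up to and including the first '=' (the tag name).
    local tag_value="${1#*=}"
    # Each grep match is one parenthesised group; tr then strips the parentheses.
    printf '%s\n' "$tag_value" | grep -oE '\([^()]*\)' | tr -d '()'
}

extract_hap_ids 'Hap=(haplotype1)(haplotype2)'
```

A variant record whose tag lists two groups is therefore read as being covered by two haplotypes.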

Although UVC-delins can be used with other SNV/InDel variant callers, we highly recommend using it with UVC because the two integrate seamlessly.

How to install

The installation requirements for UVC-delins are the same as those for UVC (for the UVC requirements, please check https://github.com/genetronhealth/uvc).

In sum, UVC-delins requires Bash 4.0 or later and a compiler that supports the C++14 standard. The Makefile in this directory compiles with g++, but it can easily be modified to use another compiler (for example, clang). To install from scratch, please run: (./install-dependencies.sh && make clean && make all -j4 && make deploy).
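Before building, the two stated requirements can be checked with a short script. The following is a sketch under assumptions: the function names bash_version_ok and compiler_supports_cxx14 are hypothetical and not shipped with UVC-delins, and the compiler check simply asks the compiler to syntax-check a tiny C++14 program (a generic lambda) read from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install checks (not part of UVC-delins).

# Succeeds if the given version string (e.g. "$BASH_VERSION") has major version >= 4.
bash_version_ok() {
    local major="${1%%.*}"    # keep only the text before the first '.'
    [ "$major" -ge 4 ]
}

# Succeeds if the given compiler accepts a C++14-only construct (a generic lambda).
compiler_supports_cxx14() {
    printf 'int main() { auto f = [](auto x) { return x; }; return f(0); }\n' |
        "$1" -std=c++14 -x c++ -fsyntax-only - >/dev/null 2>&1
}

if bash_version_ok "${BASH_VERSION:-0}"; then
    echo "Bash version is sufficient"
else
    echo "Bash 4.0 or later is required"
fi

if compiler_supports_cxx14 g++; then
    echo "g++ supports C++14"
else
    echo "g++ is missing or does not support C++14"
fi
```

Passing clang++ instead of g++ to the second function checks the alternative compiler mentioned above.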

In total, the installation for UVC-delins should take about 5 minutes.

A Docker image of UVC-delins along with UVC is provided at: https://hub.docker.com/r/genetronhealth/gcc-0-6-0-uvc-0-8-0-delins-0-1-5

How to use

The script uvcvcf-raw2delins-all.sh in the bin directory is the main executable that generates all VCF files related to the calling of delins variants. Running bin/uvcvcf-raw2delins-all.sh without any command-line arguments displays its usage help. The script simply wraps the binary executable uvcvcf-raw2delins, which its usage help refers to. This binary performs the actual calling of delins variants by combining SNV(s) and InDel(s) that are near each other and from the same haplotype; SNV(s) and InDel(s) that are not from the same haplotype are still kept.
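The idea of combining variants from the same haplotype can be illustrated with a deliberately simplified toy. The sketch below is not the algorithm used by uvcvcf-raw2delins: it assumes a made-up four-column input (position, REF, ALT, haplotype ID) sorted by position, handles only directly adjacent substitutions, and ignores InDels and the reference bases between nearby variants. The function name merge_adjacent_same_hap is hypothetical.

```shell
#!/usr/bin/env bash
# Toy illustration only (not the UVC-delins algorithm): merge directly adjacent
# substitutions that share a haplotype ID; everything else is passed through.
# Input columns (made up for this sketch): POS REF ALT HAP, sorted by POS.
merge_adjacent_same_hap() {
    awk '
        function emit() { if (open) print start, ref, alt, hap }
        {
            # Extend the open record only if this variant starts right after it
            # and lies on the same haplotype.
            if (open && $4 == hap && $1 == start + length(ref)) {
                ref = ref $2; alt = alt $3
            } else {
                emit(); start = $1; ref = $2; alt = $3; hap = $4
            }
            open = 1
        }
        END { emit() }
    '
}

printf '%s\n' '100 A T hap1' '101 C G hap1' '102 G A hap2' | merge_adjacent_same_hap
```

Here the two hap1 substitutions become a single AC to TG record (an MNV), while the hap2 substitution is kept as a separate record because it comes from a different haplotype.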

In sum, given an input BAM file, the command to generate a VCF with delins variants (which include MNVs and complex InDels) along with SNVs and simple InDels is as follows (INDEXED_REFERENCE_FASTA should already be indexed by bwa index and samtools faidx):

UVC_INSTALL_DIRECTORY/uvc1 -f INDEXED_REFERENCE_FASTA -o SNV_INDEL_VCF_GZ INPUT_BAM && UVC_DELINS_INSTALL_DIRECTORY/uvcvcf-raw2delins-all.sh INDEXED_REFERENCE_FASTA SNV_INDEL_VCF_GZ DELINS_VARIANT_VCF_PREFIX
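For scripting, the same two steps can be wrapped in a small function with one argument per placeholder. This is a sketch rather than part of the distribution: the function name run_delins_pipeline and the DRY_RUN switch are assumptions introduced here, the file names in the usage line are placeholders, and only the options that appear in the command above (-f and -o) are used.

```shell
#!/usr/bin/env bash
# Sketch only: the same two steps as the one-line command above.
# Set DRY_RUN=1 to print the commands instead of running them
# (DRY_RUN is a convenience of this sketch, not a UVC-delins feature).
run_delins_pipeline() {
    local uvc_dir="$1" delins_dir="$2" ref_fasta="$3"
    local input_bam="$4" snv_indel_vcf="$5" delins_prefix="$6"

    # Step 1: UVC calls SNVs and simple InDels and records haplotype information.
    ${DRY_RUN:+echo} "$uvc_dir/uvc1" -f "$ref_fasta" -o "$snv_indel_vcf" "$input_bam" || return

    # Step 2: UVC-delins reads the UVC VCF and writes the delins-related VCF files.
    ${DRY_RUN:+echo} "$delins_dir/uvcvcf-raw2delins-all.sh" "$ref_fasta" "$snv_indel_vcf" "$delins_prefix"
}

# Placeholder arguments; replace them with real paths before running without DRY_RUN.
DRY_RUN=1 run_delins_pipeline UVC_INSTALL_DIRECTORY UVC_DELINS_INSTALL_DIRECTORY \
    INDEXED_REFERENCE_FASTA INPUT_BAM SNV_INDEL_VCF_GZ DELINS_VARIANT_VCF_PREFIX
```

Stopping after a failed first step mirrors the && in the one-line command, so the delins step never runs on a missing or partial VCF.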

A toy input BAM file is available at https://github.com/genetronhealth/MNV-test-data/blob/master/HNF4A.bam; this BAM file is pre-aligned to the http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz reference FASTA file. The full script/code used to evaluate UVC-delins alongside other variant callers is provided at https://github.com/genetronhealth/uvc-delins-eval

What to report if a runtime error arises

All bug reports, feature requests, and ideas for improvement are welcome (although not all of them may be addressed in time)!

Other things

For more information, please check the wiki.

Patent

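CN114566214A: Method for detecting genomic deletion-insertion variants, detection device, computer-readable storage medium, and application thereof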

Contact: XiaoFei Zhao

x43zhao AT uwaterloo DOT ca (first email to be reached at)

cndfeifei AT gmail DOT com (if the above email does not work)

xiaofei DOT zhao AT genetronhealth DOT com (if the above email does not work)
