matchall's Introduction

Matchall

matchall (MATCH ALLele) annotates a VCF with the information in another VCF. The core allele matching algorithm is described in this preprint.

Example use cases:

  • filling the population allele frequency (AF) field in a variant set using information in a reference panel.
  • comparing two VCFs to find shared and private variants

A variant can be represented in multiple ways. The table below shows the same deletion written in two forms. This ambiguity in variant representation can confound annotation and cause errors.

POS  REF  ALT
1    GAC  GA
2    AC   A

The matchall algorithm solves this issue by reconstructing local haplotypes and using them to compare a variant against a set of queried variants from another VCF. Because the comparison is made on haplotypes rather than on the raw POS/REF/ALT fields, variants are annotated accurately regardless of representation.
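The idea can be illustrated with a minimal sketch. This is not matchall's implementation: the function names and the six-base reference string are hypothetical, and for simplicity the variant is applied to the whole toy reference rather than to a local window. It only shows why the two rows in the table above describe the same allele.

```python
def apply_variant(ref_seq, pos, ref, alt):
    """Return ref_seq with one variant applied.

    pos is 1-based, as in a VCF record. Hypothetical helper for illustration.
    """
    start = pos - 1
    if ref_seq[start:start + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    return ref_seq[:start] + alt + ref_seq[start + len(ref):]


def same_allele(ref_seq, var_a, var_b):
    """Two variants match if they produce the same haplotype."""
    return apply_variant(ref_seq, *var_a) == apply_variant(ref_seq, *var_b)


# Toy reference (an assumption); it starts with GAC so both rows apply.
reference = "GACTTG"
form_1 = (1, "GAC", "GA")
form_2 = (2, "AC", "A")

print(same_allele(reference, form_1, form_2))  # True
```

Both forms delete the same C, so the resulting haplotypes are identical even though the POS, REF and ALT fields differ.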

Installation

Dependencies - required

  • Python (3.6+)
  • pysam (0.15.3)

Dependencies - optional (used in our data processing pipelines)

  • bcftools (1.12)
  • tabix (1.12)

Usage

Download

git clone https://github.com/milkschen/matchall.git

Annotate

Annotate an INFO field in a VCF using the corresponding information from another VCF.

python src/annotate.py -v target.vcf.gz -q query.vcf.gz -r ref.fa -o out.vcf.gz

Compare

Compare two VCFs and optionally report intersected and private variants.

python src/compare.py -v A.vcf.gz -q B.vcf.gz -op A_0-B_1 -m annotate,private,isec -o A_0-B_1.vcf.gz -r ref.fa
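Conceptually, a comparison of this kind sorts the variants of one set into those that have an equivalent in the other set and those that do not. The sketch below illustrates that split on toy data, assuming a haplotype-based notion of equivalence. The function names, the reference string and the variant tuples are all hypothetical, and the code does not reproduce what src/compare.py does internally.

```python
def haplotype(ref_seq, variant):
    """Apply a (pos, ref, alt) variant to ref_seq; pos is 1-based."""
    pos, ref, alt = variant
    start = pos - 1
    return ref_seq[:start] + alt + ref_seq[start + len(ref):]


def split_shared_private(ref_seq, variants_a, variants_b):
    """Split variants_a into those matched in variants_b and those not."""
    haps_b = {haplotype(ref_seq, v) for v in variants_b}
    shared = [v for v in variants_a if haplotype(ref_seq, v) in haps_b]
    private = [v for v in variants_a if haplotype(ref_seq, v) not in haps_b]
    return shared, private


reference = "GACTTG"                        # toy reference (assumption)
set_a = [(1, "GAC", "GA"), (5, "T", "C")]   # toy variants from "A"
set_b = [(2, "AC", "A")]                    # toy variants from "B"

shared, private = split_shared_private(reference, set_a, set_b)
print(shared)   # [(1, 'GAC', 'GA')]
print(private)  # [(5, 'T', 'C')]
```

The deletion in set A is reported as shared even though set B writes it at a different position, while the substitution has no counterpart and is reported as private.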

Test

Run both unit and end-to-end tests:

sh test_all.sh

Or run them separately:

python src/test_matchall.py
python src/test_end_to_end.py

matchall's Issues

Supporting querying multiple database VCFs

Currently a single, unified database VCF is required. However, some reference panels provide their VCFs chromosome by chromosome, and concatenating them is an additional cost. It would be nice to allow multiple database VCFs (either a file containing paths, or multiple arguments, or both).

Supporting user-defined tags

Currently only AF is supported. It would be nice to support other tags. When implementing this, remember to check whether the database VCF contains the requested tag.
