
gnmcsbnfrmtcsclb / tangram


This project forked from jiantao/tangram


Fast Structural Variation Detection Toolbox

License: MIT License



=========================================================================
Tangram 0.3.1        Release Distribution Documentation        2014-02-09
Author: Jiantao Wu ([email protected])
        Wan-Ping Lee ([email protected])
Marth Lab [1], Boston College Biology Department
=========================================================================


Introduction
=========================

Tangram is a C/C++ command-line toolbox for structural variation (SV) 
detection. It combines read-pair and split-read algorithms and is fast and 
memory-efficient. Powered by the BamTools API [3], Tangram can call SV 
events on multiple BAM files (a population) simultaneously to increase 
sensitivity on low-coverage datasets. Currently it reports mobile element 
insertions (MEI); support for other SV event types is planned. For SNP and 
short INDEL calling, please see another toolbox from our lab, FreeBayes [4].


Obtaining and Compiling
=========================

> git clone https://github.com/jiantao/Tangram.git
> cd Tangram/src
> make

Compiling Tangram requires:

1. g++ 4.2.0 or later
2. zlib
3. the pthread library


Detection pipeline
=========================

Currently, Tangram contains seven sub-programs:

0. tangram_bam    : If the input bam files were not generated by MOSAIK [2],
                    tangram_bam adds the ZA tags required by the
                    following steps.

1. tangram_scan   : Scan through the bam file and calculate the fragment 
                    length distribution for each library in that bam file. 
                    It will output the fragment length distribution files 
                    for each input bam file.

2. tangram_merge  : If more than one bam file needs to be scanned, this 
                    program combines all the fragment length distribution 
                    files. It outputs a merged fragment length distribution 
                    file that enables the detection of multiple bam files 
                    simultaneously. This step is optional if only one 
                    bam file (a pooled bam file) is used.

3. tangram_index  : Index the normal and special (MEI sequence) reference 
                    file. It will output the indexed reference file. This 
                    step is required for the split-read algorithm.

4. tangram_detect : Detect and genotype the SV events from the MOSAIK-aligned 
                    BAM files. It will output the unfiltered VCF files.

5. tangram_filter : Filter the raw VCF file generated by tangram_detect.
                    NOTE: this program requires windowBed (from 
                    bedtools [5]), Unix sort and grep to be in the 
                    default path.

6. tangram_view_scan_file : Provide functions to view or change the contents 
                            of the lib_table.dat and hist.dat files (in 
                            binary format) that are generated by 
                            tangram_scan. This script can be used for
                            a sanity check of the input bam files, such
                            as missing MEI reference names or abnormal
                            read groups.

The overall detection pipeline for Tangram looks like the following:

   tangram_bam
   (BAM input)
        |
        v
   tangram_scan
   (BAM input)   \
                  \
                   -----> tangram_detect --> tangram_filter --> VCF file(s)
                  /       (BAM input)
   tangram_index /
   (Ref FASTA)

For detailed usage of each program, please run "$PROGRAM -help".


ZA Tag Information
=========================
The ZA tag (an optional field in BAM files) is required for MEI detection 
with Tangram. The basic structure of this tag looks like this:

<@/&:MQ1:MQ2:SP_REF:NUM_MAP:CIGAR:MD>

There are 7 fields in this tag:

1. @ or &: @ means this is the information for the current read and & means this
           is the information for its mate (paired-end sequencing)

2. MQ1: The best mapping quality

3. MQ2: The second best mapping quality

4. SP_REF: If the read can be aligned to a special reference provided by the
           user, this field records the first two characters of that special
           reference name, e.g. AL (ALU). Otherwise, this field is empty.

5. NUM_MAP: Number of locations in the genome to which this read can be
            mapped.

6. CIGAR: CIGAR string of this mapping

7. MD: MD string of this mapping


Bug Report
=========================

Please report bugs using the issue tracker on GitHub or by sending the 
authors an email.


References
=========================

[1] http://bioinformatics.bc.edu/marthlab/Main_Page 
[2] https://github.com/wanpinglee/MOSAIK 
[3] https://github.com/pezmaster31/bamtools
[4] https://github.com/ekg/freebayes
[5] http://code.google.com/p/bedtools
