c-g-l / haplotagging

This project forked from evolgenomics/haplotagging

Code and binaries related to processing haplotagging data

License: GNU General Public License v3.0

haplotagging

Code and barcode files related to processing haplotagging data

Dependencies

  • bcl2fastq v2.18.0 or above.
  • bwa v0.6 or above.
  • libgzstream

Strategy

Haplotagging uses a segmented combinatorial barcoding system in the standard Illumina Nextera indexing positions i5 and i7 to preserve linking information. To properly convert the data, our code expects the full set of R1, I1, I2, R2 fastq files, and assigns the barcode based on the look-up table segments A, B, C and D. It then encodes the barcode as comment fields BX, QX and RX (corresponding to barcode, quality strings, and corrected barcode tags, respectively) in a standard set of paired-end fastq files with R1 and R2.
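
As a rough illustration of the converted output, the sketch below pulls the BX tag out of each FASTQ header. It assumes the comment fields are written SAM-style (for example `BX:Z:A01C02B03D04`) after the read name and separated by whitespace. The helper name `list_bx_tags`, the sample barcode and the file name are hypothetical, so check them against your own converted files.

```shell
# list_bx_tags: print the BX tag value of every read in a FASTQ stream (stdin).
# Assumption: comments are SAM-style fields (e.g. BX:Z:...) on the header line.
list_bx_tags() {
    awk 'NR % 4 == 1 {                      # header line of each 4-line record
        bx = "NA"                           # placeholder when no BX field is found
        for (i = 2; i <= NF; i++)
            if ($i ~ /^BX:Z:/) bx = substr($i, 6)
        print bx
    }'
}

# Usage (hypothetical file name): count reads per barcode, most frequent first.
#   gzip -dc sample_R1_001.fastq.gz | list_bx_tags | sort | uniq -c | sort -k1,1nr | head
```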

These comment fields can then be passed into a BAM file as BX, QX and RX tags using standard software like bwa with a -C switch.
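
A minimal sketch of that mapping step, assuming bwa and samtools are on the PATH and the reference has already been indexed with `bwa index`. It uses `bwa mem`, which is part of bwa 0.7 and later; `-C` is the option that appends the FASTQ comment to each SAM record. The function name and file names are placeholders.

```shell
# map_with_tags: align a read pair with bwa mem, carrying the FASTQ comments
# (BX, QX, RX) through to the BAM as tags via -C, then coordinate-sort.
# Arguments: reference FASTA, R1 FASTQ, R2 FASTQ, output BAM, threads (default 1).
map_with_tags() {
    ref=$1; r1=$2; r2=$3; out_bam=$4; threads=${5:-1}
    bwa mem -C -t "$threads" "$ref" "$r1" "$r2" |
        samtools sort -o "$out_bam" -
}

# Usage (hypothetical file names):
#   map_with_tags genome.fa sample_R1_001.fastq.gz sample_R2_001.fastq.gz sample.bam 8
```

Note that `-C` copies the comment verbatim, so the mapping only yields valid tags if the comments are already in SAM tag format.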

Example bcl2fastq command:

bcl2fastq --use-bases-mask=Y150,I13,I13,Y149 --create-fastq-for-index-reads -r [INT] -w [INT] -d [INT] -p [INT] -R <run_dir, e.g. 190125_ST-J00101_0130_AHYJWTBBXX> --tiles s_[1-8] --output-dir=<output_dir> --interop-dir=<INTEROPT_DIR> --reports-dir=<REPORT_DIR> --stats-dir=<STATS_DIR>

Here the option --use-bases-mask=Y150,I13,I13,Y149 allows full use of all 13 positions in each index read. Note that a single cycle is taken out of R2 (149 instead of 150) to extend I2 to 13 nt.
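
To make the cycle accounting concrete, this small sketch sums the cycles in a bases mask. It handles only the simple comma-separated letter-plus-count form used above (not wildcards or multi-part segments such as `Y150N1`), and `mask_cycles` is a hypothetical helper name.

```shell
# mask_cycles: total number of sequencing cycles described by a simple bases mask.
mask_cycles() {
    printf '%s\n' "$1" |
        tr ',' '\n' |             # one segment per line, e.g. Y150
        sed 's/^[A-Za-z]//' |     # drop the leading read-type letter
        awk '{ total += $1 } END { print total }'
}

mask_cycles "Y150,I13,I13,Y149"   # prints 325
```

The total is 325 cycles: 150 for R1, 13 for each index read, and 149 for R2.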

--create-fastq-for-index-reads is key here to allow our demultiplexing code to see the full, untrimmed barcodes.
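
Before running the conversion, it may help to confirm that all four files are present. The sketch below assumes bcl2fastq's usual `<prefix>_R1_001.fastq.gz` style naming, with I1 and I2 written alongside R1 and R2. The helper name and the prefix in the usage line are hypothetical.

```shell
# check_fastq_set: verify that R1, I1, I2 and R2 files exist for a given prefix.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_fastq_set() {
    prefix=$1
    status=0
    for tag in R1 I1 I2 R2; do
        f="${prefix}_${tag}_001.fastq.gz"
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (hypothetical prefix):
#   check_fastq_set output_dir/Undetermined_S0_L001 || exit 1
```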

Best Practice Recommendations

Since around mid-2021, we and others have found concrete advantages to using barcode-first read mappers like EMA. Specifically, EMA uses BWA's API to place reads, but does a better job of taking linked reads (or "read clouds") into account. Please see the EMA repo for details.

For this reason, we recommend substituting EMA for the read mapping step. To do so, use our script ema_prep.sh to pre-process and sort the reads prior to mapping.

While we find EMA to be less polished than BWA, and it currently involves additional overhead, we feel strongly that the improved read mapping, especially in complex regions, is well worth your trouble. Please consider adopting this recommendation in your pipeline.
