This project forked from cnobles/chiva

[Dev] Combined HIV integration site and genomic variant analysis pipeline.

License: GNU General Public License v3.0

cHIVa

Combined HIV integration site and genomic variant analysis pipeline.

Install

git clone https://github.com/BushmanLab/chiva.git

cd chiva

./install.sh

Testing

chiva setup configs/HIV_test.config.yml

chiva run configs/HIV_test.config.yml

Processing Overview

This section provides an overview of each step in the processing pipeline.

Demultiplexing

Reads are demultiplexed into R1 and R2 .fastq.gz files for each sample using a standard demultiplexing algorithm.

Trimming

R1 reads for each sample are trimmed on the 5' end to remove the linker sequence used in that sample's library construction. Reads not matching the linker sequence are filtered out. The remaining reads are further trimmed on the 3' end to remove the reverse complement of the expected LTR sequence.
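The R1 trimming described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes exact string matching for both the linker and the LTR reverse complement, and the function names `reverse_complement` and `trim_r1` and the example sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def trim_r1(read, linker, ltr):
    """Trim an R1 read; return None if the read fails the linker filter.

    Assumption: exact matching is used here for simplicity.
    """
    # 5' end: the read must start with the sample's linker sequence.
    if not read.startswith(linker):
        return None
    read = read[len(linker):]

    # 3' end: cut at the reverse complement of the expected LTR, if present.
    ltr_rc = reverse_complement(ltr)
    position = read.find(ltr_rc)
    if position != -1:
        read = read[:position]
    return read


# Hypothetical sequences, chosen only to exercise the function.
example = trim_r1("GATTACACCGGTTAATGCAGTGG", linker="GATTACA", ltr="ACTGCA")
print(example)
```

Reads that return `None` correspond to reads filtered out for not matching the linker.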

R2 reads for each sample are trimmed in three steps. First, the 8 nt "primer bit" is removed from the 5' end, and reads without a perfect match are filtered out. Second, an approximate match to the expected LTR sequence is removed from the 5' end, and reads without an approximate match are filtered out. Third, a CA dinucleotide is removed from the 5' end, and reads that do not start with CA are filtered out. The CA check is applied because CA is known to be the end of the LTR, so the sequence that remains should be human (or internal viral genome).
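The three R2 trimming steps can be sketched as below. This is an illustration under stated assumptions: the approximate LTR match is modeled as a Hamming distance with a hypothetical `max_mismatch` threshold, which may differ from the criterion the pipeline actually uses, and `trim_r2` and the example sequences are made up for this sketch.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_r2(read, primer_bit, ltr, max_mismatch=1):
    """Apply the three R2 trimming steps; return None if any filter fails."""
    # Step 1: the primer bit must match perfectly at the 5' end.
    if not read.startswith(primer_bit):
        return None
    read = read[len(primer_bit):]

    # Step 2: the expected LTR must match approximately at the 5' end.
    # Assumption: "approximate" means at most max_mismatch substitutions.
    if len(read) < len(ltr) or hamming(read[:len(ltr)], ltr) > max_mismatch:
        return None
    read = read[len(ltr):]

    # Step 3: the remaining read must begin with CA, the end of the LTR.
    if not read.startswith("CA"):
        return None
    return read[2:]


# Hypothetical sequences: 8 nt primer bit, LTR with one mismatch, CA, genome.
print(trim_r2("ACGTACGTGGTACCCATTTGGG", primer_bit="ACGTACGT", ltr="GGTTCC"))
```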

Filtering

Read pairs that did not pass both sets of trimming filters (R1 and R2) are removed from downstream analysis.
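As a rough sketch of this pairing filter, assume the trimmed reads are held in dictionaries keyed by read ID, with failed reads absent. The `filter_pairs` function and this data layout are assumptions made for illustration.

```python
def filter_pairs(r1_trimmed, r2_trimmed):
    """Keep only read IDs present in both trimmed sets.

    Each argument maps read ID -> trimmed sequence; reads that failed
    trimming are assumed to be missing from the corresponding dict.
    """
    passing_ids = r1_trimmed.keys() & r2_trimmed.keys()
    return {
        read_id: (r1_trimmed[read_id], r2_trimmed[read_id])
        for read_id in sorted(passing_ids)
    }


# Hypothetical read IDs: only "read2" passed both R1 and R2 trimming.
r1 = {"read1": "CCGG", "read2": "TTAA"}
r2 = {"read2": "GGCC", "read3": "AATT"}
print(filter_pairs(r1, r2))
```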

Consolidation

Unique R1 and R2 sequences are identified. A key file is generated to map sequencer IDs onto each unique sequence identified.
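A minimal sketch of consolidation, assuming reads are available as a mapping from read ID to sequence. The `consolidate` function, the integer identifiers, and the in-memory key are hypothetical; the pipeline's actual key file format is not described here.

```python
def consolidate(reads):
    """Collapse identical sequences and build a read-ID key.

    reads: dict mapping read ID -> sequence.
    Returns (unique, key): unique maps an integer ID to each distinct
    sequence, and key maps every read ID to the ID of its sequence.
    """
    sequence_ids = {}
    key = {}
    for read_id, sequence in reads.items():
        # Assign the next integer ID the first time a sequence is seen.
        unique_id = sequence_ids.setdefault(sequence, len(sequence_ids))
        key[read_id] = unique_id
    unique = {unique_id: seq for seq, unique_id in sequence_ids.items()}
    return unique, key


# Hypothetical reads: "r1" and "r3" share a sequence.
unique, key = consolidate({"r1": "ACGT", "r2": "TTTT", "r3": "ACGT"})
print(unique)
print(key)
```

Working with the unique sequences means each distinct sequence only needs to be aligned once, and the key lets results be expanded back to individual reads.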

Mapping

R1 and R2 reads are mapped independently to the hg38 human reference genome using BLAT. Default parameters are listed in the configs/HIV_test.config.yml file.

Integration site identification

For each sample, BLAT output is analyzed using tools/rscripts/couple.R and a list of unique integration sites is generated. We define a "unique site" as a unique pair of "anchor" (R2) and "adrift" (R1) sequences.
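The "unique site" definition can be illustrated with a short sketch. It does not reproduce tools/rscripts/couple.R; it only shows the idea of collapsing read pairs that share the same anchor and adrift values. The `unique_sites` function and the placeholder labels are hypothetical.

```python
from collections import Counter


def unique_sites(read_pairs):
    """Count read pairs per unique (anchor, adrift) combination.

    read_pairs: iterable of (anchor, adrift) tuples, one per read pair.
    The values are placeholders for whatever identifies the R2 anchor
    and the R1 adrift end of a pair.
    """
    return Counter(read_pairs)


# Hypothetical pairs: two reads support the same anchor/adrift combination.
pairs = [("A1", "D1"), ("A1", "D1"), ("A1", "D2"), ("A2", "D1")]
sites = unique_sites(pairs)
print(len(sites))
print(sites[("A1", "D1")])
```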

Condensing integration sites

The unique sites for each sample are condensed across replicates and across LTR regions (U3 and U5). Sites found in both the U3 and U5 regions are marked as "Dual Detect".
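One way to picture the condensing step is sketched below, assuming each record carries a sample, replicate, region, and site label. The `condense` function and the record layout are assumptions for illustration only.

```python
from collections import defaultdict


def condense(records):
    """Merge site records across replicates and label the detecting region.

    records: iterable of (sample, replicate, region, site) tuples, where
    region is "U3" or "U5". Returns a dict mapping (sample, site) to
    "U3", "U5", or "Dual Detect".
    """
    regions_seen = defaultdict(set)
    for sample, _replicate, region, site in records:
        regions_seen[(sample, site)].add(region)

    condensed = {}
    for sample_site, regions in regions_seen.items():
        if {"U3", "U5"} <= regions:
            condensed[sample_site] = "Dual Detect"
        else:
            condensed[sample_site] = next(iter(regions))
    return condensed


# Hypothetical records: one site is seen in both regions, one only in U5.
records = [
    ("sampleA", 1, "U3", "chr1+100"),
    ("sampleA", 2, "U5", "chr1+100"),
    ("sampleA", 1, "U5", "chr2-500"),
]
print(condense(records))
```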

Filtering integration sites

The final processing step filters out "crossover" sites, which are sites suspected of spilling over from one sample to another.
