
mapping_capture_experiments's Introduction

Mapping pipeline for target capture experiments

by Claudia Fontsere July 2022 - keep in mind that this will need to be updated

Endogenous or hDNA quantification to prepare equi-endogenous pools for capture

Step 0 - Demultiplexing ##### Production reads #####

bash hDNA/Step0_demultiplex.sh # beforehand, you have to gather the barcode information for each pool  
bgzip ${sample}.fastq   
mv *gz FASTQ/Demultiplex/  

Step 1 - Trimming adapters #####

bash Step1_trimmingAdaptors_hDNA.sh # Here I do not collapse reads, since I am only interested in how many reads map  

Step 2 - Mapping ##### Mapped reads #####

bash hDNA/Step2_mapping_hg19_hDNA.sh  

Step 3 - Remove Duplicates ##### Unique reads #####

bash hDNA/Step3_rmdupsPicard_hg19_hDNA.sh  

Step 4 - Filtering by quality, removing secondary alignments and unmapped reads ##### Reliable reads #####

bash hDNA/Step4_filterQual_hg19_hDNA.sh  

Step 5 - Count the number of reads at each step (I count only the FIRST in pair) #####

zcat FASTQ/Demultiplex/${sample}_1.fastq.gz | wc -l # count production reads: divide the line count by 4  
bash hDNA/Step5_STATS_hDNA.sh  
# Extract the relevant column from each .stats file, only for the FIRST in pair: column 9
grep "FIRST" ${sample}.stats | awk '{print $9}' | uniq | head -n 1 # HQ aligned reads
# Extract the high-quality aligned bases for paired reads: column 10
grep -w "PAIRED" ${sample}_rmdups.qual.stats | awk '{print $10}' | uniq | head -n 1 # only for the Reliable reads, used to calculate the coverage
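
The counts above can be wrapped in small helpers so they are easy to repeat per sample. This is only a sketch: the function names `count_fastq_reads` and `stats_column` are hypothetical and not part of the repository scripts, and it assumes the .stats files are whitespace-delimited tables whose first field is the category label (for example one starting with FIRST), as the grep/awk commands above imply.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the repository scripts.

# Count reads in a gzipped FASTQ file: each record spans 4 lines,
# so the read count is the line count divided by 4.
count_fastq_reads() {
    local fastq_gz="$1"
    local lines
    lines=$(gzip -dc "$fastq_gz" | wc -l)
    echo $(( lines / 4 ))
}

# Print column $3 of the first row in file $1 whose first field
# starts with the label $2 (assumed layout, see the text above).
stats_column() {
    local stats_file="$1" category="$2" column="$3"
    awk -v cat="$category" -v col="$column" \
        'index($1, cat) == 1 { print $col; exit }' "$stats_file"
}

# Example calls (file names are placeholders):
#   count_fastq_reads "${sample}_1.fastq.gz"
#   stats_column "${sample}.stats" FIRST 9
```

Unlike the grep-based commands, `stats_column` matches the label only at the start of the first field, so a header line that mentions the same word elsewhere is not picked up by accident.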

Step 6 - Add all the previously collected information to a spreadsheet; hDNA = RR/PR (reliable reads / production reads) #####
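
As a worked example of the RR/PR ratio, the sketch below divides a reliable-read count by a production-read count. The function name `endogenous_fraction` is hypothetical, and it assumes that RR and PR stand for the reliable and production read counts collected in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: endogenous fraction = reliable reads / production reads.
# Prints "NA" when the production read count is zero or negative.
endogenous_fraction() {
    local reliable="$1" production="$2"
    awk -v rr="$reliable" -v pr="$production" 'BEGIN {
        if (pr <= 0) { print "NA"; exit }
        printf "%.4f\n", rr / pr
    }'
}

# Example: 250 reliable reads out of 1000 production reads
#   endogenous_fraction 250 1000
```

awk is used for the division because bash arithmetic only handles integers.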

Disclaimer: If the number of production reads differs greatly between libraries, it may be wise to downsample every library to the same number of reads.
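
One way to pick the downsampling target is to scale every library to the smallest one. The sketch below only computes the per-library fraction to keep; the function name `downsample_fractions` and the two-column input table are assumptions for illustration, and the actual subsampling would be done with whichever tool you prefer.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads "library<TAB>production_reads" lines on stdin
# and prints "library<TAB>fraction", where fraction is the share of reads
# to keep so that every library matches the smallest one.
downsample_fractions() {
    awk -F'\t' '
        {
            name[NR]  = $1
            reads[NR] = $2 + 0          # force numeric comparison
            if (NR == 1 || reads[NR] < min) min = reads[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s\t%.4f\n", name[i], (reads[i] > 0 ? min / reads[i] : 0)
        }'
}

# Example:
#   printf 'libA\t1000\nlibB\t4000\n' | downsample_fractions
```

The smallest library always gets a fraction of 1.0000, so it is kept whole.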

Processing of captured data

Step 0 - Demultiplexing ##### Production reads #####

bash Capture/Step0_demultiplex.sh # beforehand, you have to gather the barcode information for each pool  
bgzip ${sample}.fastq
mv *gz FASTQ/Demultiplex/

Step 1 - Trimming adapters #####

bash Capture/Step1_trimmingAdaptors_FastP.sh # If sequenced in 2x150bp and the insert size is small, collapse reads  

Step 2 - Mapping ##### Mapped reads #####

bash Capture/Step2_mapping_hg19_pe.sh # mapping paired-end reads  
bash Capture/Step2_mapping_hg19_se.sh # mapping single-end reads  

Step 2.1 - Merging single-end and paired-end data #####

bash Capture/Step2.1_merge_pese.sh  

Step 2.2 - Merging (optional): if the same library has been captured in different pools or sequenced in different lanes #####

bash Capture/Step2.2_mergeLib.sh  

Step 3 - Remove Duplicates ##### Unique reads #####

bash Capture/Step3_rmdupsPicard_hg19.sh  

Step 4 - Filtering by quality, removing secondary alignments and unmapped reads ##### Reliable reads #####

bash Capture/Step4_filterQual_hg19.sh  

Step 5 - Extract OnTarget Reads intersecting with BED file ##### OnTarget reads #####

bash Capture/Step5_Ontarget_hg19.sh  

Step 6 - Count the number of reads at each step #####

zcat FASTQ/Demultiplex/${sample}_1.fastq.gz | wc -l # count production reads: divide the line count by 4  
bash Capture/Step6_STATS.sh # here one can compute stats per individual hybridization (processing everything without merging reads from multiple hybridizations) or in total (after merging reads from different hybridizations)  

Step 7 - Add all the previously collected information to a spreadsheet; hDNA = RR/PR #####
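
To make the spreadsheet step less error-prone, the per-step counts can be written as tab-separated rows that paste directly into a sheet. This is a minimal sketch: the function names `summary_header` and `summary_row`, the column names, and the column order are all hypothetical choices, not something the repository scripts produce.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that emit one tab-separated line per library.

# Header line for the table (column names are illustrative).
summary_header() {
    printf 'sample\tproduction\tmapped\tunique\treliable\tontarget\tRR_over_PR\n'
}

# Arguments: sample name, then production, mapped, unique, reliable and
# on-target read counts. The last column is reliable / production reads.
summary_row() {
    local sample="$1" pr="$2" mapped="$3" unique="$4" rr="$5" ontarget="$6"
    awk -v s="$sample" -v pr="$pr" -v m="$mapped" -v u="$unique" \
        -v rr="$rr" -v ot="$ontarget" 'BEGIN {
        ratio = (pr > 0) ? sprintf("%.4f", rr / pr) : "NA"
        printf "%s\t%d\t%d\t%d\t%d\t%d\t%s\n", s, pr, m, u, rr, ot, ratio
    }'
}

# Example:
#   { summary_header; summary_row libA 1000 800 600 500 300; } > summary.tsv
```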
