ATAC-Seq analysis

Author: Ji Huang

These are my notes on ATAC-Seq analysis.

Background

There are many wonderful tutorials and notes on the web, and I learned from them. To name a few:

  1. crazyhottommy/ChIP-seq-analysis
  2. crazyhottommy/pyflow-ATACseq
  3. ATAC-seq data analysis: from FASTQ to peaks
  4. ATAC-seq Guidelines from Harvard FAS Informatics
  5. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis, 2020, Genome Biology

Pipeline overview

My scripts cover the basic steps:

  1. Download fastq files from EBI. You can use SRA-Explorer to find the download URLs for the fastq files, so there is no need to start from sra files any more.
  2. Adapter trimming with fastp.
  3. Reads mapping with Bowtie2.
  4. Peak calling with Genrich. The nice part is that Genrich was designed to run all of the post-alignment steps through peak calling with one command, so there is no need to run samtools, Picard and a series of other commands. However, Genrich has not yet been peer-reviewed or published, although it has been used in many papers.
  5. Calculate the fraction of reads in peaks (FRiP).

According to the ENCODE definition:

Fraction of reads in peaks (FRiP) – Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)

For ATAC-Seq,

The fraction of reads in called peak regions (FRiP score) should be >0.3, though values greater than 0.2 are acceptable. For EN-TEx tissues, FRiP scores will not be enforced as QC metric. TSS enrichment remains in place as a key signal to noise measure.
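As a small worked example of the definition above: FRiP is just the ratio of two read counts. The sketch below assumes both counts are already known. The function names `frip` and `frip_grade` and the example numbers are made up for illustration and are not taken from calc_FRiP.slurm. The grading simply applies the 0.3 and 0.2 thresholds quoted above.

```shell
#!/usr/bin/env bash
# frip READS_IN_PEAKS TOTAL_READS
# Prints reads_in_peaks / total_reads with four decimal places.
frip() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    if (t <= 0) { print "NA"; exit 1 }
    printf "%.4f\n", p / t
  }'
}

# frip_grade SCORE
# Maps a FRiP score onto the ATAC-Seq thresholds quoted above.
frip_grade() {
  awk -v s="$1" 'BEGIN {
    if (s > 0.3)      print "good"
    else if (s > 0.2) print "acceptable"
    else              print "low"
  }'
}

# Hypothetical counts: 3,500,000 of 10,000,000 mapped reads fall in peaks.
score=$(frip 3500000 10000000)
echo "FRiP = $score ($(frip_grade "$score"))"
```

With these made-up counts the score is 0.35, which is above the 0.3 threshold.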

I mostly follow the ATAC-seq Guidelines from Harvard FAS Informatics.
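To make steps 2 to 4 concrete, here is a minimal dry-run sketch for one paired-end sample. It is not the repository's actual code: the sample name, file names, index prefix, thread count and exclude list are placeholders, and the flag choices are my assumptions based on each tool's documented options. The one samtools call is a name sort, included on the assumption that Genrich wants queryname-sorted input as its documentation describes. By default the script only prints the commands; setting DRY_RUN=0 would execute them.

```shell
#!/usr/bin/env bash
set -eu

DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = run them
SAMPLE="${SAMPLE:-sample1}"    # placeholder sample name
INDEX="${INDEX:-genome_index}" # placeholder Bowtie2 index prefix
THREADS="${THREADS:-4}"        # placeholder thread count
EXCLUDE="${EXCLUDE:-Mt,Pt}"    # placeholder list of sequences to skip

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

atac_steps() {
  # Step 2: adapter trimming for paired-end reads.
  run fastp -i "${SAMPLE}_1.fastq.gz" -I "${SAMPLE}_2.fastq.gz" \
            -o "${SAMPLE}_1.trim.fastq.gz" -O "${SAMPLE}_2.trim.fastq.gz" \
            --detect_adapter_for_pe
  # Step 3: alignment; -X 2000 allows fragments up to 2 kb.
  run bowtie2 --very-sensitive -X 2000 -p "$THREADS" -x "$INDEX" \
              -1 "${SAMPLE}_1.trim.fastq.gz" -2 "${SAMPLE}_2.trim.fastq.gz" \
              -S "${SAMPLE}.sam"
  # Name sort (queryname order) before handing the alignments to Genrich.
  run samtools sort -n -@ "$THREADS" -o "${SAMPLE}.bam" "${SAMPLE}.sam"
  # Step 4: peak calling in ATAC mode (-j), removing PCR duplicates (-r)
  # and excluding the listed sequences (-e).
  run Genrich -t "${SAMPLE}.bam" -o "${SAMPLE}.narrowPeak" -j -r -e "$EXCLUDE"
}

atac_steps
```

In dry-run mode this prints four command lines, one per tool invocation, which makes it easy to check paths and flags before submitting a real job.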

Pipeline details

You can run steps 1, 2 and 3 sequentially, or run submit_all.sh to submit all three jobs.

  1. 01_download_fastq.slurm: download fastq files.
  2. 02_submit_snake.slurm: run the adapter trimming and Bowtie2 alignment steps.
  3. 03_call_peaks.slurm: call peaks with Genrich.
  4. submit_all.sh: if everything is set up correctly, you can just run this bash script, which submits the above three jobs.
  5. calc_FRiP.slurm: calculate FRiP.
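One way a wrapper like submit_all.sh could chain the three jobs is with Slurm job dependencies, so that each step starts only after the previous one has finished successfully. This is an assumption about how such a script might look, not a copy of the repository's file. `submit_chain` is a hypothetical helper; `--parsable` and `--dependency=afterok:<jobid>` are standard sbatch options.

```shell
#!/usr/bin/env bash
# submit_chain SCRIPT...
# Submits each Slurm script so that it waits for the previous one to
# finish successfully. Prints one "script jobid" line per submission.
submit_chain() {
  prev=""
  for script in "$@"; do
    if [ -z "$prev" ]; then
      jobid=$(sbatch --parsable "$script") || return 1
    else
      jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    prev=${jobid%%;*}
    echo "$script $prev"
  done
}
```

On a cluster it would be called as submit_chain 01_download_fastq.slurm 02_submit_snake.slurm 03_call_peaks.slurm. If one job fails, the jobs that depend on it never start, which avoids calling peaks on incomplete alignments.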

Misc

  1. To get the names of the maize contigs that we want to exclude from Genrich peak calling, you can run grep "^B" zmav4.chr.length.txt |cut -f1|sed -z 's/\n/,/g;s/,$/\n/'.
  2. The example code re-analyzes the maize protoplast ATAC-Seq data from 3D Chromatin Architecture of Large Plant Genomes Determined by Local A/B Compartments, 2017, Molecular Plant.
  3. To get the total length in base pairs of the peak regions, run awk '{$4=$3-$2; sum+=$4} END {print sum}' SRR5748809_10_ATAC_Zma_leaf_mesophyll.narrowPeak. The result is 22856828 bp, or about 22.9 Mb.
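The contig-name and peak-length one-liners can be tried on toy data before running them on real files. The sketch below builds a tiny chromosome-length table and a tiny peak file (all names, lengths and coordinates are made up), then applies the same grep/cut selection and an equivalent awk sum. It joins the names with `paste -sd,` as a portable stand-in for the `sed -z` expression, which needs GNU sed; both produce a single comma-separated line. The helper names `contig_list` and `peak_bp` are hypothetical.

```shell
#!/usr/bin/env bash
# contig_list FILE
# Comma-separated names of sequences whose name starts with "B".
contig_list() {
  grep "^B" "$1" | cut -f1 | paste -sd, -
}

# peak_bp FILE
# Total bp covered by peaks, summing (end - start) over all lines.
peak_bp() {
  awk '{ sum += $3 - $2 } END { print sum + 0 }' "$1"
}

tmpdir=$(mktemp -d)

# Toy length table: name <TAB> length. Only the "B..." rows are contigs.
printf '1\t3000\nB73V4_ctg1\t120\nB73V4_ctg2\t80\nMt\t500\n' \
  > "$tmpdir/chr.length.txt"

# Toy peak file: chrom <TAB> start <TAB> end <TAB> name.
printf '1\t100\t350\tpeak_0\n1\t1000\t1400\tpeak_1\n2\t50\t200\tpeak_2\n' \
  > "$tmpdir/toy.narrowPeak"

contig_list "$tmpdir/chr.length.txt"   # B73V4_ctg1,B73V4_ctg2
peak_bp "$tmpdir/toy.narrowPeak"       # 250 + 400 + 150 = 800

rm -r "$tmpdir"
```

The comma-separated output is the form that Genrich's -e option takes for sequences to exclude.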
