
Snakemake workflow for the processing of EMseq data

License: GNU General Public License v3.0


smk_emseq - A Snakemake-based workflow for EMseq data processing

📌 Acknowledgement/Disclaimer

This workflow is heavily based on https://github.com/seb-mueller/snakemake-bisulfite by @seb-mueller. I streamlined and tailored the workflow to my needs.

โ— Needed/used software

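The workflow is based on the following tools: FastQC, BBMap (bbduk), Bismark and samtools (one conda environment per tool is defined in workflow/envs).

Separate installation of the tools is not necessary; they are installed 'on the fly' (see Usage below).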

Snakemake should be installed as outlined in its documentation, for instance using conda/mamba. It is recommended to create a dedicated conda environment for Snakemake.

📘 Description of the workflow

The workflow can be used equally for data derived from NEB's EMseq approach and for data derived from WGBS (whole-genome bisulfite sequencing). Working with environmental microbes, we found that commonly available bisulfite treatment kits led to a strong loss of DNA. As a result, we ended up giving EMseq a try. For details about the methodology, have a look at NEB's EMseq paper.

A reference genome stored in resources/ is bisulfite-treated in silico with bismark. Paired-end sequencing data (stored in data/) is subjected to quality control and adapter trimming using bbduk. Quality reports are written using FastQC before and after trimming.
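To illustrate what in silico bisulfite treatment of a reference means, here is a minimal Python sketch. It is not Bismark's implementation; it only shows the underlying idea of preparing a fully C-to-T converted copy of the reference (matching reads from the top strand) and a G-to-A converted copy (matching reads from the bottom strand). The function name `convert_reference` is hypothetical and not part of the workflow.

```python
def convert_reference(sequence: str) -> dict[str, str]:
    """Return fully converted copies of a reference sequence.

    Illustrative only: mimics the idea of an in silico bisulfite
    conversion, not Bismark's actual code.
    """
    upper = sequence.upper()
    return {
        # Top strand: every C reads as T after conversion.
        "C_to_T": upper.replace("C", "T"),
        # Bottom strand in top-strand coordinates: every G reads as A.
        "G_to_A": upper.replace("G", "A"),
    }


converted = convert_reference("ACGTCCGGA")
print(converted["C_to_T"])  # ATGTTTGGA
print(converted["G_to_A"])  # ACATCCAAA
```

Mapping converted reads against both copies is what allows a C/T mismatch to be read as methylation information rather than as a sequencing error.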

Read pairs are subsequently mapped onto the bisulfite-treated genome, and alignments with identical mapping positions are removed (deduplication). Methylation calls are extracted for all three contexts (CpG, CHG, CHH) and used to generate a .bedGraph and a coverage file. The latter can be used for downstream methylation analysis.
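The three cytosine contexts that Bismark distinguishes are CpG, CHG and CHH, where H stands for A, C or T. The sketch below shows how a cytosine on the forward strand could be assigned to one of them. It is an illustration under simplifying assumptions (forward strand only, hypothetical function name `cytosine_context`), not the code the workflow runs.

```python
from typing import Optional


def cytosine_context(sequence: str, position: int) -> Optional[str]:
    """Classify the cytosine at `position` (0-based) as CpG, CHG or CHH.

    Returns None if the base is not a C or if the downstream bases
    needed for the decision are missing or ambiguous (N).
    """
    seq = sequence.upper()
    if position < 0 or position >= len(seq) or seq[position] != "C":
        return None
    downstream = seq[position + 1:position + 3]
    if downstream[:1] == "G":
        return "CpG"
    # CHG and CHH both need two known downstream bases.
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


reference = "ACGTCAGCTTC"
for index, base in enumerate(reference):
    if base == "C":
        print(index, cytosine_context(reference, index))
# 1 CpG
# 4 CHG
# 7 CHH
# 10 None
```

Cytosines on the reverse strand (a G on the forward strand) would be classified the same way after taking the reverse complement.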

The DAG below outlines the different processes of the workflow.

DAG of smk_emseq.

🔨 Usage

Start by cloning the repository and moving into its directory.

git clone https://github.com/wegnerce/smk_emseq.git
cd smk_emseq

Place paired sequence data (R{1,2}.fastq.gz) in data/. The repository contains two pairs of example files (La.1_R1.fastq.gz + La.1_R2.fastq.gz and Nd.1_R1.fastq.gz + Nd.1_R2.fastq.gz).

Besides the configuration of the workflow (config/config.yaml), config/ contains a tab-separated table, samples.tsv, which lists all datasets, one per line. The workflow expects *.fastq.gz files, with R{1,2} in the file name marking forward and reverse reads (for example La.1_R1.fastq.gz and La.1_R2.fastq.gz).
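The naming convention can be checked before starting a run. The following Python sketch reads a sample table and derives the read files expected in data/. The column name `sample` is an assumption, since the layout of samples.tsv is not described here; adjust it to the header actually used. The helper `expected_fastq_files` is hypothetical and not part of the workflow.

```python
import csv
import io
from pathlib import Path


def expected_fastq_files(samples_tsv, data_dir="data"):
    """Map each sample name to its expected forward/reverse read files."""
    reader = csv.DictReader(samples_tsv, delimiter="\t")
    files = {}
    for row in reader:
        sample = row["sample"].strip()  # assumed column name
        if not sample:
            continue
        files[sample] = tuple(
            Path(data_dir) / f"{sample}_R{mate}.fastq.gz" for mate in (1, 2)
        )
    return files


# Stand-in for config/samples.tsv; the real table may have other columns.
example_table = io.StringIO("sample\nLa.1\nNd.1\n")
for sample, (forward, reverse) in expected_fastq_files(example_table).items():
    print(sample, forward, reverse)
```

With a real table, pass an open file handle instead of the `io.StringIO` stand-in and test each returned path with `Path.exists()`.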

From the root directory of the workflow, processing the data can then be started.

# --use-conda makes sure that the needed tools are installed based
# on the requirements specified in the respective *.yaml in workflow/envs
snakemake --use-conda

The directory structure of the workflow is shown below:

├── config
│   ├── config.yaml
│   └── samples.tsv
├── dag.svg
├── data
│   ├── La.1_R1.fastq.gz
│   ├── La.1_R2.fastq.gz
│   ├── Nd.1_R1.fastq.gz
│   └── Nd.1_R2.fastq.gz
├── logs
├── LICENSE
├── README.md
├── resources
│   ├── adapters.fa
│   └── RHAL1_chromosome_plasmid.fa
├── results
└── workflow
    ├── envs
    │   ├── bbmap.yaml
    │   ├── bismark.yaml
    │   ├── fastqc.yaml
    │   └── samtools.yaml
    ├── rules
    │   ├── bismark.smk
    │   ├── qc.smk
    │   └── sort.smk
    └── Snakefile

Output from the different steps of the workflow is stored in results/ and logs/.

The resulting *.cov.gz files (results/04_coverage/*.cov.gz) can be used for downstream methylation analysis.
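As a starting point for downstream analysis, the sketch below reads a gzipped Bismark coverage file in Python. It relies on the standard Bismark coverage layout (tab-separated, no header: chromosome, start, end, methylation percentage, count methylated, count unmethylated). The names `CoverageRecord` and `read_coverage` and the `min_depth` filter are hypothetical, and the demo rows are toy values, not results from this workflow.

```python
import gzip
import os
import tempfile
from typing import Iterator, NamedTuple


class CoverageRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    percent_methylated: float
    count_methylated: int
    count_unmethylated: int


def read_coverage(path: str, min_depth: int = 1) -> Iterator[CoverageRecord]:
    """Yield coverage records with at least `min_depth` covering reads."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            record = CoverageRecord(
                fields[0], int(fields[1]), int(fields[2]),
                float(fields[3]), int(fields[4]), int(fields[5]),
            )
            if record.count_methylated + record.count_unmethylated >= min_depth:
                yield record


# Demo on a temporary file with toy rows (not real data).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.cov.gz")
    with gzip.open(demo, "wt") as handle:
        handle.write("contig_1\t10\t10\t75.0\t3\t1\n")
        handle.write("contig_1\t25\t25\t0.0\t0\t1\n")
    for record in read_coverage(demo, min_depth=4):
        print(record.chrom, record.start, record.percent_methylated)
# contig_1 10 75.0
```

For a real run, point `read_coverage` at one of the files in results/04_coverage/ and choose `min_depth` according to the sequencing depth of the dataset.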

ยฉ๏ธ Carl-Eric Wegner, 2023
