alexandersimws / automatedbisulfiteanalysis

A simple Python script to quantify methylation after bisulfite sequencing, using .fasta files

License: MIT License

Python 100.00%
cpg-sites bisulfite-sequencing cpg-islands methylation automated-bisulfite-analysis

Automated Bisulfite Analysis

Overview

A simple automated bisulfite analysis Python script that determines the methylation status of CpG sites (predicted by MethPrimer) from the sequences of control samples and perturbed samples, both supplied as .fasta files. Read below for brief background information, usage instructions and an example output.

In brief, it automates measuring the extent of methylation in cells after they have been exposed to a perturbation (e.g. cigarette smoke).

The script can be run from the terminal on macOS. An example visualisation of the data made with PowerPoint is provided as example_visualisation_with_powerpoint.png.

Background

Epigenetics

Epigenetics refers to reversible modifications that change gene expression without altering the DNA sequence itself, thereby changing the amount of transcription and, hence, translation of proteins. Gene expression at the transcription stage can be modified in the following ways:

  1. DNA Methylation
  2. Histone Modification
  3. Non-coding RNA

This script focuses on analysing the extent of DNA methylation in promoter sequences. Methylation of a promoter sequence can prevent transcription factors from binding to the DNA and thereby represses transcriptional activity.

https://www.cdc.gov/genomics/disease/epigenetics.htm

Bisulfite Sequencing

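Treatment of DNA with bisulfite converts unmethylated cytosine residues to uracil but leaves methylated cytosines unaffected. After PCR amplification and sequencing, unmethylated cytosines therefore appear as thymine residues in the sense strand, while methylated cytosines still read as cytosine. In promoters, methylation occurs mainly at cytosines followed by a guanine (CpG sites), which cluster in regions called CpG islands. This script compares the original sequence shown in MethPrimer with the control and perturbed sequences.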
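To make the conversion concrete, here is a minimal sketch that simulates how a bisulfite-treated sense strand reads after sequencing. The function bisulfite_convert is hypothetical and not part of bisulfite_analysis.py; it assumes methylated cytosines are given as 0-based positions and treats every other cytosine as converted.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated_positions: Iterable[int] = ()) -> str:
    """Simulate how a bisulfite-treated sense strand reads after PCR and sequencing.

    Every cytosine becomes thymine (via uracil) unless its 0-based position is
    listed in `methylated_positions`, in which case it is protected and stays C.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


reference = "ACGTTCGA"  # hypothetical sequence with CpG sites at positions 1 and 5
print(bisulfite_convert(reference))       # → ATGTTTGA (no methylation)
print(bisulfite_convert(reference, {1}))  # → ACGTTTGA (first CpG cytosine methylated)
```

Comparing the two outputs shows why the script only needs to look at the cytosine of each CpG site: a C there means methylated, a T means unmethylated.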

In my experiment, RNA was extracted from control and perturbed samples, converted into cDNA and treated with bisulfite. The products were then ligated into a TOPO vector (ThermoFisher), which was transformed into E. coli. The bacteria were grown on agar plates containing X-gal, and colonies carrying the insert appeared white rather than blue. These white colonies were picked with a pipette and sequenced.

https://en.wikipedia.org/wiki/Bisulfite_sequencing

MethPrimer

MethPrimer is a program for designing bisulfite-conversion-based Methylation PCR Primers. Currently, it can design primers for two types of bisulfite PCR: 1) Methylation-Specific PCR (MSP) and 2) Bisulfite-Sequencing PCR (BSP) or Bisulfite-Restriction PCR. MethPrimer can also predict CpG islands in DNA sequences.

https://www.urogene.org/methprimer/

The MethPrimer features relevant to this script are primer design for bisulfite-sequencing PCR and prediction of CpG islands.

Quick Start

Dependencies

Ensure you have the following libraries installed:
    $ pip install biopython
      https://biopython.org/wiki/Download
    $ pip install pandas
      https://pypi.org/project/pandas/
    $ pip install plotly-express
      https://pypi.org/project/plotly-express/

Preparing the files and understanding your data

Assuming you have already conducted bisulfite PCR and have the sequences of your control and perturbed samples, perform the following steps:

  1. Identify the sequence you're analysing with MethPrimer and index (0-indexed) all the CG dinucleotides between your forward and reverse primers (shown as CG in the original strand on top)
  2. Identify false CpG sites (CG dinucleotides that MethPrimer does not recognise as CpG sites; predicted CpG sites are marked with "++" between the top and bottom strands), note their indices and modify bisulfite_analysis.py as described in its "user config" section
  3. MethPrimer displays two strands: the top strand is the original sequence and the bottom strand is the bisulfite-treated sequence. Visually identify a short sequence that is unaffected by bisulfite treatment (identical in the top and bottom strands), located after your forward primer and just before the first CpG site. Find another short unaffected sequence after your last CpG site and before your reverse primer. Enter these in the .py file as the variables "before_first_cpg" and "after_last_cpg" respectively. This allows the program to zoom in on the region of DNA containing the CpG sites.
  4. Set the total number of CpG sites (according to MethPrimer) in bisulfite_analysis.py
  5. Set the number of control and perturbed samples you have in bisulfite_analysis.py
  6. Place all your .fasta files of your control sequences into the control_samples folder
  7. Place all your .fasta files of your perturbed sequences into the perturbed_samples folder
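The preparation steps above can be sketched in code to show how the anchor sequences, the 0-indexed CG numbering and the false CpG sites fit together. This is not the logic of bisulfite_analysis.py; extract_region, cpg_positions and call_methylation are hypothetical helpers, and the sketch assumes the read has no insertions or deletions relative to the MethPrimer reference so that positions line up.

```python
from typing import Iterable, List


def extract_region(seq: str, before_first_cpg: str, after_last_cpg: str) -> str:
    """Return the part of `seq` strictly between the two anchor sequences."""
    seq = seq.upper()
    start = seq.find(before_first_cpg.upper())  # first occurrence is used
    if start == -1:
        raise ValueError("before_first_cpg not found in sequence")
    start += len(before_first_cpg)
    end = seq.find(after_last_cpg.upper(), start)
    if end == -1:
        raise ValueError("after_last_cpg not found after before_first_cpg")
    return seq[start:end]


def cpg_positions(reference_region: str, false_cpg_indices: Iterable[int] = ()) -> List[int]:
    """0-based positions of the C of each CG in the reference region.

    CG dinucleotides are numbered 0, 1, 2, ... in order; those whose number is
    in `false_cpg_indices` (CGs MethPrimer did not mark with "++") are dropped.
    """
    ref = reference_region.upper()
    positions = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    skip = set(false_cpg_indices)
    return [pos for number, pos in enumerate(positions) if number not in skip]


def call_methylation(reference_region: str, sample_region: str,
                     false_cpg_indices: Iterable[int] = ()) -> List[bool]:
    """True where the CpG cytosine survived bisulfite treatment (still reads C)."""
    if len(sample_region) != len(reference_region):
        raise ValueError("region lengths differ; this sketch assumes no indels")
    sample = sample_region.upper()
    # Anything other than C (normally T) at a CpG position counts as unmethylated.
    return [sample[p] == "C" for p in cpg_positions(reference_region, false_cpg_indices)]


reference = "AAGGTTACGTTCGATACGTAGGAA"  # hypothetical MethPrimer top strand
read = "AAGGTTACGTTTGATACGTAGGAA"       # hypothetical sequenced clone
before_first_cpg, after_last_cpg = "GGTT", "AGGA"  # contain no C, so bisulfite-safe

ref_region = extract_region(reference, before_first_cpg, after_last_cpg)
read_region = extract_region(read, before_first_cpg, after_last_cpg)
print(call_methylation(ref_region, read_region))  # → [True, False, True]
```

Passing false_cpg_indices=[1] would drop the second CG from the calls, which mirrors step 2.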

Running the program from terminal

  1. Prepare your sequences and modify variables in the bisulfite_analysis.py file as shown above
  2. Navigate to the folder containing bisulfite_analysis.py, control_samples and perturbed_samples with
    $ cd path/to/bisulfite_analysis/
  3. Run the program with
    $ python bisulfite_analysis.py
  4. For a graph of the statistics, uncomment the last block of code in the PRINT_OUTPUT() function in the bisulfite_analysis.py file
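When the program runs, it needs to read every .fasta file in control_samples and perturbed_samples. The sketch below illustrates that step with a stdlib-only reader so it runs without extra packages; the script itself lists Biopython as a dependency, where SeqIO.read(path, "fasta") does the same parsing. The helpers read_single_fasta and load_samples are hypothetical, and the sketch assumes one record per file and uses the file name without its extension as the sample name.

```python
from pathlib import Path
from typing import Dict


def read_single_fasta(path) -> str:
    """Return the sequence of the first record in a FASTA file, upper-cased."""
    seq_lines = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # ignore any records after the first one
                seen_header = True
                continue
            if line:
                seq_lines.append(line)  # sequences may be wrapped over several lines
    return "".join(seq_lines).upper()


def load_samples(folder) -> Dict[str, str]:
    """Map sample name (file name without extension) to sequence for each .fasta file."""
    return {path.stem: read_single_fasta(path)
            for path in sorted(Path(folder).glob("*.fasta"))}
```

Calling load_samples("control_samples") from the project folder would return a dictionary keyed by file names; files with other extensions are ignored.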

Example Output

Based on the example input files provided:

Each output line has three fields, separated by " : " and " ; ":

  1. Sample name (from file name)
  2. Methylation status of each CpG site: methylated sites are indicated by an "O" and unmethylated sites by a "_"
  3. Number of sites methylated in the sequence
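The line format described above can be reproduced from a list of per-site methylation calls. This is a minimal sketch with a hypothetical format_row helper, assuming the calls are booleans in site order; it is not taken from bisulfite_analysis.py.

```python
from typing import Sequence


def format_row(name: str, calls: Sequence[bool]) -> str:
    """Render one output line: 'O' = methylated, '_' = unmethylated, then the count."""
    pattern = "".join("O" if methylated else "_" for methylated in calls)
    return f"{name} : {pattern} ; {sum(calls)}"


print(format_row("B07", [False] * 9 + [True, True]))  # → B07 : _________OO ; 2
```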

Control Group Methylation:
A07 : __________O ; 1
A08 : __________O ; 1
B07 : _________OO ; 2
C07 : __________O ; 1
D07 : O_________O ; 2
E07 : __________O ; 1
F07 : ___O______O ; 2
G07 : __________O ; 1
H07 : _O________O ; 2

Perturbed Group Methylation:
A12 : ______OO__O ; 3
B12 : ______OOO_O ; 4
C12 : ______O_O_O ; 3
D12 : ______OO__O ; 3
E12 : _______OO_O ; 3

Statistics:
Control Methylation Percentage: 13.131313131313133 %
Perturbed Methylation Percentage: 29.09090909090909 %
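The reported percentages are consistent with pooling all sites across a group: 13 methylated sites out of 9 samples × 11 sites = 99 for the control group, and 16 out of 5 × 11 = 55 for the perturbed group. The sketch below recomputes them from the output patterns. methylation_percentage is a hypothetical helper, and the pooled formula is inferred from these numbers rather than read from bisulfite_analysis.py.

```python
from typing import Iterable


def methylation_percentage(patterns: Iterable[str]) -> float:
    """Pooled percentage: methylated sites ('O') over all sites in all samples."""
    patterns = list(patterns)
    total_sites = sum(len(p) for p in patterns)
    if total_sites == 0:
        raise ValueError("no CpG sites to evaluate")
    methylated = sum(p.count("O") for p in patterns)
    return 100 * methylated / total_sites


# Patterns copied from the example output above.
control = ["__________O", "__________O", "_________OO", "__________O", "O_________O",
           "__________O", "___O______O", "__________O", "_O________O"]
perturbed = ["______OO__O", "______OOO_O", "______O_O_O", "______OO__O", "_______OO_O"]
print(methylation_percentage(control))    # about 13.13 (13 of 99 sites)
print(methylation_percentage(perturbed))  # about 29.09 (16 of 55 sites)
```

Pooling weights every CpG site equally; averaging per-sample percentages instead would give the same result here only because every sample has the same number of sites.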

An optional bar graph of the statistics can be produced by uncommenting the relevant code in the .py file.
