epicPMDdetect

Software that detects PMDs (Partially Methylated Domains) from Illumina Infinium MethylationEPIC (EPIC) assays. It is based on MethylSeekR and adapted to support EPIC assays by using KNN data point selection and alpha-value smoothing.

About PMDs

  • PMDs are regions of intermediate methylation that are highly disordered between CpGs. In contrast, the background methylation is polarized: either highly methylated (>= 70%) or depleted.
  • PMDs are cell-type-specific.
  • PMDs can be related to closed chromatin compartments.

Method

The local methylation level distribution is approximated using a beta distribution, whose shape is determined by its parameters α and β. Here both parameters are set equal (α = β). Values with α > 1 result in a probability density function that favors intermediate methylation levels (PMDs). Values with α < 1 result in a probability density function that favors polarized methylation levels (background).
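The effect of α on the shape can be checked directly with base R's `dbeta`. This is only an illustrative sketch: the helper name `symmetric_beta_density` and the example values α = 2 and α = 0.5 are chosen here for demonstration and are not taken from the package.

```r
# Density of a symmetric beta distribution (alpha == beta) evaluated at
# methylation levels m in (0, 1).
symmetric_beta_density <- function(m, alpha) {
  dbeta(m, shape1 = alpha, shape2 = alpha)
}

# A low, an intermediate and a high methylation level.
m <- c(0.05, 0.5, 0.95)

# alpha > 1: density peaks at intermediate levels (PMD-like).
pmd_like <- symmetric_beta_density(m, alpha = 2)

# alpha < 1: density is highest near 0 and 1 (background-like).
bg_like <- symmetric_beta_density(m, alpha = 0.5)

print(round(pmd_like, 3))
print(round(bg_like, 3))
```

With α = 2 the density at 0.5 is larger than at the extremes, and with α = 0.5 the relation is reversed.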

Each covered CpG is assigned an estimated α value, using its neighboring CpGs to determine the most likely value. These estimated α values are used as input to a 2-state Hidden Markov Model (HMM) that predicts where a change between PMD and background (BG) is likely. A solution (segmentation) to the HMM is found using the Viterbi algorithm.
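To make the segmentation step more concrete, here is a minimal Viterbi sketch for a 2-state HMM over a sequence of α values. It is not the package's implementation: the Gaussian emission model, the state means and standard deviations, and the transition probabilities are all assumptions picked for illustration, and `viterbi_two_state` is a hypothetical helper name.

```r
# Viterbi decoding in log space.
# alpha: numeric vector of per-CpG alpha estimates
# means, sds: assumed Gaussian emission parameters, one entry per state
# trans: state transition matrix (rows = from, columns = to)
# init: initial state probabilities
viterbi_two_state <- function(alpha, means, sds, trans, init) {
  n <- length(alpha)
  k <- length(means)

  # Log emission probabilities: one row per CpG, one column per state.
  log_emit <- sapply(seq_len(k), function(s) {
    dnorm(alpha, mean = means[s], sd = sds[s], log = TRUE)
  })
  log_emit <- matrix(log_emit, nrow = n, ncol = k)
  log_trans <- log(trans)

  score <- matrix(-Inf, nrow = n, ncol = k)
  back <- matrix(0L, nrow = n, ncol = k)
  score[1, ] <- log(init) + log_emit[1, ]

  # Forward pass: best score of any path ending in state s at position i.
  for (i in seq_len(n - 1) + 1) {
    for (s in seq_len(k)) {
      cand <- score[i - 1, ] + log_trans[, s]
      back[i, s] <- which.max(cand)
      score[i, s] <- max(cand) + log_emit[i, s]
    }
  }

  # Backtracking: follow the stored pointers from the best final state.
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  for (i in rev(seq_len(n - 1))) {
    path[i] <- back[i + 1, path[i + 1]]
  }
  path
}

states <- c("BG", "PMD")
alpha <- c(0.4, 0.5, 0.3, 2.1, 1.9, 2.4, 2.0, 0.5, 0.4)
path <- viterbi_two_state(
  alpha,
  means = c(0.5, 2), sds = c(0.3, 0.5),
  trans = matrix(c(0.9, 0.1, 0.1, 0.9), nrow = 2, byrow = TRUE),
  init = c(0.5, 0.5)
)
print(states[path])
```

Working in log space avoids numerical underflow on long sequences, which matters when one chromosome contributes many thousands of CpGs.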

This method is adapted from MethylSeekR (MSR):

Lukas Burger, Dimos Gaidatzis, Dirk Schubeler and Michael B. Stadler. "Identification of active regulatory regions from DNA methylation data". Nucleic Acids Research, 2013.

RnBeads is also used for EPIC data processing and representation:

Yassen Assenov, Fabian Mueller, Pavlo Lutsik, Joern Walter, Thomas Lengauer and Christoph Bock. "Comprehensive Analysis of DNA Methylation Data with RnBeads". Nature Methods 11, 2014.

Adjustments to work on EPIC data

Support EPIC data

MSR was developed with WGBS (whole-genome bisulfite sequencing) data in mind. It therefore operates on read counts, comparing the number of reads of methylated CpGs to the total number of reads. EPIC assays instead use red/green light intensities to determine the methylation level. To bridge the two, the sum of the light intensities is passed as the total number of reads, and the light intensities covering methylated CpGs are passed as methylated reads.
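A rough sketch of this mapping is shown below. It assumes the methylated and unmethylated signal intensities per CpG have already been extracted. The function name `intensities_to_counts`, the rounding to whole numbers and the example intensities are illustrative assumptions, not the package's code.

```r
# Turn per-CpG signal intensities into count-like values:
# total = methylated + unmethylated intensity, methylated = methylated intensity.
intensities_to_counts <- function(meth_intensity, unmeth_intensity) {
  stopifnot(length(meth_intensity) == length(unmeth_intensity))
  # Assumption: intensities are rounded so they can stand in for read counts.
  methylated <- round(meth_intensity)
  unmethylated <- round(unmeth_intensity)
  data.frame(
    total = methylated + unmethylated,
    methylated = methylated
  )
}

# Made-up intensities for three CpGs.
counts <- intensities_to_counts(
  meth_intensity = c(900, 150, 520),
  unmeth_intensity = c(100, 850, 480)
)
counts$level <- counts$methylated / counts$total
print(counts)
```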

Respect difference in data point distributions

In comparison to WGBS, EPIC assays cover only a small subset of the CpGs in the human genome. Therefore, a size constraint (distance cutoff) and nearest-neighbor selection using KNN ensure that only data points within a reasonable distance are used for alpha value estimation.
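The combination of a distance cutoff and a k-nearest-neighbor selection can be sketched as follows for CpG positions on a single chromosome. The helper `select_neighbors`, the choice to exclude the query CpG itself, and the values for `k` and `max_dist` are assumptions made for this example.

```r
# Return the indices of at most k CpGs closest to pos[index] that lie
# within max_dist base pairs. The query CpG itself is excluded.
select_neighbors <- function(pos, index, k, max_dist) {
  dist <- abs(pos - pos[index])
  candidates <- setdiff(which(dist <= max_dist), index)
  # Sort the remaining candidates by distance, nearest first.
  candidates <- candidates[order(dist[candidates])]
  head(candidates, k)
}

# Sparse, unevenly spaced positions as they might occur on an array.
pos <- c(100, 150, 400, 5000, 5200, 90000)

print(select_neighbors(pos, index = 3, k = 3, max_dist = 1000))
print(select_neighbors(pos, index = 6, k = 3, max_dist = 1000))
```

For the isolated CpG at position 90000 no neighbor passes the cutoff, so the result is empty even though k = 3 neighbors were requested. A plain KNN without the cutoff would instead pull in CpGs tens of kilobases away.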

Reduce influence of certain parameter setups

By default, different values for k (neighborhood size) and the distance cutoff are evaluated to estimate alpha. These alpha value estimates are averaged to diminish the effect of any individual parameter setup on alpha.
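The averaging idea can be illustrated with a simplified estimator. The sketch below fits a symmetric beta distribution to the methylation levels of the selected neighbors by maximum likelihood and averages the result over a small parameter grid. The estimator, the clamping of levels, the search interval for α and the grid values are all assumptions for illustration; the package's actual estimation may differ.

```r
# Maximum-likelihood alpha for a symmetric beta distribution.
estimate_alpha <- function(levels) {
  # Clamp to avoid infinite log densities at exactly 0 or 1.
  levels <- pmin(pmax(levels, 0.01), 0.99)
  nll <- function(a) -sum(dbeta(levels, a, a, log = TRUE))
  # Assumed search interval for alpha.
  optimize(nll, interval = c(0.1, 10))$minimum
}

# Average the alpha estimate for one CpG over all (k, max_dist) combinations.
averaged_alpha <- function(pos, levels, index, ks, max_dists) {
  grid <- expand.grid(k = ks, max_dist = max_dists)
  estimates <- mapply(function(k, max_dist) {
    d <- abs(pos - pos[index])
    idx <- which(d <= max_dist)
    # Keep the k nearest data points (here including the CpG itself).
    idx <- head(idx[order(d[idx])], k)
    if (length(idx) < 2) return(NA_real_)
    estimate_alpha(levels[idx])
  }, grid$k, grid$max_dist)
  mean(estimates, na.rm = TRUE)
}

pos <- seq(1000, by = 500, length.out = 12)
levels_pmd <- c(0.45, 0.55, 0.5, 0.4, 0.6, 0.52, 0.48, 0.58, 0.42, 0.5, 0.47, 0.53)
levels_bg <- c(0.95, 0.05, 0.9, 0.02, 0.97, 0.03, 0.92, 0.08, 0.96, 0.04, 0.98, 0.05)

alpha_pmd <- averaged_alpha(pos, levels_pmd, index = 6, ks = c(4, 8), max_dists = c(2000, 5000))
alpha_bg <- averaged_alpha(pos, levels_bg, index = 6, ks = c(4, 8), max_dists = c(2000, 5000))
print(c(pmd = alpha_pmd, bg = alpha_bg))
```

Intermediate levels yield an averaged α above 1, and polarized levels yield an averaged α below 1, regardless of which single (k, cutoff) pair would have been picked.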

Mark unreliable segments

Some areas have too few data points to reliably assign segments. These are marked accordingly.
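One possible way to flag such segments is to count the covered CpGs per segment and compare the count against a minimum. The sketch below assumes segments are given as a data frame with `start` and `end` columns. The helper `flag_sparse_segments` and the threshold `min_cpgs` are hypothetical, and the package's actual criterion is not described here.

```r
# Count covered CpGs per segment and flag segments below a minimum count.
flag_sparse_segments <- function(segments, pos, min_cpgs) {
  n_cpgs <- vapply(seq_len(nrow(segments)), function(i) {
    sum(pos >= segments$start[i] & pos <= segments$end[i])
  }, integer(1))
  segments$n_cpgs <- n_cpgs
  segments$reliable <- n_cpgs >= min_cpgs
  segments
}

segments <- data.frame(
  start = c(1, 10001, 50001),
  end = c(10000, 50000, 100000),
  state = c("BG", "PMD", "BG")
)
pos <- c(100, 150, 400, 5000, 5200, 20000, 90000)

flagged <- flag_sparse_segments(segments, pos, min_cpgs = 3)
print(flagged)
```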

How to use

Load the library:

library(epicPMDdetect)

Generate a CSV sample sheet that describes your data. For this, refer to the instructions in the RnBeads manual.

Now load the data using RnBeads by calling readEPIC_idats. The sample.col.name must be the name of the CSV column holding the sample names.

data <- readEPIC_idats(sample.sheet, idat.dir, sample.col.name, preprocess = TRUE)

Specify which samples should be processed (default: all) by providing a vector of names. Optionally, a settings object can be passed that controls further parameters.

segmentRnbSet(data, outputFolder, samples = NULL)

The outputFolder contains the segmentations for each specified sample. They are named:

[sampleName].seg.bed.gz
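Given this naming scheme, the expected output paths can be built from the sample names, for example to check which segmentations have been written. The helper `expected_segment_files`, the folder name and the sample names below are placeholders.

```r
# Build the expected segmentation file paths for a set of samples.
expected_segment_files <- function(outputFolder, samples) {
  file.path(outputFolder, paste0(samples, ".seg.bed.gz"))
}

files <- expected_segment_files("results", c("sampleA", "sampleB"))
print(files)

# TRUE for every segmentation that already exists on disk.
print(file.exists(files))
```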
