Coder Social home page Coder Social logo

compepigen / methrix Goto Github PK

View Code? Open in Web Editor NEW
27.0 8.0 10.0 105.36 MB

An R :package: for fast and flexible DNA methylation analysis

Home Page: https://www.bioconductor.org/packages/release/bioc/html/methrix.html

License: Other

R 99.79% CSS 0.21%
dna-methylation bedgraph bioinformatics

methrix's Introduction

A Github Pages template for academic websites. This was forked (then detached) by Stuart Geiger from the Minimal Mistakes Jekyll Theme, which is © 2016 Michael Rose and released under the MIT License. See LICENSE.md.

I think I've got things running smoothly and fixed some major bugs, but feel free to file issues or make pull requests if you want to improve the generic template / theme.

Note: if you are using this repo and now get a notification about a security vulnerability, delete the Gemfile.lock file.

Instructions

  1. Register a GitHub account if you don't have one and confirm your e-mail (required!)
  2. Fork this repository by clicking the "fork" button in the top right.
  3. Go to the repository's settings (rightmost item in the tabs that start with "Code", should be below "Unwatch"). Rename the repository "[your GitHub username].github.io", which will also be your website's URL.
  4. Set site-wide configuration and create content & metadata (see below -- also see this set of diffs showing what files were changed to set up an example site for a user with the username "getorg-testacct")
  5. Upload any files (like PDFs, .zip files, etc.) to the files/ directory. They will appear at https://[your GitHub username].github.io/files/example.pdf.
  6. Check status by going to the repository settings, in the "GitHub pages" section
  7. (Optional) Use the Jupyter notebooks or python scripts in the markdown_generator folder to generate markdown files for publications and talks from a TSV file.

See more info at https://academicpages.github.io/

To run locally (not on GitHub Pages, to serve on your own computer)

  1. Clone the repository and made updates as detailed above
  2. Make sure you have ruby-dev, bundler, and nodejs installed: sudo apt install ruby-dev ruby-bundler nodejs
  3. Run bundle clean to clean up the directory (no need to run --force)
  4. Run bundle install to install ruby dependencies. If you get errors, delete Gemfile.lock and try again.
  5. Run bundle exec jekyll liveserve to generate the HTML and serve it from localhost:4000 the local server will automatically rebuild and refresh the pages on change.

Changelog -- bugfixes and enhancements

There is one logistical issue with a ready-to-fork template theme like academic pages that makes it a little tricky to get bug fixes and updates to the core theme. If you fork this repository, customize it, then pull again, you'll probably get merge conflicts. If you want to save your various .yml configuration files and markdown files, you can delete the repository and fork it again. Or you can manually patch.

To support this, all changes to the underlying code appear as a closed issue with the tag 'code change' -- get the list here. Each issue thread includes a comment linking to the single commit or a diff across multiple commits, so those with forked repositories can easily identify what they need to patch.

methrix's People

Contributors

clarissafeuersteinakgoz avatar heylifehd avatar hpages avatar lutsik avatar maurerv avatar maxschoenung avatar nturaga avatar poisonalien avatar rnbatra avatar tkik avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

methrix's Issues

Bioconductor devel version error

Methirx is failing on Bioconductor devel for a while now. This issue was a tricky one to track but should be fixed soon to avoid being kicked out from next release. Problem arises from changes BSgenome objects and the way they store metadata. methrix parses ref. build info from BSgenome object which is stored as an attribute.

Current implementation as on BC release and GitHub

#version: BSgenome.Hsapiens.UCSC.hg19_1.4.3
#Version: BSgenome_1.56.0
> ref_genome = BSgenome::getBSgenome(genome = 'hg19')

> names(attributes(x = ref_genome))
 [1] "pkgname"            "single_sequences"   "multiple_sequences"
 [4] "source_url"         "user_seqnames"      "injectSNPs_handler"
 [7] ".seqs_cache"        ".link_counts"       "nmask_per_seq"     
[10] "masks"              "organism"           "common_name"       
[13] "provider"           "provider_version"   "release_date"      
[16] "release_name"       "seqinfo"            "class"

>attributes(x = ref_genome)$provider_version
[1] "hg19"

In the newer versions (BC devel) this information is stored under metadata subslot (which results in NULL when one try to access)

#version: BSgenome.Hsapiens.UCSC.hg19_1.4.3
#Version: BSgenome_1.57.6

> names(attributes(x = ref_genome))
 [1] "pkgname"            "single_sequences"   "multiple_sequences"
 [4] "seqinfo"            "user_seqnames"      "injectSNPs_handler"
 [7] ".seqs_cache"        ".link_counts"       "metadata"          
[10] "class"

>attributes(x = ref_genome)$provider_version  #Results in NULL
NULL

> attributes(x = ref_genome)$metadata$genome #Workaround
[1] "hg19"

This line needs to be changed for the fix.

Typo in the documentation

in documenatation: at STEP2

meth = methrix::read_bedgraphs(files = bdg_files[1:3], pipeline = "MethylcTools", collapse_starnds = TRUE, vect_batch_size = 3, ref_build = "Hs37d5", ref_cpgs = hs37d5_cpgs)

collapse_starnds --> collapse_strands

Error while summarising regions

summary_data_tables<-list()
for(i in seq(granges_list)){

  • summary_data_tables[[i]]<-get_region_summary(m = meth, regions = granges_list[[i]], type = "M", how = "mean")
    
  • }
    -Checking for overlaps..
    -Summarizing by average
    Error in [.data.table(dat, , lapply(.SD, mean, na.rm = na_rm), by = yid, :
    Some items of .SDcols are not column names: [cells01-rep1, cells01-rep2, cells01-rep3, cells02-rep1, cells02-rep2, cells02-rep3, cells03-rep1, cells03-rep2, cells03-rep3, cells04-rep1, ...]

plot_coverage; error in selection of random subset

methrix::plot_coverage(meth, type = "dens", size.lim = 1e+06)

warnings/errors:
The dataset is bigger than the size limit. A random subset of the object will be used that contains ~1e+06 observations.
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'

further information:
meth object has 25679692 elements/CpGs,
plot funktion worked when size.lim was equal or larger than the methrix object and therefore did not need to select a random subset.

Subsetting samples will not allow report generation when recal_stats=T

Here the code and the error message:

install_github( CompEpigen/methrix ,  ref = "devel_update")
library("methrix")

report.dir <- "/C010-projects/Pathway_27/SGBS_hydroxymethylation/WGoxBS/QC_all"

setwd(report.dir)
m <-readRDS("methrix_object_hs37d5.RDS")


#subset samples 
m_subset <-m[,m@ metadata$descriptive_stats$genome_stat$mean_cov>10]

#filter CpGs 
data_mask <- mask_methrix(m_subset , low_count = 2, high_quantile = 0.99)
m10masked <-coverage_filter(data_mask, cov_thr = 10, min_samples = 16)

#generate report and recalculate statistics
methrix_report(m10masked, output_dir=file.path(report.dir,"cov10" ), recal_stats = T)


**Error message:**
> _Step 1 of 5: Methylation/Coverage per chromosome  Error in merge.data.table(meth_stat, cov_stat, by = c("chr", "Sample_Name")) :    Elements listed in `by` must be valid column names in x and y In addition: Warning message: In merge.data.table(meth_stat, cov_stat, by = c("chr", "Sample_Name")) :   You are trying to join data.tables where 'x' and 'y' arguments are 0 columns data.table.
> --_

Display issue of plot legend titles in Methrix report

Hey man !
Very cool plotly report !
I don't know if I am the only one to experience this, but the legend titles in some plot are cut out of the display windows, see below (upper right corner of the plots):
methrix issue1
Methrix_issue2

I am using Chromium Version 80.0.3987.87 (Build officiel) Built on Ubuntu , running on Ubuntu 16.04 (64 bits). It occurs on full screen mode half-splitted screen mode.
The issue is also affecting the exported PNG file.

This is no emergency.
If you wait enough maybe I will just fix it myself :) .
I guess adjusting plot's margin should solve this (check it first, and if setting new margins solves it you could just provide a fix on the right side margin setting a value on separated plots, or using nchar() on the given color variable name multiplied by a factor n that keeps the margin large enough to display the full legend title doesn't matter it's length).
Cheers !

Using get_region_summary with very large methrix object

Hi, I am attempting to use get_region_summary with a very large methrix object (almost 400 human samples) and a GRanges object with about 80,000 regions and am finding that it is constantly using all the RAM on my machine before being killed. Even on a workstation with almost 200 GB RAM, the problem persists. I guess by using the n_chunks parameter I can reduce the RAM being used, but I am wondering is there a rough guide for how many chunks to use depending on the number of samples and number of regions being evaluated?

Bug when exporting bsseq objects from methrix objects created using Methyldackel pipeline

I've been having a great time using methrix over the past weeks. Thanks for creating such a nice package.

I've run into a bug when reading BedGraph files from Methyldackel using the pipeline = "MethyDackel preset and transforming the methrix file into a bsseq object using methrix2bsseq.

The resulting bsseq from methrix2bsseq will have non-integer M-Coverage values when retrieving them using:

meth <-  methrix::read_bedgraphs(n_threads = threads,
                                            files = bedgraph_fps,
                                            pipeline = "Methyldackel"
                                            ref_cpgs = mm10_cpgs,
                                            coldata = meta_data)
bsseq_obj <- methrix::methrix2bsseq(meth)                                
bsseq::getCoverage(bsseq_obj, type = M)

I've had a look at the code for methrix2bsseq and I think I've found the cause.

When using pipeline = "MethyDackel, read_bedgraph uses the Methyldackel calculated beta values, however these are rounded to the 2nd decimal place.
Example(4th column):

track type="bedGraph" description="SRR1182519.sorted CpG methylation levels"
1	25114	25115	66	2	1
1	25115	25116	100	3	0

methrix2bsseq calculates the M-Matrix of methylated reads by multiplying with the coverage, however using these rounded beta values will cause non-integer values to be inserted in the M-Coverage Matrix.

My workaround for now is setting the read_bedgraph fields manually since then methrix calculates exact beta values itself.
Then everything works out

methrix::read_bedgraphs(n_threads = threads,
                                           files = smoothed_bsseq_files,
                                           chr_idx = 1, start_idx = 2, end_idx = 3, M_idx =5, U_idx = 6,
                                           ref_cpgs = mm10_cpgs,
                                           coldata = meta_data,
                                            
  )

Maybe it would make sense having having methrix calculate the Beta values when reading from Methyldackel bedgraphs?

Error while removing uncovered CpGs

meth <- methrix::remove_uncovered(m = meth)
Error in as.vector(x, "list") :
cannot coerce type 'closure' to vector of type 'list'
traceback)(
Error: unexpected ')' in "traceback)"
traceback()
No traceback available

Show which CpGs are missing when loading bedGraphs

The loading report shows first a breakdown of CpGs by "raw", "filtered", "stranded". Downstream there is a "Processing" section, where also "CpGs missing" numbers are given after processing each bedGraph. It is unclear to which number from the first section (raw/filtered/stranded) the missing number should be related. Perhaps specifying this in parenthesis along with percentages could be useful.

Suggestions for example dataset

Hello @tkik @MaxSchoenung @HeyLifeHD @lutsik ,

As a part of the package it would be nicer to have an example data for testing/demonstration. Bsseq has BS.chr22 which can be loaded with the command data(BS.chr22). In methrix I included a part of single cell dataset from bsmap protocol - data(mm9_bsmap). However @tkik suggested that this dataset is quite "boring" and we should use something more robust - and I agree.

Two options:

  1. We convert the dataset bundled with bsseq to methrix object and use it.
  2. We make our own custom dataset from well studied in-house data.

If you opt for 2, we should make sure that we have at-lease two samples while keeping the object size small. For example, bsseq data is quite small (3.8MB) with ~50K loci and 2 samples. Bioconductor has a package size limit of 5 MB and the core team is quite strict in this regard.

> data(BS.chr22)
> BS.chr22
An object of type 'BSseq' with
  494728 methylation loci
  2 samples
has not been smoothed
All assays are in-memory
> print(object.size(x = BS.chr22), units = "MB")
3.8 Mb

Suggestions welcome, and please post here if you have an example set in mind.

Error while masking CpGs based on coverage

meth <- methrix::mask_methrix(m = meth, low_count=5, high_quantile = 0.99)
-Masked 5,498,862 CpGs due to low coverage.
Error in [<-(*tmp*, row_idx, value = NA) :
linear subassignment to a DelayedArray object 'x' (i.e. 'x[i] <-
value') is only supported when the subscript 'i' is a logical
DelayedArray object of the same dimensions as 'x' and 'value' an
ordinary vector of length 1)
traceback()
4: stop(wmsg(.linear_subassign_error_msg))
3: [<-(*tmp*, row_idx, value = NA)
2: [<-(*tmp*, row_idx, value = NA)
1: methrix::mask_methrix(m = meth, low_count = 5, high_quantile = 0.99)

remove_snps() - Error in h(simpleError(msg, call))

After loading a methrix object from a RDS file ( readRDS() ), when I tried to remove SNPs this error message shown up (partially translated from French):

Error in h(simpleError(msg, call)) : 
  evaluation error of the argument 'x' when selecting a method for 'unique' function: evaluation error of the argument 'x' when selecting a method for 'unique' function: evaluation error of the argument 'x' when selecting a method for 'which' function : Sequence names M in 'ranges' not present in reference genome hs37d5.

For some reason the error for unique() is printed 2 times. The 3 errors are concatenated (without carriage returns). The last part of the error message sounds like an attempt to query M chromosome on the annotation MafDb.1Kgenomes.phase3.hs37d5.

Here is the complete code:

#Load library
library(methrix)
library(BSgenome.Hsapiens.UCSC.hg19)

#Example bedgraph files 2 cancer samples and 2 normal samples
bdg_files <- list.files(
  path = system.file('extdata', package = 'methrix'),
  pattern = "*bedGraph\\.gz$",
  full.names = TRUE
)

#Generate some sample annotation table
sample_anno <- data.frame(
  row.names = gsub(
    pattern = "\\.bedGraph\\.gz$",
    replacement = "",
    x = basename(bdg_files)
  ),
  Condition = c("cancer", 'cancer', "normal", "normal"),
  Pair = c("pair1", "pair2", "pair1", "pair2"),
  stringsAsFactors = FALSE
)

#First extract genome wide CpGs from the desired reference genome
hg19_cpgs <- methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")

meth <- readRDS(file = "methrix_object.RDS")

#Generate report
methrix::methrix_report(meth = meth, output_dir = "~/methrix_tuto/")

# #Remove uncovered locis
# meth = methrix::remove_uncovered(m = meth)

if(!require(MafDb.1Kgenomes.phase3.hs37d5)) {
  BiocManager::install("MafDb.1Kgenomes.phase3.hs37d5")}
if(!require(GenomicScores)) {
  BiocManager::install("GenomicScores")}

library(MafDb.1Kgenomes.phase3.hs37d5)
library(GenomicScores)

#Remove SNPs
meth_snps_filtered <- methrix::remove_snps(m = meth)

The error only happens for the last command.
I cannot reproduce the error when the methrix object is generated with read_bedgraphs(), instead of being loaded from a RDS file.

Let me know if you need anything else for the issue.
Best,

Import of the bed files should be more generic

Thanks Anand, that is a great start!

I still think that the import of bed files should be a bit more structured. What I mean is that, no matter what exact format the bed has, the information which they provide is roughly the same (chromosome, start, end, strand, number of methylated, number of unmethylated, mean methylation, total coverage, anything else?). We could therefore first could try to map each input column to the set of predefined columns, and then write a common read-in procedure that does not depend on the format anymore. Back then I tried to do it this way in RnBeads, but not sure about my code anymore. Please do look/steal/ stuff from there (dataImport.R).

Error using HDF5 backend

Hi,

I used HDF5 backend due to memory issues. I firstly tried to save a RDS file of the output from methrix::read_bedgraphs, but it gives me error that name = '/private/var/folders/0r/35wl9njn767_c2md3zy0h61w0000gn/T/Rtmpp14UbV/M_sink_1.h5', errno = 2, error message = 'No such file or directory'. I checked that there is the file named M_sink_1.h5 in the directory so I don't know what happened here. Also, I tried to remove the h5 file and rerun the code, it gives error that Error in DelayedArray::write_block(block = as.matrix(b$bdg[, .(beta)]), :
unused argument (sink = M_sink). Could you help me? :)

Thanks

Error in methrix report with subsetted methrix object

methrix_report(m10masked, "/C010-projects/Pathway_27/SGBS_hydroxymethylation/WGoxBS/QC/methrix_reports_filtered2", recal_stats=T)

error message only occures if recal_stats=T:

Step 1 of 5: Methylation/Coverage per chromosome
Error in merge.data.table(meth_stat, cov_stat, by = c("chr", "Sample_Name")) :
Elements listed in by must be valid column names in x and y
In addition: Warning message:
In merge.data.table(meth_stat, cov_stat, by = c("chr", "Sample_Name")) :
You are trying to join data.tables where 'x' and 'y' arguments are 0 columns data.table.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.