
methylize's People

Contributors

jaredmeyers, marcmaxson, nhrigby, viabard

methylize's Issues

support differentially methylated probe (DMP) EWAS COVARIATES

INPUTS
COVARIATES: probe plus experiment covariates to control for (in the linear/logistic reg model)
linear_model( probe_dataset, phenotype, covariates (from phenotype dataset) )

  • presumably these are in the meta_data object created by methylprep, and the user specifies which columns of the dataframe to use.
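As a rough illustration of what "controlling for covariates" means per probe, here is a minimal sketch of an ordinary least squares fit with the phenotype plus covariates as predictors. The function name `probe_ols` and its arguments are hypothetical and are not part of methylize; the sketch assumes numeric phenotype and covariate columns and one probe's M-values across samples.

```python
import numpy as np
from scipy import stats


def probe_ols(m_values, phenotype, covariates):
    """Fit m_values ~ intercept + phenotype + covariates for a single probe.

    Hypothetical helper for illustration only; not the methylize API.
    """
    y = np.asarray(m_values, dtype=float)
    # Design matrix: intercept, phenotype (column 1), then the covariates.
    X = np.column_stack([np.ones(len(y)), phenotype, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ beta
    dof = len(y) - X.shape[1]            # residual degrees of freedom
    sigma2 = resid @ resid / dof         # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    coef, coef_se = beta[1], se[1]       # report the phenotype term only
    t_stat = coef / coef_se
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    half_width = stats.t.ppf(0.975, dof) * coef_se
    return {
        "coef": coef,
        "se": coef_se,
        "ci_lower": coef - half_width,
        "ci_upper": coef + half_width,
        "p_value": p_value,
    }
```

Running this once per probe would yield the coefficient, SE, 95% CI and p-value columns listed under Outputs; the covariates would be whichever metadata columns the user selects.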

Outputs

Table of associations (CSV file)
Rows = probes
Columns = ...

  • regression coefficient
  • SE
  • 95% Confidence Interval (Upper Limit)
  • 95% Confidence Interval (Lower Limit)
  • p-value
  • FDR
  • Chromosome
  • Genomic coordinate
  • Gene (if probe found in a gene)
  • RefGene Group (e.g., TSS1500, Body, 5'UTR, Exon 1)
  • Relation to CpG Islands (e.g., N_Shore, S-Shelf, Island)
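For the FDR column, one common choice is the Benjamini-Hochberg adjustment. The issue does not say which procedure is intended, so the following is only a sketch under that assumption; the function name is illustrative.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)    # p * n / rank
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)           # restore original order
    return q
```

Probes with a q-value below 0.05 would then be the ones passing the FDR<0.05 threshold mentioned for the plots.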

Volcano plot

  • x-axis = regression coefficient
  • y-axis = -log10(pvalue)
  • Draw horizontal lines at genome-wide significance levels (p=5E-8 and FDR<0.05)

Manhattan plot

  • Color coded by chromosome
  • Draw horizontal lines at genome-wide significance levels (p=5E-8 and FDR<0.05)

QQ plot

  • Print 'lambda' value on plots
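The 'lambda' printed on a QQ plot is usually the genomic inflation factor: the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null. A minimal sketch, assuming the input is the vector of per-probe p-values (the function name is hypothetical):

```python
import numpy as np
from scipy import stats


def genomic_inflation(p_values):
    """Estimate the genomic inflation factor (lambda) from p-values."""
    p = np.asarray(p_values, dtype=float)
    p = p[~np.isnan(p)]                           # ignore probes with no p-value
    chi2_observed = stats.chi2.isf(p, df=1)       # p-value -> chi-square statistic
    chi2_expected_median = stats.chi2.ppf(0.5, df=1)
    return np.median(chi2_observed) / chi2_expected_median
```

A value close to 1 suggests little inflation; values well above 1 are commonly read as a sign of confounding or unmodelled structure.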

This site may be helpful for the Manhattan plots and QQ plots:
http://www.gettinggeneticsdone.com/2014/05/qqman-r-package-for-qq-and-manhattan-plots-for-gwas-results.html

DMP features in v1.0

  • get refGene and UCSC database integration working (v1.0)
  • DMR in methylize: miss-methyl go-meth() (est 8h)
  • look at MissMethyl. Can we integrate it? https://fuma.ctglab.nl and missMethyl::gometh
  • make sure you can export BigWig to other gene mapping tools, and keep the gene association networks outsourced as part of the usage, but still as easy to integrate with as possible. And be sure your code has an academic paper trail of citations.
  • Nichole: https://github.com/tanghaibao/goatools/tree/main/notebooks
  • look into DMR functions in methylprep in python
    https://github.com/ListerLab/HOME/blob/master/HOME/HOME_functions.py
  • FUTURE methylize unit EWAS equivalence test: compare DMP (diff meth pos) and DMR(bumphunter) with R?
    1 - need sex, age, sample type, and (treatment vs control) paradigm binary (log regress)
    2 - continuum (blood pressure as predictor with linear regression), so BP with age.
    3 - at least one other covariate (age)
    or use blood pressure with a cutoff to become a binary outcome for log-regress.
  • not including bumpHunter (was too diff from later versions avail in python)
  • Allow users to select between the latest two genome references from Ensembl (hg38 and hg19).
  • combined-pvalues
  • cruzdb or equivalent gene-mapping tool, or some way to export data into a mapper with good documentation / demos on how to do this.

Conversion from Beta to M values not working.

I think I have a workaround for this, so this is more of a heads up than an issue that needs solving at this point.

I have a data frame holding beta values. When I put that into methylize.diff_meth_pos(df, phenotype) it breaks, giving me a math domain error.

/data/projects/classifiers/src/exploration/methylation/differentialMethylation.ipynb Cell 17 in ()
----> 1 methylize.diff_meth_pos(meth_data, phenotype)

File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/methylize/diff_meth_pos.py:283, in diff_meth_pos(meth_data, pheno_data, regression_method, impute, **kwargs)
    281     def beta2m(val):
    282         return math.log2(val/(1-val))
--> 283     meth_data = meth_data.apply(np.vectorize(beta2m))
    284     if verbose: LOGGER.info(f"Converted your beta values into M-values; {meth_data.shape}")
    286 # Check that the methylation and phenotype data correspond to the same number of samples; flip if necessary

File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/pandas/core/frame.py:9568, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   9557 from pandas.core.apply import frame_apply
   9559 op = frame_apply(
   9560     self,
   9561     func=func,
   (...)
   9566     kwargs=kwargs,
   9567 )
-> 9568 return op.apply().__finalize__(self, method="apply")

File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/pandas/core/apply.py:764, in FrameApply.apply(self)
    761 elif self.raw:
    762     return self.apply_raw()
...
File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/methylize/diff_meth_pos.py:282, in diff_meth_pos.<locals>.beta2m(val)
    281 def beta2m(val):
--> 282     return math.log2(val/(1-val))

ValueError: math domain error

There doesn't appear to be anything wrong with the data. If I do the following (which replicates the beta2m() function):

df = meth_data.apply(lambda x: x / (1 - x))
df = df.applymap(lambda x: np.log2(x))

and plug the resulting df into diff_meth_pos, it processes that df quite happily. It does, however, give me two "Warning: invalid value encountered in subtract" warnings and then tells me that "No DMPs were found within the q < 1 (the significance cutoff level specified)".

I am assuming that's something else though.
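One possible explanation, which the report does not confirm: `math.log2` raises `ValueError: math domain error` for an input of 0, whereas `np.log2` returns `-inf` with only a warning. If the beta values include exact 0s or 1s, the NumPy workaround would therefore run but produce infinite M-values, which could in turn account for the "invalid value encountered in subtract" warnings. The sketch below clips betas away from 0 and 1 before converting; the `eps` value and the probe IDs are arbitrary assumptions.

```python
import math

import numpy as np
import pandas as pd


def beta_to_m(betas: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert beta values to M-values, clipping betas into [eps, 1 - eps]."""
    clipped = betas.clip(lower=eps, upper=1 - eps)
    return np.log2(clipped / (1 - clipped))


# Hypothetical beta values containing an exact 0 and an exact 1.
betas = pd.DataFrame(
    {"sample_1": [0.0, 0.5], "sample_2": [1.0, 0.25]},
    index=["cg_a", "cg_b"],
)

try:
    math.log2(0.0 / (1 - 0.0))           # what beta2m does for a beta of 0
except ValueError as err:
    print("math.log2 failed:", err)

with np.errstate(divide="ignore"):
    print("np.log2(0.0) =", np.log2(0.0))  # -inf instead of an exception

m_values = beta_to_m(betas)              # finite everywhere after clipping
```

Checking `(meth_data <= 0).any().any()` and `(meth_data >= 1).any().any()` on the original data frame would show whether this is what is happening.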

Genome-wide DMP analysis

Outputs

Table of associations (CSV file)
Rows = probes
Columns = ...

  • regression coefficient
  • SE
  • 95% Confidence Interval (Upper Limit)
  • 95% Confidence Interval (Lower Limit)
  • p-value
  • FDR

Volcano plot

  • x-axis = regression coefficient
  • y-axis = -log10(pvalue)
  • Draw horizontal lines at genome-wide significance levels (p=5E-8 and FDR<0.05)

Manhattan plot

  • Color coded by chromosome
  • Draw horizontal lines at genome-wide significance levels (p=5E-8 and FDR<0.05)

GSEA for EWAS

Incorporate this package into methylize:

https://github.com/aet21/ebGSEA

Citation:
https://www.ncbi.nlm.nih.gov/pubmed/30715212
https://academic.oup.com/bioinformatics/article/35/18/3514/5305022

Gene Set Enrichment Analysis (GSEA) is a general tool to aid biological interpretation, yet its correct and unbiased implementation in the EWAS context is difficult due to the differential probe representation of Illumina Infinium DNA methylation beadchips.

ebGSEA ranks genes, not CpGs, according to the overall level of differential methylation, as assessed using all the probes mapping to the given gene. ebGSEA may exhibit higher sensitivity and specificity.

diff_meth_pos is broken : TypeError: rv_generic.interval() missing 1 required positional argument: 'confidence'

Putting it here for the record. When trying to use diff_meth_pos, the following error can occur, seemingly in joblib:

TypeError: rv_generic.interval() missing 1 required positional argument: 'confidence'

Other frames may or may not appear in the traceback, depending on whether you launch the command from a Jupyter notebook.

To fix it, downgrade scipy to 1.10:

pip install --force-reinstall -v "scipy==1.10"
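The error message is consistent with SciPy having renamed the first parameter of `interval()` from `alpha` to `confidence`, so code that still passes `alpha=` as a keyword fails on newer releases. That methylize does this is an assumption based on the message, not something verified here. Passing the confidence level positionally should work with either name, as in this sketch (the coefficient, standard error and degrees of freedom are made-up values):

```python
from scipy import stats

# Hypothetical regression output for one probe.
coef, se, dof = 0.8, 0.2, 30

# Positional first argument: independent of the alpha/confidence keyword name.
lower, upper = stats.t.interval(0.95, dof, loc=coef, scale=se)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```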

OverflowError: timeout value is too large

I have encountered the following error when trying to generate differentially methylated regions as per the documentation:

files_created = methylize.diff_meth_regions(test_results2, '450k', prefix='../data/asthma/dmr/')

INFO:methylprep.files.manifests:Reading manifest file: HumanMethylation450k_15017482_v3.csv
INFO:methylprep.files.manifests:Reading manifest file: HumanMethylation450k_15017482_v3.csv
ERROR:methylize.diff_meth_regions:Traceback (most recent call last):
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 856, in next
    item = self._items.popleft()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\site-packages\methylize\diff_meth_regions.py", line 149, in diff_meth_regions
    results = _pipeline(kw['col_num'], kw['step'], kw['dist'],
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\site-packages\methylize\diff_meth_regions.py", line 296, in _pipeline
    putative_acf_vals = methylize.cpv.acf(bed_files, lags, col_num0, simple=False,
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\site-packages\methylize\cpv\acf.py", line 101, in acf
    for chrom_acf in imap(_acf_by_chrom, arg_list):
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\site-packages\toolshed\pool.py", line 31, in wrap
    return func(self, timeout=timeout or 1e8)
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 861, in next
    self._cond.wait(timeout)
  File "c:\Users\adams\AppData\Local\Programs\Python\Python310\lib\threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
OverflowError: timeout value is too large

ERROR:methylize.diff_meth_regions:timeout value is too large
ERROR:methylize.diff_meth_regions:Other/.fdr.bed.gz: [Errno 2] No such file or directory: 'Other/.fdr.bed.gz'
ERROR:methylize.diff_meth_regions:Other/.slk.bed.gz: [Errno 2] No such file or directory: 'Other/.slk.bed.gz'

I am working with Python 3.10 on a Windows 11 machine.
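The traceback shows `toolshed/pool.py` passing `timeout or 1e8` down to a lock acquire. Python exposes the largest timeout a lock accepts as `threading.TIMEOUT_MAX`, and exceeding it raises `OverflowError`. On Windows that limit is likely smaller than 1e8 seconds, which would explain why the error appears on this platform. The following is only a sketch of how such a timeout could be clamped; `clamp_timeout` is a hypothetical helper, not part of toolshed or methylize.

```python
import threading


def clamp_timeout(timeout=None, default=1e8):
    """Return a timeout that never exceeds what the platform's locks accept."""
    requested = timeout or default       # same fallback as in the traceback
    return min(requested, threading.TIMEOUT_MAX)


print("platform maximum:", threading.TIMEOUT_MAX)
print("clamped default: ", clamp_timeout())
```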

Unable to use pd.Series for phenotype

I believe I have a workaround for this, so this is more of a heads up than something that needs solving right now.

The documentation says that the phenotype can be provided as

- a list of strings,
- integer binary data,
- numeric continuous data
- pandas Series, DataFrame or numpy array

I'm using a linear regression - I've got 11 different cancer diagnoses in my dataset. I'm taking the phenotype data from a metadata dataframe. If I pass it in as a Series, it breaks - giving me "Could not understand your pheno_data". In the following, phenotype is a pandas Series, containing strings.

methylize.diff_meth_pos(df, phenotype)

ValueError                                Traceback (most recent call last)
/data/projects/classifiers/src/exploration/methylation/differentialMethylation.ipynb Cell 17 in ()
----> 1 methylize.diff_meth_pos(meth_data, phenotype)

File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/methylize/diff_meth_pos.py:210, in diff_meth_pos(meth_data, pheno_data, regression_method, impute, **kwargs)
    208         regression_method = 'linear'
    209     else:
--> 210         raise ValueError("Could not understand your pheno_data.")
    211 else:
    212     raise ValueError(f"pheno_data must be list-like, or if a DataFrame, specify the 'column' to use.")

ValueError: Could not understand your pheno_data.

It won't accept a pandas Series.
It won't accept a list of strings if I convert the series to a list.

It will accept it, and run if I map the strings to integers, i.e.:

unique_strings = phenotype.unique()
string_to_int_map = {string: i for i, string in enumerate(unique_strings)}
phenotype = [string_to_int_map[string] for string in phenotype]
results = methylize.diff_meth_pos(df, phenotype)

It was my understanding from the documentation that methylize would internally map strings to integers, but that doesn't appear to be working, if my understanding of it is correct.
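The string-to-integer mapping in the workaround can also be written with `pd.factorize`, which returns the integer codes and the unique labels in one call. This is a sketch with made-up diagnosis labels. Note that feeding arbitrary integer codes for 11 unordered categories into a linear regression treats them as an ordered numeric scale, which may not be what is wanted.

```python
import pandas as pd

# Hypothetical phenotype labels; the real Series comes from the metadata frame.
phenotype = pd.Series(["ALL", "AML", "ALL", "NBL"], name="diagnosis")

codes, labels = pd.factorize(phenotype)   # codes follow order of first appearance
pheno_codes = codes.tolist()              # plain Python ints, as in the workaround
code_to_label = dict(enumerate(labels))   # keep this to decode results later

print(pheno_codes)
print(code_to_label)
```

The resulting `pheno_codes` list can be passed to `methylize.diff_meth_pos(df, pheno_codes)` in the same way as the manually mapped list above.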

Cheers
Ben.
