csbiology / tmea

Thermodynamically Motivated Enrichment Analysis (TMEA) is a new approach to gene set enrichment analysis.

Home Page: https://www.mdpi.com/1099-4300/22/9/1030

License: MIT License

F# 99.39% Batchfile 0.09% Shell 0.13% CSS 0.39%
gsea genes biostatistics bioinformatics f-sharp enrichment-score deedle plotly

tmea's Introduction

This repository is the home of the TMEA (Thermodynamically Motivated Enrichment Analysis) framework, which we created from the scripts used in our 2020 Entropy paper.

Find the authors on GitHub: Kevin Schneider (1), Benedikt Venn (1), Timo Mühlhaus

  • (1) : These authors contributed equally.

If you use this package in your research, please cite it. Citation formats are available on the original article page.

Alternatively, here is an example citation:

Schneider K, Venn B, Mühlhaus T. TMEA: A Thermodynamically Motivated Framework for Functional Characterization of Biological Responses to System Acclimation. Entropy. 2020; 22(9):1030.

This package is in an early beta stage, so there may be bugs. Issues and PRs are greatly appreciated!

Introduction

The objective of gene set enrichment analysis (GSEA) in modern biological studies is to identify functional profiles in huge sets of biomolecules generated by high-throughput measurements of genes, transcripts, metabolites, and proteins. GSEA is based on a two-stage process: classical statistical analysis scores the input data, and the enrichment score is then tested for overrepresentation within a given functionally coherent set. However, enrichment scores computed by different methods are merely statistically motivated and often elude direct biological interpretation.

Here, we propose a novel approach, called Thermodynamically Motivated Enrichment Analysis (TMEA), to account for the energy investment in biologically relevant processes. To this end, TMEA is based on surprisal analysis, which offers a thermodynamic, free-energy-based representation of the biological steady state and of the biological change. The contribution of each biomolecule underlying the changes in free energy is used in a Monte Carlo resampling procedure, resulting in a functional characterization that is directly coupled to the thermodynamic characterization of biological responses to system perturbations.
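
The resampling step can be illustrated with a generic Monte Carlo test. The following F# sketch is not the library's implementation: it assumes that the set statistic is the plain sum of per-entity weights and that random sets of equal size are drawn without replacement. The names monteCarloPValue, allWeights, and setIndices are hypothetical.

```fsharp
open System

/// Empirical p-value for the summed weight of one annotated set, estimated by
/// comparing it against random sets of the same size (illustrative assumption,
/// not necessarily the statistic used by TMEA).
let monteCarloPValue (rng: Random) (iterations: int) (allWeights: float[]) (setIndices: int[]) =
    let observed = setIndices |> Array.sumBy (fun i -> allWeights.[i])
    let n = allWeights.Length
    let k = setIndices.Length
    // Draw k distinct indices with a partial Fisher-Yates shuffle and sum their weights.
    let sampleSum () =
        let idx = Array.init n id
        let mutable total = 0.0
        for j in 0 .. k - 1 do
            let r = rng.Next(j, n)
            let tmp = idx.[j]
            idx.[j] <- idx.[r]
            idx.[r] <- tmp
            total <- total + allWeights.[idx.[j]]
        total
    let atLeastAsExtreme =
        Seq.init iterations (fun _ -> sampleSum ())
        |> Seq.filter (fun s -> s >= observed)
        |> Seq.length
    // The add-one correction avoids reporting an impossible p-value of 0.
    float (atLeastAsExtreme + 1) / float (iterations + 1)

// Toy weights: entities 0 and 1 carry most of the weight.
let weights = [| 0.9; 0.8; 0.1; 0.05; 0.02; 0.01; 0.03; 0.04 |]
let pHigh = monteCarloPValue (Random(42)) 1000 weights [| 0; 1 |]
let pLow = monteCarloPValue (Random(42)) 1000 weights [| 4; 5 |]
```

In this toy example, pLow is exactly 1.0 because no pair of entities can sum to less than the observed set, while pHigh is small because only one of the 28 possible pairs reaches the observed sum.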

Installation

For instructions on how to install F#, please head here (Windows), here (macOS), or here (Linux).

The package itself is available on NuGet: https://www.nuget.org/packages/TMEA

Alternatively, clone this repo and run fake.cmd or fake.sh (requires .NET SDK >= 3.1.302).

Usage

  1. Add the LAPACK folder to your PATH variable, either for the fsi session or at the system level. The folder is located in the NuGet package under ./Netlib_LAPACK

  2. Reference this library and its dependencies.

  3. We strongly recommend registering fsi printers for Deedle, the data frame library used in this project. The Deedle NuGet package contains a Deedle.fsx file that takes care of this when you #load it.

  4. A simple pipeline to perform TMEA on time series data looks like this:

    open TMEA
    open TMEA.SurprisalAnalysis
    open TMEA.MonteCarlo
    open TMEA.Frames
    open TMEA.Plots
    
    let annotationMap : Map<string,string[]> = ... // We assume you have ontology annotations for your dataset
    
    let tmeaRes = 
        IO.readDataFrame 
            "TranscriptIdentifier" // The column of the data table that contains your entity identifiers
            "\t" // separator for the input file
            "path/to/your/raw/data.txt"
        |> Analysis.computeOfDataFrame 
            Analysis.standardTMEAParameters //using custom parameters you can change verbosity, bootstrap iterations, and the annotation used for unannotated entities
            annotationMap
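
Two preparatory pieces of the steps above can be sketched in plain F#: extending PATH for the current fsi session only (step 1), and building a value of type Map<string,string[]> such as annotationMap. This is a minimal sketch under assumptions: lapackDir is a hypothetical location that you need to point at the Netlib_LAPACK folder of your local package copy, buildAnnotationMap is a hypothetical helper, and the map is assumed to go from entity identifier to annotation terms (check the library for the expected direction).

```fsharp
open System
open System.IO

// Step 1: prepend a folder to PATH for the current process (e.g. an fsi session).
let prependToPath (dir: string) =
    let current = Environment.GetEnvironmentVariable("PATH")
    let updated =
        if String.IsNullOrEmpty current then dir
        else dir + string Path.PathSeparator + current
    Environment.SetEnvironmentVariable("PATH", updated)

// Assumption: hypothetical folder next to this script; adjust to your package location.
let lapackDir = Path.Combine(__SOURCE_DIRECTORY__, "Netlib_LAPACK")
prependToPath lapackDir

// Collect (identifier, annotation) pairs into identifier -> distinct annotations.
let buildAnnotationMap (pairs: seq<string * string>) : Map<string, string[]> =
    pairs
    |> Seq.groupBy fst
    |> Seq.map (fun (id, group) ->
        id, group |> Seq.map snd |> Seq.distinct |> Seq.toArray)
    |> Map.ofSeq

// Made-up identifiers and terms, for illustration only.
let exampleMap =
    buildAnnotationMap [
        ("gene1", "signalling.light")
        ("gene1", "photosynthesis")
        ("gene2", "signalling.light")
    ]
```

The PATH change made this way only lasts for the running process, which is usually what you want for an interactive session.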

Plots

All charting functions are extension methods of the TMEAResult type. Every plot* function has a generate* analog, which generates the Chart object without rendering it (in case you want to fine-tune styles etc.). Given the example script above, the following plots are currently provided by the package:

Functionally Annotated Set (FAS) weight distributions:

  • plotFASWeightDistribution is an exploratory plot that visualizes the overall weight distributions of the given TMEA characterizations and overlays the detailed weight distributions of the FAS of interest. Additionally, annotations on the respective subplots show useful information about the FAS characterization.

    tmeaRes
    |> TMEAResult.plotFASWeightDistribution 
        true //use style presets
        0.05 //significance threshold for (corrected!) p values
        [1;2;3] //constraints to plot
        "signalling.light" //name of the FAS

Potential Time Course:

  • plotConstraintTimecourses plots the constraint potential time courses of the given TMEA result:

    tmeaRes
    |> TMEAResult.plotConstraintTimecourses true //true -> will use style presets

  • plotPotentialHeatmap is a more visually pleasing version of the above plot (it omits the baseline state by default):

    tmeaRes
    |> TMEAResult.plotPotentialHeatmap true

Free Energy Landscape:

  • plotFreeEnergyLandscape plots the free energy landscape of the TMEA result:

    tmeaRes
    |> TMEAResult.plotFreeEnergyLandscape true
    

Constraint importance:

  • plotConstraintImportance: given the TMEA result, plots the singular values of all constraints (except the baseline state) and the 'importance loss' between them.

    tmeaRes
    |> TMEAResult.plotConstraintImportance true
    

Data recovery:

  • plotDataRecovery: given the TMEA result, plots the gradual reconstruction of the original data when using only n (in the example below, n = 3) constraints from the given TMEA result:

    tmeaRes
    |> TMEAResult.plotDataRecovery true 3 
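
The idea behind such a data recovery plot can be sketched independently of the library. The example below assumes the usual surprisal analysis decomposition, in which each (log-transformed) data point is a sum over constraints of an entity weight times a constraint potential; sign conventions are assumed to be absorbed into the matrices. The names reconstruct, maxAbsError, g, and lambda are hypothetical and not part of the TMEA API.

```fsharp
/// Rank-n reconstruction: entry (i, t) is the sum over the first n constraints
/// of g.[i].[a] * lambda.[a].[t].
let reconstruct (n: int) (g: float[][]) (lambda: float[][]) : float[][] =
    let timePoints = lambda.[0].Length
    g
    |> Array.map (fun row ->
        Array.init timePoints (fun t ->
            Seq.init n (fun a -> row.[a] * lambda.[a].[t]) |> Seq.sum))

/// Largest absolute deviation between two matrices of equal shape.
let maxAbsError (a: float[][]) (b: float[][]) =
    Array.map2 (Array.map2 (fun (x: float) (y: float) -> abs (x - y))) a b
    |> Array.collect id
    |> Array.max

// Toy example: 3 entities, 2 constraints, 2 time points.
let g = [| [| 1.0; 0.5 |]; [| 2.0; -0.5 |]; [| 3.0; 0.0 |] |]
let lambda = [| [| 1.0; 2.0 |]; [| 0.2; -0.2 |] |]

// Reconstruction using all constraints serves as the reference.
let full = reconstruct 2 g lambda

// Error of the reconstruction when only the first n constraints are used.
let errors = [ for n in 1 .. 2 -> n, maxAbsError full (reconstruct n g lambda) ]
```

The error shrinks as more constraints are included and vanishes once all of them are used, which is the behavior a data recovery plot makes visible.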
    

TMEA.Dash

TMEA.Dash is a guided analytics application for TMEA using Dash.NET.

Usage

  • Clone this repository

  • Install the .NET SDK (>= 3.1.302)

  • In a shell, navigate to src/TMEA.Dash

  • Use dotnet run to start the application. Open a browser and head to https://localhost:5001/

  • You should now see the TMEA.Dash interface.

License acknowledgments

This library contains Netlib LAPACK binaries compiled from source. Thanks to all of its authors:

Anderson, E. and Bai, Z. and Bischof, C. and Blackford, S. and Demmel, J. and Dongarra, J. and
Du Croz, J. and Greenbaum, A. and Hammarling, S. and McKenney, A. and Sorensen, D.

tmea's Issues

Signal standardization

When handling time series, would it be beneficial to standardize the signals prior to TMEA (e.g. as log2 fold changes relative to the first time point)?

I think in unnormalized data an amplitude change of 21 -> 23 has the same impact as a change from 1 -> 3, whereas the latter could be weighted more to better reflect biological relevance.
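
The suggested standardization can be illustrated with the two amplitude changes mentioned above. This is a minimal sketch, assuming strictly positive signals; log2FoldChanges is a hypothetical helper and not part of the package.

```fsharp
open System

/// log2 fold changes of a time series relative to its first time point.
/// Assumes all values are strictly positive.
let log2FoldChanges (signal: float[]) =
    let reference = signal.[0]
    signal |> Array.map (fun x -> Math.Log(x / reference, 2.0))

// Same absolute change (+2), very different relative change.
let highBaseline = log2FoldChanges [| 21.0; 23.0 |] // second value is about 0.13
let lowBaseline = log2FoldChanges [| 1.0; 3.0 |]    // second value is about 1.58
```

After this transformation, the change from 1 to 3 carries roughly twelve times the weight of the change from 21 to 23, instead of the same weight.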

Add more dev tooling

Add the following tools for productivity/reproducibility

  • Addition of unit tests
  • Test coverage reports
  • Build targets for releasing the package

Feature roadmap for TMEA

While this package already contains everything needed for performing TMEA, there are many things that can and should be ported from our internal analysis scripts that we used to prepare our paper. Users of this library can expect the following features to be added over time:

General:

  • better result types as input for various functions (added via cc32d39)
  • namespace cleanup (shorter and more concise namespaces) (added via cc32d39)

Plots:

  • Visualizations for Surprisal Analysis results: constraint potential time course and free energy landscape (added via 121138f)
  • Surprisal Analysis: constraint potential heatmap (added via 1be3084)
  • Surprisal Analysis: plots for supporting selection of constraints that are important (visualizing Singular Values and data reconstruction) (added via eef8d55)
  • Analytical plots for the TMEA result, e.g. p value distribution
  • Set comparison (either via venn or chord diagrams, maybe add upset plot support)
  • Weight distributions of single entities in the dataset across constraints (added via c8dccac)

Frames:

  • Frames suitable for fast comparison of significant entities in the TMEA result

Additional dev tooling (moved to #2)

Add convenience functions for TMEA results

Convert and save different parts of the result as frames, e.g. potentials only, constraints only, etc.

Most important: save full result in a format that can be consumed again to prevent unnecessary runs of the pipeline

1.0.0 Roadmap

Here are the features and respective issues that are planned for 1.0.0.

Features

Core lib

  • Convenience functions for handling TMEAResults (#5)
  • More helpers to read/correctly transform ontology maps (#6)
  • Optional parameters for all plotting functions to increase customizability without much Plotly.NET knowledge (#7)

DashApp

  • export results (#3)
  • Control TMEA analysis parameters (#8)
  • Host DashApp on a free Heroku instance
  • Containerize DashApp as a Docker container with an IIS/ASP.NET Core base image

Tooling/accessibility

  • dotnet CLI tool for the full pipeline
  • Docker container containing above CLI tool with the correct env variables
  • Basic tests (Correct output format, reproducible results (to a certain degree, there are some heuristics included))
  • Better docs
