fastEmu

fastEmu is an R package for estimating changes in the abundance of microbial categories (taxa, genes, etc.) associated with covariates using high-throughput sequencing data. fastEmu is a wrapper for the package radEmu that provides a fast version of radEmu's hypothesis testing procedure.

Check out radEmu for a full list of reasons to use this method for your differential abundance analysis. Some highlights include:

  • radEmu uses high-throughput sequencing data to estimate changes in the "absolute abundance" of categories
  • radEmu doesn't require data transformations, a reference taxon, or pseudocounts
  • radEmu is robust to differential sampling depths and differential detection of categories (taxa, genes, etc.)
  • radEmu has good Type I error control, including in small samples and in data generated from a range of distributions

However, hypothesis testing in radEmu can be slow, especially when there is a large number of categories. This is why we created fastEmu! fastEmu uses a simplified model to test the same parameters* tested in radEmu, but much faster.

What parameters are we estimating exactly?

*We don't estimate exactly the same parameters as in radEmu. In radEmu, we estimate the log fold difference in abundance associated with the covariate for each category relative to the typical log fold difference in abundance associated with the covariate across all categories. This lets us pick out the categories that are the most differentially abundant compared to the overall set of categories. In fastEmu, we cannot compare to the typical log fold difference across all categories, because we need to reduce the size of our model in order to run it quickly. Instead, we compare to the typical log fold difference in a subset of categories. The smaller the subset, the faster the test will run.
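
The two targets can be sketched in symbols. The notation here is introduced only for illustration and is not taken from either package: $\beta_j$ is the log fold difference for category $j$, $J$ is the number of categories, $S$ is the chosen subset, and $g$ is some function that returns a "typical" value of its arguments.

```latex
\begin{aligned}
\text{radEmu:}\quad  & \beta_j - g(\beta_1, \dots, \beta_J) \\
\text{fastEmu:}\quad & \beta_j - g(\{\beta_i : i \in S\}), \qquad S \subset \{1, \dots, J\}
\end{aligned}
```

When the typical value over $S$ is close to the typical value over all $J$ categories, the two quantities are close as well.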

We suggest choosing a meaningful subset if there is one in your analysis. For example, when we run differential abundance analyses for KEGG Orthologies (KOs) that represent genes, we compare to the typical log fold difference in KOs that encode ribosomal proteins. We do this because we expect these KOs to change very little across any covariate. If you have something similar in your analysis, you can choose that. Otherwise, you can choose an arbitrary subset and still approximate the parameters from radEmu pretty well!
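
As a minimal sketch of what choosing a subset means in practice, the base R snippet below builds a vector of column indices for a reference subset and shows how the reference value shifts when it is computed over the subset instead of all categories. Everything in it is hypothetical: the category names, the log fold differences (made-up numbers, not estimates), and the use of the median as a stand-in for "typical". It also assumes that `constraint_cats` expects column indices of `Y`, so check the function documentation before relying on that.

```r
# Hypothetical category names for the columns of a count matrix.
category_names <- c("ribo_1", "gene_A", "ribo_2", "gene_B", "gene_C", "ribo_3")

# Hypothetical categories expected to change little across the covariate.
ribosomal_names <- c("ribo_1", "ribo_2", "ribo_3")

# Column indices of the reference subset.
ribosomal_subset <- which(category_names %in% ribosomal_names)

# Made-up log fold differences, one per category (illustration only).
log_fold_diff <- c(0.1, 2.0, -0.1, 1.5, -0.8, 0.0)

# "Typical" value summarised with the median as a stand-in.
typical_all <- median(log_fold_diff)
typical_subset <- median(log_fold_diff[ribosomal_subset])

# Each category's log fold difference relative to the two references.
relative_to_all <- log_fold_diff - typical_all
relative_to_subset <- log_fold_diff - typical_subset
```

In this toy example the two reference values differ only slightly, so `relative_to_all` and `relative_to_subset` tell nearly the same story.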

Installation

To install fastEmu from GitHub, use the code below.

# install.packages("devtools")
devtools::install_github("statdivlab/fastEmu")
library(fastEmu)

Use

The vignettes demonstrate example usage of the main functions. Please file an issue if you have a request for a tutorial that is not currently included.

The following code uses radEmu to estimate log fold differences in abundance for all genes in $Y$ relative to the typical log fold difference in a subset of genes that encode ribosomal proteins. It then uses fastEmu to run a fast robust score test of whether the fifth category (`j = 5` in `test_kj`) has a log fold difference in abundance associated with the treatment that differs from the typical log fold difference associated with the treatment in the set of ribosomal genes.

emu_test <- fastEmuTest(constraint_cats = ribosomal_subset, 
                        formula = ~ treatment,
                        data = sample_data, 
                        Y = count_data,
                        test_kj = data.frame(k = 2, j = 5))
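
The `k = 2` above can be read against the design matrix implied by the formula. The base R sketch below uses hypothetical sample metadata and assumes that `k` indexes the columns of that design matrix (intercept first), following radEmu's convention. It also assumes that `test_kj` may contain several rows, one per test, which should be confirmed in the function documentation.

```r
# Hypothetical sample metadata with a two-level treatment covariate.
sample_data <- data.frame(
  treatment = factor(c("control", "control", "treated", "treated"))
)

# Design matrix implied by the formula ~ treatment.
design <- model.matrix(~ treatment, data = sample_data)

# Column 1 is the intercept; column 2 is the treatment effect.
k_treatment <- which(colnames(design) == "treatmenttreated")

# One row per (covariate, category) pair to test; here categories 5, 8 and 12.
test_kj <- data.frame(k = k_treatment, j = c(5, 8, 12))
```

Looking `k` up by column name, as done here, avoids hard-coding an index that would change if covariates were added to the formula.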

Documentation

We additionally have a pkgdown website that contains pre-built versions of our function documentation and our vignettes (coming soon).

Citation

If you use fastEmu for your analysis, check back in soon for a preprint!

Bug Reports / Change Requests

If you encounter a bug or would like to make a change request, please file it as an issue here.

If you're a developer, we would love to review your pull requests.