kkdey / corshrink

R package for adaptive correlation and covariance matrix shrinkage.

License: GNU General Public License v3.0

Topics: covariance-matrix, covariance-shrinkage, shrinkage, ash, word2vec, missing-data

corshrink's Introduction

CorShrink

Kushal K Dey, Matthew Stephens.

License

Copyright (c) 2017-2018, Kushal Dey.

All source code and software in this repository are made available under the terms of the GNU General Public License. See the LICENSE file for the full text of the license.

Citing this work

If you find this R package useful for your work, please cite the following paper:

Dey, Kushal K. and Stephens, Matthew. CorShrink: Empirical Bayes shrinkage estimation of correlations, with applications. 2018. bioRxiv. Cold Spring Harbor Laboratory. doi: 10.1101/368316. https://www.biorxiv.org/content/early/2018/07/24/368316.full.pdf

Methods Overview

CorShrink is a companion package to the ashr package by Matthew Stephens (see paper). It adaptively shrinks the correlation between a pair of variables based on the number of pairwise complete observations. CorShrink can be applied to a vector or matrix of pairwise correlations, and it generalizes to quantities similar in nature to correlations, such as partial correlations, rank correlations and cosine similarities from a word2vec model. When applied to a data matrix, CorShrink learns an individual shrinkage intensity for each pair of variables from the number of missing observations for that pair, which allows the method to handle large-scale missing observations (a demo is presented in the example below).
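To make the idea of a pair-specific shrinkage intensity concrete, here is a minimal base R sketch. It is a simplification and not the CorShrink or ashr implementation: it assumes the textbook Fisher z approximation (standard error 1 / sqrt(n - 3)) and swaps in a single zero-centered normal prior with a fixed standard deviation in place of an adaptively fitted prior. The function name `shrink_cor` and the argument `prior_sd` are hypothetical.

```r
# Simplified illustration, NOT the CorShrink/ashr implementation:
# a single zero-centered normal prior stands in for the adaptive prior.
shrink_cor <- function(r, n, prior_sd = 0.5) {
  z <- atanh(r)                               # Fisher z-transform
  se <- 1 / sqrt(n - 3)                       # textbook approximate SE; needs n > 3
  weight <- prior_sd^2 / (prior_sd^2 + se^2)  # in (0, 1); smaller when n is small
  tanh(weight * z)                            # shrunken z mapped back to the correlation scale
}

# The same observed correlation, backed by different numbers of
# pairwise complete observations.
r <- rep(0.6, 3)
n <- c(5, 20, 200)
round(shrink_cor(r, n), 3)
```

The pair with only 5 complete observations is pulled strongly toward zero, while the pair with 200 is left almost unchanged.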

Quick Start

The instructions for installing the package are as follows.

For CRAN version:

install.packages("CorShrink")

For the development version:

library(devtools)
install_github("kkdey/CorShrink", build_vignettes = TRUE)

Then load the package with:

library(CorShrink)

An example of CorShrink usage is given below. For detailed examples and methods, check here.

We first load an example data matrix of expression values for a specific gene, with rows corresponding to individuals in the GTEx Project and columns corresponding to tissues. This data matrix has many missing observations, which correspond to tissue samples not contributed by an individual.

data("sample_by_feature_data")
sample_by_feature_data[1:5, 1:5]

           Adipose - Subcutaneous Adipose - Visceral (Omentum)
GTEX-111CU              10.472332                     10.84006
GTEX-111FC               7.335392                           NA
GTEX-111VG               9.118889                           NA
GTEX-111YS              10.806459                     11.26113
GTEX-1122O              11.040446                     11.71497
           Adrenal Gland Artery - Aorta Artery - Coronary
GTEX-111CU      2.721234             NA                NA
GTEX-111FC            NA             NA                NA
GTEX-111VG            NA             NA                NA
GTEX-111YS      3.454823       1.162059                NA
GTEX-1122O      1.522667       1.674467          4.188002

We use CorShrink to estimate the correlation matrix, taking the missing observations into account, and compare the result with the matrix of pairwise correlations computed from the complete observations for each pair of features.

out <- CorShrinkData(sample_by_feature_data, sd_boot = FALSE, image = "both",
                    image.control = list(tl.cex = 0.2))                            

[Figure: Structure Plot]
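The baseline in this comparison, the matrix of pairwise correlations from complete observations, can be computed in base R without the package. The sketch below uses a simulated toy matrix (the names `tissue1` to `tissue3` are made up) rather than the GTEx data, and also counts how many samples support each pair.

```r
# Toy stand-in for a samples-by-features matrix with missing entries.
set.seed(1)
x <- matrix(rnorm(50 * 3), nrow = 50,
            dimnames = list(NULL, c("tissue1", "tissue2", "tissue3")))
x[sample(50, 35), "tissue3"] <- NA   # tissue3 is observed in only 15 samples

# Correlation for each pair, using the samples observed for both features.
pairwise_cor <- cor(x, use = "pairwise.complete.obs")

# Number of samples observed for both features in each pair.
observed <- !is.na(x)
storage.mode(observed) <- "double"   # logical -> numeric, keeps dimnames
n_pair <- crossprod(observed)

round(pairwise_cor, 2)
n_pair
```

Entries of `pairwise_cor` that involve `tissue3` rest on far fewer samples than the others, which is the situation where shrinkage matters most.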

The above approach uses an asymptotic version of CorShrink. Alternatively, one can use a resampling (bootstrap) approach by setting sd_boot = TRUE.

out <- CorShrinkData(sample_by_feature_data, sd_boot = TRUE, image = "both",
                    image.control = list(tl.cex = 0.2))

[Figure: Structure Plot]
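For intuition about what a resampling-based standard error looks like, here is a base R sketch for a single pair of variables. It is an illustration under stated assumptions and not necessarily what sd_boot = TRUE does internally; the helper name `boot_z_se` and its `n_boot` argument are hypothetical.

```r
# Sketch of a bootstrap standard error for one Fisher z-transformed
# correlation; hypothetical helper, not the code behind sd_boot = TRUE.
boot_z_se <- function(x, y, n_boot = 500) {
  keep <- complete.cases(x, y)      # use only samples observed for both variables
  x <- x[keep]
  y <- y[keep]
  z_boot <- replicate(n_boot, {
    idx <- sample(length(x), replace = TRUE)  # resample pairs with replacement
    atanh(cor(x[idx], y[idx]))
  })
  sd(z_boot)
}

set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
y[sample(100, 60)] <- NA            # only 40 complete pairs remain
boot_z_se(x, y)                     # compare with the asymptotic 1 / sqrt(40 - 3)
```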

Walk through some more detailed examples in the vignette:

vignette("corshrink")

To reproduce the analysis from our paper, see the code and data available here.

Credits

The authors would like to thank the GTEx Consortium, John Blischak, Sarah Urbut, Chiaowen Joyce Hsiao, Peter Carbonetto and all members of the Stephens Lab. For any queries related to the CorShrink package, contact Kushal K. Dey at [email protected]

corshrink's Issues

Parallelisation

What would be the best way to parallelise it, or incorporate parallelisation? Are there any plans to parallelise CorShrink?

Question about options

Hi Kushal - This is a really great method. I had a couple questions regarding the different options available:

  1. For Spearman correlations, can I input the rho and n into the CorShrinkVector() function, or do rank correlations require bootstrapped-derived SEs for valid inference?

  2. I have a large number of correlations, but I am only interested in positive correlations. This will also save computational time because it is A LOT of correlations. If I subset to only positive correlations and feed them into CorShrinkVector(), would specifying the "+uniform" distribution be the best way to proceed?