Huge

R Package for High-Dimensional Undirected Graph Estimation and Inference

Huge (High-Dimensional Undirected Graph Estimation) estimates the parameters of a Gaussian distribution in such a way that the resulting undirected graphical model is sparse. The core algorithm is implemented in C++ with RcppEigen support for portable, high-performance linear algebra. The package also implements a unified framework for quantifying local and global inferential uncertainty in high-dimensional graphical models; in particular, it supports testing for the presence of a single edge. Runtime profiling is documented in the Experiments section.

Installation

Prerequisites

Huge uses OpenMP to enable faster matrix multiplication, so to use huge you must make sure OpenMP is correctly enabled for your compiler.

For Windows and Linux users, recent versions of GCC fully support OpenMP.

For macOS users, things are a little trickier, since the default llvm toolchain that ships with macOS does not support OpenMP. The solution is straightforward: install an llvm build with full OpenMP support and point R at that version.

First, install llvm with OpenMP support by typing

brew install llvm


Then append the following lines to ~/.R/Makevars so that the OpenMP-enabled llvm is used as the compiler for R packages (Homebrew installs llvm under /usr/local/opt/llvm on Intel Macs and /opt/homebrew/opt/llvm on Apple Silicon; adjust the prefix accordingly).

CC = /usr/local/opt/llvm/bin/clang
CXX = /usr/local/opt/llvm/bin/clang++
CXX98 = /usr/local/opt/llvm/bin/clang++
CXX11 = /usr/local/opt/llvm/bin/clang++
CXX14 = /usr/local/opt/llvm/bin/clang++
CXX17 = /usr/local/opt/llvm/bin/clang++
OBJC = /usr/local/opt/llvm/bin/clang
OBJCXX = /usr/local/opt/llvm/bin/clang++

Installing from GitHub

First, you need to install the devtools package, which you can do from CRAN. Invoke R and then type

install.packages("devtools")

Then load the devtools package and install huge

library(devtools)
install_github("HMJiangGatech/huge")
library(huge)

Windows users: if you encounter an Rtools version issue, first make sure you have installed the latest Rtools; if the problem persists, try the following workaround:

assignInNamespace("version_info", c(devtools:::version_info, list("3.5" = list(version_min = "3.3.0", version_max = "99.99.99", path = "bin"))), "devtools")

Install from CRAN

Ideally, you can simply install and load huge from CRAN in an R console.

install.packages("huge")
library(huge)

Examples

# generate data
L = huge.generator(n = 50, d = 12, graph = "hub", g = 4)

# graph path estimation using glasso
est = huge(L$data, method = "glasso")
plot(est)

# inference for the Gaussian graphical model at the 0.05 significance level
Theta = est$icov[[10]]  # estimated inverse covariance at the 10th lambda
inf = huge.inference(L$data, Theta, L$theta)
print(inf$error)  # print the type-I error
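A typical next step after computing the solution path is model selection with huge.select. A brief sketch (the choice of criterion here is illustrative; "ric" and "stars" are also available):

```r
library(huge)

# generate data and estimate the glasso solution path
L <- huge.generator(n = 50, d = 12, graph = "hub", g = 4)
est <- huge(L$data, method = "glasso")

# select the regularization parameter along the path via extended BIC
sel <- huge.select(est, criterion = "ebic")
plot(sel)       # plot the selected graph
sel$opt.lambda  # the selected lambda
```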

Experiments

For the detailed implementation of the experiments, please refer to benchmark/benchmark.R.

Graph Estimation

We compared our package against other packages, namely QUIC and clime, on a hub graph with n = 200 and d = 200. Huge significantly outperforms clime, QUIC, and the original huge (v1.2.7) in timing. We also report the objective value at the resulting estimates.

CPU time (s)
  Huge glasso     1.12
  Huge tiger      1.88
  Huge v1.2.7     1.80
  QUIC            7.50
  Clime         416.77

Objective value
  Huge glasso  -125.96
  Huge tiger   -125.47
  QUIC          -90.58
  Clime        -136.96
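The timings above can be reproduced in spirit with system.time; a minimal sketch (the exact benchmark settings live in benchmark/benchmark.R):

```r
library(huge)

set.seed(1)
# hub graph with the same dimensions as the benchmark
L <- huge.generator(n = 200, d = 200, graph = "hub")

# wall-clock time of the glasso path estimation
system.time(est <- huge(L$data, method = "glasso", verbose = FALSE))
```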

Graph Inference

When using the Gaussian graphical model, huge controls the type I error well.

                      band            hub          scale-free
significance level  0.05    0.10    0.05    0.10    0.05    0.10
type I error        0.0175  0.0391  0.0347  0.0669  0.0485  0.0854

References

[1] T. Zhao and H. Liu. The huge Package for High-dimensional Undirected Graph Estimation in R. 2012.
[2] X. Li, J. Ge, H. Jiang, M. Hong, M. Wang, and T. Zhao. Boosting Pathwise Coordinate Optimization: Sequential Screening and Proximal Subsampled Newton Subroutine. 2016.
[3] Q. Gu, Y. Cao, et al. Local and Global Inference for High Dimensional Nonparanormal Graphical Models.
[4] J. Janková and S. van de Geer. Confidence Intervals for High-dimensional Inverse Covariance Estimation. 2015.
[5] D. Witten and J. Friedman. New Insights and Faster Computations for the Graphical Lasso. 2011.
[6] N. Meinshausen and P. Bühlmann. High-dimensional Graphs and Variable Selection with the Lasso. 2006.

Contributors

acdeboer, eddelbuettel, hmjianggatech, mirca, zdk123


Issues

For Loop Not Nested Correctly

In src/RIC.cpp, the for loop on line 19 should have braces. Because it doesn't, the for loop on line 31 is not actually nested inside it, contrary to what the indentation suggests. The loop on line 31 is never entered, since j == d by the time line 31 is reached.

This causes lambda to be somewhat smaller than it should be (since fewer values are examined when finding lambda_max).

Thank you

Include the method robust GLASSO

Thanks for the great package. I'm trying to use it for a project, but I would really like a robust version of GLASSO, as in https://link.springer.com/chapter/10.1007/978-3-319-22404-6_19. I suspect this feature would be useful to many others in applications as well. If I want to use huge for this purpose now, I have to pass a robust covariance matrix estimate into huge.glasso, but then I cannot use huge.select for model selection afterwards, because I supplied a covariance matrix estimate rather than the data.
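The workaround described above can be sketched as follows; here a rank-based (Spearman) correlation stands in for a robust covariance estimate (any robust estimator could be substituted; this is an illustration, not the requested feature):

```r
library(huge)

# simulate data and build a rank-based correlation estimate
L <- huge.generator(n = 100, d = 20, graph = "hub")
S <- cor(L$data, method = "spearman")  # robust-ish stand-in

# huge.glasso accepts a d x d symmetric covariance/correlation
# matrix directly in place of the n x d data matrix
out <- huge.glasso(S, nlambda = 10, verbose = FALSE)

# note: huge.select cannot be used afterwards, since criteria
# such as StARS need the original data, which was never supplied
```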

Many NaN starting from custom covariance matrix. method = "glasso"

Hi,
firstly thanks for your work!
I'm trying to use the glasso method from the huge package on a 100x100 correlation matrix calculated from a dataset.
My problem is that, very quickly along the path, I get almost only NaNs in the inverse covariance matrices and only 1s in the corresponding adjacency matrices.
If, instead, I start from the scaled data matrix from which I calculate my correlation measure, I get inverse covariance and adjacency matrix paths comparable with the results from other packages. Unfortunately, I need to test that specific correlation measure, among some others.
Do you have any suggestions on what could be causing these NaNs?
I tried running both huge(S, lambda, method = "glasso", cov.output = TRUE)
and huge.glasso(S, lambda, cov.output = TRUE).
I tried again with cov.output = FALSE to minimize the computational burden, and in all these attempts I used as "S" a covariance matrix estimated through a measure different from Pearson correlation.
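When a custom correlation matrix produces NaNs like this, a common culprit is that the matrix is not positive (semi)definite, which glasso assumes. A quick diagnostic-and-repair sketch (using Matrix::nearPD; this is a general suggestion, not a fix confirmed by the maintainers):

```r
library(Matrix)

# S is the custom 100x100 correlation matrix;
# check its smallest eigenvalue for (semi)definiteness
min_eig <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)

if (min_eig <= 0) {
  # project S onto the nearest positive-definite correlation matrix
  S <- as.matrix(nearPD(S, corr = TRUE)$mat)
}
```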

`validObject` Error in huge.mb for small sample data

I'm trying to understand this error - which seems to occur when re-constructing a sparse matrix on small-sample data and perhaps with small values of lambda.

Reproduce the error:

library(huge)
set.seed(10010)
dat <- huge.generator(n=5, d=250)

# these fail
est1 <- huge(dat$data, method='mb', scr=FALSE, nlambda=100, lambda.min.ratio = 5e-5)
est2 <- huge(dat$data, method='mb', scr=FALSE, nlambda=10, lambda.min.ratio = 1e-4)
Error in validObject(.Object) : 
  invalid class “dgCMatrix” object: all row indices must be between 0 and nrow-1

Interestingly, small tweaks to the lambda path seem to run OK.

est3 <- huge(dat$data, method='mb', scr=FALSE, nlambda=100, lambda.min.ratio = 2e-4)
est4 <- huge(dat$data, method='mb', scr=FALSE, nlambda=10, lambda.min.ratio = 2e-4)

and if we pass in the correlation matrix directly, the previously failing calls run OK:

est1.cor <- huge(cor(dat$data), method='mb', scr=FALSE, nlambda=100, lambda.min.ratio = 5e-5)
est2.cor <- huge(cor(dat$data), method='mb', scr=FALSE, nlambda=10, lambda.min.ratio = 1e-4)
est3.cor <- huge(cor(dat$data), method='mb', scr=FALSE, nlambda=100, lambda.min.ratio = 2e-4)
est4.cor <- huge(cor(dat$data), method='mb', scr=FALSE, nlambda=10, lambda.min.ratio = 2e-4)

The only apparent difference when passing the correlation matrix is that maxdf is d rather than n.

The error seems to be coming from this block:
https://github.com/HMJiangGatech/huge/blob/master/R/huge.mb.R#L107-L120
It seems the for loop is trying to index values that aren't there, perhaps because maxdf isn't large enough (in which case I don't understand the purpose of the parameter).

MB: errors with large data, small lambdas

Tracking a huge-related error here:
zdk123/SpiecEasi#73

Reproduce with:

X <- MASS::mvrnorm(10, rep(0,120), diag(120))
huge::huge(X, method='mb', lambda=c(.1))

Error in validObject(.Object) :
invalid class “dgCMatrix” object: all row indices must be between 0 and nrow-1

huge::huge(X, method='mb', lambda=c(.01))

*** caught segfault ***
address 0x450000003a, cause 'memory not mapped'

MB is not consistent between huge versions

A SPIEC-EASI user discovered a discrepancy in the huge.mb results at least after the switch to version 1.3 (related issue zdk123/SpiecEasi#107).

In trying to reproduce the error, I found that there are both false negative and false positive edges, typically associated with coefficients that are small in magnitude.

Here's a reproducible example managed by different conda environments:

conda create -n huge2.7 -c conda-forge r-huge=1.2.7
conda create -n huge3.3 -c conda-forge r-huge=1.3.3

Run the following code in an R session twice, once under each huge version:

library(huge)
set.seed(10010)
dat <- huge.generator(100, 215, graph="scale-free", v=.01, u=2)

out <- huge::huge.mb(dat$data, lambda.min.ratio=1e-2, nlambda=10)
save(out, file=paste0('hugev', packageVersion('huge'), ".RData"))

Compare results

library(Matrix)
load('hugev1.3.3.RData')
out33 <- out
load('hugev1.2.7.RData')
out27 <- out
rm(out)

sapply(1:length(out27$path), function(i) norm(out33$path[[i]]-out27$path[[i]], '1'))

So it seems likely to me that, in migrating the source code, the zero tolerance and/or floating-point precision changed.

I would greatly appreciate some guidance on this, especially if it was a deliberate choice. An option to obtain numerically equivalent results would be very useful, so that users can safely upgrade dependencies without dozens to hundreds of edges changing.

MB: negative length vectors are not allowed

Hi there,

We've uncovered another bug over at SpiecEasi that seems to be triggered by a toxic combination of parameters (large data sets combined with a large number of lambdas).

For instance:

dat <- huge::huge.generator(1183, 1510)
out <- huge::huge.mb(dat$data, lambda.min.ratio=1e-2, nlambda=100)
## Conducting Meinshausen & Buhlmann graph estimation (mb)....Error in huge::huge.mb(X, lambda.min.ratio = 0.01, nlambda = 100) :
##  negative length vectors are not allowed

out <- huge::huge.mb(dat$data, lambda.min.ratio=1e-2, nlambda=10)
## NO ERROR

dat <- huge::huge.generator(1511, 1510)
out <- huge::huge.mb(dat$data, lambda.min.ratio=1e-2, nlambda=100)
## NO ERROR

Oddly, the error is thrown by the C code only when n < p and only when nlambda is greater than 20 or so.

thanks!

Export huge.mb or return beta

I was using the output from the huge.mb function in my SpiecEasi package, but migrated back to the huge::huge wrapper in the recent update to version 1.3.

However, since huge::huge doesn't pass through the coefficient matrix (beta), I am now missing some key functionality. This is not a problem for "glasso" mode, since icov/cov does get returned.

Would it be possible to (optionally?) return beta and/or export huge.mb as was done prior to version 1.3?

Logging the bug on my end here: zdk123/SpiecEasi#72

thanks!

namespace issue

I'm using huge while building a package in RStudio. After I call huge.glasso(var(x)) once (successfully), where x is just a 10x100 Gaussian test data set, I always get the error message:
Error in .Call("_huge_hugeglasso", S, lambda, scr, verbose, cov.output) :
"_huge_hugeglasso" not resolved from current namespace (huge)

According to https://stackoverflow.com/questions/18192225/r-error-message-package-error-functionname-not-resolved-from-current-nam, it looks like it might have something to do with huge rather than with my package.
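A common first step when compiled symbols fail to resolve from another package's namespace is to declare the dependency explicitly in that package's metadata (a general R packaging sketch, not a confirmed fix for this specific report):

```
# DESCRIPTION
Imports: huge

# NAMESPACE
importFrom(huge, huge.glasso)
```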
