DataSlingers / ExclusiveLasso

Generalized Linear Models with the Exclusive Lasso Penalty

Home Page: https://DataSlingers.github.io/ExclusiveLasso/

R 71.10% C++ 28.90%
high-dimensional-data statistics statistical-learning sparsity structured-sparsity generalized-linear-models regularization lasso exclusive-lasso

ExclusiveLasso's People

Contributors

michaelweylandt
exclusivelasso's Issues

Is the ExclusiveLasso package suitable for Cox survival analysis?

It is exciting to see the ExclusiveLasso package, which is helpful for variable selection. I am a clinical researcher. The Cox model is a common and important tool for analyzing which variables are related to a subject's survival status and survival time, so I would like to ask whether the ExclusiveLasso package can be applied to the Cox model. I also looked through the code of the package's functions, but with my limited programming experience I found it difficult to follow.
I look forward to your reply.

Implement Exclusive Lasso for Logistic and Poisson GLMs

It's not necessary to implement the full set of GLMs supported by glmnet, but we should have the "big three" of Gaussian, Logistic, and Poisson. No plans to implement Cox PH, multinomial and multi-response Gaussian for now.

  • Coordinate Descent Algorithm (nested within IRLS like glmnet?)
  • Prox Gradient Algorithm (will probably need back-tracking, so need to make sure prox is optimized)
  • Methods for deviance function
  • Other CV loss functions
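As a rough illustration of the "deviance function" item above, the deviance for the three families named in this issue can be sketched in base R as follows. The function name `glm_deviance` and its arguments are hypothetical and not part of the package; unit observation weights are assumed.

```r
# Hypothetical helper (not the package's API): deviance of fitted means `mu`
# against responses `y` for the Gaussian, binomial (0/1 response) and Poisson
# families, assuming unit weights.
glm_deviance <- function(y, mu, family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  # y * log(y / mu) with the convention 0 * log(0) = 0
  ylogy <- function(y, mu) ifelse(y == 0, 0, y * log(y / mu))
  switch(family,
    gaussian = sum((y - mu)^2),
    binomial = 2 * sum(ylogy(y, mu) + ylogy(1 - y, 1 - mu)),
    poisson  = 2 * sum(ylogy(y, mu) - (y - mu))
  )
}
```

The same quantity could serve as a cross-validation loss, since it reduces to the residual sum of squares in the Gaussian case.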

Accelerated Proximal Gradient

The exclusive lasso proximal operator is relatively expensive to calculate, so an accelerated prox gradient scheme might be useful.
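A minimal sketch of what an accelerated scheme could look like, using the standard FISTA momentum rule on a least-squares loss. The function names are hypothetical, and the default `prox` is plain soft-thresholding (the ordinary lasso prox), used here only as a stand-in because the exclusive lasso prox is not shown in this issue; any function with the same `(z, t)` signature could be plugged in.

```r
# Stand-in prox: soft-thresholding, i.e. the prox of t * ||.||_1.
# This is NOT the exclusive lasso prox; it only illustrates the interface.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# FISTA for 0.5 * ||y - X b||^2 + lambda * penalty(b), given prox(z, t)
# for the penalty. Fixed step size 1 / L, so no back-tracking is needed here.
fista <- function(X, y, lambda, prox = soft_threshold, n_iter = 200) {
  # L = largest eigenvalue of X'X = Lipschitz constant of the gradient
  L <- max(eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values)
  step <- 1 / L
  beta <- numeric(ncol(X))  # current iterate
  z <- beta                 # extrapolated point
  t_k <- 1                  # momentum parameter
  for (k in seq_len(n_iter)) {
    grad <- drop(crossprod(X, X %*% z - y))
    beta_new <- prox(z - step * grad, step * lambda)
    t_new <- (1 + sqrt(1 + 4 * t_k^2)) / 2
    z <- beta_new + ((t_k - 1) / t_new) * (beta_new - beta)
    beta <- beta_new
    t_k <- t_new
  }
  beta
}
```

Each iteration still calls the prox exactly once, so the benefit of acceleration would come from needing fewer iterations rather than from a cheaper prox.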

Box Constraints

Upper and lower bounds on each parameter. This should be a pretty minor algorithmic tweak: for CD, apply the constraint at each update; for PG, incorporate the constraint into the prox (itself a CD problem).
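The CD half of this idea can be sketched as below. The sketch assumes the objective (1/(2n)) * ||y - X b||^2 + (lambda/2) * sum over groups of ||b_g||_1^2, which may differ from the package's exact scaling, and the function name and arguments are hypothetical. Because each coordinate-wise subproblem is convex in one variable, clamping the unconstrained update to [lower, upper] gives the constrained minimizer for that coordinate.

```r
# Hypothetical sketch: coordinate descent for a Gaussian exclusive lasso
# with box constraints, under the ASSUMED objective
#   (1 / (2n)) * ||y - X b||^2 + (lambda / 2) * sum_g ||b_g||_1^2
exclusive_lasso_cd_box <- function(X, y, groups, lambda,
                                   lower = -Inf, upper = Inf, n_iter = 100) {
  n <- nrow(X); p <- ncol(X)
  lower <- rep_len(lower, p); upper <- rep_len(upper, p)
  beta <- numeric(p)
  r <- drop(y)                       # residual y - X %*% beta (beta starts at 0)
  col_ss <- colSums(X^2) / n         # x_j' x_j / n for each column
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      r <- r + X[, j] * beta[j]      # remove coordinate j from the fit
      same <- groups == groups[j]
      same[j] <- FALSE
      c_j <- sum(abs(beta[same]))    # L1 norm of the rest of j's group
      z <- sum(X[, j] * r) / n
      # unconstrained coordinate update: soft-threshold, then shrink
      b <- sign(z) * max(abs(z) - lambda * c_j, 0) / (col_ss[j] + lambda)
      beta[j] <- min(max(b, lower[j]), upper[j])  # apply the box constraint
      r <- r - X[, j] * beta[j]      # add coordinate j back
    }
  }
  beta
}
```

With the default infinite bounds the clamp is a no-op, so the same loop covers the unconstrained case.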

Unable to run exclusive_lasso when the matrix is too large

I am running into the following error:

Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

I understand that this is caused by a matrix that is too large, which appears to be a limitation coming from the RcppArmadillo package. The sample code that triggers the error is below.

library(glmnet)
library(ExclusiveLasso)

N = 80000 # number of observations
p = 35  # number of variables

# random generated X
X = matrix(rnorm(N*p), ncol=p)

# standardization : mean = 0, std=1
X = scale(X)

# artificial coefficients
beta = c(0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
         0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
         0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
         1, -0.2, 0.2, 0.1, 0.5)

# response variable (standardization left disabled)
y = X%*%beta + rnorm(N, sd=0.5)
#y = scale(y)

# group index for X variables
v.group <- rep(1:10, length.out = p)

#--------------------------------------------
# Model with a given lambda
#--------------------------------------------
# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 0.2,
                      groups = v.group, family = "gaussian",
                      intercept = FALSE)
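A small arithmetic check may help locate the problem. Armadillo built without ARMA_64BIT_WORD uses 32-bit indices, so a matrix cannot hold more than 2^32 - 1 elements. The comparison below is only a hypothesis about where the limit is hit: the report does not say which object is N by N, and the variable names are illustrative.

```r
# Largest element count representable with 32-bit unsigned indices
arma_32bit_limit <- 2^32 - 1

n_obs <- 80000   # N in the report
n_vars <- 35     # p in the report

# Element counts of the design matrix and of a hypothetical N-by-N matrix
elements <- c(design = n_obs * n_vars, n_by_n = n_obs^2)
exceeds <- elements > arma_32bit_limit
print(rbind(elements, exceeds))
```

The design matrix itself is far below the limit, so if this reading is right, the error would come from some intermediate object whose size grows with N squared rather than from X.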

Default Lambda range

Currently, if lambda is not supplied by the user, we use the same range as glmnet (more or less).

Does theory suggest a better default? Ideally, we want lambda_max to be the smallest value of lambda that gives exactly one non-zero element in each group, but there are problems for which no such lambda exists and even when it does, it's not clear how to calculate it quickly.
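For reference, the glmnet-style default for the pure lasso can be sketched as follows: lambda_max is the largest absolute inner product between a column of X and the centered response, divided by n, and the grid is log-spaced down to a fixed fraction of it. The function name is hypothetical, X is assumed to be already centered and scaled, and this is a sketch of the lasso convention rather than the package's actual code.

```r
# Hypothetical helper: glmnet-style default lambda grid for a lasso fit,
# assuming the columns of X are already centered and scaled.
glmnet_style_lambda <- function(X, y, nlambda = 100,
                                lambda_min_ratio = if (nrow(X) > ncol(X)) 1e-4 else 1e-2) {
  # smallest lambda at which the lasso solution is exactly zero
  lambda_max <- max(abs(crossprod(X, y - mean(y)))) / nrow(X)
  # log-spaced, decreasing grid from lambda_max to lambda_max * ratio
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}
```

For the lasso, lambda_max is the point where all coefficients become zero; the question above is precisely that no equally simple anchor is known for the exclusive lasso.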

OpenMP Support

Recent versions of Armadillo (and hence RcppArmadillo) come with OpenMP support, used to speed up certain expensive elementwise operators. Compiler support for OpenMP is inconsistent (particularly on macOS), so we should disable this for now and re-enable it when the upstream fix is on CRAN.[1]

  • Disable OpenMP for now
  • Add dependency on recent RcppArmadillo when released to CRAN
  • Re-enable OpenMP (conditionally)

[1] See RcppCore/RcppArmadillo#177 and RcppCore/RcppArmadillo#185, especially RcppCore/RcppArmadillo@dc294cb

L2 Regularization

Add an option for partial L2 regularization (similar to glmnet's alpha argument) -- should only be a minor modification of the prox and CD steps.
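To illustrate why the change to the CD step would be small, here is a sketch of a single coordinate update under one possible parameterization, borrowed from glmnet's convention: (1/(2n)) * ||y - X b||^2 + alpha * (lambda/2) * sum_g ||b_g||_1^2 + (1 - alpha) * (lambda/2) * ||b||_2^2. Both the parameterization and the function name are assumptions, not the package's design.

```r
# Hypothetical single-coordinate update with partial L2 regularization.
#   z     : x_j' r / n, with r the partial residual excluding coordinate j
#   a     : x_j' x_j / n
#   c_j   : L1 norm of the other coefficients in j's group
#   alpha : 1 = pure exclusive lasso, 0 = pure ridge (ASSUMED convention)
cd_update_l2 <- function(z, a, c_j, lambda, alpha = 1) {
  threshold <- alpha * lambda * c_j                   # only the L1-type part thresholds
  denom <- a + alpha * lambda + (1 - alpha) * lambda  # both parts add curvature
  sign(z) * max(abs(z) - threshold, 0) / denom
}
```

Under this particular convention the denominator simplifies to a + lambda for every alpha, so only the soft-threshold level changes relative to the pure exclusive lasso update.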

Installation fails with 'fatal error: R.h: No such file or directory'

I tried to install the package using:

library(devtools)
install_github("DataSlingers/ExclusiveLasso")

and got the following output:

In file included from D:/R-4.2.2/library/Rcpp/include/RcppCommon.h:30,
                 from D:/R-4.2.2/library/RcppArmadillo/include/RcppArmadillo/interface/RcppArmadilloForward.h:25,
                 from D:/R-4.2.2/library/RcppArmadillo/include/RcppArmadillo.h:29,
                 from ExclusiveLasso.cpp:23:
D:/R-4.2.2/library/Rcpp/include/Rcpp/r/headers.h:66:10: fatal error: R.h: No such file or directory
   66 | #include <R.h>
      |          ^~~~~
compilation terminated.
make: *** [D:/R-4.2.2/etc/x64/Makeconf:260:ExclusiveLasso.o] Error 1
ERROR: compilation failed for package 'ExclusiveLasso'
* removing 'D:/R-4.2.2/library/ExclusiveLasso'

I'm using R 4.2.2 on Windows 11 with VS Code.
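The error says the compiler could not find R.h, which normally ships in R's own include directory. A quick, non-authoritative way to check the local setup from within R is sketched below; it only reports what it finds and does not fix anything. The variable names are illustrative.

```r
# Where this R installation keeps its C headers
include_dir <- R.home("include")

# Is R.h actually present there?
has_r_header <- file.exists(file.path(include_dir, "R.h"))

# Is a `make` program visible on the PATH? (empty string if not found)
make_path <- unname(Sys.which("make"))

cat("R include dir:", include_dir, "\n")
cat("R.h found:    ", has_r_header, "\n")
cat("make on PATH: ", if (nzchar(make_path)) make_path else "<not found>", "\n")
```

If R.h turns out to be missing, the R installation itself may be incomplete; if it is present, the build tools may be pointing at a different R home than the one reported here.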

Screening Rules

Is it possible to derive (safe or strong) screening rules for the Exclusive Lasso?
