dataslingers / ExclusiveLasso
Generalized Linear Models with the Exclusive Lasso Penalty
Home Page: https://DataSlingers.github.io/ExclusiveLasso/
It is exciting to see the ExclusiveLasso package; it is very helpful for variable selection. I am a clinical researcher. The Cox model is commonly used to analyze which variables relate to subjects' survival status and time, so I would like to ask whether the ExclusiveLasso package can be applied to the Cox model. I also scanned the package's source code, but it is difficult for me to understand because of my limited programming ability.
I look forward to your reply.
It's not necessary to implement the full set of GLMs supported by glmnet, but we should have the "big three" of Gaussian, Logistic, and Poisson (should we match glmnet's deviance() function?). No plans to implement Cox PH, multinomial, or multi-response Gaussian for now.

By default, Rcpp transfers the RNG state to and from R when calling into C++. This is a little bit expensive and not necessary for us, since we don't use RNGs in C++.
If we change our C++ attributes to // [[Rcpp::export(rng = false)]], things will be a smidge faster. (I'd imagine this is still dwarfed by compute time, but it can't hurt.)
See example at https://github.com/tidyverse/dplyr/blob/e08f0511986a10efda5b9fd991af2e049d801333/src/address.cpp#L18
The exclusive lasso proximal operator is relatively expensive to calculate, so an accelerated prox gradient scheme might be useful.
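For illustration, here is a minimal FISTA-style accelerated proximal gradient loop in R. It uses the plain lasso soft-threshold as a stand-in prox for simplicity (the exclusive lasso prox, itself a small CD problem, would replace `soft_threshold`); this is a sketch of the acceleration idea, not the package's implementation.

```r
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n)

# Stand-in prox: lasso soft-threshold (swap in the exclusive lasso prox here)
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

fista <- function(X, y, lambda, maxit = 500) {
  n <- nrow(X); p <- ncol(X)
  # Lipschitz constant of the gradient of the least-squares loss
  L <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)
  b <- v <- numeric(p); t_k <- 1
  for (k in seq_len(maxit)) {
    grad  <- crossprod(X, X %*% v - y) / n       # gradient at extrapolated point
    b_new <- soft_threshold(v - grad / L, lambda / L)
    t_new <- (1 + sqrt(1 + 4 * t_k^2)) / 2       # Nesterov momentum update
    v     <- b_new + ((t_k - 1) / t_new) * (b_new - b)
    b <- b_new; t_k <- t_new
  }
  b
}

b_hat <- fista(X, y, lambda = 0.1)
```

Since each prox call is expensive here, the reduced iteration count from acceleration should pay for the extra bookkeeping.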
As discussed at [1], uint is not standard C++. We should use arma::uword consistently instead.
There's also a signed/unsigned comparison at [2] that should be fixed. I think switching to arma::uvec should suffice.
[1] https://stackoverflow.com/q/3552094
[2] ExclusiveLasso/src/ExclusiveLasso.cpp, line 50 at commit 5a75b7c
Multinomial and Multi-Response regression (with a group-lasso penalty)
Upper and lower bounds on each parameter. This should be a pretty minor algorithmic tweak: for CD, apply the constraint at each update; for PG, incorporate the constraint into the prox (itself a CD problem).
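A sketch of the CD half of this in R, assuming a least-squares loss and hypothetical per-coordinate bound vectors `lo`/`hi` (names invented here): compute the usual one-dimensional update, then project it into the box. Penalty terms are omitted for brevity.

```r
# Project a value into [lo, hi]
clamp <- function(b, lo, hi) pmin(pmax(b, lo), hi)

# Box-constrained coordinate update: solve the 1-D least-squares problem
# for coordinate j, then apply the constraint (a sketch only)
cd_update <- function(j, X, y, beta, lo, hi) {
  r   <- y - X %*% beta + X[, j] * beta[j]  # partial residual with beta[j] removed
  b_j <- sum(X[, j] * r) / sum(X[, j]^2)    # unconstrained 1-D minimizer
  clamp(b_j, lo[j], hi[j])                  # apply the box constraint
}
```

For PG, the same `clamp` would be applied inside the prox's inner CD loop.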
I am running into the error:
Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
I understand that this is a large-matrix limit coming from the RcppArmadillo package. Sample code that runs into the issue is below.
library(glmnet)
library(ExclusiveLasso)
N = 80000 # number of observations
p = 35 # number of variables
# random generated X
X = matrix(rnorm(N*p), ncol=p)
# standardization : mean = 0, std=1
X = scale(X)
# artificial coefficients
beta = c(0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
1, -0.2, 0.2, 0.1, 0.5)
# Y variable, standardized Y
y = X%*%beta + rnorm(N, sd=0.5)
#y = scale(y)
# group index for X variables
v.group <- rep(1:10, length.out = 35 )
#--------------------------------------------
# Model with a given lambda
#--------------------------------------------
# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 0.2,
                      groups = v.group, family = "gaussian",
                      intercept = FALSE)
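The error message itself points at a workaround: compile with ARMA_64BIT_WORD defined so Armadillo can index objects with more than 2^31 elements. One way (assuming you build the package from source) is to add the define to the package's src/Makevars:

```make
# src/Makevars (sketch): enable 64-bit element indices in Armadillo
PKG_CPPFLAGS = -DARMA_64BIT_WORD=1
```

Note that with N = 80000 and p = 35 the design matrix itself is small, so the overflow likely comes from some intermediate N-by-N computation; the flag may let the code run but at a large memory cost.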
Currently, if lambda is not supplied by the user, we use the same range as glmnet (more or less).
Does theory suggest a better default? Ideally, we want lambda_max to be the smallest value of lambda that gives exactly one non-zero element in each group, but there are problems for which no such lambda exists and, even when it does, it's not clear how to calculate it quickly.
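For reference, the glmnet-style default mentioned above is easy to state in the Gaussian case: lambda_max is the smallest lambda at which the plain-lasso solution is entirely zero, i.e. max_j |x_j^T y| / n, with a log-spaced grid running down from it. A sketch, assuming standardized X:

```r
set.seed(1)
n <- 50; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- rnorm(n)

# Smallest lambda with an all-zero lasso solution: max_j |x_j' y| / n
lambda_max <- max(abs(crossprod(X, y))) / n

# Typical glmnet-style grid: log-spaced down to a small fraction of lambda_max
lambda_grid <- exp(seq(log(lambda_max), log(1e-3 * lambda_max), length.out = 100))
```

The open question above is whether an analogous closed form exists for the "exactly one non-zero per group" target.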
Recent versions of Armadillo (and hence RcppArmadillo) come with OpenMP support, used to speed
up certain expensive elementwise operators. Compiler support for OpenMP is inconsistent (particularly on MacOS) so we should disable this for now and re-enable it when the upstream fix is on CRAN.[1]
[1] See RcppCore/RcppArmadillo#177 and RcppCore/RcppArmadillo#185, especially RcppCore/RcppArmadillo@dc294cb
Section 7 of Campbell and Allen (2017) contains a (simulated) NMR example data set.
This should be included and used in documentation / examples.
Add an option for partial L2 regularization (similar to glmnet's alpha argument) -- should only be a minor modification of the prox and CD steps.
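For intuition, this is how the modification looks for the plain elastic net's coordinate update in glmnet's parameterization; the ridge part only adds a term to the denominator, and something similar should carry over to the exclusive-lasso prox/CD step. A sketch, not the package's code:

```r
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Elastic-net-style coordinate update: minimizer of
#   0.5 * (z - b)^2 + lambda * (alpha * |b| + 0.5 * (1 - alpha) * b^2)
enet_update <- function(z, lambda, alpha) {
  soft_threshold(z, lambda * alpha) / (1 + lambda * (1 - alpha))
}
```

At alpha = 1 this reduces to the pure soft-threshold; at alpha = 0 it is pure ridge shrinkage.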
I'm installing using
library(devtools)
install_github("DataSlingers/ExclusiveLasso")
and it gives:
In file included from D:/R-4.2.2/library/Rcpp/include/RcppCommon.h:30,
from D:/R-4.2.2/library/RcppArmadillo/include/RcppArmadillo/interface/RcppArmadilloForward.h:25,
from D:/R-4.2.2/library/RcppArmadillo/include/RcppArmadillo.h:29,
from ExclusiveLasso.cpp:23:
D:/R-4.2.2/library/Rcpp/include/Rcpp/r/headers.h:66:10: fatal error: R.h: No such file or directory
66 | #include <R.h>
| ^~~~~
compilation terminated.
make: *** [D:/R-4.2.2/etc/x64/Makeconf:260:ExclusiveLasso.o] Error 1
ERROR: compilation failed for package 'ExclusiveLasso'
* removing 'D:/R-4.2.2/library/ExclusiveLasso'
I'm using R 4.2.2 on Windows 11 with VS Code.
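This error usually means the compiler cannot find R's headers; on Windows with R 4.2.x that most often indicates Rtools42 is not installed or is not visible on the PATH. Some quick diagnostic checks from within R (a sketch; exact paths vary by machine):

```r
# Directory where R's C headers live; R.h should be inside it
inc <- R.home("include")
print(inc)
print(file.exists(file.path(inc, "R.h")))

# An empty string here means the build tools' make is not on the PATH
print(Sys.which("make"))
```

If `make` is not found, installing Rtools42 and restarting R is the usual fix before retrying install_github.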
Is it possible to derive (safe or strong) screening rules for the Exclusive Lasso?
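For comparison, the sequential strong rule for the plain lasso (Tibshirani et al., 2012) discards predictor j at lambda_k whenever |x_j^T r| / n < 2 * lambda_k - lambda_prev, where r is the residual at the previous solution on the path. Whether the exclusive lasso's KKT conditions admit a similar bound is exactly the open question here; the sketch below is the lasso version only, shown as the pattern such a rule might follow.

```r
# Sequential strong rule for the *plain* lasso: returns the indices of
# predictors that survive screening at lambda_k, given the residual at
# the previous path point lambda_prev
strong_rule_keep <- function(X, resid_prev, lambda_k, lambda_prev) {
  n <- nrow(X)
  score <- abs(crossprod(X, resid_prev)) / n   # |x_j' r| / n for each j
  which(score >= 2 * lambda_k - lambda_prev)
}
```

Strong rules are heuristic (they need a KKT check afterward); a safe rule would additionally require a dual-feasibility bound specific to the exclusive lasso penalty.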