scikit-learn-contrib / skglm
Fast and modular sklearn replacement for generalized linear models
Home Page: http://contrib.scikit-learn.org/skglm
License: BSD 3-Clause "New" or "Revised" License
The following assignment will fail in scikit-learn 1.1:
https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/estimators.py#L917
coef_ and intercept_ have been deprecated and are removed in scikit-learn 1.1. Testing the release candidate against our code breaks one of the tests.
Currently, users are not explicitly warned when an unsupported solver is used with some penalties or datafits. We should write some form of mechanism (a dispatcher?) that raises an exception when the (datafit, penalty, solver) coupling is unsupported.
E.g., for #100 we should currently raise an exception that is explicit for the user.
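A minimal sketch of what such a dispatcher could look like; the supported_penalties attribute is an assumption for illustration, not an existing skglm API:

def check_compatibility(datafit, penalty, solver):
    # hypothetical: each solver declares the penalty classes it supports
    supported = getattr(solver, "supported_penalties", None)
    if supported is not None and not isinstance(penalty, tuple(supported)):
        raise ValueError(
            f"Solver {solver.__class__.__name__} does not support penalty "
            f"{penalty.__class__.__name__}.")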
As suggested by @agramfort, I am opening an issue here (original) to make a feature request for the square-root LASSO.
The unconstrained LASSO is currently implemented in lasso_path. However, the parameter of the unconstrained LASSO is sensitive to the measurement noise. In contrast, the square-root LASSO has objective norm(Ax - b) + alpha * norm(x, 1), and its parameter alpha is insensitive to the noise level. I would like an sklearn interface (with fit, predict, etc.) coupled with a lasso_path-like function, srlasso_path, which solves the square-root LASSO optimization program.
I propose an sklearn-friendly wrapper around the cvxpy or cvxopt interface. More ambitiously, I propose an implementation based on [1] or [2], i.e., a fast method for solving the square-root LASSO (either via (accelerated) PGD or ADMM).
[1] "On fast convergence of proximal algorithms for SQRT-LASSO optimization: Don’t worry about its nonsmooth loss function" by Li, Xinguo, Jiang, Haoming, et al. (2020) PMLR.
[2] "The flare package for high dimensional linear regression and precision matrix estimation in R" by Li, Xingguo, Zhao, Tuo, Yuan, Xiaoming and Liu, Han. (2015) JMLR.
Currently, statsmodels offers a square-root LASSO solver implemented via the Python package cvxopt; this approach is slow. Using cvxpy with another interior-point solver like MOSEK offers only a 2x speed-up (it is a snail compared to lasso_path).
Separately, if the problem is appropriately regularized to avoid over-fitting, some work has shown that it is possible to achieve convergence with accelerated proximal gradient descent. However, the paper (referring to [1] above) leaves several critical implementation details unspecified.
Note that the characterization of the square-root LASSO as a second-order cone programming (SOCP) problem is given in [3].
[3] Belloni, A., Chernozhukov, V. and Wang, L. Square-Root LASSO: Pivotal recovery of sparse signals via conic programming. https://arxiv.org/pdf/1009.5689.pdf
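As a starting point, here is a minimal cvxpy sketch of the square-root LASSO objective above; srlasso_cvxpy is a hypothetical name, and this reference implementation is expected to be slow:

import cvxpy as cp
import numpy as np

def srlasso_cvxpy(A, b, alpha):
    # square-root LASSO: minimize norm(Ax - b) + alpha * norm(x, 1)
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(cp.norm2(A @ x - b) + alpha * cp.norm1(x))
    cp.Problem(objective).solve()
    return x.value

A = np.random.randn(50, 100)
b = np.random.randn(50)
x_hat = srlasso_cvxpy(A, b, alpha=0.1)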
Following our discussion in #75 (comment):
For survival analysis, the Cox loss is the go-to model (see http://www.sthda.com/english/wiki/cox-proportional-hazards-model). @Klopfe thinks he might have a very interesting use case for skglm using the Cox loss and non-convex penalties on bulk RNA-seq data.
Currently, it is implemented in glmnet, which is slow. It might be a valuable addition.
I think that would be a valuable addition for any newcomer wanting to add a custom datafit or penalty. Plus, it is a great piece of doc for understanding how skglm works behind the scenes.
The arguments grad and w are mismatched: grad is restricted to the working set while w is not. Renaming it to grad_ws seems clearer.
In [1]: import time
In [2]: t0 = time.time()
...: import skglm
...:
...: print(time.time() - t0)
0.7723352909088135
I thought this was due to numba compilation, but it is not fixed by #44. Importing Lasso from sklearn.linear_model takes 1e-5 s, which is weird.
It will be easier to pass instances of such classes to GeneralizedLinearEstimator (currently the solvers are hardcoded), and it will simplify the API.
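A hedged sketch of what the API could look like with solver instances; the AndersonCD class name and its location are assumptions for illustration, not the interface at the time of this issue:

import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.datafits import Quadratic
from skglm.penalties import L1
from skglm.solvers import AndersonCD  # assumed solver class

X = np.random.randn(20, 30)
y = np.random.randn(20)
# the solver is a first-class object, no longer hardcoded in the estimator
est = GeneralizedLinearEstimator(
    datafit=Quadratic(), penalty=L1(alpha=0.1), solver=AndersonCD())
est.fit(X, y)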
utils.py currently contains math functions, as well as jit/numba-specific functions.
IMO this file should be split.
Now that we handle the intercept differently, we no longer need this.
This will cause the solver not to move, since 0 is always a critical point.
The intercept-fitting logic in skglm is non-trivial for non-experts. Adding a few lines of docs would be a valuable addition for future maintainers and users.
max_iter = 0 is no longer supported in sklearn 1.1; it raises:
ValueError: max_iter == 0, must be >= 1.
We can use max_iter=1; it is not important.
import time
import numpy as np
from numpy.linalg import norm
from sklearn.linear_model import Lasso as Lasso_sk
from skglm.estimators import Lasso
n_samples = 100
n_features = 10_000
X = np.random.normal(0, 1, (n_samples, n_features))
y = np.random.normal(0, 1, (n_samples,))
alpha_max = norm(X.T @ y, ord=np.inf) / n_samples
alpha = alpha_max * 0.1
start = time.time()
clf = Lasso(alpha).fit(X, y)
print("skglm:", time.time() - start)
start = time.time()
clf = Lasso_sk(alpha).fit(X, y)
print("sklearn:", time.time() - start)
This script gives:
skglm: 4.0232319831848145
sklearn: 0.2305459976196289
This is due to the compilation cost. We should cache this compilation once and for all, ideally at install time (by pre-building/pre-compiling the IR generated by Numba), or when the user first runs a script, using njit(cache=True).
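For reference, a minimal example of on-disk caching with Numba; the function itself is just a placeholder:

from numba import njit

# with cache=True, the compiled machine code is written to disk, so later
# runs (and imports) skip recompilation entirely
@njit(cache=True)
def sum_squares(x):
    res = 0.
    for i in range(x.shape[0]):
        res += x[i] ** 2
    return res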
Given how easy it was to add the Poisson datafit in #78, this may be a quick win.
We are no longer an @online entry but an @inproceedings one :)
As investigated in #59 through timing and benchmarks, fitting gram_cd_solver on a sparse dataset comes with a big overhead from computing the Gram matrix, as opposed to the dense case.
It seems to have more to do with scipy sparse matrix multiplication; more details (explanation, code snippets) would be welcome.
It would be beneficial to have a more efficient way to compute the Gram matrix in the sparse case, as it would drastically speed up the solver.
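A quick, hedged illustration of the gap, under the assumption that the overhead indeed lies in the sparse-sparse product (timings are machine-dependent):

import time
import numpy as np
from scipy import sparse

X_sparse = sparse.random(5_000, 500, density=0.01, format="csc", random_state=0)
X_dense = X_sparse.toarray()

t0 = time.time()
gram_sparse = (X_sparse.T @ X_sparse).toarray()  # sparse-sparse product
print("sparse Gram:", time.time() - t0)

t0 = time.time()
gram_dense = X_dense.T @ X_dense  # dense BLAS product
print("dense Gram:", time.time() - t0)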
Most non-expert users might not be familiar with the rescaling of alpha by n_samples. Issuing a warning when alpha > alpha_max, giving hints to the user (for instance by printing alpha_max), and informing the user of the scaling by n_samples would be a valuable addition.
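A minimal sketch of such a check; warn_if_alpha_too_large is a hypothetical helper name:

import warnings
import numpy as np

def warn_if_alpha_too_large(X, y, alpha):
    # alpha_max is the smallest alpha for which the Lasso solution is all
    # zeros; note the division by n_samples, matching the objective scaling
    alpha_max = np.linalg.norm(X.T @ y, ord=np.inf) / len(y)
    if alpha > alpha_max:
        warnings.warn(
            f"alpha={alpha:.3e} exceeds alpha_max={alpha_max:.3e}: the "
            "solution will be identically 0. Recall that alpha is scaled "
            "by n_samples in the objective.")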
Fitting an intercept can drastically change the size of the support and the time it takes to fit:
import time
import numpy as np
from celer import Lasso
from libsvmdata import fetch_libsvm

X, y = fetch_libsvm("finance")
for fit_intercept in [False, True]:
alpha_max = np.max(np.abs(X.T @ (y - fit_intercept*np.mean(y)))) / len(y)
clf = Lasso(alpha_max / 200, fit_intercept=fit_intercept, verbose=1)
t0 = time.time()
clf.fit(X, y)
t1 = time.time()
print(f"### fit_intercept={fit_intercept}, time to fit {t1-t0:.2f} s, supp size: {(clf.coef_ != 0).sum()}")
To do it without pain, my idea is to add a gradient step with np.ones(features) in the solver at the end of each CD epoch. That is like adding a column full of ones to X, but without copying it.
One needs to add a way to compute the Lipschitz constant of this step; I do not see another solution than making it an attribute of the datafit (e.g. 1 for Quadratic, 1/4 for Logistic). A sketch of the update is given below.
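A minimal sketch for the Quadratic datafit, where F(w, b) = ||y - Xw - b||^2 / (2 * n_samples); the gradient with respect to the intercept is the mean residual and its Lipschitz constant is 1:

import numpy as np

def intercept_step(y, Xw, intercept, lipschitz=1.0):
    # one gradient step on the intercept at the end of a CD epoch
    grad = np.mean(Xw + intercept - y)
    return intercept - grad / lipschitz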
Doable thanks to #110. To do so:
- add a positive argument in the estimators
- add a positive argument to the penalties

We have a considerable overhead when fitting the Lasso estimator, as shown in the screenshot below.
To reproduce, go to the benchopt Lasso benchmark repo.
After investigating, this overhead is due to the computation of global_lipschitz
skglm/skglm/datafits/single_task.py
Lines 52 to 54 in cca6d48
that we introduced after adding the FISTA solver in #91.
global_lipschitz is only relevant for the FISTA solver. Hence, it should be computed only in that case.
It seems to work very well: scikit-learn/scikit-learn#23507
Following #110 (comment), the Lasso and WeightedLasso estimators share the same implementation logic. The way weights=None is handled in the WeightedLasso estimator hints at merging the two estimators by adding a weights argument to Lasso.
To achieve that:
- deprecate the WeightedLasso estimator
- add a weights argument in the Lasso estimator
- add a weights argument in the path and fit methods

Validation of parameters at fit time seems to be an issue: parameters which should be set by the parent class of, e.g., MCPRegressor are not returned by get_params(), and so clf._validate_params() fails:
FAILED skglm/tests/test_estimators.py::test_check_estimator[Lasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'war...
FAILED skglm/tests/test_estimators.py::test_check_estimator[wLasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'wa...
FAILED skglm/tests/test_estimators.py::test_check_estimator[ElasticNet] - ValueError: The parameter constraints ['alpha', 'l1_ratio', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'cop...
FAILED skglm/tests/test_estimators.py::test_check_estimator[MCP] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_...
FAILED skglm/tests/test_estimators.py::test_estimator[X0-Lasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_s...
FAILED skglm/tests/test_estimators.py::test_estimator[X0-wLasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_...
FAILED skglm/tests/test_estimators.py::test_estimator[X0-ElasticNet] - ValueError: The parameter constraints ['alpha', 'l1_ratio', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X...
FAILED skglm/tests/test_estimators.py::test_estimator[X0-MCP] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_sta...
FAILED skglm/tests/test_estimators.py::test_estimator[X1-Lasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_s...
FAILED skglm/tests/test_estimators.py::test_estimator[X1-wLasso] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_...
FAILED skglm/tests/test_estimators.py::test_estimator[X1-ElasticNet] - ValueError: The parameter constraints ['alpha', 'l1_ratio', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X...
FAILED skglm/tests/test_estimators.py::test_estimator[X1-MCP] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_sta...
FAILED skglm/tests/test_estimators.py::test_generic_estimator[Quadratic-L1-False-Lasso-pen_args0] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'm...
FAILED skglm/tests/test_estimators.py::test_generic_estimator[Quadratic-WeightedL1-False-WeightedLasso-pen_args1] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', ...
FAILED skglm/tests/test_estimators.py::test_generic_estimator[Quadratic-L1_plus_L2-False-ElasticNet-pen_args2] - ValueError: The parameter constraints ['alpha', 'l1_ratio', 'fit_intercept', 'nor...
FAILED skglm/tests/test_estimators.py::test_generic_estimator[Quadratic-MCPenalty-False-MCPRegression-pen_args3] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', '...
FAILED skglm/tests/test_estimators.py::test_grid_search[Lasso] - ValueError:
FAILED skglm/tests/test_estimators.py::test_grid_search[wLasso] - ValueError:
FAILED skglm/tests/test_estimators.py::test_grid_search[ElasticNet] - ValueError:
FAILED skglm/tests/test_estimators.py::test_grid_search[MCP] - ValueError:
FAILED skglm/tests/test_group.py::test_equivalence_lasso - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_start', ...
FAILED skglm/tests/test_group.py::test_vs_celer_grouplasso[15-50-True] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', ...
FAILED skglm/tests/test_group.py::test_vs_celer_grouplasso[5-50-False] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol', ...
FAILED skglm/tests/test_group.py::test_vs_celer_grouplasso[19-59-False] - ValueError: The parameter constraints ['alpha', 'fit_intercept', 'normalize', 'precompute', 'max_iter', 'copy_X', 'tol',...
================================================================ 24 failed, 112 passed, 1 xfailed, 16 warnings in 114.40s (0:01:54) ================================================
Reproduce by
clf = MCPRegression(
alpha=alpha, gamma=np.inf, fit_intercept=False, tol=tol)
print(clf.get_params())
clf.fit(X, y)
in the setup of test_estimators.py
As AA has been refactored into a class, we should make sure to catch a sneaky bug that can make AA fail silently.
I have already stumbled upon cases where the residual matrix is ill-conditioned, which makes the solution of the quadratic problem associated with AA take very large positive and very small negative values that cancel out.
This bug is not caught by the try/except block watching for LinAlgError, and computational time is wasted.
We should make sure to catch this bug in the AA class.
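A hedged sketch of such a guard; the variable layout (past iterates W and residuals R stacked as columns) and the cap value are assumptions:

import numpy as np

def guarded_extrapolation(W, R, coef_cap=1e10):
    # solve the small quadratic problem of Anderson acceleration, but also
    # reject extrapolations whose coefficients blow up (ill-conditioning),
    # which a LinAlgError-only try/except lets through
    try:
        z = np.linalg.solve(R.T @ R, np.ones(R.shape[1]))
    except np.linalg.LinAlgError:
        return None  # singular system: skip extrapolation
    if not np.all(np.isfinite(z)) or np.max(np.abs(z)) > coef_cap:
        return None  # huge +/- values cancelling out: skip as well
    return W @ (z / z.sum())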
See https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/cd_solver.py#L295 for a more detailed explanation.
It is not obvious that ws_strategy=subdiff won't work for some penalties (e.g. L0.5, L2/3, ...). Raising an error or issuing a warning in this situation would prevent users from spending time figuring out the issue.
This means the computation of p_obj_acc is not needed. It is a big bottleneck on datasets such as finance, where the subproblems are very small compared to n_features.
Returning an additional boolean variable as the output of accelerate would allow us to skip the p_obj_acc computation in cases where it is not needed.
It is now inferred from the datafit in glm_fit. This should make the tests and code a bit simpler.
I apologize for opening an issue that isn't a real bug.
I need to control the number of cores used when I fit a model.
I've tried the following things:
(1)
from numba import set_num_threads
set_num_threads(1)
(2)
from numba import config
config.NUMBA_NUM_THREADS = 1
(3) Going through the source code and replacing every instance of @njit with @njit(parallel=False).
No matter what I try, I see my core usage shoot up to 128.
I'm on a cluster of machines connected by slow interconnect, so this core usage actually makes the model run very slow.
Does anybody understand how to control this?
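A hedged suggestion rather than a definitive answer: thread pools are typically sized at import time, so the environment variables must be set before importing numpy, numba, or skglm; the BLAS variables matter because the core usage may come from numpy's linear algebra rather than numba itself:

import os
os.environ["NUMBA_NUM_THREADS"] = "1"  # numba's threading layer
os.environ["OMP_NUM_THREADS"] = "1"    # OpenMP, used by BLAS and numba
os.environ["MKL_NUM_THREADS"] = "1"    # MKL BLAS, if numpy links against it

import skglm  # noqa: E402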
@PABannier said the doc looks old, and @Badr-MOUFAD did a great job modernizing celer's doc at https://mathurinm.github.io/celer, in particular with respect to the computation of optimality conditions.
To reproduce, run this in test_estimators.py:
estimator_name = "SVC"
clf = clone(dict_estimators_ours[estimator_name])
clf.verbose = 2
clf.tol = 1e-6 # failure in float32 computation otherwise
# if isinstance(clf, WeightedLasso):
# clf.weights = None
check_estimator(clf)
Currently, there is no warning and the solver diverges. We should raise a ValueError at fit time.
This probably calls for a non-working-set solver, or we can work something out with is_penalized (which could be called yields_sparse_coefs or something; i.e., if it is not true, we should always include the feature). A fit-time check is sketched below.
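A hypothetical sketch of such a check, assuming the penalty exposes is_penalized(n_features) returning the mask of penalized features:

import numpy as np

def check_some_feature_penalized(penalty, n_features):
    # fail fast instead of letting the working-set solver diverge silently
    if not np.any(penalty.is_penalized(n_features)):
        raise ValueError(
            "No feature is penalized: the working-set solver may diverge. "
            "Consider a solver that does not rely on working sets.")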
Currently, penalty.value only takes w as argument:
skglm/skglm/penalties/separable.py
Line 77 in 99d0a06
We need penalty.value(w, ws) (one cannot simply call penalty.value(w[ws]) because of the weighted L1 norm).
This could also be useful for the SLOPE penalty.
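A hypothetical sketch of a working-set-aware value for a weighted L1 penalty (not skglm's actual class), showing why value(w[ws]) alone is not enough, since the weights must be indexed as well:

import numpy as np

class WeightedL1Sketch:
    def __init__(self, alpha, weights):
        self.alpha = alpha
        self.weights = weights

    def value(self, w, ws):
        # restrict both the coefficients and the weights to the working set
        return self.alpha * np.sum(self.weights[ws] * np.abs(w[ws]))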
Doable after merging #68:
- add a fit_intercept argument in SqrtLasso
- handle fit_intercept in the path method
- set intercept_ in the fit method

Currently, FISTA relies on a "global" Lipschitz constant. For non-Lipschitz datafits (e.g. Poisson), this is a problem, hence the need to support line search. A sketch is given below.
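A hedged sketch of a proximal-gradient backtracking line search, which removes the need for a global Lipschitz constant; f is the smooth datafit and prox the penalty's proximal operator, both assumed signatures:

import numpy as np

def backtracking_step(w, grad, f, prox, t0=1.0, beta=0.5):
    # shrink the step size until the standard sufficient-decrease condition
    # f(z) <= f(w) + <grad, z - w> + ||z - w||^2 / (2 t) holds
    t = t0
    f_w = f(w)
    while True:
        z = prox(w - t * grad, t)
        d = z - w
        if f(z) <= f_w + grad @ d + d @ d / (2 * t):
            return z, t
        t *= beta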
Following https://github.com/a-rahimi/sparse-regression/blob/main/sparse%20regression.ipynb, the current prox-Newton solver does not handle the fixpoint strategy; hence we cannot use this solver to solve L0.5- or L2/3-penalized problems.
import numpy as np
from skglm.datafits import Quadratic
from skglm.penalties import L1
from skglm.solvers.cd_solver import cd_solver
from skglm.utils import make_correlated_data
n_samples, n_features = 10, 50
X, y, _ = make_correlated_data(n_samples, n_features, random_state=0)
alpha_max = np.linalg.norm(X.T @ y, ord=np.inf) / n_samples
datafit = Quadratic()
penalty = L1(alpha=alpha_max / 10.)
w = np.zeros(n_features)
Xw = np.zeros(n_samples)
cd_solver(X, y, datafit, penalty, w, Xw)
print("This line will never be printed")
print("This one also will never be printed")
print("...")
This is dangerous behavior, as the code breaks without giving any hint on what's wrong (even with a debugger).
After debugging, I found out that the problem comes from passing a non-initialized datafit to cd_solver. In fact, the solver attempts to access the Lipschitz constants of the datafit to compute gradients, and these don't exist.
A natural way to fix this bug is to make cd_solver a private member, to prevent calling it directly from the outside.
However, not having the Lipschitz constants doesn't explain why the code breaks without throwing any error. I suspect a bug in Numba compilation; here is a small script to reproduce it:
import numpy as np
from numba import float64
from numba.experimental import jitclass
@jitclass([('Xty', float64[:])])
class Quadratic:
def __init__(self):
pass
def gradient_scalar(self, X, Xw, j):
return (X[:, j] @ Xw - self.Xty[j]) / len(Xw)
X = np.ones((2, 5))
Xw = np.zeros(2)
Quadratic().gradient_scalar(X, Xw, 0)
print("This line will never be printed")
We use it only to define C_LIST in utils.py; it's not worth having a dependency.
For the doc, it can be installed directly on Circle CI together with numpydoc and other doc-only packages.