benchopt / benchopt

Making your benchmark of optimization algorithms simple and open

Home Page: https://benchopt.github.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 72.98%, Shell 0.66%, Julia 0.07%, R 0.19%, Makefile 0.03%, CSS 7.84%, HTML 8.77%, JavaScript 9.45%
Topics: optimization, machine-learning, python, benchmark, julia-language, optimization-methods, rlang

benchopt's Introduction


—Making your ML and optimization benchmarks simple and open—



Benchopt is a benchmarking suite tailored for machine learning workflows. It is built for simplicity, transparency, and reproducibility. It is implemented in Python but can run algorithms written in many programming languages.

So far, benchopt has been tested with Python, R, Julia and C/C++ (compiled binaries with a command line interface). Programs available via conda should be compatible as well. See for instance an example of usage with R.

Install

It is recommended to use benchopt within a conda environment to fully benefit from the benchopt command line interface (CLI).

To install benchopt, start by creating a new conda environment and activating it:

conda create -n benchopt python
conda activate benchopt

Then run the following command to install the latest release of benchopt:

pip install -U benchopt

It is also possible to use the latest development version. To do so, run instead:

pip install --pre benchopt -U -i https://test.pypi.org/simple

Getting started

After installing benchopt, you can

  • replicate/modify an existing benchmark
  • create your own benchmark

Using an existing benchmark

Replicating an existing benchmark is simple. Here is how to do so for the L2-regularized logistic regression benchmark.

  1. Clone the benchmark repository and cd to it:
git clone https://github.com/benchopt/benchmark_logreg_l2
cd benchmark_logreg_l2
  2. Install the desired solvers automatically with benchopt:
benchopt install . -s lightning -s sklearn
  3. Run the benchmark to get the figure below:
benchopt run . --config ./example_config.yml

These steps reproduce the L2-regularized logistic regression benchmark. The complete list of benchmarks is given in the Available benchmarks section below. Refer to the documentation to learn more about the benchopt CLI and its features. You can also easily extend a benchmark by adding a dataset, solver, or metric; see the Benchmark workflow section of the documentation, and the minimal solver sketch below.
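To give an idea of what extending a benchmark involves, here is a minimal, hypothetical solver file (e.g. solvers/my_gd.py in a cloned benchmark). The BaseSolver class and the set_objective/run/get_result methods are the benchopt solver API, but the solver name, the parameter names X, y, lmbd (following the L2-logistic benchmark) and the plain gradient-descent logic are only illustrative; check the Write a benchmark tutorial for the exact signatures.

import numpy as np
from benchopt import BaseSolver


class Solver(BaseSolver):
    # Name shown in the results and used with `benchopt run . -s my-gd`.
    name = "my-gd"

    def set_objective(self, X, y, lmbd):
        # Problem data handed over by the benchmark's Objective class.
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        # Plain gradient descent on the L2-regularized logistic loss.
        X, y, lmbd = self.X, self.y, self.lmbd
        n_samples, n_features = X.shape
        step = 1 / (np.linalg.norm(X, ord=2) ** 2 / (4 * n_samples) + lmbd)
        w = np.zeros(n_features)
        for _ in range(n_iter):
            grad = -X.T @ (y / (1 + np.exp(y * (X @ w)))) / n_samples + lmbd * w
            w -= step * grad
        self.w = w

    def get_result(self):
        # Recent benchopt versions expect a dict keyed by the Objective's
        # parameter names; older versions accepted the array directly.
        return dict(beta=self.w)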

Creating a benchmark

The Write a benchmark section of the documentation provides a tutorial for creating a benchmark. The benchopt community also maintains a template benchmark to help you start a new benchmark quickly and easily.

Finding help

Join the benchopt Discord server and get in touch with the community! Feel free to drop us a message to get help with running or constructing benchmarks, or to discuss new features and future development directions for benchopt.

Citing Benchopt

Benchopt is a continuous effort to make reproducible and transparent ML and optimization benchmarks. Join us in this endeavor! If you use benchopt in a scientific publication, please cite

@inproceedings{benchopt,
   author    = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre
                and Ablin, Pierre and Bannier, Pierre-Antoine
                and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
                and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
                and Larsson, Johan and Lai, En and Lefort, Tanguy
                and Malézieux, Benoit and Moufad, Badr and T. Nguyen, Binh
                and Rakotomamonjy, Alain and Ramzi, Zaccharie and Salmon, Joseph and Vaiter, Samuel},
   title     = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
   year      = {2022},
   booktitle = {NeurIPS},
   url       = {https://arxiv.org/abs/2206.13424}
}

Available benchmarks

Problem                                   | Results | Build Status
Ordinary Least Squares (OLS)              | Results | Build Status OLS
Non-Negative Least Squares (NNLS)         | Results | Build Status NNLS
LASSO: L1-Regularized Least Squares       | Results | Build Status Lasso
LASSO Path                                | Results | Build Status Lasso Path
Elastic Net                               |         | Build Status ElasticNet
MCP                                       | Results | Build Status MCP
L2-Regularized Logistic Regression        | Results | Build Status LogRegL2
L1-Regularized Logistic Regression        | Results | Build Status LogRegL1
L2-regularized Huber regression           |         | Build Status HuberL2
L1-Regularized Quantile Regression        | Results | Build Status QuantileRegL1
Linear SVM for Binary Classification      |         | Build Status LinearSVM
Linear ICA                                |         | Build Status LinearICA
Approximate Joint Diagonalization (AJD)   |         | Build Status JointDiag
1D Total Variation Denoising              |         | Build Status TV1D
2D Total Variation Denoising              |         | Build Status TV2D
ResNet Classification                     | Results | Build Status ResNetClassif
Bilevel Optimization                      | Results | Build Status Bilevel

benchopt's People

Contributors

agramfort, albanpi, amelievernay, antoinecollas, badr-moufad, bcharlier, chris-mrn, cohenjer, gdurif, hassibatej, jolars, josephsalmon, matdag, mathurinm, melvin-klein, ogrisel, paquiteau, pbarbarant, pierreablin, simondelamare, tanglef, theoguyard, tomdlt, tommoral, zaccharieramzi


benchopt's Issues

API benchopt run -f baseline reinstalls all solvers

and then it runs them all:

(base) ➜  benchOpt git:(add_fista) ✗ benchopt run lasso -f baseline -n 1 -d boston
Uninstalling Baseline in lasso:... done
Installing Baseline in lasso:... done
Installing Blitz in lasso:... done
Installing cd in lasso:... done
Installing Celer in lasso:... done
Installing cvxpy in lasso:... done
Installing Cyanure in lasso:... done
Installing Lightning in lasso:... done
Boston
|--Lasso regression(reg=0.05)
|----Baseline(use_acceleration=false): done               
|----Baseline(use_acceleration=false): done               
|----Blitz: done  

Tests fail locally with benchopt version error

If I try to run test_cli.py I get:

E AssertionError: Installed the wrong version of benchopt (1.1.1.dev23) in conda env. This should be version: 1.1.1.dev22. There is something wrong the env creation mechanism. Please report this error to https://github.com/benchopt/benchopt


I also get

E               RuntimeError: No conda environment is activated.

../cli/main.py:233: RuntimeError


Issue with conda env creation in run

Hi,

I am trying to run the benchmark with a conda env.
I am running it the following way: benchopt run --env ./.
I installed benchopt from source (with the latest version).
My conda version is 4.5.12 (just tried with an updated 4.9.2 and it's the same story), and Python is 3.6.8.
The error I get is:

Creating conda env benchopt_benchmark_lasso:... failed to create the environment.
Traceback (most recent call last):
  File "/home/zaccharie/workspace/benchmark_lasso/venv/bin/benchopt", line 33, in <module>
    sys.exit(load_entry_point('benchopt', 'console_scripts', 'benchopt')())
  File "/home/zaccharie/workspace/benchmark_lasso/venv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/zaccharie/workspace/benchmark_lasso/venv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/zaccharie/workspace/benchmark_lasso/venv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zaccharie/workspace/benchmark_lasso/venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zaccharie/workspace/benchmark_lasso/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/zaccharie/workspace/benchOpt/benchopt/cli.py", line 115, in run
    create_conda_env(env_name, recreate=recreate)
  File "/home/zaccharie/workspace/benchOpt/benchopt/utils/shell_cmd.py", line 184, in create_conda_env
    capture_stdout=True, raise_on_error=True
  File "/home/zaccharie/workspace/benchOpt/benchopt/utils/shell_cmd.py", line 93, in _run_shell
    raise RuntimeError(raise_on_error.format(output=output))
RuntimeError: /tmp/tmpn6mny88h:2: command not found: conda

When I turn on the debugging flag (BENCHO_DEBUG=True), I find that the env creation is done with the following command conda env create -n benchopt_benchmark_lasso -f /tmp/conda_env_7xfo9azm.yml.

When I run this command on its own, I find the following error:

SpecNotFound: Invalid name, try the format: user/package

This error has been described here as well, and apparently it persists to this day.
I tried to fix it by simply removing the env in the command in this line, but I still get the same error (i.e. RuntimeError: /tmp/tmpn6mny88h:2: command not found: conda).

For further investigation, I added a breakpoint here, and saw that the command being run was in a tmp file, in this instance /usr/bin/zsh /tmp/tmp1dwabbzo.

When using this command in a different shell (with the tmp file being still present), I get the following error:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - /tmp/conda_env_kwp39_w4.yml

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I can see how it's not the same as command not found: conda. But I have no idea how to debug further than this, because the command run in the terminal and the command run through subprocess.getstatusoutput do not give the same results...

For info the content of /tmp/tmp1dwabbzo is:

set -e
conda create  -n benchopt_benchmark_lasso -f /tmp/conda_env_kwp39_w4.yml

BUG Automatic install of the solver dependencies fails

$ benchopt run --env-name benchopt-env ./path/to/benchmark

returns

Usage: benchopt run [OPTIONS] BENCHMARK
Try 'benchopt run -h' for help.

Error: Invalid value for '--local' / '-l': benchopt-env is not a valid boolean

ENH : adding bibtex / references to credit solvers

It would be great to have a uniform way of crediting the papers / solvers added to the various benchmarks.
For the moment, no credit is given (except maybe the GitHub repository some sources come from).

The BibTeX entries of the papers might be a reasonable start.

Any thoughts on how to proceed, @agramfort?

It's hard to bench on a new dataset

It seems quite difficult to launch a benchmark on a new dataset.

  • Adding a file new_data.py by copying boston.py, for example, does not work.

  • It would also be cool to be able to use a local dataset and benchmark on it (a minimal sketch is given after this list).

WDYT?
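For illustration, a local dataset could in principle be added as a small file in the benchmark's datasets/ folder. The sketch below uses the benchopt BaseDataset class, but the file name, the dataset name, the .npy paths, and the keys returned by get_data (X, y, as in the lasso benchmark) are hypothetical; depending on the benchopt version, get_data may also need to return the problem dimension.

import numpy as np
from benchopt import BaseDataset


class Dataset(BaseDataset):
    # Hypothetical datasets/my_local_data.py added to a benchmark folder.
    name = "my-local-data"

    def get_data(self):
        # Load a design matrix and target stored on disk (paths illustrative).
        X = np.load("data/X.npy")
        y = np.load("data/y.npy")
        return dict(X=X, y=y)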

API newly coded solver is not run and no error message is shown

To test #20 I ran the following command and the gapsafe solver was not included in the benchmark. Is this the desired behavior?

(base) ➜  benchOpt git:(gapsafe) ✗ benchopt run lasso -s celer -s gapsafe -d simulated
Simulated(n_samples=100,n_features=5000)
|--Lasso regression(reg=0.05)
|----Celer: done                                          
|--Lasso regression(reg=0.1)
|----Celer: done                                          
|--Lasso regression(reg=0.5)
|----Celer: done                                          
Simulated(n_samples=100,n_features=10000)
|--Lasso regression(reg=0.05)
|----Celer: done                                          
|--Lasso regression(reg=0.1)
|----Celer: done                                          
|--Lasso regression(reg=0.5)
|----Celer: done   

ENH gain time for some solvers by returning list of times and iterates

When we code a solver specifically for a benchmark, we can return a list of (time, iterate) (or two lists), so that we get one full convergence curve per run, instead of having to run with different max_iter or tol. I think the time to get a curve should go from quadratic in max_iter to linear.

This means we trust the user to dump the correct time, but the curves could be altered anyway with the current process, so IMO we should assume good faith. A rough sketch of the idea is given below.
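For concreteness, here is a rough sketch of what such a solver loop could look like. This is not the current benchopt API: the function, its arguments, and the gradient-descent update are purely illustrative of returning one (time, iterate) pair per iteration from a single run.

import time


def run_and_record(grad, step, w0, max_iter):
    # One full convergence curve from a single run: record the wall-clock
    # time and a copy of the iterate after every update.
    w = w0.copy()
    times, iterates = [], []
    t0 = time.perf_counter()
    for _ in range(max_iter):
        w -= step * grad(w)
        times.append(time.perf_counter() - t0)
        iterates.append(w.copy())
    return times, iterates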

ENH : verbose information at benchmark install / conda env creation

While installing benchmark_lasso, the installation process is quite long without any upfront information.

It would be interesting to set conda create to verbose by default (with a possibility to silence it).

Additionally, knowing in advance the precise download size for datasets might also be needed.

BUG Automatic install of classes fails

First, in my branch FIX_utils_import, I replaced, in the solver files of benchmark_logreg_l2,

from benchopt.util import safe_import_context

by

from benchopt.utils.safe_import import safe_import_context

to fix a first bug, which should be raised as an issue in all concerned benchmarks.

Then I still get this error because lightning cannot be installed automatically by the Benchmark class:

$ benchopt run ../benchmarks/benchmark_logreg_l2 
Installing Lightning in None:... failed
/home/nidham/phd/benchOpt/benchopt/benchmark.py:116: UserWarning: Some solvers were not successfully installed, and will thus be ignored. Use 'export BENCHO_RAISE_INSTALL_ERROR=true' to stop at any installation failure and print the traceback.
  warnings.warn(
/tmp/tmp02ch3slw: line 2: benchopt.benchmark.Benchmark: No such file or directory

ENH generate .py file that plots the figures after a benchmark

Hi,

This is just a random idea: it would be neat if a .py file that plots the figure from the output of the benchmark were also generated, a file like plot_benchmark.py, so that running python plot_benchmark.py reproduces the figure. This way, the user can easily iterate on this .py file to obtain paper-quality figures.
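Such a generated script might look roughly like the sketch below. The results path and the column names (solver_name, time, objective_value) are assumptions made for illustration, not a description of files benchopt actually writes.

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical results file produced by the benchmark run.
df = pd.read_csv("outputs/benchmark_results.csv")

fig, ax = plt.subplots()
for solver, curve in df.groupby("solver_name"):
    # One convergence curve per solver: objective value against time.
    curve = curve.sort_values("time")
    ax.semilogy(curve["time"], curve["objective_value"], label=solver)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Objective value")
ax.legend()
fig.savefig("benchmark_figure.pdf")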

ENH: improve benchopt manual

(base) ➜ ~ benchopt -h
Usage: benchopt [OPTIONS] COMMAND [ARGS]...

Command-line interface to benchOpt

Options:
-h, --help Show this message and exit.

Commands:
bench Run benchmark.
run Run benchmark.

The difference between the two commands is not very clear from this info alone; benchopt bench -h and benchopt run -h did not help me much, since the only difference I see is the recreate option in one of them.

Graphics card information in results

It would be good to have the name of the graphics card in addition to the CUDA version currently saved.
With nvidia-smi there is the command:

nvidia-smi --query-gpu=name --format=csv,noheader

so using the subprocess module (already used for other system information), we can get it easily. Not sure about other vendors, though. A minimal sketch follows.
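The sketch below simply wraps the nvidia-smi command quoted above; the function name is illustrative, and it returns None when the command is unavailable (CPU-only or non-NVIDIA hosts).

import subprocess


def get_gpu_names():
    # Query the GPU name(s) via nvidia-smi; None if the query fails.
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip().splitlines()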

BUG fail to run default benchmark (PyPI version)

Hi,

On a fresh install (inside a clean python virtual environment) via pip, I cannot run any of the standard benchmarks

$ pip install benchopt
$ git clone https://github.com/benchopt/benchmark_ols
$ benchopt run benchmark_ols 
Traceback (most recent call last):
  File "/xxxx/benchopt/.pyenv/bin/benchopt", line 11, in <module>
    sys.exit(main())
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/cli.py", line 80, in run
    validate_dataset_patterns(benchmark, dataset_names)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/utils/checkers.py", line 35, in validate_dataset_patterns
    datasets = list_benchmark_datasets(benchmark)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/util.py", line 104, in list_benchmark_datasets
    return _list_benchmark_classes(benchmark_dir, 'Dataset')
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/util.py", line 91, in _list_benchmark_classes
    classes.append(_load_class_from_module(module_filename, class_name))
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/utils/dynamic_modules.py", line 46, in _load_class_from_module
    module = _get_module_from_file(module_filename)
  File "/xxxx/benchopt/.pyenv/lib/python3.6/site-packages/benchopt/utils/dynamic_modules.py", line 22, in _get_module_from_file
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "benchmark_ols/datasets/boston.py", line 1, in <module>
    from benchopt import BaseDataset
ImportError: cannot import name 'BaseDataset'

same for benchmark_lasso, benchmark_logreg_l2, benchmark_logreg_l1, benchmark_nnls.

Here are my system info:

$ python -V
Python 3.6.9
$ uname -a
Linux xxxx 5.4.0-59-generic #65~18.04.1-Ubuntu SMP Mon Dec 14 15:59:40 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Am I missing something? Thanks in advance.

ENH Add a requirements.txt for each benchmark

I wanted to test the lasso benchmark and I had to install each solver manually (cyanure, celer, lightning, etc.). I guess we could try to write a requirements.txt to install all these solvers directly.

ENH Save the loss at the initial point

When implementing stochastic methods, I realized that the first point at which the loss is computed is $w^1$ and not $w^0$.

By modifying this, all the curves will start from the same point, which makes more sense.

ENH : Need to give a variety of plotting options

We might need to add different plotting options (benchopt plot in the CLI) since we will need to plot different things on the y-axis:

  • train suboptimality (currently plotted)
  • test suboptimality
  • convergence of iterates

sysinfo error when rendering html

Trying to run any benchmark with the current master leads to this error:

Writing results to outputs/benchmark_linear_ica_benchopt_run_2021-05-30_21h30m34.html
Traceback (most recent call last):
  File "/Users/alex/miniconda3/bin/benchopt", line 33, in <module>
    sys.exit(load_entry_point('benchopt', 'console_scripts', 'benchopt')())
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/alex/work/src/benchOpt/benchopt/cli/main.py", line 97, in run
    run_benchmark(
  File "/Users/alex/work/src/benchOpt/benchopt/runner.py", line 496, in run_benchmark
    plot_benchmark(save_file, benchmark)
  File "/Users/alex/work/src/benchOpt/benchopt/plotting/__init__.py", line 43, in plot_benchmark
    plot_benchmark_html(fname, benchmark, kinds, display)
  File "/Users/alex/work/src/benchOpt/benchopt/plotting/generate_html.py", line 427, in plot_benchmark_html
    rendered = render_benchmark(
  File "/Users/alex/work/src/benchOpt/benchopt/plotting/generate_html.py", line 301, in render_benchmark
    return Template(
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/mako/template.py", line 473, in render
    return runtime._render(self, self.callable_, args, data)
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/mako/runtime.py", line 878, in _render
    _render_context(
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/mako/runtime.py", line 920, in _render_context
    _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
  File "/Users/alex/miniconda3/lib/python3.8/site-packages/mako/runtime.py", line 947, in _exec_template
    callable_(context, *args, **kwargs)
  File "_Users_alex_work_src_benchOpt_benchopt_plotting_html_templates_benchmark_mako_html", line 38, in render_body
KeyError: 'sysinfo'

The function _fetch_cached_run_list seems to remove the sysinfo from the results dictionaries, and the rendering fails.

@tanglef am I missing something? It works for you?

Shouldn't you have two sequences of variables for the accelerated version?

For instance, this line only has the variable w

https://github.com/benchopt/benchOpt/blob/01acb156bd76708b01a30c3de8918609f1cde3ff/benchmarks/lasso/solvers/python_pgd.py#L31

but I think you need two sequences. Something like this (note the additional variable x):

L = np.linalg.norm(self.X, ord=2) ** 2
n_features = self.X.shape[1]
w = np.zeros(n_features)
x = np.zeros(n_features)
t_new = 1

for _ in range(n_iter):
    if self.use_acceleration:
        t_old = t_new
        t_new = (1 + np.sqrt(1 + 4 * t_old ** 2)) / 2
    grad = self.X.T.dot(self.X.dot(w) - self.y)
    x_old = x.copy()
    x -= grad / L
    x = self.st(x, self.lmbd / L)  # Not sure what this step does
    if self.use_acceleration:
        w = x + (t_old - 1.) / t_new * (x - x_old)
self.w = w

Running benchmark_logreg_l2 fails to install sklearn

I freshly pulled master, did pip install -e ., had no preexisting benchopt_ conda env, and tried to run the sklearn solver. The install fails and I get a surprising command not found: benchopt afterwards.

Steps to reproduce (?):
fresh install of benchopt, go to benchmark_logreg_l2 and try to benchopt run . -s sklearn

(base) ➜  benchmark_logreg_l2 git:(master) benchopt run . -s sklearn      
Could not generate requirement for distribution -cikit-learn 0.23.0 (/home/mathurin/miniconda3/lib/python3.7/site-packages): Parse error at "'-cikit-l'": Expected W:(abcd...)
Installing Simulated in benchopt_:... failed
/home/mathurin/workspace/benchOpt/benchopt/util.py:156: UserWarning: Some solvers were not successfully installed, and will thus be ignored. Use 'export BENCHO_RAISE_INSTALL_ERROR=true' to stop at any installation failure and print the traceback.
  UserWarning
Installing sklearn in benchopt_:... failed
Installing sklearn in benchopt_:... failed
Installing sklearn in benchopt_:... failed
/tmp/tmpw6thjflk:4: command not found: benchopt

Feedback line could be more informative

Currently the displayed text is: |----Python-PGD[use_acceleration=True]: 21.0% (1 / 5)

  • the 1 / 5 could be made more explicit (1 / 5 reps)?
  • the percentage is misleading, as each run takes longer than the previous one

cannot run benchmark

Hi,

on two different machines running two different variants of anaconda
pip install benchopt
git clone https://github.com/benchopt/benchmark_lasso
benchopt run ./benchmark_lasso

results in


from benchopt import BaseDataset

ImportError: cannot import name 'BaseDataset' from 'benchopt' (/opt/miniconda3/lib/python3.7/site-packages/benchopt/__init__.py)


Finance (and possibly other datasets) issues with zeros columns

When launching benchOpt with the finance dataset (for lasso or nnls) you can get an error like this:

lasso/solvers/cd.py", line 46, in run
    L, n_iter)
ZeroDivisionError: division by zero

For finance this could be solved by doing the same pre-processing as in celer, removing columns with too small norms: https://github.com/mathurinm/celer/blob/dea87fe226450770c3dc3bfe23cf421202221dcf/celer/datasets/libsvm.py#L143

But what should we do in general to handle (the weird case) where one column is a zero vector?

  1. hope nobody proposes such a dataset (so raise a warning on the solver page)
  2. rewrite the solvers to avoid such an issue?

wdyt @mathurinm @agramfort?
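For reference, the pre-processing mentioned above (dropping columns with near-zero norm before handing the data to the solvers) amounts to something like this minimal numpy sketch; the helper name and tolerance value are arbitrary.

import numpy as np


def drop_zero_columns(X, tol=1e-12):
    # Remove columns whose Euclidean norm is (numerically) zero.
    norms = np.linalg.norm(X, axis=0)
    return X[:, norms > tol]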

BUG cannot load Lapack in R solvers

The solver benchmarks/lasso/solvers/r_proximal_gradient.py fails when calling norm(X, "2") to compute the Lipschitz constant with error

rpy2.rinterface_lib.embedded.RRuntimeError: Error in La.svd(x, nu, nv) : LAPACK routines cannot be loaded

This seems to be related to an interaction with the BLAS library loaded by numpy: calling sessionInfo in R from the environment does not raise the error, while the following script reproduces it:

from benchmarks.lasso.solvers.r_proximal_gradient import robjects
r_ista = robjects.r['sessionInfo']()

ENH simulated dataset has nearly diagonal covariance

I understand that it's a toy dataset, but we could introduce a bit of correlation in the data: the solvers run in < 0.01 s and the results are not trustworthy. I wouldn't mind per se, if it weren't the first figure people are shown. Additionally, I wouldn't accept such a dataset in a paper I review.

We can easily construct a Toeplitz 2d array without scipy (corr_{ij} = rho ** |i - j|) and use it as covariance when generating data. Afterwards, if one wants, rho could be made a parameter of the dataset.
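A small numpy sketch of that suggestion (the sizes and the value of rho are arbitrary):

import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, rho = 100, 50, 0.6

# Toeplitz correlation matrix, corr[i, j] = rho ** |i - j|, built without scipy.
idx = np.arange(n_features)
corr = rho ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(n_features), corr, size=n_samples)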

What is your take on this @tomMoral @agramfort ?

ENH: running benchmark on Finance raises joblib warning multiple times

Since X is large, persisting it takes some time and joblib raises the same warning several times, which takes up a lot of space in the CLI:

/home/mathurin/workspace/benchOpt/benchopt/runner.py:135: UserWarning: Persisting input arguments took 2.08s to run.
If this happens often in your code, it can cause performance problems 
(results will be correct in all cases). 
The reason for this is probably some large input arguments for a wrapped
 function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
 example so that they can fix the problem.
  force=force
/home/mathurin/workspace/benchOpt/benchopt/runner.py:135: UserWarning: Persisting input arguments took 2.04s to run.
If this happens often in your code, it can cause performance problems 
(results will be correct in all cases). 
The reason for this is probably some large input arguments for a wrapped
 function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
 example so that they can fix the problem.
  force=force

Celer installation fails

I ran into an issue when trying to benchmark celer:


(base) ➜  benchOpt git:(master) benchopt bench -n 3 --max-samples 10 -s celer
Mathurin's version of Blitzl1
Installing solver Blitz in lasso:... done
Installing solver Celer in lasso:...Traceback (most recent call last):
  File "/home/mathurin/miniconda3/bin/benchopt", line 11, in <module>
    load_entry_point('benchopt', 'console_scripts', 'benchopt')()
  File "/home/mathurin/workspace/benchOpt/benchopt/cli.py", line 93, in start
    main()
  File "/home/mathurin/miniconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/mathurin/miniconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/mathurin/miniconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mathurin/miniconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mathurin/miniconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/mathurin/workspace/benchOpt/benchopt/cli.py", line 80, in bench
    env_name=benchmark)
  File "/home/mathurin/workspace/benchOpt/benchopt/util.py", line 287, in install_solvers
    solver.install(env_name=env_name, force=force_install)
  File "/home/mathurin/workspace/benchOpt/benchopt/base.py", line 193, in install
    pip_install_in_env(cls.package_install, env_name=env_name)
  File "/home/mathurin/workspace/benchOpt/benchopt/util.py", line 106, in pip_install_in_env
    msg=f"Failed to pip install packages {packages}\n"
  File "/home/mathurin/workspace/benchOpt/benchopt/util.py", line 96, in _run_bash_in_env
    return _run_in_bash(script, msg=msg)
  File "/home/mathurin/workspace/benchOpt/benchopt/util.py", line 66, in _run_in_bash
    raise RuntimeError(msg.format(output=output))
RuntimeError: Failed to pip install packages ('git+https://github.com/mathurinm/celer.git',)
Error:Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-zczpjezu/

It works fine with the blitz solver. Updating setuptools did not fix the issue (Version: 45.1.0.post20200119), nor did removing my local install of celer.
Any idea where it might come from?

Issue in the doc home page index.rst

In the section Command line usage, the final chunk reads:

Use

.. code-block::

    $ benchopt run -h

for more details about different options read the :ref:`api_documentation`.

Something is missing in the final line: two sentences seem to be merged into one, Use <command> for ... and For more details about ..., read the api_documentation.

I could rephrase it, but I want to be sure that there isn't a full missing section on the Python API.

Better message for install and import failure

  • Running the benchmark, both cvxpy and cyanure seem to fail at import, but the error is caught by safe_import. The error messages do not mention the import error. I wonder why you use safe_import, and whether it would not be more informative to raise an ImportError.
  • The corresponding installation steps could also report the failure, like
    Installing cvxpy in lasso:... failed.

Traceback:

Creating venv lasso:... done
Installing Boston in lasso:... done
Installing finance in lasso:... done
Installing Blitz in lasso:... done
Installing cd in lasso:... done
Installing Celer in lasso:... done
Installing cvxpy in lasso:... done
Installing Cyanure in lasso:... done
Installing Lightning in lasso:... done
Boston
|--Lasso regression(reg=0.05)
|----Baseline(use_acceleration=false): done               
|----Baseline(use_acceleration=true): done                
|----Blitz: done                                          
/home/tom/work/github/benchOpt/benchmarks/lasso/solvers/cd.py:60: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float64, 1d, A), array(float64, 1d, C))
  w[j] = st(w[j] + X[:, j] @ R / L[j], lmbd / L[j])
/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/numba/typing/npydecl.py:958: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float64, 1d, A), array(float64, 1d, C))
  warnings.warn(NumbaPerformanceWarning(msg))
|----Cd: done                                             
|----Celer: done                                          
|----Cvxpy: failed                                        
Traceback (most recent call last):
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 131, in run_one_solver
    sample_curve, objective_value = run_one_sample(
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 568, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 81, in run_one_sample
    cost, objective_value = run_repetition(*args)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 568, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 53, in run_repetition
    solver.set_objective(**objective.to_dict())
  File "/home/tom/work/github/benchOpt/benchmarks/lasso/solvers/cvxpy.py", line 23, in set_objective
    self.beta = cp.Variable(n_features)
NameError: name 'cp' is not defined
|----Cyanure: failed                                      
Traceback (most recent call last):
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 131, in run_one_solver
    sample_curve, objective_value = run_one_sample(
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 568, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 81, in run_one_sample
    cost, objective_value = run_repetition(*args)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 568, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/.venv/lasso/lib/python3.8/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "/home/tom/work/github/benchOpt/benchopt/runner.py", line 53, in run_repetition
    solver.set_objective(**objective.to_dict())
  File "/home/tom/work/github/benchOpt/benchmarks/lasso/solvers/cyanure.py", line 21, in set_objective
    self.solver = Regression(loss='square', penalty='l1',
NameError: name 'Regression' is not defined

ENH: benchopt must be run from the repo folder

However, there is a benchopt alias defined, and running it from another folder either gives no output, when no benchmark is specified:

(base) ➜  ~ benchopt bench --max-samples 5 -n 10  -s blitz 
(base) ➜  ~ 

(no figures generated)
or fails:

(base) ➜  ~ benchopt bench --max-samples 5 -n 10 logreg -s blitz
...
AssertionError: {'logreg'} is not a valid benchmark. Should be one of: []

ENH change the y lims in the figures so that it scales automatically

Sometimes you might want to monitor an objective that is different from the function optimized by the solver (e.g. the error between the predicted and true coefficients when doing linear regression), and in these cases the value of the objective might not get as close as 10^-9 to the optimal objective. In that case, having a ylim that goes down to 10^-9 makes the plot rather unreadable.

cannot run benchmark_lasso locally

I am trying to run the benchmark_lasso within my conda env named my_optim_env.

I installed benchopt within this environment with pip install benchopt. With conda list, I checked that benchopt does appear installed in this env. I also installed dependencies, e.g. SciPy and NumPy. I cloned benchmark_lasso to a specific directory .../optimization/.
Then, within my_optim_env (with Miniconda3), I went to the directory where benchmark_lasso was cloned and entered benchopt run -l ../optimization/benchmark_lasso to run it locally, but it returned the following error:

Traceback (most recent call last):
  File "c:\users\...\miniconda3\envs\my_optim_env\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\...\miniconda3\envs\my_optim_env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\Miniconda3\envs\my_optim_env\Scripts\benchopt.exe\__main__.py", line 4, in <module>
ModuleNotFoundError: No module named 'benchopt'

So it seems that whatever I try to do, the command benchopt is not recognized by conda.
Am I doing something wrong?
I have checked that benchopt.exe exists in this env (under the /Scripts/ directory).
I apologize, the issue seems really basic. I also tried with the Windows shell; the output error is the same.
Could you please help me solve it?

conda version: 4.8.3. Python version: 3.6.12

Capture stdout of C or FORTRAN solvers

As described in #25 (comment), we might want to capture the stdout of C or FORTRAN solvers, which is not straightforward with contextlib.redirect_stdout, but something like this solution works.

I wonder if this is something worth adding to benchopt, since it might be useful for multiple solvers, including benchmarks/nnls/solvers/nnls_scipy.py.
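For illustration, here is a minimal sketch of the file-descriptor trick that such a solution relies on; the helper name and usage are hypothetical. Unlike contextlib.redirect_stdout, duplicating file descriptor 1 also captures prints issued by compiled C/Fortran code.

import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_low_level_stdout():
    # Capture everything written to fd 1, including output from compiled
    # C / Fortran code that bypasses sys.stdout.
    captured = []
    saved_fd = os.dup(1)                   # keep a copy of the real stdout
    tmp = tempfile.TemporaryFile(mode="w+b")
    os.dup2(tmp.fileno(), 1)               # route fd 1 into the temp file
    try:
        yield captured
    finally:
        sys.stdout.flush()                 # flush Python-level buffering first
        os.dup2(saved_fd, 1)               # restore the original stdout
        os.close(saved_fd)
        tmp.seek(0)
        captured.append(tmp.read().decode(errors="replace"))
        tmp.close()


# Usage sketch: noisy_compiled_solver is a placeholder for a solver that
# prints from C or FORTRAN code.
# with capture_low_level_stdout() as out:
#     noisy_compiled_solver()
# print(out[0])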
