openml / automlbenchmark Goto Github PK

View Code? Open in Web Editor NEW

398.0 15.0 132.0 117.05 MB

OpenML AutoML Benchmarking Framework

Home Page: https://openml.github.io/automlbenchmark

License: MIT License

Shell 2.95% Python 78.22% R 0.52% Jupyter Notebook 18.31%

automl machine-learning benchmark

automlbenchmark's Introduction

AutoML Benchmark

The OpenML AutoML Benchmark provides a framework for evaluating and comparing open-source AutoML systems.
The system is extensible because you can add your own AutoML frameworks and datasets. For a thorough explanation of the benchmark, and evaluation of results, you can read our paper.

Automatic Machine Learning (AutoML) systems automatically build machine learning pipelines or neural architectures in a data-driven, objective, and automatic way. They automate a lot of drudge work in designing machine learning systems, so that better systems can be developed, faster. However, AutoML research is also slowed down by two factors:

We currently lack standardized, easily-accessible benchmarking suites of tasks (datasets) that are curated to reflect important problem domains, practical to use, and sufficiently challenging to support a rigorous analysis of performance results.
Subtle differences in the problem definition, such as the design of the hyperparameter search space or the way time budgets are defined, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

This toolkit aims to address these problems by setting up standardized environments for in-depth experimentation with a wide range of AutoML systems.

Website: https://openml.github.io/automlbenchmark/index.html

Documentation: https://openml.github.io/automlbenchmark/docs/index.html

Installation: https://openml.github.io/automlbenchmark/docs/getting_started/

Features:

Curated suites of benchmarking datasets from OpenML (regression, classification).
Includes code to benchmark a number of popular AutoML systems on regression and classification tasks.
New AutoML systems can be added
Experiments can be run in Docker or Singularity containers
Execute experiments locally or on AWS

automlbenchmark's People

Contributors

Stargazers

Watchers

Forkers

sebhrusen stc-cqupt nofeetbird0321 sprinterzzj nandini269 justinormont emailhy tom-deng ajoeajoe felixneutatz dawncc scape1989 coorsaa jhosoume shashgpt zirui-ray-liu aligapaul boba-and-beer innixma gabikadlecova drpdr dallal9 automl franchuterivera tqcai mljar fmohr haozhestat nanaakwasiabayieboateng parthchudasama mwever stjordanis hp2500 paxcema 7am7 zhuygln ennosigaeon arm7ai tdoublep u1234x1234 nicl-nno sberbank-ai-lab gfournier jimmy-inl christopherbunn pplonski danielschulz arryboom mcx akshayparanjape termit209 niobeus lucasnil qingyun-wu surf-rbood littlelittlecloud ledell xiaowu0162 jaypeeml modep-ai mkyl microprediction drheavy kgm0817 henglicad eddiebergman ngc436 pgijsbers ravinkohli a-hanf tdhooghe dev-rinchin wgova collab-uniba yanping123dfef maxpark tanhimislam metaprov rainbow7788 ramlatchxramspeicher private-oeli robinnibor jeremyliweishih gidler phenopolis lujiaying xiacedar alex-turintech limpbot bingzhaozhu alanwilter alyambr turintech uts-caslab ahmad-abdellatif ayushi-3536 mertcandonmez ddiem-ri-4d trellixvulnteam dangss

automlbenchmark's Issues

Support for OpenML Studies

As mentioned in #17 and #57, now that the benchmark definition file format is very simple, we could create a small script converting OpenML Studies into a benchmark definition to make it easier to manage definitions and maybe as well to upload results.

How to deal with meta-learning

This issue is moved to the new discussion board.

We encourage everyone to partake in this discussion, as it is an important but difficult issue to address.

The Problem

Meta-learning is the act of learning across datasets to generalize to new problems. AutoML frameworks which use meta-learning to aid their optimization gain an unfair advantage if the benchmark problem to solve was present in their meta-learning process.

For instance, auto-sklearn uses results of experiments on a plethora of datasets to warm-start their optimization. It will characterize new problems, relate them to those meta-learning datasets, and propose candidate solutions which worked well for those old problems. If auto-sklearn is asked to solve a problem which it has seen before, it obviously benefits as it has access to experiment data on the same dataset. This is the unfair advantage.

Discussion

The cleanest solution is that we should not allow AutoML framework tools to use any of the problems we selected in their meta-models (or depending on where to place the burden, we have to select datasets which were not used in any system's meta-learning process).
Both stances share the problem that both parties are very interested in using the data. Excluding them from meta-learning would make the AutoML tools worse, while excluding them from the benchmark would make the benchmark worse.

In our ICML 2019 AutoML workshop paper, we did not have a satisfying solution; we merely indicated where auto-sklearn had seen the dataset during meta-learning.

Any proposed solutions should take into account that datasets in this benchmark change over time. This means that a tool which had no overlap in its meta-learning datasets with the benchmark, may have so after additions to the benchmark.

Abolishing meta-learning for the benchmark is also not an option. We want to evaluate the frameworks as a user would use them. Meta-learning has shown it can provide significant improvements (e.g. auto-sklearn, OBOE), and we hope to see further improvements from it in the future. These improvements should also be accurately reflected in our benchmark.

Solutions

I will try to keep this section up-to-date as the discussion grows.

One solution could be to require AutoML frameworks which use meta-learning to expose an interface to easily exclude a specific dataset from the meta-model (referenced by dataset id).

This requires AutoML developers to cross-reference their used data with OpenML (or preferably always use OpenML to source their data).
Not all meta-learning techniques easily exclude results from one dataset. For those that don't, up to (N+1) models need to be trained/maintained (one with all N datasets, and N models excluding one dataset).
Requires the interface to be adopted specifically for (this) benchmark purposes.
This would allow clean evaluation in the AutoML benchmark.

H2Oautoml docker mode error using 3.24.0.1-stable image

Hi,

I pulled the latest h2oautoml docker image from dockerhub and run
python3 runbenchmark.py h2oautoml -m docker -Xdocker.image=automlbenchmark/h2oautoml:3.24.0.1-stable
I'm getting the following error.

Running `h2oautoml` on `test` benchmarks in `docker` mode.
Loading frameworks definitions from /home/ubuntu/automlbenchmark/resources/frameworks.yaml.
Loading benchmark constraint definitions from /home/ubuntu/automlbenchmark/resources/constraints.yaml.
Loading benchmark definitions from /home/ubuntu/automlbenchmark/resources/benchmarks/test.yaml.
Running cmd `docker images -q automlbenchmark/h2oautoml:3.24.0.1-stable`

---------------------------------------------
Starting job docker_test_test_all__H2OAutoML.
Starting docker: docker run --name h2oautoml_test_test_docker_20200227T221812 --shm-size=1024M -v /home/ubuntu/.openml/cache:/input -v /home/ubuntu/automlbenchmark/results:/output -v /home/ubuntu/.config/automlbenchmark:/custom --rm automlbenchmark/h2oautoml:3.24.0.1-stable H2OAutoML test test   -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker .
Datasets are loaded by default from folder /home/ubuntu/.openml/cache.
Generated files will be available in folder /home/ubuntu/automlbenchmark/results.
Running cmd `docker run --name h2oautoml_test_test_docker_20200227T221812 --shm-size=1024M -v /home/ubuntu/.openml/cache:/input -v /home/ubuntu/automlbenchmark/results:/output -v /home/ubuntu/.config/automlbenchmark:/custom --rm automlbenchmark/h2oautoml:3.24.0.1-stable H2OAutoML test test   -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker `
usage: runbenchmark.py [-h] [-m {local,docker,aws}]
                       [-t [task_id [task_id ...]]]
                       [-f [fold_num [fold_num ...]]] [-i input_dir]
                       [-o output_dir] [-u user_dir] [-p parallel_jobs]
                       [-s {auto,skip,force,only}] [-k [true|false]]
                       framework [benchmark]
runbenchmark.py: error: unrecognized arguments: test
Running cmd `docker kill h2oautoml_test_test_docker_20200227T221812`
Error response from daemon: Cannot kill container: h2oautoml_test_test_docker_20200227T221812: No such container: h2oautoml_test_test_docker_20200227T221812

Seems the runbenchmark.py used do not have the [constraint] argument? Is the docker image automlbenchmark/h2oautoml:3.24.0.1-stable outdated?

benchmark.py save logs

Save logs of amazon instances to disk.

Decision tree not possible with docker on aws

Hi there,

This is probably not a usecase that you have tried (or want to support) when trying to run a decision tree with docker enabled this gave an error (see attached logs). I ran it without docker on aws and its working just fine. This is just a heads-up, no actual problem for me :)

runbenchmark_20190701T201906.log

Set limit for EC2 instances

Currently, one instance per job is spawned, at some point that might be too much, so have a parameter specifying how many instances are allowed at the same time.

Easy: Loop over Chunk limit
Harder: Asynchronously start new instances as soon as jobs are done

how to find the model information?

for example , after use autosklearn to train the data from openml, and want to see its finally selected algorithm with its hyper parameters.

TunedRandomForest is having incorrect venv setting?

In TunedRandomForest following line is not properly allowing venv to work:

automlbenchmark/frameworks/TunedRandomForest/setup.sh

Line 3 in 1c1559c

. $HERE/../shared/setup.sh

It should be something like this right:
. $HERE/../shared/setup.sh $HERE

Add support for AILib (aka. ML-Plan)

New OS framework to be added:
https://github.com/fmohr/AILibs

looking for contributors.

Error while using a custom benchmark on AutoWEKA

Hi there,

While using my own set of benchmark datasets to test AutoWEKA on AWS, I get an error that some files are not created. I do not run into this error while running other AutoML methods on the same benchmark set. When running AutoWEKA with other sets (test and 2small ch1) this error does not occur.

Attached is the log file and the benchmark set, the benchmark set is converted to .txt for upload purposes.
benchmarkhealthcare.txt

runbenchmark_20190624T104333.log

Use argparse in benchmark.py

Makes it easier to use

Automatically submit task results to OpenML

We should provide the option to automatically submit results obtained for a given (task, fold, framework) to openml (openml Py client provides OpenMLRun, that we need to publish)

Instance termination is called twice

Better than not at all, but this still results in to many "termination successful"messages

Website with results?

Thank you for this package! I'm working on my AutoML python package and this benchmark suite is very needed.

I have a question, is there any website that publishes results of benchmarking from available frameworks?

Upgrade frameworks to their latest stable version

autosklean: 0.6.0
TPOT: 0.11.1
H2O: 3.30.x
...

Problems caused by different sklearn versions

I am currently trying to implement a new automl approach, but this approach is using scikit-learn v0.21.3.
If I try to run the benchmark I run into errors, because automlbenchmark is requiring scikit-learn v0.19.2.

Is there anything I can do to work around this?

Shut down a node when its job is done.

Currently, nodes only get terminated if the whole chunk is done, if there is a straggler, other nodes will keep running for the same amount of time. It's not a big deal (as there is a timelimit on the stragglers anyway), but it would be a nice improvement.

Upgrade sklearn to be >=0.21

I am currently attempting to integrate AutoGluon into automlbenchmark, however I am facing some issues regarding sklearn version requirements.

If I try to run with sklearn >= 0.21, I get this exception:

Traceback (most recent call last):
  File "../automlbenchmark/runbenchmark.py", line 7, in <module>
    import amlb.logger
  File "/home/ubuntu/workspace/automlbenchmark/amlb/__init__.py", line 8, in <module>
    from .benchmark import Benchmark
  File "/home/ubuntu/workspace/automlbenchmark/amlb/benchmark.py", line 19, in <module>
    from .openml import Openml
  File "/home/ubuntu/workspace/automlbenchmark/amlb/openml.py", line 9, in <module>
    import openml as oml
  File "/home/ubuntu/workspace/autogluon/venv/lib/python3.6/site-packages/openml/__init__.py", line 22, in <module>
    from . import runs
  File "/home/ubuntu/workspace/autogluon/venv/lib/python3.6/site-packages/openml/runs/__init__.py", line 3, in <module>
    from .functions import (run_model_on_task, run_flow_on_task, get_run, list_runs,
  File "/home/ubuntu/workspace/autogluon/venv/lib/python3.6/site-packages/openml/runs/functions.py", line 21, in <module>
    from ..flows import sklearn_to_flow, get_flow, flow_exists, _check_n_jobs, \
  File "/home/ubuntu/workspace/autogluon/venv/lib/python3.6/site-packages/openml/flows/__init__.py", line 3, in <module>
    from .sklearn_converter import sklearn_to_flow, flow_to_sklearn, _check_n_jobs
  File "/home/ubuntu/workspace/autogluon/venv/lib/python3.6/site-packages/openml/flows/sklearn_converter.py", line 20, in <module>
    from sklearn.utils.fixes import signature
ImportError: cannot import name 'signature'

AutoSklearn 0.6.0 also requires sklearn >= 0.21, and latest TPOT requires sklearn >= 0.22

Would it be possible to make automlbenchmark compatible with newer sklearn versions?

Best,
Nick

Invalid bucket name AWS

Hi there,

When trying to run some tests to see if the benchmark is running on AWS. When trying to do so I get the error:
Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"

The \ec2 part is added automatically to the name I edited in the config file as described in the instructions. For more information the complete error log is copied below.

fatal error: Parameter validation failed: [ 113.140682] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 113.629291] cloud-init[1435]: fatal error: Parameter validation failed: [ 113.631229] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 114.624355] cloud-init[1435]: Running RandomForeston/s3bucket/user/test.yamlbenchmarks inlocalmode. [ 114.662194] cloud-init[1435]: Loading frameworks definitions from /repo/resources/frameworks.yaml. [ 114.677646] cloud-init[1435]: ERROR: [ 114.678772] cloud-init[1435]: Incorrect benchmark name or path/s3bucket/user/test.yaml, name not available in /repo/resources/benchmarks. [ 115.725435] cloud-init[1435]: Running RandomForeston/s3bucket/user/test.yamlbenchmarks inlocalmode. [ 115.762451] cloud-init[1435]: Loading frameworks definitions from /repo/resources/frameworks.yaml. [ 115.778224] cloud-init[1435]: ERROR: [ 115.778351] cloud-init[1435]: Incorrect benchmark name or path/s3bucket/user/test.yaml, name not available in /repo/resources/benchmarks. [ 116.352902] cloud-init[1435]: upload failed: ../s3bucket/output/logs/runbenchmark_20190618T075011.log to s3://thesisbenchooms\ec2/aws_randomforest_test_20190618T074743\aws_test_all__randomforest\output/logs/runbenchmark_20190618T075011.log Parameter validation failed: [ 116.354415] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 116.362361] cloud-init[1435]: Completed 8.9 KiB/~9.3 KiB (0 Bytes/s) with ~1 file(s) remaining (calculating...) upload failed: ../s3bucket/output/logs/runbenchmark_20190618T075011_full.log to s3://thesisbenchooms\ec2/aws_randomforest_test_20190618T074743\aws_test_all__randomforest\output/logs/runbenchmark_20190618T075011_full.log Parameter validation failed: [ 116.363474] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 116.373777] cloud-init[1435]: Completed 9.3 KiB/~18.4 KiB (0 Bytes/s) with ~1 file(s) remaining (calculating...) upload failed: ../s3bucket/output/logs/runbenchmark_20190618T075012.log to s3://thesisbenchooms\ec2/aws_randomforest_test_20190618T074743\aws_test_all__randomforest\output/logs/runbenchmark_20190618T075012.log Parameter validation failed: [ 116.373937] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 116.386116] cloud-init[1435]: Completed 18.4 KiB/~18.4 KiB (0 Bytes/s) with ~0 file(s) remaining (calculating...) upload failed: ../s3bucket/output/logs/runbenchmark_20190618T075012_full.log to s3://thesisbenchooms\ec2/aws_randomforest_test_20190618T074743\aws_test_all__randomforest\output/logs/runbenchmark_20190618T075012_full.log Parameter validation failed: [ 116.386213] cloud-init[1435]: Invalid bucket name "thesisbenchooms\ec2": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" [ 116.386264] cloud-init[1435]: Completed 18.7 KiB/~18.7 KiB (0 Bytes/s) with ~0 file(s) remaining (calculating...)

Default local logs folder is not automatically created

When running the app for the first time without specifying anu output directory (-o, --output CLI params), the defaults logs directory is not automatically created anymore, causing the script to fail:

Traceback (most recent call last):
  File "runbenchmark.py", line 69, in <module>
    root_level='INFO', app_level='DEBUG', console_level='INFO', print_to_log=True)
  File "/Users/seb/repos/ml/automlbenchmark/automl/logger.py", line 68, in setup
    app_handler = logging.FileHandler(log_file, mode='a')
  File "/Users/seb/.pyenv/versions/3.7.3/lib/python3.7/logging/__init__.py", line 1092, in __init__
    StreamHandler.__init__(self, self._open())
  File "/Users/seb/.pyenv/versions/3.7.3/lib/python3.7/logging/__init__.py", line 1121, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/seb/repos/ml/automlbenchmark/logs/runbenchmark_20190619T111925.log'

Subprocess running exec_run died abruptly

**** Random Forest (sklearn) ****

Running RandomForest with a maximum time of 3600s on 2 cores.
We completely ignore the requirement to stay within the time limit.
We completely ignore the advice to optimize towards metric: auc.
CPU Utilization: 48.4%
Memory Usage: 9.9%
Disk Usage: 59.0%
CPU Utilization: 100.0%
Memory Usage: 20.7%
Disk Usage: 59.0%
CPU Utilization: 100.0%
Memory Usage: 30.8%
Disk Usage: 59.0%
CPU Utilization: 100.0%
Memory Usage: 42.5%
Disk Usage: 59.0%
CPU Utilization: 100.0%
Memory Usage: 45.9%
Disk Usage: 59.0%
CPU Utilization: 100.0%
Memory Usage: 91.3%
Disk Usage: 59.0%
Subprocess running exec_run died abruptly.
Traceback (most recent call last):
File "/content/automls/automlbenchmark/automl/utils.py", line 698, in call_in_subprocess
result = q.get_nowait()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 126, in get_nowait
return self.get(False)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 107, in get
raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/content/automls/automlbenchmark/automl/benchmark.py", line 371, in run
meta_result = framework.run(self._dataset, task_config)
File "/content/automls/automlbenchmark/frameworks/RandomForest/init.py", line 10, in run
return call_in_subprocess(exec_run, *args, **kwargs)
File "/content/automls/automlbenchmark/automl/utils.py", line 704, in call_in_subprocess
raise Exception("Subprocess running {} died abruptly.".format(target.name))
Exception: Subprocess running exec_run died abruptly.
Metric scores: { 'acc': nan,
'auc': nan,
'duration': nan,
'fold': 0,
'framework': 'RandomForest',
'id': 'openml.org/t/189354',
'info': 'Exception: Subprocess running exec_run died abruptly.',
'mode': 'local',
'models': nan,
'params': "{'n_estimators': 2000}",
'result': nan,
'seed': 3104790822,
'tag': 'stable',
'task': 'Airlines',
'utc': '2019-07-18T18:37:20',
'version': '0.19.2'}
Job local_Airlines_0_RandomForest executed in 748.140 seconds.

Starting job local_Airlines_1_RandomForest.
Assigning 2 cores (total=2) for new task Airlines.
Assigning 9945 MB (total=13023 MB) for new Airlines task.
Running task Airlines on framework RandomForest with config:
TaskConfig(framework='RandomForest', framework_params={'n_estimators': 2000}, type='classification', name='Airlines', fold=1, metrics=['auc', 'acc'], metric='auc', seed=3104790823, max_runtime_seconds=3600, cores=2, max_mem_size_mb=9945, min_vol_size_mb=-1, input_dir='/root/.openml/cache', output_dir='/content/automls/automlbenchmark/results/local_randomforest_openml_20190718T163809', output_predictions_file='/content/automls/automlbenchmark/results/local_randomforest_openml_20190718T163809/predictions/randomforest_Airlines_1.csv')

**** Random Forest (sklearn) ****

Running RandomForest with a maximum time of 3600s on 2 cores.
We completely ignore the requirement to stay within the time limit.
We completely ignore the advice to optimize towards metric: auc.
CPU Utilization: 80.5%
Memory Usage: 12.8%
Disk Usage: 59.1%
CPU Utilization: 99.9%
Memory Usage: 23.4%
Disk Usage: 59.1%
CPU Utilization: 100.0%
Memory Usage: 35.9%
Disk Usage: 59.1%
CPU Utilization: 100.0%
Memory Usage: 48.7%
Disk Usage: 59.1%
CPU Utilization: 100.0%
Memory Usage: 57.2%
Disk Usage: 59.1%
Subprocess running exec_run died abruptly.
Traceback (most recent call last):
File "/content/automls/automlbenchmark/automl/utils.py", line 698, in call_in_subprocess
result = q.get_nowait()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 126, in get_nowait
return self.get(False)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 107, in get
raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/content/automls/automlbenchmark/automl/benchmark.py", line 371, in run
meta_result = framework.run(self._dataset, task_config)
File "/content/automls/automlbenchmark/frameworks/RandomForest/init.py", line 10, in run
return call_in_subprocess(exec_run, *args, **kwargs)
File "/content/automls/automlbenchmark/automl/utils.py", line 704, in call_in_subprocess
raise Exception("Subprocess running {} died abruptly.".format(target.name))
Exception: Subprocess running exec_run died abruptly.
Metric scores: { 'acc': nan,
'auc': nan,
'duration': nan,
'fold': 1,
'framework': 'RandomForest',
'id': 'openml.org/t/189354',
'info': 'Exception: Subprocess running exec_run died abruptly.',
'mode': 'local',
'models': nan,
'params': "{'n_estimators': 2000}",
'result': nan,
'seed': 3104790823,
'tag': 'stable',
'task': 'Airlines',
'utc': '2019-07-18T18:47:53',
'version': '0.19.2'}
Job local_Airlines_1_RandomForest executed in 632.856 seconds.

about overfit or not

How can I interpret the result, for example , run 10 folds using autosklearn train on adult dataset , and config max_runtime_seconds one hour, may be two hour per fold, How can I know if this result overfit or not as i have no idea of learning curve?

Adding Singularity Support

Motivation:
Today the framework supports docker as a container solution. Some servers particularly only support singularity (no docker installed).
Also, singularity has this nice feature that when installed, there is no superuser permission required to run. In contrast, docker usually requires sudo unless some group handling is performed.

But in short, our biggest motivation is to be able to run replicable-isolated runs in servers that only support singularity.

Expected Support:
The idea is to provide a solution that is equivalent to the current docker support. Automlbenchmark team suggested developing a container abstraction to facilitate support. Singularity supports running images from docker hub, so we would only require docker hub to be updated appropriately (so no new requirement here).

At the end, the main script should change such -m option supports singularity as well.

Tag docker images should include automlbenchmark version

Docker images are currently tagged with {framework.version}-{automlbenchmark.branch}, e.g.: automlbenchmark/autosklearn:0.5.1-stable.

However in config.yaml, due to default
project_repository: https://github.com/openml/automlbenchmark#stable
the {automlbenchmark.branch} is set to stable, not providing meaningful information about the app version used by the docker image.

We can either change the default project_repository branch to reflect the current stable version (v0.9, v1.0.1) or do this dynamically when building the docker image using a git command.

AWS instance running out of memory

When trying to reproduce your benchmark test using my own datasets on the m5.2xlarge instances on aws for 4 hours with 10k cross validation run into the AWS error OSError: [Errno 28] No space left on device for Auto-WEKA and autosklearn (see logs). My benchmark file is also attached. Hyperopt and TPOT seem to function pretty well. Both autoweka and autosklearn have functioned properly on the test runs on the free tier of AWS. Is there something wrong in my configuration? I changed the amount of cores from 2 to 8 in my benchmark test file, due to the machinery upgrade.

benchmarkhealthcare.txt

runbenchmark_20190625T134539.log
runbenchmark_20190625T122444.log

Problem With Singularity Pull Command

There is an issue with amlb/singularity.py, likely produced by a name change.

I think when doing the merge, you guys saw fit to rename:

def _container_image_name(self, branch=None, return_docker_name=False):

to be:

image_name(cls, framework_def, branch=None, as_docker_image=False, **kwargs):

We need to replace all occurrences of return_docker_name to be as_docker_image. For instance

automlbenchmark/amlb/singularity.py

Line 130 in 2842df7

image=self._container_image_name(return_docker_name=True),

As of now, pulling from a singularity command fails and is masked by the exception catching.

Validate or sanitize task names when loading benchmark definition file

Task names are used to generate the name of prediction files and others.

In this regard, we should ensure that they don't contain invalid characters: especially it is recommended to automatically replace space char by hyphen or underscore to avoid any issue in frameworks like AutoWEKA where the prediction file is passed as a param.

Specify query frequency to AWS

is currently hardcoded to 10 seconds

Docker image can't be build

Hi there,

Thank you for your contribution to AutoML benchmarking, it is really good that there is a univeral standandard now. I just cloned the git repository and tried to run a benchmark using the 'docker' mode. I get the following error:

ERROR: Docker image can't be built as current branch is not tagged as requiredstable. Please switch to the expected tagged branch first.

I'm using the v1.0 branch. Could you please mark this one stable?

Submit existing task results to OpenML

Once we have implemented the logic to automatically upload new results to OpenML (#44), we should be able to add a script to publish results a posteriori and use it to publish our existing results from https://github.com/openml/automlbenchmark/tree/master/reports.

Error running pip install -r requirements.txt

I can't seem to run the standard installation and encountered the following error when I was running pip install -r requirements.txt
----------------------------------------
cwd: /tmp/pip-install-cw6vw0sa/openml/
Complete output (1 lines):
numpy is required during installation
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

To reproduce:

git clone https://github.com/openml/automlbenchmark.git
cd automlbenchmark
python3 -m venv ./venv
source venv/bin/activate
pip3 install -r requirements.txt

I currently have the following:

Python 3.7.5
pip 19.3.1

I was able to reproduce this on my Windows computer and on my mac.

Add support for GAMA

New AutoML framework: https://pgijsbers.github.io/gama/

Upgrade oboe support to latest version (>12.07.2019)

Oboe fixed issues that prevented oboe from producing a single model in the given time constraints on several datasets: https://github.com/udellgroup/oboe
We need to apply the upgrade, build a new docker image and rerun the benchmarks for oboe.

Suggestion: rename top-level Python package to something else

automl is very generic, especially if you want to use against other frameworks which might also have automl name. :-)

Add support for AutoXGBoost

We can now add @ja-thomas's toolkit to our framework list:
https://github.com/ja-thomas/autoxgboost

and run existing benchmarks against it once integration is stable.

Standard procedures for dealing with missing values

Hi,
I have found a hint in datautils.py, that missing values are "masked" by default. But how are they masked?

Thank you in advance

Add support for NNI

AutoML toolkit for hyperparameter tuning, NAS and model compression: https://github.com/Microsoft/nni

Singularity PWD for docker conversion

What?

One of the features of singularity is to convert docker images to singularity images. This is useful for us, because the automlbenchmarck team generates docker images which we can use in our cluster machines that only have singularity.

Seems like there is a bug when translating the docker image to a singularity image, documented here sylabs/singularity#380

Request:

add a --pwd option to singularity
from:

automlbenchmark/amlb/singularity.py

Line 97 in e2d8ec3

"singularity run {options} "

to:
cmd = (
"singularity run --pwd /bench {options} "

Why wasn't this seen before?

I notice this behavior when adding a new framework using the virtual environment for python based frameworks. I don't know what exactly triggers the WORKDIR directive of docker not to be followed by singularity, but the --pwd option is a robust WA to be transparent to this problem.

I am sorry for the trouble, this wasn't seen before in our testing. I can create a push request for this if you want. Please let me know.

Add time limit to AWS benchmarks

Shut down instances after a given time, even if they do not return results.

Decouple resources from tasks in the benchmark

I think coupling the resources and tasks into a single file is not optimal.
This leads to a lot of duplicate benchmarks where only the resources specifications differ (see our current resources/benchmarks folder).
At the very least I think we should have separate files specifying resources and collections of tasks.
Perhaps we should only have files specifying the resources (if at all, since that would be easy to do from the command line too), and use OpenML Studies to specify tasks.

The upsides would be:

easier to add a dataset: modify only one file if not using OpenML study, or none if we are.
rerunning a set of tasks with different resources would not require a new file.

Downsides:

no automatic public history of tasks added (if through an OpenML study).
not possibly to add arbitrary evaluation metrics (not possible in the current setup either, but would be easier to add as it would not rely on an OpenML implementation for it). I wouldn't be sure how to, if ever, allow arbitrary evaluation metrics on OpenML (as you would need to be able to run them in any of the supported languages).

@joaquinvanschoren are there any plans for something like versions for studies? That could negate one of the downsides.

Object 'Benchmark' has no attribute 'SetupMode'

Hi,

I run h2oautoml in docker mode
python3 runbenchmark.py h2oautoml test -m docker
and get the following error

Running `h2oautoml` on `test` benchmarks in `docker` mode.
Loading frameworks definitions from /home/ubuntu/automlbenchmark/resources/frameworks.yaml.
Loading benchmark constraint definitions from /home/ubuntu/automlbenchmark/resources/constraints.yaml.
Loading benchmark definitions from /home/ubuntu/automlbenchmark/resources/benchmarks/test.yaml.
type object 'Benchmark' has no attribute 'SetupMode'
Traceback (most recent call last):
  File "runbenchmark.py", line 126, in <module>
    bench.setup(amlb.SetupMode[args.setup])
  File "/home/ubuntu/automlbenchmark/amlb/docker.py", line 54, in setup
    if mode == Benchmark.SetupMode.skip:
AttributeError: type object 'Benchmark' has no attribute 'SetupMode'

Seems in docker.py there is a bug. I tried to import SetupMode and using SetupMode instead of Benchmark.SetupMode in the code and it works.
Just want to mention this bug here. If help is needed, I can do a pr related to this.

Add AutoGluon framework

https://github.com/awslabs/autogluon
Last stable version seems to be v0.0.6.

@Innixma do you have any concern if we want to make Autogluon publicly available in the benchmark application?

About frameworks plan

frameworks worth mentioning as you plan to add:

oboe added
autoxgboost no
hyperopt-sklearn added
ML-Plan no

I just want to ask when autoxgboost are added as you plan.

and give some info about “ModuleNotFoundError: No module named '_regression’ ”

OpenML library upgrade

Broken links in README

Several links in the README are broken. E.g. the one on documenting how to add a system.

Collect log files when running on AWS

useful for debugging

Support for non-openML datasets

now, raw dataset are not supported. Are there any plans to support it?

Automate testing / version upgrade / docker images uploads

To ensure that we don't break the app with future changes, we should automate some basic testing/verification tasks.
I suggest the following:

fresh git clone of the repo
fresh Py setup: virtual env + pip install -r requirements.txt
python3 runbenchmark.py constantpredictor test
python3 runbenchmark.py constantpredictor test -m aws
for each framework
    python3 runbenchmark.py framework test -m docker

and for each run, verify that it produces successful results.

The first local run is very fast and will detect basic broken features immediately.
The AWS run is also relatively fast as we just want to test basic AWS support: no need to run all frameworks there.
Running docker mode for each framework though is pretty slow and can't be done in parallel as it is cpu intensive (would need multiple machines, not worth it): but this would properly test frameworks setup and run them against the simple test benchmark.

This kind of script, at the price of additional params, could also be used to "release" the app in case of success:

tag the branch (new version + stable).
push the new tags.
push the new docker images to docker repo.

Setup: simplify install + cleanup/standardize usage of virtual environments by the app.

Currently virtual envs appear in various places:

README recommends users to create their own virtual env under automlbenchmark.
docker and aws images create a virtual env for the app under /venvs/bench.
frameworks/shared/setup.sh offers the possibility to create a virtual env under frameworks/{framework}/venv dedicated to the framework for a better isolation.

The last one should stay as is.
However we could standardize the first 2 by creating a setup.py that would automatically create and activate a local virtual environment under automlbenchmark/venv + install dependencies, and use this setup script in docker + aws setup.

Library should then be ready to use after simply running:
pip install .

Add new framework: mljar-supervised

I'm working on AutoML python package: mljar-supervised. I would like to add it to automlbenchmark.

I successfully run through 1-9 points from HOWTO::Add default framework

I stuck at 10 (docker integration) and 11 (aws).

For docker I susspect I had problem with privilages:

python3 runbenchmark.py supervised -m docker
Running `supervised` on `test` benchmarks in `docker` mode.
Loading frameworks definitions from /home/piotr/sandbox/automlbenchmark/resources/frameworks.yaml.
Loading benchmark constraint definitions from /home/piotr/sandbox/automlbenchmark/resources/constraints.yaml.
Loading benchmark definitions from /home/piotr/sandbox/automlbenchmark/resources/benchmarks/test.yaml.
Running cmd `docker images -q automlbenchmark/supervised:0.2.6-stable`
WARNING: Error loading config file: /home/piotr/.docker/config.json: stat /home/piotr/.docker/config.json: permission denied
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/json?filters=%7B%22reference%22%3A%7B%22automlbenchmark%2Fsupervised%3A0.2.6-stable%22%3Atrue%7D%7D: dial unix /var/run/docker.sock: connect: permission denied

For aws I got a lot of loga and then errors:

[  119.518648] cloud-init[1475]:     Running setup.py install for openml: finished with status 'done'
[  119.545129] cloud-init[1475]: Successfully installed Babel-2.8.0 PyYAML-5.3.1 attrs-19.3.0 certifi-2020.4.5.1 chardet-3.0.4 debtcollector-2.0.1 decorator-4.4.2 fasteners-0.15 idna-2.9 importlib-metadata-1.6.0 ipython-genutils-0.2.0 iso8601-0.1.12 jsonschema-3.2.0 jupyter-core-4.6.3 mock-4.0.2 monotonic-1.5 nbformat-5.0.6 netaddr-0.7.19 netifaces-0.10.9 nose-1.3.7 openml-0.7.0 oslo.concurrency-4.0.2 oslo.config-8.0.2 oslo.i18n-4.0.1 oslo.utils-4.1.1 pbr-5.4.5 pyparsing-2.4.7 pyrsistent-0.16.0 requests-2.23.0 rfc3986-1.4.0 stevedore-1.32.0 traitlets-4.3.3 wrapt-1.12.1 xmltodict-0.12.0 zipp-3.1.0
[  122.904087] cloud-init[1475]: Running `supervised` on `test` benchmarks in `local` mode.
[  122.935523] cloud-init[1475]: Loading frameworks definitions from /repo/resources/frameworks.yaml.
[  122.950187] cloud-init[1475]: ERROR:
[  122.952154] cloud-init[1475]: Incorrect framework `supervised`: not listed in /repo/resources/frameworks.yaml.
[  123.826013] cloud-init[1475]: Running `supervised` on `test` benchmarks in `local` mode.
[  123.858485] cloud-init[1475]: Loading frameworks definitions from /repo/resources/frameworks.yaml.
[  123.873685] cloud-init[1475]: ERROR:
[  123.873797] cloud-init[1475]: Incorrect framework `supervised`: not listed in /repo/resources/frameworks.yaml.
[  125.065343] cloud-init[1475]: upload failed: ../s3bucket/output/logs/runbenchmark_20200421T132409.log to s3://automl-benchmark/ec2/supervised_test_test_aws_20200421T131523/aws_test_test_all__supervised/output/logs/runbenchmark_20200421T132409.log An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
[  125.085960] cloud-init[1475]: Completed 17.1 KiB/34.9 KiB (0 Bytes/s) with 3 file(s) remaining
upload failed: ../s3bucket/output/logs/runbenchmark_20200421T132410_full.log to s3://automl-benchmark/ec2/supervised_test_test_aws_20200421T131523/aws_test_test_all__supervised/output/logs/runbenchmark_20200421T132410_full.log An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
[  125.099099] cloud-init[1475]: Completed 17.4 KiB/34.9 KiB (0 Bytes/s) with 2 file(s) remaining
upload failed: ../s3bucket/output/logs/runbenchmark_20200421T132409_full.log to s3://automl-benchmark/ec2/supervised_test_test_aws_20200421T131523/aws_test_test_all__supervised/output/logs/runbenchmark_20200421T132409_full.log An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
[  125.104312] cloud-init[1475]: Completed 17.7 KiB/34.9 KiB (0 Bytes/s) with 1 file(s) remaining
upload failed: ../s3bucket/output/logs/runbenchmark_20200421T132410.log to s3://automl-benchmark/ec2/supervised_test_test_aws_20200421T131523/aws_test_test_all__supervised/output/logs/runbenchmark_20200421T132410.log An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
ci-info: no authorized ssh keys fingerprints found for user ubuntu.

My code is here:
https://github.com/pplonski/automlbenchmark/tree/master/frameworks/supervised

I would ask for some tips, what can be a problem with aws setup? (I think I will handle docker problem myself)

openml / automlbenchmark Goto Github PK

automlbenchmark's Introduction

AutoML Benchmark

Features:

automlbenchmark's People

Contributors

Stargazers

Watchers

Forkers

automlbenchmark's Issues

This issue is moved to the new discussion board.

The Problem

Discussion

Solutions

What?

Request:

Why wasn't this seen before?

Recommend Projects

Recommend Topics

Recommend Org