
udellgroup / oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

License: BSD 3-Clause "New" or "Revised" License

Python 97.78% Shell 2.22%
automl collaborative-filtering ml-pipelines

oboe's People

Contributors

chengrunyang, yujiakimoto


oboe's Issues

Number of datasets, ML algorithms, and preprocessing methods

Hi,
I have a few questions whose answers I could not find in the paper.

  1. How many datasets did you use for meta-learning?
  2. How many machine learning algorithms did you use for meta-learning? (It seems to be 12.)
  3. Also, regarding preprocessing and ensembling, could you explain how they work in oboe? (Please be specific about the statistics.)
    It would be very nice if you could mention this type of information.

Estimation of training performance

Greetings,

I am interested in using OBOE for a publication, in which I am also reporting the estimate of training performance. From the examples I found here, I see no way to retrieve this estimate. Am I missing something, or has it not been implemented yet?

Thanks in advance.

AutoLearner crashes on Windows

Hello,

I wanted to try the example from the Readme (Python3.8, Windows 10). Unfortunately, an error occurred when running it under Windows:

...

File ~\miniconda3\envs\automl\lib\site-packages\oboe\auto_learner.py:749, in AutoLearner.fit.<locals>.time_limit(seconds)
    747 def signal_handler(signum, frame):
    748     raise TimeoutException("Time limit reached.")
--> 749 signal.signal(signal.SIGALRM, signal_handler)
    750 signal.alarm(seconds)
    751 try:

AttributeError: module 'signal' has no attribute 'SIGALRM'

It looks like Windows does not implement that signal (see https://stackoverflow.com/questions/52779920/why-is-signal-sigalrm-not-working-in-python-on-windows).

Maybe you could find a platform-independent solution?

The shape of ERROR_TENSOR

Hi, I read the file error_tensor.npy and found that the shape of ERROR_TENSOR is (215, 4, 2, 8, 183).

After computing, I find that the number of standardizers is 2, the number of dim_reducers is 8, and the number of estimators is 183; the number of datasets is presumably 215. So what does the 4 in the shape mean?

Here is the info in classification.json:

{
  "imputer": {
    "algorithms": ["SimpleImputer"],
    "hyperparameters": {
      "SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
    }
  },
  "encoder": {
    "algorithms": [null, "OneHotEncoder"],
    "hyperparameters": {
      "OneHotEncoder": {"handle_unknown": ["ignore"], "sparse": [0]}
    }
  },
  "standardizer": {
    "algorithms": [null, "StandardScaler"],
    "hyperparameters": {
      "StandardScaler": {}
    }
  },
  "dim_reducer": {
    "algorithms": [null, "PCA", "VarianceThreshold", "SelectKBest"],
    "hyperparameters": {
      "PCA": {"n_components": ["25%", "50%", "75%"]},
      "VarianceThreshold": {},
      "SelectKBest": {"k": ["25%", "50%", "75%"]}
    }
  },
  "estimator": {
    "algorithms": ["KNN", "DT", "RF", "GBT", "AB", "lSVM", "Logit", "Perceptron", "GNB", "MLP", "ExtraTrees"],
    "hyperparameters": {
      "KNN": {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15], "p": [1, 2]},
      "DT": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.01, 0.001, 0.0001, 1e-05]},
      "RF": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]},
      "GBT": {"learning_rate": [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5], "max_depth": [3, 6], "max_features": [null, "log2"]},
      "AB": {"n_estimators": [50, 100], "learning_rate": [1.0, 1.5, 2.0, 2.5, 3.0]},
      "lSVM": {"C": [0.125, 0.25, 0.5, 0.75, 1, 2, 4, 8, 16]},
      "Logit": {"C": [0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4], "solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
      "Perceptron": {},
      "GNB": {},
      "MLP": {"learning_rate_init": [0.0001, 0.001, 0.01], "learning_rate": ["adaptive"], "solver": ["sgd", "adam"], "alpha": [0.0001, 0.01]},
      "ExtraTrees": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]}
    }
  }
}
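One way to sanity-check the axis sizes is to count the hyperparameter settings each pipeline stage contributes. The sketch below (not part of oboe; the counting scheme is an assumption about how the tensor was built) enumerates the settings listed in classification.json for three stages. Under this counting, the imputer (one algorithm, SimpleImputer, with 4 strategies) yields exactly 4 settings, which would match the otherwise unexplained axis of size 4:

```python
from math import prod

# Stage definitions copied from classification.json
# (encoder and estimator omitted for brevity).
stages = {
    "imputer": {
        "algorithms": ["SimpleImputer"],
        "hyperparameters": {
            "SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
        },
    },
    "standardizer": {
        "algorithms": [None, "StandardScaler"],
        "hyperparameters": {"StandardScaler": {}},
    },
    "dim_reducer": {
        "algorithms": [None, "PCA", "VarianceThreshold", "SelectKBest"],
        "hyperparameters": {
            "PCA": {"n_components": ["25%", "50%", "75%"]},
            "VarianceThreshold": {},
            "SelectKBest": {"k": ["25%", "50%", "75%"]},
        },
    },
}

def n_settings(stage):
    """Count configurations: each algorithm contributes the Cartesian
    product of its hyperparameter value lists (1 if it has none)."""
    total = 0
    for alg in stage["algorithms"]:
        if alg is None:
            total += 1  # the "no-op" component counts as one setting
        else:
            hps = stage["hyperparameters"].get(alg, {})
            total += prod(len(values) for values in hps.values())
    return total

counts = {name: n_settings(stage) for name, stage in stages.items()}
print(counts)  # {'imputer': 4, 'standardizer': 2, 'dim_reducer': 8}
```

The same counting applied to the estimator block sums to 183, consistent with the last axis.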

Documentation

Hello,
Thanks for your nice framework!
May I ask when you will release documentation, to make the framework easier to put into action and more usable?

Does not install properly through pip

Hello,

As part of a project, I am trying to fix up some things with automlbenchmark and from there realized oboe does not install correctly.

Reproducible code (sample shown in README.md)

method = 'Oboe' 
problem_type = 'classification'

from auto_learner import AutoLearner
import numpy as np

m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)

Error log:

Traceback (most recent call last):
  File "test_case.py", line 6, in <module>
    m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
  File "/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/auto_learner.py", line 87, in __init__
    with open(os.path.join(DEFAULTS, p_type + '.json')) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json'

Looking at /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json I can see that pip didn't include everything

ls /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe
__pycache__  ..               convex_opt.py  experiment_design.py  __init__.py  model.py     preprocessing.py
.            auto_learner.py  ensemble.py    generate_vector.py    linalg.py    pipeline.py  util.py

As you can see, the defaults folder and everything below it is not included. Upon inspecting these folders, it seems this is because they are not Python modules (they lack an __init__.py file). Therefore your setup.py would have to be modified according to this Stack Overflow answer.
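For reference, the usual fix is to declare the data files via setuptools' `package_data` so pip ships them alongside the modules. The snippet below is a hypothetical sketch, not oboe's actual setup.py; the glob pattern assumes the layout shown in the error (`oboe/defaults/Oboe/classification.json`):

```python
from setuptools import setup, find_packages

setup(
    name="oboe",
    packages=find_packages(),
    # Non-.py files are excluded from wheels by default; package_data
    # tells setuptools to ship the JSON defaults inside the oboe package.
    package_data={"oboe": ["defaults/*/*.json"]},
)
```

An alternative is `include_package_data=True` plus a MANIFEST.in entry; either way the JSON files end up in site-packages where `auto_learner.py` expects them.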

More datasets than in the paper?

I was checking the tensor in the repository, "oboe/large_files/error_tensor_f16_compressed.npz", and I noticed there are 551 datasets, while in the [paper](https://people.ece.cornell.edu/cy/_papers/tensor_oboe.pdf) you mention only 215 for meta-training. Did you add more? Moreover, is it possible to get the meta-features of these 551 datasets? Or how do you compute the best initializations when meta-learning with Auto-sklearn?

Thanks!
