udellgroup / oboe
An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.
License: BSD 3-Clause "New" or "Revised" License
Hi,
I have a few questions whose answers I could not find in the paper.
Greetings,
I am interested in using OBOE for a publication in which I also report the training performance estimate. From the examples I found here, I see no way to retrieve this estimate. Am I missing something, or has it not been implemented yet?
Thanks in advance.
Hello,
I wanted to try the example from the Readme (Python3.8, Windows 10). Unfortunately, an error occurred when running it under Windows:
...
File ~\miniconda3\envs\automl\lib\site-packages\oboe\auto_learner.py:749, in AutoLearner.fit.<locals>.time_limit(seconds)
747 def signal_handler(signum, frame):
748 raise TimeoutException("Time limit reached.")
--> 749 signal.signal(signal.SIGALRM, signal_handler)
750 signal.alarm(seconds)
751 try:
AttributeError: module 'signal' has no attribute 'SIGALRM'
It looks like Windows does not implement that signal (see https://stackoverflow.com/questions/52779920/why-is-signal-sigalrm-not-working-in-python-on-windows).
Maybe you could find a platform-independent solution?
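One platform-independent alternative would be to replace the SIGALRM-based handler with a thread-based timeout. A minimal sketch (not the library's actual code; `TimeoutException` here stands in for the exception class used in auto_learner.py):

```python
import concurrent.futures


class TimeoutException(Exception):
    pass


def run_with_time_limit(func, seconds, *args, **kwargs):
    """Run func with a wall-clock limit; works on Windows, which lacks SIGALRM."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutException("Time limit reached.")
    finally:
        # Don't block waiting for a still-running task.
        executor.shutdown(wait=False)
```

One caveat of this approach: unlike SIGALRM, it cannot interrupt the worker, so the timed-out function keeps running in its thread until it finishes on its own.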
Hi, I read the file error_tensor.npy and found that the shape of ERROR_TENSOR is (215, 4, 2, 8, 183). After some counting, I find that the number of standardizers is 2, the number of dim_reducers is 8, and the number of estimators is 183; 215 is presumably the number of datasets. So what does the 4 in the shape mean?
Here are the contents of classification.json:
{
"imputer":
{"algorithms": ["SimpleImputer"],
"hyperparameters": {
"SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
}},
"encoder":
{"algorithms": [null, "OneHotEncoder"],
"hyperparameters": {
"OneHotEncoder": {"handle_unknown": ["ignore"], "sparse": [0]}
}},
"standardizer":
{"algorithms": [null, "StandardScaler"],
"hyperparameters": {
"StandardScaler": {}
}},
"dim_reducer":
{"algorithms": [null, "PCA", "VarianceThreshold", "SelectKBest"],
"hyperparameters": {
"PCA": {"n_components": ["25%", "50%", "75%"]},
"VarianceThreshold": {},
"SelectKBest": {"k": ["25%", "50%", "75%"]}
}},
"estimator":
{"algorithms": ["KNN", "DT", "RF", "GBT", "AB", "lSVM", "Logit", "Perceptron", "GNB", "MLP", "ExtraTrees"],
"hyperparameters": {
"KNN": {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15], "p": [1, 2]},
"DT": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.01,0.001,0.0001,1e-05]},
"RF": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.1,0.01,0.001,0.0001,1e-05], "criterion": ["gini", "entropy"]},
"GBT": {"learning_rate": [0.001,0.01,0.025,0.05,0.1,0.25,0.5], "max_depth": [3, 6], "max_features": [null, "log2"]},
"AB": {"n_estimators": [50, 100], "learning_rate": [1.0, 1.5, 2.0, 2.5, 3.0]},
"lSVM": {"C": [0.125,0.25,0.5,0.75,1,2,4,8,16]},
"Logit": {"C": [0.25,0.5,0.75,1,1.5,2,3,4], "solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
"Perceptron": {},
"GNB": {},
"MLP": {"learning_rate_init": [0.0001,0.001,0.01], "learning_rate": ["adaptive"], "solver": ["sgd", "adam"], "alpha": [0.0001, 0.01]},
"ExtraTrees": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.1,0.01,0.001,0.0001,1e-05], "criterion": ["gini", "entropy"]}
}}
}
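One way to sanity-check which tensor dimension maps to which pipeline module is to count the concrete settings each module in classification.json allows; for instance, the imputer admits exactly 4 SimpleImputer strategies, which would match the 4 in the tensor shape. That is my own reading, not confirmed by the maintainers. A minimal sketch over a trimmed copy of the config above:

```python
# Trimmed copy of the "imputer" and "standardizer" entries from
# classification.json above; the full file follows the same logic.
config = {
    "imputer": {
        "algorithms": ["SimpleImputer"],
        "hyperparameters": {
            "SimpleImputer": {
                "strategy": ["mean", "median", "most_frequent", "constant"]
            }
        },
    },
    "standardizer": {
        "algorithms": [None, "StandardScaler"],
        "hyperparameters": {"StandardScaler": {}},
    },
}


def n_settings(module):
    """Count concrete configurations for one pipeline module."""
    total = 0
    for alg in module["algorithms"]:
        if alg is None:  # the "no-op" choice counts as one setting
            total += 1
            continue
        n = 1
        for values in module["hyperparameters"].get(alg, {}).values():
            n *= len(values)
        total += n
    return total


print({name: n_settings(m) for name, m in config.items()})
# {'imputer': 4, 'standardizer': 2}
```

The counts 4 and 2 line up with the second and third dimensions of (215, 4, 2, 8, 183).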
Hello,
Thanks for your nice framework!
I would like to ask when you will release documentation for putting this framework into action and improving its usability.
Hello,
As part of a project, I am trying to fix up some things with automlbenchmark and from there realized oboe does not install correctly.
method = 'Oboe'
problem_type = 'classification'
from auto_learner import AutoLearner
import numpy as np
m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
Error log:
Traceback (most recent call last):
File "test_case.py", line 6, in <module>
m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
File "/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/auto_learner.py", line 87, in __init__
with open(os.path.join(DEFAULTS, p_type + '.json')) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json'
Looking at /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json
I can see that pip didn't include everything
ls /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe
__pycache__ .. convex_opt.py experiment_design.py __init__.py model.py preprocessing.py
. auto_learner.py ensemble.py generate_vector.py linalg.py pipeline.py util.py
As you can see, the defaults folder and everything below it is not included. Upon inspecting these folders, it seems this is because they are not Python modules (they lack an __init__.py file). Therefore your setup.py would have to be modified according to this Stack Overflow answer.
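For reference, the usual setuptools fix is to declare the non-Python files as package data so pip installs them. A sketch only, assuming a setuptools-based setup.py (the other arguments to setup() are omitted):

```python
from setuptools import setup, find_packages

setup(
    name="oboe",
    packages=find_packages(),
    include_package_data=True,
    # Ship the JSON defaults (e.g. defaults/Oboe/classification.json)
    # alongside the Python sources.
    package_data={"oboe": ["defaults/*.json", "defaults/*/*.json"]},
)
```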
I was checking the tensor in the repository, "oboe/large_files/error_tensor_f16_compressed.npz", and noticed it covers 551 datasets, while in the [paper](https://people.ece.cornell.edu/cy/_papers/tensor_oboe.pdf) you mention only 215 for meta-training. Did you add more? Moreover, is it possible to get the meta-features of these 551 datasets? And how do you compute the best initializations when meta-learning, as Auto-sklearn does?
Thanks!
Hi,
I wanted to use AutoLearner with method="Oboe" on the OpenML dataset 168868, but it fails because the data contains NaN values. Do you happen to know why?
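I cannot say what the root cause is, but a common workaround is to impute the missing values before calling fit. A minimal NumPy sketch using column-mean imputation for numeric features (`impute_mean` is my own helper, not part of Oboe):

```python
import numpy as np


def impute_mean(X):
    """Replace NaNs with their column means (numeric features only)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)          # per-column mean, ignoring NaNs
    nan_rows, nan_cols = np.where(np.isnan(X)) # positions of missing entries
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X


X_clean = impute_mean([[1.0, float("nan")], [3.0, 4.0]])
# The NaN in column 1 is replaced by that column's mean, 4.0.
```

After this, the cleaned array can be passed to AutoLearner.fit as usual.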