
udellgroup / oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

License: BSD 3-Clause "New" or "Revised" License

Python 97.78% Shell 2.22%
automl collaborative-filtering ml-pipelines

oboe's People

Contributors

chengrunyang, yujiakimoto


oboe's Issues

Number of datasets, ML algorithms, and preprocessing methods

Hi,
I have a few questions whose answers I could not find in the paper.

  1. How many datasets did you use for meta-learning?
  2. How many machine learning algorithms did you use for meta-learning? (It seems to be 12.)
  3. Also, regarding preprocessing and ensembling, could you explain how they work in oboe? (Please be specific about the statistics.)
    It would be very nice if you could mention this type of information.

Estimation of training performance

Greetings,

I am interested in using OBOE for a publication, in which I am also reporting the estimate of training performance. From the examples I found here, I see no way to retrieve this estimate. Am I missing something, or has it not been implemented yet?

Thanks in advance.

AutoLearner crashes on Windows

Hello,

I wanted to try the example from the Readme (Python3.8, Windows 10). Unfortunately, an error occurred when running it under Windows:

...

File ~\miniconda3\envs\automl\lib\site-packages\oboe\auto_learner.py:749, in AutoLearner.fit.<locals>.time_limit(seconds)
    747 def signal_handler(signum, frame):
    748     raise TimeoutException("Time limit reached.")
--> 749 signal.signal(signal.SIGALRM, signal_handler)
    750 signal.alarm(seconds)
    751 try:

AttributeError: module 'signal' has no attribute 'SIGALRM'

It looks like Windows does not implement that signal (see https://stackoverflow.com/questions/52779920/why-is-signal-sigalrm-not-working-in-python-on-windows).

Maybe you could find a platform-independent solution?

The shape of ERROR_TENSOR

Hi, I read the file error_tensor.npy and found that the shape of ERROR_TENSOR is (215, 4, 2, 8, 183).

After computing, I find that the number of standardizers is 2, the number of dim_reducers is 8, and the number of estimators is 183; the number of datasets is presumably 215. So what does the 4 in the shape mean?

Here is the info in classification.json:

{
  "imputer": {
    "algorithms": ["SimpleImputer"],
    "hyperparameters": {
      "SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
    }
  },
  "encoder": {
    "algorithms": [null, "OneHotEncoder"],
    "hyperparameters": {
      "OneHotEncoder": {"handle_unknown": ["ignore"], "sparse": [0]}
    }
  },
  "standardizer": {
    "algorithms": [null, "StandardScaler"],
    "hyperparameters": {
      "StandardScaler": {}
    }
  },
  "dim_reducer": {
    "algorithms": [null, "PCA", "VarianceThreshold", "SelectKBest"],
    "hyperparameters": {
      "PCA": {"n_components": ["25%", "50%", "75%"]},
      "VarianceThreshold": {},
      "SelectKBest": {"k": ["25%", "50%", "75%"]}
    }
  },
  "estimator": {
    "algorithms": ["KNN", "DT", "RF", "GBT", "AB", "lSVM", "Logit", "Perceptron", "GNB", "MLP", "ExtraTrees"],
    "hyperparameters": {
      "KNN": {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15], "p": [1, 2]},
      "DT": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.01, 0.001, 0.0001, 1e-05]},
      "RF": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]},
      "GBT": {"learning_rate": [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5], "max_depth": [3, 6], "max_features": [null, "log2"]},
      "AB": {"n_estimators": [50, 100], "learning_rate": [1.0, 1.5, 2.0, 2.5, 3.0]},
      "lSVM": {"C": [0.125, 0.25, 0.5, 0.75, 1, 2, 4, 8, 16]},
      "Logit": {"C": [0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4], "solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
      "Perceptron": {},
      "GNB": {},
      "MLP": {"learning_rate_init": [0.0001, 0.001, 0.01], "learning_rate": ["adaptive"], "solver": ["sgd", "adam"], "alpha": [0.0001, 0.01]},
      "ExtraTrees": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]}
    }
  }
}
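One way to sanity-check the axis sizes is to count the hyperparameter settings each pipeline stage contributes. The sketch below (not part of oboe; the counting scheme is an assumption about how the tensor was built) enumerates the settings listed in classification.json for three stages. Under this counting, the imputer (one algorithm, SimpleImputer, with 4 strategies) yields exactly 4 settings, which would match the otherwise unexplained axis of size 4:

```python
from math import prod

# Stage definitions copied from classification.json
# (encoder and estimator omitted for brevity).
stages = {
    "imputer": {
        "algorithms": ["SimpleImputer"],
        "hyperparameters": {
            "SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
        },
    },
    "standardizer": {
        "algorithms": [None, "StandardScaler"],
        "hyperparameters": {"StandardScaler": {}},
    },
    "dim_reducer": {
        "algorithms": [None, "PCA", "VarianceThreshold", "SelectKBest"],
        "hyperparameters": {
            "PCA": {"n_components": ["25%", "50%", "75%"]},
            "VarianceThreshold": {},
            "SelectKBest": {"k": ["25%", "50%", "75%"]},
        },
    },
}

def n_settings(stage):
    """Count configurations: each algorithm contributes the Cartesian
    product of its hyperparameter value lists (1 if it has none)."""
    total = 0
    for alg in stage["algorithms"]:
        if alg is None:
            total += 1  # the "no-op" component counts as one setting
        else:
            hps = stage["hyperparameters"].get(alg, {})
            total += prod(len(values) for values in hps.values())
    return total

counts = {name: n_settings(stage) for name, stage in stages.items()}
print(counts)  # {'imputer': 4, 'standardizer': 2, 'dim_reducer': 8}
```

The same counting applied to the estimator block sums to 183, consistent with the last axis.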

Documentation

Hello,
Thanks for your nice framework!
May I ask when you will release documentation, to make the framework easier to put into action and more usable?

Does not install properly through pip

Hello,

As part of a project, I am trying to fix up some things with automlbenchmark and from there realized oboe does not install correctly.

Reproducible code (sample shown in README.md)

method = 'Oboe' 
problem_type = 'classification'

from auto_learner import AutoLearner
import numpy as np

m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)

Error log:

Traceback (most recent call last):
  File "test_case.py", line 6, in <module>
    m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
  File "/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/auto_learner.py", line 87, in __init__
    with open(os.path.join(DEFAULTS, p_type + '.json')) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json'

Looking at /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json I can see that pip didn't include everything

ls /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe
__pycache__  ..               convex_opt.py  experiment_design.py  __init__.py  model.py     preprocessing.py
.            auto_learner.py  ensemble.py    generate_vector.py    linalg.py    pipeline.py  util.py

As you can see, the defaults folder and everything below it is not included. Upon inspecting these folders, it seems this is because they are not Python modules (they lack an __init__.py file). Therefore your setup.py would have to be modified according to this Stack Overflow answer.
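For reference, the usual fix is to declare the data files via setuptools' `package_data` so pip ships them alongside the modules. The snippet below is a hypothetical sketch, not oboe's actual setup.py; the glob pattern assumes the layout shown in the error (`oboe/defaults/Oboe/classification.json`):

```python
from setuptools import setup, find_packages

setup(
    name="oboe",
    packages=find_packages(),
    # Non-.py files are excluded from wheels by default; package_data
    # tells setuptools to ship the JSON defaults inside the oboe package.
    package_data={"oboe": ["defaults/*/*.json"]},
)
```

An alternative is `include_package_data=True` plus a MANIFEST.in entry; either way the JSON files end up in site-packages where `auto_learner.py` expects them.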

More datasets than in the paper?

I was checking the tensor in the repository, "oboe/large_files/error_tensor_f16_compressed.npz", and I noticed there are 551 datasets, while in the [paper](https://people.ece.cornell.edu/cy/_papers/tensor_oboe.pdf) you mention only 215 for meta-training. Did you add more? Moreover, is it possible to get the meta-features of these 551 datasets? Or how do you compute the best initializations when meta-learning with Auto-sklearn?

Thanks!
