zygmuntz / hyperband

Tuning hyperparams fast with Hyperband

Home Page: http://fastml.com/tuning-hyperparams-fast-with-hyperband/

License: Other

Python 100.00%
hyperparameters hyperparameter-optimization hyperparameter-tuning gradient-boosting-classifier gradient-boosting machine-learning

hyperband's Introduction

hyperband

Code for tuning hyperparams with Hyperband, adapted from Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

defs/ - functions and search space definitions for various classifiers
defs_regression/ - the same for regression models
common_defs.py - imports and definitions shared by defs files
hyperband.py - from hyperband import Hyperband

load_data.py - classification defs import data from this file
load_data_regression.py - regression defs import data from this file

main.py - a complete example for classification
main_regression.py - the same, for regression
main_simple.py - a simple, bare-bones example

The goal is to provide a fully functional implementation of Hyperband, along with ready-to-use search space definitions for a number of models (classifiers and regressors). Currently these include four from scikit-learn and four others:

  • gradient boosting (GB)
  • random forest (RF)
  • extremely randomized trees (XT)
  • linear SGD
  • factorization machines from polylearn
  • polynomial networks from polylearn
  • a multilayer perceptron from Keras
  • gradient boosting from XGBoost (classification only)

Meta-classifier/regressor

Use defs.meta/defs_regression.meta to try many models in one Hyperband run. This is an automatic alternative to constructing search spaces with multiple models (like defs.rf_xt or defs.polylearn_fm_pn) by hand.

Loading data

Definition files in defs/ and defs_regression/ import data from load_data.py and load_data_regression.py, respectively.

Edit these files, or a definitions file directly, to make your data available for tuning.

Regression defs use the kin8nm dataset in data/kin8nm. There is no attached data for classification.

For the provided models, the data format follows scikit-learn conventions: x_train, y_train, x_test and y_test are NumPy arrays.
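As a concrete illustration, here is a minimal stand-in for load_data.py built on synthetic data (the dimensions and split are arbitrary; a real file would expose your own dataset under the same four names):

```python
import numpy as np

# Synthetic stand-in for a real dataset; replace with your own data.
rng = np.random.RandomState(0)
n_samples, n_features = 1000, 20
x = rng.normal(size=(n_samples, n_features))
y = (x[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# scikit-learn-style split into the four arrays the defs expect
split = int(0.8 * n_samples)
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]
```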

Usage

Run main.py (with your own data), or main_regression.py. The essence of it is:

from hyperband import Hyperband
from defs.gb import get_params, try_params

hb = Hyperband( get_params, try_params )
results = hb.run()

Here's a sample output from a run (three configurations tested) using defs.xt:

3 | Tue Feb 28 15:39:54 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': 'balanced',
'criterion': 'entropy',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 5,
'min_samples_split': 6}

# training | log loss: 62.21%, AUC: 75.25%, accuracy: 67.20%
# testing  | log loss: 62.64%, AUC: 74.81%, accuracy: 66.78%

7 seconds.

4 | Tue Feb 28 15:40:01 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': None,
'criterion': 'gini',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 1,
'min_samples_split': 2}

# training | log loss: 53.39%, AUC: 75.69%, accuracy: 72.37%
# testing  | log loss: 53.96%, AUC: 75.29%, accuracy: 71.89%

7 seconds.

5 | Tue Feb 28 15:40:07 2017 | best so far: 0.5396 (run 4)

n_estimators: 5
{'bootstrap': True,
'class_weight': None,
'criterion': 'gini',
'max_depth': 3,
'max_features': None,
'min_samples_leaf': 7,
'min_samples_split': 8}

# training | log loss: 50.20%, AUC: 77.04%, accuracy: 75.39%
# testing  | log loss: 50.67%, AUC: 76.77%, accuracy: 75.12%

8 seconds.

Early stopping

Some models may use early stopping (as the Keras MLP example does). If a configuration stopped early, it doesn't make sense to run it with more iterations (duh). To indicate this, make try_params()

return { 'loss': loss, 'early_stop': True }

This way, Hyperband will know not to select that configuration for any further runs.
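A hedged sketch of what such a try_params might look like (the training loop, the placeholder loss, and the patience scheme are illustrative, not code from this repo):

```python
def try_params(n_iterations, params):
    """Illustrative only: train up to n_iterations epochs, stop early
    when the (placeholder) validation loss stops improving, and report
    that via 'early_stop' so Hyperband skips further runs of this config."""
    best_loss = float('inf')
    patience, bad_epochs = 2, 0
    stopped_early = False
    for epoch in range(int(round(n_iterations))):
        loss = max(0.5, 1.0 / (epoch + 1))  # placeholder validation loss
        if loss < best_loss - 1e-4:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            stopped_early = True
            break
    return {'loss': best_loss, 'early_stop': stopped_early}
```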

Moar

See http://fastml.com/tuning-hyperparams-fast-with-hyperband/ for a detailed description.

hyperband's People

Contributors

zygmuntz


hyperband's Issues

Viewing and interpreting results

Thanks for putting together this library!

What is the best way to view and interpret the pickled results from one of the runs?

Import Error main_regression.py

First, thank you for putting together this library!

I was trying out your main_regression.py file to get a feel for the library and ran into the following:

[import_error screenshot]

Is this polylearn from scikit-learn or another library?

Understanding iterations vs runs

Looking over the results from one of my runs, I see both a number of iterations and a number of runs. Can you explain the difference between the two?

How would I go about setting up an experiment with 50 pulls of the bandit? I assume that means setting the run parameter?

Configurations not being updated

Hello,

Thanks for this very nice repo! But something isn't very clear to me. As the blog post says, Hyperband runs configs for just an iteration or two at first, to get a taste of how they perform, then takes the best performers and runs them longer.
So I thought that in the outer loop we would first randomly instantiate the configuration set T and then update it at the end of each inner loop.
However, this is not the case: for each new s, a random T is drawn again, without taking into account the previously computed T. Am I missing something here?

Optimize with CV and specific scoring

Hi! I am wondering whether it is possible to optimize with cross-validation, preferably with a custom scoring function. Currently it picks the configuration that minimizes, e.g., the log loss on the training data, if I am not mistaken. It would be good to also have options similar to those grid search offers in scikit-learn.

Questions about running one configuration with two different numbers of iterations

Hi @zygmuntz ,

I am confused about what happens when one configuration is run with two different numbers of iterations. For example, I have two configurations, A and B. In the first run, I train both with 5 iterations and find that A performs better than B. In the second run, I therefore run A with 10 iterations, assigning more resources to it. Will Hyperband continue A from where the first run ended, or run it from scratch? In other words, will the second run train A for (10 - 5 = 5) iterations or for the full 10?

I notice that in each try_params function, it seems that a completely new classifier with the given number of iterations (n_iterations) is created every time. See, for example, https://github.com/zygmuntz/hyperband/blob/master/defs_regression/sgd.py#L57 and https://github.com/zygmuntz/hyperband/blob/master/defs_regression/gb.py#L43.

Thanks for sharing!
Ramay7
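Judging by the linked defs, try_params constructs a brand-new estimator on every call. A simplified, self-contained sketch of that pattern (the class and function here are illustrative, not the repo's actual code) shows why re-running a configuration with 10 iterations retrains from scratch rather than resuming the earlier 5-iteration run:

```python
class FakeEstimator:
    """Stand-in model that records how many iterations it was trained for."""
    def __init__(self, n_iterations):
        self.n_iterations = n_iterations
        self.trained_iterations = 0

    def fit(self):
        # A fresh estimator always trains from iteration zero.
        self.trained_iterations = self.n_iterations

def try_params(n_iterations, params):
    # Mirrors the pattern in the linked defs: a new model is built on
    # every call, so nothing carries over from previous runs of a config.
    model = FakeEstimator(int(n_iterations))
    model.fit()
    return model.trained_iterations

# Running config A with 5 and then 10 iterations costs 5 + 10 = 15
# training iterations in total, not 5 + (10 - 5).
total_work = try_params(5, {}) + try_params(10, {})
```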

Shape of prediction

Hi,

Thank you for this implementation of Hyperband.

I noticed that in defs_regression, the prediction p for keras_mlp and rf has shape (n, 1) whereas the target has shape (n,).

I wanted to define my own metric that involves subtracting the target from the prediction at some point. Subtracting a (n,) array from a (n, 1) array works for small arrays, but broadcasting produces an (n, n) result, so for n > 100,000 I got a memory error.

You might want to squeeze the prediction p to fix this problem.

Thank you.
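The shape mismatch described in the issue above comes from NumPy broadcasting; a small self-contained demonstration:

```python
import numpy as np

n = 5
p = np.zeros((n, 1))    # prediction shaped like the defs_regression output
target = np.ones(n)     # target with shape (n,)

# Broadcasting (n, 1) against (n,) yields an (n, n) matrix -- harmless
# for tiny n, but O(n^2) memory once n reaches the hundreds of thousands.
diff_bad = p - target

# Squeezing the prediction first gives the intended elementwise result.
diff_good = np.squeeze(p) - target
```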

License

I am not a lawyer (and this is not legal advice), but the current license appears to be either: (1) incompatible with the GPL, or (2) effectively the same as the regular BSD 2-clause since any user could sublicense to whatever government agency he or she so desires. Either way, it's vague and really should be replaced by the regular BSD 2-clause license (or the MIT license or whatever).

No dropout in last hidden layer?

I've been working with your code lately and I've noticed that in keras_mlp.py, in both models, the last hidden layer never applies dropout:

model = Sequential()
model.add( Dense( params['layer_1_size'], init = params['init'],
    activation = params['layer_1_activation'], input_dim = input_dim ))

for i in range( int( params['n_layers'] ) - 1 ):

    extras = 'layer_{}_extras'.format( i + 1 )

    if params[extras]['name'] == 'dropout':
        model.add( Dropout( params[extras]['rate'] ))
    elif params[extras]['name'] == 'batchnorm':
        model.add( BatchNorm())

    model.add( Dense( params['layer_{}_size'.format( i + 2 )], init = params['init'],
        activation = params['layer_{}_activation'.format( i + 2 )]))

model.add( Dense( 1, init = params['init'], activation = 'linear' ))

As can be seen in the code, the last hidden layer can't have dropout, since the dropout is added before the layer itself. Is this intentional, or is it undesired behaviour?

data/classification.pkl missing

When I try to run an example I get

IOError: [Errno 2] No such file or directory: 'data/classification.pkl'

Can you post that file somewhere? Or is it derived from one of the other data files?

Thanks
