zygmuntz / hyperband

Tuning hyperparams fast with Hyperband

Home Page: http://fastml.com/tuning-hyperparams-fast-with-hyperband/

License: Other

Python 100.00%
hyperparameters hyperparameter-optimization hyperparameter-tuning gradient-boosting-classifier gradient-boosting machine-learning

hyperband's Introduction

hyperband

Code for tuning hyperparams with Hyperband, adapted from Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

defs/ - functions and search space definitions for various classifiers
defs_regression/ - the same for regression models
common_defs.py - imports and definitions shared by defs files
hyperband.py - from hyperband import Hyperband

load_data.py - classification defs import data from this file
load_data_regression.py - regression defs import data from this file

main.py - a complete example for classification
main_regression.py - the same, for regression
main_simple.py - a simple, bare-bones example

The goal is to provide a fully functional implementation of Hyperband, along with ready-to-use search space definitions for a number of models (classifiers and regressors). Currently these include four from scikit-learn and four others:

  • gradient boosting (GB)
  • random forest (RF)
  • extremely randomized trees (XT)
  • linear SGD
  • factorization machines from polylearn
  • polynomial networks from polylearn
  • a multilayer perceptron from Keras
  • gradient boosting from XGBoost (classification only)

Meta-classifier/regressor

Use defs.meta/defs_regression.meta to try many models in one Hyperband run. This is an automatic alternative to constructing search spaces with multiple models (like defs.rf_xt or defs.polylearn_fm_pn) by hand.

Loading data

Definition files in defs/ and defs_regression/ import data from load_data.py and load_data_regression.py, respectively.

Edit these files, or a definitions file directly, to make your data available for tuning.

Regression defs use the kin8nm dataset in data/kin8nm. There is no attached data for classification.

For the provided models, the data format follows scikit-learn conventions: x_train, y_train, x_test and y_test are NumPy arrays.
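As a concrete illustration, here is a minimal stand-in for load_data.py built on synthetic data (the dimensions and split are arbitrary; a real file would expose your own dataset under the same four names):

```python
import numpy as np

# Synthetic stand-in for a real dataset; replace with your own data.
rng = np.random.RandomState(0)
n_samples, n_features = 1000, 20
x = rng.normal(size=(n_samples, n_features))
y = (x[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# scikit-learn-style split into the four arrays the defs expect
split = int(0.8 * n_samples)
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]
```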

Usage

Run main.py (with your own data), or main_regression.py. The essence of it is:

from hyperband import Hyperband
from defs.gb import get_params, try_params

hb = Hyperband( get_params, try_params )
results = hb.run()

Here's a sample output from a run (three configurations tested) using defs.xt:

3 | Tue Feb 28 15:39:54 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': 'balanced',
'criterion': 'entropy',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 5,
'min_samples_split': 6}

# training | log loss: 62.21%, AUC: 75.25%, accuracy: 67.20%
# testing  | log loss: 62.64%, AUC: 74.81%, accuracy: 66.78%

7 seconds.

4 | Tue Feb 28 15:40:01 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': None,
'criterion': 'gini',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 1,
'min_samples_split': 2}

# training | log loss: 53.39%, AUC: 75.69%, accuracy: 72.37%
# testing  | log loss: 53.96%, AUC: 75.29%, accuracy: 71.89%

7 seconds.

5 | Tue Feb 28 15:40:07 2017 | best so far: 0.5396 (run 4)

n_estimators: 5
{'bootstrap': True,
'class_weight': None,
'criterion': 'gini',
'max_depth': 3,
'max_features': None,
'min_samples_leaf': 7,
'min_samples_split': 8}

# training | log loss: 50.20%, AUC: 77.04%, accuracy: 75.39%
# testing  | log loss: 50.67%, AUC: 76.77%, accuracy: 75.12%

8 seconds.

Early stopping

Some models may use early stopping (as the Keras MLP example does). If a configuration stopped early, it doesn't make sense to run it with more iterations (duh). To indicate this, make try_params()

return { 'loss': loss, 'early_stop': True }

This way, Hyperband will know not to select that configuration for any further runs.
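A hedged sketch of what such a try_params might look like (the training loop, the placeholder loss, and the patience scheme are illustrative, not code from this repo):

```python
def try_params(n_iterations, params):
    """Illustrative only: train up to n_iterations epochs, stop early
    when the (placeholder) validation loss stops improving, and report
    that via 'early_stop' so Hyperband skips further runs of this config."""
    best_loss = float('inf')
    patience, bad_epochs = 2, 0
    stopped_early = False
    for epoch in range(int(round(n_iterations))):
        loss = max(0.5, 1.0 / (epoch + 1))  # placeholder validation loss
        if loss < best_loss - 1e-4:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            stopped_early = True
            break
    return {'loss': best_loss, 'early_stop': stopped_early}
```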

Moar

See http://fastml.com/tuning-hyperparams-fast-with-hyperband/ for a detailed description.

hyperband's People

Contributors

zygmuntz


hyperband's Issues

Viewing and interpreting results

Thanks for putting together this library!

What is the best way to view and interpret the pickled results from one of the runs?

Import Error main_regression.py

First, thank you for putting together this library!

I was trying out your main_regression.py file to get a feel for the library and ran into the following:

[import_error screenshot]

Is this polylearn from scikit-learn or another library?

Understanding iterations vs runs

Looking over the results from one of my runs, I see both a number of iterations and a number of runs. Can you explain the difference between the two?

How would I go about setting up an experiment with 50 pulls of the bandit? I assume that means setting the run parameter?

Configurations not being updated

Hello,

Thanks for this very nice repo! But something isn't very clear to me. As the blog post says, Hyperband runs configs for just an iteration or two at first, to get a taste of how they perform, then takes the best performers and runs them longer.
So I thought that in the outer loop we would first randomly instantiate the configuration set T and then update it at the end of each inner loop.
However, this is not the case: for each new s, a random T is drawn again, without taking into account the previously computed T. Am I missing something here?

Optimize with CV and specific scoring

Hi! I am wondering whether it is possible to optimize with cross-validation, preferably with a custom scoring function. Currently it picks the configuration that minimizes, e.g., the log loss on the training data, if I am not mistaken. It would be good to also have options similar to those grid search offers in scikit-learn.

Questions about running one configuration with two different numbers of iterations

Hi @zygmuntz ,

I am confused about what happens when one configuration is run with two different numbers of iterations. For example, I have two configurations, A and B. In the first run, I train both with 5 iterations and find that A performs better than B. In the second run, I therefore run A with 10 iterations, assigning more resources to it. Will Hyperband continue A from where the first run ended, or run it from scratch? In other words, will the second run train A for (10 - 5 = 5) iterations or for the full 10?

I notice that in each try_params function, it seems that a completely new classifier with the given number of iterations (n_iterations) is created every time. See, for example, https://github.com/zygmuntz/hyperband/blob/master/defs_regression/sgd.py#L57 and https://github.com/zygmuntz/hyperband/blob/master/defs_regression/gb.py#L43.

Thanks for sharing!
Ramay7
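Judging by the linked defs, try_params constructs a brand-new estimator on every call. A simplified, self-contained sketch of that pattern (the class and function here are illustrative, not the repo's actual code) shows why re-running a configuration with 10 iterations retrains from scratch rather than resuming the earlier 5-iteration run:

```python
class FakeEstimator:
    """Stand-in model that records how many iterations it was trained for."""
    def __init__(self, n_iterations):
        self.n_iterations = n_iterations
        self.trained_iterations = 0

    def fit(self):
        # A fresh estimator always trains from iteration zero.
        self.trained_iterations = self.n_iterations

def try_params(n_iterations, params):
    # Mirrors the pattern in the linked defs: a new model is built on
    # every call, so nothing carries over from previous runs of a config.
    model = FakeEstimator(int(n_iterations))
    model.fit()
    return model.trained_iterations

# Running config A with 5 and then 10 iterations costs 5 + 10 = 15
# training iterations in total, not 5 + (10 - 5).
total_work = try_params(5, {}) + try_params(10, {})
```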

Shape of prediction

Hi,

Thank you for this implementation of Hyperband.

I noticed that in defs_regression, the prediction p for keras_mlp and rf has shape (n, 1) whereas the target has shape (n,).

I wanted to define my own metric that involves subtracting the target from the prediction at some point. Subtracting a (n,) array from a (n, 1) array works for small arrays, but broadcasting produces an (n, n) result, so for n > 100,000 I got a memory error.

You might want to squeeze the prediction p to fix this problem.

Thank you.
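The shape mismatch described in the issue above comes from NumPy broadcasting; a small self-contained demonstration:

```python
import numpy as np

n = 5
p = np.zeros((n, 1))    # prediction shaped like the defs_regression output
target = np.ones(n)     # target with shape (n,)

# Broadcasting (n, 1) against (n,) yields an (n, n) matrix -- harmless
# for tiny n, but O(n^2) memory once n reaches the hundreds of thousands.
diff_bad = p - target

# Squeezing the prediction first gives the intended elementwise result.
diff_good = np.squeeze(p) - target
```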

License

I am not a lawyer (and this is not legal advice), but the current license appears to be either: (1) incompatible with the GPL, or (2) effectively the same as the regular BSD 2-clause since any user could sublicense to whatever government agency he or she so desires. Either way, it's vague and really should be replaced by the regular BSD 2-clause license (or the MIT license or whatever).

No dropout in last hidden layer?

I've been working with your code lately and I've noticed that in keras_mlp.py, in both models, the last hidden layer never applies dropout:

model = Sequential()
model.add( Dense( params['layer_1_size'], init = params['init'],
    activation = params['layer_1_activation'], input_dim = input_dim ))

for i in range( int( params['n_layers'] ) - 1 ):

    extras = 'layer_{}_extras'.format( i + 1 )

    if params[extras]['name'] == 'dropout':
        model.add( Dropout( params[extras]['rate'] ))
    elif params[extras]['name'] == 'batchnorm':
        model.add( BatchNorm())

    model.add( Dense( params['layer_{}_size'.format( i + 2 )], init = params['init'],
        activation = params['layer_{}_activation'.format( i + 2 )]))

model.add( Dense( 1, init = params['init'], activation = 'linear' ))

As can be seen in the code, the last hidden layer can't have dropout, since the dropout is added before the layer itself. Is this intentional, or is it undesired behaviour?

data/classification.pkl missing

When I try to run an example I get

IOError: [Errno 2] No such file or directory: 'data/classification.pkl'

Can you post that file somewhere? Or is it derived from one of the other data files?

Thanks
