simonblanke / hyperactive

An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.

Home Page: https://simonblanke.github.io/hyperactive-documentation

License: MIT License

Python 99.20% Makefile 0.80%
hyperparameter-optimization scikit-learn machine-learning python data-science parameter-tuning xgboost keras deep-learning bayesian-optimization

hyperactive's People

Contributors

0liu, 23pointsnorth, adavidzh, dbready, simonblanke, vanshikas253

hyperactive's Issues

Hyperactive returns integer parameters as float in best_para when search space has both int and float parameters

Hi Simon,

This is not strictly a bug but more of a suggestion or concern. As I reported in SimonBlanke/Gradient-Free-Optimizers#15, integer parameters in the search space are returned as floats, which causes a TypeError when the best para is passed to the model for evaluation. It turned out the issue is not in the optimizers but in Hyperactive's converter functions, specifically:

def position2value(self, position):
    value = []
    for n, space_dim in enumerate(self.search_space_values):
        value.append(space_dim[position[n]])
    return np.array(value)

This function converts the list of parameter values to an np.array, which casts all integers to floats.
This only happens when the search space contains both integer and float parameters.

I browsed around the code and don't see why it is necessary to return numpy arrays. So it may be better to use a plain Python list or nested list to pass parameter values between converters. But you may have other reasons for using numpy arrays in those functions. I will create a pull request to clarify my idea and provide some edits, just as a suggestion.
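The type change can be reproduced in isolation. The sketch below uses a made-up two-dimensional search space (one integer and one float dimension) and compares the array-based conversion quoted above with a plain-list variant; `position2value_array` and `position2value_list` are illustrative names, not Hyperactive's API.

```python
import numpy as np

# Hypothetical search space: one integer dimension and one float dimension.
search_space_values = [[10, 20, 30], [0.1, 0.2, 0.3]]


def position2value_array(position):
    # Mirrors the converter quoted above: the mixed list becomes a float64 array.
    value = []
    for n, space_dim in enumerate(search_space_values):
        value.append(space_dim[position[n]])
    return np.array(value)


def position2value_list(position):
    # Plain-list variant: each value keeps its original Python type.
    return [space_dim[position[n]] for n, space_dim in enumerate(search_space_values)]


as_array = position2value_array([2, 0])
as_list = position2value_list([2, 0])

print(as_array.dtype)    # float64: the integer 30 became 30.0
print(type(as_list[0]))  # <class 'int'>
```

If an array is still wanted, `np.array(value, dtype=object)` also keeps the element types, at the cost of losing fast numeric operations on the array.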

Problem with "memory_warm_start" when functions are in search space

In the current version (v3.0.4) there is a problem with the feature that reuses the search-data of a previous run when the search space contains functions. The problem occurs when the search-data was previously saved to disk. The same problem exists with search-data passed to "smbo_warm_start".

The following code shows an example for this problem:

import pandas as pd
from hyperactive import Hyperactive

def func1():
    pass


def func2():
    pass


def func3():
    pass


search_space = {
    "func1": [func1, func2, func3],
}


def objective_function(para):
    return 1


hyper_0 = Hyperactive()
hyper_0.add_search(objective_function, search_space, n_iter=20)
hyper_0.run()


search_data_0 = hyper_0.results(objective_function)

path = "./search_data.csv"

search_data_0.to_csv(path, index=False)
search_data_0_ = pd.read_csv(path)

hyper_1 = Hyperactive()
hyper_1.add_search(
    objective_function, search_space, n_iter=20, memory_warm_start=search_data_0_
)
hyper_1.run()

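The problem results in search-data in the memory dictionary that Hyperactive does not recognize. When the search-data is saved to disk, each function is stored as its string representation, and Python's default representation of a function contains its position in memory (the random access memory, not the Hyperactive memory dictionary), for example: "<function func3 at 0x7fc5708ea7b8>". Read back from the file, these strings are not the function objects in the search space, and in a new session the functions also have a new position in memory.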

This problem does not result in an error, but Hyperactive will not "remember" the search-data passed to "memory_warm_start" and therefore won't skip evaluations for known positions in the search space.
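One possible workaround, sketched under the assumption that every function in the search space is a module-level function with a unique `__name__`, is to convert the saved strings back into function objects before passing the dataframe to `memory_warm_start`. `restore_functions` is a hypothetical helper, not part of Hyperactive.

```python
import re

import pandas as pd


def func1():
    pass


def func2():
    pass


def func3():
    pass


search_space = {"func1": [func1, func2, func3]}

# Matches the repr a module-level function gets when written to CSV,
# e.g. "<function func3 at 0x7fc5708ea7b8>".
FUNC_REPR = re.compile(r"<function (\w+) at 0x[0-9a-fA-F]+>")


def restore_functions(search_data, search_space):
    """Hypothetical helper: map function reprs read from disk back to the
    function objects of the current search space, matched by __name__."""
    restored = search_data.copy()
    for column, candidates in search_space.items():
        if column not in restored.columns:
            continue
        by_name = {getattr(c, "__name__", None): c for c in candidates if callable(c)}

        def to_object(cell, by_name=by_name):
            if isinstance(cell, str):
                match = FUNC_REPR.fullmatch(cell)
                if match and match.group(1) in by_name:
                    return by_name[match.group(1)]
            return cell  # anything else is left unchanged

        restored[column] = restored[column].map(to_object)
    return restored


search_data = pd.DataFrame(
    {"func1": ["<function func3 at 0x7fc5708ea7b8>"], "score": [1]}
)
restored = restore_functions(search_data, search_space)
```

The restored dataframe could then be passed as `memory_warm_start=restore_functions(search_data_0_, search_space)`. Whether Hyperactive then recognizes the positions depends on how it matches values internally, which this sketch does not verify.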

para2value - 'NoneType' object is not subscriptable

Describe the bug
In some optimizations I get this error. When I run the same optimization repeatedly, it sometimes works perfectly and other times it suddenly throws this error. I couldn't trace back why the value is None.

Error message from command line

============================== EXCEPTION TRACEBACK:
  File "/usr/local/bin/jesse", line 33, in <module>
    sys.exit(load_entry_point('jesse', 'console_scripts', 'jesse')())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/home/src/jesse/jesse/__init__.py", line 379, in optimize_hyperactive
    optimize_mode_hyperactive(start_date, finish_date, optimal_total, cpu, optimizer, iterations)
  File "/home/home/src/jesse/jesse/modes/optimize_hyperactive_mode/__init__.py", line 307, in optimize_mode_hyperactive
    optimizer.run()
  File "/home/home/src/jesse/jesse/modes/optimize_hyperactive_mode/__init__.py", line 268, in run
    hyper.run()
  File "/usr/local/lib/python3.9/site-packages/hyperactive/hyperactive.py", line 201, in run
    self.results_list = run_search(self.process_infos, self.distribution)
  File "/usr/local/lib/python3.9/site-packages/hyperactive/run_search.py", line 42, in run_search
    results_list = single_process(_process_, process_infos)
  File "/usr/local/lib/python3.9/site-packages/hyperactive/distribution.py", line 10, in single_process
    results = [process_func(**search_processes_infos[0])]
  File "/usr/local/lib/python3.9/site-packages/hyperactive/process.py", line 25, in _process_
    optimizer.search(
  File "/usr/local/lib/python3.9/site-packages/hyperactive/optimizers.py", line 167, in search
    self._convert_results2hyper()
  File "/usr/local/lib/python3.9/site-packages/hyperactive/optimizers.py", line 80, in _convert_results2hyper
    value = self.trafo.para2value(self.optimizer.best_para)
  File "/usr/local/lib/python3.9/site-packages/hyperactive/hyper_gradient_trafo.py", line 52, in para2value
    value.append(para[para_name])
=========================================================================

 Uncaught Exception: TypeError: 'NoneType' object is not subscriptable

System information:

  • OS Platform and Distribution - Ubuntu (Docker)
  • Python version 3.8
  • Hyperactive version 3.0.5.1

AttributeError: module 'os' has no attribute 'mknod' within Progress Board

Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?
No
Describe the bug
AttributeError: module 'os' has no attribute 'mknod'

Code to reproduce the behavior

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston

from hyperactive import Hyperactive
from hyperactive.dashboards import ProgressBoard

data = load_boston()
X, y = data.data, data.target


def model(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        min_samples_split=opt["min_samples_split"],
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    return scores.mean()


search_space = {
    "n_estimators": list(range(50, 150, 5)),
    "max_depth": list(range(2, 12)),
    "min_samples_split": list(range(2, 22)),
}

progress_board = ProgressBoard()

hyper = Hyperactive()

hyper.add_search(
    model,
    search_space,
    n_iter=120,
    progress_board=progress_board,
)

hyper.run()
Error message from command line
AttributeError                            Traceback (most recent call last)
<ipython-input-1-f7abb1559f49> in <module>
     34 
     35 # pass the instance of the ProgressBoard to .add_search(...)
---> 36 hyper.add_search(
     37     model,
     38     search_space,

~\Anaconda3\lib\site-packages\hyperactive\hyperactive.py in add_search(self, objective_function, search_space, n_iter, search_id, optimizer, n_jobs, initialize, max_score, early_stopping, random_state, memory, memory_warm_start, progress_board)
    142         optimizer = self._default_opt(optimizer)
    143         search_id = self._default_search_id(search_id, objective_function)
--> 144         progress_collector = self._init_progress_board(
    145             progress_board, search_id, search_space
    146         )

~\Anaconda3\lib\site-packages\hyperactive\hyperactive.py in _init_progress_board(self, progress_board, search_id, search_space)
    117     def _init_progress_board(self, progress_board, search_id, search_space):
    118         if progress_board:
--> 119             data_c = progress_board.init_paths(search_id, search_space)
    120 
    121             if progress_board.uuid not in self.progress_boards:

~\Anaconda3\lib\site-packages\hyperactive\dashboards\progress_board\progress_board.py in init_paths(self, search_id, search_space)
     38 
     39         self._io_.remove_progress(progress_id)
---> 40         self.create_lock(progress_id)
     41         data_c = DataCollector(self._io_.get_progress_data_path(progress_id))
     42         self.progress_collectors[progress_id] = data_c

~\Anaconda3\lib\site-packages\hyperactive\dashboards\progress_board\progress_board.py in create_lock(self, progress_id)
     26         path = self._io_.get_lock_file_path(progress_id)
     27         if not os.path.exists(path):
---> 28             os.mknod(path)
     29 
     30     def init_paths(self, search_id, search_space):

AttributeError: module 'os' has no attribute 'mknod'

System information:
OS Platform and Distribution
Windows 10
Python version 3.8.8
Hyperactive version 3.3.2

Additional context
From stackoverflow https://stackoverflow.com/questions/32691981/python-module-os-has-no-attribute-mknod
"os offers functionality that is closely related to the OS you're using. If most other attributes can be accessed from os (meaning you haven't got a os.py file in the current dir masking the standard module) an AttributeError will 99% signal an unsupported function on your Operating System.

This is what the case is with os.mknod on Windows. Creating named pipes in Windows has, as far as I can understand, very different semantics.

Either way, if you are trying to use mknod to create named pipes you'd probably be better using mkfifo() (again, only Unix supported) . If you're using it to create ordinary files, don't, use open() which is portable."
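Following the Stack Overflow advice, the lock file in `create_lock` only needs to be an ordinary empty file, so a portable replacement could look like the sketch below. This is not necessarily the fix shipped in Hyperactive; the file name `progress.lock` is made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def create_lock(path):
    # Portable stand-in for os.mknod(path): create an empty regular file
    # if it does not exist yet. Works on Windows, Linux and macOS.
    if not os.path.exists(path):
        Path(path).touch()


lock_path = os.path.join(tempfile.mkdtemp(), "progress.lock")
create_lock(lock_path)
create_lock(lock_path)  # second call is a no-op
```

`Path.touch()` defaults to `exist_ok=True`, so the existence check is not strictly needed; it is kept to mirror the original `create_lock` logic.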

Printing of Results from Runs - Preference to be able to provide additional parameters to objective function not through search space

Is your feature request related to a problem? Please describe.
Yes. When using the "print results" verbosity option in the Hyperactive initialization, the printed parameter set includes all of the parameters used. In my case, one of the parameters in the search space is the dataframe that I pass to the objective_function. I can't find any other way to pass the dataframe to the objective function without including it in the search space.

Describe the solution you'd like

  1. Either the ability to choose which parameters of the parameter set are printed by print_results, or
  2. The ability to pass extra parameters to the objective function without including them in the search space (preferred).
    This could perhaps be done through the "initialize" parameter if it were opened up to more arguments than grid, vertices, and random, perhaps via **kwargs so that any user parameters could be added. Adding a separate parameter to add_search instead might complicate things: rather than the optimizer looking only within search_space, the change would have to be made everywhere and for each optimizer, which is a ton of work. Adding it to initialize may require fewer changes.

Describe alternatives you've considered
I have considered not printing the results at all, because the printed dataframe clutters the console output.

Additional context
If I do not include the dataframe in the search space, I can't run my objective_function. But if I do, I can't use memory=True, because the dataframe would then be stored with every evaluated parameter set and consume a lot of memory very quickly.
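A common Python pattern for this, sketched here independently of Hyperactive, is to bind the dataframe in a closure so that the objective function still takes a single `para` argument. `make_objective` and the score formula are hypothetical placeholders. Unlike a `functools.partial` object, the returned function keeps a `__name__`, which may matter if the library derives a default search id from the objective function. With `n_jobs > 1`, however, a closure may not be picklable by the standard `pickle` module.

```python
import pandas as pd


def make_objective(df):
    # The dataframe is captured by the closure, so it never enters the
    # search space, the printed results, or the memory dictionary.
    def objective_function(para):
        # Hypothetical score: how close the column mean is to para["target"].
        return -abs(df["x"].mean() - para["target"])

    return objective_function


df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
objective_function = make_objective(df)
search_space = {"target": [1.0, 1.5, 2.0, 2.5, 3.0]}
# hyper.add_search(objective_function, search_space, n_iter=20)
```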

"IndexError: list index out of range" because of unordered dictionary

The error occurs in less than 10 percent of the runs when using Python 3.5 and is located in the "_positions2results" method of the optimizer wrapper class. It occurs because the method is not safe for unordered dictionaries (before Python 3.6, dictionaries do not preserve insertion order).

Hyperactive is not susceptible to this error for Python versions 3.6, 3.7 and 3.8.
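The general remedy is to derive the parameter order once and use that same list everywhere, instead of iterating over the dictionary in several places. The hypothetical `position2para` below illustrates the idea; it is not the actual `_positions2results` code.

```python
def position2para(position, search_space, para_names):
    # para_names is computed once and reused everywhere, so the n-th index
    # in `position` always refers to the same parameter, regardless of
    # whether the dict iterates in insertion order (Python >= 3.7) or not.
    return {
        name: search_space[name][idx] for name, idx in zip(para_names, position)
    }


search_space = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1]}
para_names = sorted(search_space)  # deterministic order: learning_rate, max_depth
para = position2para([1, 2], search_space, para_names)
```

Sorting is just one way to get a stable order; storing `list(search_space)` once at setup time and never re-deriving it would work as well.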

Errors when there is only one option for a hyperparameter

Describe the bug
Sometimes, while trying out different combinations of hyperparameters, I find it useful to fix a hyperparameter to a specific value (e.g. once I have determined that one value is better than all others). However, doing this by simply providing the search_space with a list of length 1 throws a ValueError. I could fix the values downstream, but I lose some flexibility by doing so.

This issue has been observed for several different optimizers, including TreeStructuredParzenEstimators, BayesianOptimizer and DecisionTreeOptimizer, but not for others (HillClimbingOptimizer, ParticleSwarmOptimizer).

Code to reproduce the behavior

from hyperactive import Hyperactive, TreeStructuredParzenEstimators

search_space = {   
    "param_1": [1],
    "param_2": [0.01, 0.02, 0.03, 0.04],
}

def my_func(optimizer):
    return optimizer["param_2"]

hyper = Hyperactive()
optimizer = TreeStructuredParzenEstimators()
n_iter = 20

hyper.add_search(
    objective_function=my_func,
    search_space=search_space,
    optimizer=optimizer,
    n_iter=n_iter,
)

hyper.run()

Error message from command line
ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.

System information:

  • OS Platform and Distribution
    WSL (Ubuntu 18.04) on Windows 10

  • Python version
    3.7

  • Hyperactive version
    3.0.4

Additional context
Full error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/hyperactive/hyperactive.py", line 199, in run
    self.results_list = run_search(self.process_infos, self.distribution)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/hyperactive/run_search.py", line 42, in run_search
    results_list = single_process(_process_, process_infos)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/hyperactive/distribution.py", line 10, in single_process
    results = [process_func(**search_processes_infos[0])]
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/hyperactive/process.py", line 34, in _process_
    nth_process=nth_process,
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/hyperactive/optimizers.py", line 160, in search
    nth_process,
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/search.py", line 146, in search
    self._iteration(nth_iter)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/times_tracker.py", line 27, in wrapper
    res = func(self, *args, **kwargs)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/search.py", line 65, in _iteration
    pos_new = self.iterate()
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/base_optimizer.py", line 36, in wrapper
    pos = func(self, *args, **kwargs)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/sequence_model/smbo.py", line 67, in wrapper
    pos = func(self, *args, **kwargs)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/base_optimizer.py", line 47, in wrapper
    return func(self, *args, **kwargs)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/sequence_model/tree_structured_parzen_estimators.py", line 82, in iterate
    return self.propose_location()
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/sequence_model/tree_structured_parzen_estimators.py", line 70, in propose_location
    exp_imp = self.expected_improvement()
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/gradient_free_optimizers/optimizers/sequence_model/tree_structured_parzen_estimators.py", line 45, in expected_improvement
    logprob_best = self.kd_best.score_samples(self.all_pos_comb)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/sklearn/neighbors/_kde.py", line 190, in score_samples
    X = check_array(X, order='C', dtype=DTYPE)
  File "/home/wedge/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 586, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.
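Until the optimizers handle single-value dimensions, one workaround is to keep such parameters out of the search space and inject them inside the objective function. The helpers below (`split_search_space`, `ParaWithFixed`, `with_fixed`) are hypothetical; they only assume that the object passed to the objective function supports `para["name"]` lookups, as in the example above. This sidesteps the error rather than fixing it.

```python
def split_search_space(search_space):
    # Separate single-value dimensions from dimensions that need searching.
    fixed = {k: v[0] for k, v in search_space.items() if len(v) == 1}
    free = {k: v for k, v in search_space.items() if len(v) > 1}
    return fixed, free


class ParaWithFixed:
    """Hypothetical lookup wrapper: serves fixed values first, then the
    values chosen by the optimizer (anything supporting para[key])."""

    def __init__(self, para, fixed):
        self._para = para
        self._fixed = fixed

    def __getitem__(self, key):
        if key in self._fixed:
            return self._fixed[key]
        return self._para[key]


def with_fixed(objective, fixed):
    def wrapped(para):
        return objective(ParaWithFixed(para, fixed))

    wrapped.__name__ = getattr(objective, "__name__", "wrapped")
    return wrapped


search_space = {"param_1": [1], "param_2": [0.01, 0.02, 0.03, 0.04]}


def my_func(optimizer):
    return optimizer["param_1"] * optimizer["param_2"]


fixed, free = split_search_space(search_space)
objective = with_fixed(my_func, fixed)
# hyper.add_search(objective_function=objective, search_space=free, ...)
```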

Request TPE example

Hello, the link you provide to the TPE usage example gives me a 404 error.
Here's the URL:
https://simonblanke.github.io/Hyperactive/#/./optimizers/sequential_mehttps://github.com/SimonBlanke/Hyperactive/tree/master/examples/optimization_techniques/tpe.py

information displayed in the optimization process

Thank you for sharing your code and for your extraordinary development work. I tried to modify your code to print the best score at each iteration, which would let us plot n_iter vs. best score:
print('Iteration {}: Best Cost = {}'.format(best_iter, best_score))
But I could not get it to work.
Could you help me handle this issue? Thank you for your help.
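One way to collect the data for an n_iter vs. best-score plot, sketched without relying on any Hyperactive internals, is to wrap the objective function so that it records the best score after each evaluation. `track_best` is a hypothetical helper. It assumes higher scores are better, it only counts evaluations that actually call the objective function (positions served from memory are skipped), and with `n_jobs > 1` each process would keep its own history.

```python
import functools


def track_best(objective):
    """Wrap an objective function so every call records and prints the best
    score seen so far (assuming higher scores are better)."""
    history = []  # list of (iteration, best score so far)

    @functools.wraps(objective)
    def wrapped(para):
        score = objective(para)
        best = score if not history else max(score, history[-1][1])
        history.append((len(history), best))
        print("Iteration {}: Best score = {}".format(len(history) - 1, best))
        return score

    wrapped.history = history
    return wrapped


def model(para):
    return para["x"]


tracked = track_best(model)
for x in [1, 3, 2]:
    tracked({"x": x})
# tracked.history == [(0, 1), (1, 3), (2, 3)]
```

After a run, `tracked.history` can be unpacked into two lists and passed to any plotting library.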

Bug when using memory_warm_start with non-numerical search space

There is a bug in v3.0.2 when using memory_warm_start with a non-numerical search space. It can be reproduced with the following example:

import pandas as pd
from hyperactive import Hyperactive

def dummy_function(opt):
    return 1

def func1():
    pass

search_space = {
    "obj1": [func1],
}

mem_df = pd.DataFrame([[func1, 1], [func1, 1], [func1, 1]], columns=["obj1", "score"])

hyper = Hyperactive()
hyper.add_search(dummy_function, search_space, n_iter=15, memory_warm_start=mem_df)
hyper.run()

Pickling issue with multiprocessing

Describe the bug
Setting n_jobs to more than 1 crashes the code because the arguments cannot be pickled.

Code to reproduce the behavior

import numpy as np
from hyperactive import Hyperactive

space = dict(
    alpha=np.linspace(0, 1),
    beta=np.linspace(0, 1),
    start=[-0.95],
    end=[0.95],
)
points = np.random.rand(10)


def model(opt):
    rv = lambda x: x
    return np.sum(rv(points))


hyper = Hyperactive()

hyper.add_search(model, space, n_iter=50, n_jobs=2)
hyper.run()

Error message from command line


TypeError                                 Traceback (most recent call last)
<ipython-input-6-53d227ee7377> in <module>
     12 
     13 hyper.add_search(model, space, n_iter=50, n_jobs=2, )
---> 14 hyper.run()

~/anaconda3/lib/python3.8/site-packages/hyperactive/hyperactive.py in run(self, max_time)
    199             self.process_infos[nth_process]["max_time"] = max_time
    200 
--> 201         self.results_list = run_search(self.process_infos, self.distribution)

~/anaconda3/lib/python3.8/site-packages/hyperactive/run_search.py in run_search(search_processes_infos, distribution)
     45             distribution
     46         )
---> 47         results_list = distribution(process_func, process_infos, **dist_paras)
     48 
     49     return results_list

~/anaconda3/lib/python3.8/site-packages/hyperactive/distribution.py in multiprocessing_wrapper(process_func, search_processes_paras, **kwargs)
     16     n_jobs = len(search_processes_paras)
     17 
---> 18     pool = Pool(n_jobs, **kwargs)
     19     results = pool.map(process_func, search_processes_paras)
     20 

~/anaconda3/lib/python3.8/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
    117         '''Returns a process pool object'''
    118         from .pool import Pool
--> 119         return Pool(processes, initializer, initargs, maxtasksperchild,
    120                     context=self.get_context())
    121 

~/anaconda3/lib/python3.8/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
    210         self._processes = processes
    211         try:
--> 212             self._repopulate_pool()
    213         except Exception:
    214             for p in self._pool:

~/anaconda3/lib/python3.8/multiprocessing/pool.py in _repopulate_pool(self)
    301 
    302     def _repopulate_pool(self):
--> 303         return self._repopulate_pool_static(self._ctx, self.Process,
    304                                             self._processes,
    305                                             self._pool, self._inqueue,

~/anaconda3/lib/python3.8/multiprocessing/pool.py in _repopulate_pool_static(ctx, Process, processes, pool, inqueue, outqueue, initializer, initargs, maxtasksperchild, wrap_exception)
    324             w.name = w.name.replace('Process', 'PoolWorker')
    325             w.daemon = True
--> 326             w.start()
    327             pool.append(w)
    328             util.debug('added worker')

~/anaconda3/lib/python3.8/multiprocessing/process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

~/anaconda3/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    282         def _Popen(process_obj):
    283             from .popen_spawn_posix import Popen
--> 284             return Popen(process_obj)
    285 
    286     class ForkServerProcess(process.BaseProcess):

~/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py in __init__(self, process_obj)
     30     def __init__(self, process_obj):
     31         self._fds = []
---> 32         super().__init__(process_obj)
     33 
     34     def duplicate_for_child(self, fd):

~/anaconda3/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     20 
     21     def duplicate_for_child(self, fd):

~/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
     45         try:
     46             reduction.dump(prep_data, fp)
---> 47             reduction.dump(process_obj, fp)
     48         finally:
     49             set_spawning_popen(None)

~/anaconda3/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

TypeError: cannot pickle '_thread.RLock' object

System information:

  • OS Platform and Distribution: MacOS 10.14.6
  • Python version: 3.8.5 (sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0))
  • Hyperactive version: 3.0.5
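The traceback goes through `popen_spawn_posix`, i.e. the spawn start method (the default on macOS since Python 3.8), which pickles everything handed to the worker processes. A quick way to narrow down which object holds the `_thread.RLock` is to try pickling the candidates one by one. `find_unpicklable` is a hypothetical debugging helper, and the dictionary below is a stand-in, not Hyperactive's actual process arguments.

```python
import pickle
import threading


def find_unpicklable(objects):
    """Return the keys whose values cannot be pickled, with the error type."""
    failures = []
    for key, value in objects.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # TypeError, PicklingError, AttributeError, ...
            failures.append((key, type(exc).__name__))
    return failures


candidates = {
    "n_iter": 50,
    "search_space": {"start": [-0.95], "end": [0.95]},
    "lock": threading.RLock(),
}
print(find_unpicklable(candidates))  # [('lock', 'TypeError')]
```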

Out of memory when running Feature Selection (FS)

Hi there,

Thank you so much for your work on the Hyperactive library. It is very useful for our work on Blue Carbon ecosystem evaluation.

Could you have a look at the memory (RAM) consumption when running Feature Selection? I ran your FS example with my own data in Google Colab (12 GB RAM) but always ran out of memory.

Is there anything I should take note of here?

Many thanks,
Thang

Early stopping using max_score

I am trialling Hyperactive in conjunction with early stopping.

From my research in the code and docs of this package I can see that we can achieve early stopping by passing parameters along with the name of the chosen gradient free optimizer, like below (?):

optimizer = {"Bayesian": {"max_score": 0.9}}

However, the code below does not stop early and continues until all n_iter iterations are completed, which is not the behaviour I expected. Could you let me know what I am doing wrong in trying to get early stopping to work?

Full example:

from hyperactive import Hyperactive
import time
import numpy as np

def my_model(para, X, y):
    time.sleep(3)
    return 0.9
  
search_config = {
    my_model: {'n_estimators': range(10, 200, 10)}
}

opt = Hyperactive(np.asarray([]), np.asarray([]), memory="short")
opt.search(search_config, n_jobs=1, n_iter=8, optimizer = {"Bayesian": {"max_score": 0.1}})
Set random start position
Thread 0 -> my_model: 100%|██████████| 8/8 [00:18<00:00,  2.25s/it, best_score=0.9, best_since_iter=0] 
best para = {'n_estimators': 180}
score     = 0.9 

Thanks
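For reference, the behaviour this report expects from `max_score` can be sketched as a plain random-search loop that stops once the best score reaches the threshold. This illustrates the concept only and is not Hyperactive's implementation; note that in Hyperactive 3.x, `max_score` appears as an argument of `add_search` (see the `add_search` signature in the ProgressBoard traceback above) rather than inside the optimizer dictionary used by this older API. `search_with_max_score` is a hypothetical name.

```python
import random


def search_with_max_score(objective, search_space, n_iter, max_score=None, seed=0):
    """Random search that stops as soon as the best score reaches max_score."""
    rng = random.Random(seed)
    best_para, best_score = None, float("-inf")
    n_evaluated = 0
    for _ in range(n_iter):
        para = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(para)
        n_evaluated += 1
        if score > best_score:
            best_para, best_score = para, score
        if max_score is not None and best_score >= max_score:
            break  # early stop: the target score has been reached
    return best_para, best_score, n_evaluated


def my_model(para):
    return 0.9


search_space = {"n_estimators": list(range(10, 200, 10))}
best_para, best_score, n_evaluated = search_with_max_score(
    my_model, search_space, n_iter=8, max_score=0.1
)
```

With `max_score=0.1` and a constant score of 0.9, the loop stops after the first evaluation instead of running all 8 iterations.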

Particle Swarm Optimizer

In the Particle Swarm Optimizer, the parameters we can set are the following:
optimizer = ParticleSwarmOptimizer(inertia=0.4, cognitive_weight=0.7, social_weight=0.7, temp_weight=0.3, rand_rest_p=0.05)
But in PSO the number of particles is a very important parameter to consider. How can we set the number of particles? We can't see a parameter for it in the function.
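To show where the number of particles enters the algorithm, here is a minimal, self-contained particle swarm sketch that maximizes a continuous function. It reuses the names `inertia`, `cognitive_weight` and `social_weight` from the call above, omits `temp_weight` and `rand_rest_p`, and adds a hypothetical `n_particles` argument. It is not Gradient-Free-Optimizers' implementation; the constructor argument the library actually uses for the swarm size should be checked in its documentation.

```python
import random


def particle_swarm(objective, bounds, n_particles=10, n_iter=50, inertia=0.4,
                   cognitive_weight=0.7, social_weight=0.7, seed=0):
    """Maximize `objective` over a box given by `bounds` [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # The swarm size: one position, velocity and personal best per particle.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_score = [objective(p) for p in pos]
    g_idx = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = p_best[g_idx][:], p_score[g_idx]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    inertia * vel[i][d]
                    + cognitive_weight * r1 * (p_best[i][d] - pos[i][d])
                    + social_weight * r2 * (g_best[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            score = objective(pos[i])
            if score > p_score[i]:
                p_best[i], p_score[i] = pos[i][:], score
                if score > g_score:
                    g_best, g_score = pos[i][:], score
    return g_best, g_score


def objective(p):
    # Peak value 0 at (1, -2).
    return -((p[0] - 1) ** 2) - (p[1] + 2) ** 2


best_pos, best_score = particle_swarm(
    objective, [(-5, 5), (-5, 5)], n_particles=20, n_iter=100
)
```

More particles cover the space better per iteration but cost `n_particles` objective evaluations each round, which matters when every evaluation trains a model.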

Question about "example_cnn.py"

Thank you for making a nice tool.

I have a question about "example_cnn.py".
The code contains the following part:

Optimizer = ParticleSwarmOptimizer(
    search_config, n_iter=3, metric="mean_squared_error", verbosity=0
)

I want to use "accuracy_score" as the metric, but it fails with the following error:

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

This code is an MNIST classification task, so why is MSE used?
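The error message suggests that `accuracy_score` received one-hot labels (a "multilabel-indicator" target) together with the network's continuous class probabilities ("continuous-multioutput"). A sketch of converting both to class indices with `argmax` before scoring, using made-up arrays rather than the actual MNIST outputs, and without assuming anything about how the old Hyperactive API passed the metric:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# One-hot ground truth and softmax-style predictions for three samples.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.5, 0.2, 0.3]])

try:
    accuracy_score(y_true, y_pred)
except ValueError as exc:
    print(exc)  # mix of multilabel-indicator and continuous-multioutput targets

# Compare predicted class indices instead of raw probabilities.
acc = accuracy_score(y_true.argmax(axis=1), y_pred.argmax(axis=1))
print(acc)
```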

How to handle exceptions / pruned trials and calculating n_iter from search space?

Thank you for this awesome Python library. I'm currently working on integrating it into the Jesse project, a crypto backtesting framework. I use Hyperactive to optimize the parameters of the trading strategies. https://github.com/jesse-ai/jesse/blob/hyperactive/jesse/modes/optimize_hyperactive_mode/__init__.py

The score is calculated from a ratio that describes how well the strategy performs, combined with a custom rate that punishes parameter sets leading to too few or too many trades compared to a given optimum. Right now I set the score of backtests with too few trades (so few that the ratio doesn't mean anything) and of backtests that threw exceptions (this sometimes happens when parameters don't work together) to a very small value (0.0001).
Is this the right way? Wouldn't this punish the whole parameter set although it might be only one parameter that is "bad"?

How should n_iter be set dynamically? The search space of a strategy optimization can vary from one parameter with only a handful of values to 20 parameters with thousands of values and far more possible combinations. My idea was to calculate the number of possible combinations in the search space and use it as a basis for choosing a good n_iter value. Any ideas or tips on how to do this? This logical approach would be better than letting the user guess a good value. I did some research, but it seems n_iter is often chosen by gut feeling rather than by logic.

Thank you for your help.
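Counting the combinations is straightforward: it is the product of the number of values in each dimension. How to turn that count into n_iter is a judgment call. The heuristic below (a fraction of the space, clipped to a minimum and a maximum, and never more than the number of combinations) is purely illustrative; `suggest_n_iter`, `fraction`, `min_iter` and `max_iter` are hypothetical placeholders, not recommendations from the Hyperactive documentation. `math.prod` requires Python 3.8 or later.

```python
import math


def n_combinations(search_space):
    # Size of the grid: product of the number of values in each dimension.
    return math.prod(len(values) for values in search_space.values())


def suggest_n_iter(search_space, fraction=0.1, min_iter=20, max_iter=1000):
    # Hypothetical heuristic: evaluate a fraction of the grid, clipped to
    # [min_iter, max_iter], but never more than the grid itself.
    total = n_combinations(search_space)
    n_iter = math.ceil(fraction * total)
    n_iter = max(min_iter, min(max_iter, n_iter))
    return min(total, n_iter)


small = {"fast_period": list(range(5, 20)), "slow_period": list(range(20, 60))}
print(n_combinations(small), suggest_n_iter(small))
```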

Dynamic inertia in ParticleSwarmOptimizer

Is there a possibility to dynamically change the inertia of the ParticleSwarmOptimizer?
I am following this article for optimizing CNN hyperparameters: https://www.sciencedirect.com/science/article/pii/S2210650221000249
It says that the inertia should change over the iterations according to a formula that was included as an image in the original issue.

Is there already a way to achieve that? Maybe by changing this parameter in the model() function? If not, could you implement some sort of a callback to provide a function which dynamically changes inertia?
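The formula from the article is not reproduced here, so as an illustration only: a widely used schedule decreases the inertia linearly from `w_max` to `w_min` over the run. `linear_inertia` is a hypothetical helper, and the 0.9 and 0.4 defaults are common literature values, not values taken from the article. Plugging such a schedule into ParticleSwarmOptimizer would require support from the library.

```python
def linear_inertia(iteration, n_iter, w_max=0.9, w_min=0.4):
    """Inertia that decreases linearly from w_max (first iteration) to
    w_min (last iteration)."""
    if n_iter <= 1:
        return w_max
    progress = iteration / (n_iter - 1)
    return w_max - (w_max - w_min) * progress


schedule = [linear_inertia(t, 11) for t in range(11)]
```

In a PSO velocity update, the constant inertia term would simply be replaced by `linear_inertia(t, n_iter)` at iteration `t`, so early iterations explore more and later ones exploit more.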

add siamese network example

Since Hyperactive allows a lot of flexibility in creating the objective function, the optimization of hyperparameters, cost function or structure of a siamese network should be possible.

add resnet example

Since Hyperactive is able to perform neural architecture search it would be interesting to have an example of the optimization of a residual neural network. It would then be possible to tune the skip-connections of the resnet by changing the number of skip-connections and/or the connection-positions in the neural network.

RecursionError: maximum recursion depth exceeded in comparison

When running the Progress Board for a long time (~1 hour) the following error occurs. I have not been able to reproduce this error in a shorter time. This error seems to originate from plotly but I suspect that the integration of the plotly-figure into the streamlit dashboard (which reruns ~1/second) triggers this.

File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/streamlit/script_runner.py", line 350, in _run_script
    exec(code, module.__dict__)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/hyperactive/dashboards/progress_board/run_streamlit.py", line 78, in <module>
    main()
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/hyperactive/dashboards/progress_board/run_streamlit.py", line 40, in main
    plotly_fig = backend.plotly(progress_data, progress_id)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/hyperactive/dashboards/progress_board/streamlit_backend.py", line 119, in plotly
    fig = px.parallel_coordinates(
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/express/_chart_types.py", line 1306, in parallel_coordinates
    return make_figure(args=locals(), constructor=go.Parcoords)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/express/_core.py", line 2121, in make_figure
    fig.update_layout(template=args["template"], overwrite=True)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 1403, in update_layout
    self.layout.update(dict1, overwrite=overwrite, **kwargs)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5082, in update
    BaseFigure._perform_update(self, kwargs, overwrite=overwrite)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 3891, in _perform_update
    plotly_obj[key] = val
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5821, in __setitem__
    super(BaseLayoutHierarchyType, self).__setitem__(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4811, in __setitem__
    self._set_compound_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5222, in _set_compound_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2747, in validate_coerce
    return super(BaseTemplateValidator, self).validate_coerce(
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2458, in validate_coerce
    v = self.data_class(v)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/graph_objs/layout/_template.py", line 319, in __init__
    self["data"] = _v
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4811, in __setitem__
    self._set_compound_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5222, in _set_compound_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2454, in validate_coerce
    v = self.data_class(v, skip_invalid=skip_invalid, _validate=_validate)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/graph_objs/layout/template/_data.py", line 1533, in __init__
    self["barpolar"] = _v
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4815, in __setitem__
    self._set_array_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5296, in _set_array_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2546, in validate_coerce
    res.append(self.data_class(v_el, skip_invalid=skip_invalid))
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/graph_objs/_barpolar.py", line 1840, in __init__
    self["marker"] = _v
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4811, in __setitem__
    self._set_compound_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5222, in _set_compound_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2454, in validate_coerce
    v = self.data_class(v, skip_invalid=skip_invalid, _validate=_validate)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/graph_objs/barpolar/_marker.py", line 1123, in __init__
    self["line"] = _v
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4811, in __setitem__
    self._set_compound_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5222, in _set_compound_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 2454, in validate_coerce
    v = self.data_class(v, skip_invalid=skip_invalid, _validate=_validate)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/graph_objs/barpolar/marker/_line.py", line 628, in __init__
    self["color"] = _v
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 4819, in __setitem__
    self._set_prop(prop, value)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/plotly/basedatatypes.py", line 5158, in _set_prop
    val = validator.validate_coerce(val)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 1367, in validate_coerce
    validated_v = self.vc_scalar(v)
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 1397, in vc_scalar
    return ColorValidator.perform_validate_coerce(
File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/_plotly_utils/basevalidators.py", line 1420, in perform_validate_coerce
    if isinstance(v, numbers.Number) and allow_number:
File "/home/simon/anaconda3/envs/dev/lib/python3.8/abc.py", line 98, in __instancecheck__
    return _abc_instancecheck(cls, instance)

Parameters: { early_stopping_rounds } might not be used

Hello there; I got a message saying that I should report this error when it pops up:

Parameters: { early_stopping_rounds } might not be used
This may not be accurate due to some parameters are only 
used in language bindings but passed down to XGBoost core. 
Or some parameters are not used but slip through this verification. 
Please open an issue if you find above cases.

Here is how I added this parameter to the model, since I want 20 stopping rounds:

import numpy as np
import xgboost as xgb

def model(opt):
    clf_xgb = xgb.XGBClassifier(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        learning_rate=opt["learning_rate"],
        objective="binary:logistic",
        # eta=0.4,
        subsample=0.5,
        base_score=np.mean(y_labels),
        eval_metric="logloss",
        missing=None,
        use_label_encoder=False,
        seed=42,
        early_stopping_rounds=20,
    )

System information:

  • OS: Ubuntu 20.04
  • Python 3.6
  • Hyperactive version: 3.0.5.1

ValueError: ndarray is not C-contiguous

Hello there!

In your keras_example, I have this error:

ValueError Traceback (most recent call last)
in ()
48
49 opt = Hyperactive(X_train, y_train)
---> 50 opt.search(search_config, n_iter=5)

13 frames
/usr/local/lib/python3.6/dist-packages/hyperactive/memory/util.py in get_hash(object)
14
15 def get_hash(object):
---> 16 return hashlib.sha1(object).hexdigest()
17
18

ValueError: ndarray is not C-contiguous

I'm using Google Colab.
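The traceback shows hashlib.sha1 receiving a NumPy array directly. hashlib only accepts C-contiguous buffers, and views such as transposes or column slices are not C-contiguous. A minimal workaround sketch, assuming the object being hashed is a NumPy array: copy it into C order with np.ascontiguousarray before hashing. The name get_hash mirrors the traceback, but this is not Hyperactive's actual code.

```python
import hashlib

import numpy as np


def get_hash(array):
    # hashlib needs a C-contiguous buffer; ascontiguousarray copies only when required
    return hashlib.sha1(np.ascontiguousarray(array)).hexdigest()


X = np.arange(12, dtype=np.float64).reshape(3, 4)
X_t = X.T  # transposed view: Fortran-ordered, not C-contiguous
print(X_t.flags["C_CONTIGUOUS"])  # False
print(get_hash(X_t))
```

Note that this hashes only the raw bytes, so two arrays with the same data but different shapes or dtypes interpreted identically would produce the same hash.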

Multiprocessing multiple add_searches

Hello, it wasn't entirely clear from the multiprocessing example: do multiple add_search calls run in parallel, or when would you increase n_jobs? For example, if I want to run two optimizers in parallel (say the repulsing hill climber and the Bayesian optimizer), how would I do that? Would it look like the code below?

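import numpy as np
from hyperactive import Hyperactive, RepulsingHillClimbingOptimizer, BayesianOptimizer

h = Hyperactive(["progress_bar", "print_results", "print_times"])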
search_space = {
    "exp": list(range(0, 5)),
    "slope": list(np.arange(0.001, 10, step=0.05)),
    "freq_mult": list(np.arange(1, 2.5, 0.005)),
    "clust": [5],
    "df": [ret_df],
    "finp": [finp],
    "asc": [False],
    "use_pca": [False],
    "last": [False],
    "disc_type": ["type"],
}
h.add_search(
    func_minl,
    search_space=search_space,
    n_iter=10,
    optimizer=RepulsingHillClimbingOptimizer(
        epsilon=0.05,
        distribution="normal",
        n_neighbours=3,
        rand_rest_p=0.03,
        repulsion_factor=3,
    ),
    n_jobs=1,
    max_score=None,
    initialize={
        "warm_start": [
            {
                "exp": 2,
                "slope": 5,
                "freq_mult": 1.5,
                "clust": 5,
                "df": ret_df,
                "finp": finp,
                "asc": False,
                "use_pca": False,
                "last": False,
                "disc_type": "type",
            }
        ]
    },
    early_stopping={"tol_rel": 1, "n_iter_no_change": 3},
    random_state=0,
    memory=True,
    memory_warm_start=None,
)

h.add_search(
    func_minl,
    search_space=search_space,
    n_iter=maxiter,
    optimizer=BayesianOptimizer(rand_rest_p=0.03, xi=0.03, warm_start_smbo=None),
    n_jobs=1,
    max_score=None,
    early_stopping={"tol_rel": 0.005, "n_iter_no_change": 10},
    random_state=0,
    memory=False,
    memory_warm_start=None,
)

h.run()

add multi gauss fit example

An interesting example for gradient-free optimization is fitting one or multiple Gaussian functions to data. The data for this example can be generated with numpy.
A "real world" example of this problem is fitting Gaussian functions to particle resonances in an energy spectrum, so this example should be very helpful for particle physicists.
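A minimal sketch of such an example, assuming two peaks and a Hyperactive-style objective (a dict of parameters in, a score out, higher is better). The parameter names, grid ranges and noise level are illustrative choices, and a plain random search over the grid stands in for whichever Hyperactive optimizer the final example would use.

```python
import numpy as np


def gauss(x, amp, mu, sigma):
    return amp * np.exp(-((x - mu) ** 2) / (2 * sigma**2))


rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 400)
# synthetic "energy spectrum": two peaks plus Gaussian noise
y = gauss(x, 1.0, -2.0, 1.0) + gauss(x, 0.5, 3.0, 0.7) + rng.normal(0, 0.02, x.size)


def objective(para):
    # negative sum of squared residuals, so a better fit gives a higher score
    model = gauss(x, para["amp1"], para["mu1"], para["sigma1"]) + gauss(
        x, para["amp2"], para["mu2"], para["sigma2"]
    )
    return -float(np.sum((y - model) ** 2))


amps = np.arange(0.1, 1.5, 0.1)
mus = np.arange(-5.0, 5.0, 0.25)
sigmas = np.arange(0.3, 2.0, 0.1)
search_space = {
    "amp1": amps, "mu1": mus, "sigma1": sigmas,
    "amp2": amps, "mu2": mus, "sigma2": sigmas,
}

# plain random search as a stand-in for a gradient-free optimizer
best_para, best_score = None, -np.inf
for _ in range(3000):
    para = {name: rng.choice(values) for name, values in search_space.items()}
    score = objective(para)
    if score > best_score:
        best_para, best_score = para, score
```

With Hyperactive, the same objective and search_space (converted to lists) would be passed to add_search instead of the hand-written loop.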

API for v4.0.0 of Hyperactive

This issue tracks the progress of the final API design for Hyperactive.

There are some important features I want to implement for Hyperactive:

  • A reliable way for Hyperactive to store search data.
  • Support for storing Python functions in the search data

This would open up the following possibilities:
Hyperactive would store the search data in the background during/after the search (based on an experiment ID and a model ID provided by the user). When the user starts a new search, the search data is read in automatically to save computation time.

This is only possible if the search space contains Python objects that have a unique name that never changes. Python functions have such a unique name: the function name could be saved in place of the function in the search dataframe, and the function itself saved via "dill" in a separate file. When the search data is read in, Hyperactive could use the function name to identify the function and read the dill file to know what the function does.

So in the future, users could put ints, floats, strings, and functions in the search space. Any other Python object could be wrapped in a function like this:

def list1():
  return [1, 1, 0]

def list2():
  return [1, 1, 1]

search_space = {"list": [list1, list2]}

This would open up more possibilities for convenient plotting of the search data e.g. via plotly or a streamlit dashboard.

I think those features are very important for the future of Hyperactive, because it has to distinguish itself more from Gradient-Free-Optimizers and specialize more in optimization of computationally expensive objective functions.
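A small sketch of the name-based round trip described above, with hypothetical helpers to_storable and from_storable (not part of Hyperactive): function values are replaced by their __name__ before a row of search data is stored, and mapped back to the function objects of the search space when it is read in. It assumes every function name in a dimension is unique; the dill-based storage of the function bodies is left out.

```python
def list1():
    return [1, 1, 0]


def list2():
    return [1, 1, 1]


search_space = {"list": [list1, list2], "x": [0.1, 0.2]}


def to_storable(para):
    """Replace function values by their __name__ so the row fits into a dataframe/CSV."""
    return {k: (v.__name__ if callable(v) else v) for k, v in para.items()}


def from_storable(row, search_space):
    """Map stored function names back to the function objects in the search space."""
    restored = {}
    for key, value in row.items():
        by_name = {f.__name__: f for f in search_space[key] if callable(f)}
        # only strings can be function names; other values pass through unchanged
        restored[key] = by_name.get(value, value) if isinstance(value, str) else value
    return restored


row = to_storable({"list": list2, "x": 0.2})
print(row)  # {'list': 'list2', 'x': 0.2}
print(from_storable(row, search_space)["list"]())  # [1, 1, 1]
```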

Feature: Passing extra parameters to the optimization function

Is your feature request related to a problem? Please describe.
There are situations in which the optimization function is governed by external parameters. For example, if the optimization score is calculated as s = alpha * x + beta, where alpha and beta depend on the initial conditions, it would be handy to pass them either as arguments of the optimization function or through the opt input variable.

Describe alternatives you've considered
Right now this can be done by creating a lambda function, depending on the initial condition, that wraps the Hyperactive optimization call, e.g. optim_func = lambda opt: true_optim_func(opt, alpha=1, beta=external_var), and changing the lambda dynamically.
This, however, does not work with n_jobs != 1, because mp.Pool cannot serialize the lambda function.

Describe the solution you'd like

  • Use multiprocess, dill, pathos or the like to allow serialization of the method and allowing to choose between a process vs thread model.
  • Allow passing additional arguments to the optimization function.

Additional context
This would even allow extra functionality like optimizing symbolic functions that are externally referenced.

edit: assessing if this type of alternative works.
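One alternative that may be worth assessing: functools.partial bound to a module-level function can be pickled by the standard pickle module, whereas a lambda cannot. The sketch below only demonstrates the serialization difference. Whether Hyperactive accepts a partial as objective function is an assumption to check; for example, a partial has no __name__ attribute, which Hyperactive might use to identify the objective.

```python
import functools
import pickle


def true_optim_func(opt, alpha, beta):
    return alpha * opt["x"] + beta


# bind the external parameters without a lambda
optim_func = functools.partial(true_optim_func, alpha=1, beta=2.5)

# a partial of a module-level function survives a pickle round trip
restored = pickle.loads(pickle.dumps(optim_func))
print(restored({"x": 2}))  # 4.5

lambda_func = lambda opt: true_optim_func(opt, alpha=1, beta=2.5)
try:
    pickle.dumps(lambda_func)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_picklable = False  # lambdas cannot be looked up by name, so pickle refuses them
```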

New feature: Optimization Strategies

I would like to introduce a new feature to Hyperactive to chain together multiple optimization algorithms. This will be called an Optimization Strategy in the future.

The API for this feature could look like this:

opt_strat = OptimizationStrategy()
opt_strat.add_optimizer(RandomSearchOptimizer(), duration=0.5)
opt_strat.add_optimizer(HillClimbingOptimizer(), duration=0.5)

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=20, optimizer=opt_strat)
hyper.run()

The duration will be the fraction of n_iter passed to add_search(...). Each optimizer will automatically pass the memory to the next one.

This feature-idea is in an early stage and might change in the future.
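OptimizationStrategy and add_optimizer are proposals in this issue, not an existing API. The plain-Python sketch below uses hypothetical names (split_iterations, run_strategy, random_step, hill_climb_step) to show one way the two stated rules could work: each duration is a fraction of n_iter, and a shared memory of evaluated positions is handed from one optimizer to the next.

```python
import random


def split_iterations(n_iter, durations):
    """Convert duration fractions into iteration counts that sum to n_iter."""
    counts = [int(round(n_iter * d)) for d in durations[:-1]]
    counts.append(n_iter - sum(counts))  # the last optimizer takes the remainder
    return counts


def random_step(space, memory, rng):
    return rng.choice(space)


def hill_climb_step(space, memory, rng):
    best = max(memory, key=memory.get)  # best position found by any earlier stage
    i = space.index(best) + rng.choice([-1, 1])
    return space[min(max(i, 0), len(space) - 1)]


def run_strategy(objective, space, stages, n_iter, seed=0):
    rng = random.Random(seed)
    memory = {}  # position -> score, shared by all stages
    counts = split_iterations(n_iter, [duration for _, duration in stages])
    for (step, _), n in zip(stages, counts):
        for _ in range(n):
            position = step(space, memory, rng)
            if position not in memory:  # reuse scores from earlier stages
                memory[position] = objective(position)
    return memory


stages = [(random_step, 0.5), (hill_climb_step, 0.5)]
memory = run_strategy(lambda x: -((x - 3) ** 2), list(range(-10, 11)), stages, n_iter=20)
print(max(memory, key=memory.get))
```

Taking the remainder in the last stage keeps the total equal to n_iter even when the fractions do not divide it evenly.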

Change Optimization parameters at runtime

This adds a way to change the parameters of the optimization algorithms during runtime (e.g. epsilon from the hill-climbing optimizer). My idea is to enable this within the objective function. This way the user can change parameters based on conditions/data each time the objective function is called (evaluated). As seen in issue #49 there are already some parameters that can be changed via the objective-function argument, but this is not standardized, tested or documented, yet.
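To make the idea concrete, here is a toy sketch. ToyHillClimber is hypothetical (not a Hyperactive or Gradient-Free-Optimizers class): its epsilon is a plain attribute, and the objective function receives the optimizer and may change epsilon between evaluations. In Hyperactive, such access would presumably go through the objective function's opt argument, which is an assumption here.

```python
import random


class ToyHillClimber:
    """Hypothetical minimal 1-D hill climber whose parameters are plain attributes."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def search(self, objective, x0, n_iter):
        best_x, best_score = x0, objective(x0, self)
        for _ in range(n_iter):
            x = best_x + self.rng.gauss(0, self.epsilon)
            score = objective(x, self)  # the objective may modify self.epsilon here
            if score > best_score:
                best_x, best_score = x, score
        return best_x, best_score


def objective(x, optimizer):
    score = -((x - 2.0) ** 2)
    if score > -0.1:
        optimizer.epsilon = 0.01  # close to the optimum: switch to smaller steps
    return score


opt = ToyHillClimber(epsilon=0.5)
best_x, best_score = opt.search(objective, x0=0.0, n_iter=500)
print(round(best_x, 2), opt.epsilon)
```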

Population size

Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?

Describe the bug

Code to reproduce the behavior

Error message from command line

System information:

  • OS Platform and Distribution
  • Python version
  • Hyperactive version

Additional context
Dear Simon,
I have a question regarding the population size of PSO:
search_space,
initialize={"grid": 4, "random": 2, "vertices": 4},
population=10,
inertia=0.5,
cognitive_weight=0.5,
social_weight=0.5,
temp_weight=0.2,
rand_rest_p=0.03,
)
I cannot increase the population size: it is automatically reduced to 10 when I set it to 20 or greater. I tried to modify the file search.py:

if random_state is None:
    random_state = np.random.randint(0, high=2 ** 32 - 2, dtype=np.int64)

but it did not change anything. Could you help me debug this issue?
I would like to investigate the effect of the population size on the runtime and on the cost function of my problem.
Thank you for your help

ValueError: assignment destination is read-only

Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?
No
Describe the bug

When joblib is used as the parallel distributor and the number or size of the processes gets too big, the following error is thrown:
ValueError: assignment destination is read-only

This issue is described here

Code to reproduce the behavior
scikit-learn/scikit-learn#5956

Error message from command line
ValueError: assignment destination is read-only

System information:

  • OS Platform and Distribution: Windows 10
  • Python version: 3.8
  • Hyperactive version 3.3.3

Additional context
The issue is not actually with Hyperactive; the fix for the ValueError is to add max_nbytes='50M' to the Parallel instantiation. However, when instantiating Hyperactive there is no way to pass this argument through to joblib without changing the underlying Hyperactive package.
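Background and a possible user-side workaround: joblib memory-maps arrays larger than max_nbytes (default "1M") and, with the default mmap_mode="r", hands them to the workers read-only, so any in-place write raises this error. Until the argument can be passed through, one option is to make a private writable copy inside the objective function. The sketch simulates joblib's read-only array with setflags; objective_body is a hypothetical name for whatever code mutates the data.

```python
import numpy as np


def objective_body(X):
    # work on a private, writable copy instead of the shared (possibly read-only) array
    X = np.array(X, copy=True)
    X -= X.mean(axis=0)  # in-place operation is now safe
    return float(np.abs(X).sum())


X_shared = np.arange(12, dtype=float).reshape(4, 3)
X_shared.setflags(write=False)  # simulate a read-only memory-mapped array from joblib

try:
    X_shared[0, 0] = -1.0
    error_message = ""
except ValueError as err:
    error_message = str(err)  # "assignment destination is read-only"

print(error_message)
print(objective_body(X_shared))  # 36.0
```

The copy costs memory per worker, which is exactly what memory-mapping tries to avoid, so it is only a stopgap for arrays that fit comfortably in RAM.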

warm_start_smbo in sequence model based optimizers not working properly

The argument "warm_start_smbo" is not working properly in the current version (v3.0.3) of Hyperactive. The reason is a faulty conversion from the Hyperactive search space to the Gradient-Free-Optimizers search space.

The "warm_start_smbo" argument is used in the following optimizers:

  • BayesianOptimizer
  • TreeStructuredParzenEstimators
  • DecisionTreeOptimizer

Data type

Hello, and thanks for this great library.
With CNNs, we usually use ImageDataGenerator, which reads data and labels from a DataFrame, so we do not have separate X and y; the same holds for the training, validation, and test sets. Does this library support that, or do we need separate data and labels as in the examples? Support for the DataFrameIterator type (the output of ImageDataGenerator) would be amazing.

TypeError: unsupported operand type(s) for -: 'function' and 'function'

Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?
Not in the FAQs

Describe the bug

TypeError: unsupported operand type(s) for -: 'function' and 'function'

Code to reproduce the behavior
from hyperactive import Hyperactive
from hyperactive import RepulsingHillClimbingOptimizer, RandomAnnealingOptimizer
import numpy as np

class Parameters:
    def __init__(self):
        self.x = 5

finp = Parameters()

def ret_df():
    return df  # df: a pandas DataFrame defined elsewhere

def func_minl(opts):
    return opts["slope"] + opts["exp"]

h = Hyperactive(["progress_bar", "print_results", "print_times"])
search_space = {
    "exp": list(range(0, 5)),
    "slope": list(np.arange(0.001, 10, step=0.05)),
    "freq_mult": list(np.arange(1, 2.5, 0.005)),
    "clust": [5],
    "df": [ret_df],
    "finp": [finp],
    "asc": [False],
    "use_pca": [False],
    "last": [False],
    "disc_type": ["type"],
}
h.add_search(
    func_minl,
    search_space=search_space,
    n_iter=10,
    optimizer=RepulsingHillClimbingOptimizer(
        epsilon=0.05,
        distribution="normal",
        n_neighbours=3,
        rand_rest_p=0.03,
        repulsion_factor=3,
    ),
    n_jobs=1,
    max_score=None,
    initialize={
        "warm_start": [
            {
                "exp": 2,
                "slope": 5,
                "freq_mult": 1.5,
                "clust": 5,
                "df": ret_df,
                "finp": finp,
                "asc": False,
                "use_pca": False,
                "last": False,
                "disc_type": "type",
            }
        ]
    },
    early_stopping={"tol_rel": 1, "n_iter_no_change": 3},
    random_state=0,
    memory=True,
    memory_warm_start=None,
)

Error message from command line
When adding a dataframe as a parameter in the search space by wrapping it in a function, as mentioned in the documentation, I receive the following error:
--> 973 h.add_search(func_minl, search_space = search_space, n_iter = maxiter, optimizer = RepulsingHillClimbingOptimizer(epsilon=0.05,
974 distribution="normal",
975 n_neighbours=3,

~\Anaconda3\lib\site-packages\hyperactive\hyperactive.py in add_search(self, objective_function, search_space, n_iter, search_id, optimizer, n_jobs, initialize, max_score, early_stopping, random_state, memory, memory_warm_start, progress_board)
148 self.check_list(search_space)
149
--> 150 optimizer.init(search_space, initialize, progress_collector)
151
152 self._add_search_processes(

~\Anaconda3\lib\site-packages\hyperactive\optimizers\gfo_wrapper.py in init(self, search_space, initialize, progress_collector)
76 self.trafo = HyperGradientTrafo(search_space)
77
---> 78 initialize = self.trafo.trafo_initialize(initialize)
79 search_space_positions = self.trafo.search_space_positions
80

~\Anaconda3\lib\site-packages\hyperactive\optimizers\hyper_gradient_trafo.py in trafo_initialize(self, initialize)
113 for warm_start_ in warm_start:
114 value = self.para2value(warm_start_)
--> 115 position = self.value2position(value)
116 pos_para = self.value2para(position)
117

~\Anaconda3\lib\site-packages\hyperactive\optimizers\hyper_gradient_trafo.py in value2position(self, value)
16 position = []
17 for n, space_dim in enumerate(self.search_space_values):
---> 18 pos = np.abs(value[n] - np.array(space_dim)).argmin()
19 position.append(int(pos))
20
TypeError: unsupported operand type(s) for -: 'function' and 'function'

System information:

  • OS Platform and Distribution: Windows 10
  • Python version: 3.8.8
  • Hyperactive version: 3.3.2

Additional context
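The traceback points at value2position, which finds a value's index by subtracting it from every entry of the dimension; that only works for numbers. Below is a sketch of a more tolerant lookup, assuming the same inputs as in the traceback (a list of values and a list of per-dimension value lists). Numeric values still snap to the nearest entry, while functions, strings and other objects are matched by identity or equality. This is a hypothetical rewrite for illustration, not the fix that shipped in Hyperactive.

```python
import numpy as np


def _matches(item, value):
    # identity first (functions, class instances), then equality (strings, bools, ...)
    if item is value:
        return True
    try:
        return bool(item == value)
    except (TypeError, ValueError):  # e.g. comparing DataFrames is ambiguous
        return False


def value2position(value, search_space_values):
    position = []
    for n, space_dim in enumerate(search_space_values):
        v = value[n]
        if isinstance(v, (int, float, np.number)) and not isinstance(v, bool):
            # numeric: snap to the nearest value in this dimension
            pos = np.abs(v - np.array(space_dim)).argmin()
        else:
            # non-numeric: exact match, no arithmetic
            pos = next((i for i, item in enumerate(space_dim) if _matches(item, v)), None)
            if pos is None:
                raise ValueError(f"{v!r} is not in search-space dimension {n}")
        position.append(int(pos))
    return position
```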

Handle np.inf or -np.inf returned by objective_function/model

If objective_function returns -np.inf, some optimizers cannot handle it.

For example:

EvolutionStrategyOptimizer and ParallelTemperingOptimizer throw an exception:
AttributeError: 'NoneType' object has no attribute 'shape'

DecisionTreeOptimizer throws an exception:
MemoryError: Unable to allocate 1.81 PiB for an array with shape (9857, 1620, 1620, 9857) and data type int64

But RandomSearchOptimizer works well

In my case, the objective function returns -np.inf when there is no solution for the selected set of parameters (e.g. division by zero). As a workaround, the function now returns a large negative number instead.
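The workaround can be packaged as a wrapper so the objective itself stays unchanged. A minimal sketch, assuming any finite penalty below all real scores is acceptable (the -1e10 default is an arbitrary choice); finite_score is a hypothetical helper, and functools.wraps keeps the original __name__ in case the objective is looked up by name later.

```python
import functools
import math


def finite_score(objective, penalty=-1e10):
    """Wrap an objective so that inf, -inf and NaN scores become a large finite penalty."""

    @functools.wraps(objective)
    def wrapper(para):
        score = objective(para)
        return score if math.isfinite(score) else penalty

    return wrapper


def model(para):
    if para["x"] == 0:
        return -math.inf  # "no solution" for this parameter set
    return -1 / abs(para["x"])


safe_model = finite_score(model)
print(safe_model({"x": 0}), safe_model({"x": 2}))  # -10000000000.0 -0.5
```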

Bug when calling .results(...) in jupyter-notebook

This bug occurs when you access the search data via the .results(objective_function) method within a Jupyter notebook. It seems to happen because Jupyter notebooks store Python functions in an unusual way after they are defined.
As a result, Hyperactive sometimes no longer finds the objective_function because it is stored at a different place.

I can probably fix this error by comparing the passed objective_function with the known one by their name (via __name__).
Before that I will try to find a reliable way to reproduce this error.
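A sketch of the proposed lookup, with a hypothetical helper find_objective (not Hyperactive's code): identity is tried first, and __name__ is only the fallback. Redefining a function, as happens when a notebook cell is re-run, creates a new object with the same name, which is the case the fallback covers. The trade-off is that two different functions sharing a name would be confused.

```python
def find_objective(objective_function, registered):
    """Return the registered function matching objective_function by identity, else by __name__."""
    for func in registered:
        if func is objective_function:
            return func
    for func in registered:
        if func.__name__ == objective_function.__name__:
            return func
    raise KeyError(f"no search registered for {objective_function.__name__!r}")
```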

Metadata issue under Windows

Hi, thank you for making this amazing package!

When I try to run the demo (unchanged) on Windows under conda, using the latest version from PyPI, I run into the following error. Could you tell me what is going wrong, or whether Windows is supported at all?

Error:

(hyperactive) λ python test.py
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    opt.search(search_config, n_iter=20)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\hyperactive_api.py", line 40, in search
    core.run()
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\hyperactive_core.py", line 20, in run
    dist.dist(Search, self._main_args_, self._opt_args_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\distribution.py", line 31, in dist
    self.dist_default(search_class, _main_args_, _opt_args_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\distribution.py", line 36, in dist_default
    _optimizer_.search()
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\search.py", line 73, in search
    self._run_job(nth_process)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\search.py", line 96, in _run_job
    _cand_, _p_ = self._search(nth_process)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\search.py", line 122, in _search
    _cand_ = self._initialize_search(self._main_args_, nth_process, self._info_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\search.py", line 146, in _initialize_search
    _cand_ = Candidate(nth_process, _main_args_, _info_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\candidate.py", line 49, in __init__
    self.mem = LongTermMemory(self._space_, _main_args_, self)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\memory\memory.py", line 39, in __init__
    self._load_ = MemoryLoad(_space_, _main_args_, _cand_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\memory\memory_load.py", line 25, in __init__
    super().__init__(_space_, _main_args_, _cand_)
  File "C:\Users\<user>\Miniconda3\lib\site-packages\hyperactive\memory\memory_io.py", line 35, in __init__
    os.makedirs(self.date_path, exist_ok=True)
  File "C:\Users\<user>\Miniconda3\lib\os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Users\<user>\Miniconda3\lib\os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Users\<user>\Miniconda3\lib\os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Users\<user>\Miniconda3\lib\os.py", line 221, in makedirs
    mkdir(name, mode)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\<user>\\Miniconda3\\lib\\site-packages\\hyperactive\\memory\\paths.py/meta_data'

Code being run in test.py

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from hyperactive import Hyperactive

data = load_breast_cancer()
X, y = data.data, data.target

'''define the model in a function'''
def model(para, X, y):
    '''optimize one or multiple hyperparameters'''
    gbc = GradientBoostingClassifier(n_estimators=para['n_estimators'])
    scores = cross_val_score(gbc, X, y)

    return scores.mean()

'''create the search space and search_config'''
search_config = {
    model: {'n_estimators': range(10, 200, 10)}
}

'''start the optimization run'''
opt = Hyperactive(X, y)
opt.search(search_config, n_iter=20)

Versions:

  • Python 3.7.3
  • hyperactive 2.3.0
  • sklearn 0.21.3
  • Windows 10 64bit
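The failing path ends in paths.py/meta_data, which suggests the module's own file path was used where its directory was intended. One plausible cause (an assumption, not confirmed from Hyperactive's source) is deriving the directory by splitting __file__ on "/", which splits nothing on Windows paths that use backslashes. The sketch uses PureWindowsPath so it runs on any OS; in real code, Path(__file__).resolve().parent would be the portable equivalent.

```python
from pathlib import PureWindowsPath

module_file = r"C:\Users\user\Miniconda3\lib\site-packages\hyperactive\memory\paths.py"


def fragile_dir(path):
    # assumes POSIX separators: on a Windows path nothing is split off
    return path.rsplit("/", 1)[0]


def robust_dir(path):
    # pathlib understands Windows separators and returns the real parent directory
    return PureWindowsPath(path).parent / "meta_data"


print(fragile_dir(module_file) + "/meta_data")  # reproduces ...\paths.py/meta_data
print(robust_dir(module_file))  # ...\hyperactive\memory\meta_data
```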

Memoize evaluation results across multiple processes

Hi Simon,

I'm not sure if this is an enhancement request or if it's already done. With memory=True, Hyperactive can memoize evaluation results for later use. However, in a multiprocessing setup with n_jobs > 1, it looks like each process keeps its own local dictionary and cannot access the results of other processes: I save the evaluation results of all processes, and there are some identical parameter sets.

This feature would save more time and should not be very hard to implement, for example with a global Manager().dict() or a shared cache. I'm now manually wrapping my objective function so that it looks up a synchronized dict() before doing the actual evaluation, but it would be much nicer to have Hyperactive manage it.
Thanks!
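A sketch of the manual wrapper described above. memoize_shared is a hypothetical helper; cache can be any dict-like object, for example a multiprocessing.Manager().dict() shared between processes (a plain dict is used in the usage example). Two assumptions: the parameter values are hashable, and the wrapper can be serialized by whatever pool Hyperactive uses (a closure may fail with plain pickle). Two processes may still evaluate the same key concurrently, since the check and the write are not locked together.

```python
import functools


def memoize_shared(objective, cache):
    """Look up `cache` before evaluating; store new scores for other processes to reuse."""

    @functools.wraps(objective)
    def wrapper(para):
        key = tuple(sorted(para.items()))  # order-independent, hashable key
        if key in cache:
            return cache[key]
        score = objective(para)
        cache[key] = score
        return score

    return wrapper


calls = []


def slow_model(para):
    calls.append(para)  # record real evaluations
    return para["a"] * 2


cached_model = memoize_shared(slow_model, cache={})
cached_model({"a": 3})
cached_model({"a": 3})  # served from the cache
cached_model({"a": 4})
print(len(calls))  # 2
```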

Please help me with the error I'm getting

I am trying to run the following code :

import time
import numpy as np
from keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from keras import optimizers

from hyperactive import RandomSearchOptimizer, ParticleSwarmOptimizer

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

X_train = X_train.astype('float32')/255.0
X_test = X_test.astype('float32')/255.0

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

sgd = optimizers.SGD(lr=0.01)
adam = optimizers.Adam(lr=0.01)

# this defines the structure of the model and the search space in each layer
search_config = {
    "keras.compile.0": {"loss": ["categorical_crossentropy"], "optimizer": [adam, sgd]},
    "keras.fit.0": {"epochs": [10], "batch_size": range(10, 101), "verbose": [2]},
    "keras.layers.Conv2D.1": {
        "filters": range(4, 101),
        "kernel_size": [3, 5, 7],
        "activation": ["sigmoid", "relu", "tanh"],
        "input_shape": [(32, 32, 3)],
    },
    "keras.layers.MaxPooling2D.2": {"pool_size": [(2, 2)]},
    "keras.layers.Conv2D.3": {
        "filters": range(4, 101),
        "kernel_size": [3, 5, 7],
        "activation": ["sigmoid", "relu", "tanh"],
    },
    "keras.layers.MaxPooling2D.4": {"pool_size": [(2, 2)]},
    "keras.layers.Flatten.5": {},
    "keras.layers.Dense.6": {"units": range(4, 201), "activation": ["sigmoid", "relu", "tanh"]},
    "keras.layers.Dense.7": {"units": range(4, 201), "activation": ["sigmoid", "relu", "tanh"]},
    # "keras.layers.Dropout.7": {"rate": list(np.arange(0.2, 0.8, 0.2))},
    "keras.layers.Dense.8": {"units": [10], "activation": ["softmax"]},
}

Optimizer = ParticleSwarmOptimizer(search_config, n_iter=10, n_part=10, metric='accuracy', cv=0.8, w=0.7, c_k=2.0, c_s=2.0)
#Optimizer = ParticleSwarmOptimizer(search_config, n_iter=10, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, n_part=4, w=0.5, c_k=0.5, c_s=0.9)

t1 = time.time()
Optimizer.fit(X_train, y_train)
t2 = time.time()

print("time: {}".format(t2-t1))

Optimizer.predict(X_test)
score = Optimizer.score(X_test, y_test)

print("test score: {}".format(score))

But I am getting an error.

Saving checkpoints

Hi Simon,

Could you let me know whether there is a way to checkpoint the search and persist it to disk after each iteration? I imagine there is, since the memory_warm_start parameter could be used to resume from a saved checkpoint once it is loaded into a pandas dataframe.
I can see that a call to .results() returns the dataframe in question, but that seems to be possible only after all n_iter iterations have completed.

Thanks!
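Until there is built-in support, one option is to checkpoint from inside the objective function. A minimal sketch with a hypothetical checkpointed wrapper: after every evaluation it appends the parameters and the score as one CSV row. It assumes the objective's argument can be unpacked like a plain dict of scalar values; with Hyperactive's own argument object you may need to build the dict from the search-space keys. Loading the file with pandas.read_csv and passing it as memory_warm_start assumes that parameter accepts a dataframe with one column per parameter plus a score column, which should be checked against the Hyperactive docs.

```python
import csv
import functools
import os


def checkpointed(objective, path):
    """Append each evaluation (parameters + score) to a CSV file as soon as it finishes."""

    @functools.wraps(objective)
    def wrapper(para):
        score = objective(para)
        row = {**para, "score": score}
        new_file = not os.path.isfile(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            if new_file:
                writer.writeheader()  # header only once, for a fresh file
            writer.writerow(row)
        return score

    return wrapper
```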

n_jobs opens a limited number of processes

Describe the bug
n_jobs does not start the provided number of processes, instead it starts min(n_jobs, 12), and with n_jobs=-1 it runs 12 jobs.

Code to reproduce the behavior

import numpy as np
from hyperactive import Hyperactive

points = np.random.rand(100)

def model(opt):
    return np.sum(points)

hyper = Hyperactive(distribution="joblib")
space = dict(a=np.arange(10))
hyper.add_search(model, space, n_iter=50, n_jobs=20)
hyper.run()
# len(hyper.results_list) shows 12

Error message from command line
NA

System information:

  • OS Platform and Distribution: MacOS 10.16.4
  • Python version: 3.8.5
  • Hyperactive version: 3.0.5

Additional context

Basically, I want to run the optimization algorithm e.g. 20 times and as far as I understood, I could achieve that with n_jobs, but n_jobs creates fewer runs.

add ray multiprocessing support

The popular python package ray has a multiprocessing feature that could be used to run optimization-processes in parallel:

from ray.util.multiprocessing import Pool

def f(index):
    return index

pool = Pool()
for result in pool.map(f, range(100)):
    print(result)

This Pool API is very similar to the one in the regular multiprocessing module. I will look further into whether Ray can be integrated so that its features can be used in Hyperactive.

Blank lines in terminal when verbosity=False

from hyperactive import Hyperactive
from hyperactive import RandomSearchOptimizer
import numpy as np

print('Start')

def fn(params):
  return 123

search_space = {
  'a': np.arange(0, 100),
}

optimizer = RandomSearchOptimizer()

for _ in range(10):
  hyper = Hyperactive(verbosity=False)
  hyper.add_search(fn, search_space, optimizer=optimizer, n_iter=1000, memory=False)
  hyper.run()
  best_para = hyper.best_para(fn)

print('done')

Output:

Start




















done

No score variance over 50 iterations despite multiple parameters switched

I attempted to tune an XGBoost binary classifier on tf-idf data by adjusting n_estimators, max_depth, and learning_rate, and there was zero variation in the score across all 50 iterations. When I tweak the parameters manually and run a single training instance, the score does vary. Note: I have also tried this with the default optimizer for 20 iterations and with different parameter ranges, and got the same result: the score is always 0.6590446358653093.

SYSTEM DETAILS:
Amazon SageMaker
Hyperactive ver: 3.0.5.1
Python ver: 3.6.13

Here is my code:

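import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score
from hyperactive import Hyperactive, SimulatedAnnealingOptimizer

# jc is the author's own data-preparation module
freq_df, y_labels = jc.prep_train_data('raw_data.pkl', remove_stopwords=False)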

def model(opt):
    clf_xgb = xgb.XGBClassifier(objective='binary:logistic',
                            #eta=0.4,
                            #max_depth=8,
                            subsample=0.5,
                            base_score=np.mean(y_labels),
                            eval_metric = 'logloss',
                            missing=None,
                            use_label_encoder=False,
                            seed=42)
    
    scores = cross_val_score(clf_xgb, freq_df, y_labels, cv=5) # default is 5, hyperactive example is 3

    return scores.mean()

# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}

# Configure the optimizer
optimizer = SimulatedAnnealingOptimizer(
    epsilon=0.1,
    distribution="laplace",
    n_neighbours=4,
    rand_rest_p=0.1,
    p_accept=0.15,
    norm_factor="adaptive",
    annealing_rate=0.999,
    start_temp=0.8)

# Execute optimization
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
hyper.run()

# Print-out the results and save them to a dataframe
results_filename = "xgboost_hyperactive_results.csv"

search_data = hyper.results(model)
search_data.to_csv(results_filename, index=0)

New feature: save optimizer object to continue optimization run at a later time.

Explanation

It would be very useful if Hyperactive could save the optimization backend to disk (via pickle, dill, cloudpickle, ...) and load it later to continue the optimization run.

So the goal is, that the optimizer can be saved during one code execution and loaded at a later time during a second code execution. The optimization run should behave as if there was no break between the two optimization runs.

The optimization backend of Hyperactive is Gradient-Free-Optimizers, so I first confirmed that GFO optimizer objects can be saved and loaded in two different code executions. In the following script, the optimizer object is saved if it does not exist yet. The code must then be executed a second time: the optimizer object is loaded and continues the search.

Save and load GFO-optimizer

import os
import numpy as np

from gradient_free_optimizers import RandomSearchOptimizer

import dill as pkl

file_name = "./optimizer.pkl"

def load(file_name):
    if os.path.isfile(file_name):
        with open(file_name, "rb") as pickle_file:
            return pkl.load(pickle_file)
    else:
        print("---> Warning: No file found in path:", file_name)

def save(file_name, data):
    with open(file_name, "wb") as f:
        pkl.dump(data, f)


def parabola_function(para):
    loss = para["x"] * para["x"]
    return -loss

search_space = {"x": np.arange(-10, 10, 0.1)}

opt_loaded = load(file_name)
if opt_loaded:
    print("Optimizer loaded!")
    opt_loaded.search(parabola_function, n_iter=100)

else:
    opt = RandomSearchOptimizer(search_space)
    opt.search(parabola_function, n_iter=10000)

    save(file_name, opt)
    print("Optimizer saved!")

The code above works fine!

So let's now try to access the optimizer object from within Hyperactive, save it, and load it during a second code execution:

Save and load optimizer (GFO-wrapper) from within Hyperactive

import os
import numpy as np

from hyperactive import Hyperactive

import dill as pkl

file_name = "./optimizer.pkl"

def load(file_name):
    if os.path.isfile(file_name):
        with open(file_name, "rb") as pickle_file:
            return pkl.load(pickle_file)
    else:
        print("---> Warning: No file found in path:", file_name)

def save(file_name, data):
    with open(file_name, "wb") as f:
        pkl.dump(data, f)


def parabola_function(para):
    loss = para["x"] * para["x"]
    return -loss

search_space = {"x": list(np.arange(-10, 10, 0.1))}

opt_loaded = load(file_name)
if opt_loaded:
    print("Optimizer loaded!")
    # do stuff

else:
    hyper = Hyperactive()
    hyper.add_search(parabola_function, search_space, n_iter=100)
    hyper.run()

    # access the optimizer attribute from the list of results
    optimizer = hyper.opt_pros[0]._optimizer  # not official API

    save(file_name, optimizer)
    print("Optimizer saved!")

If you execute the code above twice, you will probably encounter the error message below. Why this error occurs is a mystery to me: there is a FileNotFoundError even though the file is present. I do not have expert knowledge about pickling, so I would be very grateful for help with this problem.

If you take a look at the type of hyper.opt_pros[0]._optimizer from Hyperactive, you can see that it is the same GFO optimizer object as in the stand-alone GFO code (the first example).

My guess is that the optimizer class in Hyperactive receives parameters that cannot be pickled by dill (or cloudpickle) for some reason. The source code where GFO receives parameters within Hyperactive can be found here.

Traceback (most recent call last):
  File "hyper_pkl_optimizer.py", line 33, in <module>
    opt_loaded = load(file_name)
  File "hyper_pkl_optimizer.py", line 15, in load
    return pkl.load(pickle_file)
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/dill/_dill.py", line 373, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/dill/_dill.py", line 646, in load
    obj = StockUnpickler.load(self)
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 959, in RebuildProxy
    return func(token, serializer, incref=incref, **kwds)
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 809, in __init__
    self._incref()
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 863, in _incref
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

So the goal is now to fix the problem with the second code example and enable the correct saving and loading of the optimizer-object from Hyperactive.
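The traceback points at a likely cause. The failure happens inside multiprocessing/managers.py (RebuildProxy, then _incref, then SocketClient), which is the code path that runs when a multiprocessing.Manager proxy object is unpickled: the proxy tries to reconnect to the manager server of the first run. That server process no longer exists, so the FileNotFoundError most likely refers to the manager's socket address, not to optimizer.pkl. This suggests (an assumption, not confirmed here) that the saved optimizer holds a reference to a manager proxy, for example a shared memory dictionary.

A general pattern for such objects is to leave process-bound attributes out of the pickled state with `__getstate__` and recreate them in `__setstate__`. The class below is a hypothetical stand-in, not Hyperactive's or GFO's code; a `threading.Lock` plays the role of the process-bound handle because the standard pickle module refuses to serialize it.

```python
import pickle
import threading


class ResumableOptimizer:
    """Hypothetical optimizer that holds a handle which must not be pickled."""

    def __init__(self):
        self.search_data = []          # plain, picklable search history
        self._lock = threading.Lock()  # stand-in for a process-bound handle

    def __getstate__(self):
        # Copy the instance state and drop the handle before pickling.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then create a fresh handle in this process.
        self.__dict__.update(state)
        self._lock = threading.Lock()


if __name__ == "__main__":
    opt = ResumableOptimizer()
    opt.search_data.append({"x": 1.0, "score": -1.0})
    restored = pickle.loads(pickle.dumps(opt))
    print(restored.search_data)
```

For a manager proxy, `__getstate__` could instead store a plain copy such as `dict(proxy)`, so the loaded optimizer keeps the collected data without the dead connection.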

AttributeError can't pickle local function object

Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?
No

Describe the bug
When using multiple optimizers and passing a local function, I am getting the error listed below.

Code to reproduce the behavior

import numpy as np
import pandas as pd
from hyperactive import Hyperactive
from hyperactive import RepulsingHillClimbingOptimizer, BayesianOptimizer

def f():
    def optimization_func(opt):
        return 5 

    h = Hyperactive(["progress_bar", "print_results", "print_times"])
    search_space = {
        "exp": list(range(0, 5)),
        "slope": list(np.arange(0.001, 10, step=0.05)),
        "clust": [5]
    }

    h.add_search(
        optimization_func,
        search_space=search_space,
        n_iter=10,
        optimizer=RepulsingHillClimbingOptimizer(
            epsilon=0.05,
            distribution="normal",
            n_neighbours=3,
            rand_rest_p=0.03,
            repulsion_factor=3,
        ),
        n_jobs=1,
        max_score=None,
        initialize={
            "warm_start": [
                {
                    "exp": 2,
                    "slope": 5,
                    "clust": 5
                }
            ]
        },
        early_stopping={"tol_rel": 0.001, "n_iter_no_change": 3},
        random_state=0,
        memory=True,
        memory_warm_start=None,
    )
    h.add_search(optimization_func,search_space = search_space,n_iter = 10)
    h.run()

f()

Error message from command line

AttributeError                            Traceback (most recent call last)
<ipython-input-21-5522e445d18f> in <module>
     45     h.run()
     46 
---> 47 f()

<ipython-input-21-5522e445d18f> in f()
     43     )
     44     h.add_search(optimization_func,search_space = search_space,n_iter = 10)
---> 45     h.run()
     46 
     47 f()

~\Anaconda3\lib\site-packages\hyperactive\hyperactive.py in run(self, max_time, _test_st_backend)
    178                 progress_board.open_dashboard()
    179 
--> 180         self.results_list = run_search(
    181             self.process_infos, self.distribution, self.n_processes
    182         )

~\Anaconda3\lib\site-packages\hyperactive\run_search.py in run_search(search_processes_infos, distribution, n_processes)
     49         (distribution, process_func), dist_paras = _get_distribution(distribution)
     50 
---> 51         results_list = distribution(
     52             process_func, process_infos, n_processes, **dist_paras
     53         )

~\Anaconda3\lib\site-packages\hyperactive\distribution.py in multiprocessing_wrapper(process_func, search_processes_paras, n_processes, **kwargs)
     18 ):
     19     pool = mp.Pool(n_processes, **kwargs)
---> 20     results = pool.map(process_func, search_processes_paras)
     21 
     22     return results

~\Anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
    362         in a list that is returned.
    363         '''
--> 364         return self._map_async(func, iterable, mapstar, chunksize).get()
    365 
    366     def starmap(self, func, iterable, chunksize=None):

~\Anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
    769             return self._value
    770         else:
--> 771             raise self._value
    772 
    773     def _set(self, i, obj):

~\Anaconda3\lib\multiprocessing\pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    535                         break
    536                     try:
--> 537                         put(task)
    538                     except Exception as e:
    539                         job, idx = task[:2]

~\Anaconda3\lib\multiprocessing\connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

~\Anaconda3\lib\multiprocessing\reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

AttributeError: Can't pickle local object 'f.<locals>.optimization_func'

System information:

  • OS Platform and Distribution: Windows 10
  • Python version: 3.9
  • Hyperactive version: 3.3.3

Additional context
In a previous question you mentioned support for pathos. How do I use it to run this successfully?
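The traceback shows multiprocessing.Pool.map sending the objective to worker processes through `_ForkingPickler.dumps`, i.e. the standard pickle module. Standard pickle stores functions by reference to an importable name, and a function defined inside `f()` has a qualified name containing `<locals>`, so it cannot be looked up. The sketch below only demonstrates that difference with the standard library; the function names are hypothetical. Moving `optimization_func` to the top level is assumed to be enough here. On Windows, where workers are spawned, functions defined interactively in a notebook may still fail to load in the workers, so putting the objective in an importable .py module is the safer choice. dill-based tools such as pathos serialize functions by value instead, which is presumably why pathos came up; how to select it inside Hyperactive is not shown here.

```python
import pickle


def objective(para):
    # Module-level function: pickle stores a reference "module.qualname",
    # which a worker process can import again.
    return -(para["x"] ** 2)


def make_local_objective():
    def local_objective(para):
        # Nested function: its __qualname__ contains "<locals>", so the
        # standard pickle module cannot look it up by name.
        return -(para["x"] ** 2)

    return local_objective


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        # Older Pythons raise AttributeError for local objects, newer ones PicklingError.
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(objective), can_pickle(make_local_objective()))
```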

hyper.results(model)

Dear Simon,
I have recently updated to Hyperactive 4.2.0. However, it seems that the interesting feature hyper.results(model), which helped us save every iteration of the optimization process to a CSV file, no longer exists. Perhaps you changed it to print_results? Please help me fix this issue.
Thank you,
Best regards
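A version-tolerant way to keep exporting the search history is to look up whichever method the installed release provides. This is a hedged sketch: `results(objective)` is the 3.x call used in the first example on this page, while `search_data(objective)` is assumed to be its 4.x counterpart and both are assumed to return a pandas DataFrame; please check the 4.x API reference before relying on it. `export_search_data` is a hypothetical helper name.

```python
def export_search_data(hyper, objective, filename):
    """Write every evaluated iteration of one search to a CSV file.

    Assumption: Hyperactive 4.x exposes search_data(objective) in place of the
    3.x results(objective); both are assumed to return a pandas DataFrame.
    """
    if hasattr(hyper, "search_data"):
        data = hyper.search_data(objective)  # assumed 4.x name
    else:
        data = hyper.results(objective)      # 3.x name, as used above
    data.to_csv(filename, index=False)
    return data
```

Usage after `hyper.run()`: `export_search_data(hyper, model, "xgboost_hyperactive_results.csv")`.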

Optimization in serial?

Dear SimonBlanke,

First of all, thank you for this wonderful project!

My question is: can I somehow specify that the optimizers should NOT run in parallel? My function is actually a call to an external (parallelized) program, which furthermore cannot run as multiple instances on the same machine (it is a commercial program). So I would like to prevent the optimizer from starting the next calculation before the current one has finished.

Best regards,
Igors
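Assuming a single search added with `n_jobs=1` (the `add_search` parameter used in the reproduction earlier on this page), Hyperactive should evaluate the objective function one call at a time. What then matters is that the objective itself blocks until the external program has exited. A minimal sketch with the standard library: `subprocess.run` waits for the child process to finish. The command below is a hypothetical placeholder that runs the current Python interpreter instead of the commercial program.

```python
import subprocess
import sys


def objective(para):
    # Hypothetical placeholder: replace cmd with the call to the external program.
    cmd = [sys.executable, "-c", f"print({para['x']} ** 2)"]
    # subprocess.run blocks until the program has exited, so within one
    # sequential search the next evaluation cannot start before this one ends.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Parse the program's output and return it as a score to maximize.
    return -float(completed.stdout)


if __name__ == "__main__":
    print(objective({"x": 3}))
```

Adding several searches to one Hyperactive instance, or raising `n_jobs`, is what would start evaluations in parallel processes, so those should be avoided in this setup.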
