glimr's People

Contributors

cooperlab, lawrence-chillrud, marinayad, raminnateghi, tilly-s


glimr's Issues

Tuning search space should not be large

I recently ran an experiment using the Rastrigin function to see how well Ray performs as the dimensionality of the search space increases. Rastrigin has many local minima, but its global minimum is at 0 [Fig. 1].
The results show that as the dimensionality of the search space increases, the results deteriorate. This raises the concern of whether Ray can find the best config when there are many tunable hyperparameters. Maybe we need to define trimming tools to reduce the dimensionality of the search space, or we can hold some hyperparameters constant, especially those we think are unlikely to provide any benefit.

[Fig. 1: 3D surface plot of the Rastrigin function]

Code:

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining
import matplotlib.pyplot as plt  # needed for the surface plot below
import numpy as np

# Rastrigin function.
def rastrigin(config):
    x = list(config.values())
    n = len(x)
    score = 10 * n + sum([xi**2 - 10 * np.cos(2 * np.pi * xi) for xi in x])
    return {"score": score}

# Plot Rastrigin in 3D.
# Note: its global minimum is at the origin.
x = np.linspace(-5.12, 5.12, 100)
y = np.linspace(-5.12, 5.12, 100)
X, Y = np.meshgrid(x, y)
Z = rastrigin({"a": X, "b": Y})

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z['score'], cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()


# Run a Ray experiment to find the global minimum of the n-D Rastrigin function.
max_dim = 10
scores = []
for d in range(1, max_dim):

    search_space = {f"var_{i}": tune.quniform(-2, 2, 0.05) for i in range(d)}
    scheduler = PopulationBasedTraining(
        time_attr="training_iteration",
        hyperparam_mutations=search_space,
        metric="score",
        mode="min",  # we are minimizing the Rastrigin score
    )

    tuner = tune.Tuner(
        rastrigin,
        param_space=search_space,
        tune_config=tune.TuneConfig(
            num_samples=50,
            scheduler=scheduler,
        ),
    )

    results = tuner.fit()
    scores.append(results.get_best_result(metric="score", mode="min").metrics["score"])

The resulting loss value vs. the dimensionality of the search space:

[Figure: best score found at each search-space dimensionality]

Dictionary Key Indexing

Python returns a dict_keys object when calling a dictionary's .keys() method, and dict_keys does not support indexing.

The following lines (line 116 in Search.py) need to be modified

task_name = space["tasks"].keys()[0]
metric_name = space["tasks"][task_name].keys()[0]

to

task_name = list(space["tasks"].keys())[0]
metric_name = list(space["tasks"][task_name].keys())[0]

Fix trial numbering mismatch in the experiment_table function

Here is the suggested code fix:

import os
import re

import pandas as pd

# exp_dir is the experiment directory passed to experiment_table
trials = []
subdirs = os.listdir(exp_dir)
for subdir in subdirs:
    if subdir.startswith("trainable") and os.path.isdir(
        os.path.join(exp_dir, subdir)
    ):
        # extract the trial number from the fourth underscore-separated token
        trial_num = re.search(r"(?:[^_]*_){3}(\d+)", subdir).group(1)
        result_path = os.path.join(exp_dir, subdir, "result.json")
        if os.path.exists(result_path):
            trial = pd.read_json(result_path, lines=True)
            trial.insert(0, "trial #", trial_num)
            trial.insert(1, "subdir", subdir)
            trial.insert(2, "exp_dir", exp_dir)
            trials.append(trial)
df = pd.concat(trials, ignore_index=True)

Integrate K-fold cross-validation into glimr tuning

@cooperlab, currently for tuning we use a config dictionary that lets us search through different configurations and select the best model. However, the data pipeline is fixed across all trials, which means every trial uses the same train/validation sets [Fig. 1].
It would be interesting to resample the data automatically during tuning, i.e., generate train/validation subsets per trial [Fig. 2].
This would allow us to train various models independently on distinct training/validation sets, enabling ensemble tuning, a valuable approach for dealing with overfitting, particularly in situations with limited data.

[Fig. 1: all trials share a fixed train/validation split]

[Fig. 2: train/validation subsets resampled per trial during tuning]

The most common resampling technique is k-fold cross-validation (CV), but other resampling methods could be used.
I think we need to consider the following items to implement this (a minimal sketch of item 1 follows the list):

  1. Update the search space so we can resample data during tuning using k-fold CV. This can be done by passing a fold index into the data loader.
  2. Write a wrapper to analyze the logs generated by trials to extract fold-specific information, models, etc.
  3. Each trial should run on a specific resampled dataset.
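A minimal sketch of item 1, assuming a hypothetical dataloader(fold, k) that returns a deterministic train/validation split for a given fold index (names are illustrative, not glimr's API):

from ray import tune

k = 5
search_space = {
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    # grid_search over fold indices crosses every sampled config with every fold
    "fold": tune.grid_search(list(range(k))),
}

def trainable(config):
    # the fold index selects which resampled split this trial trains on
    train_ds, val_ds = dataloader(fold=config["fold"], k=k)
    ...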

Investigate conditional search spaces

See if conditional search spaces are compatible with PBT and ASHA schedulers.

They are not compatible with most search algorithms, but search algorithms are not currently working anyway.
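For reference, a conditional search space in Ray Tune is typically written with tune.sample_from, where one parameter's value depends on another (an illustrative sketch; parameter names are made up):

from ray import tune

search_space = {
    "layers": tune.choice([1, 2, 3]),
    # "units" is conditioned on the sampled value of "layers"
    "units": tune.sample_from(lambda spec: 64 // spec.config.layers),
}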

Allow overriding of hyperparameter notation

Users should be able to specify samplers from ray.tune.search.sample in place of the list or set conventions, which provide more limited sampling options.

This can be enabled in glimr.utils.set_hyperparameter by checking whether a hyperparameter is callable and whether fun.__module__ == "ray.tune.search.sample", or by checking against a list of known sampler function names.
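A hedged sketch of that check, assuming it lives in glimr.utils.set_hyperparameter (the helper name is illustrative):

from ray.tune.search.sample import Domain

def is_ray_sampler(value):
    # tune.uniform(...), tune.choice(...), etc. return Domain instances
    if isinstance(value, Domain):
        return True
    # fall back to the module check described above for bare callables
    return callable(value) and getattr(value, "__module__", "") == "ray.tune.search.sample"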

remove functions from search space using prune_constants, applied to conditional tuning

To define a conditional search space, we can employ tune.sample_from, where conditions are expressed as functions using lambda expressions.

However, we will not be able to use some schedulers like PBT if the search space contains functions. To fix that, and in fact to enable glimr to use PBT for conditional search spaces, we can update prune_constants so it also removes functions (defined by tune.sample_from) from the search space.

Removing conditional functions from PBT's mutation space does not mean they no longer exist in the search space; it means they are no longer mutable, which makes sense, because functions are not mutable quantities.
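A minimal sketch of the proposed prune_constants extension, assuming the mutations dict maps hyperparameter names to Ray Tune objects (the helper name is illustrative):

from ray.tune.search.sample import Domain, Function

def prune_functions(mutations):
    # keep mutable domains; drop tune.sample_from entries, which are Function domains
    return {k: v for k, v in mutations.items() if isinstance(v, Domain) and not isinstance(v, Function)}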

Prevent callables from using kwargs

Using kwargs with callable losses via functools.partial prevents saving of models (TensorFlow error). For now, raise an error when kwargs are provided with losses; later, upgrade losses to classes to enable kwargs.
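A sketch of the interim guard (hypothetical helper; glimr's actual signature may differ):

def check_loss_kwargs(loss, kwargs):
    # reject kwargs on plain callables; wrapping them with functools.partial breaks model saving
    if kwargs and callable(loss) and not isinstance(loss, type):
        raise ValueError(
            "kwargs are not supported for callable losses; use a loss class instead"
        )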

Allow multiple losses per output

Currently glimr.keras.keras_losses only allows one loss per output. Enable multiple losses per output, similar to glimr.keras.keras_metrics.
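One possible approach, sketched here without reference to glimr's actual API: merge several losses for a single output into one callable that Keras can consume.

import tensorflow as tf

def combine_losses(losses, weights=None):
    # returns a single loss callable summing the weighted component losses
    weights = weights or [1.0] * len(losses)
    def combined(y_true, y_pred):
        return tf.add_n([w * fn(y_true, y_pred) for w, fn in zip(weights, losses)])
    return combined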

restore is broken

PR #31 breaks the ability to restore halted experiments.

Failure #1 (occurred at 2023-07-18_14-54-14)
ray::ImplicitFunc.train() (pid=8512, ip=127.0.0.1, repr=trainable)
  File "/Users/lac5440/anaconda3/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "/Users/lac5440/anaconda3/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 336, in entrypoint
    return self._trainable_func(
  File "/Users/lac5440/anaconda3/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 653, in _trainable_func
    output = fn()
  File "/Users/lac5440/anaconda3/lib/python3.10/site-packages/glimr/search.py", line 311, in trainable
    model, losses, loss_weights, metrics = config["builder"](config)
TypeError: 'tuple' object is not callable

Failures #2 through #6 (2023-07-18_14-54-24 through 14-54-45; pids 8570, 8582, 8515, 8586, 8596) are identical apart from the pid and timestamp.

PBT TuneError in perturbing config

TuneError: Traceback (most recent call last):
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/execution/tune_controller.py", line 853, in _on_result
    on_result(trial, *args, **kwargs)
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/execution/trial_runner.py", line 735, in _on_training_result
    self._process_trial_results(trial, result)
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/execution/trial_runner.py", line 748, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/execution/trial_runner.py", line 791, in _process_trial_result
    decision = self._scheduler_alg.on_trial_result(
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 545, in on_trial_result
    self._checkpoint_or_exploit(
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 652, in _checkpoint_or_exploit
    self._exploit(trial_runner, trial, trial_to_clone)
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 819, in _exploit
    new_config, operations = self._get_new_config(trial, trial_to_clone)
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 709, in _get_new_config
    return _explore(
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 80, in _explore
    nested_new_config, nested_ops = _explore(
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 80, in _explore
    nested_new_config, nested_ops = _explore(
  File "/home/lgc2035/miniconda3/envs/kidney/lib/python3.8/site-packages/ray/tune/schedulers/pbt.py", line 125, in _explore
    new_config[key] = config[key] * perturbation_factor
TypeError: can't multiply sequence by non-int of type 'float'

Seems to be an error in the PBT scheduler's attempt to generate a new config for a new trial. Regular trials were training just fine until this error popped up when it was time for the PBT scheduler to perturb a config.
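The failing operation is easy to reproduce directly; any sequence-valued config entry hitting PBT's continuous perturbation path raises exactly this error:

# minimal reproduction of the operation at pbt.py line 125 with a list-valued config entry
config_value = [32, 64]
perturbation_factor = 1.2
config_value * perturbation_factor  # TypeError: can't multiply sequence by non-int of type 'float'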

restore is broken again

Values for the dataloader and model builder functions in the config are being received by trials as placeholders. This relates to Ray internals and may be difficult to fix. Check Ray versions.

Testing

Revisit testing and hit 80% coverage.

Create a function to trim constants from the PBT mutations dict

Constant values that are not mutable should be removed from the hyperparam_mutations argument of PopulationBasedTraining.

Traceback (most recent call last):
  File "i-score.py", line 59, in <module>
    attempt_tuning(
  File "/renal_allograft/code/utils/attempt_tuning.py", line 35, in attempt_tuning
    scheduler = PopulationBasedTraining(
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/schedulers/pbt.py", line 360, in __init__
    raise TypeError(
TypeError: hyperparam_mutation values must be either a List, Tuple, Dict, a tune search space object, or a callable.
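A hedged sketch of the trimming function (the name matches the issue title; internals are assumptions):

from ray.tune.search.sample import Domain

def prune_constants(space):
    # PBT accepts lists, tuples, dicts, tune domains, or callables; drop plain constants
    allowed = (Domain, list, tuple, dict)
    return {k: v for k, v in space.items() if isinstance(v, allowed) or callable(v)}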

update notebooks

Illustrate how to use the ResultGrid object and Lawrence's top_k function to analyze results upon completion of PR #40.
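For reference, a minimal ResultGrid sketch continuing from a fitted Tuner (Ray's public API; the metric name is illustrative, and top_k is not shown):

results = tuner.fit()          # returns a ray.tune.ResultGrid
df = results.get_dataframe()   # one row per trial with its last reported metrics
best = results.get_best_result(metric="val_loss", mode="min")
print(best.config, best.metrics["val_loss"])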

Allow non-class losses

A valid loss can be a callable/function and doesn't have to be a class.

Change keras_losses to handle callable loss objects too.
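A sketch of the dispatch keras_losses could perform (the helper name and structure are assumptions):

import inspect

def resolve_loss(loss, kwargs=None):
    if inspect.isclass(loss):
        return loss(**(kwargs or {}))  # loss class: instantiate
    if callable(loss):
        return loss                    # plain function: pass through to Keras
    raise TypeError(f"unsupported loss: {loss!r}")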

Allow for custom stopper object

Update the Search class to include a method for setting a custom trial or experiment stopper. It should probably include some error handling as well, e.g. checking that the stopper passed is of an allowable type.
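A sketch of the proposed setter (method and attribute names are illustrative):

from ray.tune.stopper import Stopper

def set_stopper(self, stopper):
    # restrict to Ray's Stopper interface so the tuner can accept it
    if not isinstance(stopper, Stopper):
        raise TypeError("stopper must be a ray.tune.stopper.Stopper instance")
    self.stopper = stopper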

Support metric and loss kwargs

Support passing of class definitions and kwargs in configurations.

Currently, we use a mapper argument to map strings to metrics and losses, since most class instances cannot be passed to trials (they are not picklable). This will be replaced with passing class definitions or callables with kwargs.

For metrics, a list of dicts as below (metrics do not contain hyperparameters and can be stored in a list):

[{"name": str, "metric": class/callable, "kwargs": dict}]

For losses, a single dict (multiple losses are not supported):

{"name": str, "loss": class/callable, "kwargs": dict}
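A sketch of how a trial might resolve these specs (hypothetical helper, not glimr's confirmed implementation):

def build_from_spec(spec, key):
    obj = spec[key]                      # class or callable
    kwargs = spec.get("kwargs", {})
    # instantiate classes with kwargs; pass bare callables through unchanged
    return obj(**kwargs) if isinstance(obj, type) else obj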

Multi-output models raise errors

Some single-task networks can have multiple outputs (e.g., an attention model that outputs both predictions and attention scores). In this case, checking the task count fails to predict how Keras will name the output metrics. Use len(model.outputs) instead.
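A small example of the failure mode and the proposed check:

import tensorflow as tf

inputs = tf.keras.Input(shape=(16,))
hidden = tf.keras.layers.Dense(8)(inputs)
pred = tf.keras.layers.Dense(1, name="prediction")(hidden)
attn = tf.keras.layers.Dense(16, name="attention")(hidden)
model = tf.keras.Model(inputs, [pred, attn])

# one task, two outputs: Keras prefixes metric names per output,
# so counting tasks under-predicts the metric names; count outputs instead
assert len(model.outputs) == 2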

Update get_top_k_trials for losses

Add an argument "mode" to allow filtering of trials by max or min values. This permits finding trials with the highest metric value or the lowest loss.
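A sketch of the proposed signature, assuming trials are summarized in a pandas DataFrame (the internals of get_top_k_trials are assumptions):

def get_top_k_trials(df, metric, k=5, mode="max"):
    ascending = mode == "min"  # lowest-loss trials first when mode="min"
    return df.sort_values(metric, ascending=ascending).head(k)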

Support kwargs for metrics and losses

Many metrics and losses have parameters that users may want to specify or tune in the space definition. These should be supported by enhancing the space specification. This requires changes to glimr.keras.keras and testing with single-task and multi-task models.
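An illustrative space fragment with a tunable loss kwarg (the schema shown is an assumption, not glimr's current specification):

import tensorflow as tf
from ray import tune

space = {
    "loss": {
        "name": "focal",
        "loss": tf.keras.losses.BinaryFocalCrossentropy,
        "kwargs": {"gamma": tune.uniform(1.0, 3.0)},  # sampled per trial
    }
}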

Model averaging of top trials

We could average the top trials or compose an ensemble of sensitive and specific models for prediction. This should improve accuracy and also enable calculation of uncertainties. Perhaps this cannot be generalized across all applications and should be handled in the application libraries instead.
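A generic ensemble-averaging sketch (not glimr API); the spread across models doubles as a rough uncertainty estimate:

import numpy as np

def ensemble_predict(models, x):
    preds = np.stack([m.predict(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)  # prediction, uncertainty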

Support dataloader kwargs

Users should be able to specify dataloader kwargs and hyperparameters in their space. This can support operations like augmentation, which are application-specific, or training with different feature sets stored in different directories.

The batch size should also be integrated into a data dictionary in the search space.
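An illustrative space fragment (the "data" key and the kwargs shown are assumptions about the proposed schema):

from ray import tune

space = {
    "data": {
        "batch_size": tune.choice([16, 32, 64]),
        "kwargs": {
            "augment": tune.choice([True, False]),  # application-specific augmentation
            "feature_dir": "/path/to/features",     # e.g. alternate feature sets
        },
    }
}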
