jajcayn / pygpso

Gaussian-Processes Surrogate Optimisation in python

License: MIT License

optimisation bayesian-optimisation gaussian-processes space-partition-tree python3 neuroscience neuroscience-methods gaussian-processes-surrogate scale-biophysical-models partition-tree

pygpso's Introduction

pyGPSO

Optimise anything (but mainly large-scale biophysical models) using a Gaussian Processes surrogate

pyGPSO is a Python package for Gaussian-Processes Surrogate Optimisation. GPSO is a Bayesian optimisation method designed to cope with costly, high-dimensional, non-convex problems by switching between exploration of the parameter space (using a partition tree) and exploitation of the gathered knowledge (by training the surrogate function using Gaussian Processes regression). The motivation for this method stems from the optimisation of large-scale biophysical models in neuroscience, where the modelled data should match the experimental data. This package leverages GPFlow for training and predicting the Gaussian Processes surrogate.

This is a port of the original Matlab implementation by the paper's author.

Reference: Hadida, J., Sotiropoulos, S. N., Abeysuriya, R. G., Woolrich, M. W., & Jbabdi, S. (2018). Bayesian Optimisation of Large-Scale Biophysical Networks. NeuroImage, 174, 219-236.

Comparison of the GPR surrogate and the true objective function after optimisation.

Example of ternary partition tree after optimisation.

Installation

The pyGPSO package is tested and should run without any problems on Python versions 3.6 to 3.9.

Note on Python 3.9 with macOS

Installing pytables might give you hdf5 errors. If this is the case, please run

brew install hdf5 c-blosc

and all should work like a charm afterwards.

One-liner

For those who want to optimise right away just

pip install pygpso

and go ahead! Make sure to check the example notebooks in the examples directory to see how it works and what it can do. Alternatively, you can run the notebooks interactively on Binder.

Go proper

If you are the type of girl or guy who likes to install packages properly, start by cloning (or forking) this repository, then install all the dependencies, and finally install the package itself:

git clone https://github.com/jajcayn/pygpso
cd pygpso/
pip install -r requirements.txt
# optionally, but recommended
pip install -r requirements_optional.txt
pip install .

Don't forget to test!

pytest

Usage

A guide on how to optimise and what can be done using this package is given as Jupyter notebooks in the examples directory. You can also try them out live thanks to Binder.

The basic idea is to initialise the parameter space in which the optimisation is to be run, and then iteratively dig deeper and evaluate the objective function when necessary:

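from gpso import ParameterSpace, GPSOptimiser


def objective_function(params):
    # params arrive as a list or tuple, one value per parameter
    x, y = params
    # placeholder for the expensive computation (e.g. a model simulation);
    # replace it with your own code, which must return a single float
    score = -((x - 1.0) ** 2 + y ** 2)
    return float(score)


# bounds of the parameters we will optimise
x_bounds = [-3, 5]
y_bounds = [-3, 3]
space = ParameterSpace(parameter_names=["x", "y"], parameter_bounds=[x_bounds, y_bounds])
# n_workers sets the number of parallel workers
opt = GPSOptimiser(parameter_space=space, n_workers=4)
best_point = opt.run(objective_function)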

The package also offers plotting functions for visualising the results. Again, those are documented and showcased in the examples directory.

Notes

Gaussian Processes regression uses normalised coordinates within the bounds [0, 1]. All normalisation and de-normalisation is done automatically; however, when you call predict_y on the GPR model yourself, do not forget to pass normalised coordinates. The normalisation is handled by sklearn's MinMaxScaler, and the ParameterSpace instance offers convenience functions for this: ParameterSpace.normalise_coords(orig_coords) and ParameterSpace.denormalise_coords(normed_coords).
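To make the mapping concrete, here is a minimal pure-Python sketch of min-max scaling to [0, 1] and back. The helpers to_unit_cube and from_unit_cube are hypothetical names used only for illustration; they are not part of pyGPSO, which delegates this to sklearn's MinMaxScaler behind the ParameterSpace methods mentioned above.

```python
def to_unit_cube(orig_coords, bounds):
    """Map each coordinate from its [low, high] bounds to [0, 1]."""
    return [(x - low) / (high - low) for x, (low, high) in zip(orig_coords, bounds)]


def from_unit_cube(normed_coords, bounds):
    """Map each coordinate from [0, 1] back to its [low, high] bounds."""
    return [low + u * (high - low) for u, (low, high) in zip(normed_coords, bounds)]


# bounds taken from the usage example: x in [-3, 5], y in [-3, 3]
bounds = [(-3, 5), (-3, 3)]
normed = to_unit_cube([1.0, 0.0], bounds)   # [0.5, 0.5]
restored = from_unit_cube(normed, bounds)   # [1.0, 0.0]
```

A point passed to the surrogate would therefore be [0.5, 0.5] rather than [1.0, 0.0].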

Plotting of the ternary tree (gpso.plotting.plot_ternary_tree()) requires the igraph package, whose layout function is used. If you want to see the resulting beautiful tree, please install python-igraph.

Support for the saver (for saving model runs, e.g. timeseries, along with the optimisation) is provided by PyTables (and pandas if you're saving results to DataFrames).

Known bugs and future improvements

  • Saving of the GP surrogate is currently hacky: GPFlow only supports saving a model for future prediction and, AFAIK, such a model cannot be trained any further, since the information on kernels and mean functions is not saved (only the trained weights in the computational graph). pyGPSO therefore still relies on saving to pkl files and recreating the kernels and mean function on the fly when loading.

Final notes

When you encounter a bug or have any idea for an improvement, please open an issue and/or contact me.

When using this package in publications, please cite the original paper by Jonathan Hadida et al. for the methodology as

@article{hadida2018bayesian,
  title={Bayesian Optimisation of Large-Scale Biophysical Networks},
  author={Hadida, Jonathan and Sotiropoulos, Stamatios N and Abeysuriya, Romesh G and Woolrich, Mark W and Jbabdi, Saad},
  journal={Neuroimage},
  volume={174},
  pages={219--236},
  year={2018},
  publisher={Elsevier}
}

and acknowledge the usage of this software via its DOI. The DOI page provides the citation data.

pygpso's Issues

Allow resuming the optimisation

For this, the following is needed:

  • saving of the parameter space and its state (which leaves have been explored etc.)
  • saving of GPSurrogate (both the list of points and the GPR model) (WIP: #40)
  • allow resuming with new conditions (i.e. number of evaluations etc.)

Add support for strongly stochastic objective functions

In some modelling cases, the objective function can be strongly stochastic (due to the nature of the model). In that case, each evaluation might introduce a non-negligible error.

One obvious solution would be to add (parallelized of course) multiple evaluations of the same point and in the end, take their mean/median/whatever as the actual evaluated score.

This requires only a minor edit of the source code, since some parallelisation is already done when initialising.
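The idea of repeated evaluations can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: evaluate_repeatedly and noisy_objective are hypothetical names, and threads are used only to keep the sketch short (a process pool would be the more natural choice for CPU-bound models).

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def evaluate_repeatedly(objective, params, n_repeats=5, n_workers=4, reduce=statistics.mean):
    """Evaluate `objective` at the same point several times and aggregate the scores."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(objective, [params] * n_repeats))
    return reduce(scores)


def noisy_objective(params):
    # toy stochastic objective: a deterministic part plus Gaussian noise
    x, y = params
    return -(x ** 2 + y ** 2) + random.gauss(0.0, 0.1)


# aggregate 20 noisy evaluations of the same point using the median
score = evaluate_repeatedly(noisy_objective, (1.0, 2.0), n_repeats=20, reduce=statistics.median)
```

Passing the aggregation function as an argument keeps the choice between mean, median, or anything else up to the user.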

Middle child and parent's center are not the same

On some occasions, the assertion within the LeafNode class fails, reporting that the centers of the middle child and its parent are not the same when splitting using a ternary partition.

See the traceback sent by one of the users (attached to the original issue as an image).

Possible culprit: floating-point errors when the tree is really deep.
Possible solution: an all-close comparison should be enough.
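A tolerance-based check of this kind could look like the sketch below. It works on one-dimensional intervals for brevity; ternary_split, center, and centers_match are hypothetical helpers, not the LeafNode code, and the tolerances are arbitrary assumptions.

```python
import math


def ternary_split(low, high):
    """Split the interval [low, high] into three equal-width children."""
    width = (high - low) / 3
    edges = [low, low + width, low + 2 * width, high]
    return [(edges[i], edges[i + 1]) for i in range(3)]


def center(interval):
    low, high = interval
    return (low + high) / 2


def centers_match(parent, child, abs_tol=1e-12):
    """Compare centers with a tolerance instead of exact equality."""
    return math.isclose(center(parent), center(child), rel_tol=1e-9, abs_tol=abs_tol)


# descend repeatedly into the middle child, as happens in a deep tree
node = (0.0, 1.0)
for _ in range(20):
    middle = ternary_split(*node)[1]
    assert centers_match(node, middle)
    node = middle
```

With array-valued centers, numpy.allclose plays the same role as math.isclose does here.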

Installation (macOS 10.14.6): gcc/clang error of igraph

I'm having installation problems of igraph with the following error, linked to this igraph issue.

clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
    ld: library not found for -lstdc++
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command 'gcc' failed with exit status 1

Solution:

brew install igraph
MACOSX_DEPLOYMENT_TARGET=10.14 pip install python-igraph

Support for saving full output from the objective function

This stems from the LBSM: when optimising a biophysical model, it might be useful to save the full model output (i.e. timeseries) along with the parameters to an external file.

Probably will use hdf and tables for this.

Implement user-defined callbacks

Optional user-defined functions such as _post_initialise(), _pre_iteration(), _post_iteration(), _post_update(), and _pre_finalise(), where users can define their own callbacks. They would default to pass, i.e. do nothing.

A typical use-case would be plotting ternary tree after each iteration (i.e. in _post_iteration()), special logging as per user, saving after each iteration, etc...
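One way such hooks could be wired in is sketched below. HookedLoop and LoggingLoop are hypothetical toy classes written for this illustration; they only mimic the shape of an iterative optimiser and are not the GPSOptimiser API. The hook arguments are also an assumption.

```python
class HookedLoop:
    """Toy iterative loop exposing overridable hooks; NOT the pyGPSO optimiser."""

    def __init__(self, n_iterations):
        self.n_iterations = n_iterations

    # all hooks default to doing nothing
    def _post_initialise(self):
        pass

    def _pre_iteration(self, iteration):
        pass

    def _post_update(self, iteration):
        pass

    def _post_iteration(self, iteration):
        pass

    def _pre_finalise(self):
        pass

    def run(self):
        self._post_initialise()
        for iteration in range(self.n_iterations):
            self._pre_iteration(iteration)
            # ... selection, evaluation and surrogate update would happen here ...
            self._post_update(iteration)
            self._post_iteration(iteration)
        self._pre_finalise()


class LoggingLoop(HookedLoop):
    """Overrides a single hook to record progress after each iteration."""

    def __init__(self, n_iterations):
        super().__init__(n_iterations)
        self.log = []

    def _post_iteration(self, iteration):
        self.log.append(f"finished iteration {iteration}")


loop = LoggingLoop(n_iterations=3)
loop.run()
```

Subclassing keeps the default behaviour untouched; a user overrides only the hooks they need, for instance plotting the tree or saving a checkpoint in _post_iteration().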

Better example notebooks

Especially after new features:

  • show user-defined callbacks
  • show and explain the number of evaluation repeats
  • saving / loading / checkpoints / continuing
  • maybe even stepping the algorithm for yourself, i.e. not use run method, but iteratively use tree selection, tree evaluation, and gp update steps
