stefan-endres / shgo Goto Github PK

Simplicial Homology Global Optimization

Home Page: https://stefan-endres.github.io/shgo

License: MIT License

Python 99.33% Makefile 0.29% Batchfile 0.38%

black-box black-box-model black-box-optimization clustering-algorithm global-optimization mathematics nonlinear nonlinear-optimization optimization science

shgo's People

Contributors

Stargazers

Watchers

Forkers

rkern lepy etudpy minhhg simonmb iamtrask tk-21st passion4energy tessavdheiden pedrogasparotti arkochem not-a-feature

shgo's Issues

Code requires 3.6

I think it's fine to run only on Python 3, but currently the code fails on 3.5, which is still quite a large installed base. We can use Travis or Tox to test for different version compatibility.

Force round-off local minimisation of smallest sampling value

Force round-off of a local minimisation of the smallest sampling value every iteration, even in the case where no minimiser homology is found. This solves the issue of the symmetric sampling method not finding solutions for symmetrically structured problems such as the issue raised in the scipy repository.

sampling_method='sobol' can't handle constraints with args

from shgo import shgo
import numpy as np

def eggholder(x):
return (-(x[1] + 47.0)
* np.sin(np.sqrt(abs(x[0]/2.0 + (x[1] + 47.0))))
- x[0] * np.sin(np.sqrt(abs(x[0] - (x[1] + 47.0))))
)

def f(x): # (cattle-feed)
return 24.55x[0] + 26.75x[1] + 39x[2] + 40.50x[3]

def g1(x):
return 2.3x[0] + 5.6x[1] + 11.1x[2] + 1.3x[3] - 5 # >=0

def g2(x):
return (12x[0] + 11.9x[1] +41.8x[2] + 52.1x[3] - 21
- 1.645 * np.sqrt(0.28x[0]**2 + 0.19x[1]**2
+ 20.5x[2]**2 + 0.62x[3]**2)
) # >=0

def h1(x):
return x[0] + x[1] + x[2] + x[3] - 1 # == 0

cons = ({'type': 'ineq', 'fun': g1},
{'type': 'ineq', 'fun': g2},
{'type': 'eq', 'fun': h1})
bounds = [(0, 1.0),]*4
res = shgo(f, bounds, iters=3, constraints=cons)
res

##################################################
#################### modified ####################
#using sampling method sobol
res_sobol = shgo(f, bounds, iters=3, constraints=cons,sampling_method='sobol')
res_sobol # still working

#using constrain with arguments
def g1_modified(x,i):
return i2.3x[0] + i5.6x[1] + 11.1x[2] + 1.3x[3] - 5 # >=0
cons = ({'type': 'ineq', 'fun': g1_modified, "args":(0,)},
{'type': 'ineq', 'fun': g2},
{'type': 'eq', 'fun': h1})
res_modConstr = shgo(f, bounds, iters=3, constraints=cons)
res_modConstr #still working

#using constrain with arguments AND sampling method sobol
res_modConstr_sobol = shgo(f, bounds, iters=3, constraints=cons,sampling_method='sobol')
res_modConstr_sobol #TypeError: g1_modified() missing 1 required positional argument: 'i'

Remove functions from sobol_seq.py

There are functions in sobol_seq.py which are not used.

is_prime() is only used in prime_ge, which is not used anywhere else.
i4_uniform() is not used anywhere else

The less code we have, the less we need to maintain.

Move testing to pytest

pytest is superior to UnitTest in almost every way. SciPy is also moving their test suite to pytest in the 1.0.0 release

Replace check-and-update pattern with dict.update

Instead of the pattern of

if options is not None:
    if 'f_tol' in options:
        self.minimizer_kwargs['options']['ftol'] = options['f_tol']
...

I think we can rather do

if options is not None:
    self.minimizer_kwargs.update(options)

This will automatically overwrite keys if they exist in options. Less code to maintain.

Implement VertexCache for arbitrary sampling

The 1-dimensional Sobol sequences can't be implemented as Delaunay triangulation object since qhull only accepts dimensions 2 and higher.

This is causing major issues in refactoring and expansion of the code since we have to deal with two data structures in all the functions.

A common data structure for both cases similar to the simplicial sampling method would ease future changes to the code.

Stopping a routine in finite processing time using the standard library

Currently in the go_bench_f_min.py file uses the interruptingcow module (interruptingcow.timeout) to stop the entire while loop in finite time. We cannot asses the time pass at the start of every loop since building on the simplicial complex (by either using delaunay triangulation or the hypercube sampling) can take extremely long on a single iteration.

We want to minimise dependencies for shgo since we want to include it in many other modules.

Therefore we need a way to stop the entire loop in a similar way that interruptingcow by wrapping the entire loop in a with statement and raising a RuntimeError when the processing time limit expires.

Equality constraints not handled

It seems that the equality constraints, referenced in the documentation as eq, are not support. See https://github.com/Stefan-Endres/shgo/blob/master/shgo/_shgo.py#L495.

for cons in constraints:
  if cons['type'] is 'ineq':
    self.g_cons.append(cons['fun'])
    try:
      self.g_args.append(cons['args'])
    except KeyError:
      self.g_args.append(())
self.g_cons = tuple(self.g_cons)
self.g_args = tuple(self.g_args)

If I've missed how to specify, please let me know.

Merge universal_iters branch with master

The universal_iters branch is a major restructuring of the code with the following benefits:

Iteration and initiation methods have been replaced by a single iteration method for an arbitrary sampling method. This allows the code to be more easily understood by external users in addition to allowing the class to proceed with iteration calls externally (for realtime user optimisation without stopping the routine and losing the data on the complex).
Stopping criteria methods for both Sobol and Simplicial sampling are much easier to implement and expand since they are used in a single routine.
Greater versatility in the external input to the routine allowing for minor to potentially large performance improvements.
It is now much simpler to input an arbitrary sampling method into the function by just adding the function to the sampling_method argument.
Minor fixes to the master branch.

Unfortunately, it is not a simple refactoring since the inputs (n and iters arguments) actually need to be changed slightly to produce the same outputs in the overall routine, although the same behaviour should (at least theoretically) be possible to replicate given the correct usage of these parameters. The 3 unittests for ambiguous unittests also need to be removed since they are no longer ambiguous, but can used in a customized optimisation run. All other unittests are kept and working with little to no modifications.

However, just because the unittests are working does not mean the performance has been maintained. Therefore we need a way to compare performance of the two branches in addition to replicating the performance shown in publications. One possibility is adding unittests assessing the function evaluations of the current master branch. Finally some of the changes allowing for Python compatibility other than Python 3.6 which were committed to the master branch need to be rechecked with the new code.

Is there any way to terminate close to a desired number of function evaluations?

I've noticed that {'maxfev':50} often results in >200 actual function evaluations. Is there a way to ensure this number is closer to what one desires, for example when the function to be evaluated is very expensive?

Split README.rst into multiple files

README.rst should be split into files at the top level headings

Question (theory)

It seems plausible that an SHGO variant might be capable of finding good plateaus, which is to say relatively stable local minima where there is no particularly steep dropoff nearby. I realize that isn't a rigorous definition, but do you have thoughts on how this might be hacked to achieve the intent, approximately?

Pytest problems with test classes containing init

Not all tests are running. I receive the following warnings:

shgo/tests/test__shgo.py::TestFunction
  cannot collect test class 'TestFunction' because it has a __init__ constructor

shgo/tests/test__shgo.py::Test1
  cannot collect test class 'Test1' because it has a __init__ constructor

shgo/tests/test__shgo.py::Test2
  cannot collect test class 'Test2' because it has a __init__ constructor

shgo/tests/test__shgo.py::Test3
  cannot collect test class 'Test3' because it has a __init__ constructor

shgo/tests/test__shgo.py::Test4
  cannot collect test class 'Test4' because it has a __init__ constructor

shgo/tests/test__shgo.py::TestLJ
  cannot collect test class 'TestLJ' because it has a __init__ constructor

shgo/tests/test__shgo.py::TestTable
  cannot collect test class 'TestTable' because it has a __init__ constructor

shgo/tests/test__shgo.py::TestInfeasible
  cannot collect test class 'TestInfeasible' because it has a __init__ constructor

-- Docs: http://doc.pytest.org/en/latest/warnings.html

cc test coverage

Configure codeclimate to run test coverage on the repo.

https://docs.codeclimate.com/v1.0/docs/configuring-test-coverage

"type(thing) is" is an antipattern

We should not be checking for explicit type membership, but rather use isinstance.

Discussion

Incorporate the FunctionCache class

Incorporate the FunctionCache class from pointclasses.py into the main code. While the global routine will not evaluate the same value twice (and it is very unlikely that a non-deterministic local minimization would), it will allow us to reduce some complexity in the random sampling code to make it more similar to the simplicial sampling.

is there an option to specify starting point?

Is there an option to specify starting point? Something like this: res = shgo(f, bounds, x0=x0)
where the x0 is the user best guess for the optimum.

Some local minima are missing even if setting a very large `n`.

Some local minima are missing even if setting a very large n. Is this the method's limit or the improper settings of parameters? Is it possible to find all local minima with some scale through this method?

e.g. the rastrigin function

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import shgo

def rastrigin(X):
    return 20. + sum([(x**2 - 10. * np.cos(2. * np.pi * x)) for x in X])

func = rastrigin
xmin, xmax = -5.12, 5.12

x = np.linspace(xmin, xmax, 100)
y = np.linspace(xmin, xmax, 100)
xx, yy = np.meshgrid(x, y)
zz = func((xx, yy))
bounds = [(xmin, xmax)] * 2
ret = shgo(func, bounds,
           n=100000, iters=1, sampling_method='sobol')
print('nfev: {:5d}'.format(ret.nfev))
print('nxl: {:5d}'.format(len(ret.xl)))

plt.figure()
plt.pcolormesh(x, y, zz)
for x1, y1 in ret.xl:
    plt.plot(x1, y1, 'ro')
plt.show()

Fix coverage on coveralls

Coveralls appears to be tracking only the coverage of the test script itself, not the other files. This is why we had such great coverage to begin with and why it has dropped a lot now, since the tests are not all running (see #13).

We need to make it so that the coverage is tracked across all files in the project not just the test script.

Parallelization

Overview

There are many sub-routines in shgo that are low hanging fruit for parallelization, most importantly the sampling mapping of the objective function itself (in cases where it is possible to parallelize the problem). In addition we can optimize the sampling size of an iteration based on the computational resources available.

We need to be careful with both dependencies and code structure changes we include for several reasons. First our dependencies on scipy.optimize.minimize and scipy.spatial.Delaunay. Secondly our ambition to include shgo in scipy.optimize means it should ideally have the same dependencies and structure. Finally we want to minimize reliance on maintenance from other packages which can lead to such issues that we had with using multiprocessing_on_dill in tgo.

My suggestion is to use numba to avoid needing to change the code structure at all. We can do this using tricks such as the one used in poliastro:
https://github.com/poliastro/poliastro/blob/0.6.x/src/poliastro/jit.py
https://github.com/poliastro/poliastro/blob/master/setup.py#L40

which simply maps the decorator to the dependency if it is installed or does nothing. Obviously it is possible that we can use the same tricks for other libraries and methods by redefining range functions etc in our code.

While it is possible that SciPy will eventually include numba as a dependency, based on discussions in the scipy-dev mailing lists this will not happen in the near future:
https://mail.python.org/pipermail/scipy-dev/2018-March/022576.html

However, for now we should be able to maintain numba as an optional dependency as describe in the rest of this post. My idea is to provide two main modes of parallelization, based on both CPU and GPU parallelization. So therefore using numba for GPU parallelization would be ideal since it avoids extra dependencies. Finally numba provides us with access to LLVM that can be used in the sampling generation.

GPU

User architecture can be found or specified and then we can map it to our own decorator.

Nvidia

We can use @numba.cuda.jit, I propose an early test of the objective function so that the user can be warned if this fails.
https://devblogs.nvidia.com/seven-things-numba/
https://numba.pydata.org/numba-doc/latest/cuda/index.html

AMD

We can use @hsa.jit(device=True)
https://numba.pydata.org/numba-doc/latest/hsa/overview.html

CPU

Our options for CPU parallelization
http://numba.pydata.org/numba-doc/dev/user/parallel.html
or multiprocessing_on_dill etc.

However, it appears that parallelization with numba isn't as simple as just adding @jit(parallel = True)
https://stackoverflow.com/questions/45610292/how-to-make-numba-jit-use-all-cpu-cores-parallelize-numba-jit

So we should also do a few tests on non-trivial functions to see if it is worth implementing.

Avoid using `is` to check string equality

There are a couple of places in the code where we use is to check for string equality, so

if a is 'teststring':
    # whatever

This is not guaranteed to work for all strings as it relies on an implementation detail in cpython (where small strings are cached. It should be taken away wherever it is used.

Documentation typos and expanded examples

Fix the typos in the documentation found here and improve the readability as well as detailing the use of the arguments inside the algorithm #27.

The examples should also include usage of advanced features such as the symmetry argument of the algorithm to increase visibility.

How to supply a constraint for a boolean parameter?

Hi,

The function I would like to optimize has (amongst others) one parameter that only allows a boolean value (so the parameter can be zero or one).

How could I set this as a constraint for shgo?

Thanks in advance!

Fix links to references in documentation

shgo stuck searching at x_min, x_max bounds

I'm trying to use shgo to tune the gains of a control algorithm in a black-box fashion, by iteratively running the controller in a simulation (each simulation only takes about 20 seconds to complete) and computing a cost from the simulation results, which can then be used as the objective function f for shgo. However, having left shgo running for the past few hours, I'm not seeing any progress being made. What I have noticed is the following:

shgo seems to be stuck switching the values in the optimization parameter vector x between the bounds x_min and x_max that were given to it, and doesn't seem to have done any searching within the bounds themselves.
In the first few minutes of running shgo, it iterates quickly through each simulation run, however, as time goes on it slows down substantially, spending minutes between each iteration. What is shgo doing in the minutes between simulation runs, is there a way to produce verbose output to investigate further?

Due to the strange results seen above, there could be two likely culprits that initially come to mind. The first is that my optimization vector x is of dimension 15, and after reading the shgo documentation it looks like anything above 10 dimensions will be a struggle for shgo to solve quickly (due to symmetries in the system dynamics, I can likely get this number below 10 though). The second potential problem is that the objective function (i.e. the cost output from the simulation) can return inf (or sys.float_info.max) if the controller causes the system that it is controlling to become unstable and have its states blow up, can shgo deal with objective functions that return inf?

Warning on bounds w.r.t. symmetry

Hi,

Here you are using:

if (self.bounds[symmetry[i]] is not
        self.bounds[symmetry[j]]):

The line might be changed into:

if (not np.array_equal(self.bounds[symmetry[i]], self.bounds[symmetry[j]])):

This is because the tuples are changed into numpy arrays here.

Symmetry overwrite

Hi!

Is it correct that you overwrite the symmetry passed?

if self.symmetry:
    self.symmetry = [0, ]*len(self.bounds)

The skip argument is not consistently handled in sobol_points_*

The code expects to pass a skip argument to sobol_points, but sobol_points_10k doesn't accept such an argument.

This shows two things:

This will break in high dimensions
We don't have a test which breaks with this problem right now

Remove "if 0" code to another branch

I understand why you have left the code there, but if it's not working it shouldn't ship effectively commented out this way. It's better to move this stuff to another branch where it is active and fix it there.