brml / climin
Optimizers for machine learning
License: Other
why?
consistency!
When using the criterion climin.stops.OnSignal with signal.SIGUSR1, the module fails with the following message:
Exception TypeError: 'signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object' in <bound method OnSignal.__del__ of <climin.stops.stops.OnSignal object at 0x22b58fd0>> ignored
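The error suggests that __del__ tries to re-install a previous handler value that signal.signal() refuses (e.g. None, which signal.getsignal() returns for a handler installed from C). A guarded __del__ could avoid the crash; below is a rough sketch (the class name and details are assumptions, not climin's actual implementation, and it is POSIX-only because of SIGUSR1):

```python
import os
import signal


class OnSignalCriterion(object):
    """Hypothetical sketch of an OnSignal-style stopping criterion that
    guards __del__ against handler values signal.signal() rejects."""

    def __init__(self, sig=signal.SIGUSR1):
        self.sig = sig
        self.stopped = False
        # Remember the previous handler so we can restore it later.
        self.prev_handler = signal.signal(sig, self._handler)

    def _handler(self, signum, frame):
        self.stopped = True

    def __call__(self, info=None):
        # Stopping criteria are callables returning True when to stop.
        return self.stopped

    def __del__(self):
        # signal.signal only accepts SIG_IGN, SIG_DFL or a callable;
        # getsignal() may have returned None for a handler installed
        # outside Python, which triggers the TypeError quoted above.
        if self.prev_handler is not None:
            signal.signal(self.sig, self.prev_handler)
```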
Can this implementation of L-BFGS-B be used as a replacement for stochastic gradient descent?
There are several public releases already, which can serve as inspiration. I think that the first one is mainly interesting because it looks maximally simple compared to the others.
The signature should be
def __init__(self, f, fprime, f_Hp, ...)
which means that most of the complexity of HF (mainly the whole Gauss-Newton thing) is abstracted away. Structural damping is also part of f_Hp.
We will need a specialized version of CG, though. This includes adding at least one line search that works on the GPU.
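The CG building block could be sketched roughly as follows. This is not climin's implementation, just a minimal illustration assuming f_Hp is a callable returning Hessian-vector products (with any damping folded in), as in the proposed signature:

```python
import numpy as np


def conjugate_gradient(f_Hp, b, x0, n_iters=50, tol=1e-8):
    """Minimal CG sketch: solve H x = b given only Hessian-vector
    products f_Hp(v) -- the building block HF needs."""
    x = x0.copy()
    r = b - f_Hp(x)            # initial residual
    d = r.copy()               # initial search direction
    rr = r.dot(r)
    for _ in range(n_iters):
        if rr < tol:
            break
        Hd = f_Hp(d)
        alpha = rr / d.dot(Hd)  # exact minimizer along d
        x += alpha * d
        r -= alpha * Hd
        rr_new = r.dot(r)
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x
```

A GPU version would keep the same structure but replace the dot products and updates with device operations.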
@bayerj
Line 152 of rmsprop.py reads
self.moving_mean_squared = 1
I think the initial value should be 0 instead of 1. Any reason why 1 is better than 0?
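The difference is easy to see in a one-step toy calculation (a scalar sketch, not the actual climin code): with an initial value of 1, a small first gradient produces a tiny first step, whereas with 0 the first step magnitude is step_rate / sqrt(1 - decay) regardless of the gradient's scale.

```python
import math


def first_rmsprop_step(g, init, decay=0.9, step_rate=0.1):
    """One RmsProp update from a given initial moving mean square.

    Toy scalar sketch to compare initializing the accumulator to 1
    versus 0."""
    mms = decay * init + (1 - decay) * g ** 2
    return step_rate * g / math.sqrt(mms)
```

For g = 0.01, init=1 gives a step of about 0.001, while init=0 gives about 0.32: the choice of initial value changes the early behaviour by orders of magnitude for small gradients.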
Hello.
At the following page: https://climin.readthedocs.org/en/latest/tutorial.html, there is an error.
The following code is found about halfway down:
import climin
opt = climin.GradientDescent(parameters, d_loss_wrt_pars, step_rate=0.1, momentum=.95, args=args)
The problem is that step_rate is not an option for climin.GradientDescent; it should be steprate instead.
The bfgs object does not have a logfunc attribute or method
Let me point out that with Anaconda 2.5, Python 2.7 and Windows x64, after installing climin with pip, it is necessary to comment out this section in climin's __init__.py.
Then climin runs superbly.
Today, 64-bit machines are probably the majority and 32-bit machines the exception.
Currently, most of the optimizers have different tests. There should be a central module containing three functions to optimize:
The last one needs an implementation of the model and the creation of a simple data set for which finding the global minimum is easily verifiable. I am thinking of something like a mixture of two Gaussians and logistic regression.
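Such a central module could start out like this (a sketch with hypothetical names): a quadratic as the maximally simple case and Rosenbrock as a classic non-convex benchmark; the logistic-regression-on-a-Gaussian-mixture objective would additionally need data generation.

```python
import numpy as np


def quadratic(x):
    """Simple convex bowl; global minimum at the origin."""
    return 0.5 * (x ** 2).sum()


def quadratic_prime(x):
    return x


def rosenbrock(x):
    """Classic banana-shaped valley; global minimum at (1, 1)."""
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_prime(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])
```

Every optimizer's test could then just run on these shared objectives and assert convergence to the known minima.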
I think in the original paper on Adadelta, there is no step_rate.
Hi, I'd love to see this package in PyPI. Any plans to do so? It would help installing the package and declaring it as a dependency.
Just to reserve the package name and to test if things work, I registered and uploaded the most recent version to PyPI. I will either remove the package from PyPI or move the PyPI package ownership to you, whatever you wish.
In order to use PyPI, you'd need to fix the version numbering to follow PEP 0440: https://www.python.org/dev/peps/pep-0440/. Thus, you would need to change the version number in setup.py to something like 0.1a1, 0.1b4, 0.1rc2, etc.
Uploading to PyPI can be done with python setup.py sdist upload.
If you are interested in this and need any help, I'd be happy to help if I can.
There is an issue with FORTRAN libraries replacing signal handlers and not being able to recover them:
http://stackoverflow.com/questions/15457786/ctrl-c-crashes-python-after-importing-scipy-stats
The solution is to add the following to climin/__init__.py:

import ctypes
import imp
import os
import sys

if sys.platform == 'win32':
    basepath = imp.find_module('numpy')[1]
    ctypes.CDLL(os.path.join(basepath, 'core', 'libmmd.dll'))
    ctypes.CDLL(os.path.join(basepath, 'core', 'libifcoremd.dll'))
And then, in the Windows case, extend OnSignal with:

import win32api
win32api.SetConsoleCtrlHandler(self._console_ctrl_handler, 1)
After that, climin has to be imported before scipy by the user.
Several reasons might lead you to stop optimization:
It would be nice to have convenience functions for this. Most are easy, but e.g. the second one needs to keep track of past values; thus, the stopping criterion is stateful. I have a feeling we will overshoot if we try to solve all of these.
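A stateful criterion of this kind could be sketched as follows (the class name is hypothetical; it stops once the loss has not improved for a given number of checks):

```python
class NoImprovement(object):
    """Sketch of a stateful stopping criterion: returns True when the
    loss has not improved for `patience` consecutive checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float('inf')
        self.since_best = 0

    def __call__(self, info):
        loss = info['loss']
        if loss < self.best:
            # New best: reset the counter.
            self.best = loss
            self.since_best = 0
        else:
            self.since_best += 1
        return self.since_best >= self.patience
```

Like the existing criteria in climin.stops, it is just a callable on the info dict, so it composes with the others.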
Optimization schemes with complex numbers are widely used in physics, and recently, machine learning.
I strongly suggest adding support for complex numbers to optimization engines like RmsProp et al.
We just need a few lines of change and several tests.
E.g., climin/rmsprop.py, lines 165-167:

self.moving_mean_squared = (
    self.decay * self.moving_mean_squared
    + (1 - self.decay) * gradient ** 2)

with the last line changed to

    + (1 - self.decay) * np.abs(gradient) ** 2)
A single-line change would make it applicable to complex numbers. The same is true for Adam and Adadelta.
GradientDescent, on the other hand, already works without any change.
Rprop may need a bit more effort; I have no clue yet how to make it compatible with complex numbers, due to the ill-defined sign function for complex numbers.
@bayerj
It seems there is a bug in adadelta.py when momentum is used. The momentum correction can be applied to adadelta, rmsprop and other stochastic updates. The potential bug is at line 110 of adadelta.py:
def _iterate(self):
    for args, kwargs in self.args:
        step_m1 = self.step
        d = self.decay
        o = self.offset
        m = self.momentum

        step1 = step_m1 * m * self.step_rate
        self.wrt -= step1

        gradient = self.fprime(self.wrt, *args, **kwargs)

        self.gms = (d * self.gms) + (1 - d) * gradient ** 2
        step2 = sqrt(self.sms + o) / sqrt(self.gms + o) * gradient * self.step_rate
        self.wrt -= step2

        self.step = step1 + step2
        self.sms = (d * self.sms) + (1 - d) * self.step ** 2

        self.n_iter += 1

        yield {
            'n_iter': self.n_iter,
            'gradient': gradient,
            'args': args,
            'kwargs': kwargs,
        }
I think it should be
step1 = step_m1 * m
instead of
step1 = step_m1 * m * self.step_rate
Correct me if I am wrong.
Note that in rmsprop.py, line 160 is
step1 = step_m1 * self.momentum
which is correct.
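The point can be checked with a scalar toy calculation (a sketch, not the climin code): since self.step from the previous iteration already carries a factor of step_rate, multiplying it by step_rate again scales the momentum contribution by step_rate ** 2.

```python
step_rate = 0.1
m = 0.9

# The previous step already contains one factor of step_rate, because
# both step1 and step2 were scaled by it in the previous iteration.
prev_step = 0.05 * step_rate

buggy_momentum_term = prev_step * m * step_rate  # scales like step_rate ** 2
fixed_momentum_term = prev_step * m              # scales like step_rate,
                                                 # matching rmsprop.py
```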
@osdf has code; it only has to be adapted a little.
I just realized that if we never calculate more than needed in the optimizer's loop (because it can be done from the outside), we actually don't need the stop functionality. Yields are rather fast (compared to model evaluations). This would make the code a lot easier.
Any objections?
Currently, the signature of optimizers only allows the following:
def __init__(self, wrt, f, ...):
# ...
However, in some cases this can lead to problems with the GPU: e.g., Theano does not guarantee that changing shared variables (e.g. retrieved with borrow=True) will actually change the real thing in the background.
I therefore suggest to add the following behaviour:
def __init__(self, wrt, f, ...):
    if isinstance(wrt, tuple):
        self._get_wrt, self._set_wrt = wrt
    else:
        # the old methods from below work for the array
        self._wrt = wrt

def _get_wrt(self):
    return self._wrt

def _set_wrt(self, val):
    self._wrt[:] = val
The downside is that we will lose some in-place operations. But I am not too sure whether that is actually the case.
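A self-contained sketch of the proposed dispatch (the class and the gradient-descent step are purely illustrative, not climin's API):

```python
import numpy as np


class SketchOptimizer(object):
    """Illustrative sketch: accept either a raw parameter array or a
    (getter, setter) pair, e.g. for GPU-backed parameters."""

    def __init__(self, wrt, fprime, step_rate=0.1):
        if isinstance(wrt, tuple):
            # Caller controls how parameters are read and written.
            self._get_wrt, self._set_wrt = wrt
        else:
            self._wrt = wrt
            self._get_wrt = lambda: self._wrt
            self._set_wrt = self._inplace_set
        self.fprime = fprime
        self.step_rate = step_rate

    def _inplace_set(self, val):
        self._wrt[:] = val

    def step(self):
        wrt = self._get_wrt()
        self._set_wrt(wrt - self.step_rate * self.fprime(wrt))
```

With a (getter, setter) tuple, the optimizer never touches the array directly, which sidesteps the borrow=True ambiguity; the cost is the extra copy in the setter.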
Currently, the info dictionary is not consistent across optimizers. Some return a lot of information, some barely anything. Certain information, like the iteration count n_iter, should be returned by all optimizers. This becomes particularly important for the stopping condition mechanism. The stopping conditions (most likely?) work on the info dict and require certain labels, like wrt, loss, n_iter, etc.
Can this be done in the base class, even? So that optimizer-independent information is added to the dict in the base class and the optimizer only adds specific information to it before yielding?
Also, some optimizers use the dict(...) syntax and others the {...} syntax to create the dict. Make this consistent!
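One way the base class could guarantee the common keys (a sketch; the class and method names here are assumptions, not climin's actual base class): the subclass yields only its specific fields, and the base class merges in the shared ones.

```python
class MinimizerSketch(object):
    """Sketch: the base class guarantees common info keys, subclasses
    only add optimizer-specific ones in _iterate."""

    def __init__(self, wrt):
        self.wrt = wrt
        self.n_iter = 0

    def __iter__(self):
        for info in self._iterate():
            self.n_iter += 1
            # Keys every optimizer must expose, added centrally:
            info.update(n_iter=self.n_iter, wrt=self.wrt)
            yield info

    def _iterate(self):
        raise NotImplementedError


class DummyOptimizer(MinimizerSketch):
    def _iterate(self):
        while True:
            yield {'gradient': None}   # optimizer-specific fields only
```

Stopping conditions could then rely on n_iter and wrt being present, regardless of the optimizer.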
I guess we will use Sphinx.
The construction of different arguments for each iteration of the optimizer is somewhat tedious. However, some optimizers (HF, KSD) take several arguments. E.g., KSD needs a different argument iterator for the gradient calculation, the subspace construction and the inner loop. If that resulted in 6 different arguments passed to the constructor, that'd be rather ugly.
Needs more thought.
There are multiple ways of initializing parameters. See e.g. 'Efficient BackProp' by LeCun et al., or Martens' Hessian-free paper.
Currently, some line searches do not cache the function values at the last step. The callers thus have to evaluate those again if necessary (or if the user wants to inspect them), which is expensive in several cases, e.g. batch learning.
There should be a unified API on how to get the latest f and f' results.
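Such an API could be as simple as caching the last accepted value as an attribute. A sketch (the class and attribute names are assumptions, not climin's line search interface):

```python
class BacktrackingLineSearch(object):
    """Sketch of a line search that caches the function value at the
    accepted point, so callers need not re-evaluate f afterwards."""

    def __init__(self, f, decay=0.5, max_iter=20):
        self.f = f
        self.decay = decay
        self.max_iter = max_iter
        self.val = None       # f at the accepted point, cached here

    def search(self, x, direction, initial=1.0):
        f0 = self.f(x)
        t = initial
        for _ in range(self.max_iter):
            val = self.f(x + t * direction)
            if val < f0:
                self.val = val    # cache instead of discarding
                return t
            t *= self.decay
        self.val = f0
        return 0.0
```

After ls.search(...), the caller reads ls.val instead of calling f again; the same pattern would extend to caching the last gradient.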
Contrary to what the docstring says, gradient descent does not accept a sequence for the step_rate parameter.
I'm happy to submit a pull request for this, if there's still interest!
So we had that issue yesterday that climin should never stop iterating because it thinks it converged. However, it might happen that it diverges, and that needs to be handled.
E.g., I am currently playing around with logistic regression and NCG, which sometimes diverges because the direction becomes invalid (e.g. NaN).
What is the best behaviour here?
I'm slightly confused about the final steps described in the doc vs. the code below: should the Nesterov momentum be applied before updating the parameters, i.e. self.wrt -= step1 + step2?

step1 = step_m1 * self.momentum
self.wrt -= step1
gradient = self.fprime(self.wrt, *args, **kwargs)
self.moving_mean_squared = (
    self.decay * self.moving_mean_squared
    + (1 - self.decay) * gradient ** 2)
step2 = self.step_rate * gradient
step2 /= sqrt(self.moving_mean_squared + 1e-8)
self.wrt -= step2
step = step1 + step2