brml / climin
Optimizers for machine learning
License: Other
why?
consistency!
When using the criterion climin.stops.OnSignal with signal.SIGUSR1, the module fails with the following message:
Exception TypeError: 'signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object' in <bound method OnSignal.__del__ of <climin.stops.stops.OnSignal object at 0x22b58fd0>> ignored
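The error suggests that __del__ tries to re-install a previous handler value that signal.signal() refuses (e.g. None, which signal.getsignal() returns for a handler installed from C). A guarded __del__ could avoid the crash; below is a rough sketch (the class name and details are assumptions, not climin's actual implementation, and it is POSIX-only because of SIGUSR1):

```python
import os
import signal


class OnSignalCriterion(object):
    """Hypothetical sketch of an OnSignal-style stopping criterion that
    guards __del__ against handler values signal.signal() rejects."""

    def __init__(self, sig=signal.SIGUSR1):
        self.sig = sig
        self.stopped = False
        # Remember the previous handler so we can restore it later.
        self.prev_handler = signal.signal(sig, self._handler)

    def _handler(self, signum, frame):
        self.stopped = True

    def __call__(self, info=None):
        # Stopping criteria are callables returning True when to stop.
        return self.stopped

    def __del__(self):
        # signal.signal only accepts SIG_IGN, SIG_DFL or a callable;
        # getsignal() may have returned None for a handler installed
        # outside Python, which triggers the TypeError quoted above.
        if self.prev_handler is not None:
            signal.signal(self.sig, self.prev_handler)
```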
Can this implementation of L-BFGS-B be used as a replacement for stochastic gradient descent?
There are several public releases already, which can serve as inspiration. I think that the first one is mainly interesting because it looks maximally simple compared to the others.
The signature should be
def __init__(self, f, fprime, f_Hp, ...)
which means that most of the complexity of HF (mainly the whole Gauss-Newton thing) is abstracted away. Structural damping is also part of f_Hp.
We will need a specialized version of CG, though. This includes adding at least one line search that works on the GPU.
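The CG building block could be sketched roughly as follows. This is not climin's implementation, just a minimal illustration assuming f_Hp is a callable returning Hessian-vector products (with any damping folded in), as in the proposed signature:

```python
import numpy as np


def conjugate_gradient(f_Hp, b, x0, n_iters=50, tol=1e-8):
    """Minimal CG sketch: solve H x = b given only Hessian-vector
    products f_Hp(v) -- the building block HF needs."""
    x = x0.copy()
    r = b - f_Hp(x)            # initial residual
    d = r.copy()               # initial search direction
    rr = r.dot(r)
    for _ in range(n_iters):
        if rr < tol:
            break
        Hd = f_Hp(d)
        alpha = rr / d.dot(Hd)  # exact minimizer along d
        x += alpha * d
        r -= alpha * Hd
        rr_new = r.dot(r)
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x
```

A GPU version would keep the same structure but replace the dot products and updates with device operations.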
@bayerj
Line 152 of rmsprop.py reads
self.moving_mean_squared = 1
I think the initial value should be 0 instead of 1. Any reason why 1 is better than 0?
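The difference is easy to see in a one-step toy calculation (a scalar sketch, not the actual climin code): with an initial value of 1, a small first gradient produces a tiny first step, whereas with 0 the first step magnitude is step_rate / sqrt(1 - decay) regardless of the gradient's scale.

```python
import math


def first_rmsprop_step(g, init, decay=0.9, step_rate=0.1):
    """One RmsProp update from a given initial moving mean square.

    Toy scalar sketch to compare initializing the accumulator to 1
    versus 0."""
    mms = decay * init + (1 - decay) * g ** 2
    return step_rate * g / math.sqrt(mms)
```

For g = 0.01, init=1 gives a step of about 0.001, while init=0 gives about 0.32: the choice of initial value changes the early behaviour by orders of magnitude for small gradients.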
Hello.
At the following page: https://climin.readthedocs.org/en/latest/tutorial.html, there is an error.
The following code is found about halfway down:
import climin
opt = climin.GradientDescent(parameters, d_loss_wrt_pars, step_rate=0.1, momentum=.95, args=args)
The problem is that step_rate is not an option for climin.GradientDescent; it should be steprate instead.
The bfgs object does not have a logfunc attribute or method
Let me point out that with Anaconda 2.5, Python 2.7 and Windows x64, after installing climin with pip, it is necessary to comment out this section in climin's __init__.py.
Then climin runs superbly.
Today, 64-bit machines are probably the majority and 32-bit machines the exception.
Currently, most of the optimizers have different tests. There should be a central module containing three functions to optimize:
The last one needs an implementation of the model and the creation of a simple data set for which finding the global minimum is easily verifiable. I am thinking of something like a mixture of two Gaussians and logistic regression.
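Such a central module could start out like this (a sketch with hypothetical names): a quadratic as the maximally simple case and Rosenbrock as a classic non-convex benchmark; the logistic-regression-on-a-Gaussian-mixture objective would additionally need data generation.

```python
import numpy as np


def quadratic(x):
    """Simple convex bowl; global minimum at the origin."""
    return 0.5 * (x ** 2).sum()


def quadratic_prime(x):
    return x


def rosenbrock(x):
    """Classic banana-shaped valley; global minimum at (1, 1)."""
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_prime(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])
```

Every optimizer's test could then just run on these shared objectives and assert convergence to the known minima.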
I think in the original paper on Adadelta, there is no step_rate.
Hi, I'd love to see this package in PyPI. Any plans to do so? It would help installing the package and declaring it as a dependency.
Just to reserve the package name and to test if things work, I registered and uploaded the most recent version to PyPI. I will either remove the package from PyPI or move the PyPI package ownership to you, whatever you wish.
In order to use PyPI, you'd need to fix the version numbering to follow PEP 0440: https://www.python.org/dev/peps/pep-0440/. Thus, you would need to change the version number in setup.py to something like 0.1a1, 0.1b4, 0.1rc2, etc.
Uploading to PyPI can be done with python setup.py sdist upload.
If you are interested in this and need any help, I'd be happy to help if I can.
There is an issue with FORTRAN libraries replacing signal handlers and not being able to recover them:
http://stackoverflow.com/questions/15457786/ctrl-c-crashes-python-after-importing-scipy-stats
The solution is to add the following to climin/__init__.py:

import ctypes
import imp
import os
import sys

if sys.platform == 'win32':
    basepath = imp.find_module('numpy')[1]
    ctypes.CDLL(os.path.join(basepath, 'core', 'libmmd.dll'))
    ctypes.CDLL(os.path.join(basepath, 'core', 'libifcoremd.dll'))
And then, in the Windows case, extend OnSignal with:

import win32api
win32api.SetConsoleCtrlHandler(self._console_ctrl_handler, 1)
After that, climin has to be imported before scipy by the user.
Several reasons might lead you to stop optimization:
It would be nice to have convenience functions for this. Most are easy, but e.g. the second one needs to keep track of past values; thus, the stopping criterion is stateful. I have a feeling we will overshoot if we try to solve all of these.
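A stateful criterion of this kind could be sketched as follows (the class name is hypothetical; it stops once the loss has not improved for a given number of checks):

```python
class NoImprovement(object):
    """Sketch of a stateful stopping criterion: returns True when the
    loss has not improved for `patience` consecutive checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float('inf')
        self.since_best = 0

    def __call__(self, info):
        loss = info['loss']
        if loss < self.best:
            # New best: reset the counter.
            self.best = loss
            self.since_best = 0
        else:
            self.since_best += 1
        return self.since_best >= self.patience
```

Like the existing criteria in climin.stops, it is just a callable on the info dict, so it composes with the others.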
Optimization schemes with complex numbers are widely used in physics, and recently, machine learning.
I strongly suggest adding support for complex numbers to optimization engines like RmsProp et al.
We just need a few lines of change and several tests.
E.g., climin/rmsprop.py, lines 165-167:

self.moving_mean_squared = (
    self.decay * self.moving_mean_squared
    + (1 - self.decay) * gradient ** 2)

with the last line changed to

    + (1 - self.decay) * np.abs(gradient) ** 2)
A single-line change would make it applicable to complex numbers. The same is true for Adam and Adadelta.
GradientDescent, on the other hand, already works without any change.
Rprop may need a bit more effort; I have no clue yet how to make it compatible with complex numbers, due to the ill-defined sign function for complex numbers.
@bayerj
It seems there is a bug in adadelta.py when momentum is used. The momentum correction can be applied to adadelta, rmsprop and other stochastic updates. The potential bug is at line 110 of adadelta.py:
def _iterate(self):
    for args, kwargs in self.args:
        step_m1 = self.step
        d = self.decay
        o = self.offset
        m = self.momentum

        step1 = step_m1 * m * self.step_rate
        self.wrt -= step1

        gradient = self.fprime(self.wrt, *args, **kwargs)

        self.gms = (d * self.gms) + (1 - d) * gradient ** 2
        step2 = sqrt(self.sms + o) / sqrt(self.gms + o) * gradient * self.step_rate
        self.wrt -= step2

        self.step = step1 + step2
        self.sms = (d * self.sms) + (1 - d) * self.step ** 2

        self.n_iter += 1

        yield {
            'n_iter': self.n_iter,
            'gradient': gradient,
            'args': args,
            'kwargs': kwargs,
        }
I think it should be
step1 = step_m1 * m
instead of
step1 = step_m1 * m * self.step_rate
Correct me if I am wrong.
Note that in rmsprop.py, line 160 is
step1 = step_m1 * self.momentum
which is correct.
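The point can be checked with a scalar toy calculation (a sketch, not the climin code): since self.step from the previous iteration already carries a factor of step_rate, multiplying it by step_rate again scales the momentum contribution by step_rate ** 2.

```python
step_rate = 0.1
m = 0.9

# The previous step already contains one factor of step_rate, because
# both step1 and step2 were scaled by it in the previous iteration.
prev_step = 0.05 * step_rate

buggy_momentum_term = prev_step * m * step_rate  # scales like step_rate ** 2
fixed_momentum_term = prev_step * m              # scales like step_rate,
                                                 # matching rmsprop.py
```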
@osdf has code; it only has to be adapted a little.
I just realized that if we never calculate more than needed in the optimizer's loop (because it can be done from the outside), we actually don't need the stop functionality. Yields are rather fast (compared to model evaluations). This would make the code a lot easier.
Any objections?
Currently, the signature of optimizers only allows the following:
def __init__(self, wrt, f, ...):
# ...
However, in some cases this can lead to problems with the GPU: e.g., Theano does not guarantee that changing shared variables (e.g. retrieved with borrow=True) will actually change the real thing in the background.
I therefore suggest to add the following behaviour:
def __init__(self, wrt, f, ...):
    if isinstance(wrt, tuple):
        self._get_wrt, self._set_wrt = wrt
    else:
        # the old methods from below work for the array
        self._wrt = wrt

def _get_wrt(self):
    return self._wrt

def _set_wrt(self, val):
    self._wrt[:] = val
The downside is that we will lose some in-place operations. But I am not too sure whether that is actually the case.
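A self-contained sketch of the proposed dispatch (the class and the gradient-descent step are purely illustrative, not climin's API):

```python
import numpy as np


class SketchOptimizer(object):
    """Illustrative sketch: accept either a raw parameter array or a
    (getter, setter) pair, e.g. for GPU-backed parameters."""

    def __init__(self, wrt, fprime, step_rate=0.1):
        if isinstance(wrt, tuple):
            # Caller controls how parameters are read and written.
            self._get_wrt, self._set_wrt = wrt
        else:
            self._wrt = wrt
            self._get_wrt = lambda: self._wrt
            self._set_wrt = self._inplace_set
        self.fprime = fprime
        self.step_rate = step_rate

    def _inplace_set(self, val):
        self._wrt[:] = val

    def step(self):
        wrt = self._get_wrt()
        self._set_wrt(wrt - self.step_rate * self.fprime(wrt))
```

With a (getter, setter) tuple, the optimizer never touches the array directly, which sidesteps the borrow=True ambiguity; the cost is the extra copy in the setter.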
Currently, the info dictionary is not consistent across optimizers. Some return a lot of information, some barely anything. Certain information, like the iteration count n_iter, should be returned by all optimizers. This becomes particularly important for the stopping condition mechanism. The stopping conditions (most likely?) work on the info dict and require certain labels, like wrt, loss, n_iter, etc.
Can this be done in the base class, even? So that optimizer-independent information is added to the dict in the base class and the optimizer only adds specific information to it before yielding?
Also, some optimizers use the dict(...) syntax and others the {...} syntax to create the dict. Make this consistent!
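One way the base class could guarantee the common keys (a sketch; the class and method names here are assumptions, not climin's actual base class): the subclass yields only its specific fields, and the base class merges in the shared ones.

```python
class MinimizerSketch(object):
    """Sketch: the base class guarantees common info keys, subclasses
    only add optimizer-specific ones in _iterate."""

    def __init__(self, wrt):
        self.wrt = wrt
        self.n_iter = 0

    def __iter__(self):
        for info in self._iterate():
            self.n_iter += 1
            # Keys every optimizer must expose, added centrally:
            info.update(n_iter=self.n_iter, wrt=self.wrt)
            yield info

    def _iterate(self):
        raise NotImplementedError


class DummyOptimizer(MinimizerSketch):
    def _iterate(self):
        while True:
            yield {'gradient': None}   # optimizer-specific fields only
```

Stopping conditions could then rely on n_iter and wrt being present, regardless of the optimizer.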
I guess we will use Sphinx.
The construction of different arguments for each iteration of the optimizer is somewhat tedious. However, some optimizers (HF, KSD) take several arguments. E.g., KSD needs a different argument iterator for the gradient calculation, the subspace construction and the inner loop. If that resulted in 6 different arguments passed to the constructor, that'd be rather ugly.
Needs more thought.
There are multiple ways of initializing parameters. See e.g. 'Efficient BackProp' by LeCun et al., or Martens' Hessian-free paper.
Currently, some line searches do not cache the function values at the last step. The callers thus have to evaluate those again if necessary (or if the user wants to inspect them), which is expensive in several cases, e.g. batch learning.
There should be a unified API on how to get the latest f and f' results.
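Such an API could be as simple as caching the last accepted value as an attribute. A sketch (the class and attribute names are assumptions, not climin's line search interface):

```python
class BacktrackingLineSearch(object):
    """Sketch of a line search that caches the function value at the
    accepted point, so callers need not re-evaluate f afterwards."""

    def __init__(self, f, decay=0.5, max_iter=20):
        self.f = f
        self.decay = decay
        self.max_iter = max_iter
        self.val = None       # f at the accepted point, cached here

    def search(self, x, direction, initial=1.0):
        f0 = self.f(x)
        t = initial
        for _ in range(self.max_iter):
            val = self.f(x + t * direction)
            if val < f0:
                self.val = val    # cache instead of discarding
                return t
            t *= self.decay
        self.val = f0
        return 0.0
```

After ls.search(...), the caller reads ls.val instead of calling f again; the same pattern would extend to caching the last gradient.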
Contrary to what the docstring says, gradient descent does not accept a sequence for the step_rate parameter.
I'm happy to submit a pull request for this, if there's still interest!
So we had that issue yesterday that climin should never stop iterating because it thinks it converged. However, it might happen that it diverges, and that needs to be handled.
E.g., I am currently playing around with logistic regression and NCG, which sometimes diverges because the direction becomes invalid (e.g. NaN).
What is the best behaviour here?
I'm slightly confused about the final steps described in the doc vs. the code below: should the Nesterov momentum be applied before updating the parameters, i.e. self.wrt -= step1 + step2?

step1 = step_m1 * self.momentum
self.wrt -= step1
gradient = self.fprime(self.wrt, *args, **kwargs)
self.moving_mean_squared = (
    self.decay * self.moving_mean_squared
    + (1 - self.decay) * gradient ** 2)
step2 = self.step_rate * gradient
step2 /= sqrt(self.moving_mean_squared + 1e-8)
self.wrt -= step2
step = step1 + step2