Comments (4)
Thanks for reporting. I'll take a look today.
The reason is that the gradient was incorrectly calculated. Since you have added diagonal noise to `K`, the derivatives with respect to `l` and `sigma` need to be computed before the noise is added to `K`.
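For reference, this is the gradient of the GP log marginal likelihood that the code below implements (see the math.stackexchange link in the code for the derivation); here $K$ denotes the noisy matrix used in the solves, while $\partial K/\partial\theta$ is the derivative of the noise-free kernel:

$$
\frac{\partial \log L}{\partial \theta}
= \frac{1}{2}\, y^\top K^{-1} \frac{\partial K}{\partial \theta} K^{-1} y
- \frac{1}{2}\, \operatorname{tr}\!\left( K^{-1} \frac{\partial K}{\partial \theta} \right)
$$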
Other tweaks:

- You should cache `makeDist(xData_, xData_)` outside `kernel()` and `operator()`, since it stays the same.
- Also cache the Cholesky decomposition of `K` to compute other related quantities, for example `K^(-1)`, `K^(-1) * y`, and `log|K|`.
- Avoid direct matrix-matrix multiplication: instead of `y.transpose() * kinv * dkl * kinv * y`, compute `alpha = kinv * y` once and then `alpha.transpose() * dkl * alpha`.
- For the trace terms, use the identity `tr(A * B) = sum(A .* B)`, where `.*` is the elementwise multiplication (this holds here because the matrices involved are symmetric). The former is O(n^3), whereas the latter is O(n^2); see the short sketch after this list.
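To illustrate that last point, here is a minimal, self-contained Eigen sketch (the random 4x4 matrices are just for the demo, not part of the code below):

```cpp
#include <iostream>
#include <Eigen/Core>

int main() {
    // tr(A * B) computed two ways. In general tr(A * B) = sum(A .* B^T);
    // when B is symmetric (as K^(-1) and the kernel derivatives are), B^T = B.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(4, 4);
    B = ((B + B.transpose()) / 2.0).eval();  // symmetrize B for the demo

    const double viaProduct = (A * B).trace();        // O(n^3): full matrix product
    const double viaCwise = A.cwiseProduct(B).sum();  // O(n^2): elementwise product
    std::cout << viaProduct << " == " << viaCwise << std::endl;
    return 0;
}
```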
Below is my version:

```cpp
#include <iostream>
#include <cmath>   // std::pow, std::log
#include <limits>  // std::numeric_limits
#include <Eigen/Core>
#include <Eigen/Cholesky>
#include <LBFGSB.h>
#include <LBFGS.h>
constexpr double PI = 3.141592653589793238462643383279502884197169399375105820974944;
using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;
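// Pairwise squared Euclidean distances: dist(i, j) = ||x1_i||^2 + ||x2_j||^2 - 2 * x1_i . x2_j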
Matrix2D makeDist(const Matrix2D & x1, const Matrix2D & x2) {
return ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise() +
x2.rowwise().squaredNorm().transpose();
}
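// Squared-exponential (RBF) kernel sigma^2 * exp(-d / (2 * l^2)), taking squared distances d as input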
Matrix2D kernel(const Matrix2D & dist, double l = 1.0, double sigma = 1.0) {
return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dist).array().exp();
}
struct Likelihood {
Likelihood(const Matrix2D & xData, const Vector & muData, double noise) :
dist_(makeDist(xData, xData)), muData_(muData), noise_(noise) {}
const Matrix2D dist_;
const Vector & muData_;
double noise_;
double operator()(const Eigen::VectorXd& x, Eigen::VectorXd& grad) {
const int N = muData_.size();
const double l = x[0];
const double sigma = x[1];
Matrix2D k = kernel(dist_, l, sigma);
// Compute derivatives as per
// https://math.stackexchange.com/questions/1030534/gradients-of-marginal-likelihood-of-gaussian-process-with-squared-exponential-co
Matrix2D dkl = k.cwiseProduct(dist_) / std::pow(l, 3);
Matrix2D dks = (2.0 / sigma) * k;
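// Add the diagonal noise only after taking the kernel derivatives above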
k.diagonal().array() += std::pow(noise_, 2);
auto llt = Eigen::LLT<Matrix2D>(k);
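// Reuse the cached Cholesky factorization for both K^(-1) and alpha = K^(-1) * y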
Matrix2D kinv = llt.solve(Matrix2D::Identity(N, N));
Vector alpha = llt.solve(muData_);
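// d(logL)/d(theta) = 0.5 * alpha^T * (dK/dtheta) * alpha - 0.5 * tr(K^(-1) * dK/dtheta),
// with each trace computed as an elementwise product sum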
grad[0] = 0.5 * alpha.transpose() * dkl * alpha - 0.5 * kinv.cwiseProduct(dkl).sum();
grad[1] = 0.5 * alpha.transpose() * dks * alpha - 0.5 * kinv.cwiseProduct(dks).sum();
// Minimize negative log-likelihood
grad = -grad;
// Compute log likelihood: -0.5 * y^T * alpha - 0.5 * log|K| - 0.5 * N * log(2*pi),
// where log|K| = 2 * sum(log(diag(L))) comes from the cached Cholesky factor K = L * L^T
const double logLikelihood = -0.5 * muData_.dot(alpha) -
    llt.matrixLLT().diagonal().array().log().sum() - 0.5 * N * std::log(2.0 * PI);
// Print current parameters
std::cout << l << ", " << sigma << " ==> " << -logLikelihood << " - " << grad.transpose() << std::endl;
// Return negative log-likelihood
return -logLikelihood;
}
};
int main() {
const double noise = 0.4;
// Some test points to train on.
Matrix2D X_train(6, 1);
X_train << -5, -4, -3, -2, -1, 1;
Vector Y_train = X_train.array().sin();
// Log-likelihood function with set data/noise
Likelihood like(X_train, Y_train, noise);
// Bounds
Vector lb = Vector::Constant(2, 1e-8);
Vector ub = Vector::Constant(2, std::numeric_limits<double>::infinity());
// Initial guess
Vector x = Vector::Constant(2, 1.0);
// L-BFGS-B
LBFGSpp::LBFGSBParam<double> param;
param.epsilon = 1e-5;
param.max_iterations = 100;
LBFGSpp::LBFGSBSolver<double> solver(param);
double fx;
solver.minimize(like, x, fx, lb, ub);
std::cout << "\n===============================\n\n";
LBFGSpp::LBFGSParam<double> paramx;
paramx.epsilon = 1e-5;
paramx.max_iterations = 10;
LBFGSpp::LBFGSSolver<double> solver2(paramx);
x = Vector::Constant(2, 1.0);
solver2.minimize(like, x, fx);
return 0;
}
```
Output:

```
1, 1 ==> 6.49954 - -1.64172 2.33992
1.85404, 0.479791 ==> 7.35173 - 2.13605 -6.82365
1.3421, 0.791619 ==> 5.88876 - -0.101547 0.295286
1.36368, 0.760786 ==> 5.88481 - 0.0601469 -0.0763338
1.35529, 0.76696 ==> 5.88431 - 0.0152388 0.0146282
1.35236, 0.76593 ==> 5.88426 - 0.00972153 0.0133352
1.34553, 0.762505 ==> 5.8842 - -5.29077e-06 -0.000405903
1.34559, 0.762563 ==> 5.8842 - 1.4304e-06 1.52874e-06
===============================
1, 1 ==> 6.49954 - -1.64172 2.33992
1.57435, 0.181389 ==> 10.1205 - 1.09806 -14.218
1.28717, 0.590695 ==> 6.05536 - 0.38649 -2.43346
1.21589, 0.792321 ==> 5.92615 - -0.488156 0.681163
1.25462, 0.747928 ==> 5.89378 - -0.233815 0.12984
1.28602, 0.735536 ==> 5.88844 - -0.0968389 -0.107242
1.3083, 0.741189 ==> 5.88623 - -0.045728 -0.113439
1.34315, 0.760708 ==> 5.88421 - -0.00147428 -0.0116414
1.34548, 0.76243 ==> 5.8842 - 0.000102842 -0.00104711
1.3456, 0.762561 ==> 5.8842 - 2.875e-05 -4.39889e-05
1.34559, 0.762563 ==> 5.8842 - 3.86423e-06 -3.34703e-06
```
Sorry for my mistake, I hope you haven't lost too much time on it. Thanks a ton for the help!
Not a big deal. It was actually a good time to refresh my memory on GP fitting. 😂