
Trouble converging · lbfgspp · 4 comments · closed


Comments (4)

yixuan commented on September 26, 2024

Thanks for reporting. I'll take a look today.


yixuan commented on September 26, 2024

The reason is that the gradient was incorrectly calculated.

Since you have added diagonal noise to K, the derivatives with respect to l and sigma need to be computed before the noise is added to K.
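
For reference, the formula being implemented (following the derivation linked in the code below, in my notation): with alpha = K^(-1) * y, where K here includes the noise term, the gradient of the log marginal likelihood with respect to a hyperparameter theta is

    dL/dtheta = 0.5 * alpha' * (dK/dtheta) * alpha - 0.5 * tr(K^(-1) * dK/dtheta)

Since the diagonal noise term does not depend on l or sigma, dK/dtheta is the derivative of the noise-free kernel.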

Other tweaks:

  1. Cache makeDist(xData_, xData_) outside kernel() and operator(), since it stays the same.
  2. Also cache the Cholesky decomposition of K and use it to compute the other related quantities, for example K^(-1), K^(-1) * y, and log|K|.
  3. Avoid direct matrix-matrix multiplication: instead of y.transpose() * kinv * dkl * kinv * y, do
     alpha = kinv * y
     alpha.transpose() * dkl * alpha
  4. Use tr(A * B) = sum(A .* B'), where .* is elementwise multiplication; since kinv and dkl are symmetric here, this is just sum(kinv .* dkl). The former is O(n^3), whereas the latter is O(n^2). A standalone sanity check follows this list.
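
A minimal standalone check of that trace identity in Eigen (illustrative only; the matrix size and names are arbitrary):

#include <iostream>
#include <Eigen/Core>

int main() {
    // Two random symmetric 5x5 matrices
    Eigen::MatrixXd a0 = Eigen::MatrixXd::Random(5, 5);
    Eigen::MatrixXd b0 = Eigen::MatrixXd::Random(5, 5);
    Eigen::MatrixXd a = a0 + a0.transpose();
    Eigen::MatrixXd b = b0 + b0.transpose();

    // O(n^3): form the full matrix product, then take the trace
    const double slow = (a * b).trace();
    // O(n^2): elementwise product and sum, no matrix-matrix product
    const double fast = a.cwiseProduct(b).sum();

    // The two agree up to rounding error
    std::cout << slow << " vs " << fast << std::endl;
    return 0;
}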

Below is my version:

#include <cmath>
#include <iostream>
#include <limits>

#include <Eigen/Core>
#include <Eigen/Cholesky>

#include <LBFGSB.h>
#include <LBFGS.h>

constexpr double PI = 3.141592653589793238462643383279502884197169399375105820974944;

using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;

// Pairwise squared Euclidean distances, via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
Matrix2D makeDist(const Matrix2D & x1, const Matrix2D & x2) {
    return ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise() +
           x2.rowwise().squaredNorm().transpose();
}

// Squared-exponential (RBF) kernel evaluated on precomputed squared distances
Matrix2D kernel(const Matrix2D & dist, double l = 1.0, double sigma = 1.0) {
    return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dist).array().exp();
}

struct Likelihood {
    Likelihood(const Matrix2D & xData, const Vector & muData, double noise) :
            dist_(makeDist(xData, xData)), muData_(muData), noise_(noise) {}

    const Matrix2D dist_;
    const Vector & muData_;
    double noise_;

    double operator()(const Eigen::VectorXd& x, Eigen::VectorXd& grad) {
        const int N = muData_.size();
        const double l = x[0];
        const double sigma = x[1];

        Matrix2D k = kernel(dist_, l, sigma);

        // Compute derivatives as per
        // https://math.stackexchange.com/questions/1030534/gradients-of-marginal-likelihood-of-gaussian-process-with-squared-exponential-co
        Matrix2D dkl = k.cwiseProduct(dist_) / std::pow(l, 3);
        Matrix2D dks = (2.0 / sigma) * k;

        // Add the diagonal noise only now, after the derivative matrices are formed
        k.diagonal().array() += std::pow(noise_, 2);
        // Cache the Cholesky factorization; reuse it for kinv, alpha, and log|K|
        auto llt = Eigen::LLT<Matrix2D>(k);
        Matrix2D kinv = llt.solve(Matrix2D::Identity(N, N));
        Vector alpha = llt.solve(muData_);

        grad[0] = 0.5 * alpha.transpose() * dkl * alpha - 0.5 * kinv.cwiseProduct(dkl).sum();
        grad[1] = 0.5 * alpha.transpose() * dks * alpha - 0.5 * kinv.cwiseProduct(dks).sum();
        // Minimize negative log-likelihood
        grad = -grad;

        // Compute the log likelihood (this part was already correct);
        // log|K| = 2 * sum(log(diag(L))) from the Cholesky factor L
        const double logLikelihood = -0.5 * muData_.dot(alpha)
            - llt.matrixL().selfadjointView().diagonal().array().log().sum()
            - 0.5 * N * std::log(2.0 * PI);

        // Print current parameters
        std::cout << l << ", " << sigma << " ==> " << -logLikelihood << "  -  " << grad.transpose() << std::endl;

        // Return negative log-likelihood
        return -logLikelihood;
    }
};

int main() {
    const double noise = 0.4;

    // Some test points to train on.
    Matrix2D X_train(6, 1);
    X_train << -5, -4, -3, -2, -1, 1;
    Vector Y_train = X_train.array().sin();

    // Log-likelihood function with set data/noise
    Likelihood like(X_train, Y_train, noise);

    // Box constraints: keep l and sigma strictly positive for L-BFGS-B
    Vector lb = Vector::Constant(2, 1e-8);
    Vector ub = Vector::Constant(2, std::numeric_limits<double>::infinity());
    // Initial guess
    Vector x = Vector::Constant(2, 1.0);

    // L-BFGS-B
    LBFGSpp::LBFGSBParam<double> param;
    param.epsilon = 1e-5;
    param.max_iterations = 100;
    LBFGSpp::LBFGSBSolver<double> solver(param);
    double log;  // receives the final objective value (negative log-likelihood)
    solver.minimize(like, x, log, lb, ub);

    std::cout << "\n===============================\n\n";

    // L-BFGS (unconstrained), for comparison
    LBFGSpp::LBFGSParam<double> paramx;
    paramx.epsilon = 1e-5;
    paramx.max_iterations = 10;
    LBFGSpp::LBFGSSolver<double> solver2(paramx);
    x = Vector::Constant(2, 1.0);
    solver2.minimize(like, x, log);

    return 0;
}
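
To compile, a hypothetical command line (the -I paths are placeholders for wherever the Eigen and LBFGS++ headers live on your machine):

g++ -std=c++11 -O2 -I /path/to/eigen -I /path/to/LBFGSpp/include gp.cpp -o gp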

Output:

1, 1 ==> 6.49954  -  -1.64172  2.33992
1.85404, 0.479791 ==> 7.35173  -   2.13605 -6.82365
1.3421, 0.791619 ==> 5.88876  -  -0.101547  0.295286
1.36368, 0.760786 ==> 5.88481  -   0.0601469 -0.0763338
1.35529, 0.76696 ==> 5.88431  -  0.0152388 0.0146282
1.35236, 0.76593 ==> 5.88426  -  0.00972153  0.0133352
1.34553, 0.762505 ==> 5.8842  -  -5.29077e-06 -0.000405903
1.34559, 0.762563 ==> 5.8842  -   1.4304e-06 1.52874e-06

===============================

1, 1 ==> 6.49954  -  -1.64172  2.33992
1.57435, 0.181389 ==> 10.1205  -  1.09806 -14.218
1.28717, 0.590695 ==> 6.05536  -   0.38649 -2.43346
1.21589, 0.792321 ==> 5.92615  -  -0.488156  0.681163
1.25462, 0.747928 ==> 5.89378  -  -0.233815   0.12984
1.28602, 0.735536 ==> 5.88844  -  -0.0968389  -0.107242
1.3083, 0.741189 ==> 5.88623  -  -0.045728 -0.113439
1.34315, 0.760708 ==> 5.88421  -  -0.00147428  -0.0116414
1.34548, 0.76243 ==> 5.8842  -  0.000102842 -0.00104711
1.3456, 0.762561 ==> 5.8842  -     2.875e-05 -4.39889e-05
1.34559, 0.762563 ==> 5.8842  -   3.86423e-06 -3.34703e-06


Svalorzen commented on September 26, 2024

Sorry for my mistake; I hope you haven't lost too much time on it. Thanks a ton for the help!


yixuan commented on September 26, 2024

Not a big deal. It was actually a good time to refresh my memory on GP fitting. 😂

