
Trouble converging · lbfgspp · 4 comments · closed


Comments (4)

yixuan commented on September 26, 2024

Thanks for reporting. I'll take a look today.


yixuan commented on September 26, 2024

The reason is that the gradient was incorrectly calculated.

Since you have added diagonal noise to K, the derivatives with respect to l and sigma need to be computed before the noise is added to K.
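
For reference, the formula being implemented (following the derivation linked in the code below, in my notation): with alpha = K^(-1) * y, where K here includes the noise term, the gradient of the log marginal likelihood with respect to a hyperparameter theta is

    dL/dtheta = 0.5 * alpha' * (dK/dtheta) * alpha - 0.5 * tr(K^(-1) * dK/dtheta)

Since the diagonal noise term does not depend on l or sigma, dK/dtheta is the derivative of the noise-free kernel.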

Other tweaks:

  1. Cache makeDist(xData_, xData_) outside kernel() and operator(), since it stays the same.
  2. Also cache the Cholesky decomposition of K and use it to compute the other related quantities, for example K^(-1), K^(-1) * y, and log|K|.
  3. Avoid direct matrix-matrix multiplication: instead of y.transpose() * kinv * dkl * kinv * y, do
     alpha = kinv * y
     alpha.transpose() * dkl * alpha
  4. Use tr(A * B) = sum(A .* B'), where .* is elementwise multiplication; since kinv and dkl are symmetric here, this is just sum(kinv .* dkl). The former is O(n^3), whereas the latter is O(n^2). A standalone sanity check follows this list.
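
A minimal standalone check of that trace identity in Eigen (illustrative only; the matrix size and names are arbitrary):

#include <iostream>
#include <Eigen/Core>

int main() {
    // Two random symmetric 5x5 matrices
    Eigen::MatrixXd a0 = Eigen::MatrixXd::Random(5, 5);
    Eigen::MatrixXd b0 = Eigen::MatrixXd::Random(5, 5);
    Eigen::MatrixXd a = a0 + a0.transpose();
    Eigen::MatrixXd b = b0 + b0.transpose();

    // O(n^3): form the full matrix product, then take the trace
    const double slow = (a * b).trace();
    // O(n^2): elementwise product and sum, no matrix-matrix product
    const double fast = a.cwiseProduct(b).sum();

    // The two agree up to rounding error
    std::cout << slow << " vs " << fast << std::endl;
    return 0;
}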

Below is my version:

#include <cmath>
#include <iostream>
#include <limits>

#include <Eigen/Core>
#include <Eigen/Cholesky>

#include <LBFGSB.h>
#include <LBFGS.h>

constexpr double PI = 3.141592653589793238462643383279502884197169399375105820974944;

using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;

// Pairwise squared Euclidean distances, via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
Matrix2D makeDist(const Matrix2D & x1, const Matrix2D & x2) {
    return ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise() +
           x2.rowwise().squaredNorm().transpose();
}

// Squared-exponential (RBF) kernel evaluated on precomputed squared distances
Matrix2D kernel(const Matrix2D & dist, double l = 1.0, double sigma = 1.0) {
    return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dist).array().exp();
}

struct Likelihood {
    Likelihood(const Matrix2D & xData, const Vector & muData, double noise) :
            dist_(makeDist(xData, xData)), muData_(muData), noise_(noise) {}

    const Matrix2D dist_;
    const Vector & muData_;
    double noise_;

    double operator()(const Eigen::VectorXd& x, Eigen::VectorXd& grad) {
        const int N = muData_.size();
        const double l = x[0];
        const double sigma = x[1];

        Matrix2D k = kernel(dist_, l, sigma);

        // Compute derivatives as per
        // https://math.stackexchange.com/questions/1030534/gradients-of-marginal-likelihood-of-gaussian-process-with-squared-exponential-co
        Matrix2D dkl = k.cwiseProduct(dist_) / std::pow(l, 3);
        Matrix2D dks = (2.0 / sigma) * k;

        // Add the diagonal noise only now, after the derivative matrices are formed
        k.diagonal().array() += std::pow(noise_, 2);
        // Cache the Cholesky factorization; reuse it for kinv, alpha, and log|K|
        auto llt = Eigen::LLT<Matrix2D>(k);
        Matrix2D kinv = llt.solve(Matrix2D::Identity(N, N));
        Vector alpha = llt.solve(muData_);

        grad[0] = 0.5 * alpha.transpose() * dkl * alpha - 0.5 * kinv.cwiseProduct(dkl).sum();
        grad[1] = 0.5 * alpha.transpose() * dks * alpha - 0.5 * kinv.cwiseProduct(dks).sum();
        // Minimize negative log-likelihood
        grad = -grad;

        // Compute the log likelihood (this part was already correct);
        // log|K| = 2 * sum(log(diag(L))) from the Cholesky factor L
        const double logLikelihood = -0.5 * muData_.dot(alpha)
            - llt.matrixL().selfadjointView().diagonal().array().log().sum()
            - 0.5 * N * std::log(2.0 * PI);

        // Print current parameters
        std::cout << l << ", " << sigma << " ==> " << -logLikelihood << "  -  " << grad.transpose() << std::endl;

        // Return negative log-likelihood
        return -logLikelihood;
    }
};

int main() {
    const double noise = 0.4;

    // Some test points to train on.
    Matrix2D X_train(6, 1);
    X_train << -5, -4, -3, -2, -1, 1;
    Vector Y_train = X_train.array().sin();

    // Log-likelihood function with set data/noise
    Likelihood like(X_train, Y_train, noise);

    // Box constraints: keep l and sigma strictly positive for L-BFGS-B
    Vector lb = Vector::Constant(2, 1e-8);
    Vector ub = Vector::Constant(2, std::numeric_limits<double>::infinity());
    // Initial guess
    Vector x = Vector::Constant(2, 1.0);

    // L-BFGS-B
    LBFGSpp::LBFGSBParam<double> param;
    param.epsilon = 1e-5;
    param.max_iterations = 100;
    LBFGSpp::LBFGSBSolver<double> solver(param);
    double log;  // receives the final objective value (negative log-likelihood)
    solver.minimize(like, x, log, lb, ub);

    std::cout << "\n===============================\n\n";

    // L-BFGS (unconstrained), for comparison
    LBFGSpp::LBFGSParam<double> paramx;
    paramx.epsilon = 1e-5;
    paramx.max_iterations = 10;
    LBFGSpp::LBFGSSolver<double> solver2(paramx);
    x = Vector::Constant(2, 1.0);
    solver2.minimize(like, x, log);

    return 0;
}
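
To compile, a hypothetical command line (the -I paths are placeholders for wherever the Eigen and LBFGS++ headers live on your machine):

g++ -std=c++11 -O2 -I /path/to/eigen -I /path/to/LBFGSpp/include gp.cpp -o gp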

Output:

1, 1 ==> 6.49954  -  -1.64172  2.33992
1.85404, 0.479791 ==> 7.35173  -   2.13605 -6.82365
1.3421, 0.791619 ==> 5.88876  -  -0.101547  0.295286
1.36368, 0.760786 ==> 5.88481  -   0.0601469 -0.0763338
1.35529, 0.76696 ==> 5.88431  -  0.0152388 0.0146282
1.35236, 0.76593 ==> 5.88426  -  0.00972153  0.0133352
1.34553, 0.762505 ==> 5.8842  -  -5.29077e-06 -0.000405903
1.34559, 0.762563 ==> 5.8842  -   1.4304e-06 1.52874e-06

===============================

1, 1 ==> 6.49954  -  -1.64172  2.33992
1.57435, 0.181389 ==> 10.1205  -  1.09806 -14.218
1.28717, 0.590695 ==> 6.05536  -   0.38649 -2.43346
1.21589, 0.792321 ==> 5.92615  -  -0.488156  0.681163
1.25462, 0.747928 ==> 5.89378  -  -0.233815   0.12984
1.28602, 0.735536 ==> 5.88844  -  -0.0968389  -0.107242
1.3083, 0.741189 ==> 5.88623  -  -0.045728 -0.113439
1.34315, 0.760708 ==> 5.88421  -  -0.00147428  -0.0116414
1.34548, 0.76243 ==> 5.8842  -  0.000102842 -0.00104711
1.3456, 0.762561 ==> 5.8842  -     2.875e-05 -4.39889e-05
1.34559, 0.762563 ==> 5.8842  -   3.86423e-06 -3.34703e-06


Svalorzen commented on September 26, 2024

Sorry for my mistake; I hope you haven't lost too much time on it. Thanks a ton for the help!


yixuan commented on September 26, 2024

Not a big deal. It was actually a good time to refresh my memory on GP fitting. 😂

