
Comments (13)

wsmoses commented on July 22, 2024

Per this, I'm working on several type analysis speedups in #143 that appear to make this run in a reasonable time.

Some of the later ones require derivatives to be registered for ldexp which will be added shortly.

Indeed (at least without the Clang in this repo) a custom gradient is a bit tricky to add. I can add a similar mechanism to the custom inactive call we discussed earlier to remedy this nicely, perhaps something like:

__enzyme_customgradient(normal, augmentedfwd, gradient);

Obviously this per-call basis isn't as powerful as replacing all calls to a function, which I suppose we could handle by registering them with a magic global.

Perhaps like this?

void* __enzyme_register_gradient1[3] = { normal, augmentedfwd, gradient };

Incidentally this might be a good way to similarly enable a nice interface for shadow globals.

Thoughts?


unrealwill commented on July 22, 2024

Thanks.

My initial thoughts were something more along the lines of the gradient-registration solution:
use your current language-independent (LLVM-level) way of registering gradients and provide an API to it in multiple languages.

From the user's perspective, the best case is when the gradient works out of the box without needing to add anything (in theory Enzyme, acting at the LLVM level, should be able to do this once all the LLVM-level language features have been squared away), but there will be higher-level optimizations that the optimizing compiler doesn't know how to do, so custom gradients will be needed.

Currently the way it works for the math functions is fine, so something similar for eigen3 would probably be great. But it's hard to set bounds on which libraries should be supported natively, and for the user it's reassuring to have a way to never be blocked, at the expense of providing additional work.

It's best if I can register my custom gradients inside my project, without having to define them inside the compiler project.
Ideally I want to be able to use the op from multiple files. I don't mind registering the op multiple times, but the definition should exist only once: I need to be able to compile each cpp file to a .o independently without falling into multiple-definition hell.

I don't mind registering the op as a plugin to the enzyme plugin.
clang -shared mycustomop.cpp -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-11.so -O2 -o mycustomop.so
clang fileusingop.cpp -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-11.so -additionalops mycustomop.so -O2 -o output

In TensorFlow you can register new ops via a shared object. But for something like C++ you want to be able to inline the ops statically, so this plugin wouldn't be loaded dynamically.


wsmoses commented on July 22, 2024

Yeah, I wasn't offhand thinking of adding Eigen derivatives to Enzyme proper (though it's perhaps useful to do so), but rather some misc scalar functions that some of these templates call when using the default (non-custom) gradient.

I don't think a custom plugin to Enzyme would be the nicest solution; I'm thinking more along the lines of a nice way for the frontend to specify information that Enzyme can preprocess into the relevant LLVM metadata prior to AD.

Using the above ideas as examples, you could do something like:

double normSolveMatrix( const  Matrix<double,T,T,RowMajor>& m)
{
  Matrix<double,T,1> v;
  for( int i = 0 ; i < m.cols() ; i++)
  {
    v(i,0) = i;
  }
  // This call specifically would have its gradient be customfwd/customreverse
  Matrix<double,T,1> sol = __enzyme_customgradient(Matrix<double,T,T,RowMajor>::fullPivLu, customfwd, customreverse, &m).solve(v);
  return normVector<T>(sol);
}

or say

// header.h

// These magic globals would be translated into equivalent custom gradient metadata whenever seen by the Enzyme pass
void* __enzyme_register_gradient1[3] = { Matrix<double,T,T,RowMajor>::fullPivLu, augmentedfwd1, gradient1 };
void* __enzyme_register_gradient2[3] = { Matrix<double,T,T,RowMajor>::exp, augmentedfwd2, gradient2 };

//mycode.cpp
#include <header.h>
...

Obviously you could always have the augmented fwd/gradient in separate compilation units that you link to after the fact.

My disincentive for a plugin-to-a-plugin approach is that embedding IR through the plugin seems prone to version breakages, among other headaches. Also, you may want to compile the custom op with a different compiler (say nvcc).


unrealwill commented on July 22, 2024

I like your second solution with register_gradient a lot more.

It allows the forward code to be used independently of Enzyme.
What's great with Enzyme is that it takes a standard existing function like normSolveMatrix and, without needing to modify it, you can get its derivative.

The first solution would break this and make the code of the forward pass harder to read.

If you can make the second solution work without the need for a plugin, that's even better.


wsmoses commented on July 22, 2024

Update: nontrivial speedups to type analysis have landed on main, with the various additional intrinsics required now added.

Will add the custom gradient registration shortly.


wsmoses commented on July 22, 2024

#149 implements a beta of the second solution, if you want to try it out.

A quick and dirty experiment with the following "works" (not tested for correctness at the moment, but feel free to try it shortly):

MatrixXd* augment_iexp(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(arg);
}

void gradient_iexp(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg += (*tape) * tape->exp() * d_out;
  delete tape;
}

void* __enzyme_register_gradient[3] = { (void*)Eigen::internal::matrix_exp_compute<MatrixXd,MatrixXd>, (void*)augment_iexp, (void*)gradient_iexp };


unrealwill commented on July 22, 2024

Thanks, I'll try testing today.
Regarding correctness, that's probably wrong, as I'm not looking at the pointwise exponential, but the matrix exponential.
https://en.wikipedia.org/wiki/Matrix_exponential
In Eigen it is usually computed via a Padé approximation.
But a quick (numerically problematic) way that Enzyme could compute it is just the naive truncated series Sum( M^k/k!, k=0..n ) up to some n.
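
For concreteness, here is a minimal sketch of that naive truncated series in Eigen. It is purely illustrative: the function name and the cutoff n are made up, and this is exactly the numerically problematic approach described above, not what Eigen actually does.

#include <Eigen/Dense>
using Eigen::MatrixXd;

// exp(M) ~= sum_{k=0..n} M^k / k!, accumulated term by term.
MatrixXd naiveMatrixExp(const MatrixXd& M, int n) {
  MatrixXd result = MatrixXd::Identity(M.rows(), M.cols());
  MatrixXd term   = MatrixXd::Identity(M.rows(), M.cols());
  for (int k = 1; k <= n; ++k) {
    term = term * M / double(k);  // term is now M^k / k!
    result += term;
  }
  return result;
}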


wsmoses commented on July 22, 2024

Yeah, I was more interested in testing that the custom derivative mechanism works than in getting the derivative right for the test (hence putting something in, which you should feel free to actually correct).

Relatedly, if you end up implementing various custom rules, feel free to submit a PR with them in a header file!


wsmoses commented on July 22, 2024

Specifically, note that the custom ABI is a tad fragile, but in essence: for functions that return a pointer, you should return a struct (it can be anonymous, named whatever) that contains the tape, the original return, and the shadow return. For functions that return void (like the one above), returning the tape alone should be sufficient.
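
As a purely illustrative sketch of that struct-return convention (the struct and member names below are made up, and using MatrixXd* for the tape is an assumption; only the { tape, original return, shadow return } layout follows the description above), an augmented forward pass for a hypothetical primal function MatrixXd* copyMatrix(const MatrixXd&) might look like:

struct ReturnBundle {
  MatrixXd* tape;    // whatever the augmented forward pass wants to remember
  MatrixXd* primal;  // the original return value
  MatrixXd* shadow;  // the shadow of the return value
};

ReturnBundle augment_copyMatrix(const MatrixXd& arg, const MatrixXd& d_arg) {
  MatrixXd* primal = new MatrixXd(arg);
  MatrixXd* shadow = new MatrixXd(MatrixXd::Zero(arg.rows(), arg.cols()));
  return ReturnBundle{ new MatrixXd(arg), primal, shadow };
}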

You may also have to define custom gradients for slightly different functions than you may expect. For example, in the code snippet above I needed to define the gradient for matrix_exp_compute rather than MatrixXd::exp because the latter was inlined and therefore would not use a custom gradient.
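
A small aside on the inlining point: one way to keep a thin wrapper from being inlined away before Enzyme sees it, so that a registered custom gradient can still match it, is to mark the wrapper noinline. This is a suggestion on my part rather than something required by Enzyme, e.g.:

// Hypothetical out-of-line wrapper a custom gradient could be registered against.
__attribute__((noinline))
void expMatrix(const MatrixXd& a, MatrixXd& out) {
  out = a.exp();
}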


unrealwill commented on July 22, 2024

Thanks.

A little update from my side.
I tried before going to sleep, but #149 hadn't been merged yet, so I tried with the latest commit, #148, but the custom gradient is not registering, and even though there are probably some type inference speedups, when Enzyme encounters the matrix exp the compilation still hangs.

I'll try again when #149 gets merged, although I could probably grab your branch if I were in a hurry.
On the positive side, I got the CUDA Enzyme example to run today (I needed to install cuda-10.1 and set -DLLVM_TARGETS_TO_BUILD="host;NVPTX" in the LLVM source build command).


wsmoses commented on July 22, 2024

Yeah, you need #149 for the magic global registration, which is pending code review. Once that goes in, you should be able to try it out (I'd start with code that just does the matrix exponentiation and prints the gradient as a double check).

There are also some additional, fairly substantial speedups to the analysis coming in #150 (basically as I now start a concerted effort to reduce analysis time).


unrealwill commented on July 22, 2024

Hello,

I tried it this morning, and managed to get the exp to run with #150.
Then I tried the exercise of getting inverse to work the same way as exp.

I encountered a few problems:

  • A duplicate __enzyme_register_gradient error, which I solved by renaming the globals to __enzyme_register_gradient_exp and __enzyme_register_gradient_inverse.

  • Couldn't easily get a pointer to the function I'm trying to implement the derivative of. I tried:
    void* __enzyme_register_gradient_inverse[3] = { (void*)(&MatrixXd::inverse), (void*)augment_iinverse, (void*)gradient_iinverse };
    which failed with:
    cannot initialize an array element of type 'void *' with an rvalue of type 'const Inverse<Eigen::Matrix<double, -1, -1, 0, -1, -1>> (Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1>>::*)() const'
    which means it has a problem converting the member function pointer to void* (which is indeed problematic in C++).

I battled with the compiler for too long and didn't get anything to work, so I tried to introduce a function:

MatrixXd inverseMatrix( const MatrixXd& a)
{
  return a.inverse();
}

For which I can more easily register the custom gradient.

This then compiles fine but crashes with a segfault, probably because I couldn't get the types right somewhere due to the above trick.

I'm not quite sure I understand exactly the structure of what needs to be defined to properly register a gradient, and the right types for everything (for example the tape). Also, I see you are using new/delete for the tape, which will become problematic for me when I put it on the GPU or inside loops. Would it be possible to somehow stack-allocate the tape? Or avoid the tape allocation altogether at the cost of recomputation?

Thanks

customgradient.cpp

#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <math.h>
#include <vector>
#include <algorithm>

#include <Eigen/Dense>
#include <unsupported/Eigen/MatrixFunctions>
using Eigen::MatrixXd;
using namespace std;
using namespace Eigen;

int enzyme_dup;
int enzyme_out;
int enzyme_const;

void __enzyme_autodiff(...);

MatrixXd* augment_iexp(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(arg);
}

void gradient_iexp(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg += (*tape) * tape->exp() * d_out;
  delete tape;
}




void* __enzyme_register_gradient_exp[3] = { (void*)Eigen::internal::matrix_exp_compute<MatrixXd,MatrixXd>, (void*)augment_iexp, (void*)gradient_iexp };


MatrixXd inverseMatrix( const MatrixXd& a)
{
  return a.inverse();
}


MatrixXd* augment_iinverse(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(arg);
}

void gradient_iinverse(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg += (*tape) * tape->exp() * d_out;
  delete tape;
}


void* __enzyme_register_gradient_inverse[3] = { (void*)inverseMatrix, (void*)augment_iinverse, (void*)gradient_iinverse };


double normMatrixXd( const MatrixXd& m )
{
  double out = 0.0;
  for( int i = 0 ; i < m.rows() ; i++ )
  {
    for( int j = 0 ; j < m.cols(); j++)
    {
      out += m(i,j)* m(i,j);
    }
  }
  return out;
}

double normExpMatrixXd( const MatrixXd& m)
{
  return normMatrixXd(m.exp());
}

double normInverseMatrixXd( const MatrixXd& m)
{
  MatrixXd inv = inverseMatrix(m);
  return normMatrixXd(inv);
}

void testMatrixXd( int T )
{

  MatrixXd m(T,T);
  for( int i = 0; i < T ; i++)
  {
    for( int j = 0 ; j < T ;j++)
    {
      m(i,j) = (i+j)*(i+j);
    }
  }

  MatrixXd dm(T,T);
  for( int i = 0; i < T ; i++)
  {
    for( int j = 0 ; j < T ;j++)
    {
      dm(i,j) = 0.0;
    }
  }
  std::cout <<"m : " << std::endl;
  std::cout << m << std::endl;
  std::cout << "normExpMatrix "<< std::endl;
  std::cout << normExpMatrixXd(m) << std::endl;
  std::cout << "normInverseMatrix "<< std::endl;
  std::cout << normInverseMatrixXd(m) << std::endl;
  //__enzyme_autodiff(normExpMatrixXd, enzyme_dup, &m,&dm); //Hangs compilation
  __enzyme_autodiff(normInverseMatrixXd, enzyme_dup, &m,&dm); //Hangs compilation

  std::cout << dm << std::endl;

}



int main()
{
  testMatrixXd(3);

  return 0;
}

Compilation with:
clang customgradient.cpp -I/usr/include/eigen3/ -lstdc++ -lm -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-11.so -O2 -o customgradient -fno-exceptions


wsmoses commented on July 22, 2024

The fix for your immediate problem is relatively easy. Specifically, you defined a custom gradient / forward pass that assumes the original function takes two pointer arguments and returns void (and thus the forward pass returns only the "tape" and takes the two original arguments plus the corresponding shadows; the reverse pass takes the two original arguments plus the corresponding shadow arguments, with the tape as the final argument).

The function you registered the custom derivative for (inverseMatrix), however, takes only one argument and returns a value. Remedying your calling convention, the following works without a segfault:

#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <math.h>
#include <vector>
#include <algorithm>

#include <Eigen/Dense>
#include <unsupported/Eigen/MatrixFunctions>
using Eigen::MatrixXd;
using namespace std;
using namespace Eigen;

int enzyme_dup;
int enzyme_out;
int enzyme_const;

void __enzyme_autodiff(...);

MatrixXd* augment_iexp(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(arg);
}

void gradient_iexp(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg += (*tape) * tape->exp() * d_out;
  delete tape;
}




void* __enzyme_register_gradient_exp[3] = { (void*)Eigen::internal::matrix_exp_compute<MatrixXd,MatrixXd>, (void*)augment_iexp, (void*)gradient_iexp };


void inverseMatrix( const MatrixXd& a, MatrixXd& out)
{
  out = a.inverse();
}


MatrixXd* augment_iinverse(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(arg);
}

void gradient_iinverse(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg += (*tape) * tape->exp() * d_out;
  delete tape;
}


void* __enzyme_register_gradient_inverse[3] = { (void*)inverseMatrix, (void*)augment_iinverse, (void*)gradient_iinverse };


double normMatrixXd( const MatrixXd& m )
{
  double out = 0.0;
  for( int i = 0 ; i < m.rows() ; i++ )
  {
    for( int j = 0 ; j < m.cols(); j++)
    {
      out += m(i,j)* m(i,j);
    }
  }
  return out;
}

double normExpMatrixXd( const MatrixXd& m)
{
  return normMatrixXd(m.exp());
}

double normInverseMatrixXd( const MatrixXd& m)
{
  MatrixXd inv;
  inverseMatrix(m, inv);
  return normMatrixXd(inv);
}

void testMatrixXd( int T )
{

  MatrixXd m(T,T);
  for( int i = 0; i < T ; i++)
  {
    for( int j = 0 ; j < T ;j++)
    {
      m(i,j) = (i+j)*(i+j);
    }
  }

  MatrixXd dm(T,T);
  for( int i = 0; i < T ; i++)
  {
    for( int j = 0 ; j < T ;j++)
    {
      dm(i,j) = 0.0;
    }
  }
  std::cout <<"m : " << std::endl;
  std::cout << m << std::endl;
  std::cout << "normExpMatrix "<< std::endl;
  std::cout << normExpMatrixXd(m) << std::endl;
  std::cout << "normInverseMatrix "<< std::endl;
  std::cout << normInverseMatrixXd(m) << std::endl;
  //__enzyme_autodiff(normExpMatrixXd, enzyme_dup, &m,&dm); //Hangs compilation
  __enzyme_autodiff(normInverseMatrixXd, enzyme_dup, &m,&dm); //Hangs compilation

  std::cout << dm << std::endl;

}



int main()
{
  testMatrixXd(3);

  return 0;
}
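
One note on the math rather than the calling convention: the augment_iinverse/gradient_iinverse bodies above still contain the placeholder copied from the exp example. As a hedged sketch (my own, not something verified in this thread), a mathematically correct reverse rule for inverseMatrix, using the identity d(A^-1) = -A^-1 dA A^-1 and the same calling convention, might look like:

// Cache the computed inverse as the tape so the reverse pass can reuse it.
MatrixXd* augment_iinverse_math(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  out = arg.inverse();
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return new MatrixXd(out);
}

// Reverse rule for out = arg.inverse(): d_arg -= out^T * d_out * out^T.
void gradient_iinverse_math(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* tape) {
  d_arg -= tape->transpose() * d_out * tape->transpose();
  delete tape;
}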

Enzyme nicely guarantees that the corresponding arguments to the reverse pass will be precisely the same as in the forward pass. Thus, if the argument is never overwritten, you don't need to cache it. However, if the memory is written to somewhere in between, you do; and since I was defining a derivative correct for arbitrary use, I was conservative in my example and cached. The current custom derivative registration system (fragile, as you can see, since it has only been used internally to Enzyme at this point) asks for the most conservative version of a gradient, but we can add mechanisms to specify whether an argument could have been overwritten, etc. (the wonders of compiler alias analysis!)
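
To illustrate the recompute-instead-of-cache option: if the argument is guaranteed not to be overwritten between the passes, something like the sketch below could skip the heap-allocated tape entirely and recompute from arg in the reverse pass. Treat this as an assumption-laden illustration (in particular, whether a null tape is acceptable here is an assumption, and the math is still the placeholder from the exp example), not a supported interface:

// Augmented forward pass that caches nothing and returns a null tape.
MatrixXd* augment_iexp_nocache(const MatrixXd& arg, const MatrixXd& d_arg, MatrixXd& out, MatrixXd& d_out) {
  Eigen::internal::matrix_exp_compute<MatrixXd>(arg, out);
  d_out = MatrixXd::Constant(arg.rows(), arg.cols(), 0.0);
  return nullptr;
}

// Reverse pass recomputes from arg, assuming arg has not been overwritten in between.
void gradient_iexp_nocache(const MatrixXd& arg, MatrixXd& d_arg, const MatrixXd& out, const MatrixXd& d_out, MatrixXd* /*tape*/) {
  d_arg += arg * arg.exp() * d_out;  // same placeholder math as the cached version above
}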

As for getting the pointer to the member function, I had earlier tried something along the lines of

struct {
  typeof(&MatrixXd::inverse) original;
  typeof(&augment_iinverse) augment;
  typeof(&gradient_iinverse) gradient;
} __enzyme_register_gradient_inverse = { &MatrixXd::inverse, &augment_iinverse, &gradient_iinverse };

which should work (and if not I can make it work). However, it's a bit late (6am) for me right now, so I'll let you play around with the custom derivative registration for now and call it a night.

Again, clearly this is not the best UX, but one has to start somewhere.

Also, as we add custom derivative registration (and related pieces of the calling convention), I'm sure others would love to see how to use it, so feel free to make PRs to the www branch if you have time to write some docs.

