
Comments (5)

coreyostrove commented on July 26, 2024

Some places even do this as a way to compute (squared) Frobenius norms.

This hurts me a bit in my soul.

I'm generally supportive of this. I used einsum to accelerate traces of matrix products a fair amount when re-implementing the germ selection algorithm, and indeed found it to be much faster. One thing I found while profiling back then was that even though you can compute the trace of the product in one fell swoop with einsum, it was actually faster to stop a bit short: compute the diagonal entries with einsum and then sum those with np.sum (no idea why this was/is the case). You can see an example of where I used that pattern here:

inv_update_term_diag = _np.einsum('ij,ji->i', pinv_E_beta_central_chol, pinv_E_beta_central_chol.T)

I couldn't tell you what the performance differential was anymore (that's probably a lie; I'm just too lazy to find the notebook where I tested this), but it was enough that I decided it was worth doing. YMMV though, so I'd recommend profiling this on your end.
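A minimal sketch of the two-step pattern described above (the array here is illustrative, not from pyGSTi):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))

# Full trace of A @ A^T in one einsum call.
trace_direct = np.einsum('ij,ji->', A, A.T)

# The pattern from the comment: stop at the diagonal, then sum separately.
diag = np.einsum('ij,ji->i', A, A.T)
trace_two_step = np.sum(diag)

assert np.isclose(trace_direct, trace_two_step)
```

Both give trace(A @ A^T) without ever materializing the full product; which variant is faster is worth profiling on your own workload.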

Another observation I made at that time was that einsum was much faster at this trace/diagonal calculation when acting on inputs of the form A and A^T than on general pairs of matrices A and B. If I recall correctly, I concluded this was related to some optimization happening under the hood because A^T is a view into A (you're better equipped to understand exactly what's happening here, but my guess at the time was better cache behavior). I confirmed the view was at least partly responsible by passing in a copy of A^T instead, and the performance fell back in line with general pairs of matrices. I mention all of this because the trace(A@A^T) pattern appears in a bunch of the cost functions we use in experiment design (and probably other places) due to its relation to something called A-optimal experiment design.
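The view-vs-copy experiment described above can be reproduced with a small sketch like this (sizes and timing loop are arbitrary choices, and the exact speed ratio is machine-dependent):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
At_view = A.T          # a view: shares A's memory buffer
At_copy = A.T.copy()   # a fresh contiguous array with its own buffer

# Both calls compute the same diagonal of A @ A^T ...
d_view = np.einsum('ij,ji->i', A, At_view)
d_copy = np.einsum('ij,ji->i', A, At_copy)
assert np.allclose(d_view, d_copy)

# ... but the view variant reads half as much distinct memory, which
# tends to make it faster; measure rather than assume.
t_view = timeit(lambda: np.einsum('ij,ji->i', A, At_view), number=20)
t_copy = timeit(lambda: np.einsum('ij,ji->i', A, At_copy), number=20)
print(f"view: {t_view:.4f}s  copy: {t_copy:.4f}s")
```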

One last incoherent nugget extracted from the deeper confines of my memory: the aforementioned performance boost from using einsum for diagonals/traces of products of the form A and A^T is big enough that in some instances it justified doing some otherwise weird-looking additional calculations. For example, in germ selection there was one particular hotspot related to evaluating an expression of the form trace(A@C@A^T), with A having shape (N, r), C having shape (r, r), and N >> r. You can do this with an einsum one-liner, but I found it significantly advantageous to first take a Cholesky decomposition (even with the additional cost that imposed) and fold the square root of C (really half of it, since you only need to fold it into one side given that we'll just be taking a transpose) into A before doing the einsum.
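The Cholesky fold described above might look something like this sketch (shapes and names are illustrative; it relies on trace(A C A^T) = trace(B B^T) with B = A L and C = L L^T):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 5000, 20                      # N >> r, as in the comment
A = rng.standard_normal((N, r))
M = rng.standard_normal((r, r))
C = M @ M.T + r * np.eye(r)          # symmetric positive definite

# Direct one-liner: trace(A @ C @ A^T) without forming the N x N product.
trace_direct = np.einsum('ij,jk,ik->', A, C, A)

# Cholesky trick: fold one square root of C into A, reducing the problem
# to trace(B @ B^T), i.e. the favorable A/A^T einsum pattern.
L = np.linalg.cholesky(C)            # C = L @ L^T
B = A @ L
trace_chol = np.einsum('ij,ji->', B, B.T)

assert np.isclose(trace_direct, trace_chol)
```

The Cholesky factorization costs O(r^3) and the fold costs O(N r^2), but the final reduction then hits the fast A/A^T code path.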

Thanks for coming to my TED talk.

from pygsti.

coreyostrove commented on July 26, 2024

Whoops, just read your PR related to this. vdot is cool too.


enielse commented on July 26, 2024

I'm generally for this, and would suggest we add comments to the affected lines where it seems appropriate, since while einsum and the other options are faster, they tend to be less readable. One of the frustrations of Python is that in many circumstances you can have either speed or readability, but not both at the same time :)


rileyjmurray commented on July 26, 2024

@coreyostrove and @enielse, I like vdot because it's fast, consistent (in how it always conjugates its first argument), and readable (once you know what it's useful for).
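For concreteness, here's a small sketch of the vdot behavior being praised: it flattens its inputs, conjugates the first argument, and so computes squared Frobenius norms and Frobenius inner products directly (the arrays here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (300, 400)
A = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
B = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# vdot conjugates the first argument and flattens both, so np.vdot(A, A)
# is the squared Frobenius norm of A.
sq_fro = np.vdot(A, A).real
assert np.isclose(sq_fro, np.linalg.norm(A, 'fro') ** 2)

# For two matrices it is the Frobenius inner product <A, B> = tr(A^H B).
assert np.isclose(np.vdot(A, B), np.trace(A.conj().T @ B))
```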

@coreyostrove, regarding

Another observation I made at that time was that einsum was much faster at this trace/diagonal calculation when acting on inputs of the form A and A^T than on general pairs of matrices A and B. If I recall correctly, I concluded this was related to some optimization happening under the hood because A^T is a view into A (you're better equipped to understand exactly what's happening here, but my guess at the time was better cache behavior).

There is indeed something going on with A.T being a view of A. Computing np.dot(A.ravel(), A.ravel()) will have half the data movement from RAM into cache as np.dot(A.ravel(), A.copy().ravel()). Supposing that $A$ is $m \times n$, this operation involves $O(mn)$ arithmetic and $O(mn)$ data movement. In a one-for-one comparison, moving data costs more than operating on data, so the data movement ends up being the dominant cost in the operation. This can be contrasted with an operation like A.T @ B where $B$ is also $m \times n$ -- which takes $O(mn^2)$ arithmetic and involves $O(mn)$ data movement, leaving arithmetic as the dominant cost by far.
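The equivalence underlying that comparison can be sketched as follows: for real A, trace(A @ A^T) is just the sum of squares of A's entries, so it reduces to a dot product over A's raveled data (array sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))

# Both slots point at the same buffer: each element of A is read once
# from RAM, halving the data movement relative to two distinct buffers.
t1 = np.dot(A.ravel(), A.ravel())

# Passing a copy doubles the distinct memory read; arithmetic is identical.
t2 = np.dot(A.ravel(), A.copy().ravel())

assert np.isclose(t1, t2)
assert np.isclose(t1, np.trace(A @ A.T))
```

Since this operation does only O(mn) arithmetic on O(mn) data, it is memory-bound, which is why halving the data movement shows up so clearly in timings.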

As for

For example, in germ selection there was one particular hotspot related to evaluating an expression of the form trace(A@C@A^T), with A having shape (N, r), C having shape (r, r), and N >> r. You can do this with an einsum one-liner, but I found it significantly advantageous to first take a Cholesky decomposition (even with the additional cost that imposed) and fold the square root of C (really half of it, since you only need to fold it into one side given that we'll just be taking a transpose) into A before doing the einsum.

The benefits of Cholesky here don't surprise me! It's cool that you figured this out :)


sserita commented on July 26, 2024

Closed with merged PR :)

