
Comments (5)

coreyostrove commented on July 26, 2024

Some places even do this as a way to compute (squared) Frobenius norms.

This hurts me a bit in my soul.

I'm generally supportive of this. I used einsum to accelerate traces of matrix products a fair amount when re-implementing the germ selection algorithm, and indeed found it to be much faster. One thing I found while profiling back then was that even though you can compute the trace of the product in one fell swoop with einsum, it was actually faster to stop a bit short: compute the diagonal entries with einsum and then sum those with np.sum (no idea why this was/is the case). You can see an example of where I used that pattern here:

inv_update_term_diag = _np.einsum('ij,ji->i', pinv_E_beta_central_chol, pinv_E_beta_central_chol.T)

I couldn't tell you what the performance differential was anymore (that's probably a lie; I'm just too lazy to find the notebook where I tested this), but it was enough that I decided it was worth doing. YMMV though, so I'd recommend profiling this on your end.
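A minimal sketch of the two-step pattern described above (the array here is illustrative, not from pyGSTi):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))

# Full trace of A @ A^T in one einsum call.
trace_direct = np.einsum('ij,ji->', A, A.T)

# The pattern from the comment: stop at the diagonal, then sum separately.
diag = np.einsum('ij,ji->i', A, A.T)
trace_two_step = np.sum(diag)

assert np.isclose(trace_direct, trace_two_step)
```

Both give trace(A @ A^T) without ever materializing the full product; which variant is faster is worth profiling on your own workload.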

Another observation I made at that time was that einsum was much faster at this trace/diagonal calculation when acting on inputs of the form A and A^T than on general pairs of matrices A and B. If I recall correctly, I concluded this was related to some optimization happening under the hood because A^T is a view into A (you're better equipped to understand exactly what's happening here, but my guess at the time was better cache behavior). I confirmed the view was at least partly responsible by passing in a copy of A^T instead, and the performance fell back in line with general pairs of matrices. I mention all of this because the trace(A@A^T) pattern appears in a bunch of the cost functions we use in experiment design (and probably other places) due to its relation to something called A-optimal experiment design.
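The view-vs-copy experiment described above can be reproduced with a small sketch like this (sizes and timing loop are arbitrary choices, and the exact speed ratio is machine-dependent):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
At_view = A.T          # a view: shares A's memory buffer
At_copy = A.T.copy()   # a fresh contiguous array with its own buffer

# Both calls compute the same diagonal of A @ A^T ...
d_view = np.einsum('ij,ji->i', A, At_view)
d_copy = np.einsum('ij,ji->i', A, At_copy)
assert np.allclose(d_view, d_copy)

# ... but the view variant reads half as much distinct memory, which
# tends to make it faster; measure rather than assume.
t_view = timeit(lambda: np.einsum('ij,ji->i', A, At_view), number=20)
t_copy = timeit(lambda: np.einsum('ij,ji->i', A, At_copy), number=20)
print(f"view: {t_view:.4f}s  copy: {t_copy:.4f}s")
```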

One last incoherent nugget extracted from the deeper confines of my memory: the aforementioned performance boost from using einsum for diagonals/traces of products of the form A and A^T is big enough that in some instances it justified doing some otherwise weird-looking additional calculations. For example, in germ selection there was one particular hotspot related to evaluating an expression of the form trace(A@C@A^T), with A having shape (N, r), C having shape (r, r), and N >> r. You can do this with an einsum one-liner, but I found it significantly advantageous to first take a Cholesky decomposition (even with the additional cost that imposed) and fold the square root of C (really half of it, since you only need to fold it into one side given that we'll just be taking a transpose) into A before doing the einsum.
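The Cholesky fold described above might look something like this sketch (shapes and names are illustrative; it relies on trace(A C A^T) = trace(B B^T) with B = A L and C = L L^T):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 5000, 20                      # N >> r, as in the comment
A = rng.standard_normal((N, r))
M = rng.standard_normal((r, r))
C = M @ M.T + r * np.eye(r)          # symmetric positive definite

# Direct one-liner: trace(A @ C @ A^T) without forming the N x N product.
trace_direct = np.einsum('ij,jk,ik->', A, C, A)

# Cholesky trick: fold one square root of C into A, reducing the problem
# to trace(B @ B^T), i.e. the favorable A/A^T einsum pattern.
L = np.linalg.cholesky(C)            # C = L @ L^T
B = A @ L
trace_chol = np.einsum('ij,ji->', B, B.T)

assert np.isclose(trace_direct, trace_chol)
```

The Cholesky factorization costs O(r^3) and the fold costs O(N r^2), but the final reduction then hits the fast A/A^T code path.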

Thanks for coming to my TED talk.

from pygsti.

coreyostrove commented on July 26, 2024

Whoops, just read your PR related to this. vdot is cool too.


enielse commented on July 26, 2024

I'm generally for this, and would suggest we add comments to the affected lines where it seems appropriate, since while einsum and the other options are faster, they tend to be less readable. One of the frustrations of Python is that in many circumstances you can have either speed or readability, but not both at the same time :)


rileyjmurray commented on July 26, 2024

@coreyostrove and @enielse, I like vdot because it's fast, consistent (in how it always conjugates its first argument), and readable (once you know what it's useful for).
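For concreteness, here's a small sketch of the vdot behavior being praised: it flattens its inputs, conjugates the first argument, and so computes squared Frobenius norms and Frobenius inner products directly (the arrays here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (300, 400)
A = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
B = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# vdot conjugates the first argument and flattens both, so np.vdot(A, A)
# is the squared Frobenius norm of A.
sq_fro = np.vdot(A, A).real
assert np.isclose(sq_fro, np.linalg.norm(A, 'fro') ** 2)

# For two matrices it is the Frobenius inner product <A, B> = tr(A^H B).
assert np.isclose(np.vdot(A, B), np.trace(A.conj().T @ B))
```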

@coreyostrove, regarding

Another observation I made at that time was that einsum was much faster at this trace/diagonal calculation when acting on inputs of the form A and A^T than on general pairs of matrices A and B. If I recall correctly, I concluded this was related to some optimization happening under the hood because A^T is a view into A (you're better equipped to understand exactly what's happening here, but my guess at the time was better cache behavior).

There is indeed something going on with A.T being a view of A. Computing np.dot(A.ravel(), A.ravel()) will have half the data movement from RAM into cache as np.dot(A.ravel(), A.copy().ravel()). Supposing that $A$ is $m \times n$, this operation involves $O(mn)$ arithmetic and $O(mn)$ data movement. In a one-for-one comparison, moving data costs more than operating on data, so the data movement ends up being the dominant cost in the operation. This can be contrasted with an operation like A.T @ B where $B$ is also $m \times n$ -- which takes $O(mn^2)$ arithmetic and involves $O(mn)$ data movement, leaving arithmetic as the dominant cost by far.
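The equivalence underlying that comparison can be sketched as follows: for real A, trace(A @ A^T) is just the sum of squares of A's entries, so it reduces to a dot product over A's raveled data (array sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))

# Both slots point at the same buffer: each element of A is read once
# from RAM, halving the data movement relative to two distinct buffers.
t1 = np.dot(A.ravel(), A.ravel())

# Passing a copy doubles the distinct memory read; arithmetic is identical.
t2 = np.dot(A.ravel(), A.copy().ravel())

assert np.isclose(t1, t2)
assert np.isclose(t1, np.trace(A @ A.T))
```

Since this operation does only O(mn) arithmetic on O(mn) data, it is memory-bound, which is why halving the data movement shows up so clearly in timings.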

As for

For example, in germ selection there was one particular hotspot related to evaluating an expression of the form trace(A@C@A^T), with A having shape (N, r), C having shape (r, r), and N >> r. You can do this with an einsum one-liner, but I found it significantly advantageous to first take a Cholesky decomposition (even with the additional cost that imposed) and fold the square root of C (really half of it, since you only need to fold it into one side given that we'll just be taking a transpose) into A before doing the einsum.

The benefits of Cholesky here don't surprise me! It's cool that you figured this out :)


sserita commented on July 26, 2024

Closed with merged PR :)

