In quantum statistical mechanics, it is common to compute the trace of the product of two (or more) matrices. The obvious way of expressing this in julia is tr(A * B)
; however, this results in the computation of all elements of A * B
even though only the diagonal elements are required for the result, thus turning what could be an O(n^2) operation into an O(n^3) one.
Because this is highly suboptimal for large matrices, I often define
import Compat
function trprod(A::AbstractMatrix, B::AbstractMatrix)
let Ah = A'
size(Ah) == size(B) || throw(DimensionMismatch())
Compat.dot(Ah, B)
end
end
I prefer not to write Compat.dot(A', B)
directly because I think this obscures what I mean (and besides, this would skip the check on the shapes of A
and B
). But I always wonder: is there an existing way of expressing this functionality in a clear and direct way that I’m simply unaware of?
Assuming there is no existing method, this pattern is so common that I think it may make sense to have a widely available package with this functionality. That said, I’m trying to figure out the appropriate generality and level of abstraction for such a package. Some thoughts:
- It would be nice if optimized methods were to exist for when one or more matrices are
Hermitian
, since this is very common in quantum mechanics. - If the matrices are not all square, then due to the cyclic property of the trace, there may be a preferred method for computing
trprod
which minimizes the overall number of operations, given the shapes of the matrices. - More generally, one might wish to consider a method that explicitly computes only the diagonal of a product of matrices. Then one could use e.g.
tr(diagprod(A, B))
(although this would result in an unnecessary allocation to store the diagonal). However, this is slightly at odds with the previous point, which relies on a special property of the trace for optimization. That said, would having such adiagprod
function be useful in other contexts/domains? - One alternative to
diagprod
mentioned above would be to have a view that computes elements of a product of two matrices on-the-fly as they are needed. Then one could usetr(prodview(A, B))
to compute the trace without any need for allocation (although I would need to think more about whether this would result in efficient code from a cache-locality perspective). Note that computing elements on-the-fly would be inefficient for a product of more than two matrices, soprodview
should accept no more than two arguments, and one would need to do e.g.prodview(A, B * C)
explicitly (and pay attention to the optimal way to decompose the multiplication based on the relative sizes of the matrices). But might such a function be useful in other contexts that I have not considered here?