In quantum statistical mechanics, it is common to compute the trace of the product of two (or more) matrices. The obvious way of expressing this in julia is `tr(A * B)`

; however, this results in the computation of all elements of `A * B`

even though only the diagonal elements are required for the result, thus turning what could be an O(n^2) operation into an O(n^3) one.

Because this is highly suboptimal for large matrices, I often define

```
import Compat
function trprod(A::AbstractMatrix, B::AbstractMatrix)
let Ah = A'
size(Ah) == size(B) || throw(DimensionMismatch())
Compat.dot(Ah, B)
end
end
```

I prefer not to write `Compat.dot(A', B)`

directly because I think this obscures what I mean (and besides, this would skip the check on the shapes of `A`

and `B`

). But I always wonder: is there an existing way of expressing this functionality in a clear and direct way that I’m simply unaware of?

Assuming there is no existing method, this pattern is so common that I think it may make sense to have a widely available package with this functionality. That said, I’m trying to figure out the appropriate generality and level of abstraction for such a package. Some thoughts:

- It would be nice if optimized methods were to exist for when one or more matrices are
`Hermitian`

, since this is very common in quantum mechanics. - If the matrices are not all square, then due to the cyclic property of the trace, there may be a preferred method for computing
`trprod`

which minimizes the overall number of operations, given the shapes of the matrices. - More generally, one might wish to consider a method that explicitly computes only the diagonal of a product of matrices. Then one could use e.g.
`tr(diagprod(A, B))`

(although this would result in an unnecessary allocation to store the diagonal). However, this is slightly at odds with the previous point, which relies on a special property of the trace for optimization. That said, would having such a`diagprod`

function be useful in other contexts/domains? - One alternative to
`diagprod`

mentioned above would be to have a view that computes elements of a product of two matrices on-the-fly as they are needed. Then one could use`tr(prodview(A, B))`

to compute the trace without any need for allocation (although I would need to think more about whether this would result in efficient code from a cache-locality perspective). Note that computing elements on-the-fly would be inefficient for a product of more than two matrices, so`prodview`

should accept no more than two arguments, and one would need to do e.g.`prodview(A, B * C)`

explicitly (and pay attention to the optimal way to decompose the multiplication based on the relative sizes of the matrices). But might such a function be useful in other contexts that I have not considered here?