I’d like to calculate B' * A * B
, where A
is Hermitian
, without allocating intermediary results. Note that B
will be, in general, a rectangular matrix. I’m currently using Tullio to do the calculation, but I was wondering if there are any better ways. Thanks in advance!
Surely you must have at least one allocation for the result, unless you have that pre-allocated. In either case, why not allocate the intermediate result A * B
as well? Why is it imperative this happen without allocation? If you allocate it outside of any loops, it’s a one-time cost that won’t cause any noticeable slowdown (except at extremely small matrix sizes).
I don’t think there’s an efficient way to do this without additional storage. A non-allocating matrix-matrix-matrix multiply is normally a O(N^4) operation, which is much worse than the O(N^3) allocating version of performing it as a sequence of matrix-matrix multiplies.
You might do slightly better if you have a factorization of A
, especially a symmetric factorization like Cholesky. If A == U' * U
then B' * A * B == (U * B)' * (U * B)
. lmul!(U, B)
can overwrite B
in-place with the result of U * B
for some types of U
. But this still would likely not save you much (and any gains would be overwhelmed by the cost of factorizing A
if it was not already). And you’d still be left needing to allocate space for the final result.