I’d like to calculate `B' * A * B`

, where `A`

is `Hermitian`

, without allocating intermediary results. Note that `B`

will be, in general, a rectangular matrix. I’m currently using Tullio to do the calculation, but I was wondering if there are any better ways. Thanks in advance!

Surely you must have at least one allocation for the result, unless you have that pre-allocated. In either case, why not allocate the intermediate result `A * B`

as well? Why is it imperative this happen without allocation? If you allocate it outside of any loops, it’s a one-time cost that won’t cause any noticeable slowdown (except at extremely small matrix sizes).

I don’t think there’s an efficient way to do this without additional storage. A non-allocating matrix-matrix-matrix multiply is normally a O(N^4) operation, which is much worse than the O(N^3) allocating version of performing it as a sequence of matrix-matrix multiplies.

You might do slightly better if you have a factorization of `A`

, especially a symmetric factorization like Cholesky. If `A == U' * U`

then `B' * A * B == (U * B)' * (U * B)`

. `lmul!(U, B)`

can overwrite `B`

in-place with the result of `U * B`

for some types of `U`

. But this still would likely not save you much (and any gains would be overwhelmed by the cost of factorizing `A`

if it was not already). And you’d still be left needing to allocate space for the final result.