# What's the most efficient way to compute the matrix product A'*B*A for general B?

The matrix `A` has orthonormal columns, but is rectangular (tall). Currently, I allocate caches `C` and `D`, and evaluate

```julia
mul!(D, A', mul!(C, B, A))
```

In my use case, `A` and `B` are somewhat large matrices, so I’d ideally like to avoid allocating the intermediate matrix `C`. I was wondering if there’s a way to use the structure of `A` to come up with a smarter algorithm? Unfortunately, `B` is a general (maybe complex) matrix, and doesn’t have an obvious structure.
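For reference, a minimal self-contained sketch of the cache-based approach described above (the sizes and the QR-based construction of an orthonormal `A` are illustrative, not from the original post):

```julia
using LinearAlgebra

n, k = 6, 4
A = Matrix(qr(randn(n, k)).Q)        # n × k matrix with orthonormal columns
B = randn(ComplexF64, n, n)          # general (complex) square matrix

# Preallocated caches, reusable across calls
C = similar(B, n, k)                 # will hold B * A
D = similar(B, k, k)                 # will hold A' * B * A

mul!(D, A', mul!(C, B, A))           # D now equals A' * B * A, no new allocations
```

Both `mul!` calls write into the preallocated buffers, so repeated evaluations allocate nothing beyond the one-time caches.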

By your setup, we know that B is square, but we can’t assume anything about symmetry?

Is A very “thin” relative to B? Could it be feasible to compute a truncated SVD down to the number of columns of A, then transform the singular vectors by A?

A and B are roughly similar-sized (e.g. if `B` is 3000 x 3000, `A` is 3000 x 2880). B is indeed square, but there’s no symmetry.

Do you have to compute each product only once, or multiple times with the same B (e.g. `A_i' * B * A_i` for i = 1, 2, …) or with the same A?

I do need to compute the product multiple times for different `B`, but the same `A`.
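Since `A` is fixed across calls, the caches only need to be allocated once and can then be reused for every new `B`. A hedged sketch of that pattern (sizes and the loop structure are illustrative assumptions):

```julia
using LinearAlgebra

n, k = 6, 4
A  = Matrix(qr(randn(n, k)).Q)              # fixed n × k orthonormal-column matrix
Bs = [randn(ComplexF64, n, n) for _ in 1:3] # stream of different B matrices

# One-time cache allocation, reused for every B
C = Matrix{ComplexF64}(undef, n, k)
D = Matrix{ComplexF64}(undef, k, k)

results = Matrix{ComplexF64}[]
for B in Bs
    mul!(D, A', mul!(C, B, A))   # overwrite the same buffers each iteration
    push!(results, copy(D))      # copy, since D is reused next iteration
end
```

The `copy(D)` is only needed if each result must be kept; if `D` is consumed immediately, the loop body allocates nothing.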

If you need to perform the multiplication very many times, AND A has an SVD A = U S V’ that can be truncated without materially altering A, then it might be worth paying the up-front price of the SVD and gaining during each evaluation of A’ B A, since you then multiply smaller matrices.
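A sketch of this suggestion, under the assumption that `A` is genuinely (numerically) low-rank; note that the thread's actual `A` has orthonormal columns, so all its singular values are 1 and no truncation is possible there. With A = U S V’ truncated to rank r, one gets A’ B A = V S (U’ B U) S V’, where the core product U’ B U is only r × r:

```julia
using LinearAlgebra

n, k, r = 200, 150, 20
A = randn(n, r) * randn(r, k)     # hypothetical rank-r matrix (NOT orthonormal-column)
B = randn(ComplexF64, n, n)

# One-time cost: truncated SVD of A, keeping r singular triplets
F = svd(A)
U, S, V = F.U[:, 1:r], F.S[1:r], F.V[:, 1:r]

# Per-B cost: the expensive product is now r × r instead of k × k
small  = U' * B * U                                     # r × r core
approx = V * Diagonal(S) * small * Diagonal(S) * V'     # ≈ A' * B * A
```

Whether this pays off depends on how small r can be made relative to k; for the poster's 3000 × 2880 orthonormal `A`, it cannot.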