Most efficient way of computing B'* inv(A)* B for A large

I have a positive definite high-dimensional matrix A and wish to compute B’ * inv(A) * B where the result is low-dimensional. What is the fastest way of accomplishing this for generic B?

1 Like

To start, writing it as (A\B')*B will be more accurate and likely faster.

1 Like

Do this: B' * (A \ B)

Sure. I was looking for something a little better than that. And preferably something that guarantees that the result is exactly symmetric (instead of symmetric up to rounding error).

Symmetric(B' * (A \ B)) would do that.

Is A hermitian?


It’s positive-definite even. Are you thinking a Cholesky decomposition?


1 Like

That would work. I’d been hoping for something canned in MKL or some such library for the whole thing.

I’ll see if I can do Cholesky, then Tullio.

Yeah, Cholesky and then Tullio was exactly my thought. :slight_smile:


Is B sparse? Is A sparse?

No no

A = LL^T, C = L^{-1} B, then compute C^T C.

Edit: by L^{-1} B I mean L \ B.


Right, that’s Cholesky.

It’s Cholesky, but don’t do A \ B, it’s unnecessary.

Isn’t that L \ B?

I see: you are saying that only backward substitution is needed as opposed to back/forward for A \ B.

Yes, exactly.

For a 2,000 by 2,000 matrix A and 2,000 by 10 matrix B, I tried

function one()
    C[:,:] =Symmetric(  B' * (A \ B ) )

function two()
    D[:,:] = A \ B
    @tullio C[i,j] = D[k,i] * D[k,j]

function three()
    D[:,:] = cholesky(A).U \ B
    @tullio C[i,j] = D[k,i] * D[k,j]

function four()
    D[:,:] = cholesky(A).U \ B
    C[:,:] = D' * D
function five()
    D = cholesky(A).U \ B
    C[:,:] = D' * D

Here are the second run times:
0.032261 seconds (12 allocations: 30.687 MiB, 8.10% gc time)
0.031080 seconds (25 allocations: 30.686 MiB)
0.015401 seconds (24 allocations: 30.671 MiB, 13.39% gc time)
0.009717 seconds (15 allocations: 30.671 MiB)
0.008867 seconds (13 allocations: 30.671 MiB)

I’m puzzled by the garbage collection time for three(), but there appears to be a clear winner. My matrices are of the order 4_000 * 4_000 and it won’t have to be done more than a few thousand times per run, so this will be good enough.

Thanks all!
(there were some preallocations there)