# Most efficient way of computing B'* inv(A)* B for A large

I have a positive definite high-dimensional matrix A and wish to compute B’ * inv(A) * B where the result is low-dimensional. What is the fastest way of accomplishing this for generic B?

1 Like

To start, writing it as `(A\B')*B` will be more accurate and likely faster.

1 Like

Do this: `B' * (A \ B)`

Sure. I was looking for something a little better than that. And preferably something that guarantees that the result is exactly symmetric (instead of symmetric up to rounding error).

`Symmetric(B' * (A \ B))` would do that.

Is `A` hermitian?

3 Likes

It’s positive-definite even. Are you thinking a Cholesky decomposition?

Yes.

1 Like

That would work. I’d been hoping for something canned in MKL or some such library for the whole thing.

I’ll see if I can do Cholesky, then Tullio.

Yeah, Cholesky and then Tullio was exactly my thought.

2 Likes

Is `B` sparse? Is `A` sparse?

No no

A = LL^T, C = L^{-1} B, then compute C^T C.

Edit: by L^{-1} B I mean `L \ B`.

2 Likes

Right, that’s Cholesky.

It’s Cholesky, but don’t do `A \ B`, it’s unnecessary.

Isn’t that `L \ B`?

I see: you are saying that only backward substitution is needed as opposed to back/forward for `A \ B`.

Yes, exactly.

For a 2,000 by 2,000 matrix A and 2,000 by 10 matrix B, I tried

``````function one()
C[:,:] =Symmetric(  B' * (A \ B ) )
end

function two()
cholesky!(A)
D[:,:] = A \ B
@tullio C[i,j] = D[k,i] * D[k,j]
end

function three()
D[:,:] = cholesky(A).U \ B
@tullio C[i,j] = D[k,i] * D[k,j]
end

function four()
D[:,:] = cholesky(A).U \ B
C[:,:] = D' * D
end

function five()
D = cholesky(A).U \ B
C[:,:] = D' * D
end
``````

Here are the second run times:
0.032261 seconds (12 allocations: 30.687 MiB, 8.10% gc time)
0.031080 seconds (25 allocations: 30.686 MiB)
0.015401 seconds (24 allocations: 30.671 MiB, 13.39% gc time)
0.009717 seconds (15 allocations: 30.671 MiB)
0.008867 seconds (13 allocations: 30.671 MiB)

I’m puzzled by the garbage collection time for three(), but there appears to be a clear winner. My matrices are of the order 4_000 * 4_000 and it won’t have to be done more than a few thousand times per run, so this will be good enough.

Thanks all!
(there were some preallocations there)