I have a positive definite high-dimensional matrix A and wish to compute B’ * inv(A) * B where the result is low-dimensional. What is the fastest way of accomplishing this for generic B?

To start, writing it as `(A\B')*B`

will be more accurate and likely faster.

Do this: `B' * (A \ B)`

Sure. I was looking for something a little better than that. And preferably something that guarantees that the result is exactly symmetric (instead of symmetric up to rounding error).

`Symmetric(B' * (A \ B))`

would do that.

Is `A`

hermitian?

It’s positive-definite even. Are you thinking a Cholesky decomposition?

Yes.

That would work. I’d been hoping for something canned in MKL or some such library for the whole thing.

I’ll see if I can do Cholesky, then Tullio.

Yeah, Cholesky and then Tullio was exactly my thought.

Is `B`

sparse? Is `A`

sparse?

No no

A = LL^T, C = L^{-1} B, then compute C^T C.

Edit: by L^{-1} B I mean `L \ B`

.

Right, that’s Cholesky.

It’s Cholesky, but don’t do `A \ B`

, it’s unnecessary.

Isn’t that `L \ B`

?

I see: you are saying that only backward substitution is needed as opposed to back/forward for `A \ B`

.

Yes, exactly.

For a 2,000 by 2,000 matrix A and 2,000 by 10 matrix B, I tried

```
function one()
C[:,:] =Symmetric( B' * (A \ B ) )
end
function two()
cholesky!(A)
D[:,:] = A \ B
@tullio C[i,j] = D[k,i] * D[k,j]
end
function three()
D[:,:] = cholesky(A).U \ B
@tullio C[i,j] = D[k,i] * D[k,j]
end
function four()
D[:,:] = cholesky(A).U \ B
C[:,:] = D' * D
end
function five()
D = cholesky(A).U \ B
C[:,:] = D' * D
end
```

Here are the second run times:

0.032261 seconds (12 allocations: 30.687 MiB, 8.10% gc time)

0.031080 seconds (25 allocations: 30.686 MiB)

0.015401 seconds (24 allocations: 30.671 MiB, 13.39% gc time)

0.009717 seconds (15 allocations: 30.671 MiB)

0.008867 seconds (13 allocations: 30.671 MiB)

I’m puzzled by the garbage collection time for three(), but there appears to be a clear winner. My matrices are of the order 4_000 * 4_000 and it won’t have to be done more than a few thousand times per run, so this will be good enough.

Thanks all!

(there were some preallocations there)