Most efficient way of computing B'* inv(A)* B for A large

Is this model reduction?

For the sake of completeness, B’*inv(A)*B does better than I’d thought: 0.45 seconds on second run.

Can tullio incorporate symmetry?

@PetrKryslUCSD I’m using it for some optimization problem in many variables. I can simplify this by writing it as a nested problem with the inner nest a bunch of high-dimensional problems. The Hessian of the outer problem then becomes the sum over terms of the form C-B’A^{-1}B, where the A’s are the Hessians of the inner problem.

Good question; I don’t know. Doesn’t look like it.

It looks like you may be trying to solve a system with

\begin{bmatrix} A & B \\ B^T & -C \end{bmatrix}.

If C is also positive definite, you can factorize the above matrix directly while taking advantage of its structure using https://github.com/JuliaSmoothOptimizers/LDLFactorizations.jl.

3 Likes

Not quite. But yes, partitioned inverses are related.

If you’re looking for the inplace version of backsolve it’s ldiv!().
More generally, the syntax to write into a matrix D is D .= ....

3 Likes

Ha, I knew about ldiv, but thought .= was only for assigning scalars, thanks!

1 Like

Tried another run between some major contenders with B 3_000 by 10. The difference between the first four contenders is small. I’m surprised at the gain of using MKL (my computer has an AMD processor).

function one()
D.= cholesky(A).U \ B
@tullio C[i,j] = D[k,i] * D[k,j]
end

function two()
ldiv!(D, cholesky(A).U, B)
@tullio C[i,j] = D[k,i] * D[k,j]
end

function three()
D .= cholesky(A).U \ B
C .= D’ * D
end

function four()
ldiv!(D, cholesky(A).U, B)
C .= D’ * D
end

function five()
C.= B’ * inv(cholesky(A)) * B
end

with openblas:

61.508 ms (40 allocations: 68.89 MiB)
61.744 ms (36 allocations: 68.67 MiB)
61.966 ms (15 allocations: 68.89 MiB)
61.603 ms (11 allocations: 68.67 MiB)
142.128 ms (15 allocations: 137.56 MiB)

with mkl:

34.641 ms (40 allocations: 68.89 MiB)
34.490 ms (36 allocations: 68.67 MiB)
34.084 ms (15 allocations: 68.89 MiB)
33.951 ms (11 allocations: 68.67 MiB)
129.186 ms (15 allocations: 137.56 MiB)

What if you do the loops by hand with LoopVectorization, taking into account the symmetry. Can you halve the fastest time?