This question is a follow-up to

To recap, given two dense square matrices with ComplexF64 elements of size approximately 100000X100000, A and B, what is the fastest way to perform R=A*B*A’, where A’=conj(tr(A)) is the Hermitian conjugate of A?.

If it helps:

(1) A=inv(I-X), where I is the identity matrix and X is another dense matrix.

(2) A and B are not necessarily Hermitian.

(3) I need only a certain sub-block of the result R, for instance, R[aa,bb], where aa,bb are vectors of indices. In the linked problem, the solution (atleast in my system, with the attached code below) looks good for very small number of elements in aa,bb. But for larger aa,bb, the LU-based inverse seems to perform significantly worse. How to go about rectifying this?

In my laptop (using julia -t8, since I have 8 cores)

- N=1000, m=500

```
588.018 ms (13 allocations: 28.62 MiB)
33.331 ms (15 allocations: 30.33 MiB)
```

- N=1000, m=10

```
17.920 ms (12 allocations: 15.50 MiB)
23.468 ms (14 allocations: 17.21 MiB)
```

The first line corresponds to the LU based inverse, while the second corresponds to the naive inverse. In my system, for anything over m=15 (1.5% of matrix dimension), the LU based inverse is slower. For larger matrices (N=10000…), I find that this threshold, where the benefits of the LU based inverse cease to exist, gets lowered.

I have appended the code.

```
using LinearAlgebra
using MKL
using BenchmarkTools
N = 1000;
m = 500;
aa = zeros(Int32, m); bb = zeros(Int32, m);
for ij = 1:m
aa[ij] = rand(1:N);
bb[ij] = rand(1:N);
end
X = 0.1 .* rand(Float64, N,N);
B = 1 .* rand(Float64, N,N);
function LUinv(X,B,a,b)
LU = lu(I - X)
Aa = zeros(eltype(LU), length(a), size(X,1))
for i = 1:length(a)
Aa[i, a[i]] = 1
end # Aa = I[:, a]
rdiv!(Aa, LU) # Aa = inv(I - X)[:, a]
Ab = zeros(eltype(LU), length(b), size(X,1))
for i = 1:length(b)
Ab[i, b[i]] = 1;
end # Ab = I[:, b]
rdiv!(Ab, LU) # Ab = inv(I - X)[:, b]
Rab = length(a) < length(b) ? (Aa * B) * Ab' : Aa * (B * Ab'); # R[a, b]
return Rab
end
function basicinv(X,B,a,b)
G = inv(I-X);
Ga = G';
Rab = G[a,:]*B*Ga[:,b];
return Rab
end
@btime LUinv(X,B,aa,bb)
@btime basicinv(X,B,aa,bb)
```