Fastest way to perform A[a,:] * B * A'[:,b] where a and b are vectors of indices

When you construct Aa, construct it already transposed so that you don’t have to pass Aa' to ldiv!. I suspect that you are hitting a slow generic fallback (not LAPACK).

Then, at the end, multiply Aa' * B (there are fast BLAS calls for multiplying by transposed matrices).
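
Here is a rough sketch of what I mean, not a drop-in replacement for your code. I am assuming that B stands in for the inverse of some matrix that you have factorized; the names `M`, `F`, `Aat`, and `Abt` and all the sizes are made up for illustration.

```julia
using LinearAlgebra

n = 6
A = rand(8, n)
M = rand(n, n) + n * I   # hypothetical matrix; B plays the role of inv(M)
F = lu(M)                # factorize once, reuse for every solve
a = [1, 3, 5]
b = [2, 4]

# Build the row selections already transposed, as plain dense matrices
# of size n × length(a) and n × length(b).
Aat = permutedims(A[a, :])
Abt = permutedims(A[b, :])

# In-place solve on a dense Matrix: overwrites Abt with M \ Abt.
ldiv!(F, Abt)

# Multiply by the lazy transpose at the very end; for dense Float64
# matrices this is handled by BLAS rather than a generic loop.
result = Aat' * Abt

# Check against the straightforward (allocating) formulation.
reference = A[a, :] * (M \ A[b, :]')
@assert result ≈ reference
```

The point is that `ldiv!` only ever sees an ordinary `Matrix`, and the only transposed object is the one passed to `*`.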
