Optimizing Linear Algebra Code?

There is definitely something wrong here. Anything more than a factor of 2 difference is suspicious.

Can you make an absolute minimal example in both Matlab and Julia to show both code and benchmarking? Right now there is a bit too much code and things going on in your example, so I tried to condense it.

Here is my MWE, using Julia 1.6 (OpenBLAS) and Matlab R2021a (MKL). Does it capture your situation?

Julia:

using BenchmarkTools

N, M = 1000, 256
A = rand(ComplexF64, N, M^2)
B = rand(ComplexF64, M, M)

testmul(X, Y) = X * vec(Y)

Matlab:

N = 1000;
M = 256;
A = rand(N, M^2) + 1i * rand(N, M^2);
B = rand(M,M) + 1i * rand(M,M);

Benchmarks:

jl> @btime testmul($A, $B);
  39.787 ms (3 allocations: 15.83 KiB)

>> timeit(@()A * B(:))
ans =
    0.0410

These speeds are basically identical.

(BTW: Instead of this: typeof(Test_mat[1,1]), it is more idiomatic to use eltype(Test_mat))

1 Like