I now believe that both Julia and Matlab can run the same BLAS and still get different timings. To follow up on my previous comment about JIT and allocations, I ran some more tests and found Matlab to be about 1 ms faster.
Julia (11.8 ms):
julia> using LinearAlgebra, MKL, BenchmarkTools
julia> M = 288;
julia> @btime eigen(H) setup=(A=rand(ComplexF64,M,M);H=A+A');
11.775 ms (16 allocations: 4.03 MiB)
Matlab (10.5 ms):
M = 288;
timee = eigtimer(M,1000);
mean(timee(10:end)) % ans = 0.0105
function time = eigtimer(M, N)
    time = zeros(N, 1);
    for i = 1:N
        A = rand(M, M) + 1i * rand(M, M);
        A = A + A';          % Hermitian, matching the Julia setup
        tic
        [V, D] = eig(A);     % eig returns eigenvectors V and eigenvalues D
        time(i) = toc;
    end
end
Assuming both are calling into similar MKL LAPACK routines, I previously speculated that the difference could come from allocations. Julia’s eigen! calls LAPACK.geevx!, whose boilerplate allocates seven arrays from A via similar (specifically wr, wi, VL, VR, scale, rconde, rcondv; VL and VR are full n×n matrices, the rest length-n vectors). eigen also allocates a copy of A. There is no mechanism to feed in pre-allocated storage, so all of this is allocated anew each time you call it.
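The per-call cost is easy to confirm with @allocated, using the same 288×288 Hermitian setup as the benchmark above (a minimal check, not a precise measurement):

```julia
using LinearAlgebra

# Every call to eigen allocates its outputs and LAPACK workspace afresh;
# @allocated reports the bytes allocated by a single invocation.
M = 288
A = rand(ComplexF64, M, M)
H = A + A'

eigen(H)                     # warm-up call so compilation is not counted
bytes = @allocated eigen(H)
println(bytes / 2^20, " MiB allocated per call")
```

The number should line up with the MiB-per-call figure that @btime reports.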
Perhaps Matlab’s JIT figures out that it can re-use existing space? Let’s try just allocating a 288x288 matrix.
Julia (45 µs):
julia> @btime V=zeros(288,288);
45.002 μs (2 allocations: 648.05 KiB)
Matlab (26.3 µs, freshly restarted to flush JIT):
M = 288;
time = zeros(1000,1);
for i = 1:1000
    tic;
    V = zeros(M,M);
    time(i) = toc;
end
mean(time(10:end))
I have trouble believing that Matlab can allocate a matrix 42% faster than Julia, so I am guessing that its JIT figures out it can skip the allocation. Now let’s apply that to the eigenvalue call. For Julia: 45 µs × 8 arrays (the seven from geevx! plus the copy of A) × 2 for the real and imaginary parts ≈ 0.72 ms. If Matlab avoids those allocations, that almost makes up the 1 ms difference.
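One caveat on using zeros as the proxy: zeros both allocates and zero-fills, while the similar calls in LAPACK.jl hand back uninitialized memory, so 45 µs is an upper bound on the per-array cost. A quick Base-only check of the gap (BenchmarkTools would give tighter numbers; @elapsed is a single noisy sample):

```julia
# zeros = allocate + fill with zero; Matrix{T}(undef, ...) = allocate only.
zeros(288, 288); Matrix{Float64}(undef, 288, 288)   # warm up
t_zeros = @elapsed zeros(288, 288)
t_undef = @elapsed Matrix{Float64}(undef, 288, 288)
println("zeros: ", t_zeros, " s, undef: ", t_undef, " s")
```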
Here’s a plot from Matlab showing the allocations getting faster over the first ten or so iterations, and the eig timings similarly adapting before reaching a steady state of about 10.5 ms.
Conclusion? I believe Julia leaves some performance on the table, because there’s no easy mechanism to re-use previous allocations with eigen and the like (there are 176 matches for similar in LAPACK.jl). It might help if there were a truly in-place design pattern, LAPACK.geevx!!(A, ... bufferspace). Then geevx! could retain its original functionality by allocating and then calling geevx!!, and if you know what you’re doing, you could re-use your own space repeatedly.
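To make the proposal concrete, here is a rough sketch of what the caller side could look like. geevx!! does not exist today, so the inner call below just wraps eigen! and still allocates; the point is only the calling convention, where all storage lives in a user-held buffer struct that a real implementation would hand straight to the LAPACK ccall:

```julia
using LinearAlgebra

struct EigBuffers{T}
    Acopy::Matrix{T}   # scratch copy, since LAPACK destroys its input
    w::Vector{T}       # eigenvalues (stored complex to cover the general case)
    V::Matrix{T}       # right eigenvectors
end
EigBuffers{T}(n::Int) where {T} =
    EigBuffers{T}(Matrix{T}(undef, n, n), Vector{T}(undef, n), Matrix{T}(undef, n, n))

# Hypothetical "!!" variant: results land in caller-owned buffers.
function eig!!(H::AbstractMatrix{T}, buf::EigBuffers{T}) where {T}
    copyto!(buf.Acopy, H)        # the caller's matrix is left untouched
    F = eigen!(buf.Acopy)        # still allocates today; a real geevx!! would not
    copyto!(buf.w, F.values)
    copyto!(buf.V, F.vectors)
    return buf.w, buf.V
end

# Buffers allocated once, re-used across every subsequent call:
M = 288
buf = EigBuffers{ComplexF64}(M)
A = rand(ComplexF64, M, M)
H = A + A'
w, V = eig!!(H, buf)
```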
Also, as a side note: I wonder whether it’s a good idea for geevx! to allocate seven separate arrays with similar, or whether it would be better to allocate one buffer seven times the size just once and use views into it?
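A tiny sketch of the views idea for the five vector-shaped outputs (VL and VR are n×n and would stay separate); contiguous views of a single buffer still expose the raw pointer that a ccall needs:

```julia
# One flat allocation sliced into five length-n workspace vectors,
# instead of five separate `similar` calls.
n = 288
vecbuf = Vector{Float64}(undef, 5n)
wr, wi, scale, rconde, rcondv =
    (view(vecbuf, (k - 1) * n + 1 : k * n) for k in 1:5)
```

The trade-off is that the views alias one parent array, so freeing or resizing anything individually is no longer possible.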