I now believe that both Julia and Matlab can run the same BLAS and get different timings. To follow up my previous comment on JIT and allocations, I ran some more tests and found Matlab about 1 ms faster.
Julia (11.8 ms):
julia> using LinearAlgebra, MKL, BenchmarkTools
julia> M = 288;
julia> @btime eigen(H) setup=(A=rand(ComplexF64,M,M);H=A+A');
11.775 ms (16 allocations: 4.03 MiB)
Matlab (10.5 ms):
M = 288;
timee = eigtimer(M,1000);
mean(timee(10:end)) % ans = 0.0105
function time = eigtimer(M,N)
time = zeros(N,1);
for i = 1:N
    tic;
    A = rand(M, M) + 1i * rand(M, M);
    A = A + A';
    [T, V] = eig(A);
    time(i) = toc;
end
end
Assuming both are calling a similar MKL BLAS, I previously speculated that the difference could come from allocations. Julia's eigen ends up in LAPACK.geevx!, whose boilerplate allocates seven (7) arrays sized against A with similar (specifically wr, wi, VL, VR, scale, rconde, rcondv). eigen also allocates a copy of A. There is no mechanism to pass in pre-allocated storage, so all of this is allocated anew every time you call it.
Perhaps Matlab’s JIT figures out that it can re-use existing space? Let’s try just allocating a 288x288 matrix.
Julia (45 µs):
julia> @btime V=zeros(288,288);
45.002 μs (2 allocations: 648.05 KiB)
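As a quick sanity check on the allocation cost (a sketch; timings will vary by machine), re-using a buffer in Julia with fill! avoids the allocation entirely, unlike calling zeros each time:

```julia
using BenchmarkTools

# fill! overwrites existing storage in place, so it allocates nothing;
# zeros allocates a fresh matrix on every call.
V = zeros(288, 288)
@btime zeros(288, 288);   # 2 allocations, 648.05 KiB (as above)
@btime fill!($V, 0.0);    # 0 allocations
```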
Matlab (26.3 µs, freshly restarted to flush JIT):
M = 288;
time = zeros(1000,1);
for i = 1:1000
    tic;
    V = zeros(M,M);
    time(i) = toc;
end
I have trouble believing that Matlab can allocate a matrix 42% faster than Julia, so I am guessing it figures out that it can skip the allocation. Now let's apply that to the eigenvalue case. For Julia, 45 µs × 8 arrays × 2 (real and imaginary parts) = 0.72 ms. If Matlab avoids those allocations, that almost makes up for the 1 ms difference.
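Spelling out that back-of-the-envelope arithmetic (assuming the 45 µs allocation figure above):

```julia
# 45 µs per 288×288 allocation, 8 arrays (7 from similar + 1 copy of A),
# times 2 for real and imaginary parts
per_alloc_ms = 45e-3              # 45 µs expressed in ms
estimated_ms = per_alloc_ms * 8 * 2
# estimated_ms ≈ 0.72 ms, close to the ~1 ms Julia/Matlab gap
```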
Here’s a plot from Matlab that shows the allocations getting faster within the first ten or so iterations, and the eigenvalues similarly adapting but then reaching the steady state of about 10.5 ms.
Conclusion? I believe Julia leaves some performance on the table, because there is no easy mechanism to re-use previous allocations with eigen and the like (there are 176 matches for similar in LAPACK.jl). It might help if there were a truly in-place design pattern,
LAPACK.geevx!!(A, ... bufferspace). Then
geevx! could retain its original functionality by allocating the buffers and then calling
geevx!!. And if you know what you're doing, you can re-use your own space repeatedly.
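A minimal sketch of what that two-tier pattern could look like. The names EigenWorkspace, geevx!! and the geevx! wrapper below are hypothetical, not an existing LinearAlgebra API, and the "LAPACK call" is faked with eigen just to keep the sketch runnable:

```julia
using LinearAlgebra

# Hypothetical design sketch: caller-owned buffers live in a workspace.
struct EigenWorkspace{T}
    w::Vector{T}    # eigenvalues
    VL::Matrix{T}   # left eigenvectors
    VR::Matrix{T}   # right eigenvectors
end

EigenWorkspace{T}(n::Integer) where {T} =
    EigenWorkspace{T}(Vector{T}(undef, n),
                      Matrix{T}(undef, n, n),
                      Matrix{T}(undef, n, n))

# "!!" variant: writes only into caller-supplied storage.
function geevx!!(ws::EigenWorkspace{T}, A::AbstractMatrix{T}) where {T}
    # A real implementation would hand ws's buffers to the LAPACK routine;
    # here we stand it in with eigen and copy the results over.
    F = eigen(A)
    copyto!(ws.w, F.values)
    copyto!(ws.VR, F.vectors)
    return ws
end

# "!" variant keeps today's behavior: allocate, then delegate.
geevx!(A::AbstractMatrix{T}) where {T} =
    geevx!!(EigenWorkspace{T}(size(A, 1)), A)
```

In a hot loop you would construct the workspace once and call geevx!! repeatedly, paying the allocation cost only on the first iteration.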
Also, as a side note, I wonder whether it's a good idea for
geevx! to allocate seven separate arrays with
similar, or better to allocate a 7x-sized block just once and use views into it?