slow performance of eig

question

#1

Hello,

the performance of computing the eigenvalues and eigenvectors of small complex matrices in Julia seems quite slow to me. If I use the following code

N = 100000
dim = 3
function eigen()
    for i = 1:N
        A = rand(dim, dim) + im * rand(dim, dim)
        eig(A)
    end
end
@time eigen()

it takes about 3 seconds:

3.226431 seconds (5.76 M allocations: 436.489 MiB, 2.34% gc time)

If I use the same code in Python (with numpy), it is slightly slower (4.4 s); however, by using numba I can speed up the loop so that it takes only 0.7 s, which is much faster than Julia.

Is there a workaround or package to enhance the performance for these matrices?

PS: versioninfo in Julia gives the following
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel® Core™ i5-6500 CPU @ 3.20GHz
WORD_SIZE: 64
BLAS: mkl_rt
LAPACK: mkl_rt
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)


#2

Most likely you are benchmarking the performance of rand() and not of eig(). Did you try pre-computing the random arrays and applying @time only to the eig() calls?
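For example, one could pre-generate the matrices and time only the decompositions. A sketch (on later Julia versions `eig` became `eigen` in LinearAlgebra, which is what this uses):

```julia
using LinearAlgebra  # provides `eigen`; on Julia 0.6 use the built-in `eig` instead

# Pre-generate the random matrices so the timed loop contains only the solver.
As = [rand(3, 3) + im * rand(3, 3) for _ in 1:1000]

function decompose(As)
    for A in As
        eigen(A)  # `eig(A)` on Julia 0.6
    end
end

decompose(As)        # warm-up call, so compilation is not timed
@time decompose(As)  # now measures only the eigensolver
```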


#4

Can you try BenchmarkTools? It looks to me like you are measuring compile time.


#5

If you want to benchmark eig, you should do that without mixing in the performance of rand, and also avoid using global variables (N and dim are globals), which violates the “first guideline of Julia benchmarking”.

Here’s the “right” way:

using BenchmarkTools
A = rand(Complex{Float64}, 3, 3)
@btime eig($A)

#6

Well, if you define them as const, this should also be fine:
const N = 100000
const dim = 3
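With `const` globals the original timing loop is type-stable. A sketch of the full benchmark in that style (using `eigen`, the later name for 0.6's `eig`, and a smaller N purely for illustration):

```julia
using LinearAlgebra  # provides `eigen`; on Julia 0.6 use `eig` instead

const N = 1000   # smaller than the original 100000, just for illustration
const dim = 3

function bench()
    for i = 1:N
        A = rand(dim, dim) + im * rand(dim, dim)
        eigen(A)  # `eig(A)` on Julia 0.6
    end
end

bench()        # warm up (compile)
@time bench()
```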


#7

Dear All,

thanks a lot for the comments and help. Using BenchmarkTools and defining the matrix before the loop reduces the computation time by about 1 s, but it's still roughly 3 times slower than numpy + numba.

The code I used now is the following:

using BenchmarkTools
const mat = rand(3, 3) + im * rand(3, 3)
@btime for i = 1:100000 eig(mat) end;

The output is then:
2.132 s (3800000 allocations: 289.92 MiB)


#8

You don’t need the loop, since @btime takes care of that for you. You should also interpolate variables into a @btime expression, as in @btime eig($mat).
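Concretely, the two forms look like this (a sketch; requires BenchmarkTools.jl, and uses `eigen`, the later name for 0.6's `eig`):

```julia
using BenchmarkTools, LinearAlgebra

const mat = rand(3, 3) + im * rand(3, 3)

@btime eigen(mat)   # `mat` is accessed as a global inside the benchmark
@btime eigen($mat)  # interpolated: benchmarks only the call itself
```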


#9

Thanks. Sorry, I’m still a beginner in Julia. If I remove the loop I end up with about 20 µs per call, but for 10^5 calls that is still 2 s, so the improvement is small. The main result, that it's much slower than numpy + numba, doesn't change.


#10

Note that this creates several unnecessary temporary arrays. You should do this instead:

rand(Complex{Float64}, 3, 3)

which creates just one array.
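The two constructions give the same element type, but the first one allocates several intermediate arrays along the way (a small sketch):

```julia
# Several arrays: two real rand(3, 3) arrays, the complex
# temporary im * rand(3, 3), and the final sum.
A1 = rand(3, 3) + im * rand(3, 3)

# A single complex array, filled directly.
A2 = rand(Complex{Float64}, 3, 3)

@assert eltype(A1) == eltype(A2) == Complex{Float64}
```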

Yes, the speedups from better benchmarking were not as dramatic as expected. Can you provide the Python code you used for benchmarking? I am seeing a factor-of-2 difference. Perhaps an issue should be opened.

Python:

In [12]: A = np.random.rand(3,3) + 1j*np.random.rand(3,3)  # I don't know how to do this correctly in Python! :/

In [13]: %timeit np.linalg.eig(A)
30.4 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@jit
def myeig(A):
    return np.linalg.eig(A)

In [27]: %timeit myeig(A)
10.7 µs ± 25.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Julia

julia> A = rand(Complex{Float64}, 3, 3);

julia> @btime eigen($A);
  20.381 μs (35 allocations: 8.80 KiB)

#11

Are you using Python and Numba with MKL or with OpenBLAS?


#12

Are you asking me or @itzui? I’m not sure; my np.__config__ seems to indicate MKL. It’s whatever ships with Anaconda / Python 3.6.

I guess this is all up to the BLAS library then. But I am surprised how slow this all is. Even for 2×2 matrices it runs in 16 µs, while I would have expected something like 16 ns. There must be tremendous overhead due to BLAS.

So I tried StaticArrays, and to my surprise it was no faster. (In fact, StaticArrays throws an error for non-Hermitian matrices! But even for Hermitian ones it was very slow.)


#13

My code for testing in Python was the following (I know that it is optimized):

from numba import njit, prange
import numpy as np
import time

@njit
def test():
    for i in prange(0, 100000):
        A = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
        np.linalg.eig(A)
    return

As a result I got:
773 ms ± 8.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.__config__ tells me that MKL is used. But this should also be true for Julia, if I interpreted the versioninfo() output (BLAS: mkl_rt) correctly.


#14

Static arrays don’t have optimized implementations of complex Hermitian eigenproblems and fall back to regular eig: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/eigen.jl. It possibly shouldn't be too hard to adapt the existing code, though. LAPACK and BLAS libraries are pretty slow for small matrices (newer MKL versions have a special “fast” mode for such operations, but I’m not sure how fast it is).


#15

One of the issues with eigen is that it is type unstable, which is causing overhead for tiny matrices.

julia> @benchmark LinearAlgebra.LAPACK.geevx!('B', 'N', 'V', 'N', copy($A))
BenchmarkTools.Trial: 
  memory estimate:  7.59 KiB
  allocs estimate:  13
  --------------
  minimum time:     6.627 μs (0.00% GC)
  median time:      6.833 μs (0.00% GC)

julia> @benchmark eigen($A)
BenchmarkTools.Trial: 
  memory estimate:  8.98 KiB
  allocs estimate:  38
  --------------
  minimum time:     16.252 μs (0.00% GC)
  median time:      17.885 μs (0.00% GC)

Type instability stems from the fact that it sometimes uses real arithmetic when the eigenvalues and eigenvectors are real, and complex arithmetic otherwise.

Edit: I just realized that the above might look unintuitive when dealing with complex-valued matrices, so here’s an example. When the matrix is Hermitian (A' == A) the eigenvalues are real, and so is the vector that Julia returns; for non-Hermitian matrices they are complex:

julia> eigen(begin
         A = rand(ComplexF64, 3, 3)
         A + A'
       end).values |> typeof
Array{Float64,1}

julia> eigen(rand(ComplexF64, 3, 3)).values |> typeof
Array{Complex{Float64},1}
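One way to sidestep the instability (a sketch, not an official API — `eigen_complex` is a hypothetical helper name) is a small wrapper that always converts the factorization to complex, so the return type no longer depends on the matrix values:

```julia
using LinearAlgebra

# Hypothetical helper: always return complex eigenvalues and eigenvectors,
# regardless of whether the input happens to be Hermitian.
function eigen_complex(A::AbstractMatrix)
    F = eigen(A)
    return complex.(F.values), complex.(F.vectors)
end

A = rand(ComplexF64, 3, 3)
vals, vecs = eigen_complex(A + A')  # Hermitian input, still complex output
```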

#16

Thanks for the comment. Does it mean that for small matrices it is better to test the type of the matrix and call the LAPACK routine directly?