SVD 2x slower than in Matlab and how to get best performance on Windows 10


#1

Hello,

Recently, I found that Julia's svd is slower than Matlab's. This was discussed once in https://github.com/JuliaLang/julia/issues/3521, but after reading that thread I am still not quite sure what exactly is happening here. Could someone please help me? Thanks in advance.

The code is

A = rand(1000,1000);
@benchmark svd(A)

output is

BenchmarkTools.Trial:
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     449.142 ms (0.90% GC)
  median time:      580.609 ms (3.14% GC)
  mean time:        597.990 ms (4.71% GC)
  maximum time:     784.079 ms (2.63% GC)
  --------------
  samples:          9
  evals/sample:     1

and for Matlab is

tic;svd(A);toc

output is

Elapsed time is 0.286179 seconds.

I am using Windows 10, and Julia is the official binary downloaded from the website. It seems the BLAS/LAPACK library for Julia is libopenblas64_, while Matlab uses MKL. Could this be the reason?
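A quick way to check which BLAS backend a Julia build is linked against (a sketch, assuming Julia 1.7+ where BLAS.get_config() exists; older versions used BLAS.vendor() instead, which returns e.g. :openblas64 on the stock binaries):

```julia
using LinearAlgebra

# List the BLAS/LAPACK libraries this Julia session has loaded.
# The official binaries typically show an OpenBLAS library here;
# an MKL-backed build would list MKL instead.
cfg = BLAS.get_config()
for lib in cfg.loaded_libs
    println(lib.libname)
end
```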

Has anyone been able to install Julia on Windows 10 so that it has performance equivalent to Matlab's? If so, how did you do it? Thanks very much.

Sincerely
BaiCai


#2

That is likely the reason.

You can build Julia with MKL, but it is a bit tricky. Check out https://github.com/JuliaComputing/MKL.jl (but read the caveats).
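For reference, installing an unregistered package like that generally looks something like the following; this is a hypothetical sketch only, and the exact steps (and the build-time caveats) are in the MKL.jl README:

```julia
# Hypothetical sketch -- follow the MKL.jl README for the actual steps.
using Pkg
Pkg.add(PackageSpec(url="https://github.com/JuliaComputing/MKL.jl"))
# Building the package is where the BLAS swap happens, which is why
# the README's caveats apply.
Pkg.build("MKL")
```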


#4

It’s a little bit embarrassing, but I don’t know how to install it. There are no instructions on how to install it. And since I already have MKL installed on my computer, reinstalling MKL should not be necessary. Thanks anyway; I will try to figure it out.


#5

It’s very difficult to build with MKL, on a Mac at least. I do it, but it requires:

  1. Installing MKL
  2. Setting Make.user
  3. Running make until it errors out
  4. Making symbolic links as explained in https://github.com/JuliaLang/julia/issues/15133
  5. Rerunning make

That said, it’s worth it for the speedup. However, you can’t use PyPlot.jl with MKL, so for plotting it helps to keep both versions around.


#6

For me, on Linux, it is super simple to link against MKL. Just follow the instructions on Julia’s GitHub page.

Also, FWIW, it is possible to have MKL and use PyPlot.jl. See https://github.com/JuliaPy/PyCall.jl/issues/443#issuecomment-405632507.


#7

What’s wrong with the MKL.jl way of doing it? Is it not Mac-compatible?


#8

I haven’t tried MKL.jl, but the warning about the slow REPL is not encouraging.


#9

In my opinion, the proper solution for new users who want MATLAB-like performance in Julia is bringing back the MKL flavor of JuliaPro.

Preferably with all the tweaks in the latest versions of MKL that have been talked about (handling small matrices, repeating the same operation many times, etc.).

In the real world it would make a significant difference compared to the OpenBLAS used now.


#10

Update:

I did more benchmarks with MKL and OpenBLAS, and more comparisons with Matlab. Here are the results.

MKL:

single thread (set via BLAS.set_num_threads(1)):

julia> A = rand(1000,1000)
julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     255.843 ms (0.10% GC)
  median time:      275.403 ms (0.09% GC)
  mean time:        270.219 ms (1.18% GC)
  maximum time:     298.626 ms (7.98% GC)
  --------------
  samples:          19
  evals/sample:     1

6 threads (BLAS.set_num_threads(6)):

@benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     108.813 ms (4.38% GC)
  median time:      111.913 ms (0.24% GC)
  mean time:        114.318 ms (1.39% GC)
  maximum time:     154.178 ms (0.23% GC)
  --------------
  samples:          44
  evals/sample:     1

OpenBLAS:

single thread:

julia> A = rand(1000,1000)
julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     322.695 ms (0.19% GC)
  median time:      356.963 ms (1.86% GC)
  mean time:        353.865 ms (3.56% GC)
  maximum time:     385.039 ms (8.32% GC)
  --------------
  samples:          15
  evals/sample:     1

6 threads:

julia> A = rand(1000,1000)
julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     182.592 ms (3.35% GC)
  median time:      202.389 ms (3.40% GC)
  mean time:        212.733 ms (5.51% GC)
  maximum time:     251.271 ms (3.24% GC)
  --------------
  samples:          24
  evals/sample:     1

It seems MKL is indeed faster than OpenBLAS.

However, they are still not comparable with Matlab. For Matlab, the results are:

single thread (by running matlab -singleCompThread):

>>A = rand(1000,1000);
>>tic; svd(A);toc
Elapsed time is 0.154609 seconds.

6 threads (running matlab directly):

>> A = rand(1000,1000);
>> tic; svd(A); toc
Elapsed time is 0.074904 seconds.

It turns out Matlab still performs better than Julia + MKL. The tests were done on Ubuntu, and my Julia was compiled from source with MKL. I am not quite sure why Julia is still slower; I would be very happy if someone could give me some hints. Maybe it is the way Julia deals with threads?


#11

In Matlab, svd computes only the singular values S when called in an expression (nargout=1). In Julia this is the svdvals() function, which is much faster. If you want to compare like with like, I think you need svd(A) in Julia (which computes the “thin” SVD containing the U, S, V factors in a factorization object) and [U,S,V] = svd(A,'econ') in Matlab.
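The distinction can be seen directly in Julia: svdvals skips computing U and V, so it is the fair comparison to Matlab's one-output svd(A), while svd(A) returns the full factorization object. A minimal check:

```julia
using LinearAlgebra

A = rand(1000, 1000)

S = svdvals(A)   # singular values only (the cheaper computation)
F = svd(A)       # thin SVD: factorization object with F.U, F.S, F.V

# Both routes produce the same singular values.
println(S ≈ F.S)
```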

Edit: I thought I should check what Matlab does when called without outputs (presumably nargout=0). It seems the implicit ans variable is involved in a nontrivial way, and the user can set or not set it depending on logic in their function :man_facepalming:. To quote the documentation:

If you check for a nargout value of 0 within a function and you specify the value of the output, MATLAB populates ans. However, if you check nargout and do not specify a value for the output, then MATLAB does not modify ans.


#12

Indeed on my machine:

numElements = 1000;

mA = randn(numElements, numElements);
hF = @() svd(mA, 'econ');
hG = @() svd(mA);

timeit(hF, 3)
timeit(hG, 3)
timeit(hG)

Yields:

ans =

    0.1601

ans =

    0.1608

ans =

    0.0781

@Baicai_Xiao, Could you try my code above on your MATLAB?

Regarding MKL, these are “must watch” videos for proper integration:

I wonder if this is the way Julia integrates MKL.


#13

I ran your code on my machine and got similar results:

ans =

    0.1109

ans =

    0.1149

ans =

    0.0555

@Chris_Foster You are right; it is because I was not comparing the same thing. I am not a heavy Matlab user, and I didn’t realize the difference before. Thank you for pointing this out.