SVD 2x slower than in Matlab, and how to get best performance on Windows 10

Hello,

Recently I found that Julia's svd is slower than Matlab's. This was discussed once in https://github.com/JuliaLang/julia/issues/3521, but after reading it I am still not quite sure what exactly is happening here. Could someone please help me? Thanks in advance.

The code is

using LinearAlgebra, BenchmarkTools

A = rand(1000,1000);
@benchmark svd(A)

output is

BenchmarkTools.Trial:
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     449.142 ms (0.90% GC)
  median time:      580.609 ms (3.14% GC)
  mean time:        597.990 ms (4.71% GC)
  maximum time:     784.079 ms (2.63% GC)
  --------------
  samples:          9
  evals/sample:     1

and for Matlab is

tic;svd(A);toc

output is

Elapsed time is 0.286179 seconds.

I am using Windows 10, and Julia is the binary downloaded from the website. It seems the BLAS/LAPACK library for Julia is libopenblas64_, while Matlab uses MKL. Could this be the reason?
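In case it helps others check this, here is a quick way to confirm which BLAS/LAPACK backend Julia loaded and how many threads it uses (a sketch assuming Julia ≥ 1.7, where BLAS.get_config() is available):

```julia
using LinearAlgebra

# Report the loaded BLAS/LAPACK libraries (libopenblas64_ by default;
# the list changes if a different backend such as MKL is loaded).
# Requires Julia >= 1.7.
println(BLAS.get_config())

# Number of threads the BLAS library is currently using:
println(BLAS.get_num_threads())
```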

Has anyone been able to install Julia on Windows 10 so that it has performance equivalent to Matlab's? If so, how did you do it? Thanks very much.

Sincerely
BaiCai

1 Like

It is likely the reason.

You can build Julia with MKL, but it is a bit tricky. Check out GitHub - JuliaLinearAlgebra/MKL.jl: Intel MKL linear algebra backend for Julia (but read the caveats).
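For reference, current versions of MKL.jl (on Julia ≥ 1.7, where the backend can be swapped at runtime via libblastrampoline) no longer require rebuilding Julia at all. A sketch of that route:

```julia
using Pkg
Pkg.add("MKL")      # downloads MKL as an artifact; no system-wide MKL install needed

using MKL           # swaps the BLAS/LAPACK backend from OpenBLAS to MKL
using LinearAlgebra
BLAS.get_config()   # should now list MKL libraries
```

Note the order matters: `using MKL` should come before heavy linear algebra work so the backend is swapped first.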

2 Likes

It’s a little embarrassing, but I don’t know how to install it — there are no installation instructions. And since I already have MKL installed on my computer, reinstalling it shouldn’t be necessary. Thanks anyway; I’ll try to figure it out.

It’s very difficult to build with MKL, on a Mac at least. I do it, but it requires

  1. Installing MKL
  2. Setting Make.user
  3. Running make until it errors out
  4. Making symbolic links as explained in https://github.com/JuliaLang/julia/issues/15133
  5. Rerunning make

That said, it’s worth it for the speedup. However, you can’t use PyPlot.jl with MKL, so for plotting it helps to keep both versions around.

1 Like

For me, on Linux, it is super simple to link against MKL. Just follow the instructions on Julia’s github page.

Also, FWIW, it is possible to have MKL and use PyPlot.jl. See Segfault with Julia MKL build (library conflict) · Issue #443 · JuliaPy/PyCall.jl · GitHub.

2 Likes

What’s wrong with MKL.jl way of doing it? Is it not Mac compatible?

I haven’t tried MKL.jl, but the warning about slow REPL is not encouraging.

In my opinion, the proper solution for new users who want MATLAB-like performance in Julia is bringing back the MKL flavor of JuliaPro.

Preferably with all the tweaks of the latest MKL versions (handling small matrices, repeating the same operation many times, etc.) that have been talked about.

In real-world use it will make a significant difference compared to the OpenBLAS used now.

1 Like

Update:

I did more benchmarks on MKL and OpenBLAS, and more comparisons with Matlab. Here are the results.

MKL:
Single thread (set with BLAS.set_num_threads(1)):

julia> A = rand(1000,1000);

julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     255.843 ms (0.10% GC)
  median time:      275.403 ms (0.09% GC)
  mean time:        270.219 ms (1.18% GC)
  maximum time:     298.626 ms (7.98% GC)
  --------------
  samples:          19
  evals/sample:     1

6 threads (BLAS.set_num_threads(6)):

@benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     108.813 ms (4.38% GC)
  median time:      111.913 ms (0.24% GC)
  mean time:        114.318 ms (1.39% GC)
  maximum time:     154.178 ms (0.23% GC)
  --------------
  samples:          44
  evals/sample:     1

OpenBLAS:


Single thread:

julia> A = rand(1000,1000);

julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     322.695 ms (0.19% GC)
  median time:      356.963 ms (1.86% GC)
  mean time:        353.865 ms (3.56% GC)
  maximum time:     385.039 ms (8.32% GC)
  --------------
  samples:          15
  evals/sample:     1

6 threads:

julia> A = rand(1000,1000);

julia> @benchmark svd(A)
BenchmarkTools.Trial: 
  memory estimate:  45.90 MiB
  allocs estimate:  13
  --------------
  minimum time:     182.592 ms (3.35% GC)
  median time:      202.389 ms (3.40% GC)
  mean time:        212.733 ms (5.51% GC)
  maximum time:     251.271 ms (3.24% GC)
  --------------
  samples:          24
  evals/sample:     1

It seems MKL is indeed faster than OpenBLAS.

However, they are still not comparable with Matlab. For Matlab, the results are:

Single thread (by running matlab -singleCompThread):

>>A = rand(1000,1000);
>>tic; svd(A);toc
Elapsed time is 0.154609 seconds.

6 threads (running matlab directly):

>> A = rand(1000,1000);
>> tic; svd(A); toc
Elapsed time is 0.074904 seconds.

It turns out Matlab still performs better than Julia + MKL. The tests were done on Ubuntu, and my Julia was compiled from source with MKL. I am not quite sure why Julia is still slower; I would be super happy if someone could give me some hints. Maybe it’s the way Julia deals with threads?

1 Like

In Matlab, svd computes only the singular values S when called in an expression (nargout=1). In Julia this is the svdvals() function, which is much faster. If you want to compare like with like, I think you need svd(A) in Julia (which computes the “thin” SVD containing the U, S, V factors in a factorization object) and [U,S,V] = svd(A,'econ') in Matlab.
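To see the difference, the like-for-like comparison in Julia might look as follows (timings will of course vary by machine):

```julia
using LinearAlgebra, BenchmarkTools

A = rand(1000, 1000)

# Equivalent of Matlab's bare `svd(A)` in an expression (nargout = 1):
# only the singular values are computed, which is much cheaper.
@btime svdvals($A);

# Equivalent of [U,S,V] = svd(A,'econ') in Matlab: the full thin SVD,
# returned as a factorization object with fields U, S, and Vt.
@btime svd($A);
```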

Edit: I thought I should check what Matlab does when svd is called without outputs (presumably nargout=0). It seems the implicit ans variable is involved in a nontrivial way, and the user can set it or not depending on logic in their function :man_facepalming:. To quote:

If you check for a nargout value of 0 within a function and you specify the value of the output, MATLAB populates ans. However, if you check nargout and do not specify a value for the output, then MATLAB does not modify ans.

7 Likes

Indeed on my machine:

numElements = 1000;

mA = randn(numElements, numElements);
hF = @() svd(mA, 'econ');
hG = @() svd(mA);

timeit(hF, 3)
timeit(hG, 3)
timeit(hG)

Yields:

ans =

    0.1601

ans =

    0.1608

ans =

    0.0781

@Baicai_Xiao, could you try my code above in your MATLAB?

Regarding MKL, these videos are a must for proper integration:

I wonder if this is the way Julia integrates MKL.

I ran your code on my machine and got similar results:

ans =

    0.1109

ans =

    0.1149

ans =

    0.0555

@c42f You are right, it is because I am not comparing the same thing. I am not a heavy matlab user, I didn’t realize the difference before. Thank you for pointing this out.

3 Likes

Thank you for your post; I ran into the same situation. So the final solution is to use the MKL package. I tested it on my own Windows computer, and for 1000×1000 matrices, Matlab’s and Julia’s SVD times are about the same. But when I tried the SVD of a 10000×10000 matrix, I found that Matlab and Julia take about the same time in the single-threaded case, while Julia is still significantly slower than Matlab in the multithreaded case.
6 threads:

#matlab
>> a=rand(10000,10000);
>> tic;[u,s,v]=svd(a);toc
Elapsed time is 141.392837 seconds.
#julia
using MKL
using LinearAlgebra
using BenchmarkTools

a=rand(10000,10000)
@btime u,s,v=svd(a)
296.304 s (17 allocations: 4.47 GiB)

I don’t know what’s going on. Can someone help me with this?

I checked on my PC. I can confirm the issue. I noticed that the CPU temperature when running the Julia code reached 70 °C, when running the Matlab code it reached 100 °C. In both cases the CPU load was 1600% (16 cores).

To me this looks as if Matlab is using different CPU instructions than Julia + MKL.

I have a Ryzen 7950x CPU. Which CPU do you have? Perhaps MKL does not support AMD CPUs well?

2 Likes

I’m almost certain that this is an insignificant blip next to svd’s runtimes, but generally you should interpolate global variables and constants (u,s,v = @btime svd($a)) in BenchmarkTools to leave out compiler shenanigans. I moved the assignments outside because they disappear inside the benchmark expression, which doesn’t seem to be your intent based on the MATLAB benchmark:

julia> @btime a = rand($1)
  23.015 ns (2 allocations: 64 bytes)
1-element Vector{Float64}:
 0.587895628234269

julia> a
ERROR: UndefVarError: `a` not defined in `Main`
1 Like

I have an Intel(R) Core™ i7-8700 CPU, not an AMD CPU. So it seems the problem is not caused by the CPU type.

Sorry, I can’t follow your idea; in other words, I don’t understand what this has to do with interpolation.

$ is the interpolation operator for strings and expressions. Here, $a interpolates the value of a into the benchmarking expression. You can read about it here: Manual · BenchmarkTools.jl

The most common use of $ is perhaps in string interpolation: Strings · The Julia Language

Example:

julia> str = "World"
"World"

julia> "Hello, $(str)!"
"Hello, World!"
2 Likes

Also worth pointing out that $-interpolation is not a general feature of macro calls, because ordinary source code doesn’t contain $:

julia> macro pass(blah) blah end
@pass (macro with 1 method)

julia> @pass 1+1
2

julia> @pass $1+$1
ERROR: syntax: "$" expression outside quote around REPL[17]:1

Macros need to implement it specially for their context. @eval emulates $-interpolation in quoted expressions passed to eval, while BenchmarkTools.quasiquote! collects $-interpolated expressions to be incorporated into the benchmark functions.
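A small sketch of the @eval case, for comparison with the @btime usage above:

```julia
# @eval splices $-interpolated values into the quoted expression
# before evaluating it, mimicking quote/eval interpolation:
x = 21
@eval y = $x * 2
@show y   # y = 42
```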

But that’s not why the runtime is slower than Matlab’s, is it?