mrcinv
March 28, 2018, 10:24pm
1
I noticed that QR decomposition is 2x slower in julia than octave. For example on my computer I get
julia> A = rand(5000, 5000); @time qr(A);
27.075522 seconds (37 allocations: 765.687 MiB, 0.83% gc time)
and
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 8.35741 seconds.
>>
The results are more or less the same if repeated more times.
Anyone has an idea why is that? I thought both julia and octave use BLAS/Lapack, so both methods should take roughly the same amount of time.
1 Like
What does your versioninfo()
say? Did you rebuild the system image if you used the generic binary?
I suspect they may be doing different things. I don’t know what qr
method Octave calls when no output arguments are provided, but my guess is that it’s just doing the 1st phase (computing the Householder reflectors and upper-triangular part) and not the 2nd (building Q). This would roughly match qrfact
in Julia. On my machine, I get:
julia> @time qrfact(A);
5.906331 seconds (25.99 k allocations: 194.912 MiB, 2.13% gc time)
julia> @time qr(A);
16.202900 seconds (93.73 k allocations: 770.718 MiB, 1.62% gc time)
2 Likes
julia> A = rand(5000, 5000); @time qr(A);
14.143329 seconds (294.06 k allocations: 781.888 MiB, 0.90% gc time)
julia> versioninfo()
Julia Version 0.7.0-DEV.4690
Commit 78c7d87369* (2018-03-23 22:25 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 1
julia> A = rand(5000, 5000); @time qrfact(A);
5.583176 seconds (2.32 k allocations: 193.608 MiB, 1.85% gc time)
julia>
versus
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 4.189508 seconds.
>> version
ans =
'9.4.0.813654 (R2018a)'
mrcinv
March 28, 2018, 11:06pm
5
That would explain the difference. If you call the function qr in octave with both return values [Q, R] = qr(A)
, it takes more time, since it has to generate Q
>> A = rand(5000); tic; [Q,R] = qr(A); toc
Elapsed time is 17.8421 seconds.
>> tic; qr(A); toc
Elapsed time is 9.22192 seconds.
Not so for Matlab:
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 4.189508 seconds.
>> version
ans =
'9.4.0.813654 (R2018a)'
>> tic; [q,r]=qr(A); toc
Elapsed time is 5.785541 seconds.
PetrKryslUCSD:
Not so for Matlab:
MATLAB uses MKL. For an out of the box comparison there it would be interesting to test JuliaPro+MKL against MATLAB.
3 Likes
Still, there is a big difference in JuliaPro (MKL):
julia> A = rand(5000, 5000); @time qr(A);
6.561612 seconds (37 allocations: 765.687 MiB, 2.20% gc time)
julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
WORD_SIZE: 64
BLAS: mkl_rt
LAPACK: mkl_rt
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
and MATLAB:
>> tic, A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 2.760503 seconds.
Have you tried using qrfact instead of qr?
In my case the performance of qrfact(A) is approx. 2.5X faster than qr(A), for A = rand(5000, 5000).
mbaz
April 13, 2018, 2:39pm
10
I get similar results:
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY HASWELL)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> using BenchmarkTools
julia> A = rand(5000, 5000);
julia> @btime qr($A)
9.746 s (33 allocations: 765.69 MiB)
julia> @btime qrfact($A)
4.266 s (17 allocations: 193.48 MiB)
Julia is running in Virtualbox. Native windows Matlab 2018a:
>> tic; A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 3.738918 seconds.
Use the same computer. Timings mean nothing when the same computer isn’t used. VirtualBox is a free shared cloud resource: I would expect it to not have the best performance.
mbaz
April 13, 2018, 2:58pm
12
It is the same computer. Virtualbox is running locally.
1 Like
Oh my bad. For some reason I read JuliaBox…
mbaz
April 13, 2018, 3:01pm
14
Heh, that would indeed be a meaningless comparison
What about Julia(Pro) with MKL? MATLAB is MKL.
mbaz
April 13, 2018, 3:18pm
16
I hadn’t realized that there’s a free version of JuliaPro… I’ll need to try it out. If nobody does it first, I’ll post my results in a few days.
Pier
April 13, 2018, 5:36pm
17
With MKL under Windows:
versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
WORD_SIZE: 64
BLAS: mkl_rt
LAPACK: mkl_rt
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, westmere)
Julia:
using BenchmarkTools
@btime qr($A)
14.230 s (33 allocations: 765.69 MiB)
@btime qrfact($A)
5.645 s (17 allocations: 193.48 MiB)
Matlab:
tic; A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 8.418957 seconds.
In my experience the QR routines benefit from tuning. In particular, for sizes like 5000x5000, hyperthreading is counterproductive and the default block size is too small. So setting BLAS.set_num_threads(nr_of_physical_cores)
and calling the LAPACK routines to manually reset the block size (default is 36):
julia> A=rand(5000,5000);
julia> A1=copy(A); @time F=Base.LinAlg.LAPACK.geqrt!(A1,256);
1.460697 seconds (14 allocations: 19.532 MiB)
julia> FF=Base.LinAlg.QRCompactWY(F...);
# this finishes computing Q:
julia> B=eye(A); @time Base.LinAlg.LAPACK.gemqrt!('L','N',FF.factors,FF.T,B);
2.057016 seconds (7 allocations: 190.735 MiB, 0.56% gc time)
julia> @time x=qr(A); # for comparison
7.118840 seconds (35 allocations: 765.687 MiB, 2.22% gc time)
julia> vecnorm(x[1]-B) # consistency check
5.419168097206542e-13
This is about as fast as Matlab on a comparable machine. (Aside: these times are from a quiet system; I saw wild variation when other programs were minimally active.)
Here I am using Julia 0.6.2 with OpenBLAS.
4 Likes