mrcinv
March 28, 2018, 10:24pm
1
I noticed that QR decomposition is 2x slower in julia than octave. For example on my computer I get

```
julia> A = rand(5000, 5000); @time qr(A);
27.075522 seconds (37 allocations: 765.687 MiB, 0.83% gc time)
```

and

```
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 8.35741 seconds.
>>
```

The results are more or less the same if repeated more times.

Anyone has an idea why is that? I thought both julia and octave use BLAS/Lapack, so both methods should take roughly the same amount of time.

1 Like

What does your `versioninfo()`

say? Did you rebuild the system image if you used the generic binary?

I suspect they may be doing different things. I don’t know what `qr`

method Octave calls when no output arguments are provided, but my guess is that it’s just doing the 1st phase (computing the Householder reflectors and upper-triangular part) and not the 2nd (building Q). This would roughly match `qrfact`

in Julia. On my machine, I get:

```
julia> @time qrfact(A);
5.906331 seconds (25.99 k allocations: 194.912 MiB, 2.13% gc time)
julia> @time qr(A);
16.202900 seconds (93.73 k allocations: 770.718 MiB, 1.62% gc time)
```

2 Likes

```
julia> A = rand(5000, 5000); @time qr(A);
14.143329 seconds (294.06 k allocations: 781.888 MiB, 0.90% gc time)
julia> versioninfo()
Julia Version 0.7.0-DEV.4690
Commit 78c7d87369* (2018-03-23 22:25 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 1
julia> A = rand(5000, 5000); @time qrfact(A);
5.583176 seconds (2.32 k allocations: 193.608 MiB, 1.85% gc time)
julia>
```

versus

```
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 4.189508 seconds.
>> version
ans =
'9.4.0.813654 (R2018a)'
```

mrcinv
March 28, 2018, 11:06pm
5
That would explain the difference. If you call the function qr in octave with both return values `[Q, R] = qr(A)`

, it takes more time, since it has to generate `Q`

```
>> A = rand(5000); tic; [Q,R] = qr(A); toc
Elapsed time is 17.8421 seconds.
>> tic; qr(A); toc
Elapsed time is 9.22192 seconds.
```

Not so for Matlab:

```
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 4.189508 seconds.
>> version
ans =
'9.4.0.813654 (R2018a)'
>> tic; [q,r]=qr(A); toc
Elapsed time is 5.785541 seconds.
```

PetrKryslUCSD:

Not so for Matlab:

MATLAB uses MKL. For an out of the box comparison there it would be interesting to test JuliaPro+MKL against MATLAB.

3 Likes

Still, there is a big difference in JuliaPro (MKL):

```
julia> A = rand(5000, 5000); @time qr(A);
6.561612 seconds (37 allocations: 765.687 MiB, 2.20% gc time)
julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
WORD_SIZE: 64
BLAS: mkl_rt
LAPACK: mkl_rt
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
```

and MATLAB:

```
>> tic, A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 2.760503 seconds.
```

Have you tried using qrfact instead of qr?
In my case the performance of qrfact(A) is approx. 2.5X faster than qr(A), for A = rand(5000, 5000).

mbaz
April 13, 2018, 2:39pm
10
I get similar results:

```
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY HASWELL)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> using BenchmarkTools
julia> A = rand(5000, 5000);
julia> @btime qr($A)
9.746 s (33 allocations: 765.69 MiB)
julia> @btime qrfact($A)
4.266 s (17 allocations: 193.48 MiB)
```

Julia is running in Virtualbox. Native windows Matlab 2018a:

```
>> tic; A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 3.738918 seconds.
```

Use the same computer. Timings mean nothing when the same computer isn’t used. VirtualBox is a free shared cloud resource: I would expect it to not have the best performance.

mbaz
April 13, 2018, 2:58pm
12
It is the same computer. Virtualbox is running locally.

1 Like

Oh my bad. For some reason I read JuliaBox…

mbaz
April 13, 2018, 3:01pm
14
Heh, that would indeed be a meaningless comparison

What about Julia(Pro) with MKL? MATLAB is MKL.

mbaz
April 13, 2018, 3:18pm
16
I hadn’t realized that there’s a free version of JuliaPro… I’ll need to try it out. If nobody does it first, I’ll post my results in a few days.

Pier
April 13, 2018, 5:36pm
17
With MKL under Windows:

```
versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
WORD_SIZE: 64
BLAS: mkl_rt
LAPACK: mkl_rt
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, westmere)
```

Julia:

```
using BenchmarkTools
@btime qr($A)
14.230 s (33 allocations: 765.69 MiB)
@btime qrfact($A)
5.645 s (17 allocations: 193.48 MiB)
```

Matlab:

```
tic; A=rand(5000, 5000); [Q,R] = qr(A); toc
Elapsed time is 8.418957 seconds.
```

In my experience the QR routines benefit from tuning. In particular, for sizes like 5000x5000, hyperthreading is counterproductive and the default block size is too small. So setting `BLAS.set_num_threads(nr_of_physical_cores)`

and calling the LAPACK routines to manually reset the block size (default is 36):

```
julia> A=rand(5000,5000);
julia> A1=copy(A); @time F=Base.LinAlg.LAPACK.geqrt!(A1,256);
1.460697 seconds (14 allocations: 19.532 MiB)
julia> FF=Base.LinAlg.QRCompactWY(F...);
# this finishes computing Q:
julia> B=eye(A); @time Base.LinAlg.LAPACK.gemqrt!('L','N',FF.factors,FF.T,B);
2.057016 seconds (7 allocations: 190.735 MiB, 0.56% gc time)
julia> @time x=qr(A); # for comparison
7.118840 seconds (35 allocations: 765.687 MiB, 2.22% gc time)
julia> vecnorm(x[1]-B) # consistency check
5.419168097206542e-13
```

This is about as fast as Matlab on a comparable machine. (Aside: these times are from a quiet system; I saw wild variation when other programs were minimally active.)

Here I am using Julia 0.6.2 with OpenBLAS.

4 Likes