I suspect they may be doing different things. I don't know which QR method Octave calls when no output arguments are requested, but my guess is that it performs only the first phase (computing the Householder reflectors and the upper-triangular factor R) and not the second (building Q explicitly). That would roughly match qrfact in Julia. On my machine, I get:
That would explain the difference. If you call qr in Octave with both return values, [Q, R] = qr(A), it takes longer, since it also has to form Q:
>> A = rand(5000); tic; [Q,R] = qr(A); toc
Elapsed time is 17.8421 seconds.
>> tic; qr(A); toc
Elapsed time is 9.22192 seconds.
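The same two-phase split can be seen in Julia. This is a minimal sketch, assuming Julia 1.x, where `qr` (the successor of `qrfact`) returns a compact factorization object that holds the Householder reflectors and R. Building a dense Q from that object is a separate step, and the sketch times it on its own. The 2000×2000 size is an arbitrary choice to keep the run short, and the absolute timings will depend on the machine.

```julia
using LinearAlgebra

A = rand(2000, 2000)

# Warm-up on a tiny matrix so JIT compilation is not included in the timings.
Matrix(qr(rand(10, 10)).Q)

# Phase 1: Householder reflectors + upper-triangular R, stored compactly.
# No explicit Q is formed here.
t0 = time_ns()
F = qr(A)
t_factor = (time_ns() - t0) / 1e9

# Phase 2: apply the stored reflectors to build the dense orthogonal Q.
t0 = time_ns()
Q = Matrix(F.Q)
t_formQ = (time_ns() - t0) / 1e9

println("factorization only: $t_factor s, forming Q: $t_formQ s")

# Sanity checks: Q*R reproduces A and Q has orthonormal columns.
R = F.R
@assert norm(Q * R - A) / norm(A) < 1e-10
@assert norm(Q' * Q - I) < 1e-10
```

Comparing Octave's `qr(A)` against Julia's `qr(A)` alone therefore measures like against like. Comparing `[Q,R] = qr(A)` requires adding the second step.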
>> A=rand(5000, 5000);
>> tic; qr(A); toc
Elapsed time is 4.189508 seconds.
>> version
ans =
'9.4.0.813654 (R2018a)'
>> tic; [q,r]=qr(A); toc
Elapsed time is 5.785541 seconds.
Use the same computer: timings from different machines can't be compared. JuliaBox is a free, shared cloud resource, so I wouldn't expect it to deliver the best performance.
I hadn’t realized that there’s a free version of JuliaPro… I’ll need to try it out. If nobody does it first, I’ll post my results in a few days.
In my experience, the QR routines benefit from tuning. In particular, for sizes like 5000×5000, hyperthreading is counterproductive and the default block size is too small. So I set BLAS.set_num_threads(nr_of_physical_cores) and call the LAPACK routine directly to override the block size (the default is 36):
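Here is a minimal sketch of that kind of tuning, assuming Julia 1.x. The core count of 4 and the block size of 256 are placeholder values and are not taken from the post, so measure a few values on your own hardware. The sketch calls `LAPACK.geqrt!` directly with an explicit block size, then wraps the result in the same factorization type that `qr(A)` returns.

```julia
using LinearAlgebra

# Assumption: set this to the number of *physical* cores on your machine;
# Sys.CPU_THREADS reports logical (hyperthreaded) cores, which is not the same.
nr_of_physical_cores = 4
BLAS.set_num_threads(nr_of_physical_cores)

n  = 2000   # the thread uses 5000; smaller here to keep the run short
nb = 256    # hypothetical block size to experiment with (the default is 36)
A  = rand(n, n)

# LAPACK.geqrt! overwrites its input with the reflectors and R and returns
# (factors, T); the block size must not exceed min(size(A)...).
B = copy(A)
F = LinearAlgebra.QRCompactWY(LAPACK.geqrt!(B, min(nb, n))...)

# The result behaves like any other QR factorization: F.Q and F.R work as usual.
@assert norm(Matrix(F.Q) * F.R - A) / norm(A) < 1e-10
```

Because the wrapped object is an ordinary `QRCompactWY`, code downstream that uses `F.Q`, `F.R`, or `F \ b` does not need to change when you switch block sizes.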
This is about as fast as Matlab on a comparable machine. (Aside: these timings come from a quiet system; I saw wild variation when other programs were even minimally active.)
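These measurements are noisy, so one common approach is to repeat each timing and keep the minimum. A warm-up call beforehand keeps Julia's JIT compilation out of the result. This is a minimal sketch, and `min_time` is a hypothetical helper name. The BenchmarkTools.jl package (`@btime`, `@benchmark`) automates the same idea more carefully.

```julia
using LinearAlgebra

# Hypothetical helper: run f once to compile it, then time `trials` runs and
# return the fastest, which is least affected by background activity.
function min_time(f; trials::Int = 5)
    f()  # warm-up (JIT compilation)
    best = Inf
    for _ in 1:trials
        best = min(best, @elapsed f())
    end
    return best
end

A = rand(2000, 2000)
t = min_time(() -> qr(A))
println("qr(A), best of 5 runs: $t s")
```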