Have you seen this comparison of Python, Julia, R, Matlab, and IDL? There are some interesting tables with results:
Oh yeah, this is the guy we had to email to tell him that Julia is column major, not row major (I think it was @Ismael-VC who ended up emailing him back in January). He says he fixed that, but if you open up even just Problem 3, you'll see:
```julia
function regularTimeStep(u::Matrix)
    (n, m) = size(u)
    err = 0.0
    for i in 2:n-1
        for j in 2:m-1
            tmp = u[i,j]
            u[i,j] = ((u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1])*4.0 +
                      u[i-1,j-1] + u[i-1,j+1] + u[i+1,j-1] + u[i+1,j+1])/20.0
            diff = u[i,j] - tmp
            err = err + diff*diff
        end
    end
    return u, sqrt(err)
end
```
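For comparison, here's a sketch of the same kernel with the loop nest swapped so that the row index `i` runs innermost. Julia arrays are column-major, so `u[i,j]` with `i` varying fastest is the unit-stride access pattern (the function name here is mine, not the author's):

```julia
# Same Gauss-Jacobi update, but with j (columns) outer and i (rows) inner,
# so the inner loop walks down a column with unit stride.
function columnMajorTimeStep(u::Matrix)
    (n, m) = size(u)
    err = 0.0
    @inbounds for j in 2:m-1
        for i in 2:n-1
            tmp = u[i,j]
            u[i,j] = ((u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1])*4.0 +
                      u[i-1,j-1] + u[i-1,j+1] + u[i+1,j-1] + u[i+1,j+1])/20.0
            diff = u[i,j] - tmp
            err = err + diff*diff
        end
    end
    return u, sqrt(err)
end
```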
Hell, it’s not just getting the index ordering wrong, it’s also keeping bounds checks on when they’re unnecessary. Even the simplest problem, Problem 1, is missing it:
```julia
function foo1(A)
    N = size(A, 1)
    for j = 1:N, i = 1:N
        A[i,j,1] = A[i,j,2]
        A[i,j,3] = A[i,j,1]
        A[i,j,2] = A[i,j,3]
    end
end
```
I’m sure anyone who’s been using Julia for more than a week would see that `@inbounds` is missing:
```julia
function foo2(A)
    N = size(A, 1)
    @inbounds for j = 1:N, i = 1:N
        A[i,j,1] = A[i,j,2]
        A[i,j,3] = A[i,j,1]
        A[i,j,2] = A[i,j,3]
    end
end
```
```julia
using BenchmarkTools
A = rand(100, 100, 100)

@btime foo1($A)  # 14.051 μs (0 allocations: 0 bytes)
@btime foo2($A)  # 10.538 μs (0 allocations: 0 bytes)
```
So take it as a lower bound for Julia’s performance, since any user who’s been using it for a week will get better results. Hell, I would take this whole set of benchmarks with a grain of salt if you haven’t checked the code yourself, given that the author didn’t even get row vs. column ordering right before first posting the results (twice! After he read the email he updated the table to say Julia is column-major, but didn’t update the code).
These benchmarks seem pretty flawed to me:
In the first problem, they permute an N×N×3 array in all languages, despite the fact that some are column-major and some are row-major, so the cache characteristics are different. And they benchmark the Julia “vectorized” version in global scope, with out-of-place slicing (as opposed to views).
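For what it’s worth, the allocation difference between out-of-place slicing and views is easy to demonstrate (hypothetical snippet, not the benchmark’s code):

```julia
A = rand(100, 100, 3)

B = A[:, :, 2]          # out-of-place slice: allocates and copies a 100×100 matrix
@views C = A[:, :, 2]   # view: no copy, C aliases A's memory

A[1, 1, 2] = -1.0       # mutate the parent array
B[1, 1]                 # unchanged: B is an independent copy
C[1, 1]                 # -1.0: C sees the mutation
```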
The second benchmark just measures BLAS `dgemm` performance, i.e. all that matters is which BLAS library is linked.
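To illustrate: `*` on dense `Float64` matrices already lowers to that same `dgemm` call, so any language wrapping the same BLAS will post essentially the same number. A quick sanity check using the standard `LinearAlgebra` API:

```julia
using LinearAlgebra

A = rand(300, 300)
B = rand(300, 300)

C1 = A * B                           # dispatches to BLAS dgemm for Float64 matrices
C2 = BLAS.gemm('N', 'N', 1.0, A, B)  # the same underlying call, made explicitly
# C1 ≈ C2, up to floating-point roundoff
```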
In the third benchmark (Gauss-Jacobi solver), the Julia implementation’s loops are in the wrong order for column-major (unlike the Fortran and Numba versions, which both have the correct loop order).
The fourth benchmark includes a plotting library in the timing, which seems weird. (And they “couldn’t build the plotting library” in Julia.) A big part of the benchmark seems to be reading NetCDF files, which all use the same external C library. A bunch of the Julia code seems to be really suboptimal (allocating temporary arrays all over the place), but it’s hard to tell if it matters because it’s really unclear to me what this benchmark is measuring.
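As a sketch of the kind of temporary-allocating pattern described (hypothetical code, not taken from the benchmark), compare an allocating vectorized expression with an in-place fused broadcast:

```julia
x = rand(10^5)
y = rand(10^5)

z = 2 .* x .+ y    # allocates a fresh result array on every call

z2 = similar(x)    # preallocate once...
@. z2 = 2x + y     # ...then fuse the whole expression in place, no temporaries
```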
(This is putting aside the opportunity for lots of micro-optimizations, like `@simd` annotations. There is also likely an inconsistency in how many threads were used.)
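For reference, `@simd` goes on an inner loop whose iterations are independent, or whose reduction can be safely reordered; a minimal sketch (function name is mine):

```julia
# @simd lets the compiler reassociate the floating-point reduction
# and vectorize the loop; @inbounds removes the bounds checks.
function simd_sum(v::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(v)
        s += v[i]
    end
    return s
end
```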