# NASA Modeling Guru: Python, Julia, R, Matlab with interesting results

Have you seen this comparison of Python, Julia, R, Matlab, and IDL? There are interesting tables with results:

https://modelingguru.nasa.gov/docs/DOC-2625


Oh yeah, this is the guy we had to email to tell him that Julia is column major and not row major (I think it was @Ismael-VC who ended up emailing him back in January). He says he fixed that, but if you open up even just Problem 3, you'll see:

```julia
function regularTimeStep(u::Matrix)
    (n, m) = size(u)

    err = 0.0
    for i in 2:n-1
        for j in 2:m-1
            tmp = u[i, j]
            u[i, j] = ((u[i-1, j] + u[i+1, j] +
                        u[i, j-1] + u[i, j+1])*4.0 +
                        u[i-1, j-1] + u[i-1, j+1] +
                        u[i+1, j-1] + u[i+1, j+1])/20.0

            diff = u[i, j] - tmp
            err = err + diff*diff
        end
    end

    return u, sqrt(err)
end
```
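For comparison, here is a sketch (mine, not the author's fixed code) of what a column-major-friendly loop nest would look like: `j`, the slower-varying second index, goes in the outer loop so that consecutive inner iterations touch adjacent memory. Note that because the update is in place, swapping the sweep order changes the intermediate iterates slightly, though both orderings converge for this kind of Jacobi/Gauss-Seidel-style smoother:

```julia
# Hypothetical corrected variant; the name regularTimeStepColMajor is mine.
function regularTimeStepColMajor(u::Matrix)
    (n, m) = size(u)
    err = 0.0
    # Outer loop over columns, inner loop over rows: in Julia's
    # column-major layout, u[i, j] and u[i+1, j] are contiguous.
    @inbounds for j in 2:m-1
        for i in 2:n-1
            tmp = u[i, j]
            u[i, j] = ((u[i-1, j] + u[i+1, j] +
                        u[i, j-1] + u[i, j+1])*4.0 +
                        u[i-1, j-1] + u[i-1, j+1] +
                        u[i+1, j-1] + u[i+1, j+1])/20.0
            diff = u[i, j] - tmp
            err += diff*diff
        end
    end
    return u, sqrt(err)
end
```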

Hell, it's not just getting the index ordering wrong, it's also keeping bounds checks on when they're unnecessary. Even the simplest problem, Problem 1, is missing it:

```julia
function foo1(A)
    N = size(A, 1)
    for j = 1:N, i = 1:N
        A[i,j,1] = A[i,j,2]
        A[i,j,3] = A[i,j,1]
        A[i,j,2] = A[i,j,3]
    end
end
```

I'm sure anyone who's been using Julia for more than a week would see that `@inbounds` is missing:

```julia
function foo2(A)
    N = size(A, 1)
    @inbounds for j = 1:N, i = 1:N
        A[i,j,1] = A[i,j,2]
        A[i,j,3] = A[i,j,1]
        A[i,j,2] = A[i,j,3]
    end
end
```
```julia
using BenchmarkTools
A = rand(100, 100, 100)
@btime foo1($A)
#   14.051 µs (0 allocations: 0 bytes)
@btime foo2($A)
#   10.538 µs (0 allocations: 0 bytes)
```

So take these numbers as a lower bound for Julia's performance, since any user who's been using it for a week will get better results. Hell, I would take this whole set of benchmarks with a grain of salt unless you've checked the code yourself, given that the author didn't even get row vs. column ordering right before first posting the results (twice! After he read the email he updated the table to say Julia is column-major, but didn't update the code).


These benchmarks seem pretty flawed to me:

• In the first problem, they permute an NxNx3 array in all languages, despite the fact that some are column-major and some are row-major, so the cache characteristics are different. And they benchmark the Julia "vectorized" version in global scope, with out-of-place slicing (as opposed to `@views`).

• The second benchmark just measures BLAS `dgemm` performance, i.e. all that matters is which BLAS library is linked.

• In the third benchmark (a Gauss-Jacobi solver), the Julia implementation's loops are in the wrong order for column-major (unlike the Fortran and Numba versions, which both have the correct loop order).

• The fourth benchmark includes a plotting library in the timing, which seems weird. (And they "couldn't build the plotting library" in Julia.) A big part of the benchmark seems to be reading NetCDF files, which all of the languages do through the same external C library. A bunch of the Julia code seems really suboptimal (allocating temporary arrays all over the place), but it's hard to tell whether that matters because it's really unclear to me what this benchmark is measuring.
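To illustrate the slicing point in the first bullet (my own sketch, not code from the benchmark): an out-of-place slice like `A[:, :, 2]` on the right-hand side allocates a fresh copy on every call, while `@views` turns it into a zero-copy window into the array. The helper names below are hypothetical:

```julia
using BenchmarkTools

A = rand(100, 100, 3)

# Out-of-place slicing: the RHS A[:, :, 2] allocates a new 100x100 matrix.
copy_assign!(A) = (A[:, :, 1] .= A[:, :, 2]; A)

# @views makes the RHS slice a SubArray, so no temporary is allocated.
view_assign!(A) = (@views A[:, :, 1] .= A[:, :, 2]; A)
```

Benchmarking both with `@btime` shows the copy-based version paying one matrix allocation per call and the `@views` version allocating nothing, which is exactly the kind of difference that pollutes a "vectorized" timing.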

(This is putting aside the opportunity for lots of micro-optimizations, like `@inbounds` or `@simd` annotations. There is also likely an inconsistency in how many threads were used.)
